Confidence labels
What the 'low', 'medium', and 'high' confidence labels on an aggregation mean, and where each level is used.
How we assign confidence
Confidence is a categorical label attached to every aggregation, based on three factors: sample size, source diversity, and time-window completeness. We use it as a UI dim/highlight signal: low-confidence aggregates are still computed and shown, but are visually de-emphasized.
High confidence
All three conditions hold: the sample has at least 100 observations, at least 4 distinct data sources contribute, and the time window is fully covered with no gap longer than 14 days. High-confidence aggregates are used in indexes, time series, comparison tables, and CAGR.
Medium confidence
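No low-confidence condition applies, but at least one factor falls short of the high-confidence bar: the sample has at least 30 but fewer than 100 observations, or only 2-3 sources contribute, or the longest gap is more than 14 days but no more than 30. Medium-confidence aggregates appear in variant statistics but are excluded from index construction.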
Low confidence
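Any one of the following applies: the sample has fewer than 30 observations, only a single source contributes, or the time window has a gap longer than 30 days. We still compute and display the aggregate, but explicitly flag it. We never include low-confidence aggregates in index values or in CAGR calculations.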
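The three definitions above can be read as a "weakest factor decides" rule. The following is a minimal Python sketch of that reading, not the production implementation. The names `AggregateStats`, `sample_size`, `source_count`, `max_gap_days`, and `confidence_label` are hypothetical and chosen for illustration. It also assumes that a gap of exactly 14 days still qualifies for high confidence and that a gap of exactly 30 days is still medium, since the thresholds above are stated as "> 14" and "> 30".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateStats:
    """Hypothetical summary of the three factors behind one aggregate."""
    sample_size: int    # number of observations in the aggregate
    source_count: int   # distinct data sources contributing
    max_gap_days: int   # longest uncovered stretch in the time window


def confidence_label(stats: AggregateStats) -> str:
    """Return 'low', 'medium', or 'high' for an aggregate.

    Assumption: the weakest of the three factors decides the label.
    """
    # Low: any single weak factor is enough.
    if (
        stats.sample_size < 30
        or stats.source_count <= 1
        or stats.max_gap_days > 30
    ):
        return "low"

    # High: all three factors must clear the bar together.
    if (
        stats.sample_size >= 100
        and stats.source_count >= 4
        and stats.max_gap_days <= 14
    ):
        return "high"

    # Everything in between is medium.
    return "medium"
```

Under these assumptions, an aggregate with 150 observations from 5 sources but a 20-day gap is labelled medium, because the gap alone keeps it below the high-confidence bar.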
Other topics
- Median, percentiles and IQR · Why we publish median and percentiles instead of mean prices, and how to read th…
- Compound Annual Growth Rate (CAGR) · How we annualize price changes across 1, 3, and 5 year windows.…
- Volume weighting and index construction · How components contribute to a Bagonomics index: square-root weights, caps, and …
- Premium to retail · Definition of premium-to-retail, the country-of-retail convention, and how to in…