Most scoring systems are binary: right or wrong, pass or fail. They're easy to understand but they discard enormous amounts of information. When a doctor says "I'm 60% confident this is a viral infection," does that confidence matter? When an analyst says "I'm fairly certain this acquisition will fail," is "fairly certain" 70% or 95%?

The Brier Score is the mathematical tool that makes probability forecasts measurable and comparable. It was invented by meteorologist Glenn Brier in 1950 to evaluate weather forecasts, and it's now a core tool in forecasting research, clinical medicine, and any domain where expressing confidence is as important as expressing a conclusion.

The mathematics, simply

The Brier Score for a single prediction is calculated as:

BS = (probability − outcome)²

Here, "probability" is your stated confidence (expressed as a number between 0 and 1) and "outcome" is what actually happened (1 if the thing occurred, 0 if it didn't). Your overall Brier Score is the average of these per-prediction scores across all your forecasts.

Examples:

  • You say 80% confident, and you're right: BS = (0.8 − 1)² = 0.04
  • You say 80% confident, and you're wrong: BS = (0.8 − 0)² = 0.64
  • You say 50% confident, and you're right: BS = (0.5 − 1)² = 0.25
  • You say 50% confident, and you're wrong: BS = (0.5 − 0)² = 0.25

Lower scores are better. A Brier Score of 0 is perfect: you said 100% for everything that happened and 0% for everything that didn't. The worst possible score is 1, which means being fully confident and wrong every time. A Brier Score of 0.25 is what you'd get by always saying "50% likely", the uninformed baseline for binary questions.
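The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not any particular library's API: the function names brier_score and mean_brier_score are made up for illustration, and the five outcomes in the baseline example are arbitrary sample data.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared gap between a stated probability (0..1) and the outcome (0 or 1)."""
    return (probability - outcome) ** 2


def mean_brier_score(forecasts) -> float:
    """Average the per-prediction scores over (probability, outcome) pairs."""
    scores = [brier_score(p, o) for p, o in forecasts]
    return sum(scores) / len(scores)


# The four worked examples from the list above.
examples = [(0.8, 1), (0.8, 0), (0.5, 1), (0.5, 0)]
for p, o in examples:
    print(f"p={p:.1f} outcome={o} -> BS={brier_score(p, o):.2f}")

# Always answering 50% scores 0.25 whatever actually happens.
# The outcomes here are arbitrary illustrative data.
always_fifty = [(0.5, o) for o in (1, 0, 1, 1, 0)]
print(f"always-50% baseline: {mean_brier_score(always_fifty):.2f}")
```

Because a 50% forecast is exactly 0.5 away from either outcome, the baseline stays at 0.25 no matter which outcomes you substitute.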

What good looks like

Philip Tetlock's Good Judgment Project, which ran for years with thousands of volunteer forecasters making predictions about geopolitical and economic events, provides some of the most widely cited real-world benchmarks:

  • Average forecaster: Brier Score around 0.20–0.22
  • Trained forecaster: 0.16–0.18 (after calibration training)
  • Superforecaster: 0.10–0.12 (top 2% consistently)
  • Intelligence community analysts: Around 0.15–0.17

These differences look small but are large in practice. Going from 0.20 to 0.12 is a 40% reduction in squared error: the better forecaster's high-confidence calls come true more often, and they stay appropriately uncertain when the evidence is weak. That difference in decision quality, compounded across hundreds of decisions, changes outcomes.

Why squaring matters

The squaring in the Brier Score formula is important. It means that being confidently wrong is penalised much more harshly than being moderately wrong. If you say 90% confident and you're wrong, you get penalised (0.9 − 0)² = 0.81, close to the maximum penalty of 1. If you say 60% confident and you're wrong, you get (0.6 − 0)² = 0.36. Stating 1.5 times the confidence costs 2.25 times as much, whereas binary scoring would count both simply as one miss.

This creates the right incentive: say 90% only when you're genuinely 90% sure. Inflating confidence to seem decisive gets severely punished by the Brier Score in a way that binary right/wrong scoring never would.
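One way to see the incentive is to compute the score you should expect on average for each confidence you could state. The sketch below assumes an event whose true chance is 70% (an arbitrary illustrative value) and uses a hypothetical helper, expected_brier, that weights the "it happened" and "it didn't" penalties by that chance.

```python
def expected_brier(report: float, true_prob: float) -> float:
    """Average Brier Score you would earn by stating `report`
    when the event really happens with probability `true_prob`."""
    penalty_if_happens = (report - 1) ** 2
    penalty_if_not = (report - 0) ** 2
    return true_prob * penalty_if_happens + (1 - true_prob) * penalty_if_not


true_prob = 0.7  # assumed for illustration

# Try every stated confidence from 0% to 100% in 1% steps.
reports = [i / 100 for i in range(101)]
best_report = min(reports, key=lambda r: expected_brier(r, true_prob))

print(f"best confidence to state: {best_report:.2f}")
print(f"expected score stating 0.70: {expected_brier(0.70, true_prob):.2f}")
print(f"expected score stating 0.90: {expected_brier(0.90, true_prob):.2f}")
```

In this example the honest 70% gives the lowest expected score (0.21), while inflating it to 90% raises the expected score to 0.25, no better than always saying 50%.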

How MindFrame uses it

MindFrame displays your Brier Score alongside your Calibration Error — the mean absolute gap between your stated confidence levels and your actual accuracy rates within each confidence bin. Together, these tell you not just how accurate you are but how accurately you know how accurate you are.
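A calibration error of this kind could be computed roughly as follows. This is a sketch under stated assumptions, not MindFrame's actual implementation: the ten equal-width bins, the weighting of each bin by how many forecasts it holds, and the name calibration_error are all choices made here for illustration.

```python
from collections import defaultdict


def calibration_error(forecasts, n_bins: int = 10) -> float:
    """Weighted mean absolute gap between stated confidence and observed
    accuracy, computed per confidence bin.

    `forecasts` is a list of (probability, outcome) pairs, outcome 0 or 1.
    Bin count and size-weighting are assumptions of this sketch.
    """
    bins = defaultdict(list)
    for p, o in forecasts:
        # Clamp so that p == 1.0 lands in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))

    total = len(forecasts)
    error = 0.0
    for members in bins.values():
        mean_confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        error += (len(members) / total) * abs(mean_confidence - accuracy)
    return error


# Illustrative data: ten 90% calls with 7 correct, ten 60% calls with 6 correct.
sample = [(0.9, 1)] * 7 + [(0.9, 0)] * 3 + [(0.6, 1)] * 6 + [(0.6, 0)] * 4
print(f"calibration error: {calibration_error(sample):.2f}")
```

Here the 60% calls are well calibrated and the 90% calls are off by 20 points, so the result is about 0.10. An unweighted average across bins would give the same figure in this case because both bins hold ten forecasts.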

Your calibration chart shows the full picture: a reliability diagram plotting stated confidence on the horizontal axis against actual accuracy on the vertical axis. A perfectly calibrated forecaster's points would all fall on the diagonal. Overconfident forecasters sit below the diagonal (their accuracy is lower than their stated confidence). Underconfident forecasters sit above it.
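The points on a reliability diagram can be derived from the same (probability, outcome) pairs. The following is a hedged sketch rather than a description of how MindFrame draws its chart: reliability_points, describe, the ten-bin layout, and the 0.05 tolerance are hypothetical choices for illustration.

```python
def reliability_points(forecasts, n_bins: int = 10):
    """Return one (stated confidence, observed accuracy, count) tuple per
    non-empty bin, ordered from lowest to highest confidence."""
    bins = {}
    for p, o in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, o))

    points = []
    for idx in sorted(bins):
        members = bins[idx]
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        points.append((confidence, accuracy, len(members)))
    return points


def describe(confidence: float, accuracy: float, tolerance: float = 0.05) -> str:
    """Say where a point sits relative to the diagonal (tolerance is arbitrary)."""
    if accuracy < confidence - tolerance:
        return "overconfident (below the diagonal)"
    if accuracy > confidence + tolerance:
        return "underconfident (above the diagonal)"
    return "roughly calibrated (near the diagonal)"


# Illustrative data: 60% calls that come true 80% of the time,
# and 90% calls that come true 70% of the time.
sample_points = reliability_points(
    [(0.6, 1)] * 8 + [(0.6, 0)] * 2 + [(0.9, 1)] * 7 + [(0.9, 0)] * 3
)
for confidence, accuracy, count in sample_points:
    print(f"stated {confidence:.2f}, actual {accuracy:.2f}, n={count}: "
          f"{describe(confidence, accuracy)}")
```

Plotting confidence against accuracy for each tuple, with the diagonal drawn for reference, gives the chart described above. The count is worth keeping because a bin with only a handful of forecasts says little about calibration.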

With enough sessions, you can see your reliability diagram shift toward the diagonal — a measurable, quantified improvement in a skill that most people have never even thought to train.