---
name: Probabilistic Forecasting
status: draft
territory: future-exploration
msi_territory: future-exploration
sources:
  - title: "Tetlock, Philip E. (2005), Expert Political Judgment: How Good Is It? How Can We Know?, Princeton University Press"
    url: https://openlibrary.org/works/OL5737028W
  - title: "Tetlock, Philip E. & Gardner, Dan (2015), Superforecasting: The Art and Science of Prediction, Crown"
    url: https://openlibrary.org/works/OL18233878W
---

# Probabilistic Forecasting

## Why it matters

Most claims about the future hide inside a word. Someone says an outcome is "likely", or "a real possibility", or that it "probably won't happen" — and everyone nods, having heard a different number. Worse, when the future arrives, nobody can say whether the forecaster was right, because "likely" is unfalsifiable: it covers everything from 55% to 95%, and whatever happens, it sort of fits. Probabilistic forecasting is the discipline of replacing that word with a number on a specific, resolvable event — and then keeping score, so that being wrong actually costs something and being right can be earned again.

For example: in 2026 you ask whether the Federal Reserve will cut its policy rate by at least 75 basis points before year-end. The vague answer is "probably not, given inflation." The forecast answer names the event precisely (cumulative cuts ≥ 75 bps, resolved by the December meeting), starts from a base rate (how often has the Fed moved that fast in a single year over the past few decades?), adjusts for what is specific to now (the labor-market and inflation data), and lands on a number with a range — say 30%, plausibly 20–40%. A year later the event either happened or it did not, the forecast scores well or badly, and the forecaster learns something they can carry to the next call. The word "probably" teaches you nothing; the number, scored, teaches you everything.

- **What it reveals.** A calibrated numerical probability on a specific future event — not "likely" but, say, 30% with a stated range — anchored in a base rate, adjusted for the case, and stated precisely enough that reality can later prove it right or wrong.
- **How it changes the read.** You stop asking *"will this happen?"* as a yes/no and start asking *"what are the odds, starting from how often things like this happen, and what about this case moves the number off that anchor?"*
- **When to foreground it.** A specific, resolvable question — an observable outcome by a stated date — where you want a number you could act or bet on, and where there is a reference class of similar past cases to anchor against.
- **What you'd miss without it.** That a confident-sounding word hides a wide range of actual beliefs, that an estimate built from the case alone while ignoring how often such things happen is usually far too extreme, and that a forecast you never score is a forecast you can never improve.
- **Where it misleads.** Pushed onto questions with no resolvable outcome, it manufactures false precision: a crisp "63%" on something that can never be checked is worse than an honest "I don't know". And a single point estimate with no range hides how thin the evidence beneath it really is.

## Realtime examples

See real, dated analyses where this mode put a calibrated number on a question in the news → **[Probabilistic Forecasting on Main Street Independent](https://mainstreetindependent.com/analyses/technique/future-exploration/probabilistic-forecasting)**

## How to invoke it in Ora

Use this mode when you have a specific question about the future, you want a number rather than a narrative (odds, a probability, a forecast you could act on), and there is some history to anchor against: a class of past cases like this one.

Frame the question so it can be settled, then ask:

> "What's the probability that [specific event] happens by [date]? Give me a calibrated estimate with the base rate and the case-specific adjustment."

The phrases *what's the probability*, *what are the odds*, *calibrated estimate*, and *what's the base rate* are what route you here. Bring two things. First, an outcome that is observable and a date by which it resolves: "will the Fed cut rates" is too loose, while "will the Fed cut by ≥75 bps in total by the end of its December 2026 meeting" forecasts cleanly. Second, any sense you have of the reference class (the past cases that resemble this one), because the base rate drawn from them is the anchor the whole estimate hangs off.

Two boundaries worth knowing. If you want named, narrated futures — distinct stories of how things could unfold — rather than one number, that is scenario work, not a single probability. And if the question has no resolvable answer, or "yes" and "no" are themselves contested, the question has to be sharpened before any number is honest; a crisp probability on a fuzzy question is false precision dressed up as rigor.

## How it works

Start with the cleanest version of the idea, which comes from weather. A forecaster who says "70% chance of rain" is making a claim you can audit: gather all the days she said 70%, and on about 70 out of every 100 of them it should actually have rained. If it rains on 70% of her 70%-days, 40% of her 40%-days, and 90% of her 90%-days, she is **calibrated**: her numbers mean what they say. That is the first and most important property of a good forecast, and notice what it requires: not that any single call comes true, but that the numbers, taken as a population, line up with reality. A calibrated forecaster can be "wrong" on any given day (she said 30%, it rained) and still be exactly right about her 30%, because three out of ten 30%-days are supposed to rain.
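The audit described above can be sketched in a few lines. This is a minimal illustration, not part of Ora: the function name `calibration_table`, the bucketing to the nearest 10%, and the forecast record are all assumptions made up for the example.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group forecasts by stated probability and compare with what happened.

    forecasts: iterable of probabilities in [0, 1]
    outcomes:  iterable of 0/1 values (1 = the event happened)
    Returns {stated_probability: (count, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[round(p, 1)].append(o)  # bucket to the nearest 10% (an assumption)
    return {
        p: (len(hits), sum(hits) / len(hits))
        for p, hits in sorted(buckets.items())
    }


# Made-up record: ten 70%-days with rain on seven of them,
# and ten 30%-days with rain on three of them.
forecasts = [0.7] * 10 + [0.3] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

table = calibration_table(forecasts, outcomes)
for stated, (n, observed) in table.items():
    print(f"said {stated:.0%} on {n} days, it rained on {observed:.0%} of them")
```

A forecaster is calibrated to the extent that the stated and observed columns agree. With only a handful of forecasts per bucket the observed frequency is noisy, so a real audit needs a long record.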

Calibration alone, though, is too easy to game. If it rains on 25% of days where you live, you can be perfectly calibrated by saying "25%" every single day forever — and you will have told no one anything. So a good forecast needs a second property: **resolution**, the willingness to move away from that base rate, to say 5% on the clear days and 90% on the stormy ones, and be right when you do. Calibration keeps you honest; resolution makes you useful. The art is to have both — to push your numbers toward the extremes when the evidence warrants, without pushing so far that they stop matching reality.

Both properties get folded into a single number that lets you keep score: the **Brier score**, introduced by the meteorologist Glenn Brier in 1950. The rule is simple. Write your probability as a decimal (70% becomes 0.7), look at the outcome (1 if it happened, 0 if it didn't), take the difference, and square it. Forecast 0.7 and it rains: the error is (0.7 − 1)² = 0.09. Forecast 0.7 and it stays dry: (0.7 − 0)² = 0.49 — a much bigger penalty, because you were confident and wrong. Average that squared error over many forecasts and you get a score where lower is better, and where the squaring does something clever: it punishes confident mistakes far more than hedged ones, so you cannot win by being recklessly sure. The Brier score is what turns forecasting from opinion into a practice you can get measurably better at, because now "wrong" has a price and "right" leaves a record.
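As a rough sketch of the scoring rule, the snippet below reproduces the two single-forecast cases and then compares the always-25% forecaster with one who says 5% on clear days and 90% on stormy ones. The 340-day record is invented so that both forecasters are perfectly calibrated; the names `brier_score`, `flat`, and `sharp` are illustrative only.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(forecasts, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# The two single-forecast cases from the text.
print(brier_score([0.7], [1]))  # said 70% and it rained: about 0.09
print(brier_score([0.7], [0]))  # said 70% and it stayed dry: about 0.49

# Made-up record: 340 days, rain on 85 of them (25% overall).
stormy = [1] * 72 + [0] * 8    # 80 days, rain on 90% of them
clear = [1] * 13 + [0] * 247   # 260 days, rain on 5% of them
outcomes = stormy + clear

flat = [0.25] * len(outcomes)                       # always the base rate
sharp = [0.9] * len(stormy) + [0.05] * len(clear)   # moves with the evidence

flat_score = brier_score(flat, outcomes)
sharp_score = brier_score(sharp, outcomes)
print(f"always 25%:    {flat_score:.4f}")
print(f"5% and 90%:    {sharp_score:.4f}")
```

Both forecasters are calibrated on this record, but the second earns a much lower (better) score, which is how the Brier score rewards resolution as well as calibration.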

How do you actually arrive at a good number? The move that separates skilled forecasters from confident amateurs is to start from the **base rate** — the outside view — before looking at the specifics. Ask "how often do things in this reference class happen at all?" and let that frequency be your anchor. Only then bring in the **inside view**: the particulars of this case that argue for nudging the number up or down from the anchor. A startup founder, asked her odds of success, reasons from the inside — her team, her idea, her drive — and says 90%. The base rate for startups in her sector says something nearer 10%. The disciplined forecast does not pick one; it starts at the 10% anchor and adjusts upward for what is genuinely better-than-typical about this case, landing somewhere honest in between — and it shows the arithmetic, so a reader can see the anchor, see the adjustment, and reproduce the result. Skipping the base rate and reasoning from the case alone is the single most common way forecasts go wrong: it is how you get 90% on a 10% endeavor.
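One way to make the anchor-then-adjust arithmetic concrete is to work in odds: convert the base rate to odds, multiply by a factor for each inside-view driver, and convert back. This is only a sketch under stated assumptions. The source does not say which adjustment formula to use, and the function name `adjust_from_base_rate` and the driver factors below are hypothetical.

```python
def adjust_from_base_rate(base_rate, driver_factors):
    """Move a base-rate probability by multiplying its odds by each driver factor.

    base_rate:      outside-view probability, strictly between 0 and 1
    driver_factors: odds multipliers; above 1 raises the estimate, below 1 lowers it
    """
    odds = base_rate / (1 - base_rate)
    for factor in driver_factors:
        odds *= factor
    return odds / (1 + odds)


# Hypothetical numbers: a 10% base rate, and two inside-view drivers judged to
# multiply the odds of success by 1.5 and by 2 relative to the typical case.
estimate = adjust_from_base_rate(0.10, [1.5, 2.0])
print(f"anchor 10% -> estimate {estimate:.0%}")
```

Working in odds keeps the result between 0 and 1 however many drivers are applied, and it leaves every step visible: a reader can see the anchor, see each factor, and redo the multiplication.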

This is not a theory about how forecasting ought to work; it is what the evidence shows about who actually forecasts well. The psychologist Philip Tetlock spent two decades scoring tens of thousands of predictions from political and economic experts and found, famously, that the average expert was about as accurate as chance — and that the more famous the expert, the worse the calibration, because confident, tidy, single-cause stories make better television than honest uncertainty. But in a later forecasting tournament he found a minority he called **superforecasters** who genuinely beat the field, and even beat intelligence analysts with access to classified material. Their edge was not brilliance or special information. It was method: they broke big questions into resolvable pieces, anchored on base rates, sought out the other side's evidence, and — above all — made *many small updates* as news arrived rather than one dramatic call. And forecasts **aggregated** across a group of them beat almost any individual, because independent errors cancel. The lesson is humbling and practical at once: good forecasting is a skill, it looks like patience and arithmetic rather than genius, and it lives entirely in numbers you can score — which is exactly why the vague word, however confident, is the thing to distrust.
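The aggregation point can be illustrated with a toy example. The sketch below assumes a simple unweighted mean of three invented forecasters; the source does not say how forecasts were combined in the tournament, so treat the averaging rule and all the numbers as assumptions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(forecasts, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Four resolved questions and three made-up forecasters whose errors
# point in different directions on different questions.
outcomes = [1, 0, 1, 0]
forecasters = {
    "a": [0.9, 0.4, 0.6, 0.1],
    "b": [0.6, 0.1, 0.9, 0.4],
    "c": [0.7, 0.3, 0.5, 0.2],
}

# The crowd forecast is the unweighted mean on each question.
crowd = [sum(col) / len(col) for col in zip(*forecasters.values())]

individual = {name: brier_score(f, outcomes) for name, f in forecasters.items()}
mean_individual = sum(individual.values()) / len(individual)
crowd_score = brier_score(crowd, outcomes)

for name, score in individual.items():
    print(f"forecaster {name}: {score:.4f}")
print(f"average individual: {mean_individual:.4f}")
print(f"crowd (mean forecast): {crowd_score:.4f}")
```

Because squared error is convex, the averaged forecast can never score worse than the average of the individual scores. Beating every single individual, as happens in this toy data, is not guaranteed; it depends on the errors being reasonably independent.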

## Framework & implementation

*This section uses Ora's own terms for the parts of an analysis, so that if you open the actual mode file they line up. Each is glossed in plain language on first use.*

### Pipeline execution

Probabilistic Forecasting is an **atomic mode** in the **future-exploration** territory — a single forecasting pass that produces one calibrated number, not a composite of sub-analyses. It runs at **Gear 4**, Ora's most thorough setting: a **Depth analyst** and a **Breadth analyst** work the question in parallel and then critique each other (**cross-adversarial evaluation**) before a consolidator integrates the result — a structure that suits forecasting, where the depth pass presses on the case-specific drivers while the breadth pass guards the base rate and the range against overconfidence.

The pass does four things in order. It **locks the resolution criteria** — restating the question as an observable outcome with a fixed resolution date, so the forecast has a definite yes/no it can later be scored against; a question that cannot be resolved is sent to be sharpened first rather than forecast on shaky ground. It **selects a reference class and states its base rate** — the outside-view anchor, with the alternative classes it considered and why it chose this one. It **inventories the inside-view drivers** — the case-specific factors, each tagged with a direction (raises or lowers) and a magnitude, kept separate from the base rate so the adjustment stays visible. And it **produces the probability as a range with the adjustment math shown**, so a reader can reproduce the estimate from base rate plus drivers, followed by leading indicators that would prompt revision and a two-part confidence statement.

The mode's reasoning tools ride in its **`ANALYTICAL PERSPECTIVES`** block, the lenses it loads as it works. The load-bearing one is the **Tetlock superforecasting** lens (the whole outside-view-first, many-small-updates discipline above). Where bias-correction is central it also loads an **overconfidence / cognitive-bias** lens, the corrective that watches for the estimate drifting too close to the first number mentioned (anchoring bias), for a range drawn too narrow for the evidence (false precision), and for the inside view swamping the base rate (base-rate neglect).

### Output contract

The deliverable is a fixed set of sections, so the forecast is auditable and later scoreable rather than a paragraph of hedging:

- **Resolution Criteria.** The question restated as an observable outcome with a resolution date, and what "yes" versus "no" looks like.
- **Reference Class and Base Rate.** The chosen outside-view anchor with its base rate, plus the alternative classes considered and the rationale for the pick.
- **Inside-View Drivers.** The case-specific factors, each with a direction and a magnitude, kept distinct from the base rate.
- **Outside-View Adjustment.** The transparent arithmetic that moves from base rate to estimate, reproducible by the reader.
- **Probability Estimate with Range.** The forecast as an interval, not a point, with the width carrying real information about the strength of the evidence.
- **Leading Indicators and Update Triggers.** Observable signals that would move the number, and where to watch for them.
- **Confidence.** Split two ways: **calibration confidence** (am I right about the range?) kept distinct from **point confidence** (where in the range is most likely?).
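For readers who think in data structures, the sections above could be captured as a record like the one below. This is a hypothetical sketch: the class and field names are invented for illustration and are not Ora's actual schema, and the base rate in the example is a placeholder rather than a researched figure.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    description: str
    direction: str   # "raises" or "lowers"
    magnitude: str   # for example "small", "moderate", "large"


@dataclass
class Forecast:
    resolution_criteria: str
    resolution_date: str
    reference_class: str
    base_rate: float
    alternatives_considered: list[str]
    drivers: list[Driver]
    adjustment_math: str
    estimate_low: float
    estimate_high: float
    leading_indicators: list[str]
    calibration_confidence: str
    point_confidence: str

    def __post_init__(self):
        # The estimate is an interval, so it must be ordered and lie in [0, 1].
        if not 0.0 <= self.estimate_low <= self.estimate_high <= 1.0:
            raise ValueError("estimate range must satisfy 0 <= low <= high <= 1")
        if not 0.0 <= self.base_rate <= 1.0:
            raise ValueError("base_rate must lie in [0, 1]")


# The Fed question from earlier in this page, with placeholder content.
example = Forecast(
    resolution_criteria="Cumulative policy-rate cuts of at least 75 bps",
    resolution_date="2026-12-31",
    reference_class="Calendar years of Fed policy over the past few decades",
    base_rate=0.25,  # placeholder value, not a researched figure
    alternatives_considered=["Years that began with above-target inflation"],
    drivers=[Driver("Current inflation data", "lowers", "moderate")],
    adjustment_math="Start at the base rate, then apply each driver in turn",
    estimate_low=0.20,
    estimate_high=0.40,
    leading_indicators=["Labor-market releases", "Inflation releases"],
    calibration_confidence="moderate",
    point_confidence="centered near 30%",
)
print(f"{example.estimate_low:.0%} to {example.estimate_high:.0%}")
```

Keeping the base rate, the drivers, and the range as separate fields mirrors the point of the contract: the anchor and the adjustment stay visible, and the range cannot quietly collapse into a single number.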

### Origin and evidence

The method's spine is the work of Philip Tetlock. His *Expert Political Judgment* (2005) reported two decades of scored expert predictions and established the uncomfortable headline — that average expert accuracy was near chance, and inversely related to fame and confidence. *Superforecasting*, written with Dan Gardner (2015), reported the follow-on forecasting tournament and identified the minority who genuinely beat the field, along with the habits that explained their edge: base-rate anchoring, decomposition, frequent small updates, and aggregation. The scoring machinery underneath comes from meteorology: Glenn Brier's 1950 paper *Verification of forecasts expressed in terms of probability* gave forecasting its first proper scoring rule — the squared-error measure that penalizes confident mistakes and made calibration something you could audit. Behind all of it sits the older idea, associated with Leonard Savage and the subjectivist school of probability, that a probability can legitimately be a *degree of belief* about a one-off event — not just a long-run frequency — which is what licenses putting a number on a future that will only ever happen once.

### Applications and common uses

- **Geopolitical and policy risk.** The native use, and the one Tetlock's tournaments were built on: the odds of a specific event — an election, a conflict, a central-bank move — by a stated date.
- **Markets and macro calls.** Rate decisions, recession timing, earnings or commodity thresholds — anywhere a calibrated probability with a reference class beats a confident narrative.
- **Technology and capability timelines.** Will a particular product ship, or a benchmark be cleared, by a given year — anchored on the base rate of similar past milestones.
- **Project and operational delivery.** The probability a launch, a migration, or a deadline actually lands on time, anchored on how often comparable efforts have.
- **Personal and strategic decisions under uncertainty.** Any choice where naming the odds — and the range — sharpens a bet that would otherwise hide inside "probably".

### Failure modes and when not to use it

- **False precision.** A crisp single number, or a range drawn far too narrow, projects a confidence the evidence does not support. The mode's guard is the mandatory range whose width is meant to reflect actual uncertainty — a wide range is information, not a hedge.
- **Base-rate neglect.** Reasoning from the vivid specifics of the case while ignoring how often such things happen at all is how a 10% endeavor gets a 90% forecast. The mode forces an explicit outside-view anchor before any inside-view adjustment.
- **Anchoring and overconfidence.** The final number can drift toward the first figure mentioned or a salient round number, and confident calls can crowd out honest doubt. The mode keeps the adjustment math visible and flags when the estimate sits suspiciously close to its anchor.

**When not to reach for it.** When the future is better told as **branching qualitative stories** — several distinct, named ways things could unfold — than as one number, route to scenario planning. When the real task is **updating a set of linked hypotheses** against incoming evidence — propagating one piece of news through a web of connected beliefs — that is a Bayesian hypothesis network, not a single forecast. And when the uncertainty is **deep and genuinely unresolvable** — a question with no real reference class, or one whose very terms are contested — a single probability is false comfort; scenario planning or a wicked-future analysis fits the irreducible ambiguity better than a number that only looks like knowledge.

## Related

- **Scenario Planning** — the territory sibling for when the future is better told as several named, narrated pathways than reduced to a single number; the boundary this mode hands off across when the question resists one probability.
- **Bayesian Hypothesis Network** — the mode for when the task is updating a web of linked hypotheses against incoming evidence, rather than putting one calibrated number on one event.
- **Consequences and Sequel** — the lighter forward sibling: tracing the causal cascade of what follows from an event, without committing to a probability on any of it.
- **Tetlock Superforecasting** and **Base Rate Neglect** — the two lenses this mode leans on: the outside-view-first, many-small-updates discipline, and the corrective that refuses an inside-view estimate with no base-rate anchor.

## Sources

- [Tetlock, Philip E. (2005), Expert Political Judgment: How Good Is It? How Can We Know?, Princeton University Press](https://openlibrary.org/works/OL5737028W)
- [Tetlock, Philip E. & Gardner, Dan (2015), Superforecasting: The Art and Science of Prediction, Crown](https://openlibrary.org/works/OL18233878W)
