---
name: Analysis of Competing Hypotheses
status: draft
territory: hypothesis-evaluation
msi_territory: hypothesis-evaluation
sources:
  - title: Heuer, Richards J., Jr. (1999), Psychology of Intelligence Analysis, Center for the Study of Intelligence, CIA
    url: https://openlibrary.org/works/OL20202835W
  - title: Heuer, Richards J., Jr. & Pherson, Randolph H. (2010), Structured Analytic Techniques for Intelligence Analysis, CQ Press
    url: https://openlibrary.org/works/OL21076070W
  - title: Popper, Karl (1959), The Logic of Scientific Discovery, Hutchinson
    url: https://openlibrary.org/works/OL1984582W
---

# Analysis of Competing Hypotheses

## Why it matters

When several explanations are alive at once — three theories for why engagement dropped, four for why a defect keeps recurring — the mind does something quietly fatal: it picks the one that feels most plausible early and then spends the rest of its attention gathering reasons that fit. Each new fact that agrees with the favored story feels like confirmation, and the story gets stronger in your head without ever getting tested. Analysis of competing hypotheses is the discipline that breaks this habit. It puts every explanation on the table at once and then asks, of each piece of evidence, not *which story does this support?* but *which stories does this rule out?* The explanation left standing is the one the evidence fails to contradict — not the one it seems to confirm.

For example: an app's engagement falls 30% in March, and the team has three theories — an algorithm change, content fatigue, a seasonal dip. The tempting move is to pick the favorite and pile up supporting facts. The disciplined move is to lay all three side by side and test each fact against all three. "Engagement fell off a cliff on one specific day" is consistent with an algorithm change, but it contradicts content fatigue, which would unfold gradually. "Competitor apps dropped the same 30%" contradicts an internal algorithm change but fits a seasonal or platform-wide cause. No single fact decides it — but fact by fact, the contradictions accumulate against the weak hypotheses, and the one with the fewest contradictions survives. You did not prove the winner; you eliminated the losers, which is the only thing evidence can actually do.
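The engagement example can be written down as a small matrix. This is a minimal Python sketch. It assumes a plain dict-of-dicts layout with the ratings `"C"` (consistent) and `"I"` (inconsistent), and it leaves as `None` any cell the paragraph does not rate. The names `MATRIX` and `inconsistency_counts` are illustrative and are not part of Ora. Only the two quoted facts are encoded, so the ranking shows the mechanics rather than a finished verdict.

```python
# "C" = consistent, "I" = inconsistent, None = not rated in the example above.
HYPOTHESES = ["algorithm change", "content fatigue", "seasonal dip"]
MATRIX = {
    "one-day cliff": {
        "algorithm change": "C",
        "content fatigue": "I",   # fatigue would unfold gradually
        "seasonal dip": None,
    },
    "competitors fell 30%": {
        "algorithm change": "I",  # an internal change should not move competitors
        "content fatigue": None,
        "seasonal dip": "C",
    },
}

def inconsistency_counts(matrix, hypotheses):
    """Count "I" cells per hypothesis; consistent cells are deliberately not counted."""
    counts = {h: 0 for h in hypotheses}
    for row in matrix.values():
        for h in hypotheses:
            if row.get(h) == "I":
                counts[h] += 1
    return counts

counts = inconsistency_counts(MATRIX, HYPOTHESES)
ranking = sorted(HYPOTHESES, key=counts.get)  # fewest contradictions first
print(counts)   # {'algorithm change': 1, 'content fatigue': 1, 'seasonal dip': 0}
print(ranking)  # ['seasonal dip', 'algorithm change', 'content fatigue']
```

The consistent cells are stored but never counted. That is the elimination logic in miniature: they are recorded so a reader can audit the grid, not so they can vote.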

- **What it reveals.** Which of several live explanations the evidence is *least able to contradict* — and, just as important, which pieces of evidence are actually doing the deciding versus which are consistent with everything and therefore decide nothing.
- **How it changes the read.** You stop asking *"what supports my theory?"* and start asking *"what would be true if each rival theory were correct, and which of those predictions does the evidence violate?"* — confirmation becomes elimination.
- **When to foreground it.** Two or more genuinely competing explanations are on the table, you have evidence that bears on them differently, and the cost of converging on the wrong one is high — especially when a team is talking past each other because different people favor different stories.
- **What you'd miss without it.** That the evidence "supporting" your favored hypothesis is usually also consistent with two of the others, so it was never really support at all; and that the one fact which quietly contradicts your favorite is worth more than the ten that seem to confirm it.
- **Where it misleads.** It can only judge the hypotheses you put on the table — if the true explanation is one nobody named, the matrix will still crown a winner. And the consistent/inconsistent judgments are themselves judgments, so a determined analyst can smuggle a preference back in through how each cell is scored.

## Realtime examples

See real, dated analyses where this mode weighed several explanations for a story in the news against the evidence → **[Analysis of Competing Hypotheses on Main Street Independent](https://mainstreetindependent.com/analyses/technique/hypothesis-evaluation/competing-hypotheses)**

## How to invoke it in Ora

You have at least two live explanations for the same situation, you have evidence that bears on them, and you want to know which one the evidence actually supports — rather than which one you walked in believing.

Lay out the rival explanations and ask:

> "I have three competing hypotheses for why [X happened]: [A], [B], [C]. Make me an ACH matrix — what rules out each?"

The phrases *competing hypotheses*, *make me an ACH matrix*, and *what rules out each* are what route you here. Bring two things if you can: the full set of explanations you take seriously, and the evidence you have, stated as concrete observations rather than conclusions ("engagement fell on a single day" is evidence; "the algorithm did it" is a hypothesis). If you bring only one side, the mode will propose hypotheses from your evidence or pull in evidence bearing on your hypotheses. It still does its sharpest work when both the rival explanations and the facts are on the table from the start. It will also add a hypothesis of its own, including a deception hypothesis, if the situation calls for one.

Two boundaries worth knowing. If what you really want is *probabilities that move as new evidence arrives* — a quantitative degree of belief in each explanation, updated over time — that is the Bayesian hypothesis-network mode, not this one; ACH gives a structured verdict, not a posterior probability. And if there is only one explanation on the table and the question is whether it holds up, that is single-hypothesis work, not competing-hypotheses work — ACH needs a plurality of rivals to do anything at all.

## How it works

The method was forged for the hardest version of this problem. In the 1970s, a CIA analyst named Richards Heuer kept watching skilled colleagues make the same mistake — and it was not a stupidity problem, it was a wiring problem. An analyst would form an early read on what a foreign adversary was doing, and from that moment on, every cable, every intercept, every report got quietly filed as *more support* for the read they already had. Evidence that fit was noticed and remembered; evidence that did not was explained away or never weighed. The favored hypothesis grew stronger in the analyst's mind without ever being put at risk — and when only one hypothesis is ever really on the table, there is nothing to catch a deception, because a planted fact "fits" the story it was planted to support.

Heuer's fix inverts the natural motion of the mind. Instead of starting from a hypothesis and looking for support, you start by listing *all* the hypotheses — every explanation anyone takes seriously, plus the ones nobody wants to say out loud — before you look hard at any single piece of evidence. Then you build a grid: the hypotheses across the top, the evidence down the side. For each piece of evidence, you go across the row and ask of every hypothesis: if this hypothesis were true, would I expect to see this? Mark it consistent, inconsistent, or not-applicable. The grid forces you to confront each fact against *every* explanation at once, not just your favorite — which is exactly the comparison the unaided mind refuses to make.
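The grid's key demand is that no fact gets checked against the favorite alone. A completeness check captures that demand. This is a sketch that assumes ratings of `"C"`, `"I"`, or `"NA"` (not applicable). The helper `check_grid` and the placeholder hypotheses H1 to H3 are hypothetical. The check reports every cell left unrated, which is exactly the gap that appears when a fact is weighed only against the preferred story.

```python
VALID = {"C", "I", "NA"}  # consistent, inconsistent, not applicable

def check_grid(hypotheses, grid):
    """List every evidence/hypothesis cell that is missing or badly rated.

    An empty list means each fact was confronted with every hypothesis.
    """
    problems = []
    for evidence, row in grid.items():
        for h in hypotheses:
            rating = row.get(h)
            if rating is None:
                problems.append(f"unrated: {evidence!r} vs {h!r}")
            elif rating not in VALID:
                problems.append(f"bad rating {rating!r}: {evidence!r} vs {h!r}")
    return problems

hypotheses = ["H1", "H2", "H3"]
grid = {
    "fact A": {"H1": "C", "H2": "I", "H3": "NA"},
    "fact B": {"H1": "C"},  # only checked against the favorite, H1
}
print(check_grid(hypotheses, grid))
# ["unrated: 'fact B' vs 'H2'", "unrated: 'fact B' vs 'H3'"]
```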

Now comes the move that makes the whole thing work, and it is the one that feels backward. You do not pick the winner by counting the consistent marks. You pick it by counting the *inconsistent* ones — and the hypothesis with the fewest contradictions wins. The reason is deep and worth sitting with: in the real world, most evidence is consistent with most explanations. A suspect's fingerprints at the scene are consistent with "he is the murderer" and also with "he visited last week" and also with "he was framed by someone who lifted his prints." Consistency is cheap; it barely narrows anything. But *inconsistency* is decisive — a fact that simply cannot be true if a hypothesis holds eliminates that hypothesis outright. So the evidence that earns its keep is the *diagnostic* evidence: the pieces that point one way and not the others, that are consistent with some hypotheses and flatly inconsistent with the rest. A fact that fits every hypothesis equally tells you nothing about which is true, however dramatic it sounds. The grid makes the cheap evidence visible as cheap, and lets the diagnostic evidence do the deciding.
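Diagnosticity can be checked mechanically, because a row whose ratings match across every hypothesis cannot discriminate among them. The sketch below encodes the fingerprint example from the paragraph. It also adds one placeholder row, `hypothetical fact X`, which is invented purely to show what a discriminating row looks like. The helper `is_diagnostic` is illustrative, and ignoring `"NA"` cells is an assumption of this sketch.

```python
def is_diagnostic(row):
    """True if the row's C/I ratings differ across hypotheses ("NA" cells ignored)."""
    rated = {rating for rating in row.values() if rating in ("C", "I")}
    return len(rated) > 1

ROWS = {
    # From the text: consistent with all three readings, so it decides nothing.
    "his fingerprints are at the scene": {
        "murderer": "C", "visited last week": "C", "framed": "C",
    },
    # Hypothetical placeholder, invented to illustrate a discriminating row.
    "hypothetical fact X": {
        "murderer": "I", "visited last week": "C", "framed": "C",
    },
}

diagnostic = [e for e, row in ROWS.items() if is_diagnostic(row)]
cheap = [e for e, row in ROWS.items() if not is_diagnostic(row)]
print(diagnostic)  # ['hypothetical fact X']
print(cheap)       # ['his fingerprints are at the scene']
```

This rule also flags a row rated `"I"` against every hypothesis as non-diagnostic. That pattern more likely suggests the hypothesis set is missing an explanation than that the row is cheap.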

Think of a detective with three suspects. The amateur builds a case *for* the suspect who seems guiltiest, accumulating motive and opportunity until the story feels airtight — and the story can feel airtight while being wrong, because everything that fit was counted and nothing that did not was sought. The ACH detective does the opposite: she lists all three suspects, lays out every fact, and goes hunting for the fact that *breaks* each one. Suspect A had a motive — but A was on a train two hundred miles away, and that single inconsistency does what no amount of motive could: it eliminates A. The surviving suspect is not the one with the thickest file of supporting detail; it is the one against whom she could find the least that cannot be explained away. This is why ACH is, at bottom, an application of a much older idea — Karl Popper's insight that you never prove a theory true, you only fail to prove it false, and the theory worth believing is the one that has survived the most serious attempts to break it. ACH is that discipline made into a grid: list the rivals, hunt for what disconfirms, and trust the survivor not because the evidence loves it but because the evidence could not kill it.

## Framework & implementation

*This section uses Ora's own terms for the parts of an analysis, so that if you open the actual mode file they line up. Each is glossed in plain language on first use.*

### Pipeline execution

Analysis of Competing Hypotheses is the **depth-thorough mode** in the **hypothesis-evaluation** territory — the heavyweight of its family, sitting beside the lighter differential-diagnosis (informal weighing of a handful of candidates) and the quantitative Bayesian hypothesis-network (probability reasoning over a network). It runs at **Gear 4**, Ora's most thorough setting: a **Depth analyst** and a **Breadth analyst** work the problem in parallel and then critique each other (**cross-adversarial evaluation**) before a consolidator integrates the result. That two-stream structure is not decoration here — it is the method's own defense against bias, turned on the analysis itself. The streams can score the same evidence cell differently (one stream may rate "competitors dropped too" as inconsistent with an internal cause while the other permits a shared-dependency reading), and rather than papering over the disagreement, the mode surfaces it as a **tension** and carries both ratings into the sensitivity check — so the reader sees exactly which judgment the verdict hangs on.
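Surfacing a cross-stream disagreement, rather than averaging it away, amounts to diffing two matrices cell by cell. This is a minimal sketch that assumes each stream's ratings arrive as a dict-of-dicts. The names `find_tensions`, `depth`, and `breadth` are hypothetical stand-ins and do not describe Ora's internal structures.

```python
def find_tensions(stream_a, stream_b):
    """Return (evidence, hypothesis, rating_a, rating_b) wherever both streams
    rated the same cell and disagree."""
    tensions = []
    for evidence, row_a in stream_a.items():
        row_b = stream_b.get(evidence, {})
        for hypothesis, rating_a in row_a.items():
            rating_b = row_b.get(hypothesis)
            if rating_b is not None and rating_b != rating_a:
                tensions.append((evidence, hypothesis, rating_a, rating_b))
    return tensions

# One stream reads the competitor drop as ruling out an internal cause; the
# other allows a shared-dependency reading and rates it consistent.
depth = {"competitors fell 30%": {"algorithm change": "I", "seasonal dip": "C"}}
breadth = {"competitors fell 30%": {"algorithm change": "C", "seasonal dip": "C"}}
print(find_tensions(depth, breadth))
# [('competitors fell 30%', 'algorithm change', 'I', 'C')]
```

Each tuple is a candidate for the sensitivity check: the verdict is re-scored under both ratings instead of under one silently chosen value.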

The pass does six things in order. It **elicits the hypothesis set** — at least two competing explanations, because ACH needs plurality to function, and it will add an analyst-generated hypothesis (including a **deception hypothesis**) when the situation warrants. It **inventories the evidence**, attaching a credibility and relevance rating to each piece. It **builds the consistency matrix** — every evidence row scored consistent, inconsistent, or not-applicable against every hypothesis column. It runs the **diagnosticity assessment**, separating the evidence that discriminates among hypotheses from the evidence that is consistent with everything and therefore decides nothing. It **scores by elimination** — ranking hypotheses by their count of *inconsistent* cells, fewest-contradictions-wins, never by counting consistencies. Finally it produces a **calibrated verdict and sensitivity analysis**: which hypothesis is least contradicted, how confident that is, and precisely which evidence ratings, if they flipped, would flip the winner.
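The last two steps, scoring by elimination and sensitivity analysis, fit in a few lines. The sketch ranks hypotheses by inconsistency count, then flips each rated cell (`"C"` to `"I"` or back) one at a time and records the flips that change who leads. It reuses the two engagement facts from earlier. The helper names are hypothetical, and treating a tie for fewest inconsistencies as a shared lead is an assumption of this sketch.

```python
import copy

FLIP = {"C": "I", "I": "C"}
HYPOTHESES = ["algorithm change", "content fatigue", "seasonal dip"]
MATRIX = {
    "one-day cliff": {"algorithm change": "C", "content fatigue": "I", "seasonal dip": None},
    "competitors fell 30%": {"algorithm change": "I", "content fatigue": None, "seasonal dip": "C"},
}

def leaders(matrix, hypotheses):
    """Hypotheses tied for the fewest inconsistent cells (fewest-contradictions-wins)."""
    counts = {h: sum(row.get(h) == "I" for row in matrix.values()) for h in hypotheses}
    best = min(counts.values())
    return {h for h, n in counts.items() if n == best}

def sensitivity(matrix, hypotheses):
    """Return (evidence, hypothesis, old, new) for each single flip that changes the lead."""
    baseline = leaders(matrix, hypotheses)
    critical = []
    for evidence, row in matrix.items():
        for h, rating in row.items():
            if rating not in FLIP:  # skip unrated or not-applicable cells
                continue
            trial = copy.deepcopy(matrix)
            trial[evidence][h] = FLIP[rating]
            if leaders(trial, hypotheses) != baseline:
                critical.append((evidence, h, rating, FLIP[rating]))
    return critical

print(leaders(MATRIX, HYPOTHESES))  # {'seasonal dip'}
for cell in sensitivity(MATRIX, HYPOTHESES):
    print(cell)
```

With only two facts, three of the four rated cells are load-bearing. The Sensitivity Analysis exists to expose exactly this kind of fragility before anyone treats the leader as settled.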

The mode's reasoning tools ride in its **`ANALYTICAL PERSPECTIVES`** block — the lenses it loads as it works. The load-bearing ones are the **disconfirmation discipline** (spend attention on what does not fit the favored hypothesis, not on what does), a **confirmation-bias** corrective (the named tendency the whole method exists to defeat), and a **deception-awareness** lens (treat a capable adversary's possible manipulation of the evidence as its own hypothesis rather than as noise).

### Output contract

The deliverable is a fixed set of sections, so the reasoning is auditable rather than a bare verdict:

- **Hypothesis List.** Each explanation stated precisely, with its origin noted (user-supplied or analyst-generated).
- **Evidence Inventory.** Each piece with credibility, relevance, and source.
- **Consistency Matrix.** The full evidence-by-hypothesis grid, with any cross-stream **tensions** flagged in footnotes.
- **Diagnosticity Assessment.** Which evidence discriminates and which is consistent with everything.
- **Tentative Conclusions via Elimination.** The inconsistency-count ranking that names the surviving hypothesis.
- **Sensitivity Analysis.** The specific evidence ratings whose reversal would change the verdict.
- **Deception Assessment.** Present and tested when an adversarial actor is in play, explicitly marked not-applicable when none is.
- **Monitoring Priorities.** The evidence still worth gathering, ranked by how much it would move the verdict.

### Origin and evidence

The method is Richards J. Heuer Jr.'s, developed inside the CIA's Directorate of Intelligence in the 1970s and laid out in full in his *Psychology of Intelligence Analysis* (1999) — a book written precisely because the cognitive failures it catalogs (premature closure, evidence-for-the-favorite, blindness to deception) had repeatedly produced intelligence failures, and no amount of telling analysts to "be objective" had fixed them. ACH was Heuer's structural answer: a procedure that makes the bias-defeating move mandatory rather than hoping for it. Heuer and Randolph Pherson later codified ACH as one of the core methods in *Structured Analytic Techniques for Intelligence Analysis* (2010), the field's standard handbook, carrying it from intelligence work into business, law, and investigation. The deeper philosophical root is Karl Popper's *The Logic of Scientific Discovery* (1959): the principle that theories are tested by attempted falsification, not accumulated confirmation, and that the surviving theory is the one that has withstood the most serious attempts to refute it. ACH is that principle rendered as a working grid.

### Applications and common uses

- **Intelligence and security analysis.** The native use — assessing an adversary's intentions or capabilities when several readings fit the reporting and deception is possible.
- **Business and market diagnosis.** Competing explanations for a metric moving the wrong way — an engagement drop, a churn spike, a sales miss — weighed against internal and external evidence.
- **Investigation and forensics.** Multiple suspects, causes, or scenarios for an incident, tested by the evidence that eliminates rather than the evidence that fits.
- **Scientific and technical troubleshooting.** Rival mechanisms for an anomaly or failure, each held up against the observations that would distinguish them.
- **Team disputes over what happened.** When members favor different explanations and keep talking past each other, the shared matrix turns an argument into a structured comparison everyone can read.

### Failure modes and when not to use it

- **The unlisted-truth problem.** ACH can only judge the hypotheses on the table; if the real explanation is one nobody named, it will still crown a winner. The mode mitigates by surfacing the hypothesis-set decision explicitly and adding analyst-generated hypotheses when the matrix structure hints one is missing — but it cannot manufacture an explanation no one conceived.
- **Smuggled preference.** The consistent/inconsistent rating of each cell is itself a judgment, and a determined analyst can encode a favorite by how generously each cell is scored. The two-stream adversarial structure is the guard — divergent ratings become visible tensions rather than a silent thumb on the scale.
- **Correlated evidence counted as independent.** ACH treats evidence rows as separate votes, but real evidence often clusters from one underlying source, so five "facts" may be one fact wearing five hats — inflating a hypothesis's apparent support. The mode flags suspected evidence-correlation rather than letting it pad the count.
- **Mistaking the grid for a calculator.** The inconsistency count is a discipline, not a quantitative probability; reading the numbers as precise belief is a category error Heuer himself warned against.
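One crude guard against correlated evidence is to let each underlying source cast at most one vote against a hypothesis. This is a sketch under that assumption only. Real correlation can be subtler than a shared source label, and the mode is described as flagging suspected correlation rather than collapsing it automatically. The fields `source` and `ratings` and the hypotheses H1 and H2 are hypothetical.

```python
def naive_counts(evidence, hypotheses):
    """Inconsistencies per hypothesis, treating every row as an independent vote."""
    return {h: sum(item["ratings"].get(h) == "I" for item in evidence) for h in hypotheses}

def counts_by_source(evidence, hypotheses):
    """Inconsistencies per hypothesis, counting each contradicting source only once."""
    contradicting = {h: set() for h in hypotheses}
    for item in evidence:
        for h in hypotheses:
            if item["ratings"].get(h) == "I":
                contradicting[h].add(item["source"])
    return {h: len(sources) for h, sources in contradicting.items()}

HYPOTHESES = ["H1", "H2"]
EVIDENCE = [
    # Three reports that all trace back to one source: one fact in three hats.
    {"id": "report 1", "source": "source A", "ratings": {"H1": "C", "H2": "I"}},
    {"id": "report 2", "source": "source A", "ratings": {"H1": "C", "H2": "I"}},
    {"id": "report 3", "source": "source A", "ratings": {"H1": "C", "H2": "I"}},
    {"id": "report 4", "source": "source B", "ratings": {"H1": "I", "H2": "C"}},
]
print(naive_counts(EVIDENCE, HYPOTHESES))      # {'H1': 1, 'H2': 3}
print(counts_by_source(EVIDENCE, HYPOTHESES))  # {'H1': 1, 'H2': 1}
```

Counted naively, H1 looks like the clear survivor. Once the three echoes of source A collapse into one vote, H1 and H2 are tied, and the apparent margin disappears.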

**When not to reach for it.** When the question is **probabilities that update as evidence arrives**, meaning a quantitative degree of belief in each explanation over time, route to **bayesian-hypothesis-network**, the territory's quantitative sibling; ACH gives a structured verdict, not a posterior. When a handful of **candidate diagnoses for medical or fault symptoms** need to be ranked quickly and informally, **differential-diagnosis** is the lighter, faster sibling. When there is **a single failure to trace backward** to its generating cause rather than a field of rival explanations to weigh, that is **root-cause-analysis**, not a competing-hypotheses problem. And when the disagreement is really an **inter-frame paradigm dispute**, where the parties hold incompatible worldviews rather than weighing the same evidence, no matrix will resolve it, and a paradigm mode fits better.

## Related

- **Bayesian Hypothesis Network** — the quantitative sibling in the same territory: when you need probabilities that update as evidence arrives rather than a structured eliminate-the-rivals verdict, this is the handoff.
- **Differential Diagnosis** — the lighter, faster sibling for ranking a handful of candidate explanations by informal weighing when the full matrix would be overkill.
- **Red-Team Assessment** — the complement when the worry is not "which explanation is true?" but "where would an adversary or our own blind spot break this read?" — adversarial pressure applied to the conclusion itself.
- **Confirmation Bias** — the lens this mode is built to defeat: the pull to gather evidence for the favored story and explain away the rest, which the disconfirmation discipline exists to override.

## Sources

- [Heuer, Richards J., Jr. (1999), Psychology of Intelligence Analysis, Center for the Study of Intelligence, CIA](https://openlibrary.org/works/OL20202835W)
- [Heuer, Richards J., Jr. & Pherson, Randolph H. (2010), Structured Analytic Techniques for Intelligence Analysis, CQ Press](https://openlibrary.org/works/OL21076070W)
- [Popper, Karl (1959), The Logic of Scientific Discovery, Hutchinson](https://openlibrary.org/works/OL1984582W)