---
name: Differential Diagnosis Schema
status: draft
territory: hypothesis-evaluation
host_mode: differential-diagnosis
also_loadable_in: []
msi_wired: false
sources:
  - title: "Sackett, David L. (1991), Clinical Epidemiology: A Basic Science for Clinical Medicine (2nd ed.), Little, Brown"
    url: https://openlibrary.org/works/OL19601518W
---

# Differential Diagnosis Schema

## Why it matters

When something could have several causes, the instinct is to grab the one that springs to mind and hunt for evidence that fits. The disciplined move is the opposite: find the evidence that tells the causes *apart*, and let it do the ranking.

For example: a web service slows to a crawl. It could be the database, last night's deploy, a traffic spike, or a flaky dependency. The tempting path — "probably the deploy, let's go read the deploy logs" — finds *something* consistent with the deploy and stops there. The differential path lists all four, then asks of each: what would I see if *this* were the cause that I would *not* see otherwise? A traffic spike predicts a jump in request volume; a database problem predicts slow queries specifically; the deploy predicts the slowdown began exactly at deploy time. The observation that *discriminates* — not the one that merely fits your first guess — is what points to the answer.

- **What it reveals.** Which candidate cause is *best distinguished* by the evidence — ranked by how well each observation separates the possibilities, not by how much it flatters your favorite.
- **How it changes the read.** You stop asking *"does this evidence fit my hypothesis?"* and start asking *"what would I see under each hypothesis — and which observation tells them apart?"*
- **When to foreground it.** Any situation with two to five plausible explanations where you want a structured, quick read on which is best supported — short of a full intelligence-grade workup.
- **What you'd miss without it.** That the most *suggestive* evidence is often worthless — a symptom present under every candidate (fever, across a list of infections) feels meaningful but discriminates nothing.
- **Where it misleads.** It only works on genuinely *competing* (mutually exclusive) hypotheses; applied to explanations that could all be true at once, the differential is ill-posed. It can also end in a tie: when two candidates stay level, the honest move is to escalate to the heavier full analysis, not force a verdict.

## How to invoke it in Ora

You have a handful of possible explanations for something and you want them ranked honestly and quickly — by what rules them in or out, not by which you thought of first.

Describe the situation and the candidates, and ask:

> "Differential diagnosis on this outage: database, the deploy, a traffic spike, or a bad dependency — which does the evidence actually point to, and what test would settle it?"

The differential-diagnosis schema is the core protocol of this analysis. Ora lists the candidate explanations, names the evidence that distinguishes them, rates each observation's *diagnosticity* (how well it separates the candidates), strikes the evidence that doesn't discriminate, ranks what remains, and — crucially — flags when the result is too close to call and a heavier analysis is warranted.

One thing to know: phrases such as *differential*, *candidate explanations*, *what are the possibilities*, *rule out*, and *most likely cause* are what route you here. This is the fast, structured read; when the stakes or a tie justify it, the analysis escalates to a full Analysis of Competing Hypotheses.

Make sure your candidates genuinely *compete* — that adopting one would mean rejecting the others. If two explanations could both be true at once, the differential is ill-posed; restructure them first.

One thing Ora won't do: rank by what *fits*. It scores evidence by its power to *discriminate* between candidates and explicitly forbids "this supports my leading hypothesis" bookkeeping — ranking by support rather than by diagnosticity is the classic way a differential goes wrong.

## How it works

The phrase *differential diagnosis* comes from medicine, and the clinic is where its discipline is sharpest, because there the cost of grabbing the first explanation that fits can be a dead patient. A doctor meets a patient short of breath. A dozen things cause shortness of breath, among them pneumonia, heart failure, a blood clot, asthma, and anxiety. The novice's temptation is to pattern-match: this *looks like* the textbook picture of one of them, so let's treat that. Medicine has a famous warning about exactly this temptation, "when you hear hoofbeats, think horses, not zebras," because the dramatic, memorable rare disease (the zebra) is precisely the one a vivid resemblance lures you toward, while the common, boring, far more likely cause sits unexamined.

The trained move is different, and it has a precise logic that clinical epidemiologists like David Sackett worked out formally. You don't start by betting on a winner. You start by writing down the *list* — the two to five candidates genuinely in play. Then, for each, you ask what evidence would *distinguish* it from the others, and you go looking for that evidence specifically. The chest X-ray that's cloudy in pneumonia but clear in a blood clot. The leg swelling that points to a clot over an infection. The key concept underneath all of it is **diagnosticity**: not how *strongly* a piece of evidence supports a diagnosis, but how well it *tells the candidates apart*. A fever, in a patient you already suspect has an infection, feels like confirmation — but if every candidate on your list causes fever, the fever discriminates *nothing*, and a disciplined diagnostician strikes it from the analysis no matter how suggestive it feels. The evidence that earns its place is the evidence that would come out *differently* depending on which hypothesis is true.
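The idea of diagnosticity can be made concrete with a small sketch. The function below is an illustration, not Ora's implementation: `rate_diagnosticity` and the `expected_under` mapping are hypothetical names, and the symptom-to-condition mapping is a toy assumption for demonstration, not clinical guidance. Shortness of breath plays the role that fever plays above: it is expected under every candidate, so it rates low.

```python
from typing import Dict, Set


def rate_diagnosticity(expected_under: Set[str], candidates: Set[str]) -> str:
    """Rate one observation by how well it tells the candidates apart."""
    hits = expected_under & candidates
    if not hits:
        # Assumption: an observation no candidate predicts signals an incomplete list.
        raise ValueError("observation fits no candidate; the list may be incomplete")
    if hits == candidates:
        return "low"     # expected under every candidate: discriminates nothing
    if len(hits) == 1:
        return "high"    # expected under exactly one candidate
    return "medium"      # narrows the field without singling one out


# Toy mapping for illustration only, not clinical guidance.
candidates = {"pneumonia", "blood clot", "asthma"}
expected: Dict[str, Set[str]] = {
    "shortness of breath": {"pneumonia", "blood clot", "asthma"},
    "cloudy chest X-ray": {"pneumonia"},
    "leg swelling": {"blood clot"},
    "sudden onset": {"blood clot", "asthma"},  # assumed, to show a medium rating
}

ratings = {obs: rate_diagnosticity(under, candidates) for obs, under in expected.items()}
for obs, rating in ratings.items():
    print(f"{obs}: {rating}")
```

The rating depends only on which candidates predict the observation, never on which candidate the analyst currently favors.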

So the procedure is a short, deliberate loop. List the candidates — and keep the list to a workable handful, because beyond five the comparison sprawls. For each, name the distinguishing evidence. Rate every piece by its diagnosticity: high (seen under one candidate, not the others), medium (narrows the field a little), low (consistent with everything, so useless). Throw out the low-diagnosticity evidence entirely. Rank what's left by which candidate the *high*-diagnosticity evidence points to — and note, this is *not* the candidate with the most evidence "consistent with" it, which is the trap that lets a wrong answer accumulate a comforting pile of meaningless support. Finally, and this is the part that separates a good diagnostician from an overconfident one: assess how sure you actually are. If one candidate dominates cleanly, good. But if two stay roughly tied, or a high-diagnosticity clue refuses to fit anything, you do *not* manufacture a confident verdict. You flag it — order the definitive test, or escalate to the heavier, slower, full comparison. Knowing when the quick read has hit its limit is not a weakness of the method. It is the method.
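The last three moves of that loop (strike, rank, assess confidence) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Ora's code: the `Evidence` class, the `differential` function, the numeric `WEIGHTS`, and the `margin` threshold are all hypothetical choices, and the evidence is assumed to arrive already rated by the analyst. The "errors cluster in the changed code path" observation is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Evidence:
    name: str
    diagnosticity: str        # "high", "medium", or "low", rated by the analyst
    points_to: Optional[str]  # candidate it singles out, or None


# Assumed weights: high-diagnosticity evidence counts most, low is struck entirely.
WEIGHTS: Dict[str, int] = {"high": 2, "medium": 1}


def differential(candidates: List[str], evidence: List[Evidence],
                 margin: int = 2) -> Tuple[List[Tuple[str, int]], bool]:
    """Strike low-diagnosticity evidence, rank, and set the escalation flag."""
    kept = [e for e in evidence if e.diagnosticity != "low"]
    scores = {c: 0 for c in candidates}
    for e in kept:
        if e.points_to in scores:
            scores[e.points_to] += WEIGHTS[e.diagnosticity]
    ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Borderline if the top two are close, or a high-diagnosticity clue fits nothing.
    lead = ranking[0][1] - ranking[1][1]
    unexplained = [e for e in kept
                   if e.diagnosticity == "high" and e.points_to not in scores]
    escalate = lead < margin or bool(unexplained)
    return ranking, escalate


candidates = ["database", "deploy", "traffic spike", "dependency"]
clean = [
    Evidence("service is slow", "low", None),
    Evidence("slowdown began exactly at deploy time", "high", "deploy"),
    Evidence("errors cluster in the changed code path", "medium", "deploy"),
]
tied = clean + [Evidence("slow queries specifically", "high", "database")]

clean_ranking, clean_escalate = differential(candidates, clean)
tied_ranking, tied_escalate = differential(candidates, tied)
print(clean_ranking[0], clean_escalate)
print(tied_ranking[:2], tied_escalate)
```

In the second run the deploy still leads, but only narrowly, so the sketch sets the escalation flag instead of reporting a verdict.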

## Framework & implementation

*This section uses Ora's own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.*

### Pipeline execution

The differential-diagnosis schema is the **required** lens of the Differential Diagnosis analysis: it is listed in the mode's `lens_dependencies` and loaded in its **`ANALYTICAL PERSPECTIVES`** block. As a protocol lens (`lens_type: protocol`), it supplies the mode's procedure: a five-step hypothesis comparison that is deliberately *lighter* than the full Analysis of Competing Hypotheses (Heuer's ACH), with an explicit escalation path to it. The mode runs at **Gear 4**, Ora's most thorough setting, in which a **Depth analyst** and a **Breadth analyst** work the situation in parallel, critique each other, and revise.

**Where the lens engages.** It activates on its **Detection Signals** — a situation with two to five candidate explanations needing structured comparison; an analyst unsure whether the leading candidate is genuinely best-supported or merely the first one they thought of; stakes that don't justify the full ACH protocol but where informal weighing would be too loose. Its **Application Steps** run the protocol: list the candidates, list the distinguishing evidence per candidate, rate each item's **diagnosticity** (high / medium / low), strike the low-diagnosticity evidence and rank by what the high-diagnosticity evidence supports, and assess confidence with an escalate-to-ACH flag.

**What it produces in the analysis.** The mode's output sections are this protocol made explicit. **Candidate hypotheses** lists the genuinely distinct possibilities (with a base-rate hint where available). **Evidence observed** names each observation once and tags which candidates it bears on. The **Diagnosticity per hypothesis** section is the heart: a table whose cells use disconfirming-power language (`rules out` / `discriminating-positive` / `consistent with` / `irrelevant`), making the load-bearing cells visible at a glance. **Ranking with reasoning** orders the candidates by the load-bearing cells (with a sensitivity note), **Disconfirming tests for the top two** names what would separate them, and **Confidence per ranking** keeps confidence per candidate rather than blending it into one verdict, with an explicit *evidence-sufficiency flag* when the evidence base is too thin for a stable diagnosis.
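One way to picture the diagnosticity table is as a mapping from observations to per-hypothesis cells. The sketch below uses the four cell labels named above; everything else is an assumption for illustration, including the names `load_bearing_cells` and `LOAD_BEARING`, the choice to treat `rules out` and `discriminating-positive` as the load-bearing values, and the example rows.

```python
from typing import Dict, List, Tuple

CELL_VALUES = {"rules out", "discriminating-positive", "consistent with", "irrelevant"}
# Assumption: these two values are the ones that carry the ranking.
LOAD_BEARING = {"rules out", "discriminating-positive"}

Table = Dict[str, Dict[str, str]]  # observation -> hypothesis -> cell value


def load_bearing_cells(table: Table) -> List[Tuple[str, str, str]]:
    """Return (observation, hypothesis, value) for every load-bearing cell."""
    cells = []
    for observation, row in table.items():
        for hypothesis, value in row.items():
            if value not in CELL_VALUES:
                raise ValueError(f"unknown cell value: {value!r}")
            if value in LOAD_BEARING:
                cells.append((observation, hypothesis, value))
    return cells


table: Table = {
    "slowdown began at deploy time": {
        "database": "consistent with",
        "deploy": "discriminating-positive",
        "traffic spike": "consistent with",
    },
    "request volume is flat": {
        "database": "consistent with",
        "deploy": "consistent with",
        "traffic spike": "rules out",
    },
    "users report slowness": {
        "database": "consistent with",
        "deploy": "consistent with",
        "traffic spike": "consistent with",
    },
}

cells = load_bearing_cells(table)
for observation, hypothesis, value in cells:
    print(f"{observation} -> {hypothesis}: {value}")
```

The third row contributes nothing to the output: a row of `consistent with` across the board is exactly the evidence the protocol strikes.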

**Cross-adversarial evaluation.** At Gear 4 each analyst's reading is critiqued by the other, which catches the lens's signature failures — keyed to its **Critical Questions** and **Common Failure Modes**: rating evidence by support for the favored hypothesis rather than by diagnosticity (**confirmation lock**); stopping at the first plausible candidate with the alternatives left un-mapped (**premature closure**); including alternatives only as straw men (**foil hypotheses**); presenting a borderline result as definitive (**treating borderline as definitive**); and running the protocol with no diagnosticity rating at all (**evidence without diagnosticity column**). The evaluator presses the central re-rate: *what would I expect to see if H1 were true vs. if H2 were true?* — never *does this fit H1?*

**Honesty discipline.** The lens knows its scope. It is the *light* protocol, and its value depends on recognizing its limits: when confidence is borderline — two candidates tied, or anomalies substantial — it sets the **escalation flag** to the full Heuer ACH rather than dressing a coin-flip as a diagnosis. And it requires that alternatives be *steelmanned* before evidence is rated, so the differentiation is real rather than illusory.

**What the analysis will not do.** It will not rank a hypothesis up for accumulating evidence merely *consistent* with it, will not run on overlapping (non-competing) hypotheses without restructuring them first, and will not present a borderline ranking as a confident one — the staged design exists precisely so a close call is escalated, not forced.

### Origin and evidence

The schema is the formalization of clinical diagnostic reasoning, rooted in the medical tradition and in the clinical-epidemiology movement that made it rigorous. David Sackett's *Clinical Epidemiology* (1991) established systematic, evidence-weighted approaches to diagnosis; Jerome Kassirer's "Diagnostic reasoning" (1989) is the canonical account of hypothesis-driven differential diagnosis as a structured cognitive process; David Eddy's *Clinical Decision Making* (1996) gave it a decision-analytic frame. The underlying probabilistic logic is Bayesian (each finding updating the odds across candidates), and the lens is explicitly the *lighter sibling* of Richards Heuer's Analysis of Competing Hypotheses (*Psychology of Intelligence Analysis*, 1999) — sharing ACH's core move of ranking by *disconfirmation* while trading its full rigor for speed, and escalating to it when the lighter read proves insufficient. The discipline generalizes far beyond medicine, into engineering fault-finding, intelligence analysis, and debugging.
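The Bayesian logic mentioned above can be shown with a short worked example. This is a generic application of Bayes' rule, not code from any of the cited works: the function name `bayes_update`, the hypothesis labels, and all the probabilities are invented for illustration, and the candidates are assumed to be mutually exclusive and to cover all the possibilities.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior over candidates: prior times P(finding | candidate), renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("finding is impossible under every candidate")
    return {h: value / total for h, value in unnormalized.items()}


# Invented numbers for illustration.
prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# A finding equally likely under every candidate (the fever case).
shared = {"H1": 0.9, "H2": 0.9, "H3": 0.9}
# A finding far likelier under H1 than under the others.
discriminating = {"H1": 0.8, "H2": 0.1, "H3": 0.1}

after_shared = bayes_update(prior, shared)
after_discriminating = bayes_update(prior, discriminating)
print(after_shared)
print(after_discriminating)
```

The shared finding leaves the probabilities where they started, however strongly each hypothesis predicts it. Only the finding whose likelihood differs across candidates moves the ranking.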

### Applications and common uses

The differential-diagnosis schema is a working tool wherever a small set of competing explanations must be ranked quickly and honestly.

- **Medicine and clinical reasoning.** Its native ground: ranking candidate diagnoses by discriminating findings, resisting the pull of the vivid rare disease, and ordering the test that separates the top two.
- **Incident response and debugging.** "What's causing the outage?" is a differential — database, deploy, dependency, load — best solved by the evidence that distinguishes them (the timing, the query latencies, the error signatures), not by checking your first hunch.
- **Engineering fault diagnosis.** Root-causing a failure among a few plausible mechanisms, ranked by the symptom that only one of them would produce.
- **Intelligence and investigation.** The fast read when a full ACH is overkill — a few hypotheses, ranked by diagnostic evidence, with escalation when the call is close.
- **Everyday problem-solving.** Any "it's probably X" judgment improves by listing the two-to-five real possibilities and asking what evidence would tell them apart, rather than confirming the first.

In every case the payoff is the same: a short, honest ranking driven by the evidence that *discriminates*, an explicit confidence read, and the discipline to escalate rather than fake certainty when the candidates stay tied.

### Failure modes and when not to use it

The lens's characteristic ways of going wrong are catalogued in its **Common Failure Modes**:

- **Confirmation lock.** Rating evidence by its support for the leading hypothesis rather than by its diagnosticity across hypotheses. The tell is a column full of "supports H1" rather than "distinguishes H1 from H2." Re-rate each item by asking what you'd expect under *each* hypothesis.
- **Premature closure.** Stopping at the first plausible candidate while the alternatives get no serious evidence-mapping. Require at least one high-diagnosticity item per candidate — actively searching for distinguishing observations you haven't considered.
- **Foil hypotheses.** Including alternatives only to make the favorite look strong by contrast. Steelman each alternative before rating evidence; if it can't be steelmanned, drop it and find a real competitor.
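- **Treating borderline as definitive.** Presenting a confident ranking when the protocol's own confidence read is borderline. Enforce the escalation flag: dispatch the full analysis or report with explicit borderline framing.
- **Evidence without diagnosticity column.** Running the protocol with no diagnosticity rating at all, so nothing separates discriminating evidence from evidence that merely fits. Rate every item high, medium, or low before ranking.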
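Some of these failures leave structural traces that a simple check can catch. The sketch below is a hypothetical linter, not part of Ora: `lint_worksheet`, the worksheet layout, and the `expected_under` field are assumptions made for illustration. It flags missing diagnosticity ratings, items assessed under only some candidates (the confirmation-lock tell), and candidates with no high-diagnosticity item (the premature-closure remedy). Foil hypotheses need human judgment and are not checked.

```python
from typing import Any, Dict, List

RATINGS = ("high", "medium", "low")


def lint_worksheet(candidates: List[str], evidence: List[Dict[str, Any]]) -> List[str]:
    """Return a list of structural problems found in a differential worksheet."""
    problems: List[str] = []
    for item in evidence:
        name = item["name"]
        if item.get("diagnosticity") not in RATINGS:
            problems.append(f"evidence without diagnosticity column: {name}")
        # Each item should record what we'd expect under every candidate.
        assessed = set(item.get("expected_under", {}))
        if not set(candidates) <= assessed:
            problems.append(f"confirmation lock: {name} not assessed under every candidate")
    for candidate in candidates:
        has_high = any(
            item.get("diagnosticity") == "high"
            and item.get("expected_under", {}).get(candidate) is True
            for item in evidence
        )
        if not has_high:
            problems.append(f"premature closure: no high-diagnosticity item for {candidate}")
    return problems


candidates = ["database", "deploy"]
flawed = [{"name": "deploy log shows warnings", "expected_under": {"deploy": True}}]
sound = [
    {"name": "slowdown began at deploy time", "diagnosticity": "high",
     "expected_under": {"database": False, "deploy": True}},
    {"name": "slow queries specifically", "diagnosticity": "high",
     "expected_under": {"database": True, "deploy": False}},
]

flawed_problems = lint_worksheet(candidates, flawed)
sound_problems = lint_worksheet(candidates, sound)
for problem in flawed_problems:
    print(problem)
```

A clean result here only means the worksheet has the right shape; whether the ratings themselves are honest still depends on the analyst.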

**When not to reach for it.** When the candidate explanations are not mutually exclusive — they could all be partly true at once — a differential is ill-posed, and a contributory or systems analysis fits better. When the situation demands the full intelligence-grade workup from the start (high stakes, adversarial deception, many hypotheses), skip straight to the Analysis of Competing Hypotheses rather than running a light read you'll have to escalate anyway. And when there is really only one serious candidate, "differential" is theater — just test that one.

## Related

- **Differential Diagnosis** — the analysis this lens anchors; ranks two-to-five competing explanations by discriminating evidence and escalates when the call is close.
- **Analysis of Competing Hypotheses** — the heavier sibling this schema escalates to: the full Heuer matrix when the light read proves borderline.
- **Representativeness Heuristic** — the bias the discipline guards against: judging a candidate by how well it *resembles* a textbook picture rather than by the evidence that discriminates and the base rate.
- **Bayesian Reasoning** — the probabilistic engine underneath: each finding updating the odds across candidates by its diagnostic weight.

## Sources

- [Sackett, David L. (1991), Clinical Epidemiology: A Basic Science for Clinical Medicine (2nd ed.), Little, Brown](https://openlibrary.org/works/OL19601518W)
