Differential Diagnosis Schema

Why it matters

When something could have several causes, the instinct is to grab the one that springs to mind and hunt for evidence that fits. The disciplined move is the opposite: find the evidence that tells the causes apart, and let it do the ranking.

For example: a web service slows to a crawl. It could be the database, last night’s deploy, a traffic spike, or a flaky dependency. The tempting path (“probably the deploy, let’s go read the deploy logs”) finds something consistent with the deploy and stops there. The differential path lists all four, then asks of each: what would I see if this were the cause that I would not see otherwise? A traffic spike predicts a jump in request volume; a database problem predicts slow queries specifically; the deploy predicts the slowdown began exactly at deploy time; a flaky dependency predicts timeouts and errors concentrated on calls to that dependency. The observation that discriminates, not the one that merely fits your first guess, is what points to the answer.
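The differential path can be sketched as a small expectation matrix. This is a deliberately simplified Python sketch, not Ora’s implementation: the check names (`request_volume`, `onset_at_deploy`, and so on) and every predicted value are hypothetical assumptions. A check whose prediction is identical under all four causes (here, `latency`) plays the role of the fever and is skipped; the surviving candidates are those whose predictions match every discriminating observation.

```python
# Hypothetical expectations: what each check would show if that cause were
# the culprit. Names and values are illustrative assumptions, not real data.
EXPECTED = {
    "database":         {"latency": "high", "request_volume": "normal",
                         "slow_queries": "yes", "onset_at_deploy": "no",
                         "dependency_errors": "no"},
    "deploy":           {"latency": "high", "request_volume": "normal",
                         "slow_queries": "no", "onset_at_deploy": "yes",
                         "dependency_errors": "no"},
    "traffic spike":    {"latency": "high", "request_volume": "up",
                         "slow_queries": "no", "onset_at_deploy": "no",
                         "dependency_errors": "no"},
    "flaky dependency": {"latency": "high", "request_volume": "normal",
                         "slow_queries": "no", "onset_at_deploy": "no",
                         "dependency_errors": "yes"},
}


def discriminating_checks(expected):
    """Checks whose predicted outcome differs between at least two candidates."""
    checks = next(iter(expected.values())).keys()
    return [c for c in checks
            if len({preds[c] for preds in expected.values()}) > 1]


def surviving_candidates(expected, observed):
    """Candidates whose predictions match every observed discriminating check."""
    useful = set(discriminating_checks(expected))
    return [cause for cause, preds in expected.items()
            if all(preds[c] == value
                   for c, value in observed.items() if c in useful)]


observed = {"latency": "high", "request_volume": "normal",
            "onset_at_deploy": "yes"}
print(discriminating_checks(EXPECTED))
print(surviving_candidates(EXPECTED, observed))  # ['deploy']
```

With flat request volume and a slowdown that starts at deploy time, only the deploy survives; the high latency, present under every candidate, played no part in the result.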

What it reveals. Which candidate cause the discriminating evidence actually favors, ranked by how well each observation separates the possibilities, not by how much it flatters your favorite.
How it changes the read. You stop asking “does this evidence fit my hypothesis?” and start asking “what would I see under each hypothesis — and which observation tells them apart?”
When to foreground it. Any situation with two to five plausible explanations where you want a structured, quick read on which is best supported — short of a full intelligence-grade workup.
What you’d miss without it. That the most suggestive evidence is often worthless — a symptom present under every candidate (fever, across a list of infections) feels meaningful but discriminates nothing.
Where it misleads. It works only on genuinely competing (mutually exclusive) hypotheses; if the explanations can be true together, the ranking is ill-posed. And it has a ceiling: when two candidates stay tied, the honest move is to escalate to the heavier full analysis, not force a verdict.

How to invoke it in Ora

You have a handful of possible explanations for something and you want them ranked honestly and quickly — by what rules them in or out, not by which you thought of first.

Describe the situation and the candidates, and ask:

“Differential diagnosis on this outage: database, the deploy, a traffic spike, or a bad dependency — which does the evidence actually point to, and what test would settle it?”

The differential-diagnosis schema is the core protocol of this analysis. Ora lists the candidate explanations, names the evidence that distinguishes them, rates each observation’s diagnosticity (how well it separates the candidates), strikes the evidence that doesn’t discriminate, ranks what remains, and — crucially — flags when the result is too close to call and a heavier analysis is warranted.

One thing to know: the words “differential,” “candidate explanations,” “what are the possibilities,” “rule out,” or “most likely cause” are what route you here. This is the fast, structured read; when the stakes or the tie justify it, the analysis escalates to a full Analysis of Competing Hypotheses.

Make sure your candidates genuinely compete — that adopting one would mean rejecting the others. If two explanations could both be true at once, the differential is ill-posed; restructure them first.

One thing Ora won’t do: rank by what fits. It scores evidence by its power to discriminate between candidates and explicitly forbids “this supports my leading hypothesis” bookkeeping — ranking by support rather than by diagnosticity is the classic way a differential goes wrong.

How it works

The phrase differential diagnosis comes from medicine, and the clinic is where its discipline is sharpest, because there the cost of grabbing the first explanation that fits can be a dead patient. A doctor meets a patient short of breath. A dozen things cause shortness of breath — pneumonia, heart failure, a blood clot, asthma, anxiety. The novice’s temptation is to pattern-match: this looks like the textbook picture of one of them, so let’s treat that. And medicine has a famous warning about exactly this temptation — “when you hear hoofbeats, think horses, not zebras” — because the dramatic, memorable rare disease (the zebra) is precisely the one a vivid resemblance lures you toward, while the common, boring, far-more-likely cause sits unexamined.

The trained move is different, and it has a precise logic that clinical epidemiologists like David Sackett worked out formally. You don’t start by betting on a winner. You start by writing down the list — the two to five candidates genuinely in play. Then, for each, you ask what evidence would distinguish it from the others, and you go looking for that evidence specifically. The chest X-ray that’s cloudy in pneumonia but clear in a blood clot. The leg swelling that points to a clot over an infection. The key concept underneath all of it is diagnosticity: not how strongly a piece of evidence supports a diagnosis, but how well it tells the candidates apart. A fever, in a patient you already suspect has an infection, feels like confirmation — but if every candidate on your list causes fever, the fever discriminates nothing, and a disciplined diagnostician strikes it from the analysis no matter how suggestive it feels. The evidence that earns its place is the evidence that would come out differently depending on which hypothesis is true.
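The fever point has a precise Bayesian reading: a finding that is equally likely under every candidate leaves the odds exactly where they started, however suggestive it feels. The Python sketch below shows this with two of the candidates named above; the priors and likelihoods are invented for illustration and are not clinical figures.

```python
def bayes_update(prior, likelihood):
    """Posterior over candidates: prior times P(finding | candidate), renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def likelihood_ratio(likelihood, h1, h2):
    """How much more expected the finding is under h1 than h2 (1.0 = no help)."""
    return likelihood[h1] / likelihood[h2]


# Illustrative numbers only: P(finding | candidate), not real clinical data.
prior = {"pneumonia": 0.5, "blood clot": 0.5}
fever = {"pneumonia": 0.8, "blood clot": 0.8}        # expected under both
cloudy_xray = {"pneumonia": 0.9, "blood clot": 0.1}  # separates them

print(bayes_update(prior, fever))        # odds unchanged
print(bayes_update(prior, cloudy_xray))  # shifts toward pneumonia
print(likelihood_ratio(fever, "pneumonia", "blood clot"))
```

Start from a prior that already favors one candidate and the fever still changes nothing, because a likelihood ratio of 1.0 multiplies the odds by one.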

So the procedure is a short, deliberate loop. List the candidates, and keep the list to a workable handful, because beyond five the comparison sprawls. For each, name the distinguishing evidence. Rate every piece by its diagnosticity: high (seen under one candidate, not the others), medium (narrows the field a little), low (consistent with everything, so useless). Throw out the low-diagnosticity evidence entirely. Rank what’s left by which candidate the high-diagnosticity evidence points to. Note that this is not the candidate with the most evidence “consistent with” it; ranking that way is the trap that lets a wrong answer accumulate a comforting pile of meaningless support. Finally, assess how sure you actually are; this is the step that separates a good diagnostician from an overconfident one. If one candidate dominates cleanly, good. But if two stay roughly tied, or a high-diagnosticity clue refuses to fit anything, you do not manufacture a confident verdict. You flag it: order the definitive test, or escalate to the heavier, slower, full comparison. Knowing when the quick read has hit its limit is not a weakness of the method. It is the method.
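The loop above fits in a few lines of Python. This is a minimal sketch under stated assumptions, not the lens’s actual code: each observed finding is recorded with the set of candidates that would produce it, diagnosticity is derived from how many candidates that is, medium items are kept for the report but only high items drive the ranking, and the one-point tie margin is an arbitrary illustrative choice. A finding that no candidate predicts is treated as an anomaly and, like a near-tie, sets the escalation flag.

```python
def rate(predicted_by, candidates):
    """Diagnosticity of one finding, from which candidates would produce it."""
    n = len(predicted_by & set(candidates))
    if n == 0:
        return "anomaly"   # fits no candidate: a clue that refuses to fit
    if n == 1:
        return "high"      # seen under one candidate, not the others
    if n < len(candidates):
        return "medium"    # narrows the field a little
    return "low"           # consistent with everything, so useless


def differential(candidates, findings, tie_margin=1):
    """Rank 2-5 candidates by high-diagnosticity findings; flag close calls."""
    if not 2 <= len(candidates) <= 5:
        raise ValueError("keep the list to two to five competing candidates")
    ratings = {f: rate(hs, candidates) for f, hs in findings.items()}
    kept = {f: r for f, r in ratings.items() if r != "low"}  # strike low items
    score = {h: 0 for h in candidates}
    for f, r in kept.items():
        if r == "high":
            (h,) = findings[f] & set(candidates)  # the single candidate it fits
            score[h] += 1
    # sorted() is stable, so tied candidates keep their listed order.
    ranking = sorted(candidates, key=lambda h: -score[h])
    gap = score[ranking[0]] - score[ranking[1]]
    escalate = gap < tie_margin or "anomaly" in kept.values()
    return ranking, kept, escalate


CANDIDATES = ["database", "deploy", "traffic spike", "dependency"]
findings = {
    "latency high": {"database", "deploy", "traffic spike", "dependency"},
    "onset at deploy time": {"deploy"},
    "error rate up": {"deploy", "dependency"},
}
ranking, kept, escalate = differential(CANDIDATES, findings)
print(ranking[0], kept, escalate)
```

Here the fever-like “latency high” is struck, the deploy leads on the one high item, and no flag is raised; add a matching high item for the database and the flag goes up instead of a forced verdict.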

Framework & implementation

This section uses Ora’s own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.

Pipeline execution

The differential-diagnosis schema is the required lens of the Differential Diagnosis analysis — listed in the mode’s lens_dependencies and loaded in its ANALYTICAL PERSPECTIVES block. As a lens_type: protocol, it supplies the mode’s procedure: a five-step hypothesis comparison that is deliberately lighter than the full Analysis of Competing Hypotheses (Heuer’s ACH), with an explicit escalation path to it. The mode runs at Gear 4, Ora’s most thorough setting — a Depth analyst and a Breadth analyst work the situation in parallel, critique each other, and revise.

Where the lens engages. It activates on its Detection Signals — a situation with two to five candidate explanations needing structured comparison; an analyst unsure whether the leading candidate is genuinely best-supported or merely the first one they thought of; stakes that don’t justify the full ACH protocol but where informal weighing would be too loose. Its Application Steps run the protocol: list the candidates, list the distinguishing evidence per candidate, rate each item’s diagnosticity (high / medium / low), strike the low-diagnosticity evidence and rank by what the high-diagnosticity evidence supports, and assess confidence with an escalate-to-ACH flag.

What it produces in the analysis. The mode’s output sections are this protocol made explicit. “Candidate hypotheses” lists the genuinely distinct possibilities (with a base-rate hint where available). “Evidence observed” names each observation once and tags which candidates it bears on. “Diagnosticity per hypothesis” is the heart of the output: a table whose cells use disconfirming-power language (rules out / discriminating-positive / consistent with / irrelevant), making the load-bearing cells visible at a glance. “Ranking with reasoning” orders the candidates by those load-bearing cells (with a sensitivity note), “Disconfirming tests for the top two” names what would separate the two leading candidates, and “Confidence per ranking” keeps confidence per candidate rather than blending it into one verdict, with an explicit evidence-sufficiency flag when the evidence base is too thin for a stable diagnosis.
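The table vocabulary lends itself to a small ranking routine. The Python sketch below is an assumption about how such a table could be scored, not the mode’s implementation: a candidate with any “rules out” cell sinks to the bottom, the rest are ordered by their count of “discriminating-positive” cells, and results are reported as per-candidate tallies instead of a single blended score. The `min_load_bearing` threshold of two is a hypothetical stand-in for the evidence-sufficiency flag, and every row is assumed to list the same candidates.

```python
LOAD_BEARING = ("rules out", "discriminating-positive")
CELL_LABELS = set(LOAD_BEARING) | {"consistent with", "irrelevant"}


def rank_from_table(table, min_load_bearing=2):
    """table[evidence][candidate] -> cell label.

    Returns (ranking, per-candidate tallies, evidence-insufficient flag).
    """
    candidates = list(next(iter(table.values())))  # rows share one candidate set
    tallies = {h: {label: 0 for label in LOAD_BEARING} for h in candidates}
    for row in table.values():
        for h, cell in row.items():
            if cell not in CELL_LABELS:
                raise ValueError(f"unknown cell label: {cell!r}")
            if cell in LOAD_BEARING:
                tallies[h][cell] += 1
    # Ruled-out candidates sink; the rest rank by discriminating positives.
    ranking = sorted(candidates,
                     key=lambda h: (tallies[h]["rules out"] > 0,
                                    -tallies[h]["discriminating-positive"]))
    load_bearing = sum(sum(t.values()) for t in tallies.values())
    return ranking, tallies, load_bearing < min_load_bearing


table = {
    "request volume flat": {"traffic spike": "rules out",
                            "deploy": "consistent with",
                            "database": "consistent with"},
    "slowdown began at deploy": {"traffic spike": "consistent with",
                                 "deploy": "discriminating-positive",
                                 "database": "consistent with"},
    "p95 latency high": {"traffic spike": "consistent with",
                         "deploy": "consistent with",
                         "database": "consistent with"},
}
ranking, tallies, insufficient = rank_from_table(table)
print(ranking, insufficient)  # ['deploy', 'database', 'traffic spike'] False
```

Only two of the nine cells carry weight here, which is exactly what the table layout is meant to make obvious at a glance.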

Cross-adversarial evaluation. At Gear 4 each analyst’s reading is critiqued by the other, which catches the lens’s signature failures, keyed to its Critical Questions and Common Failure Modes: rating evidence by support for the favored hypothesis rather than by diagnosticity (confirmation lock); stopping at the first plausible candidate with the alternatives left unmapped (premature closure); including alternatives only as straw men (foil hypotheses); presenting a borderline result as definitive (treating borderline as definitive); and running the protocol with no diagnosticity rating at all (evidence without diagnosticity column). The evaluator presses the central re-rate question, “What would I expect to see if H1 were true versus if H2 were true?”, and never “Does this fit H1?”

Honesty discipline. The lens knows its scope. It is the light protocol, and its value depends on recognizing its limits: when confidence is borderline (two candidates tied, or substantial anomalies), it sets the flag to escalate to the full Heuer ACH rather than dressing a coin-flip as a diagnosis. It also requires that alternatives be steelmanned before evidence is rated, so the differentiation is real rather than illusory.

What the analysis will not do. It will not rank a hypothesis up for accumulating evidence merely consistent with it, will not run on overlapping (non-competing) hypotheses without restructuring them first, and will not present a borderline ranking as a confident one — the staged design exists precisely so a close call is escalated, not forced.

Origin and evidence

The schema is the formalization of clinical diagnostic reasoning, rooted in the medical tradition and in the clinical-epidemiology movement that made it rigorous. David Sackett’s Clinical Epidemiology (1991) established systematic, evidence-weighted approaches to diagnosis; Jerome Kassirer’s “Diagnostic reasoning” (1989) is the canonical account of hypothesis-driven differential diagnosis as a structured cognitive process; David Eddy’s Clinical Decision Making (1996) gave it a decision-analytic frame. The underlying probabilistic logic is Bayesian (each finding updating the odds across candidates), and the lens is explicitly the lighter sibling of Richards Heuer’s Analysis of Competing Hypotheses (Psychology of Intelligence Analysis, 1999) — sharing ACH’s core move of ranking by disconfirmation while trading its full rigor for speed, and escalating to it when the lighter read proves insufficient. The discipline generalizes far beyond medicine, into engineering fault-finding, intelligence analysis, and debugging.
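For reference, the textbook odds form of Bayes’ rule makes the “diagnostic weight” of a finding concrete. This is the standard identity, stated here as background rather than taken from the lens files: for two candidates H1 and H2 and a finding E, and normalized across all candidates on the list,

```latex
\frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \frac{P(H_1)}{P(H_2)} \times \frac{P(E \mid H_1)}{P(E \mid H_2)},
\qquad
P(H_i \mid E) = \frac{P(E \mid H_i)\,P(H_i)}{\sum_{j} P(E \mid H_j)\,P(H_j)}.
```

The second factor on the right of the first identity is the likelihood ratio. When it equals 1, as for a fever that every candidate produces, the posterior odds equal the prior odds; the further it sits from 1, the more diagnostic the finding.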

Applications and common uses

The differential-diagnosis schema is a working tool wherever a small set of competing explanations must be ranked quickly and honestly.

Medicine and clinical reasoning. Its native ground: ranking candidate diagnoses by discriminating findings, resisting the pull of the vivid rare disease, and ordering the test that separates the top two.
Incident response and debugging. “What’s causing the outage?” is a differential — database, deploy, dependency, load — best solved by the evidence that distinguishes them (the timing, the query latencies, the error signatures), not by checking your first hunch.
Engineering fault diagnosis. Root-causing a failure among a few plausible mechanisms, ranked by the symptom that only one of them would produce.
Intelligence and investigation. The fast read when a full ACH is overkill — a few hypotheses, ranked by diagnostic evidence, with escalation when the call is close.
Everyday problem-solving. Any “it’s probably X” judgment improves by listing the two-to-five real possibilities and asking what evidence would tell them apart, rather than confirming the first.
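The timing evidence named in the incident-response item can be checked mechanically. The Python sketch below is hypothetical throughout: the sample data, the two-times-baseline onset rule, and the two-minute tolerance are illustrative assumptions, not values from Ora or any monitoring tool. It reports whether the slowdown began at deploy time and how request volume changed across the onset, which bear on the deploy and traffic-spike candidates respectively.

```python
def onset_index(latencies_ms, baseline_ms, factor=2.0):
    """Index of the first sample slower than factor x baseline, or None."""
    for i, value in enumerate(latencies_ms):
        if value > factor * baseline_ms:
            return i
    return None


def timing_evidence(minutes, latencies_ms, requests, deploy_minute,
                    baseline_ms, tolerance_min=2):
    """Discriminating observations: onset vs. deploy time, volume change."""
    i = onset_index(latencies_ms, baseline_ms)
    if i is None:
        return {"slowdown": False}
    before = sum(requests[:i]) / i if i else None  # mean volume before onset
    after = sum(requests[i:]) / len(requests[i:])  # mean volume from onset on
    return {
        "slowdown": True,
        "onset_minute": minutes[i],
        "onset_at_deploy": abs(minutes[i] - deploy_minute) <= tolerance_min,
        # Ratio near 1.0 argues against a traffic spike; None if no pre-onset data.
        "volume_ratio": after / before if before else None,
    }


# Hypothetical per-minute samples around a deploy at minute 4.
minutes = [0, 1, 2, 3, 4, 5, 6, 7]
latencies_ms = [120, 118, 125, 122, 410, 430, 415, 420]
requests = [1000, 1010, 990, 1005, 1000, 995, 1008, 1002]
print(timing_evidence(minutes, latencies_ms, requests,
                      deploy_minute=4, baseline_ms=120))
```

Here the onset coincides with the deploy while request volume stays flat, which favors the deploy over a traffic spike; it says nothing yet about the database or the dependency, which need their own discriminating checks.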

In every case the payoff is the same: a short, honest ranking driven by the evidence that discriminates, an explicit confidence read, and the discipline to escalate rather than fake certainty when the candidates stay tied.

Failure modes and when not to use it

The lens’s characteristic ways of going wrong are catalogued in its Common Failure Modes:

Confirmation lock. Rating evidence by its support for the leading hypothesis rather than by its diagnosticity across hypotheses. The tell is a column full of “supports H1” rather than “distinguishes H1 from H2.” Re-rate each item by asking what you’d expect under each hypothesis.
Premature closure. Stopping at the first plausible candidate while the alternatives get no serious evidence-mapping. Require at least one high-diagnosticity item per candidate, which means actively searching for distinguishing observations you haven’t yet considered.
Foil hypotheses. Including alternatives only to make the favorite look strong by contrast. Steelman each alternative before rating evidence; if it can’t be steelmanned, drop it and find a real competitor.
Treating borderline as definitive. Presenting a confident ranking when the protocol’s own confidence read is borderline. Enforce the escalation flag — dispatch the full analysis or report with explicit borderline framing.
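Confirmation lock and premature closure, along with the evidence-without-diagnosticity-column failure, leave visible traces in a written differential, so they can be flagged automatically. The Python sketch below assumes a hypothetical row format (an evidence label, an optional rating, and the candidates a high item points to); it is an illustration, not part of Ora. Foil hypotheses are left out because spotting a straw-man alternative takes judgment, and the borderline check depends on the ranking rather than on the rows.

```python
VALID_RATINGS = {"high", "medium", "low"}


def lint_differential(candidates, rows):
    """Warnings for failure modes visible in the rows themselves.

    Each row is a dict: {"evidence": str, "rating": str or None,
    "points_to": set of candidates (used for high items)}.
    """
    warnings = []
    for row in rows:
        rating = row.get("rating")
        if rating is None:
            warnings.append(
                f"evidence without diagnosticity column: {row['evidence']}")
        elif rating.startswith("supports"):
            warnings.append(
                f"confirmation lock: {row['evidence']} rated by support, "
                "not diagnosticity")
        elif rating not in VALID_RATINGS:
            warnings.append(f"unknown rating {rating!r}: {row['evidence']}")
    # Every candidate needs at least one high item that bears on it.
    covered = {h for row in rows if row.get("rating") == "high"
               for h in row.get("points_to", ())}
    for h in candidates:
        if h not in covered:
            warnings.append(
                f"premature closure: no high-diagnosticity item for {h}")
    return warnings


rows = [
    {"evidence": "onset at deploy time", "rating": "high",
     "points_to": {"deploy"}},
    {"evidence": "error rate up", "rating": "supports deploy"},
    {"evidence": "p95 latency high"},
]
for warning in lint_differential(["deploy", "database"], rows):
    print(warning)
```

A clean differential returns an empty list; each warning names its failure mode, so the fix can be looked up in the list above.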

When not to reach for it. When the candidate explanations are not mutually exclusive — they could all be partly true at once — a differential is ill-posed, and a contributory or systems analysis fits better. When the situation demands the full intelligence-grade workup from the start (high stakes, adversarial deception, many hypotheses), skip straight to the Analysis of Competing Hypotheses rather than running a light read you’ll have to escalate anyway. And when there is really only one serious candidate, “differential” is theater — just test that one.

Differential Diagnosis — the analysis this lens anchors; ranks two-to-five competing explanations by discriminating evidence and escalates when the call is close.
Analysis of Competing Hypotheses — the heavier sibling this schema escalates to: the full Heuer matrix when the light read proves borderline.
Representativeness Heuristic — the bias the discipline guards against: judging a candidate by how well it resembles a textbook picture rather than by the evidence that discriminates and the base rate.
Bayesian Reasoning — the probabilistic engine underneath: each finding updating the odds across candidates by its diagnostic weight.