Pearl Causal Graphs and the Ladder of Causation

Why it matters

“Correlation isn’t causation” is the most repeated warning in all of statistics — and almost no one can say what causation actually is, or how you would ever get it from data. Pearl’s causal graphs are the machinery that finally makes the difference precise enough to compute.

For example: ice-cream sales and drownings rise and fall together, almost perfectly. Obviously the ice cream isn’t drowning anyone — but how do we know that, and how could a computer? Draw it: a hidden common cause, hot weather, drives both (ice cream ← weather → drownings). That little fork is exactly what makes the correlation spurious, and seeing the fork tells you the fix — to find ice cream’s real effect on drownings, you must “adjust for” the weather. The picture does the reasoning that the slogan only gestures at.
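The fork can be made concrete with a small simulation. This is a minimal sketch under invented numbers: the effect sizes, the 50/50 chance of a hot day, and the helper name pearson are all assumptions chosen for illustration, not data. Weather drives both variables, and ice cream has no effect on drownings at all.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
days = []
for _ in range(40000):
    hot = rng.random() < 0.5                           # weather: the common cause
    ice_cream = (30 if hot else 10) + rng.gauss(0, 3)  # weather -> ice cream
    drownings = (6 if hot else 2) + rng.gauss(0, 1)    # weather -> drownings
    days.append((hot, ice_cream, drownings))

# Rung-one view: the raw correlation across all days.
raw = pearson([d[1] for d in days], [d[2] for d in days])

# "Adjusting for" weather: compute the correlation inside each weather stratum.
within = {
    hot: pearson([d[1] for d in days if d[0] == hot],
                 [d[2] for d in days if d[0] == hot])
    for hot in (True, False)
}

print(f"raw correlation: {raw:.2f}")
print(f"within hot days: {within[True]:.2f}, within cool days: {within[False]:.2f}")
```

In this toy world the raw correlation is strong, while the correlation within hot days alone or cool days alone is close to zero, which is what "adjusting for the weather" means in practice.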

What it reveals. Which kind of causal claim is actually on the table — a correlation, the effect of an action, or a what-might-have-been — and what evidence each kind genuinely requires.
How it changes the read. You stop asking “are X and Y related?” and start asking “what happens to Y if I intervene on X — and does the data I have actually support that?”
When to foreground it. Any “X causes Y” or “if we do X, Y will follow” claim, especially a policy or business decision resting on observational data.
What you’d miss without it. The hidden structure — confounders that fake a relationship, and colliders that create a fake one the moment you “control for” them — that decides whether a correlation is real causation or an illusion.
Where it misleads. A graph is only as good as its arrows: drawn to make a favored conclusion follow, a DAG launders motivated reasoning into something that looks rigorous. Each arrow — and each missing arrow — is a claim that has to be earned.

How to invoke it in Ora

You have a causal question — does X cause Y, what happens if we change X, would Y have happened anyway — and you want it reasoned about rigorously rather than settled by a correlation.

Describe the variables and the claim, and ask:

“Build the causal graph: does our onboarding change actually drive retention, or is company size confounding it — and can we even identify the effect from the data we have?”

Pearl’s framework is the foundational tool of the Causal DAG analysis. Ora first locks the rung — is this a question of seeing, doing, or imagining — then lays out the variables and their roles, draws the directed graph, marks confounders, mediators, and colliders, and hands the structure to the do-calculus step to decide whether the effect is even identifiable from your data.

One thing to know: the words causal graph, DAG, do-calculus, Pearl, confounder, back-door, or front-door are what route you here. The analysis is only as good as the graph, so the more domain knowledge you bring about what plausibly causes what, the sharper it is.

Be honest about which question you’re really asking. “Does X predict Y?” (seeing) needs only data; “what happens if we set X?” (doing) needs a defensible graph and identifying assumptions; “would this specific case have gone differently?” (imagining) needs more still. The analysis will name the rung and refuse to answer a higher-rung question with lower-rung evidence.

One thing Ora won’t do: treat a DAG as decoration. Every arrow and every absent arrow is a falsifiable claim it has to justify from domain knowledge — a graph drawn to reach a conclusion is motivated reasoning wearing a lab coat, and the analysis flags it.

How it works

For most of the twentieth century, statistics lived under a taboo. You were allowed to say two things correlated; you were not allowed to say one caused the other. The founders of the field had been burned by sloppy causal talk, and the orthodoxy hardened into a rule: the data cannot speak of causes, only of associations. Generations of students learned to chant “correlation is not causation” but were given no tools at all for getting from correlation to causation. The word itself became slightly disreputable.

Judea Pearl, an artificial-intelligence researcher, found this maddening, and for a simple reason: human beings reason about cause and effect effortlessly, all day long. A child knows the rooster’s crow doesn’t cause the sunrise. We know the barometer’s drop doesn’t cause the storm, even though they’re tightly linked. If our minds do this so fluently, Pearl reasoned, then there must be a logic to it — and a logic can be written down. What he built to write it down has two pieces, and together they ended the taboo.

The first piece is a ladder of causation with three rungs, and the key insight is that they are genuinely different questions, not degrees of the same one. Rung one is seeing (association): what does observing X tell me about Y? This is the only rung classical statistics ever stood on. Rung two is doing (intervention): what happens to Y if I reach in and set X to a value? That is a different question, and you can feel the difference: seeing a high barometer reading tells you a storm is unlikely, but setting the barometer with your hand does nothing to the weather. Rung three is imagining (counterfactual): given what actually happened, what would have happened to this particular patient if we’d given a different drug? That is reasoning about a world that never occurred. Each rung answers questions the rung below it simply cannot, and most confident-sounding causal mistakes are really someone answering a rung-two question with rung-one evidence.
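The gap between rung one and rung two shows up in a toy version of the barometer story. The sketch below assumes a made-up structural model (air pressure drives both the barometer reading and the storm); the probabilities and the function names simulate and storm_rate are illustrative assumptions only.

```python
import random

def simulate(n, rng, do_barometer=None):
    """Draw n days from a toy model: pressure -> barometer, pressure -> storm.

    If do_barometer is given, the reading is set by hand (an intervention),
    which cuts the arrow from pressure into the barometer.
    """
    days = []
    for _ in range(n):
        low_pressure = rng.random() < 0.3
        if do_barometer is None:
            # The barometer tracks pressure, with a 5% chance of a wrong reading.
            barometer_high = (not low_pressure) if rng.random() < 0.95 else low_pressure
        else:
            barometer_high = do_barometer
        # Storms depend on pressure only, never on the barometer itself.
        storm = rng.random() < (0.8 if low_pressure else 0.1)
        days.append((barometer_high, storm))
    return days

def storm_rate(days, barometer_high=None):
    """Share of days with a storm, optionally restricted to one barometer reading."""
    rows = [d for d in days if barometer_high is None or d[0] == barometer_high]
    return sum(storm for _, storm in rows) / len(rows)

rng = random.Random(1)
observed = simulate(50000, rng)
p_storm = storm_rate(observed)                        # baseline storm rate
p_seeing = storm_rate(observed, barometer_high=True)  # P(storm | barometer high)

forced = simulate(50000, rng, do_barometer=True)
p_doing = storm_rate(forced)                          # P(storm | do(barometer high))

print(f"baseline {p_storm:.2f}, seeing {p_seeing:.2f}, doing {p_doing:.2f}")
```

Under these assumptions, observing a high reading lowers the storm probability well below the baseline, while forcing the reading high leaves the storm rate at the baseline. The same variable gives two different answers because the two rungs ask two different questions.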

The second piece is the device that lets you climb: the causal diagram, or DAG (directed acyclic graph). Draw each variable as a dot and each direct cause as an arrow from cause to effect. That’s it — but the arrows carry your assumptions about how the world works, and once they’re on the page, the graph reasons for you. Three little shapes recur and decide everything. A chain (X → M → Y) means M is a mediator passing the effect along. A fork (X ← C → Y) means C is a confounder — the hidden hot weather behind the ice cream and the drownings — manufacturing a correlation that isn’t causal, which you fix by adjusting for C. And a collider (X → V ← Y) is the treacherous one: X and Y are genuinely unrelated, but the moment you “control for” V, you create a phantom relationship out of thin air. Knowing which is which — and so which variables to adjust for and which to leave strictly alone — is the whole game, and you can read it straight off the picture. Causation, in Pearl’s hands, stops being a thing you can only intuit and becomes a thing you can draw, argue about, and compute.
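The collider is the least intuitive of the three shapes, so a short simulation may help. This is a sketch under assumed numbers: X and Y are drawn independently, V is built from both, and "controlling for" V is modeled as keeping only rows where V is high. The threshold and noise levels are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(2)
rows = []
for _ in range(40000):
    x = rng.gauss(0, 1)
    y = rng.gauss(0, 1)            # drawn independently of x
    v = x + y + rng.gauss(0, 0.5)  # collider: x -> v <- y
    rows.append((x, y, v))

# Without conditioning, x and y are unrelated.
overall = pearson([r[0] for r in rows], [r[1] for r in rows])

# Conditioning on the collider by selecting on it creates a dependence.
selected = [(x, y) for x, y, v in rows if v > 1.0]
conditioned = pearson([s[0] for s in selected], [s[1] for s in selected])

print(f"all rows: {overall:.2f}, rows with high v: {conditioned:.2f}")
```

Across all rows the correlation is near zero; among the selected rows it turns clearly negative, because a high V with a low X can only have come from a high Y. Nothing causal connects X and Y; the relationship exists only inside the conditioned sample.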

Framework & implementation

This section uses Ora’s own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.

Pipeline execution

Pearl’s causal graphs and the ladder of causation are the foundational lens of the Causal DAG analysis: the lens file marks it foundational: true, and it is one of the mode’s two required lenses (the other, do-calculus, computes on the structure this one builds). It sits in the mode’s ANALYTICAL PERSPECTIVES block under “always loaded.” The mode runs at Gear 4, Ora’s most thorough setting, in which a Depth analyst and a Breadth analyst work the causal question in parallel, critique each other, and revise.

Where the lens engages. It activates on its Detection Signals — an “X causes Y” claim that must be classified; observational evidence being used to deliver an interventional or counterfactual verdict; a policy claim (“if we do X, Y will happen”) resting on correlation. Its Application Steps run the setup: classify the claim’s rung (association / intervention / counterfactual), sketch the DAG (variables as nodes, direct causes as edges, confounders / mediators / colliders marked), and check the rung-evidence match — does the available evidence actually support inference at the claimed rung?
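The three setup steps can be pictured as a small data structure. What follows is a hypothetical sketch, not Ora's implementation: the names Rung, rung_evidence_match, and structural_role are invented here, the variables activation and support_tickets are made up to extend the onboarding example, and the role check looks only at direct edges, whereas a real analysis would trace every path.

```python
from enum import IntEnum

class Rung(IntEnum):
    ASSOCIATION = 1      # seeing
    INTERVENTION = 2     # doing
    COUNTERFACTUAL = 3   # imagining

def rung_evidence_match(claim: Rung, evidence: Rung) -> bool:
    """Evidence licenses a claim only at its own rung or a lower one."""
    return evidence >= claim

def structural_role(edges, x, y, v):
    """Classify v relative to the pair (x, y), using direct edges only."""
    if (v, x) in edges and (v, y) in edges:
        return "confounder"   # fork: x <- v -> y
    if (x, v) in edges and (v, y) in edges:
        return "mediator"     # chain: x -> v -> y
    if (x, v) in edges and (y, v) in edges:
        return "collider"     # x -> v <- y
    return "other"

# Each pair is a directed edge (cause, effect) in an assumed, illustrative DAG.
edges = {
    ("company_size", "onboarding"),
    ("company_size", "retention"),
    ("onboarding", "retention"),
    ("onboarding", "activation"),
    ("activation", "retention"),
    ("onboarding", "support_tickets"),
    ("retention", "support_tickets"),
}

roles = {
    v: structural_role(edges, "onboarding", "retention", v)
    for v in ("company_size", "activation", "support_tickets")
}
print(roles)

# A "what if we change onboarding" claim backed only by observed correlation:
print(rung_evidence_match(Rung.INTERVENTION, Rung.ASSOCIATION))
```

In this sketch company_size comes out as a confounder, activation as a mediator, and support_tickets as a collider, and the final check returns False because rung-one evidence is being asked to carry a rung-two claim.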

What it produces in the analysis. The mode’s output sections are this lens made operational. The Causal question — Pearl rung locked section is the rung classification. The Variable inventory with roles and DAG specification sections are the graph. The Confounder / mediator / collider classification restates each variable’s structural role for the identifiability step — which confounders must be adjusted for, which mediators block or open which paths, and (critically) which colliders must not be conditioned on. The do-calculus lens then consumes this to produce the Identifiability verdict (back-door / front-door / a do-calculus rule, plus the conditioning set).
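For reference, the two named criteria correspond to standard adjustment formulas from Pearl's work. They are stated here in generic notation (X the treatment, Y the outcome, Z a back-door adjustment set, M a front-door mediator); the symbols are chosen for this illustration and are not taken from Ora's files.

```latex
% Back-door adjustment: Z blocks every back-door path from X to Y
% and contains no descendant of X.
\[
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
\]

% Front-door adjustment: M intercepts every directed path from X to Y,
% there is no unblocked back-door path from X to M,
% and X blocks every back-door path from M to Y.
\[
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
\]
```

In both cases the left side is a rung-two quantity and the right side contains only rung-one quantities, which is what it means for an effect to be identifiable from observational data. The conditioning set reported in the verdict plays the role of Z in the first formula.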

Cross-adversarial evaluation. At Gear 4 each analyst’s reading is critiqued by the other, which catches the lens’s signature failures — keyed to its Critical Questions and Common Failure Modes: treating a correlation as an effect (rung-1-as-rung-2); treating a population-level intervention claim as a specific-case counterfactual (rung-2-as-rung-3); drawing a DAG to summarize what’s already believed rather than to license inference from data (DAG-as-summary); and ignoring that the inference depends on the faithfulness assumption (faithfulness blindness). The evaluator presses the hardest check: is each edge — and each absent edge — justified by domain knowledge or empirical test, or drawn to make a desired conclusion follow?

Honesty discipline. The mode carries an Assumption inventory ordered most-fragile-first, each assumption paired with what would falsify it, because a causal verdict is only as strong as the assumptions identifying it. And the lens surfaces a live foundational debate (the Maudlin–Pearl dispute, “D4”) about whether counterfactual claims are genuinely distinct from interventional ones: when a claim is counterfactual, the analysis discloses that its force rests on Pearl-style structural models and offers an interventional restatement rather than treating a contested question as settled.

What the analysis will not do. It will not answer a rung-2 or rung-3 question with rung-1 evidence, will not condition on a collider (which manufactures bias), and will not present a DAG’s conclusions as more secure than the arrows that produced them — the graph is a falsifiable model, not a summary of beliefs.

Origin and evidence

The framework is Judea Pearl’s, developed across decades of work in artificial intelligence and set out formally in Causality: Models, Reasoning, and Inference (2000; 2nd ed. 2009) and accessibly in The Book of Why (2018, with Dana Mackenzie), which is the canonical source for the ladder of causation. The originating technical paper for DAG-based identification is Pearl’s “Causal diagrams for empirical research” (Biometrika, 1995). The work won Pearl the 2011 Turing Award and is widely credited with the “causal revolution” that returned causation to respectability in statistics, epidemiology, economics, and machine learning. It runs alongside a complementary causal-discovery tradition (Spirtes, Glymour, and Scheines’s Causation, Prediction, and Search), which learns graph structure from data rather than assuming it. The framework is not unanimously accepted at the foundations: Tim Maudlin and others contest whether the third rung (counterfactuals) is genuinely distinct from the second — the debate this lens deliberately surfaces rather than hides.

Applications and common uses

Pearl’s causal graphs are a working tool wherever a causal question is asked of non-experimental data, used to classify a claim and to structure its analysis.

Epidemiology and medicine. The native ground of confounding: drawing the DAG to find the right adjustment set is how observational studies estimate a treatment’s effect without (or before) a randomized trial — and how they avoid conditioning on a collider and inventing an effect.
Economics and policy. “If we raise the minimum wage, employment will…” is a rung-two claim usually argued from rung-one data; the graph makes the identifying assumptions explicit and contestable.
Business analytics and experimentation. Deciding whether an effect (a feature on retention, a campaign on sales) is identifiable from logs, or whether only an experiment will settle it — and which confounders to control for if you must use observational data.
Machine learning and AI. Causal graphs underpin the move beyond pattern-matching toward systems that can answer “what if” and “why” questions, and that generalize when the world’s conditions shift.
Everyday reasoning about evidence. A discipline for reading the daily barrage of “study finds X linked to Y” — naming the likely confounder, and refusing to climb from linked to causes without the structure to license it.

In every case the payoff is the same: a causal claim sorted onto the right rung, drawn as a graph whose every arrow is an earned assumption, so that what can be concluded — and what cannot — is explicit.

Failure modes and when not to use it

The lens’s characteristic ways of going wrong are catalogued in its Common Failure Modes:

Rung-1-as-rung-2. Treating an observed correlation as a causal effect — the dominant error. The tell is “X causes Y” backed by an observational regression. Restate the claim associationally, or supply the identifying assumptions that license the causal jump.
Rung-2-as-rung-3. Treating a population-level intervention effect as a specific-case counterfactual (“this firm would have hired three more workers”). Distinguish average treatment effects from individual counterfactuals; supply the structural model unit-level reasoning needs, or restate at the population level.
DAG-as-summary. Drawing a graph to depict existing beliefs rather than to license inference from data. The tell is a structure assumed without justification. Justify each edge and each absent edge; treat the DAG as falsifiable.
Faithfulness blindness. Ignoring that the inference depends on the independencies observed in the data matching the ones the graph implies. Test the assumption; revise the graph when the two do not match.
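One way faithfulness can fail is exact cancellation between paths. The sketch below uses an invented linear model in which X raises Y directly and lowers it by the same amount through M; the coefficients are assumptions picked so the two paths cancel.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(3)
xs, ms, ys = [], [], []
for _ in range(40000):
    x = rng.gauss(0, 1)
    m = x + rng.gauss(0, 1)       # x -> m with coefficient +1
    y = x - m + rng.gauss(0, 1)   # direct x -> y is +1, m -> y is -1
    xs.append(x)
    ms.append(m)
    ys.append(y)

corr_xy = pearson(xs, ys)   # the two paths cancel
corr_xm = pearson(xs, ms)
corr_my = pearson(ms, ys)

print(f"x~y {corr_xy:.2f}, x~m {corr_xm:.2f}, m~y {corr_my:.2f}")
```

Here X and Y look independent in the data even though the graph has two causal paths between them. An analysis that reads "no correlation" as "no arrow" would draw the wrong graph, which is why the assumption has to be named and tested instead of taken for granted.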

When not to reach for it. When you have a real experiment — randomization already severs the confounding the DAG exists to handle, so the graph adds rigor mostly for edge cases (attrition, mediation). When the question is genuinely just predictive — will X let me forecast Y, with no intervention contemplated — rung one is the honest level and causal machinery over-claims. And when the variables and their plausible relationships are too unknown to draw a defensible graph, a DAG built on guesses produces confident nonsense; the honest move is causal discovery or more domain knowledge first, not a hand-drawn diagram.

Related

Causal DAG: the analysis this lens founds. It classifies a causal claim, builds the graph, and determines whether the effect is identifiable from the data.
Pearl Do-Calculus: the computational partner. Given the graph, it applies the back-door and front-door criteria to decide which variables to adjust for to recover the causal effect.
Confirmation Bias: why a DAG must be falsifiable. The temptation to draw the arrows that make a favored conclusion follow is exactly the bias the graph can launder.
Regression to the Mean: a companion caution against reading causation into a pattern (an extreme followed by a milder outcome) that needs no cause at all.