Causal DAG
Why it matters
A causal DAG — a directed acyclic graph — is a picture of a causal story. Each variable is a node, each arrow is a claim that one variable causes another (an arrow from X to Y means X causes Y), and “acyclic” means no loops, because a cause has to come before its effect. It looks like nothing — just boxes and arrows — but it is the most disciplined tool there is for one specific job: deciding which variables you must adjust for, and which you must leave alone, before you try to read a cause-and-effect claim off data you didn’t get to run an experiment on.
For example: ice-cream sales and drowning deaths rise and fall together, week by week, tightly. A naive look says ice cream is dangerous. Draw the DAG and the trap is obvious — summer heat has an arrow into ice-cream sales and an arrow into swimming (and so into drownings). Heat is a common cause sitting upstream of both. The arrow you’d be tempted to draw — ice cream to drowning — is the one arrow the diagram tells you not to draw, and the correlation evaporates the moment you account for the season. The picture caught what the correlation could not.
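The ice-cream story can be checked with a small simulation. The sketch below is illustrative only: the data are synthetic, the effect sizes and the names `heat`, `ice_cream`, and `drownings` are assumptions made for the example, and "accounting for the season" is approximated by removing the linear effect of heat from both series before correlating what is left.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic weeks: heat drives both series; ice cream never enters drownings.
heat = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [h + rng.gauss(0, 1) for h in heat]
drownings = [h + rng.gauss(0, 1) for h in heat]

raw = pearson(ice_cream, drownings)
adjusted = pearson(residuals(ice_cream, heat), residuals(drownings, heat))
print(f"raw correlation: {raw:.2f}")
print(f"after adjusting for heat: {adjusted:.2f}")
```

Under these assumptions the raw correlation should land near 0.5 and the adjusted one near zero, which is the "evaporation" the paragraph describes.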
- What it shows. A causal model made explicit: every variable that matters as a node, every cause-effect claim as an arrow, and — just as loudly — every missing arrow as a claim that no direct effect exists.
- When to reach for it. Before you estimate a causal effect from data you only observed (you didn’t randomize), and you need to know which variables to control for so the estimate isn’t a confounded artifact.
- How to read it. Trace the arrows from your cause to your effect; the diagram sorts the other variables into confounders (adjust for them), mediators (don’t, if you want the total effect), and colliders (leave them unadjusted, because adjusting for one opens a spurious path).
- What you’d miss without it. The confounder you forgot and the collider you’d have “controlled for” by reflex — the two mistakes that turn an honest regression into a false causal claim.
- Where it misleads. It is only as true as its arrows; a DAG with a missing edge or a wrong direction will hand you a confident, wrong adjustment set. It encodes assumptions, not findings — and it cannot represent feedback, where a variable loops back on itself.
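The collider mistake named in the list above is easy to reproduce. The following is a minimal sketch on synthetic data; the variable names `skill`, `luck`, and `admitted` and the admission threshold are hypothetical, chosen only to show how keeping the cases that passed through a common effect creates a correlation between its causes.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000

# Two causes that are independent by construction.
skill = [rng.gauss(0, 1) for _ in range(n)]
luck = [rng.gauss(0, 1) for _ in range(n)]

# Collider: both causes point into admission (skill -> admitted <- luck).
admitted = [s + k + rng.gauss(0, 0.5) > 1.0 for s, k in zip(skill, luck)]

everyone = pearson(skill, luck)
sample_only = pearson(
    [s for s, a in zip(skill, admitted) if a],
    [k for k, a in zip(luck, admitted) if a],
)
print(f"whole population: {everyone:.2f}")
print(f"admitted only:    {sample_only:.2f}")
```

Across the whole population the correlation should sit near zero. Inside the admitted group it should turn clearly negative, even though no arrow connects the two causes.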
How to read it
Picture a scatter of labelled boxes with arrows running between them. Each box is a variable: exercise, sleep, depression, age. Each arrow is a causal claim: an arrow from X to Y says “X has a direct causal effect on Y.” That is the whole vocabulary. The power is in how strictly it’s used: the absence of an arrow between two boxes is every bit as strong a statement as a drawn one. It asserts there is no direct effect, and that assertion does real work in the analysis. “Acyclic” means that no chain of arrows can ever lead back to where it started, which is just the rule that causes precede their effects.
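The vocabulary is small enough to write down directly. As a sketch, a DAG can be held as a set of (cause, effect) pairs and the acyclic rule checked with the standard library's `graphlib`. The particular arrows below are assumptions invented for the example, not claims from the text.

```python
from graphlib import CycleError, TopologicalSorter

def causal_order(edges):
    """Return the variables with every cause listed before its effects.

    Raises ValueError if the arrows contain a directed cycle.
    """
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(cause, set())
        predecessors.setdefault(effect, set()).add(cause)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as err:
        # args[1] holds the nodes of the cycle that was found
        raise ValueError(f"not acyclic: {err.args[1]}") from err

# Illustrative arrows only.
edges = {
    ("exercise", "sleep"),
    ("sleep", "depression"),
    ("exercise", "depression"),
    ("age", "exercise"),
    ("age", "depression"),
}

order = causal_order(edges)
print(order)

# A missing arrow is a claim too: this model says age has no direct effect on sleep.
print(("age", "sleep") in edges)  # False
```

Adding an arrow such as `("depression", "exercise")` would close a loop, and `causal_order` would raise instead of returning an order.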
Once it’s drawn, the diagram sorts the supporting variables into three kinds, and getting the kind right is the entire game. A confounder is a common cause: one variable with arrows into both the cause and the effect you care about (summer heat into both ice cream and drowning). A confounder opens a non-causal “back-door” path that masquerades as a real effect, so you must adjust for it to close that path. A mediator sits on the path between cause and effect; it’s how the cause works (exercise improves sleep, and better sleep lifts mood, so sleep mediates exercise’s effect on depression). Adjust for a mediator and you wipe out part of the very effect you were measuring. A collider is the mirror image of a confounder: a common effect, with two arrows pointing into it. Colliders are the counterintuitive ones. A path that runs through a collider is already blocked, so adjusting for the collider (or for anything downstream of it) opens a spurious link between its causes that was never there. This is the engine behind selection bias, where conditioning on who got into your sample invents correlations.
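The three kinds can be read off the arrows mechanically. The sketch below uses deliberately coarse definitions relative to one treatment and one outcome, and it is not a full d-separation analysis. The example graph, including the hypothetical `clinic_visit` variable, is an assumption made for illustration.

```python
def ancestors(edges, node):
    """Every variable with a directed path into `node`."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    found, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def descendants(edges, node):
    """Every variable reachable from `node` by following arrows forward."""
    return ancestors({(effect, cause) for cause, effect in edges}, node)

def without(edges, node):
    """The graph with `node` and all of its arrows removed."""
    return {(c, e) for c, e in edges if node not in (c, e)}

def classify(edges, treatment, outcome):
    """Label each other variable as confounder, mediator, collider, or other."""
    roles = {}
    others = {v for edge in edges for v in edge} - {treatment, outcome}
    for v in sorted(others):
        causes_treatment = v in ancestors(edges, treatment)
        # Reaches the outcome by a route that avoids the treatment.
        causes_outcome = v in ancestors(without(edges, treatment), outcome)
        on_causal_path = (v in descendants(edges, treatment)
                          and v in ancestors(edges, outcome))
        # Reached from the treatment by a route that avoids the outcome.
        effect_of_treatment = v in descendants(without(edges, outcome), treatment)
        effect_of_outcome = v in descendants(edges, outcome)
        if causes_treatment and causes_outcome:
            roles[v] = "confounder"
        elif on_causal_path:
            roles[v] = "mediator"
        elif effect_of_treatment and effect_of_outcome:
            roles[v] = "collider"
        else:
            roles[v] = "other"
    return roles

# Illustrative arrows only.
edges = {
    ("age", "exercise"), ("age", "depression"),
    ("exercise", "sleep"), ("sleep", "depression"),
    ("exercise", "depression"),
    ("exercise", "clinic_visit"), ("depression", "clinic_visit"),
}
roles = classify(edges, treatment="exercise", outcome="depression")
print(roles)
```

Roles are always relative to the chosen treatment and outcome; the same variable can be a confounder for one question and a mediator for another.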
So the diagram is not decoration; it is a calculator you read by eye. Pick your cause and your effect, find every back-door path between them, and the structure tells you the exact set of variables to adjust for — and the set to keep your hands off — to recover a true causal effect from data that was merely watched, not controlled. That is the trick prose can’t pull off: in a paragraph, “we controlled for the relevant variables” hides which ones and why; on the DAG, the choice is forced, visible, and checkable.
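The "calculator" can be sketched in code: list the back-door paths, then search for the smallest sets of variables that block all of them. This is a brute-force illustration under simplifying assumptions (every variable is measured, the graph is small, and descendants of the treatment are excluded from candidates). The function names are hypothetical, and this is not how DAGitty or any particular tool implements the test.

```python
from itertools import combinations

def descendants(edges, node):
    """Every variable reachable from `node` by following arrows forward."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def backdoor_paths(edges, treatment, outcome):
    """Simple paths from treatment to outcome whose first arrow points INTO the treatment."""
    edges = set(edges)
    neighbours = {}
    for cause, effect in edges:
        neighbours.setdefault(cause, set()).add(effect)
        neighbours.setdefault(effect, set()).add(cause)
    paths = []

    def walk(path):
        if path[-1] == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in path:
                continue
            if len(path) == 1 and (nxt, treatment) not in edges:
                continue  # front-door step: leaves the treatment along an arrow
            walk(path + [nxt])

    walk([treatment])
    return paths

def is_blocked(edges, path, adjusted):
    """True if adjusting for `adjusted` leaves this path closed."""
    edges = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:
            # Collider: closed unless it, or something downstream, is adjusted for.
            if not ({mid} | descendants(edges, mid)) & adjusted:
                return True
        elif mid in adjusted:
            return True  # chain or fork: closed by adjusting for the middle variable
    return False

def smallest_adjustment_sets(edges, treatment, outcome):
    """All adjustment sets of the smallest size that close every back-door path."""
    nodes = {v for edge in edges for v in edge}
    candidates = sorted(nodes - {treatment, outcome} - descendants(edges, treatment))
    paths = backdoor_paths(edges, treatment, outcome)
    for size in range(len(candidates) + 1):
        found = [set(z) for z in combinations(candidates, size)
                 if all(is_blocked(edges, p, set(z)) for p in paths)]
        if found:
            return found
    return []

# Illustrative arrows only.
edges = {
    ("age", "exercise"), ("age", "depression"),
    ("exercise", "sleep"), ("sleep", "depression"),
    ("exercise", "depression"),
}
paths = backdoor_paths(edges, "exercise", "depression")
sets_found = smallest_adjustment_sets(edges, "exercise", "depression")
print(paths)
print(sets_found)
```

In this toy graph the only back-door path runs through age, so adjusting for age alone is enough, and sleep is left alone because it lies downstream of the treatment.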
When to use it
The causal DAG belongs to the CAUSAL family of diagrams — the ones that make cause-and-effect structure visible — and within it the DAG sits at the formal end. The family is a ladder of rigor, and picking the right rung is how you pick the right tool:
- A Fishbone Diagram is the qualitative cousin — a brainstorm that enumerates candidate causes of one problem by category. Reach for it to gather hypotheses; reach for the DAG when you need to test one against data.
- A Causal Loop Diagram captures the one thing a DAG structurally cannot: feedback — the vicious and virtuous cycles where a variable loops back on itself. When the system has loops, the “acyclic” rule breaks and you need the loop diagram.
- A Stock-and-Flow Diagram goes quantitative in a different direction — accumulations and rates over time — when you need to simulate a dynamic system, not identify a single effect.
Reach for a causal DAG when you are about to estimate a causal effect from data you didn’t randomize (observational or quasi-experimental data) and the honesty of the answer rides on adjusting for the right variables. That is the standard situation in epidemiology (does this exposure cause this disease?) and econometrics (does this policy cause this outcome?). It also helps in A/B and experiment design, where treatment is randomized but you still have to decide which variables to measure or balance. Skip it when the question is purely descriptive, where you only want to summarize, not explain (a distribution or time-series plot is the tool), or when the system is genuinely cyclic with feedback (use a causal loop diagram). The DAG is the step you take before the regression, not a substitute for it.
How Ora builds it
Ora produces a causal DAG from a semantic spec — a structured list of the variables with their roles (which is the treatment, which is the outcome, which are measured, which are unmeasured) and a set of directed edges, each edge a stated causal claim. Roles and edges are the whole model; the spec is the place the assumptions live, written down where they can be argued with.
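To make the idea of a spec concrete, here is one hypothetical shape it could take. Ora's actual format is not shown in this article, so the class names, field names, and validation rules below (including the rule of exactly one treatment and one outcome) are assumptions for illustration only. The four role names are the ones listed above.

```python
from dataclasses import dataclass

ROLES = {"treatment", "outcome", "measured", "unmeasured"}

@dataclass(frozen=True)
class Variable:
    name: str
    role: str

@dataclass(frozen=True)
class DagSpec:
    variables: tuple  # of Variable
    edges: tuple      # of (cause, effect) name pairs, each a stated causal claim

    def problems(self):
        """Return human-readable problems; an empty list means the spec is well-formed."""
        issues = []
        names = [v.name for v in self.variables]
        if len(names) != len(set(names)):
            issues.append("duplicate variable names")
        for v in self.variables:
            if v.role not in ROLES:
                issues.append(f"unknown role {v.role!r} on {v.name}")
        for role in ("treatment", "outcome"):
            count = sum(v.role == role for v in self.variables)
            if count != 1:
                issues.append(f"expected exactly one {role}, found {count}")
        for cause, effect in self.edges:
            if cause == effect:
                issues.append(f"self-loop on {cause}")
            for end in (cause, effect):
                if end not in names:
                    issues.append(f"edge {cause} -> {effect} uses undeclared variable {end}")
        return issues

# Illustrative model only; "stress" is a made-up unmeasured variable.
spec = DagSpec(
    variables=(
        Variable("exercise", "treatment"),
        Variable("depression", "outcome"),
        Variable("sleep", "measured"),
        Variable("age", "measured"),
        Variable("stress", "unmeasured"),
    ),
    edges=(
        ("exercise", "sleep"), ("sleep", "depression"),
        ("exercise", "depression"),
        ("age", "exercise"), ("age", "depression"),
        ("stress", "sleep"), ("stress", "depression"),
    ),
)
print(spec.problems())  # [] when the spec is well-formed
```

Keeping roles and edges as plain data means every assumption is a line someone can point at and dispute.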
That spec is rendered to a diagram. The layout uses Graphviz (the dot engine) to arrange the nodes and arrows, and the model is also emitted in DAGitty interchange syntax — DAGitty being the standard browser-based environment causal-inference practitioners use to build and check DAGs — so the model is portable, not trapped in the picture. Alongside the diagram comes an adjustment-set report: a tabular companion naming the back-door paths between treatment and outcome and the minimal set of variables to adjust for, since the diagram’s main payoff is a decision and the report states it in words. Accessibility is built in — alt-text describes the variable count, the treatment-outcome pair, the back-door paths, and the recommended adjustment set, because arrows alone aren’t readable by a screen reader.
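As a rough sketch of the two text outputs, the functions below turn a role map and an edge list into Graphviz DOT source and a DAGitty model string. This is not Ora's emitter: the function names and styling choices are assumptions, and variable names are assumed to be plain identifiers that need no quoting in DAGitty. The `[exposure]`, `[outcome]`, and `[latent]` tags are DAGitty's own node annotations.

```python
# Map the article's role names onto DAGitty's node annotations.
DAGITTY_TAGS = {"treatment": "exposure", "outcome": "outcome", "unmeasured": "latent"}

def to_dot(variables, edges):
    """Graphviz DOT source; unmeasured variables are drawn with a dashed outline."""
    lines = ["digraph causal_dag {", "  rankdir=LR;"]
    for name, role in variables.items():
        style = ", style=dashed" if role == "unmeasured" else ""
        lines.append(f'  "{name}" [label="{name}"{style}];')
    for cause, effect in edges:
        lines.append(f'  "{cause}" -> "{effect}";')
    lines.append("}")
    return "\n".join(lines)

def to_dagitty(variables, edges):
    """DAGitty model text: one line per variable, then one line per arrow."""
    lines = ["dag {"]
    for name, role in variables.items():
        tag = DAGITTY_TAGS.get(role)
        lines.append(f"{name} [{tag}]" if tag else name)
    for cause, effect in edges:
        lines.append(f"{cause} -> {effect}")
    lines.append("}")
    return "\n".join(lines)

# Illustrative model only; "stress" is a made-up unmeasured variable.
variables = {
    "exercise": "treatment",
    "depression": "outcome",
    "age": "measured",
    "stress": "unmeasured",
}
edges = [
    ("age", "exercise"), ("age", "depression"),
    ("stress", "exercise"), ("stress", "depression"),
    ("exercise", "depression"),
]

dot_source = to_dot(variables, edges)
dagitty_source = to_dagitty(variables, edges)
print(dot_source)
print(dagitty_source)
```

The DOT text can be passed to the `dot` command to draw the picture, and the DAGitty text can be pasted into DAGitty's model code box to check the adjustment sets independently.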
The diagram is the visual face of Ora’s Causal DAG mode: when you ask “draw the causal DAG and tell me what to control for,” that mode builds the variable inventory, fixes the edges, classifies each variable as confounder, mediator, or collider, and runs the back-door test for identifiability — and this artifact is how it shows that work.
The notation is the work of Judea Pearl and his collaborators, who in the 1990s and 2000s made causal graphs mathematically rigorous and showed that a properly drawn DAG decides which effects can be identified from a given dataset — set out in his Causality: Models, Reasoning, and Inference and, for the general reader, The Book of Why with Dana Mackenzie. The formal causal-discovery side runs in parallel through Spirtes, Glymour, and Scheines’s Causation, Prediction, and Search (1993), and the lineage reaches back to Sewall Wright’s path diagrams in 1920s genetics.
Related
- Causal Loop Diagram — the CAUSAL-family member for feedback: the cycles and delays a DAG’s acyclic structure is forbidden from drawing.
- Fishbone Diagram — the qualitative cause-enumeration cousin: brainstorm candidate causes by category before you formalize and test them in a DAG.
- Stock and Flow — the quantitative dynamics member: accumulations and rates for simulating a system over time, where the DAG identifies a single effect.
- Causal DAG (mode) — the analytical operation this diagram renders: build the variable inventory, fix the edges, classify confounders and colliders, and derive the adjustment set.