---
name: Root Cause Analysis
status: draft
territory: causal-investigation
msi_territory: causal-investigation
sources:
  - title: Ishikawa, Kaoru (1972), Guide to Quality Control, Asian Productivity Organization
    url: https://openlibrary.org/works/OL120949W
  - title: "Ohno, Taiichi (1988), Toyota Production System: Beyond Large-Scale Production, Productivity Press"
    url: https://openlibrary.org/works/OL8681881W
  - title: Deming, W. Edwards (1982), Out of the Crisis, MIT Center for Advanced Engineering Study
    url: https://openlibrary.org/works/OL3190937W
---

# Root Cause Analysis

## Why it matters

When something keeps going wrong — the same defect, the same missed deadline, the same outage every few weeks — the pull is to fix what you can see and move on. But the visible failure is almost never the cause; it is the last link in a chain. Root cause analysis is the discipline of tracing that chain backward, past the convenient first answer, until you reach the condition that actually generated the failure — the one whose removal would keep it from coming back.

For example: a server crashes, the on-call engineer restarts it, and the ticket is closed. The crash is the symptom. Why did it crash? It ran out of memory. Why? A background job was leaking memory. Why did the leak go unnoticed until the crash? There was no alert on memory growth. Why no alert? The team's monitoring template predates that job and was never updated. The restart "fixed" the crash for a day; the missing alert — four links down — is why the crash returns. Stop at the restart and you have treated the symptom. Reach the monitoring gap and you have found the cause.

- **What it reveals.** The causal chain beneath a failure — not the proximate trigger you can already see, but the deeper condition that made the trigger likely and will produce the next failure if it is left in place.
- **How it changes the read.** You stop asking *"what broke?"* and start asking *"why was this allowed to break, and what would have to change for it never to break this way again?"*
- **When to foreground it.** A specific, recurring, or high-stakes failure where earlier fixes have not held — "we keep patching this and it keeps coming back" — and the question is backward-looking diagnosis, not forward design.
- **What you'd miss without it.** That the obvious cause is usually a symptom of a structural one; fix only the proximate cause and the deeper structure stays in place to generate the same failure again, slightly rearranged.
- **Where it misleads.** Pushed too hard, it manufactures tidy single-cause stories for failures that are genuinely multi-causal or driven by feedback loops. And "human error" is almost never a root cause: it is a label that hides the process that made the error easy to commit.

## Realtime examples

See real, dated analyses where this mode traced a failure in the news back to its structural cause → **[Root Cause Analysis on Main Street Independent](https://mainstreetindependent.com/analyses/technique/causal-investigation/root-cause-analysis)**

## How to invoke it in Ora

You have a specific failure — ideally a recurring one whose earlier fixes did not hold — and you want it traced back to the condition that actually generates it, rather than the symptom you can already see.

Describe the failure in concrete terms and ask:

> "What are the root causes of our repeated [failure]? Why does this keep happening despite [what we've already tried]? Draw a fishbone."

The phrases *root causes of*, *why does this keep happening*, and *draw a fishbone* are what route you here. Bring the symptom concretely — "missed sprint deadlines" works, but "missed sprint deadlines three quarters running, mostly on the same kind of work, after we already added capacity and revised estimates" is far better — and say what you have already tried, because a fix that did not hold is itself evidence about where the deeper cause lives.

Two boundaries worth knowing. If the failure is driven by feedback loops — vicious cycles that feed on themselves — the causal-loop mode fits better than a backward chain. And if the diagnosis is already settled and the real question is *which* fix to choose, that is a decision, not a root-cause trace, and a decision mode is the right tool. This mode produces the diagnostic groundwork a fix starts from; it does not pick the fix.

## How it works

The cleanest illustration comes from the factory floor where the method was forged. Taiichi Ohno, the engineer behind the Toyota Production System, used to walk new managers up to a machine that had stopped and refuse to let them blame the obvious thing. A welding robot halts. *Why?* A fuse blew from an overload. The instinct is to replace the fuse, and the robot runs again until the fuse blows next week. So Ohno kept going. *Why was there an overload?* A bearing was not lubricated enough. *Why?* The lubrication pump was not pumping properly. *Why?* The pump's shaft was worn and rattling. *Why?* There was no filter, so metal shavings had been sucked in and ground the shaft down. Five "whys" in, four of them past the blown fuse, you arrive at the actual cause: a missing filter. Replace the fuse and you have bought a week; fit the filter and the failure is gone. The blown fuse was real; it just was not load-bearing.

That is the first of the method's two moves: go **deep**. Ask "why" of each answer, not of the original problem, and keep going until you reach a cause that is either something you can act on or something genuinely outside the boundary of the analysis. The discipline is to refuse the first plausible explanation, because the first explanation is almost always a symptom wearing a cause's clothing.
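
The descent can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WHY` mapping and the `five_whys` helper are hypothetical names, and the entries simply restate the welding-robot example above. The point it shows is that each "why" is asked of the previous answer, not of the original symptom.

```python
# Hypothetical mapping from an observation to the answer to "why?".
# The entries restate the welding-robot example from the text.
WHY = {
    "robot halted": "fuse blew from an overload",
    "fuse blew from an overload": "bearing was not lubricated enough",
    "bearing was not lubricated enough": "lubrication pump was not pumping properly",
    "lubrication pump was not pumping properly": "pump shaft was worn and rattling",
    "pump shaft was worn and rattling": "no filter, so metal shavings got in",
}


def five_whys(symptom, why, max_links=10):
    """Walk backward from the symptom, asking "why?" of the latest answer.

    Stops when no further answer is known (a root or a boundary) or when
    max_links is reached, which guards against circular explanations.
    """
    chain = [symptom]
    while chain[-1] in why and len(chain) < max_links:
        chain.append(why[chain[-1]])
    return chain


chain = five_whys("robot halted", WHY)
root = chain[-1]              # the last answer reached
whys_asked = len(chain) - 1   # one "why?" per link after the symptom

print(root)        # no filter, so metal shavings got in
print(whys_asked)  # 5
```

Stopping at `chain[1]` (the blown fuse) is the "replace the fuse" fix; the sketch only terminates when there is no further answer to give.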

The second move guards against a different failure: tunnelling. If you only ever go deep on your first hunch, you find *a* cause (the one you were already suspicious of) and miss the others. Kaoru Ishikawa's answer, developed in Japanese quality control in the 1960s, was to go **wide** first. Before chasing any single chain, lay out all the *categories* a cause could live in and force yourself to look in each. For a factory the classic set is the "6 Ms": manpower, methods, machines, materials, measurement, and Mother Nature (the environment). For a service it might be people, process, policy, and plant; for software, a set tuned to code and deployment. Drawn out, the categories branch off a central spine toward the symptom, which is why Ishikawa's diagram is called a *fishbone*. Its whole purpose is to make you consider the materials problem and the measurement problem before you commit to the one you walked in assuming.
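
As a rough sketch of the categorize-first discipline, the fishbone can be modelled as a symptom plus one empty branch per category, with a check for branches nobody has looked at yet. The helper names (`new_fishbone`, `add_cause`, `unexamined`) and the sample defect are assumptions made for illustration; the category names are the ones listed above.

```python
# Category sets named in the text; the dictionary keys are labels for this sketch.
CATEGORY_SETS = {
    "6M": ["manpower", "methods", "machines", "materials", "measurement", "environment"],
    "4P": ["people", "process", "policy", "plant"],
}


def new_fishbone(symptom, framework):
    """Create one empty branch per category before any cause is chased."""
    return {
        "symptom": symptom,
        "branches": {category: [] for category in CATEGORY_SETS[framework]},
    }


def add_cause(fishbone, category, cause):
    """Record a candidate cause on its branch; unknown categories raise KeyError."""
    fishbone["branches"][category].append(cause)


def unexamined(fishbone):
    """Return the categories that still have no candidate cause."""
    return [c for c, causes in fishbone["branches"].items() if not causes]


# Hypothetical defect, with the one cause the team walked in assuming.
bone = new_fishbone("recurring weld defect", "6M")
add_cause(bone, "machines", "worn electrode tip")

print(unexamined(bone))
# ['manpower', 'methods', 'materials', 'measurement', 'environment']
```

The list of unexamined branches is the useful part: it makes the tunnelling visible before a single chain is traced.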

Root cause analysis is just these two moves married: the fishbone spreads the search wide so you do not tunnel, and the five-whys drives each promising branch deep so you do not stop at the symptom. The marriage matters because each covers the other's blind spot — breadth without depth gives you a tidy chart of shallow causes; depth without breadth gives you one confidently-traced chain and three you never looked at. Done honestly, the method has one more piece of integrity built in: it is willing to end at a cause you cannot fix — an organizational structure, a regulation, a market reality — and *say so*, rather than inventing a convenient actionable cause where none exists. A true root cause you cannot act on is more useful than a false one you can.

## Framework & implementation

*This section uses Ora's own terms for the parts of an analysis, so that if you open the actual mode file they line up. Each is glossed in plain language on first use.*

### Pipeline execution

Root Cause Analysis is an **atomic mode** in the **causal-investigation** territory — a single diagnostic pass, not a composite of sub-analyses. It runs at **Gear 4**, Ora's most thorough setting: a **Depth analyst** and a **Breadth analyst** work the failure in parallel and then critique each other (**cross-adversarial evaluation**) before a consolidator integrates the result — a structure that directly mirrors the method's own deep-and-wide logic.

The pass does four things in order. First, it **locks the symptom**: the specific observable failure, stated with enough precision that the backward trace has a fixed endpoint. Second, it runs a **fishbone decomposition** across the category set chosen to fit the domain (**6M** for manufacturing, **4P** for service, **4S** for software, **8P** for healthcare or education); this is the **Chosen Framework and Rationale** step, where the mode states which category set it picked and why. Third, within each category it runs the **five-whys descent** on the candidate causes, pushing past the first plausible answer until the chain reaches a genuine root or terminates at a stated boundary. Finally, it **assembles the structured output**, each branch annotated with the depth reached and where the actionable causes sit.
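
To make the ordering concrete, here is a minimal Python sketch of the four steps. It is not Ora's implementation: `run_pass`, its parameters, the software category names, and the data shapes are all assumptions chosen for illustration, and the sample chain reuses the server-crash example from earlier in this page.

```python
def run_pass(symptom, framework, categories, candidates, why, max_links=10):
    """Illustrative four-step pass; every name here is hypothetical."""
    # 1. Lock the symptom so the backward trace has a fixed endpoint.
    locked = symptom.strip()
    if not locked:
        raise ValueError("state the symptom before tracing it")

    # 2. Fishbone decomposition: one branch per category, before any descent.
    branches = {category: [] for category in categories}

    # 3. Five-whys descent on each candidate cause within its category.
    for category, cause in candidates:
        chain = [cause]
        while chain[-1] in why and len(chain) < max_links:
            chain.append(why[chain[-1]])
        # 4. Assemble: annotate each branch entry with root and depth reached.
        branches[category].append(
            {"chain": chain, "root": chain[-1], "depth": len(chain)}
        )

    return {"symptom": locked, "framework": framework, "branches": branches}


# Hypothetical inputs based on the server-crash example.
why = {
    "server ran out of memory": "background job leaked memory",
    "background job leaked memory": "no alert on memory growth",
    "no alert on memory growth": "monitoring template predates the job",
}
result = run_pass(
    symptom="server crashes every few weeks",
    framework="software set (hypothetical categories)",
    categories=["code", "deployment", "monitoring"],
    candidates=[("code", "server ran out of memory")],
    why=why,
)

entry = result["branches"]["code"][0]
print(entry["root"])   # monitoring template predates the job
print(entry["depth"])  # 4
```

Note that the chain starts on the "code" branch and ends at a monitoring gap, a small reminder that categories are scaffolding for the search rather than boundaries on where a cause may end up.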

The mode's reasoning tools ride in its **`ANALYTICAL PERSPECTIVES`** block — the lenses it loads as it works. Three are load-bearing here: the **fishbone-diagram** lens (the breadth move — categorize before you chase), the **five-whys** lens (the depth move — descend past the symptom), and the **fundamental-attribution-error** lens (the corrective that forbids "human error" as a terminus and pushes the chain on to the process or policy that made the error likely).

### Output contract

The deliverable is a fixed set of sections, so the diagnosis is auditable rather than a narrative:

- **Presented Problem.** The locked symptom.
- **Chosen Framework and Rationale.** Which category set and why.
- **Category Analysis.** Each fishbone branch with its five-whys descent shown.
- **Root Causes.** Each with the category it sits in, the depth reached beneath the symptom, and why it qualifies as root.
- **Evidence Assessment.** What would confirm each chain, and an explicit **correlation-versus-causation** flag noting where only an intervention could prove the link.
- **Recommendations.** Split into **Corrective** (address the surfaced failure) and **Preventive** (stop the class of failure recurring).
- **Confidence and Alternative Framings.** How strong the dominant chain is and which convergent chains remain live if its fix proves insufficient.
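
One way to picture the contract is as a typed record with a completeness check. The sketch below is an assumption-laden illustration, not the mode's actual schema: the class and field names are derived from the section titles above, and `missing_sections` is a hypothetical helper.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RcaReport:
    """Hypothetical record mirroring the output sections; not Ora's schema."""
    presented_problem: str
    chosen_framework_and_rationale: str
    category_analysis: dict = field(default_factory=dict)   # branch -> chains
    root_causes: list = field(default_factory=list)         # category, depth, why root
    evidence_assessment: list = field(default_factory=list)
    corrective_recommendations: list = field(default_factory=list)
    preventive_recommendations: list = field(default_factory=list)
    confidence_and_alternative_framings: str = ""


def missing_sections(report):
    """List the sections that are still empty, in contract order."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


draft = RcaReport(
    presented_problem="server crashes every few weeks",
    chosen_framework_and_rationale="software category set; the failure is an outage",
)
print(missing_sections(draft))
```

A fixed shape like this is what makes the diagnosis auditable: a reviewer can see at a glance which sections were never filled in, instead of inferring gaps from a narrative.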

### Origin and evidence

The method's two halves come from the post-war Japanese quality movement. Kaoru Ishikawa formalized the cause-and-effect (fishbone) diagram and the categorize-first discipline in his *Guide to Quality Control* (1972). Taiichi Ohno built the five-whys descent into the Toyota Production System, recounted in *Toyota Production System: Beyond Large-Scale Production* (1988), as the everyday tool for reaching the cause behind the cause. W. Edwards Deming's *Out of the Crisis* (1982) supplied the surrounding philosophy that gives root cause analysis its bite — that the large majority of failures originate in the *system*, not in the individual operator, so chasing blame is a category error and chasing structure is the work. The lineage carries forward into the formal incident-investigation methodologies of aviation safety and healthcare.

### Applications and common uses

- **Manufacturing and operations.** The native use: a recurring defect or line stoppage traced to the process condition that produces it.
- **Software incident and postmortem review.** The blameless postmortem is root cause analysis by another name — outage to proximate trigger to the monitoring, testing, or design gap beneath it.
- **Service-quality problems.** Recurring complaints, wait-time spikes, or error rates traced past the front-line symptom to staffing, scheduling, or policy structure.
- **Safety and healthcare.** Incident investigation where stopping at "operator error" is precisely the failure the method exists to prevent.
- **Team and organizational diagnosis.** Missed deadlines, repeated escalations, or quality slippage traced to estimation practice, capacity policy, or single-person dependencies.

### Failure modes and when not to use it

- **Five-whys over-application.** Driving the chain past a genuine root yields causes that are nominally deeper but useless. The mode terminates at the level you can act on, or at a stated boundary when no actionable cause exists, and names the termination rather than manufacturing a deeper one.
- **The single-cause trap.** Real failures are often multi-causal and convergent; a method that wants a clean chain can impose one. The full fishbone is the guard — it keeps several chains live and flags convergence rather than declaring a single winner.
- **Category tunnelling.** The Ishikawa categories are scaffolding, not truth; a cause that cuts across them can be missed if the categories are treated as boundaries. The mode is willing to surface cross-category causes.

**When not to reach for it.** When the failure runs on **feedback loops** rather than a linear chain, the causal-loop / systems-dynamics mode fits. When the central difficulty is **competing explanations with evidence on different sides**, that is a hypothesis problem (analysis of competing hypotheses, or process tracing), not a root-cause trace. When the diagnosis is settled and the question is **which intervention to take**, route to a decision mode. And when a failure is genuinely one-off with an obvious cause, running the full apparatus produces noise, not signal.

## Related

- **Causal DAG** — the depth-thorough sibling in the same territory: when the causal structure deserves a formal directed graph that exposes confounders and mediators, not a single chain.
- **Systems Dynamics (Causal)** — the mode for when the failure is sustained by feedback loops and delays rather than a one-way chain — the boundary this mode hands off across.
- **Process Tracing** — the sibling for a single, evidence-rich historical case where reconstructing the exact pathway is the whole task.
- **Fishbone Diagram**, **Five Whys**, and **Fundamental Attribution Error** — the three lenses this mode loads: go wide, go deep, and refuse "human error" as a stopping point.

## Sources

- [Ishikawa, Kaoru (1972), Guide to Quality Control, Asian Productivity Organization](https://openlibrary.org/works/OL120949W)
- [Ohno, Taiichi (1988), Toyota Production System: Beyond Large-Scale Production, Productivity Press](https://openlibrary.org/works/OL8681881W)
- [Deming, W. Edwards (1982), Out of the Crisis, MIT Center for Advanced Engineering Study](https://openlibrary.org/works/OL3190937W)
