---
name: Swiss Cheese Model
status: active
territory: risk-and-failure-analysis
host_mode: fragility-antifragility-audit
also_loadable_in:
  - pre-mortem-action
  - pre-mortem-fragility
  - process-mapping
  - red-team-assessment
  - root-cause-analysis
msi_wired: true
msi_family: risk
sources:
  - title: "Reason, James (2000), Human error: models and management, BMJ 320(7237):768-770"
    url: https://doi.org/10.1136/bmj.320.7237.768
  - title: "Perneger, Thomas V. (2005), The Swiss cheese model of safety incidents: are there holes in the metaphor?, BMC Health Services Research 5:71"
    url: https://doi.org/10.1186/1472-6963-5-71
---

# Swiss Cheese Model

## Why it matters

Catastrophe almost never comes from one big failure — it comes from several small holes in separate defenses lining up just long enough for trouble to pass straight through.

For example, a patient gets the wrong drug. Trace it back and there was never a villain. The prescription was legible but written in a hurry; the pharmacy checked it but was short-staffed; the ward's barcode scanner was broken that week; the nurse who'd have caught it was covering a second ward. Every one of those defenses usually works. They all happened to gap at the same hour, on the same patient, and the hazard walked through the tunnel they briefly opened. Four small, survivable problems, no single one of them the "cause."

- **What it reveals.** That safety lives in *layers*, and a layer is never a wall — it's a slice of cheese with holes. The diagnosis isn't "which layer failed" but "did the holes across the layers line up," which is a question about the *whole stack*, not any one slice.
- **How it changes the read.** You stop hunting for the single broken part or the one person to blame, and start asking which gaps were free to coincide — and *why* they were there long before anyone slipped. The proximate slip is the last hole, not the story.
- **When to foreground it.** Any defense-in-depth setup — safety, security, quality, reliability — where multiple safeguards stand between a hazard and harm, and you need to know whether they're genuinely independent or quietly fail together.
- **What you'd miss without it.** The alignment, and the holes that were built in long ago and just sat there waiting. Blame the operator at the sharp end and you fix the one slip while leaving every latent hole exactly where it was — so the next alignment is only a matter of time.
- **Where it misleads.** Counting slices is not the same as being safe: a fifth layer that fails under the *same* conditions as the other four (the same fatigue, the same deadline, the same bad data feed) adds a slice whose holes line up with theirs and buys almost nothing. And the picture is a way to find systemic patterns, not a tool for assigning individual blame.

## Realtime examples

See real, dated analyses where this pattern shaped the read on the news → **[The Swiss cheese model on Main Street Independent](https://mainstreetindependent.com/analyses/lens/risk/swiss-cheese-model)**

## How to invoke it in Ora

You're looking at a set of layered defenses — the training, the checklist, the alarm, the review, the backup — and you want to know where the gaps could line up into one failure that runs clean through all of them.

Describe the defenses and the failure you're worried about, and ask:

> "Fragility audit of our layered safety defenses: where could the gaps line up into one catastrophic failure, and what is the tail risk?"

Ora maps each defensive layer, finds the holes in every one — the latent weaknesses built in long ago and the active slips at the sharp end — checks whether those holes are independent or fail together, traces the trajectory where they align, and recommends both the holes to close and the genuinely independent layers worth adding.

One thing to know: the words *fragility*, *tail risk*, *stress-test*, or *defenses* are what route you here. A plain "is our safety process good?" gets a clarifying question instead, because nothing in it says you want the layered defenses stress-tested for aligned holes rather than the process judged on its merits.

Describe the *layers* concretely — what each one is, when it tends to fail, and what it depends on (a person, a data feed, a quiet moment) — because the entire analysis turns on whether two layers lean on the *same* thing. Two safeguards that both fail when the night shift is short aren't two layers; they're one.
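As a rough illustration of why the dependency list matters, here is a minimal Python sketch that takes a hand-written description of a stack and reports which dependencies are shared. The layer names, the dependencies, and the helper functions `shared_dependencies` and `effective_layer_count` are all hypothetical, invented for this example; this is not how Ora represents layers internally. Merging every pair of layers that share a dependency is a deliberately pessimistic simplification.

```python
from collections import defaultdict

# Hypothetical stack: each layer maps to the things it depends on.
LAYERS = {
    "prescription check": {"prescriber attention"},
    "pharmacy review": {"pharmacy staffing"},
    "barcode scan": {"scanner hardware", "night-shift staffing"},
    "nurse double-check": {"night-shift staffing"},
}


def shared_dependencies(layers):
    """Return {dependency: [layers]} for dependencies used by 2+ layers."""
    users = defaultdict(list)
    for name, deps in layers.items():
        for dep in deps:
            users[dep].append(name)
    return {dep: sorted(names) for dep, names in users.items() if len(names) > 1}


def effective_layer_count(layers):
    """Count layers after merging any that share a dependency (union-find)."""
    parent = {name: name for name in layers}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for names in shared_dependencies(layers).values():
        for other in names[1:]:
            parent[find(other)] = find(names[0])

    return len({find(name) for name in layers})


print(shared_dependencies(LAYERS))
print(effective_layer_count(LAYERS))  # 3: the two night-shift layers count as one
```

On this made-up stack, four nominal layers collapse to three, because the barcode scan and the nurse double-check both lean on night-shift staffing.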

One thing Ora won't do: hand you a clean bill of health because you have a lot of layers. The audit is adversarial by design: a stack with no apparent aligned-hole path is assumed to be hiding a shared failure mode, and the audit keeps looking until it finds one or the stack earns the all-clear.

## How it works

Picture every defense you have against some disaster as a slice of Swiss cheese, and stack the slices one behind another: the training, then the checklist, then the alarm, then the supervisor, then the backup. A hazard has to get through all of them, front to back, before anyone gets hurt.

Now the honest part. Every slice has holes. No defense is perfect — training fades, checklists get skipped under pressure, alarms get muted, a supervisor looks away, a backup was wired up wrong years ago. If you demanded a slice with no holes you'd never build anything. So the trick was never to make a perfect layer. The trick is that the holes are in *different places*. A hazard that finds a hole in the first slice runs straight into solid cheese on the second; if it slips that, the third stops it. Most of the time, somewhere in the stack, there's cheese where the hazard is.

That's why disaster needs something rare and almost unlucky: a single moment when the holes in *every* slice happen to line up, opening one clean tunnel from front to back. The hazard goes through untouched. And it explains the thing that makes real accidents so disorienting — there usually isn't one big blunder to point at. There's a handful of small failures, each one survivable on its own, each one the kind of thing that happens all the time, that briefly aligned. You go looking for the broken part and find five things that were each *almost* fine.

This picture is James Reason's, and his sharpest move was to notice that the holes come in two very different kinds. Some are *active* failures: the slip right at the sharp end, the operator's mistake, the thing that happens in the last second before the accident. Those are the ones everyone sees, because they're closest to the harm, and they're the ones that get the blame. But most of the holes were already there. Reason called them *latent* conditions: weaknesses built into the system long ago, by decisions made far from the front line, such as the staffing level set in a budget meeting, the alarm threshold chosen for convenience, the data feed nobody updated, the deadline that quietly shortened every safety margin. Latent conditions don't cause accidents by themselves. They sit there, sometimes for years, widening the holes and waiting for an active failure to line up with them.

Once you see the two kinds, the usual response to an accident looks backwards. The instinct is to find the person who made the last slip and fix *them* — retrain, reprimand, add a warning label. But the active failure was just the last hole in the tunnel; the latent holes that let it through are still exactly where they were, so the same tunnel can open again next week with a different person at the end of it. The more durable fix runs the other way: go after the latent holes, and make sure the layers are genuinely independent — that they don't all gap under the same fatigue, the same time pressure, the same single data source. Because the failure that ruins you isn't the hole in any one slice. It's the day they all line up.
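To put rough numbers on why independence matters, the sketch below compares two stacks whose layers have the *same* individual failure rate. In one the layers fail independently; in the other a single shared stressor (fatigue, say) weakens all of them at once. The common-cause model and every number in it are assumptions chosen for illustration, not measured rates and not part of Reason's model.

```python
from math import prod


def p_aligned_independent(marginals):
    """Probability every layer has a hole at once, if layers fail independently."""
    return prod(marginals)


def p_aligned_common_cause(q, p_stress, p_base, n_layers):
    """Same probability when one shared stressor hits all layers together.

    With probability q the stressor is present and each layer fails with
    p_stress; otherwise each fails with p_base. Given the stressor state,
    the layers fail independently of one another.
    """
    return q * p_stress**n_layers + (1 - q) * p_base**n_layers


# Illustrative numbers only (assumptions, not measured rates).
q, p_stress, p_base, n = 0.05, 0.8, 0.05, 4

# Each layer's overall failure rate is identical in both scenarios.
marginal = q * p_stress + (1 - q) * p_base

independent = p_aligned_independent([marginal] * n)
correlated = p_aligned_common_cause(q, p_stress, p_base, n)

print(f"per-layer failure rate:      {marginal:.4f}")
print(f"aligned, independent layers: {independent:.2e}")
print(f"aligned, shared stressor:    {correlated:.2e}")
print(f"ratio: {correlated / independent:.0f}x")
```

With these assumed inputs, each layer looks equally reliable on paper in both cases, yet the stack with the shared stressor lets a hazard through roughly 350 times as often. The per-layer numbers never show that difference; only the question of whether the holes move together does.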

## Framework & implementation

*This section uses Ora's own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.*

### Pipeline execution

The Swiss cheese model is one of the mental models the Fragility Antifragility Audit carries in its **`ANALYTICAL PERSPECTIVES`** block under "always loaded" — it isn't the audit's foundational lens (that's the fragility/antifragility model itself), but it's resident on every run, ready to engage the moment the system in question is a stack of defenses. The audit runs at **Gear 4**, Ora's most thorough setting: a **Depth analyst** and a **Breadth analyst** read the system independently, each critiques the other's reading, both revise under that critique, and a consolidator merges what survives. The lens threads through those stages like this.

**Detection.** The lens engages on the cases in its **Detection Signals** — a failure that passed through multiple safeguards that should have caught it; a defense-in-depth strategy being designed for safety, security, or quality; a post-mortem that needs to say which layers failed and why; a question of whether existing layers are *truly* independent or share a common failure mode; a decision about where a new layer would do the most good. The precondition is a system with multiple defensive layers, layer failures that can be at least partly observed, and a cost of failure high enough to justify the work.

**The Depth and Breadth analysts.** Two models read the system in parallel. The **Depth analyst** commits to one reading and defends it, running the lens's **Application Steps**: list every defensive layer between the hazard and the harm; for each layer, identify its holes — the conditions under which it fails to catch the problem; and then the step that does the real work, checking for *correlated* holes — layers that fail under the same condition (the same fatigue, the same time pressure, the same data source). It distinguishes each hole's **active failure** (the act, visible at the time of the accident) from its **latent condition** (the pre-existing organizational or design weakness the active failure merely exposes), since the durable fixes target the latter. The **Breadth analyst** works the same system at the same time, hunting the holes the first reading would miss — latent conditions buried upstream of the operators, and shared dependencies that make two nominally separate layers secretly one. Neither sees the other's work. Together they answer the mode's CQ1 (the three-way fragile/robust/antifragile classification this defensive stack falls under) and, centrally for this lens, CQ2 (the **hidden concavity** — latent failures are often invisible until the alignment exposes them) and CQ4 (**via negativa** — the move to *remove* holes, not only stack more layers on top).
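The Application Steps can be pictured as a small data exercise. The sketch below is a hand-rolled illustration, not Ora's implementation: the `Layer`, `Hole`, and `HoleKind` types, the three helper functions, and the example stack are all hypothetical. It lists layers, attaches holes tagged active or latent, then looks for conditions that open more than one layer and for conditions that open every layer.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class HoleKind(Enum):
    ACTIVE = "active"  # unsafe act at the sharp end
    LATENT = "latent"  # pre-existing organizational or design weakness


@dataclass(frozen=True)
class Hole:
    condition: str  # condition under which the layer fails to catch the problem
    kind: HoleKind


@dataclass
class Layer:
    name: str
    holes: list = field(default_factory=list)


def correlated_conditions(stack):
    """Return {condition: [layer names]} for conditions that open 2+ layers."""
    opened = defaultdict(set)
    for layer in stack:
        for hole in layer.holes:
            opened[hole.condition].add(layer.name)
    return {c: sorted(names) for c, names in opened.items() if len(names) > 1}


def aligning_conditions(stack):
    """Return conditions that open a hole in every layer of the stack."""
    shared = correlated_conditions(stack)
    return sorted(c for c, names in shared.items() if len(names) == len(stack))


def latent_holes(stack):
    """List (layer, condition) pairs for latent holes, the durable fix targets."""
    return [(layer.name, hole.condition)
            for layer in stack
            for hole in layer.holes
            if hole.kind is HoleKind.LATENT]


# Hypothetical three-layer stack (layer names are assumed to be unique).
stack = [
    Layer("training", [Hole("time pressure", HoleKind.LATENT)]),
    Layer("checklist", [Hole("time pressure", HoleKind.ACTIVE),
                        Hole("short staffing", HoleKind.LATENT)]),
    Layer("supervisor review", [Hole("time pressure", HoleKind.LATENT),
                                Hole("short staffing", HoleKind.LATENT)]),
]

print(correlated_conditions(stack))
print(aligning_conditions(stack))  # ['time pressure']
print(latent_holes(stack))
```

In this toy stack, short staffing opens two of the three layers, while time pressure opens all three and is therefore the one condition that lines every hole up.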

**Cross-adversarial evaluation.** Each analyst's reading is handed to the *other* to critique. The lens's signature failures are caught here, keyed to its **Common Failure Modes**: adding a layer that shares a failure mode with an existing one and calling it redundancy (*independence assumption* — the evaluator demands the new layer be shown to fail under *different* conditions than the layers it backs up, or strikes the claim of added safety); fixating on the immediate slip while the latent conditions go untraced (*active-failure focus*); and piling on layers with no regard for marginal value (*layer-count fetishism* — more slices whose holes line up with the existing ones). A reading that names a proximate cause but no systemic factors is sent back as a required fix.
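Two of these critiques are mechanical enough to sketch in code. The function below is an illustration only: `required_fixes` and the dictionary keys it reads are hypothetical names invented here, and the real evaluator is a model reading prose, not a rule engine. Layer-count fetishism is left out because judging marginal value needs cost and risk estimates that this toy structure does not carry.

```python
def required_fixes(reading):
    """Return the fixes a critic might demand for one analyst's reading.

    `reading` is a hypothetical dict with keys:
      proximate_cause   str
      systemic_factors  list of str
      existing_layers   {layer name: set of failure conditions}
      proposed_layers   {layer name: set of failure conditions}
    """
    fixes = []

    # Active-failure focus: a proximate cause with no systemic factors behind it.
    if reading.get("proximate_cause") and not reading.get("systemic_factors"):
        fixes.append("active-failure focus: trace the latent conditions")

    # Independence assumption: a proposed layer that fails under conditions
    # already known to defeat an existing layer.
    existing = reading.get("existing_layers", {})
    for name, conditions in reading.get("proposed_layers", {}).items():
        for old_name, old_conditions in existing.items():
            overlap = conditions & old_conditions
            if overlap:
                fixes.append(
                    f"independence assumption: '{name}' shares "
                    f"{sorted(overlap)} with '{old_name}'"
                )
    return fixes


# Made-up reading that trips both checks.
reading = {
    "proximate_cause": "nurse skipped the scan",
    "systemic_factors": [],
    "existing_layers": {"barcode scan": {"short staffing"}},
    "proposed_layers": {"second nurse sign-off": {"short staffing", "fatigue"}},
}

for fix in required_fixes(reading):
    print(fix)
```

The second nurse sign-off looks like redundancy, but it fails under the same short staffing that defeats the scan, so the claim of added safety does not survive the check.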

**Revision and claim-check.** The reviser addresses the fixes. Where the reading rests on a factual claim — a real failure trace, an actual dependency between two layers, a true staffing or data-source fact — that claim is marked a **flagged claim** and sent to a web-search tool; it has to resolve against outside sources before the revised draft moves forward.

**Consolidation and output.** The consolidator merges the two revised readings, and the formatter places them into the mode's set sections. The defensive stack and the failure trace are stated where the audit locks its subject, in **System or strategy locked**. The aligned-holes finding lands in **Concave exposures**: each layer's holes are catalogued, tagged *latent* or *active*, and the exposure is named concave because the loss is small and survivable until the holes align, at which point catastrophe arrives all at once (the alignment is the downside-bending tail). The aligned-hole trajectory itself, the specific path a hazard takes through the lined-up gaps, lands in **Tail risk assessment**, held apart from the ordinary day where the layers do their job. The two recommendation sections split the response cleanly: closing the latent holes (training, automation, checklists, fresh-eyes review that shrink each gap) lands in **Via negativa recommendations**, and adding *independent* layers, ones whose failure modes don't correlate with the existing stack, lands in **Addition recommendations** beside it, never instead of it. A **Confidence per finding** rating closes out each finding.
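As a quick reference, the placement described above can be written as a lookup table. The section names are the ones given in the text; the finding-type keys and the `place_findings` helper are hypothetical labels made up for this sketch, and the section order simply follows the order of mention, which may not match the mode's actual layout.

```python
# Section names as given in the text, in order of mention.
SECTIONS = [
    "System or strategy locked",
    "Concave exposures",
    "Tail risk assessment",
    "Via negativa recommendations",
    "Addition recommendations",
]

# Hypothetical finding types mapped to the section each lands in.
SECTION_FOR_FINDING = {
    "defensive_stack": "System or strategy locked",
    "failure_trace": "System or strategy locked",
    "layer_holes": "Concave exposures",
    "aligned_trajectory": "Tail risk assessment",
    "close_latent_hole": "Via negativa recommendations",
    "add_independent_layer": "Addition recommendations",
}


def place_findings(findings):
    """Group (finding_type, text) pairs under their section headings.

    Raises KeyError for a finding type that is not in the table.
    """
    report = {section: [] for section in SECTIONS}
    for kind, text in findings:
        report[SECTION_FOR_FINDING[kind]].append(text)
    return report


report = place_findings([
    ("layer_holes", "checklist skipped under time pressure (latent)"),
    ("add_independent_layer", "automated dose-range check"),
])
for section, items in report.items():
    print(section, items)
```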

**What the analysis will not assert.** It reports where the holes are, whether they're free to align, and what closes or offsets them. It does not hand back a clean bill of health to be reassuring — the audit's character is adversarial, and a stack it can't find an aligned-hole path through is assumed to be hiding a shared failure mode rather than declared safe. And it will not use the model to assign individual blame: the point of separating latent conditions from active failures is precisely to move the diagnosis off the person at the sharp end and onto the systemic pattern that put the holes there.

### Origin and evidence

The model comes from James Reason, the British psychologist who reframed human error as a property of *systems* rather than of careless individuals. He set it out in *Human Error* (1990), developed it for organizational accidents in *Managing the Risks of Organizational Accidents* (1997), and gave it its most-cited statement in a short 2000 *BMJ* paper, "Human error: models and management," which contrasts the *person approach* (blame the operator, exhort them to try harder) with the *system approach* (assume fallible humans are a given and build layered defenses that catch their inevitable slips). The model's core image (successive slices of defense, each with holes that shift and move, an accident occurring only when the holes momentarily line up) is Reason's, and its enduring line is his: defenses in depth work *not because each barrier is perfect, but because the weaknesses in each are offset by the strengths in others*. The active/latent distinction is the model's analytic backbone: active failures are the unsafe acts of people in direct contact with the system, and latent conditions are the "resident pathogens" (Reason's own metaphor) seeded by upstream decisions and lying dormant until they combine with active failures and local triggers. The picture has been adopted across aviation, nuclear power, healthcare, and engineering as the standard mental model of defense in depth, and it has been criticized productively too: Thomas Perneger's 2005 examination ("are there holes in the metaphor?") presses on its ambiguities, notably that it can be read to imply the holes are independent and randomly placed when in practice they are often correlated by common causes, which is exactly the failure the audit is built to catch.

### Applications and common uses

The Swiss cheese model is the working vocabulary of defense in depth — used both to *audit* an existing set of safeguards and to *design* a new one so its layers don't fail together.

- **Healthcare and patient safety.** The model's adopted home: medication-error analysis, surgical checklists, and incident review are routinely framed as layers and holes, and the system-not-person reframe underwrites the whole modern patient-safety movement — investigate the latent conditions, not just the nurse or the surgeon at the sharp end.
- **Aviation and nuclear power.** The high-reliability domains where defense in depth is doctrine. Accident investigation traces the trajectory through pilot or operator actions, procedures, automation, and supervision, and the central design discipline is verifying that the layers are genuinely independent rather than sharing a common-cause failure.
- **Cybersecurity.** Layered controls — perimeter, authentication, monitoring, backups — are slices, and the sharp question is correlation: a single stolen admin credential or one unpatched dependency that opens holes across several layers at once is the aligned tunnel an attacker walks through.
- **Software reliability and engineering safety.** Deployment pipelines (tests, review, staging, canary) and safety-critical control systems are read as defensive stacks; the model pairs naturally with normal-accident theory and the fragility audit, and the recurring fix is to make the layers independent — different data, different reviewers, different failure conditions — rather than simply adding more of the same.
- **Organizational risk and post-mortems.** Beyond safety, any blameless post-mortem leans on the model to separate the proximate trigger from the latent organizational conditions — the staffing, the incentives, the deadlines — that quietly enlarged the holes long before the incident.

In every case the payoff is the same: a map of where the holes are, an honest verdict on whether they're free to line up, and a fix aimed at the *latent* gaps and the *independence* of the layers — not at the unlucky person who happened to be standing at the last hole.

### Failure modes and when not to use it

The lens's characteristic ways of going wrong are catalogued in its **Common Failure Modes**, joined by the misapplications named in its lens file:

- **Independence assumption.** Treating layers as independent without verifying it — adding a safeguard that fails under the very same conditions as the ones it's meant to back up. The tell is a "redundant" layer that goes down whenever the existing layer goes down. The fix is to explicitly test for layer correlation before trusting the redundancy.
- **Active-failure focus.** Fixating on the immediate unsafe act while the latent conditions go untraced. The tell is a post-mortem that names the proximate cause and stops, with no systemic factors behind it. The fix is to trace the latent condition behind every hole.
- **Layer-count fetishism.** Adding layers without regard for marginal value, as if more slices were automatically more safety. The tell is the cost of layers climbing without a matching drop in risk. The fix is to prefer *shrinking the holes* in existing layers over stacking on redundant ones whose holes line up with what's already there.
- **Blame by metaphor.** Using the model to pin the failure on the individual at the sharp end — the opposite of its purpose. The model exists to move the diagnosis from the person to the system; reading it as a way to locate one culpable hole inverts it.
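The first and third failure modes can be made concrete with a little arithmetic. The sketch below uses an assumed shared-stressor model in which every layer has one failure probability on a calm day and a worse one when a common stressor is present, then compares three possible fixes. The model, the `breach_probability` helper, and all the numbers are illustrative assumptions, not data.

```python
from math import prod


def breach_probability(layers, q):
    """Probability all layers fail at once under a shared-stressor model.

    Each layer is a (p_base, p_stress) pair: its failure probability without
    and with the stressor. The stressor is present with probability q.
    """
    calm = prod(p_base for p_base, _ in layers)
    stressed = prod(p_stress for _, p_stress in layers)
    return (1 - q) * calm + q * stressed


Q = 0.05                   # assumed chance the shared stressor is present
STACK = [(0.05, 0.8)] * 4  # four layers that all weaken under the stressor

options = {
    "baseline": STACK,
    "add correlated 5th layer": STACK + [(0.05, 0.8)],
    "shrink one layer's stress hole": STACK[:3] + [(0.05, 0.4)],
    # Same overall failure rate as the others, but unaffected by the stressor.
    "add independent 5th layer": STACK + [(0.0875, 0.0875)],
}

for name, layers in options.items():
    print(f"{name:32s} {breach_probability(layers, Q):.5f}")
```

Under these assumptions the baseline breach probability is about 0.0205. A fifth layer that shares the stressor only brings it to about 0.0164, halving one existing layer's stress hole brings it to about 0.0102, and a fifth layer that is genuinely independent brings it to about 0.0018. More slices help far less than either smaller holes or real independence.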

**When not to reach for it.** When there's really only *one* line of defense, there's no stack of holes to align and the layered picture adds nothing; analyze that single barrier directly. When the layers can't be observed even partly, the model has nothing to work with. And when the failure of interest isn't a hazard slipping through *defenses* but something else (a slow drift in the system's own behavior, or a pure capacity or design limit), a different lens fits the question better: normal-accident theory for tightly coupled complexity, normalization-of-deviance for drift, the fragility audit for the shape of the response.

## Related

- **Fragility Antifragility Audit**: the analysis this lens runs inside; reads how a system responds to volatility and stress, with layered defenses as one concave, tail-exposed shape.
- **Normal Accident Theory**: a sibling in the same audit; in tightly coupled, complex systems, the conditions that line the holes up are structurally *normal*, not exceptional.
- **Taleb Fragility and Antifragility**: the foundational lens of the host audit; a stack of defenses whose holes can align is a textbook concave, tail-exposed position.
- **Normalization of Deviance**: what widens the holes over time, through small accepted shortcuts that "work fine" until the day the gaps they opened all line up.

## Sources

- [Reason, James (2000), Human error: models and management, BMJ 320(7237):768-770](https://doi.org/10.1136/bmj.320.7237.768)
- [Perneger, Thomas V. (2005), The Swiss cheese model of safety incidents: are there holes in the metaphor?, BMC Health Services Research 5:71](https://doi.org/10.1186/1472-6963-5-71)