Cleveland-McGill Perceptual Tasks

Why it matters

Which chart “reads better” isn’t a matter of taste — it’s been measured. People decode some visual encodings far more accurately than others, and the ranking, from most accurate to least, is known.

For example: the same five numbers are drawn twice — once as a pie chart, once as a bar chart. Ask a room of people to read off the values, and the bar chart wins, badly. It isn’t that bars are prettier or that pies are old-fashioned. It’s that judging where a bar-top sits against a shared axis is a task the eye does almost exactly, while judging the angle of a pie slice is a task the eye does poorly — and that gap was put to the test and quantified. The pie hasn’t gotten any less charming. It’s just been caught being harder to read.

  • What it reveals. Which of a graphic’s encodings the eye can read accurately and which it can only read roughly — graded against an experimentally measured ranking, not an opinion about chart types.
  • How it changes the read. You stop arguing “bar versus pie versus bubble” on style and start asking “is this graphic’s most important comparison carried by an encoding people read accurately, or one they don’t?”
  • When to foreground it. Any chart whose job is to let a reader extract or compare values — and especially one reaching for pies, bubbles, 3-D, treemaps, or color-as-quantity.
  • What you’d miss without it. That a clean, honest, well-labeled chart can still under-communicate, because it parked its key comparison in an encoding the eye reads imprecisely — a failure no amount of decluttering fixes.
  • Where it misleads. The ranking measures value-reading accuracy, not every purpose a graphic serves. A pie can be fine for a rough part-of-whole gist, and a map’s color may be worth keeping for the spatial pattern it shows — accuracy is one goal among several, not the only one.

How to invoke it in Ora

You have a chart, dashboard, or infographic, and you want to know whether its most important comparison is encoded in a way people can actually read.

Attach the graphic (or describe it precisely) and ask:

“Critique this chart for information density — is the main comparison encoded in a way people read accurately, and what’s the redesign?”

This perceptual ranking is one of the always-loaded tools of the Information Density analysis. It isn’t a thing you summon on its own — it rides along whenever you ask for the density critique. Ora identifies what the viewer’s eye actually has to do to read each value (judge a position, a length, an angle, an area, a shade), grades that task against the accuracy ranking, and — when a load-bearing comparison is sitting in a low-accuracy encoding — tells you which higher-accuracy encoding would carry it better.

One thing to know: phrases like information density, visual hierarchy, Tufte, data-ink ratio, chartjunk, or Bertin are what route you here. Asking to “rank the perceptual tasks” by name won’t summon it on its own — invoke the density analysis and this evidence comes with it. A clear image, or a mark-by-mark description (which dimension is drawn as position, which as angle, which as area or color), gives the analysis the most to work with.

Say which comparison the graphic is for — the one number-reading the reader most needs to get right. The ranking is about precise value-reading, so the analysis prioritizes the encoding carrying that comparison; it won’t condemn a pie used for a rough impression or a choropleth kept for its spatial pattern, but it will flag either if a precise magnitude claim is leaning on it.

One thing Ora won’t do: turn the ranking into a blanket ban (“never use pie charts”). It treats the ranking as a defeasible guide — prefer the higher-accuracy encoding unless a named reason (spatial context, convention, audience) justifies the lower one, and when it does, say so and note the precision cost.

How it works

The pie chart versus the bar chart is one of the oldest arguments in design, and for most of its life it was settled the way arguments about taste usually are — loudly, and by whoever cared most. In 1984, two statisticians at Bell Labs decided to settle it a different way: with data. William Cleveland and Robert McGill built graphs that encoded the same numbers in different visual forms, showed them to people, and simply measured how accurately the viewers could read the values back. Not which chart people preferred. Which chart people read correctly.

What came out was a ranking. They broke the act of reading a chart into a set of elementary jobs the eye does — they called them perceptual tasks — and ordered them from the one people perform most accurately to the one they perform least. At the very top is judging position along a common scale: where dots or bar-tops sit against the same axis. This is the task a dot plot or an ordinary bar chart asks of you, and people are astonishingly good at it — they read the values back almost exactly, because the shared axis gives the eye a continuous ruler to measure against.

Then accuracy starts to slide. A notch down is judging position on scales that don’t line up — comparing across panels that each have their own axis, where the eye has to mentally bridge the gap. Below that is judging length on its own, a bar segment floating without a baseline to anchor it. Below that is judging angle and slope — and this is the pie chart’s task, reading a value off the angle of a wedge. That placement is the punchline of the whole 1984 experiment: the pie isn’t bad because it’s unfashionable, it’s bad because angle is a job the eye does measurably worse than position, so the very same numbers come back fuzzier. Keep descending and it gets worse still: area is next (the bubble chart, where a circle’s size stands for a value, and people reliably underestimate the big ones), then volume (anything reaching into a fake third dimension), and at the bottom, color shading — judging a quantity from how light or dark a patch is, which the eye can barely rank, let alone measure.
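
The descent described above can be written down as an ordered list. The sketch below is a minimal illustration in Python, not anything from Ora itself: the names TASKS_BY_ACCURACY and reads_more_accurately are hypothetical, and the order simply follows the two paragraphs above, most accurate first.

```python
# Perceptual tasks in the order the text describes them,
# from most accurately read to least accurately read.
TASKS_BY_ACCURACY = [
    "position on a common scale",     # dot plot, ordinary bar chart
    "position on nonaligned scales",  # panels that each have their own axis
    "length",                         # bar segment with no shared baseline
    "angle or slope",                 # pie wedge
    "area",                           # bubble chart
    "volume",                         # fake third dimension
    "color shading",                  # light-to-dark patches
]


def reads_more_accurately(task_a: str, task_b: str) -> bool:
    """Return True if task_a sits higher in the ordering than task_b."""
    return TASKS_BY_ACCURACY.index(task_a) < TASKS_BY_ACCURACY.index(task_b)


# The bar-versus-pie argument, restated as a lookup.
print(reads_more_accurately("position on a common scale", "angle or slope"))
```

The list only records order. It says nothing about how large the accuracy gap between two neighbors is.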

So the reveal is this. The eye is not a neutral instrument that reads every chart equally well; it has a measured hierarchy of decoding accuracy, and where you place a comparison on that hierarchy decides how accurately your reader can make it. This is the Cleveland-McGill ranking of perceptual tasks, and it turns “which chart should I use?” from a stylistic preference into an engineering choice: figure out the most important comparison your graphic has to support, and encode it with a task near the top of the list — position or length — rather than one near the bottom. A quarter-century later, Jeffrey Heer and Michael Bostock re-ran the core of the experiment with thousands of online participants and got the same ordering back, which is about as much confirmation as a finding in this field ever gets. The argument that used to be settled by taste turned out to have a right answer, and we can read it off a ruler the eye supplies for free — provided the chart lets it.

Framework & implementation

This section uses Ora’s own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.

Pipeline execution

The Cleveland-McGill ranking is one of the always-loaded perspectives of the Information Density analysis. Its lens file marks it foundational: true, and it sits in the mode’s ANALYTICAL PERSPECTIVES block alongside the lens that founds the mode (Tufte’s data-ink and chartjunk), the encoding vocabulary (Bertin’s visual variables), and the perception-of-grouping and composition lenses. The mode runs at Gear 4, Ora’s most thorough setting: a Depth analyst and a Breadth analyst each read the graphic independently, then critique each other and revise. Where an image is attached, the mode can mark its findings directly on it via an annotated visual overlay.

Where the lens engages. It activates on its Detection Signals — a graphic that leans on pie charts, bubble charts, 3-D, area-based encodings, or color-as-quantity, so that a task ranked low for accuracy is doing significant analytical work; a magnitude comparison being claimed in the analysis that the chart’s encoding can’t actually support at the implied precision; a choropleth or heatmap used to communicate magnitudes (not just spatial pattern) without acknowledging the precision penalty; or a redesign under consideration that needs the empirical ranking to choose among alternatives. Its Application Steps run the grade: identify each quantitative dimension being encoded; for each, name the perceptual task the viewer must actually perform to read it; locate that task in the ranking and record its rank; identify higher-ranked tasks that could carry the same dimension; recommend re-encoding to the highest-ranked task the graphic’s constraints permit; and flag any quantitative claim resting on a rank-4-or-lower encoding without acknowledgment of the accuracy limit.
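
The Application Steps above amount to a small grading loop. What follows is a sketch under stated assumptions, not Ora’s implementation: RANK, Encoding, and grade are hypothetical names. The rank numbers for position on a common scale (1), length and angle (3), and color (6) are the ones this page uses; ranks 2, 4, and 5 are filled in from the tiering in the 1984 paper and should be read as an assumption about the lens file.

```python
from dataclasses import dataclass

# Rank per perceptual task (1 = read most accurately).
# Ranks 1, 3, and 6 appear in the text; 2, 4, and 5 are assumed.
RANK = {
    "position_common_scale": 1,
    "position_nonaligned": 2,
    "length": 3,
    "angle": 3,
    "area": 4,
    "volume": 5,
    "color_shading": 6,
}


@dataclass
class Encoding:
    dimension: str               # the quantity being encoded
    task: str                    # what the eye must do to read it
    carries_claim: bool = False  # a quantitative claim rests on it
    acknowledged: bool = False   # the accuracy limit is stated


def grade(encodings, best_allowed="position_common_scale"):
    """Grade each encoding and suggest re-encoding where possible.

    best_allowed is the highest-ranked task the graphic's constraints
    permit; choosing it is a judgment the code does not make.
    """
    report = []
    for enc in encodings:
        rank = RANK[enc.task]
        report.append({
            "dimension": enc.dimension,
            "task": enc.task,
            "rank": rank,
            # Recommend a move only when a better task is available.
            "re_encode_to": best_allowed if RANK[best_allowed] < rank else None,
            # Flag claims resting on rank 4 or lower without acknowledgment.
            "flag": enc.carries_claim and rank >= 4 and not enc.acknowledged,
        })
    return report


chart = [
    Encoding("revenue by region", "area", carries_claim=True),
    Encoding("growth rate", "position_common_scale", carries_claim=True),
]
for row in grade(chart):
    print(row)
```

The task field is supplied by hand here on purpose. Naming what the eye actually has to do is the step that needs a reader, and the table lookup only becomes useful once that is done.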

What it produces in the analysis. This lens does not supply the output’s skeleton — Tufte’s data-ink audit anchors that. What it contributes is the perceptual-task ranking layer: a per-encoding accuracy grade, a re-encoding recommendation when a higher-accuracy task is available, and — where a lower-ranked encoding is correctly retained — the named justification for the deviation. It is the empirical half of a three-way division of labor: Tufte’s data-ink says how much of the ink is doing work, Bertin names which visual variable each piece of data-ink is using, and Cleveland-McGill says how accurately the eye can read that variable — so the prioritized redesign list prefers the encoding the evidence ranks highest for the comparison that matters most.

Cross-adversarial evaluation. At Gear 4 each analyst’s reading is critiqued by the other, which catches this lens’s signature mistakes, keyed to its Critical Questions and Common Failure Modes. The first is grading the chart-type label instead of parsing what the viewer’s eye must do, so a stacked bar gets scored as rank-1 position when only its bottom segment is anchored to the baseline and every segment above it is really rank-3 length-without-baseline (misidentifying-the-perceptual-task). The second is treating the ranking as an absolute and stripping a choropleth of the spatial context that justified its color, purely to chase rank-1 accuracy (ranking-as-absolute). The rest are the four low-accuracy defaults: pie-chart-as-default, 3-D-as-impression, area-as-quantitative-without-acknowledgment, and color-only quantitative encoding. The evaluator presses the sharpest test: for the comparison this graphic most needs to support, is the encoding near the top of the accuracy ranking or near the bottom? And if it is near the bottom, is there a real reason, or just habit?

Honesty discipline. The mode carries a Residual tradeoffs and constraints section, and this lens is a frequent contributor to it, because the highest-accuracy encoding is not always the right one. A map’s color is rank-6 for reading values yet may be exactly right for showing a regional pattern; a pie may be fine for a rough part-of-whole gist where no precise reading is asked. So the analysis names the conflict — accuracy versus spatial context, accuracy versus convention or audience expectation — and weighs it, rather than pretending the ranking is a verdict. The one thing it will not let pass unmarked is a precise magnitude claim leaning on a low-accuracy encoding.
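
The weighing described above can be pictured as a small decision table. The function below is a hypothetical sketch of that logic, not a transcription of the mode’s rules: weigh, its parameters, and the returned strings are all invented for illustration.

```python
from typing import Optional


def weigh(low_accuracy: bool, precise_claim: bool,
          justification: Optional[str]) -> str:
    """Return an illustrative verdict for one encoding.

    low_accuracy: the encoding sits low in the accuracy ranking.
    precise_claim: a precise magnitude claim leans on the encoding.
    justification: a named reason for keeping it, such as
        "spatial context", "convention", or "audience"; None if absent.
    """
    if not low_accuracy:
        return "keep"
    if precise_claim:
        # The one case that is never allowed to pass unmarked.
        return "flag: precise claim on a low-accuracy encoding"
    if justification:
        return f"keep for {justification}; note the precision cost"
    # No named reason, so the default preference applies.
    return "re-encode to a higher-accuracy task"


# A choropleth kept for its regional pattern, with no precise claim attached.
print(weigh(low_accuracy=True, precise_claim=False,
            justification="spatial context"))
```

The order of the checks carries the point: a named justification can earn a low-accuracy encoding its place, but it does not excuse a precise claim that rests on it.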

What the analysis will not do. It will not issue a blanket prohibition (the ranking is a defeasible guide, not a rule against any chart type); it will not score a graphic by its chart-type label without parsing the actual per-dimension task; and it will not apply the value-reading ranking to a purpose the ranking doesn’t govern — gestalt pattern detection or anomaly-spotting, where a colormap can legitimately outperform a long bar chart even though bars win at point-magnitude reading.

Origin and evidence

The ranking is William Cleveland and Robert McGill’s, established in “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods” (Journal of the American Statistical Association, 1984), the controlled magnitude-estimation experiments that decomposed chart-reading into elementary perceptual tasks and ordered them by measured accuracy. Cleveland folded the result into a full graphing methodology in The Elements of Graphing Data (1985), and the pair reached a wider scientific audience the same year in Science. The ordering has held up under replication: Heer and Bostock’s 2010 crowdsourced study (CHI 2010) re-ran the core comparisons with thousands of online participants and recovered the same ranking, extending it to encodings Cleveland and McGill hadn’t tested; Talbot, Setlur, and Anand (2014) confirmed that bolting any non-position cue onto a bar chart — 3-D, gradient fills, perspective — typically lowers reading accuracy with no compensating gain. The area and volume penalties trace to older psychophysics — S. S. Stevens’ power law (1957), which found perceived area and volume grow more slowly than the real quantity, so people systematically underestimate the big ones. And the ranking became machinery as well as evidence: Jock Mackinlay’s 1986 automatic-presentation system encoded it as a ranking function, the ancestor of the “recommended chart” logic in modern visualization tools. The companion traditions complete the picture — Bertin supplies the catalog of visual variables this lens grades, and Tufte supplies the evaluative frame whose redesigns this empirical evidence reinforces.
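
The area penalty can be made concrete with the general form of Stevens’ power law, in which perceived magnitude equals a constant times the actual magnitude raised to an exponent. The sketch below assumes an exponent of 0.7 for area. That is a commonly cited value, but reported exponents vary by study and stimulus, so treat it as illustrative and not as a figure from the sources named above. The function name perceived_ratio is hypothetical.

```python
def perceived_ratio(actual_ratio: float, exponent: float = 0.7) -> float:
    """Perceived ratio between two magnitudes under a power law.

    The constant of proportionality cancels when two stimuli are
    compared, so only the exponent matters. The default of 0.7 is an
    assumed value for area, not a measured one.
    """
    return actual_ratio ** exponent


# One bubble drawn with four times the area of another.
ratio = perceived_ratio(4.0)
print(f"drawn 4.0x larger, perceived about {ratio:.2f}x larger")
```

Under that assumed exponent, a bubble with four times the area reads as only about 2.6 times as large, which is the underestimate of big values the paragraph describes. An exponent of 1.0 would mean the eye reads the quantity in proportion.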

Applications and common uses

The ranking is a working tool wherever a graphic exists to be read for values, used both to diagnose a chart and to choose its replacement.

  • Dashboards and business intelligence. Catching the gauge widgets, donuts, and 3-D pies that encode key metrics in low-accuracy tasks, and moving the load-bearing comparison onto bars or dot plots the eye reads precisely.
  • Scientific and analytical publishing. Choosing encodings for figures readers study closely, where a few percent of reading error changes a conclusion — position-on-a-common-scale by default, area and color only with the precision caveat stated.
  • Data journalism. Deciding when the bubble map or the choropleth is worth its accuracy cost (the spatial story it tells) and when a sorted bar chart serves the reader’s number-reading better.
  • Visualization tooling and defaults. The ranking is the evidence base under “recommended chart” features and the bias toward bars-and-lines in modern charting libraries — Cleveland and McGill, operationalized.
  • Chart redesign and review. The empirical tiebreaker when a redesign has several options on the table: prefer the encoding the ranking puts highest for the comparison that matters, and justify any descent from it.

In every case the payoff is the same: the graphic’s most important comparison ends up in an encoding people read accurately, and any retreat from that — for a map’s pattern, a convention, an audience — is a choice made on the record, with its precision cost named.

Failure modes and when not to use it

The lens’s characteristic ways of going wrong are catalogued in its Common Failure Modes:

  • Pie-chart-as-default. Reaching for a pie (an angle task, rank 3, often with 3-D and perspective penalties piled on) for a quantitative comparison a bar chart (rank 1, position) would convey far more accurately. The tell: the chart asks the viewer to compare slice angles when they could be comparing bar positions. Replace it with bars unless a specific conventional or contextual reason earns the pie.
  • 3-D-as-impression. Adding a third dimension to two-dimensional data in the belief it improves communication. The perspective distortion only lowers reading accuracy with nothing to show for it; remove it.
  • Area-as-quantitative-without-acknowledgment. Using bubbles or other area encodings to carry magnitudes while the caption claims relationships viewers can’t actually extract — people underestimate large areas as a rule. Either give an explicit area-to-value legend or move to a length-based encoding.
  • Color-only quantitative encoding. Asking the reader to read a value off hue alone, with no change in lightness. Use a perceptually-ordered colormap that varies lightness (viridis, magma, grayscale) so the eye has the rank-6a ordinal cue rather than the rank-6b arbitrary one.
  • Ranking-as-absolute. Applying the ranking mechanically — stripping a choropleth of the spatial context that was its whole point just to gain rank-1 accuracy. The ranking is a defeasible guide: keep the lower-ranked encoding when a named constraint justifies it, and acknowledge the precision cost.
  • Misidentifying-the-perceptual-task. Grading by chart-type label rather than by what the eye actually does — scoring a stacked bar as rank-1 position when only its bottom segment touches the baseline and the rest demand rank-3 length-without-baseline reading. Parse the task per data dimension; one chart often mixes several ranks.
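
The stacked-bar case in the last item shows why the task has to be parsed per segment and not per chart. A minimal sketch, with the hypothetical helper segment_tasks and the rank numbers 1 and 3 taken from the text above:

```python
def segment_tasks(segment_names):
    """Map each segment of one stacked bar to the task needed to read it.

    segment_names is ordered from the baseline upward. Only the first
    segment starts at the shared baseline, so only its value can be read
    as a position on the common scale; the others must be read as lengths.
    """
    tasks = {}
    for i, name in enumerate(segment_names):
        if i == 0:
            tasks[name] = ("position on a common scale", 1)
        else:
            tasks[name] = ("length without baseline", 3)
    return tasks


for name, (task, rank) in segment_tasks(["north", "south", "west"]).items():
    print(f"{name}: {task} (rank {rank})")
```

One chart, two ranks: grading the whole bar as rank 1 would overstate how well the upper segments can be read.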

When not to reach for it. When the reader’s job is not to extract precise values — pattern recognition, anomaly-spotting, getting the gestalt of a distribution — the value-reading ranking can mislead, and a low-accuracy encoding (a colormap, a small-multiple field) may genuinely serve better; apply the ranking to the comparisons it governs, not to every visualization purpose. When a hard constraint fixes the encoding — a mapped quantity that must stay on the map for its spatial meaning, a house style that mandates a chart type — the grade still diagnoses but the prescription is blocked, so name the constraint and note the cost. And when the graphic isn’t communicating magnitudes at all — a categorical legend, an illustration — the accuracy ranking has little to say, and a different lens applies.

Related modes and lenses

  • Information Density — the analysis this lens informs; audits a graphic for data-ink, encoding, perceptual accuracy, and the redesign that would sharpen it.
  • Tufte Data-Ink and Chartjunk — the lens that founds the mode and anchors its output skeleton (the data-ink audit, the chartjunk catalog, the integrity check); Cleveland-McGill supplies the evidence for which encodings its redesigns should prefer.
  • Bertin Visual Variables — the catalog of the handful of ways (position, size, value, color, shape, orientation, texture) a mark can carry data, naming which variable each encoding uses; Cleveland-McGill ranks how accurately the eye reads each one.
  • Gestalt Grouping Principles — how the eye groups marks (proximity, similarity, enclosure), which governs whether the chosen encoding reads as the designer intends.