Bertin Visual Variables
Why it matters
A mark on a page can vary in only a handful of ways — where it sits, how big it is, what shape it takes, how dark, what color, which way it tilts, what texture it carries — and matching the right one of those to the kind of data you have is most of what makes a graphic readable.
For example: a map shades each county by a different rainbow hue to show its average income — pink, orange, green, blue, scattered with no pattern — and you simply cannot see where the money is, because the eye has no way to rank one hue above another. Re-encode the same numbers as lightness, pale for low and dark for high, and the map snaps into focus: a dark cluster downtown, fading to pale at the edges. The data never changed. The graphic just switched from a variable the eye can’t rank to one it can — and amount finally became visible.
- What it reveals. Which of the few elementary visual variables — position, size, shape, value, color, orientation, texture — is carrying each piece of data, and whether that variable is one the eye can actually read for the kind of meaning intended.
- How it changes the read. You stop asking “does this chart look clean?” and start asking “is each thing the data is doing encoded by a variable suited to it — an amount by something the eye can measure, a category by something it can tell apart?”
- When to foreground it. Any graphic that encodes more than one thing at once — a map, a multi-series chart, a bubble plot, a dashboard — especially when it feels muddled even though every individual choice seems defensible.
- What you’d miss without it. That a graphic can be spotless — no clutter, no decoration — and still fail, because it asks the eye to read a rank out of color or a magnitude out of shape, jobs those variables simply cannot do.
- Where it misleads. The encoding can be right and the graphic still poor — color blindness can break a color-coded key, and loading four or five variables onto one mark overwhelms the eye no matter how well each is matched. A correct encoding is necessary, not sufficient.
How to invoke it in Ora
You have a chart, map, dashboard, or infographic and you want to know whether each thing it shows is encoded in a way the eye can actually read.
Attach the graphic (or describe it precisely — what each mark’s position, size, color, shape, and shading stand for) and ask:
“Critique this map for information density — is each variable encoded in a way the eye can read, what’s chartjunk, and what’s the redesign?”
The encoding check rides inside the Information Density analysis. Ora names, for each thing the graphic shows, which visual variable is carrying it — position, size, shape, value, color, orientation, or texture — and whether that variable suits the kind of data: an amount needs a variable the eye can measure, a rank needs one it can order, a plain category needs only one it can tell apart. A mismatch (an amount in color, a category in size) comes back as a finding, alongside the data-ink and chartjunk audit and a prioritized redesign.
One thing to know: phrases like information density, visual hierarchy, Tufte, data-ink ratio, chartjunk, or Bertin are what route you here. Naming the lens alone — “apply Bertin’s visual variables” — does not route; describe the graphic and ask for the density critique. A clear image, or a mark-by-mark description that says what each variable encodes, gives the analysis the most to work with.
Say what kind of data each dimension is — a measured quantity, a ranking, or an unordered category — because that is exactly what the encoding is judged against. The same color scheme that is perfect for “which climate zone” is wrong for “how much rainfall,” and the analysis can only catch that if it knows which one the graphic means.
One thing Ora won’t do: call an encoding wrong just because it’s weak. Texture instead of color in a black-and-white print, or size instead of length when a map leaves no room for bars, is a justified compromise — the analysis names the constraint and the tradeoff rather than flagging a forced choice as an error.
How it works
Jacques Bertin was a French cartographer who spent decades doing something most people never think to do: looking, carefully and systematically, at thousands of maps and charts and graphs, and asking a question nobody had answered cleanly. When you put a mark on a page to mean something (a dot, a line, a patch of shading), what are all the ways that mark can vary to carry that meaning? Not which charts are good or bad, but what the raw materials even are. In 1967 he published the answer in a dense and now-famous book, Sémiologie graphique, and the startling thing about the answer was how small and how complete it turned out to be.
The page itself gives you two dimensions for free: a mark’s left-right and up-down location, which together make up its position. Beyond that, Bertin found, a mark can vary in just six more ways, which he called the retinal variables because the eye takes them in at a glance: its size (big or small), its value (light or dark), its texture (the grain or pattern of its fill), its color (its hue: red, blue, green), its orientation (the angle it tilts), and its shape (circle, square, triangle). That is the whole alphabet of visual encoding for a static mark on a page. Nearly every printed chart, map, and diagram is built from those few letters. You can feel how complete it is by trying to invent an eighth: on the printed page it is hard to find one, and later extensions, such as motion, came with animated and interactive displays.
But the alphabet was only half of it. Bertin’s deeper insight was that these variables are not interchangeable: each is good at a different kind of job, and the eye reads them in fundamentally different ways. Some are what he called selective: a single red dot in a field of black ones leaps out instantly, with no scanning, so color and value can isolate a group at a glance. Some are ordered: lightness runs naturally from pale to dark, so the eye reads it as a sequence and value can show rank. Hue cannot, because a question like “is red more than green?” has no natural answer. And only a couple are truly quantitative: position and size alone let the eye read how much, for instance by estimating that one thing is roughly twice another. Shape and hue can say which kind, but never how much.
That is the reveal, and it is why Bertin’s “visual variables” became the grammar underneath the whole field: visual encoding has a fixed alphabet and a set of rules about which letter suits which meaning. Hand the eye a job a variable can’t do — ask it to read an amount out of color, which it cannot rank, or a category out of size, which implies a false order — and the graphic fails no matter how clean, how uncluttered, how beautifully made it is. The clutter you can strip away. A mismatched encoding you cannot strip away; you have to re-encode. Bertin gave us the vocabulary to see the difference — to look at a failing graphic and say not just “it’s too busy” but “it’s asking the eye to do something the eye can’t do here, and here is the variable that would do it.”
Framework & implementation
This section uses Ora’s own terms for the parts of an analysis, so that if you open the actual mode and lens files they line up. Each is glossed in plain language on first use.
Pipeline execution
Bertin’s visual variables are an always-loaded catalog in the Information Density analysis — lens_type: catalog, foundational: true in its lens file, sitting in the mode’s ANALYTICAL PERSPECTIVES block beside the foundational data-ink lens (Tufte), the perceptual ranking (Cleveland-McGill), and the grouping principles (Gestalt). The mode runs at Gear 4, Ora’s most thorough setting — a Depth analyst and a Breadth analyst read the attached graphic in parallel, critique each other, and revise; where an image is attached the mode can mark mismatches directly on it via an annotated visual overlay. Bertin supplies the encoding grammar: where Tufte’s data-ink audit anchors the output skeleton (what is data versus decoration, what to cut), Bertin names what each piece of data-ink is doing and whether that variable fits the data type.
Where the lens engages. It activates on its Detection Signals — a graphic encoding several data dimensions where some look mismatched; a categorical color scheme carrying data that has an order hue can’t preserve; a bubble- or shape-size encoding asked to claim magnitude; a graphic that feels “muddled” despite individually defensible choices; or a black-and-white reproduction where the analyst must know which variables survive when color is removed. Its Application Steps run the encoding audit: receive the graphic and its declared data dimensions, and for each dimension identify which of the seven variables encodes it.
What it produces in the analysis — the visual-variable mapping check. This is the lens’s distinct contribution, and it is a different finding from Tufte’s clutter audit. For each data attribute the graphic shows, the lens classifies the attribute’s type — nominal (an unordered category), ordinal (a rank), or quantitative (a measured amount, with magnitude) — names the visual variable encoding it, and walks the pair through the lens’s Properties Tabulation: the table of which variables are selective (highlight a subset), associative (group a family), ordered (carry rank), and quantitative (carry magnitude). Position alone is fully quantitative; size is quantitative but penalized (length read better than area, area better than volume); value and texture carry order but only weak magnitude; shape and hue carry neither order nor amount. A misfit — an amount encoded by hue, a category encoded by size that implies a false rank, a ranking encoded by shape that gives no perceptual cue of order — is surfaced as a finding, each paired with a re-encoding that names the variable to switch to and the property it supports. This mapping is what Tufte’s pipeline calls on when it needs to say not just that a mark wastes ink but what the mark is doing and whether the eye can read it; Cleveland-McGill then refines which of the valid encodings is most accurate.
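The mapping check described above can be sketched in a few lines of Python. This is a minimal illustration, not Ora's lens file: the sets below are assumptions reconstructed from the prose (position and size as quantitative; position, size, value, and texture as ordered; value and size as the variables that imply a false rank for categories), and `check_encoding` is a hypothetical helper name.

```python
from typing import Optional

# Assumption: these sets are rebuilt from the prose, not copied from Ora.
VARIABLES = {"position", "size", "shape", "value", "color", "orientation", "texture"}
ORDERED = {"position", "size", "value", "texture"}   # can carry rank
QUANTITATIVE = {"position", "size"}                  # can carry magnitude
FALSE_RANK = {"value", "size"}                       # imply order for categories


def check_encoding(data_type: str, variable: str) -> Optional[str]:
    """Return a finding for a misfit, or None when the pairing is acceptable."""
    if variable not in VARIABLES:
        raise ValueError(f"unknown visual variable: {variable}")
    if data_type == "quantitative":
        if variable not in QUANTITATIVE:
            return f"{variable} cannot carry magnitude; re-encode to position or size"
    elif data_type == "ordinal":
        if variable not in ORDERED:
            return f"{variable} gives no cue of order; re-encode to value, size, or position"
    elif data_type == "nominal":
        if variable in FALSE_RANK:
            return f"{variable} implies a rank the data lack; re-encode to color or shape"
    else:
        raise ValueError(f"unknown data type: {data_type}")
    return None


# An income map shaded by hue is a misfit; the same map shaded by value is not.
print(check_encoding("quantitative", "color"))
print(check_encoding("ordinal", "value"))
```

The sketch only separates fit from misfit. It does not model the size penalty (length read better than area, area better than volume) or the weak magnitude that value and texture carry, which the real check weighs as well.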
Cross-adversarial evaluation. At Gear 4 each analyst’s reading is critiqued by the other, which catches the lens’s signature failures — keyed to its Critical Questions and Common Failure Modes: an amount encoded in shape, hue, or orientation (quantitative-in-non-quantitative-variable); a ranking encoded in shape or hue with no perceptual order (ordered-in-non-ordered-variable); an unordered category encoded in value or size so the viewer reads a rank that isn’t there (nominal-in-ordered-variable); a rainbow or qualitative palette asked to carry order with no progression in lightness (hue-as-ordered); a bubble-area encoding assumed to read magnitudes accurately (area-as-fully-quantitative); and texture used where color was freely available (texture-as-default). The evaluator presses the sharpest test the lens carries: is the analyst confusing selective with quantitative — treating a variable that can highlight a group as if it could also encode how much?
Load and honesty discipline. Beyond per-dimension matching, the lens audits the load on each mark — it counts how many variables a single mark carries and flags overload (more than three or four at once), because the eye can attend to two or three variables fluently but a five-variable mark forces sequential reading rather than gestalt comprehension; the recommended fix is to distribute dimensions across small multiples. And the mode carries a Residual tradeoffs and constraints section, so a weak-but-justified encoding forced by the medium — texture in black-and-white print, size where a map leaves no room for bars — is named as a tradeoff against its constraint rather than scored as an error.
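A rough sketch of the load audit, assuming each mark is described as a mapping from data dimension to the variable that encodes it. The function name, the dictionary layout, and the default limit of three are illustrative assumptions; the text gives the limit only as "more than three or four", so it is left as a parameter.

```python
from typing import Dict, Optional

# Assumption: the limit is adjustable because the source states it loosely.
DEFAULT_LIMIT = 3


def audit_mark_load(encodings: Dict[str, str], limit: int = DEFAULT_LIMIT) -> Dict[str, object]:
    """Count the data dimensions one mark carries and flag overload.

    `encodings` maps each data dimension to the visual variable that
    encodes it, e.g. {"income": "x position", "region": "color"}.
    """
    count = len(encodings)
    overloaded = count > limit
    advice: Optional[str] = None
    if overloaded:
        advice = "distribute dimensions across small multiples"
    return {"count": count, "overloaded": overloaded, "advice": advice}


# A hypothetical bubble plot that stacks five dimensions on each point.
bubble = {
    "income": "x position",
    "life expectancy": "y position",
    "population": "size",
    "region": "color",
    "income group": "shape",
}
print(audit_mark_load(bubble))
```

Counting dimensions rather than distinct variables is a deliberate simplification: the two axes both count, even though both are position.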
What the analysis will not do. It will not flag a forced, medium-constrained compromise as a mistake; will not treat a correct encoding as sufficient (a well-matched variable can still fail a color-blind reader, or overload a mark); and will not pretend a clutter fix can repair a mismatch — clutter you strip, a misfit you re-encode.
Origin and evidence
The framework is Jacques Bertin’s, set out in Sémiologie graphique (Mouton-Gauthier-Villars, 1967) and carried into English as Semiology of Graphics: Diagrams, Networks, Maps (W. J. Berg, trans., University of Wisconsin Press, 1983; reissued by ESRI Press, 2010), with his more accessible Graphics and Graphic Information Processing (1981) offering the worked-example exposition. Bertin’s was a structural, theory-first achievement, a complete taxonomy derived from looking rather than from experiment, and the tradition that followed supplied the empirical and computational backing. Alan MacEachren’s How Maps Work (1995) extended the variables into cartographic cognition with perceptual research integrated; Tamara Munzner’s Visualization Analysis and Design (2014) updated the catalog with empirical findings and a computational vocabulary; and Leland Wilkinson’s The Grammar of Graphics (first edition 1999, second edition 2005) turned Bertin’s encoding choices into explicit language constructs, the lineage that underlies ggplot2 and the modern grammar-of-graphics tools. Within the Information Density analysis the companion lenses divide the labor cleanly: Bertin names the encoding, Cleveland-McGill’s perceptual experiments rank how accurately each encoding is read (refining Bertin’s quantitative ratings with measured evidence), and Tufte’s data-ink and chartjunk principles supply the evaluative frame the encoding audit sits inside.
Applications and common uses
Bertin’s catalog is a working tool wherever data is drawn, used both to diagnose a graphic’s encodings and to choose them in the first place.
- Cartography and choropleth maps. The home ground: deciding that magnitude belongs in value (lightness) and category in hue, and catching the rainbow-coded quantity that the eye cannot rank — the single most common encoding error on maps.
- Multivariate charts and scatterplots. Assigning the most important dimension to position (the only fully quantitative variable) and secondary dimensions to size, hue, or shape — and auditing whether a point mark has been overloaded past what the eye can read at a glance.
- Dashboards and analytical displays. Checking that each metric is encoded by a variable suited to its type before clutter is even addressed — an amount the viewer must compare needs length or position, not a color chip.
- Black-and-white and accessibility-constrained media. Knowing which variables survive when color is removed (position, size, value, shape, orientation, texture) so a graphic still works in print, in grayscale, or for a color-blind reader.
- Charting libraries and grammar-of-graphics tools. Bertin’s variables are the explicit “aesthetics” or channels a tool maps data onto — the vocabulary that lets a system reason about which encoding fits which column of data.
In every case the payoff is the same: each thing the graphic shows is carried by a variable the eye can actually read for it — amounts in variables the eye can measure, ranks in variables it can order, categories in variables it can tell apart — so the reader sees the structure of the data instead of fighting the encoding.
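The choosing side of these applications can be sketched as a greedy assignment: list the columns in order of importance, and give each one the best variable still free for its data type. Everything here is a hypothetical illustration. The preference lists are an assumption drawn from the matching rules in this document, and `assign_channels` is not the API of any charting library.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: preference order per data type, best variable first.
PREFERENCES: Dict[str, List[str]] = {
    "quantitative": ["x position", "y position", "size"],
    "ordinal": ["x position", "y position", "value", "size"],
    "nominal": ["color", "shape", "orientation"],
}


def assign_channels(columns: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Assign each (name, data_type) column the first free suitable variable.

    Columns must be listed most important first. A column that finds no
    free variable gets None, a hint to move it to small multiples.
    """
    taken = set()
    plan: Dict[str, Optional[str]] = {}
    for name, data_type in columns:
        options = PREFERENCES[data_type]
        choice = next((ch for ch in options if ch not in taken), None)
        if choice is not None:
            taken.add(choice)
        plan[name] = choice
    return plan


columns = [
    ("income", "quantitative"),
    ("population", "quantitative"),
    ("region", "nominal"),
    ("risk tier", "ordinal"),
]
print(assign_channels(columns))
```

The most important quantitative column lands on position, the category on hue, and the ranking on value, which mirrors the division of labor the list above describes.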
Failure modes and when not to use it
The lens’s characteristic ways of going wrong are catalogued in its Common Failure Modes:
- Quantitative-in-non-quantitative-variable. Encoding a numeric amount in shape, hue, or orientation, so the legend asks the viewer to read colors or shapes as numbers. Re-encode to position or length (or, with an explicit accuracy caveat, area); keep shape and hue for unordered categories only.
- Ordered-in-non-ordered-variable. Encoding a ranking in shape or hue, which give the eye no cue that the categories are ordered. Re-encode to value (lightness), size, or position; reserve shape for unordered data.
- Nominal-in-ordered-variable. Encoding unordered categories in value or size, so the viewer reads a rank the data don’t support (“which one is higher?”). Re-encode to hue or shape; reserve value, size, and position for data with a real order.
- Hue-as-ordered. Using a rainbow or qualitative palette to carry ordered or quantitative data, with no progression in lightness for the eye to follow. Switch to a perceptually uniform sequential colormap that varies value as well as hue, or use value directly.
- Area-as-fully-quantitative. Assuming bubble-area encodings read magnitudes accurately when viewers systematically underestimate large areas. Prefer length (bars) for accurate reading; if area is required, include a clear area-to-magnitude legend and accept the penalty.
- Multivariate overload. Loading more than three or four variables onto a single mark, so the legend needs extensive study and the marks resist gestalt comprehension. Distribute dimensions across small multiples or layered views.
- Texture-as-default. Reaching for cross-hatching or stippling where color is freely available and value or hue would outperform. Reserve texture for media without color or for cases where its particular character is needed.
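Of the failure modes above, hue-as-ordered is one that can be tested mechanically: compute the lightness of each palette step and see whether it moves in one direction. The sketch below uses the sRGB relative-luminance formula from the WCAG definition as a stand-in for perceived lightness. That choice, the function names, and both sample palettes are assumptions made for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def lightness_is_monotonic(palette: List[RGB]) -> bool:
    """True when luminance strictly rises or strictly falls along the palette."""
    lum = [relative_luminance(color) for color in palette]
    steps = [b - a for a, b in zip(lum, lum[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Hypothetical palettes: a rainbow and a light-to-dark blue ramp.
rainbow = [(255, 0, 0), (255, 165, 0), (255, 255, 0), (0, 128, 0), (0, 0, 255)]
blue_ramp = [(247, 251, 255), (198, 219, 239), (107, 174, 214), (33, 113, 181), (8, 48, 107)]

print(lightness_is_monotonic(rainbow))
print(lightness_is_monotonic(blue_ramp))
```

The rainbow brightens toward yellow and then darkens again, so it fails the test; the blue ramp darkens at every step and passes. A passing palette is ordered in lightness only. The test says nothing about whether its steps are evenly spaced.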
When not to reach for it. Three cautions bound the lens. First, the variables interact: a correct per-dimension encoding can still fail if too many are stacked on one mark, so the matching audit is incomplete without the load audit beside it. Second, accessibility makes some encodings fragile regardless of fit: color (hue) is a natural choice for categories but a brittle one for the roughly one in twelve men with color-vision deficiency, so a hue encoding that passes the property table can still need a value or shape backup to be safe. Third, where the medium genuinely forecloses the ideal variable (color unavailable in print, space too tight for bars), the lens diagnoses the compromise but the prescription is constrained; name the constraint rather than scoring the forced choice as an error.
Related
- Information Density — the analysis this catalog informs; audits a graphic for data-ink, chartjunk, integrity, and the redesign that would sharpen it.
- Tufte Data-Ink and Chartjunk — the foundational, evaluative lens of the same analysis: it sorts a graphic’s ink into data and decoration and anchors the redesign, while Bertin names what each piece of that data-ink is doing.
- Cleveland-McGill Perceptual Tasks — the empirical ranking of how accurately people read each encoding, which refines Bertin’s quantitative ratings and tells the redesign which of the valid variables to prefer.
- Gestalt Grouping Principles — how the eye groups marks (proximity, similarity, enclosure), which governs whether the encoded variables read as the designer intends.