14  Visual design and perception

Having explored how to integrate text and data through typography, layout, and grid systems, we now turn to the visual representation of data itself. While we have discussed how to organize information spatially, we have not yet examined how to encode data values into visual form—how to transform numbers into shapes, colors, and positions that human eyes can interpret. This transformation is the essence of data visualization, and it requires understanding both the grammar of graphics and the mechanisms of human perception.

14.1 Why show data graphically?

Data visualization is not merely about making numbers look appealing—it is about harnessing the remarkable pattern-recognition capabilities of human vision. Our brains process visual information in parallel, instantly detecting shapes, relationships, and anomalies that would take minutes or hours to uncover through textual or tabular analysis. But to understand why visualization is so powerful, we must first recognize what happens when we strip away the essential ingredient that gives data graphics their meaning.

Consider a single bar standing alone. What does it mean? Is it large or small? Good or bad? Increasing or decreasing? Without context, a single datum rendered visually tells us almost nothing.

Figure 14.1: A single bar chart showing only one value. Without comparison, this graphic conveys almost no information.

As Koponen and Hildén (2019) observe in The Data Visualization Handbook:

A data graphic acquires its meaning from comparison. While text can use different types of content structures, an abstract visualization just presents relationships between data points. Thus, a single bar, map symbol or shape does not convey information. It only becomes meaningful by its relationship with other elements in the image—in other words, it is polysemic.

Now compare that isolated bar to a chart showing multiple categories:

Figure 14.2: When bars are placed in comparison, meaning emerges. We can now judge relative magnitudes, rank categories, and identify patterns.

The comparative chart allows us to rank values, identify outliers, and understand proportions. This is the fundamental power of data visualization: comparison creates meaning.

The necessity of visual comparison becomes even clearer when we examine datasets that appear identical through summary statistics alone. This famous example, known as Anscombe’s Quartet, was constructed by statistician Francis Anscombe in 1973 to demonstrate why graphs are essential for data analysis. Each dataset has the same mean, variance, and linear regression line—yet their visual patterns are radically different. Consider these four datasets:

Table 14.1: Anscombe's Quartet: Four datasets with identical statistical properties but very different relationships.

          Dataset 1      Dataset 2      Dataset 3      Dataset 4
          x     y        x     y        x     y        x     y
          10    8.04     10    9.14     10    7.46     8     6.58
          8     6.95     8     8.14     8     6.77     8     5.76
          13    7.58     13    8.74     13    12.74    8     7.71
          9     8.81     9     8.77     9     7.11     8     8.84
          11    8.33     11    9.26     11    7.81     8     8.47
          14    9.96     14    8.10     14    8.84     8     7.04
          6     7.24     6     6.13     6     6.08     8     5.25
          4     4.26     4     3.10     4     5.39     19    12.50
          12    10.84    12    9.13     12    8.15     8     5.56
          7     4.82     7     7.26     7     6.42     8     7.91
          5     5.68     5     4.74     5     5.73     8     6.89

Reviewing the raw data is cognitively taxing. Scanning for differences in the relationships between x and y across datasets requires sequential, focused attention. Summary statistics do not differentiate these datasets—each x and y variable shares the same mean and standard deviation, and linear regression produces practically identical coefficients. We can verify this computationally:
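The book's figures are produced in R, but the verification itself needs nothing beyond a standard library. A minimal standalone sketch in Python, using only the `statistics` module:

```python
from statistics import mean, stdev

# Anscombe's Quartet; datasets 1-3 share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = {
    1: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    2: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    3: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    4: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# Rounded (x mean, x sd, y mean, y sd) for each dataset
stats = {
    k: (round(mean(x4 if k == 4 else x123), 2),
        round(stdev(x4 if k == 4 else x123), 2),
        round(mean(y), 2),
        round(stdev(y), 2))
    for k, y in ys.items()
}
# Every dataset yields the same tuple: x mean 9, x sd 3.32, y mean 7.5, y sd 2.03
```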

Table 14.2: Summary statistics for Anscombe’s Quartet. Each dataset has identical means and standard deviations for both x and y variables.

Dataset   x mean   x sd    y mean   y sd
1         9        3.32    7.5      2.03
2         9        3.32    7.5      2.03
3         9        3.32    7.5      2.03
4         9        3.32    7.5      2.03

The regression results are equally indistinguishable:
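The regression fit can also be reproduced from first principles. A small Python sketch (the helper name `fit` is our own, not from any library) computes the least-squares intercept, slope, and R-squared:

```python
from statistics import mean

def fit(x, y):
    """Ordinary least squares for one predictor: y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = ybar - b * xbar                # intercept
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_res / ss_tot   # R-squared = 1 - residual/total

# Dataset 1 (linear) and dataset 4 (one influential point) fit identically:
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
a1, b1, r1 = fit(x1, y1)   # intercept ~3.00, slope ~0.50, R-squared ~0.67
a4, b4, r4 = fit(x4, y4)   # the same, despite a radically different pattern
```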

Table 14.3: Linear regression coefficients for each dataset. The intercept, slope, and R-squared values are virtually identical across all four datasets.

Dataset   Intercept   Slope   R-squared
1         3.000       0.500   0.667
2         3.001       0.500   0.666
3         3.002       0.500   0.666
4         3.002       0.500   0.667

Despite these identical statistical summaries, the underlying data patterns differ dramatically. Only when we visualize the data do the distinctions become apparent:

Figure 14.3: The differing (x, y) relationships in Anscombe’s Quartet become instantly clear when visualized. Dataset 1 shows linearity; Dataset 2 shows curvature; Dataset 3 shows linearity with an outlier; Dataset 4 shows no relationship with one influential point.

A well-crafted visual display reveals what statistics obscure. Pattern recognition occurs in parallel, leveraging our attunement to preattentive visual attributes (Ware 2020). Unlike sequential processing required for table lookups, visual perception allows us to detect geometry, assemble grouped elements, and estimate relative differences simultaneously (Cleveland 1985, 1993).

Having established that comparison creates meaning and that visualization reveals patterns hidden in statistics, we now turn to the mechanics of building effective graphics. Creating visualizations that support accurate perception requires understanding the fundamental building blocks: the coordinate systems that map data to space, the scales that transform values into visual positions, and the encodings that translate abstract numbers into visible marks. In the sections that follow, we build up visuals from first principles—examining how these components work together to transform data into visible meaning.

14.2 Graphs, coordinate systems, and scales

The foundation of any data graphic is its coordinate system—the mathematical framework that maps data values to positions in space. Understanding coordinate systems is essential because the same data can tell very different stories depending on how we map it to visual space.

14.2.1 Cartesian coordinates

We begin with the simplest and most familiar coordinate system: the two-dimensional Cartesian system. This is the foundation upon which most data graphics are built. In Cartesian coordinates, x and y axes run orthogonally (at right angles) to each other, creating a grid where every data point has a unique address defined by its horizontal and vertical position.

Why start with Cartesian coordinates? Because this system leverages our most accurate perceptual capability: judging position along a common scale. When two points share the same baseline, we can compare their heights with remarkable precision. The orthogonal axes also create a straightforward mapping from data values to visual space that is easy to interpret and universally understood.

Figure 14.4: A point at coordinates (2, 3) in Cartesian space. The x and y axes provide orthogonal reference lines for locating positions.

In Cartesian coordinates, the distance between points corresponds directly to the difference in their data values. This makes Cartesian systems excellent for comparing magnitudes and detecting patterns in two quantitative variables.

14.2.2 Polar coordinate system

Once we understand Cartesian coordinates, we can explore variations that suit different data structures. The polar coordinate system represents one such variation, particularly useful when our data has cyclical or radial characteristics. In polar coordinates, positions are defined by an angle (θ) and a radius (r) from a central origin. The same point that was at (2, 3) in Cartesian coordinates appears at a different location in polar space.

Why would we trade the simplicity of Cartesian grids for the circular logic of polar coordinates? Because certain data naturally fits a radial layout—think of time repeating in daily or annual cycles, compass directions wrapping around 360 degrees, or hierarchical data radiating from a central point. The circular arrangement can emphasize periodicity and make patterns in cyclical data immediately visible.

Figure 14.5: The same geometric elements transformed into polar coordinates. Notice how the straight lines become curves and the point at (2, 3) now appears at a different angular position.

Polar coordinates are particularly useful for cyclical data—time of day, compass directions, seasonal patterns—where the circular layout naturally represents periodic relationships.

To understand the transformation from Cartesian to polar, imagine the Cartesian grid anchored at its bottom-left corner at (-4, -4). Now picture pulling the top vertical edge around in a circle, like opening a folding fan. The horizontal axis becomes the radius extending outward from the center, while the vertical axis wraps around to become the angle. Points that were aligned horizontally in Cartesian space now lie along radial lines, and points that were vertically aligned now trace concentric circles.
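The mapping between the two systems is a pair of one-line formulas. A minimal Python sketch (the function names are our own, not from any plotting library):

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta), with theta in radians."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

# The point at (2, 3) lies about 3.61 units from the origin,
# at an angle of about 56.3 degrees above the x axis.
r, theta = to_polar(2, 3)
```

The round trip `to_cartesian(*to_polar(x, y))` recovers the original point, which is why the two systems show the same data in different spatial arrangements.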

14.2.3 Map projections

Having explored abstract coordinate systems for general data, we now turn to a specialized but ubiquitous case: geographic maps. Maps present a unique challenge because they must represent the curved, three-dimensional surface of the Earth on a flat, two-dimensional plane. This transformation is fundamentally impossible to perform without distortion—mathematicians have proven that you cannot flatten a sphere without stretching, tearing, or compressing some regions.

Consequently, there is no single “correct” map projection. Each method involves explicit tradeoffs between preserving area, shape, distance, or direction. The projection you choose should align with your analytical goal: Are you comparing sizes of countries? Plotting navigation routes? Understanding global spatial relationships? Consider how the same world boundary data appears under three different projections, each with distinct characteristics:

Equirectangular projection: This is the simplest approach, treating latitude and longitude as Cartesian x and y coordinates. While it preserves the grid-like structure and is computationally straightforward, it significantly distorts area at high latitudes—Greenland appears much larger than it actually is relative to equatorial regions.

Figure 14.6: Equirectangular projection treats latitude/longitude as Cartesian coordinates. Simple but distorts area at high latitudes.

Mercator projection: Developed for navigation, this projection preserves angles and local shapes, making it ideal for plotting compass bearings. However, this comes at a severe cost: areas near the poles are dramatically inflated. On a Mercator map, Greenland appears comparable in size to Africa, when in reality Africa is approximately 14 times larger.

Figure 14.7: Mercator projection preserves angles for navigation but dramatically inflates area near the poles.

Orthographic projection: This perspective projection shows the Earth as it appears from space, preserving true distances from the center point and maintaining the visual appearance of the globe. While intuitive for understanding global relationships, it can only display one hemisphere at a time and distorts features near the edges.

Figure 14.8: Orthographic projection shows the Earth as seen from space, preserving true distances from the center point.

The choice of projection should align with your communication goal. Use Mercator when showing navigation routes or preserving local shapes matters. Use equirectangular for simple grid-based analysis where computational simplicity outweighs area distortion. Use orthographic or other equal-area projections when comparing sizes across different latitudes is essential.
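Mercator's area inflation can be quantified directly: the projection's local scale factor at latitude φ is 1/cos φ, so areas are inflated by the square of that factor. A sketch of the standard spherical Mercator formulas on a unit-radius sphere (a simplification; real mapping libraries work on an ellipsoid):

```python
import math

def mercator(lon_deg, lat_deg):
    """Spherical Mercator, unit sphere: x = lambda, y = ln(tan(pi/4 + phi/2))."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))

def area_inflation(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

# The equator maps to y = 0; y grows without bound toward the poles.
_, y_equator = mercator(0, 0)
# Near Greenland's latitude (~70 N) areas are inflated roughly 8.5x,
# which is why Greenland looks comparable to Africa on a Mercator map.
inflation_70 = area_inflation(70)
```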

14.2.4 Transforming scales: When linear mapping fails

So far, we have assumed that data values map linearly to visual positions—a value of 8 is twice as far from zero as a value of 4. But many real-world datasets violate this assumption. Consider COVID-19 case counts spanning from single digits to millions, stock prices with exponential growth patterns, or population densities ranging from sparse rural areas to dense urban centers. On a linear scale, such data becomes unreadable: early values disappear into a flat line while late values shoot off the chart.

Scale transformations solve this problem by changing the mathematical relationship between data values and their visual positions. Understanding when and how to transform scales is essential for revealing patterns across the full range of your data.

Just as we can choose different coordinate systems, we can transform the scales that map data values to visual positions. Understanding the distinction between data transformation and scale transformation is crucial.

When we transform data, we mathematically modify the values before plotting. When we transform the scale, we keep the original data values but change how the axis maps them to positions. Both approaches change the visual appearance but in subtly different ways.

Consider data ranging from 1 to 10. The diagrams below illustrate how each transformation affects the visual spacing—quadrant diagrams on the left show which transformation is applied, with corresponding visualizations on the right:

Linear scale (baseline): Points are evenly spaced according to their actual values. No transformation—both data and scale remain linear.

Figure 14.9: Linear scale (baseline): points are evenly spaced according to their actual values; no transformation is applied to either data or scale. Panels: (a) none; (b) equal spacing.

Log-transformed data: When we mathematically transform the data using logarithms, the values themselves change, and then we plot them on a linear scale.

Figure 14.10: Log-transformed data: the values themselves are changed by the logarithm, then plotted on a linear scale. Panels: (a) data transformation; (b) values compressed at the high end.

Log scale: Here we keep the original data values but change how the axis positions them—compressing the high end and expanding the low end.

(a) Scale

 

(b) Axis spacing changes.
Figure 14.11: Log scale: Here we keep the original data values but change how the axis positions them—compressing the high end and expanding the low end.

Square root transformations: These offer gentler compression than logarithms and can handle zero values.

Figure 14.12: Square-root-transformed data: gentler compression than the logarithm, and zero values are handled. Panels: (a) data transformation; (b) intermediate compression.

Figure 14.13: Square root scale: the original values are kept while the axis uses square-root spacing. Panels: (a) scale transformation; (b) axis uses sqrt spacing.

Log transformations are particularly useful for data spanning multiple orders of magnitude, as they compress large ranges and expand small ones. However, they cannot handle zero or negative values. Square root transformations offer a gentler compression and can handle zeros.
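The distinction between transforming data and transforming the scale can be made concrete: log-transformed data on a linear axis lands at the same relative positions as raw data on a log axis. What differs is the axis labeling, which shows transformed values in the first case and the original values in the second. A sketch (the helper name `axis_fraction` is our own):

```python
import math

def axis_fraction(value, lo, hi, transform=lambda v: v):
    """Where along the axis (0 = left edge, 1 = right edge) a value is drawn."""
    t = transform
    return (t(value) - t(lo)) / (t(hi) - t(lo))

values = [1, 2, 5, 10]

# Linear axis: positions proportional to the raw values
linear = [axis_fraction(v, 1, 10) for v in values]

# Raw data on a log-scaled axis (axis still labeled 1, 2, 5, 10)
log_axis = [axis_fraction(v, 1, 10, math.log10) for v in values]

# Log-transformed data on a linear axis (axis labeled 0 ... 1)
log_data = [axis_fraction(math.log10(v), 0, 1) for v in values]

# log_axis and log_data are identical position-for-position;
# only the tick labels would differ.
```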

The New York Times demonstrated the power of scale transformation during the COVID-19 pandemic, showing the same data on both linear and logarithmic scales:

Figure 14.14: The New York Times showed COVID-19 cases using both linear and logarithmic scales. The linear scale emphasizes absolute differences; the log scale reveals relative growth rates and makes exponential patterns appear linear.

On a linear scale, an exponential outbreak appears as a rapidly steepening curve, making early stages look flat and late stages look catastrophic. On a logarithmic scale, exponential growth becomes a straight line, making it easier to compare growth rates across regions at different stages of outbreak and to identify when exponential growth begins to slow.
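The reason exponential growth plots as a straight line on a log axis is that taking logs turns a constant growth ratio into a constant difference. A quick check with a hypothetical case series that doubles every step:

```python
import math

# Hypothetical case counts doubling every step: exponential in raw values
cases = [100 * 2 ** t for t in range(6)]   # 100, 200, 400, 800, 1600, 3200

raw_steps = [b - a for a, b in zip(cases, cases[1:])]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

# raw_steps explodes (100, 200, 400, ...) while every log step equals
# log10(2), about 0.301: a constant slope, hence a straight line.
```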

14.3 Data encodings for visual comparison

Now that we understand the spatial frameworks for positioning data, we turn to the marks themselves—the geometric elements that represent data values. Visual encodings translate abstract numbers into visible properties that human perception can decode. The choice of mark profoundly affects what patterns viewers can perceive.

14.3.1 From points to volumes: Choosing the right mark

Consider how the dimensionality of your mark shapes the story you can tell. A zero-dimensional point conveys position alone. A one-dimensional line adds connection and sequence. A two-dimensional surface introduces area and magnitude. A three-dimensional volume, though challenging to represent accurately on flat displays, suggests mass and density.

Each mark type serves different analytical purposes. Points excel at showing individual observations and outliers. Lines reveal trends and continuity across ordered data. Surfaces emphasize accumulation and magnitude. Understanding this progression—from the simplest point to the most complex volume—helps you select marks that align with your analytical goals.

14.3.1.1 Points

The simplest mark is a point—a location in space with no extent. In practice, points must have some size to be visible, but their essence is position:

Figure 14.15: A point mark encodes position. To be visible it must be given some size, either arbitrary or itself linked to data.

14.3.1.2 Lines

When individual observations are ordered—by time, sequence, or rank—a line connects them into a continuous path. Unlike isolated points, lines create relationships. They suggest continuity, trend, and flow.

A line is an infinite collection of points, creating a one-dimensional path through space:

Figure 14.16: A line mark connects points. Like points, lines must have visible width to be seen, or their width can encode data.

14.3.1.3 Surfaces

When magnitude matters as much as position, surfaces or areas fill the space beneath or between lines, creating two-dimensional regions. Areas have intrinsic visibility—they do not require arbitrary sizing to be seen—and they naturally suggest volume, accumulation, and totality.

A surface or area is bounded by lines, creating a two-dimensional region that we can perceive directly:

Figure 14.17: An area mark creates a visible surface. Unlike points and lines, areas have intrinsic visibility and can encode data through their extent or fill properties.

14.3.1.4 Volumes

Volume adds a third dimension, though in two-dimensional displays it must be represented through perspective or shading. Volumes suggest mass, density, and three-dimensional structure, but they are notoriously difficult to judge accurately:

Figure 14.18: Volume adds another dimension but is difficult to represent accurately on a flat surface.

14.3.2 Color as a visual variable

Beyond geometric marks, color provides three distinct perceptual channels for encoding data. Unlike position and length, which are inherently ordered, color’s three dimensions serve different purposes:

  • Hue is what we typically mean by “color”—the quality that distinguishes red from blue from green. Our visual system treats hue as categorical, making it excellent for distinguishing groups but poor for showing ordered quantities.

  • Chroma (also called saturation) describes color intensity—the difference between a pale pink and a vivid red. Chroma can encode ordered data, though our ability to judge saturation differences is less precise than judging position.

  • Luminance (brightness) ranges from light to dark. Like chroma, luminance can encode ordered quantities, but it is more versatile because it works even for viewers with color vision deficiencies.

Together, these three color dimensions provide:

Figure 14.19: The three perceptual dimensions of color: hue distinguishes categories (qualitative), while chroma and luminance can encode ordered quantities.

Different visual variables suit different data types. Position and length work well for quantitative data. Color hue excels at distinguishing categorical data. Luminance and saturation can encode ordered data but are less precise than position for quantitative judgments.

14.3.3 Bertin’s visual variables: A systematic framework

By now, you may recognize that we face a design decision every time we create a visualization: Which visual property should represent which data variable? In 1967, French cartographer Jacques Bertin provided the first comprehensive answer to this question. His Semiology of Graphics (Bertin 1983, 2010) analyzed how humans perceive visual information and systematized the available encoding channels.

Bertin’s insight was that not all visual variables are equally effective for all data types. Some variables (like position) can represent any kind of data accurately. Others (like color hue) excel at categorical distinctions but fail for quantitative comparisons. By understanding Bertin’s framework, we can make informed encoding choices rather than relying on software defaults or personal habit.

Bertin identified the fundamental visual channels available for encoding data:

Figure 14.20: Bertin’s visual variables organized by whether they are suitable for qualitative (nominal), ordered, or quantitative data. Position is the most versatile and accurate encoding channel.

Bertin organized these variables by their suitability for different data types. Position is the most powerful—humans can judge positions along common scales with high precision. Length from a common baseline is nearly as effective. Area, angle, and color saturation are progressively less accurate for quantitative judgments. Color hue works best for distinguishing categories.

14.4 Identify use of Bertin’s channels

Theory becomes useful when applied. We have examined Bertin’s visual variables and the types of data they encode—now let’s practice identifying how professional visualizations map data to these channels. In the following exercises, you will analyze published graphics by systematically deconstructing their encodings: for each visual element, identify which Bertin channel it uses, what data variable it represents, and whether the mapping effectively serves the communication goal.

Work through each example carefully, documenting your observations before checking the analysis provided. This deliberate practice builds the analytical skills necessary to evaluate and improve your own visualizations.

Exercise 14.1 (Ticket Volume Dashboard) The graphic below comes from Cole Nussbaumer Knaflic’s Storytelling with Data, a widely-cited resource on visualization best practices. Study the dashboard and identify its visual encodings before reading the analysis.

Figure 14.21: A dashboard showing ticket volumes over time from Knaflic’s Storytelling with Data.

Step 1: Identify the visual channels

List every visual property you see that varies systematically: positions, colors, shapes, sizes. Then determine what data each represents.

Step 2: Map data to channels

For each channel you identified, specify:

  • Which Bertin channel is used (position, length, color hue, luminance, etc.)?
  • What data type does it encode (categorical, ordered, quantitative, temporal)?
  • What specific data variable is mapped to it?
  • Is this mapping appropriate given what we know about perceptual accuracy?

Step 3: Evaluate the design

After mapping all channels, evaluate:

  • Are quantitative comparisons supported by the most accurate channels (position/length)?
  • Does color serve a clear categorical distinction role?
  • Are there redundant encodings reinforcing the same dimension?
  • Could alternative encodings better serve the communication goal?


Analysis:

This time-series chart maps data to visual channels as follows:

  • Horizontal position (x-axis) → time (months), an ordered/temporal variable. Excellent: position is the most accurate channel for ordered data, and left-to-right reading matches temporal progression.
  • Vertical position (y-axis) → ticket volume, a quantitative variable. Excellent: vertical position from a baseline enables precise magnitude comparison.
  • Line elements → connection between sequential data points. Good: lines emphasize trend and continuity between time points.
  • Color hue → ticket category, a categorical variable. Good: distinct hues differentiate the series, though color-blindness accessibility should be checked.

The design effectively leverages position—our most accurate perceptual channel—for the critical quantitative and temporal comparisons. Color hue serves a secondary categorical role without competing with the position encodings.

Questions for further consideration:

  • Does the y-axis start at zero? For area-based judgments (the filled area under or between lines), zero baselines prevent perceptual distortion.
  • How many distinct hues are used? Research suggests we can reliably distinguish 6-8 categorical colors.
  • What alternative designs might work? Small multiples (faceting by category) could reduce the cognitive load of comparing overlapping lines, though at the cost of screen space.

Before proceeding to the next exercise, take a moment to reflect: Did your initial analysis match the systematic breakdown above? What encodings did you notice first, and what did you overlook? Developing this analytical discipline—checking each channel systematically rather than relying on first impressions—is crucial for rigorous visualization critique.

Exercise 14.2 (Crime Information) Now examine this geographic visualization from the Los Angeles Times. Geographic displays introduce special complexity: position carries semantic meaning (actual spatial locations) rather than serving purely as a data mapping.

Figure 14.22: A crime visualization from the LA Times showing geographic patterns of incidents.

Your task: Working systematically, identify at least five distinct visual encodings. For each, specify in writing:

  1. The visual channel (which Bertin variable: position, size, color hue, luminance, orientation, shape, etc.)
  2. The data attribute (what specific data variable: crime type, location coordinates, incident count, time, etc.)
  3. The data type (categorical, ordered, quantitative, temporal, geographic)
  4. The encoding appropriateness (does Bertin’s framework endorse this channel for this data type?)

Analysis prompts to guide your work:

  • Position: How is position used?
  • Size: If point size varies, what does it represent? Is size appropriate for that data type?
  • Color: How many color channels are active?
  • Shape: Are different shapes used, or is shape constant? What would shape encode if varied?
  • Texture/Density: In areas with many overlapping points, how does the graphic handle density?

Critical evaluation questions:

  • How does the graph handle overlapping points in high-density areas? Does overlap obscure or aggregate?
  • Does the base graph support or compete with the data message?

Extension: Consider alternative designs.

14.5 Grammar of graphics: A layered approach

We have examined individual encodings—position, color, size, shape—and practiced identifying them in published graphics. But knowing the vocabulary is not the same as knowing the grammar. To construct meaningful visualizations, we need systematic principles for combining these elements into coherent wholes.

Enter Leland Wilkinson. In 1999, Wilkinson—then a statistician at SPSS—published The Grammar of Graphics, a work that fundamentally restructured how we think about data visualization. Wilkinson did not merely propose “a” grammar among many possibilities; he argued for “the” grammar, a comprehensive formal system that underlies all statistical graphics. Where previous approaches treated charts as discrete types to be memorized (bar charts, line charts, scatter plots), Wilkinson revealed these as surface manifestations of deeper structural patterns.

14.5.1 Wilkinson’s foundational insight

Wilkinson’s breakthrough was recognizing that graphics share the same foundational structure as language. Just as a finite set of grammatical rules generates infinite meaningful sentences, a finite set of graphical components generates infinite meaningful visualizations. The Oxford English Dictionary defines grammar as “that department of the study of a [thing] which deals with its inflectional forms or other means of indicating the relations of [parts in things].” Wilkinson applied this concept to graphics: visualization requires rules for how data variables, geometric elements, and aesthetic attributes combine.

As Wilkinson insisted:

We often call graphics charts. There are pie charts, bar charts, line charts, and so on. [We should] shun chart typologies. Charts are usually instances of much more general objects. Once we understand that a pie is a divided bar in polar coordinates, we can construct other polar graphics that are less well known. We will also come to realize why a histogram is not a bar chart and why many other graphics that look similar nevertheless have different grammars. Elegant design requires us to think about a theory of graphics, not charts.

Wilkinson’s insight does not negate the value of chart catalogs, which provide useful taxonomies and inspiration. Harris’s Information Graphics remains the most comprehensive reference (Harris 1999). But these starting points should not limit our thinking—the grammar enables us to move beyond pre-defined templates.

This perspective liberates us from memorizing chart types. Instead, we learn to combine fundamental components that Wilkinson formalized:

  • Data operations: Creating variables from datasets
  • Transformations: Mathematical operations (sum, mean, rank, log, sqrt)
  • Scales: Mappings from data to visual space (linear, log, sqrt)
  • Coordinates: Spatial frameworks (Cartesian, polar, map projections)
  • Elements: Geometric marks (points, lines, areas)
  • Aesthetic attributes: Visual properties (position, size, color, shape)
  • Guides: Axes, legends, and other reference marks

14.5.2 Theory to practice—Hadley Wickham’s contribution

Wilkinson provided the theoretical framework, but the grammar remained largely abstract until Hadley Wickham implemented it in R’s ggplot2 package. Wickham—a statistician at Rice University and later RStudio—recognized that Wilkinson’s grammar could guide software design. In his 2010 paper “A Layered Grammar of Graphics” (Wickham 2010), he emphasized a crucial dimension that Wilkinson had noted but not fully developed: layering.

Wickham observed that complex graphics are not monolithic structures but rather stacks of independent layers. Each layer contains its own data, transformations, and aesthetic mappings, yet layers combine through simple addition. This insight shaped ggplot2’s design: we build graphics incrementally, adding one layer at a time, with each layer contributing distinct information.

Consider the logical progression:

  1. Base layer: Coordinate system and guides (axes, legends)
  2. Context layer: Background elements, reference lines, or geographic boundaries
  3. Data layer: Primary geometric elements with aesthetic mappings
  4. Annotation layer: Text labels, highlights, explanatory marks
  5. Final layer: Scales and theme adjustments

Each layer operates independently—we can modify, add, or remove layers without breaking the overall structure. This modularity makes iteration and experimentation practical.
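The additive, non-destructive composition is the design point worth dwelling on. It is independent of any particular language; a toy Python sketch (our own minimal classes, not ggplot2 or any real library) shows how layers can combine through addition while remaining individually removable:

```python
class Layer:
    """One independent layer: its own geometry and aesthetic mappings."""
    def __init__(self, geom, **aes):
        self.geom, self.aes = geom, aes

class Plot:
    """A plot is an ordered stack of layers, combined with `+`."""
    def __init__(self, layers=()):
        self.layers = list(layers)
    def __add__(self, layer):
        # Addition returns a new Plot; the original is never mutated.
        return Plot(self.layers + [layer])
    def describe(self):
        return [f"{l.geom}({', '.join(f'{k}={v}' for k, v in l.aes.items())})"
                for l in self.layers]

p = (Plot()
     + Layer("grid")                        # structure layer
     + Layer("point", x="year", y="rank")   # data layer
     + Layer("text", label="title"))        # annotation layer

# Removing the annotation layer leaves the other layers intact:
without_labels = Plot(p.layers[:-1])
```

In ggplot2 itself the same shape appears as `ggplot(data, aes(x, y)) + geom_point() + labs(...)`: each `+` contributes one self-contained layer or adjustment.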

14.5.3 Demonstrating layers: From simple to complex

To appreciate the power of layering, consider how a complex visualization emerges through progressive accumulation of simple elements. The visualization below—recreating Nadieh Bremer’s “Let The Music Play” (showing the Top 2000 songs from Dutch radio station NPO Radio 2)—demonstrates this principle clearly.

Figure 14.23: Nadieh Bremer’s ‘Let The Music Play’ visualization showing all 2000 songs from the annual Dutch Top 2000 radio countdown. The graphic uses a radial layout where angular position represents ranking and radial distance represents release year. Hover interactions reveal connections between songs by the same artist or collaborations. Original concept and design by Nadieh Bremer (Bremer 2016).

The full graphic appears intricate, but decomposing it reveals how each layer adds specific information. Below we examine how this complex visualization builds up through progressive layering (Spencer 2020).

The build-up proceeds as follows:

Layer 1: Foundation We begin with a polar coordinate system and a central red circle establishing the origin and focal point.

Layer 2: Background A dark circular “vinyl record” area provides visual context, immediately signaling “music” through cultural association.

Layer 3: Structure Concentric circles and radial grid lines create the coordinate framework—year rings and angular positions where songs will be placed.

Layer 4: Data Individual songs appear as points positioned by release year (radial distance) and chart position (angular position). Each point represents one observation.

Layer 5: Connections Lines connect related songs—collaborations, covers, samples—adding relational information beyond individual data points.

Layer 6: Annotation Labels, legends, and explanatory text complete the graphic, providing context and enabling interpretation.

This decomposition reveals that complexity emerges not from sophisticated individual elements but from the systematic accumulation of simple layers. Each layer is comprehensible in isolation; together they create a rich, multi-dimensional representation. The grammar of graphics provides the rules for how these layers combine—ensuring that adding a new data series or aesthetic mapping follows predictable, debuggable patterns.
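A minimal sketch of the radial idea behind layers 1 through 4—using simulated data, not Bremer’s, as an assumption for illustration—shows how a polar coordinate layer transforms ordinary position mappings into the ring layout:

```r
library(ggplot2)
set.seed(1)

# Simulated stand-in for the Top 2000: rank maps to angle, release year to radius
songs <- data.frame(
  rank = 1:2000,
  year = sample(1960:2016, 2000, replace = TRUE)
)

ggplot(songs, aes(x = rank, y = year)) +
  geom_point(size = 0.3, alpha = 0.4, color = "white") +  # data layer
  coord_polar(theta = "x") +                              # radial structure layer
  theme_void() +
  theme(plot.background = element_rect(fill = "grey10"))  # dark "vinyl" background
```

The real graphic adds the connection and annotation layers on top of this same skeleton; the grammar keeps each addition independent.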

14.5.4 Grammar in action: building from simple marks

To see the grammar at work, let’s examine how the same underlying data flows through different geometric elements. Figure 14.24 maps Wilkinson’s components to their ggplot2 implementations:

Figure 14.24: Mapping Wilkinson’s grammar to ggplot2 code structure. Data flows through transformations, scales, coordinates, and geometric elements with aesthetic attributes.

The progression from points to lines to areas illustrates how the grammar generates different visual forms from the same fundamental components. A point mark is the simplest case: we map data variables to x and y positions, and the grammar creates one point per observation. Nothing could be more basic—a zero-dimensional mark positioned in two-dimensional space.

Figure 14.25: Code for a point element: aesthetic mappings position data in Cartesian space.
Figure 14.26: Result: discrete points, each representing a single observation.

Change the geometric element to a line mark, and the same position mappings create continuous connections. The grammar now draws line segments between ordered points, emphasizing sequence and trend. The data and aesthetics remain constant; only the geometric interpretation changes.

Figure 14.27: Code for a line element: same aesthetic mappings, different geometry.
Figure 14.28: Result: connected segments showing progression and continuity.

An area mark extends this logic further, filling the space between the line and a baseline. The aesthetic mappings are identical—same data variables, same scales, same coordinates—but the geometric interpretation creates a filled region that emphasizes magnitude and accumulation.

Figure 14.29: Code for an area element: filling the region beneath a line.
Figure 14.30: Result: filled region emphasizing volume under the curve.

This is the grammar’s power: by varying one component (the geometric element) while holding others constant, we create meaningfully different visualizations. The grammar makes these variations systematic rather than arbitrary.
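As a concrete sketch of this point-to-line-to-area progression (using ggplot2’s built-in economics data, an assumption for illustration), only the geom_* call changes; the data, mappings, scales, and coordinates are held constant:

```r
library(ggplot2)

# One base specification: data plus aesthetic mappings
base <- ggplot(economics, aes(x = date, y = unemploy))

base + geom_point()  # zero-dimensional marks: one point per observation
base + geom_line()   # line segments between ordered points: sequence and trend
base + geom_area()   # filled region to the baseline: magnitude and accumulation
```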

14.5.5 Order matters: Layering and occlusion

The sequence of layers determines what viewers see. Later layers appear on top of earlier ones, creating occlusion. This ordering is not arbitrary—it represents a deliberate design choice about which information should be visually dominant.

Consider the example below showing two overlapping circles. In the first version, the orange circle is drawn first, then the blue:

Figure 14.31: Layering order 1: Orange circle at (0, 0) drawn first, then blue circle at (1, 1). The blue circle appears on top where they overlap.

Now observe what happens when we reverse the order—the exact same data, the same visual elements, but the blue circle is drawn first:

Figure 14.32: Layering order 2: Same circles, reversed sequence. Now the orange circle appears on top, changing the visual hierarchy.

These particular effects are created in code simply by the order in which we draw the markings, by overlapping them, and by choosing fill colors that distinguish the two shapes. We can create the same perception in other ways, too.
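One way to produce the two orderings in code—a sketch, as the original figures may have been drawn differently—is simply to swap the order of the two geom calls:

```r
library(ggplot2)

d1 <- data.frame(x = 0, y = 0)  # orange circle
d2 <- data.frame(x = 1, y = 1)  # blue circle

# Orange drawn first, blue on top where they overlap
ggplot() +
  geom_point(data = d1, aes(x, y), size = 40, color = "orange") +
  geom_point(data = d2, aes(x, y), size = 40, color = "blue") +
  coord_fixed(xlim = c(-1, 2), ylim = c(-1, 2))

# Reversed sequence: blue first, orange now on top
ggplot() +
  geom_point(data = d2, aes(x, y), size = 40, color = "blue") +
  geom_point(data = d1, aes(x, y), size = 40, color = "orange") +
  coord_fixed(xlim = c(-1, 2), ylim = c(-1, 2))
```

The data are identical in both plots; only the layer sequence—and therefore the occlusion—differs.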

Samara (2014) describes these design choices as creating a sense of near and far. We may create a sense of depth, of foreground and background, using size, overlap between the forms or encodings, or the encodings’ relative values (lightness, opacity). Samara (2014) writes, “the seeming nearness or distance of each form will also contribute to the viewer’s sense of its importance and, therefore, its meaning relative to other forms presented within the same space.” Ultimately we are trying to achieve a visual hierarchy that the audience can understand at each level.

When designing graphics, and especially when comparing encodings or annotating them, we must perceptually layer and separate types of information or encodings. As Tufte (1990) explains, “visually stratifying various aspects of the data” aids readability. By layering or stratifying, we mean placing one type of information over the top of a second type. The grammar of graphics, discussed earlier, enables implementations of such layering. To visually separate the layered information, we can assign, say, a distinct hue or luminance to a particular layer. Many of the graphics discussed separate types of data through layering.

This principle becomes crucial in complex visualizations. When combining multiple data series, should the most important series appear on top? Should reference lines or annotations occlude data points or sit behind them? When using filled areas with transparency, does the overlap color convey meaningful information or create confusion? The designer must consciously decide which elements should be visually dominant, as this hierarchy guides the viewer’s attention and shapes their interpretation. If we had reversed the order of the layers when reconstructing Bremer’s graphic, what might we see as a result?

14.5.6 Layering and opacity

Opacity (or transparency) is another attribute that is very useful in graphics perception. For layered data encoded in monochrome, careful use of transparency can reveal density:

Figure 14.33: Semi-transparency can help with overplotting.
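A minimal monochrome sketch of the idea, using simulated overplotted data (an assumption for illustration; the figure above may use different data):

```r
library(ggplot2)
set.seed(1)

# Heavily overplotted data: thousands of points crowd the same region
d <- data.frame(x = rnorm(5000), y = rnorm(5000))

# A single color with low alpha: overlapping points accumulate into darker
# regions, so density becomes visible where fully opaque points would merge
# into one solid blob
ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.1, color = "black") +
  theme_minimal()
```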

The key in the above use is monochrome (a single color and shade). When we also use color, especially hue, as a channel to represent other information in the data, we get unintended consequences. Opacity, combined with other color attributes, can change our perception of the color, creating encodings that make no sense. Let’s see this in action by adding opacity to our foreground / background example above:

Figure 14.34: Semi-transparency and multiple hues cause perception issues.

Notice also that a question arises: is orange or blue in the foreground? With this combination of attributes, we lose our ability to distinguish foreground from background.

14.6 Deconstructing Lupi’s “Nobels, No Degrees”

To apply our understanding of the grammar, let’s deconstruct an apparently complex visualization. Giorgia Lupi’s “Nobels, No Degrees” visualizes Nobel Prize winners who did not have a university degree. Here’s the original published graphic:

Figure 14.35: Giorgia Lupi’s original ‘Nobels, No Degrees’ as published in a newspaper. The hand-drawn aesthetic creates visual interest while encoding multiple data dimensions.

The graphic is visually striking, but what data does it encode, and through which visual variables? Let’s rotate the graphic and lift it off the newspaper page for closer study here:

Figure 14.36: Spencer’s reconstruction can be decomposed into a set of business graphics, each organized and aligned with one another.

You may notice that much of the complexity is really just organizing four different types of graphs next to one another, each on its own a more common business graphic: multiples of a scatter plot and line chart, stacked bar charts, histograms, and a Sankey (flow) diagram. Once we carefully separate out each of the graphics, we find standard applications of Bertin’s channels and attributes.

Exercise 14.3 (Deconstruction) Study the original Lupi graphic and identify:

  1. Data variables: What information about each Nobel laureate is being displayed? (Year, category, age, lifespan, etc.)

  2. Visual encodings: Which Bertin variables are used for each data variable? Are any data variables encoded redundantly through multiple visual channels?

  3. Layer structure: What is the logical order of layers? Which elements establish the framework (axes, categories) and which encode the data?

  4. Design choices: What makes this graphic memorable? Which encodings are essential for understanding the data, and which serve aesthetic purposes?

  5. Critique: According to the principles we’ve discussed, which encodings work well? Which might impede accurate interpretation?

14.7 Anatomy of a data graph

Every visualization consists of marks on a page—points, lines, shapes—positioned within a coordinate system and enhanced with various visual attributes. Understanding the structural components helps us make intentional design choices about what to emphasize and what to minimize. Figure 14.37 illustrates the standard components of a statistical graphic. Some components encode data directly: the marks themselves (points, lines, bars) and their visual properties (position, color, size). Other components provide essential context: axes tell us what values the positions represent, legends explain color or size encodings, and titles orient us to the graphic’s purpose.

Figure 14.37: The components of a statistical graphic. Some elements encode data directly; others provide context and support interpretation.

The balance between data-carrying elements and contextual elements shapes how easily audiences can interpret a visualization. Too little context, and viewers cannot decode the encodings. Too much decoration, and the data gets lost in visual noise. Effective design requires judgment about which components serve the communication goal and which merely add clutter.

Every component is optional—except the data marks themselves. A graphic with just points and no axes requires external context to interpret. A graphic with elaborate borders, backgrounds, and annotations may obscure the patterns the data reveals. The grammar of graphics gives us the vocabulary to discuss these components precisely and to modify them systematically.

14.8 Specifying layered graphics for language models

Having established the grammar of graphics and its implementation through ggplot2, we now turn to a practical question: how do we communicate these specifications to AI systems? The layered approach provides a natural structure for prompts—we can specify graphics layer by layer, following the same logical progression we use when building visualizations manually.

14.8.1 The system prompt as grammar specification

Effective prompts to language models require the same precision we demand in our own thinking. Just as Wilkinson’s grammar decomposes graphics into fundamental components, our prompts should decompose requests into systematic specifications. We begin with a system prompt that establishes the computational environment, coding conventions, and our layered approach to visualization.

NoteSystem Prompt: Layered Graphics Specification

You are an expert data visualization developer specializing in the grammar of graphics. I will describe visualizations layer by layer, following Wilkinson’s grammar: data operations, transformations, scales, coordinates, geometric elements, and aesthetic mappings.

Environment and Libraries:

  • Use R with tidyverse (ggplot2, dplyr, readr)
  • Connect operations with |> pipe syntax
  • Use readr::read_csv() for data loading

Grammar Implementation:

  • Build graphics incrementally through layers
  • Begin with the data layer (read and transform)
  • Add coordinate and scale layers (Cartesian, polar, etc.)
  • Add geometric elements (points, lines, areas)
  • Map aesthetics to data variables (position, size, color, etc.)
  • Finish with guides and theme adjustments

Coding Standards:

  • Minimal, working code without comments
  • Explicit aesthetic mappings (map data variables directly to visual properties like position, size, color)
  • Implement scale transformations as specified (linear, log, sqrt, reverse, etc.)
  • Reference data columns exactly as named in the dataset

I will specify each layer sequentially. Implement layers in the order described, building from foundational elements to final annotations.

This system prompt accomplishes several purposes. It establishes the grammar framework explicitly, ensuring the AI understands we are not requesting a “chart type” but rather constructing a visualization from composable elements. It specifies the software environment—R’s ggplot2 being the canonical implementation of the layered grammar. Most importantly, it commits both human and AI to a step-by-step layered approach, mirroring how experienced visualization practitioners actually work.

14.8.2 From grammar to language model prompt

Consider how we might specify the California crime quadrant visualization using our grammar framework. We have already deconstructed this graphic (see Figure 14.22 in the exercise above)—now we reconstruct it through layered specifications. The raw data is available here: https://github.com/ssp3nc3r/diw/blob/main/data/lacrime_yoy_changes.csv. This example will also demonstrate how we can add explanatory elements to provide context, a technique we will explore more fully when we discuss annotations and explanatory graphics in later sections. The key constraint: we specify from the data outward, without referring to the original visualization’s appearance. We let the grammar generate the appropriate visual form.

NotePrompt: California Crime Rate Changes

Create a visualization showing how California city crime rates changed between 2012 and 2013.

Data Layer: Load the dataset from data/lacrime_yoy_changes.csv. The dataset contains 37 rows—36 California cities with population > 150,000 plus one “CALIFORNIA” row representing the state average. Keep the California state row separate for later use; filter it out when preparing the city data.

Transformation Layer: For both data frames, create two derived variables:

  1. Calculate Total_Crime_Rate_2013 = Violent_Crime_Rate_2013 + Property_Crime_Rate_2013 (for bubble sizing)
  2. Convert percentage changes to proportions for axis scaling: divide Violent_Crime_Change and Property_Crime_Change by 100

Coordinate System: Use Cartesian coordinates with x-axis (violent crime change) and y-axis (property crime change). Set axis limits to -0.3 to 0.3 (representing -30% to +30% change).

Geometric Elements:

  1. Background layer: Add four rectangular annotations creating quadrant backgrounds:

    • Top-right (x>0, y>0): Fill with #F8DCDB (light red/pink)
    • Top-left (x<0, y>0): Fill with #EDEDEE (light gray)
    • Bottom-left (x<0, y<0): Fill with #D2DDEA (light blue)
    • Bottom-right (x>0, y<0): Fill with #EDEDEE (light gray)
  2. Reference layer: Add horizontal and vertical lines at y=0 and x=0 using gray70 color.

  3. Data layer (cities): Add points (geom_point) with:

    • Position: x = violent crime change proportion, y = property crime change proportion
    • Size: mapped to Total_Crime_Rate_2013 (range 3 to 30)
    • Color: #D17333 (burnt orange) with alpha = 0.75
  4. Data layer (California reference): Add one additional point for the California state average using a separate data frame:

    • Same position mappings as cities
    • Size: mapped to Total_Crime_Rate_2013
    • Color: black (#000000)
    • This provides a reference point for comparing individual cities against the state overall

Scales: Both axes should show percentage labels (-30%, -20%, -10%, 0%, 10%, 20%, 30%) but use the proportion values (-0.3 to 0.3) for actual plotting.

Guides: Label x-axis “Violent crime rate change” and y-axis “Property crime rate change”. Remove all legends.

Theme: Use theme_minimal(). Remove panel grid lines. Make axis text size 10, axis titles size 12 and bold. Set plot background to white. Remove legend.

Implementation: Generate working modern R code using ggplot2 with tidyverse syntax (pipe operator |>, geom_* functions, scale_* functions, and theme adjustments).

Consider the reasons we wrote the prompt this way:

This prompt exemplifies specification thinking applied to visualization. Notice several deliberate choices:

Grammar-first organization: Rather than saying “make a scatter plot with colored quadrants,” we specify each layer following Wilkinson’s grammar—data, transformations, coordinates, geometry, aesthetics, guides, theme. This ensures the AI constructs the visualization systematically rather than relying on defaults.

Data-to-visual mapping: Each aesthetic mapping is explicit. Position maps to crime rate changes, size maps to total crime rate, color is constant. Following Bertin’s principles, we use position (our most accurate channel) for the primary comparison and size (less accurate but good for magnitude) for secondary information.

Semantic color choices: The quadrant colors are not arbitrary—they encode meaning. Red tones signal “both crime types increased” (top-right), blue tones signal “both decreased” (bottom-left), gray signals mixed changes. These colors leverage cultural associations without needing explicit legends.

No reference to the original: Notice we never say “recreate the LA Times graphic” or “make it look like the original.” We specify from data and grammar outward, letting the visualization emerge from principled choices. This is crucial—when working with AI, we want the grammar to generate appropriate forms, not merely copy existing designs.

Explanatory annotations: The black California bubble serves as a reference point—a technique from explanatory graphics that helps viewers interpret individual city data relative to the state overall. By placing this distinctively colored point at the state average position, we enable comparisons without cluttering the graphic with text labels or additional legends. This demonstrates how layered graphics can incorporate explanatory elements within the grammar framework. We will come back to explanatory ideas later.

Implementation: As code in one form or another is needed to render the graphic, we should specify our preferred tools. These might be specialty graphics libraries designed specifically to apply the grammar of graphics, like ggplot2 (in R), plotnine or altair (in Python), or vega-lite or D3.js (in JavaScript). If we don’t specify a tool, the language model will choose its own, and we are no longer in control.

Let’s see what a result may look like:

Following this prompt, the AI produces code that faithfully implements the layered specification:

library(tidyverse)
library(readr)

# Data layer
crime_data <- read_csv("data/lacrime_yoy_changes.csv")

# Separate cities from state average
city_data <- crime_data |> 
  filter(City != "CALIFORNIA") |>
  mutate(
    Total_Crime_Rate_2013 = Violent_Crime_Rate_2013 + Property_Crime_Rate_2013,
    Violent_Change_Prop = Violent_Crime_Change / 100,
    Property_Change_Prop = Property_Crime_Change / 100
  )

ca_data <- crime_data |> 
  filter(City == "CALIFORNIA") |>
  mutate(
    Total_Crime_Rate_2013 = Violent_Crime_Rate_2013 + Property_Crime_Rate_2013,
    Violent_Change_Prop = Violent_Crime_Change / 100,
    Property_Change_Prop = Property_Crime_Change / 100
  )

# Coordinate, geometry, and aesthetic layers
ggplot(city_data, aes(x = Violent_Change_Prop, y = Property_Change_Prop)) +
  # Background quadrants
  annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = Inf, fill = "#F8DCDB") +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = 0, ymax = Inf, fill = "#EDEDEE") +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0, fill = "#D2DDEA") +
  annotate("rect", xmin = 0, xmax = Inf, ymin = -Inf, ymax = 0, fill = "#EDEDEE") +
  # Reference lines
  geom_hline(yintercept = 0, color = "gray70") +
  geom_vline(xintercept = 0, color = "gray70") +
  # City data points (orange)
  geom_point(aes(size = Total_Crime_Rate_2013), color = "#D17333", alpha = 0.75) +
  # California state reference point (black)
  geom_point(data = ca_data, aes(size = Total_Crime_Rate_2013), 
             color = "#000000", alpha = 0.9) +
  # Scales
  scale_x_continuous(limits = c(-0.3, 0.3), 
                     breaks = c(-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3),
                     labels = c("-30%", "-20%", "-10%", "0%", "10%", "20%", "30%")) +
  scale_y_continuous(limits = c(-0.3, 0.3), 
                     breaks = c(-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3),
                     labels = c("-30%", "-20%", "-10%", "0%", "10%", "20%", "30%")) +
  scale_size_continuous(range = c(3, 30)) +
  # Guides and theme
  labs(x = "Violent crime rate change", y = "Property crime rate change") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 12, face = "bold"),
        plot.background = element_rect(fill = "white"),
        legend.position = "none")

The resulting visualization (Figure 14.38) demonstrates how grammar-based specification produces effective graphics without requiring reference to existing designs. The layered structure—data preparation, coordinate system, background context, reference lines, data marks, scales, guides, and theme—emerges naturally from the specification.

Figure 14.38: Reproduction of the California crime quadrant visualization using layered grammar specification. The visualization emerges from systematic data-to-visual mappings following Wilkinson’s grammar and Bertin’s principles.

This approach scales to more complex visualizations. By training ourselves to think in layers—data, transformations, coordinates, geometry, aesthetics, guides, theme—we develop specifications that AI systems can implement reliably. More importantly, we develop the analytical discipline to deconstruct any visualization we encounter, understanding how its components serve (or fail to serve) its communicative purpose.

I should note, as this is one of your first encounters with a prompt in this book, that it may appear daunting—at over 200 words, it seems to require more effort than simply writing the code yourself. This verbosity serves a pedagogical purpose: it demonstrates the complete grammar of graphics in action, with each layer explicitly articulated so you can see how data flows through transformations into visual form. Even a modest laptop-sized model could work with it.

And consider what this specification produces. The resulting code spans approximately 50 lines, carefully structured across data preparation, coordinate systems, geometric elements, and aesthetic mappings. For this example, even with its verbosity, the ratio is roughly 4:1—four words of specification for every line of generated code. This investment pays dividends: the specification is technology-agnostic (the same prompt could generate Python with plotnine or altair, or JavaScript with D3.js), self-documenting (the intent is embedded in the specification itself), and reusable (you can modify individual layers without rewriting everything).

As you internalize the grammar, your prompts will naturally become more concise without sacrificing the grammar itself, and you can adjust them based on model capabilities. The verbose specification above is training wheels—necessary for learning, but eventually supplanted by fluency.

The key insight: we are not asking the AI to “make a chart.” We are describing a computational pipeline that transforms data into visual representations through systematic operations. Once you think in these terms, specification becomes as natural as describing your analysis to a colleague.

Let’s now return to the example from Knaflic as an exercise. We analyzed Cole Nussbaumer Knaflic’s ticket volume dashboard (see Figure 14.21). This graphic tells a compelling story: ticket processing capacity collapsed after two employees quit in May, creating a growing backlog. The visualization uses position along a common scale for temporal comparison, line elements to show continuity, and color hue to distinguish received versus processed tickets.

Now, using the layered grammar framework, write a specification prompt to recreate this visualization. The data file https://github.com/ssp3nc3r/diw/blob/main/data/knaflic-ticket-volume.csv contains 12 rows—one per month in 2014—with three columns: month, received, and processed.

Exercise 14.4 (Write a Layered Specification) Write a complete specification prompt that an AI could use to generate the Knaflic ticket volume visualization. Your specification should:

  1. Analyze the communication goal: What story does this graphic tell? How do the data support that narrative?

  2. Specify each layer following Wilkinson’s grammar:

    • Data: Load the CSV and identify the structure
    • Transformations: Any calculations needed?
    • Coordinate system: What axes? What limits?
    • Geometric elements: Lines? Points? Annotations?
    • Aesthetic mappings: Position to what variables? Color to what categories?
    • Scales: Linear? Any custom labels?
    • Guides: Axis labels, legends?
    • Theme: Clean, minimal, or detailed?
  3. Include implementation details: Request working R code using ggplot2.

  4. Add an annotation layer: Include a vertical reference line at May to highlight when the employees quit, plus text annotation explaining the insight.

Data structure reference:

month,received,processed
Jan,160,160
Feb,185,185
...
Dec,177,140

Test your prompt: Does it clearly separate data operations from visual mappings? Does it follow the layered approach? Would another reader understand what visualization should emerge? Now try changing the prompt to require Python code using either the plotnine or altair library. Then try again, asking for the result in D3.js and JavaScript as code within a single HTML file.

14.9 Looking ahead

We have established the foundations of visual design: the components of graphics, coordinate systems and scale transformations, visual encoding channels following Bertin’s framework, the layered grammar that structures our choices, and the practical implementation of these concepts in code. These foundations provide the theoretical framework for making design decisions.

But theory must be applied. Next, Chapter 16 delves deeper into the practical aspects of encoding data—exploring specific visual channels like position, length, angle, area, volume, color, and texture in greater detail. We will examine which channels are most effective for different types of data and communication goals, and we will learn to avoid common pitfalls that plague even experienced designers.

The goal is not merely to create graphics that are technically correct, but to craft visual communications that guide attention, reveal patterns, and enable understanding. The principles we have outlined here will serve as our compass in that endeavor.

We understand that effective visualization requires:

  1. Comparison as the source of meaning—single data points communicate nothing without context
  2. Appropriate coordinate systems and scales—the mathematical framework shapes perception
  3. Matching encodings to data types—following Bertin’s guidance on which visual variables suit which data
  4. Thinking in layers and grammar—not memorizing chart types but composing from fundamental elements
  5. Attention to order and occlusion—designers control what viewers see through layering decisions

With these principles in hand, we turn to the practical work of decoding visual information and understanding how human perception processes the encodings we create.


  1. Additional resources include Holtz and Healy (2018); Healy (2018); Knaflic (2015); Cleveland (1993); Cleveland (1985).↩︎