Data in Wonderland

Explores communication with data in various forms through seminal and cutting-edge ideas in writing, data analyses, and visualization.

Scott Spencer (Columbia University)

Narrative and story can enhance others’ understanding of data, offering meaning and insights as communicated in numbers and words, or in graphical encodings. Carefully combined, narratives, data analyses, and visuals can help enable change.

Empirical studies suggest that communication is generally more effective when its author controls all aspects of the communication, from content to typography and form. But in some cases, we may enhance the communication by allowing our audience to choose among potential contexts of the information. Here, we aim to explore many ideas within this framework, using as content a data analytics project. We will start by exploring our content (a proposed and implemented data analytics project) and our intended audiences (executives, more general audiences, and mixed audiences) through narrative, considering not just our words but the typographic forms of those words in the chosen communication medium. Then we begin to integrate other, visual forms of data representation into the narrative. As our discussions of graphical data encodings become more complex, we give them focus in the form of information graphics and dashboards, and finally, through interactive design, enable our audience to become, to some degree, co-authors of the communication.

The readers who will get the most from this text, and whom I have in mind as my audience, are curious, active learners:

An active learner asks questions, considers alternatives, questions assumptions, and even questions the trustworthiness of the author or speaker. An active learner tries to generalize specific examples, and devise specific examples for generalities.

An active learner doesn’t passively sponge up information — that doesn’t work! — but uses the readings and lecturer’s argument as a springboard for critical thought and deep understanding.

This text isn’t meant to be an end, but a beginning, giving you hand-selected, seminal and cutting-edge references for the concepts presented. Go down these rabbit holes, following citations and studying the cited material. Becoming an expert in storytelling with data also requires practicing. Indeed,

Learners need to practice, to imitate well, to be highly motivated, and to have the ability to see likenesses between dissimilar things in [domains ranging from creative writing to mathematics] (Gaut 2014).

You may find some concepts difficult or vague on a first read. For that, I’ll offer encouragement from Abelson (1995),

I have tried to make the presentation accessible and clear, but some readers may find a few sections cryptic …. Use your judgment on what to skim. If you don’t follow the occasional formulas, read the words. If you don’t understand the words, follow the music and come back to the words later.

Let’s dive in!

1 Narrative

1.1 Analytics Communication Scopes

1.1.1 Scopes

One place the general scope of analytics projects arises is within science (in which data science exists) and proposal writing. Let’s first consider the basic informational categories (in a typical but generic structure or order) that generally form a research proposal; see Friedland, Folt, and Mercer (2018), Oster and Cordo (2015), Schimel (2012), Oruc (2011), and Foundation (1998). Keep in mind that the particular information and ordering depend explicitly on our intended audience:

I.  Title
II. Abstract
III. Project description
     A.  Results from prior work
     B.  Problem statement and significance
     C.  Introduction and background
         1.  Relevant literature review
         2.  Preliminary data
         3.  Conceptual, empirical, or theoretical model
         4.  Justification of approach or novel methods
     D.  Research plan
         1.  Overview of research design
         2.  Objectives or specific aims, hypotheses, and methods
         3.  Analysis and expected results
         4.  Timetable
     E.  Broader impacts
IV. References cited
V.  Budget and budget justification

While these sections are generically labelled and ordered, each should be specific to an actual proposal. Let’s consider a few sections. The title, for example, should accurately reflect the content and scope of the overall proposal. The abstract frames the goals and scope of the study, briefly describes the methods, and presents the hypotheses and expected results or outputs. It also sets up proper expectations, so be careful to avoid misleading readers into thinking that the proposal addresses anything other than the actual research topic. Try for no more than two short paragraphs.

Within the project description, the problem statement and significance typically begins with the overall scope and then funnels the reader through the hypotheses to the goals or specific aims of the research.

The literature review sets the stage for the proposal by discussing the most widely accepted or influential papers on the research. The key here is to provide context and be able to show where the proposed work would extend what has been done or how it fills a gap or resolves uncertainty, etcetera. We will discuss this in detail later.

Preliminary data can help establish credibility, likely success, or novelty of the proposal. But we should avoid overstating the implications of the data or suggesting we’ve already solved the problem.

In the research plan, the goal is to keep the audience focused on the overall significance, objectives, specific aims, and hypotheses while providing important methodological, technological, and analytical details. It contains the details of the implementation, analysis, and inferences of the study. Our job is typically to convince our audience that the project can be accomplished.

Objectives refer to broad, scientifically far-reaching aspects of a study, while hypotheses refer to a more specific set of testable conjectures. Specific aims focus on a particular question or hypothesis and the methods needed and outputs expected to fulfill the aims. Of note, these objectives will typically have already been (briefly) introduced earlier, for example, in the abstract. Later sections add relevant detail.

If early data are available, show how you will analyze them to reach your objectives or test your hypotheses, and discuss the scope of results you might eventually expect. If such data are unavailable, consider culling data from the literature to show how you expect the results to turn out and to show how you will analyze your data when they are available. Complete a table or diagram, or run statistical tests using the preliminary or “synthesized” data. This can be a good way to show how you would interpret the results of such data.

From studying these generic proposal categories, we get a rough sense of what we, and our audiences, may find helpful in understanding an analytics project, results, and implications: the content we communicate. Let’s now focus on analytics projects in more detail.

Data measures in analytics projects

Data analytics projects, of course, require data. What, then, are data? Let’s consider what Kelleher and Tierney (2018) have to say in their aptly titled chapter, What are data, and what is a data set? Consider their definitions:

datum : an abstraction of a real-world entity (person, object, or event). The terms variable, feature, and attribute are often used interchangeably to denote an individual abstraction.

Data are the plural of datum. And:

data set : consists of the data relating to a collection of entities, with each entity described in terms of a set of attributes. In its most basic form, a data set is organized in an \(n \times m\) data matrix called the analytics record, where \(n\) is the number of entities (rows) and \(m\) is the number of attributes (columns).

Data may be of different types, including nominal, ordinal, and numeric. These have subtypes as well. Nominal types are names for categories, classes, or states of things. Ordinal types are similar to nominal types, except that it is possible to rank or order categories of an ordinal type. Numeric types are measurable quantities we can represent using integer or real values. Numeric types can be measured on an interval scale or a ratio scale. The data attribute type is important as it affects our choice of analyses and visualizations.

Data can also be structured (like a table) or unstructured (more like the words in this document). And data may be in a raw form such as an original count or measurement, or it may be derived, such as an average of multiple measurements, or a functional transformation. Normally, the real value of a data analytics project is in using statistics or modelling “to derive one or more attributes that provide insight into a problem” (Kelleher and Tierney 2018).
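To make these definitions concrete, here is a minimal sketch in Python of an analytics record. The entities, attribute names, and values are hypothetical, invented only for illustration; they do not come from Kelleher and Tierney (2018). The sketch shows one nominal, one ordinal, and one numeric attribute, and derives a new attribute from the raw measurements.

```python
from statistics import mean

# Hypothetical analytics record: each row is an entity (a bike trip),
# each key is an attribute. All names and values are invented.
records = [
    {"user_type": "member", "satisfaction": "high", "minutes": 12.5},
    {"user_type": "casual", "satisfaction": "low", "minutes": 31.0},
    {"user_type": "member", "satisfaction": "medium", "minutes": 8.0},
    {"user_type": "casual", "satisfaction": "high", "minutes": 22.5},
]

n = len(records)      # number of entities (rows)
m = len(records[0])   # number of attributes (columns)

# Nominal attribute: categories without order, so we only count them.
counts = {}
for row in records:
    counts[row["user_type"]] = counts.get(row["user_type"], 0) + 1

# Ordinal attribute: categories with a rank, so we can sort by them.
rank = {"low": 0, "medium": 1, "high": 2}
ordered = sorted(records, key=lambda row: rank[row["satisfaction"]])

# Numeric attribute (ratio scale): arithmetic is meaningful, so we can
# derive a new attribute, the mean trip duration per user type.
mean_minutes = {
    user_type: mean(r["minutes"] for r in records if r["user_type"] == user_type)
    for user_type in counts
}

print(n, m)
print(counts)
print(mean_minutes)
```

The attribute type determines which operation is sensible: counting for the nominal attribute, sorting for the ordinal one, and averaging only for the numeric one.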

Finally, existing data originally for one purpose may be used in an observational study, or we may conduct controlled experiments to generate data.

But it’s important to understand, on another level, what data represents. Lupi (2016) offers an interesting and helpful take,

Data represents real life. It is a snapshot of the world in the same way that a picture catches a small moment in time. Numbers are always placeholders for something else, a way to capture a point of view — but sometimes this can get lost.

Understanding data requires context

Data measurements never reveal all aspects relevant to their generation or impact upon our analysis (Loukissas 2019). Loukissas (2019) provides several interesting examples where the local information that generated the data matters greatly in whether we can fully understand the recorded or measured data. His examples include plant data in an arboretum, artifact data in a museum, collection data at a library, information in the news as data, and real estate data. Using these examples, he convincingly argues we need to shift our thinking from data sets to data settings.

Let’s consider another example, from baseball. In the game, a batter who hits the pitched ball over the outfield fence between the foul poles scores for his team: he hits a home run. But a batter’s home run count in a season does not tell us the whole story of their ability to hit home runs. Let’s consider some of the context in which a home run occurs. Batters hit a home run pitched by a specific pitcher, in a specific stadium, in specific weather conditions. All of these circumstances (and more) contribute to the existence of a home run event, but that context isn’t typically considered: sometimes partly, rarely completely.

Perhaps obviously, all pitchers have different abilities to pitch a ball in a way that affects a batter’s ability to hit the ball. Let’s leave that aside for the moment, and consider more concrete context.

In Major League Baseball there are 30 teams, each with its own stadium. But each stadium’s playing field is sized differently from the others, and each stadium’s outfield fence varies in height and differs from the fences of other stadiums! To explore this context in Figure 1, hover your cursor over a particular field or fence to link the two and compare them with others.

Figure 1: We cannot understand the outcome of a batter’s hit without understanding its context, including the distances and heights of each stadium’s outfield fences.

You can further explore this context in an award-winning, animated visualization (Vickars 2019, winner Kantar Information is Beautiful Awards 2019). Further, the trajectory of a hit baseball depends heavily on characteristics of the air, including density, wind speed, and direction (Adair 2017). The ball will not travel as far in cold, dense air. And density depends on temperature, altitude, and humidity. Some stadiums have a roof with conditioned air protected somewhat from weather. But most are exposed. Thus, we would learn more about the qualities of a particular batter’s home run if we understood it in the context of these data.
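The dependence of air density on temperature, altitude, and humidity can be sketched numerically. The following is a rough illustration, not a method taken from Adair (2017). It assumes the ideal gas law for a mixture of dry air and water vapor, a standard-atmosphere approximation for pressure at altitude, and the Tetens approximation for saturation vapor pressure. The function names and the example conditions are hypothetical.

```python
# Physical constants (standard textbook values).
R_DRY = 287.058     # specific gas constant of dry air, J/(kg*K)
R_VAPOR = 461.495   # specific gas constant of water vapor, J/(kg*K)


def pressure_at_altitude(altitude_m):
    """Approximate air pressure (Pa) from a standard-atmosphere formula."""
    return 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588


def saturation_vapor_pressure(temp_c):
    """Approximate saturation vapor pressure (Pa), Tetens formula."""
    return 610.78 * 10.0 ** (7.5 * temp_c / (temp_c + 237.3))


def air_density(temp_c, altitude_m=0.0, relative_humidity=0.0):
    """Density of moist air (kg/m^3) as a mix of dry air and water vapor.

    relative_humidity is a fraction between 0.0 and 1.0.
    """
    temp_k = temp_c + 273.15
    pressure = pressure_at_altitude(altitude_m)
    vapor = relative_humidity * saturation_vapor_pressure(temp_c)
    dry = pressure - vapor
    return dry / (R_DRY * temp_k) + vapor / (R_VAPOR * temp_k)


# Two hypothetical game-day settings.
cold_sea_level = air_density(5.0, altitude_m=0.0, relative_humidity=0.5)
warm_high_altitude = air_density(30.0, altitude_m=1600.0, relative_humidity=0.5)
print(round(cold_sea_level, 3), round(warm_high_altitude, 3))
```

Under these assumptions, lower temperature raises density, while higher altitude and higher humidity lower it, so the same hit would meet different air resistance in different settings.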

Other aspects of this game are equally context-dependent. Consider each recorded ball or strike, a call made by the umpire when the batter does not swing at the ball. The umpire’s call is intended to describe the location of the ball as it crosses home plate. But errors exist in that measurement. It depends on human perception, for one. We have independent measurements by a radar system (as of 2008). But that too has measurement error we can’t ignore. Firstly, there are 30 separate radar systems, one for each stadium. Secondly, those systems require periodic calibration. And calibration requires, again, human intervention. Moreover, the original radar systems installed in these stadiums in 2007 are no longer used. Different systems have been installed and used in their place. Thus, to fully understand the historical location of each pitched baseball and outcome means we must research and investigate these systems.

So when we really want to understand an event and compare among events (comparison is crucial for meaning), context matters. We’ve seen this in the baseball example, and in Loukissas’s several fascinating case study examples with many types of data. When we communicate about data, we should consider context, data settings.

Project scope

More on scope. At a high level, it involves an iterative progression of the identification and understanding of decisions, goals and actions, methods of analysis, and data.

Figure 2: Analytic components of a general statistical workflow, adapted from Pu and Kay (2018).

The framework of identifying goals and actions, and following with information and techniques gives us a structure not unlike having the outline of a story, beginning with why we are working on a problem and ending with how we expect to solve it. Just as stories sometimes evolve when retold, our ideas and structure of the problem may shift as we progress on the project. But like the well-posed story, once we have a well-scoped project, we should be able to discuss or write about its arc — purpose, problem, analysis and solution — in relevant detail specific to our audience.

Specificity in framing and answering basic questions is important: What problem is to be solved? Is it important? Does it have impact? Do data play a role in solving the problem? Are the right data available? Is the organization ready to tackle the problem and take actions from insights? These are the initial questions of a data analytics project. Project success inevitably depends on the specificity of our answers. Be specific.

Defining goals, actions, and problems

Identifying a specific problem is the first step in any project. And a well-defined problem illuminates its importance and impact. The problem should be solvable with identified resources. If the problem seems unsolvable, try focusing on one or more aspects of the problem. Think in terms of goals, actions, data, and analysis. Our objective is to take the outcome we want to achieve and turn it into a measurable and optimizable goal.

Consider what actions can be taken to achieve the identified goal. Such actions usually need to be specific. A well-specified project ideally has a set of actions that the organization is taking, or can take, that can now be better informed through data science. While improving on existing actions is a good general starting point in defining a project, the scope does not need to be so limited. New actions may be defined too. Conversely, if the stated problem and anticipated analyses do not inform an action, they are usually not helpful in achieving organizational goals. To optimize our goal, we need to define the expected utility of each possible action.

Researching and using what is known

The general point of data analyses is to add to the conversation of what is understood. An answer, then, requires research: what’s understood? In considering how to begin, we get help from J. Harris (2017) in another context, making interesting use of texts in writing essays. We need to “situate our [data analyses] about … an issue in relation to what others have written about it.” That’s the real point of the above “literature review” that funding agencies expect, and it’s generally the place to start our work.

Searching for what is known involves both the “literature” on whatever issue we’re interested in, and any related data.

Identifying accessible data

Do data play a role in solving the problem? Before a project can move forward, data must be both accessible and relevant to the problem. Consider what variables each data source contributes. While some data are publicly available, other data are privately owned and permission becomes a prerequisite. And to the extent data are unavailable, we may need to set up experiments to generate them ourselves. To be sure, obtaining the right data is usually a top challenge: sometimes the variable is unmeasured or not recorded.

In cataloging the data, be specific. Identify where data are stored and in what form. Are data recorded on paper or electronically, such as in a database or on a website? Are the data structured, such as a CSV file, or unstructured, like comments on a Twitter feed? Provenance is important (Moreau et al. 2008): how were the data recorded? By a human or by an instrument?

What is the quality of the data (Fan 2015)? Is there measurement error? Are observations missing? How frequently are the data collected? Are they available historically, or only in real time? Do the data have documentation describing what they represent? These are but a few questions whose answers may impact your project or approach. By extension, they affect what and how you communicate.

Identifying analyses and tools

Once data are collected, the workflow needed to bridge the gap between raw data and actions typically involves an iterative process of conducting both exploratory and confirmatory analysis (Pu and Kay 2018; see Figure 2), which employs visualization, transformation, modeling, and testing. The techniques potentially available for each of these activities may well be infinite, and each deserves a course of study in itself. Wongsuphasawat, Liu, and Heer (2019), as their title suggests, review common goals, processes, and challenges of exploratory data analysis.

Today’s tools for exploratory data analysis frequently begin by encoding data as graphics, thanks first to John Tukey, who pioneered the field in Tukey (1977), and to subsequent, applied work in graphically exploring statistical properties of data. Cleveland (1985), the first of Cleveland’s texts on exploratory graphics, considers basic principles of graph construction (e.g., terminology, clarifying graphs, banking, scales), various graphical methods (logarithms, residuals, distributions, dot plots, grids, loess, time series, scatterplot matrices, coplots, brushing, color, statistical variation), and perception (superposed curves, color encoding, texture, reference grids, banking to 45 degrees, correlation, graphing along a common scale). His second book, Cleveland (1993), builds upon the lessons of his first, and explores univariate data (quantile plots, Q-Q plots, box plots, fits and residuals, log and power transformations, etcetera), bivariate data (smooth curves, residuals, fitting, transforming factors, slicing, bivariate distributions, time series, etcetera), trivariate data (coplots, contour plots, 3-D wireframes, etcetera), and hypervariate data (using scatterplot matrices and linking and brushing). Chambers (1983) in particular explores and compares distributions, explicitly considers two-dimensional data, plots in three or more dimensions, assesses distributional assumptions, and develops and assesses regression models. These three are all helpfully abstracted from any particular programming language. Unwin (2016) applies exploratory data analysis using R, examining (univariate and multivariate) continuous variables, categorical data, relationships among variables, data quality (including missingness and outliers), and comparisons, through both single graphics and an ensemble of graphics.

While these texts thoroughly discuss approaches to exploratory data analyses to help data scientists understand their own work, these do not focus on communicating analyses and results to other audiences. In this text and through references to other texts we will cover communication with other audiences.

To effectively use graphics tools for exploratory analysis requires the same understanding, if not the same approach, we need for graphically communicating with others, which we explore, beginning in section 2.

Along with visualization, we can use regression to explore data, as is well explained in the introductory textbook Gelman, Hill, and Vehtari (2020), which also includes the use of graphics to explore models and estimates. These tools, and more, contribute to an overall workflow of analysis: Gelman et al. (2020) suggest best practices.

Again, the particular information and its ordering in any communication of these analyses and results depend entirely on our audience. After we begin exploring an example data analysis project, and consider workflow, we will consider audience.

1.1.2 Applying project scope: Citi Bike

Let’s develop the concept of project scope in the context of an example, one to help the Citi Bike bike share.

You may have heard about, or even rented, a Citi Bike in New York City. Researching the history, we learn that in 2013, the New York City Department of Transportation sought to start a bike share to reduce emissions, road wear, and congestion, and to improve public health. After the City selected an operator and sponsor, the Citi Bike bike share was established with a bike fleet distributed over a network of docking stations throughout the city. The bike share allows customers to unlock a bike at one station and return it to an empty dock at any other.

Might this be a problem for which we can find available data and conduct analyses to inform the City’s actions and further its goals?

Exercise 1 Explore the availability of bikes and docking spots as they depend on users’ patterns and behaviors, events and locations at particular times, other forms of transportation, and environmental context. What events may be correlated with or cause empty or full bike docking stations? What potential user behaviors or preferences may lead to these events? From what analogous things could we draw comparisons to provide context? How may these events and behaviors have been measured and recorded? What data are available? Where are they available? In what form? In what contexts are the data generated? In what ways may we find incomplete or missing data, or other errors in the stored measurements? May these data be sufficient to find insights through analysis, useful for decisions and goals?
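As one possible starting point for the questions about incomplete data and measurement errors, here is a minimal sketch of a data audit in Python. The trip records, column names, and values are hypothetical, invented for illustration; they are not the actual Citi Bike data format. The sketch counts missing fields per column and flags trips recorded as ending before they start.

```python
import csv
import io
from datetime import datetime

# Hypothetical trip records; the column names and values are invented
# and do not reflect the actual Citi Bike data format.
RAW = """trip_id,start_time,end_time,start_station,end_station
1,2019-06-01 08:00:00,2019-06-01 08:12:00,A,B
2,2019-06-01 08:05:00,,A,C
3,2019-06-01 09:30:00,2019-06-01 09:10:00,B,A
4,2019-06-01 10:00:00,2019-06-01 10:25:00,C,
"""

FMT = "%Y-%m-%d %H:%M:%S"


def audit(rows):
    """Count missing fields per column and list trips that end before they start."""
    missing = {}
    negative_duration = []
    for row in rows:
        for column, value in row.items():
            if value == "":
                missing[column] = missing.get(column, 0) + 1
        # Only compare times when both were recorded.
        if row["start_time"] and row["end_time"]:
            start = datetime.strptime(row["start_time"], FMT)
            end = datetime.strptime(row["end_time"], FMT)
            if end < start:
                negative_duration.append(row["trip_id"])
    return missing, negative_duration


rows = list(csv.DictReader(io.StringIO(RAW)))
missing, negative_duration = audit(rows)
print(missing)
print(negative_duration)
```

An audit like this only surfaces symptoms. Whether a missing end station reflects a lost bike, a software fault, or something else is a question about the data's setting that the numbers alone cannot answer.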

Answers to questions such as these provide necessary material for communication. Before digging into an analysis, let’s discuss two other aspects of workflow: reproducibility and code clarity.

1.1.3 Workflow for credible communication

Truth is tough. It will not break, like a bubble, at a touch; nay, you may kick it about all day, like a football, and it will be round and full at evening (Holmes 1894).

To be most useful, reproducible work must be credibly truthful, which means that our critics can test our language, our information, and our methodologies, from start to finish. That others have not done so led to the reproducibility crisis noted in Baker (2016):

More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

By reproducibility, this survey considers whether replicating a study resulted in the same statistically significant finding (some have argued that reproducibility as a measure should compare, say, a p-value across trials, not whether the p-value crossed a given threshold in each trial). Regardless, we should reproducibly build our data analyses like Holmes’s football, for our critics (later selves included) to kick about. What does this require? Ideally, our final product should include all components of our analysis, from thoughts on our goals, to identification of, and code for, collection of data, visualization, modeling, reporting, and explanations of insights. In short, the critic, with the touch of her finger, should be able to reproduce our results from our work. Perhaps that sounds daunting. But with some planning and use of modern tools, reproducibility is usually practical. Guidance on assessing reproducibility and a template for a reproducible workflow are described by Kitzes and co-authors (Kitzes, Turek, and Deniz 2018), along with a collection of more than 30 case studies. The authors identify three general practices that lead to reproducible work, to which I’ll add a fourth:

  1. Clearly separate, label, and document all data, files, and operations that occur on data and files.

  2. Document all operations fully, automating them as much as possible, and avoiding manual intervention in the workflow when feasible.

  3. Design a workflow as a sequence of small steps that are glued together, with intermediate outputs from one step feeding into the next step as inputs.

  4. The workflow should track your history of changes.
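The first three practices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the template from Kitzes, Turek, and Deniz (2018): the step names, file names, and station values are hypothetical. Each step is a small documented function that reads one file and writes another, so every intermediate output is saved and feeds the next step without manual intervention.

```python
import json
import tempfile
from pathlib import Path


def collect(out_path):
    """Step 1: write raw data to disk (hard-coded, hypothetical values)."""
    raw = [{"station": "A", "bikes": 3}, {"station": "B", "bikes": 0}]
    out_path.write_text(json.dumps(raw))


def transform(in_path, out_path):
    """Step 2: read the raw file, derive a flag, write an intermediate file."""
    raw = json.loads(in_path.read_text())
    flagged = [dict(row, empty=(row["bikes"] == 0)) for row in raw]
    out_path.write_text(json.dumps(flagged))


def report(in_path, out_path):
    """Step 3: summarize the intermediate file in a small text report."""
    flagged = json.loads(in_path.read_text())
    empty = [row["station"] for row in flagged if row["empty"]]
    out_path.write_text("Empty stations: " + ", ".join(empty) + "\n")


def run(workdir):
    """Glue the steps together; each output is the next step's input."""
    raw = workdir / "raw.json"
    tidy = workdir / "tidy.json"
    summary = workdir / "report.txt"
    collect(raw)
    transform(raw, tidy)
    report(tidy, summary)
    return summary.read_text()


# Run the whole workflow in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    result = run(Path(tmp))
print(result)
```

The fourth practice sits outside the code itself: keeping scripts like these under version control (Git is one common choice) records the history of changes.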

Several authors describe modern tools and approaches for creating a workflow that leads to reproducible research supporting credible communication; see Gandrud (2020) and Healy (2018a).

The workflow should include the communication. And the communication includes the code. What? Writing code to clean, transform, and analyze data may not generally be thought of as communicating. But yes! Code is language. And sometimes showing code is the most efficient way to express an idea. As such, we should strive for the most readable code possible. For our future selves. And for others. For code style advice, consult Boswell and Foucher (2011) and an update to a classic, Thomas and Hunt (2020).

1.2 Audiences and Challenges

1.2.1 Audiences

Sometimes in analytics we write for ourselves, part and parcel of analysis. In another context, writer Joan Didion captured this well:

I write entirely to find out what I’m thinking, what I’m looking at, what I see, and what it means (Didion 1976).

But our introspections are not optimal for others.

Analytics executives

Let’s imagine, for a moment, we share the mind of Citi Bike’s Chief Analytics Officer. We know what the public experiences with our bike sharing program, as has also been reported in the news. The newspaper West Side Rag, for example, quoted Dani Simmons, our program’s spokeswoman. She explained: “Rebalancing is one of the biggest challenges of any bike share system, especially in … New York where residents don’t all work a traditional 9-5 schedule, and … people work in a variety of other neighborhoods” (Friedman 2017). As a Chief Analytics Officer, one of our jobs includes analyzing, and overseeing the analyses of, data that inform decisions for solving problems for our organization (Zetlin 2017), like rebalancing.

Would it interest us if we opened an email or memo from one of our data analysts that began:

Citi Bike, a bike sharing program, has struggled to rebalance its bikes. By rebalance, I mean taking actions that ensure customers may both rent bikes and park them at the bike sharing program’s docking stations….

Are we — Citi Bike’s Chief Analytics Officer — motivated to continue reading? Do we know why we should, or whether we should spend our time on other matters? What might we think of the data analyst from whom we received that communication?

Returning, now, to our own minds: for what audience(s), if any, might such a beginning be interesting or helpful? How can we assess whether the communication is appropriate, even optimized, and if not, adjust it to be so? That’s the focus of this text.

As we hone our skills for communicating with intended audiences, we’ll consider other minds, too: executives in analytics, marketing, chief executives, both individually and mixed with secondary or more general audiences. Consider Scott Powers, Director of Quantitative Analytics at the Los Angeles Dodgers, who earned his doctorate in statistics from Stanford University, publishes research in machine learning, codes in R, among other languages, and has worked for the Dodgers for several years.

Marketing executives

A Chief Marketing Officer shares some responsibilities with the analytics officer and other executives, while other responsibilities are primarily her own. Broadly, she leads responses to changing circumstances; shapes products, sales strategies, and marketing ideas; and collaborates across the company. Meet David Carr, Director of Marketing Strategy and Analysis at Digitas (a marketing agency), who has written to, and about, his marketing colleagues and their uses and misuses of data.

Carr (2019) describes three main types of value that marketing drives:

  1. business value: long and near-term growth, greater efficiency and enhanced productivity
  2. consumer value: attitudes and behaviors that affect brand choice, frequency and loyalty
  3. cultural value: shared beliefs that create a favorable environment in which to operate and influence

He illustrates his research of, and experience with, these values graphically, as a central circle, and in concentric rings identifies various characteristics and details related to these values.

Relatedly, Carr (2016) has mapped out the details for designing and managing a brand, and explained its interconnections:

The brand strategy should be influenced by the business strategy and should reflect the same strategic vision and corporate culture. In addition, the brand identity should not promise what the strategy cannot or will not deliver. There is nothing more wasteful and damaging than developing a brand identity or vision based on strategic imperative that will not get funded. An empty promise is worse than no promise.

We can tie many aspects of brand building and marketing value to measurements and data. Carr (2018) explains how marketing does — and should — work with data. His article suggests how we should craft data-driven messages for marketing executives.

Chief executives

Typically, the analytics and marketing executives report, directly or indirectly, to the CEO, who has ultimate responsibility to drive the business. Bertrand (2009) reviews empirical studies on the characteristics of CEOs. Bertrand writes that, while “modern-day CEOs are more likely to be generalists,” more than one quarter of those running Fortune 500 companies have earned an MBA. The core educational components of the MBA program at Columbia, for example, include managerial statistics, business analytics, strategy formulation, marketing, financial accounting, corporate finance, managerial economics, global economic environment, and operations management (Columbia University 2020). This type of curriculum suggests the CEO’s vocabulary intersects with both analytics and marketing. Indeed, Bertrand explains that “current-day CEOs may require a broader set of skills as they directly interact with a larger set of employees within their organization.” While they may be fluent in the basics of analytics and marketing, their responsibilities are both broader and more focused on leading the drive for creating business value. Our communications with the CEO should begin with, and remain focused on, how the content of our communication helps the CEO with their responsibilities.

In communicating, we should keep in mind that audiences have a continuum of knowledge. Everyone is a specialist on some subjects and a non-specialist on others. Moreover, even a group of all specialists could be subdivided into more specialized and less specialized readers. Specialists want details. Specialists want more detail because they can understand the technical aspects, can often use these in their own work, and require them anyway to be convinced. Non-specialists need you to bridge the gap. The less specialized your audience, the more basic information is required to bridge the gap between what they know and what the document discusses: more background at the beginning, to understand the need for and importance of the work; more interpretation at the end, to understand the relevance and implications of the findings.

Exercise 2 Identify a few analytics, marketing, and chief executives, and research their backgrounds. Describe the similarities and differences, comparing the range of skills and experience you find.

1.2.2 Challenges

Information gaps

One challenge in communicating data analytics is understanding what we see. For that, we might consider again Didion (1976)’s thoughts as part of our project. We generally revise our written words and refine our thoughts together; the improvements made in our thinking and improvements made in our writing reinforce each other (Schimel 2012). Clear writing signals clear thinking. To test our project, then, we should clarify it in writing. Once it is clear, we can begin the processes of data collection, further clarify our understanding, begin technical work, again clarify our understanding, and continue the iterative process until we converge on interesting answers that support actions and goals.

More overlooked, to be explored here, is communicating our project effectively to others. Consider the skills typically needed for an analytics project. The qualities we need in an analytics team, writes Berinato (2018), include project management, data wrangling, data analysis, subject expertise, design, and storytelling. For that team to create value, they must first ask smart questions, wrangle the relevant data, and uncover insights. Second, the team must figure out — and communicate — what those insights mean for the business.

These communications can be challenging, however, as an interpretation gap frequently exists between data scientists and the executive decision makers they support, see (Maynard-Atem and Ludford 2020) and (Brady, Forde, and Chadwick 2017).

How can we address such a gap?

Brady and his co-authors argue that data translators should bridge the gap, address data hubris and decision-making biases, and find linguistic common ground. Subject-matter experts should be taught the quantitative skills to bridge the gap because, they continue, it is easier to teach quantitative theory than practical, business experience.

Before delving into the above arguments, let’s first consider from what perspective we’re reading. Both perspectives are written for business executives: Berinato writes in the Harvard Business Review, and Brady and his co-authors write in MIT Sloan Management Review. According to HBR, their “readers have power, influence, and potential. They are senior business strategists who have achieved success and continue to strive for more. Independent thinkers who embrace new ideas. Rising stars who are aiming for the top” (“HBR Advertising and Sales,” n.d.). Similarly, MIT Sloan Management Review reports their audience: “37% of MIT SMR readers work in top management, while 72% confirm that MIT SMR generates a conversation with friends or colleagues” (“Print Advertising Opportunities” 2020). Further, all authors are in senior management. Berinato is senior editor. Brady and co-authors are consultants focusing on sports management. Why might it be important we know both an author’s background and their intended audience?

Perhaps it is not surprising for a senior executive to conclude that it would be easier to teach data science skills to a business expert than to teach the subject of a business or field to those already skilled in data science. Is this generally true? Might the background of a data translator depend upon the type of business or type of data science? Is it appropriate for this data translator to be an individual? Berinato argues that data science work requires a team. Might the responsibility of a data translator be shared?

Bridging the gap requires developing a common language. Senior management do not all use the same vocabulary and terms as analysts. Decision makers seek clear ways to receive complex insights. Plain language, aided by visuals, allows easier absorption of the meaning of data. Along with common language, data translators should foster better communication habits. Begin with questions, not assertions. Then, use analogies and anecdotes that resonate with decision makers. Finally, whoever fills this role must hone their skills. Those skills include business and analytics knowledge, but they must also learn to speak the truth, be constantly curious to learn, craft accessible questions and answers, keep high standards and attention to detail, and be self-starters.

Multiple or mixed audiences

Frequently we encounter mixed audiences. Audiences are multiple, for each reader is unique. Still, readers can usefully be classified in broad categories on the basis of their proximity both to the subject matter (the content) and to the overall writing situation (the context). Primary readers are close to the situation in time and space. Uncertainty about the knowledge of a reader is like having a mixed audience, one knowing more than the other. Writing for a mixed audience is, thus, quite challenging. The challenge is to give secondary readers information that we assume the primary readers know already while keeping the primary readers interested. The solution, conceptually, is simple: just ensure that each sentence makes an interesting statement, one that is new to all readers, even if it includes information that is new to secondary readers only. Thus, make each sentence interesting for all audiences. Let’s consider Doumont (2009)’s examples for incorporating mixed audiences. The first sentence in his example,

We worked with IR.

may not work because IR may be unfamiliar to some in the audience. One might try to fix the issue by defining the word or, in this case, the acronym:

We worked with IR. IR stands for Information Resources and is a new department.

But that isn’t ideal either because those who already know the meaning aren’t given new information. It is, in fact, pedantic. The better approach is to weave additional information, like a definition, into the information that the specialist also finds interesting, like so:

We worked with the recently launched Information Resources (IR) department.

We’ll consider these within the context of effective business writing.

1.2.3 The utility of decisions

For data analysis to grab an audience’s attention, the communication of that analysis should answer their question, “so what?” Now that the audience knows what you’ve explained, what should come of it? What does it change?

To get to such an answer we need to think about the expected utility of the information in terms of that so what. This idea is a more formal quantification driving towards purpose for a particular audience. We’ll just introduce a few concepts without details here as this topic is advanced, given its placement in our text’s sequencing, but it’s important for future reference to have an awareness that these concepts exist.

We can combine probability distributions of expected outcomes and the utility of those outcomes to enable rational decisions (Parmigiani 2001); (Gelman et al. 2013, chap. 9). In simple terms, this means we can decide how much each possible outcome is worth and multiply that by the probability that each outcome happens. Then, we, or our audience, can choose the “optimal” one. Slightly more formally, optimal decisions choose an action that maximizes expected utility (minimizes expected loss), where the expectation is computed using a posterior distribution.
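The "multiply each outcome's worth by its probability, then choose" idea can be sketched in a few lines of Python. This is only a minimal illustration under invented assumptions: the action names `act` and `wait`, their probabilities, and their utilities are hypothetical and do not come from any project discussed in this text.

```python
def expected_utility(outcomes):
    """Sum probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


# Hypothetical decision: each action maps to its possible outcomes,
# given as (probability, utility) pairs. All numbers are invented.
actions = {
    "act": [(0.6, 100.0), (0.4, -80.0)],
    "wait": [(1.0, 20.0)],
}

# Expected utility of each action, then the action that maximizes it.
expected = {name: expected_utility(outs) for name, outs in actions.items()}
best_action = max(expected, key=expected.get)

print(expected)
print(best_action)
```

Note that the riskier action can still be the better choice in expectation even though it sometimes produces a loss, which is the kind of trade-off we would want to make explicit for a decision maker.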

Model choice is a special case of decision-making that uses a zero-one loss function: the loss is zero when we choose the correct model, and one otherwise. Beyond model selection, a business may, say, use negative profit as its loss function, so that the actions minimizing expected loss are those that maximize the expected profits arising from those actions. In more general, mathematical notation, we integrate over the product of the loss function and posterior distribution,

\[\begin{equation} \min_a\left\{ \bar{L}(a) = \textrm{E}[L(a,\theta)]= \int{L(a,\theta)\cdot p(\theta \mid D)\;d\theta} \right\} \end{equation}\]

where \(a\) are actions, \(\theta\) are unobserved variables or parameters, and \(D\) are data. The seminal work by von Neumann and Morgenstern (2004) set forth the framework for rational decision-making, and J. O. Berger (1985) is a classic textbook on the intersection of statistical decision theory and Bayesian analysis. Of note, the analyses of either-or decisions, and even sequential decisions, can be fairly straightforward. Complexity grows, however, when multiple objectives or agents are involved.
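When the posterior \(p(\theta \mid D)\) is available only as simulation draws, one common way to approximate the integral above is to average the loss over those draws and pick the action with the smallest average. The sketch below assumes a hypothetical stocking decision: the simulated `theta_draws` merely stand in for real posterior draws of demand, and the `loss` function with its `price` and `cost` values is invented for illustration.

```python
import random
import statistics

random.seed(1)

# Stand-in for posterior draws of theta given data D. In practice these
# would come from a fitted model; here they are simulated (an assumption).
theta_draws = [random.gauss(100.0, 15.0) for _ in range(10_000)]


def loss(a, theta, price=5.0, cost=3.0):
    """Hypothetical loss: negative profit from stocking `a` units
    when demand turns out to be `theta`."""
    units_sold = min(a, max(theta, 0.0))
    return -(price * units_sold - cost * a)


def expected_loss(a, draws):
    """Monte Carlo estimate of E[L(a, theta)]: the mean loss over the draws."""
    return statistics.fmean(loss(a, theta) for theta in draws)


# Evaluate a grid of candidate actions and keep the one with minimal expected loss.
candidate_actions = range(60, 141, 5)
best_action = min(candidate_actions, key=lambda a: expected_loss(a, theta_draws))

print(best_action, round(expected_loss(best_action, theta_draws), 1))
```

Because the loss penalizes overstocking and understocking differently, the chosen action need not equal the average demand, which is one reason to decide on expected loss rather than on a single point estimate.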

Communicating our results in terms of the utility for decisions will help us bridge the gap from analysis to audience. Again, utility is an advanced topic for its placement in this text. Just have some awareness that we can inform decisions through such a process and, to the extent these details are vague, don’t worry. Move on for now.

1.3 Elements of Writing

1.3.1 Purpose, audience, and craft

As we prepare to scope, and work through, a data analytics project, we must communicate variously if it is to have value. The reproducible workflow shown in the last chapter at least provides value as a communication to its immediate audience (its authors, as a reference for what they accomplished) and to those with similar needs4 to understand and critique the logical progression and scope of the project. That form of communication, however, will be less valuable than other communication forms for different audiences and purposes. Given the information compiled from our project (the content), we now consider communicating various aspects for other purposes and audiences.

The importance of adjusting communication is not unique to data analytics. Let’s consider, say, how communication form differs for a news story versus an op-ed. A long-time op-ed editor at the New York Times explains,

The approach to argument that I learned in classes at Berkeley was much more similar to an op-ed than the inverted pyramid of daily journalism or the slow, anecdotal flow of feature stories that had dominated my professional life (Hall 2019).

The qualities of an op-ed piece must be, she writes: “surprising, concrete, and persuasive.” These qualities are similar to those we need in business communication, which generally drives decisions. All business writing begins with a) identifying the purpose for communicating and b) understanding your audiences’ scopes of knowledge and responsibilities in the problem context. Neither is trivial; both require research. To motivate this discussion, let’s consider and deconstruct two example memos, one for Citi Bike and the other for the Dodgers, written for data science projects. It will be helpful for this exercise to place both memos side-by-side for comparison as we work through them below.

Communication structure

Let’s begin discussing the communication structure from several perspectives: purpose, narrative structure, sentence structure, and effective redundancy through hierarchy. Then, we consider audiences, story, and the importance of revision.

Purpose and audience

In the first example, we return to Citi Bike. After project ideation and scoping, we want to ask Citi Bike’s head of data analytics to let us write a more detailed proposal to conduct the data analytics project. We accomplish this in 250 words. The fully composed draft memo follows,

Example 1 (250-word Citi Bike memo)
Michael Frumin
Director of Product and Data Science
for Transit, Bikes, and Scooters at Lyft

To inform the public on rebalancing, let’s re-explore docking
availability and bike usage with subway and weather.

Let’s re-explore station and ride data in the context of subway and weather information to gain insight for “rebalancing,” what our Dani Simmons explains is “one of the biggest challenges of any bike share system, especially in … New York where residents don’t all work a traditional 9-5 schedule, and though there is a Central Business District, it’s a huge one and people work in a variety of other neighborhoods as well” (Friedman 2017).

Recalling the previous, public study by Columbia University Center for Spatial Research (Saldarriaga 2013), it identified trends in bike usage using heatmaps. As those visualizations did not combine dimensions of space and time, which the public would find helpful to see trends in bike and station availability by neighborhood throughout a day, we can begin our analysis there.

We’ll use published data from NYC OpenData and The Open Bus Project, including date, time, station ID, and ride instances for all our docking stations and bikes since we began service. To begin, we can visually explore the intersection of trends in both time and location with this data to understand problematic neighborhoods and, even, individual stations, using current data.

Then, we will build upon the initial work, exploring causal factors such as the availability of alternative transportation (e.g., subway stations near docking stations) and weather. Both of which, we have available data that can be joined using timestamps.

The project aligns with our goals and shows the public that we are, in Simmons’s words, “innovative in how we meet this challenge.” Let’s draft a detailed proposal.

Scott Spencer

Friedman, Matthew. “Citi Bike Racks Continue to Go Empty Just When Upper West Siders Need Them.” News. West Side Rag (blog), August 19, 2017.

Saldarriaga, Juan Francisco. “CitiBike Rebalancing Study.” Spatial Information Design Lab, Columbia University, 2013.

It begins, in the title of this memo, with our purpose of writing, to conduct data analysis on specifically identified data to inform the issue of rebalancing, one of Citi Bike’s goals:

To inform rebalancing, let’s explore docking and bike availability in the context of subway and weather information.

This is what Doumont (2009) calls a message. We should craft communications with messages, not merely information. Doumont explains that a message differs from raw information in that it presents “intelligent added value,” that is, something to understand about the information. A message interprets the information for a specific audience and for a specific purpose. It conveys the so what, whereas information merely conveys the what. What makes our title a message? Before answering this, let’s compare one of Doumont’s examples of information to that of a message. This sentence is mere information:

A concentration of 175 \(\mu g\) per \(m^3\) has been observed in urban areas.

A message, in contrast to information, would be the so what:

The concentration in urban areas (175 \(\mu g / m^3\)) is unacceptably high.

In our title, we request an action, approval for the exploratory analysis on specified data, for a particular purpose, to inform rebalancing. This purpose also implies the so what: unavailable bikes or docking slots, unbalanced stations, are bad. We’re asking to help remedy the issue.

This beginning, if effective, is so only because we wrote it for a particular audience. Our audience is the head of data analytics at Citi Bike, who presumably knows the problem of rebalancing; it is well-known in, and beyond, the organization. Thus, our sentence implicitly refers back to information our audience already knows5. Relying on his or her knowledge means we do not need to first explain what rebalancing is or why it is a problem.

Let’s introduce a second example before digging further into the structure of the Citi Bike memo. Having multiple examples to analyze has the added benefit of allowing us to induce some general, but effective, writing principles.

Considering a second example, professional teams in the sport of baseball, including the Los Angeles Dodgers, make strategic decisions within the boundaries of the sport’s rules for the purpose of winning games. One of those rules involves stealing bases, as in figure 3.

Figure 3: In a close call, the baseball umpire spread his arms, signaling that the Dodgers baserunner successfully ran to the base faster than the catcher could throw the ball to the base to get him out — the runner stole the base. Knowing when to try to steal is strategic and depends on other events that are at least partly measured as data.

This concept is part of our next example, written to Scott Powers, Director of Quantitative Analytics at the Los Angeles Dodgers. He earned his doctor of philosophy in statistics from Stanford, has authored publications in machine learning, knows R programming, and, as an employee of the Dodgers, knows their history. Powers manages a team of data scientists; their responsibilities include assessing player and team performance. The draft memo follows,

Example 2 (250-word Dodgers memo)
Scott Powers
Director of quantitative analysis
at Los Angeles Dodgers

Our game decisions should optimize expectations.
Let’s test the concept by modeling decisions to steal.

Our Sandy Koufax pitched a perfect game, the most likely event sequence, only once: those, we do not expect or plan. Since our decisions based on other most likely events don’t align with expected outcomes, we leave wins unclaimed. To claim them, let’s base decisions on expectations flowing from decision theory and probability models. A joint model of all events works best, but we can start small with, say, decisions to steal second base.

After defining our objective (e.g. optimize expected runs) we will, from Statcast data, weight everything that could happen by its probability and accumulate these probability distributions. Joint distributions of all events, an eventual goal, will allow us to ask counterfactuals — “what if we do this” or “what if our opponent does that” — and simulate games to learn how decisions change win probability. It enables optimal strategy.

Rational and optimal, this approach is more efficient for gaining wins. For perspective, each added win from the free-agent market costs 10 million, give or take, and the league salary cap prevents unlimited spend on talent. There is no cap, however, on investing in rational decision processes.

Computational issues are being addressed in Stan, a tool that enables inferences through advanced simulations. This open-source software is free but teaching its applications will require time. To shorten our learning curve, we can start with Stan interfaces that use familiar syntax (like lme4) but return joint probability distributions: R packages rethinking, brms, or rstanarm. Perfect games aside, we can test the concept with decisions to steal.

Scott Spencer

The beginning of this memo reads:

Our game decisions should optimize expectations. Let’s test the concept by modeling decisions to steal.

The first three words, the subject of this sentence, signal to our audience something they have experience with: our game decisions. It’s familiar. Then, we provide a call to action: our game decisions should optimize expectations. Let’s test the concept by modeling decisions to steal. As with the Citi Bike memo (example 1), the Dodgers memo (example 2) begins with a message and stated purpose. And as with Citi Bike, the information in the title message of the Dodgers memo is shared knowledge with our audience. In fact, we rely on the educational background of our audience, who we know has earned a doctor of philosophy in statistics, when including the concept to “optimize expectations”6 without first explaining what that is, because we know, or from the audience’s background can assume, that the audience understands the concept.

So in both cases, we have begun with language and topics already familiar to the audience, which follows the more general writing advice from Doumont (2009), who instructs us to

put ourselves in the shoes of the audience, anticipating their situation, their needs, their expectations. Structure the story along their line of reasoning, recognizing the constraints they might bring: their familiarity with the topic, their mastery of the language, the time they can free for us.

What else do we know about chief analytics officers in general? Their jobs require them to be efficient with their time. Thus, by starting with our purpose, letting them know what we want them to do, we are considerate of their “constraints” and “time they can free for us.”

Beginning with a purpose and call-to-action also allows the executive to understand the memo’s relevance to them, in terms of their decision-making, immediately; they have a reason to continue reading.

The persuasive power of beginning with the main message for your audience, or issue relevant to your audience, is nearly as timeless as it is true. Cicero, the Roman philosopher with a treatise on rhetoric, explained that we must not begin with details because “it forms no part of the question, and men are at first desirous to learn the very point that is to come under their judgment” (Cicero and Watson 1986).

Next, let’s review the structure of these memos to see whether we’ve “structur[ed] the story along their line of reasoning.”

Common ground

Let’s compare the first sentences of the body of both examples. The Citi Bike memo begins,

Let’s re-explore station and ride data in the context of subway and weather information to gain insight for “rebalancing,” broadening the factors we’ve told the public that “one of the biggest challenges of any bike share system, especially in … New York where residents don’t all work a traditional 9-5 schedule, and though there is a Central Business District, it’s a huge one and people work in a variety of other neighborhoods as well” (Friedman 2017).

This sentence starts with the title request, and then ties the purpose (rebalancing) to corporate goals. It does so by quoting the company’s spokesperson, which serves as evidence of the so what. Offering, and accepting, Simmons’s quote serves a second purpose in writing: it helps to establish common ground with our audience.

If we want to affect the behaviors and beliefs of the person in front of us, we need to first understand what goes on inside their head and establish common ground. Why? When you provide someone with new data, they quickly accept evidence that confirms their preconceived notions (what are known as prior beliefs) and assess counterevidence with a critical eye (Sharot 2017). Four factors come into play when people form a new belief: our old belief (this is technically known as the “prior”), our confidence in that old belief, the new evidence, and our confidence in that evidence. Focusing on what you and your audience have in common, rather than what you disagree about, enables change. Let’s check for common ground in the Dodgers memo. The first sentence of the body begins,

Our Sandy Koufax pitched a perfect game, the most likely event sequence, only once: those, we do not expect or plan.

Sandy Koufax is one of the most successful Dodgers players in the history of the franchise. He is one of fewer than 25 pitchers in the history of Major League Baseball to pitch a perfect game, something extraordinary. Our audience, as an employee of the Dodgers, will be familiar with this history. It is also something very positive, and shared, between author and audience. It is intended to establish common ground, in two ways. Along with that positive, shared history, it sets up an example of a statistical mode, one that we know the audience would agree is unhelpful for planning game strategy because it is too rare, even if it is a statistical mode. It helps to create common ground or agreement that it may not be best to use statistical modes for making decisions.

In both memos, we are also trying to use an interesting fact that may be unexpected or surprising in this context (Sandy Koufax, Dani Simmons) to grab our audience’s attention. In journalism, this is one way to create the lead. Zinsser (2001)7 explains that the most important sentence in any communication is the first one. If it doesn’t induce the reader to proceed to the second sentence, your communication is dead. And if the second sentence doesn’t induce the audience to continue to the third sentence, it’s equally dead. Readers want to know — very soon — what’s in it for them.

Your lead must capture the audience immediately, cajoling them with freshness, or novelty, or paradox, or humor, or surprise, or with an unusual idea, or an interesting fact, or a question. Next, it must provide hard details that tell the audience why the piece was written and why they ought to read it.

Details

At this point in both memos, we have begun with information familiar to our audience, relevant to their job in decision-making, and established our purpose. We have also started with information they would agree with. We’ve created common ground. The stage is set. What’s next? Here are the next two sentences in the body of the Citi Bike memo:

Recalling the previous, public study by Columbia University Center for Spatial Research (Saldarriaga 2013), it identified trends in bike usage using heatmaps. As those visualizations did not combine dimensions of space and time, which the public would find helpful to see trends in bike and station availability by neighborhood throughout a day, we can begin our analysis there.

These two sentences introduce previous work (background) on rebalancing and its limitations, and propose to start where the prior work stopped. This accomplishes two objectives. First, it helps our audience understand beginning details of our proposed project. Second, it helps the audience see that our proposed work is not redundant to what we already know. Thus, we began the details of our proposed solution. What is described in the Dodgers memo at a similar point? This:

To claim them, let’s base decisions on expectations flowing from decision theory and probability models. A joint model of all events works best, but we can start small with, say, decisions to steal second base.

As with Citi Bike, the next two sentences start introducing details of the proposed project.

After introducing the nature of the proposed project in both memos, we identify data that makes the proposed project feasible. In the Citi Bike memo we identify specific categories of data and the publicly available source of those data:

We’ll use published data from NYC OpenData and The Open Bus Project, including date, time, station ID, and ride instances for all our docking stations and bikes since we began service.

Similarly, in the Dodgers memo,

After defining our objective (e.g. optimize expected runs) we will, from Statcast data, weight everything that could happen by its probability and accumulate these probability distributions.

It may seem we are less descriptive of the data than in the Citi Bike memo, but the label “Statcast” signals to our particular audience a group of specific, publicly available variables collected by the Statcast system, see (Willman, n.d.) and (Willman 2020).

Having identified data, both memos then describe more details of our proposed methodology. In the Citi Bike memo, we discuss two stages. First, we plan to graphically explore specific variables in search of specific trends:

To begin, we can visually explore the intersection of trends in both time and location with this data to understand problematic neighborhoods and, even, individual stations, using current data.

Then, we specifically identify additional data we plan to join and explore as causal factors for problem areas:

Then, we will build upon the initial work, exploring causal factors such as the availability of alternative transportation (e.g., subway stations near docking stations) and weather. Both of which, we have available data that can be joined using timestamps.

Similarly, in the Dodgers memo, we go into the planned methodology. We plan to model expectations from the data:

…from Statcast data, weight everything that could happen by its probability and accumulate these probability distributions.

Benefits

Having described our data and methodology in both memos, we now describe some benefits. In the Citi Bike memo,

The project aligns with our goals and shows the public that we are, in Simmons’s words, “innovative in how we meet this challenge.”

And in the Dodgers memo, perhaps because we believe the benefits are comparatively less obvious, or less proven, we further develop them:

Joint distributions of all events, an eventual goal, will allow us to ask counterfactuals — “what if we do this” or “what if our opponent does that” — and simulate games to learn how decisions change win probability. It enables optimal strategy.

Rational and optimal, this approach is more efficient for gaining wins. For perspective, each added win from the free-agent market costs 10 million, give or take, and the league salary cap prevents unlimited spend on talent. There is no cap, however, on investing in rational decision processes.

Limitations

In the Citi Bike memo, we didn’t identify limitations. Should we?

In the Dodgers memo, we do, while also explaining how we plan to overcome those limitations:

Computational issues are being addressed in Stan, a tool that enables inferences through advanced simulations. This open-source software is free but teaching its applications will require time. To shorten our learning curve, we can start with Stan interfaces that use familiar syntax (like lme4) but return joint probability distributions: R packages rethinking, brms, or rstanarm.

Conclusion

Finally, we wrap up in both memos. In the Citi Bike memo, after echoing the quote from Simmons, we state,

Let’s draft a detailed proposal.

Again, the Dodgers memo is similar. There, we circle back to our introduction to Sandy Koufax and his perfect game, then conclude,

Perfect games aside, we can test the concept with decisions to steal.

Again, this idea of echoing something from where we began is journalism’s complement to the lead.

Zinsser explains that, ideally, the ending should encapsulate the idea of the piece and conclude with a sentence that jolts us with its fitness or unexpectedness. Consider bringing the story full circle — to strike at the end an echo of a note that was sounded at the beginning. It gratifies a sense of symmetry.

Executives’ lines of reasoning commonly, but do not always, follow the general document structure described above. If we don’t have information otherwise, this is a good start.

Narrative structure

The above ideas — tools — are helpful in structuring and writing persuasive memos, and longer communications for that matter. And as writing lengthens, the next couple of related tools can be especially helpful in refining the narrative structure in a way that holds our audience’s interest by creating tension. German dramatist Gustav Freytag in the late 19th century illustrated a narrative arc8 used in Shakespearean dramas:

The primary elements of an applied analytics project may be thought of as a well-articulated business problem, a data science solution, and a measurable outcome to produce value for the organization. The analytics project may thus be conceptualized as a narrative arc, with a beginning (problem), middle (analytics), and end (overcoming of the problem), along with characters (analysts, colleagues, clients) who play important roles.

Duarte (2010) used the narrative arc to conceptualize an interesting alternative way to think about structure that creates tension: alternating what is with what may be:

We can repeat this approach, see figure 4, switching between what is and what may be to maintain a sense of tension or interest throughout a narrative arc. Once you become aware, you may be surprised how much you find writing in this form.

Figure 4: Duarte illustrates the repeated switching between what is and what may be, which helps to hold audience interest throughout a narrative.

This juxtaposition of two states for creating tension is another form of comparison.

Exercise 3 Revisit the two memos, examples 1 and 2. Identify sentences or paired sentences that shift focus from what is to what could be, creating a contrast.

Let’s now look more closely at sentence structure.

Sentence structure

Structuring sentences so that old information comes before new generally improves understanding. The concept has also been described as an information unit. “The information unit is what its name implies: a unit of information. Information, in this technical grammatical sense, is the tension between what is already known or predictable and what is new or unpredictable” (Halliday and Matthiessen 2004). As a general principle, “readers follow a story most easily if they can begin each sentence with a character or idea that is familiar to them, either because it was already mentioned or because it comes from the context” (J. M. Williams, Bizup, and Fitzgerald 2016).

Consider the alternative flow of information: new information before old. Reversing the information flow this way will likely confuse your audience. This point was clearly demonstrated in a classic movie, Memento,

Figure 5: Poster for Memento, a movie designed to purposefully confuse the audience by narrating the story (partly) in reverse.

where Director Christopher Nolan tells the story of a man with anterograde amnesia (the inability to form new memories) searching for someone who attacked him and killed his wife, using an intricate system of Polaroid photographs and tattoos to track information he cannot remember. The story is presented as two different sequences of scenes interspersed during the film: a series in black-and-white that is shown chronologically, and a series of color sequences shown in reverse order (simulating for the audience the mental state of the protagonist). The two sequences meet at the end of the film, producing one complete and cohesive narrative. Yet, the reversed order is (effectively) designed to hold the audience in confusion so that they may get a sense of the confusion experienced by someone with this illness. Indeed, that we demonstrate this with film implies that ordering in visual representation matters too, and it does. As such, we revisit this in the context of images.

For similar reasons, explain complex information last. This is particularly important in three contexts: introducing a new technical term, presenting a long or complex unit of information, and introducing a concept before developing its details. And just as the old-before-new pattern helps to convey messages, so too does expressing crucial actions in verbs. Make your central characters the subjects of those verbs; keep those subjects short, concrete, and specific.

Exercise 4 Revisit the Dodgers memo again, example 2. This time, for each sentence and the words within it, try to identify whether each word or phrase is new or old. When determining this, consider the words, phrases, and sentences preceding the one under analysis. Of note, you may also consider the audience’s background knowledge as a form of information.

Layering and hierarchy

Most communications benefit from providing multiple levels at which the narrative may be read. Even emails and memos (concise communications) enable two layers, the title and the main body. Thus, the title should inform the audience of the relevance of the communication: what it is the author wants them to do or know. It should also, or at least, invite the audience to learn more through the details of the main body. As the communication lengthens, more layers may be used. The title’s purpose remains the same, as does the main body’s. But we may add middle layers, headers and subheaders (Doumont 2009). These should not be generic. Instead, the audience should be able to read just these and understand the gist of the communication. This concept is well established where we intend persuasive communication. A well-known instructor of legal writing, Guberman (2014) explains how to draft this middle layer:

Strong headings are like a good headline for a newspaper article: they give [the audience] the gist of what [they] need to know, draw [them] into text [they] might otherwise skip, and even allow … a splash of creativity and flair.

The old test is still the best. Could [the audience] skim your headings and subheadings and know why [they should act]?

A good way to provide these “signposts” is to make your headings complete thoughts that, if true, would push you toward the finish line.

In accord, Scalia and Garner (2008) write:

Since clarity is the all-important objective, it helps to let the reader know in advance what topic you’re to discuss. Headings are most effective if they’re full sentences announcing not just the topic but your position on that topic.

In short, headings should be what Doumont (2009) calls messages. Headings provide “effective redundancy.” The redundancy gained from headers may be twofold. They, first, introduce your message before the detailed paragraphs and, second, may be collected up front, as a table of contents. Even short communications benefit from headers, and communications of at least several pages will likely benefit from such a table of contents along with headers.

Story

At this point, we’ve identified a problem or opportunity upon which our entity may decide to act. We’ve found data and considered how we might uncover insights to inform decisions. We’ve scoped an analytics project. In beginning to write, we’ve considered document structure, sentence structure, and narrative. We’ve also begun to consider our audience.

Can we — should we — use story to help us communicate? Consider its use in the context of an academic debate on the question: (Krzywinski and Cairo 2013a), (Katz 2013), and (Krzywinski and Cairo 2013b); and for more perspective: (Gelman and Basbøll 2014).

Exercise 5 Explain whether, and why or why not, you believe any of the arguments in the debate about using story to explain scientific results are correct.

Let’s now focus on how we may directly employ story. A little research on story, though, reveals differences in use of the term. “Story, n.” (2015) defines story generally as a narrative:

An oral or written narrative account of events that occurred or are believed to have occurred in the past…

Distinguished novelist E.M. Forster famously described story as “a series of events arranged in their time sequence.” Of note, he also compares and distinguishes story from plot: “plot is also a narrative of events, the emphasis falling on causality: ‘The king died and then the queen died’ is a story. But ‘the king died and then the queen died of grief’ is a plot. The time sequence is preserved, but the sense of causality overshadows it” (Forster 1927). But not just any narrative works for moving our audiences to act. Harari (2014) explains,

the difficulty lies not in telling the story, but in convincing everyone else to believe it. . .

Let’s consider other points of view.

To understand the narrative arc of successful stories, John Yorke studied numerous stories, and from those induced general principles (Yorke 2015). A journalist and author has also studied narrative structure, but with a different approach: he focuses on the cognitive science and psychology of how our mind works and relates those characteristics to story (Storr 2020). Story, writes Storr, typically begins with “unexpected change,” the “opening of an information gap,” or both. Humans naturally want to understand the change, or close that gap; it becomes their goal. The messages and information that close the gap, then, form what we may think of as the narrative’s plot.

Indeed, Storr (2020) suggests that the varying so-called designs for plot structure9 are all really different approaches to describing change:

But I suspect that none of these plot designs is actually the ‘right’ one. Beyond the basic three acts of Western storytelling, the only plot fundamental is that there must be regular change, much of which should preferably be driven by the protagonist, who changes along with it. It’s change that obsesses brains. The challenge that storytellers face is creating a plot that has enough unexpected change to hold attention over the course of an entire novel or film. This isn’t easy. For me, these different plot designs represent different methods of solving that complex problem. Each one is a unique recipe for a plot that moves relentlessly forwards, builds in intrigue and tension and never stops changing.

A quantitative analysis of over 100,000 narratives suggests this too (Robinson 2017).

We evolved for recognizing change, and for cause and effect. Thus, a narrative or story is driven forward by linking together change after change as instances of cause and effect. Indeed, to create initial interest, we only need to foreshadow change.

Let’s consider some examples that use information gaps and narratives in the context of data science. The short narratives in Wainer (2016b), each about a data science concept that people frequently misunderstand, are exemplary. He begins each of these by setting up a contrast or information gap. In chapter 1, for example, the author teaches the “Rule of 72” as a heuristic to think about compounding quantities by posing a question10 to set up an information gap:

Great news! You have won a lottery and you can choose between one of two prizes. You can opt for either:

  1. $10,000 every day for a month, or

  2. One penny on the first day of the month, two on the second, four on the third, and continued doubling every day thereafter for the entire month.

Which option would you prefer?

Similarly, in chapter 2, Wainer (2016b) teaches us implications of the law of large numbers by exposing an information gap. Again, he uses a question to set up the gap:

“Virtuosos becoming a dime a dozen,” exclaimed Anthony Tommasini, chief music critic of the New York Times in his column in the arts section of that newspaper on Sunday, August 14, 2011.

But why?

Once he has set up his narratives, he bridges the gap. Let’s keep in mind that his purpose in these stories is audience awareness: to teach. We can adapt these narrative11 concepts, though, in communications for other purposes.
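To see how the gap in the lottery question closes, the arithmetic can be sketched in a few lines of R. This is only an illustration, not Wainer's own presentation: the 30-day month is an assumption (the puzzle says only "a month"), and all variable names are hypothetical. The last lines check the Rule of 72 heuristic (years to double is roughly 72 divided by the percent growth rate) against the exact doubling time for a few moderate rates.

```r
# Assumption: a 30-day month (the quoted puzzle only says "a month").
days <- 30

# Option 1: $10,000 every day.
option_1_total <- 10000 * days

# Option 2: one penny on day 1, doubling every day thereafter.
pennies_per_day <- 2^(0:(days - 1))
option_2_total <- sum(pennies_per_day) / 100  # cents to dollars

# First day on which the running total of option 2 overtakes option 1.
running_1 <- 10000 * seq_len(days)
running_2 <- cumsum(pennies_per_day) / 100
crossover_day <- which(running_2 > running_1)[1]

# Rule of 72 versus the exact doubling time, for growth rates of 4% to 10%.
rate_pct <- 4:10
rule_of_72 <- 72 / rate_pct
exact_doubling <- log(2) / log(1 + rate_pct / 100)

print(c(option_1 = option_1_total, option_2 = option_2_total))
print(crossover_day)
print(round(cbind(rate_pct, rule_of_72, exact_doubling), 2))
```

Under the 30-day assumption, option 1 pays $300,000, while the doubling pennies total a little over $10.7 million and pull ahead on day 25. The surprise at that result is the information gap being closed.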

Exercise 6 By this discussion, are the Citi Bike and Dodgers memos stories? If not, what may they lack? If so, explain what structure or form makes them stories. Do the story elements (or would they, if used) add persuasive effect?

Revise for the audience

“We write a first draft for ourselves; the drafts thereafter increasingly for the reader” (J. Williams and Colomb 1990). Revision lets us switch from writing to understand to writing to explain. Switching audience is critical, and not doing so is a common mistake. Schimel (2012) explains one manifestation of the error:

Using an opening that explains a widely held schema is a flaw common with inexperienced writers. Developing scholars are still learning the material and assimilating it into their schemas. It isn’t yet ingrained knowledge, and the process of laying out the information and arguments, step by step, is part of what ingrains it to form the schema. Many developing scholars, therefore, have a hard time jumping over this material by assuming that their readers take it for granted. Rather, they are collecting their own thoughts and putting them down. There is nothing wrong with explaining things for yourself in a first draft. Many authors aren’t sure where they are going when they start, and it is not until the second or third paragraph that they get into the meat of the story. If you do this, though, when you revise, figure out where the real story starts and delete everything before that.

Revision gives us opportunity to focus on our audience once we understand what we have learned. This benefit alone is worth revision.

But it does more, especially when we allow time to pass between revisions: “If you start your project early, you’ll have time to let your revised draft cool. What seems good one day often looks different the next” (Booth et al. 2016a). As you revise, read aloud. While normal conversations do not typically follow grammatically correct language, well-written communications should smoothly flow when read aloud. Try reading this sentence aloud, following the punctuation:

When we read prose, we hear it…it’s variable sound. It’s sound with — pauses. With emphasis. With, well, you know, a certain rhythm (Goodman 2008).

And when revising, consider each word and phrase, and test whether removing that word or phrase changes the context or meaning of the sentence for your audience. If not, remove it. In a similar manner, when choosing between two words with equally precise meaning, it is generally best to use the word with fewer syllables or that flows more naturally when read aloud.

Consider this 124-word blog post for what became a data-for-good project:

Improving Traffic Safety Through Video Analysis

Nearly 2,000 people die annually as a result of being involved in traffic-related accidents in Jakarta, Indonesia. The city government has invested resources in thousands of traffic cameras to help identify potential short-term (e.g. vendor carts in a hazardous location) and long-term (e.g. poorly engineered intersections) safety risks. However, manually analysing the available footage is an overwhelming task for the city’s Transportation Agency. In support of the Jakarta Smart City initiative, our team hopes to build a video-processing pipeline to extract structured information from raw traffic footage. This information can be integrated with collision, weather, and other data in order to build models which can help public officials quickly identify and assess traffic risks with the goal of reducing traffic-related fatalities and severe injuries.

The authors identified their audience explicitly in their award-winning write-up (Caldeira et al. 2018),

We want this project to provide a template for others who hope to successfully deploy machine learning and data driven systems in the developing world. . . . These lessons should be invaluable to the many researchers and data scientists who wish to partner with NGOs, governments, and other entities that are working to use machine learning in the developing world.

Exercise 7 Explain the similarities and differences in structure, categories of information, and level of detail between the above blog post and our two example memos.

Exercise 8 Explain the similarities and differences you would expect between their intended audience and the chief analytics executive for the City of Jakarta, considering the background and experience you would expect of that executive.

Exercise 9 For this exercise, your relationship to the chief analytics executive at Jakarta would be that of employee or consultant (your choice). Revise the above blog post into a 250-word memo addressed to the person you imagine as the chief analytics executive for the City of Jakarta, with the purpose of gaining his or her approval for you to move forward with the project, beginning with a formal proposal. You have the exclusive benefit of their post-project write-up, but don’t use any described actual results in your revisions.

1.3.2 Persuasion

Should we use data science to persuade others? The late Robert Abelson thought so when he published Abelson (1995). But since then, this question has been under the public eye as we try to correct the replication crisis we mentioned in section 1.1.3. A special interest group has formed in service of this correction. Wacharamanotham et al. (2018) explain,

we propose to refer to transparent statistics as a philosophy of statistical reporting whose purpose is to advance scientific knowledge rather than to persuade. Although transparent statistics recognizes that rhetoric plays a major role in scientific writing [citing Abelson], it dictates that when persuasion is at odds with the dissemination of clear and complete knowledge, the latter should prevail.

Gelman (2018) poses the question, too:

Consider this paradox: statistics is the science of uncertainty and variation, but data-based claims in the scientific literature tend to be stated deterministically (e.g. “We have discovered … the effect of X on Y is … hypothesis H is rejected”). Is statistical communication about exploration and discovery of the unexpected, or is it about making a persuasive, data-based case to back up an argument?

Only to answer:

The answer to this question is necessarily each at different times, and sometimes both at the same time.

Just as you write in part in order to figure out what you are trying to say, so you do statistics not just to learn from data but also to learn what you can learn from data, and to decide how to gather future data to help resolve key uncertainties.

Traditional advice on statistics and ethics focuses on professional integrity, accountability, and responsibility to collaborators and research subjects.

All these are important, but when considering ethics, statisticians must also wrestle with fundamental dilemmas regarding the analysis and communication of uncertainty and variation.

Gelman seems to place persuasion with deterministic statements and contrasts that with the communication of uncertainty.

Exercise 10 How do you interpret Gelman’s statement? Must we trade uncertainty for persuasive arguments? Discuss these issues and the role of persuasion, if any, in the context of a data analytics project.

Methods of persuasion

A means of persuasion “is a sort of demonstration (for we are most persuaded when we take something to have been demonstrated),” writes Aristotle (Aristotle and Reeve 2018). Consider, first, appropriateness of timing and setting, kairos. Can the entity act upon the insights from your data analytics project, for example? What effect might acting at another time or place have on the audience? Second, arguments should be based on building common ground between listener and speaker, or listener and third-party actor. Common ground may emerge from shared emotions, values, beliefs, ideologies, or anything else of substance. Aristotle referred to this as pathos. Third, arguments relying on the knowledge, experience, credibility, integrity, or trustworthiness of the speaker (ethos) may emerge from the character of the advocate or from the character of another within the argument, or from the sources used in the argument. Fourth, the argument from common ground to solution or decision should be based on the syllogism or the syllogistic form, including those of enthymemes and analogies. Called logos, this is the logical component of persuasion, which may reason by framing arguments with metaphor, analogy, and story that the audience would find familiar and recognizable. Persuasion, then, can be understood as researching the perspectives of our audience about the topic of communication, and moving from their point of view “step by step to a solution, helping them appreciate why the advocated position solves the problem best” (Perloff 2017). The success of this approach is affected by our accuracy and transparency.

Exercise 11 In the Citi Bike memo example 1, identify possible audience perspectives of the communicated topic. In what ways, if at all, did the communication seek to start with common ground? Do you see any appeals to credibility of the author or sources? What forms of logic were used in trying to persuade the audience to approve of the request? Consider whether other or additional approaches to kairos, pathos, ethos, and logos could improve the persuasive effect of the communication.
Exercise 12 In the Dodgers memo example 2, identify possible audience perspectives of the communicated topic. In what ways, if at all, did the communication seek to start with common ground? Do you see any appeals to credibility of the author or sources? What forms of logic were used in trying to persuade the audience to take action? Consider whether other or additional approaches to kairos, pathos, ethos, and logos could improve the persuasive effect of the communication.
Exercise 13 In the second Dodgers example — the draft proposal — is the communication approach identical to that in the Dodgers memo? If not, in what ways, if at all, did the communication seek to start with common ground? Do you see any appeals to credibility of the author or sources? What forms of logic were used in trying to persuade the audience to take action? Consider whether other or additional approaches to kairos, pathos, ethos, and logos could improve the persuasive effect of the communication.
Exercise 14 As with the above exercises, examine your draft data analytics memo. Identify how the audience may view the current circumstances and solution to the problem or opportunity you have described. Remember that it tends to be very difficult to see through our biases, so ask a colleague to help provide perspective on your audience’s viewpoint. Have you effectively framed the communication using common ground? Explain.

Accuracy

Narrative arguments must avoid any temptation for overstatement. Strunk and White (2000) warn:

A single overstatement, wherever or however it occurs, diminishes the whole, and a carefree superlative has the power to destroy, for readers, the object of your enthusiasm.

Two prominent legal scholars, one a former United States Supreme Court Justice, agree. Scalia and Garner (2008) explain:

Scrupulous accuracy consists not merely in never making a statement you know to be incorrect (that is mere honesty), but also in never making a statement you are not certain is correct. So err, if you must, on the side of understatement, and flee hyperbole. . . Inaccuracies can result from either deliberate misstatement or carelessness. Either way, the advocate suffers a grave loss of credibility from which it is difficult to recover.

As in law, so too in the context of arguments supporting research:

But in a research argument, we are expected to show readers why our claims are important and then to support our claims with good reasons and evidence, as if our readers were asking us, quite reasonably, Why should I believe that?… Instead, you start where your readers do, with their predictable questions about why they should accept your claim, questions they ask not to sabotage your argument but to test it, to help both of you find and understand a truth worth sharing (p. 109)…. Limit your claims to what your argument can actually support by qualifying their scope and certainty (p. 129) (Booth et al. 2016b).

Transparency

Edward R. Tufte (2006a) explains, “The credibility of an evidence presentation depends significantly on the quality and integrity of the authors and their data sources.”

Be accurate. Be transparent.

Syllogism and enthymeme

Leaving aside emotional appeals [for the moment], persuasion is possible only because all human beings are born with a capacity for logical thought. It is something we all have in common. The most rigorous form of logic, and hence the most persuasive, is the syllogism (Scalia and Garner 2008).

Syllogisms are one of the most basic tools of logical reasoning and argumentation. They are structured arguments, constructed with a major premise, a minor premise, and a conclusion. Formally, the structure is of the form,

All A are B.

C is A.

Therefore, C is B.

Such rigid use of “all” and “therefore” isn’t necessary; what’s necessary is the meaning of each premise and conclusion.
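For readers who prefer symbols, the same form can be sketched in first-order notation. The predicates \(A\) and \(B\) and the constant \(c\) simply mirror the letters used above; the notation is an illustrative assumption here, not something the cited sources use.

\[
\forall x\,\bigl(A(x) \rightarrow B(x)\bigr), \quad A(c) \;\vdash\; B(c)
\]

Read left to right: the major premise says every \(A\) is a \(B\), the minor premise says \(c\) is an \(A\), and the conclusion \(B(c)\) follows. In the familiar textbook instance, \(A\) is "is a man," \(B\) is "is mortal," and \(c\) is Socrates.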

We may sometimes abbreviate the syllogism, leaving one of the premises implied (enthymeme). The effectiveness of this approach depends upon whether your audience will, from their knowledge and experience, naturally fill in the implied gap in logic.

Syllogism and enthymeme are powerful tools for persuasion. But the persuasive effect may be compromised by various audience biases and perceptions of credibility, discussed above (as tested experimentally; see Copeland, Gunawan, and Bies-Hernandez 2011 and Evans, Barston, and Pollard 1983). Logic also serves as a building block for a rhetoric of narrative, i.e., a narrative that convinces the audience.

Narrative as argument

A rhetoric of narrative is logical, but also emotive and ethical (Rodden 2008). It may seem surprising to find argument common in fiction12, and its value grows with non-fiction and communication for business purposes. A rhetorical narrative functions, if effective, by adjusting its ideas to its audience, and its audience to its ideas. The idea, in this sense, includes the sequence of events that demonstrate change or contrast, introduced earlier. To enable action on an issue (in Aristotle’s words, dispositio), it was essential to state the case through description (writing imaginable pictures) and narration (telling stories) (Aristotle and Reeve 2018).

Consider our two memo examples 1 and 2. Does either elicit images in its narrative? Explain. In the Citi Bike memo, what might be a reason for quoting Dani Simmons? Does that reason compare with or differ from how you perceive possible reasons for referencing Sandy Koufax in the Dodgers example?

Priming and emotion

An introductory story can prime an audience for our main message:

priming is what happens when our interpretation of new information is influenced by what we saw, read, or heard just prior to receiving that new information. Our brains evaluate new information by, among other things, trying to fit it into familiar, known categories. But our brains have many known categories, and judgments about new information need to be made quickly and efficiently. One of the “shortcuts” our brains use to process new information quickly is to check the new information first against the most recently accessed categories. Priming is a way of influencing the categories that are at the forefront of our brains (L. L. Berger and Stanchi 2018).

As we make decisions based on emotion (Damasio 1994), and we may even start with emotion and back into supporting logic (Haidt 2001), we can introduce our messages with emotional priming, too. Yet we should be careful with this approach, as audiences may feel manipulated and become resistant, or even opposed, to our message.

Tone of an argument

When trying to persuade, authors sometimes approach changing minds too directly:

Many of us view persuasion in macho terms. Persuaders are seen as tough-talking salespeople, strongly stating their position, hitting people over the head with arguments, and pushing the deal to a close. But this oversimplifies matters. It assumes that persuasion is a boxing match, won by the fiercest competitor. In fact, persuasion is different. It’s more like teaching than boxing. Think of a persuader as a teacher, moving people step by step to a solution, helping them appreciate why the advocated position solves the problem best. Persuasion, in short, is a process (Perloff 2017).

Try gradually leading audiences to act, framing your message as the more reasonable among options, compromising, or any combination of these. Showing our audience that our message is the more reasonable among options requires discussing those other options. If we do not discuss alternatives, and our audience knows of them or learns of them, they may find our approach less credible, and thus less persuasive, because we did not consider them in advocating our message.

Narrative patterns

Stories are built upon narrative patterns (Riche et al. 2018). These include patterns for argumentation, the action or process of reasoning systematically in support of an idea, action, or theory. Patterns for argumentation serve the intent of persuading and convincing audiences. Let’s consider three such patterns: comparison, concretize, and repetition.

Comparison allows the narrator to show equality of both data sets, to explicitly highlight differences and similarities, and to give reasons for their difference. We have already seen various forms of graphical comparison used for understanding. Knaflic (2015) offers an example showing graphical comparison to support a call to action, see figure 6.

Figure 6: Knaflic’s example uses comparison to persuade its audience to hire employees.

Concretizing, another type of pattern useful in argumentation, shows abstract concepts with concrete objects. This pattern usually implies that each data point is represented by an individual visual object (e.g., a point or shape), making them less abstract than aggregated statistics. Let’s consider, first, an example from Reuters. In their article, Scarr and Hernandez (2019) encode data as individual images of plastic bottles collecting over time (figure 7), and also compare the collections with familiar references, to demonstrate the severity of plastic misuse.

Figure 7: Authors use individual images of bottles to concretize the problem with plastic.

From a persuasive point of view, how does this form of data encoding compare with their secondary graphic, see figure 8, in the same article:

Figure 8: This graphic reports plastic (mis)use graphically and through annotation.

Exercise 15 Do the two graphics intend to persuade in different ways? Explain.

Here’s another example from news, the New York Times. Manjoo (2019) represents each instance of tracking an individual who browsed various websites. Figure 9 represents a snippet from the full information graphic. The full graphic concretizes each instance of being tracked. Notice each colored dot is timestamped and labeled with a location. The intended effect is to convey an overwhelming sense to the audience that online readers are being watched — a lot.

Figure 9: This snippet of the information graphic shows concretizing each instance of tracking someone’s every browser click online to create an overwhelming sense of being watched.

Review the full infographic and consider whether the use of concretizing each timestamped instance, labeled by location, heightens the realization of being tracked more than just reading the more abstract statement that “hundreds of trackers followed me.”
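To make the contrast between an aggregate and a concretized encoding tangible, here is a minimal base-R sketch. It is not taken from any of the cited graphics: the count of 300 events is invented for illustration, and the grid layout and variable names are assumptions of this sketch. The left panel reports a single total; the right panel draws one dot per event.

```r
# Hypothetical data: 300 tracking events (an invented count, for illustration).
n_events <- 300

# Lay the events out on a grid, one cell per event.
n_cols <- 30
events <- data.frame(
  id  = seq_len(n_events),
  col = (seq_len(n_events) - 1) %% n_cols + 1,
  row = (seq_len(n_events) - 1) %/% n_cols + 1
)

op <- par(mfrow = c(1, 2))

# Aggregated: a single bar reporting the total.
barplot(n_events, names.arg = "events", main = "Aggregated")

# Concretized: one dot per event, so the audience sees every instance.
plot(events$col, events$row, pch = 16, axes = FALSE,
     xlab = "", ylab = "", main = "Concretized")

par(op)
```

Both panels carry the same number. The question for the communicator is which one makes the audience feel its size.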

Like concretizing, repetition is an established pattern for argumentation. Repetition can increase a message’s importance and memorability, and can help tie together different arguments about a given data set. Repetition can be employed as a means to search for an answer in the data. Let’s consider another information graphic, which exemplifies this approach. Roston and Migliozzi (2015) use several rhetorical devices intended to persuade the audience that greenhouse gases cause global warming. A few of the repeated graphics and questions are shown in figure 10, reproduced from the article.

Figure 10: Repetition is used in several ways in this graphic-based news story.

Statistical persuasion

Let’s consider, now, how statistics informs persuasion.

Comparison is crucial

We’ve touched upon the importance of comparison. Edward R. Tufte (2006a) explains the centrality of comparison, “The fundamental analytical act in statistical reasoning is to answer the question ‘Compared with what?’”

Abelson (1995), too, forcefully argues that comparison is central: “The idea of comparison is crucial. To make a point that is at all meaningful, statistical presentations must refer to differences between observation and expectation, or differences among observations.” Abelson tests his argument through a statistical example,

The average life expectancy of famous orchestral conductors is 73.4 years.

He asks: Why is this important; how unusual is this? Would you agree that answering his question requires some standards of comparison? For example, should we compare with orchestra players? With non-famous conductors? With the public? With other males in the United States, whose average life expectancy was 68.5 at the time of the study reported by Abelson? With other males who have already reached the age of 32, the average age of appointment to a first conducting post, almost all of whom are male? This group’s average life expectancy was 72.0.

Elements of statistical persuasion

Several properties of data, and its analysis and presentation, govern its persuasive force. Abelson describes these as magnitude of effects, articulation of results, generality of effects, interestingness of argument, and credibility of argument: MAGIC.

Magnitude of effects. The strength of a statistical argument is enhanced in accord with the quantitative magnitude of support for its qualitative claim. Consider describing effect sizes like the difference between means, not dichotomous tests. The information yield from null hypothesis tests is ordinarily quite modest, because all one carries away is a possibly misleading accept-reject decision. To drive home this point, let’s model a realization from a linear relationship between two independent, random variables \(\textrm{normal}(x \mid 0, 1)\) and \(\textrm{normal}(y \mid 1, 1)\) by simulating them in R as follows:

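# Draw 1,000 independent values for each variable; neither depends on the other
y <- rnorm(n = 1000, mean = 1, sd = 1)
x <- rnorm(n = 1000, mean = 0, sd = 1)

We then model them using a linear regression:

# Regress y on x; the true slope is zero because the draws are independent
model_fit <- lm(y ~ x)

The fit yields a “statistically significant” p-value for the slope on x: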

Dependent variable: y
x | -0.063** (0.031)
Constant | 1.005*** (0.030)
Observations | 1,000
R2 | 0.004
Adjusted R2 | 0.003
Residual Std. Error | 0.959 (df = 998)
F Statistic | 4.211** (df = 1; 998)
Note: *p<0.1; **p<0.05; ***p<0.01

Yet we know there is no actual relationship between the two variables. p-values say little, and can mislead. Here’s what a p-value of less than, say, 0.01 means:

If it were true that there were no systematic difference between the means in the populations from which the samples came, then the probability that the observed means would have been as different as they were, or more different, is less than one in a hundred. This being strong grounds for doubting the viability of the null hypothesis, the null hypothesis is rejected.
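One way to see what this definition implies is to repeat the simulation many times and count how often two independent variables produce a “significant” slope. The following is a minimal sketch, not the author's code; the seed and the number of repetitions (`n_sims`) are arbitrary assumptions.

```r
# Repeat the independent-variables simulation and record each slope p-value.
set.seed(1)      # arbitrary seed, chosen only for reproducibility

n_sims <- 1000   # number of simulated data sets (an assumption)
n_obs  <- 1000   # observations per data set, matching the example in the text

slope_p_values <- replicate(n_sims, {
  y <- rnorm(n = n_obs, mean = 1, sd = 1)
  x <- rnorm(n = n_obs, mean = 0, sd = 1)
  # Extract the p-value reported for the coefficient on x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of simulations flagged "significant" at the conventional 0.05 threshold
false_positive_rate <- mean(slope_p_values < 0.05)
false_positive_rate
```

Because the null hypothesis is true by construction, the share should land near the chosen threshold. Roughly one analysis in twenty would report a relationship that does not exist.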

More succinctly, we might say a p-value is the probability of getting data at least as extreme as ours, given that the null hypothesis is true: mathematically, \(P(\textrm{Data} \mid \textrm{Hypothesis})\). There are two issues with this. First, and most problematic, the threshold for what we’ve decided is significant is arbitrary, based entirely upon convention pulled from a historical context not relevant to much of modern analysis.

Second, a p-value is not what we usually want to know. Instead, we want to know the probability that our hypothesis is true, given the data, \(P(\textrm{Hypothesis} \mid \textrm{Data})\), or better yet, we want to know the possible range of the magnitude of effect we are estimating. To get the probability that our hypothesis is true, we also need to know the prior probability of the hypothesis and the probability of getting the data if the hypothesis were not true:

\[ P(\textrm{H} \mid \textrm{D}) = \frac{P(\textrm{D} \mid \textrm{H}) P(\textrm{H})}{P(\textrm{D} \mid \textrm{H}) P(\textrm{H}) + P(\textrm{D} \mid \neg \textrm{H}) P(\neg \textrm{H})} \]
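A small numeric sketch may help show how the three inputs combine. The function name `posterior` and all three input values are hypothetical, chosen only to illustrate the arithmetic of the formula above.

```r
# Posterior probability of a hypothesis H given data D, via Bayes' rule.
# p_d_given_h:     P(D | H)
# p_d_given_not_h: P(D | not H)
# prior_h:         P(H), so P(not H) = 1 - prior_h
posterior <- function(p_d_given_h, p_d_given_not_h, prior_h) {
  numerator   <- p_d_given_h * prior_h
  denominator <- numerator + p_d_given_not_h * (1 - prior_h)
  numerator / denominator
}

# Hypothetical inputs: data are likely under H, unlikely otherwise,
# but H starts out with a low prior probability.
posterior(p_d_given_h = 0.80, p_d_given_not_h = 0.05, prior_h = 0.10)
# 0.64
```

Even with data that are sixteen times more likely under the hypothesis than under its negation, the low prior keeps the posterior at 0.64. A likelihood alone cannot give us that number.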

Decisions are better informed by comparing effect sizes and intervals. Whether exploring or confirming analyses, show results using an estimation approach: use graphs to show effect sizes and interval estimates, and offer nuanced interpretations of results. Avoid the pitfalls of dichotomous tests and p-values. Dragicevic (2016) writes, “The notion of binary significance testing is a terrible idea for those who want to achieve fair statistical communication.” In short, p-values alone do not typically provide strong support for a persuasive argument. Favor estimating and using magnitude of effects. Let’s briefly consider the remaining characteristics of statistical persuasion that Abelson describes:

Articulation of results. The degree of comprehensible detail in which conclusions are phrased. This is a form of specificity. We want to honestly describe and frame our results to maximize clarity (minimizing exceptions or limitations to the result) and parsimony (focusing on consistent, connected claims).

Generality of effects. This is the breadth of applicability of the conclusions. Over what context can the results be replicated?

Interestingness of argument. For a statistical story to be theoretically interesting, it must have the potential, through empirical analysis, to change what people believe about an important issue.

Credibility of argument. Refers to believability of a research claim, requiring both methodological soundness and theoretical coherence.
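To make the first of these properties, magnitude of effects, concrete, the sketch below reports an estimate and an interval for the simulated regression rather than an accept-reject decision. It is an illustration under assumptions: the seed is arbitrary, so the numbers will differ from the table shown earlier.

```r
set.seed(2)  # arbitrary seed for reproducibility (an assumption)

# Same setup as the example in the text: two independent normal variables
y <- rnorm(n = 1000, mean = 1, sd = 1)
x <- rnorm(n = 1000, mean = 0, sd = 1)
model_fit <- lm(y ~ x)

# Point estimate of the slope: estimated change in y per unit change in x
slope_estimate <- coef(model_fit)["x"]

# 95% confidence interval for the slope
slope_interval <- confint(model_fit, parm = "x", level = 0.95)

slope_estimate
slope_interval
```

Reading the interval tells the audience how large the effect could plausibly be. Here that is very small in either direction, which a lone significance star cannot convey.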

Let’s get back to the ever-important concept of comparison.

In language describing quantities, we have two main ways to compare. One form is additive or subtractive. The other is multiplicative. We humans perceive or process these comparisons differently. Let’s consider an example from Andrews (2019):

The Apollo program crew had one more astronaut than Project Gemini. Apollo’s Saturn V rocket had about seventeen times more thrust than the Gemini-Titan II.

We process the comparative language of “seventeen times more” differently than “1,700 percent more” or “33 versus 1.9.” Additive and subtractive comparisons are easier for people to understand, especially with small numbers. Relative to additive comparisons, multiplying or dividing is more difficult. This includes comparisons expressed as ratios: a few times more, a few times less. People generally try to interpret multiplying operations through pooling, or repeated addition.
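The arithmetic behind these phrasings can be sketched in a few lines. The sketch assumes that “33 versus 1.9” refers to the Saturn V and Gemini-Titan II respectively; the variable names are illustrative and the units are left unspecified, as in the text.

```r
# Two quantities from the quoted comparison (assignment to rockets is assumed)
saturn_v <- 33
titan_ii <- 1.9

difference <- saturn_v - titan_ii                      # additive comparison
ratio      <- saturn_v / titan_ii                      # multiplicative: "times as much"
pct_more   <- (saturn_v - titan_ii) / titan_ii * 100   # "percent more"

round(c(difference = difference, ratio = ratio, pct_more = pct_more), 1)
# difference      ratio   pct_more
#       31.1       17.4     1636.8
```

“Times as much” and “percent more” are different calculations that yield different numbers from the same pair of values. This is one reason multiplicative language takes more effort to interpret than a simple difference.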

In Andrews’s example, it may be better to show a graphical comparison,

Figure 11: A bar chart allows relative comparisons between quantities that may be generally more useful than merely displaying numbers.
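A chart of this kind can be sketched with base R graphics. This is not the code behind figure 11; it assumes the values 33 and 1.9 from the quoted comparison are the two thrust figures, and the title, axis label, and margins are illustrative choices.

```r
# Values from the quoted comparison (assumed to be thrust; units as in the source)
thrust <- c("Gemini-Titan II" = 1.9, "Saturn V" = 33)

# Widen the left margin so the horizontal bar labels are not clipped
par(mar = c(5, 9, 4, 2))

# Bars drawn from a zero baseline let the eye compare lengths directly
bar_midpoints <- barplot(
  thrust,
  horiz = TRUE,
  las   = 1,
  xlab  = "Thrust (units as in the quoted comparison)",
  main  = "Apollo versus Gemini launch vehicles"
)
```

The bars start at zero, so their lengths carry the ratio visually. The audience does not have to perform the multiplication mentally.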

Statistics and narrative.

We’ve discussed narrative and statistics as forms of persuasion. And we’ve seen examples of their combination. Is the combination more persuasive than either individual form? Some researchers have claimed that the persuasive effect depends on the data and statistics. Krause and Rucker (2020) argue from an empirical study that narrative can improve less convincing data or statistics, but may actually detract from strong numerical evidence. Their study involved survey responses from participants who reviewed (\(a_1\)) less favorable data (a phone that was relatively heavy and shatter-tested in a 3-foot drop) in the form of a list and (\(a_2\)) the same data embedded within a narrative. The data were then changed to be more favorable (a phone that was relatively light and shatter-tested in a 30-foot drop) and (\(b_1\)) placed into a list and (\(b_2\)) embedded within the same narrative. When comparing responses involving the less favorable data, the researchers found that the narrative form positively influenced participants relative to presenting the data alone. But when comparing responses involving the more favorable data, the relationship reversed: respondents were more swayed by the data alone than by the same data embedded within the narrative. Of note, there was no practical (or significant) difference in responses between narratives with either data. They conclude, from the study, that narratives operate by taking the focus off the data, which may either help or harm a claim, depending on the strength of the data.

But a review of the actual narrative created for the study reveals that the narrative was not about the thing generating the data (a phone and its properties). Instead, the narrative was about a hiking couple who encountered an emergency and used the phone during it. In other words, the data of the phone characteristics amounted to what the advertising industry might call a “product placement.” Product placements, of course, have been found to be effective in transferring sentiment about the narrative to sentiment about the product. But it would be dangerous to generalize from this empirical study to potential effects and operations of other forms of narrative. Instead of choosing between listing convincing data on its own or embedding it as a product placement, we should consider providing narrative context focused on the data and the thing that generated it. In other words, we can create a narrative that emphasizes the data, instead of shifting our audience’s focus from the data. And we can create that narrative context using metaphor, simile, and analogy, discussed next.

Comparison through metaphor, simile, analogy

Metaphor adds to persuasiveness by reforming abstract concepts into something more familiar to our senses, signaling particular aspects of importance, memorializing the concept, or providing coherence throughout a writing. The abstract concepts we need help explaining, ideas we need to make important, or the multiple ideas we need to link, we call the target domain. The familiar domain we borrow from to describe them is the source domain. Common source domains include the human body, animals, plants, buildings and constructions, machines and tools, games and sport, money, cooking and food, heat and cold, light and darkness, and movement and direction. Let’s consider some examples.

In a first example, we return to Andrews (2019). As a book-length communication, it has more space to build the metaphor, and does so borrowing from the source domain of music:

How do we think about the albums we love? A lonely microphone in a smoky recording studio? A needle’s press into hot wax? A rotating can of magnetic tape? A button that clicks before the first note drops? No!

The mechanical ephemera of music’s recording, storage, and playback may cue nostalgia, but they are not where the magic lies. The magic is in the music. The magic is in the information that the apparatuses capture, preserve, and make accessible. It is the same with all information.

After setting up this metaphor, he repeatedly refers back to it as a form of shorthand each time:

When you envision data, do not get stuck in encoding and storage. Instead, try to see the music. … Looking at tables of any substantial size is a little like looking at the grooves of a record with a magnifying glass. You can see the data but you will not hear the music. … Then, we can see data for what it is, whispers from a past world waiting for its music to be heard again.

What, if anything, do you think use of this source domain adds to the audience’s understanding of data and information? For other uses of simile and metaphor for data analytics concepts, see McElreath (2020), who uses mythology (a golem) to represent properties of statistical models, and Rosenbaum (2017), who uses a poem about a road not taken (Frost 1921) to explain the properties of covariates and how we think about them.

Exercise 16 Find other examples of metaphor and simile used to describe data science concepts. Do you believe they aid understanding for any particular audience(s)? Explain.

Patterns that compare, organize, grab attention

We can use patterns to “make the words they arrange more emphatic or memorable or otherwise effective” (Farnsworth 2011). In Classical English Rhetoric, Farnsworth provides a wealth of examples, categorized. First, unexpected word placement calls attention to the words and creates emphasis by coming earlier than expected or otherwise violating the reader’s expectations. Note that violating expectations necessarily means reserving a technique like inversion for just the point to be made, lest the reader come to expect it: more is less, less is more. Second, it can create an attractive rhythm. Third, when the words that bring full meaning come later, it can add suspense and make the finish more climactic.

These patterns can be the most effective and efficient ways to show comparisons and contrasts. While Farnsworth provides a great source of these rhetorical patterns in more classical texts, we can find plenty of usage in something more relevant to data science. In fact, we have already considered a visual form of repetition in figure 10. Let’s consider an example pattern (reversal of structure, repetition at the end) used in another text for data science, found in Rosenbaum (2017):

A covariate is a quantity determined prior to treatment assignment. In the ProCESS Trial, the age of the patient at the time of admission to the emergency room was a covariate. The gender of the patient was a covariate. Whether the patient was admitted from a nursing home was a covariate.

The first sentence begins “A covariate is …” Then, the next three sentences reverse this sentence structure, and repeat to create emphasis and nuance to the reader’s understanding of a covariate. Here’s another pattern (Repetition at the start, parallel structure) from Rosenbaum’s excellent book:

One might hope that panel (a) of Figure 7.3 is analogous to a simple randomized experiment in which one child in each of 33 matched pairs was picked at random for exposure. One might hope that panel (b) of Figure 7.3 is analogous to a different simple randomized experiment in which levels of exposure were assigned to pairs at random. One might hope that panels (a) and (b) are jointly analogous to a randomized experiment in which both randomizations were done, within and among pairs. All three of these hopes may fail to be realized: there might be bias in treatment assignment within pairs or bias in assignment of levels of exposure to pairs.

Repetition and parallel structure are especially useful where, as in these examples, the related sentences are complex or relatively long. Let’s consider yet another pattern (asking questions and answering them):

Where did Fisher’s null distribution come from? From the coin in Fisher’s hand.

Rhetorical questions or those the author answers are a great way to create interest when used sparingly. In your own studies, seeing just a few examples invites direct imitation of them, which tends to be clumsy when applied. Immersion in many examples, however, allows them to do their work by way of a subtler process of influence, with a gentler and happier effect on our resulting style of narrative.

Le mot juste: the exact word

Writing poetically, Goodman (2008) explains the importance of finding the exact word. Le mot juste, in French, is how it’s expressed.

In our search we must also keep in mind, and use, words with the appropriate precision, as Alice explains:

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—nothing more nor less.”

“The question is,” said Alice, “whether you can make words mean so many different things” (Carroll 2013).

Yet empirical studies suggest that the meanings of words meant to convey quantity vary more than Alice would like. Barclay et al. (1977) report survey responses from 23 NATO military officers who were asked to assign probabilities to particular phrases if found in an intelligence report. In another, online survey of 46 individuals, zonination (2015) collected responses to the question: What [probability/number] would you assign to the phrase [phrase]? where the phrases matched those of the NATO study. The combined responses in figure 12 show wide variation in what probabilities individuals associate with words, although some ordering or ranking is evident.

Figure 12: Results from the combined studies reflect uncertainty in the probability that people associate with words.

As with variation in probabilities assigned to words about uncertainty, the empirical study suggests variation in amounts assigned to words about size, shown in Figure 13.

Figure 13: Even words whose definitions refer to counts have significant variation in perceived meaning.

Variance in perception of the meaning of such words does not imply we should avoid them altogether. It does mean, however, we should be aware of the meaning others may impart and complement our use of them with numerals or graphic displays.

1.3.3 Heuristics and biases

Humans have two separate processes for understanding information, which Kahneman (2013) labels as system one and system two. If we are to find common ground, and move our audience to a new understanding for decision-making, we must understand how they think. Intuitive (system one) thinking, which includes impressions, associations, feelings, intentions, and preparations for actions, flows effortlessly. This system mostly guides our thoughts, as illustrated next. Most of us immediately sense emotion from the face below, system one processing, but would need to work hard to mentally calculate 17 x 24, system two processing.

System one uses heuristics and is subject to biases. Reflective (system two) thinking, in contrast, is slow, effortful, and deliberate. Both systems are continuously active, but system two typically monitors things, and only steps in when stakes are high, we detect an obvious error, or rule-based reasoning is required. For a sense of this difference, Kahneman provides exemplary information that we process using system one, as in the above image, and system two, as in mentally calculating 17 x 24. For other examples, consider figure 7 (system one) and figure 8 (processing may depend on familiarity with the graphic, an alluvial diagram, and on which comparisons are of focus within the graphic).

On how humans process information, we have decades of empirical and theoretical research available (Gilovich, Griffin, and Kahneman 2009), and theoretical foundations have long been in place (J. B. Miller and Gelman 2020).

Kahneman, Lovallo, and Sibony (2011) give executives ways to guard against some biases by asking questions and recommending actions:

Bias | Question to ask | Recommended action

self-interested biases | Is there any reason to suspect the team making the recommendation of errors motivated by self-interest? | Review the proposal with extra care, especially for overoptimism.

the affect heuristic | Has the team fallen in love with its proposal? | Rigorously apply all the quality controls on the checklist.

groupthink | Were there dissenting opinions within the team? Were they explored adequately? | Solicit dissenting views, discreetly if necessary.

saliency bias | Could the diagnosis be overly influenced by an analogy to a memorable success? | Ask for more analogies, and rigorously analyze their similarity to the current situation.

confirmation bias | Are credible alternatives included along with the recommendation? | Request additional options.

availability bias | If you had to make this decision in a year’s time, what information would you want, and can you get more of it now? | Use checklists of the data needed for each kind of decision.

anchoring bias | Where are the numbers from? Can there be … unsubstantiated numbers? … extrapolation from history? … a motivation to use a certain anchor? | Re-anchor with data generated by other models or benchmarks, and request a new analysis.

halo effect | Is the team assuming that a person, organization, or approach that is successful in one area will be just as successful in another? | Eliminate false inferences, and ask the team to seek additional comparable examples.

sunk-cost fallacy, endowment effect | Are the recommenders overly attached to past decisions? | Consider the issue as if you are a new executive.

overconfidence, optimistic biases, competitor neglect | Is the base case overly optimistic? | Have a team build a case taking an outside view: use war games.

disaster neglect | Is the worst case bad enough? | Have the team conduct a premortem: imagine that the worst has happened, and develop a story about the causes.

loss aversion | Is the recommending team overly cautious? | Align incentives to share responsibility for the risk or to remove risk.

We increase persuasion by addressing these issues in anticipation that our audience will want to know. It’s very hard to remain aware of our own biases, so we need to develop processes that identify them and, most importantly, get feedback from others to help protect against them. Get colleagues to help us. Present ideas from a neutral perspective. Becoming too emotional suggests bias. Make analogies and examples comparable to the proposal. Genuinely admit uncertainty in the proposal, and recognize multiple options. Identify additional data that may provide new insight. Consider multiple anchors in a proposal.

1.4 Integrating Text and Data

1.4.1 Layout, hierarchy, and integration

Visual presentation is communication.

Typography

For visual presentation of communication, we may first think about a data graphic. But consider this paragraph from Strunk and White (2000), white space removed:


The visual presentation of communication involves all best practices in typography and design. Adding white space between words, just one of many components of typography, is an obvious decision. It makes the advice from Strunk and White more readable, more understandable:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or avoid all detail and treat subjects only in outline, but that every word tell. A single overstatement, wherever or however it occurs, diminishes the whole, and a carefree superlative has the power to destroy, for readers, the object of your enthusiasm.
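The effect of removing white space is easy to reproduce. The sketch below strips the spaces from the first two sentences of the passage above; the variable names are illustrative.

```r
# First two sentences of the quoted passage
advice <- "Vigorous writing is concise. A sentence should contain no unnecessary words."

# Remove every run of white space to mimic text set without word spacing
advice_no_space <- gsub("\\s+", "", advice)
advice_no_space
```

The content is identical, yet the result takes noticeably more effort to read. That is the point of treating even word spacing as a typographic decision.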

Best practices in visual presentation of communication go well beyond spacing between words. Butterick (2018), who credits a great deal to Bringhurst (2004) among others, explains these best practices especially well. Typography is the visual component of the written word. “Typography is for the benefit of the reader”:

Most readers are looking for reasons to stop reading. . . . Readers have other demands on their time. . . . The goal of most professional writing is persuasion, and attention is a prerequisite for persuasion. Good typography can help your reader devote less attention to the mechanics of reading and more attention to your message.

The typographic choices in the PDF versions of our memo examples 1 and 2 and proposal follow the advice of Butterick (2018):

Figure 14: Basic typographic guidelines as implemented in examples.

Those best practices do more than aid readability. Experiments have demonstrated that “high quality typography can improve mood [of the reader]” (Larson and Picard 2005), and the better their mood, the more likely they are to consider what you say.

Butterick’s recommendations, as implemented in the example memos, are functional by design. When designing communications for the interwebs, also consult Rutter (2017). There will be occasions, however, when more creativity can be used in combination with functionality. Information graphics are an example. You may find inspiration in de Bartolo, Coles, and Spiekermann (2019), which studies the creative placement of text. Typography, or layout, is not just for text but for all communication: text, numbers, data graphics, and images.

Laying out numbers: tables

Stand-alone numbers should generally fit in the context of a sentence. When reporting multiple numbers, though, consider a table within a paragraph to aid comparisons (Edward R. Tufte 2001b).

Tables require design. Text and numbers are best when aligned to an invisible grid that creates negative or white space between columns and other components of the table. Invisible is key, as grid lines between all rows and columns detract from the data (Wong 2013). Wainer (2016a) works through an example series of tables for multivariate comparisons, and considers their design, data transformations, and organization to aid audience understanding. Along with Tufte, Wainer, and Wong, J. E. Miller (2015) provides another great resource for advice on creating tables, with examples.
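As a rough illustration of aligning to an invisible grid, the sketch below builds a small plain-text table with left-aligned labels, right-aligned numbers, and white space instead of ruled lines. The data are hypothetical, invented only to show the alignment, and the column widths are arbitrary.

```r
# Hypothetical data, invented only to illustrate alignment
item  <- c("Alpha", "Beta", "Gamma")
count <- c(106, 77, 9)
rate  <- c(49.8, 3.25, 129.4)

# Left-align text (flag "-"); right-align numbers with a fixed number of decimals
rows <- paste0(
  formatC(item,  width = 10, flag = "-"),
  formatC(count, width = 6,  format = "d"),
  formatC(rate,  width = 8,  format = "f", digits = 1)
)
header <- paste0(
  formatC("Item",  width = 10, flag = "-"),
  formatC("Count", width = 6),
  formatC("Rate",  width = 8)
)

# White space alone separates the columns; no grid lines are drawn
cat(header, rows, sep = "\n")
```

Right alignment with a fixed number of decimals stacks digits of the same place value, so magnitudes can be compared by scanning down a column.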

R. L. Harris (1999) names and describes the components of a typical table, not all of which are always used or, if used, need be visible:

Figure 15: Basic components of a table.

The better designed tables will minimize any markings other than the data and annotations explaining the data, relying on Gestalt principles (a subset of design principles), two of which are proximity and similarity. The Gestalt principle of proximity reflects that we perceive markings closer together, relative to other markings, as belonging to the same group.

Figure 16: Gestalt principle of proximity: we can perceive the left group of dots as grouped horizontally and the right group of dots as grouped vertically simply based on their relative proximity to one another.

As with proximity, we can create the perception of groupings based on similarity of color, or shape, or another attribute. Here’s an example in which the horizontal spacing and vertical spacing are equal to demonstrate the color attribute’s ability to group:

Figure 17: Gestalt principle of similarity can help us create perceived groups too. In this example, the dots have equal horizontal and vertical spacing, but use different shades.

Consider these Gestalt principles at work in this example table:

Notice the components, think about the underlying grid system organizing the content, the alignment and position of each type of information, and how proximity and similarity help to separate these different information types.

Along with placing numbers in text — sentences — or in tables, we can re-organize them into a hybrid form, having attributes of a table and a graph. This hybrid, called a stem-and-leaf diagram, has attributes of a table because it is constructed with actual numbers, just more compactly than a pure table. It’s also like a graphic in that its profile conveys distributional information similar to a histogram. Figure 18 below provides an example, which is interactive, too. Hover your cursor over a number amongst the “leaves” for instructions interpreting the number:

Figure 18: 5 | 4 represents 54. This stem-and-leaf diagram provides more information than its modern replacement, the histogram, and does so more compactly than a table. Hover your cursor over a leaf to show the value represented by the stem and leaf. The representation can be confusing for audiences unfamiliar with it.
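A static stem-and-leaf display can be produced with base R's `stem()` function. The values below are hypothetical, invented for illustration, and are not the data behind figure 18.

```r
# Hypothetical values, invented for illustration
values <- c(41, 45, 47, 52, 54, 54, 58, 60, 63, 63, 67, 71, 75, 89)

# stem() prints a stem-and-leaf display to the console, along with a note
# on where the decimal point sits relative to the "|" separator
stem(values)

# The scale argument lengthens the display, which changes how values are
# binned into stems
stem(values, scale = 2)
```

Depending on the range of the data, `stem()` may combine adjacent tens into a single stem. Comparing the two calls shows how the choice of bins changes the profile, much as bin width does for a histogram.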

R. L. Harris (1999) thoroughly explains variations on stem-and-leaf diagrams.

Grid systems and narrative layout

Another aspect of typography and design relies on grid systems. A very basic grid is shown in figure 14, some of its components drawn in brown and labeled in gray: gutters, module, and margin. The gutters between the gridlines create white space that separates information placed into columns, rows, modules, or spatial zones (a spatial zone comprises multiple modules or rows or columns). Of course, the grid lines are not part of the final communication; we create them temporarily to lay out and align information. That layout is informed by visual perception and the way we process information in a given culture. When reading English, for example, we generally start processing the information from the top left, our eyes scanning to the right, and then returning left and down, in a repeating zig-zag pattern. Hebrew is read right to left. We call this type of narrative structure linear (Koponen and Hildén 2019). And various graphic design choices can purposefully or inadvertently guide the reader through the material in other ways. Images, unlike sentences, create an open narrative structure, allowing us to reason differently (Koponen and Hildén 2019). We’ll come back to this concept.

Grid systems can be much more complex. We are guided by Müller-Brockmann in his seminal reference: “Arranging surfaces and spaces into a grid creates conformity among texts, images and diagrams. The size of each implies its importance. Reducing elements in a grid suggests planning, intelligibility, clarity, and orderliness of design. One grid allows many creative ways to show relationships” (Müller-Brockmann 1996). A grid with 8 rows by 4 columns and gutter spacing between the blocks, for example, can lead to numerous arrangements of disparate, but related, information:

Yet the commonly aligned sides of word blocks, images, and data graphics can help connect related information. By connect, we mean the layout creates or enables a path that the audience’s eye follows, a scan path. In this paragraph of text, you started reading at its beginning and followed horizontally until the end of the line, then scanned to the left beginning of the line below and repeated the process. In strip comics, the sequentially arranged images encourage a similar linear narrative. But other layouts enable an open narrative. These include radial layouts in which the order we scan relies on focal points, which are prominent components due to, say, their size or color in relation to the surrounding information. Of note, in some circumstances we may intend a serial narrative within an open narrative. Consider labeling or numbering the features, using Gestalt principles, or both, to guide the audience.

Thus, as Müller-Brockmann (1996) explained, grids enable orderliness, add credibility to the information, and induce confidence. Information presented with clear and logically set out titles, subtitles, texts, illustrations and captions will not only be read more quickly and easily but will also be better understood.
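The 8-row by 4-column grid mentioned above can be sketched numerically. All dimensions below are assumptions made for illustration (a US Letter page measured in points, equal margins, and a single gutter width), not values taken from Müller-Brockmann.

```r
# Hypothetical page and spacing, in points (US Letter is 612 x 792)
page_width  <- 612
page_height <- 792
margin      <- 54   # same margin on all four sides (assumed)
gutter      <- 12   # white space between adjacent modules (assumed)
n_cols      <- 4
n_rows      <- 8

# Space left after margins and gutters is divided equally among the modules
module_width  <- (page_width  - 2 * margin - (n_cols - 1) * gutter) / n_cols
module_height <- (page_height - 2 * margin - (n_rows - 1) * gutter) / n_rows

# Left edge of each column: the invisible guides that content aligns to
column_left <- margin + (seq_len(n_cols) - 1) * (module_width + gutter)

c(module_width = module_width, module_height = module_height)
# module_width module_height
#          117            75
column_left
# 54 183 312 441
```

A text block spanning two columns would then be `2 * module_width + gutter` wide. This is how a single grid supports many arrangements while keeping edges aligned.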

Exercise 17 Try to identify placement of the (invisible) grid lines used to align information in the Dodgers proposal, which is primarily text.
Exercise 18 Consider the poster version of the information graphic Bremer (2016). Try to identify placement of the (invisible) grid lines used for alignment.

1.4.2 Combined meaning of words and images

Words, graphics, and images, when combined, can provide to some extent what Doumont (2009) prescribed: effective redundancy. This is sometimes called dual coding. And to maximize their combination, we first consider that we process language and images differently (Ware 2020). Words are read and processed in linear fashion, serially, one after the other. Images, on the other hand, can be processed or understood as a whole, in parallel.

Secondly, each type of medium conveys meaning differently, and the two never exactly overlap: a description of an image never actually represents the image. Rather, … it is a representation of thinking about having seen a picture; it’s already formulated in its own terms (Sousanis 2015), paraphrasing (Baxandall 1985). Each is better at conveying certain types of messages. Sousanis puts it: “while image is, text is always about.” Text is usually better for expressing abstract concepts and procedures, such as logic or programming. Diagrams help when explaining structural relationships.

We can benefit from various studies into the interplay of words and images found in comics (Cohn 2016); (Sousanis 2015); (McCloud 1993), and extrapolate those concepts into information visualization. Done right, each informs and enriches the other. Images and graphics also enable a unique form of comparison, juxtaposing one image or encoding to another (or to the absence of another) to form meaning.

1.4.3 Visually integrating graphics and text

Good design and typography also enable visual connections between words and sentences to, say, data graphics. Edward R. Tufte (2001b) explains, at their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers—even a very large set—is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful. And if “a means of persuasion is a sort of demonstration,” and we now agree with Aristotle that it is, then graphics are frequently the most effective way to demonstrate things, especially for understanding patterns and comparisons.

But it isn’t a Hobson’s choice, words or graphics. Instead, we should use both. Edward R. Tufte (2001b) explains how they work together: “The principle of data/text integration is: data graphics are paragraphs about data and should be treated as such.”

Visual displays may be integrated directly within the text. Tufte’s book is a living example, and explains the approach:

We were able to integrate graphics right into the text, sometimes into the middle of a sentence, eliminating the usual separation of text and image — one of the ideas Visual Display advocated.

Experiments support Tufte’s advice. Koponen and Hildén (2019) summarize an eye-tracking experiment on reading behavior and comprehension across various layouts (Holsanova, Rahm, and Holmqvist 2006), from which we learned that layouts integrating images within text columns improve communication over both radial layouts and layouts that separate text from images. The integrated approach promoted careful reading of the text between images, while layouts separating text from images promoted reading the title, skipping the body text, and focusing on the images. Radial layouts were reviewed more quickly than linear, integrated text-image layouts, and less information was retained.

For effective integration, a visual display need only be large enough to clearly convey the information as intended for our audience in the manner to be consumed. To make the point, consider the word-sized graphics Edward R. Tufte (2006a) calls sparklines.18 Also note that when the graphic is large enough to include annotation,

The principle of text/graphic/table integration also suggests that the same typeface be used for text and graphic and, further, that ruled lines separating different types of information be avoided.

Exercise 19 Locate two or three narratives with data graphics as paragraphs in which you believe the graphic helped persuade audiences of the point of the narrative. Explain why the graphic explained better than the words as used.

1.4.4 Annotating data graphics with words

Annotations add explanations and descriptions to introduce the graph’s context, which is important for almost any audience. Annotation plays a crucial role in asynchronous data storytelling as the surrogate for the storyteller. They can also explain how to read the graph, which helps readers unfamiliar with the graph — whether a simple line chart or an advanced technique like a treemap or scatterplot. When done right, the annotation layer will not get in the way for experienced users. Consider, for example, figure 19.

Figure 19: Example of data graphic containing annotation to assist the audience.

From a cognitive perspective, Ware writes that “plac[ing] explanatory text as close as possible to the related parts of a diagram, and us[ing] a graphical linking method” will “reduce [the] need to store information temporarily while switching back and forth between locations” (Ware 2020). Figure 19, published in newspaper article Schleuss and Lin II (2013), displays a scatter plot that encodes the rate change of crime on the x-axis, change of property crime on the y-axis, and rate of crimes as size of the location or point. Note the plot is segmented into quadrants, color-coded to indicate better and worse conditions, and annotations are overlain that explain how to interpret the meaning of a data point located within quadrants of the graphic. The various annotations greatly assist its general audience in decoding the data and considering insights.

Rahlf (2019) provides over 100 examples of annotating and modifying exploratory graphics for presentation in communication, and should be consulted along with this text’s section 2.

1.4.5 Visually linking words with graphics

Placement of data graphics within words and annotating graphics with words are the first step in integrating the information. Another best practice includes using color encodings and other explicit markings, linking words to encodings, such as adding lines connecting related information (Riche et al. 2018):

The link between the narrative and the visualization helps the reader discern what item in the visualization the author is referencing in the text. Create links with annotation, color, luminosity, or lines.

For example, color words in annotations on a data graphic and in the paragraphs surrounding that graphic with the same hue as used in the data encodings of the graphic. This follows the principle of similarity, discussed earlier. Let’s consider an example table, the example we created in section, placing it into a paragraph and linking its data to surrounding words (a form of data display):

Using Table 1, we can calculate the value of a strike by subtracting the expected run value of a ball, given the game state and count, from the expected run value of a strike, starting from the same game state and count. Let’s say there are runners on first and second with one out, and the count is 1 ball, 1 strike, suggesting we should expect 0.99 more runs this inning:

Assuming the batter doesn’t swing on the next pitch, a strike lowers expected runs to 0.86 while a ball raises it to 1.11. Thus, in this scenario, the expected value of a strike would be 0.86 - 1.11, or -0.25 runs.
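
The arithmetic above can be sketched in R. The three run expectancies are the values quoted in the text; the variable names are illustrative assumptions rather than part of the original analysis.

```r
# Expected runs for the rest of the inning, as quoted in the text:
# runners on first and second, one out.
runs_at_1_1 <- 0.99  # current count: 1 ball, 1 strike
runs_strike <- 0.86  # after a called strike (count 1 ball, 2 strikes)
runs_ball   <- 1.11  # after a ball (count 2 balls, 1 strike)

# Value of a strike relative to a ball from the same starting state.
strike_value <- runs_strike - runs_ball
round(strike_value, 2)  # -0.25
```

Writing the calculation as strike minus ball keeps the sign meaningful: a negative value means the strike costs the batting team expected runs.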

Consider the ways we apply the principles of proximity and similarity. Do in-paragraph placement (proximity) and text-data coloring (similarity) help us in learning to use the table? For other examples, see Kay (2015), which provides example uses of color for linking words to data encodings. For yet another great example of linking paragraphs with illustrations, see Byrne’s revision of Euclid’s first six books (Byrne 2017).

1.4.6 Linking multiple graphics

If individual graphs reveal information and structure from the data, an ensemble of graphs can multiply the effect. By ensemble, we mean multiple graphs simultaneously displayed, each containing different views of the data with common information linked together by various techniques. We’ve already seen one form of an interactive linkage between two graphics in figure 1, which dynamically linked each baseball stadium field boundary to the corresponding fence. And while Cleveland (1985) describes “brushing and linking” (where items selected in one visual display highlight the same subset of observations in another visual display) as an interactive tool, he effectively shows the technique by highlighting the same data across static displays. Authors Unwin and Valero-Mora (2018) provide a nice example, walking through use of ensembles in exploring data quality, comparing models, and presenting results. As the authors explain,

Coherence in effective ensembles covers many different aspects: a coherent theme, a coherent look, consistent scales, formatting, and alignment. Coherence facilitates understanding.

The additional efforts for coherence “are more design than statistics, but they are driven by the statistical information to be conveyed, and it is therefore essential that statisticians concern themselves with them.” Along with using the same theme styles, their choice of placement is informed by best practices in graphic design, which apply a grid system, already discussed.

We’ve covered a lot of material. We can use all these techniques to help in writing a brief proposal to a chief analytics officer, asking him or her to approve our analytics project. Recall the Dodgers memo, example 2? Let’s continue that example with a 750-word brief proposal, see Spencer (2019b). To assess whether the example proposal accomplishes its goals, note the audience. As previously explained, his background includes a doctor of philosophy in Statistics, and experience with machine learning and statistical programming in R.

Exercise 20 Try to identify the document structure in the example brief proposal. Does it identify problems and goals? Data? Methods? Compare the structure, specificity and level of detail to both the memos, examples 1 and 2. Next, consider the tools we’ve covered in business writing, starting with messages and goals, applying typographic best practices, aligning information with grids, integrating graphics within paragraphs, linking words and graphics, annotation, and use of comparison, metaphor, patterns, and examples or analogies to persuade. How many can you find? If you were the director would you be persuaded to approve of the project? Why or why not? How might you edit the proposal to make it more persuasive?

2 Visual

2.1 Visual Design and Perceptual Psychology

The value of data graphics can be grasped from a brief analysis of the following four datasets (1-4) of (x, y) data in table 1, drawn from a famous data set:

Table 1: These four simple datasets are known as Anscombe’s Quartet.
  Dataset 1      Dataset 2      Dataset 3      Dataset 4
   x      y       x      y       x      y       x      y
  10   8.04      10   9.14      10   7.46       8   6.58
   8   6.95       8   8.14       8   6.77       8   5.76
  13   7.58      13   8.74      13  12.74       8   7.71
   9   8.81       9   8.77       9   7.11       8   8.84
  11   8.33      11   9.26      11   7.81       8   8.47
  14   9.96      14   8.10      14   8.84       8   7.04
   6   7.24       6   6.13       6   6.08       8   5.25
   4   4.26       4   3.10       4   5.39      19  12.50
  12  10.84      12   9.13      12   8.15       8   5.56
   7   4.82       7   7.26       7   6.42       8   7.91
   5   5.68       5   4.74       5   5.73       8   6.89

For most of us, reviewing the table to compare the four datasets from Anscombe (1973) is cognitively taxing, especially when scanning for differences in the relationships between x and y across datasets. Processing the data this way occurs sequentially; we review data pairs with focused attention. And, here, summary statistics do not differentiate the datasets. All x variables share the same mean and standard deviation (table 2). So do all y variables.

Table 2: The mean and standard deviation per dataset are identical.
        Dataset 1      Dataset 2      Dataset 3      Dataset 4
           x     y        x     y        x     y        x     y
mean    9.00  7.50     9.00  7.50     9.00  7.50     9.00  7.50
sd      3.32  2.03     3.32  2.03     3.32  2.03     3.32  2.03
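
The summary statistics can be reproduced with a few lines of base R. This sketch assumes the `anscombe` data frame that ships with R's datasets package, whose columns are named x1 to x4 and y1 to y4.

```r
# Anscombe's Quartet is available in base R as `anscombe`
# (columns x1..x4 and y1..y4).
means <- sapply(anscombe, mean)
sds   <- sapply(anscombe, sd)

# Round to two decimals for comparison with the table.
round(rbind(mean = means, sd = sds), 2)
```

Rounded to two decimals, the four x columns agree with one another, as do the four y columns, which is why these statistics alone cannot tell the datasets apart.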

Further, the linear regression on each dataset (table 3) suggests that the (x, y) relationships across datasets are the same. Are they?

Table 3: Linear regression coefficients across datasets are practically identical.
Parameter       Mean   Std Err   t-val   p-val
Dataset 1
(Intercept)    3.000     1.125   2.667   0.026
x              0.500     0.118   4.241   0.002
Dataset 2
(Intercept)    3.001     1.125   2.667   0.026
x              0.500     0.118   4.239   0.002
Dataset 3
(Intercept)    3.002     1.124   2.670   0.026
x              0.500     0.118   4.239   0.002
Dataset 4
(Intercept)    3.002     1.124   2.671   0.026
x              0.500     0.118   4.243   0.002
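
One way to reproduce these fits is an ordinary least-squares regression per dataset. The sketch below again assumes the built-in `anscombe` data frame; the object names `fits` and `coefs` are illustrative.

```r
# Fit y ~ x separately for each of the four datasets in `anscombe`.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Collect intercepts and slopes: one column per dataset.
coefs <- sapply(fits, function(fit) unname(coef(fit)))
rownames(coefs) <- c("(Intercept)", "x")
colnames(coefs) <- paste("Dataset", 1:4)
round(coefs, 3)

# Full coefficient table (estimate, std. error, t value, p value) for one fit.
round(coef(summary(fits[[1]])), 3)
```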

A well-crafted visual display, however, can instantly illuminate any differing (x, y) relationships among the datasets. To demonstrate, we arrange four scatterplots in figure 20 showing the relationships between (x,y), one for each dataset. Overlain on each, we show the linear regression calculated above.

Figure 20: The differing (x, y) relationships among datasets in Anscombe’s Quartet become instantly clear when visualized.

As the example shows, exploratory data analysis using visual and spatial representations adds understanding. It allows us to find patterns in data by detecting or recognizing the geometry that encodes the values, assembling or grouping these detected elements, and estimating the relative differences between two or more quantities (Cleveland 1985); (Cleveland 1993). In estimating, we first discriminate between data: we judge whether \(\textbf{a}\) is equal to \(\textbf{b}\). Then we rank the values, judging whether \(\textbf{a}\) is greater than, less than, or equal to \(\textbf{b}\). Finally, we consider the ratio between them using encoded geometries (e.g., relative distance from a common line). Unlike the sequential processing required for table lookups, recognizing patterns (and outliers from those patterns) seems to occur in parallel and quickly, because we are attuned to preattentive attributes (Ware 2020).

2.1.1 Reasoning with images

We previously mentioned how, unlike processing text in linear fashion, images enable an open narrative, which we process differently (Koponen and Hildén 2019); (Sousanis 2015); (Kosslyn, Thompson, and Ganis 2006); (Baxandall 1985).

We may also combine linear and open narrative structures in various ways (Segel and Heer 2010).

2.1.2 Components of a graphic

Graphics include a coordinate system, arranged spatially, and have numerous attributes that we may make visible in some way, if it helps users understand the graphic. These components can be understood in two categories: those encoding data (data-ink) and all the rest (non-data-ink).

Non-data-ink

We’ll use an R/ggplot implementation of graphics to discuss these components19. The figure below shows the names for most of the non-data-ink components of a visual display.

Figure 21: Edward Tufte advocates maximizing the data-ink ratio within reason. Some non-data-ink can be critical to understanding the data. For each element’s marking, coloring, size, shape, orientation, or transparency setting, ask whether it maximizes our audience’s understanding of the intended insight.

Most of the aesthetics of each labeled component can be set, modified, or removed using the ggplot function theme(), which takes plot components as parameters. We set parameters equal to other formatting functions like, say, element_text() for formatting its typography, element_rect() for formatting its various shape or coloring information, or element_blank() to remove the element entirely. In the figure above, for example, we set the panel border attributes linetype and color using,

theme(panel.border = element_rect(color = "gray60", 
                                  linetype = "dashed", 
                                  fill = NA))

We can use the ggplot function annotate() to include words or draw directly onto the plotting area. Figure 22 shows the basic code structure.

Figure 22: GGplot’s functions are set up as layers. We may use more than one geometry or annotation.

In the pseudocode of figure 22, we map variables in the data to aesthetic characteristics of a plot that we see through mapping = aes(<aesthetic> = <variable>)20. Particular aesthetics depend on the type of geometric encoding we choose. A scatter plot, say, would at least include x and y aesthetics. The geometric encodings are created through functions named for their geometries: e.g., geom_point(<...>) for the scatter plot, which we generalize to geom_<type>(<...>). The geometry is then mapped onto a particular coordinate system and scale: coord_<type>(<...>) and scale_<mapping>_<type>(<...>), respectively. Finally, we annotate and label the graph. These can be thought of as layers that are added (+) over each previous layer.
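
As a concrete instance of that layered structure, here is a minimal sketch. It assumes the ggplot2 package is installed and uses the built-in `anscombe` data; the choice of dataset, axis limits, annotation text and position, and theme are illustrative assumptions, not the code behind any figure in this text.

```r
library(ggplot2)

p <- ggplot(data = anscombe, mapping = aes(x = x3, y = y3)) +
  # geometry layers: points plus a fitted regression line
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "gray40") +
  # coordinate system and scale
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 14)) +
  scale_x_continuous(breaks = seq(0, 20, by = 5)) +
  # annotation and labels
  annotate("text", x = 13, y = 13.4, label = "outlier") +
  labs(x = "x", y = "y", title = "Anscombe's third dataset") +
  # non-data-ink styling, including the dashed panel border
  theme_minimal() +
  theme(panel.border = element_rect(color = "gray60",
                                    linetype = "dashed",
                                    fill = NA))

p
```

Each `+` adds one layer or setting, so removing a line removes only that component from the graphic.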

The remaining markings of a graphic are the data-ink, the data encodings, discussed next.

Data-ink

Encodings depend on data type, which we introduced earlier. As Andrews (2019) explains, “value types define how data is stored and impact the ways we turn numbers into information.” To recap, these types are either qualitative (nominal or ordered) or quantitative (interval or ratio scale).

“A component is qualitative” and nominal, Bertin21 explains, “when its categories are not ordered in a universal manner. As a result, they can be reordered arbitrarily, for purposes of information processing” (Bertin 2010). The qualitative categories are equidistant, of equal importance. Considering Citi Bike, labeled things such as bikes and docking stations are qualitative at the nominal level.

“A component is ordered, and only ordered, when its categories are ordered in a single and universal manner” and “when its categories are defined as equidistant.” Ordered categories cannot be reordered. The bases in baseball are ordinal, or ordered: first, second, third, and home. Examples of qualitative ordering may be, say, temporal: morning, noon, night; one comes before the other, but we would not conceptually combine morning and night into a group of units.

When we have countable units on the interval level, the data of these counts are quantitative. A series of numbers is quantitative when its object is to specify the variation in distance among the categories. The number of bike rides is a count of units, as is the number of stolen bases in baseball. We represent these numerically as integers.

Finally, ratio-level, quantitative values represent countable units per countable units of something else. The number of bike rides per minute and the number of strike outs per batter would be two examples, represented as fractions, real numbers.
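
These four data types map naturally onto R's own types. In the sketch below, the station labels, ride counts, and durations are hypothetical values invented for illustration.

```r
# Nominal: categories with no inherent order (hypothetical station labels).
station <- factor(c("A", "C", "B", "A"))

# Ordered: categories with a single, universal order.
bases <- factor(c("second", "home", "first", "third"),
                levels = c("first", "second", "third", "home"),
                ordered = TRUE)

# Counts of units, stored as integers (hypothetical values).
rides   <- c(120L, 95L, 143L)
minutes <- c(60L, 60L, 60L)

# Counts per unit of something else, stored as real numbers.
rides_per_minute <- rides / minutes

is.ordered(station)        # FALSE
is.ordered(bases)          # TRUE
sort(bases)                # sorts by level order, not alphabetically
typeof(rides)              # "integer"
typeof(rides_per_minute)   # "double"
```

Declaring the type up front matters later: plotting functions generally choose discrete or continuous scales based on whether a variable is a factor or a number.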

The first and most influential structural theory of statistical graphics is found in the seminal reference, Bertin (1983).

Based on Bertin’s practical experience as a cartographer, part one of this work is an unprecedented attempt to synthesize principles of graphic communication with the logic of standard rules applied to writing and topography.

Part two brings Bertin’s theory to life, presenting a close study of graphic techniques, including shape, orientation, color, texture, volume, and size, in an array of more than 1,000 maps and diagrams. Here are those encoding types:

Figure 23: Bertin’s illustration of the possible encoding forms for data.

Less commonly discussed is Bertin’s update (Bertin 2010) to his original work. In the update, after defining terms he reviews the natural properties of a graphic image. The essence of the graphic image is described in three dimensions. The first two describe spatial properties (e.g. x and y axes) while the third dimension (denoted z) encodes the characteristics of each mark — e.g. size, value, texture, color, orientation, shape — at their particular spatial (x, y) locations.

Bertin’s ideas, over 50 years old, have proven reliable and robust (MacEachren 2019); (Garlandini and Fabrikant 2009).

Grammar

Graphics are not charts, explains Wilkinson (2005):

We often call graphics charts. There are pie charts, bar charts, line charts, and so on. [We should] shun chart typologies. Charts are usually instances of much more general objects. Once we understand that a pie is a divided bar in polar coordinates, we can construct other polar graphics that are less well known. We will also come to realize why a histogram is not a bar chart and why many other graphics that look similar nevertheless have different grammars…. Elegant design requires us to think about a theory of graphics, not charts.

We should think of chart names only as a shorthand for what they do. To broaden our ability to represent comparisons and insights into data, we should instead consider their representation as types of measurement: length along a common baseline, for example, or encoding data as color to create Gestalt groupings.
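
Wilkinson's remark that a pie is a divided bar in polar coordinates can be sketched directly. This example assumes the ggplot2 package is installed; the category counts are hypothetical.

```r
library(ggplot2)

# Hypothetical category counts, for illustration only.
counts <- data.frame(category = c("a", "b", "c"), n = c(5, 3, 2))

# A single divided (stacked) bar: each segment's length encodes n.
divided_bar <- ggplot(counts, aes(x = "", y = n, fill = category)) +
  geom_col(width = 1)

# The same geometry in polar coordinates: segment length becomes angle.
pie <- divided_bar + coord_polar(theta = "y")

divided_bar
pie
```

Only the coordinate system changes between the two objects; the data, aesthetics, and geometry are identical, which is the sense in which chart names are shorthand.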

In Leland Wilkinson’s influential work, he develops a grammar of graphics. That grammar respects a fundamental limitation, a difference from pictures and other visual arts:

We have only a few rules and tools. We cannot change the location of a point or the color of an object (assuming these are data-representing attributes) without lying about our data and violating the purpose of the statistical graphic — to represent data accurately and appropriately.

Wilkinson categorizes his grammar:

Algebra comprises the operations that allow us to combine variables and specify dimensions of graphs. Scales involves the representation of variables on measured dimensions. Statistics covers the functions that allow graphs to change their appearance and representation schemes. Geometry covers the creation of geometric graphs from variables. Coordinates covers coordinate systems, from polar coordinates to more complex map projections and general transformations. Finally, Aesthetics covers the sensory attributes used to represent graphics.

He discusses these components of graphics grammar in the context of data and its extraction into variables. He also extends the discussion with facets and guides.

How do we perceive data encoded in this grammar?

2.1.3 Perceptions of visual data encodings

We assemble mental models of grouping through differences in similarity, proximity, enclosure, size, color, shading, and hue, to name a few. In figure 20, for example, we recognize dataset three as having a grouped linear relationship with one outlier based on proximity. Using shading, for example, we can separate groups of data. In the left panel of figure 24, we naturally see two groups, one gray, and the other black, which has an outlier. We could even enclose the outlier to further call attention to it, as shown on the right panel.

Figure 24: We can use preattentive attributes to separate data categorically and call attention to particular aspects of that data.

Several authors22 provide in-depth reviews of these ideas. We can, and should, use these ideas to assist us in understanding and communicating data through graphical displays.

Graphical interpretation, however, comes with its own limitations. Our accuracy in estimating the quantities represented in visual encoding depends on the geometries used for encoding. In other words, it can be easy for us, and less familiar readers, to misinterpret a graph. Consider the example in Figure 25 where the slope of the trend changes rapidly. Considering the left panel alone, it may seem deviations from the fitted line decrease as x increases. But the residuals encoded in the right panel show no difference.

Figure 25: Without careful inspection, it may seem that deviations from the fitted line decrease as x increases. The plot of residuals, however, shows the reverse is true.

The misperception arises if we mistakenly compare the minimal distance from each point to the fitted line instead of comparing the vertical distance to the fitted line. Cleveland (1985) has thoroughly reviewed our perceptions when decoding quantities in two or more curves, color encoding (hues, saturations, and lightnesses for both categorical and quantitative variables), texture symbols, use of visual reference grids, correlation between two variables, and position along a common scale. Empirical studies by Cleveland and McGill (1984) and Heer and Bostock (2010) have quantified our accuracy and uncertainty when judging quantity in a variety of encodings.
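
A short sketch shows why the two distances diverge. For a point at vertical distance r from a straight line with slope b, the shortest (perpendicular) distance is r / sqrt(1 + b^2), a standard geometric result; treating a curve as locally straight is a simplifying assumption, and the residual and slopes below are hypothetical.

```r
# Hypothetical setup: the same vertical residual at three local slopes.
vertical_resid <- 1
slope <- c(0, 1, 5)

# The shortest distance from the point to the line shrinks as the slope steepens.
shortest_dist <- abs(vertical_resid) / sqrt(1 + slope^2)

data.frame(slope, vertical_resid, shortest_dist = round(shortest_dist, 3))
```

Where the fitted curve is steep, points with unchanged vertical residuals sit visibly closer to the curve, which is the misreading described above.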

The broader point is to become aware of issues in perception and consider multiple representations to overcome them. Several references mentioned in the literature review delve into visual perception and best practices for choosing appropriate visualizations. Koponen and Hildén (2019), for example, usefully arrange data types within visual variables and order them by our accuracy in decoding, shown in figure 26:

Figure 26: Visual variables, organized by how well they are suited for representing data measured on each type of scale.

Placing encodings in the context of chart types, figure 27, we decode them in this order, from more to less accurately: position along common scales (e.g., bar charts, scatter plots), length (e.g., stacked bars), angle (e.g., pie charts), circular area (e.g., bubble charts), luminance, and color (Munzner 2014):

Figure 27: We gauge position along a common scale more accurately than length, which we gauge more accurately than angle or area. Luminance or color are typically reserved for encoding quantities in a third dimension.

A thorough visual analysis may require multiple graphical representations of the data, and each requires inspection to be sure our interpretation is correct.

Color

As mentioned, we can encode data using color spaces, which are mathematical models. The common color model RGB has three dimensions (red, green, and blue), each taking one of \(2^8 = 256\) values between 0 and 255, where those hues are mixed to produce a specific color.
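
In R, the base functions `rgb()` and `col2rgb()` convert between channel values and color codes. The specific colors below are arbitrary examples.

```r
# Build colors from red, green, and blue channel values on the 0-255 scale.
pure_green <- rgb(0, 255, 0, maxColorValue = 255)
teal       <- rgb(0, 128, 128, maxColorValue = 255)

pure_green      # "#00FF00"
teal            # "#008080"
col2rgb(teal)   # recover the three channel values

# Each channel has 2^8 = 256 levels, so the model spans 256^3 colors.
256^3           # 16777216
```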

Notice that the hue, chroma, and luminance of this colorspace seem to have uneven distances and brightness along wavelength.

Let’s consider how we might, as illustrated below, map data to these characteristics of color.

Luminance is the measured amount of light coming from some region of space. Brightness is the perceived amount of light coming from that region of space. Perceived brightness is a very nonlinear function of the amount of light emitted. That function follows the power law:

\[\begin{equation} \begin{split} \textrm{perceived brightness} = \textrm{luminance}^n \end{split} \end{equation}\]

where the value of \(n\) depends on the size of the patch of light. Colin Ware (Ware 2020) reports that, for circular patches of light subtending 5 degrees of visual angle, \(n\) is 0.333, whereas for point sources of light \(n\) is close to 0.5. Let’s think about this graphically. Visual perception of an arithmetical progression depends upon a physical geometric progression (Albers 2006). In a simplification shown in figure 28, this means: if the first 2 steps measure 1 and 2 units in rise, then step 3 is not only 1 unit more (that is, 3 in an arithmetical proportion), but is twice as much (that is, 4 in a geometric proportion). The successive steps then measure 8, 16, 32, 64 units.
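
A small numeric sketch may help. It applies the power law with the two exponents quoted from Ware to the doubling sequence from the Albers example; the function name is an illustrative assumption, and the last line uses a base-2 logarithm as one simple way to express the Weber-style reading of that example.

```r
# Perceived brightness as a power function of luminance.
perceived_brightness <- function(luminance, n) luminance^n

luminance <- c(1, 2, 4, 8, 16, 32, 64)   # geometric progression (doubling)

patch <- perceived_brightness(luminance, n = 0.333)  # 5-degree patch
point <- perceived_brightness(luminance, n = 0.5)    # point source

# Each doubling of luminance multiplies perceived brightness by the
# same factor, 2^n, which is well below 2.
round(patch[-1] / patch[-length(patch)], 3)
round(point[-1] / point[-length(point)], 3)

# On a logarithmic scale, the geometric progression becomes evenly spaced.
diff(log2(luminance))
```

Either way, equal-looking steps require physical steps that grow multiplicatively rather than additively.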

Figure 28: As Albers illustrates, Weber’s law, applied to creating color steps we perceive as evenly spaced, requires that we convert from an arithmetical progression to a geometric progression.

Color intervals are the distance in light intensity between one color and another, analogous to musical intervals (the relationship between notes of different pitches).

Uneven wavelengths between what we perceive as colors, as we saw in the RGB color space, result in, for example, almost identical hues of green across a range of its values, while our perception of blues changes more rapidly across the same change in values. We also perceive a lot of variation in the lightness of the colors here, with the cyan colors in the middle looking brighter than the blue colors.

The problem exists in each channel or attribute of color. Let’s consider examples by comparing the hue, saturation, and luminance of two blocks. Do we perceive these rectangles as having the same luminance or brightness?

Do we perceive these as having the same saturation?

Do we perceive these as having equal distance between hues?

There’s a solution, however. Other color spaces show changes in color we perceive as uniform. Humans compute color signals from our retina cones via an opponent process model, which makes it impossible to see reddish-green or yellowish-blue colors. The International Commission on Illumination (CIE) studied human perception and re-mapped color into a space where we perceive color changes uniformly. Their CIELuv color model has, alongside a lightness dimension (L), two chromatic dimensions (u and v) that represent color scales from red to green and yellow to blue.

More modern color spaces improve upon CIELuv by mapping colors as perceived into the familiar and intuitive Hue-Chroma-Luminance23 dimensions. Several modern color spaces, along with modification to accommodate colorblindness, are explained in the expansive Koponen and Hildén (2019). In contrast with the perceptual change shown with an RGB colorspace above, the change in value shown below of our green-to-blue hues in 10 equal steps using the HCL model are now perceptually uniform.
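
Base R offers one such space through `hcl()`, which takes hue, chroma, and luminance in polar CIELUV coordinates. The sketch below generates ten equal hue steps from green to blue; the chroma and luminance values are arbitrary illustrative choices, not those used for the figures in this text.

```r
# Ten equal hue steps from green (120) to blue (240),
# holding chroma and luminance fixed.
hues  <- seq(120, 240, length.out = 10)
steps <- hcl(h = hues, c = 50, l = 70)
steps

# Draw the ten blocks side by side.
barplot(rep(1, 10), col = steps, border = NA, space = 0, axes = FALSE)
```

Because chroma and luminance are held constant, any difference we see between neighboring blocks should come from hue alone.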

For implementations of perceptually uniform color spaces in R, see Spencer (2020) and Zeileis, Hornik, and Murrell (2009). Using the perceptually uniform colorspace HSLuv, let’s explore the above hue changes across various saturation and luminance values:

With categorical data, we do not want one color value to appear brighter than another. Instead, we want to choose colors that separate categories while holding their brightness level equal.

When mapping data to color channels (hue, saturation, or lightness), use a perceptually uniform colorspace.

Relativity of color

Notice, by the way, that each of the above 10 equal blocks from green to blue appears to show a gradient in hue. We also see this for each step in luminance (but not across blocks of saturation) in our HSLuv comparisons. That isn’t the case; the hue is uniform within each block or step. Our eyes, however, perceive a gradient because the adjacent values create an edge contrast. Humans have evolved to see edge contrasts, as in figure 29.

Figure 29: We see comparative — not absolute — luminance value. Adjacent data encoded by color may cause us to misperceive the value we’re inspecting.

We see comparative, not absolute, luminance value. The edge between the left and right gray rectangles in figure 29, created by a luminance adjustment, tricks us into seeing each rectangle as uniform and differing in shade, though the outer portions of each have the same luminance. Need proof? Cover the edge portion between them!

Similarly, our comparative perception has implications for how to accurately represent data using luminance. Background or adjacent luminance — or hue or saturation — can influence how our audience perceives our data’s encoded luminance value. The small rectangles in the top row of figure 30 all have the same luminance, though they appear to change. This misperception is due to the background gradient.

Figure 30: Background information causes us to misperceive the small rectangles in each row, which are encoded with identical luminance values.

One color can interact to appear as two. The inner blocks on the left are the same color, and the two inner blocks on the right are the same, but different background colors change our perception:

Two different colors can interact to appear as one. In this case, the background colors change our perceptions of the two different inner blocks such that they appear the same:

And contrasting hues with similar luminance can create vibrating boundaries:

These examples were adapted from those in Albers (2006), which discusses yet more ways the relativity of color can mislead us.

Exercise 21 Locate two or three graphics on the internet, each with different types of data-ink encodings you believe are well-designed. Be adventurous. Describe those encodings without using names of charts.
Exercise 22 Locate two graphics, each with different types of data-ink encodings you believe are problematic. Describe the encodings, what makes them problematic, and suggest a more appropriate encoding.
Exercise 23 Explain how you might use the apparent problem of vibrating boundaries to help an audience. Hint: think about why we use gestalt principles.

2.1.4 Maximize information in visual displays

Maximize the information in visual displays within reason. Edward R. Tufte (2001a) measures this as the data-ink ratio:

\[\begin{equation} \begin{split} \textrm{data-ink ratio} =\; &\frac{\textrm{data-ink}}{\textrm{total ink used to print the graphic}} \\ \\ =\; &\textrm{proportion of a graphic's ink devoted to the} \\ &\textrm{ non-redundant display of data-information}\\ \\ =\; &1.0 - \textrm{proportion of a graphic that can be} \\ &\textrm{erased without loss of data-information} \\ \\ \end{split} \end{equation}\]

That means identifying and removing non-data ink. And identifying and removing redundant data-ink. Both within reason. Just how much requires experimentation, which is arguably the most valuable lesson from Tufte’s classic book, The Visual Display of Quantitative Information. In it, he systematically redesigns a series of graphics, at each step considering what helps and what may not. Tufte, of course, offers his own view of which versions are an improvement. His views are those of a designer and statistician, based on his experience and theory of graphic design.

Some of his approaches have also been subject to experiments (Anderson et al. 2011), which we should consider within the context and limitations of those experiments. More generally, for any important data graphic for which we do not have reliable information on its interpretability, we should perform tests on those with a similar background to our intended audiences.

Let’s reconsider the example figure from Knaflic:

Figure 31: Knaflic systematically changes a graphic, beginning with the original, default graph on the left, and finishing with the graphic on the right.

Compare Knaflic’s before-and-after example.

Exercise 24 Try to articulate all differences. Consider whether her changes follow Tufte’s principles, and whether each of her changes would improve her audience’s understanding of the intended narrative and supporting evidence.

For the next example, revisiting the Dodgers, consider the following analysis of game attendance as a function of fan preferences for game outcome certainty, since maximizing attendance is a marketing objective:

Example 3 To help us understand game attendance as a function of fan preference for certainty or uncertainty of the game outcome, we created a model. It included variables like day of the week, time of day, and the team’s cumulative fraction of wins. We believe that some uncertainty helps attract people to the game. But how much? It also seems reasonable to believe that the function is non-linear: a change in probability of a win from 0 percent to 1 percent may well attract fewer fans than if from 49 percent to 50 percent. Thus, we modelled the marginal effect of wins as quadratic. Our overall model, then, can be described as:

\[\textrm{attendance}_i \sim \textrm{Normal}(\theta_i, \sigma)\]

for game \(i\), where \(\theta_i\) represents the mean of attendance, \(\sigma\) the variation in attendance, and \(\theta_i\) itself is decomposed:

\[\begin{equation} \begin{split} \theta_i =\; &\alpha_{1[i]} \cdot \textrm{day}_i + \alpha_{2[i]} \cdot \textrm{time}_i + \\ &\beta_{1[i]} \cdot \frac{\sum{\textrm{wins}_i}}{\sum{\textrm{games}_i}} + \beta_{2[i]} \cdot p(\textrm{win}_i) + \beta_{3[i]} \cdot p(\textrm{win}_i)^2 \end{split} \end{equation}\]

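With posterior estimates from the model, we set the partial derivative of mean attendance with respect to the win probability to zero, using the estimates for win uncertainty (\(\beta_2\) and \(\beta_3\)), to find the win probability at which attendance reaches a maximum: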

\[\textrm{Maximum} = \frac{-\beta_2}{2 \cdot \beta_3 }\]
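As a worked step, and assuming the other terms of \(\theta_i\) do not depend on \(p(\textrm{win}_i)\), this expression follows from differentiating the quadratic terms and solving for the point where the slope is zero:

\[\begin{equation} \begin{split} \frac{\partial \theta_i}{\partial p(\textrm{win}_i)} &= \beta_2 + 2 \cdot \beta_3 \cdot p(\textrm{win}_i) = 0 \\ p(\textrm{win}_i) &= \frac{-\beta_2}{2 \cdot \beta_3} \end{split} \end{equation}\]

The stationary point is a maximum only when \(\beta_3 < 0\), since the second derivative is \(2 \cdot \beta_3\). With a positive \(\beta_3\), the same expression would instead locate a minimum.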

For the analysis, we used betting market odds as a proxy for fans’ estimation of their team’s chances of winning. The betting company Pinnacle made these data available for the 2016 season, which we combined with game attendance and outcome data from Retrosheet.

The analysis included the exploratory graphic on the left, using default graphic settings, in figure 32, and the communicated graphic on the right.

Figure 32: An exploratory graphic using R / ggplot default settings on the left, and corresponding presentation graphic on the right.

Exercise 25 Try to articulate all differences between the exploratory and communicated graphic. Consider whether the changes follow Tufte’s principles, if so, which, and whether each of the changes would improve the marketing audience’s understanding of the intended narrative and supporting evidence. Can you imagine other approaches?

2.2 Visually Encoding Data, Common and Xenographic

2.2.1 Encoding data-ink, common graphics

Resources abound for encoding and coding common graphics. In an award-winning graphic form, Holtz and Healy (2018) provide taxonomies for common graphics, and an analysis of basic charts. See also (Healy 2018b); (Knaflic 2015); (R. L. Harris 1999); (Cleveland 1993); (Cleveland 1985). The Data Visualization Handbook, again, explains common statistical graphics, including bar charts, dot plots, line charts and their variants, like slopegraphs, streamgraphs, bumps charts, cycle plots, sparklines, pie and donut charts, scatterplots (scatter or x-y, strip plot, beeswarm plot), bubble charts, heatmaps, box plots, violin plots, and many more.

We should not try to memorize each type. Instead, we should understand how they work using the language and ideas from section 2.1.3. Apply the advice about studying metaphor and rhetorical figures when constructing graphics, too:

Seeing just a few examples invites direct imitation of them, which tends to be clumsy. Immersion in many examples allows them to do their work by way of a subtler process of influence, with a gentler and happier effect on the resulting style.

Of note, the difference between common graphics and what has been called xenographics is somewhat arbitrary. The more important point is not the name we use (chart names are just shorthand to convey instances of graphics or look them up) but that we anticipate which encodings our audience already understands how to decode and which encodings need an explanation of how to decode them.

2.2.2 Layers and separation

Graphics, including data encodings, are created in layers: each marking is closer to our eyes than the previous marking. The implications are several. First, when the correct attributes of markings are used, we can perceive one marking closer than another. In the left graphic, for example, we perceive the orange circle behind the blue circle, while in the right graphic, we perceive the blue circle behind the orange circle:

These particular effects are created in code, simply by our code order for the markings, overlapping the markings, and choosing fill colors to distinguish the two shapes. We can create the same perception in other ways, too.
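As a minimal sketch of how code order alone sets which marking appears in front, the following builds two small SVG drawings with Python's standard library. SVG paints elements in document order, so a later element covers an earlier one. The helper names `circle` and `layered_svg`, the sizes, and the fill colors are illustrative assumptions, not the code used for the graphics above.

```python
def circle(cx, cy, r, fill):
    """Return one SVG circle element as a string."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{fill}"/>'


def layered_svg(markings):
    """Build an SVG document; later markings are painted over earlier ones."""
    body = "\n  ".join(markings)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="160" height="100">\n'
        f"  {body}\n"
        "</svg>"
    )


# Two overlapping circles with distinct fills (hypothetical sizes and colors).
orange = circle(60, 50, 40, "orange")
blue = circle(100, 50, 40, "steelblue")

# Orange is drawn first, so the overlapping blue circle appears in front.
blue_in_front = layered_svg([orange, blue])

# Reversing the code order puts the orange circle in front.
orange_in_front = layered_svg([blue, orange])
```

Writing either string to a file with an `.svg` extension and opening it in a browser shows the effect. Only the order of the list passed to `layered_svg` differs between the two drawings.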

Samara (2014) describes these design choices as creating a sense of near and far. We may create a sense of depth, of foreground and background, using any of size, overlap of the forms or encodings, and the encodings’ relative values (lightness, opacity). Samara (2014) writes, “the seeming nearness or distance of each form will also contribute to the viewer’s sense of its importance and, therefore, its meaning relative to other forms presented within the same space.” Ultimately we are trying to achieve a visual hierarchy for the audience to understand at each level.

When designing graphics, and especially when comparing encodings or annotating them, we must perceptually layer and separate types of information or encodings. As Edward R. Tufte (1990) explains, “visually stratifying various aspects of the data” aids readability. By layering or stratifying, we mean placing one type of information over the top of a second type of information. The grammar of graphics, discussed earlier, enables implementations of such a layering. To visually separate the layered information, we can assign, say, a hue or luminance, for a particular layer. Many of the graphics discussed separate types of data through layering.

2.2.3 Layering and opacity

Opacity / transparency provide another attribute very useful in graphics perception. For layered data encoded in monochrome, careful use of transparency can reveal density:

The key in the above use is monochrome (a single color and shade). When we also use color, especially hue, as a channel to represent other data information, we get unintended consequences. Opacity, combined with other color attributes, can change our perception of the color, creating encodings that make no sense. Let’s see this in action by adding opacity to our foreground / background example above, left graphic:

Notice, also, a question arises: is orange or blue in the foreground? With this combination of attributes, we lose our ability to distinguish foreground from background.
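To see numerically why partial opacity muddies a hue encoding, here is a minimal sketch of the standard "source over" blend, in which a color drawn at opacity alpha is mixed with whatever is already beneath it. The function name `over` and the RGB values chosen for orange, blue, and the white background are assumptions for illustration only.

```python
def over(src, dst, alpha):
    """Composite color `src` at opacity `alpha` over an opaque color `dst`."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


WHITE = (255, 255, 255)   # assumed page background
ORANGE = (255, 165, 0)
BLUE = (0, 0, 255)

# Overlap region when orange is drawn first and blue second, both half opaque.
blue_on_top = over(BLUE, over(ORANGE, WHITE, 0.5), 0.5)

# Overlap region with the drawing order reversed.
orange_on_top = over(ORANGE, over(BLUE, WHITE, 0.5), 0.5)

print(blue_on_top)    # (127.5, 105.0, 191.25)
print(orange_on_top)  # (191.25, 146.25, 127.5)
```

In both orders the overlap becomes a third color that is neither the orange nor the blue a legend would list, so any data mapped to hue can no longer be decoded in that region.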

2.2.4 Encoding data-ink, xenographics

For a growing collection of interesting approaches to visualizing data in uncommon ways, consult the website Lambrechts (2020). But we have already seen a few less common data encodings. Recall, for example, instances of tracking information encoded as dots within circles in figure 9. Let’s consider a couple more. Getting back to our example Citi Bike project, we identified various data visuals used in earlier exploratory work. In that earlier study, Saldarriaga (2013), researchers visualized bike and docking station activity data in the form of heatmaps overlaying maps, and heatmaps as a grid wherein the x-axis encoded time of day, the y-axis encoded docking station names as categorical data, hue at a given time and docking station encoded the imbalance between incoming and outgoing bikes, and a luminosity gradient at the same location encoded activity level, as shown in figure 33.

Figure 33: Researchers visualized bike and docking station activity data in the form of heatmaps overlaying maps, and heatmaps as a grid wherein the x-axis encoded time of day, the y-axis encoded docking station names as categorical data, hue at a given time and docking station encoded the imbalance between incoming and outgoing bikes, and luminosity at the same location encoded activity level.

The more interesting aspect of this graphic is that, as explained in its legend, the dual encoding (a diverging hue combined with luminance) enables markings to disappear if either a) incoming and outgoing activity is balanced or b) the activity level is very low. The limitations of the overall encoding, however, include an unfamiliar listing of docking stations by name on the y-axis. As we proposed in the memo, example 1, let’s try encoding these variables differently. Let’s try addressing the admitted challenge of encoding geographic location with time in a way that allows further, meaningful encodings. We will do this in stages. First, we consider activity level, which we naturally think of as a daily pattern. Other graphics, Armstrong and Bremer (2017) and Bremer (2017), have explored daily patterns of activity, and encode that activity level using polar coordinates. We borrow from that work, encoding bike activity level the way we think about time: circular, like a 24-hour clock. Our first graphic is in figure 34. We read the graphic as reflecting activity level over time, which is encoded circularly, with midnight at the top, 6am to the right, noon at the bottom, and 18 hours (6pm) to the left. To help visualize time of day, we label sunrise and sunset, and shade the areas before and after sunrise as dark and light, respectively.

Figure 34: This graphic encodes bike activity levels throughout a 24-hour day, where time is encoded as polar coordinates.

As did Bremer, we encode an average activity level along the black line, and the activity level at a given time as the distance from that average. The color within that distance from average activity level encodes the quantiles (think boxplot) of activity. As with encoding average activity level, we annotate with reference activity levels: 5, 20, and 35 rides per minute. What is remarkable is the observed magnitude of change from average (black circle) ride rates that exists throughout the day, which reflects this rebalancing problem. Minutes in only light blue show when 50 percent of the ride rates occur. Minutes that include dark blue show when the highest (outside black circle) or lowest (inside black circle) rates of rides happen. Finally, the remaining minutes with medium blue show when the rest of the rates of rides occur.

We now address the limitation of the prior work. In this regard, we can learn from the famous graphic by Minard of Napoleon’s march, see figure 35.

Figure 35: Minard’s Napoleon graphic redrawn and translated into English.

In Minard’s graphic, as Tufte explains, he overlays the path of Napoleon’s march onto a map in the form of a ribbon. While the middle of that ribbon may accurately reflect geographic location, the width of that ribbon does not. Instead, the width of the ribbon encodes the number of soldiers at that location, wherein time is also encoded as coinciding with longitude. That encoding gives a sense of where the soldiers were at a given time, while also encoding the number of soldiers. We try a similar approach with Citi Bike, shown in figure 36. We place each docking station as a black dot (\(\cdot\)) overlaying a geographic map of New York City. At each station, we use color to encode an empty or full station as a line segment ( | ) starting at the station dot and extending in the direction of the time of day on a unit circle. The line segments are partly transparent so that an individual empty or full station won’t stand out, but repeated problems at that time of day over the three weeks of the data (January 2019) would be more vivid and noticeable. Finally, we annotate the graphic with a narrative and a key that explains these encodings, along with encoding the general activity levels of the graphic in figure 34.

Figure 36: The visualization — a xenographic — invites riders to explore bike and docking station availability for encouraging re-distribution for the NYC bike share. The data on trips and station availability are encoded in seven dimensions: space, time, bike and dock availability, rate of new rides per minute, and whether unavailability at a given time of day occurred multiple times. I used the metaphor of unavailability as dandelions among flowers that riders travel through each spring, weeds that need fixing and a request: by riding against the flow—redistributing bikes—those riders are helping us all.

The infographic adds, as its title, a call to action: ride against the flow (Spencer 2019a, longlisted Kantar Information is Beautiful Awards). When encoding custom graphics, basic math can come in handy. The encodings (colored line segments) for empty and full docking stations at each station were created by mapping the hour of a day to an angle, in degrees or radians, on a unit circle, and calculating the end of each line segment as an offset from the docking station geolocation using basic trigonometry.
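As a minimal sketch of that trigonometry, the function below maps an hour to an angle and offsets a point by the sine and cosine of that angle. It assumes the clock convention described for figure 34 (midnight at the top, time running clockwise) and flat coordinates with y increasing upward. The name `segment_end` and its parameters are hypothetical, and a real map would also need to account for its projection.

```python
import math


def segment_end(x, y, hour, length=1.0):
    """End point of a segment starting at (x, y) and pointing toward `hour`.

    Assumes a 24-hour clock face with midnight at the top and time running
    clockwise, with y increasing upward.
    """
    angle = 2 * math.pi * (hour % 24) / 24  # radians, clockwise from the top
    return (x + length * math.sin(angle), y + length * math.cos(angle))


# A station at the origin: 6am points right, noon down, 6pm left.
for hour in (0, 6, 12, 18):
    end_x, end_y = segment_end(0.0, 0.0, hour)
    print(hour, round(end_x, 3), round(end_y, 3))
```

Swapping sine and cosine relative to the usual mathematical convention is what places midnight at the top and makes time advance clockwise.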

2.3 Encoding Uncertainty, Estimates, and Forecasts

2.3.1 Motivation to communicate uncertainty

Most authors do not convey uncertainty in their communications, despite its importance (Hullman 2020). Yet good decisions rely on knowledge of uncertainty (B. Fischhoff and Davis 2014); (Baruch Fischhoff 2012). Scientists are often hesitant to share their uncertainty with decisionmakers who need to know it. With an understanding of the reasons for their reluctance, decisionmakers can create the conditions needed to facilitate better communication. The failure to express uncertainty has a negative value. Communicating knowledge can worsen results if it induces unwarranted confidence or is so hesitant that other, overstated claims push it aside. Quantifying uncertainties aids verbal expression.

Consider the concern that people will misinterpret quantities of uncertainty, inferring more precision than intended. We might respond that most people like getting quantitative information on uncertainty, can get the main message from it, and without it are more likely to misinterpret verbal expressions of uncertainty. Posing clear questions guides understanding. Concern: people cannot use probabilities. Response: laypeople can provide high-quality probability judgments, if they are asked clear questions and given the chance to reflect on them. Communicating uncertainty protects credibility. Concern: credible intervals may be used unfairly in performance evaluations. Response: probability judgments convey the information more accurately; that is, they are neither too confident nor lacking in confidence.

2.3.2 Research in uncertainty expression

Hullman (2019) provides a short overview of some ways we can represent uncertainty in common data visualizations, along with pros and cons of each. Recent ideas include hypothetical outcome plots (Kale et al. 2018), quantile dotplots (Fernandes et al. 2018) and (Kay et al. 2016), like this,

where the reader can count or estimate the relative frequency occurring at a given measurement, and compare the numbers above or below some threshold. Or values coded with hue and luminosity to create a value-suppressing uncertainty palette (Correll, Moritz, and Heer 2018),

and gradient and violin plots (Correll and Gleicher 2014), as used below in, for example, figure 38.
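To make the counting idea behind quantile dotplots concrete, here is a minimal sketch that reduces a distribution to 20 equally likely values, one per dot, and then counts the dots below a threshold. The normal distribution, its parameters, the threshold, and the function name `quantile_dots` are hypothetical choices for illustration, not taken from the cited studies.

```python
from statistics import NormalDist


def quantile_dots(dist, n=20):
    """Return n equally likely values: the (k - 0.5) / n quantiles of dist."""
    return [dist.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]


# Hypothetical predictive distribution, e.g. minutes until an arrival.
arrival = NormalDist(mu=12, sigma=3)
dots = quantile_dots(arrival)

threshold = 8
dots_below = sum(d < threshold for d in dots)

# Each dot stands for 1/20 of the probability, so a reader's estimate is:
estimate = dots_below / len(dots)
print(dots_below, estimate, round(arrival.cdf(threshold), 3))
```

Counting 2 of 20 dots below the threshold gives an estimate of 10 percent, close to the exact probability of about 9 percent. The price of easy counting is resolution: with 20 dots, estimates move in steps of 5 percent.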

Missing data create another form of uncertainty, and are common in data analytics projects. The worst approach, usually, is to delete those observations, see (Little and Rubin 2019) and (Baguley and Andrews 2016). Instead, we should think carefully about an appropriate way to represent our uncertainty of those values, usually through multiple imputation. This approach means we treat each missing value as an estimated distribution of possible values. We also need to communicate about those missing values. There are other approaches (Song and Szafir 2018) for visualizing missing data.

Data are part of our models; understanding what is not there is important.

2.3.3 Estimations and predictions from models

To persuasively communicate estimates and predictions, we must understand our models. Our goal in modeling is most typically to understand real processes and their outcomes, to understand what has happened, why, and to consider what may transpire.

Data represent events, outcomes of processes. Let’s call the data observed variables. Typically, we do not know enough about the process to be certain about which outcome will be next: if we did, we wouldn’t need to model it!

But with some knowledge of the process — even before knowing the observed variables — we have an idea of its possible outcomes and probability of each outcome. This knowledge, of course, comes from some kind of earlier (prior) data, perhaps from various sources.

We’ve mentioned that visualization is a form of mental modeling of the data. Visual displays enable us to find relationships between the variables of interest. But not all relations lend themselves to the displays at our disposal. This is especially true as the quantity and type of variables grow.

Conceptualizing models

Complementary to visual analyses, we code models to identify, describe, and explain relationships. We have already used a basic regression model earlier when exploring Anscombe’s Quartet. That linear regression could be coded in R as simply lm(y ~ x). But mathematical notation can give us a more informative, equivalent description. Here, that might be:

\[\begin{equation} \begin{split} y &\sim \textrm{Normal}(\mu, \sigma) \\ \mu &= \alpha + \beta \cdot x \\ \alpha &\sim \textrm{Uniform}(-\infty, \infty) \\ \beta &\sim \textrm{Uniform}(-\infty, \infty) \\ \sigma &\sim \textrm{Uniform}(0, \infty) \end{split} \end{equation}\]
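With flat priors like these, the most probable values of \(\alpha\) and \(\beta\) coincide with the ordinary least-squares estimates. As a minimal sketch in Python rather than R, and assuming the first of Anscombe's four data sets, the estimates can be computed directly from the data:

```python
# First of Anscombe's four data sets (Anscombe 1973).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (beta) and intercept (alpha).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta = s_xy / s_xx
alpha = y_bar - beta * x_bar

# Residual standard deviation, an estimate of sigma (n - 2 degrees of freedom).
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
sigma = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

print(round(alpha, 2), round(beta, 2), round(sigma, 2))
```

The intercept of about 3.0 and slope of about 0.5 are the values shared by all four of Anscombe's data sets, which is why the numerical summary alone cannot distinguish them.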

Considering a second example, if we are given a coin to flip, our earlier experiences with such objects and physical laws suggest that when tossed into the air, the coin would come to rest in one of two outcomes: either heads or tails facing up. And we are pretty sure, but not certain, that heads would occur around 50 percent of the time. The exact percentage may be higher or lower, depending on how the coin was made and is flipped. We also know that some coins are made to appear as a “fair” coin but would have a different probability for each of the two outcomes. Their purpose is to surprise: a magic trick! So let’s represent our understanding of these possibilities as an unobserved variable called \(\theta\). \(\theta\) is distributed according to three, unequal possibilities: the coin is balanced, heads-biased, or tails-biased.

Visually communicating models

Then, we can represent our prior understanding of the probability of heads as distributed, say, \(\textrm{Beta}(\theta \mid \alpha, \beta)\) where \(\theta\) is the probability, and \(\alpha\) and \(\beta\) are shape parameters, and the observed data distributed \(\textrm{Binomial}(\textrm{heads} \mid \textrm{flips}, \theta)\). If we were very confident that the coin was fair, we could keep the prior distribution narrowly near a half, but in this example we will leave extra uncertainty for the possibility of a trick coin, say, \(\alpha = \beta = 10\), as shown in the left panel of figure 37. For 8 heads in 10 flips, the likelihood of these data is shown in the middle panel of figure 37.

Figure 37: Our prior knowledge of coins and the observed data combine to inform our new understanding of, and uncertainty in, the probability of heads with this coin.
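The combination of prior and data in this example can be sketched in a few lines. Because a Beta prior is conjugate to binomial data, the posterior is again a Beta distribution whose shape parameters add the observed heads and tails to the prior's. The function names and the grid approximation used for the median are illustrative assumptions, not the code behind figure 37.

```python
def beta_binomial_update(a, b, heads, flips):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + heads, b + (flips - heads)


def beta_quantile(a, b, p, steps=100_000):
    """Approximate the p-th quantile of Beta(a, b) on an evenly spaced grid."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    density = [t ** (a - 1) * (1 - t) ** (b - 1) for t in grid]
    total = sum(density)
    running = 0.0
    for t, d in zip(grid, density):
        running += d
        if running >= p * total:
            return t
    return grid[-1]


# Prior Beta(10, 10) and 8 heads in 10 flips, as in the text.
post_a, post_b = beta_binomial_update(10, 10, heads=8, flips=10)
posterior_mean = post_a / (post_a + post_b)
posterior_median = beta_quantile(post_a, post_b, 0.5)

print(post_a, post_b, round(posterior_mean, 2), round(posterior_median, 2))
```

The posterior is Beta(18, 12), with a mean of 0.6. It sits between the prior's center of 0.5 and the observed proportion of 0.8, and is pulled toward the prior because the prior carries the weight of 20 pseudo-flips against 10 observed ones.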

When our model combines the two sources of knowledge together, shown on the right panel of figure 37, we get our updated information on, and uncertainty of, what the underlying probability of heads may be for this coin. Such distributions most fully represent the information we have, and should be considered when describing our modeling. With the distributions in pocket, of course, we can summarize them in whatever way makes sense to the questions we are trying to answer and for the audience we intend to persuade. Figure 38 provides one alternative for expressing our knowledge and uncertainty.

Figure 38: We can think of this summary as a top-down perspective on the above distributions. We lose some information because it is difficult to capture differences in plausibility through transparency settings alone, but the approach still conveys uncertainty, and the skewed distribution of the likelihood.

The above is a more intuitive representation of the variation than in table 4. Tables may, however, complement the information in a graph display.

Table 4: Tables do not offer the same ability to quickly understand uncertainty or compare differences between states of information. But they can complement, shape, and provide numerical precision to what we perceive in the graphic.

Range of possible probabilities of heads

| Knowledge | Min | 25th | 50th | 75th | Max |
|---|---|---|---|---|---|
| Prior | 0.13 | 0.43 | 0.50 | 0.58 | 0.86 |
| Likelihood | 0.18 | 0.67 | 0.76 | 0.84 | 1.00 |
| Posterior | 0.23 | 0.54 | 0.60 | 0.66 | 0.92 |

Modeling is usually, of course, more complex than coin flipping and can be a vast, complex subject. That does not mean it cannot be visualized and explained persuasively. Much can be learned about how to explain what we are doing by reviewing well-written introductions on the subject of whatever types of models we use. There are several references — for example, (McElreath 2020) and (Stan Development Team 2019) — that thoroughly introduce applied Bayesian modeling.

McElreath explains what models are, and step-by-step explains various types in words, mathematically, and through corresponding code examples. These models are various: continuous outcome with one covariate, a covariate in polynomial form, multiple covariates, interactions between covariates, generalized linear models, such as when modeling binary outcomes, count outcomes, unordered and ordered categorical outcomes, multilevel covariates, covariance structures, missing data, and measurement error. Along the way, he explains their interpretations, limitations, and challenges.

Causal relationships

McElreath also introduces the importance of thinking causally when modeling. This is especially important in selecting variables. One variable may mask, or confound, the relationship between two other variables. We can think about relationships and causation with a tool called a directed acyclic graph (DAG). McElreath discusses four types of confounds, and the solutions to avoid them, so that we may better understand the relationships between variables of interest. His instruction is especially helpful as it is provided in an applied setting with examples. For more in-depth understanding of causation, consult Pearl (2009).

2.4 Combining Data Graphics in a Visual Narrative

Recall our first layer of messages for a graphic should be annotation of the graphic itself, as first discussed in section 1.4.4, and that we should consider data graphics as paragraphs about data and include them directly into the paragraph text, as discussed in section 1.4.3. We can broaden the concept of integrating or combining graphics and text to think about graphics as part of the overall narrative.

References: (Rendgen 2019); (Boy, Detienne, and Fekete 2015); (Hullman et al. 2013).

3 Interactive

Hohman et al. (2020) discuss five affordances of interactive articles: connecting people and data; making systems playful; prompting self-reflection; personalizing reading; and reducing cognitive load.

References: (Shneiderman 1996) (Overview, zoom, filter, details-on-demand, relate, history, and extract).

3.1 Effective Interactive Dashboard Design

We first introduce dashboard design, and then layer in interactive concepts.

References: (Sarikaya et al. 2019); (Unwin and Valero-Mora 2018); (Wexler, Shaffer, and Cotgreave 2017).

3.2 Introduction to Interactive Storytelling

References: (Hohman et al. 2020); (Heer and Shneiderman 2012); (Segel and Heer 2010).

Additional Examples: (Sam Vickars and Michael Hester 2020).

3.3 Framing an Interactive Story

References: (Hullman and Diakopoulos 2011); (Entman 1993).

4 Multimodal Delivery

4.1 Gestural and Verbal Communication

References: (Schwabish 2016); (E. Tufte 2016); (Doumont 2009); (Edward R. Tufte 2006c); (Edward R. Tufte 2006b); (Edward R. Tufte 2020).



Abelson, Robert P. 1995. Statistics as Principled Argument. Psychology Press.
Adair, Robert K. 2017. The Physics of Baseball. Third. HarperCollins.
Albers, Josef. 2006. Interaction of Color. Yale University Press.
Altman, Rick. 2008. A Theory of Narrative. New York: Columbia University Press.
Anderson, E. W., K. C. Potter, L. E. Matzen, J. F. Shepherd, G. A. Preston, and C. T. Silva. 2011. “A User Study of Visualization Effectiveness Using EEG and Cognitive Load.” Computer Graphics Forum 30 (3): 791–800.
Andrews, R J. 2019. Info We Trust: How to Inspire the World with Data. Wiley.
Anscombe, F J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21.
Aristotle, and C. D. C. Reeve. 2018. Rhetoric. Indianapolis ; Cambridge: Hackett Publishing Company, Inc.
Armstrong, Zan, and Nadieh Bremer. 2017. “Why Are so Many Babies Born Around 8:00 A.M.?” Scientific American.
Baguley, Thom, and Mark Andrews. 2016. “Handling Missing Data.” In Modern Statistical Methods for HCI. Human–Computer Interaction Series. Springer.
Baker, Monya. 2016. “Is There a Reproducibility Crisis?” Nature 533 (26): 452–54.
Bal, Mieke. 2017. Narratology: Introduction to the Theory of Narrative. Toronto; Buffalo; London: University of Toronto Press.
Barclay, Scott, Rex V Brown, Clinton W Kelly III, Cameron R Peterson, Lawrence D Phillips, and Judith Selvidge. 1977. “Handbook for Decision Analysis.” Decisions and Designs, Inc.
Baxandall, Michael. 1985. Patterns of Intention: On the Historical Explanation of Pictures. New Haven: Yale University Press.
Berger, James O. 1985. Statistical Decision Theory and Bayesian Analysis. Second. Springer.
Berger, Linda L., and Kathryn M. Stanchi. 2018. Legal Persuasion: A Rhetorical Approach to the Science. Law, Language and Communication. Milton Park, Abingdon, Oxon ; New York, NY: Routledge.
Berinato, Scott. 2018. “Data Science & the Art of Persuasion.” Harvard Business Review, December, 1–13.
Bertin, Jacques. 1983. Semiology of Graphics. University of Wisconsin Press.
———. 2010. Semiology of Graphics: Diagrams Networks Maps. Redlands: ESRI Press.
Bertrand, Marianne. 2009. “CEOs.” Annual Review of Economics 1: 121–49.
Booker, Christopher. 2004. The Seven Basic Plots: Why We Tell Stories. London ; New York: Continuum.
Booth, Wayne C, Gregory G Colomb, Joseph M Williams, Joseph Bizup, and William T Fitzgerald. 2016a. “13. Organizing Your Argument.” In The Craft of Research, Fourth. University of Chicago Press.
———. 2016b. The Craft of Research. Fourth. University of Chicago Press.
Boswell, Dustin, and Trevor Foucher. 2011. The Art of Readable Code. O’Reilly.
Boy, Jeremy, Francoise Detienne, and Jean-Daniel Fekete. 2015. “Storytelling in Information Visualizations: Does It Engage Users to Explore Data?” In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15, 1449–58. Seoul, Republic of Korea: ACM Press.
Brady, Chris, Mike Forde, and Simon Chadwick. 2017. “Why Your Company Needs Data Translators.” MIT Sloan Management Review, March, 1–6.
Bremer, Nadieh. 2016. “The Top 2000 Loves the 70s & 80s.” Personal. Visual Cinnamon.
———. 2017. “The Baby Spike.” Portfolio. Visual Cinnamon.
Bringhurst, Robert. 2004. The Elements of Typographic Style. Third. Hartley & Marks.
Butterick, Matthew. 2018. “Butterick’s Practical Typography.”
Byrne, Oliver. 2017. The First Six Books of the Elements of Euclid: In Which Coloured Diagrams and Symbols Are Used Instead of Letters for the Greater Ease of Learners. Bibliotheca Universalis. Köln: Taschen.
Caldeira, Joao, Alex Fout, Aniket Kesari, Raesetje Sefala, Joseph Walsh, Katy Dupre, Muhammad Rizal Khaefi, et al. 2018. “Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia.” In 32nd Conference on Neural Information Processing Systems (NeurIPS), 1–5.
Carr, David J. 2016. “A Map of Modern Brand Building.” Medium | David J Carr.
———. 2018. “Data Is the New Oil: Dirty, Misunderstood, Polluting the World & Pulled from All the Wrong Places.” Medium | Redwhale.
———. 2019. “What Value Do You Create? Marketing’s 3 Types of Value.” Medium | Marketing.
Carroll, Lewis. 2013. Alice’s Adventures in Wonderland and Other Stories. Canterbury Classics.
Chambers, J. M. 1983. Graphical Methods for Data Analysis. 2018 Republication.
Cicero, Marcus Tullius, and J. S. Watson. 1986. Cicero on Oratory and Orators. Landmarks in Rhetoric and Public Address. Carbondale: Southern Illinois University Press.
Cleveland, William S. 1985. The Elements of Graphing Data. Wadsworth.
———. 1993. Visualizing Data. Hobart Press.
Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.
Cohn, Neil. 2016. The Visual Narrative Reader. Edited by Neil Cohn. Bloomsbury Academic.
Columbia University. 2020. “MBA Core Curriculum.” Columbia Business School.
Copeland, David E., Kris Gunawan, and Nicole J. Bies-Hernandez. 2011. “Source Credibility and Syllogistic Reasoning.” Memory & Cognition 39 (1): 117–27.
Correll, Michael, and Michael Gleicher. 2014. “Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error.” IEEE Transactions on Visualization and Computer Graphics 20 (12): 2142–51.
Correll, Michael, Dominik Moritz, and Jeffrey Heer. 2018. “Value-Suppressing Uncertainty Palettes.” In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI ’18, 1–11. Montreal QC, Canada: ACM Press.
Damasio, Antonio R. 1994. Descartes’ Error: Emotion, Reason, and the Human Brain. New York: Putnam.
de Bartolo, Carolina, Stephen Coles, and Erik Spiekermann. 2019. Explorations in Typography. Second. 101Editions.
Didion, Joan. 1976. “Why I Write.” The New York Times, December.
Doumont, Jean-Luc. 2009. Trees, Maps, and Theorems. Effective Communication for Rational Minds. Principiæ.
Dragicevic, Pierre. 2016. “Fair Statistical Communication in HCI.” In Modern Statistical Methods for HCI, edited by Judy Robertson and Maurits Kaptein, 291–330. Springer International Publishing.
Duarte, Nancy. 2010. Resonate: Present Visual Stories That Transform Audiences. Wiley.
Entman, Robert M. 1993. “Framing: Toward Clarification of a Fractured Paradigm.” Journal of Communication 43 (4): 51–58.
Evans, J. St. B. T., Julie L. Barston, and Paul Pollard. 1983. “On the Conflict Between Logic and Belief in Syllogistic Reasoning.” Memory & Cognition 11 (3): 295–306.
Fan, Wenfei. 2015. “Data Quality: From Theory to Practice.” SIGMOD Record 44 (3): 7–18.
Farnsworth, Ward. 2011. Farnsworth’s Classical English Rhetoric. David R. Godine Publisher.
———. 2016. Farnsworth’s Classical English Metaphor. David R. Godine Publisher.
Fernandes, Michael, Logan Walls, Sean Munson, Jessica Hullman, and Matthew Kay. 2018. “Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making.” In The 2018 CHI Conference, 1–12. New York, New York, USA: ACM Press.
Fischhoff, Baruch. 2012. “Communicating Uncertainty: Fulfilling the Duty to Inform.” Issues in Science and Technology 28 (4): 63–70.
Fischhoff, B, and A L Davis. 2014. “Communicating Scientific Uncertainty.” Proceedings of the National Academy of Sciences 111 (Supplement): 13664–71.
Forster, E. M. 1927. Aspects of the Novel. United Kingdom: Edward Arnold.
Foundation, National Science. 1998. A Guide for Proposal Writing / National Science Foundation, Directorate for Education and Human Resources, Division of Undergraduate Education. National Science Foundation.
Friedland, Andrew J., Carol L. Folt, and Jennifer L. Mercer. 2018. Writing Successful Science Proposals. Third edition. New Haven: Yale University Press.
Friedman, Matthew. 2017. “Citi Bike Racks Continue to Go Empty Just When Upper West Siders Need Them.” News. West Side Rag.
Frost, Robert. 1921. Mountain Interval. New York: Holt.
Gandrud, Christopher. 2020. Reproducible Research with R and RStudio. Third edition. The R Series. Boca Raton, FL: CRC Press.
Garlandini, Simone, and Sara Irina Fabrikant. 2009. “Evaluating the Effectiveness and Efficiency of Visual Variables for Geographic Information Visualization.” In Spatial Information Theory: 9th International Conference, COSIT 2009, Aber Wrac’h, France, September 21-25, 2009: Proceedings, edited by Kathleen Hornsby. Lecture Notes in Computer Science 5756. Berlin ; New York: Springer.
Gaut, Berys. 2014. “Educating for Creativity.” In The Philosophy of Creativity: New Essays, edited by Elliot Samuel Paul, 265–87. New York: Oxford University Press.
Gelman, Andrew. 2018. “Ethics in Statistical Practice and Communication: Five Recommendations.” Significance 15 (5): 40–43.
Gelman, Andrew, and Thomas Basbøll. 2014. “When Do Stories Work? Evidence and Illustration in the Social Sciences.” Sociological Methods & Research 43 (4): 547–70.
Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian Data Analysis. Third. CRC Press.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. S.l.: Cambridge University Press.
Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian Workflow.” arXiv:2011.01808 [Stat], November.
Gilovich, Thomas, Dale Griffin, and Daniel Kahneman. 2009. Heuristics and Biases. Edited by Thomas Gilovich, Dale Griffin, and Daniel Kahneman. The Psychology of Intuitive Judgment. Cambridge: Cambridge University Press.
Goodman, Richard. 2008. The Soul of Creative Writing. Routledge.
Guberman, Ross. 2014. Point Made: How to Write Like the Nation’s Top Advocates. Second edition. Oxford ; New York, NY: Oxford University Press.
Haidt, Jonathan. 2001. “The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Judgment.” Psychological Review 108 (4): 814–34.
Hall, Trish. 2019. Writing to Persuade: How to Bring People over to Your Side. First edition. New York: Liveright Publishing Corporation, a division of W.W. Norton & Company.
Halliday, M. A. K., and Christian M. I. M. Matthiessen. 2004. An Introduction to Functional Grammar. 3rd ed. London : New York: Arnold ; Distributed in the United States of America by Oxford University Press.
Harari, Yuval Noah. 2014. Sapiens: A Brief History of Humankind. London: Harvill Secker.
Harris, Joseph. 2017. Rewriting: How to Do Things with Texts. Second edition. Logan: Utah State University Press.
Harris, Robert L. 1999. Information Graphics: A Comprehensive Illustrated Reference. New York: Oxford University Press.
“HBR Advertising and Sales.” n.d. Harvard Business Review.
Healey, C G, and J T Enns. 2012. “Attention and Visual Memory in Visualization and Computer Graphics.” IEEE Transactions on Visualization and Computer Graphics 18 (7): 1170–88.
Healy, Kieran. 2018a. “The Plain Person’s Guide to Plain Text Social Science.”
———. 2018b. Data Visualization. Princeton University Press.
Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the Sigchi Conference on Human Factors in Computing Systems, 203–12.
Heer, Jeffrey, and Ben Shneiderman. 2012. “Interactive Dynamics for Visual Analysis: A Taxonomy of Tools That Support the Fluent and Flexible Use of Visualizations.” Queue 10 (2): 30–55.
Hohman, Fred, Matthew Conlen, Jeffrey Heer, and Duen Chau. 2020. “Communicating with Interactive Articles.” Distill 5 (9): 10.23915/distill.00028.
Holmes, Oliver Wendell. 1894. The Professor at the Breakfast-Table. Houghton, Mifflin & Co.
Holsanova, Jana, Henrik Rahm, and Kenneth Holmqvist. 2006. “Entry Points and Reading Paths on Newspaper Spreads: Comparing a Semiotic Analysis with Eye-Tracking Measurements.” Visual Communication 5 (1): 65–93.
Holtz, Yan, and Conor Healy. 2018. “From Data to Viz.”
Hullman, Jessica. 2019. “Confronting Unknowns: How to Interpret Uncertainty in Common Forms of Visualization.” Scientific American 321 (3): 80–83.
———. 2020. “Why Authors Don’t Visualize Uncertainty.” IEEE Transactions on Visualization and Computer Graphics 26 (1): 130–39.
Hullman, Jessica, and Nick Diakopoulos. 2011. “Visualization Rhetoric: Framing Effects in Narrative Visualization.” IEEE Transactions on Visualization and Computer Graphics 17 (12): 2231–40.
Hullman, Jessica, Steven Drucker, Nathalie Henry Riche, Bongshin Lee, Danyel Fisher, and Eytan Adar. 2013. “A Deeper Understanding of Sequence in Narrative Visualization.” IEEE Transactions on Visualization and Computer Graphics 19 (12): 2406–15.
Kahneman, Daniel. 2013. Thinking, Fast and Slow. Farrar, Straus and Giroux.
Kahneman, Daniel, Dan Lovallo, and Olivier Sibony. 2011. “Before You Make That Big Decision ...” Harvard Business Review 89 (6): 50–60.
Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2018. “Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data.” IEEE Transactions on Visualization and Computer Graphics 25 (1): 892–902.
Katz, Yarden. 2013. “Against Storytelling of Scientific Results.” Nature Publishing Group 10 (11): 1045–45.
Kay, Matthew. 2015. “Figures.”
Kay, Matthew, Tara Kola, Jessica R Hullman, and Sean A Munson. 2016. “When (Ish) Is My Bus? User-Centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems.” In The 2016 CHI Conference, 5092–5103. New York, New York, USA: ACM Press.
Kelleher, John D, and Brendan Tierney. 2018. Data Science. MIT Press.
Kitzes, Justin, Daniel Turek, and Fatma Deniz. 2018. The Practice of Reproducible Research. Case Studies and Lessons from the Data-Intensive Sciences. University of California Press.
Knaflic, Cole Nussbaumer. 2015. Storytelling with Data. A Data Visualization Guide for Business Professionals. Wiley.
Koponen, Juuso, and Jonatan Hildén. 2019. Data Visualization Handbook. First. Finland: Aalto Art Books.
Kosslyn, Stephen Michael, William L. Thompson, and Giorgio Ganis. 2006. The Case for Mental Imagery. Oxford Psychology Series 39. New York: Oxford University Press.
Kowarik, Alexander, Bernhard Meindl, and Matthias Templ. 2015. “sparkTable: Generating Graphical Tables for Websites and Documents with R.” The R Journal 7 (1): 24–37.
Kövecses, Zoltán. 2010. Metaphor: A Practical Introduction. Second. Oxford University Press.
Krause, Rebecca J., and Derek D. Rucker. 2020. “Strategic Storytelling: When Narratives Help Versus Hurt the Persuasive Power of Facts.” Personality and Social Psychology Bulletin 46 (2): 216–27.
Krzywinski, Martin, and Alberto Cairo. 2013a. “Storytelling.” Nature Publishing Group 10 (8): 687–87.
———. 2013b. “Reply to: Against Storytelling of Scientific Results.” Nature Publishing Group 10 (11): 1046–46.
Lakoff, George, and Mark Johnson. 1980. Metaphors We Live by. Chicago: University of Chicago Press.
Lambrechts, Maarten. 2020. “Xenographics: Weird but (Sometimes) Useful Charts.” Xenographics.
Larson, Kevin, and Rosalind Picard. 2005. “The Aesthetics of Reading.” MIT Affective Computing Lab, January, 1–12.
Little, Roderick J. A., and Donald B. Rubin. 2019. Statistical Analysis with Missing Data. Third edition. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley.
Loukissas, Yanni A. 2019. All Data Are Local: Thinking Critically in a Data-Driven Society. Cambridge, Massachusetts: The MIT Press.
Lupi, Giorgia. 2016. “Data Humanism: The Revolution Will Be Visualized.” Print 70 (3): 76–85.
MacEachren, Alan M. 2019. “(Re)Considering Bertin in the Age of Big Data and Visual Analytics.” Cartography and Geographic Information Science 46 (2): 101–18.
Manjoo, Farhad. 2019. “I Visited 47 Sites. Hundreds of Trackers Followed Me.” New York Times, August.
Maynard-Atem, Louise, and Ben Ludford. 2020. “The Rise of the Data Translator.” Impact 2020 (1): 12–14.
McCloud, Scott. 1993. Understanding Comics: The Invisible Art. Kitchen Sink Press.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.
McShane, Blakeley B., David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett. 2019. “Abandon Statistical Significance.” The American Statistician 73 (sup1): 235–45.
Meirelles, Isabel. 2013. Design for Information. An Introduction to the Histories, Theories, and Best Practices Behind Effective Information Visualizations. Rockport.
Miller, Jane E., ed. 2015. “Creating Effective Tables.” In The Chicago Guide to Writing about Numbers, Second edition. Chicago ; London: The University of Chicago Press.
Miller, Joshua B., and Andrew Gelman. 2020. “Laplace’s Theories of Cognitive Illusions, Heuristics and Biases.” Statistical Science 35 (2): 159–70.
Moreau, Luc, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, et al. 2008. “The Provenance of Electronic Data.” Communications of the ACM 51 (4): 52–58.
Munzner, Tamara. 2014. Visualization Analysis and Design. CRC Press.
Müller-Brockmann, Josef. 1996. Grid Systems in Graphic Design. A Visual Communication Manual for Graphic Designers, Typographers, and Three Dimensional Designers. Arthur Niggli Ltd.
Oruc, A Yavuz. 2011. Handbook of Scientific Proposal Writing. CRC Press.
Orwell, George. 2017. 1984. New York: Houghton Mifflin Harcourt.
Oster, Sandra, and Paul Cordo. 2015. Successful Grant Proposals in Science, Technology and Medicine: A Guide to Writing the Narrative. Cambridge ; New York: Cambridge University Press.
Parmigiani, G. 2001. “Decision Theory: Bayesian.” In International Encyclopedia of the Social Behavioral Sciences, 3327–34.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. Second edition. Cambridge University Press.
Perloff, Richard M. 2017. The Dynamics of Persuasion: Communication and Attitudes in the 21st Century. Sixth edition. New York: Routledge, Taylor & Francis Group.
“Print Advertising Opportunities.” 2020. Business. MIT Sloan Management Review.
Pu, Xiaoying, and Matthew Kay. 2018. “The Garden of Forking Paths in Visualization: A Design Space for Reliable Exploratory Visual Analytics.” In BELIV Workshop 2018, 1–9.
Rahlf, Thomas. 2019. Data Visualization with R: 111 Examples. S.l.: Springer Nature.
Rendgen, Sandra. 2019. History of Information Graphics. Köln: Taschen.
Riche, Nathalie Henry, Christophe Hurter, Nicholas Diakopoulos, and Sheelagh Carpendale. 2018. Data-Driven Storytelling. CRC Press.
Ricoeur, Paul. 1984. Time and Narrative. Vol. 1: ... Translated by Kathleen McLaughlin. Repr. Chicago, Ill.: Univ. of Chicago Press.
———. 1985. Time and Narrative. Vol. 2: ... Translated by Kathleen McLaughlin and David Pellauer. Repr. Chicago, Ill.: Univ. of Chicago Press.
———. 1988. Time and Narrative. Vol. 3: ... Translated by Kathleen Blamey and David Pellauer. Repr. Chicago: Univ. of Chicago Pr.
———. 1993. The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning in Language. Translated by Robert Czerny and Kathleen McLaughlin. Toronto; Buffalo; London: University of Toronto Press.
Robinson, David. 2017. “Examining the Arc of 100,000 Stories: A Tidy Analysis.” Variance Explained.
Rodden, John. 2008. “How Do Stories Convince Us? Notes Towards a Rhetoric of Narrative.” College Literature 35 (1): 148–73.
Rosenbaum, Paul. 2017. Observation and Experiment: An Introduction to Causal Inference. Harvard University Press.
Roston, Eric, and Blacki Migliozzi. 2015. “What’s Really Warming the World?” Bloomberg, June.
Rutter, Richard. 2017. Web Typography. A Handbook for Designing Beautiful and Effective Responsive Typography. Ampersand Type.
Saldarriaga, Juan Francisco. 2013. “CitiBike Rebalancing Study.” Spatial Information Design Lab, Columbia University.
Vickars, Sam, and Michael Hester. 2020. “The No Doubter Report.” Services. The DataFace.
Samara, Timothy. 2014. Design Elements: A Graphic Style Manual. Understanding the Rules and Knowing When to Break Them. Rockport.
Sarikaya, Alper, Michael Correll, Lyn Bartram, Melanie Tory, and Danyel Fisher. 2019. “What Do We Talk About When We Talk About Dashboards?” IEEE Transactions on Visualization and Computer Graphics 25 (1): 682–92.
Scalia, Antonin, and Bryan A Garner. 2008. Making Your Case. Limited. The Art of Persuading Judges. Thomson West.
Scarr, Simon, and Marco Hernandez. 2019. “Drowning in Plastic.” Reuters Graphics, September.
Schimel, Joshua. 2012. Writing Science: How to Write Papers That Get Cited and Proposals That Get Funded. Oxford ; New York: Oxford University Press.
Schleuss, Jon, and Rong-Cong Lin II. 2013. “California Crime 2013.” Los Angeles Times.
Schwabish, Jonathan. 2016. Better Presentations: A Guide for Scholars, Researchers, and Wonks. Columbia University Press.
Segel, E, and J Heer. 2010. “Narrative Visualization: Telling Stories with Data.” IEEE Transactions on Visualization and Computer Graphics 16 (6): 1139–48.
Sharot, Tali. 2017. The Influential Mind. What the Brain Reveals about Our Power to Change Others. Henry Holt and Company.
Shneiderman, B. 1996. “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations.” In Proceedings 1996 IEEE Symposium on Visual Languages, 336–43. Boulder, CO, USA: IEEE Comput. Soc. Press.
Snyder, Blake. 2013. Save the Cat!: The Last Book on Screenwriting You’ll Ever Need. S.l.: Michael Wiese.
Song, Hayeong, and Danielle Albers Szafir. 2018. “Where’s My Data? Evaluating Visualizations with Missing Data.” IEEE Transactions on Visualization and Computer Graphics 25 (1): 914–24.
Sousanis, Nick. 2015. Unflattening. Cambridge, Massachusetts: Harvard University Press.
Spencer, Scott. 2019a. “Ride Against the Flow.”
———. 2019b. “Proposal for Exploring Game Decisions Informed by Expectations of Joint Probability Distributions.” Proposal.
———. 2020. HSLuv: Converts HSLuv to RGB and Hex. Manual.
Stan Development Team. 2019. Stan User’s Guide. 2.20 ed.
Storr, Will. 2020. Science of Storytelling. New York, NY: Abrams Books.
“Story, n.” 2015. Oxford English Dictionary.
Strunk, William, and E B White. 2000. The Elements of Style. Fourth. Allyn & Bacon.
Thomas, David, and Andrew Hunt. 2020. The Pragmatic Programmer. 20th Anniversary. Your Journey to Mastery. Addison-Wesley.
Tufte, Edward. 2016. “The Future of Data Science.” Seattle, Washington.
Tufte, Edward R. 1990. “Layers and Separation.” In Envisioning Information. Graphics Press.
———. 2001a. “Data-Ink Maximization and Graphical Design.” In The Visual Display of Quantitative Information, 1–15. Graphics Press.
———. 2001b. The Visual Display of Quantitative Information. Second. Graphics Press.
———. 2006a. Beautiful Evidence. Graphics Press.
———. 2006b. “Corruption in Evidence Presentations: Effects Without Causes, Cherry Picking, Overreaching, Chartjunk, and the Rage to Conclude.” In Beautiful Evidence. Graphics Press.
———. 2006c. “The Cognitive Style of PowerPoint: Pitching Out Corrupts Within.” In Beautiful Evidence. Graphics Press.
Tufte, Edward R. 2020. “Smarter Presentations and Shorter Meetings.” In Seeing with Fresh Eyes: Meaning, Space, Data, Truth, 151–61. Cheshire, Conn.: Graphics Press.
Tukey, John W. 1977. Exploratory Data Analysis. Behavioral Science: Quantitative Methods. Addison-Wesley.
Unwin, Antony. 2016. Graphical Data Analysis with R. CRC Press.
Unwin, Antony, and Pedro Valero-Mora. 2018. “Ensemble Graphics.” Journal of Computational and Graphical Statistics 27 (1): 157–65.
Vaidyanathan, Ramnath, Kent Russell, and Gareth Watts. 2016. Sparkline: ’jQuery’ Sparkline ’Htmlwidget’. Manual.
Vickars, Sam. 2019. “The Irregular Outfields of Baseball.” Business. The Data Face.
von Neumann, John, and Oskar Morgenstern. 2004. Theory of Games and Economic Behavior. 60th Anniversary. Princeton University Press.
Wacharamanotham, Chat, Shion Guha, Matthew Kay, Pierre Dragicevic, and Steve Haroz. 2018. “Special Interest Group on Transparent Statistics Guidelines.” The 2018 CHI Conference, April, 1–441.
Wainer, Howard. 2016a. “Inside Out Plots.” In Truth or Truthiness. Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist. Cambridge: Cambridge University Press.
———. 2016b. Truth or Truthiness. Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist. Cambridge: Cambridge University Press.
Ware, Colin. 2020. Information Visualization: Perception for Design. Fourth. Philadelphia: Elsevier, Inc.
Wexler, Steve, Jeffrey Shaffer, and Andy Cotgreave. 2017. The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios. Hoboken, New Jersey: Wiley.
Wilkinson, Leland. 2005. The Grammar of Graphics. Second. Springer.
Williams, Joseph M, Joseph Bizup, and William T Fitzgerald. 2016. “17. Revising Style.” In The Craft of Research, 248–67. University of Chicago Press.
Williams, Joseph, and Gregory Colomb. 1990. Style: Toward Clarity and Grace. University of Chicago Press.
Willman, Daren. 2020. “Standard Statistics.” MLB Advanced Media.
———. n.d. “Statcast Search CSV Documentation.” MLB Advanced Media.
Wong, Dona M. 2013. The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures. New York: Norton.
Wongsuphasawat, Kanit, Yang Liu, and Jeffrey Heer. 2019. “Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study.”
Yorke, John. 2015. Into the Woods: A Five-Act Journey into Story. The Overlook Press.
Zeileis, A, K Hornik, and P Murrell. 2009. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics & Data Analysis 53 (9): 3259–70.
Zetlin, Minda. 2017. “What Is a Chief Analytics Officer? The Exec Who Turns Data into Decisions.” CIO, November.
Zinsser, William. 2001. On Writing Well. Sixth. The Classic Guide to Writing Nonfiction. Harper Resource.
zonination. 2015. “Perceptions of Probability and Numbers,” August.

  1. For a wonderful reference for using previous writings and works to drive research, consult J. Harris (2017), as we will do when covering this use later.↩︎

  2. Abstracting out how to create a graphic lets us focus on what to create.↩︎

  3. Note the plural. While we have identified a single person in the memo examples, discussed later, those memos may be passed to others on his team, so they may have secondary audiences.↩︎

  4. Examples of those with similar needs may categorically include, say, the intended audience in (Caldeira et al. 2018).↩︎

  5. We will discuss this structure — old before new — in detail later.↩︎

  6. An expectation is specifically defined in probability theory. To optimize is also a specific mathematical concept.↩︎

  7. William Zinsser: A long-time teacher of writing at Columbia and Yale, the late professor and journalist is well-known for putting pen to paper, or finger to key, as the case may be.↩︎

  8. This form of narrative has dominated since Aristotle’s Poetics, but narrative is broader. See (Altman 2008).↩︎

  9. For example, (Snyder 2013) or (Booker 2004).↩︎

  10. Later, we will formalize discussion of questions, among other things, as tools of persuasion.↩︎

  11. For a detailed understanding of narrative, consult seminal and recent work, including (Altman 2008); (Bal 2017); (Ricoeur 1984); (Ricoeur 1985); and (Ricoeur 1988).↩︎

  12. Orwell (2017), for example, argues against political tyranny.↩︎

  13. Indeed, we are being warned to abandon significance tests (McShane et al. 2019).↩︎

  14. Seminal works on metaphor include (Farnsworth 2016); (Kövecses 2010); (Ricoeur 1993); (Lakoff and Johnson 1980).↩︎

  15. Their very short, classic book on writing would not have reached a 50th anniversary edition were it not still valuable. Leading by example, this tiny book provides dos and don’ts with examples of each. Re-read.↩︎

  16. In doing so, he introduces another idea, inside out plots, which reverse a table’s numbers and labels in certain contexts.↩︎

  17. We will cover other Gestalt and design principles later. Many great references discuss Gestalt principles, e.g., (Ware 2020), usually in the context of design generally or of data visualizations, but these principles apply universally, including to tables!↩︎

  18. For an example implementation, in R, and more details, see Vaidyanathan, Russell, and Watts (2016) and Kowarik, Meindl, and Templ (2015).↩︎

  19. Other implementations of graphics will typically name the components of a graphic similarly.↩︎

  20. Note that <...> is not part of the actual code. It represents, for purposes of discussion, a placeholder that the coder would replace with appropriate information.↩︎

  21. Jacques Bertin was a French cartographer and theorist, trained at the Sorbonne, and a world-renowned authority on information visualization. He later assumed various leadership positions in research and academic institutions in Paris. Semiology of Graphics, originally published in French in 1967, is internationally recognized as a foundational work in the fields of design and cartography.↩︎

  22. In-depth reviews are in Ware (2020), Bertin (1983), Meirelles (2013), and Healey and Enns (2012).↩︎

  23. Same as Hue-Saturation-Brightness.↩︎

  24. Indeed, most criticisms of Tufte’s work miss the point by focusing on the most extreme cases of graphic representation within his process of experimentation, completely losing what we should learn: how to reason and experiment with data graphics. Focus on learning the reasoning and experimentation process.↩︎

  25. Edward R. Tufte (2001b) analyses Minard’s graphic, declaring it, perhaps, the greatest ever created.↩︎

  26. Note that in this toy example the “priors” for the “unobserved variables” \(\alpha\), \(\beta\), and \(\sigma\) are \(\textrm{Uniform}\) over an infinite range, which is what the above lm() implicitly assumes. This is poor modeling, as we always know more than this about the covariates and relationships under the microscope before updating our knowledge with data \(x\) and \(y\).↩︎




For attribution, please cite this work as

Spencer (2021, Feb. 14). Data in Wonderland. Retrieved from

BibTeX citation

  author = {Spencer, Scott},
  title = {Data in Wonderland},
  url = {},
  year = {2021}