Data in Wonderland
Explores communication with data in various forms through seminal and cutting-edge ideas in writing, data analyses, and visualization.
Preface
Narrative and story can enhance others’ understanding of data, offering meaning and insights as communicated in numbers and words, or in graphical encodings. Carefully combined, narratives, data analyses, and visuals can help enable change.
Audience and assumed background
My primary audience is my students at Columbia University. The readers who will get the most from this text, and whom I have in mind as my more general audience, are curious, active learners:
An active learner asks questions, considers alternatives, questions assumptions, and even questions the trustworthiness of the author or speaker. An active learner tries to generalize specific examples, and devise specific examples for generalities.
An active learner doesn’t passively sponge up information — that doesn’t work! — but uses the readings and lecturer’s argument as a springboard for critical thought and deep understanding.
This text isn’t meant to be an end, but a beginning, giving you hand-selected, seminal and cutting-edge references for the concepts presented. Go down these rabbit holes, following citations and studying the cited material. Becoming an expert in storytelling with data also requires practicing. Indeed,
Learners need to practice, to imitate well, to be highly motivated, and to have the ability to see likenesses between dissimilar things in domains ranging from creative writing to mathematics (Gaut 2014).
You may find some concepts difficult or vague on a first read. For that, I’ll offer encouragement from Abelson (1995):
I have tried to make the presentation accessible and clear, but some readers may find a few sections cryptic …. Use your judgment on what to skim. If you don’t follow the occasional formulas, read the words. If you don’t understand the words, follow the music and come back to the words later.
Let’s hop in!
Structure and content
Empirical studies suggest that communication is generally more effective when its author controls all aspects of the communication, from content to typography and form. But in some cases, we may enhance the communication by allowing our audience to choose among potential contexts of the information. Here, we aim to explore many ideas within this framework, using as content a data analytics project.
We will start by exploring our content (a proposed and implemented data analytics project) and our intended audiences (various executives, general audiences, and mixed audiences) through narrative and considering not just our words but the typographic forms of those words in the chosen communication medium.
Then we begin to integrate other, visual forms of data representation into the narrative. As our discussions of graphical data encodings become more complex, we give them focus in the form of information graphics and dashboards, and finally we enable our audience to act, to some degree, as co-author through interactive design of the communication.
Software information and conventions
The primary software tools used in this reference include R, the tidyverse, ggplot2, and a few other extensions to that implementation of the grammar of graphics. In the later chapters on interactive communications, I introduce the internet web standards that enable such graphics: html, css, svg, canvas, and javascript.