Ok, awesome. So last week we sort of ignored implementation of interactivity to focus on the what, why, and conceptual how’s. This week, I’m going to introduce you to the modern technology stack that is most common, most standard, and most important.

Last week, we discussed how Tableau lets us use the grammar of graphics features we’ve been practicing through a drag and drop interface, and how it automatically creates a few forms of interactivity. So you can create an interactive graphic, save it as a packaged workbook, and give it to your audience to interactively explore if they have access to Tableau.

In this discussion tonight, we’re going to think more broadly about the technology stack you can use directly with the grammar of graphics you have already been using to make interactive graphics.

So we’re going to focus on understanding the actual, standard technologies that enable interactivity and how they work together. Also Tableau is much more limited in what you can do than what we’ll be reviewing, and it’s proprietary, whereas all the technologies we’ll discuss tonight are free and, again, based on world-wide standards.

Even if your group decides to use Tableau to create the interactive communication, and you never build your own graphics this way, understanding what I’m going to explain tonight is important. Understanding these technologies will also enable you to hire and manage others who do use them. Why? Because you’ll have an understanding of potential capabilities.

And if your group does want to use these technologies to make the interactive communication, my hope is that this lecture will help you get up to speed quickly. :)

Ok, I’ve sketched out a very high level overview of quite a few technologies. When I show you, I don’t want you to feel overwhelmed by it, though, because we’re going to talk through the pieces together on a level that I think will be helpful to you, whether you are just managing other people who are using interactive technologies or you are making your own.

With that caveat, let’s take a look.

First, on the right, I’ve highlighted the tools you’ve been practicing with so far, including the two we briefly discussed last week.

Ok, again, try not to feel overwhelmed by all these categories of technology. We’ll go through this together. By the way, you can click on any of these words to go to a website that is helpful to learn more when you are reviewing after class.

Ok, I’m showing you basically two columns of information. The top row of each should be familiar to all of you. We’ve all been using a web browser for many years. And some of you were probably born after it already existed, but if you venture into the stacks in Butler library, you can learn about our world before the Internet existed. ;)

And at least since you’ve started the applied analytics program at Columbia, you’ve been using R and RStudio, a terrific IDE. So the left column represents technologies that your web browser uses to show you things when you’re late night surfing the net. Now these various technologies are used by people whose job is called “front-end developer” to create interactive websites.

For the most part, you in applied analytics do not generally have to do that. You can stay, for the most part, in the right hand column, and tell your analysis tools to make the stuff on the left for you.

But what happens should not feel like magic to you, and if you know a few basic ideas about the stuff on the left, you can free yourself from the limitations of whatever template you are presented with.

And the lines I’ve drawn to conceptually connect these words are there to help you see that these technologies relate to each other in some way or another.

Does the motivation to learn these on a basic level make sense to you?

Cool. Ok, let’s start by reviewing how web browsers work.

These are the same actions we talked about last week. So the browser watches for these events, and reports the events to things in the webpage you are reading. And we’ll talk more about how the actions go from you or your audience to a page element and bubble back up to you or your audience in a few minutes.

By the way, I’m about to show you slides that are more dense than I usually show you, and I apologize for that. It’s because some of this material may be new to you and I wanted to provide additional written explanations you can review after lecture, along with what I write in your reader, Data in Wonderland. So when I show you them, I’ll explain what to focus on. Sound good?

Ok, so we haven’t really introduced what I mean by a page element. So what’s a web page; what does it consist of?

Don’t worry about reading the mini paragraphs on top. You’ll see these little paragraphs on most of the slides tonight. That’s the take home stuff to help with your review later. So let’s just focus on the overall connections on the right hand side, and the examples and screenshots I’ve made below these paragraphs. Make sense what we’re focusing on? Cool.

The web pages that load into your browser are typically html files. These are just plain text files that contain lots of different kinds of code, which we will discuss in a moment. The browser organizes all that plain text into something called a Document Object Model. That’s really just a fancy term for what I’m minimally showing you in this small example. A Document Object Model, or DOM, consists of nested elements, each written as a pair of tags, and a tag is written as a pair of angle brackets with the name or type of tag inside them.

So all pages have the first and last tags I’m showing you, the two tags with html inside them. All pages have two tags with the word head inside. And all pages have two tags with the word body inside. The html tags just tell the browser, hey, I’m a web page, so you know what’s inside me.

The body tags are where the content of the page generally goes, what the browser displays to you.

And the head tells the browser how to display the content, among other things, but doesn’t actually display what’s inside the head tags.
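
For your review after class, here is a minimal sketch of that skeleton. I’ve put the page text in a JavaScript string so we can inspect it. The title and paragraph text are placeholders I made up for illustration, not what’s on the slide.

```javascript
// A minimal web page, stored as plain text in a JavaScript string.
// The title and paragraph text are made-up placeholders.
const page = `<!DOCTYPE html>
<html>
  <head>
    <title>Minimal page</title>
  </head>
  <body>
    <p>Hello, class.</p>
  </body>
</html>`;

// Pull out each opening and closing tag name, in the order it appears.
const tags = page.match(/<\/?[a-z]+/g).map((t) => t.slice(1));

console.log(tags.join(" "));
// html head title /title /head body p /p /body /html
```

Notice in the printed list that every tag that opens also closes, and that the html pair wraps everything else.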

All these tags are part of the html specification. Let’s consider them more closely.

Ok, in this example, we’re just looking at one html element. The p inside the first angle bracket means paragraph. Notice that the pairs are what we call an opening tag and a corresponding closing tag. Also notice that in the closing tag, the name of the tag, here p, is preceded by a forward slash. That just tells the browser that’s the end of that element. Finally, we can assign styles and other information to that tag type, here, again, the p or paragraph.

In this case, I’m assigning this p element a css class that we’ll look at in a moment, and I called my class cycling_team, which tells the browser to format this element according to the class.

Finally, the content goes between the opening and closing tags. Here, the content is the two words, Education First.
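
Here is a small sketch that assembles that same element from its pieces, so you can see where each part sits. The variable names are mine, chosen just for illustration.

```javascript
// The three pieces of the element described above.
const tagName = "p";               // p means paragraph
const className = "cycling_team";  // the css class assigned to this element
const content = "Education First"; // what the browser displays

// Opening tag (with the class attribute), then the content,
// then the closing tag with its forward slash.
const openingTag = `<${tagName} class="${className}">`;
const closingTag = `</${tagName}>`;
const element = openingTag + content + closingTag;

console.log(element);
// <p class="cycling_team">Education First</p>
```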

Make sense so far? Questions?

Cool. Now let’s look at how we write these css classes.

In the top code illustration, I’m showing you the class I wrote to format what we just looked at. The class name starts with a period. So here, period cycling team. Then we put everything we write for that class inside curly brackets.

Can anyone guess how I’m telling the browser to format the words Education First?

Right, it’s going to color the words pink. There are lots of properties, like color, font-weight, margins, all kinds of stuff, that we can set values for.

Now for the browser to know that this code is format or style instructions, we put these classes between two style tags. Make sense?
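
As a sketch of how those pieces fit together, here is the class placed between two style tags in the head, with the paragraph that uses it in the body. I’m assuming the property on the slide is color with the value pink. The rest of the page is a made-up minimal wrapper.

```javascript
// The class name starts with a period; properties and values go inside curly brackets.
// Assumption: the slide sets the color property to pink.
const cssClass = `.cycling_team {
  color: pink;
}`;

// Style instructions go between two style tags, which sit inside the head.
const page = `<!DOCTYPE html>
<html>
  <head>
    <style>
${cssClass}
    </style>
  </head>
  <body>
    <p class="cycling_team">Education First</p>
  </body>
</html>`;

console.log(page);
```

If you save what this prints into a plain text file ending in .html and open it in your browser, you should see the two words in pink.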

Ok, so if you recall in many past discussions we’ve had, we’ve talked about using grids to arrange things. Things like our memo, our proposal, and so forth. And that the grid is generally invisible but it helps us to organize. And you can imagine using grids to create things like dashboards of data graphics. So there have been fairly recent advances in css that make it easy to create whatever kind of grid we want. One of those technologies is called, you might guess, css grid.

By the way, css stands for cascading style sheets. The cascading part means that the browser applies what you write in the order you write it, which means that, here, if we defined multiple classes that, say, both color text, the browser uses rules to know which one to apply. We won’t get into the details of this during class, though, because we can get away with just learning a tiny bit of all that we can do, and get fancy later, if we want.

Ok, let’s look at one minimal way we can specify a css grid.

As with other formatting things, we put our grid specs as classes between the style tags. And, so, we specify an overall grid as a class that has a number of rows and columns of whatever size we want, and a gutter or gap size between them. I’ve called this class .gridlayout but you can name it whatever makes sense to you. Inside this class, the main property we set is display, which we set to grid. That, and, again, the number of columns and rows. The columns property is called, as you see here, grid-template-columns and the rows property is grid-template-rows. The gap property is called, well, gap.

Once we decide how many and what size of grid, we can create a class that groups however many rows and columns we want into a single area to put content in. We’ll see that in another example in a moment. If we don’t specify a particular area, we can just make a class that has basic formatting, like I did here with the class I named item, and every time we use the class in another divider tag, it puts the content in the next available cell.

So in this example, we specified two columns and two rows. The two columns are equal fractions of the total grid, that’s what the fr means, and the row heights we set at 5 units.

So when we create a new divider of type item, it starts with the first available cell, which would be column 1, row 1, for the first item. Then it checks cell availability in row order. That’s why we see item 2 to the right of item 1 instead of below it. Make sense so far?
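
Here is a minimal sketch of that two by two grid for your review. A few things are my assumptions: I’ve read the 5 units of row height as em units, and the gap size and the border on the item class are made up. The cellFor function is a hypothetical helper that only mimics the default row-order placement so you can see the arithmetic; the browser does this for us.

```javascript
// Grid settings from the example: two equal-width columns and two rows.
// Assumptions: row heights are in em units; the gap and border values are made up.
const columns = ["1fr", "1fr"];
const rows = ["5em", "5em"];

const css = `.gridlayout {
  display: grid;
  grid-template-columns: ${columns.join(" ")};
  grid-template-rows: ${rows.join(" ")};
  gap: 1em;
}
.item {
  border: 1px solid gray;
}`;

// Hypothetical helper that mimics default placement:
// fill cells in row order, left to right, then move down a row.
function cellFor(index, columnCount) {
  return {
    row: Math.floor(index / columnCount) + 1,
    column: (index % columnCount) + 1,
  };
}

const items = ["item 1", "item 2", "item 3", "item 4"];
const divs = items
  .map((label) => `  <div class="item">${label}</div>`)
  .join("\n");
const html = `<style>\n${css}\n</style>\n<div class="gridlayout">\n${divs}\n</div>`;

console.log(html);
items.forEach((label, i) => {
  const cell = cellFor(i, columns.length);
  console.log(`${label}: row ${cell.row}, column ${cell.column}`);
});
// item 1: row 1, column 1
// item 2: row 1, column 2
// item 3: row 2, column 1
// item 4: row 2, column 2
```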

Cool. Ok, we’ll come back to grids in a few minutes. First, I’d like to talk about the last three main things that make up the document object model.

We can write vector shapes using something called svg, which means scalable vector graphics. I’ve annotated a simple example here. Think of the svg as all the shape things that make the graphic, not just one shape. So we put our shapes between two svg tags, like I’m showing here. And we include the width and height of the graphic.

Now one thing just to be aware of if you ever code your own shapes is that normally we think about the origin of a data graphic as starting in the lower, left corner and increasing to the right and going up. For svgs that’s not quite the same. Values do increase as we move to the right, but unlike how we normally see data graphics, the origin of an svg is the upper, left and values increase going down. That’s just because browsers are more general than a data graphic, and we normally read starting in the top, left. So far so good?
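
If you ever do code your own shapes, the flip is one subtraction. Here is a tiny sketch. The function name and the 100 unit height are made up for illustration.

```javascript
// Convert a data-graphic y value (origin lower left, increasing upward)
// to an svg y value (origin upper left, increasing downward).
// The function name and the 100-unit height are made up for illustration.
function toSvgY(dataY, svgHeight) {
  return svgHeight - dataY;
}

const svgHeight = 100;
console.log(toSvgY(0, svgHeight));   // 100: the bottom of the graphic
console.log(toSvgY(100, svgHeight)); // 0: the top of the graphic
console.log(toSvgY(25, svgHeight));  // 75
```

The x values need no conversion, because they increase to the right in both cases.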

Within our new svg tag, we can make shapes.

How many of you have drawn things as a kid by connecting dots? So a path tag or a path shape is a connect the dots. We put how to connect the dots into a string. And there are three main commands. Let’s keep the pencil metaphor. We can move our pencil in the air to a specific x, y location on our connect the dots sheet to start, then put the lead on the paper there. That’s the M command in this example. Once the pencil touches the paper, we move our pencil to the next x, y location. That’s the L command, which means line to. And that means a straight line.

Finally, we can make the line curve. Now this is the only one that takes some practice to do well. But the C command, or curve to command, has three x, y locations. The first two are these levers I’ve colored green and blue. The last x, y location is just like the x, y location in the L command. It’s the final place we move the pencil. Now in this example, I have 5 commands: move the pencil to a location, and draw four straight lines without lifting our pencil up. If you think about the coordinates, we made a square. :)
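
Here is a sketch of the connect the dots idea: one move command and four line commands that trace a square. The coordinates are ones I made up, so they may differ from the slide, and the fill and stroke settings are just so the outline shows.

```javascript
// Corner points of a square; coordinates are made up for illustration.
const points = [
  [10, 10],
  [90, 10],
  [90, 90],
  [10, 90],
  [10, 10], // back to the start to close the square
];

// The first point gets M (move to); every later point gets L (line to).
const d = points
  .map(([x, y], i) => `${i === 0 ? "M" : "L"} ${x} ${y}`)
  .join(" ");

// The path goes between two svg tags that state the graphic's width and height.
const svg = `<svg width="100" height="100">
  <path d="${d}" fill="none" stroke="black" />
</svg>`;

console.log(d);
// M 10 10 L 90 10 L 90 90 L 10 90 L 10 10
console.log(svg);
```

If you paste the printed svg between the body tags of an html file, the browser should draw the square.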

Now, svg has lots more functionality, but that’s all I think we need to cover for now. This is the basics of how vector shapes are made by other programs under the hood, like ggplot.

Now last class, we talked about how an svg keeps high resolution no matter how much we zoom in, but raster graphics lose resolution. We also talked about how raster graphics are faster when there are tens of thousands of shapes, which was the case with our class example citibike graphic. If we tried to make that graphic interactive, there would be too many vectors and it would not be responsive with most computers today.

So to keep advancing our citibike example, let me show you the interactive version. You can see as you zoom in that, unlike in the static pdf version, the line segments that show empty and full stations are not as sharp. That’s because I made that graphics layer a raster to speed it up.

That’s where rasters are helpful.

Javascript is a programming language that can be used, as I said, to listen for your actions, and react by changing the html or css or svg or whatever is needed to update.

Now the very basic code I’m showing you gets to the heart of how this works. In javascript, we can search the DOM for a named element or tag, and that thing becomes an object. And javascript lets us do things to the object through functions attached to the object. Here, the function is on event. The word event is a placeholder for onclick, or onmouseover, or onkeypress, and so forth. Your actions. So that function watches for the action, on that element, and when it sees the action, it calls whatever function is assigned, here I’ve generically called it function name. And that function then does things, like adds a new tag element, or removes one, or changes the stroke width, or color, or really whatever you’d want. And that, on a very simple level, is basically how we get interactivity.
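
Here is a sketch of that watch and react pattern that you can run outside a browser, assuming a recent version of Node.js. Two things differ from the slide, so treat them as my substitutions: I use a bare EventTarget object as a stand-in for a page element, and I attach the function with addEventListener rather than assigning it to an on event property. The names thickenStroke and style are made up.

```javascript
// Stand-in for a page element. In a browser we would find a real one
// with something like document.querySelector("#some-id").
const element = new EventTarget();

// State the handler will change; in a page this might be a stroke width.
const style = { strokeWidth: 1 };

// The function assigned to the action.
function thickenStroke() {
  style.strokeWidth = 3;
}

// Watch for the "click" action on that element.
element.addEventListener("click", thickenStroke);

// Simulate the audience clicking; in a browser, the click itself does this.
element.dispatchEvent(new Event("click"));

console.log(style.strokeWidth);
// 3
```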

Now as applied analysts, we probably won’t be coding most of this directly. Instead, we use analysis software that makes this for us. And that’s the stuff on the right hand column, using the R ecosystem as an example.

It’s pretty simple. The data is just a data frame, with each observation being the id of the ball field and connect the dots x and y coordinates. And I use geom_path to draw the 30 black boundary lines. If you’re looking closely, you’ll see that I also have a geom_polygon and I’ve subset the data between the two geoms. That’s because I draw the outer black lines with geom_path but also have the infield dirt location in the data, so I filter that out of the black lines, and I only use the dirt path in geom_polygon. And geom_polygon lets me use the brown fill color.

And it’s the black lines made from geom_path that are interactive in the real example, not this screenshot, right? We can take a quick look at the original again. Pretty easy and straightforward so far, right? Any questions before we make it interactive?

To make it interactive we can use almost the same code. We include a second package called ggiraph, in this case. And you’ll notice that the lines created by geom_path now use a very similarly named but different function. It’s called geom_path_interactive. And that’s a ggplot replacement function that ggiraph gives us to make it interactive. We get the same parameters we had, but we can also specify a tooltip to bind to actions and a data id to bind to our actions.

And I’ve bolded those differences here. We save the entire new ggplot into an R object, which I’ve colored pink.

Then we give that pink object to a second function called girafe, which does the action to element binding. And I’m showing you the pink object here as a parameter in this function. Notice, also that this function includes options, which are what we want to change in reaction to our actions. Here, I’m specifying changing the css or svg elements stroke width and opacity based on a hover action. So in these CSS strings, we can include whatever properties we want to change.

Does that make sense so far?

Ok, to add the second graphic of the fences, we start by just making another ggplot graphic, but again using the interactive version of the ggplot geom. Again, we save that into an object. Here I’ve named it gg_fences and colored it blue.

Now to combine the two graphics, we give both plots at the same time to the girafe function. See that in print gg_boundaries / gg_fences. The slash comes from the package patchwork, which is very useful and cool for organizing plots, and the slash means we are placing one on top of the other, like a fraction with a number on top and bottom. And that’s it. The rest is the same, and both graphics work together, cross sharing the hover action.
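
To tie this back to the browser side, here is a purely conceptual sketch of how a shared data id lets a hover in one graphic reach the other. This is not ggiraph’s actual code, and the ids, the hover function, and the stroke widths are all made up. The idea is only that every element carrying the hovered id gets the hover style, whichever graphic it sits in.

```javascript
// Made-up elements from two graphics; each carries the id of its ball field.
const elements = [
  { graphic: "boundaries", dataId: "field_a", strokeWidth: 1 },
  { graphic: "boundaries", dataId: "field_b", strokeWidth: 1 },
  { graphic: "fences", dataId: "field_a", strokeWidth: 1 },
  { graphic: "fences", dataId: "field_b", strokeWidth: 1 },
];

// Hypothetical hover reaction: thicken every element that shares the id,
// and reset the others.
function hover(dataId) {
  for (const el of elements) {
    el.strokeWidth = el.dataId === dataId ? 3 : 1;
  }
}

hover("field_a");

const highlighted = elements
  .filter((el) => el.strokeWidth === 3)
  .map((el) => `${el.graphic}:${el.dataId}`);

console.log(highlighted.join(", "));
// boundaries:field_a, fences:field_a
```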

Pretty cool, right?

And because this is just a little more code, you can probably imagine that this would be useful for just exploring data, too, to quickly make a plot and move the mouse around to explore details. And that, I think, is what Jacques Bertin meant when he explained that plots are never made once and for all, but changed over and over to learn about all aspects of the data. This type of interactivity just saves us time instead of recoding every time to learn a new detail.

Ok, along with ggiraph, we have other packages that make interactivity pretty easy. Let’s look at another.

This next one is called plotly, and we’ll use it along with some other helper packages. So plotly is an alternative to ggiraph, but it can also work with ggiraph. Let’s consider it on its own. Plotly is a chart library or package that has its own functions to draw graphics, but, importantly, and like girafe, it can also turn a ggplot object into an interactive one.

One nice thing about plotly is that it gives us an easy way to select multiple elements in the graph at once, using a lasso or other tools. And in the example I’ve made for a screenshot, I’ve made a ggplot graph on the left interactive, but I’ve also used two other packages, one called DT, which means data tables, and another called crosstalk. Crosstalk lets the plot on the left talk to the table on the right. Let’s see it in action, and then look at how the code does this.

So, as with girafe, I’ve written little explanations on how the code works. This is pretty short to make what we just looked at. We load the few libraries.

Then, we create a key that identifies observations across the graphic and table, which is added to our data frame. That’s the highlight_key() function. So here, our data frame is the mpg example in R. Then we save the keyed version of it in a new object, here I’ve called that, just m.

Then, we create our ggplot object, but notice that we use the keyed version of the data frame for our data.

Once we’ve made and saved the ggplot to an R object, we make it interactive. Plotly’s version of this uses two functions: highlight(), and, as one of its parameters, the ggplot object wrapped in another function called ggplotly(). Make sense?

Finally, we combine the graphic and table using the function bscols().

That’s not too bad, right?

And with plotly, we also get other things pretty easy, like zooming and panning, or a filter box, or a radio selection box, and so forth. Now I want you to be aware of one more important tool for creating interactive graphics today, but it’s a little more complex and beyond the scope of this discussion to demonstrate how it works under the hood.

That tool is called shiny. And it is a little different than ggiraph or plotly. It is really a tool to create web applications, not specifically data graphics. And this type of tool is all about having both a server side and a client side, even if the server is our own computer. And the reason it is made this way is that it requires an active R session to run calculations before making changes to what we see.

But that’s also one limitation of this tool. More specifically, because it needs a server, we can’t have a single html file to give to someone else. This shiny tool lets us make all kinds of widgets for a web page to interact with data graphics, and I’m showing you a screenshot of examples here. Buttons, checkboxes, text inputs, file inputs, radio buttons, drop down selection boxes, sliders, and so forth.

I want you to be aware of it, and it isn’t too difficult to make basic things with it, but it takes up too much time for explaining how it works during a class. Ok, with these tools in mind, let’s think about how we can organize multiple graphics onto something like a dashboard.

The flexdashboard template is an r markdown file that creates a grid to put things in. Now the grid it uses is called flex grid, which you will also see called css flexbox, and, more specifically, it uses another technology called bootstrap that creates defaults for the flex grid.

So we can just click a button to get started making this. Let’s try it out.

A nice feature of its using the bootstrap4 version of flex grid is that it is fully responsive, rearranging your content to best fit the size of device the user is on, like a desktop versus an iPhone. And you didn’t have to do anything to get this to work.

But next, I want to give you more freedom and not rely on someone else’s default templates like here. And flex grid is, perhaps ironically, less flexible than css grid. By that, I mean flex grid only allows us to specify either rows or columns, not both, like we saw with css grid. So it’s not as precise either.

So, now, let’s see how we can do our own thing with css grid inside a basic r markdown file. And before class, in your resources, I’ve given you an example of this. But let’s look at another one.

We’ll start with the first reading, which provides a critique of what people call dashboards. Now, here, I’m showing you a generic representation of a vehicle dashboard, kind of like those found in your favorite Volkswagen Beetle.

Who is the audience for a vehicle dashboard?

What’s their purpose?

Does it need extensive words or narrative? Why?

Right, these dashboards are specifically designed to provide information for literally “at a glance” monitoring of their car and driving. We don’t want drivers to have to study it while driving. Nothing changes in the variables, just the measured values, right? So there are uses for dashboards that do not explain themselves.

We aren’t talking about those tonight. We’re more interested in creating communications that provide new information to our audiences. For those, we must consider who our audience is, and relate the purpose of our communication to them. And those, even dashboards in business applications, whatever their form may be, should be annotated.

Let’s see what the authors of a study found when they asked business intelligence experts what they think about dashboards.

Again, I encourage you, when you have time, to look up the original communications these authors used to develop these so-called flow factors. The authors have keyed each to their factors. They should provide you with inspiration on how to approach various communication challenges. Seeing many examples and variations enables us to go beyond simple imitation to naturally implementing our own designs.

I’d like to wrap up this initial, high level discussion by considering what some business intelligence experts have said about their preferences with interactivity.

I’ve pulled this quote from the same reading. Let’s read this together.

“An issue of communication is related to storytelling ability. Dashboards are increasingly used for decision making and communication across contexts: top-down, within departments, and across the organization. Dashboards that capture only the data and not the semantics of the data, or what was done in response to the data, can be insufficient for communication purposes. In BI, people often take screenshots of dashboards and put them into slide presentations in order to annotate them with contextual information, suggesting a need for more powerful storytelling features.”

So we should craft narratives and explainers, even for dashboards, to help our audience if we are to be successful for our purpose in communicating. And as the reading discussed, the concept of a dashboard is used quite broadly, blending with communications that others may think of as information graphics or even articles.

So far we’ve discussed interactivity around the data graphics themselves. These same concepts apply more broadly to the entire communication, not just the data encodings. And this brings us to the other reading on visual narrative flow.

Those authors break down and study audience experiences with a taxonomy of over 80 interactive documents that differed in both types of interaction and domain of use. They start with the most fundamental, and obvious, to frame their work. They define visual narrative flow as, let’s read this together, “visual narrative flow is the congruence between flow-factors, i.e., 1) the way a reader navigates the story, 2) the visual components of the story, and 3) the type of visual feedback the reader receives; along with the nature of the data and facts that the author wants to communicate.”

So the authors discuss three components. They categorize them as seven types, and along with studying interactive documents in the wild, they also conducted experiments to study audience preferences in their interactions. Let’s consider their abstract examples of these flow-factors, next, and review their experiment.

The authors explain that level of control corresponds to how much control a reader has over the motion or animated transitions of story components. For these levels of control, a reader can have discrete control if they trigger motion playback, like using a scroller, or continuous control if they can play through the keyframes or time points of that motion.

It is also possible for a hybrid style to combine or support aspects of both, with, for example, a timeline plot where points can be clicked to navigate. They break down levels of control based on the categories I’m showing you here: text, visualizations, and animated transitions.

Text and visualizations can move or fade in or out within the page, and this motion is described by level of control for those elements. Note that we’ve already seen related examples, like where we fade out data encodings that are not the focus of a pointer hover, right?

An animated transition is defined here as more specific, data-relevant motion that preserves data context across or within visualizations. And, again, three weeks ago, we considered animations, right? In that context, it was as an alternative to sequence or layer information one step at a time. Remember that? Cool.

Let’s consider navigation progress.

Again, as the authors explain, navigation progress describes how the reader perceives their placement within the entire story. Not all stories may show navigation progress, relying on the implied progress of a scrollbar. That’s common, right, on most web pages we browse?

Otherwise, stories may showcase this progress in a variety of ways. A common way is to represent steps with dots, and we considered an example of dots last week, right? Remember the New York Times article on the Yield Curve?

Another method utilizes numbers or text for story steps. Notice these ideas can be pretty flexible, but either require we know our audience is already comfortable with the thing we use, or we need to clearly explain it.

Authors also use visualization to convey story progress, such as a path on a small multiple map. And we’ve looked at examples of small multiples. You should review the actual examples these authors cite in addition to the class examples I’ve already shown you.

Let’s talk about story layout.

Navigation feedback combines things like animated transitions with additional animations of story text or other components, such as fading or movement. This factor is all about showing to readers that their input affects the story.

For example, it is possible for both the text and visualizations to transition or move on the page simultaneously, or in sync, with users scrolling or moving their cursor. Remember how we talked about how scrollytelling conceptually works?

These animations can also occur one before the other, just the text or just the visualizations. Animated transitions that are not tied to data can show change using motion or fading, and these animations can occur in different parts of the story interface: the text, the progress widget, or the visualizations. And again, we’ve considered examples of this.

So those are the main concepts we should keep front of mind when designing and organizing our interactive graphics within a narrative we communicate.

Depending on our purpose, any of these can be more important or more useful for a specific purpose. And we should test different approaches, to see what we and our audiences think works best for them.

Let’s read this together. When creators were asked if they want the visualizations in reports to be completely interactive and encourage readers to interact with them (e.g. using drill down/up, filter, link & brush), the experts prefer to have interactive visualizations that permit linking and brushing (i.e. data selection). But they would limit the more advanced interactions such as drill down/up or filtering.

They felt that all the data needed to tell the story should be displayed clearly in the report without the need to explore the data further. By the way, that sounds like what we’ve discussed earlier from other authors, right?

They go on: the authors feel business stories should be mostly author-driven and work best when the goal is storytelling or efficient communication.

What do you think? Agree? Disagree?

Maybe it all goes back to our audience and purpose, right? So let’s consider the audience and purpose of our next assignment. Let’s go to the description and look together.

Now the words are hard to read from a full spread slide on the screen, but we can obviously zoom in. The visual, to explain, categorizes types of value that marketing executives provide to their organizations. David has us start from the inner three, shown here like a pie chart. These are what he says are the three main types of value: cultural value, business value, and consumer value.

As we move outward, David subcategorizes, clarifies, and provides examples to help us think about these different types of value, all the way out to the outer tags, which get pretty specific. We won’t go through all this today, but whenever your job is to communicate with a marketing executive, this is a great place to start.

Along with that article, and various breakdowns of this visual, he authored two other very helpful articles.

Now, this is a screenshot, so let’s click to look at the interactive version. I’ve placed it on my website and it’s a single html page. Take a moment to explore it. Then, let’s discuss your thoughts.

Cultural value, relevance: Are there better temperatures for us to trigger marketing messages to encourage rides?

Customer value, purchase experience: How can we segment our audience to find opportunities for increasing ridership?

Cultural value, relevance: Are there better times of day for us to trigger marketing messages to encourage rides?

Business value, insight creation: Do any anomalies suggest preferred customer behavior?

Business value, insight creation: What customer/rider attributes and use cases are more correlated with high usage? How can we use this information to expand our prospect marketing efforts and more effectively appeal to prospects who display similar behaviors?

Business value, insight creation: Similarly, do rider attributes correlate with lower usage? Are we missing key target audiences?

Customer value, purchase experience: Are there any anomalies in the data that would indicate lack of availability may be causing lower usage?

Now to create this, again, I just used the same technologies that we discussed last week. I’ll post the code and data online so you can both run it to make this and review how I used what we discussed. And I’ll post it as a discussion so you can ask questions of me and of each other. Sound good?

So I hope this has been a fairly easy introduction to the technologies and tools we can use to make interactive graphics. And I hope the examples are a good start to a few approaches to start making things like dashboards.

Of course, you can also use Tableau if you want, but it’s much more limited than the tools we’ve been discussing, and it’s proprietary, not free. And to share a Tableau interactive, you either need your audience to have a version of Tableau installed, or host your interactive on a website somewhere.

The approaches we’ve been discussing enable us to make a single html file and give to others, and all they need is a browser to look at it.

Remember, you need to understand the principles we’ve been discussing all semester for explaining static graphics before you’ll be able to create effective interactive graphics. But please start to practice these new tools between now and next class. Start by just trying to make a few of your past graphics interactive.