Welcome to Wednesday Night Live from New York!

Hello everyone. How’s it going? Last week, we discussed coordinate systems, encoding data using visual channels, building graphics using a grammar of graphics, and we practiced using that grammar.

You’ve also turned in your first homework.

This week, I’ll start by reviewing the homework.

Then, we’ll talk about color as an important and particular channel for encoding data.

Third, we’ll touch upon the effectiveness of different ways of comparing encoded data.

Finally, we will discuss several principles of design that we will use in the upcoming weeks to help us explain our graphics.

I’ve shown you this Venn diagram a few times, and we’ll continue to use it as a reminder of the elements of this course we’re focusing on. Tonight, we will continue working on visuals to, ideally and eventually, capitalize on their value as John Tukey has described.

Before we get started, let’s see our timeline of deliverables again.

We’ll look at this every week, just to refresh our minds on where we are in the course. We’re still focusing on learning some basics of graphics by practicing, or working on individual homeworks. Later, we’ll start group work where each of you gets to contribute and apply these skills in new ways.

Next up in terms of deliverables is your homework 2, which I have made available on our class website.

Your second homework builds upon the graphics ideas we’ve been discussing and upon your first homework.

Speaking of the first homework, let’s review it together now, which should also give you a chance to ask questions if you don’t understand.

[REVIEW THE HOMEWORK]

Awesome. OK, now let’s continue building our understanding of visually encoding data.

And this brings me to a particular issue with encoding data. Color is complex, and can do unexpected things when we try to use it to encode data. So let’s review color.

We considered this summary of visual channels last week, which was, again, formalized in Jacques Bertin’s seminal book, Semiology of Graphics.

And you’ll notice he includes color as a channel. He actually means hue here, one of three attributes of color.

Bertin also includes value, and by value he means luminance, another of the three attributes of color.

We can actually use the third attribute to encode data too, though he doesn’t specifically list it. That is chroma, also called saturation.

So let’s break down these attributes of color.

Let’s consider how we see color on a computer screen. On the left rectangle, I’ve filled it in with our school color.

Next, I’ve literally taken a photo of my computer screen through a 10-times magnifying glass. Notice that there are many individual groups of red, green, and blue pixels. When those are close together and small, our eye and mind blend them together so that we see our school color.

So every pixel on a computer screen has a value for each of red, green, and blue. Those triplet values are stored as a single hexadecimal code, a concatenation of three numbers. Each number goes from 0 to 255 (because of the bits used to store numbers) and, converted to the hexadecimal system, that range is 00 to FF. We can use functions to convert different ways of thinking about color into those hex codes.
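Here’s a minimal Python sketch of that concatenation. The function name rgb_to_hex is one I’ve made up for illustration; it isn’t from any particular library.

```python
def rgb_to_hex(red, green, blue):
    """Concatenate three 0-255 channel values into one hex color code."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Each channel becomes two hexadecimal digits, 00 through ff.
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


print(rgb_to_hex(255, 0, 0))    # pure red: #ff0000
print(rgb_to_hex(0, 128, 255))  # #0080ff
```

Reading a hex code is just the reverse: split it into three pairs of digits and convert each pair back to a number.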

The spectrum of visible color can be represented in various ways.

On the left, I’m showing you colors created from blending three hues: red, green, and blue. We tell computers how to display colors using these three values, and each typically ranges from 0 to 255.

This maximum may seem odd, but it’s just a product of early computers, where we cared about how many bits we used to represent something. 255 is the largest number we can store in eight bits, each either turned on or off. Eight bits give us 256 combinations, which we count from 0 to 255.

More naturally, we can represent color in terms of its hue, what we think of as color. And also its chroma or saturation, which is how much hue versus gray we see. And finally, how light or dark what we see is, the luminance. I’ve shown this representation to the right.
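If you’d like to see those three attributes for a given hex code, here’s a rough sketch using Python’s built-in colorsys module. The describe_color name is my own, and note that colorsys uses a simple geometric definition of lightness and saturation, so treat this as an approximation of the idea rather than a perceptual measurement.

```python
import colorsys


def describe_color(hex_code):
    """Split a hex color into hue (degrees), saturation, and lightness."""
    hex_code = hex_code.lstrip("#")
    # Convert each pair of hex digits to a 0-1 channel value.
    r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # colorsys returns hue, lightness, saturation, each between 0 and 1.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return round(h * 360), round(s, 2), round(l, 2)


print(describe_color("#ff0000"))  # (0, 1.0, 0.5): red hue, fully saturated
print(describe_color("#808080"))  # saturation 0: a pure gray has no hue
```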

But notice that the range of wavelengths for what we perceive as distinct hues are not equally spaced. This is a problem for encoding data. We’ll get to that in a moment.

This is a visual representation of mapping data, the numbers, to color values, here more specifically, luminance values. Notice the mapping between numbers and luminance uses a function, a mapping function.
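A mapping function like that can be sketched in a few lines of Python. Everything here is an assumption for illustration: the name make_luminance_scale, the choice to map larger numbers to darker grays, and the use of a gray level from 0 to 255 as a stand-in for luminance.

```python
def make_luminance_scale(data_min, data_max, dark=0, light=255):
    """Return a function that maps a number to a gray hex code.

    Small values map to light grays, large values to dark grays.
    """
    span = data_max - data_min

    def to_gray(value):
        # Position of the value within the data range, clipped to 0-1.
        t = min(max((value - data_min) / span, 0.0), 1.0)
        level = round(light + t * (dark - light))
        return "#{0:02x}{0:02x}{0:02x}".format(level)

    return to_gray


scale = make_luminance_scale(0, 10)
print([scale(v) for v in (0, 2.5, 5, 7.5, 10)])
```

The steps in gray level are equal here, which does not guarantee that we perceive them as equal.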

So how can we encode or map data to each aspect of color, its hue, chroma, or luminance?

To do that, let’s consider those non-linear differences in wavelength, and more specifically, how our eyes do not see things linearly.

There’s a difference between actual differences in lightness, the measured value we call luminance, and our perception of that luminance, which we call brightness.

Let’s represent this difference with some boxes making a diagram, which I borrowed from a master in color theory, Josef Albers.

The two graphics on the left show luminance, the physically measured thing.

The top left shows linear increases in luminance.

To the right of that, he shows us that as luminance increases, our ability to distinguish levels diminishes. Our perception isn’t linear.

Then, on the lower left, Albers shows non-linear increases in luminance, which we perceive as equal increases, as linear.
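To put rough numbers on this, here’s a sketch using one widely used model of the relationship, the CIE 1976 lightness function. I’m bringing that model in as an assumption; it isn’t part of Albers’s diagram. The function names are mine.

```python
def perceived_lightness(luminance):
    """CIE 1976 lightness L* (0-100) from relative luminance (0-1)."""
    if luminance > (6 / 29) ** 3:
        return 116 * luminance ** (1 / 3) - 16
    return (29 / 3) ** 3 * luminance


def luminance_for_lightness(lightness):
    """Inverse: the relative luminance needed for a given L*."""
    if lightness > 8:
        return ((lightness + 16) / 116) ** 3
    return lightness * (3 / 29) ** 3


# Equal steps in luminance give shrinking steps in perceived lightness.
print([round(perceived_lightness(y / 4), 1) for y in range(5)])

# Equal steps in perceived lightness need growing steps in luminance.
print([round(luminance_for_lightness(l), 3) for l in (0, 25, 50, 75, 100)])
```

Under this model, a surface that reflects only about 18 percent of the light looks roughly halfway between black and white.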

Does this make sense? Questions?

Color can play tricks on us in other ways too.

Do we perceive these two rectangles as having the same luminance? Or do we perceive one of these as brighter?

These do, in fact have the same luminance, which I’ve specified and shown you the values. Let’s compare another attribute of color.

Do we perceive these two rectangles as having the same saturation?

Let’s now compare the third color attribute.

Do we perceive these as having the same difference between hues?

The last example shows up in the upper-right corners here. I’ve used the default graphics colors, in RGB, to encode data. Notice how uneven the changes can be. So let’s say we want to encode a range of values to hue.

I’ve coded equal steps in this gradient from green to blue. But do we perceive these steps as linear? No, right? The values between 6 and 10 seem to be represented by the same green hue while values 2 through 4 seem to be mapped to very different hues.
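Here’s a sketch of what coding equal steps means in RGB terms. The specific green and blue triplets are my assumptions, not necessarily the ones on the slide, and rgb_gradient is a name I made up.

```python
def rgb_gradient(start, end, steps):
    """Equal-sized steps between two RGB triplets, as hex codes."""
    colors = []
    for i in range(steps):
        t = i / (steps - 1)  # 0 at the start color, 1 at the end color
        channels = [round(s + t * (e - s)) for s, e in zip(start, end)]
        colors.append("#{:02x}{:02x}{:02x}".format(*channels))
    return colors


green = (0, 255, 0)  # assumed endpoint
blue = (0, 0, 255)   # assumed endpoint
for value, color in enumerate(rgb_gradient(green, blue, 10), start=1):
    print(value, color)
```

The numbers change by the same amount at every step, but that says nothing about whether our eyes see the steps as equal.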

Obviously, our audiences would not be able to use this color mapping to easily decode them to their numerical representations or even make accurate comparisons between them, right?

So can we even map data to color to represent linear ranges or is it just unreliable?

Researchers have been studying this problem for a long time. And in the 1970s, an international commission of researchers was able to re-map or transform color wavelengths to a space that we perceive as closer to linear, with uniform changes.

Now the Commission’s work is in a color space that is not as intuitive as what we’ve been discussing. It’s not described as hue, saturation, and luminance.

But others have added to this work, re-expressing it in an intuitive way.

And you can read about it in the references I’ve given you.

Now I’ve written an R package that uses the perceptually uniform mapping in terms we are familiar with: hue, chroma, and luminance. I’ve called it HSLuv and you can get it on my GitHub.

Let’s see whether using this type of mapping seems more uniform or linear in changes than what we just discussed.

So I’ve created a data frame of values for hue, saturation, and luminance. Then I use my mapping function hsluv_hex to translate it to a computer color number. Then I plot all the values.
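For those who want to see what sits underneath a mapping like this, here’s a Python sketch that converts lightness, chroma, and hue in the 1976 CIE space (CIELUV, in its polar form) to a hex code. To be clear, this is not the hsluv_hex function from the R package. It’s a simplified stand-in that I’m assuming is close enough to show the idea: it clips colors a screen can’t show instead of rescaling saturation, and it is only meant for moderate chroma values.

```python
import math

# D65 reference white (Y normalized to 1) and its u', v' chromaticity.
XN, YN, ZN = 0.95047, 1.0, 1.08883
UN = 4 * XN / (XN + 15 * YN + 3 * ZN)
VN = 9 * YN / (XN + 15 * YN + 3 * ZN)


def _gamma(c):
    """Linear sRGB channel to display-encoded channel, both 0-1."""
    c = min(max(c, 0.0), 1.0)  # clip values the screen cannot show
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lch_to_hex(lightness, chroma, hue_degrees):
    """CIELUV lightness (0-100), chroma, and hue angle to a hex code."""
    if lightness <= 0:
        return "#000000"
    u = chroma * math.cos(math.radians(hue_degrees))
    v = chroma * math.sin(math.radians(hue_degrees))
    # L*, u*, v* to XYZ.
    if lightness > 8:
        y = YN * ((lightness + 16) / 116) ** 3
    else:
        y = YN * lightness * (3 / 29) ** 3
    u_prime = u / (13 * lightness) + UN
    v_prime = v / (13 * lightness) + VN
    x = y * 9 * u_prime / (4 * v_prime)
    z = y * (12 - 3 * u_prime - 20 * v_prime) / (4 * v_prime)
    # XYZ to linear sRGB, then gamma-encode each channel.
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    b = 0.0557 * x - 0.2040 * y + 1.0570 * z
    levels = (round(255 * _gamma(c)) for c in (r, g, b))
    return "#{:02x}{:02x}{:02x}".format(*levels)


# Twelve hues at the same lightness and chroma.
wheel = [lch_to_hex(65, 40, hue) for hue in range(0, 360, 30)]
print(wheel)
```

Because lightness and chroma are held fixed, only the hue changes from one swatch to the next, which is the property we want when hue alone carries the data.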

Let’s see the before and after.

As you explore the after, I think you’ll agree that comparing any step change across hue, saturation, or luminance, we perceive the values as more uniformly changing, which is what we want for encoding data using these attributes of color.

And how might this translate into our 10 step transition from green to blue? Let’s see.

Placing this back into our green to blue, our audience can now more easily decode the changes in hue to changes in value, right? We perceive this as more linear in the changes across hue.

More perceptually uniform color spaces make it easier to assign colors to categories too, so that we perceive each as equally spaced from the others.

Not only do we perceive color non-linearly, and need to account for that in our mapping or encoding data to these channels, we need to be aware that color can play other tricks on us.

Let’s also consider the interaction of color attributes with another idea we’ve discussed. Recall the idea of transparency, or opacity?

In this example on the right, I’m showing you 5 black circles — each with an opacity of 40 percent, that’s the alpha parameter — partially stacked onto one another. So we can partially see through each.

Will someone describe what we see?

Let’s consider an example of how partial transparency can be useful.

With partial transparency, when shapes overlap, we can get a sense of how many shapes are overlapping. To use a statistical term, we get some information about the density of shapes at any location.

Here, the darker areas indicate more overlap of circles. Questions?
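We can work out how dark the overlaps get. This sketch assumes the usual way graphics software layers partially transparent shapes over a white background, where each layer covers 40 percent of whatever light is still showing through. The function name is mine.

```python
def stacked_darkness(alpha, layers):
    """Fraction of a white background hidden after stacking black layers."""
    return 1 - (1 - alpha) ** layers


for n in range(1, 6):
    coverage = stacked_darkness(0.4, n)
    gray_level = round(255 * (1 - coverage))  # 255 is white, 0 is black
    print(n, round(coverage, 3), gray_level)
```

One circle hides 40 percent of the background, two hide 64 percent, and five hide about 92 percent, so each added layer darkens the overlap less than the one before it.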

What happens, though, if we use different colors to represent the shapes, like the orange and blue circles we considered last week? Let’s take a look.

On the right, again, I’ve set the opacity of the orange and blue circle to 40 percent, and let them partially overlap. What do you see?

Now if we want the colors to mean something categorically for example, we risk the overlap and transparency creating new colors that we don’t have categorical meanings for.
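Here’s a sketch of where that new color comes from, again assuming standard layering over a white background. The orange and blue triplets are placeholders I picked, not the exact colors on the slide.

```python
def composite_over(top, bottom, alpha):
    """Blend a partially transparent top color over an opaque bottom color."""
    return tuple(round(alpha * t + (1 - alpha) * b) for t, b in zip(top, bottom))


white = (255, 255, 255)
orange = (255, 165, 0)  # assumed
blue = (0, 0, 255)      # assumed

blue_on_white = composite_over(blue, white, 0.4)
orange_on_white = composite_over(orange, white, 0.4)
overlap = composite_over(orange, blue_on_white, 0.4)

print("blue alone:  ", blue_on_white)
print("orange alone:", orange_on_white)
print("overlap:     ", overlap)
```

The overlap region matches neither category’s color, so a reader has no legend entry to decode it with.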

There are actually numerous ways that color can trick us when we’re trying to convey information through color encodings of data. Let’s look at a few.

We see color as relative to what’s around it. Color interacts with its surroundings. And this is another issue we have to always consider when using color to encode data. Let’s see how it can play tricks on us.

Here, I’m showing what you perceive to be two rectangles of gray. What do you see? The rectangle on the left is darker than the rectangle on the right? Actually, it isn’t.

What I’ve done is only gradually darken the left rectangle towards the right one, and gradually lighten the right rectangle near the left one. This creates an edge that we as humans are tuned to see. But most of both rectangles is the same gray.
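If you want to build this illusion yourself, here’s a sketch of one row of gray levels. The straight-line ramp and all the default numbers are my assumptions; the slide may use a smoother curve.

```python
def cornsweet_profile(width=200, base=128, contrast=30, ramp=40):
    """One row of gray levels: flat on both sides, opposite ramps at the middle."""
    half = width // 2
    row = []
    for x in range(width):
        # Distance in pixels from the central edge (1 for the pixels touching it).
        distance = half - x if x < half else x - half + 1
        # 1 at the edge, fading to 0 once we are `ramp` pixels away.
        fade = max(0.0, 1 - (distance - 1) / ramp)
        offset = contrast * fade
        # Darken the left side toward the edge, lighten the right side.
        level = base - offset if x < half else base + offset
        row.append(round(level))
    return row


row = cornsweet_profile()
print(row[0], row[99], row[100], row[-1])  # 128 98 158 128
```

Repeat that row for every line of an image and you get two rectangles that differ only near where they meet.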

Seems crazy, right? You can cover up the area between the two vertical blue lines, with your hand or whatever, and see that the rectangles on the left and right are the same, but I’ll also demonstrate.

[SCROLL DOWN FOR REVEAL]

I’m now showing you the same rectangles, but just put a white one covering up where they meet. Now see they are the same?

So our perception of color, of luminance, is affected by surrounding luminance.

If we encode data with color as multiple markings, one visual marking can alter how we perceive the color value of another visual marking and, thus, it can cause errors for our audience in decoding it to numbers.

Let’s see this problem in action another way.

Here, I’ve placed 8 rectangles on top of a background that changes gradient. Imagine that the luminance values of these 8 small rectangles were encoded using data. Do the 8 small rectangles have the same or different values?

Let’s remove the background and see.

[SCROLL DOWN FOR REVEAL]

Ah, the top 4 are all the same luminance value, and the bottom 4 are all the same luminance value.

But having a changing background deceived us into thinking they were different and, thus, our audience would have erred in decoding them to numbers or data.

I’ll let you explore the effects in other ways. I’ve borrowed these examples from another great reference, Colin Ware’s textbook, which I’ve cited on the references slide at the end.

Now we’ve been focusing just on luminance. Let’s also consider hue and saturation.

Here, within the left squares, the small ones are the same color. Notice how the background makes it seem like the two small squares are different shades. The same is true of the red squares.

Surroundings can create the opposite effect, too.

Here, the two small squares in the center of each large square seem almost the same, but when I place them next to each other in the small swatches, we can see just how different they are! I’ll show you one more issue. I’ll warn you now this may hurt your eyes.

When two colors with contrasting hues but similar luminance are near each other, they create what’s called a vibrating boundary.
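To check whether two colors have similar luminance, here’s a sketch using the standard sRGB formula for relative luminance. The two example colors are hypothetical ones I chose, not the pair on the slide.

```python
def relative_luminance(rgb):
    """Relative luminance (0-1) of an sRGB color given as 0-255 channels."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Green contributes most to luminance, blue the least.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


red = (255, 0, 0)    # hypothetical
green = (0, 140, 0)  # hypothetical, darkened to sit near the red
print(round(relative_luminance(red), 3))
print(round(relative_luminance(green), 3))
```

Two hues far apart on the color wheel with luminance values this close together are candidates for a vibrating boundary.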

Hard to look at, right?

Now I always like to question things. If this seems bad, should we always avoid it? Or can we make it work for us? Let’s try an experiment.

We will soon be thinking about how to guide our audience in various ways within a visualization.

Here I’m showing you a random scatter, and have colored one point using both of those hues we just looked at that create a vibrating boundary. Does this small application help draw your eye to the point?

Next week, we are going to consider different ways to use encodings to help us explain our data to others. Between now and then, I’ll let you work through your own experiments to see if you find it more or less effective than other approaches.

Awesome. Ok, let’s review some of the ideas related to visual design that we can use in our data graphics.

Ok, first up is a concept that we’re not actually going to talk about tonight, but is something I’ve been using in every presentation slide I’ve shown you. It’s the use of grids. Grids help us organize information to make it easier on our audiences.

I’ll get into the use of grids when we start to combine graphics with other graphics and with narrative.

As we begin to discuss how to structure information, and how to design tables and graphics, one important design principle is that of proximity. By the way, we call these design principles Gestalt principles, after the German researchers who formalized the ideas in the 1920s.

It’s the relative closeness of elements that we as humans use to perceive things as groups and having patterns.

Now the idea of proximity and use of grids are related but different. Grids help us keep like things together, and different concepts separated by, say, negative or white space, which is very important. But the concept of proximity is broader than grids in that things don’t necessarily have to be aligned, just relatively close, for us to consider them related. Make sense?

Also notice, I’m showing you the abstract concept of proximity placed precisely in my grid. I use four modules as an area to show you the concept, and you’ll see as we go through different concepts, I’m aligning each together in a single column to the left on the grid.

So another early concept we discussed was the Gestalt principle of similarity.

And I made this abstract example to show you how we perceive things as together based on the elements sharing an attribute. That attribute here is the luminance of color, right? And we purposefully used color attributes for several purposes, including linking text with data encodings. Sort of like I’m also doing here with the blue hue on this visual, right? Make sense?

And we’ve discussed a third Gestalt principle.

It’s the principle of enclosure. We can use a shape, like this gray rectangle, to enclose other elements, and we perceive the enclosed elements as having some relationship, depending on the context we use it for. We saw this example several times, including in our design of a table. And we’ve seen this used to great effect on data graphics too.

There are three more principles of organization we’ll review here.

The first is the principle of closure. Closure means that we don’t have to see the entire shape of an element or thing for us to imagine that element as having it. Here, we only partially see a circle layered behind a rectangle, but we think of that partial marking as a circle. We imagine the rest we don’t see. Our minds tend to do this automatically.

Now this concept is useful in several ways. And we’ll consider a useful way related to our discussion of maximizing data-ink by erasing things. We’ll see this in action in a few minutes.

Another helpful principle is continuity. Let’s review that.

In this example, we tend to see a rectangle and a curvy line partially in front or behind the rectangle, right? We tend to see this instead of seeing other odd shapes that could be present with these lines. Does that make sense?

How is this useful? What about being able to follow curvy lines in a network diagram, or being able to rely on dots or dashes lined up close together to create the appearance of a continuous line? We’ll see an example of this in a moment, too.

The last of these grouping principles is that of connection. Here, we see the left two black circles connected as related, and the right two that are connected as related. This is very common in connecting discrete measurements along a timeline, for example. But the concept is not limited to data encodings. We can use this to, for example, connect an explanation to an element.

So, along with these Gestalt principles for organization and grouping, we’ve discussed concepts that help us focus our audience on specific elements or information. Let’s review six of these.

What’s the thing you notice in this abstract example?

Right, that there’s a single element rotated differently than the rest. Notice, also, that we can use orientation to help us focus the audience, but that this is also an available attribute of a visual channel that we can encode data with, right? And we’ve seen examples of this, like with the Citibike exploratory information graphic. Remember we used orientation to encode time of day of an observation?

Along with orientation, we can focus an audience using shape. Here, the triangle jumps out, or sort of pops. We notice it because we only see it among a bunch of circles. So to be effective for focus, we must use it carefully and sparingly.

Along with shape, we can use the color attribute of luminance. Here, the dark luminance shape pops among all the light ones.

We can do the same thing with size.

Or with the color attribute of hue.

And the Gestalt principle of enclosure can also work to help us focus on an element. If you remember the information graphic we reviewed from Giorgia Lupi, Nobels, no degrees, she used a pink circle to enclose data elements that represented female prize winners. Remember that? So some of these concepts overlap in how we can use them.

We don’t have to choose just one, either. Frequently, it helps to combine these ideas to reinforce a relationship or reinforce how we focus someone on part of our communication. This is also in line with how Doumont described including what he called effective redundancy in communication. Using different channels to support the same message.

Let’s consider one last set of ideas, and then look at a few examples and try to guess what concepts are in play.

I’ve pulled three examples from Knaflic’s new book, a follow-up to her book Storytelling with Data, called Let’s Practice.

Now she made up the example. What Gestalt principles do you see used?

How about most of them? :)

[DISCUSS]

And how does Knaflic try to focus her audience’s attention?

[DISCUSS]

IF TIME IS SHORT, ASK THEM TO DO THE NEXT TWO ON THEIR OWN.

So we’ve covered a lot of ground with graphics concepts up to now. We’ve talked about coordinate systems. Visual encodings. The grammar of graphics. Color. Principles of design. Now that we have these encodings, the most important thing a data visual can do is enable us to compare things.

I agree with these professors who have explained that comparison is necessary for meaning. Abelson, whose readings we will consider later, explains, and let’s read this together: “The idea of comparison is crucial. To make a point that is at all meaningful, statistical presentations must refer to differences between observation and expectation, or differences among observations.” That’s a pretty forceful claim, isn’t it?

Another professor who you will become very familiar with when we discuss visualizations is Edward Tufte. He explains, and again let’s read together: “The fundamental analytical act in statistical reasoning is to answer the question ‘Compared with what?’” Comparison, he says, is fundamental to our reasoning!

This brings us to the question about how we should form comparisons.

Several researchers have extensively tested the effectiveness of different types of comparisons with visual encodings. Their empirical results allowed us to rank types of comparisons from more effective to less effective, which I’ve shown from left to right here, for two groups of data types.

Within numeric types, whether ratio, interval, or ordered data, the most effective approach is to compare an encoding against common scales and baselines.

With both design principles and channel effectiveness in mind, let’s consider a couple of questions about decoding data. We’ll start by taking a poll.

[SCROLL DOWN]

Now in your reader you have already read about a graphic similar to this.

Read the question.

[ACTIVATE POLL]

Excellent. Let’s see the results…

[SHOW RESULTS AND DISCUSS]

What’s going on here?!

[DEACTIVATE! CLICK UP TO GO BACK TO MAIN SLIDE]

What type of comparison are we using to assess the variation of pink points from the blue line over the range of x?

[DISCUSS]

How might we improve this graphic using the design principles we’ve just discussed?

[DISCUSS]

What about the principle of connection? If we’re trying to compare distance of each point from the line at a given x, what if we connect the points to the line along the x value?
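Here’s a sketch of how we might compute those connectors. The points and the line are made up for illustration, and connector_segments is a name I invented; you would hand the resulting segments to whatever plotting tool you use.

```python
def connector_segments(points, line):
    """For each (x, y) point, a vertical segment from the point to the line at x."""
    return [((x, y), (x, line(x))) for x, y in points]


def example_line(x):
    """A hypothetical fitted line."""
    return 2 * x + 1


points = [(0, 1.5), (1, 2.0), (2, 6.0)]  # hypothetical observations
segments = connector_segments(points, example_line)

for (x, y), (_, y_on_line) in segments:
    print(x, "distance from line:", y - y_on_line)
```

Each segment shares its x value with its point, so the audience compares lengths along a common direction instead of estimating gaps by eye.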

Now there are other types of graphs that would be even more effective, and we’ll cover other ideas later.

Now, for more practice, I’d like to turn to a few published examples and try to identify what types of comparisons the graphics invite.

We first reviewed this data graphic last week, if I recall. So now I’d like to know how data encodings are being compared. It’s a very useful skill to be able to understand and sort of deconstruct things like this.

[DISCUSS]

Let’s review another.

Again, we first looked at this last week, and I also gave you code that would let you create each of the encodings yourselves.

Let’s take a minute as a class and try to identify different types of comparisons here.

[DISCUSS]

Let’s look at one more. This time a new one for you, but related to our CitiBike case study.

We’ll need to zoom in so you can see this better. But this visualization encodes data with color differently than we’ve seen so far.

[DISCUSS]

Ok, that’s plenty for us to think about for tonight. I hope our discussion and code demonstrations will give you plenty for your own practice in your homework 2, which, again, I’ve posted on our class website for you to access.

As always, I’ve hand picked references that are best suited to going further for the topics we’ve discussed today.

I’ll stay for questions. Otherwise have a great rest of your night!