Daily Box Score 9/11: Visualizations
I don't know about you all, but I've been enjoying the phoenix-like rebirth of our regular feature Graph of the Day. Much credit is owed to Justin Bopp and Walter Fulbright, who clearly know their layers from their masks. So I thought this might be a good time to talk about the elements of a good visualization.
Table of Contents
Data
Presentation
Finished Product
Discussion Question of the Day
Data
The single most important function of any visualization is to convey data. Different bits of data are best visualized in different ways.
The preeminent authority on the presentation of data in visual form is Edward Tufte, and his most acclaimed work is The Visual Display of Quantitative Information. In it, Tufte gives examples of what does and does not make for an effective visualization.
One of Tufte's most essential points is that the manner of visualization is dictated by the data, not the other way around. Just because Microsoft Excel makes it very easy to turn just about any data set into any type of chart doesn't mean it's a good idea. For example, when the sum of the data is important, a pie chart often fails, because it does not give any visual indication of the magnitude of the sum. It is a representation of relative size, not absolute magnitude.
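To make the pie chart point concrete, here is a minimal sketch in Python. The function name `pie_shares` and the home run totals are made up for illustration. It shows that two data sets with very different sums reduce to exactly the same slices.

```python
def pie_shares(values):
    """Return each value as a fraction of the total, which is all a pie chart encodes."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical home run totals for three hitters in two different seasons.
season_a = [10, 20, 30]   # 60 home runs in total
season_b = [20, 40, 60]   # 120 home runs in total

shares_a = pie_shares(season_a)
shares_b = pie_shares(season_b)

# The slices match even though the second season has twice as many home runs.
print(shares_a == shares_b)
print(sum(season_a), sum(season_b))
```

A bar chart of the same numbers would show the doubling immediately, which is why the choice of chart should follow from what the data needs to say.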
A related point is that good visualizations are dense. That doesn't mean they are hard to unravel, but rather that they contain a great deal of information per square inch (or pixel). Tufte's classic example is this visualization, created by Charles Joseph Minard in 1869 to depict Napoleon's Russian campaign of 1812. (For a sarcastic cautionary tale, see Tommy Rancel's incredible pie chart here.)
Now, it might take you a minute to untangle that, especially if you don't speak French (zut alors!). But what you're seeing are several visualizations at once: (1) a map of Napoleon's progress toward, and retreat from, Moscow (note the rivers), (2) a representation of the size of his armies (and their offshoots, with width indicating size), and most incredibly, (3) a representation of the temperature at each stage of the retreat (see the line chart at the bottom, with corresponding indicators to the chart above; lower is colder).
This is what we would call a dense visualization. Did I mention he did it in 1869? Think about that the next time you create a chart in Excel...
Presentation
Presentation is an important part of any visualization, and the better a chart looks, the easier it is to spend lots of time deciphering it. I can claim no expertise, but there are a few basic guidelines for clean presentation of visual information.
First, colors should be easy to distinguish (even for those who are color-blind). But more than that, it is important to remember that color is a potential axis upon which information can be conveyed. Color coding is a good way to visualize one more variable, thus increasing the density of the information presented.
Second, a good chart is clearly labeled. Notice the plethora of labels included on Minard's chart above. There are dozens of labels, and they are all as close, spatially speaking, to the item they label as possible. The more complex a visualization gets, the more difficult it can be to follow lines connecting labels to data points or series. One of the things I appreciate about Justin and Walter's charts is the way they superimpose images of players onto the bars representing their performance (for an example, see here). It's an easy way to create a label that is tied very closely to the data.
Finished Product
Part of the inspiration for me venturing down this road was the work of Dave Allen, who writes for Fangraphs and Baseball Analysts. He's a programming-savvy guy, and he is probably best known for using the language R to create compelling heat maps based on PITCHf/x data. You can see a characteristic example of his work here. In the vast world of PITCHf/x graphs, Allen's are among the easiest to decipher, even for the lay fan. For that reason alone, his work is praiseworthy.
But I was especially impressed with this chart, taken from this article. Allow me to reproduce:
Allen explains:
Most batters have more power on pulled balls and pull more inside pitches. So is Pena's outside power from opposite field power or from an ability to pull outside pitches[?] To examine this I took inspiration from Max's work looking at the relationship between the horizontal location of a pitch and the horizontal angle of the resulting ball in play. In this case I just looked at Pena's HRs. Remember that -45 is the third base line and 45 is the first base line.
Here's what's great about this visualization: it's dense, it's high-contrast, and it conveys information in a way that could not be done without a graph. This last part is particularly important. There would be no other way to understand the central point (that Carlos Pena pulls pitches on the outside half of the plate to right field) without this kind of graph. A table couldn't; a pie chart couldn't; a bar graph couldn't; an x-y plot couldn't.
The upshot is to reinforce the points made above, which Allen clearly understands. It also goes to show that knowledge of programming can be extremely useful in visualizing.
Discussion Question of the Day
When I showed this graph to a friend, he said he wished such a chart existed for every player. And I agree.
But it led me to another question. I have been thinking about how one might visualize AVG, OBP and SLG on a single chart to display the shape of a player's performance.
What data series do you think would go together nicely on a graph?
Comments
Nice write-up, Tommy. (and thanks for the shout-out!)
Good points, but one cannot overemphasize the importance of the actual information conveyed. I’ve always held that a chart/graph functions as a visual summary of what the research/article/hypothesis is attempting to conclude; the reader should be able to see the chart and instantly know the point the author is attempting to make.
That’s the problem with so many charts in sabermetric research: the point of them is to make the conveyed message easier to understand, but so often the included visual is barely three lines stretched horizontally over a gray surface.
I loved the section about Dave Allen. The ability to convey data visually in unconventional means is amazing.
"What we do in life, echoes in eternity!"
One interesting pie chart would be
A breakdown of skills into their percent of an overall value stat, such as wOBA by BB, 1B, 2B, 3B, HR, SB, etc., or WAR by its major components. One relevant use would be in an article like FanGraphs' comparison between Adam Dunn and Nyjer Morgan back at the time of Morgan's trade to Washington. The author argued that the two may have similar overall values from vastly differing skill sets (OBP/SLG vs. defense/positional). That seems like a perfect time for a nice pie chart. Mmmm pie chart.
by Slugger O'Toole on Sep 11, 2009 8:04 PM EDT
I LOVE THIS IDEA.
I used to work with an old man that told me. Son, every workplace has a dumbass, if you don't have one where you work, then I'm afraid you're it.
BPro does something like this on their player pages.
They do a column graph with five columns, measuring production (offense only I believe) in terms of K, BB, ISO, AVG, and uh, something else I think. Love the idea.
I think I prefer the column graph to a pie chart, because you can not only see the relative strengths of a player, but you can also compare them to league averages.
Let me know when I can help. I can do anything mentioned here, provided I have the data.
by Justin Bopp on Sep 13, 2009 10:12 PM EDT
An example:

This presents an issue, though, in that the ability to run the bases isn’t as valuable as the ability to hit for power. Do you scale things in runs or scale things in percentiles?
Right
And these are decent enough, but caveats apply. First, they are projections and therefore do not give an accurate picture of actual performance. Second, as you note, they do not weight the "tools." Looking at several of these side by side (which is hard enough to do) doesn't give much comparative value. If you could do a series of small, thumbnail-sized images, say a triangle (or really any convex polygon) with the length of each side proportionate to performance on a given metric, it would be much more useful, I think.
My inspiration for this idea was the player skill graphic from Winning Eleven 7, if anyone knows what I’m talking about. Can’t find a screenshot at the moment.
by Tommy Bennett on Sep 14, 2009 5:31 PM EDT
Stacked Bars??
Doing a stacked bar of wOBA so that you can put 5 or so guys of similar wOBA on the same chart would be fairly nifty
by stevesommer05 on Sep 15, 2009 4:33 PM EDT
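For anyone who wants to try the stacked-bar idea, here is a rough sketch of the arithmetic behind it in Python. The linear weights, the two stat lines, and the names `WEIGHTS` and `woba_components` are all illustrative assumptions (real wOBA weights are recalculated every season), so treat the numbers as placeholders.

```python
# Illustrative linear weights only; actual wOBA weights change from season to season.
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}

def woba_components(counts, denominator):
    """Split wOBA into per-event contributions that sum to the player's wOBA."""
    return {
        event: weight * counts.get(event, 0) / denominator
        for event, weight in WEIGHTS.items()
    }

# Hypothetical stat lines: (event counts, wOBA denominator).
players = {
    "Slugger": ({"BB": 80, "HBP": 5, "1B": 70, "2B": 25, "3B": 1, "HR": 38}, 600),
    "Speedster": ({"BB": 40, "HBP": 8, "1B": 140, "2B": 20, "3B": 8, "HR": 3}, 600),
}

# Each player's dictionary holds the segment heights for one stacked bar.
segments = {name: woba_components(counts, denom) for name, (counts, denom) in players.items()}

for name, parts in segments.items():
    total = sum(parts.values())
    print(name, round(total, 3), {event: round(value, 3) for event, value in parts.items()})
```

Because every segment is expressed in points of wOBA, the bars for several players can sit side by side on one axis, and the total height of each bar is the player's wOBA.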
That's sweet.
I really like the wOBA by component idea, too.
by Sky Kalkman on Sep 15, 2009 10:31 PM EDT
If we were just going to do one player at a time, and not go stacked bars...
I think it would help to have the average rates notated somehow, and maybe the 80th percentile or something like that. So while the 1B piece might always be larger than the HBP piece, you could better see that Player X gets more value from HBP than usual and less from 1B.
by Sky Kalkman on Sep 15, 2009 10:32 PM EDT
An idea for a graph
This wouldn’t create any new data, but it would be a handy-dandy visual reference to see how a player performs at a glance:
I’m not even sure what they’re called, but if you can imagine a standard four-quadrant graph, with each axis representing a different metric (one for AVG, one for OBP, one for SLG, and one for, say, UZR), you’d end up with an irregular diamond shape depending on that player’s skills. These sorts of graphs get used a lot in video games to show a character’s statistics visually, and I think they would really work for baseball, with some tweaking. You’d need to find a way to illustrate negative UZR, like making a 0 rating actually stand off the axis somewhat. I apologize for being such a layman, but hopefully you can get the gist of what I’m talking about.
I was thinking of that graph when Tommy asked
I cannot recall what it’s named though, you’re right.
I don’t think UZR would work in this case, as it isn’t in the same units as the rest of the measures. For the other three, the least you would have to do is put it in a + scale with an average of 1 or 100. In that sense, we don’t have to deal as much with the fact that each of the values are in a radically different scale. I would prefer also to see stats that don’t auto include average into this. ISO works as a SLG replacement, but I can’t think of anything for OBP. I don’t want to use BB%, because it doesn’t illustrate everything about OBP.
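The kind of graph described in this thread is often called a radar (or spider) chart. As a minimal sketch of the "+ scale" idea, the Python below indexes each stat to a league average of 100 and then computes where each corner of the shape would land. The league and player numbers, and the helper names `plus_scale` and `radar_vertices`, are hypothetical.

```python
import math

def plus_scale(player, league):
    """Express each stat relative to league average, where 100 means average."""
    return {stat: 100.0 * player[stat] / league[stat] for stat in player}

def radar_vertices(indexed, order):
    """Give each stat its own evenly spaced axis and return one (x, y) point per stat."""
    step = 2 * math.pi / len(order)
    return [
        (indexed[stat] * math.cos(i * step), indexed[stat] * math.sin(i * step))
        for i, stat in enumerate(order)
    ]

# Hypothetical league averages and one hypothetical hitter.
league = {"AVG": 0.260, "OBP": 0.330, "ISO": 0.150}
player = {"AVG": 0.286, "OBP": 0.396, "ISO": 0.240}

indexed = plus_scale(player, league)
vertices = radar_vertices(indexed, ["AVG", "OBP", "ISO"])

for stat, (x, y) in zip(["AVG", "OBP", "ISO"], vertices):
    print(stat, round(indexed[stat], 1), (round(x, 1), round(y, 1)))
```

Putting every axis on the same indexed scale means a perfectly average player draws a regular polygon at radius 100, so any bulge or dent in the shape reads as a strength or a weakness. A counting stat like UZR, which can be negative, would still need a different treatment, as noted above.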
Somewhat related, here's a link to a blog of a teacher that is in love with visual media.
http://blog.mrmeyer.com/?cat=47
This guy loves the Feltron.
Great Chart
This wonderful visualization is simple, only takes a minute to figure out, and conveys so much information.
Jon Peltier
Pena
has he really not hit a ball straight away this year? thats kinda amazing.
by jamiethekiller on Sep 14, 2009 10:46 AM EDT
I have something in the works for this. Waiting on the data from Walter.