
Daily Box Score 9/11: Visualizations


I don't know about you all, but I've been enjoying the phoenix-like rebirth of our regular feature Graph of the Day. Much credit is owed to Justin Bopp and Walter Fulbright, who clearly know their layers from their masks. So I thought this might be a good time to talk about the elements of a good visualization.


Table of Contents

Data
Presentation
Finished Product
Discussion Question of the Day

 

Data

The single most important function of any visualization is to convey data. Different bits of data are best visualized in different ways.

The preeminent authority on the presentation of data in visual form is Edward Tufte, and his most acclaimed work is The Visual Display of Quantitative Information. In it, Tufte gives examples of what does and does not make for an effective visualization.

One of Tufte's most essential points is that the manner of visualization is dictated by the data, not the other way around. Just because Microsoft Excel makes it very easy to turn just about any data set into any type of chart doesn't mean doing so is a good idea. For example, when the sum of the data is important, a pie chart often fails, because it does not give any visual indication of the magnitude of the sum. It is a representation of relative size, not absolute magnitude.
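
A quick way to see the pie-chart problem is to compute the slice sizes for two data sets with very different totals. The sketch below uses made-up numbers (both lists and the `pie_shares` helper are hypothetical, not taken from Tufte). The shares come out identical, so the two pies would look the same even though one total is ten times the other.

```python
def pie_shares(values):
    """Return each value as a fraction of the total, which is all a pie chart draws."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical data: same proportions, totals differ by a factor of ten.
small = [10, 30, 60]
large = [100, 300, 600]

print(pie_shares(small))
print(pie_shares(large))
print(sum(small), sum(large))
```

If the total matters, it has to be shown some other way, for instance with bars on a common scale.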

A related point is that good visualizations are dense. That doesn't mean they are hard to unravel, but rather that they contain a great deal of information per square inch (or pixel). Tufte's classic example is this visualization, created by Charles Joseph Minard in 1869 to chart Napoleon's Russian campaign of 1812. (For a sarcastic cautionary tale, see Tommy Rancel's incredible pie chart here.)

Now, it might take you a minute to untangle that, especially if you don't speak French (zut alors!). But what you're seeing are several visualizations at once: (1) a map of Napoleon's progress toward, and retreat from, Moscow (note the rivers), (2) a representation of the size of his armies (and their offshoots, with width indicating size), and most incredibly, (3) a representation of the temperature at each stage of the retreat (see the line chart at the bottom, with corresponding indicators to the chart above; lower is colder).

This is what we would call a dense visualization. Did I mention he did it in 1869? Think about that the next time you create a chart in Excel...

Presentation

Presentation is an important part of any visualization, and the better a chart looks, the easier it is to spend lots of time deciphering it. I can claim no expertise, but there are a few basic guidelines for clean presentation of visual information.

First, colors should be easy to distinguish (even for those who are color-blind). But more than that, it is important to remember that color is a potential axis upon which information can be conveyed. Color coding is a good way to visualize one more variable, thus increasing the density of the information presented.

Second, a good chart is clearly labeled. Notice the plethora of labels included on Minard's chart above. There are dozens of labels, and they are all as close, spatially speaking, to the item they label as possible. The more complex a visualization gets, the more difficult it can be to follow lines connecting labels to data points or series. One of the things I appreciate about Justin and Walter's charts is the way they superimpose images of players onto the bars representing their performance (for an example, see here). It's an easy way to create a label that is tied very closely to the data.

Finished Product

Part of the inspiration for me venturing down this road was the work of Dave Allen, who writes for Fangraphs and Baseball Analysts. He's a programming-savvy guy, and he is probably best-known for using the language R to create compelling heat maps based on PITCHf/x data. You can see a characteristic example of his work here. In the vast world of PITCHf/x graphs, Allen's are among the easiest to decipher, even to the lay fan. For that reason alone, his work is praise-worthy.

But I was especially impressed with this chart, taken from this article. Allow me to reproduce:

[Chart: horizontal pitch location vs. horizontal angle of the ball in play for Carlos Pena's home runs]

via baseballanalysts.com

Allen explains:

Most batters have more power on pulled balls and pull more inside pitches. So is Pena's outside power from opposite field power or from an ability to pull outside pitches[?] To examine this I took inspiration from Max's work looking at the relationship between the horizontal location of a pitch and the horizontal angle of the resulting ball in play. In this case I just looked at Pena's HRs. Remember that -45 is the third base line and 45 is the first base line.

Here's what's great about this visualization: it's dense, it's high-contrast, and it conveys information in a way that could not be done without a graph. This last part is particularly important. There would be no other way to understand the central point (that Carlos Pena pulls pitches on the outside half of the plate to right field) without this kind of graph. A table couldn't; a pie chart couldn't; a bar graph couldn't; an x-y plot couldn't. 

The upshot is to reinforce the points made above, which Allen clearly understands. It also goes to show that knowledge of programming can be extremely useful in visualizing.

Discussion Question of the Day

When I showed this graph to a friend, he said he wished such a chart existed for every player. And I agree.

But it led me to another question. I have been thinking about how one might visualize AVG, OBP and SLG on a single chart to display the shape of a player's performance. 

What data series do you think would go together nicely on a graph? 


Comments


Nice write-up, Tommy. (and thanks for the shout-out!)

Good points, but one cannot overemphasize the importance of the actual information conveyed. I’ve always held that a chart/graph functions as a visual summary of what the research/article/hypothesis is attempting to conclude; the reader should be able to see the chart and instantly know the point the author is attempting to make.

That’s the problem with so many charts in sabermetric research: the point of them is to make the conveyed message easier to understand, but so often the included visual is barely three lines stretched horizontally over a gray surface.

I loved the section about Dave Allen. The ability to convey data visually in unconventional means is amazing.

"What we do in life, echoes in eternity!"

by Justin Bopp on Sep 11, 2009 5:43 PM EDT

One interesting pie chart would be

A breakdown of skills into their percent of an overall value stat, such as wOBA by BB, S, 2B, 3B, HR, SB, etc., or WAR by its major components. One relevant use would be in an article like FanGraphs' comparison between Adam Dunn and Nyjer Morgan back at the time of Morgan’s trade to Washington. The author argued that the two may have similar overall values from vastly differing skill sets (OBP/SLG vs. defense/positional). That seems like a perfect time for a nice pie chart. Mmmm pie chart.

by Slugger O'Toole on Sep 11, 2009 8:04 PM EDT
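
For anyone who wants to try the wOBA-by-component idea, here is a minimal sketch of how the slices might be computed. Everything in it is an assumption made for illustration: the weights are placeholders rather than official values for any season, the season line is invented, and `component_shares` is a hypothetical helper. It only shows the arithmetic of turning weighted event counts into shares of the whole.

```python
# Placeholder linear weights, chosen only for illustration; real wOBA weights vary by season.
WEIGHTS = {"BB": 0.7, "1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}

def component_shares(counts, weights=WEIGHTS):
    """Return each event type's share of the weighted total (the wOBA numerator)."""
    weighted = {event: weights[event] * n for event, n in counts.items()}
    total = sum(weighted.values())
    return {event: value / total for event, value in weighted.items()}

# Hypothetical season line for one hitter.
counts = {"BB": 80, "1B": 90, "2B": 30, "3B": 2, "HR": 35}
for event, share in component_shares(counts).items():
    print(f"{event}: {share:.1%}")
```

The same shares could feed a pie chart for one player or stacked bars for several players with similar wOBA.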

I LOVE THIS IDEA.

I used to work with an old man that told me. Son, every workplace has a dumbass, if you don't have one where you work, then I'm afraid you're it.

by Warden11 on Sep 12, 2009 2:00 AM EDT

BPro does something like this on their player pages.

They do a column graph with five columns, measuring production (offense only I believe) in terms of K, BB, ISO, AVG, and uh, something else I think. Love the idea.

I think I prefer the column graph to a pie chart, because you can not only see the relative strengths of a player, but you can also compare them to league averages.

by Sky Kalkman on Sep 13, 2009 9:46 PM EDT

An example:

Pujols at BPro

This presents an issue, though, in that the ability to run the bases isn’t as valuable as the ability to hit for power. Do you scale things in runs or scale things in percentiles?

by Sky Kalkman on Sep 14, 2009 3:17 PM EDT

Right

And these are decent enough but caveats apply. First, they are projections and therefore do not give an accurate picture of actual performance. Second, as you note, they do not weight the “tools.” Looking at several of these side by side (which is hard enough to do) doesn’t give much comparative value. If you could do a series of small, thumbnail-sized images, say a triangle (or really any convex polygon) with the length of each side proportionate to performance on a given metric, it would be much more useful, I think.

My inspiration for this idea was the player skill graphic from Winning Eleven 7, if anyone knows what I’m talking about. Can’t find a screenshot at the moment.

by Tommy Bennett on Sep 14, 2009 5:31 PM EDT

Stacked Bars??

Doing a stacked bar of wOBA so that you can put 5 or so guys of similar wOBA on the same chart would be fairly nifty

by stevesommer05 on Sep 15, 2009 4:33 PM EDT

That's sweet.

I really like the wOBA by component idea, too.

by Sky Kalkman on Sep 15, 2009 10:31 PM EDT

If we were just going to do one player at a time, and not go stacked bars...

I think it would help to have the average rates notated somehow, and maybe the 80th percentile or something like that. So while the 1B piece might always be larger than the HBP piece, you could better see that Player X gets more value from HBP than usual and less from 1B.

by Sky Kalkman on Sep 15, 2009 10:32 PM EDT

An idea for a graph

This wouldn’t create any new data, but it would be a handy-dandy visual reference to see how a player performs at a glance:

I’m not even sure what they’re called, but if you can imagine a standard four-quadrant graph, with each axis representing a different metric (one for AVG, one for OBP, one for SLG, and one for, say, UZR), you’d end up with an irregular diamond shape depending on that player’s skills. These sorts of graphs get used a lot in video games to show a character’s statistics visually, and I think they would really work for baseball, with some tweaking necessary. You’d need to find a way to illustrate negative UZR, like making a 0 rating actually stand off the axis somewhat. I apologize for being such a layman, but hopefully you can get the gist of what I’m talking about.

by RubioNubio on Sep 11, 2009 8:18 PM EDT
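
The graph described here is usually called a radar (or spider) chart. As a rough sketch of the geometry, assuming each metric has already been scaled to a common 0-to-1 range, the corners of the diamond can be computed like this. The ratings and the `radar_vertices` helper are hypothetical.

```python
import math

def radar_vertices(values):
    """Place one value per evenly spaced axis and return (x, y) polygon corners."""
    n = len(values)
    points = []
    for i, r in enumerate(values):
        angle = 2 * math.pi * i / n  # axis i points at this angle from the x-axis
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Hypothetical, pre-scaled ratings (0 to 1) for AVG, OBP, SLG and UZR.
ratings = [0.5, 0.75, 1.0, 0.25]
for x, y in radar_vertices(ratings):
    print(round(x, 2), round(y, 2))
```

One way to handle a negative UZR, along the lines suggested above, would be to add a fixed offset to every rating before plotting so that zero sits some distance out from the center.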

I was thinking of that graph when Tommy asked

I cannot recall what it’s named though, you’re right.

I don’t think UZR would work in this case, as it isn’t in the same units as the rest of the measures. For the other three, the least you would have to do is put it in a + scale with an average of 1 or 100. In that sense, we don’t have to deal as much with the fact that each of the values are in a radically different scale. I would prefer also to see stats that don’t auto include average into this. ISO works as a SLG replacement, but I can’t think of anything for OBP. I don’t want to use BB%, because it doesn’t illustrate everything about OBP.

by SFiercex4 on Sep 12, 2009 8:28 PM EDT
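
To make the "+ scale" idea concrete, here is a small sketch that indexes each rate stat to a league average of 100 and computes ISO as SLG minus AVG. The player and league lines are invented, and the simple ratio in `plus_scale` is an assumption for illustration, not the formula behind any published "plus" stat.

```python
def plus_scale(stat, league_avg):
    """Index a rate stat so that league average equals 100."""
    return 100 * stat / league_avg

def iso(slg, avg):
    """Isolated power: slugging with batting average removed."""
    return slg - avg

# Hypothetical player and league lines, for illustration only.
player = {"AVG": 0.300, "OBP": 0.400, "SLG": 0.500}
league = {"AVG": 0.250, "OBP": 0.320, "SLG": 0.400}

for name in player:
    print(name, round(plus_scale(player[name], league[name])))
print("ISO", round(iso(player["SLG"], player["AVG"]), 3))
```

Once every axis is on the same index, the values can share a single chart without one metric's raw scale dominating the others.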

Somewhat related, here's a link to a blog of a teacher that is in love with visual media.

http://blog.mrmeyer.com/?cat=47

This guy loves the Feltron.

I used to work with an old man that told me. Son, every workplace has a dumbass, if you don't have one where you work, then I'm afraid you're it.

by Warden11 on Sep 12, 2009 2:02 AM EDT

Great Chart

This wonderful visualization is simple, only takes a minute to figure out, and conveys such information.

Jon Peltier

by JonPeltier on Sep 12, 2009 8:43 AM EDT

Pena

has he really not hit a ball straight away this year? thats kinda amazing.

by jamiethekiller on Sep 14, 2009 10:46 AM EDT
