Beyond the Box Score: An SB Nation Community


Daily Box Score 9/11: Visualizations


I don't know about you all, but I've been enjoying the phoenix-like rebirth of our regular feature Graph of the Day. Much credit is owed to Justin Bopp and Walter Fulbright, who clearly know their layers from their masks. So I thought this might be a good time to talk about the elements of a good visualization.


Table of Contents

Data
Presentation
Finished Product
Discussion Question of the Day

 

Data

The single most important function of any visualization is to convey data. Different bits of data are best visualized in different ways.

The preeminent authority on the presentation of data in visual form is Edward Tufte, and his most acclaimed work is The Visual Display of Quantitative Information. In it, Tufte gives examples of what does and does not make for an effective visualization.

One of Tufte's most essential points is that the manner of visualization is dictated by the data, not the other way around. Just because Microsoft Excel makes it very easy to turn just about any data set into any type of chart doesn't mean it's a good idea. For example, when the sum of the data is important, a pie chart often fails, because it gives no visual indication of the magnitude of the sum. It is a representation of relative size, not absolute magnitude.
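To make the pie chart point concrete: a pie's wedges depend only on each value's share of the total, so multiplying every value by the same factor produces an identical pie. A minimal Python sketch with made-up numbers:

```python
def pie_shares(values):
    """Return each value's fraction of the total: all that a pie chart's wedges encode."""
    total = sum(values)
    if total == 0:
        raise ValueError("a pie chart needs a nonzero total")
    return [v / total for v in values]


# Two made-up data sets; the second is ten times larger in every category.
small = [2, 3, 5]
large = [20, 30, 50]

print(pie_shares(small))  # [0.2, 0.3, 0.5]
print(pie_shares(large))  # [0.2, 0.3, 0.5]
print(sum(small), sum(large))  # 10 100
```

The two pies would be indistinguishable even though one total is ten times the other; a bar chart of the raw values would show the difference immediately.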

A related point is that good visualizations are dense. That doesn't mean they are hard to unravel, but rather that they contain a great deal of information per square inch (or pixel). Tufte's classic example is this visualization, created by Charles Joseph Minard in 1869 to depict Napoleon's 1812 campaign in Russia. (For a sarcastic cautionary tale, see Tommy Rancel's incredible pie chart here.)

Now, it might take you a minute to untangle that, especially if you don't speak French (zut alors!). But what you're seeing are several visualizations at once: (1) a map of Napoleon's progress toward, and retreat from, Moscow (note the rivers), (2) a representation of the size of his armies (and their offshoots, with width indicating size), and most incredibly, (3) a representation of the temperature at each stage of the march (see the line chart at the bottom, whose markers correspond to points on the map above; higher is colder).

This is what we would call a dense visualization. Did I mention he drew it in 1869? Think about that the next time you create a chart in Excel...

Presentation

Presentation is an important part of any visualization: the better a chart looks, the more willing readers are to spend time deciphering it. I can claim no expertise, but there are a few basic guidelines for clean presentation of visual information.

First, colors should be easy to distinguish from one another (even for readers with color-blindness). But more than that, it is important to remember that color is a potential axis along which information can be conveyed. Color coding is a good way to visualize one additional variable, thus increasing the density of the information presented.
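As one way to act on both halves of that advice, here is a small Python sketch that color-codes a categorical variable using the Okabe-Ito palette, a set of eight colors designed to stay distinguishable for the most common forms of color-blindness. The palette choice and the pitch-type example are my assumptions, not anything used in the charts discussed here.

```python
# Okabe-Ito palette: eight colors chosen to remain distinguishable
# for common forms of color-blindness.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def color_code(categories):
    """Map each distinct category, in first-seen order, to one palette color."""
    mapping = {}
    for cat in categories:
        if cat not in mapping:
            if len(mapping) == len(OKABE_ITO):
                raise ValueError("more categories than distinguishable colors")
            mapping[cat] = OKABE_ITO[len(mapping)]
    return mapping


# Hypothetical example: color each plotted pitch by pitch type.
pitches = ["FF", "SL", "FF", "CH", "SL"]
colors = color_code(pitches)
print(colors)  # {'FF': '#E69F00', 'SL': '#56B4E9', 'CH': '#009E73'}
```

Capping the number of categories at the palette size is deliberate: once you run out of clearly distinct colors, color stops being a readable axis.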

Second, a good chart is clearly labeled. Notice the plethora of labels included on Minard's chart above. There are dozens of labels, and they are all as close, spatially speaking, to the item they label as possible. The more complex a visualization gets, the more difficult it can be to follow lines connecting labels to data points or series. One of the things I appreciate about Justin and Walter's charts is the way they superimpose images of players onto the bars representing their performance (for an example, see here). It's an easy way to create a label that is tied very closely to the data.

Finished Product

Part of the inspiration for me venturing down this road was the work of Dave Allen, who writes for FanGraphs and Baseball Analysts. He's a programming-savvy guy, and he is probably best known for using the language R to create compelling heat maps based on PITCHf/x data. You can see a characteristic example of his work here. In the vast world of PITCHf/x graphs, Allen's are among the easiest to decipher, even for the lay fan. For that reason alone, his work is praiseworthy.

But I was especially impressed with this chart, taken from this article. Allow me to reproduce it:

[Chart: horizontal pitch location vs. horizontal angle of Carlos Pena's home runs]

via baseballanalysts.com

Allen explains:

Most batters have more power on pulled balls and pull more inside pitches. So is Pena's outside power from opposite field power or from an ability to pull outside pitches[?] To examine this I took inspiration from Max's work looking at the relationship between the horizontal location of a pitch and the horizontal angle of the resulting ball in play. In this case I just looked at Pena's HRs. Remember that -45 is the third base line and 45 is the first base line.

Here's what's great about this visualization: it's dense, it's high-contrast, and it conveys information in a way that could not be done without a graph. This last part is particularly important. There would be no other way to understand the central point (that Carlos Pena pulls pitches on the outside half of the plate to right field) without this kind of graph. A table couldn't; a pie chart couldn't; a bar graph couldn't; an x-y plot couldn't. 
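The bookkeeping behind a chart like Allen's can be sketched in a few lines. Using the angle convention from his quote (-45 is the third base line, +45 the first base line), the Python sketch below labels each home run as pulled, straightaway, or to the opposite field, and tallies it against the side of the plate the pitch crossed. The 15-degree center band, the input format, and the sample values are assumptions for illustration; this is not Allen's code.

```python
from collections import Counter

CENTER_BAND = 15.0  # assumed half-width, in degrees, of "straightaway"


def spray_direction(angle_deg, bats):
    """Classify a batted ball's horizontal angle for a batter who bats 'L' or 'R'.

    Convention: -45 is the third base line, +45 the first base line.
    """
    if not -45.0 <= angle_deg <= 45.0:
        return "foul"
    if abs(angle_deg) <= CENTER_BAND:
        return "center"
    # Right-handed hitters pull toward third (negative angles);
    # left-handed hitters pull toward first (positive angles).
    pulled = angle_deg < 0 if bats == "R" else angle_deg > 0
    return "pull" if pulled else "opposite"


def tally(home_runs, bats):
    """Count home runs by (pitch side, spray direction).

    home_runs: iterable of (pitch_side, angle_deg) pairs, where pitch_side is
    'inside' or 'outside' relative to the batter (hypothetical input format).
    """
    return Counter((side, spray_direction(angle, bats)) for side, angle in home_runs)


# Made-up home runs for a left-handed hitter.
sample = [("outside", 30.0), ("outside", 38.0), ("inside", 41.0), ("outside", -20.0)]
counts = tally(sample, "L")
print(counts[("outside", "pull")])  # 2
```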

The upshot is to reinforce the points made above, which Allen clearly understands. It also goes to show that knowledge of programming can be extremely useful in visualizing data.

Discussion Question of the Day

When I showed this graph to a friend, he said he wished such a chart existed for every player. And I agree.

But it led me to another question. I have been thinking about how one might visualize AVG, OBP and SLG on a single chart to display the shape of a player's performance. 

What data series do you think would go together nicely on a graph? 
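Whatever shape the chart takes, the three inputs are standard. A minimal Python sketch computing them from a season's counting stats with the usual definitions (the batting line is invented for illustration):

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    def avg(self):
        return self.h / self.ab

    def obp(self):
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self):
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab


# Hypothetical season line, not any real player's.
line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=25, bb=60, hbp=5, sf=5)
print(round(line.avg(), 3), round(line.obp(), 3), round(line.slg(), 3))  # 0.3 0.377 0.518
```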

Comments


Nice write-up, Tommy. (and thanks for the shout-out!)

Good points, but one cannot overemphasize the importance of the actual information conveyed. I’ve always held that a chart/graph functions as a visual summary of what the research/article/hypothesis is attempting to conclude; the reader should be able to see the chart and instantly know the point the author is attempting to make.

That’s the problem with so many charts in sabermetric research: the point of them is to make the conveyed message easier to understand, but so often the included visual is barely three lines stretched horizontally over a gray surface.

I loved the section about Dave Allen. The ability to convey data visually in unconventional means is amazing.

"What we do in life, echoes in eternity!"

by Justin Bopp on Sep 11, 2009 5:43 PM EDT

One interesting pie chart would be

A breakdown of skills into their percentage of an overall value stat, such as wOBA by BB, 1B, 2B, 3B, HR, SB, etc., or WAR by its major components. One relevant use would be in an article like FanGraphs’ comparison of Adam Dunn and Nyjer Morgan back at the time of Morgan’s trade to Washington. The author argued that the two may have similar overall values from vastly different skill sets (OBP/SLG vs. defense/position). That seems like a perfect time for a nice pie chart. Mmmm, pie chart.

by Slugger O'Toole on Sep 11, 2009 8:04 PM EDT
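The wOBA version of this idea is easy to compute, because wOBA is a weighted sum of events divided by (roughly) plate appearances: each event type's slice is its weight times its count, as a share of the weighted total. A Python sketch follows. The weights approximate commonly cited values and should be treated as assumptions (they are re-derived for each run environment), stolen bases are left out, and the player is hypothetical.

```python
# Approximate wOBA linear weights; treat these exact numbers as assumptions.
WEIGHTS = {"uBB": 0.72, "HBP": 0.75, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def woba(counts, pa):
    """Weighted sum of events per plate appearance (stolen bases omitted)."""
    return sum(w * counts.get(event, 0) for event, w in WEIGHTS.items()) / pa


def woba_shares(counts):
    """Each event type's share of the weighted total: the pie chart's wedges."""
    contributions = {event: w * counts.get(event, 0) for event, w in WEIGHTS.items()}
    total = sum(contributions.values())
    return {event: c / total for event, c in contributions.items()}


# Hypothetical walk-and-power hitter.
player = {"uBB": 100, "HBP": 5, "1B": 80, "2B": 25, "3B": 1, "HR": 40}
print(f"wOBA: {woba(player, pa=650):.3f}")
for event, share in woba_shares(player).items():
    print(f"{event}: {share:.1%}")
```

Because the shares divide out the denominator, the wedges come out the same whichever plate-appearance definition you use; only the wOBA figure itself depends on it.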

I LOVE THIS IDEA.

I used to work with an old man who told me, "Son, every workplace has a dumbass. If you don't have one where you work, then I'm afraid you're it."

by Warden11 on Sep 12, 2009 2:00 AM EDT

BPro does something like this on their player pages.

They do a column graph with five columns, measuring production (offense only, I believe) in terms of K, BB, ISO, AVG, and, uh, something else, I think. Love the idea.

I think I prefer the column graph to a pie chart, because you can not only see the relative strengths of a player, but also compare them to league averages.

by Sky Kalkman on Sep 13, 2009 9:46 PM EDT

An example:

Pujols at BPro

This presents an issue, though, in that the ability to run the bases isn’t as valuable as the ability to hit for power. Do you scale things in runs or scale things in percentiles?

by Sky Kalkman on Sep 14, 2009 3:17 PM EDT
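Sky's runs-versus-percentiles question is easy to see with numbers. In the Python sketch below (all values invented), a player is +4 runs on the bases and +12 runs at the plate. Ranked as percentiles, the baserunning looks like his bigger strength; measured in runs, the bat is worth three times as much.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Percent of the population below `value`, counting ties as half."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Invented league distributions, in runs above average.
baserunning_runs = [-3, -1, 0, 1, 2, 4]
batting_runs = [-10, -2, 5, 12, 20, 30]

player_br, player_bat = 4, 12
print(round(percentile_rank(player_br, baserunning_runs), 1))  # 91.7
print(round(percentile_rank(player_bat, batting_runs), 1))  # 58.3
print(player_bat / player_br)  # 3.0
```

Percentiles answer "how rare is this skill?"; runs answer "how much does it matter?" A chart that scales bars by percentile will always make narrow-spread skills like baserunning look bigger than they are worth.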

Right

And these are decent enough, but caveats apply. First, they are projections and therefore do not give an accurate picture of actual performance. Second, as you note, they do not weight the “tools.” Looking at several of these side by side (which is hard enough to do) doesn’t give much comparative value. If you could do a series of small, thumbnail-sized images, say a triangle (or really any polygon) with the length of each side proportional to performance on a given metric, it would be much more useful, I think.

My inspiration for this idea was the player skill graphic from Winning Eleven 7, if anyone knows what I’m talking about. Can’t find a screenshot at the moment.

by Tommy Bennett on Sep 14, 2009 5:31 PM EDT
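One catch with the triangle idea is geometric: three lengths only form a triangle if each is shorter than the other two combined, so a player who is extreme in one metric may have no triangle to draw. The Python sketch below places a triangle from its side lengths with the law of cosines and rejects lengths that fail that test. How each metric gets scaled to a length is left open; the example values are arbitrary.

```python
import math


def triangle_vertices(a, b, c):
    """Place a triangle with side lengths a, b, c in the plane.

    Side c runs from P0 = (0, 0) to P1 = (c, 0); sides b (P0-P2) and a (P1-P2)
    meet at P2, whose x-coordinate follows from the law of cosines.
    """
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError(f"lengths {a}, {b}, {c} violate the triangle inequality")
    x = (b * b + c * c - a * a) / (2 * c)
    y = math.sqrt(max(b * b - x * x, 0.0))  # guard against tiny rounding errors
    return (0.0, 0.0), (float(c), 0.0), (x, y)


print(triangle_vertices(3, 4, 5))  # roughly (0, 0), (5, 0), (3.2, 2.4)

try:
    triangle_vertices(1, 1, 5)  # one dominant metric: no triangle exists
except ValueError as err:
    print(err)
```

This is one reason radar-style charts put each metric on its own spoke instead of on a side: any set of spoke lengths yields a valid polygon.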

Stacked Bars??

Doing a stacked bar of wOBA so that you can put 5 or so guys of similar wOBA on the same chart would be fairly nifty.

by stevesommer05 on Sep 15, 2009 4:33 PM EDT
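A stacked bar is just a running total: each component starts where the previous one ends, so the full bar's length is the player's wOBA. A Python sketch that turns per-PA contributions into bar segments; the component groupings and numbers are hypothetical.

```python
def stacked_segments(components):
    """Turn ordered (label, value) pairs into (label, start, end) bar segments."""
    segments = []
    start = 0.0
    for label, value in components:
        segments.append((label, start, start + value))
        start += value
    return segments


# Hypothetical per-PA wOBA contributions for two hitters with the same total.
hitters = {
    "Hitter A": [("BB/HBP", 0.090), ("1B", 0.100), ("2B/3B", 0.050), ("HR", 0.110)],
    "Hitter B": [("BB/HBP", 0.050), ("1B", 0.170), ("2B/3B", 0.090), ("HR", 0.040)],
}

for name, parts in hitters.items():
    for label, start, end in stacked_segments(parts):
        print(f"{name} {label:>7}: {start:.3f} to {end:.3f}")
```

Keeping the component order identical for every hitter is what makes the bars comparable at a glance.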

That's sweet.

I really like the wOBA by component idea, too.

by Sky Kalkman on Sep 15, 2009 10:31 PM EDT

If we were just going to do one player at a time, and not go stacked bars...

I think it would help to have the average rates notated somehow, and maybe the 80th percentile or something like that. So while the 1B piece might always be larger than the HBP piece, you could better see that Player X gets more value from HBP than usual and less from 1B.

by Sky Kalkman on Sep 15, 2009 10:32 PM EDT
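Those reference marks need two numbers per component: the league average and a high benchmark such as the 80th percentile. Python's standard statistics module covers both; the sketch assumes you already have a list of league rates for one component (the values below are made up).

```python
import statistics


def reference_marks(league_rates):
    """Return (mean, 80th percentile) of one component's league rates."""
    # quantiles(n=10) returns the 9 decile cut points; index 7 is the 80th percentile.
    deciles = statistics.quantiles(league_rates, n=10)
    return statistics.mean(league_rates), deciles[7]


# Made-up HBP-per-PA rates for a handful of regulars.
hbp_rates = [0.002, 0.004, 0.005, 0.006, 0.008, 0.009, 0.011, 0.013, 0.016, 0.025]
avg_mark, p80_mark = reference_marks(hbp_rates)

player_rate = 0.020  # hypothetical Player X
print(f"league avg {avg_mark:.4f}, 80th pct {p80_mark:.4f}, Player X {player_rate:.4f}")
```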

An idea for a graph

This wouldn’t create any new data, but it would be a handy-dandy visual reference to see how a player performs at a glance:

I’m not even sure what they’re called, but imagine a standard four-quadrant graph with each axis representing a different metric: one for AVG, one for OBP, one for SLG, and one for, say, UZR. You’d end up with an irregular diamond shape that depends on the player’s skills. These sorts of graphs get used a lot in video games to show a character’s statistics visually, and I think they would really work for baseball, with some tweaking. You’d need to find a way to illustrate negative UZR, such as making a 0 rating actually stand off the axis somewhat. I apologize for being such a layman, but hopefully you can get the gist of what I’m talking about.

by RubioNubio on Sep 11, 2009 8:18 PM EDT
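The chart RubioNubio describes is usually called a radar (or spider) chart. Its outline is simple to compute: space the axes evenly around a circle and place each metric's point at some distance from the center. The Python sketch below assumes every metric has already been rescaled to a common range, and uses a hypothetical `offset` parameter to push the zero point off the center, one way to leave room for negative values like a below-average UZR.

```python
import math


def radar_vertices(values, offset=0.25):
    """Vertices of a radar-chart polygon, one axis per value.

    values: metrics already rescaled to a common range (an assumption).
    offset: hypothetical baseline radius, so that a value of 0 sits off the
            center and moderately negative values still have room.
    The first axis points straight up; the rest follow clockwise.
    """
    n = len(values)
    vertices = []
    for k, value in enumerate(values):
        radius = max(offset + value, 0.0)  # a radius cannot go below zero
        theta = math.pi / 2 - 2 * math.pi * k / n
        vertices.append((radius * math.cos(theta), radius * math.sin(theta)))
    return vertices


# Hypothetical rescaled AVG, OBP, SLG, UZR for one player (UZR below average).
for x, y in radar_vertices([0.5, 0.7, 0.9, -0.1]):
    print(f"({x:+.2f}, {y:+.2f})")
```

With four axes the polygon is the irregular diamond described above; any negative value smaller in magnitude than `offset` still plots as a visible point.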

I was thinking of that graph when Tommy asked

I cannot recall what it’s named though, you’re right.

I don’t think UZR would work in this case, as it isn’t in the same units as the rest of the measures. For the other three, the least you would have to do is put them on a + scale with an average of 1 or 100. That way, we don’t have to deal as much with the fact that each of the values is on a radically different scale. I would also prefer to see stats that don’t automatically include batting average. ISO works as a SLG replacement, but I can’t think of anything for OBP. I don’t want to use BB%, because it doesn’t capture everything about OBP.

by SFiercex4 on Sep 12, 2009 8:28 PM EDT
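The + scale is a simple rescaling: divide each rate by the league rate and multiply by 100, so average is 100 regardless of the stat's natural units. A minimal Python sketch, with no park or other adjustments, and with ISO (SLG minus AVG) as the power measure suggested above; all rates are hypothetical.

```python
def plus_index(player_rate, league_rate):
    """Express a rate relative to league average, with average = 100."""
    return 100.0 * player_rate / league_rate


def iso(slg, avg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


# Hypothetical player and league rates.
player = {"AVG": 0.280, "OBP": 0.370, "ISO": iso(0.530, 0.280)}
league = {"AVG": 0.265, "OBP": 0.335, "ISO": iso(0.420, 0.265)}

indexed = {stat: round(plus_index(player[stat], league[stat])) for stat in player}
print(indexed)  # {'AVG': 106, 'OBP': 110, 'ISO': 161}
```

Once indexed this way, the three numbers can share a single axis on a radar or column chart.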

Somewhat related, here's a link to the blog of a teacher who is in love with visual media.

http://blog.mrmeyer.com/?cat=47

This guy loves the Feltron.

by Warden11 on Sep 12, 2009 2:02 AM EDT

Great Chart

This wonderful visualization is simple, only takes a minute to figure out, and conveys so much information.

Jon Peltier

by JonPeltier on Sep 12, 2009 8:43 AM EDT

Pena

Has he really not hit a ball straightaway this year? That's kind of amazing.

by jamiethekiller on Sep 14, 2009 10:46 AM EDT
