Rules of Thumb for Visualization
If you're going to make a visualization of any data (baseball or otherwise), here are a few recommendations to make it more effective.
#1 Know Your Story
Every graph should tell a story. If it doesn't, then why are you making it? What story are you trying to tell? What do you want the reader to get out of it? If you aren't clear about what the reader should take away from the graph, it's usually time to think about what you're trying to express before you try to graph it. So for this example, I am going to compare the three all-time career home run leaders.
#2 Pick an Appropriate Graph Type
There are three main graph types:
- Bar Graphs
- Line Graphs
- Pie Charts
Bar Graphs are a good default. 98% of the time, if you're comparing things, a bar graph is a good place to start. When you use bar graphs, there is one major rule that you have to follow: always start the main axis at 0. We judge the categories by the height of the bar, so if you don't start from 0, it warps the data. Here's an example:
Wow! Bonds DESTROYED Ruth's record, didn't he? Only he didn't. Here's the same graph with the axis starting at 0:
Tells quite a different story, doesn't it?
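To put a number on that distortion, here is a minimal Python sketch. The career totals (762, 755, and 714) are the commonly cited figures rather than values given in this post, and the 700 baseline is only an assumption about where a truncated axis might start.

```python
# Commonly cited career home run totals (not taken from the post itself).
CAREER_HR = {"Barry Bonds": 762, "Hank Aaron": 755, "Babe Ruth": 714}


def apparent_ratio(value_a, value_b, baseline=0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if min(value_a, value_b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (value_a - baseline) / (value_b - baseline)


bonds, ruth = CAREER_HR["Barry Bonds"], CAREER_HR["Babe Ruth"]
honest_ratio = apparent_ratio(bonds, ruth)                   # axis starts at 0
truncated_ratio = apparent_ratio(bonds, ruth, baseline=700)  # assumed truncated axis

print(f"Axis from 0:   Bonds' bar is {honest_ratio:.2f}x as tall as Ruth's")
print(f"Axis from 700: Bonds' bar is {truncated_ratio:.2f}x as tall as Ruth's")
```

With the axis at 0, the two bars differ by about 7%. Start the axis at 700 and Bonds' bar is drawn more than four times as tall as Ruth's, even though the data hasn't changed.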
One of the times you don't want to use bar graphs is when you're talking about something that happened over time. For instance, if you are going to look at how many home runs each of those players had by age, a line graph is a better bet:
If I do it with a bar graph, it gets messy, and harder to read:
Pie charts are the oddball of the group. Humans don't judge areas of circles too well, so pie charts are really only good if you're dealing with 2-3 categories, so that it's easy to eyeball. And the percentages always have to add up to 100%.
For instance, if we want to look at how often a player strikes out, walks, or hits the ball, a pie graph is a mighty fine choice:
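As a rough sketch of the arithmetic behind a pie like this, the Python below turns raw counts into shares of the whole and checks the 2-3 category rule of thumb. The counts and the helper names `pie_shares` and `suits_pie` are hypothetical, made up for illustration.

```python
def pie_shares(counts):
    """Convert raw counts into percentages of the whole, rounded to 0.1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def suits_pie(counts, max_slices=3):
    """Rule of thumb from the text: keep a pie to 2-3 categories."""
    return 2 <= len(counts) <= max_slices


# Hypothetical plate-appearance outcomes for one season (illustrative only).
outcomes = {"Strikeouts": 120, "Walks": 180, "Ball in play": 300}
shares = pie_shares(outcomes)

print(shares)               # the slices, as percentages of all plate appearances
print(suits_pie(outcomes))  # three categories, so a pie is still readable
```

Because every slice is a share of the same total, the percentages add up to 100 by construction. That is also why the "everything else" slice has to be in the pie even when it isn't part of the story.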
#3 Don't Overdo It
Sometimes you have a lot of data, and you get tempted to try to put it all on one graph. Let's say you want to show how many home runs each player got along with how many plate appearances he had each season. You could throw them on the same graph like this:
Now the problem is that the message gets muddied. What exactly are you trying to say with that added data? If you want to show the difference in PA/HR by age, then maybe it's a better bet to make a second graph that focuses just on that:
More graphs aren't necessarily a bad thing if they help you get your message across.
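If the second graph is going to show PA/HR, it helps to compute that single number first rather than plotting both raw series. Here is a small Python sketch, assuming the data is a mapping from age to (plate appearances, home runs). The numbers and the `pa_per_hr` helper are hypothetical.

```python
def pa_per_hr(seasons):
    """Map age -> plate appearances per home run, skipping homerless seasons."""
    return {
        age: round(pa / hr, 1)
        for age, (pa, hr) in sorted(seasons.items())
        if hr > 0  # avoid dividing by zero for a season without a home run
    }


# Hypothetical (plate appearances, home runs) by age; not real player data.
seasons = {25: (600, 30), 26: (650, 26), 27: (500, 0), 28: (620, 40)}
rates = pa_per_hr(seasons)

for age, rate in rates.items():
    print(f"age {age}: one home run every {rate} PA")
```

One derived series per graph keeps the message clear: the reader sees the rate directly instead of having to divide two sets of bars in their head.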
#4 Focus the Message
When you create a chart in Excel, the default settings really make it ugly. Look at the first graph I created using the Excel defaults:
Here are some of the problems:
- The background is too dark
- The gridlines are too strong
- The axis is messed up (it doesn't start from zero!)
- The title and legend duplicate information
- The player names are tiny
- The bar graphs all have shadows for some reason
What is the story of this graph? It shows how many home runs the three home run leaders have. So we want to focus on those three blue bars, who they belong to, and what they mean. So I eliminated the background of the graph, fixed the axis, lightened the grid lines, gave it a proper title, and made the names of each player more visible:
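The same cleanup can be sketched outside Excel. The Python below is only an illustration, not how the graphs in this post were made: it builds a bare-bones SVG bar chart with no background, faint gridlines, a zero-based axis, a single bar color, and readable labels. The `bar_chart_svg` function, its sizes, and its colors are all assumptions, and the home run totals are the commonly cited career figures.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(title, values, width=480, height=300, bar_color="#1f77b4"):
    """Return a minimal SVG bar chart: zero-based axis, no background,
    faint gridlines, one bar color, and readable labels."""
    left, right, top, bottom = 50, 20, 40, 40
    plot_w, plot_h = width - left - right, height - top - bottom
    axis_max = max(values.values()) * 1.1  # headroom above the tallest bar
    base_y = top + plot_h                  # pixel row where the value 0 sits

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="{width / 2}" y="24" text-anchor="middle" font-size="16">{escape(title)}</text>',
    ]
    # Light gridlines at quarter steps, labeled from 0 upward.
    for step in range(5):
        value = axis_max * step / 4
        y = base_y - plot_h * step / 4
        parts.append(f'<line x1="{left}" y1="{y:.1f}" x2="{width - right}" y2="{y:.1f}" stroke="#dddddd"/>')
        parts.append(f'<text x="{left - 6}" y="{y + 4:.1f}" text-anchor="end" font-size="11">{value:.0f}</text>')
    # Bars grow upward from the zero line, so height is proportional to value.
    slot = plot_w / len(values)
    for i, (name, value) in enumerate(values.items()):
        bar_h = plot_h * value / axis_max
        x = left + i * slot + slot * 0.2
        parts.append(f'<rect x="{x:.1f}" y="{base_y - bar_h:.1f}" width="{slot * 0.6:.1f}" height="{bar_h:.1f}" fill="{bar_color}"/>')
        parts.append(f'<text x="{x + slot * 0.3:.1f}" y="{base_y + 18}" text-anchor="middle" font-size="14">{escape(name)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Commonly cited career totals (not taken from the post itself).
svg = bar_chart_svg("Career Home Runs", {"Bonds": 762, "Aaron": 755, "Ruth": 714})
```

Writing `svg` to a file ending in `.svg` gives something a browser or a vector editor such as Inkscape can open. Everything the sketch leaves out (shadows, dark fill, legend) is left out on purpose.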
#5 Add Color (where appropriate)
If you'll notice, most of my graphs use the same color. When we look at a graph and see a lot of color, we assume that the color means something. So color is another tool to help us tell the story. If I made the three bars in the career home run graphs different colors, our brain would tell us "Hey, the bars must be a different color for a reason!" and it would spend a little time trying to figure out what the color is telling us.
When there's a reason to add color (for instance, in the pie graph) it's a good idea to be consistent and to make sure that the color adds to the story, rather than distracting from it. I used red for strikeouts because they are "bad", blue for walks because they are "good", and grey for the rest because they are neutral (and not part of the story, but need to be there to make sure the pies add up to 100%).
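One way to keep color choices deliberate is to make "neutral" the default and require a reason for anything else. The Python below is a tiny sketch of that idea. The `assign_colors` helper and the "Ball in play" label are hypothetical, while the red/blue/grey choices follow the pie graph described above.

```python
NEUTRAL = "grey"


def assign_colors(categories, meaningful=None):
    """Give every category the neutral color unless a color carries meaning."""
    meaningful = meaningful or {}
    return {name: meaningful.get(name, NEUTRAL) for name in categories}


# Only the slices that are part of the story get their own color.
pie_colors = assign_colors(
    ["Strikeouts", "Walks", "Ball in play"],
    meaningful={"Strikeouts": "red", "Walks": "blue"},
)

# With no meaning to express, every bar ends up the same color.
bar_colors = assign_colors(["Bonds", "Aaron", "Ruth"])

print(pie_colors)
print(bar_colors)
```

Reusing the same mapping across every graph in an article also gives you the consistency: strikeouts stay red wherever they show up.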
#6 Resist the Urge to Pretty-up the Graph with Chart Junk
A lot of the graphs we see on a daily basis are prettied up. People add 3D effects, drop shadows, gradients, etc. But those things don't typically add to the story we're trying to tell. They just "look cool" so people want to use them. Adding chart junk to your graphs is like putting a spoiler on a Honda Civic -- maybe it looks like it should go faster, but it's really just weighing the car down. For instance, let's say I get the urge to pretty-up my pie graphs. What was pretty obvious before suddenly becomes a lot less obvious:
It's a lot harder to compare the slices in 3D, because 3D graphs distort data. We aren't so good at adjusting for perspective in our heads, so we lose a lot of our ability to analyze the data. And the gradients add absolutely nothing but headaches, since we lose the center of the grey slice, making it even harder to figure out the angle (and with the angle, the size of the slices).
These are just the basics, with a really simple example. The above suggestions are just that -- suggestions. The most important thing is to keep the message at the center of the graph. Find what works best with your audience, and displays the data best, and you'll do fine.
All of the above examples were created using Microsoft Excel. Graphs from Excel can be copy-pasted into a vector graphics editor like Adobe Illustrator or Inkscape for better control over how they look.
I'm an expat living in Japan since 2003, doing sales and marketing work. More of my work is available on Henkakyuu, my personal blog. Also feel free to inspire me to use twitter more often @henkakyuu
Comments
Good stuff, jmaciel.
I think the number one thing for me is to keep it reeeeeeeeeal simple.
And for the love of god never use Trebuchet.
On Twitter: @baseballtwit
Totally Agreed on Keeping it Simple
I would add Comic Sans to the list of fonts never ever to use.
And on keeping it simple, I think knowing your message and looking at your own work objectively will give you a good shot at keeping it simple. The more data there is, the harder you have to think about what to show, but in the end it’s worth it when you get something simple and understandable.
My Work: Henkakyuu
This is exactly what I was thinking.
Very good.
See Data Differently: Beyond the Box Score | @justinbopp
Create. Coach. Conquer! Two Out Rally, Baseball MMORPG | Facebook
If you like it
I may do a follow-up about more complicated graphs (x-y scatter plots, plotting 3-4 vectors on a single graph, using color and size to express data, etc.).
My Work: Henkakyuu
Yes.
See Data Differently: Beyond the Box Score | @justinbopp
This is a great place to start
I was actually talking with Dave Gershman the other day via Twitter about doing a graphics primer series. I love that you started with the absolute basics. Some people try to jump into the crazy complex stuff without having a solid foundation and it gets really messy.
Let me know if you need any help with this / would like a contributor to this “series.”
Feel free to contribute!
I start with the basics because those are the things that I had the most trouble with when I was learning. I went for flashy over informative, and complex over simple, and had trouble getting anyone to actually use my graphics. A total rethink from the ground up got me most of the way to where I am.
I think that the best thing to do would be to troll Fangraphs or THT for some data sets that are often used in articles, and do quick tutorials on possible ways to visualize them. Create a sort of “standard” with Excel tutorials to show how to make them quickly and easily with data downloaded from Fangraphs and/or Baseball Reference.
The hard part is figuring out which sorts of data are used but not graphed often enough.
My Work: Henkakyuu
Self Promotion
I actually just wrote an R graphics basics tutorial on AN. It uses a couple of datasets: one from FG, one from BBRef.
Linked here. Since I posted it on A’s blog, it’s A’s related.
"Loyal? I'm the most loyal player money can buy." - Don Sutton
I love this
Excellent points about viz basics.
One thing I struggle with is the decision of whether or not to truncate the x or y axis at times. In some cases doing so does exaggerate the size of an effect. But if you are dealing with a population of numbers where there is a hard floor at some point above zero I think it is okay, especially if extending the axis to zero would make it harder to observe the effect. If you are upfront about the size of the effect I think truncating the axis can be done. But I agree it can lead to trouble if not done properly.
Truncation is fine
So long as you are up-front with it, and preferably if you actually show the truncation in the graph:

For line graphs it’s not much of an issue, because you’re focusing horizontally (rather than vertically) and so need to check the axis. But if you’re going to do a bar graph, better to give the visual hint that they aren’t starting at zero.
My Work: Henkakyuu
I've never done that and
now I will. Freaking beautiful!
See Data Differently: Beyond the Box Score | @justinbopp
Good work, Josh
Although I don’t totally agree with your position on pie charts. When trying to convey exact values, pie charts are suboptimal. When trying to convey relative differences, or the magnitude of relative differences, pie charts are great however—even beyond 2 or 3 categories.
I used these two pie charts in a post on my own site a few weeks ago:


Now, if I were trying to convey how many first place votes any one of these teams had received, this approach would be undesirable. However, in trying to compare the unipolar configuration of power in one league with the multipolar configuration of power in another, I feel this approach was ideal (decorative touches aside).
As an aside, I think this would be a great BtB series, and I’d love to be involved. I could submit a piece on designing charts for the color deficient.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
I like that one on men's hoops.
Posted it at RCT.
Glad I came, just wish I hadn't stayed so long.
People ask me what I do in winter when there’s no baseball...Rock Chalk Talk
Thanks, Warden
Keep in mind I wrote this preceding the UConn Women’s historic 89th consecutive win, so the data set stops with that week’s rankings.
Here’s the original post, if you’re interested: http://www.rationalpastime.com/2010/12/dominance-in-numbers-and-pictures-uconn.html
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Not a fan
I’m also not a fan of pie charts. I agree with your general point—that they’re good for relative proportions—but side-by-side circles make it harder to discern relative differences (particularly when extra visuals are thrown in). In that case, I’d use side-by-side bar graphs, set to 100%.
Also, remember the principle of salience. Salience is why you “explode” a piece of a pie, to draw attention to it. In your first example, why explode Tennessee and Other? You’re drawing attention to them. Why?
In the other pie, you explode all the pieces! It calls out nothing in particular, and is only useful because the graphics you’ve thrown into the pie make it harder to discern the different pieces in the first place.
I always end up reading the labels
Pie charts are great for crude comparison of relative sizes, totally agreed. And very poor for exact values. But to even get a good chance at understanding the relative values, the most important part of the graph is the angle in the center (by comparing different angles to see which is “bigger” we have a decent chance of guessing which slice is bigger than which). So I would recommend not adding the 3D effect and shadow, and not shifting the pieces apart so that it’s harder to judge angles by comparison. I would also make sure that the Duke Logo doesn’t slice off the inside corner of that slice.
99.9% of the time, a bar graph would work just fine for the same data, and give a better understanding of it with an easier comparison. Pie charts are great because everyone understands them easily, but they’re not so great because they are tougher to make comparisons in than other graph types. Here is a more comprehensive article on the trials and tribulations of pie charts:
http://eagereyes.org/techniques/pie-charts
My Work: Henkakyuu
Except that bar graphs are not ratios
And pie charts are. It doesn’t make sense to use a method designed to present exact values when what you’re trying to convey is ratios.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
The point isn't exact values
The point is that we are better at judging comparative size with bars than we are with pie graphs. Though a bar graph may not add up to 100% (which is the sole super-benefit of pie graphs), good labels/title should be able to handle that issue I’d hope.
Here’s an example (made in Illustrator for simplicity’s sake):

I’m going to make a bold statement, forgive me if it sounds TOO bold, but anyone who says that they can tell that answers C+D = B by size more easily on the pie chart than on the bar chart is talking out their arse. And that is a ratio thing. Pie charts have their uses (especially for 2-choice questions, but occasionally 3-choice), but the bulk of the time you’re better off going with the bar chart.
My Work: Henkakyuu. Entice me to use twitter more @henkakyuu
I have an affinity for this post.
See Data Differently: Beyond the Box Score | @justinbopp
Nice job!
I’m late to this post, but I just want to say, nice job. Excellent rules of thumb. I would add one basic one: Compare any graph you want to make to the alternative of just presenting the data. Does it actually improve the message?
I see graphs used when a simple table of data would better get the message across. And I often see graphs that are actually worse at communicating the message than data tables.
by studes on Jan 9, 2011 4:19 PM EST
Agreed. And
guilty.
See Data Differently: Beyond the Box Score | @justinbopp
by Justin Bopp on Jan 10, 2011 12:09 PM EST
