If you're going to make a visualization of any data (baseball or otherwise), here are a few recommendations to make it more effective.
#1 Know Your Story
Every graph should tell a story. If it doesn't, then why are you making it? What story are you trying to tell? What do you want the reader to get out of it? If you aren't clear about what the reader should take away from the graph, it's usually time to think about what you're trying to express before you try to graph it. For this example, I am going to compare the three all-time career home run leaders.
#2 Pick an Appropriate Graph Type
There are three main graph types:
- Bar Graphs
- Line Graphs
- Pie Charts
Bar Graphs are a good default. 98% of the time, if you're comparing things, a bar graph is a good place to start. When you use bar graphs, there is one major rule that you have to follow: always start the main axis at 0. We judge the categories by the height of the bar, so if you don't start from 0, it warps the data. Here's an example:
Wow! Bonds DESTROYED Ruth's record, didn't he? Only he didn't. Here's the same graph with the axis starting at 0:
Tells quite a different story, doesn't it?
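To put a number on that distortion, here is a minimal sketch in Python. It uses the career totals for Bonds, Aaron, and Ruth (762, 755, and 714) and compares how tall Bonds' bar looks next to Ruth's on two axes. The truncated axis start of 700 is an assumption picked for illustration, not necessarily what the chart above used, and the function names are hypothetical.

```python
# Career home run totals for the three all-time leaders.
CAREER_HR = {"Barry Bonds": 762, "Hank Aaron": 755, "Babe Ruth": 714}


def bar_height(value, axis_min):
    """Drawn height of a bar when the value axis starts at axis_min."""
    return value - axis_min


def apparent_ratio(a, b, axis_min):
    """How many times taller bar a looks than bar b on that axis."""
    return bar_height(a, axis_min) / bar_height(b, axis_min)


bonds = CAREER_HR["Barry Bonds"]
ruth = CAREER_HR["Babe Ruth"]

# Assumed truncated axis starting at 700 (illustrative value only).
truncated = apparent_ratio(bonds, ruth, axis_min=700)
# Axis starting at 0, as the rule above recommends.
honest = apparent_ratio(bonds, ruth, axis_min=0)

print(f"Axis from 700: Bonds' bar looks {truncated:.2f}x as tall as Ruth's")
print(f"Axis from 0:   Bonds' bar looks {honest:.2f}x as tall as Ruth's")
```

With the axis starting at 700, Bonds' bar comes out more than four times as tall as Ruth's. Starting from 0, it is only about 7% taller, which matches the real gap of 48 home runs.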
One of the times you don't want to use bar graphs is when you're talking about something that happened over time. For instance, if you are going to look at how many home runs each of those players had by age, a line graph is a better bet:
If I do it with a bar graph, it gets messy, and harder to read:
Pie charts are the oddball of the group. Humans don't judge areas of circles too well, so pie charts are really only good if you're dealing with 2-3 categories, so that it's easy to eyeball. And the percentages always have to add up to 100%.
For instance, if we want to look at how often a player strikes out, walks, or hits the ball, a pie graph is a mighty fine choice:
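If you want to check that the slices really do add up to 100% before charting them, a small sketch like the one below can help. The season numbers are made up for illustration and are not any real player's stats, and `outcome_shares` is a hypothetical helper name. It lumps everything that isn't a strikeout or a walk into one "everything else" slice.

```python
def outcome_shares(strikeouts, walks, plate_appearances):
    """Return percentage shares for a three-slice pie chart.

    Whatever is not a strikeout or a walk goes into one
    "Everything else" slice, so the slices always sum to 100.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    other = plate_appearances - strikeouts - walks
    if other < 0:
        raise ValueError("strikeouts + walks exceed plate appearances")
    counts = {
        "Strikeouts": strikeouts,
        "Walks": walks,
        "Everything else": other,
    }
    return {label: 100 * n / plate_appearances for label, n in counts.items()}


# Made-up season line, for illustration only.
shares = outcome_shares(strikeouts=100, walks=80, plate_appearances=600)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

Computing the third slice as a remainder, rather than entering it by hand, is one simple way to guarantee the pie covers the whole.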
#3 Don't Overdo It
Sometimes you have a lot of data, and you get tempted to try to put it all on one graph. Let's say you want to show how many home runs each player got along with how many plate appearances he had each season. You could throw them on the same graph like this:
Now the problem is that the message gets muddied. What exactly are you trying to say with that added data? If you want to show the difference in PA/HR by age, then maybe it's a better bet to make a second graph that focuses just on that:
More graphs aren't necessarily a bad thing if they help you get your message across better.
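The PA/HR figure mentioned above is just plate appearances divided by home runs for each season, so it is easy to derive as its own series before graphing it. Here is a rough sketch, assuming the data is a list of (age, PA, HR) rows. The rows are invented for illustration and `pa_per_hr` is a hypothetical helper name.

```python
def pa_per_hr(plate_appearances, home_runs):
    """Plate appearances per home run; None when no home runs were hit."""
    if home_runs == 0:
        return None  # avoid dividing by zero for a homerless season
    return plate_appearances / home_runs


# Hypothetical (age, PA, HR) rows, invented for illustration.
seasons = [(25, 600, 30), (26, 650, 40), (27, 580, 0)]

rates = {age: pa_per_hr(pa, hr) for age, pa, hr in seasons}
for age, rate in rates.items():
    label = "n/a" if rate is None else f"{rate:.1f}"
    print(f"Age {age}: {label} PA per HR")
```

Plotting this one derived number on its own graph keeps the home run chart clean and gives the second message a chart to itself.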
#4 Focus the Message
When you create a chart in Excel, the default settings make it really ugly. Look at the first graph I created using Excel's defaults:
Here are some of the problems:
- The background is too dark
- The gridlines are too strong
- The axis is messed up (it doesn't start from zero!)
- The title and legend duplicate information
- The player names are tiny
- The bar graphs all have shadows for some reason
What is the story of this graph? It shows how many home runs the three home run leaders have. So we want to focus on those three blue bars, who hit them, and what they mean. So I eliminated the background of the graph, fixed the axis, lightened the grid lines, gave it a proper title, and made the names of each player more visible:
#5 Add Color (where appropriate)
You may have noticed that most of my graphs use the same color. When we look at a graph and see a lot of color, we assume that the color means something. So color is another tool to help us tell the story. If I made the three bars in the career home run graphs different colors, our brain would tell us "Hey, the bars must be different colors for a reason!" and it would spend a little time trying to figure out what the color is telling us.
When there's a reason to add color (for instance, in the pie graph) it's a good idea to be consistent and to make sure that the color adds to the story rather than distracting from it. I used red for strikeouts because they are "bad", blue for walks because they are "good", and grey for the rest because they are neutral (and not part of the story, but need to be there to make sure the pies add up to 100%).
#6 Resist the Urge to Pretty-up the Graph with Chart Junk
A lot of the graphs we see on a daily basis are prettied up. People add 3D effects, drop shadows, gradients, etc. But those things don't typically add to the story we're trying to tell. They just "look cool" so people want to use them. Adding chart junk to your graphs is like putting a spoiler on a Honda Civic -- maybe it looks like it should go faster, but it's really just weighing the car down. For instance, let's say I get the urge to pretty-up my pie graphs. What was pretty obvious before suddenly becomes a lot less obvious:
It's a lot harder to compare the slices in 3D, because 3D graphs distort data. We aren't so good at adjusting perspective in our head, so we lose a lot of ability to analyze the data. And the gradients add absolutely nothing but headaches since we lose the center of the grey slice making it even harder to figure out the angle (and with the angle, the size of the slices).
These are just the basics, with a really simple example. The above suggestions are just that -- suggestions. The most important thing is to keep the message at the center of the graph. Find what works best with your audience, and displays the data best, and you'll do fine.
All of the above examples were created using Microsoft Excel. Graphs from Excel can be copy-pasted into a vector graphics editor like Adobe Illustrator or Inkscape for better control over how they look.
I'm an expat who has lived in Japan since 2003, doing sales and marketing work. More of my work is available on Henkakyuu, my personal blog. Also feel free to inspire me to use Twitter more often @henkakyuu