I've been playing around with the fan voting totals from the 2011 MLB All-Star Game for almost a year now. My quest to understand the way baseball fans think has driven me to comb through the numbers in excruciating detail, with my findings ranging from the hotly controversial to the meaningless miscellany.
But today is special. This is the post I have been waiting to write for over 11 months. And with the 2012 Midsummer Classic just a fortnight away, there's no sense in putting it off any longer.
I've tried to model how the fans should have voted. I've looked at what tools catch voters' eyes when they fill out their ballots. I've attempted to quantify fans' biases, and thus project how the results would have looked if homerism were out of the picture. And so, after a year of beating around the bush, it's finally time to ask: How well do different statistics correlate with All-Star votes?
Before we dig into the data, there are a few caveats and things to note. All numbers were taken from FanGraphs at the 2011 All-Star Break. For the nearly half of the players on the ballot whose vote totals Major League Baseball did not release (they published only the top eight at each position), I used my own previously calculated estimates of their votes. Most people don't sit down with stat sheets when they fill out their ballots, so a strong relationship between votes and a stat probably means that stat happens to track how fans intuitively size up players, not that voters consult it deliberately. And while the total number of players (254) may seem large, this analysis uses only one year's vote totals, so it is possible that 2011 was a fluke.
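For the curious, here's a minimal sketch of the sort of calculation behind everything that follows, assuming plain Pearson correlations; the file and column names are made-up stand-ins for the FanGraphs export and the actual-plus-estimated vote totals.

```python
import pandas as pd

# Hypothetical file: one row per ballot player (n = 254), holding each
# player's FanGraphs stats at the 2011 break plus his actual or
# estimated fan vote total.
players = pd.read_csv("allstar_2011.csv")

# Pearson correlation of the "fantasy five" and WAR with the vote totals.
stats = ["AVG", "HR", "RBI", "R", "SB", "WAR"]
correlations = players[stats].corrwith(players["votes"])
print(correlations.sort_values(ascending=False))
```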
With that out of the way, let's start with what are probably the most-cited offensive statistics on the internet, the fantasy five (batting average, home runs, RBI, runs scored, and stolen bases), plus wins above replacement: (click to embiggen)
Most voters have never heard of it. Even more don't know what it is. And a substantial number of those who do dismiss it as misguided and unrealistic. But with apologies to Edwin Starr, it turns out WAR is good for determining All-Star worthiness. Consciously or not, the fans have endorsed the Holy Grail of sabermetrics (or what would be the Holy Grail, if there were a consensus model) as a better means of identifying the game's best players than the traditional numbers they're used to seeing on the backs of their baseball cards. I think that's worth coming out of our basements to celebrate.
Now that I've whetted your appetite, here's a larger sampling of how different stats correlated with All-Star votes last year: (click to embiggen)
So fans like power, clutchness, and getting runs across the plate. Contact ability and plate discipline take a backseat. Defense (at least as measured by UZR) and speed can go either way. The thing I find most interesting is the small but clear edge wRC+ holds over wOBA: wRC+ adjusts for park and league run environment while wOBA does not, so at the risk of reading too much into meaningless data, I'm quite proud of the MLB fanbase for understanding the importance of run environments and park factors.
In case anyone is as fascinated by this stuff as I am, here's how the votes correlated with every stat I could easily get my hands on. Note that Star Power (the basis of my model for unbiased fan voting) had a .666 correlation with the balloting results—i.e., stronger than any of the relationships below.
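And if you'd rather generate that kind of exhaustive list yourself, here's a sketch along the same lines (same made-up file, plain Pearson correlations) that runs through every numeric column and tacks on a p-value; with just one season of data, that's a useful guard against the "2011 was a fluke" caveat from earlier.

```python
import pandas as pd
from scipy.stats import pearsonr

players = pd.read_csv("allstar_2011.csv")  # same hypothetical sheet as above

# Correlate the vote totals with every numeric column on the sheet.
stat_cols = players.select_dtypes("number").columns.drop("votes")
results = []
for col in stat_cols:
    pair = players[[col, "votes"]].dropna()  # pearsonr can't handle missing values
    r, p = pearsonr(pair[col], pair["votes"])
    results.append((col, r, p))

# Strongest relationships first, regardless of sign.
for col, r, p in sorted(results, key=lambda t: abs(t[1]), reverse=True):
    print(f"{col:>8}  r = {r:+.3f}  p = {p:.4f}")
```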
So there it is. That's how we vote. They may not always make the right choices, and it's clear that both guys and chicks dig the long ball, but in general I like how the fans think.
You can read more of Lewie's work on his blog, Wahoo's on First. Follow him on Twitter: @LewsOnFirst