If you're like me, you like seeing numbers whenever you can. Objective, tangible, concrete—there's a certainty that comes from knowing, say, that I had a 50% walk rate in fourth-grade kid pitch that would be missing if I merely told you that I made Carlos Santana look like a free-swinging hack or that I was too short for nine-year-old pitchers to find my strike zone.
That's why, at the risk of sounding like these people, one of my biggest pet peeves is incomplete data. So when the most complete All-Star vote totals were released last month listing only the top eight players at each position (the top 24 in the outfield), I felt like I was missing something. MLB originally posted the totals in a much more official-looking press release, but it has since been removed.
I understand why the process isn't fully transparent—no one who has a say in it has anything to gain by revealing who the least popular players in the league are and how little support they got—but it's frustrating not knowing how many people wanted Adam Dunn and Kendrys Morales to play in the Midsummer Classic even though the former had been a terrible-hitting DH and the latter had missed the entire first half of the season.
So I decided to make my own estimates of how many votes the average unlisted player received, while making as few assumptions as possible about things we don't know.
Unfortunately, the specific vote totals aren't the only vital pieces of information that are inconveniently missing: we also have no idea how many total votes were cast. We know that fans cast 32.5 million ballots, but we don't know how many votes each balloter submitted. Simply multiplying 32.5 million ballots by 17 positions doesn't work, because not everyone votes for every spot. Some fans accidentally vote for two players in the same category or skip a position or two, either intentionally or out of forgetfulness. And then there are the homers who vote only for their teams' candidates and leave the other league blank, or the fans who bother voting only for their favorite players.
You might think those effects would be negligible, or at least relatively small, but that's not the case. To illustrate just how prevalent incomplete ballots are, the top 136 vote-getters combined for 310,830,344 votes, an average of just under 9.6 votes per ballot. Those 136 players make up about 54% of the 254 candidates on the ballot, yet they received only about 56% of the 552.5 million possible votes (32.5 million ballots × 17 slots). Given how unequally All-Star votes are distributed, that makes it clear the average voter didn't come anywhere close to filling out his or her ballot in full.
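For anyone who wants to check the arithmetic, here is a minimal sketch of the figures above. The candidate count assumes one nominee per team at each slot (14 AL teams, 16 NL teams, with the outfield counted as three slots and the DH as one AL slot), which matches the 136 listed plus 118 unlisted players.

```python
# Figures quoted in the text
BALLOTS = 32_500_000
SLOTS = 17                  # 9 AL slots (incl. DH) + 8 NL slots, OF counted as three
LISTED_VOTES = 310_830_344  # combined total of the 136 listed players
LISTED_PLAYERS = 136

# Assumption: one candidate per team per slot
AL_TEAMS, NL_TEAMS = 14, 16
total_candidates = AL_TEAMS * 9 + NL_TEAMS * 8  # 126 + 128 = 254

votes_per_ballot = LISTED_VOTES / BALLOTS
share_of_players = LISTED_PLAYERS / total_candidates
share_of_votes = LISTED_VOTES / (BALLOTS * SLOTS)  # out of 552.5 million possible

print(f"{votes_per_ballot:.3f} votes per ballot")
print(f"{share_of_players:.1%} of candidates, {share_of_votes:.1%} of possible votes")
```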
If we assume that each ballot had an average of 11 votes (anything lower seems like a real underestimation, and anything higher makes the final estimates larger than what some top-eight players received), that leaves 46,669,656 votes unaccounted for in the official results. Dividing those among the 17 slots (counting the outfield as three) gives us an extra 2,745,274 votes at each position. With 14 teams in the AL and 16 in the NL, there are six unlisted players at each AL position and eight at each NL position. Splitting the extra votes evenly among them, we get the following figures for the average vote totals for players outside the top eight:
- AL: 457,546 votes
- NL: 343,159 votes
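The estimate boils down to a few lines of arithmetic. The sketch below reproduces it; `estimate_unlisted_average` is a hypothetical helper name, and the 11-votes-per-ballot figure is the assumption described above, not an official number.

```python
# Figures from the official partial results
BALLOTS = 32_500_000
LISTED_VOTES = 310_830_344  # combined total of the 136 listed players
SLOTS = 17                  # 9 AL slots + 8 NL slots, outfield counted as three

# Each team contributes one candidate per slot, and only the top 8 per slot
# were published (AL: 14 teams, NL: 16 teams)
AL_UNLISTED_PER_SLOT = 14 - 8
NL_UNLISTED_PER_SLOT = 16 - 8


def estimate_unlisted_average(votes_per_ballot=11):
    """Hypothetical helper: return (unaccounted votes, extra votes per slot,
    average AL unlisted total, average NL unlisted total)."""
    unaccounted = BALLOTS * votes_per_ballot - LISTED_VOTES
    per_slot = round(unaccounted / SLOTS)
    al_avg = round(per_slot / AL_UNLISTED_PER_SLOT)
    nl_avg = round(per_slot / NL_UNLISTED_PER_SLOT)
    return unaccounted, per_slot, al_avg, nl_avg


unaccounted, per_slot, al_avg, nl_avg = estimate_unlisted_average(11)
print(f"Unaccounted votes: {unaccounted:,}")
print(f"Extra votes per slot: {per_slot:,}")
print(f"AL unlisted average: {al_avg:,}; NL unlisted average: {nl_avg:,}")
```

Passing a different `votes_per_ballot` shows how sensitive the estimates are to that one assumption.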
You may have noticed that this calculation was relatively simple and ignored a number of factors that could have changed the numbers. Here are some things I left out, and why:
- Players' worthiness/popularity. Not all low finishers are created equal. I would assume, for example, that Hanley Ramirez got more votes than Ian Desmond, and that Matt Wieters (who ended up making the team) got more support than Jason Kendall (who hasn't played a single game in 2011). Then again, I would have expected Ramirez to beat Yuniesky Betancourt (who placed fourth among NL shortstops) and the fans somehow gave Yorvit Torrealba (fifth place among AL catchers) more love than Wieters. We fans are hard to predict, and we have nothing on which to base individual estimates. Hence, the average is the best we can do.
- Positional adjustments. The top AL first basemen got nearly 1.5 million votes more than the top AL catchers, yet I assigned both groups the same estimates. Why? Because you could adjust them either way. You could assume that each position got the same number of votes, meaning the catcher picks were probably distributed more equitably, and that the worst catchers got more support than the worst first basemen. On the other hand, there's an argument to be made that the heavily represented positions are the ones the fans care about most, and therefore the worst first basemen probably got more votes than the worst catchers. I find the first idea more convincing, but given the possibility that more people voted for the worst first basemen and the fact that the differences weren't huge (every position got between 17.5 and 19.4 million votes if we divide outfield totals by three), leaving it alone seemed like the most pragmatic solution.
- League adjustments. I toyed with the idea of dividing the excess votes directly among the 118 unlisted players instead of splitting them evenly by position, thus treating players in both leagues equally, but the more I thought about it, the more it seemed wrong. The National League has more teams than the American League does, but it doesn't necessarily have more fans. You could say that the top AL players got more votes (AL averaged 113.7 million votes per position, NL averaged 95.0 million) because NL fans split their votes 16 ways instead of 14, but you could also say that franchises like the Yankees, Red Sox, and Rangers (you might not think of them as a big-market team, but somehow Torrealba and David Murphy combined for over 3 million votes) have more fans and voters. Again, this is a situation where we don't know enough to make an informed adjustment.
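To see what the rejected league-neutral approach would have changed, here is a rough comparison sketch using the same 46,669,656 extra votes. It assumes 6 unlisted players per AL slot and 8 per NL slot, as above; it is only an illustration of the trade-off, not the method used for the estimates.

```python
UNACCOUNTED = 46_669_656    # extra votes under the 11-votes-per-ballot assumption
SLOTS_AL, SLOTS_NL = 9, 8   # outfield counted as three slots in each league
AL_UNLISTED = 6 * SLOTS_AL  # 54 unlisted AL players
NL_UNLISTED = 8 * SLOTS_NL  # 64 unlisted NL players

# Approach used above: equal share per slot, then split within the slot
per_slot = UNACCOUNTED / (SLOTS_AL + SLOTS_NL)
al_total_by_slot = per_slot * SLOTS_AL
nl_total_by_slot = per_slot * SLOTS_NL

# Rejected alternative: pool all 118 unlisted players together
pooled_avg = UNACCOUNTED / (AL_UNLISTED + NL_UNLISTED)
al_total_pooled = pooled_avg * AL_UNLISTED
nl_total_pooled = pooled_avg * NL_UNLISTED

print(f"Pooled average per unlisted player: {round(pooled_avg):,}")
print(f"AL share of extra votes: {al_total_by_slot / UNACCOUNTED:.1%} by slot vs "
      f"{al_total_pooled / UNACCOUNTED:.1%} pooled")
```

Pooling would shift a larger share of the extra votes to the NL simply because it has more unlisted players, which is exactly the assumption about fan bases that the data can't support.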
- Write-in votes. Despite my valiant efforts to send Ryan Roberts to Phoenix (or whatever you call it when a player goes to an All-Star Game in his home park), I realize that whatever impact write-in votes have is negligible, if MLB even bothers counting them at all.
In the near future, I'll be posting the results of some studies I'm doing about what All-Star votes can tell us about baseball fans, and when I need complete vote totals beyond what MLB has made public, these are the estimates I'll be using for unknown players.