It’s been just three days since the results for the Baseball Hall of Fame Class of 2019 were announced. As we all know, Roy Halladay, Edgar Martinez, Mike Mussina and Mariano Rivera received the 75 percent of the BBWAA vote necessary to earn enshrinement, and deservedly so.
Now that Hall of Fame “season” is finally over, it’s time to consider where we, the analysts, went right and wrong. I released my projections for the 2019 class back in late December and wrote about the methodology behind the system the next day. With the results in, I can now evaluate just how well I did:
Hall of Fame projections vs. actual
Player | Final | Projection | Margin of Error |
---|---|---|---|
Mariano Rivera | 100.0% | 99.6% | 0.4% |
Edgar Martinez | 85.4% | 77.7% | 7.7% |
Roy Halladay | 85.4% | 89.6% | 4.2% |
Mike Mussina | 76.7% | 70.4% | 6.3% |
Curt Schilling | 60.9% | 59.0% | 1.9% |
Roger Clemens | 59.5% | 61.7% | 2.2% |
Barry Bonds | 59.1% | 59.6% | 0.5% |
Larry Walker | 54.6% | 51.2% | 3.4% |
Omar Vizquel | 42.8% | 41.9% | 0.9% |
Fred McGriff | 39.8% | 28.9% | 10.9% |
Manny Ramirez | 22.8% | 23.1% | 0.3% |
Jeff Kent | 18.1% | 12.5% | 5.6% |
Scott Rolen | 17.2% | 16.2% | 1.0% |
Billy Wagner | 16.7% | 9.7% | 7.0% |
Todd Helton | 16.5% | 27.6% | 11.1% |
Gary Sheffield | 13.6% | 10.6% | 3.0% |
Andy Pettitte | 9.9% | 15.6% | 5.7% |
Sammy Sosa | 8.5% | 9.8% | 1.3% |
Andruw Jones | 7.5% | 7.8% | 0.3% |
Michael Young | 2.1% | 1.6% | 0.5% |
Lance Berkman | 1.2% | 3.6% | 2.4% |
Roy Oswalt | 0.9% | 1.6% | 0.7% |
AVERAGE MARGIN OF ERROR | -- | -- | 3.5% |
MEDIAN MARGIN OF ERROR | -- | -- | 2.3% |
ROOT MEAN SQUARE ERROR | -- | -- | 4.8% |
In short, I’m pretty happy with the results. By no means was I the best at projecting the vote among the nine Hall of Fame forecasters. But, as Nathaniel Rakich of FiveThirtyEight correctly pointed out, my model only had the first 50 public ballots to work with, which inherently made it less accurate than the others. Still, you have to tip your cap to those who were able to come, on average, within two or even one percentage point of the actual results. Jason Sardell took home the Hall of Fame projection crown with an average margin of error of 0.9 percent and a root mean square error (RMSE) of just 1.5 percent. For comparison, my average margin of error was 3.5 percent and my RMSE was 4.8 percent.
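For anyone curious how those summary rows are calculated, here is a minimal sketch using the final and projected percentages from the table above (the variable names are just for illustration):

```python
import statistics

# (final %, projected %) pairs from the table above
results = {
    "Mariano Rivera": (100.0, 99.6), "Edgar Martinez": (85.4, 77.7),
    "Roy Halladay": (85.4, 89.6), "Mike Mussina": (76.7, 70.4),
    "Curt Schilling": (60.9, 59.0), "Roger Clemens": (59.5, 61.7),
    "Barry Bonds": (59.1, 59.6), "Larry Walker": (54.6, 51.2),
    "Omar Vizquel": (42.8, 41.9), "Fred McGriff": (39.8, 28.9),
    "Manny Ramirez": (22.8, 23.1), "Jeff Kent": (18.1, 12.5),
    "Scott Rolen": (17.2, 16.2), "Billy Wagner": (16.7, 9.7),
    "Todd Helton": (16.5, 27.6), "Gary Sheffield": (13.6, 10.6),
    "Andy Pettitte": (9.9, 15.6), "Sammy Sosa": (8.5, 9.8),
    "Andruw Jones": (7.5, 7.8), "Michael Young": (2.1, 1.6),
    "Lance Berkman": (1.2, 3.6), "Roy Oswalt": (0.9, 1.6),
}

# absolute margin of error, in percentage points, for each candidate
errors = [abs(final - proj) for final, proj in results.values()]

print(f"Average MOE: {statistics.mean(errors):.1f}")    # ~3.5
print(f"Median MOE:  {statistics.median(errors):.1f}")  # ~2.3
print(f"RMSE:        {(sum(e**2 for e in errors) / len(errors)) ** 0.5:.1f}")  # ~4.8
```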
Clearly, I had some big misses.
Todd Helton was my biggest miss (11.1 percentage point margin of error), though I think part of it had to do with my sample size. Helton received 14 votes on the first 50 public ballots, or 28 percent. He went on to receive just 27 more votes on the next 184 pre-announcement public ballots, or 14.7 percent. There’s no obvious explanation for why his support cratered after those first 50 ballots; it just appears to have been an abnormal sample.
I think sampling issues also resulted in a big miss on Billy Wagner (7.0 percentage point margin of error). At the time I ran my model, Wagner had just four (!) votes among the first 50 voters, or 8.0 percent. He went on to receive 36 more votes among the next 184 voters, a whopping 19.6 percent rate. Just as I overestimated Helton, I vastly underestimated Wagner’s final percentage.
In my mind, the big Wagner miss could have been avoided had I accounted for the votes that were still to come. Across the 15 players who returned to the ballot this year, a total of just 36 votes were lost through 254 ballots, an average of just 2.4 lost votes per candidate. Knowing this, I could have concluded that many votes for Wagner were still on their way. He received 26 votes across 247 pre-announcement, public ballots in 2018, and had I assumed that nearly all of those voters would back Wagner again in 2019, I could have been much more accurate. This doesn’t apply to Helton, who was in his first year of eligibility on the ballot, so there was no past data to reference.
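Here is a rough sketch of that returning-voter logic. Treating the 2.4-vote average attrition as the expected number of Wagner defections is my own simplification, not something the model actually did:

```python
# 2018: Wagner received 26 votes on 247 public, pre-announcement ballots
wagner_2018_votes = 26

# Across the 15 returning candidates, 36 total votes were lost through
# 254 ballots in 2019, or roughly 2.4 lost votes per candidate
avg_votes_lost = 36 / 15  # ~2.4

# A crude floor: assume Wagner loses only the average number of returning voters
expected_returning_votes = wagner_2018_votes - avg_votes_lost  # ~23.6

# After 50 ballots he had just 4 votes, so roughly 20 more "likely" votes
# were still to come among later public ballots
votes_after_50 = 4
print(round(expected_returning_votes - votes_after_50, 1))  # ~19.6 votes still expected
```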
I also missed pretty badly on Fred McGriff (10.9 percentage point margin of error), but this miss can’t be (fully) attributed to a poor sample. After 50 ballots, McGriff was trending at 30.0 percent of the vote, and he finished with 39.7 percent of the public, pre-announcement vote, significantly higher than the first 50 ballots suggested. When building McGriff’s projection, though, I didn’t include an adjustment for the fact that it was his final year on the ballot. Generally speaking, candidates in their last year of eligibility tend to see their support grow. And because my model assumed that the private vote totals would remain completely stagnant, I missed badly on McGriff.
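A final-year adjustment could be as simple as the sketch below; the bump factor is purely hypothetical and would need to be estimated from historical final-year gains:

```python
def project_with_final_year_bump(baseline_pct: float, final_year: bool,
                                 bump: float = 1.15) -> float:
    """Apply a hypothetical multiplicative bump for last-year-of-eligibility
    candidates; `bump` would need to be fit from historical final-year gains."""
    return min(baseline_pct * bump, 100.0) if final_year else baseline_pct

# Illustration only: McGriff's 28.9% baseline projection with a 15% bump
print(project_with_final_year_bump(28.9, final_year=True))  # ~33.2 percent
```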
The same final-year-of-eligibility adjustment would have helped with Edgar Martinez (7.7 percentage point margin of error). I still correctly projected Martinez to be elected, but I had him getting just 77.7 percent of the vote. Had I built in an adjustment, I believe I could have come much closer to his final total.
The last major error I made was on Mike Mussina (6.3 percentage point margin of error). I had Mussina missing election by a wide margin. Though he did come fairly close to missing, which kept this from being my worst margin of error, it would have been nice to correctly predict the full class. Regardless of what my model said, I do think Mussina deserved his election, and I am glad that he will be enshrined.
While I have spent plenty of time discussing the big failures of my algorithm, I do think it’s also important to point out the successes. For 15 of the 22 projections I made, my margin of error was less than five percentage points, and for 10 of 22, it was less than two percentage points. I most accurately projected Manny Ramirez’s and Andruw Jones’ respective vote totals, with just a 0.3 percentage point margin of error for each. Even Larry Walker, who made incredible gains among public voters, finished just 3.4 percentage points above my projection.
Next, I would like to discuss the biggest assumption of my model. To create my algorithm, I operated under the assumption that a player’s private ballot vote share would remain exactly equal from 2018 to 2019. Here is how my projected private vote rate compared to the actual private vote:
Private Ballot projections vs. actual
Player | Actual Private % | Projected Private % | Margin of Error |
---|---|---|---|
Mariano Rivera | 100.0% | 98.3% | 1.7% |
Edgar Martinez | 77.8% | 52.4% | 25.4% |
Roy Halladay | 76.6% | 88.3% | 11.7% |
Mike Mussina | 69.0% | 46.7% | 22.3% |
Roger Clemens | 48.5% | 45.7% | 2.8% |
Barry Bonds | 48.0% | 41.9% | 6.1% |
Curt Schilling | 47.4% | 32.4% | 15.0% |
Omar Vizquel | 45.6% | 45.7% | 0.1% |
Larry Walker | 39.8% | 23.8% | 16.0% |
Fred McGriff | 38.6% | 28.6% | 10.0% |
Manny Ramirez | 21.6% | 21.0% | 0.6% |
Jeff Kent | 17.5% | 14.3% | 3.2% |
Todd Helton | 15.8% | 26.3% | 10.5% |
Billy Wagner | 15.2% | 12.4% | 2.8% |
Andy Pettitte | 14.6% | 14.3% | 0.3% |
Gary Sheffield | 13.5% | 10.5% | 3.0% |
Scott Rolen | 12.9% | 4.8% | 8.1% |
Andruw Jones | 7.0% | 12.4% | 5.4% |
Sammy Sosa | 5.8% | 3.8% | 2.0% |
Michael Young | 2.9% | 0.3% | 2.6% |
Lance Berkman | 1.8% | 2.3% | 0.5% |
Roy Oswalt | 1.2% | 0.3% | 0.9% |
AVERAGE MARGIN OF ERROR | -- | -- | 6.9% |
MEDIAN MARGIN OF ERROR | -- | -- | 3.1% |
ROOT MEAN SQUARE ERROR | -- | -- | 9.9% |
I won’t bore you with an individual player error discussion here, but as you can see, the assumption that private vote share remains equal from year to year doesn’t hold for everyone. For some candidates, it worked well. But for others, particularly those who experienced big gains this year (Martinez, Mussina, Walker), I was far off.
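For context, here is a minimal sketch of what a projection built on that assumption looks like: the observed public-ballot rate gets blended with the prior year’s private rate, weighted by the expected public/private split. The function name and the 75 percent public share are placeholders, not my exact code:

```python
def project_total(public_rate_2019: float, private_rate_2018: float,
                  public_share: float = 0.75) -> float:
    """Blend the observed public-ballot rate with last year's private rate,
    weighted by the share of ballots expected to be public vs. private.
    `public_share` is a placeholder; the true split varies year to year."""
    return public_share * public_rate_2019 + (1 - public_share) * private_rate_2018

# Illustration: a candidate at 60% on public ballots whose 2018 private rate was 40%
print(project_total(60.0, 40.0))  # 55.0
```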
[Chart: 2018 private vote share vs. 2019 private vote share]
Generally speaking, private vote shares do correlate pretty strongly from one year to the next. But even this graph has an r-squared value of 0.90, meaning that 10 percent of the variability in the 2019 private vote share cannot be explained by the trend line. That leaves room for a lot of error, and I paid the price.
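For anyone unfamiliar with the metric, r-squared falls directly out of the residuals of the fitted trend line. The arrays below are placeholder values rather than the actual ballot data; swapping in the real 2018 and 2019 private rates would reproduce the figure above:

```python
import numpy as np

# Placeholder private vote shares (percent) for a handful of candidates;
# substitute the real 2018 and 2019 private rates to recreate the chart
private_2018 = np.array([90.0, 55.0, 40.0, 25.0, 10.0])
private_2019 = np.array([95.0, 60.0, 35.0, 30.0, 12.0])

# Fit a simple trend line and compute r-squared
slope, intercept = np.polyfit(private_2018, private_2019, 1)
predicted = slope * private_2018 + intercept

ss_res = np.sum((private_2019 - predicted) ** 2)            # residual sum of squares
ss_tot = np.sum((private_2019 - private_2019.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))  # share of 2019 variance explained by the trend line
```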
What does this mean going forward?
I’m not quite sure yet. I have plenty of ideas to improve my model for next year, based on what I’ve read from other forecasters and on additional adjustments (like a 10th-year-of-eligibility bump). But until I have everything fully worked out, I don’t plan on discussing any specific methodology quite yet.
One thing, though, is for certain: I can’t wait for 2020.
Devan Fink is a Featured Writer for Beyond The Box Score. You can follow him on Twitter @DevanFink.