In my last article on extrema, we looked at the .400 hitter. Not surprisingly, the chances of such an elevated batting average are pretty low. But which is more likely: a record-breaking 74-HR season or a .400 season? While preparing this article, I put the question to our staff here at Beyond the Box Score. The majority (albeit in a limited sample) thought that while both are unlikely, the 74-HR season is slightly more likely.
From one perspective, since 2002 there have been nine 50-HR seasons and seventy 40-HR seasons, while there hasn't been a single season above .375 and only 12 seasons above .350. So perhaps the writers' instincts are correct. But is that really the case? Is there enough power out there to surpass the chances of a .400 batting average? Before we get into the nitty-gritty of that question, I want to talk a little bit about math.
A Little Math
Okay, so this will get a little gory, but I wanted to explain a bit more of the math from the last article.
Let's go back and remind ourselves of the notation. Assume that our data points $X_1, \dots, X_n$ are an independent sample from some distribution with c.d.f. $F_X(x)$ and p.d.f. $f_X(x)$. Also, say that the sample can be ordered from smallest to largest as $X_{(1)}, \dots, X_{(n)}$.
Last time, I just gave the c.d.f. and p.d.f. of the maximum $X_{(n)}$. Today, I want to give a quick common-sense derivation of the c.d.f. so anyone who wants to use this result can understand it a little better. We've ordered our sample so that $X_{(1)} \le X_{(2)} \le \dots \le X_{(n-1)} \le X_{(n)}$. The c.d.f. $F_X(a)$ is defined as $P(X \le a)$ for some value $a$.
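Now, we're looking at the distribution of the sample maximum $X_{(n)}$, specifically $P(X_{(n)} \le a)$. If the sample maximum is at most $a$, then every single observation in the sample is also at most $a$. So,

$$P(X_{(n)} \le a) = P(X_1 \le a, X_2 \le a, \dots, X_n \le a).$$

Now, because we assumed that the observations are independent, that joint probability breaks down into a product of individual probabilities, and because every observation comes from the same distribution, each factor is $F_X(a)$:

$$P(X_{(n)} \le a) = \prod_{i=1}^{n} P(X_i \le a) = \left[F_X(a)\right]^n.$$

After that final step, we get to the form of the c.d.f. seen in the last article.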
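As a quick sanity check of $P(X_{(n)} \le a) = [F_X(a)]^n$, here is a small Monte Carlo sketch in Python. It assumes Uniform(0, 1) draws, chosen only because their c.d.f. is simply $F_X(a) = a$; the function name and parameters are illustrative and not part of the original analysis.

```python
import random

def empirical_max_cdf(a, n, trials=100_000, seed=1):
    """Estimate P(max of n iid Uniform(0, 1) draws <= a) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The maximum is <= a exactly when every one of the n draws is <= a.
        if max(rng.random() for _ in range(n)) <= a:
            hits += 1
    return hits / trials

n, a = 5, 0.8
print("simulated:", empirical_max_cdf(a, n))
print("formula F(a)^n:", a ** n)  # about 0.328
```

The simulated fraction should land close to $0.8^5 \approx 0.328$, and the gap shrinks as `trials` grows.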
To get the p.d.f. of the maximum, we just take the derivative of the c.d.f., but I won't put everyone through that. Now, back to your regularly scheduled question.
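For anyone who does want that step, differentiating the c.d.f. of the maximum, $F_{X_{(n)}}(a) = [F_X(a)]^n$, takes a single application of the chain rule:

$$f_{X_{(n)}}(a) = \frac{d}{da}\left[F_X(a)\right]^n = n\left[F_X(a)\right]^{n-1} f_X(a).$$

Intuitively, one of the $n$ observations sits at $a$ (density $f_X(a)$, with $n$ ways to pick which one), and the other $n-1$ all fall below it (probability $[F_X(a)]^{n-1}$).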
The Single-Season Home Run Record
Over the 114 seasons since the American League was founded, there have been five players to hold the home run record, with Babe Ruth breaking his own record 3 times. Like the .400 hitters, we know their names, but their accomplishments deserve to be listed again.
Player | HR | Season Record Set | Season Record Broken |
---|---|---|---|
Ned Williamson | 27 | 1884 | 1919 |
Babe Ruth | 29 | 1919 | 1920 |
Babe Ruth | 54 | 1920 | 1921 |
Babe Ruth | 59 | 1921 | 1927 |
Babe Ruth | 60 | 1927 | 1961 |
Roger Maris | 61 | 1961 | 1998 |
Mark McGwire | 70 | 1998 | 2001 |
Barry Bonds | 73 | 2001 | Present |
So what are the chances a player breaks the record as it now stands? And no, the answer isn't "HGH mixed with PEDs mixed with steroids," as the public might think. While unlikely, is it at least more likely than a .400 season?
The Chances of 74
Now, there is an inherent problem with raw HR totals: a player's HR total is, of course, tied to his number of plate appearances. The more chances you have to hit a home run, the more you'll hit (generally speaking). So instead of raw HR totals, we'll look at the league environment of HR/PA. Like batting average, this rate is bounded between 0 and 1 and can be modeled by a Beta distribution. And again, we're looking only at the HR/PA of qualified players; otherwise we may pick up too many seasons like Mike Hessman's (he of 407 minor league HRs, whom I remember watching on the 2002-04 Richmond Braves) 5 HRs in 31 PAs (16.1%) for the 2008 Tigers.
Depending on the number of PAs, a player would need an HR/PA of 12.3% (600 PAs), 11.4% (650 PAs), or 10.6% (700 PAs) to reach 74 home runs. Since Bonds set the mark in 2001, we'll focus on the 13 seasons since then (2002-2014). First, we need to estimate the parameters of the Beta distribution for each year, which is again done by matching the sample mean and variance to those of the Beta distribution (the method of moments). This gives us what we need to get the distribution of the maximum HR/PA for that season.
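A minimal sketch of the moment-matching step, in Python. For a Beta($\alpha$, $\beta$) distribution with mean $m$ and variance $v$, matching moments gives $\alpha = m\,c$ and $\beta = (1-m)\,c$, where $c = m(1-m)/v - 1$. The function names and the HR/PA values in the example are hypothetical placeholders, not the FanGraphs data behind the article.

```python
import statistics

def fit_beta_moments(rates):
    """Method-of-moments Beta(alpha, beta) fit to a list of rates in (0, 1)."""
    m = statistics.mean(rates)
    v = statistics.variance(rates)  # sample variance (n - 1 denominator)
    if not 0 < v < m * (1 - m):
        raise ValueError("need 0 < variance < mean * (1 - mean) for a Beta fit")
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c

def hr_rate_needed(pa, record=74):
    """HR/PA a hitter would need to reach `record` home runs in `pa` plate appearances."""
    return record / pa

# Hypothetical HR/PA values for a handful of qualified hitters.
rates = [0.021, 0.034, 0.045, 0.028, 0.052, 0.039, 0.017, 0.061]
alpha, beta = fit_beta_moments(rates)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")

for pa in (600, 650, 700):
    print(pa, f"{hr_rate_needed(pa):.1%}")  # 12.3%, 11.4%, 10.6%
```

By construction, the fitted distribution's mean $\alpha/(\alpha+\beta)$ and variance reproduce the sample mean and variance exactly.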
However, we aren't done yet. To convert that into home runs, we need to account for the probability that a qualified player reaches at least 600, 650, or 700 plate appearances in that season. Assuming independence between the two events (reaching the PA threshold and having an HR/PA of x), we get the following probabilities of breaking the HR record for each season from 2002 to 2014.
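The article doesn't spell out exactly how the PA and HR/PA pieces were combined, so the following Python sketch is only one plausible reading of the independence assumption: for a given PA level, P(record) ≈ P(PA ≥ level) × P(season-max HR/PA ≥ 74/level), where the maximum over n qualified hitters has upper tail $1 - F(t)^n$. To stay dependency-free, it swaps in a simple midpoint-rule integral for the Beta c.d.f. rather than a library routine. All function names and inputs are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def beta_sf(t, a, b, steps=20_000):
    """P(X > t) for X ~ Beta(a, b), by the midpoint rule on [t, 1]."""
    h = (1 - t) / steps
    tail = h * sum(beta_pdf(t + (i + 0.5) * h, a, b) for i in range(steps))
    return min(max(tail, 0.0), 1.0)  # guard against tiny numerical overshoot

def p_max_at_least(t, a, b, n):
    """P(max of n iid Beta(a, b) rates >= t) = 1 - F(t)^n, computed stably."""
    return -math.expm1(n * math.log1p(-beta_sf(t, a, b)))

def p_record(a, b, n, pa, p_reach_pa, record=74):
    """Hypothetical: P(PA >= pa) * P(max HR/PA >= record / pa), assuming independence."""
    return p_reach_pa * p_max_at_least(record / pa, a, b, n)

# Made-up league parameters, for illustration only.
print(p_record(a=6.0, b=160.0, n=150, pa=650, p_reach_pa=0.4))
```

Using `log1p`/`expm1` keeps $1 - F(t)^n$ accurate when the single-player tail is tiny, which is exactly the regime a 74-HR pace lives in.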
Season | P(Record) |
---|---|
2002 | 0.0229 |
2003 | 0.0183 |
2004 | 0.0271 |
2005 | 0.0205 |
2006 | 0.0478 |
2007 | 0.007 |
2008 | 0.008 |
2009 | 0.0156 |
2010 | 0.0082 |
2011 | 0.0052 |
2012 | 0.0043 |
2013 | 0.002 |
2014 | 0.0081 |
So clearly, the chances of this happening are pretty slim. For 2014, the chance is still roughly double that of a .400 season, at just under 1%. Interestingly, the highest chance of 74 we've seen was in 2006, the season Ryan Howard and David Ortiz hit 58 and 54 home runs, respectively. In that season's HR/PA environment, there was nearly a 5% chance of breaking the record. If that seems high to you, simulations show similar results: 1,000 simulated seasons yielded 42 record-breaking seasons (4.2%).
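The simulation check can be sketched along the same lines: draw every qualified hitter's HR/PA from the season's fitted Beta distribution and count how often the best of them reaches the record pace. This Python version uses the standard library's `random.betavariate`. It compares against a single fixed HR/PA threshold and ignores variation in PAs, which is a simplification of the article's setup, and the parameters in the example are made up.

```python
import random

def simulate_record_rate(a, b, n_players, threshold, n_seasons=1000, seed=2006):
    """Fraction of simulated seasons whose top HR/PA reaches `threshold`."""
    rng = random.Random(seed)
    broken = 0
    for _ in range(n_seasons):
        # One simulated season: each qualified hitter gets an HR/PA from Beta(a, b).
        best = max(rng.betavariate(a, b) for _ in range(n_players))
        if best >= threshold:
            broken += 1
    return broken / n_seasons

# Hypothetical inputs: a 650-PA record pace and made-up Beta parameters.
rate = simulate_record_rate(a=6.0, b=160.0, n_players=150, threshold=74 / 650)
print(f"{rate:.1%} of simulated seasons broke the record")
```

With 1,000 seasons, a true rate near 4.8% has a standard error of roughly 0.7 percentage points, so an observed 4.2% is well within simulation noise of the analytic figure.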
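Again, to conclude, we'll look at the most and least impressive HR/PA seasons relative to what the league-wide environment would predict. Here, P(Max>Actual) is the probability that the season's maximum HR/PA would exceed the player's actual HR/PA, so smaller values mark more impressive seasons. Not surprisingly, the top of the list is an homage to Babe Ruth, with Gavvy Cravath's 1915 the only non-Ruthian season in the top nine.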
Season | Player | HR/PA | P(Max>Actual) |
---|---|---|---|
1920 | Babe Ruth | 0.08780488 | 0.04151227 |
1919 | Babe Ruth | 0.05350553 | 0.06106524 |
1926 | Babe Ruth | 0.07208589 | 0.09621733 |
1921 | Babe Ruth | 0.08513709 | 0.11546184 |
1915 | Gavvy Cravath | 0.03864734 | 0.15139745 |
⋮ | ⋮ | ⋮ | ⋮ |
2011 | Jose Bautista | 0.06564886 | 0.98498847 |
2004 | Adrian Beltre | 0.07305936 | 0.9878115 |
1986 | Rob Deer | 0.06043956 | 0.98910856 |
1970 | Johnny Bench | 0.06706408 | 0.98993891 |
2009 | Carlos Pena | 0.06842105 | 0.9927466 |
. . .
Data courtesy of FanGraphs.
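The P(Max>Actual) column can be read as $1 - F(x)^n$: the chance that the best of a season's n qualified hitters would beat the player's actual HR/PA x under that season's fitted Beta distribution. Below is a Python sketch of ranking seasons that way. The Beta parameters and player counts are hypothetical placeholders (the fitted values aren't reported here), and the Beta tail again uses a simple midpoint-rule integral in place of a library function.

```python
import math

def beta_sf(t, a, b, steps=20_000):
    """P(X > t) for X ~ Beta(a, b), via the midpoint rule on [t, 1]."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = (1 - t) / steps
    total = 0.0
    for i in range(steps):
        x = t + (i + 0.5) * h
        total += math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    return min(max(total * h, 0.0), 1.0)

def p_max_exceeds(actual, a, b, n):
    """P(season max HR/PA > actual) = 1 - F(actual)^n for n qualified hitters."""
    return -math.expm1(n * math.log1p(-beta_sf(actual, a, b)))

# (season, player, actual HR/PA, alpha, beta, n_qualified); alpha, beta, n are made up.
seasons = [
    (1920, "Babe Ruth", 0.08780488, 2.5, 130.0, 120),
    (2009, "Carlos Pena", 0.06842105, 8.0, 230.0, 150),
]

# Smallest probability first = most impressive relative to its own league.
ranked = sorted(seasons, key=lambda s: p_max_exceeds(s[2], s[3], s[4], s[5]))
for season, player, rate, a, b, n in ranked:
    print(season, player, f"{p_max_exceeds(rate, a, b, n):.4f}")
```

Because each season is judged against its own fitted league distribution, a lower raw HR/PA (like Cravath's in 1915) can still rank as more impressive than a higher one from a power-heavy era.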
Stephen Loftus is an editor at Beyond The Box Score. You can follow him on Twitter at @stephen__loftus.