Sabermetric Extrema: 74 HR Season

First there was Ned Williamson, then there was Ruth a few times, then Maris, McGwire, and finally Bonds. The question becomes, what are the odds that we see 74?

Jed Jacobsohn

In my last article on extrema, we looked at the .400 hitter. Not surprisingly, the chances of a batting average that high are pretty low. But which is more likely: a record-breaking 74-HR season, or a .400 season? In prepping for this article, I put the question to our staff here at Beyond the Box Score. The majority (albeit in a limited sample) were of the opinion that while both are unlikely, the 74-HR season would be slightly more likely.

From one perspective, since 2002 there have been nine 50-HR seasons and seventy 40-HR seasons, while there hasn't been a single season over .375 and only 12 seasons over .350. So perhaps the writers' instincts are correct. Is this the case? Is there enough power out there for 74 home runs to be a better bet than a .400 batting average? But before we get into the nitty gritty of this question, I want to talk a little bit of math.

A Little Math

Okay, so this will be a little gory. But I wanted to explain a little more of the math from the last article.

Let's go back and remind ourselves of the notation. Assume that all our data points X_i are an independent sample from some distribution with c.d.f. F(x) and p.d.f. f(x). Also, say that the sample can be ordered from smallest to largest as X(1), ..., X(n).

Now last time, I just gave the c.d.f. and p.d.f. of the maximum X(n). Today, I want to give a quick common-sense derivation of it so anyone who wants to use this information can understand it a little better. We've ordered our sample such that X(1) ≤ X(2) ≤ ... ≤ X(n-1) ≤ X(n). The c.d.f. F(a) is defined as P(X ≤ a) for some value a.

Now, we're looking at the distribution of the sample maximum X(n), specifically P(X(n) ≤ a). If the sample maximum is no greater than a, this means that every single sample observation is also no greater than a. So, this means that

$$P(X_{(n)} \le a) = P(X_1 \le a,\ X_2 \le a,\ \ldots,\ X_n \le a)$$

Now, because we assumed that each observation is independent, we can break this down further using some basic probability rules. After that final step, we get to the form of the c.d.f. seen in the last article.

$$P(X_{(n)} \le a) = P(X_1 \le a) \times P(X_2 \le a) \times \cdots \times P(X_n \le a) = [F(a)]^n$$
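For anyone who wants to convince themselves of that result numerically, here is a minimal sketch in Python. It is not the code behind this article. It assumes a Uniform(0, 1) distribution, chosen only because its c.d.f. is simply F(a) = a, so the c.d.f. of the maximum should be a^n. The function names, the sample size of 5, and the cutoff of 0.9 are all made up for illustration.

```python
import random


def empirical_max_cdf(n, a, trials=20000, seed=0):
    """Estimate P(max of n independent Uniform(0,1) draws <= a) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_max = max(rng.random() for _ in range(n))
        if sample_max <= a:
            hits += 1
    return hits / trials


def theoretical_max_cdf(n, a):
    """For Uniform(0,1), F(a) = a, so P(max <= a) = F(a)**n = a**n."""
    return a ** n


if __name__ == "__main__":
    n, a = 5, 0.9  # illustrative values only
    print("simulated:", empirical_max_cdf(n, a))
    print("formula:  ", theoretical_max_cdf(n, a))
```

The two printed numbers should land close to each other, and the gap shrinks as `trials` grows.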

In order to get the p.d.f. of the maximum, we just take the derivative of the c.d.f., but I won't put everyone through that. But now, back to your regularly scheduled question.

The Single-Season Home Run Record

Over the 114 seasons since the American League was founded, there have been five players to hold the home run record, with Babe Ruth breaking his own record 3 times. Like the .400 hitters, we know their names, but their accomplishments deserve to be listed again.

| Player | HR | Season Record Set | Season Record Broken |
|---|---|---|---|
| Ned Williamson | 27 | 1884 | 1919 |
| Babe Ruth | 29 | 1919 | 1920 |
| Babe Ruth | 54 | 1920 | 1921 |
| Babe Ruth | 59 | 1921 | 1927 |
| Babe Ruth | 60 | 1927 | 1961 |
| Roger Maris | 61 | 1961 | 1998 |
| Mark McGwire | 70 | 1998 | 2001 |
| Barry Bonds | 73 | 2001 | Present |

So what are the chances a player breaks the record as it now stands? And no, the answer isn't HGH mixed with PEDs mixed with steroids to ramp up the chance of a broken record as the public might think. While unlikely, is it at least better than the chances of breaking .400?

The Chances of 74

Now, there is an inherent problem in dealing with raw HR totals. Specifically, a player's HR total is of course related to his number of plate appearances. The more chances you have to hit a home run, the more you'll hit (generally speaking). So, instead of looking at raw HR totals, we'll look at the league environment of HR/PA. This rate is again bounded in [0, 1] and can be modeled by a Beta distribution. And again, we're looking only at the HR/PA of qualified players; otherwise we may pick up too many seasons like Mike Hessman's 5 HRs in 31 PAs (16.1%) for the 2008 Tigers. (Hessman is he of 407 minor league HRs, whom I remember watching on the 2002-04 Richmond Braves.)

Depending on the number of PAs, a player needs a HR/PA of at least 12.3% (600 PAs), 11.4% (650 PAs), or 10.6% (700 PAs) to reach 74 home runs. Since Bonds set the mark in 2001, we'll focus on the 13 seasons since then. First, we need to estimate the parameters of the Beta distribution for each year, which is again done by matching each season's sample mean and variance of HR/PA to the mean and variance of a Beta distribution. This gives us the information to get the distribution of the maximum HR/PA for that season.
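The two calculations in the paragraph above can be sketched in a few lines of Python. This is an illustration rather than the code used for the article: the function names are mine, and the list of HR/PA rates is made up purely to show the mechanics. The moment-matching step uses the standard Beta formulas, mean = α/(α+β) and variance = αβ/((α+β)²(α+β+1)), solved for α and β.

```python
from statistics import mean, variance


def hr_rate_needed(record_hr, plate_appearances):
    """HR/PA required to reach record_hr home runs in the given number of PAs."""
    return record_hr / plate_appearances


def fit_beta_by_moments(rates):
    """Match the sample mean and variance of the rates to a Beta(alpha, beta)."""
    m = mean(rates)
    v = variance(rates)  # sample variance (n - 1 denominator)
    if not 0 < v < m * (1 - m):
        raise ValueError("sample variance is incompatible with a Beta distribution")
    common = m * (1 - m) / v - 1  # this quantity equals alpha + beta
    return m * common, (1 - m) * common


if __name__ == "__main__":
    for pa in (600, 650, 700):
        print(pa, "PAs:", round(100 * hr_rate_needed(74, pa), 1), "% HR/PA")

    # Hypothetical HR/PA rates for a handful of qualified hitters (not real data).
    made_up_rates = [0.020, 0.030, 0.035, 0.040, 0.050, 0.060]
    alpha, beta = fit_beta_by_moments(made_up_rates)
    print("alpha =", alpha, "beta =", beta)
```

With the fitted α and β in hand, the c.d.f. of the season maximum follows from raising the Beta c.d.f. to the power of the number of qualified players, as in the derivation earlier.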

However, we aren't done yet. We need to convert that to home runs, so we need to account for the probability that a qualified player reaches at least 600, 650, or 700 plate appearances in that season. Assuming independence between the two events (reaching the PA threshold and having a HR/PA of x), we can get the following probabilities of breaking the HR record for each season from 2002-2014.

| Season | P(Record) |
|---|---|
| 2002 | 0.0229 |
| 2003 | 0.0183 |
| 2004 | 0.0271 |
| 2005 | 0.0205 |
| 2006 | 0.0478 |
| 2007 | 0.007 |
| 2008 | 0.008 |
| 2009 | 0.0156 |
| 2010 | 0.0082 |
| 2011 | 0.0052 |
| 2012 | 0.0043 |
| 2013 | 0.002 |
| 2014 | 0.0081 |

So clearly, the chances of this happening are pretty slim. For this season, the chance is still roughly double that of a .400 season, at a little under 1%. Interestingly, the highest chance of 74 we've seen was in 2006, which featured Ryan Howard and David Ortiz hitting 58 and 54 home runs, respectively. In that season's HR/PA environment, there was nearly a 5% chance of breaking the record. If that seems high to you, simulations show similar results, with 1000 simulated seasons yielding 42 record-breaking seasons (4.2%).
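For readers curious what a simulation along these lines might look like, here is a rough sketch in Python. It is not the simulation reported above, and every input is an assumption: the Beta parameters, the number of qualified players, and the uniform spread of plate appearances between 502 and 720 are all hypothetical placeholders. It also takes a shortcut by treating a player's home run total as HR/PA times PA rather than drawing it at random. HR/PA and PA are drawn independently, matching the independence assumption described earlier.

```python
import random


def simulate_record_chance(alpha, beta, n_players, record_hr=74,
                           pa_low=502, pa_high=720, n_seasons=1000, seed=0):
    """Fraction of simulated seasons in which at least one player reaches record_hr."""
    rng = random.Random(seed)
    broken = 0
    for _ in range(n_seasons):
        for _ in range(n_players):
            rate = rng.betavariate(alpha, beta)  # this player's HR/PA
            pa = rng.randint(pa_low, pa_high)    # PAs, drawn independently of rate
            if rate * pa >= record_hr:           # simplification: HR = rate * PA
                broken += 1
                break                            # one record-breaker is enough
    return broken / n_seasons


if __name__ == "__main__":
    # Hypothetical league environment, not fitted to any real season.
    print(simulate_record_chance(alpha=4.0, beta=110.0, n_players=150))
```

Swapping in α and β fitted to a real season's qualified hitters, along with a more realistic model for plate appearances, would bring the sketch closer to the analysis described here.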

Again, to conclude, we'll look at the most and least impressive HR/PA seasons relative to their expectation from the leaguewide environment. Not surprisingly, the best list is a homage to Babe Ruth, with Gavvy Cravath being the only non-Ruthian season in the top 9.

| Season | Player | HR/PA | P(Max>Actual) |
|---|---|---|---|
| 1920 | Babe Ruth | 0.08780488 | 0.04151227 |
| 1919 | Babe Ruth | 0.05350553 | 0.06106524 |
| 1926 | Babe Ruth | 0.07208589 | 0.09621733 |
| 1921 | Babe Ruth | 0.08513709 | 0.11546184 |
| 1915 | Gavvy Cravath | 0.03864734 | 0.15139745 |
| 2011 | Jose Bautista | 0.06564886 | 0.98498847 |
| 2004 | Adrian Beltre | 0.07305936 | 0.9878115 |
| 1986 | Rob Deer | 0.06043956 | 0.98910856 |
| 1970 | Johnny Bench | 0.06706408 | 0.98993891 |
| 2009 | Carlos Pena | 0.06842105 | 0.9927466 |

. . .

Data courtesy of FanGraphs.

Stephen Loftus is an editor at Beyond The Box Score. You can follow him on Twitter at @stephen__loftus.