Two weeks ago, Beyond the Box Score's Ryan Potter and I were looking to calculate Tom Seaver's ERA- for the first thirteen starts of his 1967 rookie season. The idea was then to set that figure alongside the Mets' newest hard-throwing phenom, Matt Harvey, expanding on Ryan's original comparison.
After mining the data, Ryan raised an interesting question: Should we compare Seaver's ERA through those first thirteen starts to the league average over all of 1967? Or just that first month?
My first impulse was to say that it wouldn't make much of a difference. But I soon realized that I really didn't know that for sure.
When we use metrics that adjust for run environment, like ERA- and wRC+, we are usually comparing extreme opposites: say, 1968 with 1998. Most of the time the run environment doesn't change much from one year to the next. Of course, most of the time isn't all of the time. As recently as 2010 we saw runs per game drop by almost a quarter of a run from one year to the next as we entered the so-called "year of the pitcher".
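For reference, ERA- is essentially a pitcher's ERA expressed as a percentage of the league's, so the choice of baseline matters. Here is a minimal Python sketch of that calculation, ignoring park adjustments; the numbers are hypothetical placeholders, not Seaver's actual figures.

```python
# Minimal sketch of the question above: the same pitcher ERA scored against
# two different league baselines. Park adjustments are ignored, and the
# figures below are hypothetical placeholders, not Seaver's actual numbers.

def era_minus(pitcher_era: float, league_era: float) -> float:
    """ERA-: pitcher ERA as a percentage of the league ERA (100 = average)."""
    return 100 * pitcher_era / league_era

pitcher_era = 2.50             # hypothetical ERA through thirteen starts
league_era_full_season = 3.50  # hypothetical league ERA over the full year
league_era_early = 3.75        # hypothetical league ERA over just that stretch

print(era_minus(pitcher_era, league_era_full_season))  # ~71
print(era_minus(pitcher_era, league_era_early))        # ~67
```

A shift of a few tenths of a run in the league baseline moves the resulting ERA- by several points, so the question of which baseline to use isn't purely academic.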
But what about from month-to-month? How much can run environment change within a single year?
Monthly Run Environments
Using the Retrosheet game files, I queried the average runs-per-game for each month going back to 1950, then compared them to the average runs-per-game for the entire season. These were the months that differed the most from their year averages:
Greatest outlier months by Runs/Game since 1950
# | Year | Month | R/G month | R/G year | Delta |
---|---|---|---|---|---|
1 | 1950 | June | 5.66 | 4.90 | 0.76 |
2 | 1950 | September | 4.31 | 4.90 | 0.59 |
3 | 1960 | April | 4.87 | 4.31 | 0.56 |
4 | 1951 | May | 5.06 | 4.53 | 0.53 |
5 | 1959 | April | 4.89 | 4.39 | 0.49 |
6 | 1995 | April | 5.32 | 4.85 | 0.47 |
7 | 1961 | April | 4.09 | 4.53 | 0.44 |
8 | 1974 | April | 4.54 | 4.12 | 0.42 |
9 | 1961 | July | 4.92 | 4.53 | 0.40 |
10 | 1951 | August | 4.15 | 4.53 | 0.38 |
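For anyone who wants to reproduce the comparison, here is roughly how the query works. This is a minimal pandas sketch rather than the exact code I ran; it assumes the Retrosheet game logs have been concatenated into a single headerless CSV (the file name is a placeholder) in the standard layout, where field 1 holds the date and fields 10 and 11 hold the visiting and home team runs.

```python
# Monthly R/G versus full-season R/G from Retrosheet game logs.
# Assumes the standard game log layout: field 1 = date (yyyymmdd),
# fields 10 and 11 = visiting and home runs. R/G = runs per team per game.
import pandas as pd

cols = {0: "date", 9: "visitor_runs", 10: "home_runs"}
games = pd.read_csv("gamelogs_1950_2012.csv", header=None, usecols=list(cols))
games = games.rename(columns=cols)

dates = games["date"].astype(str)
games["year"] = dates.str[:4].astype(int)
games["month"] = dates.str[4:6].astype(int)
games["rpg"] = (games["visitor_runs"] + games["home_runs"]) / 2  # per team-game

monthly = games.groupby(["year", "month"])["rpg"].mean().rename("rg_month")
yearly = games.groupby("year")["rpg"].mean().rename("rg_year")

deltas = monthly.reset_index().merge(yearly.reset_index(), on="year")
deltas["delta"] = (deltas["rg_month"] - deltas["rg_year"]).abs()
print(deltas.sort_values("delta", ascending=False).head(10))
```

Note that R/G here means runs per team per game, which is why the figures land in the four-to-five range rather than the nine-ish combined totals per contest.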
1950, the earliest year in the query, was a wild one. In the first month of that season, runs were generated at a rate of 4.8 per game. Later that summer, however, there was a furious boom in offensive production. Lineups were at their most dangerous in June when players crossed the plate at an average rate of 5.7 runs-per-game.
Offensive numbers then began to return to normal in the subsequent months of July (5.0) and August (4.7) of 1950, until finally reaching their lowest point of the season in September at just 4.3 runs-per-game.
This extreme swing in run environment from June to September amounted to a difference of roughly a run and a third per game. That is about the equivalent of going from the Year of the Pitcher in 1968 to the home run blitz led by Sosa and McGwire in 1998.
The following May, in 1951, offense surged in similar fashion, jumping just over half a run above the year's final average. The run environment again dropped significantly later that summer, retreating to just 4.2 runs-per-game in August.
We also see big swings in the pair of seasons exactly a decade after that in 1960-61. In fact, just two seasons make the top ten after 1961, and just one season occurs after 1974.
Fewer teams, fewer games, wilder swings
Naturally this raises the question: why were in-season run environments so volatile in the '50s and '60s? I imagine the answer is mainly a product of sample size.
With far fewer teams in the league in those days (just 16 prior to 1961) and fewer games in each season, these dramatic swings in the run environment were far more likely to occur. (A string of Aprils appears in the top ten for the same reason: opening day typically didn't take place until mid-April in those days, leaving those months especially short.)
This might lead some of our younger readers to believe that 1995, a year in which 28 teams played a 144-game schedule, is more of an outlier than any of those pre-expansion seasons, but that's not really the case. Remember that the 1995 season began late, with the strike of 1994 pushing opening day back to the 25th of April. Only 33 games were played that first month.
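To put a rough number on that sample-size effect, here is a quick simulation sketch of my own (not part of the original query). It draws per-team-game run totals from a simple Poisson model with a made-up league-average rate and compares how much a monthly R/G average wanders by chance in a short 16-team month versus a full 30-team month.

```python
# Rough illustration of the sample-size point: how far a monthly R/G average
# can drift purely by chance, for a short 16-team month vs. a full 30-team month.
# Per-team-game runs come from a Poisson with a made-up mean of 4.5; this is a
# crude stand-in for a real run distribution, used only to show the scaling.
import numpy as np

rng = np.random.default_rng(0)
true_rpg = 4.5

def monthly_spread(team_games: int, trials: int = 10_000) -> float:
    """Standard deviation of simulated monthly R/G around the true rate."""
    sims = rng.poisson(true_rpg, size=(trials, team_games)).mean(axis=1)
    return sims.std()

print(monthly_spread(16 * 14))  # short pre-expansion April: ~14 team-games per club
print(monthly_spread(30 * 27))  # full modern month: ~27 team-games per club
```

The spread of the simulated monthly averages roughly halves as the number of team-games grows, which fits with the pre-expansion months dominating that first table.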
So what if we limit the query to the seasons in which baseball has fielded a full 30 teams, eliminating the wild swings of those smaller samples? What kind of swings in run environment do we see then?
Greatest outlier months by Runs/Game since 1998
# | Year | Month | R/G month | R/G year | Delta |
---|---|---|---|---|---|
1 | 2006 | July | 5.17 | 4.86 | 0.31 |
2 | 2007 | April | 4.54 | 4.80 | 0.26 |
3 | 2008 | July | 4.90 | 4.65 | 0.25 |
4 | 2000 | April | 5.38 | 5.14 | 0.24 |
5 | 2009 | June | 4.37 | 4.61 | 0.24 |
6 | 2004 | August | 5.05 | 4.81 | 0.23 |
7 | 2009 | April | 4.84 | 4.61 | 0.23 |
8 | 2001 | June | 4.98 | 4.78 | 0.21 |
9 | 2011 | June | 4.09 | 4.28 | 0.20 |
10 | 2007 | September | 4.99 | 4.80 | 0.20 |
July of 2006 was the most outlier-ish month of all the outlier-y months of this 'modern' 30-team era. But even then the change wasn't nearly as dramatic as some of those from the 1950s. The wildest monthly swing of 1950 was still nearly two and a half times as large as that of July 2006. It would appear that with more teams and more games comes more stability in the in-season run environment.
Nevertheless, 2006 did see a massive drop from 5.2 runs-per-game in July to 4.7 the following August, and a half-run change in run environment from one month to the next is a bit eye-popping.
It might be worth exploring what events (if any) triggered this shift from one extreme to the other in such a short period of time, but that effort will have to wait for another post. It could just be randomness, or it could have been a cold front, a rash of injuries, or any number of things. But I am certainly intrigued.
One final thing
I've always been under the impression that offense in general stutters a bit in the opening months of the season. It's typically colder, batters may still be finding the 'rhythm' of their swings, and so on.
Seeing a number of April months exceed the yearly R/G average in that first table (despite the sample issues) prompted me to question that assumption as a final exercise in this inquiry.
For each of the six major months of the baseball season, I took the average change of the monthly run environment from its corresponding run environment for that year. Across all Aprils, I found a small 0.02 increase in runs-per-game on average.
This seemed odd, so I re-ran the query using only seasons since 1998 to avoid the effects of those shorter Aprils from the '50s and '60s. Yet there was still a 0.02 increase in the modern era as well:
Average monthly change in run environment:
Period | April | May | June | July | August | September |
---|---|---|---|---|---|---|
Since 1950 | +0.02 | +0.02 | +0.05 | +0.06 | -0.03 | -0.12 |
Since 1998 | +0.02 | -0.02 | 0.00 | +0.05 | +0.02 | -0.03 |
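For completeness, the table above can be produced by extending the earlier pandas sketch: group the signed monthly deviations by calendar month and average them. The snippet below reuses the `deltas` frame built after the first table and is, again, an illustration rather than the exact query.

```python
# Average signed deviation of each month's R/G from its season's R/G, grouped
# by calendar month (4 = April ... 9 = September). Reuses the `deltas` frame
# from the sketch after the first table.
for label, frame in [("Since 1950", deltas),
                     ("Since 1998", deltas[deltas["year"] >= 1998])]:
    sub = frame[frame["month"].between(4, 9)]
    dev = (sub["rg_month"] - sub["rg_year"]).groupby(sub["month"]).mean().round(2)
    print(label, dev.to_dict())
```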
So offenses do get a boost when temperatures rise in the heart of the summer months. June appears to be the bellwether for a season's 'true' run environment, while typically fewer cleats cross the plate during the final stretch in September.
Perhaps this is the effect of fatigue, or roster expansion, or possibly even an increase in favorable pitcher match-ups with playoff bids on the line (a stretch, I'm sure). Feel free to add your own theories in the comments.
. . .
Thanks to Baseball Heat Maps and Retrosheet for the data.
James Gentile writes about baseball at Beyond the Box Score and The Hardball Times. You can follow him on twitter @JDGentile.