It started so innocently with a random tweet:
@ScottLindholm What about the lower limit of ERA in lower K environment re: ERA+? Are pitchers of 60s and 70s unfairly hurt?
— Michael Salfino (@MichaelSalfino) December 9, 2013
I had to think long and hard about this, since I really wasn't sure I understood the question. I begin by noting Michael Salfino writes at Yahoo! Sports and the Wall Street Journal sports page. When I received his tweet, I had to figure out what he meant--sometimes 140 characters isn't enough, believe it or not.
I'm still not sure I got it right, but he'll have proofread this so I'll be close. Over a series of emails I gradually began to grasp his argument, and it's a simple one. Power pitchers of the past were just as dominant in their eras as power pitchers of today, but it isn't reflected in their K/PA rates because strikeouts were less common then. Michael's argument was to normalize K/PA to reflect this change. As such, this is my attempt at creating K/PA+ to normalize strikeout rates across eras.
It won't be perfect. My basic formula is simple:
100 * ((KO/BF) / ((lgKO-KO) / (lgBF-BF)))
It looks terrible when written and I feel like it's missing a parenthesis. It's a pitcher's strikeout rate per batter faced divided by the league strikeout rate, which is corrected by removing the pitcher's totals from the league totals. It's straightforward for most years, but if a pitcher changed leagues mid-season, I used the totals of the league in which he pitched the most innings.
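A minimal Python sketch of the formula above may make the parentheses easier to follow. The function name `k_pa_plus` and the sample totals are hypothetical and are not drawn from the dataset; the only thing taken from the article is the formula itself.

```python
def k_pa_plus(ko, bf, lg_ko, lg_bf):
    """Pitcher strikeout rate relative to the rest of his league (100 = league average).

    ko, bf       -- the pitcher's strikeouts and batters faced
    lg_ko, lg_bf -- league strikeouts and batters faced, pitcher included
    """
    pitcher_rate = ko / bf
    # Remove the pitcher's own totals so he is not compared against himself.
    rest_of_league_rate = (lg_ko - ko) / (lg_bf - bf)
    return 100 * pitcher_rate / rest_of_league_rate


# Hypothetical season: 300 K in 1,000 BF, in a league with 10,300 K in 51,000 BF.
# The rest of the league struck out 10,000 of 50,000 batters (20%), so a 30%
# pitcher comes out at roughly 150.
print(round(k_pa_plus(300, 1000, 10300, 51000)))
```

For a pitcher who changed leagues mid-season, the league arguments would be the totals of the league in which he pitched the most innings, as described above.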
My dataset was drawn from FanGraphs and consists of all pitchers since around 1950 with at least 1,000 IP who started more than 50% of their career games. There are exceptions like Dennis Eckersley and Wilbur Wood who had substantial numbers of both starts and relief appearances. The sample totals 529 players who pitched a total of 6,859 seasons.
Why 1950?
That's when strikeout rates began to increase. I don't know why--no new parks were built, the breaking of the color barrier didn't dramatically change strategy and pitching specialization hadn't begun. This chart shows the rates of strikeouts and walks from 1901-2013 and is adapted from data at Baseball-Reference.com:
In the modern era the stigma of the strikeout has decreased, making hitters more willing to swing for the fences, particularly as new hitter-friendly parks were built. This is the crux of Michael's argument--power pitchers of the 1960s like Sandy Koufax, Bob Gibson and Tom Seaver are diminished when judged by raw K/9 numbers, and K/9 should be normalized like any other stat to reflect changes in the game.
An astute reader may have noticed I used K/PA earlier but switched to K/9 in the above sentence--that's how I had started this post but was instructed by someone that I:
"must (as in MUST) do it as K/PA and not K/IP. K/IP is tantamount to 'percentage of outs by strikeouts' times a constant (3 in this case)."
This person also said I should use PA since "PA is what is used for hitters. It really doesn't make sense to change the term based on offense or defense." And since the person who told me this useful information is Tom Tango, K/PA it is.
This table shows the pitchers most helped by normalizing strikeout rates since around 1950:
To explain, Nolan Ryan faced 22,575 batters in his career and struck out 25.3% of them. During his time, his leagues (he never switched leagues mid-season) had strikeout rates of 14%. Dividing 25.3 by 14 and multiplying the result by 100 gives a value of 180 (there will be rounding differences).
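The Nolan Ryan arithmetic can be checked in a few lines of Python. This is only a sketch using the rounded percentages quoted above, so it lands slightly off the value computed from the unrounded totals; the variable names are illustrative.

```python
# Rounded career rates quoted in the text, in percent.
ryan_k_pct = 25.3    # Nolan Ryan's strikeouts per batter faced
league_k_pct = 14.0  # strikeout rate of his leagues over the same span

k_pa_plus = 100 * ryan_k_pct / league_k_pct
print(round(k_pa_plus, 1))  # 180.7 from the rounded inputs
```

The small gap between this result and 180 is the rounding difference mentioned above.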
The reason this is important is while strikeouts have always been a part of baseball, the incidence has increased. This means there are fewer balls in play, or fewer opportunities for errors or other miscues. This chart shows the percentage of balls in play since 1950:
As more PITCHf/x data becomes available, it will be very interesting to see if batter selectivity has decreased, or in other words, if batters are swinging more freely than in the past. Assuming this is the case (which I'm not stating), pitchers of the past are being "penalized" for pitching to batters who were more selective. I don't think it's a news flash that Sandy Koufax is among the best pitchers in baseball history, nor should there be much discussion regarding Pedro Martinez when next year's Hall of Fame balloting rolls around. No one should be surprised by seeing Randy Johnson or Roger Clemens at the top of the KO/PA+ list, but when pitchers like Bob Feller and Sam McDowell appear, it shows they were dominant in their day. Comparing them to modern pitchers by simply looking at K/PA rates without taking the rest of the league into context does them a disservice.
Normalized stats allow meaningful comparisons of newer players to veterans and the ability to make projections. Bill James used them frequently in his New Historical Baseball Abstract, and they were eye-opening to me as I began to appreciate how to correctly interpret baseball statistics.
Will the world stop spinning if KO/PA+ isn't adopted as a common statistic?
Of course not--one of the side benefits of doing the research was seeing that it's almost built into the FIP stat, and as such FIP-. Consider the FIP formula:
FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
When FIP is normalized against the league, a player's strikeouts, as well as home run, walk and HBP rates will be normalized as well. Does it hold up? This chart plots KO/PA+ vs. FIP-:
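To see why strikeouts come along for the ride when FIP is normalized, here is a rough Python sketch of the FIP formula above. The stat line and the constant of 3.10 are hypothetical, and `fip_minus_simple` is a plain ratio to league FIP that assumes no park adjustment, so it is a simplification and not the published FIP- calculation.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """FIP as written above: ((13*HR) + (3*(BB+HBP)) - (2*K)) / IP + constant."""
    return ((13 * hr) + (3 * (bb + hbp)) - (2 * k)) / ip + constant


def fip_minus_simple(pitcher_fip, league_fip):
    """Simplified league-relative FIP (assumption: no park adjustment).

    100 is league average and lower is better.
    """
    return 100 * pitcher_fip / league_fip


# Hypothetical season: 20 HR, 50 BB, 5 HBP, 200 K in 200 IP, constant of 3.10.
season_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200, constant=3.10)
print(round(season_fip, 3))  # 3.225

# The same line with 150 K instead of 200: fewer strikeouts raise FIP,
# and therefore raise the league-relative figure as well.
fewer_k_fip = fip(hr=20, bb=50, hbp=5, k=150, ip=200, constant=3.10)
print(round(fewer_k_fip, 3))  # 3.725
```

Because strikeouts enter FIP directly, any comparison of FIP against the league also compares strikeouts against the league, along with home runs, walks and hit batters.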
I show the r-squared value for those more statistically savvy than me (a very large population). The two appear to correlate well, suggesting some element of normalization is already in FIP- and making the adoption of KO/PA+ superfluous.
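For anyone who wants to reproduce that kind of check, r-squared for a simple two-variable comparison is the square of the Pearson correlation. The sketch below uses only the standard library, and the two short lists are made-up placeholders, not values from the dataset.

```python
from math import sqrt


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / sqrt(var_x * var_y)
    return r * r


# Placeholder values only: one KO/PA+ and one FIP- figure per pitcher.
ko_pa_plus = [90, 100, 115, 130, 150]
fip_minus = [108, 101, 95, 88, 80]
print(round(r_squared(ko_pa_plus, fip_minus), 3))
```

Since a higher KO/PA+ is better and a lower FIP- is better, the underlying correlation runs negative; squaring it removes the sign, which is why r-squared alone shows the strength of the relationship but not its direction.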
But it doesn't make the idea any less important. Strikeouts, like every stat in every sport, are only properly interpreted when compared to the era in which they occurred. Baseball changes all the time, and at this point, four months from the opening of the 2014 season, pitching seems to be on the rise. It won't stay that way forever. There are times when statistics don't tell the complete picture, and focusing on K/9 rates can be one such instance--look at the best all-time from B-R and see how top-heavy it is with recent pitchers. Normalizing helps put strikeout rates in the proper historical context.
All data courtesy of Baseball-Reference and FanGraphs.
Special thanks to Michael Salfino for the idea and for our discussions over the past month that helped me better understand his initial thought. I'm still not sure I get it. Also thanks to Tom Tango for his input.
Scott Lindholm is a web columnist for 670 The Score in Chicago. Follow him on Twitter at @ScottLindholm