The Distribution of Strikeout and Walk Rates Has Changed

Inspired by Tony Blengino's recent guest posts on FanGraphs (here and here), I wanted to apply his way of thinking to other areas of baseball analysis. We often describe a player relative to league average, with stats like OPS+ and FIP-, but we have to understand and accept that those numbers come with error bars. If one player has an OPS+ of 117, we can't say definitively that he is better than a player with an OPS+ of 116. OPS+ does leave out some offensive contributions, such as baserunning, but the larger point is that the randomness inherent in baseball introduces error into every measurement.

Blengino, who recently left an MLB front office, offers a different perspective on how to evaluate players: the standard deviation. We generally understand that an MLB front office has more information than we do, so getting free insight into how an MLB organization thinks is wonderful. To account for variation in performance, Blengino evaluates a player's skills by how many standard deviations above or below average he is. I want to apply the same methodology to starting pitchers (relievers are a different beast and can't really be lumped in with starters).
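Expressing a skill as "standard deviations above or below average" is the same as computing a z-score. The sketch below is in Python rather than the R used for this post, and the function name `sd_from_average` is hypothetical. It also assumes the sample standard deviation, since the post does not say which version Blengino uses.

```python
from statistics import mean, stdev


def sd_from_average(value: float, league_values: list[float]) -> float:
    """Return how many standard deviations `value` sits above (+) or
    below (-) the mean of `league_values`, i.e. its z-score."""
    mu = mean(league_values)
    sigma = stdev(league_values)  # sample standard deviation (n - 1 denominator)
    return (value - mu) / sigma
```

For example, a pitcher at 30 when the league values are 10, 20, and 30 sits exactly one standard deviation above average.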

In this quest, I started with strikeout rates and walk rates. DIPS theory holds that these two outcomes (along with home runs allowed) reflect pitcher skill, which means that pitchers can differentiate themselves on them. My eventual goal is to determine how the distribution of pitcher skill in strikeout and walk rates affects performance; for now, I will simply describe those distributions and how they have trended.

To build a sample for this analysis, I pulled season-by-season data for 1995-2013 and filtered to starting pitchers. To limit the noise introduced by small samples, I restricted the strikeout sample to pitchers with at least 70 TBF, the point at which research shows strikeout rate stabilizes, and the walk sample to pitchers with at least 170 TBF, the corresponding stabilization point for walk rate. I calculated walk rate as the sum of walks (excluding intentional walks) and HBP, divided by TBF, and strikeout rate as SO divided by TBF. Standard FanGraphs pages use plate appearances as the denominator for these rates instead, but using TBF should not change them by a noticeable amount.
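The sample construction above can be sketched as follows. This is a minimal Python illustration, not the author's R code: the `PitcherSeason` record, its field names, and the helper functions are hypothetical. Because standard BB totals include intentional walks, the sketch assumes an unintentional-walk count (BB minus IBB) is already available.

```python
from collections import defaultdict
from dataclasses import dataclass

# Stabilization thresholds cited in the post, in total batters faced.
K_MIN_TBF = 70
BB_MIN_TBF = 170


@dataclass
class PitcherSeason:
    """One starter's season line (hypothetical field names)."""
    name: str
    season: int
    tbf: int   # total batters faced
    so: int    # strikeouts
    ubb: int   # unintentional walks, i.e. BB minus IBB
    hbp: int   # hit by pitch


def strikeout_rates(rows: list[PitcherSeason]) -> list[float]:
    """K% = SO / TBF, keeping only starters with at least K_MIN_TBF."""
    return [r.so / r.tbf for r in rows if r.tbf >= K_MIN_TBF]


def walk_rates(rows: list[PitcherSeason]) -> list[float]:
    """Walk rate = (unintentional BB + HBP) / TBF, keeping only starters
    with at least BB_MIN_TBF."""
    return [(r.ubb + r.hbp) / r.tbf for r in rows if r.tbf >= BB_MIN_TBF]


def rates_by_season(rows, rate_fn):
    """Group rows by season, then apply rate_fn to each season's group."""
    by_season = defaultdict(list)
    for r in rows:
        by_season[r.season].append(r)
    return {season: rate_fn(group) for season, group in sorted(by_season.items())}
```

Applying the two thresholds separately means a pitcher with, say, 120 TBF contributes to the strikeout sample but not to the walk sample.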

As one might expect, the strikeout rate has risen over time; this is not new information. The standard deviation of the strikeout rate has also increased slightly over this period. Most importantly, the shape of the strikeout-rate distribution has changed. In the early years of this period, the distribution was noticeably skewed toward lower strikeout rates. In more recent years, it has become closer to normal (the familiar bell curve) and flatter, so fewer pitchers now fall within one standard deviation of the middle than before. From 1995 to 2013, pitchers have increasingly differentiated themselves by strikeout rate.
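The distribution shapes described above can be quantified with a few summary statistics per season. A sketch, assuming plain moment-based estimators (statistical packages, including several in R, offer slightly different skewness and kurtosis formulas) and taking "the middle" to mean the mean; the function name is hypothetical:

```python
from statistics import mean, pstdev


def shape_summary(rates: list[float]) -> dict[str, float]:
    """Describe one season's distribution of rates.

    Returns the population standard deviation, moment-based skewness,
    excess kurtosis (0 for a normal distribution; negative values mean a
    flatter, lighter-tailed shape), and the share of pitchers within one
    standard deviation of the mean.
    """
    mu = mean(rates)
    sd = pstdev(rates, mu)
    if sd == 0:
        raise ValueError("need at least two distinct rates")
    z = [(x - mu) / sd for x in rates]  # standardized rates
    n = len(rates)
    return {
        "sd": sd,
        "skewness": sum(v ** 3 for v in z) / n,
        "excess_kurtosis": sum(v ** 4 for v in z) / n - 3,
        "share_within_1sd": sum(1 for v in z if abs(v) <= 1) / n,
    }
```

For reference, about 68% of a normal distribution lies within one standard deviation of its mean.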

Walk rate shows the opposite trends from strikeout rate. It has actually decreased since 1995. Over time, its standard deviation has decreased slightly, and its distribution has become more skewed and more peaked: walk rates are skewed toward lower percentages, and more pitchers fall within one standard deviation of the middle. Walk rates vary more from year to year than strikeout rates do, but the general trend is still present. As more and more pitchers land in a similar range of walk rates, it is becoming harder for pitchers to differentiate themselves on walk rate.
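One simple way to check that a trend holds up despite year-to-year noise, though not necessarily how the author judged it, is to fit an ordinary least-squares line to a per-season statistic (for example, each season's walk-rate standard deviation) and compare the slope with the scatter around the line. The function name is hypothetical.

```python
from math import sqrt


def season_trend(stat_by_season: dict[int, float]) -> tuple[float, float]:
    """Fit stat = intercept + slope * season by ordinary least squares.

    Returns (slope per season, residual standard deviation). The residual
    SD measures year-to-year scatter around the fitted trend line.
    """
    seasons = sorted(stat_by_season)
    n = len(seasons)
    if n < 3:
        raise ValueError("need at least three seasons")
    ys = [stat_by_season[s] for s in seasons]
    x_bar = sum(seasons) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in seasons)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(seasons, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(seasons, ys)]
    # n - 2 degrees of freedom, because two parameters were estimated.
    resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, resid_sd
```

A formal test would compare the slope with its standard error; this sketch only reports the two quantities side by side.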

There are several potential reasons for the rise in strikeout rates and the decline in walk rates. According to FanGraphs, O-Swing% has increased while overall Swing% has stayed relatively stable, which means that Z-Swing% has decreased somewhat (assuming the share of pitches thrown in the zone has not changed much). If hitters are swinging at more pitches outside the zone, fewer of those pitches will be called balls and more strikes will be added to the count, whether fouls or whiffs. Like Swing%, SwStr% has stayed relatively stable; it is possible that there has been a change in whiff rate instead. These plate-discipline data do not go back to 1995, however. PITCHf/x data and similar tracking systems could have an effect, but hitters should have access to the same data, and PITCHf/x does not go back to 1995 either. Selection bias could also be involved: sabermetrics may have helped organizations recognize the value of strikeouts and walks and select pitchers who excel at striking batters out and avoiding walks, which would mean that pitchers with higher walk rates drop out of the data. It is also possible that low-strikeout pitchers can retain value through their batted-ball profile, while high-walk pitchers cannot, even with a favorable batted-ball profile. Perhaps organizations have better teaching programs and coaches, and pitchers have better stuff and better command as a result.
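The step from "O-Swing% up, Swing% flat" to "Z-Swing% down" follows from how the three rates combine. Writing Zone% for the share of pitches thrown in the strike zone (a quantity the post does not discuss), overall Swing% is a weighted average of the in-zone and out-of-zone swing rates. Holding Swing% and Zone% fixed, any rise in O-Swing% must therefore be offset by a fall in Z-Swing%:

```latex
\text{Swing\%} = \text{Zone\%}\cdot\text{Z-Swing\%} + (1-\text{Zone\%})\cdot\text{O-Swing\%}

\Delta\,\text{Z-Swing\%} = -\frac{1-\text{Zone\%}}{\text{Zone\%}}\;\Delta\,\text{O-Swing\%}
\qquad (\text{Swing\% and Zone\% held fixed})
```

If Zone% itself has fallen, more pitches are weighted toward the out-of-zone term, and the conclusion about Z-Swing% no longer follows automatically.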

The reasons above are by no means exhaustive; many other factors could explain the change in distribution. This research could go in several directions to test possible explanations. I plan to look at both batted-ball profiles and performance. Some of the candidate explanations, such as better coaching, are difficult to evaluate, so I will try to stick to questions that the data can answer. Perhaps these future analyses can help shed light on why strikeout rates have increased and walk rates have decreased.

Overall, the variation in strikeout rates has increased while the variation in walk rates has decreased: walk rates are becoming more homogeneous, and strikeout rates more heterogeneous. Today, pitchers can differentiate themselves more through strikeout rate than through walk rate.

*All data courtesy of FanGraphs. Analysis performed in RStudio.