For a pitcher, many factors go into getting strikeouts: command, control, velocity, movement, deception, entropy. Reduced to a single pitch, each of these can factor into generating a swing-and-miss. Conventional wisdom therefore says that the more swings and misses a pitcher gets, the more strikeouts they’ll get.
This piece started with a question about the correlation between swinging strikes (SwStr%) and strikeouts (K%). An extended sample from this season shows that the correlation is pretty strong (r = .81), giving us the obvious answer: yes, more swinging strikes result in more strikeouts. But that brings us to the second question that almost always comes up when examining correlations: who are the outliers?
The strong correlation between swinging-strike percentage and strikeout percentage made a linear regression worthwhile. In simpler terms, that means turning swinging-strike percentage numbers into strikeout percentage numbers. The formula for our “expected” strikeout percentage ended up being swinging-strike percentage multiplied by a coefficient of 1.77.
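As a rough illustration of that step, here is a minimal sketch of a one-coefficient fit. It assumes the regression was run without an intercept, which the formula above suggests but which isn't stated outright. The function names and the sample numbers are made up for the example; only the 1.77 coefficient comes from the analysis.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y ~ slope * x, with no intercept term."""
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0:
        raise ValueError("x needs at least one nonzero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sum_xx


def expected_k_pct(swstr_pct, coef=1.77):
    """Expected K% from SwStr%, using the article's coefficient by default."""
    return coef * swstr_pct


# Made-up illustrative values, NOT the real pitcher sample.
swstr = [8.0, 10.0, 12.0, 14.0, 16.0]
k_pct = [14.0, 18.0, 21.0, 25.0, 28.5]

slope = fit_through_origin(swstr, k_pct)
print(f"fitted coefficient on toy data: {slope:.2f}")
print(f"expected K% at 12% SwStr: {expected_k_pct(12.0):.1f}")
```

With the real sample in place of the toy lists, the fitted slope would be the number to compare against 1.77.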
Taking the difference between actual strikeout percentage and expected strikeout percentage gives us the outliers. Among the top over-performers are Chris Sale (+10.4 percent), Brad Peacock (+10.0 percent), Gerrit Cole (+9.7 percent), Rich Hill (+9.7 percent), and Brandon Woodruff (+9.4 percent). As for the top under-performers, there’s Sandy Alcantara (-3.2 percent), Brett Anderson (-2.1 percent), Zack Godley (-1.8 percent), Nick Margevicius (-1.6 percent), and Griffin Canning (-1.6 percent).
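The differential itself is just actual minus expected, sorted. A small sketch, assuming each pitcher is a (name, SwStr%, K%) tuple; the placeholder pitchers and their numbers are hypothetical, not taken from the sample above.

```python
def k_differentials(pitchers, coef=1.77):
    """Return (name, actual K% minus expected K%) pairs, biggest over-performer first.

    `pitchers` is an iterable of (name, swstr_pct, k_pct) tuples.
    """
    rows = []
    for name, swstr_pct, k_pct in pitchers:
        diff = k_pct - coef * swstr_pct
        rows.append((name, round(diff, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical pitchers for illustration only.
toy_pitchers = [
    ("Pitcher A", 10.0, 25.0),
    ("Pitcher B", 12.0, 20.0),
    ("Pitcher C", 14.0, 26.0),
]

for name, diff in k_differentials(toy_pitchers):
    print(f"{name}: {diff:+.1f}")
```

A positive number means more strikeouts than the swinging-strike rate alone would predict.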
While this analysis is useful, it comes with caveats: other factors go into a strikeout besides swinging strikes. Luckily, past research has tried to improve on the predictors of strikeout percentage, such as this piece from Mike Podhorzer over at RotoGraphs. Using strike percentage, looking-strike percentage, swinging-strike percentage, and foul-strike percentage, he developed an expected strikeout percentage that correlates with actual strikeout percentage from season to season better than actual strikeout percentage itself does.
“So in my data set, the YoY correlation of xK% was slightly higher than K%, both of which are pretty high. All the strike/strike type components also have high correlations, with swinging strike rate being the most stable skill.”
Using pitchers from this season as the sample, the correlation between actual strikeout percentage and Podhorzer’s expected strikeout percentage is pretty fantastic (r = .96), leaving a small number of outliers that include Cole (+3.1 percent), Frankie Montas (+2.8 percent), Derek Holland (-3.9 percent), and Homer Bailey (-3.5 percent).
This brings us to the hypothesizing stage. What causes pitchers to over- or under-perform their expected strikeout numbers? The first thing I thought of was an inefficient distribution of swinging strikes. Getting swinging strikes late in a count (let’s say with two strikes) is more important than getting them early in a count (with fewer than two strikes).
The league-average swinging-strike percentage comes in at 12 percent. In non-two-strike counts it drops to 10.7 percent; in two-strike counts, it sits at 15 percent. Comparing actual and expected strikeout percentage against swinging strikes split by count, the picture becomes clearer through the correlations:
- Non-2-Strike SwStr% to K%: 0.73
- 2-Strike SwStr% to K%: 0.75
- Non-2-Strike SwStr% to xK%: 0.80
- 2-Strike SwStr% to xK%: 0.72
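Expected strikeout percentage doesn’t seem to weigh swinging strikes by count correctly: its correlation with non-two-strike swinging strikes (0.80) is higher than actual strikeout percentage’s (0.73), while its correlation with two-strike swinging strikes (0.72) is lower than actual strikeout percentage’s (0.75).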
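For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a Pearson correlation run on each count split. The `pearson_r` helper is written out by hand so nothing beyond the standard library is needed, and the rows are invented numbers standing in for real pitcher data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented rows: (non-2-strike SwStr%, 2-strike SwStr%, K%).
toy_rows = [
    (9.5, 13.0, 19.0),
    (10.5, 16.0, 24.5),
    (11.0, 14.5, 22.0),
    (12.5, 18.0, 29.0),
    (8.0, 12.5, 17.5),
]

non_two_strike = [row[0] for row in toy_rows]
two_strike = [row[1] for row in toy_rows]
k_pct = [row[2] for row in toy_rows]

print(f"non-2-strike SwStr% to K%: {pearson_r(non_two_strike, k_pct):.2f}")
print(f"2-strike SwStr% to K%:     {pearson_r(two_strike, k_pct):.2f}")
```

Swapping the K% column for an expected strikeout percentage column gives the other two correlations in the list.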
To better understand swinging strikes broken down by count, we’ll look at the correlation between the two splits. For the most part, pitchers distribute their swinging strikes somewhat evenly, with a moderate correlation between their non-two-strike and two-strike swinging-strike percentages (r = .51).
This still leaves room for plenty of outliers, though. Among qualified pitchers, the biggest standout is Homer Bailey, whose swinging-strike percentage is actually higher in non-two-strike counts (11.8 percent) than in two-strike counts (11.5 percent). The 0.3-point differential may not sound like a lot, but it’s important to remember that the league on average gets 4.3 percentage points more swinging strikes in two-strike counts than in non-two-strike counts.
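The arithmetic behind that comparison can be sketched in a few lines. The league and Bailey figures are the ones quoted above; the function names are just labels chosen for this example.

```python
# League-average SwStr% by count, as quoted above.
LEAGUE_NON_TWO_STRIKE = 10.7
LEAGUE_TWO_STRIKE = 15.0


def split_gap(non_two_strike, two_strike):
    """Two-strike SwStr% minus non-two-strike SwStr%, in percentage points."""
    return two_strike - non_two_strike


def gap_vs_league(non_two_strike, two_strike):
    """How far a pitcher's split gap sits above or below the league's gap."""
    league_gap = split_gap(LEAGUE_NON_TWO_STRIKE, LEAGUE_TWO_STRIKE)
    return split_gap(non_two_strike, two_strike) - league_gap


# Homer Bailey's splits from the paragraph above.
bailey_gap = split_gap(11.8, 11.5)
bailey_vs_league = gap_vs_league(11.8, 11.5)
print(f"Bailey gap: {bailey_gap:+.1f} points")
print(f"Bailey vs. league gap: {bailey_vs_league:+.1f} points")
```

Measured this way, Bailey sits roughly 4.6 percentage points short of the league’s typical two-strike bump.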
Not surprisingly, Bailey was also the second-biggest under-performer in actual strikeout percentage. Because of this match, I decided to evaluate the relationship in a larger sample.
Looking back at the simple expected strikeout percentage derived from swinging-strike percentage, we see that the outliers there match up decently with the outliers in the non-two-strike/two-strike swinging-strike percentage differential (r = .27).
Turning to Podhorzer’s expected strikeout percentage, the same can be said, and to an even larger extent (r = .48).
Now, I can’t say exactly how pitchers who suffer from an inefficient distribution of swinging strikes can solve their issues. Maybe some of them hold back their best swing-and-miss offerings late in counts for unknown reasons. Maybe there are some who suffer from command issues. Maybe it’s a sequencing issue. All we know is that there are pitchers out there who do this, and it’s feasible that solving these issues could help them add a couple of percentage points to their strikeout rate.