There may be many ways to skin a cat, train a fly, kill a man, etc, but it appears that there is only one solid and reliable way to strike batters out – to make them whiff on pitches.
Of course, this isn’t exactly brain wrinkling, since your only ways to finish a strikeout are with a swinging or a looking strike, and the only ways to get strikes are swings, fouls, or looks.
Still, when I ran the numbers for 2012 and for the period from 2007-2012, I was stunned at just how extreme the results were. It seems that no statistic, sabermetric or otherwise, can effectively explain variance in K% (the percentage of plate appearances that end in a strikeout)…except for Swinging Strike % (SwStr%, from FanGraphs) and Whiff/Swing (from Baseball Prospectus). Since Whiff/Swing is new (well, the leaderboards are) and performed slightly better than SwStr% at explaining K% variation, when I refer to whiffs in this article, I’ll be referring to BP’s version.
For clarification, SwStr% is the percentage of total pitches a batter swings at and misses, while Whiff/Swing is the percentage of total swings a batter misses on.
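To make the difference between the two rates concrete, here is a minimal sketch that computes both from raw counts. The function names and the pitch counts are made up for illustration; they are not taken from FanGraphs or Baseball Prospectus.

```python
def swstr_pct(whiffs: int, pitches: int) -> float:
    """Swinging strikes as a share of ALL pitches thrown."""
    return whiffs / pitches


def whiff_per_swing(whiffs: int, swings: int) -> float:
    """Swinging strikes as a share of pitches the batter swung at."""
    return whiffs / swings


# Hypothetical pitcher: 1,000 pitches, 450 swings, 100 swings and misses.
pitches, swings, whiffs = 1000, 450, 100

print(f"SwStr%:      {swstr_pct(whiffs, pitches):.1%}")       # 10.0%
print(f"Whiff/Swing: {whiff_per_swing(whiffs, swings):.1%}")  # 22.2%
```

The same number of whiffs gives a much larger Whiff/Swing than SwStr% because the denominator shrinks from all pitches to swings only.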
Method
Using FanGraphs’ custom leaderboards and Baseball Prospectus' Whiff/Swing, I ran regressions for 2012 for K% against percentage of fastballs thrown (FA%), percentage of sliders thrown (SL%), average fastball velocity (vFA), overall strike rate (Strike%), overall swing rate (Swing%), swing rate on pitches outside the zone (O-Swing%), first strike rate (F-Strike%), horizontal and vertical pitch movement (H Mov and V Mov, respectively), walk rate (BB%) and finally, SwStr% and Whiff/Swing.
For 2012, I used 40 innings pitched as the cut-off (or approximately 500 pitches). Later, when I discuss the data for 2007-2012 (this is as far back as Dan Brooks’ excellent PITCHf/x work goes), I use 200 innings pitched as the cut-off.
I understand that for a strikeout analysis, I perhaps should have included all pitchers, and I can do that in the future if there is a compelling case for it, but for now those were the cut-off points I used.
Disclaimer
I’m a bit rusty on my regression analysis, but this Baseball Prospectus piece from Matt Swartz from a few seasons back seems to confirm my findings that swinging strike rates are highly correlated with strikeouts. Matt’s analysis focused more on predicting next-year strikeout rates (his findings were that once a baseline K% is established, SwStr% doesn't tell you too much else), but my aim was simply to explain the anatomy of strikeouts (that is, this is descriptive, not predictive, for now).
Results
For the single-year regression on 2012 pitchers, Whiff/Swing was the strongest of the indicators I looked at. It doesn’t sit right with me how low the R-squareds are for the other statistics, so perhaps I erred somewhere, but it’s possible that pitchers can manage strikeouts regardless of their repertoire or overall locating abilities, so long as they have swing-and-miss stuff. The table below shows the results of regressing K% on each indicator.
Correlation with K%
Stat | R² | Standard Error |
Whiff/Swing | 0.656 | 0.034 |
SwStr% | 0.636 | 0.034 |
FA% (pfx) | 0.108 | 0.054 |
Strike% | 0.047 | 0.056 |
O-Swing% | 0.046 | 0.056 |
Swing% | 0.039 | 0.056 |
V Mov | 0.025 | 0.056 |
BB% | 0.020 | 0.056 |
vFA (pfx) | 0.020 | 0.056 |
SL% (pfx) | 0.013 | 0.057 |
F-Strike% | 0.004 | 0.057 |
H Mov | 0.001 | 0.057 |
All | 0.786 | 0.027 |
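For anyone who wants to reproduce a single-stat row of a table like the one above, here is a rough sketch of a one-variable least-squares fit in plain Python. It assumes the "Standard Error" column is the regression's residual standard error (the article does not say which standard error was reported), and the five data points are invented purely to show the mechanics.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return slope, intercept, R^2, residual std error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # n - 2 degrees of freedom: two fitted parameters (slope and intercept)
    std_err = math.sqrt(ss_res / (n - 2))
    return slope, intercept, r_squared, std_err


# Made-up Whiff/Swing and K% values (as fractions), NOT real pitcher data.
whiff = [0.15, 0.20, 0.25, 0.30, 0.35]
k_pct = [0.14, 0.17, 0.23, 0.25, 0.31]

slope, intercept, r_squared, std_err = simple_ols(whiff, k_pct)
print(f"K% = {intercept:.4f} + {slope:.4f} * Whiff/Swing")
print(f"R^2 = {r_squared:.3f}, standard error = {std_err:.4f}")
```

Running the same function once per indicator, with K% as `y` each time, would give one R-squared and one standard error per row. The "All" row needs a multiple regression, which this sketch does not cover.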
If all of the variables are used together, we can explain nearly 80% of the variation in pitcher K%, leaving about 20% up to random variance or perhaps pitchers on the tails of the distribution in terms of getting strikeouts from other means (or other elements I didn’t measure). When I looked at all of the years from 2007-2012, the same story holds.
Correlation with K%, 2007-2012
Stat | R² | Standard Error |
Whiff/Swing | 0.687 | 0.025 |
vFA (pfx) | 0.216 | 0.039 |
FA% (pfx) | 0.130 | 0.041 |
BB% | 0.077 | 0.043 |
O-Swing% | 0.076 | 0.043 |
SL% (pfx) | 0.025 | 0.044 |
Strike% | 0.014 | 0.044 |
Swing% | 0.013 | 0.044 |
V Mov | 0.008 | 0.044 |
H Mov | 0.004 | 0.044 |
F-Strike% | 0.000 | 0.044 |
All | 0.858 | 0.017 |
Over the longer term, our factors explain even more of the variance in strikeout rate, and Whiff/Swing is even stronger on its own, explaining 69% (haha, 69) of the variance. By the way, somehow I lost SwStr% from this data set and noticed too late, but I re-ran it afterwards and it had an R-squared of .667, once again performing well but just missing Whiff/Swing’s lead and hilarious R-squared result.
Using the formula our regression spits out for predicting K% from Whiff/Swing, we can develop an "Expected K%" in very rough terms: xK% = 0.007502 + (0.85006 × Whiff/Swing). You can see the graph below for actual K% and Whiff/Swing with the trend line that roughly denotes our "expected K%" or xK%.
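As a quick sanity check on that formula, here is a small sketch that plugs a Whiff/Swing rate into it. The coefficients are the ones quoted above; the function and constant names are just illustrative.

```python
# Coefficients from the regression formula quoted in the article.
INTERCEPT = 0.007502
SLOPE = 0.85006


def expected_k_pct(whiff_per_swing: float) -> float:
    """Rough expected K% (as a fraction) from Whiff/Swing (as a fraction)."""
    return INTERCEPT + SLOPE * whiff_per_swing


# Craig Kimbrel's 2012 Whiff/Swing of 42.17%
print(f"{expected_k_pct(0.4217):.2%}")  # 36.60%
```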
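Taking this one step further and using our (admittedly rough) xK%, we can identify some outliers. These are pitchers who may have been lucky or unlucky in turning their whiffs into strikeouts, or who may have some way of pushing their K% above or below what their whiff ability alone would suggest. Diff is the absolute gap between K% and xK%, so the list includes pitchers who fell short of their xK% (Tom Gorzelanny, for example) as well as those who beat it.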
Name | K% | Whiff/Swing | xK% | Diff |
Craig Kimbrel | 50.20% | 42.17% | 36.60% | 13.60% |
Brad Lincoln | 24.30% | 14.45% | 13.03% | 11.27% |
Jake McGee | 34.40% | 26.57% | 23.34% | 11.06% |
Kenley Jansen | 39.30% | 32.85% | 28.67% | 10.63% |
Tom Gorzelanny | 20.30% | 35.48% | 30.91% | 10.61% |
David Robertson | 32.70% | 25.37% | 22.32% | 10.38% |
Jason Grilli | 36.90% | 30.65% | 26.80% | 10.10% |
Aroldis Chapman | 44.20% | 40.00% | 34.75% | 9.45% |
Antonio Bastardo | 36.20% | 30.93% | 27.04% | 9.16% |
Sean Doolittle | 31.40% | 25.78% | 22.66% | 8.74% |
David Hernandez | 35.30% | 30.90% | 27.02% | 8.28% |
Vicente Padilla | 23.40% | 17.85% | 15.92% | 7.48% |
Ernesto Frieri | 36.40% | 33.22% | 28.99% | 7.41% |
Casey Janssen | 27.70% | 23.10% | 20.39% | 7.31% |
Not surprisingly, all of the pitchers here are relievers, given that their samples are smaller. The table below is filtered for starters only.
Name | K% | Whiff/Swing | xK% | Diff |
Mike Fiers | 25.10% | 20.68% | 18.33% | 6.77% |
Cliff Lee | 24.40% | 20.73% | 18.37% | 6.03% |
Stephen Strasburg | 30.20% | 27.88% | 24.45% | 5.75% |
David Price | 24.50% | 21.39% | 18.93% | 5.57% |
Marco Estrada | 25.40% | 23.08% | 20.37% | 5.03% |
David Phelps | 23.20% | 20.92% | 18.53% | 4.67% |
Travis Blackley | 16.00% | 23.26% | 20.52% | 4.52% |
Derek Lowe | 8.60% | 14.39% | 12.98% | 4.38% |
Vance Worley | 18.10% | 15.36% | 13.81% | 4.29% |
Alex White | 13.90% | 20.47% | 18.15% | 4.25% |
Max Scherzer | 29.40% | 28.71% | 25.16% | 4.24% |
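The outlier tables can be rebuilt along these lines. This is only a sketch: it uses four rows copied from the tables above, keeps the sign of the difference (the tables show its absolute value), and the variable names are my own.

```python
def expected_k_pct(whiff_per_swing: float) -> float:
    """xK% from the article's formula: 0.007502 + 0.85006 * Whiff/Swing."""
    return 0.007502 + 0.85006 * whiff_per_swing


# (name, actual K%, Whiff/Swing) as fractions, taken from the tables above.
pitchers = [
    ("Mike Fiers", 0.251, 0.2068),
    ("Tom Gorzelanny", 0.203, 0.3548),
    ("Derek Lowe", 0.086, 0.1439),
    ("Craig Kimbrel", 0.502, 0.4217),
]

rows = []
for name, k_pct, whiff in pitchers:
    xk = expected_k_pct(whiff)
    rows.append((name, k_pct, xk, k_pct - xk))  # signed difference

# Rank by the size of the gap, regardless of direction.
rows.sort(key=lambda row: abs(row[3]), reverse=True)

for name, k_pct, xk, diff in rows:
    print(f"{name:<16} K% {k_pct:6.2%}  xK% {xk:6.2%}  diff {diff:+.2%}")
```

Keeping the sign makes it easier to see that Kimbrel and Fiers beat their xK% while Gorzelanny and Lowe fell short of it.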
Takeaways
Our own Glenn DuPaul has published a lot of research of late on simple K- and BB-based ERA estimators (including his new predictive FIP), so it’s becoming more valuable to identify what goes into striking batters out. The share of swings that result in misses appears to be the best indicator of strikeout performance (aka "dominance").
Items of Interest
*The highest Whiff/Swing rates belonged to Craig Kimbrel (42.17%) and Aroldis Chapman (40.00%), which is maybe the least surprising result of any study you'll read this year.
*The lowest Whiff/Swing rate belonged to Aaron Cook at 8.92%, making him the only pitcher below 11%. Basically with him, if you swing, you're Marco Scutaro in terms of contact ability.
*Tyler Chatwood, Bobby Parnell, and Ben Sheets are the poster boys for this formula, as each of their actual K% came within .05 of their xK%.
*While I had the data in front of me, I thought I'd look at how First Pitch Strike % matches up with BB%, and was surprised it had an R-squared of just .32. Overall Strike % actually only had a .56 R-squared value, so it seems there is no answer as obvious as "swing and miss" for BB%, which makes sense intuitively.