WHIFF! Strikeout Rates Explained

Predicting strikeout rate is difficult, and there's really only one variable you need to look at - Whiffs per Swing.

Leon Halip - Getty Images

There may be many ways to skin a cat, train a fly, kill a man, etc., but it appears that there is only one solid and reliable way to strike batters out – make them whiff on pitches.

Of course, this isn’t exactly brain wrinkling, since your only ways to finish a strikeout are with a swinging or a looking strike, and the only ways to get strikes are swings, fouls, or looks.

Still, when I ran the numbers for 2012 and the period from 2007-2012, I was stunned at just how extreme the results were. It seems that no statistic, sabermetric or otherwise, can effectively explain variance in K% (the percentage of plate appearances that end in a strikeout)…except for Swinging Strike % (from FanGraphs) and Whiff/Swing (from Baseball Prospectus). Since Whiff/Swing is new (well, the leaderboards are) and performed slightly better than SwStr% at explaining K% variation, when I refer to whiffs in this article, I’ll be referring to BP’s version.

For clarification, SwStr% is the percentage of a pitcher's total pitches that batters swing at and miss, while Whiff/Swing is the percentage of total swings against him that miss.
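To make the difference between the two denominators concrete, here is a minimal sketch. The function names and the pitch counts are made up for illustration; they are not FanGraphs or Baseball Prospectus data.

```python
def swstr_pct(whiffs: int, pitches: int) -> float:
    """Swinging strikes as a share of ALL pitches thrown (SwStr%)."""
    return whiffs / pitches


def whiff_per_swing(whiffs: int, swings: int) -> float:
    """Swinging strikes as a share of swings only (Whiff/Swing)."""
    return whiffs / swings


# Hypothetical pitcher: 1,000 pitches, 450 swings, 100 swings and misses.
pitches, swings, whiffs = 1000, 450, 100

print(f"SwStr%      = {swstr_pct(whiffs, pitches):.1%}")       # 10.0%
print(f"Whiff/Swing = {whiff_per_swing(whiffs, swings):.1%}")  # 22.2%
```

The same 100 whiffs look very different depending on the denominator, which is why the two stats can rank pitchers slightly differently even though they track the same skill.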

Method

Using FanGraphs’ custom leaderboards and Baseball Prospectus' Whiff/Swing, I ran regressions for 2012 for K% against percentage of fastballs thrown (FA%), percentage of sliders thrown (SL%), average fastball velocity (vFA), overall strike rate (Strike%), overall swing rate (Swing%), swing rate on pitches outside the strike zone (O-Swing%), first strike rate (F-Strike%), horizontal and vertical pitch movement (H Mov and V Mov, respectively), walk rate (BB%) and finally, SwStr% and Whiff/Swing.

For 2012, I used 40 innings pitched as the cut-off (or approximately 500 pitches). Later, when I discuss the data for 2007-2012 (this is as far back as Dan Brooks’ excellent PitchFX work goes), I used 200 innings pitched as the cut-off.

I understand that for a strikeout analysis, I perhaps should have included all pitchers, and I can do that in the future if there is a compelling case for it, but for now those were the cut-off points I used.
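For readers who want to try this at home, each single-variable regression described above boils down to something like the sketch below. It assumes ordinary least squares with one predictor, and it assumes the "Standard Error" reported later is the standard error of the regression (the typical spreadsheet output); the article doesn't specify either. The sample numbers are invented for illustration, not real pitcher data.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared, std_error), where std_error is
    the standard error of the regression: sqrt(SS_residual / (n - 2)).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    std_error = math.sqrt(ss_res / (n - 2))
    return intercept, slope, r_squared, std_error


# Made-up sample: Whiff/Swing and K% as decimals for seven imaginary pitchers.
whiff = [0.15, 0.18, 0.20, 0.23, 0.26, 0.30, 0.35]
k_pct = [0.14, 0.17, 0.16, 0.22, 0.21, 0.28, 0.30]

a, b, r2, se = simple_ols(whiff, k_pct)
print(f"K% = {a:.4f} + {b:.4f} * Whiff/Swing   R^2 = {r2:.3f}   SE = {se:.3f}")
```

Swapping in a different column for `whiff` (FA%, vFA, and so on) gives the one-stat-at-a-time comparison; the innings cut-off would be applied by filtering the pitcher list before fitting.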

Disclaimer

I’m a bit rusty on my regression analysis, but this Baseball Prospectus piece from Matt Swartz from a few seasons back seems to confirm my findings that swinging strike rates are highly correlated with strikeouts. Matt’s analysis focused more on predicting next-year strikeout rates (his findings were that once a baseline K% is established, SwStr% doesn't tell you too much else), but my aim was simply to explain the anatomy of strikeouts (that is, this is descriptive, not predictive, for now).

Results

For the single-year regression for 2012 pitchers, Whiff/Swing was the strongest of any of the indicators that I looked at. It doesn’t sit right with me how low the R-squared values are for the other statistics, so perhaps I erred somewhere, but it’s possible that pitchers can manage strikeouts regardless of their repertoire or overall locating abilities, so long as they have swing-and-miss stuff. The chart below shows the results for the regression comparing K% to each indicator.

Correlation with K%

Stat          R²      Standard Error
Whiff/Swing   0.656   0.034
SwStr%        0.636   0.034
FA% (pfx)     0.108   0.054
Strike%       0.047   0.056
O-Swing%      0.046   0.056
Swing%        0.039   0.056
V Mov         0.025   0.056
BB%           0.020   0.056
vFA (pfx)     0.020   0.056
SL% (pfx)     0.013   0.057
F-Strike%     0.004   0.057
H Mov         0.001   0.057
All           0.786   0.027

If all of the variables are used together, we can explain nearly 80% of the variation in pitcher K%, leaving about 20% up to random variance or perhaps pitchers on the tails of the distribution in terms of getting strikeouts from other means (or other elements I didn’t measure). When I looked at all of the years from 2007-2012, the same story holds.

Correlation with K%, 2007-2012

Stat          R²      Standard Error
Whiff/Swing   0.687   0.025
vFA (pfx)     0.216   0.039
FA% (pfx)     0.130   0.041
BB%           0.077   0.043
O-Swing%      0.076   0.043
SL% (pfx)     0.025   0.044
Strike%       0.014   0.044
Swing%        0.013   0.044
V Mov         0.008   0.044
H Mov         0.004   0.044
F-Strike%     0.000   0.044
All           0.858   0.017

Here we can explain even more of the variance in strikeout rate over the longer term with our factors, and Whiff/Swing is even more significant, explaining 69% (haha, 69) of the variance. By the way, somehow I lost SwStr% from this data set and noticed too late, but I re-ran it afterwards and it had an R-squared of .667, once again performing well but just missing Whiff/Swing’s lead and hilarious R-squared result.

Using the formula our regression spits out for using Whiff/Swing to predict K%, we can develop a very rough "Expected K%" (xK%): xK% = 0.007502 + (0.85006 × Whiff/Swing). The graph below plots actual K% against Whiff/Swing, with a trend line that roughly denotes our xK%.

[Figure: MLB strikeout rate (K%) plotted against whiff rate (Whiff/Swing), with trend line]
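As a quick worked example, the xK% formula can be written as a one-line function. The intercept and slope are the ones quoted in the article; the function and constant names are mine. The inputs are the Whiff/Swing figures the article lists for Craig Kimbrel and Brad Lincoln.

```python
# Coefficients quoted in the article's regression formula.
INTERCEPT = 0.007502
SLOPE = 0.85006


def expected_k_pct(whiff_per_swing: float) -> float:
    """Rough expected strikeout rate (xK%) from Whiff/Swing, both as decimals."""
    return INTERCEPT + SLOPE * whiff_per_swing


print(f"Kimbrel xK% = {expected_k_pct(0.4217):.2%}")  # 36.60%
print(f"Lincoln xK% = {expected_k_pct(0.1445):.2%}")  # 13.03%
```

Note that the slope is below 1, so every extra point of Whiff/Swing buys a bit less than a point of expected K%.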

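Taking this one step further and using our (admittedly rough) xK%, we can identify some outliers. These are pitchers who may have been lucky or unlucky in turning their whiffs into strikeouts, or who may belong to a group of pitchers able to push their K% above or below what their whiff ability alone would suggest. Diff is the absolute gap between K% and xK%, so the list includes pitchers who fell short of their xK% (Tom Gorzelanny, for example) alongside those who beat it.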

Name               K%       Whiff/Swing   xK%      Diff
Craig Kimbrel      50.20%   42.17%        36.60%   13.60%
Brad Lincoln       24.30%   14.45%        13.03%   11.27%
Jake McGee         34.40%   26.57%        23.34%   11.06%
Kenley Jansen      39.30%   32.85%        28.67%   10.63%
Tom Gorzelanny     20.30%   35.48%        30.91%   10.61%
David Robertson    32.70%   25.37%        22.32%   10.38%
Jason Grilli       36.90%   30.65%        26.80%   10.10%
Aroldis Chapman    44.20%   40.00%        34.75%   9.45%
Antonio Bastardo   36.20%   30.93%        27.04%   9.16%
Sean Doolittle     31.40%   25.78%        22.66%   8.74%
David Hernandez    35.30%   30.90%        27.02%   8.28%
Vicente Padilla    23.40%   17.85%        15.92%   7.48%
Ernesto Frieri     36.40%   33.22%        28.99%   7.41%
Casey Janssen      27.70%   23.10%        20.39%   7.31%

Not surprisingly, all of the pitchers here are relievers, given that their samples are smaller. The chart below is filtered for starters only.

Name                K%       Whiff/Swing   xK%      Diff
Mike Fiers          25.10%   20.68%        18.33%   6.77%
Cliff Lee           24.40%   20.73%        18.37%   6.03%
Stephen Strasburg   30.20%   27.88%        24.45%   5.75%
David Price         24.50%   21.39%        18.93%   5.57%
Marco Estrada       25.40%   23.08%        20.37%   5.03%
David Phelps        23.20%   20.92%        18.53%   4.67%
Travis Blackley     16.00%   23.26%        20.52%   4.52%
Derek Lowe          8.60%    14.39%        12.98%   4.38%
Vance Worley        18.10%   15.36%        13.81%   4.29%
Alex White          13.90%   20.47%        18.15%   4.25%
Max Scherzer        29.40%   28.71%        25.16%   4.24%
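The Diff column in both outlier tables can be reproduced with a short sketch like the one below. It reuses the article's xK% formula and four rows from the starters table; the variable names and the "above"/"below" labels are my own additions, used to show which direction each pitcher missed in, since the tables rank by the size of the gap only.

```python
def expected_k_pct(whiff_per_swing: float) -> float:
    """xK% from the article's formula, with inputs and output as decimals."""
    return 0.007502 + 0.85006 * whiff_per_swing


# (name, K%, Whiff/Swing) copied from the starters table, as decimals.
starters = [
    ("Derek Lowe", 0.086, 0.1439),
    ("Mike Fiers", 0.251, 0.2068),
    ("Travis Blackley", 0.160, 0.2326),
    ("Stephen Strasburg", 0.302, 0.2788),
]

# Signed gap: positive means more strikeouts than the whiff rate suggests.
rows = [(name, k - expected_k_pct(w)) for name, k, w in starters]

# Rank by the size of the gap, ignoring direction, as the tables do.
rows.sort(key=lambda row: abs(row[1]), reverse=True)

for name, signed_diff in rows:
    direction = "above" if signed_diff > 0 else "below"
    print(f"{name:<18} {abs(signed_diff):.2%} {direction} xK%")
```

Run this way, Fiers and Strasburg come out above their xK% while Blackley and Lowe come out below it, which the unsigned Diff column hides.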

Takeaways

Our own Glenn DuPaul has published a lot of research of late on simple K- and BB-based ERA estimators (including his new predictive FIP), so it’s becoming more valuable to identify what goes into striking batters out. The share of swings that result in misses appears to be the best indicator of strikeout performance (aka "dominance").

Items of Interest

*The highest Whiff/Swing rates belonged to Craig Kimbrel (42.17%) and Aroldis Chapman (40.00%), which is maybe the least surprising result of any study you'll read this year.

*The lowest Whiff/Swing rate belonged to Aaron Cook at 8.92%, making him the only pitcher below 11%. Basically with him, if you swing, you're Marco Scutaro in terms of contact ability.

*Tyler Chatwood, Bobby Parnell, and Ben Sheets are the poster boys for this formula, as each of their actual K% came within .05 of their xK%.

*While I had the data in front of me, I thought I'd look at how First Pitch Strike % matches up with BB%, and was surprised it had an R-squared of just .32. Overall Strike % actually only had a .56 R-squared value, so it seems there is no answer as obvious as "swing and miss" for BB%, which makes sense intuitively.