For a while, I have wondered whether pitch data could be used to estimate a player's walk and strikeout rates. Fangraphs.com displays, for each player, the percentage of pitches swung at and the percentage of swings that make contact, both inside and outside the strike zone (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%). Using multiple regression, I took these 4 variables (outside swing, outside contact, strike zone swing, strike zone contact) and regressed strikeout and walk percentages on them.
For this first run, I looked at all the qualified hitters (500 PAs) from 2009. For the strikeout percentage, I got an r-squared of 0.89 and a standard deviation of 2.0% for the difference between the projected and final values. For the walk percentage, I ended up with an r-squared of 0.63 and a standard deviation of 2.0% for the difference between the projected and final values.
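As a rough illustration of this kind of fit, here is a minimal sketch of ordinary least squares in plain Python, along with the two summary numbers quoted above (r-squared and the standard deviation of the difference between projected and final values). The function names and the toy data are made up for the example and are not the 2009 hitter data; the use of the population standard deviation is also an assumption, since the post does not say which form was used.

```python
from statistics import pstdev


def fit_ols(rows, targets):
    """Fit targets ~ rows by ordinary least squares.

    Returns the coefficients with the intercept as the last element.
    """
    # Append a constant 1.0 to each row so the last coefficient is the intercept.
    X = [list(r) + [1.0] for r in rows]
    k = len(X[0])
    # Build the normal equations (X^T X) b = X^T y as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Projected value for one row of predictors."""
    return sum(c * v for c, v in zip(coefs, row)) + coefs[-1]


def fit_summary(rows, targets, coefs):
    """Return (r_squared, standard deviation of actual minus projected)."""
    residuals = [y - predict(coefs, r) for r, y in zip(rows, targets)]
    mean_y = sum(targets) / len(targets)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot, pstdev(residuals)


# Toy data (made up): targets follow 2*x1 - 1*x2 + 0.5 exactly.
toy_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_targets = [0.5, 2.5, -0.5, 1.5, 3.5]

toy_coefs = fit_ols(toy_rows, toy_targets)
toy_r2, toy_sd = fit_summary(toy_rows, toy_targets, toy_coefs)
print(toy_coefs, toy_r2, toy_sd)
```

With real data, each row would hold one hitter's four plate discipline percentages and the target would be his strikeout or walk rate.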
Looking through this dataset, I saw that some players had an actual walk rate 6% to 8% higher than projected. These players were all great hitters (Fielder, Pujols, A. Gonzalez), and it dawned on me that IBB was included in the walk rate and I needed to factor it in. I included a fifth variable in the walk calculations, IBB/PA, and re-ran the regression. The results were much better, with an r-squared of 0.79 and a standard deviation of 0.15%. The largest difference was 4% instead of 8%. Here are the equations for estimating walk and strikeout rate:
SO% = ((-0.0407 * O-Swing%) + (-0.2417 * Z-Swing%) + (-0.2429 * O-Contact%) + (-0.8765 * Z-Contact%) + 1.2885) * 100%
BB% = ((-0.4134 * O-Swing%) + (-0.0328 * Z-Swing%) + (0.0216 * O-Contact%) + (-0.2595 * Z-Contact%) + (1.7203 * IBB per PA) + 0.4217) * 100%
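For anyone who wants to plug numbers into these equations, here is a small Python sketch. It assumes the inputs are entered as fractions (0.25 for 25%), which is how the final multiplication by 100% reads; the function names and the sample hitter are hypothetical and not taken from the 2009 data.

```python
def estimate_so_rate(o_swing, z_swing, o_contact, z_contact):
    """Estimated strikeout rate as a fraction (0.20 means 20%).

    All inputs are assumed to be fractions between 0 and 1.
    """
    return (-0.0407 * o_swing
            - 0.2417 * z_swing
            - 0.2429 * o_contact
            - 0.8765 * z_contact
            + 1.2885)


def estimate_bb_rate(o_swing, z_swing, o_contact, z_contact, ibb_per_pa):
    """Estimated walk rate as a fraction, including the IBB/PA term."""
    return (-0.4134 * o_swing
            - 0.0328 * z_swing
            + 0.0216 * o_contact
            - 0.2595 * z_contact
            + 1.7203 * ibb_per_pa
            + 0.4217)


# Hypothetical hitter: swings at 25% of pitches outside the zone and 65% inside,
# makes contact on 60% of outside swings and 88% of zone swings, 1 IBB per 100 PA.
so_rate = estimate_so_rate(0.25, 0.65, 0.60, 0.88)
bb_rate = estimate_bb_rate(0.25, 0.65, 0.60, 0.88, 0.01)
print(f"SO% = {so_rate * 100:.1f}%, BB% = {bb_rate * 100:.1f}%")
```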
Using these equations, here are the players whose 2009 rates deviated most from the estimate and who could be due for a correction in 2010:
| Name | 2010 Team | 2009 Walk Rate | 2009 Estimated Walk Rate | Estimated – Actual |
| --- | --- | --- | --- | --- |
| Ichiro Suzuki | Mariners | 4.7% | 8.4% | 3.7% |
| B.J. Upton | Rays | 9.1% | 11.9% | 2.8% |
| Franklin Gutierrez | Mariners | 7.3% | 9.9% | 2.6% |
| Jason Kubel | Twins | 9.7% | 12.0% | 2.3% |
| Ben Zobrist | Rays | 15.2% | 11.8% | -3.4% |
| Nick Swisher | Yankees | 16.0% | 12.5% | -3.5% |
| Kosuke Fukudome | Cubs | 15.4% | 11.6% | -3.8% |
| Nick Johnson | Yankees | 17.2% | 13.3% | -3.9% |

| Name | 2010 Team | 2009 Strikeout Rate | 2009 Estimated Strikeout Rate | Estimated – Actual |
| --- | --- | --- | --- | --- |
| Brian Roberts | Orioles | 17.7% | 12.2% | -5.5% |
| David Wright | Mets | 26.2% | 20.7% | -5.5% |
| Alfonso Soriano | Cubs | 24.7% | 20.3% | -4.4% |
| Kevin Youkilis | Red Sox | 25.5% | 21.2% | -4.3% |
| Yadier Molina | Cardinals | 8.1% | 12.2% | 4.1% |
| Hunter Pence | Astros | 18.6% | 23.0% | 4.4% |
| Brandon Phillips | Reds | 12.8% | 17.2% | 4.4% |
| Yunel Escobar | Braves | 11.7% | 16.5% | 4.8% |
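The last column in both tables is simply the estimated rate minus the actual rate, and the players listed are the ones with the largest gaps in either direction. A short sketch of that ranking, using four rows copied from the strikeout table (the variable names are just for illustration):

```python
# (name, actual 2009 strikeout rate %, estimated 2009 strikeout rate %)
players = [
    ("Brian Roberts", 17.7, 12.2),
    ("David Wright", 26.2, 20.7),
    ("Yadier Molina", 8.1, 12.2),
    ("Yunel Escobar", 11.7, 16.5),
]

# Estimated minus actual, rounded to one decimal like the table.
gaps = [(name, round(estimated - actual, 1)) for name, actual, estimated in players]

# Largest absolute gap first, regardless of direction.
ranked = sorted(gaps, key=lambda item: abs(item[1]), reverse=True)

for name, gap in ranked:
    print(f"{name}: {gap:+.1f}%")
```

A negative gap means the player struck out more often than the equation estimates; a positive gap means he struck out less often.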
I like the initial results, and I am planning to add a few more years' worth of data to get a better equation. I can see this formula being used to tell whether changes in walk and strikeout rates are due to changes in plate discipline or are just noise in the data.

