For a while, I have wondered whether pitch data could be used to estimate a player's walk and strikeout rates. Fangraphs.com displays, for each player, the percentage of pitches swung at and the percentage of swings that made contact, both inside and outside the strike zone (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%). Using multiple regression, I took these 4 variables (outside swing rate, strike zone swing rate, outside contact rate, strike zone contact rate) and regressed strikeout and walk percentages on them.
For this first run, I looked at all the qualified hitters (500 PAs) from 2009. For the strikeout percentage, I got an r-squared of 0.89 and a standard deviation of 2.0% for the difference between the projected and actual values. For the walk percentage, I ended up with an r-squared of 0.63 and a standard deviation of 2.0% for the difference between the projected and actual values.
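A minimal sketch of this kind of fit, assuming Python with NumPy, might look like the following. The data here is synthetic (randomly generated, not the 2009 FanGraphs numbers), and the function name `fit_linear` and the made-up coefficients are only for illustration. It shows how the r-squared and the standard deviation of the projected-minus-actual differences can be computed from an ordinary least squares fit.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, r_squared, residual_sd); the last coefficient
    is the intercept.
    """
    A = np.column_stack([X, np.ones(len(X))])      # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares solution
    resid = y - A @ coef                           # actual minus projected
    r_squared = 1.0 - resid.var() / y.var()
    return coef, r_squared, resid.std(ddof=1)

# Synthetic stand-in for one season of qualified hitters (NOT real data).
rng = np.random.default_rng(0)
n = 150
X = np.column_stack([
    rng.uniform(0.15, 0.40, n),   # O-Swing% as a fraction
    rng.uniform(0.55, 0.80, n),   # Z-Swing%
    rng.uniform(0.40, 0.85, n),   # O-Contact%
    rng.uniform(0.75, 0.97, n),   # Z-Contact%
])
made_up_coef = np.array([-0.05, -0.25, -0.25, -0.90, 1.30])  # illustrative only
y = X @ made_up_coef[:4] + made_up_coef[4] + rng.normal(0.0, 0.02, n)

coef, r2, sd = fit_linear(X, y)
print("coefficients:", np.round(coef, 3))
print("r-squared:", round(r2, 2), "residual SD:", round(sd, 3))
```

With real data, `X` would hold one row per hitter and `y` the strikeout or walk rate, all expressed as fractions.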
Looking through this dataset, I saw that some players had an actual walk rate 6% to 8% higher than projected. These players were all great hitters (Fielder, Pujols, A. Gonzalez), and it dawned on me that intentional walks (IBB) were included in the walk rate and I needed to factor them in. I included a fifth variable in the walk calculations, IBB/PA, and re-ran the regression. The results were much better, with an r-squared of 0.79 and a standard deviation of 0.15%. The largest difference was now 4% instead of 8%. Here are the equations for estimating strikeout and walk rates:
SO% = ((-0.0407 * O-Swing%) + (-0.2417 * Z-Swing%) + (-0.2429 * O-Contact%) + (-0.8765 * Z-Contact%) + 1.2885) * 100%
BB% = ((-0.4134 * O-Swing%) + (-0.0328 * Z-Swing%) + (0.0216 * O-Contact%) + (-0.2595 * Z-Contact%) + (1.7203 * IBB/PA) + 0.4217) * 100%
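The two equations translate directly into code. The sketch below assumes the plate discipline inputs are entered as fractions (0.25 rather than 25), which is my reading of the intercepts and the final multiplication by 100%. The function names and the sample hitter are hypothetical, not a real player.

```python
def estimate_so_pct(o_swing, z_swing, o_contact, z_contact):
    """Estimated strikeout rate in percent; inputs are fractions (assumed)."""
    return 100.0 * (
        -0.0407 * o_swing
        - 0.2417 * z_swing
        - 0.2429 * o_contact
        - 0.8765 * z_contact
        + 1.2885
    )

def estimate_bb_pct(o_swing, z_swing, o_contact, z_contact, ibb_per_pa):
    """Estimated walk rate in percent; inputs are fractions (assumed)."""
    return 100.0 * (
        -0.4134 * o_swing
        - 0.0328 * z_swing
        + 0.0216 * o_contact
        - 0.2595 * z_contact
        + 1.7203 * ibb_per_pa
        + 0.4217
    )

# Hypothetical hitter: swings at 25% of pitches outside the zone and 65%
# inside it, makes contact on 60% and 88% of those swings, and is
# intentionally walked in 1% of plate appearances.
so = estimate_so_pct(0.25, 0.65, 0.60, 0.88)
bb = estimate_bb_pct(0.25, 0.65, 0.60, 0.88, 0.01)
print(f"Estimated SO%: {so:.1f}")   # about 20.4
print(f"Estimated BB%: {bb:.1f}")   # about 9.9
```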
Using these equations, here are the players whose actual rates deviate the most from the estimates and who could be due for a correction in 2010:
Name | 2010 Team | 2009 Walk Rate | 2009 Estimated Walk Rate | Estimated – Actual |
Ichiro Suzuki | Mariners | 4.7% | 8.4% | 3.7% |
B.J. Upton | Rays | 9.1% | 11.9% | 2.8% |
Franklin Gutierrez | Mariners | 7.3% | 9.9% | 2.6% |
Jason Kubel | Twins | 9.7% | 12.0% | 2.3% |
Ben Zobrist | Rays | 15.2% | 11.8% | -3.4% |
Nick Swisher | Yankees | 16.0% | 12.5% | -3.5% |
Kosuke Fukudome | Cubs | 15.4% | 11.6% | -3.8% |
Nick Johnson | Yankees | 17.2% | 13.3% | -3.9% |
Name | 2010 Team | 2009 Strikeout Rate | 2009 Estimated Strikeout Rate | Estimated – Actual |
Brian Roberts | Orioles | 17.7% | 12.2% | -5.5% |
David Wright | Mets | 26.2% | 20.7% | -5.5% |
Alfonso Soriano | Cubs | 24.7% | 20.3% | -4.4% |
Kevin Youkilis | Red Sox | 25.5% | 21.2% | -4.3% |
Yadier Molina | Cardinals | 8.1% | 12.2% | 4.1% |
Hunter Pence | Astros | 18.6% | 23.0% | 4.4% |
Brandon Phillips | Reds | 12.8% | 17.2% | 4.4% |
Yunel Escobar | Braves | 11.7% | 16.5% | 4.8% |
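The "Estimated – Actual" column is simply the estimated rate minus the actual rate, and the players listed are the ones where that gap is largest in either direction. A small sketch of that ranking, using four rows copied from the strikeout table above (the variable names are only for illustration):

```python
# (name, actual 2009 SO%, estimated 2009 SO%) taken from the table above
rows = [
    ("Brian Roberts", 17.7, 12.2),
    ("David Wright", 26.2, 20.7),
    ("Yadier Molina", 8.1, 12.2),
    ("Yunel Escobar", 11.7, 16.5),
]

# Estimated minus actual, rounded to one decimal like the table,
# then sorted by the size of the gap regardless of sign.
ranked = sorted(
    ((name, round(est - actual, 1)) for name, actual, est in rows),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, diff in ranked:
    print(f"{name}: {diff:+.1f}")
```

A negative difference means the player struck out more often than his plate discipline numbers suggest, and a positive one means he struck out less.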
I like the initial results, and I am planning to add a few more years' worth of data to get a better equation. I can see this formula being used to check whether changes in walk and strikeout rates are due to changes in plate discipline or are just noise in the data.