Using Plate Discipline to Estimate Walk and Strikeout Rate - Corrected
For a while, I have wondered whether pitch data could be used to estimate a player's walk and strikeout rates. FanGraphs.com displays, for each player, the percentage of pitches swung at and the percentage hit, both inside and outside the strike zone (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%). Using multiple regression, I took these 4 variables (swing rate outside the zone, contact rate outside the zone, swing rate inside the zone, contact rate inside the zone) and regressed strikeout and walk percentages on them.
For this first run, I looked at all the qualified hitters (500 PAs) from 2009. For the strikeout percentage, I got an r-squared of 0.89 and a standard deviation of 2.0% on the difference between the projected and final values. For the walk percentage, I ended up with an r-squared of 0.63 and a standard deviation of 2.0 on the difference between the projected and final values.
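As a rough sketch of what a fit like this involves, the code below runs ordinary least squares with an intercept and reports the r-squared and the standard deviation of the residuals. It is not the tool used for the numbers above; the function names `solve_linear_system` and `fit_rate` are made up for illustration, and it uses only the Python standard library (a library routine such as `numpy.linalg.lstsq` would do the same job).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rate(rows, target):
    """Least-squares fit of target on the columns of rows, plus an intercept.

    rows   : one tuple of predictors per hitter, e.g.
             (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%)
    target : the observed rate (SO% or BB%) for the same hitters
    Returns (coefficients, intercept, r_squared, residual_sd).
    """
    design = [list(r) + [1.0] for r in rows]  # last column is the intercept
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)

    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    residuals = [y - f for y, f in zip(target, fitted)]
    mean_y = sum(target) / len(target)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    r_squared = 1.0 - ss_res / ss_tot
    mean_e = sum(residuals) / len(residuals)
    n = len(residuals)
    residual_sd = (sum((e - mean_e) ** 2 for e in residuals) / (n - 1)) ** 0.5
    return beta[:-1], beta[-1], r_squared, residual_sd
```

Calling `fit_rate` once with strikeout rates and once with walk rates as the target would give the two sets of coefficients, with the residual standard deviation playing the role of the "difference between the projected and final values" figure.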
Looking through this dataset, I saw that some players had an actual walk rate much higher than projected, by 6% to 8%. These players were all great hitters (Fielder, Pujols, A. Gonzalez), and it dawned on me that IBB was included in the walk rate and that I needed to factor it in. I included a fifth variable in the walk calculations, IBB/PA, and re-ran the regression. The results were much better, with an r-squared of 0.79 and a standard deviation of 0.15%. The highest percentage difference was 4% versus 8%. Here are the equations for estimating walk and strikeout rate:
SO% = ((-0.0407 * O-Swing%) + (-0.2417 * Z-Swing%) + (-0.2429 * O-Contact%) + (-0.8765 * Z-Contact%) + 1.2885) * 100%
BB% = ((-0.4134 * O-Swing%) + (-0.0328 * Z-Swing%) + (0.0216 * O-Contact%) + (-0.2595 * Z-Contact%) + (1.7203 * IBB per PA) + 0.4217) * 100%
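For anyone plugging numbers in, here is a minimal sketch of the two equations as Python functions. It assumes the inputs are entered as fractions (0.25 for 25%), which is how the size of the intercepts reads; the function names and the sample hitter are made up for illustration and do not describe a real player.

```python
def estimate_so_pct(o_swing, z_swing, o_contact, z_contact):
    """Estimated strikeout rate in percent; inputs are fractions (0.25 = 25%)."""
    return (-0.0407 * o_swing - 0.2417 * z_swing
            - 0.2429 * o_contact - 0.8765 * z_contact + 1.2885) * 100


def estimate_bb_pct(o_swing, z_swing, o_contact, z_contact, ibb_per_pa):
    """Estimated walk rate in percent; ibb_per_pa is intentional walks / PA."""
    return (-0.4134 * o_swing - 0.0328 * z_swing
            + 0.0216 * o_contact - 0.2595 * z_contact
            + 1.7203 * ibb_per_pa + 0.4217) * 100


# Made-up plate-discipline line for illustration only (not a real player).
hitter = dict(o_swing=0.25, z_swing=0.65, o_contact=0.62, z_contact=0.88)
print(f"SO% = {estimate_so_pct(**hitter):.1f}")                    # SO% = 19.9
print(f"BB% = {estimate_bb_pct(**hitter, ibb_per_pa=0.005):.1f}")  # BB% = 9.1
```

If the inputs are entered as whole percentages instead (25 rather than 0.25), the results come out far outside any plausible range, which is a quick way to spot the wrong input form.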
Using these values, here are the players I looked at whose actual rates deviate the most from the estimate and who could be due for a correction in 2010:
| Name | 2010 Team | 2009 Walk Rate | 2009 Estimated Walk Rate | Estimated – Actual |
| Ichiro Suzuki | Mariners | 4.7% | 8.4% | 3.7% |
| B.J. Upton | Rays | 9.1% | 11.9% | 2.8% |
| Franklin Gutierrez | Mariners | 7.3% | 9.9% | 2.6% |
| Jason Kubel | Twins | 9.7% | 12.0% | 2.3% |
| Ben Zobrist | Rays | 15.2% | 11.8% | -3.4% |
| Nick Swisher | Yankees | 16.0% | 12.5% | -3.5% |
| Kosuke Fukudome | Cubs | 15.4% | 11.6% | -3.8% |
| Nick Johnson | Yankees | 17.2% | 13.3% | -3.9% |
| Name | 2010 Team | 2009 Strikeout Rate | 2009 Estimated Strikeout Rate | Estimated – Actual |
| Brian Roberts | Orioles | 17.7% | 12.2% | -5.5% |
| David Wright | Mets | 26.2% | 20.7% | -5.5% |
| Alfonso Soriano | Cubs | 24.7% | 20.3% | -4.4% |
| Kevin Youkilis | Red Sox | 25.5% | 21.2% | -4.3% |
| Yadier Molina | Cardinals | 8.1% | 12.2% | 4.1% |
| Hunter Pence | Astros | 18.6% | 23.0% | 4.4% |
| Brandon Phillips | Reds | 12.8% | 17.2% | 4.4% |
| Yunel Escobar | Braves | 11.7% | 16.5% | 4.8% |
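The "Estimated – Actual" column is simply the estimated rate minus the actual rate. As a small illustration, the sketch below recomputes it for four rows copied from the walk-rate table above and sorts them by the size of the gap; the helper name `rank_by_gap` is made up for this example.

```python
# (name, actual BB%, estimated BB%) copied from the walk-rate table above
walk_rows = [
    ("Ichiro Suzuki", 4.7, 8.4),
    ("B.J. Upton", 9.1, 11.9),
    ("Ben Zobrist", 15.2, 11.8),
    ("Nick Johnson", 17.2, 13.3),
]


def rank_by_gap(rows):
    """Return (name, estimated - actual) pairs, largest absolute gap first."""
    gaps = [(name, round(est - act, 1)) for name, act, est in rows]
    return sorted(gaps, key=lambda g: abs(g[1]), reverse=True)


for name, gap in rank_by_gap(walk_rows):
    print(f"{name:15s} {gap:+.1f}")
```

A positive gap means the estimate is above the actual rate (the player walked less than his plate discipline suggests), and a negative gap means the opposite.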
I like the initial results and I am planning to add a few more years' worth of data to get a better equation. I can see this formula being used to check whether changes in walk and strikeout rates are due to changes in plate discipline or are just noise in the data.
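One way to check the "just noise" question, along the lines suggested in the comments below, is to correlate each player's gap between actual and estimated rate in one season with the same gap in the next. The sketch below assumes the gaps are already computed and stored in dictionaries keyed by player name; `pearson` and `gap_persistence` are hypothetical helper names, and no real multi-year data is included here.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def gap_persistence(gaps_year1, gaps_year2):
    """Correlate (actual - estimated) gaps across two seasons.

    Both arguments map player name -> gap in percentage points.
    Only players present in both seasons are compared.
    """
    shared = sorted(set(gaps_year1) & set(gaps_year2))
    first = [gaps_year1[p] for p in shared]
    second = [gaps_year2[p] for p in shared]
    return pearson(first, second)
```

A result near 0 would suggest the gaps are mostly luck; a clearly positive result would suggest some hitters consistently beat or trail the estimate for reasons the four plate-discipline rates do not capture.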
Comments
Does this include intentional BB’s?
Bettman's Nightmare: A Blog Where Hockey Aficionados Dismantle That Mighty Empire, One Balsillie at a Time
http://bettmansnightmare.blogspot.com/
by Bettman's Nightmare on Mar 14, 2010 2:12 PM EDT
Nm, I breezed over that part on accident.
by Bettman's Nightmare on Mar 14, 2010 2:13 PM EDT
This is pure, unadulterated awesome
Also might add bad calls to the list, or categorize it under noise.
I’d like to use your approach on Rotobase, if that’s OK Jeff. I’ll make a graph with career BB%, BB% and fxBB%.
I think it will rock. Let me know.
by Josh Hermsmeyer on Mar 14, 2010 2:19 PM EDT
that is, of course, assuming I can generate them from pitchf/x data. I believe it’s possible. Has anyone ever tried, or is BIS the only place for it?
by Josh Hermsmeyer on Mar 14, 2010 2:51 PM EDT
It's quite easy to generate the FanGraphs stats with Pitch f/x
If you need help Josh, just email me.
by vivaelpujols on Mar 15, 2010 1:59 AM EDT
Thanks Nick! I’ll take you up on it if I run into a wall :-)
by Josh Hermsmeyer on Mar 15, 2010 2:20 PM EDT
It can be done on pitch fx and you can do it.
You may need to set the zone like I did in this article:
http://www.beyondtheboxscore.com/2009/11/5/1107712/umpire-strikezone-analysis-the
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman (TucsonRoyal) on Mar 14, 2010 9:39 PM EDT
Awesome stuff
Jeff, this is fantastic.
Did you run this with 2008 numbers and see how the results looked for 2009?
Come check out Bullpen Banter
Follow Bullpen Banter on Twitter
Follow me on Twitter
Remember: baseball guys... baseball...
lol..
sorry it was a long weekend.
Published formula is not working out for me, does anyone else have this problem?
I tried numbers both in % and decimal form; the BB rate is almost always negative, since the coefficient on the O-swing and z-swing are nearly identical, and Z-Swing % is a lot higher.
Why does O-swing rate have a positive coefficient w.r.t BB%? Shouldn’t they be inversely related?
I will look at them later tonight.
by Jeff Zimmerman (TucsonRoyal) on Mar 14, 2010 9:07 PM EDT
Equations corrected
by Jeff Zimmerman (TucsonRoyal) on Mar 14, 2010 9:33 PM EDT
Don't hitters change their approach with 1 strike and 2 strikes?
Or is this effect negligible?
by benderbrodriguez on Mar 14, 2010 8:38 PM EDT
It's not negligible or un-negligible
It just isn’t in the scope of what Jeff is presenting in this article.
by vivaelpujols on Mar 15, 2010 1:59 AM EDT
What nick said
by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:52 AM EDT
Cool stuff Jeff
I think Mike Silver did something similar back at StatSpeak, but of course we can’t access the pages now, so let’s just say you’re the first one to do this ;)
The real test is to take the guys with the biggest discrepancies in 2008 and see whether their 2009 numbers were closer to the actual walk rate or the xwalk rate. You should also look at the year-to-year correlation of (walk rate – xwalk rate). If the difference is just luck, we should see a near 0 correlation.
I am going to get 5 years worth of data first and run the regression again
I used one year’s worth just to see if anything was actually there. Once I get the 5 years of data, it should then be easy to look at trends – year to year rates
by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:55 AM EDT
It can be done
The data is there:
http://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y&type=5&season=2009&month=0
I am more inclined to believe Nick’s work, where as long as the pitcher is throwing the same stuff (MPH and movement), the results are based more on luck.
by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:51 AM EDT
Jeff Zimmerman:
Awesome.
Check out Two Out Rally, the new BASEBALL MMORPG, coming soon!
twooutrally.com | (on Facebook) | (on Twitter)
