Beyond the Box Score: An SB Nation Community

Using Plate Discipline to Estimate Walk and Strikeout Rate - Corrected

For a while, I have wondered whether pitch data could be used to estimate a player's walk and strikeout rates. Fangraphs.com displays, for each player, the percentage of pitches swung at and the percentage of swings that make contact, both inside and outside the strike zone (O-Swing%, Z-Swing%, O-Contact%, Z-Contact%). Using multiple regression, I took these four variables (swing rate outside the zone, swing rate inside the zone, contact rate outside the zone, contact rate inside the zone) and regressed strikeout and walk percentages on them.

For this first run, I looked at all the qualified hitters (500 PAs) from 2009. For strikeout percentage, I got an r-squared of 0.89 and a standard deviation of 2.0% for the difference between the projected and actual values. For walk percentage, I ended up with an r-squared of 0.63 and a standard deviation of 2.0% for the difference between the projected and actual values.
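The post does not say which tool was used for the regression, so the following is only a minimal sketch of how such a fit could be run, assuming ordinary least squares with an intercept via NumPy. The helper name `fit_linear` and the synthetic data are illustrative assumptions, not the author's data or code.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, intercept, r_squared, residual_sd).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the last fitted weight is the intercept.
    design = np.column_stack([X, np.ones(len(y))])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ weights
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot
    # Sample standard deviation of (actual - projected).
    residual_sd = float(np.std(residuals, ddof=1))
    return weights[:-1], float(weights[-1]), r_squared, residual_sd

# Synthetic stand-in data (NOT real player data): 150 made-up hitters.
rng = np.random.default_rng(0)
n = 150
X = np.column_stack([
    rng.uniform(0.15, 0.40, n),  # O-Swing as a fraction
    rng.uniform(0.55, 0.75, n),  # Z-Swing as a fraction
    rng.uniform(0.45, 0.80, n),  # O-Contact as a fraction
    rng.uniform(0.78, 0.95, n),  # Z-Contact as a fraction
])
# Made-up "true" relationship plus noise, used only to exercise the fit.
y = X @ np.array([-0.05, -0.25, -0.25, -0.90]) + 1.3 + rng.normal(0, 0.02, n)

coef, intercept, r2, sd = fit_linear(X, y)
print("coefficients:", np.round(coef, 3))
print("intercept:", round(intercept, 3))
print("r-squared:", round(r2, 3), "residual SD:", round(sd, 4))
```

With real data, each row of `X` would hold one qualified hitter's four plate discipline rates and `y` would hold that hitter's strikeout (or walk) rate.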

 

Looking through the dataset, I saw that some players had an actual walk rate much higher than projected, by 6% to 8%. These players were all great hitters (Fielder, Pujols, A. Gonzalez), and it dawned on me that intentional walks (IBB) were included in the walk rate and needed to be factored in. I added a fifth variable to the walk calculation, IBB/PA, and re-ran the regression. The results were much better, with an r-squared of 0.79 and a standard deviation of 0.15%. The largest difference fell from 8% to 4%. Here are the equations for estimating walk and strikeout rate:

SO% = ((-0.0407 * O-Swing%) + (-0.2417 * Z-Swing%) + (-0.2429 * O-Contact%) + (-0.8765 * Z-Contact%) + 1.2885) * 100%

BB% = ((-0.4134 * O-Swing%) + (-0.0328 * Z-Swing%) + (0.0216 * O-Contact%) + (-0.2595 * Z-Contact%) + (1.7203 * IBB per PA) + 0.4217) * 100%
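As a quick way to try the two equations, here is a small sketch that plugs numbers into them. It assumes the inputs are entered as decimal fractions (0.25 for 25%), which is an assumption on my part; it is the reading that produces plausible rates. The function names and the example hitter are hypothetical.

```python
def estimate_so_pct(o_swing, z_swing, o_contact, z_contact):
    """Estimated strikeout rate in percent, from the SO% equation above.

    Inputs are assumed to be fractions, e.g. 0.25 for an O-Swing% of 25%.
    """
    return (-0.0407 * o_swing
            - 0.2417 * z_swing
            - 0.2429 * o_contact
            - 0.8765 * z_contact
            + 1.2885) * 100


def estimate_bb_pct(o_swing, z_swing, o_contact, z_contact, ibb_per_pa):
    """Estimated walk rate in percent, from the BB% equation above.

    ibb_per_pa is intentional walks divided by plate appearances.
    """
    return (-0.4134 * o_swing
            - 0.0328 * z_swing
            + 0.0216 * o_contact
            - 0.2595 * z_contact
            + 1.7203 * ibb_per_pa
            + 0.4217) * 100


# Hypothetical hitter (made-up rates, not a real player).
so = estimate_so_pct(0.25, 0.65, 0.60, 0.88)
bb = estimate_bb_pct(0.25, 0.65, 0.60, 0.88, ibb_per_pa=0.01)
print(f"estimated SO%: {so:.1f}")  # about 20.4
print(f"estimated BB%: {bb:.1f}")  # about 9.9
```

Entering whole-number percentages (25 instead of 0.25) would push both results far outside any realistic range, which is one way to check that the inputs are on the right scale.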

Using these equations, here are the players whose actual rates deviated most from the estimates and who could be due for a correction in 2010:

Name                 2010 Team   2009 Walk Rate   2009 Estimated Walk Rate   Estimated – Actual
Ichiro Suzuki        Mariners    4.7%             8.4%                       3.7%
B.J. Upton           Rays        9.1%             11.9%                      2.8%
Franklin Gutierrez   Mariners    7.3%             9.9%                       2.6%
Jason Kubel          Twins       9.7%             12.0%                      2.3%

Ben Zobrist          Rays        15.2%            11.8%                      -3.4%
Nick Swisher         Yankees     16.0%            12.5%                      -3.5%
Kosuke Fukudome      Cubs        15.4%            11.6%                      -3.8%
Nick Johnson         Yankees     17.2%            13.3%                      -3.9%





Name                 2010 Team   2009 Strikeout Rate   2009 Estimated Strikeout Rate   Estimated – Actual
Brian Roberts        Orioles     17.7%                 12.2%                           -5.5%
David Wright         Mets        26.2%                 20.7%                           -5.5%
Alfonso Soriano      Cubs        24.7%                 20.3%                           -4.4%
Kevin Youkilis       Red Sox     25.5%                 21.2%                           -4.3%

Yadier Molina        Cardinals   8.1%                  12.2%                           4.1%
Hunter Pence         Astros      18.6%                 23.0%                           4.4%
Brandon Phillips     Reds        12.8%                 17.2%                           4.4%
Yunel Escobar        Braves      11.7%                 16.5%                           4.8%
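The "Estimated – Actual" column is simply the estimate minus the observed rate, sorted to surface the extremes. A minimal sketch of that step, using the rows from the strikeout table above (the helper name `rank_by_gap` is made up for illustration):

```python
# (name, actual strikeout %, estimated strikeout %) copied from the table above.
rows = [
    ("Brian Roberts", 17.7, 12.2),
    ("David Wright", 26.2, 20.7),
    ("Alfonso Soriano", 24.7, 20.3),
    ("Kevin Youkilis", 25.5, 21.2),
    ("Yadier Molina", 8.1, 12.2),
    ("Hunter Pence", 18.6, 23.0),
    ("Brandon Phillips", 12.8, 17.2),
    ("Yunel Escobar", 11.7, 16.5),
]


def rank_by_gap(rows):
    """Return (name, estimated - actual) pairs, most negative gap first.

    A negative gap means the player struck out more often than the
    plate discipline numbers would suggest; a positive gap means less often.
    """
    gaps = [(name, round(est - act, 1)) for name, act, est in rows]
    return sorted(gaps, key=lambda pair: pair[1])


for name, gap in rank_by_gap(rows):
    print(f"{name:18s} {gap:+.1f}")
```

The same function works for the walk table by swapping in the walk rates.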

I like the initial results, and I plan to add a few more years' worth of data to get a better equation. I can see this formula being used to check whether changes in walk and strikeout rates come from changes in plate discipline or are just noise in the data.

Comments

Does this include intentional BB’s?

Bettman's Nightmare: A Blog Where Hockey Aficionados Dismantle That Mighty Empire, One Balsillie at a Time

http://bettmansnightmare.blogspot.com/

by Bettman's Nightmare on Mar 14, 2010 2:12 PM EDT

Nm, I breezed over that part on accident.

by Bettman's Nightmare on Mar 14, 2010 2:13 PM EDT

This is pure, unadulterated awesome

Also might add bad calls to the list, or categorize it under noise.

I’d like to use your approach on Rotobase, if that’s OK Jeff. I’ll make a graph with career BB%, BB% and fxBB%.

I think it will rock. Let me know.

by Josh Hermsmeyer on Mar 14, 2010 2:19 PM EDT

that is, of course, assuming I can generate them from pitchf/x data. I believe it’s possible. Has anyone ever tried, or is BIS the only place for it?

by Josh Hermsmeyer on Mar 14, 2010 2:51 PM EDT

Thanks Nick! I’ll take you up on it if I run into a wall :-)

by Josh Hermsmeyer on Mar 15, 2010 2:20 PM EDT

Perfect. Just what I needed.

by Josh Hermsmeyer on Mar 15, 2010 2:20 PM EDT

Awesome stuff

Jeff, this is fantastic.

Did you run this with 2008 numbers and see how the results looked for 2009?

by JDSussman on Mar 14, 2010 3:42 PM EDT

Published formula is not working out for me, does anyone else have this problem?

I tried numbers both in % and decimal form; the BB rate is almost always negative, since the coefficient on the O-swing and z-swing are nearly identical, and Z-Swing % is a lot higher.

Why does O-swing rate have a positive coefficient w.r.t BB%? Shouldn’t they be inversely related?

by Telegraph on Mar 14, 2010 6:33 PM EDT

I will look at them later tonight.

- .-. ..- ... - / - .... . / .--. .-. --- -.-. . ... ...

by Jeff Zimmerman (TucsonRoyal) on Mar 14, 2010 9:07 PM EDT

Equations corrected

by Jeff Zimmerman (TucsonRoyal) on Mar 14, 2010 9:33 PM EDT

It's not negligible or non-negligible

It just isn’t in the scope of what Jeff is presenting in this article.

by vivaelpujols on Mar 15, 2010 1:59 AM EDT

What nick said

by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:52 AM EDT

Cool stuff Jeff

I think Mike Silver did something similar back at StatSpeak, but of course we can’t access the pages now, so let’s just say you’re the first one to do this ;)

The real test is to take the guys with the biggest discrepancies in 2008 and see whether their 2009 numbers were closer to the actual walk rate or the xwalk rate. You should also look at the year-to-year correlation of walk rate – xwalk rate. If the difference is just luck, we should see a near 0 correlation.
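A minimal sketch of that year-to-year check, assuming you already have each player's (actual – estimated) walk rate gap for two consecutive seasons. The function name and the numbers are made up for illustration; they are not real 2008 or 2009 values.

```python
import numpy as np


def gap_correlation(gap_year1, gap_year2):
    """Pearson correlation between the same players' gaps in two seasons.

    Each list holds (actual - estimated) walk rate, in percentage points,
    with players in the same order in both lists.
    """
    a = np.asarray(gap_year1, dtype=float)
    b = np.asarray(gap_year2, dtype=float)
    if a.shape != b.shape or a.size < 3:
        raise ValueError("need matched gaps for at least three players")
    return float(np.corrcoef(a, b)[0, 1])


# Made-up gaps for six hypothetical players.
gaps_first_year = [3.1, 2.2, -2.9, -3.5, 0.5, -1.2]
gaps_second_year = [0.4, -1.1, -0.3, 1.0, 0.2, -0.6]

r = gap_correlation(gaps_first_year, gaps_second_year)
print(f"year-to-year correlation of the gap: {r:.2f}")
```

A correlation near 0 would point to luck; a clearly positive one would suggest the equation is missing a skill that persists from year to year.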

by vivaelpujols on Mar 15, 2010 2:02 AM EDT

I am going to get 5 years worth of data first and run the regression again

I used one year’s worth just to see if anything was actually there. Once I get the 5 years of data, it should then be easy to look at trends – year to year rates

by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:55 AM EDT

Do we have this for pitcher’s K/9, BB/9 and K/BB rates as well? Sorry if this is a stupid question

by kellemonster on Mar 15, 2010 2:24 AM EDT

It can be done

The data is there:

http://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y&type=5&season=2009&month=0

I lean more toward Nick’s work, which suggests that as long as the pitcher is throwing the same stuff (velocity and movement), the results come down mostly to luck.

by Jeff Zimmerman (TucsonRoyal) on Mar 15, 2010 9:51 AM EDT

Jeff Zimmerman:

Awesome.

Check out Two Out Rally, the new BASEBALL MMORPG, coming soon!
twooutrally.com | (on Facebook) | (on Twitter)

by Justin Bopp on Mar 15, 2010 10:39 AM EDT
