Simplifying Batter Strike Zone Knowledge for Pitch F/X Data
I have been working with Pitch F/X data to determine a player's knowledge of the strike zone. Fangraphs.com has some strike zone values for hitters, but I wanted to control for the strike zone umpires actually call and make the data more usable. The following looks at each step in determining a hitter's strike zone judgment and at possible ways to improve the method.
I began by taking the total pitch counts of balls and strikes for right-handed hitters (RHH) and left-handed hitters (LHH) for 2007, 2008, and 2009. I used the LHH and RHH strike zones I determined previously. I have been working on refining them further, but will use the previous zones for now. Here is a list of the descriptions of each pitch as available from Pitch FX:
| Pitch FX Description |
| Ball |
| Ball In Dirt |
| Called Strike |
| Foul |
| Foul (Runner Going) |
| Foul Bunt |
| Foul Tip |
| Hit By Pitch |
| In play, no out |
| In play, out(s) |
| In play, run(s) |
| Intent Ball |
| Missed Bunt |
| Pitchout |
| Swinging Strike |
| Swinging Strike (Blocked) |
| Unknown Strike |
| (blank) |
I removed the following values, as they either do not matter in determining the batter's judgment or the pitch data is unknown:
(blank), Hit By Pitch, Intent Ball, Pitchout, and Unknown Strike
The following values were combined as they fall into the same general categories.
Balls: Ball, Ball In Dirt
Fouls: Foul, Foul (Runner Going), Foul Bunt, Foul Tip
Swinging Strikes: Missed Bunt, Swinging Strike, Swinging Strike (Blocked)
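The filtering and grouping above can be written as a small lookup table. This is only a sketch: the description strings are copied from the Pitch FX list, while the names `DROPPED`, `GROUPS`, and `categorize` are made up for illustration and are not part of any Pitch FX tool.

```python
# Descriptions removed from the analysis: they say nothing about the
# batter's judgment, or the pitch result is unknown.
DROPPED = {"", "Hit By Pitch", "Intent Ball", "Pitchout", "Unknown Strike"}

# Descriptions folded into a broader category.
# Anything not listed here keeps its own name (e.g. "Called Strike").
GROUPS = {
    "Ball": "Balls",
    "Ball In Dirt": "Balls",
    "Foul": "Fouls",
    "Foul (Runner Going)": "Fouls",
    "Foul Bunt": "Fouls",
    "Foul Tip": "Fouls",
    "Missed Bunt": "Swinging Strikes",
    "Swinging Strike": "Swinging Strikes",
    "Swinging Strike (Blocked)": "Swinging Strikes",
}


def categorize(description):
    """Return the analysis category for a description, or None if it is dropped."""
    desc = (description or "").strip()
    if desc in DROPPED:
        return None
    return GROUPS.get(desc, desc)
```

With this, `categorize("Foul Tip")` gives `"Fouls"`, `categorize("In play, no out")` is returned unchanged, and `categorize("Pitchout")` gives `None`.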
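Here are the percentages for each category for 2007, 2008, 2009, and all 3 years combined, for both RHHs and LHHs. In the table headers, "Balls" means pitches located outside the strike zone and "Strikes" means pitches located inside it: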
| RHH Balls | 2007 | 2008 | 2009 | 3 years | LHH Balls | 2007 | 2008 | 2009 | 3 years | |
| Balls | 62.0% | 63.5% | 62.9% | 63.0% | Balls | 68.4% | 70.0% | 70.2% | 69.8% | |
| Called Strike | 6.1% | 5.7% | 5.8% | 5.8% | Called Strike | 4.2% | 3.5% | 3.4% | 3.6% | |
| Fouls | 12.3% | 11.9% | 12.2% | 12.1% | Fouls | 10.9% | 10.5% | 10.1% | 10.4% | |
| In play, no out | 2.1% | 1.9% | 2.0% | 2.0% | In play, no out | 1.6% | 1.5% | 1.5% | 1.5% | |
| In play, out(s) | 6.7% | 6.6% | 6.6% | 6.6% | In play, out(s) | 5.3% | 5.0% | 5.1% | 5.1% | |
| In play, run(s) | 1.2% | 1.0% | 1.1% | 1.1% | In play, run(s) | 0.8% | 0.7% | 0.7% | 0.8% | |
| Swinging Strikes | 9.6% | 9.3% | 9.5% | 9.5% | Swinging Strikes | 8.9% | 8.8% | 8.9% | 8.9% | |
| RHH Strikes | 2007 | 2008 | 2009 | 3 years | LHH Strikes | 2007 | 2008 | 2009 | 3 years | |
| Balls | 10.3% | 9.3% | 10.2% | 9.8% | Balls | 11.1% | 10.7% | 10.6% | 10.7% | |
| Called Strike | 28.2% | 28.6% | 29.0% | 28.7% | Called Strike | 28.9% | 29.1% | 30.1% | 29.5% | |
| Fouls | 23.6% | 24.2% | 23.4% | 23.8% | Fouls | 23.8% | 24.0% | 23.9% | 23.9% | |
| In play, no out | 6.8% | 6.9% | 6.7% | 6.8% | In play, no out | 6.7% | 6.8% | 6.6% | 6.7% | |
| In play, out(s) | 18.7% | 18.9% | 18.7% | 18.8% | In play, out(s) | 18.1% | 18.2% | 17.7% | 17.9% | |
| In play, run(s) | 3.8% | 3.7% | 3.7% | 3.7% | In play, run(s) | 3.6% | 3.7% | 3.6% | 3.6% | |
| Swinging Strikes | 8.6% | 8.4% | 8.2% | 8.4% | Swinging Strikes | 7.7% | 7.6% | 7.5% | 7.6% |
I see no reason not to use the 3 year combined data for comparison.
So far it has been pretty easy.
Comparison of RHH and LHH totals shows values a little off, especially on balls out of the zone. I am not sure whether the discrepancy comes from differences in quality between LHH and RHH or from umpires calling pitches differently depending on hitter handedness. I don't feel entirely comfortable doing so, but at this point I will combine the numbers into a single baseline for both RHH and LHH:
| | LHH Balls | RHH Balls | Combined Balls | LHH Strikes | RHH Strikes | Combined Strikes |
| Balls | 69.8% | 63.0% | 65.8% | 10.7% | 9.8% | 10.3% |
| Called Strike | 3.6% | 5.8% | 4.9% | 29.5% | 28.7% | 29.1% |
| Fouls | 10.4% | 12.1% | 11.4% | 23.9% | 23.8% | 23.8% |
| In play, no out | 1.5% | 2.0% | 1.8% | 6.7% | 6.8% | 6.7% |
| In play, out(s) | 5.1% | 6.6% | 6.0% | 17.9% | 18.8% | 18.4% |
| In play, run(s) | 0.8% | 1.1% | 0.9% | 3.6% | 3.7% | 3.7% |
| Swinging Strikes | 8.9% | 9.5% | 9.2% | 7.6% | 8.4% | 8.0% |
I selected two All Star catchers to compare, Joe Mauer and Miguel Olivo (Bengie Molina All Stars). Here are each player's percentages, alongside the league's combined percentages:
| Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Balls | 65.8% | 48.4% | 69.5% |
| Called Strike | 4.9% | 3.8% | 4.5% |
| Fouls | 11.4% | 13.9% | 9.8% |
| In play, no out | 1.8% | 2.0% | 2.5% |
| In play, out(s) | 6.0% | 7.3% | 7.4% |
| In play, run(s) | 0.9% | 1.0% | 1.9% |
| Swinging Strikes | 9.2% | 23.5% | 4.5% |
| Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Balls | 10.3% | 6.1% | 15.2% |
| Called Strike | 29.1% | 20.5% | 37.9% |
| Fouls | 23.8% | 27.5% | 17.2% |
| In play, no out | 6.7% | 6.5% | 7.4% |
| In play, out(s) | 18.4% | 17.3% | 14.6% |
| In play, run(s) | 3.7% | 4.8% | 3.9% |
| Swinging Strikes | 8.0% | 17.3% | 3.8% |
Now I am at a point where I am not sure which values are useful or informative to other people. Please let me know what data is desired.
Here is my method for simplifying and improving the data that I see as useful.
- Combine Balls and Called Strikes for each category. This is the Take % for pitches that are supposed to be either a ball or a strike.
- Combine the Fouls and all 3 "In Play" categories into a Contact grouping.
- Combine In Play, no out and In Play, run(s) into a Good grouping.
- Combine all groups except Balls and Called Strike into a Swinging grouping.
- Divide the Contact grouping by the Swinging grouping to get a Contact %.
- Divide the Good grouping by the Swinging grouping to get a Good Contact %.
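The steps above can be sketched as a single function. This is an illustration under the assumption that the input is a mapping from the seven category names to percentages of all pitches in one zone; the function name `rate_stats` is invented here. The sample numbers are the league's combined out-of-zone percentages from the table above.

```python
def rate_stats(pct):
    """Compute Take %, Contact %, and Good Contact % from category percentages."""
    # Take: pitches the batter did not swing at.
    take = pct["Balls"] + pct["Called Strike"]
    # Contact: fouls plus all three "In play" outcomes.
    contact = (pct["Fouls"] + pct["In play, no out"]
               + pct["In play, out(s)"] + pct["In play, run(s)"])
    # Good: balls in play that did not produce an out.
    good = pct["In play, no out"] + pct["In play, run(s)"]
    # Swinging: everything except Balls and Called Strike.
    swinging = contact + pct["Swinging Strikes"]
    return {
        "Take %": take,
        "Contact %": 100.0 * contact / swinging,
        "Good Contact %": 100.0 * good / swinging,
    }


league_out_of_zone = {
    "Balls": 65.8, "Called Strike": 4.9, "Fouls": 11.4,
    "In play, no out": 1.8, "In play, out(s)": 6.0,
    "In play, run(s)": 0.9, "Swinging Strikes": 9.2,
}

for name, value in rate_stats(league_out_of_zone).items():
    print(name, round(value))
```

Note that Take % is a share of all pitches in the zone, while Contact % and Good Contact % are shares of swings only, so the three values do not add up to anything meaningful.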
Here are the Take %, Contact % and Good Contact %'s for pitches in and out of the strike zone for the league and the two hitters being compared:
| Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Take % | 71% | 52% | 74% |
| Contact % | 69% | 51% | 83% |
| Good Contact % | 9% | 6% | 17% |
| Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Take % | 39% | 27% | 53% |
| Contact % | 87% | 76% | 92% |
| Good Contact % | 17% | 15% | 24% |
These numbers are fine, but without the combined values, which would be a pain to always provide, the percentages don't mean much. To solve this problem, I converted the percentages to a 100-based scale like ERA+ and OPS+. A value of 100 is league average, while a value of 90 means the player is 10% below the league average and a value of 108 means 8% above the league average. Here are the values again, converted to the (+) method:
| Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Take + | 100.0 | 73.9 | 104.6 |
| Contact + | 100.0 | 74.1 | 120.9 |
| Good Contact + | 100.0 | 68.4 | 180.3 |
| Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
| Take + | 100.0 | 67.8 | 135.0 |
| Contact + | 100.0 | 88.0 | 106.0 |
| Good Contact + | 100.0 | 89.2 | 140.9 |
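The conversion to the (+) scale is not written out as a formula in the post. A minimal sketch, assuming it is a plain ratio of the player's rate to the league rate multiplied by 100 (which matches the "90 means 10% below average" description), might look like this. The name `plus` is hypothetical.

```python
def plus(player_pct, league_pct):
    """Scale a rate so that the league average equals 100."""
    return 100.0 * player_pct / league_pct


# Olivo's out-of-zone take rate (48.4 + 3.8 = 52.2) against the
# league's (65.8 + 4.9 = 70.7), both summed from the tables above.
print(round(plus(52.2, 70.7), 1))
```

Values computed this way from the rounded table percentages land close to the published (+) numbers but will not always match to the last decimal, presumably because the originals were computed from unrounded counts.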
These values allow a person to see that, for pitches out of the zone, Joe Mauer takes more than league average and makes good contact when he does swing. Miguel Olivo, on the other hand, swings at pitches out of the zone far more often than average and makes contact far less often when he does swing.
Let me know what you think. I like the final values, but am pretty sure there is room for improvement.
Comments
Understanding strike zones
I think you're on to something, but I will have to dig deeper to see what modifications need to be made. Also, you have to account for adjustments by the batter. Sometimes a batter may move off the plate and then lean in to make it look like he's in the same spot, but in reality he's moved off a few inches, so pitches that are over the inside of the plate are effectively the same as being over the middle of the plate. That is something that will never show up in stats.
You might catch a little grief for bringing together all the different fouls. I could see doing two categories of fouls (fouls hit, foul tips).
Bettman's Nightmare: A Blog Where Hockey Aficionados Dismantle That Mighty Empire, One Balsillie at a Time
http://bettmansnightmare.blogspot.com/
by Bettman's Nightmare on Jan 31, 2010 11:40 AM EST
For the Olivo/Mauer tables...
“Pitches in Strike Zone” and “Pitches out of Strike Zone” should be switched.
Corrected, thanks.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman on Jan 31, 2010 3:27 PM EST
The table that is third from the bottom is backwards too.
This is pretty interesting data, by the way.
can LHP and RHP be added?
seems like the split would show a big difference b/t what batters do against same-handers vs opposite-handers. also may get at what is causing the LHH to get more balls called as balls.
Zapp Brannigan/Dayton Moore quote of the day: "In the game of chess you can never let your opponent see your pieces"
by SagehenMacGyver47 on Feb 1, 2010 8:57 PM EST
I am working on the LHP and RHP splits. I have 1 of the 4 combos done, I just don't remember which.
As fellow BtB writer Harry told me once, “The left handed strike is all #$%$ up.”
by Jeff Zimmerman on Feb 1, 2010 9:14 PM EST
