I have been working with Pitch F/X data to measure a player's knowledge of the strike zone. Fangraphs.com has some strike zone values for hitters, but I wanted to control for the strike zone actually called by umpires and make the data more usable. The following walks through each step in determining a hitter's strike zone judgment, along with possible improvements.
I began by taking the total pitch counts of balls and strikes for right-handed hitters (RHH) and left-handed hitters (LHH) for 2007, 2008, and 2009. I used the LHH and RHH strike zones I determined previously. I have been working at refining them further, but will use the previous zones for now. Here is a list of the descriptions of each pitch as available from Pitch FX:
Pitch FX Description |
Ball |
Ball In Dirt |
Called Strike |
Foul |
Foul (Runner Going) |
Foul Bunt |
Foul Tip |
Hit By Pitch |
In play, no out |
In play, out(s) |
In play, run(s) |
Intent Ball |
Missed Bunt |
Pitchout |
Swinging Strike |
Swinging Strike (Blocked) |
Unknown Strike |
(blank) |
I removed the following values because they either say nothing about the batter's judgment or the pitch data is unknown:
(blank), Hit By Pitch, Intent Ball, Pitchout, and Unknown Strike
The following values were combined as they fall into the same general categories.
Balls: Ball, Ball in Dirt
Fouls: Foul, Foul (Runner Going), Foul Bunt, Foul Tip
Swinging Strikes: Missed Bunt, Swinging Strike, Swinging Strike (Blocked)
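The filtering and grouping above can be sketched as a simple lookup. This is a minimal illustration, not the code used for this post: the names `DROPPED`, `COMBINED`, and `categorize` are made up for the example, and it assumes each pitch arrives as one of the description strings listed earlier, with the blank description as an empty string.

```python
from typing import Optional

# Descriptions removed from the analysis ("" stands for the blank description).
DROPPED = {"", "Hit By Pitch", "Intent Ball", "Pitchout", "Unknown Strike"}

# Descriptions folded into a combined category.
COMBINED = {
    "Ball": "Balls",
    "Ball In Dirt": "Balls",
    "Foul": "Fouls",
    "Foul (Runner Going)": "Fouls",
    "Foul Bunt": "Fouls",
    "Foul Tip": "Fouls",
    "Missed Bunt": "Swinging Strikes",
    "Swinging Strike": "Swinging Strikes",
    "Swinging Strike (Blocked)": "Swinging Strikes",
}


def categorize(description: str) -> Optional[str]:
    """Return the analysis category for a description, or None if it is dropped."""
    description = description.strip()
    if description in DROPPED:
        return None
    # Anything not folded keeps its own name (Called Strike, In play, no out, ...).
    return COMBINED.get(description, description)


print(categorize("Foul Tip"))       # Fouls
print(categorize("Called Strike"))  # Called Strike
print(categorize("Pitchout"))       # None
```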
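Here are the percentages for each category from 2007, 2008, 2009, and all 3 years combined, for both RHHs and LHHs. In the table headers, "Balls" means pitches located outside the strike zone and "Strikes" means pitches located inside it; the row labels are what actually happened on the pitch: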
RHH Balls | 2007 | 2008 | 2009 | 3 years | LHH Balls | 2007 | 2008 | 2009 | 3 years | |
Balls | 62.0% | 63.5% | 62.9% | 63.0% | Balls | 68.4% | 70.0% | 70.2% | 69.8% | |
Called Strike | 6.1% | 5.7% | 5.8% | 5.8% | Called Strike | 4.2% | 3.5% | 3.4% | 3.6% | |
Fouls | 12.3% | 11.9% | 12.2% | 12.1% | Fouls | 10.9% | 10.5% | 10.1% | 10.4% | |
In play, no out | 2.1% | 1.9% | 2.0% | 2.0% | In play, no out | 1.6% | 1.5% | 1.5% | 1.5% | |
In play, out(s) | 6.7% | 6.6% | 6.6% | 6.6% | In play, out(s) | 5.3% | 5.0% | 5.1% | 5.1% | |
In play, run(s) | 1.2% | 1.0% | 1.1% | 1.1% | In play, run(s) | 0.8% | 0.7% | 0.7% | 0.8% | |
Swinging Strikes | 9.6% | 9.3% | 9.5% | 9.5% | Swinging Strikes | 8.9% | 8.8% | 8.9% | 8.9% | |
RHH Strikes | 2007 | 2008 | 2009 | 3 years | LHH Strikes | 2007 | 2008 | 2009 | 3 years | |
Balls | 10.3% | 9.3% | 10.2% | 9.8% | Balls | 11.1% | 10.7% | 10.6% | 10.7% | |
Called Strike | 28.2% | 28.6% | 29.0% | 28.7% | Called Strike | 28.9% | 29.1% | 30.1% | 29.5% | |
Fouls | 23.6% | 24.2% | 23.4% | 23.8% | Fouls | 23.8% | 24.0% | 23.9% | 23.9% | |
In play, no out | 6.8% | 6.9% | 6.7% | 6.8% | In play, no out | 6.7% | 6.8% | 6.6% | 6.7% | |
In play, out(s) | 18.7% | 18.9% | 18.7% | 18.8% | In play, out(s) | 18.1% | 18.2% | 17.7% | 17.9% | |
In play, run(s) | 3.8% | 3.7% | 3.7% | 3.7% | In play, run(s) | 3.6% | 3.7% | 3.6% | 3.6% | |
Swinging Strikes | 8.6% | 8.4% | 8.2% | 8.4% | Swinging Strikes | 7.7% | 7.6% | 7.5% | 7.6% |
I see no reason not to use the 3 year combined data for comparison.
So far it has been pretty easy.
Comparing the RHH and LHH totals shows values that are a little off, especially on pitches out of the zone. I am not sure whether the discrepancy comes from differences in quality between LHHs and RHHs or from umpires calling pitches differently depending on hitter handedness. I don't feel entirely comfortable with it, but at this point I will combine the numbers into a single baseline for both RHH and LHH:
LHH Balls | RHH Balls | Combined Balls | LHH Strikes | RHH Strikes | Combined Strikes | |
Balls | 69.8% | 63.0% | 65.8% | 10.7% | 9.8% | 10.3% |
Called Strike | 3.6% | 5.8% | 4.9% | 29.5% | 28.7% | 29.1% |
Fouls | 10.4% | 12.1% | 11.4% | 23.9% | 23.8% | 23.8% |
In play, no out | 1.5% | 2.0% | 1.8% | 6.7% | 6.8% | 6.7% |
In play, out(s) | 5.1% | 6.6% | 6.0% | 17.9% | 18.8% | 18.4% |
In play, run(s) | 0.8% | 1.1% | 0.9% | 3.6% | 3.7% | 3.7% |
Swinging Strikes | 8.9% | 9.5% | 9.2% | 7.6% | 8.4% | 8.0% |
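Note that the combined columns are not simple averages of the LHH and RHH columns. A quick check, assuming the combined value is a count-weighted (pooled) average of the two, recovers the share of pitches thrown to LHHs that the table implies. The function name `implied_lhh_share` is made up for this sketch.

```python
def implied_lhh_share(lhh_pct: float, rhh_pct: float, combined_pct: float) -> float:
    """Solve combined = w * lhh + (1 - w) * rhh for w, the LHH share of pitches."""
    return (combined_pct - rhh_pct) / (lhh_pct - rhh_pct)


# Out-of-zone "Balls" row from the table above: LHH 69.8%, RHH 63.0%, combined 65.8%.
share = implied_lhh_share(69.8, 63.0, 65.8)
print(f"Implied LHH share of out-of-zone pitches: {share:.2f}")

# Recombining with that weight reproduces the combined value.
recombined = share * 69.8 + (1 - share) * 63.0
print(f"Recombined: {recombined:.1f}%")
```

Under that assumption, roughly 4 in 10 out-of-zone pitches in the sample went to LHHs, so the baseline leans toward the RHH numbers.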
I selected two All Star catchers to compare, Joe Mauer and Miguel Olivo (Bengie Molina All Stars). Here are each player's percentages, along with the league's combined percentages:
Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Balls | 65.8% | 48.4% | 69.5% |
Called Strike | 4.9% | 3.8% | 4.5% |
Fouls | 11.4% | 13.9% | 9.8% |
In play, no out | 1.8% | 2.0% | 2.5% |
In play, out(s) | 6.0% | 7.3% | 7.4% |
In play, run(s) | 0.9% | 1.0% | 1.9% |
Swinging Strikes | 9.2% | 23.5% | 4.5% |
Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Balls | 10.3% | 6.1% | 15.2% |
Called Strike | 29.1% | 20.5% | 37.9% |
Fouls | 23.8% | 27.5% | 17.2% |
In play, no out | 6.7% | 6.5% | 7.4% |
In play, out(s) | 18.4% | 17.3% | 14.6% |
In play, run(s) | 3.7% | 4.8% | 3.9% |
Swinging Strikes | 8.0% | 17.3% | 3.8% |
Now I am at a point where I am not sure which values are useful or informative to other people. Please let me know what data you would like to see.
Here is my method for simplifying the data into the values that I see as useful:
- Combine Balls and Called Strikes, separately for pitches in and out of the zone. This is the Take % for pitches that are supposed to be either a ball or a strike.
- Combine the Fouls and all 3 "In Play" categories into a Contact grouping.
- Combine In play, no out and In play, run(s) into a Good grouping.
- Combine all groups except Balls and Called Strike into a Swinging grouping.
- Divide the Contact grouping by the Swinging grouping to get a Contact %.
- Divide the Good grouping by the Swinging grouping to get a Good Contact %.
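As a worked example of these steps, here is a short sketch applied to the league's combined out-of-zone percentages from the table above. The function name `judgment_rates` and the dictionary layout are just choices for this illustration.

```python
def judgment_rates(pct: dict) -> dict:
    """Compute Take %, Contact %, and Good Contact % from category percentages."""
    take = pct["Balls"] + pct["Called Strike"]
    good = pct["In play, no out"] + pct["In play, run(s)"]
    contact = pct["Fouls"] + pct["In play, out(s)"] + good
    # Swinging is everything except Balls and Called Strike.
    swinging = contact + pct["Swinging Strikes"]
    return {
        "Take %": take,
        "Contact %": 100 * contact / swinging,
        "Good Contact %": 100 * good / swinging,
    }


# League combined values for pitches out of the strike zone.
league_out_of_zone = {
    "Balls": 65.8,
    "Called Strike": 4.9,
    "Fouls": 11.4,
    "In play, no out": 1.8,
    "In play, out(s)": 6.0,
    "In play, run(s)": 0.9,
    "Swinging Strikes": 9.2,
}

rates = judgment_rates(league_out_of_zone)
for name, value in rates.items():
    print(f"{name}: {value:.0f}%")
```

Because the inputs here are already rounded to one decimal place, results computed this way can differ slightly from values computed on the raw pitch counts.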
Here are the Take %, Contact %, and Good Contact % values for pitches in and out of the strike zone for the league and the two hitters being compared:
Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Take % | 71% | 52% | 74% |
Contact % | 69% | 51% | 83% |
Good Contact % | 9% | 6% | 17% |
Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Take % | 39% | 27% | 53% |
Contact % | 87% | 76% | 92% |
Good Contact % | 17% | 15% | 24% |
These numbers are fine, but without the combined league values, which would be a pain to always provide, the percentages don't mean much on their own. To solve this problem, I converted the percentages to a 100-based scale like ERA+ and OPS+. A value of 100 is league average, a value of 90 means the player is 10% below the league average, and a value of 108 means the player is 8% above it. Here are the values again, converted to the (+) method:
Pitches out of Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Take + | 100.0 | 73.9 | 104.6 |
Contact + | 100.0 | 74.1 | 120.9 |
Good Contact + | 100.0 | 68.4 | 180.3 |
Pitches in Strike Zone | Combined | Miguel Olivo | Joe Mauer |
Take + | 100.0 | 67.8 | 135.0 |
Contact + | 100.0 | 88.0 | 106.0 |
Good Contact + | 100.0 | 89.2 | 140.9 |
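The conversion itself is a single ratio. A minimal sketch, assuming the plus value is simply the player's percentage divided by the league's, times 100 (the name `plus_value` is made up here):

```python
def plus_value(player_pct: float, league_pct: float) -> float:
    """Scale a player's percentage so that the league average equals 100."""
    return 100 * player_pct / league_pct


# Joe Mauer's in-zone Take %: Balls (15.2) + Called Strike (37.9),
# against the league's 10.3 + 29.1, using the rounded table values.
mauer_take = 15.2 + 37.9
league_take = 10.3 + 29.1
print(f"Take +: {plus_value(mauer_take, league_take):.1f}")
```

Working from the rounded table percentages gives a result within a point or so of the tables above, which were presumably built from unrounded data.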
These values show that, on pitches out of the zone, Joe Mauer takes more often than the league average, and when he does swing he makes good contact. Miguel Olivo, on the other hand, swings at far more of those pitches and makes contact much less often than average when he does.
Let me know what you think. I like the final values, but am pretty sure there is room for improvement.