Simplifying Batter Strike Zone Knowledge for Pitch F/X Data

I have been working with Pitch F/X data to measure a player's knowledge of the strike zone. Fangraphs.com has some strike zone values for hitters, but I wanted to control for the strike zone actually called by umpires and make the data more usable. The following walks through each step in determining a hitter's strike zone judgment, along with possible ways to improve the method.

 

I began by taking the total pitch counts of balls and strikes for right-handed hitters (RHH) and left-handed hitters (LHH) for 2007, 2008 and 2009. I used the LHH and RHH strike zones I determined previously. I am still refining them, but will use the previous zones for now. Here is a list of the pitch descriptions available from Pitch F/X:

 

Pitch FX Description
Ball
Ball In Dirt
Called Strike
Foul
Foul (Runner Going)
Foul Bunt
Foul Tip
Hit By Pitch
In play, no out
In play, out(s)
In play, run(s)
Intent Ball
Missed Bunt
Pitchout
Swinging Strike
Swinging Strike (Blocked)
Unknown Strike
(blank)

 

I removed the following values because they either do not bear on the batter's judgment or the pitch data is unknown:

 

Blank, HBP, Intent Ball, Pitchout and Unknown Strike

 

The following values were combined as they fall into the same general categories.

 

Balls: Ball, Ball in Dirt

Fouls: Foul, Foul (Runner Going), Foul Bunt, Foul Tip

Swinging Strikes: Missed Bunt, Swinging Strike, Swinging Strike (Blocked)
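
The removal and grouping rules above can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the names `REMOVED`, `MERGED`, `categorize` and `category_percentages` are my own, and the sample list is made up rather than real Pitch F/X data.

```python
from collections import Counter

# Descriptions dropped from the analysis, per the list above ("" is a blank entry).
REMOVED = {"", "Hit By Pitch", "Intent Ball", "Pitchout", "Unknown Strike"}

# Descriptions folded into a broader category; anything else keeps its own name.
MERGED = {
    "Ball": "Balls",
    "Ball In Dirt": "Balls",
    "Foul": "Fouls",
    "Foul (Runner Going)": "Fouls",
    "Foul Bunt": "Fouls",
    "Foul Tip": "Fouls",
    "Missed Bunt": "Swinging Strikes",
    "Swinging Strike": "Swinging Strikes",
    "Swinging Strike (Blocked)": "Swinging Strikes",
}


def categorize(description):
    """Return the analysis category, or None when the pitch is dropped."""
    desc = (description or "").strip()
    if desc in REMOVED:
        return None
    return MERGED.get(desc, desc)


def category_percentages(descriptions):
    """Percentage of kept pitches falling in each category."""
    counts = Counter(c for c in map(categorize, descriptions) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical sample, not real Pitch F/X data.
sample = ["Ball", "Ball In Dirt", "Foul Tip", "Pitchout", "Called Strike", ""]
print(category_percentages(sample))
```

To build tables like the ones in this post, the same function would be run separately on the pitches inside the zone and the pitches outside it.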

 

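Here are the percentages for each category in 2007, 2008, 2009 and all 3 years combined, for both RHHs and LHHs. "Balls" tables cover pitches out of the strike zone and "Strikes" tables cover pitches in it: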

 

RHH Balls           2007     2008     2009     3 years
Balls               62.0%    63.5%    62.9%    63.0%
Called Strike        6.1%     5.7%     5.8%     5.8%
Fouls               12.3%    11.9%    12.2%    12.1%
In play, no out      2.1%     1.9%     2.0%     2.0%
In play, out(s)      6.7%     6.6%     6.6%     6.6%
In play, run(s)      1.2%     1.0%     1.1%     1.1%
Swinging Strikes     9.6%     9.3%     9.5%     9.5%

LHH Balls           2007     2008     2009     3 years
Balls               68.4%    70.0%    70.2%    69.8%
Called Strike        4.2%     3.5%     3.4%     3.6%
Fouls               10.9%    10.5%    10.1%    10.4%
In play, no out      1.6%     1.5%     1.5%     1.5%
In play, out(s)      5.3%     5.0%     5.1%     5.1%
In play, run(s)      0.8%     0.7%     0.7%     0.8%
Swinging Strikes     8.9%     8.8%     8.9%     8.9%











RHH Strikes         2007     2008     2009     3 years
Balls               10.3%     9.3%    10.2%     9.8%
Called Strike       28.2%    28.6%    29.0%    28.7%
Fouls               23.6%    24.2%    23.4%    23.8%
In play, no out      6.8%     6.9%     6.7%     6.8%
In play, out(s)     18.7%    18.9%    18.7%    18.8%
In play, run(s)      3.8%     3.7%     3.7%     3.7%
Swinging Strikes     8.6%     8.4%     8.2%     8.4%

LHH Strikes         2007     2008     2009     3 years
Balls               11.1%    10.7%    10.6%    10.7%
Called Strike       28.9%    29.1%    30.1%    29.5%
Fouls               23.8%    24.0%    23.9%    23.9%
In play, no out      6.7%     6.8%     6.6%     6.7%
In play, out(s)     18.1%    18.2%    17.7%    17.9%
In play, run(s)      3.6%     3.7%     3.6%     3.6%
Swinging Strikes     7.7%     7.6%     7.5%     7.6%

 

I see no reason not to use the 3 year combined data for comparison.

 

So far it has been pretty easy.

 

Comparing the RHH and LHH totals shows values that are a little off, especially on pitches out of the zone. I am not sure whether the discrepancy comes from differences in quality between LHH and RHH or from umpires calling pitches differently depending on hitter handedness. I don't feel entirely comfortable doing it, but I will combine the numbers at this point to get one baseline for both RHH and LHH:

 


                    LHH Balls   RHH Balls   Combined Balls   LHH Strikes   RHH Strikes   Combined Strikes
Balls               69.8%       63.0%       65.8%            10.7%          9.8%         10.3%
Called Strike        3.6%        5.8%        4.9%            29.5%         28.7%         29.1%
Fouls               10.4%       12.1%       11.4%            23.9%         23.8%         23.8%
In play, no out      1.5%        2.0%        1.8%             6.7%          6.8%          6.7%
In play, out(s)      5.1%        6.6%        6.0%            17.9%         18.8%         18.4%
In play, run(s)      0.8%        1.1%        0.9%             3.6%          3.7%          3.7%
Swinging Strikes     8.9%        9.5%        9.2%             7.6%          8.4%          8.0%
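
For anyone wondering how the combined column relates to the two halves, here is a rough check. It assumes the combined figure is a pitch-count-weighted average of the LHH and RHH percentages, and backs out the LHH share of pitches that each row would imply. The function name `implied_lhh_share` is just for illustration, and only the three largest out-of-zone rows are used because rounding swamps the smaller ones.

```python
# (LHH %, RHH %, Combined %) for pitches out of the zone, from the table above.
rows = {
    "Balls": (69.8, 63.0, 65.8),
    "Called Strike": (3.6, 5.8, 4.9),
    "Fouls": (10.4, 12.1, 11.4),
}


def implied_lhh_share(lhh, rhh, combined):
    """Solve combined = w * lhh + (1 - w) * rhh for the LHH weight w."""
    return (combined - rhh) / (lhh - rhh)


shares = {name: implied_lhh_share(*values) for name, values in rows.items()}
for name, share in shares.items():
    print(f"{name}: implied LHH share of pitches = {share:.2f}")
```

If the three rows land on roughly the same share, that fits a baseline pooled by pitch counts rather than a simple average of the two percentages.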

 

I selected two All Star catchers to compare, Joe Mauer and Miguel Olivo (Bengie Molina All Stars). Here are each player's percentages, along with the league's combined percentages:

 

Pitches out of Strike Zone   Combined   Miguel Olivo   Joe Mauer
Balls                        65.8%      48.4%          69.5%
Called Strike                 4.9%       3.8%           4.5%
Fouls                        11.4%      13.9%           9.8%
In play, no out               1.8%       2.0%           2.5%
In play, out(s)               6.0%       7.3%           7.4%
In play, run(s)               0.9%       1.0%           1.9%
Swinging Strikes              9.2%      23.5%           4.5%




Pitches in Strike Zone       Combined   Miguel Olivo   Joe Mauer
Balls                        10.3%       6.1%          15.2%
Called Strike                29.1%      20.5%          37.9%
Fouls                        23.8%      27.5%          17.2%
In play, no out               6.7%       6.5%           7.4%
In play, out(s)              18.4%      17.3%          14.6%
In play, run(s)               3.7%       4.8%           3.9%
Swinging Strikes              8.0%      17.3%           3.8%

 

Now I am at a point where I am not sure which values are useful or informative to other people. Please let me know what data you would like to see.

 

Here is my method for simplifying and improving the data that I see as useful.

  1. Combine Balls and Called Strikes within each zone group (pitches out of the zone and pitches in the zone). This is the Take % for pitches that are supposed to be either a ball or a strike.

  2. Combine the Fouls and all 3 "In Play" categories into a Contact grouping.

  3. Combine In Play, no out and In Play, run(s) into a Good grouping.

  4. Combine all groups except Balls and Called Strike into a Swinging grouping.

  5. Divide the Contact grouping by the Swinging grouping to get a Contact %.

  6. Divide the Good grouping by the Swinging grouping to get a Good Contact %.
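
As a worked example, the six steps can be written as one small Python function. This is a sketch with names of my own choosing (`zone_rates`, `league_out_of_zone`), fed with the league's combined out-of-zone percentages from the table earlier in the post. Because those inputs are already rounded to one decimal, the results can differ a little from values computed on raw counts.

```python
def zone_rates(pct):
    """Apply steps 1-6 to one column of outcome percentages for a zone group."""
    # Step 1: pitches the batter did not swing at.
    take = pct["Balls"] + pct["Called Strike"]
    # Step 2: every swing that touched the ball.
    contact = (pct["Fouls"] + pct["In play, no out"]
               + pct["In play, out(s)"] + pct["In play, run(s)"])
    # Step 3: balls in play that did not produce an out.
    good = pct["In play, no out"] + pct["In play, run(s)"]
    # Step 4: everything except Balls and Called Strike.
    swinging = contact + pct["Swinging Strikes"]
    # Steps 5 and 6: express Contact and Good as a share of all swings.
    return {
        "Take %": take,
        "Contact %": 100.0 * contact / swinging,
        "Good Contact %": 100.0 * good / swinging,
    }


# League combined percentages for pitches out of the strike zone.
league_out_of_zone = {
    "Balls": 65.8,
    "Called Strike": 4.9,
    "Fouls": 11.4,
    "In play, no out": 1.8,
    "In play, out(s)": 6.0,
    "In play, run(s)": 0.9,
    "Swinging Strikes": 9.2,
}

rates = zone_rates(league_out_of_zone)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

With these inputs the three values round to 71%, 69% and 9%.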

Here are the Take %, Contact % and Good Contact % values for pitches in and out of the strike zone for the league and the two hitters being compared:

 

Pitches out of Strike Zone   Combined   Miguel Olivo   Joe Mauer
Take %                       71%        52%            74%
Contact %                    69%        51%            83%
Good Contact %                9%         6%            17%




Pitches in Strike Zone       Combined   Miguel Olivo   Joe Mauer
Take %                       39%        27%            53%
Contact %                    87%        76%            92%
Good Contact %               17%        15%            24%

 

These numbers are fine, but without the combined league values, which would be a pain to always provide, the percentages don't mean much. To solve this problem, I converted the percentages to a scale where 100 is league average, like ERA+ and OPS+. A value of 90 means the player is 10% below the league average, and a value of 108 means the player is 8% above it. Here are the values again, converted to the (+) method:

 

Pitches out of Strike Zone   Combined   Miguel Olivo   Joe Mauer
Take +                       100.0       73.9          104.6
Contact +                    100.0       74.1          120.9
Good Contact +               100.0       68.4          180.3




Pitches in Strike Zone       Combined   Miguel Olivo   Joe Mauer
Take +                       100.0       67.8          135.0
Contact +                    100.0       88.0          106.0
Good Contact +               100.0       89.2          140.9
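
The (+) conversion itself is just the player's percentage divided by the league's percentage, times 100. Here is a minimal sketch, with `plus_value` as a name I made up for the example. The inputs are Joe Mauer's and the league's in-zone Take %, rebuilt by adding Balls and Called Strike from the earlier tables (15.2 + 37.9 and 10.3 + 29.1). Since those inputs are rounded, the result comes out close to the 135.0 in the table but not identical to it.

```python
def plus_value(player_pct, league_pct):
    """Scale a player's percentage so that the league average equals 100."""
    if league_pct == 0:
        raise ValueError("league percentage must be non-zero")
    return 100.0 * player_pct / league_pct


mauer_in_zone_take = 15.2 + 37.9   # Balls + Called Strike, pitches in the zone
league_in_zone_take = 10.3 + 29.1  # same two rows from the Combined column

take_plus = plus_value(mauer_in_zone_take, league_in_zone_take)
print(f"Take + = {take_plus:.1f}")
```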

 

These values show that, on pitches out of the zone, Joe Mauer takes more often than the league average and makes good contact when he does swing. Miguel, on the other hand, swings at far more of these pitches than the league does and makes contact well below the league rate when he swings.

 

Let me know what you think.  I like the final values, but am pretty sure there is room for improvement.