Ever since I first saw and used Pitch F/X data, I have been trying to understand the difference between the strike zone umpires call and the textbook strike zone.
The official rule book definition of the strike zone has changed over the years. Since 1996, the official definition according to MLB has been:
"The Strike Zone is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the bottom of the knees. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball."
On TV and on MLB Gameday, a neat box shows this perfect strike zone. After working with and observing Pitch F/X data for a while, I have found that the called strike zone is not that perfect. The following is a look at the basics of the called strike zone and how batter handedness affects it.
Basics
When looking at the strike zone home plate umpires call, we must remember the following points:
1. Umpires are human.
2. Humans produce inconsistent work and make mistakes.
3. For the foreseeable future, computers will not call balls and strikes.
With this understanding, the umpire's strike zone is generally consistent in size. Umpires' zones should be investigated, but not to evaluate the umpires. Instead, their strike zone tendencies should be known so we can see how well pitchers and hitters adapt to the different zones, which is a big part of the game of baseball. Gamblers, who have more at stake than team pride, have been tracking umpire stats for years to see how they affect game scoring.
I have looked at a few cases of suspect umpire calls (Milton Bradley, Shane Victorino and Zack Greinke), and in each case the umpires did nothing different from what they do every night. Each umpire calls his own unique game, and the hitters and pitchers who adapt first usually have an advantage.
Handedness
The strike zone that umpires call isn't the perfect box shown on MLB Gameday. When an umpire positions himself behind the catcher, he shifts toward the inside part of the plate (which side depends on batter handedness). This adjustment can be seen in the following two images.
Because umpires are positioned to see the inside pitch, they call balls and strikes more consistently on the inside than on the outside. Besides the lack of consistency on the outside part of the plate, the strike zone shifts inside by 0.2 to 0.4 feet, depending on the batter's handedness. The shift can be seen in the following image of 3,000 called strikes against left-handed hitters (LHH).
Note: The Gameday zone shown runs from 1.5 feet to 3.5 feet above the ground and extends 1 foot in each direction from the center of the plate.
Finally, the called strike zone is roughly circular in shape, as can be seen in the preceding image. Whenever people say an umpire is not calling the corners, it is probably because the corners generally aren't called.
To deal with all these aspects, I created a strike zone that is shaped like a cross (since corners aren't called, I won't count them) and shifted toward the inside part of the plate. A circle would be ideal, but the cross is easier to display in graphics and to use when running queries on the original data in SQL.
I tried several ideas for positioning the zone and settled on the following method. Through trial and error, I found the zone for which the percentage of called balls falling outside the zone matched the percentage of called strikes falling inside it. Initially, I created the zones by eyeballing one of the previous plots, and then I adjusted the dimensions until the two percentages were close. Here are the dimensions, the percentages of balls and strikes in and out of the zones, and four images depicting the zones against actual called balls and strikes.
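A minimal Python sketch of the cross idea, assuming each called pitch carries Pitch F/X-style `px` (horizontal location) and `pz` (height) values in feet: the cross is the union of a tall, narrow box (Zone 1) and a short, wide box (Zone 2), so a pitch counts as inside if either box contains it. The dimensions come from the tables that follow, and the Gameday box from the note above is included for comparison. The class names and the `"L"`/`"R"` handedness keys are illustrative assumptions, not the author's queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in feet: px across the plate, pz above the ground."""
    x_min: float
    x_max: float
    z_min: float
    z_max: float

    def contains(self, px: float, pz: float) -> bool:
        # Edges count as inside (an assumption; edge handling is not specified).
        return self.x_min <= px <= self.x_max and self.z_min <= pz <= self.z_max


@dataclass(frozen=True)
class Cross:
    """Union of a tall arm (Zone 1) and a wide arm (Zone 2); corners are excluded."""
    tall: Box
    wide: Box

    def contains(self, px: float, pz: float) -> bool:
        return self.tall.contains(px, pz) or self.wide.contains(px, pz)


# Gameday box: 1 ft either side of the plate's center, 1.5 to 3.5 ft high.
GAMEDAY = Box(-1.0, 1.0, 1.5, 3.5)

# Called-zone crosses keyed by (hypothetical) batter handedness codes.
CROSS_BY_STAND = {
    "L": Cross(tall=Box(-1.1, 0.3, 1.5, 3.5), wide=Box(-1.4, 0.8, 2.0, 3.0)),
    "R": Cross(tall=Box(-0.4, 0.8, 1.5, 3.5), wide=Box(-0.75, 1.25, 2.0, 3.0)),
}

print(GAMEDAY.contains(0.9, 1.6))              # True
print(CROSS_BY_STAND["R"].contains(0.9, 1.6))  # False: a low corner
print(GAMEDAY.contains(1.1, 2.5))              # False
print(CROSS_BY_STAND["R"].contains(1.1, 2.5))  # True: inside the shifted wide arm
```

In SQL, the same membership test is two `BETWEEN ... AND ...` box conditions joined with `OR`, which is part of why a cross is easier to query than a circle.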
Left-Handed Hitters (feet)
Bound | Zone 1 x | Zone 1 y | Zone 2 x | Zone 2 y |
Min | -1.1 | 1.5 | -1.4 | 2 |
Max | 0.3 | 3.5 | 0.8 | 3 |
Right-Handed Hitters (feet)
Bound | Zone 1 x | Zone 1 y | Zone 2 x | Zone 2 y |
Min | -0.4 | 1.5 | -0.75 | 2 |
Max | 0.8 | 3.5 | 1.25 | 3 |
Corrected Zone (Cross)
Right-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 340226 | 291631 | 85.7% |
Called strikes | 169836 | 141429 | 83.3% |
Left-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 275819 | 238158 | 86.3% |
Called strikes | 133829 | 114038 | 85.2% |
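The trial-and-error search described above can be automated. The Python sketch below is an illustration under stated assumptions, not the author's procedure: it slides a box left and right, scores each position by the smaller of (% of called balls outside, % of called strikes inside), and keeps the best shift. Scoring by the minimum rewards positions where both numbers are high and close together, in the spirit of the roughly 85%/85% split in the table. The `(px, pz, call)` record layout, the `"B"`/`"S"` call codes, and the sample pitches are all hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical record: (px, pz, call), call "B" = called ball, "S" = called strike.
Pitch = Tuple[float, float, str]
# (x_min, x_max, z_min, z_max) in feet.
Rect = Tuple[float, float, float, float]


def inside(rect: Rect, px: float, pz: float) -> bool:
    x_min, x_max, z_min, z_max = rect
    return x_min <= px <= x_max and z_min <= pz <= z_max


def split(rect: Rect, pitches: Iterable[Pitch]) -> Tuple[float, float]:
    """Return (% of called balls outside rect, % of called strikes inside rect)."""
    balls = strikes = balls_out = strikes_in = 0
    for px, pz, call in pitches:
        hit = inside(rect, px, pz)
        if call == "B":
            balls += 1
            if not hit:
                balls_out += 1
        elif call == "S":
            strikes += 1
            if hit:
                strikes_in += 1
    if balls == 0 or strikes == 0:
        raise ValueError("need at least one called ball and one called strike")
    return 100.0 * balls_out / balls, 100.0 * strikes_in / strikes


def best_shift(rect: Rect, pitches: Sequence[Pitch], shifts: Sequence[float]) -> float:
    """Slide rect horizontally by each candidate shift and keep the best one.

    Assumed objective: maximize the smaller of the two split percentages.
    """
    x_min, x_max, z_min, z_max = rect

    def score(dx: float) -> float:
        return min(split((x_min + dx, x_max + dx, z_min, z_max), pitches))

    return max(shifts, key=score)


# Tiny made-up sample: called strikes sit left of center, balls on the edges.
sample = [
    (-1.0, 2.5, "S"), (-0.4, 2.5, "S"), (0.2, 2.5, "S"),
    (0.6, 2.5, "B"), (-1.6, 2.5, "B"), (1.2, 2.5, "B"),
]
plate_box = (-0.7, 0.7, 1.5, 3.5)
print(split(plate_box, sample))                              # both about 66.7
print(best_shift(plate_box, sample, [-0.4, -0.2, 0.0, 0.2]))  # -0.4
```

A real search would also vary the box's width and height, and would run over the full set of called pitches for one batter handedness at a time.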
The zone for left-handed hitters is shifted even farther inside than the zone for right-handed hitters. I have tried to find a good explanation for this shift and have had no luck.
Previously, I did a similar study that didn't adjust for handedness and found zones that, at best, had a 79%/85% strike/ball split. I think the roughly 85%/85% split here is much better, especially since it holds for two separate zones. If you want the queries to use on your own dataset, here is a document that contains them.
Using these two zones, I created three boxes from the smallest, average, and largest extents of the cross for use in other queries. Here are the extent boxes, along with the percentages of called balls and strikes.
Extent Boxes (feet)
Box | Left-Handed x | Left-Handed y | Right-Handed x | Right-Handed y |
Small Zone, min | -1.1 | 2.2 | -0.4 | 2 |
Small Zone, max | 0.3 | 2.8 | 0.8 | 3 |
Average Zone, min | -1.250 | 1.850 | -0.575 | 1.750 |
Average Zone, max | 0.550 | 3.150 | 1.025 | 3.250 |
Large Zone, min | -1.4 | 1.5 | -0.75 | 1.5 |
Large Zone, max | 0.8 | 3.5 | 1.25 | 3.5 |
Square Zones
Large Zone
Right-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 340226 | 263410 | 77.4% |
Called strikes | 169836 | 152849 | 90.0% |
Left-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 275819 | 211072 | 76.5% |
Called strikes | 133829 | 128172 | 95.8% |
Average Zone
Right-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 340226 | 317308 | 93.3% |
Called strikes | 169836 | 125755 | 74.0% |
Left-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 275819 | 261143 | 94.7% |
Called strikes | 133829 | 99009 | 74.0% |
Small Zone
Right-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 340226 | 336697 | 99.0% |
Called strikes | 169836 | 73726 | 43.4% |
Left-Handed Hitters | Total | Balls out of zone / strikes in zone | % of total |
Called balls | 275819 | 274076 | 99.4% |
Called strikes | 133829 | 42138 | 31.5% |
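The right-handed extent boxes can be derived mechanically from the cross: the small box pairs the tall arm's width with the wide arm's height, the large box is the cross's bounding box, and the average box takes the midpoint of each small/large bound. The Python sketch below applies that rule. It reproduces the right-handed columns of the extent table; the left-handed small box in the table uses a 2.2 to 2.8 ft height rather than the cross's 2 to 3 ft, so the rule alone does not reproduce the left-handed small and average heights. The function name `extent_boxes` is hypothetical.

```python
from typing import Dict, NamedTuple


class Box(NamedTuple):
    """(x_min, x_max, z_min, z_max) in feet."""
    x_min: float
    x_max: float
    z_min: float
    z_max: float


def extent_boxes(tall: Box, wide: Box) -> Dict[str, Box]:
    """Small, average and large boxes from the two arms of a cross.

    small:   tall arm's width (narrowest x) with wide arm's height (shortest z)
    large:   wide arm's width with tall arm's height (the bounding box)
    average: midpoint of each pair of small/large bounds
    """
    small = Box(tall.x_min, tall.x_max, wide.z_min, wide.z_max)
    large = Box(wide.x_min, wide.x_max, tall.z_min, tall.z_max)
    average = Box(*((s + l) / 2 for s, l in zip(small, large)))
    return {"small": small, "average": average, "large": large}


# Right-handed cross: Zone 1 is the tall arm, Zone 2 the wide arm.
rhh = extent_boxes(tall=Box(-0.4, 0.8, 1.5, 3.5), wide=Box(-0.75, 1.25, 2.0, 3.0))
for name, box in rhh.items():
    print(name, [round(v, 3) for v in box])
# small [-0.4, 0.8, 2.0, 3.0]
# average [-0.575, 1.025, 1.75, 3.25]
# large [-0.75, 1.25, 1.5, 3.5]
```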
Uses
- Small Zone: This zone can be used when looking at the heart of the plate. About 99% of all called balls fall outside this zone, so any pitch thrown here will very probably be called a strike.
- Average Zone: This zone can be used in place of the cross for simplicity.
- Large Zone: I plan on using this to see which batters do or don't have knowledge of the strike zone. Over 90% of called strikes fall inside this zone, so a pitch outside it will rarely be called a strike, and a batter who knows the zone should seldom swing at pitches out there.
Please let me know if you have any questions. In the next installment, I will look at zone differences depending on pitcher and batter handedness.
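One way to act on the Large Zone idea, sketched in Python under assumptions: measure how often a batter swings at pitches that land outside the large box. Because over 90% of called strikes fall inside that box, a high rate suggests the batter is chasing pitches that would usually be called balls. The `(px, pz, swung)` record and the function name are hypothetical; with real Pitch F/X data the swing flag would have to be derived from each pitch's recorded outcome.

```python
from typing import Iterable, Tuple

# Right-handed Large Zone from the extent table: (x_min, x_max, z_min, z_max), feet.
RHH_LARGE = (-0.75, 1.25, 1.5, 3.5)


def outside_swing_rate(pitches: Iterable[Tuple[float, float, bool]],
                       zone: Tuple[float, float, float, float] = RHH_LARGE) -> float:
    """Share of pitches outside `zone` that the batter swung at.

    Each pitch is a hypothetical (px, pz, swung) record. Returns 0.0 when no
    pitch falls outside the zone.
    """
    x_min, x_max, z_min, z_max = zone
    outside = swings = 0
    for px, pz, swung in pitches:
        if x_min <= px <= x_max and z_min <= pz <= z_max:
            continue  # inside the large box: not counted here
        outside += 1
        if swung:
            swings += 1
    return swings / outside if outside else 0.0


pitches = [
    (0.0, 2.5, True),    # inside the box, ignored
    (1.6, 2.5, True),    # outside and swung at: a chase
    (-1.2, 2.0, False),  # outside, taken
    (0.2, 4.0, False),   # above the box, taken
]
print(outside_swing_rate(pitches))  # 1 swing out of 3 outside pitches
```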