Umpire Strike Zone Analysis - The Basics and Batter Handedness
Ever since I first saw and used Pitch F/X data, I have been trying to grasp the difference between the umpire-called strike zone and the textbook strike zone.
The official rule book definition of the strike zone has changed over the years. Since 1996, the official definition according to MLB is:
"The Strike Zone is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the bottom of the knees. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball."
On TV or on MLB Gameday there is a nice box that shows this perfect strike zone. After working with and observing Pitch F/X data for a while, I have found that the called strike zone is not that perfect. The following is a look at the basics of the called strike zone and how batter handedness affects it.
Basics
When looking at the strike zone home plate umpires call, we must remember the following points:
1. Umpires are humans
2. Humans produce inconsistent work and make mistakes.
3. For the foreseeable future, computers will not call balls and strikes.
With this understanding, the umpire's strike zone is generally consistent in size. Umpires' zones should be investigated, but not to evaluate the umpires. Instead, their strike zone tendencies should be known so we can see how well pitchers and hitters adapt to the different zones, which is a great part of the game of baseball. Gamblers, who have more at stake than team pride, have been tracking umpire stats for years to see how they affect game scoring.
I have looked at a few cases of suspect umpire calls (Milton Bradley, Shane Victorino and Zack Greinke) and in each case the umpires did nothing different than what they do every night. They call their own unique game and the hitters and pitchers that adapt first usually will have an advantage.
Handedness
The strike zone that umpires call isn't a perfect box as shown on MLB Gameday. When an umpire positions himself behind the catcher, he moves to the inside part of the plate (depending on batter handedness). This adjustment can be seen in the following two images.
Because umpires are positioned to see the inside pitch, they call balls and strikes more consistently on the inside than on the outside. Besides the lack of consistency on the outside part of the plate, the strike zone shifts inside by between 0.2 and 0.4 feet depending on the batter's handedness. The shift can be seen in the following image of 3,000 called strikes vs LHH.
Note: The Gameday zone shown extends from 1.5 feet to 3.5 feet off the ground and 1 foot in each direction from the center of the plate.
Finally, the called strike zone is circular in shape as can be seen in the preceding image. Whenever it is said that the umpire is not calling the corners, it is probably because the corners aren't generally called.
To deal with all these aspects, I created a strike zone that is shaped like a cross (since corners aren't called, I won't count them) and shifted to the inside part of the plate. A circle would be ideal, but the cross is easier to display in graphics and to use when running queries on the original data in SQL.
I tried several ideas to position the zone and decided on the following method. I found, through trial and error, the zone where the percentage of called balls that fell outside the zone was the same as the percentage of called strikes that fell inside it. Initially, I created the zones by eyeballing one of the previous plots, and then I adjusted the dimensions until the percentages were close. Here are the dimensions, the percent of balls and strikes in and out of the zones and 4 images depicting the zones against actual called balls and strikes.
| Batter | Zone | x min (ft) | x max (ft) | y min (ft) | y max (ft) |
| Left Handed Hitters | Zone 1 | -1.1 | 0.3 | 1.5 | 3.5 |
| Left Handed Hitters | Zone 2 | -1.4 | 0.8 | 2 | 3 |
| Right Handed Hitters | Zone 1 | -0.4 | 0.8 | 1.5 | 3.5 |
| Right Handed Hitters | Zone 2 | -0.75 | 1.25 | 2 | 3 |
| Corrected Zone | Call | Total | Balls out of zone / strikes in zone | % of total |
| Right Handed | Total Balls | 340226 | 291631 | 85.7% |
| Right Handed | Total Strikes | 169836 | 141429 | 83.3% |
| Left Handed | Total Balls | 275819 | 238158 | 86.3% |
| Left Handed | Total Strikes | 133829 | 114038 | 85.2% |
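For readers who would rather check the cross outside of SQL, here is a minimal Python sketch of the same idea. The zone dimensions come from the table above; everything else (the `Rect` class, the `in_cross` and `agreement` helpers, the `"L"`/`"R"` keys and the sample pitches) is hypothetical and only there for illustration. It is not the query I actually ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in feet: x is horizontal offset, y is height."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Cross = union of Zone 1 (tall, narrow) and Zone 2 (short, wide).
# Dimensions copied from the table above; keys are an assumed encoding.
CROSS = {
    "L": (Rect(-1.1, 0.3, 1.5, 3.5), Rect(-1.4, 0.8, 2.0, 3.0)),
    "R": (Rect(-0.4, 0.8, 1.5, 3.5), Rect(-0.75, 1.25, 2.0, 3.0)),
}


def in_cross(stand, x, y):
    """True if the pitch location falls in either rectangle of the cross."""
    return any(rect.contains(x, y) for rect in CROSS[stand])


def agreement(pitches, stand):
    """Return (share of called strikes inside, share of called balls outside).

    pitches is an iterable of (x, y, call) with call "S" or "B".
    """
    strikes = strikes_in = balls = balls_out = 0
    for x, y, call in pitches:
        inside = in_cross(stand, x, y)
        if call == "S":
            strikes += 1
            strikes_in += inside
        else:
            balls += 1
            balls_out += not inside
    strike_rate = strikes_in / strikes if strikes else float("nan")
    ball_rate = balls_out / balls if balls else float("nan")
    return strike_rate, ball_rate


# Made-up pitches against a left-handed hitter, for illustration only.
sample = [
    (0.0, 2.5, "S"), (-1.3, 2.5, "S"), (1.0, 2.5, "S"),
    (-1.3, 3.4, "B"), (2.0, 2.5, "B"), (0.0, 0.5, "B"),
]
strike_rate, ball_rate = agreement(sample, "L")
```

The trial-and-error step amounts to nudging the rectangle edges until the two returned rates are roughly equal.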


The zone for left-handed hitters is shifted even more inside than that for right-handed hitters. I have tried to find a good explanation for this shift and have had no luck.
Previously I did a similar study that didn't adjust for handedness and found zones that, at best, had a 79%/85% Strike/Ball split. I think the 85%/85% split is much better, especially since it is that way for two separate zones. If you want the queries to use on your own dataset, here is a document that contains them.
Using these 2 zones, I created 3 boxes using the smallest, average or largest extents of the cross for use in other queries. Here are the extent boxes, along with the percent of called balls and strikes.
| Zone | Batter | x min (ft) | x max (ft) | y min (ft) | y max (ft) |
| Small Zone | Left Handed | -1.1 | 0.3 | 2.2 | 2.8 |
| Small Zone | Right Handed | -0.4 | 0.8 | 2 | 3 |
| Average Zone | Left Handed | -1.250 | 0.550 | 1.850 | 3.150 |
| Average Zone | Right Handed | -0.575 | 1.025 | 1.750 | 3.250 |
| Large Zone | Left Handed | -1.4 | 0.8 | 1.5 | 3.5 |
| Large Zone | Right Handed | -0.75 | 1.25 | 1.5 | 3.5 |
| Square Zones | Batter | Call | Total | Pitches in or out of Zone | % of total |
| Large Zone | Right Handed | Total Balls | 340226 | 263410 | 77.4% |
| Large Zone | Right Handed | Total Strikes | 169836 | 152849 | 90.0% |
| Large Zone | Left Handed | Total Balls | 275819 | 211072 | 76.5% |
| Large Zone | Left Handed | Total Strikes | 133829 | 128172 | 95.8% |
| Average Zone | Right Handed | Total Balls | 340226 | 317308 | 93.3% |
| Average Zone | Right Handed | Total Strikes | 169836 | 125755 | 74.0% |
| Average Zone | Left Handed | Total Balls | 275819 | 261143 | 94.7% |
| Average Zone | Left Handed | Total Strikes | 133829 | 99009 | 74.0% |
| Small Zone | Right Handed | Total Balls | 340226 | 336697 | 99.0% |
| Small Zone | Right Handed | Total Strikes | 169836 | 73726 | 43.4% |
| Small Zone | Left Handed | Total Balls | 275819 | 274076 | 99.4% |
| Small Zone | Left Handed | Total Strikes | 133829 | 42138 | 31.5% |
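The three boxes can be derived mechanically from the two rectangles of the cross. The sketch below assumes that "smallest" means the overlap of the two rectangles, "largest" means the box that bounds both, and "average" means the midpoint of each pair of edges. The function name `extent_boxes` and the tuple layout are hypothetical. The left-handed extent table lists a small-zone height range of 2.2 to 2.8 rather than 2 to 3, so this sketch sticks to the right-handed numbers, which it reproduces.

```python
def extent_boxes(zone1, zone2):
    """Derive small, average and large boxes from two rectangles.

    Each zone is (x_min, x_max, y_min, y_max) in feet.
    """
    pairs = list(zip(zone1, zone2))
    (x_lo, x_hi, y_lo, y_hi) = pairs
    # Small: overlap of the two rectangles (innermost edges).
    small = (max(x_lo), min(x_hi), max(y_lo), min(y_hi))
    # Large: box bounding both rectangles (outermost edges).
    large = (min(x_lo), max(x_hi), min(y_lo), max(y_hi))
    # Average: midpoint between each pair of matching edges.
    average = tuple(sum(pair) / 2 for pair in pairs)
    return small, average, large


# Right-handed cross from the dimension table earlier in the post.
rhh_zone1 = (-0.4, 0.8, 1.5, 3.5)
rhh_zone2 = (-0.75, 1.25, 2.0, 3.0)
small, average, large = extent_boxes(rhh_zone1, rhh_zone2)
```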
Uses
- Small Zone: This zone can be used when looking at the heart of the plate. 99% of all called balls are outside this zone, so any pitch thrown here will probably be a strike.
- Average Zone: This zone can be used in place of the cross for simplicity.
- Large Zone: I plan on using this to see which batters do or don't have knowledge of the strike zone. At least 90% of called strikes fall inside this zone, so batters should be swinging at pitches in it.
Please let me know if there are any questions. I will be looking at zone differences depending on pitcher and batter handedness in the next installment.
Comments
I really like Peter's idea for a logistic regression approach
Does anybody know how to program that in SQL?
by vivaelpujols on Nov 5, 2009 10:02 PM EST
I know it can be done with R, but not with just SQL.
If it can’t be done with a query, I won’t move that way. SQL is tough enough for some people, let alone learning R.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Nov 5, 2009 10:35 PM EST
Maybe you could do a "poor man circle"
Just have a bunch of restrictions so that the overall strike zone is somewhat circular.
by vivaelpujols on Nov 6, 2009 12:22 AM EST
Also, a logistic regression is just like any other regression I think
It has a specific formula that can be used in SQL. I just have no idea what that is.
by vivaelpujols on Nov 6, 2009 12:23 AM EST
If you can perform an ordinary regression, you can perform a logistic regression.
You fit the parameters in the same fashion:
Y = B*X + e
The only real difference though is that Y is either 0 (for a ball) or 1 (for a strike), and so the coefficients impact the likelihood for whether a ball or strike is called.
For this purpose, I would consider the following variables:
Call: 1= Strike, 0= Ball
BHand (B): 1 = Lefty, 0 = Righty
PHand (P): 1 = Lefty, 0 = Righty
Height (H): Inches or Feet above the ground
Dist From Center (D): Inches/Feet from the center of the plate, 0 being center, negative being toward the right-handed batter's box (or left-handed, I forget which way the data goes..)
Regress on:
C = B1 + B2*B + B3*P + B4*H + B5*H^2 + B6*D
I incorporate the H^2 because the probability of a strike will increase from 0 to 20 inches or so (middle of strike zone) and then begin decreasing, so a parabolic function should capture this effect.
At this point, the probability of a specific pitch being a strike is equal to 1 / (1 + E^(-Z)), where Z is the predicted value of C given a set of inputs.
And then you would simply want to solve for E^(-Z) = 1 (that is, Z = 0) to figure out where the probability of a pitch being called a ball equals the probability of it being called a strike.
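A minimal Python sketch of the last two steps, assuming the coefficients B1 through B6 have already been fitted somewhere else. The coefficient values below are made up purely so the arithmetic is easy to follow (height in feet), and the function names are hypothetical.

```python
import math


def linear_predictor(coef, b_hand, p_hand, height, dist):
    """Z = B1 + B2*B + B3*P + B4*H + B5*H^2 + B6*D."""
    b1, b2, b3, b4, b5, b6 = coef
    return b1 + b2 * b_hand + b3 * p_hand + b4 * height + b5 * height ** 2 + b6 * dist


def strike_probability(z):
    """Logistic function: 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def even_odds_heights(coef, b_hand, p_hand, dist):
    """Heights where Z = 0, i.e. a ball and a strike are equally likely.

    Solves the quadratic in H; returns (low, high) or None if there is
    no real solution or no H^2 term.
    """
    b1, b2, b3, b4, b5, b6 = coef
    a, b = b5, b4
    c = b1 + b2 * b_hand + b3 * p_hand + b6 * dist
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return tuple(sorted(((-b - root) / (2 * a), (-b + root) / (2 * a))))


# Made-up coefficients, NOT fitted to any data: Z = -12 + 10*H - 2*H^2.
COEF = (-12.0, 0.0, 0.0, 10.0, -2.0, 0.0)
edges = even_odds_heights(COEF, 0, 0, 0.0)
p_middle = strike_probability(linear_predictor(COEF, 0, 0, 2.5, 0.0))
```

With these invented numbers the 50% line sits at heights of 2 and 3 feet, and a pitch at 2.5 feet comes out as more likely a strike than a ball.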
by Trickman on Nov 7, 2009 2:01 PM EST
My understanding of the top and bottom of the Pitch F/X zone
is that it is created manually, out in a production truck, by one or two people who wait for the batter to set his front foot at each pitch. The upper and lower limits are not a fixed distance off the plate. And, they use the outfield camera to make those settings.
If true, any margin of human error assigned to the plate ump concerning high/low pitches, is conflated with the margin of human error assignable to humans sitting out in the parking lot. Even if the point of this thread is not to assign error, it still means that the data is evaluating the results of multiple humans, not just one.
(P.S. – it is also my understanding that Pitch F/X only tracks the ball to the front of the plate. That would mean that those little targets in the charts above, along with the targets presented to the viewing audience by Fox, MLB Gamecast, etc., are not seeing anything the ball might do as it travels across the plate to the catcher’s mitt. By definition, then, Pitch F/X could not properly identify and locate pitches such as “backdoor sliders”.)
I wonder what John Lackey must think about how long Sosh left Juan Rivera in the playoff lineup?
by Stirrups on Nov 6, 2009 1:16 PM EST
Not exactly sure on the truck evaluation -- maybe someone else can expand.
The part about the ball being evaluated from the front part of the plate is true. I am actually going to eventually get to sub-classing out the pitch types. It should be written after I do the pitcher handedness analysis.
by Jeff Zimmerman (TucsonRoyal) on Nov 6, 2009 1:40 PM EST
When I get back to the office, I will post some links as references for the truck manipulation.
Had I owned the Pittsburgh Pirates, I could have saved America.
by Stirrups on Nov 6, 2009 1:55 PM EST
A pair of sources for the "truck evaluation"
From MLB.com
“The center-field camera is used for two purposes, most important for “sizing” the batter. For the software to find the ball (or “blob” to the engineers who plot the application), there needs to be a different plane of location for Matt Holliday than for Kazuo Matsui, who is smaller in stature than Holliday. Then the crew in the truck sizes each player during batting practice, so that during the game each tracking plane is pre-set; it is remembered for each subsequent at-bat by that player."
(my screen grab of the above is not displaying clearly)
And from Sportvision.com

by Stirrups on Nov 6, 2009 6:05 PM EST
You're confusing GameDay and Pitch f/x
Pitch f/x is the raw data of the location, speed, etc. of each pitch. Gameday is MLB.com’s presentation of it. The strike zone that you are talking about is the one used for Gameday, and it is entered by the Pitch f/x operators. Jeff is using the raw data to create his own strike zone, based on which pitches umpires are calling strikes.
by vivaelpujols on Nov 7, 2009 2:12 AM EST
I am working towards that.
For starters I am focusing on the black rectangle in Jeff’s charts. It’s that black rectangle that bothers me. Jeff writes “The Gameday zone shown is 1.5 feet off the ground to 3.5 feet tall and extends 1 foot in each direction from the center of the plate”. There is Gameday data represented in the charts, and that data is being used as a visual measurement in contrast to what the umpires actually call. I contend that Gameday data contains subjective information, and using subjective information to measure subjective information is inappropriate.
I also take issue with a fixed upper and lower limit to the zone, since those are functions of the physical stature and batting stance of the players themselves, which vary from player to player. There’s no such thing as a fixed bottom of the strike zone, nor a fixed top.
Finally, I think that the information that the Pitch F/X data stops in front of the strike zone itself is important, and needs to be explored.
by Stirrups on Nov 8, 2009 3:58 AM EST
The upper and lower limits of the zone are problematic.
And I haven’t figured out a good solution to that yet. The PITCHf/x operator (who doesn’t typically sit in a production truck as far as I know; at AT&T Park he’s in a booth in the players’ parking garage) does set the top and bottom of the zone by identifying the player’s belt and the hollow of his knee from the center field camera video as the batter prepares to swing. It used to be set pregame, but now it’s being done more in real time with each pitch. But the values set by the Pfx operators are not very consistent. I’ve found it’s more accurate to take a fixed percentage of the batter’s height. But the drawback of that method is that it doesn’t account for differences in stances or oddly proportioned batters.
The fact that the PITCHf/x tracker only uses trajectory data from a few feet after the ball leaves the pitcher’s hand until a few feet before it reaches the plate is not a problem, however, because the trajectory is constant acceleration, within measurement error (half inch). So we know the trajectory all the way to the catcher’s glove with high accuracy.
by Mike Fast on Nov 9, 2009 11:28 PM EST
I have read elsewhere (or prior on this site) that some "normalization" of the top/bottom was necessary.
This, of course, I don’t like. And I would prefer NOT to move the human definition from the umpire to some non-transparent humans hidden away in a truck.
The upper end of the zone is, by definition, above the belt. Specifically: “…the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants…”. As a computer technologist, I cannot help but believe that real-time computer visual recognition will eventually present a solution for this.
As for trajectory through the zone, I fully accept that the laws of physics being what they are, it is absolutely possible to project the path through the zone (although it must be assumed where the catcher receives the ball, because catchers reach forward (or not) to intercept it). And, if my math is right, even at 60 frames per second the cameras are not necessarily going to see the ball in the zone since the ball travels further than the depth of the plate between frames. So visual aids might not present the complete flight of the ball through the zone. The question I have would be: are the marks in your charts the point in flight where the catcher received the pitch, or the point in front of the plate where the Pitch F/X cameras stopped tracking?
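The frame-rate arithmetic can be checked with a few lines of Python. This is only a sketch: the 90 mph pitch speed is an assumed round number, the 60 frames per second is the figure quoted above, and home plate is taken as 17 inches from its front edge to its back point.

```python
MPH_TO_FEET_PER_SECOND = 5280 / 3600   # feet per mile / seconds per hour
PLATE_DEPTH_FT = 17 / 12               # front edge to back point, in feet


def feet_per_frame(speed_mph, frames_per_second):
    """Distance the ball covers between two consecutive camera frames."""
    return speed_mph * MPH_TO_FEET_PER_SECOND / frames_per_second


# Assumed 90 mph pitch filmed at 60 frames per second.
travel = feet_per_frame(90, 60)
ball_can_skip_plate = travel > PLATE_DEPTH_FT
```

Under those assumptions the ball moves about 2.2 feet between frames, which is more than the roughly 1.4 feet of plate depth.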
by Stirrups on Nov 10, 2009 12:15 AM EST
Apologies.
My Dell laptop has a hardware problem with the trackpoint, and clicks of its own accord. This posted prior to my chance to complete my editing. Please interpret to the best of your abilities.
by Stirrups on Nov 10, 2009 12:18 AM EST
Stirrups, you might be interested in the discussion of this thread going on at the Book blog
I gave some of my thoughts there about problems with identifying the top and bottom of the zone.
There is also this older thread at the Book blog:
The PITCHf/x operator identifies the batter’s belt, and the system sets the top of the strike zone at four inches above that point. (See the image linked in Post #38 in the second thread above.)
You are correct that we don’t know the exact beginning or end point of the pitch trajectory. But we can project any point in between pretty accurately. The strike zone location data reported by MLB’s Gameday application is at the front plane of home plate; however, it is possible for an analyst to determine the strike zone location data for a pitch at any point around the strike zone that they wish by simply applying the equations of motion.
Most or all of the strike zone location charts that you will see around the web use the default front-of-plate locations. This gives the best credit to low pitches, which may have dropped out of the zone at points farther back on the plate. Pitches don’t move much side-to-side while they are crossing the plate, so that turns out to not be a big deal.
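As a rough illustration of "applying the equations of motion", here is a minimal Python sketch under the constant-acceleration assumption described earlier in this thread. The argument names are hypothetical: `dist` is distance from the plate measured toward the pitcher, `side` is horizontal offset and `height` is height off the ground, all in feet and seconds. The numbers in the example are invented, not real pitch data.

```python
import math


def time_to_plane(dist0, v_dist0, a_dist, dist_target):
    """Smallest positive t with dist0 + v_dist0*t + 0.5*a_dist*t^2 = dist_target."""
    c = dist0 - dist_target
    if a_dist == 0:
        if v_dist0 == 0:
            raise ValueError("ball is not moving toward the plane")
        candidates = [-c / v_dist0]
    else:
        disc = v_dist0 ** 2 - 2 * a_dist * c
        if disc < 0:
            raise ValueError("ball never reaches the plane")
        root = math.sqrt(disc)
        candidates = [(-v_dist0 - root) / a_dist, (-v_dist0 + root) / a_dist]
    positive = [t for t in candidates if t > 0]
    if not positive:
        raise ValueError("ball never reaches the plane")
    return min(positive)


def position_at(t, p0, v0, a):
    """Constant-acceleration position along one axis after t seconds."""
    return p0 + v0 * t + 0.5 * a * t * t


# Invented values: released 50 ft away, moving 100 ft/s toward the plate.
t = time_to_plane(50.0, -100.0, 0.0, 0.0)
side = position_at(t, 0.0, 2.0, 4.0)
height = position_at(t, 6.0, -4.0, -32.0)

# Same release, but with 20 ft/s^2 of deceleration along the flight path.
t_drag = time_to_plane(50.0, -100.0, 20.0, 0.0)
```

Changing `dist_target` moves the plane from the front of the plate to any point farther back, which is how a low pitch can be checked for dropping out of the zone.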
by Mike Fast on Nov 10, 2009 9:17 AM EST
