How Much Do We Really Know About the Strike Zone?
I was originally going to title this article "Is Everything We Know About the Strike Zone a Lie?" I figured that was a nice juicy headline to draw in traffic and generate controversy. But then I took another look at the numbers, got a whole lot more confused about what they mean, and decided to scale back my Drudginess.
One of the major assumptions we make about the Pitch F/X dataset is that we have a pretty good idea of what the strike zone is. It's been studied a bunch of times, starting with John Walsh, and continuing with our own Jeff Zimmerman, each time refining the answer a bit. But we assume that the strike zone we use is consistent across the entire data set. Yes, we realize that the top and bottom of the strike zone vary from at-bat to at-bat, so in general we use the median value for each batter. And we know that the boundaries of the strike zone are somewhat flexible. Walsh determined his strike zone at the point where 50% of the pitches are called strikes. Jeff used 85% strikes as his boundary value. And of course the zones differ according to batter handedness. But we've assumed that the strike zone is relatively consistent from park to park. That assumption appears to be pretty far from the truth.
My research into park strike zones came about while trying to repeat this study on catcher framing from a few years back. My major issue at the time was the magnitude of the effect - the difference between the best and worst catchers was 25 wins. When I tried again with more recent data, the initial effect was even bigger at 5 runs per game - or roughly 60 wins per season. That completely failed the smell test. I'm pretty confident in my methodology, despite not yet adjusting for umpire. And similar results were found by Bill Letson, whose approach was a lot more rigorous than mine.
So I started thinking about what else could cause such a discrepancy, and decided to look into park factors for strike calls. Now you wouldn't expect there to be much in the way of park factors. I believe the semi-official stance on how the Pitch F/X system works is that the pitch as it crosses the plate is correct to within half an inch. Since the strike zone relative to home plate is the same in each park, the amount of "missed" pitches should be roughly the same in all parks, right? Of course you could argue that perhaps umpires make a difference, but they should be allocated fairly randomly across parks. But perhaps I'm getting a little ahead of myself. Let's look at the data for 2009.
| Year | Team | LG | Called H (home park) | Called A (home park) | Called H (away park) | Called A (away park) | Fstrikes H (home park) | Fstrikes A (home park) | Fballs H (home park) | Fballs A (home park) | Fstrikes H (away park) | Fstrikes A (away park) | Fballs H (away park) | Fballs A (away park) | PF | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009 | ANA | AL | 6699 | 6637 | 6992 | 6513 | 603 | 691 | 225 | 246 | 693 | 525 | 255 | 228 | 113 | |
| 2009 | ARI | NL | 6193 | 6063 | 6275 | 5866 | 617 | 498 | 197 | 195 | 577 | 550 | 218 | 204 | 101 | |
| 2009 | ATL | NL | 5959 | 5781 | 6113 | 6085 | 684 | 508 | 132 | 186 | 558 | 574 | 243 | 153 | 121 | |
| 2009 | BAL | AL | 6213 | 5967 | 6161 | 6293 | 655 | 517 | 199 | 199 | 477 | 525 | 237 | 226 | 142 | |
| 2009 | BOS | AL | 6113 | 6604 | 7332 | 6756 | 538 | 554 | 261 | 239 | 695 | 534 | 279 | 321 | 105 | |
| 2009 | CHA | AL | 6178 | 5806 | 6073 | 5781 | 514 | 413 | 220 | 183 | 570 | 502 | 230 | 195 | 81 | |
| 2009 | CHN | NL | 6287 | 5855 | 6034 | 5952 | 478 | 473 | 216 | 203 | 550 | 486 | 216 | 200 | 86 | |
| 2009 | CIN | NL | 6104 | 5719 | 5920 | 6230 | 608 | 436 | 139 | 167 | 495 | 554 | 207 | 182 | 113 | |
| 2009 | CLE | AL | 6347 | 6258 | 6695 | 6090 | 465 | 521 | 275 | 223 | 663 | 437 | 259 | 268 | 89 | |
| 2009 | COL | NL | 6025 | 6528 | 6689 | 5931 | 497 | 574 | 213 | 233 | 638 | 492 | 214 | 201 | 89 | |
| 2009 | DET | AL | 6264 | 5871 | 6025 | 6272 | 436 | 479 | 259 | 194 | 483 | 435 | 241 | 262 | 112 | |
| 2009 | FLO | NL | 6685 | 6532 | 6496 | 6044 | 581 | 544 | 261 | 218 | 647 | 508 | 181 | 219 | 83 | |
| 2009 | HOU | NL | 6165 | 5530 | 5799 | 5987 | 458 | 415 | 246 | 205 | 513 | 466 | 195 | 218 | 76 | |
| 2009 | KCA | AL | 6442 | 5882 | 5739 | 6233 | 593 | 506 | 210 | 170 | 467 | 460 | 207 | 220 | 135 | |
| 2009 | LAN | NL | 6291 | 6428 | 7318 | 6473 | 581 | 593 | 239 | 288 | 653 | 572 | 242 | 196 | 90 | |
| 2009 | MIL | NL | 6482 | 6431 | 6472 | 6177 | 616 | 628 | 244 | 253 | 622 | 472 | 232 | 227 | 115 | |
| 2009 | MIN | AL | 6157 | 6423 | 6332 | 5702 | 598 | 446 | 248 | 325 | 505 | 483 | 280 | 196 | 89 | |
| 2009 | NYA | AL | 7303 | 7396 | 7097 | 6487 | 599 | 698 | 241 | 271 | 641 | 510 | 244 | 219 | 105 | |
| 2009 | NYN | NL | 6188 | 5995 | 6181 | 5909 | 494 | 555 | 211 | 166 | 592 | 425 | 188 | 235 | 112 | |
| 2009 | OAK | AL | 6096 | 5881 | 6153 | 6278 | 541 | 472 | 253 | 241 | 471 | 461 | 239 | 237 | 116 | |
| 2009 | PHI | NL | 7034 | 7024 | 7197 | 6646 | 695 | 505 | 185 | 233 | 529 | 586 | 292 | 227 | 126 | |
| 2009 | PIT | NL | 5763 | 5764 | 5906 | 5600 | 416 | 488 | 257 | 172 | 512 | 369 | 183 | 292 | 117 | |
| 2009 | SDN | NL | 6285 | 5913 | 5963 | 5947 | 476 | 377 | 281 | 287 | 503 | 500 | 219 | 213 | 50 | |
| 2009 | SEA | AL | 6047 | 5593 | 5574 | 5800 | 371 | 412 | 250 | 194 | 418 | 390 | 204 | 238 | 92 | |
| 2009 | SFN | NL | 6149 | 4972 | 4870 | 5968 | 579 | 457 | 200 | 141 | 369 | 526 | 147 | 221 | 127 | |
| 2009 | SLN | NL | 5967 | 5456 | 5592 | 5598 | 618 | 403 | 176 | 220 | 419 | 540 | 215 | 157 | 102 | |
| 2009 | TBA | AL | 5992 | 6227 | 6525 | 5921 | 518 | 472 | 228 | 238 | 514 | 538 | 241 | 238 | 93 | |
| 2009 | TEX | AL | 6404 | 5657 | 5649 | 5900 | 538 | 366 | 257 | 300 | 392 | 556 | 217 | 198 | 62 | |
| 2009 | TOR | AL | 5967 | 6045 | 6356 | 6075 | 486 | 440 | 238 | 239 | 567 | 523 | 217 | 233 | 74 | |
| 2009 | WAS | NL | 6108 | 6287 | 6379 | 6011 | 456 | 512 | 190 | 216 | 606 | 483 | 209 | 221 | 86 | |
FStrikes are called strikes that were outside the zone (for Fake Strikes), while FBalls were balls that were inside the zone. H and A are home and away, and PF is park factor. A low number is more pitcher friendly (more strikes than expected), while a high number is better for batters (more balls than expected).
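To make the bookkeeping concrete, here is a minimal Python sketch of how a called pitch could be sorted into those buckets. It is an illustration under stated assumptions, not the code behind the table: the field names (`px`, `pz`, `sz_bot`, `sz_top`) are illustrative, and the horizontal boundary is a hypothetical half-plate width rather than the Walsh-style zone, which is wider and depends on batter handedness.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical horizontal boundary, half of the 17-inch plate, in feet.
# A Walsh-style zone would be wider and differ for left- and right-handed batters.
ZONE_HALF_WIDTH_FT = 17 / 2 / 12


@dataclass
class Pitch:
    px: float            # horizontal location at the front of the plate (feet)
    pz: float            # height at the front of the plate (feet)
    sz_bot: float        # bottom of this batter's zone (feet)
    sz_top: float        # top of this batter's zone (feet)
    called_strike: bool  # True for a called strike, False for a called ball


def in_zone(p: Pitch) -> bool:
    """True when the recorded location falls inside the assumed zone."""
    return abs(p.px) <= ZONE_HALF_WIDTH_FT and p.sz_bot <= p.pz <= p.sz_top


def classify(p: Pitch) -> str:
    """Label a called pitch as a fake strike, a fake ball, or a correct call."""
    if p.called_strike and not in_zone(p):
        return "fstrike"
    if not p.called_strike and in_zone(p):
        return "fball"
    return "correct"


def tally(pitches):
    """Count each label over a collection of called pitches."""
    counts = {"fstrike": 0, "fball": 0, "correct": 0}
    for p in pitches:
        counts[classify(p)] += 1
    return counts
```

Running `tally` separately on the pitches from a team's home park and from its road parks would give counts of the kind shown in the Fstrikes and Fballs columns.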
You can see a huge spread in how likely a pitch was to be mis-called based on park in 2009. 2007 and 2008 were no better.
Here are the unweighted three year park effects for each team's stadium:
| Team | 2007 | 2008 | 2009 | Avg |
|---|---|---|---|---|
| ANA | 107 | 116 | 113 | 112 |
| ARI | 79 | 115 | 101 | 98 |
| ATL | 129 | 112 | 121 | 121 |
| BAL | | 119 | 142 | 131 |
| BOS | 66 | 108 | 105 | 93 |
| CHA | 59 | 83 | 81 | 74 |
| CHN | 122 | 109 | 86 | 106 |
| CIN | 67 | 106 | 113 | 95 |
| CLE | 70 | 82 | 89 | 80 |
| COL | 77 | 111 | 89 | 92 |
| DET | 195 | 130 | 112 | 146 |
| FLO | 30 | 110 | 83 | 74 |
| HOU | 175 | 103 | 76 | 118 |
| KCA | 116 | 108 | 135 | 120 |
| LAN | 69 | 90 | 90 | 83 |
| MIL | 94 | 95 | 115 | 101 |
| MIN | 65 | 76 | 89 | 77 |
| NYA | | | 105 | 105 |
| NYN | | | 112 | 112 |
| OAK | 101 | 107 | 116 | 108 |
| PHI | 97 | 99 | 126 | 107 |
| PIT | 141 | 149 | 117 | 136 |
| SDN | 101 | 92 | 50 | 81 |
| SEA | 159 | 68 | 92 | 106 |
| SFN | 105 | 76 | 127 | 103 |
| SLN | 104 | 96 | 102 | 101 |
| TBA | 199 | 122 | 93 | 138 |
| TEX | 63 | 72 | 62 | 66 |
| TOR | 120 | 116 | 74 | 103 |
| WAS | | 89 | 86 | 88 |
The correlation between 2008 and 2009 is a fairly robust 0.36, which suggests there's at least some actual phenomenon here.
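For anyone who wants to check that figure, the sketch below computes a plain Pearson correlation on the rounded 2008 and 2009 park factors from the table. It assumes the 28 teams with a value in both years (NYA and NYN only have 2009), and it reads the BAL and WAS rows as 2008 and 2009 values, which matches their 2009 park factors in the first table. Because the inputs are rounded, the result should land close to the quoted number rather than match it exactly.

```python
from math import sqrt

# (2008, 2009) park factors copied from the three-year table.
# NYA and NYN are left out because they only have a 2009 value.
PARK_FACTORS = {
    "ANA": (116, 113), "ARI": (115, 101), "ATL": (112, 121), "BAL": (119, 142),
    "BOS": (108, 105), "CHA": (83, 81),   "CHN": (109, 86),  "CIN": (106, 113),
    "CLE": (82, 89),   "COL": (111, 89),  "DET": (130, 112), "FLO": (110, 83),
    "HOU": (103, 76),  "KCA": (108, 135), "LAN": (90, 90),   "MIL": (95, 115),
    "MIN": (76, 89),   "OAK": (107, 116), "PHI": (99, 126),  "PIT": (149, 117),
    "SDN": (92, 50),   "SEA": (68, 92),   "SFN": (76, 127),  "SLN": (96, 102),
    "TBA": (122, 93),  "TEX": (72, 62),   "TOR": (116, 74),  "WAS": (89, 86),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [pf_2008 for pf_2008, _ in PARK_FACTORS.values()]
ys = [pf_2009 for _, pf_2009 in PARK_FACTORS.values()]
r = pearson(xs, ys)
print(f"teams: {len(xs)}, r = {r:.2f}")
```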
What might cause such discrepancy from park to park? You wouldn't imagine the strike zone would vary based on the park. It's a fairly static thing and not subject to the placement of the outfield fences, or the size of outfield, or even the length of the infield grass.
So if the cause is unlikely to be one of the normal factors that influence park effects, what might be some less obvious reasons for the difference?
According to Alan Nathan, the camera position differs from park to park, which might not be completely corrected for by Sportvision's software.
Maybe there's something to the hitter's background at certain parks that affects the umpire's ability to call balls and strikes. If that were the case, you'd think we'd have heard some complaints at some point.
It's also possible that the assignment of umpires doesn't wash out when looking at the results of a single season. Perhaps Texas sees more than its fair share of pitcher-friendly umpires. This option seems unlikely since we see similar results from season to season and it seems like any unintentional umpire scheduling bias wouldn't carry across multiple years.
There's a slim chance that certain pitchers were more likely to pitch at home than on the road and they could throw the numbers off. But that would be unlikely to occur for all teams, and, again, would likely be a single year effect. The same goes for catchers.
Unfortunately, since I don't have a better explanation, I'm tending to believe the first one - something is different about camera placement from park to park, and it's affecting how pitches are recorded.
It's just conjecture at this point and I'd love someone from Sportvision to tell me I'm wrong, but I'm a little nervous about the correctness of the Pitch FX coordinates. If we can't count on those from park-to-park, then a lot of studies need to be questioned.
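To get a feel for how much a calibration problem could matter, here is a toy Python simulation. Everything in it is an assumption made for illustration: pitches are spread uniformly across the horizontal axis, the umpire calls the true location perfectly, the zone is a hypothetical half-plate width, and the only error is a constant horizontal offset in the recorded coordinate. It says nothing about what the real system does; it only shows that a small, consistent offset is enough to generate apparent miscalls.

```python
import random

# ASSUMPTION: hypothetical horizontal boundary, half of the 17-inch plate, in feet.
ZONE_HALF_WIDTH_FT = 17 / 2 / 12


def simulate(offset_in, n=20000, seed=1):
    """Count apparent fake strikes and fake balls for a given recording offset.

    offset_in is a constant horizontal error, in inches, added to every
    recorded location. Returns (fake_strikes, fake_balls).
    """
    rng = random.Random(seed)
    offset_ft = offset_in / 12
    fake_strikes = 0
    fake_balls = 0
    for _ in range(n):
        true_px = rng.uniform(-1.5, 1.5)  # true horizontal location (feet)
        # Idealized umpire: the call always matches the true location.
        called_strike = abs(true_px) <= ZONE_HALF_WIDTH_FT
        # The tracking system records the pitch shifted by the offset.
        recorded_in_zone = abs(true_px + offset_ft) <= ZONE_HALF_WIDTH_FT
        if called_strike and not recorded_in_zone:
            fake_strikes += 1
        elif not called_strike and recorded_in_zone:
            fake_balls += 1
    return fake_strikes, fake_balls


for offset in (0.0, 0.5, 1.0, 2.0):
    print(offset, simulate(offset))
```

With no offset the toy umpire is never "wrong", and the apparent miscalls grow as the offset grows. A real check would need the actual pitch locations and calls for each park.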
We've known for a while that Pitch FX needed to be corrected for park. It's one of the things Josh Kalk was working on before being hired by Tampa Bay. But my understanding was that was mainly for the pitcher's side of things (release point, etc.) and that the values at the plate were correct to within a fraction of an inch.
I'm less convinced that's true now. Admittedly, I don't have a whole lot of evidence - more of a gut feel that something is not right with these results.
It's clearly of great importance to the sabermetric community to be able to trust the Pitch FX numbers. And right now, my faith is a little bit shaken.
Comments
Better nightlife for the umpires in certain towns.
They want to get the game over with quicker in some cases.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Apr 19, 2010 11:45 AM EDT
+1 Retweet.
See Data Differently. Beyond The Boxscore. | Follow me @justinbopp
Pitch F/X is accurate within an inch at the plate...
…assuming proper calibration. If the system is not properly calibrated, the errors will be greater (and more to the point, they will be systemic – that is, the same errors are likely to repeat in the same parks).
That's kind of what I was getting at.
If the system is mis-calibrated, then we may need to question a lot of the results we’ve found to this point.
by Dan Turkenkopf on Apr 19, 2010 1:53 PM EDT
I think Ike Hall did a good job a few years ago of showing that...
…in fact the system is mis-calibrated, in such a way as to cause “park effects.” That really should be the default assumption of anyone doing Pitch F/X studies.
Colin, that's true, but...
location at the plate is one of the things that is least susceptible to errors and least sensitive to them when they happen. Ike’s data found that of the 28 parks from 2007, only Dolphin Stadium had any noticeable shift in the strike zone.
I have not looked at this systematically, but I’ve looked for park shifts in strike zones on a few specific occasions and have not found anything out of the ordinary by more than an inch (which is what it would have to be before we could start to notice it).
By no means does that indicate that I don’t think there could be real problems with the zone in some parks at some times. But so far I haven’t generally seen that to be the case.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
procedures
Dan…it would be helpful to know in detail how you are defining Fake Strikes and Fake Balls. Are you considering both the horizontal (well defined) and vertical (less well defined) extent of the strike zone? What do you use for the horizontal extent? Is your strike zone 3-dimensional? Do you take into account the finite size of the ball? Of the mis-calls, how many are within one inch of the zone boundary?
I ask all these questions so that I can appreciate the extent of the problem.
I'm using John Walsh's zones scaled for the average top/bottom for each batter (over the course of a single season I think)
I’ll have to look the actual numbers up at home.
The strike zone is evaluated at a single point in time, as the ball crosses the front of the plate.
The points about the size of the ball and the closeness to the boundary are good ones. I’ll have to take a look at whether correcting for that makes a difference.
by Dan Turkenkopf on Apr 19, 2010 5:20 PM EDT
These are all good concerns...
…but they shouldn’t cause any concerns with a study of park bias. Even if there were errors in the ball-strike calls in PitchF/X due to a 2D rather than 3D conception of the strike zone, that should “wash out” over several seasons of data, which is not the behavior we’re seeing here.
I'm still advocating for an electronic call, for strikes.
If the pitch is in the strike zone, a vibrating buzzer goes off in the ump's pocket (and, perhaps, a little red light in both dugouts). The ump does his usual “strike call”. No buzz, it’s a ball, plus, the ump can concentrate on foul tips, catcher’s interference, checked swings, etc.
This would certainly reward batters who could practice viewing pitches that are “strikes” for their particular stance, and make a quicker decision (based on repetition and practice) when they are at the plate. No squeezing of pitches, no framing pitches, no catcher-size differential.
Catcher size may be an explanation for the ballpark “actual phenomenon”.
Blez: Most folks seem to believe that the big flaw with the 2010 Oakland A's will be the lack of any power.
Beane: They believe it because it's true.
by One won lost won on Apr 19, 2010 9:39 PM EDT
One question I had, Dan, sorry if the answer is obvious/irrelevant
But is there also a “park” factor for percentage of called strikes? That would have nothing to do with PITCHf/x if it were the case.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
Could it be light conditions?
As an amateur umpire I know that light conditions do affect the strike zone. There are more miscalled balls and strikes in twilight conditions than in full daylight or when the lights have full effect. The lighting systems in major league ball parks are excellent but the light level could be a factor.
the difference between the best and worst catchers was 25 wins.
That should read, runs, right?
by Andy Hellicksonstine on Apr 20, 2010 10:09 PM EDT
