How Much Do We Really Know About the Strike Zone?

I was originally going to title this article "Is Everything We Know About the Strike Zone a Lie?"  I figured that was a nice juicy headline to draw in traffic and generate controversy.  But then I re-looked at the numbers, got a whole lot more confused about what they mean and decided to scale back my Drudginess.

One of the major assumptions we make about the Pitch F/X dataset is that we have a pretty good idea of what the strike zone is. It's been studied a bunch of times, starting with John Walsh, and continuing with our own Jeff Zimmerman, each time refining the answer a bit. But we assume that the strike zone we use is consistent across the entire data set. Yes, we realize that the top and bottom of the strike zone vary from at-bat to at-bat, so in general we use the median value for each batter. And we know that the boundaries of the strike zone are somewhat flexible. Walsh determined his strike zone at the point where 50% of the pitches are called strikes. Jeff used 85% strikes as his boundary value. And of course the zones differ according to batter handedness. But we've assumed that the strike zone is relatively consistent from park to park. That assumption appears to be pretty far from the truth.
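
To make the 50% versus 85% distinction concrete, here is a minimal Python sketch that finds where the called-strike rate crosses a chosen threshold along one edge of the zone. The binned rates are invented for illustration, distances are assumed to be in feet, and `boundary_at` is a hypothetical helper rather than Walsh's or Zimmerman's actual method.

```python
def boundary_at(threshold, positions, strike_rates):
    """Return the position where the called-strike rate falls to `threshold`.

    positions: distances from the center of the plate (feet), increasing.
    strike_rates: fraction of called pitches ruled strikes at each position,
    assumed to decrease as the pitch moves away from the center.
    """
    points = list(zip(positions, strike_rates))
    for (x0, r0), (x1, r1) in zip(points, points[1:]):
        if r0 >= threshold >= r1 and r0 != r1:
            # Linear interpolation between the two neighboring bins.
            return x0 + (r0 - threshold) * (x1 - x0) / (r0 - r1)
    return None  # the rate never crosses the threshold in this range


# Hypothetical called-strike rates, NOT real Pitch F/X data.
positions = [0.50, 0.75, 1.00, 1.25, 1.50]
strike_rates = [0.98, 0.90, 0.60, 0.20, 0.02]

edge_50 = boundary_at(0.50, positions, strike_rates)
edge_85 = boundary_at(0.85, positions, strike_rates)
print(f"50% boundary: {edge_50:.3f} ft, 85% boundary: {edge_85:.3f} ft")
```

With these made-up rates the 85% boundary sits closer to the center of the plate than the 50% one, which is why the two definitions produce zones of different sizes.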

My research into park strike zones came about while trying to repeat this study on catcher framing from a few years back. My major issue at the time was the magnitude of the effect - the difference between the best and worst catchers was 25 wins. When I tried again with more recent data, the initial effect was even bigger at 5 runs per game - or roughly 60 wins per season. That completely failed the smell test. I'm pretty confident in my methodology, despite not yet adjusting for umpire. And similar results were found by Bill Letson, whose approach was a lot more rigorous than mine.

So I started thinking about what else could cause such a discrepancy, and decided to look into park factors for strike calls. Now you wouldn't expect there to be much in the way of park factors. I believe the semi-official stance on how the Pitch F/X system works is that the pitch as it crosses the plate is correct to within half an inch. Since the strike zone relative to home plate is the same in each park, the number of "missed" pitches should be roughly the same in all parks, right? Of course you could argue that perhaps umpires make a difference, but they should be allocated fairly randomly across parks. But perhaps I'm getting a little ahead of myself. Let's look at the data for 2009.

 

 

Year Team LG HomePark-Called-H HomePark-Called-A AwayPark-Called-H AwayPark-Called-A HomePark-Fstrikes-H HomePark-Fstrikes-A HomePark-Fballs-H HomePark-Fballs-A AwayPark-Fstrikes-H AwayPark-Fstrikes-A AwayPark-Fballs-H AwayPark-Fballs-A PF
2009 ANA AL 6699 6637 6992 6513 603 691 225 246 693 525 255 228 113
2009 ARI NL 6193 6063 6275 5866 617 498 197 195 577 550 218 204 101
2009 ATL NL 5959 5781 6113 6085 684 508 132 186 558 574 243 153 121
2009 BAL AL 6213 5967 6161 6293 655 517 199 199 477 525 237 226 142
2009 BOS AL 6113 6604 7332 6756 538 554 261 239 695 534 279 321 105
2009 CHA AL 6178 5806 6073 5781 514 413 220 183 570 502 230 195 81
2009 CHN NL 6287 5855 6034 5952 478 473 216 203 550 486 216 200 86
2009 CIN NL 6104 5719 5920 6230 608 436 139 167 495 554 207 182 113
2009 CLE AL 6347 6258 6695 6090 465 521 275 223 663 437 259 268 89
2009 COL NL 6025 6528 6689 5931 497 574 213 233 638 492 214 201 89
2009 DET AL 6264 5871 6025 6272 436 479 259 194 483 435 241 262 112
2009 FLO NL 6685 6532 6496 6044 581 544 261 218 647 508 181 219 83
2009 HOU NL 6165 5530 5799 5987 458 415 246 205 513 466 195 218 76
2009 KCA AL 6442 5882 5739 6233 593 506 210 170 467 460 207 220 135
2009 LAN NL 6291 6428 7318 6473 581 593 239 288 653 572 242 196 90
2009 MIL NL 6482 6431 6472 6177 616 628 244 253 622 472 232 227 115
2009 MIN AL 6157 6423 6332 5702 598 446 248 325 505 483 280 196 89
2009 NYA AL 7303 7396 7097 6487 599 698 241 271 641 510 244 219 105
2009 NYN NL 6188 5995 6181 5909 494 555 211 166 592 425 188 235 112
2009 OAK AL 6096 5881 6153 6278 541 472 253 241 471 461 239 237 116
2009 PHI NL 7034 7024 7197 6646 695 505 185 233 529 586 292 227 126
2009 PIT NL 5763 5764 5906 5600 416 488 257 172 512 369 183 292 117
2009 SDN NL 6285 5913 5963 5947 476 377 281 287 503 500 219 213 50
2009 SEA AL 6047 5593 5574 5800 371 412 250 194 418 390 204 238 92
2009 SFN NL 6149 4972 4870 5968 579 457 200 141 369 526 147 221 127
2009 SLN NL 5967 5456 5592 5598 618 403 176 220 419 540 215 157 102
2009 TBA AL 5992 6227 6525 5921 518 472 228 238 514 538 241 238 93
2009 TEX AL 6404 5657 5649 5900 538 366 257 300 392 556 217 198 62
2009 TOR AL 5967 6045 6356 6075 486 440 238 239 567 523 217 233 74
2009 WAS NL 6108 6287 6379 6011 456 512 190 216 606 483 209 221 86

 

FStrikes are called strikes that were outside the zone (for Fake Strikes), while FBalls were balls that were inside the zone.  H and A are home and away, and PF is park factor.  A low number is more pitcher friendly (more strikes than expected), while a high number is better for batters (more balls than expected).
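
The definitions above can be sketched in Python as follows. The rectangular zone, its edge values, and the names `Zone` and `classify_call` are assumptions for illustration only; the zone actually used for the table is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A rectangular strike zone at the front of the plate, in feet."""
    left: float
    right: float
    bottom: float
    top: float

    def contains(self, px: float, pz: float) -> bool:
        return self.left <= px <= self.right and self.bottom <= pz <= self.top


def classify_call(zone: Zone, px: float, pz: float, called_strike: bool) -> str:
    """Label a called pitch using the FStrike / FBall definitions."""
    in_zone = zone.contains(px, pz)
    if called_strike and not in_zone:
        return "FStrike"  # called strike outside the zone
    if not called_strike and in_zone:
        return "FBall"    # called ball inside the zone
    return "correct"


# Hypothetical zone edges; top and bottom would vary by batter.
zone = Zone(left=-1.0, right=1.0, bottom=1.6, top=3.4)

print(classify_call(zone, px=1.3, pz=2.5, called_strike=True))   # FStrike
print(classify_call(zone, px=0.0, pz=2.5, called_strike=False))  # FBall
print(classify_call(zone, px=0.0, pz=2.5, called_strike=True))   # correct
```

Counting these labels separately for home and away teams, in home and away parks, would give the kind of totals shown in the table.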

You can see a huge spread in how likely a pitch was to be mis-called based on park in 2009.  2007 and 2008 were no better.

Here are the unweighted three-year park effects for each team's stadium:

 

Team 2007 2008 2009 Avg
ANA 107 116 113 112
ARI 79 115 101 98
ATL 129 112 121 121
BAL - 119 142 131
BOS 66 108 105 93
CHA 59 83 81 74
CHN 122 109 86 106
CIN 67 106 113 95
CLE 70 82 89 80
COL 77 111 89 92
DET 195 130 112 146
FLO 30 110 83 74
HOU 175 103 76 118
KCA 116 108 135 120
LAN 69 90 90 83
MIL 94 95 115 101
MIN 65 76 89 77
NYA - - 105 105
NYN - - 112 112
OAK 101 107 116 108
PHI 97 99 126 107
PIT 141 149 117 136
SDN 101 92 50 81
SEA 159 68 92 106
SFN 105 76 127 103
SLN 104 96 102 101
TBA 199 122 93 138
TEX 63 72 62 66
TOR 120 116 74 103
WAS - 89 86 88
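
As a check on the Avg column, an unweighted average simply takes the mean of whichever seasons a park has. A minimal Python sketch using three rows from the table, with `None` standing in for a blank season (the `unweighted_avg` name is just for illustration):

```python
def unweighted_avg(values):
    """Mean of the seasons that have a value; blanks (None) are skipped."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# (2007, 2008, 2009) park factors copied from the table above.
parks = {
    "ANA": (107, 116, 113),
    "BAL": (None, 119, 142),
    "NYA": (None, None, 105),
}

for team, seasons in parks.items():
    print(team, unweighted_avg(seasons))
```

These come out to 112.0, 130.5 and 105.0, which the table shows as 112, 131 and 105.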

 

The correlation between 2008 and 2009 is a fairly robust 0.36, which suggests there's at least some actual phenomenon here.
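
For readers who want to reproduce this kind of check, here is a minimal Python sketch of a Pearson correlation over the parks that have both a 2008 and a 2009 value in the table above. The `pearson` helper is written out by hand for illustration; the article does not say exactly which parks or what rounding went into the quoted figure, so treat the result as approximate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# team: (2008, 2009) park factors from the table; NYA and NYN are
# left out because they have no 2008 value.
park_factors = {
    "ANA": (116, 113), "ARI": (115, 101), "ATL": (112, 121), "BAL": (119, 142),
    "BOS": (108, 105), "CHA": (83, 81), "CHN": (109, 86), "CIN": (106, 113),
    "CLE": (82, 89), "COL": (111, 89), "DET": (130, 112), "FLO": (110, 83),
    "HOU": (103, 76), "KCA": (108, 135), "LAN": (90, 90), "MIL": (95, 115),
    "MIN": (76, 89), "OAK": (107, 116), "PHI": (99, 126), "PIT": (149, 117),
    "SDN": (92, 50), "SEA": (68, 92), "SFN": (76, 127), "SLN": (96, 102),
    "TBA": (122, 93), "TEX": (72, 62), "TOR": (116, 74), "WAS": (89, 86),
}

pf_2008 = [v[0] for v in park_factors.values()]
pf_2009 = [v[1] for v in park_factors.values()]
r = pearson(pf_2008, pf_2009)
print(f"{len(park_factors)} parks, r = {r:.2f}")
```

Run on the rounded values in the table, this should land close to the 0.36 quoted above.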

What might cause such discrepancy from park to park? You wouldn't imagine the strike zone would vary based on the park. It's a fairly static thing and not subject to the placement of the outfield fences, or the size of the outfield, or even the length of the infield grass.

So if the cause is unlikely to be one of the normal factors that influence park effects, what might be some less obvious reasons for the difference?

According to Alan Nathan, the camera position differs from park to park, which might not be completely corrected for by Sportvision's software.

Maybe there's something to the hitter's background at certain parks that affects the umpire's ability to call balls and strikes.  If that were the case, you'd think we'd have heard some complaints at some point.

It's also possible that the assignment of umpires doesn't wash out when looking at the results of a single season. Perhaps Texas sees more than its fair share of pitcher-friendly umpires. This option seems unlikely since we see similar results from season to season and it seems like any unintentional umpire scheduling bias wouldn't carry across multiple years.

There's a slim chance that certain pitchers were more likely to pitch at home than on the road and they could throw the numbers off.  But that would be unlikely to occur for all teams, and, again, would likely be a single year effect.  The same goes for catchers.

Unfortunately, since I don't have a better explanation, I'm tending to believe the first one - something is different about camera placement from park to park, and it's affecting how pitches are recorded.  
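
To see how a small calibration error could produce park-specific miscall counts, here is a rough Python simulation. Everything in it is an assumption made for illustration: the zone half-width, the uniform spread of pitches, an umpire who calls the true zone perfectly, and the size of the offset. It is not a model of the actual Pitch F/X system.

```python
import random


def count_apparent_miscalls(offset_ft, n_pitches=10_000, seed=0):
    """Count calls that look wrong when recorded locations are shifted.

    The umpire calls every pitch correctly against the true horizontal
    position, but the tracking system records position + offset_ft.
    """
    rng = random.Random(seed)
    half_width = 1.0  # hypothetical zone half-width in feet
    fake_strikes = fake_balls = 0
    for _ in range(n_pitches):
        true_x = rng.uniform(-2.0, 2.0)
        called_strike = abs(true_x) <= half_width
        recorded_in_zone = abs(true_x + offset_ft) <= half_width
        if called_strike and not recorded_in_zone:
            fake_strikes += 1
        elif not called_strike and recorded_in_zone:
            fake_balls += 1
    return fake_strikes, fake_balls


for inches in (0, 1, 2):
    fs, fb = count_apparent_miscalls(inches / 12)
    print(f"offset {inches} in: {fs} fake strikes, {fb} fake balls")
```

With no offset nothing looks miscalled; any nonzero offset produces apparent Fake Strikes on one edge and Fake Balls on the other, even though the umpire in this toy model never errs.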

It's just conjecture at this point and I'd love someone from Sportvision to tell me I'm wrong, but I'm a little nervous about the correctness of the Pitch FX coordinates.  If we can't count on those from park-to-park, then a lot of studies need to be questioned.

We've known for a while that Pitch FX needed to be corrected for park. It's one of the things Josh Kalk was working on before being hired by Tampa Bay. But my understanding was that was mainly for the pitcher's side of things (release point, etc.) and that the values at the plate were correct to within a fraction of an inch.

I'm less convinced that's true now. Admittedly, I don't have a whole lot of evidence - more of a gut feel that something is not right with these results.

It's clearly of great importance to the sabermetric community to be able to trust the Pitch FX numbers.  And right now, my faith is a little bit shaken.

Comments

Better nightlife for the umpires in certain towns.

They want to get the game over with quicker in some cases.

- .-. ..- ... - / - .... . / .--. .-. - .. . ... ...

by Jeff Zimmerman on Apr 19, 2010 11:45 AM EDT

Pitch F/X is accurate within an inch at the plate...

assuming proper calibration. If the system is not properly calibrated, the errors will be greater (and more to the point, they will be systemic – that is, the same errors are likely to repeat in the same parks).

by cwyers on Apr 19, 2010 1:07 PM EDT

That's kind of what I was getting at.

If the system is mis-calibrated, then we may need to question a lot of the results we’ve found to this point.

by Dan Turkenkopf on Apr 19, 2010 1:53 PM EDT

I think Ike Hall did a good job a few years ago of showing that...

…in fact the system is mis-calibrated, in such a way as to cause “park effects.” That really should be the default assumption of anyone doing Pitch F/X studies.

by cwyers on Apr 19, 2010 5:31 PM EDT

Colin, that's true, but...

location at the plate is one of the things that is least susceptible to errors and least sensitive to them when they happen. Ike’s data found that of the 28 parks from 2007, only Dolphin Stadium had any noticeable shift in the strike zone.

I have not looked at this systematically, but I’ve looked for park shifts in strike zones on a few specific occasions and have not found anything out of the ordinary by more than an inch (which is what it would have to be before we could start to notice it).

By no means does that indicate that I don’t think there could be real problems with the zone in some parks at some times. But so far I haven’t generally seen that to be the case.

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on Apr 20, 2010 12:05 AM EDT

procedures

Dan…it would be helpful to know in detail how you are defining Fake Strikes and Fake Balls. Are you considering both the horizontal (well defined) and vertical (less well defined) extent of the strike zone? What do you use for the horizontal extent? Is your strike zone 3-dimensional? Do you take into account the finite size of the ball? Of the mis-calls, how many are within one inch of the zone boundary?

I ask all these questions so that I can appreciate the extent of the problem.

by pobguy on Apr 19, 2010 5:14 PM EDT

I'm using John Walsh's zones scaled for the average top/bottom for each batter (over the course of a single season I think)

I’ll have to look the actual numbers up at home.

The strike zone is evaluated at a single point in time, at the front of the plate.

The points about the size of the ball and the closeness to the boundary are good ones. I’ll have to take a look at whether correcting for that makes a difference.

by Dan Turkenkopf on Apr 19, 2010 5:20 PM EDT

These are all good concerns...

…but they shouldn’t cause any concerns with a study of park bias. Even if there were errors in the ball-strike calls in PitchF/X due to a 2D rather than 3D conception of the strike zone, that should “wash out” over several seasons of data, which is not the behavior we’re seeing here.

by cwyers on Apr 19, 2010 5:30 PM EDT

I'm still advocating for an electronic call, for strikes.

If the pitch is in the strike zone, a vibrating buzzer goes off in the ump's pocket (and, perhaps, a little red light in both dugouts). The ump does his usual “strike call”. No buzz, it’s a ball, plus, the ump can concentrate on foul tips, catcher’s interference, checked swings, etc.

This would certainly reward batters who could practice viewing pitches that are “strikes” for their particular stance, and make a quicker decision (based on repetition and practice) when they are at the plate. No squeezing of pitches, no framing pitches, no catcher-size differential.

Catcher size may be an explanation for the ballpark “actual phenomenon”.

Blez: Most folks seem to believe that the big flaw with the 2010 Oakland A's will be the lack of any power.

Beane: They believe it because it's true.

by One won lost won on Apr 19, 2010 9:39 PM EDT

One question I had, Dan, sorry if the answer is obvious/irrelevant

But is there also a “park” factor for percentage of called strikes? That would have nothing to do with PITCHf/x if it were the case.

by Mike Fast on Apr 20, 2010 12:15 AM EDT

Could it be light conditions?

As an amateur umpire I know that light conditions do affect the strike zone. There are more miscalled balls and strikes in twilight conditions than in full daylight or when the lights have full effect. The lighting systems in major league ball parks are excellent but the light level could be a factor.

by RobertG on Apr 20, 2010 1:51 AM EDT
