Strike Zone a Marginal Component of Home Field Advantage
| Home Team | Runs/150 | Zone Adv. |
| Pitching | -0.400 | 2.43% |
| Batting | -0.281 | 1.97% |
| ∆ | -0.119 | 0.45% |
I assigned a run value to each "blown" call thrown during the regular season from 2008-2010, using Tom Tango's linear weights per ball-strike count. As noted in the table above, I find the home vs. away spread to be -0.119, or about +/- 0.06 runs, per 150 called pitches (equivalent to the average nine-inning game). This is exactly the figure Dan Turkenkopf offers in the comments of the previous post.
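For readers who want to see the bookkeeping, here is a minimal sketch of how a run value per 150 called pitches could be tallied. It is not the code behind the table: the field names (`in_zone`, `called_strike`, `home_batting`) are hypothetical, and the per-count weights are made-up placeholders, not Tom Tango's published values.

```python
from collections import defaultdict

# HYPOTHETICAL placeholder weights: run swing between a ball and a strike at
# a given (balls, strikes) count. Only three counts are shown; a real table
# would cover all twelve counts and use published linear weights.
STRIKE_VS_BALL_RUNS = {
    (0, 0): 0.08,
    (1, 0): 0.09,
    (3, 2): 0.60,
}

def runs_per_150_called(pitches):
    """Net run value of blown calls per 150 called pitches.

    Positive numbers favor the batting team. Results are split by whether
    the home team or the away team was batting.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in pitches:
        side = "home_batting" if p["home_batting"] else "away_batting"
        counts[side] += 1
        weight = STRIKE_VS_BALL_RUNS[(p["balls"], p["strikes"])]
        if p["in_zone"] and not p["called_strike"]:
            totals[side] += weight   # strike called a ball: helps the batter
        elif p["called_strike"] and not p["in_zone"]:
            totals[side] -= weight   # ball called a strike: hurts the batter
    return {side: 150 * totals[side] / counts[side] for side in counts}
```

The home vs. away spread is then the difference between the two per-150 figures, which is how the ∆ row of a table like the one above would be built.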
Considering that the run environment over this period is about 9.1 per game (according to Baseball-Reference.com), and considering that the typical MLB field effect is +/- 4%, home team bias in the strike zone only accounts for ~16% of the observed effect. This is pretty darn close to the findings of Mike Fast and Phil Birnbaum, also noted by Dan in the comments of the previous post (Phil takes on Scorecasting from a slightly different angle on his blog).
For those of you scoring at home, that advantage is equivalent to playing an entire season at home and barely winning one extra game.
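The arithmetic behind those two figures can be checked in a few lines. This is one reading of the numbers that reproduces them, not necessarily the exact calculation used: the run environment, the 4% field effect, and the 0.06-run edge come from the post, while the ten-runs-per-win conversion is a common rule of thumb assumed here.

```python
RUNS_PER_GAME = 9.1          # run environment, 2008-2010 (from the post)
HFA_SHARE = 0.04             # typical +/- 4% field effect (from the post)
ZONE_RUNS_PER_GAME = 0.06    # strike zone edge per game (from the post)
GAMES_PER_SEASON = 162       # full regular season
RUNS_PER_WIN = 10.0          # ASSUMPTION: rule of thumb, not stated in the post

# Size of the home field effect in runs per game: about 0.364
hfa_runs = RUNS_PER_GAME * HFA_SHARE

# Share of that effect explained by the strike zone: about 0.165
zone_share = ZONE_RUNS_PER_GAME / hfa_runs

# Strike zone edge over a full season played at home: about 9.7 runs,
# or a little under one win at ten runs per win
season_runs = ZONE_RUNS_PER_GAME * GAMES_PER_SEASON
season_wins = season_runs / RUNS_PER_WIN

print(f"zone share of home field advantage: {zone_share:.1%}")
print(f"extra wins over {GAMES_PER_SEASON} home games: {season_wins:.2f}")
```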
Subjectively, this is not my idea of a big deal. It's certainly not enough for the data to go "berserk," as the author claims in this interview with Wired. But there's another objective measurement that indicates how unimportant home field advantage is regarding the strike zone.
As I've noted several times in this series, I base much of my work on a statistical model predicting which variables have an effect on called pitch bias. This model not only tells us which variables make an impact, and in what direction, but how strong that impact is relative to the impact of the other variables in the model. So how strong a predictor is home field advantage on Zone Advantage?
Among 29 non-control, statistically significant variables, home field ranks dead last.
That's right, 29th out of 29. Relative to other variables, it's about 1/4 as important as pitcher-handedness, 1/5 as important as velocity, 1/8 as important as pitch type, 1/10 as important as the run expectancy of the base-out state, 1/57 as important as the run expectancy of the ball-strike state, and nearly 1/100th as important as the ratio of pitches thrown in or out of the legal zone over the course of an at bat.
Of course, at this point, we're beyond the scope of the book; this has nothing to do with home field advantage at the level of win-loss record. It does show, however, just how unimportant home field advantage is in terms of strike zone bias.
Rob Neyer wrote:
I still want to see the numbers. And it's not like nobody's ever studied this stuff before.
Mr. Neyer's right: it's not like nobody's done this before. Dan, Mike and Phil have crunched the numbers, and my findings here replicate and corroborate them. So there you have it.
PitchF/X data originate from Darrell Zimmerman's SQL-based PitchFX database, run expectancy data by Tom Tango at Inside the Book.
Previous episodes in the Benefit of the Doubt series:
Comments
Excellent
Dave Gershman - Beyond the Box Score / SPANdemonium / Royals Prospects / Athletics Nation / Penn League Report / Twitter: @Dave_Gershman
I love it when people can come at a problem with different tools and dig up similar answers.
My Michigan State (and Big Ten) Baseball Blog.
Like music? See what I'm listening to at my Last.fm account.
I do too
That’s how we know we’re onto something.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Love it
Great example of the practical application of the scientific method:
-Person A declares a hypothesis, runs a study, and publishes the results
-Persons B, C, and D replicate the study and determine that the findings need to be qualified
-Hypothesis refined, knowledge advanced
J-Doug
Here to play devil’s advocate, because I think this is great.
It seems as though there could be some bias in the data. If the home field advantage bias is apparent and known by the players, this could affect their swing rate or the home team pitcher’s approach to pitching.
For example, if a Road Team Pitcher knows he won’t get that call on the corners, perhaps he throws it down the middle more often. If that were true, whatever offensive production statistic you want to use would increase, but the increase would not be attributed to the bias in your model. And you would see far fewer opportunities for the umpire to blow any calls.
Similarly, a Road hitter who knows of the bias may swing more at pitches outside the zone, again reducing the possibility of a blown call against the Road Team.
The opposite would happen for the Home Team players, ultimately reaching some equilibrium (or near it). In that case, the model would fail to show the effect, no?
I agree
I would not be surprised to see this happening. We know that BABIPs are higher for the home team.
The data aren’t so cooperative, however. There’s very little split in the number of pitches thrown inside and outside the legal zone—the home team sees 45.4% and the away team 45.2%. I also find that the home team gets pounded inside more but doesn’t see more pitches off the outside corner.
Either way, the problem is this isn’t the finding that’s reported in Scorecasting. I have to investigate further, but it seems they claim specifically that there is a significant bias in home vs. away plate calls and that this is a significant component of the home field advantage. I could be interpreting this wrong, but that seems to be what they’re saying.
Personally, I think the bias may have a lot more to do with calls on balls in play, pickoffs and stolen bases. We know that foul calls are significantly biased in favor of the home team in college basketball. It would not surprise me if there were a significant effect going on here.
And, of course, on most balls in play, a bias in the call would have a far stronger effect than the bias on called pitches, and with a smaller sample size would be less likely to even out over the same period of time.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
I like the way you are thinking here...
…that there may be a feedback at work here, where players adjust their behavior because they are aware of the known bias and that leads to worse performance than we might naturally expect. But I think J-Doug handles the issue below.
This made me think of the larger issue of causal mechanisms: we know there is a home field advantage, but what we don’t know are the mechanisms through which that advantage comes to be. Even if the authors had been right that umpires exhibited a significant bias against the away team, the question would still remain: why? They assume it is because umpires don’t want to draw the ire of fans, but the only way to prove that would be through qualitative research, not quantitative. Right now, it’s just a hypothesis like any other.
Agreed
Although we could get closer. If there’s an attendance relationship, then that’s corroborating evidence. Perhaps we can get on-field decibel measurements for each pitch? I’d love to play with that data.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Right, except that...
a correlation with attendance could just as easily be evidence that players perform worse under hostile conditions. But if we’ve already debunked that and the only thing left is ump calls, I can see where this would help.
Motivation, in any case, is really hard to tease out—for me, all quant and no qual doesn’t really get you where you want to go.
THT Annual 2011
I know it’s a book and all, but this was covered very well by John Walsh in the 2011 THT Annual. John found that about one-third of the home field bias is due to the home plate umpire.
Dan Turkenkopf actually published on this well before John did
And he found about half the effect that John did, at 0.06 runs per game. Since J-Doug now found the same as Dan by an independent method, I wonder if John made a mistake in his calculations.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
OK
Mike, I’m not claiming a “first mover” advantage here. Just pointing out that someone else, a pretty well-respected researcher, has covered the subject too. I’m surprised that you’d wonder if John “made a mistake,” just because two (actually three) researchers came to different conclusions. Might we not want to study the subject a bit more and compare approaches?
Might we want to study a bit more? Absolutely.
That’s why I said I wonder, not that I know. But it’s not as if three people studied and came to three different conclusions. Dan and J-Doug both found 0.06 runs/game and John found 0.14 runs/game. As best I can tell, Dan and John used similar methods and similar (same?) definitions of the zone, and J-Doug used a different method.
Is it worthy of further investigation? Definitely.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
Now that I read John's article a bit more closely
I see that he restricted himself to 0-0 pitches and extrapolated that effect to the full game. I’m extremely skeptical of the accuracy of that approach.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
reduces other factors
As John says, we know that strike zones vary based on the count, which is why he restricted it to 0-0 (the largest sample). It’s fine for you to be “extremely skeptical”, but it sure does help if you say why. I would assume your skepticism implies that you feel that:
1. The sample size is too small
2. The home field variance differs depending on the count
The inverse skepticism would be to ask whether J-Doug or Dan normalized for count.
Not sure what you mean by normalized in this context
But I have run two ordinal logit models. One is for all pitches and the other is for 0-0 counts only.
The home at bat variable shows almost exactly the same coefficient in both models: http://www.rationalpastime.com/2010/11/tech-notes-pitch-characteristics-and.html
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Well, if you want my honest opinion
I think that very little of what has been published on the strike zone reflects to the reader just how little we actually know about it.
So I’m skeptical of everything that Dan, John, and J-Doug have published on the topic, and I don’t think any one of them has hit on the “truth”. I’ve studied the strike zone a lot, and I know I haven’t.
But given that proviso, if I had to choose between a study of 0-0 counts extrapolated to all counts and two studies of all counts, the studies of all counts would seem to have more validity. And yes, that implies that home field variance differs depending on the count. I don’t know how comfortable I am with that, but in the absence of evidence to defend John’s assumption, that’s what I’m left with at the moment.
I don’t think we understand enough about the strike zone by count to go normalizing anything by it yet.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
thanks
Yes, Mike, I really do want your honest opinion. Does anything I say imply otherwise?
I agree with your overall point, which is that we don’t really know what we don’t know here, which is why I think we should be more open and highlight differences in outcome, rather than expressing skepticism that one study doesn’t match two others. It’s too early to choose sides. I’d rather have more approaches and different conclusions.
That comment wasn't directed at you
But at the broader sabermetric/baseball community. It is VERY popular to highlight anything that shows that the umpire doesn’t know what he is doing or is unfair and ought to be replaced by a machine.
And for me to say, “Well, I’ve looked at all this, and more than anything I am just befuddled and pessimistic of my/our ability to understand what is really going on here” doesn’t get much airplay. Understandably so. I’ve pointed out some questions I have in the comments to John’s and J-Doug’s articles on the topic, but that’s not nearly as convincing as coming to my own conclusions and publishing on them. I wish I could get far enough with the data that I had some conclusions I felt comfortable with.
I don’t particularly agree with your characterization of the studies so far as all being on equal footing. Given the data we have, I think it’s quite fair to characterize Dan and J-Doug’s finding as the default position of the moment. In no way, however, do I think that this is the final word on the matter. Could John be proved right and Dan and J-Doug wrong? I suppose. More likely, I think they’ll all be proved wrong. But if the current view of how the zone is thought of turns out to be true, then I think Dan and J-Doug’s work here will probably prove out as well.
Winner, Beyond the Box Score 32 Predictions Contest, 2009
fair enough
I’m not as quick as you to rush to conclusions about which studies may be more valid. I’d rather wait for more analysis, having identified key differences between studies. I certainly wouldn’t wonder if someone had made a mistake. It doesn’t seem to me that the differences here are THAT stark (unlike the differences with the Scorecasting writers).
Stating the obvious, if umpire bias differs over count (which we know), and umpire bias also differs home/away (which we also know), then accounting for the differences when counts differ by home or away is not a straightforward task.
Also, you home in on this issue as the key difference between the studies. I’m not sure that’s true. The strike zone determination, and the run value calculations, may also be significantly different. I’m not smart enough to figure that out.
I'm satisfied with this characterization
I know that as I continue to expand my model, I find new things I had not anticipated, and some earlier findings get rejected. Nature of the business, I suppose.
The problem I have in particular with Scorecasting is that they seemed to use the same data that all the rest of us have used and (mostly subjectively) jumped to a far more significant conclusion. Even if John’s numbers are more accurate than mine, it seems that the publicity tour for this book has implied that the strike zone effect is far more than even 1/3 of total HFA.
That’s what I’m taking issue with, primarily. It’s Freakonomics all over again.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
I didn't explicitly normalize for count when I ran my analysis
but count was implicitly included in the run value of switching a ball to a strike.
by Dan Turkenkopf on Jan 31, 2011 8:40 AM EST
If I had known about it I certainly would have addressed it
Thanks for the heads-up, studes.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Catchers' hot zones
So…we’ve seen umpires’ hot zones, right- and left-handed hitters’ hot zones vs. right- and left-handed pitchers, pitch type and velocity hot zones, etc. Great data. I still want to see catchers’ hot zones: strikes vs. balls, for all locations, for all pitch types, based on catchers’ techniques, particularly catching palm up vs. palm down for low pitches, and glove inward vs. glove outward for pitches to the catcher’s left. I still think techniques that allow the umpire to see the ball as it hits the glove (palm up, palm in) would prove to get more called strikes than sloppy palm down, palm out techniques. However, I would bow to scientific evidence to the contrary.
The hard part with that analysis is that no available data source tracks catcher location or glove position
Sportvision has talked about tracking the catcher’s position, but it hasn’t been added to the dataset yet (if it ever is)
by Dan Turkenkopf on Jan 31, 2011 9:13 PM EST
I can promise you 100%
That if I had the data, I’d have written about it already. What I can check in the long run is whether there’s any year-to-year correlation in zone advantage and/or linear weights for different catchers behind the plate. I don’t have that data integrated in my data set yet, however.
Honestly, the year-to-year correlation for pitchers in terms of zone advantage and linear weights is rather small (R=.40), and I’d be surprised if it were any larger for catchers.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
Also in the THT Annual
Sean Smith wrote about this in the THT Annual. He didn’t have the PITCHf/x data, but he used a “WOWY” approach to identify differences between catchers. The general sabermetric principle is that catchers have little effect on ERA, but Sean found that this isn’t true.
This is another thing Bill Letson and I have looked into
Mine- http://www.beyondtheboxscore.com/2008/4/5/389840/framing-the-debate
Bill’s much better take – http://www.beyondtheboxscore.com/2010/3/26/1360581/a-first-pass-at-a-catcher-framing
And we found surprising variations between catchers. So much so that I don’t know if I believe what it’s telling us.
by Dan Turkenkopf on Feb 1, 2011 11:35 AM EST
Catcher Hot Zone Cont'd
Dan, thanks for the references! There’s framing, and then there’s framing. Consistent technique for a catcher is just like consistently being around the strike zone for the pitcher: both elicit umpires’ calls to the pitcher’s advantage. When combined, the effect may be more than additive, in my opinion. But I’m not surprised you found an effect.
