This tweet rolled across my Twitter timeline last Friday, April 25th:
John Smoltz is saying the strike zone shrinks in the late innings of a close game. Anyone have a link to research on that? -- Beyond the Box Score (@BtBScore), April 26, 2014
I never trust anything I see on Twitter (or the Internet, for that matter) without checking out the bona fides of the party involved, and after a rigorous vetting process, I determined the tweeter could be trusted--indeed, they appear to be a stunningly bright group of men and women at the forefront of sabermetric research--but I digress. The tweet poses a legitimate question, and the data exists to check the veracity of the statement.
Before showing the data, I need to draw an important distinction. When John Smoltz shaves, he loses more baseball knowledge than I'll ever possess in my lifetime. In testing questions like this, my purpose isn't to demean someone or show off my vast knowledge so much as to check whether the facts line up with the statement. There will never be an instance when my opinion should be given greater weight than John Smoltz's.
With the advent of PITCHf/x in 2008, we can move beyond the Opinion Era to the Fact Era. All data comes from Daren Willman's outstanding baseballsavant.com, and this picture shows how the strike zone is defined:
Pitches in zones 1-9, as determined by the MLBAM PITCHf/x database, are strikes, and those in zones 11-14 are balls. Dan Brooks and Harry Pavlidis at brooksbaseball.net further refine the area outside the strike zone by breaking its four zones down into sixteen, but for our purposes that level of detail isn't necessary. This allows for four different outcomes: a pitch in the zone called a strike, a pitch in the zone called a ball, a pitch outside the zone called a strike, and a pitch outside the zone called a ball.
Smoltz's comment suggests that in later innings, a smaller strike zone could lead to the two incorrect call states: a pitch in the strike zone that is incorrectly called a ball, or a pitch outside the zone that is incorrectly called a strike.
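For readers who like to see the logic spelled out, here is a minimal Python sketch of that four-way split. It assumes only what is stated above (zones 1-9 are in the strike zone, zones 11-14 are outside it). The function name, the "strike"/"ball" call labels, and the outcome labels are my own inventions for illustration, not PITCHf/x or baseballsavant.com terminology.

```python
# Zone numbering as described in the article: 1-9 inside, 11-14 outside.
STRIKE_ZONES = set(range(1, 10))
BALL_ZONES = set(range(11, 15))


def classify_call(zone, call):
    """Label an umpire's call on a taken pitch as one of four outcomes.

    zone: integer zone number (1-9 or 11-14).
    call: "strike" or "ball" (hypothetical labels for this sketch).
    """
    if zone in STRIKE_ZONES:
        in_zone = True
    elif zone in BALL_ZONES:
        in_zone = False
    else:
        raise ValueError(f"unknown zone: {zone}")

    if call not in ("strike", "ball"):
        raise ValueError(f"unknown call: {call}")

    if in_zone:
        return "correct strike" if call == "strike" else "ball inside zone"
    return "strike outside zone" if call == "strike" else "correct ball"
```

The two "incorrect" states are the ones labeled "ball inside zone" and "strike outside zone"; everything else is the umpire agreeing with the zone map.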
PITCHf/x uses the following pitch result descriptions:
We're not interested in occasions where the batter swings at the ball--the umpire didn't make a call on those occasions. This first chart shows the number of called strikes on balls outside the strike zone and differentiates between earlier and later innings (all data through Tuesday's games):
CS = called strike; CS OZ = called strike on a pitch outside the strike zone
There are two interesting facets to this chart. First, it shows very little difference between calls in the first six innings and the seventh inning and later--the number of "incorrect" calls is very similar. Second, before starting a movement to eliminate umpire calls because almost one-third of these calls are "incorrect," consider this image:
This shows every called strike outside the strike zone for the New York Yankees since 2008 in innings 1-6 and is representative of all teams. It shows that while some high and low pitches are called strikes, far and away it's pitches wide of the plate that cause the most incorrect calls. This is an issue for another post, but I felt it important to show the data.
What about balls that are called on pitches inside the strike zone?
B IZ = ball on a pitch inside the strike zone
The number of pitches inside the strike zone that are incorrectly called balls is far smaller, but it also doesn't move appreciably depending on the inning.
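The tallies behind both comparisons can be sketched in a few lines of Python. This is an illustration of the bookkeeping, not the query I actually ran on baseballsavant.com. The record layout `(inning, zone, result)` and the result strings `"called_strike"` and `"ball"` are assumptions made up for the sketch; anything else (swings, fouls, balls in play) is skipped because the umpire made no call.

```python
from collections import Counter


def tally_calls(pitches):
    """Count CS, CS OZ, B and B IZ separately for innings 1-6 and 7+.

    pitches: iterable of (inning, zone, result) tuples, where result is
    "called_strike", "ball", or any other string (hypothetical labels).
    """
    totals = {"1-6": Counter(), "7+": Counter()}
    for inning, zone, result in pitches:
        if result not in ("called_strike", "ball"):
            continue  # no umpire call on swings, fouls, balls in play
        group = "1-6" if inning <= 6 else "7+"
        in_zone = 1 <= zone <= 9  # zones 1-9 are the strike zone
        if result == "called_strike":
            totals[group]["CS"] += 1
            if not in_zone:
                totals[group]["CS OZ"] += 1
        else:
            totals[group]["B"] += 1
            if in_zone:
                totals[group]["B IZ"] += 1
    return totals


# Made-up records purely to exercise the function; not real pitch data.
sample = [
    (1, 5, "called_strike"),
    (3, 11, "called_strike"),
    (4, 13, "ball"),
    (5, 2, "swinging_strike"),
    (8, 14, "called_strike"),
    (8, 6, "ball"),
    (9, 12, "ball"),
]
totals = tally_calls(sample)
```

Dividing CS OZ by CS (or B IZ by B) within each inning group gives the share of "incorrect" calls that the two groups are being compared on.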
Close games in the later innings create more tension for all involved, and every pitch takes on greater meaning. In this case every "incorrect" call takes on outsized weight and sticks out more in a pitcher's mind. It's easy to understand Smoltz being more upset about an incorrect call with a game tied 2-2 in the eighth inning than about one in the first, but the numbers don't substantiate his claim. This chart ties it all together:
People with more knowledge than me can explain why more strikes are called outside the strike zone than balls called inside, but it probably comes down to it being easier to determine if a ball is inside vs. outside the strike zone. No matter what, there doesn't appear to be much difference in the later innings. Daren's site allows for this data to be drilled down to the umpire level, and while I suspect there might be an outlier or two, I doubt there would be much difference between umpires.
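I eyeballed the charts rather than running a formal test, but anyone who wants to put a number on "not much difference" could try something like a two-proportion z-test. To be clear, this is a standard textbook technique I'm swapping in for illustration, not the method used in this post, and the function name is hypothetical. It takes the count of "incorrect" calls and the total calls for each inning group and returns a z-score and a two-sided p-value.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1, n1: incorrect calls and total calls in the first group.
    x2, n2: incorrect calls and total calls in the second group.
    Returns (z, p_value) using the pooled standard error.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("both groups need at least one call")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

One caution: with hundreds of thousands of pitches, even a tiny gap can come out "statistically significant," so the size of the difference matters more than the p-value.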
We can only work with the data that is available. STATS LLC or Elias Sports Bureau may have pitch data that goes back further than 2008, but it isn't publicly accessible. It would be interesting to see if balls and strikes are called more "correctly" in the PITCHf/x Era now that umpires are aware that every pitch is being watched, but many of the incorrectly called pitches are those right on the line. Try being right every single time while crouched in 95-degree heat wearing tons of protective clothing--it's not easy.
In a just world, John Smoltz will be part of the Hall of Fame Class of 2015 (along with Pedro Martinez, Randy Johnson and Craig Biggio), and in no way am I diminishing his credibility as an analyst. In this case, the facts don't support his statement, which is what the availability of data allows for: the testing of "conventional wisdom." Sometimes it holds up . . . and sometimes it doesn't.
All data from baseballsavant.com
Scott Lindholm lives in Davenport, IA. Follow him on Twitter @ScottLindholm.