A couple of weeks ago, I used contour plots to see if umpires called different types of pitches differently. This qualitative approach didn't produce any clear distinctions, but noted strike zone guru Jon Roegele wasn't convinced by my conclusions. In the comments of that article, he wrote,
The thing that I'm not convinced about is whether you'd be able to see the difference in strike zone size from the 50% contour images. If there is a difference it's likely less than 25 sq. inches, and if this was spread around the periphery of the contour then I think it would look "the same" but in reality would be showing a different sized zone.
So let's start by measuring the area enclosed by each contour, beginning with left-handed batters on 0-0 counts. This table shows the area inside select contours; for instance, the area inside the 50 percent contour is the number of square-inch bins in which at least half of the taken pitches were called strikes.
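The tally described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each 1-inch bin stores a pair (called strikes, pitches taken); the function names and the bin values are hypothetical and are not the data behind the table.

```python
def strike_rates(bins):
    """Map {(x_in, z_in): (called_strikes, taken)} to {(x_in, z_in): strike rate}.

    Bins where no pitch was taken are dropped, so they never add to an area.
    """
    return {cell: strikes / taken
            for cell, (strikes, taken) in bins.items() if taken > 0}


def contour_area(bins, threshold):
    """Square inches (1-inch bins) whose called-strike rate is at least `threshold`."""
    return sum(1 for rate in strike_rates(bins).values() if rate >= threshold)


# Hypothetical bins, keyed by horizontal and vertical position in inches.
example_bins = {
    (0, 30): (40, 42),   # about 95% strikes
    (5, 30): (30, 40),   # 75%
    (8, 30): (11, 20),   # 55%
    (10, 30): (2, 18),   # about 11%
    (-12, 30): (0, 0),   # nothing taken here
}

print(contour_area(example_bins, 0.5))  # 3
print(contour_area(example_bins, 0.7))  # 2
```

Because empty bins are skipped rather than treated as balls, any gap in a contour directly shrinks its measured area.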
This seems to suggest a significant difference between pitch types. But before we accept this at face value and move on, there are a couple of things to consider. First, of course, there are the gaps in some of the contours; remember, no one on either side of the plate took an inside changeup all year, so those bins aren't counted toward the area.
To understand the other issue, consider the figure below, which compares the total number of four-seam fastballs taken by left-handed hitters on 0-0 counts in 2014 with the 50 percent contour. Note that the maximum number of pitches in any bin is 48, and that bins of 20 or fewer are common along the contour, even for the most common pitch type. Twenty pitches is a very small sample: a bad day by an umpire, or a good day by a Yadier Molina, could sway individual bins and, in turn, the total area. As further evidence of this, consider the 70 percent contours, which are practically identical across all pitch types.
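A quick binomial calculation shows why 20-pitch bins are so fragile. This sketch treats each call in a bin as an independent trial with the same strike probability, which is a simplification; near the 50 percent contour that probability is about 0.5 by definition.

```python
import math


def strike_rate_se(p, n):
    """Binomial standard error of a called-strike rate p estimated from n pitches."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (20, 48):
    se = strike_rate_se(0.5, n)
    print(f"n = {n}: standard error {se:.3f}, rough 95% interval +/- {1.96 * se:.2f}")

# One flipped call moves a 20-pitch bin by 1/20 = 5 percentage points:
# 9 strikes (45%) becomes 10 (50%), enough to put the bin on the contour.
print(9 / 20, 10 / 20)
```

Even the busiest bin (48 pitches) carries an uncertainty of several percentage points, so the exact edge of a contour can shift on a handful of calls.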
An alternate approach would be to build a model, like Rob Arthur's recent work, that predicts whether a given pitch is a ball or strike from its trajectory. I used MATLAB to build two artificial neural networks: the first used only the pitch's location when it crossed the plate, and the second added parameters related to break and movement, as well as end speed. Using these physical parameters instead of the official pitch classifications lets me quantify the effect of movement directly, without relying on arbitrary (and occasionally incorrect) pitch classes.
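To make the two input sets concrete, here is a minimal Python sketch. The article doesn't list its exact fields, so the PITCHf/x column names below (px and pz for location; pfx_x, pfx_z, break_y, break_angle, and break_length for movement and break; end_speed for speed) are an assumption, as are the helper names. The forward pass is a generic one-hidden-layer network with tanh hidden units and a logistic output, not a reproduction of the MATLAB networks.

```python
import math

# Assumed PITCHf/x columns; the original feature list is not given.
LOCATION = ["px", "pz"]
MOVEMENT_AND_SPEED = ["pfx_x", "pfx_z", "break_y", "break_angle",
                      "break_length", "end_speed"]


def inputs(pitch, names):
    """Pull the named fields out of a pitch record as floats."""
    return [float(pitch[name]) for name in names]


def location_only(pitch):
    return inputs(pitch, LOCATION)


def location_and_movement(pitch):
    return inputs(pitch, LOCATION + MOVEMENT_AND_SPEED)


def strike_probability(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer (tanh) feeding a logistic output: P(called strike)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    z = sum(v * h for v, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pitch record.
pitch = {"px": 0.3, "pz": 2.6, "pfx_x": -6.1, "pfx_z": 9.4, "break_y": 23.8,
         "break_angle": 21.5, "break_length": 3.9, "end_speed": 84.7}
print(len(location_only(pitch)), len(location_and_movement(pitch)))  # 2 8
```

Only the input vector changes between the two models; the network itself can keep the same structure, with the input layer widened from two features to eight.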
The graphs below show which pitches each model called balls and strikes, compared with the umpires' actual calls on the left. As expected, using only location parameters produced a hard cutoff between balls and strikes, but the graph on the right shows that including the other parameters produced a smoother boundary.
But did the extra parameters make our model more accurate? The results below are the average of five runs per neural net; in each run, half of the data were used to train the network, 25 percent were used for validation, and the remaining 25 percent were used for testing. The extra variables add no predictive power to the model, even when we include a few extra hidden nodes to better divide the higher-dimensional space.
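The evaluation protocol (a 50/25/25 split, averaged over five runs) can be sketched as a small harness. The `placeholder_score` function is a hypothetical stand-in for training a network: it scores a fixed rectangular zone (edges in feet, chosen for illustration) on the test split so the harness runs on its own.

```python
import random


def split_indices(n, rng, train_frac=0.50, val_frac=0.25):
    """Shuffle 0..n-1 and cut into train / validation / test index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def average_test_score(X, y, fit_and_score, runs=5, seed=0):
    """Mean test-set score over `runs` independent random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, val, test = split_indices(len(y), rng)
        scores.append(fit_and_score(X, y, train, val, test))
    return sum(scores) / len(scores)


def placeholder_score(X, y, train, val, test):
    """Hypothetical stand-in for a network: a fixed zone, ignoring train/val.

    A real version would fit on `train`, stop early using `val`,
    and only then measure accuracy on `test`.
    """
    def predict(px, pz):
        return 1 if abs(px) <= 0.83 and 1.5 <= pz <= 3.5 else 0
    return sum(predict(*X[i]) == y[i] for i in test) / len(test)


# Synthetic pitches whose "calls" follow the same zone exactly.
rng = random.Random(1)
X = [(rng.uniform(-2, 2), rng.uniform(0.5, 4.5)) for _ in range(200)]
y = [1 if abs(px) <= 0.83 and 1.5 <= pz <= 3.5 else 0 for px, pz in X]
print(average_test_score(X, y, placeholder_score))  # 1.0
```

Averaging over several random splits keeps one lucky or unlucky test set from deciding whether the extra variables help.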
One advantage of using a neural network is that we can use its weights to estimate the relative importance of each feature. When we look at the relative importance of the eight features in our advanced model (including location, speed, and movement), we see that the location features are by far the most important.
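The article doesn't say which weight-based measure it used; one common choice is Garson's algorithm, sketched below as an assumption for a network with one hidden layer and a single output. Each hidden unit's absolute input weights are scaled by its absolute output weight and normalized within the unit, then summed across units and normalized so the importances add to 1.

```python
def garson_importance(input_hidden, hidden_output):
    """Relative input importance via Garson's algorithm.

    input_hidden[j][i] is the weight from input i to hidden unit j;
    hidden_output[j] is the weight from hidden unit j to the single output.
    Returns one value per input; the values sum to 1.
    """
    n_inputs = len(input_hidden[0])
    totals = [0.0] * n_inputs
    for row, out_w in zip(input_hidden, hidden_output):
        contrib = [abs(w) * abs(out_w) for w in row]
        unit_total = sum(contrib)
        if unit_total == 0:
            continue  # a dead hidden unit contributes nothing
        for i, c in enumerate(contrib):
            totals[i] += c / unit_total
    grand = sum(totals)
    if grand == 0:
        return [1.0 / n_inputs] * n_inputs
    return [t / grand for t in totals]


# Hypothetical weights: 3 inputs, 2 hidden units.
W = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
v = [1.0, 2.0]
print([round(x, 3) for x in garson_importance(W, v)])  # [0.333, 0.417, 0.25]
```

Because the method works on absolute weights, it measures how much the network leans on a feature, not which way that feature pushes the call; any input with nonzero trained weights, useful or not, receives some share.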
But these results still suggest that the model is using the movement parameters, right? Well, consider the table below. I trained and tested another neural network using the two location parameters and six normally distributed random variables. This network performed just as well as the other networks above, and the importance assigned to the random variables was about the same as that assigned to the movement parameters in the previous model. In other words, you can replace the features related to pitch type with noise without affecting the model's performance.
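The noise control is easy to reproduce. This sketch, assuming each row holds (px, pz), appends six independent standard normal draws; a seeded generator keeps runs repeatable. The same model and importance measure can then be run on these rows, and any real feature that scores no better than the noise columns is hard to tell apart from noise.

```python
import random


def with_noise_features(location_rows, n_noise=6, seed=0):
    """Append n_noise independent N(0, 1) draws to each (px, pz) row."""
    rng = random.Random(seed)
    return [list(row) + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            for row in location_rows]


rows = [(0.1, 2.5), (-0.9, 3.0), (1.2, 1.4)]
noisy = with_noise_features(rows)
print(len(noisy[0]))  # 8: two location features plus six noise columns
```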
Another advantage of using neural networks is that they can easily be converted to dynamic models. At the suggestion of Brian Mills, I built three dynamic models based on all 1-1 pitches taken by right-handed batters against right-handed pitchers in 2014. I included features that described both the 1-1 pitch and the previous pitch, including location, movement, and whether the previous pitch was a strike. The average accuracy, sensitivity, and specificity are graphed below.
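Here is a sketch of the two pieces that differ from the static models: stacking the previous pitch's features next to the 1-1 pitch's, and scoring predictions by accuracy, sensitivity, and specificity. It assumes called strikes are the positive class and that each record carries a hypothetical `was_strike` flag; the field names are illustrative.

```python
def dynamic_inputs(current, previous, names):
    """Features of the 1-1 pitch, then the same features for the pitch before it,
    then 1.0 if that previous pitch was a strike (0.0 otherwise)."""
    cur = [float(current[n]) for n in names]
    prev = [float(previous[n]) for n in names]
    return cur + prev + [1.0 if previous["was_strike"] else 0.0]


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity, treating strikes (1) as positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # share of actual strikes predicted as strikes
    specificity = tn / (tn + fp)   # share of actual balls predicted as balls
    return accuracy, sensitivity, specificity


current = {"px": 0.2, "pz": 2.8}
previous = {"px": -1.1, "pz": 1.9, "was_strike": True}
print(dynamic_inputs(current, previous, ["px", "pz"]))  # [0.2, 2.8, -1.1, 1.9, 1.0]
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Reporting sensitivity and specificity alongside accuracy shows whether a model gains by calling more strikes or more balls, which a single accuracy number can hide.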
Here, too, we see no real advantage to including extra information in our dynamic model. And when we look at the relative importance of the features in the dynamic model, we see that even the location of the previous pitch is at the level of the noise.
I'm not comfortable saying pitch type has no effect on the strike zone; previous work by both Mills and Roegele suggests that pitch type is important, and Etan Green's Sloan paper found that whether the previous pitch was a ball or a strike also affected the size of the strike zone. What I'd argue instead is that the effect is at or near the level of noise.
Earlier this week, Tom Tango wrote, "What we care about is the degree to which something exists and the degree to which it can be measured as having an impact." Previous work suggests that pitch type does affect the strike zone; this article argues that the magnitude of this effect is vanishingly small. Like players, umpires go through the minor league wringer to reach the major leagues; it makes sense, then, that the ones who make the leap were promoted in part for their ability to reliably track pitches with movement.
. . .
Bryan Cole is a featured writer for Beyond the Box Score with an unhealthy love for neural networks. You can follow him on Twitter at @Doctor_Bryan.