The MVP of StatCast's first two months in the public eye is batted ball speed. Thanks to the tireless efforts of Baseball Savant's Daren Willman, the public can easily access the tens of thousands of batted ball speeds recorded since Opening Day, along with distance traveled, launch angle, hit direction, and the usual (but still amazing!) PITCHf/x data.
It should come as no surprise, then, that armchair analysts have already started testing the limits of their new toy. The comparatively small sample we have tells us that batted ball speed stabilizes very quickly, for instance. And although it's still too early to tell what leaderboards like this...
Starters allowing highest avg batted ball velos (200 pitches min) C.J. Wilson (96.9 mph) D. Salazar (94.9) V. Worley (94.4) K. Kendrick (93)
— Beyond the Box Score (@BtBScore) April 25, 2015
... tell us about a player's season, we can certainly use batted ball speed to look at league-wide trends. For instance, do harder-hit balls produce more errors?
Previous research was unable to find a relationship between batter speed and error frequency, though the release of baserunning-related StatCast data may change that. This article will focus only on fielding errors, testing the hypothesis that harder-hit balls are more difficult to field and so produce more misplays. One could also imagine that softer-hit balls force a fielder to rush his throw, producing more throwing errors (especially in the presence of fast runners). But for now, let's stick with fielding errors and save throwing errors for later.
The first approach is the simplest: run a binary-outcome regression (specifically, a probit regression) to look for a relationship between batted ball speed and the likelihood of an error. The blue circles on the graph below show the probability of an error for all batted balls hit within 1 mph of that speed. The solid green line, the result of our probit regression, does a poor job explaining those data points, and a large number of the points fall outside the confidence intervals (denoted by the red dashed lines).
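For readers who want to try this kind of fit themselves, here is a minimal sketch of a one-variable probit regression in plain Python. It is not the code behind the graph: the article does not say what software or fitting method was used, so the Fisher scoring loop, the function names, the speed rescaling, and the synthetic data (including the "true" coefficients) are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: supplies cdf, pdf and inv_cdf


def standardize(speed_mph):
    # Hypothetical rescaling (center 85 mph, scale 15 mph) to keep the fit stable.
    return (speed_mph - 85.0) / 15.0


def fit_probit(xs, ys, iterations=25):
    """Fit P(y = 1 | x) = Phi(b0 + b1 * x) by Fisher scoring; return (b0, b1)."""
    # Start from the intercept-only model, whose fit is the overall error rate.
    b0, b1 = STD_NORMAL.inv_cdf(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1 - 1e-10)
            density = STD_NORMAL.pdf(eta)
            score = density * (y - p) / (p * (1 - p))    # gradient contribution
            weight = density * density / (p * (1 - p))   # expected information
            g0 += score
            g1 += score * x
            i00 += weight
            i01 += weight * x
            i11 += weight * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


def error_probability(b0, b1, speed_mph):
    return STD_NORMAL.cdf(b0 + b1 * standardize(speed_mph))


# Synthetic batted balls (NOT StatCast data): made-up coefficients, fixed seed.
rng = random.Random(2015)
TRUE_B0, TRUE_B1 = -2.0, 0.15
speeds = [rng.uniform(50, 120) for _ in range(9100)]
xs = [standardize(s) for s in speeds]
ys = [1 if rng.random() < STD_NORMAL.cdf(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0, b1 = fit_probit(xs, ys)
for mph in (60, 90, 120):
    print(f"{mph} mph: fitted P(error) = {error_probability(b0, b1, mph):.3f}")
```

In practice a statistics package would do the same job and also report confidence intervals; the hand-rolled loop is only meant to show what the model is estimating.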
Suppose we limit our scope to ground balls. A hard-hit fly ball, after all, is likely to leave the yard, and home runs rarely produce errors. The graph below shows the results of another probit model applied to only the 9,100 ground balls in our database. Ground balls were included based on their manually annotated batted ball type; these batted ball types can be tricky, but the distinction between most grounders and other types of batted balls is clearer than between, say, a liner and a fly ball.
We now have a model that suggests a real (if still slight) relationship between batted ball speed and the likelihood of committing an error. But that curve changes extremely slowly. The lower bound of the probability a 120-mph batted ball produces an error (1.8 percent) is smaller than the upper bound of the probability a 60-mph batted ball produces an error (2.5 percent). In other words, it's still possible that the relationship between batted ball speed and error likelihood is totally flat.
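The overlap argument can be checked with a few lines of arithmetic. This is a small sketch, not part of the original analysis: the function name is hypothetical, and it assumes the fitted curve rises with speed, so that only the two bounds quoted above matter.

```python
def flat_rates_allowed(upper_at_slow, lower_at_fast):
    """Range of constant error rates that sits inside both intervals, or None.

    Assumes a rising curve, so the slow-speed lower bound and the
    fast-speed upper bound (not quoted in the article) never bind.
    """
    if lower_at_fast > upper_at_slow:
        return None  # the intervals are disjoint: a flat line is ruled out
    return lower_at_fast, upper_at_slow


# Bounds quoted in the text: 2.5 percent at 60 mph, 1.8 percent at 120 mph.
flat_range = flat_rates_allowed(upper_at_slow=0.025, lower_at_fast=0.018)
print(flat_range)  # (0.018, 0.025)
```

Any constant error rate between 1.8 and 2.5 percent falls inside both intervals, which is why a completely flat relationship cannot be ruled out.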
A possible confounding factor is that errors are a judgment call at the discretion of an official scorer employed by the home team. We all know the influence of home field advantage on umpires, and we've heard anecdotes about close calls becoming hits for the home team to boost batting averages. Does home cooking also affect fielding percentages?
Adding a dummy variable to identify when the home team was batting does not improve our model's predictive power. But there is still a substantial difference between what gets called an error when the home team is batting and what gets called an error when the visitors are up!
The two curves on this graph represent the likelihood of an error when the home team (blue) and the road team (orange) are batting. The shaded portions represent the 95% confidence interval based on the number of ground balls observed at that speed. Speeds were grouped in 10-mph buckets with a 5-mph overlap, so the first dot covers all grounders hit between 50 and 60 mph, the next covers between 55 and 65 mph, and so on. And whereas the home team's fielding percentage decreases on harder-hit balls, the road team stays oddly consistent -- and relatively error-free! -- over the meaty part of the curve.
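The bucketing described above can be sketched as follows. The 10-mph width and 5-mph step come from the text; everything else is an assumption, including the function names, the half-open bucket edges, the tiny demo data set, and the choice of a Wilson score interval (the article does not say which 95% interval was plotted).

```python
from math import sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wilson_interval(errors, n, z=Z_95):
    """Wilson score interval for a proportion (an assumed choice of interval)."""
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def overlapping_buckets(speeds, is_error, start=50, stop=120, width=10, step=5):
    """Return (lo, hi, n, error_rate, ci_lo, ci_hi) for each non-empty bucket."""
    rows = []
    for lo in range(start, stop - width + 1, step):
        hi = lo + width
        # Half-open bucket [lo, hi); a ball can land in two adjacent buckets.
        flags = [e for s, e in zip(speeds, is_error) if lo <= s < hi]
        if not flags:
            continue
        n, errors = len(flags), sum(flags)
        ci_lo, ci_hi = wilson_interval(errors, n)
        rows.append((lo, hi, n, errors / n, ci_lo, ci_hi))
    return rows


# Made-up grounders for illustration only (speed in mph, 1 = scored an error).
demo_speeds = [52, 57, 58, 63, 71]
demo_errors = [0, 1, 0, 0, 1]
rows = overlapping_buckets(demo_speeds, demo_errors, stop=80)
for lo, hi, n, rate, ci_lo, ci_hi in rows:
    print(f"{lo}-{hi} mph: n={n}, rate={rate:.3f}, 95% CI=({ci_lo:.3f}, {ci_hi:.3f})")
```

Running the function once on grounders hit with the home team batting and once with the road team batting would give the two sets of points. Because of the overlap, neighboring buckets share data, so the dots on such a curve are not independent of one another.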
These data suggest home teams get the benefit of the doubt on would-be errors: a ground ball hit at the same speed is more likely to be called an error if the home team is fielding than if it is batting. If the relationship were flipped, you could argue that some of it was due to the visitors' inexperience with the nuances of an individual ballpark. But it seems unreasonable to argue that visiting defenders get more reliable away from their home grounds. Besides, scorers are incentivized to turn close calls for home batters into hits (to boost batting averages), and close calls for visiting batters into errors (to help keep down ERAs).
. . .
Bryan Cole is a featured writer for Beyond the Box Score. He will be talking about wearable sensors at this year's Saberseminar. You can follow him on Twitter at @Doctor_Bryan.