Towards aging curves for umpires

PITCHf/x data and an investigation of umpires' ability to call balls and strikes as a function of age.

Jason Miller

Scott Lindholm's recent look at individual umpires in the PITCHf/x era made me wonder what umpires' seasonal numbers looked like. In this article, I'll look at how umpires have adjusted their strike zones in the presence of PITCHf/x tracking, and investigate how an umpire's performance changes as he ages. Although the sample is too small to establish a definitive relationship, we can see the beginnings of an umpire aging curve.

Before we get to that, though, we have to talk about the PITCHf/x database. The database, which is freely available from Baseball Heat Maps, contains data on the trajectory of nearly every pitch thrown in the Major Leagues since 2007. (A primer on the database can be found here, and a glossary of the attributes that describe each pitch is available here.) As useful as this information is, we will only be focusing on a very small subset for this article: namely, where the pitch crossed the plate, what the result of the pitch was, the approximate strike zone, the handedness of the batter, and the umpire behind home plate.

I next want to establish some ground rules for the remainder of the article.

  • We will be looking only at those pitches where the umpire had to make a ball/strike determination. In our database, those are labeled "Ball", "Ball in Dirt", and "Called Strike".
  • I will define the PITCHf/x strike zone based on the rulebook definition. The upper and lower limits will be determined by the sz_top and sz_bot fields of the PITCHf/x data (corresponding to four inches above the batter's belt and the hollow of his knee, respectively). The width of the strike zone will be 19.9", as used here. This is slightly wider than the actual 17" width of the plate, which lets us catch pitches where only part of the ball crosses the plate. Any pitch inside this zone will be defined as a strike; anything else is a ball.
  • Since umpires call different strike zones for left-handed and right-handed batters, I will look at umpires' performance on each group separately.
  • The metric I'll be looking at is "agreement percentage", the percentage of those pitches where the umpire and PITCHf/x call the pitch the same way. Several commenters on Scott's piece objected to the use of "right" or "wrong" to describe calls relative to PITCHf/x. I understand why this might be seen as unfair -- an umpire who called the rulebook strike zone would be run out of town on a rail. In addition, there is always the possibility of calibration issues: other writers have observed that some pitches are called balls even when the PITCHf/x reading is close to the center of the strike zone. Nevertheless, as the graph below shows, umpires have been adjusting their calls since the introduction of PITCHf/x in 2007, resulting in higher agreement percentages with the automated system.
    [Figure: agreement percentage between umpires and PITCHf/x, by season]
  • Because I'm interested in the effects of age on this agreement percentage, I will only be considering performances by those umpires whose birthdates I can find in the Retrosheet umpire database. I've also limited the data set to umpires with a minimum of 15 games behind the plate, a cutoff chosen because it gives each umpire a minimum of about 1,000 calls to both left-handed and right-handed batters. The average age of the 1,040 umpire seasons in the sample is 45.8, with a minimum of 27 and a maximum of 68.
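
As a rough illustration of the ground rules above, the sketch below classifies each pitch against the fixed-width zone and tallies agreement by umpire, season, and batter handedness. It is not the code used for this article. The field names px and pz (plate location), des (pitch result), and stand (batter side) follow common PITCHf/x naming but are assumptions here, as are the umpire and year keys and the premise that all locations are in feet; only sz_top and sz_bot are named in the text. The zone width is passed in as an argument, in inches, rather than hard-coded.

```python
# Pitch results that required a ball/strike decision (compared case-insensitively).
CALLED = {"ball", "ball in dirt", "called strike"}


def zone_call(px, pz, sz_top, sz_bot, zone_width_in):
    """Return 'strike' or 'ball' for a pitch under the fixed-width zone.

    px, pz, sz_top and sz_bot are assumed to be in feet, with px measured
    from the center of the plate; zone_width_in is the full width in inches.
    """
    half_width_ft = zone_width_in / 12.0 / 2.0
    in_zone = abs(px) <= half_width_ft and sz_bot <= pz <= sz_top
    return "strike" if in_zone else "ball"


def agreement_by_stand(pitches, zone_width_in):
    """Agreement percentage keyed by (umpire, year, stand).

    `pitches` is an iterable of dicts; the key names are hypothetical.
    """
    agree, total = {}, {}
    for p in pitches:
        result = p["des"].lower()
        if result not in CALLED:
            continue  # swings, fouls, balls in play, etc. are ignored
        key = (p["umpire"], p["year"], p["stand"])
        ump_call = "strike" if result == "called strike" else "ball"
        pfx_call = zone_call(p["px"], p["pz"], p["sz_top"], p["sz_bot"],
                             zone_width_in)
        total[key] = total.get(key, 0) + 1
        agree[key] = agree.get(key, 0) + (ump_call == pfx_call)
    return {k: 100.0 * agree[k] / total[k] for k in total}
```

Keeping left-handed and right-handed batters under separate keys mirrors the decision to evaluate each group on its own; the minimum-games and birthdate filters would be applied to the resulting umpire seasons afterwards.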

On to the data. We begin with the highest agreement percentages by season. Note that the best numbers are all from 2013 or 2012 and (at least for right-handers) were largely posted by older-than-average umpires.

Umpire (RHB)       Year   Age   Games   Agree %
Tony Randazzo      2013   48    33      91.55%
Phil Cuzzi         2012   57    33      91.55%
Lance Barksdale    2013   46    34      91.23%
Tim McClelland     2012   60    33      91.11%
Gerry Davis        2013   60    32      91.04%

Agreement percentages are higher for RHB because umpires call a different zone for each side, as mentioned above: LHB typically see more called strikes off the outer part of the plate, and our strict PITCHf/x interpretation labels these pitches balls.

Umpire (LHB)       Year   Age   Games   Agree %
Chad Fairchild     2013   42    32      90.78%
Chad Fairchild     2012   41    29      90.43%
Tim McClelland     2013   61    30      90.28%
Manny Gonzalez     2013   33    32      90.03%
Manny Gonzalez     2012   32    29      89.99%

The lowest agreement percentages, on the other hand, were all posted in 2007, the first year of PITCHf/x. In fairness, I should point out that PITCHf/x was not installed in every ballpark in 2007, so some data points from this season are missing. And as the first graph shows, umpires seem to have begun changing their calls in 2008 -- or PITCHf/x operators adjusted the calibration of their systems -- to improve agreement.

Umpire (RHB)       Year   Age   Games   Agree %
Laz Diaz           2007   47    37      71.04%
Adrian Johnson     2007   32    16      72.09%
Marty Foster       2007   43    31      72.47%
Chad Fairchild     2007   36    28      72.66%
Brian O'Nora       2007   44    18      72.73%

Interestingly, most of the umpires in these tables are younger than average.

Umpire (LHB)       Year   Age   Games   Agree %
Adrian Johnson     2007   32    16      68.25%
Brian O'Nora       2007   44    18      70.15%
Charlie Reliford   2007   51    25      70.22%
Doug Eddings       2007   39    34      70.23%
Ed Hickox          2007   45    34      71.53%

I wanted to investigate umpire aging curves to see how agreement percentage changes as a function of age. There is a straightforward algorithm to do this for players, which I adapted for umpires: ignoring each umpire's first season, I calculated the average change in agreement percentage relative to the previous season for all umpires at a given age. There are two problems with this analysis. First, there are far fewer umpires than MLB players (there were only 94 unique umpires in my 1,040-season sample). Second, the age range is much wider for umpires than for players (there are no 68-year-olds getting meaningful playing time). Between these two factors, there is too much noise to pull anything reliable from the data.
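
The year-over-year averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used here: the (umpire, year, age, agreement %) tuple layout is hypothetical, and the choice to count only back-to-back seasons (skipping any gap year) is my own reading of "relative to the previous season". An umpire's first season drops out naturally because it has no earlier season to compare against.

```python
from collections import defaultdict


def aging_deltas(seasons):
    """Average season-to-season change in agreement percentage, by age.

    `seasons` is an iterable of (umpire, year, age, agree_pct) tuples
    (a hypothetical layout). Returns {age: (mean_change, n_umpires)},
    where the change is credited to the age in the later season.
    """
    by_umpire = defaultdict(list)
    for umpire, year, age, pct in seasons:
        by_umpire[umpire].append((year, age, pct))

    changes = defaultdict(list)
    for rows in by_umpire.values():
        rows.sort()  # chronological order within each umpire
        for (y0, _, p0), (y1, a1, p1) in zip(rows, rows[1:]):
            if y1 == y0 + 1:  # assumption: only consecutive seasons count
                changes[a1].append(p1 - p0)

    return {age: (sum(d) / len(d), len(d))
            for age, d in sorted(changes.items())}
```

For example, feeding it Chad Fairchild's two LHB seasons from the table above (90.43% at age 41, 90.78% at age 42) yields a change of about +0.35 percentage points at age 42. With only 94 umpires, many ages will have just a handful of such changes to average, which is where the noise comes from.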

[Figure: two graphs of average change in agreement percentage by umpire age, with parabolic trendlines]

But as we look at these two graphs, we can see a very slight downward trend, as illustrated by the parabolic trendlines. As I said, it's not enough to draw a solid conclusion from, and it's possible that other factors are responsible -- perhaps, for example, older umpires were slower to adjust their calls to the feedback from PITCHf/x. Still, the results fit with our intuition (humans' eyesight gets worse with age) and I wouldn't be surprised to see a slow decrease in agreement with the PITCHf/x system stand out from the noise as more data is collected.
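
For readers who want to reproduce a parabolic trendline like the ones in these graphs, here is a minimal least-squares sketch in plain Python. The tool that actually drew the trendlines is not stated, so this is only one plausible way to get a comparable curve. The function name and the decision to center ages on their mean (which keeps the arithmetic well-conditioned) are my own choices.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*(x - x0) + c*(x - x0)**2.

    Returns (a, b, c, x0), where x0 is the mean of xs. Needs at least
    three distinct x values.
    """
    n = len(xs)
    x0 = sum(xs) / n
    u = [x - x0 for x in xs]

    # Power sums for the 3x3 normal equations.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(yi * ui ** k for ui, yi in zip(u, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]

    def det3(q):
        return (q[0][0] * (q[1][1] * q[2][2] - q[1][2] * q[2][1])
                - q[0][1] * (q[1][0] * q[2][2] - q[1][2] * q[2][0])
                + q[0][2] * (q[1][0] * q[2][1] - q[1][1] * q[2][0]))

    d = det3(m)
    if d == 0:
        raise ValueError("need at least three distinct x values")

    # Cramer's rule: swap the right-hand side into each column in turn.
    coeffs = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = t[i]
        coeffs.append(det3(mj) / d)
    a, b, c = coeffs
    return a, b, c, x0
```

Here xs would be the ages and ys the average change in agreement percentage at each age. A negative b or c would be consistent with the slight downward trend described above, though with this much noise the coefficients should be read loosely.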

. . .

PITCHf/x database provided by MLB Advanced Media and SportVision. Umpire database courtesy Retrosheet.

Bryan Cole's total lack of depth perception (among other things) ensures he will never be a Major League umpire. He is a featured writer on Beyond the Box Score, and can be found on Twitter at @Doctor_Bryan.