Not too long ago I posted a short piece here at Beyond the Box Score on the greatest single-season discrepancies between UZR, DRS, and FRAA for individual fielders. As you might expect, there were a number of seasons in which the three popular defensive metrics disagreed quite severely, by as much as 25 runs in some cases.
It should hardly be surprising that, in small samples, defensive metrics can suggest two very different things about a defender's range and abilities in the field. But what about larger samples? How badly can advanced zoning metrics like DRS and UZR disagree over the course of a player's career?
The reliability of modern advanced fielding metrics has been making the rounds lately. It is at the heart of Jon Heyman's recent and infamous 'WAR mysteries of the week', which prompted Colin Wyers to weigh in on the matter and sparked a lengthy discussion at The Book blog.
One of the many alternatives tossed around in that thread (and elsewhere) was the possibility of regressing single-season defensive values. Generally, single-season samples are viewed as unreliable, while career numbers are viewed with more confidence.
Career DRS vs Career UZR
This got me wondering how often Fangraphs and Baseball-Reference disagree about a player's fielding at the career level, and for which players we should be most aware of that disagreement. I took every player's career fielding runs from both Fangraphs (UZR-based) and Baseball-Reference (DRS-based) since 2003, the first year both zoning metrics were applied. I also included Baseball Prospectus's play-by-play-based FRAA for further reference.
I found eleven players who were credited with at least fifty more runs under one system than under the other (a difference of roughly five wins). At the absolute worst, the discrepancy exceeds eight wins:
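The screen behind a list like this is simple arithmetic. Below is a minimal Python sketch of it, assuming the common rule of thumb of roughly 10 runs per win (the same rough conversion behind "fifty runs is about five wins"). The function name and the player names and run totals are hypothetical placeholders for illustration, not the actual Fangraphs or Baseball-Reference figures.

```python
# Rule-of-thumb conversion: roughly 10 runs of value per win (an assumption,
# not a constant published by either site).
RUNS_PER_WIN = 10.0


def career_discrepancies(careers, threshold_runs=50.0):
    """Return (player, uzr, drs, gap_runs, gap_wins) for every player whose
    career UZR-based and DRS-based totals differ by at least threshold_runs,
    sorted from the largest gap to the smallest."""
    flagged = []
    for player, (uzr, drs) in careers.items():
        gap = abs(uzr - drs)
        if gap >= threshold_runs:
            flagged.append((player, uzr, drs, gap, gap / RUNS_PER_WIN))
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Hypothetical career totals as (UZR-based runs, DRS-based runs); not real data.
sample = {
    "Player A": (30.0, -25.0),
    "Player B": (10.0, 70.0),
    "Player C": (-50.0, -140.0),
    "Player D": (5.0, 12.0),
}

for player, uzr, drs, gap, wins in career_discrepancies(sample):
    print(f"{player}: UZR {uzr:+.0f}, DRS {drs:+.0f}, gap {gap:.0f} runs (~{wins:.1f} wins)")
```

Note that the gap is measured in absolute terms, so it catches both kinds of disagreement discussed here: two metrics that agree on the sign but not the size (Player C) and two that tell opposite stories (Player A).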
Career Discrepancies Fielding
The worst case is the always-entertaining defense of Alfonso Soriano. The narratives told by Fangraphs and Baseball-Reference in his case are complete opposites. Fangraphs has Soriano as a +45-run fielder over the past ten years, while B-Ref estimates he has cost his teams nearly an equal amount over that time.
Anyone who has watched the Cubs' left fielder over the last few years would agree that Soriano's defense is a peculiar case. He certainly looks bad in the outfield, but if advanced metrics have taught us anything over the years, it is to look beyond the surface. A lazy trot might just be the result of a quick route, and an entertaining dive might indicate poor range.
With Soriano, we find that both sites love his arm, rating it at 20 runs above average. Both UZR and DRS also agree that he was abominable at the keystone, amassing nearly -40 runs for his missteps at second base. The real disagreement is over Soriano's play in left field: DRS estimates he was worth -5 runs in that role, while UZR rates him as an elite +75-run outfielder.
I first became aware of DRS's fondness for Orlando Hudson this winter, when I explored the greatest three-year defensive peaks for The Hardball Times. According to Baseball-Reference, Hudson's stretch of elite defense from 2003 to 2005 saved his team over 60 runs, the best stretch by any second baseman in the history of the game. From 2002 to 2012, Baseball-Reference has Hudson as a phenomenal ten-win defender.
As it turns out, however, UZR doesn't think he was one of the best of all time. In fact, it rates Hudson as only mildly above average from 2003 to 2012, at just +17 runs.
At number three, we once again find ourselves at the heart of the most controversial fielding debate of all time with Derek Jeter. This time, however, the debate is over whether Jeter is a very bad shortstop or a very very very bad shortstop.
In most of these cases, however, the two metrics are at odds only over whether a player was exceptionally or mildly better or worse than average. The two exceptions in this top ten are Soriano and Pierre, where the two metrics tell polar opposite tales. Pierre is estimated to be a +20 fielder by Fangraphs, while Baseball-Reference estimates a much bleaker performance at -30 runs.
What about FRAA?
One aspect of Baseball Prospectus's play-by-play-based FRAA that Colin Wyers promoted in his Heyman piece was its regressed nature. Because of this, FRAA is far less liberal than DRS and UZR in crediting a fielder with large amounts of value.
With regression, we are likely to land closer to a fielder's true-talent level in most cases, but some, including Beyond the Box Score's Matt Hunter, have expressed concern that this method might undervalue the most elite defenders.
I thought it might be interesting, then, to compare the Fangraphs and Baseball-Reference fielding value estimates with a far more conservative FRAA.
The players who took the biggest hit over the same ten-year period lost more than 100 runs saved under the Baseball Prospectus method. Ten players lost at least 80 runs saved:
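To see why regression worries fans of elite defenders, consider a generic shrinkage estimate. This is a minimal sketch of regression toward the mean in general, not FRAA's actual procedure, which this post does not describe: an observed runs-above-average total is pulled toward league average (zero) with a weight that grows with sample size. The constant `PRIOR_CHANCES`, the function name, and the chance counts are hypothetical values chosen only for illustration.

```python
# Hypothetical weight: how many chances' worth of "league average" (0 runs)
# we blend in. An illustrative assumption, not a value used by FRAA, UZR, or DRS.
PRIOR_CHANCES = 1500.0


def regress_to_mean(observed_runs, chances, prior_chances=PRIOR_CHANCES, mean_runs=0.0):
    """Shrink an observed runs-above-average total toward mean_runs.

    The weight on the observed value is chances / (chances + prior_chances),
    so small samples are pulled hard toward the mean and large samples less so.
    """
    weight = chances / (chances + prior_chances)
    return mean_runs + weight * (observed_runs - mean_runs)


# Same observed rate (+15 runs per 1500 chances) at two sample sizes; hypothetical numbers.
print(regress_to_mean(15.0, 1500))    # one season: weight 0.5, result 7.5
print(regress_to_mean(150.0, 15000))  # a long career: weight ~0.91, result ~136
```

Because the shrinkage is proportional to how far a fielder sits from average, the players with the largest raw totals give up the most runs in absolute terms, which is the crux of the concern that regression may undersell truly elite defenders.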
Career Discrepancies Fielding (FRAA)
For the top four players at least, we see a scenario where Matt's fear might be realized. This is especially true of noted elite center fielder Andruw Jones: the zoning metrics hand out some very impressive fielding scores, while FRAA says "not so much." The UZR-based Fangraphs estimate of 114 runs above average puts Jones in the stratosphere of excellence, but Prospectus keeps Jones humble at a mere eight runs above average.
I want to be clear that I am not personally advocating any of these metrics over the others, or zone-based over play-by-play methods. Nor do I think these large discrepancies justify scrapping zone-based metrics altogether. Generally, UZR, DRS, FRAA, and Total Zone agree on most occasions, and the numbers correlate rather well.
But I do think it's important to recognize the unreliability involved in defensive metrics and to at least be aware of the extremes, while not necessarily focusing on them.
For individual players like Soriano, Beltre, Hudson, and Jones, I think it's also important to be aware of these differing opinions when comparing their WAR/WARP totals across the various sites.
It is important not to overreact to these cases; there is no need to throw out the baby with the bathwater. But we certainly should not under-react to these issues and ignore them entirely.
. . .