There was significant disagreement among defensive metrics over his performance in 2011. - Kim Klement-US PRESSWIRE
Examining the greatest single-season discrepancies between some of the more popular defensive metrics of the day.
One of the lazier criticisms of WAR tossed around in last week's straw-manhunt was that you can get "wildly different" WAR totals depending on which version of WAR you use. As most of us are fully aware, having several different WAR methods is actually preferable to a single uniform calculation, because it's entirely possible that any one WAR system is undervaluing or overvaluing a player.
Comparing the WAR calculated at Baseball-Reference, Fangraphs, and Baseball Prospectus gives us a more complete, more rounded view of a player's value from several angles, rather than being constricted to one narrow and limited perspective.
For position players, the 'wild' differences stem mainly from the different replacement levels each system has in place, but also from the different methods used to measure a position player's defensive runs above average. For contemporary seasons, Baseball-Reference uses DRS, Fangraphs uses UZR, and Baseball Prospectus uses its own FRAA.
Plenty has been written before regarding the frequency with which these metrics disagree with one another-- even BTBS wunderkind Glenn DuPaul wrote about it with regard to the 2011 and 2012 seasons. But I don't recall ever seeing a full list of the greatest discrepancies between the major public defensive evaluators since their inception in 2002.
So, to get an idea of just how divergent these metrics can be when evaluating the same player in the same season, let's first compare the greatest deltas between Baseball-Reference's RField and Fangraphs' UZR.
Nomar's 2002 season is the worst-case scenario in terms of disagreement between the two defensive metrics. Fangraphs' UZR pegged him as one of the worst defenders in baseball that year, while Baseball-Reference had him as one of the best. The difference of 26 runs is not a trivial matter either-- it roughly equates to two and a half wins.
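The comparison above boils down to a simple calculation: take the absolute gap between the two systems' run values for each player-season, rank the gaps, and divide by the usual ~10 runs per win to express the disagreement in wins. A minimal sketch of that arithmetic is below; all of the row values are made up for illustration (Nomar's figures are only chosen so that the gap comes out to the 26 runs discussed above), and none are exact published RField or UZR numbers.

```python
# Sketch: ranking player-seasons by the disagreement between two
# defensive metrics, expressed in runs and approximate wins.

RUNS_PER_WIN = 10  # common sabermetric rule of thumb: ~10 runs ~= 1 win

# (player, year, rfield, uzr) -- hypothetical values for illustration only;
# Nomar's pair is chosen so the gap matches the 26 runs cited in the text.
player_seasons = [
    ("Nomar Garciaparra", 2002, 12.0, -14.0),
    ("Player B", 2007, -5.0, 3.0),
    ("Player C", 2010, 8.0, 6.5),
]

def delta(rfield: float, uzr: float) -> float:
    """Absolute disagreement, in runs, between the two metrics."""
    return abs(rfield - uzr)

# Sort player-seasons from the biggest disagreement to the smallest.
ranked = sorted(player_seasons, key=lambda p: delta(p[2], p[3]), reverse=True)

for name, year, rfield, uzr in ranked:
    gap = delta(rfield, uzr)
    print(f"{year} {name}: {gap:.1f} runs (~{gap / RUNS_PER_WIN:.1f} wins)")
```

With the illustrative Nomar pair, the gap is 26.0 runs, or about 2.6 wins, which is the scale of disagreement the article is describing.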
Also of note:
- Interestingly, seven of the top 20 greatest discrepancies occurred in the 2002 season, the first season for which the data needed to create both metrics is available (as far as I know, at least).
- Not a single player-season made the top 20 from 2012.
- Orlando Hudson makes the list in two consecutive seasons, once in 2006 and again in 2007, with UZR valuing him significantly below Baseball-Reference in both instances.
- Alfonso Soriano had the greatest discrepancy in 2012 (16.8 runs).
- Darin Erstad's 2002 total of 38.7 defensive runs above average ranks as the best defensive season ever at Baseball-Reference.
When we add Baseball Prospectus to the mix, some of the arguments get a bit rowdier:
| 5 | Ken Griffey Jr. | 2007 | -29.1 | -14 | 2.1 | 31.2 |
What leads these metrics to produce such dramatically different estimates of value for the same player in the same season? That's not something I can answer entirely. I imagine you would have to have in-depth knowledge of what happens under the hood of both zone-based metrics to make sense of the discrepancies, and we know FRAA is based on play-by-play data only.
I imagine some percentage of the readership will see these differences and interpret them as evidence that defensive metrics still have a ways to go before they can be deemed reliable. I certainly won't argue with that interpretation. But I do believe this should also reinforce exactly why we should not limit ourselves to a single universal WAR metric, and why we should continue to allow ourselves more than one way of evaluating player performance.
. . .