Be Wary of WAR: A Cautionary Tale

June 14, 2012; Chicago, IL, USA; Chicago Cubs second baseman Darwin Barney (15) forces out Detroit Tigers left fielder Delmon Young (21) at second base in the fourth inning at Wrigley Field.  Mandatory Credit: David Banks-US PRESSWIRE

I love WAR. That statement would get me in trouble in most circles, but among the sabermetrically inclined it's a completely normal opinion. WAR will always hold a special place in my heart; I originally became interested in sabermetrics because of WAR (as well as FIP). I love the all-encompassing value statistic more than Jeremy Hellickson loves throwing changeups. WAR has so many uses: comparing players from different eras, debating MVP candidates, evaluating the worth of trades and free-agent contracts, and so much more. Wins Above Replacement is a great statistic, but it is not perfect.

Consider this example: a sabermetrically inclined fan wants to research how well Howie Kendrick did last season. He begins his search at Fangraphs.com and sees that the Angels' second baseman was worth almost six wins above replacement (5.8) and was the Angels' best position player in 2011. That fan would conclude that Kendrick was an All-Star-caliber player and one of the top five second basemen in the game. If the same fan had begun his search at Baseball Prospectus instead of Fangraphs, however, he would have come to a very different conclusion.

Kendrick was worth less than three wins above replacement (2.7) by BP's calculation of WAR, fourth-best among the Angels' position players in 2011. This fan would then conclude that Kendrick was just slightly above average last season and was not even a top-10 second baseman. One player, two great sabermetric websites, two calculations of WAR, and two very different conclusions. So where does this difference in WAR come from?

The devil is in the defense.

The three most widely used calculations of WAR come from Fangraphs, Baseball Prospectus, and Baseball-Reference, labeled fWAR, WARP, and rWAR, respectively. Click this link for a complete breakdown of the different WAR calculations. The three WARs have slightly different calculations for the batting components, but over the course of a season they usually come to very similar numbers for how many wins a position player contributes with his bat.

The major difference between the systems comes from their fielding components. The three systems use three different defensive metrics: Fangraphs uses UZR, Baseball Prospectus uses FRAA, and Baseball-Reference uses DRS. FRAA has less in common with the other two metrics than they have with each other, but DRS and UZR are by no means the same calculation.

Here are Kendrick's three 2011 WAR totals and the defensive numbers behind them:

| Site | WAR | Defense |
| --- | --- | --- |
| Fangraphs | 5.8 | 16.7 |
| Baseball-Reference | 4.2 | 6 |
| Baseball Prospectus | 2.7 | -2.3 |

The three systems rated Kendrick's bat as being worth between 3 and 4 WAR but came to three very different WAR totals because of his defense. Kendrick's glove, and therefore his season, probably was not as good as Fangraphs would lead one to believe, but probably not as "bad" as Baseball Prospectus' numbers make it look.
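As a rough check on how much of that spread defense explains, the fielding numbers in the table can be converted to wins. The sketch below assumes roughly 10 runs per win, the same conversion this post implies when it equates two runs saved with 0.2 WAR; the constant and the variable names are illustrative and are not taken from any site's actual WAR formula.

```python
# Rough sketch: how much of Kendrick's 2011 WAR gap is explained by defense?
# Assumption: about 10 runs equal one win (a rule of thumb, not an exact
# constant used by any of the three sites).
RUNS_PER_WIN = 10.0

# (WAR, defensive runs) for Howie Kendrick in 2011, from the table above.
kendrick_2011 = {
    "Fangraphs": (5.8, 16.7),
    "Baseball-Reference": (4.2, 6.0),
    "Baseball Prospectus": (2.7, -2.3),
}

wars = [war for war, _ in kendrick_2011.values()]
defense_runs = [runs for _, runs in kendrick_2011.values()]

# Spread between the highest and lowest system, in wins.
war_gap = max(wars) - min(wars)
defense_gap_wins = (max(defense_runs) - min(defense_runs)) / RUNS_PER_WIN

print(f"WAR gap between systems: {war_gap:.1f} wins")
print(f"Defensive gap:           {defense_gap_wins:.1f} wins")
print(f"Share from defense:      {defense_gap_wins / war_gap:.0%}")
```

Under that assumption, defense accounts for about 1.9 of the 3.1 wins separating the highest and lowest totals. The remainder comes from the other components, which this sketch does not break down.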

Defensive metrics aren't very reliable even over a sample as large as a single season. So I began thinking that Kendrick-esque variations in defense could affect the way those of us who care about WAR read the WAR leaderboards for the current 2012 season.

Here's the current top-10 in position player WAR for each of the three calculations:

| Rank | fWAR | WARP | rWAR |
| --- | --- | --- | --- |
| 1 | Joey Votto (4.2) | David Wright (4.0) | Joey Votto (4.0) |
| 2 | David Wright (3.9) | Joey Votto (3.6) | Josh Hamilton (3.5) |
| 3 | Michael Bourn (3.6) | Josh Hamilton (3.2) | Michael Bourn (3.5) |
| 4 | Adam Jones (3.5) | Adam Jones (3.0) | Brett Lawrie (3.4) |
| 5 | Josh Hamilton (3.5) | Melky Cabrera (2.9) | David Wright (3.4) |
| 6 | Carlos Ruiz (3.2) | Ryan Braun (2.7) | Melky Cabrera (3.0) |
| 7 | Ryan Braun (3.1) | Jose Altuve (2.7) | Darwin Barney (3.0) |
| 8 | Mike Trout (3.0) | Carlos Ruiz (2.6) | Carlos Ruiz (3.0) |
| 9 | Martin Prado (2.9) | Mike Trout (2.5) | Adam Jones (2.9) |
| 10 | A.J. Ellis (2.9) | Josh Willingham (2.5) | Mike Trout (2.8) |

Players who appear in the top-10 of all three systems have clearly been awesome. I am interested in the two players in each top-10 who don't appear in either of the other leaderboards.
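For readers who want to reproduce that comparison, here is a minimal sketch that picks out the players appearing on exactly one of the three leaderboards above. The `top10` dictionary and the `unique_to` helper are names made up for this illustration; the player lists are copied from the leaderboards in this post.

```python
# Sketch: find players who appear in only one system's top 10.
top10 = {
    "fWAR": ["Joey Votto", "David Wright", "Michael Bourn", "Adam Jones",
             "Josh Hamilton", "Carlos Ruiz", "Ryan Braun", "Mike Trout",
             "Martin Prado", "A.J. Ellis"],
    "WARP": ["David Wright", "Joey Votto", "Josh Hamilton", "Adam Jones",
             "Melky Cabrera", "Ryan Braun", "Jose Altuve", "Carlos Ruiz",
             "Mike Trout", "Josh Willingham"],
    "rWAR": ["Joey Votto", "Josh Hamilton", "Michael Bourn", "Brett Lawrie",
             "David Wright", "Melky Cabrera", "Darwin Barney", "Carlos Ruiz",
             "Adam Jones", "Mike Trout"],
}


def unique_to(system, boards):
    """Return players on `system`'s board who appear on no other board."""
    others = set()
    for name, players in boards.items():
        if name != system:
            others.update(players)
    # Keep leaderboard order so higher-ranked players come first.
    return [p for p in boards[system] if p not in others]


for system in top10:
    print(system, unique_to(system, top10))
```

Each system ends up with exactly two such players, and those six are the ones examined in the sections that follow.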

Is an inflated defensive metric the reason these players have one very high WAR, or is the difference in WAR within the margin of error?

Ellis and Prado (FG):

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| A.J. Ellis | 2.2 (2) | 2.9 (2) | 2.0 (0) |
| Martin Prado | 2.5 (5) | 2.9 (7) | 1.8 (-0.9) |

Ellis is a catcher, so defensive metrics don't affect his WAR by a large amount. UZR and DRS credit him with two runs saved (0.2 WAR), while FRAA does not. His high fWAR is most likely due to different calculations in batting and base running, not defense.

The difference in Prado's WAR is due to defensive calculations. FRAA does not like Prado's glove, while DRS and UZR like his defense, as does Sean Smith's Total Zone rating (TZR), which has him at 3. Prado's defense has probably been better than FRAA rates it, but maybe not as good as UZR would lead one to believe. His value has probably been closer to his rWAR than to his totals in the other two systems.

Willingham and Altuve (BP):

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| Josh Willingham | 1.8 (-6) | 2.3 (-3.9) | 2.5 (3.6) |
| Jose Altuve | 1.9 (-5) | 2.1 (-2) | 2.7 (5.8) |

Willingham is rated negatively by DRS (-6) and UZR (-3.9), but FRAA rates him as a positive defender (3.6), and in turn his WARP is high. I'm not ready to say his WARP is flawed because of his FRAA, though. Willingham's fWAR is still 2.3 despite the negative UZR, and TZR (11) likes Willingham's glove a lot.

Altuve's WARP is a different story. FRAA is the only defensive metric that seems to like Altuve's glove thus far. The other three metrics rate his defense as below average, while FRAA has him as well above average. The Astros' second baseman has had a great start to 2012, but it may not have been as great as his WARP currently reports.

Barney and Lawrie (BR):

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| Darwin Barney | 3.0 (16) | 1.2 (1) | 1.4 (5.4) |
| Brett Lawrie | 3.4 (25) | 1.7 (7.2) | 2.0 (10) |

The fact that the well-respected Baseball-Reference rates Darwin Barney as the seventh most valuable position player shocks me. Barney's current slash line of .279/.329/.409 won't blow anyone away, and neither will his OPS+ (96), wRC+ (99) or TAv (.255). Fangraphs and Baseball Prospectus both like his glove; he even rates among the top 25 defenders in the game based on FRAA. But his rWAR is incredibly inflated, it seems, by the 16 runs DRS credits to his glove. His DRS has led to an rWAR that is more than one and a half wins higher than his total in either of the other two systems.

Lawrie is also a beneficiary of an extraordinarily high DRS. Toronto's third baseman has one of the best gloves (if not the best) in the game. His FRAA (10.0) is the highest in the game, and UZR also rates him as an above-average defender (7.2); however, the 25 runs DRS credits to his glove are out of this world. But there may be a reason for it.

In a brilliant post at BP, Colin Wyers showed the flaw in the number of runs credited to Lawrie under the DRS system. For those who haven't read Wyers' work on Lawrie, his main conclusion is that each time Lawrie makes a play between first and second as part of a shift, he gets credited with a run saved. The average third baseman doesn't make plays at second base.

I also looked into some other players who have defensive metrics that may be inflating one of their WARs:

DRS:

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| Yunel Escobar | 2.1 (17) | 0.9 (4) | 1.4 (5.4) |
| Sean Rodriguez | 1.8 (14) | 0.2 (-0.1) | 0.1 (4.1) |
| Brendan Ryan | 2.0 (18) | 0.9 (10.4) | 0.7 (6.6) |

Lawrie's companion on the left side of Toronto's infield, Yunel Escobar, could also be a beneficiary of the shift, although a fair majority of Toronto's shifts leave Escobar near his normal position. As with Lawrie, FRAA and UZR both like Escobar's glove, but his DRS is off the charts and severely inflates his rWAR.

The Tampa Bay Rays shift as much as any team in baseball, and their shifting tendencies seem to have inflated the DRS and rWAR of infielder Sean Rodriguez in a similar fashion. Rodriguez has been a replacement-level player in the eyes of FG and BP, and his batting line of .217/.268/.339 has replacement level written all over it. His DRS makes him seem like a valuable player, but I think it's fairly safe to conclude his rWAR is not a true reflection of his 2012 value.

The next five players' high defensive metrics are not inflated by shifting around the field. They are just five players who are well above league average in only one system of measuring defense and thus may have deceptive WARs.

Seattle's shortstop, Brendan Ryan, has been more valuable than his .157 batting average would suggest, as all three systems love his game at short. But he has not been the 2-win player through less than half the season that his crazy-high DRS suggests.
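One way to put a number on "inflated" is to compare each player's DRS with the average of his UZR and FRAA, convert that surplus to wins, and set it beside the gap between his rWAR and the average of his other two WARs. The sketch below does that for the three players in the DRS table above. It assumes roughly 10 runs per win, and the `surplus` helper is a name invented for this illustration rather than a method used by any of the sites.

```python
# Sketch: does the DRS surplus roughly match the rWAR surplus?
# Assumption: about 10 runs equal one win.
RUNS_PER_WIN = 10.0

# player: ((rWAR, DRS), (fWAR, UZR), (WARP, FRAA)), from the DRS table above.
players = {
    "Yunel Escobar": ((2.1, 17), (0.9, 4), (1.4, 5.4)),
    "Sean Rodriguez": ((1.8, 14), (0.2, -0.1), (0.1, 4.1)),
    "Brendan Ryan": ((2.0, 18), (0.9, 10.4), (0.7, 6.6)),
}


def surplus(lines):
    """Return (DRS surplus in wins, rWAR surplus) versus the other two systems."""
    (rwar, drs), (fwar, uzr), (warp, fraa) = lines
    defense_surplus_wins = (drs - (uzr + fraa) / 2) / RUNS_PER_WIN
    war_surplus = rwar - (fwar + warp) / 2
    return defense_surplus_wins, war_surplus


for name, lines in players.items():
    defense_wins, war_wins = surplus(lines)
    print(f"{name}: DRS surplus {defense_wins:.2f} wins, "
          f"rWAR surplus {war_wins:.2f} wins")
```

For all three players the DRS surplus works out to roughly one win, the same order of magnitude as the rWAR surplus, which is consistent with the idea that the fielding number is driving most of the gap.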

FRAA:

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| Alex Rios | 0.8 (2) | 1 (0.3) | 1.8 (6.8) |
| Howie Kendrick | 0.5 (2) | 0.6 (2.4) | 1.5 (6.6) |

Alex Rios and Howie Kendrick rank behind only Lawrie in terms of FRAA. Rios was a great defender with Toronto, but since joining the White Sox his numbers have dipped, probably due to age. He has also been around average in the outfield based on DRS and UZR. These facts make me nervous about his current WARP and FRAA.

When Kendrick fell perfectly into this category, this post came full circle for me. Kendrick's 2011 numbers were the inspiration behind this post, as UZR loved his glove but FRAA was not a fan, causing a gap between his fWAR and WARP. So far in 2012 that gap still exists, but ironically the metrics have reversed.

I say that this change is ironic, but maybe it was far too predictable, based on the volatility of defensive metrics from season to season.
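The reversal is easy to see by subtracting FRAA from UZR in each season. This is only an illustrative sketch using the Kendrick figures quoted in this post; the names are made up for the example, and the two seasons are not directly comparable in size because 2011 is a full season and 2012 is less than half of one.

```python
# Kendrick's UZR and FRAA by season, from the tables in this post.
# Note: 2011 is a full season; 2012 covers less than half a season.
kendrick_defense = {
    2011: {"UZR": 16.7, "FRAA": -2.3},
    2012: {"UZR": 2.4, "FRAA": 6.6},
}


def uzr_minus_fraa(season):
    """Positive means UZR is the more generous metric that season."""
    metrics = kendrick_defense[season]
    return round(metrics["UZR"] - metrics["FRAA"], 1)


gap_2011 = uzr_minus_fraa(2011)
gap_2012 = uzr_minus_fraa(2012)

# The metrics have "reversed" if the sign of the gap flips between seasons.
reversed_direction = (gap_2011 > 0) != (gap_2012 > 0)

print(f"2011 UZR - FRAA: {gap_2011:+.1f} runs")
print(f"2012 UZR - FRAA: {gap_2012:+.1f} runs")
print("Direction reversed:", reversed_direction)
```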

UZR:

| Player | rWAR (DRS) | fWAR (UZR) | WARP (FRAA) |
| --- | --- | --- | --- |
| Alfonso Soriano | 0.8 (1) | 1.8 (6.8) | 0.9 (2.8) |
| Mike Moustakas | 1.8 (5) | 2.3 (8.5) | 1.5 (3.9) |

The Royals' third baseman, Moustakas, rates well on all three scales. But his UZR is just high enough to make me nervous about whether or not he's been a 2+ win player, thus far.

Alfonso Soriano is a very interesting case. He was an awful defensive second baseman, but after moving to the outfield he became one of baseball's premier defenders. According to UZR, he saved over 13 runs more than any other player from 2007-08. FRAA liked his defense in both of those seasons but had not liked it again until this season. A positive score on all three metrics is surprising for a 36-year-old who seemed to be slowing down at an alarming rate. Soriano could have changed something in his game, or he may be more evidence of just how confusing and unpredictable sabermetric evaluations of defense are.

One Final Opinion:

This statement has been beaten to a pulp by everyone involved in sabermetrics, but I think it bears repeating: defensive measures will not be perfectly accurate until Field F/X data is released to the public. Critics of sabermetrics always begin and end their arguments with the imperfection of defensive stats. However, to say that WAR is a flawed (or dare I say useless) statistic because of the imperfection of its defensive component is wrong and a complete and utter cop-out.

Defense has to be a component of WAR. WAR attempts to put a single number to everything that occurs on the baseball field. Defense is a large component of the game; thus, it needs to be included in WAR. The day of Field F/X has yet to come (and it may never); thus, sabermetricians cannot sit around and wait for that day. They must keep trekking on by evaluating defense to the best of their ability with the information that is available.

I think for the most part, followers of sabermetrics have a defensive metric of preference. I've met some people who swear by FRAA, and others who are completely on the other side of the fence. I think this may be the wrong decision though. Having a favorite WAR or a preferred defensive stat may be an inherent flaw.

I made this mistake a couple of weeks ago, stating in a post that "I prefer Fangraphs' calculation of WAR over BP's version of the statistic." Picking one way to evaluate defense, or one Wins Above Replacement statistic, as the end-all, be-all is a mistake.

How can a person who swears by rWAR argue that Lawrie and Barney have been top-10 position players in 2012? Can a person of UZR preference really argue that Soriano saved almost 50 runs above league average in left field over the course of just two seasons, at age 30?

I still think that WAR is the best (baseball, of course) statistic on the planet. I also think that the number-crunchers behind these defensive stats are doing everything in their power to explain what is truly happening on the field. But I also think that any large variation between the WAR systems, especially this early in the season, should keep a researcher (or fan) from jumping to any rash conclusions.

I just urge those who love baseball, love sabermetrics, and love WAR to keep your eyes open, and be wary, because our understanding of the game is still nowhere close to perfect.

All statistics come courtesy of Fangraphs, Baseball-Reference and Baseball Prospectus and are as of Friday, June 15th.

You can follow Glenn on twitter @Baseballs_Econ