Twice while working on last week's articles about double plays, I came across large differences in career WAR for some players, depending on whether you used Fangraphs' WAR (fWAR) or Rally's WAR (rWAR, which is used at Baseball-Reference.com). In the first example, Jim Rice had 41.5 rWAR but 56.1 fWAR. That's a pretty big difference. But in the second example, I found that Brooks Robinson scored an impressive 69.1 rWAR but also boasted a staggering 94.6 fWAR.
This baffled me. I'm well aware of the differences in the pitching inputs used for each WAR metric. And both also use different defensive metrics for hitters. But for older players, they both use the same defensive metric (Rally's Total Zone, since UZR was not yet available). I had been under the impression that both systems calculate offensive production similarly. Apparently I was wrong. Very wrong.
One early observation I made was that Fangraphs WAR tended to be higher. Today's graph shows how many more position players have 30+ career WAR in the Fangraphs system than in Rally's.
Yikes. This has me wondering if fWAR is on a completely different scale than rWAR. Does Fangraphs think offense is simply worth more runs than Rally does? Is Fangraphs merely top-heavy and this all evens out in the end? I don't have answers to these questions (yet), but I did take a look at which players were most affected by this difference.
I pulled the Top 200 position players by career rWAR and compared that with their fWAR. First, I'll start with the shorter list, those who actually have a lower fWAR than rWAR:
Cap Anson takes a serious hit. Otherwise, it calms down pretty quickly. What strikes me is that Anson, McPhee, Thompson, Davis, Ewing, Brouthers, and Kelly all played most or all of their careers in the 1800s. I'm also seeing some active players (Damon, Suzuki, Pujols) and the somewhat recently active Will Clark. I'm not sure if there's anything to that.
Here's the meaty list, where fWAR is greater than rWAR. Robinson's 25.5 win (not run!) difference takes the cake:
Some of these differences are just enormous. This is only the list of players with a ten-win difference. A total of 110 of the top 200 players by rWAR have a difference of five wins or more.
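For anyone who wants to run the same comparison, here is a minimal sketch of the bookkeeping. It isn't the exact process I used. The `PlayerWar` type and the `compare` function are hypothetical names made up for illustration, and the only data in it are the two players whose numbers are quoted above. Swap in your own list of the Top 200.

```python
from typing import NamedTuple


class PlayerWar(NamedTuple):
    """Career WAR for one player under both systems (hypothetical helper type)."""
    name: str
    rwar: float
    fwar: float

    @property
    def gap(self) -> float:
        # Positive gap means fWAR is higher than rWAR.
        # Rounded to one decimal to match how WAR is usually reported.
        return round(self.fwar - self.rwar, 1)


def compare(players, threshold=5.0):
    """Split players by the sign of the gap and flag the large differences."""
    # Players whose fWAR is lower than their rWAR, biggest drop first.
    lower = sorted((p for p in players if p.gap < 0), key=lambda p: p.gap)
    # Players whose fWAR is higher than their rWAR, biggest gain first.
    higher = sorted((p for p in players if p.gap > 0),
                    key=lambda p: p.gap, reverse=True)
    # Players whose two numbers differ by at least `threshold` wins either way.
    big = [p for p in players if abs(p.gap) >= threshold]
    return lower, higher, big


# Only the two players whose figures are quoted in this post.
players = [
    PlayerWar("Jim Rice", 41.5, 56.1),
    PlayerWar("Brooks Robinson", 69.1, 94.6),
]

lower, higher, big = compare(players)
for p in higher:
    print(f"{p.name}: {p.gap:+.1f} wins")
```

With the full Top 200 loaded, `len(big)` at the default five-win threshold is the count quoted above, and calling `compare(players, threshold=10.0)` picks out the ten-win group.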
Often, we use these metrics interchangeably, and I've even read quite a few articles where they are averaged together. Should we be doing this? Are they on different scales? Are my personal rWAR classifications (70+ sure HOFer, 50-70 rWAR are interesting cases, less than 50 you need a damn good reason to be a HOFer) applicable to fWAR?
How about the fact that we tend to use rWAR when discussing Hall of Fame cases but fWAR when talking about the current season? If these produce very different results over time, at what point does that become a problem?
I don't have the answers to these questions yet, but my interest has been piqued.
And, gosh—we haven't even looked at pitchers yet…