Are fWAR and rWAR on Different Scales?
Twice while working on last week's articles about double plays, I came across large differences in career WAR for some players, depending on whether you use Fangraphs' WAR (fWAR) or Rally's WAR (rWAR, which is used at Baseball-Reference.com). In the first example, Jim Rice had 41.5 rWAR but 56.1 fWAR. That's a pretty big difference. But in the second example, I found that Brooks Robinson scored an impressive 69.1 rWAR but also boasted a staggering 94.6 fWAR.
This baffled me. I'm well aware of the differences in the pitching inputs used for each WAR metric. And the two also use different defensive metrics for hitters. But for older players, they both use the same defensive metric (Rally's Total Zone, since UZR was not yet available). I had been under the impression that both systems calculate offensive production similarly. Apparently I was wrong. Very wrong.
One early observation I made was that Fangraphs WAR tended to be higher. Today's graph shows how many more position players have 30+ career WAR in the Fangraphs system than in Rally's.
Yikes. This has me wondering if fWAR is on a completely different scale than rWAR. Does Fangraphs think offense is simply worth more runs than Rally does? Is Fangraphs merely top-heavy and this all evens out in the end? I don't have answers to these questions (yet), but I did take a look at which players were most affected by this difference.
I pulled the top 200 position players by career rWAR and compared that with their fWAR. First, I'll start with the shorter list: those who actually have a lower fWAR than rWAR.

| Player | rWAR | fWAR | Diff |
|---|---|---|---|
| Cap Anson | 99.5 | 88.7 | -10.8 |
| Bid McPhee | 57.9 | 51.4 | -6.5 |
| Johnny Damon | 48.3 | 41.8 | -6.5 |
| Sam Thompson | 46.7 | 41.2 | -5.5 |
| Ichiro Suzuki | 55.2 | 50.7 | -4.5 |
| George Davis | 90.7 | 86.4 | -4.3 |
| Buck Ewing | 51.8 | 47.5 | -4.3 |
| Dan Brouthers | 83.7 | 80.1 | -3.6 |
| King Kelly | 48.5 | 45.0 | -3.5 |
| Albert Pujols | 83.8 | 80.6 | -3.2 |
| Will Clark | 57.6 | 54.4 | -3.2 |
Cap Anson takes a serious hit. Otherwise, it calms down pretty quick. What strikes me is that Anson, McPhee, Thompson, Davis, Ewing, Brouthers, and Kelly all played most or all of their careers in the 1800s. I'm also seeing some active players (Damon, Suzuki, Pujols) and the somewhat recently active Will Clark. I'm not sure if there's anything to that.
Here's the meaty list, where fWAR is greater than rWAR. Robinson's 25.5 win (not run!) difference takes the cake:
| Player | rWAR | fWAR | Diff |
|---|---|---|---|
| Brooks Robinson | 69.1 | 94.6 | 25.5 |
| Carl Yastrzemski | 88.7 | 108.7 | 20.0 |
| Jimmie Foxx | 94.1 | 112.3 | 18.2 |
| Harmon Killebrew | 61.1 | 78.4 | 17.3 |
| Tony Perez | 50.5 | 67.8 | 17.3 |
| Pete Rose | 75.3 | 91.4 | 16.1 |
| Max Carey | 50.6 | 66.6 | 16.0 |
| Luke Appling | 69.3 | 84.7 | 15.4 |
| Honus Wagner | 134.5 | 149.8 | 15.3 |
| Joe Torre | 55.6 | 70.8 | 15.2 |
| Sherry Magee | 59.1 | 74.1 | 15.0 |
| Al Simmons | 63.6 | 78.5 | 14.9 |
| Ted Williams | 125.3 | 139.8 | 14.5 |
| Lou Boudreau | 56.0 | 69.8 | 13.8 |
| Luis Aparicio | 49.9 | 63.6 | 13.7 |
| Willie Stargell | 57.5 | 70.9 | 13.4 |
| Bob Johnson | 53.2 | 66.4 | 13.2 |
| Jimmy Sheckard | 51.8 | 65.0 | 13.2 |
| Bobby Doerr | 47.7 | 60.9 | 13.2 |
| Joe Tinker | 49.2 | 62.2 | 13.0 |
| Joe Cronin | 62.5 | 75.4 | 12.9 |
| Ron Santo | 66.4 | 79.3 | 12.9 |
| Billy Williams | 57.2 | 69.7 | 12.5 |
| George Sisler | 50.4 | 62.8 | 12.4 |
| Joe Gordon | 54.9 | 67.2 | 12.3 |
| Zack Wheat | 57.8 | 70.0 | 12.2 |
| Eddie Murray | 66.7 | 78.8 | 12.1 |
| Norm Cash | 52.9 | 64.8 | 11.9 |
| Stan Musial | 127.8 | 139.3 | 11.5 |
| Orlando Cepeda | 46.8 | 58.3 | 11.5 |
| Hank Greenberg | 56.8 | 68.2 | 11.4 |
| Al Kaline | 91.0 | 101.9 | 10.9 |
| Fred McGriff | 50.5 | 61.3 | 10.8 |
| Ted Simmons | 50.4 | 61.1 | 10.7 |
| Willie McCovey | 65.1 | 75.7 | 10.6 |
| Andruw Jones | 59.9 | 70.5 | 10.6 |
| Darrell Evans | 57.3 | 67.8 | 10.5 |
| Johnny Bench | 71.3 | 81.5 | 10.2 |
| Graig Nettles | 61.6 | 71.8 | 10.2 |
Some of these differences are just enormous. This is only the list of players with a ten-win difference. A total of 110 of the top 200 players by rWAR have a difference of five wins or more.
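For anyone who wants to redo the comparison, here is a minimal sketch of the filtering step. It uses only a handful of rows copied from the tables above. The function name `war_gaps` and the threshold parameter are my own illustrative choices, not anything either site provides.

```python
# A few (player, rWAR, fWAR) rows copied from the tables above.
rows = [
    ("Cap Anson", 99.5, 88.7),
    ("Albert Pujols", 83.8, 80.6),
    ("Brooks Robinson", 69.1, 94.6),
    ("Carl Yastrzemski", 88.7, 108.7),
    ("Graig Nettles", 61.6, 71.8),
]


def war_gaps(rows, threshold=5.0):
    """Return (player, fWAR - rWAR) pairs whose absolute gap meets the threshold.

    Hypothetical helper: gaps are rounded to one decimal, matching the tables.
    """
    gaps = [(name, round(fwar - rwar, 1)) for name, rwar, fwar in rows]
    flagged = [(name, gap) for name, gap in gaps if abs(gap) >= threshold]
    # Largest positive gap (fWAR higher) first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


for name, gap in war_gaps(rows):
    print(f"{name:18s} {gap:+.1f}")
```

With the full top-200 list in place of this sample, the same call with `threshold=5.0` would reproduce the count of players with a five-win gap.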
Often, we use these metrics interchangeably, and I've even read quite a few articles where they are averaged together. Should we be doing this? Are they on different scales? Are my personal rWAR classifications (70+ sure HOFer, 50-70 rWAR are interesting cases, less than 50 you need a damn good reason to be a HOFer) applicable to fWAR?
How about the fact that we tend to use rWAR when discussing Hall of Fame cases but fWAR when talking about the current season? If these produce very different results over time, at what point does that become a problem?
I don't have the answers to these questions yet, but my interest has been piqued.
And, gosh—we haven't even looked at pitchers yet…
Comments
I have to admit I didn't
know how big the issue was. I knew that Robinson WAR from your last piece looked kinda funny, but … wow.
The baseball season doesn't have to end! Create your own players, coach your own teams, and join your friends in THE premier baseball MMO. Two Out Rally opens October 25th!
Two Out Rally, BASEBALL MMORPG | Facebook | @2OutRally
by Justin Bopp on Nov 29, 2010 10:35 AM EST via mobile
Baselines
fWAR has around 34 or 35 WAR per team, while rWAR has 29 or 30 WAR per team. Something like that. fWAR has a lower baseline. The above numbers imply a .280 baseline for fWAR and .320 for rWAR. Something like that. BaseballProspectus uses .250 I think, and Bill James is even lower.
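The arithmetic behind those baselines can be sketched like this. It assumes a 162-game schedule, that an average team wins half its games, and that "WAR per team" means total WAR (hitters plus pitchers) on an average team. The function names are made up for illustration.

```python
GAMES = 162                 # assumed regular-season schedule
AVERAGE_WINS = GAMES / 2    # an average team wins half its games


def implied_baseline(war_per_team):
    """Winning percentage of a team that accumulates zero WAR."""
    return (AVERAGE_WINS - war_per_team) / GAMES


def war_per_team(baseline):
    """Total WAR an average team carries for a given replacement baseline."""
    return AVERAGE_WINS - baseline * GAMES


# Midpoints of the "34 or 35" and "29 or 30" ranges quoted above.
print(f"fWAR baseline: {implied_baseline(34.5):.3f}")
print(f"rWAR baseline: {implied_baseline(29.5):.3f}")
```

The midpoints come out close to the .280 and .320 figures in the comment, which is about as precise as "something like that" suggests.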
All very good to know, thanks.
Definitely something I’m going to have to watch out for moving forward.
On Twitter: @baseballtwit
Any chance they could come to a consensus on that?
by The Ancient Mariner on Nov 29, 2010 1:26 PM EST
*ahem*
by Justin Bopp on Nov 29, 2010 1:51 PM EST via mobile
?
Either I’m missing something here or I’m not expressing myself clearly (either of which is completely possible — it’s been an extremely trying day).
by The Ancient Mariner on Nov 29, 2010 5:30 PM EST
I'm guessing Justin
is indirectly referencing previous discussions in which we decided that different versions of WAR and replacement level are actually a good thing because they represent critical thinking and objective analysis rather than arbitrary consensus.
Or he could be agreeing with you.
by vivaelpujols on Nov 29, 2010 6:24 PM EST
"Arbitrary consensus" is definitely not preferred.
However, stats with the same label should always mean the same thing.
The different versions do a disservice to the community and only serve to further alienate our audience. I’m not saying there should be some consensus, I’m saying each should come up with their own moniker.
by Justin Bopp on Nov 29, 2010 9:54 PM EST
Well they are both the same framework
What’s wrong with fWAR and rWAR?
Unless you wanna give them cool names or something.
by vivaelpujols on Nov 30, 2010 12:36 AM EST
Seems like a good time to plug a post from Patriot.
He looks at the Brooks Robinson difference which is basically all offensive.
My Michigan State (and Big Ten) Baseball Blog.
Like music? See what I'm listening to at my Last.fm account.
Break it down by component
For Brooks – Fielding and position adjustments are equal.
Batting +20 BR, +133 FG
It looks to me like FG is not removing pitcher hitting from the league totals. But they are also using a different approach: they start with wOBA, where I calculate custom BaseRuns coefficients for each team.
Rep +341 BR, +393 FG
I don’t think Fangraphs is adjusting for league quality, where I have the AL as the inferior league for much of Brooks’s career.
Baserunning – not counted on Fangraphs, but Brooks came out roughly average anyway, so no big deal.
GIDP – also not counted on Fangraphs, but I have him at -35 runs.
Add it up and you’ve got about 20 wins.
"That boy is our last hope" - Obi Wan Scioscia, as Francisco Rodriguez left for the Mets. "No, there is another" - Yoda Reagins.
by RallyMonkey5 on Nov 29, 2010 3:24 PM EST
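The component breakdown above can be added up in a few lines. This is only a sketch: the run values are the ones quoted in the comment, GIDP is entered as 0 for Fangraphs because the comment says it is not counted there, and the 10 runs per win conversion is a rule-of-thumb assumption rather than a number taken from either site.

```python
RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion

# Brooks Robinson career run values quoted above: (Baseball-Reference, Fangraphs).
# Fielding and position are equal in both systems, and baserunning came out
# roughly average, so those components are left out.
components = {
    "batting": (20, 133),
    "replacement": (341, 393),
    "gidp": (-35, 0),  # not counted on Fangraphs, so 0 there
}


def run_gap(components):
    """Fangraphs minus Baseball-Reference, in runs, for each component."""
    return {name: fg - br for name, (br, fg) in components.items()}


gaps = run_gap(components)
total_runs = sum(gaps.values())
print(gaps)
print(f"{total_runs} runs, about {total_runs / RUNS_PER_WIN:.0f} wins")
```

Batting accounts for 113 of the 200 runs, replacement level for 52, and GIDP for 35. That covers roughly 20 of the 25.5-win gap in the table; the comment does not itemize the rest.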
I'm not good at the SABR
But I definitely would love to see more of an explanation on the ridiculous batting runs differential, wherein lies almost the entire difference. RM5’s couple of sentences on that don’t really do it for me.
"I’d like to be a crossword clue one day. I want to be in The New York Times’s Sunday edition. Right now, the clue ‘Giants great’ is always Mel Ott. I want my clue to be down, not across. The down ones are usually harder. And when I’m the clue, I’ll fill it in — just that one — and frame it." - Brian Wilson.
FanGraphs uses wOBA
Which is league wide linear rates (think of it as average runs per plate appearance).
Baseball Reference uses team BaseRuns, which derives its own weights from the team’s run environment.
In a nutshell, FanGraphs offensive rating is the same for all players, whereas Baseball Reference’s varies based on the quality of the player’s teams’ offense.
by vivaelpujols on Nov 29, 2010 7:51 PM EST
Different replacement levels
Different ways of measuring offense (fWAR uses generic linear weights, rWAR uses linear rates generated from team run environment), different positional adjustments, and Rally includes stuff like baserunning, which FanGraphs doesn’t.
Yeah
I would liken this to comparing fWAR to WAB or WARP. Different replacement levels give different results (and slight variation on inputs). The confusion stems from the use of the same construct, I think.
Come check out Bullpen Banter!
Follow Bullpen Banter on Twitter
Follow me on Twitter
Remember: baseball guys... baseball...
So, in the cases that I've seen people "take the average" of rWAR and fWAR ...
… is that then dangerous to do?
On Twitter: @baseballtwit
I don't think so
I don’t think they are on different scales – at least not in the traditional sense. It’s just that they have different assumptions behind the components – because they differ on replacement level, rWAR will almost always be lower, but that doesn’t mean it’s scaled differently.
Am I making sense?
by vivaelpujols on Nov 29, 2010 8:55 PM EST
In other words...
Fangraphs cuts you off at the ankles while BRef cuts you off at the knees, but neither puts you on a rack to stretch or squoosh you.
See, that makes me think averaging them probably isn't best...
Since it sounds like 6.0 rWAR is more impressive than 6.0 fWAR. And from a career perspective, it seems 60 rWAR is a hell of a lot more impressive than 60 fWAR. I mean, 39 more players have reached 60 fWAR than 60 rWAR.
On Twitter: @baseballtwit
Well if you average it out for all players it should be fine
I don’t see the problem as one metric is always going to be systematically “biased” up or down.
by vivaelpujols on Nov 30, 2010 12:30 AM EST
But...
it’s a lot nicer than if each increase in 1 WAR in rWAR was equal to an increase of 1.3 fWAR or something.
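One way to tell a shifted baseline from a genuinely different scale is to fit a line, fWAR = intercept + slope × rWAR, across a representative set of players. A slope near 1 with a nonzero intercept points to an offset, and a slope well away from 1 points to a scale difference. The sketch below uses made-up rWAR values and two synthetic cases purely to show what each looks like; `fit_line` is a hand-rolled least-squares helper, not a function from either site.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical career rWAR values, for illustration only.
rwar = [40.0, 55.0, 70.0, 90.0]

# Case 1: same scale, lower replacement level (a flat 5-win bump).
offset_fwar = [r + 5.0 for r in rwar]
# Case 2: different scale (every rWAR win is worth 1.3 fWAR).
scaled_fwar = [1.3 * r for r in rwar]

print(fit_line(rwar, offset_fwar))  # intercept near 5, slope near 1
print(fit_line(rwar, scaled_fwar))  # intercept near 0, slope near 1.3
```

The players listed in the article's tables were picked because their gaps are large, so fitting on them alone would be misleading; the full top-200 list would be the place to try this.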
Right
So, averaging them requires you to keep in mind that fWAR is typically a bit higher (because of the replacement level), so it will be weighted a bit more heavily in the average.
Ugh, feels like OPS all over again.
On Twitter: @baseballtwit
Is that essentially choosing a .300 replacement value?
by Dan Turkenkopf on Nov 30, 2010 11:35 AM EST
Ok that might not be clear
I mean by averaging them you’re basically splitting the replacement level down the middle. All else being equal of course.
Since both systems calculate everything as above average except for the replacement value, you should be able to plug in whatever replacement level you want and scale them for better comparisons.
It probably would be a pretty easy spreadsheet to figure out. If no one else wants to take on the challenge, I’ll give it a try at some point when work calms down some.
by Dan Turkenkopf on Nov 30, 2010 11:38 AM EST
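The "splitting the replacement level down the middle" idea can be sketched as follows, under the same all-else-equal assumption: both systems agree on runs above average and differ only in replacement runs. The runs-above-average figure is hypothetical, the 10 runs per win conversion is an assumption, and the two replacement values are the Brooks Robinson numbers quoted earlier in the thread.

```python
RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion


def war(runs_above_average, replacement_runs, runs_per_win=RUNS_PER_WIN):
    """WAR as (runs above average + replacement runs) / runs per win."""
    return (runs_above_average + replacement_runs) / runs_per_win


def rebaseline(war_value, old_rep_runs, new_rep_runs, runs_per_win=RUNS_PER_WIN):
    """Swap one replacement-run credit for another, leaving the rest untouched."""
    return war_value + (new_rep_runs - old_rep_runs) / runs_per_win


raa = 300.0                        # hypothetical runs above average
rep_low, rep_high = 341.0, 393.0   # replacement runs quoted earlier (BR, FG)

averaged = (war(raa, rep_low) + war(raa, rep_high)) / 2
midpoint = war(raa, (rep_low + rep_high) / 2)
print(averaged, midpoint)
```

Under these assumptions the average of the two WAR values and the WAR at the midpoint replacement level are the same number, and `rebaseline` moves either system's value onto whatever common replacement level you prefer. In practice the systems also differ in the above-average components, so this only handles the replacement-level part.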
And here we are with our damn spreadsheets again. :)
Sounds reasonable though.
On Twitter: @baseballtwit
I think I just figured out my next VAR.
Thanks Adam. :)