Are fWAR and rWAR on Different Scales?

Twice while working on last week's articles about double plays, I came across large differences in career WAR for some players, depending on whether you use Fangraphs' WAR (fWAR) or Rally's WAR (rWAR, which is used at Baseball-Reference.com). In the first example, Jim Rice had 41.5 rWAR but 56.1 fWAR. That's a pretty big difference. In the second, Brooks Robinson came in at an impressive 69.1 rWAR but a staggering 94.6 fWAR.

This baffled me. I'm well aware of the differences in the pitching inputs used for each WAR metric. And both also use different defensive metrics for hitters. But for older players, they both use the same defensive metrics (which is Rally's Total Zone, since UZR was not yet available). I had been under the impression that both systems calculate offensive production similarly. Apparently I was wrong. Very wrong.

One early observation I made was that Fangraphs WAR tended to be higher. Today's graph shows how many more position players have 30+ career WAR in the Fangraphs system than in Rally's.

[Graph: number of position players with 30+ career WAR, fWAR vs. rWAR]

Yikes. This has me wondering if fWAR is on a completely different scale than rWAR. Does Fangraphs think offense is simply worth more runs than Rally does? Is Fangraphs merely top-heavy and this all evens out in the end? I don't have answers to these questions (yet), but I did take a look at which players were most affected by this difference.

I pulled the Top 200 position players by career rWAR and compared that with their fWAR. First, I'll start with the shorter list—those who actually have a lower fWAR than rWAR:

Player              rWAR   fWAR   Diff
Cap Anson           99.5   88.7  -10.8
Bid McPhee          57.9   51.4   -6.5
Johnny Damon        48.3   41.8   -6.5
Sam Thompson        46.7   41.2   -5.5
Ichiro Suzuki       55.2   50.7   -4.5
George Davis        90.7   86.4   -4.3
Buck Ewing          51.8   47.5   -4.3
Dan Brouthers       83.7   80.1   -3.6
King Kelly          48.5   45.0   -3.5
Albert Pujols       83.8   80.6   -3.2
Will Clark          57.6   54.4   -3.2

Cap Anson takes a serious hit. Otherwise, it calms down pretty quickly. What strikes me is that Anson, McPhee, Thompson, Davis, Ewing, Brouthers, and Kelly all played most or all of their careers in the 1800s. I'm also seeing some active players (Damon, Suzuki, Pujols) and the somewhat recently active Will Clark. I'm not sure if there's anything to that.

Here's the meaty list, where fWAR is greater than rWAR. Robinson's 25.5-win (not run!) difference takes the cake:

Player              rWAR   fWAR   Diff
Brooks Robinson     69.1   94.6   25.5
Carl Yastrzemski    88.7  108.7   20.0
Jimmie Foxx         94.1  112.3   18.2
Harmon Killebrew    61.1   78.4   17.3
Tony Perez          50.5   67.8   17.3
Pete Rose           75.3   91.4   16.1
Max Carey           50.6   66.6   16.0
Luke Appling        69.3   84.7   15.4
Honus Wagner       134.5  149.8   15.3
Joe Torre           55.6   70.8   15.2
Sherry Magee        59.1   74.1   15.0
Al Simmons          63.6   78.5   14.9
Ted Williams       125.3  139.8   14.5
Lou Boudreau        56.0   69.8   13.8
Luis Aparicio       49.9   63.6   13.7
Willie Stargell     57.5   70.9   13.4
Bob Johnson         53.2   66.4   13.2
Jimmy Sheckard      51.8   65.0   13.2
Bobby Doerr         47.7   60.9   13.2
Joe Tinker          49.2   62.2   13.0
Joe Cronin          62.5   75.4   12.9
Ron Santo           66.4   79.3   12.9
Billy Williams      57.2   69.7   12.5
George Sisler       50.4   62.8   12.4
Joe Gordon          54.9   67.2   12.3
Zack Wheat          57.8   70.0   12.2
Eddie Murray        66.7   78.8   12.1
Norm Cash           52.9   64.8   11.9
Stan Musial        127.8  139.3   11.5
Orlando Cepeda      46.8   58.3   11.5
Hank Greenberg      56.8   68.2   11.4
Al Kaline           91.0  101.9   10.9
Fred McGriff        50.5   61.3   10.8
Ted Simmons         50.4   61.1   10.7
Willie McCovey      65.1   75.7   10.6
Andruw Jones        59.9   70.5   10.6
Darrell Evans       57.3   67.8   10.5
Johnny Bench        71.3   81.5   10.2
Graig Nettles       61.6   71.8   10.2

Some of these differences are just enormous. And this is only the list of players with a difference of at least ten wins. A total of 110 of the top 200 players by rWAR have a difference of five wins or more.

Often, we use these metrics interchangeably, and I've even read quite a few articles where they are averaged together. Should we be doing this? Are they on different scales? Are my personal rWAR classifications (70+ sure HOFer, 50-70 rWAR are interesting cases, less than 50 you need a damn good reason to be a HOFer) applicable to fWAR?

How about the fact that we tend to use rWAR when discussing Hall of Fame cases but fWAR when talking about the current season? If these produce very different results over time, at what point does that become a problem?

I don't have the answers to these questions yet, but my interest has been piqued.

And, gosh—we haven't even looked at pitchers yet…

Comments

I have to admit I didn't

know how big the issue was. I knew that Robinson WAR from your last piece looked kinda funny, but … wow.

The baseball season doesn't have to end! Create your own players, coach your own teams, and join your friends in THE premier baseball MMO. Two Out Rally opens October 25th!
Two Out Rally, BASEBALL MMORPG | Facebook | @2OutRally

by Justin Bopp on Nov 29, 2010 10:35 AM EST via mobile

Baselines

fWAR has around 34 or 35 WAR per team, while rWAR has 29 or 30 WAR per team. Something like that. fWAR has a lower baseline. The above numbers imply a .280 baseline for fWAR and .320 for rWAR. Something like that. BaseballProspectus uses .250 I think, and Bill James is even lower.

by tangotiger on Nov 29, 2010 11:33 AM EST
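
A rough way to check those baselines: assume a 162-game schedule and that the league-average team wins half its games, so a team made up entirely of replacement players wins 81 minus the WAR handed out per team. The function names below are made up for illustration, and 34.5 and 29.5 are simply the midpoints of the ranges quoted above.

```python
GAMES = 162  # assumption: full modern schedule


def implied_baseline(war_per_team, games=GAMES):
    """Winning percentage of a zero-WAR team, assuming an average team wins half its games."""
    average_wins = games / 2
    return (average_wins - war_per_team) / games


def war_per_team(baseline, games=GAMES):
    """Inverse of implied_baseline: WAR available to an average team at a given baseline."""
    return (0.5 - baseline) * games


# Midpoints of the "34 or 35" and "29 or 30" ranges quoted in the comment.
for label, war in [("fWAR", 34.5), ("rWAR", 29.5)]:
    print(f"{label}: {war} WAR per team -> baseline {implied_baseline(war):.3f}")
```

Under those assumptions, 34.5 WAR per team works out to a baseline of about .287 and 29.5 to about .318, in the same neighborhood as the .280 and .320 quoted.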

All very good to know, thanks.

Definitely something I’m going to have to watch out for moving forward.

On Twitter: @baseballtwit

by adarowski on Nov 29, 2010 12:40 PM EST

*ahem*

by Justin Bopp on Nov 29, 2010 1:51 PM EST via mobile

?

Either I’m missing something here or I’m not expressing myself clearly (either of which is completely possible — it’s been an extremely trying day).

by The Ancient Mariner on Nov 29, 2010 5:30 PM EST

I'm guessing Justin

is indirectly referencing previous discussions in which we decided that different versions of WAR and replacement level are actually a good thing because they represent critical thinking and objective analysis rather than arbitrary consensus.

Or he could be agreeing with you.

by vivaelpujols on Nov 29, 2010 6:24 PM EST

"Arbitrary consensus" is definitely not preferred.

However, stats with the same label should always mean the same thing.

The different versions do a disservice to the community and only serve to further alienate our audience. I’m not saying there should be some consensus, I’m saying each should come up with their own moniker.

by Justin Bopp on Nov 29, 2010 9:54 PM EST

Well they are both the same framework

What’s wrong with fWAR and rWAR?

Unless you wanna give them cool names or something.

by vivaelpujols on Nov 30, 2010 12:36 AM EST

Break it down by component

For Brooks – Fielding and position adjustments are equal.

Batting +20 BR, +133 FG
It looks to me like FG is not removing pitcher hitting from the league totals. But they are also using different approaches, like starting with wOBA, where I calculate custom BaseRuns coefficients for each team.

Rep +341 BR, +393 FG
I don’t think Fangraphs is adjusting for league quality, where I have the AL as the inferior league for much of Brooks’s career.

Baserunning – not counted on Fangraphs, but Brooks came out roughly average anyway, so no big deal.

GIDP – also not counted on Fangraphs, but I have him at -35 runs.

Add it up and you’ve got about 20 wins.

"That boy is our last hope" - Obi Wan Scioscia, as Francisco Rodriguez left for the Mets. "No, there is another" - Yoda Reagins.

by RallyMonkey5 on Nov 29, 2010 3:24 PM EST
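
The arithmetic in that breakdown can be tallied in a few lines. This is only a sketch: the run totals are the ones quoted in the comment, the 10 runs per win conversion is an assumed rule of thumb rather than anything stated there, and fielding, position, and baserunning are left out because the comment describes them as equal or roughly average.

```python
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not given in the comment

# (Baseball-Reference runs, Fangraphs runs) for Brooks Robinson, as quoted above.
components = {
    "batting": (20, 133),
    "replacement": (341, 393),
    "gidp": (-35, 0),  # Fangraphs does not count GIDP, so it contributes 0 there
}

# Positive gap means Fangraphs credits more runs than Baseball-Reference.
run_gap = {name: fg - br for name, (br, fg) in components.items()}
total_runs = sum(run_gap.values())
total_wins = total_runs / RUNS_PER_WIN

for name, gap in run_gap.items():
    print(f"{name:12s} {gap:+d} runs")
print(f"total        {total_runs:+d} runs, about {total_wins:.0f} wins")
```

The gaps come to 113, 52, and 35 runs, or 200 runs in all, which matches the "about 20 wins" in the comment if a win is worth roughly 10 runs.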

I'm not good at the SABR

But I would definitely love to see more of an explanation of the ridiculous batting runs differential, wherein lies almost the entire difference. RM5’s couple of sentences on that don’t really do it for me.

""I’d like to be a crossword clue one day. I want to be in The New York Times’s Sunday edition. Right now, the clue ‘Giants great’ is always Mel Ott. I want my clue to be down, not across. The down ones are usually harder. And when I’m the clue, I’ll fill it in — just that one — and frame it. " - Brian Wilson.

by hairball on Nov 29, 2010 7:06 PM EST

FanGraphs uses wOBA

Which is league-wide linear rates (think of it as average runs per plate appearance).

Baseball Reference uses team BaseRuns, which derives its own weights from the team’s run environment.

In a nutshell, the run values FanGraphs assigns are the same for all players, whereas Baseball Reference’s vary based on the quality of the player’s team’s offense.

by vivaelpujols on Nov 29, 2010 7:51 PM EST

Different replacement levels

Different ways of measuring offense (fWAR uses generic linear weights, rWAR uses linear rates generated from team run environment), different positional adjustments, and Rally includes stuff like baserunning, which FanGraphs doesn’t.

by vivaelpujols on Nov 29, 2010 3:26 PM EST

Yeah

I would liken this to comparing fWAR to WAB or WARP. Different replacement levels give different results (and slight variation on inputs). The confusion stems from the use of the same construct, I think.

by JD Sussman on Nov 29, 2010 6:45 PM EST

I don't think so

I don’t think they are on different scales – at least not in the traditional sense. It’s just that they have different assumptions behind the components – because they differ on replacement level, rWAR will almost always be lower, but that doesn’t mean it’s scaled differently.

Am I making sense?

by vivaelpujols on Nov 29, 2010 8:55 PM EST

In other words...

Fangraphs cuts you off at the ankles while BRef cuts you off at the knees, but neither puts you on a rack to stretch or squoosh you.

by Sky Kalkman on Nov 29, 2010 9:02 PM EST

See, that makes me think averaging them probably isn't best...

Since it sounds like 6.0 rWAR is more impressive than 6.0 fWAR. And from a career perspective, it seems 60 rWAR is a hell of a lot more impressive than 60 fWAR. I mean, 39 more players have reached 60 fWAR than 60 rWAR.

by adarowski on Nov 29, 2010 10:12 PM EST

Oh god, screw it.

Can we just call one of them INKY and get it over with?

by adarowski on Nov 29, 2010 10:12 PM EST

Well if you average it out for all players it should be fine

I don’t see the problem as one metric is always going to be systematically “biased” up or down.

by vivaelpujols on Nov 30, 2010 12:30 AM EST

But...

it’s a lot nicer than if each increase in 1 WAR in rWAR was equal to an increase of 1.3 fWAR or something.

by Sky Kalkman on Nov 30, 2010 9:56 AM EST

Right

So, averaging them requires you to keep in mind that fWAR is typically a bit higher (because of the replacement level), so it will be weighted a bit more heavily in the average.

Ugh, feels like OPS all over again.

by adarowski on Nov 30, 2010 10:34 AM EST

Ok that might not be clear

I mean by averaging them you’re basically splitting the replacement level down the middle. All else being equal of course.

Since both systems calculate everything as above average except for the replacement value, you should be able to plug in whatever replacement level you want and scale them for better comparisons.

It probably would be a pretty easy spreadsheet to figure out. If no one else wants to take on the challenge, I’ll give it a try at some point when work calms down some.

by Dan Turkenkopf on Nov 30, 2010 11:38 AM EST
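
The re-baselining idea could be sketched as below. Everything here is an assumption for illustration: the function name `rebaseline_war` is hypothetical, replacement value is treated as proportional to the gap between a .500 team and the baseline, the .320 and .280 baselines are the rough figures from the earlier comment, and the 34.1 replacement wins for Brooks Robinson come from converting the +341 replacement runs quoted earlier at an assumed 10 runs per win.

```python
def rebaseline_war(war, replacement_wins, old_baseline, new_baseline):
    """Re-express a WAR total against a different replacement-level winning percentage.

    Assumes WAR = wins above average + replacement wins, and that replacement
    wins scale with the distance between .500 and the baseline.
    """
    if not 0 <= old_baseline < 0.5:
        raise ValueError("old_baseline must be in [0, 0.5)")
    wins_above_average = war - replacement_wins
    scale = (0.5 - new_baseline) / (0.5 - old_baseline)
    return wins_above_average + replacement_wins * scale


# Brooks Robinson's rWAR moved from an assumed .320 baseline to an assumed .280.
brooks_rwar = 69.1
brooks_rep_wins = 341 / 10.0  # +341 replacement runs at an assumed 10 runs per win
brooks_at_280 = rebaseline_war(brooks_rwar, brooks_rep_wins, 0.320, 0.280)
print(f"Brooks Robinson rWAR on a .280 baseline: {brooks_at_280:.1f}")
```

Under these assumptions the result lands around 76.7, still well short of his 94.6 fWAR, which fits the point made earlier in the thread that most of Robinson's gap is in batting runs rather than replacement level.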

I think I just figured out my next VAR.

Thanks Adam. :)

by Justin Bopp on Nov 30, 2010 3:08 PM EST
