Team WAR vs Actual Wins
I was doing some work on projecting player WAR and using that information to predict how many games a team might win in a season. Once I had that, I wanted to know how well the WAR values correlate with actual team wins. Using Rally's WAR database, I took all the teams since 1980 and plotted their actual wins versus the WAR of all the team's players (plus the 48.6 wins for a replacement level team). Here are the results:

Besides the graph, I determined the standard deviation of WAR projected wins (WAR plus the 48.6 replacement level wins) minus actual wins to be 6.5. So about 68% of teams (one standard deviation), or roughly 20 of 30 in a given season, should finish within 6.5 wins of their WAR projected total, and about 95% of teams (two standard deviations), or 28.5 of 30, should finish within 13 wins. So if a team has its 48.6 wins for at least being a replacement level team and 32.4 additional WAR from its players, it should have 81 wins according to WAR. In practice this team has about a 68% chance of winning from 74.5 to 87.5 games and a 95% chance of winning 68 to 94 games. No eye-popping numbers here, but hopefully someone else will find it helpful and not have to do the work I did.
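The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 48.6 baseline and the 6.5 standard deviation come from the post, while the function names and the tiny sample in `residual_sd`'s usage are made up for the example and are not Rally's data. The ranges also assume the misses are roughly normally distributed.

```python
import statistics

REPLACEMENT_WINS = 48.6  # replacement-level baseline used in the post
RESIDUAL_SD = 6.5        # SD of (WAR projected wins - actual wins) from the post


def projected_wins(team_war):
    """WAR projected wins: replacement-level wins plus the team's total WAR."""
    return REPLACEMENT_WINS + team_war


def win_range(team_war, n_sd=1):
    """Range of wins within n_sd standard deviations of the projection."""
    center = projected_wins(team_war)
    spread = n_sd * RESIDUAL_SD
    return center - spread, center + spread


def residual_sd(team_wars, actual_wins):
    """Sample SD of projected minus actual wins for a set of team seasons."""
    residuals = [projected_wins(w) - a for w, a in zip(team_wars, actual_wins)]
    return statistics.stdev(residuals)


if __name__ == "__main__":
    print(f"projection: {projected_wins(32.4):.1f}")
    print("1 SD range: {:.1f} to {:.1f}".format(*win_range(32.4, 1)))
    print("2 SD range: {:.1f} to {:.1f}".format(*win_range(32.4, 2)))
    # Hypothetical team seasons (total WAR, actual wins), purely illustrative.
    sample_war = [20.0, 28.5, 32.4, 40.1, 45.0]
    sample_wins = [72, 74, 85, 90, 91]
    print(f"sample residual SD: {residual_sd(sample_war, sample_wins):.2f}")
```

For the 32.4 WAR team in the text, the projection is 81 wins, so one standard deviation covers 81 ± 6.5 and two cover 81 ± 13.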
There was discussion in the comments and I decided to put up a better graph of the data with a 1Win:1WAR line for reference. Here is the original graph:
Comments
This is FANTASTIC!!
I did a study this year comparing the two and came up with a similar, if predictable finding.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 4:11 PM EDT reply actions 0 recs
I like your scatter chart much better for effectiveness (though I'd switch the axes), but here's this year's AL version dating back to the beginning of August:
http://grittyandclutch.blogspot.com/2009/08/war-vs-wins-american-league-study.html
I split mine into Pitching WAR vs. Batting WAR.

"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 4:15 PM EDT up reply actions 0 recs
Probably need a larger sample
But I wonder if there’s any trend in the ratio of hitter war:pitcher war for teams that under (or over) perform their WAR projected wins. I’d assume not (a WAR is a WAR), but it’d be interesting to investigate (someone probably already has).
by stevesommer05 on Sep 18, 2009 4:55 PM EDT up reply actions 0 recs
That is the absolute question. Is a WAR a WAR?
Let’s collect a larger sample of each, establish a trendline for both, and find which one correlates more closely with actual wins.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 5:23 PM EDT up reply actions 0 recs
Well...
I grabbed WAR data from fangraphs from 2002-2008 to take a quick look at this issue and as expected neither batter WAR nor pitcher WAR was more predictive than the other. Also of note there were no trends that I could find for teams that either over or under performed their WAR predicted win total (I ran a few linear models on ratio of batter to pitcher WAR vs. difference in projected and actual wins to no avail).
by stevesommer05 on Sep 18, 2009 10:50 PM EDT up reply actions 0 recs
Great - I got Justin bringing out his Photoshop graphics against my OpenOffice ones ;-)
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Sep 18, 2009 5:27 PM EDT up reply actions 0 recs
BOOYAH!
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 6:34 PM EDT up reply actions 0 recs
Just to anger some people
What’s the correlation for a quadratic regression?
by Tommy Bennett on Sep 18, 2009 4:30 PM EDT reply actions 0 recs
ANGER RISING
(what?)
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 5:23 PM EDT up reply actions 0 recs
They will remain nameless
Some people get very angry when you suggest the value of WAR is not linear.
by Tommy Bennett on Sep 19, 2009 1:36 PM EDT up reply actions 0 recs
Sabermetric inside jokes.
And we wonder why people are scared of us.
@bs_uf15bosox9be The Original Gameday; Learn to use SB Nation
by bs.uf15bosox9bears23 on Sep 19, 2009 11:11 PM EDT up reply actions 0 recs
Question
Maybe I’m misinterpreting the graph, but it looks to me that at the higher win totals, WAR is actually over-estimating Wins? I would think that since high-win teams have a great amount of ‘luck’ that we would see just the opposite?
KJOK
by KJOK on Sep 18, 2009 7:12 PM EDT reply actions 0 recs
Interesting. How about a quadratic regression?
Beyond the Boxscore Not a member? Sign up.
by Sky Kalkman on Sep 18, 2009 7:14 PM EDT up reply actions 0 recs
I shouldn't have the equation line on there.
It is the best-fit line, which the lower values are distorting. The line should be one to one like the red one here, and the upper values are more correct:

Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Sep 18, 2009 8:21 PM EDT up reply actions 0 recs
Isn't KJOK's point that the best fit line doesn't have the same slope as the 1:1 line?
Beyond the Boxscore Not a member? Sign up.
by Sky Kalkman on Sep 18, 2009 8:42 PM EDT up reply actions 0 recs
If that is what he means, that is correct
The line should not be used as a predictor; it should be one to one like the red line. The main problem is the 20 or so points from 50 Win/65 WAR to 70 Win/85 WAR where the win total is ~15 below WAR. There is no grouping like this below the line.
With log and exponential equations the R-squares are .70 and .74, respectively.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Sep 18, 2009 8:58 PM EDT up reply actions 0 recs
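For anyone who wants to try the same comparison of curve shapes, here is a minimal sketch of one way to get an R-squared for linear, log, and exponential fits. It is not the spreadsheet's method: the log and exponential fits are done by ordinary least squares on transformed values, R-squared is then measured on the original scale, and the data points are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """1 - SS_residual / SS_total, measured on the original y scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_fits(xs, ys):
    """R-squared for linear, log, and exponential fits (xs, ys must be > 0)."""
    slope, intercept = fit_line(xs, ys)
    linear = r_squared(ys, [intercept + slope * x for x in xs])

    log_x = [math.log(x) for x in xs]          # y = a + b * ln(x)
    slope, intercept = fit_line(log_x, ys)
    log_fit = r_squared(ys, [intercept + slope * v for v in log_x])

    log_y = [math.log(y) for y in ys]          # ln(y) = a + b * x
    slope, intercept = fit_line(xs, log_y)
    expo = r_squared(ys, [math.exp(intercept + slope * x) for x in xs])

    return {"linear": linear, "log": log_fit, "exponential": expo}


if __name__ == "__main__":
    # Hypothetical (actual wins, WAR projected wins) pairs, not real teams.
    wins = [43, 56, 63, 70, 75, 81, 86, 92, 97, 103, 116]
    war_wins = [55, 66, 68, 74, 77, 80, 84, 88, 93, 97, 110]
    for name, value in compare_fits(wins, war_wins).items():
        print(f"{name}: R^2 = {value:.3f}")
```

A quadratic fit would need a three-parameter least squares solve, which a library such as NumPy handles more comfortably than hand-rolled code.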
I agree that the line shouldn't be used as a predictor
It should simply be X WAR = X Wins. For teams that win a lot of games, you’ll see them have more Wins than WAR, and for really bad teams it’s the other way around, which is how the orange line looks. The reason is that teams that win or lose a lot of games usually had to get lucky or unlucky to do it. True Talent Level is a little bit more condensed.
by vivaelpujols on Sep 19, 2009 12:17 AM EDT up reply actions 0 recs
Still Confused
So, is the Red line ‘real’ wins, and the yellow line ‘WAR’ wins? If so, then it makes sense. If it’s the reverse, then WAR is not doing a good job of modeling reality, at least for the ‘high win’ teams.
KJOK
by KJOK on Sep 19, 2009 1:04 AM EDT up reply actions 0 recs
I will remove the yellow line -- it and the equation should not be on there.
It is created by the spread sheet as the best fit line between WAR and wins. I needed it to get the r-squared value to show up. The line should be 1:1 like the red one. 1 WAR for 1 Win.
As Viva points out, there are teams that got lucky and won a few more games and a few teams that were unlucky and lost a few more than they should have. The points on the extremes (Seattle at 118 wins and Detroit at 42) are actually showing those teams’ talent. Their total WAR and wins are close to 1:1.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Sep 19, 2009 1:13 AM EDT up reply actions 0 recs
So looking at the 1:1 line...
… points near the lower left (through about 70/80 wins) look to be more above the line than below it, meaning teams who lose a lot of games tend to have more WAR than wins. And above 70/80 wins, there are more points below the line, meaning teams who win a lot of games tend to have fewer WAR than wins. That’s precisely what we’d guess given the Wins = WAR + Luck approach.
And that’s the same thing the best fit line showed.
Beyond the Boxscore Not a member? Sign up.
by Sky Kalkman on Sep 19, 2009 7:54 AM EDT up reply actions 0 recs
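The Wins = WAR + Luck idea in this exchange is easy to check with a small simulation. The sketch below is only a toy model under stated assumptions: WAR projected wins are drawn from a normal distribution centered on 81 with a made-up spread of 9, and luck is independent normal noise with the post's 6.5 standard deviation. Nothing here uses real team data.

```python
import random


def slope(xs, ys):
    """Least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_teams=5000, talent_sd=9.0, luck_sd=6.5, seed=1):
    """Toy model: actual wins = WAR projected wins + independent luck."""
    rng = random.Random(seed)
    projected = [rng.gauss(81.0, talent_sd) for _ in range(n_teams)]
    wins = [p + rng.gauss(0.0, luck_sd) for p in projected]
    return projected, wins


projected, wins = simulate()

# Wins on the x axis, WAR on the y axis (as in the graph): slope below 1.
slope_war_on_wins = slope(wins, projected)
# Axes swapped: WAR predicts wins at close to one for one.
slope_wins_on_war = slope(projected, wins)

# Teams that won a lot tend to have more wins than WAR projected wins.
high = [(p, w) for p, w in zip(projected, wins) if w >= 95]
avg_gap_high = sum(w - p for p, w in high) / len(high)

if __name__ == "__main__":
    print(f"slope, WAR on wins: {slope_war_on_wins:.2f}")
    print(f"slope, wins on WAR: {slope_wins_on_war:.2f}")
    print(f"avg wins minus WAR for 95+ win teams: {avg_gap_high:.1f}")
```

In this toy model a best-fit line with wins on the horizontal axis comes out flatter than 1:1 even though each WAR is worth exactly one win, which is the pattern described above for the best-fit line.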
Exactly
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Sep 19, 2009 9:31 AM EDT up reply actions 0 recs
Agreed.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 19, 2009 1:32 PM EDT up reply actions 0 recs
but
In looking at 100+ win teams, it seems that they have more points ABOVE the line, which is certainly not what I would expect.
I would expect WAR would track more with ‘3rd order wins’ than real wins, especially at the 100+ win level?
KJOK
by KJOK on Sep 19, 2009 10:36 PM EDT up reply actions 0 recs
Look at the Orange line
Not the original one.
by vivaelpujols on Sep 20, 2009 1:36 AM EDT up reply actions 0 recs
Yes
Yes, the Orange Line. The two highest win teams (Seattle and New York I’m guessing) are right on the Orange line, and I would expect them to be substantially below the Orange Line.
KJOK
by KJOK on Sep 20, 2009 8:05 PM EDT up reply actions 0 recs
I would agree.
But it’s only two points. If you look at the rest of the high-win teams, the majority of them lie below the orange line, as we’d expect.
Beyond the Boxscore Not a member? Sign up.
by Sky Kalkman on Sep 21, 2009 9:12 AM EDT up reply actions 0 recs
Looks exponential to me.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 18, 2009 9:29 PM EDT reply actions 0 recs
How about WAR vs. Pythag wins?
That way we can limit the luck/unlucky bias.
Not afraid to nitpick
by joker24 on Sep 19, 2009 2:30 PM EDT reply actions 0 recs
Luck does not have a bias
And the value of this comparison is looking at a statistic and comparing it to reality. In this study, the statistic proves its general value. If you look at the 100 WAR/100 Wins sector (or just notice the slope), it’s obvious that the components of WAR result in an almost 1-to-1 ratio with actual wins.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 19, 2009 3:07 PM EDT up reply actions 0 recs
I think he more meant that the further you get from 81 wins, the more likely the team's win total included some luck.
Above 81, lucky, below 81, unlucky. Pythag will remove some, but not all of that luck, and it will also remove some skill as crossfire.
Beyond the Boxscore Not a member? Sign up.
by Sky Kalkman on Sep 19, 2009 3:40 PM EDT up reply actions 0 recs
Yeah the bias for high win teams to have gotten lucky/low win teams unlucky
I think it’s safe to say, regardless of just how much you believe in Pythag, that raw wins can and do overvalue/undervalue some teams fairly significantly. Obviously WAR is useful; that’s not a debate. But I’d think Pythag wins could clean up some of that error, as intuitively I’d think WAR would correlate better with Pythag than it does with raw wins.
Not afraid to nitpick
by joker24 on Sep 20, 2009 8:56 AM EDT up reply actions 0 recs
My favorite part of this is the bigger conclusion. Look:
1. No team with a combined 70 WAR or below has had more than 84 wins in the past three decades.
2. No team with a combined 90 WAR or above has had fewer than 86 wins in the past three decades.
That’s fairly significant. Cross reference that with a study on the average number of wins it takes to make the playoffs, and BLAMMO—an all purpose guide for telling teams how much WAR they’ll need to make the playoffs the next season.*
*provided we have accurate WAR projection models in place.
"What we do in life, echoes in eternity!"
by Justin Bopp on Sep 19, 2009 3:12 PM EDT reply actions 0 recs
LAST SEASONS STATS
© One Kaufman Way
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
Can't get enough of me? Check out my Twitter feed.
by devil_fingers on Sep 20, 2009 1:29 AM EDT up reply actions 0 recs
A couple of late thoughts
What I think would be interesting is to look at how predictive WAR is vs actual wins. So run a trendline of wins from one year vs wins the next, and WAR one year vs. wins the next. I know there will be problems with players leaving or coming to the organization, and players performing better or worse, but those should affect both the same and it would be cool to see if there’s a major difference between the R squared for each.
Also, Stevesommer mentioned looking for trends in teams that under or overperformed. I have always thought that teams that bunt/situational hit might be undervalued a bit because moving a runner over counts as a normal out, when it does have more value than that (or less negative value). It would be interesting to see if this made any difference. I would guess that a lot of this would be captured in a baserunning statistic, as the runner would get credit for moving up the base, but I’m not sure exactly how those are calculated, and if the WAR you are using takes those into account (I think fangraphs just uses sb & cs, not an overall baserunning score).
These are things I’d look at if I had more time (and was better at this type of thing), and probably will when things slow down here. If someone does this, or has done it, let me know!
by lookatthosetwins on Sep 24, 2009 12:33 AM EDT reply actions 0 recs
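The year-to-year comparison suggested in this comment could be set up roughly as follows. This is a sketch under assumptions: the `seasons` layout, the team codes, and every number in it are hypothetical placeholders, and R squared here is simply the squared Pearson correlation between one year's value and the next year's wins.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def next_year_pairs(seasons):
    """Pair each team season with the same team's wins the following year.

    seasons maps (team, year) -> (team_war, wins).
    """
    war_now, wins_now, wins_next = [], [], []
    for (team, year), (war, wins) in seasons.items():
        following = seasons.get((team, year + 1))
        if following is None:
            continue  # no next season on record for this team
        war_now.append(war)
        wins_now.append(wins)
        wins_next.append(following[1])
    return war_now, wins_now, wins_next


# Hypothetical teams and numbers, for illustration only.
seasons = {
    ("AAA", 2006): (30.1, 78), ("AAA", 2007): (35.4, 85), ("AAA", 2008): (38.0, 88),
    ("BBB", 2006): (22.5, 70), ("BBB", 2007): (20.3, 72), ("BBB", 2008): (27.9, 75),
    ("CCC", 2006): (41.2, 93), ("CCC", 2007): (44.0, 90), ("CCC", 2008): (36.5, 86),
}

war_now, wins_now, wins_next = next_year_pairs(seasons)
r2_war = pearson_r2(war_now, wins_next)
r2_wins = pearson_r2(wins_now, wins_next)

if __name__ == "__main__":
    print(f"R^2, WAR -> next-year wins:  {r2_war:.3f}")
    print(f"R^2, wins -> next-year wins: {r2_wins:.3f}")
```

With real data, whichever R squared is larger would indicate the better single-year predictor of the following season's wins, subject to the roster turnover caveats raised in the comment.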
