Welcome back to our ongoing series on Basic Sabermetrics, where we do our best to introduce sometimes-complex sabermetric statistics and concepts in an understandable fashion. You can see plenty (well, not plenty yet, but a few) more articles in our All About Sabermetrics page, where we attempt to de-mystify and explain the core -- and not-so-core -- concepts of objective baseball analysis.
Today, we're focusing on the one "sabermetric" statistic that probably gets more play, more positive attention, and more derision than any other: wins above replacement or WAR. WAR is an incredibly divisive metric, worthy of attention, but often misunderstood.
Talking effectively about WAR requires considerable time and effort, given how much work goes into computing this number, how many factors / metrics it takes into account, and the fact that there are several permutations of it available to the public. So grab yourself a seat and a sandwich, take a deep breath, and let's get started.
What is WAR?
WAR is, at its heart, a relatively simple concept. It's an attempt to measure all of a baseball player's on-field contributions using a single numerical value. For position players, that means accounting for hitting ability, defensive ability, and baserunning -- then adding a positional adjustment to reflect the value of playing a more difficult or skilled position. For pitchers, that means accounting for their pitching ability.
By converting all of a player's on-field value to a single number, we can use this number to compare players across leagues, positions, and eras ... even compare the value of a pitcher to the value of a hitter. And while it's inadvisable to use WAR values as strict, hard-and-fast, no-room-for-discussion value measurements, they are incredibly useful in analysis and performance measurement. But we'll talk more about that later.
The WAR number is called "wins" above replacement because that's the unit of value assigned to the player. A player is assigned a certain number of runs (positive or negative) for each aspect of their performance, and those runs are later converted to wins. The more wins that a player is worth, the better that player is.
Now, this metric also uses the concept of "replacement level," which is a little tougher to wrap your mind around at first. In WAR metrics, replacement level is considered to be the level of performance that is freely available on the waiver wire, as a free agent, or as a minor-leaguer -- at any time. As such, replacement level changes a bit from season to season, as the overall talent level in the game changes. This isn't an arbitrarily selected number -- all implementations have a process for determining the appropriate replacement level of performance.
There are three major public implementations of the basic wins above replacement framework, and all compute their WAR in different ways. This makes for quite a bit of confusion whenever someone refers to WAR, as they could be referring to any one of the three. Since the three implementations all calculate WAR differently, it's important to identify which implementation is being referred to. As such, "WAR" should only be used to refer to the idea of wins above replacement as a framework, not any of the three specific implementations.
Are you following along so far? Great.
Let's look at a position player example: Ian Desmond of the Nationals. In 2013, Ian Desmond was a pretty good hitter, defender, and baserunner at the premium position of shortstop. As such, FanGraphs credited him with +12.1 runs as a hitter and +3.3 runs as a baserunner, to account for his offensive performance. They also credited him with +4.4 fielding runs, and +7.2 runs for playing shortstop for as many games as he played. Last, they grant him +1.0 runs to account for his league, and +18.7 runs to account for the replacement level current to the game at his position. That all adds up to a grand total of 46.6 runs above replacement. Using the current translation (where a little more than nine runs equals one win) at FanGraphs, that comes out to 5.0 fWAR.
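If it helps to see the Desmond arithmetic spelled out, here's a minimal sketch. The run components are the ones quoted above; the runs-per-win figure of 9.3 is my own illustrative assumption (the real FanGraphs value changes by season and is only described here as "a little more than nine"). Note that the rounded components sum to 46.7 rather than the 46.6 quoted above, which is presumably just rounding in the published component values.

```python
# Run components FanGraphs credited to Ian Desmond in 2013, as quoted in the text.
desmond_runs = {
    "batting": 12.1,
    "baserunning": 3.3,
    "fielding": 4.4,
    "positional": 7.2,
    "league": 1.0,
    "replacement": 18.7,
}

# ASSUMPTION: the text only says "a little more than nine runs equals one win".
# 9.3 is an illustrative stand-in, not FanGraphs' published figure.
RUNS_PER_WIN = 9.3


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert runs above replacement into wins above replacement."""
    return runs / runs_per_win


# Add up every component to get runs above replacement (RAR).
total_runs = sum(desmond_runs.values())

print(f"Runs above replacement: {total_runs:.1f}")
print(f"Approximate fWAR: {runs_to_wins(total_runs):.1f}")
```

Swap in a different runs-per-win value and the WAR figure moves with it, which is one small reason the same player can grade out a little differently from one season (or one implementation) to the next.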
How about a pitcher? Our example will be Ervin Santana of the Royals. In 2013, Santana had a good season. FanGraphs credits his pitching performance as worth +28.4 runs above replacement over 211 innings pitched (we'll talk about how later in this article), which is later translated to 3.0 fWAR.
So WAR is, at its heart, a way to find the overall value of a ballplayer, as compared to "freely available" talent at their position.
Photo credit: Greg Fiume
Where to Find WAR
I'm switching up my usual progression in these articles, because talking about where to find WAR is essential to understanding the metric. In short, there are three implementations of WAR that you can find easily on the internet. When someone says "WAR", they could be referring to one or all of these three "flavors" of WAR. Each form of WAR is computed differently, using different inputs, but all try to tell the same thing: how many wins above replacement was a player worth?
Let's talk a little about each.
Baseball-Reference WAR (commonly abbreviated as bWAR*)
This is possibly the most widely-referenced implementation of wins above replacement, due to Baseball-Reference's reach. When an outlet like ESPN or the MLB Network talks about WAR, they're probably talking about this implementation. bWAR is easy to find on Baseball-Reference player pages: just scroll down to the "Player Value--Batters" section on any player page, where it's listed as WAR.
bWAR's primary offensive input (weighted runs above average, or wRAA) is used in other WAR implementations in a similar way, so hitter WARs, especially on the offensive side, look pretty similar to -- but not the same as -- the next WAR implementation. But pitching bWAR is built off of runs allowed (RA9), and as such can be very, very different from the other types of WAR. For specific information, check out the upcoming piece on bWAR.
Note: Sometimes this flavor of WAR is abbreviated as rWAR, as well. I'm actually making an executive decision here, to use bWAR as the Baseball-Reference WAR abbreviation, simply because rWAR can sometimes be used to refer to "Rally WAR" (Sean Smith's WAR), a previous way WAR was calculated at Baseball-Reference.
FanGraphs WAR (commonly abbreviated as fWAR)
The WAR implementation found at FanGraphs is commonly referred to as fWAR, and is also widely used, especially by the sabermetric community at large. FanGraphs uses fWAR as the default sorting mechanism on their statistical leaderboards at the site, making it very easy to find, use, and compare. It's also given a place of focus on each of the player cards, and their "Value" tab on the player cards shows a fairly clear breakdown of the components of fWAR.
fWAR also uses wRAA as the primary offensive input, but does so slightly differently from bWAR. It also uses different fielding and baserunning inputs, though fWAR and bWAR share the same replacement level. Where fWAR and bWAR differ most starkly is in pitching measures. fWAR uses FIP, a measure of strikeouts, walks, and home runs allowed, as the starting point for their WAR metric. bWAR does not -- it uses RA9. This creates huge variances in many cases between fWAR and bWAR for the same pitchers, even for their performance in the same years.
For specific information, check out our upcoming piece on fWAR.
Baseball Prospectus WARP (commonly abbreviated as WARP)
Baseball Prospectus has its own win value metric, but instead of calling it WAR, they call it wins above replacement player, or WARP. While this does tend to spare it from confusion with bWAR or fWAR, the metric fundamentally tries to capture the same things.
Last I checked, WARP currently uses a higher replacement level than fWAR or bWAR, which makes the total pool of scores slightly lower than the pools for fWAR and bWAR. As such, you might expect to see a lower average WARP value for players than you might for fWAR or bWAR.
Though BP's WARP value (built off of VORP) is one of the first ways to look holistically at player performance, it kind of lives today as a somewhat marginalized WAR metric. Both fWAR and bWAR seem to get more overall publicity and use in analysis than WARP, probably due to the seeming opaqueness of BP's processes. In my personal opinion, it's about as useful as the other two implementations, so use it!
For specific information, check out our upcoming piece on WARP -- which we'll write as we're able to capture more / better information.
Photo credit: Pool Photo-USA TODAY Sports
Even though there are three different types of WAR metric, they all tend to come to similar (not the same) conclusions, and though it's common to see small differences, it's rare to see massive differences in value. To wit, Mike Trout led baseball in all three implementations of WAR in 2013: he rated a 10.4 WARP, 9.2 bWAR, and 10.4 fWAR. Clayton Kershaw led baseball in all three implementations of WAR among pitchers: he rated a 5.1 WARP, 8.4 bWAR, and 6.5 fWAR.
Of course, any person or outlet can create their own WAR metric. To editorialize for a moment, this should be done more often by independent agents. While FanGraphs, Baseball-Reference, and Baseball Prospectus are excellent outlets with excellent data, I'm confident that we're working as part of an iterative process here, and we're not done perfecting any of the WAR metrics. More attempts might help improve the process.
How to Calculate WAR
So how do the three publishers of public WAR metrics -- specifically -- calculate WAR? Well, we'll be writing a separate article on each of these statistics in order to get into the right level of detail. For the time being, we can talk a little about how each is developed here, to some extent.
Important note: If you do not care about the components of the WAR metrics, skip this section and go to the "Using WAR in Analysis" section.
FanGraphs WAR (fWAR)
For position players, fWAR is currently computed using the following statistics:
wRAA + UZR + wSB + UBR + Positional Adjustment + League Adjustment + Replacement Level = RAR which is then converted to WAR (fWAR)
wRAA measures offense, Ultimate Zone Rating (UZR) measures defense, and weighted stolen base runs (wSB) and Ultimate Baserunning (UBR) measure baserunning. RAR is runs above replacement. For the pre-UZR era, FanGraphs uses Total Zone (TZR) instead of UZR. For the pre-UBR era, UBR is simply eliminated from the calculation.
For pitchers, fWAR is currently computed using the following methodology:
FIP is converted to a run value, accounting for replacement level, park factors, run environment, and innings pitched. This run value is then converted to WAR (fWAR).
For more information on the math as to how this is done, check out this explanation by Dave Cameron.
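To make the FIP starting point a little more concrete, here's a rough sketch. The FIP formula itself is the standard one, but everything else is an assumption on my part: the FIP constant (3.10 here) changes every season, the stat line is invented, and the "runs versus a baseline" step is a deliberately crude stand-in for the real conversion, which also handles park factors and run environment. This is not FanGraphs' actual calculation.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters, and strikeouts.

    ASSUMPTION: fip_constant varies by season; 3.10 is only a placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def runs_vs_baseline(pitcher_fip, ip, baseline_per_nine):
    """Crude run value: how many fewer runs than a baseline pitcher over the same innings.

    ASSUMPTION: baseline_per_nine is a hypothetical replacement-level rate.
    Park factors and run environment are ignored in this sketch.
    """
    return (baseline_per_nine - pitcher_fip) * ip / 9


# Invented stat line, for illustration only.
example_fip = fip(hr=15, bb=50, hbp=5, k=200, ip=200)
example_runs = runs_vs_baseline(example_fip, ip=200, baseline_per_nine=4.50)

print(f"FIP: {example_fip:.2f}")
print(f"Runs better than baseline: {example_runs:.1f}")
```

The thing to notice is that balls in play never enter the calculation, which is exactly why fWAR and a runs-allowed metric like bWAR can disagree so sharply about the same pitcher.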
Baseball-Reference WAR (bWAR)
For position players, bWAR is computed using the following statistics:
modified wRAA + DRS + baserunning runs + Park Factor + Position Adjustment + Replacement Level = RAR, which is then converted to WAR (bWAR)
wRAA measures offense, Defensive Runs Saved (DRS) measures defense, and an unknown metric is used to measure baserunning runs. RAR is runs above replacement. For the pre-DRS era, Baseball-Reference uses Total Zone (TZR) instead of DRS.
For pitchers, bWAR is computed using the following methodology:
Start with runs allowed and innings pitched, then adjust for level of opposition, interleague, team defense, role (starter or reliever), and park factors. Convert runs to wins using PythagenPat, then add an adjustment for leverage, and an adjustment for replacement level to get WAR (bWAR).
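PythagenPat may be unfamiliar, so here's a small sketch of the general formula: a Pythagorean winning percentage whose exponent floats with the run environment. The 0.287 constant is one commonly quoted value (I'm treating it as an assumption, since other values near it are also used), and the way Baseball-Reference actually applies the conversion to an individual pitcher is more involved than this team-level version.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, games, exponent_constant=0.287):
    """Expected winning percentage using a run-environment-dependent exponent.

    ASSUMPTION: exponent_constant of 0.287 is a commonly quoted value,
    not necessarily the one any particular site uses.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** exponent_constant
    scored_term = runs_scored ** exponent
    allowed_term = runs_allowed ** exponent
    return scored_term / (scored_term + allowed_term)


# Invented team totals, for illustration only.
win_pct = pythagenpat_win_pct(runs_scored=700, runs_allowed=650, games=162)
print(f"Expected winning percentage: {win_pct:.3f}")
print(f"Expected wins over 162 games: {win_pct * 162:.1f}")
```

Because the exponent grows with scoring, a run is worth a bit less in a high-scoring environment than in a low-scoring one, which is the whole point of using this instead of a fixed runs-per-win number.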
Baseball Prospectus WARP (WARP)
Okay, so here we hit a little snag. Before Colin Wyers left Baseball Prospectus, the site was in the midst of re-working WARP in a more transparent way, which is a good thing, because we don't know everything about how WARP is calculated, either for pitchers or for position players.
Here's what we think we know for position players:
True Average (TAv) converted to runs + Positional Adjustments + Park Factors = VORP
VORP + EqBRR + FRAA = total runs, which are then converted to wins using PythagenPat, which leads to WARP
Here's what we know for pitchers:
I think that they use Runs Allowed as a basis for pitching WARP. WARP accounts for replacement level (differently from bWAR and fWAR), for innings pitched, for run environment and park factors, for team defense, and converts this run value to wins using PythagenPat. End result is WARP.
That's it. As Baseball Prospectus's new stats team, led by Harry Pavlidis, does more work, I'm sure that WARP will become more transparent, and probably tweaked in some fashion.
Using WAR in Analysis
So what do the WAR metrics allow the reader or armchair analyst to do? They allow us to compare any player to any other player ... across positions, across eras, across leagues.
For position players, you can look at Mike Trout's bWAR values and understand how much he contributed overall as a player, beyond just his offensive, or baserunning, or fielding numbers alone. You can look at Lou Gehrig's career fWAR and compare it to Joe DiMaggio's to see which player appeared to add more value to the Yankees.
For pitchers, well, you can do the same sorts of things, but you can also compare the value of a pitcher to that of a position player, which can be helpful when trying to look across the traditional boundaries of hitter / pitcher. For example, looking at Clayton Kershaw's bWAR in 2013 (7.8) might help throw into stark relief that his season may have been just as good as, or better than, those of many NL MVP candidates on the position player side.
Keep in mind, each of the WAR values is, in essence, a counting stat. That is to say that a player with more innings pitched or more plate appearances / games played will have more chances to rack up a higher WAR value than a player with fewer IP / PA / games. If Matt Harvey racks up 6.1 fWAR in 178 innings pitched in 2013, but Adam Wainwright racks up 6.2 fWAR in 242 innings pitched in 2013, which is the better pitcher? Well, it depends on how you value the innings pitched difference, but fWAR would imply that Harvey was a more effective pitcher in the innings that he pitched, given that he was able to rack up nearly the same value in 60+ fewer innings.
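One quick way to see the Harvey / Wainwright difference is to put both pitchers on the same innings footing. This is just a sketch: the fWAR and innings figures are the ones quoted above, while the 200-inning baseline is my own arbitrary choice for a "full season" and isn't part of any official WAR calculation.

```python
def war_per_innings(war, innings_pitched, per=200):
    """Scale a WAR total to a rate per `per` innings.

    ASSUMPTION: 200 innings is an arbitrary "full season" baseline chosen for illustration.
    """
    return war / innings_pitched * per


# 2013 fWAR and innings pitched, as quoted in the text.
pitchers = {
    "Matt Harvey": (6.1, 178),
    "Adam Wainwright": (6.2, 242),
}

for name, (war, innings) in pitchers.items():
    rate = war_per_innings(war, innings)
    print(f"{name}: {war} fWAR in {innings} IP -> {rate:.1f} fWAR per 200 IP")
```

A rate like this rewards effectiveness but ignores the real value of actually taking the ball for those extra 64 innings, so treat it as a second lens rather than a replacement for the counting total.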
Also, one should probably remember that WAR formulas and implementations tend to shift over time. There are a couple reasons for this. One, and this is an important one, is that each major WAR implementation is regularly refined as new advances in sabermetrics help us better understand how to quantify value. All three measures have changed the way in which they compute WAR over the last few years, in an attempt to improve accuracy.
In addition, some of the stats used to help compute WAR values today didn't exist in the same way in the past. For example, both bWAR and fWAR use advanced defensive statistics (DRS and UZR) in their calculations, but these stats can't be computed prior to 2002, due to data limitations. For seasons before 2002, these implementations have to use a different measure: Total Zone Rating. It's smart to account for these factors, recognize the limitations of working with incomplete data sets, and analyze accordingly.
Criticisms / Considerations for WAR Implementations
WAR values shouldn't be treated as absolute values, however. Let's say Jayson Werth has an fWAR of 4.6 in 2013, and Adam Jones has an fWAR of 4.2 in 2013. Does that mean that Werth was definitively better than Jones? Probably not. Like any statistic relying on certain assumptions, you have to build error bars -- measures of uncertainty -- into your analysis. At the same time, it's pretty safe, in my eyes, to say that Paul Goldschmidt (7.5 WARP) was better than Kyle Seager (4.9 WARP).
Another important consideration in WAR implementations -- all three of them -- is the value of defense. The defensive metrics currently used in each WAR implementation (DRS, UZR, and FRAA) are all somewhat unreliable, especially at the single-season level. As such, single-season WAR values have a level of uncertainty, and some players might see an enormous jump in a seasonal WAR value due to a huge positive (or negative) defensive run value. The following season, these numbers might fall precipitously, or rise, and that could leave a player with a giant outlier season. Perhaps the player had a preternaturally good defensive year. Perhaps the player was mis-represented to some extent using the defensive metrics. Either way, taking single-season defensive numbers as law is a mistake. They're far better than guessing, or using no objective measure, or using errors -- but they're not as reliable as offensive or pitching numbers.
Another defense-related consideration is how catcher defense is reported in WAR implementations. Currently, there is a lot of research and development being done to help identify the catcher's effect on defense and run prevention. As currently constructed, most WAR implementations try to account for the special nature of catchers, but they do not all do so in a way that is consistent with the most recent research. For example, recent studies into catcher framing show that a catcher could potentially save his team upwards of 40 runs over the course of a season with excellent framing skill. As these studies are newer, and still being evaluated, catchers may or may not be getting their due in our WAR implementations. Catchers are -- and may always be -- incredibly difficult to value in a quantifiable way.
If I could editorialize for a moment, I'd say that the proper way to use any of the WAR metrics is this: use fWAR, bWAR, or WARP to start a discussion, not to finish it. While referencing a WAR metric can be a quick and easy shorthand to point to a player's skill, or lack thereof, the value itself doesn't usually describe the player's true skill level -- especially without context. If a WAR value doesn't look right to you, investigate the components, break down the pieces, and try to understand what made the value what it is.
Personally, I'd love to see a WAR implementation that more heavily regresses defensive numbers, accounts for both FIP and RA9 equally, and implements some sort of catcher framing component. But, as I stated earlier, anyone who's up to the challenge can develop their own WAR implementation.
In conclusion, WAR is useful, WAR is smart, and WAR is an imperfect way to measure total player value. It is also probably the best way to measure total player value at this time, in my opinion. As with all things, your mileage may vary, but at the end of the day, it is a terrific shorthand to describe a player's value, and to lead one to explore a player's statistics and abilities more thoroughly.
So much of what's done at FanGraphs, Baseball-Reference, and Baseball Prospectus can be aligned to WAR, so it's not terribly useful to talk in specifics about the components of the WAR implementations here. Instead, we'll briefly introduce you to a few WAR-related items.
RA9-WAR: This is an alternate way of calculating pitcher wins above replacement developed by FanGraphs. In essence it is a WAR developed using their methodology, but using runs allowed (RA9) in place of FIP in the calculation. It can also be broken down into several components: traditional FIP-based fWAR and FDP-Wins (further broken into LOB-Wins and BIP-Wins).
WAR Index (WARi): Developed by the exceptionally brilliant Bryan Grosnick here at Beyond the Box Score, WAR Index is a methodology for taking the three major WAR metrics and putting them on the same scale, then averaging the three into a single value.
WAR Comparison Chart - Baseball-Reference (note: this document, while very useful, is not completely up to date at this time)
. . .
Bryan Grosnick is the Managing Editor of Beyond the Box Score. You can follow him on Twitter at @bgrosnick.