The truth about Wins Above Replacement (WAR) is that it's not really an official statistic. Rather, it is more of a general model, with multiple implementations coming from various sources.
Two of the most oft-cited implementations are FanGraphs' WAR and Rally's WAR (often cited as Sean Smith's WAR or BaseballProjection.com WAR; this is also the WAR used on Baseball-Reference.com). For position players, the two implementations are similar, though a few key differences can make certain players vary in value depending on the WAR source you use.
For example, Dustin Pedroia's 2008 MVP season was worth 6.6 WAR according to FanGraphs and 5.2 WAR according to Rally's WAR. What's the difference? The main one is the defensive metric used. FanGraphs uses Ultimate Zone Rating (UZR), which had Pedroia worth 9.9 runs above average. Rally's WAR uses Total Zone, which had Pedroia worth 1 run above average (range and double play combined).
The two implementations were in much better agreement in 2009, as FanGraphs had Pedroia at 5.0 WAR and Rally had Pedroia at 4.9 WAR. So far in 2010, Rally has Pedroia rated a bit higher (3.6 to 3.3). Total Zone has Pedroia already surpassing his career bests with the glove, while FanGraphs is a bit more conservative.
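To put rough numbers on the 2008 gap, here is a minimal sketch of the arithmetic. The WAR and defensive-run figures come from the text above; the conversion of about 10 runs per win is my assumption (a common rule of thumb), not something either site's method is confirmed to use, and the function name is made up for illustration.

```python
# Figures quoted in the text for Dustin Pedroia's 2008 season.
FANGRAPHS_WAR = 6.6
RALLY_WAR = 5.2
UZR_RUNS = 9.9          # defensive runs above average used by FanGraphs
TOTAL_ZONE_RUNS = 1.0   # defensive runs above average used by Rally

# ASSUMPTION: roughly 10 runs equal one win. This is a rule of thumb,
# not a conversion taken from either WAR implementation.
RUNS_PER_WIN = 10.0


def defensive_gap_in_wins(runs_a, runs_b, runs_per_win=RUNS_PER_WIN):
    """Convert a difference in defensive runs into wins (hypothetical helper)."""
    return (runs_a - runs_b) / runs_per_win


war_gap = FANGRAPHS_WAR - RALLY_WAR
defense_gap = defensive_gap_in_wins(UZR_RUNS, TOTAL_ZONE_RUNS)
unexplained = war_gap - defense_gap

print(f"Total WAR gap:         {war_gap:.2f}")
print(f"Explained by defense:  {defense_gap:.2f}")
print(f"Left over:             {unexplained:.2f}")
```

Under that assumption, defense accounts for a bit under 0.9 of the 1.4-win gap, which fits the idea that the defensive metric is the main difference but not the only one.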
Where these two WAR implementations deviate from each other even further is pitching WAR. FanGraphs' version is based on Fielding Independent Pitching (FIP), while Rally's WAR is based on runs allowed, which are then adjusted for defense and park factors.
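The two starting points can be sketched side by side. This is only an illustration, not either site's actual code: the FIP formula is the commonly published one built from homers, walks, and strikeouts (FanGraphs' version also counts hit batters, which I leave out here), the constant of 3.10 is an assumed league-dependent value, the pitching line is invented, and the defense and park adjustments Rally applies to runs allowed are not shown.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """A season line; innings is a true decimal (200.5, not baseball's 200.1)."""
    innings: float
    home_runs: int
    walks: int
    strikeouts: int
    runs_allowed: int


# ASSUMPTION: the FIP constant varies by league and season; 3.10 is a
# placeholder in the usual range, not a value taken from the text.
FIP_CONSTANT = 3.10


def fip(line: PitchingLine, constant: float = FIP_CONSTANT) -> float:
    """Fielding Independent Pitching from homers, walks, and strikeouts only."""
    core = 13 * line.home_runs + 3 * line.walks - 2 * line.strikeouts
    return core / line.innings + constant


def ra9(line: PitchingLine) -> float:
    """Runs allowed per nine innings, before any defense or park adjustment."""
    return 9 * line.runs_allowed / line.innings


# Hypothetical pitcher, made up purely for illustration.
example = PitchingLine(innings=200.0, home_runs=20, walks=50,
                       strikeouts=180, runs_allowed=90)

print(f"FIP: {fip(example):.2f}")
print(f"RA9: {ra9(example):.2f}")
```

For this invented line, FIP comes out well below runs allowed per nine. A FIP-based WAR would credit the pitcher for the lower number, while a runs-allowed-based WAR starts from the higher one and then adjusts.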
I've noticed two main areas of focus in sabermetrics: projection of the future and analysis of the past. Personally, I'm far more interested in the much smaller camp, the one that analyzes the past. It seems to me that each version of pitcher WAR suits a different camp. FanGraphs' pitcher WAR is better geared toward future projection, while Rally's WAR is better for analyzing the past.
I get the value in FIP: you take walks, strikeouts, and homers and see how a pitcher would do if the defense were taken out of the equation. In many ways, it puts pitchers on a level playing field. But allow me to make an analogy between WAR and my profession: web development.
I'm a front-end web developer. What does that mean? Basically, I build the stuff on a web site that you see. If you're at all familiar with web development, then you probably know the bane of our collective existence: Internet Explorer.
Internet Explorer is shitty defense. The most beautifully written markup and style sheets can be sent to Internet Explorer only to be blown to hell. We developers are then forced to expand our skills by learning to deal with the elements that surround us (in this case, a browser with enormous market share that rejects industry standards and does its own thing), modify our approach, and deliver markup and style sheets that may not be written "by the book" but work in Internet Explorer.
Perfect, standards-compliant front-end code is Fielding Independent Pitching. If you write it well, you're a good developer. It should all just work. But it doesn't. No pitcher whiffs 27 in every game, so defense has to come into play. Likewise, no (public-facing) website can be built just for Firefox or WebKit. You have to deal with Internet Explorer.
Have a great infield behind you? Throw more sinkers with men on base to get those double plays. Got fast outfielders who get good jumps? Don't be afraid to throw the high heat. Your guys can get to the gaps quickly. Pitchers can modify their basic approaches based on the external factors around them, and that's why I'm uncomfortable analyzing past results solely on FIP. If I deliver a perfect site that fails in Internet Explorer, nobody is going to give me an 8.5 WAR season for my effort. FIP (and by extension FanGraphs' WAR) says, "In an ideal world, this is what we could reasonably expect to happen." That's what makes it great for projection. You can take that projection and adjust it for the anticipated environment.
But when talking about past performance, we know that it didn't occur in an ideal world. That's why we should start with what actually happened (runs allowed) as the baseline and then start adjusting for other factors.
FIP doesn't capture the ability to overcome a key error by your shortstop by inducing a timely grounder. It also doesn't capture the ability to diagnose and fix a guillotine bug (yes, this is the type of crap we deal with). But these skills are key to finishing the job and succeeding.
Rally for the past. FanGraphs for the future.