Ricky Nolasco: 4 WAR or 1 WAR?

This is a question I asked on my own blog, but I thought it'd get decent play here, and it addresses an interesting issue about pitcher WAR. Also, I have author privilege here, but I haven't written a damn useful thing on BtB for some time now, so I'd like to contribute.

Now, as most readers here know, there are two great places to find WAR for any player. One of them is FanGraphs, the other is Rally's historical WAR database. One of them is fast and constantly updated; the other is comprehensive throughout baseball history (hence the "historical" part). The two databases measure position players in essentially the same fashion. Some of the inputs and rates are different, but you generally aren't going to see a stark contrast for any one player; most differences come down to different inputs (UZR for FanGraphs vs. TotalZone for Rally's database, for example).

Pitcher WAR is built on a similar framework in both places, but pitcher runs are determined very differently. FanGraphs uses FIP, a defense-independent component statistic that everyone here knows about and that needs no further explanation. Rally's database uses a pitcher's actual runs allowed and applies a prorated value for defensive runs based on the balls in play the pitcher allows. Now, the two versions generally reach similar conclusions, since most pitchers get around average luck and timing. A difference of 1 WAR would not preclude me from using one or the other; it simply becomes a matter of taste/preference for the method.
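To make the runs-allowed approach a bit more concrete, here is a rough Python sketch of the proration idea described above. The function names, the sign convention (positive defensive runs mean the fielders saved runs), and the example numbers are all my own assumptions for illustration; this is not Rally's actual code or exact method.

```python
def prorated_defense_runs(team_defense_runs, pitcher_bip, team_bip):
    """Assign a pitcher a share of the team's defensive runs,
    in proportion to the balls in play he allowed.

    Assumed sign convention: positive = runs saved by the defense.
    """
    if team_bip <= 0:
        raise ValueError("team_bip must be positive")
    return team_defense_runs * pitcher_bip / team_bip


def defense_adjusted_ra9(runs_allowed, innings, defense_share):
    """RA/9 after neutralizing the defense behind the pitcher.

    Runs the defense saved are added back to the pitcher's total;
    runs the defense cost him (a negative share) are taken off.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * (runs_allowed + defense_share) / innings


# Hypothetical example: a team defense rated 20 runs below average,
# with the pitcher responsible for 500 of the team's 4000 balls in play.
share = prorated_defense_runs(-20.0, 500, 4000)
adjusted = defense_adjusted_ra9(100, 180.0, share)
```

The point is only that this method starts from the runs that actually scored and then corrects for the fielders, so the pitcher still keeps whatever sequencing and timing produced his run total.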

Then comes the interesting case of Ricky Nolasco's 2009 season.

Here's the relevant information:

2009 Nolasco FIP/0.92: 3.64

2009 Nolasco tRA (StatCorner): 3.94

2009 Nolasco RA: 5.40

Here's how I did the calculations and what I got as a result. From Marlin Maniac:

I did both calculations using park factors provided by Patriot. For the defense-independent WAR, I averaged tRA from StatCorner and FanGraphs, then averaged that value with FIP/0.92 and stuck the result into Pythagenpat. If you checked out my MVP article, you saw a list of pitcher WAR calculated that way; that list contains all pitchers with more than 4.0 WAR. Using that method of evaluation, I had Ricky at 3.8 WAR for the season, a very good total. I then calculated WAR using Rally's method, with team bUZR from FanGraphs as my defensive metric. By that method, Ricky totaled 0.8 WAR on the season.
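For anyone who wants to play along, here is a minimal Python sketch of the Pythagenpat step. It is a simplification under assumptions that are mine, not part of the calculation quoted above: the exponent constant of 0.287, a .380 replacement-level winning percentage, treating the pitcher as if he threw nine-inning games in front of a league-average offense, and no park adjustment. The league RA/9 and innings in the example are placeholders, and because only StatCorner's tRA is listed in this post, the blend below uses it alone in place of the two-source tRA average.

```python
def pythagenpat_win_pct(runs_scored_pg, runs_allowed_pg, power=0.287):
    """Pythagenpat: the exponent floats with the run environment."""
    x = (runs_scored_pg + runs_allowed_pg) ** power
    rs = runs_scored_pg ** x
    ra = runs_allowed_pg ** x
    return rs / (rs + ra)


def pitcher_war(ra9, innings, league_ra9, replacement_win_pct=0.380):
    """Wins above replacement for a pitcher with the given RA/9.

    Assumes a league-average offense behind him and converts
    innings to nine-inning "games" (both simplifications).
    """
    win_pct = pythagenpat_win_pct(league_ra9, ra9)
    games = innings / 9.0
    return (win_pct - replacement_win_pct) * games


# Blend the component estimators listed above (FIP/0.92 and StatCorner tRA).
blended_ra9 = (3.64 + 3.94) / 2

# Placeholder inputs, NOT taken from the post: league RA/9 and innings.
LEAGUE_RA9 = 4.80
INNINGS = 180.0

war_components = pitcher_war(blended_ra9, INNINGS, LEAGUE_RA9)
war_actual_ra = pitcher_war(5.40, INNINGS, LEAGUE_RA9)  # raw RA, no defense adjustment
```

Swapping the blended rate for the actual 5.40 RA in the same function is the quickest way to see how much of the gap comes from the run input alone, before any defensive adjustment.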

The difference between the two is a staggering 3 WAR. If you'd prefer, use the FanGraphs total of 4.2 WAR for the component-based figure instead; the exact number isn't particularly relevant. The key here is that the difference is huge, and it brings up my question:

Strictly in terms of production, was Ricky Nolasco a 4 WAR pitcher or a 1 WAR pitcher this season?

Presumably, both measures are defense-independent, though they are calculated in different ways. Rally gives credit/debit to the pitcher for his context/timing, while the linear weights models are context-independent. If all we wanted to know was production, which one would be the better option? Based on how we treat hitters, my initial presumption would be to lean towards the context-neutral method, but pitchers should have a lot more control over their environment than hitters do. Should a different method be used that falls somewhere between these two options?

This is not to be taken as an indictment of either methodology; I'm just interested in everyone's opinion on the topic. Vote and discuss accordingly.

