For a long time, there was a split in the sabermetric community over how exactly to calculate pitching WAR:
- The FanGraphs method, based on FIP (Fielding Independent Pitching), which ignores defense entirely.
- The other method (most notably Baseball-Reference's), which begins with runs allowed per nine innings (RA9) and then attempts to factor out defense.
A few weeks ago, FanGraphs attempted to bridge the gap by releasing FDP (Fielding Dependent Pitching). FDP simply moves us from fWAR to FanGraphs' "RA9-Wins" by adding the runs (converted to wins) from fielding-dependent events to fWAR.
For better explanations of exactly what FDP does and is, follow these links:
RA9-Wins sounds a lot like Baseball-Reference's version of pitching WAR (commonly referred to as rWAR), but the two statistics are not exactly the same.
When Dave Cameron explained RA9-Wins, he made this stipulation:
First, we calculated the total WAR that a pitcher would receive credit for if he was only evaluated by his runs allowed, and we assumed that he had 100 percent responsibility for every variable that influenced run scoring. That stat is now on the site, and is called "RA9-Wins". If you do not want to consider any impact of fielding on run prevention, and solely want to evaluate a pitcher by what actually happened when he was pitching (accounting for park and league adjustments, at least), then this is the metric for you
When Baseball-Reference calculates its WAR, it considers the impact fielding has on run prevention. Tom Tango did a good job of explaining the difference:
It’s also interesting to compare (RA9-wins) to Baseball-Reference. Over there, we see that Verlander gets a huge adjustment for playing in front of crappy fielders, while Felix gets a penalty for the opposite reason. (And it’s possible the park adjustments are more extreme at BR than at Fangraphs.)
I decided it might be interesting to run a linear regression between RA9-Wins and rWAR, partly to see how closely they track each other in general, but mainly to search for where they differ.
Before running the regression, I first had to account for the difference in replacement-level baselines between the two statistics.
- FanGraphs's replacement-level baseline is ~43 wins for a 0-WAR team (a .265 winning percentage).
- Baseball-Reference's replacement-level baseline is 52 wins for a 0-WAR team (a .32 winning percentage).
To bring rWAR up to RA9-Wins's lower baseline, I multiplied every positive rWAR by ~1.21.
The 1.21 came from dividing rWAR's replacement-level winning percentage (.32) by RA9-Wins's replacement-level winning percentage (.265): .32 / .265 ≈ 1.21.
For players with a negative rWAR, multiplying would have moved them even further away from their RA9-Wins, so for these players the adjustment was a simple addition of 0.15 WAR.
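The adjustment described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the two replacement-level winning percentages quoted above (.265 and .32) as inputs; the constant names and the `adjust_rwar` function are hypothetical, introduced only for illustration.

```python
# Replacement-level winning percentages quoted in the text (assumed inputs).
FG_REPLACEMENT_WPCT = 0.265   # FanGraphs: ~43 wins over 162 games
BR_REPLACEMENT_WPCT = 0.32    # Baseball-Reference: 52 wins over 162 games

# Scale factor that lifts rWAR toward RA9-Wins's lower baseline (~1.21).
SCALE = BR_REPLACEMENT_WPCT / FG_REPLACEMENT_WPCT

# Flat bump applied to negative rWAR instead of scaling it.
NEGATIVE_BUMP = 0.15


def adjust_rwar(rwar: float) -> float:
    """Return 'adjusted rWAR' under the rules described in the text (hypothetical helper)."""
    if rwar < 0:
        # Scaling a negative value would push it further from RA9-Wins,
        # so add a flat 0.15 WAR instead.
        return rwar + NEGATIVE_BUMP
    return rwar * SCALE


if __name__ == "__main__":
    # Illustrative inputs only, not real player values.
    for raw in (5.0, 2.0, -0.5):
        print(f"{raw:5.2f} -> {adjust_rwar(raw):5.2f}")
```

Because the sketch uses the unrounded ratio (about 1.2075) rather than exactly 1.21, its results can differ from a flat 1.21 multiplier in the second decimal place.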
Below is the graph of the linear regression of the RA9-Wins and adjusted rWAR of every qualified starter season from 2010-12 (n=279):
The correlation was very high, which is to be expected, since both metrics are built around the same number: RA9. RA9-Wins explains 90.51 percent of the variance in adjusted rWAR (r^2 = .9051), which is a lot.
**Note** I also ran an unadjusted correlation (RA9-Wins vs. unadjusted rWAR) and came up with pretty much the same r^2 of 90.44 percent.
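For readers who want to reproduce this kind of fit, here is a minimal ordinary-least-squares sketch in plain Python that returns the slope, intercept and r^2. It assumes you supply two equal-length lists (RA9-Wins as x, adjusted rWAR as y); the 279-season dataset itself is not reproduced here, and `fit_line` is a hypothetical helper name.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept, r_squared)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sums of squares and cross-products around the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # r^2 = 1 - SS_residual / SS_total: share of the variance in ys explained by xs
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot
```

For a single-predictor regression like this one, r^2 equals the square of the Pearson correlation coefficient, which is why "correlation" and "regression" are used almost interchangeably in this context.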
If the two statistics had been identical, the regression equation would have been:
Y=1*X or ad. rWAR = 1*RA9-Wins
Instead we got this equation:
Y= 1.0961*X - 0.6618 or ad.rWAR = 1.0961*RA9-Wins - 0.6618
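As a worked example of the fitted equation, plugging in Daniel Hudson's 2011 RA9-Wins of 4.0 (from the top-10 table in this article) gives the expected adjusted rWAR listed there:

$$
\widehat{\text{ad. rWAR}} = 1.0961 \times 4.0 - 0.6618 = 4.3844 - 0.6618 = 3.7226 \approx 3.72
$$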
Obviously, the statistics aren't the same, so the regression equation differs slightly. The important part of this analysis, though, is not the correlation but the points that fall off the regression line.
In Tango's response to the original "rollout" of FDP and RA9-Wins at FanGraphs, he made a point about how Justin Verlander and Felix Hernandez could have fairly large differences in their RA9-Wins and rWAR, in opposite directions.
He noted that Verlander has a low BABIP, but Baseball-Reference credits him with an even lower one because the Tigers' defense is bad. rWAR assumes that Verlander would have a lower RA9 if he played in front of another defense.
In the case of King Felix, he gets punished for playing in Safeco Field (a pitcher's park) -- but as Tango points out, Felix is a ground ball pitcher, so he would most likely succeed to the same extent if he pitched for another ball club.
So here's the comparison between the two pitchers:
| Pitcher | RA9-Wins | Ad. rWAR | Squared Difference 1 | Squared Difference 2 |
|---|---|---|---|---|
**Note** The first Squared Difference is simply the squared difference between RA9-Wins and ad. rWAR, (RA9-Wins - ad. rWAR)^2. The second is the more important squared residual from the regression model, (ad. rWAR - expected rWAR)^2.
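The two columns can be computed as below. This is a sketch that assumes the regression coefficients reported above; `squared_differences` is a hypothetical helper. Fed the rounded table values for Daniel Hudson's 2011 season (4.0 RA9-Wins, 1.7 adjusted rWAR), it returns roughly 5.29 and 4.09; the 4.09 differs slightly from the 4.118 in the top-10 table, most likely because the table was computed from unrounded inputs.

```python
# Regression coefficients reported in the text: ad. rWAR ≈ SLOPE * RA9-Wins + INTERCEPT
SLOPE = 1.0961
INTERCEPT = -0.6618


def squared_differences(ra9_wins: float, adj_rwar: float):
    """Return (squared difference 1, squared difference 2) for one pitcher-season.

    1: (RA9-Wins - ad. rWAR)^2, a direct comparison of the two numbers.
    2: (ad. rWAR - expected rWAR)^2, the squared residual from the regression.
    """
    simple = (ra9_wins - adj_rwar) ** 2
    expected = SLOPE * ra9_wins + INTERCEPT
    residual = (adj_rwar - expected) ** 2
    return simple, residual


if __name__ == "__main__":
    # Daniel Hudson, 2011, using the rounded values from the top-10 table
    simple, residual = squared_differences(4.0, 1.7)
    print(f"{simple:.3f} {residual:.3f}")
```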
Verlander's RA9-Wins is so far off his rWAR because his negative LOB-Wins (-0.9) offsets his positive BIP-Wins (1.0), leaving essentially no difference between his fWAR and RA9-Wins. In rWAR, nothing offsets the value of his BIP-Wins; in fact, B-R concludes that his run prevention was even better than his RA9 shows, because of the Tigers' defense.
Felix is a little different. In RA9-Wins he gets credit for stranding runners, while his BIP-Wins is zero. rWAR assumes that Hernandez's RA9 would be higher elsewhere, because his home park suppresses runs, so his rWAR ends up even lower than his RA9-Wins.
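The relationship described in the last two paragraphs is simple arithmetic. This sketch assumes that RA9-Wins equals fWAR plus FDP-Wins, and that FDP-Wins is exactly the sum of BIP-Wins and LOB-Wins; FanGraphs' actual calculation may differ in detail, and both function names are hypothetical.

```python
def fdp_wins(bip_wins: float, lob_wins: float) -> float:
    """Fielding-dependent wins, ASSUMED here to be BIP-Wins + LOB-Wins."""
    return bip_wins + lob_wins


def ra9_wins_from_fwar(fwar: float, bip_wins: float, lob_wins: float) -> float:
    """RA9-Wins viewed as fWAR plus the fielding-dependent adjustment (assumption)."""
    return fwar + fdp_wins(bip_wins, lob_wins)


if __name__ == "__main__":
    # Verlander: positive BIP-Wins nearly cancelled by negative LOB-Wins
    print(round(fdp_wins(1.0, -0.9), 2))
    # Felix-style case: zero BIP-Wins, positive LOB-Wins (fWAR value is illustrative)
    print(ra9_wins_from_fwar(5.0, 0.0, 0.5))
```

With Verlander's figures from above, `fdp_wins(1.0, -0.9)` comes out to about 0.1, which is why his fWAR and RA9-Wins nearly match.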
Below are the ten starters with the largest squared residuals over the last three seasons:
| Pitcher (Year) | RA9-Wins | Ad. rWAR | Expec. rWAR | Squared Residual |
|---|---|---|---|---|
| 1. Daniel Hudson (2011) | 4.0 | 1.7 | 3.72 | 4.118 |
| 2. James Shields (2011) | 7.5 | 5.7 | 7.56 | 3.511 |
| 3. Roy Halladay (2011) | 8.3 | 10.3 | 8.43 | 3.403 |
| 4. Cliff Lee (2011) | 8.1 | 10.0 | 8.22 | 3.320 |
| 5. Joe Saunders (2011) | 3.7 | 1.7 | 3.39 | 2.891 |
| 6. Zack Greinke (2010) | 2.6 | 3.9 | 2.19 | 2.828 |
| 7. Wade Davis (2011) | 1.6 | -0.6 | 1.09 | 2.697 |
| 8. Ricky Romero (2011) | 6.0 | 7.5 | 5.91 | 2.508 |
| 9. John Danks (2010) | 4.6 | 5.9 | 4.38 | 2.389 |
| 10. Mark Buehrle (2010) | 3.2 | 4.4 | 2.85 | 2.273 |
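Producing a ranked list like this one is a short sort once expected rWAR is known. The sketch below assumes the regression coefficients above and input rows of (label, RA9-Wins, adjusted rWAR); `rank_by_squared_residual` is a hypothetical helper. Running it on the rounded table values will not reproduce the Squared Residual column exactly, and close entries can swap places: with the rounded figures, Roy Halladay's squared residual comes out slightly above James Shields's.

```python
# Regression coefficients reported in the text
SLOPE = 1.0961
INTERCEPT = -0.6618


def rank_by_squared_residual(rows, top_n=10):
    """Sort (label, ra9_wins, adj_rwar) rows by squared regression residual, largest first."""
    scored = []
    for label, ra9, adj in rows:
        expected = SLOPE * ra9 + INTERCEPT
        scored.append((label, ra9, adj, round(expected, 2), (adj - expected) ** 2))
    # Largest squared residual (biggest disagreement with the fitted line) first
    scored.sort(key=lambda row: row[-1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    sample = [
        ("Daniel Hudson (2011)", 4.0, 1.7),
        ("James Shields (2011)", 7.5, 5.7),
        ("Roy Halladay (2011)", 8.3, 10.3),
    ]
    for label, ra9, adj, exp, sq in rank_by_squared_residual(sample):
        print(f"{label:<22}{ra9:>5.1f}{adj:>6.1f}{exp:>7.2f}{sq:>8.3f}")
```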
There's a pretty clear pattern that can be seen from looking at these players. The top-2 defenses in 2011 were the Tampa Bay Rays and Arizona Diamondbacks. Here are those two teams' UZR, DRS and PADE:
| Team | UZR (Rank) | DRS (Rank) | PADE (Rank) |
|---|---|---|---|
| Tampa Bay | 53.7 (2nd) | 85 (1st) | 4.30 (1st) |
| Arizona | 55.8 (1st) | 54 (2nd) | 0.59 (10th) |
Four of the ten pitchers on this list (including the top two) pitched in front of those two defenses. The rWAR model attributes much of the run prevention of Daniel Hudson, Joe Saunders, James Shields and Wade Davis to the defense playing behind them.
Hudson's 2011 RA9-Wins is further from his rWAR than that of any other pitcher in the sample. This is strange to me, because his 2011 BABIP and, in turn, his BIP-Wins were both below average: .295 and -0.2, respectively. Yet his rWAR implies that his BABIP (and RA9) should have been worse, because the defense behind him was so talented.
In 2011, both Cliff Lee and Roy Halladay had higher-than-league-average BABIPs. Baseball-Reference's defensive adjustment is based on DRS, and the 2011 Phillies had the third-worst team DRS (-59 runs), so rWAR concludes that Lee and Halladay would have given up fewer runs with a better defense behind them.
We can follow this straightforward pattern for almost every pitcher in the sample whose RA9-Wins and rWAR differ. For instance, the 2010 Royals and White Sox both rated near the bottom of the league defensively, so rWAR rates Zack Greinke (Royals), John Danks (White Sox) and Mark Buehrle (White Sox) as better than their RA9-Wins would indicate.
The problem is that we still can't measure the actual effect a defense has on how many runs a particular pitcher gives up.
For more on this issue, please read this post from Tango -- with UZR help from MGL -- on how a defense could possibly change behind individual pitchers. This is a great quote from that post:
It’s very well possible that if you see a pitcher with a low BABIP on a bad-fielding team that he could have still gotten GOOD fielding behind him. Just as you wouldn’t presume a good hitting team provides good run support to all its pitchers, or a good bullpen helps out all of its pitchers, then neither should you presume that having a good set of fielders behind you means that ended up receiving good fielding support.
This idea is still very much up in the air, because we simply don't know. But it goes some way toward explaining why we can see a stark difference between a pitcher's rWAR and RA9-Wins.
Anyway, you probably noticed that no 2012 pitcher cracked the top 10. So here are the five pitchers with the highest squared residuals so far this season:
| Pitcher | RA9-Wins | Ad. rWAR | Expec. rWAR | Squared Residual |
|---|---|---|---|---|
| 1. Justin Verlander | 5.9 | 7.3 | 5.80 | 2.107 |
| 2. CJ Wilson | 2.6 | 0.8 | 2.19 | 1.800 |
| 3. Chris Sale | 5.7 | 6.9 | 5.59 | 1.710 |
| 4. Matt Cain | 5.1 | 3.6 | 4.93 | 1.689 |
| 5. Lucas Harrell | 2.2 | 3.0 | 1.75 | 1.621 |
Given the Tigers' defense, it wasn't surprising to see Verlander at the top of the list, or to find two of his rotation-mates (Rick Porcello and Max Scherzer) in the top 10 as well.
The most interesting finding from the 2012 sample has nothing to do with team defense; instead, it sheds some light on differences in park factors.
All five members of the San Francisco Giants' starting rotation rank among the top 21 pitchers by squared residual.
Why is this so interesting? Well, let's look at some facts.
Both FanGraphs and Baseball-Reference adjust many of their statistics to incorporate park factors. ERA- and wRC+ are popular park-adjusted statistics that FanGraphs publishes.
The Giants' team ERA- is 96 (anything below 100 is above-average) and their wRC+ is 95 (anything above 100 is above-average). So, based on those statistics, the Giants have an above-average pitching staff and a below-average offense.
I think most of us would accept that.
According to Baseball-Reference, the Giants' team ERA+ is 93 (anything above 100 is above-average), and their team OPS+ is 105 (anything above 100 is above-average). So B-R's park-adjusted statistics claim the exact opposite of what the comparable FanGraphs numbers tell us.
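Because these indexes point in different directions, they are easy to misread. The sketch below encodes only the conventions stated above (ERA- lower is better; ERA+, wRC+ and OPS+ higher is better, with 100 as league average) and applies them to the Giants' 2012 figures quoted in this article; `describe` is a hypothetical helper.

```python
# Direction of each park-adjusted index, as stated in the text.
LOWER_IS_BETTER = {"ERA-"}
HIGHER_IS_BETTER = {"ERA+", "wRC+", "OPS+"}


def describe(stat: str, value: float) -> str:
    """Label an index value (league average = 100) as above- or below-average."""
    if value == 100:
        return "average"
    if stat in LOWER_IS_BETTER:
        better = value < 100
    elif stat in HIGHER_IS_BETTER:
        better = value > 100
    else:
        raise ValueError(f"unknown stat: {stat}")
    return "above-average" if better else "below-average"


if __name__ == "__main__":
    giants_2012 = {"ERA-": 96, "wRC+": 95, "ERA+": 93, "OPS+": 105}
    for stat, value in giants_2012.items():
        print(stat, value, describe(stat, value))
```

The FanGraphs pair (ERA-, wRC+) labels the pitching above average and the offense below average, while the B-R pair (ERA+, OPS+) labels them the other way around.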
In the statement I quoted earlier, Tango said it was possible that B-R's park adjustments were more extreme than FanGraphs'. This issue has also been raised on Twitter by both Baseball Prospectus's Bradley Ankrom and BtBS's own Julian Levine.
A few months ago, Grant Brisbee of the SB Nation Giants blog McCovey Chronicles wrote a post addressing this season's seemingly high park adjustment for AT&T Park.
According to DRS, the Giants have had a well-below-average defense this season (-30 runs). Given the pattern we saw with the pitchers discussed earlier in the article, we'd expect rWAR to compensate for this bad defense by boosting the Giants' individual pitching statistics.
However, their entire rotation has a higher RA9-Wins than rWAR, by a significant margin:
| Pitcher | RA9-Wins | Ad. rWAR | Expec. rWAR | Squared Residual |
|---|---|---|---|---|
In my opinion, the only real way to explain this sharp break from the pattern found earlier is that B-R applies an extreme park adjustment to AT&T Park this season.
RA9-Wins has somewhat bridged the gap between fWAR and rWAR for pitchers, which is a very good thing.
It's especially good because fans now have the option of deciding for themselves how much weight to give different aspects of pitching when evaluating pitchers.
You can follow Glenn on twitter @Glenn_DuPaul