Ever since the dawn of man (okay, about a year ago), I've wanted to put the different WAR metrics on the same scale, so that we can compare and contrast the scores from the different systems. Well, last week I began releasing my findings, and we'll continue that today with an exploration of 2012's (mostly) starting pitchers.
I say "(mostly)" because for this sample I pulled every pitcher with more than 100 innings pitched in 2012, and some of those guys threw relief innings in addition to starting. You can't win them all (literally, Cliff Lee). I used this cutoff so that I'd have a manageable set of data, and a sample almost exactly the same size as my qualified hitters list. For what it's worth, the pitchers here account for more than half the total innings pitched in the majors in 2012.
For a more detailed picture of the methodology I use to scale these scores, check out the original article. For 2012, I computed the following constants to adjust fWAR (FanGraphs' WAR) and WARP (Baseball Prospectus's WARP) to the level of rWAR (Baseball-Reference's WAR):
2012 fWAR to rWAR Conversion: -0.00238 fWAR/IP
2012 WARP to rWAR Conversion: 0.00241 WARP/IP
Hey, that's pretty interesting. For 2012, the gap between fWAR and rWAR and the gap between rWAR and WARP are practically identical in size, just pointing in opposite directions. That's kind of cool. The two are also very close in 2011, but the further back we go, the more things shift and change.
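To make the arithmetic concrete, here is a minimal sketch of how per-inning constants like these could be applied. It assumes the adjustment is additive (raw score plus constant times innings pitched) and that the index is a plain average of the three rescaled scores; the original methodology article is the authority on both points. The function names and the sample pitcher are made up for illustration.

```python
# Per-inning constants quoted above for 2012.
FWAR_TO_RWAR_PER_IP = -0.00238
WARP_TO_RWAR_PER_IP = 0.00241


def adjust(raw_war, innings, per_ip_constant):
    """Shift a raw score toward the rWAR scale (assumed additive adjustment)."""
    return raw_war + per_ip_constant * innings


def war_index(fwar, rwar, warp, innings):
    """Average the three scores after rescaling (assumed definition of the index)."""
    fwar_adj = adjust(fwar, innings, FWAR_TO_RWAR_PER_IP)
    warp_adj = adjust(warp, innings, WARP_TO_RWAR_PER_IP)
    return (fwar_adj + rwar + warp_adj) / 3


# Hypothetical pitcher: 200 IP, 4.0 fWAR, 3.5 rWAR, 3.0 WARP.
print(round(adjust(4.0, 200, FWAR_TO_RWAR_PER_IP), 2))
print(round(war_index(4.0, 3.5, 3.0, 200), 2))
```

Under these assumptions, a 200-inning pitcher would lose a little under half a win of fWAR and gain a little under half a win of WARP on the way to the common scale.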
I'd also like to point out one major thing about the different WAR(P) systems: when it comes to pitching, the inputs used to create WAR vary greatly. For example, rWAR is built around RA9 (the number of runs a pitcher gives up per nine innings), while fWAR is built around FIP (which looks only at the strikeouts, walks and home runs a pitcher gives up). Each system has its own merits, but given the way these are built, I think it's fair to expect the differences between the systems to be a bit more pronounced than they are for hitting, where the stats and inputs are more closely aligned across systems.
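A rough sketch of the two inputs shows why they can disagree. RA9 and FIP are computed here with their standard public formulas; the FIP constant of 3.10 is only a placeholder (the real value is recalculated every season), and the stat line is invented.

```python
def ra9(runs, innings):
    """Runs allowed (earned or not) per nine innings."""
    return 9 * runs / innings


def fip(hr, bb, hbp, k, innings, constant=3.10):
    """Fielding Independent Pitching; the constant is a placeholder that varies by season."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Invented stat line: 200 IP, 80 R, 20 HR, 50 BB, 5 HBP, 180 K.
print(round(ra9(80, 200), 2))
print(round(fip(20, 50, 5, 180, 200), 2))
```

Notice that runs allowed never appear in the FIP calculation, and strikeouts never appear in RA9, so a pitcher who strands a lot of runners or gets hurt by sequencing can look very different through each lens.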
Anyways, on to the "good stuff."
| Name | Team | IP | fWAR | fWAR (adj.) | rWAR | WARP | WARP (adj.) | WARi |
Well, all of the systems can at least agree on one thing: Justin Verlander is the best pitcher in baseball. Each system gave Verlander more WAR than any other pitcher, so he sits atop the WAR Index. I'm not surprised at all.
From there, things get a bit dicier. Adjusted fWAR has Felix Hernandez at No. 2, rWAR advocates for David Price at No. 2, and as for WARP, you have to go all the way down to No. 19 on the overall WARi leaderboard to find its candidate for second-best pitcher in the majors: Stephen Strasburg.
Given that the WARs are counting stats, it is incredibly impressive that pitchers with limited innings counts like Strasburg and Kris Medlen even managed to crack the top 20. Each of these guys had one particular system that "weighed them down" a little bit, with Medlen only worth 3.2 adjusted WARP and Strasburg only garnering 2.7 rWAR.
Though no system had him second overall, Dodgers hurler Clayton Kershaw easily found himself in second place overall by WARi, with the only indexed score above 5.0 other than Verlander. Though R.A. Dickey is getting much of the NL Cy Young talk, Kershaw has been very, very good again in 2012, and deserves consideration at least, and Cy votes at best.
Also, this points at something very important: overall starting pitching WARi scores don't look nearly the same as hitter WAR scores. While B-R advocates the same performance structure for pitchers as they do hitters (WAR of 0 = replacement-level, WAR of 2 = average starter, WAR of 5 = All-Star and WAR of 8 = Cy Young / MVP candidate), we find very few players on any system, or in the WAR Index, who sit at or around 5 wins.
Even if we lower our standards for "All-Star" performance to four WARi, we'd only have 13 starters who fit that criterion. Since there is so much less starting-pitcher WAR to go around, maybe the scale needs to be adjusted. Even rWAR only had nine starters at five or more wins above replacement, and that's still many more than either adjusted fWAR or adjusted WARP.
| Name | Team | IP | fWAR | fWAR (adj.) | rWAR | WARP | WARP (adj.) | WARi |
Once again, we have something that nearly every system can agree upon: Ervin Santana was completely awful in 2012. By adjusted fWAR, he was the worst, and he was close to the worst in the other two systems. He's the rare case where all three systems found him to be worth roughly a win and a half below replacement -- especially interesting given that all three systems only found agreement on five starters being worse than replacement in the whole league.
You're unlikely to find a lot of pitchers who rack up loads of negative WAR, presumably because with WAR being a "counting stat" most poorly-performing pitchers don't get enough innings to really rack up negative scores. Santana, Ricky Romero and Tim Lincecum are exceptions, given that they were reasonable pitchers in 2011 with sizeable contracts who wouldn't be sent down to the minors to work. Ubaldo Jimenez would fall under this category too ... if his 2011 was any good, which it wasn't. Name recognition!
Other than those guys, the other pitchers who racked up low WAR over lots of innings are the pitchers for teams who have no good replacement. This includes Clayton Richard (over 200 innings still has some value, right?), Henderson Alvarez (yuck!), Kevin Correia (ay-ya-yi!) and Luke Hochevar (yikes!).
It's pretty impressive that Hector Noesi could put up such terrible numbers despite (1) pitching at Safeco for some of those innings and (2) grabbing a few innings in relief as well. Boy, that Michael Pineda trade didn't work out too well for either side, did it?
Did you notice that three of the 12 worst pitchers by WARi pitched for the Toronto Blue Jays? I thought I noticed you noticing. Time to go fishing on the free agent market, Mr. Anthopoulos!
The Biggest Differences Between Systems
Time to take a look at how the different systems showed the biggest differences (deltas) between 2012 performances. Below, you'll find the top-10 deltas for each of the three system comparisons, along with a couple of notes, which should be a ton of fun*.
* Note: Fun still not guaranteed.
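For readers who want to build this kind of ranking themselves, here is a minimal sketch of how the deltas could be computed. The rows are invented placeholders rather than real 2012 scores, and the field and function names are my own.

```python
from itertools import combinations

# Invented placeholder rows, not real 2012 data.
pitchers = [
    {"name": "Pitcher A", "fwar_adj": 3.1, "rwar": 5.0, "warp_adj": 2.0},
    {"name": "Pitcher B", "fwar_adj": 1.0, "rwar": -1.5, "warp_adj": 0.5},
    {"name": "Pitcher C", "fwar_adj": 2.0, "rwar": 2.2, "warp_adj": 2.1},
]

SYSTEMS = ("fwar_adj", "rwar", "warp_adj")


def top_deltas(rows, system_a, system_b, n=10):
    """Return the n rows with the largest absolute gap between two systems."""
    ranked = sorted(rows, key=lambda r: abs(r[system_a] - r[system_b]), reverse=True)
    return [(r["name"], round(r[system_a] - r[system_b], 1)) for r in ranked[:n]]


# One leaderboard per pair of systems.
for a, b in combinations(SYSTEMS, 2):
    print(a, "vs", b, top_deltas(pitchers, a, b, n=2))
```

Sorting on the absolute gap while reporting the signed difference keeps both pieces of information: how far apart the systems are, and which one liked the pitcher better.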
Differences between adjusted WARP / rWAR
So, this is, uh, interesting. Step on up, Matt Harrison! You are a conundrum! Harrison followed up his very-good 2011 with another solid season, earning his first trip to the All-Star game in the process. But while Harrison didn't massively outperform his FIP in 2011, he did in 2012, posting a very nice LOB% (78.6%) despite striking out fewer than six batters per nine. According to rWAR, Harrison's RA9 of about 3.46 over all those innings made him the third- or fourth-best pitcher in baseball, right on par with Clayton Kershaw. WARP takes a much dimmer view, giving Harrison only 2.1 WARP even after adjustment. This probably has to do with Harrison's 4.67 FRA, which actually looks to be worse than league average. In this analyst's opinion, the rWAR score doesn't pass the smell test, and the WARP score looks a little low for a guy who threw 200+ innings (a bunch of them in Arlington) and managed a 3.29 ERA. The index looks to be a bit more representative of his actual performance than either of these two outlier inputs.
Johnny Cueto, David Price and Hiroki Kuroda all rated as All-Star caliber starters by rWAR, but WARP saw them as much, much closer to just above-average contributors. Price and Cueto are both considered likely to get Cy Young votes, while Kuroda thrived in his first MLB season outside the friendly confines of Chavez Ravine.
Luke Hochevar, Brian Duensing and Clayton Richard are our three pitchers where one system thought they were *really* terrible, and the other system just thought they were about replacement level. All three of them averaged out very close to replacement-level, and I'd guess that none of them are guaranteed a rotation slot next season.
Differences between adjusted fWAR / rWAR
A long, long time ago (okay, two years ago), in a land far, far away (okay, San Francisco), Tim Lincecum was the best of the best. In 2012, he was, well, not. rWAR puts Lincecum as being very, very terrible in 2012, worth two wins less than replacement (Dan Runzler?) for the Giants. Think the Giants could have used a couple extra wins this season? At any rate, adjusted fWAR has a different view, having Lincecum at 1.1 fWAR, which is a bit better than replacement. Lincecum had a terrible ERA (5.18), which was inflated by a very, very unusual 1.11 HR/9, but his strikeout numbers remained strong, keeping his FIP manageable (4.18). Still, that's a recipe for a large difference between rWAR and fWAR.
From a peripheral perspective, Adam Wainwright's 2012 didn't look much different from his last healthy season in 2010 (he missed all of 2011 after Tommy John surgery). He struck out around the same number of hitters per PA, and his walk and HR numbers didn't shoot up. So why would Waino only be worth about 0.9 rWAR? Well, Adam Wainwright was crazy unlucky this season, posting a 67.8 LOB%. According to the FanGraphs Library, that rates somewhere around "poor" for a season like 2012. You could also use the term "unlucky", though that's probably the first time that term has been used in reference to the 2012 Cardinals ...
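For anyone unfamiliar with strand rate, here is a quick sketch using the LOB% formula that FanGraphs publishes. The stat line is invented for illustration and is not Wainwright's actual line.

```python
def lob_pct(hits, walks, hbp, runs, home_runs):
    """Left-on-base percentage, following the published FanGraphs formula."""
    baserunners = hits + walks + hbp
    return (baserunners - runs) / (baserunners - 1.4 * home_runs)


# Invented stat line: 200 H, 50 BB, 5 HBP, 90 R, 15 HR.
print(round(100 * lob_pct(hits=200, walks=50, hbp=5, runs=90, home_runs=15), 1))
```

Because runs sit in the numerator, a pitcher who lets an unusual share of his baserunners score will post a low LOB% and a high RA9 even when his strikeout, walk and home run totals look normal, which is exactly the kind of season that splits rWAR and fWAR.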
Jeremy Hellickson and rookie Miguel Gonzalez are excellent examples of pitchers who *greatly* outperform their FIPs, with each pitcher more than a run better by ERA than they are by FIP. While Gonzalez is a rookie with only 105 IP under his belt, Hellboy is a different story entirely. You can read more about his "magical" ability to outperform his FIP here, but we've got two full seasons of Hellickson performing this trick in the majors now. There's a non-zero chance that he's found the secret sauce to success without massive strikeout, walk or HR rates, and rWAR gives him full credit for this.
Differences between adjusted WARP / adjusted fWAR
| Name | Team | fWAR (adj.) | WARP (adj.) | delta | WARi |
Another interesting set of names shows up here. We see smaller deltas, in general, when comparing adjusted fWAR to adjusted WARP. The biggest difference here is "Big Game" James Shields. Shields's ERA and FIP sit very close to one another (3.52 and 3.47, respectively), but Baseball Prospectus's FRA tells a very different story. Shields's FRA is 4.43 for the season, which is hardly an elite number. And this isn't an isolated incident, either. Shields's career FRA is 4.41, and WARP has never really seen him as an elite starter.
Two Braves starters, Tim Hudson and Paul Maholm, also show up as big deltas between fWAR and WARP. As with Shields, Hudson has similar ERA and FIP numbers under four, but his FRA is *way* out there at 5.13. Unlike Shields, Hudson has very, very pedestrian strikeout numbers. Maholm pitched quite a bit better after coming over to Atlanta mid-season, but that's not enough for WARP to consider Maholm much above replacement-level.
There may be a "big" difference between Felix Hernandez's fWAR and WARP, but all systems see him as an upper-tier starter. A difference of 1.4 WAR isn't insignificant, but when it comes to a player like Felix, it means a bit less than it might to a player on the borderline between replacement-level and league-average.
Bruce Chen, man. All I can do is shake my head.
You can find all the continued work that I'm doing on WARi in one place, at this StoryStream. And, as usual, I'd really love to hear from our readers in the comments section below. What areas do you want to see me explore further in WARi? Are there any questions about methodology you'd like answered? What's the next step you'd like to see us take? What years / players / teams would you like to see data for?