Any self-respecting baseball stathead knows about FIP. Tom Tango — the sabermetric doyen, as Henry Druschel refers to him — created the statistic to model ERA, and overall it does a pretty good job at that task. fWAR derives its values from FIP, and many of us rely on FIP to evaluate pitchers. In most cases, strikeouts, walks, and home runs will tell us all we need to know.
When FIP diverges from ERA, it yields another fascinating figure: the ERA-FIP difference (clever, no?). Over a small sample, this will tell us who's theoretically been lucky or unlucky. In a larger sample — for pitchers such as Chris Young on one end of the spectrum and Ricky Nolasco on the other — it provides an interesting source of analysis. What causes these pitchers to beat their peripherals? When can we write off random variation? For how long have they done so, and for how long will they continue to do so? Captivating questions, all of these.
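For readers who want the arithmetic behind that differential, here is a minimal sketch using the commonly published form of FIP. The league constant is an assumption (it is recalculated every season so that league FIP matches league ERA), and the season line is hypothetical, not a real pitcher's.

```python
# Sketch of FIP and the ERA-FIP gap under the commonly published formula.
# ASSUMED_FIP_CONSTANT is a placeholder: the real constant changes each season.
ASSUMED_FIP_CONSTANT = 3.10


def innings_to_float(ip_notation: str) -> float:
    """Convert baseball notation such as '159.1' (159 and 1/3 innings) to a float."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def fip(hr, bb, hbp, k, ip, constant=ASSUMED_FIP_CONSTANT):
    """Fielding Independent Pitching from home runs, walks, hit batters, and strikeouts."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(earned_runs, ip):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / ip


# Hypothetical season line, for illustration only.
ip = innings_to_float("200.0")
season_fip = fip(hr=20, bb=50, hbp=5, k=180, ip=ip)
season_era = era(earned_runs=70, ip=ip)

# Negative gap: the pitcher allowed fewer earned runs than his peripherals suggest.
era_minus_fip = season_era - season_fip
print(f"ERA {season_era:.2f}, FIP {season_fip:.2f}, ERA-FIP {era_minus_fip:+.2f}")
```

A negative ERA-FIP marks the Chris Young end of the spectrum, and a positive one marks the Ricky Nolasco end.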
But, of course, FIP has its flaws. Pitchers can control other elements of the game — some limit hard contact better than others, and some will melt down with runners on base. Because of pitch framing, fluky umpiring, park factors, and a host of other variables, they don't necessarily control the three true outcomes, either. We need a new metric with which to appraise pitchers.
Or, rather, we needed a new metric. About a year ago, we got two such statistics, each with its own value and purpose. In March, Jonathan Judge debuted Contextual FIP, or cFIP, at The Hardball Times. Then in April, Judge, Harry Pavlidis, and Dan Turkenkopf unveiled Deserved Run Average, or DRA, at Baseball Prospectus. Aside from better reflecting how well a player actually did, these metrics give us new differentials to examine.
Let's begin with DRA, which attempts to model a pitcher's runs allowed per nine innings*. The nutshell explanation of the formula is as follows: Compute the value of each plate appearance outcome for a pitcher (a strikeout, a popup, a walk, a single, everything); adjust for all sorts of contextual elements, big and small (not just parks and batter quality — think base-out state); and input the pitcher's ability to hold runners (which pitchers do control) and their ability to avoid wild pitches and passed balls. The actual formula is ridiculous, but this should summarize it cleanly.
*That means it takes into account earned and unearned runs, as rWAR does.
DRA goes back only to 1953, and pitchers who played before that don't have career values. So, for instance, Hoyt Wilhelm — who compiled 2,254.1 major-league innings over 21 seasons — has an overall DRA of zero, because he pitched 159.1 innings in 1952 (his first season). Over the 63 years that DRA encompasses, 1,312 pitchers have worked at least 500 innings, which will serve as our sample. Below, you'll see their career RAs and DRAs:
For the most part, as you can see, DRA correlates pretty well with runs allowed. With that said, some exceptions do exist. These ten pitchers, by RA-DRA, have overperformed the most:
Likewise, these ten pitchers have underperformed the most:
Some of these players have similar ERA-FIP gaps, but not all. Worley, by this measure the luckiest pitcher in recent history, has a lifetime ERA- and FIP- of 99 and 98, respectively. For the group as a whole, the gap between ERA- and FIP- accounts for just 23.9 percent of the variance in the gap between RA and DRA (an r-squared of .239). In simpler terms, ERA-FIP correlates rather poorly with RA-DRA.
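To make the "percent of the variance" figure concrete, here is a rough sketch of how an r-squared between two differentials can be computed. The function name and the five data points are hypothetical and chosen only to show the mechanics; they are not drawn from the 1,312-pitcher sample.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation: the share of variance in ys linearly explained by xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no variance to explain")
    return cov * cov / (var_x * var_y)


# Hypothetical differentials for five pitchers, for illustration only.
era_minus_gap = [-12, -5, 0, 4, 9]              # (ERA-) minus (FIP-), index points
ra_dra_gap = [-0.40, 0.10, -0.05, 0.30, 0.15]   # RA minus DRA, runs per nine

print(f"r-squared: {r_squared(era_minus_gap, ra_dra_gap):.3f}")
```

An r-squared near 1 would mean the two gaps move together almost perfectly, while a value near 0 would mean one tells you almost nothing about the other.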
Now we'll move on to cFIP. Unlike DRA, which focuses on describing a pitcher's output, cFIP sets out to predict it. (In that way, these two metrics respectively resemble ERA and FIP.) It revolves around the central components of FIP — strikeouts, walks, and home runs, as mentioned earlier — while taking into account, as its name suggests, the context. Batter handedness, home-field advantage, and umpire history, among other variables, affect the final product.
Here, we'll look at the difference between FIP- and cFIP, which will tell us how much higher or lower a pitcher's FIP likely should have been. Although cFIP extends back a bit further, to 1950, we'll use the same sample as before. Below, you can see every pitcher's FIP- and cFIP:
As before, there's an evident relationship here. Yet, as before, we have plenty of overperformers...
On the one hand, Ziegler also has a huge difference between his ERA and FIP — the third-largest among active pitchers. On the other hand, that doesn't apply to many other people on these lists. The r-squared for (ERA-)-(FIP-) and (FIP-)-(cFIP) sits at a dismal .004. So this has absolutely nothing to do with ERA and FIP.
Finally, we'll combine DRA and cFIP. The former takes a descriptive approach, the latter a predictive one, and when brought together, they form the best version of ERA-FIP that we can hope for. Take a look for yourself:
While the relationship doesn't possess the strength of the preceding two, it clearly exists. That doesn't concern us, though. Without further ado, here are the ten biggest overperformers:
And here are the ten biggest underperformers:
(ERA-)-(FIP-) has an r-squared of .244 with (DRA-)-cFIP, meaning a decent but subpar correlation. The presence of Worley on the list confirms that suspicion, and even Ziegler's craziness can't .
For these metrics, the same general principles as ERA-FIP apply. When a pitcher has a large difference between, say, his RA and DRA in one season, he's probably been lucky or unlucky. When he maintains that disparity over several seasons, it may stick around long-term. Someone such as Ziegler, who has always blown past his indicators, could continue to do so going forward. Exercise caution until the sample increases to a safe amount.
All three of these differentials have their uses. As a replacement for ERA-FIP, however, I would use DRA- and cFIP. That way, you get one descriptive metric and one predictive metric, as you do with ERA and FIP. But RA-DRA still has use, and so does (FIP-)-cFIP. In distinct ways, each will tell you how well a pitcher should have fared.
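As a rough illustration of how the three differentials could be computed and ranked, here is a small sketch. The `PitcherCareer` record, its field names, and the three sample pitchers are all hypothetical; only the 500-inning cutoff and the top-ten list length come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class PitcherCareer:
    name: str
    innings: float
    ra: float         # runs allowed per nine innings
    dra: float        # Deserved Run Average
    fip_minus: float  # FIP-, where 100 is league average
    dra_minus: float  # DRA-, where 100 is league average
    cfip: float       # cFIP, where 100 is league average


def differentials(p):
    """The three gaps discussed above; negative means the pitcher overperformed."""
    return {
        "RA-DRA": p.ra - p.dra,
        "(FIP-)-cFIP": p.fip_minus - p.cfip,
        "(DRA-)-cFIP": p.dra_minus - p.cfip,
    }


def biggest_overperformers(pitchers, key, min_innings=500, top=10):
    """Qualified pitchers sorted from most negative gap (largest overperformance) upward."""
    qualified = [p for p in pitchers if p.innings >= min_innings]
    return sorted(qualified, key=lambda p: differentials(p)[key])[:top]


# Hypothetical careers, for illustration only.
sample = [
    PitcherCareer("Pitcher A", 1200.0, 3.80, 4.30, 98, 101, 104),
    PitcherCareer("Pitcher B", 800.0, 4.50, 4.10, 105, 96, 99),
    PitcherCareer("Pitcher C", 300.0, 3.00, 4.50, 90, 110, 108),  # under 500 IP, excluded
]

for p in biggest_overperformers(sample, "RA-DRA"):
    print(p.name, f"{differentials(p)['RA-DRA']:+.2f}")
```

Note that RA-DRA is on a runs-per-nine scale while the other two gaps are in index points, so the three lists should be compared by rank rather than by raw size.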
I apologize for failing to offer much analysis immediately. By writing this article, I seek to outline these new ways of thinking about over- and underperformance. Young has held our attention for so long; maybe we should start looking at Worley the same way. Sabermetrics never stops evolving, and the way we look at the game must change with it.
. . .
An earlier version of this article incorrectly stated that cFIP goes back to 1951.