Rethinking pitcher overperformance with DRA and cFIP

The classic ERA-FIP gap, which sabermetricians have used to evaluate overperformance for years, doesn't work all that well. Instead, we have a few different alternatives to gauge pitchers who surpass expectations.

No matter what approach you take, Ziegler's incredible career stands apart.
Kevin Jairaj-USA TODAY Sports

Any self-respecting baseball stathead knows about FIP. Tom Tango — the sabermetric doyen, as Henry Druschel refers to him — created the statistic to model ERA, and overall it does a pretty good job at that task. fWAR derives its values from FIP, and many of us rely on FIP to evaluate pitchers. In most cases, strikeouts, walks, and home runs will tell us all we need to know.
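For readers who want the arithmetic behind those three components, here is a minimal sketch of FIP in its commonly published form: weighted home runs, walks plus hit batters, and strikeouts per inning, plus a league constant that puts the result on the ERA scale. The constant of 3.10 is an assumed placeholder (the real value is recalculated every season), and the stat line is invented for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    `constant` is an assumed placeholder; the real value changes each
    season so that league-average FIP matches league-average ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not a real pitcher.
example = fip(hr=20, bb=50, hbp=5, k=180, ip=200.0)
print(f"FIP: {example:.3f}")
```

Note that nothing about balls in play enters the calculation, which is exactly the gap the newer metrics try to close.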

When FIP diverges from ERA, it yields another fascinating figure: the ERA-FIP difference (clever, no?). Over a small sample, this will tell us who's theoretically been lucky or unlucky. In a larger sample, for pitchers such as Chris Young on one end of the spectrum and Ricky Nolasco on the other, it provides an interesting source of analysis. What causes these pitchers to beat their peripherals? When can we write off random variation? For how long have they done so, and for how long will they continue to do so? Captivating questions, all of these.

But, of course, FIP has its flaws. Pitchers can control other elements of the game — some limit hard contact better than others, and some will melt down with runners on base. Because of pitch framing, fluky umpiring, park factors, and a host of other variables, they don't necessarily control the three true outcomes, either. We need a new metric with which to appraise pitchers.

Or, rather, we needed a new metric. About a year ago, we got two such statistics, each with its own value and purpose. In March, Jonathan Judge debuted Contextual FIP, or cFIP, at The Hardball Times. Then in April, Judge, Harry Pavlidis, and Dan Turkenkopf unveiled Deserved Run Average, or DRA, at Baseball Prospectus. Aside from better reflecting how well a player actually did, these metrics give us new differentials to examine.

Let's begin with DRA, which attempts to model a pitcher's runs allowed per nine innings*. The nutshell explanation of the formula is as follows: Compute the value of each plate appearance outcome for a pitcher (a strikeout, a popup, a walk, a single, everything); adjust for all sorts of contextual elements, big and small (not just parks and batter quality — think base-out state); and input the pitcher's ability to hold runners (which pitchers do control) and their ability to avoid wild pitches and passed balls. The actual formula is ridiculous, but this should summarize it cleanly.

*That means it takes into account earned and unearned runs, as rWAR does.

DRA goes back only to 1953, and pitchers who played before that don't have career values. So, for instance, Hoyt Wilhelm — who compiled 2,254.1 major-league innings over 21 seasons — has an overall DRA of zero, because he pitched 159.1 innings in 1952 (his first season). Over the 63 years that DRA encompasses, 1,312 pitchers have worked at least 500 innings, which will serve as our sample. Below, you'll see their career RAs and DRAs:

For the most part, as you can see, DRA correlates pretty well to runs allowed. With that said, some exceptions do exist. These ten pitchers, by RA-DRA, have overperformed the most:

Rank Name IP RA DRA RA-DRA
1 Vance Worley 508.7 4.25 5.16 -0.91
2 Scott Linebrink 656.7 3.84 4.71 -0.87
3 John Franco 1,245.7 3.37 4.19 -0.82
4 Alan Mills 636.0 4.33 5.14 -0.81
5 Doug Rau 1,261.0 3.65 4.46 -0.81
6 Dave Smith 809.3 3.11 3.91 -0.80
7 Joe Beimel 680.0 4.38 5.17 -0.79
8 Bob Keegan 644.7 4.19 4.98 -0.79
9 Jair Jurrjens 767.3 4.03 4.81 -0.78
10 Tom Niedenfuer 653.0 3.46 4.24 -0.78

Likewise, these ten pitchers have underperformed the most:

Rank Name IP RA DRA RA-DRA
1 Kevin Ritz 753.3 5.79 4.91 0.88
2 Frank Rodriguez 654.0 6.11 5.25 0.86
3 Jordan Lyles 552.7 5.73 4.87 0.86
4 John Doherty 521.3 5.46 4.60 0.86
5 Jay Hook 752.7 5.76 4.94 0.82
6 Luke Hochevar 892.0 5.38 4.57 0.81
7 Bobby Ayala 576.0 5.52 4.74 0.78
8 Len Barker 1,323.7 4.73 3.95 0.78
9 John Thomson 1,270.3 5.20 4.44 0.76
10 Brian Bannister 667.3 5.49 4.74 0.75
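The ranking behind these two tables can be sketched in a few lines. The rows below are copied from the tables above (only a subset, for brevity); the function name and data layout are my own illustration, not the tooling used for the article. Because the published RA and DRA are already rounded, a gap recomputed this way could differ from the listed one by 0.01 for some pitchers.

```python
# (name, RA, DRA) copied from the two tables above; a small subset for illustration.
pitchers = [
    ("Vance Worley", 4.25, 5.16),
    ("Scott Linebrink", 3.84, 4.71),
    ("John Franco", 3.37, 4.19),
    ("Kevin Ritz", 5.79, 4.91),
    ("Frank Rodriguez", 6.11, 5.25),
    ("Len Barker", 4.73, 3.95),
]


def ra_minus_dra(rows):
    """Return (name, RA-DRA) pairs sorted from most negative to most positive."""
    gaps = [(name, round(ra - dra, 2)) for name, ra, dra in rows]
    return sorted(gaps, key=lambda pair: pair[1])


ranked = ra_minus_dra(pitchers)
overperformers = ranked[:3]          # RA well below DRA
underperformers = ranked[-3:][::-1]  # RA well above DRA, largest gap first

for name, gap in overperformers + underperformers:
    print(f"{name:<16} {gap:+.2f}")
```

A negative gap means the pitcher allowed fewer runs than DRA says he deserved; a positive gap means the opposite.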

Some of these players have similar ERA-FIP gaps, but not all. Worley, by this measure the luckiest pitcher in recent history, has a lifetime ERA- and FIP- of 99 and 98, respectively. For the group as a whole, the difference between ERA- and FIP- accounts for just 23.9 percent of the variance in RA-DRA. In simpler terms, ERA-FIP correlates rather poorly with RA-DRA.

Now we'll move on to cFIP. Unlike DRA, which focuses on describing a pitcher's output, cFIP sets out to predict it. (In that way, these two metrics respectively resemble ERA and FIP.) It revolves around the central components of FIP — strikeouts, walks, and home runs, as mentioned earlier — while taking into account, as its name suggests, the context. Batter handedness, home-field advantage, and umpire history, among other variables, affect the final product.

Here, we'll look at the difference between FIP- and cFIP, which will tell us how much higher or lower a pitcher's FIP likely should have been. Although cFIP extends back a bit further, to 1950, we'll use the same sample as before. Below, you can see every pitcher's FIP- and cFIP:

As before, there's an evident relationship here. Yet, also as before, we have plenty of overperformers...

Rank Name IP FIP- cFIP (FIP-)-(cFIP)
1 Brad Ziegler 528.7 86 100 -14
2 Hal Woodeshick 847.3 99 113 -14
3 Chad Bradford 515.7 79 92 -13
4 Fred Newman 610.0 97 110 -13
5 Don Schwall 743.0 106 119 -13
6 Doug Sisk 523.3 109 121 -12
7 Carl Morton 1,648.7 101 113 -12
8 Bob Purkey 2,114.7 98 110 -12
9 Javier Lopez 506.7 93 104 -11
10 Burke Badenhop 512.3 90 101 -11

...and underperformers:

Rank Name IP FIP- cFIP (FIP-)-(cFIP)
1 Herb Score 858.3 100 80 20
2 Dick Stigman 922.7 111 92 19
3 Steve Dunning 613.7 127 108 19
4 Phil Ortega 951.7 132 114 18
5 Bob Johnson 692.3 108 92 16
6 Bill Caudill 667.0 96 81 15
7 Art Mahaffey 999.0 115 100 15
8 Bill Greif 715.7 114 99 15
9 Danny Frisella 609.3 106 92 14
10 Ray Narleski 702.0 108 94 14

On the one hand, Ziegler also has a huge difference between his ERA and FIP — the third-largest among active pitchers. On the other hand, that doesn't apply to many other people on these lists. The r-squared for (ERA-)-(FIP-) and (FIP-)-(cFIP) sits at a dismal .004. So this has absolutely nothing to do with ERA and FIP.
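The article doesn't spell out how the r-squared figures were calculated. A reasonable assumption is that they are the squared Pearson correlation between two differentials across the 1,312-pitcher sample, and the sketch below shows that calculation under that assumption. The function name and the sample gaps are hypothetical, not values from the spreadsheet.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when either sequence is constant")
    return cov ** 2 / (var_x * var_y)


# Hypothetical differentials for five pitchers, for illustration only.
era_minus_fip_gap = [-10, -4, 0, 3, 8]
fip_minus_cfip_gap = [2, -5, 1, -3, 4]
print(f"r-squared: {r_squared(era_minus_fip_gap, fip_minus_cfip_gap):.3f}")
```

A value near zero, like the .004 quoted above, means that knowing one gap tells you almost nothing about the other; a value of 1 would mean the two move in lockstep.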

Finally, we'll combine DRA and cFIP. The former takes a descriptive approach, the latter a predictive one, and when brought together, they form the best version of ERA-FIP that we can hope for. Take a look for yourself:

While the relationship doesn't possess the strength of the preceding two, it clearly exists. That doesn't concern us, though. Without further ado, here are the ten biggest overperformers:

Rank Name IP DRA- cFIP (DRA-)-cFIP
1 Brad Ziegler 528.7 79 100 -21
2 Steve Kline 750.3 95 115 -20
3 Bob Buhl 2,587.0 99 116 -17
4 Jesse Crain 532.0 78 95 -17
5 Bob Purkey 2,114.7 94 110 -16
6 Greg Minton 1,130.7 91 107 -16
7 Shelby Miller 575.3 88 104 -16
8 Joe Smith 518.7 80 96 -16
9 Dave Stieb 2,895.3 80 96 -16
10 Doug Sisk 523.3 106 121 -15

And here are the ten biggest underperformers:

Rank Name IP DRA- cFIP (DRA-)-cFIP
1 Brad Lidge 603.3 95 73 22
2 Dave Tomlin 511.3 119 99 20
3 Mike Paul 627.7 115 95 20
4 Bill Greif 715.7 118 99 19
5 Claude Raymond 721.0 113 94 19
6 Vance Worley 508.7 121 103 18
7 Brian Matusz 519.7 116 98 18
8 Shawn Camp 592.3 115 97 18
9 George Stone 1,020.7 115 97 18
10 Kevin Slowey 662.0 114 96 18

(ERA-)-(FIP-) has an r-squared of .244 with (DRA-)-cFIP, meaning a decent but subpar correlation. The presence of Worley on the list confirms that suspicion, and even Ziegler's craziness can't .

For these metrics, the same general principles as ERA-FIP apply. When a pitcher has a large difference between, say, his RA and DRA in one season, luck probably explains it. When he maintains that disparity over several seasons, it may stick around long-term. Someone such as Ziegler, who has always blown past his indicators, could continue to do so going forward. Exercise caution until the sample grows large enough to trust.

All three of these differentials have their uses. As a replacement for ERA-FIP, however, I would use DRA- and cFIP. That way, you get one descriptive metric and one predictive metric, as you do with ERA and FIP. But RA-DRA still has use, and so does (FIP-)-cFIP. In distinct ways, each will tell you how well a pitcher should have fared.

I apologize for failing to offer much analysis immediately. By writing this article, I seek to outline these new ways of thinking about over- and underperformance. Young has held our attention for so long; maybe we should start looking at Worley the same way. Sabermetrics never stops evolving, and the way we look at the game must change with it.

Click here for the full spreadsheet of all 1,312 pitchers. Huge thanks to John Choiniere for collecting the career cFIP values.

. . .

An earlier version of this article incorrectly stated that cFIP goes back to 1951.

Ryan Romano is an editor for Beyond the Box Score. He also writes about the Orioles on Camden Depot (and on Camden Chat that one time), and about the Brewers on BP Milwaukee.