
Rethinking pitcher overperformance with DRA and cFIP

The classic ERA-FIP gap, which sabermetricians have used to evaluate overperformance for years, doesn't work all that well. Instead, we have a few different alternatives to gauge pitchers who surpass expectations.

By Ryan Romano (@triple_r_), Feb 29, 2016, 11:00am EST

No matter what approach you take, Ziegler's incredible career stands apart.

Any self-respecting baseball stathead knows about FIP. Tom Tango — the sabermetric doyen, as Henry Druschel refers to him — created the statistic to model ERA, and overall it does a pretty good job at that task. fWAR derives its values from FIP, and many of us rely on FIP to evaluate pitchers. In most cases, strikeouts, walks, and home runs will tell us all we need to know.

When FIP diverges from ERA, it yields another fascinating figure: the ERA-FIP difference (clever, no?). Over a small sample, this will tell us who's theoretically been lucky or unlucky. In a larger sample — for pitchers such as Chris Young on one end of the spectrum and Ricky Nolasco on the other — it provides an interesting source of analysis. What causes these pitchers to beat their peripherals? When can we write off random variation? For how long have they done so, and for how long will they continue to do so? Captivating questions, all of these.
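For readers who want to see the arithmetic behind the ERA-FIP difference, here is a minimal Python sketch. It uses the commonly published form of FIP: (13*HR + 3*(BB+HBP) - 2*K)/IP plus a league constant. That constant changes every season, so the 3.10 default below is only an assumed placeholder, and the function names and the sample stat line are hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the three true outcomes.

    `constant` rescales FIP to the league ERA and varies by season;
    3.10 is an assumed placeholder, not a value from the article.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(earned_runs, ip):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / ip


def era_minus_fip(earned_runs, hr, bb, hbp, k, ip, constant=3.10):
    """Negative values mean the pitcher beat his peripherals."""
    return era(earned_runs, ip) - fip(hr, bb, hbp, k, ip, constant)


# Hypothetical stat line, for illustration only.
gap = era_minus_fip(earned_runs=60, hr=20, bb=50, hbp=5, k=150, ip=180.0)
print(round(gap, 2))
```

A negative result marks a pitcher whose ERA came in below his FIP, which is the "overperformance" the rest of this article tries to measure more carefully.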

But, of course, FIP has its flaws. Pitchers can control other elements of the game — some limit hard contact better than others, and some will melt down with runners on base. Because of pitch framing, fluky umpiring, park factors, and a host of other variables, they don't necessarily control the three true outcomes, either. We need a new metric with which to appraise pitchers.

Or we needed a new metric. About a year ago, we got two such statistics, each with its own value and purpose. In March, Jonathan Judge debuted Contextual FIP, or cFIP, at The Hardball Times. Then in April, Judge, Harry Pavlidis, and Dan Turkenkopf unveiled Deserved Run Average, or DRA, at Baseball Prospectus. Aside from better reflecting how well a player actually did, these metrics give us new differentials to examine.

Let's begin with DRA, which attempts to model a pitcher's runs allowed per nine innings*. The nutshell explanation of the formula is as follows: Compute the value of each plate appearance outcome for a pitcher (a strikeout, a popup, a walk, a single, everything); adjust for all sorts of contextual elements, big and small (not just parks and batter quality — think base-out state); and input the pitcher's ability to hold runners (which pitchers do control) and their ability to avoid wild pitches and passed balls. The actual formula is ridiculous, but this should summarize it cleanly.

*That means it takes into account earned and unearned runs, as rWAR does.

DRA goes back only to 1953, and pitchers who played before that don't have career values. So, for instance, Hoyt Wilhelm — who compiled 2,254.1 major-league innings over 21 seasons — has an overall DRA of zero, because he pitched 159.1 innings in 1952 (his first season). Over the 63 years that DRA encompasses, 1,312 pitchers have worked at least 500 innings, which will serve as our sample. Below, you'll see their career RAs and DRAs:
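A note on innings: box-score figures such as 159.1 mean 159 innings plus one out, not 159.1 decimal innings, while the tables in this article show decimal thirds (508.7 for 508 2/3). Here is a small sketch of that conversion and of the 500-inning cutoff. The helper names are hypothetical, and this is not necessarily how the sample was actually assembled.

```python
def innings_to_decimal(ip_notation: str) -> float:
    """Convert box-score innings ('159.1' = 159 innings + 1 out) to a float."""
    whole, _, outs = ip_notation.replace(",", "").partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid outs digit in {ip_notation!r}")
    return int(whole) + outs / 3


def qualifies(ip_notation: str, minimum: float = 500.0) -> bool:
    """True if the innings total meets the threshold used for the sample."""
    return innings_to_decimal(ip_notation) >= minimum


print(round(innings_to_decimal("508.2"), 1))  # 508.7, as displayed in the tables
print(qualifies("508.2"))                     # True
print(qualifies("159.1"))                     # False
```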

For the most part, as you can see, DRA correlates pretty well to runs allowed. With that said, some exceptions do exist. These ten pitchers, by RA-DRA, have overperformed the most:

Rank	Name	IP	RA	DRA	RA-DRA
1	Vance Worley	508.7	4.25	5.16	-0.91
2	Scott Linebrink	656.7	3.84	4.71	-0.87
3	John Franco	1,245.7	3.37	4.19	-0.82
4	Alan Mills	636.0	4.33	5.14	-0.81
5	Doug Rau	1,261.0	3.65	4.46	-0.81
6	Dave Smith	809.3	3.11	3.91	-0.80
7	Joe Beimel	680.0	4.38	5.17	-0.79
8	Bob Keegan	644.7	4.19	4.98	-0.79
9	Jair Jurrjens	767.3	4.03	4.81	-0.78
10	Tom Niedenfuer	653.0	3.46	4.24	-0.78

Likewise, these ten pitchers have underperformed the most:

Rank	Name	IP	RA	DRA	RA-DRA
1	Kevin Ritz	753.3	5.79	4.91	0.88
2	Frank Rodriguez	654.0	6.11	5.25	0.86
3	Jordan Lyles	552.7	5.73	4.87	0.86
4	John Doherty	521.3	5.46	4.60	0.86
5	Jay Hook	752.7	5.76	4.94	0.82
6	Luke Hochevar	892.0	5.38	4.57	0.81
7	Bobby Ayala	576.0	5.52	4.74	0.78
8	Len Barker	1,323.7	4.73	3.95	0.78
9	John Thomson	1,270.3	5.20	4.44	0.76
10	Brian Bannister	667.3	5.49	4.74	0.75
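Both lists come from a simple subtraction and sort. As a rough sketch, here is that step applied to four rows copied from the tables above; the tuple layout and function name are assumptions for illustration.

```python
# (name, RA, DRA) rows copied from the two tables above.
pitchers = [
    ("Vance Worley", 4.25, 5.16),
    ("Kevin Ritz", 5.79, 4.91),
    ("John Franco", 3.37, 4.19),
    ("Len Barker", 4.73, 3.95),
]


def ra_minus_dra(rows):
    """Return (name, RA-DRA) pairs, most overperforming (most negative) first."""
    diffs = [(name, round(ra - dra, 2)) for name, ra, dra in rows]
    return sorted(diffs, key=lambda pair: pair[1])


ranked = ra_minus_dra(pitchers)
for name, diff in ranked:
    print(f"{name:15s} {diff:+.2f}")
```

Negative values sit at the top (runs allowed below what DRA says the pitcher deserved), and positive values sink to the bottom.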

Some of these players have similar ERA-FIP gaps, but not all. Worley, by this measure the luckiest pitcher in recent history, has a lifetime ERA- and FIP- of 99 and 98, respectively. For the group as a whole, the difference between ERA- and FIP- accounts for just 23.9 percent of the variance in RA-DRA. In simpler terms, ERA-FIP correlates rather poorly with RA-DRA.
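The 23.9 percent figure is an r-squared: the share of variance in one differential that a straight-line fit on the other explains. For a one-variable fit, it equals the squared Pearson correlation, which is what the sketch below computes. Whether the article's figure was produced this way is an assumption, and the input lists are made-up numbers for illustration, not values from the spreadsheet.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov ** 2 / (var_x * var_y)


# Made-up differentials, for illustration only.
era_fip_gap = [-8, -3, 0, 4, 9]
ra_dra_gap = [-0.40, 0.10, -0.20, 0.30, 0.25]
print(round(r_squared(era_fip_gap, ra_dra_gap), 3))
```

A value near 1 would mean the two gaps move together almost perfectly; a value near 0 means knowing one tells you very little about the other.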

Now we'll move on to cFIP. Unlike DRA, which focuses on describing a pitcher's output, cFIP sets out to predict it. (In that way, these two metrics respectively resemble ERA and FIP.) It revolves around the central components of FIP — strikeouts, walks, and home runs, as mentioned earlier — while taking into account, as its name suggests, the context. Batter handedness, home-field advantage, and umpire history, among other variables, affect the final product.

Here, we'll look at the difference between FIP- and cFIP, which will tell us how much higher or lower a pitcher's FIP likely should have been. Although cFIP extends back a bit further, to 1950, we'll use the same sample as before. Here, you can see every pitcher's FIP- and cFIP:

As before, there's an evident relationship here. Yet, as before, we have plenty of overperformers...

Rank	Name	IP	FIP-	cFIP	(FIP-)-(cFIP)
1	Brad Ziegler	528.7	86	100	-14
2	Hal Woodeshick	847.3	99	113	-14
3	Chad Bradford	515.7	79	92	-13
4	Fred Newman	610.0	97	110	-13
5	Don Schwall	743.0	106	119	-13
6	Doug Sisk	523.3	109	121	-12
7	Carl Morton	1,648.7	101	113	-12
8	Bob Purkey	2,114.7	98	110	-12
9	Javier Lopez	506.7	93	104	-11
10	Burke Badenhop	512.3	90	101	-11

...and underperformers:

Rank	Name	IP	FIP-	cFIP	(FIP-)-(cFIP)
1	Herb Score	858.3	100	80	20
2	Dick Stigman	922.7	111	92	19
3	Steve Dunning	613.7	127	108	19
4	Phil Ortega	951.7	132	114	18
5	Bob Johnson	692.3	108	92	16
6	Bill Caudill	667.0	96	81	15
7	Art Mahaffey	999.0	115	100	15
8	Bill Greif	715.7	114	99	15
9	Danny Frisella	609.3	106	92	14
10	Ray Narleski	702.0	108	94	14

On the one hand, Ziegler also has a huge difference between his ERA and FIP — the third-largest among active pitchers. On the other hand, that doesn't apply to many other people on these lists. The r-squared for (ERA-)-(FIP-) and (FIP-)-(cFIP) sits at a dismal .004. So this has absolutely nothing to do with ERA and FIP.

Finally, we'll combine DRA and cFIP. The former takes a descriptive approach, the latter a predictive one, and when brought together, they form the best version of ERA-FIP that we can hope for. Take a look for yourself:

While the relationship doesn't possess the strength of the preceding two, it clearly exists. That doesn't concern us, though. Without further ado, here are the ten biggest overperformers:

Rank	Name	IP	DRA-	cFIP	(DRA-)-cFIP
1	Brad Ziegler	528.7	79	100	-21
2	Steve Kline	750.3	95	115	-20
3	Bob Buhl	2,587.0	99	116	-17
4	Jesse Crain	532.0	78	95	-17
5	Bob Purkey	2,114.7	94	110	-16
6	Greg Minton	1,130.7	91	107	-16
7	Shelby Miller	575.3	88	104	-16
8	Joe Smith	518.7	80	96	-16
9	Dave Stieb	2,895.3	80	96	-16
10	Doug Sisk	523.3	106	121	-15

And here are the ten biggest underperformers:

Rank	Name	IP	DRA-	cFIP	(DRA-)-cFIP
1	Brad Lidge	603.3	95	73	22
2	Dave Tomlin	511.3	119	99	20
3	Mike Paul	627.7	115	95	20
4	Bill Greif	715.7	118	99	19
5	Claude Raymond	721.0	113	94	19
6	Vance Worley	508.7	121	103	18
7	Brian Matusz	519.7	116	98	18
8	Shawn Camp	592.3	115	97	18
9	George Stone	1,020.7	115	97	18
10	Kevin Slowey	662.0	114	96	18

(ERA-)-(FIP-) has an r-squared of .244 with (DRA-)-cFIP, meaning a decent but subpar correlation. The presence of Worley on the list confirms that suspicion, and even Ziegler's craziness can't .

For these metrics, the same general principles as ERA-FIP apply. When a pitcher has a large difference between, say, his RA and DRA in one season, he's probably been lucky or unlucky. When he maintains that disparity over several seasons, it may stick around long-term. Someone such as Ziegler, who has always blown past his indicators, could continue to do so going forward. Exercise caution until the sample increases to a safe amount.

All three of these differentials have their uses. As a replacement for ERA-FIP, however, I would use DRA- and cFIP. That way, you get one descriptive metric and one predictive metric, like you do with ERA and FIP. But RA-DRA still has its use, and so does (FIP-)-(cFIP). In distinct ways, each will tell you how well a pitcher should have fared.

I apologize for failing to offer much analysis immediately. By writing this article, I seek to outline these new ways of thinking about over- and underperformance. Young has held our attention for so long; maybe we should start looking at Worley the same way. Sabermetrics never stops evolving, and the way we look at the game must change with it.

Click here for the full spreadsheet of all 1,312 pitchers. Huge thanks to John Choiniere for collecting the career cFIP values.

. . .

An earlier version of this article incorrectly stated that cFIP goes back to 1951.

Ryan Romano is an editor for Beyond the Box Score. He also writes about the Orioles on Camden Depot (and on Camden Chat that one time), and about the Brewers on BP Milwaukee.
