Why I don’t like DRA

Baseball Prospectus’s advanced pitching metric is probably the most accurate in the public domain. But stats need to do more than just convey truth.

Photo: Spring training, Chicago Cubs at Arizona Diamondbacks (Rick Scuteri-USA TODAY Sports)

It’s been just over two years since Baseball Prospectus introduced Deserved Run Average (DRA), its advanced pitching metric. Measuring pitcher talent and ability has long been a bugbear of the sabermetric world, and while we’ve left behind the randomness and imprecision of pitcher wins, finding suitable replacements has been difficult.

One of the biggest advances before DRA was Voros McCracken’s work on DIPS, or Defense Independent Pitching Statistics. DIPS assumes that pitchers have no command of what happens to a ball hit into the field of play, and examines only the three “true” outcomes of strikeouts, walks, and home runs that pitchers have almost complete control over. That’s a useful assumption, and one that is closer to the truth than the idea that pitchers have total command over the kind of contact they give up (as McCracken’s original research shows). It’s the assumption that’s at the core of FanGraphs’ Fielding Independent Pitching (FIP) metric, and as a result, at the core of FanGraphs’ version of pitcher WAR.
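For reference, FIP’s formula is short enough to write out in a few lines. Here is a minimal sketch in Python using the published formula, with the league constant set to a typical value of roughly 3.10 rather than any particular season’s figure (the real constant is recalculated each year so that league-average FIP matches league-average ERA):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: only home runs, walks, hit batters,
    and strikeouts count; balls in play are ignored entirely.

    The constant (about 3.10 here) is reset each season so that
    league-average FIP equals league-average ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Example line: 20 HR, 50 BB, 5 HBP, 200 K over 200 IP
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200), 2))  # roughly 3.2
```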

But it’s not a completely correct assumption, and as the quality of available data has improved, we’ve become better able to determine which pitchers are controlling the outcomes of balls hit into play, and how. That’s where DRA comes in. Last year, when it was substantially updated, its creators (Jonathan Judge, Harry Pavlidis, and Dan Turkenkopf) billed it as a challenge to “the citadel of DIPS,” one that would finally allow us to move past the blunt measurement of FIP to a more subtle and precise one. As the calculations in this year’s update appear to show, it succeeded, and it is now the best measurement of pitcher talent available.

That’s where my explanation has to stop, though, because I don’t know much more about DRA. I can tell you how it was made — using a linear mixed model that controls for a whole host of factors that affect pitching, such as the park being played in; the temperature; the identity of the batter, catcher, and umpire; the defense playing behind the pitcher; and more — but I can’t tell you what exactly that means. I can’t tell you how much weight it gives each of those variables, or how the variables might interact with each other. I can’t walk through the steps of a DRA calculation and understand how the inputs for a given pitcher turn into the relevant output. In this way, it stands in stark contrast to FIP, whose calculation is publicly available and relatively simple.
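To give a rough sense of what a “linear mixed model” looks like in practice, here is a hypothetical sketch in Python using statsmodels. It is not BP’s DRA code, the file and column names are invented, and it includes only a couple of the contextual variables listed above, but it shows the general shape of such a model: fixed effects for measurable context, and a random effect for the identity of the pitcher.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical plate-appearance-level data; the file and column names are
# invented for illustration. 'run_value' is the run outcome of each plate
# appearance; the other columns describe the context and participants.
pa = pd.read_csv("plate_appearances.csv")

# Fixed effects for measurable context (park, temperature), with pitchers as
# a random effect. DRA's actual model includes many more terms (batter,
# catcher, umpire, defense, etc.); this only illustrates the general shape.
model = smf.mixedlm("run_value ~ C(park) + temperature",
                    data=pa, groups=pa["pitcher"])
result = model.fit()

print(result.summary())        # coefficient estimates for the fixed effects
print(result.random_effects)   # per-pitcher adjustments once context is accounted for
```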

All of that results in a hyper-accurate but shrouded metric, and one that I just can’t get on board with. My issue is not with how DRA is calculated, but with something more fundamental: what it is and what it’s trying to do.

Let’s assume that DRA is 100 percent accurate; that is, that it perfectly reflects the true talent of every pitcher. Even then, it’s not what I want out of a statistic. If baseball is a Rubik’s Cube, a totally accurate statistic with no explanation of how it achieves that accuracy is the equivalent of pulling the cube apart into its component pieces and reassembling it solved. DRA gets us to the desired endpoint, but keeps the process so hidden from the public that it may as well bypass it entirely.

Maybe this just means I’m not the target audience for DRA. If you’re a GM trying to decide whether to trade for a pitcher, then the endpoint of perfect knowledge is far more interesting to you than the process. I suppose the same might be true of some particularly invested fantasy baseball folks. But for me, baseball is just a big puzzle, and it’s the act of solving it that brings me joy. I’m happy to have help from others — I’m not saying I want to look only at statistics I personally made — but if I can’t see how you’re helping, then I’m not that interested. To stick with the Rubik’s Cube metaphor, I absolutely want people smarter than me to teach me the tricks of solving it, or to take a spin or two while I watch and they explain what they’re doing. I don’t want someone to take the Cube to a closed room and return an hour later with it solved.

I think this explains why DRA has seemingly struggled to catch on. (I say “seemingly” because this is based on my own subjective observations only, which are certainly colored heavily by my own aversion to the metric.) DRA isn’t a common feature in articles or conversations around the water cooler because it ends discussions rather than furthering them. When we’re talking about Jake Arrieta and his rough start to 2017, we know that ERA, FIP, and whatever other measures of performance and talent we’re using are imperfect, which means they can be debated and challenged. When you bring up his horrible 5.44 ERA, I can respond to that with his .355 BABIP, and you can respond to that with his 40.2 percent ground ball rate. Metrics that you can take apart and challenge encourage conversation; a perfect-but-hidden metric does not. If, like me, you enjoy baseball analysis precisely because of those conversations, then DRA is probably unsatisfying to you as well.
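Part of what makes that exchange work is that each of those numbers can be recomputed from the box score. BABIP, for instance, is simple arithmetic; here is a sketch using the standard formula, with made-up totals purely for illustration:

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: hits that stayed in the park,
    divided by at-bats that ended with the ball in play."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)

# Made-up opposing-batter totals: 70 hits, 8 HR, 260 AB, 85 K, 2 SF
print(round(babip(70, 8, 260, 85, 2), 3))  # roughly .367
```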

That’s a result of both DRA’s complexity and its apparent near-perfection. It’s very hard to continue a conversation after you’ve introduced a metric that I can’t challenge, because I can’t see how it’s calculated, but that I’m told is extremely accurate. When you tell me Jake Arrieta’s DRA is 4.21, I can’t talk about how I think it’s overvaluing his strikeouts, or undervaluing his bad luck on fly balls, because I don’t know how it values any of the things he’s doing. I can either ignore his DRA, or stop the conversation.

When those are the sole options a metric gives you, that’s not a metric for me. I don’t mean to cast aspersions on the creators of DRA, or even on the stat itself; I have no doubt that it is truly as accurate as the BP folks have shown it to be, nor do I doubt their intelligence and rigor in creating it.

But when people have expressed their dissatisfaction with DRA in the past, the response has often been to point to its accuracy as a way of justifying the complexity and the “black box” calculation. What I hope I’ve done in this piece is show that such a response talks past the critics. Accuracy is not the be-all and end-all, or at least not for everyone. DRA will have to promote conversations to really catch on; right now, all it does is end them, and that makes it uninteresting to me.