How to look beyond ERA

A peek at the tools you can use to help better evaluate pitchers on a runs per 9 basis.

For years, ERA has been the gold standard for most people when evaluating pitchers. Well, along with win-loss record, unfortunately. Thankfully, the win-loss record has lost much of its legitimacy in mainstream baseball speak. As we continue to move forward with public statistics, other metrics have been developed on the runs allowed per nine innings scale, or RA9, to help us understand pitchers on a deeper level.

ERA is still a valuable piece of anyone’s baseball lexicon. Being the purest form of a results-based statistic (or nearly the purest; RA9, which counts every run and ignores the earned/unearned distinction created by fielding errors, probably takes first place), it holds significant value. It is simply the number of earned runs the pitcher averages on a nine-inning scale. However, when we want to evaluate pitchers and look forward, the statistic is lacking.
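As a quick illustration of that nine-inning scaling, here is a minimal Python sketch. The function names are made up for this example, and it assumes innings are passed as a true decimal (6 1/3 innings is 6.333..., not the box-score shorthand 6.1).

```python
def runs_per_nine(runs: float, innings: float) -> float:
    """Scale a run total to a nine-inning rate."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * runs / innings


def era(earned_runs: float, innings: float) -> float:
    """Earned runs only, per nine innings."""
    return runs_per_nine(earned_runs, innings)


def ra9(earned_runs: float, unearned_runs: float, innings: float) -> float:
    """Every run allowed, earned or not, per nine innings."""
    return runs_per_nine(earned_runs + unearned_runs, innings)
```

The only difference between the two is which runs go into the numerator, which is why RA9 can never be lower than ERA for the same pitcher.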

This is why we turn to statistics such as FIP, xFIP, SIERA, and DRA to help us answer questions that ERA can’t. All four of these metrics add value to the discussion about pitchers: they can inform you of a pitcher’s skill and provide a picture of how he may perform in the future. Each does so in a different manner. To use these statistics effectively, it’s helpful to know how they’re formulated and the ideology behind them.

FIP

Fielding Independent Pitching, or FIP, embraces the three true outcomes ideology. Research done by Voros McCracken in the early 2000s led us to believe that pitchers have little to no ability to impact outcomes on balls in play. McCracken believed that those outcomes were a function of the skill of the defense and luck. Thus, he stripped pitching down to strikeouts, home runs, and all forms of non-intentional walks to form DIPS, or Defense Independent Pitching Statistics. Tom Tango took this research a step further by establishing constants for each of the inputs and creating FIP.

Since then, the equation has been tweaked a bit and can vary slightly by source. FanGraphs is the most prominent source for FIP today. Their formula essentially takes Tango’s and adds a constant to it. That constant is the league average ERA minus the league average of the unadjusted formula, so that league FIP and league ERA match. This is done “solely to bring FIP onto an ERA scale.” FanGraphs popularized FIP by basing their analysis and WAR metric on it, and they continue to do so today.
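To make the moving parts concrete, here is a rough Python sketch of the calculation. It assumes the commonly published 13/3/2 weights on home runs, walks plus hit batters, and strikeouts; the function and parameter names are invented for this example, and sources differ on details such as whether intentional walks are included.

```python
def raw_fip(home_runs, walks, hit_by_pitch, strikeouts, innings):
    """The weighted three-true-outcomes core, before any scaling constant.

    Assumes the commonly published weights: 13 per HR, 3 per BB/HBP, -2 per K.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """League ERA minus the league's raw value, so league FIP equals league ERA."""
    return lg_era - raw_fip(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant):
    """Raw FIP shifted onto the ERA scale by the league constant."""
    return raw_fip(home_runs, walks, hit_by_pitch, strikeouts, innings) + constant
```

Because the constant is recomputed from league totals, it changes from season to season, which is one reason the same pitcher line can produce slightly different FIP values at different sites.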

Overall, FIP was and still is seen by some as the most effective way to measure a pitcher’s skill. By removing most of the impact of the defense, it was thought that the statistic boiled the game down to the true elements of pitching.

xFIP

The FIP concept was taken a step further by Dave Studeman of the Hardball Times with xFIP. Studeman surmised that pitchers do have limited control over batted balls, and that this control shows up in how many fly balls they allow. Essentially, the formula replaces the home run component with the number of fly balls the pitcher gave up multiplied by the league average HR/FB rate.

The reasoning behind this change lies in the variance in home run rates across the league. Generally, HR/FB rates show a lot of volatility from year to year, which may be influenced by the ball’s construction. So, the statistic pegs a pitcher’s performance to the league average HR/FB rate under the assumption that pitchers can control how many fly balls they allow.
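The substitution is small enough to show directly. The sketch below assumes the same commonly published 13/3/2 weights used for FIP and takes the league HR/FB rate and the scaling constant as inputs; the names are hypothetical and chosen only for this example.

```python
def xfip(fly_balls, walks, hit_by_pitch, strikeouts, innings,
         league_hr_per_fb, constant):
    """FIP with actual home runs swapped for league-average expected home runs.

    league_hr_per_fb is a rate such as 0.10 (10% of fly balls leave the park).
    The 13/3/2 weights are the commonly published ones, assumed here.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    # The only change from FIP: home runs are estimated from fly balls.
    expected_home_runs = fly_balls * league_hr_per_fb
    core = 13 * expected_home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return core / innings + constant
```

Two pitchers with identical walks, strikeouts, and fly balls get the same xFIP no matter how many home runs each actually allowed, which is exactly the trade-off the metric makes.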

However, it seems that the statistic is biting off a bit more than it can chew in that regard. In some ways it’s saying that fly balls are bad contact and that we should implicitly reward ground ball pitchers. This line of thinking has been a bit bastardized today, but it seems foolish to paint with that broad of a brush, especially when we find that not all hitters are capable of being effective fly ball hitters. On top of that, the assumption that pitchers cannot influence the quality of contact they allow is itself open to challenge. Still, this doesn’t make xFIP bad, per se, simply more limited in the questions it can answer and the insight it can provide. It’s something to be aware of when trying to use it to evaluate pitchers.

SIERA

SIERA attempts to introduce several more factors to the equation in comparison to FIP and xFIP. Matt Swartz and Eric Seidman developed SIERA back in 2010. The operative goal of SIERA is to drill down to the pitcher’s actual skill level on an ERA scale.

In order to do that, SIERA makes assumptions about which results further a pitcher’s goals based on how he tends to pitch. For example, if a pitcher has a higher walk rate, then each ground ball is more valuable, because it should result in more double plays. It is also a big, big fan of strikeouts. For a low-strikeout pitcher, each strikeout is valuable because it results in runners left on base. For high-strikeout pitchers, it generally assumes that they’re better because they induce less contact and lower quality contact.

One issue with SIERA is that, as far as I can tell, it has not been updated in quite a while. That’s not necessarily the biggest issue, since it’s also exceedingly complicated and kind of obsolete at this point, but you wonder if it could use an update.

DRA

DRA is the most recent addition to this cadre. Developed by Jonathan Judge, Harry Pavlidis, Dan Turkenkopf, and the BP Stats Team in 2015, DRA attempts to introduce context to pitching and tell us just how skilled a pitcher is. Essentially, it differs from defense-independent statistics like FIP in that it embraces contact and holds (based on loads of research and data) that many pitchers can influence things like contact quality.

DRA assigns linear weights to events, and those weights are derived through mixed modeling. The factors, which range from the pitcher’s ability to stymie base runners to the temperature at game time, are combined to produce the coefficients needed to calculate DRA. This is somewhat similar to how wOBA is derived. DRA has been tweaked a few times since its rollout. The most prominent change came last year, when batted balls were introduced into the model. This new piece of the pie allows DRA to more accurately account for both home runs and BABIP, which are traditionally very hard to predict.

Performance-wise, DRA correlates best with a pitcher’s next-season ERA. Its predictive quality is slightly stronger than that of the other stats listed in this article, and it also performs better in reliability and descriptiveness. DRA is the most complex of this group, and that complexity shows in the success of the metric.
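For readers who want to run this kind of check themselves, here is one way a next-season comparison could be set up in Python. This is only a sketch of the general idea, not Baseball Prospectus’s methodology: it uses a plain Pearson correlation, and the function names and the (pitcher, season) dictionary layout are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def next_season_correlation(metric, era):
    """Correlate a metric in season N with ERA in season N + 1.

    Both arguments map (pitcher_id, season) -> value. Pitchers without a
    following season of ERA data are skipped.
    """
    xs, ys = [], []
    for (pitcher, season), value in metric.items():
        following_era = era.get((pitcher, season + 1))
        if following_era is not None:
            xs.append(value)
            ys.append(following_era)
    return pearson(xs, ys)
```

Running the same function once per metric (ERA itself, FIP, xFIP, SIERA, DRA) against the same ERA table gives a like-for-like comparison. A real study would also weight by innings pitched and set a minimum workload, which this sketch leaves out.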

Anthony Rescan is a Featured Writer at Beyond the Box Score and a Stats Intern at Baseball Prospectus. You can follow him on Twitter at @AnthonyRescan.