It's time for a mildly embarrassing admission. I may write for Beyond The Boxscore, historically one of the best sites for sabermetric analysis on the web, but I'm not a brilliant sabermetrician. Don't get me wrong, I love advanced stats and the ability to objectively judge player performance, but when it comes to statistics, I can be kind of a bonehead* compared to some of the better names in the field. I can't run SQL queries, advanced math tends to make my eyes bleed and, as much as I try to research, sometimes I wind up using advanced stats wrong.
* Note: I'm not telling this so you stop reading my articles. I'm a smart guy, I just consider myself more of a writer-analyst than a statistician or economist or even researcher. And I'm learning and getting better.
The reason I'm telling you all of this is because today I cleared up a misunderstanding that I've had for a while about the nature of ERA+, an advanced statistic that can be found at Baseball-Reference, and ERA-, an advanced statistic that can be found at FanGraphs. ERA+ can be used to compare pitcher performance to the league average. So can ERA-. Except, when it comes to reading these stats, I'd been doing it wrong.
I'd like to explain my mistake to you, and how I've been able to learn from it, because (1) it's possible that someone else might be making the same mistake I did -- it's not the most clear difference in the world, (2) I'm of the point of view that one of these stats isn't very useful, (3) it's a great example of how there are fabulous resources on the web to learn from, and (4) there may be a lesson to be learned here about assumptions.
Today, as usual, another writer's article sent me off on my own line of inquiry. This time, it was our own Glenn DuPaul's piece on rWAR and RA9-Wins. I was looking at park factors, and comparing pitching stats on the Minnesota Twins*, and I was building a table full of pitching stats. I was pulling stats from both Baseball-Reference and FanGraphs, and noticed that Scott Diamond's ERA+ was 117, and his ERA- was 86. I wondered why this was the case.
* Note: Spoiler alert, an article may be forthcoming on the subject.
Now, the last thing I want to do is confuse you, dear reader ... but basically, I thought that ERA+ and ERA- said the same thing. They don't. What I thought was the case is less important than the fact that I realized that they didn't say the same thing, and that I was slightly confused. I went out to Twitter to ask for help, and in the meantime, started looking for answers on my own.
As always, I started with the FanGraphs Library, which in turn led me to a great article by Patriot over at Walk Like A Sabermetrician. You can find the article here. Patriot does a great job of explaining, in detail, the differences between ERA+, a newer version of ERA+ that was implemented at B-R (and since removed), and aERA (another term for ERA-). I highly recommend both the article and his blog at large. At any rate, this article and helpful tweets from saber All-Stars Sky Kalkman and James Gentile sorted me out right away, and now I'm smarter than I was a few hours ago.
So, what is the difference between ERA+ and ERA-? I'm glad you asked, because now I know, and I can explain it.
I'll start by describing ERA-. ERA- is an extraordinarily useful stat to use when comparing a pitcher's performance to the league average. Basically, ERA- takes a pitcher's ERA and compares it to league-average, scoring from a center of 100. Every point below 100 in an ERA- score means that the pitcher's ERA is one percentage point better than league-average. For example, if Scott Diamond's ERA- is 86 for 2012, that means that his ERA is 14% better than league average on the season. For more info on ERA-, and how it is calculated, check out Patriot's article that I mentioned before, or this entry at the FanGraphs Library by Steve Slowinski.
I thought that ERA+ was roughly the same thing. It's similar, but definitely different. ERA+ also compares pitching performance (as judged by ERA) to the league average. But instead of telling you how much better (or worse) the pitcher was than the league average, this stat tells you how much better (or worse) the league was than the pitcher. It's not a big difference, but it is a difference. Worst of all, it's scaled the same way as ERA-, except it is inverted so that any digits over 100 are percentage points better than league-average, not worse. For example, Scott Diamond has a 117 ERA+, and that means that the league ERA is 17% higher than Scott's.
ERA- and ERA+ use the same basic numbers, but present them in a different way. In essence, the two scores are using the same numbers, but swapping the numerator and denominator of a fraction. Where ERA+ presents the number as league ERA over ERA, ERA- presents the number as ERA over league ERA. The fraction is inverted. Sky made it more clear to me in the following tweet:
@bgrosnick Take 4 and 5. 4 is 80% of 5, but 5 is 125% of 4. 80% * 125% = 100%. That's why ERA- is better.
— Sky Kalkman (@Sky_Kalkman) September 14, 2012
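If it helps to see the inversion with numbers, here is a minimal Python sketch. It assumes the simplified definitions described above (ERA+ as 100 times league ERA over ERA, and ERA- as 100 times ERA over league ERA) and leaves out the park and league adjustments that Baseball-Reference and FanGraphs each apply. The function names and the 4.00 and 5.00 ERAs are made up for illustration; they simply mirror the 4 and 5 in Sky's tweet.

```python
def era_plus(era, league_era):
    """Simplified ERA+: league ERA over pitcher ERA, centered on 100."""
    return 100 * league_era / era


def era_minus(era, league_era):
    """Simplified ERA-: pitcher ERA over league ERA, centered on 100."""
    return 100 * era / league_era


def plus_to_minus(plus):
    """Flip the fraction: under the simplified definitions,
    ERA+ times ERA- always equals 100 * 100 = 10000."""
    return 10000 / plus


# Hypothetical pitcher with a 4.00 ERA in a league with a 5.00 ERA.
print(era_minus(4.00, 5.00))         # 80.0  (ERA is 80% of league ERA)
print(era_plus(4.00, 5.00))          # 125.0 (league ERA is 125% of ERA)
print(round(plus_to_minus(117), 1))  # 85.5
```

Note that flipping a 117 ERA+ this way gives roughly 85.5 rather than exactly 86. The sketch ignores each site's own adjustments and rounding, which is presumably where the small gap between the two published numbers comes from.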
Finally clear, my next thought was this: "What should we use ERA+ for?"
Quite honestly, I don't know.
Typically, pitchers want a low ERA, but ERA+ is a metric that shows a pitcher's performance scores better when it is higher. Therefore, it's not as intuitive as ERA-. And I like to use stats that are simply descriptive; it's much easier for me to say that "Scott Diamond is 14% better than league average," rather than "The league average is 17% worse than Scott Diamond." Heck, even from a grammar perspective, that last statement sounds a little funky.
Don't get me wrong, I love Baseball-Reference. I just think, in this case, they provide a statistic that has only marginal value ... and to be honest, I can't understand why. Patriot makes one argument for it in his article, and that's that ERA+ can be used, relatively easily, to calculate an expected W%. This is kind of a fun toy, but it's not exactly something I'd consider "useful". Let me provide you with an example.
Take Scott Diamond, who I've used throughout the article. Diamond's ERA+ is 117. You can use this number (when you back out the decimal to 1.17), to find an expected win percentage using the following formula: (ERA+^2) / ((ERA+^2)+1).*
* Note: I may not be great with math, but I can handle basic algebra.
Swapping in Diamond's ERA+, we get the formula (1.17^2) / ((1.17^2)+1), which eventually gives us 57.79% as a W%. I take this to mean that, given the league environment, Scott Diamond should have been expected to win about 58% of the games he pitched, given average league pitching performance. That's great, I guess, but it hardly seems like a practical stat. I don't feel like I can hold that up under fierce scrutiny. There are so many variables to how many games should be won! And it uses earned runs instead of total runs! And relievers, y'know?!
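For anyone who wants to try this with other pitchers, the calculation can be sketched in a few lines of Python. This is just the formula quoted above, (ERA+^2) / ((ERA+^2)+1), with ERA+ divided by 100 first; the function name is made up for illustration and isn't taken from Patriot's article or from either site.

```python
def expected_win_pct(era_plus):
    """Expected W% from ERA+, using (r^2) / (r^2 + 1) with r = ERA+ / 100."""
    ratio = era_plus / 100  # e.g. 117 becomes 1.17
    return ratio ** 2 / (ratio ** 2 + 1)


print(f"{expected_win_pct(117):.1%}")  # 57.8%
print(f"{expected_win_pct(100):.1%}")  # 50.0%
```

A league-average 100 ERA+ comes out to exactly .500, which is a handy sanity check on the formula.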
So yeah, I've explored ERA+, and now that I know what it really is, what I don't know is why I would need to use it. ERA- tells a (very slightly) different story, but it does so in a way that better fits all of my mental models towards baseball. Unless a commenter or saberist can make a great argument in its defense, I can't see myself focusing on it when ERA- exists.
In closing, I'd like to provide a couple of small pieces of advice, if I may. The first is, every so often, take the time to go back and re-assess your assumptions. Especially when it comes to baseball, and baseballing statistics ... but hey, if you can, do it for the rest of your life as well. If you're like me, you'll benefit from it. The second is that you shouldn't be afraid to (politely) reach out to folks in the saber community. There are a lot of great minds in this field, and I have never once been disappointed by their willingness to help and to educate.
And hey, if you can, feel free to discuss the usage of ERA+ and ERA- in the comments below. Perhaps we can hash out how best to use these stats.
All stats from Baseball-Reference and FanGraphs.