Guess what new sabermetric toy kept me busy for an entire night recently?
Over at FanGraphs, brilliant dude Steve Staude developed a tool that allows you to find the correlation between any two pitching stats you like. It's very cool*, if you like trying to figure out which stats are most predictive of other stats.
* - "Cool" is used very, very liberally here.
You know what's better than idiot-proof web tools that even people like me can use? Not much. Especially when you can use them to learn just a little more about the stats that you think you know so well.
One of the most fun things about the tool is that it includes a few "secret" pitching stats -- ones you can't easily find on a leaderboard at FanGraphs or Baseball-Reference or BP. Things like Glenn DuPaul's pFIP, Chris Carruthers's TIPS, and Steve's own BERA.
I highly recommend reading Steve's whole article on these stats ... in fact, that's the primary reason why I'd decided to write this piece -- to simply link out to his fine work over at FanGraphs. But I will attempt to value-add with two quick things.
Part One: What Is Correlation?
When we talk about how pitching stats correlate, in this context, we talk about how they relate to each other, and how strongly linked they are to one another.
There are levels of correlation, and they're expressed as a value from -1 to 1. The value 1 indicates what's called a perfect positive correlation -- that the two data sets are completely linked: when one value rises, the other value rises the same way. The value 0 indicates no correlation between the two data sets, that the values are linked in no way. And the value -1 indicates a perfect negative correlation, that when one value increases, the other value decreases the same way.
Most correlations, when it comes to baseball statistics, don't fall on the extremes. A value of -1 or 1 is very hard to find ... you'll only find it in something like comparing one stat (let's say FIP) to itself during the same season.
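For anyone who wants to see the -1 to 1 scale in action, here's a minimal sketch of the standard Pearson correlation coefficient. I'm assuming that's the kind of correlation in play here, since Steve's tool isn't spelled out in this piece; the `pearson_r` helper and the FIP values are made up purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and the sum of squared deviations for each series
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Invented FIP values for five hypothetical pitchers (not real data)
fip = [2.85, 3.40, 3.95, 4.20, 4.75]

same = pearson_r(fip, fip)                     # a stat compared to itself
mirrored = pearson_r(fip, [-v for v in fip])   # a stat compared to its exact opposite

print(f"FIP vs. itself:       {same:+.3f}")
print(f"FIP vs. its negative: {mirrored:+.3f}")
```

Comparing a stat to itself lands at the +1 extreme, and comparing it to its mirror image lands at -1. Anything you find between two genuinely different stats will sit somewhere in between.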
Often, what we want from correlation data in baseball is help predicting the future. We want to know if, say, FIP in one year will correlate with FIP the next year -- can one help predict the future? Unfortunately, most statistics are not very heavily correlated, even one year in the future; you'll see things like a correlation of 0.161, which means that two stats aren't tied together in a very meaningful way. To me, you want to look for correlations of around -0.3 (or lower) and +0.3 (or higher) ... but your mileage might vary.
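One rough way to get a feel for how weak a 0.161 really is: square it. Under a simple straight-line relationship, the square of a Pearson correlation is the share of the variation in one stat that lines up with the other. This is a sketch under that assumption, not a claim about how the tool itself works, and `shared_variance` is just a name I made up.

```python
def shared_variance(r):
    """Square of a correlation: the share of variance two stats have in common
    under a simple linear relationship."""
    return r ** 2

# The two correlation values mentioned above
for r in (0.161, 0.3):
    print(f"r = {r:+.3f} -> r^2 = {shared_variance(r):.3f}")
```

A correlation of 0.161 works out to less than 3% of shared variation, while the 0.3 rule of thumb works out to about 9%.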
Also, keep in mind that correlation does not mean causation. If, say, strikeout rate and wOBA against have a reasonable negative correlation, we're not saying that having a high strikeout rate causes a low wOBA against ... we're saying that those two items are related, and perhaps partially dependent. There's a difference.
Part Two: Fun With Correlations
So what can we do with the information? In short, we can learn a few fun things, and quickly dispense with a few preconceived notions. Here are a few minor things I picked up:
SwStr% in Season 0 / K% in Season 1 -- 0.600
It isn't surprising that SwStr% would help predict a future strikeout rate with some reasonable degree of accuracy. I would figure that how often a guy gets swinging strikes would help us figure out how often he'll strike guys out in the future.
K% in Season 0 / K% in Season 1 -- 0.702
Of course, sometimes it's just smarter to use the simpler statistic. Looking at the correlation, you'll get a better idea of next season's strikeout rate by using this season's strikeout rate.
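If you wanted to run this sort of Season 0 / Season 1 comparison yourself, the basic idea is to pair each pitcher's season with his following season and correlate the two columns. Here's a minimal sketch of that pairing. The pitchers, seasons, and rates are invented, the field names and the `year_to_year` helper are hypothetical, and I'm not claiming this is how Steve's tool does it.

```python
from math import sqrt

# Invented pitcher-season rows (not real data)
rows = [
    {"pitcher": "A", "season": 2012, "swstr": 0.115, "k": 0.270},
    {"pitcher": "A", "season": 2013, "swstr": 0.110, "k": 0.255},
    {"pitcher": "B", "season": 2012, "swstr": 0.080, "k": 0.170},
    {"pitcher": "B", "season": 2013, "swstr": 0.085, "k": 0.185},
    {"pitcher": "C", "season": 2012, "swstr": 0.095, "k": 0.210},
    {"pitcher": "C", "season": 2013, "swstr": 0.100, "k": 0.225},
    {"pitcher": "C", "season": 2014, "swstr": 0.090, "k": 0.200},
    {"pitcher": "D", "season": 2013, "swstr": 0.070, "k": 0.150},  # no next season
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def year_to_year(rows, stat_now, stat_next):
    """Correlate stat_now in one season with stat_next in the same pitcher's next season.

    Returns (correlation, number of paired seasons)."""
    by_key = {(r["pitcher"], r["season"]): r for r in rows}
    xs, ys = [], []
    for (pitcher, season), row in by_key.items():
        following = by_key.get((pitcher, season + 1))
        if following is not None:  # skip seasons with no follow-up season
            xs.append(row[stat_now])
            ys.append(following[stat_next])
    return pearson_r(xs, ys), len(xs)

r_swstr, n_pairs = year_to_year(rows, "swstr", "k")
r_k, _ = year_to_year(rows, "k", "k")
print(f"SwStr% -> next K%: {r_swstr:+.3f} ({n_pairs} paired seasons)")
print(f"K%     -> next K%: {r_k:+.3f}")
```

With only a handful of made-up rows, the numbers this prints mean nothing on their own; the point is the pairing step, which is what makes a correlation "year-to-year" in the first place.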
One of the things Steve did is create a table specifically designed to show the efficacy of some of the ERA predictors. Personally, I'm less interested in predicting ERA than I am in predicting RA9 -- for reasons I've detailed before. I also tossed in wOBA against and RA9 in addition to the ERA predictors that Steve cited. This is what I found:
| Stat 1 | Year 1 | Stat 2 | Year 2 | Correlation |
Basically, it shows virtually no change from Steve's table -- much as I expected. My biggest takeaway here is that RA9, by and large, is slightly more correlation-friendly with the ERA predictors than ERA is. The difference is almost negligible, but it's there, and it runs all the way across the board ... more or less.
I was a little surprised that RA9 actually has a slightly lower correlation against future RA9 than ERA does against future RA9, but those numbers are basically the same. wOBA against proves to be less correlated to future RA9 than other ERA predictors, which isn't crazy surprising either. kwERA and xFIP swap spots on the list, and just about everything shows a slightly higher correlation, which is cool -- but not the most useful information.
HR/FB in Season 0 / HR/FB in Season 1 -- 0.090
There's almost no correlation between home runs per fly ball from one season to the next, which is interesting to me. I'd figure there'd be some relation between one year and the next ... and I guess there is a little, but there's hardly any. Another thing that doesn't appear to be a solid, repeatable skill -- and pretty much the reason xFIP exists.
K% in Season 0 / LOB% in Season 1 -- 0.348
I've always thought that having a high strikeout rate is a good way to strand runners on base. It's actually not a terribly strong correlation at all -- but it's not nothing either.
The fun part about the tool is that there are tons of things to explore and use. I'd advise anyone interested in pitching stats to at least mess around with it a little bit, and see what they can find. And thanks again to Steve for making a cool tool like this public.
. . .
All statistics courtesy of FanGraphs.
Bryan Grosnick is the Managing Editor of Beyond the Box Score. You can follow him on Twitter at @bgrosnick.