The world of baseball does not want for statistics. Indeed, at this point there is a metric for just about any outcome one could be interested in (as well as three or four different versions of it).
But not all statistics are created equal. Each tells us something different. As Dave Cameron recently pointed out, baseball metrics can be grouped under a number of different categorical schemes, one of which is descriptive versus predictive.
This is why it is important to not just know the numbers, but the story behind the numbers. How are they constructed? What are they meant to capture? Some statistics are simply reflections of current performance while others reveal more about a player's skills or talent outside of a single year. It's critical that we understand the difference.
One way to sort metrics into these two categories is to look at how well a metric correlates with itself year over year. If a statistic in year one does not correlate well with the same statistic in year two, it is generally more descriptive than predictive.
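As a rough illustration of that test, the sketch below computes a plain Pearson correlation between each hitter's value in year one and his value in year two. It assumes the data has already been reduced to one (year-one, year-two) pair per hitter. The function name `year_over_year_r` and the sample numbers are hypothetical, and this is not necessarily the exact procedure behind the figures in this post.

```python
import math

def year_over_year_r(pairs):
    """Pearson correlation between year-one and year-two values.

    `pairs` is a list of (year1_value, year2_value) tuples, one per hitter.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two hitters")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a statistic with no spread has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical OBP pairs for four hitters (illustrative numbers only)
obp_pairs = [(0.340, 0.352), (0.310, 0.318), (0.375, 0.360), (0.295, 0.305)]
print(round(year_over_year_r(obp_pairs), 2))
```

A value near 1 means hitters tend to keep their rank from one season to the next; a value near 0 means year one tells you little about year two.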
This is the underlying logic behind pitching metrics like DIPS and FIP. Because ERA has a relatively low year-to-year correlation (.38), it is a poor predictor of future performance and true talent.
I think if you ask most people which offensive statistics correlate year-to-year, you won't get many confident answers. To help us along the journey, I decided to run correlations for common, and uncommon, batting statistics. For those who live in SQL, these numbers are probably well known. But for most, I think it is helpful to have them posted for reference.
Here are the results:
The correlations above were calculated using hitters from 2001 to 2008 who had at least 300 plate appearances in back-to-back seasons.*
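The sample restriction can be sketched in the same spirit. The function below keeps only player-seasons with at least 300 plate appearances and pairs each one with the same player's following season when that season also qualifies. The row layout (dicts with `player`, `season`, and `pa` keys) and the function name are assumptions for illustration, and the sketch assumes one row per player per season (a hitter traded mid-season would need his rows combined first).

```python
from collections import defaultdict

MIN_PA = 300  # threshold from the post: at least 300 PA in each season

def consecutive_season_pairs(rows, stat):
    """Build (year1, year2) pairs of `stat` for qualifying back-to-back seasons.

    `rows` is an iterable of dicts with hypothetical keys
    "player", "season", "pa", and the statistic named by `stat`.
    """
    by_player = defaultdict(dict)
    for row in rows:
        if row["pa"] >= MIN_PA:
            by_player[row["player"]][row["season"]] = row[stat]
    pairs = []
    for seasons in by_player.values():
        for year in sorted(seasons):
            # Only pair a season with the immediately following one
            if year + 1 in seasons:
                pairs.append((seasons[year], seasons[year + 1]))
    return pairs

rows = [
    {"player": "A", "season": 2001, "pa": 550, "obp": 0.350},
    {"player": "A", "season": 2002, "pa": 600, "obp": 0.362},
    {"player": "A", "season": 2003, "pa": 250, "obp": 0.300},  # below 300 PA
    {"player": "B", "season": 2004, "pa": 410, "obp": 0.320},
    {"player": "B", "season": 2006, "pa": 500, "obp": 0.330},  # not consecutive
]
print(consecutive_season_pairs(rows, "obp"))  # [(0.35, 0.362)]
```

The resulting pairs are exactly the input a year-over-year correlation needs.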
Not surprisingly, Batting Average (BA) comes in at about the same consistency for hitters as ERA does for pitchers. One reason BA is so inconsistent is that it is highly correlated with Batting Average on Balls in Play (BABIP), at .79, and BABIP has a year-to-year correlation of only .35.
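Part of the reason the two move together is that both are built mostly from the same hits. The sketch below uses the commonly published definitions, BA = H / AB and BABIP = (H - HR) / (AB - K - HR + SF); the function names and the season line are made up for illustration.

```python
def batting_average(h, ab):
    """Hits per at-bat."""
    return h / ab

def babip(h, hr, ab, k, sf):
    """Batting average on balls in play.

    Home runs leave the numerator, and home runs and strikeouts leave the
    denominator, because neither produces a ball fielders can play.
    Sacrifice flies are added back: they are balls in play but not at-bats.
    """
    return (h - hr) / (ab - k - hr + sf)

# Hypothetical season: 150 H, 20 HR, 550 AB, 100 K, 5 SF
print(round(batting_average(150, 550), 3))    # 0.273
print(round(babip(150, 20, 550, 100, 5), 3))  # 0.299

# Ten balls in play that fall for hits instead of being caught
print(round(batting_average(160, 550), 3))    # 0.291
print(round(babip(160, 20, 550, 100, 5), 3))  # 0.322
```

A handful of balls finding holes moves both numbers together, so whatever randomness sits in BABIP flows straight into BA.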
Descriptive statistics like OBP and SLG fare much better, coming in at .62 and .63, respectively. Many argue that OBP is a better statistic than BA for a number of reasons, but one is that it is more reliable at identifying a hitter's true skill, since it correlates better year-to-year. Not coincidentally, OBP also has a much lower correlation with BABIP (.58) and a high correlation with BB% (.74), hence its higher year-to-year correlation.
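For reference, here are the usual formulas behind OBP, SLG, and BB%, sketched in Python. Walks and hit-by-pitches enter OBP directly but play no part in BABIP, which is one way to see why OBP leans less on balls-in-play luck. The function names and season totals are hypothetical, and PA is approximated here as AB + BB + HBP + SF, ignoring sacrifice bunts and catcher's interference.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base per qualifying plate appearance."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

def bb_pct(bb, pa):
    """Walk rate: walks per plate appearance."""
    return bb / pa

# Hypothetical season: 150 H (95 1B, 30 2B, 5 3B, 20 HR), 70 BB, 5 HBP, 550 AB, 5 SF
pa = 550 + 70 + 5 + 5  # assumes no sacrifice bunts or interference
print(round(obp(150, 70, 5, 550, 5), 3))  # 0.357
print(round(slg(95, 30, 5, 20, 550), 3))  # 0.455
print(round(bb_pct(70, pa), 3))           # 0.111
```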
What I find amazing is that, of all these metrics, Line Drive Percentage (LD%) is easily the lowest at .22. Interestingly enough, this is similar to what others have found for pitchers. What puzzles me is that while LD% is highly variable year-to-year for hitters, GB% and FB% are not. One possibility is that this reflects a coding error in the batted ball data, but if that were the case I would expect the other batted ball types to show similar variability. They don't.
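The coding-error reasoning rests on the fact that the batted ball rates are shares of the same pool of classified batted balls. The sketch below assumes GB%, FB%, and LD% are each a type's share of all ground balls, fly balls, and line drives (the exact FanGraphs denominator may differ, for example in how bunts are handled); the function name and counts are hypothetical. It shows that a ball miscoded as a line drive has to come out of another bucket, so an error in LD% shows up as an equal-sized error, in percentage points, in GB% or FB%.

```python
def batted_ball_shares(gb, fb, ld):
    """Return (GB%, FB%, LD%) as fractions of all classified batted balls."""
    total = gb + fb + ld
    if total == 0:
        raise ValueError("no batted balls")
    return gb / total, fb / total, ld / total

# Hypothetical hitter: 200 grounders, 150 fly balls, 70 line drives
gb1, fb1, ld1 = batted_ball_shares(200, 150, 70)

# Same batted balls, but 10 of the line drives are scored as fly balls instead
gb2, fb2, ld2 = batted_ball_shares(200, 160, 60)

# The shares always sum to 1, so LD% can only fall if another share rises
print(round(ld2 - ld1, 3), round(fb2 - fb1, 3))
```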
The other interesting finding is that most of the plate discipline statistics show fantastic year-to-year correlation. It appears that how patient a hitter is, whether he is a free swinger, how well he selects pitches, and so on really doesn't vary all that much. (My guess is that these traits are more likely to change at the very beginning and end of a player's career.) The notable exception is Zone% (the share of pitches a hitter sees inside the strike zone), whose year-to-year correlation is surprisingly low. When we do see a change in these statistics, it should serve as a red flag that something may have truly changed with the hitter (injury, aging, a change in approach or mechanics, etc.), since randomness likely isn't the culprit.
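One simple way to put that red flag into practice, offered only as a sketch and not as the method used in this post, is to compare each hitter's year-over-year change in a stable plate discipline stat (O-Swing%, say) with the spread of changes across all hitters, and flag anyone who falls far outside it. The two-standard-deviation cutoff, the function name, and the numbers below are all assumptions for illustration.

```python
import statistics

def flag_unusual_changes(changes, k=2.0):
    """Return hitters whose change is more than k standard deviations from the mean change.

    `changes` maps a hitter to (year-two value - year-one value) for one statistic.
    """
    values = list(changes.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs at least two hitters
    return sorted(name for name, d in changes.items() if abs(d - mean) > k * sd)

# Hypothetical year-over-year changes in O-Swing% (as fractions)
oswing_changes = {
    "A": 0.01, "B": -0.01, "C": 0.00, "D": 0.02,
    "E": -0.02, "F": 0.01, "G": -0.01, "H": 0.12,
}
print(flag_unusual_changes(oswing_changes))  # ['H']
```

A flag like this is only a prompt to look for a cause, not evidence of one.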
For future reference we'll post a link to the correlations in the Saber Toolbox (left-hand side of the page). Hope you find it useful.
Also, here's a link to a correlation table that shows the general relationship between each statistic in Year 1 and every other statistic in Year 2. Note that these correlations differ slightly from the analysis above because the sample size (N) was different.
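A table like that can be built by correlating every Year 1 statistic against every Year 2 statistic. The sketch below assumes the two seasons are stored as parallel lists of dicts, one per hitter and in the same hitter order; the function names, stat keys, and numbers are hypothetical. The diagonal entries (a stat against itself) are the year-to-year correlations discussed above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cross_year_table(year1, year2, stats):
    """Correlate each Year 1 stat with every Year 2 stat.

    year1[i] and year2[i] must describe the same hitter.
    Returns {year1_stat: {year2_stat: r}}.
    """
    if len(year1) != len(year2):
        raise ValueError("seasons must cover the same hitters")
    table = {}
    for a in stats:
        xs = [row[a] for row in year1]
        table[a] = {b: pearson(xs, [row[b] for row in year2]) for b in stats}
    return table

# Hypothetical three-hitter sample
year1 = [{"obp": 0.300, "bb_pct": 0.06}, {"obp": 0.350, "bb_pct": 0.10}, {"obp": 0.330, "bb_pct": 0.08}]
year2 = [{"obp": 0.310, "bb_pct": 0.06}, {"obp": 0.340, "bb_pct": 0.10}, {"obp": 0.360, "bb_pct": 0.08}]
table = cross_year_table(year1, year2, ["obp", "bb_pct"])
for a, row in table.items():
    print(a, {b: round(r, 2) for b, r in row.items()})
```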
-------------------
*For some statistics, like batted ball and plate discipline metrics, the data only goes back to 2002. All data used courtesy of FanGraphs.