
What Hitting Metrics Correlate Year-to-Year?

The world of baseball does not want for statistics. Indeed, at this point there is a metric for just about any outcome one could be interested in (as well as three or four different versions of it).

But not all statistics are created equal; each tells us something different. As Dave Cameron recently pointed out, baseball metrics can be grouped under a number of different categorical schemes, one of which is descriptive versus predictive.

This is why it is important to not just know the numbers, but the story behind the numbers. How are they constructed? What are they meant to capture? Some statistics are simply reflections of current performance while others reveal more about a player's skills or talent outside of a single year. It's critical that we understand the difference.

One place to start in sorting metrics into these two buckets is the extent to which a metric correlates with itself year over year. If a statistic in year one does not correlate all that well with the same statistic in year two, it is generally more descriptive than predictive.

This is the underlying logic behind DIPS (Defense Independent Pitching Statistics) and metrics like FIP. Since ERA has a fairly low year-to-year correlation (.38), it is a poor predictor of future performance and true talent.
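For reference, FIP keeps only the outcomes a pitcher controls most directly (home runs, walks, hit batters, and strikeouts) and leaves balls in play out entirely. The sketch below uses the commonly published FanGraphs form of the formula. The league constant, which is recalculated each season so that league-average FIP lines up with league-average ERA, is treated as an input here, and the 3.10 used in the example is only a placeholder. The pitching line is hypothetical.

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def fip(hr: int, bb: int, hbp: int, k: int, innings: float, constant: float) -> float:
    """Fielding Independent Pitching.

    `constant` is the season-specific league constant, passed in rather than
    computed; it puts FIP on the same scale as ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Hypothetical 200-inning season, for illustration only.
print(round(era(80, 200), 2))  # 3.6
print(round(fip(hr=20, bb=50, hbp=5, k=150, innings=200, constant=3.10), 3))  # 3.725
```

Note that `innings` must be a true fraction of innings. Box-score notation such as "180.2" means 180 2/3 innings, not 180.2.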

I think if you ask most people what offensive statistics correlate year-to-year you won't find many confident answers. In order to help us along the journey I decided to run some correlations for common, and uncommon, batting statistics. For those that live in SQL, these numbers are probably well known. But for most, I think it is helpful to have them posted for reference.

Here are the results:


The correlations above were calculated using hitters from 2001 to 2008 who had at least 300 plate appearances in back-to-back seasons.*
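The basic calculation can be sketched in a few lines of Python. This is not the author's actual code, which isn't given. The sketch makes several assumptions:

- Season lines are stored in a dict keyed by a hypothetical `(player, year)` tuple.
- Each season is paired with the same player's following season.
- A pair is kept only if both seasons clear 300 PA.
- Pearson's r is computed over the kept pairs.

The numbers in the example are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def year_to_year_pairs(seasons, stat, min_pa=300):
    """(year N, year N+1) values of `stat` for players with >= min_pa PA in both seasons.

    `seasons` maps (player, year) -> {"PA": ..., stat: ...}; this layout is hypothetical.
    """
    pairs = []
    for (player, year), row in seasons.items():
        following = seasons.get((player, year + 1))
        if following and row["PA"] >= min_pa and following["PA"] >= min_pa:
            pairs.append((row[stat], following[stat]))
    return pairs


def year_to_year_r(seasons, stat, min_pa=300):
    pairs = year_to_year_pairs(seasons, stat, min_pa)
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


# Made-up BB% lines, for illustration only.
seasons = {
    ("A", 2007): {"PA": 600, "BB%": 0.10}, ("A", 2008): {"PA": 580, "BB%": 0.11},
    ("B", 2007): {"PA": 450, "BB%": 0.06}, ("B", 2008): {"PA": 500, "BB%": 0.07},
    ("C", 2007): {"PA": 350, "BB%": 0.13}, ("C", 2008): {"PA": 200, "BB%": 0.05},  # dropped: < 300 PA in 2008
    ("D", 2007): {"PA": 620, "BB%": 0.08}, ("D", 2008): {"PA": 640, "BB%": 0.08},
}
print(len(year_to_year_pairs(seasons, "BB%")))  # 3
print(round(year_to_year_r(seasons, "BB%"), 2))  # 0.96
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient. The hand-rolled version is kept here so the arithmetic is visible.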

Not surprisingly, Batting Average comes in at about the same year-to-year consistency for hitters as ERA does for pitchers. One reason BA is so inconsistent is that it is highly correlated with Batting Average on Balls in Play (BABIP), at .79, and BABIP has a year-to-year correlation of only .35.
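Writing out the formulas shows why the two move together. This sketch uses the common FanGraphs definition, BABIP = (H - HR) / (AB - K - HR + SF). Every hit that isn't a home run comes on a ball in play, so BA can be rebuilt from BABIP plus a hitter's strikeouts, home runs, and sacrifice flies. With the hypothetical line below, a .030 swing in BABIP (everything else held fixed) moves BA by about .023.

```python
def batting_average(h, ab):
    return h / ab


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: non-HR hits over balls put in play."""
    return (h - hr) / (ab - k - hr + sf)


def ba_from_babip(babip_value, hr, ab, k, sf):
    """Rebuild BA from BABIP: hits = BABIP * balls in play + home runs."""
    balls_in_play = ab - k - hr + sf
    return (babip_value * balls_in_play + hr) / ab


# Hypothetical line: 500 AB, 150 H, 20 HR, 100 K, 5 SF.
line = dict(hr=20, ab=500, k=100, sf=5)
b = babip(h=150, **line)
print(round(b, 3))                          # 0.338
print(round(ba_from_babip(b, **line), 3))   # 0.3
# Same hitter, 30 points more BABIP, identical K, HR, and SF:
print(round(ba_from_babip(b + 0.030, **line) - batting_average(150, 500), 3))  # 0.023
```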

Descriptive statistics like OBP and SLG fare much better, coming in at .62 and .63, respectively. Many argue that OBP is a better statistic than BA for a number of reasons. One is that it is more reliable at identifying a hitter's true skill, since it correlates better year-to-year. Not coincidentally, OBP also has a much lower correlation with BABIP (.58) and a high correlation with BB% (.74), hence its higher year-to-year correlation.
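The standard formulas make the contrast concrete. OBP's numerator counts walks and hit-by-pitches alongside hits, so part of a hitter's OBP doesn't depend on what happens to balls in play at all. A minimal sketch with a hypothetical stat line follows. PA is assumed to be AB + BB + HBP + SF, with no sacrifice bunts.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base over (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def bb_rate(bb, pa):
    """BB%: walks per plate appearance."""
    return bb / pa


# Hypothetical line: 500 AB, 150 H (100 1B, 25 2B, 5 3B, 20 HR),
# 60 BB, 5 HBP, 5 SF, so 570 PA assuming no sacrifice bunts.
print(round(obp(h=150, bb=60, hbp=5, ab=500, sf=5), 3))  # 0.377
print(round(slg(100, 25, 5, 20, ab=500), 3))             # 0.49
print(round(bb_rate(60, 570), 3))                        # 0.105
# Share of times on base from BB + HBP, which involve no ball in play:
print(round((60 + 5) / (150 + 60 + 5), 3))               # 0.302
```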

What I find amazing is that, of all these metrics, Line Drive Percentage (LD%) is easily the lowest at .22. Interestingly enough, this is similar to what folks have found for pitchers. What's puzzling to me is that while LD% is highly variable year-to-year for hitters, GB% and FB% are not. One possibility is that this reflects a coding error in the batted ball data, but if that were the case I would assume the other types would show similar variability. But they don't.
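For reference, the three batted-ball rates share one denominator: each is that type's share of the classified balls in play. A minimal sketch follows. It assumes a hypothetical list of per-ball labels, "LD", "GB", or "FB", with infield pop-ups counted as fly balls. How real data providers handle pop-ups, bunts, and unclassified balls varies.

```python
from collections import Counter


def batted_ball_rates(batted_balls):
    """LD%, GB%, FB% as shares of classified batted balls.

    Assumes every label is "LD", "GB", or "FB"; other labels are ignored.
    Raises ZeroDivisionError if there are no classified balls.
    """
    counts = Counter(batted_balls)
    total = counts["LD"] + counts["GB"] + counts["FB"]
    return {f"{kind}%": counts[kind] / total for kind in ("LD", "GB", "FB")}


# Hypothetical hitter with 400 classified balls in play.
rates = batted_ball_rates(["GB"] * 176 + ["FB"] * 148 + ["LD"] * 76)
print(rates)  # {'LD%': 0.19, 'GB%': 0.44, 'FB%': 0.37}
```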

The other interesting thing is that most of the plate discipline statistics show fantastic year-to-year correlation. It would appear that the degree to which a hitter is patient, swings freely, shows good pitch selection, etc., really doesn't vary all that much. (My guess is that it's more likely to change at the very beginning and end of a player's career.) What's really interesting is that Zone% is the exception, with a low correlation. When we do see a change in these statistics, it should serve as a red flag that something may have truly changed with a hitter (injury, aging, a change in approach or mechanics, etc.), since randomness likely isn't the culprit.
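The plate discipline rates are simple ratios over pitch-level data. The sketch below follows the usual FanGraphs-style definitions. The `Pitch` record is a hypothetical layout, and which strike-zone definition feeds `in_zone` is an assumption; the comments below discuss how BIS and PITCHf/x zones can differ. By definition, Zone% is the only rate here that involves no swing decision by the hitter.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    in_zone: bool   # per whatever zone definition the data source uses
    swung: bool
    contact: bool   # only meaningful when swung is True


def ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def plate_discipline(pitches):
    zone = [p for p in pitches if p.in_zone]
    outside = [p for p in pitches if not p.in_zone]
    swings = [p for p in pitches if p.swung]
    z_swings = [p for p in zone if p.swung]
    o_swings = [p for p in outside if p.swung]
    return {
        "Zone%": ratio(len(zone), len(pitches)),
        "Swing%": ratio(len(swings), len(pitches)),
        "Z-Swing%": ratio(len(z_swings), len(zone)),
        "O-Swing%": ratio(len(o_swings), len(outside)),
        "Z-Contact%": ratio(sum(p.contact for p in z_swings), len(z_swings)),
        "O-Contact%": ratio(sum(p.contact for p in o_swings), len(o_swings)),
        "Contact%": ratio(sum(p.contact for p in swings), len(swings)),
    }


# Five hypothetical pitches.
pitches = [
    Pitch(in_zone=True, swung=True, contact=True),
    Pitch(in_zone=True, swung=True, contact=False),
    Pitch(in_zone=True, swung=False, contact=False),
    Pitch(in_zone=False, swung=True, contact=True),
    Pitch(in_zone=False, swung=False, contact=False),
]
stats = plate_discipline(pitches)
print({name: round(value, 3) for name, value in stats.items()})
```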

For future reference we'll post a link to the correlations in the Saber Toolbox (left-hand side of the page). Hope you find it useful.

Also, here's a link to a correlation table showing the general relationship between each statistic in Year 1 and all other statistics in Year 2. Note that these correlations differ a bit from the analysis above because the sample sizes were different.
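A table like that is a cross-correlation matrix. Each cell pairs one statistic in Year 1 with one statistic in Year 2 over the same set of player-season pairs. The sketch below is minimal and rests on a few assumptions:

- The qualifying pairs have already been built.
- Each row is a hypothetical dict of stats.
- The numbers in the example are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def cross_year_table(pairs, stats):
    """Correlate every Year 1 stat with every Year 2 stat.

    `pairs` is a list of (year1_row, year2_row) dicts for the same player;
    returns {year1_stat: {year2_stat: r}}.
    """
    table = {}
    for s1 in stats:
        xs = [year1[s1] for year1, _ in pairs]
        table[s1] = {s2: pearson_r(xs, [year2[s2] for _, year2 in pairs]) for s2 in stats}
    return table


# Made-up rows, for illustration only.
pairs = [
    ({"OBP": 0.300, "BB%": 0.05}, {"OBP": 0.310, "BB%": 0.06}),
    ({"OBP": 0.350, "BB%": 0.10}, {"OBP": 0.340, "BB%": 0.09}),
    ({"OBP": 0.400, "BB%": 0.12}, {"OBP": 0.380, "BB%": 0.13}),
]
table = cross_year_table(pairs, ["OBP", "BB%"])
for s1, row in table.items():
    print(f"{s1} (Year 1):", {s2: round(r, 2) for s2, r in row.items()})
```

The diagonal of such a table (a stat against itself) holds the same kind of year-to-year correlation discussed above. The off-diagonal cells show how well one stat in Year 1 tracks a different stat in Year 2.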

-------------------

*For some statistics, like batted ball and plate discipline metrics, the data only goes back to 2002. All data used courtesy of FanGraphs.


Comments


If you use Zone% based upon PITCHf/x data rather than BIS data

You will find that it has a really high year-to-year correlation, also, in the same neighborhood with the other plate discipline stats.

by Mike Fast on Sep 1, 2011 1:06 PM EDT

Very interesting, Mike

Yeah, when I saw it it struck me as really odd, given the consistency of the other metrics in that category.

I wonder why that particular one would be so different between the two when the other metrics that rely on a similar definition of the zone aren’t (i.e. Z-Contact, Z-Swing, etc)?

Columnist at Beyond the Box Score. Contributor at Amazin' Avenue.

by Bill Petti on Sep 1, 2011 1:34 PM EDT

It probably is impacting them a little bit, too.

For example, I have Z-Contact based upon PITCHf/x data correlating year-to-year at r = 0.85. (I limited my sample to 1,000 pitches in each season, which, as it turns out, should be very close to your 300 PA.)

by Mike Fast on Sep 1, 2011 3:16 PM EDT

Awesome

A resource going forward. Bookmarked.

by James Kannengieser on Sep 1, 2011 3:29 PM EDT

LD%

The weird thing about LD% is its within-season correlation for pitchers. I've seen lots of pitchers with an exceptional amount of streakiness in LD%. They may throw 10-20 games, or a full season, with LD% < .20, and then follow that up with a similar number of games with LD% > .20. The FanGraphs game-by-game LD% plots illustrate this really well. If it were truly random fluctuation, you'd expect a lot more zigzagging than you see for most pitchers.

by Nivra on Sep 1, 2011 8:32 PM EDT

Types of teams they face?


by Jeff Zimmerman on Sep 2, 2011 8:55 AM EDT

Stringers

Could also be a problem with stringer bias.

by scapistron on Sep 2, 2011 10:00 AM EDT

THIS.

Writer at Beyond the Box Score and The Hardball Times
Pitchf/x enthusiast.

by garik16 on Sep 3, 2011 12:41 PM EDT

YtY Pitching Metrics?

I can’t wait for the follow-up with an analysis of pitching stats! Great work!

by Scott Clarkson on Sep 2, 2011 10:04 AM EDT

Great stuff, Bill

What you’re saying is that Alex Gordon’s BABIP is for real, right?

Look, just tell me what I want to hear and everyone gets out alive, okay?

Making watching baseball as fun as doing your taxes.
My Twitter feed.

by Matt Klaassen on Sep 2, 2011 3:08 PM EDT

I'm curious...

since hitters' BABIP is seen as more of a skill and yet only has a correlation of .35, about how well does pitchers' BABIP correlate year-to-year?

by UZR Illusion on Sep 2, 2011 4:01 PM EDT

Haven't run those numbers yet

But I’ve heard it’s only a third as high for pitchers

Columnist at Beyond the Box Score. Contributor at Amazin' Avenue.

by Bill Petti on Sep 2, 2011 4:06 PM EDT via mobile

A paper written on this same topic ...

In a recent paper, we used some variable selection techniques along with a random effects model to get a sense of (a) how consistent a measure is (i.e., how often it is the same for a given player from year to year) and (b) how much signal it has (i.e., how often players differentiate themselves from the league mean). We didn't include all of the pitching metrics you mentioned, but for the ones you had at the top, ours agrees. There are other places further down, though, that aren't as close.

http://www.bepress.com/jqas/vol6/iss3/8/

by James Piette on Sep 2, 2011 5:33 PM EDT

If you can send me a copy, I'd love to take a look

To clarify, the metrics above were not for pitchers, but just for individual hitters.

I am taking on pitchers next.

Columnist at Beyond the Box Score. Contributor at Amazin' Avenue.

by Bill Petti on Sep 3, 2011 1:38 PM EDT

What's puzzling to me is that while LD% is highly variable year-to-year for hitters, GB% and FB% are not. One possibility is that this reflects a coding error in the batted ball data, but if that were the case I would assume the other types would show similar variability. But they don't.

There are two types of batted ball scoring bias: those with persistent causes and those with transient causes. Park-based biases would be persistent and would increase year-to-year correlations; catch/no-catch biases would be transient and thus would only persist to the extent something like BABIP persists.

Because of how batted balls are scored, hits are much more susceptible to batted ball biases than outs. Of the batted ball types, line drives are the ones most likely to be hits. That’s at least one possible explanation of why different batted ball types would show different effects from bias. There is also a possibility of random (that is to say, unbiased and unrepeatable) measurement error that would decrease year-to-year correlation of batted ball measures.

by cwyers on Sep 5, 2011 4:13 PM EDT
