How does it all relate?
Hey all, I originally posted this over at Purple Row. Pardon me if it seems somewhat simplistic or a little "audience specific," but I figured, hey, it might have some interesting stuff in it and can only do more good than harm. Enjoy.
Hey everyone, after reading RockiesMagicNumber's post about Chris Iannetta, I got to thinking about how strongly BABIP is correlated with other statistics. Specifically, in that post he discusses how Iannetta's line drive rate is uncharacteristically low and how that's contributing to his low BABIP. He mentions it in the post, but for the record, BABIP is Batting Average on Balls In Play. I guess you could call it a "sabermetric stat," but honestly it's just batting average without counting strikeouts and home runs. It's also often treated as an index of luck. Batters tend to have a BABIP around .300. Power hitters and speedy hitters tend to average a BABIP higher than that. (Garrett Atkins, Ian Stewart, and Chris Iannetta are all currently sporting BABIPs of around .250.)
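For anyone who wants to compute it themselves, here's a minimal Python sketch using the common definition of BABIP: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The sacrifice-fly term is part of that standard formula even though the quick description above leaves it out; the function name and the sample stat line are made up for illustration.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int = 0) -> float:
    """Batting average on balls in play.

    Numerator: hits that stayed in the park (H - HR).
    Denominator: balls put in play (AB - K - HR), plus sacrifice flies,
    which are outs on balls in play but don't count as at-bats.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line: 150 H, 25 HR, 550 AB, 100 K, 5 SF
print(round(babip(150, 25, 550, 100, 5), 3))  # 125 / 430 -> 0.291
```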
Anyway, I wanted to see just how correlated line drive rate and BABIP are. I got the numbers from FanGraphs (with the export-to-Excel feature; seriously, how friggin cool is that website?) for the 164 batters who qualify and ran some correlations.
Here's the output:
None of this stuff is too profound, and there's a good chance it was already available somewhere on the net, but what the hell, it's an off day and I needed a good excuse to mess around with SPSS.
So in the end, BABIP and line drive % have a correlation (r) of .489. To get the r-squared statistic (r^2), you take the correlation coefficient r (.489) and square it (straightforward enough). The r^2 for this is .239. What this basically means (and please correct me if I'm wrong) is that about 24% of the variation in BABIP can be accounted for by line drive rate. That's "accounted for" in the statistical sense; on its own it doesn't prove that one causes the other.
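For those of you who don't know, correlations range from -1.000 to 1.000. The further a value is from the midpoint of 0, the stronger the correlation. The sign tells you the direction: positive means the two stats tend to rise together, negative means one tends to fall as the other rises.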
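To make the r and r-squared arithmetic concrete, here's a small Python sketch of the Pearson correlation coefficient (the usual "r"; that this is what SPSS reported here is an assumption) plus the squaring step. Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two stats move together around their means...
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...scaled by each stat's own spread, which bounds r to [-1, 1].
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# r^2 is just r squared: with the post's r = .489 for LD% vs. BABIP
r = 0.489
r_squared = r ** 2
print(round(r_squared, 3))  # 0.239
```

Perfectly lined-up data gives exactly 1 or -1; real stats land somewhere in between.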
So, some other things in that table: BABIP and fly ball percentage have an even stronger correlation than BABIP and line drive rate (although only by .001 in magnitude). However, whereas a higher line drive rate goes with a higher BABIP, a higher fly ball rate goes with a lower BABIP, so Iannetta's increase in fly balls this year may also be part of why his BABIP is down.
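Unsurprisingly, fly ball rate and ground ball rate have a -.931 correlation: ground balls, line drives, and fly balls together make up all of a hitter's batted balls, so a hitter who puts more balls on the ground almost has to hit fewer in the air.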
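A quick way to see why that number sits so close to -1, assuming the usual split where ground ball, line drive, and fly ball rates add up to 100% of batted balls: if line drive rate didn't move at all from hitter to hitter, every point of ground balls would have to come straight out of fly balls. This Python sketch builds fly ball rates for a few made-up hitters (hypothetical numbers) with line drive rate held fixed at 20%, and the correlation comes out at -1. Under that assumption, it's hitters' varying line drive rates that keep a real sample like this one short of -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


LD_RATE = 0.20                             # hypothetical: held fixed
gb_rates = [0.35, 0.40, 0.45, 0.50, 0.55]  # hypothetical hitters
# GB% + LD% + FB% = 100%, so FB% is whatever is left over.
fb_rates = [1.0 - LD_RATE - gb for gb in gb_rates]

print(round(pearson_r(gb_rates, fb_rates), 6))  # -1.0
```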
P.S. Ironically, Chris Iannetta's numbers did not contribute towards this "study" as he does not have enough at bats to qualify.
Comments
Maybe this is a result of not really understanding correlation numbers too well (thanks for the short explanation though)...
Line drives are the kind of batted ball that tends to “fall in” most often, as you can see from their larger positive correlation with BABIP compared with grounders and flies [I think]. Grounders depend a lot more on where they’re hit, since infielders can usually get to them more easily than to liners (most grounders get there more slowly). Flies are usually caught, so they hurt BABIP a lot.
IIRC, when trying to figure out if a player has a “lucky” BABIP, you’re supposed to compare it to LD% + .120. If his actual BABIP is lower than that “predicted BABIP” (LD% + .120), then he’s been unlucky; if it’s higher, then he’s been lucky. Could you possibly figure out the correlation between the “predicted BABIP” and actual BABIP?
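The rule of thumb in this comment, written out as a hypothetical Python sketch. It assumes LD% is expressed as a fraction (0.18 rather than 18%), and the `tolerance` band is an added, made-up parameter for not calling tiny gaps luck. The rule itself is as rough as the replies below say.

```python
LD_PLUS = 0.120  # rule-of-thumb constant: predicted BABIP = LD% + .120


def predicted_babip(ld_rate: float) -> float:
    """Rough expected BABIP from line drive rate (as a fraction)."""
    return ld_rate + LD_PLUS


def luck_label(actual_babip: float, ld_rate: float, tolerance: float = 0.0) -> str:
    """Compare actual BABIP with the LD%-based prediction."""
    expected = predicted_babip(ld_rate)
    if actual_babip > expected + tolerance:
        return "lucky"
    if actual_babip < expected - tolerance:
        return "unlucky"
    return "about as expected"


# Hypothetical hitter: 18% line drives, .250 BABIP, predicted about .300
print(luck_label(0.250, 0.18))  # unlucky
```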
@bs_uf15bosox9be The Original Gameday; Learn to use SB Nation
Hey, thanks for the comment
Could you clarify the LD% + .120 thing a bit?
I think I know what you’re getting at: if I recall correctly, adding .120 to a player’s line drive rate gives his predicted BABIP. However, I’m pretty sure adding the same constant of .120 to everyone’s LD% wouldn’t change the correlation between that and BABIP at all. Of course I could be wrong; I just woke up and won’t be around until much later today, but I’ll be happy to play around with the numbers then.
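That hunch is right: Pearson’s r doesn’t change when you add the same constant to every value of one variable, because r only looks at deviations from the mean, and the constant cancels out of those. A quick Python check with a few made-up LD%/BABIP pairs (hypothetical numbers, not real players):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


ld = [0.16, 0.19, 0.21, 0.23, 0.18]          # hypothetical LD rates
babip = [0.270, 0.300, 0.320, 0.315, 0.290]  # hypothetical BABIPs
shifted = [x + 0.120 for x in ld]            # "predicted BABIP" = LD% + .120

# Same correlation, up to floating-point noise
print(abs(pearson_r(ld, babip) - pearson_r(shifted, babip)) < 1e-12)  # True
```

What the constant does change is the level: the gap between actual and predicted BABIP, which is what the luck check looks at, depends on the .120 even though the correlation can’t see it.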
At any rate, you’re absolutely right about LDs, FBs, and GBs and how often they turn into hits. I guess I just did this to see exactly how much they’re related.
The Rockies need some oldschool purple/white striped high socks. The team’s problem is its lack of swagger. I feel strongly that these socks will provide the swagger necessary to tap the potential that is the Rockies.
That is the really general way to do it...
But most people have sort of dropped that approximation. It’s really rough and has a very low correlation (I can’t recall the number, but THT ran the numbers in comparison with xBABIP and some other predictors). Gun to your head, I’d say it’s better than guessing, but if you really want to predict or decide whether a player’s been lucky, it might be better to grab an xBABIP calculator.
True.
See, this is why I read BtB more than I participate.
I remember the same article.
I believe the author’s conclusion was that it’s better to take the average of previous years’ BABIP (where possible) than to use LD% + .120.
Yes, I think the correlation was twice as high for previous years’ BABIP as for LD%-based BABIP.
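For comparison with the LD% + .120 rule, here’s a hypothetical Python sketch of the alternative mentioned in this exchange: predicting BABIP from a player’s previous seasons. It shows two ways to “average” them, a plain mean of the season BABIPs and a pooled figure that weights each season by its balls in play. Which one that article actually used isn’t stated here, so treat both as assumptions.

```python
def mean_prior_babip(season_babips):
    """Plain average of previous seasons' BABIPs."""
    if not season_babips:
        raise ValueError("no prior seasons")
    return sum(season_babips) / len(season_babips)


def pooled_prior_babip(seasons):
    """Career-style BABIP: total (H - HR) over total balls in play.

    `seasons` is a list of (hits_minus_hr, balls_in_play) pairs, so a
    full season counts for more than a short call-up.
    """
    hits = sum(h for h, _ in seasons)
    bip = sum(b for _, b in seasons)
    if bip <= 0:
        raise ValueError("no balls in play")
    return hits / bip


# Hypothetical history: .320 over 400 balls in play, .280 over 100
print(round(mean_prior_babip([0.320, 0.280]), 3))              # 0.3
print(round(pooled_prior_babip([(128, 400), (28, 100)]), 3))   # 0.312
```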
F*** Billy Beane... actually, I kinda like Holliday
You know...
I did the same thing forever before I read that article. It’s hard to keep up with all the magic number stuff (such as converting between RA/ERA and the multiplier to get a rough wOBA estimate), and know what’s sensible and what isn’t.
next step
Now that you’ve established what appears to be a positive, linear relationship, you ought to do some linear regression. Some of your predictors are very highly correlated with each other (fly ball and ground ball rate at -.931), so be careful about how you enter them as co-variates. Use the regression to look into what carries the most weight when predicting BABIP on your next day off. Remember, correlation only gives you part of the picture: it tells you how tightly two stats move together, but not how much BABIP changes for a given change in, say, line drive rate. Nice work, otherwise. I’ve been using SPSS to create some atheoretical regression equations for fantasy for a couple of years now, and have actually enjoyed a little success when making trades.
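As a starting point for that next step, here’s a minimal Python sketch of one-predictor least-squares regression (BABIP on LD%, say). The slope is the piece correlation alone doesn’t give you: how much BABIP goes with one more unit of line drive rate. The function name and toy numbers are hypothetical; for several predictors at once you’d use SPSS’s linear regression procedure or something like `numpy.linalg.lstsq`. One caution if you go that route: if ground ball, line drive, and fly ball rates add up to 100%, putting all three in alongside an intercept makes the predictors exactly collinear, so leave one of them out.

```python
def simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # change in y per unit change in x
    intercept = mean_y - slope * mean_x  # fitted line passes through the means
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x
intercept, slope = simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```

With a single predictor, the fit’s R² is just the squared correlation, so the .239 from the post would reappear; the slope and intercept are the new information.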
Hey
I really appreciate this, as my statistical background is pretty green. Thanks for the pointers; the toughest part will be finding a solid day off!
