


How does it all relate?

Hey all, I originally posted this over at Purple Row. Pardon me if it seems somewhat simplistic or a little "audience specific," but I figured it might have some interesting stuff and could only do more good than harm. Enjoy.

 

Hey everyone, after reading RockiesMagicNumber's post about Chris Iannetta, I got to thinking about how closely BABIP is correlated with other statistics. Specifically, in that post he discusses how Iannetta's line-drive rate is uncharacteristically low and how that is contributing to his low BABIP. He mentions it there, but BABIP is Batting Average on Balls In Play. I guess you could call it a "sabermetric stat," but honestly it's just batting average without counting strikeouts and home runs. It is also often treated as an index of luck. Batters tend to have a BABIP around .300, and power hitters and speedy hitters tend to post a BABIP higher than that. (Garrett Atkins, Ian Stewart, and Chris Iannetta are all currently sporting BABIPs of around .250.)
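If you want to compute it yourself, here's a minimal Python sketch of the commonly used BABIP formula, (H - HR) / (AB - K - HR + SF). Note that it also adds sacrifice flies back into the denominator, which the quick "batting average without strikeouts and homers" description leaves out. The function name and the season line are made up for illustration.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Strikeouts and home runs never put the ball in the field of play,
    # so both come out of the denominator; sacrifice flies go back in.
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line, not a real player's stats.
print(round(babip(h=120, hr=20, ab=450, k=100, sf=5), 3))  # prints 0.299
```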

 

Anyway, I wanted to see just how correlated line-drive rate and BABIP are. I got the numbers from FanGraphs (with the export-to-Excel feature; seriously, how friggin cool is that website?) for the 164 batters who qualify and ran some correlations.

Here's the output:

[Image: SPSS correlation table for BABIP, line-drive, ground-ball, and fly-ball rates]


None of this stuff is too profound, and there's a good chance it was already available somewhere on the net, but what the hell, it's an off day and I needed a good excuse to mess around with SPSS.

So in the end, BABIP and line-drive % have a correlation (r) of .489. To get the r-squared statistic (r²), you take the correlation coefficient r (.489) and square it (straightforward enough): .489² ≈ .239. What this basically means (and please correct me if I'm wrong) is that about 24% of the variation in BABIP among these hitters is accounted for by its linear relationship with line-drive rate.
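Here's a minimal sketch of the calculation SPSS is doing for a Pearson correlation, plus the squaring step. This is plain Python written for illustration, not SPSS's own code, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each left unscaled because the 1/n cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Squaring the reported coefficient gives the share of variance explained.
r = 0.489
print(round(r ** 2, 3))  # prints 0.239
```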

For those of you who don't know, correlations range from -1.000 to 1.000. The sign tells you the direction of the relationship, and the further you are from the midpoint of 0, the stronger the correlation.

So, some other things in that table: BABIP and fly-ball percentage have an even stronger correlation than BABIP and line-drive rate (although only by .001 in magnitude). However, whereas a higher line-drive rate goes with a higher BABIP, a higher fly-ball rate goes with a lower BABIP. Iannetta's increase in fly balls this year may also be why his BABIP is down.

Unsurprisingly, fly-ball rate and ground-ball rate have a -.931 correlation.
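Here's one reason that number is unsurprising. Assuming ground-ball, line-drive, and fly-ball rates are each a share of all batted balls and therefore add up to 100%, a hitter who adds fly balls mostly has to give up ground balls. The toy numbers below are invented purely to show that constraint at work. They are not the 164 qualified batters.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up batted-ball rates (percent of balls in play) for five hitters.
ld_pct = [18, 22, 20, 19, 21]
fb_pct = [30, 45, 38, 35, 42]
# Assumption: whatever isn't a line drive or fly ball is a ground ball.
gb_pct = [100 - ld - fb for ld, fb in zip(ld_pct, fb_pct)]

fb_gb_r = pearson_r(fb_pct, gb_pct)
print(round(fb_gb_r, 3))  # strongly negative, because of the 100% constraint
```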

 

P.S. Ironically, Chris Iannetta's numbers did not contribute to this "study," as he does not have enough plate appearances to qualify.


Comments


Maybe this is a result of not really understanding correlation numbers too well (thanks for the short explanation though)...

Line drives are usually the type of batted ball that tends to “fall in” the most, as you can see from their larger positive correlation with BABIP compared with grounders and flies [I think]. Grounders depend a lot more on where they’re hit, since infielders can usually get to them more easily than to liners (most grounders travel more slowly than liners). Flies are usually caught, so they hurt BABIP a lot.

IIRC, when trying to figure out if a player has a “lucky” BABIP, you’re supposed to compare it to LD% + .120. If it’s lower than the “predicted BABIP” (LD% + .120), then he’s been unlucky; if it’s higher, then he’s been lucky. Could you possibly figure out the correlation between the “predicted BABIP” and actual BABIP?

by bdalebs on Aug 4, 2009 2:13 AM EDT
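Here's a rough sketch of the LD% + .120 rule of thumb described in that comment. It assumes LD% is written as a decimal (15% = .150). The function names are hypothetical, and as later comments in this thread note, this estimate is a crude one.

```python
def predicted_babip(ld_rate: float) -> float:
    """Rule-of-thumb expected BABIP: line-drive rate (as a decimal) plus .120."""
    return ld_rate + 0.120


def luck_label(actual_babip: float, ld_rate: float) -> str:
    """Compare actual BABIP with the LD% + .120 estimate."""
    expected = predicted_babip(ld_rate)
    if actual_babip < expected:
        return "unlucky"
    if actual_babip > expected:
        return "lucky"
    return "as expected"


# Hypothetical hitter: 15% line drives, .250 BABIP.
print(round(predicted_babip(0.15), 3), luck_label(0.250, 0.15))  # prints 0.27 unlucky
```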

Hey, thanks for the comment

Could you clarify the LD%.120 thing a bit?

I think I know what you’re getting at – if I recall correctly, adding .120 to line drive rate gives one’s predicted BABIP. However, I’m pretty sure adding a constant of .120 to everyone’s LD% wouldn’t actually change the correlation between that and BABIP. Of course I could be wrong, I just woke up and am not going to be around until much later today, but I’ll be happy to play around with the numbers later.

At any rate, you’re absolutely right about LDs, FBs, and GBs and how often they turn into hits; I guess I just did this to see exactly how closely they’re related.

The Rockies need some old-school purple/white striped high socks. The team’s problem is its lack of swagger. I feel strongly that these socks will provide the swagger necessary to tap the potential that is the Rockies.

by Resolution on Aug 4, 2009 8:01 AM EDT
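Here's a quick check of the point about adding a constant. Pearson's r subtracts each variable's mean before doing anything else, so shifting every LD% up by .120 drops right back out. The sample values below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Deviations from the mean: any constant added to xs cancels here.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ld_rates = [0.18, 0.22, 0.20, 0.15, 0.24]       # hypothetical LD rates
babips = [0.300, 0.320, 0.295, 0.260, 0.340]    # hypothetical BABIPs
predicted = [ld + 0.120 for ld in ld_rates]     # LD% + .120 for each hitter

# Same correlation either way, up to floating-point noise.
print(abs(pearson_r(ld_rates, babips) - pearson_r(predicted, babips)) < 1e-12)  # prints True
```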

That is the really general way to do it...

But most people have sort of dropped that approximation. It’s really rough and has a very low correlation (I can’t recall the number, but THT ran the numbers in comparison with xBABIP and some other predictors). Gun to your head, I’d say it’s better than guessing, but if you really want to predict or decide whether a player’s been lucky, it might be better to grab an xBABIP calculator.

by SFiercex4 on Aug 4, 2009 11:56 AM EDT

True.

See, this is why I read BtB more than I participate.

by bdalebs on Aug 4, 2009 2:31 PM EDT

I remember the same article.

I believe the author’s conclusion was that it’s better to take the average of previous years’ BABIP (where possible) than to use LD% + .120.

by jwiscarson on Aug 6, 2009 2:28 PM EDT
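Here's a sketch of that alternative baseline. It assumes a simple unweighted mean of prior seasons. The article may have weighted seasons differently, for example by balls in play, and that detail isn't given here. The hitter's numbers are made up.

```python
def baseline_babip(previous_babips: list[float]) -> float:
    """Unweighted mean of a hitter's BABIP in previous seasons (an assumption)."""
    if not previous_babips:
        raise ValueError("need at least one previous season")
    return sum(previous_babips) / len(previous_babips)


# Hypothetical hitter: three prior seasons, .250 so far this year.
baseline = baseline_babip([0.310, 0.295, 0.305])
current = 0.250
# A large negative gap suggests bad luck relative to his own track record.
print(round(baseline, 3), round(current - baseline, 3))
```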

You know...

I did the same thing forever before I read that article. It’s hard to keep up with all the magic number stuff (such as converting between RA/ERA and the multiplier to get a rough wOBA estimate), and know what’s sensible and what isn’t.

by jwiscarson on Aug 7, 2009 4:29 PM EDT

next step

Now that you’ve established what appears to be a positive, linear relationship, you ought to do some linear regression. Your very high correlations might need to be treated as covariates. On your next day off, use the regression to look into what carries the greatest weight when predicting BABIP. Remember, correlation only gives you a small part of the picture and doesn’t tell you much about the true strength of the relationship between variables. Nice work, otherwise. I’ve been using SPSS to create some atheoretical regression equations for fantasy for a couple of years now, and have actually enjoyed a little success when making trades.

by phdstud on Aug 13, 2009 8:52 AM EDT
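As a starting point for that next step, here's a minimal ordinary-least-squares fit of BABIP on line-drive rate alone, written in plain Python for illustration. The function name and data are hypothetical, and the data are constructed to lie on a perfect line. A multiple regression with LD%, GB%, and FB% together would need extra care: if the three rates sum to 100%, any one of them is fully determined by the other two.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


ld_rates = [0.16, 0.18, 0.20, 0.22, 0.24]   # hypothetical LD rates
babips = [0.280, 0.290, 0.300, 0.310, 0.320]  # hypothetical BABIPs
intercept, slope = fit_line(ld_rates, babips)
print(round(intercept, 3), round(slope, 3))
```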

Hey

I really appreciate this, as my statistical background is pretty green. Thanks for the pointers; the toughest part will be finding a solid day off!


by Resolution on Aug 13, 2009 2:56 PM EDT
