Projecting BABIP Part One: Applying Bayes' Theorem

Applying Bayes' Theorem to find bLD, bFB, bGB. Measuring when a hitter reaches base via the three (almost) true outcomes of batted balls -- LD, FB, and GB.

Chris Davis has a .398 BABIP this year
Mitchell Layton

I am intrigued by the idea of stabilizing BABIP, given its awful amount of year-to-year variance. The first step in my process begins here, by finding the rate at which a ballplayer reaches base via LD, FB, and GB.

One reason you should care is that a large amount of the year-to-year variability in BABIP lies in the volatility of the batted-ball results it measures. For one, BABIP on LD, FB, and GB individually carries an incredible amount of noise year to year (which we will get to later).

Thus, it occurred to me that there may be a way to measure nearly the same thing as BABIP, but with much less noise year to year.

To do so, I will apply Bayes' Rule to BABIP to find this hidden batted-ball rate. (Note: for those who are not sure what Bayes' Theorem is, the formula is below.)

BAYES' RULE

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The symbol "|" signifies a conditional probability.

For a simple example, imagine you want to find the probability that Domonic Brown walks given that it's a Tuesday. That probability would be written P(Walk | Tuesday). And of course the answer would be never, or zero.
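To make the formula concrete, here is a minimal Python sketch that checks Bayes' Rule against a direct count. The season totals are invented for a hypothetical hitter and are not taken from Retrosheet or any real player.

```python
from math import isclose

# Hypothetical season totals for one hitter (invented for illustration)
plate_appearances = 600
tuesday_pa = 100      # plate appearances that fell on a Tuesday
walks = 60            # walks on any day of the week
tuesday_walks = 12    # walks drawn on a Tuesday

# Direct estimate of P(Walk | Tuesday) from the counts
p_walk_given_tuesday = tuesday_walks / tuesday_pa

# The same quantity assembled from the pieces of Bayes' Rule
p_tuesday_given_walk = tuesday_walks / walks       # P(B | A)
p_walk = walks / plate_appearances                 # P(A)
p_tuesday = tuesday_pa / plate_appearances         # P(B)
via_bayes = p_tuesday_given_walk * p_walk / p_tuesday

print(round(p_walk_given_tuesday, 3), round(via_bayes, 3))
```

Both routes give 0.12 for these made-up counts, which is all Bayes' Rule promises: it rearranges the same information.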


APPLYING BAYES' RULE

Let's get started!

To do this, all we need is LD%, FB%, GB%, OBP, LD_BABIP, GB_BABIP, and FB_BABIP. All totals come from Retrosheet.

We know the probability that each player gets on base is his OBP; this will be our P(B), the denominator in Bayes' Rule. Our P(A), or "prior", is the rate at which he hits the batted-ball type in question (LD%, FB%, or GB%).

The posterior we want to find is P(A|B), where "A" is the batted ball we want (LD, FB, GB) and "B" is the event of getting on base. In other words, our posterior is the probability that the ball in play was of the selected type (LD, FB, GB), given that the player made it on base.

Now hopefully you can see why this is different from plain BABIP on LDs; once we isolate this number, there should be much less statistical noise.

Let's piece it all together using Bayes' Rule above, with line drives as an example:

$$P(\text{LD} \mid \text{OB}) = \frac{P(\text{OB} \mid \text{LD})\,P(\text{LD})}{P(\text{OB})}$$

Here, P(OB) is just the probability that a player reaches base. We will call the result of this formula bLD.

We will do the same for FB and GB to find bFB and bGB.
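The calculation can be sketched in a few lines of Python. The function name `b_metric` and the sample inputs are assumptions made up for illustration; the only thing taken from the article is the formula itself, with the batted-ball BABIP as P(OB | type), the batted-ball rate as P(type), and OBP as P(OB).

```python
def b_metric(type_babip: float, type_rate: float, obp: float) -> float:
    """Return P(type | OB) = P(OB | type) * P(type) / P(OB).

    type_babip: BABIP on one batted-ball type (e.g. LD_BABIP)
    type_rate:  share of batted balls of that type (e.g. LD% as a fraction)
    obp:        on-base percentage, used here as P(OB)
    """
    if obp <= 0:
        raise ValueError("OBP must be positive")
    return type_babip * type_rate / obp


# Hypothetical hitter; these inputs are invented, not real player data
ld_babip, ld_rate, obp = 0.700, 0.20, 0.350
b_ld = b_metric(ld_babip, ld_rate, obp)
print(round(b_ld, 3))
```

Calling the same function with FB_BABIP and FB%, or GB_BABIP and GB%, would give bFB and bGB.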

RESULTS

Now for the moment of truth. Using data from 2008 to 2012, for all players with at least 250 PA, we will see just how well bLD, bFB, and bGB correlate year to year, and how they compare to BABIP for LD, GB, and FB individually.
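The article does not spell out how the year-to-year figure is computed. One plausible reading, sketched below, pairs each player's value in season N with his value in season N+1 and squares the Pearson correlation of those pairs. The helper names and the sample values are hypothetical, and the 250 PA filter is assumed to have been applied before the data reaches this function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_r2(seasons):
    """seasons maps (player, year) -> metric value; returns R^2 of year N vs N+1."""
    pairs = [
        (value, seasons[(player, year + 1)])
        for (player, year), value in seasons.items()
        if (player, year + 1) in seasons
    ]
    this_year, next_year = zip(*pairs)
    return pearson_r(this_year, next_year) ** 2


# Hypothetical bLD values for three made-up players (not real data)
sample = {
    ("A", 2011): 0.30, ("A", 2012): 0.33,
    ("B", 2011): 0.36, ("B", 2012): 0.34,
    ("C", 2011): 0.41, ("C", 2012): 0.43, ("C", 2013): 0.40,
}
r2 = year_to_year_r2(sample)
print(round(r2, 3))
```

Running the same function once on a BABIP split and once on the matching bMetric would give the two columns being compared.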

Here are the raw results from the above study:

| BABIP metric | R^2 | bMetric | R^2 | DIFF |
| --- | --- | --- | --- | --- |
| LD_BABIP | 0.031 | bLD | 0.206 | 0.175 |
| FB_BABIP | 0.331 | bGB | 0.516 | 0.185 |
| GB_BABIP | 0.181 | bFB | 0.396 | 0.215 |

As you can see, the bMetrics substantially reduce the year-to-year noise relative to the BABIP splits. The year-to-year correlations are not perfect, but they are vastly improved. In a sense, the bMetrics measure nearly the same thing as BABIP, but we are looking at the probability of each batted-ball type given that a player made it on base, rather than the probability of getting on base given a certain batted-ball type.

These bMetrics will come in handy later on when we try to predict BABIP as a whole. Since there is less noise in these metrics than in LD_BABIP, FB_BABIP, and GB_BABIP, they will be useful when we convert projected batted-ball info into BABIP.

If you still don't see why this new method is useful, let me explain it in a simple way:

In 2012, Mark Trumbo led the league in LD_BABIP at .857. At the same time he had the 9th lowest bLD at 29%.
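A short worked step shows how both numbers can be true at once. Assuming the table values come straight from the Bayes formula for bLD, dividing bLD by LD_BABIP cancels P(OB | LD) and leaves the ratio of line-drive rate to on-base rate:

$$\frac{\text{bLD}}{\text{LD\_BABIP}} = \frac{P(\text{LD} \mid \text{OB})}{P(\text{OB} \mid \text{LD})} = \frac{P(\text{LD})}{P(\text{OB})}, \qquad \frac{0.290}{0.857} \approx 0.338$$

Under that assumption, Trumbo's line-drive rate was only about a third of his on-base rate, so even an .857 average on line drives accounts for a small share of his times on base.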

I hope you get the picture: the bMetrics capture the rate at which a player reaches base via a LD, FB, or GB, not his batting average when he hits a LD, FB, or GB. That way we can isolate a player's actual propensity to both hit a certain batted-ball type and reach base safely.

In closing, here is a sortable table you can play with while you wait for the next installment of the series, where we will combine these metrics with our previous pLD, pFB, and pGB models to predict BABIP.

CLOSING RESULTS

(sortable table)

Name yearID bLD bGB bFB LD_BABIP GB_BABIP FB_BABIP
Ryan Braun 2012 0.240 0.252 0.257 0.736 0.327 0.381
Matt Holliday 2012 0.244 0.321 0.227 0.629 0.338 0.316
Edwin Encarnacion 2012 0.247 0.207 0.286 0.684 0.305 0.281
Hunter Pence 2012 0.251 0.363 0.251 0.594 0.286 0.316
Miguel Cabrera 2012 0.268 0.262 0.247 0.636 0.319 0.353
Rickie Weeks 2012 0.273 0.324 0.271 0.636 0.283 0.281
Andrew McCutchen 2012 0.282 0.260 0.212 0.75 0.346 0.36
Mark Trumbo 2012 0.290 0.277 0.282 0.857 0.295 0.339
Miguel Montero 2012 0.291 0.224 0.233 0.764 0.286 0.355
Josh Willingham 2012 0.293 0.211 0.274 0.793 0.285 0.328
Ian Desmond 2012 0.293 0.323 0.231 0.753 0.312 0.308
Aramis Ramirez 2012 0.303 0.208 0.270 0.745 0.251 0.295
Danny Espinosa 2012 0.306 0.337 0.288 0.662 0.293 0.346
Drew Stubbs 2012 0.308 0.353 0.288 0.745 0.247 0.308
Corey Hart 2012 0.311 0.223 0.336 0.714 0.246 0.367
Aaron Hill 2012 0.311 0.227 0.286 0.676 0.304 0.294
Derek Jeter 2012 0.313 0.331 0.125 0.71 0.261 0.388
Albert Pujols 2012 0.314 0.200 0.303 0.716 0.207 0.325
B.J. Upton 2012 0.320 0.300 0.364 0.677 0.293 0.348
Torii Hunter 2012 0.321 0.328 0.174 0.767 0.34 0.369
Carlos Pena 2012 0.324 0.248 0.323 0.625 0.26 0.295
Adam Jones 2012 0.326 0.302 0.267 0.67 0.29 0.361
Carlos Gonzalez 2012 0.328 0.214 0.220 0.811 0.235 0.4
Paul Konerko 2012 0.328 0.210 0.260 0.687 0.238 0.333
Cameron Maybin 2012 0.330 0.359 0.162 0.769 0.243 0.214
Robinson Cano 2012 0.330 0.259 0.211 0.634 0.261 0.403
Shane Victorino 2012 0.330 0.334 0.183 0.691 0.269 0.19
Carlos Santana 2012 0.332 0.225 0.183 0.784 0.235 0.218
Carlos Beltran 2012 0.333 0.226 0.271 0.722 0.239 0.317
Dustin Pedroia 2012 0.333 0.269 0.210 0.716 0.251 0.258
Nick Swisher 2012 0.335 0.210 0.274 0.724 0.261 0.339
Adrian Beltre 2012 0.336 0.238 0.232 0.769 0.294 0.28
Nelson Cruz 2012 0.339 0.202 0.328 0.759 0.205 0.333
Josh Hamilton 2012 0.340 0.175 0.315 0.787 0.229 0.378
Erick Aybar 2012 0.341 0.319 0.184 0.791 0.263 0.27
Mark Reynolds 2012 0.342 0.174 0.358 0.731 0.205 0.368
Prince Fielder 2012 0.342 0.207 0.194 0.699 0.26 0.302
Mark Teixeira 2012 0.343 0.221 0.282 0.667 0.204 0.271
Justin Upton 2012 0.344 0.260 0.197 0.788 0.28 0.262
Ben Zobrist 2012 0.347 0.207 0.204 0.734 0.221 0.27
Billy Butler 2012 0.350 0.227 0.212 0.741 0.243 0.373
J.J. Hardy 2012 0.352 0.347 0.264 0.671 0.258 0.214
Austin Jackson 2012 0.361 0.302 0.142 0.775 0.366 0.214
Martin Prado 2012 0.362 0.329 0.150 0.664 0.287 0.215
Jay Bruce 2012 0.367 0.188 0.317 0.753 0.22 0.296
Michael Bourn 2012 0.367 0.333 0.141 0.735 0.272 0.252
Jose Reyes 2012 0.370 0.274 0.186 0.661 0.23 0.224
Dan Uggla 2012 0.370 0.189 0.218 0.829 0.254 0.211
Brandon Phillips 2012 0.371 0.314 0.210 0.687 0.255 0.248
Howie Kendrick 2012 0.371 0.349 0.142 0.771 0.255 0.292
Yunel Escobar 2012 0.374 0.425 0.147 0.624 0.236 0.183
Starlin Castro 2012 0.374 0.280 0.220 0.709 0.229 0.267
Jimmy Rollins 2012 0.375 0.204 0.281 0.738 0.194 0.253
Elvis Andrus 2012 0.377 0.357 0.110 0.702 0.255 0.214
Ian Kinsler 2012 0.380 0.206 0.267 0.729 0.211 0.244
Dexter Fowler 2012 0.381 0.229 0.195 0.75 0.313 0.312
Alfonso Soriano 2012 0.382 0.194 0.303 0.81 0.235 0.299
Matt Wieters 2012 0.382 0.176 0.273 0.789 0.168 0.324
Andre Ethier 2012 0.387 0.242 0.223 0.724 0.253 0.308
Kelly Johnson 2012 0.387 0.276 0.249 0.696 0.23 0.279
Gordon Beckham 2012 0.389 0.268 0.287 0.654 0.23 0.222
Coco Crisp 2012 0.392 0.244 0.206 0.75 0.212 0.223
Jamey Carroll 2012 0.392 0.345 0.129 0.657 0.243 0.256
Omar Infante 2012 0.394 0.271 0.227 0.714 0.242 0.212
Alex Gordon 2012 0.394 0.208 0.213 0.784 0.245 0.324
Yadier Molina 2012 0.399 0.236 0.169 0.721 0.265 0.216
Neil Walker 2012 0.404 0.270 0.203 0.722 0.274 0.259
Adrian Gonzalez 2012 0.406 0.208 0.254 0.752 0.23 0.318
Colby Rasmus 2012 0.408 0.271 0.301 0.693 0.246 0.244
Angel Pagan 2012 0.410 0.273 0.173 0.771 0.274 0.206
Alexei Ramirez 2012 0.412 0.360 0.212 0.68 0.254 0.206
Alberto Callaspo 2012 0.413 0.232 0.177 0.747 0.199 0.185
David DeJesus 2012 0.418 0.232 0.183 0.717 0.228 0.208
Curtis Granderson 2012 0.419 0.153 0.374 0.75 0.19 0.35
Alcides Escobar 2012 0.420 0.341 0.131 0.75 0.263 0.228
Jhonny Peralta 2012 0.422 0.240 0.287 0.667 0.202 0.272
Carlos Lee 2012 0.422 0.229 0.188 0.697 0.205 0.185
Asdrubal Cabrera 2012 0.423 0.204 0.231 0.747 0.205 0.269
Jeff Francoeur 2012 0.430 0.265 0.296 0.684 0.2 0.298
Freddie Freeman 2012 0.441 0.203 0.257 0.677 0.218 0.278
Delmon Young 2012 0.458 0.311 0.212 0.739 0.261 0.222
Michael Young 2012 0.473 0.371 0.106 0.717 0.242 0.154
Darwin Barney 2012 0.498 0.276 0.157 0.723 0.186 0.169
Ichiro Suzuki 2012 0.504 0.400 0.157 0.642 0.247 0.202

All statistics are from FanGraphs or the Lahman and Retrosheet databases.
