# BABIP: Separating fact from fiction

We have a tendency to associate BABIP with luck, but what other factors play into batting average on balls in play?

It was just another weekday morning. I was on my rise and grind with coffee in hand while shaking off the cobwebs, when this gem was dropped directly in front of me:

Merry May Day indeed. I clicked Jeff Zimmerman's link and found myself staring at several hundred xBABIP numbers. That was cool and all, but being only somewhat familiar with xBABIP, I decided to dig into the stat and its principles more deeply than before. While Zimmerman didn't divulge his exact methodology, presumably because he's working with non-public data, he did confirm that he used batted-ball outcomes (hard-hit percentage) and batter speed (speed scores) to arrive at his findings.

Before we get to those findings, however, I think it's important to have a brief conversation about BABIP in general. BABIP, or batting average on balls in play, refers to the rate at which batted balls become hits (excluding home runs). Baseball Prospectus calculates this as:
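In its most commonly cited form (assumed here; sites differ slightly, for example on whether sacrifice hits belong in the denominator), the calculation looks like this, where $H$ is hits, $HR$ is home runs, $AB$ is at-bats, $K$ is strikeouts and $SF$ is sacrifice flies:

$$
\text{BABIP} = \frac{H - HR}{AB - K - HR + SF}
$$

The numerator counts hits that stayed in the park, and the denominator counts every ball put in play, which is why strikeouts and home runs are subtracted out and sacrifice flies are added back in.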

For context, we've all heard that about 30% of batted balls become hits, therefore we expect to see batters with a BABIP at or around .300. When we see a player who's posting a .380 BABIP, we declare him lucky and say that regression will wreak havoc on the rest of his season. When we see a player with a .240 BABIP, we declare him unlucky and expect brighter days to be on the horizon. For a lot of us, this is the extent of our knowledge around BABIP; we use it as a barometer and adjust expectations accordingly.

But perhaps some of us have been far too literal when reading BABIP disclaimers in the past. We passively accept a .300 BABIP as the benchmark and expect results to fluctuate in accordance with the principles of regression. More importantly, the casual passerby will often treat BABIP as something a batter has little or no control over, but that's not the case. Yes, batting average on balls in play is a volatile and difficult stat to project, and it doesn't stabilize for several seasons (about 2.5 seasons for hitters, 8 seasons for pitchers), but that doesn't mean we have to helplessly accept a .300 BABIP as the standard for all hitters. We often equate BABIP with luck, but there's a saying around sports, and life in general, that one often makes his or her own luck. BABIP falls into this category. A gross adjustment to a batter's approach (trading grounders for flies, or vice versa) or a major mechanical change (a new stance or swing) can shift a player's BABIP, as can the defense he's hitting into (such as the shift) and other factors. Luck is part of the equation, but it's just one part.

As we've learned, each hitter has an individual BABIP, but because it can fluctuate so wildly from season to season, it can be hard to pin down until we have several seasons of data to work with. Miguel Cabrera is arguably the greatest hitter in the game and has a .346 BABIP over 10 qualified seasons. We'd hardly say he's been lucky, as he makes consistent hard contact. Kurt Suzuki, on the other hand, does not make a lot of hard contact and has a .269 career BABIP. If you've seen him play, it doesn't necessarily seem like he's been unlucky; he's just not very good at hitting baseballs (at least not when compared to Miguel Cabrera, which might be an unfair comparison). You get the idea: the ability of the batter, especially when it comes to making consistently hard contact, has something to do with his BABIP.

Enter Billy Hamilton. If you've seen the young speedster play this season, you know he's not making a lot of hard contact. He's not lacing lasers into the gaps for Cincinnati, but he makes up for the lack of hard contact by having the greatest foot speed in the game. His .322 BABIP over his first 40 major league games is largely a reflection of his ability to beat out infield hits. Ichiro was famous for this in Seattle, as others have been over the years. We know that just as hitting the ball hard influences a player's BABIP, so does his speed.

Here's where Zimmerman's data comes back into play. He calculated this xBABIP data by using hard hit percentages and speed scores, taking quality contact and batter speed into account to calculate what a player's BABIP should be based upon these factors. If a mad scientist created a baseball monster by blending Miguel Cabrera's bat with Billy Hamilton's speed, we could calculate what that monster's BABIP would be. We don't have to rely on hypothetical monsters, however. We have 194 qualified hitters we can look at instead. Please note, this is Zimmerman's original data that he shared. I simply filtered it for qualified hitters and added the "Differential" category.
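To make the idea of blending contact quality and speed concrete, here is a minimal sketch that fits a straight line through a handful of the rows published in this article. It is emphatically not Zimmerman's method, which he hasn't disclosed; the linear form, the choice of rows, and the function names `fit_two_predictor_line` and `predict` are all assumptions made for illustration.

```python
# Illustrative only: an ordinary least-squares line through a few published
# rows. This is an assumed model, NOT Zimmerman's undisclosed methodology.

# (hard-hit %, speed score, listed xBABIP), copied from the tables in this article
ROWS = [
    (30.43, 3.6, 0.293),  # Pedro Alvarez
    (44.44, 0.9, 0.327),  # Chris Carter
    (21.13, 0.6, 0.239),  # Mike Moustakas
    (38.96, 6.9, 0.345),  # Asdrubal Cabrera
    (21.92, 5.7, 0.275),  # Emilio Bonifacio
    (25.76, 7.6, 0.301),  # Leonys Martin
    (32.31, 1.3, 0.285),  # Mike Napoli
    (27.87, 0.5, 0.264),  # Derek Jeter
]


def fit_two_predictor_line(rows):
    """Least-squares fit of y = a + b1*x1 + b2*x2 using centered sums."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2  # zero only if the two inputs are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


def predict(coeffs, hard_hit_pct, speed):
    """Apply the fitted line to one hitter's hard-hit % and speed score."""
    a, b1, b2 = coeffs
    return a + b1 * hard_hit_pct + b2 * speed


coeffs = fit_two_predictor_line(ROWS)
for hh, spd, listed in ROWS:
    fitted = predict(coeffs, hh, spd)
    print(f"{hh:5.2f}% hard-hit, speed {spd:3.1f}: listed {listed:.3f}, fitted {fitted:.3f}")
```

Comparing the listed and fitted columns gives a rough feel for how much each input moves the expectation. A real model would be built from actual results on balls in play rather than from someone else's published estimates.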

Rather than discuss 194 qualified hitters and their current BABIP and xBABIP numbers, I'd rather focus on that last column, titled "Differential." Here I simply found the difference between the players' current BABIP and calculated xBABIP (xBABIP - BABIP). Let's identify the top players who are underperforming their xBABIP at the moment based on how often they've hit the ball hard and/or how well they run. In other words, based on Zimmerman's work, these are the tough luck guys so far in 2014.

| Name | Hard Hit % | BABIP | Speed | xBABIP | Differential |
| --- | --- | --- | --- | --- | --- |
| Pedro Alvarez | 30.43% | 0.161 | 3.6 | 0.293 | 0.132 |
| Alejandro De Aza | 33.33% | 0.191 | 5.6 | 0.316 | 0.125 |
| Curtis Granderson | 32.76% | 0.186 | 4.3 | 0.305 | 0.119 |
| Jhonny Peralta | 36.11% | 0.178 | 0.8 | 0.295 | 0.117 |
| Chris Carter | 44.44% | 0.217 | 0.9 | 0.327 | 0.110 |
| Mike Moustakas | 21.13% | 0.132 | 0.6 | 0.239 | 0.107 |
| Carlos Santana | 28.13% | 0.164 | 1.0 | 0.268 | 0.104 |
| Jedd Gyorko | 29.03% | 0.203 | 5.8 | 0.302 | 0.099 |
| Yonder Alonso | 24.39% | 0.188 | 3.9 | 0.272 | 0.084 |
| Asdrubal Cabrera | 38.96% | 0.263 | 6.9 | 0.345 | 0.082 |
| Raul Ibanez | 24.07% | 0.185 | 2.9 | 0.265 | 0.080 |
| Brian Dozier | 27.27% | 0.222 | 6.1 | 0.297 | 0.075 |
| Pablo Sandoval | 28.17% | 0.208 | 3.4 | 0.283 | 0.075 |
| Brian McCann | 35.62% | 0.229 | 2.0 | 0.302 | 0.073 |
| Neil Walker | 29.89% | 0.217 | 3.3 | 0.289 | 0.072 |
| Jason Kipnis | 36.36% | 0.250 | 4.0 | 0.317 | 0.067 |
| Brett Lawrie | 20.78% | 0.176 | 0.8 | 0.239 | 0.063 |
| Elvis Andrus | 32.26% | 0.253 | 6.2 | 0.316 | 0.063 |
| Albert Pujols | 31.58% | 0.237 | 3.7 | 0.298 | 0.061 |
| Carlos Gonzalez | 35.00% | 0.266 | 6.4 | 0.327 | 0.061 |

Based on how often these guys have hit the ball hard and/or how fast they've run, we'd fully expect to see more of their balls in play go for hits. Asdrubal Cabrera and Carlos Gonzalez stand out here, as they've hit the ball exceedingly hard in 2014 and run well, yet both have sub-.270 BABIPs so far. If I'm trying to identify guys who are about to break out, this is where I'd look.
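For anyone who wants to reproduce the "Differential" column, the arithmetic is just xBABIP minus BABIP, sorted so the biggest underperformers rise to the top. Here is a small sketch using a few rows from Zimmerman's data as listed in this article; the variable and function names are my own, not part of his spreadsheet.

```python
# (name, BABIP, xBABIP) for a few hitters, copied from the tables in this article
HITTERS = [
    ("Asdrubal Cabrera", 0.263, 0.345),
    ("Emilio Bonifacio", 0.415, 0.275),
    ("Pedro Alvarez", 0.161, 0.293),
    ("Leonys Martin", 0.409, 0.301),
    ("Carlos Gonzalez", 0.266, 0.327),
]


def differential(babip, xbabip):
    """xBABIP minus BABIP: positive means the hitter is underperforming."""
    return round(xbabip - babip, 3)


# Largest positive differential first (the "tough luck" end of the list)
ranked = sorted(
    ((name, differential(babip, xbabip)) for name, babip, xbabip in HITTERS),
    key=lambda row: row[1],
    reverse=True,
)

for name, diff in ranked:
    print(f"{name:<18} {diff:+.3f}")
```

Reading the same list from the bottom up gives the hitters whose BABIP is running furthest ahead of what their contact and speed would suggest.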

Conversely, let's see who's overachieved so far this season based on these factors.

| Name | Hard Hit % | BABIP | Speed | xBABIP | Differential |
| --- | --- | --- | --- | --- | --- |
| Emilio Bonifacio | 21.92% | 0.415 | 5.7 | 0.275 | -0.140 |
| Leonys Martin | 25.76% | 0.409 | 7.6 | 0.301 | -0.108 |
| Mike Napoli | 32.31% | 0.387 | 1.3 | 0.285 | -0.102 |
| Rajai Davis | 25.42% | 0.393 | 6.2 | 0.291 | -0.102 |
| Dayan Viciedo | 34.29% | 0.405 | 3.1 | 0.304 | -0.101 |
| Casey McGehee | 27.03% | 0.366 | 2.2 | 0.271 | -0.095 |
| Jarrod Saltalamacchia | 35.56% | 0.391 | 1.4 | 0.298 | -0.093 |
| Yangervis Solarte | 25.40% | 0.349 | 0.8 | 0.256 | -0.093 |
| Everth Cabrera | 26.19% | 0.379 | 5.1 | 0.287 | -0.092 |
| Jason Kubel | 39.62% | 0.415 | 3.4 | 0.325 | -0.090 |
| Marcell Ozuna | 27.03% | 0.355 | 1.4 | 0.266 | -0.089 |
| Christian Yelich | 25.35% | 0.385 | 7.8 | 0.301 | -0.084 |
| Yadier Molina | 35.29% | 0.378 | 0.9 | 0.294 | -0.084 |
| Brett Gardner | 23.81% | 0.365 | 6.0 | 0.284 | -0.081 |
| Eric Hosmer | 25.61% | 0.344 | 1.8 | 0.263 | -0.081 |
| Chris Colabello | 34.85% | 0.379 | 2.1 | 0.299 | -0.080 |
| Derek Jeter | 27.87% | 0.344 | 0.5 | 0.264 | -0.080 |
| Bryce Harper | 31.15% | 0.377 | 4.5 | 0.301 | -0.076 |
| Matt Joyce | 31.58% | 0.364 | 2.3 | 0.288 | -0.076 |
| Xander Bogaerts | 31.82% | 0.364 | 2.0 | 0.288 | -0.076 |

No surprise here: winter castaway Emilio Bonifacio leads the charge. It's not that he's been bad by any means; it's just that he hasn't been this good. Luck has aided the early-season results of the players on this list, as their BABIPs aren't supported by their hard-hit rates and/or speed scores, so we'd fully expect them to dip sharply.

While we can see some over- and underachievers in this young season, the larger takeaway is that BABIP isn't set at .300 for each individual player. There are underlying attributes that drive it, only one of which is luck. As Zimmerman has shown us, hitting the ball hard and running fast certainly help. xBABIP helps us adjust the rate at which we should expect balls in play to become hits on a player-by-player basis, which is a much smarter approach than applying one basic expectation to the entire league. Hopefully this advancement will become part of the larger baseball lexicon. If it does, expect analysis to become even more refined for specific players and player types.

. . .

Jeff Wiser is an editor and featured writer at Beyond the Box Score and co-author of Inside the 'Zona, an analytical look at the Arizona Diamondbacks. You can follow him on Twitter @OutfieldGrass24.