It was just another weekday morning. I was on my rise and grind with coffee in hand while shaking off the cobwebs, when this gem was dropped directly in front of me:
"Merry May Day! April xBABIP values using Hard Hit% and Speed Score: https://t.co/dCEH9zAGF1" - Jeff Zimmerman (@jeffwzimmerman), May 1, 2014
Merry May Day indeed. I clicked Jeff Zimmerman's link and found myself staring at several hundred xBABIP numbers. That was cool and all, but being only somewhat familiar with xBABIP, I decided to dig into the stat and its principles more deeply than before. While Zimmerman didn't divulge his exact methodology, presumably because he's working with non-public data, he did confirm that he used batted ball outcomes (hard hit percentage) and batter speed (speed scores) to arrive at his findings.
Before we get to those findings, however, I think it's important to have a brief conversation about BABIP in general. BABIP, or batting average on balls in play, refers to the rate at which batted balls become hits (excluding home runs). Baseball Prospectus calculates this as:
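For reference, the commonly cited form of the calculation is below. I'm assuming it matches the Baseball Prospectus definition mentioned above; sources can differ slightly in how they handle sacrifice flies and bunts.

$$\text{BABIP} = \frac{H - HR}{AB - K - HR + SF}$$

Here H is hits, HR is home runs, AB is at-bats, K is strikeouts, and SF is sacrifice flies.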
But perhaps some of us have been far too literal when reading BABIP disclaimers in the past. We passively accept a .300 BABIP as the benchmark and expect results to regress toward it. More importantly, the casual passerby will often treat BABIP as something a batter has little or no control over, but that's not the case. Yes, batting average on balls in play is a volatile stat that's difficult to project, and it doesn't stabilize for several seasons (2.5 seasons for hitters, 8 seasons for pitchers), but that doesn't mean we have to helplessly accept a .300 BABIP as the standard for all hitters. We often equate BABIP with luck, but there's a saying around sports, and life in general, that one often makes his or her own luck. BABIP falls into this category. A gross adjustment to a batter's approach (trading grounders for flies, or vice versa) or a major mechanical change (to his stance or swing) can move a player's BABIP, as can the defense he's hitting into (such as the shift) and other factors. Luck is part of the equation, but it's just one part.
As we've learned, each hitter has an individual BABIP, but because it can fluctuate so wildly from season to season, it can be hard to pin down until we have several seasons of data to work with. Miguel Cabrera is arguably the greatest hitter in the game and has a .346 BABIP over 10 qualified seasons. We'd hardly say he's been lucky, as he makes consistently hard contact. Kurt Suzuki, on the other hand, does not make a lot of hard contact and has a .269 career BABIP. If you've seen him play, it doesn't necessarily seem like he's been unlucky; he's just not very good at hitting baseballs (at least not when compared to Miguel Cabrera, which might be an unfair comparison). You get the idea: the ability of the batter, especially when it comes to making consistently hard contact, has something to do with his BABIP.
Enter Billy Hamilton. If you've seen the young speedster play this season, you know he's not making a lot of hard contact. He's not lacing lasers into the gaps for Cincinnati, but he makes up for the lack of hard contact by having the greatest foot speed in the game. His .322 BABIP over his first 40 major league games is largely a reflection of his ability to beat out infield hits. Ichiro was famous for this in Seattle, as others have been over the years. Just as hitting the ball hard influences a player's BABIP, so does his speed.
Here's where Zimmerman's data comes back into play. He calculated this xBABIP data by using hard hit percentages and speed scores, taking quality contact and batter speed into account to calculate what a player's BABIP should be based upon these factors. If a mad scientist created a baseball monster by blending Miguel Cabrera's bat with Billy Hamilton's speed, we could calculate what that monster's BABIP would be. We don't have to rely on hypothetical monsters, however. We have 194 qualified hitters we can look at instead. Please note, this is Zimmerman's original data that he shared. I simply filtered it for qualified hitters and added the "Differential" category.
Rather than discuss 194 qualified hitters and their current BABIP and xBABIP numbers, I'd rather focus on that last column, titled "Differential." Here I simply found the difference between the players' current BABIP and calculated xBABIP (xBABIP - BABIP). Let's identify the top players who are underperforming their xBABIP at the moment based on how often they've hit the ball hard and/or how well they run. In other words, based on Zimmerman's work, these are the tough luck guys so far in 2014.
| Name | Hard Hit % | BABIP | Speed | xBABIP | Differential |
|---|---|---|---|---|---|
| Alejandro De Aza | 33.33% | 0.191 | 5.6 | 0.316 | 0.125 |
Based on how often these guys have hit the ball hard and/or how fast they've run, we'd fully expect to see more of their balls in play go for hits. Asdrubal Cabrera and Carlos Gonzalez stand out here, as they've hit the ball exceedingly hard in 2014 and run well, yet both have sub-.270 BABIPs so far. If I'm trying to identify guys who are about to break out, this is somewhere to look.
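The Differential column is simple arithmetic, and a minimal Python sketch makes the sign convention explicit. The function name `babip_differential` is my own for illustration and is not part of Zimmerman's work; the only real numbers used are Alejandro De Aza's from the table above.

```python
def babip_differential(babip: float, xbabip: float) -> float:
    """Return xBABIP minus BABIP, rounded to three decimals.

    Positive: fewer hits so far than contact quality and speed suggest.
    Negative: more hits so far than those inputs suggest.
    """
    return round(xbabip - babip, 3)


# Alejandro De Aza's line from the table: BABIP .191, xBABIP .316.
de_aza = babip_differential(babip=0.191, xbabip=0.316)
print(f"De Aza differential: {de_aza:+.3f}")
```

A positive result flags a hitter whose results trail his contact and speed, while a negative result flags the opposite.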
Conversely, let's see who's overachieved so far this season based on these factors.
| Name | Hard Hit % | BABIP | Speed | xBABIP | Differential |
|---|---|---|---|---|---|
No surprise here: winter castaway Emilio Bonifacio leads the charge. It's not that he's been bad by any means; it's just that he hasn't been this good. Luck has aided the early season results of the players on this list, as their BABIPs aren't supported by their hard hit rates and/or speed scores, so we'd fully expect them to dip sharply.
While we can see some over- and underachievers in this young season, the larger takeaway is that BABIP isn't set at .300 for each individual player. There are underlying attributes that drive it, only one of which is luck. As Zimmerman has shown us, hitting the ball hard and running fast certainly help. xBABIP lets us adjust the rate at which we should expect balls in play to become hits on a player-by-player basis, which is a much smarter approach than applying one basic expectation to the entire league. Hopefully this advancement will become part of the larger baseball lexicon. If it does, expect analysis to become even more refined for specific players and player types.