Batting Average on Balls in Play

Every few months, I find that I develop a fondness for a particular statistic. I'll constantly reference it to try to prove some point I want to make. I'll throw everything into a spreadsheet and spend (too much) time bending it, playing with it, taking it apart to see how it works.

Lately, that stat has been batting-average-on-balls-in-play, aka BABIP, aka BIPA.
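
For anyone who hasn't met the stat before, here is the commonly cited definition. The post doesn't spell out which variant went into the spreadsheets, so treat this as the usual form rather than the exact one used here; some sources leave sacrifice flies out of the denominator.

```latex
\mathrm{BABIP} = \frac{H - HR}{AB - K - HR + SF}
```

Here H is hits, HR is home runs, AB is at-bats, K is strikeouts, and SF is sacrifice flies. For a pitcher, the same quantities are counted against him.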

I'm far from the first person to become wrapped up in the BABIP mystery. Ever since Voros McCracken opened up this Pandora's box, hundreds of very talented writers and deeply analytical thinkers have approached the topic. I'm neither, but here's my take on some aspects of BABIP. I'm going to present some ideas, and ask all you fine readers to weigh in with your hypotheses.

- - - - - - -

While I normally spend most of my time thinking about baseball at the major league level, there is a month-long period in which the collegiate game comes to the forefront. I like to call it "February." Spring Training has yet to begin, but there is baseball being played around the country. Well, if you consider "around the country" to be just the areas south of the 37th parallel (and Hawaii).

As far as I know, there's been precious little research done on BABIP at the collegiate level. In fact, there seems to be little regard for stats at that level in general. There is some data being collected and stored at NCAA-Baseball.com, but it's not really conducive to research. I contacted some of my fellow SABR members in the Collegiate Baseball committee about a college baseball database, and was told "To my knowledge no one has ever tried to create one for historical data."

Well, then I'd have to do this the hard way.

After failing in my attempts to write a script to spider data from conference web sites, I fell back on my old, tried-and-true method: brute-force copy-and-paste. A few (too many) hours later, I had team data for 2006 formatted and ready to run in a spreadsheet.

I found that, surprisingly enough, the average BABIP for all of college baseball was .341, a bit higher than I expected. Research by Clay Davenport shows that BABIP rises as the level of play drops through the minors, but .341 is still pretty high. My first instinct was that those damn metal bats were to blame.
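
One thing worth pinning down about a league-wide figure like .341 is how it was averaged. Here is a minimal sketch in Python, using made-up team totals (not the 2006 data) and assuming the common BABIP formula. It shows that pooling every team's balls in play can give a slightly different number than taking the simple mean of the team BABIPs:

```python
# Hypothetical team totals: (hits, home runs, at-bats, strikeouts, sac flies).
# These numbers are invented for illustration; they are NOT the 2006 college data.
teams = {
    "Team A": (620, 45, 2000, 380, 25),
    "Team B": (540, 30, 1900, 420, 20),
    "Team C": (300, 10, 1000, 250, 10),
}


def babip(h, hr, ab, k, sf):
    """Common BABIP formula (assumed): (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


# Method 1: pool the raw totals across teams, then compute one BABIP.
# Teams with more balls in play carry more weight.
totals = [sum(column) for column in zip(*teams.values())]
pooled = babip(*totals)

# Method 2: compute each team's BABIP, then take the unweighted mean.
# Every team counts equally, regardless of how many balls were put in play.
mean_of_teams = sum(babip(*row) for row in teams.values()) / len(teams)

print(f"pooled BABIP:        {pooled:.3f}")
print(f"mean of team BABIPs: {mean_of_teams:.3f}")
```

With real data the gap between the two is usually small, but it's worth knowing which one a quoted league average refers to.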

Of course, after looking into it, I found that the NCAA began regulating metal bats in 2002, holding them to exit velocity parameters that make them perform similarly to wood bats. So much for my theory that balls were getting to the infielders more quickly and thereby reducing the players' effective range. In my experience, NCAA fielding isn't poor enough to account for such a large BABIP. Call it a gut feeling, but there's got to be something else at play here.

Any suggestions? Hey, I didn't claim to have all the answers...

- - - - - - -

Now that I've lost half of the readers with an inconclusive spiel about college baseball, I should probably try to bring them back. Everyone likes pictures, so here's one:

What you're looking at is a chart of the BABIP of all current pitchers who have thrown at least 700 innings since 2000. Why 700? Well, it seemed like a nice cut-off point. It's arbitrary.

The first thing I noticed about this was that I could have just as easily put this information in a list. But then I wouldn't have had a nice chart to show you all, would I?

The second thing I noticed was that Barry Zito and Glendon Rusch apparently don't like to play with others. While most everyone is within shouting distance of the league average line of regression that splits the group, Zito and Rusch are off on their own islands, plotting revenge against pseudo-statheads like me that single them out.

It's often been hypothesized that Zito's curve functions almost like a knuckleball, effectively and predictably limiting BABIP from year to year. But what the heck is going on with Glendon Rusch? He got a lot of press for giving up not one but two homers to Bronson Arroyo last year, but apparently he's been hittable for a long time.

Rusch's singles/9 and line drive % have both been above average for the last few years, and the Chicago Cubs defense hasn't exactly been the best, either. But is that enough to explain the gap between Rusch's BABIP and the rest of the league?

- - - - - - -

Speaking of BABIP outliers, let's take a look at how 2006 came together. I'll start by presenting the top 10 and bottom 10 pitchers by BABIP, among those with at least 100 innings pitched last season. First, the top 10:


 #  NAME              BABIP      IP   AVG   SLG   ISO    DP%    FB%    GB%    LD%   PU%  1B/9IP  2B/9IP
 1  Ryan Madson        .364   134.3  .321  .516  .195   12.6   26.2   44.7   22.1   6.9    7.57    2.61
 2  Victor Santos      .362   115.3  .321  .510  .188   13.4   30.1   45.1   19.0   5.8    7.88    2.03
 3  Jason Johnson      .357     115  .335  .501  .166   21.4   17.6   61.1   16.4   4.9    8.77    2.43
 4  Odalis Perez       .352   126.3  .320  .519  .199   10.0   27.7   46.5   19.9   5.9    7.55    2.78
 5  Byung-Hyun Kim     .350     155  .295  .463  .168   15.3   30.8   43.4   21.6   4.3    6.79    2.32
 6  Kyle Lohse         .345   126.7  .298  .445  .147    8.7   27.9   43.9   20.8   7.4    7.53    2.06
 7  Oliver Perez       .343   112.7  .293  .477  .184   11.1   35.1   32.0   21.0  11.9    7.11    1.52
 8  Ben Sheets         .342     106  .259  .421  .163    5.9   31.2   42.5   18.3   8.0    5.09    2.80
 9  Joe Blanton        .341   194.3  .309  .448  .140   15.0   28.9   44.5   18.5   8.2    7.87    2.32
10  Brian Moehler      .340     122  .325  .532  .206   13.1   26.8   46.8   20.7   5.7    7.75    2.43
-------------------------------------------------------------------------------------------------------
    Average            .350  130.76  .308  .483  .176  12.65  28.23  45.05  19.83   6.9    7.39    2.33

- - -

And the bottom 10:


 #  NAME              BABIP      IP   AVG   SLG   ISO    DP%    FB%    GB%    LD%   PU%  1B/9IP  2B/9IP
 1  Chris Young        .232   179.3  .206  .382  .176   13.0   39.1   27.8   17.5  15.5    4.07    1.00
 2  Jered Weaver       .239     123  .209  .360  .151   10.5   39.3   31.8   15.8  13.2    4.24    1.39
 3  Anibal Sanchez     .243   114.3  .217  .335  .118   14.1   28.2   43.5   16.4  11.8    4.88    1.26
 4  Scott Elarton      .250   114.7  .267  .510  .244   14.8   38.8   30.7   19.5  10.9    4.94    2.12
 5  Chuck James        .250     119  .232  .428  .195    7.1   40.7   29.5   18.9  10.9    4.39    1.59
 6  Michael O'Connor   .254     105  .244  .427  .183   11.1   31.7   39.2   16.6  12.5    5.06    1.46
 7  Taylor Buchholz    .258     113  .248  .466  .218   15.9   29.3   45.5   18.1   7.1    4.54    2.15
 8  Carlos Zambrano    .259     214  .208  .351  .143   11.4   27.8   49.2   15.6   7.4    4.04    1.72
 9  Kenny Rogers       .265     204  .253  .401  .148   19.7   24.6   50.1   18.6   6.8    5.69    1.81
10  Josh Beckett       .265   204.7  .245  .450  .205   16.5   31.4   46.4   15.3   6.9    4.70    1.93
-------------------------------------------------------------------------------------------------------
    Average            .252   149.1  .233  .411  .178  13.41  33.09  39.37  17.23  10.3    4.66    1.64

- - -

I've included what I deem to be some telling rate stats. ISO tells us that the pitchers with high BABIP aren't getting hit any harder than the low BABIP hurlers, just more often. Pitchers with high BABIP rates are getting fewer fly balls and pop-ups, and more line drives and ground balls. Fly balls and pop-ups are much easier for a defense to field, so this -- in my opinion -- further supports the notion that fielding is 90% (or more) responsible for BABIP.
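
To make the "not harder, just more often" point concrete, here's a quick sketch that subtracts the bottom-10 "Average" row from the top-10 "Average" row. The numbers are copied straight from the two tables; the variable names are just mine.

```python
# "Average" rows copied from the top-10 and bottom-10 tables in this post.
columns = ["BABIP", "AVG", "SLG", "ISO", "FB%", "GB%", "LD%", "PU%", "1B/9IP", "2B/9IP"]
high_babip = [.350, .308, .483, .176, 28.23, 45.05, 19.83, 6.9, 7.39, 2.33]
low_babip = [.252, .233, .411, .178, 33.09, 39.37, 17.23, 10.3, 4.66, 1.64]

# Gap = high-BABIP group average minus low-BABIP group average.
gaps = {
    name: round(high - low, 3)
    for name, high, low in zip(columns, high_babip, low_babip)
}

for name, gap in gaps.items():
    print(f"{name:>7}: {gap:+.3f}")
```

The ISO gap is essentially zero while the AVG, PU%, and 1B/9IP gaps are large, which is what the paragraph above is describing. Keep in mind these are averages over only ten pitchers per group.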

It seems to me that the tale of the tape is this: it's all about the singles. There's a big difference in pop-up % (PU%) and singles per nine innings (1B/9IP), which I think are closely related. I have a feeling that there are a lot of flares and bleeders that could have been pop-ups, but the infielders just couldn't get to them.
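
One rough way to check the hunch that PU% and 1B/9IP move together is a plain Pearson correlation across the 20 pitchers listed in the two tables. This is only a sketch: the sample is tiny, and it was picked from the two extremes of BABIP, so it will tend to exaggerate any relationship compared with the full league.

```python
from math import sqrt

# (PU%, 1B/9IP) pairs copied from the tables: top 10 by BABIP, then bottom 10.
pitchers = [
    (6.9, 7.57), (5.8, 7.88), (4.9, 8.77), (5.9, 7.55), (4.3, 6.79),
    (7.4, 7.53), (11.9, 7.11), (8.0, 5.09), (8.2, 7.87), (5.7, 7.75),
    (15.5, 4.07), (13.2, 4.24), (11.8, 4.88), (10.9, 4.94), (10.9, 4.39),
    (12.5, 5.06), (7.1, 4.54), (7.4, 4.04), (6.8, 5.69), (6.9, 4.70),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


popup_pct = [p for p, _ in pitchers]
singles_per_9 = [s for _, s in pitchers]

r = pearson(popup_pct, singles_per_9)
print(f"correlation between PU% and 1B/9IP: {r:.2f}")
```

A negative value means that, in this group, more pop-ups go with fewer singles. It says nothing about whether the missing pop-ups are actually turning into flares and bleeders.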

Of course, there's some doubt when you consider that Jered Weaver had a very low BABIP with the poor Angels defense behind him, and Jason Johnson had a high BABIP with the well-regarded fielders of Boston and Cleveland behind him. Am I missing something here?