BABIP, HR/FB, and Batted Ball Type by Pitch Location
DIPS theory--the idea that a pitcher has little control over the outcome of balls in play--is possibly sabermetrics' most controversial idea. Many fans maintain that a pitcher, by consistently locating in the right spots, can induce weak contact and thus lower his batting average on balls in play. I took all balls in play (including home runs) from 2008 and, using the Gameday XML data, assigned each one to one of 13 bins by pitch location (I mirrored the horizontal coordinate for LHBs, so that inside and outside mean the same thing for every batter). Bins 1-9 are all inside the strike zone, while Bins 10-13 are balls.
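The binning step can be sketched roughly as follows. This is only an illustration, not the exact code behind the numbers in this post: the function name assign_bin is made up, the strike zone is taken to be the 17-inch plate between the sz_bot and sz_top values (ignoring the radius of the ball), px is assumed to be negative on a right-handed batter's side of the plate, and the numbering of Bins 11-13 (outside, high, low) is a guess. Only Bin 1 (up and in) and Bin 10 (balls inside) are fixed by the text.

```python
def assign_bin(px, pz, sz_top, sz_bot, stand, half_width=17 / 24):
    """Assign a pitch to one of 13 location bins (hypothetical layout).

    px, pz         -- horizontal and vertical location in feet
    sz_top, sz_bot -- top and bottom of the batter's strike zone in feet
    stand          -- "R" or "L", the side the batter hits from
    half_width     -- half the plate width in feet (17 inches / 2)
    """
    # Mirror left-handed batters so that negative x always means "inside".
    # Assumption: px is negative on a right-handed batter's side of the plate.
    x = px if stand == "R" else -px

    # Balls: 10 = inside (from the text); 11, 12, 13 are assumed labels.
    if x < -half_width:
        return 10
    if x > half_width:
        return 11
    if pz > sz_top:
        return 12
    if pz < sz_bot:
        return 13

    # Strikes: a 3x3 grid, numbered 1-9 from up-and-in to down-and-away.
    col = min(int((x + half_width) / (2 * half_width / 3)), 2)
    row = min(int((sz_top - pz) / ((sz_top - sz_bot) / 3)), 2)
    return row * 3 + col + 1


# A pitch up and in to a righty and its mirror image to a lefty share a bin.
print(assign_bin(-0.5, 3.3, 3.5, 1.5, "R"), assign_bin(0.5, 3.3, 3.5, 1.5, "L"))
```

Mirroring before binning means the two batter populations can be pooled, which roughly doubles the sample in each bin.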
For each bin, I calculated BACON (batting average on contact, including HRs), SLGCON (slugging on contact), BABIP (batting average on balls in play, excluding HRs), SLGBIP (slugging on balls in play), GB%, LD%, FB%, IF/FB%, HR/OFB, and batting averages for each of the batted ball types. A graph of BACON and SLGCON follows below:
This confirms Dave Allen's results, although he used a continuous approach rather than bins. The best zone for hitters is located along a diagonal line extending from the lower-inside corner to the high-outside corner. Along this diagonal line, hitters are best able to get the barrel of the bat on the ball, while pitches up and in and down and away are too far from the barrel to be hit solidly. As we would expect, hitters get much worse results on pitches outside of the strike zone.
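For readers who want to reproduce the rate statistics for a bin, here is a minimal sketch. The record layout (a "bases" count from 0 to 4 and a "type" of GB, LD, FB, or PU for infield flies) and the function names are assumptions for illustration only. I also assume here that FB% counts infield and outfield flies together and that HR/OFB counts only home runs hit on outfield flies; the exact definitions behind the published numbers may differ.

```python
from collections import Counter


def ratio(num, den):
    """Divide, returning NaN instead of raising when the bin is empty."""
    return num / den if den else float("nan")


def contact_stats(balls):
    """Rate stats for one bin of batted balls (hypothetical record format)."""
    n = len(balls)
    hits = sum(1 for b in balls if b["bases"] > 0)
    total_bases = sum(b["bases"] for b in balls)
    hr = sum(1 for b in balls if b["bases"] == 4)
    in_play = n - hr  # balls in play exclude home runs

    types = Counter(b["type"] for b in balls)
    all_flies = types["FB"] + types["PU"]  # outfield plus infield flies
    hr_on_ofb = sum(1 for b in balls if b["type"] == "FB" and b["bases"] == 4)

    return {
        "BACON": ratio(hits, n),
        "SLGCON": ratio(total_bases, n),
        "BABIP": ratio(hits - hr, in_play),
        "SLGBIP": ratio(total_bases - 4 * hr, in_play),
        "GB%": ratio(types["GB"], n),
        "LD%": ratio(types["LD"], n),
        "FB%": ratio(all_flies, n),
        "IF/FB": ratio(types["PU"], all_flies),
        "HR/OFB": ratio(hr_on_ofb, types["FB"]),
    }


# Ten made-up batted balls, purely to exercise the function.
sample = (
    [{"type": "GB", "bases": b} for b in (0, 0, 1)]
    + [{"type": "LD", "bases": b} for b in (1, 2, 0)]
    + [{"type": "FB", "bases": b} for b in (4, 0, 0)]
    + [{"type": "PU", "bases": 0}]
)
stats = contact_stats(sample)
print(stats["BACON"], stats["SLGCON"])
```

Running contact_stats once per bin (and once per pitch type within a bin) gives the full table.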
By examining batted ball types, we can get a better idea of how pitches in various bins are hit. GB% follows a very predictable pattern:
Here we see that lower pitches result in dramatically more groundballs than higher pitches, and outside pitches result in somewhat more grounders than inside pitches. Interestingly, very inside pitches result in markedly more ground balls than one would expect, perhaps due to batters' inability to drive these pitches.
FB% is merely the reverse of GB rate, while LD rate is essentially random, except for inside pitches. LD% varies from 19.2% to 20.5% for pitches in Zones 2-9; however, in Zone 1 (up and in), it is 17.5%. In addition, Zones 10-13 all exhibit well below average line drive rates, ranging from 16.1% to 17.8%. This shows, once again, that batters have difficulty making solid contact up and in and outside of the zone. Zone 1 also exhibits a significantly lower Batting Average on Line Drives, at .710--Zones 2-8 range from .730 to .756, while Zone 9 is at .716. Furthermore, Zone 10 (balls inside) has the highest IF/FB ratio (43.3%), while Zone 1 is second at 36.1%. Once again, this confirms that batters have a hard time driving the high inside pitch. Pitchers who throw high and inside should expect a lower BABIP, more infield flies, and fewer line drives.
Interestingly, HR/OFB varies dramatically by location:
Surprisingly, pitches low and inside result in the highest HR/FB rate (though pitches right down the middle are second). Outside pitches, as would be expected, have a lower HR/FB rate. But the most surprising result is the degree of correlation between pitch location and HR/FB rate. These results seem to indicate that by pitching away, a pitcher can noticeably reduce his HR/FB rate--yet pitchers' HR/FB rates show a strong tendency to revert to the league average of 11%. The only way to resolve this discrepancy is by constructing a model to estimate HR/FB and comparing that model to actual HR/FB.
Breakdown by Pitch Type
Due to the limitations of the Gameday pitch classification algorithm and small sample sizes, I would be wary of drawing too many conclusions from the individual pitch data.
Fastballs (four seam)
The results for fastballs were almost identical to the results for all pitches.
Change-ups
The high inside changeup was slightly less effective than the high inside fastball (.298 BACON, .536 SLGCON), though still far more effective than middle-inside or low-inside changeups. However, on high inside changeups, pitchers still induced tons of infield flies (37.7%), fewer line drives than average (18.0%), and a significantly lower batting average on those line drives (.660).
Changeups low and in were crushed for a 27.8% HR/FB rate, compared to 15.4% for fastballs, while changeups middle-in had a 21.3% HR/FB (12.3% for fastballs). This confirms conventional wisdom that changeups are much more effective on the outer part of the plate.
Curves
BACON and SLGCON for curves are largely similar to those for all pitches; however, the SLGCON on curves low and inside (.690) is much higher than the SLGCON for curves right down the middle (.625). This seems to confirm that slow pitches are a bad idea inside.
The HR/FB data is more interesting. Curveballs up and in have the highest HR/FB of any curveball, at 21.8%. This might be a fluke of small sample size (133 fly balls), particularly in light of the contradictory result obtained by high-and-tight sliders (see below).
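One rough way to gauge how much noise a sample of 133 fly balls carries is a binomial confidence interval around the observed rate. The sketch below uses the Wilson score interval, which is my choice of method rather than anything used for this post, and the count of 29 home runs is inferred from 21.8% of 133 rather than taken from the data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 29 HR on 133 fly balls: the 29 is inferred from 21.8%, not read from the data.
low, high = wilson_interval(29, 133)
print(f"{low:.3f} to {high:.3f}")
```

Keep in mind that an interval like this treats the bin in isolation. With 13 bins and five pitch types examined at once, an extreme value somewhere is much more likely to turn up by chance than a single interval suggests.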
Sliders
The HR/FB trend observed in curveballs does not hold for sliders (10.9% HR/FB on high inside sliders). Gameday's pitch classification algorithm often has trouble distinguishing curves and sliders; thus I suspect that the high HR/FB on up and in curveballs is nothing more than a statistical fluke.
Sinkers (Two seam fastballs)
Sinkers have the least data out of all the pitch types--I suspect that Gameday classified a lot of sinkers as fastballs. Nevertheless, pitchers induce significantly more ground balls on sinkers than on fastballs--56% for sinkers, compared to 43.6% for four-seam fastballs.
What to do next
With this data, we can construct a model to predict HR/FB by pitch location. In particular, I wonder if the large variation between HR/FB in different locations translates to large variations between individual pitchers.
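As a starting point, the simplest version of such a model is a weighted average: take the league HR/FB rate in each bin and weight it by where a given pitcher's fly balls were actually thrown. The sketch below assumes that approach. The function name is made up, and the rates and counts are placeholders, not values from the data.

```python
def expected_hr_per_fb(league_rates, pitcher_flies):
    """Location-based expected HR/FB for one pitcher.

    league_rates  -- {bin: league HR/FB rate in that bin}
    pitcher_flies -- {bin: number of this pitcher's fly balls in that bin}
    """
    total = sum(pitcher_flies.values())
    if total == 0:
        raise ValueError("pitcher has no fly balls")
    # Weight each bin's league rate by the pitcher's share of flies there.
    expected_hr = sum(league_rates[b] * n for b, n in pitcher_flies.items())
    return expected_hr / total


# Placeholder numbers for illustration only.
league_rates = {7: 0.15, 5: 0.12, 3: 0.08}
pitcher_flies = {7: 10, 5: 20, 3: 20}
print(round(expected_hr_per_fb(league_rates, pitcher_flies), 3))
```

Comparing this expected rate with each pitcher's actual HR/FB would show whether location explains any of the spread between pitchers, or whether actual rates cluster near the league average regardless.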
Data
The data is located here on Google Docs.
Comments
Very nice work
As MGL (and others) are fond of reminding us, pitch location does not equal pitch intent.
So while it’s definitely useful to look at these results, we need to be careful not to draw too many conclusions about how a pitcher should pitch based on them.
I agree--we need to track the catcher's target in order to determine the pitcher's "control"
But if we’re just worried about determining past performance, I think we should use the actual location. For instance, normalizing HR/FB based on pitch location (and possibly also pitch type) would produce a better luck-independent pitching metric than FIP or xFIP.
by Alex Krolewski on Jul 21, 2009 2:01 PM EDT
Definitely
For value this is the way to go.
Again, nice work.
by Dan Turkenkopf on Jul 21, 2009 7:51 PM EDT
You could expand beyond location
Velocity and movement likely have just as large an effect on HR/FB as location. If you could create, I dunno, 100 or so bins with every combination of location, velocity and movement, that would be amazing.
Derosa.
by vivaelpujols on Jul 21, 2009 9:32 PM EDT
Right--the problem is the sample size
Just dividing the data by pitch type results in sample size issues--with 100 bins, some would have only 2 or 3 batted balls even with a year's worth of data. Ideally, with 10 or so years we could create a multi-year average, since these numbers shouldn't change too much from year to year.
In the end, I think the best approach is to model HR/FB rate based on location, and then look at overperformers and underperformers to determine if the model has any biases.
by Alex Krolewski on Jul 22, 2009 12:18 AM EDT
I haven't checked out the data
But I would assume that there were more than 200-300 fly balls hit in the majors last year. In fact, according to Baseball Reference, there have been 14,460 plate appearances that have ended with a fly ball this year. Over a full season, that’s over 20,000 samples. If you had data for 3 seasons, 2007-2009, you would have about 70,000 samples.
I would then suggest forgoing the pitch classifications and creating 100 or so bins based on different combinations of movement, velocity and location. Again, you are the one who has done the initial work, but I fail to see how doing that would result in small sample size problems.
Derosa.
by vivaelpujols on Jul 22, 2009 1:44 AM EDT
It's small sample size per bin that's the concern
Some of your bins would be chock full of pitches – relatively straight 88 mph four-seamers up and in perhaps, while others will have very few pitches – 65 mph curve balls up and in. (I know you said ignore the classifications, but this was an easier way to illustrate)
This is just conjecture, but I wouldn’t be surprised if that’s how it turned out.
Without at least smoothing the data, you’re likely going to get some funky results. And I’m not enough of a statistician to know whether smoothing is enough here.
by Dan Turkenkopf on Jul 22, 2009 8:00 AM EDT
"Hundreds of bins" might work
With 2.5 years of data, divided by pitch type, each bin would probably be large enough to alleviate small sample size concerns. In fact, we could probably divide the strike zone into 36 bins (rather than 9). I have already looked at a 44-bin approach (with 8 bins outside of the zone instead of 4) and it looks like the data stabilizes around 600-700 AB. So if I used 2.5 years of data instead of 1, and I combined the smallest zones together, then I could probably model HR/FB by location and pitch type.
by Alex Krolewski on Jul 23, 2009 12:34 AM EDT
One should be careful about using the word "would."
“Could” or “should” are probably more appropriate here, at least until you actually build and then test the metric.
Hmm... higher HR/FB rates being inside makes sense to me.
Hitters keep the bat closer to their body to try to get the sweet spot of the bat on the ball, and end up increasing bat speed.
@bs_uf15bosox9be:OverTheMonster-ALLERGEN WARNING:May contain PB.
