DIPS theory--the idea that a pitcher has little control over the outcome of balls in play--is possibly sabermetrics' most controversial claim. Many fans maintain that a pitcher, by consistently locating in the right spots, can induce weak contact and thus lower his batting average on balls in play. To test this, I took all balls in play (including home runs) from 2008 and, using the Gameday XML data, assigned each to one of 13 location bins (I reversed the coordinate system for LHBs). Bins 1-9 are all inside the strike zone, while Bins 10-13 are balls.
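The binning step can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used here: the function name `assign_bin` is hypothetical, the inputs are assumed to be Gameday-style `px`, `pz`, `sz_top`, `sz_bot` (in feet) and `stand`, and the numbering of the bins is an assumption chosen only to agree with the text (Bin 1 up and in, Bin 10 balls inside).

```python
# Assumed layout (not confirmed by the article): bins 1-9 form a 3x3 grid,
# numbered inside-to-outside and high-to-low, so bin 1 is up and in.
# Bin 10 is balls inside; 11 (high), 12 (low), 13 (outside) are guesses.
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # half of the 17-inch plate, in feet


def assign_bin(px, pz, sz_top, sz_bot, stand):
    """Return a location bin (1-13) for one pitch."""
    # Assumption: negative px is the inside edge for a right-handed batter.
    # Mirroring left-handed batters makes negative x mean "inside" for all.
    x = -px if stand == "L" else px

    if x < -PLATE_HALF_WIDTH_FT:
        return 10  # ball, inside
    if x > PLATE_HALF_WIDTH_FT:
        return 13  # ball, outside (assumed numbering)
    if pz > sz_top:
        return 11  # ball, high (assumed numbering)
    if pz < sz_bot:
        return 12  # ball, low (assumed numbering)

    # Inside the zone: split width and height into thirds.
    col_width = 2 * PLATE_HALF_WIDTH_FT / 3
    row_height = (sz_top - sz_bot) / 3
    col = min(2, int((x + PLATE_HALF_WIDTH_FT) / col_width))  # 0 = inside
    row = min(2, int((sz_top - pz) / row_height))             # 0 = high
    return row * 3 + col + 1
```

For example, `assign_bin(-0.6, 3.3, 3.5, 1.5, "R")` and `assign_bin(0.6, 3.3, 3.5, 1.5, "L")` both land in Bin 1, since the second pitch is mirrored before binning.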
For each bin, I calculated BACON (batting average on contact, including HRs), SLGCON (slugging on contact), BABIP (batting average on balls in play, excluding HRs), SLGBIP (slugging on balls in play), GB%, LD%, FB%, IF/FB%, HR/OFB, and batting averages for each of the batted ball types. A graph of BACON and SLGCON follows below:
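To make the four contact metrics concrete, here is a rough sketch of how they could be computed for one bin. The outcome labels and the function name `contact_metrics` are hypothetical, and the formulas are my reading of the definitions above: the "CON" stats count home runs in both numerator and denominator, while the "BIP" stats remove them from both.

```python
# Total bases credited for each batted-ball outcome (labels are assumed).
TOTAL_BASES = {"out": 0, "1B": 1, "2B": 2, "3B": 3, "HR": 4}


def contact_metrics(outcomes):
    """Compute BACON, SLGCON, BABIP and SLGBIP from a list of outcomes.

    Assumes the list is non-empty and contains at least one non-HR ball.
    """
    contact = len(outcomes)
    hits = sum(1 for o in outcomes if TOTAL_BASES[o] > 0)
    bases = sum(TOTAL_BASES[o] for o in outcomes)
    homers = outcomes.count("HR")
    in_play = contact - homers  # balls in play exclude home runs

    return {
        "BACON": hits / contact,
        "SLGCON": bases / contact,
        "BABIP": (hits - homers) / in_play,
        "SLGBIP": (bases - 4 * homers) / in_play,
    }


# Ten invented batted balls, purely for illustration.
sample = ["1B", "out", "out", "HR", "2B", "out", "out", "out", "out", "out"]
metrics = contact_metrics(sample)
```

With this invented sample, three hits on ten balls give a BACON of .300, while BABIP drops the home run from both sides and comes out to 2/9.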
This confirms Dave Allen's results, although he used a continuous approach rather than bins. The best zone for hitters is located along a diagonal line extending from the lower-inside corner to the high-outside corner. Along this diagonal line, hitters are best able to get the barrel of the bat on the ball, while pitches up and in and down and away are too far from the barrel to be hit solidly. As we would expect, hitters get much worse results on pitches outside of the strike zone.
By examining batted ball types, we can get a better idea of how pitches in various bins are hit. GB% follows a very predictable pattern:
Here we see that lower pitches result in dramatically more groundballs than higher pitches, and outside pitches result in somewhat more grounders than inside pitches. Interestingly, very inside pitches result in markedly more ground balls than one would expect, perhaps due to batters' inability to drive these pitches.
FB% is essentially the mirror image of GB%, while LD% is essentially random, except for inside pitches. LD% varies from 19.2% to 20.5% for pitches in Zones 2-9; however, in Zone 1 (up and in), it is 17.5%. In addition, Zones 10-13 all exhibit well below average line drive rates, ranging from 16.1% to 17.8%. This shows, once again, that batters have difficulty making solid contact up and in and outside of the zone. Zone 1 also exhibits a significantly lower Batting Average on Line Drives, at .710--Zones 2-8 range from .730 to .756, while Zone 9 is at .716. Furthermore, Zone 10 (balls inside) has the highest IF/FB ratio (43.3%) while Zone 1 is second at 36.1%. Once again, this confirms that batters are having a hard time driving the high inside pitch. Pitchers who throw high and inside should expect a lower BABIP, more infield flies and fewer line drives.
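The batted-ball rates for a bin could be tallied along these lines. This is only a sketch under stated assumptions: the type labels (`GB`, `LD`, `OFB`, `IFB`) and the function name `batted_ball_rates` are hypothetical, FB% is assumed to include infield flies, IF/FB is taken as infield flies over all fly balls, and HR/OFB as home runs over outfield flies.

```python
from collections import Counter


def batted_ball_rates(balls):
    """balls: list of (ball_type, is_home_run) tuples for one bin.

    ball_type is one of "GB", "LD", "OFB" (outfield fly), "IFB" (infield fly).
    """
    counts = Counter(ball_type for ball_type, _ in balls)
    total = len(balls)
    fly = counts["OFB"] + counts["IFB"]  # assumption: FB includes infield flies
    homers_on_ofb = sum(1 for t, is_hr in balls if t == "OFB" and is_hr)

    return {
        "GB%": counts["GB"] / total,
        "LD%": counts["LD"] / total,
        "FB%": fly / total,
        "IF/FB": counts["IFB"] / fly if fly else None,
        "HR/OFB": homers_on_ofb / counts["OFB"] if counts["OFB"] else None,
    }


# Ten invented batted balls, purely for illustration.
sample_balls = (
    [("GB", False)] * 4
    + [("LD", False)] * 2
    + [("OFB", True)]
    + [("OFB", False)] * 2
    + [("IFB", False)]
)
rates = batted_ball_rates(sample_balls)
```

Returning `None` when a bin has no fly balls avoids a division by zero in sparse bins, which matters once the data is split by pitch type.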
Interestingly, HR/OFB varies dramatically by location:
Surprisingly, pitches low and inside result in the highest HR/FB rate (though pitches right down the middle are second). Outside pitches, as would be expected, have a lower HR/FB rate. But the most surprising result is the degree of correlation between pitch location and HR/FB rate. These results seem to indicate that by pitching away, a pitcher can noticeably reduce his HR/FB rate--yet pitchers' HR/FB rates show a strong tendency to revert to the league average of 11%. The only way to resolve this discrepancy is to construct a model that estimates HR/FB from pitch location and compare its estimates to actual HR/FB.
Breakdown by Pitch Type
Due to the limitations of the Gameday pitch classification algorithm and small sample sizes, I would be wary of drawing too many conclusions from the individual pitch data.
Fastballs (four seam)
The results for fastballs were almost identical to the results for all pitches.
Change-ups
The high inside changeup was slightly less effective than the high inside fastball (.298 BACON, .536 SLGCON) though still far more effective than middle-inside or low inside. However, on high inside changeups, pitchers still induced tons of infield flies (37.7%), fewer line drives than average (18.0%), and a significantly lower batting average on those line drives (.660).
Changeups low and in were crushed for a 27.8% HR/FB rate, compared to 15.4% for fastballs, while changeups middle-in had a 21.3% HR/FB (12.3% for fastballs). This confirms conventional wisdom that changeups are much more effective on the outer part of the plate.
Curves
BACON and SLGCON for curves are largely similar to the overall pattern; however, the SLGCON on curves low and inside (.690) is much higher than the SLGCON for curves right down the middle (.625). This seems to confirm that slow pitches are a bad idea inside.
The HR/FB data is more interesting. Curveballs up and in have the highest HR/FB of any curveball, at 21.8%. This might be a fluke of small sample size (133 fly balls), particularly in light of the contradictory result obtained by high-and-tight sliders (see below).
Sliders
The HR/FB trend observed in curveballs does not hold for sliders (10.9% HR/FB on high inside sliders). Gameday's pitch classification algorithm often has trouble distinguishing curves and sliders; thus I suspect that the high HR/FB on up and in curveballs is nothing more than a statistical fluke.
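One rough way to gauge how much uncertainty 133 fly balls leave is a binomial confidence interval on the home run rate. The sketch below uses the Wilson score interval, which is my choice of method rather than anything computed in the original analysis. The count of 29 home runs is inferred from 21.8% of 133 and may be off by rounding.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width


# Assumption: 29 HR, inferred from 21.8% of the 133 up-and-in curveball flies.
low, high = wilson_interval(29, 133)
print(f"HR/FB interval: {low:.3f} to {high:.3f}")
```

The interval spans well over ten percentage points, which shows how loosely a sample this size pins down the rate. A single interval like this also does not account for having looked at many bins and pitch types at once, so it is only a rough gauge.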
Sinkers (Two seam fastballs)
Sinkers have the least data out of all the pitch types--I suspect that Gameday classified a lot of sinkers as fastballs. Nevertheless, pitchers induce significantly more ground balls on sinkers than on fastballs--56% for sinkers, compared to 43.6% for four-seam fastballs.
What to do next
With this data, we can construct a model to predict HR/FB by pitch location. In particular, I wonder if the large variation in HR/FB across locations translates into large differences between individual pitchers.
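The simplest version of such a model would weight the league HR/FB rate in each bin by how many of a pitcher's fly balls came from that bin. The sketch below is one possible starting point, not a finished model. The function name `expected_hr_per_fb` is hypothetical, and every number in the example is invented for illustration, not taken from the 2008 data.

```python
def expected_hr_per_fb(fly_balls_by_bin, league_rate_by_bin):
    """Location-weighted expected HR/FB for one pitcher.

    fly_balls_by_bin: {bin: number of the pitcher's fly balls from that bin}
    league_rate_by_bin: {bin: league-wide HR/FB rate for that bin}
    """
    total_fly_balls = sum(fly_balls_by_bin.values())
    expected_homers = sum(
        count * league_rate_by_bin[bin_id]
        for bin_id, count in fly_balls_by_bin.items()
    )
    return expected_homers / total_fly_balls


# Invented numbers: a pitcher who works mostly in a low-HR bin.
league_rates = {5: 0.14, 6: 0.08}
pitcher_fly_balls = {5: 50, 6: 150}
expected = expected_hr_per_fb(pitcher_fly_balls, league_rates)  # 0.095
```

Comparing this expected rate to each pitcher's actual HR/FB, across many pitchers, would show whether location explains much of the spread between them or whether actual rates cluster near the league average regardless.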
The data is located here on Google Docs.