Determining Batted Ball Rates Using Pitch Type and Location

It is well established that pitchers have some control over their ground ball and fly ball rates; some, like Roy Halladay, are known for their extreme ground ball tendencies. But what allows these pitchers to achieve a markedly different batted ball profile from the average pitcher? I decided to use PITCHf/x data to determine whether batted ball rates depend on pitch type (as classified by Gameday) and location.

First, I divided the strike zone into 9 zones of equal area and added 4 zones outside the strike zone, corresponding to inside, outside, high and low pitches. Then I determined the 2008 league average batted ball rates for each pitch type in each of the 13 zones. Using these averages, I calculated each pitcher's expected batted ball rates based on his pitch types and locations. Splitters and knuckleballs were so uncommon that dividing them among the 13 zones left too small a sample in each; therefore, I used a single league average for each of those two pitch types, regardless of location.
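The procedure above can be sketched roughly as follows. This is only an illustration, not the code used for this study: the field names (`px`, `pz`, `sz_bot`, `sz_top`, `stand`), the half-width of the zone, the Gameday codes `FS` and `KN` for splitters and knuckleballs, the handling of corner pitches (high and low take precedence), and the choice to weight each batted ball equally are all assumptions of mine.

```python
from collections import defaultdict

HALF_WIDTH = 0.83              # assumed half-width of the strike zone, in feet
NO_LOCATION = {"FS", "KN"}     # assumed codes for splitter and knuckleball


def zone_of(px, pz, sz_bot, sz_top, stand):
    """Return 0-8 for the nine in-zone cells, or 'high'/'low'/'in'/'out'."""
    if pz > sz_top:
        return "high"
    if pz < sz_bot:
        return "low"
    if abs(px) > HALF_WIDTH:
        # Assumption: px is measured from the catcher's view, so negative px
        # is inside to a right-handed batter.
        inside_sign = -1 if stand == "R" else 1
        return "in" if px * inside_sign > 0 else "out"
    col = min(int((px + HALF_WIDTH) / (2 * HALF_WIDTH) * 3), 2)
    row = min(int((pz - sz_bot) / (sz_top - sz_bot) * 3), 2)
    return row * 3 + col


def cell_key(pitch_type, zone):
    """Drop the location for pitch types too rare to split into 13 zones."""
    return (pitch_type, None) if pitch_type in NO_LOCATION else (pitch_type, zone)


def league_rates(batted_balls):
    """batted_balls: iterable of (pitch_type, zone, outcome), outcome in GB/FB/LD."""
    counts = defaultdict(lambda: defaultdict(int))
    for pitch_type, zone, outcome in batted_balls:
        counts[cell_key(pitch_type, zone)][outcome] += 1
    return {
        key: {o: n / sum(c.values()) for o, n in c.items()}
        for key, c in counts.items()
    }


def expected_gb_rate(pitcher_balls, rates):
    """Average the league GB rate over the cells of one pitcher's batted balls."""
    keys = [cell_key(pt, z) for pt, z, _ in pitcher_balls]
    return sum(rates[k].get("GB", 0.0) for k in keys) / len(keys)
```

The same `expected_gb_rate` pattern would apply to fly balls and line drives by swapping the outcome label.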

The results were somewhat surprising: the correlation between expected and actual ground ball percentage was low, only 0.449 for all pitchers with 50 or more line drives (LD) allowed (corresponding to about 80 innings pitched). Furthermore, the range of expected ground ball rates was far too narrow:

As you can see, predicted GB% ranged from 40% to 50%, while actual GB% ranged from 30% to 60%. Even if you regress actual GB% to the mean, some pitchers end up well above 50%, so sampling noise alone does not explain the gap; therefore, we can safely conclude that predicted GB% does not track actual GB% well.
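For readers who want to run the same comparison, here is a minimal sketch of the correlation step, assuming a plain Pearson correlation and a hypothetical list of per-pitcher records with the keys `ld`, `expected_gb` and `actual_gb` (none of these names come from the original analysis).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def gb_correlation(pitchers, min_ld=50):
    """Correlate expected and actual GB% for pitchers with at least min_ld LD."""
    qualified = [p for p in pitchers if p["ld"] >= min_ld]
    expected = [p["expected_gb"] for p in qualified]
    actual = [p["actual_gb"] for p in qualified]
    return pearson(expected, actual)
```

The `min_ld` threshold mirrors the 50 line drive cutoff used above; lowering it would admit pitchers with noisier actual rates.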

I suspect the main reason for this low correlation is the limitations of Gameday's pitch classification model. With a better pitch classification algorithm, the correlation would probably increase. Furthermore, my model considered only pitch location and pitch type; velocity and movement probably also play a major role in determining batted ball types. Originally, I thought that pitch type would serve as a proxy for pitch movement, since all pitches of a particular type have roughly the same movement; however, it appears that differences in movement within the same pitch type lead to different batted ball results. In conclusion, we cannot determine batted ball types solely from Gameday's pitch types and pitch location data.