In 1999 Voros McCracken put forth the idea that pitchers don't differ very much in the batting average they allow on balls in play (BABIP) and that what really matters is their walks, strikeouts and home runs. He found that there was little correlation from year to year in BABIP. J. C. Bradbury provides a good angle on this here. He also shows that other stats, like the frequency of strikeouts, walks and HRs allowed (the defense independent stats, or DIPS), are much more highly correlated from year to year. In this article, I look at how much of a pitcher's yearly BABIP, in the long run, is explained by DIPS.
What I do here is run a regression, one per pitcher, in which the pitcher's yearly BABIP is a function of his DIPS stats, all per batter faced (BFP). The list of pitchers includes the 14 pitchers who had 20+ seasons with 100+ IP, so that each pitcher has a high number of season observations, each with a fairly high number of IP. The idea was to see whether, in seasons when a guy pitched better in terms of the frequency of strikeouts, walks and HRs allowed, it was also harder for batters to make solid contact, so that they would pop out or ground out weakly more often. So I expected the signs on the coefficients for HRs and BBs to be positive and the sign on strikeouts to be negative.
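As a rough illustration of the kind of regression described above, here is a minimal sketch in Python using ordinary least squares. The function name `fit_babip_on_dips` and the season data are made up for the example (they are not the actual pitcher data), and I am assuming a plain OLS fit with an intercept.

```python
import numpy as np

def fit_babip_on_dips(hr, bb, so, babip):
    """OLS of yearly BABIP on per-BFP HR, BB and SO rates for one pitcher.

    Returns (coefs, r2), where coefs = [intercept, HR, BB, SO].
    """
    hr, bb, so, babip = (np.asarray(a, dtype=float) for a in (hr, bb, so, babip))
    # Design matrix: a column of ones for the intercept, then the three rates.
    X = np.column_stack([np.ones_like(hr), hr, bb, so])
    coefs, *_ = np.linalg.lstsq(X, babip, rcond=None)
    fitted = X @ coefs
    ss_res = np.sum((babip - fitted) ** 2)        # unexplained variation
    ss_tot = np.sum((babip - babip.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return coefs, r2

# Invented seasons for one hypothetical pitcher (NOT real data).
rng = np.random.default_rng(0)
n = 20                                # 20 seasons, like the pitchers in the sample
hr = rng.uniform(0.010, 0.030, n)     # HR per BFP
bb = rng.uniform(0.050, 0.100, n)     # BB per BFP
so = rng.uniform(0.100, 0.180, n)     # SO per BFP
babip = 0.29 + 0.38 * hr + 0.017 * bb - 0.15 * so + rng.normal(0, 0.01, n)

coefs, r2 = fit_babip_on_dips(hr, bb, so, babip)
print("intercept, HR, BB, SO:", np.round(coefs, 3))
print("r-squared:", round(float(r2), 3))
```

With only 20 seasons and noisy BABIP, the fitted coefficients can land far from the values used to generate the data, which is worth keeping in mind when reading the per-pitcher results.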
The table below shows the regression results for each pitcher.
For example, the equation for Spahn is
BABIP = 0.19 + 1.91*HR + 0.22*BB + 0.15*SO
The r-squared of .38 means that 38% of the yearly variation in Spahn's BABIP is explained by his DIPS (he had 20 seasons with 100+ IP). The DIPS stats are all per BFP. The difference between his highest and lowest HR rate is .019. That times 1.91 is .036. So going from his best to worst season in HR rate allowed made a .036 difference in BABIP for Spahn. For BB rate, the best to worst difference was .044. That times .22 is .010. For strikeouts, the difference was .06, which times .15 gives a .009 BABIP change. But the sign was positive, meaning that as he struck out more batters, his BABIP actually went up.
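The arithmetic for Spahn can be checked with a few lines of Python. The coefficients and best-to-worst spreads are the ones quoted above; the dictionary names are just for illustration.

```python
# Spahn's regression coefficients (from the equation above).
coefs = {"HR": 1.91, "BB": 0.22, "SO": 0.15}

# Difference between his highest and lowest seasonal rate, per BFP.
spread = {"HR": 0.019, "BB": 0.044, "SO": 0.06}

# Implied swing in BABIP from moving across his full range in each stat.
effect = {stat: coefs[stat] * spread[stat] for stat in coefs}

for stat, value in effect.items():
    print(f"{stat}: {value:.3f}")
```

The HR rate has by far the largest implied effect, roughly four times that of walks or strikeouts.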
The odd thing is that although most pitchers have the signs you would expect, some go the wrong way. For HR rate, 9 of the 14 signs are correct. For BBs, only half, 7, are correct (but the average was positive, in line with expectations). For SOs, 10 signs are correct. Nine of the 14 pitchers had an r-squared of .20 or more. That means that for them, 20% or more of the yearly variation in their BABIP is explained by DIPS. The average r-squared was .21. In case anyone is curious, here is an equation in which the coefficient values are the averages of those in the table:
BABIP = 0.290 + 0.38*HR + 0.017*BB - 0.15*SO
It may actually be hard to get a decent r-squared since the fielders behind these pitchers change over their 20+ year careers, partly due to personnel changes on a team and partly due to the pitchers changing teams. Still, if there is supposed to be no year to year correlation in BABIP, it seems unlikely that any meaningful percentage of it would be explained by other stats. I am not sure what this all means, and there is also the issue of the widely varying coefficient values across pitchers, as well as some signs going the wrong way.