Note: Before bothering to read this, be aware that the model below is basically a tautology and, therefore, worthless. Thanks to Tom Tango, Sky Kalkman, and Colin Wyers for pointing this out. This is a great example of getting too wrapped up in your own analysis and not stepping back to say "hey, does this even make sense?" Feel free to read on if you like, but do not bother to use the model since, well, it isn't useful.
Jose Bautista is quickly becoming a household name, mainly due to his dramatic increase in power over the last year. Bautista went from a 13 HR hitter in 2009 to blasting 54 and leading the league in 2010. So far in 2011, he's off to a similar start with 8 home runs in his first 21 games.
Can we spot players similar to Bautista in season? That is, can we look at a player's performance in the early part of the year and predict if they are poised for a similar power surge?
It's a complicated question, one that I'm not ready to fully answer. However, I thought the place to start would be to look at whether we can explain or predict what leads to a change in a player's home run numbers from year to year.
I decided to start small and see what drove changes in home run production between 2009 and 2010. I looked at the year-to-year percent change (not raw change) in readily available data and finally settled on three measures: home run to fly ball ratio (HR/FB), fly ball percentage (FB%), and pull rate (PULL%).
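For anyone who wants to follow along, here is a minimal sketch of the percent-change calculation described above. It assumes the change is expressed as a fraction of the earlier season's value (0.50 means +50%); the function name and the sample rates are hypothetical, not real player data.

```python
def pct_change(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, as a fraction.

    Assumption: changes are measured relative to the earlier season,
    so 0.50 means a 50% increase.
    """
    if previous == 0:
        raise ValueError("percent change is undefined when the previous value is 0")
    return (current - previous) / previous


# Made-up HR/FB rates for one hitter in consecutive seasons (illustration only).
hr_fb_2009 = 0.10
hr_fb_2010 = 0.15
print(f"Change in HR/FB: {pct_change(hr_fb_2009, hr_fb_2010):+.1%}")
```

The same function would be applied to FB%, PULL%, and HR/PA for each player.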
A change in HR/FB can indicate that a player has changed their approach, their conditioning, or something else. A change in FB% intuitively should drive a change in home runs, since the more fly balls you hit, the more likely you are to have some clear the fence. Finally, I included pull rate since a change in home runs is likely linked to how often a hitter pulls the ball. Opposite-field power is not as prevalent as pull power, so the quickest way to home run glory is to start pulling the ball in the air more often (a route taken by Bautista).
Next, I did some statistical analysis to see the strength of the relationship between these variables and a change in a player's home runs per plate appearance (HR/PA). After some crunching, I came away with the following formula:
%Change in HR/PA = (%Change in HR/FB * 0.963) + (%Change in FB% * 0.893) + (%Change in PULL% * 0.269) + 0.006
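The formula above can be sketched as a small function. The coefficients are the ones from the post; the function name, the argument names, and the choice to pass changes as fractions (0.25 for +25%) are assumptions made for illustration, and the inputs in the example are made up.

```python
def predicted_pct_change_hr_pa(d_hr_fb: float, d_fb_pct: float, d_pull_pct: float) -> float:
    """Apply the post's fitted formula to three percent changes.

    Assumption: all changes are fractions (0.25 means +25%), so the
    intercept of 0.006 corresponds to +0.6%.
    """
    return 0.963 * d_hr_fb + 0.893 * d_fb_pct + 0.269 * d_pull_pct + 0.006


# Hypothetical hitter: HR/FB up 50%, FB% up 10%, PULL% up 20%.
change = predicted_pct_change_hr_pa(0.50, 0.10, 0.20)
print(f"Predicted change in HR/PA: {change:+.1%}")
```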
The formula has an R = .988 and an R² = .977. So essentially, this formula explains almost 98% of the variation in the percent change in HR/PA for players between 2009 and 2010.
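One way to see why the fit is so tight, in keeping with the note at the top of the post, is to write HR/PA as a product of rates. This is a sketch that assumes FB% is measured as fly balls per batted ball; the symbols a, b, and c are introduced here for illustration, and this is one reading of the critique rather than the commenters' exact argument.

```latex
\[
\frac{\mathrm{HR}}{\mathrm{PA}}
  = \frac{\mathrm{HR}}{\mathrm{FB}}
    \cdot \frac{\mathrm{FB}}{\mathrm{batted\ balls}}
    \cdot \frac{\mathrm{batted\ balls}}{\mathrm{PA}}
\]
% Let a, b, c be the fractional changes in the three factors on the right.
\[
1 + \Delta_{\mathrm{HR/PA}} = (1 + a)(1 + b)(1 + c)
\quad\Longrightarrow\quad
\Delta_{\mathrm{HR/PA}} = a + b + c + ab + ac + bc + abc \approx a + b + c
\]
```

Under that assumption, coefficients near 1 on the changes in HR/FB and FB% follow largely from arithmetic, which would explain a very high R² without the model telling us anything new.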
Applied retroactively to the data, the model gives a really good fit between actual home runs in 2010 and the home runs it predicts. The standard deviation of the difference between actual and predicted HRs is only 1.16 home runs.
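A minimal sketch of that check follows. The post does not say exactly how predicted home run totals were derived or which standard deviation was used, so the conversion from a predicted change in HR/PA to a home run total, the use of the population standard deviation, and all names and sample numbers here are assumptions.

```python
import statistics


def predicted_hr(hr_pa_prev: float, pct_change_hr_pa: float, pa_current: int) -> float:
    """Convert a predicted fractional change in HR/PA into a home run total.

    Assumption: predicted HR = last year's HR/PA, scaled by the predicted
    change, times this year's plate appearances.
    """
    return hr_pa_prev * (1.0 + pct_change_hr_pa) * pa_current


def residual_std(actual: list[float], predicted: list[float]) -> float:
    """Population standard deviation of (actual - predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    diffs = [a - p for a, p in zip(actual, predicted)]
    return statistics.pstdev(diffs)


# Made-up totals for three hitters (illustration only, not real data).
actual_hr = [10, 20, 30]
model_hr = [9, 21, 30]
print(f"Std. dev. of residuals: {residual_std(actual_hr, model_hr):.2f}")
```

Swapping `statistics.pstdev` for `statistics.stdev` would give the sample standard deviation instead; with a full season's worth of hitters the two should be close.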
Now, this is only two years' worth of data (and really only one run of the model), but intuitively it makes sense that changes in a player's HR/FB, FB%, and PULL% would indicate whether they've experienced some qualitative change in skill or approach that could predict a jump in power production.
Next week I will present the second part of the analysis, where I will likely look like a fool trying to predict which players might see a Bautista-like increase in home run production, based on last year's numbers and their performance to date.
Until then, feel free to leave your own predictions for the Bautista candidates in the comments.