Predicting the Next Jose Bautista: Part I, Explaining HR Change
Note: Before bothering to read this, note that the model below is basically a tautology and, therefore, worthless. Thanks to Tom Tango, Sky Kalkman, and Colin Wyers for pointing this out. This is a great example of getting too wrapped up in your own analysis and not stepping back to say "hey, does this even make sense?". Feel free to read on if you like, but do not bother to use the model since, well, it isn't useful.
Jose Bautista is quickly becoming a household name, mainly due to his dramatic increase in power over the last year. Bautista went from a 13 HR hitter in 2009 to blasting 54 and leading the league in 2010. So far in 2011, he's off to a similar start with 8 home runs in his first 21 games.
Can we spot players similar to Bautista in season? That is, can we look at a player's performance in the early part of the year and predict if they are poised for a similar power surge?
It's a complicated question, one that I'm not ready to fully answer. However, I thought the place to start would be to look at whether we can explain or predict what drives a change in a player's home run numbers year to year.
I decided to start small and see what drove changes in home run production between 2009 and 2010. I looked at the percent change (not raw change) year to year in readily available data and finally settled on three measures: home run to fly ball ratio (HR/FB), percentage of fly balls (FB%), and pull rate (PULL%).
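To make the percent-change versus raw-change distinction concrete, here is a minimal sketch. The function name `pct_change` is hypothetical, and Bautista's raw home run totals from this article stand in for the rate stats purely for illustration; the analysis itself used changes in HR/FB, FB%, and PULL%.

```python
def pct_change(old: float, new: float) -> float:
    """Return the percent change from old to new as a fraction (0.25 means +25%)."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old


# Bautista's home run totals from the article: 13 in 2009, 54 in 2010.
raw_change = 54 - 13                  # raw change: 41 more home runs
relative_change = pct_change(13, 54)  # percent change: 41/13, a bit over +300%
print(raw_change, round(relative_change, 3))
```

Percent change puts a low-power hitter and a slugger on the same scale, which a raw difference in home runs would not.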
A change in HR/FB can indicate that a player has changed their approach, their conditioning, or something else. A change in FB% intuitively should drive change in home runs, since the more fly balls you hit the more likely you are to have some clear the fence. Finally, I thought about pull rate since a change in home runs is likely linked to how often a hitter pulls the ball. Opposite field power is not as prevalent as pull power, so the quickest way to home run glory is to start pulling the ball more in the air (a route taken by Bautista).
Next, I did some statistical analysis to see the strength of the relationship between these variables and a change in a player's home runs per plate appearance (HR/PA). After some crunching, I came away with the following formula:
%Change in HR/PA = (%Change in HR/FB * .963) + (%Change in FB% * .893) + (%Change in PULL% * .269) + .006
The formula has R = .988 and R² = .977. So essentially, this formula explains almost 98% of the variation in the percent change in HR/PA for players between 2009 and 2010.
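For readers who want to plug numbers into the formula, a minimal sketch follows. The coefficients are copied from the article; the function name and the example hitter are hypothetical, and the code assumes percent changes are expressed as fractions (0.10 for +10%), which the article does not state explicitly.

```python
# Coefficients copied from the article's formula.
COEF_HR_FB = 0.963
COEF_FB = 0.893
COEF_PULL = 0.269
INTERCEPT = 0.006


def predicted_hr_pa_change(d_hr_fb: float, d_fb: float, d_pull: float) -> float:
    """Apply the article's formula to percent changes given as fractions (assumption)."""
    return COEF_HR_FB * d_hr_fb + COEF_FB * d_fb + COEF_PULL * d_pull + INTERCEPT


# Hypothetical hitter: HR/FB up 50%, FB% up 10%, PULL% up 20%.
example = predicted_hr_pa_change(0.50, 0.10, 0.20)
print(round(example, 4))
```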
Retroactively applied to the data, we get a really good fit between actual home runs in 2010 and home runs predicted by the model. The standard deviation of the difference between actual and predicted HRs is only 1.16.
Now, this is only two years' worth of data (and only really one run of the model), but intuitively it makes sense that changes in a player's HR/FB, FB%, and PULL% would indicate if they've experienced some qualitative change in their skill and approach that could predict a jump in power production.
Next week I will present the second part of the analysis, where I will likely look like a fool trying to predict which players might see a Bautista-like increase in home run production based on last year's numbers and their performance to date.
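The fit check described above can be sketched roughly as follows. The home run totals here are made up for illustration (they are not the 2010 data), the function name is hypothetical, and the population standard deviation is an assumption, since the article does not say which variant was used.

```python
from statistics import pstdev


def hr_error_spread(actual, predicted):
    """Standard deviation of (actual - predicted) home runs across players."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    diffs = [a - p for a, p in zip(actual, predicted)]
    # Population standard deviation; swap in statistics.stdev for the sample version.
    return pstdev(diffs)


# Hypothetical actual and model-predicted HR totals for four players.
actual_hr = [10, 20, 30, 15]
predicted_hr = [11, 19, 31, 14]
spread = hr_error_spread(actual_hr, predicted_hr)
print(spread)
```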
Until then, feel free to leave your own predictions for the Bautista candidates in the comments.
Comments
Gardner is a good candidate
Also, his FB% is up 16% and his Pull% is up 25% over last year.
Columnist at Beyond the Box Score
I don't understand
Are you saying you looked at change in HR/FB, change in FB/BIP, and change in Pulls/BIP between 2009 and 2010, and you found a correlation to HR/PA between 2009 and 2010?
Of course R will be almost 1.
What you have to do is look at the change in HR/FB, FB/BIP, Pulls/BIP between 2008 and 2009, and correlate to change in HR/PA between 2009 and 2010.
by tangotiger on Apr 29, 2011 2:08 PM EDT reply actions 1 recs
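The lagged test tangotiger describes might be sketched like this: changes in an input from 2008 to 2009 are paired, player by player, with changes in HR/PA from 2009 to 2010. All player IDs and values below are hypothetical, as are the function names; the point is only the alignment of the two periods.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(prior_change, later_change):
    """Correlate an earlier-period change with a later-period change.

    Both arguments map player id -> percent change; only players present
    in both periods are used.
    """
    players = sorted(set(prior_change) & set(later_change))
    xs = [prior_change[p] for p in players]
    ys = [later_change[p] for p in players]
    return pearson_r(xs, ys)


# Hypothetical percent changes (as fractions) for made-up players A-E.
hr_fb_change_08_09 = {"A": 0.10, "B": -0.05, "C": 0.30, "D": 0.00}
hr_pa_change_09_10 = {"A": 0.02, "B": 0.04, "C": 0.05, "D": -0.03, "E": 0.20}
r = lagged_correlation(hr_fb_change_08_09, hr_pa_change_09_10)
print(round(r, 3))
```

Because the predictor comes from an earlier period than the outcome, a high r here would say something about prediction rather than restating the same season's numbers.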
Thanks for the comment
Generally, I had a similar conversation with Colin Wyers and Sky Kalkman and the model is somewhat unnecessary, but to answer your question:
No, I didn’t look at 08-09 and correlate to change in 09-10. I looked at 09-10 and change from 10 to the first month of 2011. I will look at 08-09, though, as Sky and Colin pointed out that HR/FB and FB% essentially = HR/PA, so you don’t really gain much by doing the complicated calc that I did.
Columnist at Beyond the Box Score
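The point that HR/FB and FB% essentially equal HR/PA can be checked with a little arithmetic. This sketch assumes FB% means fly balls per batted ball (the FB/BIP wording used earlier in the thread) and uses a made-up season line; the function names are hypothetical.

```python
def hr_per_pa_from_components(hr, fb, bip, pa):
    """Rebuild HR/PA as (HR/FB) * (FB/BIP) * (BIP/PA)."""
    hr_per_fb = hr / fb
    fb_pct = fb / bip
    bip_per_pa = bip / pa
    return hr_per_fb * fb_pct * bip_per_pa


def combined_change(d_hr_fb, d_fb_pct, d_bip_pa):
    """Percent change in HR/PA implied by percent changes in its three factors."""
    return (1 + d_hr_fb) * (1 + d_fb_pct) * (1 + d_bip_pa) - 1


# Hypothetical season line: 30 HR, 200 fly balls, 450 batted balls, 650 PA.
hr, fb, bip, pa = 30, 200, 450, 650
direct = hr / pa
rebuilt = hr_per_pa_from_components(hr, fb, bip, pa)
print(direct, rebuilt)

# HR/FB up 10%, FB% up 10%, batted balls per PA unchanged.
print(round(combined_change(0.10, 0.10, 0.0), 4))
```

Since the fly ball and batted ball counts cancel, the product is HR/PA by construction. For modest changes the combined change is close to the sum of the individual changes, which is consistent with fitted coefficients near 1 on HR/FB and FB%.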
Strike what I said
Yes, the initial model was tested as you suggested, which ties into why Sky and Colin said what you said re: R of 1.
Columnist at Beyond the Box Score
Right
Any time you get an r close to 1, you are pretty much getting into a correlation of a=a, but obscured by something like a = sqrt(a^2+1) or something.
That’s why for example if you correlate RC to LWTS or to OPS, you get r at the .95 level. It’s all the same thing, but just re-arranged in a somewhat less than ideal manner.
For what you are doing, correlating a change in one thing to a change in another, you should be thrilled if you get an r approaching 0.5, but more likely you’ll get an r around 0.1 or 0.2 if there’s anything there.
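A quick way to see the a = sqrt(a^2+1) point is to correlate a series with a lightly disguised copy of itself. This is only an illustrative sketch: the values 1 through 20 are arbitrary and the helper name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Arbitrary values for a, and the "obscured" version sqrt(a^2 + 1).
a = [float(i) for i in range(1, 21)]
disguised = [sqrt(x * x + 1) for x in a]
r = pearson_r(a, disguised)
print(round(r, 4))
```

The result lands very close to 1 even though no new information was added, which is the pattern to be suspicious of.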
