Predicting Player Regression In The Second Half
Brennan Boesch in 2010, Adam Lind in 2011, Who in 2012?
In April 2010, the Tigers called up rookie Brennan Boesch to replace the injured Carlos Guillen. Boesch, the Tigers' 25th-best prospect going into the season, had been tearing up the International League with Toledo, posting a slash line of .379/.455/.621. He continued his hot play in Detroit, entering July with a line of .332/.380/.602 and 2.1 WAR in 56 games.
In 2011, former blue-chipper Adam Lind appeared to have rebounded from a horrendous 2010 season in which he produced -0.8 WAR. In 57 games, he hit .312/.361/.569 with a solid 1.9 WAR. He seemed to be showing a glimmer of the promise of his 2009 breakout season.
What do 2010 Boesch and 2011 Lind have in common? They both crashed back to earth in the second half. Boesch, who had a high .372 BABIP and a low BB/K ratio of 0.36, finished the season at .256/.320/.416 and 0.6 WAR. Yes, that means he managed -1.5 WAR in 76 second-half games. Lind didn't show as many warning signs as Boesch, but he finished the season with 0.5 WAR, which works out to a -1.4 WAR second half.
Now, few players have a decline quite that spectacular. A majority of players do manage to keep their first-half pace, give or take a small margin of error. However, as the saying goes, water seeks its own level, so some players can't sustain their hot start. And of course analysts, team owners, fantasy owners, and fans would like to know which of their favorites won't live up to first-half expectations.
Methodology
First of all, we need to define when a player has regressed. If a player goes from a first-half WAR of 1.9 to a second-half WAR of 1.8, that doesn't really imply a regression. That's just a slight dip, and not something to be worried about. Therefore, a player in regression is defined as someone whose second-half production is less than 3/4 of their first-half production. So, for a first-half WAR of 1.5, a player in regression has a second-half WAR of less than 1.125 (call it 1.1); for a first-half WAR of 2, less than 1.5; for 4, less than 3; and so on.
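For anyone who wants to apply the 3/4 rule to their own numbers, here is a minimal sketch in Python. The function name `has_regressed` and the `threshold` parameter are my own labels for illustration, and the rule only makes sense for a positive first-half WAR (which the 1.5 WAR cutoff guarantees for this dataset).

```python
def has_regressed(first_half_war, second_half_war, threshold=0.75):
    """Return True when second-half WAR is below 3/4 of first-half WAR.

    Assumes first_half_war is positive; with a zero or negative first half
    the comparison stops being meaningful.
    """
    return second_half_war < threshold * first_half_war


# The two cautionary tales from the intro, plus the "slight dip" example.
print(has_regressed(2.1, -1.5))  # Boesch 2010
print(has_regressed(1.9, -1.4))  # Lind 2011
print(has_regressed(1.9, 1.8))   # a slight dip, not a regression
```

Note that the comparison is strict, so a player who lands exactly on 3/4 of his first-half WAR is not counted as regressing.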
To look at position players in 2012, I collected data from 2010 and 2011. (The regression of pitchers is another post for another day.) I then limited the data to qualifying players who met the following conditions: the player stayed on the same team through the whole year, the player played in every month of the second half, and the player had at least 1.5 WAR through the first half.
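The three qualifying conditions can be sketched as a simple filter. The record layout below (`PlayerSeason` and its fields) is hypothetical and is not the format the data was actually stored in. I am also assuming "second half" means July through September, since the exact months aren't spelled out above.

```python
from dataclasses import dataclass

# Assumption: the second half covers July, August, and September.
SECOND_HALF_MONTHS = frozenset({7, 8, 9})


@dataclass(frozen=True)
class PlayerSeason:
    """Hypothetical record layout, used only for illustration."""
    name: str
    teams: frozenset          # every team the player appeared for that year
    months_played: frozenset  # calendar months (1-12) with at least one game
    first_half_war: float


def qualifies(season, min_war=1.5):
    """Apply the three qualifying conditions to one player-season."""
    same_team = len(season.teams) == 1
    played_every_month = SECOND_HALF_MONTHS <= season.months_played
    enough_war = season.first_half_war >= min_war
    return same_team and played_every_month and enough_war
```

A traded player, a player who missed all of August, or a player with 1.4 first-half WAR would each be dropped by `qualifies`.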
Finally, several categories were chosen as explanatory variables for the data set. These included, but were not limited to, Avg, OBP, SLG, BABIP, ISO, BB/K, GB/FB, OSwing%, ZSwing%, and SwStr%.
(If you want to skip this next technical math-heavy paragraph, feel free to skip to the next header.)
In order to model the 2010-11 data, I fell back on my statistics background and applied some basic Bayesian methodology. I applied a simple conjugate-prior hierarchical model, utilizing the data augmentation approach of Albert and Chib (1993). After running a Gibbs sampler on the model, I took the maximum a posteriori estimates of the effects to use for prediction. Checking the model with 10-fold cross-validation, I got between 70% and 75% accuracy on predictions, which frankly isn't too bad considering the high variability involved in baseball.
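For the curious, here is a rough sketch of what Albert and Chib style data augmentation looks like for a plain probit model. This is not the exact model described above: it has no hierarchy, it puts an independent normal prior (variance `prior_var`, my choice) on every coefficient, it updates coefficients one at a time, and it reports posterior means rather than maximum a posteriori estimates. All function names are made up for illustration, and it sticks to the Python standard library.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if y == 1, else to z <= 0."""
    p_neg = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    lo, hi = (p_neg, 1.0) if y == 1 else (0.0, p_neg)
    u = rng.uniform(lo, hi)
    # Numerical guard for inv_cdf's open domain; very extreme means would
    # need a tail-aware truncated-normal sampler instead.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(X, y, n_iter=600, burn_in=200, prior_var=10.0, seed=0):
    """Gibbs sampler for probit regression; returns posterior-mean coefficients.

    X is a list of rows (include a leading 1.0 for an intercept) and
    y is a list of 0/1 labels (1 = regressed).
    """
    rng = random.Random(seed)
    p = len(X[0])
    beta = [0.0] * p
    totals = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for it in range(n_iter):
        # Step 1: latent variables given the current coefficients.
        z = [sample_latent(sum(b * x for b, x in zip(beta, row)), yi, rng)
             for row, yi in zip(X, y)]
        # Step 2: each coefficient given the others (a univariate normal).
        for j in range(p):
            var_j = 1.0 / (col_sq[j] + 1.0 / prior_var)
            resid = sum(
                row[j] * (zi - sum(beta[k] * row[k] for k in range(p) if k != j))
                for row, zi in zip(X, z)
            )
            beta[j] = rng.gauss(var_j * resid, var_j ** 0.5)
        if it >= burn_in:
            for j in range(p):
                totals[j] += beta[j]
    kept = n_iter - burn_in
    return [t / kept for t in totals]


def regress_probability(beta, row):
    """Predicted probability of regression for one player's feature row."""
    return STD_NORMAL.cdf(sum(b * x for b, x in zip(beta, row)))
```

Calling `gibbs_probit(X, y)` on standardized first-half stats and then `regress_probability(beta, row)` on a new player's row gives a number between 0 and 1, which is the kind of predicted probability used for the 2012 players.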
Finally, I input the 2012 statistics for the players in order to get a predicted probability that the player regresses. If you want more information on the model in particular, leave a comment and I can get back to you.
Who's In Trouble In 2012?
So, the obvious question is, which players are going to crash in the second half? In the table below, I list all qualified batters who fit the original 2010-11 dataset requirements, along with their slash lines and WAR. The players marked "Yes" in the last column are the ones predicted to regress in the second half. Note that this doesn't predict how much they will regress. Again, that could be a later post.
| Name | Team | Slash Line (AVG/OBP/SLG) | First-Half WAR | Regress? |
| A.J. Pierzynski | White Sox | 0.285/0.332/0.517 | 2.2 | Yes |
| Aaron Hill | Diamondbacks | 0.301/0.362/0.516 | 2.8 | No |
| Adam Dunn | White Sox | 0.213/0.363/0.515 | 1.5 | No |
| Adam Jones | Orioles | 0.300/0.343/0.554 | 3.3 | Yes |
| Adrian Beltre | Rangers | 0.328/0.360/0.534 | 2.8 | Yes |
| Alejandro De Aza | White Sox | 0.295/0.363/0.411 | 2.2 | No |
| Alex Gordon | Royals | 0.273/0.364/0.417 | 3 | No |
| Alex Rios | White Sox | 0.306/0.342/0.491 | 2.3 | Yes |
| Alex Rodriguez | Yankees | 0.265/0.355/0.437 | 1.5 | Yes |
| Alfonso Soriano | Cubs | 0.273/0.331/0.494 | 2.2 | Yes |
| Andre Ethier | Dodgers | 0.291/0.357/0.491 | 2.6 | No |
| Andrew McCutchen | Pirates | 0.346/0.401/0.593 | 3.6 | Yes |
| Angel Pagan | Giants | 0.293/0.340/0.415 | 1.7 | Yes |
| Aramis Ramirez | Brewers | 0.262/0.337/0.464 | 1.9 | Yes |
| Asdrubal Cabrera | Indians | 0.298/0.379/0.493 | 2.3 | No |
| Austin Jackson | Tigers | 0.326/0.408/0.537 | 3.6 | No |
| Ben Zobrist | Rays | 0.252/0.375/0.458 | 2.3 | No |
| Brandon Phillips | Reds | 0.288/0.330/0.446 | 2.3 | No |
| Brett Lawrie | Blue Jays | 0.293/0.341/0.438 | 2.8 | No |
| Buster Posey | Giants | 0.296/0.363/0.472 | 2 | Yes |
| Carlos Beltran | Cardinals | 0.310/0.396/0.576 | 2.9 | Yes |
| Carlos Gonzalez | Rockies | 0.337/0.394/0.604 | 2.3 | Yes |
| Carlos Ruiz | Phillies | 0.358/0.423/0.585 | 4 | Yes |
| Chase Headley | Padres | 0.271/0.369/0.415 | 3.2 | No |
| Colby Rasmus | Blue Jays | 0.257/0.312/0.476 | 2 | No |
| Dan Uggla | Braves | 0.235/0.363/0.414 | 2.4 | No |
| David Freese | Cardinals | 0.280/0.331/0.481 | 1.5 | Yes |
| David Ortiz | Red Sox | 0.305/0.393/0.613 | 2.4 | No |
| David Wright | Mets | 0.355/0.449/0.564 | 4.5 | No |
| Denard Span | Twins | 0.275/0.344/0.391 | 2.2 | No |
| Dexter Fowler | Rockies | 0.286/0.381/0.536 | 1.9 | No |
| Edwin Encarnacion | Blue Jays | 0.289/0.365/0.570 | 2.3 | No |
| Elvis Andrus | Rangers | 0.305/0.381/0.411 | 2.9 | No |
| Giancarlo Stanton | Marlins | 0.283/0.363/0.547 | 2.9 | No |
| Gregor Blanco | Giants | 0.254/0.344/0.388 | 2 | No |
| Hanley Ramirez | Marlins | 0.259/0.333/0.441 | 1.6 | No |
| Hunter Pence | Phillies | 0.286/0.351/0.498 | 1.7 | Yes |
| Ian Desmond | Nationals | 0.276/0.305/0.483 | 2.6 | Yes |
| Ian Kinsler | Rangers | 0.276/0.336/0.450 | 2.1 | No |
| Ichiro Suzuki | Mariners | 0.274/0.301/0.372 | 1.8 | No |
| Jamey Carroll | Twins | 0.251/0.332/0.295 | 1.5 | No |
| Jason Heyward | Braves | 0.272/0.344/0.502 | 3.3 | No |
| Jason Kipnis | Indians | 0.275/0.335/0.426 | 2.4 | No |
| Jed Lowrie | Astros | 0.261/0.347/0.486 | 2.6 | No |
| Jimmy Rollins | Phillies | 0.263/0.317/0.409 | 2.2 | Yes |
| Joe Mauer | Twins | 0.325/0.416/0.448 | 2.5 | No |
| Joey Votto | Reds | 0.350/0.471/0.632 | 4.8 | No |
| Jose Altuve | Astros | 0.309/0.351/0.453 | 1.9 | Yes |
| Jose Bautista | Blue Jays | 0.239/0.359/0.549 | 2.9 | No |
| Josh Hamilton | Rangers | 0.319/0.385/0.652 | 3.7 | Yes |
| Josh Reddick | Athletics | 0.260/0.342/0.517 | 3.1 | No |
| Josh Willingham | Twins | 0.268/0.381/0.532 | 2.5 | Yes |
| Kyle Seager | Mariners | 0.252/0.308/0.442 | 1.9 | No |
| Mark Trumbo | Angels | 0.313/0.363/0.614 | 2.7 | Yes |
| Martin Prado | Braves | 0.323/0.387/0.467 | 3.7 | No |
| Matt Holliday | Cardinals | 0.307/0.389/0.500 | 2.7 | Yes |
| Matt Wieters | Orioles | 0.249/0.331/0.440 | 2 | No |
| Melky Cabrera | Giants | 0.350/0.393/0.514 | 3.1 | Yes |
| Michael Bourn | Braves | 0.307/0.355/0.442 | 4 | Yes |
| Michael Saunders | Mariners | 0.258/0.319/0.427 | 1.7 | No |
| Miguel Cabrera | Tigers | 0.315/0.376/0.541 | 2.7 | Yes |
| Miguel Montero | Diamondbacks | 0.279/0.375/0.434 | 2.2 | No |
| Mike Aviles | Red Sox | 0.266/0.285/0.420 | 2 | Yes |
| Mike Moustakas | Royals | 0.264/0.331/0.472 | 2.6 | No |
| Mike Trout | Angels | 0.336/0.391/0.526 | 4 | No |
| Omar Infante | Marlins | 0.289/0.313/0.460 | 1.7 | Yes |
| Paul Goldschmidt | Diamondbacks | 0.293/0.369/0.542 | 2 | No |
| Paul Konerko | White Sox | 0.336/0.413/0.556 | 2.2 | Yes |
| Pedro Alvarez | Pirates | 0.226/0.297/0.477 | 1.7 | No |
| Rafael Furcal | Cardinals | 0.280/0.346/0.377 | 1.5 | Yes |
| Robinson Cano | Yankees | 0.308/0.370/0.582 | 3.9 | Yes |
| Ryan Braun | Brewers | 0.313/0.394/0.611 | 4 | Yes |
| Shane Victorino | Phillies | 0.254/0.323/0.388 | 1.6 | No |
| Shin-Soo Choo | Indians | 0.291/0.382/0.471 | 1.7 | No |
| Starlin Castro | Cubs | 0.298/0.319/0.432 | 2.1 | No |
| Yadier Molina | Cardinals | 0.311/0.362/0.510 | 3.2 | Yes |
| Yunel Escobar | Blue Jays | 0.255/0.304/0.341 | 1.6 | No |
| Zack Cozart | Reds | 0.249/0.294/0.407 | 1.6 | No |
So those are the predictions that this dataset yielded. While the model predicted well in cross-validation, the extreme variability associated with baseball will make these predictions interesting to follow. So, what do you think? Which of the players mentioned above will definitely crash? Who will avoid the dog days?
by stvfres on 






















