Prediction is very difficult, especially about the future.
What do a three-time American League MVP and a Nobel-winning physicist have in common? Both Yogi Berra and Niels Bohr have had this quote attributed to them, although I'd probably defer to Bohr on the subject of predictions.
Regardless of who originated the quote, they're decidedly correct. Nothing is more difficult than predicting future events, especially when you don't have good data to go on. Baseball is especially difficult to predict, because baseball is one of the noisiest sources of data.
Despite this, people still try to project what players will accomplish in the coming season. Around February-March, we see projections for players and teams starting to come out of the woodwork. Some involve complex formulas, while others merely involve general feelings about a player.
Part of the fun over the course of a season is seeing which players exceed their projected levels. Probably nothing is more talked about on television, radio, fantasy chats, and in general conversation than breakout/bust players. A fair question to ask is, "Who has had the best breakout season?" Or, more specifically, who has most exceeded expectations?
An easy way to look at this is to scale the player's projected full-season WAR to their current number of plate appearances and look at the difference from their actual WAR. This would be a little short-sighted, though, because players of different hitting types have different amounts of variability expected in their seasons. And anyway, what fun is it going with the simple answer?
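For reference, the simple answer is a one-line calculation. Here is a minimal sketch in Python (the work for this article was done in R, so this is only an illustration). The function name and the numbers are made up for the example and are not taken from ZiPS.

```python
def prorated_war(projected_war, projected_pa, current_pa):
    """Scale a full-season WAR projection down to the PAs played so far."""
    if projected_pa <= 0:
        raise ValueError("projected_pa must be positive")
    return projected_war * current_pa / projected_pa


# Hypothetical player: projected for 3.0 WAR over 600 PAs, now at 300 PAs
# with 2.4 actual WAR. None of these numbers come from the article.
expected = prorated_war(projected_war=3.0, projected_pa=600, current_pa=300)
difference = 2.4 - expected  # actual WAR minus the pro-rated expectation
print(f"expected so far: {expected:.1f}, ahead of pace by: {difference:.1f}")
```

This tells us how far ahead of pace a player is, but not how surprising that gap is, which is what the rest of the article is after.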
So instead, why don't we look at the probability that a player exceeds their current level, given that their projection is correct? While this can be difficult to look at directly, we can get at it through a few simulations.
Of course, we need a projection system to start from. For this, I chose Dan Szymborski's ZiPS projections. Now, we need to discuss how we can get at the probability estimate we want to look at.
Many times before this, I've mentioned using bootstrapping to assess the variability in an estimator. Well, I will again be using a bootstrap-esque procedure, and I want to explain a little how the bootstrap method works in this case. Consider it "Bootstrapping For Baseball Applications." If you aren't interested, jump to the next header.
Bootstrapping: A Primer
Bootstrapping is a useful method for studying the distribution or properties of an estimator when observing them directly is difficult. Although assessing variability usually requires multiple observations, i.e., multiple point estimates, we often have only one sample. Baseball is an excellent example of this: we may want to know the distribution of, say, wOBA for a player, but can only observe one single season of wOBA.
However, our one sample has many data points which should be representative of the population as a whole. In baseball terms, a player's season is made up of individual plate appearances which should be representative of his overall talent level. Why couldn't we create seasons out of this sample?
This is what the bootstrap does. It takes our sample and creates a new dataset by resampling from it with replacement. So in baseball, we could create as many seasons as we want by sampling with replacement from the plate appearances seen in our season. Suddenly, we have a large number of point estimates, allowing us to estimate variances, probabilities, or anything else we may want to investigate.
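To make the resampling idea concrete, here is a minimal sketch in Python using only the standard library. The season below is hypothetical (600 plate appearances with a .350 on-base percentage), and the function name is mine. The point is only to show how one observed season turns into many simulated ones.

```python
import random
import statistics


def bootstrap_seasons(plate_appearances, n_seasons, rng):
    """Resample PAs with replacement; return the mean of each new season."""
    n = len(plate_appearances)
    return [
        statistics.fmean(rng.choices(plate_appearances, k=n))
        for _ in range(n_seasons)
    ]


# Hypothetical season: 1 = reached base, 0 = out, for a .350 OBP in 600 PAs.
season = [1] * 210 + [0] * 390
rng = random.Random(2013)  # fixed seed so the sketch is repeatable
obps = bootstrap_seasons(season, n_seasons=2000, rng=rng)

# Each entry of obps is one point estimate, so spread is now easy to measure.
spread = statistics.stdev(obps)
print(f"bootstrap standard deviation of OBP: {spread:.4f}")
```

The same pattern works for any per-plate-appearance statistic: swap the 0/1 outcomes for whatever value each plate appearance contributes.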
So, we'll use this bootstrap-esque technique to look at the probability that a player could exceed his expectations. To start, we assume that the ZiPS projection is an accurate projection of the player's expected level. We then sample with replacement x times from the plate appearances in the projection, where x is the number of plate appearances the player has had so far. Next, we calculate the player's WAR by adding his bootstrap RAA to his actual UZR, Replacement, and Positional levels. Finally, we count the number of times the simulated player exceeds his current level out of 10,000 created seasons.
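The procedure can be sketched roughly as follows, in Python rather than the R used for the article. Several pieces are assumptions for illustration only: the outcome categories, the run value attached to each outcome, the 10 runs-per-win conversion, and the example player's numbers. The article's actual weights and inputs are not given here.

```python
import random

# Illustrative run values per PA outcome, relative to an average PA.
# These are assumed for the sketch; they are not the article's weights.
RUN_VALUES = {"out": -0.27, "bb": 0.30, "1b": 0.46,
              "2b": 0.76, "3b": 1.04, "hr": 1.40}
RUNS_PER_WIN = 10.0  # assumed conversion from runs to wins


def prob_higher_war(projected_counts, current_pa, actual_war,
                    other_runs, n_seasons=10_000, seed=0):
    """Estimate P(simulated WAR > actual WAR), taking the projection as true.

    projected_counts: projected count of each outcome over a full season.
    other_runs: actual UZR + replacement + positional runs, held fixed.
    """
    rng = random.Random(seed)
    outcomes = list(projected_counts)
    weights = [projected_counts[o] for o in outcomes]
    values = [RUN_VALUES[o] for o in outcomes]
    higher = 0
    for _ in range(n_seasons):
        # Sampling projected PAs with replacement is the same as drawing
        # each outcome with probability equal to its projected share.
        raa = sum(rng.choices(values, weights=weights, k=current_pa))
        war = (raa + other_runs) / RUNS_PER_WIN
        if war > actual_war:
            higher += 1
    return higher / n_seasons


# Hypothetical projection (600 PAs) and a hypothetical 300-PA season so far.
projection = {"out": 400, "bb": 55, "1b": 95, "2b": 28, "3b": 2, "hr": 20}
p = prob_higher_war(projection, current_pa=300, actual_war=2.5,
                    other_runs=10.0)
print(f"P(Higher WAR) = {p:.4f}")
```

A small probability means the player's actual WAR sits far out in the right tail of what the projection allows, which is how the table below should be read.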
The Most Unexpected Players of 2013
So, of the 253 players who had at least 150 PAs by this past Monday, who exceeded their projection by the largest amount? And what was the probability that we'd see a better season from this player? The table below lists each player, their actual WAR, and the probability that they would exceed their current level based on preseason projections, sorted from most to least unexpected.
Player | Actual WAR | P(Higher WAR) |
---|---|---|
Chris Davis | 3.8 | 0.0001 |
Everth Cabrera | 3.3 | 0.002 |
Josh Donaldson | 2.7 | 0.0042 |
Carlos Gomez | 4.3 | 0.0053 |
Marco Scutaro | 1.9 | 0.0096 |
Adam Lind | 1.8 | 0.0116 |
Jean Segura | 3 | 0.0147 |
James Loney | 1.6 | 0.0177 |
Matt Carpenter | 3.4 | 0.0196 |
Jhonny Peralta | 2.7 | 0.0221 |
Coco Crisp | 2.3 | 0.0237 |
Daniel Nava | 1.1 | 0.0282 |
Howie Kendrick | 2.1 | 0.0318 |
Gerardo Parra | 2.5 | 0.0325 |
Brandon Crawford | 1.9 | 0.0334 |
Carlos Gonzalez | 4.2 | 0.0351 |
Jason Castro | 1.8 | 0.0358 |
Manny Machado | 3.6 | 0.0359 |
Nate Schierholtz | 1.5 | 0.0404 |
Kyle Blanks | 1.2 | 0.0443 |
Troy Tulowitzki | 4.3 | 0.0495 |
Mitch Moreland | 1.6 | 0.0544 |
Hunter Pence | 2.9 | 0.0553 |
Miguel Cabrera | 4.1 | 0.0642 |
Michael Cuddyer | 1.5 | 0.0647 |
Paul Goldschmidt | 2.9 | 0.0679 |
Chris Johnson | 0.7 | 0.0731 |
Didi Gregorius | 1.7 | 0.0756 |
Marlon Byrd | 1.4 | 0.0819 |
Yadier Molina | 2.8 | 0.0877 |
Brandon Barnes | 1.1 | 0.0907 |
Evan Gattis | 1.9 | 0.0938 |
Bryce Harper | 1.8 | 0.1077 |
Kyle Seager | 2.1 | 0.1221 |
Jarrod Saltalamacchia | 1.5 | 0.1222 |
Jedd Gyorko | 1.6 | 0.1226 |
Matt Joyce | 1.3 | 0.1231 |
Shin-Soo Choo | 2.5 | 0.1235 |
Alex Rios | 2.4 | 0.1273 |
David Wright | 3.1 | 0.1322 |
Nate McLouth | 1.5 | 0.1497 |
Joe Mauer | 2.8 | 0.151 |
Carlos Santana | 1.8 | 0.1576 |
Carl Crawford | 1.9 | 0.1578 |
Munenori Kawasaki | 0.6 | 0.1613 |
Brett Gardner | 2.5 | 0.1618 |
Jason Kipnis | 2 | 0.1623 |
Dexter Fowler | 3.1 | 0.1639 |
Evan Longoria | 3.6 | 0.167 |
Carlos Beltran | 1.1 | 0.1841 |
Starling Marte | 2.3 | 0.1959 |
Kelly Johnson | 1.2 | 0.1991 |
Luis Valbuena | 1.8 | 0.1995 |
Domonic Brown | 1.4 | 0.2017 |
Trevor Plouffe | 0.2 | 0.2037 |
Mike Trout | 3.6 | 0.2057 |
Ian Desmond | 2 | 0.2182 |
Russell Martin | 2 | 0.2223 |
Endy Chavez | -0.1 | 0.2374 |
Seth Smith | 1 | 0.2388 |
Raul Ibanez | -0.4 | 0.239 |
Michael Morse | -0.1 | 0.2409 |
Nick Punto | 0.6 | 0.2431 |
Jason Bay | 0.6 | 0.2502 |
Jed Lowrie | 1.1 | 0.2517 |
Brandon Moss | 1 | 0.2603 |
Mark Trumbo | 1.7 | 0.2638 |
Colby Rasmus | 2 | 0.268 |
Lorenzo Cain | 1.6 | 0.2739 |
Marcell Ozuna | 1.4 | 0.2767 |
Gregor Blanco | 1.1 | 0.2788 |
J.J. Hardy | 2.1 | 0.2833 |
Ian Kinsler | 1.1 | 0.2858 |
Buster Posey | 2.5 | 0.2867 |
John Mayberry | 0.5 | 0.287 |
Justin Upton | 1.7 | 0.2886 |
Omar Infante | 1.4 | 0.2969 |
Adam Jones | 1.8 | 0.2996 |
Norichika Aoki | 1 | 0.3076 |
Dustin Pedroia | 2.9 | 0.3101 |
Kendrys Morales | 0.8 | 0.3413 |
Lucas Duda | -0.3 | 0.3415 |
Daniel Murphy | 1.8 | 0.3459 |
Freddie Freeman | 0.8 | 0.3461 |
Adrian Beltre | 2.1 | 0.36 |
A.J. Pierzynski | 0.8 | 0.3651 |
Michael Bourn | 1 | 0.3704 |
Salvador Perez | 1.5 | 0.3742 |
Nelson Cruz | 0.9 | 0.3805 |
Drew Stubbs | 0.7 | 0.3947 |
Mike Aviles | 0.6 | 0.396 |
Desmond Jennings | 1.4 | 0.3974 |
A.J. Ellis | 1.3 | 0.4029 |
Chris Iannetta | 0.8 | 0.4042 |
Justin Smoak | -0.1 | 0.4077 |
Skip Schumaker | -0.8 | 0.4102 |
David Freese | 0.8 | 0.4128 |
David Ortiz | 1.8 | 0.4132 |
Chase Utley | 1.6 | 0.4168 |
Michael Young | 0.5 | 0.4199 |
Lyle Overbay | 0.2 | 0.425 |
Brandon Phillips | 1.8 | 0.4253 |
Pedro Florimon | 1.3 | 0.4265 |
Pete Kozma | 0.9 | 0.4284 |
David DeJesus | 1.4 | 0.4318 |
Todd Frazier | 2.2 | 0.4333 |
Allen Craig | 0.6 | 0.4344 |
Neil Walker | 1.2 | 0.442 |
Eric Sogard | 0.3 | 0.4422 |
Carlos Quentin | 1.1 | 0.4475 |
Mark Ellis | 0.3 | 0.4476 |
J.D. Martinez | -0.5 | 0.4579 |
Edwin Encarnacion | 1.7 | 0.4613 |
Prince Fielder | 1.3 | 0.4638 |
Adrian Gonzalez | 1 | 0.4678 |
Mark Reynolds | 0.3 | 0.4679 |
Garrett Jones | 0 | 0.4801 |
Nick Hundley | 0.5 | 0.4817 |
A.J. Pollock | 1.7 | 0.4828 |
Nolan Arenado | 1.7 | 0.483 |
Jonathan Lucroy | 1.1 | 0.4836 |
Chris Denorfia | 1.4 | 0.4869 |
Leonys Martin | 0.5 | 0.4875 |
Alex Gordon | 1.8 | 0.5007 |
Yonder Alonso | 0.4 | 0.5033 |
Conor Gillaspie | 0.8 | 0.5104 |
John Buck | 0.9 | 0.5119 |
Will Venable | 0.7 | 0.5158 |
Jay Bruce | 1.3 | 0.5248 |
Derek Norris | 0.5 | 0.5256 |
Nick Markakis | 0.8 | 0.5476 |
Asdrubal Cabrera | 0.4 | 0.5485 |
Travis Hafner | 0.5 | 0.5501 |
John Jaso | 0.9 | 0.5559 |
Wilin Rosario | 1.2 | 0.5594 |
Yunel Escobar | 1 | 0.5607 |
Ryan Zimmerman | 0.5 | 0.5616 |
Michael Brantley | 0.5 | 0.5651 |
Joey Votto | 3.1 | 0.5652 |
Erik Kratz | 0.7 | 0.5667 |
Alejandro De Aza | 0.8 | 0.5675 |
Andres Torres | 0.8 | 0.5735 |
Mike Napoli | 1.7 | 0.5769 |
Marwin Gonzalez | 0.2 | 0.5873 |
Adam LaRoche | -0.1 | 0.5955 |
Jayson Werth | 0.3 | 0.6015 |
Angel Pagan | 0.4 | 0.6068 |
Dan Uggla | 0.4 | 0.6098 |
Robinson Cano | 2.1 | 0.613 |
Brian Dozier | 0.6 | 0.6144 |
Ryan Doumit | -0.1 | 0.6161 |
Austin Jackson | 0.8 | 0.6192 |
Ryan Braun | 1.9 | 0.6242 |
Yoenis Cespedes | 1.6 | 0.6382 |
Nick Swisher | 0.9 | 0.651 |
Josh Willingham | 0.5 | 0.6519 |
Freddy Galvis | 0 | 0.6544 |
Justin Morneau | 0.6 | 0.6547 |
Torii Hunter | 0.8 | 0.6563 |
Jimmy Rollins | 1.3 | 0.6582 |
Andrew McCutchen | 2.7 | 0.6593 |
Jayson Nix | 0.4 | 0.661 |
Ben Zobrist | 1.4 | 0.6638 |
Ryan Howard | 0.3 | 0.6643 |
Pedro Alvarez | 1 | 0.6644 |
Pablo Sandoval | 0.8 | 0.6684 |
Welington Castillo | 0.9 | 0.6777 |
Matt Dominguez | -0.3 | 0.6778 |
Billy Butler | 0.7 | 0.68 |
Stephen Drew | 1.3 | 0.6828 |
Travis Snider | 0 | 0.6889 |
Chris Carter | -0.4 | 0.6981 |
Matt Holliday | 1.1 | 0.6991 |
Justin Ruggiano | 1.4 | 0.7098 |
Jacoby Ellsbury | 2.2 | 0.7156 |
Jonny Gomes | 0.7 | 0.7209 |
Shane Victorino | 1.6 | 0.7246 |
Chris Parmelee | -0.5 | 0.7278 |
Jose Altuve | 0.9 | 0.7298 |
Ichiro Suzuki | 0.3 | 0.735 |
Josh Reddick | 0.5 | 0.7367 |
Todd Helton | 0 | 0.7388 |
Adam Dunn | -0.4 | 0.7412 |
Lance Berkman | 0.6 | 0.7419 |
Brandon Belt | 1 | 0.7475 |
J.P. Arencibia | 0.3 | 0.7488 |
Alberto Callaspo | -0.6 | 0.749 |
Zack Cozart | 1.1 | 0.7532 |
Juan Pierre | 0.3 | 0.7549 |
Vernon Wells | -0.1 | 0.7595 |
Denard Span | 0.8 | 0.7664 |
Carlos Pena | 0.3 | 0.7665 |
Ben Revere | 0.3 | 0.7705 |
Jose Bautista | 2.5 | 0.7778 |
Erick Aybar | -0.1 | 0.7846 |
Michael Saunders | 0 | 0.7895 |
Anthony Rizzo | 1.2 | 0.7925 |
Tyler Flowers | 0.4 | 0.7929 |
Matt Wieters | 1.2 | 0.7967 |
Alfonso Soriano | 0.6 | 0.8073 |
Rob Brantly | 0 | 0.8102 |
Greg Dobbs | -0.3 | 0.8129 |
Juan Francisco | -0.1 | 0.8133 |
Cliff Pennington | 0.5 | 0.8161 |
Andre Ethier | 0.5 | 0.8262 |
Eric Young | -0.5 | 0.8281 |
Placido Polanco | -0.6 | 0.8306 |
Eric Hosmer | 0.3 | 0.8327 |
Brendan Ryan | -0.4 | 0.8442 |
Alexei Ramirez | 1.1 | 0.8456 |
Adeiny Hechavarria | -0.8 | 0.8472 |
Chase Headley | 1.3 | 0.8507 |
Darwin Barney | 0 | 0.8584 |
Will Middlebrooks | -0.5 | 0.8622 |
Chris Young | -0.3 | 0.8674 |
Andy Dirks | 0.8 | 0.8716 |
Andrelton Simmons | 0.9 | 0.8726 |
Alcides Escobar | 0.7 | 0.8727 |
Yuniesky Betancourt | -0.5 | 0.8728 |
Jon Jay | -0.3 | 0.8731 |
Jason Heyward | 0.9 | 0.8782 |
Kurt Suzuki | 0 | 0.8817 |
Cody Ross | 0.1 | 0.8921 |
Josh Rutledge | -0.2 | 0.8922 |
Melky Cabrera | 0.1 | 0.8964 |
Aaron Hicks | -0.4 | 0.9081 |
Emilio Bonifacio | -0.6 | 0.9104 |
Albert Pujols | 0.2 | 0.9113 |
Dayan Viciedo | -0.4 | 0.9114 |
Jeff Francoeur | -0.6 | 0.919 |
Brett Lawrie | 0.3 | 0.9322 |
Rickie Weeks | -0.2 | 0.9329 |
David Murphy | -0.2 | 0.9412 |
Clint Barmes | -0.3 | 0.9527 |
Josh Hamilton | 0.2 | 0.9619 |
Maicer Izturis | -1.4 | 0.9709 |
Ruben Tejada | -0.5 | 0.9724 |
Elvis Andrus | 0.9 | 0.9729 |
Victor Martinez | -1.1 | 0.9737 |
Dustin Ackley | -0.3 | 0.9738 |
Ryan Flaherty | -0.2 | 0.974 |
Martin Prado | -0.5 | 0.9772 |
Miguel Montero | 0.3 | 0.9815 |
Paul Konerko | -0.9 | 0.9826 |
Steve Lombardozzi | -0.7 | 0.9835 |
Alex Avila | -0.2 | 0.9844 |
Starlin Castro | -0.3 | 0.9868 |
Mike Moustakas | -0.5 | 0.9874 |
B.J. Upton | 0.4 | 0.9879 |
Matt Kemp | -1.3 | 0.994 |
Danny Espinosa | -0.6 | 0.9971 |
Ike Davis | -1.1 | 0.9993 |
Jeff Keppinger | -1.3 | 0.9994 |
So to add to all his accomplishments this year, Chris Davis is the most unexpected player of 2013 so far, and it's not even close. The season he is having is roughly 20 times less likely than that of any other player, at least when compared to their respective projections. Other players who have far exceeded expectations are breakout players Everth Cabrera, Josh Donaldson, and Carlos Gomez.
On the other end of the spectrum fall the season busts. Some are down there because of injuries, others just because they haven't produced so far. Jeff Keppinger barely edges out Ike Davis for the most underwhelming, with Danny Espinosa and Matt Kemp close behind.
One final comment about these projections concerns how difficult projecting really is. We can assess this by looking at how the probabilities from the above table are distributed.
The better the projection, the closer the probability of a better season is to 0.5. As we can see, the distribution of probabilities is pretty close to uniform throughout [0,1]. Just a reminder about how difficult these projections can be to get correct.
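For anyone who wants to reproduce that look at the distribution, a minimal Python sketch is to count how many probabilities land in each tenth of [0,1]. The function name is mine, and the list below is only a hand-picked subset of the P(Higher WAR) column (roughly one value per tenth) to show the mechanics. The full column of 253 values is what you would actually feed in.

```python
def decile_counts(probabilities):
    """Count how many probabilities fall in each tenth of [0, 1]."""
    counts = [0] * 10
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        counts[min(int(p * 10), 9)] += 1  # p == 1.0 goes in the last bin
    return counts


# Subset copied from the table: Davis, Harper, Trout, Pedroia, Ortiz,
# Gordon, Cano, Ellsbury, Hosmer, Pujols, Keppinger.
subset = [0.0001, 0.1077, 0.2057, 0.3101, 0.4132, 0.5007,
          0.613, 0.7156, 0.8327, 0.9113, 0.9994]
counts = decile_counts(subset)

# Crude text histogram, one row per tenth of the interval.
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {'#' * c}")
```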
All statistics courtesy of Fangraphs. Statistical work done in R.