In the comments on my previous article on extrema, a reader raised a point that seemed like an interesting topic to explore:
This is something I’ve been wanting to see for a long time now, ever since I studied and wrote about draft picks about a dozen years ago...extrema is what you want when drafting, not average. Because we want to find the Trouts and Kershaws, not the Michael Tuckers or even Marquis Grissom...
This is an excellent point. Teams are trying to identify the maximums in a draft, as well as trying to determine what they can expect to get out of their draft picks. We can use retrospective averages to do that, but we can equally use an analysis of order statistics (the generalization of extrema) to look into this.
The Mathematics of Order Statistics
For once, I'll spare you the gory math. If you're interested in the statistical theory behind order statistics, you can look here or feel free to email me for some explanation.
Distribution of First Round Talent
Generally speaking, the best talent in the MLB draft is most likely to show up in the 1st Round. But even so, a first round pick is no guarantee. In fact, 35.8% of all first round picks from 1990-2010 have not (at least to this point) made it to the majors. That would be an astounding proportion to someone who considers the NFL or NBA drafts the rule when it comes to drafts.
The problem comes in evaluating the players who do reach the majors. It seems fair to evaluate players on the basis of WAR/Season, but what defines a season? It could potentially be a question of WAR/600 PAs, or WAR/180 IP for starters, or WAR/60 IP for relievers. However, in doing this, you turn players who put up -1 WAR in 100 PAs into historically bad -6 WAR players over 600 PAs. This is a disservice to those players, as they surely would not be given enough playing time to reach levels that low.
So, we will have to define a season for each player. This will be a weighted average of PAs or IPs for each player, with the weights being the number of PAs or IPs for each season. So, for example, if a player had 100, 300, and 600 PAs in three seasons, the weighted mean would be 460 PAs for a season for the player. From there, the WAR/Season will be the WAR/PA or WAR/IP times the appropriate seasonal PA or IP amount.
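As a minimal sketch of the season definition just described, the snippet below computes the self-weighted season length and the resulting WAR/Season. The function names and the 5.0 career WAR figure are hypothetical, chosen only for illustration.

```python
def season_length(amounts):
    """Weighted mean of seasonal PAs (or IPs), each season weighted by its own total."""
    total = sum(amounts)
    if total == 0:
        raise ValueError("player has no playing time")
    return sum(a * a for a in amounts) / total


def war_per_season(career_war, amounts):
    """Career WAR per PA (or IP), scaled to the player's own typical season length."""
    return career_war / sum(amounts) * season_length(amounts)


pa_by_season = [100, 300, 600]             # the example from the text
print(season_length(pa_by_season))         # 460.0
print(war_per_season(5.0, pa_by_season))   # hypothetical career WAR of 5.0
print(war_per_season(-1.0, [100]))         # a 100 PA, -1 WAR player stays at -1 WAR
```

Under this definition, a player with a single 100 PA season is judged over 100 PAs rather than being stretched out to 600.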
From this point, we can estimate the distribution of 1st Round talent, defining talent as WAR/Season. It is important to remember that this distribution represents the 64.2% of 1st Round draft picks who make the majors. So, in a sense, this is the distribution of 1st Round talent given that the player reaches the major leagues. And for those wondering, yes, the higher pdf value out near 8 WAR/Season is Mike Trout.
Now that we've established this distribution of 1st Round talent, let's enter a fantasy world. A world where the draft has no signability concerns, no compensation picks, no signing bonus disputes or injury concerns causing problems. In this world, the job of the team with the #1 pick is to identify the player who will be the best/most talented/highest WAR/Season. The expectation of the pick in this scenario is not the average of similar past picks; it is the expectation for the maximum of the 30 first round picks from that talent distribution. This continues downward, so that the expectation of the ith pick is the expectation of the ith best player in the 1st Round.
There are two concerns working at once here. The first is working with the probability that the best player, 2nd-best player, etc. in the 1st Round reach the majors. The second concern is that, given the ith best player reaches the majors, what is his distribution of WAR/Season like?
We can work out the first concern with a simple binomial distribution, after a little thought. If the "best" player in the 1st Round doesn't reach the majors, that means that all 30 of the 1st Round picks fail to make the majors, as we are assuming we can order the sample of 30 picks from largest/best X(1) to smallest/worst X(30). If the 2nd best player, X(2), fails to make the majors, that implies that either 29 or 30 picks didn't make the majors. On the other end, if the worst player, X(30), fails to make the majors, that means anywhere from one to 30 guys failed to make the majors. In general, the ith best player fails to reach the majors exactly when at least 31 - i of the 30 picks fail, so the probability that the ith best player reaches the majors is
$$P\left(X_{(i)} \text{ reaches the majors}\right) = 1 - \sum_{k=31-i}^{30} \binom{30}{k}\, p_{\text{Minors}}^{\,k}\, \left(1 - p_{\text{Minors}}\right)^{30-k}$$
where pMinors is 35.8% = 0.358. While this math looks bad, it is easy to implement in R or some other software package. To see this probability visually, take a look at the picture below.
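For readers who want to reproduce these probabilities, here is a short Python sketch of the counting argument: the ith best player misses the majors exactly when at least 31 - i of the 30 picks miss. It assumes, as the binomial model does, that each pick reaches the majors independently with the same probability. The function names are my own for illustration.

```python
from math import comb

N_PICKS = 30
P_MINORS = 0.358  # share of 1990-2010 first-rounders who have not reached the majors


def p_ith_best_misses(i, n=N_PICKS, p=P_MINORS):
    """P(ith best pick never reaches the majors) = P(at least n - i + 1 of n picks miss)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n - i + 1, n + 1))


def p_ith_best_reaches(i, n=N_PICKS, p=P_MINORS):
    """Complement of the miss probability."""
    return 1.0 - p_ith_best_misses(i, n, p)


for i in (1, 10, 20, 25, 30):
    print(i, round(p_ith_best_misses(i), 4))
```

The miss probability is essentially zero for the best player and climbs quickly through the back half of the round.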
So we now can look at the second consideration, the distribution of WAR/Season for the ith best player given that the player reaches the majors. This is where the gory math skipped above comes into play. But skipping all that, we can look at the resulting distributions visually.
Here, we see the distribution for each pick rapidly shifting towards smaller (even negative) WAR/Season values as the pick number increases. But remember that those smaller values become less and less likely to be realized, because the probability that the ith best player doesn't even reach the majors increases as i goes to 30.
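The article's distributions come from order-statistics theory applied to the fitted talent distribution, which is not reproduced here. As a rough stand-in, the simulation below sketches the same idea under loudly stated assumptions: `draw_talent` is a hypothetical normal distribution (not the fitted one), each pick reaches the majors independently, and players who miss are ranked below everyone who makes it.

```python
import random
from statistics import mean, quantiles

N_PICKS = 30
P_MAJORS = 0.642   # from the text: 64.2% of first-rounders reach the majors
N_SIMS = 20_000


def draw_talent(rng):
    """HYPOTHETICAL stand-in for the fitted WAR/Season distribution (not the article's)."""
    return rng.gauss(1.0, 1.5)


def simulate(n_sims=N_SIMS, seed=0):
    """For each rank (0-based), collect the WAR/Season of the ith best pick
    across the simulated drafts in which that pick reached the majors."""
    rng = random.Random(seed)
    by_rank = [[] for _ in range(N_PICKS)]
    for _ in range(n_sims):
        # Keep a talent draw only for picks that reach the majors.
        majors = [draw_talent(rng) for _ in range(N_PICKS) if rng.random() < P_MAJORS]
        majors.sort(reverse=True)  # best first
        for rank, war in enumerate(majors):
            by_rank[rank].append(war)
    return by_rank


by_rank = simulate()
for i in (1, 5, 10, 15):
    values = by_rank[i - 1]
    cuts = quantiles(values, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    miss_rate = 1 - len(values) / N_SIMS
    print(i, round(mean(values), 2), round(cuts[0], 2), round(cuts[-1], 2), round(miss_rate, 3))
```

Each printed row gives the rank, the conditional mean, the 2.5th and 97.5th percentiles given the player reaches the majors, and the simulated share of drafts in which that rank never made it. The numbers will not match the article's because the talent distribution is a placeholder.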
So, what can we expect from each of the first 30 picks? We can easily calculate this value, along with credible bounds for the WAR/Season given the player reaches the majors.
Not surprisingly, the best player not only has the highest average but the highest upside. To put all the numbers out there, the probability that the ith best player fails to reach the majors, the E(WAR/Season|Reaches Majors), and the 2.5th and 97.5th percentiles given that the player reaches the majors are given below.
| Pick Value/ith Best Player | P(Minors) | E(WAR/Season given Reaches Majors) | 2.5th Percentile | 97.5th Percentile |
So far, we have been looking at the expectation for the ith pick using the E(WAR of ith best player). However, this is not the only method of doing this. You could look to minimize risk by looking for the floor of realistic values, or the 2.5th percentile. Finally, you could hope for the best and look at the ceiling of realistic values, the 97.5th percentile.
As teams are trying to maximize their value, let's assume they're looking at the reasonable ceiling values. The above table, however, ignores the probability of the player never reaching the majors. Once we include that, the importance of high draft picks becomes even clearer.
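The article does not spell out how the miss probability is folded into the ceiling, so the sketch below is one plausible reading, not necessarily the method used here. It treats "never reaches the majors" as the worst possible outcome and rescales the percentile onto the reaches-the-majors part of the distribution. The function name and the sample values are hypothetical.

```python
def quantile_including_minors(conditional_values, p_minors, q=0.975):
    """q-th quantile of a pick's outcome when missing the majors counts as the
    worst outcome. Returns None if that quantile is still a career minor leaguer."""
    if not conditional_values:
        raise ValueError("need at least one WAR/Season value")
    if q <= p_minors:
        return None
    # Rescale q to a level within the reaches-the-majors part of the distribution.
    q_cond = (q - p_minors) / (1.0 - p_minors)
    ordered = sorted(conditional_values)
    index = min(int(q_cond * len(ordered)), len(ordered) - 1)
    return ordered[index]


# HYPOTHETICAL WAR/Season values given the player reaches the majors (-2.0 to 7.9).
conditional = [x / 10 for x in range(-20, 80)]
print(quantile_including_minors(conditional, p_minors=0.0))
print(quantile_including_minors(conditional, p_minors=0.981))  # None
```

Once a pick's miss probability exceeds 97.5%, its reasonable ceiling under this reading is a career minor leaguer, no matter how good the conditional distribution looks.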
| Pick Value/ith Best Player | 97.5th Percentile Including Minor Leagues Probability |
Here, we see the effect of, say, the 2004 Padres missing on Matt Bush with the #1 pick. Instead of getting the 7.9 WAR/Season player they hoped for and expected, they got a career minor leaguer. Further, it shows the increase in value of Mike Trout. At the #25 pick, assuming the Angels expected to get the 25th-best player in the 1st Round, it would be difficult to project anything more than a career minor leaguer. Instead, they got a 7.9 WAR/Season monster. In the end, it's easy to see the value of the high picks, and equally easy to see how difficult it is for teams to identify the right player.
. . .
Data courtesy of Baseball Reference.
Stephen Loftus is an editor at Beyond The Box Score. You can follow him on Twitter at @stephen__loftus.