Now what you read is not a test, I'm ranking prospect lists
And me, Rstats, and some math are gonna try to lift the mist
-Wonder Mike Trout
There has been some great analysis of how well Baseball America's prospect rankings translate to future success; see my Prospect Analysis page for examples from various writers around the web. Adam Foster at Project Prospect has done some work on this in his industry comparisons. However, BA's isn't the only list around. Over the years, various rankings have come and gone. Of all of these, which was the best prospect list?
Check it out, it's the k-e-n-d-the-a-l-l
And the rest is t-a-u
You see, this is the code for the method of the post
And these reasons I'll bring to you
-Big Bank Hammerin' Hank
First, I compiled 166 prospect lists from the era of 1990 to 2010. I used 2010 as a cutoff to allow players four years to blossom in the major leagues. I found as many as I possibly could, though I know I missed many as well*. Once everything was compiled, the question became: how do you figure out which one is best? In my initial research into this, I found the book "Who's #1?", which suggests using Kendall's Tau or Spearman's weighted footrule when comparing ranked lists.
*If you have a list that you would like to be included in this, please e-mail me or contact me on Twitter @stealofhome.
Recently, Neil Paine used Spearman's rho on organizational prospect rankings at 538. However, I believe the repeated rankings in my data (I gave a rank of 101 to every prospect on the real list who did not appear on a given historical list) rule that method out here.
I asked Tango the best way to do this, and his idea was to create a fantasy draft for each year: let each list draft its first available player, repeat this for each year and each possible drafting order, and declare the list that drafted the highest average fWAR the winner. This is probably the correct way to do this. Unfortunately, if I waited until I had working code for this scenario, this research would not be completed until prospect lists (as well as I) were but a dim memory. If you are capable of putting that code together, I would be very interested in working with you to publish that research.
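For anyone tempted to take that on, here is a rough Python sketch of one reading of the draft idea. It is not code from this research. The function names, the straight (non-snake) pick order, the fixed number of rounds, and treating a player with no fWAR entry as 0 are all assumptions made for illustration.

```python
from itertools import permutations
from statistics import mean


def draft_once(order, lists, fwar, rounds):
    """Run one straight draft; `order` is a tuple of list names in pick order."""
    taken = set()
    totals = {name: 0.0 for name in order}
    for _ in range(rounds):
        for name in order:
            # Each list takes its highest-ranked player who is still available.
            pick = next((p for p in lists[name] if p not in taken), None)
            if pick is None:
                continue  # this list has no ranked players left
            taken.add(pick)
            totals[name] += fwar.get(pick, 0.0)  # assumption: missing fWAR counts as 0
    return totals


def average_draft_fwar(lists, fwar, rounds):
    """Average each list's drafted fWAR over every possible drafting order."""
    results = {name: [] for name in lists}
    for order in permutations(lists):
        for name, total in draft_once(order, lists, fwar, rounds).items():
            results[name].append(total)
    return {name: mean(values) for name, values in results.items()}


# Hypothetical data: two lists, three players, one round.
lists = {"X": ["a", "b", "c"], "Y": ["a", "c", "b"]}
fwar = {"a": 10.0, "b": 2.0, "c": 1.0}
print(average_draft_fwar(lists, fwar, rounds=1))
```

The number of drafting orders grows factorially with the number of lists in a year, so a year with many lists would probably need a random sample of orders rather than all of them.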
I landed on using Kendall's Tau for this analysis and will do my best to attempt to describe it. Note: I am not a professionally-trained statistician, so if you are and see any errors I make in my description, please let me know.
Kendall's tau shows how close two columns of rankings are to each other. It ranges from -1 (meaning they are ranked in exactly the opposite order) to 1 (they are the same). It works by looking at every pair of players and counting how often the two lists agree on which player should be ranked higher versus how often they disagree, with an adjustment for ties. Here is a YouTube video describing part of what it does, although it does not look at ties.
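To make the pair counting concrete, here is a minimal pure-Python sketch of the tie-adjusted form usually called tau-b. That the analysis in this post used exactly this variant is an assumption, and the function name is made up for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ranks (ties allowed)."""
    if len(x) != len(y):
        raise ValueError("rank lists must have the same length")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both columns: the pair is left out entirely
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1  # both columns order this pair the same way
        else:
            discordant += 1  # the columns disagree on this pair
    pairs = concordant + discordant
    denom = sqrt((pairs + only_x_tied) * (pairs + only_y_tied))
    if denom == 0:
        return float("nan")  # undefined when a column is entirely tied
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))      # identical order
print(kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1]))      # reversed order
print(kendall_tau_b([1, 2, 3, 4], [1, 2, 101, 101]))  # two players tied at 101
```

This version checks every pair, so it is quadratic in the list length, which is fine for lists of 100 players.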
Since Kendall's Tau works by comparing one list to another, I had to find the "true" top 100 for each year. I did this by finding all players eligible for prospect lists each year and ranking them by their career fWAR. Then I compared that true top 100 list to the various prospect lists. This is an area that may need cleaning, as I don't have the proper data to do this perfectly.
If the player did not appear on the list in question, they received a ranking of 101. If the list went over 100 players, I only used their first 100 rankings. If a list ranked fewer than 100 players, I compared it with the true list of that length for that year (e.g., MLB's top 50 lists are only compared to the true top 50 prospects eligible that year). One more wrinkle in this analysis is that not all lists claim the same eligibility standards. Most rank anyone who would be a rookie, but there are some variations on that. This may hurt some lists in these places.
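The matching rules above can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, the player data is invented, and using 101 as the unranked value even for shorter lists is an assumption, since the post only states the value for the general case.

```python
def paired_ranks(prospect_list, career_fwar, max_len=100, unranked_rank=101):
    """Return (true_ranks, list_ranks) for one list in one year.

    prospect_list: player names in ranked order, best first.
    career_fwar:   {name: career fWAR} for every player eligible that year.
    """
    ranked = prospect_list[:max_len]  # only the first 100 rankings are used
    n = len(ranked)                   # shorter lists get a shorter true list
    # The "true" top n: eligible players sorted by career fWAR, best first.
    true_top = sorted(career_fwar, key=career_fwar.get, reverse=True)[:n]
    position = {name: i + 1 for i, name in enumerate(ranked)}
    true_ranks = list(range(1, n + 1))
    # Players from the true list who are missing from this list all share one rank.
    list_ranks = [position.get(name, unranked_rank) for name in true_top]
    return true_ranks, list_ranks


# Hypothetical example: a three-player list in a five-player eligible pool.
career_fwar = {"A": 50.0, "B": 30.0, "C": 10.0, "D": 5.0, "E": -1.0}
true_ranks, list_ranks = paired_ranks(["B", "E", "A"], career_fwar)
print(true_ranks, list_ranks)
```

The two returned columns are what a rank correlation such as Kendall's tau would then compare.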
This is an interesting exercise in itself to see which prospects were missed and which players were lurking in the lower levels before exploding on the scene. For instance, Pedro Martinez had the best career of anyone eligible for prospect lists in 1991, but was nowhere to be found on BA's list. However, in 1992, he jumped into the top 10 with a strong performance in three levels of the minor leagues the previous year.
BA did not have 11 of the top 20 eligible prospects on their 1991 list. However, many of these players were young at the time and did eventually appear on a BA top 100. Bernie Williams was ranked higher (11) than his true rank (17). Jim Thome was ranked lower (93) than his true rank (7). Kendall's Tau takes all of this into account and spits out a rank correlation of 0.157, about average for the results I found.
Well it's on and on and on on and on
This post don't stop until I make you yawn
-Master Dillon Gee
Before I get into the results, I want to add a personal note: Please do not use these rankings to trash writers who may rate poorly here. First, I believe this method can be improved upon, which may change the results a bit. Second, and most importantly, prospect analysis is really hard. These lists require hours of research and dedication, making phone calls and meeting scouts, poring over numbers, and doing whatever else it takes to rank players. These are a labor of love for each individual and it takes a lot of guts to put your name next to something that is both outdated almost immediately (thanks to the dynamic nature of young players) and destined to mostly fail (thanks to how hard the majors are).
With that said, here are the raw results after running Kendall's Tau on each of the lists in my database:
|Year||Players||List||Tau||Tau+|
|2008||100||Mound Talk Community||0.364||107|
|2008||100||Top Prospect Alert||0.355||104|
|2010||40||The Cardinal Nation||0.340||196|
|2007||100||Fantasy Baseball Café||0.325||135|
|2007||50||Baseball Notebook Fantasy||0.316||131|
|2007||100||Minor League Ball Community||0.281||116|
|2007||100||Baseball Digest Daily||0.270||112|
|2007||100||Top Prospect Alert||0.256||106|
|2009||100||Top Prospect Alert||0.241||106|
|2009||100||MLB Prospect Guide||0.228||100|
|2009||100||The Hardball Times Fantasy||0.214||94|
|2003||50||Baseball Think Factory||0.201||178|
|2006||75||Inside the Dugout||0.192||116|
|1999||50||Prospects, P, and Suspects||0.188||224|
|2005||50||The Hardball Times||0.169||98|
|2004||100||The Sporting News||0.167||103|
|2006||100||Warm October Nights||0.152||92|
|2010||100||MLB Prospect Guide||0.146||84|
|2006||50||The Hardball Times||0.146||88|
|2003||100||Top Prospect Report||0.129||114|
|2003||100||The Sporting News Wheeler||0.124||110|
|2010||100||The Hardball Times Fantasy||0.122||71|
|2010||100||Top Prospect Alert||0.119||69|
|2001||100||Top Prospect Alert||0.102||75|
|2004||50||The Hardball Times||0.094||58|
|2003||100||The Sporting News||0.093||82|
|2002||100||Top Prospect Alert||0.079||104|
|2002||100||The Sporting News||0.018||24|
|2000||80||Top Prospect Alert||0.000||2|
Congratulations, John Sickels, your 2008 Rotowire list was the best of all time!
The first thing that jumps out from these results is that tau is very highly correlated with the year the list was posted. Since most lists in a given year are created from a very similar group of players, that is expected. What this really shows, then, is that 2007 and 2008 were good years for prospect lists in general. Because of that, I have also included a corrected tau (Tau+), which accounts for the yearly average: it is a list's tau divided by the average tau for its year, times 100, so a list that was exactly average for its year scores 100. This makes... John Sickels' 2002 Top 50 the best list.
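As a small illustration of that correction, here is a Python sketch that computes a yearly average and the corrected tau for each list. The function name and row layout are hypothetical, the tau values are invented rather than taken from the results, and rounding to a whole number is an assumption based on how the table reads.

```python
from collections import defaultdict
from statistics import mean


def add_tau_plus(rows):
    """rows: dicts with 'year' and 'tau'. Returns copies with 'tau_plus' added."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row["tau"])
    year_avg = {year: mean(taus) for year, taus in by_year.items()}
    out = []
    for row in rows:
        avg = year_avg[row["year"]]
        # tau divided by that year's average tau, times 100 (100 = average list).
        tau_plus = round(100 * row["tau"] / avg) if avg else None
        out.append({**row, "tau_plus": tau_plus})
    return out


# Invented values, for illustration only.
rows = [
    {"year": 2008, "list": "List A", "tau": 0.30},
    {"year": 2008, "list": "List B", "tau": 0.20},
    {"year": 2002, "list": "List C", "tau": 0.10},
]
for row in add_tau_plus(rows):
    print(row["year"], row["list"], row["tau_plus"])
```

One thing to watch with a ratio like this: in a year where the average tau is close to zero, small differences in tau turn into very large differences in the corrected score.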
This is what the average tau looks like by year:
There is a cyclical trend here: the average tau generally increases from 1990 to 1997, decreases to 2000, increases to 2008, and has decreased since then. This has to do with many of the top players graduating into the higher levels of the minors and gaining more national attention. This then creates better lists (e.g., Pedro Martinez going from unranked in 1991 to the top 10 in 1992).
Finally, this table summarizes these results by list "voice." I have done my best to match these up, but let me know of any I missed:
|Voice||Lists||Average of Tau||Average of Tau+|
|Mound Talk Community||1||0.364||107|
|The Cardinal Nation||1||0.340||196|
|Fantasy Baseball Café||1||0.325||135|
|Baseball Notebook Fantasy||1||0.316||131|
|Minor League Ball Community||1||0.281||116|
|Baseball Digest Daily||1||0.270||112|
|Inside the Dugout||1||0.192||116|
|Prospects, P, and Suspects||1||0.188||224|
|Warm October Nights||1||0.152||92|
|The Sporting News||2||0.092||63|
|Top Prospect Alert||3||0.060||60|
Project Prospect ranks the highest for absolute tau, but their lists cover a very good range of years for lists in general (2007-2010). When taking year into account, John Sickels (2000-2002) and Dayn Perry (2004-2007) look much better.
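The by-voice summary is just a grouped average. Here is a minimal Python sketch of it, with a hypothetical function name, hypothetical field names, and invented values rather than the actual results.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_voice(rows):
    """rows: dicts with 'voice', 'tau', 'tau_plus'. One summary per voice, best tau first."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["voice"]].append(row)
    summary = [
        {
            "voice": voice,
            "lists": len(items),
            "avg_tau": mean(r["tau"] for r in items),
            "avg_tau_plus": mean(r["tau_plus"] for r in items),
        }
        for voice, items in grouped.items()
    ]
    return sorted(summary, key=lambda s: s["avg_tau"], reverse=True)


# Invented values, for illustration only.
rows = [
    {"voice": "Voice A", "tau": 0.20, "tau_plus": 100},
    {"voice": "Voice A", "tau": 0.40, "tau_plus": 120},
    {"voice": "Voice B", "tau": 0.25, "tau_plus": 90},
]
for line in summarize_by_voice(rows):
    print(line)
```

Sorting by `avg_tau_plus` instead of `avg_tau` gives the year-adjusted ordering.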
Have you ever read over an internet post
And the writer ain't no good?
I mean the grammar is sloppy, the words are mingled
And he can't be understood
-Wonder Mike Trout
The main purpose of this post is to get this research out there to begin a discussion about the best way to rank prospect lists. So how can we improve this? Is Kendall's Tau the best method to use? What is the best way to put all lists on equal footing? Should we account for the year? What about the length of the list? Let me know your ideas.
As it stands now, of lists ranked at least three times from 2007 to 2010 (the most recent lists in this analysis), Jonathan Mayo (120) and Project Prospect (110) are the only lists to maintain an above-average rating. Top Prospect Alert comes in at 96 and Baseball America sits at 95, while Baseball Prospectus is at 92. ESPN brings up the rear at 89.
. . .