Bill James, the godfather of statistical analysis, developed a concept of comparing non-Hall of Fame players to those enshrined in the HoF in his book, The Politics of Glory, calling it Similarity Scores. Baseball-Reference has a thorough breakdown of how the scores are determined, but in a nutshell: a player begins with a tally of 1000 and points are subtracted for differences in games played, hits, stolen bases, home runs, etc. And likewise for pitchers: wins, losses, ERA, saves, etc.
But Baseball-Reference, in all its glory, has taken that a step further and has an age-based breakdown too. For example, through the age of 33 Albert Pujols’ most similar player is Frank Robinson with a score of 851.
I’ve taken James’ original idea and reworked the formula, using analytical scouting stats as well as age and level to compare prospects -- a system I’ve dubbed: CAL, or Comparison And Likeness.
What is CAL?
Before that question can be answered, let's take a look at what CAL isn't -- namely, a projection system in the traditional sense. CAL doesn't forecast an upcoming season -- or seasons -- like PECOTA or ZiPS or Oliver or any other groundbreaking well-known projection system.
So what exactly is CAL then? It's a player classification system whose singular goal is quite simple -- to provide a better context in which minor league numbers can be evaluated by finding closely related players. It's another piece to the analytical puzzle.
Taking James' original formula and reworking it with a litany of differently weighted statistics, CAL searches through the database for players of the same ilk. Each player starts out with 1000 points and subtractions are made for differences in age, level of competition, and, of course, position, among many others. (Some of the statistics used are strikeout percentage, walk percentage, and home run rates for pitchers, and plate discipline, Isolated Power, and Speed Score, which was developed by James himself, for hitters.)
Ideally, a score of 980 or higher represents a potentially strong correlation between skillsets and, again, a potentially similar development path. Below 980 points, skillsets begin to differ, and the gap widens the further a score falls from that mark. One player may have more speed or less power or play a different position.
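For readers who think in code, here is a minimal sketch of the start-at-1000-and-subtract idea described above. It is not CAL's actual formula: the statistics chosen, the weights in `WEIGHTS`, and the function names are all assumptions made for illustration, since the real weights aren't published in this post. Only the 1000-point starting tally and the 980 cutoff come from the text.

```python
# Hypothetical weights: points subtracted per unit of difference.
# These numbers are invented for illustration; they are not CAL's weights.
WEIGHTS = {
    "age": 10.0,     # per year of age difference
    "k_pct": 2.0,    # per percentage point of strikeout rate
    "bb_pct": 2.0,   # per percentage point of walk rate
    "iso": 200.0,    # per 1.000 of Isolated Power (0.010 ISO = 2 points)
}


def similarity(a: dict, b: dict, weights: dict = WEIGHTS) -> float:
    """Start at 1000 and subtract a weighted absolute difference per stat."""
    score = 1000.0
    for stat, weight in weights.items():
        score -= weight * abs(a[stat] - b[stat])
    return score


def is_strong_match(score: float, threshold: float = 980.0) -> bool:
    """Apply the 980-point cutoff the article uses for a strong comparison."""
    return score >= threshold


# Two made-up player seasons, same age, slightly different skill sets.
season_a = {"age": 21, "k_pct": 18.0, "bb_pct": 9.0, "iso": 0.200}
season_b = {"age": 21, "k_pct": 20.0, "bb_pct": 8.0, "iso": 0.180}

example_score = similarity(season_a, season_b)
print(round(example_score, 2), is_strong_match(example_score))
```

With these invented weights, the two seasons lose 4 points for strikeout rate, 2 for walk rate, and 4 for Isolated Power, so they land at roughly 990 and clear the 980 bar. Adding a two-year age gap would cost another 20 points and push the pair below it.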
The database has been built using FanGraphs' minor league statistics, which means the history begins with the 2006 season. This also brings me to another interesting point: it's not an expansive database; it extends just eight seasons. So as the sample continues to grow we'll have an even better understanding of CAL's potential as an analytical tool.
Now there are two things to note. The first is that the scores aren't based on a player's collective history; they cover just one season at a time. For example, Player A's age-23 season matches up well with Player B's age-23 season. And the second is that until advanced minor league defensive data becomes available, CAL only focuses on a hitter's offensive ability. It simply uses a player's position as another filter.
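The season-by-season lookup described above might be sketched like this. Everything here is an assumption for illustration: the `Season` record, the stand-in `score` function and its weights, and the made-up players. The only ideas taken from the text are that each row is a single season, that position acts as a filter, and that the best-scoring season becomes the top comparison.

```python
from typing import List, NamedTuple, Optional, Tuple


class Season(NamedTuple):
    """One player-season; the unit being compared (hypothetical layout)."""
    player: str
    year: int
    age: int
    level: str
    position: str
    k_pct: float
    bb_pct: float
    iso: float


LEVELS = ["A", "A+", "AA", "AAA"]  # ordered low to high


def score(a: Season, b: Season) -> float:
    """Stand-in scoring with invented weights; not CAL's real formula."""
    s = 1000.0
    s -= 10.0 * abs(a.age - b.age)
    s -= 10.0 * abs(LEVELS.index(a.level) - LEVELS.index(b.level))
    s -= 2.0 * abs(a.k_pct - b.k_pct)
    s -= 2.0 * abs(a.bb_pct - b.bb_pct)
    s -= 200.0 * abs(a.iso - b.iso)
    return s


def top_comparison(
    target: Season, database: List[Season]
) -> Optional[Tuple[Season, float]]:
    """Return the best-scoring season at the same position, if any."""
    candidates = [
        s for s in database
        if s.position == target.position and s.player != target.player
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: score(target, s))
    return best, score(target, best)


# Made-up seasons: Player C has identical numbers but plays a different position.
target = Season("Player A", 2012, 23, "AAA", "3B", 18.0, 9.0, 0.200)
database = [
    target,
    Season("Player B", 2010, 23, "AAA", "3B", 19.0, 8.5, 0.190),
    Season("Player C", 2011, 23, "AAA", "SS", 18.0, 9.0, 0.200),
    Season("Player D", 2009, 25, "AA", "3B", 18.0, 9.0, 0.200),
]

result = top_comparison(target, database)
print(result)
```

Here the position filter removes Player C despite the identical stat line, and Player B's season beats Player D's because the age and level gaps cost Player D more points than the small rate differences cost Player B.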
Finally, each of the following examples depicts each player as if he is currently working through the minor leagues.
Is CAL a predictive tool for hitters?
Nothing is 100% definitive. However, I've run through numerous test cases that suggest that CAL can become -- and is -- a useful analytical tool for hitters. The system has shown the ability to root through a lot of the statistical mumbo-jumbo -- and all the unnecessary hype -- and sniff out some of the bigger prospect busts and surprises, including some late-blooming big leaguers as well.
Again, CAL provides the evidence to allow the user to make better educated guesses about a player by looking at his contemporaries.
Were Mike Moustakas' Hitting Woes Predictable?
Moustakas, more so than any other prospect formerly of the Royals' system, has been under the microscope for his failure to produce at the big league level. In essence, he's become the poster boy for a franchise struggling to turn the corner.
The #2 overall pick in the 2007 draft -- and ahead of several more established big leaguers such as Madison Bumgarner, Matt Wieters, and Jason Heyward -- Moustakas was annually placed on everybody's top prospect lists, a lefty-swinging third baseman with a smooth stroke that did some serious damage across parts of six minor league seasons.
In total, he owns a career .284/.338/.504 triple-slash line in the minors. He showed solid contact skills, plenty of power, wasn't unwilling to take the occasional free pass, and spent the entire time playing -- and succeeding -- against older competition. It was seemingly the perfect recipe for future big league success.
But let's look at what CAL thinks:
Year | Age | Level | Top Comparison | CAL |
---|---|---|---|---|
2008 | 19 | A | Matthew Sweeney | 984.44 |
2009 | 20 | A+ | Edward Salcedo | 988.04 |
2010 | 21 | AA | Miles Head | 935.40 |
2011 | 22 | AAA | Matt Dominguez | 942.50 |
2012 | 23 | AAA | Brett Wallace | 985.52 |
That’s not an inspiring bunch of MiLB’ers, is it?
Sweeney, who batted .260/.324/.458 and topped the average offensive production by 18% as a 19-year-old in low Class A, is out of baseball completely. The comparison of Salcedo, a career .235/.302/.383 hitter in Atlanta’s system, shows the type of step backwards the Royals' third baseman took in high Class A. Even in Moustakas' best minor league stretch -- he hit .347/.413/.687 in Class AA -- his best comp is Head, who’s bottomed out the last two years in the Texas League. And, of course, Wallace has been a complete bust. The lone savior is Dominguez, though much of his value comes from his defense.
But let's take it one step further. Let's look at their respective big league production: Moustakas owns a career 83 OPS+; Wallace is sporting a robust 93, and Dominguez stands at an 87. So, despite not being a "true projection system", CAL's hitter classification appears to have worked -- it removed the noise and determined the true level of Moustakas' talent based off his contemporaries.
Was Charlie Blackmon's big league production predictable?
In terms of prospect status, Blackmon's sort of the anti-Mike Moustakas. He wasn't named on any top prospect lists, played against age-appropriate or younger competition, and, of course, had very little hype.
So, what does CAL think of the first-time All-Star?
Year | Age | Level | Top Comparison | CAL |
---|---|---|---|---|
2010 | 23 | AA | Todd Frazier | 988.64 |
2011 | 24 | AAA | Nate Schierholtz | 986.00 |
2012 | 25 | AAA | Alejandro De Aza | 985.78 |
2013 | 26 | AAA | Craig Gentry | 992.02 |
That's solid company to keep, and certainly far better than that of Moustakas. But, again, let's look at each player's big league production: Blackmon's career OPS+ is 100; Frazier's is 111; Schierholtz's is 93; De Aza's is 97, and Gentry's is 91. Players of the same ilk.
Is CAL a predictive tool for pitchers?
Yes, at this point in time the evidence suggests that it can be a useful analytical tool. Admittedly, it doesn't root through the same amount of statistical noise that it does for hitters. The problem: as with the overwhelming majority of projection systems, pitching is the most challenging aspect to forecast. Velocities change, repertoires can improve vastly with the development of one single pitch, and injuries occur more frequently. Basically, there's far more of a "human element" with pitchers.
Was Kyle Drabek destined to be a bust?
Along with backstop Travis d’Arnaud, Drabek was viewed as the centerpiece to the Roy Halladay deal in 2009. CAL didn’t look too fondly on the right-hander – and former top prospect – though:
Year | Age | Level | Top Comparison | CAL |
---|---|---|---|---|
2007 | 19 | A | Shane Watson | 983.50 |
2009 | 21 | A+ | Joba Chamberlain | 977.02 |
2009 | 21 | AA | Collin Balester | 992.66 |
2010 | 22 | AA | Kyle Lobstein | 976.40 |
2011 | 23 | AAA | Chance Douglass | 987.00 |
Outside of Chamberlain (and let's be honest, he never lived up to his supposed ceiling), it's a collection of uninteresting arms. Watson hasn't thrown a meaningful pitch this season thanks to injury. Balester was a fringe major leaguer. Lobstein's a 24-year-old in Triple-A with a 4.11 ERA this year. And Douglass is out of baseball.
Gio Gonzalez: success story
The All-Star southpaw bounced around quite a bit in the earlier stages of his career, going from Chicago to Philadelphia back to Chicago to Oakland and then, finally, to Washington. At each stop, however, he always took with him some questionable control, particularly in the upper levels of the minors. Here's what CAL thinks:
Year | Age | Level | Top Comparison | CAL |
---|---|---|---|---|
2006 | 20 | AA | Jaime Garcia | 944.06 |
2007 | 21 | AA | Adam Miller | 977.52 |
2008 | 22 | AAA | Homer Bailey | 970.44 |
Those seem pretty spot on. Prior to injuries hampering him in recent times, Garcia was a reliable upper-half-of-the-rotation starter who owns a career 3.42 xFIP. Miller’s career was interrupted by a lingering finger issue (remember: the human element), but he was once viewed as one of the better pitching prospects in baseball. And Bailey owns a career 3.90 xFIP. Gonzalez, by the way, is sporting a career 3.73 xFIP.
What's Next?
Now that I've discussed some of the potential of CAL -- and hopefully convinced everyone of it -- my next post will dig deeper into prospect busts and surprises, as well as some of the bigger misses (and why CAL missed on them).
. . .
All statistics courtesy of FanGraphs and Baseball-Reference.
For more analysis check out Joe Werner's site: ProspectDigest.com. You can follow him on Twitter at @JoltinJoey