Two weeks ago I rolled out a shiny new analytic – a player classification system I dubbed CAL, or Comparison And Likeness. The quick overview is simple: CAL is based on Bill James’ Similarity Scores, but uses a litany of differently weighted statistics to compare prospects against each other.
Well, after the rollout, some of the initial feedback asked if I could tweak the system to show player comps across multiple seasons, not just one year at a time, in hopes of painting a more complete picture. And as the guys who run the Steamer Projections can attest, systems like this are always evolving. (Steamer is in its sixth iteration.)
So, I hunkered down for the past two weeks, spending close to sixty hours and much of my remaining patience, and retooled CAL to determine a player’s top comps based on three years of data, not just one.
First, a few things to note:
• The closer a CAL score is to 1,000, the better.
• For the examples below, CAL was run only on stints in which a player had at least 200 plate appearances at a single level, unless otherwise noted.
• I was originally using James’ positional values – 240 for catcher, 12 for first base, 132 for second base, etc. – but after a thorough research-and-development phase I determined that CAL is more effective across multiple seasons using different values.
• Finally – and most importantly – remember that it’s crucial to examine the group as a whole, not just cherry-pick a single name. For example, Joey Votto comes up as the top comp for Jackie Bradley, but the other four players suggest something entirely different. My interpretation is simple: this is CAL's way of displaying a prospect's volatility, perhaps highlighting a projected ceiling and floor. The more consistent the top five are, the more likely the player is to reach that level of production. Likewise, the more scattered the top five, the less likely the player is to reach his projected ceiling.
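For readers who want a feel for the mechanics, here is a minimal sketch of a James-style similarity score: start at 1,000 and subtract a weighted penalty for every statistical difference between two players. The statistics, the weights, and the names `Season`, `similarity`, and `top_comps` are all hypothetical stand-ins; CAL's actual inputs and weights aren't listed here. Only the 1,000-point scale and the 200-plate-appearance cutoff come from the notes above.

```python
from dataclasses import dataclass

# Hypothetical weights: points subtracted per 1.000 of difference in each stat.
# These are illustrative only, not CAL's real weights.
WEIGHTS = {"avg": 1000.0, "obp": 1000.0, "slg": 1000.0}

MIN_PA = 200  # minimum plate appearances at a single level (from the notes above)


@dataclass
class Season:
    name: str
    pa: int
    stats: dict  # e.g. {"avg": 0.300, "obp": 0.380, "slg": 0.500}


def similarity(a, b, weights=WEIGHTS):
    """Start at 1000 and subtract a weighted penalty for each stat difference."""
    penalty = sum(w * abs(a.stats[k] - b.stats[k]) for k, w in weights.items())
    return 1000.0 - penalty


def top_comps(target, pool, n=5, min_pa=MIN_PA):
    """Return the n most similar players who meet the plate-appearance cutoff."""
    eligible = [p for p in pool if p.pa >= min_pa and p.name != target.name]
    scored = sorted(((similarity(target, p), p.name) for p in eligible), reverse=True)
    return [(name, round(score, 2)) for score, name in scored[:n]]


if __name__ == "__main__":
    # Made-up players, purely to show the call pattern.
    prospect = Season("Prospect A", 450, {"avg": 0.300, "obp": 0.380, "slg": 0.500})
    pool = [
        Season("Player B", 500, {"avg": 0.295, "obp": 0.370, "slg": 0.510}),
        Season("Player C", 150, {"avg": 0.300, "obp": 0.380, "slg": 0.500}),  # under 200 PA
        Season("Player D", 400, {"avg": 0.250, "obp": 0.300, "slg": 0.400}),
    ]
    for name, score in top_comps(prospect, pool):
        print(name, score)
```

Two identical stat lines score exactly 1,000 in this sketch, and every point of separation drags the score down, which is why scores closer to 1,000 signal a tighter comp.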
Now, the examples: Below I’ve provided 41 instances of CAL – both proving its accuracy (and usefulness) and explaining why CAL failed in certain comparisons. (It’s a bit lengthy, so feel free to bounce around.)
Moustakas batted .311/.362/.588 during his stints in Class AA and Class AAA – numbers that came against competition that averaged at least three years his senior. But despite the robust production, CAL lumped him in with a group that included Brett Wallace; fellow draft bust Josh Vitters, who was coincidentally chosen directly after Moustakas; a disappointing international free agent; and a player who looks absolutely lost this season. Career production in terms of wRC+: 91 (Gyorko), 93 (Wallace), 95 (Viciedo), and 84 (Moustakas).
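One rough way to read a comp group as a whole is to boil it down to a midpoint and a spread. The sketch below does that for the three comp wRC+ figures quoted above (91, 93, 95). The function name `summarize_group` and the choice of median and range as the summary are my own assumptions for illustration, not something CAL computes.

```python
from statistics import median


def summarize_group(wrc_plus_values):
    """Summarize a comp group: the median as a midpoint, max minus min as the spread."""
    values = list(wrc_plus_values)
    if not values:
        raise ValueError("need at least one comp")
    return {"median": median(values), "spread": max(values) - min(values)}


if __name__ == "__main__":
    # Career wRC+ for Gyorko, Wallace, and Viciedo, as quoted above.
    print(summarize_group([91, 93, 95]))
```

A tight spread around a below-average midpoint, as with this trio, is the "consistent top five" situation: the comps agree with each other, and they agree on something underwhelming.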
Wood’s another big-time prospect bust. To best show CAL’s effectiveness, I ran the algorithms based on his production between the ages of 21 and 23 – a time in which he was ranked among the top 20 prospects each season by some of the most well-respected analysts and media outlets. Wood hit .281/.356/.546 and slugged 79 home runs during those 332 games.
Well, only one big league regular, Alvarez, topped the league average offensive production (barely) in his career. And Rodriguez, the only other player to carve out a lengthy big league career, has posted a 90 wRC+ in a super-utility role.
Let’s play a game. Match the players’ offensive WAR per 600 plate appearances during their respective big league careers. Player A: 1.66 WAR/600PA. Player B: 2.34 WAR/600PA. Player C: 1.73 WAR/600PA.
The answers (in order from A through C): Andrelton Simmons, Josh Harrison, and Eduardo Nunez. Remember, CAL doesn’t look at a player’s defensive contributions, only his offensive abilities. And while Simmons is an elite, elite defender (a potential Hall of Famer solely based on that ability), he’s been 12% below the league average offensive production in his big league career, something CAL had pointed out despite his hitting .299/.352/.397 in the minors.
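The two yardsticks used throughout these examples are simple to compute. WAR per 600 plate appearances just rescales a career total to one full season's worth of playing time, and wRC+ is indexed so that 100 is league average, with each point above or below equal to one percent. The helper names below (`war_per_600`, `pct_vs_league`) are hypothetical, and the sample WAR total is made up to show the arithmetic.

```python
def war_per_600(war, pa):
    """Scale a WAR total to a 600-plate-appearance (roughly full-season) rate."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return war * 600 / pa


def pct_vs_league(wrc_plus):
    """Convert wRC+ to percent above (+) or below (-) league average; 100 is average."""
    return wrc_plus - 100


if __name__ == "__main__":
    # Hypothetical career: 5.0 offensive WAR over 1,500 plate appearances.
    print(war_per_600(5.0, 1500))
    # "12% below the league average" corresponds to an 88 wRC+.
    print(pct_vs_league(88))
```

Normalizing to 600 plate appearances is what lets a part-timer and an everyday player sit side by side in the same comparison.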
Ackley, another #2 overall pick, was viewed as the premium collegiate bat in 2009’s draft class and a potential cornerstone to Seattle’s rebuild at that time. CAL grouped him with two players that couldn’t hack it in the big leagues, Giavotella and Weeks, another top prospect bust in Antonelli, and a career minor leaguer (Cardenas).
Viciedo, who’s failed to top the league average production in each of his three full big league seasons, hit .283/.331/.450 during his trek through the minor leagues. Pretty much the same comps as Moustakas, which shouldn’t be surprising. All fringe big leaguers with the exception of Schoop, where the jury’s still out.
Gomes was pretty much a nondescript player when Cleveland acquired him as part of the Esmil Rogers trade with Toronto in late November 2012. Immediately following the trade, Tribe beat writer Paul Hoynes wrote that, "[GM Chris] Antonetti said Gomes will go to spring training with a chance to make the team either as a second or third catcher."
Since then, however, he’s been the fifth best offensive catcher in baseball – a shock to some, though not entirely so to CAL.
Castillo topped the league average offensive production by 5% from 2012 to 2013 en route to tallying 3 WAR last season; d’Arnaud has long been considered a top offensive catching prospect and is finally starting to figure things out at the big league level (he’s hit .264/.304/.465 since June 24); and Arencibia has slugged 71 career home runs.
The point: Based on his comps, CAL thought there was a strong chance that some big league value could be extracted from Gomes and it was right. It just didn’t think Gomes would be this good, but who did?
No shock here, really. Trout’s top CALs have been some of the best prospects, and subsequently young stars, in baseball.
An impressive trio of names to be linked with: Jones, Davis, and Myers. And Arcia has been a league average performer through his age-23 season with Minnesota.
Another top prospect letdown, Escobar’s CALs shouldn’t be all that surprising given his lack of offensive punch at the big league level: a group of light hitting infielders. Escobar, by the way, hit .293/.333/.377 in his minor league career, but owns a career .261/.298/.346 line at the big league level and much of his value comes from the defensive side of the ball.
There were only two stops for Heyward that CAL could effectively run – his 2008 in low Class A and 2009 in high Class A (remember: I only use sample sizes of at least 200 PA). But despite the lack of data, CAL still linked him with three regulars in Trout, Freeman, and Rasmus. The Moustakas comp is pretty scary though, perhaps highlighting his floor.
This was a major miss by CAL, comparing Kemp with two busts and three good, not great big leaguers. However, there were just 426 minor league plate appearances CAL could use in analyzing Kemp, which should be considered a small sample size considering that three years of data was the original goal.
This one shouldn’t really be surprising either. CAL compares the game’s top prospect with three impact bats (Springer, Sano, and Rizzo). The Alvarez tie-in would seem to indicate a potential issue with Bryant’s contact ability, as his strikeout percentage is hovering around 27% in Class AAA. Again, an incredibly small sample size for CAL.
Since Hamilton’s hot spell cooled, his production has been 7% below the league average. His top CALs are all fourth outfielder types. The difference is his plus-plus-plus speed, which CAL had trouble sniffing out. Hamilton could hover around the 90 wRC+-mark and still be a three- or four-win player because of his defense. His subpar year in Class AAA did him in with CAL, though.
The Stanton-Sano-Bruce-Rizzo quartet is pretty reasonable. Decker’s grouped in because of his numbers in the lower levels of the minor leagues, which increasingly worsened as he moved up the ladder. And, well, Snider never really panned out.
One additional thing to note: Stanton’s comps were thrown off a bit because his production cratered in his first stint in Class AA (.231/.311/.455), but I still included it in the CAL calculations instead of his low Class A numbers because I didn’t want to skew the comps to help convince people of its usefulness.
Of Butler’s top five CALs, no one has topped 30 home runs in a season. Prior to this year, Morrison owned a 108 wRC+ and Butler a 120. And the Morrison-Barton-Choi trio are all power-deficient first basemen.
There's not a whole lot that separates Revere, the starting center fielder and faux leadoff hitter, and Revere, the slap-hitting, serviceable fourth outfielder. He doesn’t walk much or hit for any power; it’s empty batting averages and stolen bases. His top two CALs, Inciarte and Cunningham, are the basic fourth/fifth outfielders. And through his first 1830 plate appearances, Revere owns a career 84 wRC+.
Trumbo does one thing pretty well: hit for power. Otherwise, his overall production (109 wRC+) borders on mediocrity, especially for his position, and it doesn’t suggest a lengthy big league career as an everyday-type guy – pretty similar to that of Chad Tracy. Outside of Craig, it’s a handful of bench bat options.
At least in the local media (I live in Cleveland), Santana is constantly hounded for his low batting averages, but even in a down year like 2014 – he’s hitting .231/.369/.433 – his overall production has topped the league average by 31% thanks to his power/patience combo.
Career wRC+s: 112 (Grandal), 128 (Santana) and 137 (Posey), and Wieters has certainly flashed thump in his offensive game.
Gallo’s registered the lowest CAL scores that I’ve seen at this point – and probably will ever see. Basically, there’s no reasonable comp for him; he’s one of a kind. He swings and misses a ton, has gobs of power, and walks. But he’s also shown a drastic improvement in his K-rate, followed by a massive step backward.
This is another one of CAL’s bigger misses: not one MLB-worthy guy in the bunch. The problem: Lowrie had one truly great season (2007) followed by two average-ish stints in the minors. Again, I chose not to run the comparison using Lowrie’s 177 plate appearances in Class AAA in 2007 so as not to skew CAL in his favor.
Again, not very surprising.
The Sands and Evans groupings are the outliers (obviously). Both absolutely mashed throughout their respective minor league careers. But the Bruce, Jones, and Myers comps are reasonable. CAL had some reservations about Rizzo's ability to reach his potential – clearly.
| Alejandro De Aza | 994.35 |
A lot of fringe everyday-type guys and solid, useful bats. The 994.35 CAL score for De Aza indicates a very close match. Career wRC+s: 93 for Blackmon and 97 for De Aza.
There’s really nothing more to add, truthfully.
It’s way too early to determine Castellanos’s ceiling, but outside of one stint in high Class A in 2012 he’s never really dominated at the minor league level. Dominguez and Chisenhall are solid league average regulars, which Schoop has a chance to be too, and they seem like reasonable comps. Castellanos, by the way, is hitting just .258/.309/.404 in his rookie campaign.
I chuckled when Altuve jumped out at the top of Betts’s list, because it’s the one I was expecting. Smaller middle infield bats with pop.
Pretty nice comps for a middle infielder: four former top prospects and one who has a chance to be a decent regular.
OK. Let’s play another game. Guess the effectiveness of Cameron Maybin’s bat. Has he been 11% below the league average, 11% above the league average, or right at the league average?
The answer: he’s been 11% below the league average in nearly 2000 plate appearances. His reputation has basically been coasting off of his 4-win season from 2011, which has been the only time he’s topped the league average (and that was just by 5%).
For his CALs, I used his 2007 through 2009 minor league numbers -- each year he topped the league average production by at least 25%. But CAL suggested one major bust (maybe two), one league average performer, and Tolisano. CAL could have been useful in the Miguel Cabrera deal for Miami, huh?
Taveras has long been among the game’s better prospects, so it’s not surprising to see him linked with Kemp, Polanco, and Piscotty. Flores and Tucker have both handled themselves well in the minor leagues, though each has yet to prove it in the bigs. CAL hasn’t been impressed by Taveras’s Class AAA numbers.
Now six full seasons into his big league career – and an eight-year, $120 million contract in his pocket – Andrus has been a well below-average hitter, posting a career wRC+ of 84. Average-ish walk rate, speed and little pop isn’t exactly a recipe for stardom, so it’s not surprising to see Andrus linked with five light-hitting middle infielders. Andrus’ above-average defense has helped buoy his WAR totals.
Franklin, Alcantara, and Machado are all promising, reasonable comps. Schoop has been talked about previously. And Winfree flamed out. Perhaps Bogaerts isn’t quite a lock for stardom?
Carter absolutely mashed during the 21- to 23-year-old seasons that CAL analyzed, posting an OPS of at least .930 in each one. But CAL was quite hesitant, linking him with just one big league regular, a pair of Quad-A types, and two flameouts. Carter owns a career .222/.312/.459 line with a 113 wRC+. This was sort of a half-win/half-loss for CAL.
This sort of falls into the same category as Carter: Alvarez was grouped with Carter (no surprise) and a bunch of career minor leaguers. Again, sort of a half-win/half-loss for CAL; it recognized that Alvarez was overrated as a prospect, but only linked him to one reasonable big leaguer. Alvarez, by the way, was named among the game’s top 10 prospects.
Brantley has been arguably the biggest surprise in baseball this season, tying Andrew McCutchen, Yasiel Puig, and Jason Heyward with 4.6 fWAR. Brantley’s already nearly doubled his power output and has posted a ridiculous 156 wRC+.
Prior to this year, though, he was basically a fringy league average bat, something that matched his minor league numbers (he’s a career .303/.388/.377 MiLB hitter). CAL linked him with four backup types and one All Star. You can draw your own conclusions from that.
Basically, it’s a similar list of comps as Brantley.
The Jay Bruce and Adam Jones comps bookend two prospect busts and another underrated minor leaguer. CAL’s obviously skeptical of Baez’s swing-and-miss tendency. And Baez’s slow start in AAA this season certainly skewed his overall comps.
The Votto comparison is incredibly promising, but looking at the group as a whole, the overwhelming majority of the evidence suggests that Bradley’s headed for below-average offensive production (role player-dom). The fact that he’s hitting .208/.284/.303 through his first 142 games would seem to back that up.
I’ve long been on the Joc Pederson bandwagon, twice naming him among the game’s top 100 prospects (2013 and 2014). But, man, four of those five CALs are downright scary. The lone hope is Myers, but the evidence suggests Pederson doesn’t quite live up to the hype.
This would be another miss by CAL. Yes, the Heyward comparison is quite promising, but outside of that it’s a mixed bag.
As a Clevelander, I find this downright depressing. But Lindor’s always performed just slightly above the league average in terms of offense, so the fact that he’s linked to light-hitting middle infielders isn’t shocking. If his plus-plus defense is as good as the reports suggest, Lindor could be another guy in the Elvis Andrus mold, who also had several of the same CAL players.
This one bounces around a ton: going from the likes of All Star-type production to league average regular to role guys. Jennings’ offense slides somewhere in between.
For more analysis check out Joe Werner's site: ProspectDigest.com. You can follow him on Twitter at @JoltinJoey