Over the past few years, I have attempted to find which prospects have a higher likelihood of success based on their minor league numbers. I have broken these players into level and age, focusing solely on their walks and strikeouts per plate appearance. But a trend became apparent through the analysis: slugging prospects were severely underrated, including the system's namesake Javier Baez. There are two big reasons I didn't include power in the first place. First, I knew home runs per plate appearance would be far too noisy to be useful. Second, it's really hard to even out all the varying run environments across six levels of minor leagues over 36 years of play.
So the idea of improving this analysis sat for a while until a tweet from Christopher Long piqued my interest:
@MaxWeinstein21 Not publicly, no. GB/FB is a crude measure of vertical spray tendency, ISO is a crude measure of ball speed.— Christopher D. Long (@octonion) December 26, 2013
That was a game-changer for me, never having spent any time thinking about ISO. In addition to this tweet by Long, Max Weinstein notes in his article on batted ball run values, "[ISO] is a pretty great proxy for the value of a player’s fly ball." So instead of using the terribly noisy statistic of home runs to determine a hitter's power, we can just use ISO (maybe I should have been able to come up with that on my own).
The method of JAVIER builds off the previous work I have done and is similar to Carson Cistulli's SCOUT. However, I have not built regression into the model yet and am bringing the calculation one step further in comparing the players to those who went before them. This time around, I am no longer relying on the arbitrary definitions of Low and High, but instead will be using a more continuous and interactive approach.
I will also be using a sample of all players in the minor leagues, as opposed to only prospects. This will allow for the system to be used on any minor league player, regardless of age, level, or prospect standing. The only necessary inputs to determine the productive, average, and bust percentages for any player are their PA, AB, 2B, 3B, HR, BB, and K--statistics readily available on the internet. I have provided an Excel Web App that will calculate the appropriate z-scores.
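As a rough sketch of how those seven inputs turn into the three rates the system is built on, the Python below computes BB%, K%, and ISO. It assumes walks and strikeouts are taken per plate appearance (as described at the top of this post) and uses the standard definition of ISO, extra bases per at-bat. The function name and the stat line are made up for illustration and are not taken from the Excel Web App.

```python
def rate_stats(pa, ab, doubles, triples, hr, bb, k):
    """Return (BB%, K%, ISO) as fractions from basic counting stats."""
    if pa <= 0 or ab <= 0:
        raise ValueError("PA and AB must be positive")
    bb_rate = bb / pa  # walks per plate appearance
    k_rate = k / pa    # strikeouts per plate appearance
    # ISO = SLG - AVG, which reduces to extra bases per at-bat.
    iso = (doubles + 2 * triples + 3 * hr) / ab
    return bb_rate, k_rate, iso


# Hypothetical stat line, not a real player.
bb_rate, k_rate, iso = rate_stats(
    pa=500, ab=450, doubles=25, triples=5, hr=20, bb=40, k=100
)
print(f"BB% {bb_rate:.3f}  K% {k_rate:.3f}  ISO {iso:.3f}")
```

Singles drop out of ISO entirely, which is why hits are not among the required inputs.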
And z-scores are what this is built on. I find the zBB%, zK%, and zISO for each season of each minor league hitter since 1978, then find comparable seasons for each of these. The averages in the z calculations are based on the league as well as the level. This helps with the run environment problems mentioned earlier. The best way to do this would be to find park factors for each player season instead of league averages, but that would require play-by-play data I don't have.
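A minimal sketch of the z-score step, assuming the league average and standard deviation are taken over the individual hitters in one league and season. Whether the original weights hitters by plate appearances, or uses a population or sample standard deviation, is not stated here, so `pstdev` is an assumption, and the walk rates are invented.

```python
from statistics import mean, pstdev


def z_score(value, league_values):
    """Standard score of `value` against one league-season's distribution."""
    mu = mean(league_values)
    sigma = pstdev(league_values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("league has no spread; z-score is undefined")
    return (value - mu) / sigma


# Hypothetical walk rates for the hitters in one league and season.
league_bb = [0.05, 0.07, 0.09, 0.11, 0.13]
z_bb = z_score(0.11, league_bb)
print(round(z_bb, 2))
```

The same function would apply unchanged to K% and ISO, each against its own league distribution.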
This system uses VORP as the ultimate determination of a hitter's success. Part of this is due to the fact that my data comes from Baseball Prospectus and it made the transition easier. Another part is that it more completely covers a hitter's success. According to BP's glossary, VORP "considers offensive production, position, and plate appearances."
VORP is a good measure of overall offensive production, but it also includes a positional adjustment. I believe this better represents reality by building a small bit of defensive capability into the system. Light-hitting shortstops can still be productive if they stick at short for ten years, even if they don't bring down the house with their offense. The positional adjustment allows for some of these defensive-focused players to make their way into the productive/average categories.
The following are the definitions for Productive, Average, and Busted hitters. There are a few necessities for a player to qualify for any of the categories. They must be a non-pitcher, have played a majority of their career after 1978, and either have made their MLB debut prior to 2010 or have been in the minor leagues in 2008 and been 25 in 2013. That final criterion finds those older players who never made the major leagues.
Productive: At least 1,000 PA in the majors and at least .0275 VORP per PA. The range of these 508 hitters goes from Matt Murton (29.3 VORP in 1,058 PA) to Barry Bonds (1,592.7 VORP in 12,606 PA).
Average: At least 1,000 PA in the majors and between -.025 and .0275 VORP per PA. The range of these 649 hitters goes from Rafael Belliard (-63.3 VORP in 2,524 PA) to Omar Vizquel (296 VORP in 12,013 PA).
Bust: Fewer than 1,000 PA in the majors or less than -.025 VORP per PA. There are 12,627 busts, 11,038 of which never made the major leagues (yeah). The hitter with the most VORP who is labeled a "Bust" is Troy Neel (42.9 VORP in 861 PA), a former first baseman for the Athletics.
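The three definitions above can be sketched as a small Python function. This is only an illustration: it takes career MLB plate appearances and VORP as inputs, treats both cutoffs as inclusive (an assumption), and skips the eligibility rules about pitchers and eras.

```python
def classify(mlb_pa, vorp):
    """Label a career Productive, Average, or Bust using the cutoffs above."""
    if mlb_pa < 1000:
        return "Bust"  # never reached 1,000 PA in the majors
    vorp_per_pa = vorp / mlb_pa
    if vorp_per_pa >= 0.0275:
        return "Productive"
    if vorp_per_pa >= -0.025:  # assumption: lower cutoff is inclusive
        return "Average"
    return "Bust"


# Careers quoted in the definitions above.
print(classify(1058, 29.3))   # Matt Murton
print(classify(12013, 296))   # Omar Vizquel
print(classify(861, 42.9))    # Troy Neel
```

Rafael Belliard's rate rounds to -.025, so he sits right on the lower boundary. How the original calculation handles that edge is not stated, and a strict reading of the cutoff in this sketch would not reproduce his Average label.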
How do zBB, zK, zISO, Productive, Average, and Bust interact? Good question! This chart shows just that. On the x-axis, I used zBB-zK in order to get both on the graph. Each line represents a range of zISO values for a minor league career. The y-axis is the percentage of players for those minor league zBB-zK and zISO values that became productive major league hitters.
ISO is definitely the most important part of this system, as the Productive% for each zBB-zK value is highest at the higher zISO category.
JAVIER in action
To better describe what is happening behind the scenes, how about an example, using Javier Baez? I will actually do two examples--one for his 2014 year only and one for his minor league career to this point. In the following posts in this series, I will give you the tools to do these exact same calculations. All you will need are those basic stats mentioned earlier: PA, AB, 2B, 3B, HR, BB, and K.
JAVIER on Javier - Single Season
So far, Baez's 2014 stats are as follows:
Not very pretty. But how does that compare to the 2014 PCL?
So Baez's zBB becomes (0.083-0.087)/0.04 = -0.10, with the walk rates written as fractions to match the standard deviation. (Don't worry about all of this, it will be done automatically for you.) Here they all are:
Those are the magic numbers. Now I can create a range of values around each of the z-scores with which to compare other players, usually plus or minus one. So I will find any player who had a season in AAA where he was between 20 and 22 and had similar z-scores, then look at his major league career and see how well he performed there.
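The comparison step might look something like the Python sketch below. It assumes a flat list of historical seasons that already carry their z-scores, a window of plus or minus one on each z-score, and an age window of plus or minus one year (which gives 20 to 22 for a 21-year-old). The `Season` class, the player names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    level: str
    age: int
    z_bb: float
    z_k: float
    z_iso: float


def similar_seasons(target, pool, z_window=1.0, age_window=1):
    """Return seasons at the same level within the age and z-score windows."""
    return [
        s for s in pool
        if s.player != target.player
        and s.level == target.level
        and abs(s.age - target.age) <= age_window
        and abs(s.z_bb - target.z_bb) <= z_window
        and abs(s.z_k - target.z_k) <= z_window
        and abs(s.z_iso - target.z_iso) <= z_window
    ]


# Hypothetical target and pool, not real player data.
target = Season("Target", "AAA", 21, -0.1, 1.8, 1.2)
pool = [
    Season("Comp A", "AAA", 22, 0.4, 1.1, 0.9),    # inside every window
    Season("Comp B", "AAA", 24, -0.2, 1.7, 1.3),   # too old
    Season("Comp C", "AA", 21, -0.1, 1.8, 1.2),    # wrong level
    Season("Comp D", "AAA", 20, -0.3, 0.2, 1.0),   # zK too far away
]
matches = similar_seasons(target, pool)
print([s.player for s in matches])
```

Widening `z_window` trades closer comparisons for a larger sample, which matters given the 20-comparison minimum used later in this post.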
Here are the results for those similar seasons to Baez's 2014:
So 7 of the 44 (15.9%) players with similar seasons to Baez's 2014 went on to become productive major league hitters. Since the average productive% in AAA is only 7%, Baez's seemingly poor season is actually looking fairly good (even though he will have to bring that strikeout rate down). The best players with similar seasons are Jorge Posada and Devon White.
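The arithmetic behind that comparison is simple enough to check in a few lines of Python. The counts and the 7% AAA baseline come from the paragraph above; the variable names are just for illustration.

```python
n_productive = 7       # similar seasons whose owner became Productive
n_comps = 44           # all similar seasons found
aaa_baseline = 0.07    # average productive share in AAA, from the text

productive_rate = n_productive / n_comps
lift = productive_rate / aaa_baseline  # how many times the baseline rate

print(f"Productive%: {productive_rate:.1%}")
print(f"Relative to AAA average: {lift:.2f}x")
```

By this measure the comparison group turned out productive hitters at a bit more than twice the AAA baseline rate.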
JAVIER on Javier - Career
The previous example only covered Baez's 2014 season, and we were able to find 44 similar player seasons with very stringent filters. But what if we want to see how his entire minor league career is looking? That's where this calculation comes into play.
Baez's minor league stats:
The walk and strikeout rates go down a little and the ISO comes up a whole bunch. The minor league averages and standard deviations during his time there are these:
Again, his zBB is (0.065-0.089)/0.03 = -0.83 (the inputs shown are rounded, which is presumably why the arithmetic as written works out closer to -0.80). All of the z-scores:
Now I will create a range of values just like before, but compare these to all other players' minor league totals instead. This does not account for age or level, just overall minor league production.
Since the average productive% is 3.7% for the career numbers, this isn't quite as promising as his 2014, but it still shows him as being about twice as likely as the average player to become a productive major league hitter. The top hitters with similar careers to Baez are David Ortiz and Juan Gonzalez. We can see now why his hitting ability, combined with enough defensive aptitude to play shortstop in AAA, makes Baez an extremely exciting prospect.
Top 2014 prospects
The most interesting part of JAVIER will be without a doubt the visualizations. Unfortunately, you will not find those in this post (but they're coming soon!). For now, I would like to apply the system to the current minor league prospects. What kind of years are they having and how is it affecting their likelihood of success in the major leagues? Below I have included a table of all prospects who appeared in my Team Prospect series who have at least 20 similar seasons to their current (as of late June) 2014 numbers. There may be a few pitchers included in this table which you can disregard as this is an offense-based system only. There also may be some players in the table who are listed incorrectly, since I matched these numbers on name and not ID. Also let me know if anyone is missing, which may be due to there not being enough similar seasons for them.
Previous minor league hitters
I ran this analysis on minor league players from 1995 to 2005, using data from 1978 to 1994. These players’ careers are basically settled at this point and it will be a good check to see if this system is actually useful. The first thing I did was to find similar seasons and create productive, average, and bust percentages. Then I used these to guess what a player may turn into. Obviously a bust is most likely, but by comparing the percentages to the average, it is possible to find which category the player falls under that is the most above average. Here is what I found based on a minimum of 20 similar careers:
[Table: JAVIER's guess versus actual outcome, with columns Actual Productive/Average and Actual Bust]
When JAVIER guessed productive or average on this data set, the player became a productive or average hitter 22% of the time. When JAVIER guessed bust, only 3% of hitters became productive or average. Hitters are always most likely to become a bust, but the system definitely helps improve your chances of finding success. Finally, I look at each of the productive, average, and bust categories. How well do my guesses match up with the results?
JAVIER does a good job for each of these, but it does seem to break down at the extremes. Perhaps I could have used more categories for my guesses.
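The guessing step described in this section could be sketched as follows. The post does not spell out how "most above average" is measured, so this version assumes a ratio of the player's percentage to the baseline percentage. The function name and every number in the example are hypothetical.

```python
def javier_guess(player_pct, baseline_pct, n_comps, min_comps=20):
    """Guess the outcome whose share is highest relative to the baseline.

    Returns None when there are too few comparisons to make a guess.
    """
    if n_comps < min_comps:
        return None
    # Assumption: "most above average" means the largest ratio to baseline.
    ratios = {
        label: player_pct[label] / baseline_pct[label]
        for label in baseline_pct
    }
    return max(ratios, key=ratios.get)


# Hypothetical percentages, not taken from the data set.
baseline = {"Productive": 0.07, "Average": 0.08, "Bust": 0.85}
player = {"Productive": 0.16, "Average": 0.14, "Bust": 0.70}

print(javier_guess(player, baseline, n_comps=44))
print(javier_guess(player, baseline, n_comps=12))
```

Using the difference in percentage points instead of the ratio would favor the larger categories, so the choice of measure can change the guess for borderline players.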
Here is the data set I used in this analysis with JAVIER’s guesses for minor league players from 1995 to 2005.
Due to the length and depth of this article, I decided to not run any of the visualizations here. In the coming days, I will post two more JAVIER-related articles. The first will allow you to do the Single Season calculation I ran earlier and the second will allow for the Career calculation. Both will have Tableau interactive graphics which give you filtering capabilities on age and level (only for Single Season), BB, K, and ISO, along with a graph that allows you to quickly find the best and worst similar players. So stick around, the best is yet to come!
. . .
Statistics courtesy of Baseball Prospectus.
Chris St. John is a writer at Beyond The Box Score. You can follow him on Twitter at @stealofhome.