After posting the original articles on JAVIER, I knew I would have to update these values after the season, which would provide an opportunity to build some improvements into the system.
As a refresher, JAVIER is a minor league hitting evaluation system that uses basic statistics and finds comparable players in history. It uses each player’s minor league walk, strikeout, and isolated power numbers and compares them to the league average, turning them into z-scores. These z-scores are then compared across minor league data spanning back to 1978. Read the previous description for a full understanding.
Before my prospect system got all fancy and had an official name, it only included walk and strikeout rate. The previous update added the name JAVIER, but more importantly isolated power as a means of differentiating between sluggers and slap hitters. This time I added speed, an age adjustment, and regression to the league average. The productive, average, and bust categories still exist, but I now rank players by the average VORP of their comparable players instead.
I changed quite a few things about the system, so I will walk through each modification one-by-one.
Perhaps the biggest internal change to the system is the addition of speed score. Speed, like power, is a distinct talent that creates a measurable difference between two players with an otherwise similar approach. This speed, or lack thereof, affects a player’s probability of success on offense. I wanted to add in a speed component while keeping the statistics basic and settled on using a form of Bill James’ speed score. In 2005, Patriot posted his version of Speed Unit on Walk Like a Sabermetrician, which fits perfectly with the JAVIER z-score system.
This statistic measures a player’s impact from speed in four ways, each from basic statistics: stolen base percentage, stolen base attempts, triples, and runs. The assumption is that a player with good game-usable speed will steal bases and be safe more often, hit more triples, and score more runs than league average. Each underlying calculation is turned into a z-score using the three-year average for that year, league, and level.
Here are the equations I used, taken from Patriot and Bill James. The statistics are all basic and easily found online.
Stolen Base Percentage = (SB+3)/(SB+CS+7)
Stolen Base Attempts = (SB+CS)/(1B+BB+HBP)
Triples = 3B/(AB-HR-K)
Runs = (R-HR)/(1B+2B+3B+BB+HBP)
After z-scores are created for each statistic, the final speed score is created.
Speed = 50+4.25*(zSBP+zSBA+zT+zR)
A speed score of 50 is league average, and each component is weighted equally.
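The speed score calculation above can be sketched in a few lines of Python. This is only an illustration of the published equations: the function names, the dictionary keys, and the idea of passing in league means and standard deviations are my assumptions, not the actual JAVIER spreadsheet.

```python
def speed_components(sb, cs, singles, doubles, triples, hr, bb, hbp, ab, k, r):
    """Four raw speed components, following the equations above."""
    return {
        "sbp": (sb + 3) / (sb + cs + 7),                    # stolen base percentage
        "sba": (sb + cs) / (singles + bb + hbp),            # stolen base attempts
        "triples": triples / (ab - hr - k),                 # triples per ball in play
        "runs": (r - hr) / (singles + doubles + triples + bb + hbp),
    }


def speed_score(components, league_means, league_sds):
    """50 + 4.25 * (sum of the four component z-scores).

    league_means and league_sds are assumed inputs: the three-year league
    averages and standard deviations for each component.
    """
    z_total = sum(
        (components[name] - league_means[name]) / league_sds[name]
        for name in components
    )
    return 50 + 4.25 * z_total
```

A player whose four components all match the league means comes out at exactly 50, and each full standard deviation above average in any one component adds 4.25 points.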
The components of speed score were not regressed to the league average. They were also not affected by the age adjustment; it did not seem appropriate to do so, since speed actually peaks early in a player’s career.
Another new feature in the JAVIER calculations is regression to the mean, which helps to rein in outliers and assist with small samples. I used Russell Carleton’s work, with assistance from John Choiniere.
Carleton found that ISO stabilizes at 160 AB, the point at which its correlation reaches 0.7, which means approximately 70 ABs of league average should be added to the player’s total. This is found by the following equation:
ABs to add = Opportunities/Correlation – Opportunities
In this example, the number of ABs of league average to add is equal to 160/0.7-160 = 69.
I added a 25% cushion and regressed each player’s ISO to 85 ABs of league average.
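As a quick check on the arithmetic, here is the same calculation in Python. The function name is hypothetical; the 160 AB, 0.7 correlation, and 25% cushion come from the text above.

```python
def league_average_padding(opportunities, correlation):
    """ABs (or PAs) of league-average performance to add to a player's line."""
    return opportunities / correlation - opportunities


iso_padding = league_average_padding(160, 0.7)   # about 68.6, i.e. roughly 69 AB
cushioned = iso_padding * 1.25                   # about 85.7 after the 25% cushion

print(round(iso_padding), round(cushioned, 1))
```

The cushioned value lands close to the 85 ABs used for ISO.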
These are the league regression equations that I used.
Regressed ISO = (ISO*AB+LgISO*85)/(AB+85)
Regressed Walk Rate = (BB+LgBBPA*65)/(PA+65)
Regressed Strikeout Rate = (K+LgKPA*35)/(PA+35)
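All three equations have the same shape, so they can be sketched with one helper. The helper name and the sample numbers are made up for illustration; only the padding amounts (85, 65, 35) come from the equations above.

```python
def regress(successes, opportunities, league_rate, padding):
    """Blend a player's rate with `padding` opportunities of league average."""
    return (successes + league_rate * padding) / (opportunities + padding)


# Hypothetical player: .200 ISO in 85 AB, in a league with a .120 ISO.
# ISO * AB is simply extra bases, so it plays the role of "successes".
regressed_iso = regress(0.200 * 85, 85, 0.120, padding=85)

# Hypothetical player: 30 BB and 60 K in 300 PA, league rates of 9% and 20%.
regressed_bb = regress(30, 300, 0.09, padding=65)
regressed_k = regress(60, 300, 0.20, padding=35)
```

With exactly 85 AB, the hypothetical player's ISO is pulled halfway back to the league, from .200 to .160. The more playing time a hitter has, the less the padding matters.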
After regressing to the mean, I age-adjusted the numbers. I calculated the average league age, weighted by number of plate appearances. In the minor leagues, the weighted average is about half a year younger than the straight average, meaning younger players typically get more plate appearances than older players. In the majors, the weighted average is about half a year older than the straight average.
I compared the player’s seasonal age to the league average that year and divided their walk rate and ISO by this number. I multiplied strikeouts by the age adjustment, since a younger player would be expected to strike out more. This means an age adjustment for a younger player should decrease the amount (multiply by a number smaller than one) instead of increase it.
In the major leagues, any player older than the league average was given an age adjustment of one since there is no reason to dock a more chronologically-gifted player at the highest level.
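Here is a rough sketch of how the age adjustment could work. It assumes "this number" is the ratio of the player's seasonal age to the PA-weighted league age, which is my reading of the description rather than a confirmed formula; all names are hypothetical.

```python
def weighted_league_age(players):
    """PA-weighted average age; `players` is a list of (age, pa) pairs."""
    total_pa = sum(pa for _, pa in players)
    return sum(age * pa for age, pa in players) / total_pa


def age_factor(player_age, league_age, is_major_league=False):
    """Assumed adjustment: player's age divided by league-average age."""
    factor = player_age / league_age
    if is_major_league and factor > 1:
        return 1.0  # older major leaguers are not docked
    return factor


def age_adjust(bb_rate, k_rate, iso, factor):
    """Divide walk rate and ISO by the factor; multiply strikeout rate by it."""
    return {"bb_rate": bb_rate / factor, "k_rate": k_rate * factor, "iso": iso / factor}


# A 20-year-old in a league whose weighted average age is 22.
factor = age_factor(20, 22)
adjusted = age_adjust(bb_rate=0.10, k_rate=0.22, iso=0.150, factor=factor)
```

Under that assumption the young hitter's walk rate rises from 10% to 11%, his ISO from .150 to .165, and his strikeout rate falls from 22% to 20%.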
Instead of using one-year league averages in the z-scores, I used three-year averages for walks, strikeouts, isolated power, and speed. This helped smooth out the abrupt changes from year-to-year in the same league due to randomness.
While looking through results from the previous version of JAVIER, I noticed that the number of comparable players was incredibly variable. Some players had over a thousand comparisons, while others had zero. Any given player is most likely to fail, so the more players a hitter is compared to, the lower his JAVIER score becomes. I wanted to improve this while keeping it simple (read: not using similarity score matrices). Instead of using a flat range of 0.5 for the z-score comparisons, I used a fluid range based on the actual z-score value and how many similar players there were with that value. My cutoff was 3,750 similar players in that range for each individual z-score. This number is somewhat arbitrary, but it allowed for a consistent number of comparable players, with most having between 100 and 400.
For instance, 4,477 qualified players had a career minor league zBB of 0 ± 0.04, so any player with a zBB of 0 would use a range of 0.04. However, a zBB of -1.50 does not hit the 3,750 cutoff until a range of 1.2, so players with this zBB would use a range of 1.2. This kept the number of similar players under control. I also manually changed the ranges for a few players who still had very few comparisons.
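One simple way to find such a fluid range is to widen the window in small steps until enough players fall inside it. The 3,750 cutoff is from the text; the 0.01 step size, the maximum range, and the function name are assumptions made for this sketch.

```python
def fluid_range(target_z, population_z, cutoff=3750, step=0.01, max_range=5.0):
    """Smallest window (in multiples of `step`) around target_z that
    contains at least `cutoff` players from population_z."""
    n_steps = int(round(max_range / step))
    for n in range(1, n_steps + 1):
        width = round(n * step, 10)
        count = sum(1 for z in population_z if abs(z - target_z) <= width)
        if count >= cutoff:
            return width
    return max_range  # not enough players anywhere; fall back to the maximum
```

Players near the middle of the distribution reach the cutoff with a narrow window, while outliers need a much wider one, which is the behavior described above.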
Each z-score was calculated for every team season for every player. If said player was traded mid-season, he had two seasons in the database as opposed to one. Every change in team created a new season. I got rid of JAVIER calculations based on each one of these seasons because the sample sizes were too small and it left out a lot of otherwise relevant career data. However, I still wanted the league and age adjustments to be included in the career number calculations.
In order to solve this problem, I weighted z-scores by the particular denominator in the underlying calculations. For walk and strikeout rate, I multiplied the z-score by PA; for ISO, by AB; and for the speed score equations, by the various denominators. Then I added these numbers on the career level and divided by the appropriate career denominator. This kept the values in the proper ratios and also accounted for age and level at the same time, since that was built into the original z-score.
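In code, the career number is just a weighted average of the team-season z-scores. The function name and the sample seasons below are invented for illustration.

```python
def career_z(seasons):
    """Weighted career z-score.

    `seasons` is a list of (z_score, denominator) pairs, where the
    denominator is PA for walk and strikeout rate, AB for ISO, and so on.
    """
    total = sum(denominator for _, denominator in seasons)
    return sum(z * denominator for z, denominator in seasons) / total


# Hypothetical hitter traded mid-season: zBB of 1.0 in 300 PA, then -0.5 in 100 PA.
career_zbb = career_z([(1.0, 300), (-0.5, 100)])
```

The longer stint dominates, so the hypothetical hitter ends up with a career zBB of 0.625 rather than the straight average of 0.25.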
The productive, average, and bust categories still exist in the database, but are no longer used as an output for the system. I also changed how they are calculated. The player must have reached at least their age-28 season in 2014 for me to place a label on their major league career. This is based on my research on when prospects break out:
Productive – At least 1,000 PA, more than 175 VORP, and more than 0.04 VORP/PA
Average – At least 1,000 PA, more than 0 VORP, or greater than 2,000 PA and more than -0.02 VORP/PA
Bust – Fewer than 1,000 PA, less than 0 VORP, or more than 2,000 PA and less than -0.02 VORP/PA
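The wording of these definitions leaves the grouping of the conditions open to interpretation, so treat the following as one possible reading rather than the actual rule. It assumes every productive condition must hold, that "average" means either (at least 1,000 PA and positive VORP) or (more than 2,000 PA and better than -0.02 VORP/PA), and that everyone else is a bust. The function name is hypothetical.

```python
def career_label(pa, vorp):
    """One interpretation of the productive / average / bust boundaries."""
    rate = vorp / pa if pa > 0 else 0.0
    if pa >= 1000 and vorp > 175 and rate > 0.04:
        return "productive"
    if (pa >= 1000 and vorp > 0) or (pa > 2000 and rate > -0.02):
        return "average"
    return "bust"
```

Under this reading, a long career with slightly negative VORP still counts as average, since a team kept giving that player plate appearances.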
Here is the difference between the former and newer way of calculating these boundaries. The x-axis represents major league career plate appearances and the y-axis represents major league career VORP.
So how do we put this all together? Instead of finding the likelihood of success and failure based on my subjective definitions, I found the average VORP for similar players. This approach is somewhat analogous to Baseball Prospectus’s UPSIDE. There are three major relevant differences between the two systems. First, BP ignores players with no major-league experience, while JAVIER gives those players a 0 VORP and includes them in the comparison. Second, BP only uses the first six years of a player’s career, while JAVIER uses the entire history. Finally, BP uses exactly 100 similar players, each weighted based on how similar they are. JAVIER uses a varying number of similar players, all weighted equally.
Only non-pitchers in their 28-year-old season or later in 2014 with at least 500 minor league PAs were eligible to be included as historical comps.
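Putting the pieces together, the final number could be computed along these lines. This is a minimal sketch: the dictionary layout, the key names, and the use of `None` for players who never reached the majors are assumptions, and the eligibility filters are assumed to have been applied to `history` already.

```python
def javier_vorp(prospect, history, ranges):
    """Average MLB VORP of historical players similar to `prospect`.

    prospect: dict of stat name -> value (e.g. zBB, zK, zISO, speed)
    history:  list of dicts with the same stats plus "mlb_vorp"
              (None for players with no major league experience)
    ranges:   dict of stat name -> allowed distance for that stat
    """
    comps = [
        player for player in history
        if all(abs(player[stat] - prospect[stat]) <= ranges[stat] for stat in ranges)
    ]
    if not comps:
        return None  # no comparable players found
    # Every comp counts equally; players who never reached the majors count as 0.
    return sum(player["mlb_vorp"] or 0.0 for player in comps) / len(comps)
```

Because the non-major-leaguers are kept in the average with a 0, a prospect with many washed-out comps gets dragged down even if a few of his comps became stars.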
How does this system perform? If I run JAVIER on players from the past who are at least 28 by now, this is what I get, using a 10 VORP moving average.
As JAVIER’s calculated average VORP increases, the actual career VORP of historical players increases as well, and it does so roughly exponentially. Imagine that: players who perform well in the major leagues tend to have performed well in the minor leagues. Anything over a JAVIER VORP of 30 is noteworthy, but those over 50 have even better careers. Players with an average JAVIER VORP of 60+ are the truly stand-out elite minor league hitters. Historically, only 35 hitters had a JAVIER VORP of 60+.
| Name | JAVIER VORP | MLB VORP | MilB PA | zBB | zK | zISO | Speed |
This is a list of many of the top MLB hitters in recent memory, with the most notable exceptions of Jeff Salazar and Jeremy Reed.
If we take the same chart as above and split by position, we get this (notice the y-axis range is much larger):
Catchers have much lower values in JAVIER than other positions, so a backstop with a 30+ JAVIER value is much more impressive than a player at a different spot on the diamond. However, many of the top offensive catchers in the game (Mike Piazza, Ivan Rodriguez, and Joe Mauer) had JAVIER values lower than 10.
After boring you with all of those details, how about some results? Here is what JAVIER thinks of select younger players, mostly current and former prospects. I typically like to allow for at least 500 minor league plate appearances before putting any kind of weight into the ranking system. 2014 draftees do not have enough experience as professional hitters yet, but their rankings are interesting nonetheless.
There is a download link in the bottom right-hand corner of the embedded file if you would like to play with the numbers on your computer.
Here are the same numbers, but for every player in the database.
. . .
All statistics courtesy of Baseball Prospectus.
Chris St. John is a writer at Beyond The Box Score. You can follow him on Twitter at @stealofhome.