This article is the result of an idea I've had for quite a while but never seriously explored until now. I've read a lot about OPS: how it works okay despite combining stats with different denominators, and how multiplying OBP by 1.8 before adding it to SLG is better than simply adding the two. Then you have stats like RE24, which capture not only the outcome of the plate appearance but also the situation in which it occurred and the resulting positions of the runners. Below, I detail a rate stat that combines features of OPS and RE24, making it slightly situational without relying on run-expectancy tables. My hope is that you, dear reader, find it intuitive, easy to understand, and informative. Enjoy!
Advancement Percentage
The idea behind Advancement Percentage, or AP (you'll see why I avoided A% later), is similar to that of RE24. The batter's destination does not a complete picture make, and some other information is necessary to get an idea of the value each outcome provides. For RE24, you have the run expectancy table, with the mean value for each of the base-out states. AP works on a simpler principle: disregard the number of outs and focus only on the arrangement of the runners. Every plate appearance has a minimum of four of what I call Advancement Opportunities, or AO (for consistent notation): the batter can advance himself four bases by hitting a home run. The batter has chances to advance any runners on base as well: he can move a runner on first a maximum of three bases, a runner on second two, and a runner on third one. Summing, we get a maximum AO of 10 with the bases loaded (4 + 3 + 2 + 1). The bases the batter and runners actually gain on the play are Advancement Completions, or AC, so each plate appearance yields at most ten. AP is simply AC divided by AO, much as batting average is simply H/AB.
Note that one can accrue negative AC by creating a worse situation on the bases, such as hitting into a double play or hitting into an unforced fielder's choice.
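To make the bookkeeping concrete, here is a minimal Python sketch of AO and AC for a single plate appearance. The function names are my own for illustration, and the treatment of outs is an assumption: a runner who is put out is charged the bases he had already gained (his starting base), which is one way to produce the negative totals described above. The article's actual accounting may differ.

```python
# Bases each existing runner could still advance: first -> 3, second -> 2, third -> 1.
RUNNER_OPPORTUNITIES = {1: 3, 2: 2, 3: 1}
BATTER_OPPORTUNITIES = 4  # the batter can always advance four bases with a home run


def advancement_opportunities(occupied_bases):
    """Return AO given the occupied bases (any subset of 1, 2, 3)."""
    return BATTER_OPPORTUNITIES + sum(RUNNER_OPPORTUNITIES[b] for b in occupied_bases)


def advancement_completions(batter_end, runner_moves):
    """Return AC for one plate appearance.

    batter_end: base the batter ends on (0 = out, 4 = scored).
    runner_moves: maps each runner's starting base to his ending base
        (4 = scored, 0 = put out).
    ASSUMPTION: a runner who is put out costs his starting base in AC.
    """
    total = batter_end
    for start, end in runner_moves.items():
        total += (end - start) if end > 0 else -start
    return total


def advancement_percentage(ac, ao):
    """AP is simply AC divided by AO."""
    return ac / ao


# Single that scores a runner from second: AO = 4 + 2 = 6, AC = 1 + 2 = 3.
ao = advancement_opportunities({2})
ac = advancement_completions(1, {2: 4})
print(ao, ac, advancement_percentage(ac, ao))  # 6 3 0.5
```

Under this assumption, a double play with a runner on first is worth -1 AC, and a fielder's choice that erases a runner from second while the batter reaches first is also -1, whereas a forced out at second nets out to zero.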
Methodology
I used play-by-play files available on Retrosheet to calculate AP, since unfortunately doing so requires play-level and not season-level data. I disregarded all plays that were not batter events, such as stolen bases and wild pitches (since AP is not designed to capture baserunning ability). I also eliminated errors and interference plays, since including them would unfairly assign credit to batters. I looked at the years 1952-2013 (2014 data is not yet available).
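The filtering and totaling described above might look something like the following sketch. The record layout is hypothetical: the keys `batter`, `batter_event`, `error`, `interference`, `ao`, and `ac` are placeholders I made up for illustration, not Retrosheet's actual field names, and parsing the event files is left out entirely.

```python
from collections import defaultdict


def ap_by_batter(plays):
    """Sum AO and AC per batter over eligible plays and return AP for each.

    Each play is a dict with hypothetical keys:
      batter, batter_event, error, interference, ao, ac
    """
    totals = defaultdict(lambda: [0, 0])  # batter -> [AO total, AC total]
    for play in plays:
        if not play["batter_event"]:
            continue  # skip stolen bases, wild pitches, and other non-batter events
        if play["error"] or play["interference"]:
            continue  # skip plays that would unfairly credit the batter
        totals[play["batter"]][0] += play["ao"]
        totals[play["batter"]][1] += play["ac"]
    return {batter: ac / ao for batter, (ao, ac) in totals.items() if ao > 0}


sample = [
    {"batter": "A", "batter_event": True, "error": False, "interference": False, "ao": 4, "ac": 1},
    {"batter": "A", "batter_event": False, "error": False, "interference": False, "ao": 7, "ac": 1},
    {"batter": "A", "batter_event": True, "error": True, "interference": False, "ao": 6, "ac": 3},
    {"batter": "A", "batter_event": True, "error": False, "interference": False, "ao": 6, "ac": 0},
]
print(ap_by_batter(sample))  # {'A': 0.1}
```

Only the first and last plays count for batter A here, giving 1 AC in 10 AO.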
Results
AP typically runs lower than batting average: across my entire data set, the overall AP is 0.139. Here are the top ten qualifying seasons:
Batter | Year | AO | AC | AP |
---|---|---|---|---|
Barry Bonds | 2004 | 3410 | 825 | 0.242 |
Barry Bonds | 2001 | 3600 | 868 | 0.241 |
Barry Bonds | 2002 | 3311 | 782 | 0.236 |
Todd Helton | 2000 | 3911 | 888 | 0.227 |
Ted Williams | 1957 | 2820 | 637 | 0.226 |
Larry Walker | 1999 | 2841 | 640 | 0.225 |
Jeff Bagwell | 1994 | 2580 | 581 | 0.225 |
George Brett | 1980 | 2823 | 635 | 0.225 |
Larry Walker | 1997 | 3673 | 810 | 0.221 |
Barry Bonds | 2003 | 3041 | 665 | 0.219 |
Surprise, surprise: Barry Bonds and his 232 walks lead the list, and he also holds the top three single-season marks, all head and shoulders above the rest. There's a nice mix of hitters who hit for power (like Bagwell) and those who hit for average (like Brett), promising news for a stat that I intended to balance both skills, much as OPS does. Now for the career leaderboard:
Batter | AO | AC | AP |
---|---|---|---|
Ted Williams | 18645 | 3891 | 0.209 |
Barry Bonds | 67900 | 13225 | 0.195 |
Albert Pujols | 46887 | 8906 | 0.190 |
Manny Ramirez | 54723 | 10268 | 0.187 |
Mickey Mantle | 51222 | 9586 | 0.187 |
Duke Snider | 33192 | 6141 | 0.185 |
Larry Walker | 43521 | 8049 | 0.185 |
Frank Thomas | 55420 | 10244 | 0.185 |
Joey Votto | 20208 | 3733 | 0.185 |
Mark McGwire | 41885 | 7714 | 0.184 |
The Splendid Splinter beats out Bonds by a clear margin for the top spot. That's a bit misleading, though, since my data set only goes back to 1952 (for accuracy and convenience reasons), just past the midpoint (in seasons) of Williams's career. Whether one can trust numbers built on a partial sample like his is another good question, one which I will address statistically.
Reliability
Taking a slice out of Russell Carleton's pizza box, I'll look at a measure known as split-half reliability. The basic idea is that we want to find out how many PAs a player needs before we have a real handle on his true talent level for a particular stat. My naïve way of calculating split-half, which Dr. Carleton pointed out can be improved upon but is still useful, is to number the plate appearances consecutively and compare the odd ones with the even ones. When the correlation between the two samples reaches 0.7 (meaning roughly half of the variance is explained, since 0.7² ≈ 0.49), we conclude the statistic has stabilized. I took the 10 years from 2004 to 2013 as my population and selected only players with at least 2500 PA over that time frame (287 in all). Here are the results (at 250 PA intervals):
PA | r |
---|---|
250 | 0.220 |
500 | 0.396 |
750 | 0.433 |
1000 | 0.531 |
1250 | 0.592 |
1500 | 0.591 |
1750 | 0.686 |
2000 | 0.691 |
2250 | 0.740 |
2500 | 0.736 |
A couple of things to note: first, the correlation coefficient does not increase with each increase in PA, dropping (albeit slightly) from 1250 to 1500 and 2250 to 2500 PA. This is probably just random fluctuation, as it's unlikely that adding PA actually decreases reliability. Also, the magic threshold is somewhere between 2000 and 2250 PA, probably closer to 2000. This means it may take several full seasons for a player's AP to settle at true talent level. While larger than any of the numbers Dr. Carleton obtained, it's not completely out of whack with the rest—for example, extra-base hit rate took around 1610 PA. While this makes AP less stable (and therefore more prone to small sample size issues), it's reassuring that we're clearly not looking at something that's essentially random.
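For readers who want to try the odd/even split themselves, here is a rough Python sketch of the calculation. It assumes each player's plate appearances are available in chronological order as `(ao, ac)` pairs, and that the PA count in the table refers to the total sample before splitting (which is consistent with the 2500 PA minimum); both are my assumptions, as are the function names. The Pearson correlation is written out by hand to keep the example dependency-free.

```python
import math


def half_ap(pas):
    """AP over a list of (ao, ac) pairs."""
    return sum(ac for _, ac in pas) / sum(ao for ao, _ in pas)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_r(players, n_pa):
    """Correlate odd-PA AP with even-PA AP across players.

    players: dict mapping a player to his chronological list of (ao, ac) pairs.
    n_pa: ASSUMED to be the total number of PAs sampled per player before splitting.
    """
    odd_aps, even_aps = [], []
    for pas in players.values():
        sample = pas[:n_pa]
        odd_aps.append(half_ap(sample[0::2]))   # 1st, 3rd, 5th, ... plate appearances
        even_aps.append(half_ap(sample[1::2]))  # 2nd, 4th, 6th, ... plate appearances
    return pearson(odd_aps, even_aps)
```

Calling `split_half_r(players, 250)`, `split_half_r(players, 500)`, and so on up to 2500 would produce one correlation per row of a table like the one above.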
That being said, we can get a reasonable idea of who's the best all time in terms of AP. But since I already showed that leaderboard, let's take a look at...
The Laggers
Here you have it, the ten worst single-season AP marks among qualified hitters:
Batter | Year | AO | AC | AP |
---|---|---|---|---|
Hal Lanier | 1968 | 2794 | 226 | 0.081 |
Bob Lillis | 1963 | 2550 | 214 | 0.084 |
Mark Belanger | 1968 | 2819 | 245 | 0.087 |
Hector Torres | 1968 | 2412 | 215 | 0.089 |
Horace Clarke | 1968 | 2946 | 265 | 0.090 |
Bob Boone | 1984 | 2627 | 245 | 0.093 |
Dick Schofield | 1965 | 2662 | 250 | 0.094 |
Larry Bowa | 1973 | 2495 | 237 | 0.095 |
Johnnie LeMaster | 1982 | 2562 | 245 | 0.096 |
Don Blasingame | 1971 | 3151 | 302 | 0.096 |
You never saw a greater collection of light-hitting middle infielders, with one catcher (Bob Boone) thrown in for good measure. None of these names are particularly surprising, although it's interesting to note that the 1980 champion Phillies had two laggers on their team, Boone and shortstop Larry Bowa. The career worsts are not worth showing since they're all pitchers who never qualified in a single season, and all of them fall between 0.051 and 0.059 overall.
Conclusion
There's certainly room for improvement (tightening the intervals used to find the reliability, incorporating data from earlier years, removing pitchers entirely), but overall the results are very promising. It shouldn't be difficult to get a feel for what constitutes a good or bad AP: 0.200 is extremely good, 0.150 is around average, and 0.100 is pretty terrible. This 50-point deviation from the mean at the extremes is in keeping with that for other stats when you consider it as a proportion of league average; that is, if 0.330 is an average OBP, 0.220 is atrocious and 0.440 is world-class.
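Spelled out, the comparison is between deviations expressed as a share of the average, using only the round figures quoted above:

```latex
\frac{0.200 - 0.150}{0.150} = \frac{0.050}{0.150} = \frac{1}{3},
\qquad
\frac{0.440 - 0.330}{0.330} = \frac{0.110}{0.330} = \frac{1}{3}
```

In both cases the extremes sit about a third above (or, by symmetry, below) the average.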
If you have any suggestions or opinions to share, please do so in the comments. I'd also like to note that I frequently consulted the work of Russell Carleton as well as this paper by Gary Hardegree on a near-identical, independently developed statistic he called base-advance average. This article would have suffered without their work and that of the collective baseball research community; indeed, the mutual benefits of such research are part of what makes doing it so enjoyable.
. . .
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
Steven Silverman is a featured writer at Beyond the Box Score and a student at Carnegie Mellon University. He also writes for Batting Leadoff. You can follow him on Twitter at @Silver_Stats or email him at Steven@SilverStats.com.