Introducing Advancement Percentage, a new metric for hitters

Situational stats like RE24 are all the rage these days, but they're often unintuitive and difficult to calculate. In this article, I introduce a new metric that solves both of these problems.

This article is the result of an idea I've had for quite a while but never seriously explored until now. I've read a lot about OPS: how it works okay despite adding two stats with different denominators, and how multiplying OBP by 1.8 before adding it to SLG is better than a straight sum. Then you have stats like RE24, which capture not only the outcome of the plate appearance but also the situation in which it occurred and the resulting positions of the runners. Below, I detail a rate stat that combines features of OPS and RE24, making it slightly situational without relying on run-expectancy tables. My hope is that you, dear reader, find it intuitive, easy to understand, and informative. Enjoy!

Advancement Percentage

The idea behind Advancement Percentage, or AP (you'll see why I avoided A% later), is similar to that of RE24. The batter's destination does not a complete picture make, and some other information is necessary to get an idea of the value each outcome provides. For RE24, you have the run expectancy table, with the mean value for each of the base-out states. AP works on a simpler principle: disregard the number of outs and focus only on the arrangement of the runners. Every plate appearance has a minimum of four of what I call Advancement Opportunities, or AO (for consistent notation), because the batter can advance himself four bases by hitting a home run. The batter has chances to advance any runners on base as well: he can move a runner on first a maximum of three bases, a runner on second two, and a runner on third one. Summing, we get a maximum AO of 4 + 3 + 2 + 1 = 10 with the bases loaded. The bases the batter and runners actually gain on the play are Advancement Completions, or AC, so each plate appearance yields at most ten. AP is simply AC divided by AO, much as batting average is simply H/AB.

Note that one can accrue negative AC by creating a worse situation on the bases, such as hitting into a double play or hitting into an unforced fielder's choice.
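For readers who like to see the bookkeeping spelled out, here is a minimal sketch of how AO and AC could be tallied for a single plate appearance. The function names and the way a play is encoded are my own illustrative choices, not part of any existing tool. In particular, treating a player who is put out as ending up on base 0 is an assumption of this sketch; it happens to make a double play and an unforced fielder's choice come out negative, as described above.

```python
HOME = 4  # home plate counted as "base 4"; the batter starts at base 0


def advancement_opportunities(runners):
    """AO for one plate appearance.

    `runners` is the set of occupied bases before the play, e.g. {1, 3}.
    The batter always contributes four opportunities.
    """
    return HOME + sum(HOME - base for base in runners)


def advancement_completions(moves):
    """AC for one plate appearance.

    `moves` maps each starting base (0 = batter) to the base reached,
    with 4 meaning the player scored.
    ASSUMPTION of this sketch: a player who is put out is recorded as
    ending on base 0, so a runner erased from first contributes -1.
    """
    return sum(end - start for start, end in moves.items())


# Grand slam: every opportunity is converted.
print(advancement_opportunities({1, 2, 3}))                  # 10
print(advancement_completions({0: 4, 1: 4, 2: 4, 3: 4}))     # 10

# Single with a runner going first to third: 3 of 7 opportunities.
print(advancement_opportunities({1}))                        # 7
print(advancement_completions({0: 1, 1: 3}))                 # 3

# Double play with a runner on first: batter out, runner out.
print(advancement_completions({0: 0, 1: 0}))                 # -1
```

Under the same assumption, a routine force play at second (batter safe at first, runner from first retired) nets out to zero: the batter gains one base and the runner loses one.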

Methodology

I used play-by-play files available on Retrosheet to calculate AP, since unfortunately doing so requires play-level rather than season-level data. I disregarded all plays that were not batter events, such as stolen bases and wild pitches, since AP is not designed to capture baserunning ability. I also eliminated errors and interference plays, since including them would unfairly assign credit to batters. I looked at the years 1952-2013 (2014 data is not yet available).
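The filtering and totaling described above might look something like the sketch below. It assumes each play has already been parsed into a dictionary; the keys (`batter`, `batter_event`, `error_or_interference`, `ao`, `ac`) are hypothetical names I made up for illustration and are not Retrosheet field names.

```python
from collections import defaultdict


def advancement_percentage(plays):
    """Return {batter: AP} from an iterable of parsed play records.

    Each record is a dict with hypothetical keys:
      batter (str), batter_event (bool), error_or_interference (bool),
      ao (int), ac (int).
    """
    totals = defaultdict(lambda: [0, 0])  # batter -> [AC, AO]
    for play in plays:
        # Skip stolen bases, wild pitches and other non-batter events.
        if not play["batter_event"]:
            continue
        # Skip errors and interference so batters get no credit for them.
        if play["error_or_interference"]:
            continue
        totals[play["batter"]][0] += play["ac"]
        totals[play["batter"]][1] += play["ao"]
    # AO is at least 4 on every counted play, so the divisor is never zero.
    return {batter: ac / ao for batter, (ac, ao) in totals.items()}


sample = [
    {"batter": "A", "batter_event": True, "error_or_interference": False, "ao": 4, "ac": 1},
    {"batter": "A", "batter_event": True, "error_or_interference": False, "ao": 7, "ac": 0},
    {"batter": "A", "batter_event": False, "error_or_interference": False, "ao": 7, "ac": 1},
    {"batter": "A", "batter_event": True, "error_or_interference": True, "ao": 4, "ac": 2},
]
print(advancement_percentage(sample))  # only the first two plays count: 1 / 11
```

Note that AP is the ratio of the season (or career) totals, not the average of per-play ratios, which matches the AO and AC columns in the tables that follow.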

Results

AP typically runs lower than batting average: across my whole data set, the overall AP is 0.139. Here are the top ten qualifying seasons:

Batter Year AO AC AP
Barry Bonds 2004 3410 825 0.242
Barry Bonds 2001 3600 868 0.241
Barry Bonds 2002 3311 782 0.236
Todd Helton 2000 3911 888 0.227
Ted Williams 1957 2820 637 0.226
Larry Walker 1999 2841 640 0.225
Jeff Bagwell 1994 2580 581 0.225
George Brett 1980 2823 635 0.225
Larry Walker 1997 3673 810 0.221
Barry Bonds 2003 3041 665 0.219

Surprise, surprise: Barry Bonds's 2004 season, with its 232 walks, leads the list, and he holds the top three single-season marks, all clearly ahead of the rest. There's a nice mix of hitters who hit for power (like Bagwell) and those who hit for average (like Brett), promising news for a stat that I intended to balance both skills, sort of like OPS. Now for the career leaderboard:

Batter AO AC AP
Ted Williams 18645 3891 0.209
Barry Bonds 67900 13225 0.195
Albert Pujols 46887 8906 0.190
Manny Ramirez 54723 10268 0.188
Mickey Mantle 51222 9586 0.187
Duke Snider 33192 6141 0.185
Larry Walker 43521 8049 0.185
Frank Thomas 55420 10244 0.185
Joey Votto 20208 3733 0.185
Mark McGwire 41885 7714 0.184

The Splendid Splinter beats out Bonds by a clear margin for the top spot. That's a bit misleading, though, since my data set only goes back to 1952 (for accuracy and convenience reasons), just past the midpoint (in seasons) of Williams's career. Whether a sample of that size can be trusted is a good question in its own right, and one I will address statistically.

Reliability

Taking a slice out of Russell Carleton's pizza box, I'll look at a measure known as split-half reliability. The basic idea is that we want to find out how many PAs a player needs before we have a real handle on his true talent level for a particular stat. My naïve way of calculating split-half, which Dr. Carleton pointed out can be improved upon but is still useful, is to number the plate appearances consecutively and compare the odd ones with the even ones. When the correlation between the two samples reaches 0.7 (meaning roughly half of the variance is explained, since 0.7 squared is 0.49), we conclude the statistic has stabilized. I took the 10 years from 2004-2013 as my population, and selected only players with at least 2500 PA over that time frame (287 in all). Here are the results (at 250 PA intervals):

PA r
250 0.220
500 0.396
750 0.433
1000 0.531
1250 0.592
1500 0.591
1750 0.686
2000 0.691
2250 0.740
2500 0.736
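For anyone who wants to reproduce a table like this one, here is a rough sketch of the odd/even split-half calculation. It is not my exact script. It assumes each player's plate appearances are available in chronological order as `(ac, ao)` pairs, and that "n PA" means the first n plate appearances are taken and then split into odd and even halves; both the data layout and that reading of the interval are assumptions of the sketch.

```python
from math import sqrt


def half_ap(pas):
    """AP over a list of (ac, ao) pairs: total AC divided by total AO."""
    total_ac = sum(ac for ac, _ in pas)
    total_ao = sum(ao for _, ao in pas)
    return total_ac / total_ao


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_r(players, n_pa):
    """Correlate odd-PA AP with even-PA AP across players.

    `players` maps a name to that player's chronological (ac, ao) pairs.
    ASSUMPTION: the first `n_pa` plate appearances are used for everyone.
    """
    odd, even = [], []
    for pas in players.values():
        sample = pas[:n_pa]
        odd.append(half_ap(sample[0::2]))   # 1st, 3rd, 5th, ... PA
        even.append(half_ap(sample[1::2]))  # 2nd, 4th, 6th, ... PA
    return pearson(odd, even)
```

Calling `split_half_r(players, 250)`, then 500, and so on up to 2500 would produce one r value per row.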

A couple of things to note. First, the correlation coefficient does not rise with every increase in PA: it drops (albeit slightly) from 1250 to 1500 PA and from 2250 to 2500 PA. This is probably just random fluctuation, as it's unlikely that adding PA actually decreases reliability. Second, the magic threshold of 0.7 falls somewhere between 2000 and 2250 PA, probably closer to 2000. This means it may take several full seasons for a player's AP to settle at his true talent level. That's larger than any of the numbers Dr. Carleton obtained, but it's not completely out of whack with the rest; extra-base hit rate, for example, took around 1610 PA. So AP is less stable than those stats (and therefore more prone to small sample size issues), but it's reassuring that we're clearly not looking at something that's essentially random.
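To put a rough number on "closer to 2000," one can interpolate linearly between the two rows that straddle 0.7. This is only a back-of-the-envelope estimate: it assumes r grows in a straight line between 2000 and 2250 PA, which the noise in the table suggests is not exactly true. The helper name below is just for illustration.

```python
def crossing_point(x0, y0, x1, y1, target):
    """Linearly interpolate the x at which y reaches `target`."""
    return x0 + (x1 - x0) * (target - y0) / (y1 - y0)


# Rows from the table: r = 0.691 at 2000 PA and r = 0.740 at 2250 PA.
pa_at_070 = crossing_point(2000, 0.691, 2250, 0.740, 0.70)
print(round(pa_at_070))  # 2046
```

Given the dips elsewhere in the table, that estimate should be read as "a bit over 2000 PA" and nothing more precise.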

That being said, we can get a reasonable idea of who's the best all time in terms of AP. But since I already showed that leaderboard, let's take a look at...

The Laggers

Here you have it, the ten worst single-season AP marks among qualified hitters:

Batter Year AO AC AP
Hal Lanier 1968 2794 226 0.081
Bob Lillis 1963 2550 214 0.084
Mark Belanger 1968 2819 245 0.087
Hector Torres 1968 2412 215 0.089
Horace Clarke 1968 2946 265 0.090
Bob Boone 1984 2627 245 0.093
Dick Schofield 1965 2662 250 0.094
Larry Bowa 1973 2495 237 0.095
Johnnie LeMaster 1982 2562 245 0.096
Don Blasingame 1971 3151 302 0.096

You never saw a greater collection of light-hitting middle infielders, with one catcher (Bob Boone) thrown in for good measure. None of these names are particularly surprising, although it's interesting to note that the 1980 champion Phillies had two laggers on their team, Boone and shortstop Larry Bowa. The career worsts are not worth showing since they're all pitchers who never qualified in a single season, and all of them are between 0.051 and 0.059 overall.

Conclusion

There's certainly room for improvement (tightening the intervals used to find the reliability, incorporating data from earlier years, removing pitchers entirely), but overall the results are very promising. It shouldn't be difficult to get a feel for what constitutes a good or bad AP: 0.200 is extremely good, 0.150 is around average, and 0.100 is pretty terrible. This 50-point deviation from the mean at the extremes is in keeping with that for other stats when you consider it as a proportion of league average; that is, if 0.330 is an average OBP, 0.220 is atrocious and 0.440 is world-class.
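The proportional comparison is easy to check by dividing each benchmark by its own average. The benchmark values are the ones quoted above; the helper name is just for illustration.

```python
def relative_to_average(value, average):
    """Express a rate stat as a multiple of its average."""
    return value / average


benchmarks = [
    ("AP, extremely good", 0.200, 0.150),
    ("AP, pretty terrible", 0.100, 0.150),
    ("OBP, world-class", 0.440, 0.330),
    ("OBP, atrocious", 0.220, 0.330),
]
for label, value, average in benchmarks:
    print(label, round(relative_to_average(value, average), 2))
# AP, extremely good 1.33
# AP, pretty terrible 0.67
# OBP, world-class 1.33
# OBP, atrocious 0.67
```

In both cases the extremes sit about a third above or below average, which is why a 50-point swing in AP is comparable to a 110-point swing in OBP.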

If you have any suggestions or opinions to share, please do so in the comments. I'd also like to note that I frequently consulted the work of Russell Carleton as well as this paper by Gary Hardegree on a near-identical, independently developed statistic he called base-advance average. This article would have suffered without their work and that of the collective baseball research community; indeed, the mutual benefits of such research are part of what makes doing it so enjoyable.

. . .

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Steven Silverman is a featured writer at Beyond the Box Score and a student at Carnegie Mellon University. He also writes for Batting Leadoff. You can follow him on Twitter at @Silver_Stats or email him at Steven@SilverStats.com.