The Percentage Player Index

Bill James developed the Percentage Player Index some time back -- this post tweaks his formula and updates it for current players.

Ben Zobrist
Tim Heitman-USA TODAY Sports

In The New Bill James Historical Baseball Abstract, Bill James laid out a framework for something he called the Percentage Player Index in an essay on Joe Morgan (pp. 479-480). He sought qualities that indicated intelligence on the field and chose four factors -- fielding percentage compared to league average, stolen base percentage, strikeout-to-walk ratio and walk frequency -- and assigned them weights. Every time I re-read that essay I say to myself "Hey, I should run that for modern players" and then find I can't get the math to work (I really don't know why this is the case -- it looks fairly straightforward even for an idiot like me).

I've written on something similar to this in the past, and as I gathered the data I realized I could use newer measures, so I updated some of the numbers, added others and did my best to convert them into runs. It won't be perfect for any number of reasons, but it measures modern-day players using James' construct.

I'll begin by explaining what I changed. First, instead of fielding percentage, I used Defensive Runs Saved (DRS), readily available on both Baseball-Reference and FanGraphs. Fielding percentage leaves out so much that DRS gives a more complete measure of a player's defense. Since DRS is expressed in runs, I did my best to convert the other inputs into runs as well, so instead of stolen base percentage I used the Tom Tango run equivalents for stolen bases and caught stealing. Since B-R also includes other measures of success or failure on the base paths, I also included pickoffs, times thrown out on the base paths, advancing on an out and taking an extra base on a hit. I also included credit for infield hits and reaching on an error, items I feel are indicators of hustle, and converted walks and strikeouts into runs as well. Taken together, these are the various inputs and their run equivalents:

Positive Outcome     Runs    Negative Outcome    Runs
Reach on error        .89    Caught stealing     -.39
Stolen base           .20    Pickoff             -.39
Extra base taken      .20    Out on base         -.39
Infield hit           .89    Strikeout           -.39
Walk                  .69
Extra base on hit     .20

I made educated guesses based on the inputs Tom Tango calculated for the wOBA formula -- for example, I equated reaching on error and infield hits as being the same as a single and used the same stolen base value for taking extra bases. I used the caught stealing value for pickoffs and being thrown out on the base paths. The value I'm least comfortable with is for strikeouts*, but since it's a constant it will only change the overall number and should have no effect on rank.

* - My copy of Tango/Lichtman/Dolphin's The Book was purchased on my first-generation iPad and stopped opening a couple of months ago. For some reason, I tried during Bible Study this morning and it opened, so I was able to see that the value for strikeouts shown in Table 7 (p. 17) is relatively close to what I used, so I have more comfort than I did before. As it was opening, other people at my table asked what version of the Bible The Book was, and I said it was a quite different one they probably wouldn't understand.
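
As a rough check on the run values above, they can be applied to a single stat line in a few lines of Python. This is only a sketch: the function name is made up for illustration, the dictionary keys borrow the column abbreviations from the stat tables in this post (matching BT and XBT to the two extra-base rows is my reading), and it assumes the index is simply DRS plus the weighted sum of the other counts.

```python
# Run values from the table above. Keys follow the stat-table column
# abbreviations; pairing BT/XBT with the two extra-base rows is an assumption.
RUN_VALUES = {
    "ROE": 0.89,   # reach on error, valued like a single
    "SB": 0.20,    # stolen base
    "CS": -0.39,   # caught stealing
    "PO": -0.39,   # pickoff
    "OOB": -0.39,  # out on base
    "BT": 0.20,    # extra base taken (advancing on an out)
    "XBT": 0.20,   # extra base on hit
    "Inf": 0.89,   # infield hit
    "KO": -0.39,   # strikeout
    "BB": 0.69,    # walk
}


def percentage_player_runs(line):
    """Weighted sum of the counting stats plus DRS, which is already in runs."""
    weighted = sum(RUN_VALUES[stat] * line[stat] for stat in RUN_VALUES)
    return weighted + line["DRS"]


# Ben Zobrist's line since 2010, as listed in this post.
zobrist = {"ROE": 27, "SB": 78, "CS": 26, "PO": 4, "OOB": 22, "BT": 102,
           "XBT": 152, "Inf": 98, "DRS": 63, "KO": 513, "BB": 413}

print(round(percentage_player_runs(zobrist)))  # 305
```

The result lines up with the Runs figure listed for Zobrist, which suggests the straight weighted sum is at least close to what the spreadsheet does.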

I went back to 2010, and in that time span, these are the players this formula identifies as the best percentage players:

Player PA ROE SB CS PO OOB BT XBT Inf DRS KO BB Runs
Ben Zobrist 3349 27 78 26 4 22 102 152 98 63 513 413 305
Dustin Pedroia 3038 29 78 26 4 30 83 138 114 66 333 295 303
Ian Kinsler 3254 37 96 33 21 34 114 188 79 57 356 285 263
Elvis Andrus 3433 50 159 60 26 35 136 219 159 6 459 275 258
Jason Heyward 2819 34 63 24 5 18 73 152 77 98 544 315 241
Ichiro Suzuki 3056 37 146 30 12 25 77 122 205 12 347 153 240
Jose Bautista 2938 35 36 13 8 31 76 129 50 -5 470 464 235
Denard Span 2914 31 100 24 20 30 98 137 118 24 314 226 228
Brett Gardner 2439 19 143 37 13 21 74 111 120 70 462 252 225
Joey Votto 2840 21 36 18 11 39 83 133 34 27 526 477 224

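Runs are rounded to the nearest whole run. Column key: PA = plate appearances, ROE = reached on error, SB = stolen bases, CS = caught stealing, PO = pickoffs, OOB = outs on base, BT = extra bases taken (advancing on an out), XBT = extra bases taken on hits, Inf = infield hits, DRS = Defensive Runs Saved, KO = strikeouts, BB = walks.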

Generally speaking, a formula like this would be expected to favor speedy up-the-middle players, and this is what is shown, with some notable exceptions. The top four make perfect sense to me, and after that I am generally surprised. What is shown are players who are smart on the base paths in all facets, don't strike out too much and are good with the glove. I made a slight tweak to the formula after I had written the first draft -- the same 10 players are listed, but with a slight change in order -- let's just say I had to change the picture I used with this post.

What about the other end of the spectrum?

Player PA ROE SB CS PO OOB BT XBT Inf DRS KO BB Runs
Chris Johnson 2453 12 16 3 1 16 42 72 52 -57 595 114 -135
Pedro Alvarez 2293 15 12 3 2 11 33 1 42 -28 678 211 -107
Alfonso Soriano 2535 20 32 13 6 21 46 74 38 -34 616 158 -99
Mike Morse 2117 18 2 5 0 13 37 58 34 -39 495 126 -86
J.P. Arencibia 1614 19 2 3 2 11 23 30 23 -6 462 84 -86
Matt Kemp 2695 21 85 35 11 39 67 142 52 -84 653 241 -82
Ryan Howard 2521 18 2 1 0 18 48 37 42 -43 713 249 -82
Mark Reynolds 2691 16 22 13 6 20 41 76 42 -44 842 329 -81
Juan Francisco 1066 2 2 6 1 11 13 20 23 -13 368 78 -80
Jarrod Saltalamacchia 1769 9 5 3 1 10 43 57 23 -24 545 166 -78

This type of player is slower, more rash on the base paths, strikes out too much and is defensively challenged. This is not to suggest these are either the best or worst players in baseball -- for example, leaving aside how Ryan Howard has succumbed to injuries over the last couple of years, his value isn't in his base running acumen as much as his power.

This Google Docs spreadsheet contains additional information on the 500+ players with at least 500 plate appearances since 2010, and also allows for sorting on the constituent elements of the index -- sorting the columns highlighted in yellow shows who contributed the most (or least) on the base paths, infield hits, DRS or strikeout-to-walk ratio, and gives a fuller explanation of why players ranked where they did. In addition, for most players, position is included so similar positions can be filtered and evaluated.
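
For anyone who would rather do that kind of sorting outside the spreadsheet, here is a small Python sketch of a component breakdown. The grouping of stats into components is my own guess at how the yellow columns are organized (in particular, lumping reaching on error in with infield hits), and the names `COMPONENTS`, `component_runs` and `rank_by` are purely illustrative.

```python
# Run values used in this post, keyed by the stat-table column abbreviations.
RUN_VALUES = {"ROE": 0.89, "SB": 0.20, "CS": -0.39, "PO": -0.39, "OOB": -0.39,
              "BT": 0.20, "XBT": 0.20, "Inf": 0.89, "KO": -0.39, "BB": 0.69}

# Assumed grouping of the inputs into sortable components.
COMPONENTS = {
    "base paths": ["SB", "CS", "PO", "OOB", "BT", "XBT"],
    "hustle hits": ["ROE", "Inf"],
    "strikeouts and walks": ["KO", "BB"],
}

# Three stat lines copied from the top-10 table in this post.
PLAYERS = {
    "Ben Zobrist": {"ROE": 27, "SB": 78, "CS": 26, "PO": 4, "OOB": 22,
                    "BT": 102, "XBT": 152, "Inf": 98, "DRS": 63,
                    "KO": 513, "BB": 413},
    "Elvis Andrus": {"ROE": 50, "SB": 159, "CS": 60, "PO": 26, "OOB": 35,
                     "BT": 136, "XBT": 219, "Inf": 159, "DRS": 6,
                     "KO": 459, "BB": 275},
    "Jose Bautista": {"ROE": 35, "SB": 36, "CS": 13, "PO": 8, "OOB": 31,
                      "BT": 76, "XBT": 129, "Inf": 50, "DRS": -5,
                      "KO": 470, "BB": 464},
}


def component_runs(line):
    """Split one player's run total into its component pieces."""
    parts = {name: sum(RUN_VALUES[stat] * line[stat] for stat in stats)
             for name, stats in COMPONENTS.items()}
    parts["defense"] = line["DRS"]  # DRS is already expressed in runs
    return parts


def rank_by(players, component):
    """Player names ordered from most to fewest runs in one component."""
    return sorted(players,
                  key=lambda name: component_runs(players[name])[component],
                  reverse=True)


for component in ["base paths", "strikeouts and walks", "defense"]:
    print(component, rank_by(PLAYERS, component))
```

Sorting this way makes it easier to see that two players with similar totals can get there very differently, one mostly on the bases and another mostly through walks.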

This method isn't perfect -- for instance, I've turned it into a counting stat instead of the rate stat that Bill James initially created, which will give greater weight to those players with more plate appearances in both positive and negative ways. It also treats all five years equally instead of giving greater weight to more recent years (best exemplified by Ichiro Suzuki), something I'll play around with.
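
One way to see how much the counting-stat choice matters is to rescale the totals to a common number of plate appearances. The Python sketch below does that for a few players from the top-10 table; the 600 PA baseline (roughly a full season) and the function name `runs_per_600` are my own choices for illustration, not part of James' method or of the spreadsheet.

```python
def runs_per_600(runs, plate_appearances):
    """Rescale a run total to a per-600-plate-appearance rate."""
    return runs / plate_appearances * 600


# (Runs, PA) pairs copied from the top-10 table in this post.
totals = {
    "Ben Zobrist": (305, 3349),
    "Dustin Pedroia": (303, 3038),
    "Ichiro Suzuki": (240, 3056),
    "Brett Gardner": (225, 2439),
}

rates = {name: runs_per_600(runs, pa) for name, (runs, pa) in totals.items()}

# List players from highest to lowest rate.
for name in sorted(rates, key=rates.get, reverse=True):
    print(f"{name}: {rates[name]:.1f} runs per 600 PA")
```

On a rate basis Pedroia and Gardner both move ahead of Zobrist among these four, which is the kind of reshuffling playing time can hide in a counting stat.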

I'm certain some aspects of player performance that others would consider important are left out, but when the term "smart player" is used, the implication is that the player is using all the information available to make the most of what is given -- he judges when he might be able to steal or take an extra base on a player with an inferior throwing arm or a bad angle, keeps strikeouts to a minimum and fields his position well despite limits on range or arm strength. Play around with the spreadsheet and some interesting nuggets will emerge.

Runs have been decreasing since around 2007, and as I write this discussions are beginning on shrinking the strike zone. When runs are scarce and increased playoff slots make every game that much more valuable, everything in a player's control that can generate an advantage needs to be used. I gathered the data for all players since 2010, and it would be interesting to see if a Percentage Team Index would correlate well with winning, but that's a discussion for another day (view this Tableau data viz for a brief look). For now, thanks to Bill James for yet another intriguing way to evaluate a player dimension people have been discussing for as long as baseball has been around, Tom Tango for the inputs and Baseball-Reference and FanGraphs for the ease with which the data can be gathered.

* * *

Data from Baseball-Reference and FanGraphs. Any errors in gathering and processing the data are the author's.

Scott Lindholm lives in Davenport, IA. Follow him on Twitter @ScottLindholm.