In The New Bill James Historical Baseball Abstract, Bill James laid out a framework for something he called the Percentage Player Index in an essay on Joe Morgan (pp. 479-480). He sought qualities that indicated intelligence on the field and chose four factors -- fielding percentage compared to league average, stolen base percentage, strikeout-to-walk ratio and walk frequency. He assigned them weights, and every time I re-read that essay I say to myself "Hey, I should run that for modern players," only to find out I can't get the math to work (I really don't know why this is the case -- it looks fairly straightforward even for an idiot like me).
I've written on something similar to this in the past, and as I gathered the data I realized I could use newer measures, so I updated some of the numbers, added others and did my best to convert them into runs. It won't be perfect for any number of reasons, but it measures modern-day players using James' construct.
I'll begin by explaining what I changed -- first, instead of fielding percentage, I used Defensive Runs Saved (DRS), readily available on both Baseball-Reference and FanGraphs. Fielding percentage leaves so much out that DRS gives a far more complete measure of a player's defense. Since DRS is expressed in runs, I did my best to convert the other inputs into runs as well, so instead of stolen base percentage I used the Tom Tango run equivalents for stolen bases and caught stealing. Since B-R also includes other measures of success or failure on the base paths, I also included pickoffs, times thrown out on the base paths, advancing on an out and taking an extra base on a hit. I also included credit for infield hits and reaching on an error, items I feel are indicators of hustle, and converted walks and strikeouts into runs as well. Taken together, these are the various inputs and their run equivalents:
Positive Outcome | Runs | Negative Outcome | Runs |
---|---|---|---|
Reach on error | .89 | Caught Stealing | -.39 |
Stolen base | .20 | Pickoff | -.39 |
Extra base taken | .20 | Out on base | -.39 |
Infield hit | .89 | Strikeout | -.39 |
Walk | .69 | | |
Extra base on hit | .20 | | |
I made educated guesses based on the inputs Tom Tango calculated for the wOBA formula -- for example, I equated reaching on error and infield hits as being the same as a single and used the same stolen base value for taking extra bases. I used the caught stealing value for pickoffs and being thrown out on the base paths. The value I'm least comfortable with is for strikeouts*, but since it's a constant it will only change the overall number and should have no effect on rank.
* - My copy of Tango/Lichtman/Dolphin's The Book was purchased on my first-generation iPad and stopped opening a couple months ago. For some reason, I tried during Bible Study this morning and it opened, so I was able to see that the value for strikeouts shown in Table 7 (p. 17) is relatively close to what I used, so I have more comfort than I did before. As it was opening, other people at my table asked what version of the Bible The Book was, and I said it was quite a different one they probably wouldn't understand.
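As a rough sketch of the arithmetic, the weights above can be applied to a single player's line. This assumes the Runs total is simply each event count times its run value, plus DRS added in as-is; the names `WEIGHTS` and `percentage_runs` are made up for illustration, and pairing BT with "Extra base taken" and XBT with "Extra base on hit" is my reading of the column labels (both carry the same .20 value, so the pairing doesn't change the result). The sample line is Ben Zobrist's from the leaderboard in this post.

```python
# Run values copied from the weights table; the dictionary keys follow the
# column abbreviations used in the player tables (assumed mapping).
WEIGHTS = {
    "ROE": 0.89,   # reach on error, valued like a single
    "SB": 0.20,    # stolen base
    "CS": -0.39,   # caught stealing
    "PO": -0.39,   # pickoff, given the caught-stealing value
    "OOB": -0.39,  # out on base, given the caught-stealing value
    "BT": 0.20,    # base taken, given the stolen-base value
    "XBT": 0.20,   # extra base on a hit, given the stolen-base value
    "Inf": 0.89,   # infield hit, valued like a single
    "KO": -0.39,   # strikeout
    "BB": 0.69,    # walk
}


def percentage_runs(line):
    """Weight each counted event, then add DRS, which is already in runs."""
    weighted = sum(WEIGHTS[stat] * line[stat] for stat in WEIGHTS)
    return weighted + line["DRS"]


zobrist = {
    "ROE": 27, "SB": 78, "CS": 26, "PO": 4, "OOB": 22, "BT": 102,
    "XBT": 152, "Inf": 98, "DRS": 63, "KO": 513, "BB": 413,
}

print(round(percentage_runs(zobrist)))  # 305, matching his Runs column
```

Because every weight is a constant, changing one (the strikeout value, say) shifts each player's total in proportion to that one event count and leaves the rest of the sum alone.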
I went back to 2010, and in that time span, these are the players this formula identifies as the best percentage players:
Player | PA | ROE | SB | CS | PO | OOB | BT | XBT | Inf | DRS | KO | BB | Runs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ben Zobrist | 3349 | 27 | 78 | 26 | 4 | 22 | 102 | 152 | 98 | 63 | 513 | 413 | 305 |
Dustin Pedroia | 3038 | 29 | 78 | 26 | 4 | 30 | 83 | 138 | 114 | 66 | 333 | 295 | 303 |
Ian Kinsler | 3254 | 37 | 96 | 33 | 21 | 34 | 114 | 188 | 79 | 57 | 356 | 285 | 263 |
Elvis Andrus | 3433 | 50 | 159 | 60 | 26 | 35 | 136 | 219 | 159 | 6 | 459 | 275 | 258 |
Jason Heyward | 2819 | 34 | 63 | 24 | 5 | 18 | 73 | 152 | 77 | 98 | 544 | 315 | 241 |
Ichiro Suzuki | 3056 | 37 | 146 | 30 | 12 | 25 | 77 | 122 | 205 | 12 | 347 | 153 | 240 |
Jose Bautista | 2938 | 35 | 36 | 13 | 8 | 31 | 76 | 129 | 50 | -5 | 470 | 464 | 235 |
Denard Span | 2914 | 31 | 100 | 24 | 20 | 30 | 98 | 137 | 118 | 24 | 314 | 226 | 228 |
Brett Gardner | 2439 | 19 | 143 | 37 | 13 | 21 | 74 | 111 | 120 | 70 | 462 | 252 | 225 |
Joey Votto | 2840 | 21 | 36 | 18 | 11 | 39 | 83 | 133 | 34 | 27 | 526 | 477 | 224 |
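Runs are rounded to the nearest whole run. Column key: PA = plate appearances, ROE = reached on error, SB = stolen bases, CS = caught stealing, PO = pickoffs, OOB = outs on base, BT = bases taken, XBT = extra bases taken on hits, Inf = infield hits, DRS = Defensive Runs Saved, KO = strikeouts, BB = walks.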
Generally speaking, a formula like this would be expected to favor speedy up-the-middle players, and this is what is shown, with some notable exceptions. The top four make perfect sense to me, and after that I am generally surprised. What is shown are players who are smart on the base paths in all facets, don't strike out too much and are good with the glove. I made a slight tweak to the formula after I had written the first draft -- the same 10 players are listed, but with a slight change in order -- let's just say I had to change the picture I used with this post.
What about the other end of the spectrum?
Player | PA | ROE | SB | CS | PO | OOB | BT | XBT | Inf | DRS | KO | BB | Runs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Chris Johnson | 2453 | 12 | 16 | 3 | 1 | 16 | 42 | 72 | 52 | -57 | 595 | 114 | -135 |
Pedro Alvarez | 2293 | 15 | 12 | 3 | 2 | 11 | 33 | 1 | 42 | -28 | 678 | 211 | -107 |
Alfonso Soriano | 2535 | 20 | 32 | 13 | 6 | 21 | 46 | 74 | 38 | -34 | 616 | 158 | -99 |
Mike Morse | 2117 | 18 | 2 | 5 | 0 | 13 | 37 | 58 | 34 | -39 | 495 | 126 | -86 |
J.P. Arencibia | 1614 | 19 | 2 | 3 | 2 | 11 | 23 | 30 | 23 | -6 | 462 | 84 | -86 |
Matt Kemp | 2695 | 21 | 85 | 35 | 11 | 39 | 67 | 142 | 52 | -84 | 653 | 241 | -82 |
Ryan Howard | 2521 | 18 | 2 | 1 | 0 | 18 | 48 | 37 | 42 | -43 | 713 | 249 | -82 |
Mark Reynolds | 2691 | 16 | 22 | 13 | 6 | 20 | 41 | 76 | 42 | -44 | 842 | 329 | -81 |
Juan Francisco | 1066 | 2 | 2 | 6 | 1 | 11 | 13 | 20 | 23 | -13 | 368 | 78 | -80 |
Jarrod Saltalamacchia | 1769 | 9 | 5 | 3 | 1 | 10 | 43 | 57 | 23 | -24 | 545 | 166 | -78 |
This type of player is slower, more rash on the base paths, strikes out too much and is defensively challenged. This is not to suggest these are either the best or worst players in baseball -- for example, leaving aside how Ryan Howard has succumbed to injuries over the last couple of years, his value isn't in his base running acumen as much as his power.
This Google Docs spreadsheet contains additional information on the 500+ players with at least 500 plate appearances since 2010, and also allows for sorting on the constituent elements of the components -- sorting the columns highlighted in yellow shows who contributed the most (or least) on the base paths, infield hits, DRS or strikeout-to-walk ratio and gives a fuller explanation of why players ranked where they did. In addition, for most players, position is included so similar positions can be filtered and evaluated.
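For readers who would rather see the component split in code than in a spreadsheet, here is a minimal sketch of one way to break a player's total into pieces. The grouping in `GROUPS` is an assumption on my part (in particular, lumping reach-on-error in with infield hits), and it may not match the spreadsheet's yellow columns exactly; `component_runs` is a hypothetical helper name. The two sample lines are Elvis Andrus and Jason Heyward from the top-10 table.

```python
# Run values from the weights table, keyed by the player-table abbreviations.
WEIGHTS = {
    "ROE": 0.89, "SB": 0.20, "CS": -0.39, "PO": -0.39, "OOB": -0.39,
    "BT": 0.20, "XBT": 0.20, "Inf": 0.89, "KO": -0.39, "BB": 0.69,
}

# Assumed grouping of the inputs into components.
GROUPS = {
    "base_paths": ["SB", "CS", "PO", "OOB", "BT", "XBT"],
    "infield_hits_and_roe": ["Inf", "ROE"],
    "strikeouts_and_walks": ["KO", "BB"],
}


def component_runs(line):
    """Return run totals per component; DRS is already expressed in runs."""
    parts = {
        name: sum(WEIGHTS[stat] * line[stat] for stat in stats)
        for name, stats in GROUPS.items()
    }
    parts["defense"] = float(line["DRS"])
    return parts


andrus = {
    "ROE": 50, "SB": 159, "CS": 60, "PO": 26, "OOB": 35, "BT": 136,
    "XBT": 219, "Inf": 159, "DRS": 6, "KO": 459, "BB": 275,
}
heyward = {
    "ROE": 34, "SB": 63, "CS": 24, "PO": 5, "OOB": 18, "BT": 73,
    "XBT": 152, "Inf": 77, "DRS": 98, "KO": 544, "BB": 315,
}

for name, line in [("Elvis Andrus", andrus), ("Jason Heyward", heyward)]:
    parts = component_runs(line)
    summary = ", ".join(f"{key}: {value:.1f}" for key, value in parts.items())
    print(f"{name} -> {summary} (total {sum(parts.values()):.0f})")
```

Under this grouping the two players reach similar totals by different routes: most of Andrus's runs come from infield hits and reaching on errors, while Heyward gets roughly as much from DRS as from that group.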
This method isn't perfect -- for instance, I've turned it into a counting stat instead of the rate stat that Bill James initially created, which will give greater weight to those players with more plate appearances in both positive and negative ways. It also treats all five years equally instead of giving greater weight to more recent years (best exemplified by Ichiro Suzuki), something I'll play around with.
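To illustrate the counting-versus-rate point, here is a small sketch that rescales the rounded Runs totals to a per-600-plate-appearance basis. The 600 PA denominator is an arbitrary choice of mine for a "full season," not something from James' essay, and `runs_per_600` is a hypothetical helper; the PA and Runs figures are taken from the top-10 table.

```python
PER_PA = 600  # assumed stand-in for a full season of plate appearances


def runs_per_600(runs, pa, per=PER_PA):
    """Scale a counting total to a fixed number of plate appearances."""
    return runs / pa * per


# (player, PA, Runs) copied from the top-10 table; Runs are already rounded.
players = [
    ("Ben Zobrist", 3349, 305),
    ("Elvis Andrus", 3433, 258),
    ("Ichiro Suzuki", 3056, 240),
    ("Brett Gardner", 2439, 225),
]

# Re-rank by rate instead of by raw total.
ranked = sorted(players, key=lambda p: runs_per_600(p[2], p[1]), reverse=True)

for name, pa, runs in ranked:
    print(f"{name}: {runs_per_600(runs, pa):.1f} runs per {PER_PA} PA")
```

On this basis Brett Gardner, ninth by raw total, comes out slightly ahead of Ben Zobrist (about 55.4 to 54.6), which shows how much the counting version rewards playing time.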
I'm certain there are aspects of player performance left out that some would consider important, but when the term "smart player" is used, the implication is that the player is using all the information available to make the most of what is given -- he judges when he might be able to steal or take an extra base on a player with an inferior throwing arm or a bad angle, keeps strikeouts to a minimum and fields his position well regardless of range or arm strength. Play around with the spreadsheet and some interesting nuggets will emerge.
Runs have been decreasing since around 2007, and as I write this discussions are beginning on shrinking the strike zone. When runs are scarce and increased playoff slots make every game that much more valuable, everything in a player's control that can generate an advantage needs to be used. I gathered the data for all players since 2010, and it would be interesting to see if a Percentage Team Index would correlate well with winning, but that's a discussion for another day (view this Tableau data viz for a brief look). For now, thanks to Bill James for yet another intriguing way to evaluate a player dimension people have been discussing for as long as baseball has been around, Tom Tango for the inputs and Baseball-Reference and FanGraphs for the ease with which the data can be gathered.
* * *
Data from Baseball-Reference and FanGraphs. Any errors in gathering and processing the data are the author's.
Scott Lindholm lives in Davenport, IA. Follow him on Twitter @ScottLindholm.