As some of you may know from comments I've made in other posts over the past month or so, I have been working on software that I am calling "Total Run Accounting" (TRA). I finally have it done to the point where I can post some preliminary results for team offense. I've posted a spreadsheet to Google Docs here:
I've sorted the sheet based on ability of a team to do the "little things" (baserunning, moving runners over - more details below the bump). Surprisingly (the extent, not the team), the Minnesota Twins came out a full 27 runs better than the next best team (Angels) at doing the "little things".
More details on the methodology, etc., below:
Here's the short description of the method behind TRA. I'm working on a longer, more detailed description that I'll post when finished.
Expected Runs (ER)
Basically, I set out to account for each and every run scored during the 2008 season, using each play to allocate runs to a baserunner, fielder, hitter or pitcher. The TRA software was written to read and manipulate MLB Gameday play by play data, breaking down each plate appearance into a series of "transitions" that can be allocated to a single player. The basis of TRA is "Expected Runs" (ER), best described elsewhere, but posted by BPro here:
The general concept is that in any inning situation (outs, baserunners), averaged over all of 2008, a team can be expected to score some number of runs. As outs are made, this number decreases. As runners reach base, this number increases.
I'll be using the following notation for a given inning situation: "ABC/Y", where A represents the runner on first ("1" if there is a runner on first, "X" if not), B is the runner on second ("2" or "X"), C is the runner on third ("3" or "X"), and "Y" is the number of outs.
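For anyone who wants to play along at home, the notation is easy to build and parse in code. This is only a minimal Python sketch, not the actual TRA software; the function names `encode_state` and `decode_state` are made up for illustration.

```python
def encode_state(on_first, on_second, on_third, outs):
    """Build the "ABC/Y" label described above (hypothetical helper)."""
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1 or 2")
    bases = (
        ("1" if on_first else "X")
        + ("2" if on_second else "X")
        + ("3" if on_third else "X")
    )
    return f"{bases}/{outs}"


def decode_state(label):
    """Split an "ABC/Y" label into (on_first, on_second, on_third, outs)."""
    bases, outs = label.split("/")
    return bases[0] == "1", bases[1] == "2", bases[2] == "3", int(outs)


print(encode_state(True, False, True, 0))  # 1X3/0
print(decode_state("12X/0"))               # (True, True, False, 0)
```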
When I break a play into each "transition", I calculate ER and allocate accordingly. For example, with zero outs and no runners on base (XXX/0), Denard Span walks. I create one transition from the start situation (0.52 ER) to the end situation (1XX/0, 0.90 ER) and allocate offensively to Denard Span as a batter. I denote this transition as T(XXX/0, 1XX/0) = +0.38.
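The walk example can be sketched in a few lines of Python. The table below holds only the two ER values quoted in this post; a full table would have one entry per base/out situation. The `runs_scored` argument is an assumption on my part (the usual run-expectancy convention of adding any runs that score on the play), and the names `EXPECTED_RUNS` and `transition_value` are illustrative, not taken from the TRA code.

```python
# ER values quoted in this post (2008 averages); a real table has every state.
EXPECTED_RUNS = {"XXX/0": 0.52, "1XX/0": 0.90}


def transition_value(start, end, runs_scored=0, table=EXPECTED_RUNS):
    """Value of T(start, end): change in expected runs over the transition.

    runs_scored is an assumed extension for plays on which runs cross the plate.
    """
    return table[end] - table[start] + runs_scored


# Denard Span walks with nobody on and nobody out.
print(round(transition_value("XXX/0", "1XX/0"), 2))  # 0.38
```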
This is the simple case. Once runners reach base, things get a bit more complicated.
Batting Expected Runs (Batting ER)
Batting ER is relatively simple. In each plate appearance, I create at least one transition and allocate to the batter. This is where the batter gets blame/credit for taking a walk, hitting a double, striking out, etc. In the above example, Denard Span would be assigned +0.38 Batting ER for taking a walk.
In order to capture the "little things" that generally are not captured in a box score, I have defined two types of Batting ER.
"Standard" Batting ER
"Standard" Batting ER captures the batter's responsibility based on the outcome of an at bat. For a base hit with runners on base, the batter is assigned ER for advancing the runners one base for a single, two for a double, etc. The question becomes, what do we do with runners advancing an extra base?
"Other" Batting ER
"Other" Batting ER acknowledges the fact that based on the type of hit (LD, FB, GB, etc) and location (position for now, fielding zone in a future iteration), a runner may be more or less likely to advance an extra base. For example, with a runner on second, a ground ball is much more likely to advance him to third if hit to the right side than to the left side of the infield. The batter should get credit for this. In order to calculate the batter allocation, I analyzed plays over the entire season and determined the average percentage chance that a baserunner successfully advanced to a given base, given type of hit and location. I assign "other" ER to the batter based on the percentage chance of advancing. If a baserunner has a 50% chance of advancing an extra base, I create a transition in which the batter gets 50% of the additional ER for the baserunner advancing, regardless of whether the runner actually advanced. More on that below.
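The season-long averaging step might look something like the sketch below. The input format (a tuple of hit type, location and whether the runner took the extra base) and the function name `advance_rates` are assumptions for illustration, and the four plays at the bottom are toy records, not real 2008 data.

```python
from collections import defaultdict


def advance_rates(plays):
    """Return {(hit_type, location): share of runners who took the extra base}.

    plays is an iterable of (hit_type, location, advanced) tuples (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [advances, opportunities]
    for hit_type, location, advanced in plays:
        tally = counts[(hit_type, location)]
        tally[0] += int(advanced)
        tally[1] += 1
    return {key: adv / opp for key, (adv, opp) in counts.items()}


# Toy records only, to show the shape of the calculation.
toy_plays = [
    ("GB", "RF", True),
    ("GB", "RF", False),
    ("GB", "LF", False),
    ("GB", "RF", True),
]
rates = advance_rates(toy_plays)
print(round(rates[("GB", "RF")], 3))  # 0.667
print(rates[("GB", "LF")])            # 0.0
```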
For example, suppose Denard Span is on first with nobody out and Joe Mauer singles on a ground ball to the right fielder. Mauer is assigned "Standard" Batting ER of T(1XX/0, 12X/0) = +0.63. Then, because there is a 41.3% chance of the runner advancing to third on that kind of hit, I also assign 0.413 * T(12X/0, 1X3/0) = 0.413 * 0.24 = +0.099 "Other" Batting ER.
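The arithmetic for the Mauer single is short enough to check in Python. The function name `other_batting_er` is hypothetical; the three numbers are the ones quoted in the example above.

```python
def other_batting_er(advance_probability, extra_base_value):
    """Batter's share of the ER for a possible extra base (illustrative helper)."""
    return advance_probability * extra_base_value


standard = 0.63                        # T(1XX/0, 12X/0)
other = other_batting_er(0.413, 0.24)  # 41.3% of T(12X/0, 1X3/0)

print(round(other, 3))             # 0.099
print(round(standard + other, 3))  # 0.729
```

So on this single Mauer would end up with about +0.729 Batting ER in total, whatever the runner actually does.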
Baserunning Expected Runs (Running ER)
Stolen base attempts are easy: each one results in a single transition that is assigned to the baserunner, positive or negative depending on success. In the case of double steals, multiple transitions are created, starting with the lead runner. For example, given situation 12X/0 and a double steal resulting in X23/0, I first assign ER to the lead runner for transition T(12X/0, 1X3/0) and then assign ER to the trailing runner for transition T(1X3/0, X23/0).
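The lead-runner-first ordering can be sketched as follows. This is a simplified illustration rather than the TRA code: the name `steal_transitions` is made up, and it only handles successful steals of second or third (no caught stealing, no steals of home).

```python
def steal_transitions(start, stealing_bases):
    """Split a successful steal into one transition per runner, lead runner first.

    start is an "ABC/Y" label; stealing_bases lists the bases (1 or 2) whose
    runners each steal the next base. Returns (base, from_state, to_state) tuples.
    """
    bases, outs = start.split("/")
    occupied = [ch != "X" for ch in bases]  # index 0 is first base
    transitions = []
    current = start
    for base in sorted(stealing_bases, reverse=True):  # lead runner goes first
        if base not in (1, 2):
            raise ValueError("this sketch only handles steals of second or third")
        if not occupied[base - 1] or occupied[base]:
            raise ValueError(f"runner on base {base} cannot advance")
        occupied[base - 1], occupied[base] = False, True
        label = "".join(str(i + 1) if on else "X" for i, on in enumerate(occupied))
        nxt = f"{label}/{outs}"
        transitions.append((base, current, nxt))
        current = nxt
    return transitions


for step in steal_transitions("12X/0", [1, 2]):
    print(step)
# (2, '12X/0', '1X3/0')
# (1, '1X3/0', 'X23/0')
```

Each tuple would then be valued with the ER table and credited to the runner who started on the listed base.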
In the case of runners potentially advancing an extra base, I simply assign Running ER based on the difference between the "Other" Batting ER that was assigned to the hitter and the end result. There are three possible results:
- The runner advances the extra base. In the above example, the baserunner Denard Span would be assigned Running ER = (1 - 0.413) * T(12X/0, 1X3/0) = +0.141.
- The runner stays at his current base. In this case, Denard Span would be penalized for station-to-station baserunning and assigned Running ER = 0.413 * T(1X3/0, 12X/0) = -0.099, cancelling out the "Other" Batting ER that had been assigned to the hitter.
- The runner is thrown out trying to advance the extra base. Denard Span would be penalized quite a bit for making the out and assigned Running ER = -0.099 + T(12X/0, 1XX/1) = -1.099.
As one can see from the numbers above, if Span chooses to go for the extra base, he had better have about an 88% chance of success for the attempt to pay off over the long run.
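The three outcomes and the break-even rate can be checked with a short Python sketch. The function names are hypothetical, and T(12X/0, 1XX/1) = -1.00 is the value implied by the -1.099 above rather than one stated directly. I am also assuming that "break-even" means the success rate at which the expected Running ER of an attempt is zero, which gives a figure close to the one quoted.

```python
def running_er(outcome, advance_probability, extra_base_value, out_value):
    """Running ER for a runner who could take an extra base (illustrative)."""
    other_batting = advance_probability * extra_base_value  # already given to batter
    if outcome == "advanced":
        return (1 - advance_probability) * extra_base_value
    if outcome == "held":
        return -other_batting
    if outcome == "thrown_out":
        return -other_batting + out_value
    raise ValueError(f"unknown outcome: {outcome}")


def break_even_probability(safe_er, out_er, baseline=0.0):
    """Success rate p at which p * safe_er + (1 - p) * out_er equals baseline."""
    return (baseline - out_er) / (safe_er - out_er)


P, EXTRA, OUT = 0.413, 0.24, -1.00  # OUT is inferred from the -1.099 figure
safe = running_er("advanced", P, EXTRA, OUT)
held = running_er("held", P, EXTRA, OUT)
out = running_er("thrown_out", P, EXTRA, OUT)

print(round(safe, 3), round(held, 3), round(out, 3))  # 0.141 -0.099 -1.099
print(round(break_even_probability(safe, out), 3))    # 0.886
```

Passing `baseline=held` instead compares the attempt against staying put at -0.099, which is a different baseline than the one assumed here and works out to about 0.806.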
Opponents' Fielding ER
As I was working through offensive situations, I struggled with how to assign ER for opponents' fielding, specifically errors made by the defense. If a batter grounds to short and the shortstop commits a fielding error, it does not seem right for the batter to be rewarded the same as for a base hit. As a result, I created a separate ER category for each player to capture these runs due to the opponent's defense.
As a few of you have pointed out elsewhere (thanks Sky!), the batter should get some credit for at least putting the ball into play and creating the opportunity for the error. This is correct, but I plan to address this in the general case as I work on defense. I plan to update assignment of Batting ER based on "fielding independent hitting", assigning Batting ER not based on the actual result (single, double, groundout), but rather on the expected, probabilistic result of a play (averaged over the entire league / season) based on type (LD, FB, GB, PF, BU) and location (fielding zone). I'm not there yet though.
As one would expect, the total Opponents' Fielding ER averages out somewhat across the league, ranging from 42.69 (San Diego) to 73.23 (New York Mets). Analyzing this variance is a problem for another day...
As I've noted above, I'm currently working on defensive ER assignment. Because each run scored is a run allowed by the defense, total Offense ER = total Defense ER. My plan is to assign ER to the pitcher based on type and location of the batted ball (same expected outcome as I'm planning for hitters), and then assign ER to the fielders based on the actual result of the play.
You've probably noticed that this post was heavier on methodology and lighter on the analysis of results. This is because I'm still tinkering with the method as I work through defense, and I want to put it out there for comments and recommendations.
- The Minnesota Twins were by far the best team at doing the "little things" (baserunning and moving runners over). Their +42.22 ER (compared to average) was 27 runs ahead of the second-place team (Angels) and about 33 runs ahead of the next best AL Central team (Indians). Compared to the White Sox, the gap is enormous: over 65 runs.
- The overall impact of baserunning, from best (Philadelphia, +25.15 compared to average) to worst (-19.65), is about 45 runs, or 4.5 wins. Philadelphia is far and away above any other team, with the next best being Colorado at +11.73.
- Offensively, Cleveland and the New York Yankees were almost identical, differing by no more than 0.23 runs in either batting category or in baserunning. Cleveland's 16-run advantage was almost entirely due to a gap in Opponents' Fielding ER.
- The Yankees, Indians, Cubs and Twins were the only teams that were above league average in both batting categories and baserunning.
I'm sure there are more findings here. Let me know what you all think and see.