"Total Run Accounting" - Team Offense
As some of you may know from comments I've made in other posts over the past month or so, I have been working on software that I am calling "Total Run Accounting" (TRA). I finally have it done to the point where I can post some preliminary results for team offense. I've posted a spreadsheet to Google Docs here:
Total Run Accounting - Team Offense 2009-01-28
I've sorted the sheet based on ability of a team to do the "little things" (baserunning, moving runners over - more details below the bump). Surprisingly (the extent, not the team), the Minnesota Twins came out a full 27 runs better than the next best team (Angels) at doing the "little things".
More details on methodology, etc below:
Method
Here's the short description of the method behind TRA. I'm working on a longer, more detailed description that I'll post when finished.
Expected Runs (ER)
Basically, I set out to account for each and every run scored during the 2008 season, using each play to allocate runs to a baserunner, fielder, hitter or pitcher. The TRA software was written to read and manipulate MLB Gameday play by play data, breaking down each plate appearance into a series of "transitions" that can be allocated to a single player. The basis of TRA is "Expected Runs" (ER), best described elsewhere, but posted by BPro here:
The general concept is that in any inning situation (outs, baserunners), averaged over all of 2008, a team can be expected to score some number of runs. As outs are made, this number decreases. As runners reach base, this number increases.
Notation
I'll be using the following notation for a given inning situation: "ABC/Y", where A represents the runner on first ("1" if there is a runner on first, "X" if not), B is the runner on second ("2" or "X"), C is the runner on third ("3" or "X"), and Y is the number of outs.
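As a rough illustration of the notation, here is a minimal Python sketch that builds and reads "ABC/Y" strings. The function names `encode_state` and `decode_state` are hypothetical and are not taken from the TRA software.

```python
def encode_state(on_first, on_second, on_third, outs):
    """Build an "ABC/Y" situation string from base occupancy and outs."""
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1, or 2")
    bases = "".join(
        label if occupied else "X"
        for label, occupied in zip("123", (on_first, on_second, on_third))
    )
    return f"{bases}/{outs}"


def decode_state(state):
    """Split an "ABC/Y" string into (on_first, on_second, on_third, outs)."""
    bases, outs = state.split("/")
    if len(bases) != 3:
        raise ValueError("expected three base characters, e.g. '1X3/0'")
    on_first, on_second, on_third = (ch != "X" for ch in bases)
    return on_first, on_second, on_third, int(outs)


print(encode_state(True, False, True, 0))  # 1X3/0
print(decode_state("12X/1"))               # (True, True, False, 1)
```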
When I break a play into each "transition", I calculate ER and allocate accordingly. For example, with zero outs and no runners on base (XXX/0), Denard Span walks. I create one transition from the start situation (0.52 ER) to the end situation (1XX/0, 0.90 ER) and allocate offensively to Denard Span as a batter. I denote this transition as T(XXX/0, 1XX/0) = +0.38.
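The transition arithmetic can be sketched in a few lines of Python. This is only an illustration, not the TRA code: the table holds just the two ER values quoted above, and the `runs_scored` parameter is my assumption about how a run crossing the plate would enter the calculation.

```python
# Expected-runs values quoted in the post (2008 averages). Only these two
# states are given explicitly; a full table would cover all 24 base/out states.
EXPECTED_RUNS = {
    "XXX/0": 0.52,
    "1XX/0": 0.90,
}


def transition(start, end, runs_scored=0, table=EXPECTED_RUNS):
    """T(start, end): change in expected runs, plus any runs that scored.

    runs_scored is an assumption of this sketch; the post does not spell out
    how scoring plays are handled.
    """
    return table[end] - table[start] + runs_scored


# Denard Span walks with nobody on and nobody out.
walk = transition("XXX/0", "1XX/0")
print(f"{walk:+.2f}")  # +0.38
```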
This is the simple case. Once runners reach base, things get a bit more complicated.
Batting Expected Runs (Batting ER)
Batting ER is relatively simple. In each plate appearance, I create at least one transition and allocate to the batter. This is where the batter gets blame/credit for taking a walk, hitting a double, striking out, etc. In the above example, Denard Span would be assigned +0.38 Batting ER for taking a walk.
In order to capture the "little things" that generally are not captured in a box score, I have defined two types of Batting ER.
"Standard" Batting ER
"Standard" Batting ER captures the batter's responsibility based on the outcome of an at bat. For a base hit with runners on base, the batter is assigned ER for advancing the runners one base for a single, two for a double, etc. The question becomes, what do we do with runners advancing an extra base?
"Other" Batting ER
"Other" Batting ER acknowledges the fact that based on the type of hit (LD, FB, GB, etc) and location (position for now, fielding zone in a future iteration), a runner may be more or less likely to advance an extra base. For example, with a runner on second, a ground ball is much more likely to advance him to third if hit to the right side than to the left side of the infield. The batter should get credit for this. In order to calculate the batter allocation, I analyzed plays over the entire season and determined the average percentage chance that a baserunner successfully advanced to a given base, given type of hit and location. I assign "other" ER to the batter based on the percentage chance of advancing. If a baserunner has a 50% chance of advancing an extra base, I create a transition in which the batter gets 50% of the additional ER for the baserunner advancing, regardless of whether the runner actually advanced. More on that below.
For example, suppose Joe Mauer singles on a ground ball to the right fielder with Denard Span on first and nobody out. Mauer is assigned T(1XX/0, 12X/0) = +0.63 ER as "Standard" Batting ER. Then, because there is a 41.3% chance of the runner advancing to third on that type of hit, I assign 0.413 * T(12X/0, 1X3/0) = 0.413 * 0.24 = +0.099 "Other" ER.
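The same allocation, written as a small Python sketch using the numbers from the example. The function name `other_batting_er` is hypothetical, and the 41.3% rate and transition values are simply copied from the text rather than computed from play-by-play data.

```python
def other_batting_er(advance_probability, extra_base_transition):
    """Share of the extra-base transition credited to the batter."""
    if not 0.0 <= advance_probability <= 1.0:
        raise ValueError("advance_probability must be between 0 and 1")
    return advance_probability * extra_base_transition


standard = 0.63                        # T(1XX/0, 12X/0) from the example
other = other_batting_er(0.413, 0.24)  # 0.413 * T(12X/0, 1X3/0)

print(f"{other:+.3f}")             # +0.099
print(f"{standard + other:+.3f}")  # +0.729 total Batting ER on the play
```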
Baserunning Expected Runs (Running ER)
Stolen base attempts are easy: they result in a single transition that is assigned to the baserunner, positive or negative depending on success. In the case of double steals, multiple transitions are created, starting with the lead runner. For example, given situation 12X/0 and a double steal resulting in X23/0, I first assign ER to the lead runner for transition T(12X/0, 1X3/0) and then I assign a transition to the following baserunner for T(1X3/0, X23/0).
In the case of runners potentially advancing an extra base, I simply assign Running ER based on the difference between the "Other" Batting ER that was assigned to the hitter and the end result. There could be three different results:
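Here is a minimal sketch of how a multi-runner steal could be split into per-runner transitions, lead runner first. It handles successful steals of second and third only (outs are unchanged, so they are omitted), and the function name and input format are assumptions made for illustration.

```python
def steal_transitions(bases, steals):
    """Split a (multi-)steal into one transition per runner, lead runner first.

    bases  -- three-character occupancy string such as "12X"
    steals -- dict mapping origin base (1 or 2) to destination base (2 or 3);
              steals of home and caught stealing are left out of this sketch
    Returns a list of (origin_base, start_bases, end_bases) tuples.
    """
    current = list(bases)
    result = []
    # Process the runner on the highest base first so his base is vacated
    # before the trailing runner moves up.
    for origin in sorted(steals, reverse=True):
        dest = steals[origin]
        if current[origin - 1] == "X":
            raise ValueError(f"no runner on base {origin}")
        if current[dest - 1] != "X":
            raise ValueError(f"base {dest} is still occupied")
        start = "".join(current)
        current[origin - 1] = "X"
        current[dest - 1] = str(dest)
        result.append((origin, start, "".join(current)))
    return result


print(steal_transitions("12X", {1: 2, 2: 3}))
# [(2, '12X', '1X3'), (1, '1X3', 'X23')]
```

Each tuple can then be priced with the ER table for the current number of outs and assigned to the runner who started on `origin_base`.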
- The runner advances the extra base. In the above example, the baserunner Denard Span would be assigned Running ER = (1 - 0.413)* T(12X/0, 1X3/0) =+0.141 ER.
- The runner stays at his current base. In this case, Denard Span would be penalized for station to station baserunning, assigned Running ER = 0.413 * T(1X3/0, 12X/0) = -0.099 to cancel out the "Other" Batting ER that had been assigned to the hitter.
- The runner is thrown out trying to advance the extra base. Denard Span would be penalized quite a bit for making the out, assigned Running ER = -0.099 + T(12X/0, 1XX/1) = -1.099
As one can see from the numbers above, if Span chooses to go for the extra base, he had better have roughly an 88% chance of success for his Running ER to break even over the long run.
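The three outcomes and the break-even rate can be reproduced with a short Python sketch. The function names are hypothetical; the -1.00 value for T(12X/0, 1XX/1) is the one implied by the -1.099 figure above, and "break even" here means the success rate at which the expected Running ER of an attempt is zero.

```python
def running_er(outcome, advance_probability, advance_value, out_value):
    """Running ER for a runner who could take an extra base.

    advance_value -- T(12X/0, 1X3/0) in the example (+0.24)
    out_value     -- T(12X/0, 1XX/1) in the example (-1.00, implied by the text)
    """
    batter_share = advance_probability * advance_value  # "Other" Batting ER
    if outcome == "advanced":
        return advance_value - batter_share
    if outcome == "stayed":
        return -batter_share
    if outcome == "out":
        return -batter_share + out_value
    raise ValueError(f"unknown outcome: {outcome!r}")


def break_even_rate(er_advanced, er_out):
    """Success rate p solving p * er_advanced + (1 - p) * er_out = 0."""
    return -er_out / (er_advanced - er_out)


advanced = running_er("advanced", 0.413, 0.24, -1.00)
stayed = running_er("stayed", 0.413, 0.24, -1.00)
out = running_er("out", 0.413, 0.24, -1.00)

print(f"{advanced:+.3f} {stayed:+.3f} {out:+.3f}")  # +0.141 -0.099 -1.099
print(f"{break_even_rate(advanced, out):.3f}")      # 0.886
```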
Opponents' Fielding ER
As I was working through offensive situations, I struggled with how to assign ER for opponents' fielding, specifically errors made by the defense. If a batter grounds to short and the shortstop commits a fielding error, it does not seem right for the batter to be rewarded the same as for a base hit. As a result, I created a separate ER category for each player to capture these runs due to the opponent's defense.
As a few of you have pointed out elsewhere (thanks Sky!), the batter should get some credit for at least putting the ball into play and creating the opportunity for the error. This is correct, but I plan to address this in the general case as I work on defense. I plan to update assignment of Batting ER based on "fielding independent hitting", assigning Batting ER not based on the actual result (single, double, groundout), but rather on the expected, probabilistic result of a play (averaged over the entire league / season) based on type (LD, FB, GB, PF, BU) and location (fielding zone). I'm not there yet though.
As one would expect, the total Opponents' Fielding ER averages out somewhat across the league, ranging from 42.69 (San Diego) to 73.23 (New York Mets). Analyzing this variance is a problem for another day...
Defense ER
As I've noted above, I'm currently working on defensive ER assignment. Because each run scored is a run allowed by the defense, total Offense ER = total Defense ER. My plan is to assign ER to the pitcher based on type and location of the batted ball (same expected outcome as I'm planning for hitters), and then assign ER to the fielders based on the actual result of the play.
Findings
You've probably noticed that this post was heavier on methodology and lighter on the analysis of results. This is because I'm still tinkering with the method as I work through defense, and I want to put it out there for comments and recommendations.
- The Minnesota Twins gained by far the most from doing the "little things" (baserunning and moving runners over). Their +42.22 ER (compared to average) was 27 runs ahead of the second-place team (Angels) and about 33 runs ahead of the next best AL Central team (Indians). Compared to the White Sox, the gap is enormous: over 65 runs.
- The overall impact of baserunning, from best (Philadelphia +25.15 compared to average) to worst (-19.65) is about 45 runs, or 4.5 wins. Philadelphia is far and away above any other team, next best being Colorado's +11.73.
- Offensively, Cleveland and the New York Yankees were almost identical, varying by no more than 0.23 runs in either batting category or baserunning. Cleveland's 16-run advantage was almost entirely due to a gap in opponents' fielding.
- The Yankees, Indians, Cubs and Twins were the only teams that were above league average in both batting categories and baserunning.
I'm sure there are more findings here; let me know what you all think and see.
Comments
I'm somewhat confused
Shouldn’t the total offensive ER numbers work out close to actual runs scored?
Or am I misunderstanding how expected runs work?
Not quite
At the start of any inning, a team expects to score or give up 0.52 runs (in 2008). This is built into the projection. Everything that happens during an inning (batting, running, etc.) moves that number up to 1, 2, 3, etc. or down to zero. Basically, everything is relative to the mean expected runs.
I probably need to include total average expected runs for each team based on the total number of innings each team batted.
by Adam Peterson on Jan 28, 2009 10:03 PM EST
So the sum of a team's ER is the number of runs they scored above/below the average number of runs scored by the league.
Assuming the league scores .52 runs per inning, naturally.
This is like Fangraphs’ BRAA, except it divides up credit/blame between multiple players, not just hitter and pitcher, right? So it’s a value metric, not an ability metric. And situational leverage is involved.
(I’m still digesting the details.)
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
Yes, pretty much
The sum should be the number of runs scored above average, if you account for the total number of innings a team was at the plate (as well as walkoff innings). I’m working on calculating the total expected runs (0.52 * innings) and comparing to ER totals to verify.
Situational leverage is involved, but not nearly to the point of WPA. Definitely a value metric, but it takes more into consideration than BRAA. As you state, credit/blame is spread among the batter, baserunners and opposing defense (for offense), and pitcher / fielders for defense.
by Adam Peterson on Jan 29, 2009 10:15 PM EST
So is this runs above or below expectation then?
Would that sum to 0 across the teams? By definition, the total number of Runs above or below expectation should be the expectation, right?
Or is it thrown off from 0 by including the Opponent Fielder ER?
Sorry for the basic questions, but something just isn’t clicking here.
I really like the approach though – especially the additions you’re planning. I’ve thought of doing something similar myself.
by Dan Turkenkopf on Jan 28, 2009 10:16 PM EST
Yes
The total offensive ER should sum to zero across MLB. Opponent fielding ER simply adds another category (currently biased in the positive direction) to be considered.
However, I will freely admit that this first iteration has resulted in a total of exactly 5.11 runs difference between total offensive ER and offensive ER (for each team) compared to league average. I’m surprised by this and am digging a bit deeper here. I’m guessing this has something to do with my ER calculations for each inning situation.
by Adam Peterson on Jan 28, 2009 10:59 PM EST
Fielding
I’m not claiming that a similar approach has not been attempted before, but in this case, the attempt to isolate baserunning from batting, and offense from defense in general, should be considered.
by Adam Peterson on Jan 28, 2009 11:04 PM EST
Open to suggestions...
If not tRA, then what? I’m not exactly hung up on a name…
by Adam Peterson on Jan 29, 2009 7:17 AM EST
I like that you've built yourself a framework for future analysis.
I think the value of your efforts comes in with whatever analysis you can use it for. And you may very well be able to tighten up determining individual credit on one play.
The toughest part about pitching/fielding is having play-by-play data and a defensive metric. You may want to talk to Sean Smith about modifying TotalZone for Gameday data (or just implementing your system using Retrosheet data) so you can really start rolling with fielding.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
Yeah
I’m looking forward to MLB upgrading their fielding data to indicate the location a ball was hit versus fielded. As I have it now, I have no choice but to determine fielding zone based on where the ball was fielded (since that’s what MLB provides).
For individual credit on a per-play basis, on offense I like the idea of “fielding independent hitting”, resulting in equal ER assignments to the hitter and the pitcher (as it should be). There’s a lot of room on the defensive side since I’m still working it.
Yes, the main purpose here is to use it for analysis. I will probably focus a bit more on the Minnesota Twins, but it will be interesting to see who grades out better than expected (e.g., Joe Mauer was surprisingly good, grading out at +5.12). I also focused first on writing software to read and manipulate the MLB Gameday data in general, to allow for other non-TRA types of analysis (e.g., did Carlos Gomez become more patient later in the season?).
by Adam Peterson on Jan 29, 2009 10:24 PM EST
