WPA doesn't quite work as a tool to evaluate context-dependent performance. Do any other metrics provide a solution?
Last week, I looked at some issues that arise when we use WPA (Win Probability Added) as an evaluation tool - namely, the fact that WPA only considers the events that have led up to the play in question, not the events that occur between the play and the end of the game. As I said in the article, I have nothing against WPA as a storytelling metric, but when it comes to measuring the impact of a play or a performance on the end result, WPA falls short.
In this installment, I'd like to look at a few alternatives to WPA for measuring context-dependent value. But first, I need to establish some guidelines for evaluating the metrics. To do so, below I state some intuitions about the relative importance of certain pairs of plays. You may not agree with these intuitions entirely, but based on my conclusions from last week, I believe they are at least one coherent and rational way to evaluate plays.
All else being equal:
1) A home run in the first inning is equivalent to a home run in the ninth inning.
2) A grand slam is better than a solo home run.
3) With no one on base, a triple with no outs is better than a triple with two outs.
4) In the bottom of the 9th in a tie game with a runner on third, a single is equivalent to a double.
5) A home run in a game that ends 1-0 is better than a home run in a game that ends 6-0.
With those assumptions in mind, let's get down to business. I'm going to examine three context-dependent valuation tools that try to fix the previously described issues with WPA: WPA/LI, RE24, and a modified version of Dave Studeman's Game Leverage Index (GLI), which I describe below.
WPA/LI is a pretty straightforward metric - it takes WPA and essentially normalizes it by leverage, dividing the WPA of each play by that play's Leverage Index (LI), so that players who have been in more high leverage situations move closer to 0, and players who have had fewer high leverage opportunities see their WPA move away from 0. This is sort of like normalizing RBI so that it reflects the number of run-scoring opportunities a player has encountered, rather than simply counting runs driven in.
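To make the normalization concrete, here is a minimal Python sketch of WPA/LI. It assumes each play is available as a (WPA, LI) pair; the function name `wpa_per_li` and the sample numbers are hypothetical and only show the direction of the adjustment.

```python
# Minimal WPA/LI sketch. Assumption: each play is a (wpa, li) pair;
# the function name and the sample numbers are hypothetical.

def wpa_per_li(plays):
    """Sum of each play's WPA divided by that play's Leverage Index."""
    total = 0.0
    for wpa, li in plays:
        if li <= 0:
            raise ValueError("Leverage Index must be positive")
        # High-leverage plays (li > 1) shrink toward 0; low-leverage plays grow.
        total += wpa / li
    return total


# The same +0.10 WPA, earned once at LI 2.0 and once at LI 0.5.
print(wpa_per_li([(0.10, 2.0)]))  # 0.05
print(wpa_per_li([(0.10, 0.5)]))  # 0.2
```

Summed over many plays, this is what pulls the totals of players with lots of high-leverage opportunities toward 0.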
Unfortunately, WPA/LI, as I touched on in my previous article, still encounters many of the same issues as WPA, albeit less drastically. Consider a shortened version of the example I posed last week:
Solo home run in the bottom of the first, tie game, two outs: .260 WPA/LI
Solo home run in the bottom of the 9th, tie game, two outs: .337 WPA/LI
Already, we see that WPA/LI fails on the first test. If you believe that a 9th inning home run is more important, that's fine. Maybe WPA/LI is right for you. But it's not what I'm looking for, so let's move on.
RE24 is simple - take the number of runs a team is expected to score before the play and subtract it from the expected runs after the play (plus any runs the team scored as a result of the play). Essentially, it's WPA without considering inning and score. It's also easier to measure in certain situations, since a solo home run is worth exactly one run, regardless of score or inning.
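The RE24 bookkeeping fits in one line. The helper below is a hypothetical sketch, not any site's official implementation; the two run-expectancy values in the loop are the ones quoted later in this article.

```python
def re24(re_before, re_after, runs_scored):
    """RE24 credit for one play: RE after - RE before + runs that scored.

    re_before / re_after are run-expectancy values for the base-out state
    before and after the play (e.g. read from a run-expectancy table).
    """
    return re_after - re_before + runs_scored


# A solo home run starts and ends with the bases empty and the same outs,
# so RE after equals RE before and the play is worth exactly one run.
for re_empty in (0.544, 0.112):  # none on with 0 outs / none on with 2 outs
    print(re24(re_empty, re_empty, 1))  # 1.0
```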
As a result, 1) above is true under RE24. The inning doesn't matter - a solo HR in the first is just as important as a solo home run in the 9th. In fact, I can tell at a glance that 2) and 3) are true as well. But let's go through the calculations anyway:
2) Grand slam (with, say, 0 outs): [Expected runs with no one on, 0 outs] - [Expected runs with bases loaded, 0 outs] + 4 = 0.544 - 2.39 + 4 = 2.154 runs.
Solo HR (again, 0 outs): [Expected runs with no one on, 0 outs] - [Expected runs with no one on, 0 outs] + 1 = 0.544 - 0.544 + 1 = 1.
2.154 > 1. Check.
3) Triple with no outs: [RE with runner on 3rd, 0 outs] - [RE with no one on, 0 outs] = 1.433 - 0.544 = 0.889.
Triple with two outs: [RE with runner on 3rd, 2 outs] - [RE with no one on, 2 outs] = 0.385 - 0.112 = 0.273.
0.889 > 0.273. Check.
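The same checks for 2) and 3) as a short script, using only the run-expectancy values quoted above. The dictionary layout and the helper name are illustrative assumptions.

```python
# Run-expectancy values quoted in the article, keyed by (runners, outs).
RE = {
    ("empty", 0): 0.544,
    ("loaded", 0): 2.39,
    ("3rd", 0): 1.433,
    ("empty", 2): 0.112,
    ("3rd", 2): 0.385,
}


def re24(before, after, runs_scored=0):
    """RE after the play minus RE before it, plus any runs that scored."""
    return RE[after] - RE[before] + runs_scored


grand_slam = re24(("loaded", 0), ("empty", 0), 4)
solo_hr = re24(("empty", 0), ("empty", 0), 1)
triple_0_outs = re24(("empty", 0), ("3rd", 0))
triple_2_outs = re24(("empty", 2), ("3rd", 2))

print(round(grand_slam, 3), round(solo_hr, 3))  # 2.154 1.0
print(round(triple_0_outs, 3), round(triple_2_outs, 3))  # 0.889 0.273
assert grand_slam > solo_hr  # criterion 2)
assert triple_0_outs > triple_2_outs  # criterion 3)
```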
We run into some problems with 4) and 5), however. With a runner on third in a tie game in the bottom of the 9th, a single and a double both score the winning run and end the game, so they should be worth the same. RE24 does not recognize this, because it gives the batter credit for ending up on 2nd instead of 1st after the play. 5) will also fail because RE24 does not care about the score of the game at the time of the play, let alone the final score.
So, RE24 is a NO as well. But it's not too far off. It satisfies 1) because it doesn't care about inning, but in doing so, it ignores the finality of the bottom of the 9th. More importantly, it fails to take score into account. We want to reward positive outcomes in close games more than those in blowouts.
To solve the score problem, we essentially need to adjust each play by the run differential in the game. And remember, I don't mean the run differential at the time of the play, but the run differential at the end of the game. Luckily, Dave Studeman of the Hardball Times wrote a great piece back in 2007 about this very thing. He ended up creating something called the "Game Leverage Index" (GLI). I'll let him explain:
I pulled all the WPA events from 2006 and calculated the average impact of each event on the win probability of the batter's team. Not surprisingly, I found that the same type of event (such as a single) in close games has a larger win impact than that type of event in blowouts. Based on the data, I have estimated a standard multiplier for calculating the impact of a hit or out in a game with a particular victory margin.
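Here's what that looks like (the examples below use the multipliers for games decided by one run, 1.38; two runs, 1.13; and six runs, 0.66):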
Now, this isn't sufficient by itself to satisfy all the criteria above, because it doesn't take into account the base-out situation. That's where RE24 comes in. If we use RE24 to take care of the base-out states, and use GLI to take care of the score difference, we can, maybe, get the results we want. Let's find out what happens if we simply multiply them together.
RE24 * GLI:
1) through 3) are already taken care of by RE24 because the score of the game is held constant in the comparison. I'm going to skip 4) for now, because I have a feeling it will create some trouble, but 5) is what we really want.
5) Solo home run, 2 outs, 1st inning of a game that ends 1-0:
RE24 * GLI = ([RE with none on, 2 outs] - [RE with none on, 2 outs] + 1) * GLI(1) = (0.112 - 0.112 + 1) * 1.38 = 1.38.
Solo home run, 2 outs, 1st inning of a game that ends 6-0:
RE24 * GLI = ([RE with none on, 2 outs] - [RE with none on, 2 outs] + 1) * GLI(6) = (0.112 - 0.112 + 1) * 0.66 = 0.66.
1.38 > 0.66. Check.
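A minimal sketch of the RE24 * GLI product applied to criterion 5). It assumes GLI is looked up by the absolute final run differential and includes only the GLI values this article quotes; the helper and constant names are hypothetical.

```python
# Only the GLI values quoted in the article, keyed by final run differential.
GLI = {1: 1.38, 2: 1.13, 6: 0.66}
RE_NONE_ON_2_OUTS = 0.112  # run expectancy quoted in the article


def re24_times_gli(re_before, re_after, runs_scored, final_margin):
    """RE24 for the play, scaled by the GLI for the final run differential."""
    re24 = re_after - re_before + runs_scored
    return re24 * GLI[abs(final_margin)]


hr_in_1_0_game = re24_times_gli(RE_NONE_ON_2_OUTS, RE_NONE_ON_2_OUTS, 1, 1)
hr_in_6_0_game = re24_times_gli(RE_NONE_ON_2_OUTS, RE_NONE_ON_2_OUTS, 1, 6)
print(hr_in_1_0_game, hr_in_6_0_game)  # 1.38 0.66
assert hr_in_1_0_game > hr_in_6_0_game  # criterion 5)
```

Because the inning never enters the calculation, the first-inning setting in the example only matters through the base-out state.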
That's a pretty simple example, of course, but we now have a metric that satisfies most of the criteria above. Before I get to 4), I'd like to see how this metric handles some more complicated examples.
Grand slam, top of the 9th inning, 2 outs, game ends 6-4 (with the batter's team losing):
RE24 * GLI = ([RE with none on, 2 outs] - [RE with bases loaded, 2 outs] + 4) * GLI(2) = (0.112 - 0.814 + 4) * 1.13 = 3.727
Solo home run, top of the 9th inning, 2 outs, game ends 1-0 (with the batter's team winning):
RE24 * GLI = ([RE with none on, 2 outs] - [RE with none on, 2 outs] + 1) * GLI(1) = (0.112 - 0.112 + 1) * 1.38 = 1.38.
In this case, as you can see, a batter who hits a grand slam but whose team still loses by 2 receives more credit than someone who hits a go-ahead solo home run in the 9th inning. Is this right? On the one hand, we want to reward the former batter for getting a clutch hit when there was a huge opportunity to get back into the game. On the other hand, the latter batter's home run was crucial to the end result. While the former's hit ended up being insignificant in the grand scheme of things, the latter's hit essentially won the game for his team.
There's another problem with RE24*GLI: only the size of the run differential matters, not which team is on top. In the grand slam example above, we would have given exactly the same credit if the grand slam had put the batter's team ahead by 2 rather than leaving it behind by 2. But intuitively, we want to reward the batter who puts his team ahead over the one who just puts his team within striking range. Winning, after all, is what we really care about.
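The symmetry problem is easy to see in code. In this sketch (helper and constant names are hypothetical; the RE and GLI values are the ones quoted in the article), the final margin is passed with a sign, positive if the batter's team won, but the GLI lookup uses only its absolute value.

```python
GLI = {1: 1.38, 2: 1.13}  # GLI values quoted in the article
RE_EMPTY_2_OUTS, RE_LOADED_2_OUTS = 0.112, 0.814


def re24_times_gli(re_before, re_after, runs_scored, final_margin):
    """RE24 scaled by GLI; final_margin is signed (+ if the batter's team won)."""
    re24 = re_after - re_before + runs_scored
    # GLI depends only on the size of the margin, so the sign is discarded.
    return re24 * GLI[abs(final_margin)]


slam_team_loses_by_2 = re24_times_gli(RE_LOADED_2_OUTS, RE_EMPTY_2_OUTS, 4, -2)
slam_team_wins_by_2 = re24_times_gli(RE_LOADED_2_OUTS, RE_EMPTY_2_OUTS, 4, +2)
solo_hr_wins_1_0 = re24_times_gli(RE_EMPTY_2_OUTS, RE_EMPTY_2_OUTS, 1, +1)

print(round(slam_team_loses_by_2, 3), round(solo_hr_wins_1_0, 3))  # 3.727 1.38
print(slam_team_loses_by_2 == slam_team_wins_by_2)  # True
```

Both complaints show up at once: the losing grand slam outscores the winning solo shot, and the sign of the final margin has no effect.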
In that way, WPA gets it right. Consider the WPA and WPA/LI values for the first situation above and the alternative in which the grand slam puts the team ahead by two runs:
Grand slam, 2 outs, top 9, down by 6 before the play: 0.0073 WPA, 0.052 WPA/LI
Grand slam, 2 outs, top 9, down by 2 before the play: 0.7684 WPA, 0.135 WPA/LI
What we seem to be seeing here, and believe me when I say I didn't plan this, is that we want some sort of combination of the three metrics above. WPA/LI is helpful because it can accurately measure the impact of different plays within an inning while taking score into account. It also does a good job with 4) above, and adjusts for opportunity. RE24 is helpful because it can measure run expectancy independent of inning. Finally, GLI solves the problem I presented last week, in that it rewards plays in close games without taking inning into account.
I'm afraid I'll have to leave it at that for now, as I'm creeping up on 2000 words. In the next installment, I'll actually, finally, work on combining all these concepts in order to create a metric that accurately measures the relative importance of plays.