Evaluating relief pitcher performance and talent remains a relatively poorly understood aspect of the game. Much focus is given to saves, ERA, and inherited runners scored, yet none of these measures effectively credits reliever performance. The difficulty with assigning credit, and for that matter penalty, to relief pitchers stems at least partly from mid-inning pitching changes. On Monday, Neil Weinberg had a very interesting article on this issue. A reliever who enters a game with two outs and runners on second and third, only to give up a bases-clearing triple before getting the final out, will not be charged with any of the runs. Instead, the pitcher(s) who allowed the runners to reach base will be. This method is an accounting practice held over from the origins of baseball and baseball scorekeeping. But it is silly. It does not make sense that pitchers leaving the game should be responsible for all runners left on base, and it does not make sense that incoming relievers are not responsible for any of those runners. This method limits our ability to accurately differentiate good pitchers from bad. The context of the situation needs to be considered.
The desired context is integrated into the RE24 statistic. RE24 (Run Expectancy for 24 Base-Out states) has been described from the offensive perspective here (by Tom Tango) and here (by Matt Hunter), and from a relief pitching perspective in another excellent article by Matt Hunter. Some of you will already be quite familiar with RE24 and can skip the rest of this paragraph and the next. For those unfamiliar, I suggest carrying on here and checking out those links. The basic idea behind RE24 is that it compares a team's expected runs for the rest of the inning before a given play with its expected runs after the play, while also accounting for any runs that actually score on the play. For the pitcher, the value is the run expectancy before the play, minus the run expectancy after the play, minus the runs that scored; the hitter receives the same value with the opposite sign. The RE24 value (positive or negative) is then attributed to the hitter and pitcher involved in the play.
RE24 would be calculated for the reliever in the opening example as follows. When the manager brought the reliever into the game there were runners on second and third and two outs. In that situation the batting team is expected to score 0.626 runs over the rest of the inning, a value taken from a run expectancy matrix. But the reliever gives up a bases-clearing triple before getting the third out. This leaves the batting team with a run expectancy of zero. The run expectancy for the batting team has dropped from 0.626 to 0 during the reliever's appearance (+0.626 to the reliever), but two runs have scored (-2 to the reliever). So the reliever would be given a -1.374 RE24 value for this appearance (0.626 - 2). It should be evident that this is a much better way of capturing and evaluating reliever appearances than recording him as giving up 0 runs in his 0.1 IP, as is typically done. A relevant follow-up question is whether RE24 can be used as a reliable predictor of future performance.
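The arithmetic in this example can be written out as a short sketch. The function name `pitcher_re24` and its parameter names are made up for illustration; the only inputs are the three numbers quoted above.

```python
def pitcher_re24(re_start: float, re_end: float, runs_allowed: int) -> float:
    """RE24 credited to a pitcher for one play or appearance.

    re_start: batting team's run expectancy when the pitcher starts
    re_end: batting team's run expectancy when the pitcher finishes
    runs_allowed: runs that actually scored in between
    """
    return re_start - re_end - runs_allowed


# Values from the example: runners on second and third with two outs (0.626),
# inning over afterwards (0), and two runs scoring on the triple.
appearance = pitcher_re24(re_start=0.626, re_end=0.0, runs_allowed=2)
print(round(appearance, 3))  # -1.374
```

Flipping the sign of the result gives the value credited to the batting side for the same stretch of play.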
Evaluating a metric as predictive of future performance is typically done through year-to-year correlations. Metrics that correlate strongly are taken as indicative of a stable skill (e.g., GB%, FB%, K%). As evaluators we want to find and use metrics that seem to represent stable skills when attempting to predict performance and avoid using metrics that vary greatly season to season (e.g., ERA, LOB%). A great reference for relief pitching year-to-year correlations can be found here. Interestingly, this analysis has not been conducted for RE24 (or I cannot find it). So I decided to look into it and hopefully provide a useful reference.
To do this I obtained data for all relief pitchers who pitched at least 50 IP in a season between 1989 and 2013. This 25-season period was chosen to reflect the recent period during which relief pitching has become a more prevalent part of the game. I divided each player's career into halves, with even seasons on one side and odd seasons on the other. To be included in the analysis, a pitcher had to have at least 450 total batters faced (TBF) in each of the even and odd season groups. 277 pitchers met these criteria.
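As a rough sketch of that split, assuming each qualifying season is available as a `(pitcher, year, tbf, re24)` tuple (a made-up layout, not the actual FanGraphs export), the even/odd grouping and the 450 TBF filter might look like this. It also converts each half to RE24 per 250 BF, the rate used in the rest of the analysis. The sample rows are invented purely to exercise the code.

```python
from collections import defaultdict

MIN_TBF_PER_HALF = 450  # threshold stated in the article


def split_half_rates(seasons):
    """Return {pitcher: (even_rate, odd_rate)} in RE24 per 250 BF.

    seasons: iterable of (pitcher, year, tbf, re24) tuples, assumed to be
    already limited to seasons of at least 50 IP.
    """
    # For each pitcher and half, accumulate [total TBF, total RE24].
    totals = defaultdict(lambda: {"even": [0, 0.0], "odd": [0, 0.0]})
    for pitcher, year, tbf, re24 in seasons:
        half = "even" if year % 2 == 0 else "odd"
        totals[pitcher][half][0] += tbf
        totals[pitcher][half][1] += re24

    rates = {}
    for pitcher, halves in totals.items():
        # Keep only pitchers with enough batters faced in both halves.
        if all(halves[h][0] >= MIN_TBF_PER_HALF for h in ("even", "odd")):
            rates[pitcher] = tuple(
                250 * halves[h][1] / halves[h][0] for h in ("even", "odd")
            )
    return rates


# Invented rows, for illustration only.
sample = [
    ("Pitcher A", 2004, 300, 6.0),
    ("Pitcher A", 2006, 200, 4.0),
    ("Pitcher A", 2005, 250, 2.0),
    ("Pitcher A", 2007, 250, 3.0),
    ("Pitcher B", 2004, 500, -5.0),
    ("Pitcher B", 2005, 200, 1.0),  # only 200 TBF in odd seasons: excluded
]
rates = split_half_rates(sample)
print(rates)  # {'Pitcher A': (5.0, 2.5)}
```

Splitting by even and odd seasons, rather than by first and second half of a career, keeps both halves spread across the same ages and eras.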
Given below is a scatterplot of RE24 per 250 BF (250 BF is around a season's worth for this sample) for the 277 players that had at least 450 TBF in each half of their careers. Even seasons are given on the horizontal axis, and odd seasons are given on the vertical axis. If RE24 is measuring a stable skill for pitchers we should see a linear pattern. If not, we should see more of a random distribution of points.
The plot shows there is some pattern to the data, but it is closer to a random distribution than a linear relationship. Players with large positive values for RE24 per 250 BF in their even seasons tend to have high RE24 per 250 BF in their odd seasons as well. Yet the correlation is not very strong: 0.298. This modest relationship is similar to the year-to-year correlations found for WHIP and IFFB%. The R-squared for this RE24 analysis is 0.09, which means that about 9% of the variance of RE24 in even seasons can be explained by RE24 in odd seasons. In other words, about 9% of performance in RE24 might be explained by a player's skill, while the remaining 91% is best attributed to other factors and/or random variation.
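For readers who want to reproduce this kind of number, here is a minimal sketch of a Pearson correlation in plain Python, followed by the step from the reported r to R-squared. The helper name `pearson_r` is illustrative, and the toy input is not the article's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy check: a perfectly linear pair of lists correlates at 1.
toy_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# The article's reported correlation and the share of variance it implies.
r_reported = 0.298
r_squared = r_reported ** 2
print(round(r_squared, 3))  # 0.089, which the article rounds to 0.09
```

In practice `xs` would be the even-season rates and `ys` the odd-season rates for the 277 pitchers.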
As a matter of interest here are the 10 best and 10 worst pitchers from this dataset as ranked by RE24 per 250 BF.
First, the 10 best.
[Table: the 10 best pitchers, ranked by RE24 per 250 BF]
And, the 10 worst.
[Table: the 10 worst pitchers, ranked by RE24 per 250 BF]
These lists are very interesting. The list of the 10 best contains many names that we generally consider great relievers, while the 10 worst are for the most part largely unrecognized players (plus a player the Dodgers recently gave a 3-year, $22.5 million contract!). Perhaps the RE24 metric actually reflects more skill than the modest correlation suggests.
To explore this a little bit further I ran a correlation analysis of a few pitching statistics against RE24 per 250 BF. From this analysis, I found a strong relation with K% (r = 0.597) and Shutdown:Meltdown ratio (r = 0.746). I expected BB% to have a strong negative relation but it did not (r = -0.175). Once again we see the importance of striking batters out. Investigating batted ball data (GB%, LD% and FB%) may also reveal some interesting relations. Unfortunately, it is not easily accessible for seasons prior to 2002, which would eliminate a large portion of this sample. It is something to consider for a future examination.
Generally speaking, this analysis shows that RE24 is likely not the best metric to use when attempting to predict future performance. While the context it provides is important, the context can randomly fluctuate too much from year-to-year depending on factors that are largely out of the control of the pitcher (e.g., quality of the other members of the bullpen, timing of managerial decisions to remove other pitchers). In this way, RE24 is better suited to describing what a pitcher has done, rather than what he will do.
. . .
All statistics courtesy of FanGraphs
Chris Teeter is a contributor to Beyond the Box Score. You can follow him on Twitter at @c_mcgeets.