It's the beginning of August, maybe the end of July. Your team is halfway through a 12-game road trip against division rivals and playoff contenders, and fresh off back-to-back blowouts that emptied the bullpen. As tonight's starter gets into an early jam, the color commentator says, "They're looking for a long outing from him tonight. Even if he doesn't have his best stuff, they might leave him out there just to save the bullpen."
Saving the bullpen -- leaving the starter in for an extra inning or two to minimize the workload on a team's relief pitchers -- is part of baseball's conventional wisdom, appearing in countless game stories and press conferences. But how much does a team actually gain from this extra effort?
To this point, it has been difficult to quantify how much this effect contributes to a team's long-term success. Previous articles on this site show the relationship between starters who throw more than 200 innings per season and team winning percentage. The relationship is clear -- teams whose best starters stay healthy and effective are more likely to win -- but it tells us nothing about the importance of a well-rested bullpen. Approaching it from the other angle, the authors of The Book found that relievers "can handle a much heavier workload than current managers are imposing" without diminishing their performance. This suggests that bullpens may not need as much saving as is usually assumed.
Working off this research, I wanted to determine the dynamic impact a longer-than-average start has on a team's winning percentage in subsequent games. If, as is claimed, a starter who pitches deeper into games refreshes the bullpen, we would expect to see the performance of a team's bullpen improve, translating into a greater-than-expected winning percentage in the few days after a long start. And if this effect exists, it should be included in our calculations of pitcher value, as it suggests that an "innings-eater" -- a starter who can be counted on to pitch a large number of innings, year after year -- adds value to his team not only on the days he pitches but also on the next few days.
To test this hypothesis, I developed logistic regressions to describe the relationship between pitcher usage in game n and winning percentage in game n + k, where k is an integer between 1 and 5. Note that I'm referring to games and not days -- for this first approximation, I ignored the contribution of off-days and doubleheaders, though obviously they may contribute to bullpen fatigue. Since we expect the effect of a long start to decay over time, I produced five logistic regression models, one for each value of k. I stopped after five games because, at that point, the five-man rotation has usually reset itself, and I was concerned that this would produce confounding effects in the data.
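As a rough illustration of the "game n versus game n + k" setup, the sketch below pairs a usage statistic from each game with the result k games later, producing one dataset per value of k. The dictionary keys (`starter_ip`, `won`), the function name, and the toy season are all hypothetical and are not taken from the actual database; the regression itself is not shown.

```python
def lagged_pairs(games, k):
    """Pair usage in game n with the result of game n + k.

    `games` is one team's season in chronological order. Each entry is a
    dict with hypothetical keys 'starter_ip' (float) and 'won' (bool).
    Off-days and doubleheaders are ignored: only game order matters.
    """
    return [
        (games[n]["starter_ip"], games[n + k]["won"])
        for n in range(len(games) - k)
    ]


# Toy seven-game stretch (made-up numbers, for illustration only).
season = [
    {"starter_ip": 8.0, "won": True},
    {"starter_ip": 5.0, "won": False},
    {"starter_ip": 6.0, "won": True},
    {"starter_ip": 3.0, "won": True},
    {"starter_ip": 7.0, "won": False},
    {"starter_ip": 9.0, "won": True},
    {"starter_ip": 6.0, "won": False},
]

# One dataset per lag k = 1..5, each of which would feed its own model.
datasets = {k: lagged_pairs(season, k) for k in range(1, 6)}
```

Each lag loses the last k games of the season, since those games have no game n + k to be paired with.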
I built a database using Retrosheet of all regular season games between 1998 and 2012. I excluded games after the September 1 roster expansion, so that the extra bullpen arms afforded by the roster expansion would not dilute any potential effects. This produced a database of 60,140 games.
From each game I compiled a variety of usage statistics that would describe how long the starter lasted and how much effort was required of the bullpen. These statistics included number of relievers used, as well as innings pitched, batters faced, and pitches thrown by the starter and the bullpen unit. To prevent strength of schedule from influencing my results, I also included season-long measures of team offense (OPS+), pitching (ERA+), bullpen (bullpen R/G), and defense (park-adjusted defensive efficiency). Because not all of the games in Retrosheet include pitch counts, I excluded those games without pitch count data.
There was also some concern that starter quality could act as a hidden influence -- although better starters are more likely to go deeper into games, they are also more likely to be followed by a weaker starter. This could produce the unexpected effect of lowering winning percentage after a long start, if, for instance, a team's #2 pitcher is significantly worse than the staff ace. To control for this, I also included the starter's ERA+ for the season as an estimate of pitcher quality, throwing out those games where the starter had an infinite ERA+ (i.e., a starter had a season ERA of 0.00).
This table shows the R² values associated with each of the five models. We can see that there is almost no correlation between pitcher usage in game n and team performance in game n + k. The slight increase in R² value for game n + 5 can be attributed to the variables related to starter quality (especially ERA+): since most rotations reset on game n + 5, the same pitcher will pitch both game n and n + 5, and thus winning percentage for game n + 5 depends on starter quality in game n.
To hammer the point home, let us look more closely at the one significant variable related to usage. The number of innings pitched by the bullpen was shown to influence the winning percentage for the following day. Can we determine whether this effect is due to random chance, or whether it is small but important?
Reliever IP is a variable that takes discrete values, so for each potential value, we can calculate the winning percentage on the next day across all games in our database with that value. If there is no relationship between the two variables, we expect the number of next-day wins among the N games associated with each value of reliever IP to follow a binomial distribution with p = 0.5, so the winning percentage is simply that count divided by N.
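The grouping step described above can be sketched in a few lines. This is only an illustration under stated assumptions: bullpen workload is stored as outs recorded (3 outs = 1 IP) to keep the keys as integers, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def next_game_win_pct(pairs):
    """Group next-game results by bullpen workload in the previous game.

    `pairs` is an iterable of (reliever_outs, won_next_game) tuples, where
    reliever_outs is the number of outs recorded by the bullpen
    (an assumption: 3 outs = 1 IP).
    Returns {reliever_outs: (winning_pct, number_of_games)}.
    """
    tally = defaultdict(lambda: [0, 0])  # reliever_outs -> [wins, games]
    for outs, won in pairs:
        tally[outs][0] += int(won)
        tally[outs][1] += 1
    return {outs: (wins / games, games)
            for outs, (wins, games) in sorted(tally.items())}


# Made-up sample: bullpen outs in game n, result of game n + 1.
sample = [(0, True), (0, False), (9, True), (9, True), (3, False)]
by_workload = next_game_win_pct(sample)
```

Keeping the game count N alongside each winning percentage matters, because the width of the expected range under pure chance depends on N.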
This graph shows the winning percentage associated with reliever IP values up to eight innings. The dashed red lines in this graph represent the 5% and 95% bounds from the binomial cumulative distribution function for the given number of games. For example, we have 2,310 games in our database where the bullpen threw zero IP the previous game. If the relationship between reliever IP and winning percentage were random, we would expect to see a winning percentage between .483 and .517 90% of the time, and we found a winning percentage of .505 for these games in our database. We can therefore conclude that any possible effect this variable has is small enough to be considered statistical noise.
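For readers who want to check the 5% and 95% bounds themselves, here is a minimal sketch that computes them exactly for a fair-coin binomial using integer arithmetic, which avoids floating-point underflow when N is large. The function names are hypothetical, and this is one way to compute such bounds, not necessarily how the graph was produced.

```python
def fair_binomial_quantile(n, q_num, q_den):
    """Smallest k with P(X <= k) >= q_num/q_den for X ~ Binomial(n, 1/2).

    With p = 1/2, P(X <= k) = sum_{i<=k} C(n, i) / 2**n, so the whole
    comparison can be done with exact integers.
    """
    total = 2 ** n
    term = 1          # C(n, 0)
    cumulative = 0
    for k in range(n + 1):
        cumulative += term
        if cumulative * q_den >= total * q_num:
            return k
        term = term * (n - k) // (k + 1)  # C(n, k + 1), exact division
    return n


def chance_band(n_games):
    """5% and 95% winning-percentage bounds if every game is a coin flip."""
    low = fair_binomial_quantile(n_games, 5, 100) / n_games
    high = fair_binomial_quantile(n_games, 95, 100) / n_games
    return low, high


# The zero-reliever-IP example: 2,310 games, observed winning pct of .505.
low, high = chance_band(2310)
inside_band = low <= 0.505 <= high
```

For 2,310 games the band should come out close to the .483 and .517 quoted above, and the observed .505 falls inside it.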
From this analysis, we conclude that the efficacy of "saving the bullpen" is overstated, and that there is little if any carryover effect on the bullpen from a longer outing by a starter. Admittedly, the methodology described here could be improved, for instance by developing a probit model (as opposed to a simple linear regression) and by using statistics that more directly measure bullpen performance. However, the results here serve as a first approximation showing that any potential dynamic effects from a long outing are most likely too small to contribute to player valuation models such as wins above replacement (WAR).
This conclusion is counter-intuitive and in some ways unsatisfying, as it suggests that innings-eaters have no value beyond their performance. Yet given the choice between a starter who contributes two WAR over 200 IP and one who contributes two WAR over 150 IP, I believe most front offices would prefer the starter who throws more innings. This suggests that we should continue to work to improve our player valuation models to account for such dynamic effects.
. . .
All statistics courtesy of Retrosheet.
Bryan Cole is a contributor at Beyond the Box Score. You can follow him on Twitter at @doctor_bryan.