Bullpen Chaining and Reliever WAR

via <a href="http://farm1.static.flickr.com/51/148661872_99858dd6ee.jpg?v=0">farm1.static.flickr.com</a>

Sub-title: Another reason closers are overpaid.

Sean Smith recently released historical WAR data for pitchers.  There are two major pieces of his methodology that are interesting and different from what Fangraphs has done. [Edit: That's not true -- FG does use the second item below.  Hooray, once again, for Fangraphs.]  One, he's using ERA instead of FIP, but adjusting it for ballpark, league, and quality of defense (thanks to his own defensive metric, TotalZone).  That has some advantages and disadvantages, but I'm not going to discuss them here.  Two, for relievers, he's not using their actual Leverage Index (LI) but instead the average of their actual LI and 1.  Why?  Bullpen chaining.  Here's what chaining is and why you need to account for it.

Chaining Bullpen Roles

Let's look at a typical bullpen that goes seven relievers deep, each tossing 72 innings.  Their ERAs and LIs are listed below:

ERA  LI
3.00  1.8
3.75  1.3
4.00  1.0
4.25  0.9
4.50  0.8
4.70  0.7
4.80  0.6

Note that the bullpen's average LI is about 1.0 and their average ERA is 4.14.  Because of leverage, however, we don't really care about the average ERA; we care about the leveraged ERA.  It doesn't matter whether the mop-up reliever posts a 5.00 ERA or a 2.00 ERA in blow-out innings, but it really matters how the studs perform in their high-leverage innings.  Weighting each reliever's ERA by their LI (multiply each ERA by its LI, then average over the seven relievers), the leveraged ERA of this bullpen comes out to 3.98.  Over 504 innings, they would allow 223 leveraged earned runs.
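As a quick check on that arithmetic, here is a minimal Python sketch. The function and variable names are made up for illustration, and it assumes the leveraged ERA is the sum of ERA times LI divided by the number of relievers, which is what reproduces the 3.98 figure.

```python
# Example bullpen from the table above: (ERA, LI) per reliever, best to worst.
bullpen = [(3.00, 1.8), (3.75, 1.3), (4.00, 1.0), (4.25, 0.9),
           (4.50, 0.8), (4.70, 0.7), (4.80, 0.6)]
INNINGS_EACH = 72  # every reliever throws 72 innings in this example


def leveraged_era(pen):
    """LI-weighted ERA, averaged over the number of relievers.

    Assumption: divide by the head count rather than by the sum of LI.
    The two are nearly the same here because the LIs average about 1.0.
    """
    return sum(era * li for era, li in pen) / len(pen)


def leveraged_runs(pen, innings_each=INNINGS_EACH):
    """Leveraged earned runs allowed across the whole bullpen."""
    return sum(era * li * innings_each / 9 for era, li in pen)


print(round(leveraged_era(bullpen), 2))  # 3.98
print(round(leveraged_runs(bullpen)))    # 223
```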

Now, imagine that the closer goes to the DL.  He's removed from the bullpen and replaced by a replacement-level reliever with a 4.85 ERA.  But the new guy won't become the closer -- he's not good enough.  What will actually happen is that the main setup guy is bumped up to closer duties, and everyone below him gets bumped up a notch, too, with the new guy filling in the least important role.  That's what we call chaining -- everyone moves up the chain a step.  Here's what the new bullpen looks like:

ERA  LI
3.75  1.8
4.00  1.3
4.25  1.0
4.50  0.9
4.70  0.8
4.80  0.7
4.85  0.6

The leveraged ERA of this new bullpen is 4.33, and it will allow 242 leveraged runs, 19 more than with the bullpen ace healthy.  Those 19 runs represent the actual value of the ace.  He's still being compared to a replacement-level reliever, but indirectly, via chaining of bullpen roles.
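The chaining step itself is easy to sketch in Python: drop one reliever, slide everyone below him up a role, and put the replacement-level arm in the last slot. The helper names below are hypothetical, and the roles are assumed to keep the same LI no matter who fills them, as in the example.

```python
REP_ERA = 4.85      # replacement-level reliever from the example
INNINGS_EACH = 72

eras = [3.00, 3.75, 4.00, 4.25, 4.50, 4.70, 4.80]  # best to worst
lis = [1.8, 1.3, 1.0, 0.9, 0.8, 0.7, 0.6]          # LI attached to each role


def leveraged_runs(role_eras, role_lis=lis, innings_each=INNINGS_EACH):
    """Leveraged earned runs for a bullpen, one ERA per role."""
    return sum(e * l * innings_each / 9 for e, l in zip(role_eras, role_lis))


def chain_without(role_eras, index, rep_era=REP_ERA):
    """Remove the reliever at `index`; everyone below moves up one role
    and a replacement-level pitcher takes the last role."""
    return role_eras[:index] + role_eras[index + 1:] + [rep_era]


healthy = leveraged_runs(eras)
without_ace = leveraged_runs(chain_without(eras, 0))

print(round(without_ace))            # 242
print(round(without_ace - healthy))  # 19
```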

If you instead compare the bullpen ace directly to a replacement-level pitcher, things look a little different.  Using (repERA - ERA) * IP/9 * LI and substituting in numbers you get (4.85 - 3.00) * 72/9 * 1.8, which equals 26.5 runs above replacement, 7.5 more than using chaining.  This method is flawed, once again, because it assumes the replacement-level reliever will pitch in the same role as the injured ace.
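For comparison, the direct formula is a one-liner. This sketch uses a made-up function name and the example's defaults (72 innings, 4.85 replacement ERA). Unrounded, the ace comes out at about 26.6 runs, which the figure above rounds to the nearest half run.

```python
def direct_rar(era, li, innings=72, rep_era=4.85):
    """Runs above replacement with no chaining:
    (repERA - ERA) * IP/9 * LI."""
    return (rep_era - era) * innings / 9 * li


ace = direct_rar(3.00, 1.8)
print(round(ace, 1))  # 26.6
```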

Accounting

Chaining makes logical sense because it describes what actually happens when one reliever is replaced by another.  But it also makes sense from an accounting standpoint.  Take a replacement-level team that wins about 30% of the time, for about 48 wins over a full season.  That's 33 wins worse than average.  Of those 33 WAR, about 20 go to position players (9*2.25), leaving about 13 for pitchers.  Of those, one-third go to relievers, as they pitch about one-third of the innings, leaving 4.25 WAR for the average bullpen.
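The same budget in a few lines of Python, as a rough sketch. It uses the rounded figures from the paragraph (48 replacement wins, 2.25 WAR per lineup spot) and assumes an average team wins 81 games, half of a 162-game season.

```python
AVERAGE_WINS = 81        # assumption: half of a 162-game season
REPLACEMENT_WINS = 48    # rounded figure for a roughly .300 team
LINEUP_SPOTS = 9
WAR_PER_LINEUP_SPOT = 2.25

total_war = AVERAGE_WINS - REPLACEMENT_WINS        # wins an average team has above replacement
position_war = LINEUP_SPOTS * WAR_PER_LINEUP_SPOT  # share given to position players
pitcher_war = total_war - position_war             # what is left for all pitchers
bullpen_war = pitcher_war / 3                      # relievers throw about a third of the innings

print(total_war, position_war, pitcher_war, bullpen_war)  # 33 20.25 12.75 4.25
```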

Using the example bullpen above, you can replace each of the seven pitchers with a replacement-level reliever and find their value via chaining.  Here's the runs above replacement for each pitcher:

ERA RAR
3.00  19
3.75  8.5
4.00  6.0
4.25  4.0
4.50  2.0
4.70  1.0
4.80  0.2

In total, that's 41 RAR.  At 10 runs per win, that's 4.1 WAR, extremely close to the estimate above of 4.25 WAR for an average bullpen.  On the other hand, if you calculate each reliever's RAR by replacing them directly with a replacement-level reliever without chaining, you find they're worth this much:

ERA RAR
3.00  26.5
3.75  11.5
4.00  7.0
4.25  4.5
4.50  2.0
4.70  1.0
4.80  0.2

Adding up these numbers yields 52.5 RAR for the average bullpen, which is a full win more than the 4.25 WAR we budgeted for an average bullpen.  Again, advantage to chaining.
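Both tables can be regenerated with a short Python sketch. The helper names are hypothetical, and the per-pitcher values are left unrounded, so they differ slightly from the tables, which round to roughly the nearest half run.

```python
REP_ERA = 4.85
INNINGS_EACH = 72

eras = [3.00, 3.75, 4.00, 4.25, 4.50, 4.70, 4.80]  # best to worst
lis = [1.8, 1.3, 1.0, 0.9, 0.8, 0.7, 0.6]          # LI attached to each role


def leveraged_runs(role_eras):
    """Leveraged earned runs for a bullpen, one ERA per role."""
    return sum(e * l * INNINGS_EACH / 9 for e, l in zip(role_eras, lis))


def chained_rar(index):
    """Value of reliever `index`: extra leveraged runs allowed when he is
    removed, everyone below moves up, and a replacement fills the last role."""
    without = eras[:index] + eras[index + 1:] + [REP_ERA]
    return leveraged_runs(without) - leveraged_runs(eras)


def direct_rar(index):
    """Value of reliever `index` if the replacement inherits his exact role."""
    return (REP_ERA - eras[index]) * INNINGS_EACH / 9 * lis[index]


chained = [chained_rar(i) for i in range(len(eras))]
direct = [direct_rar(i) for i in range(len(eras))]

for era, c, d in zip(eras, chained, direct):
    print(f"{era:.2f}  chained {c:5.1f}  direct {d:5.1f}")

print(round(sum(chained), 1), round(sum(direct), 1))  # 40.6 52.5
```

The gap between the two totals is about 12 runs, or a bit more than one win at 10 runs per win.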

Adjusting WAR and LI For Chaining

Assuming you now agree that chaining is the cool new thing to do, we need a way to value one reliever at a time, both because the chaining process is annoying and because we don't want to have to force every pitcher into one of seven roles.  Their actual usage is a lot more continuous.

If you compare each reliever's chaining-based RAR, their non-chaining RAR, and their non-leveraged (assuming their LI is 1) RAR, you'll find that the chaining RAR falls between the other two.  That means a reliever's effective LI needs to be about halfway between their actual LI and 1.  That's why Sean Smith uses the adjustment to LI that he does: to take chaining into account.
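To see how the halfway rule behaves for the example ace, here is a small Python sketch. The names are illustrative, and the 19-run chaining value is taken from the example above rather than recomputed.

```python
def rar(era, li, innings=72, rep_era=4.85):
    """Runs above replacement at a given leverage: (repERA - ERA) * IP/9 * LI."""
    return (rep_era - era) * innings / 9 * li


def effective_li(li):
    """Halfway adjustment: the average of the actual LI and 1."""
    return (1 + li) / 2


ACE_ERA, ACE_LI = 3.00, 1.8
CHAINED_ACE_RAR = 19  # rounded chaining value of the ace from the example

unleveraged = rar(ACE_ERA, 1.0)               # LI treated as 1
direct = rar(ACE_ERA, ACE_LI)                 # actual LI, no chaining
halfway = rar(ACE_ERA, effective_li(ACE_LI))  # LI of 1.4

print(round(unleveraged, 1), round(halfway, 1), round(direct, 1))  # 14.8 20.7 26.6
print(round(CHAINED_ACE_RAR / unleveraged, 2))  # LI implied by chaining: 1.28
```

For this one pitcher the halfway rule gives about 20.7 runs against 19 from chaining, so it is an approximation rather than an exact match.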

Now, I do think we can improve on the basic eff_LI = (1 + LI) / 2 formula with a bit of research.  In fact, I bet there's a way to use some calculus.  To start, we need a continuous function that describes the distribution of innings in an average bullpen for FIP and LI.  Then a pitcher's actual IP, FIP, and LI are put into the system, and all the infinitesimal changes due to chaining are added up.  Anyone geekier than me have any ideas on how exactly to make that work?

Even if you're going to pass on the calculus challenge, I hope the idea of chaining makes sense.  Yes, closers pitch in extremely high-leverage situations, in the 1.8 to 2.2 range.  But because they are not directly replaced by replacement-level pitchers, and can partially be replaced by decent relievers moving up the chain, they don't deserve the credit or the salary implied by their actual leverage.  Yet another reason closers are overvalued...

H/T to Tom Tango, Guy, and others.