Bullpen Chaining and Reliever WAR
Sub-title: Another reason closers are overpaid.
Sean Smith recently released historical WAR data for pitchers. Two major pieces of his methodology are interesting and different from what FanGraphs has done. [Edit: That's not true -- FanGraphs does use the second item below. Hooray, once again, for FanGraphs.] One, he's using ERA instead of FIP, but adjusting it for ballpark, league, and quality of defense (thanks to his own defensive metric, TotalZone). That has some advantages and disadvantages, but I'm not going to discuss them here. Two, for relievers, he's not using their actual Leverage Index (LI) but instead the average of their actual LI and 1. Why? Bullpen chaining. Here's what chaining is and why you need to account for it.
Chaining Bullpen Roles
Let's look at a typical bullpen that goes seven relievers deep, each tossing 72 innings. Their ERAs and LIs are listed below:
ERA LI
3.00 1.8
3.75 1.3
4.00 1.0
4.25 0.9
4.50 0.8
4.70 0.7
4.80 0.6
Note that the bullpen's average LI is 1.0 and their average ERA is about 4.14. Because of leverage, however, we don't really care about the average ERA; we care about the leveraged ERA. It doesn't matter whether the mop-up reliever posts a 5.00 ERA or a 2.00 ERA in blow-out innings, but it really matters how the studs perform in their high-leverage innings. Weighting each reliever's ERA by his LI (multiply ERA by LI, then average across the seven relievers), the leveraged ERA of this bullpen comes out to 3.98. Over 504 innings, they would allow 223 leveraged earned runs.
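As a rough check on that arithmetic, here's a minimal Python sketch of the leveraged-ERA calculation. It assumes every reliever throws the same 72 innings, so the leveraged ERA is just the plain average of ERA times LI. The function names are made up for illustration.

```python
# Example bullpen from the table above: (ERA, LI) for each reliever.
bullpen = [(3.00, 1.8), (3.75, 1.3), (4.00, 1.0), (4.25, 0.9),
           (4.50, 0.8), (4.70, 0.7), (4.80, 0.6)]
INNINGS_EACH = 72  # assumption from the text: equal workloads


def leveraged_era(relievers):
    """Average of ERA * LI across relievers (valid only for equal innings)."""
    return sum(era * li for era, li in relievers) / len(relievers)


def leveraged_runs(relievers, innings_each=INNINGS_EACH):
    """Leveraged earned runs allowed over the whole bullpen's innings."""
    return sum(era * li * innings_each / 9 for era, li in relievers)


print(round(leveraged_era(bullpen), 2))   # 3.98
print(round(leveraged_runs(bullpen)))     # 223
```

If the workloads weren't equal, you'd want to weight by innings as well, which this sketch doesn't attempt.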
Now, imagine that the closer goes to the DL. He's removed from the bullpen and replaced by a replacement-level reliever with a 4.85 ERA. But the new guy won't become the closer -- he's not good enough. What will actually happen is that the main setup guy is bumped up to closer duties, and everyone below him gets bumped up a notch, too, with the new guy filling in the least important role. That's what we call chaining -- everyone moves up the chain a step. Here's what the new bullpen looks like:
ERA LI
3.75 1.8
4.00 1.3
4.25 1.0
4.50 0.9
4.70 0.8
4.80 0.7
4.85 0.6
The leveraged ERA of this new bullpen is 4.33, and it will allow 242 leveraged runs, 19 more than with the bullpen ace healthy. Those 19 runs represent the actual value of the ace. He's still being compared to a replacement-level reliever, but indirectly, via chaining of bullpen roles.
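The shuffle can be sketched in a few lines of Python. This is a toy model, not anyone's official method: it assumes the roles (and their LIs) stay fixed, everyone throws 72 innings, and the replacement is worse than every incumbent, so he always lands in the last slot. The function names are hypothetical.

```python
REP_ERA = 4.85       # replacement-level reliever from the text
INNINGS_EACH = 72    # assumption from the text: equal workloads
eras = [3.00, 3.75, 4.00, 4.25, 4.50, 4.70, 4.80]   # best to worst
role_li = [1.8, 1.3, 1.0, 0.9, 0.8, 0.7, 0.6]        # LI attached to each role


def leveraged_runs(era_by_role):
    """Leveraged earned runs for a bullpen listed from top role to bottom."""
    return sum(era * li * INNINGS_EACH / 9
               for era, li in zip(era_by_role, role_li))


def chain_without(index):
    """Drop one reliever; everyone below moves up, the new guy goes last."""
    return eras[:index] + eras[index + 1:] + [REP_ERA]


healthy = leveraged_runs(eras)
injured = leveraged_runs(chain_without(0))   # ace on the DL

print(round(injured))               # 242
print(round(injured - healthy, 1))  # 19.3
```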
If you instead compare the bullpen ace directly to a replacement-level pitcher, things look a little different. Using (repERA - ERA) * IP/9 * LI and substituting in numbers, you get (4.85 - 3.00) * 72/9 * 1.8, which comes to roughly 26.5 runs above replacement, about 7.5 more than using chaining. This method is flawed, once again, because it assumes the replacement-level reliever will pitch in the same role as the injured ace.
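For comparison, here's the direct (no chaining) formula as a small Python function. The name direct_rar and its default argument are illustrative only. Unrounded, the ace comes out at about 26.6 runs, in line with the roughly 26.5 quoted above.

```python
def direct_rar(era, innings, li, rep_era=4.85):
    """Runs above replacement if the replacement took over the same role.

    Implements (repERA - ERA) * IP/9 * LI from the text.
    """
    return (rep_era - era) * innings / 9 * li


ace = direct_rar(era=3.00, innings=72, li=1.8)
print(round(ace, 2))  # 26.64
```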
Accounting
Chaining makes logical sense because it describes what actually happens when one reliever is replaced by another. But it also makes sense from an accounting standpoint. Take a replacement-level team that wins about 30% of the time, or about 48 wins over a full season. That's 33 wins worse than average. Of those 33 WAR, about 20 go to position players (9*2.25), leaving about 13 for pitchers. Of those, one-third go to relievers, as they pitch one-third of the innings, leaving 4.25 WAR for the average bullpen.
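The accounting is simple enough to write down directly. Here's a sketch in Python using the rounded figures from the paragraph above (48 replacement-level wins, 2.25 WAR per lineup spot, one-third of pitching value to relievers). The variable names are just for illustration.

```python
AVERAGE_WINS = 81          # a .500 team over 162 games
REPLACEMENT_WINS = 48      # the text's rounded figure for a ~.300 team
LINEUP_SPOTS = 9
WAR_PER_LINEUP_SPOT = 2.25
RELIEVER_SHARE = 1 / 3     # relievers pitch about one-third of the innings

total_war = AVERAGE_WINS - REPLACEMENT_WINS           # 33
position_war = LINEUP_SPOTS * WAR_PER_LINEUP_SPOT     # 20.25
pitcher_war = total_war - position_war                # 12.75
bullpen_war = pitcher_war * RELIEVER_SHARE            # about 4.25

print(total_war, position_war, pitcher_war, round(bullpen_war, 2))
```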
Using the example bullpen above, you can replace each of the seven pitchers with a replacement-level reliever and find their value via chaining. Here are the runs above replacement for each pitcher:
ERA RAR
3.00 19
3.75 8.5
4.00 6.0
4.25 4.0
4.50 2.0
4.70 1.0
4.80 0.2
In total, that's 41 RAR. At 10 runs per win, it's extremely close to the estimate above of 4.25 WAR for an average bullpen. On the other hand, if you calculate each reliever's RAR by replacing them directly with a replacement-level reliever without chaining, you find they're worth this much:
ERA RAR
3.00 26.5
3.75 11.5
4.00 7.0
4.25 4.5
4.50 2.0
4.70 1.0
4.80 0.2
Adding up these numbers yields 52.5 RAR for the average bullpen, which is a full win more than we attribute to an average bullpen. Again, advantage to chaining.
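Both tables can be reproduced with a short Python sketch. Same caveats as before: it's a toy model with fixed roles, equal 72-inning workloads, and a replacement who always slots in last, and the function names are hypothetical. The unrounded totals come out near 40.6 (chained) and 52.5 (direct).

```python
REP_ERA = 4.85        # replacement-level ERA from the text
INNINGS_EACH = 72     # every reliever is assumed to throw the same innings
eras = [3.00, 3.75, 4.00, 4.25, 4.50, 4.70, 4.80]   # best to worst
role_li = [1.8, 1.3, 1.0, 0.9, 0.8, 0.7, 0.6]        # LI attached to each role


def leveraged_runs(era_by_role):
    """Leveraged earned runs for a bullpen listed from top role to bottom."""
    return sum(era * li * INNINGS_EACH / 9
               for era, li in zip(era_by_role, role_li))


def chained_rar(index):
    """Runs lost when reliever `index` leaves and everyone below moves up."""
    reshuffled = eras[:index] + eras[index + 1:] + [REP_ERA]
    return leveraged_runs(reshuffled) - leveraged_runs(eras)


def direct_rar(index):
    """Runs lost if the replacement stepped straight into the same role."""
    return (REP_ERA - eras[index]) * INNINGS_EACH / 9 * role_li[index]


chained = [chained_rar(i) for i in range(len(eras))]
direct = [direct_rar(i) for i in range(len(eras))]

for era, c, d in zip(eras, chained, direct):
    print(f"{era:.2f}  chained {c:5.2f}  direct {d:5.2f}")
print(f"total chained {sum(chained):.1f}, total direct {sum(direct):.1f}")
```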
Adjusting WAR and LI For Chaining
Assuming you now agree that chaining is the cool new thing to do, we need a way to value one reliever at a time, both because the chaining process is annoying and because we don't want to have to force every pitcher into one of seven roles. Their actual usage is a lot more continuous.
If you compare each reliever's chaining-based RAR, their non-chaining RAR, and their non-leveraged (assuming their LI is 1) RAR, you'll find that the chaining RAR is in the middle. For the example ace, those figures are about 19, 26.5, and 15 runs. That means a reliever's effective LI needs to be about halfway between their actual LI and 1. That's why Sean Smith uses the adjustment to LI that he does: to take chaining into account.
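To see how close the halfway rule lands, here's a small Python sketch for the example ace. It backs out the LI that would make the simple formula match the chaining result (taking the 19 runs from the walk-through above as given) and compares it with (1 + LI) / 2. The function names are illustrative, not part of anyone's published method.

```python
REP_ERA = 4.85
INNINGS = 72


def unleveraged_rar(era):
    """Runs above replacement with LI fixed at 1."""
    return (REP_ERA - era) * INNINGS / 9


def rule_of_thumb_li(li):
    """The halfway adjustment: average of actual LI and 1."""
    return (1 + li) / 2


def implied_li(chained_rar, era):
    """The LI that makes unleveraged RAR * LI equal the chaining RAR."""
    return chained_rar / unleveraged_rar(era)


ACE_CHAINED_RAR = 19.0   # rounded figure from the chaining example

print(round(unleveraged_rar(3.00), 1))              # 14.8
print(round(implied_li(ACE_CHAINED_RAR, 3.00), 2))  # 1.28
print(round(rule_of_thumb_li(1.8), 2))              # 1.4
```

In this toy bullpen the implied figure for the ace sits a little below what the halfway rule gives, so the rule credits him with about 20.7 runs rather than 19.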
Now, I do think we can improve on the basic eff_LI = (1 + LI) / 2 formula with a bit of research. In fact, I bet there's a way to use some calculus. To start, we need a continuous function that describes the distribution of innings in an average bullpen for FIP and LI. Then a pitcher's actual IP, FIP, and LI are put into the system, and all the infinitesimal changes due to chaining are added up. Anyone geekier than me have any ideas on how exactly to make that work?
Even if you're going to pass on the calculus challenge, I hope the idea of chaining makes sense. Yes, closers pitch in extremely high-leverage situations, in the 1.8 to 2.2 range. But because they are not directly replaced by replacement-level pitchers, and can partially be replaced by decent relievers moving up the chain, they don't deserve the credit or the salary implied by their actual leverage. Yet another reason closers are overvalued...
Comments
So, using chaining
the cost of losing your ace reliever, assuming everyone just shifts up a slot, is about 2 wins?
by Harry Pavlidis on Apr 29, 2009 12:34 PM EDT
Yup.
Well, if he’s a 3.00 ERA guy. The studs are better than that.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 12:42 PM EDT
Just more proof
of the insanity of spending any significant cash on relievers.
G G G E-flat_______ F F F D__________....
by t ball on Apr 29, 2009 2:41 PM EDT
I don't know about that.
2 wins is still worth 9-10 million dollars, which is certainly significant for any small market team.
Not that I disagree, because of how easy it is to find RP talent in the minors, just pointing that out.
---
Juuuust a bit outside!!
http://www.rightfieldbleachers.com
by Jack Moore on Apr 29, 2009 3:35 PM EDT
Well, but there aren't many relievers
that will be two wins better than a talented cheaper guy that a team might also have available. And really, from one year to the next, most relievers are pretty unpredictable. I’m mostly talking about middle relievers with the cash comment, but even some of the better late inning guys have pretty up and down performances from year to year.
G G G E-flat_______ F F F D__________....
by t ball on Apr 30, 2009 1:37 AM EDT
Interesting
So it doesn’t matter where the “closer” pitches in the bullpen, just that he pitches in the most leveraged situations, right? I realize that most closers pitch in more high leverage situations than other bullpen guys, but does chaining refute the whole “bullpen ace” theory that I have? I feel you’re better off using your best relief pitcher in a high leverage situation, no matter when it occurs after the start of the 7th inning. So he might get 2 outs in the seventh or 2 outs in the eighth with a one run lead and the go-ahead run on 2nd base, while another reliever gets the “save” in the ninth by starting the inning, allowing one hit and a walk and then getting a double play and a strikeout. Of those, who would be in the higher leverage situation, the 7th inning guy, the 8th inning guy, or the 9th inning guy?
Sorry if this seems repetitive, I’m just not quite understanding how the leverage index works and how you’re applying it to chaining.
"I just wish that the late Harry Caray were still around so I could hear him mispronounce 'Kosuke Fukudome' every fukun' night" -- Dennis Miller
by fourstick on Apr 29, 2009 1:10 PM EDT
Right:
So it doesn’t matter where the "closer" pitches in the bullpen, just that he pitches in the most leveraged situations, right?
That’s not new, though. As a closer, the bullpen ace racks up about a 1.8 LI. If used optimally (not always in save situations) the bullpen ace might achieve a 2.0 LI for the season, which is good. Not sure how to answer your specific situations question. In general, the later the inning, the higher the leverage. The more baserunners, the higher the leverage. You can look up some specific situations here: http://www.insidethebook.com/li.shtml
As for how chaining fits in to all of this, I’m defining “roles” as bullpen ace, next best guy, third best guy, etc., based on optimizing LI. I’m not saying the best guy has to be a closer, just using that terminology because the best guys are currently being used as closers. Whoever goes down, the replacement pitcher doesn’t take their role. He gets the least important role (because he’s the worst pitcher) and everyone else that was in the bullpen gets promoted one role. The value of a reliever is the difference between the original bullpen and the new bullpen.
Any other specific questions?
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 1:27 PM EDT
great post
I go back and forth on this stuff. I wish I had been more a part of the deLI discussion, but I came in too late. I thought doing a deLI vs. actual LI might be an interesting way to (partially) judge managerial skill, if they are basically on the same scale.
I’m a bit confused as to your overall point, Sky (I just ate lunch, maybe that’s the problem) — to sum up: basically, you aren’t thrilled with (1+li)/2, but think it is the quickest and easiest solution we have now?
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on Apr 29, 2009 1:13 PM EDT
My main points:
- Chaining is necessary to properly value relievers.
- (1+LI) / 2 is ok, but can be improved, I think. My initial guess is that it doesn’t regress towards 1 quite enough, actually.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 1:29 PM EDT
Position Players
Very good explanation about how chaining works.
However, doesn’t chaining impact the replacement level for position players also? When a ‘regular’ is DL’d, it’s not the 25th man on the roster who actually replaces him, but usually the 9th best position player. And the 10th best player becomes the #1 reserve, etc. Seems like WAR calculations should also be adjusted for position player chaining?
KJOK
by KJOK on Apr 29, 2009 2:18 PM EDT
except what is "chained" for relievers is leverage
because they “earn” the situations they are employed in by managers (if the managers are smart). Position players generally play the same way every day; the team can’t control the leverage of the situation they are put in.
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on Apr 29, 2009 2:47 PM EDT
Yes, true.
Assuming the reserves are better than replacement-level, chaining would matter. My guess is that it’s less important, though.
It does bring up one point, though, that holds for bullpens, too: Chaining assumes the team already has other non replacement level players on the team. Those cost money and I wonder if it’s totally correct that they should be paid according to a perfect chaining model. Almost seems like a game theory question about perfect knowledge of the system…
D_F, playing time can be a bit like leverage.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 3:21 PM EDT
Can you explain further?
I get the gist of it. Maybe in some situations, such as platooning two guys, “leverages” their abilities, or putting in a better defender with a GB pitcher on the mound.
But in terms of Win expectancy in a game, relievers seem to be unique, or close to it.
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on Apr 29, 2009 4:11 PM EDT
There's a limit to the number of outs (and therefore PAs, for the most part) a team gets.
So a bench player will see, say, 100 PAs. And so you pay him accordingly. But when a starter goes down, he might see 400 PAs, which is better than giving a true replacement level player 300 more PAs. That true replacement level player might see 100 as the new bench player. Instead of seeing higher leverage, the chaining here gives players higher PAs. You paid the bench player like you’d give him 100 PAs, but he gets 400.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 4:30 PM EDT
hmmm
interesting. Do you think that effect is significant enough to be important in WAR calculations?
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on Apr 29, 2009 4:47 PM EDT
Not really sure for position players.
Probably not.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 5:24 PM EDT
awesomeness
THIS STORY ONLY ENDS ONE WAY
by colintj on Apr 29, 2009 3:47 PM EDT
should we adjust LI or replacement level?
I realize it’s the leverage where the chaining shows up, but it seems awkward to diddle with the individual’s LI to address discrepancies in replacement level. Performance above baseline is still more valuable in higher leverage situations, even if the baseline is higher.
It seems more accurate to correctly establish baseline than to fudge by regressing leverage….especially since there doesn’t seem to be an obvious direct relationship between replacement level and usage customs with respect to leverage. Wouldn’t it make more sense to try to define the replacement_level(LI) function and then evaluate against the adjusted replacement level over the individual’s full LI?
by scottdm on Apr 29, 2009 4:02 PM EDT
Interesting thought.
I agree that changing the LI is just a fudge factor and a temporary step to get the end result we want. It’s a shortcut for doing the whole chaining process. Maybe think of it as a “value leverage” instead of a WPA type leverage?
Changing replacement level is intriguing, but a replacement level reliever WILL have a 4.85 FIP (well, actually a bit lower in today’s run environment.) Making it higher also seems like a fudge factor, no? Just like I solved for LI using the other data to spit out the correct RAR, you could solve for repFIP using the actual LI to get the correct RAR.
The correct baseline might actually be defined as a combination of FIP and LI (and innings?) I wonder how that would look…
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 4:46 PM EDT
I think you're on to something with the 3 parts of replacement level
Especially for pitchers.
But it gets difficult. Because there’s generally a fixed starting rotation, you might not think chaining would matter for starters. But assume your ace who averages 7 innings per start goes down in April and is replaced in the rotation by a replacement level starter for his last 30 starts of the season. That replacement level starter is probably only averaging 5 innings a start or so, leaving an additional 60 innings of work for the bullpen to pick up.
I’m starting to think the easiest way to do this might be by simulation rather than by algebra or calculus. It won’t be as exact, but might be easier to model all the moving parts.
by Dan Turkenkopf on Apr 29, 2009 8:33 PM EDT
One thing that's interesting about starters who only go 5 innings is that it can be a GOOD thing.
Most relievers post better ERAs than many starters, especially 4th and 5th starters, but also some 3rd starters. So instead of 6th and 7th innings from a 4.50 to 5.00 ERA guy, you get two innings from a 4.00 to 4.50 ERA guy, and that’s not from one of the best two relievers. Sure, that taxes the bullpen a bit, but it’s a great situation for the middle of the bullpen, and it will occasionally be an opportunity to use the bottom of the bullpen.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 29, 2009 9:40 PM EDT
Leverage Component
Just to make a small correction to the article, FanGraphs does use (gmLI + 1)/2 as the leverage component for the WAR values for relievers.
by dkappelman on Apr 29, 2009 11:30 PM EDT
out of curiosity how do you figure win%
with pythag with a “2” exponent, “1.83,” or a floating exponent a la PythagenPat?
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on Apr 30, 2009 12:08 AM EDT
something else
We do it a little bit differently. We take the difference between the league run environment and the park adjusted FIP scaled to RA. Then we divide that by the dynamic run to win converter and then add (.50 - replacement level) to whatever that was.
The whole thing looks something like:
(LeagueRPG – FIP)/(RtWConverter) + (.5 – replacement level)
by dkappelman on Apr 30, 2009 3:29 PM EDT
huh
It would be cool if you or someone else at FG could post about that there or elsewhere and explain the reasoning.
I'm not a sabermetrician, but I do play one at Driveline Mechanics.
by devil_fingers on May 1, 2009 7:20 PM EDT
Ah, excellent, sorry about that Dave. I'll fix it. Skip the slander lawsuit? Thanks.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Apr 30, 2009 8:55 AM EDT
Another minor impact of chaining
The team’s leverage situation is likely going to change.
A really good team or a really bad team should have a lower average LI than a team that’s closer to average.
If a good team loses their ace reliever, then the remainder of the bullpen should be worse. They’ll be more likely to turn low leverage 3 run leads into high leverage 1 run leads. Which means that there will be more close games – and therefore a higher leverage.
I’m guessing the effect is pretty small, and may not even be noticeable, but there could be an impact. And losing a starter, even though they tend to have an average LI of around 1 could impact the relievers’ LIs / usage patterns even more because the score/inning when the starter comes out determines the reliever used.
by Dan Turkenkopf on May 5, 2009 8:24 AM EDT
Wouldn't a worse bullpen also turn more close games into larger losses?
The saber-cliche is that good and bad teams play and win about the same number of one run games, right? It’s just that good teams win a lot of games by 3+ runs while bad teams lose them by 3+ runs.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on May 5, 2009 11:38 AM EDT
Good point, even though it's counterintuitive
I need to re-read some of those studies. It seems like the winning percentages in one run games could be similar for good and bad teams, but the number of games might be different? And shouldn’t average teams play more close games?
Wouldn’t the leverage effects go in one direction, though?
Bad teams would take their big deficits and make them larger, good teams would take their big leads and make them smaller.
by Dan Turkenkopf on May 5, 2009 9:09 PM EDT
I found this in the Bill James Gold Mine from 2008
Previous research, which may be unpublished, shows that a team’s expected ratio of wins to losses in one-run games is the same as their ratio of runs scored to runs allowed.
- Bill James Gold Mine, pg 235
by Dan Turkenkopf on May 5, 2009 9:22 PM EDT
It sounds right that average teams should play more close games, but I'm not sure.
And it makes sense that while 1-run winning percentage should be close to .500 for all teams, it’s probably slightly higher for good teams and slightly lower for bad teams.
Someone needs to do a definitive study on that.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on May 6, 2009 1:20 PM EDT
The same page says .600 teams play around .550 in one-run games...
while .400 teams play around .450 in one-run games.
But it’s a throwaway line.
I agree it would be interesting to see the full results.
by Dan Turkenkopf on May 6, 2009 8:36 PM EDT
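A quick sanity check on those one-run numbers, as a Python sketch. It takes the Gold Mine statement quoted above (a team's one-run W/L ratio equals its RS/RA ratio) and adds one assumption that does not come from the book: that overall W/L follows a Pythagorean formula with an exponent of 2. The function name is hypothetical.

```python
def one_run_win_pct(overall_win_pct, exponent=2.0):
    """Expected one-run winning percentage from overall winning percentage.

    Assumes overall W/L = (RS/RA) ** exponent (Pythagorean, an assumption)
    and one-run W/L = RS/RA (the Gold Mine statement).
    """
    wl_ratio = overall_win_pct / (1 - overall_win_pct)
    run_ratio = wl_ratio ** (1 / exponent)   # back out RS/RA
    return run_ratio / (1 + run_ratio)       # convert W/L ratio to a percentage


print(round(one_run_win_pct(0.600), 3))  # 0.551
print(round(one_run_win_pct(0.400), 3))  # 0.449
```

Under that assumption, the .550 and .450 figures follow directly from the ratio rule. A different exponent would move them slightly.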