Lately, I’ve been writing a lot about stars and scrubs — the team-building approach that focuses on getting a small number of really excellent players rather than spreading that value out among a larger number of good-but-not-great players. The idea is that one six-win player is worth more than two three-win players, or three two-win players (though the exact mechanics of why that’s supposed to be the case aren’t always clear.) It can work; it often doesn’t. It seems to always grab the attention of fans, probably because our brains are more engaged by one great player than multiple good ones.
It would be nice if there were some aspect of baseball where reality matched our perception of it, at least in this small respect. Luckily, there seems to be one: the bullpen. Unlike most aspects of baseball, relief pitching has an actual mechanic by which a great pitcher and a poor pitcher could, hypothetically, perform better in the aggregate than two medium pitchers.
With position players, teams don’t get to choose (for the most part) when they get the chance to perform. Sometimes Mike Trout comes up with the bases empty, and sometimes Carlos Perez comes up with the bases loaded. Because the high-leverage situations are distributed nearly randomly between the two of them, the difference between Trout/Perez and, say, Joc Pederson/Yasmani Grandal is slim.
The same is not true of relievers. While managers might not always have their pick of the entire bullpen, they can and do play matchups, and choose when to use their relief ace and when to use the mop-up guy. A bullpen with two mediocre pitchers is going to get mediocre results in every situation; a bullpen with a great pitcher and a bad one is going to get great results in the important situations, and bad results in the less important situations, a net positive when compared to the mediocre bullpen. It’s like letting Mike Trout bat 2nd and 7th in close games, as long as Carlos Perez bats 2nd and 7th in games that aren’t close.
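To put rough numbers on that argument, here is a minimal sketch of a two-reliever bullpen. Every value in it is hypothetical: the per-inning quality figures, the two leverage weights, and the `leveraged_value` helper are assumptions chosen only to make the arithmetic visible, not measurements of any real pitcher.

```python
# Toy model: all numbers are hypothetical and exist only to illustrate the argument.
HIGH_LEVERAGE = 2.0  # assumed weight of an inning in a close game
LOW_LEVERAGE = 0.5   # assumed weight of an inning in a blowout


def leveraged_value(assignments):
    """Sum of (context-neutral value per inning) * (leverage of that inning)."""
    return sum(value * leverage for value, leverage in assignments)


# Context-neutral value per inning, relative to an average reliever.
great, bad, mediocre = 0.25, -0.25, 0.0

# Manager picks the spots: the great pitcher gets the important inning.
stars_and_scrubs = leveraged_value([(great, HIGH_LEVERAGE), (bad, LOW_LEVERAGE)])

# Two mediocre pitchers give the same result no matter who pitches when.
two_mediocre = leveraged_value([(mediocre, HIGH_LEVERAGE), (mediocre, LOW_LEVERAGE)])

# Position-player-like case: who gets the big spot is a coin flip.
flipped = leveraged_value([(bad, HIGH_LEVERAGE), (great, LOW_LEVERAGE)])
random_usage = 0.5 * stars_and_scrubs + 0.5 * flipped

print(stars_and_scrubs, two_mediocre, random_usage)
```

Both bullpens add up to the same context-neutral total (zero). In this toy setup the gap only appears once the manager is allowed to choose who pitches the important inning, and it disappears again when usage is random.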
Now, that’s all well and good in theory, but bullpens have more than two pitchers, and managers can’t predict what the leverage of the following inning will look like. Does it translate to actual practice?
To find out, I looked at the performance of relievers as measured by both WAA — wins above average, precisely the same as WAR but scaled to league average instead of replacement level (which I didn’t use for a reason, explained below) — and WPA (win probability added).
The former is “context-neutral,” meaning a strikeout with the bases loaded and two outs in the ninth is worth the same as one down seven runs in the fourth; the latter is “context-dependent,” and in fact, the only thing it measures is context. It’s computed by finding the historical odds of winning in each possible situation (e.g., runner on second, no outs, tie game in the sixth) and comparing the odds when a pitcher enters to the odds when he leaves. WPA is built around average, so I used WAA instead of WAR so that the two could be safely compared.
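As a rough illustration of the enter-versus-leave comparison, the sketch below uses a tiny lookup table of win odds. The table, its state labels, and the probabilities in it are all invented for the example; real win-expectancy tables are built from historical play-by-play data and cover far more situations.

```python
# Hypothetical win-expectancy table: game state -> odds that the pitcher's team wins.
# These probabilities are made up for illustration, not taken from real data.
WIN_EXPECTANCY = {
    "9th, up 1, bases loaded, 2 out": 0.75,
    "game over, win": 1.0,
    "4th, down 7, bases loaded, 2 out": 0.03125,
    "4th, down 7, inning over": 0.046875,
}


def win_probability_added(entry_state, exit_state, table=WIN_EXPECTANCY):
    """Change in the team's win odds between a pitcher entering and leaving."""
    return table[exit_state] - table[entry_state]


# The same strikeout, in two very different contexts.
wpa_ninth = win_probability_added("9th, up 1, bases loaded, 2 out", "game over, win")
wpa_fourth = win_probability_added("4th, down 7, bases loaded, 2 out", "4th, down 7, inning over")

print(wpa_ninth, wpa_fourth)
```

A context-neutral stat would credit those two strikeouts equally. In this made-up table the ninth-inning one moves the win odds sixteen times as much, which is the whole difference between the two kinds of metric.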
Weighting by each pitcher’s innings pitched, I looked at the relationship between the two. I could show you a plot, but it’s just a mess of points, and while there’s an obvious relationship — pitchers who are generally talented tend to improve their team’s odds of winning, unsurprisingly — what we’re interested in is the slope of that relationship.
In this context, the slope conveys how WAA translates to WPA. For example, if we look at all batters, the slope is .58, meaning that, on average, each additional WAA yields a little over half a win of WPA. For starting pitchers, on the other hand, the slope is .87. (Sidenote: I can’t think of a way to explain the difference there. Interesting!) For relievers, however, the slope is even higher, at 1.09.
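For anyone who wants to try this at home, here is one way a slope like that could be computed: a weighted least-squares fit of WPA on WAA, with innings pitched as the weights. This is a sketch of one plausible reading of "weighting by innings pitched," not necessarily the exact calculation behind the numbers above, and the reliever seasons in it are made up.

```python
def weighted_slope(x, y, weights):
    """Slope of the weighted least-squares line of y on x."""
    total = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted covariance of x and y, and weighted variance of x.
    cov = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    var = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    return cov / var


# Made-up reliever seasons: (innings pitched, WAA, WPA).
relievers = [
    (70.0, 1.5, 2.0),
    (65.0, 0.5, 0.4),
    (60.0, 0.0, 0.1),
    (55.0, -0.5, -0.7),
    (40.0, -1.0, -0.9),
]
innings = [r[0] for r in relievers]
waa = [r[1] for r in relievers]
wpa = [r[2] for r in relievers]

slope = weighted_slope(waa, wpa, innings)
print(f"WPA gained per additional WAA: {slope:.2f}")
```

A slope above 1 would mean each context-neutral win is turning into more than one context-dependent win; a slope below 1 would mean some of it is being lost along the way.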
To translate those values into something understandable, investing in one WAA (one context-neutral win) of position player translates to about .6 WPA (three-fifths of a context-dependent win) because some of that performance gets wasted in situations that don’t matter. For relievers, on the other hand, the conversion is one WAA to over one WPA, because the great performances can be arranged and maximized instead of wasted.
Obviously, proving causality is hard, but one way for this to happen is if the good pitchers are pitching in the high-leverage moments, and the bad pitchers in the low-leverage. And it’s, uh, kinda nuts (!) that the ability to choose when you use a given reliever is strong enough to show up in the data like this.
What that means is that, while the stars-and-scrubs approach doesn’t seem to work in most areas of the game, it might pay off in bullpens. One Andrew Miller is worth more than two Brad Brachs, even if their WARs would suggest otherwise. In the latter, you get all Brach, all the time; in the former, you get Miller in the critical situations, and, say, Scott Feldman in the blowouts.
There’s been a lot of collective head-scratching in the baseball world over the enormous price tags that certain high-end relievers have fetched over the last few months. It was all too easy to remember when “proven closers” were a thing and dismiss it as the irrational moves of behind-the-times executives. However, it seems like there’s not just a reason, but a good reason for teams to shell out more for the very best relievers.