Over the past month, Bradley Woodrum of FanGraphs has been researching the changing nature of stolen bases. In his first piece, Bradley noted one crucial variable that affects the break-even rate for stolen base success - that is, at what rate must the runner steal successfully for the benefit of stealing to match the cost of getting caught? The variable that most strongly affects this rate? Home runs.
I'll let Bradley explain:
Unsurprisingly, the relationship between homers and the run values of steals are considerably and inversely related: The break-even rate for SB success shares a .685 R-squared with HR rates, when looking at MLB seasons from 1950 through 2012.
He goes on to calculate the break-even SB success rate for each team, given their respective home run rates. The results are very interesting, showing that teams like the Yankees need to steal at a higher rate (break-even of 72%) in order for their stolen bases to be effective, while teams like the Giants should be stealing more often since their break-even is only 65%.
As I wrote in a response to this research, I see a slight problem with this conclusion - home run ability is not distributed evenly across a team. Take the 2012 Blue Jays. They hit 198 home runs, but almost half of those home runs (92) came from three players: Edwin Encarnacion, Jose Bautista, and Colby Rasmus. It follows that the break-even rate for stolen bases should be higher when these three hitters are up than when the rest of the team is up.
That's not all, though. While Bradley chose to just look at home run rate, there are other factors that affect the SB break-even rate. A player with a high walk rate will raise the break-even point, since a stolen base is useless if the batter proceeds to draw a walk.
Based on these factors, I figured I would delve deeper into Bradley's research. Instead of finding break-even SB rates for each team based solely on home runs, I wanted to find the break-even rates (just stealing second, for now) for each individual hitter based on all outcomes. It's a daunting task to be sure, but if you're up for it, so am I.
Warning: if you don't care about how I went about calculating these numbers, you can just go ahead and skip to the Results section.
Here was my initial (and by initial I mean after a lot of trial and error) plan of action:
1) Using data from the 2012 season, figure out the chances of each outcome (single, walk, strikeout, double play, etc.) for each batter (let's say minimum 300 PAs).
2) Determine the run expectancy in the inning if there is a runner on first and no outs (no SB), a runner on second and no outs (SB), and no one on and one out (CS) - all based on the probabilities in step one.
3) Using those RE values, calculate the break-even rate for a stolen base if there are no outs.
4) Repeat steps 2 and 3 for SB attempts with one out and two outs.
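Step 3 boils down to one expression. As a minimal sketch, assuming an attempt breaks even when its expected run value equals the run expectancy of staying put, the break-even rate p solves p * RE(SB) + (1 - p) * RE(CS) = RE(no attempt). The function name and the run expectancy values below are illustrative placeholders, not numbers from my spreadsheet.

```python
def break_even_rate(re_stay: float, re_sb: float, re_cs: float) -> float:
    """Success rate p at which p*re_sb + (1 - p)*re_cs equals re_stay.

    re_stay: run expectancy with the runner still on first (no attempt)
    re_sb:   run expectancy after a successful steal of second
    re_cs:   run expectancy after a caught stealing (bases empty, one more out)
    """
    if re_sb <= re_cs:
        raise ValueError("a successful steal must be worth more than a caught stealing")
    return (re_stay - re_cs) / (re_sb - re_cs)


# Hypothetical run expectancies for the no-out case (placeholders only)
example = break_even_rate(re_stay=0.85, re_sb=1.10, re_cs=0.25)
print(f"{example:.3f}")
```

Anything that raises the value of staying put, such as a hitter who homers or walks a lot, pushes the numerator up and the break-even rate with it.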
Steps one and two were much more difficult than I initially thought. I first thought I would just look at singles, doubles, triples, home runs, walks, and strikeouts, and the rest would just be outs on balls in play (BIP-outs). Nope. Not only did I need to also consider hit-by-pitches, double plays, and reach-on-errors, but not all hits or BIP-outs are the same. I needed to account for the runner taking an extra base on singles and doubles, as well as advancing on BIP-outs.
To do so, I used the great data freely available at Baseball-Reference for league-wide numbers on advancing to extra bases on singles/doubles as well as advancing to third on a BIP-out. I then converted these numbers to percentages and applied them to the outcomes when I calculated the RE values.
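To give a rough idea of what "applying" those percentages means, here is a simplified sketch for one outcome: a single with a runner on first. It assumes the runner either stops at second or takes third (ignoring the rare cases where he scores or is thrown out), and it weights the run expectancy of each resulting state by the league advancement rate. The table keys, the function name, and every number are hypothetical placeholders.

```python
def re_after_single_runner_on_first(p_first_to_third, re_table, outs):
    """Expected run expectancy after a single with a runner on first.

    Assumption: the runner ends up on second with probability
    1 - p_first_to_third and on third with probability p_first_to_third.
    """
    re_first_second = re_table[("12", outs)]  # runners on first and second
    re_first_third = re_table[("13", outs)]   # runners on first and third
    return (1 - p_first_to_third) * re_first_second + p_first_to_third * re_first_third


# Hypothetical league run expectancies and advancement rate (placeholders only)
league_re = {("12", 0): 1.45, ("13", 0): 1.75}
value = re_after_single_runner_on_first(0.28, league_re, outs=0)
```

The same weighting applies to doubles and BIP-outs, each with its own advancement rate.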
This was all well and good, but the RE numbers as a whole weren't quite matching the league-wide numbers found here. Then I came to a realization: the distribution of the various outcomes isn't the same for each out state. Batters hit more singles, doubles, triples, and home runs with 0 outs, but they get way more walks with 2 outs. Not to mention the fact that double plays only occur with fewer than 2 outs and a runner on first. I couldn't just assume the same distribution of outcomes regardless of base-out state.
Luckily, B-R came to the rescue once again. I found the total numbers for each out state, converted them to percentages, and normalized around 100% (as in, if singles occur 3% more often than average with 0 outs, they are listed as 103%). I then used these normalized percentages to adjust the season-wide percentages calculated in 1) above.
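Here is a minimal sketch of that adjustment. Each season-wide rate is scaled by its out-state index (100 = league average for that out state). The final renormalization, which forces the adjusted rates to sum to 1, is an assumption added here so the result stays a valid probability distribution. The outcome names and numbers are hypothetical.

```python
def adjust_for_outs(season_rates, out_state_index):
    """Scale season-wide outcome rates by an out-state index, then renormalize.

    season_rates:    outcome -> probability over the whole season (sums to 1)
    out_state_index: outcome -> index for this out state, where 100 is average;
                     outcomes missing from the index are left at 100
    """
    scaled = {
        outcome: rate * out_state_index.get(outcome, 100.0) / 100.0
        for outcome, rate in season_rates.items()
    }
    total = sum(scaled.values())
    # Assumption: renormalize so the adjusted rates still sum to 1
    return {outcome: rate / total for outcome, rate in scaled.items()}


# Hypothetical batter and zero-out indexes (placeholders only)
season = {"single": 0.15, "walk": 0.08, "other": 0.77}
zero_out_index = {"single": 103.0, "walk": 95.0}
adjusted = adjust_for_outs(season, zero_out_index)
```

With these placeholder indexes, the batter's single rate rises a little and his walk rate falls a little with nobody out.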
At this point, you're probably tired of reading about my process, so I'll just show you the results. I want to make it clear, however, that these are not perfect numbers - there are a ton of variables that go into the run expectancy of any given situation, as well as a ton of variables that go into the stolen base break-even rate. However, the results, as you will see, do match intuition, and give us a good idea of which players have the highest and lowest break-even rates.
You can find a full spreadsheet of this information here.
|Name||Break-even - 0 outs|
There are some surprising names at the top of this list. While many of these players are obvious choices due to their exceptional on-base ability and/or power, guys like Gregor Blanco and Brandon Belt might be somewhat unexpected. The variable that really helps Blanco and Belt in particular is double plays. Both players did a great job at avoiding double plays with a runner on first, thus mitigating the positive effect of the stolen base. On the other hand, guys at the bottom of the list like Dayan Viciedo and Michael Young hit into a ton of double plays last year, which makes the stolen base much more valuable.
|Name||Break-even - 1 out|
The break-even rate with one out is pretty similar to no outs, which is expected. The double play is still a factor, and the possibility of killing a rally remains an important consequence of caught stealing.
|Name||Break-even - 2 outs|
With two outs, we see a pretty significant difference in the list. Instead of being headed by a mix of on-base guys, power guys, and no-double-play guys, the top 10 is almost entirely players that have elite power. It's not just power, however. You'll notice that Ryan Braun, Miguel Cabrera, and Josh Hamilton are notably missing. This is because singles are much more positive with a runner on second than with a runner on first, especially with two outs - consequently, hitters with contact ability will see a lower break-even rate.
Just by looking at these lists, we can come to some conclusions that should be fairly intuitive:
- The most important factors for SB break-even rates are on-base ability, power, and likelihood of hitting into a double play.
- With no outs and one out, on-base ability and, obviously, double play propensity have a much more significant effect on the break-even rate than they do with two outs.
- With two outs, the most important factor is home run power. However, hitters who make less contact will have a higher break-even rate, all else being equal.
The above data is really just the tip of the iceberg for this concept. We've seen which hitters were best and worst to "steal on", so to speak, but going forward, I will take a look at what teams actually did with these hitters at bat. I'll also use projections for 2013 to come up with similar lists as above, and use those to hypothesize about some potential strategies for lineup construction using this information.
Thanks to Baseball Reference, Baseball Prospectus, and FanGraphs for the data. Also, a big thanks to Spencer Schneier for the conceptual help and James Gentile for the SQL help.