Learning R: On a quest to calculate the odds of 400/400

Simulating a season is easy enough, but simulating a career gets a little trickier.

MLB: AUG 31 Red Sox at Angels Photo by Brian Rothmuller/Icon Sportswire via Getty Images

One of my favorite genres of baseball facts is the kind that makes you think, “Wait, that can’t be right,” and prompts you to double- and triple-check to make sure you’re not missing anything. Barry Bonds’s career is chock-full of these sorts of facts. An excellent example is what would have happened if Bonds had played the 2004 season without a bat. The conclusion that Bonds would have posted a .608 on-base percentage without a bat prompted Jon Bois to say, “I have to be some idiot who’s full of crap,” and honestly, what Bois felt is the weirdest of highs.

A less extreme example of this feeling is remembering that not only is Bonds the only member of the 500/500 club (500 home runs, 500 stolen bases), he’s also the only member of the 400/400 club. Every time I remember that, I have to Google whether it’s actually right because it can’t possibly be true. Are you telling me that Willie Mays didn’t steal 400 bases? He didn’t. He “only” stole 338, and of the players with at least 400 home runs, Mays was the closest to 400 steals. On the flip side, of the players with at least 400 steals, the closest anyone got to 400 homers was Bonds’s father, Bobby Bonds.

Bonds won’t stand alone forever. Surely, someone will join him eventually, and there are more than a few active candidates. Mike Trout is an obvious prediction. At 28, Trout currently has 295 homers and 201 stolen bases. Trout will almost certainly clear 500 homers, let alone 400, but who knows if he’s going to clear 400 stolen bases. Trout’s an excellent base stealer (he has a success rate of 84.8 percent), but he hasn’t been stealing as often lately. He hasn’t stolen more than 30 bases in a season since 2016, and since the beginning of last season, he’s only swiped 12 bags.

At The Baseball Gauge, Dan Hirsch gives Trout a 12.3 percent chance of reaching 400 stolen bases. This doesn’t include 2020 data, and missing over 100 games will certainly knock this down further. Not to mention that Trout has attempted to steal only one base this year as of Saturday morning.

The Baseball Gauge

Before this year, Trout actually had a better chance of hitting 763 home runs and becoming the new home run king than he did of becoming the second member of the 400/400 club.

There are four active players under the age of 30 with at least 100 homers and 100 stolen bases. Those four are Mike Trout, Mookie Betts, Christian Yelich, and José Ramírez. Of those four, only Trout is outpacing Bonds in home runs. Through age 26, Betts was three home runs behind Bonds. Yelich and Ramírez are awfully close considering they were late bloomers.

Data courtesy of the Lahman Database

Unsurprisingly, Bonds maintains a substantial lead in stolen bases. Bonds came up in the ’80s when stolen bases were still cool. Per The Baseball Gauge, Bonds’s rookie year, 1986, was a 75th percentile year for stolen bases per game played. 2019 was in the 35th percentile. Despite coming up in an era when catcher pop times are better, pitchers are better about holding on runners, and teams are disinclined to run unless they’re assured of success 70 percent of the time or more, Trout was outpacing Bonds until his age-26 season.

Data courtesy of the Lahman Database

What I would like to do is devise some way to calculate the chances of a player reaching the 400/400 milestones, but I suspect that’s going to require some tools that aren’t in my box yet. Calculating the chances of Donovan Solano or Charlie Blackmon (or any other hitter) hitting .400 in a season is pretty straightforward, but that same method can’t be used over the course of a career.

My method for that relied on memoryless calculations. The chances of a hitter getting a hit in the first week of the season are roughly the same as the chances of that hitter getting a hit in the last week of the season. Players exhibit varying levels of preparedness and fatigue over the course of a season, but the variance usually isn’t so great as to completely throw off approximations like the ones I made for Solano and Blackmon, especially in a 60-game season.
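To make the memoryless idea concrete, here is a minimal sketch in Python (chosen purely for illustration). It treats every at-bat as an independent trial with the same hit probability and adds up the binomial tail. This is not necessarily the calculation used for Solano and Blackmon; the function name, the .330 true-talent average, and the at-bat counts are all hypothetical placeholders.

```python
from math import ceil, comb


def prob_avg_at_least(target_avg, at_bats, true_avg):
    """Chance of finishing at or above target_avg over at_bats.

    Assumes every at-bat is an independent trial with the same hit
    probability true_avg (the memoryless assumption described above).
    """
    # Smallest hit total that reaches the target; the tiny tolerance
    # guards against floating-point noise in target_avg * at_bats.
    needed = ceil(target_avg * at_bats - 1e-9)
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(needed, at_bats + 1)
    )


# Hypothetical inputs: a .330 true-talent hitter in a short season
# (220 at-bats) versus a full season (600 at-bats).
short_season = prob_avg_at_least(0.400, 220, 0.330)
full_season = prob_avg_at_least(0.400, 600, 0.330)
print(f"short season: {short_season:.4f}, full season: {full_season:.6f}")
```

With the same assumed talent level, the short-season probability comes out larger than the full-season one, which is why a 60-game schedule makes a .400 run less far-fetched. The same trick breaks down over a career because the hit probability itself changes with age.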

Any sort of predictive model for a player’s career needs to consider more context, and the probabilities for certain events are going to change. By now, we’re all aware of player aging curves. Players tend to peak around age 27 and decline into their 30s. Typically, players steal less when they’re older because they hit worse and they’re slower. Getting on base less often means they have fewer opportunities to steal, and being slower means they’ll be less successful when they try.

Bonds kept mashing through his 30s, but his stolen base numbers really dropped off in his early 30s. Like all players, Bonds slowed down, but he also dealt with nagging knee injuries. Even if Bonds had had perfect knee health, he would have stolen fewer bases at 38 than he did at 28.

Back in 2013, Jeff Zimmerman and Mike Podhorzer depicted the stolen base aging curve and found that 20-steal players lost about 8 steals per year off their peak once they hit their 30s.

Jeff Zimmerman | RotoGraphs

Anything I build to predict career stolen base totals will need to reflect this curve. I’m not entirely sure how I’m going to do that just yet, but I’m hoping to figure it out over the next couple of weeks.
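One rough way to fold an aging curve into a simulation is sketched below in Python. It is only an illustration of the idea, not the model this series will end up with. The linear decline, the 25-attempt peak, the retirement age, and every function name are made-up assumptions. The curve is a placeholder and not the Zimmerman and Podhorzer curve. Only the 201 career steals and the 84.8 percent success rate come from the Trout numbers above.

```python
import random


def expected_attempts(age, peak_attempts=25.0, decline_per_year=3.0,
                      decline_starts=30):
    """Hypothetical aging curve: flat through decline_starts, then a
    linear drop in steal attempts per season, floored at zero."""
    years_past = max(0, age - decline_starts)
    return max(0.0, peak_attempts - decline_per_year * years_past)


def simulate_career_steals(current_sb, age, retire_age, success_rate,
                           rng, games=150):
    """Simulate one career: each game has a chance of one steal attempt,
    and each attempt succeeds with probability success_rate."""
    total = current_sb
    for season_age in range(age, retire_age):
        # The per-game attempt chance shrinks as the aging curve declines.
        p_attempt = min(1.0, expected_attempts(season_age) / games)
        for _ in range(games):
            if rng.random() < p_attempt and rng.random() < success_rate:
                total += 1
    return total


def chance_of_reaching(target, n_sims=1000, seed=0, **career):
    """Share of simulated careers that finish with at least target steals."""
    rng = random.Random(seed)
    reached = sum(
        simulate_career_steals(rng=rng, **career) >= target
        for _ in range(n_sims)
    )
    return reached / n_sims


# Trout-like starting point; everything about the future is assumed.
estimate = chance_of_reaching(400, current_sb=201, age=29, retire_age=40,
                              success_rate=0.848)
print(f"simulated chance of 400 steals: {estimate:.3f}")
```

Unlike the single-season calculation, the probability of an attempt here depends on age, so the trials are no longer identical from year to year. A fuller version would also let the success rate and playing time decline, and would swap the placeholder curve for one fitted to real data.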


Kenny Kelly is the managing editor of Beyond the Box Score. You can follow him on Twitter @KennyKellyWords.