Learning R: A losing team wins the World Series

Congratulations to the 29-31 Milwaukee Brewers on their championship season!

Should the 2020 season conclude, a whopping 16 teams will advance to the playoffs. This is, mathematically, too many teams. Even in years where a third of the league isn’t actively trying to be bad at baseball, it’s rare that more than half the majors is actually any good. Good-not-great teams have won the World Series before (see: the 2010, 2012, and 2014 San Francisco Giants)*, but I don’t know that we’ve ever seen a bad team become World Champions. Maybe 2020 is the year.

It’s incredibly likely that at least one team will advance to the postseason with a losing record. Last week, with some code borrowed from Jim Albert, I found that with the playoffs expanded to 16 teams in a 60-game season, the most common win total for a Wild Card team was 31.

It was just as likely for a 29-win team to get a wild card spot as it was for a 33-win team to get one. That’s partly because the more games a team wins, the greater its chance of claiming one of its division’s two automatic berths instead.

In one particular simulation, the 29-31 Brewers and the 28-32 Reds both received a playoff spot. Note that in these simulations, talent is randomly assigned, so the identity of the teams doesn’t matter. Sorry, Mariners fans.

I left off last week knowing how to simulate an individual series, but I couldn’t figure out how to simulate an entire postseason without replicating that process 15 times. I suspected that a well-written function would dramatically cut down on the time needed and make it easy to replicate the results.

Reader, I still can’t figure it out.

Instead, I manually simulated each series, which really didn’t take as much time as I thought it would. It was certainly quicker than learning how to write a complex function.

To get the playoff picture for both leagues, I used the filter() function to return only the teams that won a division spot or a wild card, and created a separate data frame for each league. Then I arranged each league by wins in descending order.
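A minimal sketch of that filtering step, assuming a hypothetical data frame called `results` with one row per team. The column names (`Team`, `League`, `Wins`, `Playoff`) and the values are invented for illustration and are not the simulated data from this post.

```r
library(dplyr)

# Hypothetical regular-season results; names and numbers are made up.
results <- data.frame(
  Team    = c("A", "B", "C", "D", "E", "F"),
  League  = c("AL", "AL", "AL", "NL", "NL", "NL"),
  Wins    = c(39, 27, 33, 29, 36, 25),
  Playoff = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Keep only the playoff teams in each league, best record first.
AL.Seeds <- results %>%
  filter(League == "AL", Playoff) %>%
  arrange(desc(Wins))

NL.Seeds <- results %>%
  filter(League == "NL", Playoff) %>%
  arrange(desc(Wins))
```

After arrange(), the row number doubles as the seed, which is what makes slice() convenient for picking matchups.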

The first series I simulated was the number one seed Royals against the number eight seed Astros. To do this, I took some code straight out of Analyzing Baseball Data with R.

> AL.WC.Round <- AL.Seeds %>%
+   slice(1, 8) %>%
+   mutate(outcome = rmultinom(1, 3, prob),
+          Winner.Series = ifelse(outcome > 1, 1, 0))
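For anyone else who had to look up rmultinom(), here is a small standalone sketch of what that call appears to do, assuming `prob` holds each team's relative strength. The two strength values below are hypothetical.

```r
set.seed(2020)  # arbitrary seed so the draw is repeatable

# Hypothetical relative strengths for the two teams. rmultinom()
# rescales them to probabilities internally, so they need not sum to 1.
prob <- c(1.2, 0.9)

# One draw that splits 3 games between the two teams.
outcome <- as.numeric(rmultinom(1, 3, prob))

# With three games, exactly one team can win more than one of them.
winner <- ifelse(outcome > 1, 1, 0)
```

Note that this always plays all three games, so a 3-0 result is possible. That does not change who wins the series compared with stopping after the second win.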

To do another series, I changed the arguments in the slice() function and made it write to another data frame.

> AL.WC.Round2 <- AL.Seeds %>%
+   slice(2, 7) %>%
+   mutate(outcome = rmultinom(1, 3, prob),
+          Winner.Series = ifelse(outcome > 1, 1, 0))

Then it was just a matter of repeating that for every series, using full_join() to merge data frames together to give me seeding for the next round, and doing it all again until I could crown a World Champion.
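One possible way to wrap that repeated step in a function, offered as a sketch rather than what I actually ran. `simulate_series()` and its arguments are hypothetical names, the bracket data is made up, and it stacks the winners with bind_rows() instead of the full_join() approach described above.

```r
library(dplyr)

# Hypothetical helper: play one series between two rows of a seeding
# table and return only the winner's row. Assumes a `prob` column and
# an odd number of games.
simulate_series <- function(seeds, high, low, games = 3) {
  seeds %>%
    slice(c(high, low)) %>%
    mutate(outcome = as.numeric(rmultinom(1, games, prob)),
           Winner.Series = ifelse(outcome > games / 2, 1, 0)) %>%
    filter(Winner.Series == 1)
}

# Made-up eight-team bracket; the prob values are illustrative only.
set.seed(1)
AL.Seeds <- data.frame(Team = paste0("Team", 1:8),
                       Wins = c(39, 37, 35, 34, 33, 31, 30, 29),
                       prob = seq(1.3, 0.9, length.out = 8))

# 1 vs 8, 2 vs 7, 3 vs 6, 4 vs 5, then re-seed the winners by wins.
AL.Round2 <- lapply(1:4, function(i) simulate_series(AL.Seeds, i, 9 - i)) %>%
  bind_rows() %>%
  arrange(desc(Wins))
```

Calling the same helper with `games = 5` or `games = 7` would cover the later rounds, with the caveat that it re-seeds by win total rather than following the real bracket.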

The downside is that I can’t simulate an entire postseason with one line of code. The upside is that I got the result that I wanted on the first simulation.

(In reality, the Reds and Brewers should have faced each other in the divisional series, but I re-assigned seeding based on regular season win total at the beginning of every round. If I ever manage to write a function for this, I’ll match opponents correctly. For now, this is fine.)

Yes, the 29-31 Brewers took down the 39-21 Royals in a historic upset. The legitimacy of the entire 2020 season was thrown into question. The Brewers’ first championship was forever tainted by their sub-.500 regular season record.

What’s unfair for these fictional Brewers is that they actually weren’t that bad a team. Their randomly assigned talent level was 0.09. (In these simulations, a 0.2 talent team would have about a .550 winning percentage.) Their true talent was probably that of a 31-win team, which would give them a .517 winning percentage.
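The conversion from talent to winning percentage can be sketched as follows, assuming the simulation follows a Bradley-Terry model and that the opponent is an average team with talent 0. `win_pct()` is a hypothetical helper name, not something from the original code.

```r
# Assumption: Bradley-Terry model, where a team with talent t beats an
# average (talent 0) opponent with probability exp(t) / (exp(t) + 1).
win_pct <- function(talent) exp(talent) / (exp(talent) + 1)

win_pct(0.2)         # about .550, matching the figure quoted above
win_pct(0.09)        # about .522
60 * win_pct(0.09)   # a little over 31 expected wins in 60 games
```

Under that assumption a 0.09 talent team projects to roughly 31 wins, which is where the 31-win estimate comes from.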

That’s worse than the average World Series winning team for sure, but it’s a better winning percentage than the 2006 Cardinals’ Pythagorean record. These Brewers were not unprecedentedly bad.

If a losing team wins the World Series this year, that’s probably how it’s going to happen. A good or even great team will lose more games than they should but still manage to eke into the playoffs where they’ll play closer to their true talent level.

If that team happens to be the Dodgers, for instance, I think baseball fandom at large will recognize that a 60-game regular season isn’t enough to tell us anything. The Dodgers are the best team in the majors now, and they’ve been the best team for the last seven years. They won back-to-back NL pennants in 2017 and 2018. The only people who will question the team’s legitimacy are sports radio hosts who need something to be mad about.

If instead that team is, let’s say, the Reds, that’s when the Discourse will become apoplectic. The Reds are a good team, and they were a good team last year, but their record didn’t show it. They’re also coming off six straight losing seasons. They were 6-8 before play on Saturday, which would give them a reasonable shot at getting into the postseason with a losing record. As of Saturday morning, their starters lead the majors in fWAR, so Luis Castillo, Sonny Gray, and Trevor Bauer could easily carry the team through the first few postseason rounds and beyond.

If that happens, we’ll never hear the end of how the Reds got lucky, how the season was a crapshoot, or how the expanded playoffs were just a cynical money grab. All of those could be true to some extent, but this exercise has reminded me that we’ll have to further divorce wins and losses from how we evaluate a team’s talent this season.

Kenny Kelly is the managing editor of Beyond the Box Score. You can follow him on Twitter @KennyKellyWords, and you can read the rest of the Learning R series here.