The playoffs are underway, and there have already been some exciting moments. The old refrain that the playoffs are a crapshoot is as tired as it is common (even in its more expletive-peppered variety).
Forgive me for stating the obvious, but in a straight casino, a crapshoot really is random. The odds may have been jiggered so that a savvy craps player has a 1-2% disadvantage against the house, but the dice are not weighted. The outcomes truly are random.
In the playoffs, we all know that isn't true. But we also realize that the best team does not win 100% of the time. So how can we fix a point somewhere in the middle?
The hardest situation to predict in a baseball game is the single plate appearance. As you start to aggregate plate appearances, things get progressively easier. By the time you get to a full season, projections start to look better and better (but this, too, remains difficult).
Somewhere in the middle is the difficulty of predicting a single game. Consider that the Yankees, the team with the highest winning percentage in the majors this year, won fewer than 64% of its games. Assuming that record is a true estimate of team strength (we'll get to that in a minute), you'd be wrong more than one time in three if you picked the Yankees to beat an average team.
Now consider that the Yankees in the playoffs face above-average teams (insert joke about Twins' mediocrity here). So how can we sort out how well playoff teams will do against each other?
Because this is sort of a do-it-yourself website that entertains the shop-table tinkering of people who are inclined to ignore warnings not to try this at home, I thought I would walk us through the math.
The classic way to solve a problem like this is to use the log5 method, which was developed by Bill James in his 1981 Abstract. The log5 method can be written in algebraic form like this:
WPct = (A - A * B) / (A + B - 2 * A * B)
Where "A" is the winning percentage of one team, "B" is the winning percentage of the other team, and WPct is the probability that team A beats team B. If you'd like more detailed information on how this formula can be derived, see the August 2004 issue of SABR's "By the Numbers," which Phil Birnbaum has helpfully archived here.
You can actually use any number here you like; it doesn't have to be the backward-looking actual season record. I typically use a team's third-order winning percentage (see here), but you could use a team's total WAR to imply a winning percentage. You could also, as JinAZ does here, use his wonderful TQI. Today, to keep it in-house, I'll use JinAZ's numbers. He has graciously provided me with the end-of-season numbers for the Rox and Phils.
Let's put the method to work to find the probability that my beloved Phillies will win tomorrow evening. At the end of the season, Philadelphia had a TQI of .525. Colorado's was .572.
Plugging that in, we get:
WPct = ( (.525 - .525 * .572) / (.525 + .572 - 2 * .525 * .572) )
WPct = .453
That means, using TQI and log5, the Phillies have a 45.3% chance of winning a game against the Rockies.
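If you'd rather check the arithmetic in code than by hand, here is a minimal Python sketch of the same calculation. The function name `log5` and the variable names are my own labels for illustration; the formula and the TQI inputs are the ones used above.

```python
def log5(a: float, b: float) -> float:
    """Probability that a team of strength `a` beats a team of strength `b`.

    Both inputs are winning percentages (or any stand-in such as TQI)
    expressed as fractions between 0 and 1.
    """
    return (a - a * b) / (a + b - 2 * a * b)


# End-of-season TQI figures quoted in the article.
phillies_tqi = 0.525
rockies_tqi = 0.572

p_phillies = log5(phillies_tqi, rockies_tqi)
print(round(p_phillies, 3))  # 0.453
```

Two quick sanity checks: against a .500 opponent the formula hands back the team's own winning percentage, and swapping the two arguments gives the other team's chances, so the two results always add up to 1.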
Some caveats apply. To arrive at this result, I had to assume that TQI is the best estimate of a team's actual strength. You might argue that the Rockies are a weaker team now without Jorge De La Rosa, and the Phillies a stronger team with Cliff Lee. But since I am not confident in my ability to arrive at an exact number that accounts for all of those factors, I'll plead ignorance and go with what is a very good estimate.
Now that we have a number that implies the probability of the Phillies beating the Rockies, we can begin to make some inferences about what will happen in a series.
Since I'm a law student, I am subject to the tyranny of the hypothetical. And now, dear friends, you will share in my pain.
Let's say (purely hypothetically!) you have tickets to Game 5 of the NLDS, set to be played on Tuesday. Of course, this game will only be played if it is necessary. What are the odds it will be necessary? First, let's figure out what would need to happen for the game to be played.
There are four sets of outcomes for Games 3 and 4, the games that will determine whether there is a Game 5. As the series is currently tied 1-1, only if one team wins both games this weekend will there be no need for a Game 5. If, however, the teams split, a Game 5 will be played. Written out, the possibilities (Game 3 winner, Game 4 winner) are:
PHL, PHL
PHL, COL
COL, PHL
COL, COL
In the two middle cases, where the teams split, there would be a Game 5. In the two other cases, there would not. So, the odds are 50%, right? Wrong!
We already have the information we need to figure this out. Applying the single-game probability determined above to each pair of outcomes:
PHL, PHL (.453 * .453 = .205)
PHL, COL (.453 * .547 = .248)
COL, PHL (.547 * .453 = .248)
COL, COL (.547 * .547 = .299)
Adding the probabilities of the two middle outcomes, we get a probability of .496. See, I told you it wasn't 50%. It's just ridiculously close to 50%!
So there is a 49.6% chance that our theoretical ticket holder has to find the fastest way to get from New York to Philadelphia next Tuesday.
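The same Game 5 arithmetic can be sketched in a few lines of Python. The function name `prob_split` is hypothetical; the only input is the .453 single-game figure from above, and the games are assumed to be independent.

```python
def prob_split(p: float) -> float:
    """Chance that two independent games are split one apiece.

    `p` is one team's probability of winning any single game.
    """
    q = 1 - p
    # Win then lose, or lose then win.
    return p * q + q * p


p_phillies = 0.453  # single-game probability from the log5 step
print(round(prob_split(p_phillies), 3))  # 0.496
```

Written this way, it is easier to see why the answer hugs 50%: the split probability is 2 * p * (1 - p), which tops out at exactly .500 when the teams are even and falls off slowly as one team pulls ahead.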
But what if we want to figure out the odds of a whole series? Here's a little guide you can use once the next round starts.
Let's assume the Angels face off against the Yankees (calm down, Red Sox and Twins fans, this is another hypothetical). The Yankees' TQI was .643. The Angels' was .549. Using the above formula, the odds of the Yankees beating the Angels in a single game are .597. So what are the odds the Yankees will prevail in a best-of-seven series?
To figure this out, we can use our old friend, Excel's BINOMDIST function. Here's the syntax:
=BINOMDIST(3, 7, 0.403, TRUE)
This is, strictly speaking, calculating the odds that the Yankees do not lose more than three games in the series. To do that, I have told Excel to find the odds of losing up to 3 games (that's the first term) out of 7 (second term), given the probability of the Yankees losing (third term). The last term is telling Excel to count probabilities for the Yankees losing 2, 1, or 0 games as well. When you run this formula, you find that the Yankees have a .704 probability of winning such a series.
You can do this for any series with the aid of only a spreadsheet. Pretty neat, right?
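For anyone who prefers a script to a spreadsheet, here is a rough Python equivalent. The helper `binom_cdf` is my own stand-in for a cumulative binomial calculation, not an Excel or library function, and it carries the same assumptions as the spreadsheet version: every game is independent and has the same odds.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): at most k 'successes' in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_yankees_win = 0.597             # single-game probability from log5
p_yankees_lose = 1 - p_yankees_win

# Odds the Yankees lose no more than 3 of a notional 7 games.
series_odds = binom_cdf(3, 7, p_yankees_lose)
print(round(series_odds, 3))  # 0.704
```

Like the spreadsheet formula, this pretends all seven games get played even after the series is decided. That is harmless here, because a team that loses three or fewer of seven games has necessarily won at least four.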
Of course, there are more caveats. To arrive at these numbers, I had to make a few assumptions. First, I ignored home field advantage, which has been shown to be statistically significant. Second, I assumed that the probabilities of winning each game were identical and independent. That is, the outcome of one game does not affect the odds in the next game. Finally, I have ignored the starting pitchers of the teams. This is mostly for simplicity's sake.
I'm glad to report that Clay Davenport's Postseason Odds are back online this year. Rather than using a binomial function, Clay uses a Monte Carlo simulation to calculate odds. Notably, his method DOES include information about the expected starters in each game. Thus, his system is probably more accurate than the one outlined above.
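To give a flavor of what a Monte Carlo approach looks like, here is a bare-bones Python sketch. To be clear, this is not Clay's system: it knows nothing about starting pitchers and simply replays a series many times with one fixed single-game probability. The function name, parameters, trial count, and seed are all my own choices for illustration.

```python
import random


def simulate_series(p: float, wins_needed: int = 4,
                    trials: int = 100_000, seed: int = 0) -> float:
    """Estimate a team's chance of winning a series by simulation.

    `p` is the team's single-game win probability, held fixed for every game.
    `wins_needed` is 4 for a best-of-seven and 3 for a best-of-five.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    series_won = 0
    for _ in range(trials):
        wins = losses = 0
        # Play games until one side reaches the clinching win total.
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
    return series_won / trials


print(simulate_series(0.597))
```

With a fixed probability, the estimate should land very close to the .704 from the binomial approach. The appeal of simulation is that you could change the odds from game to game (for different starters, say) without having to rework the algebra.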
You can also find detailed probabilities at Cool Standings. They will even show you the probabilities of a particular series outcome (say, 3-1 in the ALDS). Of course, using BINOMDIST, you could find this as well!
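Here is one way the exact-score calculation might look in Python, as a sketch under the same independence assumptions as before. The function name `prob_series_score` is hypothetical, and I'm reusing the .597 Yankees figure purely as an example input.

```python
from math import comb


def prob_series_score(p: float, wins_needed: int, losses: int) -> float:
    """Chance a team wins a series while losing exactly `losses` games.

    The clinching win has to come last, so the earlier games hold
    wins_needed - 1 wins and `losses` losses in any order.
    """
    earlier_games = wins_needed - 1 + losses
    return comb(earlier_games, losses) * p**wins_needed * (1 - p) ** losses


p = 0.597  # example single-game probability

# A 3-1 win in a best-of-five series.
print(round(prob_series_score(p, wins_needed=3, losses=1), 3))  # 0.257

# Every way to win a best-of-seven: 4-0, 4-1, 4-2, 4-3.
for losses in range(4):
    print(f"4-{losses}:", round(prob_series_score(p, 4, losses), 3))
```

The four best-of-seven probabilities add up to the same .704 found with the cumulative calculation, which is a handy check that nothing has gone wrong.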
It's been a bad year for forecasting (particularly of the economic variety). How possible do you think it is to predict outcomes of playoff series? Am I being overly sanguine about the possibility? Is it truly a crapshoot?