Based on Nick's FanPost earlier in the year on playoff probability added, I invited him to work on some playoff probability numbers for BtB. He accepted the challenge, and in addition to updated weekly results, he'll be presenting some articles dissecting the numbers. I'm psyched Nick's on board for this project, and I'm excited about the potential for using these numbers in our analysis.
About two weeks ago, the Cardinals traded promising young fireballer Chris Perez and a PTBNL to the Indians for Mark DeRosa. After thinking about it for a while, weighing the pros and cons in my head, I still didn't know whether to like the trade. Although Perez had basically been replacement level so far in the majors, he was always a top prospect in the minors, putting up ridiculous strikeout rates, and he would be under team control for five more years, while DeRosa would likely be with the team for only half a season.
The upside of acquiring DeRosa is that he was expected to improve the Cards' chances of making the playoffs. But while he almost certainly will, we don't know by how much, and until we can roughly estimate that, it's really hard to judge the trade. With the trade deadline approaching, there are bound to be a lot more trades like this one, each presenting a similar conundrum. Therefore, BtB will be publishing a playoff odds report each week through the end of the season. These reports will help us understand where each team stands in terms of playoff probability, and how changes in each team's intrinsic strength, via trade, injury, etc., affect those odds.
The first thing we need to know to calculate playoff odds is how each team is projected to play for the rest of the season. Creating those projections was probably the biggest dilemma I faced. Part of me wanted to use ZiPS rest-of-season (RoS) projections adjusted for playing time, convert them to wins and adjust for strength of schedule... but that would have been way too time-consuming.
Instead, I decided to combine each team's pre-season PECOTA-projected winning percentage with Justin's current BtB Power Rankings, basically updating the prior estimate of the strength of each team with their performance to date. While this misses out on things like injuries, trades and other playing time concerns, it should give a solid picture of the strength of each team going forward.
To do it properly, I first had to find out how much weight to give each system depending on how far into the season we are. Since Justin only just started his power rankings, I used my favorite website, Archive.org, to extract BPro's 3rd Order Wins at various points throughout recent seasons (3rd Order Wins are similar to Justin's power rankings in their methodology, so it seemed okay to use the former as a historical substitute for the latter). For each data set, I then ran a correlation between 3rd Order winning percentage and each team's actual winning percentage for the rest of the season. I also ran the same correlation using PECOTA's preseason projections. The results are best shown like this:
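The correlation step above can be sketched in a few lines. This is a minimal illustration, assuming a plain Pearson correlation coefficient; the function name and the input numbers below are hypothetical (made up, not BPro's archived figures). Running it once per checkpoint for each predictor would trace out curves like the ones in the chart.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for a single checkpoint (made-up numbers):
pecota_wpct = [0.580, 0.540, 0.500, 0.470, 0.450]  # preseason PECOTA W%
third_order = [0.520, 0.560, 0.470, 0.500, 0.430]  # 3rd Order W% to date
actual_ros = [0.560, 0.540, 0.490, 0.480, 0.440]   # actual rest-of-season W%

r_pecota = pearson_r(pecota_wpct, actual_ros)
r_third_order = pearson_r(third_order, actual_ros)
print(f"PECOTA r = {r_pecota:.3f}, 3rd Order r = {r_third_order:.3f}")
```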
The x axis represents how many games have been played, and the y axis is the correlation coefficient. These jibe with our expectations: the further you go into the season, the less predictive PECOTA becomes and the more you should look at actual team performance so far. However, even at the highest level that I tested (once you get past 125 games or so, the sample size simply gets too small), PECOTA is still more predictive than the team's performance so far, which really speaks to its relative accuracy.
Turning the graph above into weights is easy. For example, PECOTA projected the Cubs to win about 94 games, good for a .580 W%; however, 86 games into the season, they are playing like a .487 team according to Justin's rankings. Using the formula fit to the chart above, we know that PECOTA is about 1.4 times as predictive this far into the season. Then some 7th-grade algebra (x + 1.4x = 1, where x is the weight on current performance) tells us how to properly weight PECOTA and 3rd Order Wins. In this case, it's about .58 and .42, which yields a projected winning percentage of .541 for the rest of the season and just over 84 wins total... ouch.
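The weighting arithmetic for the Cubs example can be written out as a small helper. This is a sketch under the post's own numbers; the function name is hypothetical, and the 1.4 ratio is taken as given rather than computed from the chart's formula, which the post doesn't spell out.

```python
def blend_projection(pecota_wpct, current_wpct, ratio):
    """Blend a preseason and a current winning-percentage estimate.

    ratio: how many times as predictive PECOTA is at this point in the season.
    Solving x + ratio * x = 1 gives x, the weight on current performance.
    """
    w_current = 1 / (1 + ratio)
    w_pecota = ratio * w_current
    return w_pecota * pecota_wpct + w_current * current_wpct

# Cubs example: .580 PECOTA, .487 per Justin's rankings, PECOTA 1.4x as predictive.
proj = blend_projection(0.580, 0.487, 1.4)
games_left = 162 - 86
ros_wins = proj * games_left
print(f"{proj:.3f} RoS W%, {ros_wins:.1f} RoS wins")  # 0.541 RoS W%, 41.1 RoS wins
```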
This is similar to a Bayesian-style projection, but I wasn't quite smart enough to do that properly. Also, there is something to be said for deriving the weights empirically, even if they may be less accurate due to small sample sizes.
Using that methodology, I went back to the archives of 3rd Order Wins and tested my RoS projections against actual RoS winning percentage at various points during the season. The standard errors ranged from .060 in winning percentage (about 9.5 games over a full season) to .074 (about 11 games over a full season). In fact, the progression was almost perfectly linear: the further into the season we are, the less certain our projections are for the rest of the season. One possible reason is that my projection model is simply crappy (very possible); another is that with fewer games left to play, there is more room for variance.
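A minimal sketch of the back-test, assuming "standard error" here means the root-mean-square error between projected and actual RoS winning percentage (the post doesn't state the exact formula). All names are hypothetical. The `binomial_sd` helper illustrates the second explanation above: even a perfectly known true talent p produces a winning percentage over n games with spread sqrt(p(1 - p)/n), which grows as n shrinks.

```python
from math import sqrt

def standard_error(projected, actual):
    """RMSE of projected vs. actual rest-of-season winning percentage (assumed definition)."""
    n = len(projected)
    return sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)

def to_full_season_games(wpct_error, games=162):
    """Express a winning-percentage error in games over a full season."""
    return wpct_error * games

def binomial_sd(n_games, p=0.5):
    """Pure coin-flip noise in a W% measured over n_games (true talent p)."""
    return sqrt(p * (1 - p) / n_games)

# Fewer games remaining -> more noise in the RoS W%, even with perfect projections.
for remaining in (130, 100, 76, 40):
    print(remaining, round(binomial_sd(remaining), 3))
```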
Anyway, knowing the standard error of the projections is very useful, as it lets us quantify their uncertainty. For example, while the Cubs are projected to win around 84 games, they have a good shot at winning significantly more or fewer than that. Since the wins they already have cannot be taken away, that uncertainty applies only to their RoS projection of just over 41 wins. The standard error of that projection is a little over five wins, which, assuming the distribution is normal, means there is roughly a 1-in-3 chance that they finish more than five wins above or below their projection. The full range of possibilities is best shown like this:
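The normal-distribution reasoning above can be checked with Python's standard library. This sketch assumes a mean of 84.1 total wins and a standard deviation of 5.2 as stand-ins for "just over 84" and "a little over five"; those exact values are assumptions. A continuity correction (89.5 to 90.5) turns the continuous curve into the probability of an exact win total.

```python
from statistics import NormalDist

# Assumed values: ~84.1 projected wins, SD of ~5.2 wins (RoS uncertainty only).
cubs = NormalDist(mu=84.1, sigma=5.2)

# Chance of finishing more than one SD above or below the projection.
p_outside_one_sd = 2 * (1 - NormalDist().cdf(1))

# Chance of exactly 90 wins, using a continuity correction.
p_90 = cubs.cdf(90.5) - cubs.cdf(89.5)

print(f"P(more than 1 SD off) = {p_outside_one_sd:.3f}")
print(f"P(exactly 90 wins) = {p_90:.3f}")
```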
As you can see, while the distribution peaks at about 84 wins, they could very well win 90 games... or 78, for that matter.
To put that into better context, here are the win distributions of each team in the NL Central:
As you can see, the Cubs, Brewers and Cardinals have basically the same win probability distribution, while the other teams lag behind. Using that graph, figuring out the Cubs' chances of winning the division is a simple (although tedious) task:
1) Figure out the Cubs' chance of winning a given number of games. Say it's 90, in which case the answer is a little over 4%.
2) Figure out the chance that every other team in the division wins fewer than 90 games. This equals the chance that the Brewers win fewer than 90 games * the chance that the Cardinals win fewer than 90 games * the chance that the Reds win fewer than 90 games, etc. The answer is about 73%.
3) Multiply the first two answers: the Cubs have about a 3% chance of winning exactly 90 games and winning the division. I also add partial credit for a tie, but in most cases the effect was minimal.
4) Rinse and repeat for every number of games the Cubs could possibly win, and add up the results.
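The four steps above map onto a short loop. This is a sketch, not the spreadsheet actually used: it assumes independent, normally distributed win totals, and since the post only says ties get "partial credit," it assumes a k-way tie is split evenly (each tied team gets 1/k). The function names and the team inputs in the usage lines are hypothetical and made up.

```python
from statistics import NormalDist

def win_pmf(mean, sd, max_wins=162):
    """P(exactly w wins) for w = 0..max_wins, from a normal with continuity correction."""
    dist = NormalDist(mean, sd)
    raw = [dist.cdf(w + 0.5) - dist.cdf(w - 0.5) for w in range(max_wins + 1)]
    total = sum(raw)
    return [p / total for p in raw]  # renormalize over the allowed range

def division_odds(pmfs):
    """Each team's chance of winning the division (independent teams, ties split evenly)."""
    # below[j][w] = P(team j wins fewer than w games)
    below = []
    for pmf in pmfs:
        running, col = 0.0, []
        for p in pmf:
            col.append(running)
            running += p
        below.append(col)

    odds = []
    for i, pmf in enumerate(pmfs):
        total = 0.0
        for w, p_w in enumerate(pmf):  # step 4: every possible win total
            if p_w == 0.0:
                continue
            # coeffs[k] = P(exactly k rivals tie at w and the rest finish below w);
            # coeffs[0] alone is step 2, the product of P(rival < w).
            coeffs = [1.0]
            for j in range(len(pmfs)):
                if j == i:
                    continue
                a, b = below[j][w], pmfs[j][w]
                new = [0.0] * (len(coeffs) + 1)
                for k, c in enumerate(coeffs):
                    new[k] += c * a
                    new[k + 1] += c * b
                coeffs = new
            # Step 3: multiply by step 1's P(w), with a 1/(k+1) share of any tie.
            total += p_w * sum(c / (k + 1) for k, c in enumerate(coeffs))
        odds.append(total)
    return odds

# Hypothetical five-team division (means and SDs are made up):
teams = {"A": (85, 5.2), "B": (84, 5.2), "C": (83, 5.2), "D": (77, 5.0), "E": (72, 5.0)}
odds = division_odds([win_pmf(m, s) for m, s in teams.values()])
for name, p in zip(teams, odds):
    print(name, round(p, 3))
```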
Using this method, the Cubs have roughly a 30% chance of winning their division. The Cards, however, have the best chance, with about 35%, and the Brewers are also hanging in there at about 22%.
Figuring out wild card odds is a little more complicated. It's basically the odds that a team won't win its division, multiplied by the odds that it will win more games than every other non-division winner. That gets especially tricky because we don't know which team will win each division, so we have to do it for every possible combination of division winners and weight each result by the chance that those teams all win their divisions. Suffice it to say, it made for some late nights with Excel.
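The exact enumeration described above is laborious. A common alternative, and a different technique from the one used for these reports, is Monte Carlo simulation: draw every team's final win total, pick the division winners and the best remaining record, and count how often each team lands in each bucket. The sketch below assumes independent normal win distributions, rounds to whole wins and breaks ties at random; the function name and input layout are hypothetical. With enough simulations it should approximate the same division, wild card and playoff odds.

```python
import random

def simulate_playoff_odds(league, n_sims=20000, seed=1):
    """Monte Carlo division / wild card / playoff odds for one league.

    league: {division_name: {team: (mean_wins, sd_wins)}}
    Returns {team: (division_prob, wild_card_prob, playoff_prob)}.
    """
    rng = random.Random(seed)
    teams = [t for div in league.values() for t in div]
    div_count = dict.fromkeys(teams, 0)
    wc_count = dict.fromkeys(teams, 0)
    for _ in range(n_sims):
        # (rounded wins, random tiebreaker) for every team this season
        season = {t: (round(rng.gauss(m, s)), rng.random())
                  for div in league.values() for t, (m, s) in div.items()}
        winners = set()
        for div in league.values():
            winner = max(div, key=season.get)
            winners.add(winner)
            div_count[winner] += 1
        # Wild card: best record among the teams that did not win a division
        wild_card = max((t for t in teams if t not in winners), key=season.get)
        wc_count[wild_card] += 1
    return {t: (div_count[t] / n_sims, wc_count[t] / n_sims,
                (div_count[t] + wc_count[t]) / n_sims) for t in teams}

# Hypothetical two-division league (made-up projections):
odds = simulate_playoff_odds({"East": {"A": (90, 5), "B": (86, 5)},
                              "West": {"C": (85, 5), "D": (80, 5)}})
for team, (div, wc, playoffs) in odds.items():
    print(team, round(div, 3), round(wc, 3), round(playoffs, 3))
```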
|AL East||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
|AL Central||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
|AL West||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
|NL East||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
|NL Central||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
|NL West||W||L||Justin||PECOTA||predW%||Final W||Division||WildCard||Playoffs|
You'll notice that the Dodgers are at 100% to make the playoffs. That's not a typo or a miscalculation. When you combine their current performance, record and preseason expectations, they are so far ahead of everyone else in their division (and the entire NL, for that matter) that they are essentially a lock.
The NL Central is the tightest division in baseball, with three teams within 15 percentage points of one another in playoff probability. Even though the Cubs have played poorly this year (and their record shows it), they are still projected to be in a virtual tie with the Cardinals in playoff odds, mainly because their players are expected to regress toward their preseason projections.
It appears that one of the Yankees, Rays or Red Sox will win the wild card (as usual). Combined, they have a 98% chance of taking it, leaving the other eleven AL teams to scramble for the remaining 2%.