There is little doubt which has been the most exciting and toughest division in Major League Baseball over the last couple of years. Ladies and gentlemen, I present to you the AL Central. In 2005 the AL Central produced perhaps one of the most dramatic, and then anticlimactic, races of all time. The Chicago White Sox opened up a 15-game lead before being slowly reeled in by the Cleveland Indians. With a game and a half between them and a final three-game set to close the season, the pesky Southsiders swept the Indians. Those Cleveland losses also defused the excitement in the AL East, as the Red Sox and Yankees were both allowed to take up their customary postseason slots with a perfunctory whimper.
The AL Central in 2006 could yet fulfill the empty promise of 2005. Currently three teams are within shooting distance of both the division and the wild card, and any of about 12 permutations remain possible. The Tigers are having a quite extraordinary season, especially in the context of the last few years: by July 1st they were 55-25 with a 2.5-game lead, which they extended to an almost insurmountable 8.5 games on August 1st (it was at this point in 2005 that the White Sox had that 15-game cushion). Since then, however, they have stuttered a touch, with both the Twins and White Sox drawing closer. This is baseball, I hear you say; these things happen. Yes, but remember that this Tigers team lost an astonishing 119 games not three years ago, and they haven't posted a .500 season since 1993. Are they for real? Can the Tigers really win the division? At what point did people start to believe that the Tigers were the genuine article rather than just lucky? And over the last few games, as the gap between the Tigers and Twins has shrunk from 9 games to 4, has that changed people's perception of how good this team really is? So many questions and so little time.
A few weeks ago I introduced the concept of baseball prediction markets. For those too lazy to click on the link, a prediction market allows the proletariat to bet on whether a given event will happen, and from the prices at which those bets trade we can infer the probability of the event occurring. There are many prediction markets in baseball, covering in-game outcomes, division outcomes, home run totals, and Bonds and steroids, to name a few. Take the AL Central: by looking at the market for the Tigers winning the division, we can track the public's changing perception of the Tigers. Here it is:
This graph tells us quite a lot, so it is worth taking the time to walk through it. The blue line represents the probability of the Tigers winning the division, while the green bars indicate the amount of liquidity in the market. Given the continual swings in price, liquidity does not appear to be a problem, so the green bars can be ignored from now on.
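For readers new to these markets, here is a minimal sketch of how a price becomes a probability. It assumes a binary contract quoted in points from 0 to 100 that pays 100 if the event happens and nothing otherwise; that contract design and the helper names are illustrative assumptions, not any particular exchange's rules. The opening-day figures are the approximate ones quoted in this article (Tigers taken at the midpoint of 3-4%, Royals at zero). Because the division contracts cover mutually exclusive outcomes, their raw prices rarely sum to exactly 100, so the sketch also rescales them to sum to 1.

```python
def implied_probability(price: float, payout: float = 100.0) -> float:
    """Probability implied by a binary contract trading at `price` points
    that pays `payout` points if the event happens and 0 otherwise."""
    return price / payout


def normalise(prices: dict[str, float]) -> dict[str, float]:
    """Rescale prices for mutually exclusive outcomes so they sum to 1,
    stripping out any overround (or shortfall) in the raw prices."""
    total = sum(prices.values())
    return {team: p / total for team, p in prices.items()}


# Approximate opening-day division prices read off the charts in this article.
opening_day = {"White Sox": 50.0, "Indians": 35.0, "Twins": 15.0,
               "Tigers": 3.5, "Royals": 0.0}

print(sum(opening_day.values()))  # 103.5
for team, p in normalise(opening_day).items():
    raw = implied_probability(opening_day[team])
    print(f"{team:10s} raw {raw:.3f}  normalised {p:.3f}")
```

With these figures the five prices sum to 103.5 points, so each normalised probability comes out a touch lower than its raw price.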
We can see that at the start of the season the Tigers' win expectancy was a lowly 3-4%. In other words, the market ascribed almost no chance to the Tigers winning the division. A wager here would have been a profitable and prescient move! The Tigers opened strongly with a 15-7 record in April and saw their win expectancy creep up to around 15%, by no means a lock for the division at this point.
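To put a rough number on "profitable", here is a sketch of that trade. It assumes, hypothetically, a stake of 100 spent on Tigers contracts at about 3.5 points on opening day, later sold at about 80 points once the market turned in their favour over July; it uses the same 0-100 point binary contract as an assumption and ignores commissions, bid-ask spreads and the cost of tying up cash. The helper names are illustrative.

```python
def contracts_bought(stake: float, price: float) -> float:
    """Number of contracts a given stake buys at `price` points each."""
    return stake / price


def profit_if_sold(stake: float, buy: float, sell: float) -> float:
    """Profit from buying at `buy` and closing the position at `sell`
    before the event is settled (fees and spreads ignored)."""
    return contracts_bought(stake, buy) * (sell - buy)


def profit_if_held(stake: float, buy: float, happened: bool,
                   payout: float = 100.0) -> float:
    """Profit from holding to settlement: each contract pays `payout`
    if the event happens and nothing otherwise."""
    value = payout if happened else 0.0
    return contracts_bought(stake, buy) * (value - buy)


# Hypothetical trade: 100 staked at ~3.5 points, sold at ~80 points.
print(f"sold at 80:        {profit_if_sold(100, 3.5, 80):8.2f}")
print(f"held, Tigers win:  {profit_if_held(100, 3.5, True):8.2f}")
print(f"held, Tigers lose: {profit_if_held(100, 3.5, False):8.2f}")
```

Selling into the July surge locks in most of the upside, whereas holding to settlement keeps the whole stake at risk.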
Coincidentally, the World Champion White Sox also opened up with a 15-7 record. What do you think their win expectancy was at this point? You'd probably guess that it was higher than the Tigers', and you'd be correct. Below is the White Sox's win expectancy chart:
The first thing to notice is that, on the back of their World Series victory, many expected them to have the opportunity to repeat the feat by at least making the postseason. On opening day their win expectancy was an eye-popping 50%, before a ball had even been pitched! After the 15-7 start it had surged to 63%. Many thought that the Tigers' opening salvo was largely luck and wouldn't be maintained deep into the season.
This isn't a surprise: given the Tigers' abysmal recent record, many applied a heavy discount to those early games. After all, a 22-game sample is not very reliable. From a statistician's perspective, the standard deviation of winning percentage over 22 games is about 10%, so the same .500 team could plausibly post a 15-7 or a 10-12 record through random variation alone.
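The 10% figure comes from treating each game as an independent coin flip that the team wins with probability p, so the standard deviation of winning percentage over n games is sqrt(p(1 - p)/n). A minimal sketch of that binomial calculation, with hypothetical helper names:

```python
import math


def win_pct_sd(p: float, games: int) -> float:
    """Standard deviation of winning percentage after `games` games for a
    team that wins each game independently with probability `p`."""
    return math.sqrt(p * (1 - p) / games)


games = 22
sd = win_pct_sd(0.5, games)
print(f"SD for a .500 team over {games} games: {sd:.3f}")  # 0.107

# How many standard deviations each record sits from .500.
for wins in (15, 10):
    z = (wins / games - 0.5) / sd
    print(f"{wins}-{games - wins}: {z:+.2f} SD from .500")
```

Both records land within two standard deviations of .500 (roughly +1.7 and -0.4), which is why a 15-7 start on its own says little about a team's true ability.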
After this strong start to the season the Tigers kept on winning. By the end of May they were still a healthy 35-17 and their win expectancy was over 30%. By the All-Star break a major-league-best 59-29 record had earned them a win expectancy of just 40%, still below where the White Sox had opened the season. At this point the Southsiders were 57-31 and still favorites to land the division, with 60% probability; again, history weighed heavily on the likelihood of a White Sox repeat.
Following the All-Star break the Tigers continued to play well, and all of a sudden a seismic shift occurred in the public's perception of the team. No longer were they considered rank outsiders; they started to assume the tag of favorites. Over July their win expectancy soared from 40% to 80% while the White Sox slipped from 60% to a shade under 20%, as the Tigers opened up a 9-game lead. By this point of the season the statistical uncertainty is narrower: after 100-odd games, the standard deviation of a .600 team's winning percentage is about 5%. Despite stuttering a bit in the last couple of weeks, the Tigers have managed to maintain their win expectancy at around this level.
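The 5% figure can be checked by brute force as well as by formula. The sketch below simulates many 100-game stretches for a hypothetical team that wins each game independently with probability .600, then compares the spread of the simulated winning percentages with sqrt(p(1 - p)/n). The number of simulated seasons and the random seed are arbitrary choices, and independent games are a simplifying assumption.

```python
import math
import random
import statistics


def simulated_win_pct_sd(p: float, games: int, seasons: int = 20_000,
                         seed: int = 2006) -> float:
    """Monte Carlo estimate of the SD of winning percentage: play
    `seasons` independent stretches of `games` games, each game won
    with probability `p`, and measure the spread of the results."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(seasons):
        wins = sum(rng.random() < p for _ in range(games))
        pcts.append(wins / games)
    return statistics.pstdev(pcts)


p, games = 0.600, 100
analytic = math.sqrt(p * (1 - p) / games)
simulated = simulated_win_pct_sd(p, games)
print(f"analytic SD:  {analytic:.4f}")
print(f"simulated SD: {simulated:.4f}")

# The uncertainty shrinks with the square root of games played.
for n in (22, 52, 88, 162):
    print(f"{n:3d} games: {math.sqrt(p * (1 - p) / n):.3f}")
```

Because the spread scales with 1/sqrt(n), quadrupling the games played only halves the uncertainty, which fits the market taking the Tigers far more seriously in late July than in April.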
Another striking feature of the AL Central has been the surge of the Twins. Let's take a look at their win expectancy chart:
At the start of the season the Twins' win expectancy was 15%. Any team headed up by Johan Santana has to be a threat, and Francisco Liriano, while not exactly a twinkle in Gardenhire's eye at that point, was also (correctly) expected to break through during the season. Unfortunately for Twins fans, the team got off to a sodden start, going 9-14 in April, and their win expectancy collapsed to 3%. May wasn't much better: as the Tigers and White Sox tore up the division, the Twins were 23-28 by the end of the month and the market had largely written them off, with win expectancy now below 2%! From that point on the Twins staged a near-biblical recovery as Santana found form and the young Liriano dominated like Pedro Martinez in his pomp. Records of 18-7 in June and 17-8 in July saw the Twins' win expectancy heat up to 12% at one point. OK, so the Twins aren't likely to romp home with the AL Central, but what about the AL Wild Card?
Not surprisingly, the market's odds of the Twins winning it have risen to 45% (a combined 57% chance of making the postseason), an amazing turnaround from the depths of despair of late May, when their combined postseason odds stood at around 8%!
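The 57% figure is simply the division odds plus the wild-card odds. That addition is legitimate only because the two routes are mutually exclusive: the wild card goes to the best team that did not win its division. A minimal sketch using the 12% and 45% figures quoted above, with a hypothetical helper name:

```python
def postseason_odds(p_division: float, p_wild_card: float) -> float:
    """Chance of making the postseason via either route.

    Winning the division and taking the wild card cannot both happen,
    so P(division or wild card) = P(division) + P(wild card)."""
    total = p_division + p_wild_card
    if not 0.0 <= total <= 1.0:
        raise ValueError("combined odds must lie between 0 and 1")
    return total


print(f"{postseason_odds(0.12, 0.45):.0%}")  # 57%
```

If the events overlapped, the joint probability would have to be subtracted; the range check is a cheap guard against feeding in inconsistent market prices.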
Prediction markets aren't the only way to gauge a team's win expectancy. Those clever folks at Baseball Prospectus produce Playoff Odds Reports. There are a number of different reports, but perhaps the most reliable is based on the current season's performance buttressed with pre-season PECOTA expectations. The PECOTA component is important because it accounts for the possibility that a team has played above its real ability for part of the season. As the season progresses, a team's real ability shines through in its in-season play and the PECOTA component becomes less important. So, how do the PECOTA-based predictions compare with market win expectancy? I'll focus on the odds of winning the division on 1st May, 1st June, 1st July and 1st August for the three main protagonists. Take a look at the following table:
I am not going to spend a lot of time on this, but one thing is clear: if you believe that BP's work is more accurate than the market, then there is *a lot* of opportunity for arbitrage. Now, if I really thought that, I wouldn't be sat here hammering the keys on my laptop; I'd be pouring all my cash into Internet gambling. A quick subjective glance at the two shows that, for me, both have their flaws. The market may have been a little irrational in its expectation of a White Sox surge (or a Tigers collapse) after the All-Star break, though that is largely attributable to the Tigers' poor form for most of recent memory. On the other hand, BP seems a little bullish on the Tigers' odds early in the season (given what history told us and the narrow gap between them and the Sox) and certainly seems to underrate the Twins. Later this year, probably after the World Series, I'll post an article on how to calculate playoff odds using actual data and discuss this apparent arbitrage opportunity in more depth.
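Strictly speaking, a gap between BP and the market is a positive-expected-value bet rather than risk-free arbitrage, since you can still lose your stake. Here is a rough sketch of how the size of that edge could be measured, assuming the same 0-100 point binary contracts, no fees or spreads, and that the BP figure is the true probability. The 60% and 40-point numbers in the example are made up for illustration and do not come from the table; the helper names are hypothetical.

```python
def expected_profit(model_prob: float, price: float,
                    payout: float = 100.0) -> float:
    """Expected profit per contract bought at `price` if `model_prob` is
    the true chance of the event (fees and spreads ignored)."""
    return model_prob * payout - price


def suggested_side(model_prob: float, price: float,
                   payout: float = 100.0) -> str:
    """Buy if the model thinks the contract is cheap, sell if it is dear."""
    edge = expected_profit(model_prob, price, payout)
    if edge > 0:
        return "buy"
    if edge < 0:
        return "sell"
    return "no bet"


# Illustrative only: a model says 60%, the market trades at 40 points.
print(expected_profit(0.60, 40.0), suggested_side(0.60, 40.0))
```

A real comparison would plug in the BP and market figures for each date in the table, and the result is only as good as the assumption that one source is right.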
Before closing, it wouldn't be fair not to consider the other teams in the AL Central. First is the win expectancy chart of the Cleveland Indians.
Preseason, the Indians were picked by many (me included) to romp home with the division, and by a few to win the whole shebang. That is reflected in their opening-day win expectancy of around 35%. However, they opened the season poorly and never recovered; their win expectancy soon fell away and the Indians were never in contention.
And what about the hapless Royals? A flat-line at the zero mark? Have a look:
Nothing surprising there!
What have we learnt? Well, apart from the small possibility of a get-rich-quick scheme, win expectancy is a useful tool that can help us understand the shape and evolution of likely postseason play, among other things. Personally, I find looking back at how prediction markets perceived certain events fascinating. Without wanting to get into a moral debate about the whys and wherefores of online gambling, one thing is for sure: this type of analysis would be difficult without it.