One of the biggest things baseball analytics can teach a fan is the role of luck in the sport, both on the individual and team level. I know it was a major watershed moment for me when I realized just how little control even the best front offices have over their team's performance each year. It can be wonderful, and it can be very frustrating, but it's well understood by most statistically-oriented fans.
That said, when we talk about luck or un-luck, we're really talking about a whole host of things, and lumping them all together can be deceptive. I want to talk about what I think the logical grouping is: timing luck and performance luck (catchy names, I know). Timing luck is more commonly isolated, understood, and discussed; this is what looking at BaseRuns (developed by David Smyth, available at FanGraphs) or Cluster Luck (developed by Joe Peta) is meant to account for. If one team's inning goes home run, single, K, K, K, and another team's inning goes single, home run, K, K, K, both are viewed equally from a context-neutral perspective, since each team had a single, a home run, and 3 Ks, but the first team scored 1 run and the second scored 2. That gap is entirely down to timing luck.
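To make the inning example concrete, here is a minimal sketch that scores the two sequences. The base-advancement rules are my own simplifying assumptions (a single moves every runner up exactly one base, a home run clears the bases), and `runs_in_inning` is a hypothetical helper, not anything BaseRuns or Cluster Luck actually uses.

```python
def runs_in_inning(events):
    """Score a sequence of plate appearances under deliberately simple rules.

    Assumed rules (illustrative only): a single ("1B") moves every runner
    up one base, a home run ("HR") scores the batter and all runners, and
    a strikeout ("K") only adds an out.
    """
    bases = [False, False, False]  # first, second, third
    outs = 0
    runs = 0
    for event in events:
        if outs == 3:
            break  # inning is over; ignore anything after the third out
        if event == "K":
            outs += 1
        elif event == "1B":
            if bases[2]:
                runs += 1  # runner on third scores
            bases = [True, bases[0], bases[1]]
        elif event == "HR":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


inning_a = ["HR", "1B", "K", "K", "K"]
inning_b = ["1B", "HR", "K", "K", "K"]

print(runs_in_inning(inning_a))              # 1
print(runs_in_inning(inning_b))              # 2
print(sorted(inning_a) == sorted(inning_b))  # True: same events, different order
```

The last line is the whole point: a context-neutral measure only sees the sorted list of events, so it can't tell these two innings apart.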
The other type of luck (or lack thereof) is a little less well-defined, and some readers might balk at describing it as "luck" at all. It's when a player performs much better than he is expected to going forward, such as when a 30-year-old with a career wRC+ of 90 hits for a month or two at a wRC+ of 120. This might be BABIP-fueled, with more pop flies and slow rollers going for hits than would normally be expected, and in those cases, calling the performance lucky tends to be fairly non-controversial. Alternately, a player might really have performed that well, but wasn't expected to be that good before the hot streak and isn't expected to continue to be that good going forward. "Luck" is maybe not an ideal name for this, since under- or over-performance like this happens for a reason, and not knowing that reason is different from that reason not existing. But these performances are, by definition, unforeseeable with our current tools, and so effectively random from an analytical perspective, which is why I'm alright describing them as luck.
That is the split that makes the most sense to me, but these two aspects are often lumped together, or one is discussed while ignoring the other. Readers probably have a good idea of what teams have been lucky or unlucky so far in the 2015 season, but determining if it's timing luck or performance luck that is driving unexpected pushes or collapses is important. Timing luck, as best as we know, is almost totally uncontrollable. Performance "luck", however, might be a result of a blind spot or bias in our analytic framework, and lumping the two together makes performance luck seem more immutable than it might be.
So: what teams have clustered well or poorly, and what teams have over- or under-performed what we think their true talent is?
First, a leaderboard:
| Rank | Team | 6/22 Win% | BaseRuns Expected Win% | W%-BR% |
The source for the above figures is the FanGraphs BaseRuns standings, which strip out timing and context and produce what a team's record "should" be, based only on how they've hit and pitched and not the order in which they've done things. A more precise definition and some good discussion of why BaseRuns are better than something like pythagorean record can be found on Tom Tango's wiki.
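For a feel of how a run estimator can ignore order entirely, here is a rough sketch of one commonly cited simple version of Smyth's BaseRuns formula. I'm assuming these coefficients for illustration; FanGraphs' own implementation uses a more detailed version, so don't expect this to reproduce their standings. The function name and the sample stat line (the home run, single, and three strikeouts from the inning example earlier) are mine.

```python
def base_runs(ab, h, bb, hr, tb):
    """Estimate runs from counting stats using a simple BaseRuns variant.

    General form: A * B / (B + C) + D. The coefficients in B are assumed
    from a commonly cited simple version, not FanGraphs' formula.
    """
    a = h + bb - hr                                        # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # advancement factor
    c = ab - h                                             # outs
    d = hr                                                 # automatic runs
    return a * b / (b + c) + d


# One home run, one single, three strikeouts: 5 AB, 2 H, 0 BB, 1 HR, 5 TB.
estimate = base_runs(ab=5, h=2, bb=0, hr=1, tb=5)
print(round(estimate, 2))
```

Because the inputs are just totals, the estimate is identical no matter what order the events came in, and it lands between the 1 run and 2 runs that the two orderings actually produce.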
The last column on the right shows the difference between a team's actual win percentage and their expected win percentage per BaseRuns, with positive values corresponding to good timing luck and negative values to bad. The biggest gap by far comes for the Athletics, who are well understood to have been unlucky thus far, but the magnitude of the difference is still shocking. They've hit and pitched like a .595 team, and have won games like a .431 team. That corresponds to almost 12 fewer wins than expected in 72 games, which would take them from last place in the AL West to first. They have been absolutely crushed by timing, which is illustrated by their 6-18 record in one-run games.
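The conversion from a winning-percentage gap to wins is just multiplication by games played. A quick sketch, using the A's figures quoted above (`wins_gap` is a hypothetical helper name):

```python
def wins_gap(actual_pct, expected_pct, games):
    """Convert a gap in winning percentage into wins over a number of games."""
    return (actual_pct - expected_pct) * games


# Athletics: .431 actual, .595 by BaseRuns, 72 games played.
athletics_gap = wins_gap(0.431, 0.595, 72)
print(round(athletics_gap, 1))  # -11.8
```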
The only team to play better than the A's by BaseRuns is the Dodgers, who are also the second-unluckiest team by this measure (but still only half as unlucky as the A's). Unlike the A's, however, the Dodgers have managed to cling to the lead in their division, and so with this stretch of bad clustering unlikely to continue, the Dodgers' chances of making the postseason are pegged at about 92 percent by FanGraphs, despite only holding a one-game lead over the Giants.
On the other end of the spectrum, we find the Twins, with a positive difference almost as large as the A's is negative. They have the fifth-worst BaseRuns expected record in the league, but have the ninth-highest actual record and are currently tied with the Yankees for the lead in the Wild Card. Given that their 38 wins are in the bank despite being highly luck-influenced, the Twins do stand an outside shot at making the postseason, but they'll probably have to play better than they have been -- a .426 BaseRuns record is probably not going to continue to result in a .543 winning percentage.
Now, another leaderboard, this one showing the difference between each team's BaseRuns expected winning percentage and their projected preseason winning percentage. I'm comparing those two numbers to see the gap between how teams were expected to perform (projections) and how they actually have (BaseRuns). You might not agree with the projections, but they are the closest thing to a consensus that is publicly available, and if you could do any better you would be very rich, so no complaining allowed. And again, maybe the projections did miss something, but that's part of what this exercise is meant to illuminate. Who has performed better or worse than expected?
| Rank | Team | BaseRuns Expected Win% | Projected Win% | BR%-P% |
Again, it's a little weird to call what the Astros have done "luck", since they really have played this well, but it is very easy to call this unexpected. Part of it is the impact of rookie call-ups like Carlos Correa, but he was given some playing time on the preseason depth charts, so it's not like his presence was unanticipated; it's more the 132 wRC+ that is a surprise.
Similarly, the A's' underperformance of their BaseRuns record seems a little less tragic when you see how much they're overperforming our best guess for their true talent. Stephen Vogt was not expected to have a 15 percent walk rate, nor was Billy Burns expected to have a .367 OBP. Again, these are real events that have happened, and this is not meant to take away anything from the players in question, but it looks like in terms of performance, the A's are hitting the upside on almost every player.
On the flip side, it's hard not to feel a little pity for the Brewers and White Sox, both of whom were seen as sub-fringe contenders by the preseason projections and are instead having downright terrible seasons. In the long term, being further out of the race might benefit these teams, since there's no temptation for them to sell out for a slim chance at a Wild Card berth, but goodness, it is not fun to watch. The White Sox position players, as a team, have combined for an astoundingly bad negative-3.4 WAR. They are almost certainly not that awful, but look at the prior leaderboard, and note that they're actually the team that has benefited the second-most from the distribution of hits. It's very bad on the South Side of Chicago; it could be so, so much worse.
Finally, I wanted to show the interaction between these two categories of luck. The following is a scatterplot showing the impact of clustering on a team's record on the horizontal axis, and the impact of over- or under-performance on their record on the vertical axis. Teams above the diagonal line through the origin, where the two effects exactly cancel out, have had overall good luck, in that the combination of the two types is positive; teams below that line have had overall bad luck.
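In case the bookkeeping is useful, here is a small sketch of how the two axes combine. The function names are hypothetical, and while the Twins' actual and BaseRuns percentages come from the discussion above, the projected figure in the example is a placeholder I made up purely for illustration.

```python
def luck_components(actual_pct, baseruns_pct, projected_pct):
    """Split a team's surprise into timing luck and performance luck."""
    timing = actual_pct - baseruns_pct          # horizontal axis: W% - BR%
    performance = baseruns_pct - projected_pct  # vertical axis: BR% - P%
    return timing, performance, timing + performance


def side_of_line(timing, performance):
    """Report where a team sits relative to the line where the two cancel."""
    total = timing + performance
    if total > 0:
        return "above (overall good luck)"
    if total < 0:
        return "below (overall bad luck)"
    return "on the line"


# Twins: .543 actual and .426 BaseRuns are from the article;
# the .480 projection is a hypothetical placeholder.
timing, performance, total = luck_components(0.543, 0.426, 0.480)
print(round(timing, 3), round(performance, 3), round(total, 3))
print(side_of_line(timing, performance))
```

One thing this makes obvious: the BaseRuns term cancels when you add the two components, so the combined measure is simply actual winning percentage minus projected winning percentage. BaseRuns only decides how that total gets divided between the two kinds of luck.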
To give you a sense of the magnitude, the team hit hardest by this combined measure is Milwaukee, on the order of about 9 wins in 70 games; the teams that have benefited the most are Kansas City and St. Louis, both by slightly less than 8 wins in 70 games. Interestingly, there appears to be something of a negative correlation between the two types of luck (R-squared of about .15). I'm really not sure why that might be, which makes me inclined to dismiss it until it can be replicated, or observed over a full season, but it's certainly a trend worth keeping an eye on.
Hopefully this was interesting, and leads toward a better discussion of what is causing a team to perform differently than expected. Calling the Royals or Cardinals "lucky" feels strange, so a better vocabulary is definitely needed to describe these phenomena, as it's important to distinguish between the different avenues of randomness.
. . .
Henry Druschel is lucky enough to be a Contributor at Beyond the Box Score. You can follow him on Twitter at @henrydruschel.