FanPost

clusterluck

So about a month ago I finished reading a book by Joe Peta called "Trading Bases: A Story About Wall Street, Gambling, and Baseball". Incredible read for anyone into sabermetrics, sports betting, the corruption of high finance, you know, the fun stuff. Early in the book, Peta introduces a concept called "clusterluck", which essentially shows how much a team benefited from, or got screwed by, its distribution of hits. For those familiar with the Pythagorean Theorem of baseball, pioneered by Bill James in 1979 and since refined, you can basically estimate a team’s winning percentage from two simple stats: runs scored and runs allowed. Not exactly brain surgery. But Peta took it one step further by refining runs scored and runs allowed into how many runs a team should have scored and allowed based on how efficient its offense and pitching were.

Since it’s March and we’re approaching Selection Sunday, let’s do a little blind resume test from the 2013 season that will show you exactly how clusterluck can win you some money betting team over/unders (or, to put it in a way that seems less shady, investing in the baseball futures market).

Note: Stat lines are BA/OBP/SLG; pitching stats are opposing teams’ averages against. In the actual regression analysis, BA is subtracted from SLG to give a team’s Isolated Power (ISO), but that does not change the validity of the following example.

Team A:

Hitting: .238/.300/.392

Pitching: .244/.318/.400

Team B:

Hitting: .242/.307/.376

Pitching: .261/.318/.413

Just looking at the numbers, both of these teams are pretty bad. The MLB average is the same for both hitting and pitching stats: .253/.317/.396. The teams are right about the same in hitting efficiency. Team A outperformed Team B in pitching efficiency by a bit, but all things being equal, these two teams should both be well below .500 and within a few games of one another. The idea behind clusterluck is that a team’s hitting and pitching statistics have a statistically strong relationship with the number of hits it takes to score/allow a run (duh). In fact, a regression analysis of the five previous seasons yields the following relationship:

(If you are not statistically minded, you can ignore this part. It’s not actually necessary, but I thought I would include it for those who may be interested.)

Expected Hits Per Run Scored= 3.925 – 9.135(OBP) + 7.099(SLG) – 12.365(ISO)

Expected Hits Per Run Allowed= 4.094 – 9.443(OBP) + 7.019(SLG) – 12.630(ISO)

(As noted above, ISO is simply SLG – BA)
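For anyone who wants to plug numbers into the two formulas above, here is a minimal Python sketch. The coefficients are copied straight from the regression; the function name and the dictionary layout are just my own illustration (not anything from Peta's book), and the slash lines are Team A's and Team B's from the blind resume test.

```python
# Coefficients from the two regression equations above,
# in the order (intercept, OBP, SLG, ISO).
SCORED_COEFS = (3.925, -9.135, 7.099, -12.365)
ALLOWED_COEFS = (4.094, -9.443, 7.019, -12.630)


def expected_hits_per_run(ba, obp, slg, coefs):
    """Expected number of hits it takes to score (or allow) one run."""
    intercept, b_obp, b_slg, b_iso = coefs
    iso = slg - ba  # Isolated Power: SLG minus BA
    return intercept + b_obp * obp + b_slg * slg + b_iso * iso


# Slash lines (BA, OBP, SLG) from the blind resume test.
teams = {
    "Team A": {"hitting": (.238, .300, .392), "pitching": (.244, .318, .400)},
    "Team B": {"hitting": (.242, .307, .376), "pitching": (.261, .318, .413)},
}

for name, lines in teams.items():
    scored = expected_hits_per_run(*lines["hitting"], SCORED_COEFS)
    allowed = expected_hits_per_run(*lines["pitching"], ALLOWED_COEFS)
    print(f"{name}: {scored:.2f} hits per run scored, "
          f"{allowed:.2f} hits per run allowed")
```

Plugging in the slash lines, Team A comes out at roughly 2.06 hits per run scored and 1.93 per run allowed, and Team B at roughly 2.13 and 2.07. Divide a team's actual hit totals by these figures and you get the runs it "should" have scored and allowed.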

However, there are always some teams that, for reasons that basically amount to the luck behind their distribution of hits (their hits/hits allowed came disproportionately with runners on base), will score and allow more or fewer runs than they should. The ability to spot these teams gives you an inherent advantage and a better starting point from which to make a team wins projection for the upcoming season. Case in point: in the above example, Team A is the Chicago Cubs (66 wins), Team B is the New York Yankees (85 wins).

That’s right, everyone: a 66 win Chicago Cubs team had a slightly better (technically more efficient) season than a team that won a full 19 more games. Run their actual stats through the regression analysis above to correct for clusterluck, apply the Pythagorean Theorem using the number of runs a team should have scored and allowed, and suddenly the Cubs are a 75 win team and the Yankees are a 72 win team. Obviously these numbers have to be corrected for roster changes before you make a 2014 projection, but you are now operating from a completely different (and I believe much more accurate) model than the one used by most sports books.
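To make the last step concrete, here is a rough Python sketch of going from hits to expected runs to Pythagorean wins. A few assumptions to flag: the exponent of 2 is Bill James's classic version (the post doesn't say which refined exponent Peta uses), the function names are mine, and the hit totals and hits-per-run figures in the example are made-up placeholders, not any real team's numbers.

```python
def expected_runs(hits, hits_per_run):
    """Runs a team 'should' have scored (or allowed) given its hit total."""
    return hits / hits_per_run


def pythagorean_wins(runs_scored, runs_allowed, exponent=2.0, games=162):
    """Pythagorean win estimate.

    exponent=2.0 is the classic form (an assumption here); refined
    versions of the formula use other exponents.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical inputs for illustration only, NOT real team totals.
hits_for, hits_against = 1400, 1350
hits_per_run_scored, hits_per_run_allowed = 2.10, 2.00

runs_for = expected_runs(hits_for, hits_per_run_scored)
runs_against = expected_runs(hits_against, hits_per_run_allowed)
print(f"Expected wins: {pythagorean_wins(runs_for, runs_against):.1f}")
```

Feed it the clusterluck-adjusted runs instead of the actual runs and you get the adjusted win totals used in the rest of this post.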

Here are the 5 luckiest and unluckiest teams from the 2013 season:

Luckiest:

1. Yankees (+13 wins)

2. Orioles (+6 wins)

3. Indians (+6 wins)

4. Cardinals (+5 wins)

5. Royals (+5 wins)

Unluckiest:

1. Cubs (-9 wins)

2. Tigers (-8 wins)

3. White Sox (-7 wins)

4. Brewers (-6 wins)

5. Marlins (-5 wins)

Now here are the revised standings from the 2013 season after stripping out the effects of clusterluck:

NL East:

1. Braves (93-69)

2. Nationals (86-76)

3. Mets (75-87)

4. Phillies (69-93)

5. Marlins (66-96)

NL Central:

1. Pirates (93-69)

2. Cardinals (92-70)

3. Reds (91-71)

4. Brewers (80-82)

5. Cubs (75-87)

NL West:

1. Dodgers (91-71)

2. Giants (79-83)

3. Diamondbacks (78-84)

4. Rockies (76-86)

5. Padres (73-89)

AL East:

1. Red Sox (99-63)

2. Rays (92-70)

3. Orioles (79-83)

4. Blue Jays (76-86)

5. Yankees (72-90)

AL Central:

1. Tigers (101-61)

2. Indians (86-76)

3. Royals (81-81)

4. White Sox (70-92)

5. Twins (67-95)

AL West:

1. Athletics (96-66)

2. Rangers (88-74)

3. Angels (81-83)

4. Mariners (73-89)

5. Astros (54-108)

Some shuffling around in the standings (raise the Jolly Roger, the Buccos win the Central!) but nothing huge, because the majority of teams perform right around where the model predicts they will. Any differential of more than 3 wins between the number of games a team won and the number it should have won should be seen as good or bad luck. For the Yankees, a differential of 13 is ridiculous. They were over twice as lucky as any other team in baseball. Without even running the numbers for roster changes, I can assure you that, given most books have them around 85.5 wins, the under will be an absolute steal (granted, Tanaka is something of a wild card. How much of a wild card is up for debate, but there's no shot even the most extreme estimate makes up the difference).
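If you want to flag lucky and unlucky teams yourself, the 3 win rule of thumb boils down to a few lines of Python. This is only a sketch: the function names and labels are mine, and the two teams shown use the actual and adjusted win totals quoted earlier in this post.

```python
LUCK_THRESHOLD = 3  # wins; differentials beyond this count as luck


def luck_differential(actual_wins, expected_wins):
    """Positive means the team won more games than the model says it should."""
    return actual_wins - expected_wins


def luck_label(diff, threshold=LUCK_THRESHOLD):
    """Classify a win differential using the 3 win rule of thumb."""
    if diff > threshold:
        return "lucky"
    if diff < -threshold:
        return "unlucky"
    return "about as expected"


# (actual wins, clusterluck-adjusted wins) as quoted in this post.
teams = {"Yankees": (85, 72), "Cubs": (66, 75)}

for name, (actual, expected) in teams.items():
    diff = luck_differential(actual, expected)
    print(f"{name}: {diff:+d} wins ({luck_label(diff)})")
```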

Depending on how much time I have and if enough people are interested, I can run the numbers for roster changes using WAR projections, come up with some projected 2014 standings, and show the best value on team over/unders.

PS: If you found yourself thinking that the Yankees (or any other team) were "clutch" rather than "lucky" because they had a ridiculous number of hits come disproportionately with runners on base, do the following: ball your hand into a fist and punch yourself in the face as hard as you can. I could go on for days about the myth of "clutch" in sports.


