2016 College World Series preview

We simulate the NCAA Division 1 baseball championship, which starts this weekend.

Our model gives the Virginia Cavaliers a 6% probability of making their 3rd straight championship series.
Bruce Thorson-USA TODAY Sports

The Division 1 College World Series kicks off this week. Sixty-four of the 300 teams at the NCAA's highest level will compete, leading up to a best-of-three finale in Omaha, Nebraska, beginning June 18.

To get you ready for the event, we built an Elo rating system for all D1 teams. We have previously used the Elo rating system to compare international baseball squads. Others have used Elo systems to compare football teams, professional and collegiate basketball teams, soccer teams, and more.

We trained our model on a database of all games between D1 teams from 2002 through the present, available on GitHub here. Games from 2002 through 2014 were used for training, and games from 2015 and 2016 were used for validation. (Games between teams in lower divisions were not included in this rating.) After each season, a 25 percent regression to the mean was applied. In other words, a team's rating at the start of a new season was three parts its rating at the end of the previous season and one part the rating of an average team.

Home teams were given a 60-point bonus (equivalent to a .585 winning percentage). Each game was given a weight that roughly tracked its importance, as listed in the table below. Margin of victory also affected the weight assigned to each game. The cube root of the margin of victory acted as a multiplier, so winning a game by 8 runs was worth twice as much as winning a game by 1, and winning a game by 27 was worth three times as much.

Event Weight
Non-conference game 15
Conference game 20
Conference tournament 25
NCAA tournament 40
CWS finals 45
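
The rating rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the article. The home bonus, regression fraction, event weights, and cube-root multiplier come from the text. The 1500-point average rating, the 400-point logistic scale (which does reproduce the .585 figure for a 60-point bonus), and all function names are assumptions made for the example.

```python
# Values taken from the article.
HOME_BONUS = 60.0
REGRESSION = 0.25
WEIGHTS = {
    "non_conference": 15,
    "conference": 20,
    "conference_tournament": 25,
    "ncaa_tournament": 40,
    "cws_finals": 45,
}

# Assumptions, not stated in the article: the rating of an average team
# and the conventional 400-point Elo scale.
MEAN_RATING = 1500.0
SCALE = 400.0


def expected_home_win(home, away, neutral=False):
    """Expected winning percentage of the home team."""
    bonus = 0.0 if neutral else HOME_BONUS
    return 1.0 / (1.0 + 10.0 ** (-(home + bonus - away) / SCALE))


def update(home, away, home_runs, away_runs, event, neutral=False):
    """Return the new (home, away) ratings after one game."""
    if home_runs == away_runs:
        raise ValueError("a completed game needs a winner")
    expected = expected_home_win(home, away, neutral)
    actual = 1.0 if home_runs > away_runs else 0.0
    margin = abs(home_runs - away_runs)
    # Event weight scaled by the cube root of the margin of victory.
    k = WEIGHTS[event] * margin ** (1.0 / 3.0)
    delta = k * (actual - expected)
    return home + delta, away - delta


def regress(rating):
    """Offseason regression: three parts old rating, one part average."""
    return (1.0 - REGRESSION) * rating + REGRESSION * MEAN_RATING
```

Under these assumptions, two evenly rated teams give `expected_home_win(1500, 1500)` of roughly .585, and an 8-run win moves the ratings twice as far as a 1-run win in the same type of game.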

The results of calibration tests on the training and validation sets can be seen below. For each game, an expected winning percentage was computed using the Elo model. The home team's expected winning percentages were grouped into bins 2.5 percentage points wide (e.g., from 50% to 52.5%), and the actual winning percentage of all home teams in each bin was computed. The actual winning percentage for each group closely matches the expected winning percentage, as represented by the dotted diagonal line. This tells us the system is well-calibrated.
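
The binning step can be sketched as follows. This is an illustration of the general technique, not the article's own code; the function name, argument names, and return layout are hypothetical. Forty equal bins between 0 and 1 give the 2.5-point width described above.

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=40):
    """Group games by expected home winning percentage.

    predictions: expected home-team win probabilities, each in [0, 1]
    outcomes:    1 if the home team won that game, 0 otherwise
    Returns rows of (bin_low, bin_high, games, actual_home_win_pct).
    """
    bins = defaultdict(lambda: [0, 0])  # bin index -> [games, home wins]
    for p, won in zip(predictions, outcomes):
        # A prediction of exactly 1.0 is placed in the top bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index][0] += 1
        bins[index][1] += won

    rows = []
    for index in sorted(bins):
        games, wins = bins[index]
        rows.append((index / n_bins, (index + 1) / n_bins, games, wins / games))
    return rows
```

A well-calibrated model produces rows in which the actual winning percentage falls close to the midpoint of each bin, which is what the diagonal line in the chart represents.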

The spreadsheet below shows the current rankings as of the end of conference tournament play. You're more than welcome to take these with a grain of salt: As well as the system did in training, there's not a very strong correlation between these results and the consensus ranking. I am way too high on Ohio State and way too low on Texas Tech, for example.

We can also use these rankings to predict the upcoming College World Series, the NCAA baseball championship. The event is split into three rounds over the next three weekends:

  • Regionals: Four teams (seeded 1 through 4) compete in a double-elimination tournament. There are 16 regionals at campus sites (or occasionally in a nearby minor league park). Our simulation gave the host team the same 60-point home field advantage as above.
  • Super Regionals: Two regional champions meet in a best-of-three series, also held on a campus site. The eight super regional winners advance to the...
  • College World Series: Played in Omaha, the College World Series has two stages. The first stage is two four-team double-elimination brackets; the winners of the two brackets then meet in a best-of-three series to decide the national champion.
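
A single run of a four-team double-elimination regional might be simulated as in the sketch below. The article does not spell out the bracket pairings, so the order of games here (1 vs. 4 and 2 vs. 3 to open, then a losers' game, a winners' game, and an if-necessary final) is an assumption based on the usual layout of such brackets. The function names and the 400-point Elo scale are also assumptions; only the 60-point host bonus comes from the text.

```python
import random


def win_prob(rating_a, rating_b, bonus_a=0.0):
    """Chance that team A beats team B (400-point scale assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a + bonus_a - rating_b) / 400.0))


def play(a, b, ratings, host, rng):
    """Return (winner, loser) of one game; the host gets the 60-point bonus."""
    bonus = 60.0 if a == host else -60.0 if b == host else 0.0
    p = win_prob(ratings[a], ratings[b], bonus)
    return (a, b) if rng.random() < p else (b, a)


def simulate_regional(seeds, ratings, host, rng):
    """seeds: the four team names in seed order 1 through 4."""
    s1, s2, s3, s4 = seeds
    w1, l1 = play(s1, s4, ratings, host, rng)
    w2, l2 = play(s2, s3, ratings, host, rng)
    w3, _ = play(l1, l2, ratings, host, rng)   # loser is eliminated
    w4, l4 = play(w1, w2, ratings, host, rng)  # winner stays unbeaten
    w5, _ = play(w3, l4, ratings, host, rng)   # loser is eliminated
    w6, _ = play(w4, w5, ratings, host, rng)
    if w6 == w4:
        return w4                              # unbeaten team wins the regional
    # Both finalists now have one loss, so they play a deciding game.
    w7, _ = play(w4, w5, ratings, host, rng)
    return w7
```

A call such as `simulate_regional(["A", "B", "C", "D"], ratings, "A", random.Random(1))` returns the name of one regional winner; repeating it many times and counting the results gives each team's advancement probability.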

Our predictions are in the spreadsheet below, based on 100,000 runs of a Monte Carlo simulation. (The MATLAB code used to simulate the tournament is also available on GitHub.) Each team gets a probability of advancing out of its regional, advancing out of the super regional, reaching the College World Series finals, and winning the finals.
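
The Monte Carlo idea is easiest to see on a single best-of-three series, where the simulated answer can be checked against an exact formula. The sketch below is in Python rather than the MATLAB used for the article, and it assumes a fixed per-game win probability `p` for team A with independent games; the function names and the seed value are hypothetical.

```python
import random
from collections import Counter


def series_win_prob(p):
    """Exact chance of winning a best-of-three: win 2-0 or win 2-1."""
    return p * p * (3.0 - 2.0 * p)


def simulate_series(p, rng):
    """Play one best-of-three and return the winner, "A" or "B"."""
    wins = losses = 0
    while wins < 2 and losses < 2:
        if rng.random() < p:
            wins += 1
        else:
            losses += 1
    return "A" if wins == 2 else "B"


def monte_carlo(p, runs=100_000, seed=2016):
    """Estimate each team's series win probability by repeated simulation."""
    rng = random.Random(seed)
    counts = Counter(simulate_series(p, rng) for _ in range(runs))
    return {team: counts[team] / runs for team in ("A", "B")}
```

With 100,000 runs, the estimate from `monte_carlo(0.585)` should land within a fraction of a percentage point of `series_win_prob(0.585)`, which works out to about .626. The full tournament has no such simple formula, which is why simulation is the practical route there.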

But remember, the combination of college players and a double-elimination tournament means we're still talking about a pretty unpredictable event. We're only a few years removed from a four-seed winning the national championship.

. . .

All statistics courtesy of NCAA. These statistics are freely available in MySQL-compatible format through Bryan's GitHub page. Special thanks to Christopher D. Long and Meredith Wills for their code.

Bryan Cole is a featured writer for Beyond the Box Score, who will be rooting (as always) for Tulane to make it back to Omaha. You can follow him on Twitter at @Doctor_Bryan.