Keeping score: can batting average and pitcher wins predict World Series winners?

A recent New York Times article discussed key statistics to use to predict the winner of this year's World Series. Are there better stats to use and do they better predict winners?

Rob Carr

In yesterday's New York Times, there was an article that discussed four statistical factors that, based on the authors' professional research, might help predict the winner of this year's World Series between the Boston Red Sox and St. Louis Cardinals. Using data and analysis from championship games or series in the four major sports leagues (MLB, NFL, NBA, and NHL) and the major finals in golf and tennis, the authors have identified 50 characteristics indicative of winning it all. Their work relates key statistical factors to concepts of sports psychology, such as leadership, consistency, and minimizing errors. While their factors are heavily steeped in relevant research, the statistics they use are some of the more antiquated measures of success and performance, ones that have since been replaced by more advanced metrics. Let's discuss these four factors and offer some alternatives that could provide a better prediction of success.

Leadership: The World Series finalist with the better top of the rotation, measured by total victories by its top two pitchers, has won 64 percent of the World Series over the past 24 years

For this point, the authors look to the wins of the top two pitchers of a given rotation. How they select the top two pitchers is not discussed: is it the top two by wins, the top two by rotation spot, or some other measure? Beyond this, as most who read Beyond the Box Score already know, wins depend on things a pitcher doesn't always control, such as run support, defense, and the bullpen, so they are probably not the best way to exemplify leadership.

An alternative? How about fielding independent pitching (FIP)? With FIP, we have a better gauge of the things a pitcher controls and thus an improved ability to measure how much success can be attributed to his performance. Add to this that the year-to-year correlation of FIP (0.59) is higher than that of wins (0.29) across starting pitchers, and we have a reasonable alternative that provides more information about a pitcher's leadership, with more year-to-year stability.
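For readers who haven't seen it spelled out, here is a minimal sketch of the commonly published FIP formula. The function name and the sample stat line are made up for illustration, and the constant (about 3.10) is an assumption; it is recalculated each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding independent pitching from the events a pitcher controls most.

    `constant` is an assumed, approximate value; the real one varies by season.
    `ip` must be decimal innings (200 and one third is 200.333..., not 200.1).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs, walks and hit batters count against the pitcher,
    # strikeouts count in his favor, all scaled per inning pitched.
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical season line, not a real pitcher.
example_fip = fip(hr=20, bb=50, hbp=5, so=200, ip=200.0)
```

Note that nothing in the formula depends on run support or on the fielders behind the pitcher, which is the whole appeal over wins.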

Consistency: Our research points to batting average as a good indicator of consistency. The team with the higher league rank in batting average has won 72 percent of the past 23 World Series.

Switching to batting stats, we are introduced to the concept of consistency, with team batting average as the surrogate. Again referencing the fantastic work done by Bill Petti here at Beyond the Box Score, we see that the year-to-year correlation of a player's batting average isn't as strong as that of some more advanced metrics. Take, for instance, weighted on-base average (wOBA): not only does it include a batter's walks, it also goes beyond simple batting average by assigning the proper value to each type of hitting event. Using this advanced stat not only provides better context for what a batter did at the plate, but is also a better indicator of consistency, as judged by year-to-year correlations.
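To make the contrast concrete, here is a rough sketch of batting average next to wOBA. The linear weights below are approximate values from around the 2013 season and should be treated as an assumption, since they are recalculated every year; the two hitters are invented for illustration.

```python
# Approximate linear weights (assumed, season-specific).
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "single": 0.89,
           "double": 1.27, "triple": 1.62, "hr": 2.10}


def batting_average(h, ab):
    """Hits per at-bat; every hit counts the same and walks are ignored."""
    return h / ab


def woba(ab, h, doubles, triples, hr, bb, ibb, hbp, sf, weights=WEIGHTS):
    """Weighted on-base average: each event is credited by its run value."""
    singles = h - doubles - triples - hr
    ubb = bb - ibb  # unintentional walks only
    numerator = (weights["ubb"] * ubb + weights["hbp"] * hbp
                 + weights["single"] * singles + weights["double"] * doubles
                 + weights["triple"] * triples + weights["hr"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Two hypothetical hitters with identical batting averages.
slap_hitter = dict(ab=500, h=150, doubles=20, triples=2, hr=5,
                   bb=30, ibb=0, hbp=2, sf=3)
slugger = dict(ab=500, h=150, doubles=35, triples=3, hr=30,
               bb=80, ibb=5, hbp=5, sf=5)

same_average = batting_average(150, 500)
woba_gap = woba(**slugger) - woba(**slap_hitter)  # positive: slugger rates higher
```

Both hitters bat .300, but wOBA separates them because it credits the extra walks and extra-base hits that batting average throws away.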

Defense: Over the past 24 years, the team with the better defense, as measured by league rank in fielding percentage, has won 64 percent of the World Series.

Defensive metrics are still works in progress, but they have at least gone well beyond simply assigning an error rate to a player. Stats such as ultimate zone rating (UZR) and defensive runs saved (DRS) provide a wealth of information about the defensive prowess of a player or team, including some of the less obvious skills that go into playing good defense, such as range and arm strength. Much like our previous alternative stats, UZR and DRS show some year-to-year consistency across players.
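The year-to-year consistency cited for FIP, wOBA, UZR, and DRS is typically measured by pairing each player's value in one season with his value in the next and taking the Pearson correlation of those pairs. The sketch below assumes that approach; the data layout, function names, and numbers are hypothetical, not the data behind the correlations quoted above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def year_to_year_correlation(stat_by_player):
    """stat_by_player maps player -> {season: value}.

    Only consecutive seasons for the same player form a pair.
    """
    this_year, next_year = [], []
    for seasons in stat_by_player.values():
        for season, value in seasons.items():
            if season + 1 in seasons:
                this_year.append(value)
                next_year.append(seasons[season + 1])
    return pearson(this_year, next_year)


# Invented values for three made-up players.
toy_stat = {
    "Player A": {2011: 3.0, 2012: 3.2, 2013: 3.1},
    "Player B": {2011: 4.5, 2012: 4.2},
    "Player C": {2012: 3.8, 2013: 4.0},
}
r = year_to_year_correlation(toy_stat)
```

A value near 1 means last year's number tells you a lot about this year's; a value near 0 means it tells you very little.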

Big-Game Experience: Big-game experience has had a positive relationship with winning championships across all major North American sports we have studied...But in baseball over the past 24 years, the team with more big-game experience has won only 47 percent of the time.

Again, we run into an issue of semantics: what exactly is 'big-game experience'? Is it any postseason appearance, or does it only include World Series appearances? Are the criteria based on simply being on the roster, or is a certain number of plate appearances required to qualify? Thankfully, much of the statistical heavy lifting for this argument has been done for us by Russell Carleton at Baseball Prospectus. Recently, he wrote about this exact phenomenon and concluded, much like the NYT authors but with a little more statistical finesse, that big-game experience doesn't play as large a role in World Series wins as one would think. As such, no alternative advanced stat is necessary here; big-game experience is a contentious factor, at best.

Let's go back to the first three factors -- leadership, consistency, and defense -- and talk more about the alternative stats and their potential for predicting the World Series winner. Do our choices -- FIP, wOBA, and UZR or DRS -- show us anything of significance when used in place of the authors' choices of pitcher wins, batting average, and fielding percentage?

For leadership, let's try to replicate their methods, using the assumption that the 'top two' pitchers of a rotation are the starters for Game 1 and Game 2 of each respective World Series, and compare the average FIP of those two starters. When we do so, we find that over the last 24 years, it is a 50:50 toss-up for FIP: half of the winners had a higher average top-two FIP than the losers, and half didn't. These results are somewhat similar to the authors', whose 64% figure using wins is itself pretty close to a coin flip.
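The tally described above can be sketched in a few lines. This is only an illustration of the comparison under the stated Game 1 and Game 2 assumption; the function names and every number in the example are invented, not the actual World Series data. Since a lower FIP is better, the direction of "better" is a parameter.

```python
def mean(values):
    return sum(values) / len(values)


def winner_better_rate(series, lower_is_better=True):
    """Share of series in which the winner had the better average stat.

    `series` is a list of (winner_values, loser_values) pairs, e.g. the
    FIPs of each team's Game 1 and Game 2 starters. Exact ties are skipped.
    """
    better = decided = 0
    for winner_values, loser_values in series:
        winner_avg, loser_avg = mean(winner_values), mean(loser_values)
        if winner_avg == loser_avg:
            continue
        decided += 1
        # With lower_is_better, the winner is "better" when its average is lower.
        if (winner_avg < loser_avg) == lower_is_better:
            better += 1
    if decided == 0:
        raise ValueError("no series with a decided comparison")
    return better / decided


# Hypothetical top-two FIPs for four made-up series: (winner, loser).
toy_series = [
    ([3.0, 3.4], [3.5, 3.9]),
    ([4.0, 4.2], [3.1, 3.3]),
    ([2.9, 3.6], [3.3, 3.8]),
    ([3.8, 4.1], [3.2, 3.6]),
]
rate = winner_better_rate(toy_series)
```

The same function works for wOBA, UZR, or DRS by passing team-level values and setting `lower_is_better=False`.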

Using our new measure of consistency (wOBA) and tallying frequencies the same way the authors did, we see a trend similar to the one seen with FIP: in the past 24 World Series, the winning team had the better wOBA 57% of the time, 15 percentage points below the 72% the authors reported for batting average.

Defensive metrics show the same trend when applying the authors' methods of comparison. When replacing fielding percentage with UZR, only 45% of the last 11 World Series winners had a better UZR than the losers; DRS is only slightly better, with a 50:50 split in the last ten World Series. Our data for UZR and DRS are limited to the last 11 and 10 years, respectively, because that is how long each stat has been available. When we recompute the fielding percentage frequencies over those same years, we find some slight differences from what the authors reported in their article.

Percentage of World Series in which the winner had the better mark:

Years       New stat   By new stat   By fielding pct
2002-2012   UZR        45%           50%
2003-2012   DRS        50%           40%

Here, we again see little predictive power with either the new or the old stats; we are essentially back at a coin flip. However, in the years of interest, DRS is a slightly better predictor of World Series success than fielding percentage, which adds to its existing advantage of accounting for more aspects of defensive prowess than just putouts, assists, and errors.
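One way to put a number on "back at a coin flip" is an exact two-sided binomial test against a fair coin. This is our own illustrative check, not something the NYT authors describe, and the function name is made up. The example count of 15 out of 24 is an assumption too: it is simply a whole number close to the authors' 64% for pitcher wins.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for `successes` out of `trials` under p = 0.5.

    Sums the probability of every outcome at least as far from trials / 2
    as the observed count.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    center = trials / 2
    distance = abs(successes - center)
    extreme = sum(comb(trials, i) for i in range(trials + 1)
                  if abs(i - center) >= distance)
    return extreme / 2 ** trials


# Assumed count: 15 of 24, roughly the 64% quoted for pitcher wins.
p_wins = two_sided_binomial_p(15, 24)
```

For 15 of 24, the p-value comes out to about 0.31, so a record like that is entirely consistent with a fair coin. With only 10 or 11 seasons of UZR and DRS, the samples are smaller still.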

The results of the frequencies are fairly surprising. While the method used to compare the statistics of choice, as described in the article, isn't terribly rigorous, it does point to an interesting dichotomy between some of the older, traditional stats and their newer, more advanced counterparts. It will always pay to use the statistics that provide the most bang for the buck within the proper context. Still, the notion that some of the more basic, straight-out-of-the-box-score stats are reasonable predictors of outcomes, at least at the team level with a reasonably large data set, isn't farfetched, even though other stats provide a more granular and complex description of a particular game situation. These results also should not be read as dismissing sabermetrically derived statistics as valuable tools for describing and predicting performance. As stated previously, the statistical method for comparing the groups of interest was not particularly complex, and further investigation using alternative methods of explaining the differences between World Series winners and losers, including but not limited to logistic regression, should be pursued.
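As a taste of what that further investigation might look like, here is a toy logistic regression fitted by plain gradient descent. Everything here is an assumption for illustration: the single predictor (a standardized difference in some stat between the two teams), the invented data, and the hand-rolled fitting loop. A real analysis would use actual series data and an established statistics package.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(win) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # gradient of the log loss
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Invented data: standardized stat difference (team A minus team B)
# and whether team A won the series (1) or lost it (0).
stat_diff = [-1.5, -1.0, -0.5, -0.2, 0.1, 0.3, 0.6, 1.0, 1.2, 1.8]
team_a_won = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

intercept, slope = fit_logistic(stat_diff, team_a_won)
win_prob_with_edge = sigmoid(intercept + slope * 1.0)
```

Unlike the simple better-or-worse tally, this approach uses the size of the gap between the teams, and it extends naturally to several stats at once.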

In the end, this exercise showed the surprising ability of certain traditional stats to predict particular events at a cursory level. It also showed that stats that are consistent and well correlated from year to year at the player level don't necessarily retain that property when applied to team-level data or to different game situations, adding to the perception that in the playoffs, it's a whole different ballgame.


All data courtesy of FanGraphs and Baseball-Reference

Stuart Wallace is a writer at Beyond The Box Score. You can follow him on Twitter at @TClippardsSpecs.