Regression to the mean, starring Phil Hughes

The Minnesota right-hander almost certainly won't replicate his astounding 2014 season — not because of any decline in skill, but because of a statistical principle.

In all likelihood, Hughes won't keep up his current production.
Adam Bettcher

In baseball, as in life, weird stuff happens. Sometimes this stuff has an aptitude-based explanation; sometimes it comes as the result of random variation; and most of the time, it's a combination of the two. Because most achievements have both factors behind them, we have an integral statistical concept: regression toward the mean.

While regression to the mean confuses and misleads a lot of people, it's not all that complicated. In a nutshell, it's the idea that after an unusually good or unusually bad stretch, a player's performance will tend to move back toward a more typical level. This affects everyone, but the extremes in particular: The really good players will generally play a little worse, while the really bad players will generally play a little better. It doesn't mean that they'll all play at the same level, or that the good and bad players will flip (as the so-called law of averages might suggest). It permeates every aspect of the game, and one can hardly overstate its impact.

We'll look at Phil Hughes as a case study. Sabermetrically inclined folks have written a lot about him this year, and for good reason: Of the 855 batters he faced in 2014, he walked 16. The resulting 1.9% BB% led the majors easily (second-place Hisashi Iwakuma sat a comfortable 1.1 percentage points behind) and gave him the best strikeout-to-walk ratio of all time.

Was it a total fluke? Not really: Mike Podhorzer's expected walk rate equation puts it at 0.8%. So why does Steamer predict it'll rise to 4.3% next season? With seemingly little evidence to substantiate that projection, why should we expect Hughes to decline?

To answer that question, I looked to history. Hughes's 1.9% 2014 walk rate translates to a z-score of -2.45, meaning it was nearly two and a half standard deviations below the mean for qualified pitchers. Total batters faced (TBF) data for pitchers goes back to 1916, so I examined the past 99 seasons. In that span, there were 7,354 pitcher seasons that qualified for the ERA title; of those, 82 (including Hughes) featured a walk rate z-score of -2 or lower.
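For readers who want to see the arithmetic, here is a minimal sketch of how a walk rate and its z-score could be computed. The 16 walks and 855 batters faced come from the article; the list of league walk rates is hypothetical and exists only to make the example runnable. The use of the population standard deviation (`pstdev`) is also an assumption, since the article doesn't say which form was used.

```python
from statistics import mean, pstdev

def walk_rate(walks, batters_faced):
    """Walk rate (BB%) as a percentage of total batters faced."""
    return 100.0 * walks / batters_faced

def z_score(value, population):
    """How many standard deviations `value` sits from the population mean."""
    return (value - mean(population)) / pstdev(population)

# Hughes's 2014 line, from the article: 16 walks in 855 batters faced.
hughes_bb = walk_rate(16, 855)

# HYPOTHETICAL walk rates (in %) for a tiny group of qualified pitchers.
# These are illustrative values, not real 2014 data.
league_bb = [hughes_bb, 5.1, 5.8, 6.4, 6.9, 7.3, 7.8, 8.6, 9.4]

hughes_z = z_score(hughes_bb, league_bb)
print(f"BB% = {hughes_bb:.1f}")
print(f"z-score within the hypothetical group = {hughes_z:.2f}")
```

A negative z-score means a walk rate below the group average, which is the good direction for a pitcher.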

Possessing a suitable list of comparisons for Hughes, I then looked to the following season for each pitcher, to see if he qualified, and if so, what his z-score was. I've put the results in the table below:

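(z_BB% is the walk rate z-score; nBB% and nz_BB% are the walk rate and z-score in the following season; -- means the pitcher didn't qualify that year.)

Name Season BB% z_BB% Next nBB% nz_BB%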
Phil Hughes 2014 1.9% -2.45 2015 N/A N/A
Josh Tomlin 2011 3.2% -2.26 2012 -- --
Dan Haren 2011 3.5% -2.08 2012 5.1% -1.04
Brandon McCarthy 2011 3.6% -2.02 2012 -- --
Cliff Lee 2010 2.1% -3.26 2011 4.6% -1.43
Roy Halladay 2010 3.0% -2.72 2011 3.8% -1.91
Carl Pavano 2010 4.1% -2.06 2011 4.2% -1.67
Joel Pineiro 2009 3.1% -2.18 2010 -- --
Paul Byrd 2007 3.4% -2.12 2008 4.5% -1.37
Greg Maddux 2007 3.0% -2.34 2008 3.7% -1.74
Carlos Silva 2005 1.2% -2.92 2006 4.0% -1.62
David Wells 2005 2.7% -2.13 2006 -- --
Brad Radke 2005 2.8% -2.08 2006 4.6% -1.32
Brad Radke 2004 2.9% -2.14 2005 2.8% -2.08
David Wells 2004 2.5% -2.32 2005 2.7% -2.13
Jon Lieber 2004 2.4% -2.36 2005 4.5% -1.19
Roy Halladay 2003 3.0% -2.12 2004 -- --
Brad Radke 2003 3.2% -2.02 2004 2.9% -2.14
David Wells 2003 2.3% -2.46 2004 2.5% -2.32
Rick Reed 2002 3.3% -2.24 2003 -- --
Curt Schilling 2002 3.2% -2.29 2003 4.8% -1.24
Brad Radke 2001 2.8% -2.15 2002 -- --
Greg Maddux 2001 2.9% -2.10 2002 5.5% -1.08
David Wells 2000 3.2% -2.20 2001 -- --
Greg Maddux 1999 3.9% -2.08 2000 4.2% -1.78
Shane Reynolds 1999 3.8% -2.13 2000 -- --
Gil Heredia 1999 4.0% -2.04 2000 7.7% -0.32
Brian Anderson 1998 2.8% -2.27 1999 -- --
Greg Maddux 1997 2.2% -3.06 1998 4.6% -1.44
Rick Reed 1997 3.8% -2.17 1998 3.4% -1.99
John Burkett 1997 3.6% -2.28 1998 5.4% -1.07
Greg Maddux 1996 2.9% -2.37 1997 2.2% -3.06
Kevin Brown 1996 3.6% -2.05 1997 6.8% -0.49
Greg Maddux 1995 2.9% -2.72 1996 2.9% -2.37
Bret Saberhagen 1994 1.9% -2.53 1995 5.0% -1.60
Bob Tewksbury 1993 2.2% -2.92 1994 3.3% -1.96
Bob Tewksbury 1992 2.2% -2.57 1993 2.2% -2.92
Jimmy Key 1989 3.1% -2.26 1990 -- --
Bill Long 1987 4.0% -2.30 1988 5.9% -0.77
Dennis Eckersley 1985 2.9% -2.21 1986 5.0% -1.44
La Marr Hoyt 1985 2.4% -2.45 1986 -- --
La Marr Hoyt 1983 3.0% -2.45 1984 4.4% -1.78
Rick Honeycutt 1981 3.3% -2.16 1982 7.4% -0.10
Bob Forsch 1980 3.8% -2.09 1981 5.8% -0.92
Scott McGregor 1979 3.3% -2.30 1980 5.6% -1.06
Dave Rozema 1977 3.8% -2.18 1978 4.8% -1.49
Gary Nolan 1976 2.8% -2.38 1977 -- --
Jim Kaat 1976 3.5% -2.02 1977 -- --
Gary Nolan 1975 3.4% -2.16 1976 2.8% -2.38
Fergie Jenkins 1974 3.5% -2.21 1975 5.0% -1.48
Catfish Hunter 1974 3.7% -2.11 1975 6.4% -0.87
Fritz Peterson 1970 3.8% -2.05 1971 3.8% -1.60
Fritz Peterson 1969 4.0% -2.16 1970 3.8% -2.05
Lew Burdette 1961 2.9% -2.25 1962 -- --
Robin Roberts 1956 3.3% -2.15 1957 4.2% -1.61
Robin Roberts 1954 4.2% -2.01 1955 4.2% -1.62
Fred Hutchinson 1951 3.5% -2.51 1952 -- --
Ken Raffensberger 1951 3.8% -2.37 1952 4.5% -1.84
Ken Raffensberger 1950 3.9% -2.33 1951 3.8% -2.37
Preacher Roe 1948 4.6% -2.03 1949 5.2% -1.58
Schoolboy Rowe 1947 5.3% -2.10 1948 -- --
Tiny Bonham 1945 3.0% -2.36 1946 -- --
Ray Prim 1945 3.5% -2.12 1946 -- --
Schoolboy Rowe 1943 3.5% -2.14 1944 -- --
Tiny Bonham 1942 2.8% -2.40 1943 5.8% -1.08
Ted Lyons 1942 3.6% -2.06 1943 -- --
Paul Derringer 1940 4.0% -2.01 1941 5.7% -1.31
Paul Derringer 1939 2.8% -2.28 1940 4.0% -2.01
Paul Derringer 1936 3.5% -2.09 1937 5.7% -1.04
Red Lucas 1936 3.6% -2.04 1937 -- --
Red Lucas 1933 2.0% -2.60 1934 5.4% -1.00
Herb Pennock 1930 3.0% -2.29 1931 3.6% -1.88
Jack Russell 1929 4.1% -2.11 1930 5.2% -1.12
Herb Pennock 1929 4.0% -2.16 1930 3.0% -2.29
Pete Donohue 1926 3.3% -2.14 1927 3.8% -1.80
Pete Alexander 1925 3.0% -2.23 1926 3.8% -1.89
Pete Alexander 1923 2.4% -2.68 1924 3.5% -1.90
Babe Adams 1922 2.1% -2.42 1923 -- --
Babe Adams 1921 2.8% -2.09 1922 2.1% -2.42
Babe Adams 1920 1.7% -2.63 1921 2.8% -2.09
Slim Sallee 1919 2.2% -2.24 1920 -- --
Babe Adams 1919 2.3% -2.19 1920 1.7% -2.63
Slim Sallee 1918 2.3% -2.23 1919 2.2% -2.24

The first thing to point out, obviously, is the fact that 24 of the 81 didn't qualify for the ERA title in the following year. This only reinforces the point that Jeff Zimmerman's exhaustive research has made: Pitchers get hurt*.

*On the other hand, the fact that "only" 29.6% of those pitchers broke down (compared to ~40% of the pitchers in the aforementioned study) could support another of Zimmerman's theories: Pitchers with good control generally stay healthier.

For the 58 who did accrue enough innings, though, something becomes clear: They declined. No one did too horribly (everyone's z-score remained negative), and some of them even stayed elite: 16 posted another campaign at least two standard deviations below the mean. But on the whole, their performance dropped off, with the group's average z-score in the follow-up year (-1.64) more than half a standard deviation higher than in the original year (-2.28). Further, 49 of them saw their z-score increase after their phenomenal year, so even some of the ones who sustained their dominance did so to a lesser extent.
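The same tallies can be reproduced on a slice of the table. The sketch below uses only the ten most recent comparison rows, copied from the table, so its averages describe that subset and not the full group. The `summarize` function and its field names are illustrative, not the author's actual code.

```python
from statistics import mean

# Ten rows copied from the table: (name, season, z_BB%, next-season z_BB%).
# None marks a pitcher who did not qualify the following year ("--" in the table).
rows = [
    ("Josh Tomlin", 2011, -2.26, None),
    ("Dan Haren", 2011, -2.08, -1.04),
    ("Brandon McCarthy", 2011, -2.02, None),
    ("Cliff Lee", 2010, -3.26, -1.43),
    ("Roy Halladay", 2010, -2.72, -1.91),
    ("Carl Pavano", 2010, -2.06, -1.67),
    ("Joel Pineiro", 2009, -2.18, None),
    ("Paul Byrd", 2007, -2.12, -1.37),
    ("Greg Maddux", 2007, -2.34, -1.74),
    ("Carlos Silva", 2005, -2.92, -1.62),
]

def summarize(rows, cutoff=-2.0):
    """Compare each qualifying pitcher's z-score with his follow-up z-score."""
    qualified = [(z, nz) for _, _, z, nz in rows if nz is not None]
    return {
        "seasons": len(rows),
        "qualified_next_year": len(qualified),
        "mean_z": mean([z for z, _ in qualified]),
        "mean_next_z": mean([nz for _, nz in qualified]),
        "z_increased": sum(nz > z for z, nz in qualified),
        "repeated_elite": sum(nz <= cutoff for _, nz in qualified),
    }

summary = summarize(rows)
for key, value in summary.items():
    print(f"{key}: {round(value, 2)}")
```

In this subset, every pitcher who qualified again stayed below average but moved closer to it, which is the pattern the full group shows on average.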

Why did this happen? These pitchers didn't necessarily get worse in a true-talent sense as time went along; they simply benefited from a little less good fortune. In order to post an all-time walk rate like this, everything has to break just right, and things usually don't keep breaking perfectly, through no fault of anyone in particular. As they continued to throw to the best hitters in the world, their results slipped, and their walk rates regressed toward the league average (a z-score of zero) during the next season. In other words, you're likely to see Hughes post a 2015 walk rate somewhere between his old walk rates and his 2014 mark, because of a shift toward the overall average that has nothing to do with any change in his ability.
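A small simulation can make the mechanism concrete. In the sketch below, every pitcher has a fixed true-talent walk rate, and each season's observed rate is that talent plus luck. Every number in it (pool size, workload, the spread of talent) is an assumption chosen for illustration, not an estimate from real data.

```python
import random
from statistics import mean

random.seed(2014)  # fixed seed so the sketch is repeatable

N_PITCHERS = 2000      # hypothetical pool of pitchers
BATTERS_FACED = 800    # hypothetical workload per season

def observed_walk_rate(true_rate):
    """One season: each batter walks with probability true_rate."""
    walks = sum(random.random() < true_rate for _ in range(BATTERS_FACED))
    return walks / BATTERS_FACED

# Assumed true-talent walk rates: centered on 7%, floored at 1%.
talent = [max(0.01, random.gauss(0.07, 0.015)) for _ in range(N_PITCHERS)]

year1 = [observed_walk_rate(t) for t in talent]
year2 = [observed_walk_rate(t) for t in talent]  # same talent, fresh luck

# The 2% of pitchers with the lowest observed walk rates in year one.
elite = sorted(range(N_PITCHERS), key=lambda i: year1[i])[: N_PITCHERS // 50]

elite_year1 = mean(year1[i] for i in elite)
elite_talent = mean(talent[i] for i in elite)
elite_year2 = mean(year2[i] for i in elite)
league = mean(year1)

print(f"league average, year 1:  {league:.1%}")
print(f"elite group, year 1:     {elite_year1:.1%}")
print(f"elite group, true talent: {elite_talent:.1%}")
print(f"elite group, year 2:     {elite_year2:.1%}")
```

Nobody's talent changes between the two simulated seasons, yet the elite group's walk rate rises in year two while staying well below the league average. The pitchers picked for their extreme year-one results were, on average, both good and lucky, and only the good part carries over.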

Unless he completely melts down, Hughes won't pitch terribly next year. And for what it's worth, if he had put up a 4.3% walk rate this year, that still would have come in at 1.22 standard deviations below average. But he has little to no chance of a repeat, and he shouldn't take the blame for that. The 2014 mark was amazing, and performing worse than amazing isn't a deficiency. The universe works in a funny way sometimes; for Hughes, that might end poorly.
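As a rough cross-check, the two z-scores quoted in this article (1.9% at -2.45 and 4.3% at -1.22) are enough to back out the 2014 league mean and standard deviation they imply. This sketch assumes both figures are measured against the same distribution of qualified pitchers; because the inputs are rounded, the results are approximate. The function name is illustrative.

```python
def implied_mean_sd(x1, z1, x2, z2):
    """Solve z = (x - mean) / sd for mean and sd, given two (x, z) pairs."""
    sd = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sd
    return mu, sd

# Walk rates in percent and their z-scores, as quoted in the article.
mu, sd = implied_mean_sd(1.9, -2.45, 4.3, -1.22)

print(f"implied league mean BB%: {mu:.1f}")
print(f"implied league SD:       {sd:.1f}")
```

Under those assumptions, the implied league average lands a little under 7% with a standard deviation of about 2 percentage points, so even the projected 4.3% would sit comfortably below average.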

. . .

This article has been revised for clarification.

All data courtesy of FanGraphs and Baseball-Reference.

Ryan Romano is an editor for Beyond the Box Score. He also writes about the Orioles on Birds Watcher and on Camden Chat that one time. Follow him on Twitter at @triple_r_ if you enjoy angry tweets about Maryland sports.