Now that the frenetic pace of yesterday's trade deadline has slowed to a crawl, we have a little bit of room to reflect on one of the most important concepts of sabermetrics (and statistics more generally): regression to the mean. This concept has been furiously discussed over the past several days and it's worth taking a look at what various people have had to say. It all began when MGL, over at the Book Blog, noticed people still didn't seem to get it:
The thing that people don’t understand (actually one of the things) about regression toward the mean in baseball is that the reason any above or below average player will always regress, on the average, towards average, is that they were not really as good or bad as we thought in the first place, based on any of their stats.
His observation was prompted by a comment thread at BTF about CC Sabathia. But it didn't end there. MGL's post was itself discussed back at BTF, where many commenters did not acquit themselves especially well (note: ad hominem attacks are neither interesting nor convincing). Then two enterprising writers at The Hardball Times set out to explain the concept of regression further. Here's Colin Wyers:
The theory states that a measurement consists of two factors: an individual's "true score," and measurement error, sometimes called random error. (I'll take this opportunity to note that not all measurement error is random; some is biased.) In sabermetrics, we have taken to calling a player's underlying talent (if it could be observed without measurement error) "true talent."
We can express this using a mathematical equation:
Observed Performance = True Talent + Random Error + Bias
Separating talent from random error is one of the biggest challenges of sabermetric analysis, and it is at the root of the well-worn saw about small sample size.
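To see why that equation produces regression to the mean, here is a minimal Python simulation under assumed conditions: true talent and random error are both normally distributed with equal spread (a hypothetical choice), the bias term is set to zero, and each simulated player gets two independent seasons. Players who finish in the top 10% of season one owe part of that finish to favorable error, which does not carry over, so their season-two average lands well below their season-one average and close to their true-talent average.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def simulate_regression(n_players=10_000, talent_sd=1.0, error_sd=1.0, seed=42):
    """Observed = true talent + random error (bias assumed zero), for two seasons."""
    rng = random.Random(seed)
    talent = [rng.gauss(0.0, talent_sd) for _ in range(n_players)]
    # Same talent both seasons; the random error is drawn fresh each time.
    season1 = [t + rng.gauss(0.0, error_sd) for t in talent]
    season2 = [t + rng.gauss(0.0, error_sd) for t in talent]
    # Select the top 10% of season-one performers.
    cutoff = sorted(season1)[int(0.9 * n_players)]
    top = [i for i in range(n_players) if season1[i] >= cutoff]
    return (
        mean([season1[i] for i in top]),  # how good they looked
        mean([talent[i] for i in top]),   # how good they really were
        mean([season2[i] for i in top]),  # how good they looked the next time
    )


s1, true_avg, s2 = simulate_regression()
print(f"season 1: {s1:.2f}  true talent: {true_avg:.2f}  season 2: {s2:.2f}")
```

Nobody in the simulation got worse; the season-one leaders simply were not as good as their first-season numbers made them look, which is exactly MGL's point.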
With more on true score theory as it applies to baseball, Wyers' fellow THTer David Gassko jumped in:
Again, remember that all statistics know is what they show. If all we know about a player is that he (1) plays in the major leagues, and (2) got a hit in his last plate appearance, we have very little to distinguish him from every other major leaguer. The odds are roughly equal of his being above average or below, though it is ever-so-slightly more likely that he is above.
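The claim about a single hit can be made concrete with Bayes' rule. This Python sketch assumes a deliberately simple, hypothetical league in which half the hitters are true .250 hitters and half are true .270 hitters; seeing one hit nudges the probability that a player is above average from 50% to only about 52%.

```python
# Hypothetical league: half the hitters are true .250 hitters, half are .270.
talents = {0.250: 0.5, 0.270: 0.5}  # true average -> share of the league
league_avg = sum(p * share for p, share in talents.items())  # .260

# Bayes' rule: weight each talent level by how likely it is to produce a hit.
prob_hit = sum(p * share for p, share in talents.items())
posterior_above = (
    sum(p * share for p, share in talents.items() if p > league_avg) / prob_hit
)
print(f"P(above average | one hit) = {posterior_above:.3f}")
```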
We intuitively understand this principle to be true when it comes to things we know are governed by chance. Take, for example, a roll of a die. If I roll a six, my first assumption is not that the die is weighted toward the six, but rather that the one-in-six chance of rolling a six in fact came to pass. But if I rolled the die 1,000 times and got 200 sixes (1/5 of the observed outcomes, rather than the expected 1/6), I might start to question whether I was using a fair die. The same would be true of coin flips (although you have to be careful about that one).
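The die example can be checked with a little arithmetic. A minimal Python sketch (standard library only) compares 200 sixes in 1,000 rolls with the roughly 167 a fair die would produce, both as a z-score and as an exact binomial tail probability computed under the fair-die assumption.

```python
from math import comb


def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_rolls, sixes, p_six = 1000, 200, 1 / 6
expected = n_rolls * p_six                        # about 166.7 sixes
sd = (n_rolls * p_six * (1 - p_six)) ** 0.5       # about 11.8
z = (sixes - expected) / sd
p_value = upper_tail(n_rolls, sixes, p_six)       # chance a fair die does this well
print(f"expected {expected:.1f}, z = {z:.2f}, P(X >= {sixes}) = {p_value:.4f}")
```

A single six is unremarkable, but 200 of them in 1,000 rolls sits nearly three standard deviations above expectation, which is why the larger sample justifies suspicion and the single roll does not.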
However, when it comes to baseball players, the resistance to this rather uncontroversial aspect of statistics increases a great deal. That leads me to suspect that some people do not think baseball is governed by chance at all. This is very curious indeed.
What we have to remember, though, is that regression to the mean does not imply that good players tend to get worse and bad players tend to get better, but rather that their statistics do a better job of reflecting how good they are the more data we have.
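That last point, that statistics track talent better as data accumulate, can be illustrated with a small simulation. The Python sketch below assumes a hypothetical talent pool whose true batting averages are normally distributed around .260 with a spread of .020, and treats every at-bat as an independent trial; it measures how closely observed average correlates with true average at 50, 200, and 600 at-bats.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def talent_vs_observed(at_bats, n_players=1000, seed=7):
    rng = random.Random(seed)
    # Hypothetical talent pool: true batting averages ~ Normal(.260, .020).
    talents = [rng.gauss(0.260, 0.020) for _ in range(n_players)]
    # Each at-bat is a hit with probability equal to the player's true average.
    observed = [
        sum(rng.random() < p for _ in range(at_bats)) / at_bats for p in talents
    ]
    return pearson(talents, observed)


results = {ab: talent_vs_observed(ab) for ab in (50, 200, 600)}
for ab, r in results.items():
    print(f"{ab:>3} AB: correlation with true talent = {r:.2f}")
```

The players' talent never changes between runs; only the amount of data does, and the observed averages line up with true talent more closely as the sample grows.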
But perhaps a slightly more concrete example is in order. Let's use UZR and the case of Michael Bourn, courtesy of the Crawfish Boxes. They wonder if Bourn is better than his 2009 UZR indicates:
I think most Astros' fans who watch Michael Bourn play center field think he is a superlative defensive player. Based on watching most of his games, I agree. But a funny thing happened on the way to Bourn's defensive rating. His Ultimate Zone Rating (UZR) for 2009 is -1.6. This is worse than Bourn's UZR in 2008, which was +2. Similarly, the Fielding Bible +/- system indicates that Bourn's range has been subpar this year, but that his overall defense has saved 3 runs this year because of his throwing arm.
I think these two defensive metrics are the most sophisticated defensive measures; generally they are the best we have. However, they are just measures, and sometimes they may not be reliable indicators of a player's defense. As the creator of UZR has said, UZR doesn't tell you if a player is a good or bad defensive player, but only whether the player has a good or bad UZR.
It's a good reminder that statistics definitively tell us what a player has done; in other words, they tell us how valuable a player has been but not necessarily what his true talent level is. Without regression to the mean (and for an entire season of UZR the required regression is about 50%), UZR by itself is not a particularly good predictor of future UZR. So when we're talking about a little more than half a season of Bourn's UZR, it has to be heavily regressed to the mean. Better still would be to take his career UZR and then regress that figure to the mean of all center fielders. Only then might we have a good read on how competent a defender Bourn really will be in the future. So the answer to the Crawfish Boxes' question is almost certainly yes, even though I'm not sure they quite got the reasoning right.
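The 50% figure can be turned into a simple shrinkage rule. One common way to express regression to the mean is to keep a share n / (n + k) of the observed deviation from the mean, where n is playing time and k is a constant; setting k to one full season reproduces the 50% regression for a full season of UZR. The Python sketch below uses that form as an assumption, and further assumes (a) the center-field mean UZR is 0, since UZR is scaled relative to an average fielder at the position, (b) "a little more than half a season" means about 55% of one, and (c) Bourn's -1.6 can be treated like a rate. A fuller treatment would prorate by innings played; these are illustrative choices, not part of UZR itself.

```python
def uzr_weight(season_fraction: float, k_seasons: float = 1.0) -> float:
    """Share of the observed deviation to keep; the rest regresses to the mean.

    Assumed form n / (n + k), with k = 1 season so that a full season
    is regressed 50%.
    """
    return season_fraction / (season_fraction + k_seasons)


def regressed_uzr(observed: float, season_fraction: float, mean: float = 0.0) -> float:
    # mean = 0.0 assumes an average center fielder sits at 0 runs.
    w = uzr_weight(season_fraction)
    return mean + w * (observed - mean)


full_season = regressed_uzr(-1.6, 1.0)    # halfway back to the mean
partial = regressed_uzr(-1.6, 0.55)       # assumed ~55% of a season: pulled in further
print(f"full season: {full_season:.2f}, partial season: {partial:.2f}")
```

The less playing time behind a number, the smaller its weight, which is why a partial season of UZR says less about Bourn than it appears to.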
To give you an example of how chance intersects with baseball in another (slightly more trivial) way, look at how one particular moment in baseball history, Game 6 of the 1993 World Series, affected people differently. For the Canuck Jonah Keri, it was a pretty great time:
The crowd at Gerts runs out into the streets. We’re all singing O Canada at the top of our lungs. Mark and I jump on a van at one point and lead the drunken proceedings. Madness.
And somehow the story ends with him meeting his wife. Which is to say, pretty well. For me, however, Joe Carter's home run was terrible: the iconic moment of failure that (until last October) I figured would haunt me for decades as a Phillies fan. So terrible, in fact, that I tweeted about how I was actually at my ninth birthday when it happened, and I was pretty upset about the Phillies losing and all. Joe Carter's response?
Finally, we'll end with a few highlights from this weekend's SABR convention. First, Seamheads writer Ted Leavengood describes Christina Kahrl's keynote speech:
"Diversity of content offering," she said, is a key. She was talking about the need not just to stress statistical analysis in her end of things, but she said the smart journalists are always looking for the newest and best kinds of information, always on top of exactly what people are looking for. "It is asking the right questions," she said, "more than finding the right answers."
SABR 39 was held in Washington, DC, this year, and The Washington Times also had a piece on it, including an anecdote about the career matchup between Walter Johnson and Babe Ruth:
Fastballer Johnson won 417 games for the Senators in 21 seasons. Ruth first faced him as a 19-year-old in his third major league game and struck out. In another meeting, Ruth became possibly the only pitcher ever to bat cleanup. And before he hung up his toeplate to become a full-time outfielder, the Babe won six of seven duels with the Big Train, three of them by 1-0 scores.
Want more trivia? Well, Johnson's final appearance in the bigs came as a pinch hitter against the Yankees on Sept. 30, 1927, the same day Ruth smote No. 60.
I wonder if fans spent a lot of time watching their matchups on contraptions like the Playograph, a sort of predecessor to gamecast technology.
In any event, regression to the mean suggests that even though Ruth won six of the seven matchups, he probably wasn't "extra" good against Johnson: seven games is far too small a sample to separate skill from luck.
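A quick binomial calculation shows why seven games can't tell us much. This Python sketch assumes each duel was an independent game; the 50/50 baseline, the .600 "edge," and the 20-game prior used for shrinkage are all hypothetical illustrations, not estimates of Ruth's or Johnson's actual ability. Even if every duel were a coin flip, winning six or more of seven would happen one time in 16.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k wins in n independent games, each won with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# If every Ruth-Johnson duel were a coin flip: 8/128, or 1 in 16.
coin_flip = prob_at_least(6, 7, 0.5)
# A hypothetical modest edge for Ruth, for comparison only.
modest_edge = prob_at_least(6, 7, 0.6)

# Shrink the observed 6-of-7 toward .500 using a hypothetical prior
# worth 20 games of evidence (the prior weight is an assumption).
prior_games = 20
regressed_pct = (6 + 0.5 * prior_games) / (7 + prior_games)

print(f"P(6+ of 7 | 50/50) = {coin_flip:.4f}")
print(f"P(6+ of 7 | .600)  = {modest_edge:.4f}")
print(f"regressed winning pct vs. Johnson = {regressed_pct:.3f}")
```

Once the seven games are blended with a reasonable prior, Ruth's .857 record against Johnson shrinks to something far less extraordinary, which is the regression argument in miniature.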