The following is the first part of a two-part set reviewing the 2005 season for the Angels and Athletics. But really, it's an excuse to do a run-distribution project.
2005 Team Review: Oakland Athletics, Los Angeles Angels of Anaheim, and Run Distributions
The AL West was a two-team race, as expected, between the Oakland Athletics and the Los Angeles Angels of Anaheim. I've been asked to write a review of the 2005 season for these two teams, and I've thought about what angle I should take. Big-budget versus small-market? Nah, it's been done. Nor-Cal smooth versus So-Cal cool? Also been done. Moneyball versus smallball? That one has definitely been done.
Instead, we'll start traditionally, taking a look at the players who played the games, and then we'll take a closer look at the numbers that decided the season.
The story with the Angels was their pitching: a deep rotation, featuring four starters with an ERA under 4.00 without anybody posting absurdly low BABIPs. John Lackey (3.66 RA) broke out in a big way, striking out one out of every five batters he faced and allowing only 13 home runs all year, down from 22 the year before. Jarrod Washburn (3.35 RA) also cut his home run rate and continued to induce groundballs as he did in 2004. Big Bartolo Colon, while not even close to being the best pitcher in the AL, managed to get his RA (3.76) to more closely reflect his weight. Scot Shields (9.6 K/9), Frankie Rodriguez (12.2 K/9), and Kelvim Escobar (9.5 K/9) anchored a spectacular strikeout bullpen to round out the pitching. An above-average defense, as measured by Baseball Prospectus' defensive efficiency (.702), rounded out the run-prevention unit.
The offense was another story altogether, falling well below expectations. It is here that I will allow myself to be critical of the Angels. Between their farm system and their payroll, they have a deep talent pool from which to choose players. Management's insistence on playing the contracts of Garret Anderson (95 mOPS+) and Darin Erstad (93 mOPS+) over the talent of Juan Rivera (100 mOPS+) and Casey Kotchman (112 mOPS+) cost the team dozens of runs last year. Kotchman in particular had been major league ready for over a year, and yet Erstad's hacktastic plate appearances kept piling up. No amount of first base defense can make up for that kind of offensive performance. To overpay for Orlando Cabrera (90 mOPS+) with their deep corps of middle-infield prospects was also foolish. Adam Kennedy will be a free agent after 2006, and the handling of what ought to be his departure should be closely monitored as a barometer of the front office proclivities. Will Arte Moreno's big bucks turn this team into the Yankees, afraid to play youngsters and trading prospects for veterans every year come July?
There were some positives on the offensive side of the ledger. Vladimir Guerrero turned in a typical superstar year (128 mOPS+), and he'll continue to be the most important player on the team as long as his back holds up. Even the Steve Finley gambit was reasonable last offseason if you buy into the theory that Erstad can't play center anymore. GM Bill Stoneman's creative acquisition of Maicer Izturis, along with the continued strong play of superutility man Chone Figgins, gave the Angels much-needed positional flexibility. Still, the Angels misused their considerable resources, and as long as they continue to do so, the mid-market Athletics will always have a fighting chance in the AL West.
Of course, this was supposed to be the year that the A's didn't have a fighting chance in the AL West. GM Billy Beane, whose very name has become anathema to baseball traditionalists, traded away Mark Mulder and Tim Hudson and turned over the 1999-2003 Moneyball teams. It was a qualified success, as the A's managed to contend until the final week of the season while fighting injuries. Strangely enough, the young players did their part, with Joe Blanton (3.84 RA), Danny Haren (4.19 RA), and Dan Johnson (109 mOPS+) turning in solid seasons. Huston Street (8.3 K/9) and Justin Duchscherer (8.9 K/9) were spectacular in the bullpen, and Mark Ellis (115 mOPS+) was nothing short of phenomenal at second base. Short of Roy Halladay, Rich Harden (2.95 RA) was the best pitcher in the AL when healthy.
The real problem with this team was the veterans: Mark Kotsay (98 mOPS+), Eric Chavez (105 mOPS+), and especially Jason Kendall (93 mOPS+) turned in their worst performances in years. Even Barry Zito's bounceback year (4.18 RA) wasn't a return to his halcyon days. Scott Hatteberg (94 mOPS+) was a downright awful DH, and 2005 should conclude his most compelling Moneyball chapter. Keiichi Yabu and The Joe Kennedy Experiment were basically busts, and Juan Cruz was a total disaster. There is hope for next year, as Kotsay, Kendall, and especially Chavez should all be expected to be more productive in 2006, along with projected improvements in performance and health from youngsters Nick Swisher, Bobby Crosby, Harden, and Johnson.
One of the more interesting subplots in Oakland's year was their defense. Over the last few years, Beane has transformed his team from slow-footed sluggers to a pitching-and-defense team. I submit that it was an adaptation to the new baseball economy. Young pitching is never overpaid because of salary control for players with less than six years of service. Beane also appears to have figured out how to value defense, as he stocked up on strong defensive players such as Jay Payton, Kotsay, Chavez, and Ellis. Put it all together, and you had the AL's top defensive unit as ranked by defensive efficiency (.715).
The Angels took control of the division early in the year while Oakland stumbled badly in May. A remarkable run of .800 baseball put the A's back in the race, and they took the division lead by defeating the Angels in two dramatic games, culminating with Jason Kendall scoring the winning run in the bottom of the ninth on defensive indifference on August 11. But the Angels finished the year on a 21-9 tear, getting a huge boost from the return of Kelvim Escobar, while Oakland struggled without Bobby Crosby and Rich Harden, who were both injured for most of September.
In the end, the Angels won the AL West by a seven-game margin, although the division was much, much closer than that.
According to the Pythagorean theorem, baseball version, the Athletics lost more games than expected and the Angels won more than expected; in reality, the Angels won the division in the final week of the season and finished with a wide cushion. (According to Baseball Prospectus metrics, the Athletics outplayed the Angels by three games. The Prospectus standings are based on adjusted batting and pitching lines. While informative, they don't help us here because the teams did score and allow the number of runs they did, whether or not they should have.) Let's take a closer look at these Pythagorean projections, starting with the theorem itself:

$$\text{Win\%} = \frac{RS^{\gamma}}{RS^{\gamma} + RA^{\gamma}}$$

where RS is runs scored and RA is runs allowed.
The value of the exponent γ has long been a subject of debate; the best estimate appears to come from the Smyth/Patriot model, which results in γ = 1.8 in today's run-scoring environment. I used 1.8 in the Pythagorean records shown above, although the easy-to-use at-home version of the formula uses γ = 2. When a team over- or under-performs its projected winning percentage, you can hear the chatter at sabermetric cocktail parties: "They were lucky/unlucky." But is that the whole story? After all, baseball is a game of discrete events: each game is played and a certain number of runs are scored by each team. You can't carry extra runs over to the next game.
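To see how much the choice of exponent matters, here is a minimal sketch of the Pythagorean calculation. The function name and the season totals are made up for illustration; they are not either team's actual numbers.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.8):
    """Projected winning percentage from season run totals."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical season totals, chosen only for illustration.
runs_scored, runs_allowed, games = 750, 640, 162

for exponent in (1.8, 2.0):
    pct = pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    print(f"gamma = {exponent}: {pct:.3f} win pct, about {pct * games:.1f} wins")
```

For a team that outscores its opponents, the larger exponent always gives the rosier projection, which is why the choice between 1.8 and 2 is worth a little care.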
The Angels outscored their opponents by 115 runs; the A's by 109 runs. That's almost the same. Let's combine these teams into the one über-California team and say this team outscored its opponents by 110 runs. If this team could exactly control how it scored and allowed runs, it would win an incredible 136 games. But just between you and me, I'm pretty sure that whether they take-and-rake or bunt-and-run, it's just not possible for a team to only win by one run and only lose by one run.
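The 136-win figure is simple bookkeeping. As a sketch, assume a 162-game season in which every win and every loss is by exactly one run, so that the run differential equals wins (W) minus losses (L):

```latex
\begin{aligned}
W + L &= 162 \\
W - L &= 110 \\
\Rightarrow\; W &= \frac{162 + 110}{2} = 136, \qquad L = 26
\end{aligned}
```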
It's not possible because teams generally don't decide to score X runs and give up Y runs in a game. Hitters keep trying to hit and pitchers keep trying to stop hitters from hitting, unless the game has really gotten out of hand. Teams distribute the runs they score and the runs they allow in a very particular way, and it's something that tends to be glossed over. Imagine two teams, one that scores five runs every game and another that scores ten runs every other game. They will still have the same average number of runs scored, but which team will win more often? The table below shows the average winning percentage, sorted by number of runs scored for the 2004 AL. Also shown is the percentage of time with which that score occurs.
The team that consistently scores five runs will, on average, have a .607 winning percentage and the team that scores 10 runs every other game will have a .483 winning percentage. Of course, this is an extreme example. But teams do distribute the way they score and allow runs, and I submit that the correlation between the two is low. Yes, managers tend to manage their bullpen differently depending on the score differential and will occasionally swap out position players for benchwarmers at the end of a blowout. But baseball is a funny game, and we've all seen games where the mop-up man throws four shutout innings or the backup catcher hits a three-run homer. For the purposes of this exercise, we'll assume that the number of runs a team scores has little effect on the number of runs a team allows in any particular game.
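The arithmetic behind those two numbers can be sketched as a weighted average. Only the .607 and .483 figures come from the text above; the .966 winning percentage at ten runs is back-solved from the .483 figure on the assumption that a team scoring zero runs never wins, and the function name is mine.

```python
def expected_win_pct(score_probs, win_pct_by_runs):
    """Weight the win percentage at each score by how often that score occurs."""
    return sum(prob * win_pct_by_runs[runs] for runs, prob in score_probs.items())

# Win percentage by runs scored. Only three entries are needed here:
# .607 at five runs is quoted in the text; 0 at zero runs is certain;
# .966 at ten runs is back-solved from the quoted .483 (an inference).
win_pct_by_runs = {0: 0.000, 5: 0.607, 10: 0.966}

steady = {5: 1.0}               # five runs every game
boom_bust = {0: 0.5, 10: 0.5}   # ten runs every other game, zero otherwise

print(f"steady:    {expected_win_pct(steady, win_pct_by_runs):.3f}")
print(f"boom/bust: {expected_win_pct(boom_bust, win_pct_by_runs):.3f}")
```

Both teams average five runs a game; only the way those runs are spread across games differs.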
(There's a lot of mathematical jargon coming up that, if you ever want a date again, you will either deny reading or deny understanding.)
I have wondered for a while what type of distribution describes run scoring. We can't use a symmetric distribution like a Gaussian (bell curve), since there is a lower limit on runs scored (zero) but no upper limit. Still, the distribution has to decay to zero since we know that the probability that a team scores an infinite number of runs must be zero. A Rayleigh distribution might make sense if a team is never shut out. But I've seen Royals games, and I know that a team can score zero runs.
What kind of distribution is it? Professor Steven Miller of Brown University has the answer. I'll skip most of the mathematics, which is as thrilling as a Rockies-Reds game in September. The key is that Professor Miller has shown that a three-parameter Weibull distribution describes the run distribution of teams quite well. (A HUGE thanks to andeux of Athletics Nation, who uncovered this nugget and steered me toward it.) The three-parameter Weibull distribution is:

$$f(x;\alpha,\beta,\gamma) = \begin{cases} \dfrac{\gamma}{\alpha}\left(\dfrac{x-\beta}{\alpha}\right)^{\gamma-1} e^{-\left((x-\beta)/\alpha\right)^{\gamma}} & x \ge \beta \\ 0 & \text{otherwise} \end{cases}$$
In English, the frequency f with which a team scores (or allows) x runs is equal to a long messy equation with three parameters, α, β, and γ. The real magic of the Weibull distribution is that it can be used to derive the Pythagorean theorem - and the exponent is the same in both the Weibull and in the Pythagorean theorem. (Think about that for a minute: the Weibull distribution models run distribution well and results in the Pythagorean theorem, baseball version. Now that's cool!) Furthermore, Professor Miller shows that there is no reliable statistical correlation between runs scored and runs allowed per game, an assertion I made earlier without proof.
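If you'd rather see that than take it on faith, here is a rough simulation sketch. It assumes runs scored and runs allowed are independent Weibull draws sharing the same β and γ, treats runs as continuous rather than whole numbers, and uses scale parameters I made up for illustration. The closed form it checks against is the Pythagorean formula written in terms of the two scale parameters.

```python
import random

def weibull_run_total(alpha, beta, gamma, rng):
    """Draw one continuous 'run total' from a three-parameter Weibull."""
    # random.weibullvariate takes (scale, shape); adding beta shifts it.
    return beta + rng.weibullvariate(alpha, gamma)

def simulated_win_pct(alpha_scored, alpha_allowed, beta, gamma, games, seed=0):
    """Fraction of simulated games in which runs scored exceed runs allowed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        scored = weibull_run_total(alpha_scored, beta, gamma, rng)
        allowed = weibull_run_total(alpha_allowed, beta, gamma, rng)
        wins += scored > allowed
    return wins / games

def pythagorean_from_scales(alpha_scored, alpha_allowed, gamma):
    """Closed form implied by two independent Weibulls sharing beta and gamma."""
    return alpha_scored**gamma / (alpha_scored**gamma + alpha_allowed**gamma)

# Hypothetical scale parameters, for illustration only.
alpha_scored, alpha_allowed, beta, gamma = 5.8, 5.2, -0.5, 1.8

simulated = simulated_win_pct(alpha_scored, alpha_allowed, beta, gamma, games=200_000)
closed_form = pythagorean_from_scales(alpha_scored, alpha_allowed, gamma)
print(f"simulated: {simulated:.3f}   closed form: {closed_form:.3f}")
```

The two numbers should agree to within simulation noise. The shift β cancels out of the comparison because it is applied to both sides.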
What do we do with all this mathematics now? Well, Marc has asked me to write about the Athletics and the Angels, so we'll look at their run distributions and compare them to a Weibull distribution. For our three parameters, we'll use:
* γ = 1.8. This is the generally accepted value of the Pythagorean exponent in today's run-scoring environment.
* β = -0.5. This is a mathematical trick that Professor Miller's paper discusses in some detail. I won't rehash it here, but you can check the original paper if you are interested.
* α is computed so that the observed average is equal to the Weibull-determined average. By taking the first moment of the Weibull distribution, the average μ can be computed as

$$\mu = \alpha\,\Gamma\!\left(1 + \frac{1}{\gamma}\right) + \beta$$

where Γ is the well-known gamma function. Thus

$$\alpha = \frac{\mu - \beta}{\Gamma\left(1 + 1/\gamma\right)}$$

and it is calculated separately for both runs scored and runs allowed. This is not as robust as minimizing the mean-square error, but it sure is quicker.
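The recipe above can be sketched in a few lines. This is my own illustration, not Professor Miller's code: the 4.7 runs-per-game average is hypothetical, and I am assuming the β = -0.5 shift is meant to center each bin on a whole number of runs, so that the chance of scoring exactly k runs is the Weibull mass between k - 0.5 and k + 0.5.

```python
import math

def weibull_alpha(mean_runs, beta=-0.5, gamma=1.8):
    """Scale parameter that makes the Weibull mean equal the observed mean."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)

def weibull_cdf(x, alpha, beta, gamma):
    """Cumulative distribution of the three-parameter Weibull."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))

def prob_of_runs(k, alpha, beta=-0.5, gamma=1.8):
    """Probability of exactly k runs, taken as the mass on [k - 0.5, k + 0.5)."""
    upper = weibull_cdf(k + 0.5, alpha, beta, gamma)
    lower = weibull_cdf(k - 0.5, alpha, beta, gamma)
    return upper - lower

# Hypothetical average of 4.7 runs per game, for illustration only.
mean_runs = 4.7
alpha = weibull_alpha(mean_runs)

for k in range(11):
    print(f"{k:2d} runs: {prob_of_runs(k, alpha):.3f}")
```

Run it once with a team's average runs scored and once with its average runs allowed to get the two model curves.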
(This is more or less the end of the heavy-hitting mathematics. Please start paying attention again, and I'll keep the jargon to a minimum.)
The plots show the actual distribution of runs scored or allowed (circles) and the three-parameter Weibull distribution (lines) using the parameter values described above. If a circle lies below the line, then the event occurs less frequently than the Weibull model predicts. If the circle lies above the line, then the event occurs more frequently than the model predicts. (The shape of the Weibull curve will look familiar if you have read articles about my lineup simulator.)
Keep in mind that if a team matches the Weibull distribution exactly, it will also match its Pythagorean projection since the latter is derived from the former. Of course, neither team exactly matched the distribution. Where were the differences?
The Angels scored runs in line with the Weibull distribution, with the notable exception that they scored three runs in a game more often than expected. On the other hand, their run prevention unit did not give up two, three, four, or five runs with nearly the frequency expected. These are the most important runs in a game, and the Angels' ability to suppress these runs was a key in their outperforming their Pythagorean projection. The fact that the Angels matched or exceeded the model in this region on the run production side only helps matters. Indeed, when scoring fewer than five runs, the Angels were a respectable 32-55. An average team would win only a quarter of such games. The offense was just good enough to get enough runs to win, and the Angels' stellar pitching did the rest.
The A's, on the other hand, were a miserable 19-64 in games in which they scored fewer than five runs, for a .229 winning percentage. Not only is that winning percentage worse than the average AL team, the fact that they could not manage at least five runs in half of their games is also alarming. Their runs scored distribution has a blip similar to the Angels' runs allowed distribution in the 3-4 run range. The same blip that helped the Angels win a couple of extra games probably cost the A's a few games. And yet, at the end of the season, Oakland had scored an above-average number of runs. How? Check out the tail end of the Oakland runs scored distribution. They had a number of games where they racked up a ton of runs. Unfortunately for the A's, runs have a diminishing marginal return: scoring an extra run when you already have ten is not nearly as valuable as doing so when you've got only four. Coupled with a runs allowed distribution that does not compensate, and the A's underperformed their Pythagorean projection.
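For anyone checking the arithmetic on those low-scoring games, here is a quick sketch using the two records quoted above. The helper name is mine, and the 162-game schedule is assumed.

```python
def win_pct(wins, losses):
    """Winning percentage from a win-loss record."""
    return wins / (wins + losses)

# Records in games where the team scored fewer than five runs, as quoted in the text.
records = {"Angels": (32, 55), "Athletics": (19, 64)}
season_games = 162  # assumed full schedule

for team, (wins, losses) in records.items():
    games = wins + losses
    share = games / season_games
    print(f"{team}: {wins}-{losses}, {win_pct(wins, losses):.3f} in {games} games "
          f"({share:.0%} of the schedule)")
```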
The A's won in blowouts while the Angels usually won tight games. The average margin of victory for the A's was almost 4 runs last year, compared with only 3.3 for the Angels. I've included a table which shows the winning percentage when a team scores a particular number of runs. The A's were world-beaters when they managed to score at least 5 runs, but absolutely putrid otherwise; the Angels had respectable winning percentages even when they couldn't score very many runs.
In the end, the difference between the two teams in 2005 was that the Angels had a more optimal distribution of runs scored and runs allowed than the A's did. The peculiar run distributions of the Anaheim run prevention unit and Oakland's run producing unit are the culprits here, and given the similarities in their aggregate runs scored and runs allowed, the run distribution almost wholly accounts for the gap in the final standings. I don't think I can put it any more plainly than that.
Stay Tuned for Part Two, tomorrow