No matter what anybody tells you, baseball is a game that is based in numbers. The only thing that separates the performance analysis community from mainstream analysis is which numbers we look at. If you're James Click, you're poring over VORP reports and WXRL spreadsheets. If you're the average fan, you're looking at AVG and HR. And if you're John Kruk, you're still paying attention to pitcher wins and losses (has Randy Johnson gotten to 30 wins yet?).
The only problem baseball has with numbers is that there are too many of them. In order to get a better handle on things, analysts often talk in terms of gross averages; sometimes it is the only way to illustrate a point without burying oneself in information.
And yet, as baseball fans, we often talk about consistency and balance, as if we recognize, even if dimly, that there is some sort of value in predictability. This manifests itself in a lot of ways, but there's one idea in particular that I am going to investigate today.
Let me ask you a question: would you rather have an offense of one premier offensive player and a bunch of average hitters, or a deep offense of above-average hitters?
I'll be more specific. You can choose between these two lineups:
Slot   STAR AND SCRUBS       EVEN STEVENS
1      Scrub 270/333/433     Steven 277/340/452
2      Scrub 270/333/433     Steven 277/340/452
3      Star  333/400/600     Steven 277/340/452
4      Scrub 270/333/433     Steven 277/340/452
5      Scrub 270/333/433     Steven 277/340/452
6      Scrub 270/333/433     Steven 277/340/452
7      Scrub 270/333/433     Steven 277/340/452
8      Scrub 270/333/433     Steven 277/340/452
9      Scrub 270/333/433     Steven 277/340/452
Team averages (both lineups): 277/340/452
(Stats are AVG/OBP/SLG)
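The team averages line is easy to check by hand. Here is a quick sketch that does the arithmetic, assuming every lineup slot gets the same number of plate appearances (a simplification, since the top of the order bats more often). The names scrub, star, and team_line are mine, chosen for illustration.

```python
# AVG/OBP/SLG in thousandths, exactly as written in the table
scrub = (270, 333, 433)
star = (333, 400, 600)

# Star and Scrubs: eight scrubs and one star
lineup = [scrub] * 8 + [star]

def team_line(hitters):
    """Simple unweighted mean of each stat, rounded to the nearest point."""
    n = len(hitters)
    return tuple(round(sum(h[i] for h in hitters) / n) for i in range(3))

print(team_line(lineup))  # (277, 340, 452), the Even Stevens line
```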
Star and Scrubs has Vladimir Guerrero or Manny Ramirez batting third in a lineup filled with otherwise league-average hitters (league average defined as 2004 AL). Even Stevens hitters are all exactly the same - no superstars, but all solid. The mean AVG, OBP, and SLG are the same for each team. Which team will score more runs? Which team will win more games?
Which offense would you want?
To answer this question, I've employed my lineup simulator, described in this article (by the way, my program needs a cool name, so please leave suggestions in the comments). In a nutshell, it is a decent predictor of offensive output that does not take into account secondary effects such as baserunning, opposing pitcher, matchups, and "clutchness." I'm not totally sure, but based on what I've seen, my program tends to underestimate offense.
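For readers who want to play along at home, here is a minimal sketch of what a lineup simulator of this general kind might look like. To be clear, this is not my program; it is a stripped-down stand-in built on assumptions I am making up for the example: no hit-by-pitch, sacrifices, or triples, a hypothetical fixed ratio of two doubles per home run, every runner advancing exactly as many bases as the batter, and no steals, double plays, or errors. The function names (event_probs, simulate_game, mean_runs) are likewise invented here.

```python
import random

def event_probs(avg, obp, slg, doubles_per_hr=2.0):
    """Turn AVG/OBP/SLG into per-plate-appearance event probabilities.

    Assumptions (not from any real data): no HBP, sacrifices, or triples,
    and a hypothetical fixed ratio of doubles to home runs.
    """
    walk = (obp - avg) / (1.0 - avg)      # walks per plate appearance
    hit = obp - walk                      # hits per plate appearance
    extra = slg * (1.0 - walk) - hit      # extra bases per plate appearance
    hr = extra / (doubles_per_hr + 3.0)   # a double adds 1 extra base, a HR adds 3
    double = doubles_per_hr * hr
    single = hit - double - hr
    return {"out": 1.0 - obp, "walk": walk, "single": single,
            "double": double, "hr": hr}

def simulate_game(lineup, rng, innings=9):
    """Play one game for a lineup of event-probability dicts; return runs."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]   # first, second, third
        while outs < 3:
            probs = lineup[batter % len(lineup)]
            batter += 1
            event = rng.choices(list(probs), weights=list(probs.values()))[0]
            if event == "out":
                outs += 1
            elif event == "walk":
                # Only forced runners move up
                if all(bases):
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = True
                elif bases[0]:
                    bases[1] = True
                bases[0] = True
            else:
                n = {"single": 1, "double": 2, "hr": 4}[event]
                # Every runner (and the batter, at position 0) advances n bases
                runners = [i + 1 for i, on in enumerate(bases) if on] + [0]
                bases = [False, False, False]
                for pos in runners:
                    if pos + n >= 4:
                        runs += 1
                    else:
                        bases[pos + n - 1] = True
    return runs

def mean_runs(lineup_stats, games=2000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    lineup = [event_probs(*stats) for stats in lineup_stats]
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

scrub, star, steven = (0.270, 0.333, 0.433), (0.333, 0.400, 0.600), (0.277, 0.340, 0.452)
star_and_scrubs = [scrub, scrub, star] + [scrub] * 6
even_stevens = [steven] * 9
print(mean_runs(star_and_scrubs), mean_runs(even_stevens))
```

Because the baserunning and hit-type assumptions are so crude, the numbers this sketch prints should not be expected to match the ones in this article. It is only meant to show the shape of the approach: turn each hitter's line into event probabilities, then roll the dice one plate appearance at a time.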
I used the program to simulate 50,000 games and distilled the results into several forms. Let's see what my computer spit out:
STAR AND SCRUBS: 5.043 runs per game average
EVEN STEVENS: 5.042 runs per game average
For reference, a team of all scrubs scored an average of 4.713 runs per game
Hmmmm...not surprising. Two teams with exactly the same AVG, OBP, and SLG will, on average, score the same number of runs. I can hear Marc Normandin now: "You said you would go beyond averages. You're off the team!" Don't fire me just yet, Marc. Let's look at the run scoring distribution for these two teams:
(I was naughty and forgot to label the axes. The x-axis is the number of runs scored in a game and the y-axis is the frequency of that event.)
You can see that the Star and Scrubs team has almost the exact same run distribution as the Even Stevens team. It would appear that having an elite offensive player does not change the distribution of run-scoring in games. This surprises me, as I would have guessed that a premier offensive player having a good game can help you pile up runs very quickly and avoid shutouts. But the balanced attack is just as effective at putting up a crooked number or avoiding a shutout. The Even Stevens are as consistent as the Star and Scrubs: both score the same number of runs on average, and both distribute their run scoring in the same way.
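If you'd rather put a number on "almost the exact same" than squint at a chart, one option is to turn each team's list of game scores into a frequency table and measure how far apart the two tables are. The sketch below uses total variation distance, which is my choice for this example and not something my program reports. The function names and the two tiny score lists are made up purely to show the mechanics.

```python
from collections import Counter

def run_distribution(game_runs):
    """Map each runs-scored total to the share of games with that total."""
    counts = Counter(game_runs)
    total = len(game_runs)
    return {runs: c / total for runs, c in sorted(counts.items())}

def total_variation(dist_a, dist_b):
    """Half the summed absolute gap: 0 means identical, 1 means no overlap."""
    keys = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)

# Made-up toy scores (not simulator output): same mean, different spread
streaky = run_distribution([0, 3, 3, 6])
steady = run_distribution([2, 3, 3, 4])

print(streaky.get(0, 0.0), steady.get(0, 0.0))  # shutout rates: 0.25 0.0
print(total_variation(streaky, steady))         # 0.5
```

Two teams whose distributions really do lie on top of each other would come out near zero on this measure.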
What does it all mean? For one thing, it is important to remember that a team's run scoring distribution can affect its winning percentage. In the 2004 AL:
Runs Scored Winning Pct
That is, a team that scored exactly seven runs in a game won almost 74% of the time in the 2004 AL, averaged over all pitching staffs. Two teams can score the same number of runs on average, but have different records based on their particular run distributions. When I originally started this study, I figured that one type of team would have a different run distribution than the other and that it would result in a different winning percentage. That appears not to be the case.
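The bookkeeping behind that claim is a weighted average: multiply the share of games in which a team scores each run total by the winning percentage for that total, then add it all up. Here is a small sketch of the idea. The winning percentages in it are toy values I invented so the arithmetic is easy to follow; they are not the 2004 AL figures, and the function name is hypothetical.

```python
def expected_win_pct(run_dist, win_pct_by_runs):
    """Sum of P(score exactly r runs) times P(win | scored r runs)."""
    return sum(share * win_pct_by_runs[runs] for runs, share in run_dist.items())

# Toy winning percentages by runs scored (invented, NOT real league data)
toy_win_pct = {0: 0.0, 1: 0.6, 2: 0.8}

# Two toy teams that both average exactly one run per game
always_one = {1: 1.0}
feast_or_famine = {0: 0.5, 2: 0.5}

print(expected_win_pct(always_one, toy_win_pct))       # 0.6
print(expected_win_pct(feast_or_famine, toy_win_pct))  # 0.4
```

The two toy teams score the same on average but win at different rates, because each extra run is worth less than the one before it in this made-up table. That is the kind of effect I was looking for between the two lineups.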
Talking heads will often tell you that an offense needs to be deep in order to be consistent. I used to agree - now I am not so sure that I do. Maybe, just maybe, using gross averages to describe team offense does not hide as much information as we thought.
(I am in no way claiming that run distributions are not important - I have a lot of upcoming work on the topic. I am saying that lineup construction may not have a huge effect on run distribution.)
It also makes me wonder whether I've been too hard on the Anaheim Angels. I've said several times that having only one offensive threat - Vladimir Guerrero - would hurt their offense. And yet for all my tooth-gnashing, they score nearly as much as a team without an offensive standout, such as the Oakland Athletics. (I apologize for using the Angels and A's as examples so often - it's just that I'm a fan of one of the teams, and for two teams that have such disparate philosophies, they sure have similar records.)
From a risk-management point of view, the deeper lineup is the better way to construct an offense: if you depend on one star and he gets injured, you're out of luck. Salaries also seem to increase non-linearly with offensive output, and the money saved by getting out from under a Manny Ramirez contract and replacing him with a couple of Grady Sizemores and Coco Crisps may allow you to upgrade that pesky middle relief or the back end of the rotation. I don't think Theo Epstein is nuts to attempt to dump Manny Ramirez, despite Manny's cartoonish offensive stats. In fact, if the Red Sox ever did dump Manny's contract, I think they could be quite dangerous with that free money (if spent correctly).
Expect more from me on run distributions, especially as the season ends and league data is made available through Retrosheet.