Our supreme overlord and all-around good guy Blez recently asked the BtB All-Stars:

The answer to Blez's question, of course, lies in the Pythagorean theorem and the length of the season. Most work that has investigated the distribution of wins around the Pythagorean expectation is based on empirical data. In a sense, that is how things ought to be, since baseball is an empirical game.

But sometimes it is fun - and informative - to skip past all of the empiricism and do some thought experiments. Blez's question immediately brought to mind a couple of thought experiments that I thought we could do together.

Question the First: How do we know that the Pythagorean theorem is right?

I've talked a lot recently about run distributions, so you should know by now that the distribution that describes the way teams score and allow runs is called the Weibull distribution. Using some mathematical gymnastics described in Professor Miller's paper, we can show that, on average, a team that scores an average of RS runs and allows an average of RA runs will have a winning percentage described by the now famous Pythagorean theorem:
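For reference, here is a minimal sketch of the familiar form of the formula, winning percentage = RS^x / (RS^x + RA^x). The function name and the default exponent of 2 (the classic Bill James value) are my own assumptions for illustration; in Professor Miller's derivation the exponent comes from the shape of the Weibull curve rather than being fixed at 2.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from average runs scored and allowed.

    exponent=2.0 is the classic Bill James value (an assumption here);
    pass a different exponent to try other versions of the formula.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team that scores and allows the same number of runs projects to .500.
print(pythagorean_win_pct(4.0, 4.0))
# A team that outscores its opponents 5 to 4 projects to better than .500.
print(pythagorean_win_pct(5.0, 4.0))
```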

Question the Second: But teams don't always have records that line up with their Pythagorean winning percentage. For a team that allows and scores runs as described by the Weibull distribution, how likely is it that they will actually end up with their Pythagorean record?

Let's do our first thought experiment. Imagine a team that allows and scores, on average, four runs per game. The distribution with which they allow or score these runs is described by the Weibull curve (although, as it turns out, that won't matter). I reach into the runs allowed distribution and pick out a number: I get four. Now I reach into the runs scored distribution and pick out another number: I get five. This team has just "won" the first game of our thought experiment, 5-4. We can repeat this a million times, and since the team, on average, scores 4 runs and, on average, allows 4 runs, they will end up winning 50% of their games as predicted by Ancient Greek and noted baseball fan Pythagoras.
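If you would rather let a computer do the reaching, the experiment can be sketched in a few lines. The shape parameter of 1.8, the seed, and the function name are assumptions I picked for illustration, not fitted values; the scale is chosen so that the average comes out to four runs.

```python
import math
import random


def simulate_win_pct(n_games, mean_runs=4.0, shape=1.8, seed=0):
    """Play n_games by drawing runs scored and allowed from the same Weibull.

    shape=1.8 is a hypothetical value chosen only for illustration.
    """
    rng = random.Random(seed)
    # The mean of a Weibull is scale * gamma(1 + 1/shape), so solve for scale.
    scale = mean_runs / math.gamma(1.0 + 1.0 / shape)
    wins = 0
    for _ in range(n_games):
        scored = rng.weibullvariate(scale, shape)
        allowed = rng.weibullvariate(scale, shape)
        if scored > allowed:
            wins += 1
    return wins / n_games


# Short "seasons" wander away from .500; long ones settle in close to it.
for n in (162, 1000, 10000, 100000):
    print(n, simulate_win_pct(n))
```

Because the draws are continuous, ties essentially never happen, so every game has a winner.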

But what if we repeat this only 10,000 times? Or 1000? Or 162? Teams won't always meet their Pythagorean fate, but most come close. It's easy to imagine that there is a distribution around a Pythagorean record, and that this distribution has a certain width. Those with a statistics background will recognize that the distribution around the Pythagorean projection will be normal (shaped like a bell-curve) as a consequence of the Central Limit Theorem. The width of this curve will be defined by the standard deviation, which basic statistics tells us will be 1/[2 x sqrt(N)], where N is the number of games.

[Aside: Yes, we are assuming that runs scored and runs allowed are independent, which is not *strictly* the case, but is true enough for our experiment to illustrate a point.]

Let's look at a graph.

The peak at x = 0.500 shows that the most probable situation is that, after 81 games, a team will have a .500 record. But the shaded area (the area under the curve, or the integral) shows that there is an 11.1% probability that a team whose "true" ability lies at .500 can go 46-35 (or better) after 81 games, on pace for 92 (or more) wins and a possible playoff berth. That's not insignificant. On the flip side, there is an 11.1% chance that this team will stumble to a 35-46 (or worse) record after half a season and the GM will start trading off expensive veterans.

After 162 games, however, there is only a 4.2% chance that a "true" .500 team will pull off a 92-win season or greater (once again, the shaded area).

Notice how the peak is sharper and the shaded area accounts for less of the overall area under the curve. As the peak becomes sharper and sharper, the shaded area will account for fewer and fewer of the possible outcomes.
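The shaded areas can be checked with a short sketch, assuming (as above) a "true" .500 team, a bell curve, and a standard deviation of 1/[2 x sqrt(N)]. The function names are mine, and the tail area uses the complementary error function from Python's standard library.

```python
import math


def win_pct_sd(n_games):
    """Standard deviation of winning percentage for a .500 team."""
    return 1.0 / (2.0 * math.sqrt(n_games))


def prob_pace_or_better(pace_wins, n_games, full_season=162):
    """Chance a true .500 team is on a pace_wins pace (or better) after n_games."""
    target = pace_wins / full_season
    z = (target - 0.5) / win_pct_sd(n_games)
    # Upper tail of the normal curve: P(Z >= z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for n in (81, 162):
    print(n, "games:", round(100 * prob_pace_or_better(92, n), 1), "%")
```

Under these assumptions the two results should land close to the 11.1% and 4.2% figures quoted above.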

To begin to answer Blez's question, the reason people say that the cream rises to the top is that, in a sense, it does over the long haul. In the year 2020, when Bud Selig's sentient toupee has extended the regular season to 1,000 mega-games per hyper-season, we will find that there is only a .00086% chance that a "true" .500 team plays the equivalent of 92-win ball (568 uber-wins in the future). The longer the season, the narrower the distributions get, and, as I said earlier, the standard deviation of winning percentage is always 1/[2 x sqrt(N)]. That's why baseball has a long regular season - to reduce the chance that a truly mediocre team can come out looking like a playoff contender.

Question the Third: How well does a 162-game season do in separating the good teams from the bad teams? The excellent teams from the good teams?

Let's do our second thought experiment. Imagine a good team (a 90 Pythagorean wins team) and a terrible team (a 90 Pythagorean loss team). Neglect, if you will, things like in-season injuries, trades, and desperate Jose Lima signings in the middle of the year. We'll have to neglect the fact that one team's wins are not independent from another team's wins since they actually play each other.

What is the likelihood that the cream does *not* rise to the top - that is, how likely is it for the terrible team to finish with a better record than the good team? If the two teams' winning percentages are independent of each other, then we can compute the probability that the good team will finish with a worse record than the terrible team over 162 games.

For example, we can compute the probability that the good team will have a winning percentage of exactly .500 (1.3%) and that the poor team will have a winning percentage of exactly .510 (1.1%). The probability that both of these events occur simultaneously is the product of these two probabilities (1.1% x 1.3% = .014%). We can do this for all the different situations in which the bad team finishes with a better record than the good one - all infinity of them - and add them up. The end result is the (nerd alert!) integral:

where f1(x) is the probability distribution that the good team has winning percentage x and f2(y) is the probability distribution that the terrible team has winning percentage y. (For those of you who aren't familiar with integrals, well, good luck in your love life. Us Sabermetric Spocks get unbelievable booty.)

Using this equation we can compute the probability that the terrible team has an equal or better record after 162 games as 2.3%. Now let's take Blez's hypotheticals: what if the season were only 100 games long? The probability that the terrible team has an equal or better record goes to 5.6%. And 200 games? 1.3%. Once again, we can see the value in playing a long season.
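For anyone who wants to add up "all infinity of them" by brute force, here is a rough numerical sketch. It assumes both teams' winning percentages follow independent bell curves with standard deviation 1/[2 x sqrt(N)], and it sums f1(x) times the chance that the terrible team finishes at x or better over a fine grid of x values. The function names and the step count are my own choices, not anything from the original calculation.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the bell curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_tail(x, mean, sd):
    """P(value >= x) for a bell curve with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def upset_probability(good_pct, bad_pct, n_games, steps=4000):
    """Chance the bad team finishes with an equal or better record."""
    sd = 1.0 / (2.0 * math.sqrt(n_games))
    lo, hi = good_pct - 8.0 * sd, good_pct + 8.0 * sd
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of each slice
        # Good team lands at x, bad team lands at x or higher.
        total += normal_pdf(x, good_pct, sd) * normal_tail(x, bad_pct, sd) * dx
    return total


good, bad = 90 / 162, 72 / 162
for n in (162, 200):
    print(n, "games:", round(100 * upset_probability(good, bad, n), 1), "%")
```

Under these assumptions the 162-game and 200-game results should come out near the 2.3% and 1.3% quoted above.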

Another thought experiment - imagine a very good team, one that plays at a Pythagorean level of 95 wins, in a pennant race with another very good team that plays at a Pythagorean level of 98 wins. Over a 162-game season, the better team only finishes with a better record 63% of the time. That is a huge chunk of hypothetical seasons - more than 1 in 3 - in which the better team does not finish with the better record, by mere chance alone! We can plot the probability that the 95-win team finishes with an equal or better record than the 98-win team as a function of season length.

Yes, even after 1000 games, the "true 95 win team" still stands better than a 1 in 5 chance of finishing with a better record. In this case, to be 99% positive that the better team finishes with a better record, you would have to play 8100 games.
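The season-length curve can be sketched without any numerical integration, because the difference between two independent bell curves is itself a bell curve. This assumes each team's winning percentage has a standard deviation of 1/[2 x sqrt(N)], so the difference has a standard deviation of sqrt(2) times that. The function name and the `basis` parameter are mine.

```python
import math


def prob_lesser_team_ties_or_wins(better_wins, lesser_wins, n_games, basis=162):
    """Chance the lesser team finishes with an equal or better record.

    better_wins and lesser_wins are Pythagorean wins per `basis` games.
    """
    gap = (better_wins - lesser_wins) / basis
    # Standard deviation of the difference of two independent winning percentages.
    sd_diff = math.sqrt(2.0) / (2.0 * math.sqrt(n_games))
    # P(difference <= 0) for a bell curve centered at `gap`.
    return 0.5 * math.erfc(gap / (sd_diff * math.sqrt(2.0)))


for n in (162, 500, 1000, 2000):
    p = prob_lesser_team_ties_or_wins(98, 95, n)
    print(n, "games:", round(100 * p, 1), "%")
```

For the 98-win and 95-win teams, this should give roughly 37% at 162 games and a bit over 20% at 1000 games, in line with the figures in the text.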

The answer to Blez's question is - and I know this will irritate a lot of you - "Because of the sample size." By using some binomial statistics, we can show that the best team, Pythagorically speaking, does not always finish with the best record, and that the probability that they do not is non-trivial. We know that some teams escape their Pythagorean fate, and we have some evidence that leveraging relievers is one way to outperform your Pythagorean projection. Combined with random happenstance, it's not unusual at all that a good team can slip into the playoffs while a slightly better team stays home in October. The more games teams play, the less frequently this will occur.

(side note: The Indians had 96 Pythagorean wins, and the White Sox had 91. In about 30% of 162-game seasons, a 91 Pythagorean Win team will come out ahead of a 96 Pythagorean Win team. Of course, the White Sox were the ones hoisting the trophy and flashing the bling last week, and I don't think they give whits about Pythagoras.)

Question the Last: You smell bad.

Yeah, but I can integrate like a motherf---er.