Please welcome Chris St. John, our newest contributor at Beyond the Box Score. He will generally be posting about statistical oddities and other sabermetric wonders. -ed.
Legendary Dodgers broadcaster Vin Scully likes to repeat a quote from a well-known former Major League manager, "Give me 50 games and I'll know what kind of team I have." I don't remember who said it, or what the exact quote is, but that's the gist of it. Just for reference, 50 games into the MLB season usually lands around the end of May.
I wanted to test this out and see how quickly we know how good a team actually is, so I did what any regular baseball fan would do: I went to coolstandings.com and grabbed the record at the end of each month for every team since 1998, when the Tampa Bay Devil Rays and Arizona Diamondbacks were added to the major leagues. Then, I looked at the end of month winning percentage and compared it to the end of season win total, using a linear regression. I also split each month up into bins of team winning percentage, where each bin contains about 65 teams.
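For anyone who wants to try the same kind of comparison at home, here is a minimal sketch of the regression step in plain Python. It is not the exact code behind this post: the function names `fit_line` and `r_squared` are my own, and the sample numbers are invented for illustration rather than taken from any real team.

```python
# Sketch: regress end-of-season wins on end-of-month winning percentage.
# All sample numbers below are invented for illustration, not real records.

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical end-of-May winning percentages and final win totals.
may_wp = [0.380, 0.450, 0.500, 0.540, 0.610]
season_wins = [70, 85, 78, 88, 92]

slope, intercept = fit_line(may_wp, season_wins)
r2 = r_squared(may_wp, season_wins)
print(f"wins ~ {intercept:.1f} + {slope:.1f} * wp, r^2 = {r2:.2f}")
```

Run this once per month with the real end-of-month records and you get one r^2 value per month, which is the number that tells you how much of the final win total the standings already explain.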
Teams are all over the place at the end of April. Sure, the 43-119 2003 Tigers looked terrible, but so did the 102-60 2001 Oakland Athletics (end-of-April winning percentage of .320). There is a definite pattern, but not enough to say who is good or bad with any certainty.
Using Pizza Cutter's "magic number" of stability of r^2 = .5 from this classic article on good sample sizes for statistics, winning percentage is "stable" by the end of May. However, there is still quite a bit of overlap between terrible teams and great teams. The 74-win 2005 Orioles had an end of May winning percentage of .608, while the 89-win 2005 Astros had an end of May winning percentage of .373. Still doesn't look good enough.
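The "stable by the end of May" check boils down to walking through the months in order and stopping at the first one whose r^2 reaches .5. A rough sketch, assuming you already have one r^2 per month; the helper name `first_stable_month` is hypothetical and the r^2 values are placeholders, not the values measured for this post:

```python
# Sketch: find the first month whose r^2 meets the stability threshold.
# The r^2 values here are placeholders, NOT the measured values.

def first_stable_month(r2_by_month, threshold=0.5):
    """Return the first month (in the order given) with r^2 >= threshold."""
    for month, r2 in r2_by_month:
        if r2 >= threshold:
            return month
    return None  # never reached the threshold


placeholder_r2 = [("April", 0.30), ("May", 0.55), ("June", 0.70)]
print(first_stable_month(placeholder_r2))
```

Keep in mind that clearing the threshold only says the relationship is reasonably strong across all teams. It says nothing about how far off an individual team can still be.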
June is where teams really begin to separate from each other. The really good and really bad teams are obvious for the most part, but the middle-of-the-pack teams are still pretty blurry. If a team is still winning at a .600 clip by the end of June, they're a good bet to finish the season with at least a respectable record. However, there are still some 80-win teams left there, so be careful.
July looks pretty similar to June with a little more separation. The 95+ win teams are all looking really good and all of the .570+ teams should finish with at least a winning record.
By the end of August, the standings are fairly well set. There may still be some jostling in the middle of each division, but the top and bottom teams are pretty obvious.
Once the end of September rolls around, we know with almost perfect certainty how many total wins a team will have. Obviously this is true, since the season is basically over. The only thing left to determine may be a particularly close playoff race.
One final thing I looked at was the average winning percentage by month, grouped by end of season wins. The win totals were chosen to select a similar total of teams in each bin.
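Here is one way the grouping could be sketched in Python. This is an assumption about the mechanics rather than the procedure actually used: it sorts teams by final wins and cuts the list into equally sized chunks, so teams with the same win total can land in different bins, whereas the bins in this post were drawn at win totals. The names `equal_size_bins` and `average_wp_by_month` and the team records are made up for the example.

```python
# Sketch: group teams into similar-sized bins by season wins, then
# average each bin's end-of-month winning percentage.
# The team records below are invented for illustration.

def equal_size_bins(teams, n_bins):
    """Sort by season wins and split into n_bins groups of near-equal size."""
    ordered = sorted(teams, key=lambda t: t["wins"])
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional team each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def average_wp_by_month(bin_teams, months):
    """Mean end-of-month winning percentage for one bin of teams."""
    return {
        m: sum(t["wp"][m] for t in bin_teams) / len(bin_teams)
        for m in months
    }


months = ["April", "May"]
teams = [
    {"wins": 65, "wp": {"April": 0.400, "May": 0.410}},
    {"wins": 72, "wp": {"April": 0.450, "May": 0.440}},
    {"wins": 81, "wp": {"April": 0.520, "May": 0.500}},
    {"wins": 88, "wp": {"April": 0.480, "May": 0.530}},
    {"wins": 95, "wp": {"April": 0.600, "May": 0.590}},
    {"wins": 101, "wp": {"April": 0.640, "May": 0.630}},
]

for group in equal_size_bins(teams, 3):
    low, high = group[0]["wins"], group[-1]["wins"]
    print(f"{low}-{high} wins:", average_wp_by_month(group, months))
```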
There are some interesting trends to spot here. First, on average, the great teams (94+ wins) have great winning percentages throughout the year, but they actually get better as the season progresses. The average end of month winning percentage increases from .594 in June to .605 in September. This may be due to trading for a great player in June or July and also feasting on lesser competition once September call-ups roll around.
A barely noteworthy trend: terrible teams are terrible all year. They are terrible to begin the season, get a little less terrible in the middle of the season and then get more terrible at the end of the season, but not quite as terrible as they were at the beginning. The drop off at the end of the season is also most likely because of September call-ups.
The most intriguing bins to compare are the 88-93 and 82-87 bins. This is the line between "contender" and "pretender." You could argue the line is between 89 and 90, but the 88 and 90 win teams were necessary to keep the bin sizes equal, since there have been 30 of them. There is a steep drop off between 88 and 87, though:
The "pretenders" actually have a little better April, in general. That's probably just a weird quirk. However, as the season progresses, the "contenders" get better every month, while the "pretenders" get worse every month (besides September). So when do the standings matter? At the end of the season, when all the playoff spots are set. But when can we tell who is good and who is bad? It looks like somewhere in August.
The terrible teams and great teams separate in June, but the middle is still muddled. This appears to clear up somewhere in August. So how about that quote? Can a manager tell what kind of team he has within 50 games? Possibly. Can the general public tell how well a team will finish at the end of the season after 50 games by looking at their winning percentage? Forget it.