One of the things I've long planned to include in the power rankings is a strength of schedule adjustment. There's a big difference between playing in the AL East and...well...any other division. Baltimore may not be a good team, but they probably look worse than they are because they play so many games against elite teams like the Yankees, Rays, and Red Sox.
Well, I finally have it going, and thought I'd give a preview of it here before posting the power rankings tomorrow.
First, the methods. You can skip this and click "more" below if you just want to see the results!
The approach is pretty straightforward. First, I calculate the weighted average component winning percentage of each team's opponents. This is basically the strength of schedule adjustment. Face more tough teams, you'll have a higher opponent component winning percentage. We can then use the log5 method (solving for W%(A)) to apply this adjustment to a team's raw component winning percentage and calculate an adjusted component winning percentage. This adjusted component winning percentage should be a better estimate of a team's true performance, because it accounts for the fact that some teams have faced tougher competition than others.
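For anyone who wants to see the "solving for W%(A)" step spelled out, here is a minimal sketch in Python. It assumes the standard log5 formula, W%(A vs. B) = (A - A*B) / (A + B - 2*A*B), and rearranges it to get A from the observed winning percentage and the opponents' winning percentage. The function names are mine, purely for illustration, and this is not the actual spreadsheet behind the rankings.

```python
def log5(a: float, b: float) -> float:
    """Expected W% of a team with true strength `a` against opponents of strength `b`."""
    return (a - a * b) / (a + b - 2 * a * b)


def adjust_for_schedule(raw: float, opp: float) -> float:
    """Invert log5: given the observed W% `raw` against opponents whose
    average W% is `opp`, solve for the team's schedule-neutral W%."""
    return (raw * opp) / (1 - raw - opp + 2 * raw * opp)


# A team playing .500 ball against .550 opponents grades out as a .550 team,
# while the same record against .450 opponents grades out as a .450 team.
tough = adjust_for_schedule(0.500, 0.550)
easy = adjust_for_schedule(0.500, 0.450)
```

Against a perfectly average (.500) schedule the adjustment does nothing, which is a handy sanity check.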
There's one additional wrinkle. As @cwyers pointed out to me on Twitter, it is then possible--and desirable--to use this adjusted component winning percentage to re-calculate strength of schedule adjustments. That way, your strength of schedule measures are based on a better measure of team performance than raw component winning percentages. And, of course, once you get this new strength of schedule adjustment, you would want to generate new adjusted component winning percentages for teams...and you can repeat this cycle indefinitely. I'm finding that after three iterations, you don't get much change, so that's what I'm doing.
...Ok, one last thing. A given team has a say in the performance of its opponents, though this effect on any one opponent should be small in most cases. Nevertheless, because I'm pulling data from baseball-reference team schedule tables, I don't have the ability to account for this game by game. So I opted to "regress" 10% back toward 0.500, reasoning that few teams have accounted for more than 10% of another's games played, and thus shouldn't drive more than 10% of the strength of schedule adjustment. It's an imperfect solution to this problem, but it's the best I can do.
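Putting the pieces together, the whole cycle might look something like the sketch below. Treat it as a rough illustration under my own assumptions: the three teams and their numbers are made up, the function and variable names are hypothetical, and I'm reading the 10% regression as pulling each opponent average 10% of the way toward 0.500 before it gets used.

```python
def invert_log5(raw, opp):
    """Solve log5 for a team's schedule-neutral W% given its raw W% and opponent W%."""
    return (raw * opp) / (1 - raw - opp + 2 * raw * opp)


def schedule_adjust(raw_wpct, games, iterations=3, regress=0.10):
    """Iteratively compute strength of schedule (SoS) and adjusted W%.

    raw_wpct: {team: raw component W%}
    games:    {team: {opponent: games played against that opponent}}
    """
    adjusted = dict(raw_wpct)  # first pass uses raw W% as the estimate of team quality
    sos = {}
    for _ in range(iterations):
        for team, opponents in games.items():
            total = sum(opponents.values())
            # Games-weighted average of the opponents' current adjusted W%
            avg = sum(adjusted[opp] * n for opp, n in opponents.items()) / total
            # Pull the average part of the way back toward .500 (assumed interpretation)
            sos[team] = avg + regress * (0.5 - avg)
        # Always adjust from the raw W%, using the latest SoS estimate
        adjusted = {team: invert_log5(raw_wpct[team], sos[team]) for team in raw_wpct}
    return adjusted, sos


# Made-up three-team league, for illustration only
raw = {"A": 0.600, "B": 0.500, "C": 0.400}
games = {
    "A": {"B": 10, "C": 20},
    "B": {"A": 10, "C": 10},
    "C": {"A": 20, "B": 10},
}
adjusted, sos = schedule_adjust(raw, games)
```

In this toy league, team A beats up on weak team C a lot, so its SoS comes in below .500 and its adjusted winning percentage gets knocked down, while team C gets a bump for all those games against A.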
Make sense? That's the methodology. And now, at long last, here are strength of schedule (SoS) adjustments through 27 July--these are essentially measures of opponent winning percentage, as measured by the methods used in the power rankings:
So the Orioles take the cake as having the toughest schedule in baseball (big surprise!). Other teams with tough schedules, at least thus far, include the Diamondbacks, Mets, Indians, and Marlins--all teams that have arguably underperformed at times this season.
On the other side of the coin are teams with particularly weak schedules. These include the Cubs (no excuses!), Rangers, Reds, A's, Brewers, Cardinals, and Yankees. As you can see, while the pattern is not absolute, a number of the "surprise" teams (Rangers and Reds first and foremost) have had fairly easy schedules thus far. You can also see that the NL Central seems to be a good place to play--four of the six easiest schedules belong to teams from that division...because there are a lot of bad teams in that division, and no really outstanding ones! I haven't looked closely, but I doubt the Reds' and Cardinals' schedules will be much worse moving forward. The Yankees were a surprise here, but while they do play the Red Sox and Rays a lot, they have otherwise had a fairly light schedule...including 12 games vs. Baltimore, their most common foe thus far.
Finally, if you look closely, there's an interesting pattern here where many of the best teams in the standings have tended to have weaker schedules. An obvious reason for this is that they don't have to face themselves! The correlation isn't huge (r = -0.32), but it's there. This is one reason the iterations are an important addition--without the iterations, the correlation was closer to -0.6. But, of course, another possibility remains--that part of their success is just the good fortune to have an easy schedule. We'll see what happens over the rest of the season.
Anyway, hope you like this! I'll show how these values are incorporated into the power rankings tomorrow.