
MLB Strength of Schedule Estimates through 27 July

Sorry Lou--your team has had the weakest schedule in baseball.  And you're still 46-56.

One of the things I've long planned to include in the power rankings is a strength of schedule adjustment.  There's a big difference between playing in the AL East and playing in...well...any other division.  Baltimore may not be a good team, but they probably look worse than they are because they play so many games against elite teams like the Yankees, Rays, and Red Sox.

Well, I finally have it going, and thought I'd give a preview of it here before posting the power rankings tomorrow.

First, the methods.  You can skip this and click "more" below if you just want to see the results!

 The approach is pretty straightforward.  First, I calculate the weighted average component winning percentage of each team's opponents.  This is basically the strength of schedule adjustment.  Face more tough teams, you'll have a higher opponent component winning percentage.  We can then use the log5 method (solving for W%(A)) to apply this adjustment to a team's raw component winning percentage and calculate an adjusted component winning percentage.  This adjusted component winning percentage should be a better estimate of a team's true performance, because it accounts for the fact that some teams have faced tougher competition than others.
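For anyone who wants to see the mechanics, here is a minimal Python sketch of those two steps. It is an illustration rather than the actual spreadsheet: the function names, team labels, and all input numbers are hypothetical, and it assumes the opponent average is weighted by games played.

```python
def opponent_strength(games_vs, win_pct):
    """Games-weighted average winning percentage of a team's opponents."""
    total_games = sum(games_vs.values())
    return sum(n * win_pct[opp] for opp, n in games_vs.items()) / total_games


def adjust_for_schedule(raw_pct, sos):
    """log5 solved for W%(A): the team strength that would produce
    raw_pct against opponents of average strength sos."""
    return sos * raw_pct / (1 - sos - raw_pct + 2 * sos * raw_pct)


# Hypothetical inputs, for illustration only.
win_pct = {"Team X": 0.600, "Team Y": 0.450}   # opponents' component W%
games_vs = {"Team X": 12, "Team Y": 6}         # games played against each

sos = opponent_strength(games_vs, win_pct)     # (12*0.600 + 6*0.450) / 18
adjusted = adjust_for_schedule(0.480, sos)     # raw component W% of 0.480
print(round(sos, 3), round(adjusted, 3))
```

With these made-up inputs, a team that played .480 ball against a .550 schedule comes out at roughly .530 after the adjustment.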

There's one additional wrinkle.  As @cwyers pointed out to me on twitter, it is then possible--and desirable--to use this adjusted component winning percentage to re-calculate strength of schedule adjustments.  That way, your strength of schedule measures are based on a better measure of team performance than raw component winning percentages.  And, of course, once you get this new strength of schedule adjustment, you would want to generate new adjusted component winning percentages for teams...and you can repeat this cycle indefinitely.  I'm finding that after three iterations, you don't get much change, so that's what I'm doing.
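The cycle described above might look something like the following Python sketch. The three-team league, its schedule, and the function names are all hypothetical, and how one counts an "iteration" is an assumption here: each pass recomputes strength of schedule from the current ratings, then re-adjusts the raw percentages with log5 solved for W%(A).

```python
def log5_solve_for_a(x, b):
    """log5 solved for W%(A), given observed W% x against schedule strength b."""
    return b * x / (1 - b - x + 2 * b * x)


def iterate_sos(raw_pct, schedule, n_iter=3):
    """Alternate between schedule strength and adjusted ratings n_iter times."""
    rating = dict(raw_pct)  # first pass uses raw component W%
    sos = {}
    for _ in range(n_iter):
        # Strength of schedule from the current best ratings of each opponent.
        sos = {
            team: sum(n * rating[opp] for opp, n in opps.items()) / sum(opps.values())
            for team, opps in schedule.items()
        }
        # Always adjust the *raw* percentage, using the refreshed schedule strength.
        rating = {team: log5_solve_for_a(raw_pct[team], sos[team]) for team in raw_pct}
    return sos, rating


# Hypothetical three-team league in which everyone plays everyone 10 times.
raw_pct = {"A": 0.600, "B": 0.500, "C": 0.400}
schedule = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "C": 10},
    "C": {"A": 10, "B": 10},
}
sos, rating = iterate_sos(raw_pct, schedule)
print(sos, rating)
```

In this toy league the best team ends up with the easiest schedule simply because it never has to play itself, which is the effect the iterations are meant to dampen.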

...Ok, one last thing.  A given team has a say in the performance of its opponents, though this effect on any one team should be small in most cases.  Nevertheless, because I'm pulling data from baseball-reference team schedule tables, I don't have the ability to account for this game by game.  So I opted to "regress" each strength of schedule estimate 10% back toward 0.500, reasoning that few teams have accounted for more than 10% of another's games played, and thus shouldn't drive more than 10% of the strength of schedule adjustment.  It's an imperfect solution to this problem, but it's the best I can do.
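As a rough illustration of that regression step, here is a one-function Python sketch. The function name and the 0.532 input are hypothetical; only the 10% weight and the 0.500 target come from the description above.

```python
def regress_to_mean(sos, weight=0.10, mean=0.500):
    """Pull a strength of schedule estimate part of the way back toward the mean."""
    return sos + weight * (mean - sos)


# Hypothetical unregressed estimate, for illustration only.
regressed = regress_to_mean(0.532)
print(round(regressed, 4))
```

A hypothetical raw estimate of 0.532 becomes 0.5288, so the adjustment shaves off a tenth of the distance from 0.500.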

Make sense?  That's the methodology.  And now, at long last, here are strength of schedule (SoS) adjustments through 27 July--these are essentially measures of opponent winning percentage, as measured by the methods used in the power rankings:


Team SoS
Orioles 0.529
Diamondbacks 0.523
Mets 0.517
Indians 0.515
Marlins 0.514
Phillies 0.513
Royals 0.511
Mariners 0.506
Rockies 0.506
Blue Jays 0.504
Red Sox 0.504
Astros 0.503
Nationals 0.503
Angels 0.502
Rays 0.501
Braves 0.500
Dodgers 0.498
Pirates 0.498
Giants 0.495
Padres 0.495
White Sox 0.493
Twins 0.493
Tigers 0.490
Yankees 0.490
Cardinals 0.489
Brewers 0.487
Athletics 0.487
Reds 0.481
Rangers 0.478
Cubs 0.477

So the Orioles take the cake as having the toughest schedule in baseball (big surprise!).  Other teams with tough schedules, at least thus far, include the Diamondbacks, Mets, Indians, and Marlins--all teams that have arguably underperformed at times this season.

On the other side of the coin are teams with particularly weak schedules.  These include the Cubs (no excuses!), Rangers, Reds, A's, Brewers, Cardinals, and Yankees.  As you can see, while the pattern is not absolute, a number of the "surprise" teams (Rangers and Reds first and foremost) have had fairly easy schedules thus far.  You can also see that the NL Central seems to be a good place to play--four of the six easiest schedules belong to teams from that division...because there are a lot of bad teams in that division, and no really outstanding ones!  I haven't looked closely, but I doubt the Reds' and Cardinals' schedules will be much worse moving forward.  The Yankees were a surprise here, but while they do play the Red Sox and Rays a lot, they have otherwise had a fairly light schedule...including 12 games vs. Baltimore, their most common foe thus far.

Finally, if you look closely, there's an interesting pattern here where many of the best teams in the standings have tended to have weaker schedules.  An obvious reason for this is that they don't have to face themselves!  The correlation isn't huge (r = -0.32), but it's there.  This is one reason the iterations are an important addition--without the iterations, the correlation was closer to -0.6.  But, of course, another possibility remains--that part of their success is just the good fortune to have an easy schedule.  We'll see what happens over the rest of the season.

Anyway, hope you like this!  I'll show how these values are incorporated into the power rankings tomorrow.


Comments


Good post

Somehow the fact that the Yankees have one of the easier schedules makes me hate them even more.

Have you thought about using preseason projected winning percentages, or updated projections to do this analysis?

by vivaelpujols on Jul 29, 2010 2:31 AM EDT

Ditto on the Yankees

It also might mean that they stand to lose a little ground over the remainder of the season, as I have to expect that their proportion of games against the other non-BAL AL East teams will increase as the season rolls on. MLB always saves a bunch of Yankee/Red Sox games for September.

Re: projections—I’ve thought about incorporating preseason projections into the power rankings to get a true talent estimator number to go with TPI. Hadn’t thought about using those data for the strength of schedule stuff, but there’s no reason I couldn’t….if I ever get around to using preseason projections. ;)
-j

by JinAZ on Jul 29, 2010 8:25 AM EDT

How much it matters...

Using the log5 method (which I’ve heard about, but hadn’t checked out until now), I wanted to know how much these differences in SoS mattered. If you put a .500 team against a .470 schedule over a full season, they’ll win 9.7 more games than playing against a .530 schedule (which is just slightly wider than the range in Justin’s table). At this point in the season, it’s about a six game difference.

W%(A v. B) = W%(A)*(1 – W%(B)) / (W%(A)*(1 – W%(B)) + (1 – W%(A))*W%(B))

http://www.tangotiger.net/wiki/index.php?title=Log5
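The arithmetic above can be checked with a short Python sketch of the standard log5 formula. The variable names are made up for illustration, and the 100-game figure is an assumption standing in for "at this point in the season" (teams had played roughly that many games by late July).

```python
def log5(a, b):
    """Expected winning percentage of a team of strength a against strength b."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)


easy = log5(0.500, 0.470)   # .500 team against a .470 schedule
hard = log5(0.500, 0.530)   # the same team against a .530 schedule

full_season_swing = (easy - hard) * 162
swing_so_far = (easy - hard) * 100   # assumed ~100 games played so far
print(round(full_season_swing, 1), round(swing_so_far, 1))
```

A .500 team simply mirrors its schedule (.530 against the easy slate, .470 against the hard one), so the gap is 0.060 wins per game: about 9.7 wins over 162 games and about 6 over 100.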

by Sky Kalkman on Jul 29, 2010 10:48 AM EDT

Wow, that's pretty huge.

I’d guess that SoS will improve as time goes on. Baltimore can’t have many more games vs. the Yankees, for example, so their schedule will get easier. The Cubs…well…the Cardinals are the only genuinely formidable opponent in the NL.

Still, +/- 3 games is a pretty big deal.
-j

by JinAZ on Jul 29, 2010 10:53 AM EDT

Also, in case someone wants to do log5 backwards like I did above

…and doesn’t want to test their algebra skills (mine were pretty rusty), here’s the log5 equation solved for W%(A):

A = Bx / (1-B-x+2Bx)

Where:
A = W%(A)
B = W%(B) <-- Strength of Schedule in the calculations I did above
x = W%(A v. B)
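For anyone who would rather not trust the algebra, a quick round-trip check in Python is one way to confirm that the solved form really inverts log5. The function names and the test values are hypothetical.

```python
def log5(a, b):
    """x = W%(A v. B) for a team of strength a against strength b."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)


def solve_for_a(x, b):
    """A = Bx / (1 - B - x + 2Bx), the log5 equation solved for W%(A)."""
    return b * x / (1 - b - x + 2 * b * x)


# Hypothetical values: a .550 team facing a .520 schedule.
a, b = 0.550, 0.520
x = log5(a, b)
recovered = solve_for_a(x, b)
print(round(x, 4), round(recovered, 4))
```

Feeding the observed winning percentage and the schedule strength back through the solved form returns the original .550.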
-j

by JinAZ on Jul 29, 2010 11:06 AM EDT

Greed mode:

Any chance of computing strength of schedule in a team’s remaining games?

Any chance of incorporating home/away differentials?

by Sky Kalkman on Jul 29, 2010 10:53 AM EDT

Probably not anytime soon

I can see pulling #home vs. #away games, and using that to modify the SoS due to home team advantage. I’m not feeling real motivated to do that right now, though, as I think it would matter very little at this point in the season. Maybe next year.

SoS over remaining games is possible to do thanks to the remaining games table at B-Ref. But it’s hard to automate. The tables on that page are pure text, and it’s hard to get them to format cleanly with an automated text-to-columns step. Plus, the VLOOKUP-ing and HLOOKUP-ing it would require to make it work makes my head hurt…
-j

by JinAZ on Jul 29, 2010 11:03 AM EDT

Yep

They are pretty hard to work with in Excel. I learned that when I tried to convert them so I could run sims on the rest of the season. That’s the primary reason I haven’t done more iterations.

by stevesommer05 on Jul 29, 2010 11:31 AM EDT

Very nice, thanks!

I would also add another reason why good teams have weaker schedules (and vice versa), one that most people miss when doing this type of analysis: a good team’s own record against the teams in its division.

I noticed this with the 49ers. The analysis would talk about how poor the records of the teams they had played were, but when you are 12-1 or whatever, the teams you played are 1-12 against you, and their records take that immediate hit.

Not sure what the best way to adjust for this is. For my football example, I would take out the 49ers’ record against those teams. For baseball, since this is different in that you are assessing remaining games on the schedule, that might not work, but it is one option.


by obsessivegiantscompulsive on Jul 29, 2010 1:42 PM EDT

Yeah, this is a problem

This is why I pulled back the SoS estimates by 10% toward 0.500 (it’s mentioned up there in that long post). Unless I were to do it with gameday or something and pull out statistics from games involving a team when calculating that team’s opponent w% (which would also require that I come up with my own fielding metric!), I can’t really fix this in a meaningful way. But at least we know that 10% is roughly the upper bound (it’s actually ~13%) of how many games one team can play against any other one team, so this will hopefully help correct for this. A little bit.

Fortunately, after doing a few iterations, the correlation is fairly weak at 0.3. It had been 0.6 or so, which struck me as scary-high. But as it is, I think we’re ok.
-j

by JinAZ on Jul 29, 2010 3:29 PM EDT
