This week's rankings feature a number of improvements under the hood. More on that below...but if you just want to see the rankings, here they are!
"On Paper" Playoff Standings
American League: E=Rays, C=Twins, W=Rangers, WC=Yankees
National League: E=Braves, C=Cardinals, W=Padres, WC=Rockies
This Week's Feature
There was a good amount of discussion last week about various ways to improve the rankings. Fortunately, it's summer, and I just got a big project done Tuesday, so I was able to sit down and implement them yesterday. Let's take a tour of the new features:
1. Offensive runs are now calculated using base runs instead of FanGraphs' wRC. I don't think there's anything particularly wrong with wRC, but there are some advantages to using base runs:
- I have complete control over the equation being used (though I suppose that means I'm responsible for it, which is a downside). I'm using this one (see end of article), which performs well across all major and minor leagues, though I do tweak it so that the estimated runs scored across MLB add up exactly to the actual total.
- Base runs works better at the team level, because it gives extra credit to teams that combine good on-base ability with good ability to advance runners around the bases (and penalizes teams that do not), since those two elements of offense interact.
- I'm now able to include Reached On Errors (ROE), which wRC does not. While ROEs are not traditionally credited to hitters, there is a significant "skill" component to getting them (hitters who hit more grounders, or who are fast, get more ROEs; hat tip to @fastballs for the link to the study), so it's worth including them here.
2. TPI is now calculated in three steps:
- Calculate component winning percentage (cW%) for each team.
- Use the log5 method to back-calculate what a team's winning percentage would have been had they played 0.500 teams. Basically, I'm solving for W%(A) in the log5 equation reported here. The result is what I'm calling "cW%s" (for schedule-adjusted component winning percentage).
- Again use the "reverse" log5 method, this time to apply the league adjustment. This is a slight change of methodology, as I previously applied the league adjustment to run totals. Here, I'm applying the same adjustment, but as a second strength of schedule adjustment based on the expected true talent estimate for AL teams. Those numbers are here, though I reduced their magnitude slightly because AL and NL teams do play each other a small amount each season, so each league has already affected the other's performance (a tweak I made last year, thanks to @btb_sky for the suggestion). The result is TPI.
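For readers who want to check the arithmetic, here is a minimal Python sketch of the "reverse" log5 step. It relies on the odds form of log5, odds(A beats B) = odds(A) / odds(B), so solving for W%(A) just multiplies the observed odds by the opponents' odds. The function names are hypothetical, and `league_level` and `shrink` are illustrative stand-ins for the league-quality inputs and the small pull toward .500 described above; this is a sketch under those assumptions, not the exact code behind the rankings.

```python
def odds(p: float) -> float:
    """Convert a winning percentage into odds of winning."""
    return p / (1.0 - p)


def from_odds(o: float) -> float:
    """Convert odds of winning back into a winning percentage."""
    return o / (1.0 + o)


def log5(pa: float, pb: float) -> float:
    """Forward log5: expected W% for team A (true W% pa) against team B (pb)."""
    return (pa - pa * pb) / (pa + pb - 2.0 * pa * pb)


def reverse_log5(observed: float, opponent: float) -> float:
    """Solve log5 for W%(A): the W% a team would post against .500 teams.

    In odds form log5 is odds(observed) = odds(A) / odds(opponent),
    so odds(A) = odds(observed) * odds(opponent).
    """
    return from_odds(odds(observed) * odds(opponent))


def tpi_sketch(cw: float, sos: float, league_level: float, shrink: float = 1.0) -> float:
    """Hypothetical two-step adjustment: schedule first, then league quality.

    league_level: assumed average true-talent W% of the team's league on a
        combined two-league scale (above .500 for the stronger league).
    shrink: assumed factor in [0, 1] that pulls league_level toward .500.
    """
    cws = reverse_log5(cw, sos)                   # cW%s: schedule-adjusted cW%
    level = 0.5 + shrink * (league_level - 0.5)   # league level regressed toward .500
    return reverse_log5(cws, level)               # second "schedule" adjustment
```

For example, a team with a .550 cW% against opponents averaging .520 comes out at about .570 against .500 opposition, and `log5(reverse_log5(0.55, 0.52), 0.52)` recovers the original .550.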
Under the Hood
Converting Runs to Wins
RS = Actual Runs Scored, after a park adjustment
eRS = Estimated Runs Scored, after park adjustment (see table below)
RA = Actual Runs Allowed, after a park adjustment
eRA = Estimated Runs Allowed, after park adjustment (see table below)
W% = Actual Winning Percentage
pW% = PythagenPat Winning Percentage, based on actual runs scored and runs allowed totals
cW% = Component Winning Percentage, using estimated runs scored and estimated runs allowed totals. If you don't like the league adjustment, click in the header and sort by this column to get an "unsullied" ranking.
SoS = Strength of Schedule. This is an iterative weighted average of the component-based winning percentages of a team's opponents. Described in this post.
cW%s = Schedule-adjusted Component Winning Percentage. Calculated by applying SoS to cW% with the log5 method, as described in this post.
xTW = Extrapolated wins: actual wins to date plus extrapolated wins over the rest of the season. Extrapolations are based on an average of cW% and cW%s, as justified in this post.
LgQ = League Quality. The AL has superior talent to the NL (justification here and here, and modified most recently here). The number shown is the estimated true-talent level (as a winning percentage) of each league if the two were able to play one another for a large number of games. It's based on the last two years of interleague play, with a small adjustment toward 0.500 to account for the fact that the leagues do play one another and thus have already had a small effect on one another's performance.
TPI = Team Performance Index, a hypothetical winning % based on cW%s, after adjustment for league quality. Think of this as the W% we'd expect teams to have if they were all in one big league and were allowed to play 10,000 games vs. every team.
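The run-to-win conversion behind pW% and the xTW projection can be sketched in Python as below. PythagenPat sets the Pythagorean exponent from the run environment, ((RS + RA) / G) ^ z; the z = 0.287 default is a commonly cited value and an assumption here, since the post doesn't list the constant used. The `extrapolated_wins` helper is hypothetical: it assumes a 162-game season and adds actual wins to the remaining games times the average of cW% and cW%s. This sketch also assumes cW% uses the same formula fed with eRS and eRA, which the table description implies but does not state.

```python
def pythagenpat(rs: float, ra: float, games: int, z: float = 0.287) -> float:
    """PythagenPat W%: Pythagorean formula with a run-environment exponent.

    z = 0.287 is a commonly cited constant, assumed here.
    """
    exponent = ((rs + ra) / games) ** z   # higher-scoring environments get a larger exponent
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def extrapolated_wins(wins: int, games: int, cw: float, cws: float,
                      season_games: int = 162) -> float:
    """Wins to date plus remaining games at the average of cW% and cW%s."""
    remaining = season_games - games
    return wins + remaining * (cw + cws) / 2.0
```

For instance, a 45-35 team with a .550 cW% and a .570 cW%s projects to about 90.9 wins.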
Team Offenses and Defenses
RS = Actual Runs Scored
eRS = Estimated Runs Scored: HitRns + EqBRR
wOBA = The Book's statistic, but park adjusted, and using data from both HitRns and EqBRR
OBP = On Base Percentage (Times on Base / Plate Appearances)
SLG = Slugging Percentage (Total Bases / At Bats)
HitRns = Runs scored as estimated by Base Runs, ignoring all base running, using the equation in this post.
EqBRR = Dan Fox's composite baserunning statistics from Baseball Prospectus, minus stolen bases since they are included in wRC.
RA = Actual Runs Allowed, after park adjustment
eRA = Estimated Runs Allowed: PitRns - Field
ERA = Straight-up Earned Run Average
FIP* = Fielding-Independent Runs, based strictly on K-, BB-, and HR-rates. HR/FB rates are park adjusted using these park factors.
xFIP = Expected Fielding-Independent Runs from FanGraphs. Like FIP, but with HR/Outfield Fly Ball rates regressed completely to league average. xFIP is as predictive as any other DIPS-like stat.
PitRns = Pitching Runs Allowed, the average expected runs allowed based on FIP and xFIP. Described in this post.
Field = Described in this post. It is essentially an average of team UZR, DRS (minus rSB, since I calculate catcher fielding separately), and BsRFld. BsRFld is just the difference between FIP-based runs allowed and park-adjusted Base Runs, and is a less direct approach to measuring fielding. The fielding number also includes a catcher fielding statistic based on SBs, CSs, WPs, PBs, Es, and, new this year, catcher interference. The catching methods are essentially those described here, but I'm using B-Ref data this year, so there are slight tweaks to the methodology, generally in ways that should lead to greater accuracy. If you want to know more, feel free to ask!
BABIP = Batting Average on Balls In Play. Fluctuates at the team level with fielding, although park effects and chance events can have effects as well.
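HitRns (and the first feature above) rely on Base Runs, whose general form is runs = A * B / (B + C) + D, where A is baserunners, B is advancement, C is outs, and D is home runs. The equation used for the rankings is the one linked in the post; the Python sketch below instead uses a widely circulated basic version of the B term (with HBP lumped in with walks), counts ROE as baserunners and removes them from outs as an illustrative assumption, and replaces the league-total tweak with plain proportional scaling. Treat the coefficients, the `Batting` record, and `scale_to_league` as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Batting:
    """Hypothetical team batting line."""
    ab: int
    h: int
    hr: int
    tb: int
    bb: int
    hbp: int
    roe: int


def base_runs(b: Batting) -> float:
    """Basic Base Runs: A * B / (B + C) + D (coefficients assumed, not the post's)."""
    a = b.h + b.bb + b.hbp + b.roe - b.hr   # baserunners who did not homer
    adv = (1.4 * b.tb - 0.6 * b.h - 3 * b.hr + 0.1 * (b.bb + b.hbp)) * 1.02
    c = b.ab - b.h - b.roe                  # outs; ROE removed because the batter reached
    d = b.hr                                # home runs always score
    return a * adv / (adv + c) + d


def scale_to_league(estimates: list[float], actual_total: float) -> list[float]:
    """Scale team estimates so they sum exactly to actual league runs."""
    factor = actual_total / sum(estimates)
    return [e * factor for e in estimates]
```

With ROE set to zero, a team line of 5,500 AB, 1,450 H, 160 HR, 2,300 TB, 550 BB, and 50 HBP comes out at roughly 778 runs. Proportional scaling is the simplest way to force the league total to match; it is not necessarily how the rankings make that adjustment.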