Two starters in the same rotation are generally judged statistically on similar scales, but should that be the case? I will be looking into the obvious, and not so obvious, factors that can change run environments for pitchers.
Most of you reading this site know that ERAs cannot be compared directly across the league. Factors like league, role, and ballpark influence a pitcher's run environment, creating a different baseline for each team. However, an overlooked part of this process is the variability within each team.
Baseball-Reference does a tremendous job outlining each step of the WAR process for pitchers. After determining the quality of opponents a certain pitcher faces, the site adjusts for his defensive support, whether he starts or relieves, and ALL of the ballparks he pitches in. The end result is that pitcher's own run environment, which is used to determine WAA, then WAR. All of these components have their own quirks that can influence run values.
The initial step, the opponents' scoring rate, is the trickiest to calculate, as you have to adjust for ballpark and interleague play, among other things. This is where the league adjustment comes into play, with AL pitchers expected to allow 0.2 more runs per 9 IP. FanGraphs uses general league baselines, while B-R breaks down each pitcher start-by-start to determine his baseline.
Within each team, it used to be assumed that the opponents each pitcher faced "averaged out," but that is not the case. Among starters with at least 50 IP, the teammates with the largest spread in opponent quality were Carl Pavano (4.78, highest in the majors) and Cole DeVries (4.31, lowest in the AL). A few other teams had pitchers with a difference of 0.3 runs or more, enough to create a half-win change over 150 IP, all else equal.
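The half-win figure follows from simple arithmetic, sketched below. The 10-runs-per-win conversion is an assumption (a common rule of thumb), not a number taken from B-R's calculation, and the function names are made up for illustration.

```python
# Rough conversion of a per-nine-inning run gap into total runs and wins.
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not B-R's exact value


def runs_from_gap(gap_per_nine, innings):
    """Total runs implied by a run-environment gap over a number of innings."""
    return gap_per_nine * innings / 9.0


def wins_from_gap(gap_per_nine, innings, runs_per_win=RUNS_PER_WIN):
    """Approximate wins implied by the same gap."""
    return runs_from_gap(gap_per_nine, innings) / runs_per_win


print(round(runs_from_gap(0.3, 150), 2))  # 5.0 runs
print(round(wins_from_gap(0.3, 150), 2))  # 0.5 wins
```

Plugging in the full Pavano-DeVries gap (4.78 minus 4.31) instead of 0.3 gives a correspondingly larger number under the same assumption.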
Another theme of this post is the difference between value and talent. If a pitcher throws a disproportionate number of games against an opponent he matches up well against, his WAR will likely end up higher than his talent would indicate. Even more deceiving is a pitcher who happens to face teams while they are riddled with injuries. His measured quality of opponent would be higher than the lineups he actually faced. More examples of this Value vs. Talent divide are coming.
The next adjustment is for defense, for which B-R uses team DRS. The variability comes from differing strikeout rates among pitchers, as the defensive adjustment factors in the rate of balls in play, not just innings pitched. If the team defense rates around average, there will be nearly no variability, but an extremely good or bad defense can move this rating by a tenth of a run or more. The Braves had the best defense in the league last year according to DRS, with Tim Hudson getting +0.53 runs of support per nine innings, while Brandon Beachy received only +0.44.
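To see why strikeout rate matters here, consider a simplified model in which defensive support scales linearly with balls in play. This proportional form is an assumption for illustration, not B-R's published formula, and the function names are hypothetical. Under that assumption, the ratio of two teammates' support figures equals the ratio of their balls in play per nine innings.

```python
# Simplified model (assumption): support per nine = runs saved per ball in play
# multiplied by balls in play per nine. Not B-R's exact calculation.


def support_per_nine(runs_saved_per_bip, bip_per_nine):
    """Defensive runs of support per nine innings under the linear model."""
    return runs_saved_per_bip * bip_per_nine


def implied_bip_ratio(support_a, support_b):
    """Ratio of balls in play per nine implied by two teammates' support figures."""
    return support_a / support_b


# Support figures quoted above: Hudson +0.53, Beachy +0.44.
ratio = implied_bip_ratio(0.53, 0.44)
print(round(ratio, 2))  # 1.2
```

In other words, if the linear model held, Hudson would have put roughly 20% more balls in play per nine innings than Beachy.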
This adjustment is also a blanket figure, meaning pitchers on a team are assumed to have received the same quality of defensive support throughout the year. This has two catches, the obvious one being random variation in the defense's performance over the course of the season. The other is that sinkerballers and flyball pitchers get the same adjustment, no matter which positions rank strongest. Using the Braves again, their outfield supplied most of their DRS value, but Hudson, a sinkerballer, did not get the benefit of that group as much as Beachy and others. This leaves Hudson a bit undervalued, since he had to watch Dan Uggla and Chipper Jones try to make plays more often than most.
The next adjustment is starting vs. relieving, a very basic calculation. B-R gives each start a 0.17 run boost and knocks each relief appearance down 0.33 runs, and the average across all of a pitcher's appearances is his role adjustment. This reflects the standard half-run difference between starting and relieving, and it is applied whether or not that pitcher actually performed better out of the bullpen.
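As a minimal sketch of that average, assuming it is weighted by appearances as described above (the function name and the sample pitchers are hypothetical):

```python
START_ADJ = 0.17    # run boost per start, as quoted above
RELIEF_ADJ = -0.33  # run penalty per relief appearance, as quoted above


def role_adjustment(starts, relief_apps):
    """Appearance-weighted average of the start and relief adjustments."""
    total = starts + relief_apps
    if total == 0:
        raise ValueError("pitcher has no appearances")
    return (starts * START_ADJ + relief_apps * RELIEF_ADJ) / total


print(round(role_adjustment(30, 0), 2))   # 0.17, a full-time starter
print(round(role_adjustment(10, 10), 2))  # -0.08, an even swingman split
```

The gap between a pure starter and a pure reliever is 0.17 minus (-0.33), which is the half run mentioned above.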
The last adjustment, possibly the biggest of them all, is the park factor tinkering. It may seem to be an NL West argument, but some of the biggest differences came on Central and East teams. My Braves show up again, with Beachy posting a park factor of 105.3 while Tommy Hanson had a slightly pitcher-friendly 99.2 park factor. The Twins also show up again with DeVries (103.7) and Liam Hendriks (95.9). Most teams had a maximum split of at least 4 points among starters.
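Teammates end up with different park factors because each one pitches in a different mix of parks. A rough sketch of the idea, assuming the personal factor is an innings-weighted average of the parks pitched in (the weighting scheme, the function name, and the sample numbers are all illustrative assumptions, not B-R's data or method):

```python
def personal_park_factor(outings):
    """Innings-weighted average park factor.

    `outings` is a list of (park_factor, innings) pairs.
    """
    total_innings = sum(innings for _, innings in outings)
    if total_innings == 0:
        raise ValueError("no innings pitched")
    weighted = sum(factor * innings for factor, innings in outings)
    return weighted / total_innings


# Hypothetical pitcher: 60 IP in a 105 park, 40 IP in a 95 park.
print(personal_park_factor([(105.0, 60.0), (95.0, 40.0)]))  # 101.0
```

Two teammates who share a home park can still diverge by several points if their road starts happen to cluster in hitter-friendly or pitcher-friendly parks.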
We delve once more into talent vs. value, as the park factors used here are just run-scoring environments. When pitching at Fenway Park, a flyball lefty will allow more runs on average than a groundball righty, assuming their peripherals are similar. "Normal" run park factors are used for value, while component park factors are used for talent evaluation.
After putting all of these adjustments together, the biggest difference in run environments within a team was around 0.4 runs. Colorado's 120 park factor and large negative defensive value created a very high run environment, ranging from Juan Nicasio (5.99) to Jhoulys Chacin (6.39). The Astros saw nearly as large a difference between Lucas Harrell (4.90) and Dallas Keuchel (5.29).
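The within-team spreads can be checked directly from the four figures quoted above. The dictionary layout and function name are just one illustrative way to organize them:

```python
# Run environments quoted above, grouped by team.
environments = {
    "Rockies": {"Juan Nicasio": 5.99, "Jhoulys Chacin": 6.39},
    "Astros": {"Lucas Harrell": 4.90, "Dallas Keuchel": 5.29},
}


def team_spread(pitchers):
    """Gap between the highest and lowest run environment on one staff."""
    values = pitchers.values()
    return max(values) - min(values)


spreads = {team: round(team_spread(p), 2) for team, p in environments.items()}
print(spreads)  # {'Rockies': 0.4, 'Astros': 0.39}
```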
For anyone wondering how much correlation there may be between WAR and environment, there was an R-squared of 0.02, with a slight positive slope. That gives the impression that extreme environments have little influence on WAR totals, which is a good thing. I always thought there were some minor differences in opponents and parks between teammates on the mound, but it turns out I underestimated these effects. While these metrics aren't perfect, they show that there is usually more than meets the eye.