On April 21, 2012, Clayton Kershaw turned in a dominating performance. He went to town on the hapless Houston Astros, pitching 7 innings, allowing 0 runs, and striking out 9. Three days later, Kershaw's AL counterpart David Price turned in a similarly dominating performance against the Los Angeles Angels: 0 runs allowed and 6 strikeouts while going the distance.
Which performance was more impressive? You could argue for game scores, number of batters struck out, walks, and so on, but for me it comes down to one thing. The Astros would score 583 runs that year, while the Angels would score 767. Think about that; the Angels scored a full 1.14 more runs per game than the Astros. To me, Price's performance was more impressive because he dominated a better opponent.
That's the point: we have to consider the opposition faced when evaluating players. It's the reason a linebacker from the SEC gets picked before a similar one from the WAC. In baseball, opponent quality is thought to average out over the course of a full season. But in individual games, the opponent can matter greatly.
With this in mind, we're going to develop a metric for runs allowed that adjusts for who the pitcher faces. Called OARA (Opponent Adjusted Run Average, pronounced like "aura"), it can help differentiate performances fueled in part by poor opposition from truly great performances.
So, in general, the number of runs scored in a game can be "blamed" on the pitcher or on the offense that he faces. This idea of assigning the result of something to a source is occasionally seen in statistics. In text mining, you'll see people assigning words to topics in a paper (a process called Latent Dirichlet Allocation). Biologists try to attribute the presence of microbiota to certain environments. It seems reasonable that we can attribute the result of a game (in terms of runs allowed) to either the pitcher or the offense he faces.
This idea of assigning blame implies that the runs allowed in a game follow a mixture of two distributions: one controlled by the pitcher and one controlled by the opposing offense. In notation, that is
$$RA \sim p\,\mathrm{Pois}(\lambda_{\text{Pitcher}}) + (1 - p)\,\mathrm{Pois}(\lambda_{\text{Opponent}})$$
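To make the mixture concrete, here is a minimal Python sketch of its probability mass function. The function names and the parameter values (p = 0.5, rates of 2.7 and 4.5) are illustrative assumptions, not estimates from the data.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam (lam > 0)."""
    # Computed in log space so large k does not overflow the factorial.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def mixture_pmf(k, p, lam_pitcher, lam_opponent):
    """P(RA = k) when the pitcher's Poisson has weight p and the opponent's has 1 - p."""
    return p * poisson_pmf(k, lam_pitcher) + (1 - p) * poisson_pmf(k, lam_opponent)

# Illustrative values only: equal weights, pitcher rate 2.7, opponent rate 4.5.
probs = [mixture_pmf(k, 0.5, 2.7, 4.5) for k in range(60)]
print(f"P(RA = 0) = {probs[0]:.3f}, total mass over 0..59 = {sum(probs):.6f}")
```

Because each component is a proper distribution and the weights sum to 1, the mixture's probabilities also sum to 1.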
Now, this applies to each game, so the mixing proportions overall and for each game are unknown. We could estimate them with a maximum likelihood estimate (the values most likely given the data) or an EM algorithm, but I prefer to go the Bayesian route. With that in mind, the hierarchical model is defined as follows:
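$$
\begin{aligned}
RA_i \mid z_i &\sim z_i\,\mathrm{Pois}(\lambda_{\text{Pitcher}}) + (1 - z_i)\,\mathrm{Pois}(\lambda_{\text{Opponent},i}) \\
z_i \mid p &\sim \mathrm{Bern}(p) \\
p &\sim \mathrm{Beta}(1, 1) \\
\lambda_{\text{Pitcher}} &\sim \Gamma(a_P, b_P) \\
\lambda_{\text{Opponent},i} &\sim \Gamma(a_{O,i}, b_{O,i})
\end{aligned}
$$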
where we set $a_P$ and $b_P$ such that $a_P / b_P$ equals the pitcher's runs allowed per game and $a_P + b_P = 10$. The $a_{O,i}$ and $b_{O,i}$ are set similarly for the opponent's offense.
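The two constraints pin down the prior parameters exactly: if a/b = m and a + b = 10, then b = 10 / (1 + m) and a = 10 - b. A small Python sketch of that calculation follows; the helper name `gamma_prior` is hypothetical, and it assumes the Gamma is parameterized by shape a and rate b, so that its mean is a/b.

```python
def gamma_prior(mean_runs, total=10.0):
    """Solve a / b = mean_runs and a + b = total for the Gamma prior (a, b)."""
    if mean_runs <= 0:
        raise ValueError("mean_runs must be positive")
    b = total / (1.0 + mean_runs)  # from mean_runs * b + b = total
    a = total - b
    return a, b

# Price's 2012 rate of 2.69 runs allowed per game, as quoted in this post.
a_p, b_p = gamma_prior(2.69)
print(f"a_P = {a_p:.2f}, b_P = {b_p:.2f}")
```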
From here, we code up a Gibbs sampler and run it for 10,000 iterations. We can then estimate the blame assignment for each game, P(zi=1|data), by averaging the draws of each individual zi from the Gibbs sampler. Now that we have our blame probabilities (the probability that the pitcher gets blamed for each game), we are able to calculate OARA.
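For readers who want to see what such a sampler might look like, here is a minimal Python sketch using only the standard library. It is not the code behind the numbers in this post. The full conditionals are the standard conjugate updates for this model (Bernoulli for each zi, Beta for p, Gamma for each rate), and the names `gibbs_blame` and `gamma_prior`, the `burn_in` parameter, the `seed`, and the sample run totals are all assumptions made for illustration.

```python
import math
import random

def gamma_prior(mean_runs, total=10.0):
    """Gamma(shape a, rate b) prior with a / b = mean_runs and a + b = total."""
    b = total / (1.0 + mean_runs)
    return total - b, b

def gibbs_blame(runs, pitcher_mean, opp_means, n_iter=10_000, burn_in=1_000, seed=0):
    """Estimate P(z_i = 1 | data) for each start by averaging Gibbs draws of z_i."""
    rng = random.Random(seed)
    n = len(runs)
    a_p, b_p = gamma_prior(pitcher_mean)
    opp_priors = [gamma_prior(m) for m in opp_means]
    # Start the chain at the prior means (an arbitrary but reasonable choice).
    p, lam_p, lam_o = 0.5, pitcher_mean, list(opp_means)
    z = [0] * n
    z_totals = [0] * n
    for it in range(n_iter):
        # z_i | rest: compare the two Poisson likelihoods (the y! term cancels).
        for i, y in enumerate(runs):
            log1 = math.log(p) - lam_p + y * math.log(lam_p)
            log0 = math.log(1.0 - p) - lam_o[i] + y * math.log(lam_o[i])
            top = max(log1, log0)
            w1, w0 = math.exp(log1 - top), math.exp(log0 - top)
            z[i] = 1 if rng.random() < w1 / (w1 + w0) else 0
        s = sum(z)
        # p | z ~ Beta(1 + s, 1 + n - s); clamp so the logs above stay finite.
        p = min(max(rng.betavariate(1 + s, 1 + n - s), 1e-12), 1.0 - 1e-12)
        # Rates | z, runs ~ Gamma; gammavariate takes a scale, so pass 1 / rate.
        blamed_runs = sum(zi * y for zi, y in zip(z, runs))
        lam_p = rng.gammavariate(a_p + blamed_runs, 1.0 / (b_p + s))
        for i, (a_o, b_o) in enumerate(opp_priors):
            lam_o[i] = rng.gammavariate(a_o + (1 - z[i]) * runs[i], 1.0 / (b_o + (1 - z[i])))
        if it >= burn_in:
            for i in range(n):
                z_totals[i] += z[i]
    return [t / (n_iter - burn_in) for t in z_totals]

# Made-up example: five starts with invented run totals and opponent scoring rates.
blame = gibbs_blame([0, 2, 7, 3, 1], 2.7, [4.5, 4.0, 4.8, 3.9, 4.3])
print([round(q, 2) for q in blame])
```

Discarding the first draws as burn-in is a common convention rather than something the model requires; setting `burn_in=0` averages over every draw.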
To do that, we calculate an expected runs score. For each start, this is the difference between the runs allowed and the league average, times the probability that the game is ascribed to the pitcher. In notation, this is
$$(RA_i - RA_{\text{League}}) \cdot P(z_i = 1 \mid \text{data})$$
where $RA_{\text{League}}$ is the league average runs per game. We then sum this across all starts, divide by the innings pitched, and multiply by 9 (similar to ERA). Then, to get OARA, we add this quantity to the league average runs per game. In the end, OARA has a similar range to ERA, although with slightly higher maximum values.
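The calculation just described can be sketched in a few lines of Python. This is a sketch under one assumption: that each start's contribution is its runs allowed minus the league average, weighted by the blame probability. The function name `oara` and its arguments are hypothetical.

```python
def oara(runs, blame_probs, innings, league_ra):
    """Opponent Adjusted Run Average from per-start runs, blame probabilities, and innings."""
    # Assumed per-start score: deviation from league average, weighted by blame.
    scores = [(ra - league_ra) * q for ra, q in zip(runs, blame_probs)]
    # Sum over starts, then scale to nine innings the way ERA does.
    expected_runs_score = 9.0 * sum(scores) / sum(innings)
    return league_ra + expected_runs_score

# If the pitcher is never blamed, the score is 0 and OARA equals the league average.
print(oara([3, 5], [0.0, 0.0], [7.0, 6.0], 4.32))
```

A blame probability near 1 in a low-scoring start pulls OARA below the league average, while a blame probability near 0 leaves that start with almost no effect.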
This allows us to assign more credit to a pitcher when he shuts down a good-hitting team than when he shuts down a poor-hitting team. Similarly, when he gets beat up by a good-hitting team, he isn't hurt quite as much as he would be against a poor-hitting team.
Examples: Kershaw, Price, and Lincecum
So let's calculate the OARA for a few examples. To begin with, let's look at Kershaw and Price in 2012. Price allowed 2.69 runs per game in 2012 against offenses that averaged 4.55 runs per game for the season. His expected runs score was -1.92, leading to an OARA of 2.40. Meanwhile, Kershaw allowed runs at a rate of 2.77 against opponents scoring at an average rate of 4.25. His expected runs score was -1.79, leading to an OARA of 2.53.
Now let's look at a pitcher who had a little less success last year. For that, we'll go to Tim Lincecum. He allowed runs at a rate of 5.37 against opponents who scored on average at a rate of 4.27. His expected runs score was 1.85, leading to an OARA of 6.17.
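As a quick sanity check on these three examples: since OARA is the expected runs score plus the league average, subtracting the score from each OARA should recover the same league average every time. A small Python check using only the figures quoted above:

```python
# (expected runs score, OARA) for each pitcher, as quoted in this post.
examples = {
    "Price": (-1.92, 2.40),
    "Kershaw": (-1.79, 2.53),
    "Lincecum": (1.85, 6.17),
}

# OARA = league average + expected runs score, so the difference is the league average.
implied_league_ra = {name: round(oara - score, 2) for name, (score, oara) in examples.items()}
print(implied_league_ra)
```

All three pitchers imply the same league average of 4.32 runs per game, so the quoted figures are internally consistent.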
The technique tends to punish pitchers with high runs allowed rates a little more, but I'm fine with that. Really, OARA is best used when comparing two pitchers who have similar ERAs or runs allowed rates, as it allows us to take out the effect of opponent quality. I do not currently have OARAs calculated for the entire league, as it would take a little time to compile the data and run all the Gibbs samplers.
All start-by-start data comes from Baseball Reference. Thanks to K.C. Kubli for allowing me to run some of the particulars of the idea by him.