An ABC of translating statistics ...

One of the problems with the slew of exotic sabermetric statistics out there is that it can be difficult to put their values into the context of everyday game performance. When a pitcher has a defensive OBP of .250, what does that actually mean in terms of ERA? And, more to the point, how can we find out?

My interest in this was piqued when I read Phil Birnbaum's review of The Book in the quarterly By the Numbers newsletter. One of Phil's frustrations was that he couldn't see how a pitcher's wOBA (a statistic defined by The Book) translated into ERA. In this article I want to outline how to make that link.

There are a few different methods we can employ to solve this conundrum; here I will outline two. First, plotting the two statistics in question and drawing a line of best fit lets us relate the two measures empirically. Second, we can break the individual statistics down into their constituent building blocks and relate them through logical reasoning. We'll do this for defensive OBP and ERA to illustrate the concept.

Let's start off by plotting ERA and OBP for all pitchers who threw over 40 innings in 2005.

We see a strong correlation between these two statistics (r^2 = 0.62). This isn't a surprise: the more often a hurler lets an opposing batter reach base, the more likely that batter is to circle the bases and score an earned run. Now let's calculate the linear equation that best describes this relationship; we can use it to work out what impact a 10-point change in OBP has on ERA. Calculating the line of best fit is easy in Excel, and it turns out to be:

$$\text{ERA} = 27.988 \times \text{OBP} - 4.906$$

The intercept term doesn't concern us, because it only shifts the line up or down; it doesn't change how ERA responds to OBP. What we are interested in is the gradient coefficient, 27.988. This says that an increase in OBP of 1 corresponds to an increase in ERA of 27.988! Read literally this is, of course, meaningless, since OBP cannot exceed 1 by definition. It helps to take the inverse of the gradient to see what change in OBP corresponds to a 1-point change in ERA. Inverting 27.988 gives about 0.036 (1/27.988 ≈ 0.0357). In other words, roughly 35 to 36 additional points of OBP are equivalent to 1 point of ERA; equivalently, a 10-point change in OBP moves ERA by about 0.28. The relationship isn't perfect, since the two statistics don't correlate perfectly, but it is a perfectly good rule of thumb.
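For readers who would rather not use Excel, here is a minimal sketch of the same ordinary-least-squares "line of best fit", plus the gradient arithmetic from the paragraph above. The function names are hypothetical, and the only real numbers used are the article's reported gradient; no pitcher data is included.

```python
# Ordinary least squares for a single predictor: the same line Excel draws
# as a linear trendline. Function names here are illustrative only.

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # square of Pearson's r
    return slope, intercept, r_squared


# Gradient reported in the article for 2005 pitchers with 40+ innings.
ERA_PER_OBP = 27.988


def era_change(obp_change):
    """ERA change implied by a change in OBP allowed, using the fitted gradient."""
    return ERA_PER_OBP * obp_change


def obp_per_era_point():
    """OBP change that corresponds to a 1.00 change in ERA (inverse of the gradient)."""
    return 1 / ERA_PER_OBP


print(round(era_change(0.010), 2))    # 0.28
print(round(obp_per_era_point(), 4))  # 0.0357
```

Feeding `fit_line` a list of OBP values and the matching ERAs would reproduce the kind of gradient and r^2 quoted above; the intercept is returned but, as noted, plays no part in the conversion.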

An empirical approach is all well and good, but can we rationalize the result logically? Not surprisingly, the answer is yes. Think about how an earned run is defined: a batter scores after reaching base against the pitcher, through either a walk or a hit, and then gets around the bases without the fielders committing an error. Not everyone who gets on base scores; a typical scoring rate is about 30%. Because that rate stays reasonably constant, the relationship holds.

Take a pitcher who faces 300 batters in a season (300 BFP). In this environment, each additional baserunner he allows adds about 3.3 points to his OBP (1/300 ≈ 0.0033). Ten additional baserunners therefore add about 33 points of OBP, which by our rule of thumb should push his ERA up by roughly 1 point over a 300 BFP season. Does this make sense? Suppose, for example, that these extra baserunners come from 5 singles, 3 doubles, 2 walks and a home run (strictly that is 11 events rather than 10, and the mix is not perfect, but it is representative enough for our purposes). Assigning a run value of 0.5 to a single, 0.75 to a double, 0.25 to a walk and 1.4 to a home run, we arrive at 6.65, or about 7 runs. Suppose our hurler faced an average of 4.5 batters per inning; then 300 BFP equates to about 67 innings (300/4.5). The math is simple: 7 extra runs over 67 innings raise ERA by 7 × 9 / 67 ≈ 0.94, or roughly 1 point. Bingo: the logical reasoning agrees with the empirical relationship between ERA and OBP. Technical note: we have made a couple of simplifying assumptions here, such as ignoring errors and rounding fairly loosely. To be technically correct we should account for errors and be more precise with rounding, but the purpose of this article is to demonstrate that the correlation technique works, and we can see that it does.
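The back-of-the-envelope check above can be re-run in a few lines. This is a sketch only: the run values, the 300 BFP season and the 4.5 batters per inning are the article's round figures, and the helper names are hypothetical.

```python
# Rough re-run of the OBP -> runs -> ERA arithmetic. The run values are the
# article's round figures, not precise linear weights.
RUN_VALUE = {"single": 0.5, "double": 0.75, "walk": 0.25, "home_run": 1.4}


def obp_points_per_baserunner(bfp):
    """Points of OBP (thousandths) added by one extra baserunner over bfp batters faced."""
    return 1000 / bfp


def runs_from_events(events):
    """Approximate run value of a mix of events, e.g. {"single": 5, "walk": 2}."""
    return sum(RUN_VALUE[kind] * count for kind, count in events.items())


def era_increase(extra_runs, bfp, batters_per_inning=4.5):
    """Convert extra earned runs into ERA over the innings implied by bfp."""
    innings = bfp / batters_per_inning
    return extra_runs * 9 / innings


mix = {"single": 5, "double": 3, "walk": 2, "home_run": 1}
runs = runs_from_events(mix)
print(round(obp_points_per_baserunner(300), 1))  # 3.3
print(round(runs, 2))                             # 6.65
print(round(era_increase(runs, 300), 2))          # 0.9
```

The unrounded 6.65 runs give an ERA increase of about 0.90; rounding up to 7 runs gives 0.945. Either way the answer lands close to the 1 point predicted from the OBP gradient.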

Hopefully you can see the power of the technique we just walked through. It gives us a more fundamental understanding of how the statistics we use interact, and of how much different offensive events contribute to each of the two statistics.

Let's try to repeat this with two more stats: EqA and runs scored. After all, runs are the batter's main currency in baseball, because they create wins. First, let's plot the two statistics. Strictly, we need to plot EqA against runs per plate appearance for a true comparison; otherwise the number of times a hitter comes to the plate will skew the data (EqA is a rate stat, while runs scored is a count stat). Below is the graph relating the two variables (there is no need to adjust for park and league, as both runs and EqA would need to be adjusted equally).
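Converting the count stat into a rate before plotting might look like the sketch below. The `BatterSeason` record and its field names are hypothetical, and the two rows are made up purely to show the shape of the output; the 400 at-bat cutoff is the one the article uses for its graph.

```python
from dataclasses import dataclass


@dataclass
class BatterSeason:
    """Hypothetical per-season record; the field names are illustrative only."""
    name: str
    eqa: float  # published (scaled) EqA, already a rate
    runs: int   # runs scored, a count stat
    pa: int     # plate appearances
    ab: int     # at-bats


def eqa_vs_runs_per_pa(seasons, min_ab=400):
    """(EqA, runs per PA) pairs for batters with more than min_ab at-bats."""
    return [(s.eqa, s.runs / s.pa) for s in seasons if s.ab > min_ab and s.pa > 0]


# Made-up rows, used only to illustrate the conversion and the cutoff.
rows = [
    BatterSeason("Batter A", 0.290, 90, 600, 520),
    BatterSeason("Batter B", 0.250, 40, 300, 270),  # excluded by the cutoff
]
print(eqa_vs_runs_per_pa(rows))  # [(0.29, 0.15)]
```

Dividing by plate appearances puts both axes on a per-opportunity footing, which is the point of the adjustment described above; the resulting pairs are what the line of best fit is drawn through.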

Once more there is a linear relationship, this time with r^2 = 0.55. Again we can work out the relationship between EqA and runs from the gradient of the line of best fit, which is 0.435. This tells us that an increase in EqA of 10 points (0.010) equates to roughly 0.004 runs per plate appearance (0.435 × 0.010), or about 2 runs over 450 plate appearances (0.00435 × 450 ≈ 1.96). Does this make logical sense? The formula for EqA is:
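The gradient-to-runs conversion in the paragraph above is a one-liner; this sketch just makes the units explicit. The constant is the article's reported gradient, and the function name is hypothetical.

```python
# Gradient of runs per plate appearance on EqA, as reported in the article.
RUNS_PER_PA_PER_EQA = 0.435


def runs_from_eqa_change(eqa_change, pa):
    """Extra runs implied by an EqA change over a given number of plate appearances."""
    return RUNS_PER_PA_PER_EQA * eqa_change * pa


print(round(runs_from_eqa_change(0.010, 450), 1))  # 2.0
```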

$$\text{EqA} = \frac{H + TB + 1.5\,(BB + HBP) + SB}{AB + BB + HBP + CS + SB/3}$$
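As a sketch, the formula translates directly into a function. This computes only the unscaled ratio shown above; it deliberately does not attempt the further rescaling to a batting-average scale that the published EqA applies, since that step isn't spelled out here. The function name is hypothetical.

```python
def raw_eqa(h, tb, bb, hbp, sb, cs, ab):
    """Unscaled EqA ratio from the formula above (no rescaling to a batting-average scale)."""
    numerator = h + tb + 1.5 * (bb + hbp) + sb
    denominator = ab + bb + hbp + cs + sb / 3
    return numerator / denominator


# Toy line: one single in four at-bats. The single counts once as a hit
# and once as a total base, so the numerator is 2.
print(raw_eqa(h=1, tb=1, bb=0, hbp=0, sb=0, cs=0, ab=4))  # 0.5
```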

Take a player with 450 plate appearances, which works out to roughly 400 at-bats (the cutoff in the EqA vs. runs graph). The denominator in the equation will be somewhere around 450. Hence, to increase EqA by 10 points we need to add 4.5 to the numerator (0.010 × 450). Which numerator term we increase should not matter much, because EqA is supposed to reflect the run value of each event (see my earlier article). Adding 3 walks contributes 4.5 (3 × 1.5), as does adding 2.25 singles (each single counts once as a hit and once as a total base, so 2.25 × 2). Either way the run value is somewhere around 1 run, not the 2 we expect from the graph (we know this by assigning roughly the same run values as before: about 0.47 for a single and 0.25 for a walk). What is going on? The problem is that EqA is rescaled to read like a batting average, and we must now make the same adjustment. It turns out that the real baseline for the above formula is about 0.53 (for batters with more than 400 at-bats), whereas the published baseline for EqA is 0.260, and the published scale is what our graph represents. Therefore we must multiply the result derived through logical reasoning by about 2 (0.53/0.260 ≈ 2.04). This gives about 2 runs, which agrees with our graphical analysis.
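The numerator argument above can be checked with a short sketch. Every constant comes from the text (the 450 denominator, the 1.5 and 2 numerator weights, run values of about 0.47 and 0.25, and the 0.53 vs. 0.260 baselines); the helper names are hypothetical, and the linear scaling is the article's simplification rather than the full published EqA conversion.

```python
# Rough check of the EqA numerator argument. Constants are the article's;
# the scaling step treats the 0.53 -> 0.260 conversion as a simple ratio.
DENOMINATOR = 450  # about AB + BB + HBP + CS + SB/3 for ~450 PA
NUMERATOR_WEIGHT = {"single": 2.0, "walk": 1.5}  # H + TB for a single; 1.5 per BB
APPROX_RUN_VALUE = {"single": 0.47, "walk": 0.25}
SCALE = 0.53 / 0.260  # article's raw baseline over the published EqA baseline


def events_for_eqa_change(eqa_change, event, denominator=DENOMINATOR):
    """How many extra events of one type raise the unscaled ratio by eqa_change."""
    return eqa_change * denominator / NUMERATOR_WEIGHT[event]


def runs_for_eqa_change(eqa_change, event, scale=SCALE):
    """Run value of those events, multiplied by the baseline scaling factor."""
    return events_for_eqa_change(eqa_change, event) * APPROX_RUN_VALUE[event] * scale


for event in ("walk", "single"):
    print(event, events_for_eqa_change(0.010, event), round(runs_for_eqa_change(0.010, event), 2))
```

Before scaling, 3 walks are worth about 0.75 runs and 2.25 singles about 1.06; after multiplying by roughly 2.04 they come out near 1.5 and 2.2 runs, bracketing the 2 runs read off the graph.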

Although we plotted our two chosen stats and then rationalized the result, in practice there is no need for the second step: just plot the two stats and find the line of best fit. So next time you come across an exotic stat and wonder what on earth it means, you know what to do. Never again need we be discombobulated by what EqA or VORP or wOBA actually mean in terms of plain old batting average or ERA. Hooray for the common fan.