Determining the Best Runs/Win Formula

At some point Marc is going to start yelling at me for posting these technical, boring posts on BtB. Until he does, I'll continue writing about things like the best runs/win estimators, which is the topic of today's post. In a nutshell, the point of these methods (and there are a bunch) is to convert a player's marginal runs (like runs above average or runs above replacement) into wins (above average or above replacement). Runs are a nice measurement, but baseball is all about wins and losses.

The most famous, and often-used runs-to-wins converter is Pete Palmer's, published in The Hidden Game of Baseball, and used in his Batter-Fielder Wins system. That converter is runs/win = 10*SQRT(RPG/9). RPG is runs per game, the average number of runs scored in a game. For example, in last year's American League, the average team scored 4.76 runs per game (and allowed the same), meaning that the RPG in the 2005 AL was 9.52. Plugging that into Palmer's runs/win converter, we get 10.28, meaning that the average team would need to score 10.28 extra runs to get one extra win, according to Palmer.

Is that right? We'll get to that in a second. First, let's quickly run down the other runs/win converters out there. There are three in particular that I am familiar with: one from BaseRuns creator David Smyth, and two from noted baseball stat guy Tangotiger. Smyth's formula is the simplest: runs/win = RPG. It's not meant to be exactly correct, just easy to use. In fact, if we look at last year's AL, it tells us that we would need 9.52 marginal runs to add an extra win, which isn't all that different from Palmer's figure (though even if the difference between the two never amounts to more than half a win for any player, half a win is still worth almost $2 million on the free agent market).

Tangotiger's two formulas are pretty similar. The first is runs/win = .8*RPG + 2.4. That gives us a runs/win value of 10.02 for last year's AL, right between Smyth and Palmer, though closer to the latter. The second is runs/win = .7*(RPG + 5), which gives us a runs/win value of 10.16. So both of Tangotiger's formulas are closer to Palmer's, though both give slightly lower values than the Hidden Game of Baseball author's.
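For reference, here's a quick sketch of all four converters in Python, applied to the 2005 AL's run environment (the function names are just my own shorthand):

```python
import math

# Four runs-per-win converters, each a function of RPG
# (runs per game, both teams combined).

def palmer(rpg):
    # Pete Palmer, The Hidden Game of Baseball
    return 10 * math.sqrt(rpg / 9)

def smyth(rpg):
    # David Smyth's deliberately simple rule of thumb
    return rpg

def tango1(rpg):
    # Tangotiger's first formula
    return 0.8 * rpg + 2.4

def tango2(rpg):
    # Tangotiger's second formula
    return 0.7 * (rpg + 5)

# 2005 AL: 4.76 runs per team per game, so RPG = 9.52
for f in (palmer, smyth, tango1, tango2):
    print(f"{f.__name__}: {f(9.52):.2f}")
```

Running this reproduces the numbers above: 10.28, 9.52, 10.02, and 10.16.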

Now what is the correct value? How about I tell you first and then explain? The correct value happens to be 9.97, which is best approximated by Tango's first formula. How did I determine that? Simple (or not so simple, depending on how you feel about calculating Pythagorean records with custom exponents).

The Pythagorean record is a team's expected record based on the number of runs it scores and allows. It takes the following form: W% = Runs Scored^Exponent/(Runs Scored^Exponent + Runs Allowed^Exponent). When Bill James developed the Pythagorean formula, he simply used an exponent of two; since then, however, it has been shown that there are better exponents, and that the best exponent in fact depends on the run environment. To determine the correct exponent for a given run environment, Smyth and a stat wonk who goes by US Patriot developed the Pythagenpat formula, which is Exponent = RPG^.287. Exponents between .278 and .287 have all been shown to work well, but I like to stick with .287. It doesn't really change your answer very much, no matter which value you use.
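The Pythagenpat formula is easy to sketch in code (again, function names are mine):

```python
def pythagenpat_exponent(rpg):
    # Smyth/Patriot: the Pythagorean exponent grows with the run environment
    return rpg ** 0.287

def pythag_win_pct(runs_scored, runs_allowed, rpg):
    x = pythagenpat_exponent(rpg)
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)

# 2005 AL: an RPG of 9.52 gives an exponent of about 1.91,
# a bit below Bill James's original exponent of 2.
print(pythagenpat_exponent(9.52))

# An average team (equal runs scored and allowed) is a .500 team.
print(pythag_win_pct(4.76, 4.76, 9.52))
```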

Anyway, what's cool is that using this formula, we can determine the correct number of runs it takes to gain a marginal win in any run environment. Here's the simplest way to do so (not quite mathematically correct, but more than close enough): Take an average team in that run environment. For last year's American League, that would be a team that scores 4.76 runs a game and allows the same. Now add a very small number of runs, say .001, to its offense. How many more games will it win? Doing the math, we expect the team to have a .5001 W% rather than .500 (we actually need to use more decimal places for maximum accuracy). That's .0001 more wins than expected. So how many runs would we have to add to get one more win than expected? Simply divide 1 by .0001 and you get 10,000. Then multiply that by .001 (because remember, we added .001 runs to the offense, so really what this is saying is that we would need to add .001 runs 10,000 times for an extra win). The answer is 10. You would need to score 10 extra runs in last year's American League to win one extra game. Remember, the actual answer is 9.97 if you don't do any rounding.
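That perturbation argument is mechanical enough to code up directly. Here's a sketch (the function name and the choice of .000001 as the nudge are mine; like the method above, it holds the exponent fixed while nudging the offense, which is the same "close enough" approximation):

```python
def pythagenpat_rpw(rpg, eps=1e-6):
    # True runs per win: nudge an average team's offense by eps runs
    # per game, see how much its Pythagenpat W% moves, take the ratio.
    rs = ra = rpg / 2
    x = rpg ** 0.287          # exponent held fixed at the base RPG
    base = 0.5                # equal runs scored and allowed -> .500 team
    bumped = (rs + eps) ** x / ((rs + eps) ** x + ra ** x)
    return eps / (bumped - base)

# 2005 AL: RPG of 9.52
print(round(pythagenpat_rpw(9.52), 2))
```

With no rounding along the way, this spits out the 9.97 quoted above.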

Using this method, and the other estimators mentioned, I've done the math for every run environment between 1 and 20 RPG. Here is a graph of how the estimators compare:

You can see that they are all very close when it comes to run environments that baseball is actually played in, which is why they are all usable. Nevertheless, their weaknesses are obvious if we look at the graph. Every estimator except for Smyth is too high at very low RPG levels, while Smyth is way too low. On the other hand, Smyth's estimator over-predicts once we get past 11 RPG. This is because his formula is linear, while the number of runs it takes to get a marginal win is not. Nevertheless, at least his formula is simple.

Palmer's formula is also terrible for weird RPG ranges. It over-predicts the number of runs needed to add a marginal win at the low RPG ranges, and way under-predicts at high RPG ranges. Essentially, it is only usable in normal ranges (though on the other hand, that really is the only place we ever use these formulas anyways).

Tangotiger's two formulas hold up better, though they have their own problems. His first formula gets very close to the true number at 4 or 5 RPG, but it begins to drift away at around 13. His second formula over-predicts badly at low RPG ranges, but is very close to the truth at high RPG ranges.

However, since you're really only going to use any of these formulas to evaluate players playing in real contexts, let's look at how closely these formulas track the truth in real run environments, between 8 and 11 RPG:
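If you'd rather check the realistic range numerically than read it off a graph, here's a sketch that scores each estimator by its worst miss between 8 and 11 RPG. (The closed-form 2*RPG/RPG^.287 for the true value is my own shortcut; it's the analytic version of the perturbation method described above, since the W% slope for an average team works out to Exponent/(2*RPG).)

```python
import math

def true_rpw(rpg):
    # Marginal runs per win for an average team under Pythagenpat,
    # with the derivative taken analytically: 2 * RPG / RPG**0.287.
    return 2 * rpg / rpg ** 0.287

estimators = {
    "Palmer":  lambda rpg: 10 * math.sqrt(rpg / 9),
    "Smyth":   lambda rpg: rpg,
    "Tango 1": lambda rpg: 0.8 * rpg + 2.4,
    "Tango 2": lambda rpg: 0.7 * (rpg + 5),
}

# Worst-case miss for each estimator across realistic run environments
errors = {
    name: max(abs(f(rpg) - true_rpw(rpg)) for rpg in (8, 9, 10, 11))
    for name, f in estimators.items()
}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: off by at most {err:.2f} runs")
```

Tango's first formula comes out on top, never missing by more than about 0.15 runs in this range, while Smyth's misses by as much as 0.8.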

First, it's interesting to note how all the formulas converge at 11 RPG. You can see that Smyth's formula, while the simplest, is also the worst. Palmer's formula isn't as good as I would have thought it to be either. Tango's second formula is better, but his first takes the cake. It tracks the true number almost exactly. So when you want to convert a player's marginal runs into wins, it's best to use runs/win = .8*RPG + 2.4, in lieu of the true number.

In reality, however, none of these runs/win formulas is going to give you a very exact answer, and here's why. Each of these formulas (including the correct one) is based on an average context. They answer the question, "how many runs do you need to score to give an average team one extra win?" However, a player affects his own context, and if he's a good player, his effect is large enough to throw off these calculations. We added .001 runs to determine the correct formula; Albert Pujols adds 70 runs. Pedro Martinez takes away 30 runs. By virtue of being themselves, Pujols and Pedro change their teams' runs/win converters. In reality, we need to account for that fact as well. But that's the topic of a whole other article.