How Good (or Bad) Are "Replacement Level" Pitchers?

If you are new to sabermetrics or analyzing baseball statistics, you might not have heard of the idea of a "replacement level" player. But it is another way to judge a player's (or pitcher's) value. Value can be measured against the next best alternative. If a player hits 30 HRs and the next best available hitter (the "replacement") hits 10, the first guy has a value of 20 HRs. Of course, there is more to baseball than HRs, so figuring value is more complicated, and it will be different for pitchers. Some analysts attach a winning percentage to the value of a replacement level pitcher, like .400. That means a "replacement" pitcher on an average team would be good enough to have a winning percentage of .400. Here I examine what that hypothetical percentage might be.

I looked at the worst pitchers in two different leagues and years, the 1968 NL and the 2000 AL. Then I tried to estimate the value of the worst 1%, 5%, 10% and 20% of the pitchers in each case. Replacement pitchers are certainly below average, so it is only a question of how far below average. I stopped at the lowest 20% because that is where Keith Woolner set it in his essay on this topic in the Baseball Prospectus book "Baseball Between the Numbers." He picked that level because the 5 most used starters have historically started about 80% of a team's games.

Starting with the 1968 NL, the worst 1% of the pitchers (by innings) allowed 8.21 runs per game. To determine that, I ranked all NL pitchers from worst to best in RSAA per IP. RSAA, or "runs saved against average," comes from the Lee Sinins Complete Baseball Encyclopedia. The exact definition is: "It's the amount of runs that a pitcher saved vs. what an average pitcher would have allowed." It is park adjusted, so if a pitcher saves 50 runs but in a very pitcher friendly environment, his RSAA will be less than 50. Notice also that it is compared to the league average, so it can be negative. I then took the worst pitchers in RSAA per IP until their innings added up to 1% of the league's innings for the season.

The league average of runs per IP was about .38. The worst 1% had a combined RSAA per IP of about -.533 (that actually means, in this case, that they allowed more runs than average). So they allowed about .912 runs per IP or 8.21 per game. The next question is what winning percentage would a pitcher in the 1968 NL have if he gave up 8.21 runs per game?
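For anyone who wants to follow the arithmetic, here is a minimal sketch of that conversion in Python. The .38 and -.533 figures are the rounded values quoted above, so the last decimal can differ slightly from the 8.21 in the text.

    # Converting the worst 1%'s RSAA per IP into runs allowed per game,
    # using the rounded 1968 NL figures quoted in the text.
    league_runs_per_ip = 3.42 / 9        # league average, about .38 runs per inning
    worst_rsaa_per_ip = -0.533           # combined RSAA per IP of the worst 1%

    runs_allowed_per_ip = league_runs_per_ip - worst_rsaa_per_ip   # about .91
    runs_allowed_per_game = runs_allowed_per_ip * 9

    print(round(runs_allowed_per_game, 1))   # about 8.2 runs per game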

I used a formula called "Pythagenpat," which is a variation on Bill James's "Pythagorean Formula." Pythagenpat estimates winning percentage by

  1. Finding the total runs per game and raising that to the .287 power (an exponent of .287). You add runs scored (RS) per game and runs allowed (RA) per game. Call the result X.
  2. Then finding

Winning Percentage = RS^X / (RS^X + RA^X)

(where RS is runs scored and RA is runs allowed)

In the 1968 NL, the average number of runs per game was 3.42. Adding that to the 8.21 the replacement level pitcher allows gives 11.63 (recall that we assume the replacement pitches for an average team, so they score 3.42 runs for him). Then we raise 11.63 to the .287 power to get 2.02. That will be the value for X in step 2. Once all the values are plugged in (RS = 3.42, RA = 8.21), we get a winning percentage of about .145. I did the same for the other levels, and for the 2000 AL as well, and all the winning percentages are in the table below.
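Here is a minimal Python sketch of the Pythagenpat calculation described above, plugging in the 1968 NL example. The function name is just for illustration; it is reused in the sketches further down.

    def pythagenpat(rs_per_game, ra_per_game):
        """Pythagenpat winning percentage estimate.

        The exponent X is (total runs per game) ** .287, and the winning
        percentage is RS**X / (RS**X + RA**X), as in the two steps above.
        """
        x = (rs_per_game + ra_per_game) ** 0.287
        return rs_per_game ** x / (rs_per_game ** x + ra_per_game ** x)

    # 1968 NL example: an average offense (3.42 R/G) behind a pitcher allowing 8.21 R/G.
    print(round(pythagenpat(3.42, 8.21), 3))   # about .145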

(the R/G is how many runs per game those pitchers gave up and EXP is the exponent as calculated in the method mentioned above)

Notice that the replacement level is different for different years (as Keith Woolner said it would be). Defining the level at the worst 20% seems a little high. If 20% of a team's innings are pitched by "replacements," then that is almost 2 IP per game. That seems like a lot. Also, Woolner based it on the idea of the top 5 starters taking 80% of the starts. Sometimes a fairly good reliever takes a spot in the rotation when a regular starter can't go, so the replacement may not be that bad. But I don't want to quibble with Woolner's excellent analysis too much. What is interesting to me is that even going as high as the worst 20% only raises the replacement level to a .367 winning percentage for the 1968 NL and .324 in the 2000 AL. I have seen .400 used and even .450. So the levels I found are much lower.

One thing that might be misleading here is that pitchers who gave up a lot of runs might have been victims of some bad luck. For example, the batting average they allowed on balls in play, which, as Voros McCracken showed, is not entirely controlled by the pitcher, might have been unusually high. Maybe it was just bad luck, or the pitchers had bad fielders behind them who turned fewer balls than average into outs. If that's the case, they gave up more runs than their true performance warranted, and the "replacement" level pitchers are really not quite as bad as this first look indicates.

So I also calculated a "Fielding Independent ERA," or FIP ERA, to account for this. It is an estimate based on a pitcher's strikeouts (Ks), HRs, and walks (BBs) allowed. The idea is that it approximates what a pitcher's ERA would be if he had average fielding behind him. Before seeing where this leads us, how is FIP ERA calculated?

  1. FIP ERA = Constant + 1.44*HR + .33*BB - .22*K (HRs, BBs, and Ks are per 9 IP).
  2. The constant = League ERA - (1.44*HR + .33*BB - .22*K).
Again, HRs, BBs, and Ks are per 9 IP. One more step here was to adjust each pitcher's HRs allowed for park effects (I used HR park factors from Ron Selter). Then I ranked the pitchers from worst to best in FIP ERA for each season. The worst 10% in the 1968 NL had an FIP ERA of 4.25. The worst 10% in the 2000 AL had 6.96. Then I found the winning percentage the same way I did before for each level, lowest 1%, lowest 5%, etc. That is, I used the "Pythagenpat" procedure explained above. Of course, there has to be a league average to serve as the runs scored by the average team the replacement joins. The league ERA in the 1968 NL was 2.98 and it was 4.92 in the 2000 AL.
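Here is a minimal sketch of the FIP ERA calculation from the two steps above. The league ERA of 2.98 is the 1968 NL figure quoted in the text, but the other league rates and the individual pitcher's rates in the example are made-up placeholders, not actual 1968 numbers.

    def fip_constant(lg_era, lg_hr9, lg_bb9, lg_k9):
        # The constant forces a league-average pitcher's FIP ERA to equal the league ERA.
        return lg_era - (1.44 * lg_hr9 + 0.33 * lg_bb9 - 0.22 * lg_k9)

    def fip_era(constant, hr9, bb9, k9):
        # FIP ERA = constant + 1.44*HR + .33*BB - .22*K, with all rates per 9 IP
        # (HR should already be park adjusted).
        return constant + 1.44 * hr9 + 0.33 * bb9 - 0.22 * k9

    # Hypothetical example only -- the league rates here are placeholders.
    c = fip_constant(lg_era=2.98, lg_hr9=0.5, lg_bb9=2.8, lg_k9=5.9)
    print(round(fip_era(c, hr9=1.2, bb9=4.0, k9=3.5), 2))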

These percentages are higher than what was originally found using RSAA. That adds some credence to my idea that the pitchers who gave up the most runs were a little unlucky and/or had bad fielders behind them. One potential problem here is that I used ERA, not runs. So for both seasons I added in the difference between runs per game and ERA. In the 1968 NL, 3.42 runs per game were scored while the league ERA was 2.98, so all of the 1968 NL numbers in the previous table had .44 added to them. In the 2000 AL, 5.34 runs were scored per game and the league ERA was 4.92, so .42 runs were added (so I am adding in the unearned runs). The new results are in the table below.
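To make the adjustment concrete, here is a short continuation of the earlier sketch, reusing the pythagenpat() function defined above. The 4.25 FIP ERA for the worst 10% and the .44 gap are the 1968 NL figures from the text.

    # Add the league-wide gap between runs scored and earned runs (the unearned runs)
    # to a replacement pitcher's FIP ERA, then estimate his winning percentage.
    lg_runs_per_game = 3.42
    lg_era = 2.98
    unearned_gap = lg_runs_per_game - lg_era          # .44 for the 1968 NL

    fip_worst_10pct = 4.25                            # worst 10% of the 1968 NL
    adjusted_runs_allowed = fip_worst_10pct + unearned_gap

    print(round(pythagenpat(lg_runs_per_game, adjusted_runs_allowed), 3))   # roughly .36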

(of course, FIP ERA really means FIP ERA + unearned runs in this table)

Now the percentages are even higher. So that adds even more weight to the idea that FIP ERA might be a better way to calculate the replacement level. One other interesting thing is that in no case has the replacement level reached .400. Some remaining issues include looking at more seasons and deciding how to handle intentional walks and HBPs. I wasn't sure what to do with IBBs when calculating the FIP ERA. It may not be fair to hike a guy's FIP ERA for a walk he was told to issue. But the IBBs were used in figuring the constant for each season, which seems to make sense because IBBs contribute to scoring, so removing them for individual pitchers could be a distortion. I also did not take HBPs into account. The worst pitchers might hit more batters, raising their FIP ERA and lowering their expected percentage (although I don't know if they do hit more batters).

One last thing. I want to get back to Keith Woolner's views. He presented formulas for predicting how many runs per game a replacement level starter and a replacement level reliever would allow, based on the league average number of runs. Here they are:

Starters: R/G = 1.37*(League Average R/G) - .66
Relievers: R/G = 1.7*(League Average R/G) - 2.27

I then used these formulas to create a table showing the runs allowed per game for replacement level starters and relievers at various league scoring levels, along with their expected winning percentages based on the methods outlined above.
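Here is a rough sketch of how such a table can be generated, reusing the pythagenpat() function from the earlier sketch; Woolner's two formulas are taken as given above, and the formatting is just for illustration.

    # Replacement-level runs allowed and expected winning percentage for starters and
    # relievers, at league scoring levels of 3.00 to 5.50 R/G (plus 3.42 and 5.34).
    levels = sorted({3.00 + 0.25 * i for i in range(11)} | {3.42, 5.34})

    print(f"{'Lg R/G':>7} {'SP R/G':>7} {'SP W%':>6} {'RP R/G':>7} {'RP W%':>6}")
    for lg in levels:
        sp_ra = 1.37 * lg - 0.66     # Woolner's starter formula
        rp_ra = 1.70 * lg - 2.27     # Woolner's reliever formula
        print(f"{lg:7.2f} {sp_ra:7.2f} {pythagenpat(lg, sp_ra):6.3f} "
              f"{rp_ra:7.2f} {pythagenpat(lg, rp_ra):6.3f}")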

In addition to listing each run level from 3.00 to 5.50 in .25 intervals, I also put in the 3.42 for the 1968 NL and the 5.34 for the 2000 AL. The 2000 AL projects fairly close to what I had in the second table using FIP ERA (where I raised it by adding in the unearned runs). There the worst 20% were projected to have a percentage of .381, fairly close to what Woolner's equations predict for both starters and relievers. On the other hand, my estimate and his diverge quite a bit for the worst 20% in the 1968 NL. He has .428 and .484 for starters and relievers, respectively, but I had .399. Now if I tried to divide things up into starters and relievers, it is possible that one of the groups would have come out close to what Woolner's formula predicts. But the other group would have to be way off. For example, if I got .428 for starters, then I would have to have gotten something lower than .399 for relievers (because if the whole group is at .399 and one sub-group is at .428, the other sub-group must be lower). And that would be pretty far off from the .484 he gets for relievers. I really don't know what this means. Maybe it just shows that it's hard to pin down the right winning percentage for replacement level pitchers.