Is Run Scoring Linearly Related to OBP and SLG?

If the answer were yes, a .010 increase in OBP (on-base percentage) would add just as many runs to a team that hits 240 HRs in a season as to a team that hits only 100 HRs (everything else being equal). It seems reasonable to say that is not true. The same can be said for SLG (slugging percentage): if it goes up .010, we would expect team runs to increase more for a high-OBP team than for a low-OBP team. To be more specific, the run value of any event, like a walk or a double, may depend on how often a team's runners get on base and how much extra-base power the team has. Tangotiger has a site called "Custom Linear Weights Values by Team" which shows how the run value of events changes with the context. What I examine here is whether team OBP times team SLG has a linear relationship with team runs per plate appearance.

I ran a regression with team runs per plate appearance (R/PA) as the dependent variable and OBP*SLG as the independent variable. Here is the equation:

R/PA = .836*(OBP*SLG) +.0026

The r-squared was .918, meaning 91.8% of the variation in R/PA is explained by the equation. The standard error was .004284. Over a 6,000 PA season, that is about 25.7 runs. The data includes all teams from 1920-1998.
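To make the equation concrete, here is a minimal Python sketch that applies the coefficients reported above. The function names and the sample OBP and SLG values are illustrative assumptions, not data from the study, and the OBP passed in would be the error-adjusted OBP described in the technical notes.

```python
# Coefficients and standard error as reported in the regression above.
SLOPE = 0.836
INTERCEPT = 0.0026
STD_ERROR = 0.004284  # in runs per plate appearance


def predicted_r_per_pa(obp, slg):
    """Predicted runs per plate appearance from team OBP and SLG."""
    return SLOPE * (obp * slg) + INTERCEPT


def season_runs(r_per_pa, pa=6000):
    """Scale a per-PA figure to a season of `pa` plate appearances."""
    return r_per_pa * pa


# Hypothetical team (illustrative values only): .340 OBP, .420 SLG.
rate = predicted_r_per_pa(0.340, 0.420)
print(f"Predicted R/PA: {rate:.4f}")
print(f"Predicted runs over 6,000 PA: {season_runs(rate):.0f}")
print(f"Standard error over 6,000 PA: {season_runs(STD_ERROR):.1f} runs")

# The same .010 gain in OBP at two different SLG levels.
for slg in (0.350, 0.450):
    gain = predicted_r_per_pa(0.350, slg) - predicted_r_per_pa(0.340, slg)
    print(f"SLG {slg:.3f}: +.010 OBP adds {season_runs(gain):.1f} runs")
```

Because the equation is linear in the product OBP*SLG rather than in OBP alone, the loop shows the run value of a given OBP gain rising with the team's SLG, which is the kind of context dependence discussed at the top of the article.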

The graph shows the relationship between R/PA and OBP*SLG.

It looks linear. That is, team R/PA values fall along and around the straight trend line. I also predicted each team's R/PA from the equation, then found how much its actual R/PA differed from the prediction. The correlation between team OBP*SLG and this differential was practically zero, meaning that teams with a very high or a very low OBP*SLG were probably predicted neither better nor worse by the equation than other teams.

Here are the differentials for the top 25 teams in OBP*SLG:

The average season differential for these teams was -0.9 runs (Season DIFF is PA*Diff). So even teams with a very high OBP*SLG are very accurately predicted by the regression equation. If there were truly a non-linear relationship between OBP*SLG and R/PA, the model would consistently underpredict the R/PA for the highest OBP*SLG teams. But here, they are actually overpredicted (although not by much). That is, they scored just a little bit less than predicted. The average discrepancy for the top 50 teams in OBP*SLG was -.37 runs per season.

Here are the differentials for the bottom 25 teams in OBP*SLG:

The average season differential for these teams was -3.53 runs. For the lowest 50 teams in OBP*SLG it was -2.83. So again, the relationship is probably linear, since teams with a very low OBP*SLG are fairly well predicted.
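The season differential used in these comparisons can be sketched in a few lines of Python. This is only an illustration: the function name and the team totals in the example are made up, and only the coefficients come from the regression.

```python
# Coefficients as reported in the regression equation.
SLOPE = 0.836
INTERCEPT = 0.0026


def season_diff(runs, pa, obp, slg):
    """Season DIFF = PA * (actual R/PA - predicted R/PA).

    A negative value means the team scored fewer runs than predicted.
    """
    actual = runs / pa
    predicted = SLOPE * (obp * slg) + INTERCEPT
    return pa * (actual - predicted)


# Hypothetical team totals, for illustration only.
diff = season_diff(runs=750, pa=6100, obp=0.345, slg=0.425)
print(f"Season DIFF: {diff:+.1f} runs")
```

Averaging this value over a group of teams, such as the top or bottom 25 in OBP*SLG, gives the kind of figure quoted above.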

One last issue is important. If you wanted to predict how many more runs a team might score if one of its current hitters were replaced by another, you could not simply calculate the new team OBP*SLG, plug that into the equation, and multiply the result by the team's current PAs. The reason is that the team OBP will change. If it goes up, the team gets more PAs. Once a new value for team OBP is found, it would have to be used to estimate a new level for team PA. That PA total can then be used to predict the new team total in runs.
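Here is one way the two-step calculation might look in Python. The article does not say how the new PA total should be estimated, so this sketch assumes the team's outs stay fixed and that outs equal PA*(1 - OBP); that is a simplifying assumption of mine, not necessarily the author's method. It also ignores the error-rate adjustment, and all team numbers are hypothetical.

```python
# Coefficients as reported in the regression equation.
SLOPE = 0.836
INTERCEPT = 0.0026


def estimate_pa(old_pa, old_obp, new_obp):
    """Estimate the new PA total, ASSUMING total outs are unchanged.

    Outs are approximated as PA * (1 - OBP); this is an assumption of
    the sketch, not something stated in the article.
    """
    outs = old_pa * (1.0 - old_obp)
    return outs / (1.0 - new_obp)


def predicted_runs(pa, obp, slg):
    """Predicted season runs: PA times predicted R/PA."""
    return pa * (SLOPE * (obp * slg) + INTERCEPT)


# Hypothetical team before and after swapping one hitter.
old_pa, old_obp = 6000, 0.330
new_obp, new_slg = 0.338, 0.410

new_pa = estimate_pa(old_pa, old_obp, new_obp)
naive = predicted_runs(old_pa, new_obp, new_slg)     # keeps the old PA total
adjusted = predicted_runs(new_pa, new_obp, new_slg)  # uses re-estimated PA

print(f"Re-estimated PA: {new_pa:.0f}")
print(f"Runs with old PA total: {naive:.1f}")
print(f"Runs with new PA total: {adjusted:.1f}")
```

Under this assumption, a higher OBP always raises the PA estimate, so the prediction that keeps the old PA total understates the gain in runs.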

Technical notes: I adjusted each team's OBP by adding the league error rate (1 - fielding percentage). So if the error rate for a league was .034 and a team had an OBP of .340, their adjusted OBP was .374. Doing this improved the r-squared slightly. The error rates ranged from .018 to .035. A team with a given OBP and SLG would score more runs if more errors were made. It would be even better to have the rate of errors committed against each team, but those are not available. Also, errors don't just put runners on base, they also advance the runners. So the error rate could be added to SLG instead of OBP. The regression was actually less accurate in that case, and it was also less accurate when the error rate was added to both. Plate appearances counted only at-bats and walks.
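The adjustment is simple arithmetic; a small Python sketch, with a hypothetical function name and made-up inputs, might look like this:

```python
def adjusted_obp(team_obp, league_fielding_pct):
    """Add the league error rate (1 - fielding percentage) to team OBP."""
    error_rate = 1.0 - league_fielding_pct
    return team_obp + error_rate


# Hypothetical inputs: a .330 OBP team in a league fielding .975.
print(f"Adjusted OBP: {adjusted_obp(0.330, 0.975):.3f}")
```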


The Sean Lahman Database