Beyond the Box Score: An SB Nation Community

Value of OBP and SLG by Lineup Position

There is an updated version of this at Value of OBP and SLG by Lineup Position, Part 2.

One question that often comes up is "what is the relative value of on-base percentage (OBP) and slugging percentage (SLG)?" Is OBP 50% more important than SLG? Or 60%? Or something else? A stat called OPS simply adds the two, giving them equal weight. But maybe the weight should not be equal. For example, here is the regression equation of team runs per game for the years 2001-03:

R/G = 17.11*OBP + 11.13*SLG - 5.66
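As a rough illustration, the equation above can be evaluated directly. The sketch below uses only the three coefficients quoted in the equation; the team OBP and SLG plugged in are hypothetical numbers chosen for the example, not data from the regression.

```python
# Coefficients copied from the regression equation above (team R/G, 2001-03).
OBP_COEF = 17.11
SLG_COEF = 11.13
INTERCEPT = -5.66


def predicted_runs_per_game(obp, slg):
    """Predicted team runs per game from team OBP and SLG."""
    return OBP_COEF * obp + SLG_COEF * slg + INTERCEPT


# Relative weight of a point of OBP versus a point of SLG.
relative_weight = OBP_COEF / SLG_COEF

# Hypothetical team line (an assumption for illustration only).
example_rpg = predicted_runs_per_game(0.330, 0.420)

print(f"OBP/SLG weight ratio: {relative_weight:.3f}")
print(f"Predicted R/G for a .330 OBP, .420 SLG team: {example_rpg:.2f}")
```

The ratio of the two coefficients is what drives the "how much more important is OBP" comparison; the intercept does not affect it.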

This makes OBP about 54% more important than SLG (17.11/11.13 is about 1.54), a fairly typical result. But OBP might be more important for certain positions in the lineup, like the leadoff batter, and SLG might be more important for the cleanup hitter. To check this, I ran a regression in which team runs per game was the dependent variable (DV) and the OBP and SLG of each lineup slot were the independent variables (IVs). OBP1 means the OBP of the leadoff batter, SLG3 means the SLG of the third-place hitter, etc. I used data from Retrosheet, which shows the stats for each team by lineup position, for the 1989-2002 seasons. Below are the coefficient values for the IVs.

There is quite a bit of variation. A point of OBP from the leadoff man is worth about .003 runs per game (a .021 increase in the leadoff OBP would be worth about .063 more runs per game, or about 10 over a whole season, which usually means about 1 win). The value of OBP is much less for the number 8 man. For the leadoff man, OBP is three times as important as SLG. For the cleanup hitter, they are almost the same. So this analysis suggests that the relative values of OBP and SLG could differ depending on the lineup position of the batter in question.
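The leadoff arithmetic can be spelled out step by step. This is a minimal sketch: the .003 runs per game per point of OBP is the approximate leadoff value quoted above, while the 162-game season and the 10-runs-per-win rule of thumb are assumptions used to reproduce the "about 10 runs, about 1 win" figures.

```python
GAMES_PER_SEASON = 162   # assumption: full regular-season schedule
RUNS_PER_WIN = 10.0      # assumption: common rule of thumb


def season_impact(obp_points, runs_per_game_per_point=0.003):
    """Convert a change in OBP (in points, .001 each) to runs and wins."""
    runs_per_game = obp_points * runs_per_game_per_point
    runs_per_season = runs_per_game * GAMES_PER_SEASON
    wins = runs_per_season / RUNS_PER_WIN
    return runs_per_game, runs_per_season, wins


# A .021 increase in leadoff OBP is 21 points.
rpg, rps, wins = season_impact(21)
print(f"{rpg:.3f} runs/game, {rps:.1f} runs/season, {wins:.2f} wins")
```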

Mark Pankin has already looked at this issue using a tool called Markov chains. He presented his results at the SABR convention in 2004. His study is online at:

http://www.pankin.com/sabr34.pdf

There could be multicollinearity in my analysis, meaning that the coefficient estimates are not as reliable as they could be because the IVs are highly correlated with each other. I discuss what I did to detect multicollinearity below. But in case this was a problem, I also tried a different but similar model in which the IVs would likely be less correlated with each other.

Each lineup slot had 3 variables: walk percentage, hit percentage and extra-base percentage. For walks, hits, and extra bases, the denominator was plate appearances (PAs). This is a little different from comparing OBP and SLG, since OBP has PAs as the denominator and SLG has ABs. Also, using extra bases makes this a little like isolated power. SLG is not always a good measure of power because a guy who hits a single drives up his SLG. Isolated power is SLG - AVG, or extra bases divided by ABs. Of course, here, I am using PAs. H1 is the hit% of the leadoff man, W1 is the walk% of the leadoff man, XB1 is the extra-base% of the leadoff man, etc. Here are the coefficient estimates:
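To make the three per-PA rates concrete, here is a minimal sketch of how they could be computed for one lineup slot. It assumes "extra bases" means total bases minus hits (the numerator of isolated power), and the counting stats in the example are hypothetical, not Retrosheet figures.

```python
def slot_rates(pa, walks, singles, doubles, triples, home_runs):
    """Walk%, hit% and extra-base% for one lineup slot, all per plate appearance.

    Assumption: extra bases = total bases - hits, i.e. 1 per double,
    2 per triple and 3 per home run.
    """
    hits = singles + doubles + triples + home_runs
    extra_bases = doubles + 2 * triples + 3 * home_runs
    return {
        "W": walks / pa,         # walk percentage
        "H": hits / pa,          # hit percentage
        "XB": extra_bases / pa,  # extra-base percentage
    }


# Hypothetical season totals for a leadoff slot (illustration only).
leadoff = slot_rates(pa=700, walks=70, singles=120,
                     doubles=35, triples=5, home_runs=20)
print({name: round(rate, 3) for name, rate in leadoff.items()})
```

Because a single adds to H but not to XB, the two variables separate "getting a hit" from "hitting for power" more cleanly than AVG and SLG do.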

Again, there are some big differences. The value of a walk to the leadoff man is twice what it is for the number 6 man. The cleanup hitter has the highest extra-base value.

I did try some other variables. I had SBs and CS per game in the first model with OBP and SLG. Things were generally fine there except that in a couple of cases, the value of a CS was positive and in one case the value of a SB was negative. Why some lineup slots would have negative values for SBs or positive values for CS is not clear. I tried one regression with just the AL since they have the DH and a regular player bats ninth. The results seemed about the same. Email me if you want those.

Multicollinearity. In the first model with OBP and SLG, most of the correlations between the IVs were under .5. The ones that were higher were all between the OBP and SLG of the same lineup position. Those correlations ranged from .596 (OBP1 and SLG1) to .739, except for OBP9 and SLG9, which was very high at .897. But in the second model, only one correlation between IVs was over .5 and that was H9 and XB9 at .648. The vast majority of the others were under .2.

Another way to check for multicollinearity is to run regressions in which each IV in turn is regressed on all of the other IVs. In the first model with OBP and SLG, the r-squared was generally in the .5-.6 range (that was 18 regressions). R-squared tells us what percentage of the variation in the DV is explained by the model. There is a stat called the "variance inflation factor" or VIF. It is 1/(1 - r-squared). So if r-squared is .5, then 1 - .5 = .5 and 1/.5 = 2. A couple of sources I looked at suggested that if the VIF is under 10, multicollinearity is not a problem. Most of these were about 2. One got close to 6 (that was SLG9). I did come across one source that said there is no rule about the value of VIF and multicollinearity.

For the second model, I only ran a couple of these regressions where one IV depended on all the others. The first one was W1 and the r-squared was only about .2. I tried XB9 (which corresponds a little to SLG9, the one that was closest to being a problem in the other model) and the r-squared was only about .4, which would mean a very low VIF of about 1.7.
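The VIF arithmetic used above is simple enough to check in a few lines. This sketch only applies the formula 1/(1 - r-squared) to the r-squared values mentioned in the text; it does not rerun the auxiliary regressions, and the function names are just for illustration.

```python
def vif(r_squared):
    """Variance inflation factor from the r-squared of an auxiliary
    regression (one IV regressed on all of the other IVs)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_for_vif(vif_value):
    """Inverse relationship: the r-squared implied by a given VIF."""
    return 1.0 - 1.0 / vif_value


for r2 in (0.2, 0.4, 0.5, 0.6):
    print(f"r-squared {r2:.1f} -> VIF {vif(r2):.2f}")

# A VIF close to 6 implies an auxiliary r-squared of about .83,
# and the commonly cited threshold of 10 corresponds to .90.
print(round(r_squared_for_vif(6), 3), round(r_squared_for_vif(10), 3))
```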

Also, multicollinearity is supposed to be a problem where the standard errors of the coefficient estimates are high. This makes it hard for the estimates to be significant. But that was generally not the case here. One thing I don't know about is that there might be some kind of joint hypothesis about the VIF. Maybe if you have a large number of IVs it only takes a certain number to have a VIF over 2 or something like that for there to be a problem.

Comments

Additional Comments
Comments can be found at Baseball Think Factory

As always, thanks to Repoz for the link.

"I don't set the rosters, I just make fun of the guy who does" - Rob Neyer

by Marc Normandin on Feb 13, 2006 12:56 PM EST

Interesting
So if the leadoff hitter's OBP is more important than his slugging, did the Red Sox make a mistake by using Damon as the leadoff man with his homerun power and league average-ish OBP? I'm not sure there was another viable option in the lineup, but work with the scenario.
"I don't set the rosters, I just make fun of the guy who does" - Rob Neyer

by Marc Normandin on Feb 13, 2006 12:57 PM EST

Not as much as they are about to
assuming the media gets its way with batting Crisp leadoff.

Loretta and Youk would make a much better 1/2 combo than Crisp and either.

The Sox's biggest mistake last year was batting Renteria, with the lowest OBP on the team and one of the highest GIDP totals, second. Damon, while not one of the team leaders in OBP, was among the lowest regulars in SLG, so it wasn't much of a waste there. Having a strong 1-9 lineup, I expect, also increases the value of SLG from the #1 slot.

by cdamon on Feb 13, 2006 2:25 PM EST

Damon
You raise an interesting point. To whatever extent the values of OBP and SLG that this method found are true, can they be used to improve actual lineups? I don't know off the top of my head. Maybe it would be simple, like just finding which guy is the best in each slot based on their OBP and SLG. Plug the numbers in for the leadoff spot. The guy who comes out highest should bat first. But what if he also comes out highest at another position?

Or you could check each guy and see what his best spot is? But what if two guys both have cleanup as their best slot? Maybe this would have to be done by trial and error after some initial calculations. Or maybe there is some kind of program or equation or algorithm that would do it. I certainly don't know right now. I'll have to think about it.

by Cyril Morong on Feb 13, 2006 5:10 PM EST

straightforward optimization problem
Each player has a value for each lineup spot. Even the brute force solution for this only has 9! lineup possibilities to check, meaning it can be done on a computer in less than a second.

by cdamon on Feb 13, 2006 9:09 PM EST
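A minimal sketch of the brute-force search described in the comment above, assuming each player's contribution in a slot can be reduced to a single number and that those numbers simply add up across the lineup. The function name, the player labels and the values are hypothetical; with nine players the loop checks 9! = 362,880 orders.

```python
from itertools import permutations
from math import factorial


def best_lineup(values):
    """Try every batting order and return the one with the highest total.

    values[player][slot] is the estimated value of that player batting
    in that slot (slot 0 = leadoff). Assumes values are additive.
    """
    players = list(values)
    best_order, best_total = None, float("-inf")
    for order in permutations(players):
        total = sum(values[player][slot] for slot, player in enumerate(order))
        if total > best_total:
            best_order, best_total = order, total
    return best_order, best_total


# Hypothetical three-player example: A and C are both best at leadoff,
# so picking each player's individually best slot would not work.
toy_values = {
    "A": [3, 1, 0],
    "B": [2, 2, 1],
    "C": [3, 0, 2],
}
order, total = best_lineup(toy_values)
print(order, total)
print("Orders to check for a nine-man lineup:", factorial(9))
```

The search resolves conflicts such as two players sharing the same best slot by comparing whole lineups rather than individual placements.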

Maybe...
Sal can run his simulator (cough, cough) using some of the options. Unless he's too busy with academia, which is entirely possible.
"I don't set the rosters, I just make fun of the guy who does" - Rob Neyer

by Marc Normandin on Feb 13, 2006 10:58 PM EST

straightforward optimization problem
Is that something that needs to be programmed or can it be set up in a spreadsheet?

by Cyril Morong on Feb 13, 2006 11:05 PM EST

Programmed
unless someone who knows Excel much better than I knows some trick.

I can probably write you a tool to do it if you are seriously interested and don't have the expertise.

by cdamon on Feb 14, 2006 9:06 AM EST

I'll see what my
sim can do, but the generally accepted result, and the one my prelim results suggest, is that batting order doesn't matter all that much.

by salb918 on Feb 16, 2006 12:05 AM EST

Program
Over on Catfish Stew.

by kenarneson on Feb 21, 2006 4:20 PM EST
