Value of OBP and SLG by Lineup Position, Part 2
UPDATE ON FEB. 22: I CAN SEND MY DATA TO ANYONE WHO WANTS IT. JUST SEND ME AN EMAIL (cyrilmorong@sbcglobal.net).
Last week I posted some regression results in which team runs per game was the dependent variable (DV) and the OBP and SLG of each lineup position were the independent variables (IVs). In some cases, the coefficient values differed a lot from one lineup position to another. I only looked at teams from 1989-2002. Retrosheet has batting statistics for each team's lineup positions from 1959-2004 (but not for the NL in 1959), so I went back and used all of the data they have.
Here are the results:
Again, the coefficient values differ across lineup positions, sometimes by quite a bit. I also ran the regression using only teams that had the DH, meaning AL teams from 1973-2004 (yes, NL teams sometimes get to use the DH, but I mean AL teams only). Here are those results:
You can judge for yourself if there are any differences when using only DH teams (where a real hitter bats 9th and not a pitcher).
Then I went back to all teams and put in SB and CS per game. Here are the results:
I know, in some cases the results don't make sense: some SB coefficients are negative and some CS coefficients are positive. I mentioned this last week, but I wanted people to have a chance to look at this.
Then I did the SB/CS regression for DH teams only. Here are the results:
Again, judge for yourself if the differences are meaningful.
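For readers who want to try this kind of regression themselves, here is a minimal sketch of the setup described above: ordinary least squares with runs per game as the DV and 18 IVs (OBP and SLG for each of the nine lineup slots) plus an intercept. The post does not say what software was used, so this is not the author's code; the function name `fit_runs_model` and the array layout (one row per team-season, one column per slot) are assumptions, and the usage data are made up and noise-free so the fit simply recovers the planted coefficients.

```python
import numpy as np

def fit_runs_model(obp, slg, runs_per_game):
    """OLS fit of team runs per game on the OBP and SLG of each lineup slot.

    obp, slg: arrays of shape (n_teams, 9); column j holds lineup slot j + 1.
    runs_per_game: array of shape (n_teams,).
    Returns (intercept, obp_coefs, slg_coefs), each coefficient array of length 9.
    """
    obp = np.asarray(obp, dtype=float)
    slg = np.asarray(slg, dtype=float)
    y = np.asarray(runs_per_game, dtype=float)
    n = y.shape[0]
    # Design matrix: intercept, then OBP1..OBP9, then SLG1..SLG9 (19 columns).
    X = np.column_stack([np.ones(n), obp, slg])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:10], beta[10:19]

# Usage with made-up, noise-free data: the fit should recover the planted values.
rng = np.random.default_rng(0)
obp = rng.uniform(0.28, 0.40, size=(200, 9))
slg = rng.uniform(0.30, 0.55, size=(200, 9))
true_obp = np.linspace(2.0, 1.0, 9)
true_slg = np.linspace(1.0, 2.0, 9)
runs = -5.0 + obp @ true_obp + slg @ true_slg
intercept, obp_coefs, slg_coefs = fit_runs_model(obp, slg, runs)
print(np.round(obp_coefs, 3), np.round(slg_coefs, 3))
```

The SB/CS versions add two more columns per slot (or per group of slots) to the same design matrix.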
Now, there could be collinearity between the IVs. I discussed this a little last week, but I have not yet tested for it this time. If I do, I will update this story. To avoid or lessen this problem, I ran a regression with some different variables. Each lineup slot had 3 variables: walk percentage, hit percentage and extra-base percentage, each with plate appearances (PAs) as the denominator. This is a little different from using OBP and SLG, since OBP has (roughly) PAs as the denominator while SLG has ABs. Using extra bases is also a little like isolated power. SLG is not always a good measure of power because a single also raises SLG. Isolated power is SLG - AVG, or extra bases divided by ABs; here, of course, I am dividing by PAs instead. H1 is the hit% of the leadoff man, W1 is the walk% of the leadoff man, XB1 is the extra-base% of the leadoff man, etc. Here are the coefficient estimates:
Then I added SB and CS in. Here are those results:
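The walk, hit and extra-base rates described above, and one common collinearity check, can be sketched as follows. This assumes extra bases means total bases minus hits (consistent with isolated power being extra bases per AB); the function names are hypothetical. The variance inflation factor (VIF) is only one standard diagnostic, not necessarily the test the author had in mind; a rough rule of thumb treats VIFs above about 5 to 10 as a warning sign.

```python
import numpy as np

def rate_stats(pa, ab, h, bb, tb):
    """Per-PA rates for one lineup slot, plus isolated power for comparison.

    Assumes extra bases = total bases - hits (1 per double, 2 per triple, 3 per HR).
    """
    extra_bases = tb - h
    return {
        "W": bb / pa,             # walk percentage
        "H": h / pa,              # hit percentage
        "XB": extra_bases / pa,   # extra-base percentage, per PA
        "ISO": extra_bases / ab,  # isolated power (SLG - AVG), per AB
    }

def variance_inflation_factors(X):
    """VIF for each column of X (n_obs x n_vars).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. Large values flag collinearity.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Usage with made-up numbers for one slot: 700 PA, 600 AB, 180 H, 80 BB, 300 TB.
print(rate_stats(pa=700, ab=600, h=180, bb=80, tb=300))
```

In practice `X` would hold the 27 columns W1..W9, H1..H9 and XB1..XB9, one row per team-season.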
Comments
Hmm
by Marc Normandin on Feb 19, 2006 10:11 PM EST
My guess is sarcasm...
by Dan Scotto on Feb 19, 2006 10:14 PM EST
Power problems (the stats kind)
One concern: given your large sample size, small differences may tend to come out as more significant than they really are. This does not appear to be a problem for your H, W, XB terms, but I would take a hard look at the SB and CS results.
I'm just speculating, but maybe stolen bases shouldn't be assigned to batting order positions. I see the theoretical basis for doing so in the H, W, XB case (if only because your leadoff guy is more likely to be followed by a power threat); do you have a reason to think that a stolen base by a given batting order position would be worth more?
Have you tried using a more aggregate SB term?
by sunandrain on Feb 20, 2006 11:11 AM EST
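The sample-size concern can be illustrated with a small simulation: the standard error of a regression slope shrinks roughly like 1/sqrt(n), so a fixed, tiny effect produces a larger and larger t-statistic as the sample grows. Statistical significance is therefore not the same as a large effect. The effect size, sample sizes and function names below are hypothetical and are not tied to the post's data.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic for the slope in a simple OLS regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    resid = (y - y.mean()) - slope * xc
    sigma2 = (resid @ resid) / (x.size - 2)  # residual variance
    se = np.sqrt(sigma2 / (xc @ xc))          # standard error of the slope
    return slope / se

def simulated_t(n, effect=0.02, seed=0):
    """Simulate y = effect * x + noise, where the effect is deliberately tiny."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = effect * x + rng.normal(size=n)
    return slope_t_stat(x, y)

# Same tiny effect: usually "insignificant" at n = 100, clearly "significant" at n = 1,000,000.
print(simulated_t(100), simulated_t(1_000_000))
```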
SBs, etc
I only put the SB/CS results there in case anyone was interested. I really don't know what to make of them.
It is possible that a SB in front of a guy who gets a lot of hits (or just more singles, which I did not include) is more valuable than a SB in front of a guy who walks a lot. That is the theoretical justification. Maybe it is a weak one.
Can you give me an idea of an aggregate SB term to try? Should I just try total SB/CS for each team instead of each lineup slot? Or maybe for the top 3, the middle 3 and last three batters?
by Cyril Morong on Feb 20, 2006 11:39 AM EST
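The aggregate SB/CS terms discussed here (top, middle and bottom thirds of the lineup, or a team total) could be built from the per-slot counts roughly like this. The function names and the season totals in the usage example are hypothetical.

```python
def aggregate_by_thirds(per_slot):
    """Collapse 9 per-slot counts (slot 1 first) into lineup thirds and a team total."""
    if len(per_slot) != 9:
        raise ValueError("expected one value per lineup slot (9 values)")
    return {
        "top": sum(per_slot[0:3]),     # slots 1-3
        "middle": sum(per_slot[3:6]),  # slots 4-6
        "bottom": sum(per_slot[6:9]),  # slots 7-9
        "team": sum(per_slot),         # whole-team aggregate
    }

def grouped_sb_cs_per_game(sb_by_slot, cs_by_slot, games):
    """Per-game SB and CS for each third (and the team) for use as regression variables."""
    sb = aggregate_by_thirds(sb_by_slot)
    cs = aggregate_by_thirds(cs_by_slot)
    return {
        **{f"SB_{group}": total / games for group, total in sb.items()},
        **{f"CS_{group}": total / games for group, total in cs.items()},
    }

# Hypothetical season totals for one team, slots 1 through 9.
sb = [30, 10, 5, 3, 4, 6, 8, 12, 2]
cs = [10, 4, 2, 1, 2, 2, 3, 5, 1]
print(grouped_sb_cs_per_game(sb, cs, games=162))
```

In a regression, use either the three thirds or the team total, not both: the team total is exactly the sum of the thirds, so including all four would make the design matrix perfectly collinear.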
This may be a repost - if so, ignore
If I were to test your theory, I'd do it a little differently. The question is whether an attempted steal changes the number of outs produced by the baserunner/batter combination, right? We can assume that fewer outs = more runs. Could your dataset isolate this kind of combination (OBP of batter at the time of attempted steal, and outcome)? That would be kind of cool to know.
In terms of an aggregate, I had thought of team aggregate and top and bottom of lineup. Thirds as you suggest make more sense. Basically, this becomes a control variable for what you demonstrate with the other variables.
Incidentally, your blog is really great. How long have you been doing this?
by sunandrain on Feb 20, 2006 1:02 PM EST
Stealing
I am not trying to see whether an attempted steal changes the number of outs produced by the baserunner/batter combination. Some of my earlier posts touch on that. Three people who have done good work on this are Tom Ruane, Ted Turocy and Mark Pankin. I also think this issue is discussed in "The Book," which is coming out soon from Tangotiger and others (my copy has not arrived yet). This data set cannot isolate that kind of combination (OBP of the batter at the time of the attempted steal, and the outcome). But you should search for those three guys on Google to see what they have come up with.
by Cyril Morong on Feb 20, 2006 3:01 PM EST
The Book
by Marc Normandin on Feb 20, 2006 3:08 PM EST
Quick request
by Marc Normandin on Feb 20, 2006 2:59 PM EST
Quick request
What is this new book?
by Cyril Morong on Feb 20, 2006 3:02 PM EST
Re: book
It has been really interesting so far, and I'll most likely put an advertisement on the sidebar for it when I am finished. Dayn has some interesting ideas that he backs up with plenty of statistical analysis, and his writing style is entertaining as well.
by Marc Normandin on Feb 20, 2006 3:06 PM EST
Advertisement*
by Marc Normandin on Feb 20, 2006 3:07 PM EST
DH SLG3?
I ran this on the A's using my script over on Catfish Stew, and it produces a lineup that consistently puts Mark Kotsay batting third. That doesn't seem right.
by kenarneson on Feb 21, 2006 4:39 PM EST
This is pretty radical, actually.
Here are the types it suggests are ideal. I'll add up the OBP and SLG scores, just as a quick and dirty way to compare each slot's relative importance using this data:
#1: High OBP, Low SLG (3.55)
#2: Mid OBP, High SLG (3.34)
#3: Mid OBP, Low SLG (2.48)
#4: High OBP, Mid SLG (3.38)
#5: Low OBP, High SLG (2.44)
#6: Low OBP, High SLG (2.41)
#7: High OBP, Low SLG (3.16)
#8: High OBP, Low SLG (3.06)
#9: Mid OBP, Low SLG (2.22)
by kenarneson on Feb 21, 2006 7:21 PM EST
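The "quick and dirty" comparison above amounts to adding each slot's OBP and SLG coefficients and sorting the slots by that total. A tiny sketch using the totals listed in the comment (the names `slot_scores` and `rank_slots` are illustrative only):

```python
# Summed OBP + SLG coefficients per lineup slot, copied from the list above.
slot_scores = {1: 3.55, 2: 3.34, 3: 2.48, 4: 3.38, 5: 2.44,
               6: 2.41, 7: 3.16, 8: 3.06, 9: 2.22}

def rank_slots(scores):
    """Lineup slots ordered from largest to smallest summed coefficient."""
    return sorted(scores, key=scores.get, reverse=True)

print(rank_slots(slot_scores))  # [1, 4, 2, 7, 8, 3, 5, 6, 9]
```

One caveat with a plain sum: it weights a point of OBP and a point of SLG equally, even though SLG tends to vary more from hitter to hitter than OBP does.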
DH SLG3?
My main concern was seeing whether the values for OBP and SLG would differ by lineup position. I think this shows that they do, even if my estimates are not the "true" values. You might take a look at what Mark Pankin found on this using Markov chains.
by Cyril Morong on Feb 22, 2006 11:58 AM EST
DH SLG3?
by Cyril Morong on Feb 22, 2006 10:44 PM EST
Check your e-mail
by Marc Normandin on Feb 25, 2006 1:24 PM EST

by Cyril Morong on 