UPDATE ON FEB. 22: I CAN SEND MY DATA TO ANYONE WHO WANTS IT. JUST SEND ME AN EMAIL (firstname.lastname@example.org).
Last week I posted some regression results in which team runs per game was the dependent variable (DV) and the OBP and SLG of each lineup position were the independent variables (IVs). In some cases, the coefficient values were very different from one lineup slot to the next. I only looked at teams from 1989-2002. Retrosheet has data on how each team's lineup positions did from 1959-2004 (but not the NL in 1959). So I went back and used all of the data that they have.
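For anyone who wants to try this kind of regression themselves, here is a minimal sketch in Python using only the standard library. It fits ordinary least squares through the normal equations. The function names `solve` and `ols` are made up for this illustration, and the numbers are placeholders rather than Retrosheet data, so none of this should be read as the exact procedure or software behind the results in this post.

```python
# Ordinary least squares via the normal equations (standard library only).
# ASSUMPTION: all numbers below are made-up placeholders, not Retrosheet data.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the original inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Return [intercept, b1, b2, ...] for y regressed on the columns of rows."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant term
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Placeholder team-seasons: (OBP, SLG) for a single lineup slot.
rows = [(0.340, 0.400), (0.320, 0.380), (0.360, 0.450),
        (0.300, 0.420), (0.350, 0.390)]
# Placeholder runs per game built from known coefficients, so the fit can be checked.
runs = [1.0 + 10.0 * obp + 5.0 * slg for obp, slg in rows]

coefs = ols(rows, runs)
print([round(c, 4) for c in coefs])
```

The model in this post would have 18 IVs (OBP and SLG for each of the nine slots) instead of the two columns shown here, and a statistics package would also report standard errors, which this sketch does not.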
Here are the results:
Again, the coefficient values are not the same and vary quite a bit. I also ran the regression using only teams that had the DH (yes, NL teams sometimes get to use the DH, but here I mean AL teams only, from 1973-2004). Here are those results:
You can judge for yourself if there are any differences when using only DH teams (where a real hitter bats 9th and not a pitcher).
Then I went back to all teams and put in SB and CS per game. Here are the results:
I know, in some cases things don't make sense. We see some negative values for SBs and positive values for CSs. I mentioned this last week. But I just wanted people to get a chance to look at this.
Then I did the SB/CS regression for DH teams only. Here are the results:
Again, judge for yourself if the differences are meaningful.
Now there could be collinearity between the IVs. I discussed this a little last week. I did not run any test yet for it this time. If I do, I will update this story. To avoid or lessen this problem, I ran a regression with some different variables. Each lineup slot had 3 variables: walk percentage, hit percentage and extra-base percentage. For walks, hits, and extra-bases, the denominator was plate appearances (PAs). This is a little different than comparing OBP and SLG, since OBP has PAs as the denominator and SLG has ABs. Also, using extra-bases makes it a little like isolated power. SLG is not always as good a measure of power because a guy who hits a single drives up his SLG. Isolated power is SLG - AVG, or extra-bases divided by ABs. Of course, here, I am using PAs. H1 is the hit% of the leadoff man, W1 is the walk% of the leadoff man, XB1 is the extra-base% of the leadoff man, etc. Here are the coefficient estimates:
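As a side note on the variable definitions above, here is a small Python sketch of how the three per-PA rates could be computed for one lineup slot. It takes extra-bases to mean total bases minus hits, which is what the isolated power definition (SLG - AVG) implies. The function name `slot_rates` and the season totals are hypothetical placeholders, not real Retrosheet numbers.

```python
# Per-PA rate stats for one lineup slot, plus isolated power for comparison.
# ASSUMPTION: the totals below are made-up placeholders, not real data.

def slot_rates(pa, ab, hits, walks, total_bases):
    """Return hit%, walk% and extra-base% per PA, plus AVG, SLG and ISO per AB."""
    extra_bases = total_bases - hits  # bases beyond first base on hits
    return {
        "H": hits / pa,           # hit% (per plate appearance)
        "W": walks / pa,          # walk% (per plate appearance)
        "XB": extra_bases / pa,   # extra-base% (per plate appearance)
        "AVG": hits / ab,
        "SLG": total_bases / ab,
        "ISO": extra_bases / ab,  # isolated power (per at-bat)
    }


# Placeholder totals for a leadoff slot over one season.
leadoff = slot_rates(pa=700, ab=620, hits=180, walks=70, total_bases=280)

# ISO equals SLG - AVG, up to floating-point rounding.
print(abs(leadoff["ISO"] - (leadoff["SLG"] - leadoff["AVG"])) < 1e-12)
```

A single adds one base to SLG but nothing to the extra-base count, which is why XB isolates power in a way SLG does not.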
Then I added SB and CS in. Here are those results: