Can the leadoff batter make a big difference? Does it matter which player hits first? To look at these questions I first determined how many more or fewer runs teams scored than expected based on their stats. Then I checked to see if there was some relationship between this over/under and the stats of the leadoff man.
First, I ran a regression using all teams from 1989-2000. The dependent variable was team runs per game. The independent variables were team OBP, team SLG, team SBs per game and team CS per game. But some teams might have gotten lucky, perhaps hitting better with runners on base. This would allow them to score more than expected, which may or may not have anything to do with the leadoff man. So I broke down OBP and SLG into the none on (NONE) and runners on base (ROB) cases. Here is the regression equation:
R/G = -5.62 + 7.84*NONEOBP + 3.73*NONESLG + 9.44*ROBOBP + 7.03*ROBSLG + .222*SB - .305*CS
The r-squared was .951, meaning that 95.1% of the variation across teams in runs per game is explained by the equation. The standard error is .13, or about 21 runs per 162-game season. If I had not broken OBP and SLG down into the NONE and ROB cases, the r-squared would have been .934 and the standard error .15, or about 24 runs per season. So there is some value in using the NONE and ROB cases, since the standard error falls by about 12.5%. All of the variables were statistically significant.
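For anyone who wants to try something like this, here is a minimal sketch of how the first regression could be set up in Python with statsmodels. The file name and column names (NONEOBP, ROBSLG, RG, and so on) are placeholders for however the 1989-2000 team-season data might be organized, not the actual dataset used here.

```python
# Minimal sketch of the team-level regression. The CSV and its column
# names are hypothetical; the NONE/ROB splits of OBP and SLG and the
# per-game SB and CS rates are assumed to already be columns in the file.
import pandas as pd
import statsmodels.api as sm

teams = pd.read_csv("team_seasons.csv")   # one row per team-season, 1989-2000

X = sm.add_constant(teams[["NONEOBP", "NONESLG", "ROBOBP", "ROBSLG", "SB", "CS"]])
y = teams["RG"]                           # team runs per game

model = sm.OLS(y, X).fit()
print(model.summary())                    # coefficients, r-squared, standard error
```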
Each team's runs per game was then predicted using the regression equation, and the difference between the actual and predicted values was calculated. I call that difference DIFF. Next I ran a regression with DIFF as the dependent variable and the stats of each team's leadoff man as the independent variables (leadoff data comes from Retrosheet). Here is the regression equation:
DIFF = -.147 + .394*OBP + .047*SLG + .023*SB - .32*CS
The r-squared was only .016, and none of the variables were statistically significant. So whatever causes teams to score more or fewer runs than expected may not be the quality of the leadoff man; what the leadoff man brings to the table does not seem to explain why a team scores more or less than expected.
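Continuing the sketch above, the second step (computing DIFF and regressing it on the leadoff man's stats) might look like the following; again the leadoff file and its columns are placeholders, assumed to line up row-for-row with the team file.

```python
# DIFF is actual runs per game minus the value predicted by the first
# regression, i.e. the residual from that model.
teams["DIFF"] = model.resid

leadoff = pd.read_csv("leadoff_men.csv")  # hypothetical file of leadoff-man stats (Retrosheet)

X2 = sm.add_constant(leadoff[["OBP", "SLG", "SB", "CS"]])
diff_model = sm.OLS(teams["DIFF"], X2).fit()
print(diff_model.rsquared)                # the value reported here was only .016
print(diff_model.pvalues)                 # none of the variables significant
```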
There are a couple of interesting things, though. A CS by the leadoff man is extremely damaging: its negative value is about 14 times the positive value of a SB (normally this ratio is about 1.5 to 1). So having the leadoff man get caught stealing is very costly, while his SBs are not that big of a help.
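That 14-to-1 figure is just the ratio of the two coefficients in the DIFF equation:

```python
# Ratio of the CS penalty to the SB gain in the DIFF equation above
print(0.32 / 0.023)   # roughly 13.9, i.e. about 14 to 1
```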
The other interesting thing is that a leadoff man with a high OBP does not help much more than any other player with a high OBP. To check that, I ran a regression without breaking things down into the NONE and ROB cases. The equation was
R/G = -5.12 + 16.16*OBP + 10.41*SLG + .371*SB - .254*CS
Suppose that a player has a .405 OBP. If that guy gets added to an average team (with, say, a .333 OBP), the team OBP goes up by about .008 (roughly one-ninth of the .072 difference, since he fills one of nine lineup spots). Multiplying .008 by 16.16 gives .129 more runs per game, or about 21 more runs over a whole season. For the leadoff man, the DIFF equation says the extra runs per game go up by .394 times his OBP. If the leadoff man had a .405 OBP, team runs per game would go up by .405*.394, or .159. That is 25.82 runs for a season, not quite 5 more than if any other player had the .405 OBP.
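Here is the same back-of-the-envelope arithmetic laid out in code. The coefficients and example values come from the equations above; the one-of-nine-lineup-spots approximation is the one behind the .008 team bump.

```python
# Any player with a .405 OBP added to a .333 OBP team (one of nine lineup spots)
delta_team_obp = (0.405 - 0.333) / 9        # about .008
any_player = delta_team_obp * 16.16 * 162   # 16.16 = team OBP coefficient -> ~21 runs

# Leadoff man with a .405 OBP, using the .394 OBP coefficient from the DIFF equation
leadoff_man = 0.405 * 0.394 * 162           # about 25.9 runs, essentially the 25.82 above

print(round(any_player, 1), round(leadoff_man, 1), round(leadoff_man - any_player, 1))
# a gap of not quite 5 runs per season
```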