Lineup Analysis: A Cheat Sheet?

I'll be honest, I struggled to buy into some of the ideas of the lineup analysis tool that went up on Baseball Musings today, based on Cyril Morong's work with coefficients here at BTB and Ken Arneson's programming skills from Catfish Stew.

After thinking about it and looking at the coefficients again and playing around with a LOT of different lineups, I think I've come up with a "handy-dandy lineup tip sheet." If I inadvertently crashed the Baseball Musings server, I deny all charges. And that program kept me from finishing up two papers I have due next week. But I think it's worth it.

First off, imagine you have a National League team that looks like this, and your projections for their production look exactly like their lines from last year. Here's your team.

POS - Player (OBP/SLG)

C - Gregg Zaun (.355/.373)
1B - Paul Konerko (.375/.534)
2B - Chase Utley (.376/.540)
SS - Derek Jeter (.389/.450)
3B - Rob Mackowiak (.337/.389)
LF - Jay Gibbons (.317/.515)
CF - Andruw Jones (.347/.575)
RF - Vernon Wells (.320/.463)
P - Pitcher (.200/.250)

There's a little suspension of disbelief involved, because we're going to assume that all of these players played in the same park. But the specifics are less important than the concepts here. The players themselves just serve to make this easier to write.

First off, what would you expect the standard lineup, from a "conventional" line of thought, to look like?

  1. Jeter
  2. Mackowiak
  3. Utley
  4. A. Jones
  5. Konerko
  6. Gibbons
  7. Wells
  8. Zaun
  9. Pitcher
I think it would look something like that, to be honest. The scrappy hitter bats number 2, because he can move the leadoff hitter along. Then the boppers drive him in.

Now, for the "stathead's" lineup.

  1. Jeter
  2. Utley
  3. Konerko
  4. Jones
  5. Zaun
  6. Gibbons
  7. Wells
  8. Mackowiak
  9. Pitcher
I bunched the high-OBP guys together at the top, and I moved Zaun up in the order, because his ability to get on base makes him more valuable than an 8 hitter. Mackowiak has little value, so we should try to minimize his plate appearances.

The conventional lineup gets 4.852 runs/game, in the simulator. My "stathead" lineup does a little better, at 4.885 runs/game.

But the best lineup, the optimal one, gets 5.082 runs/game.

So, what the heck is going on here?

When you think about it, the real goal of a lineup is to fit together well, so that each player's strongest skill is most suitable to his position in the lineup. Somewhere along the line, strange roles were assigned to each spot. Then a statistically minded person comes along and says, "these roles are arbitrary! Let's stack the best hitters and do this more intelligently!"

Both sides are partially right.

A lineup IS at its most efficient when the pieces fit well together, and the conventional wisdom was right on a couple of counts: the leadoff hitter should get on base, and the 4-hitter should hit for power. But a lot of those roles were arbitrary. And even though the simulator that I mentioned earlier does not account for speed and "in-game tactics," this is a big deal.

The reason I'm posting this is that I picked these 8 guys plus a pitcher and managed to put them in the optimal order without the use of the simulator. I used the coefficients, and trial and error, to determine a rough strategy for ordering your lineup. Here's the lineup:

  1. Jeter
  2. Utley
  3. Mackowiak
  4. Jones
  5. Konerko
  6. Gibbons
  7. Wells
  8. Pitcher
  9. Zaun
And here's the tipsheet:
  1. This is the most OBP-centric spot in the lineup. Your hitter here might very well be your best hitter, IF his best attribute is his OBP. A hitter with a .425 OBP and a .500 SLG would fit in here well, provided that there's not a better OBP threat elsewhere on the roster. When I looked at it, I decided that Derek Jeter is really the optimal leadoff hitter. He has a good OBP and acceptable power, and he's generally a solid hitter.
  2. The 2-hitter should be the lineup's most balanced hitter, a good combination of OBP and SLG. David Wright fits the bill here, as does the player I chose, Chase Utley. The first guy I thought of was Mike Lowell in his prime, when I looked at the results and coefficients.
  3. This was the biggest surprise: the 3 hitter should be the player that doesn't fit into any of the other spots. Every other spot has some significance, but if I were building a lineup, I would just put the leftover player in the 3 hole. This seemed very counterintuitive to me when I first heard it, but David Pinto noted, "Part of what it's telling us is that you need to spread out your easy outs." I still struggled to get this, but I'm starting to, now. Marc said something to the effect of "the worst players have to go somewhere." I guess this is really it; the other spots just have greater needs. If you can get a good hitter here, it means that your lineup is very deep.
  4. This is the bopper. This guy's best attribute should be his power, with OBP being of secondary importance. He should be the foil to the leadoff hitter, in a way; both players could be similar if they're both very complete. Andruw Jones, though, is an ideal #4 hitter: slightly above average OBP, and "phenomenal cosmic power," to quote Aladdin.
  5. Picking the 5 hitter is simple: it's the second choice for the two slot. Paul Konerko, who I picked for this spot, had a very similar line to our #2 hitter, Chase Utley.
  6. The 6 hitter shows the biggest difference between SLG and OBP on the roster. This is because you're going to want to have guys driving in the leftovers. The 6 hitter is the most exclusively power-dependent hitter of the bunch. His OBP is VERY unimportant. Alfonso Soriano and Jay Gibbons are good picks for this slot.
  7. The 7 hitter is the less extreme version of the 6 hitter, with less of a need for power and more usage for OBP. I picked Vernon Wells here.
  8. This is the worst hitter in the lineup. If it's the pitcher, he goes here, unless it's Dontrelle Willis or Jason Marquis or someone similar. This is because you'd rather not put the pitcher close to two of the best hitters in the lineup: the 1 and 2.
  9. The 9 hitter should be a "punchless wonder," of sorts. Scott Podsednik, Gregg Zaun, and Brad Ausmus fit into this role nicely: guys with acceptable OBPs and absolutely no power. This is the "stereotypical leadoff hitter" taken to the extreme. He's not actually leading off because you don't necessarily want these guys to soak up plate appearances, I think.
This is all very new stuff, and I could have interpreted this wrong. I think that Cyril Morong is onto something because you CAN rationalize these positions, even if there's a high level of initial cognitive dissonance. I would say that you should try to go over this checklist when you try and optimize the lineup for whatever team you want, and see if it checks out or comes close to it.
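One way to make the tip sheet concrete is to write it as a greedy assignment over the example roster. This is only a sketch of one possible reading: the order in which the slots are filled, the use of OBP + SLG as the measure of a "balanced" or "worst" hitter, and the use of SLG minus OBP as the measure of "power-dependent" are all assumptions made for illustration, and the function name `tip_sheet_lineup` is hypothetical. It has nothing to do with how the simulator or the regression actually work.

```python
# OBP/SLG lines for the example roster used in this post.
ROSTER = {
    "Zaun": (0.355, 0.373),
    "Konerko": (0.375, 0.534),
    "Utley": (0.376, 0.540),
    "Jeter": (0.389, 0.450),
    "Mackowiak": (0.337, 0.389),
    "Gibbons": (0.317, 0.515),
    "Jones": (0.347, 0.575),
    "Wells": (0.320, 0.463),
    "Pitcher": (0.200, 0.250),
}


def tip_sheet_lineup(roster):
    """Greedy reading of the tip sheet (fill order and metrics are assumptions)."""
    pool = dict(roster)   # players not yet assigned to a slot
    lineup = {}           # slot number -> player name

    def obp(name):
        return pool[name][0]

    def slg(name):
        return pool[name][1]

    def total(name):      # OBP + SLG, a crude "overall hitter" score
        return obp(name) + slg(name)

    def gap(name):        # SLG - OBP, how power-dependent the hitter is
        return slg(name) - obp(name)

    def take(slot, key, pick=max):
        name = pick(pool, key=key)
        lineup[slot] = name
        del pool[name]

    take(8, total, min)   # worst hitter (here, the pitcher)
    take(1, obp)          # most OBP-centric hitter
    take(4, slg)          # the bopper
    take(2, total)        # most balanced good hitter
    take(5, total)        # second choice for the 2 slot
    take(6, gap)          # biggest SLG-over-OBP gap
    take(7, gap)          # less extreme version of the 6 hitter
    take(9, gap, min)     # "punchless wonder": OBP with almost no power
    take(3, total)        # whoever is left over
    return [lineup[slot] for slot in range(1, 10)]


for slot, name in enumerate(tip_sheet_lineup(ROSTER), start=1):
    print(slot, name)
```

On this particular roster the sketch happens to land on the same order as the lineup above; there is no reason to expect that to hold for every roster.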

Lineup order is not vital, but I've seen simulated teams jump up 3-4 wins with this optimization. I'm not an economist or an Econ major or minor, even, but this looks a great deal like the issue of maximizing utility in economics. Players have strengths and weaknesses, and ordering them in a certain way can help you get the most out of those strengths and hide those weaknesses. I guess the next step in this whole model is speed. A caveat I gave to the leadoff man is that I would try to avoid him being a "slow, fat slob" of a baseball player, but that's just out of personal intuition and what seems logical rather than anything proven mathematically. But otherwise, this data looks very interesting. Definitely try out the simulator if you haven't yet.
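As a rough check, the runs-per-game figures quoted earlier for the three lineups can be scaled up to a full season. The 162-game season is standard; the conversion of about 10 runs per win is a common rule of thumb and is an assumption here, not something taken from the simulator.

```python
GAMES_PER_SEASON = 162
RUNS_PER_WIN = 10.0  # assumption: common rule of thumb, not from the simulator

# Simulated runs per game for the three lineups discussed in this post.
runs_per_game = {"conventional": 4.852, "stathead": 4.885, "optimal": 5.082}


def season_gain(base, improved):
    """Extra runs over a full season from switching lineups."""
    return (runs_per_game[improved] - runs_per_game[base]) * GAMES_PER_SEASON


for base in ("conventional", "stathead"):
    runs = season_gain(base, "optimal")
    print(f"optimal vs {base}: {runs:.1f} runs, ~{runs / RUNS_PER_WIN:.1f} wins")
```

Under those assumptions the optimal order is worth a little over 37 runs against the conventional lineup and about 32 against the "stathead" one, which is in the same range as the 3-4 wins mentioned above.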

Please feel free to leave feedback, to elaborate on rationales for putting a player in a certain spot, or with general criticisms.

Comments

Comments
Can be found at Baseball Think Factory

Thanks to Repoz for the link.

"I don't set the rosters, I just make fun of the guy who does" - Rob Neyer

by Marc Normandin on Feb 25, 2006 10:06 PM EST

Lineups
Dan,

Glad you found my lineup study interesting.

I think the simulators or estimators are using the values from the first study that only used the 1989-2002 seasons. In that case, the value for SLG3 was lower (.93) than it was when I used all years (1.2). I think with more years the values make sense, but the 1989-2002 study might be more relevant for current teams. But it is possible if people used the values from the second study the lineups generated might be different.

I am not sure if this is like utility maximization. In that case, we assume the consumer spends all of their income. Then, if they buy just two goods, the marginal utility of each good divided by its price is the same for both goods. It is like taking into account the cost and benefit of each good. So for the lineup, I guess you want to take into account the cost and benefit of putting a guy in a lineup slot. Perhaps the regression analysis does that. I am not really sure.

Also, I was only looking to see if the values for OBP and SLG varied with each lineup slot. If teams then started using those values to create lineups, and then we ran regressions on those teams, the results might be different. I am not really sure about that, either. Maybe once those new values for OBP and SLG were taken into account, they would suggest different lineups.

Cy

by Cyril Morong on Feb 25, 2006 11:28 PM EST
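For reference, the two-good condition described in the comment above is usually written as below. The notation is standard textbook notation chosen here for illustration (utility $U$, quantities $x$ and $y$, prices $p_x$ and $p_y$, income $I$); none of it comes from the lineup study.

```latex
% Consumer spends all income on two goods and equalizes
% marginal utility per dollar across them (interior optimum).
\max_{x,\,y}\; U(x, y)
\quad \text{subject to} \quad p_x x + p_y y = I
\qquad \Longrightarrow \qquad
\frac{MU_x}{p_x} = \frac{MU_y}{p_y},
\quad \text{where } MU_x = \frac{\partial U}{\partial x},\;
MU_y = \frac{\partial U}{\partial y}.
```

The loose analogy would be that each lineup slot is a "good" and a hitter's OBP and SLG pay off at different rates depending on the slot, but as the comment says, it is not clear the regression captures that.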

Lineups
That may be a good idea. But I think I will have to do a lot of work to separate out the rest of the team (do that 9 times, but I suppose I could just throw in the overall team value and estimate what the other 8 guys did; if I get a chance I'll do it). It might miss the interaction between lineup slots both right before and after the slot you are looking at. The guy batting in front of Bonds or McGwire will now be evaluated as if Bonds and McGwire were just like everyone else on the team.

But like I said, I did not start out trying to find optimal lineups. Just to see if the OBP and SLG values might vary with lineup slot.

by Cyril Morong on Feb 25, 2006 11:52 PM EST

Cyril, can you comment?
I posted the below on another post (then re-edited to this post) and wanted Cyril to comment.

You provided an AL/DH split, but I was hoping you could provide an NL split so that I can see what happens there. The AL split does not follow the overall pattern: there is a choice to be made between 2nd and 4th, because the value of OBP is much higher in the 4th spot than it is overall (which would explain the value of Rob Deer hitting 4th in the AL).

What I found interesting from your overall results is that the 2nd position appears to be the spot that you want your best overall hitter to hit at.  I've seen sim results that suggested that batting your best hitter 2nd was best for the lineup (plus your best OBP hitter 1st but random otherwise).  This would confirm that.

In addition to having your best hitter bat 2nd, the overall numbers suggest that SLG is valued the most in the 2nd position, which is a result I found more surprising.  I wonder what others thought about this result; I haven't seen anyone point it out yet.

I found that pretty shocking. 4th and 5th are where you typically hit your sluggers, and 3rd was usually your best hitter, but this suggests that a hitter who can hit and run or bunt the runner over in the 2nd position (like the Giants had in 2005 with Vizquel) is not what you are looking for. The Giants would be better off hitting Vizquel leadoff to take advantage of his OBP and Durham or Winn 2nd to take advantage of his SLG.  Or, as Felipe suggested, batting Barry there.

That would also explain why the Giants did so well in 2004 when they had JT Snow hitting there because he had a high OBP and was hitting for a higher SLG that season as well.

by BiasedGiantsFanatic on Feb 26, 2006 1:20 AM EST

Cyril, can you comment?
If I get a chance, I will look at the NL only (or at least non-DH seasons). You are probably right about the number 2 guy and the Giants.

One possible type of analysis is that we could look at teams and see who they bat number 2. The question might be: do teams with a non-traditional number 2 guy score more runs (or score more than expected based on team SLG and team OBP), and do teams with a traditional number 2 guy score less? That might take time to go through all the teams and set that up. But maybe that would just mean running a regression like anonymous hero suggests: Do each slot one by one and separate out the team data.

by Cyril Morong on Feb 26, 2006 11:40 AM EST

Lineups
I don't want to be accused of being a member of the Flat Earth Society again, and I do think this is all fun, but I have a problem with drawing concrete conclusions from Cyril's analysis.

That's because, no matter how much Cyril worked on the analysis (and I think he did a fine job), the coefficients are all dependent on each other.

Let me give an example.  Consider first that the overall weight given to OBP and SLG depends on the run environment.  I ran my own regression and found that a point of OBP is worth about 60% more than a point of SLG in low-scoring environments (less than 3.8 runs per game) but 100% more in high scoring environments (more than 5.2 runs/game).

Now expand that line of thought (that weights are dependent on the overall environment) to specific lineup positions.  Just as an example, I looked at the relative weights of OBP vs. SLG for leadoff batters when the 3/4/5 hitters had high slugging averages and when they had low slugging averages.  

I found what you might expect, that the relative weight of the OBP coefficient was about 150% higher with sluggers in the middle of the lineup.

To me, applying static weights to a situation that is highly dependent really isn't an appropriate way to approach the issue.

by studes on Feb 26, 2006 9:02 AM EST

Flat Earth?
First off, there is nothing that would make you part of the so-called "Flat Earth Society" simply by not taking everything you see online as fact, even from reliable sources. I really did want to get some criticisms of this, first of all, and second of all, the characteristics of being part of a "Flat Earth Society" would be to ignore all pertinent evidence and to ignore what can be proven to a good extent statistically.

This information and my interpretation are both still quite in question. By not checking our information, we're leaving ourselves very susceptible to making faulty assumptions and flawed conclusions about, well, anything.

As far as the descriptions I used for the lineup slots, I saw this simulator and the original articles, and I was confused at first because of the contradictions to my typical train of thought. I wanted to find a layman's description of what this all means practically, if we choose to accept it. I didn't find one, so I tried to put it together myself.

This is really a way to reconcile the "old way" of basing a lineup on some preconceived notions about batting order, where you'd have a leadoff hitter who got on base and ran fast but probably didn't have much power, and then a guy who "handled the bat well," and then a guy who was a complete hitter and possibly your best hitter, etc, etc, etc.

I think that this way, with these descriptions of lineup slots, makes sense as a rough guide and as a potential replacement for conventional wisdom as far as creating a lineup because it IS based on some statistics, rather than just what one manager decided a bunch of years ago. (Speaking of which, does anyone know the origin of the modern lineup order? That would be an interesting article to read.)

Your points about the run environment, I did not consider. I mentioned run environments when talking to Marc, but that was more about the variance that a run environment would cause rather than how the run environment would affect the lineup order. To try and take that a step further, would one be able to apply static coefficients if they knew the run environment in advance? Or is this a dead end, in your opinion?

by Dan Scotto @ Beyond the Box Score on Feb 26, 2006 10:08 AM EST

skeptical
I guess I don't see how regression analysis can be applied to something as dynamic as lineup construction to get anything useful out of the results.  As Cyril says, I think running a simulator 10,000 times would be a much, much better way to find optimal lineup constructions.  Maybe Markov chains, but aren't simulators just sophisticated Markov chains?

Maybe, if you isolate run environments and the performance of everyone else in the lineup, and you use Cyril's three variables instead of OBP and SLG, you'd have a shot.  But then the sample sizes would probably be too small.  And I'm still not certain that would be legitimate.

Caveat: I'm not a statistician like you guys.  These are just my opinions, for what they're worth.

by studes on Feb 26, 2006 1:41 PM EST

Flat Earth
The Flat Earth comment wasn't directed towards us I don't think, heh. That's a whole different story...

"Caveat: I'm not a statistician like you guys."

I think you're a tad more advanced as a statistician than I am, Studes. Just a smidge, y'know? I can interpret numbers, but I can't make them. Hearing that something may be mathematically inaccurate with the data is something I need to hear, because I just know how to read the data that's given, for the most part.

"I don't set the rosters, I just make fun of the guy who does" - Rob Neyer

by Marc Normandin on Feb 26, 2006 3:39 PM EST

Yeah
I'm no statistician, myself; my experience is from many hours on Excel and a couple of hours skimming a few books.

I'm quite out of place at a "sabermetrics blog," honestly, but don't tell anyone that. :)

by Dan Scotto @ Beyond the Box Score on Feb 26, 2006 5:11 PM EST

Lineups
I generally agree with the comments from studes.

When I first did this, I was not thinking about the optimal lineups. If I said that I was or hinted at it, I really did not want to. I just wanted to see if the OBP and SLG values varied with lineup slot.

Maybe someone can use a simulator and then create lineups based on the lineup projections that people have been creating to see if they are accurate.

by Cyril Morong on Feb 26, 2006 11:34 AM EST

Lefty/right splits?
You are probably right. But the Retrosheet data just shows what all the players in a given spot of team's lineup did. It does not break it down by Lefty/right splits.

But when putting together a lineup, if you know how the hitters do in their lefty/right splits, you might be able to apply it that way. Not sure.

by Cyril Morong on Feb 26, 2006 11:20 AM EST

Maybe the simulator
that showed Tike Redman should bat 3rd in last year's Pittsburgh lineup wasn't so wrong after all.

I'll say it again: I have a simulator built in MATLAB.  If anybody has the chops to convert it to Java or C, I can send you my simulator.  You can then run all the sims you want (9!, even), and do away with regression analysis.

by salb918 on Feb 26, 2006 5:21 PM EST

simulator in C
I've been hacking together a simulator in C that takes the general concept of what you did in MATLAB and runs through all the permutations. Right now it looks like, to get any sort of confidence about the run totals (±1 run per 162-game season with ~95% confidence), I have to run on the order of 150,000 - 250,000 games.

Running all 9! combinations is going to take a couple of days on my Powerbook G4...

If anyone wants me to test a lineup for them, let me know - the stats I'm using are PAs, BBs, hits, 2Bs, 3Bs, HRs, SOs.

by false cognate on Mar 16, 2006 3:48 PM EST
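To put the size of that exhaustive search in perspective, here is a small sketch that counts the batting orders and the total number of simulated games. It assumes the 150,000 - 250,000 figure quoted above applies to each lineup separately, which the comment does not state outright; the function name `total_games` is hypothetical.

```python
import math
from itertools import permutations

LINEUP_SLOTS = 9
ORDERINGS = math.factorial(LINEUP_SLOTS)  # 9! distinct batting orders

# Sanity check: enumerating the orderings directly gives the same count.
assert sum(1 for _ in permutations(range(LINEUP_SLOTS))) == ORDERINGS


def total_games(games_per_lineup):
    """Games to simulate if every ordering gets the same number of games."""
    return ORDERINGS * games_per_lineup


print(f"{ORDERINGS:,} possible batting orders")
for games_per_lineup in (150_000, 250_000):
    print(f"{games_per_lineup:,} games per lineup -> "
          f"{total_games(games_per_lineup):,} games in total")
```

That is 362,880 orderings and somewhere between roughly 54 and 91 billion simulated games, which is why pruning obviously bad orderings first, or using something like the tip sheet to pick candidates, is attractive.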
