There's going to be a decent helping of math later in this article. You've been forewarned.
Edit: MattS with an excellent comment on why most of what I did was wrong. That said I still believe the concept itself is interesting (others do too apparently) and the math is all correct, so I still think it's worth a read. Just don't treat the final results as anything applicable to real baseball.
The full count has always been my favorite count, mostly because the next pitch (if not fouled off) has to determine the outcome of the PA. My favorite base state is when the bases are loaded since there's no open base for the pitcher to just walk the batter. What happens when you combine those two and then toss up two outs on the board to boot? Besides a boatload of insanity, I'm not really sure, but I think we can try using some game theory to tell us what should be happening.
What's Game Theory?
One of the things that has always interested me in life is game theory and its applications. Wikipedia defines everything much better than I can, but basically game theory boils down to making the best choice given the decisions other players have made. Military, business, poker, even relationship (what girl doesn't like a date's plans being broken down into a normal form game?) decision makers use (or they should) game theory to arrive at the optimal decision. Baseball shouldn't be much different.
Is this Even Practical?
Well, it depends on what your definition of "is" is. Naturally, we're going to run into some problems. This game I'm conducting assumes pitchers have total control over where they are placing the ball, when they probably do not. In fact, there's likely a good amount of selection bias towards pitchers with less control since we're looking at a 3 ball count to begin with. I also assume the batter doesn't know if the ball is in or out of the strike zone when he decides to swing at the pitch. This is probably most of the time really, but there are certainly times where the batter does know that the incoming pitch is out of the zone, like the one hit by pitch that occurred in this situation last year. But in practice a lot of game theory games don't play out like they should anyway, yet it's still useful to look at the theoretical outcomes.
(Aside: I'm not sure if any pitchers consciously do anything like this, but I wouldn't be surprised if they did. Most have probably heard of Brian Bannister's sabermetric tilt, but I would love to have an interview with Greg Maddux to see if he did anything like this. Greg, if you're reading this, my email is in the link on my name.)
Alright, I'll play your game. How do we set it up?
Remember, we're only dealing with the 2 out, bases loaded case in this article. The run expectancy of this state per BP's 2008 RE Chart is .799, which I'm gonna call .8 for easier calculations. I feel like this is an even "simpler" game than the full count in general, since if the pitcher throws a ball in this case he's going to allow a run, which he almost certainly doesn't want to do. But that doesn't necessarily mean he shouldn't ever throw the ball outside the strike zone either.
Huh? you may be thinking after that last sentence. Why would a pitcher ever want to intentionally throw the ball out of the strike zone in that situation? The answer is that the batter thinks the pitcher would never throw a ball out of the zone in that situation, so he's more likely to swing at any pitch, and he's a lot less likely to be able to do damage on a pitch out of the zone (I assume, I haven't actually seen any in zone/out of zone slugging charts yet). This is the essence of game theory: using what your opponent thinks you're going to do to your advantage. Hopefully this chart will make things a bit more clear:
I think the extensive-form game is pretty easy to follow for the most part. The dotted line connecting the batter nodes represents the fact that he does not know whether the ball will be in or out of the zone when he decides to swing. The numbers at the end of each line represent the payoff, or the value to the pitcher of each outcome. Since .8 runs are expected to score given this state, I set the value of a strikeout at +.8 for the pitcher. If the pitcher walks the batter it's worth -.2, which is the .8 expected runs minus the 1 run that scores from a walk, .8 - 1 = -.2. I think those are fairly straightforward, but if people have a problem with that setup let me know in the comments.
The one issue which is definitely up for debate is how to value a ball in play, both on pitches in and out of the zone. I'm counting a HR as a ball in play for this scenario rather than drawing a separate node, but I think we can all agree a HR is worth -3.2. Since the runners are almost always running on a 3-2, 2 out pitch, for simplicity I think we can say a single is worth -1.2 (2 runners score on a single) while doubles and triples are worth -2.2. And like a strikeout, an out on a ball in play should be worth +.8. The problem arises in assigning a probability to each of these outcomes, since the contact node is really just a summation of these 5 possible outcomes. Thinking about it more, I guess there's 6 possible outcomes if you count foul balls. I'm just going to value them at 0 for this exercise since they don't directly decide the outcome of the event, though I wouldn't be surprised if they're worth some tiny negative amount since they might be more likely to lead to a negative event in the future (negative to the pitcher).
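The payoff bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the convention used here (run expectancy minus the runs that score on the play); the names `RUN_EXPECTANCY`, `pitcher_payoff`, `RUNS_SCORED` and `PAYOFFS` are mine, and the runs per outcome assume the runners are going on the pitch.

```python
# Run expectancy with the bases loaded and two outs, rounded to .8 as in the article.
RUN_EXPECTANCY = 0.8

def pitcher_payoff(runs_scored):
    """Value to the pitcher: expected runs for the state minus runs scoring on the play."""
    return round(RUN_EXPECTANCY - runs_scored, 1)

# Runs scoring on each outcome, following the article's simplifications
# (runners moving on the pitch, so a single scores two).
RUNS_SCORED = {
    "strikeout": 0,
    "out_in_play": 0,
    "walk": 1,
    "single": 2,
    "double_or_triple": 3,
    "home_run": 4,
}

PAYOFFS = {outcome: pitcher_payoff(runs) for outcome, runs in RUNS_SCORED.items()}
PAYOFFS["foul"] = 0.0  # fouls don't end the PA, so they are valued at 0 here

for outcome, value in PAYOFFS.items():
    print(f"{outcome:>16}: {value:+.1f}")
```

Swapping in a different run expectancy only requires changing `RUN_EXPECTANCY`; every payoff shifts by the same amount.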
But back to the issue at hand, assigning a probability to each ball in play event. The best way to do this would be a historic look at all 2 out, bases loaded, full counts to see the probability of each event happening. I unfortunately don't know how to do that, if the data is even out there. I only have last year's data to work with regarding pitch fx, and there were only about 300 pitches or so thrown in this scenario, which is way too small a sample to try and extrapolate from. The easiest way would be to use the averages from last year overall, but I think that detracts from the whole purpose of looking at the special event, as I'm pretty sure pitchers react much differently in this situation than in general. So as a compromise I'm just going to use the outcomes from last year when the bases were loaded, regardless of outs. It's easily available on B-Ref, a cursory glance at tOPS+ says it correlates heavily year to year, and I think the 2 out distinction isn't so huge as to set it apart from the bases loaded situation in general. If anything I think the lack of the full count stipulation is the biggest effect, but whatever, I've droned on long enough about this. Let's move on.
So using last year's percentages, we conclude that a ball in play is worth about -0.07 runs: ([776 * -1.2] + [300 * -2.2] + [124 * -3.2] + [2180 * 0.8]) / 3380 = -0.07219, where the four counts are singles, doubles plus triples, home runs, and outs on balls in play. Now there's the other issue of how much to change that value for a ball in or out of the zone. I guess first it should be asked whether the location of the ball even has an effect on the outcome. Intuitively I think it does, in that a ball in the zone is more likely to turn into a hit since it's easier to make solid contact on a ball in the zone than out of it. How much so though, I have no idea, and thus I'm not sure how to adjust the runs for in and out of the zone. So for this example I'm going to leave them the same, though I'm not thrilled about it. If anyone has seen any studies on in/out of zone hit effect let me know, it's something I'll probably look at more later.
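For anyone who wants to check the weighted average, here is a minimal Python sketch. The counts and run values are the ones in the formula above; the dictionary keys are my own labels, inferred from which run value each count is multiplied by.

```python
# Bases-loaded ball-in-play outcomes (counts) and their value to the pitcher (runs).
counts = {"single": 776, "double_or_triple": 300, "home_run": 124, "out": 2180}
values = {"single": -1.2, "double_or_triple": -2.2, "home_run": -3.2, "out": 0.8}

total = sum(counts.values())

# Weighted average: each outcome's value times how often it happened.
ball_in_play_value = sum(counts[k] * values[k] for k in counts) / total

print(total)                         # 3380
print(round(ball_in_play_value, 5))  # -0.07219
```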
Solving the Game
Alright, now we're moving on to the fun part: figuring out the optimal strategy for the game. In reality, the percentages you apply to each outcome should vary by batter. I think Jack Cust and Pablo Sandoval would treat 3-2 counts (assuming Sandoval ever sees one) very differently. Really, the same should be done for each possible hit outcome, so the "contact" node should have something like 5 different branches coming from it, but I'm too lazy to bother with it right now. So we're just going to be using averages, since that's what we've been using up to this point anyway.
The medians for batters that qualified for the batting title last year per Fangraphs are about 89% contact in the zone and 62% on pitches out of the zone. Plugging these percentages into our chart, along with the rounded -0.07 for contact and +.8 for a swing and a miss, we can calculate the value of swings in and out of the zone. It turns out a swing in the zone is worth .0257 runs, and a swing out of the zone is worth .2606 runs. That's not a typo. If my calculations were correct (which I think they were; we can argue later about the basis of them, which I still have problems with) then a swing on a pitch out of the zone is worth ten times as much to a pitcher as a swing in it. Granted we're working with fractions of a run here, but that's still pretty interesting.
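The two swing values come from weighting contact and whiffs by the contact rate. Here is a small Python sketch of that step; `swing_value` is a hypothetical helper name, and it assumes, as the article does, that contact is worth the rounded -0.07 regardless of location and that a miss on 3-2 is a strikeout worth +.8.

```python
CONTACT_VALUE = -0.07  # rounded value of a ball in play, to the pitcher
WHIFF_VALUE = 0.8      # swing and miss on 3-2 with two outs is a strikeout

def swing_value(contact_rate):
    """Expected value to the pitcher of a swing, given the batter's contact rate."""
    return contact_rate * CONTACT_VALUE + (1 - contact_rate) * WHIFF_VALUE

swing_in_zone = swing_value(0.89)   # median in-zone contact rate
swing_out_zone = swing_value(0.62)  # median out-of-zone contact rate

print(round(swing_in_zone, 4))   # 0.0257
print(round(swing_out_zone, 4))  # 0.2606
```

Plugging the unrounded -0.07219 into `CONTACT_VALUE` nudges both numbers slightly lower, so the rounding matters a little at this scale.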
But the game isn't through yet, far from it. Next we need to figure out how often the batter needs to swing to make the pitcher indifferent between throwing a pitch in or out of the zone. Again, this is assuming the batter doesn't know if the ball is going to be in the zone or not when he has to decide to swing. This is probably the case some of the time, and probably more often in a 3-2 count than a lot of other counts (a batter knows a 3-0 pitch is going to be in the zone the vast majority of the time for example). The left side below is the pitcher's expected value from a pitch in the zone and the right side is his expected value from a pitch out of it, which gives a simple system of two linear equations, where:
.0257x + .8y = .2606x -.2y
x + y = 1
where x = % of the time swinging and y = % of the time taking.
Solving we get y = .2349x and y = 1-x
1-x = .2349x
x = 1/1.2349 ~ .8098. Thus the batter should be swinging about 81% of the time and taking the pitch 19% of the time, in the Nash equilibrium of our game.
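The same indifference condition can be solved in code. This is a sketch under the article's payoffs; `indifferent_swing_rate` and its parameter names are mine, not anything standard.

```python
def indifferent_swing_rate(swing_in, take_in, swing_out, take_out):
    """Swing rate x that makes the pitcher indifferent between zones.

    Solves x*swing_in + (1-x)*take_in = x*swing_out + (1-x)*take_out for x.
    All payoffs are values to the pitcher.
    """
    denom = swing_in - take_in - swing_out + take_out
    return (take_out - take_in) / denom

x = indifferent_swing_rate(
    swing_in=0.0257,   # batter swings at a pitch in the zone
    take_in=0.8,       # batter takes strike three
    swing_out=0.2606,  # batter swings at a pitch out of the zone
    take_out=-0.2,     # batter takes ball four, run scores
)
y = 1 - x

print(round(x, 4))  # 0.8098
print(round(y, 4))  # 0.1902
```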
Meanwhile, the pitcher should know all this, and it should affect how often he throws in the zone. He wants to make the batter indifferent between swinging (left side) and taking (right side), so this system of linear equations looks like:
.0257a + .2606b = .8a - .2b
a+b = 1
where a = % of pitches in the zone, and b = % of pitches out of zone
Solving again we get b = 1.681a and b = 1-a
1-a = 1.681a
a = 1/2.681 ~ .373. Thus the pitcher should be throwing the pitch in the zone about 37.3% of the time, and out of the zone 62.7%.
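Here is the pitcher's side of the calculation as a Python sketch, using the same payoffs. `indifferent_zone_rate` is a hypothetical helper name; it solves for the share of pitches in the zone that leaves the batter indifferent between swinging and taking.

```python
def indifferent_zone_rate(swing_in, take_in, swing_out, take_out):
    """In-zone rate a that makes the batter indifferent between swinging and taking.

    Solves a*swing_in + (1-a)*swing_out = a*take_in + (1-a)*take_out for a.
    All payoffs are values to the pitcher.
    """
    denom = swing_in - swing_out - take_in + take_out
    return (take_out - swing_out) / denom

a = indifferent_zone_rate(swing_in=0.0257, take_in=0.8,
                          swing_out=0.2606, take_out=-0.2)
b = 1 - a

print(round(a, 3))      # 0.373
print(round(b, 3))      # 0.627
print(round(b / a, 3))  # 1.681
```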
Putting all this together, the Nash equilibrium of our game results in a value of about .173 runs to the pitcher. It's been about a year since my last game theory course but I think I did it all right, someone please correct me if the math seems off. Here's a picture of what a solved game looks like:
That's the proper strategy? I would have never thought so.
That's one of the most surprising things about game theory; the results can often be quite counter-intuitive. You'd be surprised what solving a game shows, the Prisoner's Dilemma being the most common case of "Hey why don't you do it the other way it's better for both of you?"
While those are the results of the game, and they should be correct, the basis is still up for plenty of debate. I'm definitely not completely satisfied with either the run values for contact for each branch, or the percentages derived for each. And it doesn't take into account things like pitch type, which I think is pretty important in this situation. You can sub in whatever numbers you feel are correct, but the basic construction of the game should still stand.
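If you do want to sub in your own numbers, a small solver along these lines might help. It is a sketch, not a general game theory library: `solve_full_count_game` is a name I made up, it treats the game as a 2x2 zero-sum game with payoffs to the pitcher, and it simply refuses to answer when the payoffs don't produce a mixed equilibrium.

```python
def solve_full_count_game(swing_in, take_in, swing_out, take_out):
    """Return (swing_rate, zone_rate, value) for the mixed equilibrium.

    Payoffs are values to the pitcher for each (batter action, pitch location) pair.
    Raises ValueError when there is no interior mixed-strategy solution.
    """
    denom = swing_in - take_in - swing_out + take_out
    if denom == 0:
        raise ValueError("payoffs give no unique mixed equilibrium")
    swing_rate = (take_out - take_in) / denom   # makes the pitcher indifferent
    zone_rate = (take_out - swing_out) / denom  # makes the batter indifferent
    if not (0 <= swing_rate <= 1 and 0 <= zone_rate <= 1):
        raise ValueError("no interior mixed equilibrium for these payoffs")
    value = (swing_in * take_out - take_in * swing_out) / denom
    return swing_rate, zone_rate, value

# The article's numbers: swing values from the median contact rates,
# +.8 for a called strike three, -.2 for ball four.
swing_rate, zone_rate, value = solve_full_count_game(0.0257, 0.8, 0.2606, -0.2)

print(round(swing_rate, 3))  # 0.81
print(round(zone_rate, 3))   # 0.373
print(round(value, 3))       # 0.173
```

Changing the contact rates or the run values only changes the four arguments; the construction of the game stays the same.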
What's it like in the real world?
I was quite interested to see what pitchers did in 2008 in this situation, so I consulted the handy pitch fx database. It's only a small sample of a couple hundred pitches (446 to be exact) in the 2 out, bases loaded, full count situation, but the results were quite interesting:
Pitches in zone: 278, or 62.3% Pitches out of zone: 168, or 37.7%.
Almost the exact same numbers in our game, except the zones are reversed.
Pitches swung at: 324, or 72.6% Pitches taken: 122, or 27.4%.
These numbers are much closer to the ones in our game than the pitchers' are.
And to shed some light on the fact that batters can probably tell pitch location a lot better than I accounted for:
Pitches swung at in zone: 86.7% Pitches swung at out of zone: 49.4%
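The shares can be recomputed from the raw counts with a few lines of Python. The counts are the ones quoted in this section; the variable names and the `pct` helper are mine. The last step is just a sanity check that the in-zone and out-of-zone swing rates add back up to the total number of swings.

```python
TOTAL = 446
in_zone, out_zone = 278, 168
swings, takes = 324, 122

def pct(part, whole):
    """Share of the whole, as a percentage."""
    return 100 * part / whole

print(f"in zone:     {pct(in_zone, TOTAL):.1f}%")
print(f"out of zone: {pct(out_zone, TOTAL):.1f}%")
print(f"swung at:    {pct(swings, TOTAL):.1f}%")
print(f"taken:       {pct(takes, TOTAL):.1f}%")

# Swing counts implied by the 86.7% and 49.4% swing rates.
swings_in = round(0.867 * in_zone)
swings_out = round(0.494 * out_zone)
print(swings_in, swings_out, swings_in + swings_out)  # 241 83 324
```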
And here's the discipline chart:
Well I mean look at some of those balls out of the zone, pretty sure even I wouldn't have swung at those. To be fair, it would only be because I'd already be in the fetal position in the batter's box by the time the ball got to the plate.