Daily Box Score 9/28: Pitch Selection and Game Theory
The last time I wrote about game theory, I discussed the strategy involved in bunting for a hit. One surprising observation was that the payoffs for bunting for a hit and swinging away appear not to be equalized. This was surprising because, if the payoffs are not equal, we would expect a rational decision maker always to choose the option with the higher payoff unless and until the payoffs equalize.
Surely this surprising bit of non-rationality in baseball is limited to the rare case of bunting for a hit? No, and don't call me Shirley.
Table of Contents
The Example
The Evaluation
The Caveat
Discussion Question of the Day
Today's example comes courtesy of a paper written by Kenneth Kovash and Steven Levitt, entitled "Professionals Do Not Play Minimax: Evidence From the National Football League and Major League Baseball." I'll get around to explaining what the heck minimax is, but let's start with the example.
(I should also add that the paper itself, because of idiotic rules regarding the freedom of academic work and the journals in which it is published, is not freely available. For more information on how you can help change this knowledge-limiting fact of law, see here.)
First, we begin with the assumption that pitchers have more than one pitch type. Next, we observe that pitchers have differing run expectancy values for different types of pitches. I'm sure Harry could explain this much better than I, but let's just for the moment use Zack Greinke's 2009 as an example (and oh what an example it is). Per Fangraphs:
Runs above average per 100 pitches for each pitch type:
Fastball: 1.31
Slider: 2.86
Curveball: 0.25
Change up: -0.87
These values are relative to the count and the type of event, but do not correct for defense. Additionally, they use the pitch classification from BIS, which I understand has its limitations. You might argue that we should regress these results to the mean if we're going to use them to make a decision on what pitch to throw next, and that's probably true too.
In any event, these values should give you an idea that the run expectancy of a pitcher's pitches is not the same for each pitch.
So the question is, why doesn't Greinke throw more sliders? If they're worth more than twice as many runs on a per-pitch basis, he should clearly be throwing them more, right? And the fact that the change up has a negative run expectancy indicates he should throw fewer, doesn't it?
The game theory reasoning would go something like this. We have a game where batters largely have to decide before the pitch is even thrown which pitch to "look for." So the decisions of the participants are independent but the outcomes depend on the decisions of both actors. Classic game theory scenario.
Under this structure, it appears that either: (1) Greinke's slider is so good that even when hitters expect it, they can't do much with it, or (2) they don't look for it very often. In either case, it would be better if Greinke threw more sliders. Certainly, it would reduce the strength of (2), most likely reducing the run expectancy of his slider. But it would raise the run expectancy of his other pitches in just about equal measure.
Thus, we ought to expect an equilibrium where all of Greinke's pitches have similar (if not identical) run expectancies. And yet, the data are as plain to you as they are to me: his slider is by far better at preventing runs. What gives?
Kovash and Levitt concern themselves with the expectations we should have of batter-pitcher interactions if they were governed by what is called a "minimax" strategy. Minimax is a mixed strategy solution to a two-player, zero-sum game, which is essentially what we have with batter-pitcher strategies. The goal of a minimax strategy, formalized by mathematician John von Neumann, is to minimize the maximum possible loss, thus creating a stable equilibrium between the two players. (The minimax solution, in two-player games that are zero-sum, is equivalent to the Nash equilibrium.)
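To make the equal-payoff intuition concrete, here is a minimal sketch of the mixed-strategy solution for a stripped-down version of the game: two pitches and two batter guesses. The payoff numbers, the function name `solve_2x2`, and the restriction to fastball versus slider are all made up for illustration; none of them come from Kovash and Levitt or from Fangraphs. The closed form also assumes the game has no pure-strategy saddle point.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game (hypothetical helper).

    Payoffs are runs for the batter, so the pitcher wants them small:
        a: fastball thrown, batter looking fastball
        b: fastball thrown, batter looking slider
        c: slider thrown,   batter looking fastball
        d: slider thrown,   batter looking slider
    Assumes no pure-strategy saddle point, so the denominator is nonzero.
    """
    denom = a - b - c + d
    p_fastball = (d - c) / denom  # pitcher's fastball share; leaves the batter indifferent
    q_look_fb = (d - b) / denom   # batter's "look fastball" share; leaves the pitcher indifferent
    return p_fastball, q_look_fb

# Made-up payoffs: guessing right helps the batter, guessing wrong hurts him.
a, b = Fraction(10, 100), Fraction(-2, 100)
c, d = Fraction(-5, 100), Fraction(6, 100)

p, q = solve_2x2(a, b, c, d)
fastball_value = q * a + (1 - q) * b  # expected runs when a fastball is thrown
slider_value = q * c + (1 - q) * d    # expected runs when a slider is thrown

print("fastball share:", p)
print("fastball value:", fastball_value, "slider value:", slider_value)
```

In this toy game the two pitches end up with identical expected payoffs once both sides mix optimally, which is the property the Greinke numbers appear to violate.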
In their paper, Kovash and Levitt tackle several questions, including the one posed above. But they also wonder whether pitch selection is completely unpredictable. A sprinkling of baseball traditionalism will tell you that it is not: pitchers rarely throw curveballs in 3-0 counts. And their findings support this observation:
If the pitcher threw a fastball on the last pitch, all else equal, it lowers the likelihood this pitch will be a fastball by 4.1 percentage points. [...] If the last pitch was a slider, the likelihood that this current pitch is a slider falls by two percentage points, or twenty percent.
So pitchers do not throw their pitches in random sequence, which means that (at least in theory) batters can exploit patterns.
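A rough way to look for this kind of pattern in a pitch log is to compare how often a pitch type is thrown right after itself with how often it is thrown after anything else. The sketch below is a simplification: the paper's estimate comes from a regression that holds other factors equal, while this just compares two raw frequencies. The function name `repeat_effect` and the pitch sequence are invented for illustration.

```python
def repeat_effect(pitches, pitch_type):
    """P(pitch_type | previous pitch was pitch_type) minus
    P(pitch_type | previous pitch was something else).

    A negative result means the pitcher avoids repeating the pitch.
    Assumes both situations occur at least once in the sequence.
    """
    after_same, after_other = [], []
    for prev, cur in zip(pitches, pitches[1:]):
        bucket = after_same if prev == pitch_type else after_other
        bucket.append(cur == pitch_type)
    return sum(after_same) / len(after_same) - sum(after_other) / len(after_other)

# Hypothetical pitch log: F = fastball, S = slider, C = curveball.
sequence = list("FSFCFFSFCFSF")
effect = repeat_effect(sequence, "F")
print(f"change in fastball rate after a fastball: {effect:+.3f}")
```

A truly random mixer would show an effect near zero over a large sample; a consistently negative number is the sort of pattern a batter could, in principle, exploit.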
But what about the fact that run expectancies are not equal, regardless of sequencing?
If a pitching staff were able to reduce the share of fastballs thrown by 10 percentage points while maintaining the observed OPS gap on fastballs, this would reduce the number of runs allowed by roughly 15 per season, or two percent of a team’s total runs allowed. Because of behavioral responses by batters, this is likely to be an upper bound on the cost of teams throwing too many fastballs.
Now, we ought not to expect the OPS gap to persist if teams threw fewer fastballs (throwing fewer ought to increase their effectiveness), but nevertheless the finding that pitchers throw too many fastballs is attention-grabbing.
(If you'd like to get really angry about the fact that Kovash and Levitt used OPS in their analysis instead of the more reasonable choice of linear weights, Tom Tango has got you covered.)
But Phil Birnbaum has a very interesting bone to pick with Kovash and Levitt. As he puts it:
How can you tell, using game theory, whether fastballs are being overused? Simple: you just check the outcomes. [...] But it's not that simple: as soon as the opposition realizes that you're not throwing fastballs, they'll be able to predict your pitches more accurately [...]. Game theory can't tell you the right proportion, at least not without having to make assumptions that would probably be wrong. But it *can* tell you that you should adjust your strategy until the OPS-after-fastball is exactly equal to the OPS-after-non-fastball.
If that's what the Kovash/Levitt study did, it would be great. But it didn't. Instead, it did something that doesn't make sense, and makes almost all its conclusions invalid.
What did it do? It considered outcomes only for pitches that ended the "at bat". (The authors say "at bat", but I think they mean "plate appearance". I'll also use "at bat" to mean "plate appearance" for consistency with the paper.)
Kovash and Levitt aren't quite as unaware of the problems Phil underscores as he makes them out to be. From their paper:
If there are no spillovers across pitches, there should be no difference in outcomes across pitch types if the pitch does not end the at bat. To the extent, however, that fastballs are slightly more likely to generate strikes than non-fastballs, throwing a fastball may provide some benefit to the pitcher when the at-bat does not end with the current pitch.
But Birnbaum's point remains, and I do not have a good explanation for why Kovash and Levitt use OPS nor for why they ignore outcomes that do not end the plate appearance.
Certainly, if we are to give any reason for why Leo Mazzone was such a good pitching coach (other than the astronomically good luck of having Maddux, Glavine and Smoltz under his tutelage), it was because he got his pitchers to "pitch off" their fastballs.
Nevertheless, even if we reran the regression with the proper changes, as Phil suggests, I'm confident we'd find differentials in the run expectancies of various pitch types. And that just doesn't seem to jibe with what game theory tells us ought to happen.
One possible explanation is that pitchers and pitching coaches are not aware of the differentials. Much of game theory, and certainly the minimax strategy, relies on each party knowing the payouts for the various choices of both parties. And that just doesn't seem to be the case in professional baseball. While teams study tape and attempt to exploit weaknesses, it does not appear that many (if any) teams have payout matrices on index cards (like Earl Weaver used to keep his stats).
Discussion Question of the Day
Do you think that this line of inquiry could bear fruit for major league teams? Ought they do game theoretic analysis and provide it to their pitching coaches? Or are the assumptions necessary to construct a game like this too attenuated to produce any real world benefits?
Comments
This statement is wrong
Thus, we ought to expect an equilibrium where all of Greinke’s pitches have similar (if not identical) run expectancies. And yet, the data are as plain to you as they are to me: his slider is by far better at preventing runs.
We expect an equilibrium where the marginal value of Greinke’s slider equals the marginal value of his FB. It’s almost certain that an extra slider would be worth <2.86 runs above average (decreasing returns to scale: if he threw one slider a game it almost certainly would have an insane value, and if he threw 100% sliders it would get hammered), so that isn’t the number to use. I don’t have a clue how to measure the marginal value of pitch substitution, but this analysis isn’t right.
Not afraid to nitpick
I take run expectancy to be the value of the next one thrown
Which is the same as the marginal value. I’m sorry if that wasn’t clear.
by Tommy Bennett on Sep 28, 2009 8:54 PM EDT
The run expectancy on fangraphs is definitely the average run value
And the average run value is obviously not the marginal value.
Not afraid to nitpick
You might argue that we should regress these results to the mean if we’re going to use them to make a decision on what pitch to throw next, and that’s probably true too.
Again, I apologize if I wasn’t clear.
by Tommy Bennett on Sep 28, 2009 9:02 PM EDT
Huh?
The ‘2.86 runs above average’ is the average value of Zack Greinke’s sliders (per 100 pitches). Regress it to the mean or not, it’s still the average value. This very much is not the marginal value. The value of his slider is going to change as he varies its frequency:
If Greinke were to throw 1% sliders, the average value of his slider would be off the charts. It wouldn’t ever get hit. It’d be like 5-6 runs above average per 100 pitches. (The Tim Wakefield fastball corollary)
If Greinke were to throw 100% sliders, it would get destroyed. The average value would be below the charts. It’d be like 5 runs below average per 100 pitches. The value of throwing an additional slider will not be the same as the average of the previous ones.
The marginal value curve for sliders might have a much steeper slope than that of fastballs due to familiarity with its break, so the substituted extra slider could very well have a lower run value than the fastball it replaces. For example, since hitters have seen Greinke’s fastball 58 times already and are familiar with it, the 59th fastball might only be .5 runs above average. But since hitters have already seen his slider 20 times (even though they’ve gotten killed by it), it is no longer a 2.86 run pitch; the 21st could be .4 runs above average. In this case you clearly wouldn’t want to substitute another slider, yet the average value of the slider would be much higher than that of the fastball.

It’s very possible this isn’t the case and that the marginal value of slider #21 > fastball #59, but using the average values is wrong.
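The gap between average and marginal value can be shown with a toy model. In the sketch below, each additional pitch of a given type is worth a little less than the one before it, with the slider decaying much faster than the fastball. The linear decay, the starting values, and the function names are all assumptions chosen for illustration; nobody in this thread has measured these curves.

```python
def nth_pitch_value(n, first_value, drop_per_pitch):
    """Hypothetical value of the n-th pitch of a type (runs above average per 100)."""
    return first_value - drop_per_pitch * (n - 1)

def average_value(count, first_value, drop_per_pitch):
    """Average value over the first `count` pitches of that type."""
    total = sum(nth_pitch_value(n, first_value, drop_per_pitch)
                for n in range(1, count + 1))
    return total / count

# Made-up curves: the slider starts out far better but loses value quickly.
slider = dict(first_value=5.5, drop_per_pitch=0.25)
fastball = dict(first_value=1.6, drop_per_pitch=0.01)

slider_avg = average_value(20, **slider)         # average of the 20 sliders thrown so far
slider_next = nth_pitch_value(21, **slider)      # marginal value of slider #21
fastball_avg = average_value(58, **fastball)     # average of the 58 fastballs thrown so far
fastball_next = nth_pitch_value(59, **fastball)  # marginal value of fastball #59

print(f"slider:   average {slider_avg:.2f}, next pitch {slider_next:.2f}")
print(f"fastball: average {fastball_avg:.2f}, next pitch {fastball_next:.2f}")
```

Under these invented curves the slider has the higher average but the lower marginal value, so swapping the next fastball for a slider would be a mistake even though the averages point the other way.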
Not afraid to nitpick
Three things
1) I don’t disagree with your analysis above. I think your marginal run value graph may be a useful way to conceptualize the problem. I’m impressed that you took the time to make it.
2) I don’t think I ever “used” average value, except to give the example with Greinke. After that, I shifted to run expectancy, which is different from the “Pitch Type Values” offered on Fangraphs (and as I note above is the same as marginal value). I don’t mention average value again in the article.
3) The reason I used the average values from Fangraphs is because it is very difficult indeed to calculate the marginal values. If you’ve got a method for determining such a value, I’m all ears. In the meantime, I think average value suffices to show that pitchers do not, in fact, choose an optimal mixed strategy, especially when the average values are so wildly divergent.
by Tommy Bennett on Sep 28, 2009 10:30 PM EDT
I understand
2) The entire Greinke part refers to the fangraphs pitch type values which are averages. (“If they’re worth more than twice as many runs on a per-pitch basis, he should clearly be throwing them more, right? And the fact that the change up has a negative run expectancy indicates he should throw fewer, doesn’t it?” etc)
3) But that’s my whole point, I really don’t think average value is close to marginal value. I don’t have any idea how you would measure marginal value because we can’t toy with these percentages and see the deltas but that doesn’t mean we can just assume it’s anywhere near the average.
Greinke IMO would be a better pitcher throwing 100% fastballs than 100% sliders (Greinke would be a really bad pitcher throwing all fastballs but he’d be worthless throwing all sliders), which if true implies that adjusting to the slider happens faster, i.e. the marginal slopes are very different. That’s where I’m coming from.
Other stuff:
If the pitcher threw a fastball on the last pitch, all else equal, it lowers the likelihood this pitch will be a fastball by 4.1 percentage points. […] If the last pitch was a slider, the likelihood that this current pitch is a slider falls by two percentage points, or twenty percent.
So pitchers do not throw their pitches in random sequence, which means that (at least in theory) batters can exploit patterns.
This also ignores that seeing a fastball literally changes the value of the following slider i.e. the fastball is a “set up” pitch. The changeup studies have conclusively proved pitch sequencing changes the following marginal values so it seems likely there would be a sequencing effect for sliders, so the changes in pitch type probability in those sequences very well may be accounting for this fact.
Not afraid to nitpick
This also ignores that seeing a fastball literally changes the value of the following slider i.e. the fastball is a "set up" pitch. The changeup studies have conclusively proved pitch sequencing changes the following marginal values so it seems likely there would be a sequencing effect for sliders, so the changes in pitch type probability in those sequences very well may be accounting for this fact.
I think you may truly have something here, as minimax theory supposes the independence of iterative choices (put another way, the stability of the zero-sum payouts). It would seem to be a pretty serious defect in the Kovash/Levitt paper.
by Tommy Bennett on Sep 28, 2009 11:25 PM EDT
3) But that’s my whole point, I really don’t think average value is close to marginal value. I don’t have any idea how you would measure marginal value because we can’t toy with these percentages and see the deltas but that doesn’t mean we can just assume it’s anywhere near the average.
Let’s try it recursively. On the first pitch of the game, the average payout is subject to whatever the batter’s expectations are. Assuming good scouting, they should be looking for the pitch thrown most frequently (usually fastball). This is rarely the highest average value pitch, since it is used so frequently.
The fact that the batter is looking for a fastball, in turn, establishes the value of the secondary pitches. So we’re back to a situation that looks very similar to the average values.
Going forward from this starting point, the pitcher ought to mix increasingly more secondary pitches until their marginal value meets the marginal value of the second best offering (which is the economic cost of throwing a given pitch). This stabilization ought to happen relatively quickly, because the pitcher wants to reap the benefit of the differing marginal values.
Once we reach equilibrium, each successive pitch creates microscopic deviations from the equilibrium, but in general an unpredictable (somewhat randomized, though perhaps sensitive to your comment about sequencing) mixed strategy would be best.
So basically, depending on how quickly you think a pitcher can reach the equilibrium, I believe varying average values reflect inefficiencies. Accordingly, the larger the differing average values, the larger the difference in marginal values. The correspondence may not be linear (and thus your point about my “more than twice” comment is probably correct), but I believe it is a positive correlation.
No?
by Tommy Bennett on Sep 28, 2009 11:36 PM EDT
There's so much to consider I can't even wrap my head around it
This is worse than when I was in Amsterdam discussing the possibility that robots will eventually produce everything (including other robots) and we won’t need money. Except I’m not even uhhhhh “happy”.
I think we need a lot more information to make anything definitive so I’m not even gonna try to reason my way through this. Sequencing data would help a lot. Some thoughts though
-The value of a breaking ball is going to shift just by the count. I don’t know the numbers but traditionally it’s assumed you can throw more strikes with the fastball, but get more chases/swing and misses on breaking stuff at the cost of control.
For example, on an 0-0 count: going 1-0 is murder relative to 0-1. Plus 0-0 contact isn’t the worst thing in the world anyway so it seems pretty obvious that 0-0 is a fastball count.
Also wouldn’t those run value averages be affected by the breaking ball usage with 2 strikes when a hitter is much more likely to chase?
Not afraid to nitpick
Fangraphs has average and total values.
The averages were used here.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman on Sep 28, 2009 9:09 PM EDT
Also
I would expect that the average value would be a good approximation for the slope of the expectancy curve given current usage patterns. That means that the marginal value of the next pitch thrown (but not the tenth pitch thrown) would be close to the average value.
It would not, as you say, be the same if you threw ten in a row.
by Tommy Bennett on Sep 28, 2009 9:14 PM EDT
I have gone ahead and looked at pitchers to find ones with similar Run Values across their pitches
I got all the information from Fangraphs and took all the qualified pitchers from 2009.
Here is the list.
I used the w[PitchType]/C values and removed all pitch values where a pitcher threw the pitch less than 3% of the time. I should regress all the values to the mean instead, but right now I don’t have the time to figure out that value, so I will go with this method.
I then took the standard deviation of the remaining pitches (far left values on the spreadsheet). The two pitchers whose pitches come closest to having the same run value are:
Jair Jurrjens
wFB/C 0.66
wSL/C 0.86
wCH/C 0.55
Standard Deviation 0.16
FB% 62.1%
SL% 14.6%
CH% 23.4%
Yovani Gallardo
wFB/C 0.26
wSL/C 0.31
wCT/C
wCB/C 0.95
wCH/C 0.59
Standard Deviation 0.32
FB% 60.2%
SL% 10.0%
CB% 22.5%
CH% 7.2%
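For anyone who wants to check the spread figures, here is a small sketch of the calculation as described: drop any pitch thrown less than 3% of the time, then take the standard deviation of what remains. The sample standard deviation (n - 1 denominator) happens to reproduce the numbers listed above, but which formula the spreadsheet actually used is an assumption, as is the function name `pitch_value_spread`. Gallardo's cutter has no value or usage listed, so it is simply left out.

```python
from statistics import stdev

MIN_USAGE = 3.0  # percent; pitches thrown less often than this are dropped

def pitch_value_spread(values, usage):
    """Sample standard deviation of per-pitch run values for pitches
    thrown at least MIN_USAGE percent of the time (hypothetical helper)."""
    kept = [v for pitch, v in values.items() if usage.get(pitch, 0.0) >= MIN_USAGE]
    return stdev(kept)

# Values and usage rates copied from the lists above.
jurrjens_values = {"FB": 0.66, "SL": 0.86, "CH": 0.55}
jurrjens_usage = {"FB": 62.1, "SL": 14.6, "CH": 23.4}

gallardo_values = {"FB": 0.26, "SL": 0.31, "CB": 0.95, "CH": 0.59}
gallardo_usage = {"FB": 60.2, "SL": 10.0, "CB": 22.5, "CH": 7.2}

jurrjens_spread = pitch_value_spread(jurrjens_values, jurrjens_usage)
gallardo_spread = pitch_value_spread(gallardo_values, gallardo_usage)

print(f"Jurrjens: {jurrjens_spread:.2f}")
print(f"Gallardo: {gallardo_spread:.2f}")
```

A smaller spread means the pitcher's offerings are closer to the equal-value mix that the game theory argument predicts.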
Maybe more to come as I dive a little deeper.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman on Sep 29, 2009 12:21 AM EDT
This discussion is just fascinating.
I love reading about game theory.
"Of course Kolby Rasmus was going deep! That’s what Kolby Rasmus does! You don’t give Kolby Rasmus second chances!" -Kolby Rasmus
