It is the sovereign right of every sports fan to question (loudly, if need be) the decisions of those lucky enough to coach professional teams. Just as I have little sympathy for celebrities who complain about the paparazzi even as they build their brands, I have little sympathy for managers who bristle when their bad decisions are second-guessed.
Without a doubt, the peak sofa-managing month is October. With seasons hanging in the balance, every pinch runner, relief pitcher, and bunt call is intensely scrutinized. And with quantitative methods as well developed as they now are, you'd think the answers would be as obvious to managers as they are to you and me.
And yet, the second-guessing continues.
The foundation of much of the second-guessing that happens among quantitatively minded baseball fans these days is FanGraphs. During each game, including the postseason, they update charts displaying the win probability of each team over the course of the game. Here's an example of an exciting one:
What I find amazing about these graphs is the degree to which they resemble an EKG of an Angels or Yankee fan over the course of a game. It's irregular, spiky, and ends either up or down. At the end of the game, one team is pronounced dead.
But it also gives a clear sense of the ups and downs of the game. If a manager calls for a sacrifice bunt and it is successful, the decreased win expectancy will materialize immediately on the live graph. Alternatively, if the bases are loaded with no outs, but a team has a five run lead, the win probability may still reflect the fact that the team with the lead is the overwhelming favorite. It almost takes the fun out of it. (I said almost.)
And now these graphs are everywhere! You can now purchase a FanGraphs application for your phone. Who's in his mother's basement now, HUH?!
But before we get too carried away, it's important to remember the limitations of these graphs. First, they are based only on the run scoring environment, the score, the base-out position, and the inning. That's it.
No consideration is given to the strength of the teams, let alone of individual players. Not only are the overall talents of individual players not accounted for, neither are the shapes of their talents. Over hundreds of games, these talents cancel out. But in a single playoff game, these differences make all the, well, difference.
So when the win expectancy goes down after a sacrifice bunt, what have we really learned? Anything more than the mostly uncontroversial observation that outs are bad?
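The mechanics behind these charts are simple enough to sketch. A minimal, purely illustrative version follows: the game state is nothing but outs, runners, inning, and score, and every win-expectancy value below is invented for the example, not taken from real historical data.

```python
# A toy sketch of how a live WPA chart works. The state captures only
# (outs, occupied bases) for a fixed inning and score -- nothing about
# which teams or players are involved. The win-expectancy values are
# made up for illustration.

# Home team batting in a hypothetical tie game:
WIN_EXPECTANCY = {
    (0, (1,)): 0.64,  # runner on first, no outs
    (1, (2,)): 0.62,  # runner on second, one out (i.e., after a "successful" bunt)
}

def wpa_of_bunt():
    """Win probability added by a successful sacrifice bunt in this state."""
    before = WIN_EXPECTANCY[(0, (1,))]
    after = WIN_EXPECTANCY[(1, (2,))]
    return after - before

print(f"WPA of the bunt: {wpa_of_bunt():+.3f}")
```

Even when the bunt "works," the new state is worth less than the old one, so the chart dips. That is the whole argument the chart can make: it knows the out was traded for the base, and nothing about who did the trading.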
Let's consider another example.
The USS Mariner war machine rolls along with its newest offering: a $2.99 application, also in the iPhone app store, that explicitly promises to help you second-guess the decisions of managers in real time. Derek Zumsteg writes of his new creation:
There’s also the managerial view, which allows you to game out whether it’s a good idea to steal, bunt, or intentionally walk the hitter in the current situation.
You’ve probably seen me rant about this kind of in-game stuff here on USSM over and over, citing Tango’s Inside the Book, massive studies on when it makes sense to bunt, or how crazy it is to intentionally walk batters in most situations where it’s considered normal.
I really think using run expectations and WPA are key to understanding effective in-game strategies, and I hope that in offering a really easy way to experiment with tactics, especially as you follow a game, it’ll make all of this more relevant and understandable.
Now, in general, I'm all for bringing WPA to the masses. But, as they say, a little knowledge is a dangerous thing.
I must admit that I have not plunked down the cold, electronic cash for 2nd Guesser, as the app is known. But it does not appear that this app makes any allowances for the skill sets of individual players either. One thing it does do is allow you to adjust the break-even percentage for stolen base attempts, but even that appears to be a relatively unguided choice, delinked from the overall run-scoring environment.
But it really can't be stressed enough how much it matters how strong the players in question are. Here's an example, the numbers for which I have borrowed from The Book.
Imagine you are Joe Girardi. It is Game 4 of the World Series, and it is the bottom of the third inning. Having been chased from the game, C.C. Sabathia sits on the bench while Chad Gaudin stands on the mound (just play along). The Yankees are trailing by five runs, 6-1. There is currently a runner (Jayson Werth) on second base. There are two outs. Raul Ibanez is about to bat; Pedro Feliz stands in the on-deck circle.
What, if anything, should you call for? I'll give you a moment to think about it.
I am guessing your first reaction is to say, "do nothing and let Gaudin pitch."
But this would be the wrong answer! In this situation, an intentional walk will actually INCREASE win probability as long as the ratio of the wOBA of the batter at the plate to that of the batter in the on-deck circle is at least 1.25. Ibanez (.379 this year) is that much better than Feliz (.302 this year, for a ratio of 1.255) even before you consider the platoon advantage.
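The rule of thumb from The Book is easy to check mechanically. The wOBA figures below are the ones quoted above; the function itself is just the ratio test, ignoring the platoon advantage and everything else.

```python
# Checking The Book's rule of thumb for this situation: the intentional
# walk gains win probability when wOBA(batter) / wOBA(on-deck) >= 1.25.
# Ignores platoon splits and other context.

def ibb_helps(woba_batter, woba_on_deck, threshold=1.25):
    """True if walking the batter clears the 1.25 wOBA-ratio threshold."""
    return woba_batter / woba_on_deck >= threshold

ibanez, feliz = 0.379, 0.302
print(ibb_helps(ibanez, feliz))   # ratio is about 1.255, just over the bar
print(ibb_helps(0.360, 0.330))    # a smaller gap doesn't justify the walk
```

Note how narrowly the Ibanez/Feliz pairing clears the threshold: a context-free WPA chart, which sees neither hitter, can't register this at all.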
So Girardi should signal for Gaudin to walk Ibanez to get to Feliz. But I'm confident two things are true: 1) Girardi would almost never call for an intentional walk in this situation, and 2) every stat-minded fan would be talking about it for a week if he did.
If you're at all like me, I know what you're thinking. We can refine the methodology, so the application scrapes data in real time about the individual players and then inputs it into the program. We could do Markov chains 'til the cows come home; we could compare individual players to similar historical players. But at each step along the way, we are introducing assumptions into the analysis.
The eagle-eyed among you probably noticed one methodological flaw with my hypothetical scenario outlined above: I assumed that a player's seasonal wOBA was a good predictor of his wOBA in the next at-bat. And I did it without justification!
All models require simplifying assumptions. Those assumptions are what make them simple, and therefore useful. But if we pass along the models from one person to the next, we tend to forget the assumptions, which convey information about a model's weakness, and focus only on its predictive strengths.
(Perhaps you've heard about this problem?)
The only way to use a model effectively is to familiarize yourself with its strengths and weaknesses. It's something worth bearing in mind as quantitative analysis gains traction with a wider audience.
How accurate a picture of in-game probabilities do live WPA charts give? Are there specific ways we could improve them? Are there structural limitations that preclude perfection?