A couple weeks back, fellow sabermetrician (And former Beyond the Box Score Managing Editor) Sky Kalkman posted the following tweet:
I'm going to put all the research/article ideas that I post to twitter in this one Tumblr article. Please steal them. http://t.co/KtmGxr6r5U— Sky Kalkman (@Sky_Kalkman) May 7, 2014
One of the ideas that attracted the most interest was a hitter discipline model. As he noted, currently things are mostly of a binary nature: Taking a ball/swinging at a strike is good, taking a strike/swinging at a ball is bad. This of course does not take into account the probability that any player in general swings at the pitch, how valuable swinging at that pitch would be, and many other factors.
So that's what we want to look at. How can we incorporate all this into a new, more comprehensive, plate discipline statistic? Below I will combine three factors (the probability of swinging, the probability of a called strike, and the value of the pitch) into one metric that attempts to paint a more nuanced picture of plate discipline.
There are a lot of numbers and complicated math over the next few sections. Feel free to skip around to whatever you want to read.
Probability of Swing
First of all, we want to look at the probability that any player in general swings at the pitch. If a player takes a ball that 40% of batters would swing at, this shows more discipline than taking a ball that 10% of batters would swing at. Currently, merely saying not swinging at a ball is good leaves out this aspect of plate discipline.
While I would prefer a fully probabilistic model to estimate this probability, more specifically a nonparametric regression spline or Gaussian process regression, the dataset size (Over 700,000 pitches from 2013) and Bernoulli response (Swing/No Swing) make this impossible. Or at least impracticable until I can borrow some high performance computing equipment.
So instead, I had to improvise, using a much less probabilistic/more deterministic algorithm. I decided to use a weighted K-nearest neighbors approach to estimate this probability, with the weights being determined by a Gaussian kernel. This at least has some connection to Gaussian process regression and splines using Gaussian kernels, even if the connection is not perfect.
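To make the idea concrete, here is a minimal sketch of a Gaussian kernel-weighted K-nearest neighbors estimate for a single data subset. The function name, the choice of `k`, and the `bandwidth` value are illustrative assumptions, not the settings used for the numbers in this article, and distance is measured on pitch location only.

```python
import numpy as np

def kernel_knn_probability(query, locations, outcomes, k=50, bandwidth=0.25):
    """Estimate the probability of a 0/1 outcome at `query`.

    locations: (n, 2) array of pitch locations (horizontal, vertical), n >= 1.
    outcomes:  (n,) array of 0/1 results (for example, 1 = batter swung).
    k, bandwidth: hypothetical tuning values chosen only for illustration.
    """
    locations = np.asarray(locations, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)

    # Euclidean distance from the query location to every pitch in the subset.
    dists = np.linalg.norm(locations - np.asarray(query, dtype=float), axis=1)

    # Indices of the k closest pitches (order among them does not matter).
    k = min(k, len(dists))
    nearest = np.argpartition(dists, k - 1)[:k]

    # Gaussian kernel: closer neighbors get more say in the estimate.
    weights = np.exp(-0.5 * (dists[nearest] / bandwidth) ** 2)
    total = weights.sum()
    if total == 0.0:
        # All weights underflowed; fall back to a plain K-nearest average.
        return float(outcomes[nearest].mean())

    # Weighted share of neighbors with outcome 1.
    return float(np.dot(weights, outcomes[nearest]) / total)
```

Calling this on a grid of locations within one subset is one way to produce a probability surface like the ones plotted in heatmaps.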
The next question comes in the form of what variables to include to determine whether a batter would swing or not. These variables will be used to either (1) split the dataset or (2) determine the weights in the K-nearest neighbors algorithm. The variables that go into group (1) can't be too many, otherwise the splits within the dataset will become too fine.
It seems clear that pitch location should be a major component of this. In addition, the batter side (For inside/outside purposes), pitch type (Limited to "Hard" and "Breaking" pitches), and ball-strike count should be considered. There are of course other factors that can easily be considered: pitch speed, pitch movement, spin rate, pitcher throwing side, base-out-inning-score state, etc. However these were removed for one of two reasons: they either did not appreciably change the probability estimates and/or cut the data subsets too fine.
So the data was broken up into 48 subsets defined by the 2 batter sides, 2 groups of pitches, and 12 counts. Some subsets wound up being relatively small (3-0 breaking pitches against LHB had only around 250 occurrences out of the 700,000), and those have larger potential errors in estimation. It's also worth noting that there is some slight loss of information from subsetting the data, but it was unfortunately necessary in order to run the algorithm and, again, not my first choice. It still gives pretty good estimates of probability, as we can see from the heatmaps below.
The heatmap on the left is the probability of swinging for a 0-0 hard pitch by a left-handed batter at the given locations. On the right is 0-2 hard pitches by left-handed batters. Darker colors (Reds) indicate higher probability of swinging than light colors (Yellows). This reflects conventional wisdom, as batters are more likely to take first pitches and tend to expand the zone and swing more in 0-2 counts.
These probability estimates can be enlightening about plate discipline on their own. If we stick to the simple view that swinging at strikes and taking balls is good, we can get a version of discipline above league average from these probabilities. It can be viewed on a cumulative or per-pitch basis, but I'm going to look at the cumulative. The main component for each pitch is the batter's decision (0/1 for the "correct" decision based on pitch location) minus the overall probability estimate of the "correct" decision. This way, an "incorrect" decision will result in a negative number, while an "incorrect" decision that everyone makes (Taking a fastball down the middle on a 0-0 count) won't be punished as much as an "incorrect" decision no one makes (Taking that same fastball in an 0-2 count).
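As a rough sketch of that bookkeeping, the cumulative version can be written in a few lines. The function name, the tuple layout, and the swing probabilities in the example are all made up for illustration; they are not values from the dataset.

```python
def simple_discipline_above_average(pitches):
    """Sum (decision - probability of the "correct" decision) over pitches.

    pitches: iterable of (in_zone, swung, p_swing) tuples, where in_zone and
    swung are booleans and p_swing is the league-wide swing probability
    estimate for that pitch (hypothetical inputs for this sketch).
    """
    total = 0.0
    for in_zone, swung, p_swing in pitches:
        # "Correct" means swinging at a pitch in the zone or taking one outside it.
        correct = 1.0 if bool(swung) == bool(in_zone) else 0.0
        # Probability that a batter in general makes that "correct" decision.
        p_correct = p_swing if in_zone else 1.0 - p_swing
        total += correct - p_correct
    return total


# Taking a pitch down the middle: mild penalty when few batters swing (0-0),
# heavy penalty when nearly everyone swings (0-2). Probabilities are invented.
take_on_0_0 = simple_discipline_above_average([(True, False, 0.30)])
take_on_0_2 = simple_discipline_above_average([(True, False, 0.90)])
```

Dividing the total by the number of pitches seen would give the per-pitch version instead.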
So, looking at that, we can get the top 10 and bottom 10 of discipline based on this version of the statistic (Out of 253 batters who saw at least 1,200 pitches). And let's just say there are a few surprising names on each list.
Finding Votto at the top of a plate discipline list is about as surprising as...well...what's more surprising? A Mark Reynolds season over 150 strikeouts (Or over 30% K% for that matter)? You get the idea. But Adam Dunn and Dan Uggla? Yeah...
It's clear something is missing. Well, a few things actually. To start with, let's move on to...
Probability of Called Strike
Above, we were considering merely balls and strikes according to location. Of course, umpires aren't perfect. Some calls will get missed, and this needs to be accounted for. In the above calculation, a pitch 1/4" outside the zone gets classified as a ball, while one 1/4" inside the zone gets classified as a strike. However, it's reasonably likely that the probabilities of these two pitches being called a strike are similar.
Again, a fully probabilistic model is preferred, but again, dataset size limits what is possible. So again, Gaussian kernel-weighted K-nearest neighbors is employed. The variables chosen to determine data subsetting and kernel weights are the same. It's true that this effectively leaves out pitcher/umpire/catcher effects, but (1) including them would start to cut the data too fine, and (2) the goal is to establish a league-wide baseline probability of a called strike, and this baseline is calculated by essentially averaging over these effects. If one assumes that the pitcher/umpire/catcher effects average out over the season, this is less of a concern to begin with anyway.
Interestingly, this league wide baseline does not change as much with count as expected, while batter side seems to affect things a bit more (Mostly from an inside/outside pitch perspective I expect). Below is one heatmap of estimated called strike probability, in this case for 0-0 hard pitches to left handed batters.
Here, you can see a reasonably defined rectangular strike zone, albeit with slightly rounded corners (Implying more missed calls on the four corners of the strike zone, which is not necessarily surprising).
So we've now estimated this probability. But before we incorporate it into the statistic, we need to gather a few other things...
Value of a Pitch
There are many ways to value a pitch. This is just my version of it. To begin, let's look at the wOBA of each count. To do that, I calculate the wOBA over any at bat that reached a certain count. In other words, say a batter got a single on an 0-2 pitch. That single will count in the calculations for the 0-0, 0-1, and 0-2 counts. The table below gives the breakdown and wOBA for the 12 counts.
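The counting rule described above (every plate appearance's result is credited to each count it passed through) can be sketched as follows. The linear weights here are rounded, illustrative values and should be treated as an assumption rather than the weights behind the article's table. Skipping intentional walks entirely is also an assumption of this sketch.

```python
from collections import defaultdict

# Illustrative wOBA-style linear weights (assumed, not taken from the article).
WOBA_WEIGHTS = {
    "walk": 0.69, "hit_by_pitch": 0.72, "single": 0.89,
    "double": 1.27, "triple": 1.62, "home_run": 2.10, "out": 0.0,
}

def woba_by_count(plate_appearances):
    """Return {count: wOBA} over all plate appearances that reached each count.

    plate_appearances: iterable of (counts_reached, outcome) pairs, for example
    (["0-0", "0-1", "0-2"], "single"). Outcomes without a weight (such as
    intentional walks in this sketch) are left out of the calculation.
    """
    value = defaultdict(float)
    events = defaultdict(int)
    for counts_reached, outcome in plate_appearances:
        if outcome not in WOBA_WEIGHTS:
            continue
        # Credit the final result once to every distinct count that was reached.
        for count in set(counts_reached):
            value[count] += WOBA_WEIGHTS[outcome]
            events[count] += 1
    return {count: value[count] / events[count] for count in events}


# A single on 0-2 counts toward 0-0, 0-1, and 0-2; a first-pitch out only toward 0-0.
example = woba_by_count([
    (["0-0", "0-1", "0-2"], "single"),
    (["0-0"], "out"),
])
```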
|Count||Intent Walk||Walk||Hit By Pitch||Single||Double||Triple||Home Run||Outs||wOBA|
Here, we can clearly see the traditional hitter's and pitcher's counts. With this information, we can get the amount of change between counts. However, we not only want to compare the new count versus the old count (For swings), we really want to compare to what the count could have been (Given the pitch is taken). In other words, for a "correctly" taken first pitch ball, we want to compare the new 1-0 count to the possible 0-1 count if the batter had "incorrectly" swung. Below are the numbers for each of the possible old count-new count combinations.
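Here is a small sketch of that comparison. The wOBA values in the example are invented placeholders, and the way the sketch values a completed walk or strikeout (a walk weight and zero, respectively) is my assumption for illustration; the article's own tables may treat those end states differently.

```python
def count_after(count, result):
    """Advance a (balls, strikes) count by one pitch that is a "ball" or a "strike"."""
    balls, strikes = count
    return (balls + 1, strikes) if result == "ball" else (balls, strikes + 1)

def count_value(count, woba, walk_value=0.69):
    """wOBA of a count; ball four and strike three use assumed end-state values."""
    balls, strikes = count
    if balls == 4:
        return walk_value   # assumed value of a walk
    if strikes == 3:
        return 0.0          # assumed value of a strikeout
    return woba[count]

def count_changes(count, result, woba):
    """Return (change vs. old count, change vs. the alternate new count)."""
    alternate = "strike" if result == "ball" else "ball"
    new_value = count_value(count_after(count, result), woba)
    alt_value = count_value(count_after(count, alternate), woba)
    return new_value - woba[count], new_value - alt_value


# Hypothetical wOBA by count, keyed by (balls, strikes).
toy_woba = {(0, 0): 0.31, (1, 0): 0.35, (0, 1): 0.26}

# A taken first-pitch ball: 1-0 versus the old 0-0, and versus the possible 0-1.
vs_old, vs_alternate = count_changes((0, 0), "ball", toy_woba)
```

With these placeholder numbers, the gain relative to the alternate count is larger than the gain relative to the old count, which is the reason for tracking both columns.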
|Old Count||New Count||wOBA Change||Change Compared to Alternate Count|
Of course, the largest difference, and therefore most important time for discipline, is in the 3-2 count. Outside of that, the most important counts are the 2-2, 3-1, and 1-2 counts. In the final statistic, these counts will get the most weight. Specifically, the weights for each count are given below.
Now because this weighting is wOBA-based, it is context-neutral (Similar to WAR, another wOBA-based stat). It would be decidedly possible to create a context-dependent version of this through run expectancy data by count, base state, etc. However, in this setting, one would need to account for the probabilities of runners advancing. But that will be another stat for another day.
Putting It All Together
Here's where things get a little tricky, and also a little subjective. Why subjective? Because here's where you have to define what you mean by plate discipline. I'm essentially removing at bat results from the final plate discipline statistic. Why? Because otherwise you'd have Miguel Cabrera (say) as having the best plate discipline. Not necessarily because he has the best plate discipline, but because he does the most with the pitches he sees.
I personally separate the two. Take the following two hitters. Hitter A swings at everything, and does an above average amount of damage on the average. Hitter B makes "perfect" decisions about when to swing, but does below average damage on every swing. I'd call Hitter A the better hitter, but Hitter B the more disciplined. So that's how I define plate discipline: making the correct decision, regardless of the result of that decision.
So now we need to combine all this mess together. All three parts (the probability of a swing, the probability of a called strike, and the count value) will be included in the final number. As before, we want to reward correct decisions, but now, instead of a hard and fast rule based on location, we can factor in the probability of "correctness." In the end, the whole cumulative DiscAA statistic becomes (Those with an aversion to equations should avert their eyes immediately)
Okay, all this mess to say, we get a weighted DiscAA statistic that is context-neutral. The concept can be adjusted to be context-dependent, although that is not discussed here. Finally, below we give the top 10 and bottom 10 plate discipline guys for 2013. Again, as above, there are a few surprises.
Last of all, I link the full table of the 253 qualified (More than 1,200 pitches seen in the dataset) players and the DiscAA.
. . .
Data scraped from MLBAM XML files.