Pitcher Similarity Scores

How do we determine the similarity of two pitchers in terms of their arsenals? We develop a Pitcher Similarity score $PS_{i,j}$ to enable these comparisons.

When players are drafted or are coming up through the minors, we'll often hear, "Man, that kid reminds me of _______ ." If you're lucky enough to be Mike Trout, you get compared to Willie Mays and Mickey Mantle. If you're not so lucky, you get compared to Joe Schlabotnik.

The point is that we like comparisons. We like to be able to say that someone is similar to someone else. If we can put a number on that similarity, that can make the comparison even more interpretable. And of course, we like things to be interpretable.

This is where pitchers come in. Ideally, we'd like to be able to compare pitchers based not on their results, but on the pitches they throw. How similar are Craig Kimbrel's and Aroldis Chapman's fastballs? How similar are the two pitchers as a whole package, looking at their full arsenals?

How to Compare Pitches?

We'll start at the pitch level. That is, how similar is Pitcher A's fastball to Pitcher B's? Various people will have opinions about what is important in this comparison, but after consultation with the other Beyond the Box Score writers, I've decided to include the following aspects:

1. Pitch Velocity
2. Pitch Break (Horizontally and Vertically)
3. Pitch Locations
4. Pitch Release Point

Now, the question of comparison comes around. How do we compare each of these aspects? I'll concentrate on items 1 and 2 first. As a comparative exercise, look at the two plots below. The left plot has two distributions with the same mean, while the right has two different means.

Which set of two distributions is more similar? The one on the right, correct? Now, this is a contrived example, but it illustrates a point. We need to compare more than just means. Now, if we were willing to assume that the data is all normally distributed, we could also compare variance. However, I do not want to make that assumption.

So, in order to compare the entire distribution in one number, we first construct the empirical distribution function of the value. The empirical distribution function is basically the proportion of the n observations that are less than or equal to a value a. Or, in math terms,

$$F_E(a) = \frac{1}{n} \sum_{i=1}^{n} I\{x_i \le a\}$$

From this, to compare the distributions, we look at the largest difference between the two distributions of interest. That is,

$$D_{i,j} = \sup_x \left| F_{E(i)}(x) - F_{E(j)}(x) \right|$$

For those versed in statistics, you'll recognize this as the Kolmogorov-Smirnov test statistic. This is entirely intentional. Now, back to the plots of distributions above. If we drew samples from those distributions an infinite number of times, we'd expect to see a $D_{i,j}$ of 0.242 for the plot on the left, and 0.080 for the right. A visual example of this is given below. So, as the distributions become more and more different, the value of $D_{i,j}$ will approach 1.

This $D_{i,j}$ will get applied to components 1 and 2 on the list (with component 2 covering both horizontal and vertical break).
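As a rough illustration of the two formulas above, here is a minimal Python sketch of the empirical distribution function and the largest gap between two of them. The function names and the velocity readings are hypothetical, made up for this example, and this is not necessarily how the scores in this article were computed.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F(a): the share of observations in `sample` that are <= a."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(a):
        # bisect_right counts how many sorted values are <= a.
        return bisect_right(ordered, a) / n

    return F


def ks_distance(sample_i, sample_j):
    """Largest absolute gap between the two empirical distribution functions."""
    F_i, F_j = ecdf(sample_i), ecdf(sample_j)
    # Both functions are step functions that only jump at observed values,
    # so the largest gap occurs at one of the pooled observations.
    pooled = set(sample_i) | set(sample_j)
    return max(abs(F_i(x) - F_j(x)) for x in pooled)


# Hypothetical fastball velocities (mph), for illustration only.
pitcher_a = [94.1, 95.3, 96.0, 96.8, 97.2]
pitcher_b = [90.2, 91.5, 92.0, 93.4, 94.5]
print(ks_distance(pitcher_a, pitcher_b))
```

Identical samples give a distance of 0, and samples that do not overlap at all give a distance of 1, which matches the behavior described above.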

A potential question could be the use of horizontal and vertical break in the difference equation. The other option would be to use the PITCHf/x value for each pitch. Now, this could be fine, except when we want to distinguish between, say, a 12-6 and a 1.5-7.5 curveball. Looking at PITCHf/x alone might not illuminate that difference, whereas horizontal and vertical breaks would show some difference between the pitches.

Comparing Locations

In order to compare pitch locations, we have to place the pitches into groups by location. In this case, we'll group them into 17 zones, 9 within the strike zone and 8 outside. Then we'll calculate a distance metric based on these 17 groups. This turns out to be the Bray-Curtis distance and can be calculated as

$$L_{i,j} = \frac{1}{2} \sum_{l} \left| \%_{i,l} - \%_{j,l} \right|$$

Where $l$ indexes the 17 grouping locations and $\%_{i,l}$ is the proportion of pitcher $i$'s pitches thrown in location $l$. This is again bounded in [0,1], so it's on the same scale as the previous metrics. As an example, we have two "strike zones" that were divided into 4 grouping locations. In this case, our $L_{i,j}$ is

(|0.3 - 0.25| + |0.5 - 0.25| + |0.15 - 0.25| + |0.05 - 0.25|)/2 = 0.3
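The four-zone arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical, and the inputs are assumed to be proportions that already sum to 1 for each pitcher.

```python
def location_distance(share_i, share_j):
    """Half the summed absolute differences between two sets of zone shares."""
    if len(share_i) != len(share_j):
        raise ValueError("both pitchers need a share for every zone")
    return sum(abs(a - b) for a, b in zip(share_i, share_j)) / 2


# The four-zone example from the text.
zones_i = [0.30, 0.50, 0.15, 0.05]
zones_j = [0.25, 0.25, 0.25, 0.25]
print(round(location_distance(zones_i, zones_j), 3))  # 0.3
```

The same function works unchanged for the 17 zones used in the actual metric, since it only needs one share per zone.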

Comparing Release Points

The final segment of the difference between pitches is the difference in release points. This is a more difficult question, as the release point varies with the pitcher's height. So, instead of looking at the release point itself, I'm going to look at arm slot angle.

So, to calculate the angle, we first need to account for the minimum release height that we'll see in the data. We'd see this in sidearm pitchers, and it needs to be subtracted from the release point height. So, we'll look at the height of release above the "Eckersley Line," which is set at 2.85 feet.

Now all we have to do is remember geometry class, so our angle is

$$\theta = \arctan\left[ \frac{R_y - \mathrm{Eck}}{|R_x|} \right]$$

Where $R_y$ is the release point height, $R_x$ is the release point's horizontal distance from center, and Eck is the height of the Eckersley Line.

Now, instead of a two-dimensional distribution of release points, we have a one-dimensional distribution of arm slot angles. As before, we can use the Kolmogorov-Smirnov statistic to measure the difference between the distributions. In this way, 3/4 arm slots will differ highly from overhand, which will differ highly from sidearm.
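A minimal Python sketch of the arm slot calculation follows. The function name is hypothetical, and two choices here are assumptions rather than part of the definition above: the angle is reported in degrees, and `math.atan2` is used in place of a plain arctangent of the ratio so that a release directly over the center (horizontal distance of 0) does not divide by zero.

```python
import math

ECKERSLEY_LINE_FT = 2.85  # minimum release height from the text, in feet


def arm_slot_angle(release_x, release_y, eck=ECKERSLEY_LINE_FT):
    """Arm slot angle in degrees from release point coordinates in feet."""
    height_above_line = release_y - eck
    # With a non-negative second argument, atan2 matches arctan(y / x)
    # and returns 90 degrees when the horizontal distance is exactly 0.
    return math.degrees(math.atan2(height_above_line, abs(release_x)))


# Hypothetical release point: 2 feet to the side, 2 feet above the line.
print(arm_slot_angle(-2.0, 4.85))
```

Because the Kolmogorov-Smirnov statistic only depends on the ordering of the values, the choice between degrees and radians does not change the resulting distance.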

Dealing With Batter Handedness

Now, obviously, pitchers may change velocity, break, location, or even delivery point based on whether they are facing a right-handed or left-handed batter. So, we have to do all the calculations for components for both types of hitters. Yes, it increases the computational load, but it should yield a more accurate comparison.

So, for the pitch comparison score, we'll combine the component scores for each handedness in a weighted average. In this case, the weights will be the percentage of batters faced from each side, combined for both pitchers. So, say, two pitchers combined to face 70% right-handed batters. The component score for velocity will be

$$D_{Vel} = 0.7 \, D_{Vel,RHB} + (1 - 0.7) \, D_{Vel,LHB}$$

and all the other component scores will be calculated in the same way.
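The handedness weighting can be sketched in Python as below. The function names and the batter counts are hypothetical, and the sketch assumes that "combined for both pitchers" means pooling the two pitchers' batters faced before taking the right-handed share.

```python
def combined_rhb_share(rhb_i, total_i, rhb_j, total_j):
    """Share of right-handed batters faced, pooled over both pitchers."""
    return (rhb_i + rhb_j) / (total_i + total_j)


def handedness_weighted(d_rhb, d_lhb, rhb_share):
    """Weighted average of a component score across batter handedness."""
    return rhb_share * d_rhb + (1 - rhb_share) * d_lhb


# Hypothetical counts: 700 of 1,000 combined batters faced were right-handed.
share = combined_rhb_share(400, 600, 300, 400)
print(share)  # 0.7
print(handedness_weighted(0.2, 0.4, share))
```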

Combining Everything

Now, one clear question remains related to all components. What happens when a pitcher doesn't throw a specified pitch? How can we compare these pitches? Well, we can't. There's no data to do so. In those instances, we set the overall pitch score to 1. It might seem like this makes comparing pitchers' entire arsenals impossible, but we'll address dealing with these values after we finish comparing individual pitches.

Otherwise, if both pitchers throw the pitch, we'll calculate all the components as defined above. Finally, the pitch score will be

$$p_{i,j}(k) = 0.25 \, D_{Vel} + 0.25 \, D_{Horiz} + 0.25 \, D_{Vert} + 0.15 \, D_{\theta} + 0.1 \, \mathrm{Loc}$$

which is just a weighted average of the components after they've been adjusted for batter handedness.
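Here is a small Python sketch of the pitch score, including the rule that a pitch thrown by only one of the two pitchers gets a score of 1. The dictionary keys and the function name are hypothetical labels chosen for this example. The weights are the ones given above.

```python
# Component weights from the pitch score formula; they sum to 1.
WEIGHTS = {
    "velocity": 0.25,
    "horizontal_break": 0.25,
    "vertical_break": 0.25,
    "arm_slot": 0.15,
    "location": 0.10,
}


def pitch_score(components, both_throw=True):
    """Weighted average of handedness-adjusted component scores.

    Returns 1.0 when only one of the two pitchers throws the pitch.
    """
    if not both_throw:
        return 1.0
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical component scores for one shared pitch.
slider = {
    "velocity": 0.10,
    "horizontal_break": 0.30,
    "vertical_break": 0.20,
    "arm_slot": 0.40,
    "location": 0.15,
}
print(pitch_score(slider))
```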

How Many Pitches Are Needed To Compare?

This is an interesting question. Obviously, more is better in this case. The more pitches we see, the closer our empirical distributions get to their true distributions. The more pitches thrown, the more of the pitcher's arsenal we'll get to see.

So what would a minimum number of pitches look like? I wouldn't start comparing until we've seen at least 1,000 pitches from a pitcher. That would be roughly 10 starts or a full season for a reliever. We could conceivably use only 100 pitches (or compare one start against another), but it would be more difficult to get solid comparisons with such a small number of pitches.

Going From Pitches to Pitchers

Now, we have similarity scores for each individual pitch. From this, we want to compute a similarity score for pitchers across all pitches. A first thought would be a sum or average of the individual pitch scores, but this has one slight problem. You'll remember that $p_{i,j}(k) = 1$ for pitches that the two pitchers do not share. Since a score of 1 means maximally different, this would artificially inflate the dissimilarity between the two pitchers.

A better alternative would be to weight the individual pitch similarity scores based on each pitch's usage rate. That way, if one pitcher throws 1 slider over the whole season and is compared to a pitcher who throws no sliders, the slider's contribution to the entire similarity score will be minute at best. So, in the end, the entire score will be

$$PD_{i,j} = \sum_{k} \pi_{i,j}(k) \, p_{i,j}(k)$$

Where $k$ ranges over the types of pitches and $\pi_{i,j}(k)$ is the average usage of pitch $k$ between the two pitchers.
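The usage-weighted sum can be sketched in Python as follows. The function name, pitch labels, usage rates, and pitch scores are all hypothetical. Usage rates are assumed to be proportions that sum to 1 for each pitcher.

```python
def pitcher_dissimilarity(usage_i, usage_j, shared_pitch_scores):
    """Raw pitcher dissimilarity: usage-weighted sum of pitch scores.

    usage_i, usage_j: dicts mapping pitch type to usage rate.
    shared_pitch_scores: dict of pitch scores for pitches both pitchers throw.
    """
    total = 0.0
    for k in set(usage_i) | set(usage_j):
        rate_i = usage_i.get(k, 0.0)
        rate_j = usage_j.get(k, 0.0)
        avg_usage = (rate_i + rate_j) / 2
        # A pitch that only one pitcher throws gets the maximum score of 1.
        thrown_by_both = rate_i > 0 and rate_j > 0
        score = shared_pitch_scores[k] if thrown_by_both else 1.0
        total += avg_usage * score
    return total


# Hypothetical arsenals: the second pitcher also throws a curveball.
usage_a = {"fastball": 0.6, "slider": 0.4}
usage_b = {"fastball": 0.5, "slider": 0.3, "curveball": 0.2}
scores = {"fastball": 0.2, "slider": 0.3}
print(round(pitcher_dissimilarity(usage_a, usage_b, scores), 3))  # 0.315
```

In this made-up example the unshared curveball contributes 0.1 to the total, its average usage of 10% times a score of 1, which shows how a rarely thrown pitch is kept from dominating the comparison.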

Now, because we're dealing with maximal distances in the empirical distributions, the raw $PD_{i,j}$ does not quite cover [0,1]. So, we rescale it, using a scale factor of 0.77 and a shift factor of 0.1925, to get our final $PD_{i,j}$. Note that these constants are determined exclusively from 2012 data, so until the metric is tuned with future data, scores could inadvertently fall outside the [0,1] interval.

Technically, $PD_{i,j}$ is a measure of pitcher dissimilarity, so we take $1 - PD_{i,j}$ to get our Pitcher Similarity score $PS_{i,j}$.
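The text gives the two constants but not the exact order of operations, so the Python sketch below rests on an assumption: the shift is subtracted first and the result is then divided by the scale. Under that reading, a raw dissimilarity of 0.1925 maps to a similarity of 1 and a raw dissimilarity of 0.9625 maps to a similarity of 0. The function names are hypothetical.

```python
SCALE = 0.77    # scale factor from the text (tuned on 2012 data)
SHIFT = 0.1925  # shift factor from the text (tuned on 2012 data)


def rescale_dissimilarity(raw_pd, scale=SCALE, shift=SHIFT):
    """Map a raw dissimilarity onto roughly [0, 1].

    Assumed order of operations: subtract the shift, then divide by the scale.
    """
    return (raw_pd - shift) / scale


def similarity_score(raw_pd):
    """Pitcher Similarity: one minus the rescaled dissimilarity."""
    return 1.0 - rescale_dissimilarity(raw_pd)


# Hypothetical raw dissimilarity, for illustration only.
print(similarity_score(0.5))
```

Nothing in this sketch clips the result, so a raw value below 0.1925 or above 0.9625 would produce a score outside [0,1], which is the situation the note about 2012 tuning warns about.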

So we now have the similarity score for comparing two pitchers. It ranges over [0,1], with more similar pitchers closer to 1 and dissimilar pitchers closer to 0. The Pitcher Similarity score has the following interpretation, in terms of values of $PS_{i,j}$ and how similar the pitchers actually are.

$PS_{i,j}$    Interpretation
0.8 - 1.0     Extremely Similar
0.6 - 0.8     Reasonably Similar
0.5 - 0.6     Somewhat Similar
0.4 - 0.5     More Different than Similar
0.2 - 0.4     Mostly Different
0.0 - 0.2     Entirely Different

To give you an idea about how these scores are distributed, here's a histogram of all the pitcher similarity scores from 2012. There are a total of 32,640 comparisons (256 × 255 / 2) among the 256 pitchers included in the histogram.

I also want to note that the way this has been set up allows for direct comparisons of right-handed and left-handed pitchers. So, if a right-handed and left-handed pitcher have a strong similarity score, this implies that they essentially are mirror images of each other.

Most Similar Pitchers of 2012

So, now that we've gone through all this math, who were the most similar pitchers of 2012? I'm not doing similarity for 2013 at this point because there haven't been enough games, and therefore pitches, thrown. So, back to 2012, the sample was limited to a set of 256 pitchers who threw at least 1,000 pitches over the course of the season. From there, the full set of similarity scores was calculated. In the end, the most similar pitchers were (Drumroll Please)...

Anibal Sanchez & Anthony Bass

Briefly, I'll compare their arsenals through the individual pitch scores. They shared three pitches: fastball, changeup, and slider. Sanchez also throws a curveball, but only 9.7% of the time, and that extra bit alone accounted for almost half of their pitcher differential $PD_{i,j}$. The raw pitch dissimilarities $p_{i,j}(k)$ for the three pitches in common were 0.147, 0.249, and 0.198. Once we convert these to the pitcher dissimilarity score, shift and scale it, and subtract the result from 1, we get a Pitcher Similarity score $PS_{i,j}$ of 0.965 for Bass and Sanchez.

Concluding Notes

I hope I didn't lose too many people with all the math above. I included all of the math to let everyone see inside the black box that is my rationale behind the metric. I tried to make all my explanations as clear as possible. Finally, here are the Pitcher Similarity Scores for all the pitchers in the 2012 sample.

Spreadsheet of all Pitcher Similarity Scores from the 2012 Sample

I'd like to thank all the writers here at Beyond the Box Score for their input on this piece, particularly Max Weinstein. Also, I'd especially like to thank my Virginia Tech colleague K.C. Kubli for letting me bounce several ideas off him and checking my mathematical reasoning.