Testing and Visualizing Similarity Scores

After introducing Pitcher Similarity Scores, we test the accuracy of the metric. In addition, a visualization is provided to allow comparison across more than two pitchers.

Photo: Leon Halip

In my piece last week, I went through the process of computing pitcher similarity scores. I received several comments on the process, including various ways to test the effectiveness of the technique. One such test was to take a pitcher, divide his season into two halves, and compare the halves to get their similarity score.

So I decided to run this test and see how the method did. Thanks to James Gentile and Alex Kienholz, I got full-season data on Oakland Athletics pitcher Jarrod Parker. From there, I divided his season into two parts, before and after July 3rd, and computed the similarity scores as described before.
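The date split itself is straightforward. A minimal sketch in Python, with a made-up record layout (date, pitch type, speed) standing in for the many fields an actual PITCHf/x row carries:

```python
from datetime import date

# Hypothetical pitch records: (game_date, pitch_type, start_speed).
# Real PITCHf/x rows carry many more fields; these values are made up.
pitches = [
    (date(2012, 4, 10), "FF", 92.4),
    (date(2012, 6, 20), "SL", 84.1),
    (date(2012, 7, 15), "FF", 93.0),
    (date(2012, 9, 2), "CH", 85.7),
]

cutoff = date(2012, 7, 3)  # the split point used here

# Partition the season into the two halves to be compared
first_half = [p for p in pitches if p[0] < cutoff]
second_half = [p for p in pitches if p[0] >= cutoff]
```

From there, each half is treated as its own "pitcher" and fed through the usual similarity computation.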

Before I go into more detail about each pitch, I'll note that Parker's two halves had a raw PD(i,j) of 0.181321. This shift-scales into a negative adjusted PD(i,j) and a Pitcher Similarity Score greater than 1; in other words, the two halves look more alike than a typical pair of distinct pitchers. So the similarity score appears to be doing a reasonable job.

Okay, on to Parker's specific pitches. The PITCHf/x database had Parker throwing a total of 8 different types of pitches, but I'm only going to focus on the 4 that were commonly seen in both halves: four-seam fastball, two-seam fastball, changeup, and slider.

The four-seam fastball had a pitch similarity of 0.8334179. This is extremely high, as the highest seen in the 256-pitcher study was around 0.77. The two-seamer had a score of 0.7938409, the changeup 0.8593643, and the slider 0.794151. Once again, all very high.

Now, raw scores around 0.8 may seem low for two halves of the same pitcher on a [0,1] scale, but they really aren't. Recall that these scores are based on the largest difference between the empirical distributions, so a raw score will look worse than one based on, say, the average difference. However, the average difference can be much more time-consuming to calculate. Plus, the scores are shifted and scaled to put them on a more intuitive scale.

Now, the PITCHf/x data I previously had didn't include game dates, or I would have run this exact procedure for all the pitchers concerned. However, I did run a similar test on 50 of the pitchers: I split each pitcher's games into two halves (I could distinguish one game from another, just not order them by date) and compared one half to the other. Without going into the details, the raw scores ranged from 0.79 to 0.88 (rounded), so all the shifted-scaled scores were very close to 1 or higher. Further, each pitcher's "Most Similar Pitcher" for the first half would have been his second half. So it seems the scores are able to accurately pair pitchers with themselves.

Visualizing Similarity Scores

Finally, I wanted to include some sort of visualization for the similarity scores; since some people are visual learners, it might make the comparisons a little more understandable. A commenter on the previous article made a dendrogram of the scores, which can be highly useful.

I decided to go with a visualization method called Non-metric Multidimensional Scaling. Given a distance matrix, NMDS plots the pitchers in the 2D plane while trying to preserve the distances between them. The Similarity Scores can be converted to a distance matrix by taking 1 − PS(i,j). The NMDS was then run until convergence (which took a little while), and the results were plotted. You'll have to zoom in a bit to read the names; otherwise they overlap too much.
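The similarity-to-distance conversion and the NMDS fit can be sketched with scikit-learn's `MDS` class (the similarity values below are made up for illustration; the actual study used the full matrix of pairwise scores):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical similarity matrix for four pitchers (values made up)
names = ["Pitcher A", "Pitcher B", "Pitcher C", "Pitcher D"]
S = np.array([
    [1.00, 0.55, 0.10, 0.40],
    [0.55, 1.00, 0.30, 0.50],
    [0.10, 0.30, 1.00, 0.45],
    [0.40, 0.50, 0.45, 1.00],
])

# Convert similarities to distances, as described above
D = 1.0 - S

# Non-metric MDS on the precomputed distance matrix
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
coords = nmds.fit_transform(D)  # one (x, y) point per pitcher
```

Plotting `coords` with each pitcher's name as a label reproduces the kind of chart shown below; non-metric MDS tries to preserve only the rank order of the distances, not their exact values, which matters for interpreting the result.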

NMDS Visualization of Pitcher Similarity Scores

You'll probably notice that certain pitchers seem a little closer than expected. Specifically, R.A. Dickey appears closer to Jose Arredondo than to Josh Collmenter, despite Arredondo being Dickey's worst comparison. This is entirely possible in NMDS: while multidimensional scaling tries to preserve the distances between pitchers, it can't do so exactly, so you may see these switches in closeness among groups of pitchers with similar scores.

Really, this visualization is best used for grouping more than two pitchers together; for comparing just two pitchers, the regular similarity score works best. Regardless, the visualization gives an idea of the groupings across all pitchers, which may make things easier to understand.