Introduction
There has been a lot of previous discussion about pitcher similarity, and rightfully so. It is extremely interesting to compare two hurlers and find those with the same stuff but varying results. I had always wanted to get involved in this area of sabermetrics, but I don't have much to add given my minimal understanding of statistical techniques. However, I can apply others' techniques to different ideas.
I have been studying the ESPN Home Run Tracker data for a while now and the next logical step is to compare how different hitters hit home runs. Wily Mo Pena's home runs look a lot different from Chone Figgins'. Is it possible to create similarity scores for home runs? The answer may surprise you!*
*If you're bad at guessing.
Method
I'm going to be perfectly honest: this method was all Stephen Loftus, so please read his article on pitcher similarity scores to understand more about how these scores were calculated. The idea and the data were mine (well, the data is Greg Rybarczyk's or ESPN's), but none of this would be possible without Stephen's help. He literally wrote the code for me and debugged everything when I ran into problems.
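This was all done in R using a Kolmogorov-Smirnov (K-S) test, which compares the full distributions of two samples rather than just their averages.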
I decided to weight standard distance, speed off bat, apex, and horizontal exit angle evenly when comparing each hitter with at least 20 total home runs since 2006. The only data I had to change was the exit angle, because handedness was an issue: I wanted extreme lefty and righty pull hitters to be similar to each other. To do this, I shifted the exit angle from a 45 to 135 scale to a -45 to 45 scale, then made all pull home runs positive and all oppo tacos negative.
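A minimal Python sketch of that angle transformation. It assumes the tracker's 45 to 135 scale runs from the left-field line (45) through dead center (90) to the right-field line (135), and that each home run records the batter's side for that swing ("L" or "R"). If the scale actually runs the other way, swap the two sign flips. The function name is hypothetical.

```python
def pull_angle(horizontal_angle: float, bats: str) -> float:
    """Convert a 45-135 horizontal exit angle to a signed pull angle.

    Assumption: 45 = left-field line, 90 = dead center, 135 = right-field line.
    Result: positive = pull side, negative = opposite field, range -45..45.
    """
    if not 45.0 <= horizontal_angle <= 135.0:
        raise ValueError("horizontal angle must be between 45 and 135")
    centered = horizontal_angle - 90.0  # -45 (LF line) .. +45 (RF line)
    if bats == "R":
        return -centered  # righties pull toward left field
    if bats == "L":
        return centered   # lefties pull toward right field
    raise ValueError("bats must be 'L' or 'R'")


# A righty down the left-field line and a lefty down the right-field line
# both come out as +45, i.e. maximum pull.
print(pull_angle(45, "R"), pull_angle(135, "L"))
```

Mapping both sides onto one pull scale is what lets an extreme lefty pull hitter and an extreme righty pull hitter land close together.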
When comparing hitters in the results, I show the averages for these four factors; in the actual analysis, however, I used the data points for every home run, not just the averages.
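For readers curious how a K-S test can turn two piles of home runs into one score, here is a minimal Python sketch. It computes the two-sample Kolmogorov-Smirnov statistic D (the largest vertical gap between the two empirical distribution functions) for each factor and then, purely as an assumption for illustration, scores similarity as the evenly weighted average of 1 - D across the four factors. Stephen's actual R code may combine the tests differently (for example, through p-values), so treat the field names and the scoring rule here as hypothetical.

```python
from bisect import bisect_right

# Hypothetical field names for the four factors
FACTORS = ("std_distance", "speed_off_bat", "apex", "pull_angle")


def ks_statistic(a, b):
    """Two-sample K-S statistic: the max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        # Empirical CDF = share of values <= x in each sample
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def hr_similarity(hitter_a, hitter_b, factors=FACTORS):
    """Evenly weighted similarity: mean of (1 - D) over the factors.

    hitter_a and hitter_b map each factor name to a list holding one value
    per home run (every data point, not the averages shown in the tables).
    """
    scores = [1.0 - ks_statistic(hitter_a[f], hitter_b[f]) for f in factors]
    return sum(scores) / len(scores)
```

Identical distributions score 1.0 and completely non-overlapping ones score 0.0, which matches the 0-to-1 range of the similarity scores in the tables.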
Results
Excel spreadsheet with all similarity scores for hitters with at least 20 home runs since 2006
Dan Brooks gave me the idea to use classical multidimensional scaling (MDS) to show the results visually; Stephen used non-metric MDS (NMDS) in his article. From what I understand, MDS places the players on a map so that the distances between them match their dissimilarities as closely as possible: players who are farther apart are more different than players who are closer together. (Non-metric MDS does this by minimizing a quantity called stress, while classical MDS gets there through an eigendecomposition.)
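For the curious, here is a minimal pure-Python sketch of classical (Torgerson) MDS. It double-centers the squared dissimilarities and takes the top eigenvectors with power iteration, standing in for a linear algebra library (in R, `cmdscale` does this job). Converting a similarity score to a dissimilarity with `1 - sim` is an assumption for illustration, not necessarily what was done for these charts.

```python
import math


def classical_mds(dissim, k=2, iters=500):
    """Classical MDS: k-dimensional coordinates whose distances approximate dissim.

    dissim is a symmetric n x n list of lists with zeros on the diagonal.
    """
    n = len(dissim)
    sq = [[d * d for d in row] for row in dissim]
    row_means = [sum(row) / n for row in sq]
    grand_mean = sum(row_means) / n
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def mat_vec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):  # power iteration toward the top eigenvector
            w = mat_vec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return coords  # no variance left to place on another axis
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(b, v)))  # Rayleigh quotient
        if lam <= 0:
            return coords  # only non-Euclidean (negative) structure remains
        for i in range(n):
            coords[i][dim] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next-largest eigenvalue
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Usage, given a square matrix `sims` of similarity scores:
#   dissim = [[1.0 - s for s in row] for row in sims]
#   xy = classical_mds(dissim)  # plot xy[i] for each player i
```

The axes of the resulting map have no baseball meaning on their own; only the distances between players do.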
There are a lot of players in the middle of the plot, so many that it is difficult to read most of them. My favorite part of this is the location of Wily Mo Pena. If you have been following my home run analysis at all, you will know that he is such an outlier that Malcolm Gladwell could write a whole book about him. Here is how his four factors compare to those of Mark Trumbo (his most similar), Curtis Granderson, Derek Jeter, Chone Figgins, and John McDonald (his least similar):
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) | Sim To Pena |
Pena, Wily Mo | 417 | 109.0 | 84 | 11.6 | 1.0000 |
Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.7996 |
Granderson, Curtis | 388 | 102.6 | 89 | 16.3 | 0.442 |
Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.4118 |
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0846 |
McDonald, John | 370 | 100.3 | 80 | 25.5 | 0.066 |
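Pulling the most and least similar hitters for one player, as in the Pena comparison above, amounts to sorting that player's row of the similarity matrix. A minimal Python sketch, assuming the scores are stored as a nested dict keyed by player name (the layout and function name are hypothetical):

```python
def rank_by_similarity(sims, player):
    """Return (name, score) pairs for every other hitter, most similar first.

    sims is a nested dict: sims[a][b] is the similarity score of a to b.
    """
    others = [(name, score) for name, score in sims[player].items()
              if name != player]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


# Scores taken from the Pena table above (a small subset of the full matrix)
sims = {
    "Pena, Wily Mo": {"Pena, Wily Mo": 1.0, "Trumbo, Mark": 0.7996,
                      "Jeter, Derek": 0.4118, "McDonald, John": 0.066},
}
ranked = rank_by_similarity(sims, "Pena, Wily Mo")
print(ranked[0], ranked[-1])  # most and least similar
```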
Blake DeWitt is the least similar player to everyone else: even his closest match, Mike Lowell, scores only 0.7480. Only 22 of the 442 players have a highest sim score below 0.8.
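Finding each hitter's closest match, counting how many fall short of 0.8, and picking out the most unusual hitter can all be read straight off the similarity matrix. A minimal Python sketch, assuming the same hypothetical nested-dict layout (`sims[a][b]` is the score of a to b); the function names are illustrative only:

```python
def best_matches(sims):
    """Map each player to (closest other player, score)."""
    best = {}
    for player, row in sims.items():
        others = ((name, s) for name, s in row.items() if name != player)
        best[player] = max(others, key=lambda pair: pair[1])
    return best


def count_without_close_match(sims, threshold=0.8):
    """Count players whose highest sim score is below the threshold."""
    return sum(1 for _, score in best_matches(sims).values() if score < threshold)


def most_unique(sims):
    """The player whose closest match is the weakest of anyone's."""
    best = best_matches(sims)
    return min(best, key=lambda player: best[player][1])
```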
Since the 20-home-run chart was so jumbled, I increased the minimum to 50 home runs since 2006:
Let's compare Joe Mauer to Jeter, Carlos Beltran, Freddie Freeman, Marco Scutaro, and DeWitt.
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) | Sim To Mauer |
Mauer, Joe | 392 | 102.4 | 81 | -3.6 | 1.0000 |
Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.8748 |
Beltran, Carlos | 396 | 104.0 | 88 | 16.2 | 0.6025 |
Freeman, Freddie | 409 | 104.9 | 89 | 6.4 | 0.4950 |
Scutaro, Marco | 376 | 100.3 | 89 | 22.1 | 0.3777 |
DeWitt, Blake | 376 | 100.0 | 85 | 22.0 | 0.3066 |
There is still a large part of the middle that is difficult to read, so how about a minimum of 100 home runs?
Let's compare Justin Upton to Trumbo, Jose Bautista, Dan Uggla, Jimmy Rollins, and Figgins:
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) | Sim To Upton |
Upton, Justin | 416 | 107.1 | 90 | 8.9 | 1.0000 |
Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.8750 |
Bautista, Jose | 402 | 106.2 | 89 | 18.7 | 0.7043 |
Uggla, Dan | 397 | 103.9 | 90 | 15.6 | 0.6516 |
Rollins, Jimmy | 380 | 101.3 | 84 | 20.8 | 0.3424 |
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0854 |
I included this last one for fun: a minimum of 150 home runs, which leaves only the top home run hitters of the last half decade.
Let's compare David Ortiz to Torii Hunter, Matt Holliday, Mike Napoli, Paul Konerko, Matt Kemp, and Figgins:
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) | Sim To Ortiz |
Ortiz, David | 401 | 104.7 | 87 | 10.9 | 1.0000 |
Hunter, Torii | 405 | 105.2 | 87 | 10.8 | 0.9611 |
Holliday, Matt | 408 | 105.8 | 85 | 5.6 | 0.8522 |
Napoli, Mike | 401 | 103.9 | 97 | 6.3 | 0.8237 |
Konerko, Paul | 388 | 102.9 | 90 | 16.6 | 0.7447 |
Kemp, Matt | 403 | 103.2 | 93 | 3.0 | 0.7401 |
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.2281 |
As you can see, a lot of the top home run hitters are similar to each other. Here are the top 10 home run hitters since 2006 and their similarities to Albert Pujols:
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) | Sim To Pujols |
Pujols, Albert | 405 | 105.6 | 87 | 12.6 | 1.0000 |
Ortiz, David | 401 | 104.7 | 87 | 10.9 | 0.9326 |
Fielder, Prince | 407 | 105.6 | 88 | 10.0 | 0.9258 |
Rodriguez, Alex | 407 | 105.6 | 89 | 8.3 | 0.9079 |
Cabrera, Miguel | 401 | 105.3 | 86 | 7.0 | 0.8590 |
Dunn, Adam | 407 | 105.5 | 98 | 9.8 | 0.8540 |
Uggla, Dan | 397 | 103.9 | 90 | 15.6 | 0.8174 |
Teixeira, Mark | 395 | 104.2 | 89 | 16.3 | 0.8172 |
Howard, Ryan | 401 | 103.8 | 92 | -0.8 | 0.7487 |
Konerko, Paul | 388 | 102.9 | 90 | 16.6 | 0.6922 |
Only Howard and Konerko drop below a similarity of 0.8; most of the well-established thumpers do their damage in a similar way.
Finally, which hitters are the most similar at the 20-home-run minimum? Paul Konerko is Curtis Granderson's chu-chi face, with a 0.9762 similarity score.
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) |
Granderson, Curtis | 388 | 102.6 | 89 | 16.3 |
Konerko, Paul | 388 | 102.9 | 90 | 16.6 |
Which two players are the least similar? Phil Nevin and Chone Figgins at a minuscule 0.0238:
Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg, pull = +) |
Figgins, Chone | 371 | 100.0 | 79 | 23.2 |
Nevin, Phil | 406 | 104.1 | 93 | 7.2 |
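Finding the most and least similar pairs overall means scanning every pair of hitters once. A minimal Python sketch, again assuming a hypothetical nested dict where `sims[a][b]` is the score of a to b:

```python
from itertools import combinations


def extreme_pairs(sims):
    """Return the most similar and least similar (a, b, score) pairs."""
    pairs = [(a, b, sims[a][b]) for a, b in combinations(sorted(sims), 2)]
    most = max(pairs, key=lambda t: t[2])
    least = min(pairs, key=lambda t: t[2])
    return most, least
```

With 442 hitters that is 97,461 pairs, which is still trivial to scan.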
Summary
Fifteen players have hit at least 200 home runs since 2006, and their averages for the four factors were a 400-foot standard distance, 104.4 mph speed off bat, 89-foot apex and 10.0 degree horizontal exit angle. In contrast, the overall averages for those four factors are a 395-foot standard distance, 103.5 mph speed off bat, 87-foot apex and 13.0 degree horizontal exit angle. It appears that the top home run hitters hit the ball farther, faster, higher, and a bit more toward center than average. None of this is surprising, but it does serve as a reminder that different hitters hit their home runs in different ways.
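A minimal Python sketch of that comparison. The post doesn't say whether these averages pool every home run or average the per-player averages; this sketch pools every home run, which is an assumption, and the field names and function names are hypothetical.

```python
from statistics import mean

# Hypothetical field names for the four factors
FACTORS = ("std_distance", "speed_off_bat", "apex", "pull_angle")


def factor_averages(home_runs, factors=FACTORS):
    """Average each factor over a non-empty list of per-home-run dicts."""
    return {f: mean(hr[f] for hr in home_runs) for f in factors}


def compare_top_hitters(home_runs_by_player, min_hr=200):
    """Return (averages for players with >= min_hr homers, averages for everyone).

    Pools individual home runs; raises StatisticsError if nobody qualifies.
    """
    top = [hr for hrs in home_runs_by_player.values() if len(hrs) >= min_hr
           for hr in hrs]
    everyone = [hr for hrs in home_runs_by_player.values() for hr in hrs]
    return factor_averages(top), factor_averages(everyone)
```

Pooling gives prolific hitters more weight in the overall average; averaging per-player averages instead would treat every hitter equally.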
This idea could be fine-tuned by playing with the weights a bit and perhaps throwing elevation angle into the mix. I would also like to see hitter similarity scores, perhaps using hit locations and hit type. Regardless, I hope you enjoyed this look at the different types of home run hitters and that you look for these things while watching the game.