Home Run Similarity Scores

Applying previously established similarity techniques to home run data to find which players hit home runs in a similar manner.

Mark Trumbo is Everything Wily Mo Pena Wishes He Could Be
Elsa

Introduction

There has been a lot of discussion about pitcher similarity, and rightfully so. It is extremely interesting to compare two hurlers and find those with the same stuff but varying results. I have always wanted to get involved in this area of sabermetrics, but with my minimal understanding of statistical techniques I don't have much to add. I can, however, apply others' techniques to different ideas.

I have been studying the ESPN Home Run Tracker data for a while now and the next logical step is to compare how different hitters hit home runs. Wily Mo Pena's home runs look a lot different from Chone Figgins'. Is it possible to create similarity scores for home runs? The answer may surprise you!*

*If you're bad at guessing.

Method

I'm going to be perfectly honest: this method was all Stephen Loftus, so please read his article on pitcher similarity scores to understand more about how these scores were calculated. The idea and the data were mine (well, the data is Greg Rybarczyk's or ESPN's), but none of this would have been possible without Stephen's help. He literally wrote the code for me and debugged everything when I ran into problems.

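This was all done in R using the Kolmogorov-Smirnov (K-S) test, which compares the full distributions of two samples rather than just their averages.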


I decided to evenly weight standard distance, speed off bat, apex, and horizontal exit angle to compare each hitter with at least 20 total home runs since 2006. The only data I had to change a bit was the exit angle, since handedness was an issue. I wanted extreme lefty and righty pull hitters to be similar to each other. In order to do this, I shifted the exit angle from a 45 to 135 scale to a -45 to 45 scale. Then I made all pull home runs positive and all oppo tacos (opposite-field home runs) negative.
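The exit angle adjustment can be sketched in a few lines. This is a Python illustration rather than the R that was actually used, and it rests on an assumption the data description above does not spell out: that 45 is the right-field line, 90 is dead center, and 135 is the left-field line. The function name `pull_angle` and the `bats` argument are hypothetical.

```python
def pull_angle(horizontal_angle, bats):
    """Convert a 45-to-135 degree exit angle to a pull-positive -45-to-45 scale.

    Assumption (not confirmed by the source data): 45 is the right-field
    line, 90 is dead center, and 135 is the left-field line.
    """
    if bats not in ("L", "R"):
        raise ValueError("bats must be 'L' or 'R'")
    # Shift so that center field is 0: -45 (right field) to 45 (left field).
    centered = horizontal_angle - 90.0
    # A righty pulls to left field, so the shifted angle is already
    # pull-positive. A lefty pulls to right field, so flip the sign.
    return centered if bats == "R" else -centered


# A righty down the left-field line and a lefty down the right-field line
# both come out as extreme pull shots.
print(pull_angle(135, "R"))  # 45.0
print(pull_angle(45, "L"))   # 45.0
print(pull_angle(100, "L"))  # -10.0, an opposite-field shot for a lefty
```

For switch hitters, the batting side would have to be recorded per home run rather than per player for this to work.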

When comparing hitters in the results, I will show the average totals for these four factors -- however, in the actual analysis I used the data points for every home run, not just averages.
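To make the idea concrete, here is a minimal Python sketch of one way a K-S-based similarity score could be built from every home run rather than from averages. The way the four K-S statistics are combined here (1 minus their equally weighted mean) is an assumption for illustration, not necessarily what Stephen's R code does, and the two hitters and all of their numbers are made up.

```python
from bisect import bisect_right

FACTORS = ("distance", "speed", "apex", "angle")


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the largest gap always occurs at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def similarity(hitter_a, hitter_b, weights=None):
    """Assumed scoring rule: 1 minus the weighted mean K-S statistic."""
    weights = weights or {f: 1.0 / len(FACTORS) for f in FACTORS}
    return 1.0 - sum(
        weights[f] * ks_statistic(hitter_a[f], hitter_b[f]) for f in FACTORS
    )


# Hypothetical per-home-run data for two invented hitters.
slugger = {
    "distance": [430, 415, 441, 408, 422],
    "speed": [110.2, 107.5, 111.0, 106.8, 109.1],
    "apex": [85, 92, 78, 88, 81],
    "angle": [12.0, 25.5, 8.1, -3.2, 15.0],
}
slap_hitter = {
    "distance": [372, 365, 380, 369, 377],
    "speed": [99.8, 100.5, 101.2, 99.1, 100.9],
    "apex": [79, 83, 75, 80, 77],
    "angle": [24.0, 28.3, 19.5, 26.1, 22.2],
}

print(similarity(slugger, slugger))      # 1.0, a hitter matches himself
print(similarity(slugger, slap_hitter))  # much lower
```

Under this rule, two hitters whose distributions never overlap on any factor score 0, and changing the `weights` dictionary is where the even weighting could be adjusted.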

Results

Excel spreadsheet with all similarity scores for hitters with more than 20 home runs since 2006

Dan Brooks gave me the idea to use classical multidimensional scaling (MDS) to show the results visually. Stephen used non-metric MDS (NMDS) in his article. From what I understand, MDS places objects on a plot based on their similarities while minimizing the stress (the mismatch between the plotted distances and the actual dissimilarities), such that things that are farther away from each other are more different than things that are closer together.
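For anyone curious what classical MDS does under the hood, here is a rough pure-Python sketch. It assumes each similarity is turned into a distance as 1 minus the score, which is an assumption of this example and not a confirmed detail of the plots here. It uses power iteration to keep the code dependency-free, which is a simplification that works when the largest eigenvalues are positive and well separated. The three hitters and their similarity values are hypothetical.

```python
import math
import random


def classical_mds(dist, dims=2, iterations=500, seed=0):
    """Place points in `dims` dimensions so their distances approximate `dist`.

    `dist` is a symmetric list-of-lists matrix of pairwise distances.
    """
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Start from a random unit vector, then run power iteration, which
        # converges to the eigenvector with the largest-magnitude eigenvalue.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue that goes with v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Hypothetical similarity scores for three invented hitters A, B, and C.
sims = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
distances = [[1.0 - s for s in row] for row in sims]  # assumed conversion
for name, (x, y) in zip("ABC", classical_mds(distances)):
    print(name, round(x, 3), round(y, 3))
```

In the output, A and B land close together and C sits far from both, which is the same reading the plots are meant to support. The axes themselves have no meaning, and the whole picture can come out rotated or mirrored.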

[Figure: MDS plot of hitters with at least 20 home runs since 2006]

There are a lot of players in the middle -- so many that it is difficult to read most of them. My favorite part of this is the location of Wily Mo Pena. If you have been following my home run analysis at all, you will know that he is such an outlier, Malcolm Gladwell could write a whole book about him. Here is how his four factors compare to those of Mark Trumbo (his most similar), Curtis Granderson, Derek Jeter, Chone Figgins, and John McDonald (his least similar):

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg) | Sim to Pena
Pena, Wily Mo | 417 | 109.0 | 84 | 11.6 | 1.0000
Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.7996
Granderson, Curtis | 388 | 102.6 | 89 | 16.3 | 0.442
Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.4118
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0846
McDonald, John | 370 | 100.3 | 80 | 25.5 | 0.066

Blake DeWitt is the least similar player to everyone. His highest sim score is to Mike Lowell at 0.7480. Only 22 of 442 players have a highest sim score below 0.8.
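Numbers like these come from looking up each player's best match in the full table of similarity scores. Here is a small Python sketch of that lookup; the four players and their scores are hypothetical stand-ins, and treating "least similar to everyone" as "lowest best match" is just one possible reading.

```python
def best_matches(sim):
    """For each player, return (most similar other player, that score)."""
    best = {}
    for player, row in sim.items():
        other = max((name for name in row if name != player), key=row.get)
        best[player] = (other, row[other])
    return best


# Hypothetical similarity scores for four invented players.
sim = {
    "A": {"A": 1.0, "B": 0.95, "C": 0.60, "D": 0.30},
    "B": {"A": 0.95, "B": 1.0, "C": 0.85, "D": 0.40},
    "C": {"A": 0.60, "B": 0.85, "C": 1.0, "D": 0.70},
    "D": {"A": 0.30, "B": 0.40, "C": 0.70, "D": 1.0},
}

matches = best_matches(sim)
# The player whose best match is weakest (one reading of "least similar").
loner = min(matches, key=lambda player: matches[player][1])
# How many players have no match at or above the 0.8 mark.
below = sum(1 for _, score in matches.values() if score < 0.8)

print(loner, matches[loner])  # D ('C', 0.7)
print(below)                  # 1
```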

Since that chart was so jumbled, I increased the minimum to 50 home runs since 2006:

[Figure: MDS plot of hitters with at least 50 home runs since 2006]

Let's compare Joe Mauer to Jeter, Carlos Beltran, Freddie Freeman, Marco Scutaro, and DeWitt.

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg) | Sim to Mauer
Mauer, Joe | 392 | 102.4 | 81 | -3.6 | 1.0000
Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.8748
Beltran, Carlos | 396 | 104.0 | 88 | 16.2 | 0.6025
Freeman, Freddie | 409 | 104.9 | 89 | 6.4 | 0.4950
Scutaro, Marco | 376 | 100.3 | 89 | 22.1 | 0.3777
DeWitt, Blake | 376 | 100.0 | 85 | 22.0 | 0.3066

There is still a large part of the middle that is difficult to read, so how about a minimum of 100 home runs?

[Figure: MDS plot of hitters with at least 100 home runs since 2006]

Let's compare Justin Upton to Trumbo, Jose Bautista, Dan Uggla, Jimmy Rollins, and Figgins:

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg) | Sim to Upton
Upton, Justin | 416 | 107.1 | 90 | 8.9 | 1.0000
Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.8750
Bautista, Jose | 402 | 106.2 | 89 | 18.7 | 0.7043
Uggla, Dan | 397 | 103.9 | 90 | 15.6 | 0.6516
Rollins, Jimmy | 380 | 101.3 | 84 | 20.8 | 0.3424
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0854

I included this last one for fun. This is minimum 150 home runs, the top home run hitters of the last half decade.

[Figure: MDS plot of hitters with at least 150 home runs since 2006]

Let's compare David Ortiz to Torii Hunter, Matt Holliday, Mike Napoli, Paul Konerko, Matt Kemp, and Figgins:

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg) | Sim to Ortiz
Ortiz, David | 401 | 104.7 | 87 | 10.9 | 1.0000
Hunter, Torii | 405 | 105.2 | 87 | 10.8 | 0.9611
Holliday, Matt | 408 | 105.8 | 85 | 5.6 | 0.8522
Napoli, Mike | 401 | 103.9 | 97 | 6.3 | 0.8237
Konerko, Paul | 388 | 102.9 | 90 | 16.6 | 0.7447
Kemp, Matt | 403 | 103.2 | 93 | 3.0 | 0.7401
Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.2281

As you can see, a lot of the top home run hitters are similar to each other. Here are the top 10 home run hitters since 2006 and their similarities to Albert Pujols:

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg) | Sim to Pujols
Pujols, Albert | 405 | 105.6 | 87 | 12.6 | 1.0000
Ortiz, David | 401 | 104.7 | 87 | 10.9 | 0.9326
Fielder, Prince | 407 | 105.6 | 88 | 10.0 | 0.9258
Rodriguez, Alex | 407 | 105.6 | 89 | 8.3 | 0.9079
Cabrera, Miguel | 401 | 105.3 | 86 | 7.0 | 0.8590
Dunn, Adam | 407 | 105.5 | 98 | 9.8 | 0.8540
Uggla, Dan | 397 | 103.9 | 90 | 15.6 | 0.8174
Teixeira, Mark | 395 | 104.2 | 89 | 16.3 | 0.8172
Howard, Ryan | 401 | 103.8 | 92 | -0.8 | 0.7487
Konerko, Paul | 388 | 102.9 | 90 | 16.6 | 0.6922

Howard and Konerko are the only ones to drop below a similarity of 0.8, but most of the well-established thumpers do their damage in a similar way.

Finally, which hitters are the most similar at the 20 minimum home run threshold? Paul Konerko is Curtis Granderson's chu-chi face (turn your speakers to full blast before clicking that link) with a 0.9762 similarity score.

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg)
Granderson, Curtis | 388 | 102.6 | 89 | 16.3
Konerko, Paul | 388 | 102.9 | 90 | 16.6

Which two players are the least similar? Phil Nevin and Chone Figgins at a minuscule 0.0238:

Hitter | Avg Std. Distance (ft) | Avg Speed Off Bat (mph) | Avg Apex (ft) | Avg Horizontal Angle (deg)
Figgins, Chone | 371 | 100.0 | 79 | 23.2
Nevin, Phil | 406 | 104.1 | 93 | 7.2

Summary

Fifteen players have hit at least 200 home runs since 2006, and their averages for the four factors were 400 feet standard distance, 104.4 mph speed off bat, 89-foot apex, and 10.0 degrees horizontal exit angle. In contrast, the overall averages for those four factors are 395 feet standard distance, 103.5 mph speed off bat, 87-foot apex, and 13.0 degrees horizontal exit angle. It appears that the top home run hitters hit the ball farther, faster, higher, and a bit more to center than average. None of this is surprising, but it does serve as a reminder that different hitters hit their home runs in different ways.

This idea could be fine-tuned by playing with the weights a bit and perhaps throwing elevation angle into the mix. I would also like to see hitter similarity scores, perhaps using hit locations and hit type. Regardless, I hope you enjoyed this look at the different types of home run hitters and that you look for these things while watching the game.