## Home Run Similarity Scores

Mark Trumbo is Everything Wily Mo Pena Wishes He Could Be - Elsa

Applying previously established similarity techniques to home run data to find which players hit home runs in a similar manner.

### Introduction

There has been a lot of previous discussion about pitcher similarity, and rightfully so. It is extremely interesting to compare two hurlers and find those with the same stuff but varying results. I had always wanted to get involved in this area of sabermetrics, but don't have much to add with my minimal understanding of statistical techniques. However, I can apply others' techniques to different ideas.

I have been studying the ESPN Home Run Tracker data for a while now and the next logical step is to compare how different hitters hit home runs. Wily Mo Pena's home runs look a lot different from Chone Figgins'. Is it possible to create similarity scores for home runs? The answer may surprise you!*

### Method

I'm going to be perfectly honest: this method was all Stephen Loftus, so please read his article on pitcher similarity scores to understand more about how these scores were calculated. The idea and the data were mine (well, the data is Greg Rybarczyk's or ESPN's), but none of this would be possible without Stephen's help. He literally wrote the code for me and debugged everything when I ran into problems.

This was all done in R using a K-S test.

I decided to evenly weight standard distance, speed off bat, apex, and horizontal exit angle to compare each hitter with at least 20 total home runs since 2006. The only data I had to change a bit was the exit angle, since handedness was an issue: I wanted extreme lefty and righty pull hitters to be similar to each other. To do this, I shifted the exit angle from a 45 to 135 scale to a -45 to 45 scale. Then I made all pull home runs positive and all oppo tacos (opposite-field home runs) negative.
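The exit-angle adjustment described above can be sketched in a few lines. This is a Python illustration, not the original R code. The function name `pull_angle`, the `bats` argument, and the convention that 45 is the right-field line and 135 is the left-field line are all assumptions made for the example.

```python
def pull_angle(horizontal_angle, bats):
    """Convert a 45-to-135 horizontal exit angle to a -45-to-45 pull angle.

    Assumed convention (not stated in the article): 45 is the right-field
    line, 90 is dead center, 135 is the left-field line.
    Positive output = pulled, negative output = opposite field.
    """
    if bats not in ("L", "R"):
        raise ValueError("bats must be 'L' or 'R'")
    centered = horizontal_angle - 90.0  # now -45 (RF line) to 45 (LF line)
    # A right-handed hitter pulls toward left field, so the sign is already right.
    # A left-handed hitter pulls toward right field, so flip the sign.
    return centered if bats == "R" else -centered


# A righty's shot toward the left-field line and a lefty's shot toward the
# right-field line both come out as strongly pulled home runs.
print(pull_angle(130.0, "R"))  # 40.0
print(pull_angle(50.0, "L"))   # 40.0
print(pull_angle(100.0, "L"))  # -10.0 (opposite field for a lefty)
```

With this encoding, handedness drops out of the comparison: only how far a ball was pulled or pushed matters.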

When comparing hitters in the results, I will show the average totals for these four factors -- however, in the actual analysis I used the data points for every home run, not just averages.
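To make the idea of comparing full distributions rather than averages concrete, here is a rough Python sketch. The article only says that a K-S test was used with equal weights on the four factors; the exact scoring formula lives in Stephen Loftus's article and code, so the rule used here (1 minus the weighted mean of the two-sample K-S statistics) is an assumption, as are the field names and the made-up home run logs. SciPy's `scipy.stats.ks_2samp` computes the same statistic; it is written out by hand here to keep the example dependency-free.

```python
from bisect import bisect_right

FACTORS = ("distance", "speed", "apex", "angle")  # hypothetical field names


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def similarity(hitter_a, hitter_b, weights=None):
    """Assumed scoring rule: 1 minus the weighted mean K-S statistic."""
    weights = weights or {f: 1.0 / len(FACTORS) for f in FACTORS}
    return 1.0 - sum(
        weights[f] * ks_statistic(hitter_a[f], hitter_b[f]) for f in FACTORS
    )


# Made-up home run logs (one value per home run), purely for illustration.
slugger = {"distance": [420, 431, 405, 448], "speed": [108, 110, 106, 112],
           "apex": [85, 92, 78, 101], "angle": [12, 20, 5, 15]}
slap_hitter = {"distance": [365, 372, 380, 368], "speed": [99, 100, 101, 100],
               "apex": [70, 82, 75, 79], "angle": [25, 30, 22, 28]}

print(similarity(slugger, slugger))      # 1.0
print(similarity(slugger, slap_hitter))
```

Because the K-S statistic looks at the whole distribution, two hitters with the same average distance but very different spreads would still score as dissimilar on that factor.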

### Results

#### Excel spreadsheet with all similarity scores for hitters with more than 20 home runs since 2006

Dan Brooks gave me the idea to use classical multidimensional scaling (MDS) to show the results visually. Stephen used non-metric MDS (NMDS) in his article. From what I understand, MDS arranges the objects on a plot so that the distances between them match their dissimilarities as closely as possible, such that things that are farther away from each other are more different than things that are closer together.
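For readers curious what classical MDS does under the hood, here is a minimal Python sketch. It is not the code behind the charts in this article (that work was done in R). It assumes that dissimilarity is taken as 1 minus the similarity score and that the dissimilarities behave roughly like ordinary distances; the function name `classical_mds` and the toy matrix are hypothetical.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS for a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-center the squared distances to get the Gram matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration: repeatedly multiply by B to find its dominant eigenvector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (v has unit length).
        eigval = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy dissimilarities (assumed here to be 1 - similarity) among three made-up hitters.
dissim = [[0.0, 0.2, 0.9],
          [0.2, 0.0, 0.8],
          [0.9, 0.8, 0.0]]
for name, (x, y) in zip(["A", "B", "C"], classical_mds(dissim)):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Hitters A and B land close together and C lands far from both, which is the same reading the charts invite: distance on the plot stands in for dissimilarity.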

There are a lot of players in the middle -- so many that it is difficult to read most of them. My favorite part of this is the location of Wily Mo Pena. If you have been following my home run analysis at all, you will know that he is such an outlier, Malcolm Gladwell could write a whole book about him. Here is how his four factors compare to those of Mark Trumbo (his most similar), Curtis Granderson, Derek Jeter, Chone Figgins, and John McDonald (his least similar):

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) | Sim to Pena |
| --- | --- | --- | --- | --- | --- |
| Pena, Wily Mo | 417 | 109.0 | 84 | 11.6 | 1.0000 |
| Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.7996 |
| Granderson, Curtis | 388 | 102.6 | 89 | 16.3 | 0.442 |
| Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.4118 |
| Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0846 |
| McDonald, John | 370 | 100.3 | 80 | 25.5 | 0.066 |

Blake DeWitt is the least similar player to everyone. His highest sim score is to Mike Lowell at 0.7480. Only 22 of 442 players have a highest sim score below 0.8.

Since that chart was so jumbled, I increased the minimum to 50 home runs since 2006:

Let's compare Joe Mauer to Jeter, Carlos Beltran, Freddie Freeman, Marco Scutaro, and DeWitt.

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) | Sim to Mauer |
| --- | --- | --- | --- | --- | --- |
| Mauer, Joe | 392 | 102.4 | 81 | -3.6 | 1.0000 |
| Jeter, Derek | 389 | 102.1 | 82 | -1.6 | 0.8748 |
| Beltran, Carlos | 396 | 104.0 | 88 | 16.2 | 0.6025 |
| Freeman, Freddie | 409 | 104.9 | 89 | 6.4 | 0.4950 |
| Scutaro, Marco | 376 | 100.3 | 89 | 22.1 | 0.3777 |
| DeWitt, Blake | 376 | 100.0 | 85 | 22.0 | 0.3066 |

There is still a large part of the middle that is difficult to read, so how about a minimum of 100 home runs?

Let's compare Justin Upton to Trumbo, Jose Bautista, Dan Uggla, Jimmy Rollins, and Figgins:

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) | Sim to Upton |
| --- | --- | --- | --- | --- | --- |
| Upton, Justin | 416 | 107.1 | 90 | 8.9 | 1.0000 |
| Trumbo, Mark | 411 | 107.1 | 89 | 12.2 | 0.8750 |
| Bautista, Jose | 402 | 106.2 | 89 | 18.7 | 0.7043 |
| Uggla, Dan | 397 | 103.9 | 90 | 15.6 | 0.6516 |
| Rollins, Jimmy | 380 | 101.3 | 84 | 20.8 | 0.3424 |
| Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.0854 |

I included this last one for fun. It uses a minimum of 150 home runs, which leaves only the top home run hitters of the last half decade.

Let's compare David Ortiz to Torii Hunter, Matt Holliday, Mike Napoli, Paul Konerko, Matt Kemp, and Figgins:

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) | Sim to Ortiz |
| --- | --- | --- | --- | --- | --- |
| Ortiz, David | 401 | 104.7 | 87 | 10.9 | 1.0000 |
| Hunter, Torii | 405 | 105.2 | 87 | 10.8 | 0.9611 |
| Holliday, Matt | 408 | 105.8 | 85 | 5.6 | 0.8522 |
| Napoli, Mike | 401 | 103.9 | 97 | 6.3 | 0.8237 |
| Konerko, Paul | 388 | 102.9 | 90 | 16.6 | 0.7447 |
| Kemp, Matt | 403 | 103.2 | 93 | 3.0 | 0.7401 |
| Figgins, Chone | 371 | 100.0 | 79 | 23.2 | 0.2281 |

As you may see, a lot of the top home run hitters are similar to each other. Here are the top 10 home run hitters since 2006 and their similarities to Albert Pujols:

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) | Sim to Pujols |
| --- | --- | --- | --- | --- | --- |
| Albert Pujols | 405 | 105.6 | 87 | 12.6 | 1.0000 |
| David Ortiz | 401 | 104.7 | 87 | 10.9 | 0.9326 |
| Prince Fielder | 407 | 105.6 | 88 | 10.0 | 0.9258 |
| Alex Rodriguez | 407 | 105.6 | 89 | 8.3 | 0.9079 |
| Miguel Cabrera | 401 | 105.3 | 86 | 7.0 | 0.8590 |
| Adam Dunn | 407 | 105.5 | 98 | 9.8 | 0.8540 |
| Dan Uggla | 397 | 103.9 | 90 | 15.6 | 0.8174 |
| Mark Teixeira | 395 | 104.2 | 89 | 16.3 | 0.8172 |
| Ryan Howard | 401 | 103.8 | 92 | -0.8 | 0.7487 |
| Paul Konerko | 388 | 102.9 | 90 | 16.6 | 0.6922 |

Howard and Konerko are the only ones to drop below a similarity of 0.8, but most of the well-established thumpers do their damage in a similar way.

Finally, which hitters are the most similar at the 20 minimum home run threshold? Paul Konerko is Curtis Granderson's chu-chi face (turn your speakers to full blast before clicking that link) with a 0.9762 similarity score.

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) |
| --- | --- | --- | --- | --- |
| Granderson, Curtis | 388 | 102.6 | 89 | 16.3 |
| Konerko, Paul | 388 | 102.9 | 90 | 16.6 |

Which two players are the least similar? Phil Nevin and Chone Figgins at a minuscule 0.0238:

| Hitter | Avg. Std. Distance (ft) | Avg. Speed Off Bat (mph) | Avg. Apex (ft) | Avg. Horizontal Angle (°) |
| --- | --- | --- | --- | --- |
| Figgins, Chone | 371 | 100.0 | 79 | 23.2 |
| Nevin, Phil | 406 | 104.1 | 93 | 7.2 |

### Summary

Fifteen players have hit at least 200 home runs since 2006, and their averages for the four factors were 400 feet standard distance, 104.4 mph speed off bat, 89-foot apex, and 10.0 degrees horizontal exit angle. In contrast, the overall averages for those four factors are 395 feet standard distance, 103.5 mph speed off bat, 87-foot apex, and 13.0 degrees horizontal exit angle. It appears that the top home run hitters hit the ball farther, faster, higher, and a bit more to center than average. None of this is surprising, but it does serve as a reminder that different hitters hit their home runs in different ways.

This idea could be fine-tuned by playing with the weights a bit and perhaps throwing elevation angle into the mix. I would also like to see hitter similarity scores, perhaps using hit locations and hit type. Regardless, I hope you enjoyed this look at the different types of home run hitters, and that you look for these things while watching the game.
