
Experimenting With Clustering - Offense

 

This post grew out of me asking myself, "Self, if you were going to delve into the world of projecting offense, how would you go about it?" My answer was that I’d take a basic Marcels approach and add some extra regression/weighting based on batted ball (plus a little extra) profiles. That approach would require binning players by batted ball profile, so I immediately thought of k-means clustering in R. The rest of this post is my brief exploration of batted ball profile clustering.

Using FanGraphs’ 2009 stats (filtered to just the qualifiers), I created clusters based on the following sets of statistics.

 

LD GB FB IFF
LD GB FB IFF HR
LD HR BB
HR BB K
GB FB ISO SPD
BB K

 

IFF = infield fly, HR = HR/FB%
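For anyone who wants to see what the clustering step does mechanically, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python rather than the R used for this post. The function name `kmeans`, the seed, and the choice of k = 2 are illustrative assumptions, and the six (LD%, HR/FB%, BB%) rows are borrowed from a table later in the post purely as sample input. The columns are not rescaled, which is also an assumption.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct rows
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # nothing moved, so the centers are stable too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Sample input only: (LD%, HR/FB%, BB%) for six 2009 qualifiers from this post.
profiles = [
    (15, 15, 9), (20, 18, 15), (19, 12, 8),
    (21, 5, 12), (17, 15, 8), (17, 16, 14),
]
centers, labels = kmeans(profiles, k=2)
for profile, label in zip(profiles, labels):
    print(profile, "-> cluster", label)
```

Different seeds can give different clusters, which is why k-means is usually run from several random starts with the best result kept.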

The full lists of clusters can be found here, and I’ll discuss some of the things I found interesting after the jump.

 

Not surprisingly, the sets of stats that included some version of walk rate did better (anecdotally at least) at separating the good hitters (based on wOBA) from the bad ones, but if one wishes to look only at the physical batted ball profiles, then adding in HR/FB weeds out some of the noise. I found it mildly amusing that if you only look at batted ball types (excluding HR/FB), Yuniesky Betancourt and Albert Pujols fall in the same cluster. The set of clusters I decided to focus on was the one based on LD, HR/FB, BB. Here are its cluster centers, along with the average wOBA of each cluster.

Cluster  LD   HR/FB  BB   wOBA
1        17%  17%    10%  0.369
2        23%  6%     14%  0.353
3        19%  10%    5%   0.330
4        16%  9%     8%   0.326
5        20%  13%    9%   0.362
6        18%  12%    13%  0.363
7        19%  23%    15%  0.398
8        19%  4%     7%   0.315
10       19%  18%    14%  0.392

 

And here are a few guys who stand out for having a low wOBA relative to their cluster (potential for improvement, maybe?):

Name             LD%  HR/FB  BB%  Cluster  wOBA   Cluster wOBA
Brandon Inge     15%  15%    9%   1        0.315  0.369
Jack Cust        20%  18%    15%  10       0.342  0.392
Alfonso Soriano  19%  12%    8%   5        0.314  0.362
Russell Martin   21%  5%     12%  2        0.307  0.353
Mark DeRosa      17%  15%    8%   1        0.327  0.369
Dan Uggla        17%  16%    14%  10       0.354  0.392
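One simple way to rank players like these is by the gap between the cluster's average wOBA and the player's own wOBA. The sketch below does that with the six rows from the table above. The variable names and the idea of sorting by raw gap are my own illustration, not necessarily how the list was originally put together.

```python
# (name, player wOBA, cluster average wOBA), copied from the table above.
rows = [
    ("Brandon Inge", 0.315, 0.369),
    ("Jack Cust", 0.342, 0.392),
    ("Alfonso Soriano", 0.314, 0.362),
    ("Russell Martin", 0.307, 0.353),
    ("Mark DeRosa", 0.327, 0.369),
    ("Dan Uggla", 0.354, 0.392),
]

def woba_gap(row):
    """How far the player's wOBA sits below his cluster's average."""
    _, player_woba, cluster_woba = row
    return cluster_woba - player_woba

# Largest shortfall first.
ranked = sorted(rows, key=woba_gap, reverse=True)
for row in ranked:
    print(f"{row[0]:16s} {woba_gap(row):.3f}")
```

A raw gap ignores how spread out each cluster is, so dividing by the cluster's wOBA standard deviation (not given in this post) would be a natural refinement.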


Finally, I took a look at the players with fewer PAs (100-300) to see which cluster they fell in (note: I didn’t re-cluster, I just checked which center each player was closest to). The following fell in the high wOBA clusters (1, 5, 6). Note: no players fell in clusters 7 or 10.

 

Name            Team          PA   LD%  HR/FB  BB%  wOBA   Cluster
Randy Ruiz      Blue Jays     130  11%  31%    8%   0.428  1
Kyle Blanks     Padres        172  13%  21%    11%  0.372  1
Rickie Weeks    Brewers       162  19%  19%    8%   0.365  6
Justin Maxwell  Nationals     102  14%  19%    12%  0.357  1
Rocco Baldelli  Red Sox       164  18%  17%    7%   0.326  6
Ryan Raburn     Tigers        291  15%  17%    9%   0.378  6
Drew Stubbs     Reds          196  21%  17%    8%   0.335  5
David Ross      Braves        151  22%  16%    14%  0.386  5
Matt Stairs     Phillies      129  11%  16%    18%  0.327  1
Brandon Allen   Diamondbacks  116  17%  16%    10%  0.288  6
Landon Powell   Athletics     155  18%  15%    9%   0.315  6
Ramon Castro    - - -         171  22%  14%    9%   0.304  5
Marcus Thames   Tigers        294  18%  14%    10%  0.329  6
Travis Snider   Blue Jays     276  15%  14%    11%  0.327  6
Mat Gamel       Brewers       148  27%  13%    12%  0.332  5
Jayson Nix      White Sox     290  13%  13%    10%  0.319  6
Carlos Delgado  Mets          112  20%  13%    11%  0.394  5
Eric Hinske     - - -         224  18%  13%    12%  0.344  6
Andres Torres   Giants        170  17%  13%    10%  0.379  6
Alex Gordon     Royals        189  14%  12%    11%  0.321  6
Gabe Kapler     Rays          238  23%  11%    12%  0.334  5
Chris Snyder    Diamondbacks  202  17%  11%    16%  0.304  6
Jesus Flores    Nationals     106  18%  11%    11%  0.375  6
Chris Gimenez   Indians       130  19%  10%    13%  0.233  6
Austin Kearns   Nationals     211  19%  7%     16%  0.298  6
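The "closest center" lookup can be sketched in a few lines of Python. This version assumes plain Euclidean distance on the rounded, unscaled cluster centers listed earlier in the post. The post doesn't say which distance or scaling was used, so this sketch will not necessarily reproduce every assignment in the table above. The names `CENTERS` and `nearest_cluster` are made up for the example.

```python
import math

# Cluster centers as (LD%, HR/FB%, BB%), copied from the centers table
# earlier in the post. The post lists no cluster 9.
CENTERS = {
    1: (17, 17, 10),
    2: (23, 6, 14),
    3: (19, 10, 5),
    4: (16, 9, 8),
    5: (20, 13, 9),
    6: (18, 12, 13),
    7: (19, 23, 15),
    8: (19, 4, 7),
    10: (19, 18, 14),
}

def nearest_cluster(profile, centers=CENTERS):
    """Return the id of the center closest to profile (assumed Euclidean)."""
    return min(centers, key=lambda cid: math.dist(profile, centers[cid]))

# Jack Cust's 2009 line from the post: LD 20%, HR/FB 18%, BB 15%.
cust_cluster = nearest_cluster((20, 18, 15))
print(cust_cluster)
```

For Cust's line this returns cluster 10, which matches his cluster in the earlier table of underperformers.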

 

Seeing as how I'm probably not going to venture into the world of projections (there are already plenty of people who do a much better job than I could), this all boils down to an interesting thought experiment. That being said, I thought someone out there might have a use for the data.