This post grew out of me asking myself, "Self, if you were going to delve into the world of projecting offense, how would you go about it?" My answer was that I’d take a basic Marcels approach and add some extra regression/weighting based on batted ball profiles (plus a little extra). That approach would require me to bin players by batted ball profile, so I immediately thought of k-means clustering in R. The rest of this post is my brief exploration of batted ball profile clustering.
Using FanGraphs’ 2009 stats (filtered to just the qualifiers), I created clusters based on the following sets of statistics:
Set | Statistics |
---|---|
1 | LD, GB, FB, IFF |
2 | LD, GB, FB, IFF, HR |
3 | LD, HR, BB |
4 | HR, BB, K |
5 | GB, FB, ISO, SPD |
6 | BB, K |
IFF = infield fly, HR = HR/FB%
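The clustering in this post was done with k-means in R; what follows is only a minimal pure-Python sketch of the same idea (Lloyd's algorithm), not the code behind the post. The six sample rows are the (LD%, HR/FB, BB%) values of the players shown in the low-wOBA table further down, and `k = 2`, the "first k points" starting centers, and the function name `kmeans` are all assumptions made purely for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=100):
    """Toy k-means: assign each point to its nearest center, then re-average."""
    # Assumption: start from the first k points so the run is repeatable.
    # Real implementations usually pick random starting centers.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the closest center for every point.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels

# (LD%, HR/FB, BB%) for six qualifiers listed later in this post.
profiles = [
    (15, 15, 9),   # Brandon Inge
    (20, 18, 15),  # Jack Cust
    (19, 12, 8),   # Alfonso Soriano
    (21, 5, 12),   # Russell Martin
    (17, 15, 8),   # Mark DeRosa
    (17, 16, 14),  # Dan Uggla
]

centers, labels = kmeans(profiles, k=2)
print(labels)
print(centers)
```

With only six points and two clusters this says nothing about the real results; it just shows the assign-then-average loop that k-means repeats until the labels stop changing.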
The full lists of clusters can be found here, and I’ll discuss some of the things I found interesting after the jump.
Not surprisingly, the sets of stats that included some version of walk rate did better (anecdotally at least) at separating the good hitters (based on wOBA) from the bad ones, but if one wishes to look only at the physical batted ball profiles, then adding in HR/FB weeds out some of the noise. I found it mildly amusing that if you look only at batted ball types (excluding HR/FB), Yuniesky Betancourt and Albert Pujols fall in the same cluster. The set of clusters I decided to focus on was the one based on LD, HR/FB, and BB. Here are its cluster centers, along with the average wOBA of each cluster.
Cluster | LD | HR/FB | BB | wOBA |
---|---|---|---|---|
1 | 17% | 17% | 10% | 0.369 |
2 | 23% | 6% | 14% | 0.353 |
3 | 19% | 10% | 5% | 0.330 |
4 | 16% | 9% | 8% | 0.326 |
5 | 20% | 13% | 9% | 0.362 |
6 | 18% | 12% | 13% | 0.363 |
7 | 19% | 23% | 15% | 0.398 |
8 | 19% | 4% | 7% | 0.315 |
10 | 19% | 18% | 14% | 0.392 |
And here are a few guys who stand out for having a low wOBA relative to their cluster (potential for improvement, maybe?):
Name | LD% | HR/FB | BB% | Cluster | wOBA | Cluster wOBA |
---|---|---|---|---|---|---|
Brandon Inge | 15% | 15% | 9% | 1 | 0.315 | 0.369 |
Jack Cust | 20% | 18% | 15% | 10 | 0.342 | 0.392 |
Alfonso Soriano | 19% | 12% | 8% | 5 | 0.314 | 0.362 |
Russell Martin | 21% | 5% | 12% | 2 | 0.307 | 0.353 |
Mark DeRosa | 17% | 15% | 8% | 1 | 0.327 | 0.369 |
Dan Uggla | 17% | 16% | 14% | 10 | 0.354 | 0.392 |
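To make the "low wOBA relative to their cluster" comparison concrete, here is a small sketch that computes each player's gap to his cluster's average wOBA. All numbers are copied from the two tables above; the variable names are hypothetical, and the post does not state any cutoff for what counts as "standing out", so the sketch only ranks the gaps.

```python
# Average wOBA of the clusters that appear in the table above.
cluster_woba = {1: 0.369, 2: 0.353, 5: 0.362, 10: 0.392}

# (name, cluster, wOBA) for the six players listed above.
players = [
    ("Brandon Inge", 1, 0.315),
    ("Jack Cust", 10, 0.342),
    ("Alfonso Soriano", 5, 0.314),
    ("Russell Martin", 2, 0.307),
    ("Mark DeRosa", 1, 0.327),
    ("Dan Uggla", 10, 0.354),
]

# Gap = cluster average minus the player's own wOBA (rounded to 3 places),
# sorted so the biggest shortfall comes first.
gaps = sorted(
    ((name, round(cluster_woba[cluster] - woba, 3))
     for name, cluster, woba in players),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, gap in gaps:
    print(f"{name}: {gap:.3f} below cluster average")
```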
Finally, I took a look at the players with fewer PAs (100-300) to see which cluster they fell in (note: I didn’t re-cluster, just checked which center each player was closest to). The following fell in the high-wOBA clusters (1, 5, 6). Note: no players fell in clusters 7 or 10.
Name | Team | PA | LD% | HR/FB | BB% | wOBA | cluster |
---|---|---|---|---|---|---|---|
Randy Ruiz | Blue Jays | 130 | 11% | 31% | 8% | 0.428 | 1 |
Kyle Blanks | Padres | 172 | 13% | 21% | 11% | 0.372 | 1 |
Rickie Weeks | Brewers | 162 | 19% | 19% | 8% | 0.365 | 6 |
Justin Maxwell | Nationals | 102 | 14% | 19% | 12% | 0.357 | 1 |
Rocco Baldelli | Red Sox | 164 | 18% | 17% | 7% | 0.326 | 6 |
Ryan Raburn | Tigers | 291 | 15% | 17% | 9% | 0.378 | 6 |
Drew Stubbs | Reds | 196 | 21% | 17% | 8% | 0.335 | 5 |
David Ross | Braves | 151 | 22% | 16% | 14% | 0.386 | 5 |
Matt Stairs | Phillies | 129 | 11% | 16% | 18% | 0.327 | 1 |
Brandon Allen | Diamondbacks | 116 | 17% | 16% | 10% | 0.288 | 6 |
Landon Powell | Athletics | 155 | 18% | 15% | 9% | 0.315 | 6 |
Ramon Castro | - - - | 171 | 22% | 14% | 9% | 0.304 | 5 |
Marcus Thames | Tigers | 294 | 18% | 14% | 10% | 0.329 | 6 |
Travis Snider | Blue Jays | 276 | 15% | 14% | 11% | 0.327 | 6 |
Mat Gamel | Brewers | 148 | 27% | 13% | 12% | 0.332 | 5 |
Jayson Nix | White Sox | 290 | 13% | 13% | 10% | 0.319 | 6 |
Carlos Delgado | Mets | 112 | 20% | 13% | 11% | 0.394 | 5 |
Eric Hinske | - - - | 224 | 18% | 13% | 12% | 0.344 | 6 |
Andres Torres | Giants | 170 | 17% | 13% | 10% | 0.379 | 6 |
Alex Gordon | Royals | 189 | 14% | 12% | 11% | 0.321 | 6 |
Gabe Kapler | Rays | 238 | 23% | 11% | 12% | 0.334 | 5 |
Chris Snyder | Diamondbacks | 202 | 17% | 11% | 16% | 0.304 | 6 |
Jesus Flores | Nationals | 106 | 18% | 11% | 11% | 0.375 | 6 |
Chris Gimenez | Indians | 130 | 19% | 10% | 13% | 0.233 | 6 |
Austin Kearns | Nationals | 211 | 19% | 7% | 16% | 0.298 | 6 |
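The "closest center, no re-clustering" step can be sketched as below. The centers are the rounded values from the cluster table above, and the four profiles come from the low-PA table. Plain Euclidean distance on the rounded percentages is an assumption: the post doesn't say what distance or scaling was used, so labels from this sketch need not match the table for every player. The function name `nearest_cluster` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Cluster centers as (LD%, HR/FB, BB%), rounded as shown in the post.
centers = {
    1: (17, 17, 10),
    2: (23, 6, 14),
    3: (19, 10, 5),
    4: (16, 9, 8),
    5: (20, 13, 9),
    6: (18, 12, 13),
    7: (19, 23, 15),
    8: (19, 4, 7),
    10: (19, 18, 14),
}

def nearest_cluster(profile):
    """Return the id of the center closest to an (LD%, HR/FB, BB%) profile."""
    return min(centers, key=lambda cid: dist(profile, centers[cid]))

# A few of the 100-300 PA players from the table above.
low_pa = {
    "Drew Stubbs": (21, 17, 8),
    "Carlos Delgado": (20, 13, 11),
    "Eric Hinske": (18, 13, 12),
    "Alex Gordon": (14, 12, 11),
}

assigned = {name: nearest_cluster(p) for name, p in low_pa.items()}
print(assigned)
```

Because the existing centers are reused as-is, adding these small-sample players doesn't move the clusters that were built from the qualifiers.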
Seeing as I'm probably not going to venture into the world of projections (there are already plenty of people who do a much better job than I could), this all boils down to an interesting thought experiment. That being said, I thought someone out there might have a use for the data.