What makes a breaking ball nasty?
As a knowledgeable baseball fan, you would probably answer that movement, location, and velocity all contribute to making a breaking ball nasty. Today I will examine how accurately we can predict nastiness using these three variables, where nastiness is defined as whiffs per swing.
To test how much these three variables can tell us, I created a model that predicts whiff rate using only horizontal movement and location, vertical movement and location, and velocity. Measuring whiffs per swing rather than whiffs per pitch matters here. Whiffs per pitch also reflects which pitches batters choose to swing at, and that's a question we are not interested in right now. The model was built from every breaking ball thrown in the 2011 season that was swung at. Only right-handed pitchers who released the ball above 5 feet were included. Pitch types follow the MLBAM classifications, and only sliders (SL) and curves (CU and KC) were included. Great, boring stuff out of the way! Results after the break.
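The sample filters and the per-swing versus per-pitch distinction can be made concrete with a small sketch. The record layout and field names below (`pitcher_throws`, `release_z`, and so on) are hypothetical and are not the actual pbp2 schema. The filters themselves are the ones stated above: right-handed pitcher, release above 5 feet, MLBAM type SL, CU, or KC, and the batter swung.

```python
from dataclasses import dataclass


# Hypothetical per-pitch record; field names are illustrative only.
@dataclass
class Pitch:
    pitcher_throws: str  # "R" or "L"
    release_z: float     # release height in feet
    pitch_type: str      # MLBAM classification, e.g. "SL", "CU", "KC", "FF"
    swung: bool          # did the batter swing?
    whiff: bool          # swing and miss


BREAKING_TYPES = {"SL", "CU", "KC"}


def in_sample(p: Pitch) -> bool:
    """Apply the post's filters: RHP, release above 5 ft, breaking ball, swung at."""
    return (
        p.pitcher_throws == "R"
        and p.release_z > 5.0
        and p.pitch_type in BREAKING_TYPES
        and p.swung
    )


def whiff_per_swing(pitches) -> float:
    # Denominator is swings only, so the batter's swing decision drops out.
    swings = [p for p in pitches if p.swung]
    return sum(p.whiff for p in swings) / len(swings) if swings else float("nan")


def whiff_per_pitch(pitches) -> float:
    # Denominator is every pitch, so it also reflects how often batters swing.
    return sum(p.whiff for p in pitches) / len(pitches) if pitches else float("nan")
```

With four breaking balls where the batter swings twice and misses once, whiffs per swing is 0.5 and whiffs per pitch is 0.25. The gap between those two numbers comes entirely from the two pitches the batter took.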
Using data through the middle of June (the last time my database was updated), the leaders among pitchers with at least 50 swings on their breaking balls are:
Name | Predicted whiff rate |
Sergio Santos | 53.1% |
Mark Melancon | 43.2% |
Craig Kimbrel | 42.4% |
Al Alburquerque | 41.9% |
David Hernandez | 41.5% |
Matt Garza | 40.8% |
AJ Burnett | 39.8% |
Zack Greinke | 39.1% |
Kyle Drabek | 38% |
Josh Johnson | 37.6% |
For context, the average whiff rate on breaking balls is 27.4%. As you can see, most of these names are not very surprising. It's good to see Greinke and Burnett, both known for their breaking balls, on the list.
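The predicted rates in this leaderboard are pitcher-level means of the model's per-swing predictions, restricted to pitchers with at least 50 breaking-ball swings. Here is a minimal sketch of that aggregation step. It assumes the per-swing predicted probabilities have already been computed, and the `(pitcher_name, predicted_prob)` input format is hypothetical.

```python
from collections import defaultdict


def leaderboard(swings, min_swings=50, top_n=10):
    """Rank pitchers by mean predicted whiff probability per swing.

    swings: iterable of (pitcher_name, predicted_whiff_prob), one entry per swing.
    Returns a list of (name, mean_predicted), highest first, keeping only
    pitchers with at least `min_swings` swings.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, prob in swings:
        totals[name] += prob
        counts[name] += 1
    means = [
        (name, totals[name] / counts[name])
        for name in counts
        if counts[name] >= min_swings
    ]
    means.sort(key=lambda item: item[1], reverse=True)
    return means[:top_n]


# Formatting the result like the table above:
# for name, rate in leaderboard(swings):
#     print(f"{name} | {rate:.1%} |")
```

The minimum-swings cutoff keeps pitchers with only a handful of swings from landing on the list, though averaging predicted probabilities is already far less noisy than averaging observed whiffs.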
Here is how the model performed more generally, among all pitchers with at least 50 swings on their breaking balls:
It's important to note that this graph compares each pitcher's mean predicted rate with his mean actual rate, not individual pitches. A single swing has no observable whiff probability: there either is a whiff or there isn't. There's also a lot of noise on a pitch-by-pitch basis, so no model is going to perform well at that level.
As you can see, the R-squared is 63.4%. In other words, the mean predicted rates account for about 63% of the variation in pitchers' mean actual rates, which is a strong relationship. Here are the 5 pitchers the model thinks are over-performing the most:
Sergio Santos
Al Alburquerque
Sergio Romo
Ramon Ramirez
Jose Veras
Of note is that all of these guys are relievers. Here are the 5 pitchers that the model thinks are under-performing the most:
Wade Davis
Mike Pelfrey
Bartolo Colon
Matt Belisle
Kyle Lohse
In statistics this is called "East coast bias." The model also does not account for the situation in which a pitch was thrown, so pitchers' differing usage of breaking balls likely hurts the model's accuracy.
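The two pitcher-level checks above can be sketched as follows: the R-squared between mean predicted and mean actual rates, and a ranking of pitchers by how far their actual rate sits from the prediction. The post doesn't spell out either calculation. This sketch assumes R-squared is the squared Pearson correlation, which equals the R-squared of a simple linear fit of actual on predicted. It also assumes "over-performing" means actual minus predicted is largest. The dictionary input format is hypothetical.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


def performance_gaps(pitchers, k=5):
    """pitchers: dict of name -> (mean_predicted, mean_actual).

    Returns (over, under): the k names with the largest positive
    actual-minus-predicted gap, and the k with the most negative gap.
    """
    gaps = sorted(
        ((actual - pred, name) for name, (pred, actual) in pitchers.items()),
        reverse=True,
    )
    over = [name for _, name in gaps[:k]]
    under = [name for _, name in reversed(gaps[-k:])]
    return over, under
```

Ranking by raw difference treats a 5-point miss the same whether the predicted rate is 25% or 45%. Ranking by ratio would be a reasonable alternative, and the two can order pitchers differently.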
Platoon Split Observations:
I should clarify that the model is actually two models: one for right-handed batters and one for left-handed batters. I chose to do this because sliders have a platoon split and slow curves have a reverse platoon split.
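Splitting by batter handedness means fitting two independent models and routing each swing to the model for the batter's side of the plate. The post doesn't say what kind of model was used, so this sketch swaps in a plain logistic regression fit by gradient descent purely for illustration. The feature order (horizontal movement, vertical movement, horizontal location, vertical location, velocity) and the `(batter_stands, features, whiff)` input format are assumptions.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Plain logistic regression fit by batch gradient descent (illustrative only)."""

    def __init__(self, lr: float = 0.1, epochs: int = 500):
        self.lr = lr
        self.epochs = epochs

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.means, self.sds)]

    def _prob(self, z):
        return _sigmoid(self.b + sum(w * v for w, v in zip(self.w, z)))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        # Standardize so one learning rate works for mph and inches alike;
        # a constant feature gets sd 1.0 to avoid dividing by zero.
        self.means = [sum(row[j] for row in X) / n for j in range(d)]
        self.sds = [
            math.sqrt(sum((row[j] - self.means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)
        ]
        Z = [self._scale(row) for row in X]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for z, target in zip(Z, y):
                err = self._prob(z) - target  # log-loss gradient w.r.t. the logit
                grad_b += err
                for j in range(d):
                    grad_w[j] += err * z[j]
            self.b -= self.lr * grad_b / n
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
        return self

    def predict_proba(self, row):
        return self._prob(self._scale(row))


def fit_by_batter_hand(swings):
    """swings: list of (batter_stands, feature_list, whiff_bool).

    Fits one independent model per batter handedness ("R" and "L").
    """
    models = {}
    for hand in ("R", "L"):
        rows = [(x, w) for stands, x, w in swings if stands == hand]
        if rows:
            X = [list(x) for x, _ in rows]
            y = [1.0 if w else 0.0 for _, w in rows]
            models[hand] = LogisticModel().fit(X, y)
    return models


def predict_whiff(models, batter_stands, features):
    # Route the swing to the model for the batter's side of the plate.
    return models[batter_stands].predict_proba(features)
```

Fitting separate models, rather than adding a handedness term to one model, lets every coefficient (including velocity's) differ between right- and left-handed batters. That is what lets a slider-heavy velocity range behave differently for each side.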
You can see the platoon effect in the following graph:
Shown here is the relationship between whiff rate and velocity, split by batter handedness: right-handed batters are blue and left-handed batters are red. The gray bands show confidence intervals.
Of interest here is the gap between the two predicted whiff rates when velocity is in the 80-90 mph range. That range is populated mostly by sliders, a pitch known for having a large platoon split. How is it that a difference of just 10 mph can produce such different behavior among breaking balls? I don't know the answer, and the question is ultimately outside the scope of this post. Just something to keep in mind.
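The graph plots the model's predicted rates with confidence bands. A rougher empirical check of the same gap is to bin swings by velocity and compute raw whiffs per swing for each batter side. The 5 mph bin width and the `(batter_stands, speed_mph, whiff)` input format are assumptions made for illustration, and raw binned rates will be noisier than a fitted curve.

```python
from collections import defaultdict


def whiff_rate_by_velocity(swings, bin_width=5):
    """swings: iterable of (batter_stands, speed_mph, whiff_bool).

    Returns {hand: {bin_start: whiff_rate}}, where each bin covers
    [bin_start, bin_start + bin_width) mph.
    """
    whiffs = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(lambda: defaultdict(int))
    for hand, speed, whiff in swings:
        bin_start = int(speed // bin_width) * bin_width
        counts[hand][bin_start] += 1
        whiffs[hand][bin_start] += int(whiff)
    return {
        hand: {b: whiffs[hand][b] / n for b, n in sorted(bins.items())}
        for hand, bins in counts.items()
    }


def handedness_gap(rates):
    """Right-minus-left whiff rate for each velocity bin both sides share."""
    shared = sorted(set(rates.get("R", {})) & set(rates.get("L", {})))
    return {b: rates["R"][b] - rates["L"][b] for b in shared}
```

If the pattern in the graph holds in the raw data, the bins from 80 to 90 mph should show a larger gap than the bins below 80.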
Finishing Thoughts:
We were able to predict pitcher-level whiff rates fairly accurately using just location, movement, velocity, and the knowledge that the batter had swung. This tells us two things. First, it's not much of a mystery what makes a breaking ball hard to hit. Second, the more elusive skill seems to be getting batters to swing at pitches that are hard to hit in the first place. The value of models like this is that they can be used to evaluate players in small samples, assuming that the data that goes into the model stabilizes quickly.
References and Resources
*PITCHf/x data from MLBAM through Darrel Zimmerman's pbp2 database.
*http://princeofslides.blogspot.com/ – used as reference for R code.
Please welcome our newest contributor, Josh Weinstock! You might be familiar with his excellent PITCHf/x work at It's About the Money, the SweetSpot network Yankees blog. Josh will be contributing feature-length content a couple of times per month, and will expand upon his PITCHf/x work. -jbopp