The Anatomy of a Whiff: Four-Seam Fastballs

Leon Halip

How well will the "high heat" narrative hold up under a new statistical model and once pitch sequencing is taken into account?

A few weeks ago, I embarked on a difficult sabermetric journey in the hope of discovering which characteristics of a pitch produce more whiffs.

The main reason for this journey is that when one breaks down what makes a pitcher successful, the most important component (in almost all cases) is strikeout ability. And when one breaks down the strikeout, the most prominent component is the pitcher's ability to get batters to swing and miss (whiff).

The major study within the first piece looked at 50 individual pitchers who threw more than 200 four-seam fastballs in 2012.

I pulled data on every four-seam fastball that a batter swung at against each pitcher and labeled the swings that resulted in contact as a 0 and the swings that resulted in a whiff as a 1.
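The labeling step can be sketched in Python. This is a minimal sketch, not the author's code: the `Pitch` record, the `"FF"` pitch-type code and the outcome strings in `WHIFF_CODES` and `CONTACT_CODES` are hypothetical stand-ins for whatever fields the PITCHf/x export actually uses.

```python
from dataclasses import dataclass

# Hypothetical outcome strings; the article does not list the PITCHf/x
# descriptions it used, so treat these sets as placeholders.
WHIFF_CODES = {"swinging_strike", "swinging_strike_blocked"}
CONTACT_CODES = {"foul", "foul_tip", "hit_into_play"}


@dataclass
class Pitch:
    pitch_type: str   # assumed code, e.g. "FF" for a four-seam fastball
    description: str  # assumed outcome string for the pitch


def label_swings(pitches):
    """Return 1 for each four-seam whiff and 0 for each four-seam swing with contact.

    Pitches without a swing (balls, called strikes) and other pitch types are skipped."""
    labels = []
    for p in pitches:
        if p.pitch_type != "FF":
            continue
        if p.description in WHIFF_CODES:
            labels.append(1)
        elif p.description in CONTACT_CODES:
            labels.append(0)
    return labels
```

Balls and called strikes are dropped entirely, so the resulting list only covers swings, matching the study's 0/1 coding of contact versus whiff.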

I tested various predictors, such as the fastball's velocity, movement and location, to see whether any of them differentiated the whiffs from the swings that made contact.

The study led to an interesting, yet very logical, conclusion:

It seems to me, based on the results of this sample, that on average individual pitchers generate more whiffs on fastballs that are higher in the zone relative to their other fastballs.

This result lent credence to the traditionally accepted idea of the high fastball, or "high heat," as a pitch that is more difficult for a batter to catch up to and that, thus, results in more whiffs.

The study was in no way perfect. There is so much going on at the pitch-by-pitch level that it's really difficult to find conclusions that hold up under scrutiny. So I decided to revamp the original study to incorporate some quality suggestions I received on how to improve it.

The first concerned my choice of model. An emailer, Nick Embrey, pointed out that I should scrap the multiple linear regression model I used previously in favor of the nonlinear probit model.

If you'd like to learn more about the specifics of the probit model, you can follow this link, though I think the emailer explained the idea best when he said:

The main drawback of a linear model when you have a binary dependent variable is that your fitted values can be outside of the [0,1] interval, which doesn't really make sense. This is because what you're really modelling is the probability that the pitch results in a swing and miss or not, and a probability can only be between 0 and 1. A probit model restricts the range appropriately.

Since my hypothesis was about finding which characteristics of a pitch increase the probability (again, a number between 0 and 1) that the pitch results in a whiff, the probit model made a great deal of sense.
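In symbols, the contrast the emailer describes looks roughly like this, where x_i holds the predictors for pitch i and β the estimated coefficients (notation introduced here for illustration, not the author's). The linear model uses the index x_i'β directly as the probability, so nothing keeps it inside [0, 1]; the probit model passes the same index through Φ, the standard normal cumulative distribution function, which always returns a value strictly between 0 and 1.

```latex
\begin{aligned}
\text{linear:}\quad & \widehat{\Pr}(\text{whiff}_i = 1) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
  && \text{(unbounded)} \\
\text{probit:}\quad & \Pr(\text{whiff}_i = 1 \mid \mathbf{x}_i) = \Phi\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right),
  && \Phi(z) = \int_{-\infty}^{z} \tfrac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt \in (0, 1)
\end{aligned}
```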

The second suggestion was to factor in pitch sequencing. Taking the entire at-bat or pitch sequence into account was honestly far too complex; however, I thought looking at the previous pitch could be fruitful.

If we assume, based on the original study, that a higher-than-usual fastball results in more whiffs, then my hypothesis is that the high fastball becomes even more effective when the previous pitch was lower in the zone and slower.
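Measuring each pitch against the one before it can be sketched as a single pass over pitches in the order thrown. The sketch assumes the pitches are already sorted chronologically and uses a hypothetical `at_bat_id` field so the comparison never crosses plate appearances. It defines each change as current minus previous; the article doesn't say whether its differences were signed or absolute, so that sign convention is an assumption. `speed` and `pz` are modeled on PITCHf/x's release speed (mph) and vertical plate location (feet), but treat the field names as placeholders.

```python
from dataclasses import dataclass


@dataclass
class PitchRecord:
    at_bat_id: int  # hypothetical key grouping pitches into one plate appearance
    speed: float    # release speed in mph (PITCHf/x-style field, assumed)
    pz: float       # vertical location at the plate in feet (assumed)


def sequencing_features(pitches):
    """Return (velocity change, vertical-location change) versus the previous pitch.

    Pitches must be in the order thrown. Changes are current minus previous, so a
    fastball that follows a slower, lower pitch gets positive values for both.
    The first pitch of each at-bat has no predecessor and gets None."""
    features = []
    prev = None
    for p in pitches:
        if prev is not None and prev.at_bat_id == p.at_bat_id:
            features.append((p.speed - prev.speed, p.pz - prev.pz))
        else:
            features.append(None)
        prev = p
    return features
```

Under this convention, the hypothesis above predicts positive coefficients on both changes: the bigger the jump up in speed and height from the previous pitch, the likelier the whiff.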

The Study

I took the same sample of pitchers as in the previous study and again classified the swings that resulted in contact as 0 and swings that resulted in a whiff as 1. The independent variables that I used were:

  • The velocity of each fastball where there was a swing
  • The vertical location of each fastball where there was a swing
  • The difference in velocity of the fastball and the previous pitch
  • The difference in location of the fastball and the previous pitch

The Results

The only real issue that I ran into with the probit model is that interpreting the results is slightly more difficult.

The probit model is nonlinear and thus typical linear measures of goodness of fit do not apply. However, a pseudo r-squared can be calculated from the model that is fairly comparable to the typical r-squared from a linear model.

I took the square root of the pseudo r-squared that I found for each player and used that as the quasi-"correlation" or "r" of the model.

This "r" is not the same as one that we would find in a typical correlation, but for all intents and purposes of this piece, we'll consider them to be equivalent.
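The article doesn't say which pseudo r-squared it used. McFadden's version, 1 - lnL(model) / lnL(null), is a common choice for probit and logit models and is what statsmodels reports as `prsquared`. Assuming that version, the computation and the square-root step behind the table's "r" look like the sketch below; the function names are hypothetical. The null model predicts every swing using the pitcher's overall whiff rate.

```python
import math


def null_loglik(y):
    """Log-likelihood of an intercept-only model, which predicts the overall whiff rate."""
    n, whiffs = len(y), sum(y)
    if whiffs in (0, n):
        raise ValueError("need both whiffs and contact swings")
    rate = whiffs / n
    return whiffs * math.log(rate) + (n - whiffs) * math.log(1 - rate)


def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    return 1.0 - loglik_model / loglik_null


def quasi_r(loglik_model, loglik_null):
    """The table's "r": square root of the pseudo R-squared."""
    return math.sqrt(mcfadden_pseudo_r2(loglik_model, loglik_null))
```

Squaring a value from the table reverses the last step: Ben Sheets' "r" of 0.417 corresponds to a pseudo r-squared of about 0.174, and Edwin Jackson's 0.084 to about 0.007.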

Below are the following results for each pitcher in the sample:

  • The "correlation" or "r" of the model for each pitcher
  • Whether or not the vertical location of the fastball was significant at a 95 percent confidence level
  • Whether or not the velocity of the fastball was significant at a 95 percent confidence level
  • Whether or not the change in velocity of the fastball from the previous pitch was significant at a 95 percent confidence level
  • Whether or not the change in vertical location of the fastball from the previous pitch was significant at a 95 percent confidence level


| Pitcher             | "r"   | Vert. Location | Velocity | Velocity Change | Location Change |
|---------------------|-------|----------------|----------|-----------------|-----------------|
| Ben Sheets          | 0.417 | No*            | Yes      | No              | No* (Negative)  |
| Justin Masterson    | 0.365 | Yes            | No       | No              | No* (Positive)  |
| Danny Duffy         | 0.353 | Yes            | No       | No              | No* (Negative)  |
| Chris Sale          | 0.345 | Yes            | No       | No              | No              |
| Zach McAllister     | 0.342 | Yes            | No       | No              | No              |
| Bud Norris          | 0.314 | Yes            | No*      | No* (Negative)  | No* (Negative)  |
| C.J. Wilson         | 0.304 | Yes            | No*      | No              | No              |
| Hiroki Kuroda       | 0.297 | Yes            | No       | No              | No              |
| Jordan Lyles        | 0.297 | Yes            | No       | No              | No              |
| Drew Hutchison      | 0.296 | Yes            | Yes      | No              | No              |
| Roy Oswalt          | 0.288 | Yes            | No*      | No              | No              |
| Felix Hernandez     | 0.282 | Yes            | No       | No              | No* (Negative)  |
| Brian Duensing      | 0.281 | Yes            | No       | No              | No              |
| Neftali Feliz       | 0.270 | Yes            | No*      | No              | No              |
| James McDonald      | 0.268 | Yes            | No       | No              | No              |
| Liam Hendriks       | 0.266 | Yes            | No       | No              | No              |
| Franklin Morales    | 0.266 | Yes            | No       | No              | No              |
| Jeff Locke          | 0.266 | No             | No       | Yes (Positive)  | No              |
| Chad Billingsley    | 0.262 | Yes            | No*      | No* (Negative)  | No              |
| Wei-Yin Chen        | 0.261 | Yes            | Yes      | No              | Yes (Negative)  |
| Mike Minor          | 0.258 | Yes            | Yes      | No              | Yes (Negative)  |
| Casey Kelly         | 0.252 | No             | No       | No              | No              |
| Matt Harvey         | 0.245 | Yes            | Yes      | No              | No              |
| Juan Nicasio        | 0.243 | Yes            | No       | No              | No              |
| Jarrod Parker       | 0.243 | Yes            | No       | No* (Positive)  | No              |
| Jordan Zimmermann   | 0.241 | Yes            | No       | No              | No              |
| Jason Hammel        | 0.236 | Yes            | No       | No              | No              |
| Yovani Gallardo     | 0.234 | Yes            | Yes      | No              | No              |
| Mark Rogers         | 0.232 | No             | No       | No* (Negative)  | No              |
| Miguel Gonzalez     | 0.229 | No*            | No*      | No              | No              |
| Daniel Bard         | 0.223 | No             | No       | No              | No              |
| Garrett Richards    | 0.216 | No             | No       | No              | Yes (Negative)  |
| Gavin Floyd         | 0.214 | Yes            | No       | No              | No              |
| Justin Verlander    | 0.214 | Yes            | Yes      | No              | No              |
| Jaime Garcia        | 0.199 | No             | No       | No              | No              |
| Brad Lincoln        | 0.198 | No             | No       | No              | No              |
| Ivan Nova           | 0.187 | No             | Yes      | No              | No              |
| Clay Buchholz       | 0.165 | No*            | No       | No              | No              |
| Stephen Strasburg   | 0.162 | Yes            | No       | No              | No              |
| Josh Beckett        | 0.160 | Yes            | No       | No              | No              |
| Christian Friedrich | 0.152 | No*            | No       | No              | No              |
| A.J. Burnett        | 0.150 | No             | No       | No* (Negative)  | No              |
| Johnny Cueto        | 0.146 | Yes            | No       | No              | No              |
| Zack Greinke        | 0.134 | No             | No       | No              | No              |
| Henderson Alvarez   | 0.134 | No             | No       | No              | No              |
| James Shields       | 0.125 | No             | No       | No              | No              |
| Ervin Santana       | 0.121 | No             | No       | No              | No              |
| Josh Outman         | 0.110 | No             | No       | No              | No              |
| Tommy Hunter        | 0.089 | No             | No       | No              | No              |
| Edwin Jackson       | 0.084 | No             | No       | No              | No              |

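*Indicates that the predictor was significant at a 90 percent confidence level, but not at 95 percent. "(Positive)" or "(Negative)" gives the direction of a significant relationship for the change predictors.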
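The Yes / No* / No labels can be generated mechanically from each coefficient and its standard error. The sketch below assumes two-sided Wald z-tests, which is what most probit output reports, though the article doesn't name its test; `significance_label` and its `show_sign` flag are hypothetical.

```python
from statistics import NormalDist


def significance_label(coef, std_err, show_sign=True):
    """Hypothetical helper mirroring the table's notation via a two-sided Wald z-test.

    "Yes" = significant at 95 percent, "No*" = significant at 90 but not 95 percent,
    "No" = neither. With show_sign, a significant coefficient also gets its direction."""
    z = coef / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    sign = (" (Positive)" if coef > 0 else " (Negative)") if show_sign else ""
    if p_value < 0.05:
        return "Yes" + sign
    if p_value < 0.10:
        return "No*" + sign
    return "No"
```

The table shows directions only for the two change columns, which corresponds to calling this with `show_sign=False` for vertical location and velocity.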

Comparing these measures of goodness of fit with those from the original test, it's clear that the probit model is better suited to this study, as every "correlation" became stronger.

Vertical Location:

These results backed the "high heat" conclusion from the first test: the vertical location of the fastball was significant at a 95 percent confidence level for 64 percent of the sample, and at a 90 percent confidence level for 72 percent of the sample.

Velocity of the pitch:

These results also backed the original study's conclusion that the velocity of fastballs that resulted in whiffs was no different from that of fastballs that did not: velocity was significant at a 95 percent confidence level for only 16 percent of the sample, and at a 90 percent confidence level for only 28 percent.

Change in location from the previous pitch:

I was surprised to find that change in location was not a significant predictor for the majority of this sample: it was significant at a 95 percent confidence level for only 6 percent of the sample and at a 90 percent confidence level for 16 percent. I was even more surprised to find that, for the few pitchers with a significant relationship, the relationship between change in location and the probability of a whiff was, in fact, almost always negative (Justin Masterson, at the 90 percent level, was the lone exception).

This would indicate that the further the previous pitch was from the fastball in vertical location, the less likely the fastball was to produce a whiff.

Change in velocity from the previous pitch:

Like the change in location, the change in velocity was not a significant predictor for the majority of the sample: it was significant at a 95 percent confidence level for only one pitcher (Jeff Locke) and at a 90 percent confidence level for only 12 percent of the sample.

Interestingly enough, most of those significant relationships were also negative, which means that the larger the gap between the velocity of the previous pitch and the velocity of the fastball, the less likely a whiff was.

The results for velocity and location change, which were my attempt to take pitch sequencing into account, were the exact opposite of what my hypothesis predicted. I had expected larger differences in velocity and location to indicate a greater probability of a whiff on the fastball.

Based on this more extensive test, it seems the only real conclusion I can draw about what might increase the probability of a fastball resulting in a whiff is to elevate the fastball relative to the pitcher's other fastballs: the "high heat" assumption.

How much does elevating the fastball increase the probability of a whiff?

I'll use Zach McAllister of the Cleveland Indians as an example, as he had the strongest relationship between vertical location and the probability of a whiff in this sample.

According to the probit model, the marginal effect of vertical location for McAllister is 20.6 percent: a small increase in the height of his fastball raises the probability of a whiff at a rate of 20.6 percent per unit of vertical location. I personally think this is a fairly large effect, but keep in mind that McAllister's vertical location was the strongest predictor in this sample.
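For a probit model, the marginal effect of a predictor is the slope of the whiff probability with respect to it: φ(x'β) times that predictor's coefficient, where φ is the standard normal density. Packages report it either at the average pitch or averaged over all pitches (statsmodels' `get_margeff` defaults to the latter). The article doesn't say which was used, so the sketch below assumes the average marginal effect, and `probit_marginal_effect` is a hypothetical helper.

```python
from statistics import NormalDist


def probit_marginal_effect(beta, X, j):
    """Hypothetical helper: average marginal effect of predictor j in a probit model.

    beta: fitted coefficients, intercept first.
    X: rows of predictors without the intercept column.
    j: position of the predictor of interest within a row of X.
    For each pitch the slope dP/dx_j is phi(x'beta) * beta_j; the slopes are averaged."""
    phi = NormalDist().pdf  # standard normal density
    total = 0.0
    for row in X:
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        total += phi(z) * beta[j + 1]
    return total / len(X)
```

Because the effect is a slope, it is a local approximation: a change of Δx units in vertical location moves the whiff probability by roughly the marginal effect times Δx, and the approximation weakens as Δx grows because Φ is curved.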

Overall the explanatory strength of the predictors (even the significant ones) in this sample was fairly weak. But, again, we should not expect a lot of explanatory strength when analyzing something on a pitch-by-pitch basis.

My goal with this series was not only to look at four-seam fastballs, but also to see what we can learn about what may explain a whiff for other pitch types. Thus, I posted a poll below where you can vote on which pitch type you'd like to see studied next.

All data comes from the PITCHf/x database available on Baseball Heat Maps.

You can follow Glenn on twitter @Glenn_DuPaul.
