Making an attempt at strength of contact using PITCHf/x

Not all fly balls are created equal. A $325M Giancarlo Stanton fly ball is different from, say, a $120M Elvis Andrus fly ball.

Steve Mitchell-USA TODAY Sports

Up in the little blurb description below the entrancing photo there, I noted two players whose combined contractual obligations total $445M over a combined 21 years, ignoring options and opt-outs and the like. That's staggering. The amount of money in this sport is becoming silly. Does $325M even mean anything to us, besides "it's a lot"? Is there a meaningful difference between $225M and $325M besides...$100M? Baseball is dying, and so forth. I thought I'd mention it, even though their salaries are somewhat unrelated to the topic at hand, which is fly ball distance.

Like the blurb says, some fly balls are hit much harder and go for base hits more often than others. Categorizing a ball in play as a fly ball or line drive or fliner (liner fliner) or whatever FanGraphs' play log says it is doesn't quite get there. If one were to have access to HITf/x data, one could distinguish a weak fly ball and a strong fly ball using actual strength of contact statistics such as velocity off the bat and elevation angle. Tony Blengino does this at FanGraphs. Not having access to HITf/x data, I will attempt to use the fly ball distance variable from the PITCHf/x database to approximate what is a "hard-hit fly ball" and what is, technically, a "can of corn".

Here's what I did, then. I downloaded all the balls in play categorized as fly balls from Baseball Savant for 2012-2014, which gave me 95,060 fly balls. I got those files into R and calculated BA, SLG, and "production" (PRD), defined as 1.7*BA + SLG, for each 10-foot bucket of fly ball distance. My hypothesis was that fly ball production would be a pretty terrible flat line for a while as distance increased, but at some crucial point, production would increase dramatically. The graph below shows what I found.

[Graph: fly ball production by 10-foot distance bucket]

Point of clarification: The buckets are greater than or equal to the lower number and less than the top number. For example, the 260-270 bucket includes 260-269.9999, but not 270.
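For anyone who wants to replicate the bucketing, here is a minimal Python sketch of the idea. The original work was done in R, so this is an illustration rather than the code actually used. The names `bucket_stats` and `TOTAL_BASES`, the outcome labels, and the sample rows are all made up, and the sketch assumes every fly ball counts as an at-bat, which ignores sacrifice flies.

```python
import math
from collections import defaultdict

# Total bases credited for each outcome; anything else (an out) counts as 0.
# These outcome labels are hypothetical, not Baseball Savant's field values.
TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

def bucket_stats(fly_balls, width=10):
    """Group (distance_ft, outcome) pairs into [lower, lower + width) buckets
    and return {lower: (count, BA, SLG, PRD)}."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    bases = defaultdict(int)
    for distance, outcome in fly_balls:
        # Floor to the bucket's lower edge, so 269.9999 lands in 260 and 270 in 270.
        lower = int(math.floor(distance / width) * width)
        counts[lower] += 1
        tb = TOTAL_BASES.get(outcome, 0)
        bases[lower] += tb
        if tb > 0:
            hits[lower] += 1
    stats = {}
    for lower in sorted(counts):
        ba = hits[lower] / counts[lower]
        slg = bases[lower] / counts[lower]
        stats[lower] = (counts[lower], ba, slg, 1.7 * ba + slg)
    return stats

# Made-up sample rows, for illustration only.
sample = [
    (260.0, "out"),
    (269.9999, "single"),
    (270.0, "out"),
    (405.3, "home_run"),
]
for lower, (n, ba, slg, prd) in bucket_stats(sample).items():
    print(f"{lower}-{lower + 10}: n={n} BA={ba:.3f} SLG={slg:.3f} PRD={prd:.3f}")
```

Flooring to the lower edge is what makes each bucket include its lower bound and exclude its upper bound, matching the 260-270 example.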

My hypothesis is basically correct, with some caveats. I forgot about bloops, which is why production is a bit higher on the low end of the distance spectrum. Near the upper end of distance, all the fly balls are home runs, so that's why it flattens out there. The curve is roughly quadratic. Fitting an order-2 polynomial trendline in Excel produces an equation with an R^2 of 0.95. That equation appears biased at certain points, so I won't be using it for anything, but the point was to show how close the fit is.
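The same kind of order-2 fit can be done outside Excel. Below is a rough pure-Python sketch of a least-squares quadratic fit with its R^2, solved through the normal equations. The function names and the data points are invented for illustration (the real bucket values are not reproduced here), and distances are expressed in hundreds of feet only to keep the arithmetic well behaved.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2.
    Returns ([c0, c1, c2], R^2)."""
    # Normal equations: sums of powers of x on the left,
    # sums of x^i * y on the right.
    s = [sum(x ** k for x in xs) for k in range(5)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(3)]
    coeffs = solve_linear(a, b)
    predicted = [coeffs[0] + coeffs[1] * x + coeffs[2] * x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coeffs, 1.0 - ss_res / ss_tot

# Made-up points: bucket midpoints (hundreds of feet) and PRD values.
xs = [2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0.3, 0.3, 0.6, 2.0, 4.5]
coeffs, r2 = fit_quadratic(xs, ys)
print("coefficients:", [round(c, 3) for c in coeffs])
print("R^2:", round(r2, 3))
```

A high R^2 here only says a parabola tracks the overall shape; it does not rule out the local bias mentioned above.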

Looking at the graph, it appears that somewhere around 310 feet is where things start getting interesting. I decided to look into that area to determine where I might set a definitive cutoff between "can of corn" and "hard hit fly ball". I chose 311 feet as the cutoff (the median fly ball distance was 270.2 ft, and the mean fly ball distance was 275 ft). I think arguments could be made to set a different cutoff, however, based on your expectations of how often a player should get on base with a hard hit fly ball. I found these numbers:

Distance (ft)  BA     SLG    PRD
309            0.152  0.370  0.628
310            0.151  0.398  0.655
311            0.197  0.505  0.840
312            0.193  0.469  0.797

Seems reasonable, no? There is a decent jump from 310 to 311 that doesn't occur from 309 to 310 or from 311 to 312. 311 feet is about 0.72 standard deviations above the median. So 311 will be my cutoff. The next logical step is to determine the production and frequency of a can of corn compared to a hard hit fly ball. The table below takes that step.
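The jump can be checked directly from the four rows listed above. This is a small Python sketch, not the original R code: it recomputes PRD as 1.7*BA + SLG and reports the change from one foot to the next. The helper names are my own.

```python
# (distance_ft, BA, SLG) rows copied from the four rows above.
rows = [
    (309, 0.152, 0.370),
    (310, 0.151, 0.398),
    (311, 0.197, 0.505),
    (312, 0.193, 0.469),
]

def prd(ba, slg):
    """Production as defined in this article: 1.7*BA + SLG."""
    return 1.7 * ba + slg

def prd_jumps(rows):
    """Return [(from_ft, to_ft, change in PRD)] for consecutive rows."""
    values = [(d, prd(ba, slg)) for d, ba, slg in rows]
    return [(d1, d2, p2 - p1) for (d1, p1), (d2, p2) in zip(values, values[1:])]

jumps = prd_jumps(rows)
for start, end, delta in jumps:
    print(f"{start} -> {end}: {delta:+.3f}")

largest = max(jumps, key=lambda j: j[2])
print("largest jump arrives at", largest[1], "ft")
```

With only four rows this is a sanity check on the eyeball test, not a formal changepoint method.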

Bin                Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn        67961  246.2          0.094  0.138  0.298  71.5%
Hard Hit Fly Ball  27099  347.1          0.556  1.920  2.865  28.5%
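As a quick consistency check on the league table, the sketch below applies the cutoff and recomputes the frequencies and the overall mean distance from the two rows. It is illustrative Python with names of my own choosing, and it assumes a fly ball of exactly 311 feet counts as hard hit, which the text implies but does not state outright.

```python
CUTOFF_FT = 311  # assumption: 311 ft and above counts as hard hit

def classify(distance_ft, cutoff=CUTOFF_FT):
    """Label a fly ball by distance alone."""
    return "Hard Hit Fly Ball" if distance_ft >= cutoff else "Can of Corn"

# Counts and average distances copied from the league table above.
league = {
    "Can of Corn": {"count": 67961, "avg_dist": 246.2},
    "Hard Hit Fly Ball": {"count": 27099, "avg_dist": 347.1},
}

total = sum(b["count"] for b in league.values())
freqs = {name: b["count"] / total for name, b in league.items()}
# Count-weighted mean of the two bin averages recovers the overall mean.
overall_mean = sum(b["count"] * b["avg_dist"] for b in league.values()) / total

print("total fly balls:", total)
for name, share in freqs.items():
    print(f"{name}: {share:.1%}")
print(f"overall mean distance: {overall_mean:.1f} ft")
print(classify(310.9), "|", classify(311.0))
```

The two counts sum to the 95,060 fly balls in the dataset, and the weighted mean lands on the roughly 275 ft mean quoted earlier.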

Excellent. Tony Blengino's cutoffs for a soft fly vs. a hard fly produce frequencies of about 35-65, respectively, but he uses elevation angle. From this article:

If you physically split the entire population of major league fly balls halfway between the boundaries of the popup and line drive groups, 34.8% of fly balls would be in the "high" group, and 65.2% in the "low" group. There is a stark difference in production between these two groups – major league hitters batted .094 AVG-.224 SLG on "high" fly balls and .372 AVG-.960 SLG on "low" fly balls.

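Different data sources and different methods, but I found the same BA and a lower SLG in the weak fly ball group (.094/.138 versus his .094/.224), whereas I found much higher production in the hard hit fly ball group (.556/1.920 versus his .372/.960). The frequencies are flipped as well: my hard hit group is the smaller one, at 28.5% of fly balls, while his harder-hit "low" group is the larger one, at 65.2%. Such are the difficulties of comparing different classification systems, I suppose. My methodology probably isn't quite as good, but it's based on public data, so that's something, right?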

Well, I mentioned Giancarlo Stanton and Elvis Andrus, so I might as well show their data, as well as some other players of interest of this offseason. These data are from Baseball Savant and from 2012-2014 to match my dataset above.

Giancarlo Stanton

Bin          Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn  148    244.7          0.108  0.162  0.346  61.7%
Hard Hit     92     361.8          0.620  2.391  3.445  38.3%

As if you needed more evidence that Stanton hits the ball really hard. He not only allocates a greater frequency of fly balls to the hard hit category than average, but he also hits the ball harder within each category than league average as shown by his average distance and PRD.

Elvis Andrus

Bin          Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn  189    242.3          0.053  0.079  0.169  78.1%
Hard Hit     53     334.5          0.415  1.151  1.856  21.9%

The anti-Stanton. He allocates more fly balls to the weak category, and he hits the ball much more weakly within each category.

Victor Martinez

Bin          Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn  206    252.2          0.053  0.073  0.163  66.7%
Hard Hit     103    343.8          0.437  1.563  2.306  33.3%

Martinez presents an interesting case. He does allocate more fly balls to the hard hit category, but he hits them more weakly than league average. He hits the cans of corn harder than league average (greater average distance), but he produces worse than league average on those cans of corn. This could be due to a lack of speed rather than strength of contact. Some doubles for most players might be singles for VMart.

Jason Heyward

Bin          Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn  192    243.7          0.083  0.135  0.276  66.2%
Hard Hit     98     349.8          0.531  1.939  2.842  33.8%

Heyward is somewhere around league average in terms of production within each category, but he allocates more of his fly balls to the hard hit category than league average.

Billy Butler

Bin          Count  Avg Dist (ft)  BA     SLG    PRD    Freq
Can of Corn  229    248.7          0.162  0.223  0.498  67.4%
Hard Hit     111    353.5          0.550  1.892  2.827  32.6%

Butler is another guy who allocates more of his fly balls to the hard hit category than league average. Interestingly, Butler appears to hit the ball harder within the hard hit category, but his production is pretty close to league average. Conversely, he appears to hit the weaker fly balls around league average, but his production within the category is high compared to league average. I suspect this has to do with Kauffman's park dimensions and how opposing outfielders play there. My hypothesis is that opposing outfielders play deeper there to prevent the double but give up the single, which isn't a stretch to imagine.
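To put the five player tables side by side, the sketch below compares each player's hard hit row with the league's hard hit row. It is illustrative Python only; the numbers are copied from the tables above, while the names `LEAGUE_HARD`, `PLAYERS_HARD`, and `versus_league` are mine.

```python
# Hard hit rows from the tables above:
# (count, avg_dist_ft, BA, SLG, share of the player's fly balls).
LEAGUE_HARD = (27099, 347.1, 0.556, 1.920, 0.285)
PLAYERS_HARD = {
    "Giancarlo Stanton": (92, 361.8, 0.620, 2.391, 0.383),
    "Elvis Andrus": (53, 334.5, 0.415, 1.151, 0.219),
    "Victor Martinez": (103, 343.8, 0.437, 1.563, 0.333),
    "Jason Heyward": (98, 349.8, 0.531, 1.939, 0.338),
    "Billy Butler": (111, 353.5, 0.550, 1.892, 0.326),
}

def prd(ba, slg):
    """Production as defined in this article: 1.7*BA + SLG."""
    return 1.7 * ba + slg

def versus_league(player, league=LEAGUE_HARD):
    """Return (distance gap in ft, PRD gap, share gap in percentage points)."""
    _, dist, ba, slg, share = player
    _, lg_dist, lg_ba, lg_slg, lg_share = league
    return (
        dist - lg_dist,
        prd(ba, slg) - prd(lg_ba, lg_slg),
        (share - lg_share) * 100,
    )

for name, row in PLAYERS_HARD.items():
    d, p, s = versus_league(row)
    print(f"{name:18s} dist {d:+6.1f} ft  PRD {p:+.3f}  share {s:+.1f} pts")
```

The hard hit counts per player are small (53 to 111 fly balls), so these gaps are descriptive rather than anything to run a significance test on.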

StatCast data probably will render much of this obsolete. Theoretically, with StatCast data, I could have the granular batted ball data I so desire as well as outfield positioning. Guesswork and hypothesizing will become theorizing. Nevertheless, fly ball distance from PITCHf/x appears to function as an acceptable proxy to measure how hard a player hits the ball.

. . .

All statistics courtesy of Baseball Savant.

Kevin Ruprecht is an Editor of Beyond the Box Score. He also writes at Royals Review. You can follow him on Twitter at @KevinRuprecht.