Beyond the Box Score: An SB Nation Community


BABIP, HR/FB, and Batted Ball Type by Pitch Location

DIPS theory--the idea that a pitcher has little control over the outcome of balls in play--is possibly sabermetrics' most controversial idea.  Many fans maintain that a pitcher, by consistently locating in the right spots, can induce weak contact and thus lower his batting average on balls in play.  To test this, I took all balls in play (including home runs) from 2008 and, using the Gameday XML data, assigned each one to one of 13 bins by pitch location (I reversed the coordinate system for LHBs, so that inside and outside mean the same thing for every batter).  Bins 1-9 are all inside the strike zone, while Bins 10-13 are balls.
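As a rough sketch of the binning step, the function below maps one pitch to a bin. Several things here are assumptions rather than the method used for this post: the field names (`px`, `pz`, `sz_top`, `sz_bot`, `stand`), the sign convention (negative `px` is inside to a right-handed batter), the row-by-row numbering of Bins 1-9 starting from up and in, and the order of Bins 11-13. Only "Bin 1 is up and in" and "Bin 10 is balls inside" come from the text.

```python
def assign_bin(px, pz, stand, sz_top, sz_bot, half_width=17 / 24):
    """Map a pitch location (in feet) to one of 13 bins.

    Assumed layout: bins 1-9 tile the strike zone in a 3x3 grid, numbered
    row by row from the top, inside to outside (1 = up and in, 9 = down and
    away). Bin 10 = inside ball, 11 = high, 12 = outside, 13 = low.
    half_width is half of a 17-inch plate, expressed in feet.
    """
    # Mirror left-handed batters so negative x always means "inside".
    x = -px if stand == "L" else px

    # Balls: a horizontal miss takes precedence over a vertical miss
    # (an arbitrary tie-break chosen for this sketch).
    if x < -half_width:
        return 10
    if x > half_width:
        return 12
    if pz > sz_top:
        return 11
    if pz < sz_bot:
        return 13

    # Strikes: split the zone into thirds in each direction.
    col = min(int((x + half_width) / (2 * half_width / 3)), 2)
    row = min(int((sz_top - pz) / ((sz_top - sz_bot) / 3)), 2)
    return 3 * row + col + 1


# A pitch over the heart of the plate to a right-handed batter.
print(assign_bin(px=0.0, pz=2.5, stand="R", sz_top=3.5, sz_bot=1.5))
```

Mirroring `px` for left-handed batters is what lets a single "inside" bin cover both sides of the plate.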


For each bin, I calculated BACON (batting average on contact, including HRs), SLGCON (slugging on contact), BABIP (batting average on balls in play, excluding HRs), SLGBIP (slugging on balls in play), GB%, LD%, FB%, IF/FB%, HR/OFB, and batting averages for each of the batted ball types.  A graph of BACON and SLGCON follows below:
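To make the definitions concrete, here is a minimal sketch of how these rates could be computed for the batted balls in one bin. The `BattedBall` record and its field names are hypothetical, and the sketch assumes every home run counts toward the HR/OFB numerator and ignores errors and sacrifices; the actual spreadsheet may handle those cases differently.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    bases: int  # 0 = out, 1-4 = single through home run
    kind: str   # "GB", "LD", "OFB" (outfield fly) or "IFB" (infield fly)


def ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def contact_metrics(balls):
    n = len(balls)
    hits = sum(1 for b in balls if b.bases > 0)
    total_bases = sum(b.bases for b in balls)
    hr = sum(1 for b in balls if b.bases == 4)
    gb = sum(1 for b in balls if b.kind == "GB")
    ld = sum(1 for b in balls if b.kind == "LD")
    ofb = sum(1 for b in balls if b.kind == "OFB")
    ifb = sum(1 for b in balls if b.kind == "IFB")
    in_play = n - hr  # balls in play exclude home runs
    return {
        "BACON": ratio(hits, n),
        "SLGCON": ratio(total_bases, n),
        "BABIP": ratio(hits - hr, in_play),
        "SLGBIP": ratio(total_bases - 4 * hr, in_play),
        "GB%": ratio(gb, n),
        "LD%": ratio(ld, n),
        "FB%": ratio(ofb + ifb, n),
        "IF/FB": ratio(ifb, ofb + ifb),
        "HR/OFB": ratio(hr, ofb),
    }


# Made-up batted balls, for illustration only.
sample = [
    BattedBall(0, "GB"), BattedBall(1, "LD"), BattedBall(4, "OFB"),
    BattedBall(0, "OFB"), BattedBall(2, "LD"), BattedBall(0, "IFB"),
    BattedBall(0, "GB"), BattedBall(1, "GB"),
]
print(contact_metrics(sample))
```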

This confirms Dave Allen's results, although he used a continuous approach rather than bins.  The best zone for hitters is located along a diagonal line extending from the lower-inside corner to the high-outside corner.  Along this diagonal line, hitters are best able to get the barrel of the bat on the ball, while pitches up and in and down and away are too far from the barrel to be hit solidly.  As we would expect, hitters get much worse results on pitches outside of the strike zone.

By examining batted ball types, we can get a better idea of how pitches in various bins are hit.  GB% follows a very predictable pattern:

Here we see that lower pitches result in dramatically more groundballs than higher pitches, and outside pitches result in somewhat more grounders than inside pitches.  Interestingly, very inside pitches result in markedly more ground balls than one would expect, perhaps due to batters' inability to drive these pitches.

FB% is merely the reverse of GB rate, while LD rate varies little by location, except on pitches up and in and on pitches outside the strike zone.  LD% ranges from 19.2% to 20.5% for pitches in Zones 2-9 (the bins defined above); however, in Zone 1 (up and in), it is 17.5%.  In addition, Zones 10-13 all exhibit well below average line drive rates, ranging from 16.1% to 17.8%.  This shows, once again, that batters have difficulty making solid contact up and in and outside of the zone.  Zone 1 also exhibits a noticeably lower batting average on line drives, at .710--Zones 2-8 range from .730 to .756, while Zone 9 is at .716.  Furthermore, Zone 10 (balls inside) has the highest IF/FB ratio (43.3%) while Zone 1 is second at 36.1%.  Once again, this confirms that batters are having a hard time driving the high inside pitch.  Pitchers who throw high and inside should expect a lower BABIP, more infield flies and fewer line drives.

Interestingly, HR/OFB varies dramatically by location:

Surprisingly, pitches low and inside result in the highest HR/FB rate (though pitches right down the middle are second).  Outside pitches, as would be expected, have a lower HR/FB rate.  But the most surprising result is the degree of correlation between pitch location and HR/FB rate.  These results seem to indicate that by pitching away, a pitcher can noticeably reduce his HR/FB rate--yet pitchers' HR/FB rates show a strong tendency to revert to the league average of 11%.  The only way to resolve this discrepancy is to construct a model that estimates HR/FB from pitch location and compare that estimate to each pitcher's actual HR/FB.

Breakdown by Pitch Type

Due to the limitations of the Gameday pitch classification algorithm and the small sample sizes, I would be wary of drawing too many conclusions from the individual pitch data.

Fastballs (four seam)

The results for fastballs were almost identical to the results for all pitches.

Change-ups

The high inside changeup was slightly less effective than the high inside fastball (.298 BACON, .536 SLGCON) though still far more effective than middle-inside or low inside.  However, on high inside changeups, pitchers still induced tons of infield flies (37.7%), fewer line drives than average (18.0%), and a significantly lower batting average on those line drives (.660).

Changeups low and in were crushed for a 27.8% HR/FB rate, compared to 15.4% for fastballs, while changeups middle-in had a 21.3% HR/FB (12.3% for fastballs).  This confirms conventional wisdom that changeups are much more effective on the outer part of the plate.

Curves

BACON and SLGCON for curves are largely similar to the overall results; however, the SLGCON on curves low and inside (.690) is much higher than the SLGCON for curves right down the middle (.625).  This seems to confirm that slow pitches are a bad idea inside.

The HR/FB data is more interesting.  Curveballs up and in have the highest HR/FB of any curveball, at 21.8%.  This might be a fluke of small sample size (133 fly balls), particularly in light of the contradictory result obtained by high-and-tight sliders (see below).
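To put a number on the small-sample concern, the sketch below computes a binomial standard error for the 21.8% rate on 133 fly balls. It uses the normal approximation and treats each fly ball as an independent trial, both of which are simplifying assumptions. The standard error comes out to roughly 3.6 percentage points, which gives an approximate 95% interval of about 15% to 29%. With dozens of location and pitch-type combinations under examination, a cell landing this far from average by chance is plausible.

```python
import math


def rate_interval(rate, n, z=1.96):
    """Standard error and normal-approximation interval for a proportion."""
    se = math.sqrt(rate * (1 - rate) / n)
    return se, rate - z * se, rate + z * se


# Up-and-in curveballs: 21.8% HR/FB on 133 fly balls (figures from the text).
se, low, high = rate_interval(0.218, 133)
print(f"SE = {se:.3f}, interval = ({low:.3f}, {high:.3f})")
```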

Sliders

The HR/FB trend observed in curveballs does not hold for sliders (10.9% HR/FB on high inside sliders).  Gameday's pitch classification algorithm often has trouble distinguishing curves and sliders; thus I suspect that the high HR/FB on up and in curveballs is nothing more than a statistical fluke.

Sinkers (Two seam fastballs)

Sinkers have the least data out of all the pitch types--I suspect that Gameday classified a lot of sinkers as fastballs.  Nevertheless, pitchers induce significantly more ground balls on sinkers than on fastballs--56% for sinkers, compared to 43.6% for four-seam fastballs.

What to do next

With this data, we can construct a model to predict HR/FB by pitch location.  In particular, I wonder if the large variation between HR/FB in different locations translates to large variations between individual pitchers.
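One simple form such a model could take, offered only as a sketch: weight the league HR/FB rate in each bin by the share of a pitcher's fly balls that came on pitches in that bin. The function name, the bin keys, and all of the rates below are made up for illustration; they are not the values from the spreadsheet.

```python
def expected_hr_fb(fly_balls_by_bin, league_hr_fb_by_bin):
    """Location-weighted expected HR/FB for one pitcher.

    fly_balls_by_bin:    {bin: number of fly balls the pitcher allowed there}
    league_hr_fb_by_bin: {bin: league HR/FB rate for pitches in that bin}
    """
    total = sum(fly_balls_by_bin.values())
    if total == 0:
        raise ValueError("pitcher has no fly balls")
    expected_hr = sum(
        count * league_hr_fb_by_bin[b] for b, count in fly_balls_by_bin.items()
    )
    return expected_hr / total


# Illustrative numbers only: a pitcher who lives on the outside part of the plate.
league = {5: 0.14, 6: 0.08, 7: 0.16}
pitcher_fly_balls = {5: 20, 6: 70, 7: 10}
print(round(expected_hr_fb(pitcher_fly_balls, league), 3))
```

Comparing each pitcher's actual HR/FB to this expected value would show whether location explains the spread between pitchers or whether the differences wash out.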

Data

The data is located here on Google Docs.


Comments


Very nice work

As MGL (and others) are fond of reminding us, pitch location does not equal pitch intent.

So while it’s definitely useful to look at these results, we need to be careful not to draw too many conclusions about how a pitcher should pitch based on them.

by Dan Turkenkopf on Jul 21, 2009 8:20 AM EDT

I agree--we need to track the catcher's target in order to determine the pitcher's "control"

But if we’re just worried about determining past performance, I think we should use the actual location. For instance, normalizing HR/FB based on pitch location (and possibly also pitch type) would produce a better luck-independent pitching metric than FIP or xFIP.

by Alex Krolewski on Jul 21, 2009 2:01 PM EDT

Definitely

For value this is the way to go.

Again, nice work.

by Dan Turkenkopf on Jul 21, 2009 7:51 PM EDT

You could expand beyond location

Velocity and movement likely have just as large an effect on HR/FB as location. If you could create, I dunno, 100 or so bins with all combinations of location, velocity and movement, that would be amazing.

Derosa.

by vivaelpujols on Jul 21, 2009 9:32 PM EDT

Right--the problem is the sample size

Just dividing the data by pitch type results in sample size issues--with 100 bins, some would have only 2 or 3 batted balls even with a year’s worth of data. Ideally, with 10 or so years we could create a multi-year average since these numbers shouldn’t change too much year to year.
In the end, I think the best approach is to model HR/FB rate based on location, and then look at overperformers and underperformers to determine if the model has any biases.

by Alex Krolewski on Jul 22, 2009 12:18 AM EDT

I haven't checked out the data

But I would assume that there were more than 200-300 fly balls hit in the majors last year. In fact, according to Baseball Reference, there have been 14,460 plate appearances that have ended with a fly ball this year. Over a full season, that’s over 20,000 samples. If you had data for 3 seasons, 2007-2009, you would have about 70,000 samples.

I would then suggest forgoing the pitch classifications and creating 100 or so bins based on different combinations of movement, velocity and location. Again, you are the one who has done the initial work, but I fail to see how doing that would result in small sample size problems.

Derosa.

by vivaelpujols on Jul 22, 2009 1:44 AM EDT

It's small sample size per bin that's the concern

Some of your bins would be chock full of pitches – relatively straight 88 mph four-seamers up and in perhaps, while others will have very few pitches – 65 mph curve balls up and in. (I know you said ignore the classifications, but this was an easier way to illustrate)

This is just conjecture, but I wouldn’t be surprised if that’s how it turned out.

Without at least smoothing the data, you’re likely going to get some funky results. And I’m not enough of a statistician to know whether smoothing is enough here.

by Dan Turkenkopf on Jul 22, 2009 8:00 AM EDT
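One simple kind of smoothing for sparse bins, sketched here as an illustration rather than as anything proposed in the thread, is to shrink each bin's HR/FB toward the league average by adding a fixed number of "league-average" fly balls. The 11% league rate comes from the post; the `prior_fb` weight of 100 is an arbitrary assumption and would need to be tuned against real data.

```python
def shrunk_rate(hr, fb, league_rate=0.11, prior_fb=100):
    """Shrink a bin's HR/FB toward the league rate.

    Equivalent to adding prior_fb imaginary fly balls that were hit for
    home runs at exactly the league rate.
    """
    return (hr + league_rate * prior_fb) / (fb + prior_fb)


# A sparse bin (2 HR on 3 fly balls) is pulled almost all the way to 11%,
# while a well-populated bin (150 HR on 1000 fly balls) barely moves.
print(round(shrunk_rate(2, 3), 3))
print(round(shrunk_rate(150, 1000), 3))
```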

"Hundreds of bins" might work

With 2.5 years of data, divided by pitch type, each bin would probably be large enough to alleviate small sample size concerns. In fact, we could probably divide the strikezone into 36 bins (rather than 9). I have already looked at a 44-bin approach (8 outside of the zone instead of 4 inside) and it looks like the data stabilizes around 600-700 AB. So if I used 2.5 years of data instead of 1, and I combined the smallest zones together, then I could probably model HR/FB by location and pitch type.

by Alex Krolewski on Jul 23, 2009 12:34 AM EDT

One should be careful about using the word "would."

“Could” or “should” are probably more appropriate here, at least until you actually build and then test the metric.

by cwyers on Jul 21, 2009 10:02 PM EDT

Hmm... higher HR/FB rates being inside makes sense to me.

Hitters keep the bat closer to their body to try to get the sweet spot on the ball, and end up increasing bat speed.


by bdalebs on Jul 21, 2009 1:25 PM EDT
