Beyond the Box Score: An SB Nation Community


PitchFX, Dirt, and Parks

For the last two years, I've published rankings of how successful catchers were at blocking balls in the dirt. I've used MLB's Pitch FX data for this analysis, but I haven't really taken advantage of the full power of the technology. Until now, I've relied on the Gameday stringers to classify whether or not a pitch was in the dirt.

Harry suggested that I look beyond the human element and use the more detailed pitch location information to determine when a pitch would hit the dirt. Luckily for me, he was kind enough to provide a formula for where the pitch would hit the ground. After going back and forth on it for a while, and checking with some other people, we decided that any pitch landing no more than 3 feet behind the front of home plate (including anything that landed in front of the plate) could be considered a ball in the dirt.1
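Here is a minimal Python sketch of that rule. It assumes each pitch's landing point has already been computed with the formula in note 1 (in feet, with negative values meaning behind the front of the plate). The function names are hypothetical, and counting a landing point of exactly -3 feet as in the dirt is an assumption. The sample values are the four yG figures Harry posts in the comments.

```python
def is_ball_in_dirt(y_ground: float, cutoff: float = -3.0) -> bool:
    """True if the projected landing point is no more than |cutoff| feet
    behind the front of home plate (negative y = behind the front edge)."""
    return y_ground >= cutoff


def count_in_dirt(landing_points, cutoff: float = -3.0) -> int:
    """Count the pitches whose landing point passes the cutoff."""
    return sum(1 for y in landing_points if is_ball_in_dirt(y, cutoff))


# The four yG values Harry checks in the comments (feet).
harry_yg = [2.6, 9.6, -2.7, -2.7]

print(count_in_dirt(harry_yg))        # 4: all land within 3 ft behind the front edge
print(count_in_dirt(harry_yg, -4.0))  # 4: a deeper cutoff can only add pitches
print(count_in_dirt(harry_yg, -2.5))  # 2: the two -2.7 pitches drop out
```

Because moving the cutoff further back can only add pitches, the Pitch FX count (and so its gap with the stringers) can only grow at -3.5 or -4 feet.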

For 2008, comparing the stringers' classifications to the Pitch FX calculations gave the following totals:

Stringers Pitch FX
13332 23147

So the stringers flagged only about 58% as many pitches in the dirt as the Pitch FX system did (13332 / 23147). I grew curious about such a large discrepancy, which only got larger if we moved the cutoff for the catcher's location back to -3.5 or -4 feet.

My first thought was that the scorers in certain parks had a tendency to report fewer balls in the dirt than their counterparts in other parks. This table breaks down the balls in the dirt by park and shows the stringers' count as a fraction of the Pitch FX count.2

Park Stringers Pitch FX Ratio
ANA 474 969.48 0.49
ARI 402 915.43 0.44
ATL 339 797.2 0.43
BAL 471 861.22 0.55
BOS 456 638.28 0.71
CHA 399 681.51 0.59
CHN 440 791.81 0.56
CIN 487 903.05 0.54
CLE 401 735.38 0.55
COL 394 734.94 0.54
DET 424 884.92 0.48
FLO 462 846.71 0.55
HOU 519 761.14 0.68
KCA 398 887.66 0.45
LAN 482 792.69 0.61
MIL 530 869.11 0.61
MIN 369 599.29 0.62
NYA 441 767.8 0.57
NYN 418 635.49 0.66
OAK 464 776.04 0.6
PHI 463 880.38 0.53
PIT 438 852.79 0.51
SDN 440 708.83 0.62
SEA 408 664.32 0.61
SFN 443 823.21 0.54
SLN 481 974.54 0.49
TBA 463 796.54 0.58
TEX 438 550.34 0.8
TOR 507 857.35 0.59
WAS 477 899 0.53

The values range from Atlanta at the bottom, where the stringers flagged only 43% as many pitches as the computers did, to Texas, where they called 80% as many balls in the dirt as Pitch FX did. But that's not the most interesting piece of information to me. Notice the spread in the number of pitches that Pitch FX located in the dirt: Texas had only about 550, while St. Louis had almost 1000.
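A quick Python sketch that recomputes the ratio column from the two counts and picks out the extremes. Keep in mind the ratio compares totals, not a pitch-by-pitch match, so a park's stringers could in principle flag pitches that Pitch FX did not. The data is copied from the table; the league-wide figure uses the 13332 and 23147 totals quoted earlier.

```python
# 2008 balls in the dirt by park: (stringers, Pitch FX scaled), from the table above.
PARKS = {
    "ANA": (474, 969.48), "ARI": (402, 915.43), "ATL": (339, 797.2),
    "BAL": (471, 861.22), "BOS": (456, 638.28), "CHA": (399, 681.51),
    "CHN": (440, 791.81), "CIN": (487, 903.05), "CLE": (401, 735.38),
    "COL": (394, 734.94), "DET": (424, 884.92), "FLO": (462, 846.71),
    "HOU": (519, 761.14), "KCA": (398, 887.66), "LAN": (482, 792.69),
    "MIL": (530, 869.11), "MIN": (369, 599.29), "NYA": (441, 767.8),
    "NYN": (418, 635.49), "OAK": (464, 776.04), "PHI": (463, 880.38),
    "PIT": (438, 852.79), "SDN": (440, 708.83), "SEA": (408, 664.32),
    "SFN": (443, 823.21), "SLN": (481, 974.54), "TBA": (463, 796.54),
    "TEX": (438, 550.34), "TOR": (507, 857.35), "WAS": (477, 899.0),
}

# Stringers' count as a fraction of the Pitch FX count (a ratio of totals,
# not a pitch-by-pitch match rate).
ratios = {park: s / pfx for park, (s, pfx) in PARKS.items()}

lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)
fewest = min(PARKS, key=lambda park: PARKS[park][1])
most = max(PARKS, key=lambda park: PARKS[park][1])
league = 13332 / 23147  # season totals quoted earlier

print(f"league-wide ratio: {league:.3f}")                # 0.576
print(f"lowest ratio:  {lowest} {ratios[lowest]:.2f}")    # ATL 0.43
print(f"highest ratio: {highest} {ratios[highest]:.2f}")  # TEX 0.80
print(f"fewest Pitch FX: {fewest}, most: {most}")         # TEX, SLN
```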

Many things could cause such a large difference between parks. One possibility is that some pitchers simply throw more balls in the dirt than others; perhaps the Cardinals' staff throws a lot more splitters than the Rangers' does. If that were the case, we'd expect a team's pitchers to throw roughly the same number of balls in the dirt on the road as at home.

So I looked at how many pitches in the dirt each team threw at home and on the road, and normalized both counts to whichever split had fewer total pitches. Finally, I calculated the single-season park factors following the steps on Baseball Reference.3

Let me share a quick example before the results, using the Texas Rangers. At home, Texas pitchers had 244 pitches flagged as in the dirt by Pitch FX. Pitch FX captured 11991 of their home pitches and missed 369, a capture rate of 97%. Scaling up to all 12360 home pitches (see note 2) gives about 251.5 expected balls in the dirt, so Texas had roughly 2 percent of its home pitches in the dirt.

On the road, Pitch FX registered 10911 Texas pitches and missed 684. The raw number of balls in the dirt was 293, and the scaled number was 311, just under 2.7% of the 11595 road pitches.

Next, I normalized the results to the split with fewer pitches (in this case, the road games), giving 311 balls in the dirt on the road and 236 at home. Dividing the home number by the road number gives the initial park factor (IPF): 236 / 311 = .759. Finally, we apply the Other Parks Corrector (OPC), which accounts for the fact that the averages of all the other parks include the ratings of this park. The corrector is n / (n - 1 + IPF), where n is the number of teams (30), and the park factor is the IPF multiplied by it. In the Rangers' case, .759 × 30 / (29 + .759) gives a one-year Balls in Dirt Park Factor of .765, by far the lowest in the majors.
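The Texas calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spreadsheet itself: the function names are hypothetical, and the normalized counts are rounded to whole pitches because the post divides 236 by 311.

```python
def scale_bid(raw_bid: int, captured: int, missed: int) -> float:
    """Scale a Pitch FX ball-in-dirt count to all pitches thrown, assuming
    pitches Pitch FX missed were in the dirt at the same rate (note 2)."""
    return raw_bid * (captured + missed) / captured


def bid_park_factor(home, road, n_teams: int = 30):
    """home and road are (raw_bid, captured, missed) for one team's pitchers.
    Returns (initial park factor, corrected park factor)."""
    home_total = home[1] + home[2]
    road_total = road[1] + road[2]
    # Normalize both splits to whichever one had fewer total pitches,
    # rounding to whole pitches (236 and 311 for Texas).
    base = min(home_total, road_total)
    home_norm = round(scale_bid(*home) * base / home_total)
    road_norm = round(scale_bid(*road) * base / road_total)
    ipf = home_norm / road_norm
    # Other Parks Corrector, then multiply it into the IPF.
    opc = n_teams / (n_teams - 1 + ipf)
    return ipf, ipf * opc


# Texas 2008: (raw balls in the dirt, pitches captured, pitches missed)
ipf, pf = bid_park_factor(home=(244, 11991, 369), road=(293, 10911, 684))
print(f"IPF = {ipf:.3f}, PF = {pf:.3f}")  # IPF = 0.759, PF = 0.765
```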

Here are the results for the entire league; my complete spreadsheet is up on EditGrid. vNBID is the Normalized Balls in Dirt as the visiting team, hNBID is the Normalized Balls in Dirt at home, and PF is the park factor.

 

Team vNBID hNBID PF
ANA 464 508 1.091378
ARI 366 436 1.18371
ATL 403 404 1.002398
BAL 447 499 1.112019
BOS 322 270 0.843047
CHA 349 295 0.849654
CHN 342 333 0.974539
CIN 439 481 1.092189
CLE 340 307 0.905872
COL 388 377 0.972569
DET 364 451 1.229218
FLO 395 374 0.948516
HOU 336 378 1.120332
KCA 402 498 1.229023
LAN 455 429 0.944656
MIL 523 495 0.948155
MIN 284 273 0.96251
NYA 358 387 1.078095
NYN 258 264 1.022463
OAK 283 391 1.364271
PHI 453 470 1.036231
PIT 376 357 0.95107
SDN 334 329 0.985522
SEA 344 338 0.98313
SFN 451 429 0.952769
SLN 451 520 1.147143
TBA 352 371 1.052084
TEX 311 236 0.764992
TOR 402 488 1.205335
WAS 483 449 0.931793
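As a consistency check, the PF column can be reproduced from vNBID and hNBID alone, using IPF = hNBID / vNBID and PF = IPF × n / (n - 1 + IPF) with n = 30, the same steps as the Texas example. A minimal Python sketch (the dictionary just copies the table; `park_factor` is a hypothetical helper name):

```python
# (vNBID, hNBID, published PF) copied from the table above.
TABLE = {
    "ANA": (464, 508, 1.091378), "ARI": (366, 436, 1.18371),
    "ATL": (403, 404, 1.002398), "BAL": (447, 499, 1.112019),
    "BOS": (322, 270, 0.843047), "CHA": (349, 295, 0.849654),
    "CHN": (342, 333, 0.974539), "CIN": (439, 481, 1.092189),
    "CLE": (340, 307, 0.905872), "COL": (388, 377, 0.972569),
    "DET": (364, 451, 1.229218), "FLO": (395, 374, 0.948516),
    "HOU": (336, 378, 1.120332), "KCA": (402, 498, 1.229023),
    "LAN": (455, 429, 0.944656), "MIL": (523, 495, 0.948155),
    "MIN": (284, 273, 0.96251), "NYA": (358, 387, 1.078095),
    "NYN": (258, 264, 1.022463), "OAK": (283, 391, 1.364271),
    "PHI": (453, 470, 1.036231), "PIT": (376, 357, 0.95107),
    "SDN": (334, 329, 0.985522), "SEA": (344, 338, 0.98313),
    "SFN": (451, 429, 0.952769), "SLN": (451, 520, 1.147143),
    "TBA": (352, 371, 1.052084), "TEX": (311, 236, 0.764992),
    "TOR": (402, 488, 1.205335), "WAS": (483, 449, 0.931793),
}


def park_factor(v_nbid: int, h_nbid: int, n_teams: int = 30) -> float:
    """IPF = home / road, then multiply by the Other Parks Corrector."""
    ipf = h_nbid / v_nbid
    return ipf * n_teams / (n_teams - 1 + ipf)


recomputed = {team: park_factor(v, h) for team, (v, h, _) in TABLE.items()}
max_diff = max(abs(recomputed[t] - TABLE[t][2]) for t in TABLE)
ranked = sorted(recomputed, key=recomputed.get)

print(f"largest gap from published PF: {max_diff:.1e}")
print("lowest:", ranked[0], round(recomputed[ranked[0]], 3))    # TEX 0.765
print("highest:", ranked[-1], round(recomputed[ranked[-1]], 3))  # OAK 1.364
```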

I'm not sure what would cause a park factor for balls in the dirt, or even whether it's a real effect. One season of data is nowhere near enough to go on, so I'd like to try to replicate the results with the more limited 2006 and 2007 data and see whether a pattern emerges. Remember, though, that these are pitches identified as balls in the dirt by the cameras and computers, not by the humans scoring the game, which should eliminate one potential source of bias.

It's possible that this discrepancy just reflects some other explainable difference. Perhaps a team played many more blowouts at home than on the road, so there was less need to get batters to chase at home. Or perhaps some outlier pitchers happened to pitch more often on the road, driving up those numbers.

What other factors could contribute to such an effect?  I'm sure there's plenty I missed, and I'd love to hear any ideas that are out there.


1 In case anyone is interested, here's the formula Harry provided me.

(`y0` + (`vy0` * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (z0))))) / `az`))) + (((0.5 * `ay`) * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (z0))))) / `az`)) * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (z0))))) / `az`))

He tells me that's where the ball should hit the ground in relation to the front of home plate, and I believe him.
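Harry's expression is easier to read once you notice that it repeats one piece: the time t at which the pitch's height reaches zero under the constant-acceleration model the formula uses, t = (-vz0 - sqrt(vz0^2 - 2*az*z0)) / az. The landing point is then y0 + vy0*t + 0.5*ay*t^2. Below is a Python sketch of the same calculation, with parameter names taken from the formula. The sample pitch is made up for illustration and is not a real 2008 pitch.

```python
import math


def time_to_ground(z0: float, vz0: float, az: float) -> float:
    """Time (s) at which z0 + vz0*t + 0.5*az*t**2 reaches 0.

    Assumes z0 > 0 and az < 0, so the square root is real and this root is
    the positive one. This is the sub-expression repeated in the formula.
    """
    return (-vz0 - math.sqrt(vz0 * vz0 - 2 * az * z0)) / az


def landing_y(y0: float, vy0: float, ay: float,
              z0: float, vz0: float, az: float) -> float:
    """Position (ft) along the y axis where the pitch reaches the ground."""
    t = time_to_ground(z0, vz0, az)
    return y0 + vy0 * t + 0.5 * ay * t * t


# Made-up parameters for illustration only (not a real 2008 pitch).
y = landing_y(y0=50.0, vy0=-125.0, ay=25.0, z0=6.0, vz0=-8.0, az=-30.0)
print(f"lands at y = {y:.2f} ft")  # lands at y = -0.26 ft
```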

2 The Pitch FX numbers have decimals because Pitch FX didn't capture every pitch in 2008. I assumed that a pitch missed by the system was just as likely to be a ball in the dirt as one it captured, and scaled the number of balls in the dirt up to the total number of pitches.

3 Although Baseball Reference describes an iterative process to get the proper park factors for batters and pitchers, I didn't think it applied in this case because I was looking at a single number versus two correlated values.

Comments


perhaps how close the BID was to the catcher

I can see where a ball that bounces behind the plate and the catcher short hops it wouldn’t be scored as a ball in the dirt.

by ol Pete on Feb 2, 2009 8:10 AM EST

I could see that being an issue for the stringers, but not for Pitch FX

That’s purely based on where the cameras pick up the ball (and the math).

by Dan Turkenkopf on Feb 3, 2009 8:47 PM EST

I wonder

if there is any correlation between BID and the HR factor of a given park? If a pitcher is aware that a particular park favors hitters hitting the ball out of the park, would they be more inclined to keep the ball low?

Those Pilgrims ain't lookin' so proud now...

by giveml on Feb 2, 2009 1:22 PM EST

Good question

Eyeballing it, there doesn’t appear to be – Texas and US Cellular have low PFs, GABP is at 1.1 and Dodger Stadium is at .94.

But I’ll run a correlation and see what happens.

by Dan Turkenkopf on Feb 3, 2009 8:49 PM EST

video

Did you take a lot of these extra balls in the dirt and use mlb.com tv to watch the video of the pitch and see if they are indeed in the dirt?

by willkoky on Feb 2, 2009 10:35 PM EST

I just checked four pitches
date        H    I  B  S  p            h          xG    yG
2008-04-01  flo  1  1  0  Vanden Hurk  Castillo   0.9   2.6
2008-04-01  flo  5  0  2  Sosa         Jacobs     0.7   9.6
2008-04-01  flo  5  1  1  Pinto        Beltran   -0.4  -2.7
2008-04-01  flo  4  0  0  Sosa         Uggla      1.5  -2.7

In order as above
1 – well short of plate
2 – near the cut-out where the grass ends before home – very short pitch
3 – caught by the catcher, palm up off the ground
4 – back-handed on one hop in the dirt.

A good theory to check: pitches that land just short of the catcher are in the dirt if and only if they are wide. I don’t have the spotter mismatches handy; I’m just checking based on the distances. So the cutoff may not be a line, but a curve.

by Harry Pavlidis on Feb 4, 2009 11:04 AM EST

eesh

sorry ’bout the format. nice try , though.

by Harry Pavlidis on Feb 4, 2009 11:04 AM EST

xG yG
0.9 2.6
0.7 9.6
-0.4 -2.7
1.5 -2.7

by Harry Pavlidis on Feb 4, 2009 11:10 AM EST

Pitches in Dirt

I wouldn’t consider a 60% rate a large discrepancy. The human scorers are often going to miss some pitches in the dirt, but will rarely call a pitch in the dirt that WASN’T actually in the dirt.

I think you’d find something similar with pickoffs if you compared a human scorer recording pickoffs vs. the video – the human probably misses recording the pickoff about 30-40% of the time.

KJOK

by KJOK on Feb 4, 2009 5:07 PM EST

Really?

Given a ball that hits the dirt, the scorer misses it on the order of HALF THE TIME? I find that hard to believe. At least, it’s going to take more than this one article. Good thing they have more planned.

My initial reaction is that three feet is too far back. Might it be more like 2.5 feet? I’d guess a lot of balls in the dirt are very short hops right into the catcher’s glove. What’s the error rate at that distance? (Assuming it’s easy to calculate.)

I just read this article (I thought it would be much more difficult for me to understand) and loved it. Here’s hoping it’s the first of twenty.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Feb 5, 2009 5:07 PM EST

Just ran the 2007 PFs

Well, as best I could with the limited data.

Correlation with 2008 is .02 – although I wouldn’t necessarily have expected a high correlation, since the cameras were almost certainly adjusted or changed between seasons.

Maybe I’ll keep an eye on this throughout 2009 and see if there’s anything to it.

by Dan Turkenkopf on Feb 5, 2009 7:48 AM EST
