PitchFX, Dirt, and Parks
The last two years, I've published rankings of how successful catchers were at blocking balls in the dirt. I've been leveraging the Pitch FX data from MLB for this analysis, but I haven't really used the full power of the technology. To this point, I've relied on the Gameday stringers to classify whether a pitch was in the dirt or not.
Harry suggested that I look beyond the human element and use the more detailed pitch location information to determine when a pitch would hit the dirt. Luckily for me, he was kind enough to provide a formula that allowed us to figure out at what point the pitch would hit the ground. After going back and forth on it for a little while, and confirming with some other people, we decided that all pitches that landed within 3 feet behind the front of home plate could be considered to be balls in the dirt.1
In 2008, comparing the scorekeepers to the computer system led to the following difference:
| Stringers | Pitch FX |
| 13332 | 23147 |
So the stringers identified only about 58% as many pitches in the dirt as the Pitch FX system did (13332 / 23147), or roughly 60%. I grew curious about such a large discrepancy (which only got larger if we moved the catcher's location back to -3.5 or -4 feet).
My first thought was that the scorers in certain parks had a tendency to report fewer balls in the dirt than their cohorts in other parks. This table breaks down the identified balls in the dirt by park and shows the stringers' count as a fraction of the Pitch FX count.2
| Park | Stringers | Pitch FX | % |
| ANA | 474 | 969.48 | 0.49 |
| ARI | 402 | 915.43 | 0.44 |
| ATL | 339 | 797.2 | 0.43 |
| BAL | 471 | 861.22 | 0.55 |
| BOS | 456 | 638.28 | 0.71 |
| CHA | 399 | 681.51 | 0.59 |
| CHN | 440 | 791.81 | 0.56 |
| CIN | 487 | 903.05 | 0.54 |
| CLE | 401 | 735.38 | 0.55 |
| COL | 394 | 734.94 | 0.54 |
| DET | 424 | 884.92 | 0.48 |
| FLO | 462 | 846.71 | 0.55 |
| HOU | 519 | 761.14 | 0.68 |
| KCA | 398 | 887.66 | 0.45 |
| LAN | 482 | 792.69 | 0.61 |
| MIL | 530 | 869.11 | 0.61 |
| MIN | 369 | 599.29 | 0.62 |
| NYA | 441 | 767.8 | 0.57 |
| NYN | 418 | 635.49 | 0.66 |
| OAK | 464 | 776.04 | 0.6 |
| PHI | 463 | 880.38 | 0.53 |
| PIT | 438 | 852.79 | 0.51 |
| SDN | 440 | 708.83 | 0.62 |
| SEA | 408 | 664.32 | 0.61 |
| SFN | 443 | 823.21 | 0.54 |
| SLN | 481 | 974.54 | 0.49 |
| TBA | 463 | 796.54 | 0.58 |
| TEX | 438 | 550.34 | 0.8 |
| TOR | 507 | 857.35 | 0.59 |
| WAS | 477 | 899 | 0.53 |
The values range from Atlanta at the bottom, where the humans identified only 43% as many pitches as the computers, to Texas, where the stringers called 80% as many balls in the dirt as Pitch FX did. But that's not the really interesting piece of information to me. Notice the discrepancy in the number of pitches that Pitch FX located as in the dirt. Texas only had around 550, while St. Louis was almost at 1000.
There are a lot of things that could cause such a large difference between parks. My first thought is that some pitchers just tend to throw more balls in the dirt than others. Perhaps the Cardinals' staff throws a lot more splitters than does the Rangers'. If that were the case, we'd expect to see roughly the same number of balls in the dirt when a team was on the road as when it was at home.
So I looked at how many pitches in the dirt each team threw both home and away. I then normalized the results around whichever had fewer pitches thrown. Finally, I calculated the single season park effects following the steps on Baseball Reference.3
Let me share a quick example before the results. Let's look at the Texas Rangers. As the home team, they had 244 pitches flagged as in the dirt according to Pitch FX. Overall at home, Pitch FX captured 11991 pitches and missed 369, for a capture rate of 97%. That allows us to scale the expected balls in the dirt1 to 251.52, so Texas had roughly 2 percent of its pitches in the dirt.
On the road, Texas had 10911 pitches registered with Pitch FX, and 684 missed. The raw number of balls in the dirt was 293, and the scaled number was 311, for just under 2.7%.
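The scaling in the Texas example can be sketched in a few lines of Python. This is only an illustration, assuming (as footnote 2 does) that a missed pitch was just as likely to be in the dirt as a captured one; the function name `scale_for_missed` is made up for this sketch and does not come from the original spreadsheet.

```python
def scale_for_missed(flagged, captured, missed):
    """Scale a Pitch FX ball-in-dirt count up to all pitches thrown.

    Assumes missed pitches were in the dirt at the same rate as captured ones.
    """
    capture_rate = captured / (captured + missed)
    return flagged / capture_rate


# Texas Rangers, 2008, using the counts quoted in the text
home_scaled = scale_for_missed(244, 11991, 369)
away_scaled = scale_for_missed(293, 10911, 684)

# Share of all pitches (captured + missed) that were in the dirt
home_rate = home_scaled / (11991 + 369)
away_rate = away_scaled / (10911 + 684)

print(f"home: {home_scaled:.2f} balls in dirt, {home_rate:.1%} of pitches")
print(f"away: {away_scaled:.2f} balls in dirt, {away_rate:.1%} of pitches")
```

The results land within rounding of the 251.52 and 311 quoted above; small differences come from how the capture rate is rounded along the way.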
Next, I normalized the results to the smaller number of pitches - in this case those as the away team - giving 311 balls in the dirt on the road, and 236 at home. We divide the home numbers by the away numbers to get the initial park factor, in this case, .759. Finally, we apply the Other Parks Corrector, which accounts for the fact that the averages of all the other parks include the ratings of this park. This is calculated as n / (n - 1 + IPF), where n is the number of teams (30) and IPF is the initial park factor we calculated in the previous step. Multiplying the initial park factor by this corrector gives the Rangers a one year Balls in Dirt Park Factor of .765, by far the lowest in the majors.
Here are the results for the entire league, and you can find my complete spreadsheet up on EditGrid. vNBID is the Normalized Balls in Dirt as the visiting team, while hNBID is the Normalized Balls in Dirt at home. PF is park factor.
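Here is a rough Python sketch of the normalization and park factor steps for the Rangers. The function names are hypothetical. Multiplying the initial park factor by the Other Parks Corrector is my reading of the steps, which I take to be right because it reproduces the .765 figure; rounding to whole balls in the dirt before dividing is likewise an assumption made to match the quoted 236 and 311.

```python
def normalize_to_smaller(home_bid, home_pitches, away_bid, away_pitches):
    """Rescale both counts to whichever side saw fewer total pitches."""
    target = min(home_pitches, away_pitches)
    return (home_bid * target / home_pitches,
            away_bid * target / away_pitches)


def park_factor(home_nbid, away_nbid, n_teams=30):
    """Return (initial park factor, other parks corrector, park factor)."""
    ipf = home_nbid / away_nbid
    opc = n_teams / (n_teams - 1 + ipf)
    return ipf, opc, ipf * opc


# Texas: scaled balls in dirt and total pitches (captured + missed)
home_n, away_n = normalize_to_smaller(251.52, 11991 + 369, 311, 10911 + 684)
home_n, away_n = round(home_n), round(away_n)   # 236 at home, 311 away

ipf, opc, pf = park_factor(home_n, away_n)
print(f"IPF = {ipf:.3f}, OPC = {opc:.4f}, PF = {pf:.6f}")
```

Running the same `park_factor` call on any other team's normalized home and away counts should give that team's PF as well.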
| Team | vNBID | hNBID | PF |
| ANA | 464 | 508 | 1.091378 |
| ARI | 366 | 436 | 1.18371 |
| ATL | 403 | 404 | 1.002398 |
| BAL | 447 | 499 | 1.112019 |
| BOS | 322 | 270 | 0.843047 |
| CHA | 349 | 295 | 0.849654 |
| CHN | 342 | 333 | 0.974539 |
| CIN | 439 | 481 | 1.092189 |
| CLE | 340 | 307 | 0.905872 |
| COL | 388 | 377 | 0.972569 |
| DET | 364 | 451 | 1.229218 |
| FLO | 395 | 374 | 0.948516 |
| HOU | 336 | 378 | 1.120332 |
| KCA | 402 | 498 | 1.229023 |
| LAN | 455 | 429 | 0.944656 |
| MIL | 523 | 495 | 0.948155 |
| MIN | 284 | 273 | 0.96251 |
| NYA | 358 | 387 | 1.078095 |
| NYN | 258 | 264 | 1.022463 |
| OAK | 283 | 391 | 1.364271 |
| PHI | 453 | 470 | 1.036231 |
| PIT | 376 | 357 | 0.95107 |
| SDN | 334 | 329 | 0.985522 |
| SEA | 344 | 338 | 0.98313 |
| SFN | 451 | 429 | 0.952769 |
| SLN | 451 | 520 | 1.147143 |
| TBA | 352 | 371 | 1.052084 |
| TEX | 311 | 236 | 0.764992 |
| TOR | 402 | 488 | 1.205335 |
| WAS | 483 | 449 | 0.931793 |
I'm not sure what causes there to be a park factor for balls in the dirt - or even if it's a true effect. One season of data is nowhere near enough to go on, so I'd like to replicate the results with the more limited 2006 and 2007 data and see if there's a pattern here. Remember though, these are pitches that would be identified as balls in the dirt by the cameras and computers, not by the humans scoring the game, which should eliminate one potential source of bias.
It's possible that this discrepancy is just a reflection of some other explainable difference - perhaps one team played many more blowouts at home than on the road, so there's no need to try and get batters to chase at home. Or perhaps some outlier pitchers happened to pitch more often on the road, therefore driving up those numbers.
What other factors could contribute to such an effect? I'm sure there's plenty I missed, and I'd love to hear any ideas that are out there.
1 In case anyone is interested, here's the formula Harry provided me.
t = (-vz0 - sqrt(vz0 * vz0 - 2 * az * z0)) / az
y = y0 + (vy0 * t) + (0.5 * ay * t * t)
He tells me that's where the ball should hit the ground in relation to the front of home plate, and I believe him.
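For anyone who wants to play with it, here is a small Python sketch of the formula in footnote 1. The variable names follow the Pitch FX fields used in the formula; the function names, and the choice to count every pitch landing at or in front of the -3 foot mark as in the dirt, are my assumptions for illustration. It assumes az is negative (gravity pulling the ball down) and that y is measured from the front of home plate, with negative values behind it.

```python
import math


def time_to_ground(z0, vz0, az):
    """Time at which z0 + vz0*t + 0.5*az*t**2 reaches zero (the later root)."""
    disc = vz0 * vz0 - 2 * az * z0
    return (-vz0 - math.sqrt(disc)) / az


def landing_y(y0, vy0, ay, z0, vz0, az):
    """Distance from the front of home plate where the pitch hits the ground."""
    t = time_to_ground(z0, vz0, az)
    return y0 + vy0 * t + 0.5 * ay * t * t


def is_ball_in_dirt(y_land, cutoff=-3.0):
    """Assumed rule: in the dirt if it lands no farther back than the cutoff."""
    return y_land >= cutoff


# Made-up numbers chosen for easy arithmetic, not real pitch data:
# a ball released 16 ft up with no vertical speed takes exactly 1 second to land.
short = landing_y(y0=50, vy0=-55, ay=10, z0=16, vz0=0, az=-32)
deep = landing_y(y0=50, vy0=-60, ay=10, z0=16, vz0=0, az=-32)
print(short, is_ball_in_dirt(short))
print(deep, is_ball_in_dirt(deep))
```

Changing `cutoff` to -3.5 or -4 is the same as moving the catcher's location back, as discussed above.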
2 The reason why the Pitch FX numbers have decimals is that not every pitch was captured by Pitch FX in 2008. I assumed that a ball in the dirt was just as likely on a pitch that was missed by the computers, and scaled the number of balls in the dirt to the total number of pitches.
3 Although Baseball Reference describes an iterative process to get the proper park factors for batters and pitchers, I didn't think it applied in this case because I was looking at a single number versus two correlated values.
Comments
perhaps how close the BID was to the catcher
I can see where a ball that bounces behind the plate and the catcher short hops it wouldn’t be scored as a ball in the dirt.
by ol Pete on Feb 2, 2009 8:10 AM EST
I could see that being an issue for the stringers, but not for Pitch FX
That’s purely based on where the cameras pick up the ball (and the math).
by Dan Turkenkopf on Feb 3, 2009 8:47 PM EST
I wonder
if there is any correlation between BID and the HR factor of a given park? If a pitcher is aware that a particular park favors hitters hitting the ball out of the park, would they be more inclined to keep the ball low?
Those Pilgrims ain't lookin' so proud now...
by giveml on Feb 2, 2009 1:22 PM EST
Good question
Eyeballing it, there doesn’t appear to be – Texas and US Cellular have low PFs, GABP is at 1.1 and Dodger Stadium is at .94.
But I’ll run a correlation and see what happens.
by Dan Turkenkopf on Feb 3, 2009 8:49 PM EST
video
Did you take a lot of these extra balls in the dirt and use mlb.com tv to watch the video of the pitch and see if they are indeed in the dirt?
by willkoky on Feb 2, 2009 10:35 PM EST
I just checked four pitches
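| date | H | I | B | S | p | h | xG | yG |
| 2008-04-01 | flo | 1 | 1 | 0 | Vanden Hurk | Castillo | 0.9 | 2.6 |
| 2008-04-01 | flo | 5 | 0 | 2 | Sosa | Jacobs | 0.7 | 9.6 |
| 2008-04-01 | flo | 5 | 1 | 1 | Pinto | Beltran | -0.4 | -2.7 |
| 2008-04-01 | flo | 4 | 0 | 0 | Sosa | Uggla | 1.5 | -2.7 |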
In order as above
1 – well short of plate
2 – near the cut-out where the grass ends before home – very short pitch
3 – caught by the catcher, palm up off the ground
4 – back-handed on one hop in the dirt.
The pitches that are just short of the catcher are in the dirt IFF they are wide, would be a good theory to check. I don’t have the spotter mis-matches handy, I’m just checking based on the distances. So, the cut-off may not be a line, but a curve.
by Harry Pavlidis on Feb 4, 2009 11:04 AM EST
eesh
sorry ’bout the format. nice try, though.
by Harry Pavlidis on Feb 4, 2009 11:04 AM EST
xG yG
0.9 2.6
0.7 9.6
-0.4 -2.7
1.5 -2.7
by Harry Pavlidis on Feb 4, 2009 11:10 AM EST
Pitches in Dirt
I wouldn’t consider a 60% rate a large discrepancy. The human scorers are often going to miss some pitches in the dirt, but rarely call a pitch as in the dirt that WASN’T actually in the dirt.
I think you’d find something similar with pickoffs if you compared a human scorer recording pickoffs vs. the video – the human probably misses recording the pickoff about 30-40% of the time.
KJOK
by KJOK on Feb 4, 2009 5:07 PM EST
Really?
Given a ball that hits the dirt, the scorer misses it on the order of HALF THE TIME? I find that hard to believe. At least, it’s going to take more than this one article. Good thing they have more planned.
My initial reaction is that three feet is too far back. Might it be more like 2.5 feet? I’m guessing a lot of balls in the dirt are very short hops right into the catcher’s glove. What’s the error rate at that distance? (Assuming it’s easy to calculate.)
I just read this article (I thought it would be much more difficult for me to understand) and loved it. Here’s hoping it’s the first of twenty.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Feb 5, 2009 5:07 PM EST
Just ran the 2007 PFs
Well, as best I could with the limited data.
Correlation with 2008 is .02 – although I wouldn’t necessarily have expected a high correlation, since the cameras were almost certainly adjusted or changed between seasons.
Maybe I’ll keep an eye on this throughout 2009 and see if there’s anything to it.
by Dan Turkenkopf on Feb 5, 2009 7:48 AM EST
