For the last two years, I've published rankings of how successful catchers were at blocking balls in the dirt. I've been leveraging the Pitch FX data from MLB for this analysis, but I haven't really used the full power of the technology: to this point, I've relied on the Gameday stringers to classify whether or not a pitch was in the dirt.
Harry suggested that I look beyond the human element and use the more detailed pitch location information to determine when a pitch would hit the dirt. Luckily for me, he was kind enough to provide a formula for figuring out where the pitch would hit the ground. After going back and forth on it for a little while, and confirming with some other people, we decided that all pitches that landed within 3 feet behind the front of home plate could be considered balls in the dirt.1
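If it helps to see the cutoff spelled out, here's a minimal sketch of the check as I applied it. The names are mine, and I'm assuming the landing point is measured in feet from the front of home plate, with negative values toward the catcher (the landing point itself comes from the formula in footnote 1).

```python
def is_ball_in_dirt(landing_y, catcher_y=-3.0):
    """Flag a pitch as a ball in the dirt if it would hit the ground at or in
    front of a catcher assumed to set up 3 feet behind the front of home plate
    (try -3.5 or -4.0 to see how the counts change)."""
    return landing_y >= catcher_y
```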
In 2008, comparing the scorekeepers to the computer system led to the following difference:
So the stringers identified only 60% as many pitches in the dirt as the Pitch FX system did. I was curious about such a large discrepancy (which only got larger if we moved the catcher's location back to -3.5 or -4 feet).
My first thought was that the scorers in certain parks had a tendency to report fewer balls in the dirt than their counterparts in other parks. This table breaks down the identified balls in the dirt by park and calculates the percentage that were correctly identified by the stringers.2
The values range from Atlanta at the bottom, where the humans identified only 43% as many pitches as the computers, to Texas, where the stringers called 80% as many balls in the dirt as did Pitch FX. But that's not the really interesting piece of information to me. Notice the discrepancy in the number of pitches that Pitch FX located as in the dirt: Texas had only around 600, while St. Louis had almost 1,000.
There are a lot of things that could cause such a large difference between parks. My first thought is that some pitchers just tend to throw more balls in the dirt than others. Perhaps the Cardinals' staff throws a lot more splitters than does the Rangers'. If that were the case, we'd expect to see roughly the same number of balls in the dirt when a team was on the road as when it was at home.
So I looked at how many pitches in the dirt each team threw both home and away. I then normalized the results to whichever sample had fewer pitches thrown. Finally, I calculated the single-season park effects following the steps on Baseball Reference.3
Let me share a quick example before the results. Let's look at the Texas Rangers. As the home team, they had 244 pitches flagged as in the dirt according to Pitch FX. Overall at home, Pitch FX captured 11991 pitches and missed 369, for a capture rate of 97%. That allows us to scale the expected balls in the dirt2 to 251.52, so Texas had roughly 2 percent of its home pitches in the dirt.
On the road, Texas had 10911 pitches registered with Pitch FX, and 684 missed. The raw number of balls in the dirt was 293, and the scaled number was 311, for just under 2.7%.
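To make that scaling concrete, here's a small sketch in Python using the Texas numbers; the function name is mine, and it just applies the assumption from footnote 2 that a missed pitch was as likely to be in the dirt as a captured one.

```python
def scaled_balls_in_dirt(raw_bid, captured, missed):
    """Scale the raw Pitch FX ball-in-dirt count up to the full pitch total,
    assuming a missed pitch was just as likely to be in the dirt (footnote 2)."""
    total = captured + missed
    capture_rate = captured / total       # Texas at home: 11991 / 12360, about 97%
    scaled = raw_bid / capture_rate       # 244 / 0.97 is roughly 251.5
    return scaled, scaled / total         # scaled count and rate per pitch

home_bid, home_rate = scaled_balls_in_dirt(244, 11991, 369)   # ~251.5, ~2.0%
away_bid, away_rate = scaled_balls_in_dirt(293, 10911, 684)   # ~311, ~2.7%
```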
Next, I normalized the results to the smaller number of pitches - in this case the away sample - giving 311 balls in the dirt on the road and 236 at home. We divide the home number by the away number to get the initial park factor, in this case .759. Finally, we apply the Other Parks Corrector, which accounts for the fact that the averages of all the other parks include the rating of this park. It's calculated as n / (n - 1 + IPF), where n is the number of teams (30) and IPF is the initial park factor from the previous step. In the Rangers' case, this results in a one-year Balls in Dirt Park Factor of .765, by far the lowest in the majors.
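Put together, the whole calculation looks roughly like this; again just a sketch with my own names, following the normalization and Other Parks Corrector steps described above.

```python
def balls_in_dirt_park_factor(home_bid, home_pitches, away_bid, away_pitches,
                              n_teams=30):
    """One-year Balls in Dirt Park Factor.

    Scaled counts are normalized to whichever sample had fewer pitches, the
    home/away ratio gives the initial park factor, and the Other Parks
    Corrector n / (n - 1 + IPF) adjusts the result."""
    fewer = min(home_pitches, away_pitches)
    home_norm = home_bid * fewer / home_pitches   # Texas: 251.52 * 11595/12360, ~236
    away_norm = away_bid * fewer / away_pitches   # Texas: road sample is the smaller one, so ~311
    ipf = home_norm / away_norm                   # initial park factor, ~0.759
    opc = n_teams / (n_teams - 1 + ipf)           # Other Parks Corrector
    return ipf * opc                              # Texas: ~0.765

texas_pf = balls_in_dirt_park_factor(251.52, 12360, 311.0, 11595)
```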
Here are the results for the entire league, and you can find my complete spreadsheet up on EditGrid. vNBID is the Normalized Balls in Dirt as the visiting team, while hNBID is the Normalized Balls in Dirt at home. PF is park factor.
I'm not sure what causes there to be a park factor for balls in the dirt - or even if it's a true effect. One season of data is nowhere near enough to go on, so I'd like to replicate the results with the more limited 2006 and 2007 data and see if there's a pattern here. Remember though, these are pitches that would be identified as balls in the dirt by the cameras and computers, not by the humans scoring the game, which should eliminate one potential source of bias.
It's possible that this discrepancy is just a reflection of some other explainable difference - perhaps one team played many more blowouts at home than on the road, so there was less need to try to get batters to chase at home. Or perhaps some outlier pitchers happened to pitch more often on the road, driving up those numbers.
What other factors could contribute to such an effect? I'm sure there's plenty I missed, and I'd love to hear any ideas that are out there.
1 In case anyone is interested, here's the formula Harry provided me.
(`y0` + (`vy0` * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (`z0`))))) / `az`))) + (((0.5 * `ay`) * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (`z0`))))) / `az`)) * ((-(`vz0`) - sqrt(((`vz0` * `vz0`) - ((2 * `az`) * (`z0`))))) / `az`))
He tells me that's where the ball should hit the ground in relation to the front of home plate, and I believe him.
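For anyone who would rather read that as code, here's my translation of the expression into Python. The parameters are the Pitch FX release fields (`y0`, `vy0`, `ay`, `z0`, `vz0`, `az`); the only assumptions are that `az` is nonzero and the square root's argument is non-negative, which holds for real pitches.

```python
from math import sqrt

def landing_y(y0, vy0, ay, z0, vz0, az):
    """The y-coordinate at which the pitch would first reach the ground (z = 0):
    solve z0 + vz0*t + 0.5*az*t**2 = 0 for the flight time t, then evaluate the
    y trajectory at that time."""
    t = (-vz0 - sqrt(vz0 * vz0 - 2 * az * z0)) / az
    return y0 + vy0 * t + 0.5 * ay * t * t
```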
2 The reason the Pitch FX numbers have decimals is that not every pitch was captured by the system in 2008. I assumed that a ball in the dirt was just as likely on a pitch missed by the computers, and scaled the number of balls in the dirt up to the total number of pitches.
3 Although Baseball Reference describes an iterative process to get the proper park factors for batters and pitchers, I didn't think it applied in this case because I was looking at a single number versus two correlated values.