I've done my fair share of analysis using batted ball data. In trying to replicate Tony Blengino's (of FanGraphs) methods, which look at batted ball frequency and production separated by type, I've looked to FanGraphs. I've looked to Baseball Reference. I've looked to Baseball Savant. I've looked a little bit to Baseball Heat Maps. Unfortunately, they're not all the same. There isn't a single source of truth. For now.
Consider this article as something of a reference. With granular batted ball data becoming more of an interest due to StatCast, the classification of batted balls will hopefully become more uniform. However, that point has not yet arrived. There are technical issues that have prevented accurate analysis of the so-far-released batted ball data. Until that data is ready, we must deal with the differences that we have. The following graphs show the differences.
Line drive rate is the first one up. The statistic is often cited as something that is not consistent from year to year for a single player. Most players regress to the mean. The mean, however, is different depending on the data source. If I'm not mistaken, FanGraphs (FG) gets its numbers from Baseball Info Solutions (BIS). Baseball Reference (BR) gets its numbers from Retrosheet. Baseball Savant (BS) gets its numbers from PitchF/X. At the end of this article, you will find tables with the raw numbers.
Baseball Reference and Baseball Savant are in relatively close agreement on the frequency of line drives. FanGraphs differs quite a bit. For some reason, in 2013 the frequency of line drives jumped up for BR and BS but not FG. It's possible that MLB "adjusted" the definition of a line drive in the 2012 offseason, which means the stringers feeding the info for Retrosheet and the algorithm feeding PitchF/X would have been updated with the new classification criteria. This seems unlikely to me, but I can't think of any other reason. I find it hard to believe that there would be a four percentage point increase in line drives from one year to the next when it is understood that players generally don't have control over the frequency of line drives they hit.
The next graph shows the differences in fly ball rate. Baseball Savant separates fly balls and popups. I have added in popups to Baseball Savant's fly balls calculation.
It's basically the opposite of line drives in terms of trend. Again, BR and BS line up quite well. If some fly balls are being categorized as line drives instead, fly ball frequency would decrease and line drive frequency would increase. In addition, the production on fly balls would decrease. If these fly balls are the "borderline" ones that might be considered line drives, and these are the ones being taken away, the proportion of weak fly balls that make up the total fly ball count would increase. This indeed shows up in the Baseball Reference production data, which I will show a bit later.
Here is ground ball frequency. Note that Baseball Reference separates out bunts and ground balls, whereas it does not appear that other sites do. I have included bunts in my calculation of ground balls for BR.
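The two adjustments described so far (folding Savant's popups into fly balls, and folding BR's bunts into ground balls) can be sketched as below. This is a minimal illustration, not the exact procedure behind the graphs: the function names, category labels, and counts are all made up for the example.

```python
def harmonize(counts, merges):
    """Fold source-specific categories into common ones.

    `merges` maps a source category to the common category it joins,
    e.g. {"PU": "FB"} adds popups to fly balls.
    """
    out = {}
    for category, n in counts.items():
        target = merges.get(category, category)
        out[target] = out.get(target, 0) + n
    return out


def rates(counts):
    """Return each category's share of all batted balls."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical counts, for illustration only (not real league data).
savant_counts = {"GB": 450, "LD": 250, "FB": 220, "PU": 80}
bref_counts = {"GB": 430, "BUNT": 20, "LD": 250, "FB": 300}

savant_common = harmonize(savant_counts, {"PU": "FB"})
bref_common = harmonize(bref_counts, {"BUNT": "GB"})

print(rates(savant_common))
print(rates(bref_common))
```

Once every source is expressed in the same three categories, the rates can at least be laid side by side, even though the underlying classification rules still differ.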
The trends for each website are about the same, but FG's rate is about two percentage points below the other two sites in each year.
FanGraphs calculates an infield fly ball rate (IFFB%), while Baseball Savant has a popup rate. Conceptually, I don't consider the two measures quite alike. Multiplying IFFB% by FB% for FanGraphs data gives the infield fly ball rate relative to all batted balls instead of just fly balls, but that doesn't remove the restriction of being an infield fly ball rate. I suspect that Baseball Savant's popup calculation includes popups outside the infield. I would expect the rates to be different, and I would expect Savant's rate to be higher. It is.
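The IFFB% rescaling is just a product of two rates. Here is a small sketch with invented inputs; the function name and the numbers are assumptions for illustration, not values from any of the sites.

```python
def iffb_per_batted_ball(iffb_pct, fb_pct):
    """Convert IFFB% (infield flies per fly ball) to infield flies per batted ball.

    Both inputs are fractions, e.g. 0.10 for 10%.
    """
    return iffb_pct * fb_pct


# Hypothetical inputs: 10% of fly balls are infield flies,
# and 35% of batted balls are fly balls.
rate = iffb_per_batted_ball(0.10, 0.35)
print(f"{rate:.1%} of all batted balls")
```

The result shares a denominator with a popup rate measured per batted ball, but it still counts only infield flies, so it would not be expected to match a popup rate that includes balls beyond the infield.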
I mentioned comparing the fly ball production between FG and BR earlier. Keep in mind that the definitions for a fly ball appear to be different for each website, so I would expect the production to be different. I use BA*1.7+SLG to calculate an OPS-like measure that weights BA more than SLG, since getting on base is more important than slugging. BA and OBP for batted balls do not differ much. This is mostly to show the drop in production in 2013 for BR's fly ball data.
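For reference, the production measure is a one-line calculation. The sketch below uses invented BA and SLG values purely to show the arithmetic; the function name is my own label.

```python
def production(ba, slg, ba_weight=1.7):
    """OPS-like measure that weights batting average more heavily: BA*1.7 + SLG."""
    return ba_weight * ba + slg


# Hypothetical fly ball line, for illustration only.
fly_ball_production = production(ba=0.220, slg=0.580)
print(round(fly_ball_production, 3))
```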
Given the overall decline of offense, I would expect each line to trend downward. As more borderline fly balls are classified as line drives by BR, I would expect its production trend to fall at a steeper slope than FG's. 2013 is when the trend for BR really started to go downhill.
Overall, there is significant source disagreement; consequently, analysts must adjust. When comparing players, they must always be compared to the league average for whatever source the analyst is using. That way, the analyst is using a comparable, indexed measure.
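One way to build such an indexed measure is to divide a player's rate by the league rate from the same source and scale so that 100 means league average. That scaling is a convention I'm assuming here, and the rates below are invented for illustration.

```python
def index_to_league(player_rate, league_rate):
    """Express a player's rate relative to the league rate from the same source.

    100 means league average; 110 means 10% above league average.
    """
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return 100 * player_rate / league_rate


# Hypothetical line drive rates for the same player in two sources.
source_a = index_to_league(player_rate=0.23, league_rate=0.21)
source_b = index_to_league(player_rate=0.25, league_rate=0.25)

print(round(source_a), round(source_b))
```

In this made-up case the raw rate is higher in the second source, yet the player is only average there, which is exactly why raw rates from different sources should not be compared directly.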
The future does not lie here, however. StatCast, once it is technically correct, will allow analysts and scorers to use granular batted ball data to classify a batted ball into a category. There will be research done to define the appropriate angle/velocity combination for a line drive and a fly ball and a popup, and then there will be one single source of truth for all to use.
Or, at least, that's the fantasy. Hopefully fantasy becomes reality.
. . .
All statistics courtesy of FanGraphs and Baseball-Reference and Baseball Savant.
Kevin Ruprecht is the Managing Editor of Beyond the Box Score. He also writes at Royals Review. You can follow him on Twitter at @KevinRuprecht.
Queried each year from the batter's perspective. Included only the three "In play..." options.