Batted balls and lady luck

First of all, I'd like to thank Marc for inviting me to become a regular contributor at BtB. I have actually, and almost unbelievably, followed Marc's work since his days at Baseball Rants - don't worry, he couldn't believe it either! I'm hoping to post an article at least every two weeks or so. In fact, all my sabermetric articles will appear here in future. Enough of the pleasantries - let's kick off the inaugural post.

Sabermetricians love data. The beauty of baseball is that there is plenty (dare I say, too much?) of it to get our stat-obsessed claws into. Over the last few years a new type of data has come to the fore: batted balls. Simply put, during every play of every game, some poor soul decides whether a ball in play is a groundball, pop-up, line drive or flyball. Fortunately for you and me, this glut of information allows us to enhance our understanding of pitcher-batter confrontations.

Questions such as "do flyball pitchers have a higher K/9 rate than groundball pitchers?" or "are pitchers who induce pop-ups more susceptible to the longball?" are no longer the subject of idle speculation, but rather fall into the realm of quantitative analysis. A pet project of mine over the last few months has been to use batted ball data to better understand the ability of players - both batters and pitchers - to control balls in play. To avoid droning on for too long, and to give you a chance to read some of the other excellent posts on this site, I'll focus on pitching for the rest of this article.

To start with, here is an easy question: do groundball pitchers exist? In other words, if I take a bunch of pitchers with strong groundball tendencies in year 1, will they continue to make plenty of groundouts in year 2? Absolutely. Look at the chart below which plots 2004 groundball in play (GBIP) rate against 2005 GBIP rate for all pitchers with more than 40 BIP in both years.



The r^2 is a shade over 0.5, meaning that roughly half of the variance in 2005 GBIP is explained by 2004 performance. You don't need to be Copernicus to see that getting hitters to ground out is a skill.
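For anyone who wants to reproduce this kind of year-to-year check, here is a minimal sketch of the r^2 calculation using only the Python standard library. The function name `r_squared` and the five GBIP pairs are made up for illustration; they are not the data behind the chart.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy * sxy / (sxx * syy)


# Hypothetical GBIP rates for five pitchers in consecutive seasons
gbip_2004 = [0.38, 0.42, 0.45, 0.50, 0.55]
gbip_2005 = [0.40, 0.41, 0.47, 0.48, 0.53]
print(round(r_squared(gbip_2004, gbip_2005), 3))
```

Each pitcher contributes one (year 1, year 2) pair, so the 40 BIP cut-off decides who gets into the two lists in the first place.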

Now let's switch our attention to line drives in play (LDIP). Again, as for groundballs, you might expect some pitchers to have an ability to prevent LDIP. Put me and Johan Santana on the mound for a couple of innings and you won't need a calculator to work out who'll have a higher LDIP. Enough speculation, let's take a look at the data:



Whoa - what's going on? There is no correlation whatsoever (r^2: 0.01). Does this mean that giving up line drives is just luck? Hmm, this is contrary to what we first thought, so it's worth probing a bit deeper.

The problem could well be with the correlation technique. Year-to-year correlations can be unreliable if:

  1. the sample size is too low
  2. the data contains a number of sub-groups with different means, weakening the correlation
  3. variation in skill between different players is small and undetectable
  4. some combination of the above

Let's take a closer look. The sample size is 369 - definitely on the small side, but presidential elections have been decided on less. There is a fair chance that the random noise in two years of data masks any skill. I'd definitely want to see three to four years of data before I was comfortable with the LDIP conclusion. Now, we could use data from BIS (Baseball Information Solutions), which has tracked LDIP for the past four years, but there are problems with the batted ball codings which, I am led to believe, make the data unreliable.

OK, time for a change of tack. Can we cut the data another way to discern the skill impact of a line drive? One option is to divide the 2004 data into, say, 6 buckets of increasing LDIP percentage, and do the same for 2005. This also helps to analyze the impact of sub-groups (note: although we haven't explicitly defined the sub-groups, the bucket approach should broadly capture them if they exist). If there were no skill involved, we would expect the pitchers in a particular 2004 LDIP bucket to be evenly distributed across all the buckets in 2005. With a sample size of 369 and 36 possible combinations of 2004 and 2005 bucket, the expected frequency for each cell is a fraction over 10. This is what we actually get:



This looks a little better. Again we are at the mercy of a small sample size, but we can see that among the elite group of LDIP pitchers (upper left part of the table, shaded blue) the frequencies are actually way greater than 10 - we could finally be detecting that elusive skill! Looking at the worst LDIP pitchers (bottom right corner, shaded gray), again the frequencies point to some sort of repeatable skill. Let's try to quantify this statistically using a chi-squared test. Plug the numbers in and we get a p-value of 0.035. Bingo! Despite the small sample size, it appears that pitchers do have some control over their LDIP rate (significant at the 95% confidence level).
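The chi-squared arithmetic can be sketched in a few lines. This is one reading of the test, not necessarily the exact calculation behind the 0.035: it assumes a test of independence on the 6 x 6 table, with expected counts built from the row and column totals. The function name and the toy table are invented for illustration, and the sketch stops at the test statistic and degrees of freedom rather than a p-value, since the standard library has no chi-squared distribution.

```python
def chi_squared_independence(table):
    """Chi-squared statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (rows: year-1 bucket,
    columns: year-2 bucket). Expected counts assume the two are independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("empty row or column in the table")
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# With 369 pitchers and 36 cells, the "no skill" expectation per cell:
print(369 / 36)  # 10.25

# Made-up 6 x 6 table with a slightly heavy diagonal (NOT the real counts)
toy = [[14 if i == j else 9 for j in range(6)] for i in range(6)]
stat, dof = chi_squared_independence(toy)

# 37.65 is the tabulated 5% critical value for 25 degrees of freedom
print(round(stat, 2), dof, stat > 37.65)
```

A 6 x 6 table has 25 degrees of freedom under this reading, so a statistic above roughly 37.65 would be significant at the 5% level.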

OK, I know what you are thinking. Sure, there might be a small difference in skill, but so what? It is practically unmeasurable, so it is unimportant. Not so fast ... even a tiny variation in skill can result in a big impact if the value of the event is high enough. And line drives are valuable: an astonishing 75% fall for hits and most of those go for extra bases. Because the skill element is so small we need to use a different technique to tease it out.

We know that hitting line drives is a binomial event: every ball in play results in either a 1 (LDIP) or a 0 (no LDIP). If we add up these 1s and 0s and divide by BIP we get LDIP%. So if LDIP were 100% luck we'd expect the variance in the data to be exactly the same as that predicted by the binomial distribution. But we showed above that LDIP has a skill element, so actually we expect the variance to be greater than that predicted by the binomial distribution. Got it? No? Right; an example might help.

Consider two coin flippers, both of whom have the same skill level (i.e., both flip heads 50% of the time). If you work out the actual variance in the data and compare it to that predicted by the binomial distribution, the two should be identical. However, suppose our two coin flippers have different skill levels! If one flips heads 75% of the time and the other 25% of the time then the combined mean remains the same. But, and here it gets interesting, the variance is much larger than when both players had the same probability of flipping heads. Why? Because of the difference between the players' skill.
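The coin-flipper example can be worked through exactly, without simulation. The sketch below compares the variance a pure-luck binomial model predicts with the variance you would actually expect in the flippers' heads rates. The function name and the choice of 100 flips per flipper are assumptions made for illustration.

```python
def rate_variances(probs, flips):
    """Return (binomial_prediction, actual_variance) for the heads rate.

    probs: each flipper's true probability of heads
    flips: number of flips per flipper (assumed equal for everyone)
    """
    k = len(probs)
    mean_p = sum(probs) / k
    # What a pure-luck model predicts if everyone shared the mean skill
    binomial_prediction = mean_p * (1 - mean_p) / flips
    # Luck: average binomial noise around each flipper's own true rate
    luck_var = sum(p * (1 - p) for p in probs) / (k * flips)
    # Skill: spread of the true rates themselves
    skill_var = sum((p - mean_p) ** 2 for p in probs) / k
    return binomial_prediction, luck_var + skill_var


# Same skill: actual variance matches the binomial prediction
print(rate_variances([0.5, 0.5], flips=100))

# Different skill (75% and 25%): same mean, far more variance
print(rate_variances([0.75, 0.25], flips=100))
```

In the second case nearly all of the variance comes from the `skill_var` term, which is the gap between observed and binomial variance that the LDIP analysis goes looking for.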

Right, back to baseball and LDIP. From the binomial distribution (everyone dust down their stat texts) we expect the standard deviation of LDIP to be .022 (mean: 0.193). What does the data show? .033. High fives all round, a difference of .025 (std. dev., found by subtracting the variances rather than the standard deviations), which is directly attributable to player skill. Hang on, don't get too carried away just yet. All we are saying is that given an average LDIP of 19.3%, approximately 67% of pitchers will have an LDIP skill between 18.2% and 20.4%. An average pitcher may have 250 BIPs a season, so we are talking a difference of 4 line drives! The skill exists, but you need to be superhuman to detect it by eye.
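The step from .022 and .033 to .025 is easy to trip over, because standard deviations don't subtract directly; variances do. A minimal sketch of that arithmetic, with a hypothetical helper name, using the two figures quoted above:

```python
from math import sqrt


def skill_sd(observed_sd, binomial_sd):
    """Spread left over once binomial luck is removed.

    Variances add (observed = luck + skill), so the skill standard
    deviation is the square root of the difference of the squares.
    Returns 0.0 when the data are no noisier than luck alone predicts.
    """
    skill_var = observed_sd ** 2 - binomial_sd ** 2
    return sqrt(skill_var) if skill_var > 0 else 0.0


# Figures quoted in the text: observed .033, binomial expectation .022
print(round(skill_sd(0.033, 0.022), 3))  # 0.025
```

Subtracting the standard deviations directly would give .011 instead, so it matters which of the two operations is used.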

OK, let's try and put a sense of perspective on this. If we want to predict a pitcher's LDIP for 2006 based on his 2005 data, by how much do we need to regress to the mean? A rough formula is to divide the expected population variance by the skill variance, weighted by 1/BIP. This gives us 800. What we are saying is that a pitcher has to allow 800 BIP before we regress his LDIP rate only halfway to the mean. Despite the noise of the y-t-y correlation, it would appear that pitchers like Santana, Cordero and Gordon, who give up relatively few line drives, know what they are doing after all ... sort of!

[Update] I have updated the numbers in the last paragraph based on input from Andy Dolphin on my calculation of regression to the mean. This number is incredibly sensitive to BIP. The numbers above assume a weighted 1/BIP of 1/177 for 2004 and 2005 data combined.

Caveat: if you just use a 2005 sample then 1/BIP is 1/140, and it bang on meets the binomial distribution.
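Here is one way to read the regression calculation, as a rough sketch rather than the exact method used above. It assumes the luck variance is p(1-p) times the weighted 1/BIP, the skill variance is whatever is left of the observed variance, and the halfway point is p(1-p) divided by the skill variance. The function names are hypothetical, and the inputs are the rounded figures quoted in the post (mean 0.193, observed standard deviation .033) with the weighted 1/BIP of 1/177 from the update, rather than the .022 quoted earlier.

```python
def bip_to_regress_half(mean_rate, observed_sd, weighted_bip):
    """BIP at which a pitcher's rate is regressed halfway to the mean.

    Returns None when the observed spread is no larger than binomial
    luck alone would produce (no detectable skill).
    """
    per_bip_var = mean_rate * (1 - mean_rate)
    luck_var = per_bip_var / weighted_bip
    skill_var = observed_sd ** 2 - luck_var
    if skill_var <= 0:
        return None
    return per_bip_var / skill_var


def regress(rate, bip, mean_rate, half_point):
    """Shrink an observed rate toward the mean; weight is bip / (bip + half_point)."""
    weight = bip / (bip + half_point)
    return mean_rate + weight * (rate - mean_rate)


# Combined 2004-2005 weighting from the update
half = bip_to_regress_half(0.193, 0.033, 177)
print(round(half))

# 2005-only weighting from the caveat
print(bip_to_regress_half(0.193, 0.033, 140))

# A hypothetical pitcher: 15% LDIP over 250 BIP, using the post's 800
print(round(regress(0.15, 250, 0.193, 800), 3))
```

With these rounded inputs the first figure comes out in the mid-700s, the same neighbourhood as the 800 quoted, and moving from 1/177 to 1/140 is enough to wipe out the skill variance altogether. That is how sensitive the number is to BIP: at 1/140 the binomial standard deviation is already about .033.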

The conclusion: there is probably a very small element of skill in line drives, but it is just that, very small.

Thank you, Andy, for your input.