Since I first started watching baseball and looking at baseball statistics, I have always wondered why we have two separate stats for "at-bats" and "plate appearances."
Not only did it bother me that these two statistics existed, but also that each is used as the denominator of a commonly used rate stat. The most obvious examples are batting average (AVG) and on-base percentage (OBP). Batting average, as I'm sure most readers know, is calculated as hits divided by at-bats, which means it does not account for every time a player came to the plate that season.
This ties in with my fascination with the triple-slash line, because the usage of ABs vs PAs greatly affects these commonly-used numbers.
I decided to set out and try to create unity, as well as paint a better picture of what a player did at the plate with one quick group of stats.
The first triple-slash stat is batting average, which as I discussed above is hits divided by at-bats.
My first order of business was to toss ABs out of the picture, as using them simply creates confusion and causes a lack of unity between the three statistics. The stat I propose as a replacement to batting average is H/PA.
I began by taking a sample of all qualified batters from 2002-2012, a total of 1,684 player-seasons, and calculated H/PA for each. The mean H/PA across the data set was .247; for comparison, the mean batting average for the same data set was .278.
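To make the difference between the two denominators concrete, here is a minimal sketch of how AVG and H/PA diverge for a single hypothetical batter line (all numbers below are made up for illustration, not drawn from the article's data set):

```python
# Hypothetical season line for one batter (illustrative numbers only).
hits = 180
at_bats = 600
walks = 70
hbp = 5
sac_flies = 5

# Plate appearances also include sacrifice bunts, catcher's interference,
# etc.; this simplified PA count only includes the components above.
plate_appearances = at_bats + walks + hbp + sac_flies  # 680

avg = hits / at_bats               # traditional batting average
h_per_pa = hits / plate_appearances  # proposed replacement: H/PA

print(f"{avg:.3f}")       # 0.300
print(f"{h_per_pa:.3f}")  # 0.265
```

The same 180 hits grade out noticeably lower once every trip to the plate is in the denominator, which is exactly the gap between the .278 and .247 sample means above.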
Below is the top 10 in H/PA from this set, which holds few surprises: most of these players also posted excellent batting averages in the seasons that landed them on the list.
At this point I have accomplished two things: 1) I have actually sat down and computed the stat I've been conceptualizing since the first time I read a baseball card and 2) I've made sure I'm not getting absurd figures from it that do not seem right.
This does not say much about whether it should be used in lieu of batting average, however, so I ran some numbers to find out.
For the same sample mentioned above, I tested both batting average and H/PA for their year-to-year predictive ability. The R^2 for AVG from year N to year N+1 was .19, while the R^2 for H/PA from year N to year N+1 was .24.
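A year-to-year R^2 check like the one above can be sketched with a simple Pearson correlation squared. The figures below are hypothetical stand-ins, not the 1,684 player-seasons from the actual study:

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / (var_x * var_y) ** 0.5
    return r * r

# Hypothetical H/PA marks for five batters in year N and year N+1.
year_n  = [0.251, 0.239, 0.262, 0.244, 0.270]
year_n1 = [0.248, 0.243, 0.255, 0.240, 0.266]

print(round(r_squared(year_n, year_n1), 2))
```

The real exercise is the same shape: pair each batter's year-N rate with his year-N+1 rate and regress one on the other.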
This struck me as odd because I had assumed AVG would be more predictable, given the smaller number of variables in play, but a few theories came to mind:
- There are more plate appearances in a season than at-bats, so the sample is larger and true talent is more likely to show through each season.
- Players have a degree of control over outcomes like walks and HBPs, so including them should not destabilize H/PA too greatly.
- BABIP is a larger component of AVG because AVG strips out walks and other outcomes that negate an AB designation, so fluctuations in BABIP will have a greater effect on AVG than on H/PA.
Then it hit me that the last bullet was testable, so I ran linear regressions of BABIP against each of AVG and H/PA. The R^2 between BABIP and AVG was .57, while the R^2 between BABIP and H/PA was .47. This helps explain the gap in the year-N to year-N+1 relationship: the randomness and variance of BABIP have a larger effect on AVG than on H/PA.
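For readers who want to see why AVG and BABIP share so many ingredients, here is the standard BABIP formula as a small sketch (the batter line below is hypothetical, not from the study's data):

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF).

    Strikeouts and home runs are removed because neither puts a ball
    in play in front of the defense; sacrifice flies are added back
    because they are balls in play that do not count as at-bats.
    """
    return (h - hr) / (ab - so - hr + sf)

# A hypothetical hitter: 160 H, 20 HR, 550 AB, 100 SO, 5 SF.
print(f"{babip(160, 20, 550, 100, 5):.3f}")  # 0.322
```

Since BABIP's denominator is built almost entirely out of at-bats, it moves nearly in lockstep with AVG; H/PA dilutes that relationship by adding walks and other PA-only outcomes to its denominator.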
This stat by itself is not groundbreaking, but it resolves a pet peeve I've had since a young age. It will also form part of my reformed triple-slash line, which will give you a much better snapshot of a hitter's performance at the plate.
The H/PA metric shows you what percentage of a hitter's overall trips to the plate ended in hits, which is much more valuable to look at than the number of hits out of an arbitrary subset of those trips. Where it fails is in evaluating the quality of the hits, or the ways a hitter can contribute other than getting hits. That will all come later in the series.
What do you guys think? Any ideas?