Small Samples and Reliability

It may be easy to chalk any hot or cold streak up to small sample size, but at what point is a sample actually large enough to use?


Today is May 9, 2013, a date that will more than likely hold no real significance in baseball history 50 years from now. But in a world where information is constant and news on each team is up to the second, each and every day is an event in its own right. In fact, earlier today our friends over at Viva El Birdos declared May 9 to be Adequate Sample Size Day:

According to the gospel of sample sizes, around about now, we can start say something about how hitters are hitting (or rather, swinging) after 50-100 PAs without chasing ephemera. Specifically, we can talk about swing rate and contact rate, and we can almost talk about strikeout rates and pitches per plate appearance.

The article focuses on the Cardinals' performance to date, but it brings up a really interesting and much-debated question: when can we say that a player's performance has passed the all-important "small sample" threshold?

Luckily, just this morning Baseball Prospectus writer Russell Carleton also wrote about the subject, updating his work on the stability of pitching statistics. Last year, he did the same for hitters.

His findings show that some statistics stabilize rather quickly, while others take a much larger sample to become reliable. What interests me is that two of the three "three true outcome" statistics, walk rate and strikeout rate, stabilize very quickly for all players. After just 120 plate appearances for hitters and 170 batters faced for pitchers, we can feel rather good about those rates. That leads me to believe that numbers like FIP and xFIP will begin to give a true, quality representation of pitcher performance at those marks as well.
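To get a feel for why a rate settles down as plate appearances pile up, here is a rough sketch in Python. It is not the calculation behind Carleton's numbers, which this post does not describe. It simply assumes every plate appearance is an independent trial with a fixed strikeout probability and computes the binomial standard error of the observed rate. The 20 percent strikeout rate, the sample sizes, and the function name `rate_standard_error` are all made up for illustration.

```python
import math


def rate_standard_error(true_rate: float, n: int) -> float:
    """Binomial standard error of an observed rate over n independent trials.

    Assumption (not from the article): each plate appearance is an
    independent trial with the same underlying probability.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of plate appearances")
    if not 0.0 <= true_rate <= 1.0:
        raise ValueError("true_rate must be between 0 and 1")
    return math.sqrt(true_rate * (1.0 - true_rate) / n)


# Hypothetical hitter with a 20% underlying strikeout rate.
HYPOTHETICAL_K_RATE = 0.20

for plate_appearances in (20, 60, 120, 600):
    se = rate_standard_error(HYPOTHETICAL_K_RATE, plate_appearances)
    print(
        f"{plate_appearances:>4} PA: observed rate typically within "
        f"+/- {se * 100:.1f} percentage points of {HYPOTHETICAL_K_RATE:.0%}"
    )
```

Under those assumptions the noise shrinks with the square root of the sample, so it takes four times as many plate appearances to cut it in half. That is only the luck side of the story; it says nothing about how much true talent varies from player to player, which is the other half of any reliability estimate.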

In Tom's article on the Cardinals, he treats these marks as reliable indicators of future performance, but Carleton, to my approval, warns against doing just that:

I want to (again) point out that the way in which I most often see these numbers used is not exactly what they're meant to show.

When I say that strikeout rate for pitchers stabilizes at 70 batters faced, what I mean is that we can be reasonably sure that his strikeout rate over those 70 batters is a good reflection of his talent level over those 70 (now past) plate appearances. This is different from saying that once a pitcher has gotten to 70 batters, we can assume that he will perform this way for the rest of the season. That's an assumption. It's not a bad one, but it is an assumption. Instead, what it means is that if his underlying skill set has changed in some meaningful way, we'll know in 70 plate appearances.

Unsurprisingly, that is a very well-put final thought. While we continue to look for ways to predict future performances, it is important to understand that players make adjustments and changes all the time. The scout in me has actually always disagreed with the constant disdain for small samples, because if a jump or fall in numbers is tied to something physical, it may very well be relevant and possibly predictive.

As a community, I think there is room for much more analysis of the usefulness of small samples, like Charlie Adams' article last month on using the stats that stabilize quickly to predict other metrics, and the fact that so much work is being done is encouraging. But for now, even as a saber-slanted website, we should always remember that in small samples a good scout is ALWAYS better than stats.

Andrew Ball is a writer for Beyond the Box Score, Fake Teams, and Fantasy Ninjas.

You can follow him on Twitter @Andrew_Ball.