When Can We Start To Believe?

The beginning of each season always sees an abundance of players whose performances are out of line with their career norms. Pundits, fans and everyone in between ascribe reasons: "breaking out" or "getting old" or "changed their approach" or any number of other explanations for why little more than 10% of a season's games can tell us what to expect for the remaining 140 or so. Often, this is just variation in small sample sizes, but when can we stop saying "small sample" and start saying "I believe"?

Austin Jackson swinging and successfully connecting with another pitch.
Leon Halip

You probably play fantasy baseball. This is based on the simple observation that this article is on a nerdy baseball site and you are reading it, making you a nerdy baseball fan and very probably a fantasy baseball wizard. One of the ways that many people try to win their fantasy leagues is by looking at non-traditional stats to determine if a player sitting there on waivers is, in fact, a better option than that last guy on your roster. Get enough marginal improvements like that and you win your league, you are filled with confidence, and you have a wonderful fall and winter sending emails berating your friends for not congratulating you enough on being such a brilliant and modest winner. This stuff is important, after all.

Accordingly, I am in a league (non-keeper) with a bunch of friends from college; it has been running seven years strong and I spend more time crafting marginally witty trash talk to friends than is probably reasonable and way more time analyzing stats than is healthy. Exiting the draft, I felt I was getting consistent, solid production at a premium, up-the-middle position when I took Austin Jackson in the 9th round (#81 overall).

As you may know, Austin Jackson is hitting fairly well to start the 2013 season, triple-slashing at .291/.351/.394. An article, written by Paul Swydan over at FanGraphs, entitled "Austin Jackson No Longer Cares for K’s," interested me because my fantasy baseball league awards negative points when batters strike out and Jackson’s strikeout propensity was a worry at draft time.

From Paul:

Austin Jackson has a fairly well-documented history of frequent strikeouts. But every day, Jackson seems to be creating a new history. That’s because he’s decided to stop striking out.

After signing out of high school, Jackson spent five seasons in the minors. He worked his way into 40 games with the Gulf Coast League Yankees as an 18-year-old, and struck out in a relatively unnoticeable 15.2% of his plate appearances. Since then, he has struck out in at least 19.3% of his plate appearances in each season. His aggregate K% in his 4,276 PA from 2006 to 2012 was 23%. At the major-league level, from 2010 to 2012, it was 24.7%, a number that only 17 qualified players could "beat." After striking out in 21.5% of his PAs from ’06 to ’09, Jackson reached the majors. His adjustment period was rough. His two worst years at the dish from a strikeout perspective were his first two in the majors, as he struck out 25.2% of the time in his rookie season, and 27.1% of the time in 2011.

Nobody seemed to mind in 2010, though, because Jackson’s .396 batting average on balls in play — which is still the seventh-highest BABIP for a qualified player since 1947 — papered over a lot of problems. But when his K rate escalated even further in 2011, that’s when Jackson took action. Last season, in addition to other improvements, Jackson cut his strikeout rate more than 5% — not an insignificant mark. This season, Jackson has lopped off another 14%, down to 7.5% as of this morning.

If Jackson is able to maintain this drop in his K rate, it would be a pretty rare accomplishment.

Let me repeat that last line (emphasis mine): "IF Jackson is able to MAINTAIN this drop in his K rate." You may think to yourself, as I did, "whoopty-doo, he cut his very high career K-rate over 60-some PAs," and write it off as another lesson about small sample sizes. I am always hoping that my players get off to hot starts so that I can flip them for someone I perceive to be more valuable or trustworthy. I just don’t know when I can believe in a player's performance and when I should scream "small sample!"

However, Paul goes on to soothe my concerns and back up his assertion when he states that K% stabilizes pretty quickly, at around 150 PAs. Jackson, despite striking out a whopping 10 times in the five games since Paul wrote the article, has struck out in only 16.0% of his 94 PAs, roughly a 6-percentage-point drop from 2012 to 2013. That would still be an immense improvement, and it would still meet Paul's 5% cutoff for historical whiff-reduction.
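For a rough sense of how far Jackson is from that 150-PA mark, here is a minimal back-of-the-envelope sketch. The 94 PAs, the 16.0% K-rate and the 150-PA threshold come from the text above; the plate-appearances-per-game figure is purely my assumption for a leadoff hitter, so treat the games estimate as a ballpark, not a forecast.

```python
import math

# Figures taken from the article.
THRESHOLD_PA = 150      # PAs at which K% is said to stabilize
CURRENT_PA = 94         # Jackson's PAs so far
CURRENT_K_RATE = 0.160  # Jackson's K% so far

# ASSUMPTION (not from the article): an everyday leadoff hitter
# comes to the plate about 4.3 times per game.
ASSUMED_PA_PER_GAME = 4.3


def strikeouts_so_far(pa: int, k_rate: float) -> int:
    """Recover the strikeout count implied by a rounded K%."""
    return round(pa * k_rate)


def pa_remaining(current_pa: int, threshold_pa: int) -> int:
    """PAs still needed to reach the threshold (never negative)."""
    return max(threshold_pa - current_pa, 0)


def games_remaining(pa_left: int, pa_per_game: float) -> int:
    """Whole games needed to accumulate pa_left plate appearances."""
    return math.ceil(pa_left / pa_per_game)


strikeouts = strikeouts_so_far(CURRENT_PA, CURRENT_K_RATE)
pa_left = pa_remaining(CURRENT_PA, THRESHOLD_PA)
games_left = games_remaining(pa_left, ASSUMED_PA_PER_GAME)

print(f"Implied strikeouts so far: {strikeouts}")
print(f"PAs until the K% threshold: {pa_left}")
print(f"Approximate games until then: {games_left}")
```

Change `ASSUMED_PA_PER_GAME` to suit a hitter lower in the order; the games estimate moves accordingly.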

No, it didn’t surprise me that Austin Jackson is improving. What surprised me is how quickly, according to Swydan, K-rates stabilize. I often don't take stats seriously until May, and for good reason: BABIPs fluctuate, a fly ball to left at Fenway means something very different from one at Petco, and so on. Typically, I don't start to pay attention to surprise performances until I feel comfortable that things have smoothed out. Then I see an article that says K-rates stabilize after about 150 PAs, and some statistics even sooner. This got me thinking about how the "stabilization" of certain statistics throughout the season could be used by fantasy managers (and perhaps real managers) to take advantage of perceived "hot" and "cold" starts to the season.

Linked within Paul's article is a piece that Russell Carleton wrote back in 2007 and that FanGraphs republished in 2011. Carleton ran a reliability test to determine when stats are still too noisy to trust and when they become meaningful enough to support inferences about a player’s performance going forward. For more information on this study, please visit the (long and math-filled, but wonderful) article here.

I have placed Carleton's stabilization points on a chart, in ascending order of how many PAs each statistic takes to stabilize, and included some milestones throughout the season to show when everyday players would reach the requisite PAs.



The statistics that tend to get the most attention (OBP, SLG and OPS) don’t become reliable until well after the trade deadline!

But that’s way off in May and we need to start making moves now. What we need is a way to take the stats that have already stabilized (Swing% and Contact%) and use them to predict what is likely to come to pass. So, while Swing% and Contact% don’t seem to tell us much on their own, we can use them to judge whether other changes (in stats that haven’t stabilized yet) are more or less likely to hold up.

I regressed Swing% and Contact% against each other, and both against K%, for 2012, for 2013 and for the change between 2012 and 2013. The only pairing with a correlation worth investigating was Contact% and K%. During the 2012 season, Contact% and K% had a -0.91 correlation, which is extremely strong, and so far in 2013 that same relationship has a correlation of -0.81. This means that as Contact% goes up, K% goes down, all else being equal. The final step was to see if improvements in Contact% translate to decreases in K%. Regressing the change in Contact% from 2012 to 2013 against the change in K%, we get a correlation coefficient of -0.64. That is weaker, but still fairly strong.
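For anyone who wants to run the same kind of check on their own league's data, here is a minimal sketch of the two correlations described above: one on the rates themselves, and one on the year-over-year changes. The five hitters' numbers are made up purely for illustration (they are not the 2012 or 2013 data), and the function names are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (spread_x * spread_y)


# HYPOTHETICAL data for five hitters, for illustration only.
contact_2012 = [0.72, 0.78, 0.81, 0.85, 0.90]
k_rate_2012 = [0.27, 0.22, 0.19, 0.15, 0.09]
contact_2013 = [0.80, 0.77, 0.84, 0.83, 0.91]
k_rate_2013 = [0.18, 0.24, 0.17, 0.16, 0.10]

# Correlation of the rates within a single season.
level_r = pearson(contact_2012, k_rate_2012)

# Correlation of the year-over-year changes.
delta_contact = [b - a for a, b in zip(contact_2012, contact_2013)]
delta_k = [b - a for a, b in zip(k_rate_2012, k_rate_2013)]
change_r = pearson(delta_contact, delta_k)

print(f"Contact% vs. K% (single season): {level_r:.2f}")
print(f"Change in Contact% vs. change in K%: {change_r:.2f}")
```

A negative value in both cases is what the argument needs: hitters who make more contact strike out less, and hitters who improve their contact tend to cut their strikeouts.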

How does this all come together? Jackson is making contact with 9.2% more of his swings in 2013. His Contact% should be pretty well stabilized at this point, which, taken in conjunction with the correlation between Contact% and K%, means that we can be fairly confident that his change in K% is legit, or at least that we have a pretty good hunch that it is.

To better illustrate how Contact% and K% move inversely to one another, here are the five biggest increases and the five biggest decreases in Contact% between 2012 and 2013. Alongside each is the player's change in K% and where that change ranks among the population:

(The population studied was trimmed to 185 hitters, because at least 150 ABs in 2012 and at least 50 ABs in 2013 were needed in order to have confidence in the numbers.)
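The trimming and ranking steps can be sketched as follows. The AB cutoffs are the ones stated above; the `Hitter` record, the helper names and the sample players are hypothetical, so read this as one way to set up the comparison, not as the exact process behind the chart.

```python
from dataclasses import dataclass

MIN_AB_2012 = 150  # cutoff stated in the article
MIN_AB_2013 = 50   # cutoff stated in the article


@dataclass
class Hitter:
    # Hypothetical record layout, for illustration only.
    name: str
    ab_2012: int
    ab_2013: int
    contact_2012: float
    contact_2013: float


def qualifies(h: Hitter) -> bool:
    """Keep only hitters with enough at-bats in both seasons."""
    return h.ab_2012 >= MIN_AB_2012 and h.ab_2013 >= MIN_AB_2013


def contact_change(h: Hitter) -> float:
    """Year-over-year change in Contact% (positive means more contact)."""
    return h.contact_2013 - h.contact_2012


def biggest_movers(hitters, n=5):
    """Return (n biggest increases, n biggest decreases) in Contact%."""
    pool = sorted(filter(qualifies, hitters), key=contact_change, reverse=True)
    risers = pool[:n]
    fallers = pool[-n:][::-1]  # largest decrease first
    return risers, fallers


# MADE-UP sample players, not real 2012 or 2013 data.
sample = [
    Hitter("Hitter A", 500, 90, 0.75, 0.84),
    Hitter("Hitter B", 450, 80, 0.82, 0.78),
    Hitter("Hitter C", 120, 95, 0.70, 0.90),  # too few ABs in 2012
    Hitter("Hitter D", 300, 40, 0.80, 0.60),  # too few ABs in 2013
    Hitter("Hitter E", 600, 85, 0.80, 0.81),
]

risers, fallers = biggest_movers(sample, n=1)
print("Biggest increase:", risers[0].name)
print("Biggest decrease:", fallers[0].name)
```

Note that if fewer than 2n hitters qualify, the two lists will overlap, which is not a concern with a population of 185.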



Clearly, a relationship exists. That batters who make more contact tend to strike out less is hardly revolutionary; what is interesting is that we believe we have reliable numbers in Contact% and that we can see that they predict other things fairly well.

This process opens up many doors; there are innumerable regressions, predictions and analyses that can be run off of rate stats in baseball. The key to running analyses and deriving something valuable from them is to understand when your data is reliable. The great work done by Carleton gets us a little way down this immense path and lets us start to judge whether the changes we see, and perhaps cast aside, may actually be taking place right in front of us.