Bucketing Outcomes in Baseball (and Beyond)

For me, the most useful thing about sabermetrics and the analysis of baseball (as well as other sports) is that it highlights in a very practical way the difficulty and importance of categorizing outcomes.

By this I mean determining whether an outcome is mostly due to the true talents and agency of an individual (think pitchers and strikeouts), to the environment an individual or team acts within (think HR/FB rates in PETCO), or to randomness (think Anthony Young's consecutive loss streak).

We can gain some perspective by "bucketing" outcomes in terms of the weighted distribution of their causes:



In the chart above we see four hypothetical outcomes, each caused by a different weighting of variables. In the first outcome (A), individual and environmental factors play roughly equal roles in determining what happens. In baseball terms, we might think about the earned run average of a ground ball pitcher who plays with a top-tier infield behind him.

Outcome B is representative of a scenario where the environment is largely responsible for what happens. It's not that individual abilities and randomness don't matter, simply that the environment is playing a larger role. Here we might draw parallels to periods in baseball where the rules of the game were different and benefited some players over others. Think about the increase in offensive production that followed the banning of spitballs, etc., in 1920, or the lowering of the pitcher's mound in 1969. If we are comparing the offensive outputs of entire leagues across time, the difference is likely to be explained largely by these environmental factors rather than by the inherent talents of the players or by randomness alone.

With Outcome C, we see lady luck playing her greatest role. In such a situation we find streaks such as Anthony Young's consecutive loss streak. Anthony Young was not a bad pitcher by any stretch. In fact, his performance over his winless streak was actually quite good compared to the rest of the hurlers in the league. Unfortunately, there are simply times when luck will rear its ugly head and make it appear that someone is less competent than they truly are.

Finally, Outcome D represents what people, generally speaking, tend to implicitly or explicitly believe about how the world works. Rather than recognize these situations for what they are--one of a number of types of outcomes--we tend to assume they are the vast majority. And that's where we get into trouble.

People are seemingly programmed to assume that most outcomes fall into the Outcome D category: people act, and their own abilities--physical, mental--determine the outcome. Ask most successful people whether their achievements are due in any significant way to luck or to their current environment and most will scoff at the idea. Psychologically, humans are more comfortable with the notion that they control events around them, especially when it comes to their own success.

But of course, we know that outcomes are usually the result of some weighted blend of agency, environment, and lady luck. Even the best hitters in baseball can find themselves in horrible slumps where they can't buy a hit. Similarly, we see instances where mediocre-at-best players find themselves with impressive statistics in a short playoff series. 

In the first instance, the hitter did not forget to hit, and in the latter, the player did not summon some unique talent for hitting "when it counts". All too often, people will tend to assume that the individual had more to do with both outcomes than they actually did.
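To make the slump point concrete, here is a minimal simulation sketch. It assumes a hypothetical hitter whose true talent never changes (a .300 chance of a hit in every at-bat, 550 at-bats a season) and treats each at-bat as an independent coin flip, which is a simplification. The function names and parameter values are illustrative, not drawn from any real player.

```python
import random


def longest_hitless_streak(true_avg, at_bats, rng):
    """Longest run of consecutive hitless at-bats for a hitter with fixed talent."""
    longest = current = 0
    for _ in range(at_bats):
        if rng.random() < true_avg:
            current = 0  # a hit ends the current drought
        else:
            current += 1
            longest = max(longest, current)
    return longest


def simulate_seasons(true_avg=0.300, at_bats=550, seasons=1000, seed=42):
    """Simulate many seasons for the same hypothetical hitter (assumed values)."""
    rng = random.Random(seed)
    return [longest_hitless_streak(true_avg, at_bats, rng) for _ in range(seasons)]


if __name__ == "__main__":
    streaks = simulate_seasons()
    median = sorted(streaks)[len(streaks) // 2]
    share = sum(s >= 15 for s in streaks) / len(streaks)
    print("median longest hitless streak:", median)
    print("share of seasons with an 0-for-15 or worse:", share)
```

Nothing about the simulated hitter changes from one at-bat to the next, so any long drought that shows up is produced by chance alone.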

Most saber-enthusiasts understand this idea. But most people don't. And it isn't just in baseball. Look at any industry or domain, whether it is public or private, and you will find decision makers struggling with this concept. 

Our world has no shortage of metrics. Of course, the trick is determining A) whether or not a metric actually measures what it is supposed to, and B) whether or not one should rely on that metric when making a decision.

What I love about sabermetrics, and advanced analytics in other sports as well, is that people are attempting to uncover and categorize various outcomes into different buckets so that when we evaluate individuals and teams we can attribute the proper credit or blame to them. FIP, Park Factors, BABIP, all of this stuff is really about identifying the role that agency, environment, and luck play in different outcomes and evaluating them in light of that fact.
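For readers who have not seen them written out, here is a rough sketch of two of those metrics using their commonly published formulas. FIP keeps only the events a pitcher most directly controls (home runs, walks, hit batters, strikeouts), while BABIP isolates what happens on balls put in play, where defense and luck weigh more heavily. The FIP constant is recalculated for each league and season; the 3.10 default below is only an assumed placeholder, and the function names are illustrative.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    ip is innings pitched as a real number (200 1/3 innings is 200.333...,
    not the box-score shorthand 200.1). The constant is league- and
    season-specific; 3.10 is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h, hr, ab, k, sf):
    """Batting Average on Balls In Play: non-homer hits per ball put in play."""
    return (h - hr) / (ab - k - hr + sf)


if __name__ == "__main__":
    # Hypothetical stat lines, invented purely for illustration.
    print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))
    print(round(babip(h=150, hr=20, ab=500, k=100, sf=5), 3))
```

Neither formula tries to describe everything that happened. Each one narrows the view to a single bucket so that credit or blame lands in the right place.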

It isn't just about using sophisticated statistical tools. Rather, it's a philosophical mindset that is critical. If leaders and decision makers in any domain don't understand the interplay of these three factors, and are unconcerned with identifying when each one, or some combination of them, is driving an outcome, then all the data in the world will not make for better decisions.

When it comes to most domains outside of baseball, we have a long way to go in terms of starting to bucket outcomes. There is much to be learned from our "hobby".