Sabermetric Roots & Roll -- Secondary Average

The opening line of David Grabiner's article "The Sabermetric Manifesto" reads: "Bill James defined sabermetrics as 'the search for objective knowledge about baseball.'" Over the years sabermetrics has evolved quite a bit, but neither the definition nor the goal of the discipline--the search for objective baseball knowledge--has changed one bit.

Before we had spreadsheets, databases, and Equivalent Average, the search for objective baseball knowledge lacked the scientific rigor we've come to expect. Running hundreds of statistical hypothesis tests in a few hours was literally impossible, so the primary driving force behind the discipline was human intuition rather than data. While a data-driven approach has plenty of obvious advantages, the work done in the pre-Retrosheet years is neither frivolous nor useless and should not be ignored. At the very least, we can learn from our mistakes, and quite frequently the research still has utility when applied directly to a real-time sabermetric discussion.

Rewind to 1975. Batters are objectively judged by their batting average and RBIs, pitchers by their victories and saves, and fielders by their errors and assists. That was basically everyone's basis for objective knowledge about baseball in 1975. As we now know, these statistics are hardly objective. They're objective in the sense that they represent pieces of real information--data--but not in the sense that matters: they do a very poor job of depicting a player's value, true talent level, production, or whatever you want to call it. If you were starting from scratch in 1975 (before Pete Palmer introduced linear weights), beginning with batting average, how would you address the obvious deficiencies of the metrics then in wide use?

Attempting to do exactly that, Bill James created a nifty formula called secondary average, which addresses the part of offense largely ignored by batting average--secondary production. Secondary production is simply offensive production apart from hitting for average, including walks, extra-base hits, and stolen bases. The formula for secondary average is:
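
$$\mathrm{SecA} = \frac{TB - H + BB + SB - CS}{AB}$$

where TB is total bases, H is hits, BB is walks, SB is stolen bases, CS is times caught stealing, and AB is at-bats. TB - H counts only the extra bases beyond the first on each hit.
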
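For anyone who wants to compute it, here is a minimal Python sketch of the standard form of the formula, SecA = (TB - H + BB + SB - CS) / AB. The function and argument names are my own choices, and the stat line in the example is made up for illustration, not a real player's season.

```python
def secondary_average(ab: int, h: int, tb: int, bb: int, sb: int, cs: int) -> float:
    """Bill James's secondary average: (TB - H + BB + SB - CS) / AB."""
    if ab <= 0:
        raise ValueError("secondary average is undefined without at-bats")
    extra_bases = tb - h   # bases gained beyond the first on each hit
    net_steals = sb - cs   # stolen bases minus times caught stealing
    return (extra_bases + bb + net_steals) / ab


# Hypothetical season: 500 AB, 140 H, 230 TB, 60 BB, 10 SB, 4 CS
seca = secondary_average(ab=500, h=140, tb=230, bb=60, sb=10, cs=4)
print(f"{seca:.3f}")  # 0.312
```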
Secondary average has a much larger standard deviation than batting average across the league, but it isn't subject to as many fluctuations of chance. Players frequently post secondary averages below .200 or above .400 for an entire season, something practically unheard of for batting average. Because chance plays a smaller role, a player's secondary average becomes statistically meaningful at a smaller sample size than his batting average does. It's naturally on the same scale as batting average, and the league average is usually around .275.

The first obvious benefit of the metric is that it addresses everything batting average doesn't and nothing batting average does. Essentially, offense can be thought of as primary production plus secondary production. Batting average nearly flawlessly captures the primary part, and secondary average does a pretty good job of capturing the secondary part. Using the two metrics in conjunction, either as a linear combination (something I've used pretty frequently as of late and have dubbed "APS"*, for Average plus Secondary Average) or as two separate metrics, gives us a pretty good idea of a player's overall offensive value. It's certainly a much better idea than the metrics in use at the time provided.
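A minimal Python sketch of APS, assuming it is simply the unweighted sum AVG + SecA that the name (Average plus Secondary Average) suggests; no other weighting is specified here, so treat any alternative coefficients as your own choice. The function names and the sample stat line are hypothetical.

```python
def batting_average(h: int, ab: int) -> float:
    """Primary production: hits per at-bat."""
    return h / ab


def secondary_average(h: int, ab: int, tb: int, bb: int, sb: int, cs: int) -> float:
    """Secondary production: (TB - H + BB + SB - CS) per at-bat."""
    return (tb - h + bb + sb - cs) / ab


def aps(h: int, ab: int, tb: int, bb: int, sb: int, cs: int) -> float:
    """APS, assumed here to be the unweighted sum AVG + SecA."""
    return batting_average(h, ab) + secondary_average(h, ab, tb, bb, sb, cs)


# Hypothetical season: 500 AB, 140 H, 230 TB, 60 BB, 10 SB, 4 CS
line = dict(h=140, ab=500, tb=230, bb=60, sb=10, cs=4)
print(f"AVG {batting_average(140, 500):.3f}  APS {aps(**line):.3f}")  # AVG 0.280  APS 0.592
```

Because the two components are kept separate until the final addition, the same functions also support the other option mentioned above: reporting AVG and SecA side by side instead of combining them.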

*It's my contention that APS, rather than OPS, should have been the metric that got popularized. Now that we have things like wOBA and EqA, there's not much point in crusading for widespread use of APS, but I still prefer it to OPS when I want a quick-and-dirty look at a player's overall offensive value.

The second benefit of the metric is how simple, clean, and intuitive it is. Secondary production per at-bat is about as close to a perfect complement to batting average as you can get. It's something the average baseball fan can take a gander at and say, "Wow, that makes sense." What the metric lacks in precision it more than makes up for in accessibility. It's simple, user-friendly metrics like this that got me interested in sabermetrics to begin with, not the 200-character regression equations or lengthy wins-above-replacement calculations I've since learned to love.

Obviously, the metric is far from perfect and presents a few tangible problems. First, it isn't based on linear weights, so it's close to useless if you're trying to calculate run values. Second, I'd prefer the denominator to be PA rather than AB, but I'm also the guy who thinks batting average should be calculated as H/PA. Third, infield hits are a gray area: should they be considered primary or secondary production? I don't have a good answer, but if the latter, they should be accounted for in secondary average. I also prefer to use a modified version that includes HBP.
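The exact modified formula isn't given here, so the following is only a hypothetical sketch of one way to fold in both preferences: HBP added to the numerator alongside walks, and PA used in place of AB as the denominator, with the same PA denominator for an H/PA batting average. The function names, the formula variant, and the sample line are all illustrative assumptions.

```python
def modified_secondary_average(h: int, tb: int, bb: int, hbp: int,
                               sb: int, cs: int, pa: int) -> float:
    """Hypothetical variant: (TB - H + BB + HBP + SB - CS) / PA."""
    if pa <= 0:
        raise ValueError("undefined without plate appearances")
    return (tb - h + bb + hbp + sb - cs) / pa


def batting_average_per_pa(h: int, pa: int) -> float:
    """Hits per plate appearance rather than per at-bat."""
    if pa <= 0:
        raise ValueError("undefined without plate appearances")
    return h / pa


# Hypothetical season: 570 PA (500 AB), 140 H, 230 TB, 60 BB, 5 HBP, 10 SB, 4 CS
print(f"{modified_secondary_average(h=140, tb=230, bb=60, hbp=5, sb=10, cs=4, pa=570):.3f}")  # 0.282
print(f"{batting_average_per_pa(140, 570):.3f}")  # 0.246
```

Since PA is never smaller than AB, per-PA versions will generally come out lower than their per-AB counterparts, so a benchmark like the roughly .275 league secondary average would have to be recalculated under this variant.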

Despite the metric's real-time utility, the greatest takeaway from secondary average, I think, is the process, not the product. Thinking about baseball the way James did--using human intuition--should be embraced, even today. As James himself once said: "We haven't figured out anything yet. A hundred years from now, we won't have begun to have the game figured out." If something seems counterintuitive, there's a good chance it's not true. The human mind is more powerful than we sometimes give it credit for.

Again, by no means am I suggesting that the data-driven approach is flawed and should be replaced with human intuition. That's sort of how the batting average dilemma started in the first place. But the two can and should be used in conjunction.

A little bit of historical perspective, thinking about offense in terms of primary and secondary production, and using our brains first and databases second are all things I think we (saberists), as a whole, could stand a bit more of.

Image source: Wikipedia.