Daily Box Score 9/2: Ockham's Razor

William of Ockham was a Franciscan friar, theologian, and logician from the 1300s. He is most famous for his eponymous razor. I would say it is one of the least well understood logical concepts, but for the fact that nearly all logical concepts are misunderstood.

Suffice it to say, it's up there. So what does it really say, and how does it apply to sabermetrics?

The Axiom

Ockham's Razor is usually misunderstood to say something like:

Simpler is better.

Which is a statement so easily dispatched that we'll leave it alone and move on to a perhaps more charitable reading of the misrepresentation of good William's principle:

All other things being equal, simple explanations are better.

I suspect this is technically correct but also misleading. It's certainly true that simple explanations are easier to understand, and we like them for that reason. But there's the implication here that simplicity is its own virtue that attaches to some explanations and theories but not to others. And that isn't quite what William of Ockham meant.

This becomes clear when we realize that, just as certainly, simple explanations can be too simple. So if we take the qualifier "all other things being equal" as literally as we can, Ockham's Razor begins to take shape. 

Translated from the Latin, the Razor literally states:

Entities should not be multiplied unnecessarily.

Which now sounds a bit tautological (if the multiplications are unnecessary, then of course we shouldn't do them). So perhaps instead we can rely on Newton's reformulation of the principle:

We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.

(Note that both of the above formulations come from this lucid explanation of the concept.)

So the point is that an explanation or a theory ought to shrink-wrap the facts. It must differentiate between those things that are necessary and essential and those things that are contingent and accidental. Then, having done so, it must include all of the information that is necessary. (NB: use of bold text indicates that Mrs. Potato Head did not forget to pack my angry eyes.)

So what implication does this have for sabermetrics and baseball generally? You have no idea how glad I am that you asked.

A Grand Unified Theory

Since the twin developments of Einstein's Theory of General Relativity and the articulation of quantum theory, a central goal of modern physics has been the search for a unified field theory. Put simply, it would describe the four fundamental forces in terms of a single mechanism. (Without getting overly complex: the electromagnetic and weak forces have been unified, and the strong force sits alongside them in the Standard Model, but gravity has so far resisted unification.)

I think that at least since Bill James developed the Runs Created formula, a similar search has been underway in sabermetrics. There have been numerous challengers over the years:

  • RC
  • RC/27
  • Win Shares
  • VORP
  • EqA
  • WARP
  • wOBA
  • WSAB
  • BaseRuns
  • WAR

I'm even confident I've missed several. But none of them is quite perfect (if only due to imperfections in the component data), and certainly none of them has gained widespread acceptance. And the multiplicity of values really seems to bug some people outside the sabermetrics community. In fact, it even bugs some inside the sabermetrics community who lament statisticians' inability to make inroads with the mainstream media, baseball analysts, and the casual fan.

This latter group can now count among its ranks Joe Posnanski:

I continue to look for an extremely simple one-stop-shopping stat that could replace OPS. I would LOVE to get behind one. Of course I love Base Runs because it’s so mind-boggling accurate, but it’s complicated*. Even simple runs created is a really good stat, obviously, but it just seems to scare people.

In the process, he derives the old Runs Created formula (which I nonetheless found interesting). But the main point here is that we need a single, simple metric to, you know, in the darkness bind them.
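Posnanski's derivation isn't reproduced here, but for reference, the basic version of Bill James's formula is RC = (H + BB) × TB / (AB + BB): times on base, multiplied by total bases, divided by opportunities. A minimal sketch of that basic version only (none of the later stolen-base or situational refinements). The stat line in the example is hypothetical.

```python
def basic_runs_created(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = ab + bb  # at-bats plus walks
    if opportunities == 0:
        return 0.0
    on_base = h + bb         # times reaching base via hits and walks
    advancement = tb         # total bases
    return on_base * advancement / opportunities


# Hypothetical season line, for illustration only
print(round(basic_runs_created(h=150, bb=60, tb=250, ab=550), 1))  # 86.1
```

The formula is short, but it is still a product of two factors over a third, which may be part of why it "seems to scare people."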

Posnanski invokes the name Tango (there should be some arcane incantation that you have to repeat three times while stirring a cauldron for that), who responds:

I think it has to be a rate, or index of some kind.  It can’t be a simple counting number, like Runs Created, because it does away with outs.  RC/27 is ok, but it goes too far in terms of its implications.  Perhaps another option is RC+, which would be RC/27 divided by the league runs per game.  So, a guy with an RC/27 of 6 when the league scores 4 is 1.50 (or 150).  OPS+ is very close to this (but it scr-ws up the individual values somewhat).  Since only Sean Forman calculates OPS+ anyway, I see no problem in creating a better stat that only one person (be it Sean or Fangraphs or Hardball Times) that calculates it.

Remember though, we have history that shows how very difficult it is to get a stat into the mainstream.  You have to respect that there are conditions to overcome.
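Tango's RC+ proposal is simple arithmetic: convert Runs Created to a per-27-outs rate, then divide by league runs per game. A sketch under one assumption: the caller supplies the out count, because how outs are tallied (AB − H alone, or also caught stealing, double plays, sacrifices) varies between versions. The 100-runs/450-outs input is hypothetical. The second call reproduces Tango's own example.

```python
def rc_per_27(runs_created: float, outs: int) -> float:
    """Runs created per 27 outs, i.e. per nine-inning game's worth of outs."""
    return runs_created * 27 / outs


def rc_plus(rc27: float, league_runs_per_game: float) -> float:
    """RC+ as Tango describes it: RC/27 relative to league scoring, scaled so 100 is average."""
    return 100 * rc27 / league_runs_per_game


print(rc_per_27(100.0, 450))  # 6.0  (hypothetical: 100 runs created over 450 outs)
print(rc_plus(6.0, 4.0))      # 150.0, Tango's example: RC/27 of 6 in a 4-run league
```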

Tango's first reaction, naturally, is wOBA. If we limit our search only to offensive numbers, this doesn't seem like a bad idea, seeing as how wOBA is a relatively straightforward application of linear weights. But (as David Pinto points out in the comments section on Tango's post), it loses the shape of a player's performance.
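To see how a linear-weights rate can lose shape, here is a simplified wOBA-style calculation: each event gets a run weight, and the weighted sum is divided by plate appearances. The weights below are round illustrative numbers, not any season's official values. The denominator is simplified to plate appearances, ignoring the intentional-walk, HBP, and sacrifice-fly details of the real formula. Both stat lines are hypothetical: the second trades 20 walks for 7 home runs (plus 13 extra outs).

```python
# Illustrative run weights (assumed round numbers; real wOBA weights are
# recalculated each season).
WEIGHTS = {"bb": 0.70, "1b": 0.90, "2b": 1.25, "3b": 1.60, "hr": 2.00}


def woba(events: dict[str, int], pa: int) -> float:
    """Simplified wOBA: weighted sum of events divided by plate appearances."""
    return sum(WEIGHTS[event] * count for event, count in events.items()) / pa


# Two hypothetical 600-PA seasons: 20 walks (20 * 0.70 = 14) are worth
# the same as 7 homers (7 * 2.00 = 14) under these weights.
patient = {"bb": 90, "1b": 110, "2b": 30, "3b": 5, "hr": 10}
slugger = {"bb": 70, "1b": 110, "2b": 30, "3b": 5, "hr": 17}

print(round(woba(patient, 600), 3), round(woba(slugger, 600), 3))  # 0.379 0.379
```

The single number is identical, while the walk and home-run profiles that produced it are not.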

Let's have an example!

A Demonstration

I was thinking about this little debate as I was perusing FanGraphs, when I came across this post by Dave Cameron.

(I am reminded of the old joke where Dave Cameron walks into a bar, puts on his Mariners hat, visualizes some data, and leaves. As he's walking out the door, a guy stops him and says, "Hey! What are you doing?" and Cameron responds, "I'm Dave Cameron, look it up!" So the guy opens a dictionary to "Cameron, Dave," where he finds an entry reading: "Dave Cameron--Fan. Graphs.")

Anyway, Dave Cameron made an interesting point:

Over the winter, the Angels lost out on a bidding war to retain Mark Teixeira and watched him end up in pinstripes. In order to fill the hole on their offense, they gave their first base job to… Mark Teixeira?

Tex, 2009: .280/.380/.541, .392 wOBA, +4.0 wins

Kendry Morales, 2009: .314/.355/.597, .398 wOBA, +3.8 wins

And this, I think, makes the point pretty nicely. These two players are considerably different from each other in terms of the shape of their performances (granting, for the moment, that 2009 represents their true ability), as their triple-slash lines show.
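One way to make that "shape" explicit is to split each triple-slash line into isolated components: OBP − AVG (isolated discipline, which roughly tracks walks and HBP) and SLG − AVG (isolated power). The sketch below uses only the 2009 lines quoted above. The decomposition is a common convention, not something Cameron's post computes.

```python
from typing import NamedTuple


class SlashLine(NamedTuple):
    avg: float
    obp: float
    slg: float


def shape(line: SlashLine) -> dict[str, float]:
    """Isolated discipline (OBP - AVG) and isolated power (SLG - AVG)."""
    return {
        "iso_discipline": round(line.obp - line.avg, 3),
        "iso_power": round(line.slg - line.avg, 3),
    }


tex = SlashLine(0.280, 0.380, 0.541)      # 2009, .392 wOBA
morales = SlashLine(0.314, 0.355, 0.597)  # 2009, .398 wOBA

print(shape(tex))      # {'iso_discipline': 0.1, 'iso_power': 0.261}
print(shape(morales))  # {'iso_discipline': 0.041, 'iso_power': 0.283}
```

Their wOBAs differ by six points. Their isolated discipline differs by more than double, which is exactly the information a single number throws away.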

Returning for a moment to William of Ockham, I think it's important to keep in mind just how many different ways there are of being valuable to a baseball team. Now, I'm not arguing these values are incommensurable. (Although some people do.) But I am saying that destroying information in the pursuit of a single metric may not always be the right idea. We certainly shouldn't prostrate ourselves on the altar of simplicity for simplicity's sake.

Discussion Question of the Day

When it comes to summing up a player's production, how simple do you think is too simple? Am I begging the question by suggesting there is such a thing as "too simple"?