
Daily Box Score 9/2: Ockham's Razor


William of Ockham was a Franciscan friar, theologian, and logician from the 1300s. He is most famous for his eponymous razor. I would say it is one of the least well understood logical concepts, but for the fact that nearly all logical concepts are misunderstood.

Suffice it to say, it's up there. So what does it really say, and how does it apply to sabermetrics?


Table of Contents

The Axiom
A Grand Unified Theory
A Demonstration
Discussion Question of the Day

 

The Axiom

Ockham's Razor is usually misunderstood to say something like 

Simpler is better.

That statement is so easily dispatched that we'll leave it alone and move on to a more charitable version of the usual misreading of good William's principle:

All other things being equal, simple explanations are better.

I suspect this is technically correct but also misleading. It's certainly true that simple explanations are easier to understand, and we like them for that reason. But there's the implication here that simplicity is its own virtue that attaches to some explanations and theories but not to others. And that isn't quite what William of Ockham meant.

This becomes clear when we realize that, just as certainly, simple explanations can be too simple. So if we take the qualifier "all other things being equal" as literally as we can, Ockham's Razor begins to take shape. 

Translated from the Latin, the Razor literally states:

Entities should not be multiplied unnecessarily.

Which now sounds a bit tautological (if the multiplications are unnecessary, then of course we shouldn't do them). So perhaps instead we can rely on Newton's reformulation of the principle:

We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.

(Note that both of the above formulations come from this lucid explanation of the concept.)

So the point is that an explanation or a theory ought to shrink-wrap the facts. It must differentiate between those things that are necessary and essential and those things that are contingent and accidental. Then, having done so, it must include all of that information that is necessary. (NB: use of bold text indicates that Mrs. Potato Head did not forget to pack my angry eyes.)

So what implication does this have for sabermetrics and baseball generally? You have no idea how glad I am that you asked.

A Grand Unified Theory

Since the twin developments of Einstein's Theory of General Relativity and the articulation of quantum theory, the primary goal of modern physics has been the search for a unified field theory. Put simply, it would be a way to describe the four fundamental forces in terms of a single mechanism (without getting overly complex: the electromagnetic and weak forces have been unified, candidate theories that fold in the strong force remain unconfirmed, and gravity is still the holdout).

I think that at least since Bill James developed the Runs Created formula, a similar search has been underway in sabermetrics. There have been numerous challengers over the years:

  • RC
  • RC/27
  • Win Shares
  • VORP
  • EqA
  • WARP
  • wOBA
  • WSAB
  • BaseRuns
  • WAR

I'm even confident I've missed several. But none of them is quite perfect (if only due to imperfections in the component data), and certainly none of them has gained widespread acceptance. And the multiplicity of values really seems to bug some people outside the sabermetrics community. In fact, it even bugs some inside the sabermetrics community who lament statisticians' inability to make inroads with the MSM, baseball analysts, and the casual fan.

This latter group can now count among its ranks Joe Posnanski:

I continue to look for an extremely simple one-stop-shopping stat that could replace OPS. I would LOVE to get behind one. Of course I love Base Runs because it’s so mind-boggling accurate, but it’s complicated*. Even simple runs created is a really good stat, obviously, but it just seems to scare people.

In the process, he derives the old Runs Created formula (which I nonetheless found interesting). But the main point here is that we need a single, simple metric to, you know, in the darkness bind them.

Posnanski invokes the name Tango (there should be some arcane incantation that you have to repeat three times while stirring a cauldron for that), who responds:

I think it has to be a rate, or index of some kind.  It can’t be a simple counting number, like Runs Created, because it does away with outs.  RC/27 is ok, but it goes too far in terms of its implications.  Perhaps another option is RC+, which would be RC/27 divided by the league runs per game.  So, a guy with an RC/27 of 6 when the league scores 4 is 1.50 (or 150).  OPS+ is very close to this (but it scr-ws up the individual values somewhat).  Since only Sean Forman calculates OPS+ anyway, I see no problem in creating a better stat that only one person (be it Sean or Fangraphs or Hardball Times) that calculates it.

Remember though, we have history that shows how very difficult it is to get a stat into the mainstream.  You have to respect that there are conditions to overcome.
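Tango's RC+ arithmetic is easy to check. What follows is only a minimal Python sketch of the ratio he describes: the function name `rc_plus` and the step of scaling to an OPS+-style index are illustrative assumptions, not an official definition of the stat.

```python
def rc_plus(rc_per_27: float, league_runs_per_game: float) -> float:
    """Ratio of a player's RC/27 to the league's runs per game.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if league_runs_per_game <= 0:
        raise ValueError("league runs per game must be positive")
    return rc_per_27 / league_runs_per_game


# The example from the quote: RC/27 of 6 in a league that scores 4 per game.
ratio = rc_plus(6.0, 4.0)
index = round(ratio * 100)  # scaled the way OPS+ is, with 100 as league average
print(f"{ratio:.2f} -> {index}")
```

The appeal of an index like this is that the reader only has to remember one thing: 100 is average, and everything else is a percentage of that.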

His first reaction, naturally, is wOBA. If we limit our search only to offensive numbers, this doesn't seem like a bad idea, seeing as how wOBA is a relatively straightforward application of linear weights. But (as David Pinto points out in the comments section on Tango's post), it loses the shape of a player's performance.

Let's have an example!

A Demonstration

I was thinking about this little debate as I was perusing FanGraphs, when I came across this post by Dave Cameron.

(I am reminded of the old joke where Dave Cameron walks into a bar, puts on his Mariners hat, visualizes some data, and leaves. As he's walking out the door, a guy stops him and says, "Hey! What are you doing?" and Cameron responds, "I'm Dave Cameron, look it up!" So the guy opens a dictionary to "Cameron, Dave," where he finds an entry reading: "Dave Cameron--Fan. Graphs.")

Anyway, Dave Cameron made an interesting point:

Over the winter, the Angels lost out on a bidding war to retain Mark Teixeira and watched him end up in pinstripes. In order to fill the hole on their offense, they gave their first base job to… Mark Teixeira?

Tex, 2009: .280/.380/.541, .392 wOBA, +4.0 wins

Kendry Morales, 2009: .314/.355/.597, .398 wOBA, +3.8 wins

And this, I think, makes the point pretty nicely. These two players are pretty considerably different from each other in terms of the shape of their performances (granting, for the moment, that 2009 represents their true ability), as their triple-slash lines show.
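To put rough numbers on "shape," here is a small Python sketch that starts from the two slash lines quoted above and pulls out a few derived figures. The class name `SlashLine` and the choice of OPS, isolated power (SLG minus AVG), and the OBP-minus-AVG gap as shape indicators are assumptions made for illustration; the last one in particular is only a crude stand-in for on-base skill beyond hits, since OBP and AVG use different denominators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlashLine:
    """A batter's AVG/OBP/SLG line (illustrative container, not a real API)."""
    name: str
    avg: float
    obp: float
    slg: float

    @property
    def ops(self) -> float:
        # On-base plus slugging
        return self.obp + self.slg

    @property
    def iso(self) -> float:
        # Isolated power: extra bases per at-bat
        return self.slg - self.avg

    @property
    def obp_minus_avg(self) -> float:
        # Crude gauge of how much on-base value comes from something other than hits
        return self.obp - self.avg


# Slash lines exactly as quoted in the Cameron excerpt above
players = [
    SlashLine("Teixeira", 0.280, 0.380, 0.541),
    SlashLine("Morales", 0.314, 0.355, 0.597),
]

for p in players:
    print(f"{p.name:9s} OPS {p.ops:.3f}  ISO {p.iso:.3f}  OBP-AVG {p.obp_minus_avg:.3f}")
```

The quoted wOBA figures differ by only .006, yet the gap between OBP and average is about .100 for Teixeira and about .041 for Morales. That difference is the kind of information a single number flattens.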

Returning for a moment to William of Ockham, I think it's important to keep in mind just how many different ways of being valuable to a baseball team there are. Now, I'm not arguing these values are incommensurable. (Although some do.) But I am saying that destroying information in the pursuit of a single metric may not always be the right idea. We certainly shouldn't prostrate ourselves before the altar of simplicity for simplicity's sake.

Discussion Question of the Day

When it comes to summing up a player's production, how simple do you think is too simple? Am I begging the question by suggesting there is such a thing as "too simple"?


Comments


What's wrong with RBI?

I'm not a sabermetrician, but I do play one at Driveline Mechanics.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Sep 2, 2009 6:14 PM EDT

A point made in the Book Blog is that it doesn't matter unless MLB adopts it.

Could also go with xOPS

((1.8 x OBP) + SLG)/3

I also like wOBA as long as it isn’t adjusted. Just keep it as a value of runs created per PA. Would be easy to explain to the fans.

Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.

by Jeff Zimmerman on Sep 3, 2009 12:07 AM EDT

That was a good point from that thread -- someone mentioned Passer Rating as being overly complex (and stupid) but popular.

If official sources adopt it, the media will adopt it (probably after making fun of it for a while) but at least it will be out there.

by Sky Kalkman on Sep 3, 2009 9:35 AM EDT

Poz is waiting for Bill James' approval

but wOBA has a couple of problems in that regard

1) too easy to convert to runs above/below average

2) was not invented by Bill James

by Matt Klaassen on Sep 3, 2009 11:11 AM EDT

Poz's connection to Bill is somewhat irritating, but...

I can understand his reluctance to adopt wOBA if only because it includes multiplication and numbers that are not directly tied to the observable part of a game.

If I told an average fan that a home run was worth two instead of four, and a single was worth 0.9, that fan would think I’m insane. It’s taken us entirely too long to convince people that walks matter. Can you imagine the sort of battle it would take to convince the average person that we should use linear weights?

Ultimately, that’s what I think Poz was getting at. Although the formula for OBP is absurdly complicated (why isn’t the denominator just PA?) given what it calculates, overall I think it’s much more approachable for average people than “advanced” metrics like wOBA because you take raw counting stats without tossing in any seemingly-arbitrary numbers.

by jwiscarson on Sep 3, 2009 11:18 AM EDT

I don't know, I think the idea of linear weights isn't too hard to get.

On average, how many more runs do you expect to score in an inning because of a single?

by Sky Kalkman on Sep 3, 2009 11:40 AM EDT

I think that's a relatively nebulous concept for the average fan.

Not to say that they couldn’t understand it, but I think you’d have to explain it to people, whereas OBP and SLG are self-explanatory. Add them together (which seems arbitrary, I admit) to get a “complete picture” of offensive performance, and that’s that.

I don’t mean to imply that it isn’t worth the argument, but I think a lot of people will ask questions like “well, what if the guy on first is really slow?” that ultimately just confuse the issue. I think the average sports fan likes to believe that sports are more mysterious than they really are: that “clutch”, “grit”, etc. count for a big portion of the reason why things happen.

I’ve had this discussion with fans before where someone thinks that pitchers deserve credit for low-hit, low-strikeout games because they “made the batter swing at a bad pitch,” without understanding that we’re talking about a difference in milliseconds between a groundball between short and third and a routine out. That’s the sort of confusion I’m talking about — a fundamental misunderstanding of what are essentially the laws of baseball.

by jwiscarson on Sep 3, 2009 1:47 PM EDT

I have had the opposite experience when explaining it; people seem to take to it well.

by Jeff Zimmerman on Sep 3, 2009 2:03 PM EDT

No kidding?

Maybe I’m doing something wrong. I’ve tried explaining it to co-workers and to a few regulars on LSB (which is very saber-friendly, but has some traditionalists), and have never had much success beyond a few friends who are math-oriented people.

Out of curiosity, whom have you explained it to?

by jwiscarson on Sep 3, 2009 2:05 PM EDT

Family and friends.

Basically, I say it is the historical average of runs scored for each event. So the average home run generates 1.9 runs.

by Jeff Zimmerman on Sep 3, 2009 2:17 PM EDT

Except that isn't correct.

The LWTS values are figured out in comparison to the average plate appearance (typically to get absolute runs per PA you add .12 to the LWTS values). To get the wOBA values you typically do something like:

(HR_LWTS+OUT)* 1.15

Where 1.15 is the wOBA scaling factor, and 1.7 is the typical R/O value of a HR.

by cwyers on Sep 3, 2009 2:59 PM EDT

Aren't the reasons you just stated exactly why wOBA will not get accepted on simplicity and ease?

Which is easier to explain: the sentence I wrote, or what you had to add?

That is why I started out by saying “… also like wOBA as long as it isn’t adjusted.”

Tom and I have agreed to disagree on this, and his stubbornness, like Bill James’s, is what keeps things moving forward in getting to correct answers.

by Jeff Zimmerman on Sep 3, 2009 3:21 PM EDT

The reason wOBA is the way it is...

…is because of the specific use for which it was intended in The Book – to provide a stat that functioned like OBP but properly weighted the value of events. Everything is laid out in the appendix as to why it is like it is.

People have since started using wOBA for other uses, and it’s certainly fine for those purposes so long as you understand what it is and why it is that way. And if you really don’t care about how to do the math, then it’s certainly fine.

Now, if you want some other rate form of LWTS, that’s fine. (Tango in fact came up with one, Linear Weights Ratio, that works pretty well.) But then it’s not wOBA, it’s something else.

by cwyers on Sep 3, 2009 3:31 PM EDT

Bill James is the guy who devised Win Shares.

Have you seen Win Shares? Claim points? I don’t think he has a problem with that aspect of linear weights.

And for that matter, let’s compare the formula for wOBA:

(0.72*NIBB + 0.75*HBP + 0.90*1B + 0.92*RBOE + 1.24*2B + 1.56*3B + 1.95*HR) / PA

For the formula for Runs Created Historical Data Group-1:

(H+W+HB-CS-DP) * (TB + .24*(W+HB-IW) + .5*(SH+SF) + .62*SB - .03*K) / (AB+W+HB+SH+SF)

Or if we really want to do apples to apples, Theoretical Team Runs Created:

(((H+W+HB-CS-DP) + 2.4*(AB+W+HB+SH+SF)) * ((TB + .24*(W+HB-IW) + .5*(SH+SF) + .62*SB - .03*K) + 3*(AB+W+HB+SH+SF))) / (9*(AB+W+HB+SH+SF)) - .9*(AB+W+HB+SH+SF)

Bill James’ problem with wOBA is not its complexity (it’s really pretty simple) or the use of seemingly arbitrary constants (Bill James has a boatload of those).

James simply does not accept linear weights, and this is a problem that goes back all the way to when Palmer first proposed them. He’s spent over 25 years being wrong on the issue and I see no reason to expect him to change his mind now.

by cwyers on Sep 3, 2009 12:11 PM EDT

I didn't mean to say that Bill James has a problem with linear weights.

Just that the average person wouldn’t understand them.

You’re definitely right, his formulas are…absurd. To me, it smacks of someone tossing darts at a dart board until the data sets “look right”, rather than doing the math to make them right.

So, if Bill James is Isaac Newton, and linear weights are the ether, then who’s the Sabermetric Albert Einstein?

by jwiscarson on Sep 3, 2009 1:50 PM EDT

So, I thought about this...

and it’s probably Voros McCracken, right?

by jwiscarson on Sep 3, 2009 1:59 PM EDT

Not born yet.

by Jeff Zimmerman on Sep 3, 2009 2:04 PM EDT

Oh, Bill James has a problem with linear weights, alright.

And he’s very stubborn about it.

And I wouldn’t call LWTS the “ether.” They are, for the applications for which they are intended, “correct.” (And Theoretical Team Runs Created essentially functions as a linear weights formula.)

by cwyers on Sep 3, 2009 2:53 PM EDT

Err, yeah.

That came out wrong. Linear weights would be relativity, and James’s fudge factors would be the ether.

by jwiscarson on Sep 3, 2009 3:35 PM EDT

He's not alone

Although he appears to have softened a tad recently, Clay Davenport is also against a strict linear weights approach.

by Tommy Bennett on Sep 3, 2009 4:54 PM EDT

EqR is a linear weights formula.

If Clay Davenport doesn’t realize it that’s his own lookout. But (aside from some silliness with SB/CS) EqR functions exactly as a linear run estimator.

by cwyers on Sep 3, 2009 4:58 PM EDT

Which is another thing.

EqA has got plenty of acceptance, and its formula looks something like:

RAW: (SF + SH + 1.5*BB + 1.5*HBP + 1.5*SB + 2*1B + 3*2B + 4*3B + 5*HR)/(SF+SH+BB+HBP+SB+CS+AB)

EqR: (Raw/LgRaw )^2 * PA * LgR/LgPA

EqA: (EqR/Out/5)^.4

People don’t seem to have any problems accepting complicated things so long as they trust the people that are espousing them.

by cwyers on Sep 3, 2009 5:41 PM EDT

This

is the most important point. I think it goes double for VORP.

by Tommy Bennett on Sep 3, 2009 5:54 PM EDT

EqA also has a scale everybody is familiar with, which helps.

You can tell someone, “Hey, we all know HRs are better than a single, so we’ve (these people you trust) got this formula which weights everything according to how important it is, but spits out a number which is exactly like batting average.” .260 is average, .300 is damn good, and above .350 is legendary.

by Sky Kalkman on Sep 4, 2009 8:16 AM EDT

I will just have to disagree. Why not weight to the actual runs scored?

I see everyone is just going to have to agree to disagree, and JoPo is going to be using OPS forever. Might need to set up a poll on this.

by Jeff Zimmerman on Sep 5, 2009 12:33 AM EDT

I swear Bill's hate lies in his deep-seated hatred of decimals.

How else can you explain that whole three Win Shares per win? Why not just use a decimal place?

by Jeff Zimmerman on Sep 3, 2009 2:18 PM EDT

James' fundamental problem is with negative numbers.

He simply doesn’t accept their existence – that’s one of the big reasons that Win Shares is so complicated, because of the fact that he makes it impossible to receive negative Win Shares.

Most of the problems I have with James’ work over the past few years have come from his desperate insistence that things are not negative, and the logistical nightmares he concocts in order to come up with a system that will never produce a negative number. The problem is that the clever mathematical hoops he jumps through to avoid negative numbers detract from the accuracy of the systems he’s designing.

by cwyers on Sep 3, 2009 3:07 PM EDT

Okay, now I'm really curious.

This seems absurd. It’s plainly obvious to us on even the most basic observational level that you can contribute negatively to a situation. Outs remove baserunners, after all.

So…what’s his justification for this system? Or does he not really justify it?

by jwiscarson on Sep 3, 2009 3:38 PM EDT

Bill James says...

…that it’s impossible for a team to score negative runs. Which is true. But an individual player can certainly hit poorly enough that he contributes less to run scoring than the negative value of the outs he makes. (I’m not explaining myself very well here – mostly because there’s a big difference between negative values in a Palmer-style LWTS that’s baselined to average, and an ERP-style system that uses “absolute” runs and that further complicates the issue.)

by cwyers on Sep 3, 2009 5:03 PM EDT
