Complexity and outliers are poor reasons to reject DRA

Thou shalt not commit logical fallacies!

Baltimore Orioles v New York Yankees
Michael Pineda has been the face of anti-DRA arguments.
Photo by Al Bello/Getty Images

A couple weeks ago, I investigated Michael Pineda’s BABIP fluctuations over his career and why his RA9 and DRA never agree. In the analytics community, Pineda was a frequent topic of discussion because his 2.95 DRA was over two runs better than his actual run average. Even with the most recent update to DRA, Pineda still had a DRA that was roughly a run and a half better than his run average. That is still a huge difference.

Trying to figure out why that is, or treating Pineda as an outlier whom the DRA model can’t figure out, is all fair. What is not fair is throwing DRA out because it might be wrong on one out of 824 pitchers.

A well-known advocate of sabermetrics reportedly bashed DRA at the recent SABR conference in Arizona. He cited its complexity and the disparity on Michael Pineda as “evidence.” Ironically, he has no problem with WAR.

I put evidence in quotes for a reason. You might be familiar with the website Thou shalt not commit logical fallacies. It is nowhere close to comprehensive, but it does select the most commonly committed logical fallacies that most people will come across in their daily lives. Strawman, argumentum ad populum, argumentum ad hominem, and appeal to authority are just some examples of frequently committed logical fallacies on the internet. And really, everybody, baseball fan or not, should have a strong understanding of basic logical fallacies. It is an excellent way to discern logical, sound arguments from nonsense and attempts to deflect or distract.

It is especially important for members of the analytics community to avoid committing logical fallacies. People often equate sabermetrics with fancy stats, but that is really not the essence of sabermetrics. It’s very simple, actually. Sabermetrics is the pursuit of that which can be objectively proven about the game of baseball. That’s all. Put another way, it is applying the scientific method to the game of baseball. Sometimes fancy stats were developed to answer questions. What is most important is to always make arguments based on logic and evidence. Failing to do so embarrasses the analyst and reflects poorly on the analytics community.

To be clear, I’m not saying that analysts are not allowed to be wrong, or are not allowed to make mistakes. Nobody is perfect. However, there is a big difference between making a sound, logical argument grounded in modern baseball analysis that might not reach the best conclusion or might end up predicting the results incorrectly, and making a logically fallacious argument in bad faith.

If you were to do a deep dive into the content at Baseball Prospectus, FanGraphs, here at Beyond the Box Score, or any other analytical content outside of those sites, you would likely have a hard time finding any logical fallacies. It is not because all these writers are constantly checking to make sure that is the case, either. They just know how to make reasonable, fact-based arguments based on evidence.

Let’s swing this back to the gentleman who publicly complained about DRA at the SABR conference. He committed not one, but two logical fallacies in his argument.

  • Personal incredulity: DRA is complicated. Very, very complicated. That in no way invalidates it. A similar argument has been used against WAR. Just because something is difficult to understand doesn’t mean it’s not true.
  • Anecdotal evidence: Outliers don’t invalidate models or any kind of argument whatsoever. Even if DRA is flat out “wrong” about Michael Pineda, Jered Weaver, David Price, and the like, there are still over 800 other pitchers on whom DRA has data.
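To put the outlier point in numbers, here is a minimal sketch using synthetic data, not real DRA output. It assumes a hypothetical model whose misses across 824 pitchers are spread around zero with a standard deviation of half a run (that figure is invented for illustration), then forces one pitcher's miss to two full runs, roughly the size of the Pineda gap described above. The function names are mine, not anything from Baseball Prospectus.

```python
import math
import random


def rmse(errors):
    """Root mean squared error of a list of model misses (in runs)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def simulate(n_pitchers=824, typical_sd=0.5, outlier_miss=2.0, seed=0):
    """Compare overall error with and without one large miss.

    typical_sd is an assumed, made-up spread for ordinary misses;
    outlier_miss mimics a two-run disagreement on a single pitcher.
    """
    rng = random.Random(seed)
    # Synthetic misses for every pitcher: illustrative only, not real data.
    errors = [rng.gauss(0.0, typical_sd) for _ in range(n_pitchers)]
    baseline = rmse(errors)

    # Overwrite one pitcher's miss with the large outlier value.
    errors_with_outlier = errors[:]
    errors_with_outlier[0] = outlier_miss
    return baseline, rmse(errors_with_outlier)


baseline, with_outlier = simulate()
print(f"Share of the sample that is the outlier: {1 / 824:.2%}")
print(f"RMSE without the outlier: {baseline:.4f}")
print(f"RMSE with the outlier:    {with_outlier:.4f}")
```

Under these assumptions, a single two-run miss moves the overall error by only a few thousandths of a run. That says nothing about whether DRA is actually accurate across the other pitchers, only that one bad miss cannot settle the question either way.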

I’m sure that there are those out there who have cited Occam’s razor against complex models such as DRA. Occam’s razor is frequently misunderstood. It is more of a rule of thumb than a hard-and-fast rule of logic, and it is not sufficient to refute an argument. In science, it is used to assign priority to simpler hypotheses because they are easier to test experimentally. It is a heuristic technique, not a tool for evaluating conclusions.

Part of the problem with accepting complicated models is that their inner workings are not available to the public. Sure, not a lot of people would understand them anyway (I know I wouldn’t), but those who would could potentially offer constructive criticism. A type of peer-review process could help curtail some of the bad arguments out there. Unfortunately, this can’t be done for proprietary reasons. It’s not like these models can be patented.

There is no doubt that there are aspects of DRA that are publicly available on which reasonable people can disagree. For example, I disagree with using in-season park factors because the sample sizes are too small. I believe it is better to use three-year park factors like Baseball Reference does in its WAR calculation. That being said, it would be silly of me to reject DRA just because of that.
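The sample-size concern can be sketched with simple arithmetic. This is a deliberate simplification, not how DRA or Baseball Reference actually compute park factors: it assumes a park factor behaves like an average over independent home games, so its noise shrinks with the square root of the number of games. The 81 home games per season comes from the 162-game schedule; the function name is hypothetical.

```python
import math

HOME_GAMES_PER_SEASON = 81  # half of a 162-game MLB schedule


def relative_noise(seasons):
    """Relative standard error of an average over `seasons` of home games.

    Assumes independent games with equal variance, which is a
    simplification of how real park factors are built.
    """
    games = seasons * HOME_GAMES_PER_SEASON
    return 1.0 / math.sqrt(games)


one_year = relative_noise(1)
three_year = relative_noise(3)
ratio = one_year / three_year

print(f"One-season estimate is about {ratio:.2f}x as noisy as a three-season one")
```

Under that assumption the single-season estimate carries roughly 1.7 times the noise of the three-year version. The trade-off is that a three-year window is slower to reflect real changes to a park.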

I am sure that Jonathan Judge is more than happy to address any legitimate questions about DRA that are asked in good faith. It is even okay to ask about the outliers. But citing the outliers or complexity in order to evaluate any advanced stat is a complete non-argument. It is human nature to reject what cannot be understood, and those in the analytical community do an excellent job of avoiding that trap. The gentleman mentioned at the beginning of this article, on the other hand, needs to be more careful.

. . .

Luis Torres is a Featured Writer at Beyond the Box Score. He is a medicinal chemist by day, baseball analyst by night. You can follow him on Twitter at @Chemtorres21.