Defending sabermetrics on the airwaves

During a rain delay, I took it upon myself to call into a radio show and defend wins above replacement.

Kim Klement-USA TODAY Sports

This past Friday, the Toronto Blue Jays were visiting the Kansas City Royals at Kauffman Stadium. It's normally a very aesthetically pleasing park, but the game was unfortunately mired in a two-hour rain delay. It just so happens that, during rain delays, a radio show I frequently listen to fields phone calls. Enter: me.

Before we get to the tape though, let's add some context. A week before this rain delay, I had called in after a game when host Mike Wilner and a color commentator addressed the fact that Wins Above Replacement was displayed on the jumbotron. It's typically a rational broadcast, so I hold them to a high standard. This isn't the kind of program that would dismiss any new statistic outright just because it's new. I don't want to put words in their mouths, but they generally dismissed WAR for a few reasons, including: a player can't be measured by one number, defensive metrics are too subjective, there is more than one type of WAR, which diminishes its worth, and there is "too much noise" in WAR.

Perhaps that last point was just summation, but I'll list it separately. Everyone reading this will have heard a conversation exactly like this one in the past. But I didn't expect it from this broadcast team, so I called in to share my thoughts.

One more bit of context: the previous caller had been suggesting that the Blue Jays are better when Ryan Goins is playing shortstop as opposed to Jose Reyes. Below is our ensuing conversation:

As I believe this is a worthwhile opportunity to trumpet the usefulness of WAR, there are a few main points that I'd like to address.

First, at no point would I actually suggest Wins Above Replacement is better than its parts. There are a couple of times when the host suggests he doesn't need one number to define a player, and I understand that and actually agree. For analysis, definitely use as many metrics as possible (I lean toward wOBA or FIP though). WAR isn't meant to eliminate the existence of the other stats, but rather to arrive at a higher plane of importance within your evaluations.

However, my argument for WAR in this respect is two-fold. First, WAR is built with metrics to help evaluate a player against any other player, no matter what position, and that's hard to do otherwise. Second, if it were used by everyone all the time, people would be wrong less often than with any other single statistic. That is to say, people who use RBI and Wins and even OBP and SLG to evaluate a player would be better served at times to use WAR if they could choose only one statistic. This isn't to say they should only use one stat, but if you're in a hurry, WAR is the answer in many cases.

I believe this objection to WAR stems from some people not understanding what WAR is. WAR is a cumulative statistic, calculated by weighting many other rate statistics, to help evaluate one player against any other player regardless of position. It is designed to make sabermetrics holistic and to summarize information for the average fan. I don't mind that calculating it can take some effort. I suppose I understand why that scares some people off, but it shouldn't, because no one is sitting on their couch doing long division to come up with batting average either.
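For anyone who wants to see the "cumulative" part in action, here is a minimal sketch of how the pieces get rolled up. It loosely follows the published FanGraphs outline for position players (add up the run components, then divide by runs per win), but the class name, the function name, the sample numbers and the flat 10 runs per win are all assumptions made for illustration, not anyone's official implementation.

```python
from dataclasses import dataclass


@dataclass
class RunComponents:
    """Hypothetical season totals, each expressed in runs relative to average."""
    batting: float       # runs added with the bat
    baserunning: float   # runs added on the bases
    fielding: float      # runs saved with the glove
    positional: float    # adjustment for how hard the position is to play
    league: float        # small correction so league totals balance
    replacement: float   # gap between an average and a replacement-level player


def wins_above_replacement(c: RunComponents, runs_per_win: float = 10.0) -> float:
    """Sum every run component, then convert runs to wins.

    runs_per_win is an assumed rule of thumb; the real figure shifts a bit
    from season to season with the run environment.
    """
    total_runs = (c.batting + c.baserunning + c.fielding
                  + c.positional + c.league + c.replacement)
    return total_runs / runs_per_win


# A made-up, glove-first shortstop: 43 total runs above replacement.
shortstop = RunComponents(batting=5.0, baserunning=2.0, fielding=8.0,
                          positional=7.0, league=1.0, replacement=20.0)
print(round(wins_above_replacement(shortstop), 1))
```

The point of the sketch is only that every component lands on the same scale (runs) before being added, which is what lets a shortstop be compared with a first baseman.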

Second, at one point in the interview, an analogy to the Flintstones is made. I hate to parse such an innocuous statement but, truthfully, pitcher wins are the archaic statistic, not Wins Above Replacement. Even if you disagree with how we arrive at WAR, the thing it is trying to measure is exactly what you want to know about a baseball player. Maybe the inputs are imperfect, but how much value you add to your team is way more interesting than how many times your plate appearance led to another guy touching home plate (RBI) or how many times you threw at least five innings and left the game when your team was ahead for good (wins).

Third, any difficulty in explaining WAR is not an inherent flaw. That something is hard to explain doesn't make it something to be dismissed; oftentimes, just the opposite. You don't need to know how something works for it to be observably true or, at least, close based on massive samplings of positive correlations. That being said, we could always do a better job of explaining ourselves. Furthermore, the fact that there are multiple types of WAR should be seen as a strength of the statistic, not a weakness or a confusing aspect. The inventors of WAR are wholly aware of the fact that these measurements are estimations. No statistic tells the whole story, but the point of stats isn't to be infallible; it's to be probable, and to support inferences. Let's really break this down for a second. Let me explain WAR to someone who has never watched baseball before:

Wins Above Replacement is an attempt to take all of a player's performances, whether they are good or bad, and make them one number to see how much his overall contributions have helped or hurt his team.

That was relatively painless. So painless, in fact, that it is actually easier for me to understand WAR than it is to understand ERA and all its nuances. Wait, how often do pitchers actually pitch nine innings in one game? Hold on, what's an unearned run? Or, my favorite: errors. Sorry, the player was close enough to a batted ball to make a play but then didn't make the play that we expected him to make? Who expected him to make it? If he wasn't positioned there, that single might have been a double? WAR's purpose is to be a better measure of a player's performance than anything else. If the truth is complicated, the statistic should be complicated.

Finally, advanced stats are not designed simply as predictive tools. Some of them are, but just because something involves math doesn't mean it's forward-looking. I would urge the host and like-minded people not to think of FIP (a component of FanGraphs' pitcher WAR) as a predictor or projection. FIP isn't a measure of what should have happened; it is a measure of the only things for which we can hold a pitcher accountable with the current data we have. FIP is a description of results based on a subset of outcomes. Matt Jackson wrote on this recently, giving due credit to Voros McCracken for originally theorizing that we can hold a pitcher accountable for only four main things (home runs, strikeouts, walks and hit-by-pitches) and that the variation among batted-ball results is not dramatically tied to the pitcher. That's not a prediction; it's an argument that the pitcher had little to do with a particular result. You can argue the merits of that claim, but it does not make FIP into a prediction-only tool.
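To make the "description, not prediction" point concrete, here is a small sketch of the commonly published FIP formula. The 13, 3 and 2 weights are the standard ones; the function name, the sample stat line and the default constant of 3.10 are assumptions for illustration (the real constant is recalculated each season so that league FIP matches league ERA).

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching from events that already happened.

    Only home runs, walks, hit-by-pitches and strikeouts go in; nothing
    about balls in play, and nothing about the future.
    ip must be true innings (use 200 + 1/3, not the box-score "200.1").
    constant is an assumed league value, typically somewhere near 3.10.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# A made-up season: 20 HR, 50 BB, 5 HBP, 200 K over 200 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))
```

Every input is a count of something the pitcher has already done, which is why it reads as a summary of a subset of outcomes rather than a forecast.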

None of this is meant to specifically target Mike Wilner. He is consistently nice enough to grant a forum for these discussions. Instead, I wanted to call attention to the debates happening outside our little corner of the internet and to dispel some misconceptions. And I really will try to work harder.

. . .

Michael Bradburn is a Featured Writer for Beyond the Box Score. He would like to thank Mike Wilner not only for the forum to discuss sabermetrics, but also for being courteous enough to address his concerns fairly and kindly twice. He would also like to apologize to Wilner for anything above that may appear out of context. You can follow Michael on Twitter at @mwbii. You can also reach him at