Sixteen stats enter, but only one will leave a champion. Will it be saber or traditional? New school or old school? Offense, defense, pitching or value? Let's find out.
Yep, this is happening. In honor of March Madness, that wonderful time of the year when we all pretend to know things about college basketball and brag to our friends when luck is on our side, I thought I'd make a bracket. But what kind of bracket? What else but a stats bracket?
That's right. Just when you thought Beyond the Box Score couldn't get any nerdier, I am going to make a bracket for baseball statistics. Don't pretend like you're not excited. You are. Just admit it. ADMIT IT. Good. Let's move on.
The bracket, like the March Madness bracket, will consist of four divisions: Offense, Defense, Pitching, and Value. Within each division, for the first round, there will be two match-ups, each representing an old school versus new school debate. I'm going to warn you now; new school will win most of these. We are a saber-slanted site after all. But I promise that I'll try to be as rational and clear about my reasons as possible.
After the first round, the match-ups might get a little tricky, since the stats won't necessarily be measuring the same thing. But that's alright, because in case you haven't noticed, this is not quite a scientific process. I hope you don't mind.
Enough of my boring intro - let's get to the good part.
Division One -- Offense
OPS vs. wOBA: OPS, or on-base plus slugging, is, well, on-base percentage plus slugging percentage, as I'm sure you all know. wOBA, or weighted on-base average, isn't as easy to explain in a few words, but essentially, it sums the average run values for each outcome multiplied by the raw number of said outcome, all divided by plate appearances. Long story short, the two metrics both try to measure overall offensive performance as a rate stat, but wOBA uses more accurate linear weights for the coefficients while OPS uses total bases, which don't properly represent the actual value of outcomes.
This isn't much of a contest, really. As I mentioned in a piece at the Hardball Times yesterday, the question that OPS tries to answer can be answered much better with different metrics like wOBA. OPS is nice because it combines two commonly used statistics into one easy number, but they don't even share a denominator. For basically any reason you would want to use OPS, you would be better off using wOBA (or even better, wRC+).
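For the curious, here is a rough Python sketch of the two calculations side by side. The linear weights are illustrative values I am assuming for the example, not the official numbers for any season, and the stat line is made up. Following the simplified description above, wOBA here divides by plate appearances, approximated as AB + BB + HBP + SF.

```python
# Illustrative linear weights (assumed values, not tied to any specific season).
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}

def ops(line):
    """On-base percentage plus slugging percentage."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    total_bases = line["1B"] + 2 * line["2B"] + 3 * line["3B"] + 4 * line["HR"]
    # OBP and SLG use different denominators, which is one of OPS's quirks.
    obp = (hits + line["BB"] + line["HBP"]) / (
        line["AB"] + line["BB"] + line["HBP"] + line["SF"]
    )
    slg = total_bases / line["AB"]
    return obp + slg

def woba(line, weights=WEIGHTS):
    """Sum of (run value x count) for each outcome, divided by plate appearances."""
    pa = line["AB"] + line["BB"] + line["HBP"] + line["SF"]  # simplified PA
    run_value = sum(weights[event] * line[event] for event in weights)
    return run_value / pa

# A made-up season line for a hypothetical hitter.
hitter = {"AB": 500, "1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 50, "HBP": 5, "SF": 5}

print(round(ops(hitter), 3))
print(round(woba(hitter), 3))
```

Notice that in OPS a home run is worth exactly four times a single in the slugging half, while the weights put it closer to two and a half times. That gap is the whole argument for wOBA in one line.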
RBI vs. RE24: We all know what an RBI is. RE24, or runs above average by the 24 base/out states, however, might be unknown to some of you. Simply, it measures the difference between the expected runs before the play and after the play (plus any runs that score on it), based on the base/out states. So with a man on 2nd and no outs, the run expectancy for the inning might be 1.2 runs, but if the batter strikes out, it would move down to, say, 0.8 runs, thus the RE24 would be -0.4.
Why did I match this up with RBI? Well, they both try to do similar things: measure the run production of a hitter. However, RBI is obviously riddled with flaws, such as the lack of consistency in opportunity, and the lack of differentiation between runner on 3rd and no outs and runner on 1st and two outs, for example. RE24 gives you everything that you would want from RBI and more, so that's the winner.
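Here is the strikeout example from above as a small Python sketch. The run expectancy table is hypothetical and holds only the two base/out states from that example (with the same made-up 1.2 and 0.8 values); a real table covers all 24 states and is built from actual play-by-play data. The function and variable names are my own.

```python
# Hypothetical run expectancy values keyed by (bases occupied, outs).
# Only two of the 24 base/out states are filled in, using the example numbers.
RUN_EXPECTANCY = {
    ("-2-", 0): 1.2,  # runner on 2nd, no outs
    ("-2-", 1): 0.8,  # runner on 2nd, one out
}

def re24_for_play(before, after, runs_scored=0, table=RUN_EXPECTANCY):
    """Change in run expectancy on one play, plus runs that scored on it.

    Pass after=None when the play ends the inning (expectancy drops to zero).
    """
    end_value = 0.0 if after is None else table[after]
    return end_value - table[before] + runs_scored

# Batter strikes out with a man on 2nd and nobody out.
strikeout = re24_for_play(("-2-", 0), ("-2-", 1))
print(round(strikeout, 2))
```

A hitter's season RE24 is just the sum of these values over all of his plate appearances, which is why it handles the opportunity problem that RBI can't.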
Division Two -- Defense
UZR vs. Fielding Percentage (FP): It won't be hard to convince you of this matchup. UZR, for those unaware, stands for Ultimate Zone Rating, and is an incredibly complicated metric that I won't even pretend to understand completely (but you should read the primer on FanGraphs). The simplified version of that primer is that UZR compares a fielder's run prevention with what would be expected from the average fielder at the position. It includes errors, range, double plays, and arm.
Fielding percentage, on the other hand, doesn't consider range or arm but only errors. While it certainly correlates with fielding ability, it still falls far short because of its failure to include many important aspects of defense. As expected, UZR wins this matchup.
Errors vs. Range Factor (RF): Bill James first introduced Range Factor as an alternative to fielding percentage and errors, measuring number of putouts and assists per inning played rather than number of errors per fielding attempt. This, he argued, would more accurately represent the full range of a player's defensive skill, rather than just measuring mistakes. While errors are certainly an important part of fielding, RF captures them indirectly (a botched play earns no putout or assist) along with other aspects of fielding, like range.
Winner: Range Factor
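To see why counting plays made beats counting mistakes, here is a quick Python sketch with two made-up shortstops. I'm scaling Range Factor to nine innings, which is one common way to present it; the players and their numbers are entirely hypothetical.

```python
def fielding_percentage(putouts, assists, errors):
    """Share of fielding chances handled without an error."""
    return (putouts + assists) / (putouts + assists + errors)

def range_factor_per_nine(putouts, assists, innings):
    """Putouts plus assists per nine innings played."""
    return 9 * (putouts + assists) / innings

# Two hypothetical shortstops with the same playing time.
sure_hands = {"putouts": 200, "assists": 350, "errors": 5, "innings": 1200}
rangy = {"putouts": 250, "assists": 450, "errors": 15, "innings": 1200}

for name, p in (("sure_hands", sure_hands), ("rangy", rangy)):
    fp = fielding_percentage(p["putouts"], p["assists"], p["errors"])
    rf = range_factor_per_nine(p["putouts"], p["assists"], p["innings"])
    print(name, round(fp, 3), round(rf, 2))
```

The sure-handed guy wins on errors and fielding percentage, but the rangy one converts 150 more balls into outs over the same innings. Errors alone would never tell you that.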
Division Three -- Pitching
ERA vs. Fielding Independent Pitching (FIP): Oh, the classic debate. To measure a pitcher's run prevention, should we count all runs scored while he pitched or simply look at the factors that are independent of the defense behind him - that is, strikeouts, walks, and home runs?
This is the closest contest yet, and so far the first time that the new school stat has a chance to be taken down. However, I'm going to have to stick with DIPS theory here and choose FIP. While I do believe that pitchers often have a great deal of control over balls in play, and while I don't necessarily think that FIP should be used to measure pitcher value, I find ERA to be too dependent on defense, luck, and the official scorer's discretion.
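For reference, here is how the two numbers are computed, as a short Python sketch. The FIP coefficients (13, 3, 2) are the standard ones, but the constant is recalculated every season to put FIP on the ERA scale, so the 3.10 below is just an assumed placeholder. The pitcher's line is made up.

```python
FIP_CONSTANT = 3.10  # assumed placeholder; the real constant changes each season

def era(earned_runs, innings):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings

def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=FIP_CONSTANT):
    """Fielding Independent Pitching: only homers, walks, HBP, and strikeouts."""
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings + constant

# A made-up season for a hypothetical starter.
print(round(era(earned_runs=70, innings=200), 2))
print(round(fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=200, innings=200), 2))
```

Nothing in the FIP function knows what happened on balls in play, which is exactly the point (and exactly the complaint).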
K/BB vs. (K-BB)/PA: If you aren't aware of this distinction, stop what you're doing and go read Glenn DuPaul's fantastic article on the subject.
Now that you're back, you shouldn't need much more convincing. K/BB, though much more commonly used, is flawed because it distorts the importance of walks. As walk rate gets really low, the K/BB ratio balloons (it heads toward infinity as walks approach zero), despite the fact that the effectiveness of the pitcher has not drastically changed. With K-BB, both variables are given equal importance.
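A few lines of Python make the distortion obvious. This is only an illustration with made-up numbers: I hold strikeouts and batters faced fixed for a hypothetical pitcher and keep cutting his walks in half.

```python
def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio; undefined (infinite) with zero walks."""
    return float("inf") if walks == 0 else strikeouts / walks

def k_minus_bb_rate(strikeouts, walks, batters_faced):
    """(K - BB) / PA: strikeout rate minus walk rate."""
    return (strikeouts - walks) / batters_faced

STRIKEOUTS = 200      # hypothetical season totals
BATTERS_FACED = 800

for walks in (40, 20, 10, 5):
    ratio = k_per_bb(STRIKEOUTS, walks)
    diff = k_minus_bb_rate(STRIKEOUTS, walks, BATTERS_FACED)
    print(walks, round(ratio, 1), round(diff, 3))
```

Each halving of walks doubles K/BB, from 5 all the way to 40, while (K-BB)/PA only creeps from .200 to about .244. The second number is the one that matches how much better the pitcher actually got.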
Division Four -- Value
fWAR vs. rWAR: I'm going to admit straight from the start that I'm not at all an expert in the different versions of WAR or their pros and cons, so these decisions might not be perfectly objective. This matchup in particular features two great metrics that probably shouldn't even be facing each other in the first round. However, that's how the bracket was made, so we'll stick with it.
The basic differences between the two WARs are listed nicely at FanGraphs; the main differences are in how they measure pitching and defense. Regarding defense, I really have very little idea about whether UZR or Total Zone or any other defensive metric is superior, so I'll ignore that. For pitching, fWAR uses FIP, thereby ignoring balls in play, while rWAR uses runs allowed and adjusts for park, league and defense.
Again, I'm undecided on which version to use, but my personal preference is fWAR. In particular, I really buy into the use of wOBA for the hitting aspect, and though I understand both arguments with regards to pitching, I lean towards using FIP to measure pitcher value rather than Runs Allowed (though I generally use a combination of both).
WARP vs. VORP: This is a fun one. VORP, or Value over Replacement Player, is the original WAR. Created by Keith Woolner of Baseball Prospectus (and now of the Indians), VORP originally was just used for hitters, but can also be used for pitchers.
WARP is essentially BP's new, modified version of VORP, but with a fielding aspect applied. I won't really get into the differences between the two, mostly because I don't know them well. But, as much as I like VORP for its simplicity and importance to sabermetrics, I have to choose WARP here because it includes defense in its calculations.
That wraps it up for Round One. Let's see the updated bracket.
Division One -- Offense
wOBA vs. RE24: Don't worry, I'll make these later rounds shorter than the first. I'm sad that I have to choose between these two metrics, because they are two of my favorites in the whole bracket. However, while wOBA is one of the first numbers I go to when I look at a player, I absolutely love RE24 for its simplicity and usefulness. It not only gives us a great idea of the offensive ability of the player, but includes the context of the hits, and adjusts for opportunity. It's really one of the more underrated metrics out there, and it deserves more recognition.
Division Two -- Defense
UZR vs. Range Factor: To be honest, I don't think I've ever looked at the RF for a player. While it's a really nice idea for its simplicity and usefulness at the time of its creation, at this point there are simply better metrics to evaluate defense, including UZR. Sorry Bill, but I'm going with the favorite here.
Division Three -- Pitching
FIP vs (K-BB)/PA: This is really an interesting matchup between two minimalist statistics. While some criticize FIP for ignoring a large portion of a pitcher's performance, K-BB ignores even more, not even including home runs. And yet, it's both elegant and possibly more predictive than FIP. Especially with a small sample size, strikeouts and walks tell us a ton about a pitcher, and adding in home runs doesn't really help the model. Because of that, I'm picking another underdog and going with K-BB.
Division Four -- Value
fWAR vs. WARP: Man, I don't even know. Really, you should be looking at all versions of WAR when you want to get a sense of a player's contribution to the team. However, if I have to choose, I'm going with the version of WAR that I learned first, and the one that I look at more often: fWAR. Given the recent errors in the calculation of WARP and its difference in replacement level from other versions, I'm more comfortable using fWAR, especially for position players.
I'll wait until the championship to show the bracket again, but your final four stats are: RE24, UZR, (K-BB)/PA, and fWAR. Let's see who can make it to the final game.
RE24 vs. UZR: This matchup is almost too easy. Though UZR had an easy time with the rest of the Defense division, the fact of the matter is that defensive metrics just aren't as accurate, understandable, or interesting as offensive metrics. RE24 tells us something interesting, is simple, and accurately represents reality. UZR, though important and one of the best metrics we have, comes with a lot of noise and needs a very large sample size in order to be accurate. RE24 wins this round easily and makes it to the finals.
(K-BB)/PA vs. fWAR: Now this is interesting. On the one hand, we have a metric that essentially uses two numbers, and on the other, we have a metric that uses tons of numbers and calculations. One stat goes with simplicity and elegance in order to describe a pitcher, while the other uses the "all hands on deck" approach in order to include every possible source of value.
In the end, this matchup is a matter of subjective preference, and for me, simplicity and elegance take the cake. It was a close game, and fWAR was certainly the favorite, but its complexity and flaws in terms of defense and pitching made K-BB the more appealing stat.
Here's our final bracket:
RE24 vs. (K-BB)/PA: It's been a fun tournament, but after some blowouts and some nailbiters, it's come down to the stats from Offense and Pitching - how could it be any other way? Both RE24 and K-BB were underdogs after the easy first round, but both overcame some powerhouse stats due to their ability to elegantly describe an important aspect of baseball.
These are both metrics that fans, analysts and writers underappreciate and should be using more often. RE24 is a great resource for measuring context-dependent offensive contribution, and is a good alternative to RBI, while K-BB is a great way to get a quick, easy, and surprisingly predictive picture of a pitcher.
In the end though, I'm going to have to go with the stat that many people have not heard of, but which won over my mind and heart many months ago: RE24. It's an absolutely fantastic metric, and I encourage you to read more about it.
GRAND CHAMPION: RE24
Thanks to FanGraphs, Baseball Prospectus, Baseball-Reference, Tom Tango, Bill James, and whoever else played a part in the creation and distribution of these stats. Credit also to Spencer Schneier for the idea.