Guest Article by salb918

I'd like to thank salb918 for what I found to be a very interesting article. Here it is:

How good is OPS as a measure of a player's abilities? Good, but not great as far as dashboard metrics go. In my book, a dashboard metric is one that weights and sums all of a player's contributions into an easy-to-calculate number. By far the most common dashboard metric is OPS, which is the simple sum of OBP and SLG. But it has long been known that simply adding OBP and SLG is misleading.

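Here's the logic: OBP is measured on a scale of 0.000 to 1.000, whereas SLG is measured on a scale of 0.000 to 4.000. Adding the raw numbers ignores that difference in scale, so by this argument a point of OBP ought to count for about four times as much as a point of SLG.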

But there's more to it than that. A player can achieve a SLG of 1.000 in many ways. Take the following examples:

  • A lineup of Adam Dunn clones can knock one out of the park every fourth AB, and score only a finite number of runs.
  • A lineup of nine perfect Ichiros can slash perpetual singles and score an infinite number of runs.
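
To make the two examples concrete, here is a small Python sketch. The helper names and the 400-at-bat stat lines are made up for illustration, and both hypothetical hitters are assumed to never walk. It shows that the two lineups share the same SLG while their OBPs are far apart.

```python
def obp(hits, walks, at_bats):
    """On-base percentage, ignoring HBP and sacrifice flies for simplicity."""
    return (hits + walks) / (at_bats + walks)


def slg(singles, doubles, triples, homers, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Hypothetical "Dunn clone": a home run every fourth at-bat, nothing else.
dunn_obp = obp(hits=100, walks=0, at_bats=400)
dunn_slg = slg(singles=0, doubles=0, triples=0, homers=100, at_bats=400)

# Hypothetical "perfect Ichiro": a single in every at-bat.
ichiro_obp = obp(hits=400, walks=0, at_bats=400)
ichiro_slg = slg(singles=400, doubles=0, triples=0, homers=0, at_bats=400)

print(dunn_obp, dunn_slg)      # same SLG as the line below...
print(ichiro_obp, ichiro_slg)  # ...but a very different OBP
```
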
The second case illustrates the inherent problem with simply adding OBP and SLG: both stats count some of the same contributions! Having a high OBP is correlated with having a high SLG. What we need is a simple dashboard metric that accounts for a player's ability to get on base and hit for power without double counting and without resorting to heuristics. Here are my requirements for a dashboard metric:
  1. Simplicity. Statistics like EqA and VORP are in vogue, but they are not nearly as easy to calculate as OPS. If I need an EqA or VORP, I'm going to Baseball Prospectus, not calculating it myself.
  2. Independence. The components that go into the metric must avoid double-counting to be fair to extreme hitters like the slap-hitting David Eckstein and the all-or-nothing Russell Branyan.
  3. Minimal heuristics. That means no using RC/27, which is kind of a silly stat when you think about how it is calculated.
  4. Accurate but different. If it does not show Barry Bonds to be head and shoulders over the rest of MLB, it probably stinks. Still, if it doesn't provide any new insights, what's the point?

Here is how I met those requirements:
  1. Use only well-known rate stats. For this study, I picked two from the "get-on-base" column and two from the "beat-the-snot-out-of-the-ball" column: AVG/OBP/SLG/ISO.
  2. Use minimally-correlated rate stats. I took the AVG/OBP/SLG/ISO of all players who logged over 100 PA in 2004 and calculated the correlation coefficient, r, between each pair of the four rate stats.
    The correlation coefficient matrix is (don't panic):
          AVG   OBP   SLG   ISO
    AVG  1.00  0.80  0.66  0.27
    OBP  0.80  1.00  0.70  0.44
    SLG  0.66  0.70  1.00  0.91
    ISO  0.27  0.44  0.91  1.00

    What I was looking for was a pair of stats with r close to zero, meaning that a player who is good in one area is not necessarily good in the other; this avoids double-counting similar skills. What does it all mean? Reading the AVG row and the OBP column, the value r = 0.80 indicates that hitting for AVG and getting on base are fairly well correlated. AVG and ISO, on the other hand, are only weakly correlated (r = 0.27). Note that OPS uses two stats whose correlation is not insignificant (r = 0.70 for OBP and SLG). For this study, I decided that the best pairs of stats to try would be those with the lowest correlation: AVG/ISO and OBP/ISO.
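
For readers who want to build a matrix like this themselves, here is a minimal Python sketch of the Pearson correlation calculation. The function names and the five stat lines are invented for illustration; they are not the 2004 data, so the numbers will not match the matrix above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of r values for every pair of named stat columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up stat lines for five hypothetical players (not real data).
stats = {
    "AVG": [0.250, 0.300, 0.270, 0.240, 0.310],
    "OBP": [0.340, 0.350, 0.330, 0.360, 0.390],
    "SLG": [0.480, 0.420, 0.400, 0.520, 0.500],
}
# ISO is isolated power: SLG minus AVG.
stats["ISO"] = [s - a for s, a in zip(stats["SLG"], stats["AVG"])]

matrix = correlation_matrix(stats)
for name, row in matrix.items():
    print(name, " ".join(f"{r:5.2f}" for r in row.values()))
```
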
  3. Here is where it gets tricky. I did not want to use a statistic like RC/27 or EqA because I either do not agree with or do not understand the assumptions that went into creating them. Instead, I simulated the offensive contributions of a player using a Monte Carlo computer program written in MATLAB. The program would simulate how many runs a lineup of nine, say, Scott Hattebergs would score. The "rules" for my program were:
    • Every player bats behind and in front of himself.
    • The only contributions are walks, singles, doubles, triples, home runs, and outs. I did not include second-order effects such as sacrifice flies, sacrifice hits, and stolen bases because I am lazy and the first-order effects should be enough to get what we want.
    • Baserunning is strictly station-to-station. The exception is with two outs, when baserunners are permitted to score from first on a double, move from first to third on a single, etc.
    • There are no "clutch hits" or left/right splits.
    I know there will be some criticism of the program, but I think it captures all the main effects. The other neat thing is that it can be adjusted for park and/or era. All someone needs is the data, to which I don't have access.
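
The rules above could be sketched roughly as follows. This is a Python illustration, not the original MATLAB program, and several details are assumptions: "station-to-station" is read as runners advancing exactly as many bases as the hit is worth, the two-out exception is read as one extra base for every runner, walks advance only forced runners, and the event probabilities are made up rather than taken from a real stat line.

```python
import random

HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def draw_event(probs, rng):
    """Pick one plate-appearance outcome; leftover probability is an out."""
    u = rng.random()
    cumulative = 0.0
    for event in ("BB", "1B", "2B", "3B", "HR"):
        cumulative += probs.get(event, 0.0)
        if u < cumulative:
            return event
    return "OUT"


def walk(bases):
    """Batter takes first; only forced runners advance. Returns runs scored."""
    for i in range(3):
        if not bases[i]:
            bases[i] = True  # first open base absorbs the chain of forces
            return 0
    return 1  # bases loaded: the runner on third is forced home


def hit(bases, n_bases, bonus):
    """Advance runners n_bases (+ bonus with two outs). Returns runs scored."""
    runs = 0
    new_bases = [False, False, False]
    for position, occupied in enumerate(bases, start=1):
        if occupied:
            target = position + n_bases + bonus
            if target >= 4:
                runs += 1
            else:
                new_bases[target - 1] = True
    if n_bases == 4:
        runs += 1  # the batter scores on a home run
    else:
        new_bases[n_bases - 1] = True
    bases[:] = new_bases
    return runs


def simulate_game(probs, rng, innings=9):
    """Runs scored in one game by a lineup of nine identical hitters."""
    runs = 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]  # first, second, third
        while outs < 3:
            event = draw_event(probs, rng)
            if event == "OUT":
                outs += 1
            elif event == "BB":
                runs += walk(bases)
            else:
                runs += hit(bases, HIT_BASES[event], 1 if outs == 2 else 0)
    return runs


def average_runs(probs, games=1500, seed=0):
    """Average runs per game over many simulated games."""
    if sum(probs.values()) >= 1.0:
        raise ValueError("out probability must be positive or innings never end")
    rng = random.Random(seed)
    return sum(simulate_game(probs, rng) for _ in range(games)) / games


# Hypothetical per-plate-appearance probabilities (not a real player).
example = {"BB": 0.09, "1B": 0.16, "2B": 0.05, "3B": 0.005, "HR": 0.03}
print(round(average_runs(example), 2))
```

The guard in `average_runs` is the code version of the "nine perfect Ichiros" case: with no chance of an out, an inning never ends.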

    I ran the simulation for thirty random players from 2004. Because each game is inherently random, each player "played" 1500 games and the results were averaged. The average runs scored was used as an offensive figure of merit, and I fit it to a three-parameter linear regression, once for each pair of variables:
    Runs Scored = C1 (AVG) + C2 (ISO) + C3
    Runs Scored = C1 (OBP) + C2 (ISO) + C3
    The C's in the formulas are coefficients that are determined by linear regression.
    The ratio of C1 to C2 tells us the relative importance of the two variables in predicting the runs scored.
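
A three-parameter fit of this form can be sketched in plain Python by solving the least-squares normal equations. The function names are invented here, and the data below is synthetic: the "runs" values are generated from chosen coefficients (20, 10, -3) purely to check that the fit recovers them. They are not the simulation results from this article.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_two_variables(x1, x2, y):
    """Least-squares C1, C2, C3 for y = C1*x1 + C2*x2 + C3."""
    rows = [(a, b, 1.0) for a, b in zip(x1, x2)]
    # Normal equations: (A^T A) c = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(ata, aty)


# Synthetic inputs for six hypothetical hitters (not real data).
obp = [0.300, 0.320, 0.350, 0.380, 0.400, 0.330]
iso = [0.100, 0.220, 0.150, 0.250, 0.120, 0.180]
runs = [20 * o + 10 * i - 3 for o, i in zip(obp, iso)]

c1, c2, c3 = fit_two_variables(obp, iso, runs)
print(c1, c2, c3, c1 / c2)  # the ratio C1/C2 is the relative weight
```
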

    How good was the formula in predicting the Runs Scored? Pretty picture time:

    The plots you see above show the relationship between the simulated Runs Scored (on the horizontal axis) and the Runs Scored predicted by the linear regression (on the vertical axis). If my regression equations were perfect, every data point would line up on the 45-degree line, which would correspond to r² = 1. Based on the r² values, the OBP/ISO variable set is a better choice than the AVG/ISO variable set (we already knew that, though, didn't we?). For OBP/ISO, the linear regression gives a ratio C1/C2 = 2.49, and the dashboard metric (mOPS, or Raw modified OPS) is:
    mOPS = 2.49 (OBP) + (ISO)
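
As a quick worked example of the formula, here is a Python sketch using two hypothetical stat lines (made up for illustration) that have identical OPS but quite different mOPS. The function names are assumptions; the 2.49 weight is the ratio reported above.

```python
def ops(obp, slg):
    """Traditional OPS: on-base percentage plus slugging percentage."""
    return obp + slg


def mops(obp, avg, slg, weight=2.49):
    """mOPS = weight * OBP + ISO, where ISO = SLG - AVG."""
    return weight * obp + (slg - avg)


# Two hypothetical hitters, not real players.
patient = {"avg": 0.290, "obp": 0.400, "slg": 0.450}
hacker = {"avg": 0.280, "obp": 0.320, "slg": 0.530}

ops_patient = ops(patient["obp"], patient["slg"])
ops_hacker = ops(hacker["obp"], hacker["slg"])
mops_patient = mops(**patient)
mops_hacker = mops(**hacker)

print(f"OPS:  {ops_patient:.3f} vs {ops_hacker:.3f}")
print(f"mOPS: {mops_patient:.3f} vs {mops_hacker:.3f}")
```
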
  4. Does mOPS tell us anything interesting? The best way to evaluate its novelty is to see which players are overrated and underrated by OPS.

    In general, OPS tends to overrate hackers with moderate power. Among the most overrated players with more than 500 PA in 2004: Pedro Feliz, Jack Wilson, Juan Uribe, AJ Pierzynski, Carl Crawford. Intuitively, I always thought these guys were overrated, even by the pseudo-sabermetric OPS. Not so with mOPS, which gives more credit to patient hitters, even those who don't have much power. Among the most underrated players with more than 500 PA in 2004: Nick Johnson, Kenny Lofton, Ryan Freel, Luis Castillo.

    Other interesting notes:
    • Adrian Beltre's monster 2004 season in Los Angeles was good for the fifth-best OPS in the majors, but his mOPS was only 17th, behind Adam Dunn and JT Snow. It appears his lack of discipline is catching up with him this year.
    • I've always thought that Bobby Abreu is wholly unappreciated, and so does mOPS. His OPS was 17th in the majors, but mOPS ranked him in the top ten, slightly above Manny Ramirez.
    • Did the A's hang on to the right player? Miguel Tejada and Eric Chavez posted nearly identical OPSs (.894 to .898). But Chavez did a fantastic job of getting on base and posted an mOPS of 1.22 to Tejada's 1.12. I still love me some Tejada, though.
    • Had equal seasons: Hee Seop Choi and Derrek Lee (1.12).
    • Finally, if you even believe it's possible, OPS managed to overrate His Neifi-ness.
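
One possible way to find players that OPS overrates or underrates is to compare each player's rank under the two metrics. The article does not say exactly how its lists were produced, so the rank-difference approach below is an assumption, and the three players are hypothetical.

```python
def rank_by(players, key):
    """Map player name to rank (1 = best) under the given scoring function."""
    ordered = sorted(players, key=key, reverse=True)
    return {p["name"]: i + 1 for i, p in enumerate(ordered)}


def rank_shift(players, weight=2.49):
    """OPS rank minus mOPS rank; positive means OPS underrates the player."""
    ops_rank = rank_by(players, lambda p: p["obp"] + p["slg"])
    mops_rank = rank_by(
        players, lambda p: weight * p["obp"] + (p["slg"] - p["avg"])
    )
    return {name: ops_rank[name] - mops_rank[name] for name in ops_rank}


# Hypothetical stat lines, invented for illustration.
players = [
    {"name": "Patient", "avg": 0.270, "obp": 0.390, "slg": 0.420},
    {"name": "Hacker", "avg": 0.290, "obp": 0.310, "slg": 0.510},
    {"name": "Slugger", "avg": 0.280, "obp": 0.370, "slg": 0.560},
]

shift = rank_shift(players)
for name, delta in sorted(shift.items(), key=lambda kv: kv[1]):
    print(f"{name}: {delta:+d}")
```
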

mOPS isn't perfect, but improving it should be relatively easy (if time-consuming). For example, we can adjust for park factors at the level of individual offensive components (e.g., a home-run-suppressing park is often friendly for triples). This can be incorporated into the simulation and into the calculation of the "magical" 2.5 ratio in the mOPS equation. Or, more simply, one can use park-adjusted OBP and ISO to calculate mOPS. The simulation can also be extended to incorporate sacrifice hits, hit by pitch (Craig Biggio, this is your life!), stolen bases, etc. These second-order effects can be easily incorporated into the framework outlined here. The main open issue is that my offensive figure of merit is the run-scoring in the simulator. Unfortunately, I don't have an answer if you think that the program is a poor way to measure offensive performance.

Thanks go to Genaro of who tried to improve my program. Although it didn't quite work out, his efforts are greatly appreciated.