Is a superior UZR compensated in any way?
I spent some time this past year matching up FanGraphs' defensive data with the salary data in the Lahman database. My idea was to test the commonly asserted hypothesis that defense was the "New Moneyball".
I started by doing some pretty basic investigations of UZR. Most of what I found seemed to be similar to what Jeff Zimmerman found in these pages a few months ago. In my data, UZR correlates pretty well with runs prevented, with a coefficient of close to 1 (0.91), which matches what you would expect from theory nicely. It also declines with age, like an athletic ability should:
So, finally, I loaded UZR into a salary regression. I took salaries and performance statistics from 2002-2008. The salaries were adjusted to reflect 2008 dollars. Then I used a player's position (using corner outfielders as the control group), offensive stats (Extra Bases, Outs, Reached Bases) and tenure in the major leagues to try to predict a player's salary in the following year.
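For anyone who wants to try something similar, here is a rough sketch of how the rows for a regression like this could be put together: year N stats paired with year N+1 salary, and one 0/1 dummy per position with corner outfielders left out as the baseline. The dictionary keys, position codes, and sample numbers are all made up for illustration; this is not the actual code or data behind the results.

```python
# Hypothetical position codes; corner outfield ("COF") is the omitted baseline.
POSITIONS = ["1B", "2B", "3B", "SS", "CF"]


def build_rows(seasons, salaries):
    """Pair each player's year-N stats with his year-N+1 salary.

    seasons:  dict mapping (player, year) -> dict of stats for that season
    salaries: dict mapping (player, year) -> salary in constant dollars
    Returns a list of (features, next_year_salary) tuples.
    """
    rows = []
    for (player, year), stats in sorted(seasons.items()):
        next_salary = salaries.get((player, year + 1))
        if next_salary is None:
            continue  # no following-year salary, so nothing to predict
        # One dummy per non-baseline position; corner outfielders get all zeros.
        dummies = [1.0 if stats["pos"] == p else 0.0 for p in POSITIONS]
        features = [1.0] + dummies + [  # the leading 1.0 is the intercept
            stats["extra_bases"],
            stats["outs"],
            stats["reached_base"],
            stats["tenure"],
            stats["uzr"],
        ]
        rows.append((features, next_salary))
    return rows


# Made-up numbers, purely to show the shape of the data.
seasons = {
    ("A", 2006): {"pos": "SS", "extra_bases": 90, "outs": 400,
                  "reached_base": 210, "tenure": 4, "uzr": 6.5},
    ("A", 2007): {"pos": "SS", "extra_bases": 85, "outs": 410,
                  "reached_base": 200, "tenure": 5, "uzr": -1.0},
    ("B", 2007): {"pos": "COF", "extra_bases": 120, "outs": 390,
                  "reached_base": 230, "tenure": 7, "uzr": 2.0},
}
salaries = {("A", 2007): 3_000_000, ("B", 2008): 5_500_000}

rows = build_rows(seasons, salaries)
```

The feature lists can then be handed to whatever least-squares routine you prefer. The coefficient on the last column is the one that says whether UZR moves salary once position, offense and tenure are accounted for.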
What I found was that Dave Cameron was right. UZR adds nothing to a player's expected salary. Teams pay players differently based on what position they play, but not based on the ability to play it better than any other player, at least as measured by UZR.
Regression Results
If this is correct and superior defense is not compensated, then smart teams can essentially improve for free by holding their offense constant and adding defense. That may be harder than it sounds, though.
This may well be changing in this very offseason. The Mariners are often pointed to as a team that's taking defensive metrics seriously, and maybe enough teams will start to pay for defense to the point where the price will rise.
So what do you think? Are there any problems with this approach that undermine my result? Any questions left unanswered?
Comments
I did a very similar study for my stats class a few weeks ago
I Marcelled each free agent since 1998, and broke down their projected numbers into 5 categories, of which defense was one. I measured defense as (Total Zone + DP rating + ARM rating), with the data taken from Rally’s database. I found that the coefficients were never significant and only in the past couple of years were they even significantly above zero.
BTW, ISOd and PA were steadily rising in terms of the size of the coefficients on salary.
That's pretty cool.
What position adjustments does your research show? I.e., how much more/less would a player earn at different positions?
Are you including all players? (That is, not just free agents?) If so, that’ll mess with the tenure piece. If you could include pre-arb/arb/and post-arb categories, that’d be big.
Beyond the Boxscore Not a member? Sign up.
This is the issue I wrestled with the most, and where I’m least certain of the right way to do it. I ended up running T+1 regressions, T regressions and aggregated across-year regressions (where I summed up the total salary and all the independent variables, and used those as my dataset). The aggregated across-year model was the best one, but I’m not sure how to really interpret the coefficients. (And no, UZR was not significant in that, either.)
With regard to new contracts, the short answer is that I didn’t have contract status in my dataset. I had salary and performance stats, but that’s it. I figured by running a T+1 regression, I would at least sometimes be capturing that “new contract based on last year’s performance” aspect, while the rest of the time I was capturing the “this is this guy’s skill and why he gets paid” aspect. Running the same model on same-year salaries resulted in diminished power of the whole model, but no real change in any of the coefficients or their significance.
FU, FO
and positional adjustments
The positional adjustments that I came up with in the UZRless model were (using corner outfielders as the base):
1B -1,262,200
2B -1,034,000
3B -344,700
SS +297,700
CF -417,300
a little wacky, no? I’m thinking maybe I need to adjust for baserunning or use an additional offensive category.
FU, FO
Very cool.
It's nice to see the research that supports the thought that defense is a market inefficiency.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Are you using Year N stats to predict Year N+1 salary?
Might using Year N-2 through N data be more meaningful, like an actual projection?
And are you using every player season, or just seasons when a new contract is signed? It doesn’t matter how a player does if he’s not getting a new contract…
I've seen studies like this before.
This is good work.
I don’t mean to undermine you or anything, because this is good stuff, but there are some pretty serious experimental design flaws with any study that attempts to use regression models to predict salary, to the point I’m not ready to make a call either way.
Problem one involves building a model that mirrors reality WRT the pay structure. It’s difficult to do, because arbitration salaries are presumably awarded based on precedent (which is dumb, but that’s another topic for another day). When you start throwing variables in there to compensate for the irregularity of the pay structure that have nothing to do with production, the model breaks down.
The second problem involves the sheer noise of UZR data. UZR numbers, especially in 1-year samples, fluctuate a great deal more than a fielder’s fundamental abilities, so that’s not an ideal starting point.
The third problem involves using UZR as your independent variable. Because UZR is a very noisy independent variable, you’re subject to a pretty good amount of regression dilution.
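To make the regression dilution point concrete, here is a small simulation sketch. Under the textbook assumption of independent measurement noise, the fitted slope shrinks toward zero by roughly var(true skill) / (var(true skill) + var(noise)). Every number below (the spread of true skill, the size of the noise, the true slope) is an assumption picked for illustration, not an estimate of how noisy UZR actually is.

```python
import random


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
true_slope = 1.0   # assumed payoff per unit of true skill
skill_sd = 5.0     # assumed spread of true fielding skill
noise_sd = 10.0    # assumed one-year measurement noise in the metric

skill = [rng.gauss(0, skill_sd) for _ in range(n)]
outcome = [true_slope * s + rng.gauss(0, 1.0) for s in skill]
observed = [s + rng.gauss(0, noise_sd) for s in skill]  # noisy version of skill

clean = ols_slope(skill, outcome)       # fit on true skill
diluted = ols_slope(observed, outcome)  # fit on the noisy measurement
expected_factor = skill_sd**2 / (skill_sd**2 + noise_sd**2)  # 25 / 125 = 0.2

print(f"slope on true skill:  {clean:.2f}")
print(f"slope on noisy proxy: {diluted:.2f} (theory: about {expected_factor:.2f})")
```

With noise this large relative to the real spread in skill, a true effect can easily look like nothing, which is why a null result on one-year UZR is weaker evidence than it first appears.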
Like I said, good work, but I’m not ready to say, “defense is essentially free”. Not yet at least.
http://www.capitolavenueclub.com/
I agree with almost everything here. It would probably be premature to say for certain that UZR is uncompensated, but I do think I’ve got a piece of evidence here.
I’m not sure I could ever really faithfully model the arbitration process, and I’m actually suspicious that those “comps” are a good part of the reason why my salary positional adjustment for second basemen is so cheap.
Lastly, I think that several commenters here are correct that using one year of UZR as an independent variable is too noisy. I’ll try using some three-year averages or a projected UZR and see what happens.
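A three-year average could be as simple as the sketch below. It is unweighted and just skips missing seasons; weighting by innings or by recency might well be better, and the function name and sample values are made up for illustration.

```python
def trailing_average(uzr_by_year, year, window=3):
    """Mean of the available UZR values in the `window` seasons ending at `year`.

    Returns None when the player has no data in that window.
    """
    values = [
        uzr_by_year[y]
        for y in range(year - window + 1, year + 1)
        if y in uzr_by_year
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Made-up single-season UZR values for one hypothetical player.
uzr = {2005: 4.0, 2006: -2.0, 2007: 7.0, 2008: 1.0}

avg_2007 = trailing_average(uzr, 2007)  # uses 2005, 2006 and 2007
avg_2008 = trailing_average(uzr, 2008)  # uses 2006, 2007 and 2008
avg_2005 = trailing_average(uzr, 2005)  # only one season available
```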
FU, FO
Another issue I see...
UZR does not measure pure “talent level”. It measures talent against the average at that position.
It basically tells you whether Player A got more outs in zones 1-20 than Players B through Z. While that is handy for evaluating prior years, as a prospective tool I do not think it really holds up well.
Suppose, for example, you’re considered a mid-range shortstop.
For the first year that you’re playing, you are playing against Ozzie Smith, Barry Larkin, and Alan Trammell (just trying to pick 3 good defensive shortstops) and you post a UZR of 0.
In the second year, those three guys retire and are replaced by Derek Jeter, Michael Young, and Alex Gonzalez (3 bad defensive shortstops). Now you post a UZR of 8.
The UZR metric is not simply variable based upon how well a player does in the field — it is also variable based upon how well the other players you’re up against are. It measures against the average, not simple talent level — as such, I don’t think it necessarily works in such a way to determine whether fielding is as large a market inefficiency as this suggests (no correlation)
Another example… one season the same player has a wOBA+ of 100, the following season his wOBA+ is 105. Now, suppose those wOBAs were .340 and .339. For salary purposes, you would look at RAR as his projected wOBA vs the league projected wOBA — and then calculate wOBA+. UZR is backwards in that sense since you have the normalized form of the statistic, and you’re trying to project forward with that rather than projecting the underlying skill level forward and comparing it against the projected average.
Until we can do that, I do not think you’ll necessarily be able to find a significant correlation between UZR and salary because it is a hindsight metric as opposed to a foresight metric.
UZR isn't baselined every year, fyi. It's based on an average of many seasons.
by Sky Kalkman on Feb 17, 2010 11:56 AM EST
Thanks for the clarification
The way I had read it was that, depending on source, it could be a small subset of seasons (1-3).
That certainly makes more sense now.
And I'm not exactly sure which seasons are used, and by which implementation.
What I think would be cool is to use a ten year baseline (arbitrary, but I’m open to justification of any other length/weighting) to zero out UZR for each position for each season. But also use those ten years to compute the positional adjustments. And maybe smooth it.
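Something like the following is what I have in mind for the zeroing-out step, as a rough sketch only. It computes one runs-per-inning baseline per position over a chosen set of seasons and subtracts each player's share of it. The record layout and the innings weighting are my own assumptions, it does not do any smoothing, and it is not how any published UZR implementation actually sets its baseline.

```python
from collections import defaultdict


def rebaseline(records, baseline_years):
    """Re-zero UZR against a multi-season average at each position.

    records: list of dicts with keys "pos", "year", "innings", "uzr"
    baseline_years: set of seasons that define the positional average
    Returns copies of the records with an added "uzr_adj" value.
    """
    runs = defaultdict(float)
    innings = defaultdict(float)
    for r in records:
        if r["year"] in baseline_years:
            runs[r["pos"]] += r["uzr"]
            innings[r["pos"]] += r["innings"]

    adjusted = []
    for r in records:
        pos = r["pos"]
        # Baseline runs per inning at this position (0 if no baseline data).
        rate = runs[pos] / innings[pos] if innings[pos] else 0.0
        adjusted.append({**r, "uzr_adj": r["uzr"] - rate * r["innings"]})
    return adjusted


# Made-up player seasons; a real run would use ten years of data.
records = [
    {"pos": "SS", "year": 2007, "innings": 1000.0, "uzr": 6.0},
    {"pos": "SS", "year": 2008, "innings": 1000.0, "uzr": -2.0},
    {"pos": "SS", "year": 2009, "innings": 500.0, "uzr": 3.0},
]
adjusted = rebaseline(records, baseline_years={2007, 2008})
```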
by Sky Kalkman on Feb 18, 2010 12:09 PM EST
