Quantifying the Impact of Defensive Uncertainty
Recently in the sabermetric community there has been a lot of discussion about fielding stats and their inclusion in WAR (see, for example, this thread, or this one at The Book blog), given the uncertainty behind the underlying data (batted-ball type, hit location, etc.). With that in mind, I thought it would be an interesting exercise to see how applying uncertainty to the defensive runs above average (DRAA) numbers affects the 2009 fWAR leaderboard. My method for applying the uncertainty is pretty simple: I ran a Monte Carlo simulation that draws each player's DRAA from a normal distribution with a mean equal to the DRAA reported by FanGraphs and a standard deviation of 5 runs. The following table shows how often each of the top 10 players in fWAR landed in each of the top 10 slots over 10,000 simulations. A row can sum to less than 100% because a player can drop out of the top 10 in a given simulation; percentages are rounded.
| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Albert Pujols | 62% | 23% | 9% | 4% | 1% | 0% | 0% | 0% | 0% | 0% |
| Ben Zobrist | 22% | 36% | 22% | 11% | 5% | 2% | 1% | 1% | 0% | 0% |
| Joe Mauer | 12% | 24% | 29% | 17% | 9% | 5% | 2% | 1% | 0% | 0% |
| Chase Utley | 3% | 8% | 16% | 23% | 19% | 13% | 9% | 5% | 3% | 1% |
| Derek Jeter | 1% | 4% | 10% | 16% | 18% | 17% | 13% | 9% | 5% | 3% |
| Hanley Ramirez | 0% | 2% | 7% | 12% | 16% | 18% | 16% | 11% | 8% | 5% |
| Evan Longoria | 0% | 2% | 4% | 10% | 14% | 17% | 17% | 14% | 9% | 6% |
| Prince Fielder | 0% | 0% | 2% | 5% | 8% | 12% | 15% | 16% | 15% | 10% |
| Ryan Zimmerman | 0% | 0% | 1% | 2% | 4% | 7% | 11% | 15% | 16% | 15% |
| Adrian Gonzalez | 0% | 0% | 0% | 1% | 3% | 6% | 8% | 13% | 16% | 15% |
So if you buy my 5-run SD assumption, the table above shows the impact on ordinal ranking. The impact on players' overall WAR totals (and thus $/WAR) isn't captured by this analysis.
This is just a quick look at the subject, but there may be more to uncover, such as substituting other fielding metrics for UZR. Either way, it answered one of my questions: "What orders of magnitude are we talking about?"
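The simulation described above can be sketched in Python. This is a minimal sketch, not the code behind the tables: it assumes the leaderboard is supplied as a mapping from player name to (fWAR, DRAA), that the whole pool is re-ranked each time, and that runs convert to wins at a flat 10 runs per win (`RUNS_PER_WIN` is an assumption; FanGraphs' actual conversion varies by season). `rank_frequencies` is a hypothetical helper name, and the example inputs are made up.

```python
import random

# ASSUMPTION: a flat 10 runs per win; FanGraphs' actual runs-to-wins factor varies by season.
RUNS_PER_WIN = 10.0


def rank_frequencies(players, sd_runs=5.0, n_sims=10_000, top_n=10, seed=None):
    """Monte Carlo sketch of DRAA uncertainty.

    players: dict mapping name -> (fwar, draa) as reported on the leaderboard.
    Each simulation draws a DRAA from Normal(reported DRAA, sd_runs), swaps it
    into the player's WAR, re-ranks the whole pool, and records the top_n slots.
    Returns name -> list of top_n fractions (how often the player landed in each slot).
    """
    rng = random.Random(seed)
    counts = {name: [0] * top_n for name in players}
    for _ in range(n_sims):
        simulated = []
        for name, (fwar, draa) in players.items():
            sim_draa = rng.gauss(draa, sd_runs)
            # Replace the reported defensive runs with the simulated ones, in wins.
            sim_war = fwar + (sim_draa - draa) / RUNS_PER_WIN
            simulated.append((sim_war, name))
        simulated.sort(reverse=True)  # highest simulated WAR first
        for slot, (_, name) in enumerate(simulated[:top_n]):
            counts[name][slot] += 1
    return {name: [c / n_sims for c in row] for name, row in counts.items()}


# Illustrative inputs only (made-up values, not the 2009 leaderboard).
example = {
    "Player A": (8.0, 5.0),
    "Player B": (7.5, 15.0),
    "Player C": (7.0, -5.0),
    "Player D": (6.0, 0.0),
}
for name, row in rank_frequencies(example, sd_runs=5.0, top_n=3, seed=1).items():
    print(name, " ".join(f"{p:.0%}" for p in row))
```

Because the noise is centered on each player's reported DRAA, the DRAA value cancels out of `sim_war`; only `sd_runs` and the fWAR gaps drive the reshuffling, which is the point vivaelpujols raises in the comments. Passing `sd_runs=10.0` corresponds to the update below.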
Update: Here's the same table with an SD of 10 runs:
| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Albert Pujols | 38% | 21% | 14% | 9% | 6% | 4% | 3% | 2% | 1% | 1% |
| Ben Zobrist | 22% | 20% | 16% | 11% | 9% | 6% | 4% | 3% | 2% | 2% |
| Joe Mauer | 15% | 17% | 15% | 12% | 10% | 8% | 6% | 4% | 3% | 2% |
| Chase Utley | 8% | 11% | 12% | 11% | 10% | 9% | 8% | 6% | 5% | 4% |
| Derek Jeter | 5% | 8% | 10% | 10% | 10% | 9% | 8% | 7% | 6% | 5% |
| Hanley Ramirez | 4% | 7% | 8% | 9% | 10% | 9% | 8% | 7% | 6% | 5% |
| Evan Longoria | 3% | 6% | 7% | 9% | 9% | 9% | 8% | 7% | 7% | 5% |
| Prince Fielder | 2% | 4% | 5% | 7% | 7% | 8% | 8% | 7% | 7% | 6% |
| Ryan Zimmerman | 1% | 2% | 4% | 5% | 6% | 7% | 7% | 7% | 7% | 6% |
| Adrian Gonzalez | 1% | 2% | 3% | 5% | 5% | 6% | 6% | 7% | 6% | 6% |
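A back-of-the-envelope way to see why doubling the SD flattens the table so much: if two players' DRAA errors are independent normal draws with SD σ runs, and runs convert to wins at r runs per win, the difference of the two errors has SD σ√2 runs, so the chance that a player trailing by ΔW wins passes the leader is Φ(-r·ΔW / (σ√2)). The sketch below assumes r = 10 and uses a hypothetical 1-win gap rather than any actual 2009 gap; `swap_probability` is an illustrative name.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def swap_probability(war_gap, sd_runs, runs_per_win=10.0):
    """Chance the trailing player passes the leader when each player's DRAA
    gets independent Normal(0, sd_runs) noise. The difference of the two
    errors has SD sd_runs * sqrt(2) runs, i.e. sd_runs * sqrt(2) / runs_per_win wins.
    runs_per_win=10 is an ASSUMPTION."""
    sd_diff_wins = sd_runs * math.sqrt(2.0) / runs_per_win
    return normal_cdf(-war_gap / sd_diff_wins)


for sd in (5.0, 10.0):
    # Hypothetical 1-win gap between two players.
    print(f"SD {sd:.0f} runs: {swap_probability(1.0, sd):.1%}")
```

Under these assumptions a 1-win lead is overturned about 7.9% of the time with a 5-run SD and about 24.0% of the time with a 10-run SD, and every adjacent pair on the leaderboard gets that same extra room to swap.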
Comments
Neat!
So if we think we’re within 5 runs of reality with UZR, then fielding uncertainty will bump a guy a spot or so up or down at the top of the leaderboard. I hope we’re within 5 runs!
I write at:
Beyond the Boxscore | Red Reporter | Basement-Dwellers.com | Twitter: @jinazreds
Uncertainty Varies
I think the uncertainty about Pujols’s defense would be much less than, say, Zobrist’s. Pujols only plays one position, gets lots of chances, and it’s relatively easy to determine whether a chance was in zone or not.
don't disagree
I toyed with altering SDs based on various factors, but couldn’t reconcile in my mind exactly what I wanted to do.
by stevesommer05 on Jul 22, 2010 8:55 PM EDT
Cool idea
A few thoughts:
- Colin already pointed out on Twitter that a 5 run SD is probably too small.
- Is the distribution normal? I wouldn’t be shocked if it were, but then I wouldn’t be shocked if it weren’t either. Either way, it’s not something I’ve ever looked at.
Concur on point one. I was WAY guessing. Laziness at its finest. Hopefully I’ll have a table with an SD of 10 here in a few minutes.
On the second, I’m not sure we know how the systematic biases would throw the “answer” off, but I’d guess normal until I had some evidence to the contrary.
by stevesommer05 on Jul 22, 2010 9:05 PM EDT
You should mention that these are 2009 numbers, BTW
And really, if you’re assuming UZR error is normally distributed, that won’t really change anything, because the distribution is still centered on each player’s mean UZR grade. Why not try playing around with different kinds of distributions (they would have to be skewed toward league average), or use the players’ regressed UZR as the mean?
I have that it’s the 2009 fWAR leaderboard in the first paragraph, but yeah, in general it’s probably not called out especially well.
I guess it didn’t matter to me that it wasn’t going to change anything (I assume you mean drastically change the order?). In fact, I kinda like that using UZR as the center “maintains” the general order, because then any shifting around is due to the uncertainty in the metric rather than to a switch to a different metric. That said, when I mentioned using other metrics instead of UZR, I was thinking of substituting my projections (so a regressed version) for it.
by stevesommer05 on Jul 23, 2010 8:08 AM EDT
Oh crap, sorry, I guess I overlooked that sentence
I guess I just think people’s problem with UZR is not that it has a lot of error, but that the error is going to be more pronounced for players with extreme UZR scores and biased toward the ends of the curve. I mean, let’s say you were to repeat this analysis with every fielder assumed to be zero runs above average. The types of deviations and percentages would be the same as they are now.
I think you’d have to do a distribution within a distribution to get the right effect here: calculate the spread around each player’s UZR score in standard deviations, then use those in the context of the league-wide spread, also in standard deviations.
by vivaelpujols on Jul 24, 2010 4:26 AM EDT
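The two refinements raised in this thread, letting the SD vary by player (less uncertainty for a one-position regular with lots of chances) and centering the draws on a regressed DRAA rather than the raw figure, could be bolted onto the simulation roughly as follows. This is a hypothetical sketch: the `chances` input, the shrinkage constant `k`, the square-root scaling of the SD, and the 10 runs-per-win conversion are all illustrative assumptions, not UZR's actual reliability model. Simple shrinkage toward league average is one stand-in for the "distribution within a distribution" idea; whether it matches what the commenter had in mind is an assumption.

```python
import math
import random

RUNS_PER_WIN = 10.0  # ASSUMPTION: flat runs-to-wins conversion


def regressed_draa(draa, chances, k=400.0):
    """HYPOTHETICAL shrinkage toward league average (0 runs):
    weight = chances / (chances + k); k is an illustrative constant."""
    return draa * chances / (chances + k)


def player_sd(chances, base_sd=10.0, ref_chances=400.0):
    """HYPOTHETICAL per-player SD: base_sd at ref_chances, shrinking
    with the square root of chances, as sampling error would."""
    return base_sd * math.sqrt(ref_chances / chances)


def rank_frequencies_regressed(players, n_sims=10_000, top_n=10, seed=None):
    """players: name -> (fwar, draa, chances).
    Removes the reported DRAA from fWAR, then adds back a draw from
    Normal(regressed DRAA, player-specific SD), re-ranks, and tallies slots."""
    rng = random.Random(seed)
    counts = {name: [0] * top_n for name in players}
    for _ in range(n_sims):
        simulated = []
        for name, (fwar, draa, chances) in players.items():
            sim_draa = rng.gauss(regressed_draa(draa, chances), player_sd(chances))
            sim_war = fwar + (sim_draa - draa) / RUNS_PER_WIN
            simulated.append((sim_war, name))
        simulated.sort(reverse=True)  # highest simulated WAR first
        for slot, (_, name) in enumerate(simulated[:top_n]):
            counts[name][slot] += 1
    return {name: [c / n_sims for c in row] for name, row in counts.items()}
```

Unlike the plain version, the reported DRAA no longer cancels out here: a player with an extreme DRAA on few chances is pulled toward average and given a wider spread, so the reshuffling is no longer identical for every fielder.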