Ricky Nolasco: 4 WAR or 1 WAR?
This is a question I asked on my own blog, but I thought it'd get decent play here, and it addresses an interesting question about pitcher WAR. Also, I have author privilege here, but I haven't written a damn useful thing on BtB for some time now, so I'd like to contribute.
Now as most readers here know, there are two great places to find WAR for any player. One of them is FanGraphs, the other is Rally's historical WAR database. One of them is fast and constantly updated, the other is comprehensive throughout baseball history (hence the "historical" part). The two databases measure position players in essentially the same fashion. Some of the inputs and rates are different, but you aren't generally going to see a stark contrast in any one player; most differences are going to be due to different inputs (UZR for FanGraphs vs. TotalZone for Rally's database, for example).
Pitcher WAR is also calculated in a similar fashion, but pitcher runs are determined very differently. FanGraphs uses FIP, a defense-independent component statistic that everyone here knows about and that needs no further explanation. Rally's database uses a pitcher's actual runs allowed and takes a prorated value for defensive runs based on the balls in play the pitcher allows. Now, both versions in general reach similar conclusions, as most pitchers face around average luck and timing and the two values end up similar. A difference of 1 WAR would not preclude me from using one or the other; it simply becomes a matter of taste/preference for the method.
Then comes the interesting case of Ricky Nolasco's 2009 season.
Here's the relevant information:
2009 Nolasco FIP/0.92: 3.64
2009 Nolasco tRA (StatCorner): 3.94
2009 Nolasco RA: 5.40
Here's how I did the calculations and what I got as a result. From Marlin Maniac:
I did both calculations using park factors provided by Patriot. For defense-independent statistics WAR, I averaged tRA from StatCorner and FanGraphs, then averaged that value with FIP/0.92 and stuck it into Pythagenpat. If you checked out my MVP article, you saw a list of WAR for pitchers calculated using that; that list contains all pitchers with more than 4.0 WAR. Using that method of evaluation, I had Ricky at 3.8 WAR for the season, a very good total. I then calculated WAR using Rally's method, using team bUZR from FanGraphs as my defensive metric. Using that method, I got Ricky totaling 0.8 WAR on the season.
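For readers who want to follow the component-stat half of that calculation, here is a minimal sketch of the blending and Pythagenpat steps. It is not the exact spreadsheet used above: the function names are made up for illustration, the 0.287 exponent constant is one commonly cited Pythagenpat value rather than a figure from this post, and the league run environment is something you would have to supply yourself. Park factors and the replacement-level step that turns a winning percentage into WAR are left out.

```python
def blended_component_ra(tra_statcorner, tra_fangraphs, fip_scaled):
    """Average the two tRA figures, then average that with FIP/0.92."""
    tra_avg = (tra_statcorner + tra_fangraphs) / 2
    return (tra_avg + fip_scaled) / 2


def pythagenpat_win_pct(runs_for_per_game, runs_against_per_game,
                        exponent_constant=0.287):
    """Pythagenpat winning percentage.

    The exponent floats with the run environment:
    x = (total runs per game) ** exponent_constant.
    exponent_constant=0.287 is an assumed, commonly cited value.
    """
    total_rpg = runs_for_per_game + runs_against_per_game
    x = total_rpg ** exponent_constant
    rs_x = runs_for_per_game ** x
    ra_x = runs_against_per_game ** x
    return rs_x / (rs_x + ra_x)
```

To use it for a pitcher, you would pass a league-average runs-per-game figure as `runs_for_per_game` and the pitcher's blended RA as `runs_against_per_game`, then compare the result to a replacement-level winning percentage over his innings.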
The difference between the two is a staggering 3 WAR. If you'd prefer, just use the FanGraphs total of 4.2 WAR for the component statistic instead; which one you pick is not particularly relevant. The key here is that the difference is huge, and it brings up my question:
Strictly in terms of production, was Ricky Nolasco a 4 WAR pitcher or a 1 WAR pitcher this season?
Presumably, both measures are defense-independent, though they are calculated in different ways. Rally gives credit/debit to the pitcher for his context/timing, while the linear weights models are context independent. If all we wanted to know was production, which one would be a better option? Based on how we treat hitters, my initial presumption would be to lean towards the context-neutral method, but pitchers should have a lot more control over their environment than hitters do. Should a different method be used that's somewhere in between either of these options?
This is not to be taken as an indictment of either methodology; I'm just interested in everyone's opinion on the topic. Vote and discuss accordingly.
Comments
This was my struggle
I spent a significant amount of time last winter trying to decide what the pitcher WAR formula for FanGraphs should look like. Eventually, I just came to the conclusion that there was no best way – there were just a bunch of different options with pros and cons.
I went with FIP for the reason you stated – it’s context neutral and measures pitchers and hitters the same way, without regard to base/out context. I felt like having WAR for hitters that was context neutral and WAR for pitchers that was partially context dependent was not a road that FanGraphs should go down.
Eventually, we’ll have something like PZR, where we use a play-by-play metric to actually value the defensive contribution behind each pitcher and we can do better than any of the methods currently in use. But for now, it’s a preference thing, really.
by davidcameron on Oct 28, 2009 2:19 PM EDT reply actions 0 recs
You're probably right, Dave
and I’m happy to have any of these evaluations available. But when discrepancies like these come about, it makes me wonder whether there is a better answer. I’d agree that if anything, that answer is PZR.
Marlin Maniac, a Florida Marlins blog
Come attend Intro to Sabermetrics 101!
Check me out at Beyond the Box Score as well.
by SFiercex4 on Oct 28, 2009 3:02 PM EDT reply actions 0 recs
I say 4 WAR a la Fangraphs
For the simple reason that it is likely to be more predictive.
If I had to guess how Nolasco would do if you gave him identical priors and ran the season again, his expected value would be much closer to 4 WAR than 1 WAR.
by Tommy Bennett on Oct 28, 2009 4:11 PM EDT reply actions 0 recs
Tommy, I would absolutely agree that it's more predictive
but does it tell the story that already happened better? That’s what I was thinking when I looked at both numbers. Did he put up four wins for the Marlins this season alone, or is it close to that smaller, other number?
by SFiercex4 on Oct 28, 2009 4:35 PM EDT up reply actions 0 recs
WAR is not a predictive stat
It is a what have I done stat.
For example, for offensive players, two players can have the same wOBA, play in the same number of games, and play the same defense, but if one player bats higher in the lineup and is surrounded by better hitters, his WAR value will be higher.
Jeff Zimmerman - Protecting the world from RBI's and Wins from my mom's guest house.
by Jeff Zimmerman (TucsonRoyal) on Oct 28, 2009 5:50 PM EDT up reply actions 0 recs
Wait, what?
No it won’t…will it? I was under the impression that wouldn’t be factored into WAR at all…that would increase the number of RBIs, but those aren’t included in the calculation for WAR…am I missing something here?
Brewers Baseball and other assorted nonsense (mostly the assorted nonsense) at my blog, What's a Tararrel?
by Lefti on Oct 28, 2009 5:54 PM EDT up reply actions 0 recs
Innings Pitched and PA will both be factored into WAR.
Now, some people project players using 600 or 700 at bats (example at driveline). If a player who would be a +20 UZR defender and hit for wOBA only plays a few times in the season (Jose Reyes), he won’t accumulate any WAR.
by Jeff Zimmerman (TucsonRoyal) on Oct 28, 2009 6:24 PM EDT up reply actions 0 recs
Just innings/PA’s isn’t adequate for describing their total contributions, though, in terms of “what have I done”. If two players get an equal number of PA’s but one is more “clutch” than the other over that period, they’ll have the same WAR even though the more “clutch” one directly led to more runs being scored for his team. If you’re using WAR strictly to describe the past, you’d be missing some important elements that have a significant effect on W-L. I like WAR for what it is, but I don’t think you can fully call it a “what have I done stat”…
by Missing Barry on Oct 30, 2009 3:36 PM EDT up reply actions 0 recs
Then what does WAR do?
by SFiercex4 on Oct 30, 2009 7:07 PM EDT up reply actions 0 recs
Sort of a bait to get him to define it for me
I have my definition in mind.
by SFiercex4 on Oct 30, 2009 11:33 PM EDT up reply actions 0 recs
Perhaps I should have put some emphasis on some words?
by SFiercex4 on Oct 30, 2009 11:33 PM EDT up reply actions 0 recs
I actually have concerns about what WAR does
It’s not a value stat, because it completely ignores timing, nor is it a “true talent level” stat because it doesn’t incorporate past seasons. It’s kind of a weird hybrid, although very nicely presented and quite useful for blog discussion purposes.
by vivaelpujols on Oct 30, 2009 11:49 PM EDT up reply actions 0 recs
What is timing, though?
In other words, what is the difference between a single with the bases empty and a single with a runner on third?
In each case the batter performs similarly – it is the batters who precede him who perform differently. By rewarding a batter for “timing” without controlling for the quality of opportunities he receives, are we truly measuring value, or are we misattributing value?
Or look upon it this way. In a typical system that accounts for “timing,” a single with the bases empty is credited as though an average batting performance follows. The next batter, who singles with a runner on first, is credited for having performed well with runners on.
But why should we treat the arrow of causation as only going in one direction? Should not the first batter be credited for his performance, given the performance of the second batter above that of an average batter?
What is typically referred to as “timing” or “clutch” and so forth is based upon a narrative idea, one which is compelling but one which does not necessarily tell us anything about real value in baseball.
WAR (at least as constituted – one could certainly swap out RE24 or WPA or whatever one likes for the batting component) is set to measure a batter’s performance in isolation, ignoring the effect of his teammates. This may not be the only definition of retrospective value, but it certainly is a valid one.
by cwyers on Oct 31, 2009 1:34 AM EDT up reply actions 1 recs
That's a great point about the arrow of causation
I never thought about that, and for hitters it’s perfectly valid.
However, for pitchers, it is a different matter, I think. They are responsible for creating their own run environments on each play. I can’t think of a way to measure that though. tRA/24 would have the same problems. If a pitcher on a bad defense was put in more high run scoring situations, his tRA/24 would be higher than that of a pitcher whose defense was better.
I still don’t think WAR should be used for valuing the year end production (IE, how much they actually helped their team win ballgames) of pitchers, and to some degree, hitters.
by vivaelpujols on Oct 31, 2009 2:11 AM EDT up reply actions 0 recs
I share some of your concerns for pitchers, VEP
For hitters, I’m far more confident, for reasons stated by Colin.
After reading all this discussion, both here and on The Book blog, I’m still on the fence for pitchers. It may still have something to do with removing defense as well, as Tango posed an excellent question regarding timing and how defense-independence could screw with that.
by SFiercex4 on Oct 31, 2009 5:22 PM EDT up reply actions 0 recs
WPA/LI incorporates "hitting to the situation"
but doesn’t give more credit to players who get higher leverage situations.
by lookatthosetwins on Nov 2, 2009 4:27 PM EST up reply actions 0 recs
WAR is not a "what I have done" stat either
Not for pitchers at least. It’s a “how well I would have pitched if all timing and BABIP related effects were taken out of the equation” stat. It removes a lot of noise, but also a decent amount of skill. Besides, skill shouldn’t matter much towards production. If two players put up this line: 1 IP, 1 HR, 1 BB, 1 K; however, one of them allows the home run after the walk and the other one allows it before, they obviously weren’t equal pitchers in terms of production.
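To make the sequencing point concrete, here is a toy sketch that scores the same inning line in two different orders. It is deliberately simplified and is not any real run-expectancy model: the event codes and the `runs_allowed` function are made up for illustration, and only walks, home runs, strikeouts, and generic outs are handled.

```python
def runs_allowed(events):
    """Count runs for a sequence of plate appearance outcomes.

    Handles only "BB", "HR", "K", and "OUT" (a generic out that
    does not move any runner); this is a toy model, not a simulator.
    """
    runners = 0
    runs = 0
    for event in events:
        if event == "BB":
            if runners == 3:
                runs += 1      # bases loaded walk forces in a run
            else:
                runners += 1
        elif event == "HR":
            runs += runners + 1  # everyone on base scores, plus the batter
            runners = 0
        elif event in ("K", "OUT"):
            pass                 # outs change nothing in this toy model
        else:
            raise ValueError(f"unsupported event: {event}")
    return runs


walk_first = runs_allowed(["BB", "HR", "K", "OUT", "OUT"])   # 2 runs
homer_first = runs_allowed(["HR", "BB", "K", "OUT", "OUT"])  # 1 run
```

Both innings have an identical component line, so a FIP-style measure scores them the same, while a runs-allowed measure charges the first pitcher with twice as many runs.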
by vivaelpujols on Oct 28, 2009 7:17 PM EDT up reply actions 0 recs
I was thinking that as well
Again, that gets to the heart of the question, I suppose.
by SFiercex4 on Oct 28, 2009 7:43 PM EDT up reply actions 0 recs
But WAR for pitchers does measure innings
In the NL a pitcher may get the hook if the team is down, even though he may still be pitching well; if he is up to bat, he may be pulled for a pinch hitter.
I guess playing time is the big unknown for hitters and pitchers.
by Jeff Zimmerman (TucsonRoyal) on Oct 28, 2009 8:35 PM EDT up reply actions 0 recs
Honestly, I wouldn’t mind just doing it two ways (context + context neutral) and listing them both. It’s not that either is “better” than the other, it’s just they describe different things and have different pros and cons as others noted. Who’s the better player? Well, a context neutral stat is better to tell us that. Who was actually more responsible for the end results over the season? Without context, we can’t accurately answer that question because context factors like luck play such a big role, so again, it just depends on what you want to know.
So I’d advocate using two methods and posting the results for both, though I’m sure there are difficulties presenting that clearly and whatnot…
by Missing Barry on Oct 28, 2009 4:34 PM EDT reply actions 0 recs
The issue is that (IIRC) Rally computes TotalZone for PITCHER, not for TEAM. So using team bUZR is not going to capture how Rally does it.
BABIP for Nolasco was .322. He allowed 561 BIP. League average BABIP was .298. So:
561*(.322-.298)=13.46
To convert those extra hits on balls in play to runs, multiply by .7, so 9.42.
111/185*9 = 5.4 RA
Adjust by BABIP to get adjusted RA:
(111-9.42)/185*9 = 4.94 RA
I don’t know what that gives you (recall that Rally uses a different pitcher rep-level than FG as well).
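The arithmetic above can be reproduced with a short sketch. The function name and return layout are made up for illustration; the inputs (111 runs, 185 innings, 561 balls in play, .322 BABIP, .298 league BABIP, .7 runs per hit) are the ones quoted in the comment.

```python
def babip_adjusted_ra(runs, innings, bip, babip, league_babip,
                      runs_per_hit=0.7):
    """Strip the runs attributed to above-average BABIP out of RA.

    Returns (extra_hits, extra_runs, raw_ra, adjusted_ra), with both
    RA figures expressed per nine innings.
    """
    extra_hits = bip * (babip - league_babip)   # hits above league rate
    extra_runs = extra_hits * runs_per_hit      # rough run value of those hits
    raw_ra = runs / innings * 9
    adjusted_ra = (runs - extra_runs) / innings * 9
    return extra_hits, extra_runs, raw_ra, adjusted_ra


extra_hits, extra_runs, raw_ra, adjusted_ra = babip_adjusted_ra(
    runs=111, innings=185, bip=561, babip=0.322, league_babip=0.298
)
print(round(extra_hits, 2), round(extra_runs, 2),
      round(raw_ra, 2), round(adjusted_ra, 2))
```

Rounded to two decimals, this gives the same 13.46 extra hits, 9.42 runs, 5.4 RA, and 4.94 adjusted RA as the hand calculation.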
by cwyers on Oct 28, 2009 5:12 PM EDT reply actions 0 recs
That yields something like 1.4 WAR
Thanks for the correction Colin, I was not aware of that. Nevertheless, I suppose the point still stands, though with less “oomph.” I was using the same replacement level for each one just to keep all other inputs the same.
by SFiercex4 on Oct 28, 2009 7:42 PM EDT up reply actions 0 recs
Not really
“The issue is that (IIRC) Rally computes TotalZone for PITCHER, not for TEAM. So using team bUZR is not going to capture how Rally does it.”
I do it for team. Then, if a team is +50 TZ, and a pitcher pitches 10% of his team BIP, he’s considered to have had +5 defensive support. This will be the same whether his BABIP is .260 or .320.
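As a rough sketch of that proration (the function name and the ball-in-play totals in the example are hypothetical; only the +50 TZ and 10% share come from the comment):

```python
def prorated_defensive_support(team_tz_runs, pitcher_bip, team_bip):
    """Credit a pitcher with his share of team TotalZone runs.

    The share is the fraction of the team's balls in play that he
    allowed; his own BABIP does not enter into it.
    """
    if team_bip <= 0:
        raise ValueError("team_bip must be positive")
    return team_tz_runs * pitcher_bip / team_bip


# A +50 TZ team and a pitcher with 10% of the team's balls in play.
# The 500 and 5000 totals are made-up numbers chosen to give 10%.
support = prorated_defensive_support(team_tz_runs=50,
                                     pitcher_bip=500,
                                     team_bip=5000)
```

Here `support` comes out to +5 runs, and it would be the same for any pitcher with that share of balls in play, whatever his BABIP.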
The HK-47 hitting droid is the finest line drive machine ever built
by RallyMonkey5 on Oct 28, 2009 10:12 PM EDT reply actions 0 recs
Thanks for the clarification Rally.
It makes sense, but why did you choose to do it this way over the way Colin suggests?
by SFiercex4 on Oct 28, 2009 11:20 PM EDT up reply actions 0 recs
Defensive sample size
We’ve got people who believe a player’s career defensive rating (or at least last 3-4 years) tells you more about how they played in 2009 than their actual 2009 defense does. Tango and MGL are in this camp. I don’t think we can say, based on 150 innings, that a defense played better behind one pitcher or another.
If one allows a .260 BABIP and another a .340, then I think it more likely that pitcher 2 allowed a lot more hard hit balls. That result is partly from skill, and probably from some bad luck, but either way the blame goes to the pitcher in a “what have you done” metric.
by RallyMonkey5 on Oct 29, 2009 8:01 AM EDT via mobile up reply actions 0 recs
Fair enough. That makes sense.
by SFiercex4 on Oct 29, 2009 8:29 AM EDT up reply actions 0 recs
I don't think that's right at all.
The sampling problems with measures like UZR largely have to do with apportioning fielding chances among players. When it comes to measuring team defense, we don’t have those problems. (It’s a split-credit thing – we have a pretty good idea of how to measure team defense, the problem is figuring out how to divide that credit up between individual players.)
As far as “hard-hit balls” – looking at the data, how hard a ball is hit does not have a major impact on how it is fielded (assuming you control for everything else). A ball that’s REALLY hard hit typically leaves the park altogether. The big impacts are the path the ball takes – whether it’s a ground ball or an air ball and the spray of the ball.
Honestly, I’d rather see no defensive adjustment than the defensive adjustment you’re using, because of the risk of a double penalty – for instance, a pitcher on a good defensive team who receives bad defensive support on days when he pitches. That player’s observed RA is going to be hit twice – first from the effects of his defense, and secondly from the defensive adjustment that assumes his defensive support was good when it really wasn’t.
by cwyers on Oct 29, 2009 12:22 PM EDT up reply actions 0 recs
Defense behind Pitcher
I agree. This would be like assigning a pitcher’s run support based on the team average runs per game instead of using the runs per game when the pitcher actually was the starting pitcher. As we know, those two numbers can sometimes differ significantly, and I would think defense would be the same way.
KJOK
by KJOK on Oct 30, 2009 4:45 PM EDT up reply actions 0 recs
Wouldn't it make more sense to prorate defense based on batted balls?
For example, Nolasco, as a flyball pitcher, should not be rewarded for the Marlins’ bad infield defense, because he would benefit disproportionately from their good OF defense (This is a hypothetical example; I don’t know whether the Marlins had relatively better IF or OF defense)
Also, since you have TotalZone on each play, can’t you calculate defense the way PZR does it?
by Alex Krolewski on Oct 28, 2009 11:46 PM EDT up reply actions 0 recs
Probably could
But it would take a lot of work from where I have things set up now.
by RallyMonkey5 on Oct 29, 2009 8:03 AM EDT via mobile up reply actions 0 recs
I'm hoping Jacky Z can make a move for him. He'd be a steal, now that his value is low.
by Taylor H on Oct 28, 2009 11:35 PM EDT reply actions 0 recs
Jack Z is good, but Beinfest won't let him go until he absolutely has to
My suspicion is that Beinfest knows he was good and will hold on to him until arbitration is over. Maybe next year, though.
by SFiercex4 on Oct 28, 2009 11:37 PM EDT up reply actions 0 recs
I do think Ricky Nolasco is a bit better than J.J. Hardy
And that’s for whom Dave wanted to trade Morrow. I’m not buying it!
by SFiercex4 on Oct 29, 2009 8:28 AM EDT up reply actions 0 recs
Darn! Admittedly, I don't know how smart the Marlins GM is in terms of understanding value and sabermetrics and whatnot.
The only thing I know about the Marlins FO is that their owner made some disparaging comments about Ichiro :(
by Taylor H on Oct 29, 2009 12:06 PM EDT up reply actions 0 recs
I don't think he's that great, but he does seem to have a good scouting eye in general
And with the way Nolasco “bounced back” in the second half, I think most people in the front office think he’s the plus pitcher he probably is, rather than the guy who had a 9.08 ERA before being demoted.
by SFiercex4 on Oct 29, 2009 1:20 PM EDT up reply actions 0 recs
He being Larry Beinfest, Marlins FO guru
by SFiercex4 on Oct 29, 2009 1:20 PM EDT up reply actions 0 recs