Two recent threads have brought up the appropriateness of using xFIP to judge past performance for pitchers. (And Adam's tie-in to his career touches on a related point, too.)
I'll summarize the arguments for and against using xFIP as a retrospective metric below, but I bring this up because I think the arguments are more nuanced than these simplified viewpoints:
Against using xFIP as a retrospective measure: Sure, HR/FB might need a large sample size to become a good representation of a pitcher's skill, but those home runs actually happened and the pitcher is the one responsible for them happening. For a historical metric, we should use actual home runs, not a home run estimate based on fly balls. (One could also argue that any sort of FIP measure removes a pitcher's performance with runners on base, which is not right, because that "clutchiness" performance actually happened, too.)
For using xFIP as a retrospective measure: Yes, those home runs occurred when the pitcher was on the mound, but, like BABIP, they aren't necessarily a result of his talent. The cliche is that "pitchers allow fly balls, but hitters turn them into home runs." If we're going to assess a pitcher's future HR skill based on FB's, then why would measuring his past skill be any different?
I see two levels of discussion here: which of those two simplified arguments do you agree with (and why) and what nuances do they miss that are important (and why)?
For example, are there more distinctions to be made than just past/future? Where does the distinction between past value and past talent come into play?
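For anyone who wants the two formulas side by side, here is a minimal sketch of the commonly published versions of FIP and xFIP. The only difference is whether actual home runs or fly balls times a league HR/FB rate go into the home run term. The additive constant (3.10) and the league HR/FB rate (10.5%) are placeholder assumptions that change by season, and the sample stat line is made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP from the home runs that actually happened.

    `constant` scales FIP to league ERA; 3.10 is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """xFIP: same as FIP, but home runs are replaced by an estimate.

    `lg_hr_per_fb` is an assumed league-average HR/FB rate.
    """
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Hypothetical pitcher: 200 IP, 180 K, 60 BB, 5 HBP, 200 FB, 12 HR.
    line = dict(bb=60, hbp=5, k=180, ip=200.0)
    print("FIP :", round(fip(hr=12, **line), 2))
    print("xFIP:", round(xfip(fb=200, **line), 2))
```

A pitcher who allows fewer home runs than his fly balls would suggest gets a FIP below his xFIP, which is exactly the gap the two arguments above disagree about.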
A question from Sky
almost 2 years ago
Sky Kalkman
Comments
Here's the thing
I think you have to differentiate between performing better than expected and worse than expected, whether you are talking about HR/FB rate or BABIP.
A BABIP that is significantly lower than average over a significant period of time is almost certainly the result of defense/luck. However, a higher BABIP than average is more likely the result of bad pitching. Natural selection just ends up winnowing out those pitchers at the major league level.
Similarly, I have a hard time giving credit to pitchers retrospectively if their xFIP is lower than their FIP, because a significantly higher than average HR/FB rate is more likely to be reflective of the pitcher’s performance than a significantly lower than average HR/FB is.
by Adam J. Morris on Jun 25, 2010 12:31 PM EDT
The MLB vet vs. MLB youngster is an interesting question.
Low variation in BABIP among MLB pitchers is a decent assumption, but what do we do with pitchers who haven’t established themselves as MLB pitchers?
And if someone (say, Matt Cain) has shown a past ability to beat the league-average HR/FB rate over multiple years and 1000 IP, should that information be used when only dealing with 50 IP in 2010?
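One way to frame that question is as partial regression: pull the observed HR/FB rate toward the league rate, with the pull weakening as the fly ball sample grows. The sketch below is only an illustration. The function name, the 10.5% league rate, and especially the 300-fly-ball stabilizer are assumptions, not researched values, and the two stat lines are invented.

```python
def regressed_hr_per_fb(hr, fb, lg_rate=0.105, stabilizer_fb=300):
    """Shrink an observed HR/FB rate toward the league rate.

    `stabilizer_fb` acts like that many phantom league-average fly balls
    added to the pitcher's record. It is an assumed placeholder:
    0 means no regression (use what happened, as FIP does), and a very
    large value approaches full regression (as xFIP does).
    """
    return (hr + stabilizer_fb * lg_rate) / (fb + stabilizer_fb)


if __name__ == "__main__":
    # Hypothetical small sample: 2 HR on 50 fly balls.
    print(round(regressed_hr_per_fb(hr=2, fb=50), 3))
    # Hypothetical long track record: 70 HR on 1000 fly balls.
    print(round(regressed_hr_per_fb(hr=70, fb=1000), 3))
```

Under these assumptions the small sample ends up close to league average, while the long track record stays much closer to the pitcher's own rate.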
Why not take it even beyond FIP?
Preemptive apologies for sounding like a broken record, but why not use a more complete batted ball metric like tRA/tERA that gives credit for batted ball types and the run values of those events?
I think it’s a fairly cut-and-dried way to determine the true value of past performances while simultaneously contextualizing that performance in a metric that is much better at predicting the future.
My same question would apply to separating out GB/LD/IFFB/OFFB, I think.
Yes, line drives are usually hits and are bad from a pitcher’s perspective. But research has shown that for MLB pitchers, past LD rate is not a good indicator of future LD rate. So is it a fluke? If so, why hold pitchers accountable for historic LD rates?
Also raises the question, does this have to be a zero sum game? Others (David Gassko? Pizza Cutter?) have argued no, credit you give to hitters doesn’t all have to be matched by blame handed to pitchers/fielders.
by Sky Kalkman on Jun 25, 2010 1:37 PM EDT
Yeah, that's a good point.
From my perspective, I think it’s worth including simply because it’s a part of what happened. If you can effectively communicate that LD% isn’t a good predictor at all, then I don’t see a reason to pull it out. But then again, I like looking at lists of worst starts or worst seasons in MLB history and seeing the hilariously absurd numbers that some pitchers put up.
I think you bring up a good point about the zero-sum game as well. We’ve all seen the absurd batter-reaching-across-the-plate-for-a-bloop-single hits. It’s hard to imagine that that’s “bad” pitching. I wonder how you’d (first) determine what hits fall into that category, and (second) assign credit/blame to those events. Do you give partial blame to the pitcher? Has anyone collated ideas about how that would work?
I disagree with using xFIP for retrospective analysis
for much the same reasons that are stated in the above comment. I prefer FIP for that use since it is a simple and straightforward account of the past. I do think there are significant nuances missing from both statements, though, and also from the idea of xFIP itself.
Home runs are statistically connected to FB%, which is logical. High strikeout, power pitchers tend to throw more balls up in the zone, which are more frequently hit for fly balls and also more likely to be hit for home runs. There is some truth to the notion that “pitchers allow fly balls, batters turn them into home runs” because of that relationship, but that does not explain all home runs, and I am not sure it even does a good job of explaining the majority of home runs.
Two things not accounted for have a huge effect on whether home runs actually happen: the intended location versus the actual location, and the effectiveness of a pitch’s intended break. A fastball on the upper inside corner is a very hard pitch to hit fair for a home run, but the same pitch a few inches further over the plate and a fraction lower is a very easy pitch to hit for a home run. The same thing applies to breaking pitches that “hang” (or break less or earlier than intended). Location and break are absolutely a part of a pitcher’s skills. A pitcher’s consistency in executing both is a factor in home run rates. Applying xFIP to retrospective data ignores this.
That is not to say that xFIP is plain wrong. The relationship between fly balls and home runs has enough consistency for us to feel confident in using it as a predictive tool, but we should not ignore the pitcher’s performance entirely in home run rates. Good hitting certainly accounts for some home runs, but poor pitching has an influence as well.
If I am considering a pitcher’s performance to date against what I feel he might do in the rest of the season, I would want to consider his past HR/FB rates, BB%, Zone% and both his FIP and xFIP to see if a change in one area is influencing the rise (or fall) in xFIP relative to his skills. A pitcher who is in the zone more might be giving up more home runs as a result, or a pitcher might have a historically high HR/FB because he is inconsistent in hitting his spots.
It is important to remind yourself that correlation does not mean causation.
- Matt Sullivan
A good friend of mine used to say, "This is a very simple game. You throw the ball, you catch the ball, you hit the ball. Sometimes you win, sometimes you lose, sometimes it rains." Think about that for a while. - Nuke LaLoosh
"Luck"
We attribute runs above replacement to batters for what they’ve done, regardless of how lucky they may have been. We then project what they might do differently based on regression on BABIP, HR/FB, etc.
I’m in agreement with jwiscarson above. What happened, happened.
If we’re going to base pitcher historical evaluation on xFIP, perhaps we should be calculating batter WAR based on wOBAr (what should have happened) instead of wOBA? Add an adjustment for HR/FB, true-distance style, to take away wind-aided HRs and reward wind-denied HRs?
I've wondered this, too.
Is HR/FB at a stable ~10% for all batters, too? If not, couldn’t you add in a competition factor to pitching homers?
HR/FB has more variation for hitters than pitchers.
Big HR hitters are well above 10/11% (in estimated skill) and guys like Ichiro/Pierre are well below it. Mariano Rivera’s at 6% HR/FB, which considering he’s the best reliever ever has got to be close to the minimum true-talent level for any pitcher.
I think there should be two stats
Defense adjusted run average and projection systems. Abolish xFIP, tRA and FIP!
On xFIP
I was one of the posters who called into question the use of xFIP in the “1-2 Punch” article, and I stand by that, having read both sides.
I think that sometimes the statistical community gets wrapped up in itself and does occasionally ignore what happened on the field. Using whatever methods, the results on the field stand. I think that DIPS stats (whatever your flavor, they all correlate fairly similarly) are fantastic analysis tools for understanding WHY something has happened, and WHAT likely will happen going forward.
However, stats that measure actual on-field contributions should not be thrown out entirely. I don’t personally care how he’s done it, but Livan Hernandez has not allowed a lot of runs this season. That probably will not continue, but we all know that past success does not indicate future success. To downplay that is what makes “normal” baseball fans throw their hands up and storm away.
Sometimes, even if we acknowledge that it was luck, we need to acknowledge that it happened. Does anyone disagree that Ubaldo has been the best/most valuable pitcher in the NL? Probably very few, yet he’s outperforming his xFIP. By season’s end, he may regress, or he may continue to outperform. To ignore the actual results on the field is counterproductive and not what the SABR movement should be all about.
Nobody ignores Ubaldo's contributions on the field
Those who say he is getting lucky are literally saying that his results on the field are not reflective of how much control Ubaldo had over them.
by vivaelpujols on Jun 25, 2010 6:18 PM EDT
THIS!
It’s all about the context of what you’re arguing. If you’re looking at who has actually pitched the best, you have to isolate what has been in the players’ control. That doesn’t mean you can’t look at who’s had the best results, elements outside of his control included, but if you’re responding to someone who isn’t looking at that, you shouldn’t be trashing them for doing so because it’s not his intention.
I understand what xFIP means and what the criticisms of Ubaldo have been. I’m not saying that he hasn’t outpitched his peripherals. But even if it’s entirely smoke and mirrors and he has a second half (forgive me) ERA of 5.50 because of a streak of historically bad luck, I think you should count [b]the results he actually achieved[/b] more than xFIP actually does.
Depends on what you're trying to present.
If you’re simply trying to present what Ubaldo has done within his control, you don’t have to look at the things that aren’t.
by philkid3 on Jun 26, 2010 2:10 AM EDT
Again, it's phrases like this that I think deserve closer scrutiny:
the results he actually achieved
by Sky Kalkman on Jun 26, 2010 7:16 AM EDT
I think a lot of it boils down to interpreting a statement like this:
Livan Hernandez has not allowed a lot of runs this season
I agree (and it’s tough not to) that while Livan Hernandez has been pitching, the opposing team has not scored many runs this year. And traditionally we’ve used the term “allowed” as a shortcut for that previous statement. But there are many other things we can use “allowed” for and they aren’t all the same thing.
Flipping coins is a metaphor that’s probably a bit overplayed, but if Marc Normandin and Rob Neyer have a coin flipping contest (sans cheating) and Marc flips 68 heads out of 100 flips and Rob flips 47 heads out of 100, does that make Marc the better coin flipper? He “allowed” 32 tails versus 53 tails for Rob. And he won the contest. But if we wanted to make a judgment about how well they flipped, isn’t there an argument to be made that they were equally talented?
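To put rough numbers on the metaphor, the sketch below computes exact binomial probabilities for a fair coin: how often 100 flips produce at least 68 heads, and how often they produce 47 or fewer. It assumes both flippers have a true 50% heads rate, and the function names are made up for this illustration.

```python
from math import comb


def prob_at_least(heads, flips=100):
    """Exact chance of `heads` or more heads in `flips` fair-coin tosses."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


def prob_at_most(heads, flips=100):
    """Exact chance of `heads` or fewer heads in `flips` fair-coin tosses."""
    favorable = sum(comb(flips, k) for k in range(0, heads + 1))
    return favorable / 2 ** flips


if __name__ == "__main__":
    print("P(at least 68 heads):", prob_at_least(68))
    print("P(at most 47 heads): ", prob_at_most(47))
```

A result like Rob's is routine for a fair coin and one like Marc's is rare, yet under the stated assumption neither says anything about flipping skill.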
by Sky Kalkman on Jun 25, 2010 6:40 PM EDT
I understand how statistics work, and how the statistics behind xFIP work. I understand that they are a measure of who has pitched [b]to have the most likely chance of success[/b] even if it has not turned out that way.
I am arguing that we should not ignore the actual outcomes. By your measure there is no point in keeping track at all, because at the end of the day, we’re just going to write 50/50, what should have happened.
Unfortunately, I’m not sure there is enough research into any of the advanced pitching metrics. DIPS itself is a relatively recent field. There are a TON of unknowns about pitcher performance. HR/FB as a mainstream concept in our community only took off when Fangraphs added xFIP. Relying too much on these stats to say who [b]has gotten the best results[/b] is spurious. And results are honestly how I think you should measure something like the 1-2 punch article: which team has gotten the most value from its top 2 SP so far this season.
I am arguing that we should not ignore the actual outcomes.
I don’t think anyone is saying you always do ignore actual outcomes. It depends on what you’re trying to say.
By your measure there is no point in keeping track at all, because at the end of the day, we’re just going to write 50/50, what should have happened.
Oh come on, he didn’t say that at all.
I do think you’re perfectly welcome to rather see a best 1-2 punch article looking more at pure results rather than true talent, or something in between, or something entirely different. But that doesn’t mean Satchel’s wrong in looking at the angle he chose to, it’s just not necessarily what you’re interested in. If you made a FanPost looking at it from a results-based angle, I’m sure people would be interested.
I just have MASSIVE problems with how the SABR community acts nothing like science. We see one or two articles or pieces of research backing up a theory and it becomes the absolute gospel.
I am not that confident in xFIP as a measure of “who pitched best” vs “who was the most likely to achieve the best results.” I think they are different, even if it sounds like I’m debating semantics. I think xFIP tells us who put themselves in the best situation for success, which in my mind, is not synonymous with “pitched best” or “was the best pitcher.”
For years and years we accepted as gospel the idea that pitchers had no influence on their BABIP, and now there is some evidence to the contrary. HR/FB% research is fairly new, and xFIP as a mainstream stat is new. I urge caution when applying any of these stats that often do not match the results on the field as absolute 100% gospel.
Sorry, I disagree.
Not that xFIP is absolute gospel, but that people think it is or that it has to be to be the most used reflection of talent level.
I have yet to read this thread.
This seems to go along with the thread I started a while ago about pitching metrics, but wouldn’t the “against” argument argue against using plain FIP as a catch-all measure of what has happened when evaluating past performances, as well?
Also, it's hard to answer your question.
I think it depends on what you’re talking about. If I’m looking at who I’m going to, say, give pretend Cy Young votes to, I’m going to look at FIP over xFIP (and maybe something like tRA over either). If I’m in a talent conversation, then xFIP is probably what I’m looking at.
I would say xFIP is a good retrospective stat.
It regresses HR/FB% to league average. That regresses park effects on home runs and quality of batters faced to league average. I look at ERA, FIP and xFIP to see if a pitcher has been lucky or unlucky in any area to evaluate their past performance.
I don’t think HR/FB% is really a pitcher skill. GB/FB% is the pitcher skill that minimizes home runs by minimizing FBs.
From reading Ted Williams: The Science of Hitting: If the batter’s timing is early, it’s a ground ball; if he’s right, it’s a line drive; if he’s late, it’s a fly ball; if he’s later, it’s an infield popup.
So, if there is such a thing as an HR/FB% pitcher skill, then it’s a side effect of a pitcher skill, like velocity, stuff, control or change in speeds, that causes the batter to be too late to hit a good HR-capable fly ball.
RickyRomeroFan
makes an excellent point. I still prefer FIP for looking back, but I think it is necessary to adjust for park and opponents in considering HR/FB rate. I think that regression to the league mean might be an overly simplistic approach to such an adjustment, though. There is a strong bias in that league average: if an ML pitcher were to sustain an inordinately high HR/FB rate, he would not be an ML pitcher for very long. Because the selection process guarantees an artificial rate, it hides discrepancies among pitchers, making the average appear more stable than it might otherwise be.
I am starting to believe more and more in SIERA as the top pitching stat out there right now due to its modifications on GB% and HR/FB. Even SIERA makes the same assumptions about the level of control involved in HR/FB, though, so perhaps a better method can be created using regressions based specifically on opponent strength and park factors. By doing that we may see a more accurate estimation of how a pitcher has fared on HRs by both skill and luck.
- Matt Sullivan
I'm surprised no one puts out a park-adjusted FIP
It seems easy (at least for the many great saberists out there) to do.