Jon Gray’s lackluster performance can’t be explained away by FIP

I saw Jon Gray pitch in person once. It was at Fenway Park of all places, a ballpark where the Rockies do not play much for obvious reasons. I got some good seats (third base side) to be able to get a good view of Nolan Arenado’s wizardry. Though Arenado alone is a great reason to go see the Rockies, that was just a bonus for the main reason my wife and I decided to go to that game and sit third base side.

It was the day the Red Sox FINALLY retired the number of Wade Boggs. He gave his speech at third base, naturally, so I got a great view of it. Other Red Sox greats were there, including Carl Yastrzemski, who is one of the most reclusive Hall of Famers out there. Even Ryne Sandberg was there for some reason! It is a baseball memory I will always carry with me.

Moving back to the topic at hand, I was not really looking forward to seeing Gray pitch. Yes, he was a high draft pick, but he was far from the player that scouts thought he would be. He had a career 4.23 RA9 in the minors, though he pitched in some very hitter-friendly environments like the Pacific Coast League and the California League. His peripherals were nothing special either.

Gray made his debut the year before and pitched poorly. He had a 5.75 RA9 with mediocre strikeout and walk rates. His rookie season is actually not too different from his current season. He had a big gap between his ERA and FIP, much like the one analysts are talking about now.

The day I saw Gray pitch, he performed well. He went 7.1 IP with two runs allowed on a David Ortiz home run. He had six strikeouts but walked three batters. I was pleasantly surprised watching him that day. He finished the season with a 4.93 RA9 and 26 K%, both improvements over the year before. When adjusting for Coors Field, he was a solidly average pitcher at 2.0 WAR.

Well, we all know what is going on with Gray this year. He has a 5.97 RA9. He might be one of the most confusing, frustrating, and talked-about players in baseball this year. He was demoted to Triple-A despite the fact that he has a 3.10 FIP and 2.68 DRA.

I am not going to discuss why that is here. Plenty has already been written on that subject. The best pseudonym in baseball wrote about it in the sidebar. BtBS alumnus Zach Crizer and The Ringer’s Ben Lindbergh both wrote in-depth articles trying to unravel this mystery.

Whenever a pitcher appears to be underperforming, fans and analysts will always look at the same set of stats to see how “lucky” or “unlucky” the pitcher has been in order to assess the sustainability of the performance. BABIP, HR/FB, FIP, xFIP, and strand rate are some of the most popular stats to point to. DRA is one of my favorites, and it is gaining in popularity, though it has also been a bit divisive.

If a pitcher’s BABIP and/or HR/FB ratio is out of whack with his track record, or is inconsistent with the league average if he is relatively new, he is likely going to regress to the mean. If a pitcher’s FIP deviates significantly from his ERA, or if the same can be said about his DRA and RA9, he is likely going to regress to the mean. Many articles are written on these concepts because these stats have proven predictive properties.

(It is important to note that FIP must always be compared to ERA because it is on the ERA scale, and DRA must always be compared to RA9 because it is on the RA9 scale.)
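
For readers who have never looked under the hood, here is a minimal sketch in Python of how two of these stats are usually calculated, using the commonly published FIP and BABIP formulas. The function names and the stat line in the usage note are made up for illustration and are not Gray's numbers. The FIP constant changes every season to put FIP on the league ERA scale, so the 3.10 default below is only a placeholder assumption. DRA is a far more involved model and is not something a few lines can reproduce.

```python
def innings_to_float(ip: float) -> float:
    """Convert box-score innings (7.1 means 7 1/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the decimal counts outs
    return whole + outs / 3


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching, on the ERA scale.

    `ip` is true innings (see innings_to_float). `constant` is a
    placeholder assumption; the real value is set per season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play, from the pitcher's totals allowed."""
    return (h - hr) / (ab - k - hr + sf)


def era_minus_fip(era: float, fip_value: float) -> float:
    """Positive means more earned runs allowed than FIP would suggest."""
    return era - fip_value
```

With a hypothetical line of 10 HR, 30 BB, 0 HBP, and 100 K in 100 innings, `fip(10, 30, 0, 100, 100.0)` comes out to about 3.30 under the placeholder constant. Note that the comparison helper takes ERA, not RA9, for the reason given in the parenthetical above.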

FanGraphs’ Craig Edwards also wrote about Jon Gray last week, focusing more on ERA-FIP differences in MLB history. While reading that article I came across John Lackey’s horrendous 2011 season where he had a 6.41 ERA. It turned out he was pitching hurt and then had Tommy John surgery. I remember there were those at the time defending Lackey’s performance by citing his 4.71 FIP. It is similar to those criticizing the Rockies’ decision to demote Gray by citing his .386 BABIP, 63 percent strand rate, and 3.10 FIP. I have seen fans and media members alike make that argument. Here is the problem when making that argument for Lackey before and Gray now:

BABIP, FIP, HR/FB, DRA, and strand rates only work for major league quality pitchers.

All models have their flaws and limitations. Every single one of them. That includes major sciences such as physics and chemistry. Those problems are dealt with as long as the models are useful. That’s just Science 101. I am reminded of when I wrote about Jered Weaver breaking the DRA model. As Jonathan Judge, one of its inventors, confirmed, DRA was never intended to handle a pitcher as bad as Weaver. Likewise, DIPS theory, which gave us BABIP and FIP, was never developed with terrible pitchers in mind.

A major league quality pitcher can suffer from the aforementioned stats being high due to bad luck. A terrible pitcher who has no business being in the major leagues will also suffer from those stats being high, but it will have little to nothing to do with luck. Writers such as Crizer, Lindbergh, and Edwards understand this, which is why they did not write short, simple articles dismissing Gray’s poor performance as bad luck.

Thankfully, the misuse of these stats in this manner has never been much of a problem. Bad pitchers usually do not stick around the majors long enough for anybody to cite DIPS theory to say that they will regress to the mean. It only ever happens when a seemingly established pitcher with good peripherals starts to allow a ton of runs.

This is not to say that Gray is no longer a major league quality pitcher. He is a fascinating, complicated outlier, which is why so many words have been written on him already. The point is that DIPS theory and DRA are not going to work when a pitcher has an RA9 approaching six runs.

The eagle-eyed reader will notice that nowhere in this article did I mention xwOBA. There is a reason for that. It’s all well and good for hitters, but Jonathan Judge demonstrated that it is no better at predicting future performance than FIP or DRA. This does not necessarily mean that it has no utility for pitchers, as Edwards points out, just not for predicting future performance.

Only time will tell if Gray will figure things out. The demotion might be frustrating, but it might be for the best. It certainly worked out well for Roy Halladay and Cliff Lee once upon a time. If he does continue to struggle in the same manner, just be careful not to wave it off completely by citing his BABIP or DRA.

. . .

Luis Torres is a Featured Writer at Beyond the Box Score. He is a medicinal chemist by day, baseball analyst by night. You can follow him on Twitter at @Chemtorres21.
