What Starting Pitcher Metrics Correlate Year-to-Year?

Tim Hudson has the highest GB/FB ratio in the majors since 2008 (2.63).

As a follow-up to my previous article on hitting metrics, I wanted to take a look at which pitching metrics correlate year-to-year. For this installment, I looked at starting pitchers from 2004-2011 with at least 162 innings pitched in both year one and year two.

As before, this is just a straightforward correlative analysis--nothing fancy. I took a look at a bevy of metrics (courtesy of the fine, upstanding citizens at FanGraphs), and here are the results:
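For readers who want to try something similar, here is a minimal sketch of how a year-to-year correlation could be computed. It is not the code behind the numbers in this article. It assumes that "year one and year two" means consecutive seasons for the same pitcher, and the field names (`pitcher`, `year`, `ip`, `k_pct`) and the sample rows are made up for illustration.

```python
from math import sqrt

MIN_IP = 162  # innings threshold stated in the article


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_pairs(rows, min_ip=MIN_IP):
    """Pair each qualifying season with the same pitcher's next season.

    Assumption: "year two" is the season immediately after "year one".
    Both seasons must meet the innings threshold.
    """
    qualified = {(r["pitcher"], r["year"]): r for r in rows if r["ip"] >= min_ip}
    pairs = []
    for pitcher, year in sorted(qualified):
        year_two = qualified.get((pitcher, year + 1))
        if year_two is not None:
            pairs.append((qualified[(pitcher, year)], year_two))
    return pairs


def y2y_correlation(rows, metric, min_ip=MIN_IP):
    """Correlate a metric in year one with the same metric in year two."""
    pairs = season_pairs(rows, min_ip)
    return pearson([a[metric] for a, _ in pairs], [b[metric] for _, b in pairs])


# Made-up rows purely for illustration; not real pitcher data.
rows = [
    {"pitcher": "A", "year": 2004, "ip": 200, "k_pct": 0.20},
    {"pitcher": "A", "year": 2005, "ip": 190, "k_pct": 0.22},
    {"pitcher": "A", "year": 2006, "ip": 180, "k_pct": 0.21},
    {"pitcher": "B", "year": 2004, "ip": 170, "k_pct": 0.15},
    {"pitcher": "B", "year": 2005, "ip": 165, "k_pct": 0.16},
    {"pitcher": "C", "year": 2004, "ip": 210, "k_pct": 0.25},
    {"pitcher": "C", "year": 2005, "ip": 120, "k_pct": 0.24},  # below threshold
]

print(round(y2y_correlation(rows, "k_pct"), 2))
```

Note that a pitcher with three straight qualifying seasons contributes two pairs here (2004-2005 and 2005-2006). Whether the original sample was built that way is also an assumption.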

Pitcher repertoire generally has the highest year-to-year (Y2Y) correlation. The distribution of a pitcher's pitches (e.g. four-seam fastball, cutter, changeup) shows great consistency from one year to the next. Now, there are potentially coding errors in that data, but the consistency of those statistics reflects what I think is generally known: once pitchers make it to the big leagues as starters, they rarely alter their portfolio of pitches. What they likely alter more regularly is speed, sequence, and location. But that's just a hypothesis, one that can't be confirmed or rejected with this data.

Moving on.

Outside of repertoire, the most highly correlated statistic for starters is the ratio of ground balls to fly balls they allow (GB/FB), followed closely by K%. Again, this is consistent with previous research into which factors a pitcher generally controls (e.g. Tom Tango and FIP, Matt Swartz and SIERA). We can see that strikeouts (and metrics associated with strikeouts, such as Swinging Strike %, Contact %, and Outside of Zone Swing and Contact %), walks, and batted balls other than line drives all have Y2Y correlations of .67 or higher.

The ERA estimator with the highest Y2Y correlation was SIERA (.72), followed by xFIP (.68).

As before, I also put together a correlation matrix for all the year one metrics and the year two metrics. Those correlations between .40 and .69 are shaded blue, and correlations above .70 are shaded green.

Scrolling left to right, we can quickly see which metrics correlate strongly with, say, next year's Earned Run Average (ERA). ERA itself has a Y2Y correlation of .38. True ERA (tERA) came in at .47, the highest of all the ERA estimators. Fielding Independent Pitching (FIP) had a correlation of .46, followed by SIERA (.45) and xFIP (.43).
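A matrix like this can be sketched in a few lines. The version below is an illustration, not the spreadsheet behind the article: it correlates every year-one metric with every year-two metric and buckets each value using the shading thresholds described above. Two things are assumptions on my part: that the sign is ignored when shading (so a strong negative correlation is also shaded), and that a value of exactly .70 counts as green. The metric names and numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def shade(r):
    """Bucket a correlation using the article's color thresholds."""
    size = abs(r)      # assumption: sign is ignored when shading
    if size >= 0.70:   # assumption: exactly .70 counts as green
        return "green"
    if size >= 0.40:
        return "blue"
    return "none"


def cross_year_matrix(pairs, metrics):
    """Correlate every year-one metric with every year-two metric.

    pairs: list of (year_one, year_two) dicts for the same pitcher.
    Returns a dict keyed by (year_one_metric, year_two_metric).
    """
    return {
        (m1, m2): pearson([a[m1] for a, _ in pairs], [b[m2] for _, b in pairs])
        for m1 in metrics
        for m2 in metrics
    }


# Made-up season pairs purely for illustration; not real pitcher data.
pairs = [
    ({"era": 3.10, "k_pct": 0.25}, {"era": 3.40, "k_pct": 0.24}),
    ({"era": 4.20, "k_pct": 0.17}, {"era": 3.90, "k_pct": 0.18}),
    ({"era": 3.70, "k_pct": 0.20}, {"era": 4.10, "k_pct": 0.19}),
    ({"era": 4.60, "k_pct": 0.15}, {"era": 4.50, "k_pct": 0.16}),
]

matrix = cross_year_matrix(pairs, ["era", "k_pct"])
for (m1, m2), r in sorted(matrix.items()):
    print(f"year-one {m1:>5} vs year-two {m2:>5}: r={r:+.2f} ({shade(r)})")
```

Reading across the row for a year-one metric shows how well it tracks each year-two metric, which is how the ERA comparisons above were read off. With the article's own figures, ERA's .38 would fall in the unshaded bucket and tERA's .47 in the blue one.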

Another interesting finding relates to Win Probability Added (WPA). The statistics most predictive of whether starters will have higher WPA are those related to strikeouts. Again, this jibes with what people have long suggested: the ability to miss bats is key, and it is something that pitchers inherently control to a large degree.

Finally, to further emphasize the point that a starting pitcher's record is not the best way to evaluate their performance, let's look at run support per nine innings (RS/9). The Y2Y correlation of a pitcher's run support is a mere .16. With run support that unstable, it's no surprise that Wins have a Y2Y correlation of only .29.

So, as with hitters, it pays to focus on independent pitcher metrics like SIERA and FIP when trying to get a read on a hurler's true performance and likely performance in the next year. And, as with hitters, focusing on how much a pitcher misses bats, gets swings on less hittable balls, and commands the zone is a solid bet, as these attributes are among the most strongly correlated year-to-year. When we see big changes in these types of metrics, it should be a signal that something might be happening (positive or negative) with a pitcher.

(Special thanks to Matt Swartz for working through some data issues with me.)