Playing Around With a Strike-Based Pitching Metric

What can we tell about a pitcher from how many strikes they throw?

Kevin C. Cox

Recently at Beyond the Box Score, there has been a revival of sorts for the pitching metric kwERA. I have always been a big fan of kwERA, preferring it to most other pitching metrics because it strips pitching down to the two things the pitcher has the most control over: strikeouts and walks.
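For readers who have not seen it written out, here is a minimal sketch of kwERA in its commonly cited form, 5.40 - 12 × (K - BB) / PA. The constants 5.40 and 12 are not stated in this post and are an assumption on my part; they are usually tuned to the run environment. The function name `kwera` and the sample inputs are purely illustrative.

```python
def kwera(strikeouts, walks, plate_appearances):
    """Commonly cited kwERA form: 5.40 - 12 * (K - BB) / PA.

    Assumption: the constants 5.40 and 12 are the usual published values,
    not figures taken from this article. Intentional walks and hit batters
    are left out of `walks`, matching how kwERA is tested later in the post.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return 5.40 - 12.0 * (strikeouts - walks) / plate_appearances


# Made-up season line for illustration only: 200 K, 50 BB, 800 PA.
print(round(kwera(200, 50, 800), 2))
```

A pitcher with as many walks as strikeouts lands exactly on the 5.40 baseline, which is a quick sanity check on the formula.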

Tom Tango was kind enough to respond to our wave of kwERA support. In discussing it, he mentioned that he chose the name kwERA over szERA (strike zone ERA) because calling it szERA would be misleading: the metric is not based on balls and strikes.

I decided to take a look at a metric that is based on balls and strikes, and ran some quick numbers on a sample of all qualified seasons by starting pitchers from 2002 to 2012 (961 seasons). I chose strike% instead of zone% because strike% counts every strike, while zone% counts only pitches in the strike zone.

I ran a regression of RA9 (runs allowed per nine innings) on strike% (strikes divided by pitches thrown) and got an r-squared of .103, meaning that strike% accounted for 10.3% of the variance in RA9.
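As a rough sketch of that calculation: with a single predictor, r-squared is just the squared Pearson correlation between the two columns. The helper name `r_squared` and the five data points below are made up for illustration; they are not rows from the 961-season sample.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between x and y.

    With one predictor this equals the R^2 of an ordinary
    least-squares line fit of y on x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


# Illustrative values only (not real pitcher seasons).
strike_pct = [0.70, 0.66, 0.63, 0.61, 0.58]
ra9 = [3.1, 4.4, 3.9, 4.2, 5.0]
print(round(r_squared(strike_pct, ra9), 3))
```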

A result that weak would normally deter any further work on a strike-based ERA, but I wondered if there was a way to judge the "quality" of a strike. I decided to add SwStr% (swinging-strike rate) to see if that would produce a stronger result, my logic being that pitchers with better stuff would have a higher SwStr% and therefore throw more "quality" strikes.

With both strike% and SwStr% included, I came up with an r-squared of .236, meaning that the pair accounts for 23.6% of the variation in RA9. Still not definitive, but it is more than double the r-squared of the previous attempt (.236 / .103 is roughly 2.3). For reference, I tested kwERA with this sample (not including IBBs and HBPs) and got an r-squared of .347, so I'm still not close enough to make any definitive statements.
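One way to run two inputs together is an ordinary least-squares fit with an intercept, which is what this sketch assumes; the post does not spell out the exact procedure. The function name `fit_ols` and the toy numbers are hypothetical.

```python
import numpy as np


def fit_ols(predictors, target):
    """Least-squares fit of target on predictors plus an intercept.

    Returns (coefficients, r_squared), where coefficients[0] is the
    intercept and R^2 is computed as 1 - SS_res / SS_tot.
    """
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Illustrative rows only: [strike%, SwStr%] and a made-up RA9 for each.
rows = [[0.70, 0.14], [0.66, 0.10], [0.63, 0.09], [0.61, 0.06], [0.58, 0.05]]
ra9 = [3.1, 4.4, 3.9, 4.2, 5.0]
coef, r2 = fit_ols(rows, ra9)
print(coef, round(r2, 3))
```

Adding a predictor can never lower the in-sample r-squared, so the jump from one input to two is only meaningful if it is large, as it is here.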

I named it qsERA (quality-strike ERA), but only for the purpose of making the table, as I cannot stress enough that it is far from a finished stat. I called it qsERA because it attempts to look at both the number and the quality of the strikes a pitcher throws.

Below are the top 10 seasons out of the sample, based on the metric.

The metric is scaled, but that does not mean that it is a finished product. Also please note that I scaled it to ERA, not RA9.
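The post does not say how the scaling was done, so the following is only one plausible sketch: multiply an RA9-scale estimate by the ratio of average ERA to average RA9. The function name `scale_to_era` and the averages used below are hypothetical placeholders, not values from the sample.

```python
import numpy as np


def scale_to_era(ra9_scale_values, average_ra9, average_era):
    """Rescale RA9-scale estimates onto the ERA scale.

    Assumption: a simple ratio adjustment (average_era / average_ra9);
    this is not confirmed as the method used for the tables in this post.
    """
    if average_ra9 <= 0:
        raise ValueError("average_ra9 must be positive")
    values = np.asarray(ra9_scale_values, dtype=float)
    return values * (average_era / average_ra9)


# Placeholder averages for illustration only.
print(scale_to_era([4.0, 5.0], average_ra9=5.0, average_era=4.5))
```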

Season Name STR% SwStr% qsERA
2002 Curt Schilling 70.65% 14.60% 2.45
2004 Randy Johnson 69.03% 14.80% 2.54
2002 Randy Johnson 65.61% 16.40% 2.56
2004 Johan Santana 66.31% 15.70% 2.62
2005 Johan Santana 69.90% 13.90% 2.62
2003 Curt Schilling 69.62% 14.00% 2.63
2007 Johan Santana 67.92% 14.00% 2.77
2007 Cole Hamels 68.47% 13.60% 2.79
2002 Pedro Martinez 66.39% 14.50% 2.81
2007 John Smoltz 67.57% 13.70% 2.84

That looks like some of the better pitching seasons of the last decade or so, although it is interesting that Pedro Martinez does not show up until 9th, slotting in behind 2007 Cole Hamels.

For further reference I included the bottom 10 seasons in the table below, with poor Kirk Rueter owning the bottom two seasons.

Season Name STR% SwStr% qsERA
2004 Shawn Estes 57.41% 6.40% 4.87
2006 Steve Trachsel 59.98% 5.10% 4.88
2011 Brad Penny 60.35% 4.60% 4.93
2008 Livan Hernandez 62.33% 3.60% 4.93
2007 Tom Glavine 58.04% 5.70% 4.94
2009 Livan Hernandez 59.76% 4.80% 4.95
2008 Daniel Cabrera 58.81% 5.10% 4.97
2003 Nate Cornejo 60.57% 4.00% 5.01
2002 Kirk Rueter 58.63% 4.50% 5.09
2004 Kirk Rueter 58.09% 4.40% 5.15

This is far from a finished statistic, but I do believe that it could be the start of a strike-based metric. I should note again that it should not be used to evaluate pitchers under any circumstances, as it does not come close enough to established metrics such as FIP, xFIP, or even ERA to be used in their place.

So now I'll ask the audience: What can be done to improve this strike-based pitching metric? What faults exist in the starting point that I have tried to establish?