Applying Linear Weights to Pitchers
The idea of creating wRA (weighted Runs Allowed, modeled on wRC, weighted Runs Created, the linear-weights-based stat used for evaluating hitters) started after seeing a debate on Twitter about the Ottoneu scoring system. Regardless of how well you do it, the ultimate goal of pitching is to record outs. Obviously, doing it well is better, but even "eating up innings," as the phrase goes, has some value. The argument for using only batters faced didn't make logical sense to me, as it punishes the pitcher for every batter he faces, even if he faces the minimum three batters per inning.
The scoring system in Ottoneu, for those who don't know, is based on work by Merry, using the FIP (fielding independent pitching) constants and innings pitched. When I thought about using only batters faced, my first question was, "What would happen if a pitcher pitched a perfect game, but didn't record a strikeout?" Francisco Liriano's six-walk, two-strikeout no-hitter earlier this season comes to mind as a slightly less extreme example of this situation.
In that proposed system, the no-strikeout perfect game would be a negative game: the pitcher failed to record a single strikeout, and he's being penalized for facing 27 batters. Yet because he pitched a perfect game, at no point were his team's chances of winning hurt. A perfect game is, by its very nature, impossible to lose. It may not be the ideal way to go about recording a perfect game, because it's been shown repeatedly that more strikeouts are better (SIERA has done some great work on this front), but this game had significant positive value for his team, and even if the pitcher was exceptionally lucky, why should he be penalized for it?
Yes, Pedro Martinez in 2002 was better than Ryan Rowland-Smith in 2010, but we knew that already. The real question remained: how much better was he? WAR (Wins Above Replacement) only tells us so much, and is heavily reliant upon playing time (more playing time with good results is better for a team than less playing time with the same relative results). FanGraphs WAR is tied to FIP, and FIP, while reliable, is still a metric that could be improved upon. Trying to solve this problem of how to improve Ottoneu's scoring system, I asked Niv if anybody had ever tried to calculate wRC or wRC+ against pitchers, thinking that if the basic formula behind wRC was good enough to use for hitters in Ottoneu, it might be adaptable as a metric for pitchers as well.
Any stat ending in + or - in sabermetrics is scaled to the league average (and generally park adjusted as well), so the average of all wRCs in the league, after being scaled to account for playing time, would be 100. Any deviation from 100 is a percentage better or worse than the league average: a wRC+ of 120 is 20% above average, an 80 is 20% worse than average, and so forth. Applied to a pitcher, this would tell us how good or bad a pitcher was at limiting the total amount of offense against him.
Simply put, it measures how often a pitcher gave up singles, doubles and triples, as well as the walks, strikeouts, hit-by-pitches, and home runs already incorporated into FIP, with the appropriate weight given to every event, compared to the league average pitcher. I knew that wRC already existed and had been used, tested and confirmed to be a completely reasonable metric, so why not look into the wRC allowed by any given pitcher in a season? That should tell us exactly how good or bad a pitcher's results were.
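To make the 100-based scaling concrete, here is a minimal sketch in Python. It is only the bare ratio-to-league-average idea: the function name `plus_index` and the sample rates are my own illustrative assumptions, and it leaves out the park and league adjustments that a real wRC+ calculation includes.

```python
def plus_index(player_rate: float, league_rate: float) -> float:
    """Scale a rate stat so that the league average comes out to 100.

    Simplified sketch only: real "+" stats such as wRC+ also apply
    park and league adjustments, which are omitted here.
    """
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return 100.0 * player_rate / league_rate


# Hypothetical rates, chosen only to reproduce the 120 and 80 examples.
league = 0.120
print(round(plus_index(0.144, league)))  # 20% above the league rate
print(round(plus_index(0.096, league)))  # 20% below the league rate
```

Keep in mind that for a pitcher the reading flips: a number above 100 on this scale would mean he allowed more offense than average.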
As far as I knew, this hadn't been done. BP has TAv (True Average, an all inclusive BP proprietary stat that works in the same fashion as wOBA or wRC+) against, but it isn't publicly available, and is harder to explain to someone not involved in the statistics world. On the other hand, wRC+ is incredibly easy to explain, because it can be given in terms of percentage points better or worse than the average. Obviously, the number of hits a pitcher gives up would be a huge factor in this statistic, but over a significant number of innings, the quality of the defense should normalize and BABIP (batting average on balls in play) would regress towards that predicted by a pitcher's batted ball profile and skill-set.
When FIP was first created, it was in response to BABIP fluctuations drastically affecting a pitcher's performance, and with walks, strikeouts, home runs allowed, and hit by pitches remaining fairly constant from season to season for most pitchers, FIP made sense to use as an evaluation metric, because it still accurately separated the best pitchers from the worst pitchers. With the invention of linear weights, we can exactly calculate just how much each event matters, so the noise produced by a good or bad BABIP can be accounted for to some extent.
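For reference, here is a rough sketch of the standard FIP formula in Python. The 13/3/2 weights are the usual published ones; the constant varies by league and season, so the 3.10 default below is a placeholder assumption, not the value for any particular year.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    `constant` puts FIP on the ERA scale and changes every season;
    3.10 is only a placeholder assumption.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# The no-strikeout perfect game from earlier: nothing FIP counts happened,
# so the pitcher is credited with exactly the league constant.
print(fip(hr=0, bb=0, hbp=0, k=0, ip=9.0))

# The same nine innings with 10 strikeouts grades out far better.
print(round(fip(hr=0, bb=0, hbp=0, k=10, ip=9.0), 2))
```

Since FIP only sees home runs, walks, hit batters and strikeouts, 27 straight outs on balls in play look no different to it than a league-average outing.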
The stats I've come up with to date to use are wRA, the net accumulation of weighted Runs Created in a given time period/season(s)/career, wRA/PA, or how many weighted runs a pitcher expects to give up per batter faced, and wRA/9, how many weighted runs a pitcher should expect to give up per 9 innings pitched, or, a linear weights equivalent of ERA. Obviously, this statistic is not meant to be the end-all of pitching statistics, but I see it as a step forward, slowly working to expand our horizons for statistical evaluations of pitchers. This is not to say that FIP shouldn't be used, because it has proven to be a reliable metric, but there's always more than one way to evaluate a player, and none of them are perfect.
wRA is built using the same linear weights that make up wRC. Points are accumulated exactly like wRC, except in this case, since a higher wRA means that a pitcher allowed more "aggregate" offense (not necessarily in the form of runs given up, but he did allow more total bases and/or baserunners), a higher wRA is bad. This could be accomplished either by pitching poorly, such as Brandon Backe in 2008, who accumulated a wRA of 124.12 through 168 2/3 innings, or a 6.622 wRA/9, or by simply racking up lots of innings, such as Roy Halladay in 2003, who accumulated a wRA of 98.19 but did so by virtue of logging 266 innings, for a wRA/9 of 3.322. These examples are given to remind you to keep things in perspective. Don't just look at the wRA allowed; look at wRA/PA, wRA/IP, or wRA/9, and see just how much offense a pitcher is allowing relative to how much he has pitched.
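The rate versions are simple arithmetic on top of the wRA total. Here is a short Python sketch using the Backe and Halladay figures quoted above; the helper names are my own, and the wRA totals are taken as given rather than rebuilt from the underlying linear weights.

```python
def innings(whole: int, thirds: int = 0) -> float:
    """Convert innings plus extra outs (thirds) into a decimal value."""
    return whole + thirds / 3.0


def wra_per_9(wra: float, ip: float) -> float:
    """Weighted runs allowed per nine innings, on the same scale as ERA."""
    return 9.0 * wra / ip


def wra_per_pa(wra: float, pa: int) -> float:
    """Weighted runs allowed per batter faced."""
    return wra / pa


# Figures quoted in the text above.
backe_2008 = wra_per_9(124.12, innings(168, 2))
halladay_2003 = wra_per_9(98.19, innings(266))

print(f"Backe 2008:    {backe_2008:.2f} wRA/9")
print(f"Halladay 2003: {halladay_2003:.2f} wRA/9")
```

Because the quoted wRA totals are rounded, the last digit of the computed rate can differ slightly from the figures in the text.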
Due to fluctuations in luck and the order in which hits are given up, one can't necessarily reliably predict ERA using wRA, but it should serve as a reasonable approximation. As an example of how the order of hits can matter, take two pitchers, Albert and Brendan. Both pitchers pitch complete innings, and do not get pulled mid-inning. In every inning, Albert first gives up a double, walks the following batter, then proceeds to record three consecutive strikeouts. In every inning, Brendan first allows the walk, then the double, then records his three consecutive strikeouts. Both pitchers would have identical results according to wRA, but Brendan is likely going to give up far more actual runs, due to the order in which his events occurred. Luck, order of events, and grouping of events are all important factors, so they cannot be discounted when discussing ERA, and thus when comparing wRA to ERA (or RA, if you prefer to avoid the earned runs vs unearned runs mess).
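The Albert and Brendan example can be played out with a toy inning simulator. This is only a sketch: the baserunning rules are my own simplifying assumptions (walks advance forced runners only, a double scores runners from second and third, and a flag controls whether a runner on first scores on a double), and the function name `runs_in_inning` is hypothetical.

```python
def runs_in_inning(events, runner_on_first_scores_on_double=True):
    """Count runs for a sequence of 'BB', '2B' and 'K' events.

    Assumed rules: a walk advances forced runners only; a double scores
    runners from second and third; a runner on first scores on a double
    only if the flag is set, otherwise he stops at third.
    """
    first = second = third = False
    outs = 0
    runs = 0
    for event in events:
        if event == "K":
            outs += 1
            if outs == 3:
                break
        elif event == "BB":
            if first:
                if second:
                    if third:
                        runs += 1  # bases loaded, a run is forced in
                    third = True
                second = True
            first = True
        elif event == "2B":
            runs += int(second) + int(third)
            if first and runner_on_first_scores_on_double:
                runs += 1
                third = False
            else:
                third = first
            first, second = False, True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


albert = ["2B", "BB", "K", "K", "K"]
brendan = ["BB", "2B", "K", "K", "K"]

print(runs_in_inning(albert))   # nobody is on base when the double is hit
print(runs_in_inning(brendan))  # the walked runner can score on the double
```

Both innings contain the same events, so any purely additive stat grades them identically. Under the aggressive baserunning assumption only Brendan allows a run, and with the flag turned off neither does, which is why "likely" is the right word.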
Using the 2010 data, here are a few sample players:
| Player | PA | IP | wRA | wRA/PA | wRA/9 |
| Felix Hernandez | 1001 | 249.7 | 71.23 | 0.07116 | 2.56743 |
| R.A. Dickey | 713 | 174.3 | 63.21 | 0.08865 | 3.2638 |
| Joel Pineiro | 634 | 152.3 | 67.4 | 0.10632 | 3.98316 |
| Jeremy Bonderman | 754 | 171 | 95.72 | 0.12695 | 5.03784 |
| Ryan Rowland-Smith | 510 | 109.3 | 86.37 | 0.16935 | 7.11198 |
These are the league leader, Felix Hernandez; the 25th percentile, R.A. Dickey; the "league average," Joel Pineiro; the 75th percentile, Jeremy Bonderman; and, in last place, Ryan Rowland-Smith, who was truly terrible (from 2002 to 2010, this was the 2nd worst season by wRA/PA and wRA/9). He was a full .02 wRA/PA worse than Zach Duke, who finished next to last. A gap of .02 wRA/PA is also approximately the gap between the respective 2010 seasons of Zack Greinke and Vin Mazzaro.
There are some interesting trends to be found in this data, and they fit with the intuitive understanding of how baseball works. If you plot wRA/PA as a function of ground ball rate, there's a noticeable negative trend, so generating more ground balls tends to lower one's wRA/PA. Logically, ground balls rarely turn into extra base hits (which are obviously far more damaging than a single), so this makes sense. As the number of plate appearances a pitcher has in a season increases, the overall wRA/PA tends to decrease. Setting injuries aside, better pitchers tend to pitch more, because teams know who the better pitchers are and let them pitch more. The long reliever / spot starter is a long reliever because the 5th starter is assumed by the coaching staff (sometimes incorrectly, but typically the coaches are correct in this assessment) to be better. As line drive rate goes up, wRA/PA goes up. Line drives drop for hits far more often than ground balls and fly balls, and often result in extra base hits, so a higher line drive rate would therefore be worse. All of this data makes logical sense, and if the data matches the logical conclusions, then usually the idea is a valid one.
However, there's also a strong correlation between BABIP and wRA/PA, which also makes sense. The more hits a pitcher allows, the worse his results are, so this obviously isn't perfect, but every measure can be, and should be, improved on. Generally speaking, pitchers who give up more ground balls are better, because while they tend to give up more hits overall than fly ball pitchers, those hits are often singles, whereas fly balls that fall for hits typically turn into doubles, triples and home runs. So, the age-old question of how exactly to account for this remains unanswered. Linear weights offer the best solution that I currently know of for evaluating hitters, so I decided to extend that analysis to pitchers. Better weightings, better methods of calculation, and better record keeping are all possible. These things are improvable, so I don't think that this field (or any other) has been explored to its fullest. No statistical evaluation tool will ever be perfect, but the pursuit of perfection is still a worthwhile goal, because a more accurate metric always exists. While every player might be a sample size of one, the aggregate of all those individuals is still an incredibly powerful tool for evaluation.
All raw data used was obtained from Baseball Prospectus, and is used with permission. The linear weights used are the 1974 to 1990 weights from TangoTiger, and are used with permission.
Comments
I really like this. Nice job
I used to use a system similar to this when evaluating my MLB Showdown cards (did anyone else ever play that?)—if you assume (as the cards did) that what kinds of hits balls fall for are under the pitcher’s control, this might be the best way to measure pitchers.
The big elephant in the room is that it doesn’t really address DIPS theory. Maybe it’s just me, but breaking plate appearances up into separate events emphasizes the impact of BABIP in my mind. Maybe it would be possible to throw some SIERA-esque batted-ball and pitcher-specific adjustments in there to make xwRA/9? I’m not sure how that would work.
There’s also the problem of big innings (there are definitely instances when pitchers struggle once they start getting flustered), and since LOB% isn’t a factor Mariano Rivera doesn’t get a boost from the fact that he consistently posts insane strand rates. I’m guessing that would make both really good and really bad pitchers more clustered towards the middle.
Anyway, this is really interesting stuff, great way of approaching it. I don’t know if you’re planning to reveal this in the future, but I’d be interested in seeing how well wRA/9 correlates with ERA, FIP, etc.
Contributor @ Beyond the Box Score. Lead Blogger @ Wahoo Blues. Sophomore @ Brown University. Twitter: @LewsOnFirst
"Baseball, it is said, is only a game. True. And the Grand Canyon is only a hole in Arizona."—George Will
Now that you mention it, that seems like a good idea. I’ll plan a follow-up post to address a few things like FIP, ERA, tERA, etc and how this compares to them.
Thanks for the positive comments, really appreciate it.
Jacob Smith
Email: blasek0@gmail.com
Twitter: JTD_Smith
by Jacob T Smith on Aug 17, 2011 9:55 PM EDT
Yahoo
Should be noted for those that run linear weights leagues in Yahoo that Total Batters Faced is a scoring option.
Thank you for writing this, it was a very interesting read.
I do have a question though: What ever happened to tRA, and how does this compare?
Purple Row - For all of your Colorado Rockies-related needs
Learn about Batting Metrics
Learn about Pitching Metrics
Thanks
tERA is available over at Fangraphs. I was thinking about doing a follow-up post showing comparisons between wRA, SIERA, tERA/tRA, ERA, x/FIP, etc at some later point.
by Jacob T Smith on Aug 17, 2011 9:53 PM EDT
Runs allowed by a pitcher are not linear in hits / walks.
Let’s assume that a pitcher cannot be relieved and pitches through each entire inning (for simplicity’s sake). The first hit in an inning a pitcher allows costs them very little. The second hit in an inning costs a little more. The third hit likely costs them a lot, and so on. Eventually, there is likely a saturation point, where the marginal gain from the Xth hit over the (X-1)th hit is nearly 0. Still, the point stands that runs allowed by a pitcher are more like exponential / logarithmic in hits (and walks work in a similar way).
You might say that I’m considering situation too much, and that the hits allowed would be randomly dispersed across innings. I agree with that. My point is that there is likely a point of inflection (so to speak), where the runs allowed by a pitcher starts to spike when conditioned on the number of walks and hits. This likely explains why the distribution of ERAs is flatter than expected.
I’m looking at pitchers from 2002-2010 who threw 100+ IP in a season to evaluate this, so I’m working with a huge number of innings as a sample. I’m more trying to evaluate general trends in the population sample than identify anything specific.
by Jacob T Smith on Aug 17, 2011 9:55 PM EDT
Component ERA
Component ERA, Peripheral ERA, or anything else are all of the same nature: use the components (hits, walks, HR, etc), to figure out the runs allowed.
Bill James, BPro, MGL, and a bevy of others have all done this.
The key point between Linear Weights (additive) and BaseRuns (multiplicative) is that Linear Weights works for individual hitters, but BaseRuns works for individual pitchers.
So, if you want to create a good component runs formula, it would really need to be BaseRuns based, and not Linear Weights based. wOBA, wRC, wRAA are all Linear Weights based.
I’m fairly new to the saber world (and baseball stats in general, I never got into it when I played), so a lot of the “required reading” that most veterans have, I haven’t seen and am thus not totally familiar with, or familiar with at all, my apologies. I’m working on catching up with that. That said, I do find your site very helpful, and thanks for re-linking this there. I’m not trying to re-invent the wheel and create my own formula for that, because it’s been done before by people smarter than I am, and much more intimately familiar with baseball, I’m just trying to look at all the raw data in a way that makes sense to me.
I chose to use linear weights because I wanted to look at how much “net” offense he allowed, without regards to grouping, sequencing, etc. I completely agree that BaseRuns is the superior metric if you’re trying to evaluate how much actual scoring is going to take place against a pitcher, because in that case, grouping of hits and sequence of said events would be hugely important factors.
by Jacob T Smith on Aug 17, 2011 9:59 PM EDT
