Bob Gibson is the luckiest pitcher I ever saw. He always pitches when the other team doesn't score any runs.
-Tim McCarver
Despite the tongue-in-cheek nature of McCarver's comment, it brings up a fact that is well known within the sabermetric community: pitcher wins are simultaneously a product of pitcher ability, team ability, and a lot of luck. I know, this isn't exactly a secret. Google "pitcher wins luck" (without quotes) and you'll get roughly 9.7 million results. "Pitcher wins overrated" gives over 88,000 results. Again, this isn't news.
However, pitcher wins are still a major component of awards voting. Take a look at Tom Tango's Cy Young Predictor: assuming it is up to date, one of its largest components is wins. Update: Tango's updated formula decreases the emphasis on wins. This formula has been pretty good at mimicking the thought process of the BBWAA (he's gone 22-for-22 over the past 11 elections). And he's not the only one. Rob Neyer's Cy Predictor also has a large wins component.
However, with the discussion about the usefulness of wins intensifying, is it possible that wins are slowly losing their place in Cy Young voting? It would be gratifying if they were, but how would we look at this? Well, it is possible to do, but there are some things we'll have to address first.
This is where the gory math begins. For those who want to avoid the math, feel free to skip ahead a few paragraphs.
First, the data. We're working with the Cy Young top-5 finishers from the last 25 years, specifically the starters. As relievers work on an entirely different plane of statistics than starters,* they were left out of this exercise.
*By this I mean that what's great for starters might not be for relievers. For example, a 3.00 ERA is the 50th percentile for relievers, but the 84th percentile for starters in 2013. 6 wins for a reliever is pretty big, but probably pretty bad for a full-season starter.
Now, what type of response do we have in this data? Well, we're going to work with the rank of finish for that year and league. Now, there's a slight problem: this isn't a continuous response. By that, I mean the response can only take values in {1,2,3,4,5}. You can't run a regular regression on this, so what do we do? Let's say that these are ordered groups. Now, let's assume that there's some unseen measure that causes a pitcher to fall in one of these groups. This unseen measure is called a latent factor, and we assume that it is a continuous value on the interval (-∞, ∞). We could run a regression on this, but we don't know its values. However, we'll deal with that later. Now, I said that we could classify the rank based on this latent value z_{i}. Specifically, we could assume the following.
Rank = | If... |
---|---|
5 | z_{i} ≤ 0 |
4 | 0 < z_{i} ≤ τ_{3} |
3 | τ_{3} < z_{i} ≤ τ_{2} |
2 | τ_{2} < z_{i} ≤ τ_{1} |
1 | τ_{1} < z_{i} |
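The table above can be sketched as a small lookup. This is only an illustration of the classification rule: the function name and the cutpoint values used in any call are hypothetical, since the actual τ values are unknown and get estimated by the model.

```python
from bisect import bisect_left


def rank_from_latent(z, tau3, tau2, tau1):
    """Map a latent score z to a finishing rank (1 = winner, 5 = fifth)."""
    # Cutpoints in increasing order; the lowest is fixed at 0, as in the table.
    cutpoints = [0.0, tau3, tau2, tau1]
    if cutpoints != sorted(cutpoints):
        raise ValueError("cutpoints must satisfy 0 <= tau3 <= tau2 <= tau1")
    # bisect_left counts the cutpoints strictly below z, so a z exactly equal
    # to a cutpoint lands in the lower group, matching the "≤" in the table.
    return 5 - bisect_left(cutpoints, z)
```

With made-up cutpoints of 1, 2, and 3, a latent score of 2.5 would land in rank 2, and anything at or below 0 would land in rank 5.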
Of course this presents more problems as we don't know the values of the τ_{i}'s, but again, we'll get to that really soon. So, we have a potential regression and classification method, but we don't know the latent values or the cutpoints. Well, this is where Bayesian statistics comes to the rescue. We can treat these unknowns as random variables and integrate over all their possible values stochastically. We'll do this through an MCMC sampler based on the following hierarchical model.
Rank_{i}|z_{i} given in the table above
z_{i} | β ~ N(x_{i}'β, 1)
β | φ ~ N(0, 1/φ)
φ ~ Γ(0.1, 0.1)
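To make the hierarchy concrete, here is a minimal sketch that simply draws from it top to bottom. It is a forward simulation, not the MCMC sampler itself. It assumes Γ(0.1, 0.1) is a shape/rate parameterization, that the second argument of N(·, ·) is a variance, and that each coefficient gets an independent prior; none of these are spelled out above. The function name is hypothetical.

```python
import random


def simulate_latent_scores(covariates, shape=0.1, rate=0.1, seed=None):
    """Draw phi, beta, and the latent z_i once from the hierarchical model."""
    rng = random.Random(seed)
    # phi ~ Gamma(shape, rate). gammavariate takes (shape, scale),
    # so the scale is 1 / rate (assumed parameterization).
    phi = rng.gammavariate(shape, 1.0 / rate)
    # beta_j | phi ~ N(0, 1/phi), independently for each coefficient.
    prior_sd = (1.0 / phi) ** 0.5
    beta = [rng.gauss(0.0, prior_sd) for _ in range(len(covariates[0]))]
    # z_i | beta ~ N(x_i' beta, 1)
    z = [
        rng.gauss(sum(x * b for x, b in zip(row, beta)), 1.0)
        for row in covariates
    ]
    return phi, beta, z
```

Because a Γ(0.1, 0.1) prior puts a lot of mass near zero, 1/φ is often very large, which is what makes the prior on β vague.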
We can use the MCMC sampler to estimate the effect size of the variables/statistics of interest. This effect size will give a little idea of how much the voters collectively weight those statistics in their decision, at least subconsciously.
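To get a feel for what an effect size means on this scale, the sketch below computes the probability of each rank for a given value of x_{i}'β, using the fact that z_{i} is normal with variance 1. The cutpoints and the function names are hypothetical; in the actual model the cutpoints are estimated along with everything else.

```python
from math import erf, inf, sqrt


def normal_cdf(x):
    """Standard normal CDF; also works for +/- infinity."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def rank_probabilities(mean, tau3, tau2, tau1):
    """Return {rank: P(Rank = rank)} when z ~ N(mean, 1)."""
    # Interval edges from the classification table, lowest group first.
    edges = [-inf, 0.0, tau3, tau2, tau1, inf]
    probs = {}
    for i in range(5):
        lower, upper = edges[i], edges[i + 1]
        # The lowest interval corresponds to rank 5, the highest to rank 1.
        probs[5 - i] = normal_cdf(upper - mean) - normal_cdf(lower - mean)
    return probs
```

With made-up cutpoints of 1, 2, and 3, raising the linear predictor from 1.0 to 1.5 shifts probability toward the better ranks. That shift is what a positive coefficient on a statistic does.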
Okay, the gory math is done. Welcome back to anyone who skipped to this point.
So what statistics will we look at? Well, I decided to limit things to the numbers Tom Tango was looking at: ERA (Specifically ERA+), Wins, Innings Pitched, Complete Games, and Strikeouts (Specifically K%). I also included an intercept in the model and a variable called "Past Winner." It's defined as you'd expect, with value 1 if the pitcher had won a Cy Young previously in their career, and 0 otherwise.
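As a rough picture of what one row of the model's input might look like, here is a sketch of the covariate vector x_{i}. The class, the field names, and the ordering are assumptions for illustration, and whether the statistics were rescaled before fitting isn't stated, so none is applied here.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """One top-5 finisher's season (hypothetical container)."""
    era_plus: float
    wins: int
    innings_pitched: float
    complete_games: int
    k_pct: float
    past_winner: bool

    def covariates(self):
        """Return x_i: the intercept first, then the six predictors."""
        return [
            1.0,  # intercept
            self.era_plus,
            float(self.wins),
            self.innings_pitched,
            float(self.complete_games),
            self.k_pct,
            1.0 if self.past_winner else 0.0,  # "Past Winner" indicator
        ]


# A made-up stat line for illustration, not a real pitcher's season.
example = PitcherSeason(150.0, 18, 220.0, 3, 25.0, True)
x_i = example.covariates()
```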
To take a look at the changing effect size through the years, we're going to look at the data from 1989 to Year X, with X ranging from 2004 to 2013, and run the model just on those years. This will allow for a more gradual way of looking at the coefficients.
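The expanding windows described here can be sketched as follows. The function names and the "year" key are hypothetical, and the model fit that would run on each subset is left out.

```python
def expanding_windows(start_year=1989, first_end=2004, last_end=2013):
    """Return (start, end) pairs: 1989-2004, 1989-2005, ..., 1989-2013."""
    return [(start_year, end) for end in range(first_end, last_end + 1)]


def seasons_in_window(seasons, window):
    """Keep the seasons whose 'year' falls inside the window, inclusive."""
    start, end = window
    return [s for s in seasons if start <= s["year"] <= end]


windows = expanding_windows()
```

This yields ten datasets, each one adding a single year of voting to the one before it.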
What do the results show? Before we get to the wins effect, some general observations...
- The ERA+ effect has stayed pretty constant over the 10 datasets in question.
- The effect of being a former winner has decreased since 2004, but has stayed roughly constant since 2010. Being a former winner does slightly increase your chance of finishing higher in the race. However, this is most likely a function of good pitchers staying good rather than a voting bias.
- Innings Pitched effect hasn't changed much over time, and frankly has a small effect. This probably is due to all Cy Young top-5 candidates having pretty decent IP totals.
- The Complete Games effect has essentially disappeared, probably due to complete games disappearing.
- The K% effect has been increasing quickly, relatively speaking.
Okay, now to the effect that we're all interested in: the wins effect. Using all the data up to today, higher win totals do increase your chances of a higher finish, but not as much as in years past.
To get a more concrete idea, let's run the model on two datasets: 1989-2005 and 2006-2013. When comparing the effect size from these two datasets, the 1989-2005 data had a wins effect size of 0.355 while 2006-2013 was 0.142, or roughly a 60% decrease in effect size. While it could be that voters still value the win the same, they are at least giving pitchers with lower win totals more consideration than previously.
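The "roughly 60%" figure is just the relative change between the two effect sizes quoted above. A quick check, with a hypothetical helper name:

```python
def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


# Wins effect sizes quoted above: 1989-2005 versus 2006-2013.
change = relative_change(0.355, 0.142)
print(f"{change:.1%}")  # -60.0%
```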
Does this mean wins are going away entirely? Of course not. There is at least some aspect of pitcher ability found in wins, albeit a very small one. Plus, the more casual fan and fantasy baseball player will be married to wins for the foreseeable future. But does this mean voters are valuing the win less? In the end, it's a distinct possibility, or at least the data seem to be pointing that way.
. . .
Statistics courtesy of Baseball Reference.
Stephen Loftus is an editor at Beyond The Box Score. You can follow him on Twitter at @stephen__loftus.