Variance of the FIP Estimate

This is my first foray into anything resembling a significant amount of baseball data, so the following is a little messy and hard to read. The formatting for exponents did not come through right. Also, to save some of you some time: my math is rusty and I've likely made some mistakes, and at the end of it all I ended up finding nothing, which was alright with me but might not be of any interest to the rest of you.

I'm curious about the range around estimates and, in particular, what can affect that range. In the baseball world, I'm particularly interested in the range around the FIP estimate for a pitcher and how it might vary with changes in K, BB, and HR rates. So I toyed with some data.

First, my data:

I pulled seasonal pitching data from between 1974 and 2005 for all pitchers that threw more than 20 innings. This results in ~11,000 data points. I pulled Earned Runs, Strikeouts, BBs + HBPs, HRs, Batters Faced, and Innings Pitched.

Second, my theory:

FIP's coefficients derive from linear weights of events, which are themselves built from the base-out run expectancy states, plus a constant to bring it to the league average ERA. I'm going to take a slightly different route.

The formula that everyone is pretty comfortable with is:

FIP (pred_ERA) = -2 * K/IP + 3 * BB/IP + 13 * HR/IP + 3.2 * IP/IP

This is a linear equation of the form Y = XB:

ER/IP = A * K/IP + B * BB/IP + C * HR/IP + D

If I fit this equation with a generalized least-squares regression, using batters faced to pick my variance weights, then I can determine a covariance matrix for the four factors. I know variance will decrease with innings pitched, but what about the other components? I would expect variance to increase as a pitcher gives up home runs, since the runs allowed will vary heavily based on how many men are on base ahead of the home runs. I'd expect the same as walks increase, and I'd expect the opposite as strikeouts increase.

Third, my approach:

If I distribute the innings pitched then I can rewrite this as:

ER = A * K + B * BB + C * HR + D * IP

Now, since I'm not normalizing my components, I know that I'll have different variances around the error term for each of my observations. A pitcher that pitches 200 innings will be much closer to his mean expectation, on average, than a pitcher that pitches 20 innings. Instead of innings, I've used batters faced (BF) for this weighting. To do this, I have calculated a weighting vector as simply:

$$w_i = \sqrt{BF_i / \max(BF)}$$

That is to say, give 100% weight to the pitcher that faced the most batters, and weight each observation from a pitcher who faced fewer batters at a decreasing rate. I will use this as my variance weighting matrix instead of the identity matrix. Note that this is judgementally selected as I'm effectively assuming that the variance around the number of runs allowed (as a function of HR, BB, and Ks) grows as the number of batters faced grows but not at a 1-1 scale. Example: A pitcher that has faced 20 batters would probably have a standard deviation of 1-2 runs while a pitcher that has faced 2000 batters might have something closer to 50, with all else held equal.
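As a small illustration of this weighting, here is a sketch in Python with NumPy. The batters-faced totals are made up for the example and the variable names are mine, not part of the original analysis.

```python
import numpy as np

# Hypothetical batters-faced totals for four pitcher-seasons (not real data).
bf = np.array([20.0, 400.0, 900.0, 2000.0])

# w_i = sqrt(BF_i / max(BF)): the pitcher who faced the most batters gets
# weight 1, and everyone else gets a smaller weight.
w = np.sqrt(bf / bf.max())

print(np.round(w, 3))
```

Because of the square root, a pitcher with one hundredth of the maximum batters faced still gets a tenth of the maximum weight rather than a hundredth.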

My assumption for the error term around my predicted ER (ER-hat) is thus:

$$e_i \sim N(0, \sigma^2 \Omega)$$

Now, I'm ready to fit my least squares regression:

$$\beta = (X' \Omega X)^{-1} (X' \Omega Y)$$
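A minimal sketch of this fit in Python with NumPy, taking Ω to be the diagonal matrix of the weights described above (that is my reading of the text, so treat it as an assumption). The data here are synthetic, generated from FIP-like coefficients so the recovered values can be checked. Innings pitched stand in for batters faced, and none of the numbers come from the 1974-2005 dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic pitcher-seasons (assumed ranges, not real data).
ip = rng.uniform(20, 250, n)
k = ip * rng.uniform(0.4, 1.0, n)
bb = ip * rng.uniform(0.2, 0.5, n)
hr = ip * rng.uniform(0.05, 0.15, n)

# Weights: sqrt of workload relative to the maximum (IP stands in for BF).
w = np.sqrt(ip / ip.max())

# Design matrix for ER = A*K + B*BB + C*HR + D*IP.
X = np.column_stack([k, bb, hr, ip])
true_beta = np.array([-2.0, 3.0, 13.0, 3.2]) / 9  # per-inning scale
y = X @ true_beta + rng.normal(0.0, 1.0, n) * w   # noisier for heavier workloads

# beta = (X' Omega X)^-1 (X' Omega Y), with Omega = diag(w).
XtW = X.T * w  # same as X.T @ np.diag(w), without building the n-by-n matrix
beta = np.linalg.solve(XtW @ X, XtW @ y)

print(np.round(beta * 9, 2))  # back on the per-nine-innings scale
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice.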

Given that I'm approaching this differently, I did not expect to get the same weights that the more detailed event-by-event approach produces. In particular, the coefficients I got were:




[Table: β * 9, SE (β) * 9, and Student T for each term, with IP as the constant; the values did not come through.]

These are quite a bit different. The resulting ERA predictor would be:

pERA = (-1.25 * K + 3.08 * BB + 15.18 * HR) / IP + 2.13

FIP = (-2 * K + 3 * BB + 13 * HR) / IP + 3.20

My fit suggests that pitchers get less benefit from strikeouts than the FIP equation gives them, about the same damage from a walk, and far more damage from a home-run. The resulting constant is also significantly lower, at about +2.1 runs per nine innings instead of +3.2 runs per nine innings.
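To make the difference concrete, here is a short Python sketch that evaluates both predictors on one season line. The coefficients are the ones quoted above; the season line itself (150 IP, 110 K, 50 BB, 17 HR) is hypothetical.

```python
def fip(k, bb, hr, ip):
    """Standard FIP with the coefficients quoted above."""
    return (-2 * k + 3 * bb + 13 * hr) / ip + 3.20


def p_era(k, bb, hr, ip):
    """The fitted predictor from this post."""
    return (-1.25 * k + 3.08 * bb + 15.18 * hr) / ip + 2.13


# Hypothetical season line, chosen only for illustration.
line = dict(k=110, bb=50, hr=17, ip=150)

fip_value = fip(**line)
p_era_value = p_era(**line)

print(round(fip_value, 2), round(p_era_value, 2))
```

For this particular line the two estimates land about a quarter of a run apart, with the fitted predictor coming in lower.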

I'd have liked my coefficients to be closer to the FIP numbers, but I'm more interested in the covariance matrix. While my coefficients do come out quite a bit different, the unweighted standard deviation of my error term around my estimated ERA is .973 compared to the unweighted standard deviation of the error term around FIP of .982.

A weighted variance based on innings pitched changes these standard deviations to .595 and .684 respectively. This makes sense, as we expect the variance around our estimate to decrease as our underlying sample size increases, and it gives me a baseline error variance for comparison.
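One way to compute such a weighted standard deviation is sketched below in Python with NumPy. The exact weighting formula is not spelled out here, so this assumes a plain innings-weighted average of squared deviations. The residuals and innings are made-up numbers.

```python
import numpy as np


def weighted_std(resid, weights):
    """Standard deviation of resid, with each point weighted by weights."""
    resid = np.asarray(resid, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(resid, weights=weights)
    var = np.average((resid - mean) ** 2, weights=weights)
    return np.sqrt(var)


# Hypothetical ERA-scale residuals: big misses on short seasons,
# small misses on long ones.
resid = [2.0, -1.5, 0.3, -0.2]
innings = [20, 25, 180, 220]

unweighted = np.std(resid)
weighted = weighted_std(resid, innings)

print(round(unweighted, 3), round(weighted, 3))
```

With equal weights the function reduces to the ordinary population standard deviation, so the gap between the two numbers comes entirely from down-weighting the short seasons.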

The variance around a predicted variable is as follows:

$$\sigma_i^2 = \sigma^2 \left(1 + x_i (X' \Omega X)^{-1} x_i'\right) W_i$$
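A sketch of this calculation in Python with NumPy, reading the matrix term as the inverse of X'ΩX from the fit (an assumption on my part, since the exponents were lost). The design matrix is synthetic, innings stand in for batters faced, and `sigma2=1.0` is a placeholder rather than a fitted value.

```python
import numpy as np


def prediction_variance(x_new, X, w, sigma2, w_new=1.0):
    """sigma2 * (1 + x (X' Omega X)^-1 x') * w_new, with Omega = diag(w)."""
    XtWX = (X.T * w) @ X
    leverage = x_new @ np.linalg.solve(XtWX, x_new)
    return sigma2 * (1.0 + leverage) * w_new


rng = np.random.default_rng(1)
ip = rng.uniform(20, 250, 200)

# Synthetic design matrix with columns K, BB, HR, IP (not real data).
X = np.column_stack([
    ip * rng.uniform(0.4, 1.0, 200),
    ip * rng.uniform(0.2, 0.5, 200),
    ip * rng.uniform(0.05, 0.15, 200),
    ip,
])
w = np.sqrt(ip / ip.max())

# Scenario: 150 IP at 6.6 K/9, 3.0 BB/9, 1.0 HR/9, as raw counts.
x_new = np.array([110.0, 50.0, 150 / 9, 150.0])
var_new = prediction_variance(x_new, X, w, sigma2=1.0)

print(var_new)
```

The quadratic term is always positive, so the predicted variance is never smaller than the baseline `sigma2 * w_new`.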

I've generated a dataset of K/9, BB/9, and HR/9 values using 150 innings as my basis and a BABIP of exactly .300 to determine the number of batters faced for each scenario. I then calculate the individual variances using the covariance matrix from my fitted model, a modified W_i that averages out to 1 (rather than one with a non-unity average), and each one of my scenarios (example: 150 IP, 1 K/9, 1 BB/9, 0 HR/9). This gives me a range of reasonability around each FIP estimate with an overall standard deviation (on the ERA scale) of .615 when applied against my observed dataset.
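The conversion from a scenario to batters faced is not spelled out, so the Python sketch below is only one plausible way to do it. It assumes every out is either a strikeout or a ball in play, and that a fixed share of balls in play (the BABIP) fall for hits. Double plays, outs on the bases, and similar events are ignored. The function name is hypothetical.

```python
def batters_faced(ip, k9, bb9, hr9, babip=0.300):
    """Rough batters-faced estimate for a scenario (simplified model)."""
    # Convert per-nine rates to counts over the given innings.
    k, bb, hr = (rate * ip / 9 for rate in (k9, bb9, hr9))

    # Outs not recorded by strikeout must come from balls in play.
    outs_in_play = 3 * ip - k

    # If a BABIP share of balls in play are hits, the rest are outs.
    balls_in_play = outs_in_play / (1 - babip)

    return k + bb + hr + balls_in_play


# Baseline scenario: 150 IP, 6.6 K/9, 3.0 BB/9, 1.0 HR/9.
bf_baseline = batters_faced(150, 6.6, 3.0, 1.0)
print(round(bf_baseline, 1))
```

Under these assumptions, raising K/9 lowers the batters-faced estimate slightly, because strikeouts replace balls in play that would sometimes have been hits.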

And finally, on to some results.

With these ERA-scaled variances, I have plotted a 95% confidence range around various FIP estimates by K/9, BB/9, and HR/9 buckets to see how the range changes as one increases each of these variables.
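For a sense of scale, here is a Python sketch of a 95% range around a single FIP estimate. It assumes normally distributed errors (hence the 1.96 multiplier) and uses the overall .615 standard deviation as a stand-in for the scenario-specific value, which is a simplification.

```python
def interval_95(estimate, sd, z=1.96):
    """Symmetric 95% range under a normal approximation."""
    half_width = z * sd
    return estimate - half_width, estimate + half_width


# FIP for a pitcher at 6.6 K/9, 3.0 BB/9, 1.0 HR/9 (rates per nine, so divide by 9).
baseline_fip = (-2 * 6.6 + 3 * 3.0 + 13 * 1.0) / 9 + 3.2

low, high = interval_95(baseline_fip, sd=0.615)
print(round(baseline_fip, 2), round(low, 2), round(high, 2))
```

The range is about 2.4 runs wide, which is why changes of a few thousandths of a run in the standard deviation are hard to see on a graph.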

Let's start with innings pitched, where we'll find an obvious result. Note that, for innings, I have used all of the observed data and calculated the total FIP for the group within that bucket of innings. You can see that it hovers at league-average until dipping a bit around 200 innings, where we would expect the more elite pitchers to be sitting anyhow.



As the number of innings a pitcher throws increases, the variance around his FIP decreases rather quickly up to about 100 innings. Afterwards, the variance continues to decrease but at a slower and slower rate. This makes intuitive sense as the sample size of batters faced is increasing, so BABIP and LOB% should both normalize with higher inning counts.

This is the only graph based on actual observed data from 1974 through 2005. For each of the following graphs I have fixed the innings pitched at 150, the K/9 at 6.6, the BB/9 at 3.0, and the HR/9 at 1.0. Only one variable is allowed to change at a time.

Now let's look at K/9:



Indistinguishable in this graph is a very minutely decreasing standard deviation, at the rate of ~.001 runs per nine for every additional strikeout per nine innings. Although I had guessed that a higher strikeout rate might reduce variance, this isn't particularly surprising. A strikeout is simply a nearly-guaranteed out, while a ball in play is only usually an out. Replacing some balls in play with strikeouts doesn't do a lot to change the range of results, and while there is a decrease in the variance around that estimate, it's really not significant.



For BB rate, I expected more variance around the predicted ERA as a pitcher walked more batters. My logic was that if a pitcher had more walks (which are 100% non-outs) with everything else held stable, the error range around the estimated ERA would increase, since there are now more potential scorers than there would have been. As seen here, there is no significant change. For every 1 BB/9 increase, the error range around the estimated ERA increases by .002 runs per nine. Despite the additional runners, it seems that the LOB% is stable enough at this point to maintain a similar error range.

For HR rate, I expected a similarly increasing range as I had expected for walks. Simply put, if a pitcher is more likely to give up a home-run, then he's also more likely to give up a home-run with men on-base. The variance does increase at an alarming rate of .005 per additional home-run every nine innings. Since pitchers don't regularly jump by .5 HR/9 over a season, that's .0005 runs every nine innings of deviation for every .1 HR/9.

For each of these graphs, I've specifically targeted a pretty stable group of pitchers. Those pitchers that allow 3 BB/9, 1 HR/9, and 6.6 K/9 are pretty close to middle-of-the-pack guys, which is where most of my projection points are. Given that, I'm graphing the areas where the variance should be minimized as a result of the regression. While this demonstrates that any one of K/9, BB/9, or HR/9 does not have a significant impact on variance for the typical pitcher, I continue to wonder about the potential covariance among the terms -- I would think a pitcher with high K, low BB, and low HR rates would have a much tighter range around his FIP than a pitcher with low K, high BB, and high HR -- but those are questions for a different day.

I've overlaid the ERAs of 35 pitcher-seasons from 1974 through 2005 that fall near my range (BB/9 between 2.8 and 3.2, HR/9 between .9 and 1.1, with only the K-rate varying) on my K-rate graph from above.



If you made it this far, kudos and thanks. When (if) I try this again, I hope to have something a bit cleaner with an actual outcome at the end.