FIP is "just" a correlation between the "true outcomes" (HR, BB, K, and HBP) and ERA. Surely there is uncertainty in the correlation of FIP to ERA. But what does it mean to have uncertainty in FIP itself? FIP is just a weighted set of counting stats; it’s not like strikeouts are being miscounted. So where does the uncertainty come from? FIP is actually a "rate stat": HR, BB, K, and HBP are combined on a per-inning basis (innings pitched is in the denominator). A rate derived from a set of pass/fail experiments always carries sampling uncertainty.
Why do we care about the uncertainty in FIP? Because FIP is the basis for pitcher fWAR. Any uncertainty in FIP will be strongly reflected in the fWAR computation. Understanding the uncertainty in FIP is the first step in understanding the uncertainty of pitcher fWAR.
To quantify uncertainty in FIP, it helps to transform the equation into terms with easy-to-quantify confidence intervals (confidence intervals are a form of expressing uncertainty). First, a reminder of the definition of FIP:
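The equation image is not reproduced in this copy. For reference, the standard FanGraphs definition is written out below, where $C$ is the league-specific FIP constant that puts FIP on the ERA scale (its value varies by season).

```latex
\mathrm{FIP} = \frac{13\,\mathrm{HR} + 3\,(\mathrm{BB} + \mathrm{HBP}) - 2\,\mathrm{K}}{\mathrm{IP}} + C
```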
It’s tough to establish a confidence interval on a per-inning term because there is an unknown number of "events" in an inning. I’ll rearrange the definition of FIP (using just algebra) to generate a set of binomial terms:
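The rearranged equation is also missing from this copy. What follows is one rearrangement consistent with the terms named later in the text, not necessarily the exact form the author displayed. It uses $\mathrm{IP} = \mathrm{outs}/3$, divides numerator and denominator by total batters faced (TBF), and factors home runs through fly balls (FB) and balls in play (BIP):

```latex
\mathrm{FIP} = 3\,\frac{\mathrm{TBF}}{\mathrm{outs}}
\left[
  13\,\frac{\mathrm{HR}}{\mathrm{FB}}\cdot\frac{\mathrm{FB}}{\mathrm{BIP}}\cdot\frac{\mathrm{BIP}}{\mathrm{TBF}}
  + 3\left(\frac{\mathrm{BB}}{\mathrm{TBF}} + \frac{\mathrm{HBP}}{\mathrm{TBF}}\right)
  - 2\,\frac{\mathrm{K}}{\mathrm{TBF}}
\right] + C
```

Each ratio has a count of "successes" over a count of "trials", which is what makes a binomial confidence interval applicable.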
A binomial term is a pass/fail metric. For example, K% is binomial if you split plate appearances as either a strikeout or a non-strikeout. Note that this definition of FIP is exactly equivalent to the "standard notation": FIP computed this way yields the same values as the "normal way". Some of the minor terms can be further simplified:
I approximated pitcher-specific BIP/TBF and HBP/TBF with league averages because they’re "second order" terms. To begin explaining why, the next graph shows these eliminated terms across the 2014 qualified pitcher set.
The correlations with FIP are weak, and the standard deviations of the parameters are small. These terms are not going to "drive" the FIP computation. Eliminating these terms is a handy simplification and not fundamental to the FIP uncertainty quantification. The next plot is a correlation of actual FIP to FIP’ (using league average BIP/TBF and HBP/TBF) to justify eliminating the second-order terms.
Back to the definition of FIP’:
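The equation is missing here as well. A reconstruction consistent with the surrounding description is given below. An overbar marks a league-average value, $C$ is the FIP constant, and the symbol names are my notation rather than the author's:

```latex
\mathrm{FIP}' = 3\,\frac{1}{\mathrm{outs}/\mathrm{TBF}}
\left[
  13\,\frac{\mathrm{HR}}{\mathrm{FB}}\cdot\mathrm{FB\%}\cdot\overline{\left(\frac{\mathrm{BIP}}{\mathrm{TBF}}\right)}
  + 3\left(\mathrm{BB\%} + \overline{\left(\frac{\mathrm{HBP}}{\mathrm{TBF}}\right)}\right)
  - 2\,\mathrm{K\%}
\right] + C
```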
There are five pitcher-specific terms: HR/FB, FB%, K%, BB%, and outs/TBF (TBF/outs is inverted so that the ratio is less than one, which the binomial uncertainty calculation requires). Each term now has an easy-to-quantify uncertainty. I’ve included the equations for computing K% and HR/FB for reference (using the normal approximation for binomial confidence). For demonstration, David Price’s 2014 HR/FB ratio is 9.7%. The 90% confidence interval around this term is ±3.0 percentage points, or [6.7%, 12.7%].
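The normal approximation for a binomial proportion can be sketched in a few lines of Python. The fly-ball count of 260 is an assumed sample size chosen for illustration, not a figure from the article. The function name `binomial_ci_half_width` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binomial_ci_half_width(p, n, confidence=0.90):
    """Half-width of a two-sided CI on a proportion (normal approximation)."""
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


hr_fb = 0.097      # HR/FB rate quoted in the text
fly_balls = 260    # ASSUMPTION: illustrative sample size, not sourced data

half = binomial_ci_half_width(hr_fb, fly_balls)
print(f"HR/FB = {hr_fb:.3f} +/- {half:.3f}")
print(f"90% CI: [{hr_fb - half:.3f}, {hr_fb + half:.3f}]")
```

With a sample of that size, the half-width comes out near the ±3.0 percentage points quoted above. The same function applies to K% or BB% by passing TBF as `n`.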
Now I can substitute the upper and lower confidence interval values into FIP’ to find the sensitivity of each input parameter. The next graph shows this for K% and HR/FB.
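The substitution step can be illustrated with a small Python sketch. Everything numeric here is an assumption for demonstration: the league-average BIP/TBF and HBP/TBF, the FIP constant, and the example pitcher's rates are placeholder values, not the article's data. The form of `fip_prime` is a reconstruction from the terms named in the text.

```python
# ASSUMPTIONS: placeholder league values, not taken from the article
LEAGUE_BIP_PER_TBF = 0.70
LEAGUE_HBP_PER_TBF = 0.009
FIP_CONSTANT = 3.10


def fip_prime(hr_fb, fb_pct, k_pct, bb_pct, outs_per_tbf):
    """FIP' built from binomial rate terms (reconstructed form)."""
    hr_per_tbf = hr_fb * fb_pct * LEAGUE_BIP_PER_TBF
    per_tbf = 13 * hr_per_tbf + 3 * (bb_pct + LEAGUE_HBP_PER_TBF) - 2 * k_pct
    # IP = outs / 3, so per-inning = per-TBF * 3 / (outs / TBF)
    return 3.0 * per_tbf / outs_per_tbf + FIP_CONSTANT


def swing(params, name, half_width):
    """Change in FIP' when one input moves from its lower to upper CI bound."""
    upper = dict(params, **{name: params[name] + half_width})
    lower = dict(params, **{name: params[name] - half_width})
    return fip_prime(**upper) - fip_prime(**lower)


# Hypothetical pitcher, chosen only to exercise the functions
pitcher = dict(hr_fb=0.10, fb_pct=0.35, k_pct=0.25, bb_pct=0.05,
               outs_per_tbf=0.72)

hr_swing = swing(pitcher, "hr_fb", 0.030)
k_swing = swing(pitcher, "k_pct", 0.025)
print(f"FIP' swing from HR/FB: {hr_swing:+.3f}")
print(f"FIP' swing from K%:    {k_swing:+.3f}")
```

Only one input moves at a time while the others are held at their central values. That makes this a one-at-a-time sensitivity check rather than a joint uncertainty estimate.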
Not surprisingly, FIP’ is more than twice as sensitive to HR/FB as it is to K%. Performing the same technique for the other FIP’ parameters generates the following:
Since the largest uncertainties are principally independent of one another, they can be combined in quadrature (the square root of the sum of their squares) to establish the 90% confidence interval on FIP’: ±0.538 runs per 9 innings. The largest contributor to the uncertainty in FIP is the HR/FB rate. xFIP mitigates this contributor considerably by normalizing each pitcher’s HR/FB rate to the league average (xFIP therefore has less total uncertainty). Pitch-based techniques can be used to reduce the uncertainty in each of the other major contributors.
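Combining independent contributions in quadrature is a root-sum-of-squares calculation. In the sketch below, the per-parameter half-widths are hypothetical placeholders, not the values from the author's table, and `combine_in_quadrature` is an illustrative name.

```python
from math import sqrt


def combine_in_quadrature(half_widths):
    """Root-sum-of-squares of independent uncertainty contributions."""
    return sqrt(sum(h * h for h in half_widths))


# HYPOTHETICAL contributions to FIP' uncertainty (runs per 9 innings)
contributions = {
    "HR/FB": 0.40,
    "K%": 0.20,
    "BB%": 0.15,
    "FB%": 0.10,
    "outs/TBF": 0.05,
}

total = combine_in_quadrature(contributions.values())
print(f"combined half-width: {total:.3f}")
```

Because the terms are squared before summing, the largest contributor dominates the total. That is why reducing the HR/FB uncertainty has an outsized effect.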
The dynamic range of FIP is something like 2 to 5; almost no pitchers post a FIP below 2 or above 5. A FIP uncertainty of ±0.5 (a full interval about one run wide) is about 35% of FIP’s dynamic range. If FIP now seems like a crude tool, well, it is, at least on a single-season basis. Limitations notwithstanding, FIP can discriminate good pitchers from bad pitchers and great pitchers from average pitchers.
A few caveats:
· The 90% confidence interval on FIP (±0.538) was derived from a pitcher population with a mean of 194.7 IP (810 TBF). The size of the confidence interval should scale according to 1/√TBF. Rule of thumb: a four-season sample should have half the uncertainty of a single-season sample.
· Lower values of FIP have slightly less uncertainty. This small effect is likely a selection bias: great pitchers tend to pitch more (i.e., have larger samples).
· I attempted no correction for park factors or seasonal adjustment. These effects can’t be ignored when interpreting the values of FIP, but I expect they would have only a small effect on the uncertainty of the FIP values.
· The derived value applies only to the uncertainty in FIP, not the uncertainty in the FIP-to-ERA correlation.
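The 1/√TBF rule of thumb from the first caveat can be written as a one-line scaling function. This is a minimal sketch that takes the 0.538 half-width and 810 TBF from the text as the reference point. The function name `scaled_ci` is illustrative.

```python
from math import sqrt


def scaled_ci(ci_ref, tbf_ref, tbf):
    """Scale a reference CI half-width to a different sample size (1/sqrt(n))."""
    return ci_ref * sqrt(tbf_ref / tbf)


CI_REF = 0.538   # 90% half-width quoted in the text
TBF_REF = 810    # mean TBF of the population it was derived from

one_season = scaled_ci(CI_REF, TBF_REF, TBF_REF)
four_seasons = scaled_ci(CI_REF, TBF_REF, 4 * TBF_REF)
print(f"one season:   +/- {one_season:.3f}")
print(f"four seasons: +/- {four_seasons:.3f}")
```

Quadrupling the sample halves the interval, matching the four-season rule of thumb.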
David Price’s 2014 FIP was within half a run of his career FIP. This is consistent with the amount of uncertainty present in the FIP calculation. Don’t lose heart, David! A career 3.27 FIP is impressive by any standard.
. . .
All data courtesy of FanGraphs
Jonathan Luman is a system engineer with a background in aerospace. You can contact him at email@example.com or follow him on twitter @LumanJonathan.