clock menu more-arrow no yes mobile

Filed under:

Saberizing a Mac #6: Calculating FIP, FIP-, ERA-

Beyond the Box Score continues its classic series on creating your own sabermetric database.

FIPify your lifestyle.
FIPify your lifestyle.
Kyle Terada-USA TODAY Sports

I don't intend to give every metric it's own article, though it may certainly seem that way after both wOBA and FIP seem to be getting special attention this week. But calculating some of the minus metrics can get a bit complicated for beginners. At least, it certainly took me more than a few unsuccessful queries to get it right and I thought I might try and save you some of the aggravation.

As with calculating wOBA, the Fangraphs "GUTS" download will prove extremely helpful in generating single-season FIP totals for your database. Colin pointed out in Monday's edition of Saberizing that you can certainly calculate both wOBA weights and yearly FIP constants yourself using methods outlined at BASQL and at The Book Blog, but I've found that generally people prefer their calculations to match those at Fangraphs.

I'm sure we all know the FIP formula by heart at this point:

C= (HR*13+(BB+HBP)*3-K*2)/IP

Which in SQL will look like this:

SELECT CONCAT(m.namefirst," ",m.namelast) as NAME
, p.yearID
, ROUND(g.cFIP+(sum(p.HR)*13+(sum(p.BB)+sum(p.HBP))*3-sum(p.so)*2)/(sum(p.ipouts)/3),2) as FIP
, ROUND((sum(p.ipouts)/3),1) as IP
FROM pitching p, master m, GUTS g
WHERE p.playerID = m.playerID
AND p.yearID = 2012
AND g.season = p.yearID
GROUP BY p.playerID, yearID
HAVING IP > 150
ORDER BY FIP asc
LIMIT 10;

If your SQL is telling you there is an error like "BIGINT UNSIGNED value is out of range," then your integers are 'unsigned' (meaning they won't convert into negative values) and you may need to re-write your code like this:

, ROUND(g.cFIP+(CAST(SUM(p.hr) AS SIGNED)*13+CAST(SUM(p.hbp) AS SIGNED)*3+CAST(SUM(p.bb) AS SIGNED)*3-CAST(SUM(p.ibb) AS SIGNED)-CAST(SUM(p.so) AS SIGNED)*2)/
(SUM(p.ipouts)/3),2) as FIP

Either way, your results should match up exactly with Fangraphs FIP leaderboard for 2012. I always then prefer to remove IBB from the formula, even though Fangraphs does not, but this is entirely up to you, obviously.

FIP- / ERA-

Once you've got the formula down, then, getting the minus metrics shouldn't be too difficult. Although, fair warning here: You may have difficulty getting your minus metrics to line up perfectly with Fangraphs unless you download their Park Factors, which in their present format are a bit of a pain for those planning to download multiple seasons. If someone wants to make a giant file combining each season and share with the rest of the group, that would be wonderful. (I made it to 1980 and then nearly died of boredom).

It also depends on whether you want to compare to the league average or the MLB average. Typically your numbers won't be off more than a point or two, but at least you'll gain an appreciation for the true impact park factors have on our measurements. I'll show you what I do and you can adjust yours accordingly.

First you need to create a "League-Average" table and index that bad boy:

CREATE TABLE lg_averages AS

SELECT p.yearID, p.lgID

, ROUND(SUM(p.ER)/(SUM(p.ipouts/3))*9,2) as lgERA
, ROUND(g.cFIP+(SUM(p.HR)*13+(SUM(p.BB)+SUM(p.HBP))*3-SUM(p.so)*2)/(SUM(p.ipouts)/3),2) as lgFIP
FROM pitching p, GUTS g
WHERE p.yearID = g.season
GROUP BY yearID, lgID;

CREATE INDEX year_idx ON lg_averages (yearID);

Especially for recent seasons, I prefer to divide up the averages by AL/NL due to the fairly significant differences between the two leagues at the moment. (I believe Baseball-Reference does not do this for their ERA+ metric, however). This is achieved, you may have noticed, by adding "lgID" to the GROUP BY clause. If you prefer to create an "mlb_averages" table, simply remove "lgID" from the GROUP BY clause.

Then return to your original single-season FIP leaders query and divide the FIP formula by the lgFIP from our lg_averages table:

((g.cFIP+(SUM(p.HR)*13+(SUM(p.BB)+SUM(p.HBP))*3-SUM(p.so)*2)/(SUM(p.ipouts)/3)) / l.lgFIP

Now it's time to add your park factor. The Lahman database actually comes with a set of park factors that are certainly more than adequate for our purposes. You can find them in the "teams" table labeled "ppf" for "Pitcher's Park Factor". The ppf is designed to contribute towards ERA+ measurements, so we have to remember to flip it upside down, so to speak. You can go about this a number of ways, but for me it works best to subtract the ppf from 200. So the 2012 ppf for Seattle changes from 91 to 109 (or 200-91 = 109).

So in the end you have this:

, ROUND(((g.cFIP+(SUM(p.HR)*13+(SUM(p.BB)+SUM(p.HBP))*3-SUM(p.so)*2)/(SUM(p.ipouts)/3))/l.lgFIP*(200-t.ppf)),0) as FIP_minus

Remember now, you've added two more tables to the query, so you will need to define something on which they can relate. You'll need to join the pitching, GUTS, lg_averages, and teams tables all on 'yearID'/'season', join teams and pitching on 'teamID' and lg_averages and pitching on 'lgID'. Your final product should look something like this:

SELECT

CONCAT(m.namefirst," ",m.namelast) as NAME
, p.yearID, p.lgID

, ROUND(SUM(p.ER)/(SUM(p.ipouts/3))*9,2) as ERA
, ROUND(SUM(p.ER)/(SUM(p.ipouts/3))*9/l.lgERA*(200-t.ppf),0) as ERA_minus

, ROUND(g.cFIP+(sum(p.HR)*13+(sum(p.BB)+sum(p.HBP))*3-sum(p.so)*2)/(sum(p.ipouts)/3),2) as FIP
, ROUND(((g.cFIP+(sum(p.HR)*13+(sum(p.BB)+sum(p.HBP))*3-sum(p.so)*2)/(sum(p.ipouts)/3))/l.lgFIP*(200-t.ppf)),0) as FIP_minus
, ROUND((sum(p.ipouts)/3),1) as IP
, t.PPF
FROM pitching p, master m, GUTS g, lg_averages l, teams t
WHERE p.playerID = m.playerID
AND p.yearID = l.yearID and p.yearID = t.yearID AND g.season = p.yearID AND p.yearID = 2012
AND p.teamID = t.teamID
AND p.lgID = t.lgID AND p.lgID = l.lgID
GROUP BY p.playerID, yearID, lgID
HAVING IP > 150
ORDER BY FIP asc
LIMIT 10;

Returning this group of names:

NAME yearID lgID FIP ERA ERA_minus FIP_minus IP PPF
Gio Gonzalez 2012 NL 2.82 2.89 72 72 199.3 101
Stephen Strasburg 2012 NL 2.83 3.16 79 72 159.3 101
Felix Hernandez 2012 AL 2.84 3.06 82 75 232.0 91
Clayton Kershaw 2012 NL 2.89 2.53 67 77 227.7 96
Justin Verlander 2012 AL 2.94 2.64 63 69 238.3 103
David Price 2012 AL 3.05 2.56 67 79 211.0 93
Adam Wainwright 2012 NL 3.10 3.94 103 82 198.7 97
Cliff Lee 2012 NL 3.13 3.16 79 79 211.0 101
Wade Miley 2012 NL 3.15 3.33 79 76 194.7 106
R.A. Dickey 2012 NL 3.27 2.73 71 86 233.7 97

Now, before any of the SQL gurus out there get on my case, I should mention that this won't work for a player like Zack Greinke, who switched leagues in the middle of the season. We can look at how to handle cases like that at a later date. For now, though, I'm hoping you just get comfortable with the idea of making joins, creating lg-avg tables, and using park factors to expand your SQL game.

As always, don't be afraid to ask questions in the comments and the community at large and myself will be eager to assist you.

More from the Saberizing a Mac series:

Sky's SQL Basics

#2 Baseball Databank

#3 Win Shares

#4 Pitch f/x

#5 wOBA

#6 FIP, FIP-, ERA-

#7 Year-to-year correlations

Thanks to Fangraphs and Sean Lahman for making all of our lives a little nerdier.

Follow @JDGentile on twitter.