Navigation: Jump to content areas:


Pro Quality. Fan Perspective.
Login-facebook
Around SBN: Dissecting Nick Diaz's Positive Drug Test

Baseball Databank Data Dump, I: FIP Constants and Runs Per Win

Hi. I'm Matt. I write here (occasionally) now. 

Perhaps I could have picked something a bit more exciting for my first post, but I kept missing stuff, getting busy with other things, and so on. But I think this might be of utility to some people -- I know there were times I've wished I could easily access stuff like this. 

Like many people, I originally got into SQL databases for baseball thanks to a few posts by Colin Wyers at the sadly-departed Statistically Speaking blog (Colin has since posted new versions of his introductions at The Hardball Times.)

This isn't about SQL, though. It's about putting some data out there for people who want to use FIP scaled to a particular season or need a runs-to-win conversion but don't want to mess with the database stuff themselves. I'll also publish the queries down below for those curious (they are very simple), which I came up with for use in MySQl and the pitching table in the freely available Baseball Databank.


FIP Scaled Historically

You probably already know about FIP (Fielding Indepedent Pitching),  created by Tom Tango. (If not, click here for a good introduction), the simple and beautiful stat that gives us an idea of how well a pitcher performed independently of batted ball luck and the fielders behind him by focusing on things he can control. The basic formula is:

((13*HR)+(3*(BB+HBP-IBB))-(2*K))/IP+constant

Where the "constant" is usually around 3.2, but is adjusted for each season/entity to that the leagues season/entity ERA is the same as the season/entity FIP. Obviously, it's a pain to calculate each season individually, but that's why computers were invented. So I came up with an easy query for that. Not only that, but if you're like me and many others, you hate the "earned run" rule for various reasons I can't get into here. Not only that, but you realize that when, say, calculating Wins Above Replacement, using a stat scaled to ERA undervalues a pitcher relative to position players. So, instead of scaling FIP to ERA, why not to RA? Well, I did both with this query. And here are the results for easy bookmarking:

Historical FIP Scales, 1871-2009

The numbers in the "FIP-ERA" and "FIP-RA" columns are the "constants" as listed above that you need to use to scale them to the ERA/RA for the season. I have two separate sheets, one that groups all the major leagues by season, and the other that groups by season and league (i.e., separates AL and NL).

 

Runs Per Win

You can read the link in the heading if you aren't sure what this means. In the contemporary baseball era, you're pretty safe assuming that 10 runs = one win. But sometimes you want to be more precise, so I've come up with a table for that, too, comparing the results of three different runs per win estimators

  1. Simple Runs Per Game (or lgRA times two).
  2. Tom Tango's simple formula: lgRA*1.5+3
  3. Patriot's PythagenPat version: (league Runs Per Game^(1-z)) times two, where z is between 0.27 and 0.29. I use 0.287 because that's what I read originally and I'm too lazy to figure out if I should change it.

Here is the table:

Runs Per Win, 1871-2009

As noted in the first link in this section, the Pythagorean win expectation is the heart of the runs-to-win conversion, and since PythagenPat is the most accurate win expectation estimator across all contexts, so will its run to win conversion be the most accurate (although, again, you're pretty safe just going with 10 in a pinch). But I actually still use Tango's version in a lot of my SQL's because it's easier to work with quickly, and I'm not sure the loss in precision is worth all the effort, although I will eventually change all my SQLs over to PythagenPat. And, as you can see from the table, in the contemporary era there really isn't all that much of a difference, and even RPG isn't that far off.

I hope this is useful to someone out there, and if not, well, I hope I'm more interesting next time. I'll spare you the SQLs on the front page, but I do have them after the break for those that are interested.

Star-divide

These are for MySQL with the Baseball Databank. Both of these are the "group by league" versions, but if you take out the lgID in the "select" and "group" lines, it will group by all the major leagues of that year.

FIP Constants

SELECT p.yearID, p.lgid , SUM(COALESCE(IPOuts,0)/3) AS IP ,((SUM(COALESCE(ER,0))/SUM(COALESCE(IPOuts,0)/3))*9)AS lgERA , ,((SUM(COALESCE(ER,0))/SUM(COALESCE(IPOuts,0)/3))*9) - ((SUM(COALESCE(HR,0)*13)+(SUM(COALESCE(BB,0))+SUM(COALESCE(HBP,0))- SUM(COALESCE(IBB,0)))*3-SUM(COALESCE(SO,0))*2)/(SUM(COALESCE(IPOuts,0))/3)) AS FIPERScale , ,((SUM(COALESCE(R,0))/SUM(COALESCE(IPOuts,0)/3))*9) AS lgRA , ,((SUM(COALESCE(R,0))/SUM(COALESCE(IPOuts,0)/3))*9) - ((SUM(COALESCE(HR,0)*13)+(SUM(COALESCE(BB,0))+SUM(COALESCE(HBP,0))- SUM(COALESCE(IBB,0)))*3-SUM(COALESCE(SO,0))*2)/(SUM(COALESCE(IPOuts,0))/3)) AS FIPRAScale FROM bdb.pitching p GROUP BY p.yearID, p.lgid ORDER BY p.yearID DESC

Runs Per Win

 

SELECT yearID, lgID

, (((SUM(R)/SUM(IPOuts/27))*1.5)+3) AS RPWTango

, (SUM(R)/SUM(IPOuts/27))*2 AS RPG

, 2*(POW(SUM(R)/SUM(IPOuts/27)*2, 0.713)) PythPat

FROM bdb.pitching

GROUP BY yearID, lgID

ORDER BY yearID DESC

;

Comment 15 comments  |  2 recs  | 

Do you like this story?

Comments

Display:

Very cool stuff

I’m definitely going to find a bunch of ways to use this stuff.

I like baseball.
I write for Beyond the Box Score and The Hardball Times Fantasy

by Satchel Price on Apr 8, 2010 12:35 PM EDT via mobile reply actions  

Thanks, I hope it's usuable

I hope I’m right in assuming that people can round for themselves?

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Apr 8, 2010 12:37 PM EDT up reply actions  

Welcome Matt

Neat to see how rapidly and semi-permanently the FIP constant drops below 3 once you get into the mid-90’s. The classic 3.2 constant estimate works pretty well most of the time these days, but it’s sort of at the extreme high end of the variation.

How ’bout a plot of FIP constant vs. year? :)
-j

by JinAZ on Apr 8, 2010 12:42 PM EDT reply actions  

Sounds easy enough.. I'll do it in the next week or so or whatever

do you want it scaled to ERA or RA?

ON a different topic, did you ever come up with an easy way of getting custom lwts scaled to run environment w/ BaseRuns that didn’t totally overweight every event?

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Apr 8, 2010 12:45 PM EDT up reply actions  

And, yes, I've read Patriots articles

I’m just too dumb

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Apr 8, 2010 12:45 PM EDT up reply actions  

Not really

The base runs equation that I published in the minor league run environment article at THT last month does a good job across most run environments in modern baseball (major and minor leagues). Misses by like 8 runs per season on average. I might have too much weight on the outs, or else I’m missing too much baserunning or something. But I forced the b-coefficients to match up to Colin’s empirical linear weights for 2007-2008 MLB rather than actual run totals. I like that approach better.

Oh, and I was just talking about the ERA constant…though it might be interesting to show how they diverge as the error rates pick up back in the no-gloves era.

Cheers,
Justin

by JinAZ on Apr 8, 2010 11:19 PM EDT up reply actions  

I'm sure wOBA in BDB is your next topic.

Right?

rzar.wordpress.com
draysbay.com
raysprospects.com

by RZ on Apr 8, 2010 2:41 PM EDT reply actions  

Actually, I was thinking of just posting the recent weights and discussing them

I’m not a SQL teacher, I just put them in here because I hadn’t seen t hem before and to “show my work.”

Might also do historical BaseRUns-FIP at some point

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Apr 8, 2010 3:14 PM EDT up reply actions  

That is great Colin

I tried it around when BDB came out. But the wOBA numbers I got were about 5-15 points lower than Fan Graphs which derives their numbers from the same data set. I think it might have been my wOBA equation.

rzar.wordpress.com
draysbay.com
raysprospects.com

by RZ on Apr 8, 2010 4:09 PM EDT up reply actions  

Pythpat RPW

The Tango linear formula is actually a cousin to the more involved Pythagenpat formula. It’s basically what you get when you linearize the Pythagenpat formula for a R=RA team at 9 RPG. (Tango developed it independently of Pythpat, FWIW). So the two are very much related and for normal enviornments, you lose very little by using the simple formula.

by Walksaber on Apr 9, 2010 12:37 PM EDT reply actions  

Addendum

I should have noted that the Tango formula will match Pythpat even better if you add 2.75 instead of 3. However, it actually reduced accuracy in predicting W% for real teams ever-so-slightly when figured in that manner, IIRC.

by Walksaber on Apr 9, 2010 12:50 PM EDT up reply actions  

thanks

interesting

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Apr 9, 2010 2:13 PM EDT up reply actions  

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?

FanPosts

Community blog posts and discussion.

Recent FanPosts

Img_3830_small
BtBS Fantasy League
Small
Context Neutral Run and RBI projections
Small
Free Agent Compensation
Img_0001_small
Value of Various Plate Approaches
Strike_three2_small
Effect of Foul Area on Strikeouts: AL 1954-68: Erratum
Small
Baseball on a stick
Small
Player Evaluating Statistic
Baseball_small
Rays Outfield: Cheap but Extremely Productive
Small
A new xBABIP
Small
Jack Morris "pitching to the score"

+ New FanPost All FanPosts >

Follow us on Facebook!

Follow us on Twitter!

SaberGraphics

MLB Daily Dish

Get the latest MLB Trade Rumors, Transactions, and News at MLB Daily Dish!


Managing Editor:

Jbopp-kc_small Justin Bopp

Columnists:

Adam_small adarowski

Dme_small Satchel Price

Closeup4_small J-Doug

Carlosicon_small Julian Levine

Billy_and_daddy_4th_of_july_small Bill Petti

Featuring:

Dayton_small Jeff Zimmerman

12475953_small Jacob Peterson

Picture-6_small Chris St. John

Btbpro_small Dave Gershman

229331_10150183361996591_674441590_6760167_6637860_n3_small Lewie Pollis

Img_3830_small David Fung