Saberizing a Mac #5: Calculating wOBA

wOBAfy your existence. - Kyle Terada-USA TODAY Sports

After years of dormancy, Beyond the Box Score re-opens a classic guide to creating your own sabermetric workbench.

There was a recent fanpost here at Beyond The Box Score requesting assistance forming some basic baseball-related queries with SQL. In that thread, a number of other BTBers, both writers and readers alike, reported interest in conducting a sort of beginner's tutorial as well. So it would seem like today is the perfect opportunity to end the long hiatus of Beyond The Box Score's immensely popular DIY database-building series, "Saberizing a Mac."

If you are not familiar with SQL, but would like to begin setting up a baseball database, then I suggest you start from the beginning, because this will all seem like gibberish to you otherwise:

Part 1: Installing the Baseball Databank Database

After you're set up, I would then refer you to Colin Wyers's excellent beginner's tutorial now featured at The Hardball Times. This should get you acquainted with some basic query-language. Have fun with it, explore strange criteria, make up your own stats, etc.

(There is also a 'Saberizing' Part 3 on Win Shares and even a Part 4 on pitch f/x, but you don't necessarily need to know either of those today.)

If this all seems far too intimidating to you, know this: Up until 4-5 years ago I had absolutely zero experience with any form of data programing. The last computer class I took involved booting up a floppy disc of Oregon Trail on MS-DOS circa 1989. I've had zero formal training in any of this, but I've since built a fairly radical database after reading one article by Colin Wyers at the now-defunct Statistically Speaking website sometime back in 2007-ish. So if I can do it, so can you. No excuses.

Calculating wOBA

Since the html in the comments section in the Fanpost isn't exactly code-friendly, I'm instead going to respond to one of the more common code requests I've gotten, adding wOBA to your database.

I would first begin by downloading Fangraphs "GUTS" as a csv file, and then importing it into your SQL program as a separate table. SQL programs may differ on how to import a file as a table, but for my SQLpro (downloaded per the Saberizing a Mac series) it works like this:

  • Select File/Import.
  • Find the GUTS file. Click "Open"
  • Click "New". Name the file "GUTS." Click "Import."
  • Done.

It's extremely helpful then to index the "season" field right away. This is a nice habit to get into as it speeds up your queries significantly, thereby allowing you to get back to that incredibly successful social life as soon as possible.

CREATE INDEX year_idx ON GUTS (season);

We can then join the 'Batting' and 'GUTS' tables to create the wOBA formula of our own choosing. This is the code for Fangraphs' current wOBA formula:

SELECT b.playerID, b.yearID
, CONCAT(m.namefirst," ",m.namelast) as Name
, SUM(b.ab)+SUM(b.bb)+SUM(b.hbp)+COALESCE(SUM(b.sf),0)+COALESCE(SUM(b.sh),0) as PA
, ROUND(((SUM(b.bb)-COALESCE(SUM(b.ibb),0))*g.wbb + COALESCE(SUM(b.hbp),0)*g.whbp +
(SUM(b.h)-SUM(b.2b)-SUM(b.3b)-SUM(b.hr))*g.w1b +SUM(b.2b)*g.w2b +SUM(b.3b)*g.w3B +SUM(b.hr)*g.wHR)/
(SUM(b.ab)+(SUM(b.bb)-COALESCE(SUM(b.ibb),0))+COALESCE(SUM(b.hbp),0)+COALESCE(SUM(b.sf),0)),3) as wOBA

FROM batting b, master m, GUTS g
WHERE b.playerID = m.playerID AND b.yearID = g.season
AND b.yearID = 2012
GROUP BY playerID, yearID
HAVING PA > 499
ORDER BY wOBA DESC
LIMIT 50;

That should return you the top 50 leaders in wOBA for the 2012 season with at least 500 PA's. It should line up exactly with Fangraphs leaderboards here. If you notice a discrepancy somewhere, let us know and we'll try and smoke it out.

My top ten looks like this:

playerID yearID Name PA wOBA
cabremi01 2012 Miguel Cabrera 697 0.417
braunry02 2012 Ryan Braun 677 0.413
troutmi01 2012 Mike Trout 639 0.409
poseybu01 2012 Buster Posey 610 0.406
stantmi03 2012 Mike Stanton 501 0.405
mccutan01 2012 Andrew McCutchen 673 0.403
fieldpr01 2012 Prince Fielder 690 0.398
encared01 2012 Edwin Encarnacion 644 0.396
canoro01 2012 Robinson Cano 697 0.394
beltrad01 2012 Adrian Beltre 654 0.388

...

If your query returned something different, let us know in the comments and we'll try to help you figure out what the issue is.

Remember, I started out from nothing, so I have an enormous debt to pay forward, so don't be afraid to ask the most embarrassing, basic of questions!

More from the Saberizing a Mac series:

Sky's SQL Basics

#2 Baseball Databank

#3 Win Shares

#4 Pitch f/x

#5 wOBA

#6 FIP, FIP-, ERA-

#7 Year-to-year correlations

Thanks to Fangraphs, Colin Wyers, and The Hardball Times.

Follow @JDGentile on twitter.

X
Log In Sign Up

forgot?
Log In Sign Up

Forgot password?

We'll email you a reset link.

If you signed up using a 3rd party account like Facebook or Twitter, please login with it instead.

Forgot password?

Try another email?

Almost done,

By becoming a registered user, you are also agreeing to our Terms and confirming that you have read our Privacy Policy.

Join Beyond the Box Score

You must be a member of Beyond the Box Score to participate.

We have our own Community Guidelines at Beyond the Box Score. You should read them.

Join Beyond the Box Score

You must be a member of Beyond the Box Score to participate.

We have our own Community Guidelines at Beyond the Box Score. You should read them.

Spinner.vc97ec6e

Authenticating

Great!

Choose an available username to complete sign up.

In order to provide our users with a better overall experience, we ask for more information from Facebook when using it to login so that we can learn more about our audience and provide you with the best possible experience. We do not store specific user data and the sharing of it is not required to login with Facebook.

tracking_pixel_9351_tracker