Beyond the Box Score: An SB Nation Community

Navigation: Jump to content areas:



Sports blogs for fans, by fans.
Around SBN: NHL Free Agency Coverage: NHL Rumors, NHL Signings & Trades


How To Build And Use A Baseball Database

Colin Wyers has a fantastic post up over at Statistically Speaking about setting up and using a relational database for doing baseball analysis. If you're like me and still stuck using Excel, you'll want to check this out. And it's a Part One, meaning there's more to come. (Unless Colin is a Mel Brooks fan.)

Link 7 months ago Limes_125_tiny Sky Kalkman Comment 21 comments 0 recs |

Story-email Email Printer Print

Comments

Display:

Very interesting

I am a huge fan of MySQL, especially for large databases. The learning curve is large and steep though.

I would think that people who are at the point where they’re struggling with Excel would start using Access. You can do queries, it’s simpler to use, and you can do all sorts of math in it.

Stupid question: where do you get all of this data from?

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 1:16 PM EST reply reply actions actions   0 recs

Access does a lot of nice things for you.

I’m not as comfrortable with it, though, and it costs quite a bit of money. (It also doesn’t handle large databases as well, and that’s an issue with Retrosheet data – I think that my events table is up to 8 GB, partitioned by year.) You can use Access as a frontend for MySQL, actually.

The problem with Excel isn’t that it’s hard, per se, but that it’s not meant to do a lot of the stuff that sabermetricians do.

As for a data source, start with the Baseball Databank.

by cwyers on Nov 14, 2008 1:35 PM EST to parent up reply reply actions actions   0 recs

I keep forgetting there are people who didn't get access with excel

And thanks for the link

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 5:57 PM EST to parent up reply reply actions actions   0 recs

like me

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:02 PM EST to parent up reply reply actions actions   0 recs

Also, there's no Access for the Mac

Anyone have thoughts on the OpenOffice database program?

And, regarding Access as a front-end for MySQL, how difficult is that to set up? Does it basically just remove the file-size limitations that Access has? Do you lose any Access functionality? Do you still have to deal with MySQL at all?

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 12:20 PM EST to parent up reply reply actions actions   0 recs

I do 99% of my stuff in Excel.

But I think I’m going to try this. It seems easier, once you get it down pat.

by R.J. Anderson on Nov 14, 2008 1:38 PM EST reply reply actions actions   0 recs

Is MySQL really that tough?

Has anyone worked with the Open Office database? I haven’t, but it it’s like Access, it might be a good option.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:03 PM EST reply reply actions actions   0 recs

for free, that is

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:03 PM EST to parent up reply reply actions actions   0 recs

OOo Base is, shall we say, poorly documented.

The screenshots look similar to Access, but the documentation is very sparse.

As for the question of how tough MySQL is – download it, go through the tutorial, and let us know! I’d like to think I’ve written something to at least get you started.

by cwyers on Nov 14, 2008 10:41 PM EST to parent up reply reply actions actions   0 recs

I did it last night!

I didn’t do the “extra credit,” but I think I did OK. It’s not really running in the background is it?

I like that it seems to be smaller and more efficient that I assume Access would be.

I’m eagerly awaiting Part 2, so that I can start completely move out of Excel for all my wasting-time-pretending-to-be-a-sabermetrician needs.

Maybe there can be a tutorial on how to take a table generated in Google Spreadsheets, MySQL, Excel, etc., and formatting it decently for SB Nation posts…. see my lame attempt (after way too much time spent on it) here.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 11:41 AM EST to parent up reply reply actions actions   0 recs

oh, and thanks a bunch, this is a huge help

can’t wait for your do-it-yourself-Marcels

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 11:41 AM EST to parent up reply reply actions actions   0 recs

for extra credit...

write your own relational database software in Python!

(That’s what I did, not that I’m proud of it, but I started programming long before I knew anything, so I wrote a lot of Python code that clumsily does things that other programs/languages/etc do much better. But hey, I know my own RDBMS really, really well!)

Also, cheese.

by Jeff Sackmann on Nov 15, 2008 12:18 AM EST reply reply actions actions   0 recs

question from a noob

Colin: just tried to do the “extra credit” and keep running into errors. Here’s what I have:

1 select player id,
2 sum (ipouts)/3 as inn,
3 sum (er)/(inn*9) as era
4 from pitching
5 group by playerID
6 having ipouts >= 3000
7 order by era asc
8 limit 50;

I keep getting

Error Code : 1054
Unknown column ‘player’ in ‘field list’

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 3:43 PM EST reply reply actions actions   0 recs

Right.

I would also remove the spaces between the sum commands and the parenthesis.

Also, you’re can’t use an alias as a data column. So the final result looks like this:

select playerid,
sum(ipouts/3) as inn,
sum(er)/(sum(ipouts/3))*9 as era
from pitching
group by playerID
having inn >= 1000
order by era asc
limit 50;

Good catch on using Asc in the order by clause instead of Desc.

by cwyers on Nov 15, 2008 4:13 PM EST to parent up reply reply actions actions   0 recs

and thanks for trying to compliment the one thing I did right

heh… knowing when and when not to use commas is really a challenge for me… I got it, though. Thanks.

can’t wait to figure out how to generate wOBAs and bRAAs for the entire history of baseball (although I know the weights wouldn’t be right for each era)… then add in positional adjustments and replacement value.

That might be a while.

Is this what you use to generate projections, Colin?

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 4:46 PM EST to parent up reply reply actions actions   0 recs

This is exactly what I use.

I’m working on Part II of the tutorial right now, and I’m working on cleaning up that code for release, and hopefully that’ll go up either tomorrow or on Friday.

by cwyers on Nov 15, 2008 5:28 PM EST to parent up reply reply actions actions   0 recs

Formulas aren't tricky

Multiply each stat by the appropriate constant:

Instead of just H+K+BB or whatever, do (Hx3) + (Kx-5) + (BBx7)

And replace the x’s with *’s. Damn SBN auto-format.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 5:48 PM EST to parent up reply reply actions actions   0 recs

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Start posting on Beyond the Box Score »

Join SB Nation and dive into communities focused on all your favorite teams.

FanPosts

Community blog posts and discussion.

Recent FanPosts

Small
SLG and Speed
Small
Interleague Attendance Nonsense
Limes_125_small
A Note About Becoming a BtB Author: Contributing to the Community Helps
Small
Is Adrian Beltre underrated...?
Limes_125_small
How Do You Like the New Daily Link Roundup Posts?
Stlouiscardinals_small
Depth Charts Help
Zorilla_small
How Do You Measure a Pitching Coach?
Limes_125_small
Looking For SQL & Tech Geek Help For Collaborative Projects
Small
When do MLBers get paid...?
Limes_125_small
Help Me Expand Who I Follow On Google Reader

Post_icon New FanPost All FanPosts Carrot-mini

FanShots

Quick hits of video, photos, quotes, chats, links and lists that you find around the web.

Recent FanShots

BtB Ranked Second Best Sporst Blog By Wikio
There's your human element.  Why, when the technology is readily available, are humans still calling balls and strikes?
Fire Jim Leyland: Fu-Te Ni Follow Up; Concern Over Big Three?
Rany Gets Banned By the Royals
Contract Retrospective: Vernon Wells' 7-year, $126 Million Contract
The Rockets are innovative
Flip Flop Fly Ball
Yanks Considered Trading Rivera For Wells In '95: MLB Rumors - MLBTradeRumors.com
Bullpen Usuage Charts for Last 5 days
MiLB Game of the Week

Post_icon New FanShot All FanShots Carrot-mini

Most Commented

Subscribe to BtB via Email

Enter your email address:

Delivered by FeedBurner

BtB Goes Social

BtB on Facebook

BtB_Sky on Twitter


Managers

Mos-def-the-ecstatic_small R.J. Anderson

Limes_125_small Sky Kalkman

Editors

Rawlings_baseball_bigger_small Dan Turkenkopf

Face_small Harry Pavlidis

770insig_small Jeff Zimmerman (TucsonRoyal)

Rickstache_small erik

Authors

Jinaz-reds-avatar_small JinAZ

Hms_surprise_small Graham

Wisc19cropped2_small jhmoore

Raysring1_small Tommy Rancel

E52205a2_small tbsmkdn

Official Partner of Yahoo! Sports