How To Build And Use A Baseball Database
Colin Wyers has a fantastic post up over at Statistically Speaking about setting up and using a relational database for doing baseball analysis. If you're like me and still stuck using Excel, you'll want to check this out. And it's a Part One, meaning there's more to come. (Unless Colin is a Mel Brooks fan.)
7 months ago
Sky Kalkman
21 comments
0 recs |
Comments
Very interesting
I am a huge fan of MySQL, especially for large databases. The learning curve is large and steep though.
I would think that people who are at the point where they’re struggling with Excel would start using Access. You can do queries, it’s simpler to use, and you can do all sorts of math in it.
Stupid question: where do you get all of this data from?
Curt never met a buttered roll he didn't like.
by CoachOfEarl on
Nov 14, 2008 1:16 PM EST
reply
actions
0 recs
Access does a lot of nice things for you.
I’m not as comfrortable with it, though, and it costs quite a bit of money. (It also doesn’t handle large databases as well, and that’s an issue with Retrosheet data – I think that my events table is up to 8 GB, partitioned by year.) You can use Access as a frontend for MySQL, actually.
The problem with Excel isn’t that it’s hard, per se, but that it’s not meant to do a lot of the stuff that sabermetricians do.
As for a data source, start with the Baseball Databank.
by cwyers on
Nov 14, 2008 1:35 PM EST
up
reply
actions
0 recs
I keep forgetting there are people who didn't get access with excel
And thanks for the link
Curt never met a buttered roll he didn't like.
by CoachOfEarl on
Nov 14, 2008 5:57 PM EST
up
reply
actions
0 recs
like me
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 14, 2008 9:02 PM EST
up
reply
actions
0 recs
Also, there's no Access for the Mac
Anyone have thoughts on the OpenOffice database program?
And, regarding Access as a front-end for MySQL, how difficult is that to set up? Does it basically just remove the file-size limitations that Access has? Do you lose any Access functionality? Do you still have to deal with MySQL at all?
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on
Nov 15, 2008 12:20 PM EST
up
reply
actions
0 recs
I do 99% of my stuff in Excel.
But I think I’m going to try this. It seems easier, once you get it down pat.
by R.J. Anderson on
Nov 14, 2008 1:38 PM EST
reply
actions
0 recs
Is MySQL really that tough?
Has anyone worked with the Open Office database? I haven’t, but it it’s like Access, it might be a good option.
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 14, 2008 9:03 PM EST
reply
actions
0 recs
for free, that is
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 14, 2008 9:03 PM EST
up
reply
actions
0 recs
OOo Base is, shall we say, poorly documented.
The screenshots look similar to Access, but the documentation is very sparse.
As for the question of how tough MySQL is – download it, go through the tutorial, and let us know! I’d like to think I’ve written something to at least get you started.
by cwyers on
Nov 14, 2008 10:41 PM EST
up
reply
actions
0 recs
I did it last night!
I didn’t do the “extra credit,” but I think I did OK. It’s not really running in the background is it?
I like that it seems to be smaller and more efficient that I assume Access would be.
I’m eagerly awaiting Part 2, so that I can start completely move out of Excel for all my wasting-time-pretending-to-be-a-sabermetrician needs.
Maybe there can be a tutorial on how to take a table generated in Google Spreadsheets, MySQL, Excel, etc., and formatting it decently for SB Nation posts…. see my lame attempt (after way too much time spent on it) here.
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 15, 2008 11:41 AM EST
up
reply
actions
0 recs
oh, and thanks a bunch, this is a huge help
can’t wait for your do-it-yourself-Marcels
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 15, 2008 11:41 AM EST
up
reply
actions
0 recs
for extra credit...
write your own relational database software in Python!
(That’s what I did, not that I’m proud of it, but I started programming long before I knew anything, so I wrote a lot of Python code that clumsily does things that other programs/languages/etc do much better. But hey, I know my own RDBMS really, really well!)
Also, cheese.
by Jeff Sackmann on
Nov 15, 2008 12:18 AM EST
reply
actions
0 recs
I've been toying with this all morning.
Great stuff, thanks Colin.
by R.J. Anderson on
Nov 15, 2008 12:28 PM EST
reply
actions
0 recs
question from a noob
Colin: just tried to do the “extra credit” and keep running into errors. Here’s what I have:
1 select player id,
2 sum (ipouts)/3 as inn,
3 sum (er)/(inn*9) as era
4 from pitching
5 group by playerID
6 having ipouts >= 3000
7 order by era asc
8 limit 50;
I keep getting
Error Code : 1054
Unknown column ‘player’ in ‘field list’
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 15, 2008 3:43 PM EST
reply
actions
0 recs
Make playerid one word in line 1 maybe?
by R.J. Anderson on
Nov 15, 2008 4:08 PM EST
up
reply
actions
0 recs
Right.
I would also remove the spaces between the sum commands and the parenthesis.
Also, you’re can’t use an alias as a data column. So the final result looks like this:
select playerid,
sum(ipouts/3) as inn,
sum(er)/(sum(ipouts/3))*9 as era
from pitching
group by playerID
having inn >= 1000
order by era asc
limit 50;
Good catch on using Asc in the order by clause instead of Desc.
by cwyers on
Nov 15, 2008 4:13 PM EST
up
reply
actions
0 recs
and thanks for trying to compliment the one thing I did right
heh… knowing when and when not to use commas is really a challenge for me… I got it, though. Thanks.
can’t wait to figure out how to generate wOBAs and bRAAs for the entire history of baseball (although I know the weights wouldn’t be right for each era)… then add in positional adjustments and replacement value.
That might be a while.
Is this what you use to generate projections, Colin?
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on
Nov 15, 2008 4:46 PM EST
up
reply
actions
0 recs
This is exactly what I use.
I’m working on Part II of the tutorial right now, and I’m working on cleaning up that code for release, and hopefully that’ll go up either tomorrow or on Friday.
by cwyers on
Nov 15, 2008 5:28 PM EST
up
reply
actions
0 recs
Formulas aren't tricky
Multiply each stat by the appropriate constant:
Instead of just H+K+BB or whatever, do (Hx3) + (Kx-5) + (BBx7)
And replace the x’s with *’s. Damn SBN auto-format.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on
Nov 15, 2008 5:48 PM EST
up
reply
actions
0 recs








BtB on Facebook
BtB_Sky on Twitter




