Navigation: Jump to content areas:


Pro Quality. Fan Perspective.
Login-facebook
Around SBN: Did The Giants Run Buddy Ryan's 'Polish Goalline'?

Colin Wyers has a fantastic post up over at Statistically Speaking about setting up and using a relational database for doing baseball analysis. If you're like me and still stuck using Excel, you'll want to check this out. And it's a Part One, meaning there's more to come. (Unless Colin is a Mel Brooks fan.)

about 3 years ago Limes_125_tiny Sky Kalkman 21 comments 0 recs  | 

Story-email Email Printer Print

Comments

Display:

Very interesting

I am a huge fan of MySQL, especially for large databases. The learning curve is large and steep though.

I would think that people who are at the point where they’re struggling with Excel would start using Access. You can do queries, it’s simpler to use, and you can do all sorts of math in it.

Stupid question: where do you get all of this data from?

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 1:16 PM EST reply actions  

Access does a lot of nice things for you.

I’m not as comfrortable with it, though, and it costs quite a bit of money. (It also doesn’t handle large databases as well, and that’s an issue with Retrosheet data – I think that my events table is up to 8 GB, partitioned by year.) You can use Access as a frontend for MySQL, actually.

The problem with Excel isn’t that it’s hard, per se, but that it’s not meant to do a lot of the stuff that sabermetricians do.

As for a data source, start with the Baseball Databank.

by cwyers on Nov 14, 2008 1:35 PM EST up reply actions  

I keep forgetting there are people who didn't get access with excel

And thanks for the link

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 5:57 PM EST up reply actions  

like me

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 14, 2008 9:02 PM EST up reply actions  

Also, there's no Access for the Mac

Anyone have thoughts on the OpenOffice database program?

And, regarding Access as a front-end for MySQL, how difficult is that to set up? Does it basically just remove the file-size limitations that Access has? Do you lose any Access functionality? Do you still have to deal with MySQL at all?

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 12:20 PM EST up reply actions  

I do 99% of my stuff in Excel.

But I think I’m going to try this. It seems easier, once you get it down pat.

by R.J. Anderson on Nov 14, 2008 1:38 PM EST reply actions  

Is MySQL really that tough?

Has anyone worked with the Open Office database? I haven’t, but it it’s like Access, it might be a good option.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 14, 2008 9:03 PM EST reply actions  

for free, that is

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 14, 2008 9:03 PM EST up reply actions  

OOo Base is, shall we say, poorly documented.

The screenshots look similar to Access, but the documentation is very sparse.

As for the question of how tough MySQL is – download it, go through the tutorial, and let us know! I’d like to think I’ve written something to at least get you started.

by cwyers on Nov 14, 2008 10:41 PM EST up reply actions  

I did it last night!

I didn’t do the “extra credit,” but I think I did OK. It’s not really running in the background is it?

I like that it seems to be smaller and more efficient that I assume Access would be.

I’m eagerly awaiting Part 2, so that I can start completely move out of Excel for all my wasting-time-pretending-to-be-a-sabermetrician needs.

Maybe there can be a tutorial on how to take a table generated in Google Spreadsheets, MySQL, Excel, etc., and formatting it decently for SB Nation posts…. see my lame attempt (after way too much time spent on it) here.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 15, 2008 11:41 AM EST up reply actions  

oh, and thanks a bunch, this is a huge help

can’t wait for your do-it-yourself-Marcels

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 15, 2008 11:41 AM EST up reply actions  

for extra credit...

write your own relational database software in Python!

(That’s what I did, not that I’m proud of it, but I started programming long before I knew anything, so I wrote a lot of Python code that clumsily does things that other programs/languages/etc do much better. But hey, I know my own RDBMS really, really well!)

Also, cheese.

by Jeff Sackmann on Nov 15, 2008 12:18 AM EST reply actions  

question from a noob

Colin: just tried to do the “extra credit” and keep running into errors. Here’s what I have:

1 select player id,
2 sum (ipouts)/3 as inn,
3 sum (er)/(inn*9) as era
4 from pitching
5 group by playerID
6 having ipouts >= 3000
7 order by era asc
8 limit 50;

I keep getting

Error Code : 1054
Unknown column ‘player’ in ‘field list’

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 15, 2008 3:43 PM EST reply actions  

Right.

I would also remove the spaces between the sum commands and the parenthesis.

Also, you’re can’t use an alias as a data column. So the final result looks like this:

select playerid,
sum(ipouts/3) as inn,
sum(er)/(sum(ipouts/3))*9 as era
from pitching
group by playerID
having inn >= 1000
order by era asc
limit 50;

Good catch on using Asc in the order by clause instead of Desc.

by cwyers on Nov 15, 2008 4:13 PM EST up reply actions  

and thanks for trying to compliment the one thing I did right

heh… knowing when and when not to use commas is really a challenge for me… I got it, though. Thanks.

can’t wait to figure out how to generate wOBAs and bRAAs for the entire history of baseball (although I know the weights wouldn’t be right for each era)… then add in positional adjustments and replacement value.

That might be a while.

Is this what you use to generate projections, Colin?

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by Matt Klaassen on Nov 15, 2008 4:46 PM EST up reply actions  

This is exactly what I use.

I’m working on Part II of the tutorial right now, and I’m working on cleaning up that code for release, and hopefully that’ll go up either tomorrow or on Friday.

by cwyers on Nov 15, 2008 5:28 PM EST up reply actions  

Formulas aren't tricky

Multiply each stat by the appropriate constant:

Instead of just H+K+BB or whatever, do (Hx3) + (Kx-5) + (BBx7)

And replace the x’s with *’s. Damn SBN auto-format.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 5:48 PM EST up reply actions  

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?

FanPosts

Community blog posts and discussion.

Recent FanPosts

Small
Free Agent Compensation
Img_0001_small
Value of Various Plate Approaches
Strike_three2_small
Effect of Foul Area on Strikeouts: AL 1954-68: Erratum
Small
Baseball on a stick
Small
Player Evaluating Statistic
Baseball_small
Rays Outfield: Cheap but Extremely Productive
Small
A new xBABIP
Small
Jack Morris "pitching to the score"
Strike_three2_small
Foul Area and Differences in SO: AL vs NL
Baseball_small
Is there a Kuroda and Oswalt Alternative?

+ New FanPost All FanPosts >

Follow us on Facebook!

Follow us on Twitter!

SaberGraphics

MLB Daily Dish

Get the latest MLB Trade Rumors, Transactions, and News at MLB Daily Dish!


Managing Editor:

Jbopp-kc_small Justin Bopp

Columnists:

Adam_small adarowski

Dme_small Satchel Price

Closeup4_small J-Doug

Carlosicon_small Julian Levine

Billy_and_daddy_4th_of_july_small Bill Petti

Featuring:

Dayton_small Jeff Zimmerman

12475953_small Jacob Peterson

Picture-6_small Chris St. John

Btbpro_small Dave Gershman

229331_10150183361996591_674441590_6760167_6637860_n3_small Lewie Pollis

Img_3830_small David Fung