Beyond the Box Score: An SB Nation Community

Navigation: Jump to content areas:


Sports blogs for fans, by fans.
New Blog: Sounder At Heart for Seattle Sounders Fans!

Colin Wyers has a fantastic post up over at Statistically Speaking about setting up and using a relational database for doing baseball analysis. If you're like me and still stuck using Excel, you'll want to check this out. And it's a Part One, meaning there's more to come. (Unless Colin is a Mel Brooks fan.)

11 months ago Limes_125_tiny Sky Kalkman 21 comments 0 recs  | 

Story-email Email Printer Print

Comments

Display:

Very interesting

I am a huge fan of MySQL, especially for large databases. The learning curve is large and steep though.

I would think that people who are at the point where they’re struggling with Excel would start using Access. You can do queries, it’s simpler to use, and you can do all sorts of math in it.

Stupid question: where do you get all of this data from?

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 1:16 PM EST reply actions   0 recs

Access does a lot of nice things for you.

I’m not as comfrortable with it, though, and it costs quite a bit of money. (It also doesn’t handle large databases as well, and that’s an issue with Retrosheet data – I think that my events table is up to 8 GB, partitioned by year.) You can use Access as a frontend for MySQL, actually.

The problem with Excel isn’t that it’s hard, per se, but that it’s not meant to do a lot of the stuff that sabermetricians do.

As for a data source, start with the Baseball Databank.

by cwyers on Nov 14, 2008 1:35 PM EST up reply actions   0 recs

I keep forgetting there are people who didn't get access with excel

And thanks for the link

Curt never met a buttered roll he didn't like.

by CoachOfEarl on Nov 14, 2008 5:57 PM EST up reply actions   0 recs

like me

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:02 PM EST up reply actions   0 recs

Also, there's no Access for the Mac

Anyone have thoughts on the OpenOffice database program?

And, regarding Access as a front-end for MySQL, how difficult is that to set up? Does it basically just remove the file-size limitations that Access has? Do you lose any Access functionality? Do you still have to deal with MySQL at all?

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 12:20 PM EST up reply actions   0 recs

I do 99% of my stuff in Excel.

But I think I’m going to try this. It seems easier, once you get it down pat.

by R.J. Anderson on Nov 14, 2008 1:38 PM EST reply actions   0 recs

Is MySQL really that tough?

Has anyone worked with the Open Office database? I haven’t, but it it’s like Access, it might be a good option.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:03 PM EST reply actions   0 recs

for free, that is

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 14, 2008 9:03 PM EST up reply actions   0 recs

OOo Base is, shall we say, poorly documented.

The screenshots look similar to Access, but the documentation is very sparse.

As for the question of how tough MySQL is – download it, go through the tutorial, and let us know! I’d like to think I’ve written something to at least get you started.

by cwyers on Nov 14, 2008 10:41 PM EST up reply actions   0 recs

I did it last night!

I didn’t do the “extra credit,” but I think I did OK. It’s not really running in the background is it?

I like that it seems to be smaller and more efficient that I assume Access would be.

I’m eagerly awaiting Part 2, so that I can start completely move out of Excel for all my wasting-time-pretending-to-be-a-sabermetrician needs.

Maybe there can be a tutorial on how to take a table generated in Google Spreadsheets, MySQL, Excel, etc., and formatting it decently for SB Nation posts…. see my lame attempt (after way too much time spent on it) here.

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 11:41 AM EST up reply actions   0 recs

oh, and thanks a bunch, this is a huge help

can’t wait for your do-it-yourself-Marcels

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 11:41 AM EST up reply actions   0 recs

for extra credit...

write your own relational database software in Python!

(That’s what I did, not that I’m proud of it, but I started programming long before I knew anything, so I wrote a lot of Python code that clumsily does things that other programs/languages/etc do much better. But hey, I know my own RDBMS really, really well!)

Also, cheese.

by Jeff Sackmann on Nov 15, 2008 12:18 AM EST reply actions   0 recs

question from a noob

Colin: just tried to do the “extra credit” and keep running into errors. Here’s what I have:

1 select player id,
2 sum (ipouts)/3 as inn,
3 sum (er)/(inn*9) as era
4 from pitching
5 group by playerID
6 having ipouts >= 3000
7 order by era asc
8 limit 50;

I keep getting

Error Code : 1054
Unknown column ‘player’ in ‘field list’

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 3:43 PM EST reply actions   0 recs

Right.

I would also remove the spaces between the sum commands and the parenthesis.

Also, you’re can’t use an alias as a data column. So the final result looks like this:

select playerid,
sum(ipouts/3) as inn,
sum(er)/(sum(ipouts/3))*9 as era
from pitching
group by playerID
having inn >= 1000
order by era asc
limit 50;

Good catch on using Asc in the order by clause instead of Desc.

by cwyers on Nov 15, 2008 4:13 PM EST up reply actions   0 recs

and thanks for trying to compliment the one thing I did right

heh… knowing when and when not to use commas is really a challenge for me… I got it, though. Thanks.

can’t wait to figure out how to generate wOBAs and bRAAs for the entire history of baseball (although I know the weights wouldn’t be right for each era)… then add in positional adjustments and replacement value.

That might be a while.

Is this what you use to generate projections, Colin?

OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG

by devil_fingers on Nov 15, 2008 4:46 PM EST up reply actions   0 recs

This is exactly what I use.

I’m working on Part II of the tutorial right now, and I’m working on cleaning up that code for release, and hopefully that’ll go up either tomorrow or on Friday.

by cwyers on Nov 15, 2008 5:28 PM EST up reply actions   0 recs

Formulas aren't tricky

Multiply each stat by the appropriate constant:

Instead of just H+K+BB or whatever, do (Hx3) + (Kx-5) + (BBx7)

And replace the x’s with *’s. Damn SBN auto-format.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Nov 15, 2008 5:48 PM EST up reply actions   0 recs

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?
Start posting on Beyond the Box Score »

Join SB Nation and dive into communities focused on all your favorite teams.

FanPosts

Community blog posts and discussion.

Recommended FanPosts

Small
PZR-based Win Values 2001-2006

Recent FanPosts

Small
The "30 parks on a budget" challenge
Sunflower_small
World Series Simulation, Game #6
Small
JT20 Dynasty League
E52205a2_small
New Look
Sth70021_small
Exploring Hit f/x, Albeit Badly
Redcap_small
Ricky Nolasco: 4 WAR or 1 WAR?
Redcap_small
Apparently I can't do park adjustments
Small
Which tells us more: The last 7 at bats or 7 at bats against this pitcher?
Sleepy_jeff_small
How Efficient and Effective Were the Rockies in 2009?

+ New FanPost All FanPosts >

FanShots

Quick hits of video, photos, quotes, chats, links and lists that you find around the web.

Recent FanShots

The Mistake Lottery
On the Field, the Yankees Are The Team of the Decade. Off It? The Red Sox.
Tigers' all-time WAR leaders
Primer on Runs Created
How to improve basketball
LB Keith Bulluck uses a sabermetric analogy to explain the Titans' quarterback situation.
Alcides Escobar "abandoned his daughter before she was born"
UZR, Scouting, and the Fans
Not-So-Lousy Lineup Optimizer, Playoff Edition: New York Yankees

+ New FanShot All FanShots >

BtB on Twitter

Main Feed: @BtBScore

Tommy B: @tommy_bennett
Sky: @BtB_Sky
Dan: @dturkenk
Harry: @harrypav
Jinaz: @jinazreds
Jack: @jh_moore
Erik: @Erik_Manning
Tommy R: @trancel
Justin: @justinbopp

Subscribe to BtB via Email

Enter your email address:

Delivered by FeedBurner

Most Commented

BtB Goes Social


Managers

Nando_small R.J. Anderson

Limes_125_small Sky Kalkman

E52205a2_small Tommy Bennett

Editors

Face_small Harry Pavlidis

Rawlings_baseball_bigger_small Dan Turkenkopf

770insig_small Jeff Zimmerman (TucsonRoyal)

Aviles_small Justin Bopp

Authors

Banny_small erik

Raysring1_small Tommy Rancel

Jinaz-reds-avatar_small JinAZ

Jmlogo_small Jack Moore

1753738656_110919ebe9_o_small vivaelpujols

1_small Graham

Baseball_small Mike Rogers

Redcap_small SFiercex4

Small Patrick Clark

Walter_album_small Walter Fulbright