Beyond the Box Score: An SB Nation Community


Hack, hack, hack: A review of Baseball Hacks

Gulp. This is a tough game. No, I am not talking about baseball, but sabermetrics! While on the subject, I agree 100% with Bill James that sabermetrics is an unfortunate word for the statistical analysis of baseball. For me it conjures images of socially inept math professors, completely detached from reality, pontificating from their ivory towers. Anyway, this article is not interested in a debate about the misfortune of the term sabermetrics; it is here to bemoan some of the frustrations that come with our chosen hobby.

Only for the lucky or talented few is the study of baseball statistics a full-time profession. For the rest of us it is something to which we dedicate our spare time. This means that a wannabe sabermetrician has to be not only a baseball and math expert, but also a database guru, programming junkie and spreadsheet geek. Only then can we begin to lift the fog in which baseball statistics generally wallow. So how can a budding baseball enthusiast navigate this treacherous course? One option is to do as I did: log on to Amazon.com and purchase a copy of a new book called Baseball Hacks by all-round baseball stat super sleuth Joseph Adler.

Baseball Hacks is a 400-page text dedicated to helping Joe Average get started in the world of hardcore baseball analysis. Want to know how to build a 30-year play-by-play database, or how to work out win expectancy? Then this is the book for you. You may need a degree in Computer Science to understand exactly what you are doing, but the chances are that you'll find a nugget or two in this weighty tome.

The book is ordered into seven chapters, and each chapter is organized into a series of hacks focusing on a particular topic, of which there are seventy-odd. The first couple of chapters introduce the nuts and bolts of baseball analysis, from the mundanely simple (how to read a box score) to the fiendishly complex (building a 30-year play-by-play database using Perl and SQL). Later chapters discuss interpreting and analyzing data: one chapter is dedicated to graphical presentation, while another shows how to calculate all kinds of exotic sabermetric statistics. In short, if you were to read and understand the book from cover to cover you could drop Marc Normandin a quick email and probably get a writing job on this blog, perhaps replacing me!

The authors (although Joseph Adler is the named author, it turns out that the hacks are drawn from a reasonably wide collection of different analysts) sensibly take advantage of the plethora of free tools available on the Internet, so much of the early chapters serves as an introduction to the software packages and programming languages required to sniff out and process data and analyze statistics. And herein the challenges for the reader, and the authors for that matter, begin. Introducing a package like MySQL in one 400-page book would be an accomplishment, but attempting to synthesize it in 3 pages, and then expecting the reader to be fluent, makes the odds of successfully implementing a particular hack on the first attempt about as likely as hooking up with the hottest girl at the school prom!

After attempting a few hacks one begins to wonder whether the authors are perhaps guilty of overcomplication in order to show off their programming credentials, which are admittedly impressive. There is little doubt that this book has been written by techies for techies. Take the hack for the PBP database, which it turns out only works in UNIX (it took me half a day to discover this). Whoa: flashing lights and sirens. Please excuse the rant that follows ... but why do books like this assume everyone runs UNIX? Sure, if you're operating business-critical applications then UNIX is great, but this book is aimed at you and me, enthusiasts who aren't going to have an Oracle database next to the fridge. Yes, we use Windows. So for this book to be truly accessible it should be written for Windows users. Something to focus on for the second edition, methinks.

One amusing thing that caught my eye as I flicked through the pages was the number of shameless endorsements for other books by the same publisher (O'Reilly). After each plug the authors bashfully add that they were not asked to push sister books but were doing so because they are so great! Maybe so, but after the 23rd O'Reilly recommendation it starts to get a little unnecessary - I mean, we get the idea: go and buy the entire O'Reilly back catalogue! Saying that, if you buy this book it is almost obligatory to get a couple of others about MySQL and Perl, if for no other reason than to have something to reference when you get stuck (which I guarantee you will).

However, these frustrations aside, I have to commend the authors on a valiant attempt in penning this book. In actuality Baseball Hacks is a very valuable resource that, if properly used, can alleviate the pain of complex data gathering and tricky analysis. Simply put, there is no other book like it, and that makes it a required text for all sabermetricians, aspiring or otherwise. If you are serious about baseball stats you've got no choice but to buy it, but I urge you to heed an immortal line from Charles Dickens: it was the best of times, it was the worst of times. That is how you will feel after you have plowed through this book.

As a postscript:

After a couple of agonizing days I now actually have a fully working 5-year PBP database in MySQL! Given the time and effort it took to get working, I figured that I might as well put it to good use. What I want to do is run a series of analyses on this PBP database, with the intention of publishing my findings on BtB. To make it more interesting I want to get reader input, so every couple of weeks I'll pick the most interesting suggestion and run the analysis. Email any ideas to me. In the meantime I'll have to spend my spare time furiously learning SQL properly! The first post will run in about a month or so, as I have a few other small projects in the works that will soon be ready for publishing.
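
To give a flavour of the sort of analysis a PBP database makes possible, here is a minimal sketch of an empirical win expectancy calculation. Everything in it is an assumption for illustration: the `plays` table, its columns and the eight rows of data are made up, and Python's built-in sqlite3 module stands in for MySQL so that the snippet runs on its own. It is not the method from the book.

```python
import sqlite3

# Hypothetical toy data: one row per play, recording the game state before
# the play and whether the batting team went on to win that game.
# Columns: (game_id, inning, score_diff, batting_team_won)
ROWS = [
    ("G1", 9, -1, 0),
    ("G2", 9, -1, 1),
    ("G3", 9, -1, 0),
    ("G4", 9, -1, 0),
    ("G5", 9, 0, 1),
    ("G6", 9, 0, 1),
    ("G7", 9, 0, 0),
    ("G8", 9, 0, 1),
]


def build_demo_db():
    """Create an in-memory database holding the made-up plays table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plays ("
        "game_id TEXT, inning INTEGER, score_diff INTEGER, "
        "batting_team_won INTEGER)"
    )
    conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?)", ROWS)
    return conn


def win_expectancy(conn):
    """Map (inning, score_diff) to (share of batting-team wins, sample size)."""
    cur = conn.execute(
        """
        SELECT inning, score_diff, AVG(batting_team_won), COUNT(*)
        FROM plays
        GROUP BY inning, score_diff
        ORDER BY inning, score_diff
        """
    )
    return {(inning, diff): (we, n) for inning, diff, we, n in cur}


if __name__ == "__main__":
    table = win_expectancy(build_demo_db())
    for (inning, diff), (we, n) in table.items():
        print(f"inning {inning}, score diff {diff:+d}: WE = {we:.2f} (n = {n})")
```

A real table would also need outs, base runners and which half of the inning it is, and a few seasons of data before the averages mean anything, but the GROUP BY idea stays the same.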


Comments

A tip, perhaps
Not that I am an expert, but I think what you do is create the database in the MySQL directory. You can do this from the MySQL command line. Then you have to exit MySQL, go to the directory containing the unzipped database, and import it using the command in step 4 on page 51.

Good luck! Wait until you get to the PBP database.

by John Beamer on May 1, 2006 1:35 PM EDT

typically
developers and database folk are intimately familiar with not necessarily Unix so much as Linux. And the numerous free tools out there for personal database use and for analysis are pretty much all Linux projects that have been ported to Windows. It would be relatively inexpensive to set up a Linux box with Apache and MySQL (or one of the other options) and be able to load in the database and develop some analysis routines that could be accessed via web pages. Could it be done in Windows? Sure. But Apache and MySQL are native to Linux and ported to Windows. And the free programming tools (PHP, Perl, whatever) are far, far more robust on Linux.

by cephyn on May 1, 2006 1:57 PM EDT

You are right
Cephyn,

You are right. Perl, MySQL etc etc, the tools that the authors use in the book, are far better suited to Linux (I was using the term UNIX to include Linux, but I acknowledge there is a difference even if the two OSes have a common thread running through them).

I guess it comes down to your perspective. I was reviewing this book from the perspective of your average, sabermetrically inclined baseball fan. Now my assumption (and this could be wrong) would be that this person probably runs Windows rather than Linux. Could they download and install Linux? Sure, but I doubt they'd want to. A second consideration was that I was doing this on my work laptop. They'll go mad when they realise I have downloaded MySQL - they'd be apoplectic if I'd installed Linux.

Saying that, I have no doubt that this book is better suited to a Linux environment. Those with the patience and technical nous to pursue that course should.

by John Beamer on May 1, 2006 3:50 PM EDT

Hey, that's me!
Now my assumption (and this could be wrong) would be that this person probably runs Windows rather than Linux. Could they download and install Linux? Sure, but I doubt they'd want to.

That's me.

by salb918 on May 1, 2006 5:52 PM EDT

Another OS Option
Instead of putting Linux on your machine and having to set up a dual boot, you could try loading Cygwin, which is essentially a Unix-like environment that runs under Windows. It's slower than having a full Linux system, but it's a lot easier to set up and use. And most importantly, there is a version of MySQL which runs under Cygwin.

by BosoxBob on May 2, 2006 1:10 PM EDT
