Beyond the Box Score: An SB Nation Community

Navigation: Jump to content areas:


Sports blogs for fans, by fans.
New Blog: RSL Soapbox for Real Salt Lake Fans!

Saberizing a Mac #4: Pitch f/x

Pitch f/what?  I rely on my eyes!

More photos » by Gene J. Puskar - AP

Pitch f/what? I rely on my eyes!

Hi all.  With Sky being "hired by the Rays*" and all, I thought I would try to continue BtB's series on equipping yourself with best and geekiest tools to analyze baseball.  If you've missed the previous 4 installments, and you can find them all here, we took a look at how to set up an SQL database, upload 100+ years of baseball data via the Baseball Databank, add data to the Baseball Databank, and write queries to analyze those stats.  Hopefully, those who are interested in continuing this series are up to speed.  If not, you an of course as questions in the comments section and I'm sure one of BtB's excellent commenters will be able to help.  

Today, I wanted to take a look at one of the newer innovations in baseball analysis... Pitch f/x.  Implemented sometime in 2007, Sport Vision's Pitch f/x cameras record information on each pitch in the majors, including vertical and horizontal position, velocity, initial acceleration, vertical and horizontal movement, spin angle and many other useful bits of information. 

This system has opened up a whole new avenue of baseball analysis.  We are now able to see the nastiness of Neftali Feliz in graphical form and the sometimes hilarious innacuracy of umpires.  Anaylist's like Josh Kalk, who are well refined in the craft, have gone on to work for major league teams.  And BtB's own Harry Pavlidis is not far behind him.  It's even reaching major league pitchers!  Brian Bannister, through extensive use of Pitch f/x data, has transformed himself into a quality major league pitcher.  

Okay, enough posturing.  I, for one, would like to get my hands on the data and see what I can do.  Follow to the jump for more info.  

 

Star-divide

The first step in creating a Pitch f/x database is to dowload the data.  If your interested in simply looking at a single games worth, Dan Brooks' site offers some great tools to that; however, if you are looking to set up a database in which you can look at every and any pitchers aggregate data, the process becomes a little harder.  

One such source is avaible for free here.  BtB's Jeff Zimmerman and his brother have taken the time to download all of the Pitch f/x files into a single SQL dump, and they update it daily.  While this is a great resource to have, and I have used it more than a couple of times, several components are missing (namely pitch type) and you have to re-download the file each day for up to date usage.

To avoid doing that, and to get more thorough pitch data, you have to set up Perl Scripts to automatically download the data.  Unfortunately, most people are like me and have no idea how to manipulate and run Perl Scripts to do that for you.  

The most comprehensive tutorial I found on the matter was written by Mike Fast.  However, he wrote it just under 2 years ago, and some of the information may be outdated.  Indeed, when I tried to implement his methods, I encountered a slew of problems.  So, I thought that we could walk through his tututorial here and try to solve it on our own.   

The first step is downloading the data.  From Mike:

Whether or not you want to buy the book, the Perl scripts he wrote for the book can be downloaded from the examples section of O’Reilly’s website. Download and uncompress the baseball_hacks_code.zip file. It contains all the scripts from the book, divided by chapter and hack number

That sounds pretty easy so far.  Here is a screenshot of what the downloaded file will look like, once you are able to uncompress it and open it:

Baseball_hacks_medium

Next step from Mike:

The first script of interest for XML downloading is hack_28_spider.pl, found in Chapter 3. This script can be used after only a few minor modifications, as follows:

To download the 2007 season starting with April 2, change Line 35 from
$start = timelocal(0,0,0,20,6,105);
to
$start = timelocal(0,0,0,2,3,107);

So, I'm pretty sure that you want to change the first line to (0,0,0,2,4, 107), that must have been a typo on Mike's part.  This will start downloading the data from the 2nd of April in 2007.  Then:

Similarly, you need to change the end time. Uncomment Line 40 and comment Line 41:
$now = timelocal(0,0,0,$mday - 1,$mon,$year);
#$now = timelocal(0,0,0,3,10,105);

This is where I run into trouble.  Being a complete and udder idiot when it comes to computer programming, I have no idea what "comment" and "uncomment" mean.  This is the step where you specify the end date from which you want to download the data, so it is obviously very important.  Assuming that someone will be able to fill me in on this later, let's proceed to the next step.  

Mike:

Another change is to correct for the fact that MLB now puts player information in an XML file rather than a TXT file.

Change the players.txt in Lines 95, 96, 101, and 102 to players.xml:
from
if($gamehtml =~ m/<a href=\"players\.txt\"/ ) {
$plyrurl = "$dayurl/$game/players.txt";
$response = $browser->get($plyrurl);
die "Couldn't get $plyrurl: ", $response->status_line, "\n"
unless $response->is_success;
$plyrhtml = $response->content;
open PLYRS, ">$gamedir/players.txt"
or die "could not open file $gamedir/players.txt: $|\n";
to
if($gamehtml =~ m/<a href=\"players\.xml\"/ ) {
$plyrurl = "$dayurl/$game/players.xml";
$response = $browser->get($plyrurl);
die "Couldn't get $plyrurl: ", $response->status_line, "\n"
unless $response->is_success;
$plyrhtml = $response->content;
open PLYRS, ">$gamedir/players.xml"
or die "could not open file $gamedir/players.xml: $|\n";
UPDATE:

 The spider needs to be changed to download the game.xml file. Since the game.xml file is in the same directory as the boxscore.xml file, you can duplicate that section of the code and change "boxscore" to "game", like so:

if($gamehtml =~ m/<a>get($gameurl);
die "Couldn’t get $gameurl: ", $response->status_line, "\n"
unless $response->is_success;
$infohtml = $response->content;
open GAME, ">$gamedir/game.xml"
or die "could not open file $gamedir/game.xml: $|\n";
print GAME $infohtml;
close GAME;
} else {
print "warning: no xml game file for $game\n";
}

 

Sounds simple enough, you should all be able to do this.  If it isn't clear, this is on the same text file that you are editing the start and end dates in, which looks something like this:

Text_medium

The picture is for reference only, and doesn't depict the actual changes you should make.  As long as you followed the first two steps shown above, it should be pretty simple.  

After you've changed the txt files to xml files, the next step is to download Perl and run the script.  From Mike:

Before you can run it, though, you need to have Perl installed on your computer. You can download the Perl binaries and package manager for ActivePerl here.

This is pretty simple.  Once you click on the link, you can download the file free off of their website.  Just click on the "download" link, skip where it asks you to provide your name (or fill it in, whatever) and click on whatever your operating system is.  Active Perl takes about 2 minutes to download, then you can click on it and it will start the intilation process.  If that doesn't work, then you are probably selecting the wrong opperating system. 

Once it installs, it will show up on your desktop.  Just leave it there for now.  

Next step:

Once you get Perl installed you should be ready to go. I made a separate directory to hold my game data. If you’re running from Windows, just open a command prompt window, cd to your game data directory, and then run the hack_28_spider.pl script.

Again, this is where I run into problems.  I don't know what a directory is, or how it affects the process.  Furthermore, I don't know how to "cd to your game data directory" and run the hack_28_spider.pl script (this was the one we changed right above).  If anyone can help me or other people who might be having confusion as well, that would be awesome.

Okay, so once you have done that, you have to download the data to your SQL database.  From Mike:

The next steps are setting up a database and adapting the hack_28_parser.pl script to input the data into the database.

Here is a link to my PBP database structure. This is a MySQL database. If you are already familiar with MySQL, then this will give you a head start.

UPDATE:  Mike has an "updated" structure that includes other goodies!!

Here is my new database structure for 2008 with these fields added to the pitch table

For those using MAMP and Sequel Pro, you may have to create the tables manually.  This isn't a big problem, but it's a bit tedious.  To do so, you open Sequel Pro, and go to "Database" - "Create new database".  I labeled mine Pitchfx.  Then you create 5 tables, by pressing the + button under the table section.  You should label them "atbats", "games", "pitches", "players" and "umpires". 

Then, click on the first one and go to "structure".  Then you can click on the + button to add columns to each table.  The columns you should add are detailed by Mike in the link in the last quote.  So, for the atbats table you would create 15 columns, starting with "ab_id".  Then you would creat the next 14 shown.  

Do this for each different table, and you have a nice a simple database structure to house all of your data.  It will look like this:

Queiry_medium

The next step from Mike:

Also, here is a link to the code from my XML-to-database parser script. It’s still a work in progress, so please don’t mind the mess inside, but it works if you want to use it in its current state. I’ve fixed it up a bit from an earlier version I posted, cleaning up some of the subroutine calls. (Note: the copy now posted is yet a newer version that also processes umpires and fielding locations for balls in play.)

UPDATE: You can download the new database parser script to use these fields. I have an additional script to update the pitches table with the ball-strike count at each pitch.

Running the XML-to-database parser script will require the Perl DBI::mysql package to be installed. If you don’t have that already, open the Perl Package Manager (under the Active Perl program group if you installed Active Perl). Under the View menu, select All Packages, and look for DBI and DBD-mysql. If they aren’t listed as installed, click on them to select them, and then go to the Action menu and Install them.

That XML-to-database parser script is the key!  This is what transfers the the data you download from the MLB.com website to Sequel.  He revealed later that it was updated later with some information provided by Harry Pavilidis, so it looks like it should be updated.  If Harry wants to provide his most recent script, that would be great as well.

I'm not sure about the second paragrpah though.  I (We) already have both downloaded Active Perl, and we already have MAMP and Sequel up and running.  I assume you need to connect the two somehow.

Mike says that you have to open the Perl Package Manager, yet I can't seem to find that on my computer anywhere.  Is this something that you need to download seperately?  Again, if someone can help me here, that would be awesome.  

The next step provided in Mike's tuturiol is how to download and use MySQL.  Since most of you here are using Mac's, I'll refer you to other installments in our series, where Sky shows how to download and use Sequel Pro with MAMP on the Mac.  

 

Consider this a laymen's thought process when reading Mike's tutorial.  Hopefully, more computer savy guys like Dan and Harry can help answer my questions and any others that other readers have.  Or if anyone else has anymore information on the matter or can point to a different tutorial, awesome!

*"Put down"

0 recs  |  Comment 205 comments

Story-email Email Printer Print

Comments

Display:

So, without intimate familiarity either in these scripts, or in Perl...

The idea of “commenting” a line of code means that the Perl interpreter will ignore it:

Similarly, you need to change the end time. Uncomment Line 40 and comment Line 41:
$now = timelocal(0,0,0,$mday – 1,$mon,$year);
#$now = timelocal(0,0,0,3,10,105);

(I’m assuming here) The # sign in front of the second timelocal command is Perl’s “comment” command. In many languages, this command is // (just in case you’ve seen that in JavaScript).

So, assuming that those two timelocal commands are lines 40 and 41 respectively, line 40 is already uncommented, and 41 is commented.

by jwiscarson on Aug 19, 2009 10:47 AM EDT reply actions   0 recs

some quick tips

I’ve gotten this method working in the past so I’m more than happy to help. It’s a fantastic setup.

Directory is synonymous for “folder.” CD stands for change directory, and thus is a way to move in and out of folders. You will want to start using the Mac Terminal (or windows command prompt) for this step. You can google more info, but on a mac I believe it’s found in Utilities. Check out a tutorial on using a terminal aka shell. You will want to cd into the right directory then to run the script use: “perl hack_28_parser.pl”

I’m not sure where to find the Perl Package Manager on a mac, but it should be there already on your computer. You will need to find it in order to install the add-on or else things won’t run.

by arzdb on Aug 19, 2009 10:55 AM EDT reply actions   0 recs

Question Two:
Next step:

    Once you get Perl installed you should be ready to go. I made a separate directory to hold my game data. If you’re running from Windows, just open a command prompt window, cd to your game data directory, and then run the hack_28_spider.pl script.

Again, this is where I run into problems. I don’t know what a directory is, or how it affects the process. Furthermore, I don’t know how to “cd to your game data directory” and run the hack_28_spider.pl script (this was the one we changed right above). If anyone can help me or other people who might be having confusion as well, that would be awesome.

When he says “directory”, he means folder.

“cd” refers to the command prompt (or terminal) command “change directory”. So, think of this command as a way to navigate directly to a folder using only the keyboard.

I’m unfamiliar with the way terminal works on a Mac, but I understand they run on a Linux kernel these days, so he’s telling you to open a terminal and navigate to the folder you saved your game data, using the cd command. Don’t feel bad if this is confusing — we old fogeys cut our teeth in that world. If you can open a terminal, but can’t figure out how to use the cd command, the Mac should have man pages (short for manual). Just type “man cd” and you’ll get a long explanation of the command (Hit Q to quit the man page when you’re done, btw).

by jwiscarson on Aug 19, 2009 10:56 AM EDT reply actions   0 recs

I would also like to add...

It bothers me to no end that the Mac SQL software is called Sequel Pro.

by jwiscarson on Aug 19, 2009 10:59 AM EDT reply actions   0 recs

Does anyone know...

of a script or database that has up-to-date individual player statistics? In theory you could build it off of the raw pitchf/x data, but that seems like a real pain. It would be great to be able run a regression on fastball velocity to strikeout rate, for example, among many other things.

by arzdb on Aug 19, 2009 3:01 PM EDT reply actions   0 recs

I think you would have to get BIS or STATS or something to do that

Or, like you said, you could do it yourself using Retrosheet or Pitch f/x. I know that’s what Statcorner does.

You could also download the data from FanGraphs. They offer a CSV download of the entire leaderboards (just selcect > 10 innings pitched or something) and if you can strip away the url, it just shows the playerid, which I think matches up with Elias.

Thanks

by vivaelpujols on Aug 19, 2009 7:43 PM EDT up reply actions   0 recs

A few notes about the dates
So, I’m pretty sure that you want to change the first line to (0,0,0,2,4, 107), that must have been a typo on Mike’s part.

Actually the month numbering system starts at 0 for January, so Mike has it correct with 3 for April.
Also, I ran into problems with this line:

Similarly, you need to change the end time. Uncomment Line 40 and comment Line 41:
$now = timelocal(0,0,0,$mday – 1,$mon,$year);
#$now = timelocal(0,0,0,3,10,105);

Why don’t you just set the $now at the end to the final date of the season? I couldn’t get it to work without doing that, and that seems to make the most sense.

by Alex Krolewski on Aug 19, 2009 4:27 PM EDT reply actions   0 recs

You're right Alex

The date/time string is (seconds, minutes, hours, day, month, year), where 0 = January and 11 = December for the month, and years are years since 1900, so 107 = 2007.

You can change the $now to any date which works for you; the end of the season is fine.

by Mike Fast on Aug 20, 2009 2:01 AM EDT up reply actions   0 recs

Thanks a lot guys

I just got back from work, and should be able to get it all running soon. Then I’ll post a summary at the end of the post.

Don’t worry though, I’ll have more questions :D

Thanks

by vivaelpujols on Aug 19, 2009 7:41 PM EDT reply actions   0 recs

God I need to catch up with my own damn series.

"Of course Kolby Rasmus was going deep! That’s what Kolby Rasmus does! You don’t give Kolby Rasmus second chances!" -Kolby Rasmus

by hazel on Aug 20, 2009 1:49 AM EDT reply actions   0 recs

Help!

Hey, i get 500 internal error time outs when i run the perl script. How do i fix this? I followed the directions except i started in April of 2008 and then it got through one day of files perfectly before timing out on a game on april 3rd.

Any help?

by garik16 on Aug 21, 2009 7:53 PM EDT reply actions   0 recs

Garik

If that’s the error I think it is, I’ve only seen it very rarely (a handful of times in three years). Whenever it happened to me, it always worked to try again later, and I found that I tended to see it more often when the load on the MLB servers was highest during East Coast game time.

However, some people have reported significantly more trouble with connection timeouts, to the extent that it made it extremely difficult to get the data. You might try this suggestion from Kris Gardham:

Note: Some of you have noted connection problems with the spider script. Kris Gardham informs me, "For those people getting connection problems, I think it’s more of a DNS thing than a MLB thing. Hard coding the IP of gd2 or gd.mlb.com into the script really seemed to speed things up."

If you still can’t get it to work, please post the full text of the error message you are seeing. This applies to anyone seeing error message—the more detailed information you provide, the more likely you will receive a helpful response; seemingly minor details may be the key to identifying your issue.

by Mike Fast on Aug 22, 2009 3:00 PM EDT up reply actions   0 recs

Mike, i already am using the IP rather than the web link (12.130.102.19), and its still giving me the issue.

The error message that comes up is for example:

Couldn’t get http://12.130.102.19/components/game/mlb/year_2008/month_04/day_14/gid_2008_04_14_oakmlb_chamlb_1//batters//445968.xml: 500 read timeout

This occurs usually after getting like 10-15 files (it’s sometimes less or more) and then the whole script stops.

Any help would be appreciated.

by garik16 on Aug 22, 2009 3:21 PM EDT up reply actions   0 recs

try this

at a command prompt, and report the results:

ping gd2.mlb.com

and

ping gd.mlb.com

by Mike Fast on Aug 22, 2009 5:48 PM EDT up reply actions   0 recs

Ping gd2.mlb.com results (12.130.102.19) with 32 bytes of data:

Resulted in FOUR TIME OUTS and Zero returns.

Ping gd.mlb.com (69.174.84.82) with 32 bytes of data

Ping 1: returned in 77 ms
Ping 2: returned in 91 ms
Ping 3: returned in 85 ms
Ping 4: returned in 101 ms

Obviously the first result is bad. What’s gd.mlb.com exactly? as far as i can tell, there’s nothing there at all!

by garik16 on Aug 22, 2009 6:14 PM EDT up reply actions   0 recs

you can get the data from either one

just point your $baseurl variable at the one you want to use:
$baseurl = “http://gd.mlb.com/components/game/mlb”;

See if that works.

by Mike Fast on Aug 22, 2009 6:21 PM EDT reply actions   0 recs

Tried this

Tried it. It went about 4 times as long (got 4 days worth of games) and then…timed out again.

Anything else or am i basically screwed?

by garik16 on Aug 22, 2009 6:38 PM EDT up reply actions   0 recs

Incidentally

I just pinged gd.mlb.com and promptly the ping shot right up, in 4 tries 1 timed out and the other 3 took around 900 ms to return.

by garik16 on Aug 22, 2009 6:40 PM EDT reply actions   0 recs

My suggestion

would be to try again at an off-peak time (no East Coast games in progress).

by Mike Fast on Aug 22, 2009 7:02 PM EDT up reply actions   0 recs

I’ll try, thanks for the help.

by garik16 on Aug 22, 2009 7:14 PM EDT up reply actions   0 recs

Still having issues

Mike I’m still having issues. It’s better late at night, but definitely not good enough to get more than a week’s worth…and i’d like to sleep rather than stay up all night restarting the script

by garik16 on Aug 23, 2009 1:06 AM EDT up reply actions   0 recs

Garik, I’m having the same timeout issue as you now.

I didn’t have any problems this afternoon (downloaded all the 2007 data), so I plan to just start up the script in the morning and hope it gets all the data.

by RedRobot8 on Aug 25, 2009 9:06 PM EDT up reply actions   0 recs

Garik, you still around?

Did you find a fix for the timeout problem? After a good afternoon yesterday, I can’t even seem to get a game’s worth of data before a timeout.

by RedRobot8 on Aug 26, 2009 2:44 PM EDT up reply actions   0 recs

Following along...

Just jumped into this tutorial, looks like fun. Have some programming experience (Linux/C/Java), but no DB.

Once you get Perl installed you should be ready to go. I made a separate directory to hold my game data. If you’re running from Windows, just open a command prompt window, cd to your game data directory, and then run the hack_28_spider.pl script.
Again, this is where I run into problems. I don’t know what a directory is, or how it affects the process. Furthermore, I don’t know how to “cd to your game data directory” and run the hack_28_spider.pl script (this was the one we changed right above). If anyone can help me or other people who might be having confusion as well, that would be awesome.

In the Finder, create a folder where you want to store all your PitchFX data. Preferably, this will be somewhere in your user folder. Move the spider script into that newly created PitchFX folder in your user folder.

In Terminal, use the “cd” command to navigate to your new folder. For me, that command was “cd baseball/PitchFX”. To verify that you are in the right place, enter “ls” in Terminal, and you should see the contents of that folder (just the spider for now). Now, enter “perl hack_28_spider.pl”. This should cause a stream of text as the script tells you what games it is processing.

I am not having Garik’s issue, but I do get “warning: no player list” for every single game (have only tried 2007 so far). Is that normal?

by RedRobot8 on Aug 25, 2009 5:58 PM EDT reply actions   0 recs

Hi, thanks for stopping by

I created a folder on my desktop labeled “pitchfx” and I saved the perl script to that. When I type in terminal “cd pitchfx” it says no file or directory exists. What am I doing wrong?

Smoltz.

by vivaelpujols on Aug 25, 2009 8:26 PM EDT up reply actions   0 recs

You need to type “cd Desktop/pitchfx”.

It is difficult to explain command line to someone who hasn’t used it before, but the basic idea is that Terminal opens in your User folder. “cd” just tells it to open a named subfolder. Your Desktop is actually a subfolder of your user folder.

Basic command line navigation skills:

Enter “pwd” to see your current location.
Enter “ls” to see the contents of the folder you are in.
Enter “cd ” to move into a subfolder.
Enter “..” to go back to the parent folder.

With those four, you should be able to navigate your file system.

by RedRobot8 on Aug 25, 2009 8:58 PM EDT up reply actions   0 recs

Thanks a lot!

It looks like it’s working.

Smoltz.

by vivaelpujols on Aug 25, 2009 9:01 PM EDT up reply actions   0 recs

It can.

It gets every game in the range specified by you in the spider editing script that we edited earlier.

It seems like a couple of us (Garik and I) are having an issue where the script reports a timeout error. I didn’t have that problem this afternoon when I downloaded the 2007 data, so I am assuming that it will only be a problem during peak times (i.e., during evening baseball). I will attempt to download the rest of the data tomorrow morning.

by RedRobot8 on Aug 25, 2009 9:21 PM EDT up reply actions   0 recs

Hm...

It was working for me before, but I realized I forgot the change the start date. I changed it to 07 instead of 05, but the terminal command didn’t work. It said:

starting at 04/02/2007
ending at 11/03/2009
processing 04/02/2007
http://gd2.mlb.com/components/game/mlb/year_2007/month_04/day_02/
Couldn’t get http://gd2.mlb.com/components/game/mlb/year_2007/month_04/day_02/: 500 Can’t connect to gd2.mlb.com:80 (Bad hostname ‘gd2.mlb.com’)

Smoltz.

by vivaelpujols on Aug 25, 2009 9:26 PM EDT up reply actions   0 recs

This is probably just a variant of our timeout error. I’d suggest trying again during a low-traffic time.

by RedRobot8 on Aug 25, 2009 9:36 PM EDT up reply actions   0 recs

Hey guys, thanks a lot for the help

I’m glad people are trying to do this. I just started back up myself, I’ve been busy lately, and of course, I’m still having trouble.

I don’t get this bit:

The spider needs to be changed to download the game.xml file. Since the game.xml file is in the same directory as the boxscore.xml file, you can duplicate that section of the code and change “boxscore” to “game”, like so:

        if($gamehtml =~ m/get($gameurl);
        die "Couldn’t get $gameurl: ", $response→status_line, “\n”
        unless $response→is_success;
        $infohtml = $response→content;
        open GAME, “>$gamedir/game.xml”
        or die “could not open file $gamedir/game.xml: $|\n”;
        print GAME $infohtml;
        close GAME;
        } else {
        print “warning: no xml game file for $game\n”;
        }

I can’t seem to find that part on the script where the “boxscore” part of that is located. Can anyone help?

Smoltz.

by vivaelpujols on Aug 25, 2009 8:19 PM EDT reply actions   0 recs

Yeah, that part of the tutorial seemed to have some issues. At line 95, insert:

if($gamehtml =~ m/<a href=\“game\.xml\”/ ) {
$gameurl = “$dayurl/$game/game.xml”;
$response = $browser→get($gameurl);
die "Couldn’t get $gameurl: ", $response→status_line, “\n”
unless $response→is_success;
$gamehtml = $response→content;
open GAME, “>$gamedir/game.xml”
or die “could not open file $gamedir/game.xml: $|\n”;
print GAME $gamehtml;
close GAME;
} else {
print “warning: no xml game file for $game\n”;
}

That seems to work, although I am still downloading data and haven’t tried to process anything yet.

by RedRobot8 on Aug 25, 2009 9:01 PM EDT reply actions   0 recs

Apparently, copying/pasting through HTML changes the characters.

I guess you’ll need to recreate that part on your own. Start by copying the block of code that begins in line 81, and change the words as I did.

by RedRobot8 on Aug 25, 2009 11:19 PM EDT up reply actions   0 recs

Okay, i'm pretty sure someting is wrong

I’ve been processing the text file for 30 minutes, and I have about 20,000 rows of data. And I’m downloading the game for 4/11/07 – I started 4/2/07

Smoltz.

by vivaelpujols on Aug 25, 2009 11:14 PM EDT reply actions   0 recs

Each game has about 300 pitches

At 15 games per day, that’s 4500 pitches per day…in ten full days of games you have roughly 45,000 pitches, so your number of rows sounds about right to me.

At the beginning it will go quickly…30 minutes for 60-70 games’ worth of data is actually quick. When you have several thousand games in your database it will take ~a few minutes to update the data for just one game. Btw, the updates go much faster if the database tables are MyISAM rather than InnoDB. I did update that part of the database structure in my post, no? I thought I did, but I forget.

I also have an update to the parser code that will spit out the ID of the game it is parsing, rather than just the list of atbat IDs. Let me see if I can dig that up and post it, too.

by Mike Fast on Aug 26, 2009 4:04 AM EDT up reply actions   0 recs

change the start date in the spider script

I do this manually every day (or however often I download new data).

It won’t download games you have already downloaded, but it checks to see if the game directory exists in the download area, and if it has to do that for every day from April 2, 2007, every time you run it, it will take a long time to check by the time you get into 2009. So you can just change the start manually.

It is something that can be automated. I have never done so, but I believe Dan Turkenkopf did. Check his old blog at blog.stealingfirst.com for instructions if you are interested.

by Mike Fast on Aug 26, 2009 4:31 AM EDT up reply actions   0 recs

More questions

Does anybody know how to run PPM? The tuturiol online just says to type “ppm” in the command, but that doesn’t work.

Smoltz.

by vivaelpujols on Aug 26, 2009 3:26 AM EDT reply actions   0 recs

*Also*

I’m having trouble with the updated txt script. When I just change the start and end dates, it works fine, but when I it, it says “warning: no player list for $game\n”. Clearly, I’m messing something up by adding the “get gamehtml” code.

Here is my txt file, if anyone wants to take a look:

http://codepaste.net/fftshd

Smoltz.

by vivaelpujols on Aug 26, 2009 4:20 AM EDT reply actions   0 recs

Updated messaging for parser script

Add the following line somewhere around line 166 (not sure if my line numbers correspond exactly anymore):

print “\nParsing game number $select_game_id ($gamedir).\n”;

The whole chunk of code updating the games table should now look like this:


# Check if game info has been input before inputting umpire, at bat, and pitch info
$game_id_query = ‘SELECT game_id FROM games WHERE (date = ’ . $gamedate
. ’ AND home = ’ . $home . ’ AND away = ’ . $away . ’ AND game = ’ . $game_number . ’)’;
$sth= $dbh→prepare($game_id_query) or die $DBI::errstr;
$sth→execute();
my $numRows = $sth→rows;
if (1==$numRows) {
$select_game_id = $sth→fetchrow_array();
print “\nParsing game number $select_game_id ($gamedir).\n”;
} else {
die “duplicate game entry $select_game_id in database or game not found.\n”;
}
$sth→finish();

by Mike Fast on Aug 26, 2009 4:24 AM EDT reply actions   0 recs

Mike, I hate to ask

But do you think you could just post your entire text script? I’m having all sorts of problems, and I have no idea how to work them out. You could just paste it in a website like this:

http://codepaste.net/new

Smoltz.

by vivaelpujols on Aug 26, 2009 4:41 AM EDT up reply actions   0 recs

Just a reminder to eveyone

you need to change the start dates of this code to ,2,3,107. This will only download a couple games worth of data.

Smoltz.

by vivaelpujols on Aug 26, 2009 5:16 AM EDT up reply actions   0 recs

I updated this

I had forgot to remove some temporary tables from the list.

It should be good to go now. It won’t hurt you if you have those extras tables in your database, but they are not needed.

by Mike Fast on Aug 26, 2009 5:08 AM EDT up reply actions   0 recs

the script I use to update balls and strikes in the pitches table

http://codepaste.net/qq8ajk

Ideally this would be integrated into the parser, but I haven’t yet, so right now I run it separately.

by Mike Fast on Aug 26, 2009 4:54 AM EDT up reply actions   0 recs

I want to do some other updates to my parser script at some point

I really should have had it keep track of top/bottom of inning. I don’t know why I decided to throw that info away, but it’s a real pain if you try to do queries by teams instead of by players to have that missing.

The other thing I want to add is the fielders involved in a play and some tracking of the baserunning events like stolen bases, caught stealing, pickoffs, wild pitches, passed balls, etc.

I don’t have these features yet, but at some point in the future I may.

by Mike Fast on Aug 26, 2009 4:57 AM EDT up reply actions   0 recs

err

Nothing about fielding positions is currently tracked. The baserunner states are tracked, but not the events that led the baserunners to move between bases unless on a batted ball. Nothing currently associates the players to teams.

So definitely stuff that needs to be added. But you can do a ton of stuff without it.

The information is all in the files that have been spidered, so you wouldn’t need to download them again. But they would need to be parsed again, similar to how the ball-strike script I posted above parses through the data again and updates the database. That subsequent parsing for just a few fields is typically much faster than the original parsing.

It’s on my project list, but no guarantee of a timeline. :)

by Mike Fast on Aug 26, 2009 5:04 AM EDT up reply actions   0 recs

Yeah

I basically want to do something similar to what Dan did there, although I don’t care about Retrosheet codes, per se. I have to translate from Python to Perl if want to use his stuff, but that shouldn’t be too big of a problem.

Baseball-on-a-Stick (BBOS) also parses fielding events, so I intend to copy from what they did, too.

Because I already have an existing database structure, it’s not as simple as a straight copy-and-paste from either Dan or BBOS, but definitely I intend to borrow from them where I can. I never do anything from scratch if I don’t have to. Even though the parser has mutated to being about 80% my code, it sure helped me a lot to start with Adler’s parser than to write my own from scratch.

by Mike Fast on Aug 26, 2009 5:11 PM EDT up reply actions   0 recs

Spider Issue

I seem to be having some difficulty with the Spider scripts.

The spider script downloaded from O’Reilly seems to download two files for each game: boxscore.xml and game.xml. This doesn’t include any of the pitch or play-by-play data. I assume that

Mike, your version of that script downloads many files per game, including the two above plus inning, batters, pitchers, and pbp files. This is more what I expected. Presumably, there is some bug in that initial spider (that I may or may not have introduced myself) that is not present in Mike’s spider.

So, to everyone else out there trying this at home: if your data is incomplete, use Mike’s version of the spider.

by RedRobot8 on Aug 26, 2009 12:55 PM EDT reply actions   0 recs

Hey guys

would it be possible for someone to send me the updated spider file so that i could just replace the old one with it or would that not work?

by wrightnumber5 on Aug 26, 2009 1:17 PM EDT reply actions   0 recs

Viva, Mike or anyone else

I’m still having the time out problem, just as RedRobot said. I thought you guys had left this thread so i stopped posting. Any other thoughts or am i just screwed?

by garik16 on Aug 26, 2009 3:32 PM EDT reply actions   0 recs

Still here, same problem.

Timeouts galore, so no Pfx data for me. Maybe I just got lucky yesterday afternoon? I have failed to find a workaround.

by RedRobot8 on Aug 26, 2009 3:53 PM EDT up reply actions   0 recs

I'm having the same trouble

I’m sure it’s because we using it during the wrong time. It worked fine for me last night and I downloaded several months of games.

Smoltz.

by vivaelpujols on Aug 26, 2009 5:23 PM EDT up reply actions   0 recs

ping tests
$ ping gd.mlb.com
PING a1546.g.akamai.net (64.208.21.19): 56 data bytes
- a1546.g.akamai.net ping statistics -
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max/stddev = 13.509/14.006/14.769/0.393 ms
$ ping gd2.mlb.com
PING gd2.mlb.com (12.130.102.19): 56 data bytes
- gd2.mlb.com ping statistics -
16 packets transmitted, 0 packets received, 100% packet loss

I also noticed that gd.mlb.com is changing IP address very frequently, so I had no luck bypassing the DNS servers with a hardcoded IP.

by RedRobot8 on Aug 26, 2009 4:17 PM EDT up reply actions   0 recs

voodoo

5:00 ET seems to be magic time. I’ve been rolling for a solid 15 minutes now, easily the best I’ve done all day. Not sure why that would be, but I’m gonna see how far I get.

If the timeouts return, I may look into those Perl links Mike sent. Unfortunately, I have never coded in Perl before.

by RedRobot8 on Aug 26, 2009 5:19 PM EDT up reply actions   0 recs

Hey guys, so I'm having trouble finding the Perl Package Manager

Which I believe I need to run the parser script.

When I download and install Active Perl, it shows up on my desktop like this:

http://img.imgcake.com/snapshot_2009-08-26_21-07-55.jpg

Than the tuturiol online says I need to type “ppm” into terminal to open the Perl Package Manager. I do that, and it says, “-bash: ppm: command not found”.

When I search for ppm in the search function on my computer, the only related thing that shows up is, “ppm4.log” which opens in Console and doesn’t appear to be anything.

Can anyone help? How did you guys get the Perl Package Manager opened on your computer?

Smoltz.

by vivaelpujols on Aug 27, 2009 12:11 AM EDT reply actions   0 recs

You probably already have it installed.

Search for “ppm” on “This Mac” in “File Name”. In the search window, go to the little + sign next to “Save” to make another row of options appear. Choose “System Files” and “Include”. In this view, you should see PPM.app. Open it to view the Perl Package Manager GUI.

I haven’t gotten to this stage yet, so I don’t know what you are supposed to do once you are in the PPM.

by RedRobot8 on Aug 27, 2009 1:18 AM EDT up reply actions   0 recs

Awesome

It worked. Thanks a lot.

Smoltz.

by vivaelpujols on Aug 27, 2009 1:25 AM EDT up reply actions   0 recs

Hey, also how is your spider script working?

Mine downloaded about 3 months of data last night, and I had to shut it off this morning. When I restarted it, it now says a 404 error. I assume that it’s just because it’s the wrong time, but if your’s is working than it might just be something wrong with my script.

Smoltz.

by vivaelpujols on Aug 27, 2009 1:27 AM EDT up reply actions   0 recs

I haven't seen a 404 error.

I am still regularly getting the 500 “Operation timed out” error, so I am still struggling to get the dataset downloaded. Aside from a brief burst this afternoon, I haven’t been able to convince it to download more than one day at a time.

I’ll get back to it tomorrow.

by RedRobot8 on Aug 27, 2009 1:33 AM EDT up reply actions   0 recs

Okay, so once we eventually get all of the gd2.mlb data downloaded

The next step is to download Active Perl, and open the Perl Package Manager. You can find it on a mac using the directions that RedRobot8 outlined above. Once opened PPM will look like this:

Then, from Mike’s tutorial:

Running the XML-to-database parser script will require the Perl DBI::mysql package to be installed. If you don’t have that already, open the Perl Package Manager (under the Active Perl program group if you installed Active Perl). Under the View menu, select All Packages, and look for DBI and DBD-mysql. If they aren’t listed as installed, click on them to select them, and then go to the Action menu and Install them.

The database parser code can be found here:

http://codepaste.net/gjbeyv

And Mike also provided one for marking balls and strikes here:

http://codepaste.net/qq8ajk

However, I am unclear what to do with those once we install the neccessary packages on PPM. Any help?

Smoltz.

by vivaelpujols on Aug 27, 2009 4:08 AM EDT reply actions   0 recs

Have you installed MySQL

and created a database yet?

by Mike Fast on Aug 31, 2009 11:55 AM EDT up reply actions   0 recs

Yes, I have sequel and MAMP

And I used your updated code to create a database structure.

Smoltz.

by vivaelpujols on Aug 31, 2009 8:31 PM EDT up reply actions   0 recs

Then you're ready to parse the files into the database

You go to the directory/folder (whatever you kids are calling it these days ;) ) that contains your games directory. From there, you run the parser Perl script. This is where the rubber meets the road. You will find out if everything worked. The parser script will attempt to connect to the database and begin reading through the subdirectories under the games directory. It will parse each file and update/insert the tables in the database. If you are trying to parse years worth of data it will probably take something on the order of one day to parse one year’s worth of data, give or take depending on the speed of your computer.

by Mike Fast on Sep 1, 2009 12:12 AM EDT up reply actions   0 recs

So for me, it would be...

cd desktop/pitchfx
perl parser.pl

That’s all?

Smoltz.

by vivaelpujols on Sep 1, 2009 12:29 AM EDT up reply actions   0 recs

Yes, that's correct

Although, once you parse a game, you may want to move it to a new directory. I have a “to be parsed” directory and an “already parsed” directory. I suppose you could also just delete it since you have the relevant info in the database at that point, but I find that I want to go back and do something new with the data often enough that I keep the original XML files around.

The reason to move games that are already parsed to a new directory before you run the parser a second time is that it will try to parse every game in the directory where you run it. It will check the database and won’t duplicate entries in the database, but it wastes a lot of time to do that.

by Mike Fast on Sep 1, 2009 12:25 PM EDT up reply actions   0 recs

So all I have to do is run the parser script

And it does everything automatically for me (I hope)? Cool, didn’t know it would be that easy. And for the balls and strikes parser I do the same thing?

Smoltz.

by vivaelpujols on Sep 1, 2009 12:33 PM EDT up reply actions   0 recs

Yes

That’s how it’s designed.

by Mike Fast on Sep 1, 2009 1:41 PM EDT up reply actions   0 recs

So, it's not working

Here’s what it says:

install_driver(mysql) failed: Can’t locate DBD/mysql.pm in INC (INC contains: /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at (eval 3) line 3.
Perhaps the DBD::mysql perl module hasn’t been fully installed,
or perhaps the capitalisation of ‘mysql’ isn’t right.
Available drivers: DBM, ExampleP, File, Proxy, SQLite, Sponge.
 at parser_one.pl line 22

So obviously, I didn’t install the right packages on the PPM. When I go back in there, I see that DBI is installed, but it is also in a menu marked “To Install”.

Also, I couldn’t find DBD-Mysl. The one offered on PPM is called “DBD-MyslPP”. I installed that, but it doesn’t show up on the installed menu, it’s under the “To be installed menu” along with DBI.

Is it like queued and I just need to wait, or is there something I’m doing wrong?

Smoltz.

by vivaelpujols on Sep 1, 2009 7:41 PM EDT up reply actions   0 recs

Okay, so I fixed that by doing "FIle" > "Run Marked Actions"

However, I’m still getting the same message. Maybe it’s something wrong with the “DBD-mysqlPP” part?

Smoltz.

by vivaelpujols on Sep 1, 2009 7:55 PM EDT up reply actions   0 recs

You need to change your database connect statement

It needs to use the name of the new module, like so:
http://codepaste.net/81g3ef

You may also need to change the syntax of the $DBI::errstr portion of the database connect statement, but that’s not clear to me. I’m consulting this user guide for DBD-mysqlPP, fyi:
http://search.cpan.org/~oyama/DBD-mysqlPP-0.04/mysqlPP.pm

by Mike Fast on Sep 2, 2009 12:37 PM EDT up reply actions   0 recs

I still doesn't work

This is the message I am getting now:

install_driver(mysqlPP) failed: Can’t locate DBD/mysqlPP.pm in INC (INC contains: /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at (eval 3) line 3.
Perhaps the DBD::mysqlPP perl module hasn’t been fully installed,
or perhaps the capitalisation of ‘mysqlPP’ isn’t right.
Available drivers: DBM, ExampleP, File, Proxy, SQLite, Sponge.
 at parser_one.pl line 22

I’ll take a look at that cpan link you sent me and see if I can figure it out myself, unless it’s blatantly obvious to you what’s wrong…

Smoltz.

by vivaelpujols on Sep 2, 2009 6:56 PM EDT up reply actions   0 recs

It seems the mysqlPP module is still not installed correctly

I’m not sure what you need to do in Perl Package Manager to get that to happen.

You can also browse around your files and see if you can find the DBD folder containing the mysqlPP.pm file.

by Mike Fast on Sep 2, 2009 11:31 PM EDT up reply actions   0 recs

Okay

I removed it and re-downloaded it. It says it’s installed, and I went into the DBD folder on my computer and found it labeled “mysqlPP.pm”. It’s a text file, here are the contents:

http://www.codepaste.net/bs9u7h

Smoltz.

by vivaelpujols on Sep 3, 2009 12:28 AM EDT up reply actions   0 recs

There is another DBD file called SQL-Lite

Would do that do that same thing as mysqlPP?

Smoltz.

by vivaelpujols on Sep 3, 2009 12:39 AM EDT up reply actions   0 recs

I think you could use it in theory

But you will need to rewrite a lot of the code to do so.

by Mike Fast on Sep 3, 2009 12:51 AM EDT up reply actions   0 recs

Might it have to do with where my Perl files are located?

The error message says:

(Can’t locate DBD/mysqlPP.pm in INC (INC contains: /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at (eval 3) line 3.)

However, my mysqlPP file is located under “Library” → “ActivePerl-5.10” → “lib” → “DBD”

Smoltz.

by vivaelpujols on Sep 3, 2009 12:53 AM EDT up reply actions   0 recs

Is it in one of the directories in your path?

i.e., those listed after “INC contains:” in the error message?

by Mike Fast on Sep 3, 2009 12:49 AM EDT up reply actions   0 recs

No, I can't find those on my computer

I described my process right above.

Smoltz.

by vivaelpujols on Sep 3, 2009 12:58 AM EDT up reply actions   0 recs

Okay, so I was looking in the wrong place

I found the “System” > “Library” > “Perl” > Darwin" thing, and transported the DBD folder into that folder. I ran the script again, and now it says this:

install_driver(mysqlPP) failed: Can’t locate Net/MySQL.pm in INC (INC contains: /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at /System/Library/Perl/5.8.8/darwin-thread-multi-2level/DBD/mysqlPP.pm line 88.
BEGIN failed—compilation aborted at /System/Library/Perl/5.8.8/darwin-thread-multi-2level/DBD/mysqlPP.pm line 88.
Compilation failed in require at (eval 3) line 3.
Perhaps a module that DBD::mysqlPP requires hasn’t been fully installed
 at parser_one.pl line 22

Smoltz.

by vivaelpujols on Sep 3, 2009 1:20 AM EDT up reply actions   0 recs

Can’t locate Net/MySQL.pm

Try installing this module using the Perl Package Manager (or manually moving it into one of the INC directories like you did with DBD\mysqlPP).

by Mike Fast on Sep 3, 2009 5:58 PM EDT up reply actions   0 recs

I already have Net/MySQL.pm

When I moved it the Darwin folder and ran the script, it said:

failed: #28000Access denied for user ’user’@’localhost’ (using password: YES) at /System/Library/Perl/5.8.8/darwin-thread-multi-2level/DBD/mysqlPP.pm line 109

Any clue how to fix that?

Smoltz.

by vivaelpujols on Sep 5, 2009 5:56 AM EDT up reply actions   0 recs

When you installed MySQL

Did you choose a username and password? You will need to put that username and password into the script in place of “user” and “password”.

by Mike Fast on Sep 5, 2009 4:15 PM EDT up reply actions   0 recs

Huh, I don't think I did

When I connect Sequel and MAMP, my username and password are both “root”.

Smoltz.

by vivaelpujols on Sep 5, 2009 4:19 PM EDT up reply actions   0 recs

If you simply accepted the defaults

A Google search tells me that the default username in MAMP is “root” and the default password is also “root”.

If you accepted the defaults during setup, you would change this line:

$dbh = DBI→connect(“dbi:mysqlPP:database=pbp;host=localhost”, ‘user’, ‘password’)

to this:

$dbh = DBI→connect(“dbi:mysqlPP:database=pbp;host=localhost”, ‘root’, ‘root’)

by Mike Fast on Sep 5, 2009 4:19 PM EDT up reply actions   0 recs

Yep, I tried that right now

I’m still getting the same message.

DBI connect(‘database=pbp;host=localhost’,‘root’,…) failed: #28000Access denied for user ’root’@’localhost’ (using password: YES) at /System/Library/Perl/5.8.8/darwin-thread-multi-2level/DBD/mysqlPP.pm line 109
 at parser_one.pl line 22
#28000Access denied for user ’root’@’localhost’ (using password: YES) at /System/Library/Perl/5.8.8/darwin-thread-multi-2level/DBD/mysqlPP.pm line 109
Lauren-Steiners-MacBook-2:pitchfx Nick$

Smoltz.

by vivaelpujols on Sep 5, 2009 4:22 PM EDT up reply actions   0 recs

Thanks for sharing

Mike – thanks for sharing all of your files and the updates you’ve made. They’re a huge help to us inexperienced with database types. This is some kind of awesome.

by cabintwelve on Aug 29, 2009 4:16 AM EDT reply actions   0 recs

Quick question

First off, big thanks goes to Sky, Nick, Mike, and Colin for running this primer. Makes this a whole lot less daunting.

I’ve got my MAMP and Sequel Pro installed and running, bbdatabank and Historical_Win_Shares added and I’ve begun working with the introductory queries to get my feet wet. Alcoholic beverages have been popped.

Question, I’m running the query that concatenates the names but when I type in “FROM Master” the Master part lights up in blue as if it is part of the FROM command. “AS Name” is all blue as well. An error message returns when I run the query. Why is it changing to blue like it’s a command?

SELECT CONCAT AS Name, playerID, FROM master;

Thanks.

by habetw4 on Sep 2, 2009 12:57 PM EDT reply actions   0 recs

I'm back

Finally found a way to get all the data, so I’ll be moving on to the parser script stuff now. If anyone is still having trouble downloading the Pitch F/X, let me know.

by RedRobot8 on Sep 5, 2009 12:03 PM EDT reply actions   0 recs

Still having trouble

I’m still having trouble

by garik16 on Sep 5, 2009 1:08 PM EDT up reply actions   0 recs

Yes, please share

This is the most common complaint that people have, but since I haven’t encountered it, I can’t be of much help. So if you have a solution, it could help a lot of people.

by Mike Fast on Sep 5, 2009 4:16 PM EDT up reply actions   0 recs

For me, the script times out at random points

Like 7/11/07. So I just set the start day for one day after that, and it works again.

Smoltz.

by vivaelpujols on Sep 5, 2009 4:28 PM EDT up reply actions   0 recs

Question

Doesn’t that mean that your resulting dataset will be missing games at each timeout point?

by RedRobot8 on Sep 5, 2009 8:39 PM EDT up reply actions   0 recs

I would clear out any partial games

The spider script will check if a particular game directory has been created on your computer. If it has, it will skip to the next game. So if you have a partial game downloaded, it will not get the rest of the files for that game.

by Mike Fast on Sep 5, 2009 10:49 PM EDT up reply actions   0 recs

Nick

that’s not a random point; that’s the All-Star break. If there are days without games sometimes the script will need to be restarted for the following day. You’ll notice the same thing happening in the postseason when there are off days.

by Mike Fast on Sep 5, 2009 10:48 PM EDT up reply actions   0 recs

Still having trouble

I’m still having trouble RedRobot…whatd you do?

by garik16 on Sep 5, 2009 1:08 PM EDT reply actions   0 recs

I wasn't able to fix the script.

Instead, I downloaded SiteSucker and told it to fetch all .xml files from http://gd2.mlb.com/components/game/mlb/year_2007/ etc. It seems to take about 24 hours to download a year’s worth of data, but I haven’t had a single error yet.

SiteSucker is pretty easy to use; there were only a couple things that were tricky. You’ll need to tell it to ignore “robot exclusions”. Set the Download Option to “Get all files in same directory”.

by RedRobot8 on Sep 5, 2009 8:34 PM EDT up reply actions   0 recs

Umm is there a windows variant of sitesucker?

by garik16 on Sep 5, 2009 10:50 PM EDT up reply actions   0 recs

Yes.

Wget is a command line tool that should do the same thing. I’m sure that there are alternatives with GUIs as well, just search around a bit.

by RedRobot8 on Sep 5, 2009 10:57 PM EDT up reply actions   0 recs

Meh

Meh, command line. I’ll give it a shot maybe later.

by garik16 on Sep 5, 2009 11:01 PM EDT up reply actions   0 recs

Also

I haven’t gotten to the data processing stage yet, so I suppose that there is a chance that the data I am retrieving w/ SiteSucker isn’t the right stuff, but it looks okay so far.

by RedRobot8 on Sep 5, 2009 8:41 PM EDT reply actions   0 recs

There's nothing special about using the Perl spider script

And getting all the .xml files should be just fine.

by Mike Fast on Sep 5, 2009 10:51 PM EDT up reply actions   0 recs

Just checking

Has anyone, who’s actually using a mac, gotten all if this stuff to work yet?

Smoltz.

by vivaelpujols on Sep 6, 2009 12:45 AM EDT reply actions   0 recs

VEP, I've struggled through some of your issues and have a couple simple suggestions.

(1) Make sure MAMP is running when you run the parser script.

(2) If you have multiple versions of perl on your computer (I do), make sure you are running the ActivePerl version. To run the parser, I cd into the directory that contains the “games” folder and use the command:

/usr/local/ActivePerl-5.10/bin/perl ./PitchFX_parser.pl

(3) MAMP listens for a connection on a non-default port, so I had to change one of the lines near line 22 to:

$dbh = DBI→connect(“dbi:mysqlPP:database=pbp;host=localhost;port=8889”, ‘root’, ‘root’)

When you start MAMP, it loads a file in your browser that tells you what port # to use.

Let me know if that helps.

by RedRobot8 on Sep 8, 2009 3:15 PM EDT up reply actions   0 recs

one thing

In (3) above, if you named your databse “Pitchfx” like you did in the original post, the line should be:

$dbh = DBI→connect("dbi:mysqlPP:database=Pitchfx;host=localhost;port=8889", ‘root’, ‘root’)

by RedRobot8 on Sep 8, 2009 3:20 PM EDT up reply actions   0 recs

Okay, I think that means you are close.

The script is now running with no errors.

You are now having the same problem I am (see my post below).

by RedRobot8 on Sep 8, 2009 10:25 PM EDT up reply actions   0 recs

If you want to confirm, run my version of the parser to see exactly what is going on:

http://codepaste.net/v46rdr

This is functionally the same as Mike’s latest version but I added a bunch of print statements so I could see what it was doing when it started hanging.

by RedRobot8 on Sep 8, 2009 10:27 PM EDT up reply actions   0 recs

Thanks

Here is what comes up:

-——-Hello, World!
-——-foreach monthdir = .
-——-foreach monthdir = ..
-——-foreach monthdir = .DS_Store
-——-foreach monthdir = month_04
-——-foreach daydir = .
-——-foreach daydir = ..
-——-foreach daydir = day_02
-——-foreach gamedir = .
-——-foreach gamedir = ..
-——-foreach gamedir = gid_2007_04_02_arimlb_colmlb_1
-——-sub: extract_info
-——-sub: extract_date
-——-no_duplicate_query: SELECT game_id FROM games WHERE (date = ‘2007-04-02’ AND home = ‘col’ AND away = ‘ari’ AND game = 1)
-——-sth: DBI::st=HASH

Smoltz.

by vivaelpujols on Sep 8, 2009 10:34 PM EDT up reply actions   0 recs

Because we downloaded the data differently.

You used the spider script, I used SiteSucker. My way downloaded more extraneous folders that the parser had to sort through before it got to real data.

From your output, it looks like you are missing the April 1 Opening Day game between the Mets and Cards. Not an important omission at this point in the process.

Now we try to figure out why the parser is having trouble talking to the database.

by RedRobot8 on Sep 8, 2009 11:23 PM EDT up reply actions   0 recs

You're right

I redid the spider to download the missed games.

Smoltz.

by vivaelpujols on Sep 8, 2009 11:26 PM EDT up reply actions   0 recs

When I run my script, the full output is:
-——-Hello, World!
-——-foreach monthdir = .
-——-foreach monthdir = ..
-——-foreach monthdir = .DS_Store
-——-foreach monthdir = batters
-——-foreach monthdir = index.html
-——-foreach monthdir = media
-——-foreach monthdir = mobile
-——-foreach monthdir = month_04
-——-foreach daydir = .
-——-foreach daydir = ..
-——-foreach daydir = .DS_Store
-——-foreach daydir = day_01
-——-foreach gamedir = .
-——-foreach gamedir = ..
-——-foreach gamedir = .DS_Store
-——-foreach gamedir = batters
-——-foreach gamedir = gid_2007_04_01_nynmlb_slnmlb_1
-——-sub: extract_info
-——-sub: extract_date
-——-no_duplicate_query: SELECT game_id FROM games WHERE (date = ‘2007-04-01’ AND home = ‘sln’ AND away = ‘nyn’ AND game = 1)
-——-dbh→prepare(no_duplicate_query): DBI::st=HASH
^X^C

You will probably see fewer lines, depending on how many folders the script skips over. “^X^C” appears from when I hit Ctrl-X Ctrl-C to stop the script.

What it is doing is going through each folders until it finds the first game of the season. In 2007, which is what I am working with, this is Mets-Cardinals on April 1. Then when the script goes to access the DB for the first time (checking to make sure the game isn’t already in the system) it hangs (i.e. is non-responsive).

by RedRobot8 on Sep 8, 2009 10:33 PM EDT up reply actions   0 recs

Game.xml error

Hey guys, thanks for setting up this helpful tutorial (obviously big thanks to Mike). Now, Mike, when running your unified parser script I had gotten an error saying that game.xml did not exist – and it doesn’t (in the gid_2009 data folder, I have boxscore.xml and players.xml, but not game.xml). I’ve downloaded a month’s worth of data with all that information. Was there something wrong with my spider?

FWIW, I changed the path to parse game.xml to boxscore.xml, and this worked fine; I aborted it, though, because I don’t want to waste time parsing incomplete games, i.e. if I parse boxscore.xml I miss out on other data (like what may be contained in players.xml).

by michaeljay on Sep 8, 2009 10:04 AM EDT reply actions   0 recs

The spider should be getting game.xml

And if it’s not, something is wrong. Are you using the latest version of the spider (linked above):
http://codepaste.net/dvsm3q

The spider should try to download the game.xml file, and if it can’t get it from the Gameday site for some reason, it will give you a warning message:

if($gamehtml =~ m/<a href=[[COMMENT_CHILDREN_TOKEN]]#8220;game\.xml[[COMMENT_CHILDREN_TOKEN]]#8221;/ ) {
        $gameurl = “$dayurl/$game/game.xml”;
        $response = $browser→get($gameurl);
        die "Couldn’t get $gameurl: ", $response→status_line, “\n”
            unless $response→is_success;
        $infohtml = $response→content;
open GAME, “>$gamedir/game.xml”
            or die “could not open file $gamedir/game.xml: $|\n”;
print GAME $infohtml;
        close GAME;
        } else {
        print “warning: no xml game file for $game\n”;
        }

by Mike Fast on Sep 8, 2009 2:28 PM EDT up reply actions   0 recs

Thanks, Mike. I was using code from the baseball hacks book. Your updated code works just fine.

Now, downloading this data again… :p

by michaeljay on Sep 8, 2009 11:22 PM EDT up reply actions   0 recs

Current successes: I have downloaded all 2007 data and have gotten Mike Fast’s parser script to compile/run.

Current problem: The parser script hangs upon its first attempt to access the database.

The particular call that hangs is “$sth→execute();” near line 120. The $no_duplicate_query string seems to be fine: “SELECT game_id FROM games WHERE (date = ‘2007-04-01’ AND home = ‘sln’ AND away = ‘nyn’ AND game = 1)”. My pbp database has a table called “games” that was created by:

CREATE TABLE `games` (
  `game_id` smallint(5) unsigned NOT NULL auto_increment,
  `date` date NOT NULL,
  `home` varchar(7) NOT NULL,
  `away` varchar(7) NOT NULL,
  `game` tinyint(3) unsigned NOT NULL,
  `umpire` varchar(30) default NULL,
  `wind` tinyint(4) unsigned default NULL,
  `wind_dir` varchar(20) default NULL,
  `temp` tinyint(4) default NULL,
  `type` tinyint(3) unsigned NOT NULL default ‘1’,
  `runs_home` tinyint(3) unsigned default NULL,
  `runs_away` tinyint(3) unsigned default NULL,
  `local_time` time default NULL,
  PRIMARY KEY (`game_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Mike or anyone else with Perl knowledge, have any thoughts?

by RedRobot8 on Sep 8, 2009 2:48 PM EDT reply actions   0 recs

Found it.

This problem appears to arise from a difference between DBD::mysql and DBD:mysqlPP.

From a site that Mike Fast linked to above:

The return value of execute(‘SELECT * from table’)
Although DBD::mysql makes a return value the number of searched records SQL of SELECT is performed, DBD::mysqlPP surely returns 0E0.

Now I just need to find a workaround.

by RedRobot8 on Sep 8, 2009 4:30 PM EDT up reply actions   0 recs

On second thought...

that shouldn’t be my problem. The script ignores the return value anyway; my problem is that the call never returns anything.

Wish I knew Perl better…

by RedRobot8 on Sep 8, 2009 4:40 PM EDT up reply actions   0 recs

Can you share

the exact text of the error message (if any) you are seeing when you say the parser script “hangs”. If no error message, describe exactly what you are doing and exactly what happens as a result.

by Mike Fast on Sep 8, 2009 5:32 PM EDT up reply actions   0 recs

By “hangs” I mean nothing happens. No error message, no print output, and I left it sitting for up to 20 minutes. I chased the error to that particular line by inserting “print” statements throughout the code.

Specifically, that code section for me looks like:
$sth = $dbh→prepare($no_duplicate_query) or die $DBI::errstr;
if ($debug) { print “-——-sth: $sth\n”; }
$sth→execute();
if ($debug) { print “-——-Goodbye, World!\n”; }
And when I run the script, I see:

————sth: DBI::st=HASH

Note that it never prints “Goodbye, World!” No matter how long I wait, there is no further progress.

by RedRobot8 on Sep 8, 2009 7:30 PM EDT up reply actions   0 recs

So then I kill the script with Ctrl-X-C.

Thus, it looks to me that one of the following must be true:

1 The “$sth = $dbh→prepare($)” call is not working correctly.
2 The “$sth→execute()” call is not working correctly.

That is as far as my troubleshooting has gotten.

by RedRobot8 on Sep 8, 2009 7:34 PM EDT up reply actions   0 recs

I found this link to a bug in mysqlPP

Other people have had similar problems with it:
http://bugs.mysql.com/bug.php?id=37331

It says that if you change the table type from MyISAM to InnoDB, it should work. So you might give that a go. The reason I use MyISAM tables is that it makes for much faster updates when the database gets big, but that didn’t start to be an issue for me until I started adding 2008 data to the database. So at least as proof of concept, you could try the InnoDB tables.

Or look around for a bug fix to mysqlPP.

by Mike Fast on Sep 9, 2009 1:59 AM EDT up reply actions   0 recs

You guys might also try reading this thread

and let me know what you think. It talks about how to get DBD:mysql, which is apparently not available in PPM by default but can be added.

http://forums.mysql.com/read.php?51,237322,237322#msg-237322

Look particularly at the message dated 1/16/2009 that is labeled “RESOLVED” and the discussion following that.

by Mike Fast on Sep 9, 2009 2:04 AM EDT up reply actions   0 recs

Yeah, I've seen that.

Unfortunately, for whatever reason the repositories linked there are all Windows-only (VEP and I are on Mac). There seem to be few (just one?) repositories for OSX and that one does not have dbd-mysql. So I think that leaves to avenues of pursuit:

(1) Try to compile dbd-mysql myself.
(2) Try to figure out how to fix mysqlPP.

by RedRobot8 on Sep 9, 2009 12:53 PM EDT up reply actions   0 recs

Hey, progress!!

I switched to InnoDB tables, and it stopped hanging. This is the message I got now:

-——-Hello, World!
-——-foreach monthdir = .
-——-foreach monthdir = ..
-——-foreach monthdir = .DS_Store
-——-foreach monthdir = month_04
-——-foreach daydir = .
-——-foreach daydir = ..
-——-foreach daydir = day_01
-——-foreach gamedir = .
-——-foreach gamedir = ..
-——-foreach gamedir = gid_2007_04_01_nynmlb_slnmlb_1
-——-sub: extract_info
-——-sub: extract_date
-——-no_duplicate_query: SELECT game_id FROM games WHERE (date = ‘2007-04-01’ AND home = ‘sln’ AND away = ‘nyn’ AND game = 1)
-——-sth: DBI::st=HASH
-——-Goodbye, World!

INSERT INTO games (date, home, away, game, wind, wind_dir, temp,
                runs_home, runs_away, local_time) VALUES (‘2007-04-01’, ‘sln’, ‘nyn’, 1, 17, ‘Out to RF’, 72, 1, 6, ‘19:05’)

Smoltz.

by vivaelpujols on Sep 9, 2009 10:54 AM EDT up reply actions   0 recs

And... we have duplication.

It appears that VivaElPujols has duplicated this issue (see above). Nice to see that my issue isn’t something unique to my setup,

by RedRobot8 on Sep 8, 2009 10:42 PM EDT up reply actions   0 recs

I have a friend who's good with Perl

I’ll ask him to take a quick look and see what he can find.

Smoltz.

by vivaelpujols on Sep 8, 2009 10:53 PM EDT up reply actions   0 recs

Just wanted to say that while I have nothing to contribute towards helping solve problems, I'm psyched that so many people are working diligently on this.

Big thanks to Mike, especially. Looks like there’s a strong possibility we get all the kinks worked out, which would be AWESOME.

If I can post something about calls to help about specific problems or anything else, just let me know.

by Sky Kalkman on Sep 8, 2009 5:13 PM EDT reply actions   0 recs

Okay, it seems to me that DBD-mysqlPP is beyond repair.

It has not been updated in 5+ years and is flat-out not compatible with Perl 5.10. Ugh.

Thus, I will investigate a way to compile/install DBD-mysql on Mac OS X. I do not anticipate that this will be an easy project.

If someone knows of an ActivePerl PPM repository for Mac that has DBD-mysql, please let me know!

by RedRobot8 on Sep 9, 2009 3:26 PM EDT reply actions   0 recs

Nevermind

May have a breakthrough. More soon.

by RedRobot8 on Sep 9, 2009 3:47 PM EDT up reply actions   0 recs

This might be a problem

As RedRobot said above, mysqlPP is shot; however that should be fixable.

The problem is that the mysqlPP parses 1 games worth of data in the database:

INSERT INTO games (date, home, away, game, wind, wind_dir, temp,
                runs_home, runs_away, local_time) VALUES (‘2007-04-01’, ‘sln’, ‘nyn’, 1, 17, ‘Out to RF’, 72, 1, 6, ‘19:05’)

When I checked my database, it parsed that same data 5 different times (because I ran the parser script 5 times). I thought it was supposed to check for duplicates?

Smoltz.

by vivaelpujols on Sep 9, 2009 6:27 PM EDT reply actions   0 recs

This is a symptom of one of the mysqlPP problems.

Even though we got through that first query, the →numRows() call always returns 0 even if there should have been a row returned. Thus, all duplicate checks fail.

mysqlPP may not be a viable option unless someone who really knows Perl can troubleshoot it for us. My pretending can only get so far.

by RedRobot8 on Sep 9, 2009 6:50 PM EDT up reply actions   0 recs

This looks to be promising

http://bixsolutions.net/forum/thread-8.html

Although I’m having trouble finding the CPAN/Build folder.

Smoltz.

by vivaelpujols on Sep 9, 2009 8:56 PM EDT reply actions   0 recs

Didn't work for me, but you should try anyway.

To find the CPAN build folder, type:

cd ~/.cpan/build/DBD-

and then press “Tab” to autocomplete the filename. Then press enter to move into that directory.

by RedRobot8 on Sep 9, 2009 9:33 PM EDT up reply actions   0 recs

I just did, but couldn't get it to work.

This is where my novice programming skills really become a liability; when ActivePerl can’t find my newly-MacPorted DBD::mysql (or there is some other error there), I don’t know how to begin troubleshooting.

FYI, my error message is:

$ /usr/local/ActivePerl-5.10/bin/perl ./PitchFX_parser.pl
DBI connect(‘database=pbp;host=localhost;port=8889’,‘root’,…) failed: Can’t connect to local MySQL server through socket ‘/tmp/mysql.sock’ (2) at ./PitchFX_parser.pl line 25
Can’t connect to local MySQL server through socket ‘/tmp/mysql.sock’ (2) at ./PitchFX_parser.pl line 25.

The line referenced there is the original DB connect call:
#$dbh = DBI→connect(“DBI:mysql:database=pbp;host=localhost;port=8889”, ‘root’, ‘root’)

I know that the the problem is not with the MySQL server because the following call successfully connects to the database:
$dbh = DBI→connect(“DBI:mysqlPP:database=pbp;host=localhost;port=8889”, ‘root’, ‘root’)

by RedRobot8 on Sep 10, 2009 4:56 PM EDT up reply actions   0 recs

Sorry, I copied the wrong line.

Or rather, I had already moved on to another possible solution and so had commented that line out.

by RedRobot8 on Sep 10, 2009 5:12 PM EDT up reply actions   0 recs

VivaElPujols

I think I got this MacPort thing to work. Check out Colin’s link, follow the instructions there, it is nice and simple.

Then, if you get the error where it can’t find mysql.stock, enter into the terminal:

$ ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

by RedRobot8 on Sep 10, 2009 5:14 PM EDT up reply actions   0 recs

I think we have a solution

Colin pointed us to MacPorts, a site with software that allows us to install DBD::mysql for ActivePerl. Assuming that you are on a Mac and have been following along (i.e., using MAMP and have downloaded the Pitch F/X data), here are the steps to parse the data:

(1) Make sure that you have Xcode installed. If you don’t you can insert your OS X disc and install it from there. If you are newly installing it, run Software Update to get the latest version.

(2) Go to MacPorts, and download the DMG for your version of OS X. Follow the instructions during install.

(3) In Terminal, enter:

$ ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

to tell ActivePerl where to find the mysql block.

(4) Copy Mike Fast’s parser script (linked above) into the folder that has the Pitch F/X “Games” directory.

(5) In the parser script, the database connect line should be (near line 21):

$dbh = DBI→connect(“DBI:mysql:database=pbp;host=localhost;port=8889”, ‘root’, ‘root’)

(6) In Terminal, cd into the directory that contains the parser script and enter:

/usr/local/ActivePerl-5.10/bin/perl ./PitchFX_parser.pl

(7) Watch the parsing goodness fly by. Mike said it would probably take ~ 24 hours to parse a year’s worth of data, so it could take a while.

Let me know if I missed any steps here or if you have any problems.

by RedRobot8 on Sep 10, 2009 5:45 PM EDT reply actions   0 recs

I'm getting this:
install_driver(mysql) failed: Can’t locate DBD/mysql.pm in INC (INC contains: /Users/Nick/Library/ActivePerl-5.10/lib /usr/local/ActivePerl-5.10/site/lib /usr/local/ActivePerl-5.10/lib .) at (eval 5) line 3.
Perhaps the DBD::mysql perl module hasn’t been fully installed,
or perhaps the capitalisation of ‘mysql’ isn’t right.
Available drivers: CSV, DBM, ExampleP, File, Gofer, ODBC, Proxy, SQLite, Sponge, mysqlPP.
 at ./parser_one.pl line 25

However, I assume I don’t have XCode properly downloaded. Unfortunately, I can’t seem to find it on my computer, and I have since lost my OS X disk. Do you know where else I can find it?

Smoltz.

by vivaelpujols on Sep 10, 2009 9:10 PM EDT up reply actions   0 recs

It looks like you can download it from Apple if you sign up for a free developer account.

by RedRobot8 on Sep 11, 2009 1:36 PM EDT up reply actions   0 recs

I downloaded that, and re-downloaded MacPorts

I still get the same message. Maybe I’m not installing the right thing on MacPorts; the one I’m downloading is 10.5 Leopard? Or is there anything I have to do with XCode to run the MacPorts?

More info for you, when I type in this command to let my computer know MacPorts is there:

$ ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

I get this in return:

ln: /tmp/mysql.sock: File exists

That sounds right, so I don’t know why it’s not finding the “mysql” when it tries to parse the data. Maybe it’s my parser? I just copied it from Mike, and changed the DB name and user and passwords. Could you take a quick look?

http://www.codepaste.net/ytx3jm

Thanks in advance.

Smoltz.

by vivaelpujols on Sep 11, 2009 10:43 PM EDT up reply actions   0 recs

Ok

I went searching in MAMP to find the “mysql.sock” (which I assume is what MacPorts downloads for you), and when I click on it, it says; “and unexpected error code occurred (error code – 43).”

Smoltz.

by vivaelpujols on Sep 11, 2009 10:59 PM EDT up reply actions   0 recs

If you still have a problem after doing step 2.5 (below), you can try:
$ rm /tmp/mysql.sock
$ ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

by RedRobot8 on Sep 11, 2009 11:46 PM EDT up reply actions   0 recs

Oops, I missed a step.

(2.5) In Terminal, enter:

sudo port install p5-dbd-mysql

Sorry about that!

by RedRobot8 on Sep 11, 2009 11:45 PM EDT up reply actions   0 recs

I'll forgive you ;)

I’m running that right now, and after it’s done I’ll run the rest of the steps.

Thanks a lot.

Smoltz.

by vivaelpujols on Sep 12, 2009 12:03 AM EDT up reply actions   0 recs

You shouldn’t have to do much else, just run the parser. Doing the mysql.sock I mentioned just above couldn’t hurt.

by RedRobot8 on Sep 12, 2009 12:10 AM EDT up reply actions   0 recs

Goddamnit, it's still not working

I ran the “sudo install” command and it worked fine (I think). Then, I ran the commands and I got the same message:

install_driver(mysql) failed: Can’t locate DBD/mysql.pm in INC (INC contains: /Users/Nick/Library/ActivePerl-5.10/lib /usr/local/ActivePerl-5.10/site/lib /usr/local/ActivePerl-5.10/lib .) at (eval 5) line 3.
Perhaps the DBD::mysql perl module hasn’t been fully installed,
or perhaps the capitalisation of ‘mysql’ isn’t right.
Available drivers: CSV, DBM, ExampleP, File, Gofer, ODBC, Proxy, SQLite, Sponge, mysqlPP.
 at ./parser_one.pl line 25

So I checked, and sure enough, the DBD/mysql wasn’t located in the Active Perl/ Lib, so I searched on my computer for DBD, and found a folder that contained this:

I figured that was the right one, so I moved it into the Lib folder. Then, when I run the steps, I get this:

install_driver(mysql) failed: Can’t locate loadable object for module DBD::mysql in INC (INC contains: /Users/Nick/Library/ActivePerl-5.10/lib /usr/local/ActivePerl-5.10/site/lib /usr/local/ActivePerl-5.10/lib .) at (eval 5) line 3
Compilation failed in require at (eval 5) line 3.
Perhaps a module that DBD::mysql requires hasn’t been fully installed
 at ./parser_one.pl line 25

What the hell.

Also, if it helps to explain some things. When I run “rm /tmp/mysql.sock”, it says “Permission denied”.

Smoltz.

by vivaelpujols on Sep 12, 2009 12:42 AM EDT reply actions   0 recs

(1) In terminal, enter:

$ port list installed

and make sure that p5-dbd-mysql shows up in the list. You should also see mysql5, p5-dbi, and perl5, among others.

(2) In terminal, enter:

sudo rm /tmp/mysql.sock
$ ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

Aside from that, I don’t really know what else to suggest.

by RedRobot8 on Sep 12, 2009 5:36 PM EDT up reply actions   0 recs

If ActivePerl refuses to find the ported DBD-mysql, you could always try to make use of the Perl install that comes with OS X. That is the version that you invoke by just typing perl instead of /usr/local/ActivePerl-5.10/bin/perl.

If you run:

perl ./PitchFX_parser.pl

it will tell you what libraries you need. If it has DBD-mysql, it will probably complain about needing XML::simple, which I’m not sure exactly how to get.

by RedRobot8 on Sep 12, 2009 5:51 PM EDT up reply actions   0 recs

Okay, so your error message says that ActivePerl still can’t find the DBD::mysql library.

Just dragging the folder probably won’t work, because nothing else will know where it has gone. I’d suggest moving that folder back where it was.

I’ll get back to you tomorrow night.

by RedRobot8 on Sep 12, 2009 1:09 AM EDT up reply actions   0 recs

Sorta

I opted to use the extant Perl installation on OS X 10.3.9 (no problems, needed packages from CPAN) and build MySQL from scratch. The build went fine, but I’m not able to keep mysqld running outside the test suite. I’m no doubt missing something obvious, but I haven’t had time to fuss with it.

On the other hand, if one has a working MySQL installation and the issue is just getting (Active)Perl to talk to it through the parser script, the “normal” Perl might be worth a try. I think I needed LWP, XML::Simple, and DBI.

Case studies quote patients reporting a sensation of "giving way," a "bursting noise" or "sudden explosion."

by Caledonia on Sep 16, 2009 12:26 AM EDT up reply actions   0 recs

Thanks for chiming in.

Glad to hear that you got everything working without ActivePerl—I had some trouble getting XML::Simple installed and didn’t press the issue further. That will definitely be useful for anyone else trying to replicate this.

Let us know if you have any other difficulties or successes.

by RedRobot8 on Sep 16, 2009 2:03 AM EDT up reply actions   0 recs

I can finally confirm success
mysql> SELECT * FROM games atbats;
-—————————-—-———-—-————-—-——————-—-———-—————————--————-
| game_id | date | home | away | game | umpire | wind | wind_dir | temp | type | runs_home | runs_away | local_time |
-—————————-—-———-—-————-—-——————-—-———-—————————--————-
| 1 | 2009-08-24 | min | bal | 1 | 1 | 0 | Indoors | 70 | 1 | 2 | 1 | 19:10:00 |
| 2 | 2009-08-24 | bos | cha | 1 | 2 | 5 | Out to LF | 71 | 1 | 12 | 8 | 19:10:00 |
| 3 | 2009-08-24 | kca | cle | 1 | 3 | 12 | R to L | 80 | 1 | 6 | 10 | 19:10:00 |
| 4 | 2009-08-24 | ana | det | 1 | 4 | 4 | Out to CF | 79 | 1 | 7 | 10 | 19:05:00 |
| 5 | 2009-08-24 | was | mil | 1 | 5 | 6 | In from LF | 84 | 1 | 1 | 7 | 16:35:00 |
| 6 | 2009-08-24 | sea | oak | 1 | 6 | 0 | None | 72 | 1 | 3 | 1 | 19:10:00 |
| 7 | 2009-08-24 | nyn | phi | 1 | 7 | 4 | L to R | 78 | 1 | 2 | 6 | 13:10:00 |
| 8 | 2009-08-24 | col | sfn | 1 | 8 | 4 | Out to CF | 70 | 1 | 6 | 4 | 18:40:00 |
| 9 | 2009-08-24 | tor | tba | 1 | 9 | 9 | Out to CF | 71 | 1 | 7 | 12 | 19:07:00 |
-—————————-—-———-—-————-—-——————-—-———-—————————--————-
9 rows in set (0.00 sec)

There’s no front end, but the machinery works. The main problems were some bizarre permissions on /usr/local/ that appeared along the way and some funny business with /tmp (which should be a link to /private/tmp on OS X).

Case studies quote patients reporting a sensation of "giving way," a "bursting noise" or "sudden explosion."

by Caledonia on Sep 16, 2009 4:47 PM EDT up reply actions   0 recs

I should note...

…that “SELECT * FROM games atbats” above is an ill-formed query; all it does is select everything from the ‘games’ table. (I had failed to carefully peruse the record structures and have only a passing familiarity with SQL.)

Case studies quote patients reporting a sensation of "giving way," a "bursting noise" or "sudden explosion."

by Caledonia on Sep 16, 2009 5:29 PM EDT up reply actions   0 recs

XML::Simple

Should be as easy as

sudo perl -MCPAN -eshell
install XML::Simple

It may take a while based on the dependencies, but it’s not as complicated as DBI (which I had to ‘force install’ since it couldn’t test itself against the non-running MySQL).

Case studies quote patients reporting a sensation of "giving way," a "bursting noise" or "sudden explosion."

by Caledonia on Sep 16, 2009 2:21 PM EDT reply actions   0 recs

Yeah, I tried that, but install failed.

Didn’t bother to troubleshoot further. Hopefully my problem is an isolated issue and others won’t have issues.

by RedRobot8 on Sep 17, 2009 11:59 AM EDT up reply actions   0 recs

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?
Start posting on Beyond the Box Score »

Join SB Nation and dive into communities focused on all your favorite teams.

FanPosts

Community blog posts and discussion.

Recommended FanPosts

Small
PZR-based Win Values 2001-2006
Small
The "30 parks on a budget" challenge

Recent FanPosts

Small
Determining Batted Ball Rates using Pitch Type and Location
Small
a new xBABIP calculator
Img587561916661595
Top 15 high school MLB draft prospects
Sunflower_small
World Series Simulation, Game #6
Small
JT20 Dynasty League
E52205a2_small
New Look
Sth70021_small
Exploring Hit f/x, Albeit Badly
Redcap_small
Ricky Nolasco: 4 WAR or 1 WAR?

+ New FanPost All FanPosts >

FanShots

Quick hits of video, photos, quotes, chats, links and lists that you find around the web.

Recent FanShots

Primer on BaseRuns
Cool Baseball Infographics
ESPN's Jerry Crasnick on defensive metrics
I’m also a follower, since Brian Bannister’s on our team, of sabermetric st...
Top Ten Baseball-Reference.com's Sponsorships
Primer on Linear Weights
JC Bradbury on "Hot Stove Myths"
Everyone Should Learn to Throw a Cutter
Criminals of WAR
Ten statisticians you should know about

+ New FanShot All FanShots >

BtB on Twitter

Main Feed: @BtBScore

Tommy B: @tommy_bennett
Sky: @BtB_Sky
Dan: @dturkenk
Harry: @harrypav
Jinaz: @jinazreds
Jack: @jh_moore
Erik: @Erik_Manning
Tommy R: @trancel
Justin: @justinbopp

Subscribe to BtB via Email

Enter your email address:

Delivered by FeedBurner

BtB Goes Social


Managers

Nando_small R.J. Anderson

Limes_125_small Sky Kalkman

E52205a2_small Tommy Bennett

Editors

Face_small Harry Pavlidis

Rawlings_baseball_bigger_small Dan Turkenkopf

770insig_small Jeff Zimmerman (TucsonRoyal)

Aviles_small Justin Bopp

Authors

Banny_small erik

Raysring1_small Tommy Rancel

Jinaz-reds-avatar_small JinAZ

Jmlogo_small Jack Moore

1753738656_110919ebe9_o_small vivaelpujols

1_small Graham

Baseball_small Mike Rogers

Redcap_small SFiercex4

Small Patrick Clark

Walter_album_small Walter Fulbright