Need Perl script help to get Minors data


I am working on a project and am hoping someone here can help me out. I have a working Perl script for MLB data as well as AAA and the Southern League in AA. Those leagues use the same format as the Majors, so even though I know next to nothing about Perl, I was able to adapt the script for them easily. The problem is that I can't figure out how to get the script to work for the Eastern League in AA and the rest of the minors.

The Majors/AAA/Southern League use the format of Game-PBP-Batters/Pitchers-playerfile.xml

The rest of the minors use the format of Game-Inning-Inning_X.xml

The furthest I can get with my Perl script is creating the game folders, downloading the box score and player list, and creating an 'innings' folder; I can't get it to download the Inning_X.xml files. I emailed Jeff Sackmann, who runs MinorLeagueSplits.com, but haven't gotten a response yet, so I am reaching out to you guys in the hope that someone can help. I have spent about 14 hours over the last 2 days tweaking little things in my file to try to make it work, and I haven't had any success beyond what is listed at the start of this paragraph, which doesn't help me much. I appreciate you taking the time to read this.

Comments

Can you post your Perl script on codepaste.net?

If you do I can take a look at it.

Btw, you might be interested to know that MLB is deprecating the use of the game-pbp-batters/pitchers directories for the 2010 season and removing them for previous seasons.

My spider script grabs the inning_x.xml files. You can take a look at it here:
http://codepaste.net/dvsm3q

by Mike Fast on Feb 12, 2010 2:00 PM EST

Mike

You are an absolute life saver. I was able to find my problem by comparing your links data to what I had.

I had heard about the MLB data change. I hope that we will still be able to use the data somehow for the 2010 season (I have all the previous seasons already). I saw that Brooksbaseball.net already saw some kind of change that messed his site up and he is working around it, but is there going to be more to it than what he has already seen?

by dougdirt on Feb 12, 2010 4:27 PM EST

Here is what Cory Schwartz from MLBAM said yesterday.

I don’t think he would mind my passing it along.

Folks, just wanted to give you a heads-up that we are deprecating the individual batter and pitcher .xml files published under these directories:

http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/batters/

http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/pitchers/

If you’re using any data in those files you should be able to get it from other files in the gd2 directories, but we no longer need or use these for any of our internal purposes or products. In addition, we are deleting the 2008 and 2009 files from our servers to free up the disc space for other content.
The 2008/2009 files will be removed in the next day or two, maybe even today. We only have a 20-day offseason between the end of Caribbean Series and the start of spring training games, so we move fast on maintenance and the like.

by Mike Fast on Feb 12, 2010 4:37 PM EST

I'm trying to accomplish the same thing as Doug

I’m trying to create a minor league database for MySQL, but I haven’t been able to find enough information to start. Does anyone have a good starting place for me? I’m at square one. Thanks

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 15, 2010 1:51 PM EST

Square one?

Meaning you have no script/database experience at all, or meaning you have a working major league database and want to supplement it with a minor league database, or something in between?

by Mike Fast on Feb 15, 2010 3:37 PM EST

Thanks Mike

Sorry I was so unclear. I have some script and database experience, and I would like a minor league database. I’m more interested in the minor leagues and prospects (like Doug).

by JD Sussman on Feb 15, 2010 3:58 PM EST

To download the game files

You can take the script I posted in the first comment and change $baseurl to point at the appropriate minor league. Then you could use a variation of my database as described on this site.
http://www.beyondtheboxscore.com/2009/8/19/994666/saberizing-a-mac-4-pitch-f-x

Alternatively, you could do the same with Baseball on a Stick.
http://sourceforge.net/projects/baseballonastic/

Either way is going to require installing some software—Perl/Python, MySQL, etc.
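
For anyone following along, the $baseurl change described above might be sketched roughly as follows. This is in Python rather than the Perl of the linked spider, and it is only an illustration: it assumes the league is the path segment that reads `mlb` in the URLs quoted earlier in the thread, that month and day are zero-padded, and that the inning files sit in an `inning` directory under each game directory. The helper names `day_url` and `inning_urls`, the `aaa` league code and the `gid_example` directory name are all made up for the example.

```python
from datetime import date

# Base of the Gameday tree, taken from the URLs quoted earlier in the thread.
BASE_URL = "http://gd2.mlb.com/components/game"


def day_url(league, day):
    """Return the directory URL for one league on one day.

    Assumption: the league code replaces the 'mlb' path segment and
    month/day are zero-padded to two digits.
    """
    return (
        f"{BASE_URL}/{league}/year_{day.year}"
        f"/month_{day.month:02d}/day_{day.day:02d}/"
    )


def inning_urls(league, day, gid, innings=9):
    """Return the inning_X.xml URLs for one game directory (gid).

    Assumption: inning files live under <gid>/inning/ and are numbered from 1.
    """
    game_dir = f"{day_url(league, day)}{gid}/inning/"
    return [f"{game_dir}inning_{n}.xml" for n in range(1, innings + 1)]


# Hypothetical league code and game directory, for illustration only.
example = inning_urls("aaa", date(2009, 4, 9), "gid_example", innings=2)
```

Here `example` holds two URLs ending in `inning/inning_1.xml` and `inning/inning_2.xml`. A fixed inning count is a simplification; games that run long or short would need the directory listing to be read instead.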

by Mike Fast on Feb 15, 2010 6:32 PM EST

Thanks Mike!

by JD Sussman on Feb 15, 2010 9:03 PM EST

Sorry to be a pain

Maybe Doug or Devil (if he got this far) can help with what he did.

I have SQLyog, not PHPSQL. I’m trying to create a database, but not a pitch f/x one (because obviously there isn’t pitch f/x in the minors). Does anyone have an idea of how I create the database in SQLyog for minor leaguers? The example Mike has is great, but I’ve got a different administration interface and a different aim. It might be more similar to the bdb database.

Right now, I’ve got hack 28 almost finished (with a question to follow), MySQL running locally with SQLyog as my administrative host, and Perl downloaded (but that is about it; I’m not sure how to link and update the two).

My question about hack 28 is, do I need a different version for each league or can I adapt the script to incorporate all the leagues at once in the same database?

Thanks

-JD

by JD Sussman on Feb 16, 2010 10:14 PM EST

Wish I could help

But I don’t even use an SQL database. I just use a lot of Excel.

by dougdirt on Feb 17, 2010 11:03 PM EST

I'm interested in the same thing JD is

yearly minor league stats a la Retrosheet or bdb… I assume if one had the minor-league pbp data, one could calculate park factors, etc. Am I right in understanding that all this 2008 and 2009 data is going to be deleted from the servers, as listed above?

Ah, who am I kidding, I’m never going to figure this out… where the heck do forecasters get their minor league data?

I'm not a sabermetrician, but I do play one at FanGraphs.

Can't get enough of me? Check out my Twitter feed.

by Matt Klaassen on Feb 16, 2010 10:11 AM EST

My understanding of what Cory said

was that the pbp directories were going to be deleted for 2008 and 2009. The pbp directories contain duplicate information of what is in the inning directory. So no information is being lost per se, but the organization of the info is changing, and that breaks some people’s scripts or websites.

I don’t spider the minor-league data, so I’m not the right guy to ask for a tutorial on that, but I know that it is there for the spidering in a very similar fashion to the major league data, and the same scripts could easily be adapted to get the minor league data. So if you want to jump in and try it for yourself, I’m sure I and others can offer some pointers or answer questions.

by Mike Fast on Feb 16, 2010 11:19 AM EST

Yes

They use the inning_X.xml files.

by Mike Fast on Feb 17, 2010 12:23 AM EST

Great

I’m planning on re-downloading the files, as I think I have a few duplicates in my DB caused by disruption in the original download.

by vivaelpujols on Feb 17, 2010 12:39 AM EST

The answer is

“where the heck do forecasters get their minor league data?”

Copy and paste. I’ve used different sources over the years – b-ref, Baseball America, Baseball Cube. It’s a long process, which is why I only do it once a year, when the season ends.

I’ll try some of these links. It would be nice to get a minor league database working, but in the past I looked at this stuff and did not have the time to make any real progress.

Thanks Mike and everyone else for sharing your code.

The HK-47 hitting droid is the finest line drive machine ever built

by RallyMonkey5 on Feb 17, 2010 11:11 AM EST

Modified my spider

Right now it’s grabbing the inning files, boxscore.xml, players.xml. It runs sooo much faster if I’m not downloading the batter and pitcher folders.

Do you guys think that will give me enough info to build a Retrosheet-like pbp database? Or are there some other needed files?

Batter, pitcher, and event result seem like they should be straightforward, but getting the fielders looks like a challenge. I’m thinking of starting from the boxscore and trying to identify defensive changes.

by RallyMonkey5 on Feb 17, 2010 11:12 PM EST

Fielders are tough.

I know Colin was working with the guys at Baseball on a Stick to incorporate that functionality.

It might be worth looking into whether they’ve finished.

by Dan Turkenkopf on Feb 18, 2010 8:06 AM EST

Yep

Figuring out how to track fielders is on my to-do list, but I haven’t implemented it yet.

Rally, you don’t need the batter and pitcher folders. You might want to grab the game.xml file, though. It has a few tidbits of useful info.

by Mike Fast on Feb 18, 2010 11:37 AM EST

Anyone have a script that puts all the innings files into a big text or csv file?

Maybe I’ll have to figure out MySQL, but I’m a lot more comfortable with Access. A season should be around 750,000 rows, I think.

The innings xml files can be opened in excel, but I only get the top inning, the home team doesn’t show up. I can fix this though, by replacing with , the end tag becomes , and then do something similar with the bottom tag.

If I do that with a script I could probably create what I need in Visual Basic. Probably not the most efficient, but I’m really good with VB, and suck at most programming languages (including Perl).
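
One possible way to get both halves of an inning into a single flat CSV is sketched below, in Python rather than Perl or VB. It rests on assumptions: that each inning file has a root element with a `num` attribute, that the root's children are the top and bottom halves, and that plate appearances are `atbat` elements inside them. The attribute names in `SAMPLE_INNING` are invented for illustration and are not copied from a real Gameday file; `inning_rows` and `rows_to_csv` are made-up helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are illustrative
# assumptions, not a copy of a real inning_X.xml file.
SAMPLE_INNING = """<inning num="1">
  <top>
    <atbat num="1" batter="1001" pitcher="2001" event="Groundout"/>
    <atbat num="2" batter="1002" pitcher="2001" event="Single"/>
  </top>
  <bottom>
    <atbat num="3" batter="3001" pitcher="4001" event="Strikeout"/>
  </bottom>
</inning>"""


def inning_rows(xml_text):
    """Return one dict per atbat element, tagged with inning number and half."""
    root = ET.fromstring(xml_text)
    rows = []
    for half in root:  # direct children of the root, e.g. top and bottom
        for atbat in half.iter("atbat"):
            row = {"inning": root.get("num"), "half": half.tag}
            row.update(atbat.attrib)  # copy every attribute as a column
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Return CSV text; the header is the union of keys in first-seen order."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = inning_rows(SAMPLE_INNING)
csv_text = rows_to_csv(rows)
```

Because every attribute becomes a column and missing values are left blank, the same two functions could be run over many inning files and the rows concatenated before writing. Whether the resulting file loads cleanly into Access at season scale is untested here.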

by RallyMonkey5 on Feb 18, 2010 9:00 PM EST

Rally

You can change the Perl script to talk to MS Access instead of MySQL. If you use my Perl database parser (http://codepaste.net/gjbeyv), you change the line

$dbh = DBI->connect("DBI:mysql:database=pbp;host=localhost", 'user', 'password');

to

$dbh = DBI->connect('dbi:ODBC:DSN', 'user', 'password');

where you’ve set up a DSN to connect to your Access database. I believe the rest of the parser script would stay the same.

by Mike Fast on Feb 19, 2010 12:47 AM EST

Thanks Mike

I’ll try that this weekend. In my last post my tag examples didn’t show up.

I’ll try to explain in English. There is a top tag and a bottom tag in the XML. I change the top tag to something like halfinn batteam="top", and do something similar with the bottom tag, with the appropriate brackets. This is just in case anyone wants to open one of the inning XML documents as an Excel table for a quick look and doesn’t need to populate a database.
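
The tag swap described above could be scripted along these lines. This is a sketch in Python, not the VB or Perl mentioned in the thread, and it assumes the top and bottom tags carry no attributes of their own. The function name `flatten_halves` is made up for the example.

```python
import re


def flatten_halves(xml_text):
    """Rename the top/bottom tags to a single halfinn tag with a batteam attribute.

    Assumption: the opening tags are exactly <top> and <bottom>, with no
    attributes.
    """
    # Opening tags: <top> becomes <halfinn batteam="top">, likewise for bottom.
    text = re.sub(r"<(top|bottom)>", r'<halfinn batteam="\1">', xml_text)
    # Closing tags: </top> and </bottom> both become </halfinn>.
    return re.sub(r"</(?:top|bottom)>", "</halfinn>", text)


converted = flatten_halves("<inning><top><atbat/></top><bottom></bottom></inning>")
```

After the call, `converted` contains two `halfinn` elements that differ only in their `batteam` attribute, so both halves share one element name.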

by RallyMonkey5 on Feb 19, 2010 8:54 AM EST

Comments For This Post Are Closed

