Need Perl script help to get Minors data
I am working on a project and am really hoping someone out there can help me out. I have a working Perl script for MLB data as well as AAA and the Southern League in AA. They use the same format as the Majors do, so even though I know next to nothing about Perl, I was easily able to adapt the script to get that data. The problem is that I can't figure out how to get the script to work for the Eastern League in AA and the rest of the minors.
The Majors/AAA/Southern League use the format of Game-PBP-Batters/Pitchers-playerfile.xml
The rest of the minors use the format of Game-Inning-Inning_X.xml
The best I can get my Perl script to do is create the game folders, download the box score and player list, and create an 'innings' folder, but I can't get it to download the Inning_X.xml files. I emailed Jeff Sackmann, who runs MinorLeagueSplits.com, and haven't gotten a response yet, so I am now reaching out to you guys and hoping someone can help me out. I have spent about 14 hours over the last 2 days tweaking little things in my file to try to make it work, and I haven't had any success beyond what I listed at the start of this paragraph, which really doesn't help me much. I appreciate you taking the time to read this.
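For anyone hitting the same wall, here is a minimal sketch of building the Inning_X.xml URLs for one game. It assumes the inning files sit in an "inning" directory directly under the game's gid_* directory, which is how the layout is described in this thread; the sub name inning_urls and the game directory in the usage lines are made up for illustration.

```perl
use strict;
use warnings;

# Build the URLs of the per-inning files for one game.
# $game_url is the URL of the game's gid_* directory; $innings is how many
# inning files to ask for (defaults to 9).
# ASSUMPTION: the files live at <game>/inning/inning_N.xml.
sub inning_urls {
    my ($game_url, $innings) = @_;
    $innings = 9 unless defined $innings;
    $game_url =~ s{/+$}{};    # drop any trailing slash
    return map { "$game_url/inning/inning_$_.xml" } 1 .. $innings;
}

# Hypothetical game directory, for illustration only.
my $game = 'http://gd2.mlb.com/components/game/mlb/year_2009/month_07/day_04/gid_EXAMPLE/';
print "$_\n" for inning_urls($game, 9);
```

A game can run longer than nine innings, so a real spider would probably keep requesting the next inning number until the server stops returning a file.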
Comments
Can you post your Perl script on codepaste.net?
If you do I can take a look at it.
Btw, you might be interested to know that MLB is deprecating the use of the game-pbp-batters/pitchers directories for the 2010 season and removing them for previous seasons.
My spider script grabs the inning_x.xml files. You can take a look at it here:
http://codepaste.net/dvsm3q
Mike
You are an absolute life saver. I was able to find my problem by comparing your links data to what I had.
I had heard about the MLB data change. I hope that we will still be able to use the data somehow for the 2010 season (I have all the previous seasons already). I saw that Brooksbaseball.net already saw some kind of change that messed his site up and he is working around it, but is there going to be more to it than what he has already seen?
Here is what Cory Schwartz from MLBAM said yesterday.
I don’t think he would mind my passing it along.
Folks, just wanted to give you a heads-up that we are deprecating the individual batter and pitcher .xml files published under these directories:
http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/batters/
http://gd2.mlb.com/components/game/mlb/year_$YEAR/month_$MONTH/day_$DAY/gid_*/pbp/pitchers/
If you’re using any data in those files you should be able to get it from other files in the gd2 directories, but we no longer need or use these for any of our internal purposes or products. In addition, we are deleting the 2008 and 2009 files from our servers to free up the disc space for other content.
The 2008/2009 files will be removed in the next day or two, maybe even today. We only have a 20-day offseason between the end of Caribbean Series and the start of spring training games, so we move fast on maintenance and the like.
In case anyone needs the raw data from 2008 - 2009, I still have all the XML files.
For some reason I keep them around.
by Dan Turkenkopf on Feb 12, 2010 8:58 PM EST
I'm trying to accomplish the same thing as Doug
I’m trying to create a minor league database in MySQL, but I haven’t been able to find enough information to start. Does anyone have a good starting place for me? I’m at square one. Thanks
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Square one?
Meaning you have no script/database experience at all, or meaning you have a working major league database and are wanting to supplement it with a minor league database, or something in between?
Thanks Mike
Sorry I was so unclear. I have some script and database experience, and I would like a minor league database. I’m more interested in the minor leagues and prospects (like Doug).
To download the game files
You can take the script I posted in the first comment and change the $baseurl to the appropriate minor league. Then you could use a variation of my database as described on this site.
http://www.beyondtheboxscore.com/2009/8/19/994666/saberizing-a-mac-4-pitch-f-x
Alternatively, you could do the same with Baseball on a Stick.
http://sourceforge.net/projects/baseballonastic/
Either way is going to require installing some software—Perl/Python, MySQL, etc.
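As a rough illustration of the $baseurl change described above, the sketch below generates one day-directory URL per calendar day under a given base URL, following the year_/month_/day_ pattern quoted earlier in the thread. The sub name day_urls is hypothetical, and the minor league directory names are not given in this thread, so the usage lines keep the "mlb" path; swapping that last path segment is the assumed change.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Return one day-directory URL per calendar day from $start to $end
# (both 'YYYY-MM-DD', inclusive) under the given base URL.
sub day_urls {
    my ($baseurl, $start, $end) = @_;
    $baseurl =~ s{/+$}{};    # drop any trailing slash
    my ($t, $stop) = map {
        my ($y, $m, $d) = split /-/;
        timegm(0, 0, 0, $d, $m - 1, $y);    # midnight UTC of that day
    } $start, $end;
    my @urls;
    for (; $t <= $stop; $t += 86_400) {     # step one day at a time
        my (undef, undef, undef, $d, $m, $y) = gmtime $t;
        push @urls, sprintf '%s/year_%04d/month_%02d/day_%02d/',
            $baseurl, $y + 1900, $m + 1, $d;
    }
    return @urls;
}

# The "mlb" segment is the part to replace for another league (assumption).
my $baseurl = 'http://gd2.mlb.com/components/game/mlb';
print "$_\n" for day_urls($baseurl, '2009-04-29', '2009-05-01');
```

Each day directory would then be scanned for gid_* game folders, which is the part the spider script linked in the first comment handles.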
Sorry to be a pain
Maybe Doug or Devil (if he got this far) can help with what he did.
I have SQLyog, not PHPSQL. I’m trying to create a database, but not a pitch f/x one (because obviously there isn’t pitch f/x in the minors). Does anyone have an idea of how I create the database in SQLyog for minor leaguers? The example Mike has is great, but I’ve got a different administration interface and a different aim. It might be more similar to the bdb database.
Right now, I’ve got hack 28 almost finished (with a question to follow), MySQL running locally with SQLyog as my administrative host, and Perl downloaded (but that is about it; I’m not sure how to link and update the two).
My question about hack 28 is, do I need a different version for each league or can I adapt the script to incorporate all the leagues at once in the same database?
Thanks
-JD
I'm interested in the same thing JD is
yearly minor league stats a la Retrosheet or bdb… I assume if one had the minor-league pbp data, one could calculate park factors, etc. Am I right in understanding that all this 2008 and 2009 data is going to be deleted from the servers, as listed above?
Ah, who am I kidding, I’m never going to figure this out… where the heck do forecasters get their minor league data?
I'm not a sabermetrician, but I do play one at FanGraphs.
Can't get enough of me? Check out my Twitter feed.
by Matt Klaassen on Feb 16, 2010 10:11 AM EST
My understanding of what Cory said
was that the pbp directories were going to be deleted for 2008 and 2009. The pbp directories contain duplicate information of what is in the inning directory. So no information is being lost per se, but the organization of the info is changing, and that breaks some people’s scripts or websites.
I don’t spider the minor-league data, so I’m not the right guy to ask for a tutorial on that, but I know that it is there for the spidering in a very similar fashion to the major league data, and the same scripts could easily be adapted to get the minor league data. So if you want to jump in and try it for yourself, I’m sure I and others can offer some pointers or answer questions.
Mike, will your scripts still work for major league Pitch f/x data with the new format?
by vivaelpujols on Feb 16, 2010 9:04 PM EST
Great
I’m planning on re-downloading the files, as I think I have a few duplicates in my DB caused by disruption in the original download.
by vivaelpujols on Feb 17, 2010 12:39 AM EST
The answer is
“where the heck do forecasters get their minor league data?”
Copy and paste. I’ve used different sources over the years – B-Ref, Baseball America, Baseball Cube. A long process, which is why I only do it once a year, when the season ends.
I’ll try some of these links. It would be nice to get a minor league database working, but in the past I looked at this stuff and did not have the time to make any real progress.
Thanks Mike and everyone else for sharing your code.
The HK-47 hitting droid is the finest line drive machine ever built
by RallyMonkey5 on Feb 17, 2010 11:11 AM EST
Modified my spider
Right now it’s grabbing the inning files, boxscore.xml, players.xml. It runs sooo much faster if I’m not downloading the batter and pitcher folders.
Do you guys think that will give me enough info to build a retrosheet-like pbp database? Or are there some other needed files?
Batter, pitcher, and event result seem like they should be straightforward, but getting the fielders looks like a challenge. I’m thinking of starting from the boxscore and trying to identify defensive changes.
Fielders are tough.
I know Colin was working with the guys at Baseball on a Stick to incorporate that functionality.
It might be worth looking into whether they’ve finished.
by Dan Turkenkopf on Feb 18, 2010 8:06 AM EST
Anyone have a script that puts all the innings files into a big text or csv file?
Maybe I’ll have to figure out MySQL, but I’m a lot more comfortable with Access. A season should be around 750,000 rows, I think.
The innings xml files can be opened in excel, but I only get the top inning, the home team doesn’t show up. I can fix this though, by replacing with , the end tag becomes , and then do something similar with the bottom tag.
If I do that with a script I could probably create what I need in Visual Basic. Probably not the most efficient, but I’m really good with VB, and suck at most programming languages (including Perl).
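On the question of getting the inning files into one big CSV, here is a rough sketch that turns one inning file's text into one CSV row per at-bat. It assumes each inning file has a top and a bottom element containing atbat elements, and that the atbat start tag carries attributes such as num, batter, pitcher, event and des; those names, the sub names, and the sample data are assumptions for illustration. It uses regular expressions rather than a real XML parser, so it does not decode entities and would break on an attribute value containing ">".

```perl
use strict;
use warnings;

# Pull name="value" pairs out of the attribute text of one start tag.
sub attrs {
    my ($text) = @_;
    my %attr;
    $attr{$1} = $2 while $text =~ /([\w:-]+)\s*=\s*"([^"]*)"/g;
    return \%attr;
}

# Quote one CSV field, doubling any embedded double quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Turn one inning file's XML text into CSV rows, one per at-bat.
# @cols lists the atbat attributes to export (assumed names).
sub inning_rows {
    my ($xml, @cols) = @_;
    my ($inning) = $xml =~ /<inning\b[^>]*\bnum="(\d+)"/;
    my @rows;
    for my $half (qw(top bottom)) {
        next unless $xml =~ m{<$half\b[^>]*>(.*?)</$half>}s;
        my $body = $1;
        while ($body =~ /<atbat\b([^>]*)>/g) {
            my $at = attrs($1);
            push @rows, join ',',
                map { csv_field($_) } $inning, $half, @{$at}{@cols};
        }
    }
    return @rows;
}

# Made-up sample data, for illustration only.
my $sample = <<'XML';
<inning num="1">
  <top><atbat num="1" batter="100001" pitcher="200001" event="Groundout" des="Sample grounds out."></atbat></top>
  <bottom><atbat num="2" batter="100002" pitcher="200002" event="Single" des="Sample singles."></atbat></bottom>
</inning>
XML
my @cols = qw(num batter pitcher event des);
print join(',', 'inning', 'half', @cols), "\n";
print "$_\n" for inning_rows($sample, @cols);
```

Running this over every inning file and appending the rows to one file would give a text file that Access or Excel can import directly.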
Rally
You can change the Perl script to talk to MS Access instead of MySQL. If you use my Perl database parser (http://codepaste.net/gjbeyv), you change the line
$dbh = DBI->connect("DBI:mysql:database=pbp;host=localhost", 'user', 'password');
to
$dbh = DBI->connect('dbi:ODBC:DSN', 'user', 'password');
where you’ve set up a DSN to connect to your Access database. I believe the rest of the parser script would stay the same.
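A small sketch of that switch, assuming the only difference between the two setups is the data-source string passed to DBI->connect. The sub names dsn_for and connect_db and the DSN name pbp_access are hypothetical; DBI and the matching driver (DBD::mysql or DBD::ODBC) have to be installed before connect_db can actually be called.

```perl
use strict;
use warnings;

# Build the DBI data-source string for either backend.
# For 'mysql', $name is the database name; for 'access', $name is the ODBC
# DSN configured on the machine (hypothetical name in the usage line).
sub dsn_for {
    my ($backend, $name) = @_;
    return "DBI:mysql:database=$name;host=localhost" if $backend eq 'mysql';
    return "dbi:ODBC:$name"                          if $backend eq 'access';
    die "unknown backend '$backend'\n";
}

# Connect, raising an exception on any database error.
# DBI is loaded only when a connection is actually requested.
sub connect_db {
    my ($backend, $name, $user, $password) = @_;
    require DBI;
    return DBI->connect(dsn_for($backend, $name), $user, $password,
        { RaiseError => 1 });
}

print dsn_for('mysql',  'pbp'),        "\n";
print dsn_for('access', 'pbp_access'), "\n";
```

With this shape the rest of a parser script only ever sees the handle returned by connect_db, although any MySQL-specific SQL in the parser might still need adjusting for Access.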
Thanks Mike
I’ll try that this weekend. In my last post my tag examples didn’t show up.
I’ll try to explain in English. There is a top and a bottom tag in the XML. I change the top tag to something like halfinn batteam="top", with the appropriate angle brackets, and do something similar with the bottom tag. This is just in case anyone wanted to open one of the inning XML documents as an Excel table for a quick look and didn’t need to populate a database.
by RallyMonkey5 on Feb 19, 2010 8:54 AM EST
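The tag rename described in the comment above could be scripted roughly like this. It is a plain text substitution rather than a real XML transform, and it assumes the top and bottom tags carry no attributes of their own; the sub name merge_halves is made up.

```perl
use strict;
use warnings;

# Rename <top> and <bottom> to <halfinn batteam="top"> and
# <halfinn batteam="bottom"> so both halves share one element name.
sub merge_halves {
    my ($xml) = @_;
    $xml =~ s{<(top|bottom)\s*>}{<halfinn batteam="$1">}g;    # opening tags
    $xml =~ s{</(?:top|bottom)\s*>}{</halfinn>}g;             # closing tags
    return $xml;
}

# Minimal made-up inning, for illustration only.
my $xml = '<inning num="1"><top><atbat/></top><bottom><atbat/></bottom></inning>';
print merge_halves($xml), "\n";
```

Reading each inning file, passing its text through merge_halves and writing the result to a new file would leave copies that show both halves when opened as a table.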
