Advanced Graphing Techniques Part 2 - Installing R, Linking it to Pitch F/X Database and Plotting an Graph
Update 2/17/10 - Driver needed for PC and order of commands changed
There seems to be no more questions on installing the Pitch F/X database, so I will continue on with installing and plotting in R. I am pretty much a beginner with R, so I will be struggling with you from now on. If anyone has any tricks or advice, please chime in.
1. Follow Part 1 - Advanced Graphing Techniques Part 1 - Installing a Pitch FX Database on a Computer
2. Goto http://cran.r-project.org/ and download the version for your computer. For Windows I selected the base version. I will be following the Windows instructions from now on. Others can chime in with the Apple instructions. Once the other instructions are verified, I will add them to the article.
3. Install R following the instructions. The only variation I took was that I installed all instruction manuals on my machine.
4. Start R and go to Packages along top tool bar. Select a location near you under the CRAN Mirror
5. Go back to Packages and select Install Packages. You want to add:
RODBC
Once it is added close R
5.5. Install ODBC Driver for MySQL at:
http://dev.mysql.com/downloads/connector/odbc/5.1.html
6. Open Microsoft Windows Control Panel and then double-click the Administrative Tools icon. Double-click on the Data Sources (ODBC) icon. Add a new User DSN called pfx with a MySQL ODBC 5.1 Diver.
7. Once a new User DSN is created, select it and press Configure
Set the following values (I am not 100% sure these are the same for all Window machines)
Server: localhost
port: 3306
user: root
Database: php2
These are the same as the ones on the SQLyog login and the Pitch F/X dataset installed
These setting let R know what database to use. Here is an image of the previous 2 panels discussed:
8. Open R again
In the R console run:
library(RODBC)
channel <- odbcConnect('pfx')
odbcGetInfo(channel)
sqlTables(channel)
All 4 lines can be copy and paste at once. These will connect R to database and let you know that they are connect. No errors should be seen and the Pitch FX tables should be listed.
9. Running a query. First you need to tell R that you are plotting a query.
Here is the query for Joe Mauer's home runs. I was having some problems, so I added "as x" and "as y" to describe the px and pz coordinates. Note, the entire query needs to be on a single line.
query <- sqlQuery(channel, "select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.batter where pl.last = 'Mauer' and pl.first = 'Joe' and g.date like '2009%' and a.event = 'Home Run' and p.type = 'X';")
Also the extents of data can be set at this time with the xlim and ylim commands
plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))
Here is the command that will draw a red strike zone at the defined points:
lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c("red"))
Here is what the output should look like:
8. Saving graphic. Select the R - Graphic Window, go to the top toolbar and select File, Save as and then the image type you want. The above image is from a saved file.
That is it for now. I am sure there will be more questions this time so fire away and let us know if you find out any ideas to make process smoother or better.
52 comments
|
1 recs |
Do you like this story?
Comments
Nit pick
Headline should read “Installing an R”.
/copyeditingnerdom
If I hit a hole-in-one on this grand slam the dominoes will fall like a house of cards. Checkmate.
by jasonkylebates on Feb 17, 2010 12:16 PM EST reply actions
Excellent
It would be nice to link up some R tutorial sites.
rzar.wordpress.com
draysbay.com
raysprospects.com
well, you really should know R already
..but you could do worse than browsing through http://cran.r-project.org/doc/contrib/usingR.pdf (warning: long). You could also type ‘learn R’ into Google.
A nice shortcut once you install R is ‘?’ and ‘??’ for ‘help’ and ‘search help for’. For example, ‘?paste’ will point you at the help file for a text function you might find useful building SQL.
You don't need to know R, that is why I am doing the series.
There is no need to know all of it, just the parts you want to use.
This series started out for people to create heat maps.
It’s not an easy subject, so we are helping people along.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 2:15 PM EST up reply actions
not sure yet
i don’t really know anything about odbc and R, but as far as i can tell we might try this odbc admin tool from apple http://support.apple.com/downloads/ODBC_Administrator_Tool_for_Mac_OS_X, maybe, and an odbc driver for os x, i think, but i am gonna research more first
It's not really neccesary for me to have to link R to SQL
As I can just copy and past the results from my query into a text file pretty easily; however, it would nice to be able to do.
by vivaelpujols on Feb 17, 2010 5:17 PM EST up reply actions
that sounds easier!
thanks for pointing that out, i am gonna start with that move…i installed the mysql driver and odbc admin tool but get it all working got hectic
There are other ways to connect R to MySQL than using ODBC
such as RMySQL. However, using one of them will change the R syntax for all the SQL queries from what Jeff is doing in the tutorial.
Otherwise, chiefjamesp’s suggestion seems like a reasonable course.
I tried the RMySQL route with no luck
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 3:55 PM EST up reply actions
another mac odbc manager?
this seems like another option http://www.odbcmanager.net/, the only tricky part seems like maybe a free driver…the mysql connector is for 10.5, and i have read of a few problems with 10.6
I'm having trouble with step 6
I’m adding a user, using “SQL Server” as my data source, naming it pfx, but at the bottom of where I name it, there isn’t a SQL Sever to connect to.
My SQLyog has been installed for months and I use it all the time with the pbp2 database.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Try downloading the Driver from here:
http://dev.mysql.com/downloads/connector/odbc/5.1.html
I installed R about 6 months ago and uninstalled and installed it to do the write up. I may have added the drive then and not remembered.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 4:18 PM EST up reply actions
That worked!
But…. (I know I’m such a pain but this isn’t my forte) I don’t know what the rest of the information is.
Currently I am putting:
Data Source Name: pbp2
Description: pitch fx
Server: localhost
port: 3306
user: root
password:
and there isnt an option for database.
When I test I get “Connection Failed: [HY000] [MySQL][ODBC 5.1] Access denied for user ‘root’ @ localhost” (using password: YES)
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
nevermind
It is because I put a password in. I don’t have password for MySQL….
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
thanks for the driver
got that driver installed, thank you! but i can’t seem to figure out how to configure odbc admin tool in os x 10.6
when configure the user dsn it just seems i can specify a file path not the server settings
Ok next snag...
You mention,
No errors should be seen and the Pitch FX tables should be listed.
I didn’t get any errors from R, bu the only table I got listed was at_bats. In the form of:
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS
1 pbp2 atbats TABLE Play-by-play data
Then I used your next script to plot the query,
plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))
But I got a message saying they couldn’t find the query,
“Error in plot(query, xlim = c(-1.5, 1.5), ylim = c(0, 5)) :
object ‘query’ not found”
Side note, I’ve been looking other data, and recently I’ve only had 13 entries pop up even when more should be. Any ideas?
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Sorry, do it in this order:
query <- sqlQuery(channel, “select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.batter where pl.last = ‘Mauer’ and pl.first = ‘Joe’ and g.date like ‘2009%’ and a.event = ‘Home Run’ and p.type = ‘X’;”)
plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))
lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 7:15 PM EST up reply actions
Thanks.
I got an error so I changed over the diagonal ’ above into the regular ’ and then pasted the code below and got the same error also below:
sqlQuery(channel, "SELECT p.px AS X, p.pz AS Y FROM atbats a JOIN pitches p ON p.ab_id = a.ab_id JOIN games g ON g.game_id = a.game_id JOIN players pl ON pl.eliasid = a.batter WHERE pl.last = ‘Mauer’ AND pl.first = ‘Joe’ AND g.date LIKE ‘2009%’ AND a.event = ‘Home Run’ AND p.type = ‘X’;")
Error:
unexpected input in “sqlQuery(channel, "”
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Nevermind again
It was a spacing issue. The one you responded with has different spacing than the one in the main post.
Jeff, thanks SOOOO much for this. You’re a huge help. I look forward to future posts on heat maping!
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
So are you plotting the points in a useable output?
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 8:23 PM EST up reply actions
yup 100% useable
I’m now trying to plot lines (but it is more like playing). I’m attempting to make each quadrant a different color.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
But I should add
drawing lines doesn’t appear to be as simple as, X, Y, X, Y, unless I’m missing something. :)
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Help for anyone else
lines(c(X1,X2,X3,X4,X5),c(Y1,Y2,Y3,Y4,Y5), col=c(“red”))
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
4 quads
> lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))
> lines(c(1,1,0,0,1),c(1.5,2.5,2.5,1.5,1.5), col=c(“green”))
> lines(c(1,1,0,0,1),c(2.5,3.5,3.5,2.5,2.5), col=c(“yellow”))
> lines(c(-1,-1,0,0,-1),c(1.5,2.5,2.5,1.5,1.5), col=c(“blue”))
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
strike zone sliced up further
plot(query,xlim=c(-1.5,1.5), ylim=c(1,4))
lines(c(1,1,0,0,1),c(1.5,2.5,2.5,1.5,1.5), col=c(“green”))
lines(c(1,1,0,0,1),c(2.5,3.5,3.5,2.5,2.5), col=c(“yellow”))
lines(c(-1,-1,0,0,-1),c(1.5,2.5,2.5,1.5,1.5), col=c(“blue”))
lines(c(-1,-1,0,0,-1),c(2.5,3.5,3.5,2.5,2.5), col=c(“red”))
lines(c(-0.5,-0.5,0.5,0.5,-0.5),c(2,3,3,2,2), col=c(“black”))
lines(c(0,-1,),c(2.5,3.5), col=c(“red”))
lines(c(0,-1),c(2.5,3.5), col=c(“red”))
lines(c(0,-1),c(2.5,1.5), col=c(“blue”))
lines(c(0,1),c(2.5,3.5), col=c(“yellow”))
lines(c(0,1),c(2.5,1.5), col=c(“green”))
lines(c(-0.25,-0.25,0.25,0.25,-0.25),c(2.25,2.75,2.75,2.25,2.25), col=c(“black”))
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
X and Y axis question
Jeff, I think the X and Y axis may be mislabeled (even though they don’t appear to be mislabeled according to the formula). I charted Brandon Webb’s 2008 grounds on fastballs and he only has 9 below the zone (below 1.5Y), and has probably 50 from -1.0X to -1.5X.
Just an observation.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
What query are you using and I will check?
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 17, 2010 11:21 PM EST up reply actions
Ok
query <- sqlQuery(channel, “select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.pitcher where pl.last = ‘Webb’ and pl.first = ‘Brandon’ and a.event = ‘Ground out’ and p.type = ‘X’;”)
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Seems about right
The above equation is for all years though
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 18, 2010 12:13 PM EST up reply actions
Yeah it is
The graphed results just looked off to me.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
plot for camcrazy
Type:
plot(query,xlim=c(-1.5,1.5), ylim=c(1,4))
then
lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Thanks
I got it working otherwise, but only after getting all three lines of code (the query and those two) together in a text editor and then pasting them all into R at once. Everything seems to be working fine now – thank you for the help.
No problem, I am probably going to need help myself on the heat maps
I haven’t gotten them working yet.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 18, 2010 10:39 PM EST up reply actions
my next speed bump
I know I’m getting annoying, but I hope that when someone looks back on these questions they have the answer.
I closed R last night and saved my “workspace image” (though that appears to do nothing).
I trying running a query to see how it was working and I got this error,
Error: could not find function “sqlQuery”
So I tried to run what I’m calling the “start up commands”
library(RODBC)
channel <- odbcConnect(‘pfx’)
odbcGetInfo(channel)
sqlTables(channel)
I got the follow error for each,
library(RODBC)
Error in library(RODBC) : there is no package called ‘RODBC’
channel <- odbcConnect(‘pfx’)
Error: could not find function “odbcConnect”
odbcGetInfo(channel)
Error: could not find function “odbcGetInfo”
sqlTables(channel)
Error: could not find function “sqlTables”
I’ve loaded the ROBDC before and created between 8-10 graphs (last night). I’m not sure why it isn’t working now.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
Ok, I made my own mess
Don’t move the “R” folder in your documents sections. I have a generic baseball folder with several folder within that that I moved mine to, and R couldn’t find it.
Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...
I'm just not going to bother linking R to SQL, I can't really imagine when I would really need it
I’ve already created the graph, so I’m ready for the heat maps!
I am probably not going to have time to get to the next part until Monday.
So I will probably have it ready in the early to middle part of next week.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 19, 2010 8:51 AM EST up reply actions
i am with ya
i took your advice and found it easier to export tables from sequel pro to use in R, so thanks for the tip on not ripping my hair out trying to link the two
by chiefjamesp on Feb 23, 2010 11:16 PM EST up reply actions
I've caught up! Woo!
I have had absolutely no experience with SQL (although I have a little with R) but I’ve (finally) managed to get through both of these tutorials after (quite) a few hitches. Consider this a testimonial to your ability to teach to the lowest common denominator. Many thanks for that.
I’m now trying to apply my new-found SQL knowledge. I’ve found quite a few articles to work on including one on applying it to Retrosheet, a skill I’ve often considered learning but never really got around to. I also need to read up on Pitchfx because while I know the basics, I’ve kind of avoided it in the past as too complicated.
However this process has left me with a few (basic) questions:
1. Is there a free program that makes it easy to unzip files? I downloaded the .gz file for pbp2 and I couldn’t get it to unzip into the sql file. I eventually downloaded a free trial* for
WinZip, but I figure if I can find something free that would be much better.
*I had a similar experience with mySQL at first. The site linked by Colin’s Databases for Sabermetrician’s article only had free trials. I eventually googled and found the community version.
2. I’m assuming the statspeak site doesn’t exist any more. I can’t seem to find an article by Colin about Building A Sabermetrician’s Workbench that I recalled skimming at the time. Is there a copy of it somewhere in Internetland?
3. Are there any other baseball SQL for beginner articles you would recommend? I just finished Sky’s handy one.
4. I couldn’t get the Batting Marcels from Colin’s Database article to work. I’m assuming I can’t copy-paste it for some reason. Any suggestions?
"Loyal? I'm the most loyal player money can buy." - Don Sutton
A couple answers
1. Try 7-Zip. It’s free and will handle gzipped files.
http://www.7-zip.org/
2. Right. MVN.com closed up shop, and the StatSpeak archives went into the aether with it. I have on my list to re-publish my own articles from their at some point, but you can always check the Internet Wayback Machine, too. I’ve had pretty good luck with finding old StatSpeak articles there up through mid 2008. More recent stuff (which includes Colin’s tenure there) doesn’t seem to show up on the Wayback archives.
In addition to the article you linked from Colin, there’s this one at THT:
http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/
1. I have winzip from when it was free and use it. Not sure if there are other programs other use.
2. What Mike says.
3. Besides Colin’s work, not much has been done. A discussion with Viva and I brought on this series. If you want to see more, make sure you say exactly what you want. There seemed to be quite a bit of interest on heat maps, so that will be the eventual focus.
Also the book Baseball Hacks isn’t bad, isn’t great. I found a used one for $5 online.
4. Never tried it.
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 19, 2010 3:39 PM EST up reply actions
Yeah
I don’t really know of a good tutorial, but if you have any questions about how to get data, which functions are good, etc., just ask.
The batting marcel’s have worked for me. I don’t know why they wouldn’t work for you.
by vivaelpujols on Feb 19, 2010 6:17 PM EST up reply actions
Viewpoint
Thanks for doing this, its been very useful.
Bit of a dumb question, but I wanted to confirm that the graph produced at the end of the process is from the umpires perspective not the pitchers ?
It is from the pitcher's perspective
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 23, 2010 3:36 PM EST up reply actions
I was going from this definition from Mike Fast’s site:
px: the left/right distance, in feet, of the pitch from the middle of the plate as it crossed home plate. The PITCHf/x coordinate system is oriented to the catcher’s/umpire’s perspective, with distances to the right being positive and to the left being negative.
Which implies a positive px value is to the right of the centre of home plate from the “catchers/umpires”perspective.
Reply Fail
- .-. ..- … – / – …. . / .—. .-. - .. . … …
by Jeff Zimmerman on Feb 23, 2010 3:36 PM EST up reply actions

by 






























