Navigation: Jump to content areas:


Pro Quality. Fan Perspective.
Login-facebook
Around SBN: NFL Players Ready To Welcome Gay Teammate

Advanced Graphing Techniques Part 2 - Installing R, Linking it to Pitch F/X Database and Plotting an Graph

Update 2/17/10 - Driver needed for PC and order of commands changed

 

There seems to be no more questions on installing the Pitch F/X database, so I will continue on with installing and plotting in R.  I am pretty much a beginner with R, so I will be struggling with you from now on.  If anyone has any tricks or advice, please chime in.

 

1.  Follow Part 1 -  Advanced Graphing Techniques Part 1 - Installing a Pitch FX Database on a Computer

 

2.  Goto http://cran.r-project.org/ and download the version for your computer.  For Windows I selected the base version.  I will be following the Windows instructions from now on.  Others can chime in with the Apple instructions.  Once the other instructions are verified, I will add them to the article.

 

3. Install R following the instructions.  The only variation I took was that I installed all instruction manuals on my machine.

 

4. Start R and go to Packages along top tool bar.  Select a location near you under the CRAN Mirror

 

5. Go back to Packages and select Install Packages.  You want to add:

 

RODBC

 

Once it is added close R

Star-divide

5.5. Install ODBC Driver for MySQL at:

http://dev.mysql.com/downloads/connector/odbc/5.1.html

6.  Open Microsoft Windows Control Panel and then double-click the Administrative Tools icon. Double-click on the Data Sources (ODBC) icon.  Add a new User DSN called pfx with a MySQL ODBC 5.1 Diver.

 

7.  Once a new User DSN is created, select it and press Configure

 

Set the following values (I am not 100% sure these are the same for all Window machines)

Server: localhost

port: 3306

user: root

Database: php2

 

These are the same as the ones on the SQLyog login and the Pitch F/X dataset installed

 

Rinstall3_medium

 

These setting let R know what database to use.  Here is an image of the previous 2 panels discussed:

 

Rinstall_medium

 

8.  Open R again

 

In the R console run:

 

library(RODBC)

channel <- odbcConnect('pfx')

odbcGetInfo(channel)

sqlTables(channel)

 

All 4 lines can be copy and paste at once.  These will connect R to database and let you know that they are connect.  No errors should be seen and the Pitch FX tables should be listed.

 

9.  Running a query.  First you need to tell R that you are plotting a query. 

 

Here is the query for Joe Mauer's home runs.  I was having some problems, so I added "as x" and "as y" to describe the px and pz coordinates. Note, the entire query needs to be on a single line.

 

query <- sqlQuery(channel, "select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.batter where pl.last = 'Mauer' and pl.first = 'Joe' and g.date like '2009%' and a.event = 'Home Run' and p.type = 'X';")

 

Also the extents of data can be set at this time with the xlim and ylim commands


plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))

 

Here is the command that will draw a red strike zone at the defined points:

 

lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c("red"))

 

Here is what the output should look like:

 

Test_medium

8.  Saving graphic.  Select the R - Graphic Window, go to the top toolbar and select File, Save as and then the image type you want.  The above image is from a saved file.

 

That is it for now.  I am sure there will be more questions this time so fire away and let us know if you find out any ideas to make process smoother or better.

 

Comment 52 comments  |  1 recs  | 

Do you like this story?

Comments

Display:

Nit pick

Headline should read “Installing an R”.
/copyeditingnerdom

If I hit a hole-in-one on this grand slam the dominoes will fall like a house of cards. Checkmate.

by jasonkylebates on Feb 17, 2010 12:16 PM EST reply actions  

Excellent

It would be nice to link up some R tutorial sites.

rzar.wordpress.com
draysbay.com
raysprospects.com

by RZ on Feb 17, 2010 1:19 PM EST reply actions  

well, you really should know R already

..but you could do worse than browsing through http://cran.r-project.org/doc/contrib/usingR.pdf (warning: long). You could also type ‘learn R’ into Google.

A nice shortcut once you install R is ‘?’ and ‘??’ for ‘help’ and ‘search help for’. For example, ‘?paste’ will point you at the help file for a text function you might find useful building SQL.

by wcw on Feb 17, 2010 2:02 PM EST up reply actions  

You don't need to know R, that is why I am doing the series.

There is no need to know all of it, just the parts you want to use.

This series started out for people to create heat maps.

It’s not an easy subject, so we are helping people along.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 2:15 PM EST up reply actions  

Hmmm

I’m on a mac, and I don’t know what to do for steps 6 and 7. Anyone else?

by vivaelpujols on Feb 17, 2010 2:33 PM EST reply actions  

not sure yet

i don’t really know anything about odbc and R, but as far as i can tell we might try this odbc admin tool from apple http://support.apple.com/downloads/ODBC_Administrator_Tool_for_Mac_OS_X, maybe, and an odbc driver for os x, i think, but i am gonna research more first

by chiefjamesp on Feb 17, 2010 3:29 PM EST up reply actions  

It's not really neccesary for me to have to link R to SQL

As I can just copy and past the results from my query into a text file pretty easily; however, it would nice to be able to do.

by vivaelpujols on Feb 17, 2010 5:17 PM EST up reply actions  

that sounds easier!

thanks for pointing that out, i am gonna start with that move…i installed the mysql driver and odbc admin tool but get it all working got hectic

by chiefjamesp on Feb 17, 2010 5:33 PM EST up reply actions  

There are other ways to connect R to MySQL than using ODBC

such as RMySQL. However, using one of them will change the R syntax for all the SQL queries from what Jeff is doing in the tutorial.

Otherwise, chiefjamesp’s suggestion seems like a reasonable course.

by Mike Fast on Feb 17, 2010 3:39 PM EST up reply actions  

I tried the RMySQL route with no luck

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 3:55 PM EST up reply actions  

another mac odbc manager?

this seems like another option http://www.odbcmanager.net/, the only tricky part seems like maybe a free driver…the mysql connector is for 10.5, and i have read of a few problems with 10.6

by chiefjamesp on Feb 17, 2010 4:03 PM EST up reply actions  

I'm having trouble with step 6

I’m adding a user, using “SQL Server” as my data source, naming it pfx, but at the bottom of where I name it, there isn’t a SQL Sever to connect to.

My SQLyog has been installed for months and I use it all the time with the pbp2 database.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 4:03 PM EST reply actions  

Try downloading the Driver from here:

http://dev.mysql.com/downloads/connector/odbc/5.1.html

I installed R about 6 months ago and uninstalled and installed it to do the write up. I may have added the drive then and not remembered.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 4:18 PM EST up reply actions  

That worked!

But…. (I know I’m such a pain but this isn’t my forte) I don’t know what the rest of the information is.

Currently I am putting:
Data Source Name: pbp2
Description: pitch fx
Server: localhost
port: 3306
user: root
password:
and there isnt an option for database.

When I test I get “Connection Failed: [HY000] [MySQL][ODBC 5.1] Access denied for user ‘root’ @ localhost” (using password: YES)

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 4:49 PM EST up reply actions  

nevermind

It is because I put a password in. I don’t have password for MySQL….

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 4:51 PM EST up reply actions  

thanks for the driver

got that driver installed, thank you! but i can’t seem to figure out how to configure odbc admin tool in os x 10.6
when configure the user dsn it just seems i can specify a file path not the server settings

by chiefjamesp on Feb 17, 2010 5:04 PM EST up reply actions  

Ok next snag...

You mention,

No errors should be seen and the Pitch FX tables should be listed.

I didn’t get any errors from R, bu the only table I got listed was at_bats. In the form of:

TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS
1 pbp2 atbats TABLE Play-by-play data

Then I used your next script to plot the query,

plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))

But I got a message saying they couldn’t find the query,


“Error in plot(query, xlim = c(-1.5, 1.5), ylim = c(0, 5)) :
  object ‘query’ not found”

Side note, I’ve been looking other data, and recently I’ve only had 13 entries pop up even when more should be. Any ideas?

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 5:14 PM EST reply actions  

Sorry, do it in this order:

 query <- sqlQuery(channel, “select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.batter where pl.last = ‘Mauer’ and pl.first = ‘Joe’ and g.date like ‘2009%’ and a.event = ‘Home Run’ and p.type = ‘X’;”)

plot(query,xlim=c(-1.5,1.5), ylim=c(0,5))

lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))
 

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 7:15 PM EST up reply actions  

Thanks.

I got an error so I changed over the diagonal ’ above into the regular ’ and then pasted the code below and got the same error also below:


sqlQuery(channel, "SELECT p.px AS X, p.pz AS Y FROM atbats a JOIN pitches p ON p.ab_id = a.ab_id JOIN games g ON g.game_id = a.game_id JOIN players pl ON pl.eliasid = a.batter WHERE pl.last = ‘Mauer’ AND pl.first = ‘Joe’ AND g.date LIKE ‘2009%’ AND a.event = ‘Home Run’ AND p.type = ‘X’;")

Error:

unexpected input in “sqlQuery(channel, "”

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 8:11 PM EST up reply actions  

Nevermind again

It was a spacing issue. The one you responded with has different spacing than the one in the main post.

Jeff, thanks SOOOO much for this. You’re a huge help. I look forward to future posts on heat maping!

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 8:15 PM EST up reply actions  

So are you plotting the points in a useable output?

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 8:23 PM EST up reply actions  

yup 100% useable

I’m now trying to plot lines (but it is more like playing). I’m attempting to make each quadrant a different color.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 8:53 PM EST up reply actions  

But I should add

drawing lines doesn’t appear to be as simple as, X, Y, X, Y, unless I’m missing something. :)

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 8:59 PM EST up reply actions  

Help for anyone else

lines(c(X1,X2,X3,X4,X5),c(Y1,Y2,Y3,Y4,Y5), col=c(“red”))

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 9:20 PM EST up reply actions  

4 quads

> lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))
> lines(c(1,1,0,0,1),c(1.5,2.5,2.5,1.5,1.5), col=c(“green”))
> lines(c(1,1,0,0,1),c(2.5,3.5,3.5,2.5,2.5), col=c(“yellow”))
> lines(c(-1,-1,0,0,-1),c(1.5,2.5,2.5,1.5,1.5), col=c(“blue”))

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 9:23 PM EST up reply actions  

strike zone sliced up further

plot(query,xlim=c(-1.5,1.5), ylim=c(1,4))
lines(c(1,1,0,0,1),c(1.5,2.5,2.5,1.5,1.5), col=c(“green”))
lines(c(1,1,0,0,1),c(2.5,3.5,3.5,2.5,2.5), col=c(“yellow”))
lines(c(-1,-1,0,0,-1),c(1.5,2.5,2.5,1.5,1.5), col=c(“blue”))
lines(c(-1,-1,0,0,-1),c(2.5,3.5,3.5,2.5,2.5), col=c(“red”))
lines(c(-0.5,-0.5,0.5,0.5,-0.5),c(2,3,3,2,2), col=c(“black”))
lines(c(0,-1,),c(2.5,3.5), col=c(“red”))
lines(c(0,-1),c(2.5,3.5), col=c(“red”))
lines(c(0,-1),c(2.5,1.5), col=c(“blue”))
lines(c(0,1),c(2.5,3.5), col=c(“yellow”))
lines(c(0,1),c(2.5,1.5), col=c(“green”))
lines(c(-0.25,-0.25,0.25,0.25,-0.25),c(2.25,2.75,2.75,2.25,2.25), col=c(“black”))

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 10:12 PM EST up reply actions  

X and Y axis question

Jeff, I think the X and Y axis may be mislabeled (even though they don’t appear to be mislabeled according to the formula). I charted Brandon Webb’s 2008 grounds on fastballs and he only has 9 below the zone (below 1.5Y), and has probably 50 from -1.0X to -1.5X.

Just an observation.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 17, 2010 10:36 PM EST reply actions  

What query are you using and I will check?

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 11:21 PM EST up reply actions  

Ok

query <- sqlQuery(channel, “select p.px as x, p.pz as y from atbats a join pitches p on p.ab_id = a.ab_id join games g on g.game_id = a.game_id join players pl on pl.eliasid = a.pitcher where pl.last = ‘Webb’ and pl.first = ‘Brandon’ and a.event = ‘Ground out’ and p.type = ‘X’;”)

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 18, 2010 11:52 AM EST up reply actions  

Seems about right

The above equation is for all years though

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 18, 2010 12:13 PM EST up reply actions  

Yeah it is

The graphed results just looked off to me.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 18, 2010 12:15 PM EST up reply actions  

plot for camcrazy

Type:

plot(query,xlim=c(-1.5,1.5), ylim=c(1,4))

then

lines(c(1,1,-1,-1,1),c(1.5,3.5,3.5,1.5,1.5), col=c(“red”))

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 18, 2010 7:30 PM EST up reply actions  

Thanks

I got it working otherwise, but only after getting all three lines of code (the query and those two) together in a text editor and then pasting them all into R at once. Everything seems to be working fine now – thank you for the help.

by DanielMoroz on Feb 18, 2010 7:57 PM EST up reply actions  

Second thanks

To Jeff for putting this together. Very much appreciated.

by DanielMoroz on Feb 18, 2010 7:59 PM EST up reply actions  

No problem, I am probably going to need help myself on the heat maps

I haven’t gotten them working yet.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 18, 2010 10:39 PM EST up reply actions  

my next speed bump

I know I’m getting annoying, but I hope that when someone looks back on these questions they have the answer.

I closed R last night and saved my “workspace image” (though that appears to do nothing).

I trying running a query to see how it was working and I got this error,

Error: could not find function “sqlQuery”

So I tried to run what I’m calling the “start up commands”

library(RODBC)

channel <- odbcConnect(‘pfx’)

odbcGetInfo(channel)

sqlTables(channel)

I got the follow error for each,

library(RODBC)
Error in library(RODBC) : there is no package called ‘RODBC’
channel <- odbcConnect(‘pfx’)
Error: could not find function “odbcConnect”
odbcGetInfo(channel)
Error: could not find function “odbcGetInfo”
sqlTables(channel)
Error: could not find function “sqlTables”

I’ve loaded the ROBDC before and created between 8-10 graphs (last night). I’m not sure why it isn’t working now.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 18, 2010 12:18 PM EST reply actions  

Ok, I made my own mess

Don’t move the “R” folder in your documents sections. I have a generic baseball folder with several folder within that that I moved mine to, and R couldn’t find it.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 18, 2010 12:35 PM EST up reply actions  

I am probably not going to have time to get to the next part until Monday.

So I will probably have it ready in the early to middle part of next week.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 19, 2010 8:51 AM EST up reply actions  

i am with ya

i took your advice and found it easier to export tables from sequel pro to use in R, so thanks for the tip on not ripping my hair out trying to link the two

by chiefjamesp on Feb 23, 2010 11:16 PM EST up reply actions  

I've caught up! Woo!

I have had absolutely no experience with SQL (although I have a little with R) but I’ve (finally) managed to get through both of these tutorials after (quite) a few hitches. Consider this a testimonial to your ability to teach to the lowest common denominator. Many thanks for that.

I’m now trying to apply my new-found SQL knowledge. I’ve found quite a few articles to work on including one on applying it to Retrosheet, a skill I’ve often considered learning but never really got around to. I also need to read up on Pitchfx because while I know the basics, I’ve kind of avoided it in the past as too complicated.

However this process has left me with a few (basic) questions:
1. Is there a free program that makes it easy to unzip files? I downloaded the .gz file for pbp2 and I couldn’t get it to unzip into the sql file. I eventually downloaded a free trial* for
WinZip, but I figure if I can find something free that would be much better.

*I had a similar experience with mySQL at first. The site linked by Colin’s Databases for Sabermetrician’s article only had free trials. I eventually googled and found the community version.

2. I’m assuming the statspeak site doesn’t exist any more. I can’t seem to find an article by Colin about Building A Sabermetrician’s Workbench that I recalled skimming at the time. Is there a copy of it somewhere in Internetland?

3. Are there any other baseball SQL for beginner articles you would recommend? I just finished Sky’s handy one.

4. I couldn’t get the Batting Marcels from Colin’s Database article to work. I’m assuming I can’t copy-paste it for some reason. Any suggestions?

"Loyal? I'm the most loyal player money can buy." - Don Sutton

by vignette17 on Feb 19, 2010 3:02 PM EST reply actions  

A couple answers

1. Try 7-Zip. It’s free and will handle gzipped files.
http://www.7-zip.org/

2. Right. MVN.com closed up shop, and the StatSpeak archives went into the aether with it. I have on my list to re-publish my own articles from their at some point, but you can always check the Internet Wayback Machine, too. I’ve had pretty good luck with finding old StatSpeak articles there up through mid 2008. More recent stuff (which includes Colin’s tenure there) doesn’t seem to show up on the Wayback archives.

In addition to the article you linked from Colin, there’s this one at THT:
http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/

by Mike Fast on Feb 19, 2010 3:33 PM EST up reply actions  

1. I have winzip from when it was free and use it. Not sure if there are other programs other use.

2. What Mike says.

3. Besides Colin’s work, not much has been done. A discussion with Viva and I brought on this series. If you want to see more, make sure you say exactly what you want. There seemed to be quite a bit of interest on heat maps, so that will be the eventual focus.

Also the book Baseball Hacks isn’t bad, isn’t great. I found a used one for $5 online.

4. Never tried it.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 19, 2010 3:39 PM EST up reply actions  

Yeah

I don’t really know of a good tutorial, but if you have any questions about how to get data, which functions are good, etc., just ask.

The batting marcel’s have worked for me. I don’t know why they wouldn’t work for you.

by vivaelpujols on Feb 19, 2010 6:17 PM EST up reply actions  

Viewpoint

Thanks for doing this, its been very useful.

Bit of a dumb question, but I wanted to confirm that the graph produced at the end of the process is from the umpires perspective not the pitchers ?

by Gwyn on Feb 23, 2010 1:03 PM EST reply actions  

It is from the pitcher's perspective

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 23, 2010 3:36 PM EST up reply actions  

I was going from this definition from Mike Fast’s site:
px: the left/right distance, in feet, of the pitch from the middle of the plate as it crossed home plate. The PITCHf/x coordinate system is oriented to the catcher’s/umpire’s perspective, with distances to the right being positive and to the left being negative.
Which implies a positive px value is to the right of the centre of home plate from the “catchers/umpires”perspective.

by Gwyn on Feb 24, 2010 9:12 AM EST up reply actions  

It is from the pitcher's perspective

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 23, 2010 3:36 PM EST reply actions  

Reply Fail

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 23, 2010 3:36 PM EST up reply actions  

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?

Follow us on Facebook!

Follow us on Twitter!

SaberGraphics

Yahoo_full_count

MLB Daily Dish

Get the latest MLB Trade Rumors, Transactions, and News at MLB Daily Dish!


Managing Editor:

Jbopp-kc_small Justin Bopp

Columnists:

Adam_small adarowski

Dme_small Satchel Price

Closeup4_small J-Doug

Carlosicon_small Julian Levine

Billy_and_daddy_4th_of_july_small Bill Petti

Featuring:

Dayton_small Jeff Zimmerman

12475953_small Jacob Peterson

Recent_pic_pg_small Patrick Gordon

Btbpro_small Dave Gershman

Me_small Bryan Grosnick

229331_10150183361996591_674441590_6760167_6637860_n3_small Lewie Pollis

Img_3830_small David Fung

30472_1481067225243_1190689185_1381415_997334_n_small Glenn DuPaul

1mnvxku7_small joshuaworn

Set_small MattFilippi18

Photo0011_small Nathaniel Stoltz