Navigation: Jump to content areas:


Pro Quality. Fan Perspective.
Login-facebook
Around SBN: Yankees Deny Rumors That Team Is For Sale

Advanced Graphing Techniques Part 1 - Installing a Pitch FX Database on a Computer

There was discussion in another FanPost on showing people how to do some more advanced graphing techniques.  I have wanted to create some heat maps myself, so I will start a series on how to do more advanced graphing and calculations.

I will step through the following 3 Parts for now and then go into other subjects.

  1. Setting up a pitch FX dataset and making a query.
  2. Installing R, linking it to the dataset and plotting some points
  3. Creating a heatmap using R

I will keep a general theme going and we can see how using advanced features gives us better results.  We will follow how and where Joe Mauer hits home runs off right handed pitchers in 2009.  Once everyone following the process is done, we will move to the next step.  I am running on a PC, so there maybe different steps for those with Macs.  Vivaelpujols will be following the series and I am pretty sure he has a Mac to help along.

Steps start after the jump

Star-divide

Steps:

  • Goto:  http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/ and complete everything until you get to the "Putting the "Data" in the Database" section.
  • We will use most of the steps in the  "Putting the "Data" in the Database" but instead of downloading the Databank data, download a Pitch F/X database found here.   This dataset is quite large, so I would make sure you have ~10GB free to be safe in the installation process.
  • Once the dataset is installed, queries can than be made.  Try to get the list of pitch locations (x,y data) for HRs hit by Joe Mauer off RHP in 2009.  See if you can use some of Colin's sample codes to get the data.   Here is a sample query I used to get a list of HRs hit off of Randy Johnson in the strike zone:

select a.ab_id, p.px, p.pz,a.des 
,a.pitcher ,a.batter
from atbats a 
join pitches p on p.ab_id = a.ab_id
join games g on g.game_id = a.game_id
where a.pitcher = 116615
and g.date like '2008%'   and p.px >-1.2 and p.px < 1.2
and p.pz >= 1.5 and p.pz <=3.5 
and p.pitch_type ='FA' 
and a.des like '%home%'
group by ab_id

I was thinking of just putting up the Mauer query, but I add data as people as it.  I think it is best to work through this part individually and when you get stuck ask for help.

 

  • Others or myself will help with the query, but each person needs to look around the Pitch FX dataset to see where the data (Pitch location, speed, batter, pitcher, umpire, etc) is located and how to get it with a query.  Creating the Joe Mauer query isn't the only one that we can help people with at this time.  Let us know the query you want to do and your attempt and we will help you through it.  Writing queries will be useful for all datasets, so it is best to figure it out initially.
  • Once everyone has a database setup and can make some basic queries, we will move on to installing R on the computer and linking it to the Pitch FX dataset.

There are no stupid questions here, except "What is Dayton Moore thinking?", so feel to ask anything.

 

select p.px, p.pz 

from atbats a 

join pitches p on p.ab_id = a.ab_id 

join games g on g.game_id = a.game_id 

join players pl on pl.eliasid = a.batter

where pl.last = 'Mauer' and pl.first = 'Joe'

and g.date like '2009%'

and a.event = 'Home Run'

and p.type = 'X'


Comment 39 comments  |  2 recs  | 

Do you like this story?

Comments

Display:

Thanks for putting this up Jeff

I am on a mac, so I can help anyone who is having trouble with the PC steps.

by vivaelpujols on Feb 12, 2010 12:51 AM EST reply actions  

Error populating database

When I try to populate the database with the downloaded data, I get error message referring to incorrect SQL syntax. Help?

by ktsharp on Feb 12, 2010 3:06 AM EST reply actions  

Did you unzip the data?

Can you post the exact syntax

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 12, 2010 8:00 AM EST up reply actions  

Just right click in the SQLyog interface on root@localhost

Select restore from SQLdump and point to the unzipped .sql file

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 12, 2010 8:27 AM EST up reply actions  

Shouldn't it be grouped by pitch id?

a.event = ‘home run’
p.type=‘x’

Also, the SQL dump was missing the first two games in the 2008 season and had some unimportant spring training games it as well. That’s when I downloaded it a month or two ago.

rzar.wordpress.com
draysbay.com
raysprospects.com

by RZ on Feb 12, 2010 10:42 AM EST reply actions  

It can be grouped by pitch ID, but I am hoping to just use the x,y coordinates for the final heat map

I didn’t know of any games missing except the ones played in Japan.

I will look into getting the spring training games removed.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 12, 2010 11:07 AM EST up reply actions  

Here is how to get rid of the 2008 ST games:

delete from games where date like “2008-02%”

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 12, 2010 12:12 PM EST up reply actions  

I already got rid of them

Although I only use the dump for the 2007 season data which has a lot of errors every time I parse it.

rzar.wordpress.com
draysbay.com
raysprospects.com

by RZ on Feb 12, 2010 12:30 PM EST up reply actions  

I don't use 07 when I have enough data for 08 and 09

If not, then I’ll use 07. As long as the errors are distributed randomly, it’s not a huge deal.

by vivaelpujols on Feb 12, 2010 2:53 PM EST up reply actions  

field definitions

Thanks so much Jeff, Vivaelpujols (I am on a mac these days and you are da man!) and everyone else for making this really fun and easy! i have real limited experience with this stuff, so I appreciate the help alot. I have a pretty lame question:is there a glossary for the pitch f/x table fields anywhere? Most are quite self-explanatory, and mike fast pointed out the newer ones like pitches.sv_id, but I still get confused on pitches.×0, pitches.vx0, pitches.ax etc…thanks again for eveyone’s time and work!

by chiefjamesp on Feb 14, 2010 10:37 AM EST reply actions  

Here is a decent one from Mike Fast's website:

http://fastballs.wordpress.com/2007/08/02/glossary-of-the-gameday-pitch-fields/

Let us know if there are any other questions on the categories. Maybe my next piece should be on each field if Mikes glossary is not enough

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 14, 2010 11:22 AM EST up reply actions  

Thanks, Jeff.

Just checking in to say that I am following along here to learn / help out. I also participated in the Saberizing A Mac series, so I’ll especially try to address any Mac-specific questions.

by RedRobot8 on Feb 14, 2010 4:00 PM EST reply actions  

So I'd just like to see how many people are interested in this

Can you reply to this comment if you are interested in following this series and working through the problems that may come up? If everyone who is could also specify how far along they are in this, that would be great as well.

by vivaelpujols on Feb 15, 2010 1:17 AM EST reply actions  

interested,

got the database, not the r or the heat map know-how

by JDanger on Feb 15, 2010 9:44 AM EST up reply actions  

I'm interested

I’ve been using SQL for a while with Rotosheet and Pitch F/X. I’ve trying to learn about the graphing aspect with R.

So I’m in the same place as Danger.

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 15, 2010 2:53 PM EST up reply actions  

wow

retrosheet* lol

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 15, 2010 3:56 PM EST up reply actions  

I'm interested.

Curious, Nick, what interface are you using with MySQL on the Mac? The one “we” used during the Saberizing a Mac series or something else?

by Sky Kalkman on Feb 15, 2010 2:55 PM EST up reply actions  

agreed

Using R or Gretl is what I am curious about

Follow me at http://twitter.com/JDSussman
Remember: baseball guys... baseball...

by JD Sussman on Feb 15, 2010 4:40 PM EST up reply actions  

very interested

i am wicked psyched to follow along on my mac (using mamp&sequel pro)…i got the database going and R installed which i have no experience with, but i am psyched to learn how to use it with sql, thanks again!

by chiefjamesp on Feb 15, 2010 6:49 PM EST up reply actions  

For future reference, I figured this step was needed for reference.

It looks like we are ready to move on. Here is the query I used to get the Mauer data:

For ease of data, I removed the throws R on the requirement:

select p.px, p.pz
from atbats a
join pitches p on p.ab_id = a.ab_id
join games g on g.game_id = a.game_id
join players pl on pl.eliasid = a.batter
where pl.last = ‘Mauer’ and pl.first = ‘Joe’
and g.date like ‘2009%’
and a.event = ‘Home Run’
and p.type = ‘X’

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 15, 2010 11:14 PM EST reply actions  

got an error trying this query

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘%’
and a.event = ‘Home Run’
and p.type = ‘X’’ at line 7

by ktsharp on Feb 16, 2010 12:37 AM EST up reply actions  

I get the same one when I cut and paste.

the ’ is at an angle on the cut and paste. It is not a true quote. If you replace them all, it should run fine.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 16, 2010 12:47 AM EST up reply actions  

got it.

you guys are a great help. sorry i am a newbie with this stuff. but very eager to learn

by ktsharp on Feb 16, 2010 1:05 AM EST up reply actions  

getting an error

I’m getting the following error when I try to replicate the Joe Mauer query:

Table ‘pbp2.players’ doesn’t exist

What am I doing wrong?

by UMASS on Feb 17, 2010 11:37 AM EST reply actions  

No

The only tables I have are atbats, game_types, games, pitch_types, and pitches.

Each time I try to download the pbp2.sql file it exits about halfway through when reading. Not sure why its doing this. I’m on a mac running MAMP and Sequel Pro.

by UMASS on Feb 17, 2010 12:21 PM EST up reply actions  

I might have to wait for Viva to reply.

Do you have plenty of memory of the computer?

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 12:36 PM EST up reply actions  

Yeah

Memory shouldn’t be an issue. Not sure whats up, I had no problem following the first saberizing a mac series.

by UMASS on Feb 17, 2010 2:26 PM EST reply actions  

I've never used Jeff's SQL dump, I built it up from the XML files

Can you try clearing your database, redownloading the Zip file, and then import it and tell me what happens?

by vivaelpujols on Feb 17, 2010 2:39 PM EST up reply actions  

Wasn't there a problem with the gui timing out on Macs.

  It might have to be installed via the command line like retrosheet.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 2:57 PM EST up reply actions  

Same result

I’ve tried multiple times to re-download and import again all with the same result. Has anyone else with a mac had this problem or is it just my computer?

How can I install it via the command line?

by UMASS on Feb 17, 2010 3:49 PM EST up reply actions  

command line command

mysql < pbp2.sql

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Feb 17, 2010 7:28 PM EST up reply actions  

Comments For This Post Are Closed


User Tools

We use numbers and stuff.
Community Guidelines
Why be a member?

Follow us on Facebook!

Follow us on Twitter!

SaberGraphics

Yahoo_full_count

MLB Daily Dish

Get the latest MLB Trade Rumors, Transactions, and News at MLB Daily Dish!


Managing Editor:

Jbopp-kc_small Justin Bopp

Columnists:

Adam_small adarowski

Dme_small Satchel Price

Closeup4_small J-Doug

Carlosicon_small Julian Levine

Billy_and_daddy_4th_of_july_small Bill Petti

Featuring:

Dayton_small Jeff Zimmerman

12475953_small Jacob Peterson

Recent_pic_pg_small Patrick Gordon

Btbpro_small Dave Gershman

Me_small Bryan Grosnick

229331_10150183361996591_674441590_6760167_6637860_n3_small Lewie Pollis

Img_3830_small David Fung

30472_1481067225243_1190689185_1381415_997334_n_small Glenn DuPaul

1mnvxku7_small joshuaworn

Set_small MattFilippi18

Photo0011_small Nathaniel Stoltz