Advanced Graphing Techniques Part 1 - Installing a Pitch FX Database on a Computer

There was discussion in another FanPost on showing people how to do some more advanced graphing techniques. I have wanted to create some heat maps myself, so I will start a series on how to do more advanced graphing and calculations.

I will step through the following 3 Parts for now and then go into other subjects.

  1. Setting up a pitch FX dataset and making a query.
  2. Installing R, linking it to the dataset and plotting some points
  3. Creating a heatmap using R

I will keep a general theme going and we can see how using advanced features gives us better results. We will follow how and where Joe Mauer hits home runs off right handed pitchers in 2009. Once everyone following the process is done, we will move to the next step. I am running on a PC, so there maybe different steps for those with Macs. Vivaelpujols will be following the series and I am pretty sure he has a Mac to help along.

Steps start after the jump


  • Goto: and complete everything until you get to the "Putting the "Data" in the Database" section.
  • We will use most of the steps in the "Putting the "Data" in the Database" but instead of downloading the Databank data, download a Pitch F/X database found here. This dataset is quite large, so I would make sure you have ~10GB free to be safe in the installation process.
  • Once the dataset is installed, queries can than be made. Try to get the list of pitch locations (x,y data) for HRs hit by Joe Mauer off RHP in 2009. See if you can use some of Colin's sample codes to get the data. Here is a sample query I used to get a list of HRs hit off of Randy Johnson in the strike zone:

select a.ab_id, p.px, p.pz,a.des
,a.pitcher ,a.batter
from atbats a
join pitches p on p.ab_id = a.ab_id
join games g on g.game_id = a.game_id
where a.pitcher = 116615
and like '2008%' and p.px >-1.2 and p.px < 1.2
and p.pz >= 1.5 and p.pz <=3.5
and p.pitch_type ='FA'
and a.des like '%home%'
group by ab_id

I was thinking of just putting up the Mauer query, but I add data as people as it. I think it is best to work through this part individually and when you get stuck ask for help.

  • Others or myself will help with the query, but each person needs to look around the Pitch FX dataset to see where the data (Pitch location, speed, batter, pitcher, umpire, etc) is located and how to get it with a query. Creating the Joe Mauer query isn't the only one that we can help people with at this time. Let us know the query you want to do and your attempt and we will help you through it. Writing queries will be useful for all datasets, so it is best to figure it out initially.
  • Once everyone has a database setup and can make some basic queries, we will move on to installing R on the computer and linking it to the Pitch FX dataset.

There are no stupid questions here, except "What is Dayton Moore thinking?", so feel to ask anything.

select p.px, p.pz

from atbats a

join pitches p on p.ab_id = a.ab_id

join games g on g.game_id = a.game_id

join players pl on pl.eliasid = a.batter

where pl.last = 'Mauer' and pl.first = 'Joe'

and like '2009%'

and a.event = 'Home Run'

and p.type = 'X'

Trending Discussions