A Question of Regression: Heat Maps

I recently published an article at Fangraphs that used heat maps to show how batters' strike zones differ, and a question arose about Brett Gardner vs. LHP. In particular, the question was about a single pitch way inside that Gardner swung at, as seen in this image:

[Image: heat map of Brett Gardner's swings vs. LHP]

The map makes it look like he swings at pitches way inside, but it was a one-time occurrence.

The best method I could think of to deal with this problem is to regress the data toward the league average for each area. This would help smooth extreme and out-of-place values on the heat maps. I have found that I need to add between 20 and 30 league-weighted pitches to properly regress the data.
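As a rough illustration of the idea above, here is a minimal Python sketch of regressing one area's swing rate toward the league rate by adding league-weighted pitches. The function name `regress_rate`, the weight `k=25` (picked from the 20 to 30 range mentioned above), and the example numbers are all assumptions for illustration, not the code behind the actual heat maps.

```python
def regress_rate(swings, pitches, league_rate, k=25.0):
    """Shrink an observed swing rate toward the league rate.

    Equivalent to adding k extra pitches that were swung at
    exactly as often as the league swings in this area.
    """
    return (swings + k * league_rate) / (pitches + k)


# Hypothetical area far inside: the batter saw one pitch and swung once.
raw = 1 / 1
adjusted = regress_rate(swings=1, pitches=1, league_rate=0.05)
print(f"raw={raw:.2f} adjusted={adjusted:.3f}")

# With no pitches at all, the result falls back to the league rate
# instead of being undefined.
print(regress_rate(swings=0, pitches=0, league_rate=0.05))
```

With only one pitch in the area, the adjusted value stays close to the league rate; as the batter's sample grows, his own data takes over.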


With this in mind, what do you think is the correct way to regress the data once other variables are involved? For example, what if I want to look at how one player swung on 0-2 counts during 2010? Do I use the league average for all counts since 2007, or just for 0-2 counts in 2010? I think recalculating the league average for each scenario would be ideal, but then I run into another problem.

Currently, creating a heat map takes about 3 seconds over the internet. Figuring out the league data on the fly would add anywhere from a few more seconds up to 15+ minutes per map. Also, once a second person starts a process, the heat maps take twice as long to produce for both people. If I pre-program a set of values for each process to regress the output toward, the heat map will be created in just seconds. This offseason, I plan on making this application available to the public (some people already have access to it), and I am wondering whether people would prefer a faster application or a more correct image.

Right now, I am thinking of doing a single adjustment for each of the counts and ignoring the dates. Does this seem like a reasonable middle ground or should I be more or less stringent with the data?
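To make the middle-ground option concrete, here is a small Python sketch of precomputing one league baseline per count and area, pooled across all dates, so that a map request only needs a dictionary lookup. The tuple layout `(count, cell, swung)`, the function names, and the sample rows are hypothetical; the real pitch database and its schema are not shown in this post.

```python
from collections import defaultdict


def build_baselines(league_pitches):
    """Build {(count, cell): league swing rate}, ignoring dates.

    league_pitches: iterable of (count, cell, swung) tuples
    (an assumed layout, with every season pooled together).
    """
    totals = defaultdict(lambda: [0, 0])  # (count, cell) -> [swings, pitches]
    for count, cell, swung in league_pitches:
        entry = totals[(count, cell)]
        entry[0] += int(swung)
        entry[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def regressed_cell(baselines, count, cell, swings, pitches, k=25.0):
    """Regress one player's area toward the stored baseline for that count."""
    league_rate = baselines[(count, cell)]
    return (swings + k * league_rate) / (pitches + k)


# Made-up league rows: four 0-2 pitches and one 3-0 pitch in the same area.
sample = [
    ("0-2", (0, 0), True),
    ("0-2", (0, 0), False),
    ("0-2", (0, 0), False),
    ("0-2", (0, 0), False),
    ("3-0", (0, 0), False),
]
baselines = build_baselines(sample)  # done once, ahead of time
print(regressed_cell(baselines, "0-2", (0, 0), swings=1, pitches=1))
```

The trade-off is the one described above: the baselines are built once rather than per request, so they are fast to use but cannot reflect a date filter such as "2010 only".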

Let me know if you need more information or need any ideas cleared up. Thanks -Jeff


Comments


IMHO

I’d prefer the more correct image.

Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.

by J-Doug on Nov 17, 2010 11:11 AM EST

No reason not to build both

But if I had to pick one, it would be the one that reflects the actual data.

by Graham MacAree on Nov 17, 2010 11:16 AM EST

Both might be ideal.

Having a selection box to regress or not regress the data may work best. The main problem arises when there is only one data point, or none, to sample.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Nov 17, 2010 1:02 PM EST

use a robust method

You want to use a method that doesn’t allow a single influential outlier to create bad output. That also should be faster to create on the fly.

by wcw on Nov 17, 2010 12:49 PM EST

Our process is streamlined right now.

When it first ran, it took over an hour per heat map, and now it is around 3 seconds. The problem with doing it on the fly is that every pitch record (3 million currently) needs to be looked up over a hundred times (depending on how detailed the image is to be), with all the data stored and then output to the visual.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Nov 17, 2010 1:08 PM EST
