Beyond the Box Score: An SB Nation Community


@MacAree: @BtB_Sky @dturkenk @sabometrics Someone throw up a comment thread on BTB so we can have this discussion less confusingly?

Ok, Graham, here's your discussion thread. For those who don't stalk Graham on Twitter, these guys have been discussing the merits and deficiencies of potential Hit f/x data for judging hitters and fielders. 140 characters wasn't really enough. Let's see what happens next...

about 1 month ago Sky Kalkman 89 comments 0 recs


Comments


Ok, let's start

Hit f/x gives us the initial vector of a batted ball, in terms of velocity, field angle, and elevation angle. Will that be enough to get us accurate batted ball classifications?

by Graham on Jul 21, 2010 4:12 PM EDT reply actions  

How granular are we talking? The current FB/LD/GB schema? How many buckets are we aiming for?

And are we only considering height? Or will horizontal angle become part of a popular batted ball classification? (LDs and FBs have different angles for success than GBs. i.e. fielder gaps are different.)

by Sky Kalkman on Jul 21, 2010 4:15 PM EDT up reply actions  

As a thought, why even aim for a number of buckets?

Why not do the calculation for 1 through n buckets, where n is the number of batted balls that there’s data for, and then see which number of buckets optimizes predictive values?

by themastah on Jul 21, 2010 4:17 PM EDT up reply actions  
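
One way to read the suggestion above is as a holdout test: fit an out rate per bucket on part of the data, score it on the rest, and repeat for several bucket counts. The sketch below assumes equal-width buckets on elevation angle only, and the batted balls are synthetic (the `fake_ball` generator and its out-probability curve are made up for illustration, not Hit f/x data).

```python
import math
import random

def bucket_index(angle, n_buckets, lo=-90.0, hi=90.0):
    """Map an elevation angle in degrees to one of n_buckets equal-width bins."""
    frac = (angle - lo) / (hi - lo)
    return min(n_buckets - 1, max(0, int(frac * n_buckets)))

def holdout_log_loss(train, test, n_buckets):
    """Fit a per-bucket out rate on train, then score it on test (lower is better)."""
    outs = [0] * n_buckets
    total = [0] * n_buckets
    for angle, is_out in train:
        b = bucket_index(angle, n_buckets)
        outs[b] += is_out
        total[b] += 1
    loss = 0.0
    for angle, is_out in test:
        b = bucket_index(angle, n_buckets)
        # Laplace smoothing keeps empty or one-sided buckets away from 0 and 1.
        p = (outs[b] + 1) / (total[b] + 2)
        loss -= math.log(p if is_out else 1 - p)
    return loss / len(test)

# Synthetic data (made up): out probability dips for line-drive angles.
rng = random.Random(0)

def fake_ball():
    angle = rng.uniform(-30, 60)
    p_out = 0.8 - 0.6 * math.exp(-((angle - 12) / 15) ** 2)
    return angle, 1 if rng.random() < p_out else 0

balls = [fake_ball() for _ in range(4000)]
train, test = balls[:3000], balls[3000:]
scores = {k: holdout_log_loss(train, test, k) for k in (1, 3, 10, 30, 100, 1000)}
best_k = min(scores, key=scores.get)
```

Here `best_k` is just the bucket count with the lowest holdout loss among the ones tried. With real data the same loop could run over every count from 1 to n, at the cost of more computation.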

That's true.

Just thinking about vertical buckets (like we use now), there are certainly going to be parts of the continuum where changes don’t matter and then parts where small changes make a large difference in out rate.

Like on grounders, as an example: REALLY soft is an easy out for the catcher. A little harder is a good chance of an infield hit. Harder is a routine grounder. Harder still will get through holes/cause errors more often… etc

by Sky Kalkman on Jul 21, 2010 4:22 PM EDT up reply actions  

My belief is that what needs to happen is to take into account the seven parameters of batted ball trajectory

(vector is three, then spin is another three, then atmospherics) and run a clustering analysis to figure out the optimal number of buckets we’d want to look at. After that, you can take any batted ball you like and do a fuzzy means to figure out what buckets it should be shared between. It’s a quasi-continuous solution.

by Graham on Jul 21, 2010 4:22 PM EDT up reply actions  
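
The "fuzzy means" step can be sketched with the standard fuzzy c-means membership formula, which shares one batted ball between cluster centers according to distance. The centers below are hypothetical and use only two of the seven parameters (speed and elevation angle); in practice the features would need to be put on comparable scales first.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster center (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs entirely to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# Hypothetical centers: (speed off bat in mph, elevation angle in degrees).
centers = [(85.0, -5.0), (95.0, 15.0), (90.0, 35.0)]  # "grounder", "liner", "fly"
ball = (93.0, 22.0)
shares = fuzzy_memberships(ball, centers)
```

For this ball most of the weight lands on the second center, with a smaller share on the third, which is the quasi-continuous behavior described above.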

Oh, and Dan mentioned that pitch location would be important too

Personally, I don’t see it – there’s a very limited range where the ball can actually be hit, and ballistics will tell you that the starting point of an object inside a 2×2 cube won’t matter very much over baseball distances. I don’t see leaving it out as a huge deal compared to the simplicity we gain from ignoring it when we’re computing outs and runs

by Graham on Jul 21, 2010 4:25 PM EDT up reply actions  

It depends

whether the different outcomes of say a good curveball vs a hanging curveball can be captured completely with the other data we’d be using. Seems like a good chance of that happening, especially if we’re using landing location

by CGlaser on Jul 21, 2010 4:27 PM EDT up reply actions  

I don't follow

Apart from location, the batted ball trajectory is ignorant of the pitch that came before, correct?

by Graham on Jul 21, 2010 4:28 PM EDT up reply actions  

Will spin on the pitch

have an effect on the spin of the batted ball, and so on its trajectory? I guess that would just be captured in this spin we are trying to calculate and thus it would be unnecessary to include?

Including data about the pitch might help us assign ‘credit’ to the pitcher for a bad pitch vs the hitter for hitting a good pitch or something – which should matter at some level, right?

by CGlaser on Jul 21, 2010 4:47 PM EDT up reply actions  

I'm (fairly) sure pitch spin would affect batted ball spin

But wouldn’t change how the ball flies once contact is made and the spin for that is captured.

I agree with you that if we’re looking to apportion credit, we’ll want information about how difficult the pitch is to hit too.

by Dan Turkenkopf on Jul 21, 2010 4:49 PM EDT up reply actions  

Spin

Craig, take a tennis racket and hit a slice or a top spin shot. Same effect here. Spin is extremely important.

by JDSussman on Jul 22, 2010 10:17 AM EDT up reply actions  

Craig was asking if you know the spin off the bat, would the spin of the pitch have an effect beyond that.

I don’t think so, since any possible effects are being measured by the 6 parameters.

In other words, does knowing how the ball was hit lack any information that knowing how the ball was pitched would tell us?

by Sky Kalkman on Jul 22, 2010 10:28 AM EDT up reply actions  

Not...

Not if we’re concerned solely with what happens with the ball once it’s hit – If we’re looking to judge fielders, essentially.

If we’re looking to judge hitters or pitchers, then we have to integrate info about the pitch… And not for judging fielders, unless we find they do something differently according to the pitcher on the mound. (I’m thinking of something similar to the more arcane UZR adjustments..)

Go Twins!

by Patrick42 on Jul 22, 2010 11:57 AM EDT up reply actions  

Right, you want to know about the pitch in order to judge what the hitter does with it.

But given that you know how the ball was hit, knowing about the inputs doesn’t tell you any more about the outputs.

by Sky Kalkman on Jul 22, 2010 1:47 PM EDT up reply actions  

Well

does the spin data also give you the spin acceleration/deceleration? If not, the pitch data could help you guess that. That’s the only application I could see, though.

by themastah on Jul 22, 2010 1:51 PM EDT up reply actions  

Are you talking about spin-down

i.e., the decay of the spin rate throughout the flight of a fly ball? If so, its effect is believed to be negligible:
http://webusers.npl.illinois.edu/~a-nathan/pob/spindown.pdf

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on Jul 22, 2010 5:41 PM EDT up reply actions  

Yeah, I was

This article looks interesting, I’ll read it later, thanks.

If that’s the case, then I see nothing that pitch f/x would tell you about spin once the ball’s been hit.

by themastah on Jul 22, 2010 8:03 PM EDT up reply actions  

I'm not sure

Since I’ve forgotten how to calculate trajectories I’m using some online calculators.

It appears that the difference between a pitch hit 2 feet off the ground versus one hit 4 feet off the ground can range from just a few feet to almost 20 feet (all else equal of course). The higher the angle, the less of a difference it makes.

by Dan Turkenkopf on Jul 21, 2010 8:03 PM EDT up reply actions  

Upon further review, I'm not sure the calculator I found was correct

It had an object with an initial velocity of 80 MPH at a 45 deg angle going 900 feet.

But the 20 ft case was roughly 230-250 ft I think. I’ll see if I can reproduce it with numbers that look a little closer to my expectations. Or god forbid, do my own math.

by Dan Turkenkopf on Jul 22, 2010 4:49 PM EDT up reply actions  
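
As a sanity check on the calculator numbers, here is the textbook drag-free projectile formula with a nonzero launch height. It ignores air resistance and spin entirely, so real carry distances would be shorter; it is only meant to show the size of the contact-height effect.

```python
import math

G_FT_S2 = 32.174             # gravitational acceleration, ft/s^2
MPH_TO_FT_S = 5280 / 3600    # miles per hour to feet per second

def vacuum_range_ft(speed_mph, elevation_deg, height_ft=0.0):
    """Horizontal distance a drag-free projectile travels before reaching the ground."""
    v = speed_mph * MPH_TO_FT_S
    theta = math.radians(elevation_deg)
    vx, vy = v * math.cos(theta), v * math.sin(theta)
    # Positive root of: height + vy*t - (g/2)*t^2 = 0
    flight_time = (vy + math.sqrt(vy * vy + 2 * G_FT_S2 * height_ft)) / G_FT_S2
    return vx * flight_time

flat_45 = vacuum_range_ft(80, 45)
gap_at_5 = vacuum_range_ft(80, 5, 4) - vacuum_range_ft(80, 5, 2)
gap_at_45 = vacuum_range_ft(80, 45, 4) - vacuum_range_ft(80, 45, 2)
```

Even with no drag, 80 MPH at 45 degrees carries only about 428 feet, so a 900-foot result points to a unit or input problem in the calculator. In this simplified model the 2-foot versus 4-foot contact height is worth roughly 14 feet at a 5 degree launch and about 2 feet at 45 degrees, which matches the pattern that higher angles are less sensitive.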

Why clustering?

Could you expand on what you mean by this?

by themastah on Jul 21, 2010 4:34 PM EDT up reply actions  

Essentially, we're looking at a 7-D matrix of all these batted ball parameters

We might find that there are patterns within the data that lend themselves to being seeds for our buckets, and there are algorithms you could run across the whole data set in order to identify how many seeds would be optimal in terms of group separation and accuracy. The last time I looked at this sort of thing was three or four years ago, though, so I don’t remember exactly how it was done.

by Graham on Jul 21, 2010 4:37 PM EDT up reply actions  

I see

The only clustering I know of is k-means clustering. But that doesn’t tell you the optimal number of clusters unless you do some sort of cross-validation.

by themastah on Jul 21, 2010 4:41 PM EDT up reply actions  

Okay, but

then it’s really the validation part that we’re more concerned with, no? I mean, that’s what’s going to decide what value of k is used, isn’t it?

by themastah on Jul 21, 2010 4:51 PM EDT up reply actions  

Sure, but I don't think that that's a particularly difficult thing to do

It’s computationally intensive, but we’d only have to do it once for all of batted-ball space. I forget the technical definitions of ‘good’ clusters vs ‘bad’, but I know that there are ways of doing it.

by Graham on Jul 21, 2010 4:53 PM EDT up reply actions  

Oh, I agree completely, it's not hard

but at that point, what’s the advantage of using k-means clustering? Nothing’s going to give you the advantage of skipping the validation, so most regression/learning methods will work. Some will probably work better.

I also forget how you evaluate the quality of a cluster; I’ll have to look it up. I think it has to do with how likely another test point is to fall in a cluster, and the tightness of that cluster…I don’t know, I have to look it up.

by themastah on Jul 21, 2010 4:57 PM EDT up reply actions  

After looking it up

The goal is to get as many “similar” clusters as possible…nothing is inherently good or bad about a cluster on its own, it’s a total similarity function of the entire set.

by themastah on Jul 22, 2010 8:07 PM EDT up reply actions  
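
For reference, a minimal k-means sketch in the spirit of this sub-thread. It uses plain Lloyd iterations with a deterministic farthest-point start (an implementation choice made here, not something proposed in the thread) and reports the total within-cluster squared distance for several values of k. The points are synthetic blobs in (speed, elevation angle), made up for illustration.

```python
import math
import random

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns (centers, total within-cluster squared distance)."""
    # Deterministic start: first point, then repeatedly the point farthest from all centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its members; leave it in place if it has none.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    inertia = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, inertia

# Synthetic data: three made-up blobs of (speed mph, elevation deg).
rng = random.Random(1)
blobs = [(85, -5), (95, 15), (90, 35)]
points = [(rng.gauss(cx, 3), rng.gauss(cy, 4)) for cx, cy in blobs for _ in range(200)]
inertia_by_k = {k: kmeans(points, k)[1] for k in range(1, 7)}
```

The within-cluster total generally keeps shrinking as k grows, so on its own it cannot pick k. That is why some form of held-out validation, or a penalty on the number of clusters, is needed, as noted above.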

As much as I think GB/LD/FB is important now, I also think that they're something we should be looking to get rid of

We don’t know how many buckets we should be using, the exact definition of the buckets, etc. Seems to imply that we should be moving towards a continuous analysis. Furthermore, hit/fx won’t solve the problem anyway, because it’s missing some of the batted ball parameters (particularly spin) which might impact run/out value significantly

by Graham on Jul 21, 2010 4:18 PM EDT up reply actions  

But unfortunately

we won’t have trustworthy landing locations until Field F/X is around and that is going to take quite a while. Hit f/x will definitely be around much earlier

by CGlaser on Jul 21, 2010 4:20 PM EDT up reply actions  

Can't bank on

Field F/X (or Hit F/X for that matter) being given away either.

I don’t know how much I trust current landing location data or whose is the best. For HRs I think Hit Tracker is generally considered tops but for other balls in play?

by CGlaser on Jul 21, 2010 4:25 PM EDT up reply actions  

I have my doubts that it ever will

Teams are having to pay a lot of money for access

by Graham on Jul 21, 2010 4:27 PM EDT up reply actions  

Most analysis of landing location is done by video

Which is currently limited to the information captured by TV cameras. There are often no real clues as to the exact position of the ball.

Might not be an issue for analyzing batting, but probably would be for fielding.

That’s actually an important question to answer here – what are we trying to analyze? And if we optimize for one part of the game, do we improve or hurt our understanding of the others?

by Dan Turkenkopf on Jul 21, 2010 4:28 PM EDT up reply actions  

Not to speak for Graham (ok, to speak for Graham), but continuous would be best

No buckets at all

Personally, I think we’re going to bucket at some point – so we’ll want consistent definitions. I need to think more about themastah’s suggestion.

by Dan Turkenkopf on Jul 21, 2010 4:19 PM EDT up reply actions  

I haven't read the whole discussion yet, but I thought I should post this

Mike Fast showed a graph or two on Tango’s blog

http://www.insidethebook.com/ee/index.php/site/comments/launch_angle_speed_off_the_bat_trajectory/#2

It seems as though there is way too much overlap to separate things into the 4 batted ball classifications. I think a continuous model using LOESS or something similar would be better.

by vivaelpujols on Jul 21, 2010 6:07 PM EDT up reply actions  
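
To illustrate what a continuous model looks like in place of buckets, the sketch below uses a Gaussian-kernel weighted average. This is a simpler stand-in for LOESS (which fits a local regression at each point rather than a local average), and the nine batted balls are invented for the example.

```python
import math

def smoothed_out_rate(angle, balls, bandwidth=5.0):
    """Gaussian-kernel weighted out rate at `angle` from (angle, is_out) pairs."""
    weights = [math.exp(-0.5 * ((a - angle) / bandwidth) ** 2) for a, _ in balls]
    return sum(w * is_out for w, (_, is_out) in zip(weights, balls)) / sum(weights)

# Made-up sample: (elevation angle in degrees, 1 if the ball became an out).
balls = [(-10, 1), (-5, 1), (0, 1), (8, 0), (12, 0), (15, 0), (25, 1), (35, 1), (45, 1)]
rate_liner = smoothed_out_rate(12, balls)
rate_fly = smoothed_out_rate(40, balls)
```

The estimate changes smoothly with angle, so there is no hard boundary where a ball flips from one class to another. The `bandwidth` argument (an assumed 5 degrees here) plays the role the bucket width plays in a bucketed model.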

What applications are being considered?

I feel one of the major uses would be a replacement for wOBA. A more refined tRA/tRA*/tRAr?

by themastah on Jul 21, 2010 4:30 PM EDT reply actions  

Right now we're pretty much outcome-based or scouting-based when trying to detect changes for batters

If the 7 params Graham mentioned change beyond some reasonable fluctuation, we can infer a change in approach/talent/health I think.

by Dan Turkenkopf on Jul 21, 2010 4:35 PM EDT up reply actions  

Hmmm

Possibly…but I’m worried (without looking at the numbers) that there will be so much noise on this level that it will be difficult to tell.

by themastah on Jul 21, 2010 4:37 PM EDT up reply actions  

Understandable

But it’s probably worth trying.

by Dan Turkenkopf on Jul 21, 2010 4:38 PM EDT up reply actions  
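
One simple way to frame "beyond some reasonable fluctuation" is a two-sample z-score on a single parameter, such as speed off the bat. The sketch below assumes roughly independent batted balls and uses synthetic samples; the 90 and 85 MPH means and the 8 MPH spread are made-up numbers.

```python
import math
import random
import statistics

def mean_shift_z(before, after):
    """Two-sample z-score (unequal variances) for a change in the mean of one parameter."""
    se = math.sqrt(statistics.variance(before) / len(before)
                   + statistics.variance(after) / len(after))
    return (statistics.mean(after) - statistics.mean(before)) / se

# Synthetic speed-off-bat samples in MPH (made up): a 5 MPH drop between periods.
rng = random.Random(2)
last_year = [rng.gauss(90, 8) for _ in range(300)]
this_year = [rng.gauss(85, 8) for _ in range(300)]
z = mean_shift_z(last_year, this_year)
```

A z-score far from zero suggests the shift is larger than sampling noise alone would produce. With fewer batted balls or a smaller true change the score shrinks quickly, which is the noise concern raised above.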

It's everything we try to do right now with limited data.

For example, we compare this year and last year’s batted ball locations to see if someone isn’t hitting the ball as hard as they were last year. Now:

With Hit f/x: measure batted ball velocity directly.

by Sky Kalkman on Jul 21, 2010 4:40 PM EDT up reply actions  

Huh?

I thought the point of tRA is to be fielding indifferent.

by themastah on Jul 21, 2010 4:33 PM EDT up reply actions  

If you know the difficulty of batted balls, then, well, you know how difficult they are to field.

It’s like the rather unpublicized PZR. UZR has to know how difficult every ball is to turn into an out. That’s how it judges fielders. Well, if you know how difficult every ball is to turn into an out, you credit/blame the pitcher for allowing that batted ball and then credit/blame the fielders from that point on (for making/not making a play).

by Sky Kalkman on Jul 21, 2010 4:38 PM EDT up reply actions  
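
The credit split described above can be sketched per ball in play, measured in outs relative to average. The function name, the outs-based scale, and the 0.70 league out rate are assumptions for illustration, not the actual tRA, PZR, or UZR calculations.

```python
def split_credit(out_probability, was_out, league_out_rate=0.70):
    """Return (pitcher_credit, fielder_credit) in outs above average for one ball in play."""
    # Pitcher: how much easier (or harder) than average the batted ball was to field.
    pitcher = out_probability - league_out_rate
    # Fielder: what actually happened versus what was expected for that ball.
    fielder = (1.0 if was_out else 0.0) - out_probability
    return pitcher, fielder

easy_ball_dropped = split_credit(0.95, was_out=False)   # pitcher +0.25, fielder -0.95
tough_ball_caught = split_credit(0.20, was_out=True)    # pitcher -0.50, fielder +0.80
```

The two credits always add up to the actual result minus the league rate, so nothing is counted twice. The hard part is the `out_probability` input, which is what better batted ball data would improve.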

Yep.

How much need is there to separate range and hands? Positioning is obvious, as that can be (ideally) 100% coached.

by Sky Kalkman on Jul 21, 2010 4:41 PM EDT up reply actions  

We're pretty good at separating range and hands already I think

Errors count against hands.

Obviously we’re tripped up when a player gets to a difficult ball but bobbles it, but my guess is that’s well within the noise of the stats.

Positioning versus range is the major sticking point.

by Dan Turkenkopf on Jul 21, 2010 4:44 PM EDT up reply actions  

Yep

Fielding is actually what got the whole conversation started and I think it’ll be a boon to the old metrics (more trustworthy batted ball types) until new metrics (and Field F/X) start coming along

by CGlaser on Jul 21, 2010 4:35 PM EDT up reply actions  

But tRAr is different than tRA*

The numbers for certain players have changed, and some results are outright wonky. I could give you examples if you want when work ends.

by themastah on Jul 21, 2010 5:20 PM EDT up reply actions  

Also

The regression style has changed. It now uses past data for players.

by themastah on Jul 21, 2010 5:20 PM EDT up reply actions  

Ah

I’m not really sure what he did exactly. I certainly wasn’t consulted about it!

by Graham on Jul 21, 2010 5:23 PM EDT up reply actions  

Okay, maybe I'll e-mail him

I know that the 2009 tRAr value for Pineiro definitely is not the same as the tRA* value…which is no longer available, but was certainly in the 3’s.

by themastah on Jul 21, 2010 5:32 PM EDT up reply actions  

Using past data is a really interesting topic (at least to me).

Say we observe a pitcher with a 7% HR/FB rate over 100 IP. We assume that’s not his true talent level and regress it towards 11%. How much? Not sure. Say 75%.

Now what if we also know that this pitcher posted a 7% HR/FB in 2009 and 2008 and 2007. Ignoring park effects, we’re a LOT more sure about his talent level now.

But should a metric of his 2010 performance take that into account? Say he posted a 13% HR/FB in 2007-2009. Or say he’s a rookie. Observed 2010 stats are all the same — can we judge their 2010 performances differently because of pre-2010 performance? I can see arguments in both directions.

by Sky Kalkman on Jul 21, 2010 5:53 PM EDT up reply actions  
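
The arithmetic in the example above, written out. The first function is just the stated "regress 75% toward 11%" step. The second uses a common shrinkage form where the regression amount falls as the sample grows; its stabilization constant of 300 fly balls and the sample sizes are hypothetical, chosen only so that one season gives 75%.

```python
def regress_to_mean(observed, league_mean, regression_fraction):
    """Pull an observed rate toward the league mean by the given fraction (0 to 1)."""
    return observed + regression_fraction * (league_mean - observed)

def regression_fraction(sample_size, stabilization_point):
    """Assumed shrinkage rule: more data means less regression."""
    return stabilization_point / (sample_size + stabilization_point)

one_season = regress_to_mean(0.07, 0.11, 0.75)   # 0.07 + 0.75 * 0.04 = 0.10

# Hypothetical: 100 fly balls per season, 300 fly balls as the stabilization point.
four_seasons = regress_to_mean(0.07, 0.11, regression_fraction(400, 300))
```

With four seasons at the same 7% rate the estimate stays closer to 7% (about 8.7% under these assumptions), which matches the intuition that we are more sure of his talent level. Whether a metric of 2010 performance should use that extra information is the open question.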

I'm kind of against using past data as a metric

It’s fine for projection systems, since there’s “two layers of guessing” there, so to speak, but not when you’re just trying to fit the metric. Players’ mechanics can change so much from year to year. Take Pineiro, in fact. He was a mostly league average pitcher until last year, when Dave Duncan had him add a groundball repertoire. Should we still hold his past against him?

by themastah on Jul 21, 2010 6:37 PM EDT up reply actions  

Well, maybe you could adjust for significant changes.

Changes in GB rate, pitch selection, pitch movement, velocity, etc. Those things could even help define the baseline against which you regress.

by Sky Kalkman on Jul 21, 2010 6:42 PM EDT up reply actions  

I suppose one could do that, but...

I don’t think we have analysis developed nearly enough nowadays to know how much to adjust.

by themastah on Jul 21, 2010 6:45 PM EDT up reply actions  

I think

that you can’t throw out past data just because some people develop in certain ways. It seems to me that for every Pineiro (and remember, nobody was sure if his GB rate would stay up after that one season) there are a bunch of guys who do not maintain their success.

It reminds me of the fangraphs pieces at the beginning of the year about every player who was in the best shape of their career or added a new pitch. Sometimes it will make a big difference but that’s just something you have to mentally account for until the data backs it up, right?

by CGlaser on Jul 21, 2010 7:11 PM EDT up reply actions  

It would be nice if we could take into account things like a very effective new pitch, though

I remember all the projection systems being totally unable to deal with JJ Putz after 2006, because they had no idea how good his new splitter was.

by Graham on Jul 21, 2010 11:50 PM EDT up reply actions  

Say you had the pitch f/x numbers

Could one do a k-nearest neighbors analysis (or something) on a pitch using a database of pitch f/x data, find the run value of that pitch using the most similar pitches, guess the frequency, and then adjust the wOBA against/whatever accordingly?

Sorry…I hope that made sense….I’m half asleep.

by themastah on Jul 22, 2010 12:08 AM EDT up reply actions  
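
A rough sketch of the nearest-neighbors idea: find the most similar pitches in a database, average their run values, and scale by how often the pitch is thrown. Every number below is invented, and the three features (speed, horizontal break, vertical break) are used unscaled for brevity; a real version would standardize them and use far more rows.

```python
import math

def knn_run_value(pitch, database, k=3):
    """Average the run values of the k database pitches closest to `pitch`."""
    nearest = sorted(database, key=lambda row: math.dist(pitch, row[0]))[:k]
    return sum(run_value for _, run_value in nearest) / len(nearest)

# Hypothetical rows: ((speed mph, horizontal break in, vertical break in), runs per pitch)
database = [
    ((86.0, -4.0, 2.0), -0.020),
    ((85.0, -5.0, 1.0), -0.025),
    ((87.0, -3.5, 2.5), -0.015),
    ((93.0, -6.0, 9.0), 0.005),
    ((94.0, -5.5, 10.0), 0.010),
    ((78.0, 5.0, -6.0), -0.005),
]
new_pitch = (86.0, -4.5, 1.5)
per_pitch = knn_run_value(new_pitch, database)

usage = 0.25                # the guessed frequency: assumed share of all pitches
pitches_per_season = 1000   # assumed workload
season_adjustment = per_pitch * usage * pitches_per_season
```

The result is a run adjustment for the season under the assumed usage, which could then be folded into wOBA against or a similar rate. The frequency guess matters as much as the per-pitch value, since both multiply into the total.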

What does...

What does “guess the frequency” mean?

I think you’d need to do some regressing as well, to really understand, that or get huge sample sizes.

Go Twins!

by Patrick42 on Jul 22, 2010 12:38 PM EDT up reply actions  

I mean

If you want to figure out the impact of a new pitch on a player’s ERA, xFIP, tRA, or something, you’d need to know how often he’s throwing that pitch.

by themastah on Jul 22, 2010 1:16 PM EDT up reply actions  

Hmmmm....

Who could have possibly done such a thing?

by vivaelpujols on Aug 2, 2010 2:54 PM EDT up reply actions  

Well, you're not just arbitrarily throwing out data.

If you want to say what a player is GOING to do for the rest of 2010, you certainly want to include 2009, 2008, etc. data. But for saying what he’s done so far in 2010? We don’t have to estimate how many runs a pitcher has given up from the start of the season until now – past seasons’ data cannot improve our accuracy at measuring what we know for certain has happened this year.

by cwyers on Jul 22, 2010 12:56 AM EDT up reply actions  

And here we get into the neverending debate between...

Is it a projection of future performance, or an analysis of the underlying true talent in a current performance?

And how different are those two things?

Go Twins!

by Patrick42 on Jul 22, 2010 12:40 PM EDT up reply actions  

Comments For This Post Are Closed

