Regress, Regress I Say! (Complete with In-Season Batter Regression Tool)
(Thanks to J-Doug and commenter hairball for helping me find an error in the calculator. Data and post have been updated accordingly)
So much early-season performance is driven by small sample sizes that it can be easy to get overly excited about a batter's hot or cold start. After only about 15-20 games, all sorts of weird stat lines can pop up. Maybe a player's performance reflects a genuine change in approach (e.g. Jose Bautista). Or maybe it's just random chance.
To gain some perspective on the early returns this season, I regressed all batters with >=40 plate appearances toward their 3-year average batting average on balls in play (BABIP) and their 3-year average HR/FB rate. The chart above presents the top-15 and bottom-15 batters in terms of the difference between their actual wOBA this year and what we would expect given their 3-year averages in BABIP and HR/FB.
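For reference, these are the standard FanGraphs definitions of the two rates being regressed. In the regression, each hitter's 3-year values stand in for this season's rates:

$$
\text{BABIP} = \frac{H - HR}{AB - K - HR + SF}, \qquad \text{HR/FB} = \frac{HR}{FB}
$$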
Not surprisingly, Russell Martin sits high on the list of beneficial differences, with a wOBA .148 above his regressed wOBA. Pat Burrell is off to a .391 start, but once regressed by his BABIP and HR/FB rates he may look more like a .290 wOBA hitter.
Brad Hawpe, who has been off to a miserable start, comes out with a respectable .321 regressed wOBA, a difference of -.169 (the second-largest negative difference of all players). Hanley Ramirez has the fourth-largest negative difference, with an expected .399 wOBA (-.142 difference).
Now, just because regressing a batter by his 3-year averages shows him to be well above or below what he's done so far doesn't mean we can assume the gap is luck. For example, Pablo Sandoval should technically be in the .319 wOBA range, but we all know that this year's Pablo is drastically different from last year's. Will he end the year with a .400 wOBA? That I can't say, but we have to interpret the data in context, and it's pretty likely he'll finish as a better-than-league-average hitter (say, .340-.360).
I put together an In-Season Batter Regression Calculator for everyone to use throughout the season (you can find it here). It will likely be most useful early on, before player performance stabilizes, but it works at any point in the season.
Simply type in the player's first and last name (all batters with >=40 plate appearances for whom I have 3-year average data are included), along with a few bits of data (basically the FanGraphs standard dashboard plus FB%). The calculator computes the player's actual wOBA (recomputed with the same coefficients used for the expected figure, for consistency's sake) as well as what we would expect based on his BABIP and HR/FB over the past three seasons.
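For readers who want the formula behind both numbers: wOBA has the standard form below, where the w terms are linear weights. The post doesn't list the specific weights the calculator uses (the comments only note they differ slightly from FanGraphs' values), so they are left symbolic here:

$$
\text{wOBA} = \frac{w_{BB}\,\text{uBB} + w_{HBP}\,\text{HBP} + w_{1B}\,\text{1B} + w_{2B}\,\text{2B} + w_{3B}\,\text{3B} + w_{HR}\,\text{HR}}{\text{AB} + \text{BB} - \text{IBB} + \text{SF} + \text{HBP}}
$$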
The tool takes what a hitter has done to this point and recalculates his performance based on how many non-HR hits he should have (given his 3-year BABIP) and how many home runs he should have (given his 3-year average HR/FB rate). It doesn't necessarily predict where a player will finish the year, but it does give some sense of how far out of line a player's performance is relative to those averages.
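Here is a minimal Python sketch of that recalculation, for readers who want to see the mechanics. It is not the calculator's actual code. Assumptions: the linear weights in WEIGHTS are hypothetical placeholders (the post doesn't publish its coefficients); balls in play use the FanGraphs BABIP denominator (AB - K - HR + SF); the 1B/2B/3B mix is held at this season's proportions; and, as the author describes in the comments, home runs removed by the HR/FB step are not turned back into hits. BattingLine and regress_line are illustrative names.

```python
from dataclasses import dataclass, replace

# HYPOTHETICAL linear weights, for illustration only: the post does not
# publish the coefficients its calculator uses.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


@dataclass(frozen=True)
class BattingLine:
    ab: float
    ubb: float      # unintentional walks
    hbp: float
    sf: float
    k: float
    fb: float       # fly balls
    singles: float
    doubles: float
    triples: float
    hr: float


def woba(line: BattingLine) -> float:
    """Weighted events divided by (AB + uBB + SF + HBP)."""
    num = (WEIGHTS["ubb"] * line.ubb + WEIGHTS["hbp"] * line.hbp
           + WEIGHTS["1b"] * line.singles + WEIGHTS["2b"] * line.doubles
           + WEIGHTS["3b"] * line.triples + WEIGHTS["hr"] * line.hr)
    return num / (line.ab + line.ubb + line.sf + line.hbp)


def regress_line(line: BattingLine, babip3: float, hrfb3: float) -> BattingLine:
    """Swap in 3-year BABIP and HR/FB; AB, walks, HBP, SF, K and FB stay fixed."""
    # Home runs implied by the 3-year HR/FB rate on this season's fly balls.
    exp_hr = hrfb3 * line.fb
    # Balls in play per the BABIP definition, excluding this season's ACTUAL
    # home runs: HRs dropped by the step above are not turned into hits.
    bip = line.ab - line.k - line.hr + line.sf
    exp_non_hr = babip3 * bip
    non_hr = line.singles + line.doubles + line.triples
    if non_hr == 0:
        # No hit mix to copy yet, so count every expected non-HR hit as a single.
        return replace(line, singles=exp_non_hr, doubles=0.0, triples=0.0, hr=exp_hr)
    # Scale 1B/2B/3B together so this season's mix is preserved.
    scale = exp_non_hr / non_hr
    return replace(line, singles=line.singles * scale, doubles=line.doubles * scale,
                   triples=line.triples * scale, hr=exp_hr)


# Hypothetical early-season line (not any real player's stats).
line = BattingLine(ab=80, ubb=5, hbp=1, sf=1, k=12, fb=24,
                   singles=15, doubles=2, triples=0, hr=5)
expected = regress_line(line, babip3=0.325, hrfb3=0.10)
print(f"actual wOBA {woba(line):.3f}, expected wOBA {woba(expected):.3f}")
```

Feeding regress_line a hitter's own current BABIP and HR/FB leaves his wOBA unchanged, which makes a handy sanity check; lowering either rate lowers the expected wOBA. Counts are kept as floats rather than rounded so the totals don't drift away from the rates.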
I'll be doing more with this tool in later posts. For now, let me know what you think and any suggested tweaks.
Comments
If Sandoval's expected wOBA is .272
Then your numbers are wrong.
"Today I flew the most poorly dressed bad-ass that has ever entered my jet. And he borrowed my pen to do a cross word puzzle." - robotsapproach on Brian Wilson.
For reference purposes (from FG)
Sandoval’s career wOBA: .359
Sandoval’s wOBA in his “disaster” 2010: .314
Sandoval’s ZiPS® for 2011: .357
Sandoval’s ZiPS(U) for 2011: .364
It's a straight calculation based on 3YR average BABIP and HR/FB
And it’s based on how many fly balls he’s hit this year so far and the distribution of singles, doubles, and triples to date.
And as I mentioned, it doesn’t mean he should have a .272 wOBA, just that if you apply his 3-year average BABIP and HR/FB to his plate appearances so far it would look like this. But it doesn’t provide any context, which is why he likely isn’t actually a .272 wOBA guy overall.
Columnist at Beyond the Box Score
.272 is nearly .100 points off from ZiPS
and is .042 below the lowest wOBA he’s ever posted. Seems like useless, faulty data in predicting regression. Either that, or use different terminology, because your “wOBA” and the one ZiPS and FG are using are not apples to apples.
It's just a different method for getting a handle on what a guy might look like right now if certain rates are different
And wOBA is calculated the same except that the coefficients are a bit different, but not so much you’d notice.
Columnist at Beyond the Box Score
.100 points of predicted wOBA difference is QUITE noticeable
Given what he's done so far, and given his 3-year averages, this is what his stats might look like
That isn’t to say they’ll hold for the next 5 months.
Columnist at Beyond the Box Score
He’s only played 2 full seasons though. Is your projection assuming that his production will keep plummeting like it did last year?
Nope
Since he only had two seasons his BABIP and HR/FB average are just based on those two years (this is a straightforward average calculated by FanGraphs—no weighting, etc).
That being said, that is why I mentioned context above: his numbers will likely be better, if for no other reason than that last year was likely a fluke compared to his true talent.
It’s also why this isn’t meant to be a fully predictive tool—just offers some context based on what hitters’ peripheral numbers have been historically and what they are to this point this year.
Columnist at Beyond the Box Score
To be more specific
What’s driving it right now is that his 3-year HR/FB ratio is about 10% but so far this year he’s at 21%. That’s a difference of about 3 home runs. His BABIP is 15 points higher, but generally he’s had a high BABIP (.325).
Columnist at Beyond the Box Score
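The "about 3 home runs" figure in the comment above can be roughly checked. Sandoval's five home runs come from the actual line posted later in the thread; the fly-ball count is backed out from the rounded 21% rate, so it is only approximate:

$$
FB \approx \frac{5}{0.21} \approx 24, \qquad \widehat{HR} \approx 0.10 \times 24 \approx 2.4, \qquad 5 - 2.4 \approx 2.6
$$

That rounds to the two expected home runs and the roughly three-homer gap quoted in the thread.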
You're messing up somewhere
Pablo Sandoval should technically be in the .270 wOBA range
That’s just wrong, and should be taken out of the article.
Think you are taking this a bit too much to heart
Just based on the straight calculation, yes, that’s where he should be. Like I said, his HR/FB rate seems to be driving it right now.
Columnist at Beyond the Box Score
Then the straight calculation you are using is not wOBA-related
There is no realistic scenario where a sub-.300 wOBA is “where he should be.” This is not me being a Pablo fan or a Giants homer. This is a stats discussion where I’m wanting to correct a glaring error.
You are using your numbers in comparison to wOBA. If your numbers are not the same wOBA formula, then it’s not apples to apples. If your numbers ARE essentially the same, then you calculated something wrong. It’s simple.
It is the same wOBA formula
What’s different is how many singles, doubles, triples, and home runs he has versus how many he likely would have given his 3-year BABIP and 3-year HR/FB rate at this point in the season.
Columnist at Beyond the Box Score
Just please do me a favor and check your numbers that you used on all of those with him
yes, please, i know it would help make hairball's year
"If we hit that bull's eye, the rest of the dominoes will fall like a house of cards. Checkmate"
How about you calm down a little
And let him write his articles however he pleases with his methods. For reference, how would you like it if someone who didn’t know you walked up to you at work and told you to change something in your work because they thought it was wrong? It would be stupid.
I have to agree with hairball
In that I don’t quite understand how his expected wOBA could be .270 regardless of how you calculate it. His career wOBA is .359. League average is usually around .335. His regressed wOBA should be in the low .340s. However you calculate it, how do you come up with a difference of .07?
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
J-Doug and hairball: Sure thing, happy to explain
This assumes the same distribution of non-HR’s per hit as he has so far this year.
Actual (through Monday night):
H: 22
1B: 15
2B: 2
3B: 0
HR: 5
Estimated based on 3-year BABIP (.325) and HR/FB (10%) (from FanGraphs):
H: 15
1B: 11
2B: 2
3B: 0
HR: 2
The difference in non-HR hits makes sense given his .340 vs. .325 BABIP, as does the 5 vs. 2 HRs.
This is just one way to approach it, though. I am sure there are others and, again, this isn’t necessarily predicting what the end of the year will look like but rather what we might expect the batter’s line to look like if we applied the 3YR BABIP and HR/FB to what the batter has done so far this year.
Columnist at Beyond the Box Score
So, because his HR/FB rate is higher you're not counting 3 of those 5 HRs as expected hits?
It makes sense that you wouldn’t count them as HRs, but it doesn’t make sense you wouldn’t expect them to be hits. If your HR/FB rate goes down it could be due to homers turning into outs, but also doubles.
I can’t see any reason why regressing a .340 BABIP to .325 would take away 7 of 22 hits.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
For simplicity's sake, that's what I did
I suppose one way to handle it is to take half the difference in HRs and treat them as hits, but that strikes me as just as bad or worse.
I did find an error—thanks to you both for pushing me—and with the adjustment Sandoval has an adjusted wOBA of .318.
Columnist at Beyond the Box Score
Glad you're receptive to constructive criticism
Even if I disagree with the methodology
In other words, I think you're discounting hitting events twice
By adjusting for both HR/FB and BABIP. I hope you’re not treating those two stats as independent of one another.
Blogger and Editor, Rational Pastime Blog. Twitter: @RationalPastime.
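J-Doug's point about not treating the two rates as independent can be made concrete. Because BABIP excludes home runs from both its numerator and denominator, any home run taken away by the HR/FB step could instead be counted as a ball in play. The sketch below, a hypothetical helper and not the calculator's code, assumes the FanGraphs BABIP denominator (AB - K - HR + SF) and shows both treatments side by side:

```python
def expected_non_hr_hits(ab: float, k: float, sf: float, actual_hr: float,
                         expected_hr: float, babip3: float,
                         recount_bip: bool = False) -> float:
    """Expected non-HR hits implied by a 3-year BABIP (hypothetical helper).

    BABIP = (H - HR) / (AB - K - HR + SF). With recount_bip=False, this
    season's actual HR total is excluded from balls in play, so home runs
    removed by the HR/FB step simply vanish. With recount_bip=True, only the
    expected HR total is excluded, so the removed home runs go back into play
    and turn into hits at the 3-year BABIP rate.
    """
    hr_excluded = expected_hr if recount_bip else actual_hr
    balls_in_play = ab - k - hr_excluded + sf
    return babip3 * balls_in_play


# Hypothetical line: 80 AB, 12 K, 1 SF, 5 actual HR vs. 2 expected HR.
independent = expected_non_hr_hits(80, 12, 1, 5, 2, 0.325)
recounted = expected_non_hr_hits(80, 12, 1, 5, 2, 0.325, recount_bip=True)
print(f"{independent:.2f} vs {recounted:.2f}")
```

Whatever the at-bat totals, the gap between the two treatments is the 3-year BABIP times the number of home runs removed: with a .325 BABIP and three lost homers, that is roughly one extra expected hit (0.325 x 3, about 0.98).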