Some odd results from a multivariate regression

After I wrote this post on my blog regarding FIP and its correlation to UZR, I got the idea to run a multivariate regression using each position independently (excluding C and P, since there's no UZR data for them). I got some interesting results, but I'm not really sure whether (A) I can trust the data or (B) I'm interpreting it correctly, so I thought I'd post it here.

Dependent Variable: TotalRunDiff (or TRD) = (IP/9) * (FIP - ERA)

This is the difference between the earned runs projected by FIP and the actual earned runs allowed.
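As a quick illustration of the definition above, here is a minimal sketch of the TRD calculation. The function name `total_run_diff` and the team line are made up for the example (not 2008 data), and it assumes IP is given in true decimal innings rather than the box-score ".1/.2" notation.

```python
def total_run_diff(ip: float, fip: float, era: float) -> float:
    """TRD = (IP/9) * (FIP - ERA): FIP-projected earned runs minus actual earned runs."""
    return (ip / 9.0) * (fip - era)

# Hypothetical team line (made-up numbers, not 2008 data)
trd = total_run_diff(ip=1440.0, fip=4.20, era=4.00)
print(round(trd, 1))
```

A positive TRD means the team allowed fewer earned runs than FIP projected, which is the part the regression tries to attribute to fielding.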

Independent variables: 1Buzr, 2Buzr, 3Buzr, SSuzr, LFuzr, CFuzr, RFuzr

The UZR for each team by position.

I input data for all 30 teams in 2008. Here's the equation the regression analysis spit out:

TRD = .048 + 2.12*1Buzr + (-.10)*2Buzr + 1.60*3Buzr + .70*SSuzr + .02*LFuzr + 1.66*CFuzr + .60*RFuzr

The correlation was pretty strong; r = .8063.
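For anyone who wants to try the same kind of fit, here is a rough sketch of an ordinary least squares regression with seven predictors and 30 rows, using only the Python standard library. The data is randomly generated, not the 2008 FanGraphs numbers, the helper names `solve` and `ols` are just for this example, and it assumes the reported r is the multiple correlation (the square root of R^2).

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in for 30 teams x 7 positions (NOT real UZR data)
random.seed(0)
positions = ["1Buzr", "2Buzr", "3Buzr", "SSuzr", "LFuzr", "CFuzr", "RFuzr"]
uzr = [[random.gauss(0, 10) for _ in positions] for _ in range(30)]
trd = [sum(row) + random.gauss(0, 15) for row in uzr]  # made-up relationship

beta, r2 = ols(uzr, trd)
print(dict(zip(["const"] + positions, (round(b, 2) for b in beta))))
print("r =", round(r2 ** 0.5, 4))
```

Changing the seed and rerunning shows how much the individual coefficients can move around when there are only 30 rows to fit.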

This seems to imply that the most important positions, in order, are 1B, CF, 3B, SS, RF, LF, and 2B, with good defense at 2B actually having a slightly negative effect on a team (which doesn't make any sense, but this is why I will run more regressions on other seasons besides 2008).

Just wondering if anybody had any input on this.

 

Comments

The second base data could have been thrown off...

If teams with good 2B UZR in 2008 all happened to have bad TRD, it could skew your results. Of course, this would mean that most of the bad TRD teams had good 2B UZR, and that good TRD teams had bad 2B UZR. I don’t think it accurately reflects 2B’s impact on TRD.

A larger sample size could fix this problem.

by NoNameOnCard on Mar 4, 2009 2:54 PM EST

Where did you get your positional team data?

The FanGraphs leaderboards don’t divide up production between teams for a given player. Did you go team by team?

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Mar 4, 2009 2:59 PM EST

I went to fangraphs => teams => fielders => position

I guess that might mess things up a little bit, but I don’t think it would be that large of an issue, would it?

---
Juuuust a bit outside!!
http://www.rightfieldbleachers.com

by Jack Moore on Mar 4, 2009 3:03 PM EST

So you downloaded seven sets of data?

Yeah, that should work just fine.

Multiple seasons would be good, obviously.

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Mar 4, 2009 3:27 PM EST

One thing to look at is the standard deviation of each dataset (2B UZR, etc.)

For example, if the 2B numbers are all near 0, it could mean that all teams are getting about the same play from their second basemen, so its value doesn’t really matter.

I am wondering how well the positional S.D. correlates with the positional multiplier in your equation.

by Jeff Zimmerman on Mar 4, 2009 3:34 PM EST

SDs by position:

1B   | 2B   | 3B  | SS   | LF    | CF    | RF
5.68 | 8.48 | 9.8 | 9.83 | 11.22 | 10.61 | 15.63
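To check how the positional SDs line up with the multipliers, here is a small sketch that takes the seven SDs above and the seven coefficients from the posted equation, computes their Pearson correlation, and also scales each coefficient by its SD (runs of TRD per one-SD swing in UZR). The helper `pearson` and the `per_sd` scaling are my own additions for illustration, not something from the original regression output.

```python
from math import sqrt

positions = ["1B", "2B", "3B", "SS", "LF", "CF", "RF"]
coef = [2.12, -0.10, 1.60, 0.70, 0.02, 1.66, 0.60]   # from the posted equation
sd = [5.68, 8.48, 9.8, 9.83, 11.22, 10.61, 15.63]    # from the SD table above

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson(sd, coef)
per_sd = {p: c * s for p, c, s in zip(positions, coef, sd)}

print("corr(SD, coefficient) =", round(r, 2))
for p in sorted(per_sd, key=per_sd.get, reverse=True):
    print(p, round(per_sd[p], 1))
```

With only seven points the correlation comes out weakly negative (around -0.4), so the SDs don't obviously explain the multipliers. Scaling by SD does reshuffle the top of the order to CF, 3B, 1B, but 2B still comes out last.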

---
Juuuust a bit outside!!
http://www.rightfieldbleachers.com

by Jack Moore on Mar 4, 2009 6:03 PM EST

Just nothing there

I was also thinking the left side might be more important since most people are right-handed, but that only holds for the infield, not the outfield.

I also looked at chances and that doesn’t help explain 2nd
“first and third baseman get around 1.5 chances per game, CF, 2B and SS, 2.5, and RF and LF, 2.0.” -MGL

If you remove 2nd base from the regression, what happens to r-squared?

by Jeff Zimmerman on Mar 4, 2009 6:41 PM EST

Sometimes multiple regression is simply wrong.

It’s a very crude tool.

Some suggestions, however:

  • Use multiple years of data.
  • Look at all runs, not just earned runs.
  • Consider removing the constant.

by cwyers on Mar 4, 2009 8:43 PM EST

p-values

You might consider double-checking the p-values of each individual term to see if any (e.g. 2B) could be considered insignificant contributors to the dependent variable. Just a thought…
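One way to do that check without a stats package is to compute a t-statistic for each coefficient and compare it to the two-sided 5% critical value of Student's t (about 2.074 for 22 degrees of freedom, which is what 30 teams minus 7 slopes minus the constant leaves). The sketch below does this on made-up data where only two of the seven columns really matter. The helpers `invert` and `t_stats` and the data itself are assumptions for illustration, not the original regression.

```python
import random

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def t_stats(rows, y):
    """OLS with intercept; return (coefficients, t-statistics, residual df)."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, x)) for x, yi in zip(X, y)]
    df = n - k
    sigma2 = sum(e * e for e in resid) / df          # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]  # standard errors
    return beta, [b / s for b, s in zip(beta, se)], df

# Synthetic data (NOT real UZR): only 1B and CF truly affect TRD here.
random.seed(1)
names = ["1Buzr", "2Buzr", "3Buzr", "SSuzr", "LFuzr", "CFuzr", "RFuzr"]
uzr = [[random.gauss(0, 10) for _ in names] for _ in range(30)]
trd = [2.0 * r[0] + 1.5 * r[5] + random.gauss(0, 10) for r in uzr]

beta, t, df = t_stats(uzr, trd)
T_CRIT = 2.074  # two-sided 5% critical value of Student's t with 22 df
for name, b, ti in zip(["const"] + names, beta, t):
    flag = "significant" if abs(ti) > T_CRIT else "not significant"
    print(f"{name:6s} coef={b:6.2f} t={ti:6.2f} {flag}")
```

A term whose |t| falls below the critical value is one whose coefficient can't be distinguished from zero at that level, which is the kind of result you might expect for 2B here.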

by jrfischer on Mar 5, 2009 11:53 AM EST

wait

did you run a regression on 7 independent variables using 30 observations?

by Matt Swartz on Mar 5, 2009 9:00 PM EST

hello?

just to clarify, running a regression with seven independent variables and for only thirty observations is useless. if that’s what you did, it’s not even worth analyzing this. you might as well just summarize the individual players. for seven regressors, you should have 150-250 observations to be safe, i’d say. nothing much short of that.
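To put rough numbers on the sample-size concern, here is a short sketch using only figures from this thread: 30 observations, 7 regressors, and the reported r = .8063, which is assumed here to be the multiple R. Adjusted R^2 is the standard correction for the number of regressors; the variable names are just for the example.

```python
n_obs = 30          # one season of team data, as in the post
n_regressors = 7    # the seven positional UZR columns
r = 0.8063          # reported correlation, assumed to be the multiple R

r2 = r ** 2
# Adjusted R^2 penalizes the fit for the number of regressors used
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

print("obs per regressor:", round(n_obs / n_regressors, 1))
print("R^2:", round(r2, 3), "adjusted R^2:", round(adj_r2, 3))

# Seasons of 30 teams needed to reach the suggested 150-250 observations
for target in (150, 250):
    print(target, "observations ->", round(target / 30, 1), "seasons")
```

Under that assumption, R^2 of about .65 drops to roughly .54 once it is adjusted for seven regressors, and there are only about four observations per regressor.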

by Matt Swartz on Mar 7, 2009 10:13 AM EST

So, 5-8 seasons' worth?

UZR’s available for seven at Fangraphs, right?

Beyond the Boxscore // Calling BJ Upton lazy is lazy.

by Sky Kalkman on Mar 7, 2009 10:32 AM EST

OK, when I get a chance I’ll add the other seasons. Might not be for a bit as I have a packed week coming up.

---
Juuuust a bit outside!!
http://www.rightfieldbleachers.com

by Jack Moore on Mar 7, 2009 1:31 PM EST

datum

It looks to me more like a measure of the variability in quality of the defender between teams at that position – rather than importance of the position.

1B can be slow-footed non-athletes, or extremely athletic. >>> 2B are remarkably similar, athletic, good glove – average number of plays handled.

Go away! Guys, you're gonna wake up my Mom!

by David Howards Legacy on Mar 6, 2009 4:15 PM EST

Actually

Looking at the spread of UZR talent by position, OF is by far the most varying.

vivaelbeñsheets

by vivaelpujols on Mar 9, 2009 10:27 PM EDT
