What factors have an effect on runs scored at MLB Parks?
Question: What factors have an effect on runs scored at MLB Parks?
Why I asked the question: I was trying to find out why Chase Field has had such a high Park Factor over the years, and from there the research expanded to all stadiums.
Analysis: I did some research on park effects at all MLB and MiLB stadiums (link to spreadsheet). After it was published, someone pointed out that elevation and temperature could not explain all of Chase Field's high Park Factor. Unable to tell whether the rest came from the lower humidity (as some suggested) or from park size, I decided to run a multiple regression of the average park factors over the last 3 years (2006 to 2008) against elevation, average temperature, average humidity, park size (RF, RC, CF, LC, LF), average wall height, errors per game, wind direction, surface type and foul territory area.
Note: Sorry for the pink images that are hard or impossible to read. I seem to have hit a limit on HTML characters because of the number of graphs I used. If you want to read them more easily, the same article is on my blog.
Explanation of data sources:
Park Factors – I originally used runs scored per game because the only Park Factors I could find did not go to the decimal point. After my original article ran, I was flooded with many different sets of data, and I am now using Patriot's numbers from his website (http://gosu02.tripod.com/id103.html).
Elevation (ft) – Collected from the List of Major League Baseball stadiums on Wikipedia. Elevation is a major factor in determining the distance a ball travels and the time the defense has to react to the ball once it is in play.
Temperature (degrees F) – Collected from the Retrosheet database for the last 3 years, except for Washington, which has only 1 year of data. The higher the temperature, the farther a ball will travel.
Relative Humidity (percentage) – Average values from April to September from the websites BBC.com and CityRating.com. Humidity is not supposed to have much of an effect on the distance a ball travels, but maybe small differences will explain some of the variation.
Dimensions (ft) – Taken from Wikipedia. Only 5 measurements were used: LF, LC, CF, RC and RF. I originally used the total area of each ballpark, but I found that of two stadiums with about the same area, one shaped like #1 below had a higher park factor than one shaped like #2:
Stadium #1 360-380-400-380-360
Stadium #2 380-380-380-380-380
Park Foul Area (ft squared) – Areas were calculated by Mitchel Lichtman. The larger the foul area, the more foul balls will be caught, and therefore the fewer runs scored.
Wind strength and direction (mph) – I used Retrosheet data from the last 3 years (1 year for Washington). Retrosheet records the wind in the form of 8 different directions. From these directions, I created the following matrix:
I multiplied the X and Y values by the wind speed, added up all the wind values for each component, and then divided by the number of games. The Y component is wind blowing out to CF, while the X component is wind blowing toward RF.
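The averaging described above can be sketched in Python. This is a minimal sketch, not the spreadsheet used for the article: the direction matrix is not reproduced here, so the unit vectors below are hypothetical (+Y out to CF, +X toward RF, diagonal directions split evenly between the two axes), and the direction labels are assumed to follow Retrosheet's wind-direction codes.

```python
import math
from typing import Iterable, Tuple

# Hypothetical direction matrix (the article's own matrix is not shown).
# Each entry is an (x, y) unit vector: +y = blowing out to CF, +x = toward RF.
D = math.sqrt(0.5)
WIND_VECTORS = {
    "tocf": (0.0, 1.0),
    "fromcf": (0.0, -1.0),
    "torf": (D, D),
    "fromlf": (D, -D),
    "tolf": (-D, D),
    "fromrf": (-D, -D),
    "ltor": (1.0, 0.0),
    "rtol": (-1.0, 0.0),
}


def average_wind_components(games: Iterable[Tuple[str, float]]) -> Tuple[float, float]:
    """Return the per-game average (x, y) wind components for one park.

    games: (direction, speed_mph) pairs, one per game.
    Unrecognized directions count as games with no wind contribution
    (an assumption; the article does not say how they were handled).
    """
    total_x = total_y = 0.0
    n_games = 0
    for direction, speed in games:
        x, y = WIND_VECTORS.get(direction, (0.0, 0.0))
        total_x += x * speed  # scale the direction vector by wind speed
        total_y += y * speed
        n_games += 1
    if n_games == 0:
        raise ValueError("no games supplied")
    return total_x / n_games, total_y / n_games


print(average_wind_components([("tocf", 10), ("fromcf", 4), ("ltor", 6)]))
```

For example, three games with winds of tocf 10 mph, fromcf 4 mph and ltor 6 mph average out to (2.0, 2.0).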
Question: When collecting this data, I found no wind blowing in from right field, and I thought I had made a mistake somewhere. I searched the games database for that wind direction, and the most recent case was in 2003. Has the wind really not blown in from RF in a single game over 5 years? Or is there some unspoken rule that the scorers don't mark it this way?
Opponents' errors per game – I was looking for a way to measure how tough it is to play in a stadium (e.g., fly balls in the Metrodome). The best metric I could come up with is the average number of errors the opposing team makes per game.
Playing Surface – The three stadiums with turf were given a value of 1 and the rest 0. Since these use the newer Field Turf, I wondered if runs scored might go down because batted balls would be slower than on "AstroTurf," with fewer weird bounces.
Average Wall Height (ft) – Averaged the values from ballparks.com.
Now for a few graphs that show the data collected.
Note: Data on the Washington Nationals is only from 2008, since they moved into a new park that year.
The initial data for each of the major league parks is in the following tables.
Table 1 – Park Factors and Park Characteristics
Table 2 – Natural Factors and Errors
I ran a multiple regression analysis on the preceding data to get an equation for Park Factor.
The regression equation ended up with an R-squared of 0.714, and the standard deviation of the difference between the actual Park Factor and the Park Factor predicted by the equation was 0.0178.
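For readers who want to reproduce this kind of fit outside a spreadsheet, here is a minimal ordinary-least-squares sketch with NumPy. It is not the exact workflow used here (that was LINEST()); the function name is hypothetical, the residual standard deviation uses the sample (n - 1) convention as an assumption, and the example data are synthetic, not the park data from the tables.

```python
import numpy as np


def fit_park_factor_model(X, pf):
    """Ordinary least squares fit of pf ~ X @ coefs + intercept.

    X  : (n_parks, n_features) array of park characteristics
    pf : (n_parks,) array of observed park factors
    Returns (coefs, intercept, r_squared, residual_sd).
    """
    X = np.asarray(X, dtype=float)
    pf = np.asarray(pf, dtype=float)
    design = np.column_stack([X, np.ones(len(pf))])  # add an intercept column
    beta, *_ = np.linalg.lstsq(design, pf, rcond=None)
    predicted = design @ beta
    residuals = pf - predicted  # actual minus predicted Park Factor
    ss_res = float(residuals @ residuals)
    ss_tot = float(((pf - pf.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot
    residual_sd = float(residuals.std(ddof=1))  # sample SD (assumed convention)
    return beta[:-1], float(beta[-1]), r_squared, residual_sd


# Synthetic example: 30 made-up parks with two characteristics.
rng = np.random.default_rng(0)
temps = rng.uniform(60, 85, 30)        # average temperature (F), synthetic
elevation = rng.uniform(0, 5200, 30)   # elevation (ft), synthetic
X = np.column_stack([temps, elevation])
pf = 0.94 + 0.00077 * temps + 0.000021 * elevation + rng.normal(0, 0.01, 30)

coefs, intercept, r2, sd = fit_park_factor_model(X, pf)
print(coefs, intercept, round(r2, 3), round(sd, 4))
```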
There were two problems with that initial equation:
- The coefficient for wind blowing out to CF was negative, meaning the harder the wind blew out, the less scoring there would be. That just defies all logic, so I threw out both wind components for the next round of analysis.
- The coefficient for Wall Height was positive, meaning the higher the wall, the more runs are scored. A higher wall should turn some home runs into doubles, and home runs score more runs than doubles, so I removed Wall Height as well.
After rerunning the regression without Wall Height and Wind, I got the following equation, with a standard deviation of 0.0184 and an R-squared of 0.692:
Park Factor = Away team errors per game * (0.016) + % Relative Humidity * (-0.0012) + Foul Area * (-0.00000061) + Elevation * (0.000021) + Average Temperature * (0.00077) + Left Field * (-0.0010) + Left Center Field * (-0.00063) + Center Field * (-0.0010) + Right Center Field * (-0.00020) + Right Field * (0.0011) + 0.0090 (if Surface is turf) + 1.7056
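To make the equation easier to apply, here is a small Python function that plugs a park's characteristics into it. The coefficients are copied from the equation above (they are rounded, so results can differ slightly from the published tables); the dictionary keys and the example park are hypothetical.

```python
# Coefficients copied from the final regression equation.
COEFFICIENTS = {
    "away_errors_per_game": 0.016,
    "relative_humidity_pct": -0.0012,
    "foul_area_sqft": -0.00000061,
    "elevation_ft": 0.000021,
    "avg_temp_f": 0.00077,
    "lf_ft": -0.0010,
    "lc_ft": -0.00063,
    "cf_ft": -0.0010,
    "rc_ft": -0.00020,
    "rf_ft": 0.0011,
}
TURF_BONUS = 0.0090
INTERCEPT = 1.7056


def predict_park_factor(park: dict, turf: bool = False) -> float:
    """Evaluate the regression equation for one park."""
    pf = INTERCEPT + (TURF_BONUS if turf else 0.0)
    for name, coef in COEFFICIENTS.items():
        pf += coef * park[name]
    return pf


# Hypothetical park, not one of the MLB parks in the tables.
example_park = {
    "away_errors_per_game": 0.6,
    "relative_humidity_pct": 60,
    "foul_area_sqft": 25_000,
    "elevation_ft": 500,
    "avg_temp_f": 75,
    "lf_ft": 330, "lc_ft": 375, "cf_ft": 405, "rc_ft": 375, "rf_ft": 330,
}
print(f"{predict_park_factor(example_park):.3f}")  # prints 1.013
```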
Here is a simple chart for comparing the factors and how much each one affects the Park Factor and the run-scoring environment.
Table 3 – Effect of each factor on Park Factor and runs scored (9.54 runs per game was the average scored by both teams combined over the past 3 years)
| Factor | Change in Park Factor | Change in runs scored per game (base of 9.54) |
|---|---|---|
| 10 °F increase in temperature | 0.0077 | 0.073 |
| Increase in RH by 10 percentage points | -0.012 | -0.115 |
| 10,000 sq ft increase in foul area | -0.0061 | -0.058 |
| Surface is turf | 0.0090 | 0.085 |
| 1,000 ft increase in elevation | 0.0206 | 0.196 |
| 1 more error per game by the away team | 0.016 | 0.150 |
| 10 ft increase in LF | -0.0100 | -0.095 |
| 10 ft increase in LC | -0.0063 | -0.060 |
| 10 ft increase in CF | -0.0101 | -0.096 |
| 10 ft increase in RC | -0.0020 | -0.019 |
| 10 ft increase in RF | 0.0106 | 0.101 |
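Each row of Table 3 is the coefficient times the size of the change, and the runs column multiplies that by the 9.54-run baseline. A short sketch of that arithmetic (row labels are shortened, and the coefficients come from the rounded equation, so a row like elevation prints 0.0210 rather than the table's 0.0206):

```python
BASELINE_RUNS_PER_GAME = 9.54  # both teams combined, 2006-2008


def effect_of_change(coefficient, step, baseline=BASELINE_RUNS_PER_GAME):
    """Return (change in Park Factor, change in runs per game) for a step change."""
    pf_change = coefficient * step
    return pf_change, pf_change * baseline


ROWS = [
    ("10 F warmer", 0.00077, 10),
    ("RH up 10 points", -0.0012, 10),
    ("10,000 sq ft more foul area", -0.00000061, 10_000),
    ("1,000 ft higher elevation", 0.000021, 1_000),
    ("1 more away error per game", 0.016, 1),
    ("LF 10 ft deeper", -0.0010, 10),
]

for label, coef, step in ROWS:
    pf_change, runs_change = effect_of_change(coef, step)
    print(f"{label:30s} {pf_change:+.4f} {runs_change:+.3f}")
```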
As can be seen, each factor can significantly affect runs scored. The following table lists the original and predicted Park Factors for each ballpark. I have also added a column that combines the stadium attributes (dimensions, foul area and surface type) with the equation's constant to help show which stadium designs lead to more runs.
The regression equation predicts some stadiums' run production quite well. Here is a table of the parks where the regression predicted the Park Factor within 0.01.
I grouped the parks whose difference exceeded the standard deviation of 0.0184. These are the stadiums where the factors I am using can't explain the runs scored.
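The grouping can be expressed as a simple filter on the difference between actual and predicted Park Factor. This is a sketch with hypothetical park names and values; a positive gap means the park produced more runs than the model predicts.

```python
def classify_parks(observed, predicted, close=0.01, sd=0.0184):
    """Split parks by how far the actual PF is from the model's prediction.

    Returns (well_predicted, unexplained), where unexplained holds
    (park, actual - predicted) pairs whose gap exceeds one standard deviation.
    Parks in between fall into neither group.
    """
    well_predicted, unexplained = [], []
    for park, pf in observed.items():
        gap = pf - predicted[park]
        if abs(gap) <= close:
            well_predicted.append(park)
        elif abs(gap) > sd:
            unexplained.append((park, gap))
    return well_predicted, unexplained


# Hypothetical values for illustration only.
observed = {"Park A": 1.050, "Park B": 0.970, "Park C": 1.000}
predicted = {"Park A": 1.045, "Park B": 1.000, "Park C": 0.985}
print(classify_parks(observed, predicted))
```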
Using the preceding data, we can analyze future parks. I will pick the Mets' new stadium, Citi Field. Most of the natural effects will be the same and the errors aren't known yet, but we can look at the dimensions and foul area to come to some conclusion.
| Feature | Shea Stadium | Change in PF | Citi Field | Change in PF | Difference (Citi - Shea) |
|---|---|---|---|---|---|
| LF (ft) | 338 | -0.34 | 335 | -0.33 | 0.01 |
| LC (ft) | 371 | -0.24 | 379 | -0.24 | 0 |
| CF (ft) | 410 | -0.41 | 408 | -0.41 | 0 |
| RC (ft) | 371 | -0.07 | 383 | -0.08 | -0.01 |
| RF (ft) | 338 | 0.36 | 330 | 0.35 | -0.01 |
| Foul Area (sq ft) | 25665 | -0.0156 | 20900 | -0.0127 | 0.003 |
| Total | | | | | -0.008 |
Note: the total is computed from the per-10-ft values in Table 3; the rounded differences shown above sum to -0.007.
The new Mets stadium looks to allow fewer runs per game than the previous one. In the 9.54 runs per game environment, it would allow about 0.08 fewer runs per game, or about 6 fewer runs over the 81 home games.
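The Shea-to-Citi comparison can be checked by summing per-unit effects. This sketch assumes per-foot effects equal to the Table 3 per-10-ft changes divided by 10, plus the foul-area coefficient from the equation; because Citi Field has less foul territory, that term enters with a positive sign.

```python
# Per-unit change in Park Factor: Table 3's per-10-ft values / 10,
# plus the foul-area coefficient per square foot.
PER_UNIT_EFFECT = {
    "LF": -0.00100,
    "LC": -0.00063,
    "CF": -0.00101,
    "RC": -0.00020,
    "RF": 0.00106,
    "Foul": -0.00000061,
}
SHEA = {"LF": 338, "LC": 371, "CF": 410, "RC": 371, "RF": 338, "Foul": 25665}
CITI = {"LF": 335, "LC": 379, "CF": 408, "RC": 383, "RF": 330, "Foul": 20900}
RUNS_PER_GAME = 9.54
HOME_GAMES = 81


def park_factor_change(new, old):
    """Sum of effect * (new - old) over each dimension and the foul area."""
    return sum(coef * (new[key] - old[key]) for key, coef in PER_UNIT_EFFECT.items())


delta_pf = park_factor_change(CITI, SHEA)
runs_per_game = delta_pf * RUNS_PER_GAME
print(f"PF change {delta_pf:+.4f}, runs/game {runs_per_game:+.3f}, "
      f"season {runs_per_game * HOME_GAMES:+.1f}")
```

This comes out to roughly -0.008 in Park Factor, or about 6 fewer runs over a season of home games.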
I have had a lot of help putting this study together. Special thanks to Mitchel Lichtman and Patriot for providing data, and to Sky Kalkman for his many suggestions. I hope the data gives people more insight into the various variables that go into a stadium and how much of an effect each one has on the run-scoring environment.
Extra information for those who want to do their own regression analysis.
You can run your own regression using LINEST() in OpenOffice with the data I have collected. The spreadsheet is available for download, and if you insert your own park factors into it, it will calculate the values for you.
Note: LINEST() returns the coefficients in the opposite order from the columns in the table (with the constant last), and the 3rd value down in the first column is the R-squared value.
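If you copy LINEST()'s output into another tool, the reversed ordering is easy to trip over. This small Python helper (hypothetical, for illustration) maps the first output row back to the column names; the R-squared in the third row is not handled here.

```python
def label_linest_row(first_row, feature_names):
    """Pair LINEST()'s first output row with the feature names.

    LINEST returns the slopes in reverse column order, followed by the
    constant: [m_n, ..., m_2, m_1, b].
    """
    *slopes, constant = first_row
    if len(slopes) != len(feature_names):
        raise ValueError("expected one slope per feature column")
    return dict(zip(feature_names, reversed(slopes))), constant


# Hypothetical: three feature columns, listed in table order.
names = ["Temperature", "Elevation", "Humidity"]
row = [-0.0012, 0.000021, 0.00077, 1.7056]  # as LINEST would lay it out
coefs, constant = label_linest_row(row, names)
print(coefs, constant)
```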
Instructions:
- Open the spreadsheet in OpenOffice Calc. I use OpenOffice because it's free for everyone and creates the variables simply.
- Insert your values for the various teams into Tables 1, 2, 3 and 4 under the Park Factor columns. The LINEST() cells will automatically update using the numbers.
- Copy all the LINEST() values and paste them into the area after Table 2. The upper left-hand corner of the copied data should be pasted on the cell that has a border. See the following image.
- All the values in Table 2 will then update automatically.
- Do the same with Tables 3 and 4, which don't contain the Wall Height and Wind factors.
Comments
Very interesting
One thing to consider might be the dimensions at certain points on the field (e.g., down the lines, power alleys, dead center). Maybe the time of day the game is played? Is it easier to play in the daytime or at night?
Another thing that is probably very hard to impossible to measure is the effect of the “batter’s eye.”
---
Juuuust a bit outside!!
http://www.rightfieldbleachers.com
by Jack Moore on Dec 17, 2008 2:24 PM EST
Good ideas
especially the dimensions. I will take out total area and see if they give a better regression line.
by Jeff Zimmerman (TucsonRoyal) on Dec 17, 2008 2:59 PM EST
Awesome stuff. Still digesting.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Dec 17, 2008 5:48 PM EST
One quick comment.
You NEED to use park factors. You want to test only the parks, not the parks plus home-team offense + home-team pitching.
by Sky Kalkman on Dec 17, 2008 8:05 PM EST
I agree ....
… but I can only find them rounded to whole numbers, not to the decimal point. I also wanted to express each effect in runs, so that if it is 30 degrees cooler in the World Series, the amount of scoring will go down ~1.5 runs per game.
by Jeff Zimmerman (TucsonRoyal) on Dec 17, 2008 9:11 PM EST
Interesting to think about...
Maybe this is why supposedly “Pitching wins in the playoffs”? Maybe because it is colder so it is harder to score. Does that give an advantage to teams with good pitching? It would seem so.
by Brendan Scolari on Dec 18, 2008 9:01 AM EST
The sign on Fair Area seems wrong
The more territory to be covered (i.e., the more territory that will not be covered by the same nine people) the fewer runs scored?
Waiting to see what dimensions produces.
by klhoughton on Dec 17, 2008 10:45 PM EST
More fair area = farther fences = fewer HRs
Fewer runs scored at Dodger Stadium than at Minute Maid Park
by Jeff Zimmerman (TucsonRoyal) on Dec 17, 2008 11:06 PM EST
Summary
Parks that produced more runs than the model predicts:
Reds
Rangers
Phillies
White Sox
Tigers
The Reds, Rangers, Tigers, and to a lesser extent the Phillies had much better offenses than defenses, increasing run scoring at home for reasons unrelated to park effects. The White Sox, however, had an OK offense and good pitching. Also, the Tigers moved their fences in the past few years; is the data up to date?
Parks that produced fewer runs than the model predicts:
Padres
Diamondbacks
Dodgers
The DBs had the best pitching in the majors, and a poor offense, which would explain them on this list. But I don’t think that’s the case for the Padres and Dodgers. It’s interesting that Dodgers appear here, because their stadium has historically played like a pitchers’ park, but hasn’t over the past few years. Could there have been a change made recently to bump up scoring that’s not reflected in TR’s study?
by Sky Kalkman on Dec 18, 2008 11:37 AM EST
Comment I got from an anonymous source:
“Factoring in the wind might be worth looking into. I work in Philly and I know from being there 70+ games a season that the wind plays tricks with the ball in that park. I’m convinced the reason it does it is because the open concourse on the lower level creates some kind of jet stream effect, especially along the foul lines out toward the opposite field (1B line/LF, 3B line/RF). I think it would also explain the problems that visiting CFs seem to have with fielding fly balls to the little triangle in deep left center. The wind seems to swirl around in that one little area all the time.”
I finally found a makeshift way of determining wind direction. It will still take a while to incorporate, but I should have something by the end of the weekend.
I added the opposing team's errors (toughness of park), and the R-squared jumped to .68 with an S.D. of 0.38.
I have found park factors for wOBA and get numbers similar to this final one using runs scored.
by Jeff Zimmerman (TucsonRoyal) on Dec 18, 2008 11:56 AM EST
I have runs park factors to four decimal places.
Send me an email and I’ll forward them to you: skyking162@gmail.com
Thanks to Patriot.
by Sky Kalkman on Dec 18, 2008 12:11 PM EST
Tucson's source:
Shane Victorino.
by cherub_daemon on Dec 19, 2008 12:30 PM EST
Another idea...
How about using the factors you already have to predict not just overall runs park factors, but piecemeal park factors, like HR factors, BABIP factors, (2B/3B)/BIP factors, etc.
by Sky Kalkman on Dec 18, 2008 5:28 PM EST
This seems interesting, and might explain some of the large deviation teams.
As limiting examples, consider first Team A, whose hitters do only 2 things: walk and strike out. They’re not going to be affected at all by the park. However, Team B’s hitters all hit the ball to exactly the same spot, 385 feet away, every pitch. They’re totally at the mercy of the park.
If you can predict lots of offensive components, but not offense, then it could be that the deviation from your model is simply a result of the home team’s offensive idiosyncrasies.
by cherub_daemon on Dec 19, 2008 12:41 PM EST
I'm a little surprised that an increase in RH decreases offense
Doesn’t that make the air less dense?
I would also think humidity would wear on pitchers more than hitters.
You might want to use area^0.5, since this would represent an effective distance to the home run fences. Similarly, if you use wind velocity, you might want to use velocity^2, since this has some physical significance in many drag equations.
by Edgar for Pres on Dec 25, 2008 12:36 PM EST
Good points
Also temp and rh may be correlated. You may want to look into that
Children, until we have taught them better, will be perfectly happy with a seasonal round of games in which conkers succeeds hopscotch.
by salb918 on Dec 25, 2008 6:30 PM EST
Possible explanation.
The only two parks with below 50% RH are Arizona and Colorado, two of the best hitters’ parks in the majors. So the regression might be using RH as a proxy for getting those two parks to be sufficiently helpful to hitters, especially since every other park is in a close range of about 55% to 65%.
The same effect might explain why wall height (TR’s done a revised study yet to be published as it’s awaiting more statistical scrutiny) correlates with MORE run scoring. Parks with a REALLY high fence (like the Green Monster) tend to have that fence extremely close to home plate. So wall height becomes a proxy for an outlier in wall distance. Maybe.
by Sky Kalkman on Dec 25, 2008 7:02 PM EST
Answer to some questions
RH – From what I have read, RH should have no effect on the distance the ball travels. What I have seen quoted a couple of times is that a drier ball comes off the bat faster, so the players have less time to react. I have not been able to find an actual study or article to reference for this.
Park Size – I am now using the 5 OF park measures (LF, LC, CF, RC, RF). It seems that parks that are short in the corners and deep in CF score more runs (e.g., a park at 360, 400, 440, 400, 360 scores more than one where all distances are 392, even though both average 392).
Velocity *2 — will look into it, but wind is one of the least significant factors.
RH vs Temp - I ran a correlation and they have an r-squared of .0713. Essentially no correlation.
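The r-squared between two single variables like RH and temperature is just the squared Pearson correlation. A minimal NumPy sketch with made-up values (not the park data):

```python
import numpy as np


def r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


avg_temp = [72, 75, 80, 68, 77, 83]  # hypothetical average temperatures (F)
avg_rh = [60, 58, 35, 65, 62, 40]    # hypothetical average RH (%)
print(round(r_squared(avg_temp, avg_rh), 3))
```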
Wall height and wind - Both come out backwards: in the model, a higher wall increases scoring, and wind blowing out to CF decreases scoring. Again, wind is not a very significant factor.
I am still looking for any other significant, measurable factor I can use. Currently I am at an R-squared of .727, which is pretty good, but I am looking to get it better.
by Jeff Zimmerman (TucsonRoyal) on Dec 25, 2008 11:38 PM EST
a little bit of physics (and cricket)
So – more humid (and also, to an extent, colder) atmospheres are denser atmospheres. This creates more drag on a ball (the ball has to ‘work harder’ to push through the fluid (nb gases are fluids)). now, you’d think that this will make a ball slow down more quickly, and you’d be right.
however, there’s another factor, too – the cut, or curve, or whatever on a ball will be increased in a denser atmosphere. so you’d think that pitches (at least initially, before batters get a chance to adapt) would be harder to hit. or maybe harder to control for the pitcher? i don’t know how to throw a sinker ball, but i’d imagine there would be more ‘bite’ on these, too
this is something that we see in cricket, too – when the clouds are low and there’s some humidity in the air, the ball swings around like no-one’s business.
as for the velocity*2 factor – it should be velocity^2 (squared), at least for the first iteration.
I have no solutions, just rejoinders
by alea iacta est on Dec 30, 2008 6:05 PM EST
Agree
I tend to think like this. The higher the humidity, the greater the chance that the park is close to the ocean. The closer to the ocean, the lower the elevation. The lower the elevation, the denser the air.
"Evolution happened, now get over it." Michael Shermer
by rodcarew on Jan 1, 2009 1:24 AM EST
Humidity makes the air lighter ...
… therefore the ball should travel farther, but the humidity seeps into the ball, making it heavier, so it travels less.
According to the book The Physics of Baseball:
"The humidity, per se, has little effect on the ball’s flight. Indeed, since water vapor is lighter than air, if all factors are the same, a ball will travel slightly farther when the humidity is high. The humidity, however, affects the weight and elasticity of balls in storage. Balls stored under conditions of high humidity will gain some weight through the absorption of water from the air and their elasticity will be reduced."
by Jeff Zimmerman (TucsonRoyal) on Jan 1, 2009 10:19 AM EST
Park Factor Spreadsheets
I am guessing you are the same guy who tested the wOBA park factors that I posted a link to on Tangotiger’s blog. I have since scrapped the idea of a specific wOBA park factor when I figured out how to park adjust wOBA so that it was equivalent to park adjusting Linear Weights. Anyway, I just updated my Park Factor spreadsheet. It has unrounded PFs for all teams from 1871-2008. To get the unrounded values, you need to export it as an XLS. The spreadsheet also has all the PF data that I used in the calculations. Warning: these are run-based PFs. Brian Cartwright has a spreadsheet floating around that has component Park Factors (1B, 2B, 3B, HR, BB, etc.) for all parks from 1954-2007. His spreadsheet is the second link. MGL also posted component park factors last year. His spreadsheet is the third link.
http://spreadsheets.google.com/ccc?key=pzy9IhjJPqasyNfGRqHZrUQ&hl=en
by terpsfan101 on Dec 26, 2008 5:51 AM EST
Yes, I am the same one ...
… I usually go by the name Jeff over there. I am about ready to just post the spreadsheet myself so everyone else can test whatever they want. I have it correlating to P.F. pretty well, but I am working on getting the numbers back to runs so they are in terms most people can understand. wOBA is all good, but when people have to read 10 pages of a book to understand it, I would rather keep things simple if I can.
by Jeff Zimmerman (TucsonRoyal) on Dec 26, 2008 11:45 AM EST
