Question: What factors have an effect on runs scored at MLB parks? Part 2
Why I asked the question: I was trying to find why Chase Field had such a high Park Factor over the years and the research expanded to all stadiums from that point.
Analysis: I did some research on effects on all MLB and MiLB stadiums (link to spreadsheet). After it was published, someone pointed out that Chase Field has a very high Park Factor and that elevation and temperature could not explain the entire difference. Unable to tell whether the cause was the lower humidity (as some suggested) or the park size, I decided to run a multiple regression of the average park factors over the last 3 years (2006 to 2008) against elevation, average temperature, average humidity, park size (RF, RC, CF, LC, LF), average wall height, errors per game, wind direction, surface type and foul territory area.
Explanation of source of data:
Park Factors – I originally used runs scored per game because the only Park Factors I could find were not carried past the decimal point. After my original article ran, I was flooded with many different sets of data, and I am now using Patriot's numbers from his website (http://gosu02.tripod.com/id103.html).
Elevation (ft) – Collected from the List of Major League Baseball stadiums on Wikipedia. Elevation is a major factor in determining the distance a ball travels and the time the defense has to react to the ball once it is in play.
Temperature (degrees F) – Collected from the Retrosheet database for the last 3 years, except that only 1 year's worth of data is available for Washington. The higher the temperature, the farther a ball will travel.
Relative Humidity (percentage) – Average values from April to September, taken from the websites BBC.com and CityRating.com. Humidity is not supposed to have much of an effect on the distance a ball travels, but small differences may still explain some of the variation.
Dimensions (ft) – Taken from Wikipedia. Only 5 distances were used: LF, LC, CF, RC and RF. I originally used the total area of the ballpark, but I found that when two stadiums had about the same area, the one shaped like #1 below had a higher park factor:
Stadium #1: 360-380-400-380-360
Stadium #2: 380-380-380-380-380
Park Foul Area (ft squared) – Areas were calculated by Mitchel Lichtman. The larger the foul area, the more foul balls will be caught, and therefore the fewer runs scored.
Wind strength and direction (mph) – I used Retrosheet data from the last 3 years (1 year for Washington). The data from Retrosheet comes in the form of 8 different directions. From these directions, I created the following matrix:
| Wind Direction | X component | Y component |
| --- | --- | --- |
| To LF | 0.71 | 0.71 |
| To CF | 0 | 1 |
| To RF | 0.71 | 0.71 |
| LF to RF | 1 | 0 |
| From LF | 0.71 | 0.71 |
| From CF | 0 | 1 |
| From RF | 0.71 | 0.71 |
| RF to LF | 1 | 0 |
I multiplied the X and Y values by the wind speed, added all the wind values up for each component and then divided by the number of games. Y component is a wind blowing out to CF, while the X component is a wind blowing to RF.
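The averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the direction labels, the function name `average_wind_components` and the three sample games are made up, and the signs on the unit vectors are inferred from the axis definition (Y positive toward CF, X positive toward RF), since the matrix as printed shows magnitudes only.

```python
# Unit vectors for the eight wind directions.
# Assumption: +Y points out to CF and +X points toward RF, so the signs
# below are inferred from that convention, not taken from the printed matrix.
DIRECTION_VECTORS = {
    "To LF": (-0.71, 0.71),
    "To CF": (0.0, 1.0),
    "To RF": (0.71, 0.71),
    "LF to RF": (1.0, 0.0),
    "From LF": (0.71, -0.71),
    "From CF": (0.0, -1.0),
    "From RF": (-0.71, -0.71),
    "RF to LF": (-1.0, 0.0),
}


def average_wind_components(games):
    """Return the per-game average (X, Y) wind components.

    games: list of (direction_label, wind_speed_mph) pairs, one per game.
    """
    if not games:
        raise ValueError("need at least one game")
    x_total = sum(DIRECTION_VECTORS[d][0] * speed for d, speed in games)
    y_total = sum(DIRECTION_VECTORS[d][1] * speed for d, speed in games)
    return x_total / len(games), y_total / len(games)


# Three made-up games: 10 mph out to CF, 4 mph in from CF, 6 mph LF to RF.
sample_games = [("To CF", 10), ("From CF", 4), ("LF to RF", 6)]
avg_x, avg_y = average_wind_components(sample_games)
print(avg_x, avg_y)
```

With the sample games, the out-to-CF and in-from-CF winds partly cancel in the Y total, which is the reason for carrying signed components rather than raw speeds.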
Question: When collecting this data, I found there was no wind blowing in from right field, and I thought I had made a mistake somewhere. I searched the games database for that wind direction, and the most recent case was in 2003. Has the wind not blown in from RF in a single game over 5 years? Is there some unspoken rule that the scorers don't mark it this way?
Opponents' Errors per game – I was looking for a way to measure how tough it is to play in a stadium (e.g. fly balls in the Metrodome). The best metric I could come up with is the average number of errors the opposing team commits per game.
Playing Surface – The three stadiums with turf were given a value of 1 and the rest 0. Because it is the new FieldTurf, I wondered if runs scored might go down because batted balls would be slower than on "AstroTurf" and take fewer weird bounces.
Average Wall Height (ft) – Averaged the values from ballparks.com.
Now it is time for a few graphs that show the data collected. *
Note: Data on the Washington Nationals is only from 2008 since they just moved into a new park this last year.
The initial data for each of the major league parks is located in the following spreadsheet, which can be downloaded.
I ran a regression analysis on the data to get an equation that uses the preceding data. The regression equation ended up with an R-squared of 0.714, and the standard deviation of the difference between the initial Park Factor and the final Park Factor was 0.0178. There were two problems with that initial equation:

The variable for wind blowing to CF was negative, meaning that the more the wind blew out, the less scoring there would be. That defies all logic, so I threw both wind components out for the next round of analysis.

The variable for Wall Height was positive, meaning that the higher the wall, the more runs are scored. A higher wall should turn some home runs into doubles, and home runs score more runs than doubles, so I decided to remove Wall Height also.
After rerunning the regression without Wall Height and Wind, I got the following equation, with a standard deviation of 0.0184 and an R-squared of 0.692:
Park Factors = Away Teams Errors per game * (0.016) + % Relative Humidity * (0.0012) + Foul Area * (0.00000061) + Elevation * (0.000021) + Average Temperature * (0.00077) + Left Field * (0.0010) + Left Center Field * (0.00063) + Center Field * (0.0010) + Right Center Field * (0.00020) + Right Field * (0.0011) + 0.0090 (if Surface is turf) + 1.7056
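For readers who want to try this kind of fit themselves, here is a minimal sketch of a multiple regression with an intercept, reporting the same two summary numbers (R-squared and the standard deviation of the differences). It is not the tool or the data used for the equation above: it assumes NumPy, the function name `fit_park_factor_model` is hypothetical, and the 30 "parks" are randomly generated stand-ins with only two predictors.

```python
import numpy as np


def fit_park_factor_model(X, y):
    """Ordinary least squares fit of y on the columns of X plus a constant.

    Returns (coefficients, constant, r_squared, residual_sd).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the fit includes a constant term.
    design = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot
    # Population standard deviation of (actual - projected); ddof is a choice.
    residual_sd = float(residuals.std())
    return beta[:-1], float(beta[-1]), r_squared, residual_sd


# Made-up data for 30 imaginary parks (NOT the real park data).
rng = np.random.default_rng(0)
elevation_ft = rng.uniform(0, 5200, 30)
temperature_f = rng.uniform(60, 85, 30)
park_factor = (0.92 + 0.00002 * elevation_ft + 0.0008 * temperature_f
               + rng.normal(0, 0.015, 30))

coefs, constant, r2, sd = fit_park_factor_model(
    np.column_stack([elevation_ft, temperature_f]), park_factor)
print(coefs, constant, r2, sd)
```

Dropping a variable, as was done with Wind and Wall Height, amounts to removing its column from `X` and fitting again.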
Here is a simple chart of the factors, for easy comparison of how much each one affects the park factor and the run scoring environment.
The effect each factor has on Park Factors and Runs Scored (9.54 runs per game was the average runs scored by both teams over the past 3 years)
| Factor | Change in Park Factor | Change in Runs Scored per game (9.54 runs per game) |
| --- | --- | --- |
| 10 degree F increase | 0.0077 | 0.073 |
| Increase in RH by 10% | 0.0120 | 0.115 |
| 10,000 sq ft increase in foul area | 0.0061 | 0.058 |
| Surface is Turf | 0.0090 | 0.085 |
| 1000 ft increase in elevation | 0.0206 | 0.196 |
| 1 Error for Away Team | 0.0160 | 0.150 |
| 10 ft increase in LF | 0.0100 | 0.095 |
| 10 ft increase in LC | 0.0063 | 0.060 |
| 10 ft increase in CF | 0.0101 | 0.096 |
| 10 ft increase in RC | 0.0020 | 0.019 |
| 10 ft increase in RF | 0.0106 | 0.101 |
As can be seen, each factor can significantly affect the runs scored.
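Each row of the chart is a coefficient from the equation multiplied by a chosen change, then scaled by the 9.54 runs per game environment. A small sketch of that arithmetic follows; the function name `effect_of_change` is hypothetical, and the coefficients are copied as printed in the equation, so the results can differ from the chart in the last digit because of rounding.

```python
LEAGUE_RUNS_PER_GAME = 9.54  # average runs scored by both teams, from the article


def effect_of_change(coefficient, change):
    """Return (change in park factor, change in runs scored per game)."""
    pf_change = coefficient * change
    return pf_change, pf_change * LEAGUE_RUNS_PER_GAME


# Coefficients as printed in the regression equation.
temperature_coef = 0.00077      # per degree F
humidity_coef = 0.0012          # per percentage point of relative humidity
foul_area_coef = 0.00000061     # per square foot of foul area

print(effect_of_change(temperature_coef, 10))      # roughly 0.0077 and 0.073
print(effect_of_change(humidity_coef, 10))         # roughly 0.012 and 0.114
print(effect_of_change(foul_area_coef, 10_000))    # roughly 0.0061 and 0.058
```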
The following table shows the original and final numbers for each of the ballparks. I have also added a column that combines the stadium attributes (Dimensions, Foul Area and Surface Type) with the equation's constant value, to help show which stadium designs lead to more runs.
| Team | Park | Original Value (Park Factor) | Dimen plus constant (Park Factor) | Projected Value (Park Factor) | Difference (Park Factor) | Original Value (runs per game) | Dimen plus constant (runs per game) | Projected Value (runs per game) | Difference (runs per game) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arizona Diamondbacks | Chase Field | 1.0505 | 0.9930 | 1.0642 | 0.0136 | 10.0214 | 9.4725 | 10.1512 | 0.1298 |
| Atlanta Braves | Turner Field | 0.9934 | 0.9862 | 1.0090 | 0.0157 | 9.4758 | 9.4076 | 9.6254 | 0.1495 |
| Baltimore Orioles | Oriole Park at Camden Yards | 0.9946 | 0.9794 | 0.9875 | 0.0070 | 9.4874 | 9.3425 | 9.4202 | 0.0672 |
| Boston Red Sox | Fenway Park | 1.0313 | 1.0256 | 1.0183 | 0.0130 | 9.8382 | 9.7836 | 9.7137 | 0.1245 |
| Chicago Cubs | Wrigley Field | 1.0257 | 1.0059 | 1.0127 | 0.0130 | 9.7841 | 9.5951 | 9.6603 | 0.1238 |
| Chicago White Sox | U.S. Cellular Field | 1.0285 | 0.9912 | 0.9958 | 0.0328 | 9.8115 | 9.4557 | 9.4987 | 0.3128 |
| Cincinnati Reds | Great American Ball Park | 1.0150 | 0.9986 | 1.0096 | 0.0054 | 9.6825 | 9.5258 | 9.6307 | 0.0518 |
| Cleveland Indians | Progressive Field | 0.9819 | 1.0028 | 1.0087 | 0.0268 | 9.3667 | 9.5659 | 9.6227 | 0.2560 |
| Colorado Rockies | Coors Field | 1.1066 | 0.9729 | 1.1056 | 0.0010 | 10.5559 | 9.2808 | 10.5461 | 0.0098 |
| Detroit Tigers | Comerica Park | 0.9838 | 0.9589 | 0.9702 | 0.0136 | 9.3846 | 9.1474 | 9.2551 | 0.1295 |
| Florida Marlins | Dolphin Stadium | 0.9667 | 0.9863 | 0.9859 | 0.0192 | 9.2217 | 9.4083 | 9.4045 | 0.1828 |
| Houston Astros | Minute Maid Park | 1.0006 | 0.9983 | 0.9881 | 0.0125 | 9.5452 | 9.5230 | 9.4258 | 0.1194 |
| Kansas City Royals | Kauffman Stadium | 1.0033 | 0.9874 | 1.0001 | 0.0033 | 9.5710 | 9.4195 | 9.5399 | 0.0312 |
| Los Angeles Angels of Anaheim | Angel Stadium of Anaheim | 0.9774 | 0.9960 | 0.9860 | 0.0086 | 9.3238 | 9.5010 | 9.4059 | 0.0821 |
| Los Angeles Dodgers | Dodger Stadium | 0.9751 | 1.0065 | 1.0030 | 0.0278 | 9.3021 | 9.6016 | 9.5677 | 0.2655 |
| Milwaukee Brewers | Miller Park | 1.0031 | 0.9947 | 0.9963 | 0.0068 | 9.5688 | 9.4882 | 9.5039 | 0.0649 |
| Minnesota Twins | Hubert H. Humphrey Metrodome | 0.9971 | 0.9839 | 0.9981 | 0.0011 | 9.5111 | 9.3856 | 9.5214 | 0.0103 |
| New York Mets | Shea Stadium | 0.9754 | 0.9900 | 0.9896 | 0.0142 | 9.3046 | 9.4437 | 9.4396 | 0.1350 |
| New York Yankees | Yankee Stadium | 0.9879 | 0.9700 | 0.9704 | 0.0176 | 9.4243 | 9.2535 | 9.2568 | 0.1675 |
| Oakland Athletics | Oakland-Alameda County Coliseum | 0.9836 | 0.9982 | 0.9840 | 0.0004 | 9.3831 | 9.5223 | 9.3865 | 0.0034 |
| Philadelphia Phillies | Citizens Bank Park | 1.0273 | 1.0128 | 1.0167 | 0.0105 | 9.7994 | 9.6613 | 9.6989 | 0.1006 |
| Pittsburgh Pirates | PNC Park | 0.9913 | 0.9866 | 1.0026 | 0.0113 | 9.4561 | 9.4113 | 9.5641 | 0.1080 |
| St. Louis Cardinals | Busch Stadium | 0.9816 | 0.9834 | 0.9951 | 0.0135 | 9.3637 | 9.3811 | 9.4929 | 0.1292 |
| San Diego Padres | PETCO Park | 0.9248 | 0.9724 | 0.9539 | 0.0291 | 8.8215 | 9.2756 | 9.0992 | 0.2777 |
| San Francisco Giants | AT&T Park | 0.9977 | 0.9894 | 0.9769 | 0.0208 | 9.5172 | 9.4380 | 9.3189 | 0.1983 |
| Seattle Mariners | Safeco Field | 0.9632 | 0.9962 | 0.9925 | 0.0293 | 9.1884 | 9.5027 | 9.4680 | 0.2797 |
| Tampa Bay Rays | Tropicana Field | 0.9878 | 1.0118 | 1.0063 | 0.0186 | 9.4225 | 9.6519 | 9.5997 | 0.1772 |
| Texas Rangers | Rangers Ballpark in Arlington | 1.0497 | 0.9946 | 1.0096 | 0.0401 | 10.0137 | 9.4877 | 9.6312 | 0.3825 |
| Toronto Blue Jays | Rogers Centre | 1.0220 | 1.0022 | 1.0023 | 0.0197 | 9.7490 | 9.5604 | 9.5615 | 0.1875 |
| Washington Nationals | Nationals Park | 1.0070 | 0.9928 | 0.9949 | 0.0120 | 9.6058 | 9.4707 | 9.4910 | 0.1148 |
The regression equation is able to predict some stadiums' run production quite well. Here is a list of stadiums where the regression predicted the Park Factor within 0.01.
Angel Stadium of Anaheim 0.0086
Hubert H. Humphrey Metrodome 0.0011
Oakland-Alameda County Coliseum 0.0004
Coors Field 0.0010
Kauffman Stadium 0.0033
Great American Ball Park 0.0054
Miller Park 0.0068
Oriole Park at Camden Yards 0.0070
I grouped the parks whose difference exceeded the standard deviation of 0.0184. These are the stadiums where the factors I am using can't explain the runs scored.
Seattle Mariners 0.0293
San Diego Padres 0.0291
Los Angeles Dodgers 0.0278
Cleveland Indians 0.0268
Toronto Blue Jays 0.0197
San Francisco Giants 0.0208
Chicago White Sox 0.0328
Texas Rangers 0.0401
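The two groupings can be reproduced mechanically from the Difference column of the ballpark table. The sketch below does that with the Park Factor differences copied from the table; the function name `group_by_miss` and the dictionary layout are illustrative choices only.

```python
# Park Factor differences (original vs projected), copied from the table.
DIFFERENCES = {
    "Arizona Diamondbacks": 0.0136, "Atlanta Braves": 0.0157,
    "Baltimore Orioles": 0.0070, "Boston Red Sox": 0.0130,
    "Chicago Cubs": 0.0130, "Chicago White Sox": 0.0328,
    "Cincinnati Reds": 0.0054, "Cleveland Indians": 0.0268,
    "Colorado Rockies": 0.0010, "Detroit Tigers": 0.0136,
    "Florida Marlins": 0.0192, "Houston Astros": 0.0125,
    "Kansas City Royals": 0.0033, "Los Angeles Angels of Anaheim": 0.0086,
    "Los Angeles Dodgers": 0.0278, "Milwaukee Brewers": 0.0068,
    "Minnesota Twins": 0.0011, "New York Mets": 0.0142,
    "New York Yankees": 0.0176, "Oakland Athletics": 0.0004,
    "Philadelphia Phillies": 0.0105, "Pittsburgh Pirates": 0.0113,
    "St. Louis Cardinals": 0.0135, "San Diego Padres": 0.0291,
    "San Francisco Giants": 0.0208, "Seattle Mariners": 0.0293,
    "Tampa Bay Rays": 0.0186, "Texas Rangers": 0.0401,
    "Toronto Blue Jays": 0.0197, "Washington Nationals": 0.0120,
}


def group_by_miss(differences, close=0.01, far=0.0184):
    """Split teams into those predicted within `close` and those missed by more than `far`."""
    well_predicted = sorted(t for t, d in differences.items() if d < close)
    poorly_predicted = sorted(t for t, d in differences.items() if d > far)
    return well_predicted, poorly_predicted


well, poor = group_by_miss(DIFFERENCES)
print(len(well), well)
print(len(poor), poor)
```

Run on the full table, the second filter also picks up the Florida Marlins (0.0192) and the Tampa Bay Rays (0.0186), which sit just above the 0.0184 cutoff but do not appear in the list above.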
Using the preceding data, we can analyze future parks. I will pick the Mets' new stadium, Citi Field. Most of the natural effects will be the same and the errors aren't known yet, but we can look at the dimensions and foul area to come to some conclusion.
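| Feature | Shea Stadium | Change in PF | Citi Field | Change in PF | Difference (Citi - Shea) |
| --- | --- | --- | --- | --- | --- |
| LF | 338 | 0.34 | 335 | 0.33 | 0.010 |
| LC | 371 | 0.24 | 379 | 0.24 | 0.000 |
| CF | 410 | 0.41 | 408 | 0.41 | 0.000 |
| RC | 371 | 0.07 | 383 | 0.08 | 0.010 |
| RF | 338 | 0.36 | 330 | 0.35 | 0.010 |
| Foul Area | 25665 | 0.02 | 20900 | 0.01 | 0.003 |
| Total | | | | | 0.013 |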
The new Mets stadium looks to allow fewer runs per game than the previous one. If the 9.54 runs per game environment is used, it would allow 0.12 fewer runs per game, or about 10 fewer runs over the 81 home games.
I have had a lot of help putting this study together, and special thanks go to Mitchel Lichtman and Patriot for providing data and to Sky Kalkman for his many suggestions. I hope the data gives people more insight into the various variables that go into a stadium and how much of an effect each variable has on the run scoring environment.
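The season figure follows from the 0.013 total in the table, the 9.54 runs per game environment and 81 home games. A short sketch of that projection, with a hypothetical helper name `season_run_change`:

```python
def season_run_change(pf_change, runs_per_game=9.54, home_games=81):
    """Return (runs per game change, runs change over a home season)."""
    per_game = pf_change * runs_per_game
    return per_game, per_game * home_games


# 0.013 is the total Park Factor drop from Shea Stadium to Citi Field.
per_game, per_season = season_run_change(0.013)
print(round(per_game, 2), round(per_season, 1))  # about 0.12 per game, about 10 per season
```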