Park Factors are numbers we use to understand, and sometimes also to adjust for, the effects of parks on hitter or pitcher statistics. Much of the time, we tend to be interested in park factors at the level of runs per game: hitters' parks will produce more runs per game than pitchers' parks, given the same players. When we're just looking at player value, runs park factors really are the most important and useful. They allow us to understand a player in the context of his run environment.
If we really want to understand a park, however--or how a hitter will fare in a given park--we need to look at component park factors. They allow us to answer this question: what effect do parks have on individual events like singles, doubles, strikeouts, etc.? This offseason, I was tasked with writing an article on Great American Ball Park for a Reds annual that will be published this spring (by Maple Street Press--watch for it!), and so I decided to take my first stab at component factors. The data that follow are the result. To my knowledge, the only other comparable, updated source for these kinds of park factors is Statcorner. I don't know how they calculate theirs, so they might be better or they might be worse--they do give you splits by LHB and RHB, which is helpful. But I have fun doing things myself now and then. So here they are:
The table is sortable if you click in the header, so you can look at park rankings by each of these park factors.
*BIP = Ball In Play, includes all balls hit into play INCLUDING home runs, or AB-K+SF
*BB = Non-intentional Bases on Balls
Some words on the methods and rationale...
These are calculated as Patriot describes in his post, minus the regression. At its heart, the Runs park factor is basically just:
[Runs Per Game at Home] / [Runs Per Game in League as Whole]
The denominator is estimated primarily by runs per game in away games, though there's an adjustment that includes the home team's park as part of the league as a whole. The raw ratios' deviations from 1.00 are also divided by two, because a team plays only half its games at home; that way you can apply the factors to season data and not just home splits. Again, I highly recommend Patriot's article for discussion of these issues.
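For anyone who wants to see the arithmetic, here is a minimal sketch of those two steps in Python. It is my own paraphrase, not Patriot's code: the function names are made up, the league estimate assumes the home park counts as one park out of all the teams' parks, and I'm assuming the halving step moves the ratio halfway back toward 1.00. The numbers are invented.

```python
def raw_ratio(home_rpg, road_rpg, n_teams):
    """Home rate divided by an estimate of the league-wide rate.

    Assumption: the league rate is (n_teams - 1) parts road rate plus
    1 part home rate, so the home park is counted as one of n_teams parks.
    """
    league_rpg = ((n_teams - 1) * road_rpg + home_rpg) / n_teams
    return home_rpg / league_rpg


def season_factor(ratio):
    """Move the ratio halfway toward 1.0 (only half of a team's games are at home)."""
    return (ratio + 1.0) / 2.0


# Invented example: 10.0 runs per game (both teams) at home, 9.0 on the road, 16 teams.
ratio = raw_ratio(10.0, 9.0, 16)
pf = season_factor(ratio)
print(round(ratio, 3), round(pf, 3))
```

The same two functions work for any rate, not just runs per game, as long as the home and road rates are measured the same way.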
The one place where I extend beyond Patriot's article here is that I'm doing park factors not just for runs, but also for most of the important events that occur in a ballgame: singles, doubles, homers, walks, strikeouts, etc. One thing that you will note is that I don't always do everything "per game" as I do with runs. The park factor for singles, for example, is based on singles per ball in play. Why? If we just did everything per game, we would allow other events to influence our estimate of how a park affects our focal event. Say, for example, that we were looking at home runs in a park that is neutral for home runs, but is otherwise a hitter's park (permissive to singles, doubles, etc). Because it's a hitter's park, there will be more plate appearances than average at home, as outs happen less often per PA. And because of those extra PA's, you will get more home runs in the park--but it's not because of an effect on homers, per se, it's because you get more opportunities to hit one. By looking at home runs per ball hit into play, I'm focusing specifically on the effects of a park on balls that are struck and hit "fair." Ideally, I might use only air balls--and perhaps only air balls hit by a left-handed batter vs. a right-handed batter--but I'm not there yet.
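To make that concrete, here is a small sketch with invented totals for a park that is neutral for home runs but generous with other hits. It compares the raw home/road ratio of homers per game with the ratio of homers per ball in play; the league adjustment and the halving step are left out to keep the point visible, and the helper names are hypothetical.

```python
def bip(ab, k, sf):
    """Balls in play as defined above: AB - K + SF (home runs included)."""
    return ab - k + sf


def rate_ratio(home_events, home_opps, road_events, road_opps):
    """Home rate divided by road rate for one event, per opportunity."""
    return (home_events / home_opps) / (road_events / road_opps)


# Invented totals for one team's games (both teams' batting combined).
home = {"games": 81, "ab": 5700, "k": 1100, "sf": 50, "hr": 186}
road = {"games": 81, "ab": 5500, "k": 1100, "sf": 50, "hr": 178}

hr_per_game = rate_ratio(home["hr"], home["games"], road["hr"], road["games"])
hr_per_bip = rate_ratio(
    home["hr"], bip(home["ab"], home["k"], home["sf"]),
    road["hr"], bip(road["ab"], road["k"], road["sf"]),
)

# Per game the park looks HR-friendly; per ball in play it is neutral.
print(round(hr_per_game, 3), round(hr_per_bip, 3))
```

In this made-up case the extra homers at home come entirely from the extra balls in play, which is exactly the effect the per-BIP denominator is meant to strip out.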
So, if you were going to use these data on a player (e.g. to figure out how Adrian Beltre might hit in TEX vs. BOS), you'd first want to adjust PA's, then balls in play per PA, and then finally home runs per ball in play. Most of the time, it probably won't give you a different answer than home runs per game. But sometimes it will.
There are more complicated ways of calculating park factors, like that used by baseball-reference to calculate runs park factors. Most of the time, I think the approach I'm using works fine. The cases I worry most about are those in the NL West, where you have the most extreme pitcher's park in the same division as two of the most extreme hitters' parks. Thanks to the unbalanced schedule, my guess is that this causes SDP, COL, and ARI to look slightly more extreme than they really are. I don't know how important this is.
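Here is a rough sketch of that chain of adjustments. The function and its arguments are hypothetical, the park factors in the example are invented rather than taken from the table, and it assumes you start from a park-neutral season line.

```python
def park_adjust_hr(neutral_pa, bip_per_pa, hr_per_bip, pf_pa, pf_bip, pf_hr):
    """Chain the three adjustments: PA, then BIP per PA, then HR per BIP.

    pf_pa, pf_bip and pf_hr are season-level park factors for PA per game,
    BIP per PA and HR per BIP (1.0 means a neutral park).
    """
    pa = neutral_pa * pf_pa
    balls_in_play = pa * bip_per_pa * pf_bip
    return balls_in_play * hr_per_bip * pf_hr


# Invented hitter: 600 PA, 75% of PA end in a ball in play, 5% of BIP are homers.
neutral_hr = park_adjust_hr(600, 0.75, 0.05, 1.00, 1.00, 1.00)
adjusted_hr = park_adjust_hr(600, 0.75, 0.05, 1.02, 0.99, 1.10)
print(round(neutral_hr, 1), round(adjusted_hr, 1))
```

Keeping the three factors separate lets you see which step is driving the change: more trips to the plate, more contact, or more of that contact leaving the yard.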
Finally, as I mentioned above, these data are not regressed. This means that you should be much more skeptical of the park factors for Minnesota's Target Field than, for example, those for Cincinnati's Great American Ball Park. Patriot used what seem like pretty arbitrary (though reasonable) values he got from MGL to regress runs park factors. You can apply those coefficients to my numbers above and get values that match his 2010 factors exactly (I've done this)--and feel free to do it, folks. But I decided not to do that to these data. In a future post, my plan is to look at year-to-year correlations (and hopefully intra-class correlation coefficients if I can get access to SAS again, or figure out how to do it in R) for each event. This will hopefully provide more useful data to help us to understand how much to regress each component. I'm sure that some park factors (triples, for example) are more volatile from year to year than others (PA/G, maybe?), and so different coefficients would be needed.
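If you do want to regress them yourself, the general idea is just to shrink each factor toward 1.00. This is a generic sketch, not Patriot's exact formula: the function is hypothetical and the weights in the example are placeholders, not the values he got from MGL.

```python
def regress(pf, weight):
    """Shrink a park factor toward 1.0.

    weight is the share of the observed deviation to keep: 1.0 keeps the
    factor as-is, 0.0 returns a neutral 1.0.
    """
    return 1.0 + weight * (pf - 1.0)


# Placeholder weights: less trust in a park with little data, more in one with a long history.
new_park = regress(1.10, 0.6)
old_park = regress(1.10, 0.9)
print(round(new_park, 3), round(old_park, 3))
```

A volatile component like triples would presumably call for a smaller weight than a stable one, which is what the year-to-year correlations should tell us.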
But for now, that's a wrap! Hope you enjoy them and find them useful.