When people talk about how different ballparks play, there are certain terms and concepts that are regularly and almost exclusively used. Coors Field is a Hitter’s Park. So is Chase Field. Petco is a Pitcher’s Park. On rare occasion, it might get a little more specific (e.g., AT&T has Triples Alley), but in general the conversation doesn’t get much deeper. Just because it’s not done, however, doesn’t necessarily mean it can’t be done. Is it possible to use data, specifically park factors, to sort major league ballparks into better-defined groups than just good-for-hitters and good-for-pitchers?
I set out to accomplish this through the statistical concept of clustering. Now, this seems like an ideal spot to mention that I am NOT a statistician, and so there’s a decent chance that some or all of what follows is a complete misuse of good techniques. That said, I think everything was used correctly, and hopefully my editors here will kill this piece off entirely if it makes no sense.
Additionally, I’ll mention that everything done herein was done either in MySQL (for data storage/retrieval), Excel (for easily assembling the needed data), or R (for actual data analysis). MySQL and R are both free programs, and the Excel stuff would have been just as easy with any of its free competitors.
Still reading? Good - here we go.
The first step in the process was deciding on a set of park factors to thoroughly characterize the different aspects of how MLB parks play. I wanted to capture run environment, power hitting, the likelihood of batted ball types (grounder, popup, etc.), true outcomes, and in-play hitting. This was complicated in part by the fact that publicly available information on batted ball types is… not ideal. Although Retrosheet has complete batted-ball-type records for all seasons since 2003, it relies on stringers to classify them, leading to inconsistencies in what counts as a pop-up, line drive, fly ball, etc. The records are perfectly reliable for on-ground versus in-air, though, so that’s the split I used.
I ended up picking nine park factors, split by handedness (so, eighteen factors really): linear weights runs per game (LWTS Runs/G), LWTS Runs/G on ground balls, LWTS Runs/G on balls in air, ground ball to ball-in-air ratio, home runs per PA, strikeouts per PA, walks per PA, ISO, and BABIP. In each case, I calculated the park factors myself, using the procedure I’ve written about before at this very site and taking a five-year park average. The linear weights I used were also calculated myself, empirically and individually for each season (see the wiki at Tango's site for details). Based on advice from BtBS’s own Neil Weinberg, I wanted to include a baserunning measure, but couldn’t find one that seemed both worthwhile and calculable from the data I have access to. Since these measures rely on batted ball types, which are only complete from 2003 on, I only looked at data from the last twelve seasons.
In order to group ballparks with respect to these eighteen park factors, I (or rather, the clustering algorithms) needed to know the distance between each pair of parks’ factors. This isn’t as straightforward as it may seem. If each factor, or variable, were independent of all the others, it would be an easy calculation that you’ve seen a million times before: the square root of the sum of the squared differences between the points in each dimension. In two dimensions it’d be sqrt((a2-a1)^2+(b2-b1)^2), which is easily enough expanded to the eighteen-dimensional space of my data. However, I can say confidently that these park factors are not independent of each other, but instead exhibit some covariance; that is, some of them vary in related ways. HRs per PA and ISO, for example, will tend to vary in the same direction. Because of this, I used an alternative distance metric called the Mahalanobis distance (calculated via the ecodist package in R). The Mahalanobis distance takes into account and corrects for this covariance in a way I’m not remotely qualified to describe, and normalizes based on each variable’s standard deviation across the data set.
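To make the difference between the two distances concrete, here is a minimal Python sketch (not the R/ecodist code used for this piece). It works in two dimensions only so the covariance matrix can be inverted by hand, and the park factor values are invented purely for illustration.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared differences in each dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def covariance_2d(points):
    """Sample variances and covariance (var_x, cov_xy, var_y) of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, cov_xy, var_y

def mahalanobis_2d(p, q, points):
    """Mahalanobis distance between p and q, using the covariance of `points`."""
    var_x, cov_xy, var_y = covariance_2d(points)
    det = var_x * var_y - cov_xy ** 2
    # Inverse of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    inv_xx, inv_xy, inv_yy = var_y / det, -cov_xy / det, var_x / det
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt(dx * dx * inv_xx + 2 * dx * dy * inv_xy + dy * dy * inv_yy)

# Invented (HR factor, ISO factor) pairs that rise and fall together.
hr_iso = [(90, 91), (95, 94), (100, 101), (105, 104), (110, 110)]

# Two pairs of hypothetical parks, each pair the same Euclidean distance apart.
along = ((95, 95), (105, 105))    # differ in the direction the factors co-vary
against = ((95, 105), (105, 95))  # differ against that direction

for name, (p, q) in (("along", along), ("against", against)):
    print(name, round(euclidean(p, q), 2), round(mahalanobis_2d(p, q, hr_iso), 2))
```

Both pairs are equally far apart by the ordinary formula, but the Mahalanobis distance treats the "against" pair as far more unusual, because a park with a high HR factor and a low ISO factor runs counter to how the two normally move together.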
So, armed with a data set of park factors and a matrix of Mahalanobis distances between them, the only thing left was to decide on a clustering algorithm, run it, and show all my cool results. Unfortunately, as you may have guessed from the subtitle of this article, it wasn’t quite that easy.
I narrowed the wide variety of clustering choices available in R to the two types that were easiest for me (and probably for you) to understand: hierarchical clustering and partition-based clustering. Both have strengths and weaknesses, are easy enough to explain in plain English, and completely failed at finding reliable structures in the data here. Hierarchical clustering, specifically the bottom-up kind, starts with every data point in its own group of one. It then merges the two closest groups and recalculates the distances between all groups (by one of several available methods). It continues until all points are put together into one giant group, and presents you with the distance between the two groups joined at each merge along the way. This is commonly presented as a dendrogram.
The above is the actual clustering dendrogram of the data I used here (note that I removed the data labels from the ends of all branches for visual clarity), and it doesn’t look promising. An ideal result would have featured a few significantly longer vertical lines, indicating a long distance between the connected groupings. Clusters are determined by horizontal slicing (e.g., at height = 275, a horizontal line crosses five vertical lines, each corresponding to a cluster containing all of the lower branches sprouting from it), and long distances/long vertical lines indicate good distinction between clusters. Nothing obvious pops out of the graph above; the best I can do is probably somewhere in the range of four to six clusters (i.e., a height between 250 and 300).
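For anyone who wants to see the mechanics, the bottom-up merging and the horizontal slicing can be sketched in a few lines of Python. This is a toy illustration, not the R code behind the dendrogram: it assumes plain Euclidean distance and average linkage (just one of the "several available methods"), and the four points are made up.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(group_a, group_b, points):
    """Mean pairwise distance between the members of two groups."""
    total = sum(euclidean(points[i], points[j]) for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))

def agglomerate(points):
    """Bottom-up clustering; returns a list of (merged_group, merge_height)."""
    groups = [(i,) for i in range(len(points))]  # every point starts alone
    history = []
    while len(groups) > 1:
        # Find the closest pair of groups under average linkage.
        a, b = min(combinations(groups, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        height = average_linkage(a, b, points)
        groups = [g for g in groups if g not in (a, b)] + [a + b]
        history.append((a + b, height))
    return history

def clusters_at_height(history, n_points, height):
    """Number of clusters left if the dendrogram is sliced at `height`."""
    merges_below = sum(1 for _, h in history if h <= height)
    return n_points - merges_below

# Made-up data: two tight pairs of points that sit far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
history = agglomerate(points)
for group, height in history:
    print(group, round(height, 2))
print(clusters_at_height(history, len(points), 5.0))
```

In this toy case the last merge happens at a height roughly ten times that of the first two, which is the kind of long vertical line the real dendrogram lacks. Slicing anywhere in that gap gives two clusters.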
There are a few ways to assess the appropriateness of a clustering. Okay, for all I know there are a thousand ways, but I'm only going to focus on a single one here: silhouette plotting, which can be produced easily from any clustering function in R. A silhouette plot shows, for each data point in the set, a number called width ranging from -1 to 1 that represents how well it fits in its assigned cluster, where 1 is an ideal fit, -1 is an ideal fit in the nearest other cluster, and 0 indicates that it's on the border between two clusters. This number can be averaged across a cluster or even across the whole data set to indicate how good a grouping the clustering produced. The heuristic I found for silhouette width says that 0.75+ is a great fit, 0.5-0.75 is still pretty good, 0.25-0.5 is kinda bad, and under 0.25 means it's essentially worthless. Below, you can find the silhouette plots for clustering into 3-6 groups by the above method.
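The width itself follows the standard silhouette definition: for each point, take a (its mean distance to the other members of its own cluster) and b (its mean distance to the members of the nearest other cluster), then compute (b - a) / max(a, b). A minimal Python sketch, assuming Euclidean distance and made-up points rather than the park data:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_widths(points, labels):
    """Silhouette width per point; needs at least two distinct labels."""
    widths = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # usual convention for a cluster of one
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

# Made-up data: two tight pairs of points that sit far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
sensible = silhouette_widths(points, [0, 0, 1, 1])   # groups the near neighbours
scrambled = silhouette_widths(points, [0, 1, 0, 1])  # pairs each point with a far one

print(round(sum(sensible) / len(sensible), 2))
print(round(sum(scrambled) / len(scrambled), 2))
```

The sensible labelling averages well above the 0.75 "great fit" line, while the scrambled one averages below zero, meaning each point would fit better in the other cluster.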
I also used one of the tools available in R to attempt to predict the ideal number of groups. The NbClust package provides a function that runs any of up to 30 tests that purport (and I use that word because I have no idea how they all work) to determine the correct cluster number, and then tallies the answers from all the tests you choose to run. I ran the full complement of tests on my data, using k-means as the clustering algorithm of choice since k-medoids wasn’t available, and got 27 results. The large plurality (10/27) told me to use two clusters, and second place was a tie between three and thirty clusters, with 6/27 each. So, in other words, over 80% of the tests I ran told me my best options were effectively meaningless groupings. No other number was in the results more than once; the other five results were 5, 6, 16, 23, and 29.
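As a quick check on that arithmetic, the vote counts reported above can be tallied directly. This Python snippet only re-adds the numbers already given; it does not rerun NbClust.

```python
from collections import Counter

# Recommended cluster counts as reported above: 10 tests said 2, 6 said 3,
# 6 said 30, and one test each said 5, 6, 16, 23, and 29.
votes = [2] * 10 + [3] * 6 + [30] * 6 + [5, 6, 16, 23, 29]

tally = Counter(votes)
top_three_share = sum(tally[k] for k in (2, 3, 30)) / len(votes)

print(tally.most_common(3))
print(f"{top_three_share:.1%} of {len(votes)} tests chose 2, 3, or 30 clusters")
```

That works out to 22 of the 27 tests, which is where the "over 80%" figure comes from.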
| Park | Yankee | GABP | Busch | Marlins | Target | Chase | Confidence |
|---|---|---|---|---|---|---|---|
| Angel Stadium 2007 | 13 | 5 | 15 | 48 | 15 | 4 | 0.405 |
| Angel Stadium 2008 | 0 | 12 | 85 | 0 | 1 | 1 | 0.790 |
| Angel Stadium 2009 | 2 | 24 | 37 | 1 | 17 | 21 | 0.249 |
| Angel Stadium 2010 | 6 | 5 | 43 | 16 | 13 | 17 | 0.345 |
| Angel Stadium 2011 | 1 | 13 | 84 | 0 | 1 | 1 | 0.775 |
| Angel Stadium 2012 | 3 | 25 | 64 | 0 | 5 | 2 | 0.517 |
| Angel Stadium 2013 | 7 | 8 | 77 | 0 | 7 | 1 | 0.726 |
| Angel Stadium 2014 | 1 | 5 | 84 | 0 | 8 | 2 | 0.800 |
| Arlington 2007 | 3 | 52 | 1 | 0 | 33 | 11 | 0.356 |
| Arlington 2008 | 60 | 10 | 0 | 0 | 29 | 1 | 0.455 |
| Arlington 2009 | 0 | 95 | 4 | 0 | 1 | 0 | 0.928 |
| Arlington 2010 | 0 | 96 | 2 | 0 | 1 | 1 | 0.953 |
| Arlington 2011 | 0 | 93 | 1 | 0 | 1 | 4 | 0.913 |
| Arlington 2012 | 0 | 88 | 4 | 0 | 3 | 5 | 0.859 |
| Arlington 2013 | 0 | 91 | 7 | 0 | 0 | 2 | 0.873 |
| Arlington 2014 | 0 | 82 | 9 | 0 | 2 | 6 | 0.777 |
| AT&T 2007 | 0 | 1 | 63 | 0 | 34 | 1 | 0.461 |
| AT&T 2008 | 0 | 1 | 79 | 0 | 20 | 0 | 0.691 |
| AT&T 2009 | 0 | 2 | 72 | 0 | 25 | 0 | 0.592 |
| AT&T 2010 | 0 | 2 | 11 | 0 | 81 | 6 | 0.753 |
| AT&T 2011 | 0 | 0 | 0 | 0 | 98 | 1 | 0.975 |
| AT&T 2012 | 1 | 2 | 3 | 0 | 92 | 2 | 0.907 |
| AT&T 2013 | 11 | 5 | 2 | 0 | 77 | 4 | 0.720 |
| AT&T 2014 | 2 | 35 | 25 | 0 | 26 | 13 | 0.218 |
| Busch Stadium 2007 | 1 | 1 | 70 | 0 | 26 | 2 | 0.569 |
| Busch Stadium 2008 | 1 | 2 | 48 | 2 | 34 | 14 | 0.309 |
| Busch Stadium 2009 | 0 | 1 | 51 | 1 | 42 | 6 | 0.299 |
| Busch Stadium 2010 | 0 | 0 | 91 | 0 | 9 | 0 | 0.863 |
| Busch Stadium 2011 | 0 | 0 | 100 | 0 | 0 | 0 | 0.999 |
| Busch Stadium 2012 | 0 | 0 | 100 | 0 | 0 | 0 | 0.995 |
| Busch Stadium 2013 | 0 | 0 | 100 | 0 | 0 | 0 | 1.000 |
| Busch Stadium 2014 | 0 | 0 | 100 | 0 | 0 | 0 | 1.000 |
| Camden 2007 | 0 | 4 | 81 | 0 | 1 | 14 | 0.739 |
| Camden 2008 | 33 | 7 | 4 | 0 | 17 | 38 | 0.218 |
| Camden 2009 | 75 | 7 | 1 | 0 | 6 | 11 | 0.693 |
| Camden 2010 | 76 | 20 | 0 | 0 | 2 | 2 | 0.661 |
| Camden 2011 | 4 | 65 | 4 | 0 | 1 | 25 | 0.523 |
| Camden 2012 | 3 | 30 | 1 | 0 | 9 | 57 | 0.416 |
| Camden 2013 | 3 | 62 | 12 | 0 | 4 | 20 | 0.522 |
| Camden 2014 | 0 | 75 | 14 | 0 | 1 | 10 | 0.686 |
| Chase 2007 | 0 | 31 | 0 | 0 | 0 | 68 | 0.529 |
| Chase 2008 | 0 | 1 | 0 | 0 | 0 | 99 | 0.986 |
| Chase 2009 | 0 | 1 | 0 | 0 | 0 | 98 | 0.975 |
| Chase 2010 | 0 | 0 | 0 | 0 | 0 | 100 | 1.000 |
| Chase 2011 | 0 | 0 | 0 | 0 | 53 | 47 | 0.293 |
| Chase 2012 | 1 | 1 | 0 | 0 | 74 | 25 | 0.609 |
| Chase 2013 | 0 | 0 | 0 | 0 | 0 | 100 | 0.995 |
| Chase 2014 | 0 | 3 | 1 | 0 | 0 | 95 | 0.941 |
| Citi Field 2009 | 13 | 3 | 0 | 0 | 73 | 10 | 0.661 |
| Citi Field 2010 | 0 | 0 | 2 | 0 | 52 | 46 | 0.290 |
| Citi Field 2011 | 0 | 0 | 38 | 0 | 3 | 59 | 0.395 |
| Citi Field 2012 | 0 | 1 | 71 | 0 | 2 | 25 | 0.589 |
| Citi Field 2013 | 7 | 39 | 36 | 0 | 5 | 13 | 0.208 |
| Citi Field 2014 | 2 | 72 | 25 | 0 | 0 | 1 | 0.600 |
| Citizens Bank 2007 | 2 | 95 | 1 | 0 | 2 | 0 | 0.942 |
| Citizens Bank 2008 | 5 | 58 | 20 | 0 | 11 | 5 | 0.483 |
| Citizens Bank 2009 | 99 | 1 | 0 | 0 | 0 | 0 | 0.988 |
| Citizens Bank 2010 | 96 | 3 | 0 | 0 | 1 | 0 | 0.942 |
| Citizens Bank 2011 | 87 | 11 | 0 | 0 | 1 | 0 | 0.815 |
| Citizens Bank 2012 | 100 | 0 | 0 | 0 | 0 | 0 | 0.996 |
| Citizens Bank 2013 | 91 | 9 | 0 | 0 | 0 | 0 | 0.869 |
| Citizens Bank 2014 | 99 | 1 | 0 | 0 | 0 | 0 | 0.984 |
| Comerica 2007 | 0 | 26 | 8 | 0 | 13 | 52 | 0.391 |
| Comerica 2008 | 3 | 36 | 7 | 0 | 16 | 37 | 0.190 |
| Comerica 2009 | 0 | 22 | 62 | 0 | 8 | 8 | 0.512 |
| Comerica 2010 | 3 | 11 | 6 | 0 | 35 | 45 | 0.281 |
| Comerica 2011 | 10 | 12 | 9 | 0 | 35 | 33 | 0.188 |
| Comerica 2012 | 6 | 4 | 51 | 0 | 23 | 16 | 0.390 |
| Comerica 2013 | 1 | 3 | 50 | 0 | 13 | 33 | 0.338 |
| Comerica 2014 | 1 | 1 | 23 | 0 | 10 | 65 | 0.533 |
| Coors 2007 | 21 | 7 | 1 | 0 | 60 | 11 | 0.497 |
| Coors 2008 | 50 | 4 | 1 | 0 | 39 | 7 | 0.309 |
| Coors 2009 | 4 | 4 | 2 | 0 | 42 | 48 | 0.267 |
| Coors 2010 | 11 | 31 | 6 | 0 | 35 | 18 | 0.193 |
| Coors 2011 | 10 | 2 | 2 | 0 | 72 | 13 | 0.662 |
| Coors 2012 | 4 | 14 | 2 | 0 | 33 | 47 | 0.301 |
| Coors 2013 | 1 | 4 | 1 | 0 | 46 | 48 | 0.251 |
| Coors 2014 | 3 | 2 | 0 | 0 | 52 | 43 | 0.303 |
| Dodger Stadium 2007 | 99 | 0 | 0 | 0 | 0 | 0 | 0.989 |
| Dodger Stadium 2008 | 100 | 0 | 0 | 0 | 0 | 0 | 0.997 |
| Dodger Stadium 2009 | 38 | 0 | 1 | 0 | 61 | 0 | 0.419 |
| Dodger Stadium 2010 | 61 | 3 | 10 | 0 | 24 | 2 | 0.493 |
| Dodger Stadium 2011 | 46 | 3 | 6 | 0 | 44 | 1 | 0.241 |
| Dodger Stadium 2012 | 0 | 2 | 85 | 0 | 11 | 2 | 0.793 |
| Dodger Stadium 2013 | 1 | 2 | 65 | 0 | 9 | 23 | 0.539 |
| Dodger Stadium 2014 | 6 | 40 | 46 | 0 | 5 | 3 | 0.261 |
| Fenway 2007 | 17 | 53 | 5 | 0 | 2 | 23 | 0.419 |
| Fenway 2008 | 6 | 71 | 8 | 0 | 1 | 14 | 0.644 |
| Fenway 2009 | 8 | 59 | 22 | 0 | 1 | 9 | 0.477 |
| Fenway 2010 | 1 | 26 | 60 | 0 | 1 | 13 | 0.465 |
| Fenway 2011 | 0 | 3 | 2 | 0 | 4 | 90 | 0.880 |
| Fenway 2012 | 4 | 10 | 11 | 0 | 8 | 67 | 0.615 |
| Fenway 2013 | 0 | 5 | 18 | 0 | 17 | 59 | 0.504 |
| Fenway 2014 | 2 | 13 | 4 | 0 | 34 | 48 | 0.305 |
| GABP 2007 | 0 | 97 | 2 | 0 | 0 | 1 | 0.963 |
| GABP 2008 | 1 | 98 | 1 | 0 | 0 | 1 | 0.971 |
| GABP 2009 | 0 | 97 | 0 | 0 | 0 | 3 | 0.952 |
| GABP 2010 | 0 | 100 | 0 | 0 | 0 | 0 | 0.994 |
| GABP 2011 | 0 | 100 | 0 | 0 | 0 | 0 | 1.000 |
| GABP 2012 | 0 | 100 | 0 | 0 | 0 | 0 | 1.000 |
| GABP 2013 | 0 | 100 | 0 | 0 | 0 | 0 | 1.000 |
| GABP 2014 | 0 | 100 | 0 | 0 | 0 | 0 | 1.000 |
| Kauffman 2007 | 1 | 10 | 59 | 0 | 7 | 22 | 0.481 |
| Kauffman 2008 | 1 | 12 | 61 | 0 | 25 | 2 | 0.484 |
| Kauffman 2009 | 1 | 5 | 19 | 8 | 47 | 20 | 0.368 |
| Kauffman 2010 | 2 | 5 | 3 | 0 | 83 | 7 | 0.797 |
| Kauffman 2011 | 1 | 1 | 5 | 1 | 50 | 42 | 0.291 |
| Kauffman 2012 | 0 | 3 | 17 | 0 | 38 | 41 | 0.218 |
| Kauffman 2013 | 0 | 3 | 3 | 0 | 2 | 91 | 0.897 |
| Kauffman 2014 | 1 | 51 | 11 | 0 | 6 | 31 | 0.348 |
| Marlins Park 2012 | 8 | 39 | 35 | 6 | 9 | 4 | 0.213 |
| Marlins Park 2013 | 0 | 0 | 0 | 100 | 0 | 0 | 1.000 |
| Marlins Park 2014 | 0 | 0 | 0 | 100 | 0 | 0 | 1.000 |
| Metrodome 2007 | 2 | 2 | 4 | 0 | 89 | 3 | 0.873 |
| Metrodome 2008 | 10 | 2 | 5 | 0 | 79 | 4 | 0.743 |
| Metrodome 2009 | 13 | 6 | 7 | 0 | 72 | 2 | 0.659 |
| Miller 2007 | 0 | 99 | 0 | 0 | 0 | 0 | 0.991 |
| Miller 2008 | 0 | 97 | 2 | 0 | 0 | 0 | 0.961 |
| Miller 2009 | 0 | 14 | 86 | 0 | 0 | 0 | 0.789 |
| Miller 2010 | 0 | 87 | 10 | 0 | 0 | 3 | 0.824 |
| Miller 2011 | 1 | 66 | 19 | 0 | 1 | 13 | 0.568 |
| Miller 2012 | 0 | 39 | 59 | 0 | 0 | 2 | 0.390 |
| Miller 2013 | 0 | 83 | 3 | 0 | 1 | 13 | 0.766 |
| Miller 2014 | 0 | 98 | 1 | 0 | 0 | 1 | 0.969 |
| Minute Maid 2007 | 0 | 1 | 1 | 0 | 33 | 65 | 0.483 |
| Minute Maid 2008 | 1 | 33 | 5 | 0 | 52 | 9 | 0.352 |
| Minute Maid 2009 | 0 | 50 | 5 | 0 | 39 | 7 | 0.303 |
| Minute Maid 2010 | 0 | 88 | 9 | 0 | 3 | 0 | 0.831 |
| Minute Maid 2011 | 0 | 76 | 22 | 0 | 2 | 1 | 0.645 |
| Minute Maid 2012 | 0 | 30 | 67 | 0 | 1 | 1 | 0.524 |
| Minute Maid 2013 | 0 | 69 | 24 | 0 | 1 | 5 | 0.567 |
| Minute Maid 2014 | 1 | 65 | 10 | 0 | 10 | 15 | 0.572 |
| Nationals Park 2007 | 8 | 3 | 48 | 2 | 10 | 29 | 0.334 |
| Nationals Park 2008 | 0 | 2 | 38 | 0 | 1 | 59 | 0.404 |
| Nationals Park 2009 | 0 | 2 | 7 | 0 | 1 | 91 | 0.873 |
| Nationals Park 2010 | 0 | 75 | 12 | 0 | 1 | 12 | 0.695 |
| Nationals Park 2011 | 0 | 52 | 19 | 0 | 1 | 28 | 0.375 |
| Nationals Park 2012 | 1 | 41 | 1 | 0 | 2 | 54 | 0.337 |
| Nationals Park 2013 | 6 | 38 | 10 | 0 | 22 | 25 | 0.252 |
| Nationals Park 2014 | 3 | 3 | 2 | 0 | 75 | 17 | 0.659 |
| O.Co 2007 | 0 | 0 | 100 | 0 | 0 | 0 | 0.994 |
| O.Co 2008 | 0 | 1 | 98 | 0 | 0 | 1 | 0.979 |
| O.Co 2009 | 0 | 0 | 100 | 0 | 0 | 0 | 0.999 |
| O.Co 2010 | 0 | 0 | 98 | 0 | 2 | 0 | 0.968 |
| O.Co 2011 | 0 | 0 | 96 | 0 | 3 | 0 | 0.948 |
| O.Co 2012 | 0 | 0 | 100 | 0 | 0 | 0 | 0.998 |
| O.Co 2013 | 0 | 0 | 100 | 0 | 0 | 0 | 0.997 |
| O.Co 2014 | 3 | 3 | 66 | 0 | 10 | 18 | 0.564 |
| Petco 2007 | 15 | 22 | 56 | 1 | 5 | 1 | 0.453 |
| Petco 2008 | 4 | 27 | 58 | 0 | 10 | 1 | 0.446 |
| Petco 2009 | 3 | 14 | 81 | 0 | 1 | 0 | 0.743 |
| Petco 2010 | 2 | 48 | 49 | 0 | 1 | 0 | 0.250 |
| Petco 2011 | 1 | 10 | 82 | 0 | 5 | 1 | 0.771 |
| Petco 2012 | 4 | 7 | 5 | 0 | 77 | 8 | 0.731 |
| Petco 2013 | 1 | 12 | 82 | 0 | 4 | 1 | 0.764 |
| Petco 2014 | 4 | 23 | 22 | 0 | 35 | 17 | 0.237 |
| PNC Park 2007 | 0 | 0 | 100 | 0 | 0 | 0 | 0.999 |
| PNC Park 2008 | 0 | 0 | 98 | 0 | 0 | 1 | 0.976 |
| PNC Park 2009 | 0 | 0 | 99 | 0 | 0 | 0 | 0.992 |
| PNC Park 2010 | 0 | 0 | 100 | 0 | 0 | 0 | 0.997 |
| PNC Park 2011 | 0 | 0 | 100 | 0 | 0 | 0 | 0.997 |
| PNC Park 2012 | 0 | 0 | 96 | 0 | 1 | 2 | 0.955 |
| PNC Park 2013 | 0 | 0 | 1 | 0 | 96 | 2 | 0.955 |
| PNC Park 2014 | 2 | 0 | 3 | 0 | 94 | 1 | 0.920 |
| Progressive 2007 | 51 | 2 | 1 | 0 | 42 | 5 | 0.297 |
| Progressive 2008 | 3 | 1 | 2 | 0 | 92 | 2 | 0.902 |
| Progressive 2009 | 1 | 5 | 8 | 0 | 82 | 5 | 0.782 |
| Progressive 2010 | 2 | 7 | 8 | 0 | 73 | 10 | 0.686 |
| Progressive 2011 | 0 | 0 | 0 | 0 | 97 | 2 | 0.958 |
| Progressive 2012 | 2 | 1 | 3 | 0 | 89 | 6 | 0.861 |
| Progressive 2013 | 26 | 1 | 1 | 0 | 44 | 28 | 0.303 |
| Progressive 2014 | 3 | 1 | 0 | 0 | 3 | 93 | 0.915 |
| Rogers 2007 | 4 | 10 | 6 | 0 | 77 | 3 | 0.718 |
| Rogers 2008 | 0 | 17 | 9 | 0 | 62 | 12 | 0.534 |
| Rogers 2009 | 0 | 60 | 29 | 0 | 7 | 4 | 0.453 |
| Rogers 2010 | 0 | 58 | 23 | 0 | 8 | 11 | 0.469 |
| Rogers 2011 | 0 | 82 | 8 | 0 | 4 | 6 | 0.786 |
| Rogers 2012 | 0 | 2 | 0 | 0 | 0 | 97 | 0.962 |
| Rogers 2013 | 0 | 41 | 1 | 0 | 2 | 56 | 0.358 |
| Rogers 2014 | 0 | 4 | 0 | 0 | 2 | 94 | 0.918 |
| Safeco 2007 | 3 | 17 | 55 | 0 | 18 | 7 | 0.460 |
| Safeco 2008 | 22 | 28 | 13 | 0 | 34 | 2 | 0.196 |
| Safeco 2009 | 16 | 9 | 25 | 0 | 43 | 7 | 0.304 |
| Safeco 2010 | 3 | 17 | 77 | 0 | 4 | 0 | 0.682 |
| Safeco 2011 | 10 | 27 | 55 | 0 | 7 | 1 | 0.418 |
| Safeco 2012 | 45 | 3 | 50 | 0 | 2 | 0 | 0.271 |
| Safeco 2013 | 9 | 2 | 88 | 0 | 0 | 0 | 0.838 |
| Safeco 2014 | 13 | 2 | 86 | 0 | 0 | 0 | 0.792 |
| Shea Stadium 2007 | 1 | 1 | 8 | 0 | 12 | 77 | 0.710 |
| Shea Stadium 2008 | 0 | 1 | 5 | 0 | 1 | 93 | 0.911 |
| Sun Life 2007 | 2 | 63 | 20 | 0 | 9 | 7 | 0.529 |
| Sun Life 2008 | 3 | 66 | 12 | 1 | 7 | 12 | 0.598 |
| Sun Life 2009 | 2 | 34 | 15 | 1 | 14 | 33 | 0.178 |
| Sun Life 2010 | 4 | 22 | 2 | 4 | 34 | 35 | 0.180 |
| Sun Life 2011 | 5 | 13 | 1 | 1 | 51 | 29 | 0.361 |
| Target 2010 | 1 | 1 | 16 | 0 | 78 | 4 | 0.707 |
| Target 2011 | 1 | 0 | 0 | 0 | 99 | 0 | 0.984 |
| Target 2012 | 0 | 0 | 0 | 0 | 100 | 0 | 1.000 |
| Target 2013 | 0 | 0 | 0 | 0 | 100 | 0 | 1.000 |
| Target 2014 | 0 | 0 | 0 | 0 | 100 | 0 | 1.000 |
| Tropicana 2007 | 0 | 79 | 1 | 0 | 1 | 19 | 0.698 |
| Tropicana 2008 | 0 | 94 | 2 | 0 | 0 | 3 | 0.927 |
| Tropicana 2009 | 0 | 97 | 1 | 0 | 0 | 1 | 0.965 |
| Tropicana 2010 | 0 | 99 | 1 | 0 | 0 | 0 | 0.988 |
| Tropicana 2011 | 7 | 75 | 13 | 0 | 0 | 5 | 0.690 |
| Tropicana 2012 | 98 | 1 | 0 | 0 | 1 | 0 | 0.977 |
| Tropicana 2013 | 17 | 1 | 2 | 0 | 79 | 1 | 0.706 |
| Tropicana 2014 | 60 | 5 | 11 | 0 | 24 | 0 | 0.478 |
| Turner 2007 | 9 | 25 | 62 | 0 | 3 | 1 | 0.493 |
| Turner 2008 | 99 | 1 | 0 | 0 | 0 | 0 | 0.988 |
| Turner 2009 | 96 | 2 | 1 | 0 | 1 | 0 | 0.953 |
| Turner 2010 | 100 | 0 | 0 | 0 | 0 | 0 | 0.995 |
| Turner 2011 | 99 | 0 | 1 | 0 | 0 | 0 | 0.984 |
| Turner 2012 | 100 | 0 | 0 | 0 | 0 | 0 | 0.996 |
| Turner 2013 | 98 | 0 | 1 | 0 | 1 | 0 | 0.978 |
| Turner 2014 | 42 | 5 | 40 | 0 | 11 | 3 | 0.218 |
| US Cellular 2007 | 0 | 36 | 3 | 0 | 3 | 57 | 0.389 |
| US Cellular 2008 | 0 | 9 | 12 | 0 | 1 | 78 | 0.722 |
| US Cellular 2009 | 0 | 6 | 0 | 0 | 0 | 93 | 0.905 |
| US Cellular 2010 | 0 | 3 | 0 | 0 | 0 | 97 | 0.960 |
| US Cellular 2011 | 0 | 80 | 2 | 0 | 0 | 17 | 0.719 |
| US Cellular 2012 | 0 | 96 | 2 | 0 | 0 | 2 | 0.947 |
| US Cellular 2013 | 0 | 96 | 3 | 0 | 0 | 0 | 0.950 |
| US Cellular 2014 | 3 | 86 | 10 | 0 | 1 | 0 | 0.809 |
| Wrigley 2007 | 1 | 54 | 0 | 0 | 10 | 35 | 0.367 |
| Wrigley 2008 | 1 | 49 | 0 | 0 | 23 | 26 | 0.361 |
| Wrigley 2009 | 0 | 98 | 0 | 0 | 1 | 0 | 0.979 |
| Wrigley 2010 | 0 | 92 | 3 | 0 | 4 | 1 | 0.905 |
| Wrigley 2011 | 0 | 65 | 25 | 0 | 8 | 2 | 0.528 |
| Wrigley 2012 | 0 | 54 | 25 | 0 | 18 | 2 | 0.420 |
| Wrigley 2013 | 0 | 61 | 35 | 0 | 3 | 1 | 0.429 |
| Wrigley 2014 | 0 | 22 | 51 | 0 | 5 | 22 | 0.402 |
| (Old) Yankee Stadium 2007 | 100 | 0 | 0 | 0 | 0 | 0 | 1.000 |
| (Old) Yankee Stadium 2008 | 100 | 0 | 0 | 0 | 0 | 0 | 1.000 |
| Yankee Stadium 2009 | 57 | 38 | 2 | 0 | 2 | 1 | 0.385 |
| Yankee Stadium 2010 | 68 | 27 | 0 | 0 | 1 | 4 | 0.542 |
| Yankee Stadium 2011 | 11 | 88 | 0 | 0 | 0 | 1 | 0.830 |
| Yankee Stadium 2012 | 2 | 97 | 0 | 0 | 0 | 0 | 0.961 |
| Yankee Stadium 2013 | 2 | 98 | 0 | 0 | 0 | 0 | 0.964 |
| Yankee Stadium 2014 | 79 | 21 | 0 | 0 | 0 | 0 | 0.684 |
One last point on the poor results seen here. You might be wondering if my utter failure to definitively cluster ballparks based on their park factors is because of the park factors themselves, which I calculated myself, rather than because of something about the parks themselves. In order to address this, I tried to repeat the above work (at least in part) on two other park factor data sets. First, I reduced the number of years being averaged to produce the park factors from five to three. This did not result in any significant difference in silhouette width for any of the clustering methods. Second, I replaced my park factors with the factors FanGraphs provides on their Guts! page, split by handedness still, and again found no meaningful differences. This leads me to conclude that it's because of the parks themselves, and not because of an artifact in the data, that clustering parks beyond hitter's and pitcher's parks does not lead to any definitive information.
. . .
Much of the information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org. Some other information courtesy of FanGraphs.
John Choiniere is a researcher and featured (occasional) writer at Beyond the Box Score. You can follow him on Twitter at @johnchoiniere.