Armed with a Pitch F/X database, I recently took a basic look at potential park effects for pitch data from the 2012 season. The study was admittedly crude, as a number of factors would or could have played a part in contributing to the observed deltas from stadium to stadium. Among those mentioned in the article was the frequency of use of each pitch type. After all, if curveballs, for example, are thrown more often in one ballpark than in most others, the average movement (and velocity and release point) calculated for that park would be affected.
Wondering whether there would be any pattern to the frequency of certain pitch types in certain types of stadiums, I set out to use the same method as before to look at deltas between home and away usage of each pitch type. Again, for a given team I counted all pitches of each type that both teams threw in that team's home games, and subtracted the number of pitches of that type that both teams threw in that same team's away games.
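The home-minus-away count described above can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names `home_team`, `away_team` and `pitch_type` are assumptions made for the example, not the actual database schema, and the sample records are made up.

```python
from collections import Counter


def usage_delta(pitches, team):
    """Home-minus-away pitch counts by pitch type for one team.

    Pitches thrown by BOTH teams in each game are counted.
    `pitches` is an iterable of dicts with the hypothetical keys
    'home_team', 'away_team' and 'pitch_type'.
    """
    home = Counter(p["pitch_type"] for p in pitches if p["home_team"] == team)
    away = Counter(p["pitch_type"] for p in pitches if p["away_team"] == team)
    # Plain subtraction keeps negative deltas
    # (Counter's own "-" operator would drop them).
    return {pt: home[pt] - away[pt] for pt in sorted(set(home) | set(away))}


# Toy records, invented purely to show the shape of the result.
sample = [
    {"home_team": "TB", "away_team": "NYY", "pitch_type": "FF"},
    {"home_team": "TB", "away_team": "NYY", "pitch_type": "FF"},
    {"home_team": "TB", "away_team": "NYY", "pitch_type": "FT"},
    {"home_team": "BOS", "away_team": "TB", "pitch_type": "FT"},
    {"home_team": "BOS", "away_team": "TB", "pitch_type": "FT"},
    {"home_team": "BOS", "away_team": "TB", "pitch_type": "FF"},
]
deltas = usage_delta(sample, "TB")
```

Splitting the result by pitcher handedness would only require adding a handedness field to the grouping key.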
After running the data, the cases of the most extreme deltas looked fairly suspicious to me. I decided to choose the most extreme case from the fastball set to investigate further: the Tampa Bay Rays. Consider the following graph showing the breakdown of four-seam vs. two-seam fastballs in Tampa Bay Rays home and away games in 2012.
This is a large discrepancy. According to the pitch type classification, RHPs threw over 1300 more four-seam fastballs at home than on the road, and over 1000 fewer two-seam fastballs. LHPs showed the inverse, with over 1300 more two-seam fastballs at home and over 1000 fewer four-seam fastballs.
Before diving in any further, consider two possible explanations for these large differences. The first, which I would find very interesting, would be a conscious decision by the Rays to use different types of fastballs much, much more often at Tropicana Field than on the road. Unfortunately, these numbers include both teams in each Rays game, so many teams visiting Tampa Bay would have to be implementing a similar strategy for the deltas to be this large.
This makes the second possibility more likely: that environmental factors are driving these differences. Consider the horizontal and vertical movement of all four-seam and two-seam fastballs from the home and away sets below to see the effect.
From my last article showing the basic Pitch F/X park effects, Tropicana Field showed a positive movement factor of 1.75 inches horizontally and 0.97 inches vertically for RHP. For LHP, the horizontal and vertical movement factors were both positive as well, at 2.79 inches and 0.14 inches, respectively.
In the graph, you can see that the entire mass of pitches is shifted to the right in Rays home games. Take a look at both -10 and +14 on the x-axis to see the effect clearly. It is also possible to look around +8 on the x-axis and see the void of pitches with negligible positive vertical movement in home games compared to away games.
Seeing this, it is certainly understandable that the algorithm used to automatically classify pitches into pitch types may have misclassified many of these pitches at Tropicana Field. To start, these two types of fastballs are thrown with very similar velocities. Typically, a two-seam fastball will break more to the arm side of the pitcher but have less positive vertical movement than a four-seam fastball. This means that an offset that is positive in both the horizontal and vertical planes, like we have here, will tend to make the movement of two-seamers look a lot more like four-seamers for RHPs and the movement of four-seamers look more like two-seamers for LHPs.
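To make that mechanism concrete, here is a toy sketch in Python. It uses a simple nearest-centroid rule as a stand-in classifier, which is not the MLBAM neural net, and the centroid values and the sample pitch are hypothetical numbers chosen only for illustration. The sign convention (arm-side movement negative for a RHP) is also an assumption. The only figures taken from the article are the RHP offsets of 1.75 and 0.97 inches.

```python
import math

# Hypothetical average movement (horizontal, vertical) in inches for a RHP.
# Real classifiers use more features (velocity, spin, and so on).
CENTROIDS_RHP = {
    "four-seam": (-4.0, 9.0),
    "two-seam": (-8.0, 6.0),
}


def nearest_type(movement, centroids):
    """Label a pitch with the pitch type whose centroid is closest."""
    return min(centroids, key=lambda name: math.dist(movement, centroids[name]))


def apply_offset(movement, offset):
    """Shift a pitch's measured movement by a park offset (inches)."""
    return (movement[0] + offset[0], movement[1] + offset[1])


# RHP movement factors for Tropicana Field quoted in the article.
TROP_OFFSET_RHP = (1.75, 0.97)

true_pitch = (-6.5, 7.0)  # made-up two-seamer
measured = apply_offset(true_pitch, TROP_OFFSET_RHP)

label_without_offset = nearest_type(true_pitch, CENTROIDS_RHP)
label_with_offset = nearest_type(measured, CENTROIDS_RHP)
```

With these made-up numbers the same pitch is labeled a two-seamer before the offset and a four-seamer after it, which is the kind of flip that would inflate the four-seam count for RHPs in one park.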
This is also a good example of why the incredibly successful MLBAM neural net-based pitch type binning results can yet stand to be examined and hand-tweaked, as has been undertaken in the wonderful work by Brooks Baseball. Other types of clustering algorithms have been utilized as alternative approaches to tackling this difficult problem.
At this point it is almost impossible to imagine that the difference seen in the graphs is due to strategic pitch type selection - there must be environmental effects at work. Aside from potential Pitch F/X calibration error, we can attempt to consider what these other factors may be, and perform at least a reasonable attempt at accounting for their presence.
Based on explanations from Pitch F/X experts Peter Jensen and Alan Nathan, it is my understanding that Pitch F/X cameras measure the position of the pitched ball at regular intervals between 40 feet and 10 feet from the plate. The system assumes a constant acceleration, and uses the initial location, velocity and acceleration values to calculate a hypothetical path for the pitch. My understanding is that this hypothetical path is fed, essentially as an initial guess, into a presumably iterative solver that tweaks the underlying velocity and acceleration parameters to minimize the differences (or errors) between the hypothetical ball positions and the actual, measured ball positions. From that point, the solved path can be extrapolated 10 feet out on each side. This produces a "release point", which is fixed at 50 feet from home plate in the model, and the location of the pitch at 0 feet as it crosses home plate.
It really is an incredibly well-engineered and useful system. One nice feature of this method of measurement is that the cause of any movement, whether temperature, elevation, air pressure, humidity or wind, is not particularly relevant to the system: it just measures the position of the ball at fixed intervals, without needing to know or care what effects caused the ball to arrive at that position.
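The extrapolation step can be sketched as follows, assuming the solver has already produced a position, velocity and acceleration for the pitch. This is a minimal illustration of constant-acceleration kinematics, not SportVision's implementation. The function names, the axis order (x horizontal, y distance from the plate, z height) and the sample numbers are all assumptions made for the example.

```python
import math


def time_to_reach(y0, vy0, ay, y_target):
    """Time at which the ball is at y_target under constant acceleration.

    Solves y0 + vy0*t + 0.5*ay*t**2 = y_target and returns the root closest
    to t = 0, so it works forward (toward the plate, t > 0) and backward
    (toward release, t < 0). Assumes vy0 is nonzero.
    """
    if ay == 0:
        return (y_target - y0) / vy0
    disc = vy0 ** 2 - 2.0 * ay * (y0 - y_target)
    if disc < 0:
        raise ValueError("the fitted path never reaches y_target")
    root = math.sqrt(disc)
    return min(((-vy0 - root) / ay, (-vy0 + root) / ay), key=abs)


def extrapolate(p0, v0, a, y_target):
    """Position (x, y, z) where the fitted path crosses y = y_target.

    p0, v0 and a are (x, y, z) tuples in feet, ft/s and ft/s^2.
    """
    t = time_to_reach(p0[1], v0[1], a[1], y_target)
    return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a))


# Made-up fitted parameters at the first measured point, 40 feet from the plate.
p0 = (-1.5, 40.0, 6.0)
v0 = (4.0, -125.0, -4.0)
a = (-10.0, 28.0, -18.0)

release_point = extrapolate(p0, v0, a, 50.0)   # 10 feet back from the data
plate_crossing = extrapolate(p0, v0, a, 0.0)   # forward to the plate
```

Taking the root nearest zero avoids the second, non-physical solution of the quadratic, where the decelerating ball would have to reverse direction.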
Dr. Nathan suggested after my previous article that it would be interesting to see an analysis that accounted for air density. The factors contributing to air density are temperature, elevation, air pressure and relative humidity. In order to get a feel for the effects and relative importance of these contributors, I experimented with some test cases and varied the four parameters between what I considered reasonable lower and upper bounds using Dr. Nathan's wonderful Trajectory Calculator. Using this I could see that relative humidity appeared to have the least effect on the movement of a pitch, all else being equal, which agreed with a comment that Dr. Nathan had made earlier. Elevation is certainly a large factor at the altitude of Coors Field compared with really anywhere else.
Game time temperatures are already recorded in my Pitch F/X database. I located the elevation of each park for the altitude component. I was able to track down air pressure and relative humidity readings from airports or other venues near each stadium for the date and time of the game. These measurements can vary quite significantly over the course of the day, so I chose the first reading after the start of the game for the inputs. Of course there is an assumption being made here that these recorded input values are close enough to the actual values at field level.
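Picking the first reading after the start of the game is a small lookup, sketched here in Python. The `(timestamp, value)` layout, the function name and the sample readings are assumptions made up for the example, and "after" is treated as "at or after" the start time.

```python
from datetime import datetime


def first_reading_after(readings, game_start):
    """Return the earliest (timestamp, value) reading at or after game_start.

    `readings` is an iterable of (datetime, value) tuples in any order.
    Returns None when no reading qualifies.
    """
    candidates = [r for r in readings if r[0] >= game_start]
    return min(candidates, key=lambda r: r[0]) if candidates else None


# Invented hourly pressure readings (inches of mercury) and start time.
readings = [
    (datetime(2012, 7, 1, 12, 53), 29.95),
    (datetime(2012, 7, 1, 13, 53), 29.93),
    (datetime(2012, 7, 1, 14, 53), 29.90),
]
game_start = datetime(2012, 7, 1, 13, 10)
chosen = first_reading_after(readings, game_start)
```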
I should note that because Tropicana Field is a climate-controlled environment, it has a constant temperature of 72F. I assumed a constant relative humidity of 50% here and in all other climate-controlled venues, on the upper end of the typical comfortable range. From these values, and the formulas in the Trajectory Calculator, I was able to calculate an air density, rho, for each pitch.
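For readers who want to reproduce a rho value, here is a hedged sketch using the standard ideal-gas formula for moist air, with the Arden Buck approximation for saturation vapor pressure. It is not copied from the Trajectory Calculator, so small differences from that spreadsheet are possible. The elevation helper uses a rough exponential atmosphere with an assumed scale height, and the sea-level pressure used in the example is an assumption, since the article does not quote one.

```python
import math

R_DRY = 287.058    # J/(kg*K), specific gas constant of dry air
R_VAPOR = 461.495  # J/(kg*K), specific gas constant of water vapor


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_pa(temp_c):
    """Saturation vapor pressure over water in Pa (Arden Buck equation)."""
    return 611.21 * math.exp(
        (18.678 - temp_c / 234.5) * (temp_c / (257.14 + temp_c))
    )


def station_pressure_pa(sea_level_pressure_pa, elevation_m, scale_height_m=8400.0):
    """Rough pressure at altitude from a sea-level-corrected reading.

    Isothermal exponential atmosphere; the scale height is an assumed value.
    """
    return sea_level_pressure_pa * math.exp(-elevation_m / scale_height_m)


def air_density(temp_c, pressure_pa, rel_humidity_pct):
    """Density of moist air in kg/m^3 from absolute (station) pressure."""
    temp_k = temp_c + 273.15
    p_vapor = rel_humidity_pct / 100.0 * saturation_vapor_pressure_pa(temp_c)
    p_dry = pressure_pa - p_vapor
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)


# Tropicana Field conditions from the article: 72F and 50% relative humidity.
# Standard sea-level pressure is assumed here for illustration.
rho_trop = air_density(fahrenheit_to_celsius(72.0), 101325.0, 50.0)
```

Holding the other inputs fixed, raising the humidity or the temperature lowers the computed density, and so does raising the elevation through its effect on pressure.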
To put all pitches on the same scale, I can use the Rays home games as the reference point, since the rho for each of them was identical thanks to Tropicana Field. While I didn't realize this at first, Dr. Nathan pointed out that the horizontal and vertical pitch movement values are directly proportional to rho, so scaling at this point is simple.
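Under that proportionality, the adjustment reduces to multiplying each movement value by the ratio of the reference density to the game density. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def scale_movement(pfx_x, pfx_z, rho_game, rho_ref):
    """Rescale movement measured at air density rho_game to density rho_ref.

    Assumes horizontal and vertical movement are directly proportional
    to air density, as described in the article.
    """
    factor = rho_ref / rho_game
    return pfx_x * factor, pfx_z * factor


# Made-up example: a pitch measured in thinner air than the reference park.
adj_x, adj_z = scale_movement(-8.0, 6.0, rho_game=1.10, rho_ref=1.20)
```

Because the same factor multiplies both negative and positive movement values, this adjustment stretches or shrinks the cloud of pitches around zero rather than sliding it in one direction.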
Below is the pitch movement graph for Rays road games after having theoretically compensated for the air density in a manner that makes them behave like they were thrown in the air at Tropicana Field. I've included the original for comparison, although once you see them you'll notice that they are CONSIDERABLY different...
Okay, so there are actually some differences there, although I suspect that, like me, you would have to sit and eye the two graphs together for a bit before you spot any. In parsing the data, the average absolute shift was about 0.15 inches horizontally and 0.20 inches vertically. Of course that is relatively small on the scale of this graph to start with, and once direction is considered, many of these shifts counterbalance one another.
The largest shifts were 0.97 inches in the horizontal plane and 1.30 inches vertically, on separate pitches from the same game played in 97F heat in Kansas City on the afternoon of June 27th. Kauffman Stadium is actually at the second highest elevation of all AL stadiums. This at least gives us an idea how large of an effect is possible due to air density.
If I think about it, it makes sense that differences in air density on their own could not have been responsible for a positive horizontal shift in movement from both RHPs and LHPs. Higher air density should allow more pitch movement, but that would mean more movement for both righties and lefties, which would have the effect of spreading the set of fastballs in the graph out to both sides, not shifting it in more or less one direction.
Of course the last potential contributing factor, aside from measurement error, that has not yet been considered is the effect of wind. In my mind I can picture wind blowing in the right direction causing a positive movement shift to fastballs from both RHPs and LHPs. Alas, removing wind effects from the data is beyond the scope of this study. I would certainly be interested in revisiting this in the future using the wind speeds and directions recorded during the game.
In terms of whatever measurement error is left over, it seems to me that there are numerous sources that could play a factor. I have to admit that I do not know much at all about the specific camera setup used by SportVision, but I can make some educated guesses as to factors that could affect such a setup.
With respect to an individual camera, one source of error would likely be due to lens distortion. No two lenses will be made exactly the same, so some form of characterization is likely required for accuracy. Temperature and humidity can certainly cause relative movement within the camera as well, which is a complicated effect for which to compensate. Stationing the cameras in climate-controlled locations would likely be the best way to avoid this issue, and may be possible given the specific setup required for this system. Whatever sensors themselves are used in the cameras may be subject to both fixed pattern noise and non-uniform pixel response, which may need to be corrected for to extract the tracked ball from the image accurately.
Moving now to the larger system setup, the set of cameras would have to go through some form of registration and alignment procedure to place all cameras in the same coordinate space. This could take the form of having all cameras observe a fixed set of at least two objects that are known distances apart as it is moved throughout the measurement volume. There would be other ways to do such a registration and alignment process, but I'm convinced there must be a step of this nature required. Certainly I could see the possibility of some error being introduced in this process.
There is no doubt that compensating for environmental effects, determining systematic offsets in ballparks and classifying pitches into pitch types are all complicated problems, with many very smart people applying a wide range of techniques to tackle them. Adding to the complexity is the fact that these systems could be recalibrated during the season, leading to different calibration errors in subsets of games.
Interestingly, from the abstract it would appear that there will be a presentation at the upcoming SABR Conference on a topic very similar to the one undertaken herein as a small case study. Andy Andres and Rory Kirchner will present on a topic they call "Merging weather data to PITCHf/x and HITf/x".
Two questions that I would love to find out more about are:
(1) What is the range of this level of Pitch F/X analysis/compensation across MLB organizations?
(2) In slides from a 2008 SportVision Pitch F/X summit presentation by Ross Paul from MLBAM, he stated that "if we train a network only for a particular pitcher, we get "perfect" accuracy". In light of that, has any organization trained a per-pitcher neural network algorithm for pitch type classification? At least for their own set of major league pitchers?
As always please leave your comments and ideas below or you can contact me on Twitter at @MLBPlayerAnalys.
Credit and thanks to Baseball Heat Maps for the Pitch F/X data upon which this analysis was based. Special thanks to Dr. Alan Nathan for his willingness to answer questions surrounding air density compensation.