When Bill Petti and Jeff Zimmerman released their research on Edge%, I have to admit that I got very excited. I have always thought that it would be interesting to look at pitch characteristics such as location, velocity and movement and come up with a measure of expected pitcher performance based solely on these aspects without worrying about the actual results of the pitch. The hope would be that this measure could then be used to explain and/or predict actual pitcher performance.
Preceding Edge% there was Zone%, which likewise looks at pitch locations to give us some information about pitch characteristics of a pitcher without regard for results. For all qualified pitchers in 2012, Zone% correlates negatively with BB% at -0.39 and with BABIP at -0.12, so it can prove helpful in trying to explain some pitching results.
What I decided to investigate was whether I could look inside the strike zone to determine what a successful pitch looks like and work backwards from there to create a measure for how well a pitcher is executing pitches that look like they should be successful. To start, as in the previously discussed metrics I will consider only location, ignoring for the moment velocity and movement.
Certainly what is going to make a good pitch location in the strike zone is going to depend on a couple of things. The first thing will be the pitch type. As Glenn DuPaul showed recently, and I have confirmed in this work, fastballs located up in the zone are more desirable but changeups down in the zone are more successful. The second thing that will be important to consider is the platoon situation. Throwing a slider to the outside corner of a same-handed batter versus an opposite-handed batter may yield very different results, for example, as the pitch will start over the plate versus off the plate depending on the handedness of the batter.
Based on these two points, I split up my investigation by pitch type and also by pitcher-batter handedness combination. The next task was to settle on a way to judge what made pitches successful. To do so, I calculated a modified form of wOBA against, just for pitches inside the strike zone. As I'm only looking at pitches inside the zone here, walks and hit by pitches are removed from the standard wOBA formula. wOBA against seems like an appropriate metric here since it is context-neutral, and in this case I'm looking at the pitch level and ignoring base-out state and even count.
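To make the idea of a walk-free, HBP-free wOBA concrete, here is a minimal Python sketch. The linear weights are illustrative placeholders, not the published 2012 constants, and the choice of denominator (every plate appearance that ended on an in-zone pitch) is my assumption about one reasonable way to set this up, not a statement of the exact formula used for the numbers in this article.

```python
# Illustrative linear weights only; NOT the published 2012 wOBA constants.
ILLUSTRATIVE_WEIGHTS = {"1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}


def modified_woba(events, weights=ILLUSTRATIVE_WEIGHTS):
    """wOBA against with the walk and hit-by-pitch terms dropped.

    `events` maps outcome names to counts for plate appearances that ended
    on a pitch in the strike zone. Outcomes without a weight (for example
    "OUT") count only toward the denominator. Using all such plate
    appearances as the denominator is an assumption of this sketch.
    """
    numerator = sum(w * events.get(name, 0) for name, w in weights.items())
    denominator = sum(events.values())
    if denominator == 0:
        return float("nan")
    return numerator / denominator


# Hypothetical sub-zone: 2 singles, 1 home run and 7 outs.
sample = {"1B": 2, "HR": 1, "OUT": 7}
print(round(modified_woba(sample), 3))
```

Swapping in the real seasonal constants would only require replacing the weights dictionary.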
I split up the strike zone into an equal sized, 3x3 grid based on the custom strike zone formulas developed by Mike Fast based on batter handedness and batter height. I then calculated this modified form of wOBA for each of these sub-zones for each pitch type (using GameDay pitch classifications) and each pitcher-batter handedness combination over all pitches thrown in 2012. As an example, the calculated wOBA against for changeups thrown by LHP to LHB in the low-and-away sub-zone was .118, but .352 when thrown in the low-and-in sub-zone. Changeups thrown in the upper-middle sub-zone were scorched to a .535 wOBA.
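The binning step might look something like the sketch below. The zone edges are passed in as plain numbers because I am not reproducing Mike Fast's batter-specific formulas here; the function name, the coordinate convention (feet, catcher's view) and the example bounds are all hypothetical.

```python
def sub_zone(px, pz, left, right, bottom, top):
    """Return (row, col) of the 3x3 cell containing a pitch, or None.

    px, pz      : horizontal and vertical pitch location.
    left/right  : horizontal edges of the strike zone.
    bottom/top  : vertical edges of the strike zone for this batter.
    Row 0 is the bottom third and col 0 is the leftmost third. All of
    these conventions are assumptions made for illustration.
    """
    if not (left <= px <= right and bottom <= pz <= top):
        return None  # pitch is outside the strike zone
    # Scale the position to [0, 3] and truncate; clamp the far edge into cell 2.
    col = min(int(3 * (px - left) / (right - left)), 2)
    row = min(int(3 * (pz - bottom) / (top - bottom)), 2)
    return row, col


# Hypothetical zone: 2 feet wide, running from 1.5 to 3.5 feet high.
print(sub_zone(0.0, 2.5, -1.0, 1.0, 1.5, 3.5))   # middle cell
print(sub_zone(2.0, 2.5, -1.0, 1.0, 1.5, 3.5))   # outside the zone
```

Translating a (row, col) pair into labels such as "low-and-away" additionally depends on batter handedness, which this sketch leaves out.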
From here, I then looked at every pitch thrown in the strike zone by each pitcher, which now had an associated "expected" wOBA against should it be the last pitch of the plate appearance. Since the pitcher does not know when the plate appearance will end, every pitch thrown into the strike zone is used to contribute to the overall strike zone pitch location quality measurement.
As an example, pretend that Cliff Lee threw only two pitches in the strike zone in 2012, and both happened to be changeups to LHBs. If one of these was located in the low-and-away sub-zone and one in the low-and-in sub-zone, then his estimated strike zone wOBA would be (0.118 + 0.352) / 2 = 0.235, regardless of whether the hitter took the pitches, fouled them off, grounded out or tripled.
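The same arithmetic can be written as a short Python sketch. The two wOBA values are the ones quoted above for changeups from LHP to LHB; the key layout and function name are just one hypothetical way to organize the lookup.

```python
from statistics import mean

# (pitch type, pitcher hand, batter hand, sub-zone) -> sub-zone wOBA against.
# Only the two values quoted in the text are included here.
EXPECTED_WOBA = {
    ("CH", "L", "L", "low-away"): 0.118,
    ("CH", "L", "L", "low-in"): 0.352,
}


def estimated_sz_woba(pitches, table=EXPECTED_WOBA):
    """Average the sub-zone wOBA over every in-zone pitch a pitcher threw.

    The actual result of each pitch (take, foul, out, hit) is ignored.
    """
    return mean(table[pitch] for pitch in pitches)


# The pretend two-pitch season from the example.
lee_pitches = [
    ("CH", "L", "L", "low-away"),
    ("CH", "L", "L", "low-in"),
]
print(round(estimated_sz_woba(lee_pitches), 3))  # 0.235
```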
At this point, the question we're really answering here is "How well are these pitchers locating their pitches within the strike zone?", assuming that the pitcher cannot control velocity, movement, environment or on which pitch the batter will make contact.
If a pitcher hangs a changeup that is fouled back, in other metrics this will go unnoticed. In this measurement, it will contribute to a worse expected wOBA against. Alternatively, if a pitcher throws a slider in a typically ideal down-and-away location that is looped for a single, in this expected measurement it will contribute beneficially.
With that said, we can now get to the results. Let us consider the 88 qualified pitchers from the 2012 season, starting with the best and worst in strike zone "expected" wOBA:
|Pitcher||Estimated SZ wOBA Rank||Estimated SZ wOBA||Actual SZ wOBA Rank||Actual SZ wOBA|
There are several observations that can be made about this table, which is available in its entirety here. The most important is that the "estimated" wOBA based on location does not appear to be a very good estimator of actual wOBA: its correlation with the actual strike zone modified wOBA is a small positive 0.10. The "estimated" wOBA also correlates with BABIP at 0.18, ERA at 0.21, K% at -0.20 and BB% at 0.09. So at least things are headed in the right direction here.
Before we get to what is missing, I'd like to point out a couple of interesting entries in the table. R.A. Dickey defines most of his own estimator in this protocol, given that he threw almost all knuckleballs last season, so it is not surprising to see his estimate land close to his actual result. Aaron Harang is an interesting case where both the estimator and his actual strike zone performance rated highly.
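For anyone who wants to run the same kind of check on their own data, the correlations quoted here are ordinary Pearson coefficients across pitchers. A minimal pure-Python sketch follows; the function name is mine and the sample numbers are made up purely to show the call, not taken from the table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Made-up estimated and actual strike zone wOBA values for four pitchers.
estimated = [0.300, 0.310, 0.305, 0.320]
actual = [0.280, 0.350, 0.300, 0.330]
print(round(pearson_r(estimated, actual), 2))
```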
On the negative end, an alarm always goes off whenever we see FIP-buster Jeremy Hellickson's name appear toward one extreme of any metric. (Actually, from related work, if you draw a more-or-less 3 inch wide frame around the outside of the batter-adjusted strike zone, Hellickson threw a higher percentage of pitches in this area than any other qualified starter in 2012). Rick Porcello, Henderson Alvarez and Luis Mendoza are all pitchers who rated poorly in both estimated and actual performance in the zone.
Edge% leader David Price appears just outside the top ten list in 11th, while Edge% laggard Tim Lincecum ranks just above the bottom ten list in 75th.
Back to the problem: something or more likely some things are missing from this approach. Let's take a look at the best and worst leaderboard again, but this time sorted by largest differences between actual and estimated strike zone wOBA:
|Pitcher||Estimated SZ wOBA||Actual SZ wOBA||Delta||vFA||HR/FB%||Home Park HR Factor|
I added a couple of extra columns to the table that may be contributing to the ineffectiveness of the current strike zone pitch quality estimator. To start, you can see that only four of the 10 pitchers who were rated most highly in comparison to their actual performance throw with above average velocity. In contrast, eight of the 10 pitchers who rated most lowly relative to their actual performance throw with above average fastball velocity. Similar to the type of correlation reported with Edge% and vFA, in this case the correlation between the delta above and vFA is -0.22.
A second note is that the pitchers whose estimates were better than their actual results almost all have double-digit HR/FB ratios, while C.J. Wilson and David Price are the only members of the other extreme of the list that suffered that fate. The wOBA constants used here are not park corrected, and neither are any of the calculations used in this exercise. For this reason, home parks can certainly impact the results. There is a positive correlation of 0.12 between home park HR factor and the delta. The correlation between the delta and the HR/FB% is 0.58, such that HR/FB% alone describes about 33% of the variability in the gap between the estimator and the actual wOBA within the strike zone.
A multiple regression of the delta on vFA and HR/FB showed both predictors to be statistically significant, with an adjusted R^2 of 0.37.
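For readers less familiar with these terms, the sketch below shows how a single correlation turns into a share of variance explained, and how adjusted R^2 penalizes plain R^2 for the number of predictors. The sample size of 88 (the qualified pitchers) and the two predictors come from the text, assuming the regression used all 88; the plain R^2 of 0.40 passed in is a hypothetical input, since only the adjusted value is reported here.

```python
def r_squared_from_r(r):
    """Share of variance explained by a single predictor with correlation r."""
    return r * r


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a regression with n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Correlation between the delta and HR/FB% quoted in the text.
print(round(r_squared_from_r(0.58), 2))

# Hypothetical plain R^2 of 0.40 with 88 pitchers and 2 predictors.
print(round(adjusted_r2(0.40, 88, 2), 3))
```

Squaring 0.58 gives about 0.34, in line with the roughly one-third share attributed to HR/FB%; the small gap from 33% presumably comes from rounding the correlation.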
In short, pitchers who were estimated much too rosily were relatively soft tossers and/or played in ballparks with extreme home run environments. Pitchers who were estimated much too harshly were flamethrowers who played in more favorable (or at least neutral) home parks with respect to home runs.
What I learned from this exercise is that locating pitches in the regions of the strike zone most favorable to success does not on its own mean very much to your realized success.
I believe that this list does provide useful information as it is, however, as pitchers for the most part cannot really control how hard they throw. For guys like Dan Haren and Tommy Hanson who top out velocity-wise in the 80s but appear to be executing pitches within the strike zone at a high level, one wonders what they can do going forward to improve their results. Perhaps the answer lies in pitch sequencing or in operating slightly outside the strike zone more often?
It seems a reasonable next step in this endeavor would be to bring pitch velocity into the equation, as it does appear to play a part in what makes a successful pitch in the strike zone. It will be interesting to see whether velocity in combination with location starts to explain more of the HR/FB% that is driving a significant portion of the gap between the current estimator and the true results, or helps spread out the distribution of estimated wOBAs the way the actual values are spread out.
Eventually it would be nice to expand this to previous years, and perhaps use a three year average for the sub-zone expected wOBAs. Whether or not I want to include park factors is another matter, as it would add more complexity and take away the idea of a "neutral" pitch quality measurement.
I would love to hear your comments or ideas about the concept in general and the methodology. Have you already seen something like this before? Are there other factors that you believe I must consider?
You can follow me on Twitter at @MLBPlayerAnalys.