Pitching is a complicated art, or science, or a bit of both. I am not a pitcher. I'm not a hitter either, at this point. So I cannot pretend to know from experience what specific aspect or combination of aspects of a pitch make it effective.
There are all kinds of variables at play that affect the outcome of each plate appearance. DIPS theory will tell us that the pitcher does not have much control over what happens to a pitch once contact is made. However, pitchers can in fact do things to help their cases. We know that pop ups are almost certainly outs, and we know a type of pitch that generates pop ups more than others. We know that ground balls will tend to turn into hits more often than fly balls, but that outside of T-ball, ground balls tend to not be home runs.
We know that not every pitch thrown in the strike zone will be called a strike, and not every pitch thrown outside the strike zone will be called a ball. We know that not every nasty slider will result in a whiff, and not every hanging changeup will be crushed for a home run.
I would suggest that it must be true that the things that a pitcher has the most control over are the characteristics of the pitches themselves. This means the pitch type, location, velocity, movement and sequencing with respect to other pitches. While some pitchers may be able to control their pitches better than others, there is nobody other than the pitcher himself who is delivering those pitches.
With the availability of PITCHf/x data, we can look at which types of pitches are usually successful and which ones are typically horribly unsuccessful. Using actual results, my hope is to build a matrix in which each entry is a combination of pitch characteristics assigned a value reflecting its effectiveness. With these values known, the question I want to answer is this: by looking only at the aforementioned pitch characteristics, can I establish a measure that correlates strongly with an actual pitching performance metric, like ERA or FIP?
This is what I have started doing in my latest research endeavor. In the first article in this series, I started by looking at just pitch type and location, and only within the strike zone. While there were signs that the measure was headed in the right direction, it did not correlate very well with real pitching performance metrics. From there, it appeared obvious that the pitchers most widely under-appreciated as compared to actual performance were those who threw the ball harder than average. In this article, I have added the pitch velocity dimension to the matrix to see what turns up.
I added velocity by calculating quartiles for each pitch type, so that I could look at the difference velocity makes for the same type of pitch in the same location within the strike zone. Again the weapon of choice for assigning a value to the pitch characteristic matrix entries is wOBA against. As one example, consider all four-seam fastballs thrown by RHP to RHB in the absolute middle part of the strike zone. The wOBA against was .403 for the slowest 25% of those fastballs, .397 against the slightly faster 25% of those pitches, .373 against the next fastest 25% and finally down to .356 against the fastest quarter. For another example, take sinkers thrown by LHP to RHB in the down-and-away part of the strike zone. From slowest 25% to fastest, the wOBA against was .386, .279, .274, .229. Not every combination shows this completely monotonically decreasing trend. Changeups thrown by LHP to RHB in that same down-and-away area produced .256, .237, .243, .249, showing much less of a dependency on velocity than location.
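To make the bucketing concrete, here is a minimal sketch of how velocity quartiles per pitch type and a wOBA-against value per matrix cell might be computed. It is an illustration under assumptions, not the actual code behind the numbers above: the record fields (`p_throws`, `stand`, `pitch_type`, `zone`, `velocity`, `woba_value`) are hypothetical names, and each pitch is assumed to already carry the wOBA value of its outcome.

```python
from collections import defaultdict
from statistics import mean, quantiles


def velocity_cutoffs(pitches):
    """Return {pitch_type: [q1, q2, q3]} velocity cut points."""
    by_type = defaultdict(list)
    for p in pitches:
        by_type[p["pitch_type"]].append(p["velocity"])
    # quantiles(n=4) gives the three cut points between the four quarters.
    # With fewer than two pitches there is nothing to split, so use no cuts.
    return {t: (quantiles(v, n=4) if len(v) >= 2 else [])
            for t, v in by_type.items()}


def velocity_quartile(velocity, cuts):
    """Map a velocity to 1 (slowest quarter) through 4 (fastest quarter)."""
    return 1 + sum(velocity > c for c in cuts)


def build_matrix(pitches):
    """Average the wOBA value of every pitch that lands in each cell."""
    cuts = velocity_cutoffs(pitches)
    cells = defaultdict(list)
    for p in pitches:
        key = (p["p_throws"], p["stand"], p["pitch_type"], p["zone"],
               velocity_quartile(p["velocity"], cuts[p["pitch_type"]]))
        cells[key].append(p["woba_value"])
    return {key: mean(values) for key, values in cells.items()}
```

Computing the cut points separately for each pitch type is what lets a "fast" changeup and a "fast" four-seamer each be judged against their own kind.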
To summarize, the information that I am now considering in this experiment is pitcher handedness, batter handedness, batter height, pitch type, pitch location (only within the strike zone) and pitch velocity. I should reiterate that in assessing a pitcher, I am looking at all pitches within the strike zone, not just those that ended plate appearances.
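As a sketch of how a single pitcher might then be scored, the snippet below averages the matrix value over every in-zone pitch he threw. The plain average, the `pitch_quality` name and the league-average fallback for cells missing from the matrix are all assumptions made for illustration.

```python
from statistics import mean


def pitch_quality(cell_keys, matrix, league_average):
    """Average expected wOBA against over a pitcher's in-zone pitches.

    cell_keys: one matrix key per pitch thrown in the strike zone.
    matrix: {cell key: wOBA against for that cell}.
    league_average: assumed fallback for cells with no matrix entry.
    """
    values = [matrix.get(key, league_average) for key in cell_keys]
    return mean(values) if values else None


# Middle-middle four-seamers, RHP vs RHB, by velocity quartile
# (values taken from the example above).
matrix = {1: 0.403, 2: 0.397, 3: 0.373, 4: 0.356}
score = pitch_quality([4, 4, 4, 1], matrix, league_average=0.320)
```

In this toy case, three fastballs from the fastest quartile and one from the slowest average out to about .368. The league average of .320 is a placeholder and is never used here, since every key has a matrix entry.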
Before getting to individual pitchers, let's get right to the summary of how well the metric is now correlating to some major pitching performance measures now that velocity has been considered. The study has been broken out to show some different groups of pitchers.
|2012 Pitcher Group|ERA|ERA-|FIP|FIP-|xFIP|xFIP-|Sample Size|
|---|---|---|---|---|---|---|---|
|10+ pitches in strike zone|0.20|0.20|0.26|0.26|0.20|0.21|646|
Correlation of Pitch Quality Metric (using pitch type, location in strike zone, velocity) with Performance Measures
There are some interesting observations to make about the table. While the correlation with FIP and xFIP strengthens the more pitches we have to assess for each pitcher, this is not the case for ERA. All of the correlations in the table are statistically significant at the 99% level except for the ERA and ERA- relationships for qualified pitchers, which are statistically significant at the 95% level.
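For readers who want to check the significance claim, one common approach is to convert a correlation r from n pitchers into a t statistic and compare it with a critical value. The sketch below does that with a normal approximation to the critical value, which is reasonable for samples in the hundreds but too lenient for small ones. It is not necessarily the test used for the table.

```python
from math import sqrt
from statistics import NormalDist


def correlation_t_stat(r, n):
    """t statistic for testing whether a Pearson correlation differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def is_significant(r, n, confidence=0.99):
    """Two-sided test using a normal approximation to the t distribution."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(correlation_t_stat(r, n)) > critical


# The ERA correlation of 0.20 across 646 pitchers from the table.
t = correlation_t_stat(0.20, 646)
```

Here t comes out a little above 5, well past the critical value of roughly 2.58 for the 99% level, which shows how even a modest correlation clears the bar with a sample this large.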
The explanation that I can give for this is that the combination of location and velocity within the strike zone is very good at identifying high strikeout pitchers. This drives a stronger correlation with FIP than ERA, since strikeouts make up a major component of FIP. xFIP shows an even stronger correlation because it partly acts like a park adjustment: one major reason pitchers deviate from the league average HR/FB ratio is the home run environment in their home stadium. Accounting for league and park effects is also why the context-neutral pitch quality metric correlates more closely with the minus versions of all of the performance measures in the table.
For qualified starting pitchers, the fact that ERA is not being described very well by this metric clearly shows a lack of complexity in the metric as it stands. By taking a look at the pitchers most under and over-appreciated by the current state of the pitch quality metric, we can attempt to identify patterns and use those to drive the next steps to take.
I will refer to the pitch quality metric for now as the crazy bunch of letters PQ(SZLV) to refer to Pitch Quality based on Strike Zone Location and Velocity. This will allow it to be distinguishable from future versions as more components are added. For this measure, the lower the number the better it is for pitchers. Here are the top and bottom leaderboards for all qualified pitchers in 2012.
Certainly if you remember the leaderboards from the original article based on strike zone location alone, you should feel much better about the look of this one than the former. Other than a surprising appearance by Clayton Richard, the top ten list features a bunch of the top pitchers in the game from last season.
Turning our attention to the bottom ten, we see names like Jeremy Hellickson, Jered Weaver and Tom Milone as pitchers who were successful despite grading poorly on the combination of strike zone location and velocity. Pretty much all of these guys do not throw very hard. For nine of the bottom ten pitchers, FIP also came in higher than ERA, so FIP undersold their actual success as well. What is interesting is that for a lot of these pitchers, the estimated wOBA against within the strike zone lines up quite well with what they actually experienced, with Weaver and Buehrle being the exceptions. The fact that these pitchers are succeeding despite this raises a flag that to truly appreciate pitch quality, we must expand to include pitches outside the strike zone as well.
For one thing, by ignoring pitches outside the zone, we are likely missing important information that would help describe walk rates. Pitch location within the strike zone alone does not seem to capture this well enough. As an indication of how important a proxy for walk rates is to this measure, a multiple regression of ERA- on PQ(SZLV) and BB% for qualified pitchers raises the correlation to 0.49.
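For a regression with two predictors, the multiple correlation can be recovered from the three pairwise correlations alone. The sketch below shows that calculation. The function names are illustrative, and the inputs would be per-pitcher lists of ERA-, PQ(SZLV) and BB%, which are not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def multiple_r(y, x1, x2):
    """Multiple correlation of y with the best linear blend of x1 and x2.

    Breaks down (division by zero) if x1 and x2 are perfectly correlated.
    """
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)
```

The multiple correlation can never be lower than the stronger of the two individual correlations, so the jump it produces is a fair gauge of how much the second variable adds.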
As a first look at how pitchers operate outside the strike zone, we can draw a roughly three-inch frame around the outside of the strike zone and see which pitchers hit this area most frequently. Outperforming FIP shows a fairly weak but visible correlation with the percentage of pitches thrown in this area just outside the strike zone, so there is some hope that paying attention to this area could yield positive results.
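A rough sketch of the frame test follows, assuming PITCHf/x-style coordinates in feet: `px` for horizontal location measured from the middle of the plate, `pz` for height, and `sz_bot` and `sz_top` for the batter's zone limits. The zone edges are taken as the 17-inch width of home plate with no allowance for the radius of the ball, which is a simplification.

```python
PLATE_HALF_WIDTH_FT = 8.5 / 12  # home plate is 17 inches wide
FRAME_FT = 3 / 12               # the three-inch border around the zone


def in_zone(px, pz, sz_bot, sz_top):
    """True if the pitch crosses inside the rule-book strike zone."""
    return abs(px) <= PLATE_HALF_WIDTH_FT and sz_bot <= pz <= sz_top


def in_frame(px, pz, sz_bot, sz_top, frame=FRAME_FT):
    """True if the pitch misses the zone by no more than the frame width."""
    in_outer_box = (abs(px) <= PLATE_HALF_WIDTH_FT + frame
                    and sz_bot - frame <= pz <= sz_top + frame)
    return in_outer_box and not in_zone(px, pz, sz_bot, sz_top)


def frame_rate(pitches):
    """Share of pitches, given as (px, pz, sz_bot, sz_top), in the frame."""
    if not pitches:
        return 0.0
    return sum(in_frame(*p) for p in pitches) / len(pitches)
```

Calling `frame_rate` on each pitcher's full set of pitches gives the percentage that can then be compared against the gap between ERA and FIP.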
Of course there are other pieces yet that could be added to this puzzle. I have still not considered pitch movement, which is something that I could see playing a part in the success of a pitch. Each pitch is treated independently, as well. This means factors like the current count, base-out state and pitch sequence are still unaccounted for. League and park factors are other external variables that are missing from this context-neutral measure, although pursuing correlations to the minus versions of the pitching performance indicators helps to mitigate this issue.
By good fortune, I noticed recent comments on TangoTiger by both Brian Cartwright and TangoTiger that have alerted me to the fact that ERA is actually more of a function of the square of wOBA against, which is something that I will need to keep in mind if I continue to use this as my basis for pitch quality. I believe the idea is that each pitcher creates his own run environment (via one instance of wOBA against) and then allows runs to score within that environment (via the second instance of wOBA against).
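To see what a squared relationship would imply, here is a toy illustration. The exact functional form, the function name and the league figures are all assumptions chosen for round numbers, not values from the comments mentioned above.

```python
def era_from_woba(woba_against, league_woba, league_era):
    """Hypothetical scaling: league ERA times the squared wOBA ratio."""
    return league_era * (woba_against / league_woba) ** 2


# Illustrative league context only: .320 wOBA and a 4.00 ERA.
estimate = era_from_woba(0.288, league_woba=0.320, league_era=4.00)
```

Under these assumptions, a pitcher whose wOBA against is 10% better than league average would be expected to allow about 19% fewer runs (0.9 squared is 0.81), for an ERA near 3.24 rather than the 3.60 a straight-line relationship would suggest.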
The idea in this project is to keep adding pieces one at a time and reassessing. After each component is added, I can determine by the biggest outliers in either direction which missing factor will likely make the most impact to include next. By taking this approach, I will both get a feel of the relative importance of pitch characteristics to actual results, and also allow the data to guide me along the way.
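One simple way to surface the biggest outliers at each step is to fit a straight line of actual performance against the metric and rank pitchers by how far they land from it. The sketch below does this with ordinary least squares. It is one possible approach with illustrative function names, not necessarily how the leaderboards in this series were produced.

```python
def fit_line(x, y):
    """Ordinary least squares slope and intercept for y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def biggest_outliers(names, metric, performance, k=10):
    """Return the k most negative and k most positive residuals.

    A negative residual means the actual number (e.g. ERA) was lower
    than the metric predicted, so the metric under-appreciated him.
    """
    slope, intercept = fit_line(metric, performance)
    residuals = sorted(
        (actual - (slope * m + intercept), name)
        for name, m, actual in zip(names, metric, performance)
    )
    return residuals[:k], residuals[-k:]
```

Looking for what the pitchers in each list have in common is then the manual part of the process.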
After starting with strike zone location and velocity, it appears that considering these same pitch traits outside the strike zone should be the next part of the puzzle in the quest to measure a pitcher by pitch quality alone. Whether this process will get me to a measure that amounts to anything useful remains to be seen!
You can follow me on Twitter at @MLBPlayerAnalys.
Credit and thanks to Baseball Heat Maps for PITCHf/x data upon which this analysis was based and to Fangraphs for other pitching statistics and park factors.