Last week's PitchFX analysis of Barry Zito garnered a great deal of attention from across the web and from the Beyond the Box Score community. Thankfully, BtB is blessed with a highly educated group of followers, many of whom posed informed questions and provided helpful, well-thought-out suggestions. In response to comments from Mike Fast, vivaelpujols, Missing Barry and others, I present to you a far more in-depth comparison of Zito's 2009 and 2010 performances.
Please wake me when you're finished.
Question #1: Which is the more relevant shift, Zito's pitch repertoire or MLB's pitch classification algorithm?
A number of commenters suggested that the reported change in Barry Zito's pitch selection was due to changes in the Pitch FX algorithm rather than actual modifications by the pitcher. To test this, I needed to develop a (rather rough*) pitch classification scheme and apply it evenly to both seasons' data.
Having done so, I feel I can answer this question in one of two ways. One possibility is that Zito is indeed throwing more two-seamers and curves this year. Another is that the embattled Giants hurler is throwing the same four-seamer, but it's coming in slower, breaking less vertically, and breaking more horizontally (and he's still throwing more curves).
See the charts below (data from Brooks Baseball's PitchFX Tool):
*By rough I mean this: I started by measuring the 1st-99th percentile range of velocity, spin angle, horizontal break, and vertical break for each pitch type, pooling the classified pitches from both seasons. Where two types' ranges overlapped, I split the difference between them. To pick up unclassified pitches, I widened the classification ranges until they took in the unidentified pitches without creating any duplicates (pitches that match more than one type). I considered the scheme successful once at least 98% of pitches were classified and there were no duplicates.
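For readers who want to try something similar, here is a rough Python sketch of the footnote's recipe. It is a hypothetical reconstruction, not the code behind the charts: the field names (`speed`, `spin`, `pfx_x`, `pfx_z`), the half-open ranges, and the rule of splitting each overlapping pair on the single feature where the two ranges overlap least are all assumptions, since the post doesn't say which dimension the split was made in. The range-widening pass for unclassified pitches is left out.

```python
import math

# Assumed field names: velocity, spin angle, horizontal break, vertical break.
FEATURES = ("speed", "spin", "pfx_x", "pfx_z")


def percentile(values, q):
    """Percentile with linear interpolation between sorted values (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def build_ranges(labeled):
    """labeled: {pitch_type: [pitch dicts]}, pooled across both seasons.
    Returns {pitch_type: {feature: [low, high]}} from the 1st-99th percentiles."""
    return {
        ptype: {f: [percentile([p[f] for p in ps], 1),
                    percentile([p[f] for p in ps], 99)] for f in FEATURES}
        for ptype, ps in labeled.items()
    }


def split_overlaps(ranges):
    """Make every pair of boxes disjoint by splitting the difference.
    ASSUMPTION: each overlapping pair is split on the one feature where
    their ranges overlap least. Boxes only ever shrink, so pairs that were
    already separated stay separated."""
    types = list(ranges)
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            ra, rb = ranges[a], ranges[b]
            overlap = {f: min(ra[f][1], rb[f][1]) - max(ra[f][0], rb[f][0])
                       for f in FEATURES}
            if min(overlap.values()) <= 0:
                continue  # already separated on at least one feature
            f = min(overlap, key=overlap.get)
            # Boundary at the midpoint of the overlapping stretch.
            mid = (max(ra[f][0], rb[f][0]) + min(ra[f][1], rb[f][1])) / 2
            lower, upper = (ra, rb) if ra[f][0] <= rb[f][0] else (rb, ra)
            lower[f][1] = mid  # half-open ranges: the lower box stops at mid
            upper[f][0] = mid  # and the upper box starts there
    return ranges


def classify(pitch, ranges):
    """Return the pitch type whose box contains the pitch, or None if unclassified.
    After split_overlaps, at most one box can match."""
    for ptype, r in ranges.items():
        if all(r[f][0] <= pitch[f] < r[f][1] for f in FEATURES):
            return ptype
    return None
```

Splitting on the feature with the smallest overlap is just one way to "split the difference" while shrinking each box as little as possible; splitting on every overlapping feature would also remove duplicates, but it would leave far more pitches unclassified.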
There is certainly a notable shift in the speed, spin, and break of Zito's various pitches between the two seasons. Whether these shifts reflect real changes in the pitches themselves, or are an artifact of the difficulty of divining pitch types from the behavior of the baseball, well, I'll leave that up to you. Still, the shift in pitch clusters is statistically significant and visually noticeable, so I am confident that the change in pitch ID has more to do with Barry Zito than with the MLBAM algorithm.
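The post doesn't say which significance test was used. One common choice, shown here only as a hedged sketch, is Welch's t-test on a single feature (say, four-seam velocity in 2009 versus 2010). With hundreds of pitches per group, the p-value below uses a normal approximation to the t distribution; that shortcut is an assumption of this sketch, not part of the original analysis.

```python
import math
from statistics import NormalDist, mean, variance


def welch_test(a, b):
    """Welch's t statistic for two samples with unequal variances.

    The two-sided p-value uses a normal approximation to the t distribution,
    which is reasonable only when both samples are large (hundreds of pitches).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p
```

A call such as `welch_test(ff_speed_2009, ff_speed_2010)` (hypothetical lists of pitch speeds) tests one feature of one pitch type; since a cluster comparison involves several features and pitch types, a stricter threshold or a multivariate test would guard against chance findings.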
That said, this is how Zito's pitch mix has changed according to my classification scheme:
My classification scheme wasn't perfect. While I succeeded in avoiding duplicates, about one pitch out of every hundred was left unclassified. However, since the total is low and the distribution is even across both seasons, I'm not too concerned about this. For the curious, most of these pitches are curves that fell outside the horizontal break range of most curves, and sliders that fell outside of the spin range of most sliders.
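Tallying the mix and the unclassified rate is simple arithmetic. A minimal sketch, assuming each season's pitches arrive as a list of labels with `None` marking unclassified pitches, and assuming shares are computed over classified pitches only (the post doesn't say which denominator its chart uses):

```python
from collections import Counter


def pitch_mix(labels):
    """labels: classified pitch types for one season, None = unclassified.
    Returns ({pitch_type: share of classified pitches}, unclassified rate)."""
    counts = Counter(labels)
    unclassified = counts.pop(None, 0)
    classified = sum(counts.values())
    shares = {t: c / classified for t, c in counts.items()}
    return shares, unclassified / len(labels)


def mix_change(labels_2009, labels_2010):
    """Percentage-point change in each pitch type's share, 2009 -> 2010."""
    s09, _ = pitch_mix(labels_2009)
    s10, _ = pitch_mix(labels_2010)
    return {t: 100 * (s10.get(t, 0) - s09.get(t, 0)) for t in set(s09) | set(s10)}
```

Comparing the unclassified rate from each season is a quick way to check the claim that unclassified pitches are spread evenly across both years.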
Question #2: Is Zito really performing better, or is he just luckier? What statistics can help us answer this question?
Let's take a look again at the difference between Zito's performance over the first nine starts of this season and last (data from Baseball Reference and Brooks Baseball):
We know that Zito's winning more and that he's allowing fewer runs and fewer bases, but of course these stats give him credit for his offense and defense. The same goes for WPA and WPA/LI. I also noted in the previous post that Zito's BABIP is down and likely to regress toward his career average at some point, which suggests Zito's been a bit lucky. The dramatic drops in Zito's K/9 and HR/FB rates are also good indications that his performance is unsustainable: fewer strikeouts mean more balls in play, and a home-run rate that low rarely holds up.
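For reference, these are the standard formulas behind the rate stats discussed above. The sketch leaves the inputs to the reader; no figures from the tables in this post are reproduced, and the function names are just illustrative.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def hr_per_fb(hr, fb):
    """Share of fly balls that left the park."""
    return hr / fb


def k_per_9(k, ip):
    """Strikeouts per nine innings; ip in true innings (58 1/3 -> 58.333...)."""
    return 9 * k / ip
```

Because BABIP and HR/FB depend so heavily on defense, park, and luck, both tend to drift back toward a pitcher's career norms, which is the regression this paragraph anticipates.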
On the other hand, Zito's FIP is down substantially, and his xFIP is down as well. Zito's FIP is clearly aided by his abnormally low HR rate, but it would still be 0.50 better this year than last even if he had allowed six dingers this year rather than just the one. While some regression is clearly in order, it does look as if Zito's pitching better, even when we include last week's disaster.
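The "six dingers" counterfactual comes down to FIP's home-run term: each extra homer adds 13/IP to FIP. Below is a sketch using the standard FIP weights. The 3.2 constant is the ERA constant this post cites for xFIP (in practice it varies by season), the league HR/FB rate is a placeholder rather than a real league figure, and the helper names are hypothetical.

```python
def innings(box_ip):
    """Convert box-score innings (58.2 means 58 and 2/3) to true innings."""
    whole = int(box_ip)
    outs = round((box_ip - whole) * 10)
    return whole + outs / 3


def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching, scaled to ERA by `constant`."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_fb=0.105, constant=3.2):
    """xFIP: FIP with home runs replaced by fly balls times a league HR/FB rate.
    lg_hr_fb=0.105 is a PLACEHOLDER, not a figure from this post."""
    return fip(fb * lg_hr_fb, bb, hbp, k, ip, constant)


def fip_with_extra_hr(extra_hr, ip):
    """How much FIP rises if `extra_hr` more home runs are charged over `ip` innings."""
    return 13 * extra_hr / ip
```

For a hypothetical 58 innings, five extra homers would add 13 × 5 / 58, or about 1.12, to FIP; whether the year-over-year gap survives that penalty depends on the actual innings and counts in the tables.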
Before moving on, a word about defense-independent pitching stats: we know that xFIP does a better job of predicting future ERA than FIP, and that FIP does a better job than ERA itself. This doesn't mean that we should ignore FIP or ERA, it just means that we should be more confident about the stats that are better predictors. It isn't appropriate to throw out a good stat just because a better one also exists. A good stat is still a good stat, and if two good stats tell a slightly different story, that's a reason to look more closely, not a reason to discard useful information.
Question #3: What has Zito done to improve his ground ball rate?
Honestly? I'm not entirely sure. The PitchFX data tells me that his pitch selection is more balanced, and that he's relying on pitches with more break (his curve and two-seamer), so I skipped a couple of the steps required by Aristotle's logic and suggested that the two could be related. I theorized that hitters had thus far been unable to hit the ball squarely, and were less able to sit on Zito's four-seamer, which breaks less than his other pitches and reaches the plate at a velocity reminiscent of my days playing Class-E prep school ball in Connecticut.
Can I back that up with the stats? Not really, but I can tell you that Zito has done a better job this season keeping his two-seamer and curve out of play, and a much better job of throwing the two-seamer for strikes. See the chart below:
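The two rates in that chart can be computed per pitch type from pitch-by-pitch results. A minimal sketch; the result codes are hypothetical, and counting fouls and balls in play as strikes is an assumed (though common) convention for strike percentage.

```python
from collections import defaultdict

# Hypothetical result codes. Strikes include called and swinging strikes,
# fouls, and balls in play (an assumed convention for strike percentage).
STRIKE_RESULTS = {"called_strike", "swinging_strike", "foul", "in_play"}


def rates_by_type(pitches):
    """pitches: iterable of (pitch_type, result) pairs for one season.
    Returns {pitch_type: (in_play_rate, strike_rate)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # [pitches, in play, strikes]
    for ptype, result in pitches:
        t = totals[ptype]
        t[0] += 1
        t[1] += result == "in_play"
        t[2] += result in STRIKE_RESULTS
    return {p: (t[1] / t[0], t[2] / t[0]) for p, t in totals.items()}
```

Running it once per season and comparing the two-seamer and curve rows gives the kind of year-over-year comparison the chart shows.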
Question #4: Is Zito's cutter really different from his two- or four-seamer?
Yes, absolutely. If you look at the PitchFX charts at the top of this post, it's pretty clear that neither I nor PitchFX is confusing his cutter with his other fastballs or his curve, though we may be confusing it with his slider.
Question #5: Why didn't I provide more information about strikes swinging and strikes looking, location, and Zito's whole arsenal?
Because I didn't expect anyone to read a post quite as long as this one, I left out some data for the sake of brevity. I focused on Zito's curve, two-seamer, and four-seamer because those were the pitches that varied most between 2009 and 2010. Likewise, I ignored strike ratios because the data was essentially unchanged between the two seasons.
As for location, there are several reasons why I didn't analyze it. First, while I'd love to do what Jeremy Greenhouse is doing over at Baseball Analysts, I'm just not that good (not yet, anyway). Second, this type of analysis has a few issues: location provides little information without knowledge of the pitcher's intent, the data is subject to ballpark bias, and pitchers respond strategically to umpires' shifting zones. Together, those issues convinced me to stay away from this type of work.
Question #6: How can I make an informed decision about Zito's performance when my sample size is only eight (now nine) games?
Simple: nine games isn't the sample. The sample size depends on the denominator of each statistic, which ranges from innings pitched (n=117) to pitches thrown (n=1868). Using a basic margin-of-error calculation (assuming a 95% confidence level and a population equal to two full years' worth of data), inning-based stats in this data set carry a margin of error of about 8% in either direction, pitch-based stats just under 2%, and all the others somewhere in between.
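One way to reproduce figures of this size is the textbook margin of error for a proportion with a finite population correction. The post doesn't give its population sizes, so the two-season totals used below (about 380 innings and 6,200 pitches) are assumptions, as is the conservative choice of p = 0.5.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion at ~95% confidence (z = 1.96),
    shrunk by the finite population correction sqrt((N - n) / (N - 1))."""
    base = z * math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return base * fpc


innings_moe = margin_of_error(117, 380)    # assumed two-season innings total
pitches_moe = margin_of_error(1868, 6200)  # assumed two-season pitch total
```

Under those assumptions, the 117-inning sample comes out near 7.5% and the 1,868-pitch sample near 1.9%, in line with the roughly 8% and just-under-2% figures above. The correction matters here because each sample is a sizable fraction of the assumed population.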
For instance, Zito's xFIP over the first nine games has improved by 0.60 from one season to the next, from 5.20 to 4.60. xFIP is scaled to ERA, so those figures include an ERA constant of 3.2. Before that scaling, Zito's "pure" xFIP quotients from 2009 and 2010 are 2.00 and 1.40, respectively. This indicates a 30% improvement for Zito, which falls well beyond the margin of error for this statistic at this point in the season.
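Removing the constant before comparing matters, because the same constant inflates both figures and so shrinks the apparent relative change. A sketch of that arithmetic, using the 5.20, 4.60, and 3.2 figures from the paragraph above (the function name is illustrative):

```python
def unscaled_change(old, new, constant=3.2):
    """Relative improvement after removing the ERA-scaling constant from both values."""
    raw_old, raw_new = old - constant, new - constant
    return (raw_old - raw_new) / raw_old


pure_change = unscaled_change(5.20, 4.60)  # strips 3.2 from each figure first
scaled_change = (5.20 - 4.60) / 5.20       # naive comparison of the scaled figures
```

Here `unscaled_change(5.20, 4.60)` works from 2.00 and 1.40 and returns about 0.30, while dividing the 0.60 gap by the scaled 5.20 would suggest only about 11.5%. Either way, the change clears a margin of error of roughly 8% for inning-based stats.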
Question #7: Can I say more about what Zito's low BABIP and pitch selection will mean moving forward?
Sure, so long as you know that my most confident predictions will be the least interesting, and that my most interesting predictions will be only educated guesses. If you're looking for something better than that, I suggest you check out ZiPS, or CHONE, or MARCEL--but here goes nothing:
- Regardless of pitch selection, Barry Zito hasn't transformed into Mariano Rivera circa 1997. He won't maintain his outrageously low HR/FB rate. As it regresses to the mean, Zito's ERA, FIP, WPA and W-L% will suffer. The same thing will happen as Zito's BABIP regresses.
- On the bright side, the fact that the velocity-break trade-off seems to be working means that Zito may be able to use his two-seamer and his curve to keep the ball out of play, lessening the damage he'll sustain during reentry. I predict that if Zito can keep the break on his two-seamer and curve without losing more velocity, he can avoid a complete regression as the season progresses.
- On the other hand, Zito hasn't done a very good job of maintaining his velocity or break of late. In his latest disaster, Zito completely lost both. The chart below shows the speed and break of Zito's pitches, by pitch type, as Lowess curves over the course of the season to date (2009 in navy, 2010 in green):
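A Lowess curve is a locally weighted regression: each point on the curve comes from a straight line fitted to nearby points, with closer points weighted more heavily. The sketch below is a simplified single-pass version (tricube weights, no robustness iterations), not the code behind the chart; `frac` is an assumed smoothing span, and for real work an existing implementation such as statsmodels' `lowess` would be the safer choice.

```python
import math


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess(xs, ys, frac=0.5):
    """Single-pass LOWESS without robustness reweighting.

    For each x0, fit a weighted straight line using the nearest frac*n points,
    weighting each point by the tricube of its distance scaled by the bandwidth.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (including x0 itself).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)  # x0 itself always has weight 1, so sw > 0
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

With start number as `xs` and, say, average four-seam velocity per start as `ys` (both hypothetical here), the fitted values trace a smoothed season-long trend like those in the chart; a smaller `frac` follows start-to-start swings more closely.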
To make a long story short (TOO LATE!), my interpretation of the data is that Zito has clearly changed the way he's pitching, that he's clearly pitching better, and that he can avoid a complete regression, but that we're probably already witnessing his return to earth. Even in yesterday's no-decision against the Nats, Zito looked like he was overthrowing. As a casual observer following the Nats-Giants game on MLB At Bat and tracking his pitches, it seemed to me that Zito was overcompensating for his diminished velocity, giving up control in the attempt.
If the former Cy Young winner can't at least maintain the break and speed he had at the beginning of 2010, the law of averages isn't going to be his biggest problem. Finally, I'd like to say that it's both difficult and rewarding writing for the smartest audiences. Thanks for keeping me honest, and I'll do my best to meet your expectations in the future, even if my future posts aren't quite as long or in-depth as this one.