Barry Zito Redux: Bigger Breaks, Less Zip, More Questions

Last week's Pitch FX analysis of Barry Zito garnered a great deal of attention from across the web and from the Beyond the Box Score community. Thankfully, BtB is blessed with a highly educated group of followers, many of whom posed informed questions and provided helpful, well-thought-out suggestions. In response to comments from Mike Fast, vivaelpujols, Missing Barry and others, I present to you a far more in-depth comparison of Zito's 2009 and 2010 performances.

Please wake me when you're finished.

Question #1: Which is the more relevant shift, Zito's pitch repertoire or MLB's pitch classification algorithm?

A number of commenters suggested that the reported change in Barry Zito's pitch selection was due to changes in the Pitch FX algorithm rather than actual modifications by the pitcher. To test this, I needed to develop a (rather rough*) pitch classification scheme and apply it evenly to both seasons' data.

Having done so, I feel I can answer this question one of two ways. One possibility is that Zito is indeed pitching more two-seamers and curves this year. Another possibility is that the embattled Giants hurler is throwing the same 4-seamer, but it's coming in slower, breaking less vertically, and breaking more horizontally--and he's still throwing more curves.

See the charts below (data from Brooks Baseball's PitchFX Tool):

[Chart: pitch spin]

[Chart: pitch break]

*By rough I mean this: I started by measuring the 1st-99th percentile range for pitches classified across both seasons based on velocity, spin angle, horizontal and vertical break. When ranges overlapped, I split the difference between them. For unclassified pitches, I widened the classification ranges until they included unidentified pitches without creating more duplicates. I considered myself successful when 98% of pitches were classified and there were no duplicates.
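For anyone who wants to try something similar, here is a minimal sketch of the first step of that scheme: build a 1st-99th percentile "box" for each pitch type, then label a pitch only when it falls inside exactly one box. The function names, feature names, and toy pitches are all made up for illustration, and the sketch leaves out the overlap-splitting and range-widening steps, so it should not be read as the exact procedure used for this post.

```python
from statistics import quantiles

# Hypothetical feature names; real PitchFX exports use their own column labels.
FEATURES = ("velocity", "spin_angle", "h_break", "v_break")


def percentile_box(pitches, lo_pct=1, hi_pct=99):
    """Per-feature (low, high) bounds covering the lo_pct..hi_pct percentiles."""
    box = {}
    for feature in FEATURES:
        values = [pitch[feature] for pitch in pitches]
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        box[feature] = (cuts[lo_pct - 1], cuts[hi_pct - 1])
    return box


def classify(pitch, boxes):
    """Return the one pitch type whose box contains the pitch, else None."""
    matches = [
        name
        for name, box in boxes.items()
        if all(low <= pitch[f] <= high for f, (low, high) in box.items())
    ]
    # Zero matches (unclassified) and multiple matches (duplicates) both give None.
    return matches[0] if len(matches) == 1 else None


# Made-up illustrative pitches, not Zito's actual PitchFX data.
curves = [
    {"velocity": 70 + 0.5 * i, "spin_angle": 30 + i,
     "h_break": 4 + 0.1 * i, "v_break": -8 + 0.1 * i}
    for i in range(10)
]
four_seamers = [
    {"velocity": 84 + 0.3 * i, "spin_angle": 150 + i,
     "h_break": 2 + 0.1 * i, "v_break": 9 + 0.1 * i}
    for i in range(10)
]
boxes = {"curve": percentile_box(curves), "4-seam": percentile_box(four_seamers)}

print(classify({"velocity": 72, "spin_angle": 34, "h_break": 4.5, "v_break": -7.5}, boxes))
print(classify({"velocity": 80, "spin_angle": 90, "h_break": 3.0, "v_break": 0.0}, boxes))
```

The share of pitches that come back as None is the number to watch: it corresponds to the unclassified and duplicate pitches that the scheme tries to minimize.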

 

There is certainly a notable shift in the speed, spin, and break of Zito's various pitches between the two seasons. Whether these represent valid changes in the classification of his pitches, or whether they are an artifact of the difficulty of divining pitch types from the behavior of the baseball, I'll leave up to you. The shift in pitch clusters is statistically significant and visually noticeable, so I am confident that the change in pitch ID has more to do with Barry Zito than the MLBAM algorithm.

That said, this is how the ratio of Zito's pitches thrown has changed according to my classification scheme:

Season   Change   Curve   Cutter   4-Seam   2-Seam   Slider   Unclassified
2009     16.3%    14.5%   4.5%     45.6%    5.6%     10.8%    1.3%
2010     15.0%    21.0%   3.2%     30.7%    16.2%    10.8%    1.5%
Diff     -1.3%    6.5%    -1.4%    -14.9%   10.6%    0.0%     0.2%

My classification scheme wasn't perfect. While I succeeded in avoiding duplicates, about one pitch out of every hundred was left unclassified. However, since the total is low and the distribution is even across both seasons, I'm not too concerned about this. For the curious, most of these pitches are curves that fell outside the horizontal break range of most curves, and sliders that fell outside of the spin range of most sliders.

Question #2: Is Zito really performing better, or is he just luckier? What statistics can help us answer this question?

Let's take a look again at the difference between Zito's performance over the first nine starts of this season and last (data from Baseball Reference and Brooks Baseball):

Season   K/9     K-BB   HR   HR/FB   OPS      Strikes   Look    Swing   W-L%    WPA     WPA/LI   ERA     FIP     xFIP
2009     5.95    11     6    6.1%    0.713    62.0%     19.0%   7.0%    0.167   0.294   0.294    4.02    4.89    5.20
2010     5.46    15     1    1.0%    0.583    63.0%     18.0%   7.0%    0.750   1.410   1.343    2.80    3.23    4.60
Diff     -0.49   4      -5   -5.2%   -0.130   1.0%      -1.0%   0.0%    0.583   1.116   1.049    -1.22   -1.65   -0.60

We know that Zito's winning more and that he's allowing fewer runs and fewer bases, but of course these stats give him credit for his offense and defense. Same goes for WPA and WPA/LI. I also noted in the previous post that Zito's BABIP is down, and that it is likely to regress back to his career average at some point, indicating that Zito's been a bit lucky. The fact that Zito's K/9 and HR/FB rates are down so dramatically is also a good indication that his performance is unsustainable.

On the other hand, Zito's FIP is down substantially, and his xFIP is down as well. But while Zito's FIP is clearly aided by his abnormally low HR rate, Zito's FIP would be 0.50 better this year than last even if Zito had allowed 6 dingers this year rather than just the one. While some regression is clearly in order, it does look as if Zito's pitching better, even when we include last week's disaster.
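To see how much a handful of home runs moves FIP, here is a rough sketch using the commonly cited form of the stat, (13*HR + 3*BB - 2*K) / IP plus a constant. Two things are assumptions on my part: that this simple form (without hit batsmen) is close enough to the calculation behind the table, and the 58-inning figure, which is only a placeholder for nine starts. The function names are hypothetical.

```python
def fip(hr, bb, k, ip, constant=3.2):
    """Common simplified FIP: (13*HR + 3*BB - 2*K) / IP + constant."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def fip_change_from_extra_hr(extra_hr, ip):
    """How much FIP rises if the pitcher allows extra_hr more home runs."""
    return 13 * extra_hr / ip


# Assumed innings total for nine starts (a placeholder, not a reported figure).
assumed_ip = 58

# Five more homers (6 instead of 1) would add about 1.12 to FIP over 58 innings.
print(round(fip_change_from_extra_hr(5, assumed_ip), 2))
```

Under that assumed innings total, giving back five homers costs a little over a run of FIP, which would still leave a gap of roughly half a run out of the 1.65 improvement shown in the table.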

Before moving on, a word about defense-independent pitching stats: we know that xFIP does a better job of predicting future ERA than FIP, and that FIP does a better job than ERA itself. This doesn't mean that we should ignore FIP or ERA, it just means that we should be more confident about the stats that are better predictors. It isn't appropriate to throw out a good stat just because a better one also exists. A good stat is still a good stat, and if two good stats tell a slightly different story, that's a reason to look more closely, not a reason to discard useful information.

Question #3: What has Zito done to improve his ground ball rate?

Honestly? I'm not entirely sure. The PitchFX data tells me that his pitch selection is more balanced, and that he's relying on pitches with more break (his curve and two-seamer), so I skipped a couple of the steps required by Aristotle's logic and suggested that the two could be related. I theorized that hitters were thus far unable to hit the ball squarely, and were less able to sit on Zito's four-seamer, which breaks less than his other pitches and reaches the plate at a velocity reminiscent of my days playing Class-E prep school ball in Connecticut.

Can I back that up with the stats? Not really, but I can tell you that Zito has done a better job this season keeping his two-seamer and curve out of play, and a much better job of throwing the two-seamer for strikes. See the chart below:

Result    Season   Change   Curve   Cutter   4-Seam   2-Seam   Slider
Ball      2009     37.2%    36.5%   41.9%    36.7%    44.4%    34.3%
          2010     37.2%    38.9%   44.8%    34.4%    37.6%    41.4%
          Diff     0.0%     2.4%    3.0%     -2.3%    -6.9%    7.1%
Strike    2009     44.2%    43.8%   39.5%    46.0%    31.5%    47.1%
          2010     46.0%    43.0%   37.9%    47.2%    41.6%    43.4%
          Diff     1.8%     -0.8%   -1.6%    1.1%     10.1%    -3.6%
In Play   2009     18.6%    19.7%   18.6%    17.2%    24.1%    18.6%
          2010     16.8%    18.1%   17.2%    18.4%    20.8%    15.2%
          Diff     -1.8%    -1.6%   -1.4%    1.2%     -3.3%    -3.5%

Question #4: Is Zito's cutter really different from his two- or four-seamer?

Yes, absolutely. If you look at the PitchFX charts at the top of this post, it's pretty clear that neither I nor PitchFX are confusing his cutter with his other fastballs or curve--but maybe with his slider.

Question #5: Why didn't I provide more information about strikes swinging and strikes looking, location, and Zito's whole arsenal?

Because I didn't expect you to read a post quite as long as this one, and so I left out some data for the sake of brevity. I focused on Zito's curve, two- and four-seamers because those were the pitches in which we see the most variation between 2009 and 2010. Likewise, I ignored strike ratios because the data was essentially unchanged between 2009 and 2010.

As far as location goes, there are several reasons why I didn't employ an analysis of pitch location. First, while I'd love to do what Jeremy Greenhouse is doing over at Baseball Analysts, I'm just not that good--not yet. That and a few issues with this type of analysis--that location provides little information absent knowledge of intent, the presence of ballpark bias, and pitchers' strategic responses to umpires' shifting zones--convinced me to stay away from this type of work.

Question #6: How can I make an informed decision about Zito's performance when my sample size is only eight (now nine) games?

Simple, because nine games isn't the sample. The sample is dependent upon the common denominator of the statistic, which ranges from innings pitched (n=117) to pitches thrown (n=1868). Employing a basic margin of error calculation (while assuming a 95% confidence interval and a population equal to two full years worth of data), we can tell that inning-based stats in this data set sway about 8% in either direction, while pitch-based stats err just short of 2%, and all the others err somewhere in between.
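A minimal sketch of that kind of calculation follows, assuming the textbook margin of error for a proportion (worst case p = 0.5, z = 1.96 for 95%) with a finite population correction. The population sizes of 400 innings and 6,500 pitches are my own placeholders for "two full years worth of data" and are not figures from the data set.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Margin of error for a proportion, with optional finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction shrinks the error when n is a large share of N.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Sample sizes come from the post; population sizes are assumed placeholders.
print(f"innings: {margin_of_error(117, population=400):.1%}")
print(f"pitches: {margin_of_error(1868, population=6500):.1%}")
```

With those assumed populations the two results land near 8% and just under 2%. Without the correction the inning-based figure would be closer to 9%.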

For instance, Zito's xFIP over the first 9 games of both seasons has improved by 0.60, from 5.20 to 4.60. xFIP is scaled to ERA, so those figures include an ERA constant of 3.2. Before xFIP is scaled to ERA, Zito's "pure" xFIP quotients from 2009 and 2010 are 2.00 and 1.40, respectively (5.20 and 4.60 minus the 3.2 constant). This indicates a 30% improvement for Zito, which falls easily beyond the margin of error for this statistic at this point in the season.

Question #7: Can I say more about what Zito's low BABIP and pitch selection will mean moving forward?

Sure, so long as you know that my most confident predictions will be the least interesting, and that my most interesting predictions will be only educated guesses. If you're looking for something better than that, I suggest you check out ZiPS, or CHONE, or MARCEL--but here goes nothing:

  1. Regardless of pitch selection, Barry Zito hasn't transformed into Mariano Rivera circa 1997. He won't maintain his outrageously low HR/FB rate. As it regresses to the mean, Zito's ERA, FIP, WPA and W-L% will suffer. The same thing will happen as Zito's BABIP regresses.

  2. On the bright side, the fact that the velocity-break trade-off seems to be working means that Zito may be able to employ his two-seamer and his curve to keep the ball out of play, lessening the damage he'll sustain during reentry. I predict that if Zito can keep the break on his two-seamer and curve, without losing more velocity, he can still avoid regressing entirely as the season progresses.

  3. On the other hand: Zito doesn't look like he's doing a very good job maintaining break or velocity of late. In his latest disaster, Zito completely lost both. The chart below expresses the speed and break of Zito's pitches, per pitch type, as a Lowess curve over the course of the season to date (2009 in Navy, 2010 in Green):

[Chart: break by pitch type over the season to date]

[Chart: velocity by pitch type over the season to date]

To make a long story short (TOO LATE!) my interpretation of the data is that it's clear Zito has changed the way he's pitching, that it's clear he's pitching better, and that I think he can avoid a complete regression, but that we're probably already witnessing his return to earth. Even in yesterday's no-decision against the Nats, Zito looked like he was overthrowing. As a casual observer following the Nats-Giants game on MLB At Bat and tracking his pitches, it seemed to me that Zito was overcompensating for his diminished velocity, giving up control in the attempt.

If the former Cy Young winner can't at least maintain the break and speed he had at the beginning of 2010, the law of averages isn't going to be his biggest problem. Finally, I'd like to say that it's both difficult and rewarding writing for the smartest audiences. Thanks for keeping me honest, and I'll do my best to meet your expectations in the future, even if my future posts aren't quite as long or in-depth as this one.

Poll
Is Zito's 2010 performance real or just a mirage?
Real, and he'll outperform the mean this season.
27 votes
It was real, but he's losing too much speed and break.
41 votes
It was all luck, and the regression's already started.
21 votes

89 votes | Poll has closed

Comments

Polls are a funny thing.

Why not this:
“it’s definitely real, but a little luck too. He’ll probably regress, but I’m not certain.”

;)

by Justin Bopp on May 28, 2010 10:17 AM EDT reply actions  

HA!

Blogger and Editor, Rational Pastime Blog (http://www.rationalpastime.com/)

by J-Doug on May 28, 2010 12:41 PM EDT up reply actions  

Absolute value of the break

To account for the fact that some pitches break up or down, left or right.

Blogger and Editor, Rational Pastime Blog (http://www.rationalpastime.com/)

by J-Doug on May 28, 2010 12:13 PM EDT up reply actions  

?

How did you calculate it?

Fuzz

by RZ on May 28, 2010 3:15 PM EDT up reply actions  

Presumably

pfx = sqrt( pfx_x ^ 2 + pfx_z ^2 )

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 3:21 PM EDT up reply actions  

Maybe

I just never heard of “absolute break”.

Fuzz

by RZ on May 28, 2010 3:23 PM EDT up reply actions  

Some other people do it this way

I’m not 100% sold on it, but it’s certainly a legit approach.

Some other approaches:
spin deflection (i.e. break) relative to the average four-seam fastball
spin deflection + gravity (relative to fastball, or not)
spin axis angle (but this doesn’t work well for sliders, with a spin axis that mostly points toward home plate)

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 3:28 PM EDT up reply actions  

I'm not sure...

…what it’s called in the field. I actually didn’t post the pythagorean in that particular graph, just the sum of the absolute value of the horizontal and vertical breaks of his various pitches. I could have broken it up, but the trend looks the same.

I used pythagoras in the last post, but the more I think about it I’m turned off by it. The value of the hypotenuse of a right triangle is far more sensitive to changes in the longer side than the shorter one, which means that it will undervalue variance in the smaller of the pfx values and overvalue the variance of the larger ones, which is exactly the opposite of what you’d want to do, since an equal change in the “shorter” dimension is probably more, not less, important.

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 4:10 PM EDT up reply actions  
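The two break measures discussed in this exchange can be compared with a short sketch. The names pfx_x and pfx_z follow the formula quoted earlier in the thread; the function names and the sample break values (in inches) are made up purely to show how each measure reacts to a one-inch change.

```python
import math


def euclidean_break(pfx_x, pfx_z):
    """Pythagorean break: sqrt(pfx_x^2 + pfx_z^2)."""
    return math.hypot(pfx_x, pfx_z)


def summed_abs_break(pfx_x, pfx_z):
    """Sum of the absolute horizontal and vertical breaks."""
    return abs(pfx_x) + abs(pfx_z)


# Illustrative pitch: 2 inches of horizontal break, 10 inches of vertical break.
base_x, base_z = 2.0, 10.0

for name, measure in (("euclidean", euclidean_break), ("summed", summed_abs_break)):
    base = measure(base_x, base_z)
    gain_short = measure(base_x + 1, base_z) - base  # add an inch to the shorter side
    gain_long = measure(base_x, base_z + 1) - base   # add an inch to the longer side
    print(f"{name}: +{gain_short:.2f} (shorter side), +{gain_long:.2f} (longer side)")
```

The Pythagorean measure barely moves when the smaller component grows by an inch but moves almost a full inch when the larger one does, while the summed measure treats both changes equally. Neither choice settles the question of which is closer to what a batter actually perceives.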

Do you think batters see in two orthogonal dimensions

with axes fixed at right angles to the plane of the ground?

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 29, 2010 10:44 AM EDT up reply actions  

I still think he's not throwing any more two-seamers based on the charts you showed

There is no difference in the movement of the pitches, and I’m skeptical that small differences in velocity are anything other than random variation, especially since the velocity on all of his pitches is down, not just the two-seamers. I’ll run my K-Means clustering when I get home later to verify for sure.

And to your point about location, yes there are probably some small park and umpire effects, but mostly what you are looking for in location is general trends and Pitch f/x is perfectly fine for that. Is he throwing more balls down in the zone, is he throwing more pitches outside the strike zone, etc. Location is a huge part of success, so you can’t just not look at it.

I also disagree that he’s pitched better this year. Of the most sustainable stats, K’s, BB’s and GB%, his K’s are way down, his BB’s are slightly down and his GB% is slightly up. According to FanGraphs, his xFIP is up by .3 points from 2009, so I’m not really sure where you are getting your numbers from. Are you calculating them yourself? He is getting fewer line drives, but that stat is very unsustainable for pitchers, having virtually no year to year correlation.

I appreciate the follow up and this is a very thorough article, but I’m just not convinced that anything has changed with Zito, besides slightly lowered velocity on his pitches.

by vivaelpujols on May 28, 2010 2:22 PM EDT reply actions   1 recs

I agree with VEP

You are not classifying his two-seamer vs. four-seamer properly, particularly in 2009.

“If you look at the PitchFX charts at the top of this post, it’s pretty clear that neither I nor PitchFX are confusing his cutter with his other fastballs or curve--but maybe with his slider.”

Right. He doesn’t throw a cutter. Those are all sliders.

“As far as location goes, there are several reasons why I didn’t employ an analysis of pitch location. First, while I’d love to do what Jeremy Greenhouse is doing over at Baseball Analysts, I’m just not that good--not yet. That and a few issues with this type of analysis--that location provides little information absent knowledge of intent, the presence of ballpark bias, and pitchers’ strategic responses to umpires’ shifting zones--convinced me to stay away from this type of work.”

If you look at location, you will find it provides a lot of information and for most pitches (typically, other than the four-seam fastball) you CAN actually tell where the pitcher intended to throw it.

Ballpark bias and umpire zones make a difference on the order of an inch or two. It’s something to be concerned about if you’re trying to measure the strike zone. It’s nothing to be concerned with if you’re looking at pitcher strategy and execution.

This approach really troubles me. It seems that you’re starting with the conclusion that something has changed with his approach because his ERA is lower, looking for anything that has changed at the pitch level , and then conjuring an explanation to causally connect the two. Take any pitcher and look at enough things and you will find something that appears to have changed from one year to the next, whether it really has or not, and whether that is significant to his performance or not.

Look at where his results have changed (balls in play, walks, strikeouts) and work backward from that to see what you can explain, if anything. It’s an acceptable conclusion (and the one I most often reach, particularly on sub-season size samples) to say that you can’t determine if anything significant has changed. (And by “significant”, I mean anything meaningfully and persistently affecting game results, not statistical significance.)

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 3:14 PM EDT up reply actions  

Working backwards

Mike: most of what I would respond to you I’ve already said to vivaelpujols, so see my response below. I’m employing a classification that derives from MLBAM’s assumptions but that is consistent for both years, eliminating the issue of year-to-year changes in the algorithm. He’s still throwing more curves—if you disagree with this then I’d like to see the data contradicting mine. His fastballs come in slower with less v-break and more h-break (I mistyped in my response to VEP). This tells me that there’s a good chance he’s made a conscious modification, and there’s very little question that batters are trying to hit a fastball that breaks more in or away, and that those who are sitting on what the PitchFX tool sees as his four-seamer from 2009 won’t be so often rewarded.

But I can’t help you if you’re troubled by my approach, your perception of which is not accurate (and I’d thank you not to infer my motives without any information about them). If you’re interested, this was my chain of thought: 1) Zito’s winning more, so I should see if I can explain that change away with luck, offense and defense. 2) I found that there was definitely some luck involved, but that there were also noticeable changes in GB and BB rates, as well as FIP and xFIP. 3) Having concluded that Zito isn’t just luckier this year, I aimed to investigate whether or not he changed his approach in any way. 4) I concluded that he has (to which many of you responded I wasn’t rigorous enough, and having been more rigorous I still conclude that he has), and then I offered that these modifications could be responsible for the change in outcomes. I’m fine if you’re not satisfied with that, it’s a free Internet.

Finally, I did look at his end results, as you can see in my post. There is significant change, and by significant I mean statistically significant, because the tools exist to determine the likelihood that the variance in two samples is random chance, and because I know how to use them. Feel free to disagree with my findings, and to claim that I’m not classifying the pitches properly (by now you should have figured out that the two-seamer vs. four-seamer issue is far less important than what the actual pitch is doing, and probably less important than curveball usage). If you think I’m wrong, please show me the data.

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 3:32 PM EDT up reply actions  

There is significant change, and by significant I mean statistically significant, because the tools exist to determine the likelihood that the variance in two samples is random chance, and because I know how to use them.

No, you don’t.

What you know how to do is determine if the variance in two samples is likely to be due to random chance, assuming the two samples were picked at random.

But they weren’t. There are at bare minimum 150 starters in MLB this season (30*5), and you picked one of them because you know ahead of time that there was a difference between the two samples.

Given (at minimum) 150 starters, we know that even at 95% significance we’re likely to see, what, 7-8 starters with “statistically significant” changes due to random chance alone, right?

(And that presumes you are calculating statistical significance properly – and if you’ve found that a 0.60 difference in xFIP is significant in 117 innings you’re not.)

by cwyers on May 29, 2010 1:09 AM EDT up reply actions  
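The multiple-comparisons point in the comment above can be checked with a couple of lines. This sketch assumes each starter is an independent test at the 5% level, which is a simplification, and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when no real effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


starters = 150  # 30 teams times 5 rotation spots, as in the comment above
print(expected_false_positives(starters))     # 7.5
print(round(prob_at_least_one(starters), 4))  # 0.9995
```

So with 150 starters, seven or eight apparent changes are expected from chance alone, and it is close to certain that at least one will turn up.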

To quote from the econometrics text by Wallis and Roberts
It is essential not to confuse the statistical usage of “significant” with the everyday usage. In everyday usage, “significant” means “of practical importance,” or simply “important.” In statistical usage, “significant” means “signifying a characteristic of the population from which the sample is drawn,” regardless of whether the characteristic is important.

(emphasis mine)

And from Freedman, Pisani, and Purves:

…“significance” is a technical word. A test can only deal with the question of whether a difference is real [permanent in Venn’s sense], or just a chance variation. It is not designed to see whether the difference is important.

Unless one develops and applies some baseball understanding to the data, it will be difficult to come to conclusions of any real importance, whether or not your tests for statistical significance are met.

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 29, 2010 10:58 AM EDT up reply actions  

We can agree to disagree on this

Easiest responses first:

1. I’m comparing Zito’s xFIP over the first 9 starts of 2009 and the first 9 starts of 2010, not his partial 2010 to his full 2009. If you’ll recall, Zito performed better between June-Sep last year than from Apr-May. Also over this same time period, his BB/9 is down from 3.86 to 3.10, which is a better improvement than his K/9’s deterioration. Yes, I am calculating the numbers myself, using B-R data. FWIW, the change in his K/9 is just barely outside the margin of error, while the change in his BB/9 is well outside of it.

2. As I noted, I can’t tell whether he’s actually throwing more two-seamers, and I felt I was clear that I’m willing to concede that. However, my findings are that the shifts in horizontal break, vertical break and spin—whether we look at the cluster that just includes FT and FF or whether we throw CH in there too (since it breaks in a similar manner)—are significant at 0.001. His velocity shift is weakly significant at 0.1. I think it’s a perfectly safe conclusion that he’s losing some speed, which means A) a pitch with the same spin deflection will break more anyway and/or B) Zito is purposely adding more break to his fastball arsenal.

If you want to double check me, make sure you’re comparing the data from starts 1-9 in 2009 and starts 1-9 in 2010, not starts 1-9 in 2010 to his full 2009. I haven’t yet checked to see if maybe this is a correction Zito made over the course of last season that just carried over.

Also, let’s not forget the curve. According to the data there’s no denying at this point that he’s throwing his curve more, esp. since his slider percentage (which the curve is most likely to be confused with) is identical. What I get from this is that hitters who were willing to sit on Zito’s slow, moderately breaking fastball last year could do some damage, but do that this year and he’s more likely to throw a curve in the zone (which is coming in for fewer strikes per pitch but more total strikes overall) or throw a pitch that looks very similar but will break in or away, depending on your stance. Probably should have looked at this via batter handedness, now that I think about it.

3. I agree that location is important, but I think it’s a bit rigid to say that it’s an absolute requirement. The work in this area has shown that ballpark effects aren’t necessarily small or easily controlled for, and that error is going to be much bigger when comparing 18 games than two full seasons. Same goes for umpire effects—it’s not appropriate to assume that the effect is going to wash out in this dataset.

Additionally, the type of geometry involved in measuring location means that year-to-year and park-to-park differences yield more error than when measuring things like speed, spin, and break. Same goes for release point.

Moreover, I have absolutely, positively no confidence that the top and bottom of the strike zones are measured accurately. Combine park and season error with the fact that umpires’ judgments of the top and bottom of the zone are far less constrained, the fact that pitchers know this, and above all the fact that there are humans making judgment calls on these boundaries, and there’s just way too much potential for spurious correlations. (I’d love to see the inter-coder reliability data on top and bottom strike zone boundary measurements).

Maybe I’m wrong about all this, and if you have the data to prove it then please send it along. That said, if I could make awesome heat maps like they do at BA, then I’d probably make them anyway. If I can get there you’ll see it in future posts.

Finally, I appreciate your skepticism. I’ve seen the work you’ve done here and I know you’ve got game. If you didn’t push back I’d be disappointed. I’ll just have to try harder in the future.

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 3:15 PM EDT up reply actions  

No! Why?

Okay, hopefully I can get the quotey thing to work a little better in this post.

If you want to double check me, make sure you’re comparing the data from starts 1-9 in 2009 and starts 1-9 in 2010, not starts 1-9 in 2010 to his full 2009. I haven’t yet checked to see if maybe this is a correction Zito made over the course of last season that just carried over.

You keep saying that you only want to compare starts 1-9 in 2009. This makes no sense to me. You’re severely and artificially restricting your comparison sample for little reason.

Yes, velocity changes a little bit throughout the course of a season. (And really it’s more dependent on temperature than it is point in the season.) Compared to all the information you’re throwing away, you just don’t gain much this way. The only reason I would do that would be if I were specifically trying to study how his velocity changed over the course of the season. And then in that specific instance I would absolutely need to control for temperature.

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 3:20 PM EDT up reply actions  

Because...

I might be inclined to agree with you on stats where IP is the denominator and the margin of error at this point of the season is kind of big, but not with the higher resolution stats where the margin of error is rather small. Aside from that argument, there’s no reason to assume that I’m throwing out information unless you think there is an inherent bias in my data from excluding the rest of the 2009 season.

By excluding that data I’m excluding any possibility of omitted variable bias that might correlate with the progress of the season. Even if you don’t particularly know whether two populations of data skew differently, it’s not sound econometrics to assume that they don’t, unless by excluding it you reduce the confidence in your sample size. If I did that, my stats would fall within the margin of error, and they don’t.

Finally, if you want to control for temperature, go ahead. Velocity changes for different pitchers differently as the season progresses, and while we know a little bit about why, we don’t have that much info. Therefore, I’m not about to go making assumptions about it. If you don’t think there’s a real bias that correlates with the season’s progression, and the reduction in sample size doesn’t wash out my data (and the accepted methods say it doesn’t), then there’s no reason why this approach isn’t appropriate.

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 3:50 PM EDT up reply actions  

But why control for point in season and not a hundred other things?

Batter handedness? Quality of opposition? Home/road? There are a bunch of things I’d put in line ahead of point in season if I wanted to split my sample.

Just because your margin of error tells you it’s okay doesn’t mean it’s a good idea.

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 4:06 PM EDT up reply actions  

Fair enough

Handedness, as I admitted to viva, I should pay more attention to. It actually never occurred to me until today, which is something I’m rather disappointed about. Home/road is even between these two samples, so it’s controlled for. Quality of opposition, also a good point. Any variable you can think of off the top of your head that I’d use that isn’t laden with contextual noise or park effects issues?

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 4:14 PM EDT up reply actions  

Margin of Error

The margin of error tells me that there’s a 95% chance that I didn’t delete information by narrowing my sample, so long as the variance in my averages falls outside the relevant range. That tells me that I haven’t thrown away enough data so as to erode confidence in my quotients, which was your primary concern, as well as the concern of some from the last post.

I don’t mean to come off as smug—I’ve seen your work and I know that you know what you’re doing—but I have six years of graduate-level education involving statistical and qualitative methodology under my belt, and I know a thing or two about picking a sample. I’m just following the guidelines of my training.

Blogger and Editor, Rational Pastime Blog

by J-Doug on May 28, 2010 4:21 PM EDT up reply actions  

...
I’m comparing Zito’s xFIP over the first 9 starts of 2009 and the first 9 starts of 2010, not his partial 2010 to his full 2009.

Okay, that explains that. Not sure why you would specifically do that, but whatever.

However, my findings are that the shifts in horizontal break, vertical break and spin—whether we look at the cluster that just includes FT and FF or whether we throw CH in there too (since it breaks in a similar manner)—are significant at 0.001.

But “all” of his pitches are breaking more, and the same amount. About 1.5 to 2 inches down. That either implies A) his lost velocity is affecting the break on his pitches, or B) there are some park effects at work. I would think it’s a combination of the two, and that does not mean a conscious shift in his approach, but rather just either measurement error or flat out decline.

Regarding location, the park effects you see are going to be in the way of maybe an inch at most of location error. That’s not nearly a large enough measurement error for you to totally disregard location. If it were me, I would take a gander at some simple scatter plots by pitch type and see if there are any obvious trends to notice. Location isn’t one of those things you have to quantify with fine precision but the overall trends are very important.

For the top and bottom strike zone measurements, Mike has done research that has shown they are a little off, but not too drastically if you simply average them out for each batter over the entire year. This is all imperfect data, yes, but it’s still pretty damn good and you have to use all of it that you can.

Right now it seems that the extra curves are your biggest selling point. Given that the velocity and movement of all of his pitches have changed together, that can’t be viewed as a good thing. He’s not throwing more two-seamers, it’s just that all of his pitches have less vertical movement now. Same thing with velocity. I’m not sure the 6% jump in curveballs is enough to explain his improved numbers this year, especially if it’s offset by the decreased velocity on all of his pitches, which is why I suggest looking at location.

Thanks for the appreciation. Honestly, you are probably sick of Zito and most BtB readers probably don’t want to have to read another technical article on him, so you don’t have to dig even deeper into this if you don’t want.

by vivaelpujols on May 28, 2010 6:32 PM EDT

Because he can, Nick
Not sure why you would specifically do that, but whatever.

Because he can.

I’m not sure the 6% jump in curveballs is enough to explain

It’s only a 3% jump (18% to 21%) if you look at all of 2009 instead of a reduced sample of 2009.

And if you split by batter handedness, which you really have to do in anything where you’re examining pitch type usage, it’s 19% curves to RHB and 18% curves to LHB in 2009, and 23% curves to RHB and 15% curves to LHB in 2010, fwiw.

Winner, Beyond the Box Score 32 Predictions Contest, 2009

by Mike Fast on May 28, 2010 8:12 PM EDT
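
Mike Fast's point about splitting pitch usage by batter handedness can be sketched in a few lines. This is only a minimal illustration, not the method anyone in this thread used: the `(batter_hand, pitch_type)` tuple layout, the function name `usage_by_hand`, and the toy pitch list are all assumptions made for the example, and the numbers are not Zito's.

```python
from collections import Counter


def usage_by_hand(pitches, pitch_type):
    """Return {batter_hand: share of pitches that were `pitch_type`}.

    `pitches` is an iterable of (batter_hand, pitch_type) tuples,
    a hypothetical layout chosen for this sketch.
    """
    pitches = list(pitches)
    # Total pitches thrown to each batter hand.
    totals = Counter(hand for hand, _ in pitches)
    # Pitches of the requested type thrown to each batter hand.
    matches = Counter(hand for hand, ptype in pitches if ptype == pitch_type)
    return {hand: matches[hand] / n for hand, n in totals.items()}


# Toy data only: four pitches to righties, four to lefties.
toy_pitches = [
    ("R", "CU"), ("R", "FF"), ("R", "FF"), ("R", "CU"),
    ("L", "CU"), ("L", "FF"), ("L", "FF"), ("L", "FF"),
]
curve_share = usage_by_hand(toy_pitches, "CU")
```

Running the same function once per season would give the kind of RHB/LHB comparison quoted above, where an overall change can hide opposite movements against each hand.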

Nice two part analysis!

Really enjoyed reading them.

Since you seem to understand that Zito would regress to his career BABIP and not to .300, then I’m curious why you use his FIP and xFIP? Both assume regression to .300, making them useless for a pitcher like Zito, who has proven to have a career BABIP below .300 after more than 7 seasons’ worth of stats (Tangotiger gave that number previously as the seasons a pitcher has to have before his BABIP is statistically significantly below the .300 mean).

Zito appears to fit under the Tom Tippett DIPS-defying category of “Crafty Lefty”: his lack of strikeouts relative to his success, plus his high percentage of infield flies, which was a staple of his early career and which has returned to a great degree this season.

This would suggest that using tERA would be a better sabermetric to use for examining how Zito is doing, and it is currently at 3.52, though not much lower than his FIP of 3.69 (which I got from Fangraphs; not sure why yours is so much higher).

And his HR/FB while a Giant should be greatly affected by the park’s HR-reducing quality, thus making xFIP even less appropriate.

And I’m not saying he’s not going to regress on his HRs – it’s too low to sustain I agree – just that when Zito is being analyzed sabermetrically, there has to be allowances made because he is the bumblebee in the world of DIPS analysis.

Adoptive parental unit of Ehire Adrianza.
Godfather of Travis Ishikawa.

"Woo hoo!" - Tim "The Kid" Lincecum
"The objective is that World Series ring" - The Kid

by obsessivegiantscompulsive on May 28, 2010 3:00 PM EDT
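
For readers following the FIP versus xFIP point in the comment above, here is a rough sketch of the commonly published formulas. The additive constant (roughly 3.10, recalculated each season) and the league HR/FB rate are passed in as parameters; the default constant and the sample stat line below are assumptions for illustration only and are not Zito's numbers.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from HR, BB, HBP, K and innings.

    `constant` scales FIP to league ERA; 3.10 is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, lg_hr_per_fb, bb, hbp, k, ip, constant=3.10):
    """xFIP: FIP with actual HR replaced by fly balls * league HR/FB."""
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line: 10 HR allowed on 150 fly balls in 100 innings.
sample_fip = fip(hr=10, bb=30, hbp=5, k=100, ip=100)
sample_xfip = xfip(fb=150, lg_hr_per_fb=0.10, bb=30, hbp=5, k=100, ip=100)
```

Neither formula contains a hits-on-balls-in-play term, so a pitcher's own BABIP never enters either one. xFIP also swaps the pitcher's actual home runs for a league-average rate on fly balls, which is why a park that suppresses home runs makes the two numbers diverge, as in this made-up line where the pitcher allowed fewer home runs than the league rate would predict.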

Regression

OBS: I haven’t been able to find that Tango post; if you know where it is, could you link me? It seems odd that FIP and xFIP would assume a regression to a BABIP of .300, when almost every pitcher in the modern era keeps his BABIP between 0.290 and 0.300 (wouldn’t we want those stats to assume a BABIP of 0.295, then?).

I mentioned tERA in a comment post last week. Comparing the two partial seasons, his tERA is definitely better. Honestly, I include tERA, xFIP and FIP because they all tell a different story and all raise different questions. I feel my job as a blogger is to post the data, draw conclusions, and raise questions that encourage others to draw their own conclusions and raise their own questions. I’d also post his SIERA if I took the time to copy BP’s formula down.

I did not park-adjust his numbers, but he’s played 4 away and 5 home in both seasons, to this point. And thanks for your compliments :)

by J-Doug on May 28, 2010 4:05 PM EDT

I've been looking for a while

But I know I read it. It is just too hard to search TheBook’s website.

I tried again and found a discussion on Zito and his significantly lower BABIP against RHH: http://www.insidethebook.com/ee/index.php/site/comments/solving_barry_zito/#comments

I found an article that may be it: http://www.insidethebook.com/ee/index.php/site/comments/career_dips_numbers/

In this, comment #6 specifies 5-6 years to reach the tipping point. But perhaps I misunderstand this; I’m no expert in statistics. It just seems significant at that point because then skill explains more than half of the difference, if I’m understanding that correctly. So perhaps I used the wrong terminology before, but basically 5-6 years (or 3700 BIP) is the point where luck explains less than half of the variance between a pitcher’s BABIP and the league mean and thus his skill explains more than half.

And if you look at the spreadsheet he provided, Zito is near the top of all MLB pitchers in negative SD away from the mean for BABIP. 27th all time at that point, but his poor years on SF should have dropped him.

Zito is now around 6000 BIP so that formula would result in Zito’s skill explaining around 62% of the gap between his BABIP and the league average if I understand the formula right.

This post also discusses: http://www.insidethebook.com/ee/index.php/site/the_church_of_baseball_part_2/P2140/

Go about halfway down, to the part about NOTHING is random in baseball. Then he repeats that about 2000-3000 BIP is where the difference can be attributed half to luck (and thus half to skill).

And for those interested, here is Tippett’s study: http://www.diamond-mind.com/articles/ipavg2.htm

And I prefer “ogc” :^). Thanks, great discussion, have a great weekend!

by obsessivegiantscompulsive on May 28, 2010 7:41 PM EDT
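
The 62% figure in ogc's comment above is consistent with the usual regression-to-the-mean form, in which the share credited to skill is BIP / (BIP + k) and k is the number of balls in play at which skill and luck split evenly. Treating 3700 BIP as that k, and assuming this is the formula the linked post intends, the arithmetic can be checked with a short sketch (the function name `skill_share` is made up for this example):

```python
def skill_share(bip, half_point=3700):
    """Share of the gap from league BABIP credited to skill.

    `half_point` is the number of balls in play at which the share
    is exactly one half; 3700 is the value quoted in the comment.
    """
    return bip / (bip + half_point)


at_tipping_point = skill_share(3700)   # exactly 0.5 by construction
zito_estimate = skill_share(6000)      # 6000 / 9700, about 0.62
```

If the half-way point were instead in the 2000-3000 BIP range mentioned in the second linked post, say 2500, the same calculation would give 6000 / 8500, or about 71%.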

Maybe I'm missing something here.

Can you spell out for me how you figured those margins of error?

by cwyers on May 29, 2010 12:40 AM EDT
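
The thread does not say how the margins of error were computed. For reference only, one conventional calculation for a 95% margin of error around a sample mean is z * s / sqrt(n), sketched below. This is an assumption about a textbook method, not a reconstruction of the author's work; the function name and the toy sample are made up for the example.

```python
import math
import statistics


def margin_of_error(sample, z=1.96):
    """Approximate 95% margin of error for the mean of `sample`.

    Uses the normal critical value z = 1.96; for small samples a
    t critical value would be wider and more appropriate.
    """
    s = statistics.stdev(sample)        # sample standard deviation
    return z * s / math.sqrt(len(sample))


# Toy sample, not pitch data.
toy_margin = margin_of_error([1, 2, 3, 4, 5])
```

The margin shrinks with the square root of the sample size, so cutting a sample to a quarter of its size roughly doubles it when the spread of the data stays the same.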

Comments For This Post Are Closed