Being Cautious When Using Pitchf/x Data to Evaluate Stuff: The Case of Kyle Drabek

The pitches of Kyle Drabek: Looks easy to distinguish the various pitches, right?

There are two types of scouting reports on pitchers. The first focuses on a pitcher's results with his pitches as a means of judging how good that pitcher is. The second focuses on the pitches themselves, what is often termed a pitcher's "stuff." The latter type of scouting tends to look toward the future: a pitcher who has an 80 MPH curveball with real bite and a 94 MPH fastball is thought to have good stuff and is thus likely to become a good if not great pitcher down the road. Scouting "stuff" can lead to bad predictions if a scout gets too fixated on a player's stuff and ignores his results, and we see this all the time. The best scouting reports therefore combine these two forms of analysis to make the most accurate judgment of a player's potential and his ability to reach it.

Pitchf/x analyses of pitchers can fall into the same two categories: like scouts who use their eyes, these analyses either look at the results of a pitcher's pitches or at the movement and velocity of the pitches themselves. Of course, most pitchf/x analysis looks at both - results in addition to speculation about the potential greatness of a pitch's movement or velocity - just like the best scouting reports.

But thanks to September call-ups, the minor league Futures Game, the Arizona Fall League, and a few other venues now having pitchf/x cameras capturing data, we often have pitchf/x data on a bunch of young prospects...but not enough of it to make any meaningful analysis of a pitcher's results (tiny sample sizes!). Instead, a pitchf/x analysis of such a pitcher mainly has to reflect on the pitcher's stuff and project from there.

Unfortunately, with small sample sizes and pitchf/x data, we may have a problem:

Take Kyle Drabek, for example. I've put up a graphic at the top of this post showing his pitches over three September starts last year, and you can clearly see how each of his five pitches forms a somewhat nice cluster, distinct from the others. From that graph you can tell the kid has good velocity on all of his pitches, including his breaking ball; a cutter with a pretty good cut; and a two-seamer with good tailing action, in addition to a more standard-looking four-seam fastball with nice velocity (it averages around 94 MPH).
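If you want to reproduce that kind of plot yourself, here is a minimal sketch of how it might look in Python. It assumes you've already pulled Drabek's pitches into a CSV (the filename is hypothetical) with the raw pitchf/x-style columns start_speed, pfx_x, and pitch_type:

    # Sketch of the plot at the top of this post: velocity against horizontal
    # spin deflection, colored by pitch type. Column names (start_speed, pfx_x,
    # pitch_type) follow the raw pitchf/x feed; the CSV filename is hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    pitches = pd.read_csv("drabek_september.csv")

    fig, ax = plt.subplots()
    for pitch_type, group in pitches.groupby("pitch_type"):
        ax.scatter(group["pfx_x"], group["start_speed"], label=pitch_type, alpha=0.7)

    ax.set_xlabel("Horizontal spin deflection (inches)")
    ax.set_ylabel("Velocity (MPH)")
    ax.legend(title="Pitch type")
    plt.show()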

Of course, showing you only the velocity and horizontal break of each pitch is a little deceptive here: after all, what about the vertical break of Drabek's pitches - how much does each of his fastballs sink, and how much does his curveball drop? Well, there we run into a problem.

 

Figure 1: A total mess of a graph.


Above we have a graph showing the vertical spin deflection (break or movement) of Drabek's pitches against the horizontal movement of those same pitches. What a mess, right? If I hadn't colored in each pitch type, telling them apart (change-up aside) would be extremely difficult, as the vertical spin deflection of each pitch type seems to vary wildly. One four-seam fastball, for example, has a vertical spin deflection of over +15 inches (meaning that, relative to a ball thrown without spin, the pitch drops over 15 inches less than gravity alone would dictate), while a second four-seamer, with similar horizontal movement and velocity, has a vertical spin deflection of less than +5 inches (around 3 inches, actually). In other words, if we believed this graph, we'd think Drabek's four-seam fastball varies so much that it sometimes drops over 10 inches more than it does at other times. Similarly, the graph seems to say that Drabek's curveball sometimes has no drop at all, while at other times it has a pretty nice amount of vertical break. These results are not believable.
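You can put a number on just how implausible that spread is with a quick groupby - here's a sketch, again assuming the same hypothetical CSV of Drabek's pitches with a pfx_z (vertical spin deflection) column:

    # Measure the spread of vertical spin deflection (pfx_z) within each pitch type.
    # A 10+ inch range on a single pitch type is a red flag for the measurement,
    # not the pitcher.
    import pandas as pd

    pitches = pd.read_csv("drabek_september.csv")  # same hypothetical file as above

    spread = pitches.groupby("pitch_type")["pfx_z"].agg(["mean", "std", "min", "max"])
    spread["range"] = spread["max"] - spread["min"]
    print(spread.round(1))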

In fact, as you hopefully suspect by now, these results are wrong. What's happening here is the result of pitchf/x calibration errors. Over a full season such errors are mitigated and largely neutralized, because a pitcher works in multiple parks, so his results cluster around the true movement of his pitches (unless he pitches in a ballpark with such errors all season long, and even then you can use the road data to avoid the problem). But for a pitcher with only a few starts, there isn't enough data for calibration errors to wash out of the data set. In other words, due to pitchf/x calibration errors, we can only guesstimate the actual vertical break of Drabek's pitches.

Drabek's case is particularly easy to notice: he made only three starts, one in Baltimore and two in Toronto. Those just so happen to be two parks with extreme pitchf/x calibration errors during September, and their errors ran in opposite directions: in Baltimore the pitchf/x data showed much more sink than was actually there, while in Toronto the data showed much less sink/drop on pitches than actually occurred. This can be seen in Figure 2 below:

Figure 2: The average vertical spin deflection of four- and two-seam fastballs in Baltimore and Toronto

As you can see, this wasn't the case all year (in fact, it was reversed in April), but in September Toronto and Baltimore were both extreme, in opposite directions. This is caused by bad release point locations (which are used to calculate vertical spin deflection). Theoretically, one could try to compensate by subtracting each park's deviation from the league-average results, but even then we wouldn't be left with an exact (or even great) handle on the vertical spin deflection of Drabek's pitches.
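Here's a rough sketch of what that kind of correction might look like, assuming a hypothetical league-wide pitch file with park, month, pitch_type, and pfx_z columns (and the same columns added to the Drabek file). It nudges the numbers toward reality; it does not fix the underlying release-point errors:

    # Crude park-calibration adjustment for vertical spin deflection:
    # for each park and month, measure how far that park's average fastball pfx_z
    # sits from the league-wide average, then subtract that offset from the
    # pitcher's measured values. A guesstimate, not a fix.
    import pandas as pd

    league = pd.read_csv("league_pitches.csv")    # hypothetical: all pitches, league-wide
    drabek = pd.read_csv("drabek_september.csv")  # hypothetical: Drabek's pitches

    # Use league-wide fastball averages as a rough baseline for each park/month.
    fastballs = league[league["pitch_type"].isin(["FF", "FT"])]
    league_avg = fastballs["pfx_z"].mean()
    park_bias = (
        fastballs.groupby(["park", "month"])["pfx_z"].mean() - league_avg
    ).rename("bias").reset_index()

    # Subtract each park/month's deviation from the pitcher's measured vertical break.
    drabek = drabek.merge(park_bias, on=["park", "month"], how="left")
    drabek["pfx_z_adjusted"] = drabek["pfx_z"] - drabek["bias"]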

But here's the rub: with Drabek, the problem was obvious because he pitched in two separate parks with these issues, making it clear that we had to look for calibration error before reading the data. But what if he'd only pitched in Toronto? Then someone analyzing him might simply have concluded that he has nice heat, but that his pitches don't have much sink at all - in fact, that they have extreme rise! That would, of course, be wrong, and it serves as a reminder that when looking at pitchf/x data with a small sample size, you should check for calibration errors before you start analyzing a pitcher's stuff.

Conclusion:

We see this problem with Arizona Fall League data as well, where at least one of the two pitchf/x-equipped parks has odd calibration problems. And when working with AFL data, a pitchf/x analyst isn't going to look at results - the sample size is too small. Instead he'll look solely at the pitcher's stuff, and if he's not careful he'll make some very incorrect judgments.

So the point of the piece is this: if you're working with a small sample of pitchf/x data, be aware that the data is not perfect. Calibration errors can badly distort your data and lead you to the wrong conclusions. So be cautious here...and avoid this mistake.