First off, I'd like to thank R.J. for inviting me to join Beyond the Box Score. I hope to be able to contribute some interesting information and insight to a great site. Some topics I plan to examine in the next few months include game calling, pitch framing and clutch performance, as well as commenting on any studies that strike a chord.
For today's post, I'd like to continue the look at catcher block percentage that I began on my personal blog. For those who haven't read the rest of the series, let me summarize the concept. Basically, I'm using the MLB-provided Gameday data from 2005 through 2007 to calculate how each catcher performed in blocking pitches. Wild pitches and passed balls are considered "Misses," while balls in the dirt with runners on base are considered "Opportunities."
To this point, I've only been able to compare catchers to the aggregate league performance. There are a few problems with that approach. The first is data quality. Determining whether a ball was in the dirt is up to the Gameday scorer, and there's no guarantee of consistency. This fact becomes obvious when looking at the differences in opportunities from year to year (Table 1). Unfortunately, there's not much I can do about the data quality beyond acknowledging that it may cause some issues.
|Year||Opportunities||Avg. Block %|
The second concern is the impact of the pitching staff. Any given catcher may see easier or harder pitches to handle based on the tendencies of his pitchers. To illustrate this, imagine a staff full of knuckleballers. Whichever catcher is unlucky enough to log innings for that team is going to have a much lower block percentage than average. Of course, knuckleballers aren't the only types of pitchers who may cause problems for catchers; they're just the most obvious. To try to isolate the impact of the pitching staff, I attempted a With Or Without You (WOWY) study as outlined by TangoTiger here.
The idea behind this WOWY study is to compare the block percentage for a pitcher throwing to the catcher in question with the block percentage for that same pitcher throwing to all other catchers. It's probably best demonstrated with an example. Let's consider Jason Varitek. From 2005 through 2007, Varitek caught 25 different pitchers. Table 2 lists Misses and Opportunities for the first few pitchers, both with and without Varitek catching. From there, we can calculate Varitek's expected Misses and, therefore, how many runs he's saved by blocking pitches.
|Pitcher||With Misses||With Opportunities||Without Misses||Without Opportunities||Expected Misses||Blocks Above Expectation|
|And so on...|
Looking at the WOWY numbers for Chad Bradford and Matt Clement, you can see the big issue here. Unlike in Tango's study that I referenced above, the sample sizes for the pitcher/catcher matchups are, in a lot of cases, too small to be helpful. Varitek allowed 14 misses in 88 opportunities with Clement, but since no other catcher caught Clement in this time frame, it's impossible to determine how well Varitek compares. In fact, the system credits him with 14 fewer blocks than he should have had. This is obviously not the right answer, considering that Clement is traditionally among the league leaders in wild pitches - a pattern which has held across many teams and catchers.
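To make the arithmetic concrete, here is a minimal Python sketch of how one row of Table 2 could be computed. The formula is my reading of the column names rather than something spelled out in the post: Expected Misses is taken to be the pitcher's miss rate without the catcher, applied to the opportunities with the catcher. The function name and the treatment of a pitcher with no "without" data (expected Misses of zero) are assumptions, chosen because they reproduce the Clement result described above.

```python
def wowy_row(with_misses, with_opps, without_misses, without_opps):
    """Return (expected_misses, blocks_above_expectation) for one pitcher.

    Assumed formula (not stated explicitly in the post):
        expected_misses = with_opps * (without_misses / without_opps)
        blocks_above_expectation = expected_misses - with_misses
    """
    if without_opps == 0:
        # Assumption: with no "without" sample, the expected miss count
        # falls back to zero, which is what produces the Clement problem.
        expected_misses = 0.0
    else:
        expected_misses = with_opps * without_misses / without_opps
    return expected_misses, expected_misses - with_misses


# Matt Clement with Varitek: 14 misses in 88 opportunities, no other catcher.
clement_expected, clement_bae = wowy_row(14, 88, 0, 0)
print(clement_expected, clement_bae)
```

With the Clement inputs, the sketch charges the catcher with all 14 misses, which is the distortion the paragraph above describes.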
So what do we do to counter this? The right answer is probably to wait for more data. But that's a bit of a disappointment, so I decided to take a different approach. I threw out the observations with fewer than 20 opportunities on either the with or without side - basically assuming that a catcher performed as expected in those cases. In Varitek's case, that means eliminating all of the entries from the above table except for Josh Beckett, and retaining only 3 of his 25 matchups overall. Why 20? It was just an arbitrary number that seemed to balance two competing sets of sample size concerns - the individual matchups, and the total number of matchups for a given catcher. Someone with more statistical aptitude than I have could probably help me with a mathematically proper cutoff, but for now I went ahead with 20.
For those matchups left in the sample, I summed the Blocks Above Expectation for each catcher. I then converted that into runs by multiplying by .27 runs per miss, which is the linear weights value for a miss. Finally, I scaled the opportunities to (roughly) 120 games or 238 opportunities (based on the number of opportunities per inning across all three seasons) to allow for easier comparison. Table 3 shows the results for those catchers with 100 or more WOWY opportunities from the 2005-2007 seasons. Keep in mind that the numbers aren't anywhere near as precise as the decimals make them appear to be - that's just how the math turned out.
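The cutoff can be sketched as a simple filter. The matchup records below are made-up placeholders, not numbers from Table 2, and the field names are hypothetical; the only thing taken from the post is the rule that a matchup is kept when both the with and without sides have at least 20 opportunities.

```python
MIN_OPPS = 20  # the arbitrary cutoff described above


def keep_matchup(matchup, min_opps=MIN_OPPS):
    """Keep a pitcher/catcher matchup only if both sides have enough data."""
    return (matchup["with_opps"] >= min_opps
            and matchup["without_opps"] >= min_opps)


# Hypothetical matchups for illustration only.
matchups = [
    {"pitcher": "Pitcher A", "with_opps": 45, "without_opps": 60},
    {"pitcher": "Pitcher B", "with_opps": 88, "without_opps": 0},
    {"pitcher": "Pitcher C", "with_opps": 12, "without_opps": 150},
]

kept = [m["pitcher"] for m in matchups if keep_matchup(m)]
print(kept)
```

Dropping a matchup entirely is equivalent to assuming zero Blocks Above Expectation for it, which is the "performed as expected" assumption in the text.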
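The conversion and scaling steps can be sketched as follows. The function and parameter names are hypothetical, and the scaling formula (runs times 238, divided by the catcher's WOWY opportunities) is my interpretation of "scaled the opportunities to 238"; the .27 and 238 values come from the post. The example inputs assume the Paul Lo Duca row in Table 3 reads as 220 WOWY opportunities and 4.1 Blocks Above Expectation.

```python
RUNS_PER_MISS = 0.27   # linear weights value for a miss, from the post
SCALE_OPPS = 238       # roughly 120 games of opportunities, from the post


def runs_saved(blocks_above_expectation, wowy_opps,
               runs_per_miss=RUNS_PER_MISS, scale_opps=SCALE_OPPS):
    """Return (raw_runs, runs_per_scale_opps) for one catcher.

    Assumed scaling: raw_runs * scale_opps / wowy_opps.
    """
    raw_runs = blocks_above_expectation * runs_per_miss
    scaled_runs = raw_runs * scale_opps / wowy_opps
    return raw_runs, scaled_runs


# Assumed reading of the Lo Duca row: 220 opportunities, 4.1 blocks above.
raw, scaled = runs_saved(4.1, 220)
print(round(raw, 1), round(scaled, 1))
```

Under that reading, the rounded outputs line up with the last two figures in the Lo Duca row, which suggests the interpretation is at least consistent with the published table.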
|Paul Lo Duca||220||4.1||1.1||1.2|
What does this all tell us? If you're expecting a straight answer out of me here, you're not going to get one. I'll be honest and say I just don't know what it all means. The data quality and sample size issues make me question a lot of it. For those catchers who appear in Table 3, the mean is .9 runs above average and the standard deviation is 3.5 runs. It looks to be a pretty normal distribution, which was also true of the less rigorous analyses I attempted before. I think in many cases the results match reputations, which might lend some credence, but nothing here jumps out at me indicating that there's a definite skill. I think the net is that this information might be useful for determining a catcher's past value, but I wouldn't suggest incorporating it into any projections yet. Hopefully, as we get more data from more seasons, we'll be able to make more progress in unraveling the value of pitch blocking.