A Strike Is a Strike, Right?
After my last post about catcher framing, many people (at least two or three) suggested that the pitching staff had a fair amount to do with the results we were seeing. The poor performance of both of the Rangers' catchers (Gerald Laird and Jarrod Saltalamacchia) seemed to lend credence to that stance. I started wondering what factors might influence an umpire to call a pitch a certain way. So I retreated into my stats cave (no, it's not my mother's basement) for a while to see what I could find out. I'll warn you, this is a very long post, so if you want to skip to the summary at the end, feel free, but you'll miss the graphs.
Before I start examining the different factors, let me outline the methodology. It's the same basic approach as in the catcher framing article, which in turn is based on Jonathan Hale's article at The Hardball Times. Each called pitch is compared to the strike zones identified by John Walsh to see whether it was "mistakenly" called. I put "mistakenly" in quotes because Walsh marked the boundaries of the strike zones where 50% of the pitches were called strikes, so some calls are expected not to match. Walsh provides values for the height and width of the strike zone for both left- and right-handed batters and compares them to the rulebook strike zone for the average batter. I used his values for the strike zone widths, but I calculated my own strike zone height for each batter by applying his deltas to that batter's rulebook strike zone. I used absolute differences (2.2 inches on the top for right-handed batters, for example) rather than percentage differences.
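Here is a minimal sketch of the classification step. It assumes pitch locations are in inches: the horizontal distance from the center of the plate and the height above the ground. It also assumes each batter has a rulebook top and bottom. Walsh's actual widths and deltas are not reproduced here. The `ZoneParams` values below are hypothetical, and treating the 2.2-inch top adjustment as lowering the zone is an assumption about its sign.

```python
from dataclasses import dataclass


@dataclass
class ZoneParams:
    """Called-zone adjustments for one batter handedness, in inches.

    All values used below are hypothetical placeholders, not Walsh's figures.
    """
    half_width_in: float    # half of the called zone's width, from plate center
    top_delta_in: float     # signed absolute offset added to the rulebook top
    bottom_delta_in: float  # signed absolute offset added to the rulebook bottom


def in_called_zone(px_in, pz_in, rule_top_in, rule_bottom_in, params):
    """True if the pitch lies inside this batter's adjusted (50%) zone."""
    top = rule_top_in + params.top_delta_in
    bottom = rule_bottom_in + params.bottom_delta_in
    return abs(px_in) <= params.half_width_in and bottom <= pz_in <= top


def classify_call(called_strike, px_in, pz_in, rule_top_in, rule_bottom_in, params):
    """+1 for an extra strike, -1 for an extra ball, 0 if the call matches the zone."""
    inside = in_called_zone(px_in, pz_in, rule_top_in, rule_bottom_in, params)
    if called_strike and not inside:
        return 1   # called a strike, but outside the zone
    if not called_strike and inside:
        return -1  # called a ball, but inside the zone
    return 0


# Hypothetical right-handed-batter parameters (sign of the 2.2 in. top delta assumed).
rhb = ZoneParams(half_width_in=11.0, top_delta_in=-2.2, bottom_delta_in=1.0)
print(classify_call(True, 12.5, 30.0, 42.0, 20.0, rhb))   # 1: wide pitch called a strike
print(classify_call(False, 0.0, 30.0, 42.0, 20.0, rhb))   # -1: middle pitch called a ball
```

The batter-specific part enters only through `rule_top_in` and `rule_bottom_in`. The same `ZoneParams` object is reused for every batter of that handedness.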
Anyway, once I classified all the pitches, I credited the pitcher (or category of pitchers) with every additional strike and debited him for every additional ball to find the net number of "misses". I then calculated the average rate of misses across all pitches and determined how far each pitcher differed from his expected value. Those differences were normalized to 150 opportunities, or roughly the number of called pitches in one game. Finally, that number was converted to runs per 150 called pitches by multiplying by .161 runs, the value of changing a ball to a strike. This allowed me to compare the different factors that might influence how a pitch was called and see how important each one might be.
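Here is one way to read that calculation in code. It assumes each pitcher (or group) has already been tallied into extra strikes, extra balls and total called pitches. It also assumes the expected value is simply the league-wide net miss rate applied to the pitcher's own called pitches. The field names and sample counts are hypothetical; only the .161 run value and the 150-pitch normalization come from the text.

```python
RUN_VALUE_BALL_TO_STRIKE = 0.161  # value of turning a ball into a strike (from the text)
OPPORTUNITIES = 150               # roughly one game's worth of called pitches


def net_miss_rate(extra_strikes, extra_balls, called_pitches):
    """Net extra strikes per called pitch (positive favors the pitcher)."""
    return (extra_strikes - extra_balls) / called_pitches


def runs_per_150(group, league_rate):
    """Runs gained (+) or lost (-) per 150 called pitches versus expectation."""
    rate = net_miss_rate(group["extra_strikes"], group["extra_balls"], group["called"])
    return (rate - league_rate) * OPPORTUNITIES * RUN_VALUE_BALL_TO_STRIKE


# Hypothetical groups of pitchers, not real 2007 counts.
groups = {
    "A": {"extra_strikes": 60, "extra_balls": 40, "called": 1000},
    "B": {"extra_strikes": 30, "extra_balls": 50, "called": 1000},
}
league_rate = net_miss_rate(
    sum(g["extra_strikes"] for g in groups.values()),
    sum(g["extra_balls"] for g in groups.values()),
    sum(g["called"] for g in groups.values()),
)
for name, g in groups.items():
    print(name, round(runs_per_150(g, league_rate), 3))  # A 0.483, B -0.483
```

Positive values mean the pitcher gained from the calls, matching the sign convention in the tables that follow.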
Game Specific Factors
Certain characteristics of the game might play a part in how an umpire calls pitches. Specifically, I looked at the inning in which the calls happened and at which team was at bat. Another possible area to explore would be day games versus night games, but I didn't have the data handy to look into that.
Inning
| Inning | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| 1 | 21281 | 0.10 |
| 2 | 19165 | 0.01 |
| 3 | 19076 | -0.05 |
| 4 | 18093 | 0.03 |
| 5 | 18209 | -0.03 |
| 6 | 18028 | -0.08 |
| 7 | 18482 | 0.02 |
| 8 | 18482 | -0.04 |
| 9 | 13029 | 0.06 |
| 10+ | 2588 | -0.18 |
The most interesting piece of information here is the (relatively) large assist given to batters in extra innings. It's almost like the umpires want to go home and unconsciously give batters the benefit of the doubt so as to increase the chances of scoring a run. Of course the effect is pretty small - a difference of one run every 45 innings or so - but it's almost twice as large as in any other inning. The fact that a team is more likely to use the back of its bullpen in extra innings could play a role too.
Home or Visitor
| Batting Team | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| Visitor | 84524 | 0.06 |
| Home | 81909 | -0.06 |
A mild advantage to the home pitchers (when the visiting team is batting) - to the tune of one run every 8 games. I expected the home team pitchers to get some favorable calls, but wasn't sure of the impact.
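The "one run every 8 games" figure comes from inverting the gap between the two rows. The small helper below makes the arithmetic explicit. It relies on the post's own assumption that 150 called pitches is about one game.

```python
def games_per_run(good, bad):
    """Games for a bucket gap (runs per 150 called pitches) to add up to one run.

    Assumes, as the post does, that 150 called pitches is roughly one game.
    """
    gap = good - bad
    if gap <= 0:
        raise ValueError("the 'good' bucket should be above the 'bad' bucket")
    return 1.0 / gap


# Home pitchers (visitors batting, +0.06) vs. road pitchers (home batting, -0.06).
print(round(games_per_run(0.06, -0.06), 1))  # 8.3
```

The handedness gap later in the post (righties at 0.03 vs. lefties at -0.09) is the same 0.12 runs, so it works out to the same one run every 8 games or so.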
Pitcher Factors
How could a pitcher influence whether close pitches are called strikes? I separated the possibilities into two main categories - demographics and performance. Under demographics, I looked at age, experience and handedness, and for performance I looked at runs allowed, walks allowed and early-game wildness. I know that age and experience are close analogs, but I thought I'd see if there were major differences between the two. More on that later.
Age
| Age | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| 20 | 2790 | 0.12 |
| 21 | 2944 | 0.00 |
| 22 | 4507 | -0.15 |
| 23 | 11582 | -0.27 |
| 24 | 13041 | -0.17 |
| 25 | 15982 | -0.19 |
| 26 | 15387 | 0.05 |
| 27 | 17369 | -0.19 |
| 28 | 11809 | -0.12 |
| 29 | 15636 | 0.28 |
| 30 | 13211 | -0.01 |
| 31 | 7844 | 0.20 |
| 32 | 5177 | 0.15 |
| 33 | 8705 | 0.01 |
| 34 | 2392 | 0.19 |
| 35 | 3190 | 0.16 |
| 36 | 3187 | 0.10 |
| 37 | 1851 | 0.73 |
| 38 | 903 | 0.60 |
| 39 | 2645 | 0.37 |
| 40 | 2184 | 0.38 |
| 41 | 1603 | 0.40 |
| 42 | 276 | -1.01 |
| 43 | 1498 | 0.06 |
| 45 | 720 | 0.72 |
The numbers jump around a bit, and the sample size for some of the individual years leaves a lot to be desired, but in general, the older you are, the more love you get from the umpires. The effect is much clearer if we look just at age buckets:
| Age | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| 25 and Under | 50846 | -0.17 |
| 26-34 | 97530 | 0.03 |
| 35 and Over | 18057 | 0.31 |
Pitchers 35 and over see almost half a run per game of benefit compared to pitchers 25 and under. There are some concerns with this. There's likely a selection bias, since the pitchers who make it to 35 tend to be the better ones, and that might influence how the umpires rule. It's also possible that the determining factor isn't age but experience. Let's look at that one next.
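The bucket rows appear to be called-pitch-weighted averages of the single-age rows. Grouping ages 25 and under, 26-34, and 35 and over reproduces the bucket pitch counts exactly. It also reproduces the runs/150 values to within about .01, since the per-age values are themselves rounded. That reading is an inference from the two tables, sketched below with the per-age rows copied from the first table.

```python
AGE_ROWS = [
    # (age, called pitches, runs per 150 called pitches), from the age table
    (20, 2790, 0.12), (21, 2944, 0.00), (22, 4507, -0.15), (23, 11582, -0.27),
    (24, 13041, -0.17), (25, 15982, -0.19), (26, 15387, 0.05), (27, 17369, -0.19),
    (28, 11809, -0.12), (29, 15636, 0.28), (30, 13211, -0.01), (31, 7844, 0.20),
    (32, 5177, 0.15), (33, 8705, 0.01), (34, 2392, 0.19), (35, 3190, 0.16),
    (36, 3187, 0.10), (37, 1851, 0.73), (38, 903, 0.60), (39, 2645, 0.37),
    (40, 2184, 0.38), (41, 1603, 0.40), (42, 276, -1.01), (43, 1498, 0.06),
    (45, 720, 0.72),
]


def bucket(rows, lo, hi):
    """Called pitches and pitch-weighted runs/150 for ages lo..hi inclusive."""
    selected = [(n, r) for age, n, r in rows if lo <= age <= hi]
    total = sum(n for n, _ in selected)
    return total, sum(n * r for n, r in selected) / total


for label, lo, hi in [("25 and under", 0, 25), ("26-34", 26, 34), ("35 and over", 35, 99)]:
    total, value = bucket(AGE_ROWS, lo, hi)
    print(label, total, round(value, 2))
```

Weighting by called pitches keeps thin years like age 42 (276 pitches) from swinging a bucket the way they would in a simple average of the rows.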
Experience
| Experience | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| 0 | 17814 | -0.13 |
| 1 | 20275 | -0.19 |
| 2 | 21048 | -0.09 |
| 3 | 16597 | -0.20 |
| 4 | 14579 | -0.07 |
| 5 | 11218 | 0.11 |
| 6 | 10098 | -0.04 |
| 7 | 9716 | 0.15 |
| 8 | 13236 | 0.04 |
| 9 | 6638 | 0.28 |
| 10 | 6063 | 0.35 |
| 11 | 1868 | 0.29 |
| 12 | 3914 | 0.39 |
| 13 | 499 | -0.49 |
| 14 | 2722 | 0.36 |
| 15 | 3541 | -0.14 |
| 16 | 863 | 0.75 |
| 18 | 929 | 0.61 |
| 19 | 1494 | 0.67 |
| 20 | 1625 | 0.20 |
| 21 | 1522 | 0.31 |
| 23 | 139 | 0.64 |
There are a few strange negative bumps at 13 and 15 years, but the samples are pretty small there. In general the trend is upwards, and it appears more strongly than with age. Let's look at the buckets of experience.
| Years of Experience | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| 0-2 | 59137 | -0.14 |
| 3-7 | 62243 | -0.03 |
| 8 and Over | 45053 | 0.22 |
I think these understate the actual value of being a long-time pitcher. That's mostly because a large portion of the top bucket's sample comes from pitchers with exactly 8 years of experience, who came out only slightly positive (0.04). I'll admit these buckets don't match up well with the age buckets I introduced above, so the comparison is not as easy to make as I might hope.
Handedness
| Hand | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| Left | 44213 | -0.09 |
| Right | 122220 | 0.03 |
An advantage of 1 run for every 8 games or so to righties. I'm not sure why righties might get more favorable treatment. It could have something to do with the direction the pitches break, or maybe it's related to handedness of the batter - where righty pitchers face more left-handed batters than lefties do. Whatever the cause, I'm going to chalk it up to unexplained variation (Mike Emeigh recently suggested that term as opposed to random variation, and I think it's a good idea).
Runs Allowed
| RA/9 | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| Less Than 4.00 | 30491 | -0.01 |
| 4.00-6.00 | 97973 | 0.05 |
| More Than 6.00 | 20086 | -0.25 |
First off, let me make it clear that those runs allowed numbers are for a pitcher's career before 2007, while the runs / 150 pitches are for the 2007 season. There isn't a whole lot of variation between the low RA and the medium RA buckets. However, the high RA bucket loses a quarter of a run per game, or -.03 points of ERA in 2007, to umpires' calls. That suggests that umpires might be buying into a pitcher's bad reputation and reinforcing it. Interestingly, there doesn't appear to be a corresponding positive bump on the other side.
Walk Rate
| BB/9 | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| Less Than 2.50 | 21241 | 0.18 |
| 2.50-4.50 | 102769 | 0.02 |
| More Than 4.50 | 24540 | -0.22 |
Again, the walk rate numbers are pre-2007 while the runs / 150 pitches are 2007. In this case we see that reputation appears to play a part in both directions. Pitchers who previously had low walk rates benefit from the umpires' calls, while those with high walk rates suffer. The difference between the two is .4 runs per game or .04 points of ERA. It's important to realize the walk rate is probably not independent of the rate of runs allowed. In other words, a pitcher who has a high walk rate is likely to give up a lot of runs, so these differences can't just be added together to figure out what part reputation plays in skewing the results.
Early-Game Wildness
| Wildness | Called Pitches | Runs / 150 Pitches |
|---|---|---|
| Low (Less Than 33%) | 14704 | -0.01 |
| Mid (33% - 42%) | 23160 | 0.08 |
| High (Greater Than 42%) | 11431 | -0.15 |
This one needs some explanation. What I tried to do was identify how wild each starter was in the first two innings of each 2007 start by looking at the percentage of balls he threw. I then grouped them into low, medium and high buckets, and ran my standard analysis. This shows much the same result (although less pronounced) as the runs allowed case - in that the high bucket is hurt, but there's little effect on the low bucket. Basically, it appears that being wild early tarnishes you in the eyes of the umpire, while having good control early doesn't really do much. Again, there's going to be some causal overlap with this measure and the walk rate and runs allowed metrics, so keep that in mind.
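A minimal sketch of the wildness measure follows. It assumes "percentage of balls" means balls divided by all pitches the starter threw in innings 1-2, including swings. It also assumes the 33% and 42% cut points are inclusive on the Mid side. Both are assumptions, as is the `(inning, is_ball)` input format.

```python
def early_ball_pct(pitches, last_inning=2):
    """Percentage of balls among a starter's pitches in innings 1..last_inning.

    `pitches` is an iterable of (inning, is_ball) pairs for one start.
    Returns None if no pitches were thrown in those innings.
    """
    early = [is_ball for inning, is_ball in pitches if inning <= last_inning]
    if not early:
        return None
    return 100.0 * sum(early) / len(early)


def wildness_bucket(pct):
    """Assign the post's Low / Mid / High buckets (boundary handling assumed)."""
    if pct < 33.0:
        return "Low"
    if pct <= 42.0:
        return "Mid"
    return "High"


# Hypothetical start: 14 balls in 30 pitches over the first two innings.
start = ([(1, True)] * 7 + [(1, False)] * 8 +
         [(2, True)] * 7 + [(2, False)] * 8 +
         [(3, True)] * 10)  # third-inning pitches are ignored
pct = early_ball_pct(start)
print(round(pct, 1), wildness_bucket(pct))  # 46.7 High
```

Each start gets its own bucket, so a pitcher can land in different buckets from one start to the next. The runs/150 calculation is then run over all called pitches in each bucket's starts.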
Summary
So what does this all mean? Very good question. Again, I'm not sure. Since we only have one (partial) season to go on, it's hard to be that confident in the data. I think the sample sizes are pretty good for many of the buckets used in this study, but that's a gut feel rather than anything confirmed mathematically. In a moment, I'll share the results of a regression I ran for individual pitchers based on some of these metrics, but first let's look at a summary chart of all the different ways of breaking this down. The ERA effect is the difference in ERA between the "good" group in the sample (low walk rate, low RA, etc.) and the "bad" group (high walk rate, high RA).
| Breakdown | Good | Bad | ERA Effect |
|---|---|---|---|
| Inning | First | Extras | .03 |
| Batting Team | Visitor | Home | .01 |
| Age (Buckets) | Over 35 | Under 25 | .05 |
| Experience (Buckets) | Over 8 | Under 2 | .04 |
| Handedness | Righties | Lefties | .01 |
| Runs Allowed | 4.00-6.00 | More Than 6.00 | .03 |
| Walk Rate | Less Than 2.5 | More Than 4.5 | .04 |
| Early-Game Wildness | Mid | High | .03 |
Again, please keep in mind these aren't independent so cannot be added together to get a total effect. There's nothing that's all that big here, at least compared to what I was finding for catchers (a spread of over a run per game).
Now let's get to that regression. I took all pitchers from 2007 who had over 300 pitches called (174 pitchers) and regressed the runs / 150 pitches twice. The first model used handedness, age, runs allowed and walk rate, and the second replaced age with experience. Neither model explained more than 10% of the total variation (an R² of .10 or less), so I'm not even going to bother sharing the actual values. If anyone really wants to know, ask and I'll share.
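Here is a sketch of this kind of regression, using NumPy's least-squares solver on synthetic data. The predictor set mirrors the first model (handedness, age, runs allowed, walk rate); swapping `age` for an experience column gives the second. The post does not say whether pitchers were weighted by called pitches, so this uses plain unweighted OLS. Every number below is randomly generated and has nothing to do with the real 2007 results.

```python
import numpy as np


def ols_with_r2(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic stand-in for 174 pitchers; these are NOT the real 2007 values.
rng = np.random.default_rng(2007)
n = 174
righty = rng.integers(0, 2, n)   # 1 = right-handed
age = rng.uniform(21, 45, n)
ra9 = rng.normal(4.8, 1.0, n)    # pre-2007 runs allowed per 9
bb9 = rng.normal(3.4, 0.8, n)    # pre-2007 walks per 9
runs_150 = 0.01 * (age - 30) + rng.normal(0.0, 0.5, n)

X = np.column_stack([righty, age, ra9, bb9])
coef, r2 = ols_with_r2(X, runs_150)
print("coefficients:", np.round(coef, 3))
print("R^2:", round(r2, 3))
```

R² is the share of the variation in runs/150 that the predictors explain, which is the quantity the ".10" above refers to.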
I'm sure there are plenty of other things that might influence an umpire to call pitches a certain way for a given pitcher. A few that I didn't get a chance to look at for various reasons were GB/FB ratio and the pitcher's arsenal. Plus it's going to be nearly impossible to untangle the effect of the catcher and umpire until we have more data. But I think there's still value in examining these characteristics and finding the limits of their effects, even if the information is not conclusive.
Comments
Wow
This is a really fascinating study. This seems to confirm most of what we think about which pitchers get calls and which don’t. Thanks for taking the time to run these numbers.
by TexasTiger on
Apr 24, 2008 8:25 AM EDT
Really cool stuff
I find the experience breakdown especially interesting.
"I've seen many, many blue skies turn gray, but the sun will eventually return, and so will I. So will I." - Carlos Pena
by R.J. Anderson on
Apr 24, 2008 9:55 PM EDT
I am afraid...
that what you might be seeing with lots of these differences is simply differences in the distributions of pitches. In fact, you really need to control for that. Any pitcher or group of pitchers (or catchers who catch a certain group of pitchers) who throw more pitches around the edges of the zone may appear to get more favorable or unfavorable calls from the umpires. That may be why older pitchers, better pitchers, less wild pitchers, etc., appear to get the benefit of the doubt from the umpires (there may be more misses not in favor of the pitcher on all pitches near the edge of the zone). What you need to do when you are comparing groups (of pitchers or catchers, or whatever) is to look at only those pitches that are at the edge of the zone and scale each pitcher or group to the same number of pitches (on the edge).
For example, let’s say that of all pitches on the edge, 10% are “mistakes” but 60% of those mistakes are balls that should be strikes. Obviously any pitcher who never throws on the edge will get no mistakes (and receive some “credit” because the average pitcher has more mistakes against him than in his favor) and any pitcher who lives on the edge will get more debits than credits. And the more he lives on the edge, the more debits he will get.
The only way to control for this is to look at only those “edge” pitches for all pitchers and then use what percentage of those were mistakes (and in what direction) to credit or debit the pitcher. So a pitcher with 10 “mistakes” on edge pitches per 150, with 6 not in his favor and 4 in his favor, will have the same overall credit/debit (or run value) as a pitcher with 20 mistakes, 12 against him and 8 for him. If you did the calculations “per total pitches” (150 in this case), the second pitcher, with more edge pitches and more mistakes, will get more debits, and his run value will be a lot more negative (or positive, whichever way you are doing it).
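To make mgl's point concrete, here is a small sketch with two hypothetical pitchers. Both have the same 20% mistake rate on edge pitches, split 60/40 against them, but they differ in how often they work the edge. Rating them per 150 total pitches makes B look twice as bad; rating them per edge pitch makes them equal. The edge-pitch counts are invented so the example works out cleanly.

```python
def net_per_total(p, per=150):
    """Net calls in the pitcher's favor per `per` total pitches (the post's method)."""
    return (p["favor"] - p["against"]) / p["total"] * per


def net_per_edge(p, per=100):
    """Net calls in the pitcher's favor per `per` edge pitches (mgl's suggestion)."""
    return (p["favor"] - p["against"]) / p["edge"] * per


# Hypothetical pitchers: same 20% mistake rate and 40/60 split on edge pitches,
# but B throws twice as many edge pitches per 150.
pitchers = {
    "A": {"total": 150, "edge": 50, "favor": 4, "against": 6},
    "B": {"total": 150, "edge": 100, "favor": 8, "against": 12},
}
for name, p in pitchers.items():
    print(name, round(net_per_total(p), 2), round(net_per_edge(p), 2))
# A -2.0 -4.0
# B -4.0 -4.0
```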
While the results of this and the catcher study are quite interesting, as you point out, there are all kinds of potential problems here that need to be looked at, not the least of which are the selective sampling issues (especially with the age data), and probably park issues.
Plus, be real careful about taking sample differences, even large ones, and imputing skill differences to them. If you want to take the sample numbers and regress them towards the mean in order to estimate “true” (skill) differences, you better find out how much to regress! You can’t just guess at the regression. It could be 50% for, say, one season’s worth of data for a pitcher or a catcher, or it could be 100% (no skill).
Finding “splits” differences among players or groups of players (or teams, or whatever) is only half the battle. Next, you have to find out how much of those differences are accounted for by random (and unexplained) variation and how much by skill. As I said, for any given “splits” and any given sample size, there could be almost 0% skill (as in clutch hitting, RHB platoon splits), or there could be a lot. Just looking at sample differences may be interesting (to some people), but without knowing how much is random variation and how much is skill, we really don’t know what to say or conclude. Unfortunately, as soon as people see large “splits” they automatically conclude that there must be lots of, or at least, some, skill underlying the differences. Not true at all. We have to figure out the skill underlying the differences by comparing the variance we see to that expected by chance or perhaps do some kind of “one time period to another” (commonly y-t-y, but it doesn’t have to be) correlation.
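One of the checks mgl describes, comparing the observed spread to what chance alone would produce, can be sketched as follows. The sketch assumes each pitcher's rate is a binomial proportion around a common league rate, so the chance variance is p(1 - p)/n. The estimated skill share is whatever spread is left over. When sample sizes are similar, one minus that share is roughly how far to regress each pitcher's observed rate toward the mean. The rates, sample sizes and league rate are all invented for illustration.

```python
def skill_share(rates, sample_sizes, p_league):
    """Fraction of the observed spread in rates not explained by binomial chance.

    rates: each pitcher's observed proportion (e.g. share of missed calls that
           went his way); sample_sizes: the number of trials behind each rate.
    """
    k = len(rates)
    mean = sum(rates) / k
    var_observed = sum((r - mean) ** 2 for r in rates) / (k - 1)
    # Average binomial variance expected from chance alone at the league rate.
    var_chance = sum(p_league * (1 - p_league) / n for n in sample_sizes) / k
    var_skill = max(var_observed - var_chance, 0.0)
    return var_skill / var_observed if var_observed > 0 else 0.0


# Hypothetical pitchers, not real data.
rates = [0.30, 0.45, 0.38, 0.52, 0.35]
sizes = [200, 150, 180, 120, 220]
share = skill_share(rates, sizes, p_league=0.40)
print(round(share, 2))                          # 0.81
print("regress toward mean by", round(1 - share, 2))
```

With real data the observed spread is often close to the chance spread, and then the skill share collapses toward zero. That is the "100% (no skill)" case mgl warns about.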
by mgl on
Apr 25, 2008 7:20 AM EDT
Great points MGL
I agree with everything you say – and I definitely try to caution anyone not to read too much into these.
I’ll see what I can do to just look at those pitches that could be classified as edge pitches – but then I’m pretty sure the sample sizes are going to be extremely small – so my guess is we won’t get anything meaningful out of the data… yet.
Mostly what I’m trying to do is introduce these splits and lay the groundwork for something we can revisit in future years to try and determine the skill component.
by Dan Turkenkopf on
Apr 25, 2008 7:28 AM EDT
Watch those ball/strike count weights...
Dan,
Great work. A fascinating effort.
One comment. I suspect your run value for switching a ball to a strike is off.
You weight the run value for switching the call on a pitch at a given count by the number of plate appearances with that count. However, the impact of an umpire’s call is not necessarily evenly distributed by the number of plate appearances at each count.
First, I suspect that some counts see substantially higher numbers of called pitches than others. The more likely the batter is to swing, the less likely the umpire will have a pitch to call. At least you should weight the run values by the number of called pitches at each count, not the number of PAs.
Second, if he’s behind in the count the batter may be more likely to swing at anything close to the plate; if he takes the pitch, it’s more likely to be an obvious ball. The umpire’s judgment (or catcher’s skill) would be less of a factor.
To get a better sense of the influence of the catcher or umpire, I think you need to weight the run value of the pitch by some estimate of the number of “close call” pitches at each count – what MGL’s calling edge pitches.
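Iblemetrician's suggestion amounts to changing the weights in a weighted average. The sketch below uses made-up per-count run values and counts, purely to show that the choice of weight can move the overall figure. An edge-pitch column could be added and passed as the weight in the same way.

```python
from collections import namedtuple

# Hypothetical per-count figures for illustration only, not real 2007 values.
CountRow = namedtuple("CountRow", "run_value plate_appearances called_pitches")
COUNTS = {
    "0-0": CountRow(0.10, 1000, 500),
    "0-2": CountRow(0.20, 300, 60),
    "3-1": CountRow(0.25, 150, 40),
}


def weighted_run_value(counts, weight_field):
    """Average ball-to-strike run value, weighted by the chosen per-count field."""
    total = sum(getattr(row, weight_field) for row in counts.values())
    return sum(row.run_value * getattr(row, weight_field) for row in counts.values()) / total


print(round(weighted_run_value(COUNTS, "plate_appearances"), 3))  # 0.136
print(round(weighted_run_value(COUNTS, "called_pitches"), 3))     # 0.12
```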
by Iblemetrician on
Apr 26, 2008 6:24 PM EDT
I see your point...
but I think I’d need to change a lot more than the weighting factors to make this work. If I change the number of opportunities to only count the edge pitches, wouldn’t the run values change as well?
As you say, there are certain counts that batters are more likely to swing at or let by. The run values take into account all possible outcomes of the at-bat – some of which are immediate (putting the ball in play) and shouldn’t come into play when talking about the run value of a called pitch, right?
I did agonize over how to do the weighting and went back and forth a bit. I think I could approach it by number of called pitches and see how much that changes things. I feel like I started out that path and it didn’t change things too much, but I don’t remember the details.
by Dan Turkenkopf on
Apr 26, 2008 10:57 PM EDT
I was wrong
I didn’t go down this path because it’s surprisingly hard to tie together a pitch with a given count. I don’t think I’m able to do it with the data as I have it now. I’m going to need to write some code to see if I can match up the pitch with the count.
I’ll let you know how it turns out.
by Dan Turkenkopf on
Apr 27, 2008 10:14 AM EDT
That does bring the run number down some
If I look at all called pitches by count, it’s .146 runs. If I look at called pitches at the edge, it’s .140 runs.
More details here
by Dan Turkenkopf on
May 1, 2008 8:17 AM EDT
The age stuff is really interesting
I wonder if you can control for that somehow- It stands to reason that the wildest pitchers are in and out of the majors by the time that they’re 35, so that any group of 35 year old MLB pitchers will be better able to throw strikes than any group of 25 year olds (who are still cheap enough to gamble on them putting it together). What mgl was saying about the ‘edge’ pitches makes a lot of sense in this scenario.
Have you started working on this year’s Pitch f/x data, or will you wait until the end of the season for that?
"Have faith in the Yankees, my son. Think of the great DiMaggio."
by jscape2000 on
Apr 29, 2008 1:25 AM EDT
Thanks
I’m still working through the “edge” pitches, so we should see that soon.
I haven’t started looking at this year’s data yet. Maybe around the All Star break.
by Dan Turkenkopf on
May 1, 2008 8:19 AM EDT
Pitcher v. Batter
Last night during the Red Sox game, rookie Jed Lowrie was called out on an edge pitch by B.J. Ryan. Some of the commentary has suggested Ryan got the call (or Lowrie didn’t get the call) because he’s a veteran pitcher, and the close calls will go to the vet. I wonder if your data on balls vs. strikes would show anything if you looked at the difference in time in the majors (age, experience, etc.) between pitcher and hitter. Is a close pitch a ball when Frank Thomas is at the plate, and a strike when it’s Lowrie?
by a20261 on
May 2, 2008 8:28 PM EDT