Is baseball 75% pitching, as we sometimes hear? If so, what would that mean? One question we could ask is this: what percentage of plate appearances (or their outcomes) is determined by the hitter, and what percentage by the pitcher? If we knew the answer, we might be able to say that baseball is a certain percentage pitching. To look at this, I tried the same type of analysis that Rob Wood used in a study done back in 1990. The basic idea was to measure what share of the total variation in certain statistics came from the hitters and what share came from the pitchers.
Bill James also raised this issue in his "1986 Baseball Abstract." (That is from an excellent site by Rich Lederer.) James pointed to the extremes in various stats and noted that the rates at which hitters did things were more extreme than the rates for pitchers. For example, the lowest HR frequency allowed by any pitcher was not as low as the lowest for any hitter, and the highest HR frequency allowed by any pitcher was not as high as the highest for any hitter; the same held for other stats. The idea is that if pitchers don't vary as much as hitters, then they don't impact the game as much. I found all the pitchers from 1996-2005 with 5000 or more batters faced (BFP) and all the hitters with 5000 or more plate appearances (PAs). The table below shows the extremes in four stats for both pitchers and hitters: hits, HRs, strikeouts, and walks, all per batter faced or plate appearance.
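James's extremes comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the table: the `PlayerLine` class, the `rate_extremes` function, and every player and number below are hypothetical. It computes each qualifying player's rate (applying the same 5000 PA/BFP cutoff), then reports the lowest rate, the highest rate, and the spread between them for each group.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    name: str
    events: int         # e.g. HRs hit (hitters) or HRs allowed (pitchers)
    opportunities: int  # PAs for hitters, BFP for pitchers


def rate_extremes(players, min_opportunities=5000):
    """Return (lowest rate, highest rate, spread) among players meeting the cutoff."""
    rates = [p.events / p.opportunities
             for p in players if p.opportunities >= min_opportunities]
    if not rates:
        raise ValueError("no players meet the playing-time cutoff")
    low, high = min(rates), max(rates)
    return low, high, high - low


# Hypothetical lines, invented for illustration only (not real 1996-2005 data).
hitters = [PlayerLine("Hitter A", 300, 6000),
           PlayerLine("Hitter B", 50, 5500),
           PlayerLine("Hitter C", 10, 4000)]   # below the cutoff, ignored
pitchers = [PlayerLine("Pitcher X", 150, 6000),
            PlayerLine("Pitcher Y", 120, 8000)]

h_low, h_high, h_spread = rate_extremes(hitters)
p_low, p_high, p_spread = rate_extremes(pitchers)
print(f"hitters  HR/PA:  {h_low:.4f} to {h_high:.4f} (spread {h_spread:.4f})")
print(f"pitchers HR/BFP: {p_low:.4f} to {p_high:.4f} (spread {p_spread:.4f})")
```

A wider spread for hitters is the pattern James described; the spread alone says nothing about how the rest of each group is distributed, which is why the variance approach below is more informative.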
In some cases the difference between the extremes (the very best and the very worst) is clearly greater for hitters; in others it is not. The table below shows the best and worst OPS (on-base percentage + slugging percentage) for both pitchers and hitters from 1991-2000 (more on this time period later).
The differences between the best and worst hitters are clearly bigger here. Next, I move on to what Rob Wood did back in 1990 and apply his method to other data sets.
In his article "Hitter or Pitcher," which appeared in "By the Numbers," the newsletter of SABR's Statistical Analysis Committee, Rob Wood imagined a league where all the pitchers are of equal ability but the hitters vary as they normally do. In that league, we would have to conclude that all the variation in what happens is due to the hitters: baseball would be 100% hitting. So he looked at the variance in several stats for AL hitters and pitchers in 1987. One advantage of using the AL was that pitchers don't bat, so their poor hitting would not artificially increase the variance among hitters. (I am not sure how important that consideration would have been, since he calculated the variance of each stat by weighting each hitter or pitcher by at-bats or plate appearances.) I also think that by using just one league in one season and including every pitcher and hitter, he covered every single plate appearance during the period studied. That makes sense, since he was interested in the batter-pitcher matchup.
So he found the variance, a measure of dispersion frequently used in statistical analysis, of various stats for the pitchers and for the hitters. Then he added the two to get the total variance. The table below shows what share of the total variance came from the hitters and what share came from the pitchers.
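Wood's calculation can be sketched as a playing-time-weighted variance for each group, followed by the hitters' share of the sum. This is a hedged reconstruction: the function names are hypothetical, and the exact normalization Wood used is not described here, so the sketch assumes the population form (dividing by the total weight). The example encodes his thought experiment: if every pitcher has the same rate, pitcher variance is zero and the hitters get 100% of the total.

```python
def weighted_variance(rates, weights):
    """Variance of per-PA rates, weighting each player by playing time (PA or BFP)."""
    total = sum(weights)
    mean = sum(r * w for r, w in zip(rates, weights)) / total
    return sum(w * (r - mean) ** 2 for r, w in zip(rates, weights)) / total


def hitter_share(h_rates, h_weights, p_rates, p_weights):
    """Share of total (hitter + pitcher) variance that comes from the hitters."""
    var_h = weighted_variance(h_rates, h_weights)
    var_p = weighted_variance(p_rates, p_weights)
    return var_h / (var_h + var_p)


# Wood's thought experiment with made-up rates: every pitcher has the same
# rate, so pitcher variance is zero and "baseball is 100% hitting".
print(hitter_share([0.20, 0.30], [600, 600], [0.25, 0.25], [700, 700]))  # 1.0
```

The pitchers' share is simply one minus this value. Weighting by PA or BFP keeps a part-time player from counting as much as a regular.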
Most of the variation came from the hitters, and by a wide margin. I think numbers like this make it unlikely that "baseball is 75% pitching" or that pitchers determine 75% of the outcomes.
I looked at a few different data sets using the same method that Wood used. First, I found all the pitchers from 1996-2005 with 5000 or more batters faced (BFP) and all the hitters who had 5000 or more plate appearances (PAs). Of course, this is not the same set of conditions that Wood used: I am not counting all plate appearances (just those from the players who pitched and batted the most), and I use both leagues, so some of the pitchers did not face these hitters very much, even with interleague play. The stats I used were hits, HRs, Ks, and BBs, all per PA or BFP. The table below shows the results.
These results certainly show that baseball is not 75% pitching, but the hitters don't dominate as much as in Wood's study. One potential problem is that the mean for each of these stats differs between pitchers and hitters. If one group of numbers has a higher mean than another, that alone can lead to a higher variance. If each number in group A is 10% higher than the corresponding number in group B, then the variance of group A will be 21% higher (since 1.1 squared is 1.21; you use 1.1 because that is the number-to-number ratio). So I found the ratio of the means for each of the four stats, adjusted the variances on the hitters' side, and then recalculated what share went to the hitters. For example, the mean for Hits/PA was about 6% higher for the hitters. That makes sense: I looked at pitchers who faced a lot of batters, so they were likely better than average and allowed fewer hits than normal, while the hitters were probably better than average too, so they got hits more often than average. The result is a higher mean for the hitters. I squared 1.06 to get about 1.13, reduced the hitters' Hits/PA variance by 13%, and then recalculated what share of the variance goes to the hitters. The new shares for the hitters for the four stats were 40%, 84%, 55%, and 65%. The pitchers still are not dominating.
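The mean adjustment rests on one property: multiplying every value by a constant c multiplies the variance by c squared. The sketch below checks that property with Python's `statistics.pvariance` and applies the adjustment through a hypothetical helper, `mean_adjusted_hitter_share`. One assumption to flag: the text describes the step as reducing the hitters' variance by 13%, while this sketch divides by the squared mean ratio (about 1.13), which is a slightly smaller cut of roughly 11.5%. Treat the exact form as an assumption rather than the calculation behind the 40%, 84%, 55%, and 65% figures.

```python
from statistics import pvariance


def mean_adjusted_hitter_share(var_h, mean_h, var_p, mean_p):
    """Hitter share of total variance after removing the effect of a higher hitter mean.

    If every hitter rate were c times a pitcher rate, the hitter variance would be
    c**2 times larger, so divide that factor back out using the mean ratio.
    """
    ratio = mean_h / mean_p
    adjusted_var_h = var_h / ratio ** 2
    return adjusted_var_h / (adjusted_var_h + var_p)


# Check the scaling rule from the text: multiplying every value by 1.1
# multiplies the variance by 1.1 ** 2 = 1.21.
values = [0.20, 0.25, 0.30]
print(pvariance([1.1 * v for v in values]) / pvariance(values))  # about 1.21
```

When the two means are equal, the ratio is 1 and the share is unchanged, which matches the 2004 AL case where no adjustment was made.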
I also looked at a single season in the AL, 2004. This is closer to what Wood did, but not quite the same, since AL teams now play some of their games against the NL, which they did not do in 1987. The stats I looked at were batting average (AVG), on-base percentage (OBP), slugging percentage (SLG), and OPS (OBP + SLG). The table below shows what share of the variance goes to the hitters and to the pitchers. (For each stat, the hitter and pitcher means were about the same, so no mean adjustment was needed.)
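For reference, the four rate stats follow the standard definitions sketched below; the post does not spell them out, so whether the data sources handled every edge case (sacrifice flies, for instance) exactly this way is an assumption. The function name and the season line are hypothetical. For pitchers, the same formulas are applied to the batting line allowed.

```python
def slash_line(ab, h, doubles, triples, hr, bb, hbp, sf):
    """AVG, OBP, SLG, and OPS from standard counting stats.

    For pitchers, pass in the totals allowed (the opponents' batting line).
    """
    # Each hit counts one base; extra-base hits add their additional bases.
    total_bases = h + doubles + 2 * triples + 3 * hr
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return avg, obp, slg, obp + slg


# Hypothetical season line, invented for illustration.
avg, obp, slg, ops = slash_line(ab=500, h=150, doubles=30, triples=5,
                                hr=20, bb=60, hbp=5, sf=5)
print(f"{avg:.3f} / {obp:.3f} / {slg:.3f}, OPS {ops:.3f}")  # 0.300 / 0.377 / 0.500, OPS 0.877
```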
Again, the hitters account for more of the variance, so it is more likely that outcomes are determined by the batter than by the pitcher.
The next test I ran used career data from 1991-2000. I took all of the pitchers who had at least 1,000 IP from 1991-2000. Then I looked up their data in my STATS, Inc. Player Profiles books from 1996 and 2001; each book gives 5-year totals, so together they cover 1991-2000. If a pitcher did not appear in both editions, he was left out. That left 59 pitchers. The lowest BFP total in this group was 4,444, so I then found all the hitters with at least that many PAs in the same period. The one stat I looked at was OPS. Hitters had about 75% of the total variance. With the adjustment described earlier for the hitters' higher mean, the hitters' share falls to 69%. So the hitters still dominate.
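The sample selection for this test can be sketched as two filters: keep the pitchers who qualify, then use their lowest BFP as the PA cutoff for hitters. This is an illustration only. The `PitcherTotals` type, the `select_samples` function, and all players and totals below are hypothetical, and the "in both books" flag stands in for the manual check against the two STATS, Inc. editions.

```python
from typing import NamedTuple


class PitcherTotals(NamedTuple):
    ip: float            # innings pitched, 1991-2000
    bfp: int             # batters faced, 1991-2000
    in_both_books: bool  # listed in both the 1996 and 2001 editions


def select_samples(pitchers, hitter_pa, min_ip=1000.0):
    """Keep qualifying pitchers, then keep hitters with at least as many PAs
    as the lowest BFP total among those pitchers."""
    kept = {name: p for name, p in pitchers.items()
            if p.ip >= min_ip and p.in_both_books}
    if not kept:
        raise ValueError("no pitchers qualify")
    cutoff = min(p.bfp for p in kept.values())
    kept_hitters = {name: pa for name, pa in hitter_pa.items() if pa >= cutoff}
    return kept, cutoff, kept_hitters


# Hypothetical players, invented for illustration.
pitchers = {
    "P1": PitcherTotals(1200.0, 5000, True),
    "P2": PitcherTotals(1050.0, 4444, True),
    "P3": PitcherTotals(1500.0, 6500, False),  # missing from one edition
    "P4": PitcherTotals(900.0, 3900, True),    # under 1,000 IP
}
hitter_pa = {"H1": 4500, "H2": 4400, "H3": 4444}
kept, cutoff, kept_hitters = select_samples(pitchers, hitter_pa)
print(sorted(kept), cutoff, sorted(kept_hitters))  # ['P1', 'P2'] 4444 ['H1', 'H3']
```

Tying the hitter cutoff to the smallest pitcher workload keeps the two groups on roughly comparable playing time.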
One last issue that might be important is that the variance among hitters may naturally be high because players at some positions hit better than others. Shortstops and catchers usually don't hit as well as outfielders. So I used the 1991-2000 data again, but with only outfielders (all with 3,000 or more PAs; this gave me about the same number of outfielders as pitchers). The variance and mean were both about the same as they were for all hitters in this period with 4,444 or more PAs, which means the share of the variance going to the hitters would not change.
It seems clear that pitchers do not dominate baseball, since their share of the variance in these stats is usually less than 50%.
Sources: The Lee Sinins Complete Baseball Encyclopedia, the Sean Lahman Database (from the Baseball Archive), and the ESPN website.