As the title suggests, many studies have been written on this. There are some links to previous research listed at the end of this article. But the question is: at what age do hitters peak? To analyze this, I found all players who had 15 or more seasons with 400 or more plate appearances (a minimum for a "full" season). To measure offensive performance, I used RCAA from Lee Sinins's Complete Baseball Encyclopedia. It is "Runs created above average. It's the difference between a player's RC total and the total for an average player who used the same amount of his team's outs. A negative RCAA indicates a below average player in this category." Then I sorted this group of players by age, found the average RCAA for each age, and graphed it.
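The selection and averaging steps described above can be sketched in Python. This is only an illustration: the rows in SEASONS are invented toy data rather than figures from the encyclopedia, the function and parameter names are hypothetical, and it assumes that only "full" seasons enter the age averages, which the description above does not spell out.

```python
from collections import defaultdict

# Hypothetical toy rows: (player, age, plate_appearances, rcaa).
# These numbers are invented for illustration only.
SEASONS = [
    ("A", 26, 610, 30), ("A", 27, 640, 40), ("A", 28, 350, 5),
    ("B", 26, 500, 10), ("B", 27, 520, 20), ("B", 28, 480, 12),
    ("C", 27, 450, -4),
]

def average_rcaa_by_age(seasons, min_pa=400, min_seasons=15):
    """Return {age: (average RCAA, player count)} for qualifying players."""
    # Keep only "full" seasons (assumption: partial seasons are ignored).
    full = [row for row in seasons if row[2] >= min_pa]

    # Count full seasons per player and keep those with enough of them.
    n_full = defaultdict(int)
    for player, _, _, _ in full:
        n_full[player] += 1
    qualified = {p for p, n in n_full.items() if n >= min_seasons}

    # Collect the RCAA values of qualifying players at each age.
    by_age = defaultdict(list)
    for player, age, _, rcaa in full:
        if player in qualified:
            by_age[age].append(rcaa)

    return {age: (sum(v) / len(v), len(v)) for age, v in sorted(by_age.items())}

# min_seasons is lowered to 2 only because the toy data is tiny.
table = average_rcaa_by_age(SEASONS, min_seasons=2)
```

With real data, the defaults (400 plate appearances, 15 seasons) match the cutoffs described above.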
Here are the averages at each age. The count column shows how many players there were at a given age.
The highest average RCAA is 34.20 at age 29, although this is only slightly higher than at ages 26-28. Since there were so few cases at certain ages, I only graphed ages 20-40. It is interesting that the average dips at ages 30-31 but comes back up at age 32. There were a total of 90 players in this group, and from ages 25-35 the number of players stays very high. The equation:
(1) RCAA = -0.1922*AGE^2 + 10.901*AGE - 122.45
predicts what the average RCAA will be at a given age. The R^2 of 0.8975 means that 89.75% of the variation in average RCAA across ages is explained by the equation. To find the peak of the trend line, take the derivative of the equation with respect to age and set it equal to zero (a standard calculus technique). The derivative of equation (1), set equal to zero, is
(2) -0.3844*AGE + 10.901 = 0
This is true at AGE = 28.36, which is the very highest point on the trend line. That is another way to get the peak age, and it is fairly close to 29, the age with the highest average RCAA.
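The derivative step can be checked with a few lines of Python. This is a minimal sketch that uses the rounded coefficients printed in equation (1); the variable and function names are my own.

```python
# Coefficients of equation (1): RCAA = a*AGE^2 + b*AGE + c
a, b, c = -0.1922, 10.901, -122.45

def predicted_rcaa(age):
    """Average RCAA predicted by the second-order trend line."""
    return a * age**2 + b * age + c

# Setting the derivative 2*a*AGE + b to zero gives AGE = -b / (2*a).
peak_age = -b / (2 * a)
peak_rcaa = predicted_rcaa(peak_age)

print(f"peak age: {peak_age:.2f}")
print(f"predicted RCAA at the peak: {peak_rcaa:.2f}")
```

Because the coefficient on AGE^2 is negative, the point where the derivative equals zero is a maximum rather than a minimum.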
Equation (1) is a second-order polynomial. A commenter on one of JC Bradbury's articles suggested a higher-order polynomial. I tried third-, fourth- and fifth-order polynomials (these can be selected very easily in Excel). I did not like the fifth-order polynomial because the trend line actually went down, then came up, and declined again. That may be possible, but I want to stick with a simple rising, then falling trend. The fourth-order polynomial has an inverted U-shape similar to the one in the graph above and a higher R-squared of .9567. In that case, the equation was
(3) RCAA = -0.0006*AGE^4 + 0.082*AGE^3 - 4.3663*AGE^2 + 103.48*AGE - 876.52
If I wanted to find the peak age using a derivative, I would end up with a third-order polynomial, or cubic function. Setting that equal to zero and solving for AGE can be pretty complex. So I just plugged every age from 20 to 40 (in tenths, like 20.1, 20.2, etc.) into equation (3) and sorted the results to get the highest predicted RCAA. The peak age was 26.7 (with a predicted RCAA of 29.58), which is much younger than the peak ages found earlier (29 and 28.36). I only used ages 22-40 to fit this fourth-order polynomial, since the average RCAA at age 21 was lower than at age 20, and the equation fitted to ages 20-40 gave an unrealistically low predicted average RCAA at its peak of only about 19 (that peak age was 25.8). That makes little sense if you look at the graph above. If I restrict the ages to 22-40 and fit a second-order polynomial, the peak age is 28.54, the R-squared is .9361 and the predicted RCAA is 32.3.
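The plug-in-and-sort search for equation (3) can be reproduced in a few lines of Python. This is a sketch that uses the rounded coefficients as printed; the function name is my own.

```python
def quartic_rcaa(age):
    """Average RCAA predicted by equation (3), using the printed coefficients."""
    return (-0.0006 * age**4 + 0.082 * age**3
            - 4.3663 * age**2 + 103.48 * age - 876.52)

# Every age from 20.0 to 40.0 in steps of one tenth.
ages = [20 + i / 10 for i in range(201)]

# Sort (predicted RCAA, age) pairs from highest to lowest prediction.
predictions = sorted(((quartic_rcaa(x), x) for x in ages), reverse=True)
peak_rcaa, peak_age = predictions[0]

print(f"peak age: {peak_age:.1f}, predicted RCAA: {peak_rcaa:.2f}")
```

A grid in tenths can only locate the peak to the nearest tenth of a year, which is more than enough precision for this question.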
I also found the number of players having their best season (highest RCAA) at various ages. This is in the next table:
Then I broke the 90 players into five groups of 18 by ranking each player by his average yearly RCAA. Here are the ages that had the highest average RCAA for the top 18, the middle 18 and the lowest 18, along with the average RCAA at that age.
Then I found the trend line for the top 18, the middle 18 and the lowest 18 and what age would be predicted to be the peak age by both the second-order and fourth-order polynomials for each of the three cases. For each of the three categories, if there were not at least 8 cases for an age, it was not included. I found the peak by either using the derivative or the sorting method mentioned earlier. For the second-order polynomials, here are the peak ages and the RCAA predicted by the trend line or equation:
For the fourth-order polynomials, the peak ages and their RCAAs were
The last one is a negative RCAA. None of the others are. Also, for the fourth-order polynomial for the bottom group, I used all observations, since this gave a curve that did not change direction. The curve fitted only to ages with at least 8 cases did change direction, and its predicted values did not make sense, since they just kept getting more negative with age. So, for the bottom group, some ages had only 1 or 2 cases.
So, I found lots of different possible ages for peak value. I can't say I know for sure which one I would pick. But the ages I found are generally between 28 and 30 or so.
Links to other research on Bonds and aging patterns in general
"Has Anyone Aged as Well As Barry Bonds?"
"Smoothing Career Trajectories" by Jim Albert "By the Numbers" August 2002
JC Bradbury has these studies posted at his site:
"ESTIMATED AGE EFFECTS IN BASEBALL" By Ray C. Fair
Dr. Fair is a well-known and respected economist.