In basketball, a single player can often determine the fate of a team. In the 2009-10 season, the Cleveland Cavaliers won 61 games and reached the Eastern Conference Semifinals. The following year, they fell by 42 wins and finished as the worst team in the Eastern Conference. The difference? They lost LeBron James in free agency to Miami.
Now, James is the best player in the world, but there are several lesser examples out there. At the same time, one player can't carry his team all the way over the top, as the world saw with LeBron James during his Cleveland years. In the words of Kareem, "One man can be a crucial ingredient on a team, but one man cannot make a team."
Now, in baseball, it's nearly impossible to build a team around a single transcendent player. Even the best starting pitcher goes once every five days, and the best hitter gets at most around 11-12% of his team's plate appearances. There are just too many moments when any one player is not in the middle of the action to build around that player alone.
But how many players are necessary? Or, put another way, how many players do you need to look at to really get an idea of how good a team is? Do we treat hitters and pitchers differently? What about when combining them?
To answer this, we'll use WAR. Of course, the WAR of an entire team correlates pretty strongly with the team's win percentage. However, we'll start with the WAR of the best hitters on each team and see how it correlates with win percentage. Then we'll do the same for starting pitchers. We'll also look at these players through the lens of win probability added, and then combine the rotation and lineup to see how they all go together.
For this question, I limited the years to 2002-2012 (so that batted ball and win probability data would be available), and only looked at players with over 100 PAs or 30 IP. This originally started out as a "What Do Great Teams Have in Common" post, but this question became more interesting, so there will be a mix of both topics here. All data courtesy of FanGraphs.
Looking at Hitters
Now, as is obvious, a team's WAR is pretty well correlated with its actual winning percentage. We can think of that as summing the individual WAR of every player and correlating the total with win percentage. But we could just as well compute the correlation for any subset of the team. In this case, we'll look at the subsets defined by the most WAR overall and the most PAs.
So, we sum the WAR of each team's top players and look at the correlation between this summed WAR and the team's win percentage. Of course, as the number of players in the summation increases, the correlation generally increases as well.
|Players in Summation|Correlation With W%|
|Top 2 Players|0.5488|
|Top 3 Players|0.6067|
|Top 4 Players|0.6488|
|Top 5 Players|0.6779|
|Top 6 Players|0.6973|
|Top 7 Players|0.7091|
|Top 8 Players|0.7174|
|Top 9 Players|0.7215|
|Top 10 Players|0.7242|
|Top 11 Players|0.7262|
|Top 12 Players|0.7233|
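The calculation behind a table like the one above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the post: the function names (`pearson`, `top_n_war`, `top_n_correlation`) and the input layout are assumptions, and the toy numbers are made up rather than taken from FanGraphs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_n_war(player_wars, n):
    """Sum of the n largest WAR values on one team."""
    return sum(sorted(player_wars, reverse=True)[:n])


def top_n_correlation(team_seasons, n):
    """Correlation between each team's top-n WAR sum and its win percentage.

    team_seasons: list of (player_war_list, win_pct) pairs, one per team-season
    (a hypothetical layout chosen for this sketch).
    """
    sums = [top_n_war(wars, n) for wars, _ in team_seasons]
    win_pcts = [win_pct for _, win_pct in team_seasons]
    return pearson(sums, win_pcts)


# Made-up illustrative numbers, NOT FanGraphs data.
toy_seasons = [
    ([6.1, 4.0, 3.2, 1.5, 0.3], 0.580),
    ([3.0, 2.8, 2.1, 1.9, 1.0], 0.500),
    ([7.5, 1.0, 0.5, 0.2, -0.4], 0.470),
    ([5.0, 4.5, 4.1, 2.2, 1.8], 0.610),
]

for n in range(1, 6):
    print(n, round(top_n_correlation(toy_seasons, n), 3))
```

Running the loop over real team-seasons for each value of n would produce one row of the table per iteration.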
If we look at WPA, we see a very similar correlation curve. From this curve, we want to look for a point of diminishing returns. Yes, we could add more players to increase the correlation by a minimal amount. However, if a team believed it needed to get one more very good player, the small increase in the correlation might not justify the cost (in dollars).
With the hitters, that point of diminishing returns seems to occur somewhere around five players. So we can get a decent indication of team performance from the top five batters' combined WAR, and the positive correlation implies the obvious: the higher this top-five WAR, the better the win percentage.
Now, having five solid-to-elite players (to try to ensure a higher win percentage) seems like a tough thing to ask, but of course, there is the pitching side of the equation to consider.
Looking at Pitchers
On the pitching side, we'll work only with the starters, as the variability among relievers leads to very low correlations. Again, we'll take the top starters by WAR and then observe the correlation between their combined WAR and team win percentage.
|Players in Summation|Correlation With W%|
|Top 2 Starters|0.5153|
|Top 3 Starters|0.5531|
|Top 4 Starters|0.5701|
|Top 5 Starters|0.5790|
|Top 6 Starters|0.5712|
|Top 7 Starters|0.5725|
Again, looking for the point of diminishing returns, we find it at three starters. Surprisingly, the correlation of the starters to win percentage is lower than that of the position players. Equally interesting is that the correlation of both WAR and WPA/LI for the "ace" of the staff (highest WAR on the team) is nearly equal to the correlation for the team's "closer" (most saves on the team), with these correlations being roughly between 0.4 and 0.45 for both WAR and WPA/LI.
Now, this doesn't take the team's rotation into account when looking at the top five position players. Similarly, a team's lineup isn't taken into account when looking at the top three starters. Do these numbers change when both aspects are taken into consideration?
Looking at Both Rotation and Lineup
So, there are two ways that we can look at this. The first is to just combine all the players into one dataset and do as before: sum the WAR of the top players regardless of whether they are position players or pitchers. This first technique gives us the following chart and diminishing returns point.
So here it seems that the breaking point is roughly five or six players for a reasonable correlation with win percentage. However, this technique may be slightly shortsighted, mainly because it treats pitchers and position players the same, when in fact one group might be more crucial than the other.
The second way is to keep the two groups separate and regress win percentage on both the pitchers' and the hitters' WAR. To do this, we need some sort of correlation to go with the regression. Of course, the R² value immediately comes to mind. As a side note, R² is the squared correlation between the regression's fitted values and the actual data points, so it is the appropriate correlation-type measure here. For our plots, we'll actually use the correlation of the fits to the actual values.
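The side note about R² can be checked numerically. The sketch below fits a two-predictor least-squares regression (with an intercept) using only the standard library, then compares R² with the squared correlation between the fitted and actual values. The helper names and the toy data are assumptions made for illustration; they are not the post's data or code.

```python
from math import sqrt


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coefs, row):
    """Fitted value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))


def r_squared(actual, fitted):
    """1 - SS_residual / SS_total."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up (top-hitter WAR, top-starter WAR) pairs and win percentages.
war_pairs = [(20.1, 12.0), (15.3, 9.5), (12.2, 13.1),
             (18.0, 7.4), (10.5, 6.0), (16.7, 11.2)]
win_pct = [0.590, 0.500, 0.520, 0.510, 0.420, 0.560]

coefs = ols(war_pairs, win_pct)
fitted = [predict(coefs, row) for row in war_pairs]
r2 = r_squared(win_pct, fitted)
corr = pearson(win_pct, fitted)
print(round(r2, 6), round(corr ** 2, 6))  # the two values agree
```

The identity holds for any least-squares fit that includes an intercept, which is why the correlation of fits to actual values can stand in for the simple correlations used in the earlier tables.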
The first plot is a 3D surface plot of the correlations. While this may be a little difficult to see (and unfortunately cannot be rotated), it was too fun not to include. To make it a little easier to view, the second plot, a contour plot, is included.
Here, finding the point of diminishing returns is a little more difficult, as we're dealing with movement in two dimensions. However, diminishing returns appear to set in at three starters on the pitcher axis and three or four position players on the batter axis.
How Good Do Players Have to Be?
So we've established that we can look at anywhere between five and seven of a team's best players and have a pretty decent indication of the team's ability. However, how good do these players have to be to ensure a decent winning percentage? Further, what about the teams that build around a different number of starters or position players? How well would those individuals have to perform?
Let's start by looking at the diminishing returns point, three starters and four position players. We can initially get an idea by regressing the team winning percentage against the combined WAR for the top three starters and top four position players. While technically this should be fit with a GLM (win percentage is a proportion), the results are not appreciably different.
Running this regression gives the fitted equation
Win% = 0.2628 + 0.0098 × Top 4 Batter WAR + 0.0104 × Top 3 Pitching WAR
From this, we can get an idea of the mean estimated win percentage for various combinations of pitching and hitting. This is best seen in the contour plot below.
Now, this is just an estimate of the expected winning percentage. The chance that a team's actual winning percentage falls within plus or minus 0.025 points (or four wins) of the estimate is 43%. By contrast, the chance that the actual win percentage falls within four wins of the estimate is 48% for a model using the top 12 position players and top seven starters.
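One plausible way to arrive at a figure like this is to count the share of team-seasons whose actual win percentage lands within the tolerance of the model's estimate. The post does not say exactly how the percentages were computed, so treat the sketch below as an assumption; `share_within` is a hypothetical helper and the numbers are made up.

```python
GAMES = 162  # length of an MLB regular season


def share_within(actual, estimated, tolerance=0.025):
    """Fraction of observations whose estimate is within `tolerance` of actual."""
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) <= tolerance)
    return hits / len(actual)


# Made-up actual and estimated win percentages for four team-seasons.
actual = [0.500, 0.550, 0.600, 0.450]
estimated = [0.510, 0.500, 0.590, 0.450]

print(share_within(actual, estimated))   # share of estimates inside the band
print(round(0.025 * GAMES, 2))           # the band expressed in wins
```

The second line shows why 0.025 points of win percentage is described as roughly four wins over a full season.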
Finally, this plot gives an idea of how much WAR each team's top players need to generate for a good season, defined as 90 wins (a 0.556 win percentage). Roughly speaking, as a midpoint of all possible combinations, a team needs about 12.5 WAR from its top three in the rotation and 17.5 from its top four position players. In other words, you need seven All-Stars (as FanGraphs defines four-to-five WAR as an All-Star), or a couple of MVPs and five good players. Beyond this core set, a good group of role players, innings-eaters, and the like can fill out the team. This type of estimate can be redone for any other number of building blocks.
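As a quick check, the fitted equation can be evaluated directly at the midpoint figures quoted above. The coefficients come from the regression in this post; the function names and the 162-game conversion are assumptions added for illustration.

```python
GAMES = 162  # length of an MLB regular season


def predicted_win_pct(top4_batter_war, top3_starter_war):
    """Estimated win percentage from the regression equation in the post."""
    return 0.2628 + 0.0098 * top4_batter_war + 0.0104 * top3_starter_war


def batter_war_needed(target_win_pct, top3_starter_war):
    """Top-four batter WAR needed to hit a target, holding starter WAR fixed."""
    return (target_win_pct - 0.2628 - 0.0104 * top3_starter_war) / 0.0098


# Midpoint combination: 17.5 WAR from four hitters, 12.5 from three starters.
midpoint = predicted_win_pct(17.5, 12.5)
print(round(midpoint, 4), round(midpoint * GAMES, 1))

# Hitter WAR needed for a 90-win pace if the top three starters supply 12.5 WAR.
print(round(batter_war_needed(90 / GAMES, 12.5), 1))
```

The midpoint works out to an estimated win percentage of about 0.564, or roughly 91 wins, which sits just above the 90-win target and is consistent with the "roughly speaking" framing.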
So it seems you can "build" around two players (one starter and one position player) if they are both transcendent, MVP-type players. You can even build around a couple of superstars, but regardless, you have to augment these foundations with a solid group of good to All-Star second bananas.
. . .
All statistics courtesy of FanGraphs.