
Applying the Pareto Principle (80-20 Rule) to Baseball

I was checking out the Freakonomics blog and came across a questions-answered section in which David Berri and Martin Schmidt took questions on their newest book, "Stumbling on Wins." They mention that stars are really important for teams to win and state that:

About 80% of wins appear to be produced by 20% of the players. 

The theory behind this statement is the Pareto Principle, or 80-20 rule. It states that 20% of a population does 80% of the work. The principle was developed by Vilfredo Pareto, who noticed that 20% of his pea pods produced 80% of his peas. He went on to show that 20% of the world's population controls 80% of the wealth. The principle has been applied to software design, customer service, health care and criminal behavior.

I decided to see if this was true for major league baseball players. I took all players in 2009 and looked at how much WAR (Rally's WAR database) was created by the top 20% of the players and how much by the bottom 80%. I pooled all players together and did not group them by playing time or by negative WAR.


A total of 1263 players were divided up. The top 20% of all players generated 945 WAR while the bottom 80% generated -89 WAR. Because the negative WAR shrinks the league total to 856, this works out to the top 20% of all baseball players generating 110% of the results (945 / 856). The numbers here aren't ideal, so I adjusted the percentages to look at 15% and 10% of all players. Here is a table of the results:

% of Players    % of WAR
20.0%           110.4%
15.0%            85.3%
10.0%            78.3%

So 15% of all the players last year produced 85% of the total wins, with the other 85% of the players creating 15% of the wins. The Pareto Principle holds up pretty well when it is applied to baseball last year.
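For anyone who wants to repeat the calculation, here is a minimal Python sketch of the share computation. It is not the spreadsheet used for this post: the function name `top_share` and the ten-player toy league are made up for illustration, and only the 945 and -89 WAR totals come from the numbers above.

```python
def top_share(values, fraction):
    """Share of the grand total contributed by the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    n_top = round(len(ranked) * fraction)
    total = sum(ranked)
    if total == 0:
        raise ValueError("total is zero, so the share is undefined")
    return sum(ranked[:n_top]) / total

# Totals quoted in the post: top 20% = 945 WAR, bottom 80% = -89 WAR.
article_share = 945 / (945 + -89)
print(f"{article_share:.1%}")  # 110.4%

# Hypothetical toy league of 10 players (not real data).
toy_war = [6.0, 4.0, 1.0, 0.5, 0.0, 0.0, -0.5, -1.0, -1.0, -1.0]
print(f"{top_share(toy_war, 0.2):.1%}")  # 125.0%
```

The toy league shows the same effect as the real data: negative values shrink the denominator, so the top group's share can exceed 100%.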

[Figure: WAR distribution, Pareto chart]


Baseball fits closely into the model of a store. The store gets most of its profits from a small group of people: the regulars who come in every day and make large purchases. Most other people come in for just a few items once in a while. There is also a set of people who end up costing the store money through lost productivity or theft.

Like the store, baseball has its big stars that generate most of the production (Albert Pujols), players that don't contribute much (AAAA journeymen) and players that actually sap production (Jose Guillen).  The Pareto Principle is as applicable to baseball as it is to many other aspects of our lives.


Comments


What happens if you don't count negative WAR?

Either setting all WAR < 0 to 0 or removing -WAR players from the sample entirely (which would, of course, lead to different results).

"There's never enough time to do all the nothing you want" -Bill Watterson

by nevermoor on Jun 4, 2010 3:40 PM EDT
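The two options in the comment above can be compared with a short Python sketch. Everything here is an assumption for illustration: the helper `top_share` and the ten WAR values are invented, not taken from the 2009 data.

```python
def top_share(values, fraction=0.2):
    """Share of the total produced by the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

# Hypothetical WAR values for a 10-player league (not real data).
war = [5.0, 3.0, 1.0, 1.0, 0.0, 0.0, 0.0, -0.5, -0.5, -1.0]

as_is = top_share(war)                           # negatives shrink the total
clamped = top_share([max(v, 0.0) for v in war])  # every WAR < 0 set to 0
dropped = top_share([v for v in war if v >= 0])  # negative players removed

print(f"as is: {as_is:.0%}, clamped: {clamped:.0%}, dropped: {dropped:.0%}")
# as is: 100%, clamped: 80%, dropped: 50%
```

Dropping players changes the head count as well as the total, so "top 20%" covers fewer players, which is why the two treatments give different answers.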

I like it, but maybe a different stat is better

Pareto works best with counts or other data that must be positive so maybe WAR isn’t the best stat to use. Maybe runs created would be better.

How about doing a split comparing pitchers and hitters?

by Joelestra on Jun 4, 2010 3:52 PM EDT

Or

We could just set the lowest WAR in the league as zero…

by Justin Bopp on Jun 4, 2010 4:03 PM EDT

I would have gone with wRC

Quick off-the-top-of-my-head calculation would have the top 20% contributing ~91% of wRC. The point of the article remains strong, however.

Blogger and Editor, Rational Pastime Blog

by J-Doug on Jun 4, 2010 11:53 PM EDT

What if you rephrase the question as...

What percentage of players does it take to reach 80% of run production?

And in this case, isn’t WAR almost “misguided”? Negative WAR doesn’t mean negative production, it means less than replacement production.

As a result, wouldn’t it make more sense to look at it from a zero baseline?
For example, Jose Guillen has been responsible for say, 20 created runs while a replacement player would have created 25 runs at this point — that results in him being a -.5 WAR player; however, he’s still responsible for producing 20 runs, regardless of whether that’s better or worse than a AAAA-scrub.

by Trickman on Jun 4, 2010 3:59 PM EDT
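The comment above makes two points that can be sketched in Python: how a positive run total turns into negative WAR, and how to ask the question the other way around. This is a rough sketch under stated assumptions. The 10 runs per win conversion is a common rule of thumb that happens to reproduce the -.5 figure, not something the commenter states, and the function names and the five-player runs list are hypothetical.

```python
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not from the comment

def wins_above_replacement(runs_created, replacement_runs, runs_per_win=RUNS_PER_WIN):
    """Wins relative to a replacement player with the same playing time."""
    return (runs_created - replacement_runs) / runs_per_win

def players_needed(values, target=0.8):
    """Smallest fraction of players whose combined total reaches `target`.

    Assumes every value is non-negative (a zero baseline), e.g. runs created.
    """
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    running = 0.0
    for count, value in enumerate(ranked, start=1):
        running += value
        if running >= target * total:
            return count / len(ranked)
    return 1.0

# The Guillen example from the comment: 20 runs created vs. 25 for a replacement.
print(wins_above_replacement(20, 25))  # -0.5

# Hypothetical runs-created totals for five players (not real data).
print(players_needed([100, 90, 60, 30, 20]))  # 0.6
```

With a zero baseline every player adds to the total, so the cumulative share rises steadily and never passes 100%.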

Hmm...I'm getting much different results.

I added 3.9 to every player’s WAR to remove negatives.

Method
Total Players: 1266
Total WAR: 5793.3

20% of 1266: 253

The top 253 players combined WAR: 1830.60

1830.60/5793.30 = 31.6%

by Justin Bopp on Jun 4, 2010 4:16 PM EDT
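Adding a constant to every player leaves the ranking alone but changes the share, and a small Python sketch shows by how much. It only uses the figures quoted in the comment above; the function name `share_after_shift` is made up, and the "raw" sums are back-calculated from those figures rather than taken from the original spreadsheet.

```python
def share_after_shift(top_sum, total, n_top, n_all, shift):
    """Top-group share after adding `shift` to every player's value."""
    return (top_sum + shift * n_top) / (total + shift * n_all)

# Figures from the comment above (already shifted by +3.9 WAR per player).
shifted_top, shifted_total = 1830.60, 5793.30
n_top, n_all, shift = 253, 1266, 3.9

# Undo the shift to recover the raw sums implied by those figures.
raw_top = shifted_top - shift * n_top
raw_total = shifted_total - shift * n_all

raw_share = raw_top / raw_total
shifted_share = share_after_shift(raw_top, raw_total, n_top, n_all, shift)
print(f"raw: {raw_share:.1%}, shifted: {shifted_share:.1%}")
# raw: 98.6%, shifted: 31.6%
```

The shift adds about 4,937 WAR to the league total (1266 x 3.9) but only about 987 to the top group, so the share is pulled toward 20% as the shift grows.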

I deleted a couple erroneous numbers in my spreadsheet

so my totals are ever-so-slightly off, but I’m afraid the result is the same.

In fact, graphing the data out shows a beautiful curve with the inflection point just below center (if leaving the data as is, just below zero, or adjusted as I showed, just below 4).

Not sure this one is going to hold up, Jeff.

by Justin Bopp on Jun 4, 2010 4:32 PM EDT

You can't just add 3.9 WAR to every player.

You end up with almost 5000 extra wins over the season. Those players that are at 0 (cup-of-joes) can't each be getting 4 wins.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Jun 4, 2010 4:56 PM EDT

If you were to take people's net worth to see where the money is concentrated

The people that are negative aren’t given an extra $100K because the lowest person is in the hole that much. Maybe I could see just setting the negatives to zero, but that also messes with the total wins.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Jun 4, 2010 5:08 PM EDT

I see where you are going, but you just can't add 3.9

Guillen or Yuni or Jacobs (some shitty Royal) is the worst player with the -3.9 WAR. If you set him to 0, you just can’t give the guy with 1 AB 3.9 more WAR. He would also be near zero, the baseline set with Guillen. What should be used for an above-0 method is Win Shares.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Jun 4, 2010 6:02 PM EDT

I think

The problem is that you’re counting the negatives against total WAR, which reduces the denominator and makes the percentage of the top much bigger than it should be.

by Justin Bopp on Jun 4, 2010 6:41 PM EDT

That is what happens when using WAR, and the Pareto Principle can handle negative numbers.

Let me re-run the numbers tonight with the 2008 Win Shares numbers, which start all players at 0 and go only up from there.

- .-. ..- … – / – …. . / .—. .-. - .. . … …

by Jeff Zimmerman on Jun 4, 2010 6:44 PM EDT

My point is that adding 3.9 to all numbers shouldn’t change the set of data, nor the percentage of total WAR that the top 20% represents—unless you’re suggesting that those in the positive are ‘making up for’ those in the negative, in which case all WAR would not be created equal. And I don’t think you’re saying that.

by Justin Bopp on Jun 4, 2010 6:58 PM EDT

It's also worth noting

that under your original model those with 0.0 WAR aren’t really being counted at all, which represents a significant population.

by Justin Bopp on Jun 4, 2010 7:00 PM EDT

A Different Approach: Non-Pitchers Only

I reran your study but limited it to position players. I did that so I could use traditional stats rather than the esoteric WAR. And since pitchers aren’t in MLB for their batting prowess I removed their offense as well.

According to fangraphs, 949 players had at least one PA in 2009. 344 of those were pitchers. Take them out and you’re left with 605. 20% of 605 is 121. So I compared the totals for the Top 121 batters (in terms of PA’s) to the MLB non-pitcher totals.

PA’s: The Top 121 had 76,502 PA’s out of 181,051. So 20% of the players got 42.3% of the opportunities to produce offensively. How well did they do?

Hits: They got 19,225 hits in 67,521 AB’s, a .285 BA. All non-pitchers went 42,836 for 160,769, .266. So the Top 20% were indeed more productive than the rest, as you’d expect. But nowhere near 80%. In fact in percentage terms the difference looks insignificant: they had 42% of the AB’s and 44.9% of the Hits. How about extra-base hits?

XBH: The Top 20% had 6,787 XBH out of 14,590 for all non-pitchers, which is 46.5%. So yes, they did better in this category than they did as singles hitters (44.0%). But once again the difference is surprisingly small. How about HR’s?

HR: 2,493/5,015 = 49.7%. So we can say 20% of the batters produced 50% of the HR’s if we round off. (Of course I could get a higher number if I select the Top 20% in terms of HR’s hit instead of PA’s, since some of them were not included in my Top 20%. But I think that’s pushing it.) That’s the highest percentage I found for any of the traditional stats, with one exception: Intentional Walks.

IBB: 616/1,179 = 52.2%.

Almost every stat I looked at fell in the range between 42% and 50%. The only exceptions beside IBB’s were:

Sacrifice Bunts: 27.7%. (You don’t ask your best hitters to bunt very often.)
Strikeouts: 39.1%.
Triples: 41.3%.

What about the bottom line, Runs and RBI? The Top 20% produced 46.1% of R, 46.7% of RBI.

So there you have it. Yes the Top 20% of batters produced more than their share of the offense. But it’s mostly because they got more opportunities. They come to the plate 42% of the time and produce between 42 and 50% of the results. Nowhere near 80%.

by fjm235 on Jun 5, 2010 4:02 PM EDT
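The percentages in the comment above can be reproduced from the quoted counts with a few lines of Python. This is only a sketch: the dictionary layout and variable names are my own, and the counts are copied from the comment rather than pulled from FanGraphs.

```python
# (top-121 total, all non-pitchers total), as quoted in the comment above
counts = {
    "PA": (76_502, 181_051),
    "AB": (67_521, 160_769),
    "H": (19_225, 42_836),
    "XBH": (6_787, 14_590),
    "HR": (2_493, 5_015),
    "IBB": (616, 1_179),
}

# Share of each league total that belongs to the top 20% of batters by PA
shares = {stat: top / total for stat, (top, total) in counts.items()}
for stat, share in shares.items():
    print(f"{stat}: {share:.1%}")

# Output share divided by opportunity share: values near 1.0 mean the top
# group produced roughly in proportion to its plate appearances.
relative = {stat: shares[stat] / shares["PA"] for stat in shares}
print(f"HR relative to PA share: {relative['HR']:.2f}")
```

Dividing by the PA share isolates how much of the edge is skill rather than playing time, which is the point the comment closes on.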

Some additional information on Pareto...

For professional purposes, I use Pareto charts all of the time so I thought I’d share a few observations that are consistent with fjm235’s findings…

- This is most effective for smaller numbers of items. Looking across all players is almost certainly too large of a sample.

- When very large samples are used, the percentage contribution from 20% of the items looked at will rarely approach 80% and is more likely in the range seen above (40-50%).

- Let’s not pretend we’re debunking scientific law – the Pareto principle is simply a phenomenon that may have good practical application (in the quality world it is used to identify which defects occur most frequently so those can be addressed first) but has little theoretical basis other than the known fact that few things are distributed uniformly.

- We will most likely never get good results for baseball, because the sample we are looking at excludes players that do not perform well enough to play in the majors. So there’s a whole population of people that would be batting .050, for example, that would greatly inflate population size without increasing the total number of hits, BB, etc.

- When we say the “top 20%” it always has to be in terms of the metric of interest. Looking at the top 20% of PA and then counting HR is not a good application of the Pareto principle anyway.

One area we would most likely see the principle be effective is salaries…20% of the players are probably making close to 80% of the money.

I like the discussion thus far!

by Joelestra on Jun 7, 2010 10:48 AM EDT
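The point above about ranking by the metric of interest is easy to see with a toy example in Python. The ten players, their plate appearances and their home run totals are all invented for illustration, and `share_of_hr` is a hypothetical helper, not anyone's published method.

```python
# Hypothetical players: (name, plate appearances, home runs). Not real data.
players = [
    ("A", 700, 20), ("B", 650, 10), ("C", 400, 35), ("D", 300, 5), ("E", 250, 2),
    ("F", 200, 8), ("G", 150, 0), ("H", 100, 0), ("I", 50, 0), ("J", 20, 0),
]

def share_of_hr(players, rank_key, fraction=0.2):
    """HR share of the top `fraction` of players, ranked by `rank_key`."""
    ranked = sorted(players, key=rank_key, reverse=True)
    n_top = round(len(ranked) * fraction)
    total_hr = sum(p[2] for p in players)
    return sum(p[2] for p in ranked[:n_top]) / total_hr

by_pa = share_of_hr(players, rank_key=lambda p: p[1])  # top 20% by PA
by_hr = share_of_hr(players, rank_key=lambda p: p[2])  # top 20% by HR

print(f"ranked by PA: {by_pa:.1%}, ranked by HR: {by_hr:.1%}")
# ranked by PA: 37.5%, ranked by HR: 68.8%
```

Ranking by the stat being counted always gives the largest possible share for that stat, so it is the fair way to test an 80-20 claim.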
