Beyond the Box Score: An SB Nation Community

Downloadable Run Distribution Fun!

We usually talk about baseball statistics in terms of gross averages.  But statistics deals with probabilities, not clairvoyance, and so it is always important to think of how values are spread across time or skill level.  This sort of thinking is prevalent in Baseball Prospectus' PECOTA projections and the many iterations of DIPS theory.

The sort of distribution in which I am most interested is run distribution.  I used run distributions to analyze the 2005 AL West race (Part One and Part Two) late last year.  My conclusion was that the way a team distributes its runs scored and allowed can affect its wins and losses, and that the effect shows up as deviations from projected Pythagorean wins.  

I realize that my earlier article reflected my extreme West Coast Media Bias, so I thought it would be fun if everybody could see the run distributions for their favorite teams.  Thankfully, a combination of Retrosheet, Excel, and MATLAB makes generating these plots quite easy.  So I've generated run distribution plots for both runs scored and runs allowed for all major league teams from 1998 - the latest expansion - to 2004. (Whither 2005?  I have to wait until Retrosheet has its 2005 game logs up.)

But that's not all!  On each run distribution plot, I have also included the Weibull distribution curve that theoretically describes run distribution.  For those of you unfamiliar, the Weibull curve shows the expected distribution of run scoring and run prevention.  Deviations from the curve can manifest themselves as deviations from projected Pythagorean wins.  Before I present the run distribution plots, I thought it would be helpful if we go over some Weibull/Pythagorean basics, but feel free to skip over the mathematical junk if you wish.

What the hell is a Weibull Distribution and why should I care?
Steven Miller of Brown University has shown that a three-parameter Weibull distribution describes the run distribution of teams quite well:

$$f(x; \alpha, \beta, \gamma) = \frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1} \exp\left[-\left(\frac{x-\beta}{\alpha}\right)^{\gamma}\right], \qquad x \ge \beta$$

In English, the frequency f with which a team scores (or allows) x runs is equal to a long messy equation with three parameters, α, β, and γ.  The real magic of the Weibull distribution is that it can be used to derive the Pythagorean theorem - and the parameter γ is the same as the exponent in the Pythagorean theorem.
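To make the link to the Pythagorean theorem concrete, here is a minimal Python sketch of the win percentage as I understand it from Miller's derivation, where runs scored and allowed are per-game averages shifted by β.  The function name is mine and purely illustrative; setting beta to 0 recovers the familiar form.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, gamma, beta=-0.5):
    """Expected winning percentage from average runs scored and allowed.

    Assumption: this is the beta-shifted form, (RS - beta)^gamma divided by
    the sum of (RS - beta)^gamma and (RA - beta)^gamma. With beta = 0 it is
    the classic Pythagorean formula.
    """
    scored = (runs_scored - beta) ** gamma
    allowed = (runs_allowed - beta) ** gamma
    return scored / (scored + allowed)


# Hypothetical team: 5.2 runs scored and 4.6 runs allowed per game.
print(round(pythagorean_win_pct(5.2, 4.6, gamma=1.9), 3))
```

The shift matters little for typical scoring levels, since 0.5 is small next to four or five runs per game.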

Where did you get the parameters for the Weibull curve, smartypants?
I used the following parameters to generate the Weibull curves:

β = -0.5. This is a mathematical trick that Professor Miller's paper discusses in some detail.  I won't rehash it here, but you can check the original paper if you are interested.

γ = (Runs/Game)^0.287.  This is the Smyth-Patriot model, and this parameter is calculated for the entire major leagues for each year.  It is probably more correct to calculate γ separately for each team, but I think the gains are marginal.
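As a quick illustration of the exponent calculation, here is a small Python sketch.  It assumes that Runs/Game means the combined runs of both teams per game, which is the usual convention for this model; the function name and the sample value are made up for the example.

```python
def league_exponent(total_runs, total_games, power=0.287):
    """Smyth-Patriot exponent: (runs per game) raised to the 0.287 power.

    Assumption: total_runs counts runs by BOTH teams, so total_runs /
    total_games is the combined scoring level per game.
    """
    return (total_runs / total_games) ** power


# Hypothetical league: 9.6 combined runs per game.
print(round(league_exponent(9.6, 1), 3))
```

In a run environment like that the exponent lands a little under 2, which is why the classic exponent of 2 works about as well as it does.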

α is computed so that the observed average runs scored (or allowed) is equal to the Weibull-determined average.  By taking the first moment of the Weibull distribution, the average μ can be computed as

$$\mu = \alpha\,\Gamma\left(1 + \frac{1}{\gamma}\right) + \beta$$

where Γ is the well-known gamma function.  Thus

$$\alpha = \frac{\mu - \beta}{\Gamma\left(1 + 1/\gamma\right)}$$

and it is calculated separately for both runs scored and runs allowed.  This is not as robust as minimizing the mean-square error, but it sure is quicker.
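The moment-matching step for α can be sketched in a few lines of Python.  This is only an illustration under the parameter choices described above (β = -0.5, γ supplied from the league exponent); the function names are mine and not taken from the original MATLAB code.

```python
import math


def fit_alpha(mean_runs, gamma, beta=-0.5):
    """Solve mean = alpha * Gamma(1 + 1/gamma) + beta for alpha."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)


def weibull_mean(alpha, gamma, beta=-0.5):
    """First moment of the three-parameter Weibull distribution."""
    return alpha * math.gamma(1.0 + 1.0 / gamma) + beta


# Hypothetical team averaging 4.8 runs per game with a league exponent of 1.9.
alpha_rs = fit_alpha(4.8, 1.9)
print(round(alpha_rs, 3), round(weibull_mean(alpha_rs, 1.9), 3))
```

Running the fit once with average runs scored and once with average runs allowed gives the two α values for a team.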

Why didn't you separate National and American Leagues when calculating γ?
I didn't see a need to.  If you can come up with a convincing explanation as to why I should, please let me know.

How do I read the plots?
Here's a sample:

Each plot shows one team's distributions for runs scored (top) and runs allowed (bottom).  The x-axis is runs and the y-axis is frequency.  The open circles represent actual data and the line represents the Weibull curve.  Each season comes in its own downloadable zipped file, which includes distributions for all 30 teams as well as a league-wide distribution named Dist_ML_XX.bmp, where XX is the last two digits of the year.
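For anyone who would rather reproduce the circles and the curve than squint at a bitmap, here is a rough Python sketch of the comparison.  It assumes the Weibull frequency for k runs is the probability mass in the bin from k - 0.5 to k + 0.5, which is my reading of why β = -0.5 is used; the game log is invented and the function names are hypothetical.

```python
import math
from collections import Counter


def observed_frequencies(runs_per_game, max_runs):
    """Fraction of games in which exactly k runs were scored, k = 0..max_runs."""
    counts = Counter(runs_per_game)
    n = len(runs_per_game)
    return [counts.get(k, 0) / n for k in range(max_runs + 1)]


def weibull_frequencies(mean_runs, gamma, max_runs, beta=-0.5):
    """Weibull probability mass in each bin [k - 0.5, k + 0.5)."""
    alpha = (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)

    def survival(x):
        # P(X > x) for the three-parameter Weibull; equals 1 at or below beta.
        return math.exp(-(((x - beta) / alpha) ** gamma)) if x > beta else 1.0

    return [survival(k - 0.5) - survival(k + 0.5) for k in range(max_runs + 1)]


# Made-up game log (NOT real data), one entry per game.
sample_runs = [3, 5, 0, 7, 4, 2, 6, 4, 1, 9, 3, 5]
mean_runs = sum(sample_runs) / len(sample_runs)
observed = observed_frequencies(sample_runs, max(sample_runs))
expected = weibull_frequencies(mean_runs, 1.9, max(sample_runs))

for k, (obs, exp_) in enumerate(zip(observed, expected)):
    print(f"{k:2d} runs: observed {obs:.3f}, Weibull {exp_:.3f}, diff {obs - exp_:+.3f}")
```

The diff column is the numerical version of the gap between a circle and the line on the plot.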

What does it all mean?
I don't know.  Maybe nothing.  But it is known that deviations from the curve can affect a team's record differently depending on where along the curve they occur.  For example, a team that scores 7+ runs more often than Weibull predicts but scores 2-5 runs less often isn't doing itself any favors, as there is a decreasing marginal utility to additional runs.  I discussed some consequences in my above-linked 2005 AL West article; let me know if you think of some more.
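The decreasing marginal utility of runs can be illustrated with a toy calculation.  The sketch below is my own simplification, not the method behind the reports: it assumes the opponent's scoring follows a binned Weibull with β = -0.5, that runs scored and allowed are independent, and that tied games are split evenly.  The α and γ values are hypothetical.

```python
import math


def prob_runs(k, alpha, gamma):
    """Binned Weibull probability of exactly k runs, with beta fixed at -0.5."""
    return math.exp(-((k / alpha) ** gamma)) - math.exp(-(((k + 1) / alpha) ** gamma))


def win_prob_given_runs(k, alpha_opp, gamma):
    """Chance of winning when scoring exactly k runs (ties counted as half a win)."""
    p_opp_fewer = sum(prob_runs(j, alpha_opp, gamma) for j in range(k))
    p_tie = prob_runs(k, alpha_opp, gamma)
    return p_opp_fewer + 0.5 * p_tie


# Hypothetical opponent: alpha = 5.9, gamma = 1.9.
previous = win_prob_given_runs(0, 5.9, 1.9)
for k in range(1, 11):
    current = win_prob_given_runs(k, 5.9, 1.9)
    print(f"run #{k:2d} adds {current - previous:.3f} to win probability")
    previous = current
```

Under these assumptions the value of each extra run peaks somewhere around the fourth run and shrinks after that, so the tenth run buys far less than the fourth.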

Is this available numerically?
Yes.  There are links below to Run Distribution Reports for each year that show the league exponent (γ, displayed in the reports as "g"), aggregate win frequency by runs scored, and Major League averages for α (displayed as "a_RS_league").  For each team, the actual run distribution and Weibull distribution are shown, as well as α for runs scored and allowed (shown as "a_RS" and "a_RA," respectively).

You're my favorite BtB author, and you are also very attractive.  Where have you been?
You'll thank me when you see all the good that polymer brushes grafted to semiconductors do for mankind.

Who's that funny-looking dude?
Welcome to Beyond the Boxscore.

What's that noise?
Welcome to Beyond the Boxscore.

Where's that awful smell coming from?
Welcome to Beyond the Boxscore.

Are you going to let me see the damn things?
Sheesh.  Pushy, pushy.  Here they are:

Run Distribution Report 1998 (best viewed in your browser)
Run Distribution Plots 1998 (right-click to save zipped file)

Run Distribution Report 1999
Run Distribution Plots 1999

Run Distribution Report 2000
Run Distribution Plots 2000

Run Distribution Report 2001
Run Distribution Plots 2001

Run Distribution Report 2002
Run Distribution Plots 2002

Run Distribution Report 2003
Run Distribution Plots 2003

Run Distribution Report 2004
Run Distribution Plots 2004

Comments on the plots, both in substance and style, are welcome, as are suggestions for how to get rid of this damn athlete's foot.


Update [2006-2-23 20:26:1 by salb918]: All the plots are in .bmp format, so they should be viewable by just about anybody.

Comments

Athlete's foot
You need BOOM! Tough actin' Tinactin.

Oh, and this was a good article... you simplified it enough so that the history major here (me) could understand most of it.

by Dan Scotto on Feb 23, 2006 4:55 PM EST

I can hear
John Madden's voice in my head as we speak.

by salb918 on Feb 23, 2006 4:56 PM EST
