Beyond the Box Score: An SB Nation Community

Downloadable Run Distribution Fun!

We usually talk about baseball statistics in terms of gross averages.  But statistics deals with probabilities, not clairvoyance, and so it is always important to think of how values are spread across time or skill level.  This sort of thinking is prevalent in Baseball Prospectus' PECOTA projections and the many iterations of DIPS theory.

The sort of distribution in which I am most interested is run distribution.  I used run distributions to analyze the 2005 AL West race (Part One and Part Two) late last year.  My conclusion was that the way a team distributes its runs scored and allowed can affect its wins and losses, and that the effect shows up as deviations from its projected Pythagorean wins.

I realize that my earlier article reflected my extreme West Coast Media Bias, so I thought it would be fun if everybody could see the run distributions for their favorite teams.  Thankfully, a combination of Retrosheet, Excel, and MATLAB makes generating these plots quite easy.  So I've generated run distribution plots for both runs scored and runs allowed for all major league teams from 1998 - the latest expansion - to 2004. (Whither 2005?  I have to wait until Retrosheet has its 2005 game logs up.)

But that's not all!  On each run distribution plot, I have also included the Weibull distribution curve that theoretically describes run distribution.  For those of you unfamiliar with it, the Weibull curve shows the expected distribution of run scoring and run prevention.  Deviations from the curve can manifest themselves as deviations from projected Pythagorean wins.  Before I present the run distribution plots, I thought it would be helpful if we go over some Weibull/Pythagorean basics, but feel free to skip over the mathematical junk if you wish.

What the hell is a Weibull Distribution and why should I care?
Steven Miller of Brown University has shown that a three-parameter Weibull distribution describes the run distribution of teams quite well:

f(x; α, β, γ) = (γ/α) ((x - β)/α)^(γ - 1) exp(-((x - β)/α)^γ) for x ≥ β, and 0 otherwise

In English, the frequency f with which a team scores (or allows) x runs is equal to a long messy equation with three parameters, α, β, and γ.  The real magic of the Weibull distribution is that it can be used to derive the Pythagorean theorem - and the parameter γ is the same as the exponent in the Pythagorean theorem.
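
For readers who would rather poke at the curve than read about it, here is a minimal Python sketch of the standard three-parameter Weibull density, f(x) = (γ/α) ((x - β)/α)^(γ - 1) exp(-((x - β)/α)^γ) for x above β.  The function name and the sample parameter values are made up for illustration; the plots in this article were generated with MATLAB, not with this code.

```python
import math

def weibull_pdf(x, alpha, beta, gamma):
    """Standard three-parameter Weibull density.

    alpha is the scale, beta the shift (location), gamma the shape.
    The density is zero at or below the shift beta; treating the single
    point x == beta as zero is harmless for gamma > 1.
    """
    if x <= beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))

# Hypothetical parameters, chosen only to show the call.
density_at_four_runs = weibull_pdf(4.0, alpha=5.0, beta=-0.5, gamma=1.8)
```

The density is continuous while runs come in whole numbers, so a single value of the curve is a relative height and not yet the probability of scoring exactly that many runs.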

Where did you get the parameters for the Weibull curve, smartypants?
I used the following parameters to generate the Weibull curves:

β = -0.5. This is a mathematical trick that Professor Miller's paper discusses in some detail.  I won't rehash it here, but you can check the original paper if you are interested.

γ = (Runs/Game)^0.287.  This is the Smyth-Patriot model, and this parameter is calculated for the entire major leagues for each year.  It is probably more correct to calculate γ separately for each team, but I think the gains are marginal.
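
As a rough illustration of how γ feeds into Pythagorean wins, the sketch below computes the exponent from a runs-per-game figure and plugs it into the familiar RS^γ / (RS^γ + RA^γ) form.  The function names and the sample numbers are hypothetical.  Whether "Runs/Game" should count both teams' runs in a game is an assumption here (that is the usual reading of this model), so check it against the reports before trusting any output.

```python
def league_exponent(runs_per_game):
    """Exponent from the (Runs/Game)^0.287 rule.

    Assumption: runs_per_game is the league-wide total scored by both
    teams in an average game.
    """
    return runs_per_game ** 0.287

def pythagorean_win_pct(runs_scored, runs_allowed, exponent):
    """Projected winning percentage: RS^g / (RS^g + RA^g)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Made-up inputs, not taken from any Run Distribution Report.
g = league_exponent(9.6)
projected = pythagorean_win_pct(800, 700, g)
```

A team that scores exactly as many runs as it allows projects to .500 regardless of the exponent, which makes for a quick sanity check.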

α is computed so that the observed average runs scored (or allowed) is equal to the Weibull-determined average.  By taking the first moment of the Weibull distribution, the average μ can be computed as

μ = α Γ(1 + 1/γ) + β

where Γ is the well-known gamma function.  Thus

α = (μ - β) / Γ(1 + 1/γ)

and it is calculated separately for both runs scored and runs allowed.  This is not as robust as minimizing the mean-square error, but it sure is quicker.
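
Matching the means comes down to solving μ = α Γ(1 + 1/γ) + β for α.  A minimal Python sketch follows, using the standard library's gamma function; the function names and the sample inputs are illustrative, not taken from the reports.

```python
import math

def weibull_alpha(mean_runs, gamma, beta=-0.5):
    """Scale alpha that makes the Weibull mean equal the observed mean."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)

def weibull_mean(alpha, gamma, beta=-0.5):
    """Mean of the three-parameter Weibull: alpha * Gamma(1 + 1/gamma) + beta."""
    return alpha * math.gamma(1.0 + 1.0 / gamma) + beta

# Hypothetical team averaging 4.8 runs per game in a league with gamma = 1.9.
a_rs = weibull_alpha(4.8, 1.9)
recovered_mean = weibull_mean(a_rs, 1.9)  # round-trips back to the input mean
```

The same call with a team's average runs allowed gives the second α.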

Why didn't you separate National and American Leagues when calculating γ?
I didn't see a need to.  If you can come up with a convincing explanation as to why I should, please let me know.

How do I read the plots?
Here's a sample:

Each plot shows one team's distributions for runs scored (top) and runs allowed (bottom).  The x-axis is runs and the y-axis is frequency.  The open circles represent actual data and the line represents the Weibull curve.  Each season comes in its own zipped file, which you may download.  Each file includes distributions for all 30 teams as well as a league-wide distribution named Dist_ML_XX.bmp, where XX is the last two digits of the year.

What does it all mean?
I don't know.  Maybe nothing.  But it is known that deviations from the curve can impact a team's record differently depending on where along the curve they occur.  For example, a team that scores 7+ runs more often than Weibull predicts but 2-5 runs less often isn't doing itself any favors, as there is a decreasing marginal utility to additional runs.  I discussed some consequences in my above-linked 2005 AL West article; let me know if you think of some more.
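
To put numbers on "more often than Weibull predicted," one option is to compare observed frequencies with the probability the curve assigns to each whole-number run total.  The sketch below assumes that the probability of exactly k runs is the Weibull mass between k - 0.5 and k + 0.5, which is one reading of the β = -0.5 trick and not necessarily how the plotted curves were evaluated.  All names and the sample game log are hypothetical.

```python
import math
from collections import Counter

def weibull_cdf(x, alpha, beta, gamma):
    """Cumulative three-parameter Weibull distribution."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))

def expected_frequencies(alpha, gamma, max_runs, beta=-0.5):
    """Assumed probability of exactly k runs, for k = 0..max_runs.

    Each whole number k gets the Weibull mass on [k - 0.5, k + 0.5].
    """
    return [
        weibull_cdf(k + 0.5, alpha, beta, gamma)
        - weibull_cdf(k - 0.5, alpha, beta, gamma)
        for k in range(max_runs + 1)
    ]

def frequency_deviations(game_runs, alpha, gamma, beta=-0.5):
    """Observed minus expected frequency for each run total seen so far."""
    counts = Counter(game_runs)
    games = len(game_runs)
    max_runs = max(game_runs)
    expected = expected_frequencies(alpha, gamma, max_runs, beta)
    return {k: counts.get(k, 0) / games - expected[k] for k in range(max_runs + 1)}

# Made-up six-game log and parameters, purely to show the call.
deviations = frequency_deviations([3, 4, 4, 7, 0, 2], alpha=5.0, gamma=1.8)
```

Positive values mark run totals that showed up more often than the curve expects and negative values mark totals that showed up less often.  A real comparison would use a full season of game logs.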

Is this available numerically?
Yes.  There are links below to Run Distribution Reports for each year that show the league exponent (γ, displayed in the reports as "g"), aggregate win frequency by runs scored, and Major League averages for α (displayed as "a_RS_league").  For each team, the actual run distribution and Weibull distribution are shown, as well as α for runs scored and allowed (shown as "a_RS" and "a_RA," respectively).

You're my favorite BtB author, and you are also very attractive.  Where have you been?
You'll thank me when you see all the good that polymer brushes grafted to semiconductors do for mankind.

Who's that funny-looking dude?
Welcome to Beyond the Boxscore.

What's that noise?
Welcome to Beyond the Boxscore.

Where's that awful smell coming from?
Welcome to Beyond the Boxscore.

Are you going to let me see the damn things?
Sheesh.  Pushy, pushy.  Here they are:

Run Distribution Report 1998 (best viewed in your browser)
Run Distribution Plots 1998 (right-click to save zipped file)

Run Distribution Report 1999
Run Distribution Plots 1999

Run Distribution Report 2000
Run Distribution Plots 2000

Run Distribution Report 2001
Run Distribution Plots 2001

Run Distribution Report 2002
Run Distribution Plots 2002

Run Distribution Report 2003
Run Distribution Plots 2003

Run Distribution Report 2004
Run Distribution Plots 2004

Comments on the plots, both in substance and style, are welcome, as are suggestions for how to get rid of this damn athlete's foot.


Update [2006-2-23 20:26:1 by salb918]: All the plots are in .bmp format, so they should be viewable by just about anybody.

Comments

Athlete's foot
You need BOOM! Tough actin' Tinactin.

Oh, and this was a good article... you simplified it enough so that the history major here (me) could understand most of it.

by Dan Scotto on Feb 23, 2006 4:55 PM EST

I can hear
John Madden's voice in my head as we speak.

by salb918 on Feb 23, 2006 4:56 PM EST
