Beyond the Box Score: An SB Nation Community


Downloadable Run Distribution Fun!

We usually talk about baseball statistics in terms of gross averages.  But statistics deals with probabilities, not clairvoyance, and so it is always important to think of how values are spread across time or skill level.  This sort of thinking is prevalent in Baseball Prospectus' PECOTA projections and the many iterations of DIPS theory.

The sort of distribution in which I am most interested is run distribution.  I used run distributions to analyze the 2005 AL West race (Part One and Part Two) late last year.  My conclusion was that the way a team distributes its runs scored and allowed can affect its wins and losses, and that the effect shows up as deviations from projected Pythagorean wins.

I realize that my earlier article reflected my extreme West Coast Media Bias, so I thought it would be fun if everybody could see the run distributions for their favorite teams.  Thankfully, a combination of Retrosheet, Excel, and MATLAB makes generating these plots quite easy.  So I've generated run distribution plots for both runs scored and runs allowed for all major league teams from 1998 - the latest expansion - to 2004. (Whither 2005?  I have to wait until Retrosheet has its 2005 game logs up.)

But that's not all!  On each run distribution plot, I have also included the Weibull distribution curve that theoretically describes run distribution.  For those of you unfamiliar, the Weibull curve shows the expected distribution of run scoring and run prevention.  Deviations from the curve can manifest themselves as deviations from projected Pythagorean wins.  Before I present the run distribution plots, I thought it would be helpful if we go over some Weibull/Pythagorean basics, but feel free to skip over the mathematical junk if you wish.

What the hell is a Weibull Distribution and why should I care?
Steven Miller of Brown University has shown that a three-parameter Weibull distribution describes the run distribution of teams quite well:

f(x; α, β, γ) = (γ/α) ((x - β)/α)^(γ - 1) exp(-((x - β)/α)^γ) for x ≥ β, and 0 otherwise.

In English, the frequency f with which a team scores (or allows) x runs is equal to a long messy equation with three parameters, α, β, and γ.  The real magic of the Weibull distribution is that it can be used to derive the Pythagorean theorem - and the parameter γ is the same as the exponent in the Pythagorean theorem.
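For anyone who would rather read code than Greek letters, here is a minimal Python sketch of the standard three-parameter Weibull density and its cumulative form.  The function names and the parameter values at the bottom are my own inventions for illustration, and treating "exactly k runs" as the bin from k - 0.5 to k + 0.5 is an assumption about how the continuous curve is matched to whole-number run totals (it lines up with the one-half shift discussed in the next answer).

```python
import math

def weibull_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density (assumes gamma > 1); zero at or below beta."""
    if x <= beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))

def weibull_cdf(x, alpha, beta, gamma):
    """Probability that the continuous Weibull variable is at most x."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))

def prob_runs(k, alpha, beta, gamma):
    """Mass assigned to exactly k runs: the bin from k - 0.5 to k + 0.5 (an assumption)."""
    upper = weibull_cdf(k + 0.5, alpha, beta, gamma)
    lower = weibull_cdf(k - 0.5, alpha, beta, gamma)
    return upper - lower

# Made-up parameter values, chosen only to exercise the functions.
ALPHA, BETA, GAMMA = 5.8, -0.5, 1.8
probs = [prob_runs(k, ALPHA, BETA, GAMMA) for k in range(40)]
print(sum(probs))  # should come out very close to 1
```

Because the bins tile the whole range from -0.5 upward, the binned probabilities add up to essentially one, which is a handy sanity check on any set of parameters.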

Where did you get the parameters for the Weibull curve, smartypants?
I used the following parameters to generate the Weibull curves:

β = -0.5. This is a mathematical trick that Professor Miller's paper discusses in some detail.  I won't rehash it here, but you can check the original paper if you are interested.

γ = (Runs/Game)^.287.  This is the Smyth-Patriot model, and this parameter is calculated for the entire major leagues for each year.  It is probably more correct to calculate γ separately for each team, but I think the gains are marginal.

α is computed so that the observed average runs scored (or allowed) is equal to the Weibull-determined average.  By taking the first moment of the Weibull distribution, the average μ can be computed as

μ = β + α Γ(1 + 1/γ)

where Γ is the well-known gamma function.  Thus

α = (μ - β) / Γ(1 + 1/γ)

and it is calculated separately for both runs scored and runs allowed.  This is not as robust as minimizing the mean-square error, but it sure is quicker.
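The three parameter choices above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names are mine, the input numbers are invented rather than taken from any real season, and I am assuming that Runs/Game counts both teams' runs in a game, which is how the Smyth-Patriot formula is usually stated.  The α step uses the standard mean of a shifted Weibull, β + α Γ(1 + 1/γ), solved for α.

```python
import math

BETA = -0.5  # the fixed shift used in the article

def smyth_patriot_exponent(runs_per_game, power=0.287):
    """gamma = (Runs/Game) ** 0.287, as stated in the article."""
    return runs_per_game ** power

def weibull_mean(alpha, gamma, beta=BETA):
    """First moment of the shifted Weibull: beta + alpha * Gamma(1 + 1/gamma)."""
    return beta + alpha * math.gamma(1.0 + 1.0 / gamma)

def weibull_alpha(mean_runs, gamma, beta=BETA):
    """alpha chosen so that the Weibull mean equals the observed mean."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)

# Hypothetical inputs, not taken from any real season.
league_rpg = 9.6        # assumed to count both teams' runs per game
team_rs_per_game = 5.1  # invented runs scored per game
team_ra_per_game = 4.4  # invented runs allowed per game

g = smyth_patriot_exponent(league_rpg)
a_rs = weibull_alpha(team_rs_per_game, g)
a_ra = weibull_alpha(team_ra_per_game, g)
print(g, a_rs, a_ra)
```

Feeding a_rs back through weibull_mean returns the observed average, which is all that "matching the first moment" means.  A least-squares fit would instead search over α to minimize the gap between the curve and the observed frequencies.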

Why didn't you separate National and American Leagues when calculating γ?
I didn't see a need to.  If you can come up with a convincing explanation as to why I should, please let me know.

How do I read the plots?
Here's a sample:

Each plot shows one team's distributions for runs scored (top) and runs allowed (bottom).  The x-axis is runs and the y-axis is frequency.  The open circles represent actual data and the line represents the Weibull curve.  Each season comes in its own zipped file, which you may download.  Each file includes distributions for all 30 teams as well as a league-wide distribution named Dist_ML_XX.bmp, where XX is the last two digits of the year.
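The open circles are nothing fancier than the share of games in which a team scored (or allowed) each run total.  Here is a rough Python sketch of that tally; the function name is hypothetical and the list of scores is made up, standing in for a column of per-game runs pulled from a game log.

```python
from collections import Counter

def run_frequencies(runs_per_game):
    """Map each run total from 0 to the observed maximum to its share of games."""
    counts = Counter(runs_per_game)
    n_games = len(runs_per_game)
    # Counter returns 0 for run totals that never occurred.
    return {r: counts[r] / n_games for r in range(max(runs_per_game) + 1)}

# Made-up scores for an eight-game stretch, for illustration only.
sample = [3, 0, 5, 3, 7, 2, 3, 5]
freqs = run_frequencies(sample)
for runs, share in freqs.items():
    print(runs, share)
```

Run totals that never occurred get a frequency of zero rather than being dropped, so the points line up with the curve at every whole number of runs.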

What does it all mean?
I don't know.  Maybe nothing.  But it is known that deviations from the curve can affect a team's record differently depending on where along the curve they occur.  For example, a team that scores 7+ runs more often than Weibull predicts but scores 2-5 runs less often isn't doing itself any favors, as there is a decreasing marginal utility to additional runs.  I discussed some consequences in my above-linked 2005 AL West article; let me know if you think of some more.
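One rough way to see the decreasing marginal utility is to ask how often an opponent with a Weibull run distribution scores fewer than k runs, and how much that chance improves with each extra run.  The Python sketch below is a simplification and not the method behind the reports: it ignores ties, assumes the opponent's scoring is independent of yours, uses the same one-run bins centered on whole numbers, and the opponent's parameters are invented.

```python
import math

def prob_opponent_scores_fewer(k, alpha, gamma, beta=-0.5):
    """Weibull mass on run totals 0..k-1, i.e. the CDF evaluated at k - 0.5."""
    x = k - 0.5
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))

# Hypothetical opponent: parameters invented for illustration.
ALPHA, GAMMA = 5.8, 1.8

def gain_from_extra_run(k):
    """Increase in the chance of outscoring the opponent when going from k to k + 1 runs."""
    return (prob_opponent_scores_fewer(k + 1, ALPHA, GAMMA)
            - prob_opponent_scores_fewer(k, ALPHA, GAMMA))

gain_3_to_4 = gain_from_extra_run(3)
gain_7_to_8 = gain_from_extra_run(7)
print(gain_3_to_4, gain_7_to_8)
```

With these made-up parameters the fourth run buys noticeably more than the eighth, which is the sense in which piling up 7+ run games at the expense of 2-5 run games is a poor trade.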

Is this available numerically?
Yes.  There are links below to Run Distribution Reports for each year that show the league exponent (γ, displayed in the reports as "g"), aggregate win frequency by runs scored, and Major League averages for α (displayed as "a_RS_league").  For each team, the actual run distribution and Weibull distribution are shown, as well as α for runs scored and allowed (shown as "a_RS" and "a_RA," respectively).

You're my favorite BtB author, and you are also very attractive.  Where have you been?
You'll thank me when you see all the good that polymer brushes grafted to semiconductors do for mankind.

Who's that funny-looking dude?
Welcome to Beyond the Boxscore.

What's that noise?
Welcome to Beyond the Boxscore.

Where's that awful smell coming from?
Welcome to Beyond the Boxscore.

Are you going to let me see the damn things?
Sheesh.  Pushy, pushy.  Here they are:

Run Distribution Report 1998 (best viewed in your browser)
Run Distribution Plots 1998 (right-click to save zipped file)

Run Distribution Report 1999
Run Distribution Plots 1999

Run Distribution Report 2000
Run Distribution Plots 2000

Run Distribution Report 2001
Run Distribution Plots 2001

Run Distribution Report 2002
Run Distribution Plots 2002

Run Distribution Report 2003
Run Distribution Plots 2003

Run Distribution Report 2004
Run Distribution Plots 2004

Comments on the plots, both in substance and style, are welcome, as are suggestions for how to get rid of this damn athlete's foot.


Update [2006-2-23 20:26:1 by salb918]: All the plots are in .bmp format, so they should be viewable by just about anybody.


Comments


Athlete's foot
You need BOOM! Tough actin' Tinactin.

Oh, and this was a good article... you simplified it enough so that the history major here (me) could understand most of it.

by Dan Scotto on Feb 23, 2006 4:55 PM EST

I can hear
John Madden's voice in my head as we speak.

by salb918 on Feb 23, 2006 4:56 PM EST
