Beyond the Box Score: An SB Nation Community

Downloadable Run Distribution Fun!

We usually talk about baseball statistics in terms of gross averages.  But statistics deals with probabilities, not clairvoyance, and so it is always important to think of how values are spread across time or skill level.  This sort of thinking is prevalent in Baseball Prospectus' PECOTA projections and the many iterations of DIPS theory.

The sort of distribution in which I am most interested is run distribution.  I used run distributions to analyze the 2005 AL West race (Part One and Part Two) late last year.  My conclusion was that the way a team distributes its runs scored and allowed can affect its wins and losses, and that the effect shows up as deviations from projected Pythagorean wins.  

I realize that my earlier article reflected my extreme West Coast Media Bias, so I thought it would be fun if everybody could see the run distributions for their favorite teams.  Thankfully, a combination of Retrosheet, Excel, and MATLAB makes generating these plots quite easy.  So I've generated run distribution plots for both runs scored and runs allowed for all major league teams from 1998 - the latest expansion - to 2004. (Whither 2005?  I have to wait until Retrosheet has its 2005 game logs up.)

But that's not all!  On each run distribution plot, I have also included the Weibull distribution curve that theoretically describes run distribution.  For those of you unfamiliar, the Weibull curve shows the expected distribution of run scoring and run prevention.  Deviations from the curve can manifest themselves as deviations from projected Pythagorean wins.  Before I present the run distribution plots, I thought it would be helpful if we go over some Weibull/Pythagorean basics, but feel free to skip over the mathematical junk if you wish.

What the hell is a Weibull Distribution and why should I care?
Steven Miller of Brown University has shown that a three-parameter Weibull distribution describes the run distribution of teams quite well:

    f(x; α, β, γ) = (γ/α) ((x - β)/α)^(γ - 1) exp(-((x - β)/α)^γ)   for x ≥ β, and 0 otherwise

In English, the frequency f with which a team scores (or allows) x runs is equal to a long messy equation with three parameters, α, β, and γ.  The real magic of the Weibull distribution is that it can be used to derive the Pythagorean theorem - and the parameter γ is the same as the exponent in the Pythagorean theorem.
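
For anybody who would rather read code than symbols, here is a minimal Python sketch of the standard three-parameter Weibull density and its cumulative form.  The function names are my own, not Professor Miller's, and the parameter values in the usage lines are made up for illustration rather than fitted to any team.

```python
import math

def weibull_pdf(x, alpha, beta, gamma):
    """Standard three-parameter Weibull density (scale alpha, shift beta, shape gamma)."""
    if x <= beta:
        # The density is zero below the shift; returning 0 at the boundary too
        # avoids evaluating 0 ** (negative number) when gamma < 1.
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))

def weibull_cdf(x, alpha, beta, gamma):
    """Probability that the Weibull variable is at most x."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))

# Illustrative values only (not taken from the reports).
ALPHA, BETA, GAMMA = 6.0, -0.5, 1.8
density_at_4 = weibull_pdf(4.0, ALPHA, BETA, GAMMA)
share_up_to_4 = weibull_cdf(4.0, ALPHA, BETA, GAMMA)
```

The cumulative form is the handy one in practice, since runs come in whole numbers and the curve has to be chopped into bins before it can be compared with actual game counts.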

Where did you get the parameters for the Weibull curve, smartypants?
I used the following parameters to generate the Weibull curves:

β = -0.5. This is a mathematical trick that Professor Miller's paper discusses in some detail.  I won't rehash it here, but you can check the original paper if you are interested.

γ = (Runs/Game)^.287.  This is the Smyth-Patriot model, and this parameter is calculated for the entire major leagues for each year.  It is probably more correct to calculate γ separately for each team, but I think the gains are marginal.
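
As a quick sketch of that calculation in Python: the function name and the 9.6 runs-per-game figure below are hypothetical.  The Smyth-Patriot formula is usually stated with runs per game counting both teams' runs, and I am assuming that convention here, so the sketch simply takes the runs-per-game figure as an input.

```python
def smyth_patriot_exponent(runs_per_game, power=0.287):
    """Exponent gamma = (runs per game) ** 0.287."""
    return runs_per_game ** power

# Hypothetical league environment: 9.6 total runs per game (both teams combined).
league_g = smyth_patriot_exponent(9.6)
```

Because the power is small, γ moves slowly: big swings in the run environment only nudge the exponent.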

α is computed so that the observed average runs scored (or allowed) is equal to the Weibull-determined average.  By taking the first moment of the Weibull distribution, the average μ can be computed as

    μ = α Γ(1 + 1/γ) + β

where Γ is the well-known gamma function.  Thus

    α = (μ - β) / Γ(1 + 1/γ)

and it is calculated separately for both runs scored and runs allowed.  This is not as robust as minimizing the mean-square error, but it sure is quicker.
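
Here is a minimal Python sketch of the moment-matching step, plus the binned Weibull frequencies it leads to.  It assumes β = -0.5 as above and that the frequency for k runs is the curve's probability between k - 0.5 and k + 0.5; the 4.8 runs per game and the exponent 1.88 are hypothetical numbers, and the function names are mine.

```python
import math

BETA = -0.5  # shift used in the article

def alpha_from_mean(mean_runs, gamma, beta=BETA):
    """Solve mean = alpha * Gamma(1 + 1/gamma) + beta for alpha."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)

def prob_runs(k, alpha, gamma, beta=BETA):
    """Weibull probability mass in the bin [k - 0.5, k + 0.5) around k runs."""
    def cdf(x):
        if x <= beta:
            return 0.0
        return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))
    return cdf(k + 0.5) - cdf(k - 0.5)

# Hypothetical team: 4.8 runs per game, league exponent 1.88.
alpha_rs = alpha_from_mean(4.8, 1.88)
expected_freq = [prob_runs(k, alpha_rs, 1.88) for k in range(31)]
```

Multiplying each entry of `expected_freq` by the number of games played gives expected game counts that can be set next to the actual counts.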

Why didn't you separate National and American Leagues when calculating γ?
I didn't see a need to.  If you can come up with a convincing explanation as to why I should, please let me know.

How do I read the plots?
Here's a sample:

Each plot shows one team's distributions for runs scored (top) and runs allowed (bottom).  The x-axis is runs and the y-axis is frequency.  The open circles represent actual data and the line represents the Weibull curve.  Each season comes in its own downloadable zipped file, which includes distributions for all 30 teams as well as a league-wide distribution named Dist_ML_XX.bmp, where XX is the last two digits of the year.

What does it all mean?
I don't know.  Maybe nothing.  But it is known that deviations from the curve can impact a team's record differently depending on where along the curve they occur.  For example, a team that scores 7+ runs more often than Weibull predicts, but scores 2-5 runs less often, isn't doing itself any favors, as there is a decreasing marginal utility to additional runs.  I discussed some consequences in my above-linked 2005 AL West article; let me know if you think of some more.
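
To put rough numbers on the marginal utility idea, here is a small Python sketch.  It leans on simplifications that are mine, not part of the reports: the opponent's scoring follows the binned Weibull curve, it is independent of how many runs you score, and a tie counts as a non-win.  The opponent parameters are hypothetical.

```python
import math

def win_prob_given_runs(x, alpha_opp, gamma, beta=-0.5):
    """P(opponent scores fewer than x runs) under a binned Weibull model."""
    # Opponent totals 0 .. x-1 occupy the bins that end at x - 0.5.
    upper = x - 0.5
    if upper <= beta:
        return 0.0
    return 1.0 - math.exp(-(((upper - beta) / alpha_opp) ** gamma))

ALPHA_OPP, GAMMA = 5.97, 1.88  # hypothetical opponent parameters

# Win probability when scoring exactly 0, 1, ..., 12 runs.
win_prob = [win_prob_given_runs(x, ALPHA_OPP, GAMMA) for x in range(13)]
# marginal[i] is the gain in win probability from the (i + 1)-th run.
marginal = [win_prob[x] - win_prob[x - 1] for x in range(1, 13)]
```

Under these assumptions the gain per extra run rises for the first few runs, tops out in the middle of the range, and shrinks after that, so the eighth run buys less than the fourth.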

Is this available numerically?
Yes.  There are links below to Run Distribution Reports for each year that show the league exponent (γ, displayed in the reports as "g"), aggregate win frequency by runs scored, and Major League averages for α (displayed as "a_RS_league").  For each team, the actual run distribution and Weibull distribution are shown, as well as α for runs scored and allowed (shown as "a_RS" and "a_RA," respectively).

You're my favorite BtB author, and you are also very attractive.  Where have you been?
You'll thank me when you see all the good that polymer brushes grafted to semiconductors do for mankind.

Who's that funny-looking dude?
Welcome to Beyond the Boxscore.

What's that noise?
Welcome to Beyond the Boxscore.

Where's that awful smell coming from?
Welcome to Beyond the Boxscore.

Are you going to let me see the damn things?
Sheesh.  Pushy, pushy.  Here they are:

Run Distribution Report 1998 (best viewed in your browser)
Run Distribution Plots 1998 (right-click to save zipped file)

Run Distribution Report 1999
Run Distribution Plots 1999

Run Distribution Report 2000
Run Distribution Plots 2000

Run Distribution Report 2001
Run Distribution Plots 2001

Run Distribution Report 2002
Run Distribution Plots 2002

Run Distribution Report 2003
Run Distribution Plots 2003

Run Distribution Report 2004
Run Distribution Plots 2004

Comments on the plots, both in substance and style, are welcome, as are suggestions for how to get rid of this damn athlete's foot.


Update [2006-2-23 20:26:1 by salb918]: All the plots are in .bmp format, so they should be viewable by just about anybody.

Comments

Athlete's foot
You need BOOM! Tough actin' Tinactin.

Oh, and this was a good article... you simplified it enough so that the history major here (me) could understand most of it.

by Dan Scotto on Feb 23, 2006 4:55 PM EST

I can hear
John Madden's voice in my head as we speak.

by salb918 on Feb 23, 2006 4:56 PM EST
