Beyond the Box Score: An SB Nation Community


Daily Box Score 8/25: Education, Statistics, and Baseball


For most of us, formal exposure to statistics and probability began in junior high or high school. You probably spent a few weeks on it in one of your math classes (which, if it had the words "statistics" or "probability" in the title, likely covered other subjects as well). So you learned about coin flips and dice rolls. You may have even been required to learn the permutation and combination formulas (I really hope I just scared some people). But people still systematically fail to understand basic statistical concepts.


Table of Contents

The Problem
The Personal Story
The Mistake
Improving
Discussion Question of the Day

 

The Problem

The problem we face is serious. I would guess that most people could tell you the odds of getting heads twice in two consecutive coin flips. Or at least I'm not going to spend time here worrying about those who can't--there are likely more important concerns for those individuals. But we still make consistent, categorical errors.

Take, for example, the Martingale fallacy. It's a common one (and I believe especially common among the sort of person inclined to wager on the outcomes of sporting contests). The attractiveness of the Martingale fallacy is that it sounds foolproof (insert rejoinder about things sounding too good to be true here). The idea is that you make a wager (say on a 50-50 proposition). If you win, great, you profit. If you lose, all you have to do is double your wager on the next game. If you win, you've come out ahead (-1, +2). If you've lost, just repeat, doubling your wager again. Eventually, you're bound to win, right? 

No:

Martingale betting systems are guaranteed to work provided that the gambler has an infinite amount of capital and no limits are imposed on the maximum bet that’s allowed to be placed. In the real world, both of these requirements cannot be realistically met. The amount bet grows exponentially, so the Martingale system ends up being a surefire way to bankrupt those who employs it.
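To make the bankroll problem concrete, here is a minimal sketch of the doubling scheme played with a finite stake. The function names (`play_martingale`, `losses_until_bust`), the 1,000-unit bankroll, the 1-unit base bet, and the 500-round session length are all assumptions chosen for illustration; none of them come from the quoted source.

```python
import random


def losses_until_bust(bankroll, base_bet):
    """Count the consecutive losses a bankroll survives before it can no
    longer cover the next doubled bet."""
    streak, bet = 0, base_bet
    while bet <= bankroll:
        bankroll -= bet   # lose this bet
        bet *= 2          # Martingale rule: double after every loss
        streak += 1
    return streak


def play_martingale(bankroll, base_bet, max_rounds, rng):
    """Play one session of fair 50-50 bets and return the final bankroll."""
    bet = base_bet
    for _ in range(max_rounds):
        if bet > bankroll:
            break                  # cannot cover the next bet: session over
        if rng.random() < 0.5:     # win: collect the stake, reset to base bet
            bankroll += bet
            bet = base_bet
        else:                      # loss: pay the stake, double the next bet
            bankroll -= bet
            bet *= 2
    return bankroll


if __name__ == "__main__":
    rng = random.Random(2009)      # fixed seed so the run is repeatable
    start = 1000
    finals = [play_martingale(start, 1, 500, rng) for _ in range(2000)]
    ahead = sum(f > start for f in finals)
    print("losing streak that ends a session:", losses_until_bust(start, 1))
    print("sessions finishing ahead:", ahead, "of", len(finals))
    print("worst session finished with:", min(finals))
    print("average final bankroll:", sum(finals) / len(finals))
```

Under these assumptions, nine straight losses are enough to end a session: by then 511 units are gone and the tenth bet of 512 can no longer be covered. Sessions that dodge such a streak finish a little ahead, while sessions that hit one give back far more than they ever won, so on a fair bet the scheme neither gains nor loses in expectation. It only rearranges when the losses arrive.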

The Martingale example has some important points to make. The first is that statistics and probability are HARD! We have to think things through carefully, lest we make errors that could have ruinous consequences. The second is that statistical truths often run counter to our initial assumptions. It's very tricky like that.

How are our intuitions often wrong? Here's a short list: we believe that the most recent outcomes have more weight than they actually do (regression to the mean), we ascribe causal relationships to relationships that are in fact coincidental or only weakly correlated (randomness), and we believe that past performance is an excellent predictor of future performance (poor projection).

The difficulty is that almost no one intuits these insights. Absent mathematical education, it's very difficult to work out all of statistics from first principles. But I do think sabermetrics is a good way to get started.

The Personal Story

Wait, baseball is a good way to become educated about statistics? Really? Isn't it a pretty trivial subject on which to hinge your mathematics education? (Well, you and I know it's not trivial but you know what I mean). Take a look, it's in a blog!

To really get into sabermetrics, you need to possess two traits. First and foremost is a deep interest in baseball, but second is the desire to quantify things and to understand them. I may have been woefully lacking in the first department, but I already possessed the latter trait. As long as I can remember I was always interested in learning facts and reading. My favorite book when I was in the second grade was the World Almanac. When I was in kindergarten or first grade, I kept notecards with data about the planets on them--distance from the sun, length of day and year, diameter, etc.--even though I was obviously too young to really understand what it meant.

So it was only natural that when I did catch the baseball spark, it was only a matter of time until I was interested in records and statistics. And since I was predisposed to like that sort of thing, the wealth of records and statistics in baseball only strengthened my interest in the game. I did take a short detour into the world of baseball cards, but that only lasted through the spring of 1995, and I was always reading the numbers on the back.

I would like to second the fine personal story told over at Walk Like a Sabermetrician. The relationship between baseball and statistics, if they're your sort of thing, is a reinforcing one. The more you get into the statistics, the more you can learn about the game. And the more you learn about baseball, the more interesting the game becomes. Pretty soon, you know the difference between independent and dependent variables, can calculate a standard deviation, and might even remember the combination formula off the top of your head. 

Does that make you a nerd? Absolutely. Does it make you better able to understand the world around you? Ya darn skippy.

The Mistake

Because this stuff is so hard (and because I struggle with it myself), I don't like to be too hard on those who don't have a strong grasp on statistics. But sometimes it's important to use a mistake to illustrate (NOT to rebuke!). Here's one:

On the other hand, perhaps Kemp should be the Dodgers’ leadoff man. Yes, he’s a big run producer, but among his teammates who qualify for the batting title, he is first with a .371 on-base percentage.

There’s an even more convincing argument for putting Kemp at the top of the order. In 2009, no major leaguer is better at getting on base to open an inning. Among players with at least 50 plate appearances leading off an inning, Kemp tops the leader board with a .464 average (40-for-87). He’s also drawn 10 walks as the first batter, and his .515 OBP ranks first among this group, too.

I'm not aware of any proof that the ability to bat in a certain slot in the order is a persistent trait. That doesn't mean that such a skill doesn't exist. But it does mean that from 87 ABs, we can say almost nothing about Matt Kemp's abilities as a leadoff hitter. Even if we could say something about them, it would be dwarfed by what we can say based on his entire career, which is actually pretty good. So while the author here gets the original point right (Kemp should bat leadoff, or at least never eighth), part of the reasoning reinforces a common sort of statistical mistake.
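One way to see how little a sample this size pins down is to put an interval around the quoted .515 OBP. The sketch below uses a Wilson score interval, which is my choice of method and not anything the quoted article did. It assumes plate appearances are independent and that the 40 hits and 10 walks in 87 at-bats make up 50 times on base in 97 plate appearances (ignoring hit-by-pitches and sacrifice flies, which the quote does not report). The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width


# Figures taken from the quoted passage: 40-for-87 plus 10 walks.
times_on_base = 40 + 10
plate_appearances = 87 + 10   # assumption: no HBP or sacrifice flies

low, high = wilson_interval(times_on_base, plate_appearances)
print(f"observed OBP: {times_on_base / plate_appearances:.3f}")
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

The interval runs from roughly .417 to .612, nearly 200 points of OBP wide. It also understates the uncertainty, because Kemp was picked for being at the top of a leaderboard, and whoever tops a leaderboard in a small sample has usually had some luck.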

Improving

So how could we better approach this subject? Here's a good stab by River Avenue Blues' Ben Kabak:

On the season, the Yankees have been unable to slam the door on innings. The team has allowed 589 runs, and 248 of those have come with two outs. That’s 42 percent of all runs. 

With two outs, the team’s sOPS+, a measure of the team’s OPS as compared to the league average for that split, is 104. For both no outs and one out, the team’s sOPS+ is 95. In other words, the Yanks are better than league average with zero and one outs but worse with two outs. Overall in the AL, just 36.8 percent of runs have scored with two outs.

Data is presented, then compared to league average...we could be on to something here! Let's make sure the conclusions drawn are sufficiently non-causal:

It’s tough to draw many conclusions from here. We’re looking at a rather selective sample that isn’t really indicative of anything other than past frustration. Will A.J. Burnett and Andy Pettitte always struggle with two outs? Probably not.

Well I'll be. Not half bad. Still, wouldn't it be nice if we could reach some kind of conclusion? (If anything, in this case, I think it suggests the Yankees have been run-unlucky.)

Help me out, Derek Zumsteg:

There’s some variation, of course, because the sample size for hitting is huge and the sample size with guys on second is small, and leans heavily on hitters who are up when the good hitters are on, and so on and so forth. But you can predict a team’s hitting with men on next year with this year’s hitting better than you can with this year’s hitting with men on.
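The claim in that last sentence can be illustrated with a small simulation. This is only a sketch under a strong assumption: every hitter has a single true on-base talent and no separate situational skill at all, which is exactly the null hypothesis in question. The talent distribution (mean .330, standard deviation .030), the 100 situational and 500 other plate appearances, and the names `simulate`, `pearson`, and `times_on_base` are all invented for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def times_on_base(talent, plate_appearances, rng):
    """Simulate independent plate appearances for one hitter."""
    return sum(rng.random() < talent for _ in range(plate_appearances))


def simulate(n_hitters=2000, pa_split=100, pa_other=500, seed=0):
    """Return (r_overall, r_split): how well this year's overall rate and
    this year's situational rate each predict next year's situational rate."""
    rng = random.Random(seed)
    this_overall, this_split, next_split = [], [], []
    for _ in range(n_hitters):
        # Assumption: one true talent per hitter, identical in every situation.
        talent = min(max(rng.gauss(0.330, 0.030), 0.0), 1.0)
        split_now = times_on_base(talent, pa_split, rng)
        other_now = times_on_base(talent, pa_other, rng)
        split_next = times_on_base(talent, pa_split, rng)
        this_split.append(split_now / pa_split)
        this_overall.append((split_now + other_now) / (pa_split + pa_other))
        next_split.append(split_next / pa_split)
    return pearson(this_overall, next_split), pearson(this_split, next_split)


if __name__ == "__main__":
    r_overall, r_split = simulate()
    print(f"overall this year vs. split next year: r = {r_overall:.2f}")
    print(f"split this year vs. split next year:   r = {r_split:.2f}")
```

With no situational skill built in, the larger overall sample is the better predictor of next year's split simply because it carries less noise. If real data showed the opposite pattern, that would be evidence against the null.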

And of course, the same goes for batters hitting in a specific lineup slot, or pitchers with a certain number of outs, or just about any contrived situation you might imagine in a baseball game. There are, of course, important exceptions. But before we go out saying that such and such is significant, wouldn't it be nice if someone had at least tried to reject the null hypothesis?

I'll finish with a guy who is really much smarter about statistics than I, Columbia professor Andrew Gelman (of Five Thirty Eight fame, such as it is):

One way things are changing is that there's a ton of raw, raw data--locations of where every ball landed on the field, things like that. In that case, the steps going from raw data to inference are going to be more apparent. With old-fashioned statistics such as batting and fielding averages, it can be easier to fool yourself into thinking of them as pure measurement. [...]

In evaluating players, though, you want to factor out the luck, as well as you can--especially if your goal is to evaluate how well the player will perform in future years. So I think it's important to make your inferential goal clear [...]

So can we all please promise that we will do our best to make our inferential goals clear?

Discussion Question of the Day

Which statistical truths do you find yourself tripping over, even if you know what the numbers say? I know that I have a hard time convincing myself not to over-weight recent performance, especially in situations like fantasy baseball. 


Comments


Wish I would've paid more attention..

in all of my stats classes. My goal in those was merely getting in and getting out as I enjoyed my Operations Research based classes MUCH MUCH more.

by stevesommer05 on Aug 25, 2009 4:43 PM EDT

Tommy, I really enjoy reading your Stats 101 articles. As much as I like to think I know statistics, reading your articles reminds me of how vast the subject really is and how much more I can and need and want to learn.

Which statistical truths do you find yourself tripping over, even if you know what the numbers say?

I am one of those that got on the LD% + .120 boat. Still keep slapping myself when I find myself doing that. There was a THT article a few months ago that showed little to no correlation between LD% and BABIP.

by Crashburn Alley on Aug 26, 2009 1:36 AM EDT

Sometimes

I wonder how much of that is poor coding of data, especially when various sources differ so much on LD%. But yeah, it was a seductive myth.

by Tommy Bennett on Aug 26, 2009 7:18 PM EDT

Actually

I believe that the article showed there was little correlation when using LD rate + .120 to predict future years' BABIP. That isn’t surprising, because LD rates can vary significantly. If you had a properly regressed version of LD rate, I bet it would return a much higher correlation.

Smoltz.

by vivaelpujols on Aug 27, 2009 1:06 AM EDT

Er, to explain.

I mean it’s not a skill, and sometimes I forget that.

by R.J. Anderson on Aug 26, 2009 12:45 PM EDT

Yep

I still wonder why FanGraphs use FIP instead of xFIP.

Smoltz.

by vivaelpujols on Aug 27, 2009 1:07 AM EDT

How can people not read this stuff?

I don’t often ask for linkage, but if anyone who likes the DBS’s as much as I do and has their own site would give Tommy’s column a plug, that’d be awesome.

by Sky Kalkman on Aug 27, 2009 11:53 AM EDT
