Daily Box Score 8/27: With a Little Help From My Friends

I've written a bit recently about how statistics often trip us up by running counter to our intuition. This is a difficult problem, and if it's one that interests you, I recommend you read some behavioral economics, beginning with Dan Ariely (author of Predictably Irrational). But one of the nice things about online communities is that they allow community auditing. What do I mean by that? 

There are plenty of blogs about baseball on the internet, and even a good many that are about baseball and take a rather quantitative approach to analyzing the game. These blogs cross-pollinate ideas and link to one another; in short, this is what we mean when we refer to a blogosphere (a word which, ugly though it is, describes an otherwise unnamed phenomenon). And it seems to me that especially when it comes to baseball statistics, people on the internet get really excited about proving other people wrong (no, I'm not naming any names). 

Could this be a good thing?

Table of Contents

Crowdsourcing
Probability and Google
A Project
Discussion Question of the Day

 

Crowdsourcing

Some tasks are best executed by a single person. Aphoristically, a room full of monkeys on typewriters might eventually write Shakespeare, but rarely has great literature sprung from the minds of a group of people. When it comes to art, it's a benefit to have a single author with individual personality, desires and idiosyncrasies. 

Other tasks are completed fastest, most efficiently, and even most accurately when left to large groups of people. This wasn't always apparent. For example, this TED talk makes the point that, 15 years ago, nobody would have expected that a free, collaborative encyclopedia that did not pay its authors could compete with the Microsoft behemoth. But of course, nobody uses Encarta anymore. Anybody out there not have Wikipedia in his or her browser history? (You're lying.)

So how does this relate to the sabermetrics blogosphere? For the seventh time now, Tom Tango has opened his Scouting Report, By the Fans, For the Fans for balloting. It's an interesting idea. From his introduction:

Baseball's fans are very perceptive. Take a large group of them, and they can pick out the final standings with the best of them. They can forecast the performance of players as well as those guys with rather sophisticated forecasting engines. Bill James, in one of his later Abstracts, had the fans vote in for the ranking of the best to worst players by position. And they did a darn good job.

So he takes this idea of crowdsourcing and applies it to individual defense. All he had to do was create a ballot and a system to tabulate the results, and get as many people to vote as possible. After all, the more ballots, the lower the random error. Perhaps because they are simply aggregated intuition, the results accord fairly well with intuition. For example, the top three defenders on the 2008 Atlanta Braves were Yunel Escobar, Mark Teixeira, and Mark Kotsay.

But wait a minute. Crowds might be pretty good at figuring out some things, but are they really good at evaluating performance? And didn't I say just two days ago that human beings are actually pretty bad at avoiding bias? 

Certainly, you would never crowdsource player projections. If you did, you'd probably end up with all kinds of mistakes. Imagine the crowdsourced projection for Brad Lidge's performance this year. Sure, some commentators might have pegged him for regression, but most people probably would have taken 48-48 at face value.

So why, then, is it a good idea to crowdsource defense? Because of the alternatives. We are getting better at defensive statistics, but the best of them, probably UZR, are what I would call "not terrible." Hitting and pitching performance have some pretty sophisticated projection systems behind them. That simply isn't the case with defense.

So, by working together, fans are able to improve on what exists already. And I'd be willing to bet that the next great defensive statistic will be written about first on the web, open-source, and freely available for all. Until then, go evaluate your favorite team's players.

Probability and Google

You knew that Google was also a calculator, right? But did you know it could also help you to calculate statistically significant player slumps? That's what Ian Ayres, writing at Freakonomics, says:

Over his career, A-Rod has averaged one homer for every 14.2 at bats — suggesting there is about a 93 percent chance that he will not homer on any individual at bat. It would be crazy to say that he was in a home-run slump after failing to homer after just a few at bats. But the question is how many homer-less at bats is enough to be a statistically significant drought?

The answer is 42. 

(Of course, we already knew the answer was 42.)

But how did he arrive at that figure?

Athlete is having a statistical significant drought if:

Total consecutive number of bad events > log(.05)/log(probability of single bad event)

You can copy and paste the right-hand side of this inequality into Google, plugging in the probability of a single bad event (yes, Google is a calculator):

For A-Rod going homer-less, you would Google: log(.05)/log(.93).

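Here .05 is the significance level (the complement of 95% confidence) and .93 is the probability of ARod not hitting a home run in a given at bat (based on his career rate of one homer per 14.2 at bats, since 1 - 1/14.2 is about .93). It's a pretty nifty trick.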
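If you'd rather not go through Google, the same inequality is easy to sketch in Python. This is only an illustration of the formula quoted above: the function name `drought_threshold` and its parameter names are my own, and the .93 is the rounded figure from the Freakonomics piece.

```python
import math

def drought_threshold(p_bad, alpha=0.05):
    """Smallest whole number of consecutive bad events that exceeds
    log(alpha) / log(p_bad), i.e. the first streak length whose
    probability under pure chance drops below alpha."""
    cutoff = math.log(alpha) / math.log(p_bad)
    # The streak must be strictly greater than the cutoff.
    return math.floor(cutoff) + 1

# ARod going homer-less, using the rounded .93 from the quoted post
print(drought_threshold(0.93))  # 42
```

The cutoff itself works out to a little over 41, which is why the first whole number of at bats that clears it is 42.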

I would also add that, if you have Excel handy, it's pretty easy to go the other way. For example, if I wanted to know the exact probability of ARod going 42 at bats without a home run, I could simply enter the following formula:

=BINOMDIST(42, 42, 0.93, 0)

Here the first 42 is the number of desired outcomes (homer-less at bats), the second 42 is the number of trials, .93 is again the probability of ARod not hitting a home run, and the final 0 is a flag telling Excel to return the probability of exactly that many outcomes rather than a cumulative probability (for this kind of exact-streak calculation, you want it at 0). If we punch that into Excel, it gives us back a single number, in this case about .0475, indicating a 4.75% chance.
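For readers without Excel, here is a minimal Python sketch of the same non-cumulative binomial calculation. The helper name `binom_pmf` is an assumption of mine, not anything from Excel or the original post; it just writes out the textbook binomial formula.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (the non-cumulative case)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 42 homer-less at bats out of 42, each homer-less with probability .93
prob = binom_pmf(42, 42, 0.93)
print(round(prob, 4))  # 0.0475
```

When every trial has to come up the same way, the formula collapses to .93 raised to the 42nd power, so you could get the same number by Googling 0.93^42.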

Between Google and Excel, there isn't a whole lot of math that can elude you.

A Project

Speaking of math that eludes me, here's an interesting thread I've picked up. It started back in March, with Larry at wezen-ball.com. He wondered if two baseball games had ever played out identically:

I did this by looking at every game in the database and finding any games that had identical end-game statistics to it. If two games had the same number of innings played and identical home- and road- runs, hits, errors, and men left-on-base, I marked them as a unique pair. There were 3,479 such pairs of games.
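To make that matching criterion concrete, here is a rough Python sketch of how one might find such pairs. It is not Larry's code: the dictionary layout, the field names, and the three sample games are all invented for illustration, and a real run would read the Retrosheet game logs instead.

```python
from collections import defaultdict
from itertools import combinations

def matching_pairs(games):
    """Return every pair of game ids whose end-of-game stat lines match:
    same innings played, and the same (runs, hits, errors, left on base)
    for both the home and the road team."""
    groups = defaultdict(list)
    for game in games:
        key = (game["innings"], game["home"], game["road"])
        groups[key].append(game["id"])
    pairs = []
    for ids in groups.values():
        # Any two games sharing a key count as one matching pair.
        pairs.extend(combinations(ids, 2))
    return pairs

# Hypothetical games; "home" and "road" hold (R, H, E, LOB) tuples.
sample = [
    {"id": "A", "innings": 9, "home": (3, 7, 0, 6), "road": (2, 5, 1, 8)},
    {"id": "B", "innings": 9, "home": (3, 7, 0, 6), "road": (2, 5, 1, 8)},
    {"id": "C", "innings": 9, "home": (3, 7, 1, 6), "road": (2, 5, 1, 8)},
]
print(matching_pairs(sample))  # [('A', 'B')]
```

Grouping by the stat line first means each game is looked at once, rather than being compared against every other game in the database.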

You can read his full method, along with a list of the closest games in the Retrosheet era, at the link. But recently, someone with a mathematics blog (God Plays Dice) came across his post and wondered if perhaps Larry's criteria were overly strict:

"[I]dentically" is defined a bit too strictly; (say) a groundout to second and a groundout to shortstop are counted as different. And the metric that the author uses for similarity of two games A and B is, I think, the number of times where the nth plate appearance in games A and B had the same outcome. Intuitively I think you'd want to line up innings with each other. Two "most similar" games should at least have similar-looking line scores. I think what one wants is some notion of "edit distance" between games, and defining that is hardly trivial. 

Hardly trivial indeed. This is baseball we're talking about!

OK, enough joking around. It's time to bump up against the limits of my mathematical knowledge. I think what we want here is a specific kind of edit distance, which in computer science is a way of describing how many changes you would have to make to one string in order to transform it into another. For example, the edit distance between "one" and "two" is three, because you have to change each letter. There are different kinds of edit distance, depending on what counts as a fair move (can you transpose?).

For a baseball game, I don't think we especially care whether the groundout came before the single or vice versa, as long as the outcomes were similar. So it would seem most appropriate to use something called Damerau-Levenshtein distance, which is described by Wikipedia (take that, Encarta) as:

Damerau–Levenshtein distance is a "distance" (string metric) between two strings, i.e., finite sequence of symbols, given by counting the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two characters.

I think if we could calculate a Damerau-Levenshtein distance for all sufficiently similar games and rank them in ascending order, we'd have a pretty good answer to the question of which two baseball games were most similar.
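As a starting point, here is a short Python sketch of the idea. To be clear about what it is: this is the "optimal string alignment" variant, a simpler, restricted form of Damerau-Levenshtein in which a transposition swaps two adjacent symbols and no piece of the sequence is edited twice. The function name `osa_distance` and the play codes ("GO" for groundout, "1B" for single, "K" for strikeout) are my own assumptions, not an established encoding.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two sequences:
    insertions, deletions, substitutions, and swaps of adjacent
    items each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete everything in a[:i]
    for j in range(cols):
        d[0][j] = j  # insert everything in b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. (GO, 1B) versus (1B, GO)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("one", "two"))                            # 3
print(osa_distance(["GO", "1B", "K"], ["1B", "GO", "K"]))    # 1
```

Because it works on any sequence, a game could be fed in as a list of plate-appearance outcomes rather than a string of letters. Note that a swap only covers neighboring events, so a groundout and a single that are several batters apart would still cost more than one edit; whether that is the right notion of similarity for baseball is exactly the open question.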

Discussion Question of the Day

Comp. sci. geeks: now is your time to shine. Can we make this happen? Am I even right about which edit distance would be most appropriate? Is this a fool's errand? Please, I'm in far over my head, and I need a little help from my friends.

And no, I'm not going to let you get away without linking to Joe Cocker.


Comments

Great suggestion...

Good suggestion on the “identical games” problem, Tommy. I have to say, since I first wrote that piece back in March, I’ve thought of about a dozen different ways to look for identical games. Your suggestion sounds similar to one that I’ve been pondering. I haven’t had the chance to explore it much, though, so I don’t know yet how feasible it is. Using a process with some established CSC theory behind it like that is probably a good place to start.

I certainly wouldn’t mind hearing some other thoughts. It’s a problem I’m very interested in solving more definitively.

by lar on Aug 27, 2009 8:49 PM EDT

The problem is that the Freakonomics piece is kind of wrong.

Given his well over 8000 PAs, we should expect some things outside of 2 SDs/.05 p-value/etc. to have happened at some point in his career.

by cwyers on Aug 27, 2009 10:28 PM EDT

Does that mean they aren’t slumps? I don’t take him to be saying that the slump is predictive (though there is some evidence in The Book that slumps are predictive), just as a way to describe how far from a player’s established performance level a given stretch is.

by Tommy Bennett on Aug 27, 2009 10:36 PM EDT

I really don’t care what someone wants to call a slump or not. Here’s where he goes wrong:

The answer is 42. There is less than a 5 percent chance that Rodriguez would go homerless 42 times in a row — so we can reject the hypothesis (at a 5 percent level of statistical significance) that he is going homer-less merely as a matter of chance.

We can do no such thing – at the five percent level of significance such things still happen five percent of the time. And since A-Rod plays so much baseball, things that should happen five percent of the time do happen from time to time based upon chance alone.

And to extend it further – it’s not just A-Rod, it’s hundreds of batters. We should expect a certain number of batters to have a home run drought exceed 2 SD every season (it may be fractional, I haven’t bothered to check what it is – you get into issues with dependence and it’s messy).

You can’t select a subset of AB from one batter based upon the stats themselves – that is, after the fact – and then claim those results are statistically significant; you have a selection bias to contend with.

by cwyers on Aug 27, 2009 10:53 PM EDT

That's why

I didn’t choose to talk about that portion of the post. But you are of course correct.

by Tommy Bennett on Aug 27, 2009 11:00 PM EDT

You can't claim anything is "statistically significant" without specifying a confidence level, though

A-Rod’s 42-PA drought is statistically significant for the criteria given; the validity of this criteria is wholly a matter of opinion. You can select any batter or subset of batters who have had a HR drought over 2 SD (based on true talent levels) and report that there is a 95% chance that this is not due to bad luck and it will always be true, regardless of how you selected them. Of course, biases will show up based on how you chose to measure true talent levels, but the underlying statistical methodology is sound.

by yugret on Aug 28, 2009 4:56 AM EDT
