The Hall of Fame Zone Revisited: Career Arcs

Graph of the Day

Arc-of-war-90s_and_00s_contemporaries-thumbnails

Career WAR Arcs can be compared to the average Hall of Famer for interesting results.

If you're not already familiar with the concept, readers JBrew and Studes (and Justin Inaz and Sky Kalkman)* developed a way of looking at potential Hall of Famers by arranging all previous members' WAR data into a best-season-to-worst view. They took the data, found the median, and then stretched a "Hall of Fame zone" from the 20th to the 50th percentile against which to compare would-be candidates.

You can see that concept HERE. (Sorry, requires an ESPN Insider subscription.)

*edit: Sky corrects the record in the comments section. He was not in fact the creator of the "HoF Zone."

While the effort has been duplicated numerous times, not only by myself but also now on Fangraphs as a regular feature/tool, I always had one minor issue with the approach: it seemed to neglect the overall curve of the player's career. The ups, the downs, the long stretches of average production, the shockingly swift decline that all players must eventually face.

I decided another approach was in order.

Method

With that in mind, I decided to take the same set of players and find another Hall of Fame Zone--the Hall of Fame Career WAR Arc. I took every player and arranged his career WAR numbers (via Fangraphs) from his first season to his last. I found that the average Hall of Famer has a 20.5-year career. That surprised me a bit.

With the career-aligned data in place, from each player's rookie season to his sad departure from the majors some 20 years later, I averaged every year's WAR data. Obvious note: while almost every Hall of Famer starts with a modest 1.0-ish WAR, he adjusts very quickly, climbing into the 5s and 6s in just three years. Even if you've been following sabermetrics for some time, you might still be surprised to note that the average Hall of Famer peaks in his 7th season, somewhere between 26 and 28 years old.

With the average WAR for each season of the "average Hall of Famer" in place, I then found the standard deviation for every season and added it to, and subtracted it from, that season's average. I plotted the data and found a fourth-degree polynomial (quartic) line of best fit for each series: one for the average, one for the average minus one standard deviation, and one for the average plus one standard deviation.
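For readers who want to try the method outside Excel, here is a minimal sketch in Python with NumPy. It is not the spreadsheet used for the charts in this post: the function name `career_arc_zone`, the sample careers, and the choice to treat seasons after a player's last as missing (rather than as 0 WAR) are all assumptions made for illustration. It also uses the population standard deviation, which may differ slightly from what a spreadsheet function returns.

```python
import numpy as np

def career_arc_zone(careers, max_seasons=21, degree=4):
    """Fit quartic curves to the mean and mean +/- 1 SD of career-aligned WAR.

    careers: list of per-season WAR lists, each ordered first season to last.
    (Hypothetical helper; names and defaults are illustrative assumptions.)
    """
    # Align careers by season number; NaN marks seasons a player never played.
    grid = np.full((len(careers), max_seasons), np.nan)
    for i, war in enumerate(careers):
        n = min(len(war), max_seasons)
        grid[i, :n] = war[:n]

    seasons = np.arange(1, max_seasons + 1)
    valid = ~np.isnan(grid).all(axis=0)   # keep seasons with at least one player
    x = seasons[valid]
    mean = np.nanmean(grid[:, valid], axis=0)
    std = np.nanstd(grid[:, valid], axis=0)  # population SD (assumption)

    def fit(y):
        # Least-squares polynomial fit, evaluated back on the season numbers.
        return np.polyval(np.polyfit(x, y, degree), x)

    return {
        "season": x,
        "mean_fit": fit(mean),
        "low_fit": fit(mean - std),
        "high_fit": fit(mean + std),
    }

# Made-up careers, purely for illustration (not real players).
careers = [
    [1.0, 3.5, 5.0, 6.2, 5.8, 4.0, 2.1],
    [0.8, 2.9, 6.1, 7.0, 6.5, 5.2, 3.3, 1.0],
    [1.5, 4.0, 4.8, 5.5, 3.9, 2.0],
]
zone = career_arc_zone(careers, max_seasons=8)
for season, low, mid, high in zip(zone["season"], zone["low_fit"],
                                  zone["mean_fit"], zone["high_fit"]):
    print(season, round(low, 2), round(mid, 2), round(high, 2))
```

The shaded zone would be the area between `low_fit` and `high_fit`. Whether retired seasons count as missing or as zero changes the later part of the curve noticeably, so that choice is worth stating alongside any chart.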

The results are as follows.

Arc-of-war_medium

I truncated the data at 21 seasons, and kept the scale to show 14 WAR. With the area between the standard deviations shaded in grey (keeping with Sky's original theme), we now have a Hall of Fame Zone in a career arc format from which to judge potential candidates.

Here's how it might be applied:

War-arcs-_griffey_medium

The Kid had a higher WAR in his 5th season (9.0) than he did in his last 11 seasons combined. A beautiful, and tragic, career arc.

 

War-arcs-_thome_medium

Per reader request. And yeah--AWESOME. His curve puts him in the thick of the Hall of Fame for his entire career.

 

War-arcs-_arod_medium

What a brilliant career. Freaking amazing. Playing at a premium position at the beginning of his career gives him a nice boost, though.

 

War-arcs-_pujols_medium

Even at a positional disadvantage, Pujols has spent his career as BETTER than Hall of Fame.

 

War-arcs-_bonds_medium

*Ahem*

 

 

Removing the bar charts and combining the curves into a single chart gives a wonderful career comparison:


Arc-of-war-90s_and_00s_contemporaries_medium

 

So, what do you think? Whose career would you like to test against the Career Arc Hall of Fame Zone? Who, right now in his 3rd-4th-5th season, is producing 5 or more WAR? Maybe we should watch them.

 

And because I had it open still:


Arc-of-war-90s_and_00s_contemporaries_ii_medium


Comments

Awesome graphs.

I love the new stuff…especially the LOL zone. :)

by jwiscarson on Jun 7, 2010 11:27 AM EDT reply actions  

If that is the LOL zone

what should the area under the start of Pujols’ career be? The “not a maching, I’m jess Albert” zone?

Albert Pujols does not have "down" years. He has "~6 WAR" years.

by mattybobo on Jun 11, 2010 2:28 PM EDT up reply actions  

Removing my crown

Passing to Mr. Bopp. This is AWESOME.

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 11:41 AM EDT reply actions  

Standing on the shoulders of giants, my friend.

Off topic: how awesome is our weekly competition now? I think it’s made both of us better in very little time.

by Justin Bopp on Jun 7, 2010 12:20 PM EDT up reply actions  

If by “made both of us better” you mean “spending the time that I used to sleep plotting my revenge in the form of pixels”, then totally.

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 12:34 PM EDT up reply actions  

I knew you were on to me.

I was doing the same thing!

Question is, what direction will you be coming from next? I fully expect something mindblowing.

by Justin Bopp on Jun 7, 2010 12:36 PM EDT up reply actions  

To be honest...

I have a simple one queued up for tomorrow. Thinking about v2 of the SaberCard more and more though. :)

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 1:12 PM EDT up reply actions  

Mmmmm

These are very sexy.

The blown up version of the first graph is much too small, is my only caveat.
I’ll comment more later.

by Patrick42 on Jun 7, 2010 12:08 PM EDT reply actions  

I'm not sure which one you mean...

But the very first image in your post. Right under “Graph of the Day”.

I feel like I should be able to click on it and have it blow up to where it’s legible. As is, it’s still just a preview that I can’t read.

by Patrick42 on Jun 7, 2010 12:13 PM EDT up reply actions  

I almost commented on this as well.

But Justin did link to full-sized graphs for each individual graph, and a combined full-sized one would probably be all but totally indigestible for normal surfers.

Justin —

Could the “full-sized” graph replace the thumbnail in the story?

by jwiscarson on Jun 7, 2010 12:17 PM EDT up reply actions  

You've got the little one...

In the story right now. Then the slightly larger one on the click-through.

I think he’s suggesting replacing the little one with the slightly larger one.

by Patrick42 on Jun 7, 2010 12:24 PM EDT up reply actions  

I've messed around with it a bunch now

and I think the best way was the way it started. Now it just looks goofy.

I’m going to put it back and remove the link and be done with it. The idea was to give a preview of what was beyond the jump, not an overview. Lesson learned.

by Justin Bopp on Jun 7, 2010 12:34 PM EDT up reply actions  

Haha.

No problem. When I viewed the article on its own page, I thought there was enough space for the larger image, but when I checked the front page, there wasn’t anywhere near enough room…and then I realized I also view the site in wide mode.

Thanks for trying anyway.

by jwiscarson on Jun 7, 2010 12:36 PM EDT up reply actions  

It would be pretty cool

if we could do a slideshow thing, but most of the time that comes with the inevitable (and blatant) begging for page-loads. I hate that.

by Justin Bopp on Jun 7, 2010 1:10 PM EDT up reply actions  

I think it’s kind of sad that the one thing that will stick out most about this post to me is the “LOL zone.”

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 12:27 PM EDT reply actions  

Search:

Player seasons 14-21+ spent above 6 WAR. I’m guessing there aren’t many.

by Justin Bopp on Jun 7, 2010 12:39 PM EDT up reply actions  

The Bambino.

Maybe Mays. Cobb? Though with Cobb I think it’s more of a “He was incredible and played forever” than having seasons of “OMGWTFWARBQ”-range WAR.

by Patrick42 on Jun 7, 2010 12:48 PM EDT up reply actions  

Probably =)

I’m amazed anyone remembers.

Justin, thanks again for the really slick graphs.

How are you making these?

I’m going to give some thought to trying to find a nicer looking fit line for the graphs… Since it’s not an aging curve anyway…

by Patrick42 on Jun 7, 2010 1:36 PM EDT up reply actions  

They are created using 2003 Excel and a graphic-editing program.

I have 2007 at work and I honestly cannot wait until I get it at home. Hell, I’m not even sure why I don’t already. Either way, a lot of what I do (but not all) can be replicated in 2007.

By “nicer looking,” do you mean more aesthetically- or more intellectually-pleasing?

by Justin Bopp on Jun 7, 2010 1:51 PM EDT up reply actions  

Without drawing this thread greatly off-topic

I don’t use Excel 2007 that much (I’m a programmer, so I end up using SQL Server Reporting Services if someone needs graphs), but I absolutely love Word 2007 (I minored in technical writing and try to force it on other programmers whenever I can…in general, we are not an explanatory bunch).

The leap in functionality/usability between XP/03 and 07 is huge. I’ll be interested to see what comes out with Office 2010.

by jwiscarson on Jun 7, 2010 1:57 PM EDT up reply actions  

That's a good question, isn't it?

Intellectual that I am, I think intellectually pleasing… I’m just going to try poking at it and seeing what comes out. Have to bust out my old copy of Office 2k03.

by Patrick42 on Jun 7, 2010 2:12 PM EDT up reply actions  

Bring it.

I like moving averages, and the thing you showed me in the other thread seemed like a really neat idea. Let’s see what you got.

by Justin Bopp on Jun 7, 2010 3:04 PM EDT up reply actions  

Thing I showed you?

I showed you a thing? :X
I made some points about graphs and derivatives and zeroes and the like.

by Patrick42 on Jun 7, 2010 4:51 PM EDT up reply actions  

Uh.

It was a wiki article about higher-math-than-i’m-able, talking about a jittery line and smoothing it out with continually refined something-or-other.

Again, if you (or that cat below) find a more effective way of visualizing the same data that is demonstrably different (both in utility and in legibility), I’m all for it. As is, I have no problem standing by the lines of best fit.

by Justin Bopp on Jun 7, 2010 4:56 PM EDT up reply actions  

Sure.

I have a REALLY dumb question:

Where and how are you getting the data? Fangraphs, but how are you extracting the per season WAR data?

I have never done this before and I found when I went to the individual player page, my export button seemed to vanish.

/dumb questions :(

by Patrick42 on Jun 8, 2010 1:12 AM EDT up reply actions  

Class With Professor Justin

[ ] Fangraphs
[ ] Player of your choice (example: Justin Morneau)
[ ] Copy (avoid the headings and totals rows. they’ll make your paste look wonky)
[ ] Paste in excel. You should be able to see which row is WAR.
.
.
I could be wrong, but I’m pretty sure the removal of “export” is intentional.

by Justin Bopp on Jun 8, 2010 9:11 AM EDT up reply actions  

Ahhh

I thought so too… OK. I’ll copy and paste. shudders at copying and pasting from HTML

by Patrick42 on Jun 8, 2010 9:34 AM EDT up reply actions  

:O

I work in IT. Sometimes people want text pulled from websites and spreadsheet-ed. I think I love you. :O

by Patrick42 on Jun 8, 2010 10:42 AM EDT up reply actions  

but wait, there's more!

[copy]
notepad
right click
paste
control-home
press [tab]
press [shift]+[home]
control-C
backspace
edit —> replace
find what: (press spacebar)
replace with: control-V
replace all
control-A, control-C

excel
control-V

party

(you just copied data, pasted it into notepad, copied a tab and inserted it into every space, copied all and pasted into excel, which will paste the data into columns. messy but effective when you’re working with html)

sidenote: notepad is my 3rd favorite program.
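For anyone who would rather script it, the Notepad replace-all above can be sketched in a few lines of Python. The function name `spaces_to_tabs` and the sample rows are made up for illustration. Like the Notepad trick, it turns every space into a tab, so a value that contains a space (a two-word team name, say) will be split across two columns.

```python
def spaces_to_tabs(pasted_text):
    """Replace every space with a tab, line by line (hypothetical helper)."""
    return "\n".join(line.replace(" ", "\t") for line in pasted_text.splitlines())

# Made-up rows standing in for text copied from a web page.
sample = "2008 TeamA 3.9\n2009 TeamA 3.4"
print(spaces_to_tabs(sample))
```

Pasting the result into Excel should land each value in its own column, the same as the manual version.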

by Justin Bopp on Jun 8, 2010 11:01 AM EDT up reply actions  

Are you guys familiar with the Web query function of Excel?

Where you can grab a table from a Web site and load it dynamically?

by Dan Turkenkopf on Jun 8, 2010 11:13 AM EDT up reply actions  

Go to the Data menu (or Ribbon) in Excel, depending on version

And choose either Web Query or the From Web button.

Enter the URL you want. Once the page loads you’ll get an arrow next to the tables you can import. Click the arrow next to the table you want, and it will load it to the sheet for you.

by Dan Turkenkopf on Jun 8, 2010 11:26 AM EDT up reply actions  

My tentative understanding

Is that it doesn’t work very well. You get a lot of munged data, mixed with junk, etc.

But if you endorse it I might have to give it a try.

My usual go-to methodology is to copy and paste the table into Excel, then save it to The Best Database Format Ever™ – The mighty CSV file. This strips all the extraneous junk.

by Patrick42 on Jun 8, 2010 11:27 AM EDT up reply actions  

I've had good luck using it

with Fangraphs also. You can also set up a macro fairly easily to grab multiple years at one time that way too.

by stevesommer05 on Jun 8, 2010 11:44 AM EDT up reply actions  

This is disturbing

Not the actions themselves. They look excellent.

That you took the time to type it all up. That’s disturbing.

by Patrick42 on Jun 8, 2010 11:27 AM EDT up reply actions  

Oh God.

Kernel smoothing.

That wasn’t me, it was Jadelane. It was right next to my posts, but wasn’t me. Lest you think I’m familiar with it, I’m going to go hide under my covers.

by Patrick42 on Jun 8, 2010 1:14 AM EDT up reply actions  

Yeah, my math...

Goes a few years of college calc, but then it gets real fuzzy. Diff-eq is all a blur, and multivariable calculus is swiftly fading.

I still remember my set theory pretty well, though! So if we need esoteric proofs of basic arithmetic, I’d just need a few minutes with my text book and I’d be ready to rock.

by Patrick42 on Jun 8, 2010 10:01 AM EDT up reply actions  

let's just say

that University Calc II and Physics for Engineering Majors is the reason I’m not an engineer.

Communications Degree FTW! I can public relations my ass off!

by Justin Bopp on Jun 8, 2010 10:17 AM EDT up reply actions  

Err

Err. Errr. … err. :)

by Patrick42 on Jun 8, 2010 10:43 AM EDT up reply actions  

Haha!

Side story:

The college I graduated from (University of North Texas) participates in a program where high school juniors and seniors attend college classes. It’s called TAMS (Texas Academy of Math and Science). TAMS students are, shockingly, brilliant kids who intuitively understand a lot of the more vague aspects of calculus and physics.

I went through Cal-based physics (kinematics and electrostatics) in a class 2/3rds full of TAMS students. The rest of us were varying shades of lost (ranging from “huh?” to “WTF?” to “”). The university never advertised this, but the opposing semesters had no TAMS students and went at approximately half the pace we did (we covered a chapter per week in my class). We had problem sets with 30-40 questions due at the end of each week in addition to labs.

We also had TAMS students posting the answers to our problem sets online. A part of me feels bad for cheating, but I rationalized it away through my irate feelings at the university hiding the fact that even we above-average students would be competing with kids who go on to lives full of nuclear physics, advanced math, etc.

In hindsight, I wish I would’ve been more dedicated to the class, but I was not exactly fully grown up at that point in my college career.

by jwiscarson on Jun 8, 2010 12:46 PM EDT up reply actions  

A number of guys to try it on

Here are a number of guys I’d like to see how they fit on your charts if they are easy to do.

Scott Rolen
Roy Halladay
Adam Dunn
Matt Holliday
Chris Carpenter
Mariano Rivera
David Ortiz
Vlad Guerrero
Miggy Cabrera
Ichiro Suzuki
CC Sabathia
Andy Pettitte
Roy Oswalt

by Michael Fulton on Jun 7, 2010 12:48 PM EDT reply actions  

I was thinking about taking

Adam’s “many ways to 60 WAR” and applying a few of their careers to this. It would be interesting to see how their careers slip in-and-out of the Hall of Fame Zone and the resulting arguments it might create.

by Justin Bopp on Jun 7, 2010 12:52 PM EDT up reply actions  

Oooh...

I’d be curious if the all-defense guys show a very different curve than the all-offense guys. And then the Good at Everything guys… consider me intrigued.

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 1:03 PM EDT up reply actions  

Well,

we’ve known for a long time that certain skills age better than others. Maybe we could combine the breakout data of WAR components into separate curves?

Hall of Fame Defense Curve
Hall of Fame Batting Curve
Hall of Fame Pitching Curve
etc

by Justin Bopp on Jun 7, 2010 1:12 PM EDT up reply actions  

Also, Clemens

He was doing some wacky stuff (performance… and performance enhancing) late in his years.

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 1:04 PM EDT up reply actions  

Having found this article, I immediately thought about what Rivera, Halladay, and Ichiro’s curves would look like. This is a great list of players to try. The only ones I would add that I’m really curious to see are Chase Utley, Tim Lincecum (I realize he’s young, but what do two consecutive Cy Youngs in your first three seasons do for you?), and Yadier Molina (especially for his defense, as Justin introduces below).

by Baseball Nerd on Jun 7, 2010 4:39 PM EDT up reply actions  

For Lincecum, you’d have to count 2007 even though it wasn’t a full season, but I’d still be curious.

by Baseball Nerd on Jun 7, 2010 4:40 PM EDT up reply actions  

In reference to the Barry Bonds WAR graph, I wonder if there’s a way to incorporate when it was first alleged he took steroids.

I was thinking that if there’s a statistically significant difference among previous years, it might give evidence that he did take them, especially if they were in years that other Hall players had declining WAR numbers.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 1:00 PM EDT reply actions  

Evidence that he was performing way beyond what the average WAR of other Hall-of-Fame players would indicate. The major boost in the latter years when everyone else experiences a decline seems too good to be natural.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 3:55 PM EDT up reply actions  

I think that's a little like asking

for evidence that Colorado had an issue with too many home runs when it first opened. The evidence is there, documented, and reviewed ad nauseam.

Of course, that won’t silence the non-believers. The important thing to remember is that he wasn’t alone, and it doesn’t explain everything (don’t forget the ball was juiced as well).

by Justin Bopp on Jun 7, 2010 4:13 PM EDT up reply actions  

I’m not super into baseball news and press, so I wasn’t aware that there was a solid, well-organized case that argued Bonds took steroids.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 4:27 PM EDT up reply actions  

Is this too part of the LOL zone?

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 4:53 PM EDT up reply actions  

You know

He actually looks pretty jacked in the before pic. Look at that arm!

by Justin Bopp on Jun 7, 2010 4:57 PM EDT up reply actions  

And he’s got a killer mustache!

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 5:22 PM EDT up reply actions  

I know we're joking and all that now

but your inquiry is similar to asking what evidence OJ “did it” about 6 years ago, before he wrote about it in a book.

As my friend Walter would point out—one of these things is murder. Another is a guy possibly cheating. Not exactly the same thing. The only thing Bonds murdered is a baseball. Almost 800 times.

by Justin Bopp on Jun 7, 2010 5:33 PM EDT up reply actions  

If you're going to do that for Bonds

You gotta do it for A-Rod. He admitted PED use.

I don’t get why Bonds gets singled out so much. PEDs or not, probably the greatest player I’ve ever seen play.

Even if you take from 1986 to 1999 (which was supposedly before the PED use), the guy had a WAR of 107.4 (or so, depending on which calc you use. I’m using baseball-reference) over those 14 seasons. Considering there are only 20 guys with a career WAR over 107, that’s pretty damn good. Even if he followed a normal decline, he was going to have a WAR of at least 115 (I’m willing to bet that over the last 8 years, he would’ve had an average of 1 win over replacement/year), which would have been good enough for 17th best all time.

I know everyone hates Bonds but he was a damn great player.

All I’m trying to say is that while some of the stuff in the latter years is screwy, the graph over the majority of his career is still interesting and worth looking at.

by Mark Kieffer on Jun 7, 2010 4:13 PM EDT up reply actions  

I think that he’s a remarkable player, and the WAR totals in his earlier years prove as much. It’s just the later years that are highlighted by Justin’s aptly titled LOL zone that give me pause.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 4:30 PM EDT up reply actions  

Honestly, I probably should have left it out.

But it’s time to stop skirting around:

a) the greatest player ever (or one of)
b) how funny his stats look

by Justin Bopp on Jun 7, 2010 4:35 PM EDT up reply actions  

I should mention--he's also singled out because

his career decline was so obviously altered mid-stream. I could be EXTREMELY wrong, but I can’t think of a single other player who passed and surpassed his accepted peak.

Now, I’m one of the people that believe he’s probably the 2nd or 3rd best player that ever lived….but he deserves the spotlight precisely because he was so good.

That said, I’ve noticed a lot of otherwise rational people get really fidgety and nervous when PEDs come up. I’m not afraid to talk about it, but I’m also not looking for a job with the MLB (why, you hiring?).

I think the better discussion will eventually be where the stat-altering began and ended, if it did. Hell, today I read something that said the meth ban is more responsible for the decline in batter production than PEDs ever were. Who knows.

I know this: stat people get very, very nervous when you tell them their data is corrupted. (and rightfully so).

by Justin Bopp on Jun 7, 2010 4:32 PM EDT up reply actions  

The data isn’t so much corrupted as it could be used as a tool to find out when abnormal changes began.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 4:55 PM EDT up reply actions  

but what if the entire well was poisoned,

and not just this particular glass?

That’s why people get crazy about it. It’s easier to dismiss it as tight ball threads and give everyone the same advantage (or disadvantage).

by Justin Bopp on Jun 7, 2010 4:58 PM EDT up reply actions  

Point taken. Nice analogy Justin, I’ll have to bookmark that :)

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 5:23 PM EDT up reply actions  

Well...

Just look at the last few seasons of his career. Most players decline steadily into replacement-ness. Bonds goes OFF THE CHARTS good for a while, like, Babe Ruth in his prime level good, at an age when almost everyone else in baseball history is sliding slowly into oblivion.

by Patrick42 on Jun 7, 2010 1:39 PM EDT up reply actions  

That’s what I was looking at, and it seemed especially fishy.

Pittsburgh sports all the way

by GoPens! on Jun 7, 2010 3:53 PM EDT up reply actions  

Guys I'd like to see

As far as retired guys, I’d like to see
Ron Santo,
Jim Rice
Andre Dawson
Tim Raines
Robbie Alomar
Gary Carter…

All of whom are guys who people feel are either getting snubbed (Santo, Raines, Alomar), or took too long to get in (Santo, Dawson, Carter).

As far as current guys, in their 3-5th year to watch for:

Joe Mauer
Justin Morneau
Ryan Zimmerman
Adrian Gonzalez
Kevin Youkilis
Tulowitzki

by Mark Kieffer on Jun 7, 2010 2:20 PM EDT reply actions  

Also

If there is a tool to do this, I’d gladly do it on my own. Just would like to see it a lot… Especially the retired guys.

by Mark Kieffer on Jun 7, 2010 2:21 PM EDT up reply actions  

Not only did it take Santo too long to get in...

… but he didn’t even get in yet.

Yes, sometimes I forget that too. Unreal.

On Twitter: @baseballtwit

by adarowski on Jun 7, 2010 2:37 PM EDT up reply actions  

My bad

I put Santo twice. I know he isn’t in. But everyone says he should be, so I wanna see it broken down like this. I go back and forth on it. He was before my time, so I have no visual memories or evidence…

by Mark Kieffer on Jun 7, 2010 3:11 PM EDT up reply actions  

I think I'll start a series, Mark.

Anybody that wants to make a tool with me let me know. I’d love something in flash that could spit these out.

by Justin Bopp on Jun 7, 2010 3:05 PM EDT up reply actions  

Roberto Alomar

That one intrigued me too… so here’s my (ugly, but accurate) version
http://yfrog.com/g0robertoalomarp

by C.DeVore on Jun 12, 2010 11:15 PM EDT up reply actions  

Why a quartic?

Good work, but why did you choose to use a quartic? I find polynomial interpolation to be a very dangerous venture that can lead to strange conclusions if one is not careful. I think it’s a good rule of thumb that the higher degree polynomial you use, the more justification you need to use it. So lines (degree 1) are fairly common and don’t need much a priori justification, but not a lot of well-studied processes are modeled best by quartics (degree 4), so it’s a tougher sell. I wonder if some kind of moving average or LOESS would have been better.

That said, nice work

by mickeyg13 on Jun 7, 2010 2:44 PM EDT reply actions  

It's the smallest poly

that allowed for up-and-down variation while having a natural down-up-down curve that naturally fits the data (and logic of a player’s career).

There’s also an aesthetic aspect that I can’t deny. It looks bad ass. I know that won’t satisfy intellectual curiosity, but I’m open to additional ideas that a) provide a better fit and b) necessarily improve the conclusion.

As we apply the method to more players across the board, we’ll see where the flaws are and how it can be improved. As is, a quartic line of best fit really represents the data quite well.

by Justin Bopp on Jun 7, 2010 3:11 PM EDT up reply actions  

You know...

This makes me think of the Similarity Scores on baseball-reference.

Do you think you could do a series on similarity scores, picking players (perhaps a handful of players from each of the WAR “categories”), and graphing their careers to see how accurate the similarity scores really are?

by jwiscarson on Jun 7, 2010 3:55 PM EDT reply actions  

I did one on player comps last year,

comparing Billy Butler to Dave Winfield, of all people. Credit to Walter Fulbright for the analysis, though.

That said, give me an example of what you’re talking about.

by Justin Bopp on Jun 7, 2010 4:16 PM EDT up reply actions  

On further thought...

I’m not sure it’s entirely applicable — I just checked the BR sim scores page and found the definition.

I was thinking of something more along the lines of, “do similarity scores really tell us if players are similar?” with your shiny graphs, because I do wonder how arbitrary Bill James’ decisions are (he included runs scored and RBI). I’m not sure if you can break WAR down into its components at such a deep level (or if it makes sense to), but I’d be interested in seeing the results if it’s possible.

by jwiscarson on Jun 7, 2010 7:03 PM EDT up reply actions  

The LOL Zone

Does it represent the growth of Bonds’s head or the tumor that will protrude from his bosom in 20 years?

good stuff mang

You guys win. You can keep your little marked-out piece of internet territory. Spend your days communicating via keyboard with people too ugly for the real world and too nerdy for anyone to care, anyway. Your piece of land is here. Do the rest of civilization a favor and stay within its limits. You bore me. Have fun with your nightly sobs and screams into your pillow over your inability to attract a good mate, Radiohead. ~The Hooligan

by Daniel Berlyn on Jun 7, 2010 4:11 PM EDT reply actions  

The zone wasn't me.

I think it was mostly jbrew, who gathered the HoFamer data and calculated the percentiles. Then studes turned the boundary lines into a gray area.

Love the zone, on both the best-to-worst and chronological order graphs.

by Sky Kalkman on Jun 7, 2010 4:49 PM EDT reply actions  

I searched and searched

by Google I mean, and I couldn’t find any creator other than you. I’ve been thinking it was you for a looong time. Help me collect links to the creation posts and I’d be happy to correct the record (and article).

by Justin Bopp on Jun 7, 2010 5:00 PM EDT up reply actions  

Just to make sure, and maybe I missed it in the explanation

Are you taking HoFers as a whole, or splitting it up by pos. player/pitcher? I’d say the latter is probably the right way to go.

by SFiercex4 on Jun 7, 2010 7:07 PM EDT reply actions  

Projecting Career Arcs Into the Future

First, I have a nit to pick. A-Rod’s arc should not go to 0 in a year or two. Like Pujols, it should simply stop at the last data point.

But we want to see the future. And using Similarity Scores we can – sort of. The idea I threw out last Friday was to compare Pujols’ arc to that of his Top 10 list of similar players (8 HOF’ers plus ARod and Juan Gonzalez(!)). Assuming they are indeed similar so far, average the futures of his comps and use that to complete his arc. You can even create a confidence interval about the arc by computing standard deviations.
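To make the idea concrete, here is a minimal sketch in Python with NumPy, under stated assumptions. The function name `project_from_comps` and the sample numbers are invented for illustration. Seasons after a comparable player's retirement are treated as missing rather than as 0 WAR, which tilts the later averages toward the comps who lasted longest.

```python
import numpy as np

def project_from_comps(comp_careers, seasons_played):
    """Average the comps' WAR for the seasons the player has not played yet.

    comp_careers: list of per-season WAR lists for the similar players.
    seasons_played: how many seasons the player being projected has logged.
    Returns (mean, std) arrays, one entry per future season.
    (Hypothetical helper; a sketch, not an established projection method.)
    """
    longest = max(len(c) for c in comp_careers)
    grid = np.full((len(comp_careers), longest), np.nan)
    for i, war in enumerate(comp_careers):
        grid[i, :len(war)] = war

    future = grid[:, seasons_played:]       # only the seasons still to come
    mean = np.nanmean(future, axis=0)
    std = np.nanstd(future, axis=0)         # population SD (assumption)
    return mean, std

# Made-up comps, purely for illustration.
comps = [
    [5.0, 6.5, 7.0, 6.0, 4.5, 3.0],
    [4.0, 7.5, 8.0, 5.0, 3.5],
    [6.0, 6.0, 6.5, 6.5, 5.0, 4.0, 2.0],
]
mean, std = project_from_comps(comps, seasons_played=3)
for offset, (m, s) in enumerate(zip(mean, std), start=4):
    print(f"season {offset}: {m:.2f} +/- {s:.2f}")
```

The `mean - std` to `mean + std` band is the rough interval described above. With only ten comps it will be wide, and it says nothing about whether the comps were well chosen.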

Speaking of standard deviations, the HoF Zone seems awfully wide to me. It looks like at its peak (season 7) it ranges from 3 to 8. Anybody who peaks at 3 is not on a HoF path. Yet there are a lot of data points below the zone. Presumably those were years largely lost to injury. What did you do about players like Ted Williams who lost prime years to the war(s)? Those should not be treated as 0’s but as missing data. Same goes for injury years.

Someone suggested moving averages. That’s another way to deal with the injury/military problem. You could simply take the average of the WAR’s for Years X-1, X and X+1 and treat that as the WAR for Year X. That would eliminate a lot of the noise in the data.
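A minimal sketch of that three-year average in Python with NumPy, assuming a lost season is entered as `None` and skipped rather than counted as 0. The function name `three_year_average` and the numbers are made up for illustration, and averaging only the two available seasons at each end of the career is my own choice, not something specified above.

```python
import numpy as np

def three_year_average(war):
    """Centered 3-season mean of WAR; None marks a missing (lost) season.

    Missing seasons are skipped, so a lost year is filled by its neighbors.
    (Hypothetical helper for illustration.)
    """
    values = np.array([np.nan if w is None else w for w in war], dtype=float)
    # Pad both ends with NaN so the first and last seasons use two values.
    padded = np.concatenate(([np.nan], values, [np.nan]))
    windows = np.stack([padded[:-2], padded[1:-1], padded[2:]])
    return np.nanmean(windows, axis=0)

# Made-up career with one lost season in the middle.
career = [2.0, 5.0, None, 6.0, 4.0]
print(three_year_average(career))
```

If three or more consecutive seasons are missing, the middle ones have no neighbors to borrow from and come back as NaN (NumPy also emits a warning), so long absences would still need separate handling.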

by fjm235 on Jun 7, 2010 7:09 PM EDT reply actions  

The real problem with the lines on the graphs - which Justin I think understands - is...

That they have essentially no predictive value at all.

I don’t know if it’s best to put a disclaimer on it or whatever, but I think a lot of people are figuring it’s supposed to be an aging curve.

I can’t say whether or not it was supposed to be because I’m not Justin, but Justin’s a smart guy and since it’s NOT an aging curve, I’m guessing he didn’t mean it as one.

It’s just a fit line, essentially. In the previous post about the earlier WAR graphs by season with the lines, we discussed it. It’s just a pretty line to look at.

Which… Come to think of it, makes me feel like the lines for the players where the career isn’t over yet, should REALLY stop at the end of their data. Otherwise we get the impression it’s supposed to be predictive… But it’s just a best fit line to their already extant data points, so the idea that it’s predictive is… Very, very silly. And it’s why Pujols’ line went crazy in the other graph.

If a player has already begun their decline phase, a graph like these will be sort of able to model it… if the career hasn’t been too wild of a ride – It’s a 4th order best fit, so it can change between up and down at most three times. That means it starts out up, then can handle one dip (remember, a dip is a fall and then a rise), then the final decline phase. So… It’s going to kind of sort of be LIKE an aging curve, but it’s NOT one, and when the data is incomplete, the extrapolation from the current points is only going to be right by luck. (In other words, it’s meaningless.)
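The "three times" limit can be checked directly: the slope of a quartic is a cubic, and a cubic has at most three real roots. Here is a minimal sketch in Python with NumPy. The function name `turning_points` and the partial career are made up for illustration. It fits a quartic, finds where its slope is zero, and then evaluates the fit well past the last data point to show what extrapolation produces.

```python
import numpy as np

def turning_points(seasons, war, degree=4):
    """Fit a polynomial and return (coefficients, sorted real turning points).

    A turning point is where the fitted curve's derivative equals zero.
    (Hypothetical helper for illustration.)
    """
    coeffs = np.polyfit(seasons, war, degree)
    roots = np.roots(np.polyder(coeffs))            # roots of the derivative
    real = roots[np.isclose(roots.imag, 0)].real    # discard complex roots
    return coeffs, np.sort(real)

# Made-up partial career: nine seasons, still active.
seasons = np.arange(1, 10)
war = np.array([1.2, 3.8, 5.5, 6.9, 7.4, 6.1, 6.8, 7.9, 7.2])

coeffs, points = turning_points(seasons, war)
print("turning points at seasons:", np.round(points, 2))
print("fitted value at season 15:", round(float(np.polyval(coeffs, 15)), 2))
```

Whatever the season-15 number turns out to be, it is driven by the leading term of the polynomial rather than by anything about aging, which is the point being made above.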

The only reason it looks like an aging curve on the other ones is because it’s fit to the data of an aging player.

(See: http://www.beyondtheboxscore.com/2010/6/1/1494537/top-5-war-active-leaders#comments for the original post and the comments on it, which have more math-iness. It’s not a lot of math if you’ve had high school calc, or even a good pre-calc course, but still.)

Still, these are great graphs, and if you take the line for what it is – a graphical exploration of WAR for some great players, not anything predictive – then it’s a handy visual aid.

by Patrick42 on Jun 8, 2010 1:10 AM EDT
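To make the "three changes of direction" point concrete, here is a small sketch, assuming NumPy. The derivative of a 4th-order polynomial is a cubic, and a cubic has at most three real roots, so the fitted line can turn at most three times. The helper name `turning_points` and the partial-career WAR values are hypothetical, invented only for illustration; they are not any real player's numbers, and this is not necessarily how the lines in the graphs were produced.

```python
import numpy as np


def turning_points(coeffs):
    """Return the x-values where a polynomial's slope is zero.

    coeffs are ordered highest power first, as numpy.polyfit returns them.
    Repeated roots (flat spots that do not change direction) are not
    filtered out in this sketch.
    """
    roots = np.roots(np.polyder(coeffs))
    # Complex roots of the derivative are not turning points.
    real = np.real(roots[np.abs(np.imag(roots)) < 1e-9])
    return np.sort(real)


# Hypothetical first eight seasons of an unfinished career.
seasons = np.arange(1, 9)
war = np.array([1.0, 3.5, 5.5, 6.0, 7.5, 6.5, 8.0, 7.0])

fit = np.polyfit(seasons, war, 4)  # 4th-order least-squares fit

print("turning points:", turning_points(fit))
print("fitted value, season 8 (inside the data):", np.polyval(fit, 8))
print("fitted value, season 15 (extrapolated):", np.polyval(fit, 15))
```

Whatever the season-15 number comes out to, it is driven by the shape of the quartic rather than by anything known about the player, which is why truncating the line at the last real data point is the safer presentation.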

This is extremely correct.

Career arcs are in no way predictive and should not be interpreted as such. I made a mistake by not truncating A-Rod’s line.

by Justin Bopp on Jun 8, 2010 9:13 AM EDT

Wow

really sets the bar high for the next thing you do :)

by Mark Kieffer on Jun 8, 2010 11:36 AM EDT

I think the fact that this thread generated over 100 comments

also speaks very highly of your work here. BTB is easily the best-kept “secret” on SBN.

by jwiscarson on Jun 8, 2010 12:47 PM EDT

Or fewer secrets.

I think the larger issue is that we’re buried under the baseball tab. ESPN has a thing called “page 2” where a lot of the non-team related things get more prominently featured.

Perhaps we can get something similar, along with various other larger “specific sport but not team-related” SBN sites.

by Justin Bopp on Jun 8, 2010 1:34 PM EDT

Agreed.

I found BTB way back when SBN was small (at least, relative to its current size) and I was curious about the other blogs. Now, it seems like you’re wading through a sea of hyper-focused blogs — not that there’s anything necessarily wrong with that — and there isn’t really a neat “fit” for blogs like BTB.

Maybe the SBN programmer-wizards could create a pseudo-RSS-feed that collected all articles from other blogs that tagged particular teams or players on those teams, and each blog could have a section of links like this? I’m not sure how much surly blog interactions you’d have, but I have to think it’d be beneficial to a blog like BTB.

by jwiscarson on Jun 8, 2010 2:24 PM EDT

it also helps

1. that we’re starting to see a few regulars in here commenting on the GotDs.
2. that I comment spam. I think it’s an attention-needing thing.

Bring your friends!

by Justin Bopp on Jun 8, 2010 1:30 PM EDT

I found you via Tango's Blog/Fangraphs

And I think you guys do a lot of the most accessible and interesting Sabermetric stuff around. There’s a very wonderful attitude of “Get your hands dirty” and a lot of active minds.

It makes it more interesting than Fangraphs, because Fangraphs is a bit short on heavier analysis. (To say the least!)

by Patrick42 on Jun 8, 2010 2:44 PM EDT

To be clear...

I LOVE and respect Fangraphs and read it daily. But the quantitative part of my brain is better fed here.

by Patrick42 on Jun 8, 2010 2:45 PM EDT

Agreed

I think FanGraphs has established itself as more of a sabermetric commentary website than a sabermetric research one. They generally have 5-10 short and sweet articles each day. Of course there is nothing wrong with that and it’s obviously worked well for them, but I think BtB could perhaps establish itself by doing more research and writing longer, more in-depth pieces. I think I’ve mentioned this to Sky before.

by vivaelpujols on Jun 10, 2010 3:19 AM EDT

Is that it’s intimidating to jump in because you don’t want to sound like a fool. I can speak from experience. I have been reading for a while, but am saying “eff it,” and if I sound like an idiot, oh well, life goes on. I like to talk/write and like baseball even more, so I’m just saying screw it and commenting at will.

Plus, who doesn’t love cool graphs, and who doesn’t love a HOF debate?

by Mark Kieffer on Jun 8, 2010 2:01 PM EDT

This is fantastic input.

And I’ve long since assumed people think I’m a fool. ;)

It’s a delicate balance on BtB, because we want to appear as authorities, but we also want to continue learning — and encourage our readers to comment. I mean, really.

That’s how each of us will gain a greater understanding, and those are my goals: first, to take a set of hard-to-understand data and put it in terms that the average joe (me) can relate to known quantities; and second, to get attention. :)

by Justin Bopp on Jun 8, 2010 2:15 PM EDT

A thought on generating traffic...

Perhaps Beyond the Box Score could try some team-specific features to draw more traffic from the team-specific sites?
Suggest they stick around and check out the thinking going on around here.

I also wish I saw more links from Fangraphs and The Hardball Times over this way. FanGraphs has an SBNation partnership already….

by Patrick42 on Jun 8, 2010 2:47 PM EDT

Thanks for all of your input, Patrick. Keep the ideas coming!

Our Top 50 Players series was aimed at pulling in new readers from team-specific blogs, but maybe we need to really home in on one team for some articles.

by Sky Kalkman on Jun 8, 2010 2:50 PM EDT

I've got a Mets franchise WAR post coming soon.

We’ll have to alert some Mets blogs about it, perhaps. Doesn’t get much more team specific than a team franchise leaders post.

On Twitter: @baseballtwit

by adarowski on Jun 8, 2010 3:22 PM EDT

Sweet.

Amazin’ Avenue is one of my absolute favorite SBN team-specific blogs (Arrowhead Pride and Royals Review notwithstanding).

by Justin Bopp on Jun 8, 2010 3:45 PM EDT

I'm not sure I can do this post justice.

1. Team-based interest: very much agreed. I expect that in the near future. Maybe we’ll revive the Value-Over-Contract stuff we did last fall?

2. The Eye-Test: aaand that’s where a quality visual comes in. I can attempt to provide those.

3. A fan just wants to believe what they want to believe: I completely disagree.

by Justin Bopp on Jun 8, 2010 7:57 PM EDT

I know I posted a book

But to tackle your points #2 and #3

  1. - I just mean that there are players who are more valuable and contribute more to wins than meets the casual fan’s naked eye. As far as stuff like graphs and charts goes, I think an awesome job is done in here, and it helps drive home the point. I just mean the line of thinking along the lines of “I know what I see” or “I don’t need a computer to tell me who’s good at the game” is still very popular amongst casual fans.

With #3, I am just saying that it’s hard to convince a casual fan that a player may not be as good as they look. For example, looking at Torii Hunter’s UZR, perhaps he isn’t as great a fielder as his reputation suggests, as over his career it’s only about 2 or so. But when a guy sees him robbing home runs and making Web Gems on ESPN, it’s hard to make the case to the fan. Hopefully that makes sense.

None of what I was saying is a criticism of the site. Just thinking out loud. People were talking about increasing readership, and in short I think there needs to be an incentive for Joe Schmo to check this stuff out. I think it’s possible, but was just listing some of the hurdles, along with my life story, lol.

by Mark Kieffer on Jun 8, 2010 9:17 PM EDT

I don't know why it did that

That whole ‘1’ has to do with #2 point from Justin

by Mark Kieffer on Jun 8, 2010 9:17 PM EDT

top: y = -0.0002x^4 + 0.0129x^3 - 0.2934x^2 + 2.5388x + 1.4995
average: y = -1E-04x^4 + 0.0074x^3 - 0.2057x^2 + 2.0251x - 0.1689
bottom: y = 2E-05x^4 + 0.0019x^3 - 0.118x^2 + 1.5114x - 1.8372

I won’t be mad if you find something wrong with them. In fact, I’d prefer if this was peer reviewed before making too much of a fuss about it.

by Justin Bopp on Jun 8, 2010 9:33 PM EDT
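For anyone who wants to try the three curves, here is a rough sketch, assuming NumPy, that x is the career season number (1 = rookie year), and that the posted terms are powers of x (x^4, x^3, x^2). The names `hof_zone` and `in_zone` are hypothetical helpers, and the coefficients are the rounded values as posted, so the results are only approximate.

```python
import numpy as np

# Coefficients as posted, highest power first (x^4 down to the constant).
TOP = [-0.0002, 0.0129, -0.2934, 2.5388, 1.4995]
AVERAGE = [-1e-04, 0.0074, -0.2057, 2.0251, -0.1689]
BOTTOM = [2e-05, 0.0019, -0.118, 1.5114, -1.8372]


def hof_zone(season):
    """Return (bottom, average, top) curve values for a career season."""
    return tuple(float(np.polyval(c, season)) for c in (BOTTOM, AVERAGE, TOP))


def in_zone(season, war):
    """True if a season's WAR falls between the bottom and top curves."""
    low, _, high = hof_zone(season)
    return low <= war <= high


low, avg, high = hof_zone(7)
print(f"season 7: bottom {low:.2f}, average {avg:.2f}, top {high:.2f}")
```

With these rounded coefficients, season 7 works out to a zone of roughly 3.7 to 8.8 WAR around an average of about 6.2, in the same neighborhood as the 3-to-8 range eyeballed from the graph earlier in the thread.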

I dunno

it makes me feel extremely uncomfortable. I know I’m about to be exposed as a know-nothing graphic charlatan at any moment!

by Justin Bopp on Jun 8, 2010 9:45 PM EDT

Look on the bright side...

I have absolutely no idea what you just posted.

But I could set it in Georgia with a 1.4 line-spacing and make it look hawt.

On Twitter: @baseballtwit

by adarowski on Jun 8, 2010 10:17 PM EDT

It's my favorite, too!

I try to shoe-horn it into as many reports as possible, but because my employer is so numerically-oriented, they hate the way it represents numbers. Le sigh.

by jwiscarson on Jun 9, 2010 11:38 AM EDT

Peer Review

I’m having a problem replicating the Top curve. I get a WAR of 6 in year 21. Eyeballing, it looks like it should be between 3 and 4. If I change the x^4 coefficient to -0.00021 I get 4.05. The point is, as you get out in the tail, you need more than one significant digit. That applies to the other two curves as well, although those look much better.

by fjm235 on Jun 9, 2010 2:21 AM EDT
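The replication problem above is easy to check with a few lines of Python. This is only a sketch using the posted Top coefficients; the function name `top_curve` and the rounding interval are assumptions for illustration. A coefficient displayed as -0.0002 could have been anything from about -0.00025 to -0.00015 before rounding.

```python
def top_curve(x, a4=-0.0002):
    """Posted Top curve, with the x^4 coefficient left adjustable."""
    return a4 * x**4 + 0.0129 * x**3 - 0.2934 * x**2 + 2.5388 * x + 1.4995


# Year 21 with the coefficient exactly as posted: about 6.0 WAR.
print(round(top_curve(21), 2))

# Year 21 with one extra digit on the x^4 coefficient: about 4.05 WAR.
print(round(top_curve(21, -0.00021), 2))

# Year-21 swing if the true coefficient sits anywhere in the interval
# that rounds to -0.0002 at one significant digit.
spread = top_curve(21, -0.00015) - top_curve(21, -0.00025)
print(round(spread, 1))
```

Because 21^4 is 194,481, a change of 0.00001 in that one coefficient moves the year-21 value by almost 2 WAR, and the full rounding interval spans roughly 19 WAR. That is why the tail of the curve needs more than one significant digit.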

Request: Kevin Appier

If/when you get around to doing pitcher career arcs like this, I’d love to see Kevin Appier. By the numbers, at least for his peak, he’s a hall of famer.
http://www.beyondtheboxscore.com/2009/12/15/1200949/the-2010-hall-of-fame-ballot
I’d be interested to see how he looks in your graphs.
-j

by JinAZ on Jun 11, 2010 1:36 PM EDT

HOF Raw Data

Just to echo the sentiments of some fellow commenters, this is probably my favorite thing I’ve ever seen done with stats. I’d like to do my own. Do you have the equations (or even the data for mean and standard deviation) for the three HoF lines?

by C.DeVore on Jun 12, 2010 9:26 PM EDT

Nevermind, I’m retarded.

by C.DeVore on Jun 12, 2010 9:28 PM EDT

Comments For This Post Are Closed



Managing Editor: Justin Bopp

Columnists: adarowski, Satchel Price, J-Doug, Julian Levine, Bill Petti

Featuring: Jeff Zimmerman, Jacob Peterson, Patrick Gordon, Dave Gershman, Bryan Grosnick, Lewie Pollis, David Fung, Glenn DuPaul, joshuaworn, MattFilippi18, Nathaniel Stoltz