Fun With Sim Scores: My HOF Ballot

It's the end of the week and I thought I'd depart from my usual heavily numeric posts and have some fun.

In the downtime between the last pitch of the World Series and the first tosses of spring training, our thoughts turn to two major topics: transactions and the Hall of Fame.

Whether we're trying to decide which left fielder our team should sign, or which outfielders belong in the Hall of Fame (Rickey and Raines, but no Rice), we're often comparing players.

There are multitudes of methods that allow us to look at how players relate to each other.  Some are more correct than others (WAR versus batting average) but few are as much fun as Bill James' similarity scores.


For those who aren't familiar with the background of similarity scores, James first created them in his 1986 Baseball Abstract, and then expanded on them in his book The Politics of Glory.  The goal is to identify how similar two players are based on a variety of stats, mostly counting, some rate, and none very advanced. 

James is quick to point out that two players merely being similar doesn't make them similarly valuable, and therefore sim scores should be no more than a quick and dirty approach to similarity.

If you're interested in the actual details of the calculations, an overview can be found at Baseball Reference.  In general, two identical players would have a score of 1000, and points are subtracted from there for each difference between them.  The lower the score, the less similar the two players are.  And the lower a player's top similarity score, the more unique that player is.  For example, Babe Ruth's most similar batter is Barry Bonds at 738 (according to BBRef), which makes Ruth one of the most distinctive players in history.
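For the curious, the subtract-from-1000 idea can be sketched in a few lines of Python.  This is a rough illustration, not the exact Baseball Reference calculation: the divisors are the commonly cited ones for batters, reproduced from memory, and the positional adjustment is left out entirely.  The function name, the dictionary layout, and the two stat lines are made up for the example.

```python
# Points lost per unit of difference in each stat. These divisors follow the
# commonly cited batter formula but are assumptions here; check the
# Baseball Reference overview before relying on them.
DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}


def similarity_score(a, b, divisors=DIVISORS):
    """Start at 1000 and subtract points for every statistical difference.

    Fractional points are kept (an assumption), and the positional
    adjustment from the full method is omitted.
    """
    score = 1000.0
    for stat, step in divisors.items():
        score -= abs(a[stat] - b[stat]) / step
    return score


# Two made-up career lines (not real players) that differ only in home runs.
player_a = {
    "G": 2000, "AB": 7500, "R": 1000, "H": 2100, "2B": 350, "3B": 40,
    "HR": 300, "RBI": 1100, "BB": 800, "SO": 1200, "SB": 100,
    "AVG": 0.280, "SLG": 0.460,
}
player_b = dict(player_a, HR=320)

print(similarity_score(player_a, player_a))  # 1000.0 (identical players)
print(similarity_score(player_a, player_b))  # 990.0 (20 HR apart, 2 HR per point)
```

Because every difference only ever subtracts, a long career piles up many small penalties, which is part of why the all-time greats rarely have a comp above 800.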

That's a pretty long introduction to the main thrust of my post.  Sim scores as James conceived them have one big problem - there's no context.  40 home runs in 2001 Coors Field count the same as 40 home runs in 1968 Dodger Stadium.  What I've done is adjust these stats for park and era to produce more "correct" similarity scores.  I haven't moved away from James' basic formula, so we're not using anything more advanced than slugging percentage for batters or ERA for pitchers - I've just added context to those scores.  I'm pretty happy with the results for batters, but the pitchers might need some work.
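One simple way to picture a park and era adjustment is as two scaling factors applied to each raw stat before it goes into the similarity formula.  The Python sketch below is only an illustration of that idea and not necessarily the method behind the lists that follow.  The function name, its parameters, and every number in it are hypothetical, and it treats the park factor as applying to all of a player's games, which is a simplification.

```python
def neutralize(raw, park_factor, league_rate, baseline_rate):
    """Translate a raw counting stat to a neutral park and a common era.

    raw:           the player's actual total (e.g. home runs in a season)
    park_factor:   1.00 is neutral, above 1.00 favors hitters (hypothetical scale)
    league_rate:   league-wide rate of the stat in the player's own season
    baseline_rate: rate in the reference era all players are translated to
    """
    park_neutral = raw / park_factor            # remove the park's help or harm
    era_scale = baseline_rate / league_rate     # rescale to the common era
    return park_neutral * era_scale


# Invented inputs, not real park or league data: the same 40 home runs in a
# hitter-friendly park and high-offense season, then in a pitcher-friendly
# park and low-offense season.
hitter_friendly = neutralize(40, park_factor=1.20, league_rate=1.10, baseline_rate=1.00)
pitcher_friendly = neutralize(40, park_factor=0.90, league_rate=0.80, baseline_rate=1.00)

print(round(hitter_friendly, 1), round(pitcher_friendly, 1))
```

The adjusted totals then replace the raw ones in the usual formula, so the same 40 home runs end up worth less in the first setting and more in the second.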

Let's use this approach to look at some of the players on the Hall of Fame ballot.   Today I'll start with the players I'd elect, and then we'll consider the rest of the ballot this weekend.

Rickey Henderson:

1. Max Carey* 713
2. Tim Raines 699
3. Billy Hamilton* 689
4. Roy Thomas 682
5. Lou Brock* 671
6. Dummy Hoy 669
7. Kenny Lofton 669
8. Fielder Jones 667
9. George Burns 661
10. Harry Hooper* 658

With a top sim score of only 713, Rickey is truly one of a kind.  Not that any of you are surprised by that.  He has 4 Hall of Famers in his top 10, plus Raines, who I feel should be one.  The rest of his list tends to be deadballers who were fast and hit pretty well.

Tim Raines:

1. Paul Molitor* 904
2. Sam Rice* 883
3. Lou Brock* 878
4. George Van Haltren 873
5. Billy Hamilton* 860
6. Max Carey* 858
7. Fred Clarke* 858
8. Harry Hooper* 858
9. Ben Chapman 856
10. Lonnie Smith 855

Even more Hall of Famers for Raines than for Henderson (although Fred Clarke might be in as a manager).  While that's not an argument in itself for electing Raines, it certainly looks good.  Of course, many of his similar players played long ago in a very different game.  But then there's Paul Molitor at the top, who easily strolled into Cooperstown.

Alan Trammell:

1. Jack Glasscock 933
2. Barry Larkin 933
3. Johnny Logan 920
4. Jay Bell 916
5. Dickie Thon 913
6. Joe Sewell* 908
7. Luke Appling* 904
8. Edgar Renteria 901
9. Phil Rizzuto* 900
10. Bill Doran 899

Not as impressive a list as Raines or Rickey, but still 3 Hall of Famers and one should-be in Larkin.  Remember that these sim scores don't consider defense at all, and defense is a point in Trammell's favor.  Trammell's not an inner-circle SS, but in my mind he meets the threshold.

Mark McGwire:

1. Willie McCovey* 820
2. Jim Thome 812
3. Jimmie Foxx* 807
4. Harmon Killebrew* 800
5. Sammy Sosa 781
6. Jose Canseco 771
7. Mel Ott* 768
8. Mike Schmidt* 767
9. Frank Thomas 762
10. Carlos Delgado 757

I don't think there's much argument that McGwire's stats measure up to the Hall of Fame's standard for slugging first basemen.  As many as 8 of the players in his top 10 may make the Hall of Fame (sorry Jose and Carlos), barring outside influences.  And of course that's the rub with McGwire himself.  Personally, I don't really hold it against him, but I understand why others do.

Bert Blyleven:

1. Don Sutton* 913
2. Steve Carlton* 905
3. Curt Schilling 898
4. Red Faber* 894
5. Jamie Moyer 892
6. Phil Niekro* 888
7. Vida Blue 886
8. Earl Whitehill 883
9. John Candelaria 883
10. Eppa Rixey* 878

A lot of Hall of Famers on this list too.  These are generally medium-peak, long-prime, long-career pitchers.  And neutralizing for park and era doesn't help Bert with his lack of run support, even if he did underperform other elite pitchers in that sense.

Well, that's all for my ballot this year.  I'm seriously considering Tommy John and Dale Murphy, but haven't pulled the trigger yet.  I could definitely be convinced on them though.  We'll look at their similar players along with Rice, Morris and some others this weekend.

Next week, we'll take a look at some of the major free agents from this offseason to see if we can identify any patterns in their most similar players.