It's the end of the week and I thought I'd depart from my usual heavily numeric posts and have some fun.
In the downtime between the last pitch of the World Series and the first tosses of spring training, our thoughts turn to two major topics: transactions and the Hall of Fame.
Whether we're trying to decide which left fielder our team should sign, or which outfielders belong in the Hall of Fame (Rickey and Raines, but no Rice), we're often comparing players.
There are multitudes of methods that allow us to look at how players relate to each other. Some are more correct than others (WAR versus batting average) but few are as much fun as Bill James' similarity scores.
For those who aren't familiar with the background of similarity scores, James first created them in his 1986 Baseball Abstract, and then expanded on them in his book The Politics of Glory. The goal is to identify how similar two players are based on a variety of stats, mostly counting, some rate, and none very advanced.
James is quick to point out that two players merely being similar doesn't make them similarly valuable, and therefore sim scores should be no more than a quick and dirty approach to similarity.
If you're interested in the actual details of the calculations, an overview can be found at Baseball Reference. In general, two identical players would have a score of 1000, and points are subtracted for each difference between them. The lower the score, the less similar the two players are. And the lower the score of a player's closest match, the more unique that player is. For example, Babe Ruth's most similar batter is Barry Bonds at 738 (according to BBRef), which makes Ruth one of the most unique players.
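For anyone who thinks better in code, here's a rough Python sketch of the subtract-from-1000 idea. The stat names and the "one point per this much difference" divisors are illustrative placeholders I've assumed for the example, not James' published weights, and the sketch skips pieces of the real method such as positional adjustments. The Baseball Reference overview has the actual formula.

```python
# Illustrative divisors only: one point is lost per this much difference.
# These are assumed placeholder weights, not James' published formula.
POINTS_PER_DIFFERENCE = {
    "games": 20,
    "hits": 15,
    "home_runs": 2,
    "stolen_bases": 20,
    "batting_avg": 0.001,
}


def sim_score(a, b, weights=POINTS_PER_DIFFERENCE):
    """Start at 1000 and subtract a penalty for each stat difference."""
    score = 1000.0
    for stat, per_point in weights.items():
        score -= abs(a[stat] - b[stat]) / per_point
    return score


def most_similar(player, others):
    """Return (name, score) for the closest comparable in `others`.

    `others` maps a player name to that player's stat dictionary.
    """
    return max(
        ((name, sim_score(player, stats)) for name, stats in others.items()),
        key=lambda pair: pair[1],
    )
```

The score returned by `most_similar` is the "uniqueness" number discussed above: the lower a player's best match, the fewer true comparables he has.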
That's a pretty long introduction to the main thrust of my post. Sim scores as James conceived them have one big problem: there's no context. Forty home runs at Coors Field in 2001 count the same as 40 home runs at Dodger Stadium in 1968. What I've done is park- and era-adjust these stats to figure out more "correct" similarity scores. I haven't moved away from James' basic formula, so we're not using anything more advanced than slugging percentage for batters or ERA for pitchers. I've just added context to those scores. I'm pretty happy with the results for batters, but the pitchers might need some work.
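To make "adding context" a little more concrete, here's a minimal Python sketch of one common way to neutralize a counting stat. It's a generic illustration, not necessarily the exact adjustment behind the lists below. The function name, the park-factor convention (1.00 is neutral, higher favors hitters), and every number in it are assumptions made up for the example.

```python
def neutralize(raw, park_factor, league_rate, baseline_rate):
    """Scale a counting stat to a neutral park and a common era.

    park_factor: 1.00 is neutral, above 1.00 favors hitters (assumed convention).
    league_rate: league-wide rate of the stat in the player's own season.
    baseline_rate: the rate in whatever reference era you choose.
    """
    era_scale = baseline_rate / league_rate
    return raw * era_scale / park_factor


# Made-up inputs purely for illustration, not real park or league figures.
hitter_friendly = neutralize(40, park_factor=1.20, league_rate=0.030, baseline_rate=0.025)
pitcher_friendly = neutralize(40, park_factor=0.92, league_rate=0.017, baseline_rate=0.025)
```

With inputs like these, the same raw 40 shrinks in the hitter-friendly setting and grows in the pitcher-friendly one, and the adjusted totals are what you'd feed into the similarity formula. A fuller version would account for the fact that a player only plays half his games at home.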
Let's use this approach to look at some of the players on the Hall of Fame ballot. Today I'll start with the players I'd elect, and then we'll consider the rest of the ballot this weekend. In the lists that follow, an asterisk marks a Hall of Famer.
Rickey Henderson:
1. | Max Carey* | 713 |
2. | Tim Raines | 699 |
3. | Billy Hamilton* | 689 |
4. | Roy Thomas | 682 |
5. | Lou Brock* | 671 |
6. | Dummy Hoy | 669 |
7. | Kenny Lofton | 669 |
8. | Fielder Jones | 667 |
9. | George Burns | 661 |
10. | Harry Hooper* | 658 |
With a top sim score of 713, Rickey is very unique. Not that any of you are surprised by that. He has four Hall of Famers in his top 10, plus Raines, who I feel should be one. The rest of his list tends to be deadballers who were fast and hit pretty well.
Tim Raines:
1. | Paul Molitor* | 904 |
2. | Sam Rice* | 883 |
3. | Lou Brock* | 878 |
4. | George Van Haltren | 873 |
5. | Billy Hamilton* | 860 |
6. | Max Carey* | 858 |
7. | Fred Clarke* | 858 |
8. | Harry Hooper* | 858 |
9. | Ben Chapman | 856 |
10. | Lonnie Smith | 855 |
Even more Hall of Famers for Raines than for Henderson (although Fred Clarke might be in as a manager). While that's not an argument in itself for electing Raines, it certainly looks good. Of course, many of his similar players played long ago in a very different game. But then there's Paul Molitor at the top, who easily strolled into Cooperstown.
Alan Trammell:
1. | Jack Glasscock | 933 |
2. | Barry Larkin | 933 |
3. | Johnny Logan | 920 |
4. | Jay Bell | 916 |
5. | Dickie Thon | 913 |
6. | Joe Sewell* | 908 |
7. | Luke Appling* | 904 |
8. | Edgar Renteria | 901 |
9. | Phil Rizzuto* | 900 |
10. | Bill Doran | 899 |
Not as impressive a list as Raines' or Rickey's, but it still has three Hall of Famers and one should-be in Larkin. Remember that these sim scores don't consider defense at all, and defense is a point in Trammell's favor. Trammell's not an inner-circle SS, but in my mind he meets the threshold.
Mark McGwire:
1. | Willie McCovey* | 820 |
2. | Jim Thome | 812 |
3. | Jimmie Foxx* | 807 |
4. | Harmon Killebrew* | 800 |
5. | Sammy Sosa | 781 |
6. | Jose Canseco | 771 |
7. | Mel Ott* | 768 |
8. | Mike Schmidt* | 767 |
9. | Frank Thomas | 762 |
10. | Carlos Delgado | 757 |
I don't think there's much argument that McGwire's stats measure up to the Hall of Fame standard for slugging first basemen. As many as 8 of his top 10 comparables may make the Hall of Fame (sorry, Jose and Carlos), barring outside influences. And of course that's the rub with McGwire himself. Personally, I don't really hold it against him, but I understand why others do.
Bert Blyleven:
1. | Don Sutton* | 913 |
2. | Steve Carlton* | 905 |
3. | Curt Schilling | 898 |
4. | Red Faber* | 894 |
5. | Jamie Moyer | 892 |
6. | Phil Niekro* | 888 |
7. | Vida Blue | 886 |
8. | Earl Whitehill | 883 |
9. | John Candelaria | 883 |
10. | Eppa Rixey* | 878 |
A lot of Hall of Famers on this list too. These are generally medium-peak, long-prime, long-career pitchers. And neutralizing for park and era doesn't help Bert with his lack of run support, even if he did underperform other elite pitchers in that sense.
Well, that's all for my ballot this year. I'm seriously considering Tommy John and Dale Murphy, but haven't pulled the trigger yet. I could definitely be convinced on them, though. We'll look at their similar players along with Rice, Morris and some others this weekend.
Next week, we'll take a look at some of the major free agents from this offseason to see if we can identify any patterns in their most similar players.