Over the past year or so I have made a conscious effort to learn more about baseball. I read more writers, expanded the list of sites I get information from, and followed people on Twitter who can inform me. Before that, I relied primarily on one source, Baseball-Reference.com (B-R), a site that provides almost everything I'll ever need: data sliced and diced any way I want it (season, game, play-by-play); the Play Index feature, a must-have for anybody even remotely interested in baseball research; and, most importantly to me, little editorial content. I like to let data speak for itself.
On a whim, I sent a tweet to Sean Forman, the founder of B-R, asking if he'd answer some questions for me. Imagine my surprise when he actually said yes. This is the first of two posts. Without further ado, here are the questions and answers--in all cases, the block quotes are the exact questions and answers, with my (generally unfunny) comments interspersed. Also, big thanks to Comcast SportsNet Chicago's Christopher Kamka for supplying some of the questions:
What started your interest in statistics, and baseball statistics in general?
I’ve always been interested. I would sort my baseball cards by home runs or hits and then sort them again. I helped my dad with his football stats (he’s a HS coach) while in elementary school and kept the tackle chart in junior high. I also started several fantasy baseball leagues while still in HS.
Math professor at St. Joseph—what kind of math, educational background
I have a PhD in applied mathematics. My thesis was in protein folding, but my main background is optimization/Operations Research. I taught those courses while a math professor plus the intro courses.
Monthly unique visitors, Play Index subscribers, retention and growth
Across all sites we do about 300k visitors a day.
This site shows visits for all sports sites, and this is how baseball-related sites ranked amongst all sports sites (page views were not shown):
Noticeably absent from this list is Beyond the Box Score--we're working on that.
What led you to create Baseball-Reference, and how many people were there at the start? Now?
It was just an itch I had. I enjoyed web design and there was no place to get Cobb, Hornsby or Ruth stats online, so I decided to start one. At the start it was just me. Our company has five total employees now, but I’m still the primary programmer/manager of baseball.
I remember clearly the first time I saw B-R--my mind was opened and I said to myself "I've been looking for this my entire life!" It was around 2000, and it was a primary reason my interest in baseball was rekindled. One of the first things I did was check Ron Santo's similarity scores, and I thought, "Yeah, he's a Hall of Famer."
How were metrics added as the years went along, such as WAR?
Basically it was just what [we] felt we needed to add to stay relevant and answer user questions, like "who’s the best player?" Who’s the best fielder? I’m always trying to keep up with the literature and while we will never be bleeding edge, I hope we can be leading edge for most users.
I first became aware of WAR through Baseball Prospectus' WARP, and the moment I saw it, I realized I had found the number I had been searching for. I had been using the various numbers in Total Baseball, but those weren't electronic or sortable. My jaw dropped when I saw BP WARP, and then I stumbled across B-R WAR values about a year later. I'm not particularly interested in the differences between the calculations--I don't deny they exist, they just don't matter to me, since the only thing I'm interested in is seeing where a player ranks. It doesn't matter to me that Albert Pujols' career rWAR is exactly 93.1, #2 among active players (Alex Rodriguez is still considered active)--what interests me is that Derek Jeter is #3 at 71.6. That's a big gap. Take a look at the FanGraphs list--the numbers change, but the ranks are very similar.
What are YOUR favorite metrics to analyze players? Least favorite?
Obviously I think WAR is a good measure for comparing players. I use OPS+, ERA+. I would probably be a fairly orthodox stathead.
I don't consider myself a sabermetrician; I reserve that term for those who either create the metrics, like Bill James or Voros McCracken, or can explain the mathematics behind the numbers. What I am is a data amalgamator, someone who takes reams of data and looks for patterns, some of which might mean something. This is why I like WAR so much: it cuts things down to something manageable. It's not perfect, but nothing is, and it's a vast improvement over what was previously available and is continually being improved.
What role do you think B-R played in the acceptance of metrics, and advanced metrics in MLB?
It’s hard to say. I think that WAR has had a definite impact in different ways. For instance, while many voters didn’t use it to cast their MVP votes, they did feel they needed to justify why their votes differed from those rankings. You can also see that in the defensiveness of the people who were voting for Jack Morris in the HOF. For us, the metrics are the metrics. If someone can point out a flaw we’ll fix it and acknowledge it. I’m not creating metrics to rob Cabrera of MVP’s or keep Jack Morris out of the HOF. We are just trying to, as objectively as possible, measure what is going on on the field.
This answer made me realize that, without knowing it, I had been doing the same thing for the past 15 years. I will make arguments when discussing MVP voting (Miguel Cabrera the past two years, although I understand how close Mike Trout was) or HOF voting (no to Jack Morris) and freely admit that statistics by themselves don't tell the entire story. This puts me in the company of just about every other stathead who ever was, is or will be. Having written that, I prefer to let the numbers speak for themselves, since they should tell the same story to any reasonably intelligent reader.
I'll continue with the interview in my next post, which will be up on Monday, April 14th.
Follow Scott on Twitter @ScottLindholm