An interview with Baseball-Reference founder Sean Forman: Part 2

The second part of my interview with Baseball-Reference founder Sean Forman.

Jonathan Daniel

The first part of my interview with Baseball-Reference (B-R) founder Sean Forman was published last week. These questions delve a little deeper into his thinking, and in all cases, the block quotes are the exact question and answer, with my commentary added. Several questions were suggested by Comcast SportsNet Chicago's Christopher Kamka.

The last question in the previous post asked about the role B-R played in the increased use of baseball metrics and is repeated here:

What role do you think B-R played in the acceptance of metrics, and advanced metrics in MLB?

It’s hard to say. I think that WAR has had a definite impact in different ways. For instance, while many voters didn’t use it to cast their MVP votes, they did feel they needed to justify why their votes differed from those rankings. You can also see that in the defensiveness of the people who were voting for Jack Morris in the HOF. For us, the metrics are the metrics. If someone can point out a flaw we’ll fix it and acknowledge it. I’m not creating metrics to rob Cabrera of MVP’s or keep Jack Morris out of the HOF. We are just trying to, as objectively as possible, measure what is going on on the field.

Which led directly to this question:

Are you working to formalize that role in broadcasts?

No, we aren’t working directly. I have friendly relationships with many broadcasters and I know they rely on the site day in day out. The highlight of my career so far was when Vin Scully namechecked BR in referencing the last time something happened in a Dodgers game.

I primarily watch Cubs and White Sox games, and their television crews take very different approaches to the use of advanced metrics. The Cubs tandem of Len Kasper and Jim Deshaies works to incorporate them when they add context and nuance to a broadcast, and the Sox duo of Ken Harrelson and Steve Stone . . . doesn't. In fact, there are occasional days on which a statement from Len is used as a promo on the B-R home page. The fact that Vin Scully uses B-R is probably all any broadcaster needs to know.

How are your relationships with FanGraphs and Baseball Prospectus? How would you describe the way you’re different from them?

We get along very well IMO. When you deal with baseball data as much as I do or David Appelman does or Sean Holtz at does or Colin Wyers did, a lot of mutual respect builds up because you all know how hard some of the stuff is and you can share an inside joke like having to deal with Jean Segura’s steal of 1B.

We have zero interest in editorial work. We want to answer user questions as quickly as possible and research and entertainment is much more what they do.

B-R is a data-intensive site. FanGraphs incorporates advanced metrics like weighted on-base average, fielding independent pitching and PITCHf/x data for folks who care for that level of analytics. They also provide a forum for prospective writers. Baseball Prospectus has much more editorial content along with their PECOTA projections and playoff odds. They all bring value, and taken together address baseball questions from different approaches. I admire all three sites but default to B-R because of the ease with which I can access reams of information, especially game and play-by-play data. Having written this, it's a rare day I don't use all three sites.

Do you have a favorite/most interesting Play Index search result (mine is this)?

No not necessarily. I’m proud of how fast some of the results are as I know how hard it is to do some of it.

The search I included in my question will come front and center in the Hall of Fame voting this winter--click it and see what jumps out at you. For people interested in baseball research who don't subscribe to the Play Index feature ($36 a year), I don't have the words to describe what you're missing. Favorite follows like High Heat Stats and MLB Play Index can educate you in ways to use these tools and are very friendly and engaging.

Has a current/former player made a memorable comment regarding his B-R page?

I’ve heard from Stan Bahnsen. On the basketball side Jim McIlvaine tried to negotiate down the cost to sponsor his page. Dave Stieb sent me corrections to his salaries. I heard from Octavio Dotel’s agent that he really didn’t like his sponsorship message.

This was one of Christopher's questions and the answer is priceless, since these are four names I never expected to see in any post I'd ever write.

What kinds of relationships do you have with major league teams?

It’s friendly, but I don’t hear a whole lot from them unless something is wrong or broken and they come out asking me to fix it. :)

Will you obtain access to PITCHf/x or FIELDf/x data? What about MLBAM’s move to measure everything in a ballpark?

I’m looking at PitchFx, but nothing is definite. It’s a great move for them. I’m skeptical we’ll see much raw data. I hope that they will consider doing something like allowing 2 year old data out to the public, but we’ll see.

If there was one category or data set you could add (whether possible or not), what would it be?

The advanced fielding stuff would probably be it. Measuring distances traveled, reaction times and the like would be very very useful. On a more likely note, I’m hoping to add DL data sometime this year.

I love what Dan Brooks at Brooks and Daren Willman at have done with PITCHf/x data, and there's quite a bit on FanGraphs as well. I personally am very curious how the MLBAM data will be released, because I think it can change the way fielding is evaluated, but we won't know until we see the data.

Do you perceive any pushback to the explosion of data in baseball, some sense we’re losing the forest for the trees? Another way to put this—has the power and ease of data crunching possibly substituted measurement for analysis?

Perhaps, that’s not really my area of concern. We try to incorporate analysis into some of the measures we publish, but we are really mostly about measurement, so it’s up to the user to use that in a good way. We do try to avoid "junk numbers" and push the users in more responsible directions, but in the end it’s their responsibility.

This is a question I struggle with on a daily basis, because there's a big difference between data and analysis. The fact we can aggregate mounds of data doesn't mean that the conclusions we draw have predictive value, or even that they're correct.

When will B-R be enshrined in the Hall of Fame?

Hey, we already are in the HOF as we power their Top Ten Tower and are their official stats provider. That’s plenty good for me.

Anyone who does any amount of baseball research is familiar with Baseball-Reference, and it's good to understand the motivation of the people behind it. New features and measures are constantly being added, and they're amazingly responsive to requests. For bloggers, they even have a linking tool to get content on a given player's page, which can deliver tremendous visibility. I'd have nothing to write about without B-R, so I can't overstate my appreciation to Sean for creating and improving the site and for taking the time to answer these questions.

Thanks to Sean Forman for answering the questions and to Christopher Kamka for developing several of them. All data from

Scott lives in Davenport, IA and can be followed on Twitter @ScottLindholm