

The BtBS guide to analytics tools and resources


If you're looking to learn more about baseball analytics, you can start here. Our little corner of the internet would not be possible without these resources.

The world of public baseball analytics is fairly robust. There are plenty of examples of intrepid individuals pioneering new areas of research with new and old data (Voros McCracken, Mike Fast). Public baseball analysis, and the ensuing writing, has evolved into something of a cottage industry that would not be possible without the contributions of various entities.

The goal of this article is to provide a reference point for those various entities. What follows is a list of the resources, or tools, available to the general public for baseball analysis and some information about those resources. I will not claim this list is complete; if I have missed something, please comment so that I can add it to the list.

Retrosheet

I'll let Retrosheet concisely explain what it is they do: "Retrosheet was founded in 1989 for the purpose of computerizing play-by-play accounts of as many pre-1984 major league games as possible."

Retrosheet houses data from after 1984 as well, and it is often the engine that makes other sites go.

In essence, Retrosheet can be used to grab large amounts of play-by-play data.
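To make "play-by-play data" concrete, here is a minimal sketch of reading the `play` records in a Retrosheet-style event file. It assumes the record layout `play,<inning>,<0 = visitor / 1 = home>,<batter ID>,<count>,<pitches>,<event>`. The sample lines and player IDs are invented for illustration, and the tally only recognizes two event prefixes (`K` for strikeouts, `HR` for home runs). Real event files use many more codes.

```python
from collections import Counter

# Invented lines in the style of a Retrosheet event file. Player IDs are hypothetical.
# A "play" record is: play,<inning>,<0=visitor/1=home>,<batter ID>,<count>,<pitches>,<event>
SAMPLE_EVENT_FILE = """\
id,XXX201405010
info,visteam,YYY
start,abcd001,"Player A",0,1,8
play,1,0,abcd001,22,CBFBS,K
play,1,0,efgh002,10,BX,HR/F78
play,2,0,abcd001,32,BCBFBX,HR/F9
play,3,0,abcd001,02,CSS,K
play,4,0,efgh002,31,BBCBX,S8/G
"""


def parse_plays(lines):
    """Yield one dict per 'play' record, ignoring every other record type."""
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] != "play" or len(fields) < 7:
            continue
        yield {
            "inning": int(fields[1]),
            "home": fields[2] == "1",
            "batter": fields[3],
            "count": fields[4],    # kept as text; unknown counts may not be numeric
            "pitches": fields[5],
            "event": ",".join(fields[6:]),  # defensive: keep anything after the sixth comma
        }


def tally_k_and_hr(plays):
    """Count strikeouts (events starting with 'K') and home runs ('HR') per batter.

    This sketch only looks at those two event prefixes.
    """
    strikeouts, home_runs = Counter(), Counter()
    for play in plays:
        if play["event"].startswith("K"):
            strikeouts[play["batter"]] += 1
        elif play["event"].startswith("HR"):
            home_runs[play["batter"]] += 1
    return strikeouts, home_runs


if __name__ == "__main__":
    ks, hrs = tally_k_and_hr(parse_plays(SAMPLE_EVENT_FILE.splitlines()))
    print("Strikeouts:", dict(ks))
    print("Home runs:", dict(hrs))
```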

The Lahman Database

I'll let Sean Lahman concisely explain what it is he and his team do: "The updated version of the database contains complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more."

The database is relational, which means the user needs a database application to query it.
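As a small illustration of what "relational" buys you, the sketch below loads a tiny stand-in for the Lahman Batting table into SQLite (bundled with Python) and sums career home runs with one SQL query. It assumes the column names `playerID`, `yearID`, `stint`, and `HR`; the real table has many more columns, and the rows and player IDs here are invented.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory stand-in for a small slice of Lahman's Batting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, stint INTEGER, HR INTEGER)"
    )
    # Invented rows; a player traded mid-season gets one row per stint.
    rows = [
        ("playera01", 2012, 1, 30),
        ("playera01", 2013, 1, 25),
        ("playera01", 2013, 2, 5),
        ("playerb01", 2013, 1, 12),
    ]
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", rows)
    return conn


def career_home_runs(conn):
    """Sum home runs across all seasons and stints for each player."""
    query = """
        SELECT playerID, SUM(HR) AS career_hr
        FROM Batting
        GROUP BY playerID
        ORDER BY career_hr DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for player_id, hr in career_home_runs(conn):
        print(player_id, hr)
```

Grouping by player rather than by season-and-team is what folds the two 2013 stints into one career total.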

FanGraphs

FanGraphs (FG) has several elements to its website: it houses an extensive database of statistical information, and it provides a platform for contemporary and fantasy baseball analysis.

If you want projections, FanGraphs has Steamer and ZiPS. If you want value calculations, FanGraphs is one of the websites that calculates Wins Above Replacement (WAR); its version is often abbreviated as fWAR. The site also offers defensive statistics, PITCHf/x statistics, and in-house statistics. A user can create custom leaderboards, dive into detailed player information, and export data.

FanGraphs also houses a library of explanations for advanced baseball statistics.

Baseball Reference

Baseball Reference (BR) houses an extensive database of statistical information, but it does not provide analysis or writing. It is one of the outlets that calculates Wins Above Replacement; its version is often abbreviated as bWAR. Marcel projections can also be found here.

A user can export data and view plenty of unique statistics, including some in-house statistics, but the bread and butter of BR is the Play Index. The Play Index allows the user to define search parameters and find all results matching those parameters. For example, if a user wanted to find all games in which any player both struck out three times and hit two home runs, the user could create such a query and view the results.
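The idea behind a Play Index search is a filter over game-level records. The sketch below applies the example query above to a few invented game lines. The `GameLine` record and `find_games` helper are hypothetical and not how BR implements the Play Index. It also reads "struck out three times and hit two home runs" as "at least" three and two.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One player's batting line for one game (hypothetical, simplified)."""
    player: str
    date: str
    strikeouts: int
    home_runs: int


# Invented sample data for illustration only.
GAME_LOGS = [
    GameLine("Player A", "2014-04-02", 3, 2),
    GameLine("Player A", "2014-04-03", 1, 0),
    GameLine("Player B", "2014-05-10", 3, 1),
    GameLine("Player C", "2014-06-21", 4, 2),
]


def find_games(logs, min_strikeouts=3, min_home_runs=2):
    """Return every game line meeting both thresholds (inclusive)."""
    return [
        g for g in logs
        if g.strikeouts >= min_strikeouts and g.home_runs >= min_home_runs
    ]


if __name__ == "__main__":
    for game in find_games(GAME_LOGS):
        print(game.player, game.date, game.strikeouts, "K,", game.home_runs, "HR")
```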

While Baseball Reference's site is free, the Play Index is not. A user may obtain a full-year subscription for a nominal fee.

Baseball Prospectus

Baseball Prospectus (BP) houses an extensive database of statistical information and provides contemporary and fantasy baseball analysis. As with FG and BR, much of its statistical information is developed in-house. It is one of the websites that calculates a version of Wins Above Replacement, which it abbreviates as WARP. The PECOTA projection system can be found here.

Large portions of Baseball Prospectus' content sit behind a paywall, but much of the site is available for free.

Cot's Contracts

Part of the Baseball Prospectus site, Cot's Contracts is a contract repository for all MLB teams. If you want to know the Opening Day payroll history of your favorite team, or your team's salary obligations for the next several years, this is the place to look.

MLB Trade Rumors

Run by Tim Dierkes, MLB Trade Rumors focuses on reporting trades, transactions, and rumors affecting MLB rosters. Beyond its reporting, the site also provides a few useful tools.

First, the site produces well-regarded projections of arbitration salaries. It also offers a free agent tracker, an arbitration tracker, and other trackers, along with a database of which agents represent which players.

Baseball Savant

Run by Daren Willman, who just took a job with MLB, Baseball Savant provides an interface to query the PITCHf/x database. There are many query parameters, and the website produces a variety of outputs based on a user's query. One of the outputs is the raw data, which can be exported for further analysis.
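Once raw data has been exported, "further analysis" can start very simply. The sketch below reads a CSV export and averages velocity by pitch type using only the standard library. The column names `pitch_type` and `start_speed` are assumptions (the latter is the PITCHf/x name for release velocity); check the header row of your own export, since Savant's column names may differ. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Invented rows standing in for an exported file; column names are assumptions.
SAMPLE_EXPORT = """pitch_type,start_speed
FF,94.1
FF,95.3
SL,86.0
CH,85.2
SL,87.0
FF,
"""


def average_speed_by_type(csv_text):
    """Average the velocity column per pitch type, skipping rows with no velocity."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        speed = (row.get("start_speed") or "").strip()
        if not speed:
            continue
        totals[row["pitch_type"]] += float(speed)
        counts[row["pitch_type"]] += 1
    return {ptype: round(totals[ptype] / counts[ptype], 1) for ptype in totals}


if __name__ == "__main__":
    # To read a real export instead: open("export.csv", newline="").read()
    print(average_speed_by_type(SAMPLE_EXPORT))
```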

Brooks Baseball

Brooks Baseball provides PITCHf/x data with a twist. According to their website, referring to PITCHf/x's automated classifications of each pitch, "automated classifiers have difficulty with certain repertoires and pitch types." Because of that, Brooks Baseball "[makes] systematic changes that improve the quality, usefulness, and usability of [PITCHf/x] data."

Users may analyze pitch data of individual players, both pitchers and hitters. Zone-by-zone counts of pitch types, results against pitch types, and velocity and movement are available, among many other things.
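A zone-by-zone count is a tally over a grid laid on top of the strike zone. The sketch below bins pitches into a 3x3 grid using PITCHf/x-style horizontal (`px`) and vertical (`pz`) plate coordinates in feet. The zone edges (a half-width of 0.83 ft and a fixed 1.5 to 3.5 ft height) are assumed for illustration; real zones vary by batter, and this is not Brooks Baseball's own zone definition. Pitches outside the assumed zone land in a `None` bucket.

```python
from collections import Counter

HALF_WIDTH = 0.83          # assumed horizontal half-width of the zone, in feet
SZ_BOT, SZ_TOP = 1.5, 3.5  # assumed bottom and top of the zone, in feet


def zone_of(px, pz, sz_bot=SZ_BOT, sz_top=SZ_TOP):
    """Return (row, col) in a 3x3 grid, or None if the pitch is outside the zone.

    Row 0 is the top third; col 0 is the third with the most negative px.
    """
    if not (-HALF_WIDTH <= px <= HALF_WIDTH and sz_bot <= pz <= sz_top):
        return None
    col = min(int((px + HALF_WIDTH) / (2 * HALF_WIDTH) * 3), 2)  # clamp the right edge
    row = min(int((sz_top - pz) / (sz_top - sz_bot) * 3), 2)     # clamp the bottom edge
    return row, col


def zone_counts(pitches):
    """Count (zone, pitch_type) pairs from (pitch_type, px, pz) tuples."""
    counts = Counter()
    for pitch_type, px, pz in pitches:
        counts[(zone_of(px, pz), pitch_type)] += 1
    return counts


if __name__ == "__main__":
    # Invented pitches for illustration.
    sample = [("FF", 0.0, 2.5), ("FF", 0.1, 2.4), ("SL", 1.5, 2.5), ("CH", 0.8, 1.6)]
    for (zone, ptype), n in sorted(zone_counts(sample).items(), key=str):
        print(zone, ptype, n)
```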

Baseball Heat Maps

Run by the Zimmerman brothers, Baseball Heat Maps provides database downloads, applications to interface with PITCHf/x data, visuals, and blogging. There is a good amount of information regarding injuries, specifically Tommy John surgeries.

There are plenty of other uses for this website, but I often use the "Angle and Distance of batted balls" applications for pitchers and hitters.

Texas Leaguers

Run by Trip Somers, Texas Leaguers provides ways to interact with and visualize the PITCHf/x database, blog posts, and strength and conditioning information. A user can view pitch trajectories by pitch type, pitch locations, velocity, movement, release points, and spin among other things.

The pitchRx Package for R

Created by Carson Sievert, pitchRx is a package for R, the free statistical programming language. The package is built to let a user interact with the PITCHf/x database easily.

Python-based PITCHf/x Parser

Created by John Choiniere right here at Beyond the Box Score, this parser provides a new way to scrape and parse PITCHf/x data. Its purpose is to unify PITCHf/x data with extra information like plate appearance outcomes and game situations.
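The core of "unifying" pitch data with plate appearance outcomes is attaching each pitch to the at-bat it belongs to. The sketch below does that with Python's built-in `xml.etree.ElementTree` on a simplified, hypothetical XML snippet loosely modeled on the old Gameday inning files (`<atbat>` elements carrying an `event` attribute and containing `<pitch>` children). It is not John Choiniere's parser, and the element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML; IDs and values are invented.
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="100001" pitcher="200001" event="Strikeout">
        <pitch type="S" pitch_type="FF" start_speed="94.2"/>
        <pitch type="S" pitch_type="SL" start_speed="85.1"/>
        <pitch type="S" pitch_type="SL" start_speed="84.7"/>
      </atbat>
      <atbat num="2" batter="100002" pitcher="200001" event="Single">
        <pitch type="B" pitch_type="CH" start_speed="86.0"/>
        <pitch type="X" pitch_type="FF" start_speed="93.8"/>
      </atbat>
    </top>
  </inning>
</game>
"""


def pitches_with_outcomes(xml_text):
    """Flatten the XML into one row per pitch, tagged with its at-bat's outcome."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inning in root.iter("inning"):
        for half in inning:  # <top> or <bottom>
            for atbat in half.iter("atbat"):
                pitches = atbat.findall("pitch")
                for seq, pitch in enumerate(pitches, start=1):
                    rows.append({
                        "inning": int(inning.get("num")),
                        "half": half.tag,
                        "atbat": int(atbat.get("num")),
                        "pitch_seq": seq,
                        "pitch_type": pitch.get("pitch_type"),
                        "start_speed": float(pitch.get("start_speed")),
                        "pa_event": atbat.get("event"),
                        "final_pitch": seq == len(pitches),
                    })
    return rows


if __name__ == "__main__":
    for row in pitches_with_outcomes(SAMPLE_XML):
        print(row)
```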

Baseball on a Stick

Baseball on a Stick (BBOS) is intended to provide code, for use with MySQL and Python, that creates a database of major league baseball game events freely provided by the mlb.com Gameday application and retrosheet.org. All major and minor league pitch location and game statistic data can be downloaded using BBOS.

ESPN Home Run Tracker

Anything and everything you ever wanted to know about home runs.

TangoTiger.com

As one of the authors of The Book, Tom Tango offers explanations for a lot of sabermetric concepts, including linear weights and Defense Independent Pitching Statistics (DIPS). The Marcel projection system was originally developed by Tango.
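One DIPS-style statistic Tango developed is Fielding Independent Pitching (FIP), which uses only the outcomes a pitcher controls most directly: FIP = (13·HR + 3·(BB + HBP) − 2·K) / IP + C. The sketch below implements that formula. The default constant of 3.10 is only a placeholder: in practice C is recalculated each season so that league-average FIP matches league-average ERA. The helper also converts box-score innings notation, where "180.1" means 180⅓ innings.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings like '180.1' (180 and 1/3) into a true decimal."""
    whole, _, thirds = ip.partition(".")
    outs = int(whole) * 3 + (int(thirds) if thirds else 0)
    return outs / 3


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The default constant is a placeholder; the real value is set each season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Invented season line for illustration.
    ip = innings_from_notation("200.0")
    print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=ip), 3))
```

With these made-up numbers the fielding-independent part works out to (260 + 165 − 400) / 200 = 0.125, so the result is simply the constant plus 0.125.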

Again, I'm sure there are things out there not on this list. I hope I haven't missed anything blatantly obvious. If you know of something that's not on this list, please comment!