BtB Sabermetric Writing Award Results: Best Novel Research Article/Project

The votes are in, data have been compiled, and it's time to announce the winners of the BtB Sabermetric Writing Awards!  We will have a series of seven posts this week, each featuring a different category.  As the major emphasis of this project is to celebrate sabermetric writing and research of the past year, and not so much to just crown a specific "winner," we will be drawing attention to several of our top vote-getters in each post while applauding all of the nominees.  

As a reminder, these awards were determined by a 50% internal and 50% external vote.  Because a number of BtB authors were nominated in these categories (and thus not eligible to vote in them), we opted to expand our internal voting roll with a number of guest voters.  These individuals were selected as experts in our field, and we appreciate their input into this process.  We thank Mike Fast, KJOK, Dan Novick, Patriot, Eric Seidman, Sean Smith, Dan Szymborski, and Tangotiger for agreeing to cast one of these guest ballots (and no, for the record, they did not vote for themselves!).

First off is the award for "Best Novel Research Article or Project."  Here is the award category description:

Original sabermetric research that enhances our understanding of some general aspect of baseball. These studies should help establish new sabermetric principles, metrics, techniques, or perspectives. Think "breakthrough research" when nominating this category.

We had twelve nominees in this category, all of which were examples of some of the best new work done in sabermetrics in 2009.  We received a sizable number of links to our voting post simply because it was such a terrific collection of research!  In the end, though, several articles did rise to the top in both our internal and external voting.  Here are the top four.

4.  Matt Swartz - Improving BABIP Estimation

The best way I suggest to approximate BABIP is probably using the 3-year average method, but some of the one-year regressions are useful too.  A correlation of .63 between the predicted BABIPs and the actual BABIPs is better than the .45-.55 range of results that any of my one year regressions, Dutton’s new regression, or Tom Tango’s Marcel will get you.

One emphasis around the sabermetric blogosphere over the past few years has been trying to isolate luck from hitting lines.  With pitchers, we have fairly well-established approaches: FIP, xFIP, DIPS, QERA, tRA, tRA*, etc. But with hitters?  There have been a variety of attempts, but the problem is more complex because hitters do have substantial control over things like their batting average on balls in play (BABIP).

Matt's article details his development of a regression equation that attempts to predict a player's BABIP from historical data.  There is more work to be done on the issue.  But Matt shows all of his work, and the result is an incremental improvement over what a simple forecasting system like Marcel can do in terms of BABIP estimation.  I tend to think that the next great advance in hitter projections will require heavy use of batted ball (and probably hitf/x) data, and Swartz's work could help lay a foundation for such work.
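For readers who want to see the baseline in concrete terms, here is a minimal Python sketch of a pooled three-year BABIP, using the standard definition BABIP = (H - HR) / (AB - K - HR + SF).  To be clear, this is not Matt's regression: the function names are my own, the pooling (each season weighted by its balls in play) is just one simple assumption about what a "3-year average" might look like, and the stat lines are made up for illustration.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def three_year_babip(seasons):
    """Pool the most recent three seasons (assumption: simple pooling,
    so each season is weighted by its number of balls in play)."""
    keys = ("hits", "home_runs", "at_bats", "strikeouts", "sac_flies")
    recent = seasons[-3:]
    totals = {key: sum(season[key] for season in recent) for key in keys}
    return babip(**totals)


# Hypothetical stat lines, oldest season first (not real player data).
example_seasons = [
    {"hits": 150, "home_runs": 20, "at_bats": 500, "strikeouts": 100, "sac_flies": 5},
    {"hits": 160, "home_runs": 25, "at_bats": 520, "strikeouts": 110, "sac_flies": 4},
    {"hits": 140, "home_runs": 15, "at_bats": 480, "strikeouts": 90, "sac_flies": 6},
]

estimate = three_year_babip(example_seasons)
print(f"Pooled three-year BABIP: {estimate:.3f}")
```

A regression like the one in the article would go further by adding predictors beyond the player's own past BABIP; the pooled average is just the simplest thing to compare against.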


3. Josh Kalk - The Injury Zone

Vertical movement was the second most important and, if you combine horizontal movement, total movement is actually more important than speed. Movement here is created entirely by spin and drag, so if a pitcher isn't quite right it is very hard to get the proper spin on the ball. Vertical movement is likely more important than horizontal because most of the spin applied is backspin (fastball) or front spin (curveball) which makes the move up or down. Large horizontal movement is most often found in sliders, but not all pitchers throw a slider and even those who do don't always produce a large slide with it. 

Kalk's was, and to my knowledge still is, the best attempt to develop a tool that could identify a pitcher injury several pitches before it happens.  It uses a combination of pitchf/x input and an artificial neural network algorithm to develop a means of evaluating a player's current performance against his own history and thus identify arm problems.  Furthermore, beyond the immediate application of helping you pull a player before he actually breaks, it could also be used to assess severity of arm stress on players, as his reference to C.C. Sabathia demonstrates.  Amazing work that absolutely fits the bill as groundbreaking research.

Kalk's article went online on February 17th, 2009.  On March 23rd, he removed his pitchf/x player cards from his site, and disappeared--only to reappear on the roster of the Tampa Bay Rays front office.  I don't think it's unreasonable to suggest that this is the article that got Josh hired--though, of course, he did a lot of other tremendous work in the years leading up to this article as well.


2. Mike Fast - Confessions of a Dips Apostate

We have seen that the direction a fly ball is hit has a huge effect on the home run chances for that fly ball and also affects the batting average even if the ball stays in the park. A fly ball hit to the batter's pull field is more than six times as likely to leave the park as a fly ball hit to center or the opposite field, and flyball BABIP improves by over 50 points to the pull field.

We also found that whether or not a batter pulls the ball in the air appears to be a persistent characteristic, and it appears that pitchers may have a similarly persistent but much weaker characteristic in the fly balls they allow. Adding the 2005-6 MLBAM Gameday data to the sample would help detect the pitcher skill, if it exists, for some or all pitchers.

DIPS--or, more specifically, the idea that pitchers have little control over their batting average on balls in play--may be the most radical and simultaneously successful idea to come from the sabermetric movement.  It fundamentally changed how we--as fans, analysts, and even members of front offices--evaluate pitcher performances.  

The thing about DIPS is that it is, as originally described, an overstated case.  Pitchers ultimately do have some repeatable skill to control the outcomes of batted balls--it's just that it's a much weaker effect than it is for batters and, at the level of a single season's data, is swamped by more random effects.  This had been demonstrated before, but Fast's article--which employs Gameday data to track outcomes of batted balls--is one of the more straightforward demonstrations of this to date.


1. Victor Wang - Valuing a Draft (part 1 & part 2)

If teams decide to put a value on draft picks they may receive for a Type A free agent, there are a lot of probabilities they'll need to calculate. Given the uncertainty of off season events, it can be rather difficult in estimating these probabilities. Because of this, I feel that the most reasonable projection for the value of Type A draft picks would be something between $3-5 million. I'll make sure to note this update in any future trade valuations.

Victor exploded onto the scene less than three years ago with a WARP-based look at prospect values published in By the Numbers.  It was one of the first quantitative descriptions of prospect value--given in actual surplus dollars saved--I'd ever seen, and immediately changed the detail with which we could evaluate trades involving prospect players.  Subsequent work improved the study with the use of better metrics and extended it to lower-quality prospects.  With this information in hand, we could judge prospects based on their Baseball America (or Sickels, in later work) ranking, rather than having to make some nebulous projection into the future.  Furthermore, by providing surplus dollar values, we could make apples to apples comparisons of how valuable a prospect was compared to a veteran player with a big contract.

Wang has been building upon this work ever since.  This year, one of his biggest contributions--and the project for which he wins our first Saber award--allows us to place dollar values on draft picks won as compensation for losses of free agents.

The values were surprisingly high--especially for top-tier picks--and helped explain the struggles that some free agents (e.g. Orlando Hudson) encountered last offseason, when teams were apparently reluctant to sign them due to the draft picks they would have to give up.  In part 2, Wang revisited the question previously investigated by both Bill James and Rany Jazayerli (at BPro) about differences between high school/college/hitter/pitcher selections in the draft.  He found relatively little evidence of massive differences between high school and college players.  Nevertheless, consistent with his earlier work, he found that elite hitters tended to be dramatically better selections in early rounds, with many excellent pitchers emerging from later rounds.


Other nominees

While they did not receive the same level of voting support as the above articles, this was an extremely competitive category, and most of our top nominees received considerable support in both the internal and external voting.  Here are the rest of the nominees, in alphabetical order by author.

Dave Allen: PitchF/X Detective: Has Bradley's Strike Zone Been Widened

Brian Cartwright: Major League Equivalencies

Jeremy Greenhouse: Controlling the Zone

Adam Guttridge: Guttridge-Wang Trade Model

Mitchel Lichtman (MGL): An Age Old Question

Max Marchi: Chase-ing the FieldF/x

Greg Rybarczyk: 2009 Projections with Hit Tracker

Colin Wyers: When Is A Fly Ball A Line Drive?

Please join me in congratulating all of these authors!  Here's to a great 2010!