Diamond Dollars 2015: Cole Hamels Trade Machine (Stanford)

We are re-publishing a writeup from the Diamond Dollars Case Competition, whose case asked teams to construct trades for Cole Hamels. This writeup comes from the Stanford group, which won the undergraduate division of the competition.

[Editor's Note: Jordan Wallach will be joining BtBS as a contributor. Jordan worked on this case with the other authors named below. This manuscript was created before the 2015 season began. It has, however, been slightly edited by the BtBS staff for presentation here. We are happy to present it.]

2015 Diamond Dollars Case Competition:

Developing a Trade Machine to Determine the Optimal Cole Hamels Trade

Jordan Wallach, Avner Kreps, Vihan Lakshman, Do-Hyoung Park, Alec Powell

Stanford University

Introduction

This year, the competition's prompt was to develop two trade scenarios for Cole Hamels that maximize the value added for both the Phillies and their trade partner. (Note: We were told to ignore Hamels' personal preferences - geographic and team - as well as any no-trade or limited no-trade clauses in his current contract.)

Right off the bat, we broke the case down into three parts:

How do we project the performance of Cole Hamels and traded prospects over the coming seasons?
How do we quantify a team's improvement upon the addition of Hamels?
Can we develop a metric to evaluate the value of a trade to both the Phillies and their trade partner?

Cole Hamels Projection and our Aging Curve

When we began thinking about ways to project a starting pitcher's performance over the course of his career, we first wanted to use a linear regression model. We took a dataset of every qualifying player-season dating back to 1961 (the start of the expansion era) and selected several metrics to test for a correlation with increasing age, among them K%, BB%, and ERA-. However, we quickly realized that we needed to rethink this approach for one major reason: the data suffered from selection bias.

Simply put, player-seasons at the age extremes (the early 20s and the mid-40s) exist because those players are outliers. They have proven to their teams that they can pitch at the major-league level at that age, and they are not representative of the majority of players that age.

As a result, we decided to build an aging curve of fWAR by age. The goal was to measure how far Cole Hamels' performance over his first eight major-league seasons deviated from the curve, and then to use that average deviation to set upper and lower bounds for his performance through the remainder of his contract and beyond.

Notably, each of Hamels' data points for his first eight seasons (after scaling his single-season fWAR up to the total fWAR for all player-seasons at the given age) was above the polynomial best-fit aging curve, so his projections sit above the curve as well. In the plot below, the green data points with error bars are our projections for Hamels through his age-38 season.
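The aging-curve step can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes Hamels' seasons have already been scaled onto the same axis as the curve, and it assumes a quadratic fit, because the writeup does not state the polynomial degree. It also assumes the error bars are the curve plus his mean residual, plus or minus the residuals' standard deviation. The names `fit_polynomial`, `evaluate`, and `project_band` are hypothetical.

```python
def _solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_polynomial(ages, war_totals, degree=2):
    """Least-squares polynomial aging curve via the normal equations.
    degree=2 is an assumption; ages are centered for numerical stability."""
    x0 = sum(ages) / len(ages)
    us = [a - x0 for a in ages]
    k = degree + 1
    xtx = [[sum(u ** (i + j) for u in us) for j in range(k)] for i in range(k)]
    xty = [sum(y * u ** i for u, y in zip(us, war_totals)) for i in range(k)]
    return x0, _solve(xtx, xty)


def evaluate(model, age):
    x0, coeffs = model
    return sum(c * (age - x0) ** i for i, c in enumerate(coeffs))


def project_band(model, player_ages, player_war, future_ages):
    """Shift the curve by the player's mean residual; use the residual standard
    deviation (an assumed choice) as the half-width of the error bars."""
    residuals = [w - evaluate(model, a) for a, w in zip(player_ages, player_war)]
    mean_dev = sum(residuals) / len(residuals)
    spread = (sum((r - mean_dev) ** 2 for r in residuals) / len(residuals)) ** 0.5
    band = {}
    for age in future_ages:
        center = evaluate(model, age) + mean_dev
        band[age] = (center - spread, center, center + spread)  # (low, mid, high)
    return band
```

Centering the ages keeps the normal equations well conditioned. If the degree were raised, uncentered ages in the 20s to 40s would produce very large powers.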

Prospect Performance

After projecting Hamels' expected performance, we also wanted to project the performance of top prospects in every team's system to get an idea of their future contributions. It's safe to say that big trades - such as the one for this case - need good prospects involved. We decided to formulate trade proposals that involve only "top" prospects for either side (as identified by Baseball America in their annual preseason rankings).

We developed a computer program that aims to predict the MLB impact of a current prospect in a team's system using statistically similar seasons. The program was a Python script with an integrated SQL database consisting of two tables. The first held the top 10 prospects for each potential trade partner along with their most recent minor league seasons. The second held the career service time and fWAR of all former top 10 prospects from 2010-2014 who are currently in the majors.
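The two-table layout might look like the following sketch. It uses Python's built-in `sqlite3` module. The table and column names are hypothetical, since the writeup does not give its schema. The `comp_stat` column holds wOBA for batters and FIP for pitchers, matching the comparison statistic defined below.

```python
import sqlite3

# Hypothetical schema mirroring the two tables described above.
SCHEMA = """
CREATE TABLE current_prospects (      -- table 1: top-10 prospects per partner
    name      TEXT NOT NULL,
    team      TEXT NOT NULL,
    age       INTEGER NOT NULL,
    level     TEXT NOT NULL,          -- most recent level, e.g. 'AA'
    role      TEXT NOT NULL,          -- 'batter' or 'pitcher'
    comp_stat REAL NOT NULL           -- wOBA (batters) or FIP (pitchers)
);
CREATE TABLE former_prospects (       -- table 2: former top-10s now in MLB
    name        TEXT PRIMARY KEY,
    mlb_seasons INTEGER NOT NULL,     -- career service time, in seasons
    career_fwar REAL NOT NULL
);
"""


def open_db():
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def prospects_for(conn, team):
    """Return (name, age, level, role, comp_stat) rows for one trade partner."""
    cur = conn.execute(
        "SELECT name, age, level, role, comp_stat FROM current_prospects "
        "WHERE team = ? ORDER BY name",
        (team,),
    )
    return cur.fetchall()
```

Parameterized queries (`?`) keep team names out of the SQL string itself.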

Intuitively, we wanted to base our projections for these prospects on the career MLB performance of former top prospects, hence the second table in our database. The program finds "comparable" seasons based on a comparison statistic, which we defined as wOBA for batters and FIP for pitchers.

As input, the program takes a potential MLB trade partner along with its list of top 10 Baseball America prospects and its tradeable MLB assets. We then merge the top 10 prospects with the minor league database. For any prospect p, we find the player-seasons in the database at the same age as p, at the same minor league level as p, and with a wOBA or FIP in the same range as p. This yields on the order of 100 comparable former minor league player-seasons. We then filter this list down to players of comparable Baseball America valuation (former BA top 10 prospects). Like our current prospects, those players were highly valued and more likely to be traded, so they are a more similar type of asset.
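The matching rule above can be sketched as a simple filter: same age, same level, comparison stat within a window, and former BA top-10 status. The writeup says only "in the same range," so the symmetric `tolerance` window here is an assumption. So is its size, which would differ for wOBA (e.g. a few hundredths) and FIP (e.g. a few tenths). `MinorSeason` and `find_comparables` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinorSeason:
    name: str
    age: int
    level: str          # e.g. 'A+', 'AA', 'AAA'
    comp_stat: float    # wOBA for batters, FIP for pitchers
    former_top10: bool  # was this player a Baseball America top-10 prospect?


def find_comparables(prospect, seasons, tolerance):
    """Return seasons matching the prospect's age and level, with a comparison
    stat within +/- tolerance (assumed window), restricted to former BA top 10s."""
    return [
        s for s in seasons
        if s.age == prospect.age
        and s.level == prospect.level
        and abs(s.comp_stat - prospect.comp_stat) <= tolerance
        and s.former_top10
        and s.name != prospect.name
    ]
```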

Using these matches, we estimate the average fWAR we could expect from prospect p in 2015. We do this by dividing the sum of the comparable players' career MLB fWAR by their total number of career seasons. This projected WAR is the baseline projection, which our Trade Machine optimization program later updates based on team needs.
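As a worked version of that arithmetic, pooled fWAR divided by pooled seasons, consider this minimal sketch. It assumes each comparable is given as a `(career_fwar, mlb_seasons)` pair. Returning 0.0 when there are no comparables is an assumed convention, not something the writeup specifies.

```python
def project_fwar_per_season(comparables):
    """comparables: iterable of (career_fwar, mlb_seasons) for matched former prospects.
    Returns total fWAR / total MLB seasons, i.e. expected fWAR per season."""
    comparables = list(comparables)
    total_fwar = sum(fwar for fwar, _ in comparables)
    total_seasons = sum(seasons for _, seasons in comparables)
    if total_seasons == 0:
        return 0.0  # assumed fallback when no comparables were found
    return total_fwar / total_seasons
```

For example, comparables of (10.0 fWAR, 4 seasons), (2.0, 2), and (-0.5, 1) give 11.5 / 7, or about 1.64 fWAR per season. Flops with negative fWAR pull the average down, which is how the risk of busting enters the projection.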

Marginal Benefit of Adding Cole Hamels

Our next task is to apply these projections to possible trades. That starts with quantifying exactly how the addition of Cole Hamels would affect each team, in order to isolate the "top tier" of teams that have the most to gain by adding him.

In the problem statement that was given to us, the importance of making the playoffs was emphasized above all else. Consequently, we decided to use the change in a team's probability of making the playoffs as our metric to quantify how much a team will "gain" from the addition of Hamels to its pitching staff.

The first step was to establish a baseline - that is, how do we expect each major league team to perform in 2015 without the addition of Hamels? To do this, we compiled hitting WAR (hWAR), pitching WAR (pWAR), and UZR for every team in every year from 2002 to 2014. We then trained a linear regression model on that dataset to predict each team's wins from those values. It turned out that UZR was essentially negligible compared to hitting WAR and pitching WAR, so we dropped it, giving us our final equation:

pred. wins = (A₁ * pWAR) + (A₂ * hWAR) + A₃

A₁ = 1.01950018

A₂ = 0.95217129

A₃ = 48.2776769
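A minimal sketch of both halves of this step follows: evaluating the equation with the coefficients reported above, and fitting such coefficients by ordinary least squares. The fitting function is an illustrative stand-in written with only the standard library, not the authors' code. `predicted_wins` and `fit_wins_model` are hypothetical names.

```python
A1, A2, A3 = 1.01950018, 0.95217129, 48.2776769  # coefficients reported above


def predicted_wins(pwar, hwar, a1=A1, a2=A2, a3=A3):
    """pred. wins = A1 * pWAR + A2 * hWAR + A3."""
    return a1 * pwar + a2 * hwar + a3


def fit_wins_model(rows):
    """Ordinary least squares for wins ~ pWAR + hWAR + intercept.
    rows: iterable of (pwar, hwar, wins). Returns (a1, a2, a3)."""
    rows = list(rows)
    xs = [(p, h, 1.0) for p, h, _ in rows]   # design matrix rows [pWAR, hWAR, 1]
    ys = [w for _, _, w in rows]
    # Normal equations: (X^T X) beta = X^T y, as a 3x4 augmented matrix.
    a = [[sum(x[i] * x[j] for x in xs) for j in range(3)]
         + [sum(x[i] * y for x, y in zip(xs, ys))] for i in range(3)]
    for col in range(3):                      # Gaussian elimination with pivoting
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                       # back substitution
        beta[r] = (a[r][3] - sum(a[r][c] * beta[c] for c in range(r + 1, 3))) / a[r][r]
    return tuple(beta)
```

For example, a hypothetical team projected for 15 pWAR and 20 hWAR comes out to about 82.6 wins: 1.0195 × 15 + 0.9522 × 20 + 48.28.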

One note on this model: both weights are approximately equal to 1. That is what we would expect, since team WAR is built to add up to wins in roughly this way, and the intercept of about 48 wins is close to the record of a replacement-level team. The slight deviations suggest that pitching WAR has been a little more predictive of team wins than hitting WAR over the last decade-plus.

Using this model, we were able to project the 2015 MLB standings in each division and league and establish a baseline playoff picture along with win cutoffs in each league to make the playoffs.

Given that Hamels was estimated by our model to provide approximately 3.1 WAR in 2015, we isolated all of the teams that were projected to finish within around 3 wins of that cutoff. We also somewhat arbitrarily eliminated Colorado at this step because there's absolutely no way that Hamels would OK that trade and there's absolutely no way Jeff Bridich and company would deal out Hamels-type big bucks for a guy to come throw at Coors (Mike Hampton, anyone?). This left us with 15 teams to consider for our next step.
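The cutoff-and-filter step might look like the following sketch. It simplifies the real playoff picture: it assumes the cutoff is simply the fifth-best projected win total in a league (the 2015 format had five playoff teams per league) and ignores division winners with weaker records. The three-win margin and the exclusion list come from the text. `playoff_cutoff` and `fringe_teams` are hypothetical names.

```python
def playoff_cutoff(projected_wins, playoff_spots=5):
    """Win total of the last team in, ranking a league's teams by projected wins only
    (simplification: ignores division winners with weaker records)."""
    ranked = sorted(projected_wins.values(), reverse=True)
    return ranked[playoff_spots - 1]


def fringe_teams(projected_wins, margin=3.0, playoff_spots=5, exclude=()):
    """Teams projected to finish within `margin` wins of the cutoff, either side."""
    cutoff = playoff_cutoff(projected_wins, playoff_spots)
    return sorted(
        team for team, wins in projected_wins.items()
        if abs(wins - cutoff) <= margin and team not in exclude
    )
```

Passing a team in `exclude` mirrors the manual removal of Colorado described above.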

We now had projected wins for each team, but we needed a way to translate wins into playoff probability. To do this, we created another regression -- this time, a logistic one. Because the current format includes wild cards (two per league), we trained it on data from the wild-card era, 1995 onward. For every win total, we counted how many teams finished with that many wins and how many of those teams made the playoffs. The proportion of teams at each win total that made the playoffs gave us an empirical "playoff probability" as a function of wins, to which we fit a logistic regression using R.
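The authors fit this model in R. As a language-consistent illustration, the following sketch fits the same kind of binomial logistic regression in Python with Newton's method. The inputs are, for each win total, the number of teams and the number that made the playoffs. Centering wins on their mean is an implementation choice for numerical stability. `fit_playoff_logistic` and `playoff_probability` are hypothetical names.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_playoff_logistic(win_totals, teams, playoff_teams, iters=100, tol=1e-10):
    """Binomial logistic regression of playoff odds on wins, fit by Newton's method.
    Returns (b0, b1, center) with P(playoffs | w) = sigmoid(b0 + b1 * (w - center))."""
    center = sum(w * n for w, n in zip(win_totals, teams)) / sum(teams)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for w, n, k in zip(win_totals, teams, playoff_teams):
            x = w - center
            p = sigmoid(b0 + b1 * x)
            resid = k - n * p          # observed minus expected playoff teams
            var = n * p * (1.0 - p)    # binomial variance (Hessian weight)
            g0 += resid
            g1 += resid * x
            h00 += var
            h01 += var * x
            h11 += var * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # Newton step: (X^T W X)^-1 * gradient
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1, center


def playoff_probability(model, wins):
    b0, b1, center = model
    return sigmoid(b0 + b1 * (wins - center))
```

Fitting the binomial counts directly weights each win total by how many teams actually finished there. This matches fitting the proportions with those counts as weights. The fit needs some overlap between playoff and non-playoff teams across win totals. With perfectly separated data, the coefficients would diverge.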

For our 15 teams, we now have a way of estimating a rough playoff probability from their predicted wins for the 2015 season. Quantifying how Hamels changes that is fairly simple. Hamels is not, and will never be, a Micah Owings at the plate, so we can ignore his impact on hWAR and focus solely on pWAR. It is a crude approach: for each of the 15 possible trade partners, we simply removed the starter with the lowest projected 2015 WAR, replaced him with Hamels, and re-calculated the pitching staff's collective pWAR. From there, we re-calculated each team's projected wins and playoff probability and computed how much the playoff probability changed as a result of adding Hamels. This narrowed our list down to just six teams (there was a big drop-off in the change after the sixth team).
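The swap described above reduces to a small function. This is a sketch under assumptions: team pWAR is split into the rotation (`starter_wars`) and everything else (`other_pwar`), and playoff odds come from any callable mapping wins to a probability, such as a fitted logistic curve. The win equation uses the coefficients reported earlier. `hamels_effect` is a hypothetical name.

```python
A1, A2, A3 = 1.01950018, 0.95217129, 48.2776769  # win model coefficients from above


def wins_from_war(pwar, hwar):
    return A1 * pwar + A2 * hwar + A3


def hamels_effect(starter_wars, other_pwar, hwar, hamels_war, playoff_prob):
    """Swap the lowest-projected starter for Hamels; hWAR is unchanged.
    Returns (change in projected wins, change in playoff probability)."""
    base_pwar = other_pwar + sum(starter_wars)
    new_pwar = base_pwar - min(starter_wars) + hamels_war
    base_wins = wins_from_war(base_pwar, hwar)
    new_wins = wins_from_war(new_pwar, hwar)
    return new_wins - base_wins, playoff_prob(new_wins) - playoff_prob(base_wins)
```

Because hWAR is held fixed, the change in wins is just A₁ times (Hamels' projection minus the replaced starter's). With Hamels at 3.1 WAR replacing a 0.5-WAR fifth starter, that is about 2.65 wins.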

One issue is that our way of evaluating possible trade partners focuses specifically on "fringe" teams at the edge of wild-card contention that could jump into the playoffs. It inherently disadvantages teams that are already good and would add Hamels to turn a playoff lock into a World Series favorite (e.g., the Dodgers or Red Sox). Ultimately, those trades are probably just as likely as, if not more likely than, any of the trades we considered in this case. Again, though, we felt that getting one of these fringe teams into the playoffs was worth more at the margin than taking a likely Division Series team to the World Series instead. That is, admittedly, a limiting assumption. If we had more time to work on this (instead of spending a lot more time than we should have on the trade machine outlined below), it would probably be a major point of focus: better quantifying the value added for teams that are already in good position and looking to make themselves pennant and World Series favorites.

Trade Partners' Needs and Assets

With our list narrowed to six teams, we next screened the major league rosters of each potential trade partner. To model whether or not a team would be willing to trade a player, we used a decision tree. The tree has six branches, but it all boils down to two questions: would the Phillies be willing to trade for this player, and would the trade partner be willing to part with him? The first question comes down to age and contract situation. We figured that the Phillies would not want to trade for a player who is too old or who becomes a free agent before 2019, which is around when the Phillies are targeting a return to contention.

The second question, whether the trade partner would be willing to part with the player, was evaluated using the projected WAR of the player and his backup, as well as the presence of any high-level prospects at that position. A player was deemed untradeable if his fWAR projection was above that of Hamels or if the team did not have a suitable backup at that position. However, a player without a suitable backup could be deemed tradeable if there was a high-level prospect at that position. For "high-level prospects," we counted current Baseball America top-10 prospects in Double-A or Triple-A, because the teams we are looking at are win-now teams that place a higher priority on present wins than on future wins. Using this process, we pared each potential partner's list of tradeable assets down to only those who could reasonably be traded. We used a similar model to evaluate the Phillies, but we removed the restrictions on age, contract situation, and the difference between the player's WAR and his backup's.
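The two questions can be sketched as boolean checks. This collapses the six-branch tree into its two questions, so it is a simplification, not the authors' tree. The writeup gives no age limit and no definition of a "suitable" backup, so `max_age=30` and `max_backup_gap=1.0` (the largest allowed drop from starter to backup in projected WAR) are hypothetical thresholds. `Player`, `phillies_would_accept`, and `partner_would_trade` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    name: str
    position: str
    age: int
    free_agent_year: int          # year the player reaches free agency
    proj_war: float
    backup_war: Optional[float]   # projected WAR of the backup at his position, if any


def phillies_would_accept(p, max_age=30, target_year=2019):
    """Question 1: not too old (hypothetical limit) and not a free agent before 2019."""
    return p.age <= max_age and p.free_agent_year >= target_year


def partner_would_trade(p, hamels_war, has_upper_level_prospect, max_backup_gap=1.0):
    """Question 2: never trade anyone projected above Hamels; otherwise require a
    suitable backup (hypothetical WAR-gap rule) or a BA top-10 prospect at AA/AAA."""
    if p.proj_war > hamels_war:
        return False
    suitable_backup = (p.backup_war is not None
                       and p.proj_war - p.backup_war <= max_backup_gap)
    return suitable_backup or has_upper_level_prospect


def tradeable(p, hamels_war, has_upper_level_prospect):
    return (phillies_would_accept(p)
            and partner_would_trade(p, hamels_war, has_upper_level_prospect))
```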

We also had to model each team's "positions of need." For each position on each team, we looked through the roster, evaluated the team's situation at that position, and assigned two numbers. The first was the team's willingness to trade away a player at that position; the second was its desire to trade for one. This process was done heuristically; it might have been improved by tying it to our WAR projections. Nevertheless, we ended up with a "needs matrix" for each of the six teams as well as the Phillies. Additionally, we created a win-weighting vector that weights each player's projected WAR by season. The Phillies put more weight on wins in 2017 and 2018, while the teams we are looking at put more weight on wins in 2015 and 2016. With all of this information (the Hamels projections, prospect projections, lists of tradeable assets per team, and positional and win-now versus win-later weights), we can tie everything together using our Trade Machine.
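One plausible way to combine these pieces into a single player value is to take a dot product of the season-by-season WAR projection with the team's win-weighting vector, then multiply by the positional weight. The exact combination rule is not spelled out in the writeup, so this is an assumption. The weight vectors below are hypothetical, chosen only to tilt toward 2017-18 for the Phillies and toward 2015-16 for a contender.

```python
# Hypothetical win-weighting vectors for the 2015-2018 seasons.
PHILLIES_WEIGHTS = (0.5, 0.75, 1.25, 1.5)    # favors 2017-2018
CONTENDER_WEIGHTS = (1.5, 1.25, 0.75, 0.5)   # favors 2015-2016


def weighted_value(war_by_season, win_weights, positional_weight=1.0):
    """Season-weighted projected WAR, scaled by a positional weight (0.5, 1.0, or 1.5)."""
    if len(war_by_season) != len(win_weights):
        raise ValueError("need one win weight per projected season")
    return positional_weight * sum(w * v for w, v in zip(win_weights, war_by_season))
```

For a player projected at (3.0, 3.0, 2.5, 2.0) WAR, the Phillies' vector gives 9.875 and the contender's gives 11.125. If the contender also has a strong need at his position (weight 1.5), his value to them rises to 16.6875.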

The Trade Machine

After determining the most suitable trade partners for Hamels' services and identifying the tradeable assets on each team's roster, we are now ready to invoke the Trade Machine, a computer program we designed to simulate player exchanges between two teams.

As input, the Trade Machine takes a list of tradeable assets for both the Phillies and a given trade partner, with fWAR projections for each player over the next four seasons. We also include two positional weight matrices for each team. One indicates how willing a team is to give away a player at a particular position; the other indicates how much it wants to take on a player at a particular position. We set these weights heuristically on a discrete scale of 0.5, 1.0, and 1.5, based on an examination of each team's roster.

With these inputs, our Trade Machine algorithm simulates all possible player exchanges between the Phillies and a trade partner, up to the Phillies sending over two players (including Hamels) and receiving five in return. We cut off our program at this point because we assumed that, on a practical level, the trade would not involve more than seven players in total.
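The exhaustive enumeration can be sketched with `itertools.combinations`. The package-size caps are parameters. Their defaults follow the shape of the two final trades reported below: Hamels plus at most one other Phillies player going out, and up to five players coming back. `enumerate_trades` is a hypothetical name.

```python
from itertools import combinations


def enumerate_trades(phillies_assets, partner_assets, max_send=2, max_receive=5,
                     anchor="Cole Hamels"):
    """Yield every (sent, received) pair: Hamels plus up to max_send - 1 other
    Phillies assets, for 1..max_receive of the partner's tradeable assets."""
    others = [p for p in phillies_assets if p != anchor]
    for extra in range(max_send):
        for sent_rest in combinations(others, extra):
            sent = (anchor,) + sent_rest
            for k in range(1, max_receive + 1):
                for received in combinations(partner_assets, k):
                    yield sent, received
```

The search space grows combinatorially. A partner with 20 tradeable assets already has over 21,000 possible return packages of one to five players, which is one practical reason to cap the trade size.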

For each simulated trade, we compute individual scores S_A and S_B for the two teams. Each team's score is the sum of the values of the players it acquires, with each player weighted by the positional and win-now vs. win-later weights. The objective of the machine was to maximize S_A + S_B (ensuring maximum utility for both teams) subject to |S_A - S_B| < t, where t is a threshold (ensuring that the trade is fair to both teams).
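The selection rule can be sketched as a constrained maximization over the enumerated trades. Here the fairness threshold is read as bounding the gap between the two scores, |S_A - S_B| < t. Following the description, each side is scored only on what it acquires. How the give-away weights enter is not spelled out, so the caller supplies both scoring functions. `best_fair_trade` is a hypothetical name.

```python
def best_fair_trade(trades, score_a, score_b, threshold):
    """Maximize S_A + S_B over trades subject to |S_A - S_B| < threshold.
    score_a / score_b map a (sent, received) trade to each team's score.
    Returns (best_trade, best_total), or (None, -inf) if no trade is fair enough."""
    best, best_total = None, float("-inf")
    for trade in trades:
        s_a, s_b = score_a(trade), score_b(trade)
        if abs(s_a - s_b) >= threshold:
            continue  # too lopsided for one side
        if s_a + s_b > best_total:
            best, best_total = trade, s_a + s_b
    return best, best_total
```

For example, with hypothetical precomputed values, the Phillies' score could sum the weighted values of the received players, `lambda t: sum(vals[n] for n in t[1])`. The partner's score would do the same over the sent players, `t[0]`.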

(Sample trade machine output for one of the first versions of the algorithm...)

Trades

The case instructed us to come up with two different trades, one as a first choice and one as a second choice. We had the Trade Machine output the highest-scoring trades for each team, and we selected the top trades from the two partners whose best trades scored highest. Our first trade would have the Phillies sending Hamels and Ben Revere to the Yankees for Didi Gregorius, Greg Bird, Gary Sanchez, Rob Refsnyder, and Jacob Lindgren. The second would have the Phillies trading Hamels and Darin Ruf to the Mariners in exchange for D.J. Peterson, Taijuan Walker, Ketel Marte, Patrick Kivlehan, and Edwin Diaz.

We primarily compared the two trades on their trade scores. The Phillies-Yankees trade had a total trade score of 5.476, compared with 4.114 for the Phillies-Mariners trade, so the Yankees trade appeared clearly better. Our Trade Machine loved the Yankees as a trade partner for the Phillies. In fact, each of the top 30 or so Yankees trades output by our Trade Machine scored higher than the best trade with any other team, which was our Mariners deal. It is primarily because of this disparity in Trade Machine scores that we selected the Yankees trade as our top choice and the Mariners trade as our second. However, other factors also went into our decision.

Beyond their quantitative edge over every other trade partner, the Yankees had a few qualitative advantages over the Mariners as a trade partner for the Phillies. First, according to Cot's Baseball Contracts, the Yankees' current payroll is about $14 million below their payroll at the end of the 2014 season, while the Mariners are a little over their end-of-season payroll. Thus, the Yankees seem to be in a better position to take on Hamels' contract.

There may be concerns about the Yankees trading away Gregorius, who appears to be their shortstop of the present and future. To allay them, note that the Yankees' number three prospect per Baseball America is a shortstop named Jorge Mateo. He played in rookie ball last year, which kept him out of our decision tree model (it counted only Double-A and Triple-A prospects). Even so, his presence in the Yankee system would make trading Gregorius an easier pill to swallow.

Finally, the Mariners already have one of the top pitching rotations in MLB. In fact, Walker, who was a top-twenty Baseball America prospect for three years before losing rookie eligibility in 2014, is their fifth starter. Thus, the marginal value of adding another great pitcher to the Mariners' rotation seems to be less than that of adding Hamels to the Yankees' weaker rotation. For these qualitative reasons, as well as the Trade Machine outputs, we ranked the Phillies-Yankees trade as our top trade, relegating the Phillies-Mariners trade to number two.

Risk Analysis

First, risk was built into our prospect projections. Our prospect database included prospects who flopped as well as those who succeeded, so the major-league statistics of both kinds of comparable players were included in the average we took. Second, our MLB fWAR projections through the 2018 season matched our intuition. Younger players at lower levels of the minor leagues are "riskier" and thus tougher to project, while current MLB players have more reliable, and generally higher, projections because their level of "risk" is lower.

We also accounted for risk in the first part of our analysis, Hamels' aging curve. Its "worst case" and "best case" error bars capture the magnitude of potential variability around the general trend curve.

We noticed that the WAR method for predicting wins is generally accurate for middle-of-the-pack teams (i.e., 75-90 wins) but sometimes does a poor job of predicting teams at the high and low extremes. However, this error is largely mitigated, because our potential trade partners were only middle-of-the-pack teams that had the most to gain from adding three or more wins to their 2015 projections.

Lastly, we "eye tested" all trade machine outputs for sanity, making sure that the results that our trade machine spit out were feasible in real life.

Suggestions for Improvement

Simply put, there is a disconnect between our minor-league and major-league projection systems. We would ideally like to develop a universal projection method or find a scaling between the two.

Our Trade Machine program could also be made more robust by simulating trades involving more than two teams. It would be interesting to consider three-team trades, incorporate cash considerations, and more. We could also tie trade scores to monetary values to determine a more tangible cost associated with roster transactions.

. . .

The graduate division winning case was presented on BtBS here.

All statistics courtesy of FanGraphs.

Jordan Wallach is a Contributor for Beyond the Box Score. You can follow him on Twitter at @jwallach12.