Note: the presentation reproduced below is the joint work of Tulane University Law School students Joshua Mastracci, Hamilton Wise, Jeffrey Tiedeman and Andrew Respess.
I wanted to preface our team's take on the prompt with some praise for SABR's Diamond Dollars Case Competition and Analytics Conference put on March 13-15. For those who weren't able to make it down to Phoenix, the competition itself featured nineteen teams of students getting the opportunity to pitch analytics models to a host of analytics-oriented MLB front office executives that comprised this year's panel of judges — I highly recommend the experience if you get a chance and will be back myself. Thanks especially to SABR president Vince Gennaro who developed the case and facilitated both the competition and conference.
The prompt: while you are interviewing with a club, the GM lays out a problem to be presented to him and his key staff members in four days, asking you to rate and rank the top three "pitching assets" in the game today.
"We're interested in understanding the best assets to have on our roster today — the ones that provide the most value. Your top pitcher should be expected to generate the greatest surplus value to a team over the balance of his control years (or years under contract), beginning with the 2014 season."
The prompt asked that the methodology behind each performance forecast be broken down into a three-step process: (1) an overall approach to evaluating pitching talent; (2) an approach to developing performance projections; and (3) an approach for translating performance into a dollar valuation, factoring in an assessment of risk in each projection's certainty.
In developing projections of surplus value, the market value of a marginal win in the free agent market was standardized at $6 million per marginal win for the 2014 season, inflated at a rate of 5% per year.
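The win-value assumption above can be written out as a small schedule. This is a minimal sketch, assuming the 5% inflation compounds annually from the 2014 base; the function name and the six-season horizon are illustrative choices, not part of the prompt.

```python
def win_value_by_season(base_value=6.0, growth=0.05, first_season=2014, seasons=6):
    """Return {season: $MM per marginal win}, compounding annually from the base season.

    Assumption: the 5% inflation is applied once per season, starting in 2015.
    """
    return {
        first_season + i: round(base_value * (1 + growth) ** i, 3)
        for i in range(seasons)
    }


values = win_value_by_season()
for season, value in values.items():
    print(season, f"${value:.3f}MM per win")
```

Under that reading, a marginal win is worth $6.3MM in 2015 and roughly $7.66MM by 2019.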
Part One — Approach to evaluating pitching talent
Skill-Interactive ERA (SIERA) components
We used several SIERA components for their high year-to-year correlation, as the metric as a whole is, in our opinion, the best publicly available indicator of future success. FanGraphs describes SIERA best (and in a way especially well suited to a 20-minute presentation), explaining that while adjusted-ERA metrics like FIP and xFIP are the most accurate measures of past success, SIERA gets at why certain pitchers are more successful at limiting hits and preventing runs. We used the barest components of SIERA (K%, BB% and GB/FB): a high K% generates weaker contact, lower HR rates, higher double-play rates and arguably lower BABIP (although the ZiPS following thinks otherwise); a low BB% more obviously allows for fewer baserunners and lower odds of allowing runs; and a high GB/FB rate allows for easier outs and fewer extra-base hits. Taken together, we felt the three statistics were an efficient and accurate way of evaluating pitching talent while maintaining an element of forecasting.
Pitch arsenal components
The second piece of our approach to evaluating pitching talent revolves around pitch arsenal components, or pitch repertoires. Different classes of pitchers progress differently over the course of their careers, which led us to categorize pitchers by what they throw, how often they throw it, and production against by pitch type. From a statistical standpoint, we limited our analysis to primary and secondary pitches, looking at percentage thrown, velocity, K%, GB% and wRC+ against by classification. Having an idea of a pitcher's arsenal components allows him to later be fitted to an aging curve.
Lastly, evaluating talent necessarily includes pinpointing potential risk, both tangible and intangible. Because the nature of the prompt steers us towards evaluation of younger pitchers, many of whom have pitched less than a full year in the league, probably the biggest risk up front is the lack of big league experience and the resulting small sample size. We looked into player mental makeup, performance by situational leverage, pitching mechanics and adherence to optimal pitch limits by age (seeking potential indicators of injury liability), home/road and in-division splits, and, from a financial standpoint, the likelihood of extension before free agency. While it is difficult to include intangibles in statistical evaluation, some of the risk factors also allowed us to narrow down our initial pool of eligible players. For example, our non-inclusion of Chris Sale was due in part to his mechanics creating too elevated an injury risk for us to be comfortable considering him a projectable long-term asset.
Part Two — Developing performance projections
Our performance projections attempted to combine each pitcher's body of work to date, comparable players and league trends, with the end goal being to fit each player to an expected statistical projection of performance through the end of club control.
Selecting comparable player set
Based on our evaluation of talent, the top selected players were matched with a group of comparable players using a similarity analysis. Each comparable player set was formed by selecting pitchers based on similarities in body type (height and weight), SIERA components (K%, BB% and GB/FB), and pitch arsenal components (type of primary/secondary pitches, percentage use, velocities), in order to align pitchers with those who possessed similar "best correlating" statistics at the same age. The end goal of compiling comparable player groups from the factors that best indicate future performance is to have a portion of the projection weighted by how similar pitchers have progressed in the past, which is appropriate for young pitchers with little track record of their own to base forecasts on (similar, in a sense, to the theory behind both ZiPS and PECOTA). The four pitchers comprising the comparable set for each pitcher were those with the highest similarity scores: essentially, given the similarity criteria outlined above, the four pitchers whose median base-year performance was most similar to our selected player's base-year performance. The base year is our selected pitcher's 2013 season and, for each of the comparable pitchers, his season at the age corresponding to our selected player's age in 2013.
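The exact similarity formula is not spelled out above, so the following is only one plausible sketch of how a four-pitcher comparable set could be picked: standardize each feature across the pool, then keep the four candidates nearest the selected pitcher by Euclidean distance. The feature list is a subset of the criteria described, the scoring method is an assumption, and every name and number in the sample data is made up for illustration.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature names covering body type, SIERA components and velocity.
FEATURES = ["height_in", "weight_lb", "k_pct", "bb_pct", "gb_fb", "fb_velo"]


def standardize(rows, features):
    """Convert each feature to z-scores across the pool so no unit dominates."""
    stats = {}
    for f in features:
        vals = [r[f] for r in rows]
        sd = pstdev(vals)
        stats[f] = (mean(vals), sd if sd > 0 else 1.0)
    return [{f: (r[f] - stats[f][0]) / stats[f][1] for f in features} for r in rows]


def top_comparables(target, candidates, features=FEATURES, n=4):
    """Return the names of the n candidates closest to the target pitcher."""
    scaled = standardize([target] + candidates, features)
    t = scaled[0]
    distances = []
    for cand, s in zip(candidates, scaled[1:]):
        d = math.sqrt(sum((s[f] - t[f]) ** 2 for f in features))
        distances.append((d, cand["name"]))
    return [name for _, name in sorted(distances)[:n]]


def row(name, h, w, k, bb, gbfb, velo):
    return dict(name=name, height_in=h, weight_lb=w, k_pct=k,
                bb_pct=bb, gb_fb=gbfb, fb_velo=velo)


# Made-up same-age seasons; none of these are real pitchers.
target = row("Selected", 76, 235, 21.0, 6.0, 1.50, 95.5)
candidates = [
    row("A", 76, 230, 20.5, 6.2, 1.45, 95.0),
    row("B", 75, 240, 21.5, 5.8, 1.55, 96.0),
    row("C", 77, 238, 22.0, 6.5, 1.40, 94.8),
    row("D", 76, 228, 20.0, 5.5, 1.60, 95.2),
    row("E", 70, 180, 14.0, 10.0, 0.80, 88.0),
    row("F", 71, 185, 15.0, 9.5, 0.85, 88.5),
]
comps = top_comparables(target, candidates)
print(comps)
```

In practice the candidate pool would already be filtered to seasons at the matching age and to pitchers with the same primary/secondary pitch types, which this sketch does not attempt.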
Because the prompt offered a set dollar value on marginal wins and wins above replacement are a good indicator of production as viewed through a lens of contribution to a club, we used our comparable player data groups to forecast both "comparative projections" and "league-weighted projections" for WAR over the course of each player's club control years. The comparative projections for each player's 2014 were made by taking the median value of the changes (∆) in WAR between the base year (matching age) and +1 year (following year) for the players in the comparable group. When a comparable player lacked a base-year (rookie year at an older age), a value was substituted by backwards forecasting WAR based on league average for pitchers at that age. This median change was applied to the selected player's base year WAR to project a +1 year (2014) figure, and by calculating the median percentage change of the comparable players' WARs year-to-year, we extended the projections through the end of club control. Maintaining age correlation allowed these projections to begin to fit starting pitcher aging curves, but with comparable player data sets of only four, the forecast lacked depth without inclusion of league trends.
For this reason, our league-weighted trend is derived from the league-average percentage change in WAR from one age to the next for starting pitchers between the 2002 and 2013 seasons, and is weighted equally with the comparative projections. This provides both depth (322 qualified starting pitchers over a 12-year period) and a conservative backing to production estimates, given the inherent possibility that a young, elite pitcher ultimately deviates from the paths of similar players, because of injury or otherwise, and performs closer to the average. Taken together, the comparative and league-weighted projections tightly fit the aging curve and provided insight into how production progression varies by type of pitcher, evident especially when league-weighted projections surpassed comparative estimates as the high-end projection by age (a trend most clear in power pitchers' late 20s).
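The two projection tracks can be sketched in a few lines. This is one reading of the method described above, not our actual spreadsheet: the year +1 figure adds the comparables' median change in WAR to the base year, later years compound by the comparables' median percentage change, and the league trend compounds the base year by league-average age-to-age changes before being averaged 50/50 with the comparative path. All WAR paths and league percentages below are invented for illustration, and the sketch does not handle a comparable with a missing base year or a zero-WAR season in a denominator.

```python
from statistics import median


def comparative_projection(base_war, comp_paths):
    """Project WAR for years +1, +2, ... from comparable pitchers' WAR paths.

    Each path starts at the comparable's matching-age base year.
    """
    horizon = min(len(path) for path in comp_paths) - 1
    # Year +1: median absolute change from base year to the following year.
    first_step = median(path[1] - path[0] for path in comp_paths)
    projection = [base_war + first_step]
    # Later years: median year-over-year percentage change among the comparables.
    for t in range(2, horizon + 1):
        pct = median((path[t] - path[t - 1]) / path[t - 1] for path in comp_paths)
        projection.append(projection[-1] * (1 + pct))
    return projection


def league_trend_projection(base_war, base_age, league_pct_change, years):
    """Compound base-year WAR by league-average changes keyed by starting age."""
    path, war = [], base_war
    for step in range(years):
        war *= 1 + league_pct_change[base_age + step]
        path.append(war)
    return path


def equal_weight_blend(comparative, league_trend):
    """Average the two paths year by year (equal weighting)."""
    return [(c + l) / 2 for c, l in zip(comparative, league_trend)]


# Hypothetical inputs: four comparables and a made-up league aging trend.
comp_paths = [
    [2.0, 3.0, 3.6, 3.6],
    [2.5, 3.0, 3.3, 3.63],
    [3.0, 4.0, 4.0, 4.4],
    [1.5, 3.0, 3.0, 2.7],
]
league_pct_change = {22: 0.10, 23: 0.05, 24: 0.0}

comparative = comparative_projection(2.0, comp_paths)
league = league_trend_projection(2.0, 22, league_pct_change, years=3)
blended = equal_weight_blend(comparative, league)
print(comparative, league, blended)
```

Using medians rather than means keeps a single outlier among only four comparables from dragging the whole path, which is the main reason for that choice here.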
Part Three — Translating performance into dollar valuation
A requisite first step in putting a dollar value on young pitchers is to project pre-arbitration salary (two years' worth for projected Super Two eligible players and three years' worth otherwise), building off of salaries already determined for the upcoming season. For a more comprehensive idea of how clubs determine pre-arbitration salaries, MLB Trade Rumors has published a good piece here. For players on clubs that deviated from their regular practice of paying all pre-arbitration players at or slightly above the league minimum without significant raise, we were forced to project future pre-arbitration raises based on the percentage raise received over the league minimum, applied to the projected minimums for further pre-arbitration years (assuming 4% salary inflation), given the lack of precedent for determining whether those clubs will then stay put or continue to escalate pay after the second year of eligibility. Conversely, for clubs with apparent set payment structures between the first and second, and second and third, years of pre-arb eligibility, salaries were determined by club trends when signing comparable pitchers with correlating service time: typically a raise of between 4-6% and 7-8% for the last two seasons before arbitration. Lastly, players on clubs that have made their formulaic approach public were fit to those models when appropriate. Though these salaries are difficult to predict, the relatively small sums and comparatively small variation of pre-arbitration salaries make these estimations matter much less than the projections of potential salary awards in arbitration.
Because arbitration effectively operates under a much more traditional set of evaluation standards, we (begrudgingly) set aside our projected marginal win values during arbitration-eligible years and, for accuracy's sake, looked instead to the two things that actually matter in hearings: counting statistics and comparable players. As a result, we looked back at the comparable player data sets we had previously used to project marginal wins, and instead compiled the group's collective counting statistics during ages matching those of our selected player's arbitration-eligible years to forecast performance in the categories that drive starting pitcher arbitration hearings (innings pitched, earned runs, hits, walks and strikeouts). The resulting forecasted counting statistics were aligned with the statistics of eligible players with correlating major league service time, finding comps in the arbitration sense: those used to determine the ceiling or floor arbitration value on which awards are based.
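For the clubs that paid a premium over the minimum, the projection described above amounts to carrying the same percentage premium forward onto an inflated minimum. A minimal sketch, assuming the premium stays constant and the minimum grows 4% per year; the salary and league-minimum figures in the example are hypothetical placeholders, not any player's actual numbers.

```python
def project_prearb_salaries(current_salary, current_minimum, future_years,
                            minimum_growth=0.04):
    """Project remaining pre-arbitration salaries.

    Assumption: the player's percentage premium over the league minimum
    is held constant while the minimum itself inflates each year.
    """
    premium = current_salary / current_minimum - 1
    salaries = []
    minimum = current_minimum
    for _ in range(future_years):
        minimum *= 1 + minimum_growth          # projected league minimum
        salaries.append(minimum * (1 + premium))
    return salaries


# Hypothetical: a $550,000 salary against a $500,000 minimum (10% premium),
# with two pre-arbitration seasons still to come.
projected = project_prearb_salaries(550_000, 500_000, future_years=2)
print([round(s) for s in projected])
```

Whether a club keeps the premium flat or escalates it is exactly the uncertainty noted above, so the constant-premium rule is a simplification rather than a prediction.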
Part Four — Risk assessment
While risk assessment factors are difficult to quantify and allow for what we feel is a necessary "eye test," they remained important not only as a way of trimming a long list of eligible players to a more manageable number for our allotted four-day window, but also as indicators of whether early career success is likely repeatable. Physical defects, such as past or current injuries, served primarily to trim the list. Had Matt Harvey not undergone Tommy John surgery this past year he likely would have been near the top of the list; similarly, it's difficult to predict future success of young pitchers who have already undergone the operation (Danny Salazar) as results have varied so wildly in the still relatively small sample (248 pitchers before excluding relievers) without any sense of factors driving post-op success. The current state of a pitcher's repertoire (overreliance on fastballs for strikeouts, the lack of offspeed pitches or a high wRC+ against them) tends to be more informative as a negative indicator of future success than as a positive one; however, given the general lack of big league experience for our players, the lack of a third pitch is less of a concern.
Perhaps the biggest piece of risk assessment lies in the number of career pitches thrown, and the corresponding increases in innings pitched per year by age. Although we would have liked to incorporate these into our statistical analysis rather than only as a talking point, the notion did again serve to narrow our window of eligible players. The thought process was most recently discussed in the wake of New York's seven-year, $155 million investment in Masahiro Tanaka despite his having already thrown 1,315 innings through his age-24 season (and 186 innings at age 18) in Japan. The trend in pitcher forecasting, as we saw hotly debated with Stephen Strasburg, centers on innings limits and their effect on performance and longevity, with one camp urging us to look to Greg Maddux and another agreeing with the, for lack of a better word, coddling of young pitchers through slow progression of workload. While we weren't able to get into the subject in any sort of depth, do read Tom Verducci's column from earlier this year (he keys in on two of our top three pitching assets).
The pre-arbitration and arbitration salary estimates taken together over a five or six-year window (depending on the player) provide the low value in calculating surplus value. While we have to assume (for players who haven't already received extensions buying out their arbitration years) that they will make it through to free agency without being extended, comparing projected actual salary over years of club control with market value as standardized at $6 million per marginal win (in 2014, and inflated at a rate of 5% per year) allows a surplus value to be calculated between a pitcher's market value to the club and the money they are likely to actually receive, with a range provided for market values based on disparity between our comparative projections and league-weighted projections.
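The surplus calculation described above reduces to market value minus projected pay, computed once for each projection track to give a range. The sketch below assumes the $6MM win value compounds at 5% per season from 2014; the WAR paths and salaries are hypothetical placeholders in $MM, not any of our actual projections.

```python
def market_value(war_by_year, base_win_value=6.0, growth=0.05):
    """Sum of WAR x ($MM per win), with the win value inflating each season."""
    return sum(
        war * base_win_value * (1 + growth) ** i
        for i, war in enumerate(war_by_year)
    )


def surplus_range(comparative_war, league_war, salaries):
    """Return (low, high) surplus in $MM across the two projection tracks."""
    cost = sum(salaries)
    values = (market_value(comparative_war), market_value(league_war))
    return min(values) - cost, max(values) - cost


# Hypothetical two-season example, all figures in $MM.
low, high = surplus_range(
    comparative_war=[3.0, 3.0],
    league_war=[2.0, 2.0],
    salaries=[0.5, 0.6],
)
print(f"surplus between ${low:.1f}MM and ${high:.1f}MM")
```

Taking the minimum and maximum rather than fixing which track is "low" reflects that either projection can come out ahead depending on the pitcher.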
A note regarding the results: for the sake of length, we'll primarily let the results speak for themselves through illustrations from our presentation, supplemented by short explanations when necessary.
#3 — Gerrit Cole (Pittsburgh)
Credit: Jonathan Dyer
Our projected third most valuable pitching asset in baseball is Pittsburgh's Gerrit Cole. The six-foot, four-inch, 235 pound right-handed starter from UCLA started 19 games in his 2013 rookie campaign (plus two playoff starts), and assuming he remains with the big league club indefinitely, will be arbitration eligible after the 2016 season (major league service time: 0.111). Cole is a clear cut power pitcher, given his three most thrown pitches are a four-seam fastball (51.1%), two-seam fastball (18.5%) and cutter (13.8%) travelling at (excluding the cutter) an average of 95.6 MPH (max: 101 MPH). His big frame/heavy fastball mold has enabled the development of an intimidating mound presence: aggressive (63.8% first-pitch strike percentage) yet controlled (0 BB in high leverage situations).
"He's as big-game as we've seen ... just watching him — I mean, enormous pitches, on the road, very big game — and I'm out there thinking, this is really impressive."
—Neil Walker on Cole's 2013 NLDS Game 2 start
Comparable player set:
The comparable player data set allowed us, as previously described, to forecast Cole's comparative projected and league-weighted projected WAR over the course of the next six years (extent of Pittsburgh club control).
Given Pittsburgh's apparent trend of following a set payment structure between the first and second, and second and third years of pre-arb eligibility (as seen with Zach Duke, Paul Maholm, Charlie Morton and Ross Ohlendorf), Cole's pre-arbitration salaries were determined by following the percentage increases the club has given when signing comparable pitchers with correlating service time. The average seasonal performances of Zambrano, Cain, Johnson and Lackey taken together to project each of Cole's potential arbitration platform years and corresponding career contributions to date allowed us to find three past cases from which we derived projected award earnings.
Building off Cole's 2014 salary of $512,500, his pre-arb earnings (projected to total over $1.6MM) plus his projected earnings in arbitration (roughly $25.8MM) come to nearly $27.5MM, his projected actual earnings over the course of club control. In comparison, given the WAR forecasts and the assumed $6MM market value per marginal win, Cole's market value falls anywhere between $118.6MM (comparative projections) and $124MM (league-weighted projections), making his surplus value to the club anywhere between the total surplus values highlighted in blue and green below.
Despite the risk of power pitchers overly reliant on fastball velocity for strikeouts more sharply declining after the age of 25 than more well-rounded pitchers, Cole remains a top-three valuable pitching asset in the league today given the disparity between the value of his short-term upside and the relatively little amount of money he projects to make before free agency.
#2 — Michael Wacha (St. Louis)
Credit: USA TODAY Sports
Our projected second most valuable pitching asset in baseball is Michael Wacha. The six-foot, six-inch, 210 pound right-handed starter from Texas A&M started nine games, appearing in 15 (plus an additional five playoff starts) in his 2013 rookie campaign, and assuming he remains with the big league club indefinitely, will be arbitration eligible after the 2016 season (major league service time: 0.062). Wacha is long and totes one of the league's best changeups (52 wRC+ against) in addition to a mid-90s fastball (93.1 MPH average, 97.6 max). Wacha was awarded the NLCS MVP after making two scoreless starts in the series (13.2 IP) and has captivated with two no-hit bids (one out away, five outs away)—earning praise around the league for both his composed demeanor on the field and his humbleness off it.
"His presence, his demeanor ... he's confident, but not disrespectful. He believes in himself, but he also believes in learning and understanding the game and listening to others ... he's not scared of failure."
—Chris Carpenter on Wacha
Comparable player set:
The comparable player data set allowed us, as previously described, to forecast Wacha's comparative projected and league-weighted projected WAR over the course of the next six years (extent of St. Louis club control).
Given that St. Louis, like Pittsburgh, follows an apparent set payment structure between the first and second, and second and third, years of pre-arb eligibility (as seen with Shelby Miller, Joe Kelly, Lance Lynn and Adam Wainwright), Wacha's pre-arbitration salaries were determined by following the percentage increases the club has given when signing comparable pitchers with correlating service time. The average seasonal performances of Beckett, Verlander, Garland and Scherzer, taken together to project each of Wacha's potential arbitration platform years and corresponding career contributions to date, allowed us to find three past cases from which we derived projected award earnings.
Building off Wacha's 2014 salary of $510,000, his pre-arb earnings (projected to total over $1.6MM) plus his projected earnings in arbitration (roughly $23.4MM) come to nearly $25MM, his projected actual earnings over the course of club control. In comparison, given the WAR forecasts and the assumed $6MM market value per marginal win, Wacha's market value falls anywhere between $150.2MM (comparative projections) and $107.2MM (league-weighted projections), making his surplus value to the club anywhere between the total surplus values highlighted in blue and green below.
As with Cole, we run an inherent risk in basing projections on so few big league starts; however, unlike Cole, Wacha's size and developed secondary pitch indicate heightened sustainability with age. Though Wacha has yet to fully develop a second offspeed pitch (he threw a curveball on 5% of total pitches), his previously described coachability and desire to learn indicate it's likely to come with time, especially given the opportunity to learn from Adam Wainwright (curveball: career 20 wRC+ against). Another risk with Wacha (mentioned in Verducci's previously linked article) is the heavy innings load he was burdened with this past year due to the five additional playoff starts.
"The Cardinals began 2013 with the idea of limiting Michael Wacha to a range of 160-180 innings. Wacha was the 19th overall pick of the 2012 draft who threw 134 innings [in 2012] between his work at Texas A&M and the minors. St. Louis' ideal innings jump for Wacha was between 26 and 46 innings — including the possibility of being available if the Cardinals made the postseason."
After pitching over 30 innings in the postseason, he closed the year at 179.2 innings, a 45.2 IP jump. As Verducci aptly described: "that's a cause for concern, but not a full-scale alarm."
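The innings figures above use baseball's box-score notation, where the digit after the decimal counts outs (thirds of an inning) rather than tenths, so the jump cannot be checked with ordinary decimal subtraction. A small sketch of the conversion, with helper names that are purely illustrative:

```python
def ip_to_outs(ip: str) -> int:
    """Convert innings notation such as '179.2' (179 and 2/3 innings) to outs."""
    whole, _, frac = ip.partition(".")
    partial = int(frac) if frac else 0
    if partial not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return int(whole) * 3 + partial


def outs_to_ip(outs: int) -> str:
    """Convert a count of outs back to innings notation."""
    return f"{outs // 3}.{outs % 3}"


# Wacha's 2013 total against the 134 innings quoted for 2012.
jump = outs_to_ip(ip_to_outs("179.2") - ip_to_outs("134"))
print(jump)  # 45.2
```

The 45.2-inning jump sits at the very top of the 26-to-46-inning range St. Louis had targeted.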
#1 — Jose Fernandez (Miami)
Credit: Steve Mitchell
Finally, our projected most valuable pitching asset in baseball is Miami's Jose Fernandez. The six-foot, two-inch, 240 pound Cuban right-hander started 28 games in his 2013 rookie campaign, and assuming he remains with the big league club indefinitely, will be arbitration eligible after the 2015 season (major league service time: 1.000). Fernandez has already established himself as one of the league's elite: winning the NL Rookie of the Year and finishing third in the NL Cy Young voting, capping his rookie campaign with a 2.19 ERA (but a 2.73 FIP and 3.15 SIERA). Like Wacha, Fernandez possesses a mid 90s fastball (average velocity 94.9 MPH, max 99.2 MPH) and an elite second pitch—his curveball (thrown 21.3% of the time) against which hitters posted a 4 wRC+. While some have questioned the 20-year-old's maturity (see: Brian McCann incident), don't—instead, read the backstory of his defection from Cuba and understand both his desire to play the game and the resiliency displayed in pursuit of being able to do so.
Comparable player set:
The comparable player data set allowed us, as previously described, to forecast Fernandez's comparative projected and league-weighted projected WAR over the course of the next five years (extent of Miami club control).
Unlike St. Louis and Pittsburgh, Miami has rarely deviated from paying essentially the league minimum to all pre-arbitration players without significant raise. The club nonetheless gave Fernandez a 27% raise this past offseason, making the forecast of his last year of pre-arbitration salary guesswork. As a result, although Fernandez should probably be extended, we were forced to project his 2015 salary based on this year's raise as applied to next year's projected minimum (yet to be determined, based on a cost of living adjustment). The average seasonal performances of Hernandez, Greinke, Kershaw and Gallardo, taken together to project each of Fernandez's potential arbitration platform years and corresponding career contributions to date, allowed us to find three past cases from which we derived projected award earnings, although with somewhat less accuracy: we lacked comps as tight (especially in his first year, where he fell between Jered Weaver and David Price) for such a potential statistical outlier, as most starting pitchers this good are extended before their later arbitration years.
Building off Fernandez's 2014 salary of $635,000, his pre-arb earnings (projected to total over $1.4MM) plus his projected earnings in arbitration (nearly $34MM) come to roughly $35.3MM, his projected actual earnings over the course of club control. In comparison, given the WAR forecasts and the assumed $6MM market value per marginal win, Fernandez's market value falls anywhere between $168MM (comparative projections) and $170MM (league-weighted projections), making his surplus value to the club anywhere between the total surplus values highlighted in blue and green below.
While Fernandez does have a full year's worth of big league experience, making him somewhat safer to project, one metric presents a red flag and raises the question of whether he can sustain the torrid pace with which he started his career. His BABIP this past year was .242, and again, there exist two schools of thought in deciphering meaning from that figure. Either (a) BABIP is out of his control and, because the average is around .300, he is due for regression of biblical proportions; or (b) his stuff really is that good, and his 27.5% K rate is indicative of his league-best ability to induce weaker contact. Regardless, his BABIP is likely to regress closer to the average for one reason or another.
To close, I want to thank you if you're still reading and cap our response with a brief conclusion. To answer the prompt more generally, we find basic asset value in two general elements of today's pitcher pool: (1) the fewer innings sunk into a pitcher's arm the better; and (2) size, mechanics, pitch repertoires and the resulting tight comparisons with successful pitchers are the best means of indicating future success, especially in the statistics that compose SIERA. Ultimately, while we usually see these talented young arms paid closer to market value in early extensions, the practice evidently isn't economical from a club's perspective, as illustrated by the gross surplus value we see accumulated when pitchers are held onto throughout club control.
Feel free to share your response in the comments; we appreciate the feedback.
. . .
Joshua Mastracci is a contributor to Beyond the Box Score. You can follow him on Twitter at @joshuamastracci.