Hailing from Canada (although I am from a part of the country that is actually further south than the Seattle Mariners), I'm used to winter. It's the cold and baseball-less season.
During this time, though, I like to get some of my favourite work in. Previously, I've spent my Christmas breaks honing my writer's craft, familiarizing myself with statistics and working on my own park factors.
This winter, I wanted to try something different. Lately, I've switched my interest to predictive measures. Sure, evaluating a player's worth over a span of time (with wins above replacement, for example) is extremely valuable. But what's really fascinating to me is how systems like Steamer project a player's worth before it happens. Actually, not only is this fascinating; it's marketable. With front offices starting to catch up with each other, the next step to "Winning an Unfair Game" is fashioning ways to predict which players will succeed in a major league environment.
So, instead of just posting my findings, I thought it might be more interesting to show my process and to ask for feedback from you. There's a documentary film about Enron I really like, entitled Enron: The Smartest Guys in the Room. I especially like it for its title because it makes me think of sample sizes: how big was this room, and how many people were actually in it? I bring this up because a platform like this makes the room much, much larger. And to wholly dismiss feedback from a room that large, given the intelligence of our readership, would be a huge mistake.
Where did the idea come from?
I was reading one of Scott Lindholm's articles in which he discussed his frustration with the win. And rightfully so: it's a nuisance. Many strides have been made toward measuring a pitcher's performance more effectively than wins do, but there is no real replacement. At first, this frustrated me. I'm not a big fan of counting statistics but, in theory, a replacement wouldn't be difficult to build. First, you take the overall percentage of games in which starters were awarded a win in a previous era. Then, you take a specific modern year's Bill James Game Score data. Next, you find the Bill James Game Score that is met or exceeded in the same percentage of starts as wins were awarded in that previous era. Any start with a Bill James Game Score at or above that arbitrary set point is given a "Bill James Win." That is to suggest that, according to the Bill James Game Score, the performance was worth a "win" in a previous era. For example, if 60% of games during Koufax's era ended in a win for the starter, and 60% of starts last season earned a Bill James Game Score of 74 or higher, every start with a 74 or better would be granted a win.
This is admittedly very futile, but if you wanted a replacement for the win, this does the job, however arbitrarily.
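For anyone who wants to play along, the "Bill James Win" threshold described above can be sketched in a few lines of Python. This is only a rough illustration: the function names, the rounding choice and the ten sample Game Scores are all made up for the example, and the 60% win rate is the same hypothetical figure used above, not a measured one.

```python
def win_threshold(game_scores, win_rate):
    """Lowest Game Score that still earns a 'Bill James Win'.

    game_scores: Game Scores for every start in the modern season.
    win_rate: share of starts credited with a win in the reference era
              (hypothetical input; 0.6 mirrors the example in the text).
    """
    if not game_scores:
        raise ValueError("need at least one Game Score")
    if not 0 < win_rate <= 1:
        raise ValueError("win_rate must be between 0 (exclusive) and 1")
    ranked = sorted(game_scores, reverse=True)  # best starts first
    # How many starts should be handed a win (always at least one).
    n_wins = max(1, round(win_rate * len(ranked)))
    return ranked[n_wins - 1]


def bill_james_wins(game_scores, win_rate):
    """True for every start at or above the threshold.

    Ties at the cutoff all get a win, so the awarded share can end up
    slightly above win_rate.
    """
    cutoff = win_threshold(game_scores, win_rate)
    return [score >= cutoff for score in game_scores]


# Made-up sample of ten starts, purely for illustration.
sample_scores = [30, 45, 50, 55, 62, 68, 71, 74, 80, 91]
print(win_threshold(sample_scores, 0.6))
print(bill_james_wins(sample_scores, 0.6))
```

Ranking the starts and counting down from the top is the simplest way to match a win rate; a proper percentile function would do the same job on a full season of data.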
Acknowledging that this was futile (though perhaps a fun exercise for myself later), I started thinking: is there a way to actually predict a pitcher's performance against a specific lineup? As stated previously, it's nice to have the Bill James Game Score after the game, but wouldn't it be more helpful before the game started?
To recap, here's what the Bill James Game Score formula looks like:
50 + (3 x IP) + (2 x complete IP after 4) + K - (2 x H) - (4 x ER) - (2 x unER) - BB
It's not a perfect measure, but it might get a facelift of its own shortly.
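As a quick sanity check, the Game Score formula above translates directly into Python. This is a minimal sketch: the function name and arguments are my own, and it takes outs recorded rather than innings pitched so that the "3 x IP" term never has to deal with 6.1 or 6.2 innings notation. The pitching line at the bottom is invented for the example.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James Game Score for a single start.

    outs: outs recorded by the starter (3 x IP in the formula above).
    """
    # Full innings completed after the fourth each earn a 2-point bonus.
    completed_after_fourth = max(0, outs // 3 - 4)
    return (
        50
        + outs                        # 3 x IP
        + 2 * completed_after_fourth  # 2 x complete IP after 4
        + strikeouts
        - 2 * hits
        - 4 * earned_runs
        - 2 * unearned_runs
        - walks
    )


# Invented line: 7 IP (21 outs), 8 K, 5 H, 2 ER, 0 unearned runs, 2 BB.
print(game_score(outs=21, strikeouts=8, hits=5, earned_runs=2,
                 unearned_runs=0, walks=2))
```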
I started hastily (which is my excuse for the first draft being so flawed) by choosing which variables I wanted to consider. First, the pitcher's recent form: I decided on xFIP over his previous 12 starts. Second, the pitcher's splits: I decided on wOBA vs LHH and vs RHH. To make the splits matter, I would weight each one by the number of LHH and RHH in the opposing lineup. Third, to represent the lineup itself, I took the career wOBA of its hitters. I divided this into two parts: the wOBA of the first four hitters and the wOBA of the fifth-through-eighth hitters. Knowing that the first set of hitters comes up to bat more often than the second, I wanted that represented as well, so the first group gets the larger weight. Then I wanted a luck index: the pitcher's BABIP. Lastly, the park factor: I used BPERA, which I created on my own time a few winters ago.
Without further ado, here's the first iteration of my predictive Bill James Game Score (or pBJGS for short):
pBJGS = 100 + ((18 - xFIP of previous 12 starts) x 10) - (wOBAvLHH x (60 x # of LHH in Opposing Lineup)) - (wOBAvRHH x (60 x # of RHH in Opposing Lineup)) - (wOBA of 1-4 hitters x 50) - (wOBA of 5-8 hitters x 32.5) - (BABIP x 15) - BPERA
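If it helps to poke at the numbers, this first-draft formula can be written as a small Python function. Treat it as a sketch of the draft above and nothing more: the argument names are my own shorthand, every input in the example is invented, and the BPERA value in particular is just a placeholder since its scale isn't defined here.

```python
def pbjgs(xfip_last_12, woba_vs_lhh, woba_vs_rhh, n_lhh, n_rhh,
          woba_hitters_1_4, woba_hitters_5_8, babip, bpera):
    """First-draft predictive Bill James Game Score (pBJGS).

    Each term mirrors one term of the formula in the text, in order.
    """
    return (
        100
        + (18 - xfip_last_12) * 10      # recent form
        - woba_vs_lhh * (60 * n_lhh)    # split vs LHH, scaled by LHH count
        - woba_vs_rhh * (60 * n_rhh)    # split vs RHH, scaled by RHH count
        - woba_hitters_1_4 * 50         # top of the order, weighted heavier
        - woba_hitters_5_8 * 32.5       # hitters five through eight
        - babip * 15                    # luck index
        - bpera                         # park factor
    )


# Invented inputs for a hypothetical start against a 4 LHH / 5 RHH lineup.
example = pbjgs(
    xfip_last_12=3.50,
    woba_vs_lhh=0.300,
    woba_vs_rhh=0.310,
    n_lhh=4,
    n_rhh=5,
    woba_hitters_1_4=0.340,
    woba_hitters_5_8=0.310,
    babip=0.290,
    bpera=4.0,  # placeholder value; scale assumed
)
print(example)
```

Writing it out this way makes it easy to see how heavily the two split terms lean on the lineup's handedness counts compared with everything else.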
Please bear in mind a few things. Firstly, this is no longer the formula I'm using; I'm showing the process because I think it makes for a more enjoyable read. I'm aware the above formula is (very) flawed, so please be gentle. I do have a more effective one, and I'm currently collecting data for it. I will add that, in a very small sample, this first version was effective at predicting starts by Sonny Gray and Clay Buchholz ... though woefully bad at predicting starts by Clayton Kershaw. My hope is that, together, we can make pBJGS fully operational after two more posts. Thank you for your feedback.
. . .
All statistics courtesy of FanGraphs.
Michael Bradburn is a Featured Writer for Beyond the Box Score. You can follow him on Twitter at @mwbii. You can also reach him at firstname.lastname@example.org