
Developing a predictive Bill James Game Score: Part 3

The (un)epic conclusion to the predictive Bill James Game Score chronicles.

I have made you wait long enough. It's time for the big reveal of the sweat of my labor. Or the fruit of my brow. Never mind.

pBJGS = 100 + ((10 - xFIP from previous 12 appearances) x 8.33) - ((((((Pitcher's wOBA against LHH x number of LHH in opposing lineup) + (Pitcher's wOBA against RHH x number of RHH in opposing lineup)) / 8) x 1000) - 260) x 0.666) - (((wOBA of 1-4 hitters in opposing lineup x 1000) - 260) x 0.386) - (((wOBA of 5-8 hitters in opposing lineup x 1000) - 260) x 0.28) - (BABIP x 20) - BPERA

Magnificent, isn't it? And by magnificent I mean ... borderline unreadable.

What is it? It's my formula for predicting Game Score for a particular pitcher against a particular lineup. And, if you're curious about the methodology, perhaps I can interest you in my previous posts.

Part 1! Part 2!
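Since the one-line version is hard to parse, here is a minimal Python sketch that evaluates the formula term by term. It is an illustration, not the tool used to produce the numbers in this post, and it assumes the following: the second lineup term uses the No. 5-8 hitters (the tiered weighting described below); wOBA and BABIP are entered as decimals such as 0.322; and BABIP and BPERA arrive as already-computed numbers, since the formula doesn't spell out whose BABIP it is or exactly how BPERA is defined. The function and parameter names are hypothetical.

```python
def predicted_game_score(
    xfip_last_12: float,
    woba_vs_lhh: float,
    woba_vs_rhh: float,
    num_lhh: int,
    num_rhh: int,
    lineup_woba_1_to_4: float,
    lineup_woba_5_to_8: float,
    babip: float,
    bpera: float,
) -> float:
    """Hypothetical helper: evaluate pBJGS exactly as the formula is written."""
    # Pitcher quality: every point of xFIP below 10 adds 8.33 points.
    pitcher_term = (10 - xfip_last_12) * 8.33

    # Platoon term: the pitcher's split wOBA weighted by the lineup's
    # handedness counts, divided by 8, then expressed in wOBA "points"
    # above a 0.260 baseline (x1000 - 260).
    split_woba = (woba_vs_lhh * num_lhh + woba_vs_rhh * num_rhh) / 8
    platoon_term = (split_woba * 1000 - 260) * 0.666

    # Tiered lineup terms: the top of the order carries more weight.
    top_term = (lineup_woba_1_to_4 * 1000 - 260) * 0.386
    bottom_term = (lineup_woba_5_to_8 * 1000 - 260) * 0.28  # assumed No. 5-8

    return (100 + pitcher_term - platoon_term - top_term - bottom_term
            - babip * 20 - bpera)


# Made-up inputs, purely to show the call; not any real pitcher's numbers.
score = predicted_game_score(
    xfip_last_12=3.50, woba_vs_lhh=0.300, woba_vs_rhh=0.280,
    num_lhh=3, num_rhh=5, lineup_woba_1_to_4=0.320,
    lineup_woba_5_to_8=0.290, babip=0.300, bpera=3.50,
)
print(round(score, 2))  # 94.77
```

Written this way, it is easier to see that every lineup term is a penalty (or, below 0.260, a bonus) measured against a 0.260 wOBA baseline.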

Instead of re-capping everything, I'll quickly hit the points that made this the final* iteration.

* - DISCLAIMER: nothing is actually final about this; it will definitely change. That goes especially for the way I've weighted xFIP at the beginning, which, even as I write this, I'm not 100% happy with.

Only one truly important part has changed from previous versions: the way I've weighted the opponents' wOBA and the pitcher's splits. I found that the opposing lineup had very little impact on the formula's output. For instance, whether Sonny Gray faced the Angels or the Phillies, the predicted scores were a mere five points apart. Naturally, that wasn't nearly good enough: the formula was really just measuring how good a pitcher was, which wouldn't have been anything new. By weighting the lineup in tiers (the No. 1-4 hitters more heavily than the No. 5-8 hitters), the predictions now show much more variance.

Other than that, I made a minor tweak so that xFIP covers a pitcher's last 12 appearances rather than his last 12 starts. This may skew the data in favour of relievers who happen to be making a start, so I'd caution against using the formula to predict those outings.

Once I was done tinkering, it was data collection time. I somewhat arbitrarily chose to use all the games from June 1, 2014. This is still a small sample size, but data collection is extremely laborious for this exercise, so please forgive me. I did, however, have some reasons for choosing June 1. First, it was a Sunday, which usually means every team plays (and in this case, every team did). Next, it was deep enough into the season to render the "pitchers are ahead of hitters" argument moot. Last, it wasn't so deep into the season that the pitchers had already made 12 starts, which meant I had to include at least one game from the previous season in each pitcher's xFIP trend. That's true of every pitcher in the table below except Wade Miley, who, as you may remember, started his season in Australia against the Dodgers. Although this was somewhat annoying, there was nothing I could do about it and, frankly, it was healthy to have some variance within the sample.
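The "last 12 appearances" window that spills back into the previous season can be sketched as follows. This assumes a hypothetical per-appearance game log and computes xFIP from the pooled components using the standard FanGraphs definition (FIP with home runs replaced by fly balls times the league HR/FB rate). The league HR/FB rate and FIP constant below are placeholders, not the actual 2013 or 2014 values, and the post doesn't say whether the 12 appearances were pooled this way or averaged game by game.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One game-log line (hypothetical fields)."""
    season: int
    game_number: int  # order of the appearance within the season
    outs: int         # outs recorded; avoids the 6.1 = 6 1/3 IP notation
    strikeouts: int
    walks: int
    hit_by_pitch: int
    fly_balls: int


def last_n_appearances(log: list, n: int = 12) -> list:
    """Most recent n appearances, crossing into earlier seasons if needed.

    A pitcher with fewer than n appearances (a rookie, say) simply gets
    all of them.
    """
    ordered = sorted(log, key=lambda a: (a.season, a.game_number))
    return ordered[-n:]


def pooled_xfip(apps: list,
                lg_hr_per_fb: float = 0.095,   # placeholder league HR/FB
                fip_constant: float = 3.10) -> float:  # placeholder constant
    """xFIP computed over the pooled totals of the given appearances."""
    innings = sum(a.outs for a in apps) / 3
    k = sum(a.strikeouts for a in apps)
    bb_hbp = sum(a.walks + a.hit_by_pitch for a in apps)
    fb = sum(a.fly_balls for a in apps)
    return (13 * fb * lg_hr_per_fb + 3 * bb_hbp - 2 * k) / innings + fip_constant


# Made-up log: 5 starts last season plus 9 this season, so the window spans both.
log = [Appearance(2013, g, 18, 6, 2, 0, 6) for g in range(1, 6)]
log += [Appearance(2014, g, 18, 6, 2, 0, 6) for g in range(1, 10)]
window = last_n_appearances(log)
print(len(window), sum(a.season == 2013 for a in window))  # 12 3
print(round(pooled_xfip(window), 3))  # 3.335
```

Pooling the components, rather than averaging 12 single-game xFIPs, keeps a short relief outing from counting as much as a seven-inning start.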

Here is the table with the variables removed for simplicity's sake (BJGS is the pitcher's actual Bill James Game Score, pBJGS is the predicted value, and the pitcher's team is listed in capitals):

Predictive Bill James Game Score

| Date   | Game                | Pitcher         | BJGS | pBJGS |
|--------|---------------------|-----------------|------|-------|
| 6/1/14 | ROCKIES @ Indians   | Jhoulys Chacin  |   44 |    40 |
| 6/1/14 | INDIANS v Rockies   | Josh Tomlin     |   58 |    25 |
| 6/1/14 | TWINS @ Yankees     | Phil Hughes     |   72 |    78 |
| 6/1/14 | YANKEES v Twins     | Chase Whitley   |   59 |    21 |
| 6/1/14 | ROYALS @ Blue Jays  | Jeremy Guthrie  |   56 |    35 |
| 6/1/14 | BLUE JAYS v Royals  | Mark Buehrle    |   72 |    49 |
| 6/1/14 | BRAVES @ Marlins    | Aaron Harang    |   54 |    67 |
| 6/1/14 | MARLINS v Braves    | Nathan Eovaldi  |   66 |    58 |
| 6/1/14 | METS @ Phillies     | Jon Niese       |   63 |    69 |
| 6/1/14 | PHILLIES v Mets     | Cole Hamels     |   61 |    88 |
| 6/1/14 | RAYS @ Red Sox      | Erik Bedard     |   42 |    45 |
| 6/1/14 | RED SOX v Rays      | Jon Lester      |   80 |    94 |
| 6/1/14 | RANGERS @ Nationals | Yu Darvish      |   82 |    71 |
| 6/1/14 | NATIONALS v Rangers | Tanner Roark    |   61 |   102 |
| 6/1/14 | ORIOLES @ Astros    | Wei-Yin Chen    |   60 |    59 |
| 6/1/14 | ASTROS v Orioles    | Scott Feldman   |   12 |    49 |
| 6/1/14 | CUBS @ Brewers      | Jeff Samardzija |   12 |    78 |
| 6/1/14 | BREWERS v Cubs      | Kyle Lohse      |   87 |    88 |
| 6/1/14 | PADRES @ White Sox  | Eric Stults     |   43 |    39 |
| 6/1/14 | WHITE SOX v Padres  | Chris Sale      |   88 |   111 |
| 6/1/14 | GIANTS @ Cardinals  | Tim Hudson      |   75 |    72 |
| 6/1/14 | CARDINALS v Giants  | Lance Lynn      |   20 |    75 |
| 6/1/14 | ANGELS @ Athletics  | Jered Weaver    |   29 |    63 |
| 6/1/14 | ATHLETICS v Angels  | Sonny Gray      |   50 |    78 |
| 6/1/14 | REDS @ Diamondbacks | Alfredo Simon   |   50 |    57 |
| 6/1/14 | DIAMONDBACKS v Reds | Wade Miley      |   49 |    57 |
| 6/1/14 | TIGERS @ Mariners   | Max Scherzer    |   47 |    93 |
| 6/1/14 | MARINERS v Tigers   | Roenis Elias    |   88 |    49 |
| 6/1/14 | PIRATES @ Dodgers   | Edinson Volquez |   54 |    59 |
| 6/1/14 | DODGERS v Pirates   | Zack Greinke    |   51 |    80 |

At first glance, this may seem underwhelming. But, as the author who put lots of work into this, I choose to be optimistic. Here's why you should be too:

First, it correctly picked the winning team eight times out of 15, or eight out of 14 if we count the Reds/Diamondbacks game (57 apiece) as a push. Admittedly, 57% isn't great odds, especially when the higher actual Game Score picks the winner in 14 of 15 games (the only miss was Harang vs. Eovaldi).
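The winner-picking tally can be reproduced from the table. The table doesn't list final scores, so this sketch infers each winner from the higher actual Game Score, except Braves at Marlins, which the post says the Braves won despite Harang's lower score. The `pick_record` name is hypothetical.

```python
# (away pitcher, BJGS, pBJGS, home pitcher, BJGS, pBJGS, winning side)
GAMES = [
    ("Chacin", 44, 40, "Tomlin", 58, 25, "home"),
    ("Hughes", 72, 78, "Whitley", 59, 21, "away"),
    ("Guthrie", 56, 35, "Buehrle", 72, 49, "home"),
    ("Harang", 54, 67, "Eovaldi", 66, 58, "away"),  # lower actual score won
    ("Niese", 63, 69, "Hamels", 61, 88, "away"),
    ("Bedard", 42, 45, "Lester", 80, 94, "home"),
    ("Darvish", 82, 71, "Roark", 61, 102, "away"),
    ("Chen", 60, 59, "Feldman", 12, 49, "away"),
    ("Samardzija", 12, 78, "Lohse", 87, 88, "home"),
    ("Stults", 43, 39, "Sale", 88, 111, "home"),
    ("Hudson", 75, 72, "Lynn", 20, 75, "away"),
    ("Weaver", 29, 63, "Gray", 50, 78, "home"),
    ("Simon", 50, 57, "Miley", 49, 57, "away"),
    ("Scherzer", 47, 93, "Elias", 88, 49, "home"),
    ("Volquez", 54, 59, "Greinke", 51, 80, "away"),
]


def pick_record(games, use_predicted: bool) -> tuple:
    """Return (correct, wrong, pushes) when picking the higher score."""
    correct = wrong = pushes = 0
    for _, away_act, away_pred, _, home_act, home_pred, winner in games:
        away, home = ((away_pred, home_pred) if use_predicted
                      else (away_act, home_act))
        if away == home:
            pushes += 1  # e.g. Simon vs. Miley, both predicted at 57
            continue
        pick = "away" if away > home else "home"
        if pick == winner:
            correct += 1
        else:
            wrong += 1
    return correct, wrong, pushes


for label, flag in (("pBJGS", True), ("BJGS", False)):
    c, w, p = pick_record(GAMES, flag)
    print(f"{label}: {c}-{w} with {p} push(es), {c / (c + w):.0%}")
```

For pBJGS this gives 8-6 with one push (57%); for the actual BJGS, 14-1.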

Second, its errors, although plentiful, actually follow some logic. Take, for instance, my favourite example: Tanner Roark. My formula basically predicts him to throw a perfect game (a pBJGS of 102). That's not because the formula thinks that highly of Tanner Roark; it's because the formula knew exactly how bad the Rangers lineup was, and it weighted that so heavily that it deemed the Rangers offense no match for the Nationals' 2015 long reliever/sixth starter. Texas sent out six right-handers among its top eight hitters (Roark has a career 0.263 wOBA against RHH), its top four hitters had a combined 0.322 wOBA, and its No. 5 - No. 8 hitters had a combined 0.288 wOBA. If you were going to predict Tanner Roark to throw a complete game shutout last season, June 1 looked like a pretty good bet. And he did, after all, pitch seven innings and allow only one run. Little did the formula know that Yu Darvish would take the hill for the Rangers and all but guarantee a loss for Roark, despite facing a much better lineup (0.348 wOBA for the No. 1 - No. 4 hitters, 0.320 for No. 5 - No. 8).
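Plugging the two lineups from the Rangers/Nationals game into the formula's tiered lineup terms shows how much of the gap comes from the opponents alone. This sketch isolates only those two terms (the platoon, xFIP, BABIP and BPERA terms are left out), and the `lineup_penalty` helper is hypothetical.

```python
def lineup_penalty(woba_1_to_4: float, woba_5_to_8: float) -> float:
    """Points the tiered lineup terms subtract from pBJGS (hypothetical helper)."""
    top = (woba_1_to_4 * 1000 - 260) * 0.386     # No. 1-4 hitters
    bottom = (woba_5_to_8 * 1000 - 260) * 0.28   # No. 5-8 hitters
    return top + bottom


rangers = lineup_penalty(0.322, 0.288)    # the lineup Roark faced
nationals = lineup_penalty(0.348, 0.320)  # the lineup Darvish faced
print(f"Rangers: {rangers:.1f}, Nationals: {nationals:.1f}, "
      f"gap: {nationals - rangers:.1f}")
```

That works out to roughly 31.8 versus 50.8 points, so about 19 of the 31-point gap between their predictions (102 vs. 71) comes from the lineups before either pitcher's own numbers enter.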

Third, it loves Chris Sale a bit too much. But wait, there's absolutely nothing wrong with that. Who doesn't love Chris Sale?

Fourth, it came close a couple of times, but it didn't get a single Game Score exactly right. Wei-Yin Chen's predicted 59 for an actual 60 looks good. So does Kyle Lohse's 88 pBJGS on an actual 87. But you couldn't just give me one rounding error, could you, Oh Mighty Formula?
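Checking for exact hits is quick once the table's 30 starts are in a list. This sketch (the names are hypothetical) confirms there are none and surfaces the closest misses.

```python
# (pitcher, actual BJGS, pBJGS) for all 30 starts in the table
STARTS = [
    ("Jhoulys Chacin", 44, 40), ("Josh Tomlin", 58, 25),
    ("Phil Hughes", 72, 78), ("Chase Whitley", 59, 21),
    ("Jeremy Guthrie", 56, 35), ("Mark Buehrle", 72, 49),
    ("Aaron Harang", 54, 67), ("Nathan Eovaldi", 66, 58),
    ("Jon Niese", 63, 69), ("Cole Hamels", 61, 88),
    ("Erik Bedard", 42, 45), ("Jon Lester", 80, 94),
    ("Yu Darvish", 82, 71), ("Tanner Roark", 61, 102),
    ("Wei-Yin Chen", 60, 59), ("Scott Feldman", 12, 49),
    ("Jeff Samardzija", 12, 78), ("Kyle Lohse", 87, 88),
    ("Eric Stults", 43, 39), ("Chris Sale", 88, 111),
    ("Tim Hudson", 75, 72), ("Lance Lynn", 20, 75),
    ("Jered Weaver", 29, 63), ("Sonny Gray", 50, 78),
    ("Alfredo Simon", 50, 57), ("Wade Miley", 49, 57),
    ("Max Scherzer", 47, 93), ("Roenis Elias", 88, 49),
    ("Edinson Volquez", 54, 59), ("Zack Greinke", 51, 80),
]

# Starts where the prediction matched the actual Game Score exactly.
exact = [name for name, actual, pred in STARTS if actual == pred]
# Stable sort by absolute error, so ties keep table order.
by_error = sorted(STARTS, key=lambda s: abs(s[1] - s[2]))

print("exact hits:", exact)
for name, actual, pred in by_error[:2]:
    print(f"{name}: predicted {pred}, actual {actual}")
```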

Fifth, and this may seem obvious to frequent watchers of the sport, the formula thought the good pitchers would do well and the bad pitchers would do poorly. Is that exactly what I set out to do when I started this? No. I wanted a better hit rate than barely over 50%. But when its mistakes are this defensible, I don't think the logic is much to blame. Obviously the formula isn't perfect but, hey, this was a great learning process for me and, I think, it's a good start.

The next time you see anything about this, I'll be predicting the Opening Day games as soon as the lineups are posted, just to put the formula to the test. That should be fun, so keep an eye out for it.

Stray Thoughts

• Only two teams in this sample fielded lineups where their No. 5 - No. 8 hitters were better than their No. 1 - No. 4 hitters (by wOBA): Seattle Mariners and Cleveland Indians.
• Two pitchers hadn't made the requisite 12 previous appearances. They were Roenis Elias (11) and Chase Whitley (3).
• Roenis Elias pitched to an actual Game Score of 88 which is immaculate. It's even more immaculate considering he did it against nine right-handed bats!
• Wei-Yin Chen also faced nine right-handed bats, but he posted a slightly-less-impressive 60 Game Score.
• The formula hates both Jeremy Guthrie and Mark Buehrle. That's what I get for using xFIP as the basis of a pitcher's success.

. . .

All statistics courtesy of FanGraphs.

Michael Bradburn is a Featured Writer for Beyond the Box Score. You can follow him on Twitter at @mwbii. You can also reach him at michaelwbii@gmail.com