Recently, Crashburn Alley's Bill Baer did a piece on Game Score and consistency which found its way onto resident chief Tommy Bennett's No Pepper links. Apparently, Tommy's not a fan of Game Score, and neither am I.
I had recently been trying to come up with a way to adjust Game Score for defense and park, and maybe take a shot at a new Game Score formula for batted balls. This seemed like an interesting side project to tackle until VEP posted this one-line gem:
I just use Game WAR.
I was floored. What a simple yet elegant concept. I decided to have some fun with the data and throw out some Game WAR numbers for the BtB readership. Here's what I did.
Methodology
I took all non-Interleague starts this season with a Game Score over 80. I decided to exclude Interleague games to eliminate the need to determine average run support in an Interleague game (in retrospect, I suppose I could have used NL/AL average runs/game in Interleague play, but I already did the work, so take that). For what it's worth, I had to remove seven games from the sample of total starts with a Game Score above 80. This left me with 121 starts, ranging from Jonathan Sanchez's no-hitter (Game Score of 98) to Randy Johnson's April 19th start against the Arizona Diamondbacks (Game Score of 80). Of the 9276 starts made outside of Interleague play this season, this supposedly represents the best 1.3% of starts this season.
For pitcher WAR, I used FIP as the run-determining metric to separate defense from pitching, mostly because of ease. I took all the relevant FIP statistics (strikeouts, walks, hit-by-pitches, intentional walks, and home runs) and calculated FIP, using 3.19 as the scalar to ERA (the NL should have been 3.18 and the AL should have been 3.20, but these points shouldn't matter all that much). I park-adjusted the pitcher's FIP according to Patriot's five-year regressed park factors and stuck the inputs into Pythagenpat. Since all of Patriot's factors are already adjusted to account for players playing half their games at their home park, I reversed the adjustments for this exercise by doing the following:
PFGmWAR = [(PF-1)*2] + 1
This was done to get pure run inflation/deflation values for each park.
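The FIP and park-factor steps above can be sketched as follows. This is a minimal sketch, not the exact spreadsheet: the FIP formula is the common form, the 0.92 divisor is taken from the "FIP/0.92" column in the tables further down (read here as a conversion from the ERA scale to the runs-allowed scale), and dividing by the park factor is an assumption about how the adjustment was applied. All function names are hypothetical.

```python
def game_fip(hr, bb, hbp, ibb, k, ip, constant=3.19):
    """FIP for a single start on the ERA scale (common form of the formula)."""
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip + constant


def full_park_factor(pf_halved):
    """Undo the half-home-games adjustment: PFGmWAR = [(PF - 1) * 2] + 1."""
    return (pf_halved - 1) * 2 + 1


def park_adjusted_ra(fip, pf_halved, era_to_ra=0.92):
    """Scale FIP to runs allowed, then divide by the full park factor.

    Assumption: the text does not spell out how the park factor is applied;
    dividing by it is one conventional choice.
    """
    return fip / era_to_ra / full_park_factor(pf_halved)


# Jonathan Sanchez, 7/10: 9 IP, 11 K, no walks, no homers
# (HBP and IBB assumed to be zero for this illustration).
fip = game_fip(hr=0, bb=0, hbp=0, ibb=0, k=11, ip=9)
print(round(fip / 0.92, 2))  # 0.81, the FIP/0.92 value listed for this start
```

A neutral park (PF of 1.00) leaves the value untouched, while a halved factor of 0.96 becomes a full factor of 0.92.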
I used NL and AL non-Interleague average runs scored as the run support component. I used a .380 win% pitcher as replacement level.
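Putting the run support and replacement level together, the win-value step might look like the sketch below. The Pythagenpat exponent power of 0.287, the scaling by innings over nine, and the 4.4 runs per game figure are assumptions for illustration; none of them is stated in the text. The function names are hypothetical.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, exponent_power=0.287):
    """Pythagenpat: the exponent floats with the run environment (runs per game)."""
    x = (runs_scored + runs_allowed) ** exponent_power
    rs_x = runs_scored ** x
    return rs_x / (rs_x + runs_allowed ** x)


def game_war(ra_per_9, innings, league_runs_per_game, replacement_win_pct=0.380):
    """Wins above a .380 replacement pitcher, prorated by innings / 9 (assumed)."""
    win_pct = pythagenpat_win_pct(league_runs_per_game, ra_per_9)
    return (win_pct - replacement_win_pct) * innings / 9


# An eight-inning start with a runs-allowed rate of 0, against an
# assumed league average of 4.4 runs per game.
print(round(game_war(ra_per_9=0.0, innings=8, league_runs_per_game=4.4), 2))  # 0.55
```

When the runs-allowed rate is 0, the win percentage is 1.000 regardless of run support, so the result depends only on innings pitched.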
Just for fun, I also decided to look at an old question brought up by Jeff of Lookout Landing, regarding A.J. Burnett and the myth of inconsistency. I grabbed game logs for three Yankees, Burnett, Andy Pettitte, and C.C. Sabathia, during the 2009 season and calculated Game WAR. I also took a look at [Game +1 WAR - Game WAR] (difference between consecutive starts).
The Numbers
I have all of the 121 starts listed in this Google spreadsheet for your perusal, but here I'll list the top 10 Game Scores of the season with relevant data and game WAR. I'll also then list the top 10 Game WAR games with data and Game Scores.
Ranked by Game Score
Gscr Rk | GmWAR Rk | Player | Date | Opp | IP | K | BB | HR | FIP/0.92 | GmWAR | Gscr |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | Jonathan Sanchez | 7/10 | SDP | 9 | 11 | 0 | 0 | 0.81 | 0.56 | 98 |
T-2 | T-33 | Chris Carpenter | 9/7 | MIL | 9 | 10 | 2 | 0 | 1.78 | 0.44 | 93 |
T-2 | T-28 | Mark Buehrle | 7/23 | TBR | 9 | 6 | 0 | 0 | 2.02 | 0.45 | 93 |
T-4 | T-5 | Cliff Lee | 8/19 | ARI | 9 | 11 | 0 | 0 | 1.17 | 0.53 | 92 |
T-4 | T-16 | Justin Verlander | 5/8 | CLE | 9 | 11 | 2 | 0 | 1.54 | 0.49 | 92 |
T-6 | T-44 | Roy Halladay | 9/4 | NYY | 9 | 9 | 3 | 0 | 2.38 | 0.40 | 91 |
T-6 | T-22 | Cole Hamels | 9/1 | SFG | 9 | 9 | 1 | 0 | 1.66 | 0.47 | 91 |
T-6 | T-24 | Tim Lincecum | 6/29 | STL | 9 | 8 | 0 | 0 | 1.54 | 0.46 | 91 |
T-6 | T-19 | Jeff Niemann | 6/3 | KCR | 9 | 9 | 1 | 0 | 1.66 | 0.48 | 91 |
T-10 | T-37 | Carlos Zambrano | 9/25 | SFG | 9 | 8 | 1 | 0 | 1.90 | 0.43 | 90 |
Note: There were three other games listed at a Game Score of 90; I just listed the first one on the list for convenience.
Ranked by Game WAR
GmWAR Rk | Gscr Rk | Player | Date | Opp | IP | K | BB | HR | FIP/0.92 | GmWAR | Gscr |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | Jonathan Sanchez | 7/10 | SDP | 9 | 11 | 0 | 0 | 0.81 | 0.56 | 98 |
T-2 | T-36 | Zack Greinke | 5/4 | CHW | 9 | 10 | 0 | 0 | 1.05 | 0.55 | 85 |
T-2 | T-36 | Tim Lincecum | 4/18 | ARI | 8 | 13 | 0 | 0 | 0.00* | 0.55 | 85 |
T-2 | T-57 | Zack Greinke | 4/18 | TEX | 9 | 10 | 0 | 0 | 1.05 | 0.55 | 83 |
T-2 | T-23 | Tim Lincecum | 7/27 | PIT | 9 | 15 | 3 | 0 | 0.93 | 0.55 | 87 |
T-6 | T-91 | Ricky Nolasco | 9/30 | ATL | 7.67 | 16 | 2 | 0 | 0.00* | 0.53 | 81 |
T-6 | T-4 | Cliff Lee | 8/19 | ARI | 9 | 11 | 0 | 0 | 1.17 | 0.53 | 92 |
T-8 | T-75 | Roy Halladay | 9/25 | SEA | 9 | 9 | 0 | 0 | 1.29 | 0.52 | 82 |
T-8 | T-19 | Zack Greinke | 4/24 | DET | 9 | 10 | 1 | 0 | 1.41 | 0.52 | 88 |
T-8 | T-14 | Wandy Rodriguez | 7/8 | PIT | 9 | 11 | 1 | 0 | 1.17 | 0.52 | 87 |
*These games actually recorded negative FIP, but that of course broke the Pythagorean equation (at least in Excel). For those games I used a FIP of 0.
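To see why a negative FIP breaks the calculation, here is a small sketch using the Lincecum 4/18 line from the table (13 strikeouts, no walks, no homers in eight innings). The exponent of 1.6 is only an illustrative non-integer value, and `safe_runs_allowed` is a hypothetical helper name for the floor-at-zero workaround.

```python
import math


def safe_runs_allowed(fip):
    """Floor a negative FIP at 0, the workaround described in the footnote."""
    return max(fip, 0.0)


# Lincecum, 4/18: (13*HR + 3*BB - 2*K) / IP + 3.19
fip = (13 * 0 + 3 * 0 - 2 * 13) / 8 + 3.19
print(round(fip, 2))  # -0.06

failed = False
try:
    # A negative base raised to a non-integer power has no real-valued result.
    math.pow(fip, 1.6)
except ValueError as err:
    failed = True
    print("math.pow failed:", err)

print(safe_runs_allowed(fip))  # 0.0
```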
What do we see in these two lists? Well, first off, none of these games are actually bad. There were a few games in the overall list in which pitchers had mediocre Three True Outcomes performances but allowed few hits and runs, resulting in good Game Scores but mediocre WAR totals. But in the top 10 performances, there were no poor performances to be shown, unsurprisingly. Also, the top 10 Game WAR list is comforting because it contains a lot of names from whom we would expect to see great starts, particularly Cy Young winners Zack Greinke and Tim Lincecum. As a whole, these 121 games totaled 45.5 WAR if added up (which I know isn't really the correct method for calculating the WAR as a whole, but it likely comes close) in 1027 innings, a rate of 0.40 WAR per nine innings. Only Roy Halladay's start versus the Yankees came out as "average" for the sample.
That being said, there is not a whole lot of agreement between the two lists. Only two games appear in both lists: Cliff Lee's Philadelphia start versus Arizona and the overall leader, Sanchez's no-hitter against the Padres. In the 121-start sample, the R-squared between Game Score and Game WAR was 0.26, and this should come as no surprise. Pitchers receive the full credit for all aspects of the game under Game Score, while FIP attempts to separate out defensive contributions. The strikeouts, walks/HBP, and home runs are there, but the remainder of the hits and outs are not credited to the pitcher in FIP.
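For anyone who wants to repeat the R-squared check, a minimal sketch follows. It computes the squared Pearson correlation by hand; `r_squared` is a hypothetical helper name. The sample data are only the ten rows from the Game WAR table above, so the result will not match the 0.26 figure, which comes from all 121 starts.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Game Score and Game WAR for the top 10 starts by Game WAR (from the table).
game_scores = [98, 85, 85, 83, 87, 81, 92, 82, 88, 87]
game_wars = [0.56, 0.55, 0.55, 0.55, 0.55, 0.53, 0.53, 0.52, 0.52, 0.52]

print(round(r_squared(game_scores, game_wars), 2))
```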
When looking at the Game Score formula, I found that it weights innings pitched very heavily, as it gives bonus points for innings pitched/outs made after the fourth inning. When looking at the top 50 performances by Game Score, only 10 starts lasted fewer than nine innings, and only one lasted fewer than eight. In the entire sample, 51 starts lasted fewer than nine innings. When ranked by Game WAR, 18 performances of fewer than nine innings made the top 50. I do think that shows some bias towards complete games that isn't reflected in the Game WAR ranking.
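The innings bonus is easy to see in the standard Bill James Game Score formula, sketched here (the function and parameter names are hypothetical). The example takes Sanchez's no-hitter line and then the same line cut off after eight innings.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James Game Score: start at 50, then add and subtract by event."""
    innings_completed = outs // 3
    late_inning_bonus = 2 * max(innings_completed - 4, 0)  # +2 per inning after the 4th
    return (50 + outs + late_inning_bonus + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)


# Sanchez's no-hitter: 27 outs, no hits, runs, or walks, 11 strikeouts.
print(game_score(outs=27, hits=0, earned_runs=0, unearned_runs=0, walks=0, strikeouts=11))  # 98

# The same line stopped after eight innings.
print(game_score(outs=24, hits=0, earned_runs=0, unearned_runs=0, walks=0, strikeouts=11))  # 93
```

A clean ninth inning is worth five points on its own (three outs plus the two-point bonus) before any strikeouts are counted.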
The home run, on the other hand, was punished more severely in the Game WAR system than the Game Score system. The first start to contain a home run when ranking by Game Score was Javier Vazquez's 6/11 start against Pittsburgh, ranked 32nd on the list. He struck out 12 and walked none in eight innings, picking up a 2-1 loss (and people think he pitched poorly!). However, when ranked by Game WAR, the first appearance of a home run comes from Gavin Floyd's 9/5 start against Boston, in which he gives up one homer while striking out 11 and walking none in eight innings. That start is ranked 58th. Vazquez's start is ranked 63rd. While Game Score weights your average home run at maybe -2 to -3 points on a scale beginning at 50, home runs have a much more staggering weight on performance in WAR, especially at such low inning totals. Allowing one homer in nine innings of play already starts your FIP at 4.63, worse than the league average of around 4.3-4.4. As a comparison, when tallying all the statistics of these games as a unit, the average game FIP was 2.02 in these 121 games.
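The 4.63 figure is just the home run term of FIP plus the 3.19 constant, as this short sketch shows (`fip_from_homers_only` is a hypothetical helper that ignores walks and strikeouts):

```python
def fip_from_homers_only(hr, ip, constant=3.19):
    """FIP for a start with no walks, HBP, or strikeouts: 13*HR/IP + constant."""
    return 13 * hr / ip + constant


print(round(fip_from_homers_only(1, 9), 2))  # 4.63

# Each homer adds 13/IP to FIP, so it costs more in a shorter start.
print(13 / 8)  # 1.625 points of FIP per homer in an eight-inning start
```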
Some more interesting notes:
- Lincecum appears three times on the list, and Greinke appears four times. As you saw in the Game WAR top 10, two of each of their starts appear, in alternating order, tied for second on the list with 0.55 WAR. Amazing seasons.
- The most victimized team was, unsurprisingly, the Pittsburgh Pirates. The team appeared ten times as opponents, being victimized for 4.1 WAR.
- The worst start, after park adjustment, belongs to Chris Carpenter, who struck out three and allowed one homer among three hits in a complete game against Cincinnati.
- Dan Haren was the only pitcher to allow more than one home run, giving up two solo jacks among 10 strikeouts and no walks in nine innings in a 7-2 win against the Chicago Cubs.
- There were four starts made in San Diego, with an average Game Score of 82 but an average Game WAR of 0.26. Unsurprisingly, there were no starts in Colorado, though I suspect that after park adjustment, a few Coors Field starts would do well in the Game WAR method.
Consistency Talk
As I mentioned, I collected the game logs for three Yankees starters for 2009: Burnett, Pettitte, and Sabathia. I uploaded them here for your viewing pleasure; the WAR and Game Score details are at the end of the sheet for each player.
Here are the averages and standard deviations for each player.
Player | Avg GmWAR | StdDev GmWAR | Avg GmWARDiff | StdDev GmWARDiff |
---|---|---|---|---|
A.J. Burnett | 0.11 | 0.16 | 0.00 | 0.26 |
Andy Pettitte | 0.12 | 0.15 | -0.02 | 0.17 |
C.C. Sabathia | 0.18 | 0.16 | 0.00 | 0.21 |
GmWARDiff refers to the difference in GmWAR between consecutive starts. I don't really think standard deviation is the way to go to measure something like this, and Jeff Zimmerman has said that doing something that breaks down performance into buckets with assigned percentage chances of the pitcher's team winning would be more appropriate. Perhaps using Game WAR would be more beneficial in this methodology, since Game WAR is technically representative of a fraction of an actual win during a game.
Still, if you take this at face value, you can see that, as Jeff pointed out, the standard deviations of the actual start values aren't all that different among the three pitchers, only one of whom has a reputation as an inconsistent pitcher. If you take the standard deviation of each pitcher's start-to-start differences, Burnett does seem to have a higher value, signifying that he is "more inconsistent," though I don't think it's all that much worse than Sabathia's.
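The two consistency measures in the table can be sketched as below. The game log values are made up purely for illustration, `consistency_summary` is a hypothetical helper, and the use of the sample standard deviation (rather than the population version) is an assumption.

```python
from statistics import mean, stdev


def consistency_summary(game_wars):
    """Average and spread of per-start GmWAR and of consecutive-start differences."""
    # GmWARDiff: next start's GmWAR minus this start's GmWAR.
    diffs = [later - earlier for earlier, later in zip(game_wars, game_wars[1:])]
    return {
        "avg_gmwar": mean(game_wars),
        "sd_gmwar": stdev(game_wars),
        "avg_diff": mean(diffs),
        "sd_diff": stdev(diffs),
    }


# Hypothetical game log, in start order (not real 2009 data).
sample_log = [0.25, -0.10, 0.30, 0.05, 0.20, -0.05, 0.15]
for name, value in consistency_summary(sample_log).items():
    print(name, round(value, 3))
```

Because the differences telescope, the average GmWARDiff is simply the last start minus the first, divided by the number of gaps, which is why that column sits so close to zero for all three pitchers.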
I think this was an interesting look, and if I get a chance (i.e. time) a little bit later, I might pick up all of the game logs for 2009 and check out FIP-WAR for each of them. I certainly think that, if we wanted to compare individual game performances, this would be a better way to go than using Game Score.