# Fun with World Series matchups

The World Series is too short to accurately assess the batter-pitcher matchups. However, we certainly can have a look at how things should play out by using available statistical tools.

The World Series has finally begun! We have a very interesting battle between wild card teams for the title. The Giants grabbed an easy win last night after their offense jumped on James Shields in the first inning and did not let up. Hopefully the rest of the series will have closer games for us to enjoy. Undoubtedly, heroes, legends, and goats will be identified, manager tactics will be dissected, and we will experience all the other narratives that have become part of postseason baseball analysis. But we need to remember that we are examining a short series of games in which small samples and variance will rule the day. We should keep this in mind and avoid overreacting to players' performances in this series. Still, this does lead to an interesting question: how would some of these matchups play out over a larger sample of plate appearances?

To evaluate this idea we can use Bill James' log5, which was created to help estimate the probability that a team will win a game given their winning percentage and the winning percentage of their opponent. Luckily for the purposes here, log5 has been refined for use with batter-pitcher matchups and has been shown to do a good job of predicting outcomes for actual batter-pitcher matchups. The log5 equation takes into account the batter's skill, the pitcher's skill and the league context to calculate the expected result:
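A common way to write the matchup version of log5 is the odds-ratio form below. The symbols are labels chosen for this write-up: $B$ is the batter's rate, $P$ is the rate the pitcher allows, $L$ is the league rate, and $E$ is the expected rate for the matchup.

$$E = \frac{\dfrac{B \cdot P}{L}}{\dfrac{B \cdot P}{L} + \dfrac{(1 - B)(1 - P)}{1 - L}}$$

When the pitcher is exactly league average ($P = L$), this reduces to $E = B$; when the batter is league average ($B = L$), it reduces to $E = P$.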

For example, we can use the log5 formula to estimate how a .310 hitter will do against a pitcher with a .290 batting average against. Assuming a league average of .250 (similar to the 2014 season), we expect the hitter to perform better than his typical average against this pitcher because the pitcher is worse than league average. And this is confirmed:
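Here is a minimal Python sketch of that calculation, assuming the common odds-ratio form of log5. The function name `log5` and its parameter names are illustrative and do not come from any library.

```python
def log5(batter, pitcher, league):
    """Expected matchup rate from the odds-ratio form of log5.

    All three inputs are rates strictly between 0 and 1 for the same statistic.
    """
    # Odds-style terms for the event happening and not happening.
    hit = batter * pitcher / league
    miss = (1 - batter) * (1 - pitcher) / (1 - league)
    return hit / (hit + miss)


expected = log5(batter=0.310, pitcher=0.290, league=0.250)
print(f"{expected:.3f}")  # 0.355
```

Under these assumptions the .310 hitter projects to roughly .355 against this pitcher, comfortably above his usual average.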

You can exchange batting average for other (better) statistics (e.g., OBP, SLG, K%, BB%) to come up with a more complete expectation for how a batter-pitcher matchup will work out. Using AVG, OBP, and SLG we can derive expected slashlines, and by applying the equation independently to each hit outcome per plate appearance (i.e., 1B, 2B, 3B, HR) we can determine the likely frequency of each outcome over some number of plate appearances. Ideally the numbers used in the equation would be regressed appropriately and park adjusted, but for this work I am just using career numbers (where applicable) and the 2014 league average context. With that noted, let's have some fun with these World Series matchups.
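As a rough sketch of that process, the Python below applies log5 to several per-plate-appearance rates independently and scales each to a count over 100 plate appearances. It assumes the common odds-ratio form of log5. The helper `expected_counts` and every input rate are made up for illustration and are not the figures used for the matchups that follow.

```python
def log5(batter, pitcher, league):
    """Odds-ratio form of log5 for a single rate statistic."""
    hit = batter * pitcher / league
    miss = (1 - batter) * (1 - pitcher) / (1 - league)
    return hit / (hit + miss)


def expected_counts(batter, pitcher, league, plate_appearances=100):
    """Apply log5 to each per-PA rate independently, then scale to counts."""
    return {
        stat: round(plate_appearances * log5(batter[stat], pitcher[stat], league[stat]))
        for stat in batter
    }


# Hypothetical per-plate-appearance rates, for illustration only.
batter = {"K": 0.15, "BB": 0.09, "2B": 0.05, "HR": 0.03}
pitcher = {"K": 0.25, "BB": 0.05, "2B": 0.04, "HR": 0.02}
league = {"K": 0.20, "BB": 0.08, "2B": 0.045, "HR": 0.025}

print(expected_counts(batter, pitcher, league))
```

Because each outcome is computed on its own, the rounded counts are not forced to add up to a consistent total.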

## Matchup 1: Eric Hosmer vs. Madison Bumgarner

Last night, the matchup between these two big lefties ended with Hosmer going 0-for-3, with one really hard-hit ball. Advantage: Bumgarner. But it won't always end up like that, right? Let's imagine they face each other 100 times, given their career statistics against left-handed opponents and the 2014 league context of LHB vs. LHP. One hundred plate appearances is a small sample in the grand scheme of things; it is just an arbitrary round number I selected in order to give hit outcome frequencies. In any case, in 100 plate appearances we expect:

• Hosmer to strike out 20 times and take 5 walks.
• Only 1 ball to leave the yard for a home run and 4 to go for doubles.
• Expected slashline: .231/.254/.332

I recognize that Hosmer is not necessarily an elite hitter (especially against left-handed pitching), so the numbers will already be low, but even with that in mind those expected numbers show the effectiveness of Bumgarner. Relative to his career line against lefties, we expect Hosmer to lose 34 points in batting average, 54 points in on-base percentage, 31 points in slugging, strike out 5% more often, and walk 4% less often when facing Bumgarner. Tough days.

## Matchup 2: Wade Davis vs. Hunter Pence

Davis, the middle man of the HDH monster the Royals deploy at the end of games, strikes out a lot of batters (fifth-best rate among relievers in baseball this season). Pence has a beautifully quirky swing and strikes out at the second-highest rate among qualified batters on the Giants. I would love to see these two face off. Because of the game situation, we did not get to see it in Game 1. We will have to wait. Regardless, how would this matchup look over multiple plate appearances? Well, given Davis' 2014 statistics against righties (we are concerned with him only as a reliever), Pence's career statistics against right-handed pitching, and the 2014 league context for relievers, we can sort it out. Over 100 plate appearances we would expect:

• Pence to strike out 38 times (!) and walk 4 times.
• Pence to bang out a whopping 12 hits, 2 of which end as doubles.
• Expected slashline: .134/.184/.177

There is a slight issue here in that Davis did not allow a triple or home run this season, so using the log5 equation for those values means they will remain at 0. Of course this will not hold for the rest of Davis' career, but that contributes to why we see Pence with only the two doubles as extra-base hits against him. Regardless, if the Davis we saw this past season is what he will be going forward (unlikely), then we would expect Davis to dominate Pence, just as he did many other opponents this season.
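To see why a zero stays a zero, note that the pitcher's rate multiplies the numerator of the log5 formula. The sketch below assumes the common odds-ratio form and uses made-up batter and league home run rates. It also shows one simple, hypothetical workaround: blending the pitcher's rate toward the league rate with an arbitrarily chosen weight. That blending was not applied to the numbers above.

```python
def log5(batter, pitcher, league):
    """Odds-ratio form of log5 for a single rate statistic."""
    hit = batter * pitcher / league
    miss = (1 - batter) * (1 - pitcher) / (1 - league)
    return hit / (hit + miss)


def shrink(observed, league, weight):
    """Blend an observed rate toward the league rate (weight is hypothetical)."""
    return (1 - weight) * observed + weight * league


# Made-up home-run-per-PA rates, for illustration only.
batter_hr, league_hr = 0.035, 0.025

# A pitcher who allowed no home runs forces the expectation to zero.
raw = log5(batter_hr, 0.0, league_hr)

# Blending the pitcher's zero halfway toward league average lifts it off zero.
blended = log5(batter_hr, shrink(0.0, league_hr, weight=0.5), league_hr)

print(raw)          # 0.0
print(blended > 0)  # True
```

Any serious version of this would choose the weight from the sample size behind the pitcher's rate rather than fixing it at one half.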

## Matchup 3: James Shields vs. Pablo Sandoval

A matchup of mythical beasts: Shields, the Astronaut Lion, against Sandoval, the Kung-Fu Panda. Shields is known for attacking the bottom of the zone with his fastball-cutter-changeup mix. Sandoval has a reputation as a hacker who attacks pitches in or out of the zone with success. Last night, Sandoval got the better of Shields in their first encounter, ripping a double down the right field line. Then Shields got Sandoval to fly out in their second battle. But that is just two plate appearances. Let's pit these two against each other as we have with the matchups above to see how things would shake out. Because Shields is closer to a league average pitcher at this point in his career, I suspect Sandoval would perform as he typically does against right-handed pitching. After running the numbers, in 100 plate appearances we expect:

• Sandoval to be a strikeout victim 13 times and walk 6 times.
• He will hit 4 home runs, 6 doubles and even a triple.
• Expected slashline: .296/.339/.503

As I suspected, we have our first matchup not dominated by pitching. The expected slashline and hit outcomes do not deviate much from Sandoval's typical performance. The Panda is an above average hitter (career 122 wRC+) and would be expected to do some damage against Shields over a series of plate appearances.

## Matchup 4: Jarrod Dyson and Terrance Gore vs. Buster Posey

This is not a batter-pitcher matchup, but we can use the log5 principle to look at expected stolen base outcomes. As you may have noticed, the Royals' running game has received some attention this postseason. Seven stolen bases in one game will make that happen. Ned Yost has shown an affinity for using his speedsters late in games as pinch runners (and in Dyson's case as a defensive replacement). This tactic worked well against the defensively challenged Derek Norris (career 21.8% caught stealing rate), but against Buster Posey things will be more difficult. Now clearly, not all of the credit can go to Posey. His pitchers also have a role in controlling the run game and in helping him throw out runners, but using his CS% will work for the evaluation given here.
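One way to set this up, sketched below in Python, is to treat "caught stealing" as the event and feed log5 the runner's caught-stealing rate, the catcher's CS%, and the league CS%. This assumes the common odds-ratio form of log5. The only real figure is Norris's 21.8% career rate quoted above; the runner and league rates are placeholders, not actual numbers.

```python
def log5(runner, catcher, league):
    """Odds-ratio form of log5, here applied to caught-stealing rates."""
    caught = runner * catcher / league
    safe = (1 - runner) * (1 - catcher) / (1 - league)
    return caught / (caught + safe)


# 0.218 is Derek Norris's career caught-stealing rate quoted above.
# The runner and league rates are placeholders, not real figures.
norris_cs = 0.218
runner_cs = 0.15
league_cs = 0.27

p_caught = log5(runner_cs, norris_cs, league_cs)
print(f"caught: {p_caught:.3f}, safe: {1 - p_caught:.3f}")
```

With a catcher below the league rate, the runner's chance of being caught falls below his own baseline; a catcher above the league rate pushes it the other way.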