Assessing the Impact of the Instant Replay-Challenge System

This season was the first with the expanded instant replay-challenge system in Major League Baseball. Did it work? We can use RE24 to assess the impact.

Kelley L Cox-USA TODAY Sports

The 2014 season was the first with an instant replay system available for plays beyond verifying home runs. This season, managers could use instant replay to challenge an umpire's call on force plays, tag plays, catch/trap plays, home runs, ground rule doubles, fan interference, and the other reviewable play types. Managers got one challenge per game. If they used it and the call they challenged was overturned, they got another challenge. Umpires could also initiate a review at the request of a manager, but only if the manager had no challenge left. The goal was to get more calls right. Yet when the system was first introduced, there was concern that replay would change the game too much and that the human element was integral to baseball. Well, we made it through a whole season (with only a couple of hiccups), and replay and the challenge system have become just another part of the game. Here, I examine the impact of the replay/challenge system.

First, some simple numbers from this first season of expansive instant replay. There were 1265 manager challenges, which means there was a challenge just about every other game (one per 1.92 games, to be exact). Surely all of those challenges made the game even slower. Well, not really. The average challenge took a little under 2 minutes to complete, which is basically like another commercial break without the commercials. So that seems fine. How did the umpires do? Of the 1265 replays only 604 (47.75%) were overturned, so let's hear it for the umpires! Kidding aside, the critical point is that 604 plays that would otherwise have stood as wrong calls were now called correctly. Now we can ask the most important question: what was the value of getting these calls correct? Using our old friend RE24, we can find out.
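The rates quoted above follow from simple division. Here is a minimal sketch, assuming a 2,430-game regular season (30 teams, 162 games each, two teams per game), a figure the article does not state explicitly:

```python
# Figures quoted in the article.
challenges = 1265
overturned = 604

# Assumption: 30 teams x 162 games / 2 teams per game = 2430 games.
games = 30 * 162 // 2

games_per_challenge = games / challenges
overturn_rate = overturned / challenges

print(f"one challenge every {games_per_challenge:.2f} games")
print(f"overturn rate: {overturn_rate:.2%}")
```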

Some of you are already quite familiar with RE24 and can skip the rest of this paragraph and the next. For those unfamiliar I suggest carrying on. The basic idea is to take a team's run expectancy in an inning after a given play, subtract their run expectancy in the inning before the play and add any runs that actually scored on the play. The difference in run expectancy is then credited to the players (hitter and pitcher) involved in the play. The run expectancy values used for the calculation have been derived for each of the 24 base-out states based on multiple years of data. The values represent the average runs scored to the end of the inning from a given base-out state. It is important to note that RE24 is not concerned with the inning or score of the game, just the number of outs and the bases occupied. So a single in the 9th inning of a blowout is treated the same as a single in the 9th inning of a tied game in this framework. An example will help clarify how this works.
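In symbols, the calculation described above can be written as follows. The notation is introduced here for illustration and is not the author's:

$$
\Delta\mathrm{RE} = \mathrm{RE}_{\text{after}} - \mathrm{RE}_{\text{before}} + R
$$

Here $\mathrm{RE}_{\text{before}}$ and $\mathrm{RE}_{\text{after}}$ are the run expectancies of the base-out states before and after the play, and $R$ is the number of runs that scored on the play. The hitter is credited with $\Delta\mathrm{RE}$ and the pitcher with $-\Delta\mathrm{RE}$.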

Suppose we are in a situation with nobody out and a runner on first. Given this situation the batting team is expected to score 0.831 runs in the inning. The run expectancy value of 0.831 that I am using comes from the matrix here. Now imagine the batter at the plate singles to right field and the runner on first advances to third base. We find ourselves in a new situation: nobody out and runners on first and third. This situation has a different run expectancy than nobody out, runner on first. Now, the batting team is expected to score 1.798 runs in the inning. The batter and pitcher are credited with affecting the difference in run expectancy, so the batter gets 0.967 runs (the pitcher is given a corresponding -0.967). This process is continued throughout the season for every play, adding and subtracting all the changes in run expectancy and assigning them to the batters and pitchers involved to come to their final RE24 values.
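The single described above can be worked through in a few lines. This is only a sketch: the function name `re24_change` is made up for illustration, and the two run expectancy values are the ones quoted in the paragraph.

```python
def re24_change(re_before: float, re_after: float, runs_scored: int = 0) -> float:
    """Change in run expectancy credited to the batter for one play."""
    return re_after - re_before + runs_scored


# Run expectancy values quoted in the article.
RE_RUNNER_ON_FIRST_NO_OUTS = 0.831
RE_FIRST_AND_THIRD_NO_OUTS = 1.798

# No run scores on the single, so runs_scored keeps its default of 0.
batter_credit = re24_change(RE_RUNNER_ON_FIRST_NO_OUTS, RE_FIRST_AND_THIRD_NO_OUTS)
pitcher_credit = -batter_credit  # the pitcher is charged the mirror image

print(f"batter: {batter_credit:+.3f}, pitcher: {pitcher_credit:+.3f}")
```

Summing these per-play credits over a season would give each player's RE24 total.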

What does RE24 have to do with the replay/challenge system? Suppose that in the play described above the runner that got to third base safely did so on a really close play. Close enough that the manager of the team on defense decides there is enough evidence to challenge the call, and through replay it is determined that the runner was actually out. This means that rather than first and third, nobody out (1.798 RE), the game situation is now: runner at first, one out (0.489 RE). The -1.309 change in run expectancy comes as a direct result of the replay system. This is just one example of replay/challenges changing the outlook of the run expectancy in an inning.
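The effect of an overturned call can be sketched as a lookup in a run expectancy matrix keyed by base-out state. The dictionary below holds only the three states quoted in this article, and the key layout and the helper `overturn_swing` are illustrative assumptions, not the author's code.

```python
# Partial run expectancy matrix: only the base-out states quoted in the article.
# Keys are (occupied bases, outs); this layout is an illustrative choice.
RUN_EXPECTANCY = {
    (("1B",), 0): 0.831,
    (("1B", "3B"), 0): 1.798,
    (("1B",), 1): 0.489,
}


def overturn_swing(called_state, corrected_state, matrix=RUN_EXPECTANCY):
    """Run expectancy after the review minus run expectancy under the original call."""
    return matrix[corrected_state] - matrix[called_state]


called = (("1B", "3B"), 0)  # runner ruled safe at third on the field
corrected = (("1B",), 1)    # runner ruled out after the review

swing = overturn_swing(called, corrected)
print(f"swing for the batting team: {swing:+.3f} runs")
```

A full analysis would need all 24 base-out states in the matrix; the three here are enough for this one example.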

In an effort to measure the impact of the challenge system, I used the RE24 framework to evaluate the change in run expectancy for all 604 overturned calls from the past season. I only looked at the overturned calls because these are the plays where replay has an effect. The upheld calls remain as they would without the replay/challenge system, so replay is not really adding anything other than confidence in the initial call. All of the overturned plays were pulled from the incredible Instant Replay Database at Baseball Savant. To determine the change in run expectancy resulting from the challenge, I watched and/or parsed the play-by-play account of each play. The base-out situation given the call on the field was used as the initial run expectancy, and the base-out situation resulting from the challenge (and overturning of the call) was used as the end run expectancy. Any runs scored (or taken back) were also accounted for in the calculation. In all cases the absolute difference in run expectancy was recorded.
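The per-play bookkeeping described above might look like the sketch below. The `OverturnedPlay` record and its field names are hypothetical. The second play is an invented illustration of a run being taken off the board, built from run expectancy values quoted earlier, and is not a play from the data set.

```python
from dataclasses import dataclass


@dataclass
class OverturnedPlay:
    re_called: float         # run expectancy under the call on the field
    re_corrected: float      # run expectancy after the call was overturned
    runs_called: int = 0     # runs that scored under the original call
    runs_corrected: int = 0  # runs that still count after the review


def replay_value(play: OverturnedPlay) -> float:
    """Absolute change in expected runs caused by overturning the call."""
    before = play.re_called + play.runs_called
    after = play.re_corrected + play.runs_corrected
    return abs(after - before)


# The article's example: safe at third (first and third, no outs)
# overturned to out (runner on first, one out).
safe_to_out = OverturnedPlay(re_called=1.798, re_corrected=0.489)

# Hypothetical: a run scores leaving a runner on first with no outs, then the
# review rules the runner out at home (runner on first, one out, no run).
run_taken_back = OverturnedPlay(re_called=0.831, re_corrected=0.489, runs_called=1)

for play in (safe_to_out, run_taken_back):
    print(f"{replay_value(play):.3f}")
```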

A few plays were discarded from my final analysis because they involved record keeping, challenges of hit-by-pitches, or challenges of fair-or-foul in the outfield. These are cases where it is difficult to appropriately and consistently assess changes in run expectancy. Rather than being inconsistent and/or subjective, I just removed them. After removing these plays, 569 were left in the final analysis. Using these plays, my analysis shows that the average overturned call was worth 0.65 runs. That value is roughly the equivalent of leading off an inning with a double. What's more, over the 569 overturned calls there were 367.76 runs correctly assigned as a result of the replay/challenge system! I use the word runs with some care in the previous sentences because my analysis is based on average run expectancy values, and not necessarily actual runs scored. Regardless, that is a tremendous effect.
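The summary figures above are consistent with one another, as a quick check shows. All inputs are numbers reported in this article; only the variable names are mine.

```python
# Figures reported in the article.
overturned_calls = 604
plays_analyzed = 569
total_run_value = 367.76

plays_discarded = overturned_calls - plays_analyzed
average_value = total_run_value / plays_analyzed

print(f"plays discarded: {plays_discarded}")
print(f"average value per overturned call: {average_value:.2f} runs")
```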

So on the whole it looks like the replay/challenge system had quite the influence. How did these overturned calls impact each team? Below is a table showing the number of successful challenges each team made. For interest's sake I have divided the numbers into offense challenges (challenging team was batting) and defense challenges (challenging team was pitching/fielding).

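| Team | Defense | Offense | Total |
|---|---|---|---|
| Cubs | 17 | 9 | 26 |
| Braves | 15 | 9 | 24 |
| Rangers | 16 | 8 | 24 |
| Royals | 16 | 6 | 22 |
| Tigers | 7 | 15 | 22 |
| Angels | 11 | 10 | 21 |
| Dodgers | 14 | 7 | 21 |
| Giants | 13 | 8 | 21 |
| Mariners | 13 | 8 | 21 |
| Rays | 8 | 13 | 21 |
| Yankees | 8 | 13 | 21 |
| Marlins | 12 | 8 | 20 |
| Red Sox | 11 | 9 | 20 |
| Indians | 8 | 11 | 19 |
| Nationals | 9 | 10 | 19 |
| Pirates | 8 | 11 | 19 |
| Astros | 11 | 7 | 18 |
| Athletics | 14 | 4 | 18 |
| D-backs | 12 | 6 | 18 |
| Twins | 6 | 12 | 18 |
| Brewers | 15 | 2 | 17 |
| Rockies | 10 | 7 | 17 |
| Mets | 10 | 6 | 16 |
| Padres | 12 | 4 | 16 |
| Phillies | 11 | 5 | 16 |
| Reds | 11 | 5 | 16 |
| Blue Jays | 8 | 7 | 15 |
| Orioles | 10 | 5 | 15 |
| White Sox | 5 | 10 | 15 |
| Cardinals | 6 | 7 | 13 |
| Grand Total | 327 | 242 | 569 |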
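The split between defense and offense in the table above can be summarized with a couple of ratios. This sketch uses only the Grand Total row and the fact that there are 30 teams:

```python
# Totals from the "Grand Total" row of the table above.
defense_total = 327
offense_total = 242
all_overturned = defense_total + offense_total

teams = 30
defense_share = defense_total / all_overturned
average_per_team = all_overturned / teams

print(f"defense share of successful challenges: {defense_share:.1%}")
print(f"average successful challenges per team: {average_per_team:.1f}")
```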

The Cubs had the most successful challenges and the Cardinals had the fewest. Overall there is not very much difference between teams. I certainly would not suggest that the differences are due to poor management (or video staff). This is likely a function of the random variation in bad umpire calls. I am not entirely sure what to make of more challenges coming from the pitching/fielding team. Perhaps this is evidence of umpires using a 'tie goes to the runner' rule when there is some doubt (i.e., bang-bang plays). But that is pure speculation. Cognitive scientists studying temporal order judgments could help sort this out. In any case, in addition to the number of challenges we can see how many runs each team accumulated through the replay/challenge system.


Looking at this figure with the table above in mind we can see that there is a pretty clear relation between number of successful challenges and runs accumulated. This is straightforward; the more challenges that are overturned in your favour, the better off you will be in terms of runs.

An important thing to consider when examining the value of the replay/challenge system at the team level is that each call overturned is beneficial for one team and detrimental to another. So while the Cubs have enjoyed 26 calls in their favor, they were certainly on the wrong end of opposing teams' supported challenges at least a few times. After all we are working with the assumption that umpiring errors will be randomly distributed among games and teams. So there is little reason to expect that one team would be on the good or bad end more than another beyond random luck. Regardless, having a call overturned against you represents value lost to the replay/challenge system. Think of the team that had their runner called out at third after replay in the example way back at the beginning of this article. Their run expectancy went down because of replay. So when we are considering value at the team level we should consider these ups and downs when determining how replay affected their season. Here is how often teams were victimized by the replay/challenge system:

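| Team | Challenger | Victim | Difference |
|---|---|---|---|
| Nationals | 19 | 11 | 8 |
| Angels | 21 | 15 | 6 |
| Braves | 24 | 18 | 6 |
| Cubs | 26 | 20 | 6 |
| Giants | 21 | 16 | 5 |
| Rays | 21 | 16 | 5 |
| Yankees | 21 | 16 | 5 |
| Mets | 16 | 12 | 4 |
| Indians | 19 | 16 | 3 |
| Marlins | 20 | 17 | 3 |
| Twins | 18 | 15 | 3 |
| Athletics | 18 | 16 | 2 |
| Orioles | 15 | 13 | 2 |
| Red Sox | 20 | 18 | 2 |
| Royals | 22 | 20 | 2 |
| Cardinals | 13 | 14 | -1 |
| Pirates | 19 | 20 | -1 |
| D-backs | 18 | 20 | -2 |
| Padres | 16 | 18 | -2 |
| Phillies | 16 | 18 | -2 |
| Astros | 18 | 22 | -4 |
| Mariners | 21 | 25 | -4 |
| Rockies | 17 | 21 | -4 |
| Rangers | 24 | 29 | -5 |
| Blue Jays | 15 | 21 | -6 |
| Brewers | 17 | 23 | -6 |
| Dodgers | 21 | 27 | -6 |
| Reds | 16 | 22 | -6 |
| Tigers | 22 | 28 | -6 |
| White Sox | 15 | 22 | -7 |
| Total | 569 | 569 | 0 |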
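The Difference column can be recomputed from the other two, which also confirms that the counts balance across the league. The data below are copied from the table above; the dictionary name `COUNTS` is just for this sketch.

```python
# (challenger, victim) counts per team, copied from the table above.
COUNTS = {
    "Nationals": (19, 11), "Angels": (21, 15), "Braves": (24, 18),
    "Cubs": (26, 20), "Giants": (21, 16), "Rays": (21, 16),
    "Yankees": (21, 16), "Mets": (16, 12), "Indians": (19, 16),
    "Marlins": (20, 17), "Twins": (18, 15), "Athletics": (18, 16),
    "Orioles": (15, 13), "Red Sox": (20, 18), "Royals": (22, 20),
    "Cardinals": (13, 14), "Pirates": (19, 20), "D-backs": (18, 20),
    "Padres": (16, 18), "Phillies": (16, 18), "Astros": (18, 22),
    "Mariners": (21, 25), "Rockies": (17, 21), "Rangers": (24, 29),
    "Blue Jays": (15, 21), "Brewers": (17, 23), "Dodgers": (21, 27),
    "Reds": (16, 22), "Tigers": (22, 28), "White Sox": (15, 22),
}

# Net overturned calls: calls won as challenger minus calls lost as victim.
differences = {team: won - lost for team, (won, lost) in COUNTS.items()}

total_won = sum(won for won, _ in COUNTS.values())
total_lost = sum(lost for _, lost in COUNTS.values())
net_total = sum(differences.values())

best = max(differences, key=differences.get)
worst = min(differences, key=differences.get)

print(f"calls won: {total_won}, calls lost: {total_lost}, net: {net_total}")
print(f"best: {best} ({differences[best]:+d}), worst: {worst} ({differences[worst]:+d})")
```

Every overturned call appears once in the Challenger column and once in the Victim column, so the net across all teams has to be zero.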

As you can see in the table above, the replay/challenge system is zero-sum: teams benefit by exactly the amount that other teams are hurt. Ending the year with positive or negative runs is really just random, and it could work out differently next year. Or not. That is the way this goes. Regardless, here is a look at the winners and losers in terms of runs for the season:


The top 3 teams do not change from the previous figure, but otherwise there is quite a bit of shuffling of teams. For example, the Tigers were a top-10 team in runs gained through replay, but were unfortunate to be the victim of many challenges and end up on the losing end when everything is calculated. The scale ranges from +8 runs to -6 runs, which works out to be about 1.5 wins between top and bottom. On average teams gained or lost 0.33 wins as a result of the replay/challenge system. That is not a huge gain, but it is also not entirely trivial.
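The runs-to-wins conversion above can be sketched as follows. The article does not say which conversion factor it used, so the value of 10 runs per win here is an assumption (a common rule of thumb), and `runs_to_wins` is a hypothetical helper.

```python
# Assumption: roughly 10 runs per win, a common sabermetric rule of thumb.
# The article does not state the conversion factor it actually used.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a run total into an approximate number of wins."""
    return runs / runs_per_win


top_net_runs = 8.0      # best net runs, as quoted in the article
bottom_net_runs = -6.0  # worst net runs, as quoted in the article

spread_wins = runs_to_wins(top_net_runs - bottom_net_runs)
print(f"spread between top and bottom: {spread_wins:.1f} wins")
```

Under this assumption the spread comes out slightly below the article's "about 1.5 wins"; a somewhat lower runs-per-win value would bring the two closer together.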


In the end, the real take home message of all of this is that the replay/challenge system is correctly assigning value on these plays. It turns out to be quite a lot of value and the process really does not take very long. What's more is that we can determine how each team fares with their challenges in terms of run expectancy. If this season is any indication, correctly challenging a lot and lucking into not being on the wrong end of opposing teams' challenges can be worth as much as 1 win.

The analysis here is based on calls that were challenged and overturned; there are likely many more that could have been overturned but were not challenged. The manager may have been out of challenges, or just hesitant to initiate a challenge for whatever reason. I cannot speak to this issue. Perhaps it will change in the future as managers become more willing to challenge, or maybe rules will be implemented that will allow for more challenges per game. Those are discussion topics for another day. For now we can be confident that the replay/challenge system is positively impacting the game. Plays are being called correctly, which reduces the likelihood that a team is awarded an unfair advantage because of the fallibility of the human element. That seems like a step in the right direction to me.

. . .

All statistics courtesy of FanGraphs, Baseball-Reference and Baseball-Savant.

Chris Teeter is a Featured Writer at Beyond the Box Score. You can follow him on Twitter at @c_mcgeets.