Based on years of watching television, I've learned that arriving at a party exactly on time is uncool. I'm not sure who started that social norm, but it seems to have taken hold. As a result, people arrive late to things that don't have a formal beginning. One such thing is the sabermetric appreciation of baseball. If you grew up reading Bill James at your father's knee, you're unusual. Most people show up halfway through the party, and sometimes when you show up late, you don't know where to put your coat and you're too embarrassed to ask. You wander around the house opening doors hoping to find a pile of coats. Sometimes you find it, but sometimes you give up and just wear your coat and pretend you're cold.
At Beyond the Box Score, we'd like to make an effort to help you find the coat room. If you're well-versed in advanced stats, this post isn't going to tell you anything you don't already know. However, if you're curious about how we do things in the modern age but aren't really sure where to start, this might help. You'll notice the format and especially the introduction of this post look very familiar. That's because you may have already read a similar post on how to evaluate a hitter, sabermetrically.
Oftentimes, people ask a FanGraphs writer or a well-established stathead where to start and their typical answer is to start with The Book or to click around in the FanGraphs library. While both of those contain excellent information, they're sort of backwards. The reader has to decide they want to know what Fielding Independent Pitching (FIP) is and then learn about it. What if you don't know where to start? First, you need to know what questions to ask. Then, you move on to learning about the metrics we have which answer those questions. We use numbers in baseball to tell a story and analyze what we see. You can love baseball without ever looking at a single statistic, but if you're going to look at a number or many numbers, looking at the ones that tell the most accurate story is ideal. We all want to understand and appreciate the game.
The rest of this post will walk you through the process of evaluating a pitcher using sabermetric thinking and stats. If you want to know more about the game, but don't know where to start, hopefully this will send you on your way!
So you've arrived at the decision that you'd like to evaluate a major league pitcher and you'd like to see if you can do so using the best tools available to the public. In the future, we'll talk defense, baserunning, etc., but for now we'll walk through the process of measuring performance on the mound. Let's choose an example for later! James Shields.
So the first thing we're going to do is head over to his FanGraphs page. FanGraphs is one of the leading sabermetric websites which houses many of the most important stats. Baseball-Reference and Baseball Prospectus are also terrific, but if you're new to the game, I recommend FanGraphs because it's easier to navigate, in my opinion.
There's a lot of information on that page (the top of which appears above), and many links to other pages, so in order to focus our discussion, let's set out two important questions that we want to answer. How does James Shields' 2014 season compare to his previous seasons and how good of a pitcher is James Shields?
If you've spent any time around baseball, the concepts of wins, losses, saves, innings pitched, and earned run average (ERA) aren't going to be new, and you'll find those on Shields' main FanGraphs dashboard. But what we really want to do here is think through an evaluation in the most logical way possible. We want our questions to guide us to a statistic; we don't want statistics to guide us to a question.
So let's think first about what pitchers are asked to do. Baseball is about outscoring your opponent and pitchers are partially responsible for the run prevention side of the equation. It's their job to make sure the fewest possible runs score, but they are also out there with eight other players, so you can't simply look at their runs allowed and be finished.
We can start by dividing aspects of run prevention into two categories: those that the pitcher controls almost entirely and those in which his defense plays a major role. Pitchers have almost complete control over strikeouts, walks, and home runs and have much less control over hits because those are conditional on the quality of the defense and some degree of luck. A strikeout is a good outcome and walks and home runs are bad outcomes. Pitchers should be held accountable for those.
Hits, on the other hand, are partially dependent on factors outside the pitcher's control. How many times have you watched a game when a pitcher throws a perfect pitch and gets the batter to hit a weak ground ball that somehow gets just past the second baseman? The pitcher did his job. The fielder didn't. This goes beyond errors (which are poorly assigned anyway) and centers on the play not made. Imagine two pitchers facing the exact same batter in the exact same situation who throw the exact same pitch. Now picture one with Torii Hunter in right field and one with Jason Heyward. If the ball is heading into the right centerfield gap, Heyward has a way better chance of grabbing the ball than Hunter does.
So we know that pitchers aren't universally responsible for hits allowed. They play a role, but the quality of their defense, along with simple dumb luck, factors into the number of hits allowed, and we don't want to evaluate a pitcher's performance based on the defense and luck. This leads us to Defense Independent Pitching Statistics (DIPS), and the most well-known among them is Fielding Independent Pitching (FIP).
FIP takes a pitcher's strikeouts, walks, hit batters, and home runs allowed per inning and generates a number that looks exactly like ERA and can be read the same way. You can essentially think about FIP as a pitcher's ERA if that pitcher had received league average defense and league average luck. In fact, FIP is actually a better predictor of future ERA than current ERA is, which tells us that the difference between ERA and FIP at a given time is at least partially due to things outside of the pitcher's control.
The great part about FIP is that you don't have to learn a new scale. You read it exactly like ERA, meaning that a 2.80 FIP is just about as good as you think a 2.80 ERA is. It's easy and it tells you a whole lot about what a pitcher does without crediting or penalizing them for things they cannot control.
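If you'd like to see the arithmetic behind that number, here is a minimal Python sketch of the commonly published FIP formula. The function name and the sample stat line are made up for illustration, and the 3.10 constant is only a placeholder: the real constant is reset each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from raw counting stats.

    ip is true innings pitched as a decimal (e.g. 73.333, not "73.1").
    constant is a placeholder; the actual value changes every season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs and free passes add to the total, strikeouts subtract.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not any real pitcher's numbers.
print(round(fip(hr=20, bb=50, hbp=5, k=180, ip=200.0), 2))
```

Notice that hits on balls in play never enter the calculation, which is the whole point.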
So pitchers don't control hits at all?
Of course pitchers play a role in hits allowed, but let's work through this a little bit. A pitcher controls the rate at which they allow the ball to be put in play. There's no argument about that. If you don't strike batters out, you create a situation in which you're more likely to allow hits. On average, a pitcher is going to allow about 30% of the balls that are put into play (read: not strikeouts, walks, or home runs) to fall for a hit. The problem is that, because of luck and defensive performance, that rate can sometimes take 500 innings to even out. If you're watching every at bat, you know when a batter crushed a ball and when one barely found a hole, but you aren't going to remember every one, you're going to make subjective judgments, and you certainly didn't watch every swing in the league during a season. Batted ball velocity numbers are out there, but they aren't publicly available yet, so we have to make do.
No sabermetrician would argue that FIP captures everything about pitching, which leads me to my advice to newly curious fans. Start with FIP, but don't stop there. FIP tells you how a pitcher is doing based on three very critical indicators of success, but there is more to the story. Some pitchers might be getting lucky or unlucky on home run balls, some might be able to have more influence on hits by generating weak contact, etc.
So what's next?
Next you want to take a peek at the components of FIP and a host of other indicators to see if what that number is telling you is reasonably accurate. Strikeout and walk rate are pretty straightforward, but you want to make sure your pitcher has faced 100 or more batters in a season before you buy into any dramatic changes in those rates. If a starter punches out 15 on Opening Day but usually averages six, don't overreact. If they're averaging 25% strikeouts over 150 innings after years of a 17% strikeout rate, you can believe it.
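To make that sample-size check concrete, here is a small Python sketch. The function name, the dictionary keys, and the example totals are hypothetical; the 100-batter threshold is the rule of thumb from the paragraph above, not a hard statistical cutoff.

```python
def rate_stats(k, bb, tbf, min_tbf=100):
    """Strikeout and walk rates per batter faced, with a sample-size flag."""
    if tbf <= 0:
        raise ValueError("batters faced must be positive")
    return {
        "k_pct": k / tbf,
        "bb_pct": bb / tbf,
        # Rule of thumb from the text: wait for 100+ batters faced.
        "reliable": tbf >= min_tbf,
    }


# One dominant start (hypothetical): huge K%, but far too few batters.
print(rate_stats(k=15, bb=1, tbf=27))

# A longer stretch (hypothetical): now the rate is worth believing.
print(rate_stats(k=150, bb=40, tbf=600))
```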
Home runs are a bit of a different animal because one thing we know from years of studying it is that the number of home runs a pitcher allows per fly ball fluctuates wildly in small samples. Most pitchers are going to allow about one home run for every ten fly balls, and a HR/FB% between 8.0 and 12.0 is going to be the long run average. If your pitcher is way outside of those bounds, you're probably due for some regression to the mean.
A pitcher plays a big role in the number of fly balls they allow, but not as big of a role in how many of them officially clear the fence. That brings us to Expected Fielding Independent Pitching (xFIP), which is the same thing as FIP except that it calculates the number of home runs you should have allowed given the number of fly balls you allowed and a league average HR/FB%. In this sense, xFIP is a better predictor of the future than FIP but it is a worse barometer of the past. We can't ignore those home runs when evaluating a pitcher's season, but we can use xFIP to get a better sense of where that pitcher's true ability sits.
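Here is how that home run swap might look in Python. This is a sketch under a couple of assumptions: the league HR/FB rate is set to 10% to match the "one in ten" rule of thumb above, and the 3.10 constant is a placeholder for a value that changes every season. The function name and the sample numbers are invented for illustration.

```python
def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.10, constant=3.10):
    """Expected FIP: FIP with actual home runs replaced by expected ones.

    lg_hr_per_fb and constant are placeholder league values.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs the pitcher "should" have allowed given his fly balls.
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher with 200 fly balls allowed over 200 innings.
print(round(xfip(fb=200, bb=50, hbp=5, k=180, ip=200.0), 2))
```

A pitcher who allowed far more or fewer actual home runs than `expected_hr` will show a gap between FIP and xFIP, which is your cue to check his HR/FB%.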
After that, we turn to Batting Average on Balls in Play (BABIP). Some pitchers have the ability to limit their BABIP, predominantly fly ball pitchers because fly balls drop for hits less often than grounders and line drives, but for the most part, most pitchers will sit near .300. If you see a pitcher allowing a higher or lower BABIP than that by a sizable margin for the first time, you're looking at a candidate for regression to the mean. In a practical sense, if you have a pitcher with a 4.20 ERA and 3.20 FIP and you see a .400 BABIP, you're going to trust the FIP. Same is true if it's a 2.00 ERA and 3.40 FIP with a .220 BABIP.
The longer you demonstrate you can keep a low BABIP, the more you start to give the pitcher credit, but those types of pitchers are reasonably rare. Another nuance is that pitchers who are good at managing the running game will tend to allow fewer runs than their FIP indicates because they can keep runners from taking extra bases.
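For the curious, BABIP is simple enough to compute yourself. The sketch below uses the standard formula; the helper that flags regression candidates is my own illustration, and its 30-point margin is an arbitrary assumption rather than an established cutoff. The sample stat line is hypothetical.

```python
def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play (allowed by a pitcher)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def regression_candidate(value, baseline=0.300, margin=0.030):
    """Flag a BABIP far from the typical .300 (margin is an assumption)."""
    return abs(value - baseline) > margin


# Hypothetical season line.
print(round(babip(h=180, hr=20, ab=750, k=180, sf=5), 3))

# The two cases from the text: both are far enough from .300 to distrust.
print(regression_candidate(0.400), regression_candidate(0.220))
```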
Next, I'd recommend checking out the pitcher's batted ball profile, specifically their ground ball rate. Grounders find holes more often than fly balls, but ground balls go for extra bases very rarely. If you're able to keep the ball on the ground, you aren't going to give up as many runs, all else equal.
You care about how well the pitcher helps their team prevent runs, but that doesn't mean you can look at their earned runs allowed or total runs allowed and be done. Some portion of those runs belong to the pitcher and statistics like FIP try to estimate that value. But FIP isn't perfect and you want to add in HR/FB%, BABIP, and ground ball rate to see if there's something going on that could explain why a pitcher is under- or over-performing their FIP.
Park adjusting and total value
Unsurprisingly, innings matter too. Pitchers who routinely pitch deep into games are more valuable than ones who pitch well, but don't stay out there very long, so you want to make sure you're giving more credit to guys with equal numbers but more innings. It's also important to factor in the park in which the pitcher throws. A 3.00 FIP or ERA at Coors Field is much more impressive than one at Petco Park.
Park adjusting is slightly tricky to calculate, but very easy to understand. FanGraphs houses three stats that do the heavy lifting: ERA-, FIP-, and xFIP-. The minus sign indicates that the stat is park and league adjusted, meaning that 100 is league average at a neutral park and every point lower is one percent better than league average. So a pitcher with a 90 FIP- has a FIP that is 10 percent better than average when controlling for their park.
To factor in the value of more innings, we'll turn to Wins Above Replacement or WAR. There are a couple of versions of the stat, but I'll give you the basic idea first. WAR is basically park-adjusted runs allowed or FIP, scaled to the number of innings pitched. The actual calculation requires some more work, but it really is as simple as innings pitched and runs allowed or FIP.
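To show the idea behind the minus stats, here is a deliberately simplified Python sketch. It is not FanGraphs' exact calculation, which handles park factors with more care; the function name, the league FIP of 4.00, and the park factors are all hypothetical numbers chosen for illustration.

```python
def minus_stat(value, league_value, park_factor=1.0):
    """Simplified ERA-/FIP- style index: 100 is average, lower is better.

    park_factor is an assumed run-environment multiplier where 1.0 is
    neutral, above 1.0 favors hitters, and below 1.0 favors pitchers.
    """
    if league_value <= 0 or park_factor <= 0:
        raise ValueError("league value and park factor must be positive")
    # Strip out the park effect, then compare to league average.
    park_neutral = value / park_factor
    return 100 * park_neutral / league_value


# A 3.60 FIP in a neutral park against a 4.00 league FIP: about 90.
print(round(minus_stat(3.60, 4.00)))

# The same 3.00 FIP in a hitter's park vs. a pitcher's park
# (hypothetical park factors of 1.10 and 0.95).
print(round(minus_stat(3.00, 4.00, park_factor=1.10)))
print(round(minus_stat(3.00, 4.00, park_factor=0.95)))
```

The hitter's park version comes out lower, which is the numerical form of saying the same raw number is more impressive in a tougher place to pitch.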
FanGraphs' version (fWAR) uses FIP as the base. They also have a stat called RA9-WAR, which uses runs allowed per nine as the base. Baseball-Reference (rWAR) uses runs allowed as their base but also works in a control for defense after the fact. The best way to handle the variety is to look at all of them, but if you look at a pitcher's fWAR and their RA9-WAR and they're close, you're pretty much set. If they're different, go back and see if you can figure out why and if that difference is because of the pitcher or their defense. RA9-WAR treats all hits and sequencing as the pitcher's responsibility while fWAR treats them as team-dependent.
There are plenty of other statistics, like velocity data, pitch type, and plate discipline stats, but this is a primer. Now let's put it to use.
Putting it all together
So how's James Shields doing this year? Let's take a look at his numbers prior to his most recent start. This year, Shields has a 3.43 FIP, which is almost identical to his FIP over the last three seasons. The strikeouts, walks, and home runs are all changing a touch, but on balance you're looking at a pretty consistent profile. His FIP is also in line with his xFIP, so you don't have to worry much about a weird home run situation.
His BABIP is identical to his career average and very much in line with his recent past. It doesn't look like there's anything particularly different about Shields between this year and last, other than a slight uptick in ground balls, but he's been up and down with those during his entire career.
Put it this way, Shields is pitching very much like we would expect him to pitch. You'll notice his ERA is quite a bit lower than his FIP, but there's a super easy explanation. He's given up eight unearned runs. His RA9-WAR and fWAR are almost the same, and as we talked about earlier, the difference between a batted ball being classified as an error and it being classified as a play not made has nothing to do with the pitcher. You don't want to penalize him for having poor defense behind him, but you also have to remember that earned versus unearned runs don't capture the true difference between good defense and bad defense, just a random assortment of the way in which the defense performed poorly.
That 1.4 fWAR in 11 starts and 73.1 innings is a very solid number, rating out to something in the 4 fWAR range over a full season which would put him in the second tier of starting pitchers. Probably just south of the ace category, but it wouldn't be crazy to call him a lesser ace given his durability.
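The full-season projection above is just back-of-the-envelope scaling, and you can reproduce it in a few lines of Python. The 32-start season is my assumption for a healthy starter, and the function names are made up for this example. The first helper also handles a common trap: in baseball notation, "73.1" innings means 73 and one-third, not 73.1.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings like "73.1" (73 and 1/3) to a decimal."""
    whole, _, outs = str(ip_notation).partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the dot must be 0, 1, or 2 outs")
    return int(whole) + outs / 3


def full_season_pace(war, starts, season_starts=32):
    """Scale WAR to a full season (season_starts=32 is an assumption)."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    return war / starts * season_starts


print(round(innings_to_float("73.1"), 3))
print(round(full_season_pace(war=1.4, starts=11), 1))
```

A straight-line pace like this assumes he keeps pitching exactly as well and stays healthy, so treat it as a rough guide and not a forecast.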
Again, this was a lot of information and I still think I'm leaving plenty out. There is much more to know, but hopefully this gives you a sense of how to think through an evaluation using advanced stats. This information doesn't replace scouting data or PITCHf/x type analysis, but if you're going to look into stats for pitchers, looking at K%, BB%, HR/9, BABIP, GB%, FIP, xFIP, and WAR is going to give you a much richer picture than misleading and less useful statistics like wins, saves, and ERA.
. . .
All statistics courtesy of FanGraphs.
Neil Weinberg is the Associate Managing Editor at Beyond The Box Score, a contributor to Gammons Daily, and can also be found writing enthusiastically about the Detroit Tigers at New English D. Follow @NeilWeinberg44