Early last week, I got a notification on Twitter from Bryan Grosnick, the managing editor here at Beyond the Box Score. It concerned the following question from Craig Glaser:
"Have heard a few comments about similarity... would be interested to see if an algorithm would identify Tanaka/Kuroda as similar pitchers" - Craig Glaser (@sabometrics), April 10, 2014
Bryan notified me because I created my own version of Pitcher Similarity Scores a year ago and revised them last November. So with the algorithm in place, and a question the algorithm can answer, let's see what happens.
First of all, it's important to note the obvious concern about making a Masahiro Tanaka comparison so soon: sample size. As of this writing, Tanaka has started two games and thrown 198 pitches, hardly a large body of work. I've stated before that I'd like a minimum of 1,000 pitches before beginning comparisons (roughly equivalent to 10 starts or an entire relief season), but we can still make some way-too-early Tanaka comparisons.
To start, let's talk a little about Tanaka per Brooks Baseball. He relies on four pitches: a fourseam fastball (21.21% usage), a sinker (26.77%), his much-acclaimed splitter (22.73%), and a slider (19.70%). The splitter gets a lot of attention, generating 53.3% whiffs/swing and 77.8% GB/BIP. His slider can also be an effective out pitch, however. Against righties, the slider gets 52.9% whiffs/swing and has generated 7 Ks in his first two starts. Combine that with a fastball that averages over 93 MPH with over 10" of vertical movement and a sinker at 92 MPH with over 8" of horizontal movement, and you have an impressive arsenal.
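As a quick sanity check on those usage figures, the percentages line up with whole pitch counts out of Tanaka's 198 pitches. The per-pitch counts in this sketch are back-calculated from the percentages quoted above, not taken directly from Brooks Baseball, so treat them as an assumption.

```python
# Back out implied pitch counts from the quoted usage percentages.
# TOTAL_PITCHES and the percentages come from the article; the integer
# counts are inferred here and are an assumption, not source data.
TOTAL_PITCHES = 198

usage_pct = {
    "fourseam": 21.21,
    "sinker": 26.77,
    "splitter": 22.73,
    "slider": 19.70,
}


def implied_count(pct, total=TOTAL_PITCHES):
    """Nearest whole number of pitches that a usage percentage implies."""
    return round(pct / 100 * total)


counts = {pitch: implied_count(pct) for pitch, pct in usage_pct.items()}

for pitch, n in counts.items():
    # Recompute the percentage from the whole count to confirm it matches.
    print(f"{pitch:9s} {n:3d} pitches ({100 * n / TOTAL_PITCHES:.2f}%)")

# Whatever is left over belongs to pitch types outside the main four.
other = TOTAL_PITCHES - sum(counts.values())
print(f"{'other':9s} {other:3d} pitches ({100 * other / TOTAL_PITCHES:.2f}%)")
```

With counts this small, a handful of pitches either way moves a usage figure by a couple of percentage points, which is the sample-size concern in concrete terms.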
Let's look at Tanaka pitch by pitch first, then move on to the overall comparison. Starting with...
Tanaka's fastball, as mentioned before, has some impressive characteristics. To start, there's the average velocity of 93.25 MPH, which would rank in the top 50 for starters. The aforementioned vertical movement would also rank in the top 50.
For the fourseam, Tanaka boasts some decent comparisons (according to 2013-2014 data): Jerome Williams (fourseam similarity score 0.9622), Lance Lynn (0.9218), Craig Stammen (yes, a reliever, but 0.9190), Julio Teheran (0.9100), and Rick Porcello (0.9012). All of these pitches have average velocity near 93 MPH and show highly similar movement (horizontal near -5", vertical near 8"), with similar arm slot angles. Generally speaking, there are 54 pitchers (out of 289) with reasonably similar fourseam fastballs to Tanaka (similarity score above 0.8).
While the splitter and slider get the strikeouts, and the fastball shows the power, the sinker is Tanaka's most prevalent pitch. Nearly as fast as the fourseam (a shade under 92 MPH on average), his sinker shows more horizontal than vertical movement (-8.20" vs. 5.48").
For this pitch, Tanaka sports 22 reasonable comparisons. The most apt comparisons (including the second-highest similarity score for any pitch) were Adam Wainwright (0.9583), Nick Tepesch (0.9407), Fausto Carmona (0.9343), Jose Veras (another reliever, 0.9312), and Anibal Sanchez (0.9298).
The least-acclaimed weapon of Tanaka's main arsenal, the slider has some ability in its own right, especially against righties. The velocity (84.9 MPH) ranks in the top 50, while the movement comes in just outside the top 50 in both vertical and horizontal (1.50" and 2.56").
Tanaka's slider comparisons feature some nice names, including Max Scherzer (0.9626), Jeff Samardzija (0.9494), Carlos Marmol (0.9450), Johnny Cueto (0.9189), and Craig Stammen again (0.9174). Here, Tanaka has 50 reasonable comparisons throughout baseball.
Might as well save the best for last. Tanaka has thrown the fifth-most splitters on the year, and among the 13 pitchers who have thrown at least 20 splitters, here are Tanaka's ranks: Velocity, first (87.6 MPH). Whiffs/swing, second (53.3%, behind Danny Salazar's 57.9%). GB/BIP, tied for second with Jake Odorizzi (78%, behind Brandon Morrow).
Because the splitter isn't the most common pitch (only 27 pitchers out of the 289 comparables threw the splitter), there are only five pitchers with reasonably similar splitters: Mike Pelfrey (0.9535), Hiroki Kuroda (0.9445), Yu Darvish (0.9018), Alfredo Simon (0.8502), and Kevin Gregg (0.8054).
Of course, we can compare pitch-by-pitch, but people are equally interested in how Tanaka compares to other pitchers overall. Once you take into account usage, pitch sequence, and all the previously mentioned components, we get the following histogram of similarity scores.
The scores range all over, with R.A. Dickey being the least similar pitcher (I know, you're shocked). In fact, Dickey is so dissimilar to Tanaka that he registers a negative similarity score, an extreme rarity. But you'll notice the gap towards the upper end of the similarity scores. There's only one pitcher who is considered a reasonable match to Tanaka, with the only score above 0.8 at 0.8978. After taking all this into account, Tanaka's best comparison in terms of arsenal is...
...Hiroki Kuroda. According to the individual scores, Kuroda and Tanaka have reasonably similar fourseam fastballs (0.8025), sinkers (0.9055), and splitters (0.9445). Only the sliders differ, and even that isn't too far off (0.7271). Finally, here's a list of Tanaka's top-20 comparisons, with their scores for each individual pitch.
So there you are, Craig. It seems that the eyes and the algorithm match up here, as Hiroki Kuroda turns up in both analyses. If only this happened more often.
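The Kuroda pitch-by-pitch scores quoted above can be checked against the 0.8 "reasonably similar" cutoff in a few lines of Python. This is only a sketch of the threshold test, using the four scores given in the text; it is not the similarity algorithm itself, and the function name is made up for illustration.

```python
# Per-pitch Tanaka/Kuroda similarity scores as quoted in the article.
kuroda_scores = {
    "fourseam": 0.8025,
    "sinker": 0.9055,
    "splitter": 0.9445,
    "slider": 0.7271,
}

# The article treats a score above 0.8 as a "reasonably similar" pitch.
REASONABLE_CUTOFF = 0.8


def reasonable_matches(scores, cutoff=REASONABLE_CUTOFF):
    """Hypothetical helper: pitch types whose score clears the cutoff."""
    return sorted(pitch for pitch, score in scores.items() if score > cutoff)


matches = reasonable_matches(kuroda_scores)
misses = sorted(set(kuroda_scores) - set(matches))

print("Reasonably similar:", ", ".join(matches))
print("Below the cutoff:  ", ", ".join(misses))
```

Three of the four pitches clear the bar, and the fourseam does so only narrowly, so the strong overall match leans mostly on the sinker and splitter.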
. . .
Statistics courtesy of Brooks Baseball. PITCHf/x data courtesy of Baseball Heat Maps.
Stephen Loftus is an editor at Beyond The Box Score. You can follow him on Twitter at @stephen__loftus.