# Daily Box Score 9/23: Self-Similarity

The notion of consistency, that bugaboo of thinking baseball fans everywhere, has eluded our frontal attacks for decades.

What conclusions can we draw from the fact that a player's performance has been stable from year to year? Is it a mere trivial fluke or parlor trick? Or does it betray some deeper ability? Is it possible consistency bodes poorly for a player's projection?

Fractal Geometry

When I was in high school, I attended a lecture given by the mathematician Benoît Mandelbrot. I asked him a question about his seminal paper (which I definitely did not fully understand). His answer, which was short and to the point, betrayed just how poorly he thought I had understood what he had written.

Consider this your fair warning that, even in the intervening years, I have not been fully able to grok the concept of fractals. However, I sincerely hope that no one is reading this column in the hopes that I have all the answers. At best, I can offer questions. Caveat emptor.

The concept of self-similarity, as it is used to describe certain mathematical functions, is relatively straightforward. Per Wikipedia:

[A] self-similar object is exactly or approximately similar to a part of itself (i.e. the whole has the same shape as one or more of the parts). Many objects in the real world, such as coastlines, are statistically self-similar: parts of them show the same statistical properties at many scales.

In fact, it was Mandelbrot who, in that paper, showed that the coastline of Britain can be described as self-similar. That is, no matter the "zoom" level, the coastline will display many of the same properties. In some cases (as is common with fractals), the actual appearance itself may be similar at many different resolutions.

As an aside, I'll tell you what I asked Mandelbrot. His paper suggests that surfaces that display statistical self-similarity have fractional dimensions between 1 and 2. I asked if it would be possible for a real-world surface, like the coastline of Britain, to be a fractal. (I surmise that people have been asking him this question, which is absurd really, for decades. He must be sick of explaining it.) In any event, the answer is no.

Real-world curves aren't fractals because they display self-similarity only over certain intervals. It is not the case that any coastline fully replicates its shape at any part of itself. But the partial self-similarity, nevertheless, is what led to much of Mandelbrot's breakthrough work with fractals.
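For anyone who wants the formula behind "fractional dimension": as I understand it, the paper builds on Lewis Fry Richardson's empirical finding that the measured length of a coastline depends on the length of the ruler you measure it with. The symbols below are the conventional ones, and I offer this as a sketch rather than a full account of the paper.

```latex
L(G) = M \, G^{1 - D}
```

Here G is the ruler length, L(G) is the measured coastline length, M is a constant, and D is the dimension. For a smooth curve, D = 1 and the measured length does not depend on the ruler. For a rougher coastline, D sits between 1 and 2, so the measured length keeps growing as the ruler shrinks. Richardson's estimate for the west coast of Britain was roughly D = 1.25.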

The Fractional Dimension of Nick Blackburn

One of my favorite pastimes is looking for players who have season statistics that are uncanny in some way. For example, which players come closest to a .300/.400/.500 line? (Bobby Abreu worked better for this before this year, but he's still close at .299/.404/.492.)

As I talked to a friend the other day, he pointed out an interesting feature of Nick Blackburn's current season compared to his last. Let me demonstrate:

2008: 193.1 IP, 4.05 ERA, 4.40 FIP, 96 K, 39 BB, 23 HR, 224 H

2009: 191.2 IP, 4.18 ERA, 4.44 FIP, 89 K, 40 BB, 24 HR, 230 H

Having been thoroughly rebuked by Mandelbrot, I am definitely not suggesting that Blackburn is a fractal. That would be ridiculous.

But isn't that a tad uncanny? Especially when you consider how few bats he misses, which I would have thought makes him more susceptible to variance. Of course, the fact that he had similar statistics ex post is not evidence that he was not subject to more variance ex ante. Still, weird, right?
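To put a rough number on "uncanny," here is a minimal Python sketch that measures the gap between the two Blackburn seasons stat by stat. The stat lines are the ones quoted above. The scoring method (absolute difference divided by the two-season average) and the names `relative_gaps` and `mean_gap` are my own illustrative assumptions, not an established similarity metric.

```python
# Blackburn's two seasons, copied from the lines above.
# Innings are converted from box-score notation: 193.1 means 193 1/3.
blackburn_2008 = {"IP": 193 + 1 / 3, "ERA": 4.05, "FIP": 4.40,
                  "K": 96, "BB": 39, "HR": 23, "H": 224}
blackburn_2009 = {"IP": 191 + 2 / 3, "ERA": 4.18, "FIP": 4.44,
                  "K": 89, "BB": 40, "HR": 24, "H": 230}


def relative_gaps(season_a, season_b):
    """Per-stat gap between two seasons, as a fraction of their average.

    This scoring rule is an assumption made for illustration only.
    """
    return {
        stat: abs(season_a[stat] - season_b[stat])
        / ((season_a[stat] + season_b[stat]) / 2)
        for stat in season_a
    }


gaps = relative_gaps(blackburn_2008, blackburn_2009)
mean_gap = sum(gaps.values()) / len(gaps)

for stat, gap in gaps.items():
    print(f"{stat:>3}: {gap:.1%}")
print(f"mean gap: {mean_gap:.1%}")
```

The same function works on any pair of stat lines that share the same keys, so it can be pointed at the hitters below as well. Counting stats with small totals (like home runs) will swing more in percentage terms than big ones (like hits), which is one reason not to read too much into a single score.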

So I went looking for other players who were their own doppelgangers. And I found a few. Let me share.

Much has been made of the fact that Adam Dunn, who currently has 38 HR, is two shy of collecting exactly 40 HR in five consecutive seasons. But that isn't the only statistic that has displayed stability:

2008: 651 PA, 40 HR, 100 RBI, 122 BB, 164 K, .236/.386/.513

2009: 623 PA, 38 HR, 103 RBI, 108 BB, 165 K, .279/.408/.556

Other than the difference in the number of singles (77 in '09 versus 59 in '08), the two lines are dead ringers for one another.

Here's another: Shane Victorino.

2008: 627 PA, 14 HR, 58 RBI, 45 BB, 69 K, .293/.352/.447

2009: 646 PA, 10 HR, 58 RBI, 57 BB, 67 K, .297/.364/.451

Here again there is really only one substantive difference. This time, it's the number of walks.

Think I'm just cherry-picking hitters, and healthy ones at that? Nah, here's a guy who missed the same part of the season two years in a row, and his numbers STILL came out the same: John Lackey.

2008: 163.1 IP, 3.75 ERA, 4.53 FIP, 130 K, 40 BB, 26 HR, 161 H

2009: 169.1 IP, 3.56 ERA, 3.54 FIP, 135 K, 46 BB, 14 HR, 163 H

He allowed 32 doubles this year, versus 26 in 2008. If you figure that many of the extra home runs from last year have been doubles this year, he really starts to look like the same pitcher. In fact, the reason his FIP was so much higher last year is entirely because of his HR/FB rate (15.3% in '08 versus 8.0% in '09). His fly ball tendencies (34.7% in '08, 34.5% in '09) have been almost identical.
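The FIP claim can be checked with some quick arithmetic. The usual FIP formula is (13·HR + 3·(BB + HBP) − 2·K) / IP plus a league constant. Hit batsmen and the yearly constant aren't in the lines above, so this sketch compares only the pieces we do have: the home run term and the walk-and-strikeout term. The helper names (`innings`, `hr_term`, `bb_k_term`) are hypothetical, made up for this example.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings such as '163.1' (163 and 1/3) to a decimal."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def hr_term(hr: int, ip: float) -> float:
    """Home run contribution to FIP: 13 * HR / IP."""
    return 13 * hr / ip


def bb_k_term(bb: int, k: int, ip: float) -> float:
    """Walk and strikeout contribution to FIP, ignoring HBP (not listed above)."""
    return (3 * bb - 2 * k) / ip


# Lackey's lines as quoted above.
ip_2008, ip_2009 = innings("163.1"), innings("169.1")

hr_gap = hr_term(26, ip_2008) - hr_term(14, ip_2009)
other_gap = bb_k_term(40, 130, ip_2008) - bb_k_term(46, 135, ip_2009)
fip_gap = 4.53 - 3.54

print(f"HR term gap:   {hr_gap:+.2f}")
print(f"BB/K term gap: {other_gap:+.2f}")
print(f"Listed FIP gap: {fip_gap:+.2f}")
```

Under those assumptions, the home run term alone comes to about a run of FIP, which is essentially the whole listed gap, while the walk and strikeout term barely moves.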

Still not close enough for you? How about the guy whose entry into the game is often likened to the sounding of taps, Mariano Rivera?

2008: 70.2 IP, 1.40 ERA, 2.03 FIP, 77 K, 6 BB, 4 HR, 41 H

2009: 61.1 IP, 1.91 ERA, 2.97 FIP, 67 K, 12 BB, 7 HR, 44 H

The difference in ERA and FIP is largely explained by the difference in HR allowed, though his walks also doubled, from 6 to 12. In fact, his fly ball percentage this year has been even lower than last.

I find the consistency of all of these players to be strange and remarkable.

Conclusions?

I must confess, I am unsure what conclusions we can draw from these fractionally dimensioned, self-similar ballplayers. What they have in common is that, as you refine the statistical measures and look beyond simple batting average and ERA, they look even more self-similar.

I suspect that the conclusion we can draw depends on a player's age. If the two consecutive years of similar performance come in the middle of a player's career, can we surmise that we have reached a performance maximum? If it is early in a player's career, can we count it as a bit of evidence suggesting that the previously projected development curve will be stunted? For an older player like Rivera, does it tell us something about his agelessness?

Or ought we simply assume it all to be statistical noise--an artifact of the fact that there are many ballplayers, and many of them will play at or near their true talent level two years in a row?

Discussion Question of the Day

This should be fun. See if you can find examples of players who are self-similar. Share their stats. I bet you can find better ones than I did. My methods, admittedly, were not scientific.

Also, I'm curious to hear what you think we can learn from this type of anomaly.