The injury zone - The Hardball Times
Josh Kalk's latest on pitching injuries is a major step, and one of the most, if not the most, advanced uses of PITCHf/x data to date.
almost 3 years ago
Harry Pavlidis
16 comments
Comments
I don't quite understand what he's getting at.
He only looks at the last 10 pitches before a pitcher goes on the DL. That certainly isn’t enough time to “save” a pitcher from injury, especially since he has to throw those 10 pitches before you know something’s wrong.
It looks like his study does nothing more than mathematically prove that injured pitchers don’t throw as well as they do when they’re healthy.
Maybe I’ve misunderstood something? Anyone?
I was thinking the same thing
I find this interesting, but I don’t really understand how it could possibly help prevent injuries.
by lookatthosetwins on Feb 17, 2009 6:18 PM EST reply actions
This shows an empirical measure of "beyond normal fatigue" is something that is achievable.
Not there yet. I think Josh’s comments before the case study sum it up well:
That said, I feel this is an important step in understanding pitcher injuries. It is clear, if you dig deep enough in the data, that pitchers who land on the DL are pitching differently from healthy pitchers. Future studies can look at things like how quickly a pitcher moves toward the injury zone and whether there is a sign of danger earlier than the very end. The separation between these samples also might be improved with better data or a more advanced neural network or other methods
by Harry Pavlidis on Feb 17, 2009 10:11 PM EST reply actions
His one conclusion:
It is clear, if you dig deep enough in the data, that pitchers who land on the DL are pitching differently from healthy pitchers.
Do you really need to dig deep into the data to know that?
by NoNameOnCard on Feb 18, 2009 10:50 AM EST up reply actions
It's assumed, I suppose. But yeah, it's nice to see the data, rather than believe everything you hear or see.
There are plenty of examples of cliches that are wrong.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Feb 18, 2009 11:12 AM EST up reply actions
Calling that a cliche is a bit of a leap.
Fatigue is the only this data will be able to predict any kind of injury. Thanks to individualized mechanics and genetics, pitchers fatigue differently.
If genetics and mechanics aren’t added to the data (and I have zero ideas about making that work), there’s no way this or the future studies he suggested can be applied as any kind of predictive model.
I appreciate the work he put into the study, but it’s about as practical as using pitch limits to prevent injury. If that worked, relievers would never hurt their arms.
by NoNameOnCard on Feb 18, 2009 11:30 AM EST up reply actions
*err
“Fatigue is the only way this data will be able to predict…”
by NoNameOnCard on Feb 18, 2009 11:31 AM EST up reply actions
I just think it's cool that the data shows pre-injury pitchers are doing something differently.
And if you can spot “doing things differently” with the data, you might be able to catch “doing things differently” with the data before you can with the eye. And the earlier you catch “doing things differently”, the more you can prevent pitchers from continuing to pitch hurt or continuing to “pitch differently”.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Feb 18, 2009 11:43 AM EST up reply actions
A companion study would be needed to prove that "different" is bad.
Based on the study as presented, the results were pretty mixed. My hypothesis would be that the link between the two is tenuous at best.
by NoNameOnCard on Feb 18, 2009 12:12 PM EST up reply actions
Elaborating on this...
Such a study would need to:
-find all groups of 10 pitches that fit the fatigue criteria.
-relate these clusters with stays on the DL with consideration potentially given to number of days on the DL
-compare frequency of these clusters for “healthy” and “unhealthy” pitchers
If the frequency is just as high for the healthy group as it is for the unhealthy group, then different is not necessarily bad. It’s a lot of work, but I can’t think of any other way to prove the link.
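The comparison outlined above could be sketched roughly like this in Python. Everything here is hypothetical: the 10-pitch window comes from the discussion, but the `zone_window_rate` helper, the velocity-drop rule standing in for the "fatigue criteria", and the toy velocity series are invented for illustration and are not the method or data from the article.

```python
from typing import Callable, Sequence


def zone_window_rate(pitches: Sequence[float],
                     in_zone: Callable[[Sequence[float]], bool],
                     window: int = 10) -> float:
    """Fraction of consecutive `window`-pitch groups flagged by `in_zone`."""
    n_windows = len(pitches) - window + 1
    if n_windows <= 0:
        return 0.0
    flagged = sum(
        1 for i in range(n_windows) if in_zone(pitches[i:i + window])
    )
    return flagged / n_windows


def make_velocity_rule(baseline_mph: float, drop_mph: float = 2.0):
    """Hypothetical criterion: window mean is `drop_mph` or more below baseline."""
    def rule(group: Sequence[float]) -> bool:
        return sum(group) / len(group) <= baseline_mph - drop_mph
    return rule


# Toy velocity series (mph), invented for illustration only.
healthy = [92.0] * 30
pre_dl = [92.0] * 15 + [89.0] * 15

rule = make_velocity_rule(baseline_mph=92.0)
healthy_rate = zone_window_rate(healthy, rule)
pre_dl_rate = zone_window_rate(pre_dl, rule)
print(f"healthy: {healthy_rate:.2f}, pre-DL: {pre_dl_rate:.2f}")
```

If the two rates came out about the same on real data, that would support the "different is not necessarily bad" reading. A large gap would point the other way.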
by NoNameOnCard on Feb 18, 2009 12:29 PM EST up reply actions
Thoughts
Do you really need to dig deep into the data to know that?
You absolutely have to dig deep into the data. The only other study I am aware of that correlates anything with pitchers landing on the DL is Woolner’s work back in 2002 with pitch counts. In fact, as I pointed out in the article, none of the variables I used were even weakly correlated by themselves. If you feel this is easy, then I encourage you to look at whatever variable you want and do the correlation, because I am very confident the result you get will be that the variable you choose is not correlated (less than 0.3 correlation).
Fatigue is the only this data will be able to predict any kind of injury. Thanks to individualized mechanics and genetics, pitchers fatigue differently.
If genetics and mechanics aren’t added to the data (and I have zero ideas about making that work), there’s no way this or the future studies he suggested can be applied as any kind of predictive model.
I appreciate the work he put into the study, but it’s about as practical as using pitch limits to prevent injury. If that worked, relievers would never hurt their arms.
The whole point of using this advanced set of PITCHf/x data is to compare pitchers to themselves when they are healthy. This takes into account things like mechanics, because if a pitcher’s mechanics have been altered, his release point and the movement on his pitches will change. This means you can properly compare Sabathia to Sabathia, and not to other pitchers who tire earlier, as Woolner’s pitch count study does.
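To make the "compare a pitcher to himself" idea concrete, here is a minimal Python sketch. It is not the neural network from the article; it is a much simpler stand-in that scores one variable (velocity) against the same pitcher's healthy history. The `self_baseline_z` helper and all of the numbers are assumptions made up for illustration.

```python
from statistics import mean, stdev


def self_baseline_z(healthy_values, recent_values):
    """Z-score of the recent mean against the pitcher's own healthy history."""
    mu = mean(healthy_values)
    sigma = stdev(healthy_values)
    if sigma == 0:
        raise ValueError("healthy baseline has no variation")
    return (mean(recent_values) - mu) / sigma


# Invented numbers for one pitcher's fastball velocity (mph).
healthy_velo = [93.0, 94.0, 95.0, 94.0, 93.0, 95.0]
last_ten_velo = [91.0] * 10

z = self_baseline_z(healthy_velo, last_ten_velo)
print(f"velocity z-score vs own baseline: {z:.2f}")
```

The same scoring could be repeated for release point and movement. Because each pitcher is measured against his own baseline, a soft-tosser and a flamethrower are never compared on raw velocity.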
Yes, as it currently stands this study is probably not going to be able to prevent many injuries, but I strongly feel this is the right track. It would probably already do a better job than something like PAP of looking at pitchers after the game to determine whether they were really overworked. Also, something like PAP is completely useless for relievers. This metric, while not tested on relievers yet, has a chance of being useful. It doesn’t care about the pitcher’s pitch count, so it doesn’t matter whether he has thrown 20 or 120 pitches up to this point. All it cares about is how that pitcher is throwing compared to a healthy version of himself.
Based on the study as presented, the results were pretty mixed. My hypothesis would be that the link between the two is tenuous at best.
This current link is far from tenuous. Depending on where exactly you make the cut that defines the injury zone, the correlation is at least weak and often strong. It will never be 100%, however.
Such a study would need to:
-find all groups of 10 pitches that fit the fatigue criteria.
-relate these clusters with stays on the DL with consideration potentially given to number of days on the DL
-compare frequency of these clusters for "healthy" and "unhealthy" pitchers
If the frequency is just as high for the healthy group as it is for the unhealthy group, then different is not necessarily bad. It’s a lot of work, but I can’t think of any other way to prove the link.
I think you are overestimating the need for adding something like number of DL days. Yes it certainly might be somewhat helpful but what do you do with pitchers who get hurt and the season ends? How many days should be listed there? I think that issue alone is a good enough reason to not use something like time on the DL as a weighting factor.
The network output already says that the frequency is clearly not the same for the healthy group compared to the injured group. As a pitcher’s network output gets closer and closer to one, his chances of ending up on the DL rise astronomically. If the frequency were the same, then the injured pitchers’ curve would be exactly the same as the healthy pitchers’ curve, only with far fewer data points.
-joshkalk
by dixieflatline on Feb 18, 2009 6:12 PM EST up reply actions
I never said this is easy.
In fact, if I were arguing about how easy this is, I’d say that it’s impossible to do what you’re trying to do.
I think you’ve misunderstood my point about digging deep into the data. An injured or fatigued pitcher is going to dance around the “injury zone” because he’s fatigued or injured. You’ve defined the “injury zone” based on velocity, release point, and movement outliers. This is obvious from a pitching standpoint.
When a pitcher is fatigued or injured, outliers pop up left and right because muscles don’t function normally. This necessarily causes changes. His release point might change, his velocity can drop, etc. Math isn’t needed to prove that this happens; cause and effect is plenty by itself.
My point about mechanics and genetics is that not all pitchers are equal, but this study treats them as though they are. It doesn’t account for prior surgeries or general mechanical soundness.
Essentially what I’m saying is this: some pitchers are at far greater risk for injury at 0.5 than other pitchers are at 0.8. According to your data, there are also pitchers who get hurt at 0.0 and pitchers who don’t get hurt at 1.0.
Your study unquestionably identifies periodic mechanical inconsistencies, but these could be caused by weather, energy level, focus, or a litany of other factors that have nothing to do with a pitcher’s injury risk.
Like I said before, I appreciate the work you put into this. I don’t think it was easy, but there’s just no viable way to quantify a pitcher’s genetics or the soundness of his normal mechanics (or his “fatigued” mechanics for that matter). Unless that can be done, the search for a statistical predictor of pitching injuries is going to result in a final model with a large margin of error that relies on the same guesswork and estimation already in use.
BTW, my suggestion was to “potentially” consider the number of days on the DL. It’s a stretch to call that an overestimation of need.
by NoNameOnCard on Feb 18, 2009 7:02 PM EST up reply actions
Comments
I am not sure this point is totally clear, so I want to reiterate it. Fatigued pitchers are the background sample here. Sometimes they do move into the injury zone, but this is really very rare, less than 10% of the time. Injured pitchers move into the injury zone at a far greater rate, over 75%.
Yes, some pitchers who end up injured don’t enter the injury zone. It is likely this is when something just pops instead of a slow buildup of strain beforehand. My guess is that these cases will be somewhere between very hard and impossible to detect before they happen. This method isn’t foolproof, but here is the thing: no method ever is or will be. When scientists working on a real analysis use a NN, they get very similar, though generally a bit better, results. Some of their signal events end up near 0 and some of their background events end up near 1; that is just the way nature works, and this is no exception. All you can ever do is say things like “If I put a cut here I get a 5 to 1 signal to background ratio”. Even if all this method can ever do is keep some of the pitchers with a strain building up from going over the cliff, that will be a magnificent success.
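For anyone unfamiliar with the "put a cut here" language, a rough Python sketch of the bookkeeping is below. The network outputs are made up, and the helper names (`fraction_above`, `signal_to_background`) are hypothetical. The ratio is computed on raw counts passing the cut, which is one common convention and only an assumption here.

```python
def fraction_above(outputs, cut):
    """Share of network outputs at or above the cut."""
    return sum(1 for x in outputs if x >= cut) / len(outputs)


def signal_to_background(signal, background, cut):
    """Ratio of signal to background events passing the cut (raw counts)."""
    s = sum(1 for x in signal if x >= cut)
    b = sum(1 for x in background if x >= cut)
    return float("inf") if b == 0 else s / b


# Made-up network outputs in [0, 1]; 1 means "looks like a pre-DL sample".
injured = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.30, 0.05]   # signal
healthy = [0.02, 0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.40, 0.75]  # background

cut = 0.65
signal_eff = fraction_above(injured, cut)
background_rate = fraction_above(healthy, cut)
ratio = signal_to_background(injured, healthy, cut)
print(f"cut={cut}: signal {signal_eff:.0%}, background {background_rate:.0%}, S/B {ratio:.1f}")
```

Moving the cut trades one number against the other: a higher cut lets in fewer healthy samples but also misses more of the injured ones, which is why some signal near 0 and some background near 1 is expected.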
by dixieflatline on Feb 19, 2009 8:29 AM EST up reply actions
A strain is a strain, though.
If the pitcher is pitching with a strain, the strain has already occurred, and there’s about a 99.9% chance that he knows it. I know that if I’ve got a muscle strain, I don’t need a graph or statistics to tell me that – unless I think it might be worse than a strain (tear, blood clot, or whatever), then I’ll get an x-ray or MRI.
The problem is in getting injured pitchers to stop “manning up” to pitch through these injuries without looking like little girls.
If a pitcher doesn’t know he’s injured (not likely), what do you do when he pops up on the red flag list? Run a full body scan to find out where the problem is? Sit him down for a week and hope his release point goes back to normal when he picks up a ball again? Or, my favorite, ask him where it doesn’t hurt?
Again, I’m not attacking you, I’m searching for a practical application of the ideal model. If you were the GM of the team that finally figured this out, how would you implement it?
by NoNameOnCard on Feb 19, 2009 10:47 AM EST up reply actions
Didn't you answer your own question?
The problem is in getting injured pitchers to stop "manning up" to pitch through these injuries without looking like little girls.
Using the data is a way to detect which pitchers are “manning up”. An organization knows which pitchers are risking injury if they pitch more with a strain, without having to have the pitcher admit they’re hurt.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
No.
Unless a “strain” isn’t considered an injury, the pitcher is already injured at that point.
There may be a limited benefit in the form of lie detection, but my other questions still stand. If you think your pitcher is lying about how good he feels, what do you do?
Do you sit an important bullpen arm for a week because his release is slightly different? Do you let him work his way through it? Do you call him a liar and put him on the DL?
by NoNameOnCard on Feb 19, 2009 1:25 PM EST up reply actions