I would like to take this chance to introduce myself. My name is Dave, and I am an Electrical Engineer by trade. I also hold a Design for Six Sigma (DFSS) Black Belt.
To be blunt, I am a Red Sox fan, but I am also a fan of baseball in general. Not all my topics will be about the Red Sox, though.
This is my first time doing this. I am not a student of sabermetrics per se; I approach analysis from a very different perspective.
In my real job, I design antennas and low-noise amplifiers for high-volume manufacturing, using Six Sigma tools to achieve these results. I am sure that many of you have one of my designs on your current vehicle, and that it helps entertain you as well as possibly helping to save some lives.
Many will ask: what is Six Sigma, and how can it possibly relate to the statistical analysis of baseball? In Six Sigma, we use statistics to aid in analysis and, many times, to identify key indicators for a process. Baseball is a process, whether it is batting, pitching or building a team. Therefore, I anticipate looking more at the big-picture issues that affect performance.
Now for some fun...
Many have tried to analyze a given player's performance to prove that there is a streak or momentum factor. At best, most of these efforts fail to quantify what we all suspect: that player X is on a hot streak or in a slump. In the case of daily batting average, most of the results are random. It is difficult to identify these trends because there is a lot of noise in the data, the noise being that the outcome on any given day can fall anywhere from 0 to 1. We can look at averages for a given week or month, but what do we know about the nature of a given player?
I like to speak with graphs, as they convey a whole lot more information than a single number. In most cases, we can simulate a given performance by knowing the mean and standard deviation. Most baseball stats do not show the standard deviation. Stdev, for short, is an important piece of information. In addition, since we already know that a plot of daily batting performance is quite noisy, we can apply averaging to improve our signals. For simplicity, we will choose a 5- or 10-day moving average to help filter out some of the noise. What we are looking for is clear signals in the data.
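To make the smoothing idea concrete, here is a minimal sketch in Python. It is not the tool I used for the charts. The hitter is hypothetical (4 at-bats per game, a true hit probability of .300, 162 games), and the 10-day average is taken as a simple trailing average of the daily batting averages, which is an assumption about how the window is applied.

```python
import random
from statistics import mean, stdev

def moving_average(values, window=10):
    """Trailing moving average; the first window-1 days produce no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical hitter (assumption, not real data):
# 4 at-bats per game, true hit probability .300, 162 games.
rng = random.Random(1980)  # fixed seed so the run is repeatable
daily = [sum(rng.random() < 0.300 for _ in range(4)) / 4 for _ in range(162)]

smoothed = moving_average(daily, window=10)

# The mean barely moves, but the Stdev of the smoothed series is much smaller.
print(f"daily:    mean={mean(daily):.3f}  stdev={stdev(daily):.3f}")
print(f"smoothed: mean={mean(smoothed):.3f}  stdev={stdev(smoothed):.3f}")
```

The daily values jump between 0, .250, .500 and so on, while the smoothed series moves in a much narrower band. That narrower band is what makes a sustained shift visible.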
Let us examine a famous player, George Brett. In 1980, Brett had a tremendous year and hit .390. Brett is an excellent batter, but nothing in his previous years would suggest he could hit .390 and flirt with the magical .400. In fact, once we look at the data, Brett could have hit .400 that year. He played in only 117 games, which may have helped, and he finished 5 hits away from .400. Unfortunately, Brett finished the season on a down note near his lower control limit, which ended his quest for .400.
The process was to analyze 3 years of daily plate appearances and apply a 10-day moving average to smooth out the noise. I will plot these on a control chart so we may see the upper and lower limits based on his standard deviation. A control chart displays the data, calculates the mean, and plots the upper and lower limits at ±3.0 Stdev from the average (an earlier version of this post mistakenly said 1.5). In theory, normally distributed data should fall inside the control limits 99.7% of the time. Once we apply the 10-day moving average, there is a tendency for the values to go outside these limits for a few days. Any time the values stay outside the limits for many days, we have a signal. This approach can identify a change or shift in performance. Predicting that shift is not that simple, as we can see.
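The control chart logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the charting package behind my plots. The limits are taken as the mean ± 3 sample Stdev of the smoothed series itself, and "many days" is represented by a hypothetical `min_days` threshold of 5. The function names and the toy series are made up for the example.

```python
from statistics import mean, stdev

def control_limits(series, k=3.0):
    """Return (LCL, center line, UCL) at +/- k sample standard deviations."""
    center = mean(series)
    sigma = stdev(series)
    return center - k * sigma, center, center + k * sigma

def runs_outside(series, lcl, ucl, min_days=5):
    """List (start_index, length, side) for each run of consecutive points
    outside the limits that lasts at least min_days (assumed threshold)."""
    runs = []
    start, side = None, None
    for i, v in enumerate(series):
        cur = "above" if v > ucl else "below" if v < lcl else None
        if cur != side:
            # The previous run (if any) just ended; keep it if long enough.
            if side is not None and i - start >= min_days:
                runs.append((start, i - start, side))
            start, side = i, cur
    # Close a run that is still open at the end of the series.
    if side is not None and len(series) - start >= min_days:
        runs.append((start, len(series) - start, side))
    return runs

# Toy smoothed series (made-up numbers): steady near .300 with a 6-day surge.
base = [0.29, 0.31] * 100
series = base[:100] + [0.40] * 6 + base[100:]

lcl, center, ucl = control_limits(series)
print(f"LCL={lcl:.3f}  mean={center:.3f}  UCL={ucl:.3f}")
print(runs_outside(series, lcl, ucl, min_days=5))
```

One design point worth noting: a very long hot streak inflates the Stdev and widens the limits when the limits are computed from the same data, so a short series can hide its own signal. Using several seasons of data, as I did, reduces that effect.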
So here's my take on what happened, which makes this event even more remarkable. Brett started out in a very bad slump (for Brett). In 1980, he spent many days near or below his lower control limit (LCL), as he did in the year before. The chart says we should expect results similar to his 1979 season. The probability of Brett remaining at such a low batting average is very small, assuming he is healthy. George finally breaks out and makes a run for his upper control limit (UCL). He not only breaks through the limit but also shifts his performance to a new level for 53 games. This level of performance was not seen in the previous or following year, so we can call this a unique event. The sad part is that he eventually regresses back towards the LCL. This event signifies that his performance was extraordinary based on his capabilities. If I put on my stock technical analysis cap, I can see some similarities to a depressed stock building a long base, ready for a breakout, with the limits acting as a channel; this is also known as breaking the ice. However, I digress. I will limit this to what we see and not bring in another tool for now.
Now, in doing some research on the Red Sox, I thought it might be possible to show that Kevin Millar's 2004 performance was unique and that the chance of repeating it in 2005 was very small. The point is that, by all accounts, a repeat is exactly what the manager was counting on. He knew Millar got hot in '04, but he did not realize that the hot streak was a unique event and unlikely to occur again as a long, sustained streak of 43+ days. The control chart clearly shows this as an extraordinary event. In 2005, Millar did experience a hyperbolic hot streak. Unfortunately, most hyperbolic events are short-lived, and he regressed back to the lower control limit. Still, what Millar achieved in 2004 shows that a player can get hot. He did make a run in '05, but he was not able to duplicate his performance.
After reviewing random and real data, I have determined that the process can only identify strong, long-term signals. There is some utility in the method, but it is not a great indicator of minor fluctuations. It does raise the question of what caused these two examples of high achievement relative to each player's ability. Is it possible to study these further and determine the causes and effects? I will leave that up to the players and their organizations. For now, I will examine some more data and report any interesting outcomes that I can validate.
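For readers who want to try the random-data comparison themselves, here is one possible sketch in Python. It is not the exact review I ran. It assumes a hypothetical hitter whose true ability never changes (4 at-bats per game, hit probability .300), simulates 200 seasons, smooths each with a 10-day moving average, and records the longest stretch of consecutive days outside ±3 Stdev limits computed from that same smoothed season.

```python
import random
from statistics import mean, stdev

def moving_average(values, window=10):
    """Trailing moving average over the given window."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

def longest_run_outside(series, k=3.0):
    """Longest streak of consecutive points beyond mean +/- k * Stdev."""
    center, sigma = mean(series), stdev(series)
    lcl, ucl = center - k * sigma, center + k * sigma
    longest = current = 0
    for v in series:
        current = current + 1 if (v < lcl or v > ucl) else 0
        longest = max(longest, current)
    return longest

def simulate_season(rng, games=162, at_bats=4, p_hit=0.300):
    """Daily batting averages for a hitter with constant ability (assumed)."""
    return [sum(rng.random() < p_hit for _ in range(at_bats)) / at_bats
            for _ in range(games)]

rng = random.Random(2004)  # fixed seed so the run is repeatable
seasons = 200
longest_runs = [longest_run_outside(moving_average(simulate_season(rng)))
                for _ in range(seasons)]

any_outside = sum(r > 0 for r in longest_runs)
long_share = sum(r >= 10 for r in longest_runs) / seasons
print(f"seasons with any point outside the limits: {any_outside} of {seasons}")
print(f"share of seasons with a run of 10+ days:   {long_share:.3f}")
```

If purely random seasons almost never produce a long run outside the limits, then a sustained stretch of the kind seen in the Brett and Millar charts is hard to attribute to noise alone. The same run also shows why short excursions are not very informative.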