Last week I recapped the first portion of my SABR presentation on geographic bias in the amateur draft. I gave three possible reasons for the observed bias: behavioral, geographic/cultural, and structural. However, my extended research for the SABR conference has led to a possible fourth reason for the bias: scouting. Every team knows where the most players come from, but do their alignments of area scouts reflect this?
My research showed that high school hitters from baseball power states did not seem to be drafted as efficiently as they could be. It also showed that high school pitchers from power states were significantly better investments than high school pitchers from northern states.
For the original post at The Hardball Times, Alex Smith and I georeferenced the towns of all the players in our sample. To see if teams were proportionally scouting the baseball-rich areas, we also scraped the locations of about 500 area scouts as listed in Baseball America’s scouting directory. We presented the information in this simple map.
We were being pretty generous when we wrote that this map was "hard to interpret." It simply threw all those points together, many of them overlapping. From this map we drew the conclusion that there did appear to be clumps of scouts where they needed to be. However, extending the research for SABR warranted further investigation.
For my SABR research I revisited Colby’s ArcGIS lab. With the help of Colby professor Manny Gimond, I used more advanced spatial analysis tools to investigate whether or not scouts were aligned proportionally with draftees. Instead of plotting a point for each draftee or scout, this time around I collapsed the data on city, so now if there is a town with one draftee and a town with ten draftees we can tell the difference on our maps. This change will help a lot, but we can make the map even easier to glean information from. Instead of simple point maps I created density or heat maps to more adequately show where there were large concentrations of draftees and scouts.
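The collapsing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ArcGIS workflow itself; the records and the `collapse_by_city` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample records: (player, city, state). Not real draft data.
draftees = [
    ("Player A", "Houston", "TX"),
    ("Player B", "Houston", "TX"),
    ("Player C", "Atlanta", "GA"),
    ("Player D", "Houston", "TX"),
]

def collapse_by_city(records):
    """Count records per (city, state) so each town becomes one weighted point."""
    return Counter((city, state) for _, city, state in records)

city_counts = collapse_by_city(draftees)

# Towns with more draftees now carry a larger count than towns with one.
for (city, state), n in city_counts.most_common():
    print(f"{city}, {state}: {n}")
```

Each town then appears once on the map, with its count available as a weight instead of a stack of overlapping points.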
Here is a map of all players drafted in the first five rounds between 1997 and 2006. This heat map uses a search radius of 300 km, with points weighted by their proximity to the center of that circle.
We can see that while players are spread throughout the country, there is a much larger concentration of players in the south. The largest concentration is in southern California, with large pockets also in Texas and the southeastern United States. This map shows that much more clearly than the original map we presented.
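For readers curious how a density value like this can be computed, here is a rough sketch. It assumes a quartic distance-decay kernel and great-circle distances; the post only specifies a 300 km radius weighted by proximity to the center, so the kernel shape, the function names, and the sample points are all assumptions of mine.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def quartic_weight(dist_km, radius_km):
    """Weight of 1 at the center, falling to 0 at the radius (assumed kernel shape)."""
    if dist_km >= radius_km:
        return 0.0
    return (1 - (dist_km / radius_km) ** 2) ** 2

def density_at(lat, lon, points, radius_km=300.0):
    """Sum the count-weighted kernel values of all points within the radius."""
    return sum(
        count * quartic_weight(haversine_km(lat, lon, plat, plon), radius_km)
        for plat, plon, count in points
    )

# Hypothetical city-level points: (lat, lon, number of draftees).
points = [
    (34.05, -118.24, 10),  # Los Angeles area
    (32.72, -117.16, 4),   # San Diego area
    (41.88, -87.63, 2),    # Chicago area
]

la_density = density_at(34.05, -118.24, points)
chicago_density = density_at(41.88, -87.63, points)
```

Evaluating `density_at` over a grid of cells covering the country would give a surface like the one mapped here. Nearby cities reinforce each other, which is why clusters such as southern California stand out.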
We can examine how this compares to the distribution of area scouts by looking at a density map with the same 300 km radius. I chose to include only area scouts (and not regional or national cross-checkers) because, while a listed address is not necessarily completely indicative of a scout's area of responsibility, it is a pretty good proxy. That is likely not true of regional and national cross-checkers. Additionally, I felt that it is more instructive to look at area scouts because they are the people responsible for first identifying the talent. If we are concerned with finding all the talent in the power states, area scouts should be our focus.
The scout map appears to be pretty well aligned with the draftee map. The highest concentration of scouts is in southern California as well and they appear to be grouped in metro areas near the largest pockets of players.
However, we know our eyes can deceive us. We can overlay these density maps to get at the relative distribution of draftees and scouts together. We can normalize the density of draftees and the density of scouts so that each is on a 0 to 1 scale, and then subtract one from the other. The result is a density plot showing which areas are over- and under-scouted in relation to each other.
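The normalize-and-subtract step can be sketched as follows. This assumes min-max scaling and a draftees-minus-scouts sign convention, so positive cells read as relatively "under" scouted; both choices, along with the tiny grids and function names, are my own assumptions for illustration.

```python
def min_max_normalize(grid):
    """Rescale a 2D grid of densities to the 0-1 range (assumed min-max scaling)."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat surface carries no relative information; map it to zeros.
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]

def relative_coverage(draftee_grid, scout_grid):
    """Normalized draftee density minus normalized scout density, cell by cell.

    Positive values: more draftee density than scout density (relatively "under" scouted).
    Negative values: more scout density than draftee density (relatively "over" scouted).
    """
    d = min_max_normalize(draftee_grid)
    s = min_max_normalize(scout_grid)
    return [[dv - sv for dv, sv in zip(drow, srow)] for drow, srow in zip(d, s)]

# Hypothetical 2x2 density grids covering the same cells.
draftee_grid = [[0.0, 4.0], [8.0, 2.0]]
scout_grid = [[1.0, 3.0], [2.0, 5.0]]

diff = relative_coverage(draftee_grid, scout_grid)
```

Because each surface is rescaled separately, the difference says nothing about the absolute number of scouts per player, only about where the two distributions diverge.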
The map shows that the southeastern U.S., the Houston area, and the outskirts of southern California appear to be relatively "under" scouted. On the other hand, Dallas, Phoenix, northern California and several areas throughout the northern U.S. appear to be relatively "over" scouted.
A few caveats on what we see here. I am not suggesting that teams need to fire or reassign their area scouts in the northern U.S. While not as saturated with talent as the south, there are plenty of talented high school and college players that need eyes on them. Further, people like to live in cities. I think it is likely that many scouts are listed as living around Chicago but cover areas spanning the entire Midwest.
In terms of "under" scouted areas we should be careful as well. Large colleges that send multiple players to the bigs could be driving the darker red areas. For instance, Clemson is that large circle in northwest South Carolina. There are a lot of players here, but a scout can easily make the trip to Clemson from Atlanta and see multiple guys.
This map is informative, but it is an aggregate representation of all MLB teams' area scouts. This could be smoothing over interesting patterns at the team level. I generated a few of these maps for different teams, and while a few of these patterns persisted, each team had a different area where it would benefit most from adding a scout.
For the individual team maps I have expanded the search radius for the scouts to 500 km to make the maps a little smoother. Also, after SABR more than one person recommended a wider search radius; perhaps 300 km does not give scouts enough credit for the travel they do.
First we will look at a team that appears to be employing their scouts quite effectively.
They have southern California and Texas covered well without conceding much elsewhere. The one weakness is in the southeastern U.S. where it looks like they could benefit from another scout in Atlanta.
Now let us look at a team that does not quite have their scouts as optimally placed throughout the country.
This team as represented here appears to have an especially strong need for an additional scout in the southeastern corridor. While the aggregate map showed an overall need for more scouts in this area, this team in particular seems to struggle to cover this section of the map. The tradeoff is that this team seems to cover California, Texas and Florida quite well.
After examining the distribution of scouts and draftees more closely, we can conclude that some teams are not aligning their scouts efficiently. We can also use this tool to identify where teams have the strongest need for an additional area scout. In a fantastic piece right here at Beyond the Box Score, Andrew Ball identified scouting as a potential market inefficiency. Specifically, Ball noted teams would be wise to allocate more funds to their scouting departments. Now we have the tools to take that finding a step further and also answer the question of where to put the additional scouts.
Location data for scouts comes from the Baseball America Scouting Directory; player data is courtesy of Baseball-Reference. Special thanks to Manny Gimond for his indispensable assistance in the ArcGIS lab.
Daniel Meyer is a junior Economics and Mathematical Sciences major at Colby College. You can follow him on Twitter @dtrain_meyer.