

Learning R: Compatibility issues

For once, the constant error messages weren’t the result of user error. (Probably.)

Tampa Bay Rays v Boston Red Sox. Photo by Billie Weiss/Boston Red Sox/Getty Images

After three months of learning R using Analyzing Baseball Data with R, I finally reached a chapter that deals with data from Baseball Savant. Until now, I'd only worked with data from Retrosheet and Sean Lahman's database, both incredible resources in their own right. Baseball Savant, though, has all the sexy Statcast numbers: plate_z, plate_x, hc_x, things of that nature.

But in trying to parse those thicc .xml's, I ran into a compatibility issue. At least, I think it's a compatibility issue. It can be hard to tell the difference between user error and mechanical malfunction because, boy, there sure is a lot of user error. Ninety percent of the time, my code doesn't work because I forgot a comma or didn't capitalize something I was supposed to. Half of those times, I don't catch the missing comma on a second check, so I spend 20+ minutes hunting for some deeper problem that doesn't exist.

Chapter 7 of ABDR begins simply enough. It asks the reader to install the pitchRx package into R. To install a package, all one has to do is type install.packages("Package") into the console. No comma necessary. It's almost impossible to screw up.

Reader, I didn't make it much further than that first step, and I'm pretty sure it wasn't my fault. I did, however, spend about three hours trying to figure out why it wasn't working, and that went something like this:

install.packages("pitchRX")

R: package ‘pitchRX’ is not available. Perhaps you meant pitchRx?

install.packages("pitchRx")

R: Warning in install.packages : there is no package ‘XML’

Me, checking installed packages: But I have xml2. That's better, right?

install.packages("XML")

R: package ‘XML’ is not available (for R version 3.6.3)

*5 minutes to find 'Check for updates' and update RStudio*

install.packages("XML")

R: package ‘XML’ is not available (for R version 3.6.3)

*20 minutes of Googling*

Me: Oh, I need to update R, not RStudio.

*10 minutes to figure out how to update R*

install.packages("XML")

R: You got it, dude.

Me: Oh, hell yeah.

install.packages("pitchRX")

R: package ‘pitchRX’ is not available. Perhaps you meant pitchRx?

install.packages("pitchRx")

R: Sure thing, boss.

Me: Damn yeah.

db <- src_sqlite("data/pitchrx.sqlite", create = TRUE)

R: Roger that.

files <- c("inning/inning_all.xml", "inning/inning_hit.xml", "miniscoreboard.xml", "players.xml")

R: Right on.

scrape(start = "2016-05-01", end = "2016-05-31", connect = db$con, suffix = files)

R: Sure, just a second.

*A few seconds*

R: Uh, we got a problem. I couldn’t resolve the host.

*20 minutes of checking for missing commas*

*20 minutes of checking the GitHub repository*

*20 minutes of Googling*

Finally, I found a Reddit thread where someone was having a similar problem. According to that thread, MLBAM had likely changed the location and formatting of the XML files, which broke the pitchRx package. TwoTacoTuesdays suggested BaseballR as an alternative. So I went back to RStudio.

install.packages("BaseballR")

R: ‘BaseballR’ is not available (for R version 4.0.2)

Me: Oh, fuck off.


Kenny Kelly is the managing editor of Beyond the Box Score. You can follow him on Twitter @KennyKellyWords.