There’s a sight gag from season three of Schitt’s Creek that pops into my head whenever I’m studying Analyzing Baseball Data with R. When Alexis is studying her high school homework, there’s a shot of her highlighting every single sentence in her textbook. Alexis is, of course, a heightened level of oblivious, but my impostor syndrome makes me think I’m screwing up my studies in the same way as her. Whenever I highlight anything in the book, there’s a voice in my head that whispers, “Is that really necessary?”
Learning on your own is hard enough without crippling self-doubt, but the fear that you’re doing something wrong presents plenty of additional obstacles. Obviously, it’s harder to get started on something if you know you’re not going to enjoy it. There’s also a “boy who cried wolf” phenomenon where it becomes difficult to tell when you actually are doing something wrong.
Like Alexis, I wasn’t studying optimally, but it wasn’t that I was highlighting too much. It’s that highlighting was the only thing I was doing. Your mileage may vary, but for me, highlighting makes going back to a text easier without forcing me to synthesize information. For that, I need to be writing things down. The act of writing helps me memorize things in a way that highlighting doesn’t, because highlighting is a passive activity while writing a note is an active one.
Not only am I learning R, but I’m also re-learning how to learn. In some ways, the latter is just as difficult. Self-directed learning isn’t so easy when you can’t just look up how to do something on YouTube. I don’t envy the students who are suddenly having to pursue their education from home. It’s much easier to get stuck if you can’t just call your teacher over to answer a question.
Those were the two main hurdles I had to clear as I made my way through the second half of the second chapter: remembering what it was like to work from a textbook and figuring out what to do when I reached an impasse.
For the first, I think I’ve figured things out. I still struggle with many of the new terms and definitions. ABDR doesn’t include a glossary, so I might have to make one myself. Not having 100 percent mastery of the vocabulary naturally makes it tough to understand what’s being said. A sentence like “A data frame is an example of a container that contains vectors of different types and a list is a general way of storing ‘mixed’ data,” is simple enough until you remember that ‘data frame,’ ‘container,’ ‘vector,’ and ‘list’ all have specific definitions within the context of R.
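For anyone else tripping over the same sentence, here is a minimal sketch of how those terms relate in R. The names and numbers are made up for illustration and are not taken from the book.

```r
# A vector holds values that all share one type.
player     <- c("A", "B", "C")      # character vector (placeholder names)
strikeouts <- c(120, 95, 143)       # numeric vector (made-up values)

# A data frame is a container whose columns are vectors of equal length;
# each column can have a different type.
pitchers <- data.frame(player = player, strikeouts = strikeouts)

# A list can store "mixed" data: elements of any type and any length,
# including a whole data frame.
mixed <- list(label = "example", values = strikeouts, table = pitchers)

class(pitchers$player)      # "character"
class(pitchers$strikeouts)  # "numeric"
is.list(pitchers)           # TRUE: a data frame is itself a special kind of list
```

Calling `str(pitchers)` or `str(mixed)` prints the structure of either object, which can stand in for a glossary entry when a term is unclear.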
That’s just going to take more familiarity with the language. Currently, I’m trying to figure out a better way to get unstuck than just going to the solutions page on the book’s website. I spent about an hour trying to figure out how to make a scatter plot of strikeout-to-walk rates against median seasons for pitchers with at least 10,000 innings and got nothing but error messages saying “object not found.” Eventually, I just gave up.
Copying down the code gave me this swank-looking graph, but after looking up the solution I’m still not sure what I was doing wrong. Last week plot(x, y) worked just fine, but now it suddenly has to be ggplot(data.frame, aes(x,y)) + geom_point(). Maybe the plot() function only works with containers and ggplot() is necessary when working from a data frame or CSV. I don’t know. I thought I was piping from the data frame, but maybe I wasn’t. I’m not even sure if that’s what I was supposed to do anyway.
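One possible explanation, offered as a guess and not a diagnosis of the code that failed: "object not found" often appears when a column is referred to by its bare name, because a column lives inside its data frame and is not a standalone object. The sketch below uses a hypothetical data frame `pitchers` with made-up columns `mid_year` and `so_bb`, and assumes the ggplot2 package is installed for the last step.

```r
# Hypothetical data frame; the values are invented for illustration.
pitchers <- data.frame(
  mid_year = c(1905, 1950, 1985),
  so_bb    = c(1.8, 2.1, 2.6)
)

# plot(mid_year, so_bb)
# The line above would fail with an "object not found" error, because
# mid_year and so_bb exist only as columns of pitchers.

# Base plot() works with a data frame once the columns are reached explicitly:
plot(pitchers$mid_year, pitchers$so_bb)   # pull the columns out with $
with(pitchers, plot(mid_year, so_bb))     # evaluate the call inside the data frame
plot(so_bb ~ mid_year, data = pitchers)   # formula interface with a data argument

# ggplot() receives the data frame first, so aes() can use bare column names.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pitchers, aes(x = mid_year, y = so_bb)) + geom_point()
  print(p)
}
```

If that guess is right, either function can draw the scatter plot from a data frame; the difference is only in how each one is told where the columns live.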
The next chapter deals with graphics, so I’ll need to get this straight by next week. I still haven’t figured out how to add data labels.
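As a starting point on data labels, here is a small sketch of two common approaches: `text()` in base graphics and `geom_text()` in ggplot2. The data frame, its column names, and its values are all hypothetical, and the ggplot2 part assumes that package is installed.

```r
# Hypothetical data; names and values are placeholders.
pitchers <- data.frame(
  name     = c("A", "B", "C"),
  mid_year = c(1905, 1950, 1985),
  so_bb    = c(1.8, 2.1, 2.6)
)

# Base graphics: draw the points, then add a label above each one.
# The widened ylim leaves room for the label over the highest point.
plot(pitchers$mid_year, pitchers$so_bb, ylim = c(1.5, 3))
text(pitchers$mid_year, pitchers$so_bb, labels = pitchers$name, pos = 3)

# ggplot2: map the label column inside aes() and add a geom_text() layer.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pitchers, aes(x = mid_year, y = so_bb)) +
    geom_point() +
    geom_text(aes(label = name), vjust = -1)
  print(p)
}
```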
Kenny Kelly is the managing editor of Beyond the Box Score. You can follow him on Twitter @KennyKellyWords.