Sky's SQL Tutorial, Part One (Which is Like a David Ortiz Bunting Tutorial)

When I challenged folks to write a complex SQL query using the Baseball Databank database in the last installment of Saberizing a Mac, I anticipated we'd all start at ground level and work our way up to more complex queries until we eventually figured it out.  Well, sometimes with a lofty goal, taking all those baby steps can be overwhelming and we just see one large, overwhelming step.  I decided to start from the beginning, with the help of this SQL tutorial and write a bunch of simple SQL queries, each adding just a bit of complexity.  Not only does starting simple help, but forcing myself to write the queries instead of copying them from somewhere else helped my brain process what was going on.  All the mistakes along the way were productive.

I'm going to paraphrase the tutorial I used and show the SQL queries I came up with.  My suggestion to you is to follow along, manually typing in each and every query, and modify them a bit (different tables, different stats, different qualifications, etc.) so your learning path is as different from mine as possible.

  1. SELECT, FROM, WHERE -- SELECT tells the program what data you want to use, be it names, teams, or stats.  These are field names, which you can look up in the individual tables at the top of the columns of data.  FROM tells the program which table(s) you're pulling data from.  WHERE tells the program what conditions you're setting -- you'll almost always want to pull only certain data, not all of it.

    My first query looked like this:
    SELECT playerID, yearID, HR
        FROM Batting
        WHERE HR >= 50;
    It pulls the player id code (not name, that'll come next time with JOINs), the year, and number of home runs.  The data is in the Batting table.  And I only wanted to pull instances (rows) where the home runs were at least 50.  (This is not all the 50 HR seasons, though, as I'll explain later.)
  1. Some details on WHERE -- When you want to set conditions based on numbers, the WHERE statements are easy.  With text, you have some more options.  If you want to choose a specific text string, you need to surround it in quotes and you need to put AND or OR between multiple conditions.  Like this:
    SELECT playerID, yearID, HR
        FROM Batting
        WHERE HR >= 50 AND (teamID = "SEA" OR teamID = "NYA");
    You should see a bunch of Griffey- and Ruth-related results.  You can also pick apart pieces of text strings using LIKE.  (I believe the geeks call these "regular expressions".)  Let's say you only want to use players whose last names start with "R".  Then you'd do this:
    SELECT playerID, yearID, HR
        FROM Batting
        WHERE HR > 40 AND playerID LIKE 'R%';
    The % sign means there could be any number of characters after the R.  If you wanted to indicate just one wildcard character, use _ (an underscore).  You can also use NOT LIKE to take the complement set, use multiple combinations of letters/%/ _, and include numbers.  For more tips, try this.

  2. ORDER BY -- Ok, nice you you have your results, but what if you want to sort them?  Use ORDER BY followed by a field name.  This makes the most sense if you order by whatever data you're pulling but I don't think it's necessary.  For example, you could sort home runs hitters by the number of doubles they hit:
    SELECT playerID, yearID, HR
        FROM Batting
        WHERE HR > 40 AND playerID LIKE 'R%'
        ORDER BY 2B Desc;
  3. Doing calculations -- When you SELECT data, some stats you want already exist, like home runs, while some don't, like OBP.  In those cases, you have to tell the program to do the calculation and provide a name.  (You can also provide a new name to a stat without doing a calculation, like renaming 2B as "Doubles".)  Something like this, although this isn't a great definition of OBP:
    SELECT playerID, yearID, HR, (H+BB+HBP)/(AB+BB+HBP) AS "OBP"
        FROM Batting
        WHERE HR > 40 AND playerID LIKE 'R%'
        ORDER BY 2B Desc;
    There are also some built-in calculations you can use, like MIN, MAX, SUM, AVG, and more.

  4. GROUP BY -- The data in the data tables is extremely granular, probably more granular than we want most of the time.  For example, in the Batting table, the same hitter in the same season could have multiple entries, even for the same team.  All "stints" (consecutive games played for the same team) are separated.  So if a player was a Giant, then an Oriole, then a Giant again, he'd have three separate lines.  Two lines would have stint listed as 1, while the second stint for the Giants would have stint listed as 2.  If you want all of a player's season data aggregated, or all of his data for the same team aggregated, you need to tell the program to do just that.  Hence, GROUP BY.  If you only group by playerID, though, it will combine all seasons.  So you need to list all the ways you want the groupings to be unique.  I'm using player and season below, but you might also want to separate performance by team:
    SELECT playerID, yearID, HR
        FROM Batting
        WHERE HR > 40 AND playerID LIKE 'R%'
        GROUP BY playerID, yearID
        ORDER BY HR Desc;

Ok, that's as far as I've gotten for now.  I just took a peak at Colin's two SQL tutorials and he has similar stuff, plus more, so go check those out.  I'll definitley continue with part two soon.

X
Log In Sign Up

forgot?
Log In Sign Up

Forgot password?

We'll email you a reset link.

If you signed up using a 3rd party account like Facebook or Twitter, please login with it instead.

Forgot password?

Try another email?

Almost done,

By becoming a registered user, you are also agreeing to our Terms and confirming that you have read our Privacy Policy.

Join Beyond the Box Score

You must be a member of Beyond the Box Score to participate.

We have our own Community Guidelines at Beyond the Box Score. You should read them.

Join Beyond the Box Score

You must be a member of Beyond the Box Score to participate.

We have our own Community Guidelines at Beyond the Box Score. You should read them.

Spinner.vc97ec6e

Authenticating

Great!

Choose an available username to complete sign up.

In order to provide our users with a better overall experience, we ask for more information from Facebook when using it to login so that we can learn more about our audience and provide you with the best possible experience. We do not store specific user data and the sharing of it is not required to login with Facebook.

tracking_pixel_9351_tracker