This is a general SQL introduction and not about specific database systems
Query builders - they are your friend, don't write SQL unless you have to! But it does help to know what's going on under the hood
What is SQL?
Structured
Query
Language
is a tool for manipuating data in
relational databases
Relational databases are everywhere
Some examples of database systems that use SQL
Microsoft Access
Microsoft SQL Server
Oracle RDBMS
IBM DB2
SAP Sypbase
MySQL
PostgreSQL
Amazon SimpleDB
Although SQL is standardised, differences do exist between systems
We will be using SQLite in this tutorial but will be using a basic subset of SQL
that is broadly universal
Why learn SQL?
In business you might want to:
Extract a subset of data from a larger dataset -
filter it to only include data you're interested in and maybe
sort it while you're at it
Transform data by performing calculations on it
Summarise complex data (e.g. computing sums or averages)
Insert new data or delete existing data from a
dataset (out of scope for today)
And of course, databases are at the heart of many mobile and web applications
Why learn SQL
What is a relational database?
A relational database is a collection of tables,
which have columns and rows
More about tables
A table is a collection of entries - each row is an entry, e.g. a customer
The columns of a table define the what information is stored for each entry - e.g. their name, address etc.
Each column has a type - it could be a number, text ('string'), date etc. Different database systems support different types
A common column found in tables (though not always) is a unique identifier for each row - e.g. 'customerId'. These are often used to link data in different tables (e.g. customerId appears in our transactions table)
Introducing our example database
Our example database has 3 tables:
Customers
Products
Sales
Example: Your first SELECT query
A SELECT query is used to get information out of a database. It's probably the most common type of query you will write
You should get output like this:
Deconstructing your first SELECT query
The start of every SQL query tells you what sort of query you are running,
in this case, it's SELECT
Next comes the list of columns that you want to see in the output, separated by
commas
The FROM clause specifies which table the columns reside in
Finally, every SQL query ends with a semicolon
Whitespace is ignored. It's good practice if you can to split your query over multiple lines to improve readability
Exercise: Simple SELECT query
Write a query that selects the transactionId and timestamp columns from the
Sales table
Filtering using a WHERE clause
Often you will only want to select rows that match a certain criteria. You can add a
WHERE clause to your SELECT query that will specify which rows to include
This example selects only customers whose countryCode is 'US':
Deconstructing your new query
You can use '*' as a shorthand to mean 'select all of the columns in the table'
instead of listing them explicitly
The WHERE clause comes after the FROM and is followed by a conditional expression. Only rows that meet the criteria will be included in the final output
Conditional expressions
Conditional expressions should be fairly familiar to you if you've used them elsewhere -
e.g. with IF() functions in Excel, although there are some differences
Exercise: Filtering with WHERE
Write a query to select all columns for transactions that occurred prior to 1-Sep-2013.
Hint: write the date as '2013-09-01'
A quick note about dates in databases
Date handling tends to vary quite a bit between different database systems
Most systems will have specialised functions that make working with dates easier - but these are out of scope for today
Sorting the results of a query with ORDER BY
You may want to order the results of a query in a certain way. You can specify this
using an ORDER BY clause in your SELECT
Understanding ORDER BY
ORDER BY comes after any WHERE clause (if there is one) and is followed by the column to sort on, and then either ASC (ascending) or DESC (descending)
If you omit the direction, ASC is assumed
You can add additional sorting criteria (to break ties), by separating them with a comma:
Exercise: sorting with ORDER BY
Write a query to select all sales occurring in the month of April 2013, sorted
so that the transaction with the largest value comes first. If two transactions have the
same value, a transaction with a lower customerId should come first
Hint: you will need to combine 2 conditions with AND to select only sales from April 2013
Expressions in SELECT statements
So far we have just been giving the database a list of columns that we want to see in
the output of our query, as-is
However, we can also ask the database to perform calculations on the data as well
Explaining expressions
Instead of simply specifying the column name, we use an expression to tell the database
how to calculate that column in our output
We can specify a label for that output column by using 'AS' after the expression with
a name
Expressions can get complex! As well as basic math, most databases come with lots of
in-built functions that can be used
Unfortuately, different database systems offer different functions, so you will need
to find out what's available for your given system
More expression examples (Access)
Calculating the months since the customer first joined (this works in Access but not SQLite)
More expression examples (Access)
You can even include 'IF' functions (again, works in Access but not SQLite)
Exercise: expressions
Write a query to retrieve all sales since June 1, 2013, and include the average unit price
for each transaction
Aggregation with GROUP BY
A common task is to summarise information from multiple rows in a table into a single row
For example, we may want to know, what
is the total value of all sales for each customer?
We can add a GROUP BY clause to our SELECT queries to achieve this
Aggregation with GROUP BY
Explaining GROUP BY
GROUP BY works a bit like Pivot Tables in Excel
In the example, we combined rows with the same
customerId together into a single row in our output. We did this by specifying the
column name 'customerId' after the GROUP BY clause
After 'SELECT' we list the columns we want in the output (as usual), but if
we list a column that does not appear after GROUP BY, we also need
to tell the database which function to use to combine values from multiple rows. In this
case we use the Sum function to tell it to add the values of salesValue together
You can specify multiple columns as part of a GROUP BY clause, to get all unique
combinations of values in those columns
Exercise: GROUP BY
Write a query to calculate the total sales and average unit sell price for every
customer & product combination
Hint: you will need to use an expression that contains 2 Sum functions to calculate the average unit sell price
Other Aggregate Functions
Sum()
Avg()
Count() - returns the number of rows
First() and Last() (not in SQLite)
Max() and Min()
Exercise: GROUP BY
Write a query that returns the following for each customer: the number of transactions they have made (call it numOfTransactions) and the value of their largest transaction (call it maxSalesValue)
JOINing tables
So far, we've only be writing queries that operate on one table at a time
However, it's very common to need to combine information from multiple tables in a database
For instance, in our sample database we have a Customers table, that holds information on each customer such as their name and location. We also have a Sales table, that describes the sales made to each customer.
But what if we need a query that selects all of the sales, but only for customers in a certain country. We don't store the location of each customer in our Sales table, we just store it once in the Customers table. We need to join the Customers table to Sales
JOINing tables
Explaining JOIN
Like a normal SELECT statement, you specify the columns you want first, however now
we also need to tell the database which table each column comes from. We do this by
specifying columns in the format: [tableAlias].[columnName]
The main table that our query is based on comes after FROM keyword, however we also
need to give the table a 'alias' ('s' in the example). We use this alias to refer to the table elsewhere in the query
Explaining JOIN (cont)
The JOIN comes next - first we say which table we want to join onto our main table, then after ON we tell the database how to match rows from the new table to the main table
We can include a WHERE clause as per usual - only now we need remember to use the table alias in our condition
Exercise: JOIN
Write a query to select transactionId, productId, the product name and quantity for each transaction that is for a product in the Widget category
JOINing multiple tables
We can also join 3 or more tables together by adding more JOIN clauses
Exercise: multiple joins
Write a query that outputs for each transaction the following: the first name of the customer, the tier of the customer, the product name and category and the quantity
Exercise: aggregating in joins
Modify your previous query so that instead of a line for every transaction, it calculates the total sales in each category for every customer
We've only scratched the surface of JOIN
We've only one covered one specific type of JOIN - called a LEFT INNER JOIN
There are also RIGHT INNER JOINs, LEFT and RIGHT OUTER JOINs, FULL OUTER JOINs, CROSS JOINs and more. Each uses different rules for combining the tables
You can also join a table to itself, or join a table to another multiple times (on different columns)
Some parting tips
Databases also have a concept of NULL values - used when there is no data. NULL doesn't always behave how you might expect it to, especially if you try to do math on it
Your database table names or columns may contain spaces of other characters in them - you may need to enclose these names in [brackets] or 'quotation marks' depending on the database system
If you're working with very large databases it can help to test your queries on a smaller subset of data - to help you find and fix mistakes quicker
Things we haven't been able to cover
You can query queries! Most db systems will let you store queries (sometimes called 'views') and let you query those queries just like you would a table
SQL is used for a lot more than reading data - it can be used to insert or delete data, create or delete or even alter tables and more
DB systems each support a lot of custom functions that can be very useful for doing analysis. Some even let you code your own functions (like using VB in Excel)
Things we haven't been able to cover
Designing robust and high-performing databases is a whole field in itself. Understanding some of the basics (e.g. indexes and how to take advantage of them) can help you to write more efficient queries
Most databases come with tools to help you analyse the performance of your queries if they're taking a long time to execute
Saying it again - query builders/GUIs are your friend. Don't write SQL if you don't have to!
We're out of time!
Here are some other resources you may want to look into: