Understanding SQL: SELECT The Data You Want
(
Jun 27 2007 - 09:27:46 PM by
Timothy Khouri) - [
print article]
Many articles about SQL are spread too thin with "the basics" of a SELECT, INSERT, UPDATE and a DELETE. By the time your done, you know a little about everything, and nothing useful. This article will focus on the meat of SQL, getting the data you want. We'll start off with the simple, and move to the more advanced and creative.
The best way to learn how to use SQL properly is by using a real world problem and then solving it step by step. The scenario we're going to work with is an ecommerce website. Let's keep it simple and say that this site sells t-shirts, hats and mugs. They have a small customer base, and they want to try to learn more about their audience, so let's build our reports.
Our Example Schema
We need to quickly define the schema (or design) of our database. This way you'll be able to understand the queries below as we piece together our reports. We are going to have a few tables:
- Customers
- Products
- Orders
- OrderItems
Here is what each table will look like:
CREATE TABLE dbo.Customers
(
ID INT IDENTITY(1, 1) NOT NULL,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Address VARCHAR(100) NOT NULL,
City VARCHAR(50) NOT NULL,
State CHAR(2) NOT NULL,
Zip CHAR(5) NOT NULL
)
CREATE TABLE dbo.Products
(
ID INT IDENTITY(1, 1) NOT NULL,
Name VARCHAR(50) NOT NULL,
Price DECIMAL(8,2) NOT NULL,
Description VARCHAR(1000) NOT NULL
)
CREATE TABLE dbo.Orders
(
ID INT IDENTITY(1, 1) NOT NULL,
CustomerID INT NOT NULL,
OrderTime DATETIME NOT NULL DEFAULT GETDATE()
)
CREATE TABLE dbo.OrderItems
(
ID INT IDENTITY(1, 1) NOT NULL,
OrderID INT NOT NULL,
ProductID INT NOT NULL,
Quantity INT NOT NULL
)
I realize that's a lot of SQL, and you might not be used to building tables from scratch, but it's ok if you don't understand that code above right now. Basically, the schema is there so that you can refer to it as you continue the article to see how / why we are doing things.
SELECTing Basic Data
Our first report is going to be a basic one: who are our customers. Lets examine the query that would give us a list of our customers:
SELECT * FROM dbo.Customers
There's not much to this query. If you didn't already know, the * (STAR or asterisk) tells SQL to return all of the columns in the table. Let's change this up a bit by putting the columns in the order we want.
SELECT
LastName,
FirstName,
Address,
City,
State,
Zip
FROM
dbo.Customers
So I've done two things here; 1) I put the LastName column in front of the FirstName column, and 2) I decided not to show the ID column. If you specify the columns manually, you have the power to move them around or leave them out all together. There is one problem though with this query above. Just because we put the LastName column first, doesn't mean our results are going to be in ORDER we want. So let's now change the query a bit to sort the results alphabetically by a customer's last name, then first name.
SELECT
LastName,
FirstName,
Address,
City,
State,
Zip
FROM
dbo.Customers
ORDER BY
LastName,
FirstName
Getting Some Useful Data
OK, so far this information isn't very useful. In fact, you can't really call it a report as it's just a list of your customers. So we are going to step up our query a bit. We are going to add a column that will give you the number of all the orders that each customer has ever ordered. How you might ask? Just watch:
SELECT
LastName,
FirstName,
Address,
City,
State,
Zip,
(SELECT COUNT(*) FROM dbo.Orders
WHERE CustomerID = Customers.ID) AS OrderCount
FROM
dbo.Customers
ORDER BY
LastName,
FirstName
Do you see how simple that was? We added a "subquery" right inside of our main query. We used the COUNT function which will do exactly as it sounds, count how ever many rows meet your criteria. The criteria in this case (our WHERE clause) said to count all the records in the Orders table where the CustomerID field matched the ID field in the Customers table in our main query.
If you are new to subqueries (or sub-selects), or if you've heard before that this is a performance nightmare, don't worry. This is not hard to understand, and there is no performance problem. The performance questions belong in another article, but as for trying to understand what SQL is doing, here goes.
If you are a programmer, you can think of your query as a "for loop" and the sub-select is like a function that is being called inside the loop to get a piece of data from somewhere else. But if you're not a programmer you can think of it like shopping for a car. As you look at each car you turn to the salesman next to you and ask "how much does this one cost?" SQL is basically doing the same thing. It's looking through one table, but you told it to get a piece of information from another table for each record.
JOINing Multiple Tables Together
The last piece of the puzzle that we are going to cover is JOINing two (or more) tables together to get relational data. We want to see all of the sales that were made by a customer. We want to know when the order was made, and what items were bought. To do that, we are going to join the Orders table (which has the customer's ID and the order date) and the OrderItems table (which has the actual products that were purchased).
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
WHERE
Orders.CustomerID = 2
This is a simple query, but again, if you're not familiar with JOINs then spend some time looking it over. Let's examine the query backwards. What that query is doing is it's finding all the orders for customer "2". Then, it will find all the order items that belong to each of his orders. Finally, we tell SQL what columns to display (in our SELECT clause at the top).
The results will look something like this:
Let's extend this query a little bit more to JOIN to another table, the Products table. This will give us the friendly name of the products that were ordered instead of just the ProductID. Watch how easy it is to get a little more information in our report:
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.Quantity,
Products.Name AS ProductName,
Products.Description AS ProductDescription
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
INNER JOIN dbo.Products
ON OrderItems.ProductID = Products.ID
WHERE
Orders.CustomerID = 2
Before I show you the result, I want to touch on something I did above. I "aliased" (or renamed) the column "Name" and called it "ProductName". This is because a column called "Name" in this report might be misleading, but ProductName let's you know exactly what to find in that column.
Here is the result:
The End - For Now
This is the end of the first segment of the "Understanding SQL" series. You should have learned something if you are new to SQL, or even if you've been using it for a little while. This article (and the next few to follow in this series) is designed to help you think like a database, and not like a human.
There is still a lot left under the topic of SELECTing (or querying) data. There are more complex joins, nested subqueries, aggregated results and so much more. We'll cover those topics in the next "Understanding SQL" article.