Understanding SQL: Many to Many Relationships
(Aug 03 2007 - 08:18:15 PM
by Timothy Khouri
) - [print article
In the world of relational data, many-to-many relationships are one of the hardest concepts to understand and implement correctly. Quite likely the scenario will arise for a developer to decide whether to support a one-to-many or a many-to-many schema, and out of fear we crumble to an "easy" design. This article will use real world scenarios to show how and why to use a many-to-many relationship as well as how to achieve perfect performance.
There are many situations that would call for a many-to-many schema. You might have a table of "Stores" and a table of "Managers" and your company might to allow managers to switch between stores. This would require that a store could have more than one manger, and a manger could belong to more than one store (thus, many to many). Or you could consider this scenario:
You're building an application for a customer support center that takes phone calls for a cable company. When a customer calls in to complain about their fuzzy T.V. reception and also they want to upgrade their high-speed internet access, the customer service representative will need to open up two "tickets"; one for the fuzzy reception problem, and one for the high-speed upgrade.
In this scenario we are dealing with two things: phone calls and support tickets. So when the customer calls back the next day about the internet upgrade, we will want to associate that phone call with the second ticket we opened. Likewise, when they call back again the next day about the fuzzy reception, we want to associate that phone call with the first ticket we opened. Let's see how we can accomplish this many-to-many relationship in SQL. We'll learn:
- How to create tables that support a many-to-many relationship.
- What indexes should be created for the best performance.
- How to report by phone calls or by tickets (either side of the relationship).
How to Make a Many-to-Many Schema
The first piece in making our tables is to define the two entities (or tables) and then we can join them together by their IDs in a third "linker" table. For the example we are going to use we'll need to very simple tables: PhoneCalls and Tickets. We'll keep it simple so as not to get lost in the illustration, but instead to focus on the many-to-many relationship concept. Here is the schema of the two tables:
CREATE TABLE dbo.PhoneCalls
ID INT IDENTITY(1, 1) NOT NULL,
CallTime DATETIME NOT NULL DEFAULT GETDATE(),
CallerPhoneNumber CHAR(10) NOT NULL
CREATE TABLE dbo.Tickets
ID INT IDENTITY(1, 1) NOT NULL,
CreatedTime DATETIME NOT NULL DEFAULT GETDATE(),
Subject VARCHAR(250) NOT NULL,
Notes VARCHAR(8000) NOT NULL,
Completed BIT NOT NULL DEFAULT 0
If we only wanted to support one-to-one functionality we would simply put a "PhoneCallID" field in the Tickets table and make the rule that you can't have a ticket without a phone call. But instead, we want to allow multiple tickets to be created on one phone call, and we want to allow many phone calls to be associated with a single ticket. To do that we'll need to build this linker table:
CREATE TABLE dbo.PhoneCalls_Tickets
PhoneCallID INT NOT NULL,
TicketID INT NOT NULL
That's all we need to make our many-to-many schema work. First we would insert a record into the PhoneCalls table. Then, when the caller explains the situation, the customer service rep would enter a record into the Tickets table and the system would insert a link (the ID of the phone call and the ID of the ticket) into the linker table. Eventually though this table is going to be big, and will begin to perform slowly in views and reports. Let’s now look at how to make the perfect indexes for performance on both sides of the relationship:
Perfect Many-to-Many Indexes
Without a proper understanding of SQL indexes, most people would either add an arbitrary column to the linker table and make that the primary key (which doesn't help us at all), or they might make a composite primary key of PhoneCallID and TicketID. This index will help a lot; it makes sure that there are no duplicates and it gives us a lot of speed when searching or joining to the PhoneCalls table. But what if we wanted to query by the TicketID?
The problem with just one index when dealing with a many-to-many linker table is that you are only tuning performance for one side of the relationship (which makes sense: one index only helps one side). So how do we add an index that will give us the best performance possible? Well, a regular index is ok, but nothing beats a unique index in terms of speed. Here's the trick!
A linker table (a table of IDs from two other tables for the purpose of a many-to-many relationship) is always unique whether you look at it from left to right or right to left. Meaning, PhoneCallID_TicketID is unique and TicketID_PhoneCallID is also unique, so we can add another unique index to the table!
You can add the PRIMARY KEY indexes right in the CREATE TABLE statement (meaning, you don't have to use the designer). For simple cases where only one column is the primary key, you can just add the phrase "PRIMARY KEY" on the column definition line. Example:
CREATE TABLE dbo.SomeTable
ID INT IDENTITY(1, 1) PRIMARY KEY NOT NULL
For more complex PRIMARY KEYS where you want multiple columns, or you want to specify the default sort order, you would use the "CONSTRAINT" keyword. Example:
CREATE TABLE dbo.ComplexTable
FirstName VARCHAR(100) NOT NULL,
LastName VARCHAR(100) NOT NULL,
CONSTRAINT PK_ComplexTable PRIMARY KEY (LastName, FirstName DESC)
How Does Another Unique Index Help?
Well, think about your table as if it was a library filled with thousands of books. How would you find a book called "SQL Performance Tips"? Well, you would use the library's index and skip ahead to the letter "S" and continue in alphabetical order until you found your book. But what if you wanted to find a book by the author "Billy McGee"? You can't use the index that you were just in because that is sorted by book title (which doesn't help you because you only know the author's name). So you would use the library's other index, the one that is sorted by author's name.
The same is true for SQL. If you do a query like this:
SELECT * FROM dbo.PhoneCalls_Tickets WHERE PhoneCallID = 123
Then SQL will realize that it should use the PhoneCallID_TicketID index to find your results. But if you change the query only slightly to do this:
SELECT * FROM dbo.PhoneCalls_Tickets WHERE TicketID = 123
Then SQL will understand that the first index doesn't help, so it will use the TicketID_PhoneCallID index to find your results. That's how to achieve the best performance with many-to-many relationships.
How Do I JOIN These Tables Together?
When you are dealing with JOINing more than two tables in a query it can sometimes get confusing. So when you're dealing with a many-to-many query (which requires three tables), it helps to first break up your FROM/JOIN clauses and explain them out loud to yourself as you type. The JOINs should make sense to SQL and to you. Let's look at a query that would JOIN both of our tables together with the help of the linker table.
INNER JOIN dbo.PhoneCalls_Tickets ON
PhoneCalls.ID = PhoneCalls_Tickets.PhoneCallID
INNER JOIN dbo.Tickets ON
PhoneCalls_Tickets.TicketID = Tickets.ID
You can also use the linker table in your WHERE clause. An example of this would be if you wanted to see all calls associated with a ticket, or all tickets associated with a call. Here is how you would achieve those two queries:
SELECT * FROM dbo.PhoneCalls WHERE ID IN
(SELECT PhoneCallID FROM dbo.PhoneCalls_Tickets WHERE TicketID = 123)
SELECT * FROM dbo.Tickets WHERE ID IN
(SELECT TicketID FROM dbo.PhoneCalls_Tickets WHERE PhoneCallID = 123)
Now that we have examined the facts we realize that many-to-many relationships are not difficult to build, and they can perform very well when indexed properly. Also, if you name your tables appropriately, then you're queries will also be easy to read. So the next time you are designing a database schema for one of your projects, try to squeeze in some of this powerful functionality. Remember, you can easily support a one-to-many relationship with a many-to-many schema, but you can't go the other way around. So if you think you might need it, why not support it from the ground up.