SingingEels : Development Community & Resource

Login

Articles

  • ADO.NET (2)
  • ASP.NET (36)
  • LINQ (5)
  • Security (2)
  • Silverlight (3)
  • SQL (7)
  • Standards (5)
  • WCF (2)

Syndication

  • Articles RSS
  • Blogs RSS

Contribute

  • Our Authors List
  • Member Sign-Up
  • Suggestions Box

Understanding SQL: Many to Many Relationships

(Aug 03 2007 - 08:18:15 PM by Timothy Khouri) - [print article]

In the world of relational data, many-to-many relationships are one of the hardest concepts to understand and implement correctly. Quite likely the scenario will arise for a developer to decide whether to support a one-to-many or a many-to-many schema, and out of fear we crumble to an "easy" design. This article will use real world scenarios to show how and why to use a many-to-many relationship as well as how to achieve perfect performance.

There are many situations that would call for a many-to-many schema. You might have a table of "Stores" and a table of "Managers" and your company might to allow managers to switch between stores. This would require that a store could have more than one manger, and a manger could belong to more than one store (thus, many to many). Or you could consider this scenario:

You're building an application for a customer support center that takes phone calls for a cable company. When a customer calls in to complain about their fuzzy T.V. reception and also they want to upgrade their high-speed internet access, the customer service representative will need to open up two "tickets"; one for the fuzzy reception problem, and one for the high-speed upgrade.

In this scenario we are dealing with two things: phone calls and support tickets. So when the customer calls back the next day about the internet upgrade, we will want to associate that phone call with the second ticket we opened. Likewise, when they call back again the next day about the fuzzy reception, we want to associate that phone call with the first ticket we opened. Let's see how we can accomplish this many-to-many relationship in SQL. We'll learn:

  • How to create tables that support a many-to-many relationship.
  • What indexes should be created for the best performance.
  • How to report by phone calls or by tickets (either side of the relationship).

How to Make a Many-to-Many Schema

The first piece in making our tables is to define the two entities (or tables) and then we can join them together by their IDs in a third "linker" table. For the example we are going to use we'll need to very simple tables: PhoneCalls and Tickets. We'll keep it simple so as not to get lost in the illustration, but instead to focus on the many-to-many relationship concept. Here is the schema of the two tables:

-- This table will hold our phone calls.
CREATE TABLE dbo.PhoneCalls
(
   ID INT IDENTITY(1, 1) NOT NULL,
   CallTime DATETIME NOT NULL DEFAULT GETDATE(),
   CallerPhoneNumber CHAR(10) NOT NULL
)

-- This table will hold our "tickets" (or cases).

CREATE TABLE dbo.Tickets
(
   ID INT IDENTITY(1, 1) NOT NULL,
   CreatedTime DATETIME NOT NULL DEFAULT GETDATE(),
   Subject VARCHAR(250) NOT NULL,
   Notes VARCHAR(8000) NOT NULL,
   Completed BIT NOT NULL DEFAULT 0
)

If we only wanted to support one-to-one functionality we would simply put a "PhoneCallID" field in the Tickets table and make the rule that you can't have a ticket without a phone call. But instead, we want to allow multiple tickets to be created on one phone call, and we want to allow many phone calls to be associated with a single ticket. To do that we'll need to build this linker table:

-- This table will link a phone call with a ticket.
CREATE TABLE dbo.PhoneCalls_Tickets
(
   PhoneCallID INT NOT NULL,
   TicketID INT NOT NULL
)

That's all we need to make our many-to-many schema work. First we would insert a record into the PhoneCalls table. Then, when the caller explains the situation, the customer service rep would enter a record into the Tickets table and the system would insert a link (the ID of the phone call and the ID of the ticket) into the linker table. Eventually though this table is going to be big, and will begin to perform slowly in views and reports. Let’s now look at how to make the perfect indexes for performance on both sides of the relationship:

Perfect Many-to-Many Indexes

Without a proper understanding of SQL indexes, most people would either add an arbitrary column to the linker table and make that the primary key (which doesn't help us at all), or they might make a composite primary key of PhoneCallID and TicketID. This index will help a lot; it makes sure that there are no duplicates and it gives us a lot of speed when searching or joining to the PhoneCalls table. But what if we wanted to query by the TicketID?

The problem with just one index when dealing with a many-to-many linker table is that you are only tuning performance for one side of the relationship (which makes sense: one index only helps one side). So how do we add an index that will give us the best performance possible? Well, a regular index is ok, but nothing beats a unique index in terms of speed. Here's the trick!

A linker table (a table of IDs from two other tables for the purpose of a many-to-many relationship) is always unique whether you look at it from left to right or right to left. Meaning, PhoneCallID_TicketID is unique and TicketID_PhoneCallID is also unique, so we can add another unique index to the table!

Perfect SQL indexes for a many-to-many linker table Another unique SQL index for a many-to-many linker table

You can add the PRIMARY KEY indexes right in the CREATE TABLE statement (meaning, you don't have to use the designer). For simple cases where only one column is the primary key, you can just add the phrase "PRIMARY KEY" on the column definition line. Example:

CREATE TABLE dbo.SomeTable
(
   ID INT IDENTITY(1, 1) PRIMARY KEY NOT NULL
)

For more complex PRIMARY KEYS where you want multiple columns, or you want to specify the default sort order, you would use the "CONSTRAINT" keyword. Example:

CREATE TABLE dbo.ComplexTable
(
   FirstName VARCHAR(100) NOT NULL,
   LastName VARCHAR(100) NOT NULL,
   CONSTRAINT PK_ComplexTable PRIMARY KEY (LastName, FirstName DESC)
)

How Does Another Unique Index Help?

Well, think about your table as if it was a library filled with thousands of books. How would you find a book called "SQL Performance Tips"? Well, you would use the library's index and skip ahead to the letter "S" and continue in alphabetical order until you found your book. But what if you wanted to find a book by the author "Billy McGee"? You can't use the index that you were just in because that is sorted by book title (which doesn't help you because you only know the author's name). So you would use the library's other index, the one that is sorted by author's name.

The same is true for SQL. If you do a query like this:

SELECT * FROM dbo.PhoneCalls_Tickets WHERE PhoneCallID = 123

Then SQL will realize that it should use the PhoneCallID_TicketID index to find your results. But if you change the query only slightly to do this:

SELECT * FROM dbo.PhoneCalls_Tickets WHERE TicketID = 123

Then SQL will understand that the first index doesn't help, so it will use the TicketID_PhoneCallID index to find your results. That's how to achieve the best performance with many-to-many relationships.

How Do I JOIN These Tables Together?

When you are dealing with JOINing more than two tables in a query it can sometimes get confusing. So when you're dealing with a many-to-many query (which requires three tables), it helps to first break up your FROM/JOIN clauses and explain them out loud to yourself as you type. The JOINs should make sense to SQL and to you. Let's look at a query that would JOIN both of our tables together with the help of the linker table.

-- It doesn't matter which table we start with, so I'll just
-- pick the PhoneCalls table.


SELECT
   *
FROM
   dbo.PhoneCalls

-- If I stop here, then all I have is the phone call data. Now

-- we can JOIN to the helper table to get the IDs of all tickets

-- associated with each phone call in the query above.


INNER JOIN dbo.PhoneCalls_Tickets ON
   PhoneCalls.ID = PhoneCalls_Tickets.PhoneCallID

-- That's simple enough. Now that we've joined to the linker

-- table we are able to use the PhoneCallID or the TicketID

-- fields from the linker table. Now let's get the tickets!


INNER JOIN dbo.Tickets ON
   PhoneCalls_Tickets.TicketID = Tickets.ID

You can also use the linker table in your WHERE clause. An example of this would be if you wanted to see all calls associated with a ticket, or all tickets associated with a call. Here is how you would achieve those two queries:

-- This will give you all calls for ticket number 123.
SELECT * FROM dbo.PhoneCalls WHERE ID IN
   (SELECT PhoneCallID FROM dbo.PhoneCalls_Tickets WHERE TicketID = 123)

-- This will give you all tickets for phone call number 123.

SELECT * FROM dbo.Tickets WHERE ID IN
   (SELECT TicketID FROM dbo.PhoneCalls_Tickets WHERE PhoneCallID = 123)

Closing Remarks

Now that we have examined the facts we realize that many-to-many relationships are not difficult to build, and they can perform very well when indexed properly. Also, if you name your tables appropriately, then you're queries will also be easy to read. So the next time you are designing a database schema for one of your projects, try to squeeze in some of this powerful functionality. Remember, you can easily support a one-to-many relationship with a many-to-many schema, but you can't go the other way around. So if you think you might need it, why not support it from the ground up.

  • Aug 31 2007 - 06:42:47 AM scottm

    Great article, What would be the best way to insert, update and delete data from this structure?

  • Aug 31 2007 - 10:09:37 PM Timothy Khouri

    It's nothing too complicated... You'd probably first insert a record into the "PhoneCalls" table first. Then, when the caller opens a "ticket", you'd insert a record into the "Tickets" table, retrieve the ID of that record (by selecting SCOPE_IDENTITY()) and then inserting a record of the two ID's into the linker table, "PhoneCalls_Tickets".

    If you wanted to do a DELETE, then you'd just have to make sure to delete the records out of all three tables. Let's say you wanted to delete all data for PhoneCall ID #7, then you'd do this:

    DELETE FROM dbo.Tickets WHERE ID IN (SELECT TicketID FROM dbo.PhoneCall_Tickets WHERE PhoneCallID = @PhoneCallID)

    DELETE FROM dbo.PhoneCall_Tickets WHERE PhoneCallID = @PhoneCallID

    DELETE FROM dbo.PhoneCalls WHERE ID = @PhoneCallID

    And that's it :)

  • Sep 05 2007 - 06:19:31 AM scottm

    Thanks for the info.

  • Nov 05 2007 - 07:29:59 PM hvolcy

    This article is just what I've been looking for. I tried the examples but it seems that the last select statment may be incorrect. Shouldn't it say "(SELECT TicketID..." instead of "(SELECT PhoneCallID..."

  • Mar 22 2008 - 08:22:51 PM Timothy Khouri

    I realize that this is way late for me to be fixing this... but you're right... I've fixed the typo.

  • Dec 28 2008 - 04:14:19 PM Philthy

    I'd like to see more examples of this. Especially with inserting into the junction table, when a row has been created in one of the other tables. I have been vaccuming the net lately for many-to-many scenarios, using sql server 2005, and I can't find many good examples that explain this.

  • Dec 30 2008 - 01:10:39 PM Jonathan Khouri

    Excellent article. Just what the SQL Dr. ordered.

  • Mar 09 2009 - 02:07:58 PM DanPinault

    Great article. Well written and easy to understand. Will you please consider the scenario shown here http://i42.tinypic.com/2uigns9.png and provide some advice? I'm struggling with a many-to-many situation and getting multiple instances of some values in my query results (the multiplicative effect). I tried creating a 'linker' table as you describe but that didn't fix it. I must be missing something. Thanks!

    [img]http://i42.tinypic.com/2uigns9.png[/img]

  • Mar 10 2009 - 08:00:02 AM Timothy Khouri

    Well, I definitely see your issue here, and it's not that bad. *IF* BaseModID and OEM_ID are 1-to-1 (meaning you won't ever have a "9876" BaseModID with anything other than a "6543" OEM_ID) - then you can do something like this:

    SELECT

    info.OEM_Country,

    (SELECT SUM(Reg_Qty) FROM dbo.Registration r WHERE r.BaseModID = (SELECT TOP 1 BaseModID FROM dbo.ProductionHistory ph WHERE ph.OEM_ID = info.OEM_ID)) AS TotalRegistered

    FROM

    dbo.OEM_Info info

    But, I doubt that those fields are 1-to-1... If they aren't, then you'll need to add the OEM_ID to your "Registration" table. That might be the simplest route anyway. If so, then you'll only need to join once to the OEM_Info table.

    Hopefully I didn't misunderstand your architecture all together!

  • Mar 10 2009 - 08:21:45 AM DanPinault

    Timothy - Thanks for such a quick reply! You read the architecture better than I pictured it. You are correct in the assumption that BaseModID and OEM_ID are NOT 1-to-1. I should have illustrated that in my example. I think what I am going to try is joining the Reg dimension table to the Prod History fact table by adding the Reg_ID to the Prod History fact table and see what happens.

    I think the thing that is really tripping me up is that there is a metric (Reg_Qty) in the Reg table. That makes me think that the Reg_Qty should be in its own fact table. Unfortunately, I don't have the depth of knowledge on how to work with multiple fact tables.

    Thanks again!

    Dan

You must be logged in to add comments. If you have not already done so, you can create an account here. If you already are a member, you first need to login before you can comment.

Developer / Architect / Author

People to Follow

Experts in the categories related to this article.

There are no experts related to this article.

Related Blogs

These are the most recent blog posts related to this article.

There are no blog posts related to this article.

Related Ads

SingingEels.com as of Mar 15 2010 - 03:20:39 PM - (0.1406304)