SQL Performance - Clustered Indexes
(Apr 24 2007 - 05:31:40 PM
by Timothy Khouri
) - [print article
If you are reading this article, you should already have a working knowledge of SQL and are familiar with filtering your data with a WHERE clause, or limiting your result set from joining other tables with an ON clause.
The part where this article comes in is when you start to see a significant amount of time passing while your once simple query is now hitting your production tables which are multitudes bigger than your test data.
This article is not just for those of you who are seeing your queries start to take 30 seconds or more, but everyone could probably benefit from this simple article.
First of All - What is Slow?
In my opinion, 98% of the queries you are running should fire in less than 1 second. That may sound bold (and perhaps to some even foolish) but in truth if you know what you are doing your queries themselves should be very fast and your slow down will be your network traffic if you are hitting a different server for your data.
I have created and worked with databases ranging from just a few thousand records to the hundreds of millions. And my "lookup" queries for finding records (whether customers or products or nearby store locations) all take sub-second. The key, is not just a good index... but rather, a good set of indexes. And more importantly the right index for the right job.
They Clustered my What?
If you are using Microsoft Enterprise Manager for SQL Server 2000 databases, or Microsoft SQL Server Management Studio for 2005 databases, or even just using Visual Studio for small SQL Express databases (such as the one used for this website) then you will likely notice the "Primary Key" button that sets your currently selected column into the primary key column for your table. A primary key is a "unique" index, but it also defaults to being a "clustered" index in the above mentioned IDE's and probably more.
Why is that important to know? For starters, you only get one clustered index per table. Secondly, your clustered index will also determine the default sorting of your select statements. It also can be your biggest performance loss when you are inserting or updating data in that table.
If the data in your table will rarely change and you use it basically to join to get a friendly name, or to filter out records not in a certain group, then you would just leave the clustered primary key alone. Even if your table changes fairly often such as a "store locations" table where you might have a few thousand stores across the country and they change their phone numbers or address etc., you would still likely just leave the index alone.
So When Do I Cluster?
If you had a table filled with customer data, and you need to find a record based on "Last Name" and "Zip Code" then you might find yourself waiting a good while (or worse yet timing out) while your query is being processed. Now is the time to remove the "cluster" from your primary key field (remember you only get one cluster so you have to remove it from the primary key). Now that you have done that you can create a new index, and of course you are going to set it as a "clustered" index. The columns, in this order, that you will add is "LastName,FirstName". Now you might be asking "why didn't we cluster 'LastName,ZipCode' since that is what we are searching on?"
That's a great question, and the answer is this: You really only need to cluster "LastName" as that will be more than good enough to drop your search query down to sub second even if you have millions of records. Adding "Zip Code" to your search will simply help you limit the result set for someone searching for 'Smith' which would return just too many records to look at. So, we know we don't need to add "ZipCode" to the cluster, but why did we add "FirstName"? The answer to that is something I mentioned above. And I'll explain now.
Default Sort Benefits
Your clustered index will tell SQL Server how to sort results given to you by default. So if you do a "SELECT * FROM dbo.MyTable" then you'd simply get all the records from that table as they were entered. But, if you had a clustered index on "LastName,FirstName" then you'd get all the records sorted by "LastName" then "FirstName". So in our scenario above of the 'Customer Search' we now get the benefit of displaying the results to the user for his search in alphabetical order which helps him to find the record he wanted a lot faster.
There are a lot of other benefits of clustered indexes, and there is so much to go over, but for now I'll leave it at that. Remember your primary key column is probably defaulted as "clustered" which is fine for most scenarios, but you might want to change it one day. Also, keep in mind that you only get one clustered index per table, so use it wisely. Lastly, keep in mind that fiddling with clustered indexes could effect your INSERT or UPDATE code, so make sure to check the performance of these if it is a factor for you.