Friday, Nov 17, 2017
Indexes are one of the most important features of the SQL Server while being most overlooked and misused in practice. Their proper use leads to great server performance, but having them set up incorrectly or not having them set up at all, leads to poor execution times. Furthermore, having unnecessary indexes negatively impacts the index maintenance and, in my experience, you can often get away with fewer indexes, just by careful planning and knowing what queries are being run on your server. There are different types of indexes, but in this post, we will talk about non-clustered ones.
Missing and incorrectly set up indexes
Imagine the following scenario: Your client calls you, telling you that the search feature in your recently deployed app is getting slower and slower, and lately it takes almost a minute to return the results. You didn’t notice any problems during the development, and it runs under a second on your development database. You would be surprised how often that happens - during the development it is usually the case that the database just doesn’t contain enough data to properly assess the performance problems that will arise once the data starts pouring in the production phase. When you get a call like that, the first thing you need to ask yourself is “Have I configured the indexes?”, and the second question is “Have I configured them correctly?” So let’s address these two questions.
…are often the culprit of a poor SQL Server performance. When you write your query’s
WHERE clause, be aware that columns used in your predicate should be indexed. I know this should be common sense, but some people still don’t do it and wonder why the execution time is high. Take the following example:
SELECT * FROM MyTable WHERE MyColumn1 BETWEEN 1 AND 10
Having no index on MyColumn1 will make SQL Server go through every record in MyTable. More data in the table, longer the execution time. What you want the SQL Server to do is Index Seek, especially if the query is highly selective and returns only a small percent of the data. To achieve that, you need to add an index on MyColumn1.
Incorrectly configured indexes…####
…are another and more common mistake that people make. Look at this query and tell me what would you do:
SELECT LastName, FirstName FROM MyTable WHERE LastName = "Doe" AND FirstName = "John"
“All right, I’ll just add two indexes, one for LastName and another for FirstName,” you say? Well maybe, but only if you know why you’re doing this. Let me try to explain: By creating two separate indexes, SQL Server will work harder whenever an insert or update happen. If you often query both the first and last name, rather than creating two indexes, it would be better to create one composite index, which contains both columns. That index will be used in the following query too:
SELECT LastName, FirstName FROM MyTable WHERE LastName = "Doe"
Yet it won’t be used in the following example, and you’d need a separate index for FirstName:
SELECT LastName, FirstName FROM MyTable WHERE FirstName = "John"
And why is that? It’s because composite indexes are used only if the
WHERE clause uses the leftmost columns in the index, which would be LastName, and then FirstName. Simply said, the order od the columns is important! Think of it as a phone book, where people are sorted by the last, then the first name. You would have trouble finding all people with first name “John” in the phone book, wouldn’t you? But you would have no problem finding everyone with the last name “Doe,” and then if you need to narrow your search to a first name, it won’t take you long. That is how composite indexes work.
However, we can go even further with something that I often see not being used enough - Covering Indexes. Again, let’s take the following query as an example, and let’s assume that you have a composite index on LastName and FirstName.
SELECT LastName, FirstName, DateOfBirth FROM MyTable WHERE LastName = "Doe" AND FirstName = "John"
SQL Server will have no problem finding a matching value in the composite index, but in order to get the DateOfBirth, it will need to go back to the table to fetch it, since it is not a part of the index. That means additional logical or physical reads, and that’s precisely what you want to avoid. Instead, add the DateOfBirth as an Included Column - by doing that, you now have a Covering Index. The DateOfBirth column will be added at the leaf level of the index, which will make the index smaller because column won’t be a part of the tree, and you’ll have your matching row and all columns from the
SELECT statement in a single Index Seek.
Lastly, let’s talk about unused indexes and their impact on server performance. Think of it as all the extra work that SQL Server needs to for nothing. We’re talking about maintaining these indexes, and all additional storage on disks that indexes actually require. Some databases I’ve seen had over 10GB in unused indexes. How did that happen? Well, someone thought that they should have an index on every foreign key. But they didn’t think about what kind of queries are being run on the server. In short, if you don’t use a specific column in your queries, you don’t need an index for it.
Luckily, there are many ways to find unused indexes, and I created a simple script that will get you started. It will help you identify the indexes that are not used, but server needed to create and update. Results are sorted by the number of writes (you can interpret it as a portion of server’s wasted resources). If you remove the
dm_db_index_usage_stats.user_updates <> 0 predicate, the query will list all unused indexes, which now includes even indexes that server hasn’t done any work with.
SELECT objects.name AS TableName, indexes.name AS IndexName, dm_db_index_usage_stats.user_seeks AS UserSeek, dm_db_index_usage_stats.user_scans AS UserScans, dm_db_index_usage_stats.user_updates AS Writes FROM sys.dm_db_index_usage_stats INNER JOIN sys.objects ON dm_db_index_usage_stats.OBJECT_ID = objects.OBJECT_ID INNER JOIN sys.indexes ON indexes.index_id = dm_db_index_usage_stats.index_id AND dm_db_index_usage_stats.OBJECT_ID = indexes.OBJECT_ID WHERE indexes.is_primary_key = 0 AND --remove to show all unused indexes dm_db_index_usage_stats.user_updates <> 0 AND dm_db_index_usage_stats.user_seeks = 0 AND dm_db_index_usage_stats.user_scans = 0 ORDER BY dm_db_index_usage_stats.user_updates DESC
Note that the script shows only the statistics created since the last SQL Server service restart or manual statistics reset. Every time you do either, the statistics are deleted.
Correctly set-up indexes are a prerequisite to a good database performance. I hope you learned something new, and if you have anything to add or a question to ask, feel free to comment below. Thanks for reading!