Posts Tagged ‘Query – Best Practices’

SQL query performance is a much-debated topic between developers and the user community. Users always want a fast response to their data retrieval actions, and developers put in their best effort to return the data in the shortest possible time; however, there is no straightforward way to define what the best performance is. What counts as good or bad performance for a query is sometimes debatable, but by following best practices during development, a good developer can give users fast query responses and avoid such discussions. There are several ways to improve SQL query performance, and they fall into categories such as rewriting the SQL query, creating and using indexes, and properly managing statistics, along with other best practices. Here are some top tips in this regard.

Avoid Correlated Subqueries

Subqueries are a very powerful and useful feature of the SQL standard. A subquery is either correlated or uncorrelated. A correlated subquery depends on the outer query: it references columns of the outer query and may have to be evaluated once for each outer row. Subqueries of this type can be very inefficient and should be avoided where possible. Instead, we can load the intermediate result into a temp table and join it to the outer query, or use another set-based method to get the same result.
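As a minimal sketch of such a rewrite, the first query below uses a correlated subquery and the second gets the same totals by joining to a result that is aggregated once. The tables #Customers and #Orders and their columns are hypothetical and exist only for this illustration. Whether the second form is actually faster depends on the optimizer and the data, so compare the execution plans.

```sql
-- Hypothetical sample tables, created only for this illustration.
CREATE TABLE #Customers (CustomerId INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, CustomerId INT, Amount DECIMAL(10, 2));

INSERT INTO #Customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO #Orders VALUES (10, 1, 100.00), (11, 1, 50.00), (12, 2, 75.00);

-- Correlated form: the subquery refers to c.CustomerId from the outer query.
SELECT c.CustomerId, c.Name,
       (SELECT SUM(o.Amount)
        FROM #Orders AS o
        WHERE o.CustomerId = c.CustomerId) AS TotalAmount
FROM #Customers AS c;

-- Set-based alternative: aggregate once, then join to the outer table.
SELECT c.CustomerId, c.Name, t.TotalAmount
FROM #Customers AS c
LEFT JOIN (SELECT CustomerId, SUM(Amount) AS TotalAmount
           FROM #Orders
           GROUP BY CustomerId) AS t
       ON t.CustomerId = c.CustomerId;

DROP TABLE #Orders;
DROP TABLE #Customers;
```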

Eliminate Cursors from the Query

Try to remove cursors from the query and use a set-based query instead; set-based queries are more efficient than cursor-based ones. A good SQL programmer must develop the mental discipline to explore set-based possibilities thoroughly before falling back on the intuitive procedural solution. If a cursor is really needed, then avoid dynamic cursors, as they tend to limit the choice of plans available to the query optimizer. For example, a dynamic cursor limits the optimizer to using nested loop joins.
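The contrast can be sketched as follows, assuming a hypothetical #Products table that is not part of the original post. The cursor loop raises prices one row at a time (declared STATIC rather than dynamic, in line with the advice above), while the single UPDATE does the same work as one set-based statement.

```sql
-- Hypothetical table for illustration.
CREATE TABLE #Products (ProductId INT PRIMARY KEY, Price DECIMAL(10, 2));
INSERT INTO #Products VALUES (1, 100.00), (2, 200.00), (3, 300.00);

-- Cursor-based: raise each price by 10 percent, one row at a time.
DECLARE @ProductId INT;
DECLARE price_cursor CURSOR LOCAL STATIC READ_ONLY FOR
    SELECT ProductId FROM #Products;
OPEN price_cursor;
FETCH NEXT FROM price_cursor INTO @ProductId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE #Products SET Price = Price * 1.10 WHERE ProductId = @ProductId;
    FETCH NEXT FROM price_cursor INTO @ProductId;
END;
CLOSE price_cursor;
DEALLOCATE price_cursor;

-- Set-based: the same change in a single statement.
-- (Running both parts applies the increase twice; use one or the other.)
UPDATE #Products SET Price = Price * 1.10;

DROP TABLE #Products;
```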

Avoid Multiple Joins in a Single Query

Try to avoid writing a single SQL query that combines many joins, including outer joins, CROSS APPLY, OUTER APPLY and other complex subqueries. Such queries reduce the optimizer's choices for join order and join type. Sometimes the optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex CROSS APPLY operators or subqueries.

Avoid Use of Non-correlated Scalar Sub Query

We can rewrite the query to pull a non-correlated scalar subquery out of the main query: run it as a separate query and store its output in a variable, which can then be referred to in the main query or in a later part of the batch. This gives the optimizer better options, which may help it produce accurate cardinality estimates along with a better plan.
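A minimal sketch of this rewrite, using a hypothetical #Orders table: the average is computed once, kept in a variable, and reused in the main query. The OPTION (RECOMPILE) hint is an addition of this sketch, not something the tip itself requires; it is there so the optimizer can take the variable's runtime value into account, and it can be dropped if the compile cost matters.

```sql
-- Hypothetical table for illustration.
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, Amount DECIMAL(10, 2));
INSERT INTO #Orders VALUES (1, 40.00), (2, 60.00), (3, 110.00);

-- Original form: a non-correlated scalar subquery inside the main query.
SELECT OrderId, Amount
FROM #Orders
WHERE Amount > (SELECT AVG(Amount) FROM #Orders);

-- Rewritten: compute the scalar once, keep it in a variable, then reuse it.
DECLARE @AvgAmount DECIMAL(38, 6);
SELECT @AvgAmount = AVG(Amount) FROM #Orders;

SELECT OrderId, Amount
FROM #Orders
WHERE Amount > @AvgAmount
OPTION (RECOMPILE);  -- assumption of this sketch, see the note above

DROP TABLE #Orders;
```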

Avoid Multi-statement Table Valued Functions (TVFs)

Multi-statement TVFs are more costly than inline TVFs. SQL Server expands inline TVFs into the main query, as it does with views, but it evaluates multi-statement TVFs in a separate context from the main query and materializes their results into temporary work tables. The separate context and the work table make multi-statement TVFs costly.
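To make the difference concrete, here is a sketch of the same lookup written both ways. The table dbo.DemoOrders and the function names are hypothetical, and GO is the batch separator used by SSMS and sqlcmd, needed because CREATE FUNCTION must be the first statement in its batch.

```sql
-- Hypothetical table and function names, for illustration only.
CREATE TABLE dbo.DemoOrders (OrderId INT PRIMARY KEY, CustomerId INT, Amount DECIMAL(10, 2));
GO

-- Inline TVF: a single RETURN (SELECT ...), expanded into the calling query like a view.
CREATE FUNCTION dbo.ufn_OrdersInline (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderId, Amount
    FROM dbo.DemoOrders
    WHERE CustomerId = @CustomerId
);
GO

-- Multi-statement TVF: fills a table variable first, which is then returned.
CREATE FUNCTION dbo.ufn_OrdersMulti (@CustomerId INT)
RETURNS @Result TABLE (OrderId INT, Amount DECIMAL(10, 2))
AS
BEGIN
    INSERT INTO @Result (OrderId, Amount)
    SELECT OrderId, Amount
    FROM dbo.DemoOrders
    WHERE CustomerId = @CustomerId;
    RETURN;
END;
GO

-- Both are called the same way; compare their execution plans.
SELECT OrderId, Amount FROM dbo.ufn_OrdersInline(1);
SELECT OrderId, Amount FROM dbo.ufn_OrdersMulti(1);
```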

Creation and Use of Indexes

Many DBAs and developers create indexes hoping to magically reduce data retrieval time, but indexes have the reverse effect on DML operations, which may degrade overall performance. Because of this trade-off, indexing is a challenging task, but done well it helps to improve SQL query performance and gives us the best query response time.

Index on Highly Selective Columns

Selectivity is the percentage of qualifying rows in the table (number of qualifying rows / total number of rows). If the ratio of the number of qualifying rows to the total number of rows is low, the index is highly selective and most useful. A non-clustered index is most useful if the ratio is around 5% or less, which means the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned. Here is a query that lists the missing indexes suggested by SQL Server, along with their columns and usage figures.
-- List missing-index suggestions with their columns and usage figures.
SELECT statement AS [database.schema.table],
column_id, column_name, column_usage,
migs.user_seeks, migs.user_scans,
migs.last_user_seek, migs.avg_total_user_cost
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns(mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle = migs.group_handle
ORDER BY migs.user_seeks, migs.user_scans;
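To make the selectivity ratio described in this section concrete, the following sketch computes the percentage of qualifying rows for one candidate predicate. The #Members table and the State = 'Goa' filter are hypothetical; substitute the table and predicate you are considering an index for.

```sql
-- Hypothetical table for illustration.
CREATE TABLE #Members (MemberId INT PRIMARY KEY, State VARCHAR(50));
INSERT INTO #Members VALUES (1, 'Delhi'), (2, 'Delhi'), (3, 'Goa'), (4, 'Kerala');

-- Qualifying rows divided by total rows, expressed as a percentage.
-- NULLIF avoids a divide-by-zero error on an empty table.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN State = 'Goa' THEN 1 ELSE 0 END) AS qualifying_rows,
       100.0 * SUM(CASE WHEN State = 'Goa' THEN 1 ELSE 0 END)
             / NULLIF(COUNT(*), 0) AS qualifying_percent
FROM #Members;

DROP TABLE #Members;
```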

Column order in an Index

The order or position of a column in an index also plays a vital role in improving SQL query performance. An index can help a query if the criteria of the query match the columns that are leftmost in the index key. As a best practice, the most selective columns should be placed leftmost in the key of a non-clustered index. The missing-index query above also gives information about column positioning.
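The leftmost-column rule can be sketched with a two-column index. The #Members table and the index name are hypothetical; checking the execution plan of each query shows whether a seek or a scan was chosen.

```sql
-- Hypothetical table and index for illustration.
CREATE TABLE #Members (MemberId INT PRIMARY KEY, LastName VARCHAR(50), State VARCHAR(50));
CREATE NONCLUSTERED INDEX IX_Members_LastName_State ON #Members (LastName, State);

-- Filters on the leftmost key column: the index can be used for a seek.
SELECT MemberId FROM #Members WHERE LastName = 'Yaduvanshi';
SELECT MemberId FROM #Members WHERE LastName = 'Yaduvanshi' AND State = 'Delhi';

-- Filters only on the second key column: a seek on this index is not possible,
-- so the optimizer typically has to scan instead.
SELECT MemberId FROM #Members WHERE State = 'Delhi';

DROP TABLE #Members;
```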

Drop Unused Indexes

Dropping unused indexes can help to speed up data modifications without affecting data retrieval. Also, we need to define a strategy for batch processes that run infrequently and use certain indexes. In such cases, creating indexes in advance of batch processes and then dropping them when the batch processes are done helps to reduce the overhead on the database. Here is a query to find unused indexes.

SELECT '[' + DB_NAME() + '].[' + su.[name] + '].[' + o.[name] + ']' AS [statement],
    i.[name] AS [index_name],
    ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] AS [user_reads],
    ddius.[user_updates] AS [user_writes],
    SUM(SP.rows) AS [total_rows]
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i ON ddius.[object_id] = i.[object_id] AND i.[index_id] = ddius.[index_id]
INNER JOIN sys.partitions SP ON ddius.[object_id] = SP.[object_id] AND SP.[index_id] = ddius.[index_id]
INNER JOIN sys.objects o ON ddius.[object_id] = o.[object_id]
INNER JOIN sys.sysusers su ON o.[schema_id] = su.[UID]
WHERE ddius.[database_id] = DB_ID() -- current database only
    AND OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
    AND ddius.[index_id] > 0
GROUP BY su.[name], o.[name], i.[name],
    ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups],
    ddius.[user_updates]
-- keep only indexes that have never been read
HAVING ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] = 0
ORDER BY ddius.[user_updates] DESC, su.[name], o.[name], i.[name];

Statistic Creation and Updates

We have to take care of creating statistics, and updating them regularly, for the computed columns and multi-column combinations referred to in the query. The query optimizer uses statistics, that is, information about the distribution of values in one or more columns of a table, to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan.
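As a short sketch, multi-column statistics can be created and refreshed as shown below. The table dbo.DemoMembers and the statistics name are hypothetical, and WITH FULLSCAN is only one of the available sampling options.

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.DemoMembers (MemberId INT PRIMARY KEY, State VARCHAR(50), City VARCHAR(50));

-- Multi-column statistics on columns that are filtered together.
CREATE STATISTICS ST_DemoMembers_State_City ON dbo.DemoMembers (State, City);

-- Refresh one statistics object by reading every row...
UPDATE STATISTICS dbo.DemoMembers ST_DemoMembers_State_City WITH FULLSCAN;

-- ...or refresh all statistics on the table with the default sampling.
UPDATE STATISTICS dbo.DemoMembers;
```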

Happy reading! I would like to hear your valuable comments.

It sometimes happens to a developer, or even to a DBA, that a query which has seemed perfect for a long time and has appropriate indexes takes a long time to return results. As per my understanding, there may be various reasons in different scenarios, but one very common and widespread cause is a data type conversion issue. As we know, data types can be converted either implicitly or explicitly. Implicit conversions are not visible to the user: SQL Server automatically converts the data from one data type to another. Explicit conversions use the CAST or CONVERT functions, which convert a value (a local variable, a column, or another expression) from one data type to another.

Here I am pointing at implicit conversions. They are a silent killer: they have been seen to literally bring a system to its knees by causing deadlocks during high load, CPU at maximum utilization and other performance issues. People start blaming the server, but the real reason is something different. Let us see the example below.

Create Procedure usp_VirendraTest @ID VarChar(10)
As
Select ID, Name, Fname, BloodGroup, Dept, Desig From tblVirendraTest Where ID = @ID

The code looks adequate and works well at first, but it starts to show its true colors when the table's row count reaches multiple millions. The reason is that the ID column is of the INT data type while the @ID parameter is declared as VarChar(10). Developers and even DBAs get confused and write a procedure with the wrong data type in this scenario. Now the problem starts: every time SQL Server looks up an ID value, the two sides of the comparison have different types and one of them has to be converted. This is an implicit conversion of data type; internally, SQL Server performs it with a convert function. Which side is converted follows data type precedence. Here INT outranks VarChar, so @ID is converted to Int; in the reverse and more damaging case (for example, a VarChar column compared with an Int or NVarChar value), the conversion is applied to the column. When that happens, SQL Server cannot use an index effectively: it has to convert the value for each and every row, and as a result it scans the entire table for the value. This takes time and, under default locking modes, the scan takes shared locks across the table, which can block other processes from updating, inserting or deleting records while it is running.

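The solution for this type of problem is to make the data types match: declare the parameter or variable with the same data type as the column or, where that is not possible, convert the value explicitly with CAST or CONVERT so that the conversion is applied to the value and not to the column.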
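Applied to the procedure above, a minimal sketch of the fix could look like this. It assumes tblVirendraTest already exists with an INT column named ID, as described in the post; the procedure name usp_VirendraTestTyped and the variables are hypothetical, and GO is the SSMS/sqlcmd batch separator.

```sql
-- Parameter declared with the same type as the ID column (assumed to be INT).
CREATE PROCEDURE usp_VirendraTestTyped @ID INT
AS
    SELECT ID, Name, Fname, BloodGroup, Dept, Desig
    FROM tblVirendraTest
    WHERE ID = @ID;
GO

-- If the incoming value arrives as text, convert the value once, not the column.
DECLARE @RawID VARCHAR(10) = '101';
DECLARE @TypedID INT = CAST(@RawID AS INT);
EXEC usp_VirendraTestTyped @ID = @TypedID;
```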

Here is the Data type conversions chart from Microsoft

Thanks for reading. Please comment and suggest better ways.

Here are some initial tips for writing efficient, cost-effective queries.

  • When using AND, put the condition least likely to be true first. Many database systems evaluate conditions from left to right, subject to operator precedence, although a query optimizer (SQL Server's included) is free to reorder them, so treat this as a tip rather than a guarantee. If you have two or more AND operators in a condition, the one to the left is evaluated first, and if and only if it’s true is the next condition evaluated. Finally, if that condition is true, then the third condition is evaluated. You can save the database system work, and hence increase speed, by putting the least likely condition first. For example, if you were looking for all members living in Delhi and born before January 1, 1960, you could write the following query:
    SELECT FirstName, LastName FROM MemberDetails WHERE State = 'Delhi' AND DateOfBirth < '1960-01-01';
    The query would work fine; however, the number of members born before that date is very small, whereas plenty of people live in Delhi. This means that State = 'Delhi' will be true many times and the database system will go on to check the second condition, DateOfBirth < '1960-01-01'. If you swap the conditions around, the least likely condition (DateOfBirth < '1960-01-01') is evaluated first:
    SELECT FirstName, LastName FROM MemberDetails WHERE
    DateOfBirth < '1960-01-01' AND State = 'Delhi';
    Because the first condition is mostly going to be false, the second condition will rarely be evaluated, which saves time. It’s not a big deal when there are few records, but it is when there are a lot of them.
  • When using OR, put the condition most likely to be true first. Whereas AND needs both sides to be true for the overall condition to be true, OR needs only one side to be true. If the left-hand side is true, there’s no need for OR to check the other condition, so you can save time by putting the most likely condition first. Consider the following statement:
    SELECT FirstName, LastName FROM MemberDetails WHERE State =
    'Delhi' OR DateOfBirth < '1960-01-01';
    If State = 'Delhi' is true, and it is true more often than DateOfBirth < '1960-01-01' is true, then there’s no need for the database system to evaluate the other condition, thus saving time.
  • DISTINCT can be faster than GROUP BY. DISTINCT and GROUP BY often do the same thing: limit results to unique rows. However, DISTINCT is often faster than GROUP BY with some database systems. For example, examine the following GROUP BY:

    SELECT MemberId FROM Orders GROUP BY MemberId;
    This GROUP BY could be rewritten using the DISTINCT keyword:
    SELECT DISTINCT MemberId FROM Orders;

  • Use IN with your subqueries. When you write a query similar to the following, the database system has to get all the results from the subquery to make sure that it returns only one value:
    SELECT FirstName, LastName FROM EMPLOYEE WHERE EMPID = (SELECT EMPID FROM Orders WHERE OrderId = 2);
    If you rewrite the query using the IN operator, the database system only needs to get results until there’s a match with the values returned by the subquery; it doesn’t necessarily have to get all the values:

    SELECT FirstName, LastName FROM EMPLOYEE WHERE EMPID IN (SELECT EMPID FROM Orders WHERE OrderId = 2);

  • Avoid using SELECT * FROM. Specifying which columns you need has a few advantages, not all of them about efficiency. First, it makes clear which columns you’re actually using. If you use SELECT * and actually use only two out of seven of the columns, it’s hard to guess from the SQL alone which ones you’re using. If you say SELECT FirstName, LastName, ... then it’s quite obvious which columns you’re using. From an efficiency standpoint, specifying columns reduces the amount of data that has to pass between the database and the application connecting to the database. This is especially important where the database is connected over a network.
  • Search on integer columns. If you have a choice, and often you don’t, search on integer columns. For example, if you are looking for the member whose name is VIRENDRA YADUVANSHI and whose MemberId is 101, then it makes sense to search via the MemberId, because comparing integers is usually much faster than comparing strings.
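Several of these tips are hedged with "often" or "with some database systems", so it is worth measuring before and after a rewrite. As a rough sketch, and as a suggestion of this example rather than part of the original tips, SQL Server's SET STATISTICS TIME and SET STATISTICS IO options report the elapsed time and page reads of each statement. The #Orders table here is hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, MemberId INT);
INSERT INTO #Orders VALUES (1, 101), (2, 101), (3, 102);

-- Report CPU/elapsed time and logical reads for each statement that follows.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT MemberId FROM #Orders GROUP BY MemberId;
SELECT DISTINCT MemberId FROM #Orders;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DROP TABLE #Orders;
```

The figures appear in the Messages output of the client; on a table this small they will be nearly identical, so run the comparison against realistic data volumes.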