Best Practices to Building Useful Database Indexes

I hope, all professional like DBA/Dev/DB Architect feels, designing an appropriate set of indexes can be one of the more troubling aspects of developing efficient relational database applications. Sometime, it may be, the most important thing we can do to assure optimal application performance when accessing data in a relational/SQL database is to create the correct indexes for tables based on the queries which applications use. But we can start with some basics. For example, consider this SQL statement:

SELECT LASTNAME, SALARY FROM EMPLOYEE WHERE EMPID =‘00110’AND DEPTID =‘D001’;

What index or indexes would make sense for this simple query? First, think about all the possible indexes that may probably create. The list may looks something like this:

Index1 on EMPID
Index2 on DEPTID
Index3 on EMPID and DEPTID

Initially, it is a better way, and Index3 is probably the best of among these. It enables the DBMS/RDBMS or Any DB system to use the index to immediately search the row or rows that satisfy the two simple criteria in the WHERE clause, but as a practice we have to first analyze the impact of these indexes or creating yet another indexes on table of production system. We should be very cautious as appropriate index creation always very complicated, below are the few steps to be keep in mind at the time of index creation..

* INDEX BY WORKLOAD, NOT BY TABLE

Many people make the mistake of just guessing at some indexes to create when they are creating database tables. Without an idea of how the tables are going to be accessed, though, these guesses are often wrong – at least some of them.

Indexes should be built to optimize the access of SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of the frequency that each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be made. Below is the query occasionally is used to check missing index as,

SELECT statement AS [database.scheme.table],
column_id , column_name, column_usage,
migs.user_seeks, migs.user_scans,
migs.last_user_seek, migs.avg_total_user_cost,
migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle=migs.group_handle
ORDER BY statement, mig.index_group_handle, mig.index_handle, column_id
GO

* BUILD INDEXES BASED ON PREDICATES

We can create an expression-based index to improve the performance of queries that use column-expression predicates. For example a query that contain multiple outer joins so that a predicate that can be satisfied by an index on expression is in a different query block than the outer joins.
The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

* INDEX MOST-HEAVILY USED QUERIES

Numbers 2 and 3 can be thought of as consequences to Number 1… that is, these are the aspects of application workload that need to be examined to produce appropriate and effective indexes.

* INDEX IMPORTANT QUERIES

The more important the query, the more we might want to tune by index creation. If any management guy like CIO/CTO run a query/report every day, First make sure it should be up to optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application’s importance to the business-not just on the user’s importance. By the way, we have to indexed data properly and always maintain updated statistics and fragmentation.

* INDEX TO AVOID SORTING (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses the relational optimizer can use an index to avoid a sort, and thereby potentially improve performance.

* CREATE INDEXES FOR UNIQUENESS – Primary Key/unique index

Some indexes are required in order to make the database schema valid. Most database systems require that unique indexes be created when unique constraints and primary key constraints exist.

* CREATE INDEXES FOR FOREIGN KEYS

Creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity). Most database systems do not require such indexes, but they can improve performance.

* CONSIDER ADDING COLUMNS FOR INDEX ONLY ACCESS

Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access all of the data needed to satisfy the query can be found in the index alone — without having to read data from the table space.

For example, suppose that there is an index on the DEPTID column of the DEPT table. The following query may use this index:

SELECT DEPTNAME FROM DEPT WHERE DEPID =‘D001’;

The index could be used to access only those columns with a DEPTID greater than D000, but then the DBMS would need to access the data in the table space to return the DEPTNAME. If you added DEPTNAME to the index, that is, create the index on (DEPTID, DEPTNAME) then all of the data needed for this query exists in the index and additional I/O to the table space would not need be needed. This technique is sometimes referred to as index overloading.

* DON’T ARBITRARILY LIMIT NUMBER OF INDEXES

There should be no arbitrary limit on the number of indexes that you can create for any database table. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data.

Sometimes organizations develop database standards with rules that inhibit the number of indexes that can be created. When a standard such as this exists, it usually is stated as something like “Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards. If you already have three indexes, or five indexes, or even 32 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.”

* BE AWARE OF DATA MODIFICATION IMPLICATIONS

The DBMS must automatically maintain every index we create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when we UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

Source : Internet,Database information today,DBTA

WikiDBA

SQL Authors

Articles

Blog Stats

Search

Social

wikidba