Most database professionals, whether DBAs, developers or database architects, would agree that designing an appropriate set of indexes is one of the more troubling aspects of building efficient relational database applications. Often the most important thing we can do to ensure optimal application performance when accessing data in a relational/SQL database is to create the correct indexes for our tables, based on the queries the applications actually run. But we can start with some basics. For example, consider this SQL statement:


SELECT LASTNAME, SALARY FROM EMPLOYEE WHERE EMPID = '00110' AND DEPTID = 'D001';

What index or indexes would make sense for this simple query? First, think about all the possible indexes we could create. The list might look something like this:

  • Index1 on EMPID
  • Index2 on DEPTID
  • Index3 on EMPID and DEPTID

Index3 is probably the best of these: it enables the DBMS to use a single index to immediately locate the row or rows that satisfy both criteria in the WHERE clause. A composite index like the one sketched below would do the job. As a practice, though, we should first analyze the impact of these indexes, or of creating yet another index on a table in a production system. Appropriate index creation is always complicated, so keep the following guidelines in mind at index-creation time.
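For reference, a composite index like Index3 could be created as follows (a minimal sketch using the table and columns from the example above):

-- Composite index supporting both predicates in the WHERE clause
CREATE INDEX IX_EMPLOYEE_EMPID_DEPTID
ON EMPLOYEE (EMPID, DEPTID);
GO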

* INDEX BY WORKLOAD, NOT BY TABLE

Many people make the mistake of just guessing at some indexes to create when they are creating database tables. Without an idea of how the tables are going to be accessed, though, these guesses are often wrong – at least some of them.

Indexes should be built to optimize the access of SQL queries. To properly create an optimal set of indexes requires a list of the SQL to be used, an estimate of how frequently each SQL statement will be executed, and the importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries most of the time be performed. In SQL Server, the query below, which reads the missing-index DMVs, is occasionally used as a starting point:

SELECT mid.statement AS [database.schema.table],
       mic.column_id, mic.column_name, mic.column_usage,
       migs.user_seeks, migs.user_scans,
       migs.last_user_seek, migs.avg_total_user_cost,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle) AS mic
INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
        ON mig.index_group_handle = migs.group_handle
ORDER BY mid.statement, mig.index_group_handle, mig.index_handle, mic.column_id
GO

 

* BUILD INDEXES BASED ON PREDICATES

We can create an expression-based index to improve the performance of queries that use column-expression predicates. For example, consider a query containing multiple outer joins, where a predicate that could be satisfied by an index on an expression sits in a different query block than the outer joins.
The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless possible scenarios when trying to determine the most efficient execution plan. It is also a vitally important piece of code: not only must its decisions be reasonably accurate so that it does not generate inefficient execution plans, it must also make those decisions in a reasonably efficient manner, otherwise it wastes resources and, most importantly, time while performing its calculations.
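SQL Server, for instance, does not index arbitrary expressions directly; the usual pattern is to add a computed column for the expression and index that. A minimal sketch, assuming a hypothetical dbo.Orders table with an OrderDate column:

-- Computed column for the expression, plus an index on it
ALTER TABLE dbo.Orders
ADD OrderYear AS YEAR(OrderDate);
GO
CREATE INDEX IX_Orders_OrderYear
ON dbo.Orders (OrderYear);
GO
-- Predicates written against the computed column can now use the index
SELECT OrderID FROM dbo.Orders WHERE OrderYear = 2024;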

 

* INDEX MOST-HEAVILY USED QUERIES

This guideline and the next can be thought of as consequences of the first: they are the aspects of the application workload that need to be examined to produce appropriate and effective indexes.

* INDEX IMPORTANT QUERIES

The more important the query, the more we might want to tune it by creating indexes. If a CIO or CTO runs a query or report every day, make sure it performs optimally; building indexes for that particular query is important. On the other hand, a query run by a clerk might not be weighted as highly, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application's importance to the business, not just on the user's importance. In any case, index the data properly and keep statistics up to date and fragmentation under control.

* INDEX TO AVOID SORTING (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses the relational optimizer can use an index to avoid a sort, and thereby potentially improve performance.
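As a quick sketch (reusing the EMPLOYEE example from above), an index whose key matches the ORDER BY columns lets the optimizer return rows in index order rather than sorting them:

-- Key order matches the WHERE and ORDER BY columns
CREATE INDEX IX_EMPLOYEE_DEPTID_LASTNAME
ON EMPLOYEE (DEPTID, LASTNAME);
GO
SELECT LASTNAME, SALARY
FROM EMPLOYEE
WHERE DEPTID = 'D001'
ORDER BY LASTNAME;   -- can be satisfied without a separate sort step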

* CREATE INDEXES FOR UNIQUENESS – Primary Key/unique index

Some indexes are required in order to make the database schema valid. Most database systems require that unique indexes be created when unique constraints and primary key constraints exist.
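In other words, declare the constraint and let the DBMS build the supporting unique index itself; for example (the EMAIL column here is hypothetical):

ALTER TABLE EMPLOYEE
ADD CONSTRAINT PK_EMPLOYEE PRIMARY KEY (EMPID);
GO
ALTER TABLE EMPLOYEE
ADD CONSTRAINT UQ_EMPLOYEE_EMAIL UNIQUE (EMAIL);
GO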

* CREATE INDEXES FOR FOREIGN KEYS

Creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints (RI – referential integrity). Most database systems do not require such indexes, but they can improve performance.
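A short sketch, assuming EMPLOYEE.DEPTID references DEPT.DEPTID; in SQL Server the foreign key itself does not create an index, so one is added explicitly:

ALTER TABLE EMPLOYEE
ADD CONSTRAINT FK_EMPLOYEE_DEPT FOREIGN KEY (DEPTID) REFERENCES DEPT (DEPTID);
GO
-- Index on the foreign key column to speed joins and RI checks
CREATE INDEX IX_EMPLOYEE_FK_DEPTID
ON EMPLOYEE (DEPTID);
GO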

* CONSIDER ADDING COLUMNS FOR INDEX ONLY ACCESS

 Sometimes it can be advantageous to include additional columns in an index to increase the chances of index-only access. With index-only access all of the data needed to satisfy the query can be found in the index alone — without having to read data from the table space.

For example, suppose that there is an index on the DEPTID column of the DEPT table. The following query may use this index:


SELECT DEPTNAME FROM DEPT WHERE DEPTID = 'D001';

The index could be used to locate the rows with the matching DEPTID, but the DBMS would then need to access the data in the table space to return the DEPTNAME. If you add DEPTNAME to the index, that is, create the index on (DEPTID, DEPTNAME), then all of the data needed for this query exists in the index and no additional I/O to the table space is needed. This technique is sometimes referred to as index overloading.
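In SQL Server the same effect is usually achieved with the INCLUDE clause, which stores the extra column only at the leaf level of the index; a minimal sketch:

-- DEPTID remains the search key; DEPTNAME rides along so the query is index-only
CREATE INDEX IX_DEPT_DEPTID
ON DEPT (DEPTID)
INCLUDE (DEPTNAME);
GO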

 

* DON’T ARBITRARILY LIMIT NUMBER OF INDEXES

There should be no arbitrary limit on the number of indexes that you can create for any database table. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data.

Sometimes organizations develop database standards with rules that inhibit the number of indexes that can be created. When a standard such as this exists, it usually is stated as something like “Each table can have at most five indexes created for it” — or — “Do not create more than three indexes for any single table in the database.” These are bad standards. If you already have three indexes, or five indexes, or even 32 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.”

* BE AWARE OF DATA MODIFICATION IMPLICATIONS

The DBMS must automatically maintain every index we create. This means every INSERT and every DELETE to an indexed table will insert and delete not just from the table, but also from its indexes.

Additionally, when we UPDATE the value of a column that has been defined in an index, the DBMS must also update the index. So, indexes speed the process of retrieval but slow down modification.

Source: Internet, Database information today, DBTA

Lock contention issues can be very frustrating to investigate and debug; they arise from concurrency problems. Before blaming the database system, we should first ask ourselves a few questions:

  • Has the application run in the past without locking problems?
  • Have the lock timeouts or deadlocks started recently?
  • What version and level of the DBMS are running?
  • Does the problem only occur at certain times?
  • What has changed on the system (e.g., number of users, number of applications, amount of data in the tables, database maintenance or fix packs, changes to any other relevant software)?
  • What, if anything, has changed in the application (e.g., isolation level, concurrent executions, volume of data, etc.)?

A developer who has written applications to access database data probably has had to deal with concurrency problems at some point in their career. When one application program tries to read data that’s in the process of being changed by another, the DBMS must control access until the modification is complete to ensure data integrity. Typically, DBMS products use a locking mechanism to control access and modifications while ensuring data integrity.

When one task is updating data on a page (or block), another task can’t access data (read or update) on that same page (or block) until the data modification is complete and committed. When multiple users can access and update the same data at the same time, a locking mechanism is required. This mechanism must be capable of differentiating between stable data and uncertain data. Stable data has been successfully committed and isn’t involved in an update in a current unit of work. Uncertain data is currently involved in an operation that could modify its contents.

Most modern DBMS products allow us to control the level of locking (table, page/block, or row), as well as to adjust other locking criteria (for example, locks per user and time to wait for locks). Lock timeouts are one of the most perplexing issues encountered by database professionals. The longer a lock is held, the greater the potential impact to other applications. When an application requests a lock that is already held by another process, and the lock cannot be shared, that application is suspended. A suspended process temporarily stops running until the lock can be acquired. When an application has been suspended for a predetermined period of time, it is terminated. A process terminated because it exceeds this period of time is said to time out. In other words, a timeout is caused by the unavailability of a given resource.
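In SQL Server, for example, a session can cap how long it is willing to wait for a lock; the sketch below makes the session fail with error 1222 instead of waiting indefinitely (the default LOCK_TIMEOUT is -1, wait forever):

-- Wait at most 5 seconds (5000 ms) for any lock requested by this session
SET LOCK_TIMEOUT 5000;
GO
SELECT LASTNAME FROM EMPLOYEE WHERE EMPID = '00110';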

To minimize lock timeouts, be sure to design application programs with locking in mind from the start. Limit the number of rows accessed by coding predicates to filter unwanted rows. Doing so reduces the number of locks on pages containing rows that are accessed but not required, thereby reducing timeouts and deadlocks. Also, we should design update programs so the update is issued as close to the COMMIT point as possible. Doing so reduces the time that locks are held during a unit of work, which also reduces timeouts (and deadlocks).

Deadlocks also cause concurrency problems. A deadlock occurs when two separate processes compete for resources held by one another. For example, a deadlock transpires when one process holds a lock on PAGE1 and wants to lock PAGE2 while, at the same time, another process holds a lock on PAGE2 and wants a lock on PAGE1. One of the programs must be terminated to allow processing to continue. One technique to minimize deadlocks is to code your programs so that tables are accessed in the same order. By designing all application programs to access tables in the same order, you reduce the likelihood of deadlocks.

It is important to design all programs with a COMMIT strategy. A COMMIT externalizes the modifications that occurred in the program since the beginning of the program or the last COMMIT. A COMMIT ensures that all modifications have been physically applied to the database, thereby ensuring data integrity and recoverability. Failing to code COMMITs in a data modification program can cause lock timeouts for other concurrent tasks.

You can also control the isolation level, or serialization, of the data requests in your programs. Programs using the repeatable read locking strategy hold their locks until a COMMIT is issued. If no COMMITs are issued during the program, locks are not released until the program completes, thereby negatively affecting concurrency. This can cause lock timeouts and lock escalation.
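A small sketch of the idea, reusing the EMPLOYEE example from earlier: under repeatable read, the shared locks taken by the SELECT are held until the COMMIT, so committing promptly matters.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
GO
BEGIN TRANSACTION;
    SELECT SALARY FROM EMPLOYEE WHERE EMPID = '00110';
    -- ... work with the value ...
COMMIT TRANSACTION;   -- shared locks taken under repeatable read are released here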

DBAs also have physical techniques to minimize lock timeouts. When an object is accessed concurrently by multiple programs or users, consider increasing free space so that fewer rows are stored on a single page, at least until data is added. The fewer rows per page, the less intrusive page locking will be, because fewer rows are impacted by each page lock. Locking is a complex issue and can be at the root of many performance problems.

Happy Reading …. Please suggest your views and experience on this topic.

It can be very difficult to be confident about the security of a database environment. Because databases may contain sensitive or regulated information, critical applications or stored functions, ensuring database security is undoubtedly a top priority. And with many users viewing and accessing the data, how about all those "who-what-when-where" details that might be hidden from your radar?

The increasing pressure of compliance regulations and security policies makes deploying high-level database protection a must-have for any organization. However, it is generally observed that in almost 90% of cases, unnoticed changes to database configurations result in outages and security breaches.

For those looking for ways to advance database security, here are 5 SQL Server best practices to maintain database security and streamline compliance.

Tip 1: Minimize SQL server exposure and do not leave any “open doors”

You can take the first step to minimize security risks for SQL Server even before the installation is complete and fully configured. Install only the required components. When configuring the installation, remember the principle of least privilege. Running SQL Server services under an account with local Windows administrative privileges is not a good idea: if a violator gains possession of such an account with extended privileges, the probability of unwanted outcomes increases. The risk of overall exposure can be minimized by using a domain account with the minimum required privileges instead.

It stands to reason to avoid using the default settings. Rename or disable the default system account for server administration after installation. The same is applicable to naming SQL Server instances instead of using the default instances. Changing the SQL Server port number, which is 1433 by default, will also help you minimize service and data exposure, and so will hiding SQL Server instances and/or disabling the SQL Server Browser service.
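For instance, the built-in sa login can be renamed and disabled with plain T-SQL (the new name below is just illustrative):

ALTER LOGIN sa WITH NAME = [srv_admin_disabled];
GO
ALTER LOGIN [srv_admin_disabled] DISABLE;
GO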

Also, do not leave anything unattended. Disable and remove everything you do not use: any unnecessary services or databases on production servers, for example, and any sample or test data used to verify a successful installation.

Tip 2: Control who can access SQL server and how

When thinking about user and service account authentication, be mindful of establishing user accountability and avoiding misuse of privileged accounts. When you can choose between integrated (Windows) authentication and built-in SQL Server authentication, choose the first option whenever possible. Integrated authentication encrypts messages to validate users, while built-in authentication passes SQL Server logins and passwords across the network and leaves them unprotected. If you have to use built-in SQL Server authentication for application compatibility, make sure you enforce a strong password policy, as in the sketch below.
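When a SQL login cannot be avoided, the Windows password policy can at least be enforced at creation time; a sketch with a hypothetical login name and password:

CREATE LOGIN AppReportingLogin
WITH PASSWORD = 'Use_A_Strong_Passphrase_Here!',
     CHECK_POLICY = ON,        -- enforce the Windows password complexity/lockout policy
     CHECK_EXPIRATION = ON;    -- enforce password expiration
GO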

Again, never use shared user accounts for administrators. A SQL Server administrator should have a dedicated account with no administrative privileges in other systems. Also, make sure that each admin uses a personal user account. The same recommendation applies to applications: creating separate service accounts with descriptive names for each application that works with SQL Server is among the security best practices.

Tip 3: Plan database ownership and data security in advance

Start by identifying the needed level of protection and encryption for each database. This is an important issue when you have to deal with securing sensitive data, such as credit card numbers or patient health information, which is also a staple requirement to meet PCI or HIPAA compliance regulations. Having ensured complete visibility into what is happening across your databases, you strengthen security and streamline compliance by reducing the risk of missing suspicious activities.

When creating a database, make sure that you get all the necessary information about data confidentiality. Do not forget to assign distinct database owners, meaning that the same login should not be applied across all databases. In order to mitigate future risks, establish the same process for new database requests and approvals as well as for database retention.

Protecting database files on disk from unauthorized access and copying in real time is highly recommended and can be done by leveraging database-level encryption with the Transparent Data Encryption (TDE) feature. If you need to keep data encrypted in memory (until it is actively decrypted), or if you need to give specific users granular access to certain column or cell values, cell-level encryption is recommended.
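A minimal TDE sketch (the database, certificate and password below are illustrative; remember to back up the certificate and its private key):

USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Use_A_Strong_Passphrase_Here!';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate for SalesDB';
GO
USE SalesDB;
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE SalesDB SET ENCRYPTION ON;
GO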

Tip 4: Regularly patch your SQL servers

The list of security best practices would not be complete without mentioning the need for proper patch management. Because attackers are actively looking for new security flaws in IT systems, and new malware and viruses appear every day, establishing proper patch management of your SQL servers should be among mandatory security practices.

Timely deployment of current SQL Server service packs, cumulative updates and critical security hotfixes will improve the stability of database performance. It is also necessary to regularly update the underlying Windows Server operating system and any supporting applications, such as antivirus software.

Tip 5: Keep track of what’s going on

Finally, establishing accountability in many respects means staying up-to-date with configuration changes and user activity. This is an ongoing process of maintaining the actual state of security policies to make sure that all changes are authorized and documented.

Note: Always keep in mind that security is not a state – it is a process. Monitoring, alerting and reporting on changes must become a part of the entire data lifecycle.

Native audit logs allow us, to some extent, to check recent activities and changes affecting security, but obtaining a view of changes made long ago can be a challenge. A great deal of excess information is saved, and as a result the logs very often do not contain the required data. Change auditing, on the contrary, can help detect unauthorized and malicious changes at an early stage or show you the historical data, all of which helps prevent data breaches and system downtime.
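For reference, SQL Server's native server audit can be switched on with a few statements; whether it is enough depends on the retention and reporting you need (the names and file path below are illustrative):

USE master;
GO
CREATE SERVER AUDIT Audit_Security
TO FILE (FILEPATH = 'E:\SQLAudit\');
GO
CREATE SERVER AUDIT SPECIFICATION AuditSpec_Security
FOR SERVER AUDIT Audit_Security
ADD (FAILED_LOGIN_GROUP);
GO
ALTER SERVER AUDIT Audit_Security WITH (STATE = ON);
ALTER SERVER AUDIT SPECIFICATION AuditSpec_Security WITH (STATE = ON);
GO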

Security Requires a Thoughtful Policy: Try implementing continuous auditing to protect the database environment against internal and external threats by ensuring complete visibility across databases.

Happy Reading !

Source: dbta.com

The .WRITE clause is an integral part of the UPDATE statement. It is commonly used to perform a partial update on large VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) values. Its functionality is very similar to the standard STUFF function. The UPDATE statement is fully logged; however, partial updates to large value data types using .WRITE are minimally logged.
In general practice we use the REPLACE or STUFF function to update part of a large data value.

To demonstrate this, here I am creating a test table:

IF OBJECT_ID('dbo.VirendraTest') IS NOT NULL
DROP TABLE dbo.VirendraTest
GO

–Create a table as ‘VirendraTest’

CREATE TABLE dbo.VirendraTest (Details VARCHAR(MAX))
GO

–Insert test data
INSERT INTO dbo.VirendraTest (Details)
VALUES ('VIRENDRA YADUVANSHI - Microsoft SQL Server Database Architect | Consultant | Blogger | Specialist | DBA | Speaker');
GO

— Check test data

SELECT * FROM dbo.VirendraTest

Now, let us see the syntax of .WRITE:

.WRITE ( expression, @Offset , @Length )

As per BOL, the .WRITE (expression, @Offset, @Length) clause performs a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types. For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column. .WRITE updates that insert or append new data are minimally logged if the database recovery model is set to bulk-logged or simple.

Suppose I want to change the word 'Microsoft' to 'MS'. There are two options, using either REPLACE or STUFF:

--Option 1
UPDATE VT
SET VT.Details = REPLACE(Details, 'Microsoft', 'MS')
FROM dbo.VirendraTest AS VT
GO
--Option 2
UPDATE VT
SET VT.Details = STUFF(Details, CHARINDEX('Microsoft', Details, 1), LEN('Microsoft'), 'MS')
FROM dbo.VirendraTest AS VT
GO

 

Now the same thing with .WRITE:

--UPDATE with .WRITE option
UPDATE VT
SET Details.WRITE('MS', CHARINDEX('Microsoft', Details, 1) - 1, LEN('Microsoft'))
FROM dbo.VirendraTest AS VT
GO
Please do comment on these performance tips.

Happy Reading!

 


As we know, collations are used by SQL Server to compare and order strings. When working with remote SQL Server instances, the engine will correctly compare and order strings based on the remote column collation. Therefore, if remote and local columns have different collations, collation conflicts can result. When defining a linked server, we have the option of using remote or local collation ("Use Remote Collation" in Server Options). If that option is set to true, SQL Server will try to push the ORDER BY and the WHERE clauses to the remote server. If Use Remote Collation is set to false, SQL Server will use the default collation of the local server instance. If the default collation of the local server instance does not match the remote server column collation, this will result in poor performance: the local server will have to filter and order the data, thus having to transfer each row beforehand. It is obviously much faster to filter and order the data on the remote server. Then again, deciding to use the remote collation could lead to incorrect results.

Moreover, it is not possible to join on columns that have a different collation. The workaround is to explicitly cast the collation when querying the remote server with the COLLATE clause. But this is an expensive operation if you must scan millions of rows, especially if you need to access the column frequently. In that case, you should manually transfer the data to a local table with the proper collation. This problem can also arise on the same local database since collations are defined at the column level.
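A minimal sketch of the COLLATE workaround; the linked server, database, tables and collation name below are all hypothetical:

SELECT l.CustomerID, r.Region
FROM dbo.LocalCustomers AS l
INNER JOIN LinkedSrv.RemoteDB.dbo.Customers AS r
    ON l.CustomerName = r.CustomerName COLLATE SQL_Latin1_General_CP1_CI_AS;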

Please comment on this. Happy Reading!

As we know, there are many options to import text-file data into a SQL Server table, such as the Import/Export Wizard, SSIS, the BULK INSERT command or the OPENROWSET method. Apart from these, we can also use xp_cmdshell to import a text file into a table, as shown below:

— Create a TEMP Table

CREATE TABLE #TextData
(
Text    VARCHAR(MAX)
)

DECLARE    @sqlcmd VARCHAR(1000)

 — Reading Data

SET @sqlcmd = 'TYPE E:\Letter.txt'

INSERT INTO #TextData

EXEC master.dbo.xp_cmdshell @sqlcmd

— Displaying Result

SELECT * FROM #TextData
GO

— Drop TEMP Table

DROP TABLE #TextData
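Note that xp_cmdshell is disabled by default; enabling it requires sysadmin rights and should be weighed against the security implications. A sketch:

-- Enable xp_cmdshell (advanced option)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
GO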

As we know, the ORDER BY clause is used to sort a result set in a specified order, either ASC or DESC. It sorts the result set by the specified columns, and the behaviour depends on the columns' data types.

But in a practical environment, we sometimes need the result set in a specific order; for example, certain values should always be on top of the result set regardless of how they would normally sort. Here are some Indian cities listed in ASC order:

City Name
Ahmadabad
Banglore
Bhopal
Chennai
Gorakhpur
Jaipur
Kolkatta
Lucknow
Mumbai
Nainital
New Delhi
Pune

Now we want New Delhi and Mumbai always on top of the list. The syntax for this is:

SELECT CityName FROM Table1
ORDER BY CASE WHEN CityName = 'New Delhi' THEN '1'
              WHEN CityName = 'Mumbai'    THEN '2'
              ELSE CityName END ASC

CityName
New Delhi
Mumbai
Ahmadabad
Banglore
Bhopal
Chennai
Gorakhpur
Jaipur
Kolkatta
Lucknow
Nainital
Pune

Happy reading!!!

 

Today I faced an issue where one of the secondary server boxes was no longer available due to some circumstances, and I had to delete that secondary server name and database entry from the primary server's database. If we go through the log shipping wizard from the database property page and try to remove the secondary server, it asks to connect to the secondary server, but in my case the secondary server was no longer available. To resolve this, here is a script to delete the secondary server entry from the primary server's database (there is no need to connect to the secondary server):

EXEC master.dbo.sp_delete_log_shipping_primary_secondary
    @primary_database   = N'VirendraTest',
    @secondary_server   = N'VIRENDRA_PC',
    @secondary_database = N'LSVirendraTest';
GO

Please don't forget to comment on this and share your experiences with it.


SQL Server error 8101 sometimes occurs when someone tries to insert a new record into a table that contains an identity column without specifying a column list in the INSERT statement, while assigning a value to the identity column instead of letting SQL Server assign it. The error displays as:

Server: Msg 8101, Level 16, State 1, Line 2
An explicit value for the identity column in table “Table_Name” can only be specified when a column list is used and IDENTITY_INSERT is ON.

The solution for the above error is to enable SET IDENTITY_INSERT and use an explicit column list.

Example:

SET IDENTITY_INSERT Table_Name ON
GO
INSERT INTO Table_Name (Col1, Col2, Col3, Col4)
SELECT Col1, Col2, Col3, Col4 FROM Any_Table_Name
GO

SET IDENTITY_INSERT Table_Name OFF
GO


SQL Server replication requires many components to replicate data from one location to another. Below is a high-level overview of the pieces involved in a replication setup.

The components used in a replication setup include a Publisher and its publication database. The publication database contains a publication that may include a number of articles. The setup also includes a Distributor and its distribution database, as well as a Subscriber and its subscription database, which contains the subscription. Replication agents then move the data according to the defined architecture.

The replication components details are as below.

Articles
For each SQL Server object that should be replicated, an article needs to be defined. Each article corresponds to a single SQL Server object, such as a table, view, stored procedure or function (for a complete list of objects that can be replicated, check out the topic Publishing Data and Database Objects in SQL Server Books Online). An article's properties determine whether that article contains the entire object or a filtered subset of its parts. For example, an article can be configured to contain only some of the columns of a table. With some restrictions, multiple articles can be created on a single object.

Publications

A publication is a collection of articles grouped together as one unit. Every article is defined to be part of exactly one publication, but in a few cases we can also define different articles on the same object in separate publications. A publication supports several configurable options that apply to all of its articles; perhaps the most important option is the one that lets you define which type of replication to use. The sketch below shows how a publication and an article might be set up.
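A sketch using the replication stored procedures (the database, publication and table names below are hypothetical):

-- Enable the database for transactional publishing, then add a publication and an article
EXEC sp_replicationdboption @dbname = N'SalesDB', @optname = N'publish', @value = N'true';
GO
USE SalesDB;
EXEC sp_addpublication @publication = N'SalesPub', @status = N'active';
EXEC sp_addarticle @publication = N'SalesPub', @article = N'Customers',
     @source_owner = N'dbo', @source_object = N'Customers';
GO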

Publication Database

A database that contains objects designated as articles is called a publication database. When we set up a publication on a database, SQL Server modifies the inner workings of that database and creates several replication-related objects. A publication database is also protected against being dropped. A publication can contain articles from a single publication database only.

Publisher

The Publisher is a database instance that makes data available to other locations through replication. The Publisher can have one or more publications, each defining a logically related set of objects and data to replicate.

Distributor

Each Publisher is linked to a single Distributor. The Distributor is a SQL Server instance that identifies changes to the articles on each of its Publishers. Depending on the replication setup, the Distributor might also be responsible for notifying the Subscribers that have subscribed to a publication that an article has changed. The information about these changes is stored in the distribution database until all Subscribers have been notified or the retention period has expired. The Distributor can be configured on a SQL Server instance separate from the Publisher, but often the same instance takes the role of the Publisher and the Distributor.

Distribution Databases

Each Distributor has at least one distribution database. The distribution database contains a number of objects that store replication metadata as well as replicated data. A Distributor can hold more than one distribution database; however, all publications defined on a single Publisher must use the same distribution database.

Subscriber

Each SQL Server instance that subscribes to a publication is called a Subscriber. The Subscriber receives changes to a published article through that publication. A Subscriber does not necessarily play an active role in the replication process. Depending on the settings selected during replication setup, it might receive the data passively.

Subscriptions

A subscription is the counterpart of the publication. Each subscription creates a link, or contract, between one publication and one Subscriber. There are two types of subscriptions: push subscriptions and pull subscriptions. In a push subscription, the Distributor directly updates the data in the Subscriber database. In a pull subscription, the Subscriber asks the Distributor regularly if any new changes are available, and then updates the data in the subscription database itself.
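A push subscription, for example, is registered at the Publisher roughly like this (server and database names are hypothetical):

-- Run at the Publisher, in the publication database
EXEC sp_addsubscription
     @publication = N'SalesPub',
     @subscriber = N'SUBSCRIBER01',
     @destination_db = N'SalesDB_Replica',
     @subscription_type = N'Push';
GO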

Subscription databases

A database that is the target of a replication subscription is called a subscription database. As in the case of the publication database, SQL Server modifies the subscription database during the first initialization. The most obvious change is the addition of a few replication-related objects. However, unlike publication databases, SQL Server doesn’t prevent a subscription database from being dropped.

Replication agents

The replication processes are executed by a set of replication agents. Each agent is an independent Windows executable responsible for one piece of the process of moving the data. In a default installation of replication, each agent is executed by its own SQL Server Agent job. Most of those agents usually run on the Distributor, although some can run on the Subscriber. The Publisher houses replication agents only when the Publisher and Distributor are the same instance. Instead of relying on the SQL Server Agent, you can execute any replication agent manually or by some other scheduling means. However, in most cases, these approaches provide little advantage and often make troubleshooting more complex.

The details of each replication agent type are as follows.

Snapshot Agent

In all replication topologies, the Snapshot Agent provides the data required to perform the initial synchronization of the publication database with the subscription database. Transactional replication and merge replication use other agents to keep the data in sync afterwards. For both topologies, replication will use the Snapshot Agent again (after the initial synchronization) only when you request a fresh resynchronization. Snapshot replication, on the other hand, uses the Snapshot Agent exclusively to replicate data. It works by copying all the data every time from the publication database to the subscription database.

Log Reader Agent

The Log Reader Agent reads the transaction log of the publication database. If it finds changes to the published objects, it records those changes to the distribution database. Only transactional replication uses the Log Reader Agent.

Distribution Agent

The Distribution Agent applies the changes recorded in the distribution database to the subscription database. As with the Log Reader Agent, only transactional replication uses the Distribution Agent.

Merge Agent

The Merge Agent synchronizes changes between the publication database and the subscription database. It is able to handle changes in both the publication database and the subscription database and can sync those changes bi-directionally. A set of triggers in both databases support this process. Only merge replication uses the Merge Agent.

Queue Reader Agent

The Queue Reader Agent is used with transactional replication that allows queued updating subscriptions: it reads the changes queued at Subscribers and applies them back at the Publisher, enabling a bi-directional flow of data.

 

Happy reading …

Sources: Fundamentals of SQL Server 2012 Replication and SQL Server Books Online