How to configure Read-Only routing on SQL Server 2016 Always On Availability Group (AG)?

The great advantage of Always On Availability Group over other availability solutions is their ability to scale-out read operations (SELECT queries). Read-only routing is a feature of Always On Availability Group, which redirects connection requests from applications into readable secondary. In SQL Server 2016 Always On Availability Group, you can configure upto 8 readable secondary replicas.

The connections are redirected based on the routing rules. Always On Availability Group provides the following options to define the rules:
  • Read-Only Routing URL
  • Read-Only Routing List
Before defining the routing rules, we must understand the following conditions:

The application must connect to the Virtual Network Name (VNN) and not to the secondary replica directly. VNN is defined at the time of configuring listener.

The application connection string must contain the read-only connection parameter, ApplicationIntent=ReadOnly;

There must be at least on readable secondary exist on the AG. Let’s configure the routing rules:

Read-only Routing URL:

The URL is used when an application explicitly trying to connect to readable secondary with read-only intent. This URL contains the hostname and port number.

Format : TCP://server.domain.com:1433
Example : TCP://node1.ms.com:1433 (note: this on node1.ms.com)
This rule is applicable only when the node is acting as a secondary replica (if it is primary, obviously it will accept, read and write connections).

Configure read-only routing URL using T-SQL:


ALTER AVAILABILITY GROUP [TestAG]
MODIFY REPLICA ON
N’node1’
WITH
(SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N’TCP://node1.ms.com:1433’));
GO


Read-Only Routing List:

The read-only routing list contains the list of readable secondary with their priority. For example, node1.ms.com is the primary replica, and when an application is trying to connect to AG with explicit read-only intent, the primary replica will redirect the read-only connection to available secondary replicas as defined on the routing-list.

Format : ‘replica1’,’replica2’
Example  : ‘node2’,’node3’ (note; on node1.ms.com)

Configure read-only routing list using T-SQL:

ALTER AVAILABILITY GROUP [TestAG] 
MODIFY REPLICA ON 
N’node1’ 
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST= (N’node2’, N’node3’))); 
GO

Note the role, it is when node1.ms.com server is in primary role, it will take effect. Now you have configured the routing URL and routing list, follow the below steps to verify it is working as expected.

Steps to verify Read-Only routing:

In SSMS login, enter server name as Always On AG listener and go to options and find the additional connector parameters.

Enter the below connection parameters and click on the connect button.

ApplicationIntent=ReadOnly; InitialCatalog=databasename;

Open new query window and see the server name to identify, currently which server it is connecting.

SELECT @@SERVERNAME;

The output should be name of Read-only Secondary Replica. 

At the Read-Only Secondary Replica, if you try to update tables, you will get the error message stating, you cannot perform DML operation on secondary replica.

If you are not getting the Read-Only Secondary replica, mention it on comment section, I will be glad to help you and get back to as soon as possible.

Load-balancing across Read-Only Secondary Replicas:

Starting from SQL Server 2016, you can configure the load balancing across the read-only replicas.  Load balancing can be configured as below:

ALTER AVAILABILITY GROUP [TestAG] 
MODIFY REPLICA ON 
N’node1’ 
WITH 
(
PRIMARY_ROLE 
(
READ_ONLY_ROUTING_LIST= ((’node1’, ’node2’,’node3’), ‘node4’, ‘node5’)); 
));
GO

Load-balancing performed using Round-robin algorithm, and connection requests are load-balanced between node1, node2 and node3. Next priority will be node4 and the last priority will be node5.

I hope this article helps you in configuring the read-only routing, if you have questions or doubts, please mention it on comment section. 

Architecting reliable backup and recovery strategy


I have been managing multiple databases, mostly in Microsoft SQL Server and MySQL server, both on on-premise and cloud. We have faced a lot of challenging issues such as records deleted from a user table, backup file is corrupted, backup file is not compatible, backup files got deleted, backup storage is full and backup is running for long time, etc. When you are facing this issues for the first time, it is surprising to see new kind of issues every day and if you are not good in documentation, repetitive issues will keep occurring and we keep fixing rather than suctioning. If you are facing same challenges, then you need to focus on your backup and recovery strategy.

A well-designed backup and recovery strategy maximizes data availability and minimizes data loss without tolerating business requirement. In this post, we will discuss about the following topics:
  1. Recovery Time Objective (RTO)
  2. Recovery Point Objective (RPO)
  3. Business Strategy
  4. Backup 
    • Backup Type
    • Backup Frequency
    • Backup Device
    • Retention and Archival plans
  5. Restore
    • Backup Availability
    • Restoration Access
    • Documentation
  6. Testing
Recovery Time Objective (RTO): Time taken to recover the data

Recovery Time Objective defines, how long would an interruption to data access need to be, before the business ceased to be viable / capable of working successfully?

Best Practice:

Backup and Restoration plan that involves quick recovery with small data loss might be acceptable to business than a plan that reduces data loss but takes much longer to implement.
Deciding optimal RTO might involve following things:
  •  Identifying the correct backup media
  •  Identifying the authorized persons to perform restore
  •  Identifying documentation related to the restore
Example:

RTO for the Stock Exchange System might be 1 minutes – which means, in case of any failure on database system, backup should be restored and system should be online within a minute.

Recovery Point Objective (RPO): The amount of data loss

Recovery Point Objective defines how much data loss will be acceptable to business.

Best Practice:

Backup plan should be able to recover all committed data without any loss and it should be acceptable to loss the data which is in process at the time of failure.

Example:

RPO for the banking system might be 1 seconds – which means, in case of any failure, the bank can loss at most 1 second worth of data.

Business Strategy:

Backup and recovery plan should be in sync with business requirement. As a best practice, it should be communicated with business stakeholders and expectations of the business users should be managed in line with the agreed strategy.

Each database must be categorized based on the importance to the core functions of the organization and RPO and RTO for each database must be documented well in advance and communicated to all stakeholders.

Example:

In certain business, Sales database is very critical than Marketing database and needs to restored in high priority, in case of failure. The RPO and RTO for Sales and Marketing databases will be defined accordingly.

The business requirements will determine all aspects of the backup strategy, including:
  •          How frequently backups need to occur?
  •          How much data is to be backed up each time?
  •          The type of media, that the backups will be held on?
  •          Retention and Archival plans for the media? 
Backup Type:

Backup and recovery plan should clearly define the backup type required for different scenarios. A full backup might be appropriate in certain cases and partial backup will be appropriate in other scenario.

For example, Product database may have huge number of records and backup and restoration of a huge backup is time consuming. When you have RTO of 1 minutes, it is not right strategy to have a full backup which needs 1 hour of restoration time. Similarly, when RPO is defined as 5 seconds, we cannot run full backup on every 5 seconds.

Backup and Recovery strategy should address this question. It should choose the combination of different backup type to achieve the defined RPO and RTO metrics.
Generally, there are 3 types of backups are used:
  •  Full backup.
  •  Differential backup. 
  •  Log backup.
Backup Frequency:

Most of the places, the backups are implemented in below frequency:
  •  Full backup – every week
  •  Differential backup – every day
  •  Log backup – every hour

Review the above schedule and think, will it satisfy the business requirement. There is no right or wrong questions, the above frequency might satisfy the RPO of 1 hours and depends on the size of the database and where backup stored, RTO might be achievable. There is no one size fit all approach here, you have to understand the business requirement and define the RTO and RPO.

When business demand no data loss, log backup of every 1 hour will not help, in case any failure, with the above backup plan, you may lose the data worth 1 hour. A well-designed backup and restore plan will address this issue and it should be documented and shared across business stakeholders.





Backup Device:

This is one of the important factors, choosing the wrong backup device, will negatively impact the RTO and business requirement. Generally, backup device can be either one or combination of below devices:
  •  Disk Files – (SAN, NAS, etc.)
  •  Cloud Storage (Azure Blob Storage, Amazon Cloud Storage, etc.)
Best Practice:

Make sure, the backup device is easily accessible and secured. There are instances, SAN administrator mistakenly deleted the backup files while freeing disk spaces and other instances, where DBA does not have access to backup storage device and the person who is having access is on vacation. If you are in Enterprise setup, usually different things are managed by multiple persons, if you are in small or start-up setup, you might have access to everything. Plan your backup and recovery strategy accordingly, where to write the backup and how you can access the backup, when there is a need.

Also, there are compliance policy, where it demands the backup must be secured, might be using encryption algorithms.

Archival and Retention Policy:

Archival and Retention policy should be defined considering the legal requirements and compliance requirement. Depends upon your region compliance requirements may impact how you store and how long you retain the backup.

Restore:

Backup Availability:

There are scenarios, where backup file got corrupted or mistakenly deleted from storage, if you have only one backup copy, it will lead to loss of data. Backup plan should consider this and implement, more than one copy of backup in an accessible location.

Best Practice:

Generally, it is good practice to have more than 2 copies of backup. Keep 1 one copy in the local machine or attached drive, other copy in SAN/NAS storage and another copy in remote location, such as cloud or other geo.

Keeping a copy in local help in reduce the time needed to transfer the file to local machine and it will help in achieving RTO.






Restoration Access:

In a large enterprise setup, everyone may not have access to restore or recover data. These permissions include, access to backup storage, access to restoration and security access, if encryptions are in place. It is advisable to capture all these permissions in Backup and Recovery strategy and share it with business and technical stakeholders.

In case of emergency, everyone knows their role and understand the sequence, when to execute their job. It will remove the confusions and keep the team in calm environment.

Documentation:

I have interviewed a lot of people for the database administration role and I realized, the documentation is least important task in their day to day job. For a DBA, most of the tasks are repetitive and we are using a lot of scripts and automations. Even something developed or written by us few days back will be very difficult to understand later point of time. When you are in emergency, searching on Google or using others scripts will not help you and it seems it is simple to recover the data but it is not. It is mandatory to document each and every steps and business requirements in clear manner and communicate with all stakeholders.

Also, it is important to agree both the business and technical stakeholders on how quickly data can be retrieved and how much data can be lost and get sign-off on the documentation.

Testing:

As John Ruskin said, “Quality is never an accident; it is always the result of intelligent effort.” It does not matter, how perfectly you designed your backup and recovery strategy, unless you tested many number of times and documented all the challenges and resolution, it is going to be surprise at the time of emergency.

As a best practice, restore the backup on testing and staging frequently in a defined interval and restore on the production machine in longer intervals. There are instances, a backup copy may work in one machine with same database engine and configuration, may not work in other machine with same engine and configuration. Make a habit of test, test and test frequently in a defined period.

Summary:

A well-designed Backup and Recovery strategy will ensure the data availability and minimize the data lose. A documented strategy will act as guiding light in terms of emergency. It will remove the confusions and dependency on individuals. A good tested and orchestrated strategy ensures the business continuity and give guarantee to your sleep J.

I hope this helps in planning your strategy, please share your thoughts on this article and let me know, if I need to edit or add content on this article. Thanks you for your time. Be proactive rather than being reactive!!!