High Availability and Disaster Recovery
Cloud Computing | High Availability and Disaster Recovery: In this tutorial, we will learn about high availability (HA) and disaster recovery (DR), why both are needed, and their key elements in cloud computing.
By Rahul Gupta Last updated : June 04, 2023
High Availability (HA)
High availability (HA) means removing single points of failure so that a system or component stays continuously operational for a long period of time. This can be as simple as configuring several disks as a RAID array for storage, or as extensive as multiple redundant storage systems and servers designed to provide reliable, continuous uptime.
For a solution to be considered truly highly available, it must be able to withstand failures at every level of the solution, including, but not limited to, internal hardware, software, and networking.
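Standard reliability arithmetic shows why removing single points of failure matters. The Python sketch below assumes that components fail independently. Real failures are often correlated, so treat the results as optimistic. Components in series must all be up, so their availabilities multiply. Redundant replicas in parallel are down only when every replica is down at the same time. The 99% server and 99.9% switch figures are hypothetical.

```python
from functools import reduce


def series_availability(*components: float) -> float:
    """Every component must be up, so each one is a single point of failure."""
    # Availabilities of independent components in series multiply.
    return reduce(lambda acc, a: acc * a, components, 1.0)


def parallel_availability(*replicas: float) -> float:
    """The service stays up as long as at least one redundant replica is up."""
    # Probability that all (independent) replicas are down at the same time.
    all_down = reduce(lambda acc, a: acc * (1.0 - a), replicas, 1.0)
    return 1.0 - all_down


if __name__ == "__main__":
    server = 0.99   # hypothetical availability of one server
    switch = 0.999  # hypothetical availability of one network switch
    print(f"one server:               {server:.4%}")
    print(f"two redundant servers:    {parallel_availability(server, server):.4%}")
    print(f"one server behind switch: {series_availability(server, switch):.4%}")
```

Under these assumptions, two redundant 99% servers reach about 99.99% availability. Placing a single 99.9% switch in front of one server lowers its availability to about 98.9%, because the switch becomes another single point of failure.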
Disaster Recovery (DR)
If a system fails, recovering from the incident quickly is always essential for an organization, and this is where the idea of disaster recovery comes into play.
Disaster recovery is a strategy that "allows an organization to maintain or quickly resume mission-critical functions after a disaster." To recover from a catastrophic event with minimal downtime, IT organizations need capabilities that back up data and automate the rebuilding of infrastructure. This lets companies sustain their expected levels of productivity.
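The backup side of disaster recovery can be sketched in Python as a timestamped copy to an offsite location plus a retention policy. This is a minimal, local-filesystem illustration. The `offsite` directory stands in for genuinely remote storage. The function names, the `.bak` naming scheme, and the default retention of three copies are hypothetical choices and do not reflect any particular backup product.

```python
import shutil
import tempfile
import time
from pathlib import Path


def _backups(source_name: str, offsite_dir: Path) -> list[Path]:
    """Existing copies of `source_name`, oldest first."""
    copies = offsite_dir.glob(f"{source_name}.*.bak")
    # File names look like "<name>.<nanosecond timestamp>.bak".
    return sorted(copies, key=lambda p: int(p.name.rsplit(".", 2)[1]))


def back_up(source: Path, offsite_dir: Path, keep: int = 3) -> Path:
    """Copy `source` to `offsite_dir` under a timestamped name, keeping the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    offsite_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.time_ns()
    target = offsite_dir / f"{source.name}.{stamp}.bak"
    while target.exists():  # two backups within the clock's resolution
        stamp += 1
        target = offsite_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)  # copies file contents and metadata
    # Hypothetical retention policy: delete all but the newest `keep` copies.
    for old in _backups(source.name, offsite_dir)[:-keep]:
        old.unlink()
    return target


def restore_latest(source_name: str, offsite_dir: Path, destination: Path) -> Path:
    """Restore the most recent copy of `source_name` to `destination`."""
    copies = _backups(source_name, offsite_dir)
    if not copies:
        raise FileNotFoundError(f"no backups of {source_name} in {offsite_dir}")
    shutil.copy2(copies[-1], destination)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "orders.db"  # hypothetical business-critical file
        for version in range(5):
            data.write_text(f"version {version}")
            back_up(data, root / "offsite")
        restored = restore_latest("orders.db", root / "offsite", root / "restored.db")
        print(restored.read_text())
```

A real setup would ship the copies to a separate site or storage service and run them on a schedule. The retention step limits how much storage the offsite copies consume.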
The Need for High Availability (HA) and Disaster Recovery (DR)
To guarantee business continuity, an organization needs both high availability AND disaster recovery technologies. High availability, as defined above, protects against day-to-day events that can affect availability, such as hardware failures, network failures, load-induced failures, or application failures. Putting high availability processes and technologies in place so that these failures have limited or no impact results in a highly available system. Disaster recovery comes into play when a significant outage occurs as a consequence of a natural disaster, user-induced data loss, a security breach, or a site-wide failure.
By backing up business-critical systems and keeping offsite DR copies, we have resilient backups from which to recover data when a disaster causes data loss. Replication protects us against a site-wide catastrophe in which an entire site could go offline. Because virtual machines are replicated to a DR facility, resources can be redirected to the DR site if the main production site goes down. Both high availability and disaster recovery are therefore essential to business-continuity planning. High availability ensures day-to-day uptime, while disaster recovery ensures that data can be recovered after a major catastrophe.
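Redirecting traffic to the DR site can be modeled as a small decision rule. The Python sketch below assumes a hypothetical policy: fail over only after a set number of consecutive failed health checks on the production site, so that a single dropped probe does not trigger a site switch. The site names and the threshold of three are placeholders. How health is actually probed (HTTP checks, DNS, and so on) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Routes to the production site until it fails `threshold` health checks in a row."""

    primary: str
    dr: str
    threshold: int = 3  # hypothetical number of consecutive failures tolerated
    consecutive_failures: int = field(default=0, init=False)
    active: str = field(default="", init=False)

    def __post_init__(self) -> None:
        self.active = self.primary  # start on the production site

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one health-check result for the primary site and return the site to route to."""
        if primary_healthy:
            self.consecutive_failures = 0  # an isolated failure does not accumulate
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.active = self.dr  # fail over; there is no automatic failback
        return self.active


if __name__ == "__main__":
    controller = FailoverController(primary="prod-site", dr="dr-site")
    for healthy in [True, False, True, False, False, False, True]:
        print(healthy, "->", controller.record_health_check(healthy))
```

The controller deliberately never fails back on its own. Returning to the production site is often treated as a planned step, because by then the DR site may hold newer data than the original site.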
Key Elements of High Availability and Disaster Recovery
Distributed Approach
This approach recommends deploying the enterprise control room and the enterprise clients on separate machines, and clustering the automation components and the relevant data centre components.
The enterprise control room is flexible enough to handle a large number of requests. As necessary, deploy multiple enterprise control room or enterprise client instances on multiple physical or virtual servers.
Load Balancing
Load balancing is the practice of distributing application or network traffic across multiple servers to protect service operations. A load balancer spreads workloads across the servers in a cluster so that operations continue on the remaining clustered servers if one of them fails.
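Round-robin scheduling illustrates a load balancer's core job. It is one common distribution strategy; others include least-connections and weighted schemes. In this Python sketch, the server names are hypothetical. Servers marked as down are skipped, so requests keep flowing to the remaining clustered servers.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self) -> str:
        """Return the next healthy server in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the rotation is enough to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("unreachable: a healthy server exists")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical names
    print([balancer.next_server() for _ in range(3)])
    balancer.mark_down("app-2")  # simulate a failed server
    print([balancer.next_server() for _ in range(4)])
```

In practice, a balancer marks servers down and up automatically from periodic health checks rather than through explicit calls.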
Databases
Databases use their built-in failover mechanisms to protect data and make it recoverable.
- Within the data centre, set up synchronous replication between the primary (active) and secondary (passive/standby) clustered MS SQL Server nodes of the HA cluster. This ensures data consistency if a database node fails.
- Between the DR sites, configure the database for asynchronous replication from the primary DR site (production) to the secondary DR site (recovery), which is in a geographically separate location from the primary DR site.
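A toy in-memory model in Python can show the difference between these two replication modes. It is not how MS SQL Server implements replication; it only illustrates the trade-off. A synchronous write is acknowledged only after the standby has applied it, so the standby never lags, but every write waits on the standby. That cost is one reason synchronous replication is typically kept within a data centre. An asynchronous write is acknowledged immediately and shipped later, which tolerates the latency to a distant DR site. However, changes still in the queue are lost if the primary site fails before they are sent. The class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class Standby:
    """Secondary database copy that applies changes shipped from the primary."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value


class Primary:
    """Primary database that replicates each write to a standby."""

    def __init__(self, standby: Standby, synchronous: bool) -> None:
        self.data: dict[str, str] = {}
        self.standby = standby
        self.synchronous = synchronous
        self.unshipped: deque = deque()  # async changes not yet sent to the standby

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Acknowledged only after the standby has applied the change.
            self.standby.apply(key, value)
        else:
            # Acknowledged now; shipped to the remote site later.
            self.unshipped.append((key, value))

    def ship(self, limit: Optional[int] = None) -> None:
        """Send up to `limit` queued changes (all of them by default) to the standby."""
        count = len(self.unshipped) if limit is None else min(limit, len(self.unshipped))
        for _ in range(count):
            self.standby.apply(*self.unshipped.popleft())


if __name__ == "__main__":
    for synchronous in (True, False):
        standby = Standby()
        primary = Primary(standby, synchronous=synchronous)
        for i in range(3):
            primary.write(f"order-{i}", "paid")
        primary.ship(limit=1)  # only part of the async backlog is sent...
        # ...before the primary site is lost; anything still queued is gone.
        mode = "synchronous" if synchronous else "asynchronous"
        print(mode, "standby holds", len(standby.data), "of 3 writes")
```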