MySQL High Availability is an option you can enable so that your MySQL database remains available in the event of a failure or disruption. The feature helps you meet higher uptime requirements and a zero data loss tolerance. In this article, we’ll take a look at what the general concept of high availability means, and how MySQL’s High Availability option works.
What Is High Availability?
High availability is the ability of a system or service to continue functioning and remain available when a failure or outage occurs. A highly available system ensures an organisation’s mission-critical systems and applications are always up and running. It’s especially important for organisations in sectors like healthcare, finance, and aviation where the failure of a mission-critical system could have serious consequences.
High availability is typically expressed as a percentage of uptime defined by service level agreements (SLAs), with 100% representing a system that never fails. Because this is virtually unachievable, many organisations aim for “five nines”, or 99.999% availability, which allows a little over five minutes of downtime per year.
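To make the percentages concrete, the short sketch below converts an availability target into the downtime it allows over a year. The function name and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime (in seconds) permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_seconds * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few common targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why every step up is considerably harder to achieve than the last.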
How MySQL Achieves High Availability
A highly available system must be able to recover quickly when a fault occurs. A high availability architecture requires at least three basic elements that work together to ensure recoverability: failure detection, failover, and a redirection mechanism.
Failure Detection
MySQL has a High Availability option that enables applications to meet requirements for higher uptime and zero data loss. When the High Availability option is on, the MySQL system creates three instances across different fault domains or availability zones.
Data is replicated among the three instances using MySQL Group Replication, and the application connects to the primary instance for both reads and writes. If a failure is detected, the system triggers an automatic failover to a secondary instance within minutes.
Failover
The failover mechanism transfers services to a replicated instance. If more than one backup instance is available, the failover mechanism chooses the best one to promote to the primary node.
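As a rough illustration of how a failover mechanism might pick the "best" instance, the sketch below prefers the healthy secondary that has applied the most transactions. The `Instance` type, its fields, and the selection rule are all assumptions made for this example; they are not the election rule MySQL itself uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of a secondary instance, for illustration only."""
    name: str
    healthy: bool
    applied_transactions: int


def choose_new_primary(secondaries: Sequence[Instance]) -> Optional[Instance]:
    """Pick the healthy secondary that is most up to date.

    Ties are broken by name so the choice is deterministic.
    Returns None when no healthy secondary is available.
    """
    candidates = [s for s in secondaries if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (-s.applied_transactions, s.name))


secondaries = [
    Instance("db-2", healthy=True, applied_transactions=1040),
    Instance("db-3", healthy=True, applied_transactions=1042),
]
chosen = choose_new_primary(secondaries)
print(chosen.name if chosen else "no candidate available")
```

Preferring the most up-to-date instance keeps the amount of work needed before the new primary can accept writes as small as possible.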
A Redirection Mechanism
Once failover to a secondary instance occurs, the High Availability feature redirects all application and user connections to what is now the new primary node. It also redirects all queries from the old primary node to the new primary database.
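One common way to picture redirection is a stable endpoint that always resolves to the current primary, so clients never need to know which instance sits behind it. The sketch below is a toy model of that idea; the `PrimaryEndpoint` class and the host names are hypothetical and do not represent the service's actual implementation.

```python
class PrimaryEndpoint:
    """Toy model of a stable endpoint that always points at the current primary."""

    def __init__(self, primary: str) -> None:
        self._primary = primary

    def resolve(self) -> str:
        """Return the address clients should connect to right now."""
        return self._primary

    def fail_over_to(self, new_primary: str) -> None:
        """Repoint the endpoint after a secondary has been promoted."""
        self._primary = new_primary


endpoint = PrimaryEndpoint("db-1.example.internal")
print("before failover:", endpoint.resolve())

endpoint.fail_over_to("db-2.example.internal")
print("after failover:", endpoint.resolve())
```

Because applications only ever use the endpoint, a failover requires a reconnect at most, not a configuration change on the client side.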
MySQL High Availability: Uptime
Uptime is the time during which a system is available and functioning correctly. It is expressed as a percentage of the total time the system is expected to be operational, so high uptime means the system is available and functioning as expected most of the time.
The uptime you can expect from MySQL depends on the specific high availability (HA) solution you implement.
MySQL Replication
MySQL Replication allows you to set up multiple servers to provide redundancy and failover, supporting higher uptimes than a MySQL server with no HA capability. A master-slave configuration (called source-replica in current MySQL terminology) uses a single master server that accepts reads and writes and one or more read-only slave servers. By default, data from the master server is replicated asynchronously to the slave servers.
To implement failover, you’ll need to set up one or more slave servers as standbys that can be promoted to master in the event of a failure. Failover is generally a manual process where you have to promote the slave node to the master node by changing the status of the promoted slave to read-write mode so that it can accept queries.
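A manual promotion usually comes down to a handful of SQL statements run on the chosen standby. The sketch below only builds and prints that list; it does not connect to a server. The statement names are those used from MySQL 8.0.22 onward (older versions use `STOP SLAVE` and `RESET SLAVE ALL`), and the helper name `promotion_statements` is an assumption for illustration. A real procedure would also confirm that the standby has applied everything it received and would repoint applications and any remaining replicas.

```python
from typing import List


def promotion_statements() -> List[str]:
    """Return SQL statements to run on the standby being promoted.

    Simplified sketch: it covers only stopping replication and
    switching the server to read-write mode.
    """
    return [
        "STOP REPLICA;",                      # stop receiving and applying changes
        "RESET REPLICA ALL;",                 # discard the old replication configuration
        "SET GLOBAL super_read_only = OFF;",  # allow writes from privileged accounts
        "SET GLOBAL read_only = OFF;",        # allow writes from all accounts
    ]


for statement in promotion_statements():
    print(statement)
```

Keeping the steps in a script, even one this small, reduces the chance of a step being skipped or mistyped during an outage.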
Because failover is done manually, it takes longer and is prone to human error, which can extend the outage. MySQL Replication is also asynchronous by default, which means that if the master fails, transactions committed on the master may not yet have replicated to the slave servers. If critical data is lost, it will need to be restored, adding to the time the system is down.
MySQL Group Replication
MySQL Group Replication allows you to achieve higher uptimes than MySQL Replication. Using MySQL Group Replication, you set up multiple MySQL servers in a group with one server designated as the primary server and the other servers as secondary servers. Each server in the group maintains a copy of the data and uses replication to ensure that the copies remain in sync.
If the primary server goes down, the secondary servers in the group automatically detect the failure and begin the failover process. One of the secondary servers is automatically promoted to the new primary server and begins serving requests from clients. The other secondary members in the group now receive updates from the new primary server and continue processing client-read requests.
If the failed server comes back online, it can rejoin the group as a secondary server; in a managed High Availability setup, this happens automatically.
Because failure detection and failover occur automatically with MySQL Group Replication, downtime is minimal, and users and applications are typically unaware that an outage has occurred.
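Automatic failover in Group Replication relies on a majority of group members still being able to reach one another. The arithmetic below shows how many simultaneous failures a group of a given size can tolerate; the function names are chosen for illustration.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority of the group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """Members that can fail while a majority still remains."""
    return group_size - majority(group_size)


for size in (1, 2, 3, 5, 7):
    print(f"group of {size}: majority is {majority(size)}, "
          f"tolerates {tolerated_failures(size)} failure(s)")
```

A group of three tolerates one failure, while a group of two tolerates none, which helps explain why three is a common minimum for a highly available group.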
MySQL Cluster
A MySQL Cluster HA solution offers the highest level of uptime. It is a highly available, distributed database system with automatic failover and load balancing. It delivers high levels of availability, performance, and scalability, and is designed to provide near-zero downtime.
MySQL Cluster uses three types of nodes that work together to store and manage data:
- Data nodes: Store data and handle read and write queries.
- MySQL server nodes: Receive queries from the client applications, process them on the data nodes, and then return the result to the clients.
- Management nodes: Manage the operation of the cluster and handle failover and recovery if a failure occurs.
If one or more nodes in a cluster fail, the cluster automatically detects the issue and triggers the failover process. The entire process typically happens within one second of a failure, without interrupting service to client applications. The cluster continues to operate as normal with virtually no downtime.
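Whether the cluster survives a set of failures depends on which data nodes fail. In MySQL Cluster, data nodes are organised into node groups, each of which holds copies of the same data, and the cluster keeps running as long as every node group has at least one node alive. The sketch below models that rule in a simplified way: it assigns node groups by node ID order and ignores management-node arbitration and network partitions. The function names are assumptions for illustration.

```python
from typing import Iterable, List, Set


def node_groups(data_node_ids: Iterable[int], replicas: int) -> List[List[int]]:
    """Split data nodes into node groups of `replicas` nodes each, by ID order."""
    ids = sorted(data_node_ids)
    if replicas < 1 or len(ids) % replicas != 0:
        raise ValueError("node count must be a positive multiple of the replica count")
    return [ids[i:i + replicas] for i in range(0, len(ids), replicas)]


def cluster_survives(groups: List[List[int]], failed: Set[int]) -> bool:
    """True if every node group still has at least one running node."""
    return all(any(node not in failed for node in group) for group in groups)


groups = node_groups([1, 2, 3, 4], replicas=2)
print("node groups:", groups)
print("nodes 1 and 3 fail, cluster survives:", cluster_survives(groups, {1, 3}))
print("nodes 1 and 2 fail, cluster survives:", cluster_survives(groups, {1, 2}))
```

With four data nodes and two copies of the data, the cluster can lose one node from each group, but not both nodes of the same group.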