Semi-synchronous Replication with MariaDB on Ubuntu 20.04

Updated on June 12, 2020
Semi-synchronous Replication with MariaDB on Ubuntu 20.04 header image

Introduction

MariaDB allows for replication between servers using two different modes.

Standard Replication

Standard primary/secondary replication in MariaDB is asynchronous. Actions are performed on the primary node and then subsequently on the secondary nodes when they are ready. Secondary nodes are responsible for copying changes from the binary log to the relay log and applying these changes to the database. Secondary nodes can differ from the primary node in performance and availability, creating a discrepancy between the primary and secondary nodes and affecting database integrity.

Semi-synchronous Replication

In semi-synchronous mode, the primary node waits for confirmation from one of the secondary nodes that they have received the update, ensuring that there is always at least one complete copy of the database available in the event of a failure of the primary node.

Replication is not a substitute for a good backup. All changes to the database are replicated to other nodes in the cluster, including accidental deletes or bad imports.

Tutorial Examples

This guide uses the following example names and addresses:

  • The primary node is host-1
  • The secondary node is host-2
  • The private IP address of host-1 is 203.0.113.111
  • The private IP address of host-2 is 203.0.113.222

Prerequisites

1. Configure Name Resolution

The primary and secondary servers need to refer to each other by hostname. This does not need to be the same host name that clients use to connect. For testing purposes, it is a good idea to comment out the 127.0.1.1 address.

Edit /etc/hosts on host-1.

$ sudo nano /etc/hosts

Comment out the entry for 127.0.1.1 and add the entry for host-2, as shown.

# 127.0.1.1 host-1
203.0.113.222 host-2

Edit /etc/hosts on host-2.

$ sudo nano /etc/hosts

Edit /etc/hosts on host-2. Make the corresponding changes for host-1, as shown.

# 127.0.1.1 host-2
203.0.113.111 host-1

Confirm DNS name resolution by doing a DNS lookup of each host:

$ dig host-1
$ dig host-2

2. Install MariaDB on Both Hosts

Install the database service.

$ sudo apt install mariadb-server

Secure the database.

$ sudo mysql_secure_installation

3. Configure Replication

The server ID must be unique for each node in the cluster. This guide also uses the hostname and server ID to name the binary log file for illustration. MariaDB calls the primary server master and the secondary server slave.

Edit the configuration on host-1.

$ sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf

Edit the [mysqld] section:

# bind-address = 127.0.0.1
# replication
log-bin
server_id=1
log-basename=host-1-id-1
plugin_load_add = semisync_master
rpl_semi_sync_master_enabled=ON

Restart the database service.

$ sudo systemctl restart mariadb.service

Edit the configuration on host-2.

$ sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf

Edit the [mysqld] section:

# bind-address = 127.0.0.1
# replication
log-bin
server_id=2
log-basename=host-2-id-2
plugin_load_add = semisync_slave
rpl_semi_sync_slave_enabled=ON

Restart the database service.

$ sudo systemctl restart mariadb.service

4. Create the Replication User

Create the database replication user on host-1. Replace example-password with a suitable password.

$ sudo mysql
MariaDB> CREATE USER 'replication'@'%' IDENTIFIED BY 'example-password';
MariaDB> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
MariaDB> exit

5. Copy the Database

Export the database on host-1.

$ sudo mysqldump --all-databases --master-data=2 > /var/tmp/primary-sql-db.dmp

The --master-data option causes the binary log position to be included in the database dump, but it is not used when importing the database dump on the secondary. As this is a new cluster and no database changes are taking place, this value does not need to be specified when starting replication on the secondary.

Transfer the database dump to the secondary. Replace username with your username on host-2.

$ scp /var/tmp/primary-sql-db.dmp username@host-2:/var/tmp/

Import the database on host-2.

$ sudo mysql < /var/tmp/primary-sql-db.dmp

6. Start Replication on the Secondary

Start replication on host-2. Use the same password as the replication user on the primary.

$ sudo mysql
MariaDB> CHANGE MASTER TO
            MASTER_HOST='host-1',
            MASTER_USER='replication',
            MASTER_PASSWORD='example-password',
            MASTER_PORT=3306,
            MASTER_CONNECT_RETRY=10,
            MASTER_USE_GTID = slave_pos;
MariaDB> START SLAVE;
MariaDB> exit

The Global Transaction ID (GTID) is used to improve the reliability of the binary log when used with replication.

7. Confirm Status

Check the replication status on host-2.

$ sudo mysql
MariaDB> SHOW SLAVE STATUS \G
MariaDB> exit

A Slave_IO_State of "Waiting for master to send event" is normal, and a Slave_IO_Running and Slave_SQL_Running state of "yes" mean that replication is working correctly in the cluster.

Check the status on both servers.

$ sudo mysql
MariaDB> SHOW STATUS;
MariaDB> exit

Look for variables beginning with Binlog_ and Slave_ for information about the binary log and the secondary node status. Variables starting with Rpl_semi_sync_ have information on semi-synchronous replication.

8. Test Replication

Create a database on the primary and confirm that it is replicated to the secondary.

Check host-2.

$ sudo mysql
MariaDB> SHOW DATABASES;

Create a database on host-1.

$ sudo mysql
MariaDB> CREATE DATABASE IF NOT EXISTS test;
MariaDB> exit

Verify the database now exists on host-2.

$ sudo mysql
MariaDB> SHOW DATABASES;
MariaDB> exit

Maintain the Cluster

Check the MariaDB logs from time to time.

$ sudo less /var/log/mysql/error.log

Check for any "semi-sync" replication messages created during startup, and be aware that semi-synchronous log messages can be created during cluster operation. For example, if there is a timeout between the primary and secondary servers, a "Semi-sync replication switched OFF" message will be created, and the replication mode will change to asynchronous replication, losing the benefits of semi-synchronous replication. This message may only be in the error log. Check the log regularly to spot problems with cluster operation.

Conclusion

Semi-synchronous replication ensures data integrity but may come at the cost of a reduction in performance. Alternatives may provide faster performance, but at the cost of complexity, or the possibility that nodes in the cluster could become unsynchronised and risk database corruption in the event of a failure. With semi-synchronous replication, the data is replicated to multiple nodes, and changes to the database are only permitted once the changes are replicated to other nodes in the cluster.

Next steps