Automatic Failover for PostgreSQL Streaming Replication using repmgr

Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL’s built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.

repmgr has provided advanced support for PostgreSQL’s built-in replication mechanisms since they were introduced in 9.0. The current repmgr series, repmgr 5, supports the latest developments in replication functionality introduced from PostgreSQL 9.3 such as cascading replication, timeline switching and base backups via the replication protocol.

First testing purpose, we’re gonna set this environment as follow:

we use CentOS 7 with PostgreSQL12 installed on each server

vb_1 = Master, 192.168.100.49

vb_2 = Standby1, 192.168.100.50

vb_3 = Standby2, 192.168.100.57

witness = Witness, 192.168.100.56

1. Install the repmgr

Install repmgr on every node using this command

yum install -y repmgr12

Don’t forget to allow postgres through firewall

# sudo firewall-cmd --permanent --zone=public –add-service=postgresql
# sudo systemctl reload firewalld

2. Master configuration

After installing repmgr on master, we have to change the configuration file (default location is in /etc/repmgr/12/repmgr.conf)

node_id=1
node_name='masterdb'
conninfo='host=192.168.100.49 user=repmgr dbname=repmgr'
data_directory='/var/lib/pgsql/12/data'

Next edit the postgres configuration of master to this value:

listen_addresses = '*'
port = 5432
max_connections = 100
max_wal_senders = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
wal_keep_segments = 100

We need to edit the pg_hba on all machine too:

host 	repmgr     	 repmgr 	192.168.100.49/32 	trust
host 	replication      repmgr 	192.168.100.49/32 	trust
host 	repmgr     	 repmgr 	192.168.100.50/32 	trust
host 	replication      repmgr 	192.168.100.50/32 	trust
host 	repmgr     	 repmgr 	192.168.100.56/32 	trust
host 	replication      repmgr 	192.168.100.56/32 	trust
host 	repmgr     	 repmgr 	192.168.100.57/32 	trust
host 	replication      repmgr 	192.168.100.57/32 	trust

Create user and database for repmgr

-bash-4.2$ createuser -s repmgr
-bash-4.2$ createdb repmgr -O repmgr
-bash-4.2$ psql
postgres=# ALTER USER repmgr SET search_path TO repmgr, "$user", public;
ALTER ROLE

Register this node as master

-bash-4.2$  repmgr -f /etc/repmgr/12/repmgr.conf master register

To check whether the master has been register, we can use this command:

-bash-4.2$  repmgr -f /etc/repmgr/12/repmgr_master.conf cluster show

3. Slave1 Configuration

On slave1, we also need to change the repmgr configuration file (default location is in /etc/repmgr/12/repmgr.conf)

node_id=2
node_name='slavedb1'
conninfo='host=192.168.100.50 user=repmgr dbname=repmgr'
data_directory='/var/lib/pgsql/12/data'

Clone the data from master

-bash-4.2$ /usr/pgsql-12/bin/repmgr -h 192.168.100.49 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone

register the standby, remember to turn on the server first:

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data start
-bash-4.2$  repmgr -f /etc/repmgr/12/repmgr.conf standby register

4. Slave2 Configuration

On slave2, we will do the same things just like on slave1, first we change the configuration file for repmgr

node_id=3
node_name='slavedb2'
conninfo='host=192.168.100.57 user=repmgr dbname=repmgr'
data_directory='/var/lib/pgsql/12/data'

Clone the data from master

-bash-4.2$  /usr/pgsql-12/bin/repmgr -h 192.168.100.49 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone

register the standby, remember to turn on the server first:

-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data start
-bash-4.2$  repmgr -f /etc/repmgr/12/repmgr.conf standby register

5. Preparing The Witness

Witness node will be used for automatic failover in consensus process. So we are gonna create repmgr user and database using this command:

createuser --replication --createdb --createrole --superuser repmgr

/usr/pgsql-12/bin/psql -c 'ALTER USER repmgr SET search_path TO repmgr, "\$user\", public;'

createdb repmgr --owner=repmgr

Also change the repmgr.conf

node_id=4
node_name='node_witness'
conninfo='host=192.168.100.56 user=repmgr dbname=repmgr'
data_directory='/var/lib/pgsql/12/data'

Register the witness

/usr/pgsql-12/bin/repmgr -h 192.168.100.49 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf  witness register -F

at this point replication should working well, next we will set a configuration for Automatic Failover using repmgrd

6. Setting an Automatic Failover

on each node set this following lines in postgresql.conf file:

shared_preload_libraries = ‘repmgr’

also edit repmgrd.conf file in each node:

failover='automatic'
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitor_interval_secs=2
connection_check_type='ping'
reconnect_attempts=4
reconnect_interval=8
primary_visibility_consensus=true
standby_disconnect_on_failover=true
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgr12.service'
repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgr12.service'
service_start_command='sudo /usr/bin/systemctl start postgresql-12.service'
service_stop_command='sudo /usr/bin/systemctl stop postgresql-12.service'
service_restart_command='sudo /usr/bin/systemctl restart postgresql-12.service'
service_reload_command='sudo /usr/bin/systemctl reload postgresql-12.service'
monitoring_history=yes
log_status_interval=60

on standby1 set the priority=40

on standby2 set the priority=60

so on failover process standby2 will be promoted to a new primary

7. Start repmgr daemon (repmgrd)

check the repmgrd met & requirement

repmgr -f /etc/repmgr/12/repmgr.conf daemon start --dry-run

if there’s no error log, then continue:

repmgr -f /etc/repmgr/12/repmgr.conf daemon start

check the repmgrd service using:

repmgr -f /etc/repmgr/12/repmgr.conf cluster event --event=repmgrd_start
cat /var/log/messages | grep repmgr | less

8. Simulating a Failed Primary

Here on this step, we will make the primary down so we can se the Automatic Failover Process. we simulate a failed primary by stopping data cluster on primary node (masterdb) using this command:

/usr/pgsql-12/bin/pg_ctl -D /12/data stop

check the status of cluster using:

repmgr -f /etc/repmgr/12/repmgr.conf cluster show --verbose

it should look like this

ID | Name         | Role    | Status    | Upstream   | Location | Priority | Timeline | Connection string                            
----+--------------+---------+-----------+------------+----------+----------+----------+-----------------------------------------------
 1  | masterdb     | primary | - failed  | ?          | default  | 100      |          | host=192.168.100.49 user=repmgr dbname=repmgr
 2  | slavedb      | standby |   running | ! slavedb2 | default  | 40       | 2        | host=192.168.100.50 user=repmgr dbname=repmgr
 3  | slavedb2     | primary | * running |            | default  | 60       | 2        | host=192.168.100.57 user=repmgr dbname=repmgr
 4  | node_witness | witness | * running | slavedb2   | default  | 0        | n/a      | host=192.168.100.56 user=repmgr dbname=repmgr

As we can see slavedb2 automatically become the new primary

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>