Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy – Part 1

2 Mar, 2021

LinkedInTwitter

In the first of a series of blogs on deploying PostgreSQL for High Availability (HA), we show how this can be done by leveraging technologies such as Patroni, etcd and HAProxy.

This blog gives an introduction to the relevant technologies and starts with showing how to deploy etcd as the distributed configuration store that Patroni will subsequently use for HA. Subsequent blogs will show how to configure Patroni, HAProxy and PostgreSQL

The second part of this blog can be found here.

For demonstration purposes the examples show how to manually install and configure the relevant components. However it is highly recommended to follow standard DevOps practices and use a configuration management tool such as Ansible or any other tool to install and configure the various software packages and OS.

If you would like to know more about how to implement modern data and cloud technologies, such as PostgreSQL, into to your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on clouddata, and DevOps for any business type. Contact us today for more information.

Patroni at a glance

Patroni is an automatic failover system for PostgreSQL built by Zalando. It provides automatic or manual failover and keeps all of the vital data stored into a distributed configuration store (DCS) that can be one of etcd, zookeeper, consul or a pure python RAFT implementation based on the library pysyncobj. Patroni is available either as a pip package or via the official RPM/DEB PostgreSQL repositories.

When Patroni runs on the top of the primary node it stores a token into the DCS. The token has a limited TTL measurable in seconds. If the primary node becomes unavailable then the token expires and patroni on the followers via the DCS initiates the election of a new primary.

When the old primary comes back on line then it discovers that the token is held by another node and patroni transforms the old primary into the follower of the new primary automatically.

The database connections do not happen directly to the database nodes but are routed via a connection proxy like HAProxy or pgbouncer. This proxy, by querying the patroni rest api, determines the active node.

It’s then clear that by using Patroni the risk of having a split brain scenario is very limited.

However by using patroni a DBA needs to surrender completely the manual database administration to patroni because all the dynamic settings are stored into the DCS in order to have complete consistency on the participating nodes.

In this blog post we’ll see how to build a Patroni cluster on the top of CentOS 7 by using etcd in clustering and HAProxy active on each database node capable of routing the database connection automatically to the primary, whatever node we decide to connect to.

Software installation

In order to set up PostgreSQL and Patroni we need to add the official pgdg (PostgreSQL Global Development Group) yum repository to CentOS.

The PostgreSQL website has an easy to use wizard to grab the commands depending on the distribution in use at the url https://www.postgresql.org/download/linux/redhat/

For CentOS 7 adding the yum repository is simple as that.

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

We can now install the PostgreSQL 13 binaries by running the following command.

sudo yum install -y postgresql13-server postgresql13-server postgresql13 postgresql13-contrib

In order to use the packaged version of patroni we need to install the epel-release provides additional packages required by patroni

sudo yum install -y epel-release

Finally we can install the following packages

  • python36
  • patroni
  • patroni-etcd
  • etcd
  • HAProxy

OS setup

The CentOS base installation needs to be configured to prevent firewalld and selinux that need to be adjusted before configuring our Patroni cluster. 

For firewalld we need to open the ports required by etcd, PostgreSQL, HAProxy and the Patroni REST api.

For selinux we need to enable the flag enabling HAProxy binding to the ip addresses.

Firewalld

The ports required for operating patroni/etcd/haproxy/postgresql are the following.

  • 5432 PostgreSQL standard port, not used by PostgreSQL itself but by HAProxy
  • 6432 PostgreSQL listening port used by HAproxy to route the database connections
  • 2380 etcd peer urls port required by the etcd members communication
  • 2376 etcd client port required by any client including patroni to communicate with etcd
  • 8008 patroni rest api port required by HAProxy to check the nodes status
  • 7000 HAProxy port to expose the proxy’s statistics

Enabling the ports is very simple and can be automated via script or ansible using the firewalld module.

By using bash it is possible to enable the ports with a simple for loop.

for service_port in (5432 6432 2380 2376 8008 7000) 
do
sudo firewall-cmd --permanent --zone=public --add-port=${service_port}/tcp
done
sudo systemctl reload firewalld

selinux

selinux by default prevents the new services to bind to all the ip addresses.

In order to allow HAProxy to bind the ports required for its functionality we need to run this command.

sudo setsebool -P haproxy_connect_any=1

Network

For our example we are using three virtualbox virtual machines. Each machine has two network interfaces. The first interface is bridged on the host network adapter and is used for  internet access. The second interface is connected to the host only network provided by virtualbox and used for inter communication between the machines.

The second interface network is configured with a static ip address and each node has the hosts file configured with the other hosts ip names resolved to machine names.

127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
::1     	localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.56.40    patroni01
192.168.56.41    patroni02
192.168.56.42    patroni03

Etcd three node cluster

We are now ready to configure etcd to work as a cluster of three nodes.

For doing so we need to edit the file in /etc/etcd/etcd.conf and modify the following variables.

ETCD_LISTEN_PEER_URLS="http://192.168.56.40:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.56.40:2379"
ETCD_NAME="patroni01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.56.40:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.40:2379"
ETCD_INITIAL_CLUSTER="patroni01=http://192.168.56.40:2380,patroni02=http://192.168.56.41:2380,patroni03=http://192.168.56.42:2380"

All the variables except ETCD_INITIAL_CLUSTER are machine specific and must be set accordingly with the machine name and ip address.

The variable ETCD_INITIAL_CLUSTER is a comma separated values list of the hosts participating in the etcd cluster.

After configuring etcd on each node we can enable and start the service.

sudo systemctl enable etcd
sudo systemctl start etcd

We can check the cluster’s health status with the following command.

etcdctl --endpoints http://patroni01:2379 cluster-health
member 75e96c8926bc6382 is healthy: got healthy result from http://192.168.56.40:2379
member 7c1dfc5e13a8008a is healthy: got healthy result from http://192.168.56.42:2379
member c920522ba9a75e17 is healthy: got healthy result from http://192.168.56.41:2379
cluster is healthy

Wrap up

We’ve seen how to configure a three node etcd cluster on the patroni nodes. This example configuration is made of three members.

In the next posts we’ll see how to configure and initialise a patroni cluster using the DCS (etcd) running on the database nodes and how to setup HAProxy for routing efficiently the database connections to the active leader.

Categories

Archives

Related Articles