Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy - Part 1

March 2, 2021

@ET-DC@eyJkeW5hbWljIjp0cnVlLCJjb250ZW50IjoicG9zdF90aXRsZSIsInNldHRpbmdzIjp7ImJlZm9yZSI6IiIsImFmdGVyIjoiIn19@[addtoany buttons="twitter", "linkedin"]

In the first of a series of blogs on deploying PostgreSQL for High Availability (HA), we show how this can be done by leveraging technologies such as Patroni, etcd and HAProxy.

This blog gives an introduction to the relevant technologies and starts with showing how to deploy etcd as the distributed configuration store that Patroni will subsequently use for HA. Subsequent blogs will show how to configure Patroni, HAProxy and PostgreSQL

The second part of this blog can be found here.

For demonstration purposes the examples show how to manually install and configure the relevant components. However it is highly recommended to follow standard DevOps practices and use a configuration management tool such as Ansible or any other tool to install and configure the various software packages and OS.

If you would like to know more about how to implement modern data and cloud technologies, such as PostgreSQL, into to your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on cloud, data, and DevOps for any business type. Contact us today for more information.

Patroni at a glance

Patroni is an automatic failover system for PostgreSQL built by Zalando. It provides automatic or manual failover and keeps all of the vital data stored into a distributed configuration store (DCS) that can be one of etcd, zookeeper, consul or a pure python RAFT implementation based on the library pysyncobj. Patroni is available either as a pip package or via the official RPM/DEB PostgreSQL repositories.

When Patroni runs on the top of the primary node it stores a token into the DCS. The token has a limited TTL measurable in seconds. If the primary node becomes unavailable then the token expires and patroni on the followers via the DCS initiates the election of a new primary.

When the old primary comes back on line then it discovers that the token is held by another node and patroni transforms the old primary into the follower of the new primary automatically.

The database connections do not happen directly to the database nodes but are routed via a connection proxy like HAProxy or pgbouncer. This proxy, by querying the patroni rest api, determines the active node.

It’s then clear that by using Patroni the risk of having a split brain scenario is very limited.

However by using patroni a DBA needs to surrender completely the manual database administration to patroni because all the dynamic settings are stored into the DCS in order to have complete consistency on the participating nodes.

In this blog post we’ll see how to build a Patroni cluster on the top of CentOS 7 by using etcd in clustering and HAProxy active on each database node capable of routing the database connection automatically to the primary, whatever node we decide to connect to.

Software installation

In order to set up PostgreSQL and Patroni we need to add the official pgdg (PostgreSQL Global Development Group) yum repository to CentOS.

The PostgreSQL website has an easy to use wizard to grab the commands depending on the distribution in use at the url https://www.postgresql.org/download/linux/redhat/ .

For CentOS 7 adding the yum repository is simple as that.

c3VkbyB5dW0gaW5zdGFsbCAteSBodHRwczovL2Rvd25sb2FkLnBvc3RncmVzcWwub3JnL3B1Yi9yZXBvcy95dW0vcmVwb3JwbXMvRUwtNy14ODZfNjQvcGdkZy1yZWRoYXQtcmVwby1sYXRlc3Qubm9hcmNoLnJwbQ==

We can now install the PostgreSQL 13 binaries by running the following command.

c3VkbyB5dW0gaW5zdGFsbCAteSBwb3N0Z3Jlc3FsMTMtc2VydmVyIHBvc3RncmVzcWwxMy1zZXJ2ZXIgcG9zdGdyZXNxbDEzIHBvc3RncmVzcWwxMy1jb250cmli

In order to use the packaged version of patroni we need to install the epel-release provides additional packages required by patroni

c3VkbyB5dW0gaW5zdGFsbCAteSBlcGVsLXJlbGVhc2U=

Finally we can install the following packages

python36
patroni
patroni-etcd
etcd
HAProxy

OS setup

The CentOS base installation needs to be configured to prevent firewalld and selinux that need to be adjusted before configuring our Patroni cluster.

For firewalld we need to open the ports required by etcd, PostgreSQL, HAProxy and the Patroni REST api.

For selinux we need to enable the flag enabling HAProxy binding to the ip addresses.

Firewalld

The ports required for operating patroni/etcd/haproxy/postgresql are the following.

5432 PostgreSQL standard port, not used by PostgreSQL itself but by HAProxy
6432 PostgreSQL listening port used by HAproxy to route the database connections
2380 etcd peer urls port required by the etcd members communication
2376 etcd client port required by any client including patroni to communicate with etcd
8008 patroni rest api port required by HAProxy to check the nodes status
7000 HAProxy port to expose the proxy’s statistics

Enabling the ports is very simple and can be automated via script or ansible using the firewalld module.

By using bash it is possible to enable the ports with a simple for loop.

Zm9yIHNlcnZpY2VfcG9ydCBpbiAoNTQzMiA2NDMyIDIzODAgMjM3NiA4MDA4IDcwMDApIApkbwpzdWRvIGZpcmV3YWxsLWNtZCAtLXBlcm1hbmVudCAtLXpvbmU9cHVibGljIC0tYWRkLXBvcnQ9JHtzZXJ2aWNlX3BvcnR9L3RjcApkb25lCnN1ZG8gc3lzdGVtY3RsIHJlbG9hZCBmaXJld2FsbGQ=

selinux

selinux by default prevents the new services to bind to all the ip addresses.

In order to allow HAProxy to bind the ports required for its functionality we need to run this command.

c3VkbyBzZXRzZWJvb2wgLVAgaGFwcm94eV9jb25uZWN0X2FueT0x

Network

For our example we are using three virtualbox virtual machines. Each machine has two network interfaces. The first interface is bridged on the host network adapter and is used for internet access. The second interface is connected to the host only network provided by virtualbox and used for inter communication between the machines.

The second interface network is configured with a static ip address and each node has the hosts file configured with the other hosts ip names resolved to machine names.

MTI3LjAuMC4xICAgIGxvY2FsaG9zdCBsb2NhbGhvc3QubG9jYWxkb21haW4gbG9jYWxob3N0NCBsb2NhbGhvc3Q0LmxvY2FsZG9tYWluNAo6OjEgICAgIAlsb2NhbGhvc3QgbG9jYWxob3N0LmxvY2FsZG9tYWluIGxvY2FsaG9zdDYgbG9jYWxob3N0Ni5sb2NhbGRvbWFpbjYKCjE5Mi4xNjguNTYuNDAgICAgcGF0cm9uaTAxCjE5Mi4xNjguNTYuNDEgICAgcGF0cm9uaTAyCjE5Mi4xNjguNTYuNDIgICAgcGF0cm9uaTAz

Etcd three node cluster

We are now ready to configure etcd to work as a cluster of three nodes.

For doing so we need to edit the file in /etc/etcd/etcd.conf and modify the following variables.

RVRDRF9MSVNURU5fUEVFUl9VUkxTPSJodHRwOi8vMTkyLjE2OC41Ni40MDoyMzgwIgpFVENEX0xJU1RFTl9DTElFTlRfVVJMUz0iaHR0cDovLzE5Mi4xNjguNTYuNDA6MjM3OSIKRVRDRF9OQU1FPSJwYXRyb25pMDEiCkVUQ0RfSU5JVElBTF9BRFZFUlRJU0VfUEVFUl9VUkxTPSJodHRwOi8vMTkyLjE2OC41Ni40MDoyMzgwIgpFVENEX0FEVkVSVElTRV9DTElFTlRfVVJMUz0iaHR0cDovLzE5Mi4xNjguNTYuNDA6MjM3OSIKRVRDRF9JTklUSUFMX0NMVVNURVI9InBhdHJvbmkwMT1odHRwOi8vMTkyLjE2OC41Ni40MDoyMzgwLHBhdHJvbmkwMj1odHRwOi8vMTkyLjE2OC41Ni40MToyMzgwLHBhdHJvbmkwMz1odHRwOi8vMTkyLjE2OC41Ni40MjoyMzgwIg==

All the variables except ETCD_INITIAL_CLUSTER are machine specific and must be set accordingly with the machine name and ip address.

The variable ETCD_INITIAL_CLUSTER is a comma separated values list of the hosts participating in the etcd cluster.

After configuring etcd on each node we can enable and start the service.

c3VkbyBzeXN0ZW1jdGwgZW5hYmxlIGV0Y2QKc3VkbyBzeXN0ZW1jdGwgc3RhcnQgZXRjZA==

We can check the cluster’s health status with the following command.

ZXRjZGN0bCAtLWVuZHBvaW50cyBodHRwOi8vcGF0cm9uaTAxOjIzNzkgY2x1c3Rlci1oZWFsdGgKbWVtYmVyIDc1ZTk2Yzg5MjZiYzYzODIgaXMgaGVhbHRoeTogZ290IGhlYWx0aHkgcmVzdWx0IGZyb20gaHR0cDovLzE5Mi4xNjguNTYuNDA6MjM3OQptZW1iZXIgN2MxZGZjNWUxM2E4MDA4YSBpcyBoZWFsdGh5OiBnb3QgaGVhbHRoeSByZXN1bHQgZnJvbSBodHRwOi8vMTkyLjE2OC41Ni40MjoyMzc5Cm1lbWJlciBjOTIwNTIyYmE5YTc1ZTE3IGlzIGhlYWx0aHk6IGdvdCBoZWFsdGh5IHJlc3VsdCBmcm9tIGh0dHA6Ly8xOTIuMTY4LjU2LjQxOjIzNzkKY2x1c3RlciBpcyBoZWFsdGh5

Wrap up

We’ve seen how to configure a three node etcd cluster on the patroni nodes. This example configuration is made of three members.

In the next posts we’ll see how to configure and initialise a patroni cluster using the DCS (etcd) running on the database nodes and how to setup HAProxy for routing efficiently the database connections to the active leader.

Subscribe to newsletter

Subscribe to receive the latest blog posts to your inbox every week.

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Ready to Transform  
Your Business?

Let’s Talk

Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy - Part 1

Patroni at a glance

Software installation

OS setup

Firewalld

selinux

Network

Etcd three node cluster

Wrap up

Related Articles

Read More

CloudNative-PG: running PostgreSQL in K8s

Using PGBouncer with Consul for Postgresql high availability

Kafka Installation and Security with Ansible - Topics, SASL and ACLs

Ready to Transform  
Your Business?

CloudNative-PG: running PostgreSQL in K8s

Using PGBouncer with Consul for Postgresql high availability

Kafka Installation and Security with Ansible - Topics, SASL and ACLs

Ready to Transform Your Business?

Ready to Transform  
Your Business?