Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy – Part 2

6 Apr, 2021


In the first part of this blog we configured an etcd cluster on top of three CentOS 7 servers. We had to tweak the operating system configuration in order to have everything running smoothly. In this post we’ll see how to configure Patroni using the running etcd cluster as a distributed configuration store (DCS) and HAProxy to route connections to the active leader.

Patroni Configuration

The patroni configuration is a yaml file divided in sections with each section controlling a specific part of the patroni behaviour. We’ll save the patroni configuration file in /etc/patroni/patroni.yml

Let’s have a look at the configuration file in detail.

Namespace, Scope and name

The scope, namespace and name keys in the yml file control the node's cluster membership: the namespace is the path under which the cluster is stored within the DCS, the scope is the cluster name, and name identifies the individual node.

This comes in quite handy if we have a dedicated DCS cluster shared by multiple patroni clusters. We can define either a namespace for each cluster or store multiple clusters within the same namespace.

Scope and namespace are the same across the three nodes, while the name value must be unique within the cluster.

In our example we’ll have the following settings:

# patroni01
scope: region_one
namespace: /patroni_test/
name: patroni01

# patroni02
scope: region_one
namespace: /patroni_test/
name: patroni02

# patroni03
scope: region_one
namespace: /patroni_test/
name: patroni03
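Once the cluster is bootstrapped, the keys patroni stores under that namespace can be inspected directly in etcd. A quick sketch (the `etcd:` section of patroni speaks the v2 protocol, so we assume the v2 API here; the exact key layout may vary between Patroni versions):

```shell
# List everything Patroni stored for the region_one cluster under our
# namespace. The endpoint address is from our example; any etcd member works.
ETCDCTL_API=2 etcdctl --endpoints=http://192.168.56.40:2379 \
  ls --recursive /patroni_test/region_one
```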

restapi

The restapi dictionary defines the configuration for the REST API used by patroni. In particular, the key listen defines the address and the port where the REST API service listens, while the key connect_address defines the address and port used by patroni for querying the REST API.

The restapi can be secured by defining the path to a certificate file and key using the certfile and keyfile configuration options. It’s also possible to configure authentication for the restapi using the authentication dictionary within the restapi config.
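As a sketch, a TLS-and-auth-enabled restapi section could look like the following (the file paths and credentials are placeholders, not part of our example):

```yaml
restapi:
  listen: 192.168.56.40:8008
  connect_address: 192.168.56.40:8008
  certfile: /etc/patroni/restapi.pem   # placeholder path
  keyfile: /etc/patroni/restapi.key    # placeholder path
  authentication:
    username: patroni                  # placeholder credentials
    password: change_me
```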

In a production setting it is recommended to enable the above security options. However, in our example the restapi is configured in a simple fashion, with no security enabled, as below.

#patroni 01
restapi:
  listen: 192.168.56.40:8008
  connect_address: 192.168.56.40:8008

#patroni 02
restapi:
  listen: 192.168.56.41:8008
  connect_address: 192.168.56.41:8008

#patroni 03
restapi:
  listen: 192.168.56.42:8008
  connect_address: 192.168.56.42:8008

Obviously, the IP address is machine specific.

etcd

The etcd configuration section defines the connection to the DCS when etcd is used. In our example we store all the participating hosts in the key hosts as a comma separated string.

The configuration in our example is the same on all of the patroni nodes and is the following:

etcd:
  hosts: 192.168.56.40:2379,192.168.56.41:2379,192.168.56.42:2379
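Before pointing patroni at the store it’s worth verifying that all three members respond. A hedged sanity check (using the v2 API that the `etcd:` section speaks; flags differ under the v3 API):

```shell
# Confirm every etcd member listed in the hosts string is healthy.
ETCDCTL_API=2 etcdctl \
  --endpoints=http://192.168.56.40:2379,http://192.168.56.41:2379,http://192.168.56.42:2379 \
  cluster-health
```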

bootstrap

The bootstrap section is used during the bootstrap of the patroni cluster.

The contents of the dcs configuration are written into the DCS at the path /<namespace>/<scope>/config after the patroni cluster is initialized.
The data stored in the DCS is then used as the global configuration for all the members in the cluster and should be managed only by interacting via patronictl or REST API calls.

However some parameters, like ttl, loop_wait etc., are dynamic and read globally from the DCS. Other parameters, like postgresql.listen or postgresql.data_dir, are local to the node and must be set in the configuration file instead.

In our example we are setting up the bootstrap section in this way.

bootstrap:
  dcs:
    ttl: 10
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
  initdb:  
  - encoding: UTF8
  - data-checksums

  pg_hba:  
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5
  users:

The dcs section defines the behaviour of the check against the DCS to manage the primary status and the eventual new leader election.

  • ttl: defines the lifetime in seconds of the leader token held by the primary
  • loop_wait: number of seconds the main loop sleeps between checks of the token
  • retry_timeout: timeout in seconds for DCS and PostgreSQL operation retries. Network problems shorter than this value will not cause the leader to be demoted
  • maximum_lag_on_failover: if the lag in bytes between the primary and a follower is larger than this value, then the follower won’t participate in the new leader election
  • postgresql: dictionary where it’s possible to define specific options, like the usage of pg_rewind or of the replication slots, and the cluster parameters

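Because the dcs values become the cluster-wide dynamic configuration, they should later be changed through patronictl rather than by editing the yaml file on each node:

```shell
# Show the dynamic configuration currently stored in the DCS.
patronictl -c /etc/patroni/patroni.yml show-config

# Open the dynamic configuration in an editor; saved changes are written
# back to the DCS and propagated to all cluster members.
patronictl -c /etc/patroni/patroni.yml edit-config
```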
We are also configuring the postgresql dictionary to initialize the cluster with certain parameters. The initdb list defines options to pass to initdb during the bootstrap process (e.g. the cluster encoding or the use of data checksums).

The pg_hba list defines the entries in pg_hba.conf set after the cluster is initialized.

The users key defines additional users to create after initializing the new cluster. In our example it is empty.
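If we did want bootstrap-time users, a hypothetical entry could look like this (username, password and options are placeholders):

```yaml
users:
  admin:
    password: change_me
    options:
      - createrole
      - createdb
```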

postgresql

The postgresql section defines the node specific settings. Our configuration is the following.

postgresql:
  listen: "*:6432"
  connect_address: patroni01:6432
  data_dir: /var/lib/pgsql/data/postgresql0
  bin_dir: /usr/pgsql-13/bin/
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: rewind_user
      password: rewind
  parameters:

In particular the key listen is used by patroni to set the postgresql.conf parameters listen_addresses and port.
The key connect_address defines the address and the port through which Postgres is accessible from other nodes and applications.

The key data_dir is used to tell patroni the path of the cluster’s data directory.
The key bin_dir is used to tell patroni where the PostgreSQL binaries are located.
The key pgpass specifies the filename of the password authentication file used by patroni to connect to the running PostgreSQL database.

The authentication dictionary is used to define the connection parameters for the replication user, the superuser, and the rewind user if we are using pg_rewind to remaster an old primary.

Service setup

In order to have patroni started automatically we need to setup a systemd unit file in /etc/systemd/system. We name our file patroni.service with the following contents.

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple

User=postgres
Group=postgres

WorkingDirectory=/var/lib/pgsql

# Start the patroni process
ExecStart=/bin/patroni /etc/patroni/patroni.yml

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID

# only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30

# Do not restart the service if it crashes, we want to manually inspect database on failure
Restart=no

[Install]
WantedBy=multi-user.target

After the service file creation we need to make systemd aware of the new service.

Then we can enable the service and start it.

 sudo systemctl daemon-reload
 sudo systemctl enable patroni
 sudo systemctl start patroni

As soon as we start the patroni service we should see PostgreSQL bootstrap on the first node.
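If the bootstrap doesn’t happen, the patroni logs are the first place to look; with our systemd unit they go to the journal:

```shell
# Check the service state, then follow patroni's log output to watch
# the initdb/bootstrap progress.
sudo systemctl status patroni
sudo journalctl -u patroni -f
```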

We can monitor the process via patronictl with the following command:
patronictl -c /etc/patroni/patroni.yml list

The output is something like this:

[Image: Patroni List 1]

We can then start the patroni service on the other two nodes to make the followers join the cluster. By default patroni will build the new replicas by using pg_basebackup.

When all the nodes are up and running the patronictl command output will change in this way.

[Image: Patroni List 2]
[Image: Patroni Streaming Replication]

HAProxy connection router

In order to have the connection routed to the active primary we need to configure the HAProxy service in a proper way.

First we need HAProxy to listen for connections on the PostgreSQL standard port 5432. Then HAProxy should check the patroni REST API to determine which node is the primary.

This is done with the following configuration.

global
	maxconn 100

defaults
	log global
	mode tcp
	retries 2
	timeout client 30m
	timeout connect 4s
	timeout server 30m
	timeout check 5s

listen stats
	mode http
	bind *:7000
	stats enable
	stats uri /

listen region_one
	bind *:5432
	option httpchk
	http-check expect status 200
	default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
	server patroni01 192.168.56.40:6432 maxconn 80 check port 8008
	server patroni02 192.168.56.41:6432 maxconn 80 check port 8008
	server patroni03 192.168.56.42:6432 maxconn 80 check port 8008

This example configuration enables the HAProxy statistics on port 7000. The region_one section is named after the patroni scope for consistency and listens on port 5432. Each patroni server is listed as a server to be checked on port 8008, the REST API port, to determine whether the node is up.
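The check works because patroni’s REST API answers according to the node’s role: the root endpoint returns HTTP 200 on the current leader and a non-200 code (typically 503) on replicas. We can see this by hand with curl (addresses from our example):

```shell
# Print the HTTP status code returned by each node's REST API root endpoint;
# only the current leader should answer 200.
for host in 192.168.56.40 192.168.56.41 192.168.56.42; do
  echo -n "$host: "
  curl -s -o /dev/null -w '%{http_code}\n' "http://$host:8008/"
done
```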

After configuring and starting HAProxy on each node we will be able to connect to any of the nodes and always end up on the primary. In case of failover the connection will drop, and at the next connection attempt we’ll connect to the new primary.
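We can verify the routing with psql against any node: pg_is_in_recovery() should always report false, because HAProxy forwards us to the primary. A sketch, assuming the postgres password from our bootstrap configuration:

```shell
# Connecting through HAProxy (port 5432) on any node should land on the
# primary, so pg_is_in_recovery() is expected to report "f" (false).
PGPASSWORD=postgres psql -h 192.168.56.41 -p 5432 -U postgres \
  -c 'SELECT pg_is_in_recovery();'
```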

[Image: Patroni Streaming Replication]

Wrap up

This simple example shows how to set up a three node Patroni cluster with no single point of failure (SPOF). To achieve this we have etcd configured as a cluster with a member installed on each database node. In a similar fashion we have HAProxy installed and running on each database node.

However, for production it is recommended to set up etcd on dedicated hosts, and to configure SSL for etcd and the Patroni REST APIs if the network is not trusted, or simply to avoid accidents.

Additionally, for HAProxy in production it is strongly suggested to have a load balancer capable of checking if the HAProxy service is available before attempting a connection.

Having an up and running Patroni cluster requires a lot of configuration. Therefore it is strongly recommended to use a configuration management tool such as Ansible to deploy and configure your cluster.

If you would like to know more about how to implement modern data and cloud technologies, such as PostgreSQL, into your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on cloud, data, and DevOps for any business type. Contact us for more information.
