In the first part of this blog we configured an etcd cluster on top of three CentOS 7 servers. We had to tweak the operating system configuration in order to have everything running smoothly. In this post we’ll see how to configure Patroni using the running etcd cluster as a distributed configuration store (DCS) and HAProxy to route connections to the active leader.
The patroni configuration is a yaml file divided into sections, with each section controlling a specific part of patroni's behaviour. We'll save the patroni configuration file as /etc/patroni/patroni.yml.
Let’s have a look at the configuration file in detail.
Namespace, Scope and name
The scope, namespace and name keys in the yml file control the node's cluster membership: the namespace is the path within the DCS where the cluster is created, and name is the identifier for the node.
This comes in quite handy if we have a dedicated DCS cluster with multiple patroni clusters configured. We can either define a namespace for each cluster or store multiple clusters within the same namespace.
Scope and namespace are the same across the three nodes; the name value must be unique within the cluster.
In our example we'll use the following settings:
# patroni01
scope: region_one
namespace: /patroni_test/
name: patroni01

# patroni02
scope: region_one
namespace: /patroni_test/
name: patroni02

# patroni03
scope: region_one
namespace: /patroni_test/
name: patroni03
The restapi dictionary defines the configuration for the REST API used by patroni. In particular, the key listen defines the address and port where the REST API service listens, while the key connect_address defines the address and port used by patroni to query the REST API.
The restapi can be secured by defining the path to the certificate file and key using the certfile and keyfile configuration options. It’s also possible to configure authentication for the restapi using the authentication configuration option within restapi config.
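As a sketch of what a hardened restapi section could look like, assuming the certificate paths and credentials below are placeholders rather than values from our setup:

```yaml
restapi:
  listen: 192.168.56.40:8008
  connect_address: 192.168.56.40:8008
  # Placeholder paths: serve the REST API over TLS
  certfile: /etc/patroni/restapi.crt
  keyfile: /etc/patroni/restapi.key
  # Placeholder credentials: require authentication on the API
  authentication:
    username: patroni
    password: changeme
```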
In a production setting it is recommended to enable the above security options. In our example, however, the restapi is configured in a simple fashion, with no security enabled, as below.
# patroni01
restapi:
  listen: 192.168.56.40:8008
  connect_address: 192.168.56.40:8008

# patroni02
restapi:
  listen: 192.168.56.41:8008
  connect_address: 192.168.56.41:8008

# patroni03
restapi:
  listen: 192.168.56.42:8008
  connect_address: 192.168.56.42:8008
Obviously, the IP address is machine specific.
The etcd: configuration value is used to define the connection to the DCS if etcd is used. In our example we store all the participating hosts in the key hosts as a comma separated string.
The configuration in our example is the same on all of the patroni nodes and is the following
etcd:
  hosts: 192.168.56.40:2379,192.168.56.41:2379,192.168.56.42:2379
The bootstrap section is used during the bootstrap of the patroni cluster.
The contents of the dcs configuration are written into the DCS at the path /<namespace>/<scope>/config after the patroni cluster is initialized.
The data stored in the DCS is then used as the global configuration for all the members in the cluster and should be managed only by interacting via patronictl or REST api call.
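For instance, the dynamic configuration can be inspected and modified with patronictl, here using our example configuration file path:

```shell
# Print the dynamic configuration currently stored in the DCS
patronictl -c /etc/patroni/patroni.yml show-config

# Edit it interactively; changes are written back to the DCS
patronictl -c /etc/patroni/patroni.yml edit-config
```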
However, some parameters like ttl, loop_wait etc. are dynamic and read from the DCS globally. Other parameters, like postgresql.listen and postgresql.data_dir, are local to the node and must be set in the configuration file instead.
In our example we set up the bootstrap section in this way:
bootstrap:
  dcs:
    ttl: 10
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
  users:
The dcs section defines the behaviour of the check against the DCS to manage the primary status and the eventual new leader election.
- ttl: defines the lifetime in seconds for the token held by the primary
- loop_wait: seconds the loop check for the token will sleep
- retry_timeout: timeout in seconds for DCS and PostgreSQL operation retries. Failures resolved within this time will not cause the leader to demote
- maximum_lag_on_failover: if the lag in bytes between the primary and the follower is larger than this value then the follower won’t participate in the new leader election
- postgresql: dictionary where it's possible to define specific options, such as the use of pg_rewind or replication slots, and the cluster parameters.
We are also configuring the postgresql dictionary to initialize the cluster with certain parameters. The initdb list defines options to pass to initdb during the bootstrap process (e.g. the cluster encoding or the use of data checksums).
The pg_hba list defines the entries in pg_hba.conf set after the cluster is initialized.
The users key defines additional users to create after initializing the new cluster. In our example it is empty.
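If we did want patroni to create an application user at bootstrap, an entry could look like the following sketch (the user name and password here are hypothetical, not part of our setup):

```yaml
users:
  # Hypothetical user created right after initdb
  admin:
    password: admin_secret
    options:
      - createrole
      - createdb
```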
The postgresql section defines the node specific settings. Our configuration is the following.
postgresql:
  listen: "*:6432"
  connect_address: patroni01:6432
  data_dir: /var/lib/pgsql/data/postgresql0
  bin_dir: /usr/pgsql-13/bin/
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: rewind_user
      password: rewind
  parameters:
In particular the key listen is used by patroni to set the postgresql.conf parameters listen_addresses and port.
The key connect_address defines the address and the port through which Postgres is accessible from other nodes and applications.
The key data_dir is used to tell patroni the path of the cluster’s data directory.
The key bin_dir is used to tell patroni where the PostgreSQL binaries are located.
The key pgpass specifies the filename of the password authentication file used by patroni to connect to the running PostgreSQL database.
The authentication dictionary is used to define the connection parameters for the replication user, the super user and the rewind user if we are using pg_rewind to remaster an old primary.
In order to have patroni started automatically we need to set up a systemd unit file in /etc/systemd/system. We name our file patroni.service with the following contents.
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
WorkingDirectory=/var/lib/pgsql

# Start the patroni process
ExecStart=/bin/patroni /etc/patroni/patroni.yml

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID

# Only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30

# Do not restart the service if it crashes; we want to manually inspect the database on failure
Restart=no

[Install]
WantedBy=multi-user.target
After the service file creation we need to make systemd aware of the new service.
Then we can enable the service and start it.
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
As soon as we start the patroni service we should see PostgreSQL bootstrap on the first node.
We can monitor the process via patronictl with the following command:
patronictl -c /etc/patroni/patroni.yml list
The output is a table listing each cluster member along with its host, role, state, timeline and replication lag.
We can then start the patroni service on the other two nodes to make the followers join the cluster. By default patroni will build the new replicas by using pg_basebackup.
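This default replica creation method can be made explicit, or replaced with another tool, through the create_replica_methods list in the postgresql section of patroni.yml. A minimal sketch of the default behaviour:

```yaml
postgresql:
  # basebackup (pg_basebackup) is the built-in default;
  # other methods would require additional tooling and configuration
  create_replica_methods:
    - basebackup
```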
When all the nodes are up and running, the patronictl output will list all three members, with one leader and two running replicas.
HAProxy connection router
In order to have connections routed to the active primary we need to configure the HAProxy service properly.
First we need HAProxy to listen for connections on the standard PostgreSQL port 5432. Then HAProxy should check the patroni REST API to determine which node is the primary.
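We can verify this check by hand: patroni's REST API answers GET / with HTTP 200 on the node currently running as primary and 503 on the replicas, which is exactly what HAProxy's health check relies on. For example, against our nodes:

```shell
# Returns 200 on the current primary, 503 on a replica
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.56.40:8008/
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.56.41:8008/
```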
This is done with the following configuration.
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen region_one
    bind *:5432
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server patroni01 192.168.56.40:6432 maxconn 80 check port 8008
    server patroni02 192.168.56.41:6432 maxconn 80 check port 8008
    server patroni03 192.168.56.42:6432 maxconn 80 check port 8008
This example configuration enables the HAProxy statistics on port 7000. The region_one section is named after the patroni scope for consistency and listens on port 5432. Each patroni server is listed as a server to be checked on port 8008, the REST API port, to determine whether the node is up.
After configuring and starting HAProxy on each node we will be able to connect to any of the nodes and always end up on the primary. In case of failover the connection will drop and, at the next connection attempt, we'll connect to the new primary.
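A quick way to confirm the routing is to connect through HAProxy on any node and ask PostgreSQL whether it is in recovery; the answer should always be false, since HAProxy only routes to the primary:

```shell
# Connect via HAProxy on port 5432 (any node's address works);
# uses the postgres superuser configured in the authentication section
psql -h 192.168.56.40 -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
```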
This simple example shows how to set up a three node Patroni cluster with no single point of failure (SPOF). To achieve this we have etcd configured as a cluster with a member installed on each database node. In a similar fashion we have HAProxy installed and running on each database node.
However, for production it is recommended to set up etcd on dedicated hosts and to configure SSL for etcd and the Patroni REST APIs if the network is not trusted, or simply to avoid accidents.
Additionally, for HAProxy in production it is strongly suggested to use a load balancer capable of checking whether the HAProxy service is available before attempting a connection.
Having an up and running Patroni cluster requires a lot of configuration. Therefore it is strongly recommended to use a configuration management tool such as Ansible to deploy and configure your cluster.
If you would like to know more about how to implement modern data and cloud technologies, such as PostgreSQL, into your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on cloud, data, and DevOps for any business type. Contact us for more information.