Immutable virtual machine images, part deux

Some time ago I wrote an introduction to immutable images, and today I would like to share some of the other tools and techniques we use for this. As discussed previously, the role of immutable images is to provide a consistent, secure base image to run your applications. The image should be patched up to high standards (we prefer PCI-DSS) and contain all the software required to run the service, such as Apache Cassandra, Kafka, or our own in-house applications.
What these images are missing is the customer- and environment-specific configuration, which should be applied on boot. And this is exactly where I'd like to start today's blog post.
On-boot configuration
When you deploy the immutable image that you have studiously built with Packer and orchestrated with Ansible, Chef, or plain shell, it will already have a configuration file with nearly everything in place. Let's say you are deploying a PostgreSQL image. It will have:
- PostgreSQL installed and pinned to the version we tested, with the data directory laid out on its own volume ready to mount.
- The operating system hardened to our PCI-DSS standards, with monitoring, logging, and backup agents baked in.
- Configuration files (postgresql.conf, pg_hba.conf) templated with everything we know at build time, and the service enabled but not yet started.
What's missing:
- IP address
- Hostname
- Whether this node is a primary or a replica
- Secrets such as the database passwords and TLS certificates
- Any other environment-specific configuration
Now you need this configuration completed the first time the server starts up. There are several ways to do this, and each has its place. Let me run through them.
cloud-init
cloud-init is the de facto standard for first-boot configuration, and it ships with nearly every cloud image out there. It runs very early in the boot process and reads its instructions, the so-called user-data, from the cloud provider's metadata service. We use it for the core foundations: setting the hostname, writing the network configuration and IP address, opening the firewall, mounting volumes, and dropping in the right SSH keys.
A small example of user-data:

    #cloud-config
    hostname: pg-prod-01
    fqdn: pg-prod-01.example.internal
    write_files:
      - path: /etc/postgresql/conf.d/listen.conf
        content: |
          listen_addresses = '10.0.1.21'
    runcmd:
      - systemctl restart postgresql

One thing to keep in mind is that user-data has a size limit that varies by provider (16 KB on AWS EC2, for example), so do not try to cram everything into it. We use cloud-init to lay the foundations and let the next two methods do the heavy lifting.
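Because the limit is easy to trip over once the write_files entries start to grow, a quick pre-flight check in the deploy pipeline can catch it early. The sketch below is illustrative only: the function name check_user_data_size and the 16384-byte default are assumptions, so substitute the limit your provider documents.

```sh
#!/bin/sh
set -eu

# Hypothetical helper: fail when a user-data file is larger than the limit.
# The 16384-byte default is an assumption; pass your provider's limit instead.
# Usage: check_user_data_size FILE [LIMIT_BYTES]
check_user_data_size() {
  local file limit size
  file="$1"
  limit="${2:-16384}"

  # tr strips the padding some wc implementations put around the count.
  size="$(wc -c < "$file" | tr -d '[:space:]')"

  if [ "$size" -gt "$limit" ]; then
    echo "user-data is ${size} bytes, over the ${limit}-byte limit" >&2
    return 1
  fi
  echo "user-data is ${size} bytes (limit ${limit})"
}
```

Run against the user-data file before launching an instance, a non-zero exit is the signal to move content out of cloud-init and into a script baked into the image.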
systemd one-shot (run once)
For anything that will not fit in cloud-init, or that is easier to express as a proper script, we use a systemd one-shot service. The script itself is pre-installed by Packer at build time, and the unit simply runs it once on first boot. This is where we pull secrets from the secrets manager, template the heavier configuration files, and register the node in service discovery.
The trick is to make it idempotent and guard it so it only ever runs once:

    # /etc/systemd/system/first-boot.service
    [Unit]
    Description=First boot configuration
    After=network-online.target
    Wants=network-online.target
    ConditionPathExists=!/var/lib/first-boot.done

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/local/bin/first-boot.sh
    ExecStartPost=/usr/bin/touch /var/lib/first-boot.done

    [Install]
    WantedBy=multi-user.target

The ConditionPathExists guard means that even if the box reboots a hundred times the script will not run again, which keeps things deterministic and avoids drift.
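One detail worth planning for: systemd only runs ExecStartPost after ExecStart has succeeded, so if the script fails halfway through, the marker file is never written and the whole script runs again on the next boot. That is where idempotency earns its keep. A minimal sketch of one way to get it, using a stamp file per step, follows. The run_step helper, the STATE_DIR location, and the step names are all hypothetical and not taken from our actual script.

```sh
#!/bin/sh
# Sketch of a helper for a hypothetical /usr/local/bin/first-boot.sh
set -eu

# Directory for per-step stamp files (assumed location, override as needed).
STATE_DIR="${STATE_DIR:-/var/lib/first-boot.d}"

# Run a named step at most once. The stamp file is written only after the
# command succeeds, so a retry after a partial failure skips finished steps.
# Usage: run_step NAME COMMAND [ARGS...]
run_step() {
  local name stamp
  name="$1"
  shift
  stamp="${STATE_DIR}/${name}.done"

  if [ -e "$stamp" ]; then
    echo "skip ${name}"
    return 0
  fi

  # Stop here on failure so that no stamp is recorded for this step.
  "$@" || return 1

  mkdir -p "$STATE_DIR"
  touch "$stamp"
  echo "done ${name}"
}
```

The body of first-boot.sh then becomes a list of calls such as run_step fetch-secrets fetch_secrets and run_step register-node register_node, where both function names are placeholders for your own steps.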
ExecStartPre
Finally, application-specific configuration can be fetched and applied from an ExecStartPre directive in the application's own systemd unit. This runs before the application (PostgreSQL, Cassandra, and so on) starts up, and crucially it runs on every start, not just the first one.
That makes it the right place for anything that must be fresh each time the service comes up, for example a short-lived TLS certificate or a dynamic database credential fetched from OpenBao or Vault moments before PostgreSQL starts:

    # /etc/systemd/system/postgresql.service.d/override.conf
    [Service]
    ExecStartPre=/usr/local/bin/fetch-secrets.sh

Because it runs on every start, the configuration stays current and you never end up with a stale secret baked into the image.
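What goes inside fetch-secrets.sh depends on your secrets backend, but two properties are worth having whatever the backend: the secret file should never be left half-written, and a failed fetch should exit non-zero, because systemd will then refuse to start the service instead of starting it with a stale or empty secret. The sketch below shows one way to do that. The install_secret helper is hypothetical, and the fetch command is deliberately left as a parameter rather than guessing at a particular CLI.

```sh
#!/bin/sh
# Sketch of a helper for a hypothetical /usr/local/bin/fetch-secrets.sh
set -eu

# Run a fetch command and install its output at DEST atomically.
# Usage: install_secret DEST COMMAND [ARGS...]
install_secret() {
  local dest tmp
  dest="$1"
  shift

  # mktemp creates the file readable by its owner only; keeping it in the
  # same directory as DEST makes the final mv an atomic rename.
  tmp="$(mktemp "${dest}.XXXXXX")"

  # Accept the result only if the command succeeded and wrote something.
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "could not fetch a value for ${dest}" >&2
    return 1
  fi
}
```

The script would call it once per secret, passing the destination path followed by whatever command prints the secret on standard output.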
Which one to choose?
All of them. Each has its merits, and they map cleanly onto different stages of boot:
- cloud-init runs very early, first boot only. We use it for the IP address, hostname, firewall and routing, basically the core foundations.
- systemd one-shot runs once on first boot. Because cloud-init has a size limit, we pair it with a one-shot to kick off a larger Packer-installed script that handles first-time secrets and service discovery.
- ExecStartPre runs before the application, on every start. We leave the short-lived bits here, the TLS certs and dynamic credentials that must stay fresh.
Used together, these three give you a clean separation: cloud-init lays the foundations, the one-shot does the first-boot heavy lifting, and ExecStartPre keeps the short-lived bits current. The result is a server that goes from a generic golden image to a fully configured, environment-specific node in seconds, with no drift and nobody needing to SSH in.
Where are the configs?
Good question. You have now templated the config, but the values it needs still have to be stored somewhere, and stored securely. Remember, these include passwords, TLS certificates, and plenty of other sensitive information.
Many of our customers on hyperscalers use the provider's secrets manager or parameter store for this. Their boot script pulls the missing values and completes the config. Because we work with a long list of companies on all sorts of hosting providers, we need a method that works anywhere. This is why we use vals, still one of my favourite DevOps tools and very much underrated by the community. We also use it extensively in Kubernetes as part of our vals-operator.
vals can pull configuration from a large number of backend stores. We simply template the configuration using vals references. When we move the same immutable image to another provider, we only need to swap those references for others, and that saves us a huge amount of pain and work.
The boot script renders the file with a single vals eval -f config.yaml, and every reference is replaced with the real value at first boot. For example, a config templated for OpenBao would look like:

    # config.yaml, resolved by vals on first boot
    postgres:
      replication_password: ref+vault://secret/data/pg/prod?address=https://openbao.internal:8200#/replication_password
      ssl_cert: ref+vault://secret/data/pg/prod?address=https://openbao.internal:8200#/server_crt

and the same for AWS Secrets Manager:

    # config.yaml, resolved by vals on first boot
    postgres:
      replication_password: ref+awssecrets://prod/pg/replication?region=eu-west-2#/password
      ssl_cert: ref+awssecrets://prod/pg/tls?region=eu-west-2#/server_crt

Notice that the only things changing between the two are the references themselves. The structure of the file, the boot script, and everything else stays exactly the same, which is the whole point. OpenBao is Vault-compatible too, so the same ref+vault scheme works against either of them.
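For illustration, the render step in the boot script could be wrapped along these lines. This is a sketch under assumptions rather than our actual script: the render_with_vals function and the VALS_BIN override are hypothetical, and the only vals invocation used is the vals eval -f shown above.

```sh
#!/bin/sh
set -eu

# Render TEMPLATE with "vals eval -f" and write the result to DEST.
# VALS_BIN is a hypothetical override for the vals binary, handy for testing.
# Usage: render_with_vals TEMPLATE DEST
render_with_vals() {
  local template dest tmp
  template="$1"
  dest="$2"

  # Render into a private temp file next to DEST, then rename into place.
  tmp="$(mktemp "${dest}.XXXXXX")"

  if ! "${VALS_BIN:-vals}" eval -f "$template" > "$tmp"; then
    rm -f "$tmp"
    echo "vals could not render ${template}" >&2
    return 1
  fi

  # Defensive check: refuse output that still contains a vals reference.
  if grep -q 'ref+' "$tmp"; then
    rm -f "$tmp"
    echo "unresolved reference left in ${template}" >&2
    return 1
  fi

  mv -f "$tmp" "$dest"
}
```

The grep is a belt-and-braces check and would also reject a legitimate value that happens to contain ref+, so drop it if that is a possibility in your data. Nothing in the function is tied to a backend, so it works unchanged with either of the config.yaml variants above.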
Closing thoughts
The golden image gets you most of the way there, but it is this on-boot configuration that turns a generic image into a working server. Keep each step fast, idempotent, and free of drift, and you get the best of both worlds: the repeatability of immutable infrastructure with the flexibility to drop the same image into any environment.
Next time, I would like to cover health checks and how we decide whether a freshly booted VM is fit to take traffic or should simply be replaced.
I, for one, welcome our new robot overlords.
If you'd like a hand building secure, repeatable infrastructure, the team at Digitalis.io designs and runs immutable, PCI-DSS-grade platforms for data services like Apache Cassandra, Kafka, and PostgreSQL.


