K3s – lightweight Kubernetes made ready for production – Part 1

2 Jun, 2021


This is part 1 in a three-part blog series on deploying k3s, a certified Kubernetes distribution from SUSE Rancher, in a secure and highly available fashion. A fully working Ansible project, https://github.com/digitalis-io/k3s-on-prem-production, has been made available to deploy and secure k3s for you.

If you would like to know more about how to implement modern data and cloud technologies, such as Kubernetes, into your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on Kubernetes, cloud, data, and DevOps for any business type. Contact us today for more information or learn more about each of our services here.

Introduction

There are many advantages to running an on-premises Kubernetes cluster: it can increase performance, lower costs, and sometimes cause fewer headaches. It also allows users who are unable to use the public cloud to operate in a “cloud-like” environment. It does this by decoupling dependencies and abstracting infrastructure away from your application stack, giving you the portability and scalability associated with cloud-native applications.

There are obvious downsides to running your Kubernetes cluster on-premises, as it’s up to you to manage a series of complexities such as:

  • Etcd
  • Load Balancers
  • High Availability
  • Networking
  • Persistent Storage
  • Internal Certificate rotation and distribution

On top of this there is the inherent complexity of running such a large orchestration application, which means running:

  • kube-apiserver
  • kube-proxy
  • kube-scheduler
  • kube-controller-manager
  • kubelet

You must also ensure that all of these components are correctly configured and talk to each other securely (TLS) and reliably.

But is there a simpler solution to this?

Introducing K3s

K3s is a fully CNCF (Cloud Native Computing Foundation) certified, compliant Kubernetes distribution by SUSE (formerly Rancher Labs) that is easy to use and focused on being lightweight.

To achieve this, it is designed as a single binary of about 45MB that completely implements the Kubernetes APIs. To keep it lightweight, a number of extra drivers that are not strictly part of the core were removed, but they remain easily replaceable with external add-ons.

So why choose K3s instead of full K8s?

Being a single binary, it’s easy to install and bring up, and it internally manages a lot of the pain points of K8s, such as:

  • Internally managed Etcd cluster
  • Internally managed TLS communications
  • Internally managed certificate rotation and distribution
  • Integrated storage provider (localpath-provisioner)
  • Low dependency on base operating system

So K3s doesn’t even need much on the base host: just a recent kernel and `cgroups`.
All of the other required utilities (containerd, Flannel, CoreDNS and so on) are packaged inside the binary itself.

This leads to really low system requirements: a worker node needs just 512MB of RAM.

How K3s Works

Image Source: https://k3s.io/

K3s is a fully encapsulated binary that runs all the components in the same process. One of the key differences from full Kubernetes is that, thanks to Kine, it supports not only etcd to hold the cluster state, but also SQLite (for single-node, simpler setups) or external databases like MySQL and PostgreSQL (have a look at this blog or this blog on deploying PostgreSQL for HA and service discovery).
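For example, switching from the embedded datastore to an external database is just a matter of the --datastore-endpoint flag. A quick sketch (credentials and hostname are placeholders, not values from the project):

# Embedded etcd HA (what we will use in this series)
K3S_TOKEN=SECRET k3s server --cluster-init

# External PostgreSQL through Kine (placeholder credentials and host)
K3S_TOKEN=SECRET k3s server \
  --datastore-endpoint="postgres://k3s:password@db.internal.example:5432/kine"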

The following setup will be performed on pretty small nodes:

  • 6 Nodes
  • 3 Master nodes
  • 3 Worker nodes
  • 2 Cores per node
  • 2 GB RAM per node
  • 50 GB Disk per node
  • CentOS 8.3

What do we need to create a production-ready cluster?

We need a Highly Available, resilient, load-balanced and secure cluster to work with. So without further ado, let’s get started with the base underneath: the nodes. This three-part blog series is a detailed walkthrough on how to set up the k3s Kubernetes cluster, with some snippets taken from the project’s GitHub repo: https://github.com/digitalis-io/k3s-on-prem-production

Secure the nodes

Network

First things first, we need to lay out a sensible network layout for the nodes in the cluster. This will be split in two: an EXTERNAL and an INTERNAL network.

  • The INTERNAL network is only accessible from within the cluster, and the Flannel network (using VXLANs) is built on top of it.
  • The EXTERNAL network is exclusively for serving traffic to the outside; it only exposes ports 80, 443 and 6443 for the K8s APIs (and even this last one could be skipped).

This ensures that communication between internal cluster components is segregated from the rest of the network.

Firewalld

Another crucial piece of setup is firewalld. The first thing to do is ensure that firewalld uses the iptables backend, and not the nftables one, as the latter is still incompatible with Kubernetes. This is done in the Ansible project like this:

    - name: Set firewalld backend to iptables
      replace:
        path: /etc/firewalld/firewalld.conf
        regexp: FirewallBackend=nftables$
        replace: FirewallBackend=iptables
        backup: yes
      register: firewalld_backend

This will require a reboot of the machine.
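In the playbook this can be handled with a conditional reboot keyed on the registered variable; a minimal sketch (the real project may sequence this differently):

- name: Reboot to apply the firewalld backend change
  reboot:
    reboot_timeout: 300
  when: firewalld_backend is changed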

We will also need to set up zoning for the internal and external interfaces, and open the respective ports and services.

Internal Zone

For the internal network we want to open all the necessary ports for Kubernetes to function (a sketch of the corresponding Ansible task follows the list):

  • 2379/tcp # etcd client requests
  • 2380/tcp # etcd peer communication
  • 6443/tcp # K8s api
  • 7946/udp # MetalLB speaker port
  • 7946/tcp # MetalLB speaker port
  • 8472/udp # Flannel VXLAN overlay networking
  • 9099/tcp # Flannel livenessProbe/readinessProbe
  • 10250-10255/tcp # kubelet APIs + Ingress controller livenessProbe/readinessProbe
  • 30000-32767/tcp # NodePort port range
  • 30000-32767/udp # NodePort port range
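
Opening these in the internal zone is a simple loop over the firewalld module; a minimal sketch, assuming the zone is named internal:

- name: Open Kubernetes ports in the internal zone
  firewalld:
    zone: internal
    port: "{{ item }}"
    permanent: yes
    immediate: yes
    state: enabled
  loop:
    - 2379/tcp
    - 2380/tcp
    - 6443/tcp
    - 7946/tcp
    - 7946/udp
    - 8472/udp
    - 9099/tcp
    - 10250-10255/tcp
    - 30000-32767/tcp
    - 30000-32767/udp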

We also want rich rules to ensure that the Pod network is whitelisted; this should be the final result:

internal (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: cockpit dhcpv6-client mdns samba-client ssh
  ports: 2379/tcp 2380/tcp 6443/tcp 80/tcp 443/tcp 7946/udp 7946/tcp 8472/udp 9099/tcp 10250-10255/tcp 30000-32767/tcp 30000-32767/udp
  protocols: 
  masquerade: yes
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
        rule family="ipv4" source address="10.43.0.0/16" accept
        rule family="ipv4" source address="10.44.0.0/16" accept
        rule protocol value="vrrp" accept

External Zone

For the external network we only want ports 80 and 443 open and, only if needed, 6443 for the K8s APIs.

The final result should look like this:

public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth1
  sources: 
  services: dhcpv6-client
  ports: 80/tcp 443/tcp 6443/tcp
  protocols: 
  masquerade: yes
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
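
The public zone can be configured in the same way; a hedged sketch that binds the external interface and opens only the exposed ports (interface and zone names taken from the output above):

- name: Bind the external interface to the public zone
  firewalld:
    zone: public
    interface: eth1
    permanent: yes
    immediate: yes
    state: enabled

- name: Open exposed ports in the public zone
  firewalld:
    zone: public
    port: "{{ item }}"
    permanent: yes
    immediate: yes
    state: enabled
  loop:
    - 80/tcp
    - 443/tcp
    - 6443/tcp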

SELinux

Another important point is that SELinux should be embraced and not deactivated! The smart folks at SUSE Rancher provide the policies needed to make K3s work with SELinux in enforcing mode. Just install them!

# Workaround for the RPM/YUM hardening:
# since the GPG key check is enforced at the rpm level, we cannot use
# the dnf or yum module of Ansible
- name: Install SELINUX Policies  # noqa command-instead-of-module
  command: |
    rpm --define '_pkgverify_level digest' -i {{ k3s_selinux_rpm }}
  register: rpm_install
  changed_when: "rpm_install.rc == 0"
  failed_when: "'already installed' not in rpm_install.stderr and rpm_install.rc != 0"
  when:
    - "'libselinux' in ansible_facts.packages"

This assumes that SELinux is installed (RedHat/CentOS base). If it’s not present, the playbook will skip all SELinux configuration and references.

Node Hardening

To be intrinsically secure, a network environment must be properly designed and configured. This is where the Center for Internet Security (CIS) benchmarks come in. CIS benchmarks are a set of configuration standards and best practices designed to help organizations ‘harden’ the security of their digital assets. They map directly to many major standards and regulatory frameworks, including NIST CSF, ISO 27000, PCI DSS, HIPAA, and more, and they are further enhanced here by adopting the Security Technical Implementation Guide (STIG).

All CIS benchmarks are freely available as PDF downloads from the CIS website.

Included in the project repo there is an Ansible hardening role which applies the CIS benchmark to the base OS of the node. Alternatively, there are ready-to-use roles that are recommended to run against your nodes, such as:

  • https://github.com/ansible-lockdown/RHEL8-STIG/
  • https://github.com/ansible-lockdown/RHEL8-CIS/
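
A hedged sketch of how one of these roles could be pulled in and applied (the role name after download is an assumption; check each project’s README for its variables and supported tags):

# requirements.yml
- src: https://github.com/ansible-lockdown/RHEL8-CIS.git
  scm: git
  name: rhel8-cis

# install with: ansible-galaxy install -r requirements.yml -p roles/

# hardening playbook
- hosts: all
  become: yes
  roles:
    - rhel8-cis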

Having a correctly configured and secure operating system underneath Kubernetes is surely the first step to a more secure cluster.

Installing K3s

We’re going to set up an HA installation using the embedded etcd included in K3s.

Bootstrapping the Masters

Getting started is dead simple. We first run the K3s server command on the first node like this:

K3S_TOKEN=SECRET k3s server --cluster-init

Then, on the other masters, we join the cluster:

K3S_TOKEN=SECRET k3s server --server https://<ip or hostname of server1>:6443

How does this translate to Ansible?

We simply set up the service on the first master, and subsequently on the others:

    - name: Prepare cluster - master 0 service
      template:
        src: k3s-bootstrap-first.service.j2
        dest: /etc/systemd/system/k3s-bootstrap.service
        mode: 0400
        owner: root
        group: root
      when: ansible_hostname == groups['kube_master'][0]

    - name: Prepare cluster - other masters service
      template:
        src: k3s-bootstrap-followers.service.j2
        dest: /etc/systemd/system/k3s-bootstrap.service
        mode: 0400
        owner: root
        group: root
      when: ansible_hostname != groups['kube_master'][0]
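
The unit file templates themselves live in the repo; a minimal sketch of what k3s-bootstrap-first.service.j2 might contain (the binary path and variable name here are illustrative, not the exact template):

[Unit]
Description=K3s bootstrap - first master
After=network-online.target

[Service]
Type=notify
Environment=K3S_TOKEN={{ k3s_token }}
ExecStart=/usr/local/bin/k3s server --cluster-init
Restart=on-failure

[Install]
WantedBy=multi-user.target

The followers template would differ only in the ExecStart line, using --server https://<first master>:6443 instead of --cluster-init.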

    - name: Start K3s service bootstrap /1
      systemd:
        name: k3s-bootstrap
        daemon_reload: yes
        enabled: no
        state: started
      delay: 3
      register: result
      retries: 3
      until: result is not failed
      when: ansible_hostname == groups['kube_master'][0]

    - name: Wait for service to start
      pause:
        seconds: 5
      run_once: yes

    - name: Start K3s service bootstrap /2
      systemd:
        name: k3s-bootstrap
        daemon_reload: yes
        enabled: no
        state: started
      delay: 3
      register: result
      retries: 3
      until: result is not failed
      when: ansible_hostname != groups['kube_master'][0]

After that we will be presented with a working 3-node cluster; here is the expected output:

NAME       STATUS   ROLES                       AGE     VERSION
master01   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
master02   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
master03   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
Now it’s time to stop the bootstrap service and deploy the normal K3s server service to use from now on.
- name: Stop K3s service bootstrap
  systemd:
    name: k3s-bootstrap
    daemon_reload: no
    enabled: no
    state: stopped

- name: Remove K3s service bootstrap
  file:
    path: /etc/systemd/system/k3s-bootstrap.service
    state: absent

- name: Deploy K3s master service
  template:
    src: k3s-server.service.j2
    dest: /etc/systemd/system/k3s-server.service
    mode: 0400
    owner: root
    group: root

- name: Enable and check K3s service
  systemd:
    name: k3s-server
    daemon_reload: yes
    enabled: yes
    state: started

High Availability Masters

Another requirement is to have the masters in HA, so that the APIs are always reachable. To do this we will use keepalived, setting up a VIP (Virtual IP) inside the internal network.

We will need to set up a firewalld rich rule in the internal zone to allow VRRP traffic, which is the protocol used by keepalived to communicate with the other nodes and elect the VIP holder.

- name: Install keepalived
  package:
    name: keepalived
    state: present
- name: Add firewalld rich rules /vrrp
  firewalld:
    rich_rule: rule protocol value="vrrp" accept
    permanent: yes
    immediate: yes
    state: enabled

The complete task is available in roles/k3s-deploy/tasks/cluster_keepalived.yml

An important configuration detail is to set up all nodes as BACKUP and with the same priority (default: 50), so that any node can be elected, and if the original VIP holder comes back it will not steal the VIP from the current holder:
vrrp_instance VI_1 {
        state BACKUP
        interface {{ keepalived_interface }}
        virtual_router_id {{ keepalived_routerid | default('50') }}
        priority {{ keepalived_priority | default('50') }}
...
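
The rest of the template (elided above) defines, among other things, the VIP itself; a hedged sketch assuming a keepalived_vip variable (the variable name is illustrative):

        virtual_ipaddress {
            {{ keepalived_vip }}
        }
}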

Joining the workers

Now it’s time for the workers to join! It’s as simple as launching the following command, following the task in roles/k3s-deploy/tasks/cluster_agent.yml:

K3S_TOKEN=SECRET k3s agent --server https://<Keepalived VIP>:6443
To achieve this we will deploy a systemd service, start it, and simply wait for all nodes to join:
- name: Deploy K3s worker service
  template:
    src: k3s-agent.service.j2
    dest: /etc/systemd/system/k3s-agent.service
    mode: 0400
    owner: root
    group: root

- name: Enable and check K3s service
  systemd:
    name: k3s-agent
    daemon_reload: yes
    enabled: yes
    state: restarted
Expected result:
NAME       STATUS   ROLES                       AGE     VERSION
master01   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
master02   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
master03   Ready    control-plane,etcd,master   2d16h   v1.20.5+k3s1
worker01   Ready    <none>                      2d16h   v1.20.5+k3s1
worker02   Ready    <none>                      2d16h   v1.20.5+k3s1
worker03   Ready    <none>                      2d16h   v1.20.5+k3s1
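
In the playbook, “waiting for all nodes to join” can be a simple retry loop around kubectl on the first master; a hedged sketch (the expected node count and exact approach are assumptions, not the project’s task):

- name: Wait for all nodes to be Ready
  command: k3s kubectl get nodes --no-headers
  register: nodes
  retries: 30
  delay: 10
  until: nodes.stdout_lines | select('search', ' Ready ') | list | length == 6
  changed_when: false
  run_once: yes
  delegate_to: "{{ groups['kube_master'][0] }}"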

Base service flags

Basic flags we can add to the service are:
--selinux
To enable SELinux support also in the embedded containerd, and
--disable traefik
--disable servicelb

As we will be using ingress-nginx and MetalLB respectively.

And we set it up so that it uses the internal network:

--advertise-address {{ ansible_host }} \
--bind-address 0.0.0.0 \
--node-ip {{ ansible_host }} \
--cluster-cidr={{ cluster_cidr }} \
--service-cidr={{ service_cidr }} \
--tls-san {{ ansible_host }}
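
Put together, the ExecStart line of the k3s-server.service.j2 template would look something like this (the binary path is an assumption; the actual template is in the repo):

ExecStart=/usr/local/bin/k3s server \
    --selinux \
    --disable traefik \
    --disable servicelb \
    --advertise-address {{ ansible_host }} \
    --bind-address 0.0.0.0 \
    --node-ip {{ ansible_host }} \
    --cluster-cidr={{ cluster_cidr }} \
    --service-cidr={{ service_cidr }} \
    --tls-san {{ ansible_host }}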

Ingress and LoadBalancer

The cluster is up and running; now we need a way to use it! We disabled traefik and servicelb earlier to accommodate ingress-nginx and MetalLB.

MetalLB will be configured in layer2 mode, with two classes of IPs:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
        - {{ metallb_external_ip_range }}
    - name: metallb_internal_ip_range
      protocol: layer2
      addresses:
        - {{ metallb_internal_ip_range }}

So we will have space for two ingress controllers; the deployment files are included in the playbook. The important part is that we will have an internal and an external ingress: the internal ingress exposes services useful for the cluster or monitoring, the external one serves services to the outside world.
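Each ingress controller’s Service is then tied to the right pool via the MetalLB address-pool annotation; a hedged sketch for the internal controller (names and namespace are illustrative, the pool name matches the ConfigMap above):

apiVersion: v1
kind: Service
metadata:
  name: internal-ingress-nginx-controller
  namespace: internal-ingress-nginx
  annotations:
    metallb.universe.tf/address-pool: metallb_internal_ip_range
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443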

We can then simply deploy the ingresses for our services, selecting the appropriate kubernetes.io/ingress.class.

For example, an internal ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dashboard-ingress
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/ingress.class: "internal-ingress-nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  rules:
  - host: dashboard.192.168.122.200.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kubernetes-dashboard
            port:
              number: 443
Or for an external ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-service
  annotations:
    kubernetes.io/ingress.class: "ingress-nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  rules:
  - host: my-service.192.168.1.200.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 443

Checkpoint

Let’s check resource usage across the nodes:

Mem:          total        used        free      shared  buff/cache   available    CPU%
master01      1.8Gi       944Mi       112Mi        20Mi       762Mi       852Mi   3.52%
master02      1.8Gi       963Mi       106Mi        20Mi       748Mi       828Mi   3.45%
master03      1.8Gi       936Mi       119Mi        20Mi       763Mi       880Mi   3.68%
worker01      1.8Gi       821Mi       119Mi        11Mi       877Mi       874Mi   1.78%
worker02      1.8Gi       832Mi       108Mi        11Mi       867Mi       884Mi   1.45%
worker03      1.8Gi       821Mi       119Mi        11Mi       857Mi       894Mi   1.67%

Good! We now have a basic HA K3s cluster on our machines, and look at that resource usage! With just around 1GB of RAM used per node, we have a working Kubernetes cluster.

But is it ready for production?

Not yet. We now need to secure the cluster and its services before continuing!

In the next blog we will analyse how this cluster is still vulnerable to some types of attack and what best practices and remediations we will adopt to prevent this.

Remember – all of the Ansible playbooks for deploying everything are available for you to check out on GitHub: https://github.com/digitalis-io/k3s-on-prem-production
