Kubernetes Capacity Planning with Vertical Pod Autoscaler

Introduction
When you deploy a new Pod into your Kubernetes cluster it is good practice to restrict how much memory and CPU it is able to use. This is set up in the resources section:
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
One of the most time-consuming tasks we perform is deciding how many resources we should allocate to each pod we deploy. Getting this right is important for several reasons:
- Too few resources will impact your application delivery: it will either run too slowly or crash often with OOM (Out Of Memory) errors
- Too many resources can be even worse, as the misconfigured pod reserves capacity it does not need, taking much-needed resources away from everything else. Resources can be expensive, so don't waste money.
The Vertical Pod Autoscaler (VPA) is a fantastic tool for this job and it can help you in several ways:
- It can advise on the right values to use for your memory and CPU allocation
- It can update your deployments with the recommended values
Enabling
If you are using a managed Kubernetes service you will need to check whether your provider supports VPA. For example, Google's GKE does support it but it is disabled by default:
~$ gcloud container clusters create CLUSTER_NAME \
    --enable-vertical-pod-autoscaling
If you are using terraform, this is also configurable using the option enable_vertical_pod_autoscaling.
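As a sketch, the enable_vertical_pod_autoscaling variable comes from the popular terraform-google-modules GKE module; if you are using the raw google provider instead, the equivalent setting is a nested block (cluster name and location below are placeholders):

```hcl
# Hypothetical example: enabling VPA on a GKE cluster with the google provider.
resource "google_container_cluster" "primary" {
  name     = "my-cluster"   # placeholder
  location = "europe-west2" # placeholder

  # Equivalent of --enable-vertical-pod-autoscaling
  vertical_pod_autoscaling {
    enabled = true
  }
}
```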
Installation
We always recommend using helm charts for any installation. There are several community charts you can use, but I'm going to recommend the one published by Fairwinds, which also includes a nice Web UI and a controller we can play with.
The default installation would look like this:
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace vpa --create-namespace
If you want to add the Goldilocks web UI you can enable it as well using an override values file. My example below includes an ingress to be able to reach it, but you can also leave the ingress out and use port-forwarding:
dashboard:
  enabled: true
  replicaCount: 1
  ingress:
    enabled: true
    annotations:
      external-dns.alpha.kubernetes.io/hostname: goldilocks.digitalis.io
    path: /
    # Ingress Host
    hosts:
      - host: goldilocks.digitalis.io
        paths:
          - /
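If you leave the ingress out, port-forwarding to the dashboard service works too. A sketch, assuming the chart installed the dashboard into the vpa namespace; the service name depends on your release name, so check it first with kubectl get svc -n vpa:

```
# Forward the Goldilocks dashboard to localhost:8080
# (service name is an assumption based on the release name "vpa")
kubectl -n vpa port-forward svc/vpa-goldilocks-dashboard 8080:80
```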
Usage
You will need to create a VerticalPodAutoscaler object for each of the deployments you would like to let VPA work on. For example, I have a Kafka cluster running and I’d like to get some metrics from it:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kafka
  namespace: kafka
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: digitalis-kafka
  updatePolicy:
    updateMode: "Off"
targetRef identifies the workload VPA should watch (in my example a StatefulSet). My Kafka cluster is deployed to the kafka namespace and it is called digitalis-kafka.
updateMode is quite important. Possible values are “Off”, “Initial”, “Recreate”, and “Auto”.
- Initial: applies the VPA recommendations only at pod creation.
- Auto and Recreate: apply the VPA CPU and memory recommendations throughout the pod lifetime. VPA evicts any pods that drift out of alignment with its recommendations; when the workload object recreates them, the new pods come up with the recommended values.
- Off: only calculates recommended resource requests and limits, allowing you to apply them manually; pods are never updated.
If you are going to let VPA update your pods, I would recommend the Initial mode, or you may find your pods being restarted by VPA at inconvenient times. Personally, I prefer leaving the update mode set to Off and applying the recommendations whenever it is most convenient for us.
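If you do enable automatic updates, you can bound what VPA is allowed to set with a resourcePolicy. A sketch based on my Kafka example; the min/max values below are arbitrary placeholders, not recommendations:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kafka
  namespace: kafka
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: digitalis-kafka
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: kafka
        # Guard rails so VPA cannot scale the container outside this range
        minAllowed:
          cpu: 50m
          memory: 1Gi
        maxAllowed:
          cpu: "1"
          memory: 4Gi
```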
You can see the recommendations by describing the object:
~$ kubectl describe -n kafka verticalpodautoscaler kafka
[...]
Status:
  Conditions:
    Last Transition Time:  2021-09-27T15:23:04Z
    Status:                False
    Type:                  LowConfidence
    Last Transition Time:  2021-09-27T15:23:04Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  kafka
      Lower Bound:
        Cpu:     25m
        Memory:  2711617536
      Target:
        Cpu:     95m
        Memory:  2760900608
      Uncapped Target:
        Cpu:     95m
        Memory:  2760900608
      Upper Bound:
        Cpu:     125m
        Memory:  3310354432
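The memory figures in the recommendation are raw bytes, which are hard to read at a glance. A small sketch to convert them into the Mi units you would put in a manifest (the value below is the Target from my Kafka recommendation):

```python
def bytes_to_mi(n: int) -> float:
    """Convert raw bytes (as reported by VPA) to mebibytes."""
    return n / (1024 * 1024)

# Target memory recommendation for the kafka container
target_memory = 2760900608
print(f"{bytes_to_mi(target_memory):.0f}Mi")  # 2633Mi
```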
Goldilocks
Goldilocks gives you a bit more than just installing VPA for you: it can also create the VPA objects and, on top of that, it provides a nice Web UI.
I also have a Postgres database in my lab. This time, instead of creating a VPA object for each of the deployments, I'm going to label the namespace so Goldilocks creates the configuration for me.
~$ kubectl label ns postgres goldilocks.fairwinds.com/enabled=true
~$ kubectl get verticalpodautoscaler -n postgres
NAME                             AGE
digitalis                        1m
digitalis-backrest-shared-repo   1m
digitalis-dcun                   1m
digitalis-qaka                   1m
As you can see, shortly after labelling the namespace Goldilocks created the VerticalPodAutoscaler objects, and I just need to wait a few minutes for the data to come in.
Now I can open the Web UI and check them out:

Grafana
Finally, if you prefer not to use the Goldilocks Web UI and you are using Grafana, you can have a look at this dashboard, which will show you the recommendations clearly.

Conclusion
Vertical Pod Autoscaler can be a powerful tool in your arsenal for making better use of your resources. By controlling resource allocation you may be able to run more pods and decrease the overall cost of running the platform.
You can also combine VPA with the Horizontal Pod Autoscaler (HPA), which allows you to scale your pods both vertically and horizontally. By using VPA to fine-tune the resource allocations, you will most likely reduce the number of times the HPA triggers new pod deployments.
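If you combine the two, be careful not to let VPA and HPA both act on CPU and memory for the same workload; the usual advice is to drive HPA from custom or external metrics while VPA manages resources. A minimal sketch against a hypothetical frontend Deployment (the metric name assumes a metrics adapter is installed):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second  # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"
```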