Kubernetes: downscale your costs

July 2, 2025

Introduction

Kubernetes has transformed how we deploy and manage containerised applications, offering unprecedented flexibility and scalability. However, this power comes with a significant caveat that many organisations discover only after their cloud bills arrive: mismanaged Kubernetes clusters can become exceptionally expensive.

The statistics are sobering. Studies reveal that organisations typically waste 32% of their cloud spend, with Kubernetes overprovisioning being a major contributor. In clusters with 50 or more CPUs, only 13% of provisioned CPUs and 20% of memory are actually utilised. This represents billions of dollars in unnecessary expenses across the industry, with 78% of companies estimating that 21–50% of their cloud expenditure is wasted annually.

Source: https://www.theregister.com/2024/03/01/kubernetes_cluster_overprovisioning/

The theory behind Kubernetes cost management is straightforward: scale resources up when demand increases, scale down when it decreases. In practice, organisations consistently fail at the “scaling down” part.

Vertical Scaling: The “Set and Forget” Problem

When teams allocate more memory and CPU than workloads actually require, they’re engaging in what’s called vertical scaling inefficiency. This overallocation leads to underutilised nodes and forces organisations to provision more hardware than necessary.

The root cause is often anxiety-driven decision-making. DevOps teams, fearful of running out of memory or experiencing performance issues, intentionally over-allocate resources. While this approach might prevent immediate problems, it creates long-term financial inefficiencies.

From a cost perspective, vertical scaling becomes disproportionately expensive as you approach hardware limits: scaling up means moving to ever more powerful machines, and high-end hardware commands a steep price premium.

Horizontal Scaling: The Proliferation Problem

The second challenge occurs when organisations add new resources for exciting projects but never decommission them afterwards. This horizontal scaling inefficiency manifests as idle resources running in the background, driving up cloud costs without providing any value.

The problem is compounded by the ease with which teams can provision new resources. Shadow IT practices, where employees provision cloud services without proper oversight, lead to uncontrolled spending and a lack of standardisation. While this agility can drive innovation, it creates management headaches and cost overruns.


What’s the solution?

Let’s go through the different options.

Turn it off when you don’t use it! 🤯

It sounds obvious, doesn’t it? So why are you NOT doing it? Test and development environments often represent a cost as large as, if not larger than, production environments. In most cases, it makes no sense at all to leave them running when no one is using them: nights, weekends, public holidays, and so on.

If you ask your favourite AI, it will take only a minute to give you a script to run as a scheduled job that scales the infrastructure down when you’re not using it. Easy savings!
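To make this concrete, here’s a minimal sketch of that kind of scheduled scale-down job using the official Kubernetes Python client. The namespace names are placeholders, and you would run it from something like a CronJob alongside a matching scale-up job for the morning, so treat it as a starting point rather than a drop-in solution.

```python
# A minimal sketch of a scheduled scale-down job, using the official
# Kubernetes Python client (pip install kubernetes). The namespace names
# below are placeholders; point it at your own test/dev namespaces.
from kubernetes import client, config

NON_PROD_NAMESPACES = ["dev", "test", "staging"]  # hypothetical names


def scale_down(namespaces):
    # Use config.load_incluster_config() instead if this runs as a CronJob
    config.load_kube_config()
    apps = client.AppsV1Api()
    for ns in namespaces:
        for deploy in apps.list_namespaced_deployment(ns).items:
            # Scale every Deployment in the namespace to zero replicas;
            # with a node autoscaler in place, the empty nodes disappear too.
            apps.patch_namespaced_deployment_scale(
                deploy.metadata.name, ns, {"spec": {"replicas": 0}}
            )
            print(f"Scaled {ns}/{deploy.metadata.name} to 0 replicas")


if __name__ == "__main__":
    scale_down(NON_PROD_NAMESPACES)
```

For the morning scale-up, one option is to record the previous replica count in an annotation before zeroing it, then restore it from there.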

Set up an Autoscaler

The autoscaler you should use will depend on where your cluster runs. Unfortunately, this guide won’t be useful if you’re running on-premises or bare metal infrastructure. I’ll try to cover those scenarios in a future blog post. However, if you’re in the cloud, you will find at least one solution that works for you.

Cluster Autoscaler

✅ Pros: Supports lots of cloud providers, including all the hyperscalers.

❌ Cons: Works with predefined cloud provider Auto Scaling Groups (ASGs) or Node Pools and only scales within those groups. It can be slow to react and, if left uncontrolled, can lead to overprovisioning.

Karpenter

✅ Pros: In my opinion, it’s the best option: it works very well, it has flexible configuration, and it’s easy to install and manage.

❌ Cons: It only supports AWS.

GKE Autopilot

✅ Pros: GKE Autopilot simplifies billing by charging at the pod level: you pay for the CPU, memory, and ephemeral storage allocated to your pods, rather than for the nodes that run them. It is easy to install and manage.

❌ Cons: It only works on Google Cloud, and you can’t manage the nodes yourself if that is something you need to do.

AKS Node Autoprovisioning

✅ Pros: It is based on Karpenter, offers similar options, and is also relatively easy to install and configure.

❌ Cons: At the time of writing it is in preview, so it could be unstable.

Review pods’ memory and CPU allocations

The simplest way, if you use Grafana, is to use the Kubernetes dashboards to manually review used versus allocated memory and CPU, and adjust your deployments accordingly.

Remember: when pods don’t fit on your existing Kubernetes nodes, you need more of them. If you overprovision memory and CPU, you will likely end up with more worker nodes than you need, all running and consuming 💸
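As a rough illustration of that manual review, the sketch below compares what pods actually use (via the metrics.k8s.io API, which assumes metrics-server is installed) against what they request. The output is raw Kubernetes quantities and is only meant as a starting point for spotting over-allocated workloads.

```python
# A rough sketch comparing live usage (from metrics-server) with requested
# resources. Assumes metrics-server is installed and kubeconfig access works.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
custom = client.CustomObjectsApi()

# Live usage, as reported through the metrics.k8s.io API
usage = custom.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "pods")
used = {
    (p["metadata"]["namespace"], p["metadata"]["name"]): {
        c["name"]: c["usage"] for c in p["containers"]
    }
    for p in usage["items"]
}

# Requested resources, taken from the pod specs
for pod in core.list_pod_for_all_namespaces().items:
    key = (pod.metadata.namespace, pod.metadata.name)
    for c in pod.spec.containers:
        requested = (c.resources.requests or {}) if c.resources else {}
        actual = used.get(key, {}).get(c.name, {})
        print(f"{key[0]}/{key[1]}/{c.name} requested={requested} using={actual}")
```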

The harder way is to install and use a Vertical Pod Autoscaler. You can either have it advise you on the adjustments you need, or let it apply them for you automatically (spoiler: I wouldn’t recommend the latter).

You can read more about this in a blog post I wrote a while ago.
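If you want to see the shape of it, here is a hedged sketch of registering a VPA in recommendation-only mode through the Python client. It assumes the VPA components are already installed in the cluster, and the target Deployment name and namespace are placeholders.

```python
# A hedged sketch: create a VerticalPodAutoscaler that only produces
# recommendations ("Off" mode) and never evicts or resizes pods itself.
# Assumes the VPA components are installed; "my-app" is a placeholder.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "my-app-vpa"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "my-app"},
        "updatePolicy": {"updateMode": "Off"},  # recommend only, never act
    },
}

api.create_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    body=vpa,
)
# The suggested requests then appear under the object's status.recommendation
```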

Event-Driven Autoscaler

Traditional autoscalers often rely on resource metrics like CPU utilisation or memory consumption. While useful, these metrics might not always accurately reflect the true workload or demand, especially in modern, event-driven architectures where work is initiated by messages, queue lengths, or other external triggers.

KEDA uses “scalers” (or adapters) that are specifically designed to connect to and monitor different event sources. These scalers understand how to extract meaningful metrics (e.g., queue length, message count) from those sources, as well as the traditional memory and CPU usage.
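To give a flavour of how that looks, here is a sketch of a ScaledObject for a hypothetical queue-consuming Deployment called “worker”. It assumes KEDA is already installed; the queue name, threshold and connection string are placeholders (in practice you would use a TriggerAuthentication rather than an inline host).

```python
# A minimal sketch of a KEDA ScaledObject that scales a Deployment on
# RabbitMQ queue length. Assumes KEDA is installed; names, threshold and
# connection string are placeholders.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "worker-scaler"},
    "spec": {
        "scaleTargetRef": {"name": "worker"},  # the Deployment to scale
        "minReplicaCount": 0,                  # scale to zero when idle
        "maxReplicaCount": 20,
        "triggers": [
            {
                "type": "rabbitmq",
                "metadata": {
                    "queueName": "tasks",
                    "mode": "QueueLength",
                    "value": "50",  # target backlog per replica
                    "host": "amqp://guest:guest@rabbitmq.default:5672/",
                },
            }
        ],
    },
}

api.create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="default",
    plural="scaledobjects",
    body=scaled_object,
)
```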

Final words

Talking about money isn’t always the most thrilling topic, and I apologise if this post felt a bit dry. However, the core message is simple and powerful: avoid unnecessary spending.

You may wonder why, if I’m asking you to downscale to save money, I’m talking almost exclusively about autoscalers that increase the node count. Two reasons:

  1. Autoscalers can go up and down
  2. You should start from a very small size and let your load and clients decide the size of the cluster. For example, one of the clusters I manage starts as small as 10 worker nodes and grows up to 200–300 at peak times. It scales back down when the load dies.

When it comes to managing your Kubernetes clusters, there are excellent tools available that can help you do so wisely and save you a significant amount.

Still have questions or feel a bit lost? Don’t hesitate to reach out. We’re here to help! Contact our team for a free consultation to discuss how we can tailor our approach to your specific needs and challenges.

