Are your Kubernetes workloads using resources efficiently? Check out these 5 best practices for achieving optimal cost and performance

K8s Workload Rightsizing: 5 Best Practices for Optimizing Cost and Performance

It's a bird! It's a plane! It's your cloud spend going to the sky because of overprovisioned workloads! 💸💸💸

 

Don’t worry, you’re not the only one battling with K8s workload configuration out there. 

 

Luckily, there are things you can do to rightsize your workloads and make sure they don’t request more resources than they actually use.

 

To quickly recap, rightsizing has two faces in Kubernetes:

  • Instance rightsizing – choosing node types so your cluster has just enough capacity to keep the lights on.
  • Workload rightsizing – setting your workloads to request the right amount of resources and run smoothly.

 

We covered the first one in the previous edition of Kubernetes IRL, which you can read here.

 

Here are 5 best practices to help make your workloads more cost-efficient without compromising performance.

K8s workload rightsizing

1. Use the right metrics to identify inefficiencies

When planning capacity for K8s workloads, use metrics like CPU and RAM usage to spot inefficiencies and understand how much capacity your workloads really need.

 

Kubernetes components provide metrics in the Prometheus format. Naturally, Prometheus is a very popular open-source solution for Kubernetes monitoring.

 

  • For example, for CPU utilization, use the metric container_cpu_usage_seconds_total. 
  • A good metric for memory usage is container_memory_working_set_bytes since this is what the OOM killer is watching for.
  • Add kube_pod_container_resource_limits_memory_bytes to the dashboard together with used memory to instantly see when usage approaches limits.

Use container_cpu_cfs_throttled_seconds_total to monitor whether any workloads are being throttled by a CPU limit that is set too low.
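As a sketch, the metrics above can also drive alerts instead of just dashboards. This assumes cAdvisor and kube-state-metrics are scraped by Prometheus; the group name, thresholds, and durations are illustrative, and label matching between the two exporters may need adjusting for your versions:

```yaml
# Hypothetical Prometheus rule file; metric names come from cAdvisor
# and kube-state-metrics, but all thresholds are illustrative.
groups:
  - name: workload-rightsizing
    rules:
      # Fire when a container's working set exceeds 90% of its memory limit
      # (the working set is what the OOM killer watches).
      - alert: ContainerMemoryNearLimit
        expr: |
          container_memory_working_set_bytes
            / kube_pod_container_resource_limits_memory_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} is close to its memory limit"
      # Fire on sustained CPU throttling, hinting the CPU limit is too low.
      - alert: ContainerCPUThrottled
        expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} is being CPU-throttled"
```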

2. Choose the right scaling method

When scaling your applications, you can choose between two approaches: many small pods or fewer, larger ones.

 

Run at least two replicas of the application to ensure higher availability. More than a couple further reduces the impact of a single failure. It also enables you to use spot instances efficiently.

 

With more replicas, you also get more granular horizontal scaling – adding or removing a replica has a smaller impact on total resource usage. 

 

But don't go to the other extreme either: every pod adds overhead for Kubernetes itself, and there are limits on the number of pods per node and on IP addresses in a subnet.
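One way to sketch this middle ground: a Deployment with a few medium-sized replicas, plus a PodDisruptionBudget so voluntary disruptions (like node drains during spot replacements) always keep two pods running. The app name and image below are placeholders:

```yaml
# Illustrative Deployment: three replicas balance availability
# against per-pod overhead. Names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3          # at least two for availability; three smooths scaling steps
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.0
---
# Keep at least two pods up through voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```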

3. Set requests and limits

In K8s, workloads are rightsized via requests and limits set for CPU and memory resources. This is how you avoid issues like overprovisioning, pod eviction, CPU starvation, or running out of memory. 

 

Kubernetes has two types of resource configurations:

  • Requests specify how much of each resource a container needs. The Scheduler uses this information to choose a node, and the pod is guaranteed at least this amount of resources.
  • Limits, when specified, are enforced by the kubelet – by throttling the CPU or terminating the process in a container.
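Here's a minimal sketch of what those two settings look like on a container; the names and values are placeholders to be tuned against observed usage:

```yaml
# Minimal example of requests and limits; values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example/api:1.0
      resources:
        requests:
          cpu: "250m"      # guaranteed share the Scheduler plans around
          memory: "256Mi"
        limits:
          cpu: "500m"      # exceeding this throttles the container
          memory: "512Mi"  # exceeding this can get the container OOM killed
```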



Teams use limits to avoid racking up a massive cloud bill – by placing limits, you can make sure that pods don't use too much capacity. But set them too low, and your application may get throttled or crash.

 

And if you set these values too high, prepare for overprovisioning and waste.

 

When setting up a new application, start by setting requests and limits higher. Then monitor usage and adjust.

 

Note: You specify both CPU and memory resources by setting requests and limits, but their enforcement differs. When CPU usage goes over the limit, the CPU is throttled, slowing down the container's performance. When memory usage goes over the limit, the container can get OOM killed, potentially leaving operations and user requests unfinished.

4. Use autoscaling

To automate workload rightsizing, use autoscaling. Kubernetes has two mechanisms in place:

  • Horizontal Pod Autoscaler (HPA)
  • Vertical Pod Autoscaler (VPA)

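As a sketch of the horizontal side, here's an autoscaling/v2 HorizontalPodAutoscaler targeting a hypothetical "web" Deployment; the replica bounds and the 70% CPU target are illustrative, not recommendations:

```yaml
# Illustrative HPA: scales the "web" Deployment between 2 and 10
# replicas to hold average CPU utilization near 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2        # never drop below two for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```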
 

The tighter your Kubernetes scaling mechanisms are configured, the lower the waste and costs of running your application. A common practice is to scale down during off-peak hours.

 

Make sure that HPA and VPA policies don’t clash. VPA automatically adjusts your requests and limits configuration, while HPA adjusts the number of replicas – if both react to the same metric, they can interfere with each other.


Get more information about K8s autoscaling best practices here.

5. Keep your rightsizing efforts in check

Review past resource usage and take remedial steps on a regular basis. Tracking capacity utilization over time helps you keep uncontrolled resource use in check.
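One hedged way to do this audit continuously: if the Vertical Pod Autoscaler components are installed in your cluster, you can run a VPA in "Off" mode so it only publishes recommendations without evicting or resizing anything. The target Deployment name below is a placeholder:

```yaml
# VPA in recommendation-only mode: it observes usage and suggests
# requests, but never changes running pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommend only; don't evict or resize pods
```

You can then inspect the recommended requests (for example, via kubectl describe on the VPA object) and compare them against what your workloads currently ask for.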

 

I hope this guide helps you rightsize workloads like a pro.

 

Cheers,

__________________________

 

Found this email useful? Forward it to your friends and colleagues who need more Kubernetes best practices in their lives!

CAST AI Group Inc., 111 NE 1st Street, 8th Floor #1041, Miami, Florida 33132, United States

Manage Subscriptions
