In Kubernetes, you rightsize workloads by setting requests and limits for CPU and memory. Getting these values right is how you avoid issues like overprovisioning, pod eviction, CPU starvation, or running out of memory.
Kubernetes has two types of resource configurations:
- Requests specify how much of each resource a container needs. The scheduler uses this information to choose a node, and the pod is guaranteed to get at least this amount of resources.
- Limits, when specified, cap how much of a resource a container can use. They are enforced by the kubelet, which throttles or terminates the process in the container (see the example manifest below).
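Here is a minimal sketch of how both look in a pod spec. The names, image, and values are illustrative, not recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app            # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.25    # any image works; nginx is just an example
      resources:
        requests:
          cpu: "250m"      # scheduler only picks a node with at least 0.25 CPU free
          memory: "256Mi"  # and at least 256 MiB of allocatable memory
        limits:
          cpu: "500m"      # kubelet throttles the container above 0.5 CPU
          memory: "512Mi"  # container is OOM killed if it exceeds 512 MiB
```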
Teams use limits to avoid racking up a massive cloud bill: by placing limits, you can make sure that pods don't consume too much capacity. However, if you set limits too low, your application may crash.
And if you set these values too high, prepare for overprovisioning and waste.
When setting up a new application, start with generous requests and limits, then monitor actual usage and adjust them downward over time.
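One way to get data for that adjustment is the Vertical Pod Autoscaler in recommendation-only mode. This is a sketch, assuming the VPA components are installed in your cluster; the target Deployment name is hypothetical:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app        # hypothetical Deployment to observe
  updatePolicy:
    updateMode: "Off"    # only report recommendations, never evict or resize pods
```

The VPA object's status will then show recommended requests based on observed usage, which you can apply to the Deployment manually.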
Note: You specify CPU and memory resources the same way, by setting requests and limits, but they are enforced differently. When CPU usage goes over the limit, the container is throttled, slowing down its performance. When memory usage goes over the limit, the container can get OOM killed, potentially dropping in-flight operations and user requests.