Kubernetes Requests vs Limits: CPU, Memory, and OOMKills

Many pods that misbehave in production trace back to one under-explained pair of fields: requests and limits. Requests decide where a pod lands and how much capacity it's guaranteed; limits decide when the kernel throttles or kills it. Set them too low and your app gets OOMKilled or CPU-starved under load. Set them too high and you pay for idle nodes while the scheduler refuses to pack more pods on. Leave them off entirely (the most common mistake) and one runaway container can starve every neighbor on the node. This guide explains what each field does, how CPU and memory differ, why OOMKills happen, and how to pick values you can defend.
This is a supporting page for the Devgains Kubernetes architecture guide, which explains the scheduler and reconciliation loop that these numbers feed into. Here we zoom in on the two fields that decide a pod's fate on a node.
Quick answer: requests vs limits
- Request — the amount of CPU or memory a container is guaranteed. The scheduler uses the sum of requests to decide which node a pod fits on. A node only accepts a pod if its remaining requestable capacity covers the pod's requests.
- Limit — the ceiling a container may use. Exceed a CPU limit and the container is throttled (slowed, not killed). Exceed a memory limit and the container is OOMKilled (terminated and restarted), because memory can't be compressed.
The one-line rule: requests are for scheduling and guarantees; limits are for protecting the node from a single greedy container.
Why it matters
Requests and limits are the only levers you have over how Kubernetes shares a finite node between many pods. They drive three outcomes you feel directly in production:
- Scheduling. The kube-scheduler places pods by requests, never by actual usage. A pod that requests 2 CPUs but uses 0.1 still reserves 2 CPUs of schedulable room, so wrong requests waste money or leave pods `Pending`.
- Stability. Limits stop one container from consuming the whole node. Without a memory limit, a leak in one pod can trigger node-level memory pressure that evicts healthy neighbors.
- Quality of Service. The combination of requests and limits assigns each pod a QoS class that decides who gets killed first when the node runs out of memory.
Get these numbers right and the platform packs your workloads efficiently and self-heals. Get them wrong and you either overpay for capacity or spend your on-call shifts chasing OOMKills.
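
To see scheduling by requests in isolation, a minimal sketch like the following may help. The pod name `idle-but-greedy` and the `busybox` image are assumptions for illustration, not part of this guide's setup. The container only sleeps, so its real CPU usage stays near zero, yet the scheduler still has to find a node with 2 unreserved CPUs.

```yaml
# Hypothetical pod for illustration: requests far more CPU than it uses.
apiVersion: v1
kind: Pod
metadata:
  name: idle-but-greedy        # assumed name, not from this guide
spec:
  containers:
    - name: sleeper
      image: busybox:1.36      # assumed image; any small image works
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "2"             # reserves 2 CPUs of schedulable room
          memory: "64Mi"
```

If no node has 2 CPUs of unrequested capacity, the pod should stay `Pending`, and `kubectl describe pod idle-but-greedy` should show a `FailedScheduling` event reporting insufficient CPU, even when the nodes look mostly idle in `kubectl top node`.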
How CPU and memory differ
The single most important idea: CPU is compressible, memory is not. That one fact explains almost every behavior.
- CPU is measured in cores or millicores (`500m` = half a core). When a container hits its CPU limit, the Linux CFS scheduler simply gives it fewer time slices, so it runs slower. Nothing crashes. This is CPU throttling, and it's invisible unless you watch the `container_cpu_cfs_throttled_seconds_total` metric.
- Memory is measured in bytes (`256Mi`, `1Gi`). You can't "run a process a bit slower" on memory. When a container tries to allocate past its memory limit, the kernel's OOM killer terminates it. Kubernetes records the reason as `OOMKilled` and restarts the container per its restart policy.
The practical takeaway lands in the best-practices section below: a CPU limit only ever slows you down, while a memory limit is a hard cliff.
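
One way to watch the memory cliff safely is to run a pod that deliberately allocates more than its limit. The sketch below is adapted from the memory-limit exercise in the Kubernetes documentation; the pod name is an assumption, and `polinux/stress` is a third-party image that wraps the `stress` tool. Run it only in a throwaway namespace.

```yaml
# Illustrative only: the container tries to allocate ~250M against a 100Mi limit.
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                 # assumed name
spec:
  containers:
    - name: stress
      image: polinux/stress      # third-party image, assumed to be pullable
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"        # allocation goes past this, so the kernel kills it
```

After a few seconds, `kubectl get pod oom-demo` should show restarts, and the container's last state should report the `OOMKilled` reason. A comparable over-limit CPU workload would keep running, only slower.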
QoS classes: who gets killed first
Kubernetes derives a Quality of Service class from how you set requests and limits, and uses it to decide eviction order under memory pressure:
- Guaranteed: every container sets both a CPU and a memory limit, and each limit equals its request. Highest priority; evicted last. Use for latency-sensitive or stateful workloads.
- Burstable: at least one container sets a CPU or memory request or limit, but the pod doesn't meet the Guaranteed criteria (the common case). Evicted after `BestEffort` but before `Guaranteed`.
- BestEffort: no requests or limits anywhere in the pod. First to be killed under pressure, and easily starved. Avoid in production.
When a node runs low on memory, the kubelet evicts BestEffort pods first, then Burstable pods
that are exceeding their requests, and only touches Guaranteed pods as a last resort.
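
The two extremes of the QoS rules can be sketched as a pair of manifests. The pod names are hypothetical, the image reuses this guide's placeholder, and the sketch assumes no LimitRange in the namespace is injecting defaults.

```yaml
# Guaranteed: every container sets CPU and memory, and limit == request.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed           # hypothetical name
spec:
  containers:
    - name: api
      image: ghcr.io/acme/api:1a2b3c4
      resources:
        requests: { cpu: "500m", memory: "512Mi" }
        limits:   { cpu: "500m", memory: "512Mi" }
---
# BestEffort: no requests or limits on any container.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort           # hypothetical name
spec:
  containers:
    - name: api
      image: ghcr.io/acme/api:1a2b3c4
```

Reading `.status.qosClass` on each pod should confirm the class. Changing any one value in the first manifest so that a limit no longer equals its request should move that pod to Burstable.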
Step-by-step: set requests and limits
Here's a Deployment with sensible resources on the container. Requests reflect steady-state usage; limits give headroom for spikes:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels: { app: api }
      template:
        metadata:
          labels: { app: api }
        spec:
          containers:
            - name: api
              image: ghcr.io/acme/api:1a2b3c4
              resources:
                requests:            # guaranteed + used for scheduling
                  cpu: "250m"        # 0.25 of a core
                  memory: "256Mi"
                limits:              # ceiling before throttle (CPU) / OOMKill (memory)
                  cpu: "1"           # may burst to a full core
                  memory: "512Mi"    # killed if it allocates past this

Apply it and inspect what actually happened, including the QoS class Kubernetes assigned:

    kubectl apply -f api.yaml

    # See the QoS class Kubernetes derived from your requests/limits:
    kubectl get pod -l app=api -o jsonpath='{.items[0].status.qosClass}'
    # -> Burstable (requests set, limits higher)

    # When a pod restarts, check WHY. OOMKilled is the tell:
    kubectl get pod -l app=api -o jsonpath='{.items[0].status.containerStatuses[0].lastState}'
    # -> {"terminated":{"reason":"OOMKilled","exitCode":137, ...}}

    # Confirm live usage against requests (needs metrics-server):
    kubectl top pod -l app=api

Exit code 137 (128 + SIGKILL's signal 9) plus `reason: OOMKilled` is the fingerprint of a memory limit that's too low, or of a genuine leak. Raise the limit only after you've confirmed the workload actually needs the memory; otherwise you're just papering over a bug.
Requests vs limits at a glance
| | CPU | Memory |
|---|---|---|
| Unit | cores / millicores (500m) | bytes (256Mi, 1Gi) |
| Request does | Reserves schedulable capacity; sets CFS shares | Reserves schedulable capacity |
| Limit does | Throttles (slows) the container | OOMKills the container |
| Compressible? | Yes — over-limit = slower | No — over-limit = killed |
| Over-limit symptom | High CFS throttling, latency | OOMKilled, exit code 137, restarts |
| Safe to omit limit? | Often yes (see below) | Riskier — a leak can take the node |
Best practices
- Always set requests. They're how the scheduler reasons about capacity and how bin-packing works. A pod with no request can land anywhere and get starved. This applies to every workload controller — Deployments, StatefulSets, and DaemonSets alike; DaemonSet pods in particular compete for room on every node.
- Always set a memory limit; consider skipping the CPU limit. Because memory is incompressible, an unbounded container can OOM the node and hurt neighbors — set a memory limit. CPU is compressible, so many teams set a CPU request but no CPU limit, letting apps burst into idle cores instead of being needlessly throttled. Measure before deciding.
- Set memory request = limit for critical workloads. Matching request and limit exactly, for both memory and CPU in every container, earns the pod Guaranteed QoS, so it's evicted last under pressure. That's worth it for databases and latency-sensitive services.
- Right-size from real data, not guesses. Use `kubectl top`, Prometheus, or the Vertical Pod Autoscaler in recommendation mode to set requests near the P95 of actual usage, with limits giving room for spikes.
- Use LimitRanges and ResourceQuotas per namespace. A `LimitRange` sets sane defaults so a pod with no resources isn't `BestEffort`; a `ResourceQuota` caps a whole namespace so one team can't drain the cluster.
- Pair resources with honest health checks. A right-sized pod still needs correct liveness and readiness probes so a throttled or restarting container is taken out of rotation cleanly during a production rollout.
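
The namespace guardrails from the list above can be sketched as follows. The namespace `team-a`, the object names, and every number are placeholders to adapt, not recommendations. The LimitRange deliberately sets no default CPU limit, in line with the advice to consider skipping it.

```yaml
# Defaults applied to containers that declare no resources of their own.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults       # hypothetical name
  namespace: team-a              # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:            # used when a container sets no request
        cpu: 100m
        memory: 128Mi
      default:                   # used when a container sets no limit
        memory: 256Mi
---
# Caps the total that all pods in the namespace may request or be limited to.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota             # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.memory: 32Gi
```

Once a quota covers `limits.memory`, pods that specify no memory limit are rejected at admission unless a LimitRange supplies a default, which is why the two objects are usually deployed together.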
Common mistakes
- No requests or limits at all. The pod becomes `BestEffort`, the scheduler can't reason about it, and it's first to die under pressure. This is the default failure mode of hand-written YAML.
- CPU limit set far too low. The container is throttled and latency spikes, yet nothing crashes and `kubectl get pod` shows `Running`, so the symptom hides. Always watch CPU throttling metrics, not just restarts.
- Memory limit lower than real usage. The container is OOMKilled, restarts, and loops through `CrashLoopBackOff`. Teams often raise the limit blindly instead of checking for a leak first.
- Requests set equal to observed peak. Requests should track steady-state usage, not the spike. Over-requesting reserves capacity you rarely use, so the scheduler leaves nodes half-empty and your bill climbs.
- Assuming `kubectl top` reflects requests. `top` shows actual usage; the scheduler places by requests. A node can show 30% CPU used yet reject new pods because it's 100% requested.
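
Because throttling never shows up as a restart, one option is to alert on it directly. The sketch below is a Prometheus rule file that assumes cAdvisor container metrics are being scraped with `namespace`, `pod`, and `container` labels. The group name, the alert name, the 25% threshold, and the 15-minute window are all illustrative assumptions to tune for your workloads.

```yaml
groups:
  - name: resource-limits                  # hypothetical group name
    rules:
      - alert: ContainerCPUThrottlingHigh  # hypothetical alert name
        # Fraction of CFS periods in which the container was throttled.
        expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
          )
          /
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_periods_total{container!=""}[5m])
          ) > 0.25
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is throttled in over 25% of CFS periods"
```

A container that fires this alert while showing `Running` is the hidden-symptom case described above: the usual options are to raise the CPU limit, remove it, or reduce the work the container does per request.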
Takeaways
- Requests = guaranteed + scheduling; limits = ceiling. The scheduler bin-packs by requests; limits cap what a container may consume.
- CPU is compressible, memory is not. Over a CPU limit you're throttled; over a memory limit you're OOMKilled (exit code 137).
- QoS class follows your numbers. `Guaranteed` (limit = request) survives longest; `BestEffort` (nothing set) dies first. Never run production as `BestEffort`.
- Right-size from data. Set requests near P95 real usage, always cap memory, and use LimitRanges/quotas so no pod ships without resources.
Keep building your mental model with the related Kubernetes and DevOps guides: how the control plane turns your YAML into scheduled pods and how Services route traffic to them.
FAQ
What is the difference between requests and limits in Kubernetes? A request is the amount of CPU or memory a container is guaranteed and is what the scheduler uses to place the pod on a node. A limit is the maximum the container may use: exceeding a CPU limit throttles the container, while exceeding a memory limit gets it OOMKilled and restarted.
What happens if a pod exceeds its memory limit? The Linux kernel's OOM killer terminates the
container because memory is incompressible. Kubernetes marks the container state OOMKilled with
exit code 137 and restarts it according to the pod's restart policy — which shows up as a
CrashLoopBackOff if it keeps happening.
Should I set a CPU limit? Often no. CPU is compressible, so a CPU limit only throttles (slows) the container — it never crashes it. Many teams set a CPU request for scheduling but omit the CPU limit so apps can burst into idle cores. Always set a memory limit, though, since memory can't be throttled.
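
As a sketch of that pattern, the container-level fragment below sets a CPU request and a memory request and limit, but no CPU limit. The values are carried over from this guide's example Deployment and are placeholders rather than recommendations.

```yaml
# Fragment: belongs under spec.template.spec.containers[] of a workload.
resources:
  requests:
    cpu: "250m"        # still used for scheduling and CPU shares
    memory: "256Mi"
  limits:
    memory: "512Mi"    # hard cap: exceeding it means an OOMKill
    # No cpu limit on purpose: the container may burst into idle cores.
```

A pod configured this way is Burstable, not Guaranteed. If the namespace has a LimitRange with a default CPU limit, that default is injected into containers that omit one, so check for it before relying on unthrottled bursting.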
What are Kubernetes QoS classes? Kubernetes derives a Quality of Service class from your requests and limits: Guaranteed (every container's limit equals its request), Burstable (at least one request set but not fully matched), and BestEffort (nothing set). Under memory pressure the kubelet evicts BestEffort first and Guaranteed last.
How do I choose the right request and limit values? Measure real usage with kubectl top,
Prometheus, or the Vertical Pod Autoscaler in recommendation mode. Set the request near the P95 of
steady-state usage and the limit high enough to absorb spikes, then always cap memory to protect
the node.
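
For the Vertical Pod Autoscaler route, a recommendation-only object might look like the sketch below. It assumes the VPA components are installed in the cluster (they are an add-on, not part of core Kubernetes) and targets the `api` Deployment from this guide; the object name `api-vpa` is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"          # compute recommendations only; never evict or resize pods
```

After the workload has run under representative load for a while, `kubectl describe vpa api-vpa` should list recommended requests per container, which you can compare against your own P95 figures before editing the Deployment.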
Conclusion
Requests and limits are the contract between your workload and the node it runs on. Requests
tell the scheduler how much capacity to reserve and guarantee; limits tell the kernel when to
throttle CPU or OOMKill memory. Because CPU is compressible and memory is not, the two resources
fail in completely different ways — a lesson worth internalizing before your next 3 a.m.
OOMKilled page. Set requests on everything, always cap memory, right-size from real data, and let
QoS classes protect your most important pods. From here, revisit the
architecture guide to see
how the scheduler turns these numbers into placement decisions across your cluster.



