What's the difference between resource requests and limits?
Quick Answer
A **request** is the amount of CPU or memory the scheduler reserves for a container when deciding which node to place it on; it is the minimum the container is assumed to need. A **limit** is the maximum a container is allowed to consume at runtime: exceeding a CPU limit causes throttling, while exceeding a memory limit causes the container to be killed (OOMKilled). Requests drive scheduling decisions and limits drive runtime enforcement. The two are allowed to differ, which is exactly what creates the different Quality of Service classes.
Detailed Answer
Defining both
spec:
  containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        cpu: "250m"       # 0.25 CPU cores, guaranteed available
        memory: "256Mi"
      limits:
        cpu: "500m"       # can burst up to 0.5 CPU cores, then gets throttled
        memory: "512Mi"   # hard ceiling -- exceeding this gets the container killed
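To make the units in the manifest above concrete, here is a minimal Python sketch that converts the quantities into plain numbers. The helper names are hypothetical and the parser is deliberately simplified (millicore and decimal CPU values, binary memory suffixes only); it is not the Kubernetes quantity parser.

```python
# Illustrative helpers (hypothetical names); not the real Kubernetes quantity parser.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treated as a byte count


print(cpu_to_millicores("250m"))  # 250
print(cpu_to_millicores("0.5"))   # 500
print(memory_to_bytes("256Mi"))   # 268435456
```

So the request of "250m" is a quarter of a core, and the "512Mi" limit is 512 × 2^20 bytes.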
Requests — what the scheduler reserves
The scheduler only places a Pod on a node that has enough unreserved capacity to satisfy the sum of all its containers' requests. It doesn't look at the node's actual current usage, only at how much of the node's allocatable capacity has already been requested by Pods scheduled there. This is a deliberate, conservative guarantee: if your Pod requests 256Mi of memory, it is only placed on a node with at least 256Mi of memory not yet claimed by other Pods' requests, regardless of how much those Pods are actually using.
Node with 4Gi total memory, already running Pods requesting 3Gi total
  -> only 1Gi of "requestable" capacity remains, regardless of how much
     of the already-running Pods' memory is ACTUALLY being used right now
  -> a new Pod requesting 1.5Gi will NOT be scheduled here, even if the
     node's actual current usage is well under 4Gi
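The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the request-based fit check, not the scheduler's code; the function name `fits` and the split of the existing 3Gi into two Pods are assumptions made for the example.

```python
def fits(allocatable_mi: int, existing_requests_mi: list, new_request_mi: int) -> bool:
    """Return True if the new Pod's memory request fits in the unreserved capacity.

    Only requests are considered; actual usage never enters the calculation.
    """
    unreserved_mi = allocatable_mi - sum(existing_requests_mi)
    return new_request_mi <= unreserved_mi


node_mi = 4 * 1024        # node with 4Gi of memory
existing = [1024, 2048]   # running Pods requesting 3Gi in total (assumed split)

print(fits(node_mi, existing, 1536))  # 1.5Gi request -> False
print(fits(node_mi, existing, 1024))  # 1Gi request   -> True
```

Note that nothing in the check refers to how much memory the running Pods are using, which is the point of the example.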
Limits — what's enforced at runtime
A limit is a hard ceiling applied while the container is actually running, independent of scheduling. The kubelet and container runtime configure it as cgroup settings, and the Linux kernel enforces it:
- CPU limit exceeded: the container is throttled (its CPU time is capped via the Linux kernel's CFS quota mechanism) — it keeps running, just slower, never killed for this reason alone.
- Memory limit exceeded: the container is OOMKilled (terminated by the kernel's out-of-memory mechanism) — memory can't be "throttled" the way CPU can, since there's no meaningful way to make an over-budget memory allocation just "run slower."
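As a rough illustration of the CPU side, the sketch below shows how a CPU limit maps to a CFS quota per scheduling period. It assumes the common default period of 100 ms; the function names are hypothetical, and the model ignores details such as multi-threaded containers spending quota on several cores at once.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may consume per period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_us(demand_us: int, cpu_limit_millicores: int,
                 period_us: int = CFS_PERIOD_US) -> int:
    """CPU time the container wanted in one period but had to wait for."""
    return max(0, demand_us - cfs_quota_us(cpu_limit_millicores, period_us))


print(cfs_quota_us(500))          # 50000: a 500m limit allows 50 ms per 100 ms
print(throttled_us(80_000, 500))  # 30000: wanted 80 ms, throttled for 30 ms
print(throttled_us(20_000, 500))  # 0: under the quota, no throttling
```

The container is never killed here; work that does not fit in the quota simply waits for the next period.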
Why requests and limits are allowed to differ
Setting a limit higher than the request lets a container burst above its guaranteed baseline when the node has spare capacity, which suits workloads with spiky resource needs that don't need their peak usage permanently reserved. It also means a node can be overcommitted: the sum of all limits on a node can exceed the node's actual capacity, on the assumption that not every container hits its limit at the same time. If several containers do burst simultaneously and collectively exceed the node's real capacity, something has to give. CPU contention is resolved by throttling, with CPU time shared in proportion to requests, while under memory pressure the order in which containers are killed or Pods evicted is governed by Quality of Service classes (see that question), which are derived directly from how requests and limits relate to each other.
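How the QoS class follows from requests and limits can be sketched as below. This is a simplified model with a hypothetical function name: it compares quantities as plain strings (so "500m" and "0.5" would wrongly count as different) and only considers CPU and memory.

```python
def qos_class(containers: list) -> str:
    """Derive a Pod's QoS class from its containers' requests and limits.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            anything_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A request that is omitted defaults to the limit, if one is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


burstable = [{"requests": {"cpu": "250m", "memory": "256Mi"},
              "limits": {"cpu": "500m", "memory": "512Mi"}}]
pinned = [{"requests": {"cpu": "500m", "memory": "512Mi"},
           "limits": {"cpu": "500m", "memory": "512Mi"}}]

print(qos_class(burstable))  # Burstable
print(qos_class(pinned))     # Guaranteed
print(qos_class([{}]))       # BestEffort
```

The manifest from "Defining both" lands in Burstable because its limits are higher than its requests.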
What happens with no requests/limits set at all
A container with neither specified is treated as needing essentially nothing for scheduling purposes (it can be scheduled anywhere, even a nearly-full node) and has no enforced ceiling at runtime — it can consume as much of the node's resources as are physically available, potentially starving other, properly-configured workloads on the same node. This is almost always a misconfiguration in production — every container should have explicit requests (so the scheduler makes sound placement decisions) and, in most cases, limits (so one misbehaving container can't take down everything else sharing its node).
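One way to catch this before deployment is a small check over the Pod spec. The sketch below is a hypothetical helper operating on a spec already loaded into a Python dict; it only reports fields that are not set explicitly and does not account for defaults that a cluster might inject.

```python
def unset_resources(pod_spec: dict) -> list:
    """List every CPU/memory request or limit that a container leaves unset."""
    findings = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for kind in ("requests", "limits"):
            declared = resources.get(kind) or {}
            for resource in ("cpu", "memory"):
                if resource not in declared:
                    findings.append(f"{container['name']}: {kind}.{resource} not set")
    return findings


pod_spec = {
    "containers": [
        {"name": "app",
         "resources": {"requests": {"cpu": "250m", "memory": "256Mi"},
                       "limits": {"memory": "512Mi"}}},
        {"name": "sidecar"},  # no resources block at all
    ]
}

for finding in unset_resources(pod_spec):
    print(finding)
```

Whether a missing CPU limit should count as a problem is a policy decision; the check only surfaces what is absent.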
Set requests based on realistic, measured typical usage (not a guess), and set limits based on the worst-case acceptable ceiling for that workload — too tight a memory limit causes unnecessary OOMKills under normal, expected load spikes; too loose (or absent) a limit risks one container starving its node-mates. This is a tuning process that benefits from real monitoring data, not a one-time guess made at initial deployment.