What happens when a pod exceeds its memory limit vs. its CPU limit?

Quick Answer

Exceeding a **memory** limit gets the container killed by the kernel's out-of-memory (OOM) killer, and Kubernetes reports it as OOMKilled. Memory usage can't be gracefully "slowed down," so once the kernel can no longer reclaim enough memory to stay under the limit, termination is the only enforcement mechanism. Exceeding a **CPU** limit does *not* kill the container. The kernel's CFS (Completely Fair Scheduler) bandwidth-control quota throttles it instead, granting it less CPU time than it's trying to use. The process keeps running, just slower, with no crash or restart involved.

Detailed Answer

Why memory and CPU are fundamentally different resources to limit

Memory is incompressible: a process either has the memory it needs, or it doesn't, and there's no meaningful way to give a process "half the memory, but slower." CPU, by contrast, is compressible: a process can be given less CPU time per unit of wall-clock time and simply run more slowly, without crashing. This difference is exactly why Kubernetes, relying on the Linux kernel's cgroup mechanisms, enforces the two limit types so differently.

Memory limit exceeded — OOMKilled

```yaml
resources:
  limits:
    memory: "256Mi"
```

If a container's memory usage reaches its limit and the kernel cannot reclaim enough memory (for example, by dropping page cache) to stay under it, the Linux kernel's OOM (Out-Of-Memory) killer terminates the offending process with SIGKILL, which ends the container. There is no warning signal and no graceful degradation. `kubectl describe pod` then shows the container's last state as `OOMKilled`, typically with exit code 137 (128 + 9, the signal number of SIGKILL).

```shell
kubectl describe pod my-pod
# Last State:  Terminated
#   Reason:    OOMKilled
#   Exit Code: 137
```
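
To see where 137 comes from without a cluster, the sketch below (plain POSIX shell, no Kubernetes involved) has a child shell send itself SIGKILL, the same signal the OOM killer delivers, and then decodes the parent's exit status using the shell convention of 128 plus the signal number. It only illustrates the arithmetic; it does not reproduce an actual OOM kill.

```shell
#!/bin/sh
# Run a child shell that sends SIGKILL (signal 9) to itself: the same
# signal the kernel's OOM killer delivers. No memory pressure is involved.
sh -c 'kill -KILL $$'
status=$?
echo "exit status: $status"

# By shell convention, a status above 128 means "killed by signal (status - 128)".
if [ "$status" -gt 128 ]; then
  sig=$((status - 128))
  echo "killed by signal $sig ($(kill -l "$sig"))"
fi
```

On a typical Linux system this prints `exit status: 137` followed by `killed by signal 9 (KILL)`.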

Depending on the Pod's `restartPolicy`, the kubelet then restarts the container. If the underlying cause (a memory leak, or a limit set genuinely too low for the workload's real needs) isn't addressed, the result is a repeating OOMKill-then-restart cycle, visible as `CrashLoopBackOff` (see the troubleshooting topic).
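
The restart cycle is rate-limited rather than immediate. As a sketch of the default behavior described in the Kubernetes documentation (a back-off that starts at 10 seconds, doubles after each failed restart, is capped at five minutes, and resets once the container has run cleanly for 10 minutes), the loop below prints the delays before seven consecutive restarts. The loop and its variable names are illustrative only. The kubelet's real implementation differs, and newer Kubernetes releases may allow these values to be tuned.

```shell
#!/bin/sh
# Illustrative model of the kubelet's default restart back-off:
# start at 10s, double after each crash, never exceed 300s (5 minutes).
delay=10
cap=300
schedule=""
for restart in 1 2 3 4 5 6 7; do
  # Append this restart's delay, space-separated.
  schedule="$schedule${schedule:+ }${delay}s"
  delay=$((delay * 2))
  [ "$delay" -gt "$cap" ] && delay=$cap
done
echo "$schedule"
```

This prints `10s 20s 40s 80s 160s 300s 300s`, which is why a crash-looping pod spends most of its time waiting rather than running.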

CPU limit exceeded — throttled, not killed

```yaml
resources:
  limits:
    cpu: "500m"     # 0.5 CPU cores
```

If a container tries to use more CPU than its limit allows, the kernel's CFS bandwidth control (which Kubernetes configures through the container's cgroup) caps how much CPU time the container's processes may consume within each scheduling period, 100 ms by default. With `cpu: "500m"`, the quota is 50 ms of CPU time per 100 ms period; once the container has used it up, its threads are paused until the next period begins. The processes keep running, but the application as a whole slows down. Because the quota is counted across all threads, a multi-threaded process can exhaust it early in a period and then stall for the remainder. There's no crash, no restart, and nothing in `kubectl describe pod` beyond the application simply running slower than expected.
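
To make the quota arithmetic concrete, here is a sketch assuming the default 100 ms period and the 500m limit above. It uses the usual conversion (quota = millicores × period ÷ 1000) and prints the cgroup v2 `cpu.max` value this corresponds to. `busy_threads` is a hypothetical count of threads that are all runnable at the same time, and the model deliberately ignores blocking, core count, and other scheduler details.

```shell
#!/bin/sh
# Simplified model of CFS bandwidth control for a CPU limit.
# Assumptions: default 100ms period, every busy thread always runnable,
# enough cores for all threads to run in parallel.
period_us=100000        # CFS period: 100ms
limit_millicores=500    # resources.limits.cpu: "500m"
busy_threads=4          # hypothetical: 4 threads competing for CPU at once

# Quota per period = (millicores / 1000 cores) * period
quota_us=$((limit_millicores * period_us / 1000))

# N threads running in parallel consume the quota N times as fast.
run_us=$((quota_us / busy_threads))
throttled_us=$((period_us - run_us))

echo "cpu.max equivalent: $quota_us $period_us"
echo "runs for ${run_us}us, then throttled for ${throttled_us}us of every ${period_us}us period"
```

With four busy threads, the container spends its whole 50 ms budget in the first 12.5 ms of wall-clock time and then waits out the remaining 87.5 ms, which is how a modest CPU limit can add tens of milliseconds to individual requests even when average usage looks low.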

Why this asymmetry catches people off guard

An application that slowly leaks memory will eventually be OOMKilled and restarted: a visible, loud, easy-to-notice failure mode. An application that's CPU-throttled just becomes quietly slower, often showing up as elevated request latency or timeouts with no corresponding Kubernetes-level event, which can send an on-call engineer looking everywhere except at CPU throttling. Checking throttling metrics directly, such as cAdvisor's `container_cpu_cfs_throttled_periods_total` (commonly scraped by Prometheus), is an essential and often-overlooked step when diagnosing unexplained latency in a containerized application, precisely because throttling produces no failure event the way an OOMKill does.
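
The Prometheus metric is one view of counters the kernel keeps per cgroup. As a sketch, assuming a cgroup v2 node where the container sees its own cgroup at `/sys/fs/cgroup`, the hypothetical helper `throttle_ratio` below reads `nr_periods` and `nr_throttled` from `cpu.stat` and reports the share of enforcement periods in which the container was throttled. cAdvisor's `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total` expose these same two counters.

```shell
#!/bin/sh
# Hypothetical helper: fraction of CFS periods in which a cgroup was throttled.
# Assumes cgroup v2; inside a container the file is usually /sys/fs/cgroup/cpu.stat.
# An optional first argument overrides the path.
throttle_ratio() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0)
        printf "%d/%d periods throttled (%.1f%%)\n", throttled, periods, 100 * throttled / periods
      else
        print "no CFS periods recorded (no CPU limit set?)"
    }' "${1:-/sys/fs/cgroup/cpu.stat}"
}
```

Run `throttle_ratio` inside the container, or pass another `cpu.stat` path as its argument. In Prometheus, the equivalent ratio is typically computed as `rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])`; a persistently high value that coincides with latency complaints points at the CPU limit.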

Set memory limits carefully, and somewhat generously relative to realistic peak usage, because the consequence of getting them wrong (a hard kill) is abrupt. CPU limits can be set with less margin when occasional throttling during bursts is acceptable, because the consequence of getting them wrong is a gradual, if sometimes invisible, degradation rather than a crash. Either way, monitoring both OOMKill events and CPU throttling metrics, rather than guessing at reasonable-sounding numbers, is what tells you whether your configured limits actually match the workload's real behavior.