What's the difference between resource requests and limits?

Detailed Answer

Defining both

apiVersion: v1
kind: Pod
metadata:
  name: app                  # placeholder Pod name
spec:
  containers:
    - name: app
      image: myapp:1.0
      resources:
        requests:
          cpu: "250m"        # 0.25 CPU cores, reserved on the node at scheduling time
          memory: "256Mi"    # 256 MiB, reserved on the node at scheduling time
        limits:
          cpu: "500m"        # can burst up to 0.5 CPU cores, then gets throttled
          memory: "512Mi"    # hard ceiling: exceeding this gets the container killed

Requests — what the scheduler reserves

The scheduler only places a Pod on a node with enough unreserved capacity to cover the sum of the requests of all the Pod's containers. It doesn't look at the node's actual current usage, only at how much the Pods already scheduled there have requested. This is a deliberately conservative guarantee: if your Pod requests 256Mi of memory, Kubernetes will only place it on a node that has at least that much allocatable memory not yet claimed by other Pods' requests, however much or little those Pods are actually using.

Node with 4Gi total memory, already running Pods requesting 3Gi total
   -> only 1Gi of "requestable" capacity remains, regardless of how much
      of the already-running Pods' memory is ACTUALLY being used right now
   -> a new Pod requesting 1.5Gi will NOT be scheduled here, even if the
      node's actual current usage is well under 4Gi
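The arithmetic in the example above can be sketched in a few lines of Python. This is a simplified model of the resource-fit check only, not the scheduler's real code: the function name `fits` is made up for illustration, and the split of the 3Gi of existing requests into 2Gi and 1Gi is an assumption (the example only gives the total).

```python
# Simplified model of the scheduler's resource-fit check: only requests count,
# actual usage never enters the calculation.
GI = 1024 ** 3  # bytes in one Gi


def fits(node_allocatable: int, scheduled_requests: list[int], new_request: int) -> bool:
    """Return True if new_request fits into capacity not yet reserved by requests."""
    unreserved = node_allocatable - sum(scheduled_requests)
    return new_request <= unreserved


node_memory = 4 * GI
existing = [2 * GI, 1 * GI]  # assumed split; Pods on the node request 3Gi in total

print(fits(node_memory, existing, int(1.5 * GI)))  # False: only 1Gi is unreserved
print(fits(node_memory, existing, 1 * GI))         # True: exactly fills the remainder
```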

Limits — what's enforced at runtime

A limit is a hard ceiling enforced while the container is actually running, independent of scheduling. The kubelet and container runtime configure it as cgroup settings, and the Linux kernel enforces it:

CPU limit exceeded: the container is throttled. Its CPU time is capped by the Linux kernel's CFS quota mechanism, so it keeps running, just slower, and is never killed for this reason alone.
Memory limit exceeded: the container is OOMKilled (terminated by the kernel's out-of-memory killer). Memory can't be "throttled" the way CPU can, because there is no meaningful way to make an over-budget memory allocation just "run slower."
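To make the CPU case concrete, here is a rough sketch of how a CPU limit turns into a CFS quota. It assumes the common default CFS period of 100 ms; the period is configurable, so treat the numbers as illustrative. The helper names `cfs_quota_us` and `throttled_us` are hypothetical, not part of any Kubernetes API.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms, in microseconds


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may consume in each period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_us(demand_us: int, cpu_limit_millicores: int) -> int:
    """CPU time the container wanted in one period but was denied by the quota."""
    return max(0, demand_us - cfs_quota_us(cpu_limit_millicores))


print(cfs_quota_us(500))          # 50000: a 500m limit allows 50 ms of CPU per 100 ms
print(throttled_us(80_000, 500))  # 30000: wanting 80 ms means 30 ms is deferred
```

Under this model, a container that wants more CPU than its quota waits until the next period instead of being terminated, which is why an exceeded CPU limit shows up as latency rather than restarts.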

Why requests and limits are allowed to differ

Setting a limit higher than the request lets a container burst above its guaranteed baseline when the node has spare capacity, which suits workloads with spiky resource needs that don't justify reserving their peak usage permanently. It also means a node can be overcommitted: the sum of all limits on a node can exceed the node's actual capacity, on the expectation that not every container hits its limit at the same time. If several containers do burst at once and together exceed what the node really has, something has to give. Contended CPU is shared out in proportion to requests, so containers are slowed rather than killed. Memory can't be shared that way, so some containers are killed or their Pods evicted, and which ones go first depends largely on Quality of Service classes (see that question), which are derived directly from how requests and limits relate to each other.
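As a rough illustration of how a QoS class follows from requests and limits, the sketch below classifies a Pod into the three classes Kubernetes uses (Guaranteed, Burstable, BestEffort). It is a simplification: the function `qos_class` and the dict layout are made up for this example, and quantities are compared as literal strings, so "500m" and "0.5" would not count as equal here even though Kubernetes treats them as the same amount.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' requests and limits (simplified)."""
    resources = ("cpu", "memory")

    # BestEffort: no container sets any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"

    def is_guaranteed(c: dict) -> bool:
        limits = c.get("limits") or {}
        # An omitted request defaults to the limit, so fall back to it here.
        requests = {**limits, **(c.get("requests") or {})}
        return all(r in limits and requests[r] == limits[r] for r in resources)

    # Guaranteed: every container has CPU and memory limits equal to its requests.
    if all(is_guaranteed(c) for c in containers):
        return "Guaranteed"
    return "Burstable"


app = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
print(qos_class([app]))  # Burstable: limits are set but higher than requests
print(qos_class([{}]))   # BestEffort: nothing specified at all
```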

What happens with no requests/limits set at all

A container with neither specified is treated as needing essentially nothing for scheduling purposes, so it can be placed anywhere, even on a nearly full node. It also has no enforced ceiling at runtime: it can consume as much of the node's resources as are physically available, potentially starving other, properly configured workloads on the same node. This is almost always a misconfiguration in production. Every container should have explicit requests (so the scheduler makes sound placement decisions) and, in most cases, limits (so one misbehaving container can't take down everything else sharing its node).

Set requests based on realistic, measured typical usage (not a guess), and set limits at the worst-case ceiling you are willing to accept for that workload. Too tight a memory limit causes unnecessary OOMKills under normal, expected load spikes; too loose (or absent) a limit risks one container starving its node-mates. This is an ongoing tuning process that benefits from real monitoring data, not a one-time guess made at initial deployment.
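One possible way to turn monitoring data into starting values is sketched below. The heuristic (request at the 90th percentile of observed usage, limit at the observed peak plus 25% headroom) is an assumption chosen for illustration, not a Kubernetes recommendation, and the sample values and the names `percentile` and `suggest_memory` are hypothetical.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def suggest_memory(samples_mib: list[float], headroom: float = 1.25) -> tuple[int, int]:
    """Return (request, limit) in MiB: request from p90 usage, limit from peak plus headroom."""
    request = math.ceil(percentile(samples_mib, 90))
    limit = math.ceil(max(samples_mib) * headroom)
    return request, limit


usage = [180, 190, 200, 210, 205, 220, 240, 230, 400, 215]  # hypothetical MiB samples
print(suggest_memory(usage))  # (240, 500)
```

The single 400 MiB spike barely moves the request but does raise the limit, which matches the intent above: reserve for typical usage and cap at an acceptable worst case. Re-running this on fresh data from time to time is what keeps the values honest.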