What is Pod priority and preemption?

6 min · advanced · pod-priority · preemption · scheduling

Quick Answer

A PriorityClass assigns a numeric priority value to Pods. When the scheduler can't find room for a high-priority Pod, it can **preempt** (evict) lower-priority Pods on a node to free enough resources for it, rather than leaving the high-priority Pod stuck Pending. This lets critical workloads get scheduled even under resource contention, at the cost of the evicted Pods being terminated and their replacements having to be scheduled elsewhere (or wait). Priority is specifically a mechanism for resolving scheduling conflicts, not a general-purpose importance label.

Detailed Answer

Defining priority classes

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000
globalDefault: false
description: "Batch jobs, safe to preempt"
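---
# A Pod that opts into the high-priority class.
# metadata.name is illustrative; any valid Pod name works.
apiVersion: v1
kind: Pod
metadata:
  name: critical-service
spec:
  priorityClassName: high-priority
  containers:
    - name: critical-service
      image: myapp:1.0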
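
Most workloads are created through a controller, not as bare Pods, so the class is usually set in the Pod template. The following is a minimal sketch of that, assuming a hypothetical Deployment named critical-service; the labels, replica count, and resource requests are illustrative and not taken from any real setup.

```yaml
# Sketch: a Deployment whose Pods all carry the high-priority class.
# Names, labels, replica count, and request sizes are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      # Must name an existing PriorityClass, or Pod creation is rejected.
      priorityClassName: high-priority
      containers:
        - name: critical-service
          image: myapp:1.0
          resources:
            requests:        # the scheduler places Pods based on requests
              cpu: "500m"
              memory: "256Mi"
```

The field sits at spec.template.spec.priorityClassName, the same position it occupies under spec in a bare Pod.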

What happens under resource contention

When the scheduler evaluates a Pod with priorityClassName: high-priority and finds that no node has enough free capacity for it, it doesn't simply leave the Pod Pending (as would happen without priority/preemption configured). Instead, it looks for a node where removing one or more lower-priority Pods would free enough room. If it finds one, it evicts those Pods, honoring their termination grace periods, and the higher-priority Pod can be placed once the space is free. PodDisruptionBudgets are respected only on a best-effort basis: the scheduler prefers victims whose eviction would not violate a PDB, but it will still preempt in violation of one if no other candidates exist.

Node is full, running several "low-priority" batch Pods.
A new "high-priority" Pod can't be scheduled anywhere else.
   → Scheduler preempts (evicts) enough low-priority Pods on this node
     to free sufficient capacity
   → The high-priority Pod is scheduled into the freed space
   → The evicted low-priority Pods are terminated; replacements created by
     their controllers start out Pending, to be scheduled elsewhere (or wait)
     once capacity is available again
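
While the victims are shutting down, the preempting Pod is still Pending, but the scheduler records which node it is making room on. The excerpt below is a hand-written illustration of what the Pod object might look like at that moment, not captured output; the Pod name and node name (node-1) are placeholders.

```yaml
# Illustrative excerpt of the pending high-priority Pod during preemption.
# The Pod name and node name are assumed placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: critical-service
spec:
  priorityClassName: high-priority
  priority: 1000000            # filled in at admission from the PriorityClass value
status:
  phase: Pending
  nominatedNodeName: node-1    # node where the scheduler is freeing capacity
```

The nomination is a hint and not a binding: if another node frees up first, the Pod may be scheduled there.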

Why this matters: guaranteeing critical workloads actually run

Without priority/preemption, resource contention is resolved purely by scheduling order and luck — a critical production service could, in principle, be stuck Pending indefinitely behind a cluster full of lower-importance batch jobs that happened to claim capacity first. Priority and preemption give the scheduler an explicit mechanism to say "this matters more, and if something has to give, it should be something less important" — a real operational safeguard, not just a label for humans to read.

Important caveats

  • Preemption isn't guaranteed to succeed — if evicting every lower-priority Pod on every node still wouldn't free enough capacity for the high-priority Pod (e.g., it needs more resources than any single node has, period), preemption can't help, and the Pod remains Pending regardless of its priority.
  • Preempted Pods don't get a persistent, tracked "we owe you a slot" guarantee. They're simply terminated, and any replacements their controllers create go back through normal scheduling, competing for whatever capacity exists at that point (and possibly getting preempted again if a still-higher-priority Pod later needs the space). A bare Pod with no controller is not recreated at all.
  • A globalDefault: true PriorityClass applies automatically to any Pod that doesn't specify one explicitly — worth being deliberate about, since an unconfigured cluster otherwise treats every Pod as equal priority (effectively no preemption benefit for anything).
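
As a sketch of the globalDefault caveat above, a cluster-wide default class could look like the following. The name default-priority and the value 10000 are assumptions chosen for illustration only.

```yaml
# Sketch: a default PriorityClass for Pods that don't name one.
# The name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 10000
globalDefault: true    # only one PriorityClass in the cluster may set this
description: "Default for Pods without an explicit priorityClassName"
```

Without any globalDefault class, Pods that omit priorityClassName get priority 0. Adding a default affects only Pods created afterwards; existing Pods keep the priority they already have.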

Reserve priority classes for genuinely meaningful tiers (e.g., "critical production," "standard production," "best-effort batch") rather than a large number of finely-graded levels that are hard to reason about — and always design PriorityClasses alongside PodDisruptionBudgets and resource requests/limits, since all three interact in determining what actually gets evicted and when under real contention.
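
A small tier scheme along those lines might be sketched as follows. The class names mirror the example tiers above; the numeric values, the batch-worker label, and the PDB threshold are all assumptions to be adapted, not recommended settings.

```yaml
# Sketch: three coarse tiers plus a PDB for the preemptible tier.
# All names, values, and labels are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-production
value: 1000000
description: "Critical production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: standard-production
value: 100000
description: "Standard production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-batch
value: 1000
description: "Batch jobs, safe to preempt"
---
# Asks that at least 2 batch workers stay up during disruptions.
# Preemption honors this only on a best-effort basis.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-worker-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: batch-worker
```

Wide gaps between the values leave room to add a tier later without renumbering the existing ones.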