How do you safely upgrade a Kubernetes cluster with minimal downtime?
Quick Answer
Upgrade the control plane first, then the worker nodes. On managed services the control-plane upgrade is typically non-disruptive to running workloads, because the control plane is separate from where application Pods run. Upgrade worker nodes one at a time (or in small batches): cordon and drain each node so its Pods move elsewhere while respecting PodDisruptionBudgets, upgrade or replace the node, then uncordon it so it rejoins the pool. Before upgrading, always check the target version's changelog for deprecated and removed APIs, since a cluster that still uses a removed API version will break, not just warn.
Detailed Answer
The general order: control plane, then nodes
Kubernetes supports the control plane running a newer minor version than the kubelets on worker nodes, within the bounds of the official version skew policy: in recent releases a kubelet may be up to three minor versions older than the API server (two in older releases), and it must never be newer. This is what allows upgrading the control plane first without simultaneously upgrading every node, so the upgrade can be spread out rather than done as one all-at-once cutover.
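As a rough illustration of the skew rule, the sketch below compares two version strings and reports whether a kubelet is within an allowed distance of the control plane. The function names minor_of and skew_ok are hypothetical, and the default limit of 3 is an assumption that should be checked against the version skew policy for the release in use.

```shell
# Sketch only: minor_of and skew_ok are illustrative helper names.

# minor_of "1.29.3" or "v1.29.3" prints the minor component (29).
minor_of() {
  local v="${1#v}"   # drop an optional leading "v"
  v="${v#*.}"        # drop the major component and its dot
  echo "${v%%.*}"    # keep only the minor component
}

# skew_ok <control-plane-version> <kubelet-version> [max-skew]
# Succeeds when the kubelet is not newer than the control plane and is
# at most max-skew minor versions older. The default of 3 is an assumption.
skew_ok() {
  local cp kubelet max_skew="${3:-3}"
  cp="$(minor_of "$1")"
  kubelet="$(minor_of "$2")"
  local diff=$(( cp - kubelet ))
  [ "$diff" -ge 0 ] && [ "$diff" -le "$max_skew" ]
}
```

For example, `skew_ok 1.29.0 1.27.5 && echo "within skew"` succeeds, while `skew_ok 1.28.0 1.29.0` fails because the kubelet would be newer than the control plane. Real kubelet versions can be read from each node's status.nodeInfo.kubeletVersion field.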
Step 1: check for deprecated/removed APIs before upgrading
Kubernetes deprecates and eventually removes old API versions on a predictable schedule. A notable historical example: 1.16 stopped serving several workload resources, such as Deployments, under extensions/v1beta1 and apps/v1beta1. A manifest or controller that still uses a removed API version will fail outright on the new version, not just print a warning. Tools like pluto or kube-no-trouble scan manifests, Helm charts, or a cluster's live objects for soon-to-be-removed or already-removed API versions, and kubectl-convert can rewrite manifest files to a newer API version, letting you fix these proactively before the upgrade rather than discovering the breakage during or after it.
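Where none of those tools is available, a crude text search over a directory of manifests can give a first signal. The sketch below is an assumption-laden stand-in, not a replacement for a real scanner: scan_removed_apis is a hypothetical name, it only sees YAML files on disk (not live objects or Helm releases), and it only matches the two API groups named above.

```shell
# Sketch only: a plain grep over manifest files, not a real API scanner.

# scan_removed_apis <dir>
# Prints file:line:content for every apiVersion line that uses
# extensions/v1beta1 or apps/v1beta1. Exits non-zero when nothing matches.
scan_removed_apis() {
  local dir="$1"
  grep -rnE \
    '^[[:space:]]*apiVersion:[[:space:]]*(extensions/v1beta1|apps/v1beta1)[[:space:]]*$' \
    --include='*.yaml' --include='*.yml' "$dir"
}
```

A call such as `scan_removed_apis ./manifests` lists candidate files to review. A hit is only a prompt to check the object's kind against the migration guide, since different kinds in the same API group were removed in different releases.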
Step 2: upgrade the control plane
On a managed service (EKS, GKE, AKS), this is typically a single action the cloud provider handles largely automatically and with minimal disruption, since the control plane's own components (API server, etcd, scheduler) are separate from where application workloads actually run. On a self-managed cluster (kubeadm), this means upgrading each control-plane node's components in sequence, ensuring etcd quorum is maintained throughout (never upgrading so many control-plane nodes simultaneously that you lose quorum — see the etcd question).
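The quorum constraint comes down to simple arithmetic: etcd needs a majority of its members available, so the number of control-plane nodes that can be offline at once is small. The helper names below are hypothetical and the sketch only does the arithmetic; it does not inspect a real etcd cluster.

```shell
# Sketch only: majority-quorum arithmetic for an etcd cluster of N members.

# quorum_size <members>: members needed for a majority, floor(n/2) + 1
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerable_failures <members>: members that can be offline at the same
# time while a majority remains
tolerable_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

With three members, `quorum_size 3` prints 2 and `tolerable_failures 3` prints 1, which is why control-plane nodes in a three-node cluster are upgraded strictly one at a time.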
Step 3: upgrade worker nodes, one at a time (or in small batches)
kubectl cordon node-1 # mark node-1 as unschedulable -- no NEW pods land here
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# ... drain evicts existing Pods, respecting PodDisruptionBudgets (see that question) ...
# ... upgrade node-1's kubelet/OS, or replace it entirely with a new node ...
kubectl uncordon node-1 # allow new pods to be scheduled here again
- cordon marks a node unschedulable for new Pods without touching Pods already running there; it is a preparatory step.
- drain additionally evicts existing Pods from the node (via the Eviction API, respecting PodDisruptionBudgets), letting their controllers (Deployments, StatefulSets) reschedule them onto other, still-available nodes before this one is taken offline for the upgrade.
- Repeating this one node (or a small batch) at a time, rather than draining every node simultaneously, is what keeps the application available throughout the process. This assumes applications have enough replicas spread across enough nodes, and correctly configured PodDisruptionBudgets, to tolerate losing one node's worth of capacity at a time.
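The one-node-at-a-time sequence can be sketched as a small loop. This is a minimal illustration under stated assumptions: rolling_node_upgrade and upgrade_node_hook are hypothetical names, the hook body is a placeholder for whatever actually upgrades or replaces a node in a given environment, and the timeout values are arbitrary.

```shell
# Sketch only: process nodes strictly one at a time, stopping on failure.

# upgrade_node_hook is a hypothetical placeholder. Replace its body with
# the real upgrade or replacement step for your environment.
upgrade_node_hook() {
  echo "upgrade step for $1 goes here"
}

# rolling_node_upgrade <node>...
# Stops at the first failure, leaving that node cordoned for inspection.
rolling_node_upgrade() {
  local node
  for node in "$@"; do
    kubectl cordon "$node" || return 1
    # --timeout bounds the wait if a PodDisruptionBudget blocks eviction
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data \
      --timeout=10m || return 1
    upgrade_node_hook "$node" || return 1
    kubectl uncordon "$node" || return 1
    # Wait for the node to report Ready before touching the next one
    kubectl wait --for=condition=Ready "node/$node" --timeout=5m || return 1
  done
}
```

It would be invoked as `rolling_node_upgrade node-1 node-2 node-3`. Stopping at the first failure is a deliberate choice here: a drain that times out usually means a PodDisruptionBudget cannot be satisfied, and continuing to the next node would only reduce capacity further.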
Why this is often easier on managed cloud clusters
Managed Kubernetes offerings frequently provide a largely automated node-pool upgrade process that performs exactly this cordon/drain/replace sequence across a node pool with configurable batching/surge settings, substantially reducing the manual orchestration burden compared to a fully self-managed kubeadm cluster, where an operator (or custom automation) is responsible for sequencing this correctly.
Rollback and canary-style caution for multi-version jumps
Kubernetes officially supports upgrading the control plane one minor version at a time (e.g., 1.27 → 1.28 → 1.29, not straight from 1.27 to 1.29). Skipping minor versions isn't supported and can produce unpredictable results. For especially risk-sensitive environments, some teams first validate an upgrade against a staging cluster running representative workloads before applying it to production, treating a cluster upgrade with the same caution as a major application deployment rather than a routine background task.
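To make the one-minor-version-at-a-time rule concrete, the sketch below lists the intermediate hops between two minor versions. upgrade_path is a hypothetical helper, and it assumes a major version of 1 and takes bare minor numbers as input.

```shell
# Sketch only: enumerate the minor-version hops for a planned upgrade.

# upgrade_path <current-minor> <target-minor>
# Prints each hop as 1.<minor>, one per line. Assumes major version 1.
upgrade_path() {
  local current="$1" target="$2" next
  if [ "$target" -lt "$current" ]; then
    echo "target must not be older than current" >&2
    return 1
  fi
  next=$(( current + 1 ))
  while [ "$next" -le "$target" ]; do
    echo "1.$next"
    next=$(( next + 1 ))
  done
}
```

Running `upgrade_path 27 29` prints 1.28 and then 1.29, meaning two separate upgrade rounds, each followed by its own validation.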
Mentioning cordon/drain specifically (not just "upgrade the nodes"), the API-deprecation-checking step, and the one-minor-version-at-a-time constraint together demonstrates real hands-on cluster operations experience, rather than only conceptual familiarity with the idea that clusters need periodic upgrades.