What's the difference between a blue-green and a canary deployment, and how do you implement each in Kubernetes?

8 min · advanced · blue-green-deployment · canary-deployment · progressive-delivery

Quick Answer

A blue-green deployment runs the new version fully alongside the old one, then switches all traffic over in a single cutover, typically by updating a Service's label selector. Rollback is near-instant as long as the old version is still running, but the approach temporarily needs double the resources and offers no gradual, partial-traffic validation. A canary deployment shifts a small, increasing percentage of traffic to the new version while most traffic still goes to the old one, so you can validate the new version under a limited slice of real production traffic before fully committing. Precise percentage splits typically require a service mesh or an Ingress controller with traffic-splitting capability, because neither a plain Service nor the Deployment rolling update mechanism supports percentage-based traffic splitting natively.

Detailed Answer

Blue-green — two full environments, instant cutover

# "Blue" (current, live) Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: blue
  template:
    metadata:
      labels:
        app: web
        version: blue
    spec:
      containers:
        - name: web            # a container name is required
          image: myapp:1.0
---
# "Green" (new version), deployed ALONGSIDE blue, not yet receiving traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: green
  template:
    metadata:
      labels:
        app: web
        version: green
    spec:
      containers:
        - name: web            # a container name is required
          image: myapp:2.0
---
# Service currently points at "blue" -- cutover happens by changing this selector
apiVersion: v1
kind: Service
metadata:
  name: web
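spec:
  selector:
    app: web
    version: blue    # <-- change this to "green" to cut over ALL traffic at once
  ports:
    - port: 80          # assumed Service port; adjust for your app
      targetPort: 8080  # assumed container port; adjust for your app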

Both full versions run simultaneously, which temporarily doubles resource usage, and the switch is a single, near-instant change to the Service's selector. New connections made after the switch go to the new version, and rollback is equally fast as long as the blue Deployment is still running: switch the selector back. There is no gradual traffic-splitting phase; each version is either fully on or fully off.
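The cutover described above can be sketched as two Service manifests. This is a minimal illustration, not a definitive setup: the `web-preview` Service is a hypothetical addition for smoke-testing green before the switch, and the port numbers (80 and 8080) are assumptions that do not appear in the original manifests.

```yaml
# Hypothetical preview Service: reaches green Pods only, for checks before cutover
apiVersion: v1
kind: Service
metadata:
  name: web-preview        # assumed name, not from the original
spec:
  selector:
    app: web
    version: green
  ports:
    - port: 80             # assumed port numbers
      targetPort: 8080
---
# Live Service after cutover: only the version label in the selector changes
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: green         # was "blue"; set back to "blue" to roll back
  ports:
    - port: 80             # assumed port numbers
      targetPort: 8080
```

Applying the second manifest with `kubectl apply -f` performs the switch. Keeping the blue Deployment running for a while afterwards is what keeps the rollback path cheap.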

Canary — gradual, partial traffic shift

# 90% of traffic to the stable version, 10% to the canary --
# this typically requires something beyond a plain Kubernetes Service,
# since Services load-balance evenly across all matching endpoints,
# not by a configurable percentage split.
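Because a Service spreads traffic roughly evenly across matching Pods, a coarse canary can be approximated with replica counts alone. The sketch below assumes a hypothetical `track` label, Deployment names `web-stable` and `web-canary`, and ports 80/8080, none of which come from the original. With 9 stable Pods and 1 canary Pod, the canary receives roughly 10% of requests, but the ratio is tied to replica counts and is only approximate.

```yaml
# Stable version: 9 of the 10 Pods behind the Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web
      track: stable
  template:
    metadata:
      labels:
        app: web
        track: stable
    spec:
      containers:
        - name: web
          image: myapp:1.0
---
# Canary version: 1 of the 10 Pods behind the Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
      track: canary
  template:
    metadata:
      labels:
        app: web
        track: canary
    spec:
      containers:
        - name: web
          image: myapp:2.0
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web               # no track label, so Pods from both Deployments match
  ports:
    - port: 80             # assumed port numbers
      targetPort: 8080
```

This trades precision for simplicity: a 1% canary would need 99 stable Pods, which is why the dedicated traffic-splitting tools exist.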

Plain Kubernetes Services have no native concept of "send 10% of traffic here, 90% there" — achieving genuine percentage-based traffic splitting requires additional tooling:

  • Ingress controllers with canary annotations (e.g., NGINX Ingress's nginx.ingress.kubernetes.io/canary-weight annotation): a simpler, Ingress-layer approach that only affects traffic entering through that Ingress.
  • Service mesh traffic splitting (Istio's VirtualService weighted routing, Linkerd's traffic split): more powerful, supporting fine-grained routing rules (by header, by percentage, gradually shifting over time) at the service-mesh layer.
  • Progressive delivery controllers (Argo Rollouts, Flagger): purpose-built tools that automate the entire canary process. They shift traffic percentages on a schedule, query metrics (error rate, latency) at each step, and roll back automatically if the canary's metrics look worse than the stable version's, so nobody has to watch dashboards and adjust percentages by hand.
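For the Ingress-layer option in the list above, a canary Ingress might look like the following sketch. It assumes the NGINX Ingress controller, an existing primary Ingress for the same host and path that routes to the stable Service, and hypothetical names (`web-canary`, `web.example.com`, port 80) that are not from the original.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"        # marks this Ingress as the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"   # roughly 10% of requests
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com          # must match the primary Ingress host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-canary     # hypothetical Service selecting canary Pods
                port:
                  number: 80
```

Raising the weight annotation step by step shifts more traffic to the canary. Deleting the canary Ingress sends everything back to the stable Service.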
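# Simplified Argo Rollouts canary strategy concept
# (a real Rollout also needs metadata, a selector, and a Pod template)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      # Without a trafficRouting section, the weight is approximated
      # by scaling the ratio of new to old replicas.
      steps:
        - setWeight: 10            # send 10% of traffic to the new version
        - pause: {duration: 10m}   # wait; a plain pause does not check metrics by itself
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100           # fully cut over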
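For the service-mesh option mentioned earlier, weighted routing in Istio can be sketched as a DestinationRule plus a VirtualService. This is an illustration under assumptions: the `track: stable` and `track: canary` Pod labels and the subset names are hypothetical, and the `v1beta1` API version may differ depending on the Istio release in use.

```yaml
# Define two named subsets of the "web" Service by Pod label
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web
spec:
  host: web
  subsets:
    - name: stable
      labels:
        track: stable      # assumed Pod label
    - name: canary
      labels:
        track: canary      # assumed Pod label
---
# Split traffic 90/10 between the subsets, independent of replica counts
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
    - web
  http:
    - route:
        - destination:
            host: web
            subset: stable
          weight: 90
        - destination:
            host: web
            subset: canary
          weight: 10
```

Unlike a replica-ratio split, the weights here are enforced by the mesh proxies, so a single canary Pod can serve 1% or 50% of traffic without changing replica counts.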

Comparing the two

| | Blue-green | Canary |
| --- | --- | --- |
| Resource cost during rollout | Double (both full versions running) | Only slightly more (the canary is usually a small extra replica count) |
| Rollback speed | Near-instant (flip the selector back) | Also fast, but involves shifting the canary's traffic share back to the stable version |
| Real production traffic validation before full commit | No: it's all-or-nothing at cutover | Yes: the whole point is validating under a real (if limited) slice of production traffic |
| Native Kubernetes support | Achievable with plain Services by switching the selector | Percentage-based splits require additional tooling (Ingress annotations, service mesh, or Argo Rollouts/Flagger) |
| Best for | Simpler risk profiles, or when partial-traffic validation isn't feasible or needed | Higher-risk changes where gradually validating real user impact before full rollout matters |

Why rolling updates alone (the Deployment default) aren't the same as either

A standard rolling update (see the workload controllers topic) gradually replaces old Pods with new ones, but every ready Pod, old or new, receives a roughly equal, undifferentiated share of traffic throughout the process. The new version's share therefore simply tracks how many new Pods exist at that moment. There is no deliberate "send only a controlled fraction of traffic to the new version and watch its metrics before proceeding" logic, as a true canary strategy has, and no "two full parallel environments, instant switch" behavior, as blue-green has. Rolling updates are a reasonable default for many workloads; blue-green and canary are deliberate strategies to reach for when the additional safety and control they provide are worth the added complexity.
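For comparison, the rolling update behavior described above is configured on the Deployment itself. The sketch below uses illustrative values for `maxSurge` and `maxUnavailable` (the defaults are 25% each) and an assumed Deployment name of `web`. These fields control how fast Pods are replaced, not how much traffic each version receives.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5
  strategy:
    type: RollingUpdate      # the default strategy type
    rollingUpdate:
      maxSurge: 1            # illustrative: at most 1 Pod above the replica count
      maxUnavailable: 0      # illustrative: never fewer than 5 available Pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:2.0   # changing the image triggers the rolling update
```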