What's the difference between a blue-green and a canary deployment, and how do you implement each in Kubernetes?

Detailed Answer

Blue-green — two full environments, instant cutover

# "Blue" (current, live) Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: blue
  template:
    metadata:
      labels:
        app: web
        version: blue
    spec:
      containers:
        - name: web            # container name is required; "web" is illustrative
          image: myapp:1.0

---
# "Green" (new version), deployed ALONGSIDE blue, not yet receiving traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: green
  template:
    metadata:
      labels:
        app: web
        version: green
    spec:
      containers:
        - name: web            # container name is required; "web" is illustrative
          image: myapp:2.0

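---
# Service currently points at "blue" -- cutover happens by changing this selector
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: blue    # <-- change this to "green" to cut over ALL traffic at once
  ports:
    - port: 80          # assumed Service port (not given in the original)
      targetPort: 8080  # assumed container port; match your application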

Both full versions run simultaneously (temporarily doubling resource usage), and the switch is a single, near-instant change to the Service's selector. Once the endpoints update, new connections go to the new version, although long-lived connections that were already established may stay on the old Pods until they close. Rollback is equally fast: switch the selector back. There's no gradual traffic-splitting phase; each version is either fully on or fully off.
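
As a minimal sketch of the cutover itself, this is the web Service after the switch. The selector change is the only part that matters; the ports values are assumptions added to make the manifest complete, since the original does not specify them.

```yaml
# Service "web" after cutover: the selector now matches the green Pods
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: green   # was "blue"; set it back to "blue" to roll back
  ports:
    - port: 80          # assumed Service port
      targetPort: 8080  # assumed container port
```

Re-applying this manifest (for example with kubectl apply -f) updates the Service in place. Once green has proven stable, the web-blue Deployment can be scaled down or deleted to reclaim the doubled capacity.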

Canary — gradual, partial traffic shift

# 90% of traffic to the stable version, 10% to the canary --
# this typically requires something beyond a plain Kubernetes Service,
# since Services load-balance evenly across all matching endpoints,
# not by a configurable percentage split.

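Plain Kubernetes Services have no native concept of "send 10% of traffic here, 90% there". The closest approximation is to run a small canary Deployment behind the same Service, so that the split roughly follows the replica ratio (1 canary Pod alongside 9 stable Pods receives about 10% of requests). Genuine percentage-based traffic splitting requires additional tooling: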

Ingress controllers with canary annotations (e.g., NGINX Ingress's nginx.ingress.kubernetes.io/canary-weight annotation) — a simpler, Ingress-layer approach.
Service mesh traffic splitting (Istio's VirtualService weighted routing, Linkerd's traffic split) — more powerful, supporting fine-grained routing rules (by header, by percentage, gradually shifting over time) at the service-mesh layer.
Progressive delivery controllers (Argo Rollouts, Flagger) — purpose-built tools that automate the entire canary process: gradually shifting traffic percentages on a schedule, automatically querying metrics (error rate, latency) at each step, and automatically rolling back if the canary's metrics look worse than the stable version's — without needing to manually watch and adjust percentages by hand.
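
For the Ingress-layer option, a canary Ingress for the NGINX Ingress controller might look like the sketch below. It assumes a primary Ingress already routes the same host and path to the stable Service, and that a separate Service named web-canary (hypothetical) selects only the canary Pods. The host web.example.com, the class name, and port 80 are likewise placeholders.

```yaml
# Canary Ingress: mirrors the host/path of the primary Ingress and diverts
# a weighted share of requests to the canary Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-canary                # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"        # marks this Ingress as a canary
    nginx.ingress.kubernetes.io/canary-weight: "10"   # roughly 10% of requests
spec:
  ingressClassName: nginx         # assumes the controller's class is named "nginx"
  rules:
    - host: web.example.com       # placeholder; must match the primary Ingress
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-canary  # hypothetical Service selecting only canary Pods
                port:
                  number: 80      # assumed Service port
```

Raising the weight step by step shifts more traffic to the canary. Setting it to "0" or deleting the canary Ingress sends all requests back to the stable Service.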

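# Simplified Argo Rollouts canary strategy (fragment: metadata, selector,
# and Pod template are omitted; requires the Argo Rollouts controller)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      # Without a configured traffic router, each weight is approximated
      # by the ratio of canary to stable replicas.
      steps:
        - setWeight: 10      # send about 10% of traffic to the new version
        - pause: {duration: 10m}   # wait, observe metrics
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100     # fully cut over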
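
To illustrate the automated metric check that progressive delivery controllers add, here is a hedged sketch of an Argo Rollouts AnalysisTemplate. The template name, the Prometheus address, the http_requests_total metric and its labels, and the 95% threshold are all assumptions for illustration, not values from the original.

```yaml
# Hypothetical success-rate check that a Rollout can run between canary steps
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate               # hypothetical name
spec:
  metrics:
    - name: success-rate
      interval: 1m                 # query once a minute
      count: 5                     # take five measurements, then finish
      failureLimit: 1              # tolerate one failed measurement
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # assumed address
          # Assumed metric and labels; a real query would usually be
          # narrowed to the canary Pods only.
          query: |
            sum(rate(http_requests_total{app="web",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="web"}[5m]))
```

A Rollout would reference it from a step such as "- analysis: {templates: [{templateName: success-rate}]}". If the analysis fails, the controller aborts the rollout and returns traffic to the stable version.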

Comparing the two

	Blue-green	Canary
Resource cost during rollout	Double (both full versions running)	Only slightly more (canary is usually a small extra replica count)
Rollback speed	Instant (flip the selector back)	Also fast, but may involve gradually shifting traffic back
Real production traffic validation before full commit	No — it's all-or-nothing at cutover	Yes — the whole point is validating under a real (if limited) slice of production traffic
Native Kubernetes support	Achievable with plain Services (switch the selector)	Only a rough replica-ratio split with plain Services; precise weights require additional tooling (Ingress annotations, service mesh, or Argo Rollouts/Flagger)
Best for	Simpler risk profiles, or when partial-traffic validation isn't feasible/needed	Higher-risk changes where gradually validating real user impact before full rollout matters

Why rolling updates alone (the Deployment default) aren't the same as either

A standard rolling update (see the workload controllers topic) gradually replaces old Pods with new ones, but every ready Pod, old or new, receives an equal, undifferentiated share of traffic throughout the process. The fraction of traffic reaching the new version simply tracks how many new Pods are ready at that moment. There's no deliberate "send only a controlled fraction of traffic to the new version and watch its metrics before proceeding" logic the way a true canary strategy has, and no "two full parallel environments, instant atomic switch" behavior the way blue-green has. Rolling updates are a reasonable default for many workloads; blue-green and canary are deliberate strategies to reach for when the additional safety and control they provide are worth the added complexity.
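
For contrast, the rolling-update behavior described above is configured on the Deployment itself. This is a minimal sketch that reuses the web and myapp names from the earlier manifests; the maxSurge and maxUnavailable values are illustrative choices, not recommendations from the original.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most 1 Pod above the desired replica count
      maxUnavailable: 0   # never drop below the desired number of available Pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web              # illustrative container name
          image: myapp:2.0       # changing the image triggers the rolling update
```

Here the share of traffic reaching the new version rises one Pod at a time as replacements become ready. That split is a side effect of Pod counts, not something the rollout controls or gates on metrics.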
