What's the difference between a blue-green and a canary deployment, and how do you implement each in Kubernetes?
Quick Answer
A blue-green deployment runs the new version at full scale alongside the old one, then switches all traffic over in a single atomic cutover (typically by updating a Service's label selector). Rollback is instant, but it temporarily requires double the resources and offers no gradual, partial-traffic validation. A canary deployment instead shifts a small but growing percentage of traffic to the new version while most traffic still goes to the old one, so you can validate the new version against a limited slice of real production traffic before fully committing. It typically requires an Ingress controller or service mesh with traffic-splitting support, because Kubernetes's core Deployment rolling update mechanism has no native percentage-based traffic splitting.
Detailed Answer
Blue-green — two full environments, instant cutover
```yaml
# "Blue" (current, live) Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: blue
  template:
    metadata:
      labels:
        app: web
        version: blue
    spec:
      containers:
        - name: web          # Kubernetes requires every container to have a name
          image: myapp:1.0
```
```yaml
# "Green" (new version), deployed ALONGSIDE blue, not yet receiving traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web
      version: green
  template:
    metadata:
      labels:
        app: web
        version: green
    spec:
      containers:
        - name: web          # Kubernetes requires every container to have a name
          image: myapp:2.0
```
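```yaml
# Service currently points at "blue"; cutover happens by changing this selector
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: blue        # <-- change this to "green" to cut over ALL traffic at once
  ports:
    - port: 80
      targetPort: 8080   # assumption: the app listens on port 8080
# One way to flip the selector from the command line:
#   kubectl patch service web -p '{"spec":{"selector":{"version":"green"}}}'
```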
Both versions run at full scale simultaneously (temporarily doubling resource usage), and the switch is a single, near-instant change to the Service's selector. Once the Service's endpoints update, new connections go to the new version, and rollback is equally fast (switch the selector back), provided blue is kept running until you're confident in green. There's no gradual traffic-splitting phase: each version is either fully live or fully idle.
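Because green sits idle at full scale before the cutover, a common companion is a second Service that always selects green, so you can smoke-test the new version with internal or synthetic traffic before flipping the live selector. The sketch below assumes a hypothetical Service name `web-preview` and the same assumed port 8080; Argo Rollouts' blue-green strategy formalizes this idea with its `previewService` field.

```yaml
# Hypothetical "preview" Service: always selects green, so the new version can
# be tested before the live "web" Service is switched over to it.
apiVersion: v1
kind: Service
metadata:
  name: web-preview          # hypothetical name
spec:
  selector:
    app: web
    version: green
  ports:
    - port: 80
      targetPort: 8080       # assumption: the app listens on port 8080
```

The live `web` Service is untouched by this, so production users keep hitting blue while tests run against `web-preview`.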
Canary — gradual, partial traffic shift
Suppose you want 90% of traffic on the stable version and 10% on the canary. A plain Kubernetes Service can't express that: it load-balances evenly across all matching endpoints, with no configurable percentage split. Achieving genuine percentage-based traffic splitting therefore requires additional tooling:
- Ingress controllers with canary annotations (e.g., NGINX Ingress's `nginx.ingress.kubernetes.io/canary-weight` annotation): a simpler, Ingress-layer approach.
- Service mesh traffic splitting (Istio's `VirtualService` weighted routing, Linkerd's traffic splitting): more powerful, supporting fine-grained routing rules (by header, by percentage, shifting gradually over time) at the service-mesh layer.
- Progressive delivery controllers (Argo Rollouts, Flagger): purpose-built tools that automate the entire canary process. They shift traffic percentages on a schedule, query metrics (error rate, latency) at each step, and roll back automatically if the canary's metrics look worse than the stable version's, so nobody has to watch dashboards and adjust percentages by hand.
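As a sketch of the Ingress-annotation approach: with the ingress-nginx controller, you pair a normal Ingress with a second Ingress for the same host and path that is marked `nginx.ingress.kubernetes.io/canary: "true"` and given a weight. The host `web.example.com` and the Services `web-stable` and `web-canary` (each selecting only one version's Pods) are hypothetical names for illustration.

```yaml
# Primary Ingress: all traffic not diverted by the canary goes to web-stable
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com            # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-stable       # hypothetical Service for the old version
                port:
                  number: 80
---
# Canary Ingress: same host and path, flagged as canary, receiving ~10% of requests
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-canary       # hypothetical Service for the new version
                port:
                  number: 80
```

To advance the rollout you raise `canary-weight` (say, to `"50"`); to roll back you set it to `"0"` or delete the canary Ingress. Every step here is a manual edit, which is exactly the gap progressive delivery controllers fill.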
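```yaml
# Argo Rollouts canary strategy (a Rollout is used in place of a Deployment)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web
spec:
  replicas: 5
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: web
          image: myapp:2.0
  strategy:
    canary:
      steps:
        - setWeight: 10            # send 10% of traffic to the new version
        - pause: {duration: 10m}   # wait, observe metrics
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100           # fully cut over
```

Without a `trafficRouting` section (pointing at an Ingress controller or service mesh), Argo Rollouts approximates each `setWeight` by scaling the canary and stable ReplicaSets, so the split is only as precise as the replica count allows.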
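The pauses in that Rollout only wait; the automatic metric checks and rollback described in the tooling list come from attaching an analysis. Below is a sketch of an Argo Rollouts `AnalysisTemplate` using its Prometheus provider. The Prometheus address, the metric `http_requests_total`, its `app`/`code` labels, and the 99% threshold are all assumptions, and a real query should be scoped to the canary's Pods rather than the whole app.

```yaml
# Hypothetical AnalysisTemplate: the canary must keep a non-5xx ratio >= 99%
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m                     # take a measurement every minute
      failureLimit: 3                  # tolerate up to 3 failed measurements
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # assumed address
          query: |
            sum(rate(http_requests_total{app="web",code!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="web"}[5m]))
# Attached to the Rollout as background analysis (fragment of
# spec.strategy.canary), starting once the first setWeight step has run:
#   analysis:
#     templates:
#       - templateName: success-rate
#     startingStep: 1
```

If more measurements fail than `failureLimit` allows, the analysis fails and the Rollout aborts, shifting traffic back to the stable version without anyone intervening.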
Comparing the two
| | Blue-green | Canary |
|---|---|---|
| Resource cost during rollout | Double (both full versions running) | Only slightly more (canary is usually a small extra replica count) |
| Rollback speed | Instant (flip the selector back) | Also fast, but may involve gradually shifting traffic back |
| Real production traffic validation before full commit | No — it's all-or-nothing at cutover | Yes — the whole point is validating under a real (if limited) slice of production traffic |
| Native Kubernetes support | Achievable with plain Services + relabeling | Requires additional tooling (Ingress annotations, service mesh, or Argo Rollouts/Flagger) |
| Best for | Simpler risk profiles, or when partial-traffic validation isn't feasible/needed | Higher-risk changes where gradually validating real user impact before full rollout matters |
Why rolling updates alone (the Deployment default) aren't the same as either
A standard rolling update (see the workload controllers topic) gradually replaces old Pods with new ones, but throughout the process every ready Pod, old or new, receives an equal, undifferentiated share of traffic, so the new version's share simply tracks how many of its Pods happen to be ready at that moment. There's no deliberate "send only a controlled fraction of traffic to the new version and watch its metrics before proceeding" logic as in a true canary strategy, and no "two full parallel environments, instant atomic switch" behavior as in blue-green. Rolling updates are a reasonable default for many workloads; blue-green and canary are deliberate strategies to reach for when the extra safety and control they provide is worth the added complexity.
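For contrast, the knobs a plain rolling update exposes control how many Pods may be added or missing while it proceeds, not what fraction of traffic the new version receives. This sketch reuses the `web`/`myapp` names from the examples above; the `maxSurge`/`maxUnavailable` values are illustrative choices, not recommendations.

```yaml
# Deployment using the default RollingUpdate strategy, tuned to add one new
# Pod at a time and never drop below the desired replica count.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 Pod above the desired 5 during the update
      maxUnavailable: 0    # never fall below 5 available Pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:2.0   # changing this image triggers the rolling update
```

You can halt an update midway with `kubectl rollout pause deployment/web`, but the new version's traffic share is then just whatever fraction of ready Pods it happens to have, with no metric gate deciding whether to proceed.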