What's the difference between the Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler?

Detailed Answer

Three autoscalers, three different axes

Cluster Autoscaler   -> adds/removes NODES (infrastructure capacity)
        │
        ▼
Horizontal Pod Autoscaler  -> adds/removes REPLICAS of a given workload
        │
        ▼
Vertical Pod Autoscaler    -> resizes the REQUESTS/LIMITS of each Pod

Horizontal Pod Autoscaler (HPA) — more/fewer copies

Covered in depth in the previous question: the HPA scales the number of Pod replicas of a Deployment or StatefulSet up or down based on observed metrics. It suits stateless workloads that scale well by adding more identical, load-balanced instances.
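For orientation, here is a minimal sketch of such an object. It assumes the same `web` Deployment that the VPA example further down targets; the replica bounds and the 70% CPU target are illustrative assumptions, not values from the original.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2           # illustrative lower bound
  maxReplicas: 10          # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the Pods' CPU *requests*
```

Note that the utilization target is measured relative to the Pods' CPU requests, which is part of why the request values discussed below matter.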

Vertical Pod Autoscaler (VPA) — bigger/smaller individual Pods

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"   # applies recommendations; "Off" = recommendation-only, "Initial" = set only at Pod creation

The VPA observes each Pod's historical CPU/memory usage and recommends (or, in Auto mode, applies) better-fitting requests/limits values. This addresses a very real problem: engineers guess at reasonable request/limit values upfront and then never revisit them as the application's actual usage pattern becomes clearer over time. Applying a VPA recommendation typically requires recreating the Pod, because in most configurations a running container's resources cannot be resized without at least a restart. That is an important operational difference from the HPA's non-disruptive replica changes.
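One common way to limit how far Auto mode can move a Pod is to bound the recommendations. The sketch below extends the manifest above with a `resourcePolicy`; the wildcard container name and every min/max value are illustrative assumptions, not recommendations for any real workload.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"          # apply to every container in the Pod
        controlledResources: ["cpu", "memory"]
        minAllowed:                 # illustrative floor
          cpu: 100m
          memory: 128Mi
        maxAllowed:                 # illustrative ceiling
          cpu: "2"
          memory: 2Gi
```

Bounds like these keep a single noisy usage spike from producing a request so large that the Pod no longer fits on any node.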

A critical incompatibility to know: running HPA (on CPU/memory metrics) and VPA on the same workload simultaneously is explicitly not recommended. Both react to the same underlying resource-usage signal but take different kinds of action (more replicas vs. resized replicas), so they can end up fighting each other's decisions. If both scaling dimensions matter for a workload, VPA is often used for right-sizing on a slower, calmer cadence, while HPA (ideally on a non-resource metric like requests-per-second) handles reactive scaling.
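A sketch of that split might look like the following pair of objects. It assumes a custom-metrics adapter is installed and exposes a per-Pod metric; the metric name `http_requests_per_second`, the target value, and the replica bounds are all hypothetical.

```yaml
# HPA scales on a non-resource metric, so it does not react to the
# CPU/memory signal that the VPA uses.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2                      # illustrative
  maxReplicas: 20                     # illustrative
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed custom metric
        target:
          type: AverageValue
          averageValue: "100"         # illustrative per-Pod target
---
# VPA in recommendation-only mode: it computes suggestions but never
# evicts Pods, so it cannot fight the HPA.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"
```

With this arrangement the VPA's recommendations can be reviewed and applied by hand, for example during a regular deploy.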

Cluster Autoscaler — more/fewer nodes

The HPA and VPA both operate within whatever node capacity currently exists in the cluster. Neither can help if there simply isn't enough total capacity to schedule more or bigger Pods. The Cluster Autoscaler operates one level below both: it watches for Pods that are Pending because no existing node has room for them and, on a supported cloud provider, provisions new nodes to accommodate them. Conversely, it identifies significantly underutilized nodes and removes them to save cost, provided their workloads can be safely rescheduled elsewhere (respecting PodDisruptionBudgets; see that question).
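On the scale-down side, a PodDisruptionBudget is how a workload states how many of its Pods must stay available while a node is drained. A minimal sketch, assuming the `web` Pods carry a hypothetical `app: web` label and that two available replicas is an acceptable floor:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
spec:
  minAvailable: 2          # illustrative floor during voluntary disruptions
  selector:
    matchLabels:
      app: web             # assumed Pod label
```

If evicting a Pod from an underutilized node would violate this budget, that node is not a candidate for removal until the Pod can be moved safely.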

HPA decides "I need 3 more replicas of this Deployment"
   → Scheduler tries to place them
   → No existing node has enough free capacity
   → Cluster Autoscaler notices Pods stuck Pending due to insufficient
     resources, and provisions a new node
   → Scheduler places the pending Pods onto the new node
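The "no existing node has enough free capacity" step is evaluated against the Pods' resource requests rather than their live usage, so the requests declared on the workload are what ultimately drive node provisioning. A sketch of a Deployment with explicit requests, where the image name, label, and all values are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                          # initial count; an HPA would manage this afterwards
  selector:
    matchLabels:
      app: web                         # assumed label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0   # hypothetical image
          resources:
            requests:                  # what the scheduler reserves per Pod
              cpu: 500m
              memory: 512Mi
            limits:
              memory: 512Mi
```

Inflated requests therefore cause unnecessary nodes to be added, while requests set too low let the scheduler overpack nodes.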

How the three commonly work together

A well-configured cluster typically layers all three. The HPA reacts quickly to load by adjusting replica counts. The Cluster Autoscaler reacts to the resulting capacity needs by adjusting the number of nodes. The VPA, used more cautiously (often in recommendation-only mode, or on workloads not also using HPA), periodically right-sizes each Pod's requests/limits based on observed usage. Together they give a system that scales along all three axes (replica count, per-Pod size, and total node capacity) with minimal manual intervention.

Precisely articulating that these three operate on three genuinely different axes (Pod count, individual Pod size, and node count), and knowing the HPA/VPA conflict caveat, demonstrates real production autoscaling experience rather than just knowing that three acronyms exist.