What is Kubernetes, and what problem does it solve?

Kubernetes is an open-source container orchestration platform that automates deploying, scaling, healing, and networking containerized applications across a cluster of machines. It solves the operational problems that appear once you run containers at any real scale: which machine runs which container, what happens when a container or machine dies, how services find each other, and how to roll out changes without downtime — all without an operator manually managing each container.

What are the core components of the Kubernetes control plane?

The control plane makes cluster-wide decisions and consists of: the **API server** (the front door — all communication, including from `kubectl`, goes through it), **etcd** (the cluster's persistent, consistent key-value store holding all state), the **scheduler** (decides which node a new pod should run on), and the **controller manager** (runs the reconciliation control loops that keep actual state matching desired state). Managed Kubernetes services (EKS, GKE, AKS) run and maintain these components for you.

What are the core components running on a worker node?

Each worker node runs a **kubelet** (the agent that talks to the API server, ensures the containers described in its assigned Pods are actually running, and reports node/pod status back), a **container runtime** (like containerd or CRI-O, which actually pulls images and runs containers, via the Container Runtime Interface), and **kube-proxy** (which maintains the networking rules that let traffic reach the right Pod for each Service).

What is etcd, and why is it critical to a Kubernetes cluster?

etcd is a distributed, strongly-consistent key-value store, built on the Raft consensus algorithm, that holds the complete state of a Kubernetes cluster — every object's spec and status. It's the only stateful component in the control plane and the single source of truth the API server reads from and writes to; losing etcd's data without a backup means losing the cluster's entire configuration and state, which is why etcd backup and a tested restore procedure are non-negotiable for any production cluster.

What is the Kubernetes API server, and how does kubectl interact with it?

The API server is a RESTful HTTP API that's the single entry point for all cluster operations — every read or write, from `kubectl`, a controller, or any other client, is an HTTP request to the API server, which authenticates and authorizes the request, validates it, and reads from or writes to etcd on the caller's behalf. `kubectl` is essentially a thin client that translates commands and YAML manifests into API server requests and formats the JSON responses for human-readable output.

What is a Kubernetes object, and what do apiVersion, kind, metadata, and spec mean?

A Kubernetes object is a persistent record in the cluster (stored in etcd) representing the desired state of something — a Pod, a Deployment, a Service. Every object's manifest has four key top-level fields: `apiVersion` (which version of the API this object's schema belongs to), `kind` (what type of object it is), `metadata` (identifying information — name, namespace, labels, annotations), and `spec` (the desired state you're declaring). Most objects also get a `status` field, populated by the system, reflecting the observed actual state.

What's the difference between declarative and imperative management in Kubernetes?

Imperative commands tell Kubernetes exactly what action to take right now (`kubectl run`, `kubectl create`, `kubectl scale`) — simple for one-off tasks, but the history of *how* the cluster got to its current state isn't recorded anywhere reusable. Declarative management (`kubectl apply -f manifest.yaml`) describes the desired end state in a file, and Kubernetes computes and applies whatever changes are needed to reach it — this is version-controllable, repeatable, and safely re-runnable, which is why it's the standard approach for anything beyond quick experimentation.

What is a Namespace, and when should you use multiple namespaces?

A Namespace is a way to logically partition a single Kubernetes cluster into multiple virtual clusters, scoping most object names, RBAC rules, and resource quotas within it. Use multiple namespaces to separate environments (dev/staging/prod, though many teams instead use separate clusters for this), separate teams/applications sharing one cluster, or to apply different access controls and resource limits to different parts of the same cluster.

What is the reconciliation/control loop pattern, and why is it central to Kubernetes?

A control loop continuously watches for the current (actual) state of some resource, compares it against the declared desired state, and takes action to close any gap — repeating indefinitely. Kubernetes is built almost entirely out of many independent control loops (one per controller type: Deployments, ReplicaSets, Nodes, and so on), each responsible for reconciling one narrow slice of cluster state, which is what makes the whole system self-healing and eventually consistent rather than requiring a human to notice and fix every drift.

What is the Container Runtime Interface (CRI), and what changed when Docker was deprecated as a Kubernetes runtime?

The CRI is a standard plugin interface that lets the kubelet talk to any compliant container runtime without being hardcoded to a specific one, which is what allows containerd, CRI-O, and others to be used interchangeably. Docker Engine itself never implemented the CRI, so Kubernetes shipped a built-in shim (`dockershim`) to translate between the two. That shim was removed in Kubernetes 1.24, so Docker Engine can no longer be used as the runtime out of the box (only through an external adapter such as `cri-dockerd`). Images built with the Docker CLI from a Dockerfile are completely unaffected and run identically under containerd or CRI-O, because the OCI image format is a separate, shared standard.

How does kubectl know which cluster to talk to?

kubectl reads connection details — the API server's address, the credentials to authenticate with, and which cluster/user/namespace combination (a "context") is currently active — from a kubeconfig file, by default at `~/.kube/config`. You can define multiple clusters and contexts in one kubeconfig file and switch between them with `kubectl config use-context`, or point kubectl at a different file entirely via the `KUBECONFIG` environment variable or the `--kubeconfig` flag.
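
The lookup order described above can be sketched in a few lines. This is only an illustration of the precedence rules, not kubectl's actual code: the function name `kubeconfig_paths` and its parameters are hypothetical, and it resolves file paths without parsing or merging their contents.

```python
import os
from pathlib import Path
from typing import Optional


def kubeconfig_paths(flag: Optional[str] = None, env: Optional[dict] = None) -> list:
    """Return the kubeconfig file(s) to load, highest precedence first.

    `flag` stands in for the value of --kubeconfig; `env` stands in for the
    process environment (both are hypothetical parameters for this sketch).
    """
    env = os.environ if env is None else env
    if flag:
        # An explicit --kubeconfig flag wins outright.
        return [Path(flag)]
    if env.get("KUBECONFIG"):
        # KUBECONFIG may list several files, separated like PATH entries.
        return [Path(p) for p in env["KUBECONFIG"].split(os.pathsep) if p]
    # Otherwise fall back to the default location.
    return [Path.home() / ".kube" / "config"]
```

Passing the environment in as a plain dict keeps the sketch easy to exercise without touching the real process environment.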

Kubernetes Fundamentals and Architecture

The control plane, worker node components, the reconciliation model, and the core building blocks of a cluster.

The problem before orchestration

Running a single container on a single machine is easy: `docker run` and you're done. The problem appears at scale: an application with a dozen services, each needing multiple replicas for availability, spread across many machines, needing to survive machine failures, needing to find and talk to each other, and needing to be updated without downtime. Doing this by hand (SSHing into machines, manually restarting crashed containers, manually editing load balancer configs when an instance moves) doesn't scale past a handful of containers, and is fragile and slow even then.

What Kubernetes actually automates

Scheduling — deciding which machine (node) in the cluster should run each container, based on available resources and constraints.
Self-healing — if a container crashes or a node dies, Kubernetes notices and starts replacement containers elsewhere, without a human intervening.
Service discovery and load balancing — containers get a stable way to find and talk to each other, even as individual instances are created and destroyed and move between nodes.
Rolling updates and rollbacks — deploying a new version of an application gradually, replacing old instances with new ones, and automatically reverting if something goes wrong.
Scaling — increasing or decreasing the number of running instances of an application, manually or automatically based on load.
Configuration and secret management — injecting configuration and sensitive values into applications without baking them into container images.

The core idea: declarative desired state

Rather than issuing imperative commands ("start this container on that machine"), you describe the desired state of your system in configuration ("I want 3 replicas of this application running") and Kubernetes continuously works to make the actual state match it — this reconciliation-loop model (covered in depth in a later question) is the central idea that everything else in Kubernetes builds on.

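apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3          # desired state: always keep 3 running
  selector:            # required: which Pods this Deployment manages
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web       # must match the selector above
    spec:
      containers:
        - name: web
          image: myapp:1.4.0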

If a node hosting one of these pods dies, Kubernetes notices the actual count has dropped below 3 and schedules a replacement — without anyone needing to notice the failure or take manual action.

Why this matters for interviews

A strong answer doesn't just define Kubernetes as "a container orchestrator" — it connects that definition to the concrete operational pain (manual scheduling, no self-healing, fragile networking, risky deploys) that existed before it, and names the reconciliation/desired-state model as the mechanism that makes the automation possible. This framing sets up nearly every other Kubernetes topic, since almost every object type in the system (Deployments, Services, PVCs) is just a different application of the same "describe what you want, a controller makes it true" pattern.

Related Resources

Kubernetes: What is Kubernetes?

The four control plane components

                     ┌─────────────────────────────────────┐
                     │            Control Plane            │
                     │                                     │
  kubectl ────────▶  │   API Server ──▶ etcd               │
                     │       ▲                             │
                     │       │                             │
                     │   Scheduler   Controller Manager    │
                     └─────────────────────────────────────┘
                              │ (schedules pods onto, and
                              │  monitors, worker nodes)
                              ▼
                     Worker Nodes (kubelet, kube-proxy, runtime)

API server (`kube-apiserver`)

The single front door to the cluster — every read and write, whether from kubectl, a controller, or another component, goes through it as a REST API. It validates requests, enforces authentication/authorization (see the RBAC question), and is the only component that talks directly to etcd. Because it's stateless itself (all real state lives in etcd), it can be horizontally scaled behind a load balancer for high availability.

etcd

A distributed, consistent key-value store that holds the entire cluster's state — every object definition, every current status, all of it. It's built on the Raft consensus algorithm, which is what gives it strong consistency guarantees even when run as a multi-node cluster for high availability. Losing etcd (without a backup) means losing the cluster's entire state — which is why etcd backup/restore is a critical, non-optional production practice (see the operations topic).

Scheduler (`kube-scheduler`)

Watches the API server for newly created Pods that don't yet have a node assigned, and decides which node each should run on — based on resource requests, affinity/anti-affinity rules, taints/tolerations, and other constraints (see the scheduling topic). The scheduler only decides placement; it's the kubelet on the chosen node that actually starts the container.
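
A toy version of that placement decision can make the filter-then-score idea concrete. This is a rough sketch and not how kube-scheduler is implemented: the `Node` and `PodRequest` classes, their fields, and the "most free CPU" scoring rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:  # hypothetical, simplified view of a node
    name: str
    free_cpu_millis: int
    free_memory_mib: int
    taints: Set[str] = field(default_factory=set)


@dataclass
class PodRequest:  # hypothetical, simplified view of a pending Pod
    name: str
    cpu_millis: int
    memory_mib: int
    tolerations: Set[str] = field(default_factory=set)


def feasible(node: Node, pod: PodRequest) -> bool:
    """Filter step: the node has room and the pod tolerates every taint."""
    fits = (node.free_cpu_millis >= pod.cpu_millis
            and node.free_memory_mib >= pod.memory_mib)
    return fits and node.taints <= pod.tolerations


def pick_node(nodes: List[Node], pod: PodRequest) -> Optional[str]:
    """Score step: among feasible nodes, prefer the most CPU left over."""
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None  # the pod stays Pending until some node becomes feasible
    best = max(candidates, key=lambda n: n.free_cpu_millis - pod.cpu_millis)
    return best.name
```

Note that `pick_node` only returns a name. Recording the decision and actually starting the container are left to other components, which mirrors the division of labor described above.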

Controller manager (`kube-controller-manager`)

Runs a collection of controllers, each responsible for a reconciliation control loop for one type of object — the Deployment controller ensures the right number of Pods exist, the Node controller notices when a node stops responding, and so on. Conceptually, these are many independent loops, each continuously comparing desired state (from etcd, via the API server) against observed actual state, and taking action to close any gap.

Why this separation matters

Each component has one narrow job, and they only communicate through the API server (never directly with each other or with etcd, except the API server itself) — this decoupling is what lets each component be replaced, scaled, or restarted independently without the others needing to know or care, and is a large part of why Kubernetes itself is resilient to individual component failures.

Managed vs. self-hosted

Cloud-managed Kubernetes (EKS, GKE, AKS) runs and maintains the entire control plane for you — you never see or manage etcd, the API server, or the scheduler directly, only interact with the API server's endpoint. Self-hosting a cluster (via kubeadm or similar) means you're responsible for standing up, securing, scaling, and backing up all four of these components yourself.

Related Resources

Kubernetes: Cluster Architecture

The three node-level components

   Worker Node
   ┌──────────────────────────────────────────────┐
   │  kubelet ───────▶ Container Runtime          │
   │     ▲              (containerd / CRI-O)      │
   │     │              → actually runs Pods      │
   │     │ (talks to API server)                  │
   │  kube-proxy                                  │
   │     → maintains network rules for Services   │
   └──────────────────────────────────────────────┘

kubelet

The primary agent on every node — it watches the API server for Pods assigned to its node, and ensures the containers described in each Pod's spec are actually running and healthy (starting them via the container runtime, restarting them if they crash, running liveness/readiness probes). It also reports the node's and its pods' status back to the API server, which is how kubectl get pods and kubectl get nodes show current state. The kubelet does not manage containers that weren't created through Kubernetes — it only manages what's described in the Pod specs assigned to it.

Container runtime

The software that actually pulls container images and runs containers — containerd and CRI-O are the two most common choices today. The kubelet talks to the runtime through a standard interface called the Container Runtime Interface (CRI), rather than being hardcoded to any one runtime (see that question for why Docker specifically was deprecated as a direct Kubernetes runtime, even though containers built with Docker still run fine).

kube-proxy

Maintains the network rules on each node that implement the Service abstraction (see the networking topic). Traditionally it programs iptables rules; it also supports an IPVS mode, and some clusters replace kube-proxy entirely with an eBPF-based data plane (like Cilium) for better performance at scale. When a Service is created or its backing Pods change, kube-proxy updates the node's networking rules so traffic sent to the Service's virtual IP gets routed to one of the actual healthy backing Pods.

Why nodes need all three, and the control plane doesn't run them

The control plane decides what should happen (desired state, scheduling decisions); worker nodes are where things actually run. The kubelet and container runtime are what turn a scheduling decision into an actual running container; kube-proxy is what turns a Service definition into actual working network routing on that node. Every node needs all three because every node needs to both run containers and participate correctly in cluster networking — the control plane components, by contrast, don't run application workloads at all (in most production setups) and so don't need them.

What happens if a node's kubelet stops reporting

The control plane's node controller notices the node has stopped sending heartbeats within a configured threshold, marks the node as NotReady, and — after a further grace period — Pods that were running on it are considered for rescheduling onto healthy nodes (assuming they're managed by a controller like a Deployment that maintains a desired replica count; a bare unmanaged Pod would simply be lost).
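
The heartbeat check and its consequence for Pods can be sketched as two small functions. This is a simplified model rather than the node controller's real logic: the function names, the dictionary shapes, and the single `grace_seconds` threshold are assumptions made for illustration.

```python
def node_conditions(last_heartbeat: dict, now: float, grace_seconds: float) -> dict:
    """Map each node to "Ready" or "NotReady" by how stale its heartbeat is."""
    return {
        node: "Ready" if now - seen <= grace_seconds else "NotReady"
        for node, seen in last_heartbeat.items()
    }


def pods_to_reschedule(pods: list, conditions: dict) -> list:
    """Controller-managed pods on NotReady nodes; bare pods are not recreated."""
    return [
        p["name"]
        for p in pods
        if conditions[p["node"]] == "NotReady" and p["managed"]
    ]
```

The `managed` flag stands in for "owned by a controller such as a Deployment", which is the condition under which a replacement gets created elsewhere.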

Related Resources

Kubernetes: Nodes

What etcd actually stores

Every Kubernetes object you create — every Deployment, Service, ConfigMap, Secret, Pod status — is ultimately stored as a key in etcd. The API server is the only component that talks to etcd directly; everything else (kubectl, the scheduler, controllers, kubelets) goes through the API server, which reads and writes to etcd on their behalf.

kubectl apply -f deployment.yaml
   → API server validates & authorizes
   → API server writes the Deployment object to etcd
   → Controller manager's Deployment controller, watching the API server,
     notices the new/changed object and creates matching ReplicaSets/Pods

Why Raft consensus matters

etcd is typically run as a cluster of an odd number of nodes (commonly 3 or 5) using the Raft consensus algorithm to agree on writes — a write is only considered committed once a majority (quorum) of etcd nodes have durably persisted it. This gives etcd strong consistency (every read reflects the most recently committed write) and tolerance of node failures (a 5-node etcd cluster can lose 2 nodes and keep operating, since 3 still form a majority) — but it also means etcd write latency is bounded by the slowest node needed to reach quorum, and etcd performance is quite sensitive to disk I/O latency and network latency between its nodes.
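
The quorum arithmetic behind those numbers is short enough to write out. The helper names below are made up for this sketch; the formulas are just the definition of a majority.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for size in (1, 2, 3, 4, 5):
    print(f"{size} members: quorum={quorum(size)}, "
          f"tolerates {tolerated_failures(size)} failure(s)")
```

Working through it shows why odd sizes are preferred: going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, so the extra member adds write cost without adding fault tolerance.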

Why losing etcd is catastrophic

Every other control plane component is effectively stateless or easily reconstructible — the API server holds no state of its own, the scheduler and controllers can be restarted and will simply re-read current state from etcd (via the API server) and resume operating. But if etcd's data is lost or corrupted without a backup, there is no other copy of the cluster's state anywhere — every Deployment, Service, Secret, and their current status is simply gone, and the cluster must effectively be rebuilt from whatever configuration (YAML manifests, Helm charts, GitOps repositories) exists outside the cluster.

Backup and disaster recovery

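# Take a point-in-time snapshot of etcd's data
# (on a TLS-secured cluster, also pass --endpoints, --cacert, --cert, and --key)
etcdctl snapshot save backup.db

# Restore from a snapshot into a fresh data directory (typically as part of
# rebuilding a control plane node); etcd 3.5+ moves this subcommand to etcdutl
etcdutl snapshot restore backup.db --data-dir <new-data-dir>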

Regular, automated etcd snapshots — stored somewhere other than the etcd nodes themselves — combined with periodic restore testing (an untested backup isn't a real backup) is standard practice for any self-managed production cluster. Managed Kubernetes services (EKS, GKE, AKS) handle etcd backup and the entire control plane's resilience for you, which is one of the most significant operational burdens they take off a team's plate compared to self-hosting.

Security note

Because etcd holds every Secret's data (by default, unencrypted unless encryption-at-rest is explicitly configured — see the security topic), direct network access to etcd must be tightly restricted to the control plane components that need it, and encryption at rest should be enabled for any cluster storing genuinely sensitive Secret data.

Related Resources

Kubernetes: Operating etcd clusters

The API server as the single front door

Every interaction with a Kubernetes cluster — whether a human running kubectl get pods, a controller watching for changes, or the scheduler assigning a Pod to a node — happens through the API server's REST endpoints. Nothing in the cluster (other than the API server itself) talks directly to etcd.

kubectl get pods -n default
   → kubectl sends: GET https://<api-server>/api/v1/namespaces/default/pods
   → API server: authenticates the request, checks RBAC authorization,
     reads matching Pod objects from etcd, returns JSON
   → kubectl formats the JSON response as the human-readable table you see

The request pipeline

Every request passes through several stages: authentication (who are you — client certificate, bearer token, etc.), authorization (are you allowed to do this — typically RBAC), admission control (mutating and validating webhooks that can modify or reject the request — see the security topic), and finally the actual read/write against etcd. Any stage can reject the request, which is why a well-formed kubectl apply can still fail with a permissions error or an admission webhook rejection even though the YAML itself is syntactically valid.
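
The order of those stages, and the fact that any of them can stop the request, can be shown with a toy handler. Everything here is a stand-in: `TOKENS`, `ALLOWED`, and `STORE` are hypothetical in-memory substitutes for real credentials, RBAC rules, and etcd, and the admission step is reduced to one default and one check.

```python
# Hypothetical in-memory stand-ins for credentials, RBAC rules, and etcd.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ALLOWED = {("alice", "create", "deployments"), ("bob", "get", "deployments")}
STORE: dict = {}


class RequestRejected(Exception):
    """Raised by whichever stage refuses the request."""


def handle(request: dict) -> dict:
    # 1. Authentication: who is making the request?
    user = TOKENS.get(request["token"])
    if user is None:
        raise RequestRejected("401 Unauthorized")

    # 2. Authorization: may this user perform this verb on this resource?
    if (user, request["verb"], request["resource"]) not in ALLOWED:
        raise RequestRejected("403 Forbidden")

    # 3. Admission: mutate first (fill in a default), then validate.
    obj = dict(request["object"])
    obj.setdefault("replicas", 1)
    if obj["replicas"] < 0:
        raise RequestRejected("admission: replicas must not be negative")

    # 4. Persistence: only now is the object written to the backing store.
    STORE[(request["resource"], obj["name"])] = obj
    return obj
```

A request with perfectly valid content can still fail at stage 1, 2, or 3, which is the situation the paragraph above describes for a syntactically valid manifest.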

What kubectl actually is

kubectl is a client binary with no special privileged access of its own. It authenticates using whatever credentials are configured in your kubeconfig file, and everything it does is one or more calls to the same public API server endpoints that any other client (a CI pipeline, a custom controller, a monitoring tool) could call directly. This is why `kubectl apply -f deployment.yaml` and a Python script that uses the Kubernetes client library to create or patch the same object are functionally equivalent from the API server's point of view.

# If the Deployment does not exist yet, these achieve the same result via different means:
kubectl apply -f deployment.yaml

curl -X POST https://<api-server>/apis/apps/v1/namespaces/default/deployments \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/yaml" \
  --data-binary @deployment.yaml
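
The URL paths used in the requests above follow a regular pattern that can be derived from an object's apiVersion and its plural resource name. The helper below is a small sketch of that pattern; `resource_path` is a made-up name, and the caller is assumed to already know the plural form (for example `deployments` for kind `Deployment`).

```python
from typing import Optional


def resource_path(api_version: str, plural: str,
                  namespace: Optional[str] = None,
                  name: Optional[str] = None) -> str:
    """Build the URL path for a resource collection or a single object."""
    # The core group ("v1") lives under /api; named groups ("apps/v1") under /apis.
    prefix = "/api" if "/" not in api_version else "/apis"
    parts = [prefix, api_version]
    if namespace is not None:
        parts += ["namespaces", namespace]  # omitted for cluster-scoped kinds
    parts.append(plural)
    if name is not None:
        parts.append(name)
    return "/".join(parts)
```

For example, `resource_path("v1", "pods", "default")` gives the path behind `kubectl get pods -n default`, and `resource_path("apps/v1", "deployments", "default")` gives the collection a new Deployment is posted to.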

Watches, not polling

A key API server feature many clients (including kubectl get pods --watch, and every controller) rely on is the watch mechanism — instead of repeatedly polling for changes, a client can open a long-lived connection and receive a stream of change events as they happen. This is the foundation of the reconciliation model: controllers watch for changes to the objects they care about and react immediately, rather than polling on a fixed interval.
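
On the client side, consuming a watch mostly means folding a stream of events into a local picture of the world. The sketch below assumes each event is a dict with a `type` of `ADDED`, `MODIFIED`, or `DELETED` and the affected `object`, and ignores other event types, reconnection, and resource versions. The function name `apply_events` is hypothetical.

```python
def apply_events(cache: dict, events) -> dict:
    """Fold a stream of watch events into a local name -> object cache."""
    for event in events:
        obj = event["object"]
        name = obj["metadata"]["name"]
        if event["type"] in ("ADDED", "MODIFIED"):
            cache[name] = obj          # remember the latest version seen
        elif event["type"] == "DELETED":
            cache.pop(name, None)      # forget objects that no longer exist
    return cache
```

A controller built this way never has to ask "what changed since last time": its cache is kept current by the events themselves, and its reconcile logic runs against that cache.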

Why this design matters

Because everything goes through one well-defined API, Kubernetes's entire ecosystem of tools (Helm, ArgoCD, custom controllers, monitoring dashboards) can all interact with a cluster the same consistent way, and the API itself can be extended (via Custom Resource Definitions — see that topic) without needing to change how any existing client talks to the cluster.

Related Resources

Kubernetes: The Kubernetes API

Anatomy of a manifest

apiVersion: apps/v1        # which API group/version defines this object's schema
kind: Deployment            # what type of object this is
metadata:
  name: web                 # this object's unique name (within its namespace)
  namespace: production
  labels:
    app: web
    tier: frontend
spec:                        # DESIRED state -- what you're declaring you want
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:1.4.0
status:                       # ACTUAL observed state -- populated BY Kubernetes, not by you
  availableReplicas: 3
  updatedReplicas: 3

apiVersion

Identifies which version of which API group defines this object's schema — core objects like Pods and Services use v1; most workload objects (Deployments, StatefulSets, DaemonSets) use apps/v1; networking objects use networking.k8s.io/v1, and so on. This matters because Kubernetes APIs evolve — a field available in v1 might not exist in an older v1beta1 version of the same kind, and using the wrong apiVersion for your cluster's Kubernetes version is a common source of "field not recognized" errors.

kind

The type of object being described — Pod, Deployment, Service, ConfigMap, and so on. Combined with apiVersion, this tells the API server exactly which schema to validate the rest of the manifest against.

metadata

Identifying and organizational information about the object itself, not its desired behavior: name (unique within its namespace), namespace (which logical partition of the cluster it belongs to), labels (key-value pairs used for selection/grouping — Services and Deployments use label selectors to find the Pods they manage), and annotations (non-identifying metadata, often used by tooling rather than Kubernetes itself — e.g., a value read by an Ingress controller or a CI/CD tool).
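
Label selection is easy to model: an equality-based selector matches an object when every key/value pair in the selector also appears in the object's labels. The sketch below covers only that `matchLabels` style of selector, and the function names and pod dictionaries are illustrative assumptions.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector: dict, pods: list) -> list:
    """Names of the pods whose labels satisfy the selector."""
    return [p["name"] for p in pods if matches(selector, p["labels"])]
```

Extra labels on a Pod never prevent a match, which is why a Pod labeled `app: web, tier: frontend` is still selected by `app: web` alone.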

spec

The heart of the object — what you're declaring you want to be true. For a Deployment, this includes the desired replica count and the Pod template to use; for a Service, the ports and selector; for a PersistentVolumeClaim, the requested storage size and access mode. This is the "desired state" half of the reconciliation model (see that question) — you write the spec, and a controller works to make reality match it.

status

Populated by Kubernetes itself (never written directly by a user in a normal workflow) to reflect the object's currently observed actual state — how many replicas are actually available, what phase a Pod is in, and so on. Comparing spec (desired) against status (actual) is exactly what a reconciliation control loop does on every iteration.

Why this consistent structure matters

Every object type in Kubernetes — built-in or a Custom Resource you define yourself (see that topic) — follows this same apiVersion/kind/metadata/spec/status shape, which is precisely what lets generic tooling (kubectl, Helm, GitOps controllers) work uniformly across every object type without needing type-specific logic for each one.

Related Resources

Kubernetes: Understanding Kubernetes Objects

Imperative commands

kubectl run nginx --image=nginx:1.25
kubectl create deployment web --image=myapp:1.0 --replicas=3
kubectl scale deployment web --replicas=5
kubectl delete pod nginx

Each command directly tells Kubernetes an action to perform right now. Fast for quick, one-off tasks and exploration, but there's no durable, reviewable record of the full desired configuration — if you later want to know exactly what flags/settings a running Deployment was created with, you have to inspect the live object rather than read a file, and there's no natural way to track this history in version control.

Declarative configuration

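# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 5
  selector:            # required for apps/v1 Deployments
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web       # must match the selector above
    spec:
      containers:
        - name: web
          image: myapp:1.0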

kubectl apply -f deployment.yaml

You describe the entire desired state in a file, and kubectl apply computes the diff between that desired state and the cluster's current state, applying only what's needed to reconcile them. Running kubectl apply -f deployment.yaml again with no changes is a safe no-op; running it after editing the file to change replicas: 5 to replicas: 8 scales the deployment, with no separate "scale" command needed.
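
The "safe to re-run" property comes from applying a diff rather than replaying a command. The sketch below models that with flat dictionaries. It is a deliberately reduced picture of what `kubectl apply` does (it ignores nested fields and the removal of fields that were dropped from the file), and the names `diff` and `apply` are hypothetical.

```python
def diff(live: dict, desired: dict) -> dict:
    """Fields whose desired value differs from, or is missing in, the live object."""
    return {k: v for k, v in desired.items() if k not in live or live[k] != v}


def apply(cluster: dict, name: str, desired: dict) -> dict:
    """Create or update the named object; return only the fields that changed."""
    live = cluster.setdefault(name, {})
    changes = diff(live, desired)
    live.update(changes)
    return changes
```

Calling `apply` twice with the same desired state returns an empty change set the second time, and changing only `replicas` in the desired state produces a change set containing only `replicas`.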

Why declarative wins for anything beyond experimentation

Version control — the YAML files are the source of truth, can be code-reviewed, diffed, and rolled back using ordinary git history, exactly like application source code.
Idempotency — re-applying the same manifest is always safe, which makes it trivial to build reliable automation (CI/CD pipelines, GitOps controllers) around it, since "did this already run" doesn't need to be tracked separately.
Auditable intent — the manifest describes the complete intended configuration, not just the last command that happened to be run — useful when onboarding someone new to a project, or debugging drift between what's running and what was intended.
Foundation for GitOps — tools like ArgoCD and Flux (see the production operations topic) work by continuously reconciling a cluster against manifests stored in a git repository — a workflow that's only possible because the underlying object management is declarative in the first place.

When imperative commands still make sense

Quick debugging (kubectl run -it --rm debug --image=busybox -- sh to spin up a throwaway shell), one-off scaling during an incident before a proper fix is written and reviewed, or simply exploring a cluster's state (kubectl get, kubectl describe, kubectl logs are inherently imperative/read operations, and there's no declarative equivalent that would make sense for them).

Recognizing that declarative management isn't just "using YAML files instead of flags" — it's the practice that makes GitOps, safe automation, and reliable rollback possible — demonstrates understanding of why the Kubernetes ecosystem converged so strongly on this pattern, not just familiarity with the apply command.

Related Resources

Kubernetes: Object Management

What a Namespace actually scopes

kubectl get pods -n team-a       # only pods in the "team-a" namespace
kubectl get pods -n team-b       # only pods in the "team-b" namespace, even if named identically

Object names must be unique within a namespace, but the same name can be reused across different namespaces — team-a and team-b can each have a Deployment named web without conflict. Namespaces are also the scope boundary for RBAC Role bindings (see the security topic) and ResourceQuotas/LimitRanges (see the production operations topic) — a Role granted in one namespace has no effect in another unless explicitly bound there too.
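
One way to picture the naming rule is a store keyed by kind, namespace, and name together. The class below is a toy model for illustration only; `ObjectStore` and its methods are hypothetical and say nothing about how the API server actually lays out its keys.

```python
class ObjectStore:
    """Toy store keyed by (kind, namespace, name)."""

    def __init__(self):
        self._objects = {}

    def create(self, kind: str, namespace: str, name: str, spec: dict) -> None:
        key = (kind, namespace, name)
        if key in self._objects:
            # Same kind + namespace + name is a conflict.
            raise ValueError(f"{kind} {namespace}/{name} already exists")
        self._objects[key] = spec

    def names(self, kind: str, namespace: str) -> list:
        """Names of all objects of one kind within one namespace."""
        return sorted(n for (k, ns, n) in self._objects
                      if k == kind and ns == namespace)
```

Creating a Deployment named `web` in both `team-a` and `team-b` succeeds, while creating a second `web` Deployment in `team-a` raises an error.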

What's NOT namespaced

Some objects are cluster-wide by design and don't belong to any namespace — Nodes, PersistentVolumes (though PersistentVolumeClaims are namespaced), ClusterRoles/ClusterRoleBindings, and CustomResourceDefinitions themselves. This distinction matters when writing RBAC rules or troubleshooting "why can't my namespaced Role see this object" — the object might simply be a cluster-scoped kind that a namespace-scoped Role was never going to be able to grant access to.

Common uses for multiple namespaces

Team/application separation on a shared cluster: team-a, team-b, payments, search, each with its own RBAC rules and ResourceQuotas, letting multiple teams safely share one cluster's underlying infrastructure without stepping on each other.
Environment separation: dev and staging on a shared non-production cluster (production is very commonly kept on an entirely separate cluster, not just a separate namespace, for a stronger blast-radius boundary; see the multi-tenancy question).
Third-party/system components: many clusters keep cluster infrastructure (monitoring, ingress controllers, cert-manager) in dedicated namespaces (alongside the built-in kube-system, or in a custom platform namespace) separate from application workloads, so application-team RBAC doesn't accidentally reach infrastructure components.

The default namespace and its pitfalls

Every object created without an explicit namespace lands in default — fine for quick experimentation, but a real risk in production: it's easy to accidentally deploy to, or accidentally query, the wrong namespace when relying on an implicit default rather than always being explicit. Most teams enforce always specifying -n <namespace> (or setting a non-default namespace as the active context) as a basic operational discipline once a cluster is shared by more than one person or team.

What Namespaces don't provide

A Namespace is a logical, not a hard security, boundary — by default, Pods in different namespaces can still reach each other over the network unless NetworkPolicies explicitly restrict it (see the networking topic), and a sufficiently-privileged ServiceAccount or user can act across namespace boundaries. For workloads needing genuinely strong isolation (fully untrusted multi-tenant code, strict compliance separation), namespaces alone are usually not considered sufficient — see the multi-tenancy question for what additional mechanisms are typically layered on top.

Related Resources

Kubernetes: Namespaces

The control loop, abstractly

```text
loop forever:
    observed_state = get_current_state()
    desired_state  = get_desired_state()   # from the object's spec
    if observed_state != desired_state:
        take_action_to_reconcile()
```

This is a much older pattern than Kubernetes (thermostats and cruise control are the classic non-software examples), but Kubernetes is unusual in building nearly its entire system out of many independent instances of this same loop, each watching a narrow slice of the cluster.

A concrete example: the ReplicaSet controller

```yaml
spec:
  replicas: 3
```

The ReplicaSet controller's loop: watch for ReplicaSet objects and their currently-running matching Pods; if fewer than 3 matching Pods exist, create more; if more than 3 exist, delete the excess. This loop runs continuously and reactively (triggered by watch events, not fixed polling — see the API server question) — if a Pod is deleted or a node crashes, the controller notices the actual count dropped below desired and creates a replacement, with no human needing to notice or intervene.
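A single pass of that loop can be sketched as a small function. This is a toy model under stated assumptions, not the real controller: the name reconcile, the in-memory list of Pod names, and the pod-N naming are all hypothetical, and a real controller talks to the API server rather than mutating a list.

```python
import itertools
from typing import List

# Hypothetical name generator for newly created Pods.
_pod_ids = itertools.count(1)


def reconcile(desired_replicas: int, pods: List[str]) -> List[str]:
    """One reconciliation pass: return the Pod list after closing the gap."""
    pods = list(pods)  # work on a copy, leave the caller's list untouched
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")  # too few: create more
    while len(pods) > desired_replicas:
        pods.pop()                            # too many: delete the excess
    return pods


pods = reconcile(3, [])    # initial creation
pods.remove(pods[0])       # simulate a Pod being deleted
pods = reconcile(3, pods)  # the next pass creates a replacement
print(len(pods))
```

Note that the function never asks why the count is wrong. It only compares the observed count with the desired count, which is why a deleted Pod and a crashed node are handled the same way.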

Why "eventually consistent," not "instantly consistent"

A control loop doesn't guarantee the desired and actual state match at every instant — only that the system keeps working toward convergence. Between a node crashing and a replacement Pod becoming Ready, there's a real window where actual state (2 running Pods) doesn't match desired state (3) — this is expected and acceptable; the guarantee is that the gap keeps shrinking and eventually closes, not that it never opens.
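The window between "replacement created" and "replacement Ready" can be made visible with a small simulation. This is an illustrative sketch only: the Pod class, the tick function, and the assumption that a new Pod takes a fixed number of ticks to become Ready are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    ticks_until_ready: int

    @property
    def ready(self) -> bool:
        return self.ticks_until_ready == 0


def tick(pods: List[Pod], desired: int, startup_ticks: int) -> None:
    """Advance one step: age starting Pods, then reconcile the Pod count."""
    for pod in pods:
        if pod.ticks_until_ready > 0:
            pod.ticks_until_ready -= 1
    while len(pods) < desired:
        pods.append(Pod(ticks_until_ready=startup_ticks))


def ready_history(desired: int = 3, startup_ticks: int = 2,
                  steps: int = 4) -> List[int]:
    """Ready-Pod count after a simulated node crash, one entry per step."""
    pods = [Pod(0) for _ in range(desired)]  # start fully converged
    pods.pop()                               # a node crash removes one Pod
    history = [sum(p.ready for p in pods)]
    for _ in range(steps):
        tick(pods, desired, startup_ticks)
        history.append(sum(p.ready for p in pods))
    return history


print(ready_history())  # [2, 2, 2, 3, 3]
```

The Ready count stays at 2 for several steps even though the replacement was created on the first pass: the gap opens, persists while the new Pod starts, and then closes.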

Why Kubernetes is composed of many independent, narrow loops

Rather than one monolithic "make everything correct" process, Kubernetes splits reconciliation across many separate controllers, each responsible for one object type: the Deployment controller manages ReplicaSets (creating a new one and scaling the old one down during a rolling update); the ReplicaSet controller manages Pods; the Node controller watches node health; the Endpoint controller keeps Service endpoint lists in sync with matching Pods. Each loop is simple and independently understandable, and this decomposition is exactly what makes the system extensible — a Custom Resource Definition plus an Operator (see that topic) is just adding a new controller that reconciles a new kind of object, using the exact same underlying pattern as every built-in controller.

Why this matters more than it might first appear

Understanding the control-loop pattern explains why Kubernetes behaves the way it does in situations that otherwise seem surprising: why manually scaling a ReplicaSet owned by a Deployment gets reverted (the Deployment controller reconciles it back to match the Deployment's spec), why a manual edit to a Deployment-managed Pod is lost as soon as that Pod is replaced (the replacement is created from the template, not from the edited Pod), why deleting a Pod managed by a ReplicaSet just causes a replacement to appear (the controller notices the gap and closes it), and why kubectl apply is safe to re-run repeatedly (each run re-declares the same desired state, so if nothing has drifted there is nothing to change). A candidate who can explain a surprising Kubernetes behavior by tracing it back to "some controller's reconciliation loop did that" demonstrates real conceptual understanding, not just command memorization.

Related Resources

Kubernetes: Controllers

What the CRI standardizes

The kubelet needs to pull images, start/stop containers, and get container status — the Container Runtime Interface defines a standard gRPC API for exactly these operations, so the kubelet can work with any runtime that implements it, without runtime-specific code baked into the kubelet itself.

```text
kubelet ──(CRI gRPC calls)──▶ containerd / CRI-O / any CRI-compliant runtime
                                    │
                                    └──▶ actually creates/manages containers
```

containerd and CRI-O are the two most widely used CRI-compliant runtimes today. Both are lightweight and designed to be driven by an orchestrator (CRI-O was built specifically for Kubernetes; containerd can also be used standalone), unlike Docker, which is a much larger toolset (CLI, build tooling, networking, volumes) built primarily for a human developer's local workflow rather than for being one component in an orchestrated cluster.

Why Docker itself was never CRI-native

Docker predates the CRI standard and was never restructured to implement it directly — Docker Engine has its own internal API, not the CRI gRPC interface the kubelet expects. To let Kubernetes use Docker anyway, an adapter component called dockershim was built directly into the kubelet, translating CRI calls into Docker Engine API calls.
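The shape of that arrangement is the classic adapter pattern, which can be sketched as follows. Everything here is a hypothetical simplification: ContainerRuntime stands in for the CRI surface (the real CRI is a gRPC API with different method names and many more operations), FakeDockerEngine stands in for an engine with its own API shape, and Shim plays the role dockershim played.

```python
from typing import Dict, Protocol, Set


class ContainerRuntime(Protocol):
    """Hypothetical, heavily simplified stand-in for the interface the kubelet calls."""

    def pull_image(self, image: str) -> None: ...

    def start_container(self, name: str, image: str) -> str: ...


class FakeDockerEngine:
    """Hypothetical engine with its own API shape, not the interface above."""

    def __init__(self) -> None:
        self.images: Set[str] = set()
        self.containers: Dict[str, str] = {}

    def image_pull(self, ref: str) -> None:
        self.images.add(ref)

    def container_run(self, image_ref: str, container_name: str) -> str:
        if image_ref not in self.images:
            raise RuntimeError(f"image not present: {image_ref}")
        self.containers[container_name] = image_ref
        return container_name


class Shim:
    """Adapter: exposes the runtime interface, translates calls to the engine's API."""

    def __init__(self, engine: FakeDockerEngine) -> None:
        self._engine = engine

    def pull_image(self, image: str) -> None:
        self._engine.image_pull(image)

    def start_container(self, name: str, image: str) -> str:
        return self._engine.container_run(image_ref=image, container_name=name)


def kubelet_sync(runtime: ContainerRuntime, name: str, image: str) -> str:
    """Kubelet-side code knows only the interface, not what sits behind it."""
    runtime.pull_image(image)
    return runtime.start_container(name, image)


engine = FakeDockerEngine()
print(kubelet_sync(Shim(engine), "web", "example/app:1.0"))
```

Because kubelet_sync depends only on the interface, any object implementing it natively could be passed in place of Shim(engine), with no adapter layer in between.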

The deprecation, precisely

dockershim was deprecated in Kubernetes 1.20 (announced in December 2020) and removed from the kubelet in Kubernetes 1.24 (2022). After the removal, Docker Engine can no longer be used directly as the runtime the kubelet talks to on a 1.24+ cluster: clusters still using Docker needed to migrate their nodes to containerd (or another CRI-compliant runtime), or install an external adapter such as cri-dockerd, before upgrading past 1.23.

What this deprecation did NOT change

This is the detail that trips people up: Docker (the CLI, docker build, Dockerfiles) is completely unaffected as a developer tool. Container images are built to the OCI (Open Container Initiative) image format, a standard shared by Docker, containerd, and every other modern runtime — an image built with docker build runs identically under containerd with zero changes needed. The deprecation was specifically about which component the kubelet talks to at runtime on a cluster node, not about how developers build or push images. In fact, containerd is itself one of the components inside Docker Engine — Docker uses containerd internally already, so removing dockershim mostly just cut out a redundant middle layer for the Kubernetes-specific path, rather than removing some fundamentally different technology.

Why this is a good interview question

It tests whether a candidate can precisely distinguish "the tool I use to build images on my laptop" from "the component the kubelet uses to run containers on a cluster node" — conflating the two is an extremely common (and revealing) misunderstanding, since the deprecation announcement caused a lot of confusion at the time about whether "Docker" as a whole was somehow being removed from Kubernetes.

Related Resources

Kubernetes: Container Runtimes

The kubeconfig file

By default, kubectl reads ~/.kube/config, a YAML file with three related sections:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: prod-cluster
    cluster:
      server: https://prod-api.example.com
      certificate-authority-data: <base64-encoded CA cert>
  - name: staging-cluster
    cluster:
      server: https://staging-api.example.com
      ...
users:
  - name: alice
    user:
      client-certificate-data: <base64-encoded cert>
      client-key-data: <base64-encoded key>
contexts:
  - name: prod
    context:
      cluster: prod-cluster
      user: alice
      namespace: production
  - name: staging
    context:
      cluster: staging-cluster
      user: alice
      namespace: default
current-context: staging
```

clusters — where each cluster's API server lives, and how to verify its identity (the cluster's CA certificate).
users — credentials for authenticating as a specific identity (a client certificate, a bearer token, or a command that generates one dynamically, e.g., for cloud-provider IAM-based auth).
contexts — a named combination of a cluster + a user + (optionally) a default namespace, letting you bundle "which cluster, as whom, in which namespace" into one switchable unit.
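How the three sections combine can be sketched as a lookup over the parsed file. This is a minimal illustration, assuming the kubeconfig has already been loaded into a plain dictionary. The function name resolve_context is hypothetical, and the sketch skips credential handling and raises StopIteration if a name is missing.

```python
from typing import Any, Dict


def resolve_context(config: Dict[str, Any], context_name: str = "") -> Dict[str, str]:
    """Resolve a context name to the server, user, and namespace it bundles."""
    name = context_name or config["current-context"]
    ctx = next(c["context"] for c in config["contexts"] if c["name"] == name)
    cluster = next(c["cluster"] for c in config["clusters"]
                   if c["name"] == ctx["cluster"])
    return {
        "server": cluster["server"],
        "user": ctx["user"],
        "namespace": ctx.get("namespace", "default"),  # namespace is optional
    }


# Dictionary mirroring the example file above (credentials omitted).
config = {
    "current-context": "staging",
    "clusters": [
        {"name": "prod-cluster",
         "cluster": {"server": "https://prod-api.example.com"}},
        {"name": "staging-cluster",
         "cluster": {"server": "https://staging-api.example.com"}},
    ],
    "contexts": [
        {"name": "prod",
         "context": {"cluster": "prod-cluster", "user": "alice",
                     "namespace": "production"}},
        {"name": "staging",
         "context": {"cluster": "staging-cluster", "user": "alice",
                     "namespace": "default"}},
    ],
}

print(resolve_context(config))          # uses current-context (staging)
print(resolve_context(config, "prod"))  # explicit context name
```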

Switching contexts

```shell
kubectl config get-contexts       # list all available contexts
kubectl config use-context prod   # switch the active context
kubectl config current-context    # show which one is active
```

Every kubectl command uses whichever context is currently active, unless overridden per-command with --context=<name>. Accidentally running a command against the wrong active context (a classic "meant to hit staging, actually hit prod" incident) is a well-known operational risk — many teams use shell prompt integrations or wrapper tools that visibly display the current context to reduce this risk.

Overriding the config file location

```shell
export KUBECONFIG=/path/to/other-config.yaml
kubectl --kubeconfig=/path/to/other-config.yaml get pods
```

KUBECONFIG can also point to multiple files (colon-separated on Linux and macOS, semicolon-separated on Windows), which kubectl merges together. This is useful for combining a base config with cluster-specific credential files generated by different tools (a cloud provider's CLI, a CI pipeline's service account setup).
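The merge follows a "first file to set a value wins" rule, which can be sketched as below. This is a simplified model, not kubectl's implementation: the function names are hypothetical, the inputs are assumed to be already-parsed dictionaries, and only named entries and current-context are handled.

```python
import os
from typing import Any, Dict, List


def split_kubeconfig(value: str, sep: str = os.pathsep) -> List[str]:
    """Split a KUBECONFIG value into file paths using the platform separator."""
    return [path for path in value.split(sep) if path]


def merge_kubeconfigs(configs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge parsed configs in order; the first file to define a name wins."""
    merged: Dict[str, Any] = {
        "clusters": [], "users": [], "contexts": [], "current-context": "",
    }
    for cfg in configs:
        for section in ("clusters", "users", "contexts"):
            seen = {entry["name"] for entry in merged[section]}
            for entry in cfg.get(section, []):
                if entry["name"] not in seen:  # later duplicates are ignored
                    merged[section].append(entry)
                    seen.add(entry["name"])
        if not merged["current-context"]:
            merged["current-context"] = cfg.get("current-context", "")
    return merged


base = {"current-context": "staging",
        "contexts": [{"name": "staging", "context": {"cluster": "staging-cluster"}}]}
extra = {"current-context": "prod",
         "contexts": [{"name": "staging", "context": {"cluster": "other"}},
                      {"name": "prod", "context": {"cluster": "prod-cluster"}}]}

merged = merge_kubeconfigs([base, extra])
print(merged["current-context"])
print([c["name"] for c in merged["contexts"]])
```

In this sketch the order of files matters: listing extra first would make prod the active context and change which staging entry survives.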

How managed cloud clusters populate this automatically

Cloud provider CLIs typically offer a command that fetches cluster connection details and merges an appropriate entry into your kubeconfig automatically — e.g., aws eks update-kubeconfig, gcloud container clusters get-credentials, az aks get-credentials — so you rarely hand-write these files for a real cluster; you generate them via the provider's tooling and then just manage which context is active.

Related Resources

Kubernetes: Organizing Cluster Access Using kubeconfig Files

Kubernetes Fundamentals and Architecture

What is Kubernetes, and what problem does it solve?

The problem before orchestration

What Kubernetes actually automates

The core idea: declarative desired state

Why this matters for interviews

Related Resources

What are the core components of the Kubernetes control plane?

The four control plane components

API server (kube-apiserver)

etcd

Scheduler (kube-scheduler)

Controller manager (kube-controller-manager)

Why this separation matters

Managed vs. self-hosted

Related Resources

What are the core components running on a worker node?

The three node-level components

kubelet

Container runtime

kube-proxy

Why nodes need all three, and the control plane doesn't run them

What happens if a node's kubelet stops reporting

Related Resources

What is etcd, and why is it critical to a Kubernetes cluster?

What etcd actually stores

Why Raft consensus matters

Why losing etcd is catastrophic

Backup and disaster recovery

Security note

Related Resources

What is the Kubernetes API server, and how does kubectl interact with it?

The API server as the single front door

The request pipeline

What kubectl actually is

Watches, not polling

Why this design matters

Related Resources

What is a Kubernetes object, and what do apiVersion, kind, metadata, and spec mean?

Anatomy of a manifest

apiVersion

kind

metadata

spec

status

Why this consistent structure matters

Related Resources

What's the difference between declarative and imperative management in Kubernetes?

Imperative commands

Declarative configuration

Why declarative wins for anything beyond experimentation

When imperative commands still make sense

Related Resources

What is a Namespace, and when should you use multiple namespaces?

What a Namespace actually scopes

What's NOT namespaced

Common uses for multiple namespaces

The default namespace and its pitfalls

What Namespaces don't provide

Related Resources

What is the reconciliation/control loop pattern, and why is it central to Kubernetes?

The control loop, abstractly

A concrete example: the ReplicaSet controller

Why "eventually consistent," not "instantly consistent"

Why Kubernetes is composed of many independent, narrow loops

Why this matters more than it might first appear

Related Resources

What is the Container Runtime Interface (CRI), and what changed when Docker was deprecated as a Kubernetes runtime?

What the CRI standardizes

Why Docker itself was never CRI-native

The deprecation, precisely

What this deprecation did NOT change

Why this is a good interview question

Related Resources

How does kubectl know which cluster to talk to?

The kubeconfig file

Switching contexts

Overriding the config file location

How managed cloud clusters populate this automatically

Related Resources
