What is a Custom Resource Definition (CRD), and why would you create one?

A CRD lets you define an entirely new kind of object in the Kubernetes API — with your own schema, your own `kind`, managed exactly like any built-in object (`kubectl get`, `kubectl apply`, stored in etcd, subject to RBAC) — without modifying Kubernetes itself. You'd create one to represent a concept that's meaningful to your application or platform but has no built-in Kubernetes equivalent (a "Database" object representing a managed database instance, a "CronBackup" representing a scheduled backup policy), giving that concept a first-class, declarative API rather than hand-rolling it as ad-hoc ConfigMap conventions or external tooling state.

What is the Operator pattern, and how does it build on CRDs and controllers?

An Operator is a custom controller, paired with one or more CRDs, that encodes the operational knowledge of running a specific piece of software — not just creating/deleting resources, but handling the ongoing lifecycle tasks a human operator would otherwise do manually (backups, failover, version upgrades, scaling decisions specific to that software). It extends the basic reconciliation-loop pattern (see that question) with domain-specific logic, letting complex, stateful applications be managed declaratively ("I want a 3-node PostgreSQL cluster, version 15") the same way a Deployment manages simple stateless replicas.

What's the difference between a CRD-based Operator and a Helm chart?

A Helm chart is a one-time (or repeated, on-demand) templating and installation mechanism — you run `helm install`/`helm upgrade`, and once the resulting resources are applied, Helm's job is done until you explicitly run another command. An Operator is a continuously running controller that keeps actively reconciling and reacting to the live state of the software it manages, indefinitely, without needing a human to trigger each reconciliation. They solve genuinely different problems and are often used together: Helm to package and install the Operator itself, and the Operator to then provide ongoing, automated lifecycle management of the actual application afterward.

What is API aggregation in Kubernetes?

API aggregation lets an entirely separate API server (running as its own component, implementing its own logic) register itself with the main Kubernetes API server and appear as if it were a native part of the Kubernetes API — requests for its registered API group are transparently proxied to the aggregated server. This is a more heavyweight extension mechanism than a CRD, used when you need to implement genuinely custom API behavior (not just a new schema with a standard controller reacting to it) — metrics APIs (like the Metrics API metrics-server implements) are the most common real-world example.

How does a custom controller's reconcile loop typically work?

A custom controller watches the API server for changes to the resource(s) it cares about, and for each change, calls a `Reconcile` function that reads the object's current desired state (`spec`) and the real-world current state of whatever it manages, then takes whatever action closes the gap between them — creating, updating, or deleting resources as needed. Critically, a well-written reconcile function is designed to be safely re-run repeatedly (idempotent) and to tolerate being called even when nothing has actually changed, since the underlying watch/queue mechanism doesn't guarantee exactly-once, only at-least-once delivery of change notifications.

Custom Resources and Extensibility

Extending the Kubernetes API itself with Custom Resource Definitions, Operators, and custom controllers.

Defining a CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql"]
                storageSize:
                  type: string
                replicas:
                  type: integer
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database

Once this CRD is applied, Database becomes a genuinely new, first-class Kubernetes object type — the API server now accepts, validates (against the declared schema), and stores Database objects exactly as it does built-in ones like Deployments or Services.
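
As a rough illustration of what "validates against the declared schema" means in practice, the sketch below mirrors the spec fields from the CRD above as a Go type and re-implements the type and enum checks by hand. This is not how the API server does it (real validation is driven by the OpenAPI schema itself); the names DatabaseSpec, validate, and parseAndValidate are hypothetical, and JSON is used instead of YAML to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec mirrors the spec properties declared in the CRD schema above.
// The type name is illustrative; it is not generated from the CRD.
type DatabaseSpec struct {
	Engine      string `json:"engine"`
	StorageSize string `json:"storageSize"`
	Replicas    int    `json:"replicas"`
}

// validate hand-codes the one value constraint the schema declares:
// engine, when set, must be one of the enum values. The schema does not
// mark engine as required, so an empty value is allowed here too.
func validate(s DatabaseSpec) error {
	switch s.Engine {
	case "", "postgres", "mysql":
		return nil
	default:
		return fmt.Errorf("spec.engine: unsupported value %q", s.Engine)
	}
}

// parseAndValidate decodes a JSON-encoded spec and then applies validate.
// Decoding fails on a type mismatch (for example, replicas given as a string).
func parseAndValidate(raw string) (DatabaseSpec, error) {
	var s DatabaseSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, err
	}
	return s, validate(s)
}

func main() {
	_, err := parseAndValidate(`{"engine":"postgres","storageSize":"50Gi","replicas":3}`)
	fmt.Println("postgres accepted:", err == nil)

	_, err = parseAndValidate(`{"engine":"oracle","storageSize":"50Gi","replicas":3}`)
	fmt.Println("oracle accepted:", err == nil)
}
```

With a real CRD, none of this code is needed: an object with an engine outside the enum is rejected by the API server before it is ever stored.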

Creating an instance of your new custom resource

apiVersion: example.com/v1
kind: Database
metadata:
  name: my-app-db
spec:
  engine: postgres
  storageSize: "50Gi"
  replicas: 3

kubectl apply -f my-database.yaml
kubectl get databases
kubectl describe database my-app-db

The custom resource works with kubectl, RBAC, GitOps workflows built on kubectl apply, and any generic Kubernetes tooling, exactly as built-in objects do: to the API server, a CRD-defined custom resource is just another object type and needs no special-casing.

Why create one, rather than using a ConfigMap or an external system

First-class API semantics — a CRD gets schema validation, versioning, RBAC scoping, and kubectl support natively; representing the same concept as a cleverly-structured ConfigMap gets none of this for free, and requires custom tooling to validate/interpret its contents.
A natural fit for the reconciliation model — a CRD paired with a custom controller (an Operator — see that question) lets you build genuinely new, declarative "desired state, reconciled automatically" behavior for concepts specific to your domain, using the exact same pattern that powers Deployments and every other built-in controller.
Discoverability and consistency — anyone familiar with kubectl already knows how to interact with your custom resource; there's no separate CLI or API convention to learn.

A CRD alone is just a schema — it does nothing by itself

Critically, defining a CRD and creating Database objects does not, on its own, cause any actual database to be provisioned anywhere — a CRD only teaches the API server to accept, validate, and store objects of that shape. Making something actually happen in response to a Database object being created (actually provisioning a real database instance) requires a controller watching for these objects and reconciling real-world state to match them — this is exactly the role an Operator plays (see that question), and the combination of "CRD defines the shape" + "Operator provides the behavior" is the standard, complete pattern for meaningfully extending Kubernetes.

When creating a CRD is (and isn't) the right call

Worth the investment when you're building genuine platform/infrastructure tooling meant to be consumed declaratively by many users or teams, especially when the underlying concept benefits from Kubernetes-native reconciliation (self-healing, GitOps-compatible desired state). Often overkill for a one-off internal need that a simpler mechanism (a ConfigMap, a small external service, a script) would satisfy just as well with far less implementation effort — CRDs plus a working Operator represent real engineering investment, not a lightweight configuration trick.

Related Resources

Kubernetes: Custom Resources

The problem Operators solve: encoding operational expertise as code

Running a stateful, complex piece of software well — a PostgreSQL cluster, Kafka, Elasticsearch — typically requires ongoing human operational knowledge: how to safely perform a failover, how to correctly execute a version upgrade without data loss, how to resize storage without downtime, what a healthy vs. unhealthy cluster state actually looks like for this specific software. StatefulSets (see the workload controllers topic) solve the scheduling and identity problem for stateful workloads, but know nothing about this specific software's operational rules — an Operator is where that specialized knowledge gets encoded as actual, automated, running code.

Anatomy: a CRD plus a controller with domain-specific logic

apiVersion: postgresql.example.com/v1
kind: PostgresCluster
metadata:
  name: my-app-db
spec:
  version: "15"
  replicas: 3
  storageSize: "100Gi"

Behind the scenes, an Operator's controller watches for PostgresCluster objects (a CRD) and reconciles the actual cluster state toward this desired spec — but unlike a generic controller managing simple replica counts, the Operator's reconciliation logic understands PostgreSQL-specific concerns:

Operator's reconciliation loop, for a PostgresCluster object:
   → does a StatefulSet with the right replica count and version exist? create/update if not.
   → is exactly one replica currently the primary, and are the others properly
     configured as streaming replicas? fix the replication topology if not.
   → if the current primary becomes unhealthy, orchestrate a safe failover
     to promote a healthy replica -- following Postgres's own specific
     failover procedure, not a generic "just restart it" approach.
   → if spec.version changes, perform a safe, ordered version upgrade
     across replicas, following Postgres's documented upgrade procedure.
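
The checks listed above can be sketched as a pure decision function: given an observed cluster state, return the single most urgent step. This is a simplified illustration, not how any particular PostgreSQL Operator is implemented; the types Replica and ClusterState, the function nextAction, and the action names are all hypothetical.

```go
package main

import "fmt"

// Replica is a simplified, hypothetical view of one PostgreSQL instance.
type Replica struct {
	Name    string
	Healthy bool
	Primary bool
}

// ClusterState combines what the spec asks for (Desired*) with what the
// Operator currently observes in the cluster.
type ClusterState struct {
	DesiredReplicas int
	DesiredVersion  string
	RunningVersion  string
	Replicas        []Replica
}

// nextAction returns the most urgent step, checked in priority order:
// replica count, replication topology, primary health, then version.
func nextAction(s ClusterState) string {
	if len(s.Replicas) != s.DesiredReplicas {
		return "reconcile-statefulset"
	}
	primaries, healthyPrimaries := 0, 0
	for _, r := range s.Replicas {
		if r.Primary {
			primaries++
			if r.Healthy {
				healthyPrimaries++
			}
		}
	}
	switch {
	case primaries != 1:
		return "repair-topology" // zero or several primaries
	case healthyPrimaries == 0:
		return "failover" // the one primary is unhealthy
	case s.RunningVersion != s.DesiredVersion:
		return "rolling-upgrade"
	}
	return "none"
}

func main() {
	s := ClusterState{
		DesiredReplicas: 3, DesiredVersion: "15", RunningVersion: "15",
		Replicas: []Replica{
			{Name: "db-0", Healthy: false, Primary: true},
			{Name: "db-1", Healthy: true},
			{Name: "db-2", Healthy: true},
		},
	}
	fmt.Println("next action:", nextAction(s))
}
```

Returning one action per pass keeps each reconcile small; after acting, the Operator would observe the cluster again and decide the next step from fresh state.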

Why this is more than "just a controller"

Every controller (including the built-in ones for Deployments, ReplicaSets, and so on) implements the same reconciliation-loop pattern — what makes something specifically an Operator is that the reconciliation logic encodes deep, software-specific operational knowledge, going well beyond simple "keep N replicas running." A well-built Operator effectively automates tasks a skilled human database administrator (or Kafka administrator, or whatever the target software is) would otherwise perform manually and carefully, making that expertise repeatable, consistent, and available on-demand via a simple declarative spec.

The Operator maturity model

Not every Operator does everything described above. The Operator Framework's commonly cited capability levels are cumulative: Level 1 (basic install and configuration), Level 2 (seamless upgrades), Level 3 (full lifecycle, including backup and failure recovery), Level 4 (deep insights: metrics, alerts, log processing), and Level 5 (auto pilot: horizontal/vertical auto-scaling, automatic tuning, and anomaly detection without human intervention). Many real-world Operators sit somewhere in the middle, automating the tedious and error-prone parts (initial setup, routine scaling, backups) while deliberately leaving genuinely judgment-heavy decisions (a risky major version upgrade, a disaster-recovery scenario) to a human.

Where Operators come from

You can build a custom Operator yourself; frameworks like the Operator SDK and Kubebuilder scaffold much of the boilerplate (CRD generation, controller wiring, testing setup). For widely used software that already has a mature Operator, it is far more common to install an existing, published one built by the software's vendor or community (e.g., the Postgres Operator, Elasticsearch Operator, Prometheus Operator) via OperatorHub or a Helm chart.

Distinguishing an Operator from "just any controller" by pointing specifically to the domain-specific operational knowledge it encodes (failover procedures, upgrade sequencing, backup orchestration) — rather than just defining it as "a controller for custom resources" — demonstrates a real grasp of why the pattern exists and what problem it's actually solving.

Related Resources

Kubernetes: Operator pattern

Helm — a packaging and installation-time tool

helm install my-postgres bitnami/postgresql --set architecture=replication --set readReplicas.replicaCount=2

Helm renders templates into concrete Kubernetes manifests and applies them once, at the moment you run install or upgrade — after that, Helm itself has no ongoing, running presence in the cluster reacting to changes. If the underlying PostgreSQL Pod crashes, or a replica falls out of sync, Helm does nothing about it (that's the job of whatever controller is managing the resulting objects — typically just a StatefulSet's own basic reconciliation, with no PostgreSQL-specific operational awareness).

Operator — a continuously running, domain-aware controller

apiVersion: postgresql.example.com/v1
kind: PostgresCluster
metadata:
  name: my-postgres
spec:
  replicas: 3

Once this custom resource exists, the Operator's controller is continuously watching it (and the real state of the PostgreSQL cluster it manages) indefinitely — not just at the moment of initial creation. If a replica becomes unhealthy, the primary fails, or the spec is edited to request a version upgrade, the Operator reacts and takes appropriate, software-specific action, on an ongoing basis, with no human needing to run any command to trigger each reaction.

The key distinction: point-in-time templating vs. ongoing reconciliation

	Helm	Operator
When it acts	Only when you explicitly run `install`/`upgrade`/`rollback`	Continuously, in response to any relevant change or failure
What it knows	How to render and apply YAML templates	Deep, software-specific operational logic (failover, upgrades, backups)
Ongoing presence in the cluster	None (no running component after `install` completes)	A running controller Pod (or Deployment), watching indefinitely
Handles a replica crashing at 3am	No — relies on whatever it deployed (e.g., a plain StatefulSet) to handle this on its own, generically	Yes — this is exactly the kind of scenario Operators are built to actively manage

Why they're commonly used together, not as competing choices

A very common real-world pattern: use Helm to install the Operator itself (the Operator's own Deployment, its CRDs, its RBAC rules) as a one-time setup step, and then interact with the application going forward purely through the Operator's custom resources (kubectl apply -f postgres-cluster.yaml), letting the now-running Operator handle all further lifecycle management continuously. This isn't a contradiction — Helm and Operators solve different layers of the same overall problem (installing software vs. operating it long-term), and combining them is the standard, not an unusual choice.

The distinction to articulate clearly: Helm is fundamentally a templating and installation-time tool with no ongoing runtime presence, while an Operator is a continuously running controller with real operational awareness of the software it manages — conflating "a Helm chart that installs a complex application" with "an Operator that manages that application's ongoing lifecycle" is a common surface-level misunderstanding that a precise answer should specifically avoid.

Related Resources

Kubernetes: Operator pattern

How it differs from a CRD

A CRD (see that question) extends the API by adding a new schema that the existing API server itself stores and validates — all the actual request handling still happens inside the one main API server, backed by etcd like everything else. API aggregation is a different, more heavyweight mechanism: it registers a genuinely separate API server process, implementing its own request-handling logic (potentially not backed by etcd at all, and not bound by the standard CRD schema/validation model), and the main API server simply proxies matching requests to it.

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
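  insecureSkipTLSVerify: false   # TLS verification stays on; set caBundle unless the serving cert chains to a CA the API server already trusts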
  groupPriorityMinimum: 100
  versionPriority: 100

This APIService object tells the main API server: "requests for metrics.k8s.io/v1beta1 should be forwarded to the metrics-server Service" — from a client's perspective (kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes), this looks and behaves exactly like any other native Kubernetes API endpoint, but the actual logic answering the request lives entirely in the separate metrics-server component, not in the main API server or etcd.
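
The forwarding behavior described here can be imitated with a small standard-library sketch: a front server answers most paths itself but proxies one path prefix to a separate backend. This is only an analogy for the routing idea. The real aggregation layer also handles authentication, TLS, and API discovery, and the names newAggregator and fetch are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newAggregator returns a handler that forwards requests under prefix to
// backendURL and answers everything else itself.
func newAggregator(prefix string, backendURL *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(backendURL)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, prefix) {
			proxy.ServeHTTP(w, r) // the "aggregated" API group
			return
		}
		fmt.Fprint(w, "served by main API server")
	})
}

// fetch starts a stand-in backend and a front server on localhost, then
// requests path through the front server and returns the response body.
func fetch(path string) string {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served by metrics backend: "+r.URL.Path)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		panic(err)
	}
	front := httptest.NewServer(newAggregator("/apis/metrics.k8s.io/", target))
	defer front.Close()

	resp, err := http.Get(front.URL + path)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(body)
}

func main() {
	fmt.Println(fetch("/apis/metrics.k8s.io/v1beta1/nodes"))
	fmt.Println(fetch("/api/v1/pods"))
}
```

The client only ever talks to the front server; which process actually produced the answer is invisible to it, which is the property the APIService registration provides.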

Why metrics-server needs this, rather than being a CRD

Metrics-server's data (current CPU/memory usage) is fundamentally not the kind of thing etcd is designed to store — it's constantly changing, real-time, in-memory data with no need for durable persistence or the versioned-object history semantics etcd/CRDs provide. Implementing this as a CRD would force an awkward fit (constantly writing rapidly-changing snapshot data as etcd-backed objects); API aggregation instead lets metrics-server serve this data from its own purpose-built, in-memory implementation, while still appearing as a normal, integrated part of the Kubernetes API that kubectl and the HPA (see that question) can query uniformly.

When to reach for API aggregation vs. a CRD

	CRD (+ optional controller/Operator)	API aggregation
Backing storage	etcd (via the main API server)	Whatever the aggregated API server implements itself
Implementation effort	Lower — mostly declaring a schema, optionally a controller	Higher — building and running an entire separate API server
Typical use case	Representing a new kind of object with standard CRUD + reconciliation	Custom, non-standard request handling; data that doesn't fit the object-storage model (metrics, specialized queries)
Real-world examples	Most Operators (databases, certificate management, service meshes)	metrics-server, custom metrics adapters such as prometheus-adapter

The overwhelming majority of Kubernetes extensibility needs — representing a new application-specific concept, building automation/reconciliation around it — are well served by a CRD plus a controller/Operator, which is significantly simpler to build and maintain than a full aggregated API server. API aggregation is reserved for the comparatively rare cases where you genuinely need custom request-handling logic that doesn't fit the standard "object stored in etcd, reconciled by a controller" model — metrics-server remains the most commonly cited real-world example precisely because its use case (ephemeral, real-time data) is a poor fit for CRDs but a good fit for aggregation.

Recognizing that API aggregation and CRDs solve different classes of extension problems — and specifically that metrics-server uses aggregation rather than being a CRD, and why — demonstrates a level of Kubernetes internals understanding beyond the more commonly discussed CRD/Operator pattern alone.

Related Resources

Kubernetes: Kubernetes API Aggregation Layer

The general shape, using controller-runtime concepts (the library underlying Kubebuilder/Operator SDK)

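import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    examplev1 "example.com/database-operator/api/v1" // hypothetical path to the Database types
)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the current desired state (the custom resource itself).
    //    NotFound means the object was deleted, so there is nothing to do.
    var db examplev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Check the real-world current state of what this object manages
    var sts appsv1.StatefulSet
    err := r.Get(ctx, req.NamespacedName, &sts)
    if apierrors.IsNotFound(err) {
        // 3. Take action to close the gap: nothing exists yet, create it
        //    (buildStatefulSetFor is a helper assumed to be defined elsewhere)
        newSts := buildStatefulSetFor(&db)
        return ctrl.Result{}, r.Create(ctx, newSts)
    }
    if err != nil {
        // Any other error (e.g. a transient API failure): return it so the
        // request is retried, rather than acting on an unpopulated sts
        return ctrl.Result{}, err
    }

    // 4. Something exists -- check if it matches desired state, update if not
    //    (db.Spec.Replicas is assumed to be an int32; a nil pointer counts as a mismatch)
    if sts.Spec.Replicas == nil || *sts.Spec.Replicas != db.Spec.Replicas {
        sts.Spec.Replicas = &db.Spec.Replicas
        return ctrl.Result{}, r.Update(ctx, &sts)
    }

    return ctrl.Result{}, nil // already matches desired state, nothing to do
}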

The watch-queue-reconcile pipeline

API server (etcd change: a Database object created/updated/deleted)
   → controller-runtime's informer/watch mechanism notices the change
   → the affected object's identity is added to a work queue
   → a worker pulls it off the queue and calls Reconcile(ctx, thatObjectsRequest)
   → Reconcile reads CURRENT actual state fresh (not relying on the queue
     event's payload alone) and takes whatever action is needed

Notice that Reconcile re-fetches the object's current state itself, rather than trusting whatever triggered this particular call — this is a deliberate and important design principle.
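
One reason the queue carries only an object's identity is that a key can be deduplicated: several rapid changes to the same object collapse into one pending entry. The toy queue below sketches that behavior. It is loosely modeled on how controller work queues behave, not a copy of the client-go implementation, and dedupQueue and its methods are hypothetical names.

```go
package main

import "fmt"

// dedupQueue is a toy FIFO queue that holds each key at most once.
type dedupQueue struct {
	order   []string        // keys in arrival order
	pending map[string]bool // keys currently waiting in order
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are waiting.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	// Three rapid updates to one object, one update to another.
	events := []string{"default/my-app-db", "default/my-app-db", "default/other-db", "default/my-app-db"}
	for _, key := range events {
		q.Add(key)
	}
	fmt.Println("events:", len(events), "queued keys:", q.Len())
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("reconcile", key)
	}
}
```

Because only the key survives, the worker has no event payload to act on even if it wanted one; it has to look up the object's current state.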

Why reconcile functions must be idempotent

Watch notifications and Reconcile calls do not map one-to-one. The same object can trigger Reconcile several times for what was conceptually one logical change, several rapid changes can collapse into a single call because the work queue holds each object key at most once, and Reconcile can run with no change at all, since controllers also periodically re-sync and re-queue everything as a safety net against missed events. In that sense the guarantee is at-least-once, not exactly-once. A correctly written Reconcile function must therefore be safe to call repeatedly with no net effect when nothing needs to change: it checks current state and acts only if that state genuinely differs from desired state (as in the example above), rather than blindly performing an action (like "create a new StatefulSet") on every invocation regardless of whether one already exists.
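
A minimal way to see idempotency in action is to count writes. In the sketch below, a map stands in for the API server, and reconcile writes only when observed state differs from desired state. The names fakeCluster and reconcile are hypothetical and the model is deliberately tiny.

```go
package main

import "fmt"

// fakeCluster stands in for the API server: it maps a StatefulSet name to
// its replica count and counts every write made against it.
type fakeCluster struct {
	statefulSets map[string]int
	writes       int
}

// reconcile drives the cluster toward the desired replica count for name.
// It writes only when the object is missing or its count differs.
func reconcile(c *fakeCluster, name string, desired int) {
	current, exists := c.statefulSets[name]
	if exists && current == desired {
		return // already matches desired state, nothing to do
	}
	c.statefulSets[name] = desired // create or update
	c.writes++
}

func main() {
	c := &fakeCluster{statefulSets: map[string]int{}}

	// The same request delivered three times results in a single write.
	for i := 0; i < 3; i++ {
		reconcile(c, "my-app-db", 3)
	}
	fmt.Println("writes after 3 identical calls:", c.writes)

	// A real change in desired state results in exactly one more write.
	reconcile(c, "my-app-db", 5)
	fmt.Println("writes after spec change:", c.writes)
}
```

A non-idempotent version that always created a new object would have produced three writes (or two errors) for the first loop, which is exactly the failure mode described above.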

Why it re-fetches state rather than trusting the event payload

Between the moment a watch event is generated and the moment Reconcile actually runs (there can be a queue delay, retries, or several events collapsed together), the real state may have changed further. Reading current state at the start of Reconcile, rather than acting on a possibly stale snapshot carried in the event itself, keeps the reconciliation logic working from the most recent picture available when it acts, consistent with how every built-in Kubernetes controller operates (see the fundamentals topic's reconciliation question). One caveat: in controller-runtime the default client serves reads from a local informer cache, so even this read can lag the API server slightly; idempotent logic, plus the requeue triggered by the next event or by a write conflict, is what absorbs that lag.

Handling errors and retries

return ctrl.Result{}, err   // returning a non-nil error triggers an automatic retry, with backoff
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil  // explicitly reconcile again later

If Reconcile returns an error, the controller-runtime framework automatically re-queues the object for another attempt, with exponential backoff — this is what makes transient failures (a temporary API server hiccup, a downstream dependency briefly unavailable) self-healing without custom retry logic needing to be hand-written for every possible failure point.
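
The shape of per-object exponential backoff can be sketched in a few lines: the delay doubles with each consecutive failure until it reaches a cap. The constants below (5ms base, 1000s cap) are an assumption based on the commonly cited client-go defaults and should be checked against the library version in use; backoffDelay and defaultDelay are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns base doubled once per prior failure, capped at max.
func backoffDelay(failures int, base, max time.Duration) time.Duration {
	d := base
	if d > max {
		return max
	}
	for i := 0; i < failures; i++ {
		if d > max/2 {
			return max // doubling again would exceed the cap
		}
		d *= 2
	}
	return d
}

// defaultDelay applies the assumed defaults: 5ms base, 1000s cap.
func defaultDelay(failures int) time.Duration {
	return backoffDelay(failures, 5*time.Millisecond, 1000*time.Second)
}

func main() {
	for _, n := range []int{0, 1, 2, 3, 10, 30} {
		fmt.Printf("after %2d failures: wait %v\n", n, defaultDelay(n))
	}
}
```

The failure count is tracked per object key and reset once a reconcile succeeds, so one persistently failing object backs off without slowing down reconciliation of the others.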

Why this level of detail matters for an interview

Understanding that reconcile loops must be idempotent, must re-fetch current state rather than trust stale event data, and rely on automatic requeue-on-error rather than custom retry logic demonstrates genuine hands-on experience building or deeply understanding controllers — distinguishing this from someone who only knows the Operator pattern's name and high-level purpose without grasping the actual mechanics that make it correct and resilient in practice.

Related Resources

Kubebuilder: Controller Concepts