Defining a CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql"]
                storageSize:
                  type: string
                replicas:
                  type: integer
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
Once this CRD is applied, Database becomes a genuinely new, first-class Kubernetes object type — the API server now accepts, validates (against the declared schema), and stores Database objects exactly as it does built-in ones like Deployments or Services.
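To make "validates against the declared schema" concrete, here is a minimal Go sketch of the checks implied by the schema above. It is only an illustration: `validateSpec` is a hypothetical helper, not the API server's implementation, and it models the object as a plain map. Because the schema declares no required fields, absent fields are accepted.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSpec is a hypothetical stand-in for the checks the schema above
// implies: engine must be a string from the enum, storageSize a string,
// replicas an integer. Absent fields pass, since nothing is marked required.
func validateSpec(spec map[string]any) error {
	if v, ok := spec["engine"]; ok {
		s, isString := v.(string)
		if !isString {
			return errors.New("spec.engine: must be a string")
		}
		if s != "postgres" && s != "mysql" {
			return fmt.Errorf("spec.engine: unsupported value %q", s)
		}
	}
	if v, ok := spec["storageSize"]; ok {
		if _, isString := v.(string); !isString {
			return errors.New("spec.storageSize: must be a string")
		}
	}
	if v, ok := spec["replicas"]; ok {
		if _, isInt := v.(int); !isInt {
			return errors.New("spec.replicas: must be an integer")
		}
	}
	return nil
}

func main() {
	valid := map[string]any{"engine": "postgres", "storageSize": "50Gi", "replicas": 3}
	invalid := map[string]any{"engine": "oracle", "replicas": 3}
	fmt.Println("valid object:  ", validateSpec(valid))
	fmt.Println("invalid object:", validateSpec(invalid))
}
```

An object whose engine is outside the enum is rejected before it is ever stored, which is the behavior a ConfigMap cannot give you without extra tooling.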
Creating an instance of your new custom resource
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-app-db
spec:
  engine: postgres
  storageSize: "50Gi"
  replicas: 3
kubectl apply -f my-database.yaml
kubectl get databases
kubectl describe database my-app-db
Custom resources work with kubectl, RBAC, GitOps workflows built on kubectl apply, and generic Kubernetes tooling in the same way built-in objects do, because to the API server a CRD-defined resource is just another object type that needs no special-casing.
Why create one, rather than using a ConfigMap or an external system
- First-class API semantics: a CRD gets schema validation, versioning, RBAC scoping, and kubectl support natively; representing the same concept as a cleverly-structured ConfigMap gets none of this for free, and requires custom tooling to validate/interpret its contents.
- A natural fit for the reconciliation model: a CRD paired with a custom controller (an Operator; see that question) lets you build genuinely new, declarative "desired state, reconciled automatically" behavior for concepts specific to your domain, using the exact same pattern that powers Deployments and every other built-in controller.
- Discoverability and consistency: anyone familiar with kubectl already knows how to interact with your custom resource; there's no separate CLI or API convention to learn.
A CRD alone is just a schema — it does nothing by itself
Critically, defining a CRD and creating Database objects does not, on its own, cause any actual database to be provisioned anywhere — a CRD only teaches the API server to accept, validate, and store objects of that shape. Making something actually happen in response to a Database object being created (actually provisioning a real database instance) requires a controller watching for these objects and reconciling real-world state to match them — this is exactly the role an Operator plays (see that question), and the combination of "CRD defines the shape" + "Operator provides the behavior" is the standard, complete pattern for meaningfully extending Kubernetes.
When creating a CRD is (and isn't) the right call
Worth the investment when you're building genuine platform/infrastructure tooling meant to be consumed declaratively by many users or teams, especially when the underlying concept benefits from Kubernetes-native reconciliation (self-healing, GitOps-compatible desired state). Often overkill for a one-off internal need that a simpler mechanism (a ConfigMap, a small external service, a script) would satisfy just as well with far less implementation effort — CRDs plus a working Operator represent real engineering investment, not a lightweight configuration trick.
The problem Operators solve: encoding operational expertise as code
Running a stateful, complex piece of software well — a PostgreSQL cluster, Kafka, Elasticsearch — typically requires ongoing human operational knowledge: how to safely perform a failover, how to correctly execute a version upgrade without data loss, how to resize storage without downtime, what a healthy vs. unhealthy cluster state actually looks like for this specific software. StatefulSets (see the workload controllers topic) solve the scheduling and identity problem for stateful workloads, but know nothing about this specific software's operational rules — an Operator is where that specialized knowledge gets encoded as actual, automated, running code.
Anatomy: a CRD plus a controller with domain-specific logic
apiVersion: postgresql.example.com/v1
kind: PostgresCluster
metadata:
  name: my-app-db
spec:
  version: "15"
  replicas: 3
  storageSize: "100Gi"
Behind the scenes, an Operator's controller watches for PostgresCluster objects (a CRD) and reconciles the actual cluster state toward this desired spec — but unlike a generic controller managing simple replica counts, the Operator's reconciliation logic understands PostgreSQL-specific concerns:
Operator's reconciliation loop, for a PostgresCluster object:
  → does a StatefulSet with the right replica count and version exist? create/update if not.
  → is exactly one replica currently the primary, and are the others properly
    configured as streaming replicas? fix the replication topology if not.
  → if the current primary becomes unhealthy, orchestrate a safe failover
    to promote a healthy replica -- following Postgres's own specific
    failover procedure, not a generic "just restart it" approach.
  → if spec.version changes, perform a safe, ordered version upgrade
    across replicas, following Postgres's documented upgrade procedure.
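The decision-making part of that loop can be sketched in Go as a pure function that compares the desired spec with observed replicas and returns the actions to take. This is a simplified sketch under stated assumptions: the `Replica` and `ClusterSpec` types, `planActions`, and the action strings are all hypothetical, and a real Operator would carry out each action with PostgreSQL-specific procedures that are far more involved than a one-line description.

```go
package main

import "fmt"

// Replica is a simplified, hypothetical view of one PostgreSQL instance.
type Replica struct {
	Name    string
	Primary bool
	Healthy bool
	Version string
}

// ClusterSpec mirrors the version and replicas fields of the spec above.
type ClusterSpec struct {
	Version  string
	Replicas int
}

// planActions decides what needs doing; it performs nothing itself.
func planActions(spec ClusterSpec, observed []Replica) []string {
	var actions []string

	// Right number of instances?
	if len(observed) != spec.Replicas {
		actions = append(actions, fmt.Sprintf("scale StatefulSet to %d replicas", spec.Replicas))
	}

	// Exactly one healthy primary?
	healthyPrimaries := 0
	for _, r := range observed {
		if r.Primary && r.Healthy {
			healthyPrimaries++
		}
	}
	switch {
	case healthyPrimaries > 1:
		actions = append(actions, "repair replication topology: multiple primaries")
	case healthyPrimaries == 0:
		// Promote the first healthy non-primary replica, if there is one.
		for _, r := range observed {
			if r.Healthy && !r.Primary {
				actions = append(actions, "fail over: promote "+r.Name)
				break
			}
		}
	}

	// Any instance still on a different version than the spec asks for?
	for _, r := range observed {
		if r.Version != spec.Version {
			actions = append(actions, "upgrade cluster to version "+spec.Version)
			break
		}
	}
	return actions
}

func main() {
	spec := ClusterSpec{Version: "15", Replicas: 3}
	observed := []Replica{
		{Name: "pg-0", Primary: true, Healthy: false, Version: "15"},
		{Name: "pg-1", Healthy: true, Version: "15"},
		{Name: "pg-2", Healthy: true, Version: "15"},
	}
	for _, action := range planActions(spec, observed) {
		fmt.Println(action)
	}
}
```

Keeping the decision separate from the execution makes the domain logic easy to test without a cluster, which is one reasonable way to structure this kind of controller.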
Why this is more than "just a controller"
Every controller (including the built-in ones for Deployments, ReplicaSets, and so on) implements the same reconciliation-loop pattern — what makes something specifically an Operator is that the reconciliation logic encodes deep, software-specific operational knowledge, going well beyond simple "keep N replicas running." A well-built Operator effectively automates tasks a skilled human database administrator (or Kafka administrator, or whatever the target software is) would otherwise perform manually and carefully, making that expertise repeatable, consistent, and available on-demand via a simple declarative spec.
The Operator maturity model
Not every Operator does everything described above. The Operator Framework's commonly cited capability levels run from Level 1 (Basic Install: automated installation and configuration) through Level 2 (Seamless Upgrades), Level 3 (Full Lifecycle: backup and failure recovery), and Level 4 (Deep Insights: metrics, alerts, log processing) to Level 5 (Auto Pilot: automatic scaling, tuning, and abnormality detection without human intervention). Many real-world Operators sit somewhere in the middle: they automate the tedious, error-prone parts (initial setup, routine scaling, backups) while deliberately leaving judgment-heavy decisions (a risky major version upgrade, a disaster-recovery scenario) to a human.
Where Operators come from
You can build a custom Operator yourself; frameworks like the Operator SDK and Kubebuilder scaffold much of the boilerplate (CRD generation, controller wiring, testing setup). For popular software it is far more common to install an existing, published Operator built by the software's vendor or community (e.g., the Postgres Operator, Elasticsearch Operator, Prometheus Operator) via OperatorHub or a Helm chart.
Distinguishing an Operator from "just any controller" by pointing specifically to the domain-specific operational knowledge it encodes (failover procedures, upgrade sequencing, backup orchestration) — rather than just defining it as "a controller for custom resources" — demonstrates a real grasp of why the pattern exists and what problem it's actually solving.
Helm — a packaging and installation-time tool
helm install my-postgres bitnami/postgresql --set replicaCount=3
Helm renders templates into concrete Kubernetes manifests and applies them once, at the moment you run install or upgrade — after that, Helm itself has no ongoing, running presence in the cluster reacting to changes. If the underlying PostgreSQL Pod crashes, or a replica falls out of sync, Helm does nothing about it (that's the job of whatever controller is managing the resulting objects — typically just a StatefulSet's own basic reconciliation, with no PostgreSQL-specific operational awareness).
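Helm charts are written as Go templates, so the "render once, then apply" step can be approximated with the standard library's text/template package. The sketch below is a minimal illustration, not Helm's actual code path: the template is a hypothetical, heavily trimmed manifest (not the Bitnami chart), and `render` is an illustrative helper. `.Release.Name` and `.Values` mirror the objects Helm exposes to templates.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// manifestTemplate is a hypothetical, heavily trimmed chart template.
const manifestTemplate = `apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
`

// render fills the template once and returns the concrete manifest text.
// After this returns, nothing keeps watching the resulting objects.
func render(releaseName string, values map[string]any) (string, error) {
	tmpl, err := template.New("statefulset").Parse(manifestTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Release": map[string]any{"Name": releaseName},
		"Values":  values,
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	manifest, err := render("my-postgres", map[string]any{"replicaCount": 3})
	if err != nil {
		panic(err)
	}
	fmt.Print(manifest)
}
```

The output is plain YAML with the values substituted. Everything that happens to the StatefulSet afterwards is up to whichever controller manages it, not to the templating step.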
Operator — a continuously running, domain-aware controller
apiVersion: postgresql.example.com/v1
kind: PostgresCluster
metadata:
  name: my-postgres
spec:
  replicas: 3
Once this custom resource exists, the Operator's controller is continuously watching it (and the real state of the PostgreSQL cluster it manages) indefinitely — not just at the moment of initial creation. If a replica becomes unhealthy, the primary fails, or the spec is edited to request a version upgrade, the Operator reacts and takes appropriate, software-specific action, on an ongoing basis, with no human needing to run any command to trigger each reaction.
The key distinction: point-in-time templating vs. ongoing reconciliation
| | Helm | Operator |
|---|---|---|
| When it acts | Only when you explicitly run install/upgrade/rollback | Continuously, in response to any relevant change or failure |
| What it knows | How to render and apply YAML templates | Deep, software-specific operational logic (failover, upgrades, backups) |
| Ongoing presence in the cluster | None (no running component after install completes) | A running controller Pod (or Deployment), watching indefinitely |
| Handles a replica crashing at 3am | No — relies on whatever it deployed (e.g., a plain StatefulSet) to handle this on its own, generically | Yes — this is exactly the kind of scenario Operators are built to actively manage |
Why they're commonly used together, not as competing choices
A very common real-world pattern: use Helm to install the Operator itself (the Operator's own Deployment, its CRDs, its RBAC rules) as a one-time setup step, and then interact with the application going forward purely through the Operator's custom resources (kubectl apply -f postgres-cluster.yaml), letting the now-running Operator handle all further lifecycle management continuously. This isn't a contradiction — Helm and Operators solve different layers of the same overall problem (installing software vs. operating it long-term), and combining them is the standard, not an unusual choice.
The distinction to articulate clearly: Helm is fundamentally a templating and installation-time tool with no ongoing runtime presence, while an Operator is a continuously running controller with real operational awareness of the software it manages — conflating "a Helm chart that installs a complex application" with "an Operator that manages that application's ongoing lifecycle" is a common surface-level misunderstanding that a precise answer should specifically avoid.
How API aggregation differs from a CRD
A CRD (see that question) extends the API by adding a new schema that the existing API server itself stores and validates — all the actual request handling still happens inside the one main API server, backed by etcd like everything else. API aggregation is a different, more heavyweight mechanism: it registers a genuinely separate API server process, implementing its own request-handling logic (potentially not backed by etcd at all, and not bound by the standard CRD schema/validation model), and the main API server simply proxies matching requests to it.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: false
  groupPriorityMinimum: 100
  versionPriority: 100
This APIService object tells the main API server: "requests for metrics.k8s.io/v1beta1 should be forwarded to the metrics-server Service" — from a client's perspective (kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes), this looks and behaves exactly like any other native Kubernetes API endpoint, but the actual logic answering the request lives entirely in the separate metrics-server component, not in the main API server or etcd.
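The forwarding behavior can be imitated with Go's standard reverse proxy. This is a rough sketch of the routing idea only, assuming two in-process test servers: `newAggregator`, `fetch`, and `demo` are hypothetical helpers, the response strings are invented, and the real aggregation layer additionally handles authentication, TLS verification, and API discovery.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newAggregator plays the main API server's role: requests under prefix are
// proxied to the backend, everything else is answered locally.
func newAggregator(prefix string, backend *url.URL) http.Handler {
	mux := http.NewServeMux()
	mux.Handle(prefix, httputil.NewSingleHostReverseProxy(backend))
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served by main API server")
	})
	return mux
}

// fetch issues a GET and returns the response body as a string.
func fetch(rawURL string) string {
	resp, err := http.Get(rawURL)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

// demo starts a fake metrics backend and an aggregating front end, then
// queries one aggregated path and one ordinary path through the front end.
func demo() (aggregated, local string) {
	metrics := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "served by metrics backend: "+r.URL.Path)
	}))
	defer metrics.Close()

	backend, _ := url.Parse(metrics.URL)
	front := httptest.NewServer(newAggregator("/apis/metrics.k8s.io/v1beta1/", backend))
	defer front.Close()

	return fetch(front.URL + "/apis/metrics.k8s.io/v1beta1/nodes"),
		fetch(front.URL + "/api/v1/pods")
}

func main() {
	aggregated, local := demo()
	fmt.Println(aggregated)
	fmt.Println(local)
}
```

The client talks to a single address in both cases; only the path decides which process produces the answer.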
Why metrics-server needs this, rather than being a CRD
Metrics-server's data (current CPU/memory usage) is fundamentally not the kind of thing etcd is designed to store — it's constantly changing, real-time, in-memory data with no need for durable persistence or the versioned-object history semantics etcd/CRDs provide. Implementing this as a CRD would force an awkward fit (constantly writing rapidly-changing snapshot data as etcd-backed objects); API aggregation instead lets metrics-server serve this data from its own purpose-built, in-memory implementation, while still appearing as a normal, integrated part of the Kubernetes API that kubectl and the HPA (see that question) can query uniformly.
When to reach for API aggregation vs. a CRD
| | CRD (+ optional controller/Operator) | API aggregation |
|---|---|---|
| Backing storage | etcd (via the main API server) | Whatever the aggregated API server implements itself |
| Implementation effort | Lower — mostly declaring a schema, optionally a controller | Higher — building and running an entire separate API server |
| Typical use case | Representing a new kind of object with standard CRUD + reconciliation | Custom, non-standard request handling; data that doesn't fit the object-storage model (metrics, specialized queries) |
| Real-world examples | Most Operators (databases, certificate management, service meshes) | metrics-server, custom and external metrics adapters (e.g., prometheus-adapter) |
The overwhelming majority of Kubernetes extensibility needs — representing a new application-specific concept, building automation/reconciliation around it — are well served by a CRD plus a controller/Operator, which is significantly simpler to build and maintain than a full aggregated API server. API aggregation is reserved for the comparatively rare cases where you genuinely need custom request-handling logic that doesn't fit the standard "object stored in etcd, reconciled by a controller" model — metrics-server remains the most commonly cited real-world example precisely because its use case (ephemeral, real-time data) is a poor fit for CRDs but a good fit for aggregation.
Recognizing that API aggregation and CRDs solve different classes of extension problems — and specifically that metrics-server uses aggregation rather than being a CRD, and why — demonstrates a level of Kubernetes internals understanding beyond the more commonly discussed CRD/Operator pattern alone.
The general shape of a reconcile function, using controller-runtime concepts (the library underlying Kubebuilder/Operator SDK)
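import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	examplev1 "example.com/database-operator/api/v1" // hypothetical module path for the Database type
)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the current desired state (the custom resource itself)
	var db examplev1.Database
	if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
		// The object may have been deleted since it was queued; that is not an error
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// 2. Check the real-world current state of what this object manages
	var sts appsv1.StatefulSet
	err := r.Get(ctx, req.NamespacedName, &sts)
	if apierrors.IsNotFound(err) {
		// 3. Take action to close the gap: nothing exists yet, create it
		newSts := buildStatefulSetFor(&db) // helper assumed to be defined elsewhere
		return ctrl.Result{}, r.Create(ctx, newSts)
	}
	if err != nil {
		// Any other read failure: return it so the request is retried with backoff
		return ctrl.Result{}, err
	}
	// 4. Something exists -- check if it matches desired state, update if not
	//    (assumes db.Spec.Replicas is an int32; a nil replica count is treated as a mismatch)
	if sts.Spec.Replicas == nil || *sts.Spec.Replicas != db.Spec.Replicas {
		sts.Spec.Replicas = &db.Spec.Replicas
		return ctrl.Result{}, r.Update(ctx, &sts)
	}
	return ctrl.Result{}, nil // already matches desired state, nothing to do
}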
The watch-queue-reconcile pipeline
API server (etcd change: a Database object created/updated/deleted)
  → controller-runtime's informer/watch mechanism notices the change
  → the affected object's identity is added to a work queue
  → a worker pulls it off the queue and calls Reconcile(ctx, thatObjectsRequest)
  → Reconcile reads CURRENT actual state fresh (not relying on the queue
    event's payload alone) and takes whatever action is needed
Notice that Reconcile re-fetches the object's current state itself, rather than trusting whatever triggered this particular call — this is a deliberate and important design principle.
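A stripped-down model of this pipeline, using only the standard library, might look like the following. It is a sketch rather than controller-runtime's implementation: the `queue` type, `drain`, and the map standing in for the API server are all hypothetical. It shows two properties described above: only the object's key travels through the queue, and a key that is already waiting is not queued a second time.

```go
package main

import "fmt"

// key identifies an object by namespace/name, like a reconcile request.
type key string

// queue is a minimal de-duplicating FIFO: adding a key that is already
// waiting is a no-op, so a burst of events yields a single reconcile.
type queue struct {
	order   []key
	pending map[key]bool
}

func newQueue() *queue { return &queue{pending: map[key]bool{}} }

func (q *queue) add(k key) {
	if q.pending[k] {
		return
	}
	q.pending[k] = true
	q.order = append(q.order, k)
}

func (q *queue) get() (key, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	k := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, k)
	return k, true
}

// drain plays the worker: it pulls keys until the queue is empty, hands each
// to reconcile, and returns how many reconcile calls were made.
func drain(q *queue, reconcile func(key)) int {
	calls := 0
	for {
		k, ok := q.get()
		if !ok {
			return calls
		}
		reconcile(k)
		calls++
	}
}

func main() {
	// store stands in for the API server: object key -> desired replicas.
	store := map[key]int{}
	q := newQueue()

	// Three rapid updates to the same object before any worker runs.
	for _, replicas := range []int{2, 3, 5} {
		store["default/my-app-db"] = replicas
		q.add("default/my-app-db")
	}

	calls := drain(q, func(k key) {
		// Reconcile reads the current state itself; no event payload exists.
		fmt.Printf("reconcile %s: desired replicas = %d\n", k, store[k])
	})
	fmt.Println("reconcile calls:", calls)
}
```

Because the worker reads from the store when it runs, it acts on the latest value of the object, not on any of the intermediate updates.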
Why reconcile functions must be idempotent
The underlying delivery guarantee for watch events is at-least-once, not exactly-once — the same object can trigger Reconcile being called multiple times for what was conceptually one logical change (or even with no change at all, since controllers also periodically re-sync/re-queue everything as a safety net against missed events). A correctly written Reconcile function must therefore be safe to call repeatedly with no net effect if nothing actually needs to change — checking current state and only acting if it genuinely differs from desired state (as in the example above), rather than blindly performing an action (like "create a new StatefulSet") on every single invocation regardless of whether one already exists.
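Idempotency is easy to check in isolation. The sketch below assumes a toy `fakeCluster` type in place of the API server and a simplified `reconcile` function (both hypothetical). It calls reconcile repeatedly with the same desired state and counts how many writes actually happen.

```go
package main

import "fmt"

// fakeCluster stands in for the API server: StatefulSet name -> replica count.
type fakeCluster struct {
	statefulSets map[string]int
	writes       int // number of create/update calls issued
}

// reconcile checks current state first and writes only when it differs from
// the desired state, so repeated calls converge instead of piling up changes.
func reconcile(c *fakeCluster, name string, desiredReplicas int) {
	current, exists := c.statefulSets[name]
	switch {
	case !exists:
		c.statefulSets[name] = desiredReplicas // create
		c.writes++
	case current != desiredReplicas:
		c.statefulSets[name] = desiredReplicas // update
		c.writes++
	}
	// Otherwise the state already matches: do nothing.
}

func main() {
	c := &fakeCluster{statefulSets: map[string]int{}}
	for i := 0; i < 5; i++ {
		reconcile(c, "my-app-db", 3) // duplicate deliveries of the same change
	}
	fmt.Println("statefulsets:", len(c.statefulSets), "writes:", c.writes)
}
```

Five invocations produce one StatefulSet and one write. A version that created a StatefulSet unconditionally would either fail on the second call or leave duplicates behind.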
Why it re-fetches state rather than trusting the event payload
Between the moment a watch event is generated and the moment Reconcile actually runs (there can be a queue delay, retries, or several events batched together), the real state may have changed further — re-fetching fresh, current state at the start of Reconcile (rather than acting on a possibly-stale snapshot carried in the event itself) ensures the reconciliation logic is always working from an accurate picture at the moment it actually takes action, consistent with how every built-in Kubernetes controller operates (see the fundamentals topic's reconciliation question).
Handling errors and retries
return ctrl.Result{}, err // returning a non-nil error triggers an automatic retry, with backoff
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil // explicitly reconcile again later
If Reconcile returns an error, the controller-runtime framework automatically re-queues the object for another attempt, with exponential backoff — this is what makes transient failures (a temporary API server hiccup, a downstream dependency briefly unavailable) self-healing without custom retry logic needing to be hand-written for every possible failure point.
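The shape of that backoff can be sketched as a per-item delay that doubles with each consecutive failure up to a cap. The function name `backoffFor` is hypothetical, and the 5 ms base and 1000 s cap used in `main` are assumptions chosen to resemble common client-go defaults; check the rate limiter configured in your own controller before relying on specific numbers.

```go
package main

import (
	"fmt"
	"time"
)

// backoffFor returns the delay before the next retry after the given number
// of consecutive failures: base, doubled once per failure, capped at limit.
func backoffFor(failures int, base, limit time.Duration) time.Duration {
	delay := base
	for i := 0; i < failures && delay < limit; i++ {
		delay *= 2
	}
	if delay > limit {
		return limit
	}
	return delay
}

func main() {
	// Assumed parameters, not read from any real controller configuration.
	base, limit := 5*time.Millisecond, 1000*time.Second
	for failures := 0; failures <= 5; failures++ {
		fmt.Printf("after %d failures: retry in %v\n", failures, backoffFor(failures, base, limit))
	}
}
```

A successful reconcile resets the failure count for that object, so an object that recovers is retried promptly the next time something goes wrong.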
Why this level of detail matters for an interview
Understanding that reconcile loops must be idempotent, must re-fetch current state rather than trust stale event data, and rely on automatic requeue-on-error rather than custom retry logic demonstrates genuine hands-on experience building or deeply understanding controllers — distinguishing this from someone who only knows the Operator pattern's name and high-level purpose without grasping the actual mechanics that make it correct and resilient in practice.