How does a custom controller's reconcile loop typically work?

7 min | advanced | reconcile-loop, custom-controllers, controller-runtime

Quick Answer

A custom controller watches the API server for changes to the resources it cares about. For each change, it calls a `Reconcile` function that reads the object's desired state (`spec`) and the actual state of whatever it manages, then takes whatever action closes the gap between them: creating, updating, or deleting resources as needed. Critically, a well-written reconcile function is idempotent (safe to re-run repeatedly) and tolerates being called when nothing has changed, because the underlying watch/queue mechanism delivers change notifications at least once, not exactly once.

Detailed Answer

The general shape, using controller-runtime concepts (the library underlying Kubebuilder/Operator SDK)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the current desired state (the custom resource itself)
    var db examplev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Check the real-world current state of what this object manages
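    var sts appsv1.StatefulSet
    err := r.Get(ctx, req.NamespacedName, &sts)
    // errors here is k8s.io/apimachinery/pkg/api/errors, not the standard library package
    if errors.IsNotFound(err) {
        // 3. Take action to close the gap: nothing exists yet, create it
        newSts := buildStatefulSetFor(&db)
        return ctrl.Result{}, r.Create(ctx, newSts)
    }
    if err != nil {
        // Any other read failure: return it so the request is retried,
        // instead of falling through and dereferencing an empty StatefulSet
        return ctrl.Result{}, err
    }

    // 4. Something exists -- check if it matches desired state, update if not
    if sts.Spec.Replicas == nil || *sts.Spec.Replicas != db.Spec.Replicas {
        sts.Spec.Replicas = &db.Spec.Replicas
        return ctrl.Result{}, r.Update(ctx, &sts)
    }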

    return ctrl.Result{}, nil   // already matches desired state, nothing to do
}

The watch-queue-reconcile pipeline

API server (etcd change: a Database object created/updated/deleted)
   → controller-runtime's informer/watch mechanism notices the change
   → the affected object's identity is added to a work queue
   → a worker pulls it off the queue and calls Reconcile(ctx, thatObjectsRequest)
   → Reconcile reads CURRENT actual state fresh (not relying on the queue
     event's payload alone) and takes whatever action is needed

Notice that Reconcile re-fetches the object's current state itself, rather than trusting whatever triggered this particular call — this is a deliberate and important design principle.
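The queue step in the pipeline above can be illustrated with a toy, standard-library-only sketch. It is not controller-runtime's actual queue; `keyQueue`, `drain`, and the object keys are hypothetical names chosen for illustration. The point it tries to show is that the queue holds only object identities, and that repeated notifications for an object already waiting collapse into a single reconcile call.

```go
package main

import "fmt"

// keyQueue is a toy FIFO work queue that stores only object keys and
// ignores a key that is already waiting to be processed.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless it is already pending.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain calls reconcile once per queued key and returns the keys in
// the order they were processed.
func drain(q *keyQueue, reconcile func(key string)) []string {
	var processed []string
	for {
		key, ok := q.Get()
		if !ok {
			return processed
		}
		reconcile(key)
		processed = append(processed, key)
	}
}

func main() {
	q := newKeyQueue()
	// Four notifications arrive, three of them for the same object.
	q.Add("default/orders-db")
	q.Add("default/orders-db")
	q.Add("default/users-db")
	q.Add("default/orders-db")

	processed := drain(q, func(key string) {
		fmt.Println("reconciling", key)
	})
	fmt.Println(len(processed), "reconcile calls for 4 notifications")
}
```

Because the worker receives only a key, it has no choice but to look up the object's current state itself, which is the design principle noted above.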

Why reconcile functions must be idempotent

Watch events are delivered at least once, not exactly once. The same object can trigger several Reconcile calls for what was conceptually one logical change, or a call with no change at all, since controllers also periodically re-sync and re-queue everything as a safety net against missed events. A correctly written Reconcile function must therefore be safe to call repeatedly, with no net effect when nothing needs to change. It checks current state and acts only if that state genuinely differs from desired state (as in the example above), rather than blindly performing an action such as "create a new StatefulSet" on every invocation regardless of whether one already exists.
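A minimal sketch of that check-then-act pattern, using an in-memory map as a hypothetical stand-in for the API server (the `cluster` type and this simplified `reconcile` function are assumptions for illustration, not controller-runtime APIs). Calling it three times with the same desired state performs exactly one write.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for the API server:
// it maps a workload name to its current replica count.
type cluster struct {
	replicas map[string]int
	writes   int // number of create/update calls performed
}

// reconcile drives the named workload toward the desired replica
// count and reports which action it took. It writes only when the
// current state differs from the desired state.
func reconcile(c *cluster, name string, desired int) string {
	current, exists := c.replicas[name]
	switch {
	case !exists:
		c.replicas[name] = desired
		c.writes++
		return "created"
	case current != desired:
		c.replicas[name] = desired
		c.writes++
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := &cluster{replicas: map[string]int{}}
	// Simulate the same request being delivered three times.
	for i := 0; i < 3; i++ {
		fmt.Println(reconcile(c, "orders-db", 3))
	}
	fmt.Println("writes:", c.writes)
}
```

The first call creates the workload and the next two report `unchanged`, so duplicate or spurious deliveries cost a read but cause no side effects.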

Why it re-fetches state rather than trusting the event payload

Between the moment a watch event is generated and the moment Reconcile actually runs, the real state may have changed further: there can be a queue delay, retries, or several events batched together. Re-fetching current state at the start of Reconcile, rather than acting on a possibly stale snapshot carried in the event itself, ensures the reconciliation logic works from an accurate picture at the moment it takes action. In controller-runtime the request carries only the object's namespace and name, so there is no event payload to trust in the first place. This is consistent with how every built-in Kubernetes controller operates (see the fundamentals topic's reconciliation question).
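The difference can be made concrete with a small sketch. Everything here is hypothetical: the `event` type deliberately carries a snapshot (which a controller-runtime request does not) so that the stale and fresh approaches can be compared side by side, and `store` stands in for the API server.

```go
package main

import "fmt"

// event is a hypothetical notification that carries a snapshot of the
// desired replica count taken when the event was generated.
type event struct {
	key      string
	replicas int
}

// store is a hypothetical stand-in for the API server's current state.
type store map[string]int

// replicasFromEvent trusts the snapshot inside the event (fragile).
func replicasFromEvent(e event) int {
	return e.replicas
}

// replicasFromKey ignores any snapshot and reads the latest state by key.
func replicasFromKey(s store, key string) (int, bool) {
	replicas, ok := s[key]
	return replicas, ok
}

func main() {
	s := store{"default/orders-db": 3}
	queued := event{key: "default/orders-db", replicas: 3}

	// The object is edited again while the event waits in the queue.
	s["default/orders-db"] = 5

	fmt.Println("from event snapshot:", replicasFromEvent(queued))
	fresh, _ := replicasFromKey(s, queued.key)
	fmt.Println("from fresh read:", fresh)
}
```

Acting on the snapshot would scale to 3 replicas even though the user has since asked for 5; reading by key picks up the later edit.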

Handling errors and retries

return ctrl.Result{}, err   // returning a non-nil error triggers an automatic retry, with backoff
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil  // explicitly reconcile again later

If Reconcile returns an error, controller-runtime automatically re-queues the object for another attempt, with exponential backoff. This makes transient failures (a temporary API server hiccup, a downstream dependency briefly unavailable) self-healing, without hand-written retry logic at every possible failure point.
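To give a feel for what exponential backoff means for retry timing, here is a standard-library-only sketch of a per-object delay that doubles on each consecutive failure up to a cap. The `backoffDelay` function and the base and cap values are illustrative assumptions, not controller-runtime's actual rate limiter or its defaults, which are configurable.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns base doubled once per consecutive failure,
// never exceeding limit.
func backoffDelay(base, limit time.Duration, failures int) time.Duration {
	delay := base
	for i := 0; i < failures && delay < limit; i++ {
		delay *= 2
	}
	if delay > limit {
		delay = limit
	}
	return delay
}

func main() {
	// Illustrative values only.
	base, limit := 5*time.Millisecond, 1*time.Second
	for failures := 0; failures < 10; failures++ {
		fmt.Printf("after %d failures: retry in %v\n",
			failures, backoffDelay(base, limit, failures))
	}
}
```

Early retries happen almost immediately, which suits brief hiccups, while a persistently failing object settles at the cap instead of hammering the API server.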

Why this level of detail matters for an interview

Understanding that reconcile loops must be idempotent, must re-fetch current state rather than trust stale event data, and rely on automatic requeue-on-error rather than custom retry logic demonstrates genuine hands-on experience building or deeply understanding controllers. It distinguishes a candidate from someone who knows only the Operator pattern's name and high-level purpose, without the mechanics that make it correct and resilient in practice.