What is a Kubernetes Service, and why is it needed given Pods are ephemeral?

A Service provides a single, stable virtual IP address and DNS name that load-balances traffic across a changing set of Pods matched by a label selector. Pods are created and destroyed constantly (rescheduled, scaled, replaced during rollouts), and each replacement gets a new IP address, so nothing else in the cluster could reliably address them directly. A Service decouples "who I need to talk to" from "which specific Pod IPs currently exist," with Kubernetes keeping the mapping continuously up to date.

What are the different Service types, and when do you use each?

**ClusterIP** (the default) exposes a Service only inside the cluster, on an internal virtual IP. **NodePort** additionally opens a static port on every node's own IP, making the Service reachable from outside the cluster via any node's address. **LoadBalancer** provisions an external cloud load balancer (on supported cloud providers) that routes to the Service, giving it a real external IP. **ExternalName** maps a Service name to an external DNS name, with no proxying at all — a pure DNS-level alias for something outside the cluster.

What is an Ingress, and how does it differ from a Service of type LoadBalancer?

An Ingress is a set of routing rules — based on hostname and URL path — that routes external HTTP/HTTPS traffic to different internal Services, all through a single entry point, rather than requiring a separate cloud load balancer per Service. It requires an Ingress controller (a piece of software, like NGINX Ingress or Traefik, running in the cluster) to actually implement the routing — the Ingress object itself is just the desired routing configuration, similarly to how a Deployment spec is just configuration until a controller acts on it.

How does service discovery/DNS work inside a Kubernetes cluster?

Kubernetes runs an internal DNS service (CoreDNS, in modern clusters) that automatically creates a DNS record for every Service, following the pattern `<service-name>.<namespace>.svc.cluster.local`. A Pod can reach a Service in its own namespace by its short name alone, or a Service in another namespace by a namespace-qualified or fully-qualified name. This means applications never need to hardcode IP addresses or discover peers through an external registry: DNS resolution, backed by Kubernetes's own cluster state, is the built-in service discovery mechanism.

What is a NetworkPolicy, and what's the default network behavior without one?

By default, every Pod in a Kubernetes cluster can reach every other Pod (and be reached by every other Pod) across the whole cluster, regardless of namespace; there's no network isolation unless you add it. A NetworkPolicy is a namespace-scoped set of rules, applied to the Pods matched by its label selector, that restricts which sources those Pods may receive traffic from (ingress) and which destinations they may send traffic to (egress). It only has any effect if the cluster's CNI plugin actually implements NetworkPolicy enforcement, since the object itself is just configuration.

What is a CNI plugin, and what role does it play in Kubernetes networking?

The Container Network Interface (CNI) is a standard plugin interface for configuring network connectivity for containers — Kubernetes itself doesn't implement pod-to-pod networking directly; it delegates that entirely to whichever CNI plugin is installed (Calico, Cilium, Flannel, and others). The CNI plugin is responsible for assigning Pod IP addresses and making sure every Pod can reach every other Pod across every node in the cluster, and, depending on the plugin, may also implement NetworkPolicy enforcement and other advanced networking features.

What is a headless Service, and when would you use one?

A headless Service (`clusterIP: None`) doesn't get a virtual IP or perform load balancing at all — instead, its DNS name resolves directly to the individual IP addresses of every backing Pod. It's used when a client needs to discover and connect to *specific individual Pods* rather than being load-balanced across an interchangeable group — most commonly paired with a StatefulSet, where each replica has a distinct identity and peers need to address a particular instance by name.

How does kube-proxy route traffic to Service backends?

kube-proxy runs on every node and watches the API server for Services and their current healthy backing Pods (via Endpoints/EndpointSlices), then programs that node's local networking rules so that traffic sent to a Service's virtual IP gets transparently redirected to one of the actual Pod IPs — historically via iptables rules (randomly selecting among backends), increasingly via IPVS (a more efficient in-kernel load balancer for large numbers of Services), or bypassed entirely by eBPF-based CNI plugins like Cilium that implement equivalent routing without kube-proxy at all.

What is an EndpointSlice, and why did it replace/augment Endpoints?

An EndpointSlice tracks the set of network endpoints (Pod IPs and ports) backing a Service, the same role the older Endpoints object played — but EndpointSlices split large backend lists across multiple smaller objects (each capped at 100 endpoints by default) instead of one single, unbounded object. This was introduced specifically to fix a scalability problem: a Service with thousands of backing Pods meant one enormous Endpoints object that had to be entirely rewritten and redistributed to every watching component on every single change, which became a genuine performance bottleneck at scale.

Services and Networking

Stable networking for ephemeral Pods — Services, Ingress, DNS, NetworkPolicies, and the CNI.

The problem: Pod IPs are not stable

Every Pod gets its own IP address when it starts — but that address is not durable. If a Pod crashes and is replaced, is rescheduled to a different node, or is part of a rolling update replacing it with a new version, the replacement gets a different IP address. Hardcoding a Pod's IP anywhere (in another application's configuration, in a load balancer) would break constantly as normal cluster operation replaced Pods.

Deployment "web" with 3 replicas might have Pod IPs:
  10.1.2.3, 10.1.2.4, 10.1.2.5   -- right now

After a rolling update or a node failure and rescheduling:
  10.1.3.7, 10.1.2.4, 10.1.4.1   -- two of the three addresses have changed, moments later

What a Service provides

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # matches Pods with this label
  ports:
    - port: 80           # the Service's own stable port
      targetPort: 8080   # the port the Pods actually listen on

A Service gets its own stable virtual IP (a ClusterIP) and DNS name (web.default.svc.cluster.local) that never changes for the Service's lifetime, regardless of how many times its backing Pods are replaced. Any other Pod in the cluster can reliably reach http://web (or http://web.default.svc.cluster.local from another namespace) and have traffic routed to one of the currently-healthy Pods matching the app: web label selector — without ever needing to know or track individual Pod IPs.

How the mapping stays current

The Service object itself does no watching. The Kubernetes control plane continuously watches for Pods matching the Service's selector and keeps an associated Endpoints (or EndpointSlice; see that question) object up to date with the current set of ready backing Pod IPs. kube-proxy on every node uses this list to program local networking rules (iptables/IPVS/eBPF, depending on configuration) that route traffic sent to the Service's virtual IP to one of the currently listed Pod IPs.
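
As a rough sketch of what sits behind the selector and the word "healthy", the Deployment below is a hypothetical counterpart to the web Service. The image name and the /healthz path are assumptions made for illustration. The Pod template's app: web label is what the Service's selector matches, and the readinessProbe decides whether a given Pod is listed as a ready endpoint.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web                               # the label the Service's selector matches
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0  # hypothetical image
          ports:
            - containerPort: 8080              # corresponds to the Service's targetPort
          readinessProbe:                      # a Pod failing this probe is not a ready endpoint
            httpGet:
              path: /healthz                   # hypothetical health endpoint
              port: 8080
            periodSeconds: 5
```

A Pod whose probe fails keeps running, but stops receiving Service traffic until the probe passes again.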

Why this is the foundational abstraction for nearly everything else

Deployments, StatefulSets, Ingress, and service meshes all build on top of the basic guarantee a Service provides: a stable way to address a set of Pods without caring about individual Pod identity or IP churn. Understanding "Services solve the problem of ephemeral Pod IPs by providing a stable, load-balanced front" is the conceptual anchor for the entire networking topic — every other networking object (Ingress routing to Services, NetworkPolicies restricting traffic to/from Pods a Service fronts, headless Services for StatefulSets) is a variation or extension of this same core need.

Related Resources

Kubernetes: Service

ClusterIP — internal only (the default)

apiVersion: v1
kind: Service
metadata:
  name: backend-api
spec:
  type: ClusterIP     # default; can be omitted
  selector:
    app: backend-api
  ports:
    - port: 80
      targetPort: 8080

Gets a virtual IP reachable only from inside the cluster. This is the right choice for the overwhelming majority of Services — internal microservice-to-microservice communication (a frontend calling a backend API, an API calling a database) almost never needs to be reachable from outside the cluster directly.

NodePort — reachable via any node's IP, on a static port

spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080     # opened on EVERY node's IP, in the 30000-32767 range by default

Every node in the cluster starts listening on nodePort and forwards traffic to the Service, regardless of whether that specific node is actually running any of the backing Pods. This makes the Service reachable via <any-node-ip>:30080 from outside the cluster — but it's a fairly low-level mechanism (you're responsible for load-balancing across nodes yourself, and the fixed port range is limited) rarely used directly in production; it's more commonly a building block that LoadBalancer Services are implemented on top of.

LoadBalancer — provisions a real external cloud load balancer

spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080

On a supported cloud provider (AWS, GCP, Azure), creating a LoadBalancer Service triggers the cloud provider's integration to provision an actual external load balancer (an AWS ELB/NLB, a GCP Load Balancer) that gets a real, internet-routable IP address and forwards traffic into the cluster (typically via NodePort under the hood). This is the standard way to expose a single Service directly to the internet — but provisioning one external load balancer per Service gets expensive and unwieldy at any real scale, which is exactly the problem Ingress (see that question) solves.

ExternalName — a pure DNS alias, no proxying

spec:
  type: ExternalName
  externalName: my-database.us-east-1.rds.amazonaws.com

Creates no virtual IP and does no traffic proxying at all — it's purely a DNS-level CNAME-like redirect. Any Pod resolving my-service.default.svc.cluster.local gets redirected, at the DNS layer, straight to my-database.us-east-1.rds.amazonaws.com. Useful for giving an external dependency (a managed cloud database, a third-party API) a consistent in-cluster name, so application configuration can refer to a stable internal name even if the actual external address changes later.
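
The fragment above shows only the spec. For the name my-service.default.svc.cluster.local to exist, the full object would presumably look something like the sketch below, where the name and namespace are inferred from that DNS name rather than given in the fragment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # inferred from the DNS name in the explanation
  namespace: default      # inferred likewise
spec:
  type: ExternalName
  externalName: my-database.us-east-1.rds.amazonaws.com
  # no selector and no ports: there are no Pods behind this Service to route to
```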

Choosing between them

Need	Service type
Internal service-to-service communication only	ClusterIP
Low-level external access via node IPs (rare in practice)	NodePort
A single Service exposed directly to the internet	LoadBalancer
Many services need to be exposed under one external IP/domain, with path/host-based routing	Ingress (fronting ClusterIP Services)
A stable internal name for an external dependency	ExternalName

In practice, most production clusters use ClusterIP for internal services and a small number of LoadBalancer Services (or, more commonly, a single one fronting an Ingress controller) rather than exposing many individual Services externally.

Related Resources

Kubernetes: Service Types

The problem with one LoadBalancer Service per application

A LoadBalancer Service provisions a dedicated external cloud load balancer for that one Service — fine for a single application, but a cluster hosting dozens of services, each needing external HTTP access, would need dozens of expensive cloud load balancers, each with its own IP, and no shared logic for routing by hostname or path.

What Ingress solves: one entry point, many backends, routed by rules

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api
                port:
                  number: 80
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 80

One Ingress (typically fronted by exactly one LoadBalancer Service, pointing at the Ingress controller itself) can route api.example.com to the backend-api Service and app.example.com to the frontend Service. Beyond host-based and path-based routing, an Ingress typically handles TLS termination and often other HTTP-layer features (URL rewriting, request/response header manipulation). A plain L4 LoadBalancer Service knows nothing about any of these, since it just forwards raw TCP/UDP traffic without any awareness of HTTP semantics.
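
The example above routes by host only. As a hedged sketch of path-based routing and TLS termination on a single host, an Ingress might look like the following. The host shop.example.com, the object name, and the Secret name are hypothetical, and the Secret is assumed to be of type kubernetes.io/tls and to already exist in the same namespace.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress                  # hypothetical name
spec:
  tls:
    - hosts:
        - shop.example.com
      secretName: shop-example-tls    # hypothetical Secret holding tls.crt / tls.key
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api                # /api and anything below it goes to the backend
            pathType: Prefix
            backend:
              service:
                name: backend-api
                port:
                  number: 80
          - path: /                   # everything else goes to the frontend
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 80
```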

The Ingress controller — the piece that actually does the work

Critically, creating an Ingress object by itself does nothing unless an Ingress controller is running in the cluster to watch for Ingress objects and implement their rules — much like how a Deployment spec sitting in etcd does nothing until the Deployment controller acts on it. Common Ingress controllers: NGINX Ingress Controller, Traefik, cloud-specific ones (AWS Load Balancer Controller, GCE Ingress). Different controllers support different annotations/features beyond the Ingress spec's baseline (rate limiting, custom load-balancing algorithms, WAF integration), so the choice of controller genuinely matters, not just the Ingress YAML itself.
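
One way to make the link between an Ingress and its controller explicit is the IngressClass object plus the ingressClassName field. The sketch below assumes the community ingress-nginx controller, which identifies itself as k8s.io/ingress-nginx. Other controllers use their own identifier, so treat the value as an assumption to check against your controller's documentation.

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: k8s.io/ingress-nginx    # assumed: the identifier used by ingress-nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress              # hypothetical name
spec:
  ingressClassName: nginx             # only the controller behind this class acts on the object
  defaultBackend:                     # catch-all backend, used here to keep the sketch short
    service:
      name: frontend
      port:
        number: 80
```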

LoadBalancer Service vs. Ingress — where each fits

	LoadBalancer Service	Ingress
Layer	L4 (TCP/UDP)	L7 (HTTP/HTTPS)
Routing granularity	Whole Service, one external LB each	Host/path-based routing to many Services through one entry point
Cost at scale (cloud)	One cloud LB per Service — expensive with many services	Typically one cloud LB total, fronting the Ingress controller
TLS termination	Possible, but more manual	Commonly built in via `tls` config referencing a Secret
Non-HTTP protocols (raw TCP, gRPC without HTTP/1.1 semantics, etc.)	Works fine — it's protocol-agnostic at L4	HTTP-focused; some controllers support gRPC/TCP passthrough with extensions, but it's not the core design target

The typical real-world setup

Most production clusters run one LoadBalancer Service (or a small number of them) pointed at an Ingress controller Deployment, and expose every other application Service purely as ClusterIP, reachable from outside the cluster only via Ingress rules. This minimizes external cloud load balancer cost and centralizes TLS/routing configuration in one place rather than scattering it across many individually exposed Services.
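
A minimal sketch of that single externally exposed Service follows. The namespace, the Service name, and the app: ingress-controller label are hypothetical; real controller installations ship their own manifests with their own names and labels.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller           # hypothetical
  namespace: ingress-system          # hypothetical
spec:
  type: LoadBalancer                 # the one cloud load balancer for the cluster
  selector:
    app: ingress-controller          # hypothetical label on the controller's Pods
  ports:
    - name: http                     # ports must be named when a Service has several
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```

Every application Service behind it stays type ClusterIP and is reached only through the controller.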

Gateway API — the newer alternative

Kubernetes's newer Gateway API is a more expressive, role-oriented successor to Ingress (separating cluster-operator-owned infrastructure config from application-team-owned routing rules, and supporting more protocols/features natively) — increasingly adopted alongside or instead of Ingress, though Ingress remains extremely widely deployed and is not being removed.
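
For comparison with the Ingress example, here is a hedged sketch of the application-team half of a Gateway API setup: an HTTPRoute that attaches to a Gateway owned by someone else. It assumes the Gateway API CRDs are installed and that a Gateway named shared-gateway (a hypothetical name) exists in the same namespace and accepts this route.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend-api-route            # hypothetical name
spec:
  parentRefs:
    - name: shared-gateway           # hypothetical Gateway, managed by the cluster operator
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: backend-api          # the same ClusterIP Service the Ingress example targets
          port: 80
```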

Related Resources

Kubernetes: Ingress

The DNS naming pattern

Every Service automatically gets a DNS record following a predictable pattern:

<service-name>.<namespace>.svc.cluster.local

# A Service named "backend-api" in namespace "production"
# is resolvable at:
backend-api.production.svc.cluster.local

From within the same namespace, the short name alone resolves correctly (http://backend-api), because a Pod's DNS search domains include its own namespace — this is why application configuration inside a cluster almost never needs a fully-qualified name, just the plain Service name, as long as the caller and the Service are in the same namespace. Calling across namespaces requires at least backend-api.production (namespace included), or the fully-qualified form.
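
In application configuration this usually shows up as nothing more than a hostname. The Pod below is an illustrative sketch: the staging namespace, the cache Service, the image, and the variable names are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  namespace: staging                            # hypothetical namespace
spec:
  containers:
    - name: frontend
      image: registry.example.com/frontend:1.0  # hypothetical image
      env:
        - name: CACHE_URL
          value: http://cache                   # short name: a "cache" Service in the same namespace
        - name: BACKEND_API_URL
          value: http://backend-api.production  # another namespace, so the namespace is included
```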

CoreDNS — the component that answers these queries

Modern Kubernetes clusters run CoreDNS (the successor to the older kube-dns) as a cluster add-on, typically itself a Deployment with a few replicas for availability, exposed via its own Service (usually named kube-dns for historical compatibility, even though it's running CoreDNS). Every Pod's /etc/resolv.conf is automatically configured (by the kubelet) to send DNS queries to CoreDNS's ClusterIP, with the appropriate search domains appended.

# Inside a Pod, this is what gets auto-configured:
cat /etc/resolv.conf
# nameserver 10.96.0.10          <- CoreDNS's Service ClusterIP
# search default.svc.cluster.local svc.cluster.local cluster.local

What gets a DNS record, and what doesn't

Every Service gets a DNS A/AAAA record resolving to its ClusterIP (or, for a headless Service, resolving directly to its backing Pods' individual IPs — see that question).
Pods themselves can optionally get individual DNS records too (if the Pod's hostname and subdomain are set and a matching headless Service exists; mainly relevant for StatefulSets, where individually addressing web-0 vs web-1 matters).
Ordinary Pods (not part of a headless-Service-backed StatefulSet) don't get an individually resolvable DNS name by default — you address them collectively, through their Service.

Why this is the "built-in service discovery" story

Rather than requiring applications to register themselves with, and query, a separate external service registry (like Consul, or a hand-rolled database of "which host runs which service"), Kubernetes uses its own control plane's already-authoritative knowledge of every Service and its endpoints to answer DNS queries directly and automatically. An application only needs to know one thing at deploy time — the Service's name — and DNS plus the Service abstraction together handle everything about which actual Pod IPs currently back it, with zero application-level service-registry code required.

A common practical gotcha

DNS resolution inside a Pod has a real (if usually small) latency cost, and some language runtimes' default DNS resolvers cache results in ways that don't always respect TTLs correctly, or don't retry properly against multiple nameserver entries — this occasionally causes subtle connectivity issues after a Service's backing Pods change, and is worth knowing as a troubleshooting angle when "everything looks fine in Kubernetes but the app still can't reach its dependency" comes up.

Related Resources

Kubernetes: DNS for Services and Pods

The default: flat, fully-open networking

Out of the box, Kubernetes's networking model guarantees every Pod can reach every other Pod's IP directly, cluster-wide, with no NAT and no default restriction — this "flat network" model is a deliberate simplicity choice in the base Kubernetes networking design, but it means a compromised or misbehaving Pod in one namespace can, by default, reach any Pod in any other namespace, including ones it has no legitimate business talking to.

Restricting traffic with a NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-from-frontend-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api          # this policy applies to Pods labeled app=backend-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # only allow traffic FROM pods labeled app=frontend
      ports:
        - protocol: TCP
          port: 8080

This says: Pods labeled app: backend-api in the production namespace only accept inbound traffic on port 8080, and only from Pods labeled app: frontend — traffic from any other Pod (including other Pods in the same namespace not labeled frontend, and every Pod in every other namespace) is rejected.
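
NetworkPolicies are additive, so a second policy can open a further path without touching the first. The sketch below additionally admits a metrics scraper from another namespace. The monitoring namespace and the app: prometheus label are hypothetical, and the kubernetes.io/metadata.name label is the one Kubernetes sets automatically on namespaces in recent versions.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-from-monitoring
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One list item carrying both selectors means AND:
        # Pods labeled app=prometheus that are also in the monitoring namespace.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # hypothetical namespace
          podSelector:
            matchLabels:
              app: prometheus                           # hypothetical label
      ports:
        - protocol: TCP
          port: 8080
```

Writing the two selectors as two separate list items would instead mean OR, which is a much wider rule.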

The default-deny pattern

An empty podSelector: {} with no ingress/egress rules matches all Pods in the namespace and, since no rules are specified, denies all traffic of that type — a common, deliberate first step for hardening a namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Applied alone, this blocks all traffic in/out of every Pod in the namespace — then additional, more permissive NetworkPolicies are layered on top to explicitly allow only the specific traffic patterns actually needed (frontend → backend, backend → database, everything → DNS). This "default deny, then explicitly allow" approach is the standard, security-recommended pattern rather than trying to enumerate every disallowed path from an otherwise-open default.
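
The "everything → DNS" allowance is the one most often forgotten, because a default-deny on egress also blocks name resolution. A sketch of that allow rule follows, assuming CoreDNS runs in kube-system with the conventional k8s-app: kube-dns label; both are assumptions to verify in your own cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}                    # applies to every Pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns      # assumed label on the CoreDNS Pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP              # DNS falls back to TCP for large responses
          port: 53
```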

The critical caveat: NetworkPolicy needs CNI support

A NetworkPolicy object is just declarative configuration — like any Kubernetes object, it does nothing on its own unless something in the cluster actually enforces it. NetworkPolicy enforcement depends entirely on the cluster's CNI plugin (see that question) supporting it — Calico, Cilium, and several others implement NetworkPolicy enforcement; some simpler CNI plugins (certain Flannel configurations, in particular) do not, meaning NetworkPolicy objects you create are silently accepted by the API server but have zero actual effect on traffic. Verifying your specific CNI plugin actually enforces NetworkPolicies is an essential, easy-to-overlook step — a false sense of security from an unenforced NetworkPolicy is worse than no policy at all, since it looks secured but isn't.

For any cluster running genuinely multi-tenant workloads, or handling sensitive data, treat NetworkPolicies (backed by a CNI plugin that actually enforces them) as a baseline security control, not an optional extra — combined with a default-deny starting posture per namespace and explicit allow rules for legitimate traffic paths only.

Related Resources

Kubernetes: Network Policies

What Kubernetes requires, but doesn't itself implement

Kubernetes's networking model has a small number of hard requirements: every Pod gets its own IP, every Pod can reach every other Pod's IP without NAT (across the whole cluster, not just the local node), and a Pod sees the same IP for itself that other Pods use to reach it. Kubernetes itself has no built-in implementation of these requirements — it delegates entirely to whatever CNI plugin is installed, via the standard CNI interface (similar in spirit to how CRI standardizes the kubelet-to-runtime relationship — see that question).

kubelet, when starting a Pod:
   → calls the configured CNI plugin
   → CNI plugin assigns the Pod an IP, sets up its network namespace,
     configures routes so it can reach other Pods cluster-wide

Common CNI plugins and their approaches

Flannel — one of the simplest, focused primarily on providing basic cluster-wide Pod networking (often via a VXLAN overlay network); historically didn't support NetworkPolicy enforcement on its own.
Calico — supports both a simpler routed (non-overlay) networking mode and NetworkPolicy enforcement; widely used specifically for clusters that need real network policy security.
Cilium — built on eBPF (running programs directly in the Linux kernel rather than relying on iptables), offering high-performance networking, NetworkPolicy enforcement, and deep L7-aware observability/security features; increasingly popular for clusters wanting both performance and rich policy capability.
AWS VPC CNI / Azure CNI / GCP's native networking — cloud-provider-specific CNI plugins that assign Pods IP addresses directly from the cloud's own VPC address space, integrating Kubernetes networking more tightly with the cloud's native networking and security groups.

Why the choice of CNI plugin genuinely matters

Beyond basic connectivity, the CNI plugin determines: whether NetworkPolicies are enforced at all (see that question), the underlying networking approach's performance characteristics (overlay/VXLAN vs. direct routing vs. eBPF), and whether advanced features like network observability, L7-aware policies, or multi-cluster networking are available. This is a foundational infrastructure decision typically made once when standing up a cluster, and switching CNI plugins on a live cluster is a genuinely disruptive operation — not something changed casually.

How this relates to Services and NetworkPolicies

Once the CNI plugin has established basic Pod-to-Pod IP connectivity across the cluster, Services (implemented via kube-proxy, layered on top of the CNI's basic connectivity) provide stable virtual IPs and load balancing, and NetworkPolicies (enforced by the CNI plugin itself, if it supports it) restrict which of those now-possible connections are actually allowed. All three layers work together: CNI provides the underlying "can any Pod technically reach any other Pod," Services provide "a stable way to address a group of Pods," and NetworkPolicies provide "which of these technically-possible connections are actually permitted."

Related Resources

Kubernetes: Cluster Networking

Regular Service vs. headless Service

A regular (non-headless) Service's DNS name resolves to one virtual ClusterIP, which kube-proxy then load-balances across the backing Pods — from a client's perspective, there's one address, and which actual Pod receives any given connection is essentially arbitrary.

apiVersion: v1
kind: Service
metadata:
  name: web-headless
spec:
  clusterIP: None      # <-- this is what makes it headless
  selector:
    app: web
  ports:
    - port: 80

A headless Service (clusterIP: None) skips the virtual IP and load-balancing entirely. Instead, DNS resolves the Service's name directly to the backing Pods' individual IP addresses (one A/AAAA record per Pod), and it's left to the client to decide which one to connect to (or to connect to all of them, as appropriate for the use case).

The critical pairing with StatefulSets

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "web-headless"   # references the headless Service
  ...

When a headless Service fronts a StatefulSet, each Pod additionally gets its own individually resolvable DNS name, following the pattern <pod-name>.<service-name>.<namespace>.svc.cluster.local:

web-0.web-headless.default.svc.cluster.local  -> resolves to web-0's specific IP
web-1.web-headless.default.svc.cluster.local  -> resolves to web-1's specific IP
web-2.web-headless.default.svc.cluster.local  -> resolves to web-2's specific IP

This is essential for stateful, identity-aware applications: a database's replication logic might need to specifically connect to web-0 (the designated primary) rather than an arbitrary, load-balanced member of the set — something a regular Service's single load-balanced virtual IP has no way to express, since it deliberately hides which specific Pod you're reaching.
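
To make the pairing concrete, here is one possible complete StatefulSet for the web-headless Service. It is an illustrative sketch and not the elided manifest above: the image, the replica count, and the PRIMARY_HOST variable are hypothetical, and the default namespace is assumed.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-headless          # must name the headless Service
  replicas: 3                        # yields Pods web-0, web-1, web-2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web                     # matched by the headless Service's selector
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0   # hypothetical image
          ports:
            - containerPort: 80
          env:
            - name: PRIMARY_HOST     # hypothetical setting read by the application
              value: web-0.web-headless.default.svc.cluster.local
```

Every replica receives the same PRIMARY_HOST value, and because web-0's name is stable across restarts, that value stays valid even though web-0's IP does not.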

Other use cases beyond StatefulSets

Client-side load balancing — some applications/libraries prefer to receive the full list of backend IPs themselves and implement their own load-balancing or connection-pooling logic (common in some gRPC setups, or applications using a smart client library), rather than relying on kube-proxy's L4 load balancing.
Peer discovery for clustered/distributed applications — a distributed system's own membership/gossip protocol (like Cassandra's or Elasticsearch's) often wants to discover all peer IPs directly to manage its own clustering logic, rather than going through a load-balanced single address.

Use a regular (non-headless) Service by default for typical stateless client-server communication where load balancing across interchangeable replicas is exactly what you want. Reach for a headless Service specifically when a client needs individual Pod addressability or the full peer list — almost always in combination with a StatefulSet, or for applications implementing their own client-side connection logic.

Related Resources

Kubernetes: Headless Services

The core job: translate "Service virtual IP" into "one specific Pod IP"

A Service's ClusterIP is not a real, routable address assigned to any actual network interface. It's a virtual IP that only means something because kube-proxy on every node has programmed that node's kernel to intercept traffic sent to it and redirect that traffic to one of the Service's actual backing Pods. kube-proxy watches the API server for Service and EndpointSlice changes and continuously updates each node's local networking configuration to reflect the current set of healthy backends.

iptables mode (the long-standing default)

kube-proxy writes a chain of iptables rules (using the Linux kernel's netfilter framework) that match packets destined for a Service's ClusterIP:port and probabilistically redirect them (via DNAT) to one of the currently healthy backing Pod IP:ports, roughly at random.

Packet destined for Service ClusterIP 10.96.0.5:80
   → iptables rule matches, picks one of the 3 backing Pod IPs (at random, each equally likely)
   → DNAT rewrites the destination to the chosen Pod's actual IP:port
   → packet continues on to that Pod

Limitation at scale: iptables rule evaluation is roughly linear in the number of rules — with thousands of Services, the sheer number of rules that must be checked for every packet can become a measurable performance bottleneck, and rule updates (whenever any Service's endpoints change) get progressively slower to apply as the ruleset grows.

IPVS mode — built for larger scale

IPVS (IP Virtual Server, a Linux kernel-level load balancer) uses hash-table-based lookups instead of a linear rule chain, giving effectively O(1) backend selection regardless of how many Services exist, and supports several actual load-balancing algorithms (round robin, least connection, etc.) rather than iptables's simpler random selection. Clusters with a very large number of Services (common in large multi-tenant or microservice-heavy environments) typically switch kube-proxy to IPVS mode specifically for this scaling advantage.
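
Switching modes is a kube-proxy configuration change. A minimal sketch of the relevant part of its configuration file follows. How the file reaches kube-proxy depends on how the cluster was installed (often a ConfigMap in kube-system), and the nodes are assumed to have the IPVS kernel modules available.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"           # use IPVS instead of the iptables backend
ipvs:
  scheduler: "lc"      # least connection; "rr" (round robin) is used when this is left unset
```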

Bypassing kube-proxy entirely: eBPF-based CNI plugins

Some CNI plugins, most notably Cilium, can replace kube-proxy's functionality entirely using eBPF programs running directly in the kernel — achieving the same Service-routing behavior with lower latency and overhead than either iptables or IPVS, and often adding richer observability into the bargain. This is an increasingly common production configuration, though it changes the operational model somewhat (you're relying on the CNI plugin, not the separate kube-proxy component, for this critical routing function).

Why understanding this mechanism matters practically

When debugging "a Service exists and has healthy endpoints, but traffic still isn't reaching Pods," understanding that kube-proxy is the component actually translating the virtual IP into real routing rules (rather than something magical happening at the Service object level) tells you where to look: is kube-proxy running and healthy on the relevant nodes, are its iptables/IPVS rules actually present and correct (iptables-save | grep <service-name>), and does the EndpointSlice actually list the expected healthy Pod IPs in the first place.

Related Resources

Kubernetes: kube-proxy

The role both objects play

Both Endpoints and EndpointSlices exist to answer the same question: "which Pod IP:port combinations are currently the healthy backends for this Service?" — this is the data kube-proxy (and other consumers, like a service mesh's control plane) actually watches and reacts to when programming routing rules.

# The older Endpoints object -- one object per Service, unbounded size
apiVersion: v1
kind: Endpoints
metadata:
  name: web           # matches the Service name
subsets:
  - addresses:
      - ip: 10.1.2.3
      - ip: 10.1.2.4
      # ... every single backing Pod IP, in ONE list, in ONE object
    ports:
      - port: 8080

The scaling problem this caused

For a Service backed by a very large number of Pods (thousands, in large clusters), every single one of those Pod IPs lived inside one Endpoints object. Any single change — one Pod becoming unready, one Pod being replaced during a rollout — required the API server to serialize and transmit the entire updated list (potentially tens of thousands of IP entries) to every component watching that object. This scaled poorly: both the size of each update and the number of components needing to process it grew with cluster size, making large-scale rollouts and node churn noticeably more expensive on the control plane.

EndpointSlices — the same data, sharded into smaller pieces

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-abc123        # one of potentially several slices for the "web" Service
  labels:
    kubernetes.io/service-name: web    # links this slice back to its Service
addressType: IPv4
endpoints:
  - addresses: ["10.1.2.3"]
    conditions:
      ready: true
  - addresses: ["10.1.2.4"]
    conditions:
      ready: true
ports:
  - port: 8080

Instead of one unbounded object, a Service's full backend list is split across multiple EndpointSlice objects, each capped at a configurable maximum (100 endpoints by default). When one Pod's readiness changes, only the one slice containing that Pod needs to be updated and redistributed — not the entire backend list for the whole Service — which is a significant, targeted fix for the update-cost-at-scale problem.

Additional improvements EndpointSlices brought along

Beyond sharding, EndpointSlices also natively support dual-stack (IPv4 and IPv6 simultaneously, via separate slices per address type) and carry richer per-endpoint information (like topology hints, used for topology-aware routing that prefers keeping traffic within the same zone/region for latency and cost reasons) that the older Endpoints object's simpler structure didn't accommodate well.
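
As an illustration of that richer per-endpoint data, a slice for the web Service might carry the fields below. The node and zone names are made up, and hints are only populated when topology-aware routing is enabled for the Service.

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-abc123
  labels:
    kubernetes.io/service-name: web
addressType: IPv4                 # an IPv6 backend list would live in a separate slice
endpoints:
  - addresses: ["10.1.2.3"]
    conditions:
      ready: true
    nodeName: node-a              # hypothetical node hosting this Pod
    zone: us-east-1a              # hypothetical zone of that node
    hints:
      forZones:
        - name: us-east-1a        # consumers in this zone should prefer this endpoint
ports:
  - port: 8080
```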

Practical relevance today

Endpoints objects still exist (for backward compatibility with older tooling that reads them directly) and are still automatically kept in sync alongside EndpointSlices for any Service, but EndpointSlices are what modern kube-proxy and other Service-consuming components actually watch and rely on. When debugging Service connectivity at scale, checking kubectl get endpointslices -l kubernetes.io/service-name=<service> (rather than the older kubectl get endpoints) is the more scalable and increasingly the more idiomatic diagnostic path.

Related Resources

Kubernetes: EndpointSlices