Explain the CAP theorem and what it means for distributed databases

7 min · advanced · cap-theorem · distributed-systems · nosql

Quick Answer

The CAP theorem states that a distributed data system can guarantee at most two of three properties at once: **Consistency** (every read sees the most recent write), **Availability** (every request gets a response, though not necessarily the latest data), and **Partition tolerance** (the system keeps working despite network failures splitting it into isolated groups). The trade-off only bites during a network partition. Since partitions are unavoidable in any real distributed system, the practical choice is between **CP** (consistent, but may refuse requests during a partition) and **AP** (available, but may return stale data during a partition).

Detailed Answer

The three properties

  • Consistency (C): every read returns the most recent write, no matter which node serves it, so all nodes appear to hold the same data at the same time.
  • Availability (A): every request receives a non-error response, even if some nodes can't communicate with each other; the response is not guaranteed to reflect the latest write.
  • Partition tolerance (P): the system continues operating even when network communication between nodes is disrupted (a "partition", where some nodes can't reach others).

Why it's really "pick 2 of 3" only during a partition

In a system with no network partitions, you can have both C and A simultaneously; the theorem only bites during an actual partition event. Partitions are a real, unavoidable fact of distributed systems (networks fail, nodes get cut off, packets get dropped), so partition tolerance isn't optional for any system that's genuinely distributed across multiple nodes.

The real-world choice therefore collapses to CP vs. AP. When a partition happens, do you sacrifice consistency (keep serving requests, possibly with stale data) or sacrifice availability (refuse requests from the cut-off nodes until the partition heals, to guarantee consistency)?

CP systems — consistency over availability during a partition

When a partition occurs, a CP system will refuse to serve (or will block) requests on the minority/cut-off side rather than risk returning stale or conflicting data.

Examples: traditional relational databases in a synchronous-replication configuration, HBase, MongoDB (in its default configuration, favoring consistency via a single primary that must be reachable for writes), ZooKeeper/etcd (consensus-based coordination systems).
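
The CP behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any of the listed systems is implemented: `CPNode`, `Unavailable`, and `reachable_peers` are hypothetical names, and replication to peers is left out so that only the "refuse without a majority" rule is visible.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a node cannot reach a majority and refuses the request."""


@dataclass
class CPNode:
    name: str
    cluster_size: int
    # Peers this node can currently contact (hypothetical bookkeeping).
    reachable_peers: set = field(default_factory=set)
    data: dict = field(default_factory=dict)

    def has_majority(self) -> bool:
        # Count this node plus every peer it can still reach.
        return len(self.reachable_peers) + 1 > self.cluster_size // 2

    def write(self, key, value):
        if not self.has_majority():
            raise Unavailable(f"{self.name}: no majority, refusing write")
        # Replication to peers is omitted in this sketch.
        self.data[key] = value

    def read(self, key):
        if not self.has_majority():
            raise Unavailable(f"{self.name}: no majority, refusing read")
        return self.data.get(key)


if __name__ == "__main__":
    # A 3-node cluster split into {a, b} and {c}.
    a = CPNode("a", cluster_size=3, reachable_peers={"b"})
    c = CPNode("c", cluster_size=3, reachable_peers=set())
    a.write("balance", 100)      # majority side: accepted
    try:
        c.read("balance")        # minority side: refused
    except Unavailable as err:
        print(err)
```

The minority node gives up availability: it returns an error instead of an answer it cannot prove is current.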

AP systems — availability over consistency during a partition

When a partition occurs, an AP system keeps accepting reads and writes on both sides of the partition, tolerating that the two sides may temporarily disagree. It reconciles the divergence once the partition heals (see the eventual consistency question).

Examples: Cassandra, DynamoDB (in its default/eventually-consistent read mode), CouchDB.
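
A minimal sketch of the AP behavior above, again as a toy model. The text does not say how divergence is reconciled, so this example assumes last-write-wins using a caller-supplied logical timestamp; that is only one possible strategy, and `APNode` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class APNode:
    name: str
    # key -> (timestamp, value); the timestamp is used only when reconciling.
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Always accept the write locally, even if peers are unreachable.
        self.data[key] = (timestamp, value)

    def read(self, key):
        # Always answer from local state; the value may be stale.
        entry = self.data.get(key)
        return entry[1] if entry else None


def reconcile(left: APNode, right: APNode) -> None:
    """After the partition heals, keep the newest version of each key on both nodes."""
    for key in set(left.data) | set(right.data):
        candidates = [n.data[key] for n in (left, right) if key in n.data]
        # Assumed policy: last write wins, judged by timestamp.
        winner = max(candidates, key=lambda entry: entry[0])
        left.data[key] = winner
        right.data[key] = winner


if __name__ == "__main__":
    a, b = APNode("a"), APNode("b")
    # During the partition both sides accept writes and disagree.
    a.write("likes", 10, timestamp=1)
    b.write("likes", 12, timestamp=2)
    print(a.read("likes"), b.read("likes"))
    # Once the partition heals, the two sides converge.
    reconcile(a, b)
    print(a.read("likes"), b.read("likes"))
```

Neither node ever returns an error, but between the writes and the call to `reconcile` a reader on node `a` sees a value that node `b` has already superseded.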

Why this isn't really about "SQL vs NoSQL"

CAP is a property of a distributed system's design choices, not an inherent property of "relational" vs. "NoSQL". A single-node relational database isn't meaningfully subject to CAP at all (there's nothing to partition), but a multi-region relational deployment absolutely is, and some NoSQL systems (like MongoDB by default) lean CP rather than AP. Many modern databases also let you tune this per operation (e.g., Cassandra's per-query consistency levels, DynamoDB's strongly-consistent vs. eventually-consistent reads) rather than making a single fixed system-wide choice.
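
One way to reason about per-operation tuning is the quorum-overlap rule often cited for Dynamo-style replication: with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica when R + W > N. The rule is not stated in the text above and real systems add caveats (concurrent writes, relaxed quorums), so treat this as an illustrative sketch; `quorums_overlap` is a hypothetical helper, not an API of any database.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must intersect every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If R + W > N, any R replicas and any W replicas share at least one member.
    return r + w > n


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (2, 2), (1, 3)]:
        label = "overlap guaranteed" if quorums_overlap(n, r, w) else "stale reads possible"
        print(f"N={n} R={r} W={w}: {label}")
```

Small R and W favor availability and latency, since fewer replicas must respond. Larger values lean toward consistency, at the cost of failing requests when too few replicas are reachable.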

A strong answer doesn't just recite "Consistency, Availability, Partition tolerance". It explains why the real tradeoff is CP vs. AP (since P is non-negotiable for a genuinely distributed system), and it names what a specific real system chooses and why that fits its use case. For example, a banking ledger favors CP because stale balance reads are unacceptable, while a social media "like counter" favors AP because a few seconds of staleness is an acceptable price for never showing the user an error.
