What is two-phase commit, and when is it needed?
Quick Answer
Two-phase commit (2PC) is a protocol for atomically committing a transaction that spans multiple independent databases/resources: a coordinator first asks every participant to "prepare" (durably promise it can commit), and only after all participants confirm does it tell everyone to actually commit. It's needed for distributed transactions across separate database instances or heterogeneous systems (e.g., a database and a message queue) where a single engine's normal atomicity guarantee doesn't span both.
Detailed Answer
The problem: atomicity across multiple systems
A normal BEGIN...COMMIT transaction is atomic within one database. But suppose you need to atomically update two separate databases (e.g., debit an account in DB-A and credit an account in DB-B, where A and B are physically separate database servers) — a plain commit on each individually can't guarantee both succeed or both fail together; one could commit while the other crashes or fails.
The two phases
Phase 1 — Prepare: The coordinator asks every participant to do everything needed to commit (validate constraints, write to its own durable log) and reply "yes, I can commit" or "no, I can't" — but without actually making the change visible/permanent yet.
Coordinator -> Participant A: PREPARE
Coordinator -> Participant B: PREPARE
Participant A -> Coordinator: READY (durably logged, could commit or abort from here)
Participant B -> Coordinator: READY
Phase 2 — Commit (or Abort): Only if every participant replied "yes" does the coordinator tell everyone to actually commit. If even one participant said "no" (or timed out), the coordinator tells everyone to abort instead.
Coordinator -> Participant A: COMMIT
Coordinator -> Participant B: COMMIT
Participant A -> Coordinator: COMMITTED
Participant B -> Coordinator: COMMITTED
Because every participant durably logged "I'm ready to commit" before replying yes in phase 1, even a crash between phase 1 and phase 2 is recoverable — on restart, a participant can check with the coordinator (or replay its own log) to find out whether the overall transaction ultimately committed or aborted, and finish accordingly.
When it's actually needed
- Genuine distributed transactions across separate database engines/instances — e.g., updating two different PostgreSQL clusters, or a database plus a JMS message queue, as a single atomic unit (
XA transactionsimplement this pattern in many enterprise stacks). - Rare in modern web-scale architectures, because 2PC has real costs: it's blocking (participants hold locks/resources while waiting through both phases), and the coordinator is a single point of failure/bottleneck during the protocol.
Why most modern systems avoid it
Distributed systems at scale generally prefer eventual consistency with compensating actions (the Saga pattern: a sequence of local transactions, each with a corresponding "undo" action if a later step fails) or idempotent, retryable operations with outbox patterns (write the "intent to do X" in the same local transaction as the primary change, then a background process reliably delivers it) rather than 2PC, because these avoid 2PC's blocking behavior and single coordinator bottleneck, at the cost of only eventual (not immediate) cross-system consistency.
2PC is worth knowing conceptually — it's the classical answer to "how do you get atomicity across two databases" — but a strong answer also mentions why modern distributed architectures usually avoid it in favor of sagas/outbox patterns, since that shows awareness of its real operational costs, not just the protocol's mechanics.