What is two-phase commit, and when is it needed?

Detailed Answer

The problem: atomicity across multiple systems

A normal BEGIN...COMMIT transaction is atomic within one database. Now suppose you need to atomically update two separate databases, for example debiting an account in DB-A and crediting an account in DB-B, where A and B are physically separate database servers. Committing on each one individually cannot guarantee that both succeed or both fail together: one could commit while the other crashes or fails. Two-phase commit (2PC) addresses this by having a coordinator drive every participant to the same commit-or-abort decision.
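The failure mode can be reproduced in a few lines. This is only a sketch: two in-memory SQLite databases stand in for DB-A and DB-B, and the names `make_db`, `get_balance`, and `naive_transfer`, along with the simulated `ConnectionError`, are assumptions made for illustration.

```python
import sqlite3

def make_db(initial):
    """Create an in-memory database holding a single account row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO account VALUES (1, ?)", (initial,))
    db.commit()
    return db

def get_balance(db):
    return db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

db_a = make_db(100)  # stands in for DB-A
db_b = make_db(50)   # stands in for DB-B

def naive_transfer(amount, fail_b=False):
    """Commit on each database independently, with no coordination."""
    db_a.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
    db_a.commit()  # DB-A's debit is now permanent
    try:
        if fail_b:
            # Simulated outage between the two commits.
            raise ConnectionError("DB-B became unreachable")
        db_b.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (amount,))
        db_b.commit()
    except ConnectionError:
        # DB-B can roll back its own work, but it cannot undo DB-A's commit.
        db_b.rollback()

naive_transfer(30, fail_b=True)
total = get_balance(db_a) + get_balance(db_b)
print(get_balance(db_a), get_balance(db_b), total)
```

After the simulated failure, the debit has been applied but the credit has not, so the combined balance no longer matches the starting total of 150.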

The two phases

Phase 1 — Prepare: The coordinator asks every participant to do everything needed to commit (validate constraints, write to its own durable log) and reply "yes, I can commit" or "no, I can't" — but without actually making the change visible/permanent yet.

Coordinator -> Participant A: PREPARE
Coordinator -> Participant B: PREPARE
Participant A -> Coordinator: READY (durably logged, could commit or abort from here)
Participant B -> Coordinator: READY

Phase 2 — Commit (or Abort): Only if every participant replied "yes" does the coordinator tell everyone to actually commit. If even one participant said "no" (or timed out), the coordinator tells everyone to abort instead.

Coordinator -> Participant A: COMMIT
Coordinator -> Participant B: COMMIT
Participant A -> Coordinator: COMMITTED
Participant B -> Coordinator: COMMITTED

Because every participant durably logged "I'm ready to commit" before replying yes in phase 1, a crash between phase 1 and phase 2 is recoverable. On restart, the participant replays its own log, finds the transaction in the prepared state, and asks the coordinator whether the overall transaction ultimately committed or aborted, then finishes accordingly. Its own log alone cannot answer that question, which is why the coordinator must also durably record its decision before announcing it.
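The message flow above can be sketched as a small in-memory simulation. Everything here is hypothetical: the `Participant` and `Coordinator` classes, the `can_commit` flag, and the Python lists standing in for durable logs. Timeouts, crashes, and recovery are deliberately left out.

```python
from enum import Enum

class Vote(Enum):
    READY = "ready"
    NO = "no"

class Participant:
    """Hypothetical participant; `log` stands in for a durable log."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "active"

    def prepare(self):
        if not self.can_commit:
            self.log.append("ABORT")
            self.state = "aborted"
            return Vote.NO
        self.log.append("PREPARED")  # logged before replying READY
        self.state = "prepared"
        return Vote.READY

    def commit(self):
        self.log.append("COMMIT")
        self.state = "committed"

    def abort(self):
        if self.state != "aborted":  # a NO voter has already aborted locally
            self.log.append("ABORT")
            self.state = "aborted"

class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self):
        # Phase 1: collect a vote from every participant.
        votes = [p.prepare() for p in self.participants]
        decision = "COMMIT" if all(v is Vote.READY for v in votes) else "ABORT"
        self.log.append(decision)  # decision recorded before it is announced
        # Phase 2: deliver the same decision to everyone.
        for p in self.participants:
            if decision == "COMMIT":
                p.commit()
            else:
                p.abort()
        return decision

a, b = Participant("A"), Participant("B")
outcome_ok = Coordinator([a, b]).run()

c, d = Participant("A"), Participant("B", can_commit=False)
outcome_fail = Coordinator([c, d]).run()
print(outcome_ok, outcome_fail)
```

In the second run, participant A had already prepared when B voted no, so A's log shows a prepared entry followed by an abort. That is the "could commit or abort from here" state in the trace above.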

When it's actually needed

Genuine distributed transactions across separate database engines/instances — e.g., updating two different PostgreSQL clusters, or a database plus a JMS message queue, as a single atomic unit (XA transactions implement this pattern in many enterprise stacks).
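Rare in modern web-scale architectures, because 2PC has real costs. It is blocking: participants hold locks and other resources from the moment they vote yes until they learn the outcome. The coordinator is also a bottleneck and a single point of failure during the protocol: if it crashes after participants have prepared, they stay in doubt, still holding those locks, until it recovers.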

Why most modern systems avoid it

Distributed systems at scale generally prefer one of two alternatives to 2PC:

Eventual consistency with compensating actions (the Saga pattern): a sequence of local transactions, each with a corresponding "undo" action that runs if a later step fails.
Idempotent, retryable operations with outbox patterns: write the "intent to do X" in the same local transaction as the primary change, then let a background process reliably deliver it.

Both avoid 2PC's blocking behavior and single coordinator bottleneck, at the cost of only eventual (not immediate) cross-system consistency.
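As a rough illustration of the Saga idea, the sketch below runs a list of local steps and, when one fails, calls the compensations of the steps that already completed, in reverse order. The `run_saga` helper, the `accounts` dictionary, and the step functions are all hypothetical; a real saga would also persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, undo the already completed steps in reverse
    order and report failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

# Plain dictionary standing in for balances held in two separate systems.
accounts = {"A": 100, "B": 50}

def debit_a():
    accounts["A"] -= 30

def refund_a():
    accounts["A"] += 30  # compensating action for debit_a

def credit_b_fails():
    raise RuntimeError("DB-B rejected the credit")  # simulated failure

def nothing_to_undo():
    pass

succeeded = run_saga([(debit_a, refund_a), (credit_b_fails, nothing_to_undo)])
print(succeeded, accounts)
```

Between the debit and the refund, other readers can observe the intermediate balance. That window is the "eventual, not immediate" consistency the trade-off refers to.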

2PC is worth knowing conceptually — it's the classical answer to "how do you get atomicity across two databases" — but a strong answer also mentions why modern distributed architectures usually avoid it in favor of sagas/outbox patterns, since that shows awareness of its real operational costs, not just the protocol's mechanics.
