What is database replication, and what's the difference between synchronous and asynchronous replication?
Quick Answer
Replication maintains copies of the same data across multiple database servers, typically a primary (accepts writes) and one or more replicas (receive copies of those writes). In **synchronous** replication, the primary waits for at least one replica to confirm it received the write before acknowledging success to the client: safer, because no acknowledged write is lost on failover to that replica, but with higher write latency. In **asynchronous** replication, the primary acknowledges the write immediately and replicates in the background: lower latency, but a crash before replication completes can lose the most recent write(s).
Detailed Answer
The basic topology

                Writes
                   |
                   v
              [ Primary ]
             /     |     \
         v         v         v
    [Replica1][Replica2][Replica3]   <- receive copies of every write
Replicas apply the same stream of changes the primary made, often by receiving and replaying the primary's write-ahead log (see that question), so their data converges to match the primary's after some delay (replication lag).
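The convergence described above can be sketched with a toy change log. This is a minimal illustration, not any real database's mechanism: `Node`, `apply_from`, and the in-memory `log` list are hypothetical names, and the "log" is just a list of key/value changes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical database node: key-value data plus how far into the log it has applied."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply_from(self, log: list) -> None:
        # Replay every entry this node has not applied yet, in log order.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


log = []            # stand-in for the primary's stream of changes
primary = Node()
replica = Node()

# The primary records each change in the log and applies it right away.
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    log.append((key, value))
    primary.apply_from(log)

lag_before = primary.applied - replica.applied  # replica has not caught up yet
replica.apply_from(log)                         # replica replays the same stream
converged = replica.data == primary.data

print(lag_before, converged)
```

Because the replica replays the same entries in the same order, it ends in the same state as the primary; the only difference is when it gets there.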
Asynchronous replication
Client -> Primary: write X
Primary -> Client: "success" (acknowledged immediately)
Primary -> Replicas: ships the change (happens after acknowledging the client)
The primary doesn't wait for any replica to confirm receipt before telling the client the write succeeded.
Pros: lowest possible write latency, since the client isn't waiting on network round-trips to remote replicas.
Cons: if the primary crashes after acknowledging the client but before any replica has received the change, promoting a replica to primary loses that write, because the replica never had it.
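The loss window can be made concrete with a small simulation. This is a sketch under simplifying assumptions: `Replica`, `AsyncPrimary`, and `ship_pending` are hypothetical names, replication is a manual function call rather than a background process, and the "crash" is simply never shipping the last write.

```python
class Replica:
    """Hypothetical replica that stores whatever entries it is sent."""

    def __init__(self):
        self.log = []

    def receive(self, entry):
        self.log.append(entry)


class AsyncPrimary:
    """Hypothetical primary that acknowledges writes before replicating them."""

    def __init__(self, replicas):
        self.log = []
        self.replicas = replicas
        self.pending = []  # written locally, not yet shipped to replicas

    def write(self, entry):
        self.log.append(entry)
        self.pending.append(entry)
        return "success"  # acknowledged before any replica has the entry

    def ship_pending(self):
        # Stand-in for the background replication step.
        for entry in self.pending:
            for replica in self.replicas:
                replica.receive(entry)
        self.pending.clear()


replica = Replica()
primary = AsyncPrimary([replica])

primary.write("x=1")
primary.ship_pending()        # first write reaches the replica
ack = primary.write("x=2")    # acknowledged, then the primary "crashes" before shipping

# Failover: the replica is promoted. Anything it never received is gone.
lost = [entry for entry in primary.log if entry not in replica.log]
print(ack, lost)
```

The client was told `success` for `x=2`, yet the promoted replica only has `x=1`, which is exactly the gap asynchronous replication accepts in exchange for lower latency.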
Synchronous replication
Client -> Primary: write X
Primary -> Replica: ships the change
Replica -> Primary: "received" (stored durably, or also applied, depending on configuration)
Primary -> Client: "success" (only now, after replica confirmation)
Pros: zero data loss on failover. By the time the client is told "success," at least one replica has the data too, so promoting that replica loses nothing.
Cons: meaningfully higher write latency. Every write waits on a network round-trip to the replica, and if the replica is slow or unreachable, writes stall or fail depending on configuration. This cost is paid on every single write, not just during a failure.
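The same kind of toy model shows the synchronous trade-off. This is a hedged sketch, not a real protocol: `Replica`, `SyncPrimary`, and the `reachable` flag are hypothetical, and an unreachable replica makes the write fail outright, whereas a real system might block, time out, or fall back to asynchronous mode depending on configuration.

```python
class Replica:
    """Hypothetical replica that confirms receipt only while reachable."""

    def __init__(self):
        self.log = []
        self.reachable = True

    def receive(self, entry):
        if not self.reachable:
            return False  # no confirmation comes back
        self.log.append(entry)
        return True


class SyncPrimary:
    """Hypothetical primary that acknowledges only after the replica confirms."""

    def __init__(self, replica):
        self.log = []
        self.replica = replica

    def write(self, entry):
        self.log.append(entry)
        if not self.replica.receive(entry):
            # Simplification: report failure. The entry stays in the local log
            # but is never acknowledged to the client.
            return "failed"
        return "success"


replica = Replica()
primary = SyncPrimary(replica)

acked = []
for entry in ["x=1", "x=2"]:
    if primary.write(entry) == "success":
        acked.append(entry)

replica.reachable = False
result_when_down = primary.write("x=3")  # the replica cannot confirm

# Failover: every write the client was told succeeded is on the replica.
lost_on_failover = [entry for entry in acked if entry not in replica.log]
print(result_when_down, lost_on_failover)
```

No acknowledged write is missing from the replica, but the write attempted while the replica was unreachable could not be acknowledged at all, which is the availability and latency price of the guarantee.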
The realistic middle ground: semi-synchronous / quorum-based
Many production systems use a middle configuration, e.g., waiting for confirmation from at least one of several replicas (not all), or from a quorum, to balance durability guarantees against latency. In PostgreSQL, the synchronous_commit setting controls how far a write must get on the replica before the commit is acknowledged (levels include remote_write, on, and remote_apply), and the synchronous_standby_names setting selects which replicas count as synchronous, including quorum-style rules (any N of a listed set), while the others remain asynchronous. Together these let you tune how much durability guarantee you're paying latency for.
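A rough way to see the latency side of this trade-off: if the primary contacts all replicas in parallel and waits for k confirmations, the wait ends when the k-th fastest confirmation arrives. The sketch below assumes made-up latencies and a hypothetical helper `commit_latency_ms`; it also assumes replication starts only after the local write finishes, which real systems may overlap.

```python
def commit_latency_ms(local_ms, replica_ack_ms, required_acks):
    """Estimate client-visible write latency when waiting for `required_acks` replicas.

    local_ms: time for the primary's own write (assumed value).
    replica_ack_ms: round-trip time until each replica confirms (assumed values).
    required_acks: 0 models asynchronous replication; len(replica_ack_ms) models waiting for all.
    """
    if required_acks < 0 or required_acks > len(replica_ack_ms):
        raise ValueError("required_acks must be between 0 and the number of replicas")
    if required_acks == 0:
        return local_ms
    # Replicas are contacted in parallel, so the k-th fastest ack ends the wait.
    return local_ms + sorted(replica_ack_ms)[required_acks - 1]


# Illustrative numbers only: one nearby replica, one in-region, one cross-region.
acks = [2.0, 15.0, 80.0]

asynchronous = commit_latency_ms(1.0, acks, 0)
any_one = commit_latency_ms(1.0, acks, 1)
majority_of_replicas = commit_latency_ms(1.0, acks, 2)
all_replicas = commit_latency_ms(1.0, acks, 3)

print(asynchronous, any_one, majority_of_replicas, all_replicas)
```

Under these assumed numbers, waiting for any one replica costs far less than waiting for all of them, because the slowest replica no longer sits on the write path.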
Choose based on how costly losing the most recent few writes actually is. Financial transactions, or anything where "we told the customer it succeeded, then it disappeared" is unacceptable, strongly favor synchronous (or at least semi-synchronous/quorum) replication for the primary write path. Less critical data (analytics events, non-critical logs) can usually tolerate asynchronous replication's small window of potential loss in exchange for consistently lower write latency.