What is a read replica, and how does it help scale reads?

Detailed Answer

The basic pattern

                     [ Primary ]  <- all WRITES go here
                     /    |    \
                    v     v     v
              [Replica1][Replica2][Replica3]  <- READS distributed across these

Application code routes writes to the primary and (some or all) reads to whichever replica is available/least loaded — often via a proxy/load balancer, or explicit read/write connection strings configured in the application.

Why this helps

Most application workloads are read-heavy (often 80-95%+ of database operations are reads). Since replicas can serve reads independently and in parallel, adding more replicas increases total read capacity roughly linearly, without touching the primary's write capacity at all — a much simpler scaling lever than sharding, and doesn't introduce cross-shard query complexity.

The cost: replication lag

Replicas apply changes slightly after the primary commits them (whether via async or sync replication — see that question), so a read hitting a replica immediately after a related write might not see that write yet.

1. Client writes new profile picture URL to the primary.  [committed]
2. Client immediately reads their profile from a replica.
3. Replica hasn't received the replicated change yet -- shows the OLD picture.

This is the classic "read-your-own-writes" consistency problem with read replicas — a real, common source of confusing bug reports ("I just changed X and it still shows the old value!"). Mitigations: route a user's own immediate follow-up reads to the primary for a short window after they write, use "read-after-write" consistency features some managed database services provide, or accept the staleness for read paths where it's genuinely not user-visible/critical.

What replicas don't help with

Read replicas scale read throughput, not write throughput or total storage — every replica holds a full copy of the entire dataset, and every write still has to go through (and be replicated from) the single primary. If write volume, not read volume, is the actual bottleneck, replicas don't help — that's what sharding addresses instead.

Read replicas are usually the first and easiest horizontal scaling step for a read-heavy application, since they require far less architectural change than sharding — mostly routing logic in the application/connection layer, rather than redesigning the data model around a shard key. They're a natural fit for reporting/analytics queries too, since routing expensive analytical reads to a dedicated replica isolates that load from the primary's transactional workload entirely.

What is a read replica, and how does it help scale reads?

Quick Answer

Detailed Answer

The basic pattern

Why this helps

The cost: replication lag

What replicas don't help with

Related Resources

Related Questions