What's the difference between vertical and horizontal scaling for databases?

Vertical scaling (scaling up) means adding more resources — CPU, RAM, faster disks — to a single database server. Horizontal scaling (scaling out) means distributing data and load across multiple servers (via replication and/or sharding). Vertical scaling is simpler (no application changes) but hits a hard ceiling and creates a single point of failure; horizontal scaling has much higher ceilings and improves availability, but adds real distributed-systems complexity.

What is database replication, and what's the difference between synchronous and asynchronous replication?

Replication maintains copies of the same data across multiple database servers, typically a primary (accepts writes) and one or more replicas (receive copies of those writes). In **synchronous** replication, the primary waits for at least one replica to confirm it received the write before acknowledging success to the client — safer, zero data loss on failover, but higher write latency. In **asynchronous** replication, the primary acknowledges the write immediately and replicates in the background — lower latency, but a crash before replication completes can lose the most recent write(s).

What is sharding, and what are common sharding strategies?

Sharding splits a dataset horizontally across multiple independent database instances (shards), each holding a subset of the rows, so that both storage and write throughput scale by adding more shards. Common strategies: **range-based** (partition by a value range, e.g., dates), **hash-based** (partition by a hash of the key, spreading data evenly), and **directory-based** (a lookup service maps each key to its shard explicitly). The choice of shard key is the single most consequential decision — a bad choice causes uneven load ("hot shards") or expensive cross-shard queries.

What is a read replica, and how does it help scale reads?

A read replica is a copy of the primary database that receives a continuous stream of replicated changes and serves read-only queries, offloading read traffic from the primary so it can focus on handling writes. This scales read throughput roughly linearly with the number of replicas added, at the cost of replication lag — replicas are typically slightly behind the primary, so reads from a replica may return marginally stale data.

What is database failover, and how do systems achieve high availability?

Failover is the process of automatically detecting that a primary database has become unavailable and promoting a replica to take over as the new primary, minimizing downtime. High availability is achieved by combining replication (so a healthy up-to-date copy exists), health monitoring/heartbeats (to detect failure quickly), and an automated promotion/routing mechanism (so clients start talking to the new primary without manual intervention) — measured by uptime targets like "99.99%" and by recovery time/point objectives (RTO/RPO).

What is connection pooling, and why does it matter at scale?

A connection pool maintains a set of already-established database connections that application code borrows and returns, instead of opening a brand-new connection per request. This matters because establishing a database connection (TCP handshake, authentication, session setup) is relatively expensive, and most databases have a hard limit on total concurrent connections — without pooling, a busy application can either waste significant time/resources per request on connection setup, or exhaust the database's connection limit entirely under load.

What is the N+1 query problem, and how do you fix it?

The N+1 problem happens when code fetches a list of N parent records with one query, then issues a separate query per parent to fetch related child data — resulting in 1 + N total queries instead of 2 (or 1). It's extremely common with ORMs' lazy-loading defaults. The fix is to fetch the related data in one additional batched query (or a single join), commonly called eager loading, instead of triggering a query inside a loop.

How would you design a backup and disaster recovery strategy for a production database?

A solid strategy combines full backups (periodic complete snapshots), incremental/differential backups (capturing only changes since the last backup, to reduce storage/time), and continuous write-ahead log archiving (enabling point-in-time recovery to any moment, not just the last backup's timestamp). Store backups off-site/in a different region from the primary, test restores regularly (an untested backup is not a real backup), and define explicit RTO/RPO targets that drive the actual backup frequency and architecture chosen.

What is a write-ahead log (WAL), and what role does it play in durability and replication?

A write-ahead log records every change as a durable, sequential log entry *before* the change is applied to the actual data files — ensuring that if the database crashes mid-operation, it can replay the log on restart to recover any committed change that hadn't yet been fully written to the main data files. This same log stream is also what most replication mechanisms ship to replicas, since replaying the identical sequence of logged changes is how a replica reconstructs the same state as the primary.

How would you scale a relational database to handle millions of users?

Work through scaling levers roughly in order of cost/complexity: optimize queries and indexes first, add caching (application-level and/or a dedicated cache like Redis) for hot read paths, add read replicas to scale read throughput, scale the primary vertically as far as practical, and only then consider sharding for write-throughput/storage limits that nothing else can address. Layer in connection pooling throughout, and treat each step as something to justify with actual measured bottlenecks, not a checklist to apply preemptively.

Scaling, Replication, and High Availability

Growing a database beyond a single server — replication, sharding, connection pooling, and disaster recovery.

Vertical scaling — bigger machine

Before: 8 vCPU, 32GB RAM database server
After:  32 vCPU, 128GB RAM database server (same single server, upgraded)

Pros: requires no application-level changes — the database is still one logical instance, transactions and joins work exactly as before, no new distributed-systems concerns introduced. Simplest option to reason about.

Cons: there's a hard physical/economic ceiling — eventually you run out of bigger hardware to buy, or it becomes prohibitively expensive. It also doesn't improve availability — a single, larger server is still a single point of failure; if it goes down, the whole database is down.

Horizontal scaling — more machines

Before: 1 database server handling all reads and writes
After:  1 primary (writes) + several read replicas (reads),
        or several shards, each holding a subset of the data

Pros: much higher scaling ceiling (in principle, keep adding machines), and can improve availability (a replica can be promoted if the primary fails — see the failover question).

Cons: introduces real distributed-systems complexity — replication lag, choosing a sharding key (see that question), cross-shard queries/joins becoming expensive or impossible, and generally more operational surface area (more machines to monitor, patch, and reason about failure modes for).

How they typically combine in practice

Most systems scale vertically first (it's cheap and simple, and modern hardware ceilings are quite high) and only reach for horizontal scaling once vertical scaling is exhausted or availability requirements demand redundancy regardless of raw capacity needs. Read scaling is usually the first horizontal step (read replicas — see that question), since most application workloads are read-heavy and reads are easier to distribute than writes; write scaling (sharding) is a bigger architectural commitment, usually reserved for when a single primary genuinely can't keep up with write volume.

A strong answer recognizes vertical scaling isn't "the naive option to outgrow" — it's often the right first move because of its simplicity, and premature horizontal scaling (sharding a dataset that would fit comfortably on a bigger single server) adds real complexity for no corresponding benefit.

Related Resources

AWS: Scaling Databases

The basic topology

        Writes
           |
           v
      [ Primary ]
       /    |    \
      v     v     v
 [Replica1][Replica2][Replica3]   <- receive copies of every write

Replicas apply the same stream of changes the primary made (often via shipping the write-ahead log — see that question) so their data converges to match the primary's, with some delay.

Asynchronous replication

Client -> Primary: write X
Primary -> Client: "success" (acknowledged immediately)
Primary -> Replicas: ships the change (happens after acknowledging the client)

The primary doesn't wait for any replica to confirm receipt before telling the client the write succeeded. Pros: lowest possible write latency, since the client isn't waiting on network round-trips to remote replicas. Cons: if the primary crashes after acknowledging the client but before a replica received the change, that write is lost if a replica is promoted to primary — the replica genuinely never had it.

Synchronous replication

Client -> Primary: write X
Primary -> Replica: ships the change
Replica -> Primary: "received" (written, flushed, or applied, depending on configuration)
Primary -> Client: "success" (only now, after replica confirmation)

Pros: zero data loss on failover — by the time the client is told "success," at least one replica genuinely has the data too, so promoting that replica loses nothing. Cons: meaningfully higher write latency (every write waits on a network round-trip to the replica, and if the replica is slow or unreachable, writes stall or fail depending on configuration) — this cost is paid on every single write, permanently, not just during a failure.

The realistic middle ground: semi-synchronous / quorum-based

Many production systems use a middle configuration — e.g., wait for confirmation from at least one of several replicas (not all), or a quorum, balancing durability guarantees against latency. PostgreSQL supports synchronous_commit tuning with options like remote_write/remote_apply and can designate specific replicas as synchronous while others remain asynchronous, letting you tune exactly how much durability guarantee you're paying latency for.

Choose based on how costly losing the most recent few writes actually is: financial transactions or anything where "we told the customer it succeeded, then it disappeared" is unacceptable strongly favors synchronous (or at least semi-synchronous/quorum) replication for the primary write path; less critical data (analytics events, non-critical logs) can usually tolerate asynchronous replication's small window of potential loss in exchange for consistently lower write latency.
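The difference between the two acknowledgement orders can be made concrete with a toy in-memory model. This is only a sketch: `Primary`, `Replica`, `ship_pending`, and `acknowledged_write_survives_crash` are hypothetical names invented for illustration, and no real network or database is involved.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Toy primary; `synchronous` decides when the client is acknowledged."""

    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.unshipped = []  # acknowledged changes not yet sent to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Ship the change and wait for the replica before acknowledging.
            self.replica.apply(key, value)
        else:
            # Acknowledge now; shipping happens later in the background.
            self.unshipped.append((key, value))
        return "success"

    def ship_pending(self):
        for key, value in self.unshipped:
            self.replica.apply(key, value)
        self.unshipped.clear()


def acknowledged_write_survives_crash(synchronous):
    replica = Replica()
    primary = Primary(replica, synchronous)
    primary.write("balance:1", 500)  # the client has been told "success"
    # The primary crashes here, before ship_pending() ever runs, and the
    # replica is promoted with whatever data it already holds.
    return replica.data.get("balance:1") == 500
```

Calling `acknowledged_write_survives_crash(True)` returns `True`, while the asynchronous case returns `False`: the acknowledged write never reached the promoted replica.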

Related Resources

PostgreSQL: High Availability, Load Balancing, and Replication

Why shard at all

A single database server has a ceiling on write throughput and total storage, no matter how much you scale vertically. Sharding splits the dataset itself across multiple independent servers, each handling only its own subset of the data — write throughput and storage both scale roughly linearly with the number of shards, unlike read replicas (which copy the entire dataset onto each replica and only help with read scaling, not write or storage scaling).

Range-based sharding

Shard 1: orders where order_date < 2024-01-01
Shard 2: orders where 2024-01-01 <= order_date < 2025-01-01
Shard 3: orders where order_date >= 2025-01-01

Pros: range queries (WHERE order_date BETWEEN ...) can often be satisfied by a single shard or a small contiguous set of shards. Cons: prone to "hot shards" if writes cluster in a narrow range — e.g., all current writes land on the newest shard (today's date range), leaving older shards idle while the newest shard bears all the write load.
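A minimal routing sketch for the three date ranges above, assuming the boundaries are kept in a sorted list. The names `BOUNDARIES`, `SHARDS`, `shard_for_date`, and `shards_for_range` are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_right
from datetime import date

# Sorted lower bounds of shard 2 and shard 3, taken from the example above.
BOUNDARIES = [date(2024, 1, 1), date(2025, 1, 1)]
SHARDS = ["shard_1", "shard_2", "shard_3"]


def shard_for_date(order_date):
    """Route a single row: count how many boundaries are <= order_date."""
    return SHARDS[bisect_right(BOUNDARIES, order_date)]


def shards_for_range(start, end):
    """Shards a `BETWEEN start AND end` query (inclusive) has to touch."""
    first = bisect_right(BOUNDARIES, start)
    last = bisect_right(BOUNDARIES, end)
    return SHARDS[first:last + 1]
```

A query for March 2024 touches only `shard_2`, which is the range-query benefit described above; every row dated today lands on `shard_3`, which is the hot-shard risk.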

Hash-based sharding

shard_number = hash(customer_id) % number_of_shards

Pros: spreads data (and write load) evenly across shards, since a good hash function distributes keys uniformly regardless of any natural clustering in the original values. Cons: range queries become expensive — "all orders in January" no longer maps to one shard, since hashing destroys the original ordering, so satisfying that query means fanning out to every shard and merging results (a "scatter-gather" query).
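The formula above can be sketched in Python as follows. One assumption worth flagging: the sketch uses SHA-256 rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and so would not give stable shard assignments. `shard_for_key` is a hypothetical helper name.

```python
import hashlib
from collections import Counter


def shard_for_key(key, number_of_shards):
    # A stable hash: the same key maps to the same shard in every process.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


# Sequential customer IDs (a naturally clustered key) still spread evenly.
counts = Counter(shard_for_key(customer_id, 4) for customer_id in range(10_000))
```

Each of the four shards ends up with close to a quarter of the keys, even though the IDs themselves are consecutive.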

Directory-based sharding

Lookup service: customer_id -> shard_3
                customer_id -> shard_1
                ... (explicit mapping, stored and consulted per lookup)

Pros: maximum flexibility — shards can be rebalanced by simply updating the directory's mapping for affected keys, without needing to recompute a hash function or redefine ranges. Cons: the directory/lookup service itself becomes a critical dependency and potential bottleneck/single point of failure that must itself be highly available and fast.

Choosing a shard key — the most consequential decision

A shard key that's too low-cardinality, or that correlates with request "hotness" (e.g., sharding by country when 80% of traffic is from one country), creates a hot shard that bottlenecks the whole system regardless of how many shards exist. A shard key that doesn't align with your most common query patterns forces expensive cross-shard "scatter-gather" queries for routine operations. Good shard keys are high-cardinality, roughly evenly distributed in both storage and access frequency, and align with how data is most commonly queried (ideally, most queries can be satisfied by a single shard once the key is known).

What sharding costs you

Cross-shard joins/transactions become hard or impossible in the general case — a join across two entities that happen to live on different shards either isn't supported natively or requires application-level fan-out and merging.
Rebalancing is operationally complex — adding a new shard usually means migrating a subset of data from existing shards to the new one without downtime, a genuinely hard distributed-systems problem many databases (Vitess, Citus for PostgreSQL, MongoDB's native sharding) have built significant tooling around.
Uneven growth over time can gradually recreate hot shards even with an initially good key choice, requiring ongoing monitoring and occasional rebalancing.
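To illustrate why rebalancing is costly, the sketch below measures how many keys change shards when a simple `hash(key) % number_of_shards` scheme grows from 4 to 5 shards. The helper names are hypothetical, and SHA-256 stands in for whatever stable hash a real system would use.

```python
import hashlib


def shard_for_key(key, number_of_shards):
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


def fraction_moved(keys, old_shards, new_shards):
    """Share of keys whose shard assignment changes after resharding."""
    keys = list(keys)
    moved = sum(
        1 for k in keys
        if shard_for_key(k, old_shards) != shard_for_key(k, new_shards)
    )
    return moved / len(keys)


moved = fraction_moved(range(10_000), old_shards=4, new_shards=5)
```

With plain modulo placement a key keeps its shard only when its hash gives the same remainder under both 4 and 5, which happens for about one key in five, so roughly 80% of the data would have to migrate. That is the kind of cost the rebalancing tooling mentioned above is built to manage.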

Sharding is a significant architectural commitment that should be a last resort after read replicas, caching, and vertical scaling are exhausted — not a default reached for early, given the ongoing operational complexity it introduces.

Related Resources

MongoDB: Sharding

The basic pattern

                     [ Primary ]  <- all WRITES go here
                     /    |    \
                    v     v     v
              [Replica1][Replica2][Replica3]  <- READS distributed across these

Application code routes writes to the primary and (some or all) reads to whichever replica is available/least loaded — often via a proxy/load balancer, or explicit read/write connection strings configured in the application.

Why this helps

Most application workloads are read-heavy (often 80-95%+ of database operations are reads). Since replicas can serve reads independently and in parallel, adding more replicas increases total read capacity roughly linearly without touching the primary's write capacity at all. This is a much simpler scaling lever than sharding, and it introduces no cross-shard query complexity.

The cost: replication lag

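With asynchronous replication (the usual setup for read replicas), replicas apply changes slightly after the primary commits them, so a read hitting a replica immediately after a related write might not see that write yet. Synchronous replication does not automatically remove this: unless the primary waits for the replica to apply the change (not merely receive it), a committed write can still be briefly invisible on the replica (see the synchronous vs. asynchronous replication question).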

1. Client writes new profile picture URL to the primary.  [committed]
2. Client immediately reads their profile from a replica.
3. Replica hasn't received the replicated change yet -- shows the OLD picture.

This is the classic "read-your-own-writes" consistency problem with read replicas — a real, common source of confusing bug reports ("I just changed X and it still shows the old value!"). Mitigations: route a user's own immediate follow-up reads to the primary for a short window after they write, use "read-after-write" consistency features some managed database services provide, or accept the staleness for read paths where it's genuinely not user-visible/critical.
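The first mitigation (pinning a user's reads to the primary for a short window after they write) might look roughly like this. `ReadWriteRouter`, `pin_seconds`, and the injectable `clock` are hypothetical names, and the sketch keeps its state in one process's memory, so a multi-instance application would need a shared store or a client-side marker instead.

```python
import itertools
import time


class ReadWriteRouter:
    def __init__(self, primary, replicas, pin_seconds=2.0, clock=time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write = {}  # user_id -> time of that user's latest write

    def for_write(self, user_id):
        self._last_write[user_id] = self.clock()
        return self.primary

    def for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            # A replica may not have this user's write yet: read the primary.
            return self.primary
        return next(self._replicas)
```

The pin window should comfortably exceed the replication lag normally observed; other users' reads still go to replicas, so the primary only absorbs the small share of reads that immediately follow a write.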

What replicas don't help with

Read replicas scale read throughput, not write throughput or total storage — every replica holds a full copy of the entire dataset, and every write still has to go through (and be replicated from) the single primary. If write volume, not read volume, is the actual bottleneck, replicas don't help — that's what sharding addresses instead.

Read replicas are usually the first and easiest horizontal scaling step for a read-heavy application, since they require far less architectural change than sharding — mostly routing logic in the application/connection layer, rather than redesigning the data model around a shard key. They're a natural fit for reporting/analytics queries too, since routing expensive analytical reads to a dedicated replica isolates that load from the primary's transactional workload entirely.

Related Resources

AWS: Read Replicas

The failover sequence

1. Primary is healthy, replicating to Replica A and Replica B.
2. Primary crashes (hardware failure, network partition, etc.).
3. A monitoring/orchestration system detects the primary is unresponsive
   (via missed heartbeats over some threshold).
4. The most up-to-date, healthy replica (say, Replica A) is PROMOTED to primary.
5. Application connections / a proxy / DNS / a virtual IP are redirected
   to point at the newly-promoted Replica A.
6. Replica B is reconfigured to replicate from the new primary (Replica A).

Key metrics that define "how good" a failover strategy is

RTO (Recovery Time Objective): the maximum acceptable time the system may be unavailable during a failover, measured from the failure (including the time taken to detect it) to the new primary accepting traffic. Automated failover systems can often achieve recovery times of seconds to low minutes; manual intervention can take much longer.
RPO (Recovery Point Objective): the maximum acceptable amount of data loss, measured in time, in the worst case. With synchronous replication, the achievable RPO can be effectively zero; with asynchronous replication, it is bounded by however far behind the promoted replica was at the moment of failure (see the synchronous vs. asynchronous replication question).

Detecting failure correctly is harder than it sounds

A naive health check (a single missed heartbeat) risks false positives: failing over because of a transient network blip rather than an actual primary failure. That is disruptive and risky in its own right. The worst outcome is a "split-brain" scenario, in which the old primary (which recovers a moment later) and the newly promoted replica both believe they are the primary and both accept writes, a serious and hard-to-clean-up failure mode. Real HA systems use consensus mechanisms or a quorum of independent observers (not a single health checker) to confirm a primary is truly down before triggering promotion, specifically to avoid this.
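A simplified sketch of the quorum idea, assuming each independent observer reports how many consecutive heartbeats it has missed. The function name and the `miss_threshold` default are hypothetical; real systems typically delegate this decision to a consensus store rather than a single function.

```python
def primary_is_down(observer_missed_heartbeats, miss_threshold=3):
    """Declare failure only if a strict majority of observers agree.

    observer_missed_heartbeats: one integer per observer, giving the number
    of consecutive heartbeats that observer has missed from the primary.
    """
    votes = sum(
        1 for missed in observer_missed_heartbeats if missed >= miss_threshold
    )
    return votes > len(observer_missed_heartbeats) // 2
```

With three observers, `primary_is_down([5, 0, 0])` is `False`: one observer being cut off from the primary is not enough to trigger promotion, whereas `primary_is_down([5, 4, 0])` is `True`.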

Components of a full HA setup

Replication — at least one replica must be reasonably current to promote.
Health monitoring / consensus — reliably detects genuine failure without over-triggering on transient issues.
Automated promotion — a replica is reconfigured to accept writes as the new primary.
Client redirection — a proxy, load balancer, virtual IP, or DNS update routes traffic to the new primary without requiring every application instance to be manually reconfigured.
Re-establishing replication topology — surviving replicas need to start following the new primary, and (ideally) the old primary, if it recovers, needs to safely rejoin as a replica rather than as a conflicting second primary.

Managed services vs. self-managed

Cloud-managed database services (AWS RDS/Aurora, Azure SQL, Google Cloud SQL) handle most of this automatically as a built-in feature — often with RTOs in the tens of seconds. Self-managed HA (e.g., PostgreSQL with Patroni + etcd, or MySQL with Orchestrator) requires assembling these pieces explicitly, which is more work but gives more control over exact behavior and thresholds.

Knowing the terms RTO/RPO, and being able to explain the split-brain risk and why naive health-checking is dangerous, demonstrates real operational experience with HA — beyond just "you have a backup server that takes over."

Related Resources

PostgreSQL: Failover

Without connection pooling

Request 1 arrives -> open new DB connection -> run query -> close connection
Request 2 arrives -> open new DB connection -> run query -> close connection
...

Every request pays the full cost of establishing a connection: TCP handshake, TLS negotiation (if encrypted), authentication, and session/context setup on the database side. That can reach tens of milliseconds, which can dwarf the actual query's execution time for simple queries. Under load, opening a fresh connection per request also risks exhausting the database's maximum connection limit (PostgreSQL's default max_connections is typically 100, and each connection also consumes real memory on the server side), causing new connection attempts to fail outright.

With connection pooling

Application startup: pool pre-establishes N connections (e.g., 20)

Request 1 arrives -> borrow a connection from the pool -> run query -> return connection to pool
Request 2 arrives -> borrow a (possibly different, already-open) connection -> run query -> return it

Connections are reused across many requests instead of being opened and torn down each time — amortizing the expensive setup cost across potentially thousands of queries per connection's lifetime, and keeping the total number of actual database connections bounded and predictable regardless of application request volume.
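The borrow/return cycle can be sketched with a thread-safe queue. This is a deliberately minimal illustration, not a production pool: `ConnectionPool` and `connect` are hypothetical names, and an in-memory SQLite connection stands in for a real database server.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the setup cost once, up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks while every connection is borrowed; raises queue.Empty
        # if none is returned within `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


opened = []  # records every connection that was actually established


def connect():
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    opened.append(conn)
    return conn


pool = ConnectionPool(connect, size=2)
for _ in range(100):  # 100 "requests" share the same 2 connections
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
```

After the loop, `len(opened)` is still 2. A real pool library would also validate connections before handing them out and reset session state (for example, rolling back an open transaction) when they are returned.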

Application-level vs. proxy-level pooling

Application-level pooling — a pool library within the application process/runtime (e.g., HikariCP for Java, most ORM connection pools) manages a fixed set of connections for that one application instance.
Proxy-level pooling (PgBouncer, ProxySQL) — a separate service sits between many application instances and the database, multiplexing a large number of application-side "logical" connections down onto a much smaller number of actual database connections — especially valuable when you have many application instances/processes (e.g., a large fleet of serverless functions or many microservice replicas) that would otherwise each maintain their own pool, collectively still exceeding the database's connection limit.

Sizing a pool

Counterintuitively, a pool that's too large can hurt performance: each active connection consumes server-side resources (memory, and potentially lock/contention overhead), and beyond a certain point, more concurrent connections just mean more contention for the same underlying CPU/IO resources rather than more real parallelism. The optimal pool size is often much smaller than intuition suggests. A common rule of thumb (from the PostgreSQL wiki, and widely repeated in pool-sizing guides such as HikariCP's) is (2 x CPU cores) + effective spindle count as a starting point, though the right number always depends on the specific workload and should be measured and tuned rather than assumed.

Why this matters especially with serverless/many-instance architectures

Serverless functions or large horizontally-scaled application fleets can easily spin up far more concurrent application instances than a database's connection limit can handle if each instance maintains even a modest pool — this is one of the most common real-world production incidents ("database ran out of connections during a traffic spike"), and is precisely why a proxy-level pooler (PgBouncer, RDS Proxy) is a standard, near-mandatory component in these architectures.
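The arithmetic behind this failure mode is simple enough to write down. The numbers below (8 cores, 1 effective spindle, 50 application instances with a pool of 20 each) are invented purely for illustration, and both function names are hypothetical.

```python
def suggested_pool_size(cpu_cores, effective_spindles):
    """Rule-of-thumb starting point: (2 x CPU cores) + effective spindle count."""
    return 2 * cpu_cores + effective_spindles


def total_connections(app_instances, pool_size_per_instance):
    """Connections the database sees if every instance fills its own pool."""
    return app_instances * pool_size_per_instance


db_side_budget = suggested_pool_size(cpu_cores=8, effective_spindles=1)
fleet_demand = total_connections(app_instances=50, pool_size_per_instance=20)
max_connections = 100  # the typical PostgreSQL default mentioned earlier
over_limit = fleet_demand > max_connections
```

Here the fleet can ask for 1,000 connections against a limit of 100 and a rule-of-thumb budget of 17, which is the gap a proxy-level pooler closes by multiplexing many client connections onto a few server connections.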

Related Resources

PostgreSQL: PgBouncer

The bug, illustrated

orders = db.query("SELECT * FROM orders WHERE customer_id = 42")  # 1 query, returns 20 orders

for order in orders:
    # This line, inside the loop, fires a SEPARATE query for EVERY order
    items = db.query(f"SELECT * FROM order_items WHERE order_id = {order.id}")

With 20 orders, this executes 1 + 20 = 21 queries total, when the same data could have been fetched in 2 queries (or even 1, with a join). The problem scales linearly with N — 1,000 orders means 1,001 queries — and each query pays its own network round-trip latency, which dominates the total time far more than the actual work the database does per query.

Why ORMs make this especially easy to write by accident

Most ORMs default to lazy loading for related entities — accessing order.items triggers a fresh query at the moment it's accessed, which is convenient and often invisible in code review (it just looks like a normal property access), but silently produces exactly this pattern the instant it happens inside a loop over a collection.

Fix 1: eager loading — fetch related data in one extra batched query

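order_ids = [o.id for o in orders]
# IDs are integers, so convert each to str before joining (str.join rejects ints).
# In real code, prefer bound parameters over building the IN list by hand.
id_list = ",".join(str(order_id) for order_id in order_ids)
items = db.query(f"SELECT * FROM order_items WHERE order_id IN ({id_list})")
# group items by order_id in application code -- now just 2 total queries, regardless of N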

Most ORMs provide a built-in mechanism for this: .Include() (Entity Framework), select_related/prefetch_related (Django), JOIN FETCH (JPA/Hibernate). Each explicitly tells the ORM to fetch the related data as part of (or immediately following) the initial query, rather than lazily per access.

Fix 2: a single JOIN query

SELECT o.id AS order_id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.customer_id = 42;

One query total; the application groups the flattened join results back into a nested order/items structure. This avoids the extra round-trip of Fix 1's second query entirely, at the cost of row duplication: any order-level columns you select are repeated on every item row for that order, which inflates the result set and can silently skew aggregates if the query also aggregates (see the join explosion question). Note also that the inner JOIN shown omits orders that have no items; use a LEFT JOIN if those orders must appear.
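The query counts can be checked with a small runnable sketch. It assumes an in-memory SQLite database seeded with 20 orders of 2 items each; `CountingDB`, `load_n_plus_one`, and `load_batched` are hypothetical names, and bound parameters are used in place of string interpolation.

```python
import sqlite3
from collections import defaultdict


class CountingDB:
    """Thin wrapper that counts how many queries are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    """)
    conn.executemany("INSERT INTO orders VALUES (?, 42)",
                     [(i,) for i in range(1, 21)])
    conn.executemany("INSERT INTO order_items VALUES (?, ?)",
                     [(i, p) for i in range(1, 21) for p in (100, 200)])
    return CountingDB(conn)


def load_n_plus_one(db):
    orders = db.query("SELECT id FROM orders WHERE customer_id = ?", (42,))
    # One extra query per order: this is the N in N+1.
    return {
        order_id: [row[0] for row in db.query(
            "SELECT product_id FROM order_items WHERE order_id = ?", (order_id,))]
        for (order_id,) in orders
    }


def load_batched(db):
    orders = db.query("SELECT id FROM orders WHERE customer_id = ?", (42,))
    order_ids = [order_id for (order_id,) in orders]
    placeholders = ",".join("?" for _ in order_ids)
    rows = db.query(
        f"SELECT order_id, product_id FROM order_items "
        f"WHERE order_id IN ({placeholders})", order_ids)
    items = defaultdict(list)
    for order_id, product_id in rows:  # group in application code
        items[order_id].append(product_id)
    return dict(items)
```

Running `load_n_plus_one` on a fresh `make_db()` leaves its `count` at 21, while `load_batched` leaves it at 2, and both return the same order-to-items mapping.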

How to catch this in practice

Query logging/APM tools (many ORMs and database drivers support logging every executed query) make N+1 patterns visible as a suspicious burst of near-identical queries differing only by an ID.
Load testing with realistic data volumes — N+1 is often invisible in development with a handful of test rows, and only becomes an obvious, painful problem once N is large in production.
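Some ecosystems have tooling aimed specifically at N+1 patterns: for example, the Bullet gem for Rails flags likely N+1 queries during development, and Django's test framework provides assertNumQueries to pin the number of queries a code path may issue.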

This is one of the most common real-world performance bugs in application code backed by a database, and recognizing it (plus knowing the eager-loading fix by name in whatever ORM/stack is relevant) is a strong, practical signal of hands-on experience.

Related Resources

Use the Index, Luke: The N+1 Problem

The building blocks

Full backups — a complete snapshot of the entire database at a point in time. Simple to restore from (just load it), but large and slow to take/store frequently, and any changes since the last full backup are lost if you can only restore from it alone.

Incremental/differential backups: capture only what's changed since an earlier backup (incremental: since the last backup of any kind; differential: since the last full backup), reducing storage and backup-window time at the cost of a more complex restore process. Restoring from incrementals means applying the full backup and then each subsequent incremental in order; restoring from differentials means applying the full backup and then only the most recent differential.

Write-ahead log (WAL) archiving — continuously shipping the database's write-ahead log (see that question) to durable storage, enabling point-in-time recovery (PITR): restoring to any specific moment, not just the timestamp of the last full/incremental backup. This is what lets you recover to "11:47:32am, one minute before the bad DELETE ran," rather than only to last night's backup (which would lose everything since then).

Full backup (Sunday) -> WAL continuously archived (Mon, Tue, Wed...) -> incident at Wed 2:15pm
Recovery: restore Sunday's full backup, then replay WAL up to Wed 2:14:59pm
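The recovery line above amounts to "restore, then replay up to a cutoff". A toy model of that idea, assuming the backup is a plain dictionary and each WAL record is a `(timestamp, key, value)` tuple sorted by time, with `None` marking a delete; all names and data here are hypothetical.

```python
from datetime import datetime


def restore_to_point_in_time(full_backup, wal_records, target_time):
    state = dict(full_backup)  # 1. restore the full backup
    for timestamp, key, value in wal_records:  # 2. replay the WAL in order
        if timestamp > target_time:
            break  # stop just before the incident
        if value is None:
            state.pop(key, None)  # None marks a delete in this toy format
        else:
            state[key] = value
    return state


sunday_backup = {"order:1": "paid"}
wal = [
    (datetime(2024, 1, 1, 9, 0), "order:2", "paid"),
    (datetime(2024, 1, 3, 14, 0), "order:1", "shipped"),
    (datetime(2024, 1, 3, 14, 15), "order:1", None),  # the bad DELETE
    (datetime(2024, 1, 3, 14, 15), "order:2", None),
]
recovered = restore_to_point_in_time(
    sunday_backup, wal, target_time=datetime(2024, 1, 3, 14, 14, 59))
```

`recovered` contains both orders in their pre-incident state, whereas restoring the Sunday backup alone would have lost everything written since then.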

Where backups should live

Never solely on the same server/storage as the primary database — a single disk failure, ransomware attack, or data-center-level incident (fire, regional outage) would destroy the primary and its backups together. Best practice: replicate backups to a genuinely separate storage system, ideally in a different physical region/availability zone from the primary.

Test restores — regularly, not hypothetically

An untested backup is, for practical purposes, not a real backup — corruption, incomplete backup scripts, or subtle configuration drift (e.g., a backup script silently failing for weeks without anyone noticing) are common real-world failure modes that only surface when you actually try to restore. Regularly scheduled restore drills (into an isolated environment, verifying data integrity and application functionality against the restored copy) are the only way to have real confidence the backup strategy actually works when it matters.

Let RTO/RPO targets drive the design

RPO (Recovery Point Objective) — how much data loss (measured in time) is acceptable in the worst case? A low RPO (seconds to minutes) requires continuous WAL archiving/PITR; a higher RPO (hours) might tolerate periodic snapshot-only backups.
RTO (Recovery Time Objective) — how quickly must the system be back online? A low RTO favors having a warm standby/replica ready to promote (see the failover question) over a from-scratch restore from cold backup storage, which can take hours for a large database.

Additional considerations

Encryption of backups at rest (they contain the same sensitive data as the live database, and are often stored somewhere with different access controls that need equally rigorous protection).
Retention policy — how long to keep backups, balancing storage cost, compliance/legal requirements, and the realistic window in which a "we need to recover something from 6 months ago" request might occur.
Runbook documentation — a written, rehearsed procedure for who does what during an actual disaster, since a crisis is exactly the wrong time to be improvising or hunting for credentials/access.

A strong answer goes beyond "we take backups" to explicitly connect the backup architecture to concrete RTO/RPO numbers, and treats regular restore testing as non-negotiable — this reflects real operational maturity rather than textbook knowledge.

Related Resources

PostgreSQL: Continuous Archiving and Point-in-Time Recovery

The core idea: log first, apply later

1. Transaction commits: UPDATE accounts SET balance = 500 WHERE id = 1;
2. BEFORE modifying the actual data page on disk, the engine writes a
   WAL record: "change account id=1's balance to 500" to a sequential,
   append-only log file, and durably flushes (fsyncs) that log entry.
3. ONLY THEN does the engine acknowledge the commit to the client as successful.
4. The actual data page update can happen later (asynchronously,
   often batched with other changes for efficiency) -- it's no longer
   urgent, because the WAL already durably captured the intent.

If the server crashes at any point after step 3 but before the data page is actually updated on disk, the WAL entry survives (it was already durably flushed) — on restart, the engine replays any WAL entries not yet reflected in the data files, recovering the change that would otherwise have been lost. This is the mechanism that actually delivers ACID's Durability guarantee.
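The log-first sequence can be sketched as a tiny key-value store. This is a teaching toy under stated assumptions, not how any real engine is implemented: `TinyWalStore` is a hypothetical name, an in-memory dictionary stands in for the data files, and records are JSON lines.

```python
import json
import os
import tempfile


class TinyWalStore:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}  # stands in for the data files
        self._replay()  # crash recovery: rebuild state from the log

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def set(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(record + "\n")
            wal.flush()
            os.fsync(wal.fileno())  # durable on disk before we acknowledge
        self.data[key] = value  # applying the change comes afterwards
        return "committed"


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyWalStore(wal_path)
store.set("account:1", 500)

# Simulate a crash: discard the in-memory state and start a fresh instance.
recovered = TinyWalStore(wal_path)
```

`recovered.data` holds the committed balance because the log entry was flushed before `set` returned. A real engine additionally handles torn or partial records and truncates the log at checkpoints, both of which this sketch omits.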

Why log-then-apply is faster, not just safer

Writing a compact, sequential log entry is much cheaper than immediately updating (and durably flushing) the actual data page, which might be scattered randomly across disk and require more complex I/O. Sequential log writes are also friendlier to disk hardware (especially spinning disks, though the benefit is smaller — if still real — on SSDs) than random-access writes to data files. So WAL isn't purely a safety mechanism — it also lets the engine defer and batch the more expensive data-file writes while still providing an immediate durability guarantee via the cheap sequential log write.

WAL's second job: replication

Because the WAL is a complete, ordered record of every change made to the database, shipping that exact log stream to another server and replaying it there reconstructs an identical copy of the data. This is precisely how physical (log-shipping) replication works: PostgreSQL's streaming replication ships WAL this way, while MySQL's binary log replication is a related but distinct mechanism serving the same purpose. The replica doesn't need to re-execute the original SQL statements; it just applies the same low-level logged changes the primary already made, in the same order.

Primary: WAL stream ---> shipped continuously ---> Replica: replays WAL entries

Why this matters for backup/recovery too

WAL archiving (continuously saving WAL files to durable storage) is exactly what enables point-in-time recovery (see the backup/DR question) — replaying WAL entries up to any specific moment reconstructs the database's exact state at that moment, not just at the last full backup's timestamp.
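The "replay up to a specific moment" idea can be sketched as a small function. This is a conceptual model only: `WalRecord`, its integer `lsn` field, and `restore_to` are hypothetical names, and real recovery works on binary WAL segments with a recovery target rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int     # hypothetical, monotonically increasing log position
    key: str
    value: int


def restore_to(base_backup, wal_records, target_lsn):
    """Start from a base backup and replay archived records up to target_lsn."""
    state = dict(base_backup)   # copy, so the backup itself stays untouched
    for record in sorted(wal_records, key=lambda r: r.lsn):
        if record.lsn > target_lsn:
            break               # stop replaying at the requested point in time
        state[record.key] = record.value
    return state


base = {"account:1": 100}
archive = [
    WalRecord(1, "account:1", 500),
    WalRecord(2, "account:2", 50),
    WalRecord(3, "account:1", 0),   # e.g. an accidental bad update
]

# Recover the state just before the bad update at position 3.
before_mistake = restore_to(base, archive, target_lsn=2)
```

Replaying the whole archive with no cutoff is the same operation a replica performs continuously, which is why replication and point-in-time recovery share one mechanism.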

Understanding WAL connects several otherwise-separate-seeming topics — durability, crash recovery, replication, and point-in-time backup/restore — as different applications of the same underlying mechanism, which is exactly the kind of "sees how the pieces fit together" understanding a senior-level interview question is probing for.

Related Resources

PostgreSQL: Write-Ahead Logging (WAL)


This is a classic system-design-flavored SQL interview question. The strongest answers present an ordered progression of levers, each justified by what specific bottleneck it addresses, rather than jumping straight to "shard everything."

1. Query and index optimization (cheapest, do this first)

Before adding any infrastructure, confirm the existing queries are actually efficient — proper indexes (see the indexing topic), sargable predicates, no accidental N+1 patterns (see that question), no unnecessary joins or row-explosion bugs. A huge fraction of "we need to scale the database" problems are actually "we have an unindexed slow query" problems in disguise, and this step is far cheaper than any infrastructure change.
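One way to confirm a query is using an index is to compare its plan before and after creating one. The sketch below uses Python's built-in `sqlite3` module as a stand-in so it runs anywhere; the `orders` table and `idx_orders_customer` index are made-up examples, and on PostgreSQL or MySQL the equivalent check would be that engine's own `EXPLAIN` output, which looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)   # the 4th column holds the detail text


query = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = plan(query)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)    # expected to describe a search using the new index
```

Printing `plan_before` and `plan_after` side by side is usually enough to tell whether the "slow database" is really just a missing index.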

2. Caching

Application -> check cache (Redis/Memcached) -> cache hit? return immediately
                                                -> cache miss? query DB, populate cache, return

For read-heavy hot paths (a product page, a user's profile), a cache in front of the database can absorb the overwhelming majority of read traffic, often reducing database load by an order of magnitude for relatively little engineering cost. This requires a cache invalidation strategy, which is a genuinely hard problem: "there are only two hard things in computer science: cache invalidation and naming things."
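The hit/miss flow above is often called cache-aside. A minimal sketch follows, assuming an in-process dict in place of Redis/Memcached and a hypothetical `fake_db_lookup` function in place of a real query; the TTL plus explicit `invalidate` shown here is just one simple invalidation strategy among several.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; a dict stands in for Redis/Memcached."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                  # cache hit: no database work
        value = self._load(key)              # cache miss: query the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads from the database."""
        self._entries.pop(key, None)


db_calls = []


def fake_db_lookup(key):
    # Hypothetical stand-in for a real SELECT; records each database hit.
    db_calls.append(key)
    return {"id": key, "name": f"product-{key}"}


cache = CacheAside(fake_db_lookup)
cache.get(7)          # miss: loads from the "database"
cache.get(7)          # hit: served from the cache
cache.invalidate(7)   # simulate a write to product 7
cache.get(7)          # miss again: reloads fresh data
```

After these four calls, `db_calls` records only the misses, which is the database load the cache did not absorb.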

3. Connection pooling

As traffic and application instance count grow, ensure connections are pooled (application-level and/or via a proxy like PgBouncer/RDS Proxy — see that question) so growing the application fleet doesn't independently exhaust the database's connection limit.
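The idea behind application-level pooling can be shown in a few lines: a fixed set of connections is opened once and borrowed per request, so request volume no longer determines connection count. This is a toy sketch, with `ConnectionPool` as a hypothetical class and `sqlite3` standing in for a real server connection; production code would normally use the pool built into the driver or a proxy such as PgBouncer.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers borrow a connection and always return it."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)   # hand it back even if the query failed


opened = []


def connect():
    # Stand-in for an expensive connection to a real database server.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    opened.append(conn)
    return conn


pool = ConnectionPool(connect, size=2)

results = []
for _ in range(10):                  # ten "requests" share two connections
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
```

The cap is the useful property: when every connection is busy, new requests wait for one to be returned instead of opening more against the database.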

4. Read replicas

Once caching and indexing are optimized and read load still exceeds a single primary's comfortable capacity, add read replicas (see that question) to horizontally scale read throughput, routing read-only queries (and especially reporting/analytics queries) away from the primary.
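Routing reads away from the primary usually comes down to a small decision in the data-access layer. The sketch below assumes hypothetical node names and a deliberately naive "starts with SELECT" check (it would misclassify statements such as `SELECT ... FOR UPDATE`); the `needs_fresh_read` flag is an assumed escape hatch for reads that cannot tolerate replication lag.

```python
import itertools


class Router:
    """Send writes to the primary and spread reads over replicas round-robin."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, everything falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql, needs_fresh_read=False):
        # Naive classification for illustration only.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not needs_fresh_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
targets = [
    router.target_for("SELECT * FROM products WHERE id = 1"),
    router.target_for("SELECT * FROM products WHERE id = 2"),
    router.target_for("UPDATE products SET price = 9 WHERE id = 1"),
    # Read-your-own-write case: go to the primary to avoid stale data.
    router.target_for("SELECT price FROM products WHERE id = 1", needs_fresh_read=True),
]
```

Many drivers, ORMs, and proxies offer this kind of routing out of the box, so a hand-written router is mainly useful for understanding the trade-off.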

5. Vertical scaling of the primary

Scale the primary's hardware up as far as is practical/economical — modern high-end database hardware can handle a genuinely enormous amount of load, and this remains simpler than horizontal write-scaling for as long as it suffices.

6. Sharding (last resort, biggest commitment)

Only once write throughput or total storage genuinely exceeds what a single (even heavily scaled-up) primary can handle — and after confirming caching/indexing/read-replicas haven't been sufficient — commit to sharding (see that question), accepting its costs: choosing a shard key, losing cross-shard joins/transactions in the general case, and meaningfully higher operational complexity.
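To make the shard-key decision concrete, here is a minimal sketch of hash-based routing, assuming a hypothetical `shard_for` helper and four shards. It uses a stable hash from `hashlib` rather than Python's built-in `hash()`, which is randomized per process for strings and so would route the same key differently after a restart.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 4   # assumed fleet size for this example

# Count how 10,000 sequential user ids spread across the shards.
counts = [0] * SHARD_COUNT
for user_id in range(10_000):
    counts[shard_for(user_id, SHARD_COUNT)] += 1
```

With a well-distributed key, `counts` comes out roughly even. Note that plain modulo routing remaps most keys when `SHARD_COUNT` changes, which is part of why resharding is expensive and why sharding is a commitment rather than a quick fix.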

Cross-cutting considerations throughout

Denormalization/materialized views for specific expensive, frequently-run aggregations (see those questions).
CQRS-style separation — routing writes through a normalized transactional model while serving reads from a separately optimized, possibly denormalized read model — for systems with very different read vs. write shapes and scale requirements.
Monitoring and load testing at each stage, to confirm the specific bottleneck being addressed is actually the one causing pain, rather than guessing.

The strongest signal isn't naming every possible technique — it's demonstrating that you'd apply them in a justified, incremental order, driven by actual measured bottlenecks (query plans, cache hit rates, replication lag, connection counts), rather than reflexively reaching for the most architecturally impressive-sounding solution (sharding, microservices-per-table) before confirming simpler, cheaper levers are exhausted.

Related Resources

AWS: Scaling Databases