How would you scale a relational database to handle millions of users?

8 minadvancedscalingsystem-designcachingsharding

Quick Answer

Work through scaling levers roughly in order of cost/complexity: optimize queries and indexes first, add caching (application-level and/or a dedicated cache like Redis) for hot read paths, add read replicas to scale read throughput, scale the primary vertically as far as practical, and only then consider sharding for write-throughput/storage limits that nothing else can address. Layer in connection pooling throughout, and treat each step as something to justify with actual measured bottlenecks, not a checklist to apply preemptively.

Detailed Answer

This is a classic system-design-flavored SQL interview question. The strongest answers present an ordered progression of levers, each justified by what specific bottleneck it addresses, rather than jumping straight to "shard everything."

1. Query and index optimization (cheapest, do this first)

Before adding any infrastructure, confirm the existing queries are actually efficient — proper indexes (see the indexing topic), sargable predicates, no accidental N+1 patterns (see that question), no unnecessary joins or row-explosion bugs. A huge fraction of "we need to scale the database" problems are actually "we have an unindexed slow query" problems in disguise, and this step is far cheaper than any infrastructure change.

2. Caching

Application -> check cache (Redis/Memcached) -> cache hit? return immediately
                                                -> cache miss? query DB, populate cache, return

For read-heavy hot paths (a product page, a user's profile), a cache in front of the database can absorb the overwhelming majority of read traffic, often reducing database load by an order of magnitude for relatively little engineering cost. Requires a cache invalidation strategy (see that this is a genuinely hard problem — "there are only two hard things in computer science: cache invalidation and naming things").

3. Connection pooling

As traffic and application instance count grow, ensure connections are pooled (application-level and/or via a proxy like PgBouncer/RDS Proxy — see that question) so growing the application fleet doesn't independently exhaust the database's connection limit.

4. Read replicas

Once caching and indexing are optimized and read load still exceeds a single primary's comfortable capacity, add read replicas (see that question) to horizontally scale read throughput, routing read-only queries (and especially reporting/analytics queries) away from the primary.

5. Vertical scaling of the primary

Scale the primary's hardware up as far as is practical/economical — modern high-end database hardware can handle a genuinely enormous amount of load, and this remains simpler than horizontal write-scaling for as long as it suffices.

6. Sharding (last resort, biggest commitment)

Only once write throughput or total storage genuinely exceeds what a single (even heavily scaled-up) primary can handle — and after confirming caching/indexing/read-replicas haven't been sufficient — commit to sharding (see that question), accepting its costs: choosing a shard key, losing cross-shard joins/transactions in the general case, and meaningfully higher operational complexity.

Cross-cutting considerations throughout

  • Denormalization/materialized views for specific expensive, frequently-run aggregations (see those questions).
  • CQRS-style separation — routing writes through a normalized transactional model while serving reads from a separately optimized, possibly denormalized read model — for systems with very different read vs. write shapes and scale requirements.
  • Monitoring and load testing at each stage, to confirm the specific bottleneck being addressed is actually the one causing pain, rather than guessing.

The strongest signal isn't naming every possible technique — it's demonstrating that you'd apply them in a justified, incremental order, driven by actual measured bottlenecks (query plans, cache hit rates, replication lag, connection counts), rather than reflexively reaching for the most architecturally impressive-sounding solution (sharding, microservices-per-table) before confirming simpler, cheaper levers are exhausted.

Related Resources