What is connection pooling, and why does it matter at scale?

Detailed Answer

Without connection pooling

Request 1 arrives -> open new DB connection -> run query -> close connection
Request 2 arrives -> open new DB connection -> run query -> close connection
...

Every request pays the full cost of establishing a connection: TCP handshake, TLS negotiation (if encrypted), authentication, and session/context setup on the database side — often tens of milliseconds, which can dwarf the actual query's execution time for simple queries. Under load, opening a fresh connection per request also risks exhausting the database's maximum connection limit (PostgreSQL's default max_connections is often 100; each connection also consumes real memory on the server side), causing new connection attempts to fail outright.

With connection pooling

Application startup: pool pre-establishes N connections (e.g., 20)

Request 1 arrives -> borrow a connection from the pool -> run query -> return connection to pool
Request 2 arrives -> borrow a (possibly different, already-open) connection -> run query -> return it

Connections are reused across many requests instead of being opened and torn down each time — amortizing the expensive setup cost across potentially thousands of queries per connection's lifetime, and keeping the total number of actual database connections bounded and predictable regardless of application request volume.

Application-level vs. proxy-level pooling

Application-level pooling — a pool library within the application process/runtime (e.g., HikariCP for Java, most ORM connection pools) manages a fixed set of connections for that one application instance.
Proxy-level pooling (PgBouncer, ProxySQL) — a separate service sits between many application instances and the database, multiplexing a large number of application-side "logical" connections down onto a much smaller number of actual database connections — especially valuable when you have many application instances/processes (e.g., a large fleet of serverless functions or many microservice replicas) that would otherwise each maintain their own pool, collectively still exceeding the database's connection limit.

Sizing a pool

Counterintuitively, a pool that's too large can hurt performance — each active connection consumes server-side resources (memory, and potentially lock/contention overhead), and beyond a certain point, more concurrent connections just means more contention for the same underlying CPU/IO resources rather than more real parallelism. A common rule of thumb (from PostgreSQL's own tooling guidance) is that optimal pool size is often much smaller than intuition suggests — frequently in the range of (2 x CPU cores) + effective spindle count, though the right number always depends on the specific workload and should be tuned/measured rather than assumed.

Why this matters especially with serverless/many-instance architectures

Serverless functions or large horizontally-scaled application fleets can easily spin up far more concurrent application instances than a database's connection limit can handle if each instance maintains even a modest pool — this is one of the most common real-world production incidents ("database ran out of connections during a traffic spike"), and is precisely why a proxy-level pooler (PgBouncer, RDS Proxy) is a standard, near-mandatory component in these architectures.