The vulnerability
# VULNERABLE
username = request.form["username"] # attacker-controlled input
query = f"SELECT * FROM users WHERE username = '{username}'"
If an attacker submits admin' -- as the username:
SELECT * FROM users WHERE username = 'admin' --'
The -- starts a SQL comment, so everything after it (including whatever password check might have followed) is ignored — this can bypass authentication entirely, returning the admin row regardless of password. A more destructive attacker payload like x'; DROP TABLE users; -- could delete data outright, depending on whether the driver/database allows statement stacking.
Why it works
The application intended username to be interpreted purely as data — a literal string value to compare against. But because it was concatenated directly into the SQL text, the database has no way to distinguish "data the developer meant as a literal value" from "SQL syntax" — the attacker's input is parsed as if the developer had typed it themselves.
The complete fix: parameterized queries / prepared statements
# SAFE
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))
As covered in the prepared statements question, this sends the query structure and the parameter value as two separate, distinct channels — the database never re-parses the parameter value as SQL syntax, so there is no possible way for it to alter the query's structure, no matter what characters it contains. This isn't "better escaping" — it's a structurally different mechanism that eliminates the vulnerability class entirely for parameterized values.
Defense in depth (beyond the primary fix)
- Least privilege database accounts — the application's database user should have only the permissions it actually needs (e.g., no
DROP TABLEpermission for a web application's runtime account), limiting the blast radius even if some other injection vector is somehow missed. - Input validation — rejecting obviously malformed input early is good practice, but is not a substitute for parameterization; validation can be bypassed or incomplete, while parameterization is complete by construction.
- ORMs — most modern ORMs parameterize queries by default, which is a strong reason to prefer them (or a query builder) over hand-written string-concatenated SQL, though it's still possible to introduce injection through an ORM's "raw query" escape hatches if used carelessly.
- WAFs (Web Application Firewalls) — can catch some known injection patterns as an additional layer, but are pattern-matching heuristics, not a reliable primary defense.
What parameterization doesn't cover
Dynamic SQL structure (table names, column names, ORDER BY direction) chosen from user input can't be parameterized the same way — these must be validated against an explicit allow-list of known-safe values rather than ever directly interpolated, even though they "look like" simple identifiers rather than dangerous data.
A strong answer explains why parameterization works (separating code from data at the protocol level), not just "use prepared statements" as a memorized rule — and mentions least privilege as a complementary, not alternative, defense.
Related Resources
The problem with over-privileged accounts
-- Anti-pattern: application connects using the database's superuser/owner account
-- for every single query, including simple SELECTs
If this credential leaks (committed to a public repo, exposed in a misconfigured environment variable dump, extracted via a different vulnerability like SQL injection), the attacker gets everything: read all data, modify or delete any table, alter permissions, drop the whole database. A single over-privileged credential turns any one vulnerability into total compromise.
Designing role-based access
-- Read-only role, for reporting/analytics tools
CREATE ROLE reporting_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_role;
-- Application role: read/write on specific tables it actually needs, nothing else
CREATE ROLE app_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON orders, order_items, customers TO app_role;
-- Deliberately NOT granted: DROP, ALTER, access to an unrelated 'admin_audit_log' table
-- Migration/admin role: schema changes, used only by deployment tooling, not the running app
CREATE ROLE migration_role;
GRANT ALL PRIVILEGES ON SCHEMA public TO migration_role;
Each role reflects a distinct responsibility, and the actual credential used by the running application in production should be the narrowly-scoped app_role, not an admin/superuser account — even though that's more convenient during initial development.
Practical guidelines
- No shared superuser credentials for routine application traffic. Reserve superuser/owner accounts for genuinely administrative tasks (schema migrations, one-off maintenance), performed by a human or a separate, tightly-controlled deployment pipeline — never the day-to-day running application.
- Separate read-only from read-write access wherever the use case allows it — a reporting dashboard or BI tool almost never needs write access, so it shouldn't have it, regardless of how trusted the tool is assumed to be.
- Table/column-level grants, not blanket schema access, when the application genuinely only touches a subset of the schema — this limits what a SQL injection vulnerability (or a bug that constructs an unintended query) can actually reach, even if it manages to execute arbitrary SQL as that role.
- Separate credentials per environment and per service — a microservice that only needs the
orderstable shouldn't share a credential with one that managescustomers, so a compromise of one service's credential doesn't expose unrelated data. - Regularly audit granted privileges against what's actually used — permissions tend to accumulate over time (a role granted broad access "temporarily" to unblock a one-off task, never revoked) and rarely get pruned without a deliberate review process.
Why this matters beyond "best practice" box-checking
Least privilege is specifically about limiting the blast radius of an inevitable future incident — not preventing every possible attack (defense in depth assumes some layer will eventually fail), but ensuring that when something does go wrong (a leaked credential, an injection bug, a compromised service), the damage is contained to what that specific credential could actually do, rather than exposing the entire database.
Related Resources
Encryption in transit
Without TLS, database traffic — including the authentication handshake and every query/result — travels as plaintext over the network, readable by anyone positioned to intercept it (a compromised network device, a malicious actor on a shared network, a misconfigured cloud VPC).
# PostgreSQL: require SSL/TLS for connections
sslmode=require # (or better: verify-full, which also validates the server's certificate)
Most managed cloud database services enforce or strongly encourage TLS by default; self-managed deployments must explicitly configure server-side TLS certificates and require clients to use them (sslmode=require/verify-full in PostgreSQL, useSSL=true for many JDBC-based connections). verify-full (or equivalent) additionally validates that the server's certificate matches the expected hostname, preventing a man-in-the-middle from presenting a different valid certificate.
Encryption at rest — the storage layer
Transparent Data Encryption (TDE) or full-disk/volume encryption encrypts the physical files/blocks the database is stored on, so that if the underlying storage media (a disk, a backup file, a cloud storage snapshot) is stolen or accessed outside the running database process, the data is unreadable without the encryption key.
-- Most cloud-managed databases: TDE is a configuration toggle, not application code
-- e.g., AWS RDS: "Enable encryption" at instance creation, backed by a KMS key
Important limitation: TDE protects data at the storage layer — it does nothing once a query is actually executed through an authenticated connection to a running database instance; at that point, the database decrypts data transparently and returns plaintext results to any authorized query, exactly as if encryption weren't enabled at all. TDE's threat model is specifically "someone got physical/file-level access to the storage without going through the database's authentication," not "protect data from an authenticated but malicious/compromised application."
Column-level / application-level encryption — defense in depth for the most sensitive data
For specific highly sensitive fields (SSNs, payment card data, health records), encrypting the value itself (before it's stored, using an application-managed key, or via database features like PostgreSQL's pgcrypto) means the data stays encrypted even to someone with a valid, authenticated database connection who queries the raw column directly — a materially stronger guarantee than TDE alone provides, at the cost of losing the ability to index, sort, or query that column's plaintext value directly in SQL (you generally need to decrypt in the application layer, or query by a separate deterministic hash/token if you need equality lookups).
-- Example using PostgreSQL's pgcrypto extension
INSERT INTO payment_methods (customer_id, card_number_encrypted)
VALUES (42, pgp_sym_encrypt('4111111111111111', 'encryption-key'));
SELECT pgp_sym_decrypt(card_number_encrypted, 'encryption-key') FROM payment_methods WHERE customer_id = 42;
(In production, the encryption key itself should come from a dedicated key-management service — AWS KMS, HashiCorp Vault — never hardcoded or stored alongside the data it protects.)
Layer both: TLS in transit is close to non-negotiable for any production system handling real user data, TDE-equivalent at-rest encryption is a reasonable default (and often required for compliance regimes like PCI-DSS, HIPAA), and column-level encryption should be reserved specifically for the subset of fields whose exposure would be most damaging — applying it universally adds real query/indexing complexity for data that often doesn't need that extra layer.
Related Resources
Why auditing matters beyond "logging in general"
Application logs typically capture business events ("order placed," "user logged in"); database auditing specifically answers "who touched this data, and what did they do to it" — critical for investigating a suspected breach, satisfying regulatory compliance (SOC 2, HIPAA, PCI-DSS, GDPR all have audit-trail requirements in some form), and detecting insider misuse (an employee with legitimate access browsing records they have no business reason to view).
What's commonly worth auditing
- Authentication events — successful and (especially) failed login attempts, which can reveal brute-force attempts or credential-stuffing attacks in progress.
- DDL/schema changes —
CREATE,ALTER,DROPstatements, since these can indicate either legitimate migrations or an attacker attempting to cover tracks/establish persistence (e.g., creating a backdoor account or table). - Privilege/permission changes —
GRANT/REVOKEstatements, since privilege escalation is a common step in a real attack chain. - Access to specifically sensitive tables/columns — for regulated data (health records, financial data, PII), logging every
SELECT/UPDATEagainst those specific tables, even by legitimately authorized users, so that unusual access patterns (a support engineer querying far more customer records than their role would normally require) can be detected. - Administrative/superuser actions — anything performed with elevated privileges deserves closer logging scrutiny than routine application-role activity, since it represents higher potential impact if misused.
Implementation approaches
-- PostgreSQL: pgAudit extension provides fine-grained, configurable audit logging
-- (session-level and object-level auditing of DDL, role changes, specific tables, etc.)
Most major engines offer a dedicated auditing feature/extension (SQL Server Audit, Oracle's Unified Auditing, MySQL Enterprise Audit, PostgreSQL's pgAudit) rather than expecting you to hand-roll auditing via ad-hoc triggers — these are generally more complete, more performant, and harder to accidentally bypass than a custom trigger-based approach (see the triggers question for why trigger-based logic can itself be fragile/incomplete).
Protecting the audit trail itself
An audit log that the very users being audited can modify or delete isn't a trustworthy record — best practice is to ship audit logs to a separate, append-only destination (a dedicated logging service, write-once storage, or a system the application's own database credentials have no delete/modify access to) so that even a fully compromised application account can't retroactively erase evidence of what it did.
The cost side of auditing
Fine-grained auditing (logging every single row read, for instance) has real performance and storage costs, and can itself become a source of sensitive-data sprawl (the audit log might contain the very sensitive values it's auditing access to, requiring its own access controls and retention policy). Scope auditing deliberately — broad enough to satisfy real compliance/security needs, not maximal by default — and treat the audit log's own storage and access control with the same rigor as the primary data it's protecting.
Related Resources
What not to do
# NEVER: hardcoded credential committed to source control
DB_PASSWORD = "SuperSecret123!"
connection = connect(host="prod-db.example.com", password=DB_PASSWORD)
Once a credential is committed to version control, it's effectively permanently compromised — even if later removed, it typically remains in the repository's git history indefinitely unless the history itself is rewritten (a disruptive, often-incomplete remediation), and may have already been cloned, cached, or scraped by automated credential-scanning bots that actively monitor public (and sometimes private) repositories.
Better: environment variables
import os
DB_PASSWORD = os.environ["DB_PASSWORD"] # injected by the deployment platform, not in source
An improvement — the credential isn't in source code — but environment variables can still leak via process listings, crash dumps/error logs that accidentally include the environment, or misconfigured logging that captures request/environment context.
Best: a dedicated secrets manager
import boto3
client = boto3.client("secretsmanager")
secret = client.get_secret_value(SecretId="prod/db/app-credentials")
Services like AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, or Google Secret Manager provide: centralized, access-controlled storage (who/what can even retrieve a given secret is itself audited and restricted); automatic rotation (the secret's actual value can be changed on a schedule without requiring an application code deployment, as long as the application always fetches the current value rather than caching it indefinitely); and an audit trail of every access to the secret.
Additional practices
- Distinct credentials per environment — production, staging, and development should never share a database credential; a leak of the (lower-stakes) development credential shouldn't provide any path to production data.
- Distinct credentials per service/application, following least privilege (see that question) — so a leak affecting one service's credential doesn't expose everything every other service can reach.
- Automatic rotation wherever the tooling supports it, reducing the window of usefulness for any credential that does leak undetected.
- Never use personal/human developer credentials for application runtime authentication — production systems should authenticate as a dedicated service identity, both for auditability (distinguishing "the application did this" from "a specific human did this directly") and so that a developer leaving the company, or having their personal account compromised, doesn't affect production access.
- Where supported, prefer IAM-based/managed identity authentication over static passwords entirely — many cloud-managed databases support authenticating via the cloud platform's identity system (e.g., AWS RDS IAM authentication) rather than a long-lived static password at all, eliminating an entire class of "leaked static secret" risk.
Beyond naming a secrets manager, a strong answer recognizes the core underlying principle — secrets should never live in a place that's harder to control access to than the data they protect (source control history being the most common practical failure) — and connects credential hygiene to least privilege and rotation as complementary practices, not a single silver-bullet tool.