What is SQL injection, and how do you prevent it?

SQL injection occurs when untrusted input is concatenated directly into a SQL string, allowing an attacker to alter the query's structure — e.g., adding `OR '1'='1'` to bypass a login check, or a `; DROP TABLE ...` to destroy data. It's prevented completely by using parameterized queries/prepared statements (never concatenating user input into SQL text), applying least-privilege database accounts, and defense-in-depth measures like input validation and ORMs that parameterize by default.

What is the principle of least privilege, and how does it apply to database user/role design?

Least privilege means every user, application, or service should have only the minimum permissions necessary to do its job — nothing more. In database design, this means creating distinct roles per responsibility (read-only reporting, application read/write, schema migration/admin), granting each only the specific tables/operations it needs, and avoiding a single shared superuser credential used everywhere. It limits the damage a compromised credential, a bug, or a mistake can cause.

How do you encrypt data at rest and in transit in a database?

**In transit**: require TLS/SSL for all client-database connections, so network traffic (including credentials and query results) can't be intercepted/read by anyone with network access between the client and server. **At rest**: use transparent full-disk/tablespace encryption (encrypting the physical storage the database sits on) for baseline protection, and additionally encrypt specific highly sensitive columns (SSNs, payment data) at the application or column level for defense-in-depth, since at-rest encryption alone doesn't protect data once a legitimate, authenticated connection has already accessed it.

What is database auditing, and what should you log?

Database auditing is the practice of recording who did what, when, to which data — a durable trail used for security investigation, compliance, and detecting misuse. At minimum, log authentication attempts (success/failure), schema changes (DDL), privilege changes, and access to particularly sensitive tables/columns; for regulated data, often every read/write to specific tables. Audit logs must themselves be protected from tampering (ideally written to storage the application/admin accounts being audited can't modify or delete).

How do you securely manage database credentials and secrets in an application?

Never hardcode credentials in source code or commit them to version control. Use a dedicated secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) or, at minimum, environment variables injected at deploy time — never checked into the repository. Rotate credentials regularly and automatically where possible, use distinct credentials per environment/service (so a leak in one doesn't compromise others), and ensure the application's runtime identity, not a human developer's personal credentials, is what actually authenticates in production.

What are best practices for schema migrations in production, especially for zero-downtime deployments?

Make schema changes additive and backward-compatible first (add new nullable columns/tables rather than renaming or dropping in place), deploy application code that can work with both the old and new schema during the transition, then clean up the old structure only after the new code path is fully rolled out and verified. For large tables, avoid long-locking operations (some `ALTER TABLE` variants lock the whole table) by using online schema-change tooling or engine features specifically designed to avoid blocking production traffic.

What's the difference between authentication and authorization at the database level?

Authentication answers "who are you" — verifying an identity via credentials (password, certificate, IAM token) before allowing any connection at all. Authorization answers "what are you allowed to do" — once connected as a verified identity, which specific operations (SELECT, INSERT, DROP) on which specific objects (tables, columns, schemas) that identity's granted privileges permit. A database always performs authentication first (to establish identity) and then authorization on every subsequent operation (to check permissions for that identity).

Security and Database Best Practices

Protecting data at rest and in transit, least privilege, auditing, and safe schema change practices.

Difficulty

Open as page

The vulnerability

# VULNERABLE
username = request.form["username"]  # attacker-controlled input
query = f"SELECT * FROM users WHERE username = '{username}'"

If an attacker submits admin' -- as the username:

SELECT * FROM users WHERE username = 'admin' --'

The -- starts a SQL comment, so everything after it (including whatever password check might have followed) is ignored — this can bypass authentication entirely, returning the admin row regardless of password. A more destructive attacker payload like x'; DROP TABLE users; -- could delete data outright, depending on whether the driver/database allows statement stacking.

Why it works

The application intended username to be interpreted purely as data — a literal string value to compare against. But because it was concatenated directly into the SQL text, the database has no way to distinguish "data the developer meant as a literal value" from "SQL syntax" — the attacker's input is parsed as if the developer had typed it themselves.

The complete fix: parameterized queries / prepared statements

# SAFE
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))

As covered in the prepared statements question, this sends the query structure and the parameter value as two separate, distinct channels — the database never re-parses the parameter value as SQL syntax, so there is no possible way for it to alter the query's structure, no matter what characters it contains. This isn't "better escaping" — it's a structurally different mechanism that eliminates the vulnerability class entirely for parameterized values.

Defense in depth (beyond the primary fix)

Least privilege database accounts — the application's database user should have only the permissions it actually needs (e.g., no DROP TABLE permission for a web application's runtime account), limiting the blast radius even if some other injection vector is somehow missed.
Input validation — rejecting obviously malformed input early is good practice, but is not a substitute for parameterization; validation can be bypassed or incomplete, while parameterization is complete by construction.
ORMs — most modern ORMs parameterize queries by default, which is a strong reason to prefer them (or a query builder) over hand-written string-concatenated SQL, though it's still possible to introduce injection through an ORM's "raw query" escape hatches if used carelessly.
WAFs (Web Application Firewalls) — can catch some known injection patterns as an additional layer, but are pattern-matching heuristics, not a reliable primary defense.

What parameterization doesn't cover

Dynamic SQL structure (table names, column names, ORDER BY direction) chosen from user input can't be parameterized the same way — these must be validated against an explicit allow-list of known-safe values rather than ever directly interpolated, even though they "look like" simple identifiers rather than dangerous data.

A strong answer explains why parameterization works (separating code from data at the protocol level), not just "use prepared statements" as a memorized rule — and mentions least privilege as a complementary, not alternative, defense.

Related Resources

OWASP: SQL Injection

OWASP: SQL Injection Prevention Cheat Sheet

Open as page

The problem with over-privileged accounts

-- Anti-pattern: application connects using the database's superuser/owner account
-- for every single query, including simple SELECTs

If this credential leaks (committed to a public repo, exposed in a misconfigured environment variable dump, extracted via a different vulnerability like SQL injection), the attacker gets everything: read all data, modify or delete any table, alter permissions, drop the whole database. A single over-privileged credential turns any one vulnerability into total compromise.

Designing role-based access

-- Read-only role, for reporting/analytics tools
CREATE ROLE reporting_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_role;

-- Application role: read/write on specific tables it actually needs, nothing else
CREATE ROLE app_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON orders, order_items, customers TO app_role;
-- Deliberately NOT granted: DROP, ALTER, access to an unrelated 'admin_audit_log' table

-- Migration/admin role: schema changes, used only by deployment tooling, not the running app
CREATE ROLE migration_role;
GRANT ALL PRIVILEGES ON SCHEMA public TO migration_role;

Each role reflects a distinct responsibility, and the actual credential used by the running application in production should be the narrowly-scoped app_role, not an admin/superuser account — even though that's more convenient during initial development.

Practical guidelines

No shared superuser credentials for routine application traffic. Reserve superuser/owner accounts for genuinely administrative tasks (schema migrations, one-off maintenance), performed by a human or a separate, tightly-controlled deployment pipeline — never the day-to-day running application.
Separate read-only from read-write access wherever the use case allows it — a reporting dashboard or BI tool almost never needs write access, so it shouldn't have it, regardless of how trusted the tool is assumed to be.
Table/column-level grants, not blanket schema access, when the application genuinely only touches a subset of the schema — this limits what a SQL injection vulnerability (or a bug that constructs an unintended query) can actually reach, even if it manages to execute arbitrary SQL as that role.
Separate credentials per environment and per service — a microservice that only needs the orders table shouldn't share a credential with one that manages customers, so a compromise of one service's credential doesn't expose unrelated data.
Regularly audit granted privileges against what's actually used — permissions tend to accumulate over time (a role granted broad access "temporarily" to unblock a one-off task, never revoked) and rarely get pruned without a deliberate review process.

Why this matters beyond "best practice" box-checking

Least privilege is specifically about limiting the blast radius of an inevitable future incident — not preventing every possible attack (defense in depth assumes some layer will eventually fail), but ensuring that when something does go wrong (a leaked credential, an injection bug, a compromised service), the damage is contained to what that specific credential could actually do, rather than exposing the entire database.

Related Resources

PostgreSQL: Database Roles

Open as page

Encryption in transit

Without TLS, database traffic — including the authentication handshake and every query/result — travels as plaintext over the network, readable by anyone positioned to intercept it (a compromised network device, a malicious actor on a shared network, a misconfigured cloud VPC).

# PostgreSQL: require SSL/TLS for connections
sslmode=require   # (or better: verify-full, which also validates the server's certificate)

Most managed cloud database services enforce or strongly encourage TLS by default; self-managed deployments must explicitly configure server-side TLS certificates and require clients to use them (sslmode=require/verify-full in PostgreSQL, useSSL=true for many JDBC-based connections). verify-full (or equivalent) additionally validates that the server's certificate matches the expected hostname, preventing a man-in-the-middle from presenting a different valid certificate.

Encryption at rest — the storage layer

Transparent Data Encryption (TDE) or full-disk/volume encryption encrypts the physical files/blocks the database is stored on, so that if the underlying storage media (a disk, a backup file, a cloud storage snapshot) is stolen or accessed outside the running database process, the data is unreadable without the encryption key.

-- Most cloud-managed databases: TDE is a configuration toggle, not application code
-- e.g., AWS RDS: "Enable encryption" at instance creation, backed by a KMS key

Important limitation: TDE protects data at the storage layer — it does nothing once a query is actually executed through an authenticated connection to a running database instance; at that point, the database decrypts data transparently and returns plaintext results to any authorized query, exactly as if encryption weren't enabled at all. TDE's threat model is specifically "someone got physical/file-level access to the storage without going through the database's authentication," not "protect data from an authenticated but malicious/compromised application."

Column-level / application-level encryption — defense in depth for the most sensitive data

For specific highly sensitive fields (SSNs, payment card data, health records), encrypting the value itself (before it's stored, using an application-managed key, or via database features like PostgreSQL's pgcrypto) means the data stays encrypted even to someone with a valid, authenticated database connection who queries the raw column directly — a materially stronger guarantee than TDE alone provides, at the cost of losing the ability to index, sort, or query that column's plaintext value directly in SQL (you generally need to decrypt in the application layer, or query by a separate deterministic hash/token if you need equality lookups).

-- Example using PostgreSQL's pgcrypto extension
INSERT INTO payment_methods (customer_id, card_number_encrypted)
VALUES (42, pgp_sym_encrypt('4111111111111111', 'encryption-key'));

SELECT pgp_sym_decrypt(card_number_encrypted, 'encryption-key') FROM payment_methods WHERE customer_id = 42;

(In production, the encryption key itself should come from a dedicated key-management service — AWS KMS, HashiCorp Vault — never hardcoded or stored alongside the data it protects.)

Layer both: TLS in transit is close to non-negotiable for any production system handling real user data, TDE-equivalent at-rest encryption is a reasonable default (and often required for compliance regimes like PCI-DSS, HIPAA), and column-level encryption should be reserved specifically for the subset of fields whose exposure would be most damaging — applying it universally adds real query/indexing complexity for data that often doesn't need that extra layer.

Related Resources

PostgreSQL: SSL Support

Open as page

Why auditing matters beyond "logging in general"

Application logs typically capture business events ("order placed," "user logged in"); database auditing specifically answers "who touched this data, and what did they do to it" — critical for investigating a suspected breach, satisfying regulatory compliance (SOC 2, HIPAA, PCI-DSS, GDPR all have audit-trail requirements in some form), and detecting insider misuse (an employee with legitimate access browsing records they have no business reason to view).

What's commonly worth auditing

Authentication events — successful and (especially) failed login attempts, which can reveal brute-force attempts or credential-stuffing attacks in progress.
DDL/schema changes — CREATE, ALTER, DROP statements, since these can indicate either legitimate migrations or an attacker attempting to cover tracks/establish persistence (e.g., creating a backdoor account or table).
Privilege/permission changes — GRANT/REVOKE statements, since privilege escalation is a common step in a real attack chain.
Access to specifically sensitive tables/columns — for regulated data (health records, financial data, PII), logging every SELECT/UPDATE against those specific tables, even by legitimately authorized users, so that unusual access patterns (a support engineer querying far more customer records than their role would normally require) can be detected.
Administrative/superuser actions — anything performed with elevated privileges deserves closer logging scrutiny than routine application-role activity, since it represents higher potential impact if misused.

Implementation approaches

-- PostgreSQL: pgAudit extension provides fine-grained, configurable audit logging
-- (session-level and object-level auditing of DDL, role changes, specific tables, etc.)

Most major engines offer a dedicated auditing feature/extension (SQL Server Audit, Oracle's Unified Auditing, MySQL Enterprise Audit, PostgreSQL's pgAudit) rather than expecting you to hand-roll auditing via ad-hoc triggers — these are generally more complete, more performant, and harder to accidentally bypass than a custom trigger-based approach (see the triggers question for why trigger-based logic can itself be fragile/incomplete).

Protecting the audit trail itself

An audit log that the very users being audited can modify or delete isn't a trustworthy record — best practice is to ship audit logs to a separate, append-only destination (a dedicated logging service, write-once storage, or a system the application's own database credentials have no delete/modify access to) so that even a fully compromised application account can't retroactively erase evidence of what it did.

The cost side of auditing

Fine-grained auditing (logging every single row read, for instance) has real performance and storage costs, and can itself become a source of sensitive-data sprawl (the audit log might contain the very sensitive values it's auditing access to, requiring its own access controls and retention policy). Scope auditing deliberately — broad enough to satisfy real compliance/security needs, not maximal by default — and treat the audit log's own storage and access control with the same rigor as the primary data it's protecting.

Related Resources

PostgreSQL: pgAudit

Open as page

What not to do

# NEVER: hardcoded credential committed to source control
DB_PASSWORD = "SuperSecret123!"
connection = connect(host="prod-db.example.com", password=DB_PASSWORD)

Once a credential is committed to version control, it's effectively permanently compromised — even if later removed, it typically remains in the repository's git history indefinitely unless the history itself is rewritten (a disruptive, often-incomplete remediation), and may have already been cloned, cached, or scraped by automated credential-scanning bots that actively monitor public (and sometimes private) repositories.

Better: environment variables

import os
DB_PASSWORD = os.environ["DB_PASSWORD"]   # injected by the deployment platform, not in source

An improvement — the credential isn't in source code — but environment variables can still leak via process listings, crash dumps/error logs that accidentally include the environment, or misconfigured logging that captures request/environment context.

Best: a dedicated secrets manager

import boto3
client = boto3.client("secretsmanager")
secret = client.get_secret_value(SecretId="prod/db/app-credentials")

Services like AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, or Google Secret Manager provide: centralized, access-controlled storage (who/what can even retrieve a given secret is itself audited and restricted); automatic rotation (the secret's actual value can be changed on a schedule without requiring an application code deployment, as long as the application always fetches the current value rather than caching it indefinitely); and an audit trail of every access to the secret.

Additional practices

Distinct credentials per environment — production, staging, and development should never share a database credential; a leak of the (lower-stakes) development credential shouldn't provide any path to production data.
Distinct credentials per service/application, following least privilege (see that question) — so a leak affecting one service's credential doesn't expose everything every other service can reach.
Automatic rotation wherever the tooling supports it, reducing the window of usefulness for any credential that does leak undetected.
Never use personal/human developer credentials for application runtime authentication — production systems should authenticate as a dedicated service identity, both for auditability (distinguishing "the application did this" from "a specific human did this directly") and so that a developer leaving the company, or having their personal account compromised, doesn't affect production access.
Where supported, prefer IAM-based/managed identity authentication over static passwords entirely — many cloud-managed databases support authenticating via the cloud platform's identity system (e.g., AWS RDS IAM authentication) rather than a long-lived static password at all, eliminating an entire class of "leaked static secret" risk.

Beyond naming a secrets manager, a strong answer recognizes the core underlying principle — secrets should never live in a place that's harder to control access to than the data they protect (source control history being the most common practical failure) — and connects credential hygiene to least privilege and rotation as complementary practices, not a single silver-bullet tool.

Related Resources

OWASP: Secrets Management Cheat Sheet

Open as page

Why naive migrations break zero-downtime deployments

A migration that renames a column, changes its type incompatibly, or drops a column still in use will break any currently-running instance of the old application code the moment it runs — and during a rolling deployment, old and new application code are typically running simultaneously against the same database for some period. A migration must be safe for both code versions to interact with the schema at the same time.

The expand/contract pattern

The standard technique for safely evolving a schema without downtime:

Expand — add the new structure alongside the old, without removing anything yet:

ALTER TABLE users ADD COLUMN email_normalized VARCHAR(255);  -- new, nullable, additive

Migrate/backfill — populate the new structure from the old (often in batches, to avoid a single enormous long-running transaction/lock):

UPDATE users SET email_normalized = LOWER(TRIM(email)) WHERE email_normalized IS NULL;

Dual-write — deploy application code that writes to both the old and new columns, while still reading from the old one, ensuring nothing falls out of sync during the transition.
Cut over reads — deploy application code that reads from the new column instead, once you're confident it's fully populated and being kept in sync.
Contract — only after the new code path has been running successfully for a safe period (and old code is fully retired from the rolling deployment), remove the old column:

ALTER TABLE users DROP COLUMN email;

This pattern trades a single risky migration for several small, individually safe, individually reversible steps — each of which leaves the schema in a state both old and new application code can tolerate.

Watch for locking behavior on large tables

Not all ALTER TABLE operations are equally cheap. Adding a nullable column is typically fast/metadata-only in most modern engines. But adding a NOT NULL column with a default, adding certain indexes, or changing a column's type can require rewriting the entire table and taking a long-held lock that blocks reads/writes for the duration — potentially minutes or hours on a very large table, which is unacceptable for a live production system.

-- Can lock the whole table for a long time on large tables, in some engines/versions
ALTER TABLE orders ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'pending';

-- Safer sequence: add nullable first, backfill in batches, THEN add NOT NULL
ALTER TABLE orders ADD COLUMN status VARCHAR(20);
-- ... batch UPDATE to backfill ...
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;  -- now a much cheaper metadata-only check in many engines, once no NULLs remain

Check your specific engine/version's documented locking behavior for each DDL operation — this has genuinely improved over time (recent PostgreSQL versions optimize several previously-locking operations to be much cheaper), so version-specific verification matters more than assuming a rule of thumb still holds.

Tooling for large-scale online schema changes

For very large tables where even a brief lock is unacceptable, dedicated online schema-change tools (gh-ost, pt-online-schema-change for MySQL; similar patterns achievable in PostgreSQL via triggers-based shadow tables, or newer engine-native online DDL support) create a new table with the desired schema, copy data over in the background while capturing ongoing changes, then perform a quick atomic swap — avoiding a long table-level lock entirely.

Recognizing that a migration must be safe for both old and new application code simultaneously (not just "safe for the new code"), and being able to describe the expand/contract pattern concretely, distinguishes real production migration experience from someone who's only run migrations against a database with no concurrent traffic.

Related Resources

PostgreSQL: ALTER TABLE

Open as page

Authentication — establishing identity

Client connects: "I am app_user, here's my password/certificate/token"
Database: verifies the credential -- if valid, the connection is established
          as that specific, verified identity. If invalid, connection is rejected outright.

Common authentication mechanisms: password-based (the simplest, and weakest if not paired with strong password policies and TLS), certificate-based (mutual TLS, where the client presents a certificate the server trusts), and identity-federation/IAM-based (the database trusts tokens issued by an external identity provider — e.g., AWS RDS IAM authentication, or Kerberos/Active Directory integration in enterprise environments).

Authorization — establishing permissions

Once connected as a verified identity, every subsequent operation is checked against that identity's granted privileges:

-- Connected as app_user (already authenticated)
SELECT * FROM orders;          -- allowed, if app_user was GRANTed SELECT on orders
DROP TABLE orders;              -- rejected: "permission denied", if app_user lacks DROP privilege
SELECT * FROM admin_secrets;    -- rejected: "permission denied", if app_user has no grant on this table

Authorization is enforced via the granted privilege model (GRANT/REVOKE, roles — see the least-privilege question) and is checked per operation, not just once at connection time — a successfully authenticated connection doesn't mean unrestricted access; it means access exactly as broad as that identity's granted permissions.

Why the distinction matters practically

A connection can be successfully authenticated (the password/certificate was valid) but still have every actual operation denied by authorization (the identity has no useful privileges granted) — these are two entirely separate failure modes with different causes and fixes. "Authentication failed" means the credential itself was wrong/invalid; "permission denied" means the credential was valid, but that identity simply isn't allowed to do the specific thing it tried to do. Confusing the two when debugging a production access issue wastes time looking in the wrong place (resetting a password won't fix a missing GRANT, and vice versa).

How this maps to the broader security picture

Authentication is the front door — verifying you're really who you claim to be before letting you in at all.
Authorization is what happens at every room inside the house once you're in — just because you got through the front door doesn't mean every room is unlocked to you.

Both are necessary and distinct layers: strong authentication with overly broad authorization (everyone who can log in has full access to everything) still leaves you exposed to any authenticated-but-compromised or malicious identity; conversely, tight authorization is useless if authentication itself can be trivially bypassed or credentials are easily stolen/guessed. A secure system needs both layers implemented deliberately and independently.

Related Resources

PostgreSQL: Client Authentication