What is a stored procedure, and what are its pros/cons vs. application-layer logic?

A stored procedure is named logic stored and executed inside the database itself, callable via `CALL`/`EXEC`. Many engines cache its execution plan, which is why it is often described as precompiled. Pros: reduced network round-trips, logic co-located with the data it operates on, consistent enforcement across every caller. Cons: harder to version/test/deploy alongside application code, ties business logic to a specific database engine, and can obscure logic from developers used to reading application-layer code.

What's the difference between a stored procedure and a user-defined function?

A **function** returns a value (scalar or table) and, in most engines, cannot perform transaction control (`COMMIT`/`ROLLBACK`); some engines (SQL Server, for example) also forbid it from modifying data at all. It's meant to be called inside a `SELECT` like any built-in function. A **stored procedure** doesn't have to return anything (or can return multiple result sets/output parameters), can manage transactions itself in engines that support it, and is invoked with `CALL`/`EXEC` rather than embedded inside a query expression.

What is a trigger, and what are common use cases and pitfalls?

A trigger is a block of logic the database automatically executes in response to a table event (`INSERT`, `UPDATE`, `DELETE`), either `BEFORE` or `AFTER` the event, per-row or per-statement. Common legitimate uses: maintaining audit logs, enforcing complex invariants across tables, keeping denormalized columns in sync. Common pitfalls: hidden "action at a distance" logic that surprises developers, performance overhead on high-write tables, and cascading triggers that are hard to reason about or debug.

What are scalar functions vs table-valued functions?

A scalar function returns a single value (a number, string, date, etc.) and is used anywhere a single expression is valid, like inside a `SELECT` list or `WHERE` clause. A table-valued function returns an entire result set (a set of rows and columns) and is used in the `FROM` clause like a regular table or view, optionally parameterized — effectively a "parameterized view."

How do you handle errors and exceptions inside a stored procedure?

Most procedural SQL dialects provide a `TRY`/`CATCH` (SQL Server) or `BEGIN ... EXCEPTION WHEN ... END` (PostgreSQL PL/pgSQL) block to catch errors, inspect error details, and decide whether to roll back, re-raise, or handle them gracefully. The key discipline is the same as in application code: catch specific, expected error conditions deliberately, avoid silently swallowing unexpected errors, and ensure partial work is rolled back rather than left in an inconsistent state.

What are the risks of putting significant business logic in triggers?

Triggers execute invisibly relative to the statement that fired them, which makes system behavior harder to trace, test, and reason about; they add write-path overhead that's easy to forget is even happening; and they risk cascading/recursive chains across tables that are difficult to debug. The general rule: reserve triggers for invariants or side effects that genuinely must be enforced regardless of the caller, and keep anything resembling actual business workflow logic in application code instead.

How do prepared statements work, and how do they help with SQL injection and performance?

A prepared statement separates the SQL command's structure (parsed and planned once) from its parameter values (bound and sent separately, never concatenated into the query text). This closes SQL injection entirely for the parameterized values, since user input is never interpreted as SQL syntax — only as a literal value. It can also improve performance when the same statement shape is executed repeatedly, since the database can reuse the parsed/planned form instead of re-parsing identical-shaped SQL each time.

Stored Procedures, Functions, and Triggers

Server-side logic — procedures, user-defined functions, triggers, and prepared statements.


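CREATE PROCEDURE transfer_funds(
    IN from_account INT,
    IN to_account INT,
    IN amount NUMERIC
)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Reject zero, negative, or missing amounts (a negative amount would reverse the transfer).
    IF amount IS NULL OR amount <= 0 THEN
        RAISE EXCEPTION 'Transfer amount must be positive, got %', amount;
    END IF;

    UPDATE accounts SET balance = balance - amount WHERE id = from_account;
    UPDATE accounts SET balance = balance + amount WHERE id = to_account;

    -- Raising here aborts the calling transaction, undoing both UPDATEs.
    IF (SELECT balance FROM accounts WHERE id = from_account) < 0 THEN
        RAISE EXCEPTION 'Insufficient funds';
    END IF;
END;
$$;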

CALL transfer_funds(1, 2, 100);
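For contrast, here is a minimal sketch of the same transfer written as application-layer logic. It uses Python's built-in sqlite3 module as a stand-in database; SQLite has no stored procedures, and the table and the InsufficientFunds exception are illustrative assumptions. Against a client/server database such as PostgreSQL, each of the three statements would be a separate network round-trip, which is the overhead the procedure avoids. `with conn:` commits when the block finishes and rolls back if an exception escapes it.

```python
import sqlite3


class InsufficientFunds(Exception):
    """Raised when a transfer would overdraw the source account (illustrative)."""


def transfer_funds(conn: sqlite3.Connection, from_account: int, to_account: int, amount: int) -> None:
    # `with conn:` commits if the block completes and rolls back if it raises.
    with conn:
        # Statement 1 (one round-trip on a client/server database): debit.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_account))
        # Statement 2: credit.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_account))
        # Statement 3: check the invariant; raising here undoes both updates.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (from_account,)
        ).fetchone()
        if balance < 0:
            raise InsufficientFunds(f"account {from_account} would go negative")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 150), (2, 0)])

transfer_funds(conn, 1, 2, 100)      # succeeds: balances become 50 and 100
try:
    transfer_funds(conn, 1, 2, 100)  # would leave -50, so it is rolled back
except InsufficientFunds as exc:
    print("rejected:", exc)

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 50), (2, 100)]
```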

Advantages

Fewer network round-trips. Multiple statements execute as one call instead of several separate queries from the application, each paying network latency — meaningful for logic with many small, dependent steps.
Logic co-located with data. Business rules enforced in a procedure apply no matter which application, script, or ad-hoc tool touches the database — you can't accidentally bypass validation by writing to the table through a different code path.
Reduced data transfer. Complex calculations happen where the data already lives, rather than pulling large intermediate result sets to the application tier just to compute something and write it back.
Precompiled execution plan (in some engines) — repeated calls can reuse a cached plan rather than re-parsing/re-planning every time, though modern query engines also cache plans for parameterized ad-hoc queries, narrowing this advantage.

Disadvantages

Weaker tooling for versioning/testing. Application code benefits from mature version control diffing, unit testing frameworks, code review tooling, and CI pipelines; stored procedure logic often lives partially outside that ecosystem unless deliberately integrated (migration-based deployment, dedicated SQL test frameworks).
Vendor lock-in. Procedural SQL dialects (PL/pgSQL, T-SQL, PL/SQL) are not portable across engines — logic written for PostgreSQL doesn't run on SQL Server without a rewrite, unlike application code written in a general-purpose language.
Split logic, harder to reason about. A developer reading application code may not realize significant business logic actually lives in the database, making the system harder to understand holistically and harder to debug with standard application debugging tools.
Scaling application logic independently of the database becomes harder — application-tier logic can scale horizontally across many stateless app servers; database-tier logic is bottlenecked by the database's own compute capacity.

Most modern application architectures favor keeping business logic in the application layer, reserving stored procedures for narrow cases where their advantages are decisive: enforcing an invariant that absolutely must never be bypassed regardless of caller, or a genuinely data-intensive operation where minimizing round-trips/data transfer matters more than tooling/portability concerns.

Related Resources

PostgreSQL: CREATE PROCEDURE


Function — used inside expressions, must return a value

CREATE FUNCTION get_full_name(first_name TEXT, last_name TEXT)
RETURNS TEXT
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT first_name || ' ' || last_name;
$$;

SELECT get_full_name(first_name, last_name) FROM employees;   -- used directly in SELECT
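SQLite has no CREATE FUNCTION statement. However, Python's sqlite3 module can register an application-defined scalar function with Connection.create_function, after which SQL can call it inline, once per row, much like get_full_name above. The sketch below is a rough analogue and assumes a hypothetical employees table. Unlike the PostgreSQL version, the function body lives in the client process rather than in the database. The deterministic=True flag (Python 3.8+, SQLite 3.8.3+) plays a role similar to IMMUTABLE.

```python
import sqlite3


def get_full_name(first_name: str, last_name: str) -> str:
    # Same logic as the SQL function above: join the two parts with a space.
    return first_name + " " + last_name


conn = sqlite3.connect(":memory:")
# Register the Python callable as a 2-argument SQL scalar function.
# deterministic=True tells SQLite the result depends only on the arguments.
conn.create_function("get_full_name", 2, get_full_name, deterministic=True)

conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [("Ada", "Lovelace"), ("Alan", "Turing")])

# Used directly inside SELECT, evaluated once per row.
names = [row[0] for row in conn.execute(
    "SELECT get_full_name(first_name, last_name) FROM employees ORDER BY last_name"
)]
print(names)  # ['Ada Lovelace', 'Alan Turing']
```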

Procedure — invoked standalone, not embedded in a query

CREATE PROCEDURE archive_old_orders(cutoff_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < cutoff_date;
    DELETE FROM orders WHERE order_date < cutoff_date;
    COMMIT;   -- procedures can manage their own transaction in engines that allow it
END;
$$;

CALL archive_old_orders('2020-01-01');   -- can't be used inside a SELECT

Key differences

	Function	Procedure
Must return a value	Yes (though PostgreSQL also allows RETURNS void)	No (can return nothing, output parameters, or multiple result sets)
Callable inside `SELECT`/expressions	Yes	No; invoked with `CALL`/`EXEC`
Can control transactions (`COMMIT`/`ROLLBACK`)	Generally no	Yes, in engines that support it (PostgreSQL 11+, SQL Server)
Typical use	Computing/transforming a value, encapsulating reusable expressions	Multi-step business operations, batch jobs, administrative tasks

Why the transaction-control distinction matters

Because a function is meant to be composable inside an arbitrary query (potentially invoked once per row, many times within a single statement, or nested inside other expressions), most engines forbid functions from committing or rolling back. Doing so mid-query would make no sense, since the calling query itself runs inside an outer transaction that the function doesn't control. Procedures, invoked as standalone top-level statements, don't have that constraint and can legitimately manage a whole multi-step transaction internally.

Terminology varies by engine

PostgreSQL, for example, had only FUNCTION objects before version 11: a function returning void stood in for a procedure but could not commit or roll back. Genuine standalone PROCEDURE objects, invoked with CALL, arrived in version 11. MySQL, by contrast, has offered separate stored procedures and stored functions since version 5.0. Always check what your specific engine and version actually support before assuming feature parity.

Related Resources

PostgreSQL: User-Defined Functions


Anatomy of a trigger

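CREATE OR REPLACE FUNCTION log_salary_change()
RETURNS TRIGGER AS $$
BEGIN
    -- IS DISTINCT FROM also catches changes to or from NULL, which <> would miss
    IF NEW.salary IS DISTINCT FROM OLD.salary THEN
        INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_at)
        VALUES (OLD.id, OLD.salary, NEW.salary, now());
    END IF;
    RETURN NEW;   -- ignored for AFTER row triggers, but harmless
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_salary_change
AFTER UPDATE ON employees
FOR EACH ROW
EXECUTE FUNCTION log_salary_change();   -- PostgreSQL 11+; older versions spell this EXECUTE PROCEDURE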

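BEFORE vs AFTER: a BEFORE trigger can inspect and modify the row before it's written (e.g., auto-populating updated_at), and can prevent the write entirely by raising an exception (in PostgreSQL, a row-level BEFORE trigger can also silently skip the row by returning NULL). An AFTER trigger runs once the row change has been applied, but still inside the same transaction, so an error raised there rolls the change back too. AFTER triggers are typically used for side effects (logging, cascading updates) rather than modifying the row itself.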
FOR EACH ROW vs statement-level: a row-level trigger fires once per affected row; a statement-level trigger fires once total, regardless of how many rows a single statement touched.
OLD and NEW: special row references available inside the trigger — OLD is the row's prior state (available for UPDATE/DELETE), NEW is the row's new state (available for INSERT/UPDATE).
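The audit trigger can be tried locally in SQLite through Python's sqlite3 module. This sketch assumes SQLite's dialect is close enough for illustration. The trigger body is written inline instead of in a separate function, and UPDATE OF salary limits it to updates that set that column. The WHEN clause uses SQLite's NULL-safe IS NOT, the counterpart of PostgreSQL's IS DISTINCT FROM, so an update that leaves salary unchanged writes no audit row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE salary_audit (
    employee_id INTEGER, old_salary INTEGER, new_salary INTEGER, changed_at TEXT
);

-- Row-level AFTER UPDATE trigger; the WHEN clause skips no-op salary updates.
CREATE TRIGGER trg_log_salary_change
AFTER UPDATE OF salary ON employees
FOR EACH ROW
WHEN OLD.salary IS NOT NEW.salary
BEGIN
    INSERT INTO salary_audit (employee_id, old_salary, new_salary, changed_at)
    VALUES (OLD.id, OLD.salary, NEW.salary, datetime('now'));
END;

INSERT INTO employees VALUES (1, 'Ada', 100000);
""")

conn.execute("UPDATE employees SET salary = 110000 WHERE id = 1")  # logged
conn.execute("UPDATE employees SET salary = 110000 WHERE id = 1")  # unchanged, not logged
conn.execute("UPDATE employees SET name = 'Ada L.' WHERE id = 1")   # salary not set, trigger not fired

audit = conn.execute("SELECT employee_id, old_salary, new_salary FROM salary_audit").fetchall()
print(audit)  # [(1, 100000, 110000)]
```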

Legitimate use cases

Audit logging — recording who changed what and when, guaranteed to fire regardless of which application or tool made the change.
Enforcing invariants too complex for a CHECK constraint — e.g., "the sum of allocations across related rows can never exceed a parent's capacity," which spans multiple rows/tables.
Maintaining denormalized/derived columns — keeping a cached order_count on a customers row in sync whenever orders changes, without every application code path having to remember to update it manually.
Enforcing referential integrity rules more complex than a standard foreign key can express (e.g., conditional referential integrity).
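Here is a minimal sketch of the denormalized-column case, again in SQLite via Python, with hypothetical customers and orders tables. Two triggers keep customers.order_count in step with inserts and deletes on orders. A production version would also need an UPDATE trigger for orders that move between customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, order_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customers(id));

-- Keep the cached count in sync on every insert...
CREATE TRIGGER trg_orders_insert AFTER INSERT ON orders
BEGIN
    UPDATE customers SET order_count = order_count + 1 WHERE id = NEW.customer_id;
END;

-- ...and on every delete.
CREATE TRIGGER trg_orders_delete AFTER DELETE ON orders
BEGIN
    UPDATE customers SET order_count = order_count - 1 WHERE id = OLD.customer_id;
END;

INSERT INTO customers (id) VALUES (1), (2);
""")

conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", [(1,), (1,), (2,)])
conn.execute("DELETE FROM orders WHERE customer_id = 2")

counts = conn.execute("SELECT id, order_count FROM customers ORDER BY id").fetchall()
print(counts)  # [(1, 2), (2, 0)]
```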

Common pitfalls

Hidden "action at a distance." A developer looking at UPDATE orders SET status = 'shipped' has no way to know, just from reading that line, that it also silently fires three triggers that update two other tables — this makes systems significantly harder to reason about and debug, especially for someone new to the codebase.
Performance overhead on high-throughput writes. Every trigger adds work to every matching INSERT/UPDATE/DELETE; a row-level trigger on a table with heavy write volume can become a meaningful bottleneck, especially if the trigger itself does further queries/writes.
Cascading and recursive triggers. A trigger on table A that writes to table B, which has its own trigger that writes back to table A, can create confusing chains of execution (or, if unguarded, infinite loops) that are very difficult to trace.
Confusing failure attribution. If a trigger raises an exception, the original statement fails too. This is usually desired (it enforces the invariant), but if the trigger's logic has a bug, it can cause unrelated-looking application writes to fail with an error that doesn't obviously point back to the trigger.
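To see the attribution problem concretely, here is a sketch (SQLite via Python; the table and the shipping rule are hypothetical) of a BEFORE trigger enforcing an invariant with RAISE(ABORT, ...). The application only issued a plain UPDATE, yet the error it receives originates in trigger code it never sees. The message text is the main clue, so naming the trigger in that message is a cheap way to make the failure traceable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL, shipped_at TEXT);

-- Invariant: an order can only be marked shipped once shipped_at is set.
CREATE TRIGGER trg_orders_require_ship_date
BEFORE UPDATE OF status ON orders
FOR EACH ROW
WHEN NEW.status = 'shipped' AND NEW.shipped_at IS NULL
BEGIN
    SELECT RAISE(ABORT, 'trg_orders_require_ship_date: shipped orders need shipped_at');
END;

INSERT INTO orders (id, status) VALUES (501, 'pending');
""")

error = None
try:
    # From the application's point of view, a plain single-row update.
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 501")
except sqlite3.IntegrityError as exc:  # SQLite reports RAISE(ABORT, ...) as a constraint error
    error = str(exc)

print(error)
print(conn.execute("SELECT status FROM orders WHERE id = 501").fetchone())  # ('pending',)
```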

Triggers are the right tool when an invariant or side effect must be enforced regardless of caller and cannot be expressed as a simpler constraint. For anything that could reasonably live in explicit application code instead (most business logic), prefer application code — it's visible, testable, and versioned alongside the rest of the system, rather than hidden in the schema.

Related Resources

PostgreSQL: Triggers


Scalar function — returns one value

CREATE FUNCTION age_in_years(birth_date DATE)
RETURNS INT
LANGUAGE sql
STABLE   -- not IMMUTABLE: one-argument AGE() depends on the current date
AS $$
    SELECT EXTRACT(YEAR FROM AGE(birth_date))::INT;
$$;

SELECT name, age_in_years(birth_date) AS age FROM people;

Used exactly like a built-in function (UPPER(), COALESCE()) — once per row, in a SELECT list, WHERE, ORDER BY, anywhere a single expression is valid.

Table-valued function — returns a full result set

CREATE FUNCTION orders_for_customer(cust_id INT)
RETURNS TABLE (order_id INT, order_date DATE, total NUMERIC)
LANGUAGE sql
STABLE   -- read-only; not being VOLATILE is also a precondition for PostgreSQL to inline it
AS $$
    SELECT id, order_date, total FROM orders WHERE customer_id = cust_id;
$$;

-- Used in FROM, like a table or view, but parameterized:
SELECT * FROM orders_for_customer(42) WHERE total > 100;

This is effectively a parameterized view — a regular view can't accept arguments, but a table-valued function can, letting you encapsulate a reusable, parameterized query the same way you'd encapsulate a parameterless one in a view.

Performance characteristics differ meaningfully

Scalar functions called per-row in a large query can be a significant performance trap — if a scalar function is invoked once per row of a million-row query and internally runs its own additional query, that's a million extra queries hiding behind what looks like a simple function call in the SELECT list. This is a common, easy-to-miss source of slow reports.
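Table-valued functions, especially ones written in pure SQL (not a procedural language), are often inlined by the optimizer much like a view, so the outer query's filters can sometimes be pushed down into the function's body. (In PostgreSQL, a set-returning LANGUAGE sql function is only inlined if, among other conditions, its body is a single SELECT and it isn't declared VOLATILE.) A procedural scalar function, by contrast, behaves as an opaque black box to the optimizer.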
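The per-row cost is easy to observe. In this sketch (SQLite via Python, hypothetical customers/orders schema), the registered scalar function consults a precomputed dict rather than running a real query. Its call counter still shows how many hidden queries a database-side version would issue: one per customer row. The join below answers the same question in a single set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
""")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(i,) for i in range(1, 1001)])
# 5,000 orders: five orders worth 10 each for every one of the 1,000 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000 + 1, 10) for i in range(5000)],
)

# Stand-in for the lookup query a database-side scalar function would run on every call.
totals = dict(conn.execute("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"))
calls = 0


def customer_total(customer_id):
    global calls
    calls += 1  # in a real database function, each call would be one more hidden query
    return totals.get(customer_id, 0)


conn.create_function("customer_total", 1, customer_total)

# Per-row scalar function: evaluated once for every customer row.
per_row = conn.execute("SELECT id, customer_total(id) FROM customers").fetchall()

# Set-based rewrite: one join plus aggregation, no per-row function calls.
joined = conn.execute("""
    SELECT c.id, COALESCE(SUM(o.total), 0)
    FROM customers AS c LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()

print(calls)                              # 1000
print(sorted(per_row) == sorted(joined))  # True
```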

Prefer table-valued functions (or plain views/CTEs) over scalar functions for anything data-set-oriented — reserve scalar functions for lightweight, genuinely per-value computations (formatting, simple math, type conversions) that don't themselves need to query other tables. If you find yourself writing a scalar function that runs a SELECT internally and it's called across many rows, that's usually a sign the logic should be restructured as a join or table-valued function instead.

Related Resources

PostgreSQL: Table Functions


PostgreSQL (PL/pgSQL)

CREATE OR REPLACE PROCEDURE safe_transfer(from_acct INT, to_acct INT, amt NUMERIC)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE accounts SET balance = balance - amt WHERE id = from_acct;

    IF (SELECT balance FROM accounts WHERE id = from_acct) < 0 THEN
        -- 'UF001' is an application-chosen custom SQLSTATE (PostgreSQL has no
        -- built-in insufficient_funds condition name)
        RAISE EXCEPTION 'Insufficient funds in account %', from_acct
            USING ERRCODE = 'UF001';
    END IF;

    UPDATE accounts SET balance = balance + amt WHERE id = to_acct;

-- By the time a handler runs, this block's changes have already been rolled back.
EXCEPTION
    WHEN SQLSTATE 'UF001' THEN
        RAISE;   -- expected business error: propagate it unchanged
    WHEN insufficient_privilege THEN
        RAISE NOTICE 'Permission issue, transfer aborted';
        RAISE;   -- re-raise to the caller after logging
    WHEN OTHERS THEN
        RAISE NOTICE 'Unexpected error: %', SQLERRM;
        RAISE;   -- never silently swallow unknown errors
END;
$$;

SQL Server (T-SQL)

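CREATE PROCEDURE safe_transfer
    @from_acct INT,
    @to_acct   INT,
    @amt       DECIMAL(18, 2)   -- precision chosen for illustration
AS
BEGIN
    SET XACT_ABORT ON;   -- run-time errors roll back the whole transaction

    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE accounts SET balance = balance - @amt WHERE id = @from_acct;

        IF (SELECT balance FROM accounts WHERE id = @from_acct) < 0
            THROW 50000, 'Insufficient funds', 1;

        UPDATE accounts SET balance = balance + @amt WHERE id = @to_acct;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;   -- undo the debit if it was applied
        THROW;   -- re-raise the original error to the caller
    END CATCH
END;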

Key principles

Roll back on failure. Any partial writes made before the error must be undone — leaving the transfer example half-applied (debited but not credited) is exactly the atomicity violation ACID exists to prevent. ROLLBACK (explicit, or via automatic transaction abort) is essential in the error path.
Catch specific conditions deliberately, don't blanket-swallow everything. A bare WHEN OTHERS (PL/pgSQL) or empty CATCH block that suppresses all errors silently hides real bugs and data problems — always at least log/re-raise unexpected errors rather than making them disappear.
Use meaningful, application-actionable error signals. Raising a specific error code and message (a custom SQLSTATE in PL/pgSQL, a custom THROW error number and message in T-SQL) lets the calling application distinguish "the transfer failed because of insufficient funds" (a business-logic condition the app should show the user) from "the transfer failed because of a database bug" (an operational alert).
Re-raise unless you're specifically handling the condition. Catching an error to log it, then continuing as if nothing happened, is rarely correct — usually you want to log/handle and still propagate the failure so the caller (and any transaction it's part of) also reacts correctly.
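On the calling side, the same principles mean catching the one database error you expect, turning it into a business-level signal, and letting everything else propagate. The sketch below uses Python's sqlite3 and assumes a hypothetical CHECK (balance >= 0) constraint as the source of the insufficient-funds failure. Matching on SQLite's message text is fragile and engine-specific; against PostgreSQL you would branch on the error's SQLSTATE instead.

```python
import sqlite3


class InsufficientFunds(Exception):
    """Business-level error the caller can show to a user (illustrative)."""


def withdraw(conn: sqlite3.Connection, account_id: int, amount: int) -> None:
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, account_id))
    except sqlite3.IntegrityError as exc:
        # Translate only the specific, expected condition...
        if "CHECK constraint failed" in str(exc):
            raise InsufficientFunds(f"account {account_id} cannot cover {amount}") from exc
        raise  # ...and re-raise any other integrity error unchanged.


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
with conn:
    conn.execute("INSERT INTO accounts VALUES (1, 100)")

try:
    withdraw(conn, 1, 250)
except InsufficientFunds as exc:
    print("show to user:", exc)

print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())  # (100,)
```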

This question tests whether a candidate treats database-level error handling with the same rigor as application-level error handling — expecting failures, ensuring atomicity is preserved on the failure path, and avoiding the anti-pattern of a catch-all block that silently discards information about what actually went wrong.

Related Resources

PostgreSQL: Errors and Messages (PL/pgSQL)


This is a deliberate follow-up to the general trigger question, focused specifically on why teams tend to regret over-relying on them for business logic.

Invisibility / "action at a distance"

UPDATE orders SET status = 'cancelled' WHERE id = 501;

Reading this single line gives no indication that it might also, via triggers: restock inventory, send a cancellation event to a queue, recompute a customer's lifetime-value rollup, and write an audit record. Anyone debugging unexpected inventory numbers has to know to go looking in the schema's trigger definitions — a much less discoverable place than reading the code path that issued the UPDATE.
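One practical mitigation is to make the hidden logic discoverable by listing the triggers attached to a table before debugging its writes. PostgreSQL exposes this through information_schema.triggers (or the pg_trigger catalog). The runnable sketch below uses SQLite's sqlite_master catalog via Python, with hypothetical triggers standing in for the cancellation side effects described above.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE inventory_log (order_id INTEGER, note TEXT);
CREATE TABLE audit_log (order_id INTEGER, status TEXT);

-- Two hypothetical side effects hidden behind a plain status update.
CREATE TRIGGER trg_restock_on_cancel AFTER UPDATE OF status ON orders
WHEN NEW.status = 'cancelled'
BEGIN
    INSERT INTO inventory_log VALUES (NEW.id, 'restock');
END;

CREATE TRIGGER trg_audit_status AFTER UPDATE OF status ON orders
BEGIN
    INSERT INTO audit_log VALUES (NEW.id, NEW.status);
END;
""")


def triggers_on(conn: sqlite3.Connection, table: str) -> List[str]:
    # Every trigger SQLite will consider for writes to `table`, read from the schema catalog.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'trigger' AND tbl_name = ? ORDER BY name",
        (table,),
    )
    return [name for (name,) in rows]


print(triggers_on(conn, "orders"))  # ['trg_audit_status', 'trg_restock_on_cancel']
```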

Harder to test in isolation

Application-layer business logic can typically be unit-tested with mocked dependencies, run in CI, and reviewed as a diff. Trigger logic requires a real (or realistically simulated) database to exercise, is often excluded from the same test suites as application code, and changes to it don't show up in the same code review flow unless the team has specifically built tooling to track schema/trigger changes as first-class artifacts.

Cascading and recursive complexity

A trigger on table A writing to table B, which itself has a trigger writing to table C (or back to A), creates an execution graph that's difficult to trace statically just by reading any single trigger's definition — and if not carefully guarded, can produce infinite loops or very deep, hard-to-predict cascades from what looked like a simple single-row update.
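Here is a small sketch of such a chain, in SQLite via Python with hypothetical tables a, b, and c. A single-row insert into a fans out through two triggers, and nothing in the INSERT statement hints that c was written. SQLite does not let a trigger fire itself recursively unless PRAGMA recursive_triggers is turned on. PostgreSQL allows recursive trigger firing, so chains there are typically guarded explicitly, for example by checking pg_trigger_depth().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER);
CREATE TABLE c (a_id INTEGER);

-- The trigger on a writes to b...
CREATE TRIGGER trg_a_to_b AFTER INSERT ON a
BEGIN
    INSERT INTO b VALUES (NEW.id);
END;

-- ...and b's own trigger writes to c.
CREATE TRIGGER trg_b_to_c AFTER INSERT ON b
BEGIN
    INSERT INTO c VALUES (NEW.a_id);
END;
""")

conn.execute("INSERT INTO a (id) VALUES (1)")  # reads like a single-table write
print(conn.execute("SELECT a_id FROM c").fetchall())  # [(1,)]
```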

Write-path overhead that's easy to forget

Every trigger adds cost to every matching write, and that cost is invisible from the application's perspective — a query that "should" be a cheap single-row UPDATE might actually be doing substantial additional work under the hood, making performance regressions hard to attribute to their real cause without specifically knowing to check for triggers.
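The gap between what a statement reports and what it actually does can be measured. In Python's sqlite3 (a self-contained stand-in here), Cursor.rowcount counts only the rows the UPDATE changed directly. Connection.total_changes also counts rows written by trigger programs. The audit table and trigger are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE order_audit (order_id INTEGER, status TEXT);
CREATE TRIGGER trg_order_audit AFTER UPDATE ON orders
BEGIN
    INSERT INTO order_audit VALUES (NEW.id, NEW.status);
END;
INSERT INTO orders VALUES (501, 'pending');
""")

before = conn.total_changes
cur = conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 501")
direct_rows = cur.rowcount                # rows the UPDATE itself touched
all_rows = conn.total_changes - before    # also includes the trigger's INSERT
print(direct_rows, all_rows)  # 1 2
```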

Deployment and rollback friction

Trigger definitions live in the database schema, so changing or removing significant behavior requires a schema migration rather than a simple application code deployment/rollback — this can slow down iteration on business logic that changes frequently, compared to logic that lives in ordinarily-deployed application code.

When triggers are still the right call

None of this means triggers are always wrong — they remain the right tool for invariants that must hold regardless of which system or code path writes to the table (a true database-level guarantee, like an audit trail that must exist even if written to via a raw psql session) or for keeping a tightly-coupled derived value in sync where the alternative (every application code path remembering to update it) is more fragile than the trigger itself. The risk is specifically in using triggers for logic that's really workflow, not invariant enforcement — that logic almost always belongs in application code where it's visible, testable, and easy to change.

Related Resources

PostgreSQL: Triggers


The vulnerable pattern: string concatenation

# NEVER do this
query = "SELECT * FROM users WHERE username = '" + user_input + "'"

If user_input is ' OR '1'='1, the resulting query becomes SELECT * FROM users WHERE username = '' OR '1'='1' — a classic SQL injection that returns every row, because the attacker's input was interpreted as SQL syntax rather than a plain string value.

Prepared statements: structure and data are sent separately

# Parameterized / prepared statement
cursor.execute("SELECT * FROM users WHERE username = %s", (user_input,))

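Under the hood, with a driver that uses server-side prepared statements, this happens in (up to) two round-trips. Not every driver works this way: psycopg2, for example, quotes and interpolates values client-side instead, which still blocks injection but has no separate prepare step. With server-side preparation, the steps are: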

Prepare: the driver sends the query template, with placeholders (%s, ?, $1 depending on driver/engine), to the database. The database parses and plans the query structure — where the WHERE clause is, what the placeholder positions mean — without knowing yet what values will fill them.
Execute (bind): the actual parameter values are sent separately, tagged explicitly as data, not as SQL text. The database substitutes them into the already-parsed plan's placeholder slots directly — there's no step where the value is ever concatenated into a string that gets re-parsed as SQL, so there's no syntactic position for injected SQL to "break out" into.

This is why prepared statements close SQL injection completely for parameterized values — it's not a matter of "better escaping," it's an entirely different mechanism where user input structurally cannot be interpreted as code.
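Both patterns can be run side by side with Python's sqlite3 module, which uses ? rather than %s as its placeholder. The users table and its rows are hypothetical. The concatenated query returns every row for the injected input, while the parameterized one binds the same input as a literal username and matches nothing. sqlite3 binds parameters through SQLite's prepared-statement API, so it mirrors the prepare/bind split described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

user_input = "' OR '1'='1"

# Vulnerable: the input is spliced into the SQL text and parsed as syntax.
unsafe_sql = "SELECT * FROM users WHERE username = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the statement is prepared with a placeholder and the value is bound separately.
safe_rows = conn.execute("SELECT * FROM users WHERE username = ?", (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```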

Performance: plan reuse

Databases can cache the parsed/planned form of a prepared statement and reuse it across multiple executions with different parameter values — skipping repeated parsing and (in some engines) re-planning for identical-shaped queries executed frequently (e.g., the same SELECT run thousands of times per second with different IDs). Whether this actually happens automatically, and how much it saves, varies significantly by engine and driver — some drivers only prepare once per connection and reuse across calls, others prepare fresh each time unless explicitly told to cache. In practice, the performance benefit is secondary; the security guarantee is the primary reason to always use them.
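As one concrete example of driver-level reuse, Python's sqlite3 keeps a per-connection cache of compiled statements, sized by the cached_statements argument to sqlite3.connect. Its executemany runs one prepared statement repeatedly with different bound values. The sketch below uses a hypothetical events table; how much this saves depends on the workload.

```python
import sqlite3

# cached_statements sets how many compiled statements this connection keeps for reuse.
conn = sqlite3.connect(":memory:", cached_statements=64)
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")

rows = [(user_id, "login") for user_id in range(10_000)]
# One statement shape, compiled once, executed 10,000 times with different bound values.
conn.executemany("INSERT INTO events (user_id, kind) VALUES (?, ?)", rows)

# Re-running an identical SQL string can be served from the connection's statement cache.
for user_id in (1, 2, 3):
    conn.execute("SELECT COUNT(*) FROM events WHERE user_id = ?", (user_id,)).fetchone()

print(conn.execute("SELECT COUNT(*) FROM events").fetchone())  # (10000,)
```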

What prepared statements don't protect against

Parameterization only protects data values — it can't safely parameterize dynamic SQL structure itself, like a table/column name or an ORDER BY direction chosen at runtime:

# Still vulnerable if column_name comes from user input -- can't be a bound parameter
cursor.execute(f"SELECT * FROM users ORDER BY {column_name}")

For genuinely dynamic identifiers, validate against an explicit allow-list of known-safe values (e.g., a fixed set of column names the application recognizes) rather than ever interpolating raw user input into the SQL text, even for something that "looks like" just a column name.
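Here is a minimal sketch of the allow-list approach in Python with sqlite3. The users table and the public sort keys are hypothetical. User input only selects entries from fixed mappings, so only those trusted constants reach the SQL text, and anything else is rejected.

```python
import sqlite3

# Fixed allow-lists: public sort key -> trusted SQL fragment.
SORT_COLUMNS = {"name": "username", "created": "created_at"}
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def list_users(conn: sqlite3.Connection, sort_key: str, direction: str = "asc"):
    column = SORT_COLUMNS.get(sort_key)
    order = SORT_DIRECTIONS.get(direction.lower())
    if column is None or order is None:
        raise ValueError(f"unsupported sort: {sort_key!r} {direction!r}")
    # Only allow-listed constants are interpolated; raw user text never reaches the SQL.
    sql = f"SELECT username FROM users ORDER BY {column} {order}"
    return [name for (name,) in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("carol", "2024-03-01"), ("alice", "2024-01-15"), ("bob", "2024-02-10")])

print(list_users(conn, "name"))             # ['alice', 'bob', 'carol']
print(list_users(conn, "created", "desc"))  # ['carol', 'bob', 'alice']
```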

Every ORM and modern database driver supports parameterized queries by default — there is essentially never a legitimate reason to build a query by string-concatenating untrusted input. This is the single highest-leverage, lowest-cost defense against SQL injection and should be treated as a non-negotiable baseline, not an optional hardening step.

Related Resources

OWASP: SQL Injection Prevention Cheat Sheet