How do you connect to and query databases from Python?

7 minintermediatedatabasesormsqlalchemysql-injection

Quick Answer

Low-level access goes through the **DB-API 2.0** standard (PEP 249) — every major database driver (`psycopg2`/`psycopg` for Postgres, `sqlite3` built-in, `pymysql`) exposes the same `connect()`/`cursor()`/`execute()`/`fetchall()` interface. Most applications instead use an **ORM** (SQLAlchemy, Django ORM) to work with Python objects instead of raw SQL, at the cost of an abstraction layer — always use **parameterized queries**, never string-formatted SQL, to avoid SQL injection.

Detailed Answer

The DB-API 2.0 standard: a common low-level interface

import sqlite3

conn = sqlite3.connect("app.db")
cursor = conn.cursor()
cursor.execute("SELECT id, name FROM users WHERE age > ?", (18,))
rows = cursor.fetchall()
conn.close()

Every DB-API-compliant driver (sqlite3 built-in, psycopg2/psycopg for PostgreSQL, pymysql/mysqlclient for MySQL) exposes the same shape: connect() returns a connection, .cursor() gets a cursor, .execute(sql, params) runs a query, and .fetchall()/.fetchone() retrieve results — learning this pattern once transfers across database backends.

SQL injection: the critical vulnerability to avoid

# NEVER do this -- string formatting builds SQL from untrusted input
name = "'; DROP TABLE users; --"
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")   # SQL INJECTION!

# ALWAYS use parameterized queries -- the driver handles escaping safely
cursor.execute("SELECT * FROM users WHERE name = ?", (name,))   # safe, regardless of content

String-interpolating user input directly into SQL lets an attacker inject arbitrary SQL (as the classic "Bobby Tables" example shows) — parameterized queries (? or %s placeholders, driver-dependent syntax) send the query and its values separately to the database, which handles escaping correctly regardless of what the value contains. This is not optional hardening — it's the baseline requirement for any code that builds a query using data from outside the program.

ORMs: working with objects instead of raw SQL

from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session, DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    age: Mapped[int]

engine = create_engine("postgresql://localhost/app")
with Session(engine) as session:
    users = session.scalars(select(User).where(User.age > 18)).all()

SQLAlchemy (and Django's built-in ORM) let you query and manipulate data as Python objects/classes instead of writing raw SQL strings, and automatically parameterize values (so ORM queries are inherently safe from SQL injection for their generated queries). The tradeoff: an abstraction layer that can generate inefficient queries if misused (the classic N+1 query problem — fetching a list, then separately querying related data for each item in a loop) and a learning curve of its own.

Connection pooling for production

engine = create_engine("postgresql://localhost/app", pool_size=10, max_overflow=5)

Opening a new database connection per request is expensive; production applications use a connection pool (built into SQLAlchemy's engine, or a standalone pooler like PgBouncer for Postgres) that reuses a fixed set of open connections across requests, dramatically reducing per-request connection overhead.

Async database access

import asyncpg

async def get_users(pool):
    async with pool.acquire() as conn:
        return await conn.fetch("SELECT * FROM users WHERE age > $1", 18)

Standard DB-API drivers are synchronous/blocking, which would stall an asyncio event loop — async applications (FastAPI, etc.) use async-native drivers (asyncpg, aiomysql) or an async-compatible ORM layer (SQLAlchemy's async engine) instead.

Interview-ready summary: DB-API 2.0 gives a consistent low-level interface across database drivers; most applications build on an ORM (SQLAlchemy, Django ORM) for productivity, understanding the tradeoff of an abstraction layer that can hide inefficient query patterns like N+1. Regardless of layer, always use parameterized queries — never string-format untrusted input into SQL — to avoid SQL injection.