Python in Production, Security & Ecosystem


Why unpickling untrusted data is a full remote-code-execution risk

import pickle
import os

class Exploit:
    def __reduce__(self):
        return (os.system, ("echo pwned; rm -rf /tmp/demo",))

payload = pickle.dumps(Exploit())

# Anywhere this runs on untrusted input:
pickle.loads(payload)   # actually executes os.system(...) during unpickling!

__reduce__ is a legitimate protocol pickle uses to learn how to reconstruct an object, but it can name any callable, and unpickling calls it. There is no way to "sandbox" pickle.loads() against a maliciously crafted payload; the official docs state plainly: never unpickle data received from an untrusted or unauthenticated source. This is why cache backends, message queues, and APIs that use pickle for convenience become an attack surface whenever external input can reach them.
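To see the mechanism without running anything harmful, the sketch below swaps the shell command for os.getpid, a harmless stand-in chosen only for illustration. pickletools.genops parses the byte stream without executing it, which shows that the payload is just an instruction to look up a callable and call it (the REDUCE opcode); pickle.loads then carries that instruction out.

```python
import os
import pickle
import pickletools


class Payload:
    """Harmless stand-in for an attacker's class: __reduce__ names a callable."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call os.getpid() with no arguments".
        return (os.getpid, ())


blob = pickle.dumps(Payload())

# genops only parses the stream; nothing is imported or called here.
opcodes = [op.name for op, arg, pos in pickletools.genops(blob)]
print(opcodes)  # includes a *GLOBAL opcode naming getpid, then REDUCE (the call)

# loads() executes the instruction: the "object" you get back is getpid()'s result.
result = pickle.loads(blob)
print(result == os.getpid())  # True
```

Nothing about the class itself ends up in the pickle; only the callable reference and its arguments do, which is why the receiving side never needs the attacker's class to be importable.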

eval()/exec(): running arbitrary source directly

user_input = "__import__('os').system('rm -rf /')"
eval(user_input)   # executes it -- catastrophic if user_input is attacker-controlled

eval (expressions) and exec (statements) execute Python source text directly, with the full power of the language: passing any externally influenced string to either effectively gives whoever wrote that input full code execution in your process.
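A tempting mitigation is to call eval with an empty __builtins__ mapping. The short sketch below shows why that is not a sandbox: attribute access alone still walks from a harmless empty tuple up to object and out to every class loaded in the interpreter, a common first step in well-known eval escapes.

```python
# Stripping builtins does NOT make eval safe: attribute traversal still works.
restricted_globals = {"__builtins__": {}}

expr = "().__class__.__mro__[1].__subclasses__()"  # tuple -> object -> its subclasses
reachable = eval(expr, restricted_globals)

print(len(reachable))     # how many classes the "restricted" expression can reach
print(int in reachable)   # True: object's direct subclasses include int
```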

The safe alternatives

import json
import ast

user_json_string = '{"name": "Ada", "admin": false}'   # e.g., an incoming request body

# Safe: parsing structured data
data = json.loads(user_json_string)              # only produces JSON-compatible values

# Safe: parsing a Python LITERAL (not arbitrary code)
value = ast.literal_eval("[1, 2, {'a': True}]")    # only literals -- no function calls, no imports
ast.literal_eval("__import__('os').system('x')")     # raises ValueError -- not a literal, rejected

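json.loads only ever produces plain data (dicts, lists, strings, numbers, booleans, None); it cannot execute anything. ast.literal_eval is not eval with restrictions bolted on: it parses the string and accepts only Python literal syntax (numbers, strings, bytes, tuples, lists, dicts, sets, booleans, None), rejecting anything containing a name lookup, function call, or attribute access. It cannot execute code, although malformed or deeply nested input can still raise exceptions such as ValueError, SyntaxError, MemoryError, or RecursionError, so callers should catch them and bound the size of untrusted input.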
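The two parsers accept different spellings of similar data: JSON writes booleans as true/false and requires double quotes, while ast.literal_eval expects Python syntax. The sketch below wraps each in a small helper (parse_python_literal and parse_json are hypothetical names) that returns None instead of raising on rejected input.

```python
import ast
import json


def parse_python_literal(text):
    """Return the literal value, or None if `text` is not a plain Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        return None


def parse_json(text):
    """Return the decoded JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(parse_python_literal("[1, 2, {'a': True}]"))          # [1, 2, {'a': True}]
print(parse_python_literal("__import__('os').system('x')"))  # None: a call, not a literal
print(parse_json('[1, 2, {"a": true}]'))                     # [1, 2, {'a': True}]
print(parse_json("[1, 2, {'a': True}]"))                     # None: Python syntax, not JSON
```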

For serialization across a trust boundary, avoid pickle entirely

# Instead of pickling to send data between services / store in a shared cache:
import json
data = json.dumps({"user_id": 1, "action": "login"})

# For richer/faster binary serialization with the same "no code execution" safety:
# msgpack, protobuf, or a schema-validated format (pydantic models -> JSON)

pickle is appropriate only for trusted, same-process or same-organization data you fully control (e.g., caching your own computed Python objects to local disk) — never for data crossing a trust boundary (received from a network request, a third-party queue, user uploads, or any source you don't fully control end to end).
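A rough, stdlib-only stand-in for the "schema-validated format" idea: serialize an explicit record type to JSON, and on the receiving side check the shape before constructing anything. LoginEvent, encode, and decode are hypothetical names; in practice a library such as pydantic would usually do the validation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LoginEvent:
    user_id: int
    action: str


def encode(event: LoginEvent) -> str:
    # Only plain field values go on the wire; no class or callable references.
    return json.dumps(asdict(event))


def decode(raw: str) -> LoginEvent:
    obj = json.loads(raw)  # yields dict/list/str/int/... only; no code runs
    if not isinstance(obj, dict) or set(obj) != {"user_id", "action"}:
        raise ValueError("unexpected message shape")
    # `type(...) is int` rather than isinstance, so True/False are rejected.
    if type(obj["user_id"]) is not int or not isinstance(obj["action"], str):
        raise ValueError("unexpected field types")
    return LoginEvent(**obj)


wire = encode(LoginEvent(user_id=1, action="login"))
print(wire)          # {"user_id": 1, "action": "login"}
print(decode(wire))  # LoginEvent(user_id=1, action='login')
```

Unlike pickle, the receiver decides which type gets built; the sender can only supply field values, and anything unexpected is rejected before it reaches application code.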

Interview-ready summary: pickle.loads() on untrusted data is equivalent to arbitrary code execution, because a crafted payload's __reduce__ can invoke any callable during deserialization; never unpickle data you don't fully trust. eval/exec on any externally influenced string is the same class of risk. Use json or ast.literal_eval for safe parsing, and JSON/msgpack/protobuf instead of pickle for any data that crosses a trust boundary.

The mistake: hardcoded secrets in source

# DON'T -- committed to git, visible in history forever, even if later "removed"
DATABASE_PASSWORD = "hunter2"
API_KEY = "sk-live-abc123..."

Once a secret is committed to version control, it's in the repository's history permanently (removing it from the latest commit doesn't remove it from history) — anyone with read access to the repo, now or in the future, can find it. This is one of the most common real-world causes of security incidents.

Loading from environment variables

import os

DATABASE_PASSWORD = os.environ["DATABASE_PASSWORD"]   # raises KeyError if missing -- fail loudly
API_KEY = os.environ.get("API_KEY")                     # None if unset; pass a default as the 2nd argument if optional

Environment variables keep secrets out of the codebase entirely — they're injected at deploy/runtime by the hosting platform, CI secrets store, or orchestration system (Kubernetes secrets, systemd environment files), and never touch the repository.

Local development: .env files (never committed)

# .env  (in .gitignore -- never committed!)
DATABASE_PASSWORD=local-dev-password
API_KEY=sk-test-...

# Python: load .env at startup (local development only)
import os
from dotenv import load_dotenv

load_dotenv()   # reads .env into os.environ, for local dev convenience
password = os.environ["DATABASE_PASSWORD"]

python-dotenv loads a local .env file into the process environment, giving the same os.environ access pattern locally as in production — critically, .env must be listed in .gitignore, and a .env.example (with placeholder, non-real values) is committed instead to document what variables are needed.
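Roughly what load_dotenv() does, as a deliberately simplified sketch: load_env_file is a hypothetical name, and the real python-dotenv also handles quoting, export prefixes, multiline values, and variable expansion. Like load_dotenv's default, it does not overwrite variables that are already set, so values injected by the real environment take precedence over the file.

```python
import os
import tempfile


def load_env_file(path, environ=None):
    """Simplified .env loader: KEY=VALUE lines, '#' comments, no overrides."""
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # setdefault: an already-set variable wins over the file's value
            environ.setdefault(key.strip(), value.strip())
    return environ


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write("# local dev only\nDATABASE_PASSWORD=local-dev-password\nAPI_KEY=from-file\n")
    fake_env = {"API_KEY": "already-set"}  # stands in for os.environ
    load_env_file(env_path, fake_env)

print(fake_env["DATABASE_PASSWORD"])  # local-dev-password
print(fake_env["API_KEY"])            # already-set (not overwritten)
```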

Dedicated secrets managers for production

import boto3

client = boto3.client("secretsmanager")
secret = client.get_secret_value(SecretId="prod/db-password")["SecretString"]

For production systems, a dedicated secrets manager (AWS Secrets Manager, HashiCorp Vault, Google Secret Manager) adds capabilities plain environment variables don't offer: access auditing (who fetched which secret, when), automatic rotation, and fine-grained access control per service — worth the added complexity for anything beyond small applications.

Separating secrets from non-secret configuration

# settings.py -- safe to commit; no actual secrets here
import os

DEBUG = os.environ.get("DEBUG", "false").lower() == "true"
DATABASE_HOST = os.environ.get("DATABASE_HOST", "localhost")
DATABASE_PASSWORD = os.environ["DATABASE_PASSWORD"]   # the actual secret, injected at runtime

Non-sensitive configuration (feature flags, hostnames, timeouts) can reasonably live in a committed settings file with sensible defaults; only the genuinely sensitive values need to come exclusively from the environment/secrets manager with no committed default at all.
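One way to enforce that split in code is a small frozen settings object built once at startup: non-secret values get committed defaults, while the secret has none, so a missing value fails immediately. Settings and from_env are hypothetical names, and the environment is passed in as a plain mapping so the sketch can be exercised without touching the real process environment.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    debug: bool
    database_host: str
    database_password: str = field(repr=False)  # keep the secret out of logs/reprs

    @classmethod
    def from_env(cls, environ=os.environ):
        missing = [name for name in ("DATABASE_PASSWORD",) if not environ.get(name)]
        if missing:
            raise RuntimeError(f"missing required settings: {', '.join(missing)}")
        return cls(
            debug=environ.get("DEBUG", "false").lower() == "true",    # non-secret: default
            database_host=environ.get("DATABASE_HOST", "localhost"),  # non-secret: default
            database_password=environ["DATABASE_PASSWORD"],           # secret: no default
        )


settings = Settings.from_env({"DATABASE_PASSWORD": "s3cret", "DEBUG": "True"})
print(settings)  # Settings(debug=True, database_host='localhost')
```

Failing at construction time means a misconfigured deployment crashes on startup rather than on the first request that happens to need the secret.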

A useful checklist

  • Add .env, *.pem, credentials.json, etc. to .gitignore from day one.
  • Use pre-commit secret-scanning hooks (detect-secrets, gitleaks) to catch accidental commits before they happen.
  • Rotate any secret that was ever accidentally committed — removing it from the latest commit is not sufficient; treat it as compromised.
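Tools like detect-secrets and gitleaks ship many tuned rules plus entropy checks; the toy scanner below only illustrates the basic idea of pattern-matching lines before they are committed. The patterns are illustrative assumptions, not either tool's actual rule set.

```python
import re

# Illustrative patterns only; real scanners use far larger, tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "live_api_key": re.compile(r"\bsk-live-[A-Za-z0-9]+"),
    "password_assignment": re.compile(r"""(?i)password\s*=\s*["'][^"']+["']"""),
}


def scan(text):
    """Yield (line_number, rule_name) for every line matching a secret pattern."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                yield lineno, rule


source = '''DEBUG = True
DATABASE_PASSWORD = "hunter2"
API_KEY = "sk-live-abc123"
'''
print(list(scan(source)))  # [(2, 'password_assignment'), (3, 'live_api_key')]
```

A pre-commit hook would run a check like this over the staged diff and abort the commit on any hit.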

Interview-ready summary: Secrets belong in environment variables or a dedicated secrets manager, injected at runtime — never hardcoded in source or committed to version control, since git history is effectively permanent. Use .env files (gitignored) for local development convenience, and treat any secret that was ever committed as compromised and due for rotation.

WSGI: the synchronous standard

def application(environ, start_response):
    status = "200 OK"
    headers = [("Content-Type", "text/plain")]
    start_response(status, headers)
    return [b"Hello, World!"]

A WSGI application is literally a callable matching this signature — environ describes the incoming request, start_response sends back the status/headers, and the return value is the response body. Every production WSGI setup (Flask, Django's traditional mode, running under Gunicorn/uWSGI) is built on this one synchronous, blocking-call contract: one request occupies one worker (thread or process) until it's fully handled.
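Because the contract is just a function call, the application above can be exercised without any server. This sketch uses the standard library's wsgiref.util.setup_testing_defaults to fill in a minimal environ, plus a stand-in start_response (a simplification: real servers also return a write() callable from it) that records what the app sends.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    status = "200 OK"
    headers = [("Content-Type", "text/plain")]
    start_response(status, headers)
    return [b"Hello, World!"]


environ = {}
setup_testing_defaults(environ)  # adds REQUEST_METHOD, PATH_INFO, wsgi.input, ...

captured = {}


def start_response(status, headers, exc_info=None):
    # A real server would write the status line and headers to the socket here.
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(application(environ, start_response))
print(captured["status"], body)  # 200 OK b'Hello, World!'
```

For local experiments, wsgiref.simple_server.make_server("", 8000, application).serve_forever() serves the same callable over real HTTP.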

Why WSGI's synchronous model limits concurrency

import time
from django.http import HttpResponse

# A slow WSGI view blocks the entire worker handling it
def slow_view(request):
    time.sleep(5)         # this worker can't serve ANY other request meanwhile
    return HttpResponse("done")

Scaling a WSGI application to handle more concurrent slow requests means adding more worker processes/threads, each with real memory overhead. Because the WSGI call itself is blocking, a standard WSGI worker thread has no way to cooperatively juggle many in-flight requests the way an event loop can.
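A rough timing sketch of the difference (timings are illustrative and machine-dependent): one synchronous worker handling five requests that each block for 0.1 s takes about 0.5 s in total, while one event loop handling five requests that each await 0.1 s of simulated I/O finishes in about 0.1 s. asyncio.sleep stands in for awaiting a network or database call.

```python
import asyncio
import time

N_REQUESTS = 5
IO_SECONDS = 0.1


def blocking_handler():
    time.sleep(IO_SECONDS)  # blocks the worker thread; nothing else runs
    return "done"


async def async_handler():
    await asyncio.sleep(IO_SECONDS)  # yields to the event loop while "waiting"
    return "done"


def one_sync_worker():
    start = time.perf_counter()
    results = [blocking_handler() for _ in range(N_REQUESTS)]  # strictly one at a time
    return results, time.perf_counter() - start


async def one_event_loop():
    start = time.perf_counter()
    results = await asyncio.gather(*(async_handler() for _ in range(N_REQUESTS)))
    return results, time.perf_counter() - start


sync_results, sync_elapsed = one_sync_worker()
async_results, async_elapsed = asyncio.run(one_event_loop())
print(f"sync worker: {sync_elapsed:.2f}s, event loop: {async_elapsed:.2f}s")
```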

ASGI: the async-capable successor

async def application(scope, receive, send):
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, World!"})

ASGI applications are async callables built around a scope/receive/send message-passing contract and run on an event loop (typically asyncio). A single worker process can therefore hold thousands of concurrent connections open, including long-lived ones like WebSockets (which WSGI has no first-class way to represent at all) or Server-Sent Events, as long as each connection spends most of its time awaiting rather than blocking.
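The ASGI contract can be driven by hand in the same way. In this sketch a hypothetical call_asgi helper plays the server's role: it builds a minimal HTTP scope (real servers include more keys), supplies a receive coroutine that delivers one empty request body, and collects every message the application passes to send.

```python
import asyncio


async def application(scope, receive, send):
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, World!"})


async def call_asgi(app):
    """Hypothetical test driver standing in for an ASGI server such as Uvicorn."""
    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    sent = []

    async def receive():
        # One incoming message: the (empty) request body, with no more to follow.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)  # a real server would write these to the socket

    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_asgi(application))
print([m["type"] for m in messages])  # ['http.response.start', 'http.response.body']
```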

Framework alignment

Framework                     Interface   Notes
Flask (classic)               WSGI        Synchronous by design; can run under Gunicorn
Django (traditional views)    WSGI        Async views supported since Django 3.1; full benefit requires serving under ASGI
FastAPI                       ASGI        Built async-first, typically served by Uvicorn/Hypercorn
Starlette                     ASGI        The lightweight ASGI toolkit FastAPI itself is built on

Why this distinction matters practically

Choosing WSGI vs ASGI isn't just a framework preference — it determines whether the application can efficiently support WebSockets, long-polling, or very high connection counts with modest resource usage. A traditional synchronous CRUD app with modest concurrency needs is often perfectly well served by WSGI (simpler mental model, mature tooling); an app needing real-time features or very high concurrent connection counts benefits substantially from ASGI's async model.

Interview-ready summary: WSGI is the synchronous, one-request-per-worker standard interface that web servers and Python apps have used for about two decades; ASGI is its async successor, enabling a single worker to cooperatively handle many concurrent connections (including long-lived WebSocket connections) via async/await. The choice determines whether the application's concurrency can scale via an event loop or only by adding more OS-level workers.

Django: batteries included

# models.py
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()
    published_at = models.DateTimeField(auto_now_add=True)

Django ships an ORM, a migration system, an admin panel generated automatically from your models, authentication/authorization, a templating engine, and form handling — all designed to work together out of the box, following the "convention over configuration" philosophy. Best fit: content-driven sites, internal tools, and applications where you want a mature, opinionated full-stack framework so you're not assembling and gluing together a dozen separate pieces yourself.

Flask: minimal and unopinionated

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<int:user_id>")
def get_user(user_id):
    return jsonify({"id": user_id, "name": "Ada"})

Flask provides routing and request/response handling and essentially nothing else by default — you choose your own ORM (SQLAlchemy is common), your own auth solution, your own validation library. This flexibility is the point: Flask fits well for small services, APIs with unconventional requirements, or teams that want full control over which pieces go into their stack rather than accepting Django's defaults.

FastAPI: async-first, type-hint-driven

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class User(BaseModel):
    name: str
    age: int

@app.post("/users")
async def create_user(user: User):   # request body validated automatically from the type hint
    return {"id": 1, **user.model_dump()}

FastAPI uses Python type hints (via Pydantic) to automatically validate incoming request data, serialize responses, and generate interactive OpenAPI/Swagger documentation, all with minimal boilerplate compared to manually validating input in Flask/Django views. Being ASGI-native, it handles high-concurrency async workloads (calling other services, databases) efficiently without extra configuration, provided async def endpoints await async-capable libraries; a blocking call inside an async def endpoint still stalls the event loop for every other request.

Comparison at a glance

                      Django                                 Flask                                 FastAPI
Philosophy            batteries-included                     minimal, unopinionated                modern, type-hint-driven
Interface             WSGI (ASGI for async views since 3.1)  WSGI                                  ASGI (async-first)
ORM                   built-in                               bring your own (commonly SQLAlchemy)  bring your own
Auto validation/docs  forms + DRF (for APIs)                 manual / extensions                   built-in (Pydantic + OpenAPI)
Best for              full-stack apps, admin-heavy tools     small services, custom stacks         high-performance JSON APIs
Learning curve        steeper upfront, faster after          gentle, scales with complexity        gentle, strong typing payoff

Practical decision guide

  • Building a content-heavy site or an admin-driven internal tool quickly, backed by a database, and you want the conventions already decided → Django.
  • Building a small service, or you need full control over exactly which libraries make up the stack → Flask.
  • Building a new JSON API, especially one needing high concurrency, automatic validation, and generated docs → FastAPI.

Interview-ready summary: Django trades flexibility for productivity via a complete, opinionated stack (ORM, admin, auth) best suited to full-stack, database-driven apps. Flask trades built-in features for flexibility, suiting small or custom-architected services. FastAPI is the modern default for high-performance async JSON APIs, using type hints for automatic validation and documentation generation.

The DB-API 2.0 standard: a common low-level interface

import sqlite3

conn = sqlite3.connect("app.db")
cursor = conn.cursor()
cursor.execute("SELECT id, name FROM users WHERE age > ?", (18,))
rows = cursor.fetchall()
conn.close()

Every DB-API-compliant driver (sqlite3 built-in, psycopg2/psycopg for PostgreSQL, pymysql/mysqlclient for MySQL) exposes the same shape: connect() returns a connection, .cursor() gets a cursor, .execute(sql, params) runs a query, and .fetchall()/.fetchone() retrieve results — learning this pattern once transfers across database backends.
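The same connect / cursor / execute / fetch cycle end to end, using an in-memory SQLite database so it runs anywhere (the table and rows are made up for illustration). contextlib.closing guarantees the connection is closed, and `with conn:` commits the inserts (or rolls back if the block raises); sqlite3's own connection context manager does not close the connection by itself.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    cursor = conn.cursor()
    with conn:  # commits on success, rolls back if the block raises
        cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
        cursor.executemany(
            "INSERT INTO users (name, age) VALUES (?, ?)",
            [("Ada", 36), ("Linus", 17), ("Grace", 45)],
        )

    cursor.execute("SELECT id, name FROM users WHERE age > ? ORDER BY id", (18,))
    adults = cursor.fetchall()      # list of row tuples

    cursor.execute("SELECT COUNT(*) FROM users")
    (total,) = cursor.fetchone()    # a single row

print(adults)  # [(1, 'Ada'), (3, 'Grace')]
print(total)   # 3
```

Swapping in another DB-API driver mostly changes the connect() arguments and the placeholder style (for example %s instead of ?).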

SQL injection: the critical vulnerability to avoid

# NEVER do this -- string formatting builds SQL from untrusted input
name = "'; DROP TABLE users; --"
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")   # SQL INJECTION!

# ALWAYS use parameterized queries -- the driver handles escaping safely
cursor.execute("SELECT * FROM users WHERE name = ?", (name,))   # safe, regardless of content

String-interpolating user input directly into SQL lets an attacker inject arbitrary SQL (the classic "Bobby Tables" scenario). Parameterized queries (? or %s placeholders; the syntax depends on the driver's paramstyle) keep values out of the SQL text: depending on the driver, the values are either sent to the database separately from the query or quoted correctly by the driver itself, so they are always treated as data regardless of their content. This is not optional hardening; it's the baseline requirement for any code that builds a query using data from outside the program.
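A self-contained demonstration against an in-memory SQLite table (data made up for illustration). sqlite3 refuses to run several statements in one execute() call, so the sketch uses the other classic payload, ' OR '1'='1, which rewrites the WHERE clause instead of appending a DROP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

name = "nobody' OR '1'='1"  # attacker-controlled input

# UNSAFE: the quote in `name` closes the string literal and adds an always-true condition
unsafe_sql = f"SELECT name FROM users WHERE name = '{name}'"
leaked = conn.execute(unsafe_sql).fetchall()

# SAFE: the whole input is bound as one string value and compared literally
safe = conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

print(unsafe_sql)  # SELECT name FROM users WHERE name = 'nobody' OR '1'='1'
print(leaked)      # [('alice',), ('bob',)]
print(safe)        # []
conn.close()
```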

ORMs: working with objects instead of raw SQL

from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session, DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    age: Mapped[int]

engine = create_engine("postgresql://localhost/app")
with Session(engine) as session:
    users = session.scalars(select(User).where(User.age > 18)).all()

SQLAlchemy (and Django's built-in ORM) let you query and manipulate data as Python objects/classes instead of writing raw SQL strings, and automatically parameterize values (so ORM queries are inherently safe from SQL injection for their generated queries). The tradeoff: an abstraction layer that can generate inefficient queries if misused (the classic N+1 query problem — fetching a list, then separately querying related data for each item in a loop) and a learning curve of its own.
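The N+1 pattern is easiest to see by counting statements. This sketch uses plain sqlite3 rather than an ORM (an ORM's lazy loading of related rows produces the same shape of traffic), with Connection.set_trace_callback recording every SQL statement executed; the schema and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO books (author_id, title) VALUES (1, 'Notes'), (2, 'COBOL'), (3, 'Kernel');
""")

statements = []
conn.set_trace_callback(statements.append)  # called with each SQL statement run

# N+1: one query for the list, then one more query per author
authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
n_plus_one = {
    name: [t for (t,) in conn.execute("SELECT title FROM books WHERE author_id = ?", (aid,))]
    for aid, name in authors
}
n_plus_one_count = len(statements)

statements.clear()
# Single JOIN: the same data in one round trip
joined = {}
for name, title in conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id ORDER BY a.id"
):
    joined.setdefault(name, []).append(title)
join_count = len(statements)

print(n_plus_one_count, join_count)  # 4 1
print(n_plus_one == joined)          # True
conn.set_trace_callback(None)
conn.close()
```

With N authors the first approach issues N + 1 queries, which is why ORMs offer eager-loading options (joins or batched "IN" queries) for related data.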

Connection pooling for production

engine = create_engine("postgresql://localhost/app", pool_size=10, max_overflow=5)

Opening a new database connection per request is expensive; production applications use a connection pool (built into SQLAlchemy's engine, or a standalone pooler like PgBouncer for Postgres) that reuses a fixed set of open connections across requests, dramatically reducing per-request connection overhead.
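To make "reuses a fixed set of open connections" concrete, here is a deliberately minimal pool built on queue.Queue. SimplePool is a hypothetical class, not SQLAlchemy's pool implementation, and it skips the health checks, overflow handling, and timeouts real pools provide; sqlite3 connections stand in for network database connections.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Hypothetical fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(factory())
            self.opened += 1

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks if every connection is checked out
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next request instead of closing


# check_same_thread=False because a shared pool may hand connections to other threads
pool = SimplePool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)

seen = set()
for _ in range(10):  # ten "requests"
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
        seen.add(id(conn))

print(pool.opened, len(seen))  # 2 2
```

Ten requests are served while only two connections are ever opened; the connection cost is paid once per pool slot instead of once per request.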

Async database access

import asyncpg

async def get_users(pool):
    async with pool.acquire() as conn:
        return await conn.fetch("SELECT * FROM users WHERE age > $1", 18)

Standard DB-API drivers are synchronous/blocking, which would stall an asyncio event loop — async applications (FastAPI, etc.) use async-native drivers (asyncpg, aiomysql) or an async-compatible ORM layer (SQLAlchemy's async engine) instead.
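When no async driver is available for a given database, one common stopgap (not a substitute for a native async driver) is to push the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+), so the event loop keeps serving other tasks while the query runs. The query_users function and its in-memory data are hypothetical.

```python
import asyncio
import sqlite3


def query_users(min_age):
    """Blocking DB-API call; runs entirely inside one worker thread."""
    conn = sqlite3.connect(":memory:")  # created and used in the same thread
    try:
        conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", [("Ada", 36), ("Linus", 17)])
        return conn.execute("SELECT name FROM users WHERE age > ?", (min_age,)).fetchall()
    finally:
        conn.close()


async def handler():
    # The blocking query runs in a thread; this coroutine just awaits its result.
    return await asyncio.to_thread(query_users, 18)


rows = asyncio.run(handler())
print(rows)  # [('Ada',)]
```

Each call still occupies a thread from the default executor, so under heavy load this behaves more like a thread-per-request server than a true async driver.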

Interview-ready summary: DB-API 2.0 gives a consistent low-level interface across database drivers; most applications build on an ORM (SQLAlchemy, Django ORM) for productivity, understanding the tradeoff of an abstraction layer that can hide inefficient query patterns like N+1. Regardless of layer, always use parameterized queries — never string-format untrusted input into SQL — to avoid SQL injection.