Testing, Tooling & Packaging


Plain functions, plain assert

# test_math.py
def add(a, b):
    return a + b

def test_add_positive_numbers():
    assert add(2, 3) == 5

def test_add_negative_numbers():
    assert add(-1, -1) == -2

No self.assertEqual(...) boilerplate (as in unittest) — pytest rewrites plain assert statements at import time to produce detailed failure output (showing both sides of a failed comparison) without any special assertion methods.

Fixtures: reusable, composable setup/teardown

import sqlite3

import pytest

@pytest.fixture
def db_connection():
    conn = sqlite3.connect(":memory:")   # setup: runs before the test
    yield conn                           # the value injected into the test
    conn.close()                         # teardown: runs after the test (even if it failed)

def test_query(db_connection):           # requested by parameter name
    result = db_connection.execute("SELECT 1").fetchone()
    assert result == (1,)

A fixture requested by a test function's parameter name is automatically resolved, run, and injected by pytest — yield splits it into setup (before) and teardown (after), with teardown guaranteed to run even if the test fails. Fixtures can depend on other fixtures, be scoped (scope="module", "session") to control how often they're recreated, and be shared across a whole directory via a conftest.py.
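pytest's real fixture machinery also handles scopes, caching, and error reporting, so the following is only a simplified standard-library model of the yield mechanism just described. The names resource_fixture and run_with_fixture are hypothetical. Setup runs up to yield, the yielded value goes to the test, and the generator is resumed in a finally block, so teardown runs even when the test's assertion fails.

```python
# Simplified model of a yield fixture (NOT pytest's actual implementation).
events = []

def resource_fixture():
    events.append("setup")          # everything before `yield` is setup
    yield {"connected": True}       # the value handed to the test
    events.append("teardown")       # everything after `yield` is teardown

def run_with_fixture(fixture, test):
    gen = fixture()
    value = next(gen)               # run setup, receive the yielded value
    try:
        test(value)
        return "passed"
    except AssertionError:
        return "failed"
    finally:
        next(gen, None)             # resume past `yield`: teardown runs pass or fail

def failing_test(conn):
    assert conn["connected"] is False

outcome = run_with_fixture(resource_fixture, failing_test)
print(outcome, events)              # failed ['setup', 'teardown']
```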

parametrize: one test, many inputs

import pytest

@pytest.mark.parametrize("a, b, expected", [
    (2, 3, 5),
    (-1, -1, -2),
    (0, 0, 0),
])
def test_add(a, b, expected):
    assert add(a, b) == expected

This runs test_add three times with three different argument sets, reported as three separate test results — far more maintainable than copy-pasting near-identical test functions for each input case, and each case's failure is reported independently.

Marks: controlling test execution

import sys

import pytest

@pytest.mark.skip(reason="not implemented yet")
def test_future_feature():
    ...

@pytest.mark.skipif(sys.platform == "win32", reason="POSIX-only")
def test_unix_permissions():
    ...

@pytest.mark.xfail(reason="known bug, see #123")
def test_known_broken():
    assert broken_function() == expected   # placeholders for the buggy code and its correct result

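skip/skipif exclude a test from the run (it is reported as skipped); xfail runs the test but doesn't fail the suite if it fails as expected. If an xfail test unexpectedly passes, it is reported as XPASS, and with strict=True that XPASS fails the suite, signalling that the known issue is fixed and the mark can be removed. This is useful for tracking known issues without either deleting the test or leaving the suite red.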

Organizing a test suite

tests/
    conftest.py       # shared fixtures, available to every test in this directory tree
    test_models.py
    test_views.py
    integration/
        test_api.py

conftest.py files are auto-discovered by pytest and their fixtures are available to every test in the same directory and subdirectories, without any import — the standard way to share setup logic across a test suite.

Interview-ready summary: pytest tests are plain assert-based functions with no required base class; fixtures provide composable, scoped setup/teardown injected by parameter name; parametrize runs one test body against many input sets as separate reported cases; and marks (skip/skipif/xfail) control which tests run and how failures are interpreted.


Basic Mock: recording calls, configuring return values

from unittest.mock import Mock

mock_client = Mock()
mock_client.get_user.return_value = {"id": 1, "name": "Ada"}

result = mock_client.get_user(user_id=1)
result                                # {'id': 1, 'name': 'Ada'}

mock_client.get_user.assert_called_once_with(user_id=1)   # verify how it was called
mock_client.get_user.call_count        # 1

Mock (and MagicMock, which additionally supports dunder methods like __len__/__iter__) auto-creates attributes/methods on access and records every call made to them — assert_called_with, assert_called_once, and .call_args/.call_args_list let you verify the code under test interacted with the dependency correctly, not just that it produced the right final output.
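A short sketch of the Mock versus MagicMock difference and of call_args_list, using a hypothetical save_all function as the code under test:

```python
from unittest.mock import MagicMock, Mock, call

# A plain Mock does not configure magic methods, so len() raises TypeError.
plain = Mock()
try:
    len(plain)
    plain_supports_len = True
except TypeError:
    plain_supports_len = False

# MagicMock preconfigures dunders; their return values are set like any other.
repo = MagicMock()
repo.__len__.return_value = 3
repo.__iter__.return_value = ["a", "b", "c"]   # a fresh iterator is made on each iteration

# Hypothetical code under test: saves each item through the dependency.
def save_all(store, items):
    for item in items:
        store.save(item, overwrite=False)

save_all(repo, ["x", "y"])

length = len(repo)
items = list(repo)
print(plain_supports_len, length, items)   # False 3 ['a', 'b', 'c']
print(repo.save.call_args_list)            # [call('x', overwrite=False), call('y', overwrite=False)]
repo.save.assert_called_with("y", overwrite=False)   # checks only the most recent call
```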

patch(): swapping out a real dependency temporarily

# module: app/weather.py
import requests

def get_temperature(city):
    resp = requests.get(f"https://api.weather.com/{city}")
    return resp.json()["temp"]

# test module
from unittest.mock import patch

from app.weather import get_temperature

@patch("app.weather.requests.get")   # patch WHERE IT'S USED, not where it's defined
def test_get_temperature(mock_get):
    mock_get.return_value.json.return_value = {"temp": 72}
    assert get_temperature("boston") == 72
    mock_get.assert_called_once_with("https://api.weather.com/boston")

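The rule to remember: patch the name where it's looked up, not where it's originally defined. With import requests, as here, app.weather reaches get through the shared requests module at call time, so patch("app.weather.requests.get") replaces exactly the same attribute as patch("requests.get"), for every caller, for the duration of the test. The distinction becomes decisive if the module instead does from requests import get (and calls get(...)): that binds get into app.weather's own namespace at import time, so only patch("app.weather.get") affects get_temperature, while patching requests.get leaves it calling the real function.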
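To see the from-import case concretely, here is a self-contained sketch that builds two throwaway in-memory modules (fake_http and fake_weather are hypothetical names) instead of using real files or the network:

```python
import sys
import types
from unittest.mock import patch

# "Library" module that defines fetch.
http_mod = types.ModuleType("fake_http")
exec("def fetch(url):\n    return 'real response from ' + url\n", http_mod.__dict__)
sys.modules["fake_http"] = http_mod

# "Application" module that uses `from ... import`, copying the name locally.
weather_mod = types.ModuleType("fake_weather")
exec(
    "from fake_http import fetch\n"
    "def get_report(city):\n"
    "    return fetch('https://example.invalid/' + city)\n",
    weather_mod.__dict__,
)
sys.modules["fake_weather"] = weather_mod

# Patching where fetch is DEFINED: fake_weather still holds the original function.
with patch("fake_http.fetch", return_value="mocked"):
    defined_site_result = weather_mod.get_report("boston")

# Patching where fetch is LOOKED UP: get_report now calls the mock.
with patch("fake_weather.fetch", return_value="mocked"):
    used_site_result = weather_mod.get_report("boston")

print(defined_site_result)   # real response from https://example.invalid/boston
print(used_site_result)      # mocked
```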

Context-manager form (for patching only part of a test)

def test_something():
    with patch("app.weather.requests.get") as mock_get:
        mock_get.return_value.json.return_value = {"temp": 72}
        assert get_temperature("boston") == 72
    # requests.get is back to normal here, outside the `with` block

Mocking a raised exception

@patch("app.weather.requests.get")
def test_get_temperature_handles_failure(mock_get):
    mock_get.side_effect = ConnectionError("network down")
    with pytest.raises(ConnectionError):
        get_temperature("boston")

side_effect set to an exception class/instance makes the mock raise it when called — the standard way to test error-handling paths without needing to actually trigger a real failure (a downed network, a real database outage).
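side_effect also accepts an iterable, consumed one item per call, which is how varying results across calls are usually tested; exception instances inside it are raised instead of returned. A sketch with a hypothetical fetch_with_retry helper as the code under test:

```python
from unittest.mock import Mock

# Hypothetical code under test: retry a flaky call a fixed number of times.
def fetch_with_retry(client, url, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return client.get(url)
        except ConnectionError as exc:
            last_error = exc
    raise last_error

client = Mock()
# Two failures, then a successful response.
client.get.side_effect = [ConnectionError("blip"), ConnectionError("blip"), {"temp": 72}]
result = fetch_with_retry(client, "https://api.weather.com/boston")
print(result, client.get.call_count)   # {'temp': 72} 3

# A single exception (not an iterable) is raised on every call.
always_down = Mock()
always_down.get.side_effect = ConnectionError("down")
try:
    fetch_with_retry(always_down, "https://api.weather.com/boston")
    gave_up = False
except ConnectionError:
    gave_up = True
```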

spec/autospec: catching typos in mocked interfaces

from unittest.mock import create_autospec

class RealClient:                 # stand-in for the real dependency
    def get_user(self, user_id): ...

mock_client = create_autospec(RealClient)
mock_client.get_uesr(1)   # AttributeError -- typo caught immediately, unlike a bare Mock()

A bare Mock() accepts any attribute or method name silently, which can hide a typo in test code (calling a method that doesn't actually exist on the real object): the test keeps passing while exercising nothing real, and the mismatch surfaces only later, possibly in production. create_autospec(RealClient), or Mock(spec=RealClient), constrains the mock to the real object's actual interface, catching such mismatches at test time.
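Autospeccing goes further than checking attribute names: autospecced methods also check call signatures against the real ones. A sketch, with RealClient again a hypothetical stand-in:

```python
from unittest.mock import Mock, create_autospec

class RealClient:
    """Hypothetical stand-in for a real API client."""

    def get_user(self, user_id):
        raise NotImplementedError("would call the real service")

# Bare Mock: wrong arity is accepted silently.
loose = Mock()
loose.get_user(1, 2, 3)

# Autospec: attributes AND method signatures must match RealClient.
client = create_autospec(RealClient, instance=True)
client.get_user.return_value = {"id": 1, "name": "Ada"}
user = client.get_user(1)

arity_error = None
try:
    client.get_user(1, 2)          # get_user takes one argument besides self
except TypeError as exc:
    arity_error = exc

typo_error = None
try:
    client.get_uesr(1)             # misspelled: not part of RealClient's interface
except AttributeError as exc:
    typo_error = exc

print(user)                        # {'id': 1, 'name': 'Ada'}
print(type(arity_error).__name__, type(typo_error).__name__)   # TypeError AttributeError
```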

Interview-ready summary: Mock/MagicMock create call-recording fake objects; patch() swaps a real dependency for a mock at the import path where it's used, for the duration of a test. side_effect simulates exceptions/varying return values across calls, and autospec/spec constrain a mock to the real object's actual interface to catch typos that a bare Mock() would silently accept.

The core problem all of these solve: dependency isolation

# Without isolation: installing one project's dependencies can break another
pip install requests==2.25.1   # for project A
pip install requests==2.31.0   # for project B -- replaces 2.25.1, so A may now break!

Every project needs its own independent set of installed packages, so version requirements from unrelated projects never collide.

venv: the built-in baseline

python -m venv .venv
source .venv/bin/activate      # on Windows: .venv\Scripts\activate
pip install requests

venv creates a directory with its own Python interpreter symlink and site-packages, isolated from the system Python — no extra installation needed since it ships with Python 3.3+. It only manages the environment itself; you still track dependencies manually (typically in a requirements.txt you maintain by hand or via pip freeze).
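The standard-library venv module exposes the same machinery programmatically. A minimal sketch in a temporary directory (pip is skipped for speed, unlike the CLI default; read_pyvenv_cfg is a hypothetical helper) that checks the environment's own interpreter and its pyvenv.cfg setting that keeps system site-packages out:

```python
import os
import tempfile
import venv

def read_pyvenv_cfg(env_dir):
    """Parse pyvenv.cfg's simple `key = value` lines into a dict."""
    settings = {}
    with open(os.path.join(env_dir, "pyvenv.cfg"), encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, ".venv")
    # Like `python -m venv`: symlink the interpreter except on Windows.
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)

    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    has_own_python = os.path.exists(os.path.join(env_dir, bin_dir, exe))
    cfg = read_pyvenv_cfg(env_dir)

print(has_own_python)                        # True
print(cfg["include-system-site-packages"])   # false
```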

virtualenv: the third-party predecessor

Functionally similar to venv but predates it, supports older Python 2 environments, and historically offered a few extra features/faster environment creation — largely superseded by the built-in venv for pure Python 3 projects, but still used in some legacy toolchains.

pipenv: environment + dependency management combined

pipenv install requests
pipenv install --dev pytest
pipenv shell

Combines environment creation with a Pipfile/Pipfile.lock that pins exact resolved versions (including transitive dependencies) for reproducible installs across machines — addressing venv's gap of "you manage the dependency list yourself."

poetry: dependency management + packaging + publishing

poetry init
poetry add requests
poetry add --group dev pytest
poetry install
poetry build      # builds a wheel/sdist
poetry publish     # publishes to PyPI

poetry centralizes dependency declaration, environment management, a lockfile (poetry.lock) for reproducibility, and the packaging/publishing workflow (building wheels, publishing to PyPI) in one tool built around a single pyproject.toml. It is a common modern choice for library and application projects that need all of this together.

conda: a different scope entirely

conda create -n myenv python=3.11 numpy scipy
conda activate myenv

conda manages environments that can include non-Python dependencies too (compiled C/Fortran libraries, CUDA toolkits, compilers) — this is its key differentiator, and why it dominates in data science/scientific computing where packages like NumPy/SciPy historically needed complex native builds that pip alone couldn't easily manage across platforms.

Choosing one

Need | Tool
Just isolate a Python environment, manage deps manually | venv + requirements.txt
Reproducible installs with a lockfile, simple workflow | pipenv
Full library/app lifecycle: deps, lockfile, packaging, publishing | poetry (or modern pip + pyproject.toml + pip-tools)
Non-Python dependencies (native libs, data science stack) | conda

Interview-ready summary: venv is the built-in, minimal environment isolator; virtualenv is its older third-party equivalent; pipenv and poetry add dependency locking and (for poetry) packaging on top; conda solves a broader problem — managing non-Python system dependencies alongside Python packages — which is why it's the default in data science despite overlapping with the others for pure-Python use cases.

What type hints look like, and what they don't do at runtime

def greet(name: str) -> str:
    return f"hello, {name}"

greet(42)   # runs FINE at runtime -- Python doesn't check the hint!
            # f"hello, {42}" -> 'hello, 42' -- no error, just probably not intended

Type hints are not enforced by the Python interpreter itself — they're metadata, stored on the function (greet.__annotations__), that tools can optionally read and check. Calling greet(42) doesn't raise TypeError on its own; catching this mismatch requires running a static type checker separately.
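A quick check of both claims: the annotations sit in a plain dict on the function, and nothing consults them at call time. typing.get_type_hints is the standard way to read them with any string annotations resolved:

```python
import typing

def greet(name: str) -> str:
    return f"hello, {name}"

print(greet.__annotations__)          # {'name': <class 'str'>, 'return': <class 'str'>}
print(typing.get_type_hints(greet))   # same mapping, with string annotations resolved
print(greet(42))                      # hello, 42 (no TypeError: nothing checks the hint)
```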

Catching errors before running the code

def get_user(user_id: int) -> dict | None:
    ...

user = get_user("123")     # mypy: error: Argument 1 to "get_user" has incompatible type "str"; expected "int"

user = get_user(123)
print(user["name"])          # mypy/pyright: error -- user may be None, and None is not subscriptable
                               # (get_user's return type says it might be None!)

mypy/pyright statically analyze the code (no execution needed) and flag both the wrong-type argument and the missing None check. The second is a genuinely common real-world bug class (forgetting that a function can return None) that static typing surfaces at review/CI time instead of as a production TypeError ('NoneType' object is not subscriptable).
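The fix a checker pushes toward is an explicit None check, which narrows the type for the rest of the function. A runnable sketch, with a hypothetical in-memory USERS dict standing in for a database:

```python
from __future__ import annotations

# Hypothetical data store standing in for a real database.
USERS: dict[int, dict[str, str]] = {123: {"name": "Ada"}}

def get_user(user_id: int) -> dict[str, str] | None:
    return USERS.get(user_id)

def display_name(user_id: int) -> str:
    user = get_user(user_id)
    if user is None:                  # after this check, checkers treat user as a dict
        return "<unknown user>"
    return user["name"]

print(display_name(123))   # Ada
print(display_name(999))   # <unknown user>
```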

Real benefits beyond bug-catching

  • IDE autocomplete/navigation improves dramatically — the editor knows a variable's type and can suggest its actual methods.
  • Self-documenting signatures: a signature like def process(items: list[Order]) -> Summary communicates intent far better than an untyped signature plus a docstring that can drift out of sync.
  • Safer refactoring — renaming a field or changing a function's signature immediately surfaces every call site the type checker disagrees with.

The limitations

from typing import Any

def process(data: Any) -> Any:      # Any opts OUT of checking entirely
    return data.whatever_method()    # never flagged, regardless of what `data` actually is

import third_party_untyped_lib       # hypothetical library with no type stubs or py.typed marker
result = third_party_untyped_lib.do_thing()   # typed as Any once mypy's missing-stubs error is silenced

  • Any disables checking for anything it touches: a common escape hatch that, if overused, silently reduces how much of the codebase is actually protected.
  • Untyped third-party code (no type stubs, no py.typed marker) becomes a blind spot: mypy reports a missing-stubs error at the import, and once that is silenced (commonly via ignore_missing_imports), everything from the module is typed as Any at every boundary with that library.
  • No runtime enforcement — a caller that ignores type errors (or code paths the type checker can't see, like getattr-based dynamic dispatch, or unchecked deserialized JSON) can still pass the wrong type through at runtime; for that, use runtime validation libraries (pydantic) at actual system boundaries.
  • Gradual, not all-or-nothing — a codebase can be partially typed, which is often the pragmatic starting point, but means coverage (and therefore protection) varies file by file until fully adopted.
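json.loads is typed as returning Any, so a static checker accepts whatever it produces. A minimal sketch of validating at the boundary instead, hand-rolled with the standard library as a stand-in for what pydantic automates (Order and parse_order are hypothetical names):

```python
import json
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    amount: float

def parse_order(raw: str) -> Order:
    """Check untrusted JSON at runtime before it reaches typed code."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    order_id = data.get("order_id")
    amount = data.get("amount")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(order_id, int) or isinstance(order_id, bool):
        raise ValueError(f"order_id must be an integer, got {order_id!r}")
    if not isinstance(amount, (int, float)) or isinstance(amount, bool):
        raise ValueError(f"amount must be a number, got {amount!r}")
    return Order(order_id=order_id, amount=float(amount))

print(parse_order('{"order_id": 7, "amount": 19.5}'))   # Order(order_id=7, amount=19.5)

rejected_message = ""
try:
    parse_order('{"order_id": "7", "amount": 19.5}')    # a string where an int belongs
except ValueError as exc:
    rejected_message = str(exc)
print(rejected_message)                                  # order_id must be an integer, got '7'
```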

Interview-ready summary: Type hints let mypy/pyright catch type mismatches and missed-None bugs statically, before running the code, with zero runtime cost and better IDE support as a side benefit — but they're not enforced at runtime, so Any, untyped dependencies, and unchecked dynamic code remain blind spots; use runtime validation (pydantic) at actual data-entry boundaries where static checking alone isn't sufficient.

The old way: setup.py as executable code

# setup.py (legacy)
from setuptools import setup

setup(
    name="myproject",
    version="1.0.0",
    install_requires=["requests>=2.0"],
)

Because setup.py is a Python script, building or installing a package required actually executing arbitrary code just to read its metadata — a real reproducibility and security concern (a malicious or broken setup.py could do anything at install time, and different environments could produce different results running the "same" setup.py).

The modern way: declarative pyproject.toml

# pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "myproject"
version = "1.0.0"
dependencies = ["requests>=2.0"]
requires-python = ">=3.9"

[project.optional-dependencies]
dev = ["pytest", "mypy", "ruff"]

[project.scripts]
mycli = "myproject.cli:main"

This is plain, static TOML data — no code execution needed to read project metadata, dependencies, or entry points. [build-system] (PEP 518) declares what's needed to build the project before even importing setuptools; [project] (PEP 621) standardizes metadata that used to be scattered across setup.py/setup.cfg/Pipfile in tool-specific formats.

Why this matters: one format, many tools

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.0"

[tool.pytest.ini_options]
testpaths = ["tests"]

[tool.ruff]
line-length = 100

Beyond the standardized [project] table, tools can add their own [tool.*] sections in the same file. poetry, pytest, ruff, black, and mypy all support configuration directly in pyproject.toml, consolidating what used to live in setup.cfg, pytest.ini, and various other tool-specific config files into one place. (flake8 itself still does not read pyproject.toml natively; ruff, configured under [tool.ruff], can take over the role of a .flake8 file.)

The evolution, in short

  1. distutils/setup.py (original, Python 2 era) — code-based, minimal metadata standardization.
  2. setuptools + setup.cfg — moved some metadata to a declarative INI-style file, but setup.py was often still required as a shim.
  3. pyproject.toml (current standard, PEP 518/621) — fully declarative project metadata and build-system requirements; setup.py is no longer required at all for most modern projects (though setuptools can still use one for complex custom build logic).

Building and publishing today

python -m build          # builds a wheel (.whl) and sdist (.tar.gz) from pyproject.toml
python -m twine upload dist/*   # publishes to PyPI

The build package is the modern, backend-agnostic way to build a distributable package purely from pyproject.toml, regardless of which build backend (setuptools, hatchling, poetry-core) the project uses.

Interview-ready summary: pyproject.toml replaced the historical mix of executable setup.py and scattered config files with one declarative, standardized file for build requirements, project metadata, and dependencies — read by pip, build, and virtually every modern Python tool, eliminating the need to execute arbitrary code just to discover a package's metadata.