What's the difference between unit tests and integration tests, and how do you measure coverage?
Quick Answer
A **unit test** verifies a single function/class in isolation, typically mocking its dependencies, and should run fast (milliseconds) and deterministically. An **integration test** verifies multiple components working together (a real database, a real HTTP call to another service), catching issues unit tests with mocks can miss (wrong SQL, a misunderstood API contract) at the cost of being slower and sometimes flakier. **Coverage** (via `coverage.py`/`pytest-cov`) measures what percentage of code lines/branches actually executed during tests — a useful signal for finding untested code, but not a proxy for test quality by itself.
Detailed Answer
Unit tests: isolated, fast, deterministic

    from unittest.mock import Mock


    def calculate_total(cart, tax_service):
        subtotal = sum(item.price for item in cart)
        return subtotal + tax_service.calculate_tax(subtotal)


    def test_calculate_total():
        tax_service = Mock()
        tax_service.calculate_tax.return_value = 8.0
        cart = [Mock(price=50), Mock(price=42)]
        assert calculate_total(cart, tax_service) == 100.0  # 92 + 8, using the mocked tax

`tax_service` is mocked, so this test verifies `calculate_total`'s own logic in complete isolation: it runs in milliseconds, never touches a network or a real tax-calculation service, and fails only if `calculate_total` itself has a bug (not if the real tax service is down).
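
Beyond the return value, a mock also records how it was called, so a unit test can check the interaction with the dependency as well. The following is a minimal sketch that reuses `calculate_total` from above; the test name is illustrative, and the prices are the same arbitrary values as before.

```python
from unittest.mock import Mock


def calculate_total(cart, tax_service):
    subtotal = sum(item.price for item in cart)
    return subtotal + tax_service.calculate_tax(subtotal)


def test_calculate_total_passes_subtotal_to_tax_service():
    tax_service = Mock()
    tax_service.calculate_tax.return_value = 8.0
    cart = [Mock(price=50), Mock(price=42)]

    calculate_total(cart, tax_service)

    # The mock recorded the call, so the interaction itself can be asserted:
    # exactly one call, with the subtotal 50 + 42 = 92 as the only argument.
    tax_service.calculate_tax.assert_called_once_with(92)
```

This still says nothing about how the real tax service behaves; it only pins down what `calculate_total` asks of it.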
Integration tests: real components, real contracts

    def test_tax_service_integration(real_tax_service):
        # uses the ACTUAL tax service (or a realistic test double, e.g. a test DB)
        result = real_tax_service.calculate_tax(100.0)
        assert result == 8.0  # verifies the REAL contract, not an assumed mock behavior

If the unit test's assumption about how `tax_service.calculate_tax` behaves is wrong (e.g., it actually returns a `Decimal`, not a `float`, or takes different arguments), the unit test won't catch that; only an integration test exercising the real dependency will. Integration tests are slower and sometimes flakier (network, timing, external state), so they're typically run less frequently (e.g., in CI, not on every local save) and in smaller numbers than unit tests.
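
The `Decimal`-versus-`float` mismatch can be made concrete. In the sketch below, `DecimalTaxService` is a hypothetical stand-in for a real service that returns `Decimal`, and the cart uses float prices; both are assumptions made for illustration. The mocked unit test passes, while the same function fails against the stand-in because Python does not support adding a `float` to a `Decimal`.

```python
from decimal import Decimal
from unittest.mock import Mock


def calculate_total(cart, tax_service):
    subtotal = sum(item.price for item in cart)
    return subtotal + tax_service.calculate_tax(subtotal)


class DecimalTaxService:
    """Hypothetical stand-in for a real service that returns Decimal."""

    def calculate_tax(self, amount):
        return Decimal("8.00")


def test_unit_with_float_mock_passes():
    tax_service = Mock()
    tax_service.calculate_tax.return_value = 8.0  # the test author's assumption
    cart = [Mock(price=50.0), Mock(price=42.0)]
    assert calculate_total(cart, tax_service) == 100.0


def test_real_return_type_breaks_calculate_total():
    cart = [Mock(price=50.0), Mock(price=42.0)]
    try:
        calculate_total(cart, DecimalTaxService())
    except TypeError:
        pass  # float + Decimal is unsupported, so the mocked assumption was wrong
    else:
        raise AssertionError("expected a TypeError from float + Decimal")
```

The second function is written to demonstrate the failure rather than to stay in a suite; in practice an integration test would simply call the real dependency and fail loudly until the mismatch is fixed.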
The testing pyramid: why the mix matters

          /\
         /  \          <- few end-to-end tests (slow, high confidence)
        /----\
       /      \        <- some integration tests
      /--------\
     /          \      <- many unit tests (fast, cheap, run constantly)
    /____________\

A healthy suite has many fast unit tests that give quick feedback on logic, plus a smaller number of integration and end-to-end tests that verify the pieces actually fit together. Relying on only one type leaves a real gap: a suite of mocked unit tests can pass while the real integration is broken, and a suite made up only of integration tests is often too slow to run on every change and hard to debug when something fails.
Measuring coverage

    pytest --cov=myapp --cov-report=term-missing

    Name                 Stmts   Miss  Cover   Missing
    --------------------------------------------------
    myapp/services.py       42      3    93%   57-59
    myapp/models.py         18      0   100%

Coverage tools monitor the code under test as it runs and report which lines (and, with branch coverage, which conditional branches) actually executed during the test run. The "Missing" column lists the line numbers that never executed, which is a good starting point for finding untested code paths.
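
Line coverage and branch coverage can disagree. In the sketch below, `shipping_cost` is a hypothetical function invented for illustration. The single test executes every line, so line coverage reports 100%, yet the path where the `if` body is skipped never runs. With pytest-cov, branch measurement is enabled by adding `--cov-branch` to the command line.

```python
def shipping_cost(is_member):
    cost = 5.0
    if is_member:
        cost = 0.0
    return cost


def test_member_ships_free():
    # Executes every line of shipping_cost, so line coverage is complete.
    # The path where is_member is False (the if body is skipped) never runs;
    # only branch coverage reports that as a partially covered branch.
    assert shipping_cost(is_member=True) == 0.0
```

Adding a second test with `is_member=False` would close the gap that branch coverage reveals.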
Why coverage percentage isn't a quality proxy

    def divide(a, b):
        return a / b


    def test_divide():
        divide(10, 2)  # covers the line, asserts NOTHING -- 100% coverage, useless test

This test achieves 100% line coverage for `divide` while verifying absolutely nothing about correctness (no `assert`, and it never exercises the `b == 0` case). High coverage tells you code ran during tests, not that its behavior was actually verified. It's a useful signal for finding completely untested code, not a target to chase for its own sake.
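
For contrast, here is one way the same function could be tested so that the covered line is also verified. This is a sketch using only plain asserts and the standard library; in a pytest suite the zero-division case would more commonly be written with `pytest.raises(ZeroDivisionError)`. The test names are illustrative.

```python
def divide(a, b):
    return a / b


def test_divide_returns_quotient():
    # Same coverage as before, but now the result is actually checked.
    assert divide(10, 2) == 5
    assert divide(1, 4) == 0.25


def test_divide_by_zero_raises():
    try:
        divide(10, 0)
    except ZeroDivisionError:
        pass  # expected: Python raises on division by zero
    else:
        raise AssertionError("expected ZeroDivisionError")
```

The coverage number is identical to the assert-free version, which is exactly why the percentage alone cannot distinguish the two.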
Interview-ready summary: Unit tests check an individual unit's logic quickly and in isolation (usually with mocks); integration tests verify that real components honor their actual contracts together, at higher cost but catching what mocked assumptions can miss. Coverage measures what code executed during tests, which is useful for finding gaps but is not itself evidence that the executed code was meaningfully verified.