What is `concurrent.futures`, and how do `ThreadPoolExecutor` and `ProcessPoolExecutor` differ?
Quick Answer
`concurrent.futures` provides a unified, high-level API (`Executor.submit()`/`.map()`, returning `Future` objects) for running work asynchronously, backed by either a pool of **threads** (`ThreadPoolExecutor`, for I/O-bound work) or a pool of **processes** (`ProcessPoolExecutor`, for CPU-bound work) — the same calling code works with either, so switching between them is usually a one-line change.
Detailed Answer
A unified API over threads or processes
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def work(n):
    return n * n

if __name__ == "__main__":
    # Thread-backed: good for I/O-bound work
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, range(10)))

    # Process-backed: good for CPU-bound work; SAME calling code.
    # The __main__ guard is required where workers are started with "spawn"
    # (e.g. Windows, macOS), because each worker re-imports this module.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, range(10)))
```
Both executors expose the same interface (submit, map, context-manager
shutdown), so the choice of thread vs. process pool is largely a config
decision based on whether the workload is I/O-bound or CPU-bound, not a
rewrite of the calling code.
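To make the "config decision" point concrete, here is a minimal sketch in which the executor class is picked from a string setting and the calling code never changes. The `EXECUTORS` mapping, `square`, and the `run_all` helper are hypothetical names for illustration, not part of `concurrent.futures`.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical config mapping: the only place the thread-vs-process choice lives.
EXECUTORS: dict[str, type[Executor]] = {
    "thread": ThreadPoolExecutor,
    "process": ProcessPoolExecutor,
}


def square(n: int) -> int:
    # Defined at module level so a process pool can pickle it by name.
    return n * n


def run_all(kind: str, inputs: list[int], max_workers: int = 4) -> list[int]:
    """Hypothetical helper: run square() over inputs with the configured executor."""
    executor_cls = EXECUTORS[kind]
    with executor_cls(max_workers=max_workers) as pool:
        # Identical calling code for both executor kinds.
        return list(pool.map(square, inputs))


if __name__ == "__main__":
    print(run_all("thread", [1, 2, 3]))   # [1, 4, 9]
    print(run_all("process", [1, 2, 3]))  # [1, 4, 9]
```

Keeping the choice in one mapping means switching a workload from threads to processes is a settings change, as long as the task function and its data are picklable.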
Future objects: getting results as they complete
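```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=4) as pool:  # reuses work() from the block above
    futures = [pool.submit(work, n) for n in range(10)]
    for future in as_completed(futures):  # yields futures as they finish, not in submission order
        print(future.result())  # this future is already done, so .result() returns immediately
```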
`submit()` returns a `Future` immediately without blocking. Calling
`.result()` blocks until that particular future completes and re-raises
any exception the task raised. `as_completed()` yields futures in
completion order, which is useful when you want to handle each result as
soon as it is ready. By contrast, `.map()` yields results in submission
(input) order, so one slow early task delays results from later tasks
that have already finished.
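The difference in ordering is easy to see with tasks of different durations. This sketch uses `time.sleep` as a stand-in for waiting on I/O; `DELAYS`, `slow_echo`, `map_order`, and `completion_order` are hypothetical names, and the delay values are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-task delays: task 0 is slowest, task 2 finishes first.
DELAYS = {0: 0.4, 1: 0.2, 2: 0.0}


def slow_echo(n: int) -> int:
    time.sleep(DELAYS[n])  # stand-in for waiting on I/O
    return n


def map_order() -> list[int]:
    with ThreadPoolExecutor(max_workers=3) as pool:
        # map() yields results in input order, even though task 2 finishes first.
        return list(pool.map(slow_echo, DELAYS))


def completion_order() -> list[int]:
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(slow_echo, n) for n in DELAYS]
        # as_completed() yields each future as soon as it finishes.
        return [f.result() for f in as_completed(futures)]


print(map_order())         # [0, 1, 2]
print(completion_order())  # typically [2, 1, 0]
```

The `.map()` version cannot hand back task 2's result until tasks 0 and 1 are done, while `as_completed()` surfaces it first.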
Key differences beyond thread vs. process
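| | ThreadPoolExecutor | ProcessPoolExecutor |
|---|---|---|
| Backed by | OS threads in one process, shared memory | separate worker processes, no shared memory |
| Best for | I/O-bound work | CPU-bound work |
| Data passed to workers | any object (passed by reference) | must be picklable (copied across the process boundary), including the callable itself |
| Overhead per worker | low (one OS thread and its stack) | higher (a full OS process; with the `spawn` start method, a fresh interpreter that re-imports the main module) |
| Shared mutable state | works, but needs locking | requires explicit IPC (`multiprocessing.Manager`, `multiprocessing.shared_memory`) |
| GIL impact | in standard CPython, only one thread executes bytecode at a time (the GIL is released during blocking I/O, which is why threads still help I/O-bound work) | each process has its own interpreter and GIL, so CPU-bound work runs truly in parallel |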
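The "must be picklable" row is the one that most often bites when switching executors. A quick way to check an object before handing it to a `ProcessPoolExecutor` is to try pickling it yourself. The `is_picklable` helper below is a hypothetical convenience, not a `concurrent.futures` API, and it only approximates what the pool does internally (multiprocessing uses its own `pickle.Pickler` subclass).

```python
import pickle
import threading


def square(n: int) -> int:
    return n * n


def is_picklable(obj: object) -> bool:
    """Hypothetical helper: True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_picklable(square))            # True: module-level function, pickled by name
print(is_picklable((1, "a", [2.0])))   # True: plain data
print(is_picklable(lambda n: n * n))   # False: a lambda can't be looked up by name
print(is_picklable(threading.Lock()))  # False: locks can't be pickled
```

All four objects would work with a `ThreadPoolExecutor`, which never pickles anything; only the first two can be sent to a process pool.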
Exception handling
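```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as pool:  # a lambda works here; a process pool could not pickle it
    future = pool.submit(lambda: 1 / 0)
future.result()  # raises ZeroDivisionError: exceptions propagate through .result()
```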
Exceptions raised inside a worker do not crash the pool. They are
captured on the `Future` and re-raised when you call `.result()` (or
returned, without raising, by `.exception()`). The flip side is a common
gotcha: calling `.submit()` in a loop and never checking `.result()`
silently swallows failures.
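One way to avoid silently swallowing failures is to visit every future and check `.exception()`, which returns the raised exception (or `None`) without re-raising it. In this minimal sketch, `risky`, `run_and_collect`, and the failure condition are hypothetical, made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def risky(n: int) -> float:
    # Hypothetical task: fails for n == 0, succeeds otherwise.
    return 10 / n


def run_and_collect(inputs: list[int]) -> tuple[dict[int, float], dict[int, BaseException]]:
    results: dict[int, float] = {}
    errors: dict[int, BaseException] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Map each future back to its input so failures can be reported per input.
        future_to_input = {pool.submit(risky, n): n for n in inputs}
        for future in as_completed(future_to_input):
            n = future_to_input[future]
            exc = future.exception()  # None if the task succeeded; does not raise
            if exc is None:
                results[n] = future.result()
            else:
                errors[n] = exc
    return results, errors


results, errors = run_and_collect([1, 2, 0, 5])
print(results)  # values 10.0, 5.0, 2.0 for inputs 1, 2, 5; key order follows completion
print(errors)   # maps input 0 to its ZeroDivisionError
```

Every task is accounted for, either in `results` or in `errors`, so a failure cannot go unnoticed.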
Interview-ready summary: `concurrent.futures` gives one API
(`submit`/`map`/`Future`) for both thread- and process-backed
concurrency. Pick `ThreadPoolExecutor` for I/O-bound work and
`ProcessPoolExecutor` for CPU-bound work, and remember that with a
process pool the callable, its arguments, and its results must all be
picklable, since they cross a real process boundary.