What is the GIL (Global Interpreter Lock), and why does it exist?

The GIL is a mutex in CPython that allows only **one thread to execute Python bytecode at a time**, even on a multi-core machine. It exists because CPython's memory management (reference counting) isn't thread-safe by default, and the GIL was the simplest way to make the interpreter thread-safe without requiring fine-grained locking on every object. The practical consequence: pure-Python CPU-bound code doesn't get faster with more threads; I/O-bound code still benefits because the GIL is released during blocking I/O.

When should you use threading vs multiprocessing vs asyncio?

**Threading**: I/O-bound work with many blocking calls (network requests, file I/O) where you want concurrency with minimal code changes to existing synchronous code. **Multiprocessing**: CPU-bound work that needs true multi-core parallelism, bypassing the GIL by using separate processes. **Asyncio**: high-concurrency I/O-bound work (thousands of connections) where thread-per-task overhead would be too high, using a single-threaded event loop with cooperative `async`/`await`.

How does asyncio's event loop work, and what does `async`/`await` actually do?

The **event loop** is a single-threaded scheduler that runs one coroutine at a time, switching to another whenever the running one hits an `await` on something not yet ready (I/O, a timer, another coroutine) — it's cooperative multitasking, not preemptive. `async def` defines a coroutine function; calling it returns a coroutine object (nothing runs yet); `await` suspends the current coroutine until the awaited thing completes, yielding control back to the event loop to run other ready tasks in the meantime.

What's the difference between a coroutine and a generator?

Both are built on the same underlying mechanism (a suspendable function frame), but they serve different purposes: a **generator** *produces* a sequence of values one at a time via `yield`, consumed by iteration (`for`, `next()`). A **coroutine** (`async def`) is designed to be *awaited* — it represents a unit of asynchronous work that eventually produces one result, driven by an event loop rather than a `for` loop, and uses `await` instead of `yield` to suspend.

How do you run CPU-bound work efficiently in Python given the GIL?

Move the CPU-bound work to **separate processes** (`multiprocessing`, `concurrent.futures.ProcessPoolExecutor`) so each gets its own interpreter and GIL, achieving true multi-core parallelism. Alternatively, push the hot loop into a **C extension or a library that releases the GIL** during the computation (NumPy, Cython with `nogil`, or a Rust extension), so pure-Python threads around it can still run concurrently.

What is `concurrent.futures`, and how do `ThreadPoolExecutor` and `ProcessPoolExecutor` differ?

`concurrent.futures` provides a unified, high-level API (`Executor.submit()`/`.map()`, returning `Future` objects) for running work asynchronously, backed by either a pool of **threads** (`ThreadPoolExecutor`, for I/O-bound work) or a pool of **processes** (`ProcessPoolExecutor`, for CPU-bound work) — the same calling code works with either, so switching between them is usually a one-line change.

How do you handle race conditions and use locks in threaded Python code?

A race condition happens when multiple threads read-modify-write shared state without synchronization, so the GIL alone does **not** prevent it — the GIL only serializes individual bytecode instructions, not multi-step operations like `x += 1` (which is read, add, store as separate steps). Use `threading.Lock` (or `RLock`, `Semaphore`, `Condition`) to make a critical section atomic, typically via the lock as a context manager (`with lock:`).

What are common asyncio pitfalls?

The most common ones: calling a **blocking** (synchronous) function inside a coroutine without offloading it, which freezes the entire event loop, not just that task; forgetting to `await` a coroutine (creating it but never running it, which triggers a "coroutine ... was never awaited" `RuntimeWarning`); and creating tasks with `asyncio.create_task()` without keeping a reference or awaiting them, so they can be garbage-collected mid-execution and their exceptions go unnoticed.

How does `multiprocessing` share data between processes?

Since separate processes don't share memory by default, `multiprocessing` provides explicit IPC (inter-process communication) mechanisms: `Queue`/`Pipe` for passing messages between processes, `Value`/`Array` for simple shared memory backed by `ctypes`, and `Manager` for more complex shared objects (dicts, lists) proxied through a separate manager process. Arguments and return values exchanged with pool workers (`Pool`, `ProcessPoolExecutor`) are implicitly pickled and copied across the process boundary.

What is PEP 703 (free-threaded / no-GIL Python), and what does it change?

PEP 703, implemented as an **experimental build option starting in Python 3.13** (`python3.13t`), removes the Global Interpreter Lock, replacing coarse-grained global locking with finer-grained per-object synchronization (including biased reference counting) so multiple threads can execute Python bytecode truly in parallel. It aims to make `threading` a viable path to real multi-core speedups for CPU-bound Python code, but as of 3.13/3.14 it's opt-in, carries a single-threaded performance cost, and the C-extension ecosystem is still catching up.

How do you cancel or add a timeout to an asyncio task safely?

Use `asyncio.wait_for(coro, timeout=...)` to raise `TimeoutError` if a coroutine doesn't finish in time (it cancels the underlying work internally), or call `task.cancel()` directly to request cancellation, which raises `asyncio.CancelledError` **inside** the task at its next `await` point. Let `CancelledError` propagate: if you catch it to run cleanup, re-raise it afterwards. Since Python 3.8 it subclasses `BaseException`, so `except Exception` no longer catches it, but a bare `except:` or `except BaseException:` still does, and swallowing it there breaks cancellation and can hang shutdown.
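A minimal sketch of both approaches, assuming Python 3.8+ (`slow_operation` and its `log` list are hypothetical stand-ins): the coroutine catches `CancelledError` only to record a cleanup step, then re-raises it so the cancellation still completes.

```python
import asyncio

async def slow_operation(log):
    try:
        await asyncio.sleep(10)      # stands in for slow I/O
        return "finished"
    except asyncio.CancelledError:
        log.append("cleanup ran")    # hypothetical cleanup step
        raise                        # re-raise so the cancellation completes

async def main():
    log = []
    # 1) Timeout: wait_for cancels the operation, then raises TimeoutError here.
    try:
        await asyncio.wait_for(slow_operation(log), timeout=0.1)
    except asyncio.TimeoutError:     # alias of the built-in TimeoutError on 3.11+
        log.append("timed out")

    # 2) Explicit cancel: CancelledError is raised inside the task at its await.
    task = asyncio.create_task(slow_operation(log))
    await asyncio.sleep(0)           # let the task start and reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("task cancelled")
    return log

log = asyncio.run(main())
print(log)
```

The `await asyncio.sleep(0)` before `task.cancel()` matters: a task cancelled before it ever runs never enters its `try` block, so its cleanup code would not execute.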

Concurrency, Parallelism & Async

The GIL, threading vs multiprocessing, asyncio's event loop, and practical concurrency patterns.

What the GIL actually locks

CPython's memory management relies on reference counting: every object tracks how many references point to it, and is freed when that count hits zero. Incrementing/decrementing a refcount from multiple threads simultaneously, without synchronization, is a data race that could corrupt an object's refcount (leading to premature frees or memory leaks). The GIL solves this crudely but effectively: only one thread runs Python bytecode at a time, so refcount updates are never actually concurrent.

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

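threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # may be less than 4,000,000 -- the GIL protects refcounts, not this read-modify-write

What the GIL does guarantee here is that the interpreter's own bookkeeping stays consistent: the reference counts of the int objects being rebound to counter are never corrupted, so the program can't crash or free an object early. It does not make counter += 1 itself atomic, so two threads can still interleave between the read and the write and lose updates; protecting that takes an explicit lock.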

Why "more threads" doesn't mean "more CPU throughput"

def cpu_bound(n):
    return sum(i * i for i in range(n))

# Running cpu_bound() on 4 threads doesn't run 4x faster --
# on a standard (GIL) build, only one thread executes Python bytecode at any instant.

For CPU-bound pure-Python work, threads provide concurrency (multiple things making progress, interleaved) but not parallelism (multiple things running simultaneously on separate cores) — the GIL serializes bytecode execution regardless of how many OS threads and CPU cores exist.

Why threading still helps for I/O-bound work

import time

def slow_io():
    time.sleep(1)   # releases the GIL while "blocked"

Blocking operations that call into C (file/network I/O, time.sleep, many library calls) release the GIL while waiting, letting other Python threads run bytecode in the meantime. This is why threading/concurrent.futures.ThreadPoolExecutor genuinely speed up I/O-bound workloads (e.g., many concurrent HTTP requests) even though the GIL exists — the bottleneck (waiting on the network) isn't CPU work at all.
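A quick way to see the GIL being released: a sketch, assuming five hypothetical 0.5-second blocking calls, that runs them on a thread pool and times the batch. Because time.sleep releases the GIL, the waits overlap and the batch takes roughly as long as one call rather than five (exact timings vary by machine).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(seconds):
    time.sleep(seconds)   # blocks this thread but releases the GIL while waiting
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(slow_io, [0.5] * 5))
elapsed = time.perf_counter() - start
print(f"{len(results)} calls in {elapsed:.2f}s")   # roughly 0.5s, not 2.5s
```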

The real workaround for CPU-bound parallelism: separate processes

Since each Python process has its own interpreter and its own GIL, multiprocessing sidesteps the lock entirely: separate processes run CPU-bound work in true multi-core parallel, at the cost of inter-process communication overhead (data must be pickled and copied between processes rather than shared directly).

PEP 703: free-threaded (no-GIL) Python

Starting with Python 3.13, an experimental free-threaded build (python3.13t) removes the GIL, using more fine-grained locking instead — aiming to give real multi-core parallelism to threaded Python code. As of this writing it's still opt-in and the ecosystem (C extensions especially) is still adapting; the standard GIL-enabled build remains the default.

Interview-ready summary: The GIL is CPython's mutex ensuring only one thread executes Python bytecode at a time, needed because refcount-based memory management isn't otherwise thread-safe. It doesn't prevent threading from helping I/O-bound work (the GIL is released during blocking calls), but it does prevent threads from speeding up CPU-bound pure-Python code — for that, use multiprocessing, or Python 3.13+'s experimental free-threaded build.

The decision framework

Workload	Best tool	Why
CPU-bound (heavy computation)	`multiprocessing`	Bypasses the GIL via separate processes — actual multi-core parallelism
I/O-bound, moderate concurrency (10s-100s)	`threading`	Simple to retrofit onto existing sync code; GIL releases during blocking I/O
I/O-bound, very high concurrency (1000s of connections)	`asyncio`	One thread, no per-task OS thread overhead; scales to far more concurrent tasks

Threading: easiest retrofit for I/O-bound code

from concurrent.futures import ThreadPoolExecutor
import requests

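urls = ["https://example.com", "https://www.python.org"]   # hypothetical URL list

def fetch(url):
    # requests is synchronous; each call blocks only the thread running it
    return requests.get(url, timeout=10).status_code

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))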

Existing synchronous libraries (like requests) work unmodified inside threads, with no need to rewrite calls as async/await. Downside: each worker is a real OS thread with its own stack (often megabytes of reserved address space, depending on the platform), so this approach doesn't scale gracefully to tens of thousands of concurrent tasks.

Multiprocessing: real parallelism for CPU-bound work

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":   # required where workers are spawned (Windows, macOS)
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_heavy, [10**7] * 4))   # runs on up to 4 cores in parallel

Each process has its own interpreter and GIL, so cpu_heavy genuinely runs in parallel across cores — at the cost of process startup overhead and needing to pickle data across the process boundary (no shared memory by default).

Asyncio: massive I/O concurrency, single thread

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return resp.status

async def main(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

urls = ["https://example.com", "https://www.python.org"]   # hypothetical URL list
asyncio.run(main(urls))   # the same pattern can comfortably handle thousands of concurrent requests

A single thread cooperatively switches between thousands of pending coroutines whenever one is waiting on I/O — no OS thread per task, so memory/scheduling overhead per concurrent task is far lower than threading. The catch: it requires an async-compatible library stack (aiohttp instead of requests, asyncpg instead of a blocking DB driver) — mixing in a blocking call anywhere freezes the entire event loop, not just one task.

Combining them

It's common to combine approaches: use asyncio for I/O concurrency, and delegate genuinely CPU-bound chunks of work to a ProcessPoolExecutor via loop.run_in_executor(...) so they don't block the event loop.
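A minimal sketch of that combination (cpu_heavy and the input sizes are illustrative): run_in_executor hands each call to the process pool and returns an asyncio future, so the event loop can keep serving other coroutines while the workers compute.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):   # module-level so it can be pickled and sent to a worker process
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call returns an awaitable future; the event loop stays free meanwhile.
        jobs = [loop.run_in_executor(pool, cpu_heavy, n) for n in (10_000, 20_000, 30_000)]
        return await asyncio.gather(*jobs)

if __name__ == "__main__":   # required where workers are spawned (Windows, macOS)
    results = asyncio.run(main())
    print(results)
```

The `__main__` guard matters here for the same reason as with any process pool: on platforms that spawn workers, each child re-imports the module.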

Interview-ready summary: Pick multiprocessing for CPU-bound parallelism (the GIL makes threads useless for this), threading for moderate I/O concurrency with minimal code changes, and asyncio when you need very high I/O concurrency and are willing to adopt an async library stack throughout.

`async def` creates a coroutine function

import asyncio

async def fetch_data():
    print("start")
    await asyncio.sleep(1)   # suspend here; event loop runs other work meanwhile
    print("done")
    return 42

coro = fetch_data()        # nothing has run yet -- just a coroutine object
print(asyncio.run(coro))   # now it runs: prints start, done, then 42

Calling fetch_data() does not execute the body — like a generator, it returns a coroutine object that must be driven (via await, asyncio.run, or scheduled as a task) for its code to actually execute.

The event loop: single-threaded cooperative scheduling

import asyncio

async def worker(name, delay):
    print(f"{name} starting")
    await asyncio.sleep(delay)
    print(f"{name} done")

async def main():
    await asyncio.gather(
        worker("A", 2),
        worker("B", 1),
    )

asyncio.run(main())
# A starting
# B starting
# B done      <- after ~1s
# A done      <- after ~2s (not 3s! -- they ran concurrently)

asyncio.gather schedules both worker coroutines as concurrent tasks. When worker("A", 2) hits await asyncio.sleep(2), it effectively tells the event loop "wake me up in 2 seconds, and run something else meanwhile"; the loop then runs worker("B", 1) until it also suspends. The total time is therefore ~2s (the maximum), not ~3s (the sum): a single thread interleaves the two coroutines rather than running them in true parallel, but neither one blocks the other while it waits.

What `await` actually does

await only works on awaitables (coroutines, Tasks, Futures). It:

Suspends the current coroutine at that point, saving its state (much like a generator's suspended frame — coroutines are, in fact, implemented on the same underlying mechanism as generators).
Registers a callback so the event loop knows to resume this coroutine once the awaited thing completes.
Returns control to the event loop, which picks another ready task or callback to run.
When the awaited operation finishes, the event loop resumes the original coroutine exactly where it left off, and await evaluates to the awaited thing's result.
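To make those steps concrete, a small sketch that awaits a bare asyncio Future which a timer completes 0.1 seconds later (the delay and the "ready" value are arbitrary):

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()                     # an awaitable with no result yet
    loop.call_later(0.1, fut.set_result, "ready")  # something else completes it later
    value = await fut    # suspends main(); the loop is free until set_result runs
    return value         # await evaluated to the future's result

result = asyncio.run(main())
print(result)   # ready
```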

Cooperative, not preemptive

Because there's no operating-system-level time-slicing, a coroutine that never awaits anything (e.g., a tight CPU-bound loop with no suspension points) blocks the entire event loop — no other coroutine gets to run until it returns. This is the single most important asyncio rule: only await genuinely yields control; ordinary synchronous code inside an async def function runs to completion without interruption.
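If some CPU-bound work has to stay on the event loop, one common mitigation is to add explicit suspension points with await asyncio.sleep(0), which yields to the loop without actually sleeping. A minimal sketch, assuming a hypothetical heartbeat coroutine that just counts how often it gets scheduled; for genuinely heavy computation, offloading to an executor is usually the better fix.

```python
import asyncio

async def crunch(n, chunk=10_000):
    total = 0
    for i in range(n):
        total += i * i
        if i % chunk == 0:
            await asyncio.sleep(0)   # explicit suspension point: let other tasks run
    return total

async def heartbeat(ticks):          # hypothetical task that records each turn it gets
    while True:
        ticks.append(1)
        await asyncio.sleep(0)

async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    total = await crunch(200_000)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass                         # expected: we cancelled it ourselves
    return total, len(ticks)

total, tick_count = asyncio.run(main())
print(total, tick_count)   # tick_count > 0: heartbeat ran during the computation
```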

Interview-ready summary: The event loop is a single-threaded scheduler running one coroutine at a time; await is the only point where control can be voluntarily handed back to the loop, letting other coroutines make progress while the current one waits on I/O. This cooperative model gives massive I/O concurrency on one thread, but a coroutine that blocks without awaiting stalls every other task.

Same suspension mechanism, different intent

import asyncio

def gen():                  # generator: produces a SEQUENCE of values
    yield 1
    yield 2

async def coro():           # coroutine: produces ONE eventual result
    await asyncio.sleep(1)
    return 42

Both gen() and coro() return objects representing suspended computation rather than running immediately. Under the hood, CPython's native coroutines (async def) are implemented with the same frame-suspension machinery that powers generators (historically, asyncio in Python 3.4 was built directly on generator-based coroutines, decorated with @asyncio.coroutine and driven by yield from, before native async/await syntax arrived in Python 3.5).

How they're driven differs

# Generator: driven by iteration
for value in gen():
    print(value)

# Coroutine: driven by the event loop, via await/asyncio.run
async def caller():
    return await coro()         # inside another coroutine

result = asyncio.run(coro())    # or, at the top level

You can't use a for loop over a coroutine (it isn't iterable), and you can't await a plain generator (unless it's specifically decorated as a generator-based coroutine, a legacy pattern superseded by async def). Trying to iterate a coroutine directly, or to await a plain generator, raises a TypeError.
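A quick check of both TypeError cases, as a sketch (the exact error messages come from CPython and may vary slightly between versions):

```python
import asyncio

def gen():
    yield 1

async def coro():
    return 42

def try_iterate_coroutine():
    c = coro()
    try:
        for _ in c:                  # coroutines don't implement __iter__
            pass
    except TypeError as exc:
        return str(exc)
    finally:
        c.close()                    # close it so no "never awaited" warning is emitted
    return None

async def try_await_generator():
    try:
        await gen()                  # plain generators don't implement __await__
    except TypeError as exc:
        return str(exc)
    return None

iter_error = try_iterate_coroutine()
await_error = asyncio.run(try_await_generator())
print(iter_error)
print(await_error)
```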

Purpose: many values vs. one eventual value

A generator's job is to lazily produce a sequence: yield each value, potentially infinitely many, consumed one at a time.
A coroutine's job is to represent a single asynchronous operation that will eventually complete with one result (or raise) — conceptually closer to a Future/Promise than to an iterator, even though it's implemented with similar suspension internals.

Async generators: a hybrid

import asyncio

async def async_range(n):
    for i in range(n):
        await asyncio.sleep(0)   # yield control back to the event loop
        yield i

async def main():
    async for i in async_range(5):   # async for is only valid inside async def
        print(i)

asyncio.run(main())

Python also supports async generators (async def containing yield), which combine both: they lazily produce a sequence and can await between values, consumed with async for instead of a plain for loop — used for streaming data over an async source (e.g., reading paginated results from an async database driver).

Interview-ready summary: Coroutines and generators share the same suspend/resume mechanism, but generators (yield, driven by for/next) model lazily producing a sequence of values, while coroutines (await, driven by the event loop) model a single asynchronous operation resolving to one eventual result — async generators combine both when you need a lazily-produced sequence that can also await I/O between items.

Option 1: separate processes

from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % i for i in range(2, int(math.sqrt(n)) + 1))

if __name__ == "__main__":   # required where workers are spawned (Windows, macOS)
    numbers = list(range(10_000_000, 10_000_100))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(is_prime, numbers))   # genuinely parallel across cores

Each worker process has its own Python interpreter and its own GIL, so CPU-bound work in different processes truly runs simultaneously on separate cores. The cost: data passed to/from worker processes must be pickled, and process startup has real overhead — this pays off for coarse-grained, CPU-heavy chunks of work, not for many tiny tasks.

Option 2: push the hot loop into native code that releases the GIL

import numpy as np

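# Pure Python loop: single-threaded, GIL-bound the whole time
total = sum(x * x for x in range(1_000_000))

# NumPy: the actual multiply-and-sum runs in C, releasing the GIL
# (explicit int64: the platform default dtype can be 32-bit; 1_000_000 keeps the
#  sum of squares inside int64 range, whereas 10**7 would overflow silently)
arr = np.arange(1_000_000, dtype=np.int64)
total = int((arr * arr).sum())   # same value as the pure-Python version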

NumPy (and similar C-extension libraries) do the heavy numeric work inside C code that releases the GIL during the computation — this is why NumPy-heavy code can benefit from threads even for "CPU-bound" work: the actual bottleneck has moved out of GIL-held Python bytecode into GIL-free C. Cython supports the same idea explicitly via nogil blocks for hand-written extensions.

Why `threading` alone doesn't help here

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(is_prime, numbers))   # NOT faster than serial --
                                                     # still one thread executing
                                                     # Python bytecode at a time

Since is_prime is pure Python arithmetic (no I/O, no GIL-releasing C call), running it across threads doesn't parallelize the actual work — the GIL still serializes bytecode execution across all four threads.

Choosing between the two options

Reach for multiprocessing when the computation is written in plain Python and can be chunked into independent units of work.
Reach for NumPy/Cython/native extensions when the computation is numeric/vectorizable — this usually gives a far larger speedup than multiprocessing alone, since it avoids both GIL contention and Python's general interpreter overhead.

Interview-ready summary: For CPU-bound pure-Python work, use multiprocessing/ProcessPoolExecutor to get separate interpreters (and GILs) running truly in parallel. For numeric/vectorizable work, push the hot loop into a library like NumPy that does the heavy lifting in C and releases the GIL, which often beats multiprocessing's overhead entirely.

A unified API over threads or processes

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

def work(n):
    return n * n

# Thread-backed -- good for I/O-bound work
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(10)))

# Process-backed -- good for CPU-bound work; SAME calling code
if __name__ == "__main__":   # required where workers are spawned (Windows, macOS)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, range(10)))

Both executors expose the same interface (submit, map, context-manager shutdown), so the choice of thread vs. process pool is largely a config decision based on whether the workload is I/O-bound or CPU-bound, not a rewrite of the calling code.

`Future` objects: getting results as they complete

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, n) for n in range(10)]

    for future in as_completed(futures):    # yields futures as they finish, not in order
        print(future.result())               # blocks until this specific future is done

submit() returns a Future immediately (non-blocking); .result() blocks until that particular future completes (re-raising any exception the task raised); as_completed() yields futures in completion order, useful when you want to process results as soon as any is ready rather than waiting for all of them in submission order (which is what .map() effectively does).

Key differences beyond thread vs. process

	`ThreadPoolExecutor`	`ProcessPoolExecutor`
Backed by	OS threads, shared memory	separate processes, no shared memory
Best for	I/O-bound work	CPU-bound work
Data passed to workers	any object (shared reference)	must be picklable (copied across process boundary)
Overhead per worker	low (~MBs)	higher (full process + interpreter startup)
Shared mutable state	works (with locking)	requires explicit IPC (`multiprocessing.Manager`, shared memory)
GIL impact	still one thread runs bytecode at a time	each process has its own GIL — true parallelism

Exception handling

with ThreadPoolExecutor() as pool:
    future = pool.submit(lambda: 1 / 0)   # lambdas only work with thread pools (not picklable)
    future.result()   # raises ZeroDivisionError -- exceptions propagate through .result()

Exceptions raised inside the worker are captured and re-raised when you call .result(), rather than crashing the worker pool silently — a common gotcha is calling .submit() in a loop and never checking .result(), silently swallowing failures.
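One way to make sure no failure goes unnoticed is to call .result() on every future and handle exceptions per task. A sketch with a hypothetical risky() task that fails for one input:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky(n):                      # hypothetical task that fails for one input
    if n == 3:
        raise ValueError(f"bad input: {n}")
    return n * 10

ok, failed = {}, {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(risky, n): n for n in range(5)}   # map each future to its input
    for future in as_completed(futures):
        n = futures[future]
        try:
            ok[n] = future.result()        # re-raises the worker's exception, if any
        except ValueError as exc:
            failed[n] = str(exc)

print(dict(sorted(ok.items())), failed)   # {0: 0, 1: 10, 2: 20, 4: 40} {3: 'bad input: 3'}
```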

Interview-ready summary: concurrent.futures gives one API (submit/map/Future) for both thread- and process-backed concurrency — pick ThreadPoolExecutor for I/O-bound work and ProcessPoolExecutor for CPU-bound work, and remember that process pool arguments/results must be picklable since they cross a real process boundary.

Why the GIL doesn't prevent race conditions

import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1   # NOT atomic: read counter, add 1, write back -- 3 separate steps

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # can be less than 400,000 -- lost updates (how often depends on CPython version and timing)

counter += 1 compiles to multiple bytecode instructions (load, add, store). The GIL guarantees each individual bytecode instruction runs atomically, but a thread can be swapped out between those instructions — so two threads can both read the same value before either writes back the incremented result, and one increment gets lost.

Fixing it with a `Lock`

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:              # only one thread executes this block at a time
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # always exactly 400,000

with lock: acquires the lock on entry and releases it on exit (even if an exception occurs) — while held, no other thread can enter a block guarded by the same lock, making the read-modify-write sequence atomic with respect to other threads using that lock.

Other synchronization primitives

RLock (reentrant lock): the same thread can acquire it multiple times (e.g., recursive functions or nested methods that both need the lock) without deadlocking itself; a plain Lock would deadlock in that case.
Semaphore(n): allows up to n threads to hold it simultaneously — useful for capping concurrent access to a limited resource (e.g., at most 5 concurrent connections to a rate-limited service).
Condition: lets threads wait for some condition to become true, signaled by another thread (.wait()/.notify()) — the basis for producer/consumer patterns.
Event: a simple flag threads can wait on until another thread sets it (.set()), useful for one-shot "start now" or "shutdown" signals.
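As an illustration of Semaphore(n), a sketch that lets ten threads compete for a resource capped at three concurrent users; the sleep is a hypothetical stand-in for real work, and the extra Lock only protects the bookkeeping counters:

```python
import threading
import time

limit = threading.Semaphore(3)   # at most 3 threads inside the guarded block
state_lock = threading.Lock()    # protects the two counters below
active = 0
max_active = 0

def use_resource():
    global active, max_active
    with limit:
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)          # stand-in for talking to the limited resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)   # never more than 3
```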

Avoiding locks altogether: prefer thread-safe data structures/patterns

from queue import Queue

q = Queue()   # thread-safe by design -- internally uses its own locking

def producer():
    q.put("item")

def consumer():
    item = q.get()

queue.Queue is thread-safe internally, so a producer/consumer pattern built around it needs no manual locking at all — preferring built-in thread-safe structures (Queue, queue.LifoQueue) over hand-rolled locking is usually safer and simpler than reasoning about locks directly.
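A complete, runnable version of that producer/consumer pattern, as a sketch: the None sentinel used to signal "no more items" is an illustrative convention, not something queue.Queue requires.

```python
import threading
from queue import Queue

SENTINEL = None   # illustrative "no more items" marker

def producer(q, n):
    for i in range(n):
        q.put(i)
    q.put(SENTINEL)           # tell the consumer to stop

def consumer(q, results):
    while True:
        item = q.get()        # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * item)

q = Queue()
results = []
threads = [
    threading.Thread(target=producer, args=(q, 5)),
    threading.Thread(target=consumer, args=(q, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [0, 1, 4, 9, 16]
```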

Interview-ready summary: The GIL makes individual bytecode instructions atomic, but not multi-instruction operations like x += 1 — race conditions are still possible and must be guarded with threading.Lock (or a higher-level primitive like Queue) around any critical section that reads and writes shared mutable state.

Pitfall 1: blocking calls freeze the whole event loop

import asyncio
import time

async def bad():
    time.sleep(5)     # BLOCKS the entire event loop -- every other task stalls too!

async def good():
    await asyncio.sleep(5)   # yields control -- other tasks keep running

time.sleep (or any synchronous, blocking I/O call, or CPU-heavy computation) doesn't await anything, so the event loop has no opportunity to run other coroutines while it executes — a single blocking call anywhere stalls every pending task, not just the one that made it. Offload genuinely blocking/synchronous work with loop.run_in_executor(None, blocking_func) instead.
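A minimal sketch of offloading a blocking call, assuming Python 3.9+ for asyncio.to_thread (a convenience wrapper that runs the function in a worker thread); blocking_io and the ticker coroutine are hypothetical, and the ticker simply shows that the loop keeps running while the blocking calls execute:

```python
import asyncio
import time

def blocking_io(seconds):
    time.sleep(seconds)       # synchronous call that would freeze the loop if run directly
    return seconds

async def ticker(ticks):
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def main():
    ticks = []
    ticker_task = asyncio.create_task(ticker(ticks))
    loop = asyncio.get_running_loop()
    # Both forms hand the blocking call to a worker thread, so ticker() keeps running.
    a = await loop.run_in_executor(None, blocking_io, 0.3)   # None = default thread pool
    b = await asyncio.to_thread(blocking_io, 0.3)            # convenience wrapper, 3.9+
    ticker_task.cancel()
    return a, b, len(ticks)

a, b, tick_count = asyncio.run(main())
print(a, b, tick_count)
```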

Pitfall 2: creating a coroutine without awaiting it

async def fetch():
    ...

async def main():
    fetch()          # BUG: creates a coroutine object but never runs it!
    # RuntimeWarning: coroutine 'fetch' was never awaited

Calling fetch() just constructs a coroutine object — it does nothing until awaited, scheduled as a task, or passed to asyncio.gather(). Forgetting the await is one of the most common asyncio bugs, and Python does emit a RuntimeWarning for it, but it's easy to miss in noisy logs.

Pitfall 3: fire-and-forget tasks losing their reference

async def main():
    asyncio.create_task(background_job())   # BUG: no reference kept!
    await asyncio.sleep(10)
    # background_job's task can be garbage-collected before it finishes,
    # silently cancelling it, and any exception it raised is never surfaced

async def main():
    task = asyncio.create_task(background_job())   # keep a reference
    try:
        await asyncio.sleep(10)
    finally:
        await task   # ensure it completes, and its exceptions propagate

The asyncio docs explicitly warn: "Save a reference to the result of this function, to avoid a task disappearing mid-execution." Without a kept reference, the event loop has no obligation to keep the task alive, and exceptions raised inside an un-awaited, unreferenced task are simply logged (via asyncio's default exception handler) rather than raised anywhere your code can catch them.
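The create_task documentation recommends keeping such references in a collection and discarding each task when it finishes. A sketch of that pattern, with a hypothetical background_job coroutine:

```python
import asyncio

background_tasks = set()   # strong references keep pending tasks alive

async def background_job(i):   # hypothetical job standing in for real work
    await asyncio.sleep(0.01)
    return i * 2

def spawn(coro):
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task finishes so the set doesn't grow forever.
    task.add_done_callback(background_tasks.discard)
    return task

async def main():
    tasks = [spawn(background_job(i)) for i in range(5)]
    return await asyncio.gather(*tasks)   # surfaces any exception the jobs raise

results = asyncio.run(main())
print(results)   # [0, 2, 4, 6, 8]
```

On Python 3.11+, asyncio.TaskGroup is another option: it holds references to its tasks and waits for all of them when the async with block exits.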

Pitfall 4: mixing `asyncio.run()` calls / event loop confusion

async def main():
    ...

asyncio.run(main())
asyncio.run(main())   # fine -- each call creates and closes its own loop

# But calling asyncio.run() from WITHIN a running coroutine is an error:
async def bad():
    asyncio.run(other_coro())   # RuntimeError: asyncio.run() cannot be called
                                  # from a running event loop

asyncio.run() is meant to be the single top-level entry point per program/script — nesting it inside already-running async code doesn't work; use await other_coro() instead.

Interview-ready summary: The recurring theme across asyncio pitfalls is forgetting that cooperative scheduling depends entirely on await points: blocking calls without an await freeze everything, coroutines created without await/create_task never run, and tasks created without a kept reference can vanish silently along with any exception they raised.

The default: no shared memory, everything is copied

from multiprocessing import Process

data = [1, 2, 3]

def worker(data):
    data.append(4)   # mutates the CHILD process's own copy only

if __name__ == "__main__":   # required where the child is spawned (Windows, macOS)
    p = Process(target=worker, args=(data,))
    p.start()
    p.join()
    print(data)   # [1, 2, 3] -- parent's list is untouched; child had a separate copy

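Unlike threads (which share the same memory space), each process gets its own independent memory. Under the spawn and forkserver start methods, arguments passed to Process are pickled, sent to the child, and unpickled there; under fork, the child starts with a copy-on-write copy of the parent's memory; Pool and ProcessPoolExecutor always pickle arguments and results. In every case, mutations in the child never affect the parent's original objects.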

`Queue` and `Pipe`: message passing

from multiprocessing import Process, Queue

def worker(q):
    q.put("result from child")

if __name__ == "__main__":   # required where the child is spawned (Windows, macOS)
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # 'result from child'
    p.join()

Queue is a process-safe FIFO for passing arbitrary picklable objects between processes — the standard way to send results back from workers, or to distribute work items to them. Pipe() provides a lower-level, two-endpoint duplex connection between exactly two processes.
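A minimal sketch of a Pipe() round trip between a parent and a single child (the "ping" payload is arbitrary); with the default duplex=True, each of the two connection objects can both send and receive:

```python
from multiprocessing import Pipe, Process

def child(conn):
    request = conn.recv()          # wait for the parent's message
    conn.send(request.upper())     # reply on the same connection
    conn.close()

if __name__ == "__main__":   # required where the child is spawned (Windows, macOS)
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("ping")
    reply = parent_conn.recv()
    p.join()
    print(reply)   # PING
```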

`Value`/`Array`: real shared memory for simple types

from multiprocessing import Process, Value

def worker(counter):
    with counter.get_lock():        # Value provides a built-in lock
        counter.value += 1

if __name__ == "__main__":   # required where children are spawned (Windows, macOS)
    counter = Value("i", 0)             # 'i' = ctypes int, backed by shared memory
    processes = [Process(target=worker, args=(counter,)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(counter.value)   # 4

Value/Array allocate memory in a shared segment (via ctypes) that multiple processes can read/write directly — much faster than pickling through a Queue for simple numeric/fixed-type shared state, but limited to ctypes-compatible types.

`Manager`: shared Python objects (dict, list, etc.)

from multiprocessing import Process, Manager

def worker(shared_dict, key, value):
    shared_dict[key] = value

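if __name__ == "__main__":   # required where children are spawned (Windows, macOS)
    with Manager() as manager:
        shared_dict = manager.dict()
        processes = [Process(target=worker, args=(shared_dict, i, i * i)) for i in range(4)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # sort: insertion order depends on which child finished first
        print(dict(sorted(shared_dict.items())))   # {0: 0, 1: 1, 2: 4, 3: 9}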

A Manager runs a separate server process holding the real object; other processes get a proxy that forwards operations to it over IPC — more flexible than Value/Array (supports dicts, lists, arbitrary picklable values) but slower, since every access is a message round-trip, not a direct memory read.

Interview-ready summary: Processes don't share memory by default, so multiprocessing provides explicit channels: Queue/Pipe for message passing, Value/Array for fast shared memory of simple ctypes types, and Manager for shared, proxied Python objects when you need richer data structures at the cost of IPC overhead.

Related Resources

multiprocessing — Python docs

What is PEP 703 (free-threaded / no-GIL Python), and what does it change?

What changes

# Standard build -- has the GIL
python3.13 my_script.py

# Free-threaded build -- experimental, no GIL
python3.13t my_script.py
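
To check which build a script is actually running on, a minimal sketch using two standard-library hooks: the Py_GIL_DISABLED build-configuration variable from sysconfig, and sys._is_gil_enabled() (added in 3.13). The two can disagree, because a free-threaded build can re-enable the GIL at runtime. describe_build() is a hypothetical helper name.

```python
import sys
import sysconfig

def describe_build():
    # Compile-time flag: 1 on a free-threaded build; 0 (3.13+) or None (older) otherwise
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Runtime state: sys._is_gil_enabled() exists from 3.13; before that the GIL is always on
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return free_threaded_build, bool(is_gil_enabled())

free_threaded_build, gil_enabled = describe_build()
print(f"free-threaded build: {free_threaded_build}, GIL enabled right now: {gil_enabled}")
```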

In a free-threaded build, the single global lock is replaced by finer-grained mechanisms: biased reference counting (a fast path for the common case where an object is only touched by the thread that owns it, falling back to atomic operations when shared across threads) and per-object or per-data-structure locking where needed, instead of one lock protecting everything.

The practical implication: threads can now use multiple cores

from concurrent.futures import ThreadPoolExecutor

def cpu_heavy(n):
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_heavy, [10**7] * 4))
    # on a free-threaded build with 4 idle cores, this can run up to ~4x faster;
    # on a standard GIL build, it doesn't (see the GIL question)

This is the headline benefit: CPU-bound pure-Python code using threading/ThreadPoolExecutor can see genuine multi-core speedups without needing to switch to multiprocessing and its IPC/pickling overhead.
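
To measure the speedup rather than assume it, a rough timing sketch that runs the same cpu_heavy jobs serially and through the pool. The job sizes are illustrative assumptions, and the ratio depends on the machine and build: expect roughly 1x (or slightly worse) on a GIL build, and more on a free-threaded build with idle cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_heavy(n):
    return sum(i * i for i in range(n))

def timed(fn):
    """Run fn() once and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

N, JOBS = 1_000_000, 4   # illustrative sizes; raise N for steadier timings

serial, t_serial = timed(lambda: [cpu_heavy(N) for _ in range(JOBS)])
with ThreadPoolExecutor(max_workers=JOBS) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(cpu_heavy, [N] * JOBS)))

print(f"serial {t_serial:.2f}s, threaded {t_threaded:.2f}s, "
      f"speedup {t_serial / t_threaded:.1f}x")
```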

The tradeoffs, as of Python 3.13/3.14

Single-threaded performance cost: removing the GIL's cheap global lock in favor of finer-grained (sometimes atomic) operations adds overhead to single-threaded code — early free-threaded builds showed a measurable (though steadily improving) single-thread slowdown compared to the standard GIL build.
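C extension compatibility: many existing C extensions assumed GIL protection for their own internal state and need updates (declaring support via the Py_mod_gil slot, thread-safety audits) to be safe under free-threading. On a free-threaded build, importing an extension that hasn't declared support re-enables the GIL for the whole process (with a warning) unless it is forced off with PYTHON_GIL=0 or -X gil=0. The ecosystem is actively migrating but not fully there yet.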
Still opt-in: the GIL-enabled build remains the default. Free-threading ships as a separate build: experimental in 3.13, and officially supported (but still optional) as of 3.14 under PEP 779.

Why it matters for interviews

It's an actively evolving area of CPython. Discussing it well signals that a candidate follows the ecosystem, understands why the GIL existed in the first place (reference-counting safety), and can articulate what replacing it actually requires (finer-grained synchronization, not just deleting a mutex).

Interview-ready summary: PEP 703 introduces an opt-in build of CPython (3.13+, experimental in 3.13) that removes the GIL via finer-grained synchronization (biased reference counting plus targeted locking), enabling real multi-core parallelism for threaded CPU-bound code. The costs are some single-threaded overhead and a still-adapting C-extension ecosystem, which is why it isn't the default build yet.

Related Resources

PEP 703 – Making the Global Interpreter Lock Optional

Python support for free threading

How do you cancel or add a timeout to an asyncio task safely?

Timeouts with `asyncio.wait_for`

import asyncio

async def slow_operation():
    await asyncio.sleep(10)
    return "done"

async def main():
    try:
        result = await asyncio.wait_for(slow_operation(), timeout=2)
    except TimeoutError:
        print("timed out after 2 seconds")

asyncio.run(main())

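wait_for races the awaitable against the timeout; if the timeout elapses first, it cancels the operation, waits for that cancellation to finish, and raises TimeoutError to the caller. In Python 3.11+, asyncio.TimeoutError is an alias of the built-in TimeoutError; in earlier versions it was a separate class, so code that must support those versions should catch asyncio.TimeoutError.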
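
A variant that makes the cancellation visible, assuming a hypothetical events list: the timed-out operation sees CancelledError first, and only afterward does the caller see the timeout. Catching asyncio.TimeoutError keeps the example working on versions before 3.11 as well.

```python
import asyncio

events = []   # hypothetical log of what happened, in order

async def slow_operation():
    try:
        await asyncio.sleep(10)
        return "done"
    except asyncio.CancelledError:
        events.append("operation cancelled")   # wait_for cancelled us at the deadline
        raise                                   # let the cancellation complete

async def main():
    try:
        return await asyncio.wait_for(slow_operation(), timeout=0.1)
    except asyncio.TimeoutError:   # alias of the built-in TimeoutError on 3.11+
        events.append("caller saw timeout")
        return None

result = asyncio.run(main())
print(result, events)   # None ['operation cancelled', 'caller saw timeout']
```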

Python 3.11+'s `asyncio.timeout()`: a context-manager alternative

async def main():
    try:
        async with asyncio.timeout(2):
            await slow_operation()
    except TimeoutError:
        print("timed out")

asyncio.timeout() applies a deadline to an entire async with block (potentially covering multiple awaits), which is more flexible than wait_for when you want one timeout to cover several sequential operations rather than wrapping each individually.

Explicit cancellation with `task.cancel()`

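async def worker():
    try:
        await asyncio.sleep(100)
    except asyncio.CancelledError:
        print("cleaning up before exiting")
        raise             # IMPORTANT: re-raise so the cancellation actually completes

async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(1)
    task.cancel()          # requests cancellation; nothing has stopped yet
    try:
        await task         # raises CancelledError here, after the worker's cleanup runs
    except asyncio.CancelledError:
        # Deliberate: main requested this cancellation, so it handles the outcome
        print("worker cancelled:", task.cancelled())   # worker cancelled: True

asyncio.run(main())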

task.cancel() doesn't stop the task immediately: it arranges for CancelledError to be raised inside the task the next time it resumes, at the await where it is suspended. The task can catch that to run cleanup (closing a connection, releasing a lock), but must re-raise it afterward. If it swallows CancelledError, the task simply keeps running and may return normally, so whoever called .cancel() and then awaits the task gets an ordinary result instead of CancelledError, or waits indefinitely if the task never finishes.
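
To see what swallowing looks like from the caller's side, a minimal sketch with a hypothetical swallowing_worker(): after cancel(), awaiting the task returns a normal result and task.cancelled() reports False.

```python
import asyncio

async def swallowing_worker():
    try:
        await asyncio.sleep(100)
    except asyncio.CancelledError:
        pass                        # BUG: cancellation swallowed, never re-raised
    return "finished anyway"

async def main():
    task = asyncio.create_task(swallowing_worker())
    await asyncio.sleep(0.1)        # let the worker reach its await
    task.cancel()                   # request cancellation...
    result = await task             # ...but no CancelledError arrives: the task just returned
    return result, task.cancelled()

outcome = asyncio.run(main())
print(outcome)   # ('finished anyway', False)
```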

Why you must never blanket-catch `CancelledError`

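async def bad_worker():
    try:
        await asyncio.sleep(100)
    except:                  # BUG: a bare `except:` (or `except BaseException:`)
        pass                 # catches CancelledError and silently breaks cancellation

async def ok_worker():
    try:
        await asyncio.sleep(100)
    except Exception:        # OK for cancellation on 3.8+: CancelledError is a
        pass                 # BaseException subclass, so it isn't caught here and propagates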

Since Python 3.8, asyncio.CancelledError inherits from BaseException rather than Exception, precisely so that ordinary except Exception: blocks don't accidentally swallow it. A bare except: or except BaseException: still would, so the rule is worth stating explicitly: let cancellation propagate unless you are deliberately intercepting it to clean up, and always re-raise afterward.
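
When the only goal is cleanup, try/finally avoids catching CancelledError at all, so there is nothing to forget to re-raise. A minimal sketch, assuming a hypothetical log list; the caller waits via asyncio.gather(..., return_exceptions=True) so the worker's CancelledError is collected as a result instead of being raised in the caller:

```python
import asyncio

async def worker(log):
    try:
        await asyncio.sleep(100)
    finally:
        # Runs on normal return, on errors, and on cancellation, without ever
        # catching CancelledError, so cancellation still propagates
        log.append("cleanup done")

async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0.1)          # let the worker reach its await
    task.cancel()
    # return_exceptions=True hands back the CancelledError instead of raising it here
    await asyncio.gather(task, return_exceptions=True)
    return log, task.cancelled()

outcome = asyncio.run(main())
print(outcome)   # (['cleanup done'], True)
```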

Interview-ready summary: Use asyncio.wait_for/asyncio.timeout() for deadline-based cancellation, and task.cancel() for explicit cancellation — both work by raising CancelledError inside the task at its next await. Treat CancelledError as something to clean up around, never to swallow, since eating it silently breaks the cancellation contract for whoever is waiting on the task.

Related Resources

asyncio.wait_for — Python docs

Task Cancellation — Python docs

