What is the GIL (Global Interpreter Lock), and why does it exist?

Detailed Answer

What the GIL actually locks

CPython's memory management relies on reference counting: every object tracks how many references point to it, and is freed when that count hits zero. Incrementing/decrementing a refcount from multiple threads simultaneously, without synchronization, is a data race that could corrupt an object's refcount (leading to premature frees or memory leaks). The GIL solves this crudely but effectively: only one thread runs Python bytecode at a time, so refcount updates are never actually concurrent.

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
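for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # often 4000000 on recent CPython, but not guaranteed: counter += 1 is not atomic

The GIL protects the interpreter's internal state, such as refcounts; it does not make compound operations like counter += 1 atomic. That statement compiles to several bytecode instructions (load, add, store), and a thread switch between them can lose an update. Whether this shows up in practice depends on the CPython version, so code that shares mutable state across threads still needs its own lock, with or without the GIL.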
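Because `counter += 1` spans several bytecode instructions, the usual way to make the total reliable is an explicit lock around the update. The following is a minimal sketch of that approach; the function name `locked_count` and its parameters are invented for the example.

```python
import threading

def locked_count(num_threads=4, per_thread=100_000):
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def increment():
        nonlocal counter
        for _ in range(per_thread):
            with lock:          # the whole read-modify-write runs while holding the lock
                counter += 1

    threads = [threading.Thread(target=increment) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(locked_count())
```

The lock makes the result independent of how the interpreter schedules threads, which is the property the GIL alone does not promise.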

Why "more threads" doesn't mean "more CPU throughput"

def cpu_bound(n):
    return sum(i * i for i in range(n))

# Running cpu_bound() on 4 threads doesn't run 4x faster:
# with the GIL, only one thread executes Python bytecode at any instant.

For CPU-bound pure-Python work, threads provide concurrency (multiple things making progress, interleaved) but not parallelism (multiple things running simultaneously on separate cores) — the GIL serializes bytecode execution regardless of how many OS threads and CPU cores exist.
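One way to see this on your own machine is to time the same work done sequentially and on threads. The sketch below is illustrative only: the helper names `run_sequential` and `run_threaded` are made up, and the timings depend on the machine and the Python build, so no particular numbers are claimed.

```python
import threading
import time

def cpu_bound(n):
    return sum(i * i for i in range(n))

def run_sequential(n, jobs):
    """Run cpu_bound(n) `jobs` times, one after another."""
    start = time.perf_counter()
    results = [cpu_bound(n) for _ in range(jobs)]
    return results, time.perf_counter() - start

def run_threaded(n, jobs):
    """Run cpu_bound(n) once on each of `jobs` threads."""
    results = [None] * jobs

    def worker(index):
        results[index] = cpu_bound(n)

    start = time.perf_counter()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

seq_results, seq_time = run_sequential(200_000, 4)
thr_results, thr_time = run_threaded(200_000, 4)
print(f"sequential: {seq_time:.3f}s, threaded: {thr_time:.3f}s")
```

On a standard GIL-enabled build the two timings are typically close to each other, since the threads take turns rather than running side by side.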

Why threading still helps for I/O-bound work

import time

def slow_io():
    time.sleep(1)   # releases the GIL while "blocked"

Blocking operations that call into C (file/network I/O, time.sleep, many library calls) release the GIL while waiting, letting other Python threads run bytecode in the meantime. This is why threading/concurrent.futures.ThreadPoolExecutor genuinely speed up I/O-bound workloads (e.g., many concurrent HTTP requests) even though the GIL exists — the bottleneck (waiting on the network) isn't CPU work at all.
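A small sketch of the effect, using `time.sleep` as a stand-in for a blocking network call. The `delay` parameter and the `run_overlapped` helper are assumptions made for the example, not part of any library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(delay):
    time.sleep(delay)   # stand-in for a blocking network or disk call; releases the GIL
    return delay

def run_overlapped(delays):
    """Run one slow_io call per delay on its own worker thread and time the batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(slow_io, delays))
    return results, time.perf_counter() - start

results, elapsed = run_overlapped([0.2, 0.2, 0.2, 0.2])
print(f"4 waits of 0.2s finished in {elapsed:.2f}s")
```

Because the waits overlap, the batch should finish in roughly the time of one wait rather than the sum of all four.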

The real workaround for CPU-bound parallelism: separate processes

Since each Python process has its own interpreter and its own GIL, multiprocessing sidesteps the lock entirely by running separate Python processes. This achieves true multi-core parallelism for CPU-bound work, at the cost of inter-process communication overhead (data must be pickled/copied between processes, not shared directly).
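A minimal sketch of the process-based approach with `multiprocessing.Pool`, reusing the `cpu_bound` function from earlier. The input sizes and the worker count of 4 are arbitrary choices for illustration.

```python
from multiprocessing import Pool

def cpu_bound(n):
    return sum(i * i for i in range(n))

def main():
    inputs = [200_000] * 4
    # Each input is pickled and sent to a worker process;
    # each result is pickled and sent back to the parent.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, inputs)
    print(results)

if __name__ == "__main__":   # guard so worker processes can import this module safely
    main()
```

The `__main__` guard matters on platforms where worker processes start by re-importing the script; without it, each worker would try to create its own pool.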

PEP 703: free-threaded (no-GIL) Python

Starting with Python 3.13, an experimental free-threaded build (python3.13t) removes the GIL, using more fine-grained locking instead — aiming to give real multi-core parallelism to threaded Python code. As of this writing it's still opt-in and the ecosystem (C extensions especially) is still adapting; the standard GIL-enabled build remains the default.
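To check which kind of interpreter a script is running on, something like the following sketch may help. It relies on `sys._is_gil_enabled()`, which is available from Python 3.13, and on the `Py_GIL_DISABLED` build variable; the wrapper name `gil_status` is invented for the example.

```python
import sys
import sysconfig

def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or unset otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always run with the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled

build, enabled = gil_status()
print(f"free-threaded build: {build}, GIL enabled: {enabled}")
```

The two values can differ: a free-threaded build may still run with the GIL enabled at runtime, for example when the GIL is turned back on for compatibility with an extension module.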

Interview-ready summary: The GIL is CPython's mutex ensuring only one thread executes Python bytecode at a time, needed because refcount-based memory management isn't otherwise thread-safe. It doesn't prevent threading from helping I/O-bound work (the GIL is released during blocking calls), but it does prevent threads from speeding up CPU-bound pure-Python code — for that, use multiprocessing, or Python 3.13+'s experimental free-threaded build.