How does `multiprocessing` share data between processes?
Quick Answer
Since separate processes don't share memory by default, `multiprocessing` provides explicit IPC (inter-process communication) mechanisms: `Queue`/`Pipe` for passing messages between processes, `Value`/`Array` for simple shared memory backed by `ctypes`, and `Manager` for more complex shared objects (dicts, lists) proxied through a separate manager process. Anything passed to a worker (arguments, return values) is implicitly pickled and copied across the process boundary.
Detailed Answer
The default: no shared memory, everything is copied
from multiprocessing import Process
data = [1, 2, 3]
def worker(data):
data.append(4) # mutates the CHILD process's own copy only
p = Process(target=worker, args=(data,))
p.start()
p.join()
print(data) # [1, 2, 3] -- parent's list is untouched; child had a separate copy
Unlike threads (which share the same memory space), each process gets its
own independent memory — arguments passed to Process/Pool are
pickled, sent to the child, and unpickled there, so mutations in the
child never affect the parent's original objects.
Queue and Pipe: message passing
from multiprocessing import Process, Queue
def worker(q):
q.put("result from child")
q = Queue()
p = Process(target=worker, args=(q,))
p.start()
print(q.get()) # 'result from child'
p.join()
Queue is a process-safe FIFO for passing arbitrary picklable objects
between processes — the standard way to send results back from workers,
or to distribute work items to them. Pipe() provides a lower-level,
two-endpoint duplex connection between exactly two processes.
Value/Array: real shared memory for simple types
from multiprocessing import Process, Value
def worker(counter):
with counter.get_lock(): # Value provides a built-in lock
counter.value += 1
counter = Value("i", 0) # 'i' = ctypes int, backed by shared memory
processes = [Process(target=worker, args=(counter,)) for _ in range(4)]
[p.start() for p in processes]
[p.join() for p in processes]
print(counter.value) # 4
Value/Array allocate memory in a shared segment (via ctypes) that
multiple processes can read/write directly — much faster than pickling
through a Queue for simple numeric/fixed-type shared state, but limited
to ctypes-compatible types.
Manager: shared Python objects (dict, list, etc.)
from multiprocessing import Process, Manager
def worker(shared_dict, key, value):
shared_dict[key] = value
with Manager() as manager:
shared_dict = manager.dict()
processes = [Process(target=worker, args=(shared_dict, i, i * i)) for i in range(4)]
[p.start() for p in processes]
[p.join() for p in processes]
print(dict(shared_dict)) # {0: 0, 1: 1, 2: 4, 3: 9}
A Manager runs a separate server process holding the real object; other
processes get a proxy that forwards operations to it over IPC — more
flexible than Value/Array (supports dicts, lists, arbitrary picklable
values) but slower, since every access is a message round-trip, not a
direct memory read.
Interview-ready summary: Processes don't share memory by default, so
multiprocessing provides explicit channels: Queue/Pipe for message
passing, Value/Array for fast shared memory of simple ctypes types,
and Manager for shared, proxied Python objects when you need richer
data structures at the cost of IPC overhead.