How does a Python `dict` work internally, and does it guarantee insertion order?

6 minadvancedcollectionsdicthash-table

Quick Answer

CPython's `dict` is a **hash table**: each key's hash determines a slot, with open addressing to resolve collisions, giving average **O(1)** lookup/insert/delete. Since Python 3.7, dicts are guaranteed (as a language spec, not just a CPython detail) to **preserve insertion order** — achieved internally by keeping a compact array of entries in insertion order alongside a sparser hash-indices array.

Detailed Answer

The hash table basics

d = {"a": 1, "b": 2}
d["a"]        # O(1) average -- hash("a") locates the slot directly
d["c"] = 3    # O(1) average insert

Looking up d["a"] computes hash("a"), uses it to find a candidate slot in the internal table, and (after resolving any hash collisions) returns the value — no scanning of all keys, unlike a list.

Insertion order guarantee (Python 3.7+)

d = {}
d["z"] = 1
d["a"] = 2
d["m"] = 3
list(d)   # ['z', 'a', 'm']  -- insertion order, guaranteed since 3.7

This was a CPython implementation detail in 3.6 and became an official language guarantee in 3.7. Internally, CPython separates "which slot does this hash map to" (a sparse array of indices) from "the actual key/value/ hash entries" (a dense array kept in insertion order) — iterating a dict walks the dense array, which is naturally in insertion order, while lookups still use the sparse hash-indexed array for O(1) access.

Why keys must be hashable

d = {[1, 2]: "x"}   # TypeError: unhashable type: 'list'

A dict key's hash is computed once and used to place it in a slot; mutating the key afterward (which is only possible for mutable objects) would silently break the invariant that a key's slot matches its current hash. That's why list/dict/set (all mutable) can't be dict keys, but tuple, str, int, and frozenset (all immutable, and hashable if their contents are) can.

Collision resolution: open addressing

Unlike some languages' hash maps (which chain multiple entries per bucket, e.g. a linked list per slot), CPython dicts use open addressing: on a collision, a perturbation-based probing sequence finds the next candidate slot. This keeps memory more compact and cache-friendly than chaining, at the cost of needing careful resizing (the table is resized — and re-hashed — well before it gets too full, keeping average lookup close to O(1)).

Worst case

Pathological hash collisions can degrade lookups toward O(n) in theory, but CPython uses SipHash for string hashing with a randomized seed per process (PYTHONHASHSEED) specifically to make it infeasible for an attacker to engineer such collisions deliberately (a real denial-of-service vector in older hash table implementations).

Interview-ready summary: dict is a hash table giving average O(1) lookup/insert/delete; since Python 3.7 it's spec-guaranteed to preserve insertion order, achieved by keeping entries in a dense, insertion-ordered array separate from the sparse hash-index table used for O(1) lookups. Keys must be hashable (and therefore effectively immutable) since a key's slot is determined by its hash at insertion time.

Related Resources