Lists are dynamic arrays, not linked lists
CPython's list stores a contiguous array of pointers to objects, with
some extra pre-allocated capacity to amortize the cost of growth. This is
why indexing is O(1): lst[i] is just pointer arithmetic into the
underlying array, not a traversal.
lst = [1, 2, 3]
lst[1] # O(1) -- direct array index
lst.append(4) # O(1) amortized -- usually just writes into pre-allocated space
Complexity cheat sheet
| Operation | Complexity | Why |
|---|---|---|
lst[i], lst[i] = x | O(1) | direct array index |
lst.append(x) | O(1) amortized | writes to spare capacity; occasional O(n) resize, amortized away |
lst.pop() (from end) | O(1) | no shifting needed |
lst.pop(0), lst.insert(0, x) | O(n) | every remaining element shifts by one slot |
x in lst | O(n) | linear scan, no index structure |
len(lst) | O(1) | length is cached, not recomputed |
lst.sort() | O(n log n) | Timsort |
slicing lst[a:b] | O(k) | k = length of the slice, since a new list is built |
Why append is amortized O(1)
When the underlying array runs out of spare capacity, CPython allocates a larger array (roughly growing by ~1.125x plus a constant) and copies existing elements over — an O(n) operation, but one that happens increasingly rarely as the list grows, so the average cost per append across many appends works out to O(1).
Why front-insertion/removal is expensive
lst = [1, 2, 3, 4, 5]
lst.pop(0) # removes 1, then shifts 2,3,4,5 each one slot left -- O(n)
lst.insert(0, 0) # shifts every element right by one slot -- O(n)
If your workload needs frequent insert/pop from both ends, use
collections.deque instead — it's implemented as a doubly-linked list of
fixed-size blocks and supports O(1) append/pop from either end, at the
cost of O(n) random access (deque[i] is O(n) for large i, unlike
list).
Interview-ready summary: Python lists are dynamic arrays: O(1)
indexing and end-append/pop, but O(n) for inserting/removing anywhere
except the end, and O(n) membership testing. When you need efficient
operations at both ends, reach for collections.deque instead of list.
Related Resources
The hash table basics
d = {"a": 1, "b": 2}
d["a"] # O(1) average -- hash("a") locates the slot directly
d["c"] = 3 # O(1) average insert
Looking up d["a"] computes hash("a"), uses it to find a candidate slot
in the internal table, and (after resolving any hash collisions) returns
the value — no scanning of all keys, unlike a list.
Insertion order guarantee (Python 3.7+)
d = {}
d["z"] = 1
d["a"] = 2
d["m"] = 3
list(d) # ['z', 'a', 'm'] -- insertion order, guaranteed since 3.7
This was a CPython implementation detail in 3.6 and became an official language guarantee in 3.7. Internally, CPython separates "which slot does this hash map to" (a sparse array of indices) from "the actual key/value/ hash entries" (a dense array kept in insertion order) — iterating a dict walks the dense array, which is naturally in insertion order, while lookups still use the sparse hash-indexed array for O(1) access.
Why keys must be hashable
d = {[1, 2]: "x"} # TypeError: unhashable type: 'list'
A dict key's hash is computed once and used to place it in a slot;
mutating the key afterward (which is only possible for mutable objects)
would silently break the invariant that a key's slot matches its current
hash. That's why list/dict/set (all mutable) can't be dict keys, but
tuple, str, int, and frozenset (all immutable, and hashable
if their contents are) can.
Collision resolution: open addressing
Unlike some languages' hash maps (which chain multiple entries per bucket, e.g. a linked list per slot), CPython dicts use open addressing: on a collision, a perturbation-based probing sequence finds the next candidate slot. This keeps memory more compact and cache-friendly than chaining, at the cost of needing careful resizing (the table is resized — and re-hashed — well before it gets too full, keeping average lookup close to O(1)).
Worst case
Pathological hash collisions can degrade lookups toward O(n) in theory,
but CPython uses SipHash for string hashing with a randomized seed per
process (PYTHONHASHSEED) specifically to make it infeasible for an
attacker to engineer such collisions deliberately (a real
denial-of-service vector in older hash table implementations).
Interview-ready summary: dict is a hash table giving average O(1)
lookup/insert/delete; since Python 3.7 it's spec-guaranteed to preserve
insertion order, achieved by keeping entries in a dense, insertion-ordered
array separate from the sparse hash-index table used for O(1) lookups.
Keys must be hashable (and therefore effectively immutable) since a key's
slot is determined by its hash at insertion time.
Related Resources
What "hashable" requires
hash(42) # works -- int is hashable
hash("abc") # works -- str is hashable
hash((1, 2, 3)) # works -- tuple of hashables is hashable
hash([1, 2, 3]) # TypeError: unhashable type: 'list'
hash({1, 2}) # TypeError: unhashable type: 'set'
hash(frozenset({1,2})) # works -- frozenset (immutable set) is hashable
Hashability requires: (1) a __hash__ method that returns the same
integer every time for a given object's lifetime, and (2) if __eq__ is
defined, a == b implies hash(a) == hash(b) — this is required for
correct dict/set behavior (two "equal" keys must land findable in the same
bucket).
Why mutable containers are unhashable
lst = [1, 2, 3]
d = {lst: "value"} # TypeError -- if this worked...
lst.append(4) # ...and this mutated the key after insertion,
d[lst] # the dict's internal slot (based on the OLD hash) would be wrong
If list were hashable and its hash were based on contents, mutating a
list already used as a dict key would silently corrupt the dict's
internal structure (the key's slot no longer matches its current hash).
Python sidesteps this entirely by making mutable containers unhashable.
Custom classes: hashable by default (via identity)
class Point:
def __init__(self, x, y):
self.x, self.y = x, y
p = Point(1, 2)
hash(p) # works! -- default __hash__ is based on id(), inherited from object
By default, custom classes inherit object.__hash__, based on identity
(id()) — two distinct Point(1, 2) instances hash differently even
though they'd naturally seem "equal." This is fine until you also define
__eq__ for value-based equality:
class Point:
def __init__(self, x, y):
self.x, self.y = x, y
def __eq__(self, other):
return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
hash(Point(1, 2)) # TypeError: unhashable type: 'Point'
Defining __eq__ makes Python set __hash__ = None automatically,
because the identity-based default hash would now violate the "equal
implies equal hash" contract. Fix by defining a consistent __hash__:
def __hash__(self):
return hash((self.x, self.y))
Interview-ready summary: Hashable means "has a stable __hash__, and
if __eq__ is defined, equal objects hash equally." Built-in mutable
containers are unhashable by design (mutation would corrupt any dict/set
using them as keys); custom classes are hashable by identity by default,
but overriding __eq__ disables that default hash until you provide a
matching __hash__ explicitly.
Related Resources
Quick decision table
| Need | Structure | Why |
|---|---|---|
| Ordered, mutable collection, duplicates OK | list | general-purpose sequence |
Fixed-size, immutable record ((x, y), (name, age)) | tuple | safe to hash, signals "this won't change" |
| Fast membership testing, deduplication | set | O(1) average in, automatic uniqueness |
| Lookup by key/name | dict | O(1) average lookup by key, not position |
Concrete examples
# list -- ordered sequence of items, order and duplicates matter
scores = [85, 92, 85, 78]
# tuple -- an immutable, fixed-shape record
point = (3, 4)
person = ("Ada", 36)
# set -- membership and uniqueness, order doesn't matter
seen_ids = {101, 205, 310}
if user_id in seen_ids: # O(1) average -- much faster than `in a_list` for large data
...
unique_tags = set(all_tags) # dedupe in one line
# dict -- look things up by key
users_by_id = {101: "Ada", 205: "Grace"}
users_by_id[101] # O(1) average
Why x in set beats x in list at scale
big_list = list(range(1_000_000))
big_set = set(big_list)
999_999 in big_list # O(n) -- scans up to a million elements
999_999 in big_set # O(1) average -- direct hash lookup
For any workload doing repeated membership checks against a large
collection, converting to a set (or using a dict if you also need
associated values) is one of the cheapest, highest-impact optimizations
available.
Tuple vs list: signaling intent, not just performance
def get_coordinates():
return (self.x, self.y) # a tuple signals "this is a fixed 2-item record"
Beyond being hashable (usable as dict keys/set elements) and slightly more
memory-efficient, using a tuple for a fixed-shape value communicates to
readers that the shape — not just the values — is meant to be fixed:
nobody should expect to .append() to it.
Interview-ready summary: Reach for list for ordered, mutable
sequences; tuple for fixed-shape, immutable records (and anything you
need to hash); set for uniqueness/fast membership testing; dict
whenever you look things up by key rather than by position. The
performance difference between O(n) list scans and O(1) set/dict lookups
is often the single biggest algorithmic win available in everyday code.
Related Resources
The three comprehension forms
squares = [x * x for x in range(10)] # list
evens = {x for x in range(10) if x % 2 == 0} # set
lookup = {x: x * x for x in range(10)} # dict
gen = (x * x for x in range(10)) # generator expression (lazy!)
Each desugars roughly to a loop that appends/adds/assigns into a new container (except the generator expression, which produces a lazy iterator instead of eagerly building a container — see the iterators/generators topic).
Why they're often faster than an explicit loop
# Comprehension
squares = [x * x for x in range(1000)]
# Equivalent explicit loop
squares = []
for x in range(1000):
squares.append(x * x)
Both do the same logical work, but the comprehension compiles to
specialized bytecode (LIST_APPEND inside an isolated function frame)
that avoids repeated attribute lookups (squares.append) on every
iteration — in practice this is a modest, not dramatic, speedup, so
"use comprehensions for performance" is a secondary benefit, not the main
reason to reach for them.
When they hurt readability
# Hard to read -- nested comprehension with a condition
result = [item.strip().lower() for sublist in data
for item in sublist if item and not item.startswith("#")]
# Clearer as an explicit loop
result = []
for sublist in data:
for item in sublist:
if item and not item.startswith("#"):
result.append(item.strip().lower())
A comprehension nested two or more levels deep, or one combining multiple
if conditions and a non-trivial expression, usually reads worse than the
equivalent loop — there's no room for named intermediate variables or
comments explaining a non-obvious filter, and reviewers have to mentally
un-nest the comprehension to understand execution order (which, notably,
goes left-to-right the way the for/if clauses are written, not inside-
out).
A good rule of thumb
If a comprehension needs more than one for clause or more than one
condition to express the logic, or if the transformation expression
itself needs a helper function to stay readable, switch to an explicit
loop (or a generator function with clear intermediate steps).
Interview-ready summary: Comprehensions are syntax sugar for building a list/set/dict via a loop, and are typically a bit faster due to specialized bytecode — but that's secondary to their real value: concise, readable code for a simple transform-and-filter. Once nesting or conditions pile up, prefer an explicit loop over a comprehension that's technically correct but hard to read.