What's the difference between `sys.getsizeof()` and an object's actual total memory usage?
Quick Answer
`sys.getsizeof(obj)` returns only the memory of the object **itself** — for a container like a list, that's the list structure and its internal pointer array, but **not** the size of the objects it points to. To get the true total memory footprint of a nested structure, you need to recursively sum the sizes of every object reachable from it (accounting for shared references so you don't double-count).
Detailed Answer
The trap: getsizeof doesn't recurse into contents
import sys

small_list = [1, 2, 3]
big_list = [10**100] * 3  # three references to one large (101-digit) integer

sys.getsizeof(small_list)  # roughly 80-90 bytes on 64-bit CPython: list header + 3 pointers
sys.getsizeof(big_list)    # about the same: getsizeof doesn't look at what's IN the list
Both lists report roughly the same size from `getsizeof`, because it
measures only the list object's own overhead (a header plus an array of
pointers), not the memory used by the objects those pointers refer to.
The extra memory behind `big_list` (the large integer it references) is
completely invisible to a naive `getsizeof` call.
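To make the gap concrete, the sketch below compares a list's own size with the combined size of the objects it points to. It assumes 64-bit CPython; the names `payloads`, `shallow`, and `contents` are illustrative, and the exact byte counts vary by version and build.

```python
import sys

# Three distinct 10,000+ character strings, built at runtime so each is its own object.
payloads = ["x" * 10_000 + str(i) for i in range(3)]

shallow = sys.getsizeof(payloads)                    # list header + pointer array only
contents = sum(sys.getsizeof(s) for s in payloads)   # the strings the list points to

print(f"list object alone:  {shallow} bytes")
print(f"referenced strings: {contents} bytes")
```

The first figure stays small no matter how large the strings are; only the second one grows with the data.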
Getting the real total: recursive sizing
import sys

def total_size(obj, seen=None):
    """Recursively sum getsizeof over obj and everything it contains."""
    if seen is None:
        seen = set()

    obj_id = id(obj)
    if obj_id in seen:  # avoid double-counting shared references / infinite recursion on cycles
        return 0
    seen.add(obj_id)

    size = sys.getsizeof(obj)

    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    # Anything else (e.g. class instances) contributes only its shallow size here.

    return size

total_size(big_list)  # now includes the large integer (counted once: all three slots share it)
This is roughly what third-party tools like pympler.asizeof do more
robustly — a naive recursive walk must track already-visited object ids
(seen) both to avoid infinite loops on cyclic structures and to avoid
double-counting objects referenced from multiple places (e.g., the same
string appearing as a value under several dict keys).
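Both jobs of the `seen` set can be checked directly. The sketch below repeats a `total_size` equivalent to the one above so it runs on its own; `shared`, `pair`, `single`, and `cyclic` are made-up example values.

```python
import sys

def total_size(obj, seen=None):
    """Sum getsizeof over obj and its contents, counting each object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

shared = "x" * 1_000
single = [shared]
pair = [shared, shared]   # two references to the same string object

cyclic = []
cyclic.append(cyclic)     # a list that contains itself

# The second reference costs one extra pointer slot, not a second copy of the string.
extra = total_size(pair) - total_size(single)
print(f"extra bytes for the second reference: {extra}")

# The cycle terminates: the list is counted once and the self-reference adds nothing.
print(total_size(cyclic) == sys.getsizeof(cyclic))
```

Without the `seen` check, the `cyclic` call would recurse until Python raised `RecursionError`.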
Why this distinction matters in practice
some_shared_large_object = list(range(1000))  # stand-in for any large object
data = {"key": some_shared_large_object}
data2 = {"other_key": some_shared_large_object}  # same object, referenced twice

sys.getsizeof(data) + sys.getsizeof(data2)  # shallow: leaves some_shared_large_object out entirely
total_size(data) + total_size(data2)        # deep, but counts some_shared_large_object twice
Any back-of-envelope memory estimate built by adding up per-container
sizes goes wrong in one of two ways: summing shallow `getsizeof` results
omits the shared object altogether, while summing independently computed
deep sizes counts it once per container. The same underlying
string/list/object referenced from two places isn't actually taking up
memory twice, but naive deep summation would report it that way.
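One way to get a combined figure for several containers is to pass the same `seen` set to every call, so an object already counted under one root is skipped under the next. This is a minimal sketch: `total_size` is repeated so the snippet runs on its own, and the value bound to `some_shared_large_object` is an assumed stand-in.

```python
import sys

def total_size(obj, seen=None):
    """Sum getsizeof over obj and its contents, counting each object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

some_shared_large_object = list(range(10_000))   # assumed stand-in value
data = {"key": some_shared_large_object}
data2 = {"other_key": some_shared_large_object}

# Separate walks: each call starts with a fresh seen set, so the list is counted twice.
naive = total_size(data) + total_size(data2)

# One seen set across both walks: the list is counted only the first time it is reached.
seen = set()
combined = total_size(data, seen) + total_size(data2, seen)

print(f"naive sum: {naive} bytes, shared-aware sum: {combined} bytes")
```

The difference between the two totals is the deep size of the shared list, which is exactly the amount the naive sum over-reports.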
Practical guidance
For a real memory audit, prefer `tracemalloc` (tracks the memory blocks
Python allocates, attributed to the source line that allocated them) or a
purpose-built tool (`pympler`, `objgraph`) over hand-rolled
`sys.getsizeof` recursion. Plain `sys.getsizeof` is useful for quick,
one-off "how big is this specific object's own overhead" checks, not for
accurate whole-structure memory accounting.
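As a rough illustration of the `tracemalloc` approach, the sketch below takes a snapshot before and after a workload and reports which source lines allocated the most. The `rows` list is a hypothetical workload standing in for whatever code is being audited.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical workload: build 50,000 small strings.
rows = [str(i) * 10 for i in range(50_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Diff the snapshots, grouped by the source line responsible for each allocation.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
```

Each printed entry names a file and line number along with the change in allocated bytes, so the line that builds `rows` should appear at or near the top.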
Interview-ready summary: sys.getsizeof measures only an object's
own shallow overhead, not what it references — a container full of huge
objects can report the same size as one full of tiny objects. Getting a
true total requires recursively walking references while tracking
already-visited objects to avoid double-counting shared data, or better,
using a dedicated memory-profiling tool.