What's the difference between `sys.getsizeof()` and an object's actual total memory usage?

5 min · advanced · memory · sys-getsizeof · profiling

Quick Answer

`sys.getsizeof(obj)` returns only the memory of the object **itself** — for a container like a list, that's the list structure and its internal pointer array, but **not** the size of the objects it points to. To get the true total memory footprint of a nested structure, you need to recursively sum the sizes of every object reachable from it (accounting for shared references so you don't double-count).

Detailed Answer

The trap: getsizeof doesn't recurse into contents

import sys

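small_list = [1, 2, 3]
big_list = [10**100 + i for i in range(3)]   # three distinct, much larger integers

# Exact byte counts vary by Python version and platform (these are typical for 64-bit CPython).
sys.getsizeof(small_list)   # ~88 bytes -- just the list header + pointer array
sys.getsizeof(big_list)     # ~88 bytes -- about the same; getsizeof doesn't look at what's IN the list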

Both lists report roughly the same size from `sys.getsizeof`, because it measures only the list object's own overhead (a header plus an array of pointers), not the memory used by the objects those pointers refer to. The difference in what the two lists actually hold (small integers versus very large ones) is invisible to a plain `sys.getsizeof` call.
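To make the gap concrete, the sketch below reports a list's shallow size next to the combined shallow size of its elements. `shallow_and_contents` is a hypothetical helper name introduced here for illustration, and the byte counts are specific to CPython, so the code prints values rather than hard-coding them.

```python
import sys

def shallow_and_contents(container):
    """Return (shallow size of container, summed shallow sizes of its distinct elements)."""
    shallow = sys.getsizeof(container)
    # Deduplicate by id() so an element referenced several times is counted once.
    distinct = {id(item): item for item in container}
    contents = sum(sys.getsizeof(item) for item in distinct.values())
    return shallow, contents

small = [1, 2, 3]
big = [10**100 + i for i in range(3)]  # three distinct large integers

small_shallow, small_contents = shallow_and_contents(small)
big_shallow, big_contents = shallow_and_contents(big)

print("small:", small_shallow, "bytes shallow,", small_contents, "bytes of elements")
print("big:  ", big_shallow, "bytes shallow,", big_contents, "bytes of elements")
```

The shallow figures should be close to each other, while the element totals differ, which is exactly the part `sys.getsizeof(container)` never sees.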

Getting the real total: recursive sizing

import sys

def total_size(obj, seen=None):
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:      # avoid double-counting shared references / infinite recursion on cycles
        return 0
    seen.add(obj_id)

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

total_size(big_list)   # now includes the sizes of the integer objects the list references

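This is roughly what third-party tools such as `pympler.asizeof` do more robustly. A naive recursive walk must track the ids of already-visited objects (`seen`), both to avoid infinite recursion on cyclic structures and to avoid double-counting objects referenced from multiple places (e.g., the same string stored as a value under several dict keys). Note that the version above only descends into built-in containers: attributes of custom class instances (`__dict__`, `__slots__`) are not followed, and very deeply nested data can hit Python's recursion limit.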
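As a rough illustration of what a more thorough walk might look like, here is a minimal sketch that uses an explicit stack instead of recursion and also follows instance attributes. `deep_size` and `Node` are hypothetical names made up for this example. This is not how `pympler` is implemented, and it still misses memory held by C extensions or by objects that do not expose their contents this way.

```python
import sys

def deep_size(root):
    """Sum sys.getsizeof over every object reachable from root, counting each once."""
    seen = set()      # ids of objects already counted
    stack = [root]    # explicit stack, so deep nesting cannot hit the recursion limit
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # shared reference or cycle: already counted
        seen.add(id(obj))
        total += sys.getsizeof(obj)

        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
        else:
            # Assumption: instance state lives in __dict__ and/or __slots__.
            attrs = getattr(obj, "__dict__", None)
            if isinstance(attrs, dict):
                stack.append(attrs)
            for cls in type(obj).__mro__:
                slots = cls.__dict__.get("__slots__", ())
                if isinstance(slots, str):
                    slots = (slots,)
                for name in slots:
                    if hasattr(obj, name):
                        stack.append(getattr(obj, name))
    return total

class Node:
    """Hypothetical example class holding a payload and a link to another node."""
    def __init__(self, payload):
        self.payload = payload
        self.next = None

a = Node("x" * 10_000)
b = Node("y" * 10_000)
a.next, b.next = b, a   # deliberate reference cycle

print(deep_size(a))     # terminates despite the cycle and includes both payloads
```

The explicit stack is a design choice rather than a requirement: it trades the simplicity of recursion for robustness on deeply nested data.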

Why this distinction matters in practice

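some_shared_large_object = "x" * 1_000_000        # roughly 1 MB of string data
data = {"key": some_shared_large_object}
data2 = {"other_key": some_shared_large_object}   # same object, referenced twice

sys.getsizeof(data) + sys.getsizeof(data2)   # shallow: leaves the shared string out entirely
total_size(data) + total_size(data2)         # two separate deep walks: counts the shared string twice
total_size([data, data2])                    # one walk, one seen set: counts it once (plus a small wrapper list)

Back-of-envelope estimates can go wrong in both directions. Summing shallow `sys.getsizeof` calls ignores the referenced objects altogether, while summing deep sizes computed separately for each container double-counts anything they share: the same string, list, or object referenced from two places occupies memory only once, but each independent walk adds it again.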

Practical guidance

For a real memory audit, prefer `tracemalloc` (which tracks allocations made through Python's memory allocators and attributes them to source locations) or a purpose-built tool such as `pympler` for object sizes or `objgraph` for reference graphs. Plain `sys.getsizeof` and hand-rolled recursion over it are best kept for quick, one-off checks of a specific object, not for accurate whole-program memory accounting.
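A minimal sketch of the `tracemalloc` approach, using only the standard library, might look like the following. `measure_allocation` is a hypothetical helper name, and the workload is an arbitrary example; the reported figure covers allocations traced while the builder ran, not the process's total memory.

```python
import tracemalloc

def measure_allocation(build):
    """Run build() under tracemalloc; return (result, net bytes allocated while it ran)."""
    # Assumption: nothing else is relying on tracemalloc, since stop() ends all tracing.
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = build()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before

# Arbitrary example workload: a list of 10,000 distinct strings.
rows, allocated = measure_allocation(lambda: [str(i) * 10 for i in range(10_000)])
print(f"tracemalloc saw roughly {allocated} bytes of new allocations")
```

For attributing memory to specific lines of code, `tracemalloc.take_snapshot()` followed by `snapshot.statistics("lineno")` gives a per-location breakdown.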

Interview-ready summary: sys.getsizeof measures only an object's own shallow overhead, not what it references — a container full of huge objects can report the same size as one full of tiny objects. Getting a true total requires recursively walking references while tracking already-visited objects to avoid double-counting shared data, or better, using a dedicated memory-profiling tool.