How does small-integer and string caching (interning) affect object identity?

5 min · intermediate · memory, interning, integers, gotcha

Quick Answer

CPython pre-allocates and caches all integers from **-5 to 256** at startup, so every reference to a value in that range, such as `100`, points to the *same* cached object. That is why `is` comparisons on small integers appear to work, but only by coincidence. Outside that range, integers produced at runtime are typically distinct objects, although equal literals compiled in the same code block may still end up sharing one object. This is a CPython implementation detail (an optimization that avoids constantly allocating tiny, extremely common integer objects), not a language guarantee, so code must never rely on it.

Detailed Answer

The small-integer cache in action

a = 100
b = 100
a is b   # True -- both point to the SAME cached int object (100 is within -5..256)

x = 1000
y = 1000
x is y    # unreliable -- often False in the REPL, but can be True in a script, where equal constants in one code block may be merged

CPython pre-creates integer objects for -5 through 256 once, at interpreter startup, because these small values are used constantly throughout any program (loop counters, small offsets, boolean-like flags). Reusing one cached object instead of allocating a fresh one for every occurrence is a straightforward, safe optimization: integers are immutable, so sharing the same object across unrelated code has no observable effect, apart from identity checks, other than saving allocations.
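To see which values a given interpreter actually shares, one can probe it directly. The sketch below uses a hypothetical helper, `shared_int_range` (not a standard function), that builds each integer twice at runtime via `int(str(i))`, so the compiler cannot merge constants, and records where both results are the same object. It assumes CPython; other implementations may report a different range, or none at all.

```python
def shared_int_range(lo=-50, hi=1000):
    """Return (min, max) of the ints in [lo, hi] that come back as one shared object.

    Hypothetical helper for illustration only. Each value is built twice at
    runtime from a string, so identical identity means the interpreter reused
    an existing object instead of allocating a new one.
    """
    shared = [i for i in range(lo, hi + 1) if int(str(i)) is int(str(i))]
    return (min(shared), max(shared)) if shared else None


if __name__ == "__main__":
    # Compare the result with the -5..256 range quoted for CPython;
    # a different answer on another interpreter is equally valid.
    print(shared_int_range())
```

The probe is only a diagnostic: its result describes one interpreter build and should never feed into program logic.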

Why this is invisible and safe for == but a trap for is

a = int("100")     # built at runtime, so the compiler cannot merge constants
b = int("100")
a == b             # True -- correct, always
a is b             # True, but ONLY because of the small-int cache

n = int("1000")
m = int("1000")
n == m             # True -- correct, always
n is m             # unreliable! False in CPython here; don't write code that depends on this

Because `==` compares values, it is correct regardless of caching, and the cache is completely invisible to correct code. It only becomes a trap when someone mistakenly uses `is` for value comparison: the code happens to "work" during testing (because test values were small) but silently breaks in production once real values exceed 256. This is exactly why CPython 3.8 and later emit a SyntaxWarning when `is` is used with an integer or string literal.
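The compile-time warning can be observed without running the suspicious code. The following is a minimal sketch, assuming CPython 3.8 or later; `literal_is_warnings` is a hypothetical helper name, and the exact wording of the warning message varies between versions.

```python
import warnings


def literal_is_warnings(source):
    """Compile `source` and return the messages of any SyntaxWarning raised.

    Hypothetical helper for illustration; it only compiles, never executes.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")      # make sure no filter hides the warning
        compile(source, "<demo>", "eval")
    return [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]


if __name__ == "__main__":
    print(literal_is_warnings("n is 1000"))   # identity test against a literal
    print(literal_is_warnings("n == 1000"))   # value comparison: nothing to flag
```

Because the check happens at compile time, it catches only comparisons against literals; `a is b` between two integer variables still compiles silently.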

String interning follows a similar but separate policy

s1 = "hello"
s2 = "hello"
s1 is s2   # True -- identifier-like string literals are commonly interned

s3 = "".join(["hel", "lo"])
s3 is s1     # often False -- built at runtime, not necessarily interned

String interning (covered in more depth in the Collections topic) is a related but distinct CPython optimization with its own rules about which strings get cached. Both mechanisms are performance optimizations for extremely common immutable values (fewer allocations and, for interned strings, faster dictionary-key comparisons), and neither is part of the language specification.
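When shared string objects are genuinely wanted, the standard library offers an explicit route: `sys.intern` returns the interned copy of a string. A short sketch follows; the variable names are illustrative only.

```python
import sys

built = "".join(["hel", "lo"])   # assembled at runtime, not necessarily interned
literal = "hello"

print(built == literal)                           # True: same text, whatever the identity
print(sys.intern(built) is sys.intern(literal))   # True: both map to the one interned copy
```

Even here, `==` remains the right tool for comparing text; explicit interning is a memory and lookup optimization, not a substitute for value comparison.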

The takeaway for interview answers

The important part isn't memorizing the exact cache range. It's recognizing that this is a CPython implementation detail that other implementations (PyPy, etc.) and even future CPython versions are free to change, so relying on `is` for numeric or string value comparison is a latent bug, not a valid optimization. Always use `==` for value comparisons; reserve `is` for checks that are genuinely about identity (`None`, sentinels, singleton checks).
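A legitimate use of `is` looks like the sketch below, where identity really is the question being asked. The names `_MISSING` and `describe` are hypothetical and exist only for this illustration.

```python
_MISSING = object()   # hypothetical module-level sentinel; unique by identity


def describe(value=_MISSING):
    """Distinguish 'argument omitted' from an explicit None, then compare by value."""
    if value is _MISSING:     # identity: was anything passed at all?
        return "no value given"
    if value is None:         # identity: None is a singleton
        return "explicit None"
    if value == 1000:         # value comparison: never `is` here
        return "one thousand"
    return "something else"
```

The sentinel check works on every Python implementation because it compares against one specific object, while the numeric check uses `==` and so does not depend on any cache.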

Interview-ready summary: CPython caches small integers (-5 to 256) and many identifier-like string literals as an allocation-saving optimization, which makes `is` comparisons on them appear to work. This is an implementation detail with no language guarantee, and relying on it for value equality (instead of `==`) is a bug that will eventually surface once values fall outside the cached range.