How does small-integer and string caching (interning) affect object identity?
Quick Answer
CPython pre-allocates and caches every integer from **-5 to 256** at startup, so every reference to, say, `100` points to the *same* cached object. That is why `is` comparisons on small integers appear to work by coincidence. Outside that range, integers produced at runtime (by arithmetic, parsing, and so on) are generally distinct objects, and whether two equal literals share one object depends on how the code was compiled. This is a CPython implementation detail (an optimization that avoids constantly allocating tiny, extremely common integer objects), not a language guarantee, so code must never rely on it.
Detailed Answer
The small-integer cache in action
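```python
a = 100
b = 100
a is b  # True: both names refer to the SAME cached int object (100 is within -5..256)

x = 1000
y = 1000
x is y  # not guaranteed: typically False when each line is compiled separately
        # (e.g. typed at the interactive prompt), but often True inside one module
        # or function, where CPython shares equal constants
```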
CPython creates the integer objects for -5 through 256 once, at interpreter startup, because these small values appear constantly in almost any program (loop counters, indexes, small offsets, flag values). Handing out one cached object instead of allocating a new one for every occurrence is a safe optimization because integers are immutable: sharing one object across unrelated code has no observable effect beyond saving allocations, unless code inspects identity with `is` or `id()`.
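One way to see the cache boundaries directly is to build each integer at runtime, so the compiler cannot fold or share constants, and then check whether two independently computed copies are the same object. This is a minimal sketch for CPython only; `make_int`, `shares_identity`, and `cached_range` are hypothetical helper names, and the reported range reflects current CPython releases, not something other implementations promise.

```python
def make_int(value: int) -> int:
    # Compute the int with runtime arithmetic instead of using a literal,
    # so the result is never a shared compile-time constant.
    base = value - 1
    return base + 1


def shares_identity(value: int) -> bool:
    # Two separately computed copies are the same object only if CPython
    # handed back its pre-allocated cached int for this value.
    return make_int(value) is make_int(value)


def cached_range(lo: int = -20, hi: int = 300) -> tuple[int, int]:
    # Scan a window of values and report the smallest and largest cached one.
    cached = [n for n in range(lo, hi) if shares_identity(n)]
    return cached[0], cached[-1]


print(cached_range())        # (-5, 256) on current CPython
print(shares_identity(256))  # True
print(shares_identity(257))  # False
```

Both copies are alive at the moment `is` compares them, so a `False` result really does mean two distinct objects rather than a reused memory address.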
Why this is invisible and safe for `==` but a trap for `is`
```python
def double(n):
    return n * 2

double(50) == 100    # True: == compares values, always correct
double(50) is 100    # True, but ONLY because 100 is in the small-int cache
double(500) == 1000  # True: still correct
double(500) is 1000  # False on CPython: 1000 is computed at runtime, a new object
```
Since `==` compares values, it gives the correct answer regardless of caching, so the cache is completely invisible to correct code. It only becomes a trap when someone mistakenly uses `is` for value comparison and it happens to "work" during testing (because the test values were small) but silently breaks in production once real values exceed 256. This is exactly why CPython 3.8 and later emit a `SyntaxWarning` when `is` or `is not` is used with a literal such as an integer or a string.
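The compile-time check described above can be observed without ever running the suspicious comparison: compile a snippet and record the warnings it raises. This is a minimal sketch assuming CPython 3.8 or later; `warns_about_is` is a hypothetical helper, while `compile`, `warnings.catch_warnings`, and `SyntaxWarning` are standard features. The exact warning text differs between versions, so the sketch checks only the category.

```python
import warnings


def warns_about_is(source: str) -> bool:
    # Compile (but do not execute) the snippet, capturing any warnings it emits.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no earlier filter hides the warning
        compile(source, "<demo>", "eval")
    return any(issubclass(w.category, SyntaxWarning) for w in caught)


print(warns_about_is("n is 1000"))   # True: `is` with an int literal
print(warns_about_is("s is 'abc'"))  # True: `is` with a str literal
print(warns_about_is("n == 1000"))   # False: value comparison is fine
print(warns_about_is("n is None"))   # False: None is a genuine singleton
```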
String interning follows a similar but separate policy
```python
s1 = "hello"
s2 = "hello"
s1 is s2  # True -- identifier-like string literals are commonly interned

s3 = "".join(["hel", "lo"])
s3 is s1  # often False -- built at runtime, not necessarily interned
```
String interning (covered in more depth in the Collections topic) is a related but distinct CPython optimization with its own rules about which strings get cached. Both mechanisms exist to reduce memory and allocation overhead for extremely common immutable values (interning also lets dictionary lookups on names short-circuit on an identity check after hashing), and neither is part of the language specification.
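CPython also exposes interning explicitly through the standard library's `sys.intern`, which returns one canonical object for each distinct string value. The sketch below is a CPython-oriented illustration (`build` is a hypothetical helper): two equal strings built at runtime start out as separate objects and become the same object once both pass through `sys.intern`. Even then, `==` remains the right way to compare string contents.

```python
import sys


def build(parts: list[str]) -> str:
    # Concatenate at runtime so the result is a fresh str object, not a literal.
    return "".join(parts)


a = build(["hel", "lo"])
b = build(["he", "llo"])
print(a == b)  # True: same characters
print(a is b)  # False on CPython: two separately built objects

a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True: sys.intern maps equal strings to one shared object
```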
The takeaway for interview answers
The important part isn't memorizing the exact cache range; it's recognizing that this is a CPython implementation detail that other implementations (PyPy, etc.) and even future CPython versions are free to change, so relying on `is` for numeric or string value comparison is a latent bug, not a valid optimization. Always use `==` for value comparisons, and reserve `is` for checks that are genuinely about identity (`None`, sentinel objects, other singletons).
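A typical case where `is` is the right tool is a private sentinel object: it lets a function tell "no value supplied" apart from every real value, including `None`. This is a minimal sketch; `_MISSING` and `lookup` are hypothetical names chosen for illustration.

```python
_MISSING = object()  # unique sentinel: equal only to itself, so no real value collides with it


def lookup(settings: dict, key: str, default=_MISSING):
    value = settings.get(key, _MISSING)
    if value is not _MISSING:    # identity check: is this our exact sentinel object?
        return value
    if default is not _MISSING:  # caller supplied a fallback (which may itself be None)
        return default
    raise KeyError(key)


config = {"timeout": None}
print(lookup(config, "timeout"))     # None: a stored value, even though it is falsy
print(lookup(config, "retries", 3))  # 3: falls back to the supplied default
```

Comparing with `==` would also work here, but `is` states the intent precisely and cannot be fooled by a value whose `__eq__` is overridden.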
Interview-ready summary: CPython caches small integers (-5 to 256) and many identifier-like string literals as an allocation-saving optimization, which makes `is` comparisons on them appear to work. This is an implementation detail with no language guarantee, and relying on it for value equality (instead of `==`) is a bug that will eventually surface once values fall outside the cached range or strings are built at runtime.