What for actually does under the hood
for x in [1, 2, 3]:
    print(x)
# is roughly equivalent to:
it = iter([1, 2, 3])  # calls [1,2,3].__iter__()
while True:
    try:
        x = next(it)  # calls it.__next__()
    except StopIteration:
        break
    print(x)
iter(obj) calls obj.__iter__(). The returned object must implement
__next__(), returning the next element each time until it's exhausted,
at which point it raises StopIteration — the for loop (and every other
iteration construct: comprehensions, * unpacking, sum(), etc.) catches
that exception internally to know when to stop.
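To make the protocol visible, here is a minimal sketch of an iterator that counts how often __next__ is called. The class name Probe and its attributes are hypothetical, chosen only for illustration.

```python
class Probe:
    """Hypothetical iterator that records every __next__ call."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        if self._pos >= len(self._items):
            raise StopIteration  # signals exhaustion to the consumer
        value = self._items[self._pos]
        self._pos += 1
        return value


p = Probe([1, 2, 3])
total = sum(p)  # sum() drives the same protocol a for loop does
# Three calls return values; a fourth call raises StopIteration.
calls_made = p.calls

first, *rest = Probe("abc")  # star-unpacking uses the protocol too
```

The extra fourth call is the one that raises StopIteration, which sum() swallows silently.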
Iterable vs iterator: the crucial distinction
lst = [1, 2, 3]
it1 = iter(lst)
it2 = iter(lst)
next(it1) # 1
next(it1) # 2
next(it2) # 1 -- it2 is independent, its own position
it1 is it2 # False -- iter(lst) returns a NEW iterator object each time
lst is iterable (has __iter__) but is not itself an iterator — you
can call iter(lst) as many times as you want, each returning an
independent, fresh iterator starting from the beginning. This is why a
list can be looped over multiple times, or in nested loops
simultaneously, without interference.
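The distinction can be checked directly with the abstract base classes in collections.abc. This is a small sketch; the variable names are illustrative.

```python
from collections.abc import Iterable, Iterator

lst = [1, 2, 3]
it = iter(lst)

list_is_iterable = isinstance(lst, Iterable)   # has __iter__
list_is_iterator = isinstance(lst, Iterator)   # lacks __next__
it_is_iterator = isinstance(it, Iterator)      # has both methods
it_returns_itself = iter(it) is it             # an iterator's __iter__ returns self

# Nested loops over the same list work because each loop gets its own iterator.
pairs = [(a, b) for a in lst for b in lst]
```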
Writing your own iterator
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)  # 3, 2, 1
Note that Countdown is its own iterator (__iter__ returns self). This is
a common pattern with an important limitation: once exhausted, a Countdown
instance can't be iterated again, unlike list, which is iterable but
not itself an iterator and so always supports a fresh pass.
Why this distinction matters in practice
data = Countdown(3)
list(data) # [3, 2, 1]
list(data) # [] -- already exhausted! same object, no way to restart
If you need to iterate something multiple times, it needs to be a true
iterable that returns a new iterator each time __iter__ is called
(like list) — not an object that is its own (single-use) iterator.
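One way to get a re-iterable version is to write __iter__ as a generator function, so every call builds a fresh iterator. The sketch below assumes a hypothetical class name, ReusableCountdown; it is one possible design, not the only one.

```python
class ReusableCountdown:
    """Hypothetical iterable: stores only the start, never the position."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Because this method contains yield, each call returns a new
        # generator with its own independent position.
        current = self.start
        while current > 0:
            yield current
            current -= 1


data = ReusableCountdown(3)
first_pass = list(data)
second_pass = list(data)  # works again: a new iterator is created
```

The position lives in the iterator, not in the iterable, which is exactly how list behaves.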
Interview-ready summary: Iteration is a two-step protocol: iter()
gets an iterator via __iter__, and next() advances it via __next__
until StopIteration signals the end. Iterables can be iterated
repeatedly (each iter() call returns a fresh iterator); an object that
is its own iterator can only be consumed once.
yield pauses; return exits
def count_up_to(n):
    i = 1
    while i <= n:
        yield i  # pause here, produce i, resume on next `next()`
        i += 1
gen = count_up_to(3)
gen # <generator object count_up_to at 0x...> -- body hasn't run yet!
next(gen) # 1 -- runs until the first yield
next(gen) # 2 -- resumes right after the yield, runs to the next one
next(gen) # 3
next(gen) # StopIteration -- loop condition false, function returns naturally
Calling count_up_to(3) does not execute any code in the function
body — it immediately returns a generator object. Execution only happens
when you call next() (or iterate with a for loop), and each call
resumes exactly where the previous yield left off, with all local
variables (i, in this case) preserved between calls.
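The deferred execution can be observed by recording side effects. In this sketch the events list and the traced function are hypothetical names used only to show when the body runs.

```python
events = []


def traced():
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2


gen = traced()
events_after_call = list(events)    # nothing has run yet

first = next(gen)
events_after_first = list(events)   # body ran up to the first yield

second = next(gen)
events_after_second = list(events)  # body resumed and ran to the second yield
```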
return inside a generator ends iteration (doesn't return a value normally)
def gen():
    yield 1
    yield 2
    return "done"  # ends the generator; the return value becomes StopIteration's argument
    yield 3  # never reached
g = gen()
next(g) # 1
next(g) # 2
next(g) # StopIteration: done -- the return value is attached to the exception
A bare return (or falling off the end of the function) also raises
StopIteration; it's just the normal way a generator signals it's
exhausted. A return with a value stores that value in StopIteration.value,
which most code never inspects directly (it's mainly used internally by
yield from to get a sub-generator's final return value).
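If you do want the return value without yield from, you can catch the exception yourself. A minimal sketch follows; the variable names are illustrative.

```python
def gen():
    yield 1
    return "done"


g = gen()
first = next(g)
try:
    next(g)
except StopIteration as stop:
    returned = stop.value  # the generator's return value

# A for loop or list() swallows StopIteration and drops the value.
values_seen_by_list = list(gen())
```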
Why generators matter: lazy evaluation
def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()

for line in read_large_file("huge_log.txt"):  # processes one line at a time
    process(line)  # never loads the whole file into memory
Because a generator computes each value on demand rather than all at once, it can represent an infinite or very large sequence using constant memory — the entire point of the generator/iterator model versus building a full list upfront.
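The infinite case can be sketched with a generator that never returns, bounded by itertools.islice at the point of consumption. The function name naturals is hypothetical.

```python
from itertools import islice


def naturals():
    """Hypothetical infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# Only the values actually requested are ever computed.
even_squares = (n * n for n in naturals() if n % 2 == 0)
first_five = list(islice(even_squares, 5))
```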
Generator state is a real suspended stack frame
Each generator object keeps its own frame — local variables, instruction
pointer, and the position in any try/finally blocks — completely
separate from any other call to the same generator function, which is why
multiple independent generators from the same function don't interfere
with each other.
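The standard inspect module exposes this suspended state. The sketch below uses inspect.getgeneratorstate and inspect.getgeneratorlocals on a hypothetical counter generator to show two independent frames.

```python
import inspect


def counter():
    i = 0
    while True:
        i += 1
        yield i


a = counter()
b = counter()
next(a)
next(a)
next(b)

state_a = inspect.getgeneratorstate(a)       # paused at a yield
i_in_a = inspect.getgeneratorlocals(a)["i"]  # a's own local variable
i_in_b = inspect.getgeneratorlocals(b)["i"]  # b's is separate
fresh_state = inspect.getgeneratorstate(counter())  # body not started yet
```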
Interview-ready summary: yield suspends a function's execution and
produces one value per call, resuming exactly where it left off with all
local state intact; return ends the function and (for a generator)
raises StopIteration, ending iteration. This lazy, resumable execution
model is what makes generators memory-efficient for large or infinite
sequences.
The basic simplification
def inner():
    yield 1
    yield 2

def outer_manual():
    for value in inner():  # manually re-yield each value
        yield value

def outer_yield_from():
    yield from inner()  # equivalent, more concise
list(outer_manual()) # [1, 2]
list(outer_yield_from()) # [1, 2]
For simple pass-through delegation, yield from inner() is just a more
concise form of looping and re-yielding, but it is more than syntactic
sugar for a loop: it also forwards sent values and exceptions and captures
the sub-generator's return value.
Flattening nested structures
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # recursively delegate to sub-generator
        else:
            yield item
list(flatten([1, [2, 3, [4, 5]], 6])) # [1, 2, 3, 4, 5, 6]
yield from makes recursive generator flattening natural — each
recursive call's yielded values bubble straight up through every level of
delegation without needing an explicit loop at each level.
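The same pattern extends beyond lists. The sketch below is one possible generalization, under the assumption that strings and bytes should be treated as atoms rather than flattened into characters; flatten_any is a hypothetical name.

```python
from collections.abc import Iterable


def flatten_any(nested):
    """Flatten arbitrarily nested iterables, keeping str/bytes whole."""
    for item in nested:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_any(item)
        else:
            yield item


flat = list(flatten_any([1, (2, [3, "ab"]), {4}]))
```

Without the str/bytes check, a one-character string would recurse forever, since iterating it yields itself.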
Forwarding .send(), .throw(), and .close() (why it's more than a loop)
def inner():
    received = yield "ready"
    print(f"inner got: {received}")
    yield "done"

def outer():
    result = yield from inner()  # forwards .send() values into inner() too
    print(f"inner returned: {result}")
g = outer()
next(g) # 'ready'
g.send("hello") # prints "inner got: hello", yields 'done'
A plain for value in inner(): yield value loop would only forward
yielded values, not values sent back in via .send(): a sent value would
become the result of outer's own yield expression and be discarded, while
the for loop advances inner() with a plain next(), so inner() receives
None. yield from transparently forwards send/throw/close to the
sub-generator, which is essential for coroutine-style generators (largely
superseded by async/await today, but still the mechanism asyncio was
originally built on).
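The difference is easiest to see side by side. In this sketch, outer_loop and outer_delegating are hypothetical names, and inner yields its received value back so the result can be inspected.

```python
def inner():
    received = yield "ready"
    yield f"inner got: {received}"


def outer_loop():
    for value in inner():  # advances inner() with plain next()
        yield value        # a sent value lands here and is discarded


def outer_delegating():
    yield from inner()     # sent values are passed through to inner()


g1 = outer_loop()
next(g1)
reply_via_loop = g1.send("hello")

g2 = outer_delegating()
next(g2)
reply_via_yield_from = g2.send("hello")
```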
Capturing the sub-generator's return value
def inner():
    yield 1
    yield 2
    return "sub-generator done"

def outer():
    result = yield from inner()
    print(result)  # 'sub-generator done'
list(outer()) # prints "sub-generator done", yields [1, 2]
yield from is itself an expression that evaluates to whatever the
delegated generator returned — a plain loop has no equivalent way to
capture that value.
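A slightly more practical sketch: each sub-generator reports how many items it produced, and the delegating generator accumulates those counts. The names emit_chunk and emit_all are hypothetical.

```python
def emit_chunk(items):
    count = 0
    for item in items:
        yield item
        count += 1
    return count  # becomes the value of the yield from expression


def emit_all(chunks):
    total = 0
    for chunk in chunks:
        total += (yield from emit_chunk(chunk))
    yield f"total={total}"


output = list(emit_all([[1, 2], [3]]))
```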
Interview-ready summary: yield from delegates iteration (and, less
commonly used today, send/throw/close forwarding) to a sub-generator
or iterable, doubling as a clean way to flatten nested generators and to
capture a sub-generator's final return value — capabilities a plain
re-yielding for loop doesn't fully provide.
Syntax difference: brackets vs parens
squares_list = [x * x for x in range(10**8)] # builds a 10^8-element list NOW
squares_gen = (x * x for x in range(10**8)) # builds nothing yet -- lazy
import sys
sys.getsizeof(squares_list) # ~800,000,000+ bytes -- huge
sys.getsizeof(squares_gen) # ~200 bytes -- constant, regardless of range size
The list comprehension allocates memory for every element immediately; the generator expression is a small object that knows how to produce values, computing each one only when asked.
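The same comparison can be run at a size that is safe to try locally. This is a sketch; exact byte counts are CPython implementation details and vary by version, and sys.getsizeof measures only the container, not the integers it references.

```python
import sys

small_list = [x * x for x in range(10)]
large_list = [x * x for x in range(10_000)]
small_gen = (x * x for x in range(10))
large_gen = (x * x for x in range(10_000))

list_grows = sys.getsizeof(large_list) > sys.getsizeof(small_list)
gen_is_constant = sys.getsizeof(small_gen) == sys.getsizeof(large_gen)
gen_is_smaller = sys.getsizeof(large_gen) < sys.getsizeof(large_list)
```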
When you can (and can't) use a generator expression
total = sum(x * x for x in range(10**8)) # fine -- sum() consumes lazily, one at a time
gen = (x * x for x in range(10**8))
gen[5] # TypeError -- generators don't support indexing
len(gen) # TypeError -- no len() either, since size isn't known upfront
list(gen) # works, but now you've paid the full memory cost anyway
for x in gen: ... # produces NOTHING -- list(gen) above already exhausted the generator
Generators trade away random access, len(), and re-iterability for
constant memory — appropriate when you consume the sequence exactly once,
in order, and don't need to know its length in advance.
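Single-pass consumption has a subtle consequence: even a membership test advances the generator. A small sketch with illustrative variable names:

```python
gen = (x * x for x in range(5))  # would produce 0, 1, 4, 9, 16

found = 4 in gen       # consumes 0, 1 and 4 while searching
remaining = list(gen)  # only what is left after the search
again = list(gen)      # exhausted: nothing more to produce
```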
The practical rule of thumb
- Feeding directly into another function that consumes one item at a
  time (sum(), max(), "".join(), a for loop): use a generator
  expression; there is no reason to materialize the whole list first.
- Need to iterate multiple times, index into it, call len(), or keep
  the whole thing around: use a list comprehension.
# Generator expression -- no unnecessary intermediate list
total = sum(price * qty for price, qty in cart)
# List comprehension -- the result is reused: measured, then indexed
doubled = [score * 2 for score in scores]
print(len(doubled), doubled[0], doubled[-1])
A generator expression as the sole argument needs no extra parentheses
sum(x * x for x in range(10)) # no extra parens needed -- single-argument call
When a generator expression is the sole argument to a function call, the
call's own parentheses double as the generator expression's parentheses —
you don't need sum((x * x for x in range(10))).
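The shortcut applies only to a sole argument. With a second argument, the generator expression needs its own parentheses; the sketch below uses compile() to show the unparenthesized form being rejected without crashing the script.

```python
# With a second argument, the generator expression must be parenthesized.
total = sum((x * x for x in range(4)), 100)  # 0 + 1 + 4 + 9, starting from 100

try:
    compile("sum(x * x for x in range(4), 100)", "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True  # the unparenthesized form does not compile
```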
Interview-ready summary: List comprehensions build the full result eagerly, in memory; generator expressions produce values lazily, in O(1) memory, at the cost of single-pass, no-random-access consumption. Default to a generator expression whenever the result is consumed once, in sequence, by something else — reach for the list only when you need to keep, index, or re-iterate the full result.
The everyday workhorses
from itertools import chain, islice, groupby
# chain -- concatenate multiple iterables lazily, no copying
list(chain([1, 2], [3, 4], [5])) # [1, 2, 3, 4, 5]
# islice -- lazily slice an iterator (regular slicing doesn't work on iterators!)
list(islice(range(100), 5, 10)) # [5, 6, 7, 8, 9]
with open("huge.log") as f:  # context manager closes the file afterwards
    first_3 = list(islice(f, 3))  # first 3 lines, without reading the whole file
# groupby -- group CONSECUTIVE elements sharing a key (input must be pre-sorted/grouped!)
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3), ('b', 4)]
# a [('a', 5)] <- note: a separate group, since input wasn't fully sorted by key
The groupby gotcha is one of the most common itertools mistakes:
it only groups consecutive matching elements, so input almost always
needs sorted(data, key=...) applied first if you want all occurrences
of each key grouped together.
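A sketch of the sort-then-group fix, reusing the same sample data. It uses operator.itemgetter for the key, and relies on sorted() being stable so values keep their original order within each key.

```python
from itertools import groupby
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
by_key = itemgetter(0)

grouped = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(data, key=by_key), key=by_key)
}
```

Sorting and grouping must use the same key function, otherwise equal keys may still end up non-adjacent.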
Infinite iterators (paired with islice/takewhile to bound them)
from itertools import count, cycle, repeat, islice
list(islice(count(10, 2), 5)) # [10, 12, 14, 16, 18] -- count from 10, step 2
list(islice(cycle("AB"), 5)) # ['A', 'B', 'A', 'B', 'A'] -- repeats forever
list(repeat("x", 3)) # ['x', 'x', 'x']
count/cycle never terminate on their own — always pair them with
islice, zip against a finite iterable, or a break condition.
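Two other ways to bound an infinite iterator, sketched with illustrative values: takewhile stops at the first value that fails a condition, and zip stops when its shortest input ends.

```python
from itertools import count, cycle, takewhile

# takewhile stops as soon as the predicate is false.
below_20 = list(takewhile(lambda n: n < 20, count(10, 3)))

# zip stops when the finite iterable runs out.
alternating = list(zip("abc", cycle([0, 1])))
```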
Combinatorics
from itertools import product, permutations, combinations
list(product([1, 2], ["a", "b"])) # [(1,'a'), (1,'b'), (2,'a'), (2,'b')] -- cartesian product
list(permutations([1, 2, 3], 2)) # [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]
list(combinations([1, 2, 3], 2)) # [(1,2),(1,3),(2,3)] -- order doesn't matter
These replace hand-written nested loops for generating all pairings,
orderings, or subsets. They are more concise, and usually faster in
CPython because itertools is implemented in C.
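For comparison, here is a sketch of the hand-written loops that product and combinations stand in for, with checks that both forms agree.

```python
from itertools import combinations, product

# product: equivalent to two nested loops.
nested_pairs = [(x, y) for x in [1, 2] for y in "ab"]
product_pairs = list(product([1, 2], "ab"))

# combinations: equivalent to index-based loops with j > i.
items = [1, 2, 3]
manual_combos = [
    (items[i], items[j])
    for i in range(len(items))
    for j in range(i + 1, len(items))
]
library_combos = list(combinations(items, 2))
```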
tee: splitting one iterator into several
from itertools import tee
a, b = tee(some_generator, 2)
list(a) # consumes the shared underlying iterator
list(b) # still works -- tee buffers what a already consumed
Useful when two different consumers each need to walk the same iterator
independently — but note tee buffers data internally, so it trades
memory for that independence and shouldn't replace list() for small,
reusable sequences.
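A typical use is walking the same stream at two offsets, for example to compute differences between neighbours. In this sketch, readings is a hypothetical generator; the buffer stays small because the two iterators advance in lockstep.

```python
from itertools import tee


def readings():
    """Hypothetical single-use data source."""
    yield from [3, 5, 4, 8]


current, following = tee(readings(), 2)
next(following, None)  # advance the second iterator by one item

deltas = [b - a for a, b in zip(current, following)]
```

After calling tee, only the returned iterators should be used; advancing the original directly would make them skip items.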
Interview-ready summary: itertools provides composable, lazy
building blocks — chain/islice for combining/slicing, groupby for
grouping consecutive keys (remember to sort first), product/
permutations/combinations for combinatorics, and count/cycle for
infinite sequences bounded by islice. They let you build multi-step
data pipelines without materializing intermediate lists at each stage.
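A small end-to-end sketch of such a pipeline. The parse function and the two batches are hypothetical; the final check shows that only as many items were pulled through as islice requested.

```python
from itertools import chain, islice


def parse(lines):
    """Hypothetical stage: skip blank lines, convert the rest to int."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


batch_a = ["1\n", "\n", "2\n"]
batch_b = ["3\n", "4\n", "5\n"]

numbers = parse(chain(batch_a, batch_b))  # stage 1: combine and parse
squares = (n * n for n in numbers)        # stage 2: transform
first_three = list(islice(squares, 3))    # stage 3: take what is needed

next_unconsumed = next(numbers)  # the pipeline stopped right after 3
```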