Iterators, Generators & Context Managers


What for actually does under the hood

for x in [1, 2, 3]:
    print(x)

# is roughly equivalent to:
it = iter([1, 2, 3])     # calls [1,2,3].__iter__()
while True:
    try:
        x = next(it)      # calls it.__next__()
    except StopIteration:
        break
    print(x)

iter(obj) calls obj.__iter__(). The returned object must implement __next__(), returning the next element each time until it's exhausted, at which point it raises StopIteration — the for loop (and every other iteration construct: comprehensions, * unpacking, sum(), etc.) catches that exception internally to know when to stop.

Iterable vs iterator: the crucial distinction

lst = [1, 2, 3]
it1 = iter(lst)
it2 = iter(lst)

next(it1)   # 1
next(it1)   # 2
next(it2)   # 1  -- it2 is independent, its own position

it1 is it2  # False -- iter(lst) returns a NEW iterator object each time

lst is iterable (has __iter__) but is not itself an iterator — you can call iter(lst) as many times as you want, each returning an independent, fresh iterator starting from the beginning. This is why a list can be looped over multiple times, or in nested loops simultaneously, without interference.
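As a minimal sketch, the standard collections.abc classes can be used to tell the two apart: an iterator returns itself from iter(), while a list hands out a new iterator each time. The variable names here are illustrative only.

```python
from collections.abc import Iterable, Iterator

lst = [1, 2, 3]
it = iter(lst)

# A list is iterable but not an iterator; the object iter() returns is both.
list_is_iterable = isinstance(lst, Iterable)    # True
list_is_iterator = isinstance(lst, Iterator)    # False
it_is_iterator = isinstance(it, Iterator)       # True

# An iterator returns itself from iter(); a list returns a new object.
same_iterator = iter(it) is it                  # True

# Nested loops over one list work because each loop gets its own iterator.
pairs = [(a, b) for a in lst for b in lst]      # 9 pairs
```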

Writing your own iterator

class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self          # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)   # 3, 2, 1

Note that Countdown is its own iterator (__iter__ returns self). This is a common pattern with an important limitation: once exhausted, a Countdown instance can't be iterated again. A list, by contrast, is iterable but not itself an iterator, so it always supports a fresh pass.

Why this distinction matters in practice

data = Countdown(3)
list(data)   # [3, 2, 1]
list(data)   # [] -- already exhausted! same object, no way to restart

If you need to iterate something multiple times, it needs to be a true iterable that returns a new iterator each time __iter__ is called (like list) — not an object that is its own (single-use) iterator.
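One way to get a re-iterable countdown, sketched under the assumption that a separate class is acceptable: ReusableCountdown (a hypothetical name, not from the original) stores only the start value and writes __iter__ with yield, so every call to iter() builds a new, independent iterator.

```python
class ReusableCountdown:
    """Hypothetical re-iterable variant of Countdown."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Because this method contains yield, each call returns a new
        # generator object with its own private `current`.
        current = self.start
        while current > 0:
            yield current
            current -= 1

data = ReusableCountdown(3)
first_pass = list(data)    # [3, 2, 1]
second_pass = list(data)   # [3, 2, 1] again, nothing was used up
```

The object itself holds no iteration position, which is what makes repeated and nested passes safe.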

Interview-ready summary: Iteration is a two-step protocol: iter() gets an iterator via __iter__, and next() advances it via __next__ until StopIteration signals the end. Iterables can be iterated repeatedly (each iter() call returns a fresh iterator); an object that is its own iterator can only be consumed once.

yield pauses; return exits

def count_up_to(n):
    i = 1
    while i <= n:
        yield i          # pause here, produce i, resume on next `next()`
        i += 1

gen = count_up_to(3)
gen                # <generator object count_up_to at 0x...> -- body hasn't run yet!
next(gen)           # 1  -- runs until the first yield
next(gen)           # 2  -- resumes right after the yield, runs to the next one
next(gen)           # 3
next(gen)           # StopIteration -- loop condition false, function returns naturally

Calling count_up_to(3) does not execute any code in the function body — it immediately returns a generator object. Execution only happens when you call next() (or iterate with a for loop), and each call resumes exactly where the previous yield left off, with all local variables (i, in this case) preserved between calls.
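The deferred start can be made visible with a side effect. In this sketch the module-level log list is an illustrative addition; it records when the function body actually begins to run.

```python
log = []

def count_up_to(n):
    log.append("body started")   # runs on the first next(), not on the call
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(2)
log_after_call = list(log)       # [] : calling the function ran nothing
first = next(gen)                # 1
log_after_next = list(log)       # ['body started']
```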

return inside a generator ends iteration (doesn't return a value normally)

def gen():
    yield 1
    yield 2
    return "done"   # ends the generator; the return value becomes StopIteration's argument
    yield 3          # never reached

g = gen()
next(g)   # 1
next(g)   # 2
next(g)   # StopIteration: done  -- the return value is attached to the exception

A bare return (or falling off the end of the function) also raises StopIteration; that is simply how a generator signals it is exhausted. A return with a value stores that value on the exception as StopIteration.value, which most code never inspects directly (it's mainly used internally by yield from to get a sub-generator's final return value).
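If you do want the return value without yield from, you can catch the exception yourself. This is a small sketch; the names final and items are illustrative.

```python
def gen():
    yield 1
    return "done"

g = gen()
next(g)                    # 1
try:
    next(g)
except StopIteration as exc:
    final = exc.value      # 'done'

# list() and for loops handle StopIteration themselves and drop the value.
items = list(gen())        # [1]
```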

Why generators matter: lazy evaluation

def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()

for line in read_large_file("huge_log.txt"):   # processes one line at a time
    process(line)                                # never loads the whole file into memory

Because a generator computes each value on demand rather than all at once, it can represent an infinite or very large sequence using constant memory — the entire point of the generator/iterator model versus building a full list upfront.
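The infinite case can be sketched with a Fibonacci generator (an illustrative example, not from the original). It never terminates on its own, so the consumer decides how much to take, here with itertools.islice.

```python
from itertools import islice

def fibonacci():
    # Only two integers of state are kept, however far the caller iterates.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```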

Generator state is a real suspended stack frame

Each generator object keeps its own frame — local variables, instruction pointer, and the position in any try/finally blocks — completely separate from any other call to the same generator function, which is why multiple independent generators from the same function don't interfere with each other.
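A short sketch of both points, using illustrative names: two generators from the same function keep separate counters, and closing a suspended generator resumes its frame just far enough to run a pending finally block.

```python
def counter(label):
    n = 0
    while True:
        n += 1
        yield f"{label}{n}"

a = counter("a")
b = counter("b")
seen = [next(a), next(a), next(b), next(a)]
# ['a1', 'a2', 'b1', 'a3'] : each generator has its own n

events = []

def guarded():
    try:
        yield 1
        yield 2
    finally:
        events.append("cleanup")

g = guarded()
next(g)      # suspended inside the try block
g.close()    # raises GeneratorExit at the paused yield, so finally runs
```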

Interview-ready summary: yield suspends a function's execution and produces one value per call, resuming exactly where it left off with all local state intact; return ends the function and (for a generator) raises StopIteration, ending iteration. This lazy, resumable execution model is what makes generators memory-efficient for large or infinite sequences.

yield from: the basic simplification

def inner():
    yield 1
    yield 2

def outer_manual():
    for value in inner():   # manually re-yield each value
        yield value

def outer_yield_from():
    yield from inner()       # equivalent, more concise

list(outer_manual())   # [1, 2]
list(outer_yield_from()) # [1, 2]

For simple pass-through delegation, yield from inner() is just a more concise form of looping and re-yielding, but it is more than syntactic sugar for a loop: it also forwards .send(), .throw(), and .close() to the sub-generator and captures its return value.

Flattening nested structures

def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # recursively delegate to sub-generator
        else:
            yield item

list(flatten([1, [2, 3, [4, 5]], 6]))   # [1, 2, 3, 4, 5, 6]

yield from makes recursive generator flattening natural — each recursive call's yielded values bubble straight up through every level of delegation without needing an explicit loop at each level.
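The version above only descends into lists. A possible generalization, sketched here as flatten_any (a hypothetical name), recurses into any iterable but treats strings and bytes as leaves, since iterating a string yields more strings and the recursion would never bottom out.

```python
from collections.abc import Iterable

def flatten_any(nested):
    for item in nested:
        # str and bytes are iterable, but we want them kept whole.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_any(item)
        else:
            yield item

result = list(flatten_any([1, (2, 3, [4, "ab"]), {5}, 6]))
# [1, 2, 3, 4, 'ab', 5, 6]
```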

Forwarding .send(), .throw(), and .close() (why it's more than a loop)

def inner():
    received = yield "ready"
    print(f"inner got: {received}")
    yield "done"

def outer():
    result = yield from inner()   # forwards .send() values into inner() too
    print(f"inner returned: {result}")

g = outer()
next(g)                # 'ready'
g.send("hello")          # prints "inner got: hello", yields 'done'

A plain for value in inner(): yield value loop would only forward yielded values. A value passed in via .send() would become the result of outer's own yield expression and be discarded, while the for loop advances inner() with a plain next(), so inner() would receive None. yield from forwards .send(), .throw(), and .close() to the sub-generator and passes its yielded values back out, which is essential for coroutine-style generators (largely superseded by async/await today, but still the mechanism asyncio was originally built on).
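The difference can be observed directly. In this sketch inner() is adapted to record what it receives in a list passed as an argument (an illustrative change), and the same send() call is made through a manual loop and through yield from.

```python
def inner(log):
    received = yield "ready"
    log.append(received)
    yield "done"

def outer_manual(log):
    for value in inner(log):
        yield value            # a sent value lands here and is discarded

def outer_yield_from(log):
    yield from inner(log)      # a sent value is passed through to inner()

manual_log, delegated_log = [], []

g1 = outer_manual(manual_log)
next(g1)                       # 'ready'
g1.send("hello")               # the loop resumes inner() with next(): it sees None

g2 = outer_yield_from(delegated_log)
next(g2)                       # 'ready'
g2.send("hello")               # inner() receives "hello"
```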

Capturing the sub-generator's return value

def inner():
    yield 1
    yield 2
    return "sub-generator done"

def outer():
    result = yield from inner()
    print(result)   # 'sub-generator done'

list(outer())   # prints "sub-generator done", returns [1, 2]

yield from is itself an expression that evaluates to whatever the delegated generator returned. A plain for loop discards that value, because it handles StopIteration internally.
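One place this is useful is a sub-generator that streams items and reports an aggregate when it finishes. The sketch below uses hypothetical names (summing, report, totals): each batch's items pass straight through while its sum comes back as the value of the yield from expression.

```python
def summing(items):
    total = 0
    for x in items:
        yield x
        total += x
    return total               # becomes the value of `yield from summing(...)`

totals = []

def report(batches):
    for batch in batches:
        subtotal = yield from summing(batch)
        totals.append(subtotal)

values = list(report([[1, 2], [10, 20, 30]]))
# values == [1, 2, 10, 20, 30]; totals == [3, 60]
```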

Interview-ready summary: yield from delegates iteration (and, less commonly used today, send/throw/close forwarding) to a sub-generator or iterable, doubling as a clean way to flatten nested generators and to capture a sub-generator's final return value — capabilities a plain re-yielding for loop doesn't fully provide.

Syntax difference: brackets vs parens

squares_list = [x * x for x in range(10**8)]    # builds a 10^8-element list NOW
squares_gen = (x * x for x in range(10**8))      # builds nothing yet -- lazy

import sys
sys.getsizeof(squares_list)   # ~800,000,000+ bytes -- the list's pointer array alone; the int objects are extra
sys.getsizeof(squares_gen)     # ~200 bytes (varies by Python version) -- constant, regardless of range size

The list comprehension allocates memory for every element immediately; the generator expression is a small object that knows how to produce values, computing each one only when asked.
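The same comparison can be run at a scale that won't exhaust memory. This sketch assumes CPython, where sys.getsizeof is supported for these objects; squares is an illustrative helper, and exact byte counts are deliberately not shown because they differ between versions.

```python
import sys

def squares(n):
    # Returns a generator expression; nothing is computed here.
    return (x * x for x in range(n))

small_gen = sys.getsizeof(squares(10))
large_gen = sys.getsizeof(squares(10**6))    # still instant: no values produced

small_list = sys.getsizeof([x * x for x in range(10)])
large_list = sys.getsizeof([x * x for x in range(10**4)])

gen_size_is_constant = small_gen == large_gen     # True
list_size_grows = large_list > small_list         # True
```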

When you can (and can't) use a generator expression

total = sum(x * x for x in range(10**8))    # fine -- sum() consumes lazily, one at a time
gen = (x * x for x in range(10**8))
gen[5]        # TypeError -- generators don't support indexing
len(gen)       # TypeError -- no len() either, since size isn't known upfront
list(gen)       # works, but now you've paid the full memory cost anyway
for x in gen: ...  # body never runs -- list(gen) above already exhausted the generator

Generators trade away random access, len(), and re-iterability for constant memory — appropriate when you consume the sequence exactly once, in order, and don't need to know its length in advance.

The practical rule of thumb

  • Feeding directly into another function that consumes one item at a time (sum(), max(), "".join(), a for loop): use a generator expression — no reason to materialize the whole list first.
  • Need to iterate multiple times, index into it, call len(), or keep the whole thing around: use a list comprehension.

# Generator expression -- no unnecessary intermediate list
total = sum(price * qty for price, qty in cart)

# List comprehension -- the result is walked more than once and measured
lengths = [len(word) for word in words]
longest, average = max(lengths), sum(lengths) / len(lengths)

A generator expression as the sole argument needs no extra parentheses

sum(x * x for x in range(10))   # no extra parens needed -- single-argument call

When a generator expression is the sole argument to a function call, the call's own parentheses double as the generator expression's parentheses — you don't need sum((x * x for x in range(10))).
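The shortcut applies only when the generator expression is the sole argument. As a sketch of the other case, passing a start value to sum() requires the explicit parentheses, and the unparenthesized form is rejected at compile time (checked here with compile() so the example itself stays valid).

```python
# With a second argument, the generator expression needs its own parentheses.
total = sum((x * x for x in range(4)), 100)    # 0 + 1 + 4 + 9 + 100 = 114

try:
    compile("sum(x * x for x in range(4), 100)", "<example>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False                 # this branch is taken
```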

Interview-ready summary: List comprehensions build the full result eagerly, in memory; generator expressions produce values lazily, in O(1) memory, at the cost of single-pass, no-random-access consumption. Default to a generator expression whenever the result is consumed once, in sequence, by something else — reach for the list only when you need to keep, index, or re-iterate the full result.

itertools: the everyday workhorses

from itertools import chain, islice, groupby, product, count

# chain -- concatenate multiple iterables lazily, no copying
list(chain([1, 2], [3, 4], [5]))   # [1, 2, 3, 4, 5]

# islice -- lazily slice an iterator (regular slicing doesn't work on iterators!)
list(islice(range(100), 5, 10))     # [5, 6, 7, 8, 9]
with open("huge.log") as f:
    first_3 = list(islice(f, 3))    # first 3 lines, without reading the whole file

# groupby -- group CONSECUTIVE elements sharing a key (input must be pre-sorted/grouped!)
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3), ('b', 4)]
# a [('a', 5)]           <- note: a separate group, since input wasn't fully sorted by key

The groupby gotcha is one of the most common itertools mistakes: it only groups consecutive matching elements, so input almost always needs sorted(data, key=...) applied first if you want all occurrences of each key grouped together.
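A sketch of the sort-then-group fix on the same data. The important detail is that sorted() and groupby() use the same key function; by_key and grouped are illustrative names.

```python
from itertools import groupby
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
by_key = itemgetter(0)

# sorted() is stable, so values keep their original order within each key.
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(data, key=by_key), key=by_key)
}
# {'a': [1, 2, 5], 'b': [3, 4]}
```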

Infinite iterators (paired with islice/takewhile to bound them)

from itertools import count, cycle, repeat, islice

list(islice(count(10, 2), 5))    # [10, 12, 14, 16, 18] -- count from 10, step 2
list(islice(cycle("AB"), 5))      # ['A', 'B', 'A', 'B', 'A'] -- repeats forever
list(repeat("x", 3))                # ['x', 'x', 'x']

count/cycle never terminate on their own — always pair them with islice, zip against a finite iterable, or a break condition.
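Two of those bounding options, sketched briefly with illustrative values: takewhile stops at the first item that fails the predicate, and zip stops when its shortest input runs out.

```python
from itertools import count, cycle, takewhile

# takewhile ends the infinite count() as soon as the predicate is False.
below_50 = list(takewhile(lambda n: n < 50, count(10, 10)))
# [10, 20, 30, 40]

# zip ends when the finite list is exhausted, bounding cycle().
labelled = list(zip(["x", "y", "z"], cycle([0, 1])))
# [('x', 0), ('y', 1), ('z', 0)]
```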

Combinatorics

from itertools import product, permutations, combinations

list(product([1, 2], ["a", "b"]))     # [(1,'a'), (1,'b'), (2,'a'), (2,'b')] -- cartesian product
list(permutations([1, 2, 3], 2))        # [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]
list(combinations([1, 2, 3], 2))         # [(1,2),(1,3),(2,3)] -- order doesn't matter

These replace hand-written nested loops for generating all pairings, orderings, or subsets. They are more concise, and because itertools is implemented in C in CPython, they are usually at least as fast as the equivalent nested for loops.
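To make the correspondence concrete, this sketch builds the same results with explicit loops and compares them; the sample data is illustrative.

```python
from itertools import combinations, product

sizes = ["S", "M"]
colors = ["red", "blue"]

# product(sizes, colors) matches a two-level nested loop.
nested = []
for size in sizes:
    for color in colors:
        nested.append((size, color))
same_as_product = nested == list(product(sizes, colors))              # True

# combinations(items, 2) matches the usual "i < j" double loop.
items = [1, 2, 3]
manual_pairs = [(items[i], items[j])
                for i in range(len(items))
                for j in range(i + 1, len(items))]
same_as_combinations = manual_pairs == list(combinations(items, 2))   # True
```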

tee: splitting one iterator into several

from itertools import tee

some_generator = (x * x for x in range(5))   # any one-shot iterator

a, b = tee(some_generator, 2)
list(a)   # consumes the shared underlying iterator
list(b)   # still works -- tee buffers what a already consumed

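Useful when two different consumers each need to walk the same iterator independently. Note that tee buffers every item one consumer has seen and the other has not, so it trades memory for that independence; when one consumer runs all the way ahead (as list(a) does above), the whole sequence is buffered and a plain list() is simpler. After calling tee, don't advance the original iterator directly, or the copies will miss those items.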
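tee works best when the consumers advance roughly in lockstep, so the buffer stays small. A classic sketch is a pairwise helper built from tee and zip; Python 3.10 and later ship an equivalent as itertools.pairwise, so this version is for illustration.

```python
from itertools import tee

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)          # advance the second copy by one item
    return zip(a, b)       # a and b stay one item apart: tiny buffer

pairs = list(pairwise(x * x for x in range(5)))
# [(0, 1), (1, 4), (4, 9), (9, 16)]
```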

Interview-ready summary: itertools provides composable, lazy building blocks. Use chain/islice for combining and slicing, groupby for grouping consecutive keys (remember to sort first), product/permutations/combinations for combinatorics, and count/cycle for infinite sequences bounded by islice. They let you build multi-step data pipelines without materializing intermediate lists at each stage.
