What is the iterator protocol, and how does a `for` loop use it?

An **iterable** implements `__iter__`, which returns an **iterator** — an object implementing `__next__` (returning the next value, or raising `StopIteration` when exhausted) and `__iter__` (returning itself). A `for x in obj:` loop calls `iter(obj)` once to get an iterator, then repeatedly calls `next()` on it until `StopIteration` is raised, which the loop catches silently to end iteration.

What are generators, and how does `yield` differ from `return`?

A function containing `yield` becomes a **generator function** — calling it doesn't run the body immediately, it returns a generator object. Each call to `next()` resumes execution from where it last left off, runs until the next `yield` (producing one value and pausing, preserving all local state), and continues until the function ends (raising `StopIteration` automatically) or hits `return` (which ends iteration immediately). Unlike `return`, `yield` doesn't exit the function — it suspends it.

What is `yield from`, and how is it used for generator delegation?

`yield from subgen` delegates iteration to another generator/iterable — it yields every value `subgen` produces (as if written as a loop yielding each one), and also forwards `.send()`/`.throw()`/`.close()` calls through to `subgen`, and evaluates to `subgen`'s final `return` value. It's mainly used to flatten nested generators/compose generator pipelines without writing an explicit inner loop.

What's the difference between a generator expression and a list comprehension memory-wise?

A list comprehension (`[x for x in y]`) builds the **entire list in memory immediately**. A generator expression (`(x for x in y)`) builds nothing upfront — it's a lazy iterator that computes each value on demand, using O(1) memory regardless of how many elements it will eventually produce. Use a generator expression when you'll only iterate once and don't need random access or the full sequence in memory at once.

What are the most useful `itertools` functions, and when would you use them?

`itertools` provides fast, memory-efficient building blocks for iterator composition: `chain` (concatenate iterables), `groupby` (group consecutive equal keys), `islice` (slice an iterator lazily), `product`/`permutations`/`combinations` (combinatorics), `count`/`cycle`/`repeat` (infinite iterators), and `tee` (split one iterator into several independent ones). They compose well with generators to build data pipelines without materializing intermediate lists.

How does a context manager work, and how does the `with` statement use `__enter__`/`__exit__`?

A context manager implements `__enter__` (run when entering the `with` block, its return value bound to the `as` variable) and `__exit__` (run when leaving the block, **even if an exception occurred**, receiving the exception info and able to suppress it by returning `True`). This guarantees cleanup (closing a file, releasing a lock, committing/rolling back a transaction) happens regardless of how the block exits.

How do you write a context manager using `contextlib.contextmanager`?

`@contextlib.contextmanager` turns a **generator function** into a context manager: code before the single `yield` runs as `__enter__` (the yielded value becomes the `as` target), and code after `yield` runs as `__exit__` — wrapping the `yield` in `try`/`finally` handles cleanup on exceptions too. It's a much more concise alternative to writing a full class with `__enter__`/`__exit__` for simple setup/teardown logic.

How do generators help with memory efficiency when processing large datasets?

A generator processes one item at a time and never materializes the full dataset in memory, so you can stream through a file, database cursor, or network response of arbitrary size using **constant memory** instead of O(n). Chaining multiple generator-based transformation steps together builds a lazy pipeline where each item flows through all stages before the next item is even read, rather than each stage completing fully before the next starts.

Iterators, Generators & Context Managers

The iterator protocol, generator functions and expressions, itertools, and the with statement.

What `for` actually does under the hood

for x in [1, 2, 3]:
    print(x)

# is roughly equivalent to:
it = iter([1, 2, 3])     # calls [1,2,3].__iter__()
while True:
    try:
        x = next(it)      # calls it.__next__()
    except StopIteration:
        break
    print(x)

iter(obj) calls obj.__iter__(). The returned object must implement __next__(), returning the next element each time until it's exhausted, at which point it raises StopIteration — the for loop (and every other iteration construct: comprehensions, * unpacking, sum(), etc.) catches that exception internally to know when to stop.

Iterable vs iterator: the crucial distinction

lst = [1, 2, 3]
it1 = iter(lst)
it2 = iter(lst)

next(it1)   # 1
next(it1)   # 2
next(it2)   # 1  -- it2 is independent, its own position

it1 is it2  # False -- iter(lst) returns a NEW iterator object each time

lst is iterable (has __iter__) but is not itself an iterator — you can call iter(lst) as many times as you want, each returning an independent, fresh iterator starting from the beginning. This is why a list can be looped over multiple times, or in nested loops simultaneously, without interference.

Writing your own iterator

class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self          # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)   # 3, 2, 1

Note that Countdown is its own iterator (__iter__ returns self). This is a common pattern, but it carries an important limitation: once exhausted, a Countdown instance can't be iterated again. A list, by contrast, is iterable but not itself an iterator, so every loop over it gets a fresh pass.

Why this distinction matters in practice

data = Countdown(3)
list(data)   # [3, 2, 1]
list(data)   # [] -- already exhausted! same object, no way to restart

If you need to iterate something multiple times, it needs to be a true iterable that returns a new iterator each time __iter__ is called (like list) — not an object that is its own (single-use) iterator.
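A minimal sketch of a re-iterable version of Countdown, assuming you control the class (the name `ReusableCountdown` is hypothetical): the object stores only its starting value, and `__iter__` is written as a generator function, so each iter() call returns a brand-new generator with its own position.

```python
class ReusableCountdown:
    """Iterable (not an iterator): every iter() call starts a fresh countdown."""

    def __init__(self, start):
        self.start = start  # only the configuration is stored, not the position

    def __iter__(self):
        # A generator function: each call returns a new generator with its
        # own local `current`, so separate passes never interfere.
        current = self.start
        while current > 0:
            yield current
            current -= 1


data = ReusableCountdown(3)
first_pass = list(data)    # [3, 2, 1]
second_pass = list(data)   # [3, 2, 1] again, unlike Countdown
pairs = [(a, b) for a in data for b in data]  # nested loops over one object work too
```

Because the class has no `__next__`, it is a pure iterable, exactly like list.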

Interview-ready summary: Iteration is a two-step protocol: iter() gets an iterator via __iter__, and next() advances it via __next__ until StopIteration signals the end. Iterables can be iterated repeatedly (each iter() call returns a fresh iterator); an object that is its own iterator can only be consumed once.

Related Resources

Iterator Types — Python docs

`yield` pauses; `return` exits

def count_up_to(n):
    i = 1
    while i <= n:
        yield i          # pause here, produce i, resume on next `next()`
        i += 1

gen = count_up_to(3)
gen                # <generator object count_up_to at 0x...> -- body hasn't run yet!
next(gen)           # 1  -- runs until the first yield
next(gen)           # 2  -- resumes right after the yield, runs to the next one
next(gen)           # 3
next(gen)           # StopIteration -- loop condition false, function returns naturally

Calling count_up_to(3) does not execute any code in the function body — it immediately returns a generator object. Execution only happens when you call next() (or iterate with a for loop), and each call resumes exactly where the previous yield left off, with all local variables (i, in this case) preserved between calls.

`return` inside a generator ends iteration (doesn't return a value normally)

def gen():
    yield 1
    yield 2
    return "done"   # ends the generator; the return value becomes StopIteration's argument
    yield 3          # never reached

g = gen()
next(g)   # 1
next(g)   # 2
next(g)   # StopIteration: done  -- the return value is attached to the exception

A bare return (or falling off the end of the function) also raises StopIteration; it's simply the normal way a generator signals that it's exhausted. With return value, the value is stored in StopIteration.value, which most code never inspects directly (it's mainly used internally by yield from to obtain a sub-generator's final return value).
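To see where the return value ends up, you can drive the generator by hand and catch the StopIteration yourself, which is roughly what yield from does internally. A minimal sketch:

```python
def gen():
    yield 1
    yield 2
    return "done"   # stored on the StopIteration raised after the last yield


g = gen()
values = []
while True:
    try:
        values.append(next(g))
    except StopIteration as exc:
        result = exc.value   # the generator's return value: "done"
        break

# A for loop (or comprehension) swallows the StopIteration, so the value is lost:
looped = [v for v in gen()]
```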

Why generators matter: lazy evaluation

def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()

for line in read_large_file("huge_log.txt"):   # processes one line at a time
    process(line)                                # never loads the whole file into memory

Because a generator computes each value on demand rather than all at once, it can represent an infinite or very large sequence using constant memory — the entire point of the generator/iterator model versus building a full list upfront.
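The same laziness is what makes an infinite generator usable: as long as the consumer stops asking, the body never has to finish. A small sketch (the function name `naturals` is illustrative), bounded with itertools.islice or by taking just the first match with next():

```python
from itertools import islice


def naturals():
    """Yield 1, 2, 3, ... forever; safe only because values are produced on demand."""
    n = 1
    while True:
        yield n
        n += 1


first_five = list(islice(naturals(), 5))                            # [1, 2, 3, 4, 5]
first_even_square = next(n * n for n in naturals() if n % 2 == 0)   # 4
```

Calling list(naturals()) without a bound would never return, so the bounding step is essential.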

Generator state is a real suspended stack frame

Each generator object keeps its own frame — local variables, instruction pointer, and the position in any try/finally blocks — completely separate from any other call to the same generator function, which is why multiple independent generators from the same function don't interfere with each other.
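The standard inspect module can show this suspended state directly. A sketch using inspect.getgeneratorstate and inspect.getgeneratorlocals on two generators from the same function:

```python
import inspect


def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1


a = count_up_to(3)
b = count_up_to(3)

state_before = inspect.getgeneratorstate(a)        # 'GEN_CREATED': body not started
first_a = next(a)                                  # 1
second_a = next(a)                                 # 2
first_b = next(b)                                  # 1: b has its own frame and its own i
state_mid = inspect.getgeneratorstate(a)           # 'GEN_SUSPENDED': paused at the yield
locals_a = dict(inspect.getgeneratorlocals(a))     # {'n': 3, 'i': 2}
rest_a = list(a)                                   # [3]
state_after = inspect.getgeneratorstate(a)         # 'GEN_CLOSED': exhausted
```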

Interview-ready summary: yield suspends a function's execution and produces one value per call, resuming exactly where it left off with all local state intact; return ends the function and (for a generator) raises StopIteration, ending iteration. This lazy, resumable execution model is what makes generators memory-efficient for large or infinite sequences.

Related Resources

Generators — Python docs

The basic simplification

def inner():
    yield 1
    yield 2

def outer_manual():
    for value in inner():   # manually re-yield each value
        yield value

def outer_yield_from():
    yield from inner()       # equivalent, more concise

list(outer_manual())   # [1, 2]
list(outer_yield_from()) # [1, 2]

For simple pass-through delegation, yield from inner() behaves exactly like looping and re-yielding, but it is more than syntactic sugar for that loop: it also forwards send/throw/close and captures the sub-generator's return value.

Flattening nested structures

def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # recursively delegate to sub-generator
        else:
            yield item

list(flatten([1, [2, 3, [4, 5]], 6]))   # [1, 2, 3, 4, 5, 6]

yield from makes recursive generator flattening natural — each recursive call's yielded values bubble straight up through every level of delegation without needing an explicit loop at each level.

Forwarding `.send()`, `.throw()`, and `.close()` (why it's more than a loop)

def inner():
    received = yield "ready"
    print(f"inner got: {received}")
    yield "done"

def outer():
    result = yield from inner()   # forwards .send() values into inner() too
    print(f"inner returned: {result}")

g = outer()
next(g)                # 'ready'
g.send("hello")          # prints "inner got: hello", yields 'done'

A plain for value in inner(): yield value loop forwards only the yielded values, not values sent in via .send(). A sent value becomes the result of outer's own yield expression (and is discarded), while inner() is resumed by the loop's implicit next() and receives None. yield from transparently forwards send/throw/close to the sub-generator, which is essential for coroutine-style generators (largely superseded by async/await today, but the mechanism asyncio's original coroutines were built on).
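A side-by-side sketch makes the difference observable: the same inner() is wrapped once by a re-yielding loop and once by yield from, and each version records what inner() actually received. The log-list parameter is an addition for illustration.

```python
def inner(log):
    received = yield "ready"
    log.append(received)       # records what actually arrived in inner()
    yield "done"


def outer_loop(log):
    for value in inner(log):   # plain re-yielding loop
        yield value            # a sent value lands HERE and is discarded


def outer_delegate(log):
    yield from inner(log)      # .send() is forwarded straight into inner()


loop_log, delegate_log = [], []

g1 = outer_loop(loop_log)
next(g1)
loop_reply = g1.send("hello")       # inner() is resumed by the loop's next(): gets None

g2 = outer_delegate(delegate_log)
next(g2)
delegate_reply = g2.send("hello")   # inner() receives "hello"
```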

Capturing the sub-generator's return value

def inner():
    yield 1
    yield 2
    return "sub-generator done"

def outer():
    result = yield from inner()
    print(result)   # 'sub-generator done'

list(outer())   # prints "sub-generator done" and returns [1, 2]

yield from is itself an expression that evaluates to whatever the delegated generator returned. A plain for loop discards that value, because the StopIteration carrying it is swallowed by the loop.

Interview-ready summary: yield from delegates iteration (and, less commonly used today, send/throw/close forwarding) to a sub-generator or iterable, doubling as a clean way to flatten nested generators and to capture a sub-generator's final return value — capabilities a plain re-yielding for loop doesn't fully provide.

Related Resources

yield expressions — Python docs

Syntax difference: brackets vs parens

import sys

squares_list = [x * x for x in range(10**8)]    # builds a 10^8-element list NOW
squares_gen = (x * x for x in range(10**8))      # builds nothing yet -- lazy

sys.getsizeof(squares_list)   # ~800,000,000+ bytes (an 8-byte pointer per element on 64-bit CPython), not counting the int objects themselves
sys.getsizeof(squares_gen)     # ~200 bytes -- constant, regardless of range size

The list comprehension allocates memory for every element immediately; the generator expression is a small object that knows how to produce values, computing each one only when asked.
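You can check the "constant size" behavior on small inputs without allocating gigabytes. In this sketch, exact byte counts vary by Python version and platform, so it only compares sizes rather than hard-coding them; the helper names are illustrative.

```python
import sys


def squares_list(n):
    return [x * x for x in range(n)]


def squares_gen(n):
    return (x * x for x in range(n))


small_list, big_list = squares_list(10), squares_list(100_000)
small_gen, big_gen = squares_gen(10), squares_gen(100_000)

# sys.getsizeof measures only the container object itself
list_grows = sys.getsizeof(big_list) > sys.getsizeof(small_list)    # True
gen_constant = sys.getsizeof(big_gen) == sys.getsizeof(small_gen)    # True
```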

When you can (and can't) use a generator expression

total = sum(x * x for x in range(10**8))    # fine -- sum() consumes lazily, one at a time
gen = (x * x for x in range(10**8))
gen[5]          # TypeError -- generators don't support indexing
len(gen)        # TypeError -- no len() either, since size isn't known upfront
list(gen)       # works, but now you've paid the full memory cost anyway
list(gen)       # [] -- the previous call already exhausted the generator

Generators trade away random access, len(), and re-iterability for constant memory — appropriate when you consume the sequence exactly once, in order, and don't need to know its length in advance.

The practical rule of thumb

Feeding directly into another function that consumes one item at a time (sum(), max(), "".join(), a for loop): use a generator expression — no reason to materialize the whole list first.
Need to iterate multiple times, index into it, call len(), or keep the whole thing around: use a list comprehension.

# Generator expression -- no unnecessary intermediate list
total = sum(price * qty for price, qty in cart)

# List comprehension -- the result is indexed, measured, and reused
normalized = [s / 100 for s in scores]
print(normalized[0], len(normalized), max(normalized))

A sole-argument generator expression can share the call's parentheses

sum(x * x for x in range(10))   # no extra parens needed -- single-argument call

When a generator expression is the sole argument to a function call, the call's own parentheses double as the generator expression's parentheses — you don't need sum((x * x for x in range(10))).
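The shortcut only applies to a sole argument: as soon as the call has any other argument, the generator expression needs its own parentheses. A short sketch with max() and sorted():

```python
words = ["iterator", "generator", "yield"]

longest = max(len(w) for w in words)                       # sole argument: no extra parens
longest_or_zero = max((len(w) for w in []), default=0)     # extra keyword arg: parens required
by_length = sorted((w.upper() for w in words), key=len)    # same rule for sorted()

# max(len(w) for w in [], default=0) would be a SyntaxError
# ("Generator expression must be parenthesized").
```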

Interview-ready summary: List comprehensions build the full result eagerly, in memory; generator expressions produce values lazily, in O(1) memory, at the cost of single-pass, no-random-access consumption. Default to a generator expression whenever the result is consumed once, in sequence, by something else — reach for the list only when you need to keep, index, or re-iterate the full result.

Related Resources

Generator expressions — Python docs

The everyday workhorses

from itertools import chain, islice, groupby, product, count

# chain -- concatenate multiple iterables lazily, no copying
list(chain([1, 2], [3, 4], [5]))   # [1, 2, 3, 4, 5]

# islice -- lazily slice an iterator (regular slicing doesn't work on iterators!)
list(islice(range(100), 5, 10))     # [5, 6, 7, 8, 9]
first_3 = islice(open("huge.log"), 3)   # first 3 lines, without reading the whole file

# groupby -- group CONSECUTIVE elements sharing a key (input must be pre-sorted/grouped!)
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3), ('b', 4)]
# a [('a', 5)]           <- note: a separate group, since input wasn't fully sorted by key

The groupby gotcha is one of the most common itertools mistakes: it only groups consecutive matching elements, so input almost always needs sorted(data, key=...) applied first if you want all occurrences of each key grouped together.
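A sketch of the sort-then-group fix on the same data. Sorting and grouping with the same key function (operator.itemgetter here) puts every occurrence of a key next to each other. Each group is a lazy iterator that shares the underlying input, so it should be consumed (here, by the inner list comprehension) before moving on to the next key.

```python
from itertools import groupby
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]

by_key = itemgetter(0)
grouped = {
    key: [value for _, value in group]   # consume each group before advancing
    for key, group in groupby(sorted(data, key=by_key), key=by_key)
}
# {'a': [1, 2, 5], 'b': [3, 4]}
```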

Infinite iterators (paired with `islice`/`takewhile` to bound them)

from itertools import count, cycle, repeat, islice

list(islice(count(10, 2), 5))    # [10, 12, 14, 16, 18] -- count from 10, step 2
list(islice(cycle("AB"), 5))      # ['A', 'B', 'A', 'B', 'A'] -- repeats forever
list(repeat("x", 3))                # ['x', 'x', 'x']

count and cycle (and repeat without a times argument) never terminate on their own, so always pair them with islice, takewhile, zip against a finite iterable, or a break condition.
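The other bounding techniques look like this. In the sketch below, takewhile stops at the first element that fails its predicate, and zip stops as soon as its shortest input runs out, which makes an infinite iterator safe as one of its inputs:

```python
from itertools import count, cycle, takewhile

below_20 = list(takewhile(lambda n: n < 20, count(10, 3)))   # [10, 13, 16, 19]

numbered = list(zip(count(1), ["x", "y", "z"]))               # [(1, 'x'), (2, 'y'), (3, 'z')]

labeled = list(zip(cycle("AB"), range(5)))                    # alternate labels over a finite range
```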

Combinatorics

from itertools import product, permutations, combinations

list(product([1, 2], ["a", "b"]))     # [(1,'a'), (1,'b'), (2,'a'), (2,'b')] -- cartesian product
list(permutations([1, 2, 3], 2))        # [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]
list(combinations([1, 2, 3], 2))         # [(1,2),(1,3),(2,3)] -- order doesn't matter

These replace hand-written nested loops for generating all pairings, orderings, or subsets. They are more concise, and because CPython implements them in C, they are typically faster than the equivalent Python loops.
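A quick sketch of the replacement in practice, checking that each itertools call yields exactly what the hand-written loops produce, in the same order:

```python
from itertools import combinations, product

items = ["a", "b", "c", "d"]

# Hand-written version: every unordered pair, each pair once
manual_pairs = []
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        manual_pairs.append((items[i], items[j]))

assert manual_pairs == list(combinations(items, 2))

# Three nested loops over the same options collapse into product(..., repeat=3)
manual_bits = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
assert manual_bits == list(product((0, 1), repeat=3))
```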

`tee`: splitting one iterator into several

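from itertools import tee

some_generator = (n * n for n in range(5))   # any one-shot iterator works here
a, b = tee(some_generator, 2)
list(a)   # [0, 1, 4, 9, 16] -- pulls every item from the shared underlying iterator
list(b)   # [0, 1, 4, 9, 16] -- still works: tee buffered what a already consumed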

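Useful when two different consumers each need to walk the same iterator independently. Two caveats: once tee() has split an iterator, don't advance the original directly, or the copies will silently miss items; and tee buffers every item that one copy has seen but another hasn't yet, so if one consumer reads all (or most) of the data before the other starts, as above, calling list() once is simpler and usually faster.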
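tee works best when the copies advance roughly in lockstep, so its buffer stays small. The classic example is a pairwise helper, sketched below with a locally defined `pairwise` (Python 3.10+ also ships itertools.pairwise, which does the same job). Because the second copy is only one step ahead of the first, tee never buffers more than one item.

```python
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... using tee."""
    a, b = tee(iterable)
    next(b, None)          # shift the second copy forward by one
    return zip(a, b)


readings = (x * x for x in range(5))   # a one-shot generator: 0, 1, 4, 9, 16
deltas = [later - earlier for earlier, later in pairwise(readings)]   # [1, 3, 5, 7]
```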

Interview-ready summary: itertools provides composable, lazy building blocks: chain/islice for combining/slicing, groupby for grouping consecutive keys (remember to sort first), product/permutations/combinations for combinatorics, and count/cycle for infinite sequences bounded by islice. They let you build multi-step data pipelines without materializing intermediate lists at each stage.

Related Resources

itertools — Python docs

What `with` desugars to

with open("file.txt") as f:
    data = f.read()

# roughly equivalent to:
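manager = open("file.txt")
f = manager.__enter__()          # for file objects, __enter__ returns the file itself
try:
    data = f.read()
finally:
    manager.__exit__(None, None, None)   # simplified: the real expansion passes exception details to __exit__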

__enter__() runs first, and its return value is bound to f (the as target). __exit__() is guaranteed to run when the block ends, whether it ends normally or via an exception. That is the core guarantee with provides, without having to write try/finally by hand around every resource.
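The desugaring above leaves out the exception path. A closer approximation, loosely following the expansion given in PEP 343 and written as a function so it can be exercised (`run_with` and `Recorder` are hypothetical names), passes exception details to `__exit__` and re-raises unless `__exit__` returns a true value:

```python
def run_with(manager, body):
    """Approximate `with manager as target: body(target)`."""
    cls = type(manager)
    enter, exit_ = cls.__enter__, cls.__exit__   # special methods are looked up on the type
    target = enter(manager)
    try:
        body(target)
    except BaseException as exc:
        # Exception path: hand the details to __exit__; re-raise unless it returns truthy
        if not exit_(manager, type(exc), exc, exc.__traceback__):
            raise
    else:
        # Normal path: __exit__ still runs, with all three arguments None
        exit_(manager, None, None, None)


class Recorder:
    def __init__(self):
        self.calls = []

    def __enter__(self):
        self.calls.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.calls.append(("exit", exc_type))
        return False   # never suppress


ok = Recorder()
run_with(ok, lambda target: None)

failing = Recorder()
try:
    run_with(failing, lambda target: 1 / 0)
except ZeroDivisionError:
    pass
```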

Writing your own context manager (class-based)

import time

class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        print(f"Elapsed: {self.elapsed:.3f}s")
        return False    # False (or None) -- don't suppress any exception

with Timer() as t:
    do_expensive_work()
# prints elapsed time whether do_expensive_work() succeeded or raised

__exit__ receives (exc_type, exc_value, traceback) — all None if the block exited normally, or the actual exception info if one was raised. Returning True from __exit__ suppresses the exception (swallows it, as if it never happened); returning False/None lets it propagate normally after cleanup runs.
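A sketch of selective suppression with a class-based manager (`IgnoreErrors` is a hypothetical name): `__exit__` returns True only when the exception matches one of the configured types, so everything else still propagates. The standard library's contextlib.suppress provides the same behavior ready-made.

```python
import contextlib


class IgnoreErrors:
    """Class-based sketch: suppress only the given exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # True -> swallow the exception; False -> let it propagate
        return exc_type is not None and issubclass(exc_type, self.exc_types)


with IgnoreErrors(KeyError):
    {}["missing"]            # KeyError is suppressed
reached_after_key_error = True

value_error_propagated = False
try:
    with IgnoreErrors(KeyError):
        int("not a number")  # ValueError is not in the list, so it propagates
except ValueError:
    value_error_propagated = True

with contextlib.suppress(KeyError):   # the standard library's equivalent
    {}["missing"]
```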

A common real-world pattern: guaranteed release

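class DatabaseTransaction:
    def __init__(self, conn):
        self.conn = conn           # hypothetical connection exposing begin()/commit()/rollback()

    def __enter__(self):
        self.conn.begin()
        return self.conn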

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()   # roll back on ANY exception
        return False               # still propagate the exception after rollback

This is the canonical use of __exit__'s exception parameters: commit on success, roll back on failure, and let the caller still see the original exception (since returning False doesn't hide it) — the transaction is never left half-applied regardless of how the block exits.
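DatabaseTransaction assumes a connection object with begin()/commit()/rollback(). A runnable variant using the standard library's sqlite3 module is sketched below. `Transaction` is a hypothetical name, and the sketch relies on sqlite3's default (legacy) transaction handling, which opens a transaction implicitly before the first INSERT. A sqlite3.Connection used directly in a with statement already behaves this way: it commits on success and rolls back on an exception, without closing the connection.

```python
import sqlite3


class Transaction:
    """Commit on success, roll back on any exception, never suppress it."""

    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        # No explicit BEGIN: sqlite3's default mode starts a transaction
        # implicitly before the first INSERT/UPDATE/DELETE.
        return self.conn

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

with Transaction(conn) as c:
    c.execute("INSERT INTO accounts VALUES ('alice', 100)")

try:
    with Transaction(conn) as c:
        c.execute("INSERT INTO accounts VALUES ('bob', 50)")
        raise RuntimeError("simulated failure after a partial write")
except RuntimeError:
    pass

rows = conn.execute("SELECT name, balance FROM accounts").fetchall()   # only alice's row
```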

Multiple context managers in one `with`

with open("in.txt") as fin, open("out.txt", "w") as fout:
    fout.write(fin.read())

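This is equivalent to nested with blocks: every context manager that was successfully entered has its __exit__ run (in reverse order of entry), even if a later one's __enter__ or the block body fails. A manager whose __enter__ raised was never entered, so its __exit__ is not called.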
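The ordering can be observed with a logging context manager. In this sketch, `Tracked` is hypothetical, and its `fail_on_enter` flag simulates an __enter__ that fails, such as a file that cannot be opened.

```python
events = []


class Tracked:
    """Hypothetical context manager that logs its enter/exit calls."""

    def __init__(self, name, fail_on_enter=False):
        self.name = name
        self.fail_on_enter = fail_on_enter

    def __enter__(self):
        if self.fail_on_enter:
            raise OSError(f"cannot open {self.name}")
        events.append(f"enter {self.name}")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        events.append(f"exit {self.name}")
        return False


with Tracked("first"), Tracked("second"):
    events.append("body")
normal_order = list(events)   # exits run in reverse order of entry

events.clear()
try:
    with Tracked("first"), Tracked("second", fail_on_enter=True):
        events.append("body")   # never reached
except OSError:
    pass
failed_enter_order = list(events)   # "second" was never entered, so it is never exited
```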

Interview-ready summary: with guarantees __exit__ runs on the way out of the block, exception or not — __enter__ sets up a resource and __exit__ tears it down, optionally inspecting/suppressing an exception. This is Python's structured alternative to manually writing try/finally around every resource-acquiring operation.

Related Resources

With Statement Context Managers — Python docs

The generator-based shortcut

from contextlib import contextmanager
import time

@contextmanager
def timer():
    start = time.perf_counter()
    try:
        yield start           # this becomes the `as` target
    finally:
        elapsed = time.perf_counter() - start
        print(f"Elapsed: {elapsed:.3f}s")

with timer() as start_time:
    do_expensive_work()
# prints elapsed time whether do_expensive_work() succeeded or raised

Everything before yield is the "enter" logic; the yielded value is bound to the as variable; everything after yield (in the finally) is the "exit" logic — the try/finally is what guarantees the cleanup code runs even if the with block's body raises.

How exceptions flow through the generator

@contextmanager
def suppress_value_errors():
    try:
        yield
    except ValueError:
        print("suppressed a ValueError")
    # not re-raising -- suppresses it, same as __exit__ returning True

with suppress_value_errors():
    raise ValueError("oops")
print("still runs")   # exception was suppressed

If the with block raises, that exception is raised at the yield statement itself inside the generator — so a try/except wrapped around the yield can catch and optionally suppress it (by not re-raising), exactly mirroring what returning True from a class-based __exit__ would do.
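The counterpart to the suppressing example is catching the exception at the yield only to log it, then re-raising so the caller still sees it. A sketch with a hypothetical `logged` context manager; the finally clause runs on both paths:

```python
from contextlib import contextmanager

log = []


@contextmanager
def logged(name):
    log.append(f"start {name}")
    try:
        yield
    except Exception as exc:
        log.append(f"{name} failed: {exc!r}")
        raise                      # re-raise: the with statement sees the exception
    finally:
        log.append(f"end {name}")  # runs on success and on failure


with logged("ok"):
    pass

caught = False
try:
    with logged("bad"):
        raise KeyError("k")
except KeyError:
    caught = True
```

Omitting the bare raise would turn this into a suppressing context manager, exactly like suppress_value_errors.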

Class-based vs `@contextmanager`: when to choose which

# Class-based -- better when the manager holds state across uses, exposes extra
# methods, or is one reusable object whose setup shouldn't re-run on every with
class ManagedResource:
    def __enter__(self): ...
    def __exit__(self, *exc_info): ...

# @contextmanager -- concise, ideal for simple setup/teardown pairs
@contextmanager
def managed_resource():
    resource = acquire()
    try:
        yield resource
    finally:
        release(resource)

@contextmanager is usually the more concise, more readable choice for straightforward "acquire, yield, release" logic. A full class is better when the context manager needs multiple methods or state that outlives a single with block, or needs to be instantiated once and reused many times without re-running setup (an object produced by @contextmanager is single-use). Decorator support alone isn't a reason to write a class: objects produced by @contextmanager can already be used as decorators, and a class gains the same ability by inheriting from contextlib.ContextDecorator.
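Because contextlib.contextmanager is built on ContextDecorator, a generator-based context manager can also wrap a whole function. Each call of the decorated function re-creates the context manager, so setup and teardown run on every call. A sketch with a hypothetical `traced` helper:

```python
from contextlib import contextmanager

events = []


@contextmanager
def traced(name):
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


@traced("job")                  # used as a decorator instead of a with statement
def job():
    events.append("body")


job()
job()                           # works again: a fresh context manager per call
```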

Interview-ready summary: @contextmanager converts a single-yield generator into a context manager — code before yield is __enter__, code after (in a finally) is __exit__, and wrapping yield in try/except lets you intercept or suppress exceptions from the with block, exactly mirroring the class-based protocol with far less boilerplate.

Related Resources

contextlib.contextmanager — Python docs

The eager approach: loads everything at once

def read_all_lines(path):
    with open(path) as f:
        return f.readlines()   # entire file in memory as a list of strings

lines = read_all_lines("100gb.log")   # boom -- won't fit in memory

The generator approach: constant memory, streamed

def read_lines(path):
    with open(path) as f:
        for line in f:            # file objects are themselves iterators
            yield line.strip()

for line in read_lines("100gb.log"):   # one line in memory at a time
    process(line)

Since a file object is already an iterator over its lines, wrapping it in a generator function adds only a small, constant overhead: at any moment, only the current line, the file's internal read buffer, and whatever process() needs are resident, whether the file is 1KB or 100GB.

Chaining generators into a lazy pipeline

def read_lines(path):
    with open(path) as f:
        yield from (line.strip() for line in f)

def parse(lines):
    for line in lines:
        yield line.split(",")

def filter_valid(records):
    for r in records:
        if len(r) == 3:
            yield r

pipeline = filter_valid(parse(read_lines("data.csv")))
for record in pipeline:
    handle(record)

Crucially, this pipeline processes one row all the way through (read → parse → filter → handle) before reading the next row, and it never builds an intermediate list at any stage. If each step instead returned a list (for example, built with a list comprehension), the same filter_valid(parse(read_lines(...))) call would first read the entire file, then parse every line, then filter, with each stage making a separate full pass and producing a full intermediate list. For large data, that's the difference between using a few KB of memory and running out of RAM.
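The row-at-a-time behavior can be made visible by having the stages log their work. This sketch swaps the file for an in-memory list (the hypothetical `read_rows` stands in for read_lines) and reuses parse and filter_valid with a trace added to parse:

```python
trace = []


def read_rows(rows):
    """Stand-in for read_lines(): streams from an in-memory list instead of a file."""
    for row in rows:
        trace.append(f"read {row}")
        yield row


def parse(lines):
    for line in lines:
        trace.append(f"parse {line}")
        yield line.split(",")


def filter_valid(records):
    for r in records:
        if len(r) == 3:
            yield r


pipeline = filter_valid(parse(read_rows(["a,b,c", "bad", "d,e,f"])))
trace_before = list(trace)        # building the pipeline runs nothing
first = next(pipeline)            # ['a', 'b', 'c']
trace_after_first = list(trace)   # row 2 has not been read yet
rest = list(pipeline)             # [['d', 'e', 'f']]
```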

The tradeoff: no random access, single-pass only

Generators give up len(), indexing, and re-iteration in exchange for constant memory. That's the right trade for streaming/one-pass processing; if you genuinely need to look at the data multiple times or access it by index, you need it materialized (a list) at some point regardless — generators just let you defer or avoid that when you don't.

Interview-ready summary: Generators trade eager, full-memory computation for lazy, one-item-at-a-time computation, which is what lets Python process files, streams, or datasets far larger than available memory. Chaining generators together builds a pipeline that processes each item through every stage before moving to the next, avoiding intermediate list materialization entirely.

Related Resources

Functional Programming HOWTO — Python docs
