Generators in Python
Generators are a special type of iterator in Python that let you declare a function that behaves like an iterator. They generate values on the fly, one at a time, which makes them memory-efficient for handling large datasets or infinite sequences. Generators use the yield keyword to produce a sequence of values.
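To make the iterator connection concrete, here is a minimal sketch (the countdown function is just an illustrative name): a generator object is its own iterator, so it works anywhere Python expects one.

```python
# A generator object implements the iterator protocol itself
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(iter(gen) is gen)  # True - a generator is its own iterator
print(next(gen))         # 3
print(list(gen))         # [2, 1] - iteration resumes where next() left off
```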
Generator Fundamentals
Basic Generator Function
A generator function looks like a normal function but uses the yield statement instead of return to provide a value. When called, it returns a generator object that can be iterated over.
```python
# Basic generator function (parameter renamed from max to avoid
# shadowing the built-in max())
def count_up_to(limit):
    count = 1
    while count <= limit:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(counter)  # <generator object count_up_to at 0x...>

# Consuming values one at a time with next()
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3

# Iterate through the remaining values
for num in counter:
    print(num)  # 4, 5

# StopIteration is raised if next() is called again
# next(counter)  # This would raise StopIteration
```
Understanding Yield
When a generator function reaches a yield statement, it pauses execution, returns the yielded value, and saves its state. The next time a value is requested (for example, via next()), the generator resumes where it left off.
```python
# Generator that demonstrates state preservation
def state_demo():
    print("First checkpoint")
    yield 1
    print("Second checkpoint")
    yield 2
    print("Third checkpoint")
    yield 3
    print("Generator complete")

# Consuming the generator
gen = state_demo()
value = next(gen)  # Prints "First checkpoint" and returns 1
print(f"Got: {value}")
value = next(gen)  # Prints "Second checkpoint" and returns 2
print(f"Got: {value}")
value = next(gen)  # Prints "Third checkpoint" and returns 3
print(f"Got: {value}")

# The next call would print "Generator complete" and raise StopIteration
# next(gen)
```
Generator Expressions
Generator expressions provide a concise way to create generators, similar to list comprehensions but with parentheses instead of square brackets.
```python
# List comprehension - builds the entire list in memory
squares_list = [x**2 for x in range(1000000)]  # Consumes lots of memory

# Generator expression - creates values on demand
squares_gen = (x**2 for x in range(1000000))  # Uses minimal memory

# Both can be passed to sum(); the extra parentheses can be omitted
# when the generator expression is the sole argument
total = sum(x**2 for x in range(10))
print(total)  # 285

# Conditional generator expressions
even_squares = (x**2 for x in range(10) if x % 2 == 0)
for num in even_squares:
    print(num)  # 0, 4, 16, 36, 64
```
Practical Applications
Working with Large Files
Generators are ideal for processing large files line by line without loading the entire file into memory.
```python
# Process a large file efficiently
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Count lines containing a specific word
def count_lines_with_word(file_path, word):
    count = 0
    for line in read_large_file(file_path):
        if word in line:
            count += 1
    return count

# Usage example
# count = count_lines_with_word('huge_log.txt', 'ERROR')
# print(f"Found {count} error lines")
```
Infinite Sequences
Generators can represent infinite sequences without consuming infinite memory.
```python
# Generate Fibonacci numbers indefinitely
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get the first 10 Fibonacci numbers
fib_gen = fibonacci()
first_10 = [next(fib_gen) for _ in range(10)]
print(first_10)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Generate an infinite counter
def infinite_counter():
    num = 0
    while True:
        yield num
        num += 1

# Take the first 5 values
counter = infinite_counter()
for _ in range(5):
    print(next(counter))  # 0, 1, 2, 3, 4
```
Data Pipelines
Generators can form the basis of efficient data processing pipelines.
```python
# Data processing pipeline using generators
def read_data(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def parse_data(lines):
    for line in lines:
        try:
            # Assuming CSV with ID,NAME,VALUE format
            # (record_id avoids shadowing the built-in id())
            record_id, name, value = line.split(',')
            yield {'id': record_id, 'name': name, 'value': float(value)}
        except ValueError:
            continue  # Skip malformed lines

def filter_data(records, min_value):
    for record in records:
        if record['value'] >= min_value:
            yield record

def transform_data(records):
    for record in records:
        # Add a calculated field
        record['value_squared'] = record['value'] ** 2
        yield record

# Usage as a pipeline
# raw_data = read_data('data.csv')
# parsed_data = parse_data(raw_data)
# filtered_data = filter_data(parsed_data, 10.0)
# final_data = transform_data(filtered_data)
#
# for item in final_data:
#     print(item)
```
Advanced Generator Features
Sending Values to Generators
Generators can receive values from the outside using the send() method, turning them into coroutines that can both produce and consume values.
```python
# Generator that can receive values
def echo():
    value = yield "Ready to echo!"
    while True:
        value = yield f"You said: {value}"

# Create the generator and advance to the first yield
echo_gen = echo()
initial = next(echo_gen)  # Primes the generator
print(initial)  # "Ready to echo!"

# Send values to the generator
response = echo_gen.send("Hello")
print(response)  # "You said: Hello"
response = echo_gen.send("World")
print(response)  # "You said: World"
```
yield from Expression
The yield from expression allows a generator to delegate part of its operations to another generator (or any iterable).
```python
# Generator delegating to other generators
def gen1():
    yield 'A'
    yield 'B'
    yield 'C'

def gen2():
    yield 'X'
    yield 'Y'
    yield 'Z'

def combined():
    yield from gen1()
    yield from gen2()

# Using the combined generator
for item in combined():
    print(item)  # A, B, C, X, Y, Z

# yield from can delegate to any iterable
def flatten(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # Recursively flatten nested lists
        else:
            yield item

# Flatten a nested list
nested = [1, [2, [3, 4], 5], 6]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]
```
Generator Cleanup
Generators can define cleanup code (typically in a try/finally block) that runs when the generator is exhausted, closed explicitly, or garbage-collected.
```python
# Generator with cleanup code using try/finally
def generator_with_cleanup():
    try:
        yield 1
        yield 2
        yield 3
    finally:
        print("Generator cleanup performed")

# Using the generator
gen = generator_with_cleanup()
print(next(gen))  # 1
print(next(gen))  # 2

# Close the generator explicitly
gen.close()  # Prints "Generator cleanup performed"

# Resource management generator example
def file_reader(file_path):
    file = open(file_path, 'r')
    try:
        for line in file:
            yield line
    finally:
        file.close()
        print(f"File {file_path} closed")

# Usage example
# reader = file_reader('data.txt')
# first_line = next(reader)
# reader.close()  # File is properly closed
```
Throwing Exceptions into Generators
You can inject an exception into a running generator using the throw() method.
```python
# Generator that can handle thrown exceptions
def exception_handler():
    try:
        while True:
            try:
                value = yield
                print(f"Got value: {value}")
            except ValueError:
                print("Caught ValueError inside generator!")
    finally:
        print("Generator is exiting")

# Create and prime the generator
gen = exception_handler()
next(gen)  # Prime the generator

# Send values normally
gen.send("Hello")  # Prints "Got value: Hello"

# Throw an exception into the generator
gen.throw(ValueError)  # Prints "Caught ValueError inside generator!"

# Continue using the generator
gen.send("Back to normal")  # Prints "Got value: Back to normal"

# Close the generator
gen.close()  # Prints "Generator is exiting"
```
Performance Considerations
Memory Efficiency
Generators are memory-efficient because they produce values on-the-fly instead of storing the entire sequence in memory.
```python
# Memory comparison: list vs. generator
import sys

# A list of 10 million integers
def large_list():
    return [i for i in range(10000000)]

# A generator over 10 million integers
def large_gen():
    return (i for i in range(10000000))

# Compare memory usage
print(f"List size: {sys.getsizeof(large_list())} bytes")
print(f"Generator size: {sys.getsizeof(large_gen())} bytes")

# The list object alone is tens of megabytes (sys.getsizeof does not
# even count the int objects it references); the generator is a few
# hundred bytes regardless of how many values it will produce
```
Execution Time
While generators save memory, they add a small per-value overhead from suspending and resuming the generator's execution frame.
```python
# Performance comparison for summing numbers
import time

# Using a list
def sum_with_list(n):
    start = time.time()
    result = sum([i for i in range(n)])
    end = time.time()
    return result, end - start

# Using a generator
def sum_with_generator(n):
    start = time.time()
    result = sum(i for i in range(n))
    end = time.time()
    return result, end - start

# For small collections (e.g., n=1000), the difference is negligible
# For very large collections (e.g., n=10000000):
# - Lists can be faster for repeated iteration (trade-off: memory usage)
# - Generators are better when values are processed only once
```
Reusability
Unlike lists, generators are exhausted after a single iteration, which is important to consider in your design.
```python
# Reusability demonstration
def get_generator():
    for i in range(3):
        yield i

# First iteration
gen = get_generator()
print(list(gen))  # [0, 1, 2]

# Second iteration - the generator is already exhausted
print(list(gen))  # [] (empty list)

# To reuse, create a new generator instance
gen = get_generator()
print(list(gen))  # [0, 1, 2] (works again)

# Caching generator values if needed: the first pass stores values in
# a cache, then replays the cache indefinitely (note this makes the
# generator infinite, so consumers must decide when to stop)
def cached_generator():
    cache = []
    for value in get_generator():
        cache.append(value)
        yield value
    while True:
        yield from cache
```
Best Practices
When to Use Generators
- Working with large datasets that would consume too much memory
- Processing streams of data (files, network streams)
- Creating data pipelines where values are processed one at a time
- Implementing infinite sequences or streams
- When lazy evaluation is desired (compute values only when needed)
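As a quick illustration of the last point, the sketch below (expensive is a hypothetical stand-in for a costly computation) shows that a generator expression does no work until values are actually requested:

```python
import itertools

calls = 0

def expensive(x):
    """Hypothetical costly computation; counts how often it runs."""
    global calls
    calls += 1
    return x * x

# Nothing is computed when the generator expression is created
lazy = (expensive(n) for n in range(1000000))
print(calls)  # 0 - no work done yet

# Take only the first three values; only three computations happen
first_three = list(itertools.islice(lazy, 3))
print(first_three)  # [0, 1, 4]
print(calls)        # 3 - only the consumed values were computed
```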
Generator Design Tips
- Keep generator functions focused on a single responsibility
- Use proper error handling with try/except/finally blocks
- Document whether a generator is expected to be consumed once or multiple times
- Consider providing helper functions that recreate exhausted generators when needed
- Use generator expressions for simple cases, generator functions for complex logic
- Leverage the built-in itertools module for common generator patterns
```python
# Useful patterns from itertools
import itertools

# Generate infinite sequences
counter = itertools.count(1, 2)            # 1, 3, 5, 7, ...
repeater = itertools.repeat("A", 3)        # A, A, A
cycler = itertools.cycle(["A", "B", "C"])  # A, B, C, A, B, C, ...

# Combine iterables
chained = itertools.chain([1, 2], [3, 4])  # 1, 2, 3, 4
zipped = itertools.zip_longest([1, 2], [3, 4, 5], fillvalue=0)  # (1, 3), (2, 4), (0, 5)

# Filtering and transforming
filtered = filter(lambda x: x % 2 == 0, range(10))  # 0, 2, 4, 6, 8
mapped = map(lambda x: x**2, range(5))              # 0, 1, 4, 9, 16

# Combinatorial generators
combos = itertools.combinations([1, 2, 3], 2)  # (1, 2), (1, 3), (2, 3)
perms = itertools.permutations([1, 2, 3], 2)   # (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)
products = itertools.product("AB", "12")       # A1, A2, B1, B2
```
Practice Exercises
Try These:
- Create a generator that produces prime numbers up to a specified limit.
- Write a generator-based function to read a CSV file and yield dictionaries of rows.
- Implement a generator pipeline to process and transform data from one format to another.
- Build a coroutine-based generator that can receive commands and adjust its output accordingly.
- Write a recursive generator that traverses a directory structure and yields file paths.
Sample Solution
Here's a solution for the prime number generator:
```python
def is_prime(n):
    """Check whether a number is prime."""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

def primes_up_to(limit):
    """Generate prime numbers up to the given limit."""
    for num in range(2, limit + 1):
        if is_prime(num):
            yield num

# Usage
for prime in primes_up_to(30):
    print(prime)  # 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
```
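One possible approach to the directory-traversal exercise uses yield from for the recursive step (walk_files is a name chosen here for illustration; the demo builds a tiny temporary tree so the sketch is self-contained):

```python
import os
import tempfile

def walk_files(root):
    """Recursively yield the path of every file under root."""
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            yield from walk_files(entry.path)  # Delegate to the recursive call
        else:
            yield entry.path

# Demo on a small temporary directory tree
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(root, name), "w").close()
    found = [os.path.relpath(p, root) for p in walk_files(root)]
    print(found)  # ['a.txt', 'sub/b.txt'] on POSIX systems
```

Because each directory level delegates with yield from, the caller sees one flat stream of file paths no matter how deeply the tree is nested.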