Generators in Python

Generators are a special kind of iterator in Python: functions that behave like iterators, producing values on the fly, one at a time. This lazy production makes them memory-efficient for handling large datasets or infinite sequences. Generators use the yield keyword to produce a sequence of values.

Generator Fundamentals

Basic Generator Function

A generator function looks like a normal function but uses the yield statement instead of return to provide a value. When called, it returns a generator object that can be iterated over.

# Basic generator function
def count_up_to(limit):  # named "limit" to avoid shadowing the built-in max()
    count = 1
    while count <= limit:
        yield count
        count += 1

# Using the generator
counter = count_up_to(5)
print(counter)  # <generator object count_up_to at 0x...>

# Consuming values one at a time with next()
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3

# Iterate through remaining values
for num in counter:
    print(num)  # 4, 5

# StopIteration is raised if next() is called again
# next(counter)  # This would raise StopIteration

Understanding Yield

When a generator function reaches a yield statement, it pauses execution, hands back the yielded value, and saves its local state. The next time a value is requested (for example, with next()), it resumes right where it left off.

# Generator that demonstrates state preservation
def state_demo():
    print("First checkpoint")
    yield 1
    
    print("Second checkpoint")
    yield 2
    
    print("Third checkpoint")
    yield 3
    
    print("Generator complete")

# Consuming the generator
gen = state_demo()

value = next(gen)  # Prints "First checkpoint" and returns 1
print(f"Got: {value}")

value = next(gen)  # Prints "Second checkpoint" and returns 2
print(f"Got: {value}")

value = next(gen)  # Prints "Third checkpoint" and returns 3
print(f"Got: {value}")

# The next call would print "Generator complete" and raise StopIteration
# next(gen)

Generator Expressions

Generator expressions provide a concise way to create generators, similar to list comprehensions but with parentheses instead of square brackets.

# List comprehension - builds the entire list in memory
squares_list = [x**2 for x in range(1000000)]  # Consumes lots of memory

# Generator expression - creates values on demand
squares_gen = (x**2 for x in range(1000000))   # Uses minimal memory

# Both can be used with the sum function
total = sum(x**2 for x in range(10))  # Extra parentheses are optional inside a single-argument call
print(total)  # 285

# Conditional generator expressions
even_squares = (x**2 for x in range(10) if x % 2 == 0)
for num in even_squares:
    print(num)  # 0, 4, 16, 36, 64

Practical Applications

Working with Large Files

Generators are ideal for processing large files line by line without loading the entire file into memory.

# Process a large file efficiently
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Count lines containing a specific word
def count_lines_with_word(file_path, word):
    count = 0
    for line in read_large_file(file_path):
        if word in line:
            count += 1
    return count

# Usage example (pseudocode)
# count = count_lines_with_word('huge_log.txt', 'ERROR')
# print(f"Found {count} error lines")

Infinite Sequences

Generators can represent infinite sequences without consuming infinite memory.

# Generate Fibonacci numbers infinitely
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get the first 10 Fibonacci numbers
fib_gen = fibonacci()
first_10 = [next(fib_gen) for _ in range(10)]
print(first_10)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Generate an infinite counter
def infinite_counter():
    num = 0
    while True:
        yield num
        num += 1

# Take the first 5 values
counter = infinite_counter()
for _ in range(5):
    print(next(counter))  # 0, 1, 2, 3, 4

Data Pipelines

Generators can form the basis of efficient data processing pipelines.

# Data processing pipeline using generators
def read_data(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def parse_data(lines):
    for line in lines:
        try:
            # Assuming CSV with ID,NAME,VALUE format
            record_id, name, value = line.split(',')  # "record_id" avoids shadowing the built-in id()
            yield {'id': record_id, 'name': name, 'value': float(value)}
        except ValueError:
            continue  # Skip malformed lines (wrong field count or a bad number)

def filter_data(records, min_value):
    for record in records:
        if record['value'] >= min_value:
            yield record

def transform_data(records):
    for record in records:
        # Add a calculated field
        record['value_squared'] = record['value'] ** 2
        yield record

# Usage as a pipeline (pseudocode)
# raw_data = read_data('data.csv')
# parsed_data = parse_data(raw_data)
# filtered_data = filter_data(parsed_data, 10.0)
# final_data = transform_data(filtered_data)
#
# for item in final_data:
#     print(item)

Advanced Generator Features

Sending Values to Generators

Generators can receive values from the outside using the send() method, turning them into coroutines that can both produce and consume values.

# Generator that can receive values
def echo():
    value = yield "Ready to echo!"
    while True:
        value = yield f"You said: {value}"

# Create the generator and advance to first yield
echo_gen = echo()
initial = next(echo_gen)  # Primes the generator
print(initial)  # "Ready to echo!"

# Send values to the generator
response = echo_gen.send("Hello")
print(response)  # "You said: Hello"

response = echo_gen.send("World")
print(response)  # "You said: World"

yield from Expression

The yield from expression allows a generator to delegate part of its operations to another generator.

# Generator delegating to another generator
def gen1():
    yield 'A'
    yield 'B'
    yield 'C'

def gen2():
    yield 'X'
    yield 'Y'
    yield 'Z'

def combined():
    yield from gen1()
    yield from gen2()

# Using the combined generator
for item in combined():
    print(item)  # A, B, C, X, Y, Z

# yield from can be used with any iterable
def flatten(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # Recursively flatten nested lists
        else:
            yield item

# Flatten a nested list
nested = [1, [2, [3, 4], 5], 6]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]

Generator Cleanup

Generators can define cleanup code that runs when the generator is closed or garbage-collected.

# Generator with cleanup code using try/finally
def generator_with_cleanup():
    try:
        yield 1
        yield 2
        yield 3
    finally:
        print("Generator cleanup performed")

# Using the generator
gen = generator_with_cleanup()
print(next(gen))  # 1
print(next(gen))  # 2

# Close the generator explicitly
gen.close()  # Prints "Generator cleanup performed"

# Resource management generator example
def file_reader(file_path):
    file = open(file_path, 'r')
    try:
        for line in file:
            yield line
    finally:
        file.close()
        print(f"File {file_path} closed")

# Usage example (pseudocode)
# reader = file_reader('data.txt')
# first_line = next(reader)
# reader.close()  # File is properly closed

Throwing Exceptions into Generators

You can inject exceptions into a running generator using the throw() method.

# Generator that can handle thrown exceptions
def exception_handler():
    try:
        while True:
            try:
                value = yield
                print(f"Got value: {value}")
            except ValueError:
                print("Caught ValueError inside generator!")
    finally:
        print("Generator is exiting")

# Create and prime the generator
gen = exception_handler()
next(gen)  # Prime the generator

# Send values normally
gen.send("Hello")  # Prints "Got value: Hello"

# Throw an exception into the generator
gen.throw(ValueError)  # Prints "Caught ValueError inside generator!"

# Continue using the generator
gen.send("Back to normal")  # Prints "Got value: Back to normal"

# Close the generator
gen.close()  # Prints "Generator is exiting"

Performance Considerations

Memory Efficiency

Generators are memory-efficient because they produce values on-the-fly instead of storing the entire sequence in memory.

# Memory comparison: List vs Generator
import sys

# Creating a list of 10 million integers
def large_list():
    return [i for i in range(10000000)]

# Creating a generator for 10 million integers
def large_gen():
    return (i for i in range(10000000))

# Compare memory usage
print(f"List size: {sys.getsizeof(large_list())} bytes")
print(f"Generator size: {sys.getsizeof(large_gen())} bytes")

# List size will be tens of megabytes (getsizeof counts only the list object,
# not the int objects it references)
# Generator size will be a couple of hundred bytes

Execution Time

While generators save memory, there can be a small performance overhead for value generation.

# Performance comparison for summing numbers
import time

# Using a list
def sum_with_list(n):
    start = time.time()
    result = sum([i for i in range(n)])
    end = time.time()
    return result, end - start

# Using a generator
def sum_with_generator(n):
    start = time.time()
    result = sum(i for i in range(n))
    end = time.time()
    return result, end - start

# For small collections (e.g., n=1000), the difference is negligible
# For very large collections (e.g., n=10000000):
# - Lists can be faster for repeated iteration (trade-off: memory usage)
# - Generators are better when values are processed only once
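The trade-off above can be measured directly with the standard library's timeit module. This is a rough micro-benchmark sketch, not a definitive result; the exact numbers depend on your machine and Python version:

```python
import timeit

n = 100_000

# Time summing via a list comprehension vs. a generator expression
list_time = timeit.timeit(lambda: sum([i * i for i in range(n)]), number=20)
gen_time = timeit.timeit(lambda: sum(i * i for i in range(n)), number=20)

print(f"list comprehension:   {list_time:.3f}s")
print(f"generator expression: {gen_time:.3f}s")
```

The list version builds and discards a full list on every run, trading memory for slightly faster iteration; the generator version never materializes the intermediate sequence.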

Reusability

Unlike lists, generators are exhausted after a single iteration, which is important to consider in your design.

# Reusability demonstration
def get_generator():
    for i in range(3):
        yield i

# First iteration
gen = get_generator()
print(list(gen))  # [0, 1, 2]

# Second iteration - generator is already exhausted
print(list(gen))  # [] (empty list)

# To reuse, you need to create a new generator instance
gen = get_generator()
print(list(gen))  # [0, 1, 2] (works again)

# Caching generator values if needed
def cached_generator():
    cache = []
    for value in get_generator():
        cache.append(value)
        yield value

    # Replay cached values indefinitely. Note this makes the generator
    # infinite: consume it with next() or itertools.islice, never list()
    while True:
        yield from cache
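If you need to iterate the same values more than once without re-running the source, the standard library's itertools.tee is an alternative to hand-rolled caching. It splits one iterator into independent iterators, buffering values internally; the original iterator should not be used after the split. A minimal sketch (numbers is a throwaway example generator):

```python
import itertools

def numbers():
    yield from range(3)

# tee returns independent iterators over the same underlying values
a, b = itertools.tee(numbers(), 2)
print(list(a))  # [0, 1, 2]
print(list(b))  # [0, 1, 2]
```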

Best Practices

When to Use Generators

  • Working with large datasets that would consume too much memory
  • Processing streams of data (files, network streams)
  • Creating data pipelines where values are processed one at a time
  • Implementing infinite sequences or streams
  • When lazy evaluation is desired (compute values only when needed)
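The last point, lazy evaluation, is easy to see with itertools.islice, which pulls only as many values as requested from an otherwise infinite generator (squares here is an illustrative example, not a library function):

```python
import itertools

def squares():
    """Infinite stream of square numbers."""
    n = 0
    while True:
        yield n * n
        n += 1

# Only the first five squares are ever computed
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```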

Generator Design Tips

  • Keep generator functions focused on a single responsibility
  • Use proper error handling with try/except/finally blocks
  • Document whether a generator is expected to be consumed once or multiple times
  • Consider providing helper functions that recreate exhausted generators when needed
  • Use generator expressions for simple cases, generator functions for complex logic
  • Leverage the built-in itertools module for common generator patterns

# Useful patterns from itertools
import itertools

# Generate infinite sequences
counter = itertools.count(1, 2)  # 1, 3, 5, 7, ...
repeater = itertools.repeat("A", 3)  # A, A, A
cycler = itertools.cycle(["A", "B", "C"])  # A, B, C, A, B, C, ...

# Combine generators
chained = itertools.chain([1, 2], [3, 4])  # 1, 2, 3, 4
zipped = itertools.zip_longest([1, 2], [3, 4, 5], fillvalue=0)  # (1,3), (2,4), (0,5)

# Filtering and transforming
filtered = filter(lambda x: x % 2 == 0, range(10))  # 0, 2, 4, 6, 8
mapped = map(lambda x: x**2, range(5))  # 0, 1, 4, 9, 16

# Combinatorial generators
combos = itertools.combinations([1, 2, 3], 2)  # (1,2), (1,3), (2,3)
perms = itertools.permutations([1, 2, 3], 2)  # (1,2), (1,3), (2,1), (2,3), (3,1), (3,2)
products = itertools.product("AB", "12")  # A1, A2, B1, B2

Practice Exercises

Try These:

  1. Create a generator that produces prime numbers up to a specified limit.
  2. Write a generator-based function to read a CSV file and yield dictionaries of rows.
  3. Implement a generator pipeline to process and transform data from one format to another.
  4. Build a coroutine-based generator that can receive commands and adjust its output accordingly.
  5. Write a recursive generator that traverses a directory structure and yields file paths.

Sample Solution

Here's a solution for the prime number generator:

def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

def primes_up_to(limit):
    """Generate prime numbers up to the given limit."""
    for num in range(2, limit + 1):
        if is_prime(num):
            yield num

# Usage
for prime in primes_up_to(30):
    print(prime)  # 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
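Exercise 5 maps naturally onto yield from. One possible sketch (the function name walk_files and the sorted-order choice are design decisions here, not requirements of the exercise):

```python
from pathlib import Path

def walk_files(root):
    """Recursively yield the paths of all files under root."""
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            yield from walk_files(entry)  # Delegate to the recursive call
        else:
            yield entry

# Usage example (pseudocode)
# for path in walk_files('.'):
#     print(path)
```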