Python Generators

Reference

How to Use Generators and yield in Python
Working With Files in Python
Python Generators 101

Have you ever had to work with a dataset so large that it overwhelmed your machine’s memory? Or maybe you have a complex function that needs to maintain an internal state every time it’s called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.

By the end of this course, you’ll know:

  • What generators are and how to use them
  • How to create generator functions and expressions
  • How the Python yield statement works
  • How to use multiple Python yield statements in a generator function
  • How to use advanced generator methods
  • How to build data pipelines with multiple generators

Understanding Generators

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

infinite = infinite_sequence()
next(infinite)
# 0
next(infinite)
# 1

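A generator yields a value and then pauses, keeping its local state until the next call to next().

If a generator function contains multiple yield statements, each call to next() runs the function up to the next yield and returns that value.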

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
        yield "This is the second yield statement!"

infinite = infinite_sequence()
next(infinite)
# 0
next(infinite)
# 'This is the second yield statement!'
next(infinite)
# 1
next(infinite)
# 'This is the second yield statement!'

A generator function can also yield from inside a for loop, in which case the sequence is finite:

def finite_sequence():
    nums = [1, 2, 3]
    for num in nums:
        yield num

finite = finite_sequence()
next(finite)
# 1
next(finite)
# 2
next(finite)
# 3

Once the generator is exhausted, a further call to next() raises StopIteration:

next(finite)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
~/Desktop/Code/Python/Exercise.py in
----> 5901 next(finite)

StopIteration:
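
In practice you rarely need to catch StopIteration by hand. The following is a minimal sketch, reusing finite_sequence from above, of two common ways to consume a generator safely: a for loop (or comprehension), which handles StopIteration internally, and next() with a default value. The sentinel name DONE is hypothetical and chosen only for illustration.

```python
def finite_sequence():
    nums = [1, 2, 3]
    for num in nums:
        yield num

# A for loop (here inside a comprehension) stops cleanly at exhaustion.
collected = [num for num in finite_sequence()]

# next() accepts a default that is returned instead of raising StopIteration.
DONE = object()  # hypothetical sentinel, for illustration only
finite = finite_sequence()
values = []
while (item := next(finite, DONE)) is not DONE:
    values.append(item)

print(collected)  # [1, 2, 3]
print(values)     # [1, 2, 3]
```

A unique sentinel object is used rather than None so that a generator that legitimately yields None is not mistaken for an exhausted one.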

Examples

Implement my_enumerate

Write your own generator function that works like the built-in function enumerate.

Calling the function like this:

lessons = ["Why Python Programming", "Data Types and Operators", "Control Flow", "Functions", "Scripting"]

for i, lesson in my_enumerate(lessons, 1):
    print("Lesson {}: {}".format(i, lesson))

should output:

Lesson 1: Why Python Programming
Lesson 2: Data Types and Operators
Lesson 3: Control Flow
Lesson 4: Functions
Lesson 5: Scripting

Solution:

lessons = ["Why Python Programming", "Data Types and Operators", "Control Flow", "Functions", "Scripting"]

def my_enumerate(iterable, start=0):
    """Yield (index, element) pairs from the iterable, like the built-in enumerate.

    Args:
        iterable (iterable object): An iterable object.
        start (int): The start index.

    Yields:
        tuple: index, element
    """
    idx = start
    for element in iterable:
        # Each step does O(1) work
        yield idx, element
        idx += 1

for i, lesson in my_enumerate(lessons, 0):
    print("Lesson {}: {}".format(i, lesson))

Lesson 0: Why Python Programming
Lesson 1: Data Types and Operators
Lesson 2: Control Flow
Lesson 3: Functions
Lesson 4: Scripting

for i, lesson in my_enumerate(lessons, 1):
    print("Lesson {}: {}".format(i, lesson))

Lesson 1: Why Python Programming
Lesson 2: Data Types and Operators
Lesson 3: Control Flow
Lesson 4: Functions
Lesson 5: Scripting
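
One quick way to gain confidence in a solution like this is to compare it against the built-in directly. The sketch below restates my_enumerate without its docstring and checks it against enumerate; the shortened lessons list and the letters generator are made up for the example.

```python
def my_enumerate(iterable, start=0):
    idx = start
    for element in iterable:
        yield idx, element
        idx += 1

lessons = ["Control Flow", "Functions", "Scripting"]
matches_builtin = list(my_enumerate(lessons, 1)) == list(enumerate(lessons, 1))

# Because it only iterates, it also works on one-pass iterables
# such as another generator.
letters = (ch for ch in "abc")
pairs = list(my_enumerate(letters))

print(matches_builtin)  # True
print(pairs)            # [(0, 'a'), (1, 'b'), (2, 'c')]
```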

Chunker

If you have an iterable that is too large to fit in memory in full (e.g., when dealing with large files), being able to take and use chunks of it at a time can be very valuable.

Implement a generator function, chunker, that takes in an iterable and yields a chunk of a specified size at a time.

Calling the function like this:

for chunk in chunker(range(25), 4):
    print(list(chunk))

should output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
[20, 21, 22, 23]
[24]

Better solution:

Here we don’t have to worry about stop running past the end of the sequence, since slicing clamps out-of-range bounds:

list(range(5))[:100]
# [0, 1, 2, 3, 4]

def chunker(iterable, size):
    """Yield successive chunks of length size from iterable.

    Args:
        iterable (sequence): A sequence that supports len() and slicing.
        size (int): The required chunk size.

    Yields:
        A slice of the sequence holding at most size elements.
    """
    for start in range(0, len(iterable), size):
        stop = start + size
        yield iterable[start:stop]


for chunk in chunker(range(25), 4):
    print(list(chunk))

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
[20, 21, 22, 23]
[24]

Worse solution:

def chunker(iterable, size):
    """Split the iterable into pieces of the given size.

    Args:
        iterable (sequence): A sequence that supports len() and slicing.
        size (int): The required chunk size.

    Yields:
        A slice of the sequence holding at most size elements.
    """
    start = 0
    while start < len(iterable):
        if start < len(iterable) - size:
            stop = start + size
            yield iterable[start:stop]
            start = stop
        else:
            stop = -1
            yield iterable[start:]
            break


for chunk in chunker(range(25), 4):
    print(list(chunk))

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
[20, 21, 22, 23]
[24]
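
Both solutions above call len() and slice their input, so they only work on sequences such as lists and ranges, not on arbitrary iterables like file objects or other generators. As a hedged alternative, here is a sketch built on itertools.islice that accepts any iterable. The name lazy_chunker is hypothetical, and unlike the versions above it yields lists rather than slices.

```python
from itertools import islice

def lazy_chunker(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most `size` items without needing len() or slicing.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

# Works on a generator, which has no len() and cannot be sliced.
squares = (n * n for n in range(10))
chunks = list(lazy_chunker(squares, 4))
print(chunks)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

Only one chunk is held in memory at a time, which matches the motivation of processing inputs too large to load in full. On Python 3.12 and later, itertools.batched offers similar behavior out of the box.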

Generator Comprehension

A list comprehension builds the whole list in memory:

nums_squared_lc = [num**2 for num in range(5)]
nums_squared_lc
# [0, 1, 4, 9, 16]

A generator comprehension (also called a generator expression) uses parentheses instead of brackets and produces its values lazily:

nums_squared_gc = (num**2 for num in range(5))
nums_squared_gc
# <generator object <genexpr> at 0x1057803c0>
next(nums_squared_gc)
# 0

Compare the memory footprint of the two (exact sizes vary by Python version and platform):

import sys

nums_squared_lc = [num**2 for num in range(100000)]
nums_squared_gc = (num**2 for num in range(100000))

print(sys.getsizeof(nums_squared_lc))
# 800984
print(sys.getsizeof(nums_squared_gc))
# 112
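
To see how the two sizes scale, the sketch below measures both forms at two input sizes. The helper name container_sizes is hypothetical. Note that sys.getsizeof reports only the size of the container object itself; for a list that is the array of references, not the integers it points to, so the real gap is larger still.

```python
import sys

def container_sizes(n):
    """Return (list size, generator size) in bytes for n squared numbers."""
    as_list = [num**2 for num in range(n)]
    as_gen = (num**2 for num in range(n))
    return sys.getsizeof(as_list), sys.getsizeof(as_gen)

small_list, small_gen = container_sizes(10)
large_list, large_gen = container_sizes(100_000)

# The list grows with n; the generator object stays the same size.
print("list:     ", small_list, "->", large_list)
print("generator:", small_gen, "->", large_gen)
```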

The Python Profilers

Once you’ve learned the difference in syntax, you’ll compare the memory footprint of both, and profile their performance using cProfile.

import cProfile

cProfile.run('sum([i**2 for i in range(100000)])')

         5 function calls in 0.043 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.041    0.041    0.041    0.041 <string>:1(<listcomp>)
        1    0.001    0.001    0.043    0.043 <string>:1(<module>)
        1    0.000    0.000    0.043    0.043 {built-in method builtins.exec}
        1    0.001    0.001    0.001    0.001 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

cProfile.run('sum((i**2 for i in range(100000)))')

         100005 function calls in 0.048 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.038    0.000    0.038    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.048    0.048 <string>:1(<module>)
        1    0.000    0.000    0.048    0.048 {built-in method builtins.exec}
        1    0.009    0.009    0.047    0.047 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
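
cProfile records every function call, and each resumption of the generator expression is counted as one (note the 100001 calls to <genexpr> above), so profiling overhead may inflate the generator's share of the time. As a rough cross-check, here is a sketch using timeit, which measures wall-clock time without per-call instrumentation. The constant N and the repeat count of 5 are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

N = 100_000

# Both forms must produce the same result.
list_total = sum([i**2 for i in range(N)])
gen_total = sum(i**2 for i in range(N))

# Time five runs of each form.
list_time = timeit.timeit(lambda: sum([i**2 for i in range(N)]), number=5)
gen_time = timeit.timeit(lambda: sum(i**2 for i in range(N)), number=5)

print(f"list comprehension:   {list_time:.3f}s")
print(f"generator expression: {gen_time:.3f}s")
```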

Conclusion

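  • Generators are iterators: iterable objects that produce their values one at a time.
  • They keep state between calls, meaning they can remember a place in a sequence without holding the entire sequence in memory.
  • They save memory, but iterating over them can be slower than iterating over a list (0.048 versus 0.043 seconds in the profile above), so there is a tradeoff.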
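
One practical consequence of keeping state rather than data is that a generator can be consumed only once. A small sketch to illustrate (the variable names are made up for the example):

```python
squares = (n * n for n in range(4))

# A generator is its own iterator, so there is nothing to rewind.
is_own_iterator = iter(squares) is squares

first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # nothing left to yield

print(is_own_iterator)  # True
print(first_pass)       # [0, 1, 4, 9]
print(second_pass)      # []
```

If the values are needed more than once, either recreate the generator or store the results in a list, at the cost of the memory the generator was saving.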

Using Advanced Generator Methods

  • .send()
  • .throw()
  • .close()

In this lesson, you’ll learn about the advanced generator methods .send(), .throw(), and .close(). To practice with them, you’re going to build a program that makes use of all three.

As you follow along, you’ll learn that yield is an expression rather than a statement. You can use it like a statement, but it also evaluates to a value: whatever the caller passes in with .send() becomes the result of the yield expression inside the generator. You’ll also handle exceptions with .throw() and stop the generator after a given number of digits with .close().
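
Before the palindrome program, here is a smaller sketch of yield used as an expression. The generator running_average and the variable averager are hypothetical names chosen for this illustration; the generator receives numbers through .send() and yields the average so far.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # yield hands `average` to the caller, then evaluates to
        # whatever the caller passes to .send().
        value = yield average
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)              # prime the generator: run to the first yield
first = averager.send(10)
second = averager.send(20)
third = averager.send(30)

print(first, second, third)  # 10.0 15.0 20.0
```

The initial next() call is required because a value can only be sent to a generator that is already paused at a yield.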


def is_palindrome(num):
    """Return True if num is a palindrome with at least two digits."""
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0
    # Build the reversed number digit by digit
    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    return num == reversed_num

is_palindrome(12345)
# False
is_palindrome(12321)
# True

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    pal_gen.send(10 ** (digits))

11
111
1111
10101
101101
1001001
10011001
100010001
1000110001
10000100001
100001100001
1000001000001

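  1. The first palindrome found is 11. The loop then calls pal_gen.send(10 ** len(str(11))), which sends 100 into the generator as the value of i = (yield num).
  2. Now i = 100, so num jumps to 100. The send() call itself resumes the generator and returns the next palindrome it yields, 101. Because the loop ignores that return value, the next iteration of the for loop prints the following palindrome, 111.
  3. The loop then sends 10 ** len(str(111)), which is 1000, and the pattern repeats.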
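
The for loop above discards the value that each send() call returns. As a hedged variation, the sketch below drives the same generator with a while loop and keeps that return value, so no palindrome is skipped. The list found and the limit of five results are choices made for this example.

```python
def is_palindrome(num):
    if num // 10 == 0:
        return False
    temp, reversed_num = num, 0
    while temp != 0:
        reversed_num = reversed_num * 10 + temp % 10
        temp //= 10
    return num == reversed_num

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = yield num
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
found = []
pal = next(pal_gen)  # first palindrome: 11
while len(found) < 5:
    found.append(pal)
    digits = len(str(pal))
    # send() resumes the generator and returns the next yielded value.
    pal = pal_gen.send(10 ** digits)

print(found)  # [11, 101, 1001, 10001, 100001]
```

Compared with the original output (11, 111, 1111, ...), this version reports the first palindrome of each digit length, because the value returned by send() is used instead of thrown away.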

throw

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.throw(ValueError("We don't like large palindromes"))
    pal_gen.send(10 ** (digits))

11
111
1111
10101
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Desktop/Code/Python/Exercise.py in
   5989     digits = len(str(i))
   5990     if digits == 5:
-> 5991         pal_gen.throw(ValueError("We don't like large palindromes"))
   5992     pal_gen.send(10 ** (digits))

~/Desktop/Code/Python/Exercise.py in infinite_palindromes()
   5950     while True:
   5951         if is_palindrome(num):
-> 5952             i = (yield num)
   5953             if i is not None:
   5954                 num = i

ValueError: We don't like large palindromes
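
The traceback shows that the exception is raised inside the generator, at the paused yield. That means the generator itself can catch it and keep going. A minimal sketch of that idea follows; the generator resilient_counter is hypothetical and unrelated to the palindrome code.

```python
def resilient_counter():
    num = 0
    while True:
        try:
            yield num
        except ValueError as err:
            # The exception passed to .throw() surfaces at the paused yield.
            print(f"recovered from: {err}")
            num = 0
        else:
            num += 1

counter = resilient_counter()
a = next(counter)
b = next(counter)
# throw() returns the next value the generator yields after handling the error.
c = counter.throw(ValueError("reset requested"))
d = next(counter)

print(a, b, c, d)  # 0 1 0 1
```

If the generator does not catch the exception, as in the palindrome example, it propagates to the caller and the generator is finished.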

close

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.close()
    pal_gen.send(10 ** (digits))

11
111
1111
10101
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
~/Desktop/Code/Python/Exercise.py in
   6015     if digits == 5:
   6016         pal_gen.close()
-> 6017     pal_gen.send(10 ** (digits))

StopIteration:
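
The StopIteration above comes from calling send() on a generator that has already been closed. The sketch below, using a hypothetical countdown generator, shows what close() does inside the generator (it raises GeneratorExit at the paused yield, so a finally block runs) and how a closed generator behaves afterwards. The cleanup_log list exists only to make the cleanup observable.

```python
cleanup_log = []

def countdown(start):
    try:
        while start > 0:
            yield start
            start -= 1
    finally:
        # Runs when the generator finishes or is closed.
        cleanup_log.append("generator closed")

gen = countdown(3)
first = next(gen)
gen.close()  # raises GeneratorExit at the paused yield

# A closed generator is simply exhausted.
after_close = next(gen, "exhausted")

# send() on a closed generator raises StopIteration, as in the traceback above.
try:
    gen.send(1)
    send_raised = False
except StopIteration:
    send_raised = True

print(first, after_close, send_raised, cleanup_log)
```

To avoid the error in the palindrome loop, one option is to break out of the for loop right after calling close() instead of continuing on to send().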

Creating Data Pipelines With Generators

In this lesson, you’ll learn how to use generator expressions to build a data pipeline. Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine’s memory.

For this example, you’ll use a CSV file that is pulled from the TechCrunch Continental USA dataset, which describes funding rounds and dollar amounts for various startups based in the USA. Click the link under Supporting Material to download the dataset included with the sample code for this course.

TechCrunch

import os

path = "/Users/zacks/Desktop/Code/datasets/"
file_name = "techcrunch.csv"
file_path = os.path.join(path, file_name)

# Read the file lazily, one line at a time
lines = (line for line in open(file_path))
# Split each line into a list of column values
list_line = (s.rstrip().split(",") for s in lines)
# The first line of a CSV file holds the column names
cols = next(list_line)
# The remaining rows exclude the header, since next(list_line) above consumed it
company_dicts = (dict(zip(cols, data)) for data in list_line)

# Select raisedAmt for rows where round == 'a'
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)
# Sum raisedAmt over all series A rounds
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")

Total series A fundraising: $4376015000
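
The pipeline above splits lines on commas by hand and never closes the file. As a hedged variation, the sketch below builds the same kind of lazy pipeline with the standard csv module, which also handles quoted fields that contain commas, and uses a with block to close its input. The function name series_a_total and the rows in SAMPLE_CSV are hypothetical; the sample data is invented for illustration and is not taken from the TechCrunch dataset. Only the column names round and raisedAmt come from the code above.

```python
import csv
import io

# Hypothetical sample rows, invented for this example.
SAMPLE_CSV = """company,round,raisedAmt
Alpha,a,1000000
Beta,b,2500000
"Gamma, Inc.",a,500000
"""

def series_a_total(file_obj):
    """Sum raisedAmt over rows whose round is 'a', reading lazily."""
    rows = csv.DictReader(file_obj)  # yields one dict per row on demand
    amounts = (int(row["raisedAmt"]) for row in rows if row["round"] == "a")
    return sum(amounts)

# io.StringIO stands in for a real file; open(file_path, newline="")
# would be used the same way.
with io.StringIO(SAMPLE_CSV) as handle:
    total = series_a_total(handle)

print(f"Total series A fundraising: ${total}")  # Total series A fundraising: $1500000
```

Each stage still pulls one row at a time from the stage before it, so memory use stays flat regardless of file size.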