Have you ever had to work with a dataset so large that it overwhelmed your machine’s memory? Or maybe you have a complex function that needs to maintain an internal state every time it’s called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.
By the end of this course, you’ll know:
What generators are and how to use them
How to create generator functions and expressions
How the Python yield statement works
How to use multiple Python yield statements in a generator function
How to use advanced generator methods
How to build data pipelines with multiple generators
Understanding Generators
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
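To see the laziness in action, here is a minimal sketch that pulls values from the generator one at a time with next() and then takes a bounded slice with itertools.islice. The function is repeated so the snippet runs on its own; the variable names gen, first, second, and next_five are only illustrative.

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


gen = infinite_sequence()
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield, increments num, yields again
print(first, second)  # 0 1

# islice takes a fixed number of items from an otherwise endless generator
next_five = list(islice(gen, 5))
print(next_five)  # [2, 3, 4, 5, 6]
```

Nothing is computed until a value is requested, which is why an infinite loop inside the function does not hang the program.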
If a function contains multiple yield statements, each call to next() runs the generator up to the next yield and returns that yield's value.
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
        yield "This is the second yield statement!"

infinite = infinite_sequence()
next(infinite)  # 0
next(infinite)  # 'This is the second yield statement!'
next(infinite)  # 1
next(infinite)  # 'This is the second yield statement!'
A yield can also sit inside a for loop.
def finite_sequence():
    nums = [1, 2, 3]
    for num in nums:
        yield num
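Unlike the infinite version, a finite generator eventually runs out of values. The sketch below shows what happens at that point: a further next() call raises StopIteration, while a for loop handles that signal for you. The names gen, values, exhausted, and collected are illustrative.

```python
def finite_sequence():
    nums = [1, 2, 3]
    for num in nums:
        yield num


gen = finite_sequence()
values = [next(gen), next(gen), next(gen)]
print(values)  # [1, 2, 3]

# A fourth call finds nothing left to yield and raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# A for loop catches StopIteration internally and simply ends.
collected = []
for n in finite_sequence():
    collected.append(n)
print(collected)  # [1, 2, 3]
```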
def my_enumerate(iterable, start=0):
    """my_enumerate yields the index and the element of the iterable.

    Args:
        iterable (iterable object): An iterable object.
        start (int): The start index.

    Yields:
        tuple: index, element
    """
    idx = start
    for element in iterable:
        # O(1) work per element
        yield idx, element
        idx += 1
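# Lesson titles taken from the output shown below
lessons = ["Why Python Programming", "Data Types and Operators", "Control Flow", "Functions", "Scripting"]
for i, lesson in my_enumerate(lessons, 0):
    print("Lesson {}: {}".format(i, lesson))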
Lesson 0: Why Python Programming
Lesson 1: Data Types and Operators
Lesson 2: Control Flow
Lesson 3: Functions
Lesson 4: Scripting
for i, lesson in my_enumerate(lessons, 1):
    print("Lesson {}: {}".format(i, lesson))
Lesson 1: Why Python Programming
Lesson 2: Data Types and Operators
Lesson 3: Control Flow
Lesson 4: Functions
Lesson 5: Scripting
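As a quick sanity check, my_enumerate should produce the same pairs as Python's built-in enumerate. The sketch below repeats a docstring-free copy of the function so it runs on its own; the lessons list is reconstructed from the printed output above, and the names same_from_zero and same_from_one are illustrative.

```python
def my_enumerate(iterable, start=0):
    idx = start
    for element in iterable:
        yield idx, element
        idx += 1


lessons = [
    "Why Python Programming",
    "Data Types and Operators",
    "Control Flow",
    "Functions",
    "Scripting",
]

# Compare against the built-in for both start values used above
same_from_zero = list(my_enumerate(lessons)) == list(enumerate(lessons))
same_from_one = list(my_enumerate(lessons, 1)) == list(enumerate(lessons, start=1))
print(same_from_zero, same_from_one)  # True True
```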
Chunker
If you have an iterable that is too large to fit in memory in full (e.g., when dealing with large files), being able to take and use chunks of it at a time can be very valuable.
Implement a generator function, chunker, that takes in an iterable and yields a chunk of a specified size at a time.
Calling the function like this:
for chunk in chunker(range(25), 4):
    print(list(chunk))
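One possible solution, offered as a sketch rather than the reference answer, uses itertools.islice so that it works on any iterable, including ones that cannot be sliced or measured with len(). The parameter name size is an assumption; the exercise only says "a specified size".

```python
from itertools import islice


def chunker(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))  # pull at most `size` items
        if not chunk:                   # the source is exhausted
            return
        yield chunk


chunks = []
for chunk in chunker(range(25), 4):
    chunks.append(list(chunk))
    print(list(chunk))
# First line printed: [0, 1, 2, 3]
# Last line printed:  [24]
```

Only one chunk is held in memory at a time, and the final chunk is shorter when the length is not a multiple of the chunk size.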
Generators keep state between calls, meaning they can remember a place in a sequence without holding the entire sequence in memory.
They save memory, but iterating over a generator can be slower than iterating over a list that is already in memory, so there is a tradeoff.
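The memory side of that tradeoff can be checked with sys.getsizeof. This is a rough sketch: getsizeof reports only the size of the container object itself, the numbers depend on the Python implementation (CPython is assumed here), and n is an arbitrary illustrative size.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # every element built up front
squares_gen = (i * i for i in range(n))    # elements produced on demand

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)
print(list_bytes > gen_bytes)  # True

# Both give the same result when consumed
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)  # True
```

To measure the speed side on your own machine, the standard library's timeit module can time each version.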
Using Advanced Generator Methods
.send()
.throw()
.close()
In this lesson, you'll learn about the advanced generator methods .send(), .throw(), and .close(). To practice with these new methods, you're going to build a program that makes use of all three.
As you follow along in the lesson, you'll learn that yield is an expression rather than a statement. You can still use it like a statement, but because it is an expression it also evaluates to a value: whatever the caller passes in with .send(). You'll also handle exceptions with .throw() and stop the generator after a given number of digits with .close().
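Before the palindrome program, a smaller sketch may help show yield acting as an expression. The generator running_total and the names acc, start, after_5, and after_15 are hypothetical and not part of the lesson's code.

```python
def running_total():
    total = 0
    while True:
        # yield hands `total` to the caller, then evaluates to whatever
        # the caller passes in with .send() (or None for a plain next()).
        received = yield total
        if received is not None:
            total += received


acc = running_total()
start = next(acc)         # run to the first yield; must happen before .send()
after_5 = acc.send(5)     # resumes with received = 5
after_15 = acc.send(10)   # resumes with received = 10
print(start, after_5, after_15)  # 0 5 15
```

Note that .send() both delivers a value into the generator and returns the next value the generator yields.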
# Check whether a number is a palindrome
def is_palindrome(num):
    # Skip single-digit inputs
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10

    if num == reversed_num:
        return True
    else:
        return False

is_palindrome(12345)  # False
is_palindrome(12321)  # True
def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    pal_gen.send(10 ** (digits))
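The for loop receives the first palindrome, 11, and pal_gen sends 10 ** len(str(11)), which is 100, back into the generator, where it becomes the value of i = (yield num).
Now i = 100, so num is reset to 100 and the search resumes from there. The .send() call itself returns the next palindrome the generator finds, 101, which this loop discards, so the next palindrome the for loop prints is 111.
pal_gen then sends 10 ** len(str(111)), which is 1000, to i = (yield num), and the pattern repeats.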
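Stepping through the generator by hand makes this sequence visible. In the sketch below, is_palindrome is a compact string-based stand-in for the arithmetic version (an assumption made to keep the snippet short; it gives the same answers for non-negative integers), and the variable names are illustrative.

```python
def is_palindrome(num):
    # Compact stand-in: multi-digit and reads the same reversed
    return num >= 10 and str(num) == str(num)[::-1]


def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1


pal_gen = infinite_palindromes()
first = next(pal_gen)             # what the for loop sees first
returned = pal_gen.send(100)      # value returned by .send(), discarded in the loop
second = next(pal_gen)            # what the for loop sees next
print(first, returned, second)    # 11 101 111

returned_2 = pal_gen.send(1000)
third = next(pal_gen)
print(returned_2, third)          # 1001 1111
```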
throw
def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.throw(ValueError("We don't like large palindromes"))
    pal_gen.send(10 ** (digits))
11
111
1111
10101
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Desktop/Code/Python/Exercise.py in
   5989     digits = len(str(i))
   5990     if digits == 5:
---> 5991         pal_gen.throw(ValueError("We don't like large palindromes"))
   5992     pal_gen.send(10 ** (digits))

~/Desktop/Code/Python/Exercise.py in infinite_palindromes()
   5950     while True:
   5951         if is_palindrome(num):
----> 5952             i = (yield num)
   5953             if i is not None:
   5954                 num = i

ValueError: We don't like large palindromes
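The traceback shows the exception being raised at the paused yield and, because the generator does not catch it, travelling back to the caller. A minimal sketch of handling it at the call site follows; is_palindrome is again a compact string-based stand-in, and printed, message, and finished are illustrative names.

```python
def is_palindrome(num):
    # Compact stand-in: multi-digit and reads the same reversed
    return num >= 10 and str(num) == str(num)[::-1]


def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1


pal_gen = infinite_palindromes()
printed = []
message = None
try:
    for i in pal_gen:
        printed.append(i)
        digits = len(str(i))
        if digits == 5:
            pal_gen.throw(ValueError("We don't like large palindromes"))
        pal_gen.send(10 ** digits)
except ValueError as err:
    message = str(err)

print(printed)  # [11, 111, 1111, 10101]
print(message)  # We don't like large palindromes

# An uncaught exception ends the generator, so nothing more can be pulled.
finished = next(pal_gen, None) is None
print(finished)  # True
```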
close
def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.close()
    pal_gen.send(10 ** (digits))
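With .close(), the generator is shut down quietly, but the .send() call that follows it in the loop then targets a finished generator and raises StopIteration. The sketch below captures that behavior; is_palindrome is a compact string-based stand-in, and printed and stopped are illustrative names.

```python
def is_palindrome(num):
    # Compact stand-in: multi-digit and reads the same reversed
    return num >= 10 and str(num) == str(num)[::-1]


def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1


pal_gen = infinite_palindromes()
printed = []
stopped = False
try:
    for i in pal_gen:
        printed.append(i)
        digits = len(str(i))
        if digits == 5:
            pal_gen.close()         # ends the generator at its paused yield
        pal_gen.send(10 ** digits)  # raises StopIteration once the generator is closed
except StopIteration:
    stopped = True

print(printed)  # [11, 111, 1111, 10101]
print(stopped)  # True
```

If you would rather exit without an exception, adding a break right after pal_gen.close() is one option.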
In this lesson, you’ll learn how to use generator expressions to build a data pipeline. Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine’s memory.
For this example, you’ll use a CSV file that is pulled from the TechCrunch Continental USA dataset, which describes funding rounds and dollar amounts for various startups based in the USA. Click the link under Supporting Material to download the dataset included with the sample code for this course.
# Initialize lines as a generator (file_path is the path to the downloaded CSV)
lines = (line for line in open(file_path))
# Parse each line into a list of fields
list_line = (s.rstrip().split(",") for s in lines)
# The first line of a CSV file holds the column names
cols = next(list_line)
# Iterate over the remaining rows; the header is excluded because
# cols = next(list_line) above has already consumed it
company_dicts = (dict(zip(cols, data)) for data in list_line)

# Select the raisedAmt for each company whose round == 'a'
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)
# Sum the total raisedAmt for all companies with round == 'a'
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")
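To try the pipeline without downloading anything, the same stages can be run over a small in-memory file. Everything in sample_csv below is made-up illustrative data with a reduced set of columns, not rows from the real TechCrunch dataset; io.StringIO stands in for the opened file.

```python
import io

# Hypothetical sample rows; only the columns the pipeline reads are included.
sample_csv = io.StringIO(
    "company,raisedAmt,round\n"
    "ExampleCo,1000000,a\n"
    "ExampleCo,5000000,b\n"
    "SampleInc,2500000,a\n"
)

lines = (line for line in sample_csv)                  # stage 1: raw lines
list_line = (s.rstrip().split(",") for s in lines)     # stage 2: split fields
cols = next(list_line)                                 # consume the header row
company_dicts = (dict(zip(cols, data)) for data in list_line)  # stage 3: dicts
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)                                                      # stage 4: filter and convert

# Nothing has been read past the header until sum() pulls values through.
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")
# Total series A fundraising: $3500000
```

Splitting on "," is fine for simple files like this one; for fields that may contain quoted commas, the standard library's csv module is a safer parser.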