For Loops and Iterators

From CS 61A Wiki

For loop - advanced version

This guide assumes that you are already familiar with the for loop and have a general idea of how a for loop works.

So far, you have seen for loops over many data structures:

odds = range(1, 10, 2)
for num in odds:  # odds is a range object.
    print(num)
 
vowels = 'aeiou'
for vowel in vowels:  # vowels is a string object.
    print(vowel)
 
glookup = { 'hw' : 2, 'proj1' : 17, 'proj2' : 12, 'test1' : 50 }
for key in glookup:  # glookup is a dictionary object.
    print('{0} has a maximum score of {1}'.format(key, glookup[key]))

A key principle in program design is to use interfaces to simplify code. (Make sure you know what an interface is before proceeding.) In the case of for loops, there must be some interface that all of these different objects implement and that the for loop relies on. You might guess that it is the sequence interface, but the last example shows that a for loop can run over a dictionary, and dictionaries are not sequences. The interface that for loops rely on is built around the Iterator interface, together with the closely related Iterable interface described later in this guide.
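To see the shared interface concretely, the short check below uses only built-ins on the three objects from the loops above. It peeks ahead at the __iter__ method (explained in the following sections): each object has one, but only the range and the string can be indexed by position like a sequence.

```python
# The three objects from the loops above.
odds = range(1, 10, 2)
vowels = 'aeiou'
glookup = {'hw': 2, 'proj1': 17, 'proj2': 12, 'test1': 50}

# Each one provides an __iter__ method, which is what a for loop looks for.
for obj in (odds, vowels, glookup):
    print(type(obj).__name__, hasattr(obj, '__iter__'))

# A range and a string are sequences, so they can be indexed by position.
print(odds[0], vowels[0])  # 1 a

# A dictionary is not a sequence: glookup[0] looks up the *key* 0, which is absent.
try:
    glookup[0]
except KeyError:
    print('glookup[0] raises KeyError')
```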

Iterators

Conceptually, an iterator is an object that lets you step through some data: you can keep asking the iterator for another value until it runs out of values. Why would we want this? Don't lists already let us do the same thing? The key point is that iterators can be far more memory-efficient. An iterator only needs to store enough data to compute the next value, whereas a list has to store all of its values at the same time, which uses up the computer's memory. For example, if we wanted to iterate through the numbers from 1 to a billion with a list, we would need a billion slots in memory to hold all of those numbers. With an iterator, on the other hand, we essentially only need to store one number: the number we are currently considering.
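A rough way to see the memory difference is with sys.getsizeof. This sketch uses one million numbers instead of a billion, and uses Python's built-in iterator over a range (obtained with iter(), covered below) as the stand-in iterator. Exact byte counts vary by Python version and platform, so no specific numbers are claimed here.

```python
import sys

n = 1_000_000

# A list holds a reference to every one of its n values at once.
numbers_list = list(range(1, n + 1))

# An iterator over a range only remembers its current position and where to stop.
numbers_iter = iter(range(1, n + 1))

# sys.getsizeof is shallow: for the list it counts the array of references,
# not the int objects themselves, so the list's real cost is even higher.
list_bytes = sys.getsizeof(numbers_list)
iter_bytes = sys.getsizeof(numbers_iter)
print(list_bytes, iter_bytes)
```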

To implement this idea, an iterator needs a method that returns the next item, or raises a StopIteration exception to signal that there are no more items. This is the __next__ magic method. However, it turns out that an iterator also needs an __iter__ method; more on that soon.
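The same protocol is visible on a built-in list. In this sketch, iter() (whose role is explained below) gets an iterator from the list, and next() calls that iterator's __next__ method until StopIteration is raised.

```python
numbers = [10, 20]
it = iter(numbers)        # iter() calls numbers.__iter__() and returns an iterator
first = next(it)          # next(it) calls it.__next__()
second = it.__next__()    # calling the magic method directly does the same thing
print(first, second)      # 10 20

# With no values left, __next__ raises StopIteration instead of returning.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)          # True
```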

Let's look at what happens if we create a class that only has a __next__ magic method and not the __iter__ magic method. We will try to define the RangeIterator class, which will eventually be an Iterator that duplicates (some of) the functionality of the built-in range object:

class RangeIterator:
 
    def __init__(self, start, end):
        self.val = start
        self.end = end
 
    def __next__(self):
        if self.val >= self.end:
            raise StopIteration
        value_to_return = self.val
        self.val += 1
        return value_to_return
 
>>> teens = RangeIterator(13, 20)
>>> teens.__next__()
13
>>> next(teens)  # Equivalent to teens.__next__()
14
>>> for i in range(5):  # Repeat 5 times
...     print(next(teens))
... 
15
16
17
18
19
>>> next(teens)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __next__
StopIteration

Okay, so far this seems to be working.

But if we try a for loop, things break:

>>> for i in RangeIterator(0, 10):
...     print(i)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'RangeIterator' object is not iterable

Okay, that was confusing. To understand that, let's take a detour and talk about for loops and what they actually do:

How a For Loop uses Iterators

In general a for loop looks something like this:

for elem in iterable_thing:
    do_stuff_with_elem

Internally, Python does roughly the following:

try:
    iterator = iterable_thing.__iter__()  # Get an iterator!
except AttributeError:
    raise TypeError('not iterable')
while True:
    try:
        elem = iterator.__next__()  # Ask the iterator for its next value
    except StopIteration:
        break  # No values left, so the loop ends
    do_stuff_with_elem

Notice the line that gets an iterator. This is a design decision Python made: Python wants to let you use a for loop both on iterators themselves and on other objects that you can iterate over. So in the loop above, iterable_thing could be an iterator, or something that can give you an iterator (through its __iter__ method).
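The steps above can be turned into an ordinary Python function. This is a simplified sketch, not how CPython actually implements for loops; the names for_loop and body are hypothetical, with body standing in for do_stuff_with_elem. Because it only calls __iter__ and __next__, it works on strings and dictionaries alike, and it rejects objects without __iter__ with the same kind of error as a real for loop.

```python
def for_loop(iterable_thing, body):
    """Call body(elem) for each elem of iterable_thing, roughly as a for loop would.

    A simplified sketch: the real for loop is built into the interpreter
    and handles extra cases that this function ignores.
    """
    # Step 1: get an iterator from the iterable.
    try:
        get_iterator = iterable_thing.__iter__
    except AttributeError:
        raise TypeError(f"'{type(iterable_thing).__name__}' object is not iterable") from None
    iterator = get_iterator()

    # Step 2: keep asking the iterator for values until it raises StopIteration.
    while True:
        try:
            elem = iterator.__next__()
        except StopIteration:
            return
        body(elem)  # stands in for the body of the loop


# Usage: these behave like the for loops at the top of this guide.
vowels_seen = []
for_loop('aeiou', vowels_seen.append)
print(vowels_seen)  # ['a', 'e', 'i', 'o', 'u']

glookup = {'hw': 2, 'proj1': 17}
for_loop(glookup, lambda key: print(key, glookup[key]))

try:
    for_loop(5, print)
except TypeError as e:
    print(e)  # 'int' object is not iterable
```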

So now we can understand why our RangeIterator example didn't work. The example we had was:

>>> for i in RangeIterator(0, 10):
...     print(i)
...

If we look at what a for loop does, we see that it first calls the __iter__ method, which means that it tries to evaluate RangeIterator(0, 10).__iter__(). However, RangeIterator does not have an __iter__ method! That's why we get the "not iterable" error message.
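The built-in iter() function performs the same first step as a for loop (asking the object for its __iter__ method), so it fails in the same way on the first version of RangeIterator. The sketch below repeats that class so it runs on its own; next() still works on it because __next__ exists.

```python
class RangeIterator:  # the first version: __next__ only, no __iter__
    def __init__(self, start, end):
        self.val = start
        self.end = end

    def __next__(self):
        if self.val >= self.end:
            raise StopIteration
        value_to_return = self.val
        self.val += 1
        return value_to_return


r = RangeIterator(0, 10)
print(hasattr(r, '__iter__'))  # False
print(hasattr(r, '__next__'))  # True

# iter() does the same first step as a for loop, and fails the same way.
try:
    iter(r)
except TypeError as e:
    print(e)  # 'RangeIterator' object is not iterable
```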

How can we fix this? Well, an easy fix would be to give RangeIterators an __iter__ method. What should that __iter__ method do? It is supposed to return an iterator, so that the for loop can call __next__ on the iterator to get the various values. However, the RangeIterator is already an iterator, so we can just return the RangeIterator itself!

class RangeIterator:
 
    def __init__(self, start, end):
        self.val = start
        self.end = end
 
    def __iter__(self):
        return self  # self is a RangeIterator object, which already has a __next__ method
 
    def __next__(self):
        if self.val >= self.end:
            raise StopIteration
        value_to_return = self.val
        self.val += 1
        return value_to_return
 
>>> for i in RangeIterator(2, 6):
...     print(i)
... 
2
3
4
5
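With __iter__ in place, other built-ins that consume iterables, such as list() and sum(), also accept RangeIterator objects, since they too start by asking for an iterator. A quick check, repeating the fixed class so the snippet is self-contained:

```python
class RangeIterator:
    def __init__(self, start, end):
        self.val = start
        self.end = end

    def __iter__(self):
        return self  # an iterator serves as its own iterator

    def __next__(self):
        if self.val >= self.end:
            raise StopIteration
        value_to_return = self.val
        self.val += 1
        return value_to_return


r = RangeIterator(2, 6)
print(iter(r) is r)               # True: __iter__ hands back the same object
print(list(RangeIterator(2, 6)))  # [2, 3, 4, 5]
print(sum(RangeIterator(1, 11)))  # 55
```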

In summary - because of how for loops work, an iterator must have both an __iter__ and a __next__ method.
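Python's standard library encodes the same rule: collections.abc.Iterator treats a class as an iterator only if it defines both __iter__ and __next__. The two classes below, NextOnly and NextAndIter, are hypothetical minimal examples.

```python
from collections.abc import Iterator


class NextOnly:  # hypothetical: has __next__ but no __iter__
    def __next__(self):
        raise StopIteration  # always empty; only the methods matter here


class NextAndIter(NextOnly):  # hypothetical: adds __iter__ returning self
    def __iter__(self):
        return self


print(isinstance(NextOnly(), Iterator))     # False
print(isinstance(NextAndIter(), Iterator))  # True
```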

Iterable

Conceptually, an iterable object is something that we can put in a for loop. If we look at what a for loop does with iterable_thing, we notice that all it does is call the __iter__ method. So, an object implements the Iterable interface if it has an __iter__ method. (It may or may not have a __next__ method.)
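The Iterable and Iterator abstract classes from collections.abc make this distinction easy to check on built-in types. One caveat: isinstance(obj, Iterable) only detects __iter__ (or explicit registration), so the Python documentation recommends calling iter(obj) as the reliable test of whether something can be iterated. The examples dictionary here is just an illustrative collection.

```python
from collections.abc import Iterable, Iterator

examples = {
    'list': [1, 2, 3],
    'str': 'aeiou',
    'dict': {'hw': 2},
    'range': range(3),
    'list iterator': iter([1, 2, 3]),
}

# Every entry is iterable (has __iter__); only the last also has __next__.
for name, obj in examples.items():
    print(f'{name:15} iterable={isinstance(obj, Iterable)} iterator={isinstance(obj, Iterator)}')
```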

Check Your Knowledge

  • If something is an iterator, must it also be an iterable?
  • If something is an iterable, must it also be an iterator?
  • Why do iterators have to have an __iter__ method? Shouldn't they only need a __next__ method?
  • Why do we bother using iterators? Couldn't we just use lists all the time? They are easier to use than iterators, after all...
  • You have the following interaction with the Python interpreter:
>>> for i in Foo():
...     print(i)
... 
TypeError: 'Foo' object is not iterable
>>> for i in Bar():
...     print(i)
...
>>> for i in Baz():
...     print(i)
...
hello
world
>>> next(Garply())
0
>>> next(Garply())
0
>>> next(iter(Garply()))
0
    • For each of Foo, Bar, Baz, and Garply, answer the following questions:
      • Is it guaranteed to have an __iter__ method? Is it guaranteed to not have an __iter__ method?
      • Same thing for the __next__ method.
      • Is it guaranteed to be Iterable?
      • Is it guaranteed to be an Iterator?
  • [Midterm Level - Average] What Would Python Output? (Assume that the RangeIterator class has already been defined, as shown in the previous section.)
>>> rng = RangeIterator(2, 5)
>>> for i in rng:
...     for j in rng:
...         print(i, j)
...
  • [Midterm Level - Tricky] What Would Python Output?
class RangeIterator:
    def __init__(self, start, end):
        self.start = start
        self.val = start
        self.end = end
    def __iter__(self):
        self.val = self.start  # Reset to the beginning
        return self
    def __next__(self):
        if self.val >= self.end:
            raise StopIteration
        value_to_return = self.val
        self.val += 1
        return value_to_return
 
class Range:
    def __init__(self, start, end):
        self.start = start
        self.end = end
    def __iter__(self):
        return RangeIterator(self.start, self.end)
 
>>> rng = RangeIterator(2, 4)
>>> for i in rng:
...     for j in rng:
...         print(i, j)
... 
>>> rng = Range(2, 4)
>>> for i in rng:
...     for j in rng:
...         print(i, j)
...