Advanced Python: Iterators - czhiming's Blog

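In Python, every container object is an iterable, and outside code walks through its elements one at a time using the iterator that the iterable provides.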

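Every iterator implements the two key methods defined by the collections.abc.Iterator base class: __iter__() returns the iterator itself, and __next__() returns the next element on each call and raises StopIteration once the elements are exhausted.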

class Iterator(Iterable):

    __slots__ = ()

    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self

    @classmethod
    def __subclasshook__(cls, C): 
        if cls is Iterator:
            # Check whether class C implements the listed methods;
            # if it does, issubclass(C, Iterator) returns True.
            return _check_methods(C, '__iter__', '__next__') 
        return NotImplemented
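To see what the __subclasshook__ above means in practice, here is a minimal sketch. Countdown and OnlyIter are hypothetical classes invented for this illustration. Neither inherits from collections.abc.Iterator, yet issubclass() and isinstance() still recognise the one that defines both __iter__ and __next__.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class OnlyIter:
    """Hypothetical class that defines __iter__ but not __next__."""

    def __iter__(self):
        return iter(())


# Countdown never subclasses Iterator, but the hook finds both methods.
is_iterator = issubclass(Countdown, Iterator)
is_instance = isinstance(Countdown(3), Iterator)

# OnlyIter qualifies as an Iterable, but not as an Iterator.
only_iter_is_iterable = issubclass(OnlyIter, Iterable)
only_iter_is_iterator = issubclass(OnlyIter, Iterator)

countdown_values = list(Countdown(3))
```

The check is purely structural: it looks for the method names and does not verify that they behave correctly.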

Below is a classic iterator implementation that walks through the words of an English sentence one at a time.

import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self): 
        return SentenceIterator(self.words) 

class SentenceIterator:

    def __init__(self, words):
        self.words = words 
        self.index = 0 

    def __next__(self):
        try:
            word = self.words[self.index] 
        except IndexError:
            raise StopIteration() 
        self.index += 1 
        return word  

    def __iter__(self):
        return self

s = Sentence('"The time has come," the Walrus said')
for word in s:
    print(word)
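The for loop above hides the calls to __iter__() and __next__(). As a rough sketch of what the loop does behind the scenes, the hypothetical helper iterate_manually below drives any iterable by hand with the built-in iter() and next() functions. A plain list stands in for the sentence so that the snippet runs on its own.

```python
def iterate_manually(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:
            break                    # a for loop stops silently at this point
        collected.append(item)
    return collected


words = ['The', 'time', 'has', 'come']
manual_result = iterate_manually(words)

# The same traversal written as an ordinary for loop.
loop_result = []
for word in words:
    loop_result.append(word)
```

Both lists end up with the same contents, because the for statement performs the same iter()/next()/StopIteration sequence.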

Note that an iterable and an iterator are two distinct concepts, and an implementation should keep them separate:

An iterable's __iter__() method returns a brand-new iterator object on every call.
The traversal logic itself lives inside the iterator.
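The two points above can be checked with a built-in list, which is an iterable but not an iterator. This is only a small sketch, and the variable names are made up for the illustration.

```python
words = ['the', 'Walrus', 'said']    # a list is an iterable, not an iterator

first = iter(words)
second = iter(words)

# Each call to iter() on the iterable produces a distinct iterator object.
fresh_each_time = first is not second

# Calling iter() on an iterator simply returns that same iterator.
iterator_returns_self = iter(first) is first

next(first)                          # advance only the first iterator
first_rest = list(first)             # what the first iterator has left
second_all = list(second)            # the second one is unaffected
exhausted = list(first)              # an exhausted iterator yields nothing more
```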

The reasons for keeping the two apart are as follows.

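Supports multiple independent traversals: every traversal of the iterable (for example, for word in s) creates a new iterator object, so traversals do not affect one another. If the traversal state (such as index) were stored in Sentence itself, a given iterable could carry only one traversal at a time, and concurrent traversals would interfere with each other.
Easier to extend and reuse: separating the two responsibilities, data management and iteration strategy, lets the iterator implementation evolve freely. Different traversal orders (forward, reverse, stepped, and so on) can be added without modifying the iterable itself, and several different iterables can share one iterator implementation.
Saves memory: a single iterable instance can serve several iteration strategies at the same time, so the underlying data does not need to be copied for each traversal.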
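The reasons above can be illustrated with a short sketch. All class and function names here (WordList, StepIterator, SelfIteratingWords, all_pairs) are hypothetical and are not part of the Sentence example. WordList hands out fresh iterators, whereas SelfIteratingWords keeps the index on the iterable itself, which is the design the text advises against.

```python
class StepIterator:
    """Hypothetical iterator: walks a sequence from `start` in strides of `step`."""

    def __init__(self, seq, start=0, step=1):
        self.seq = seq          # a reference to the data, not a copy
        self.index = start
        self.step = step        # assumed to be non-zero

    def __iter__(self):
        return self

    def __next__(self):
        if not 0 <= self.index < len(self.seq):
            raise StopIteration
        item = self.seq[self.index]
        self.index += self.step
        return item


class WordList:
    """Hypothetical iterable: owns the data and returns a new iterator each time."""

    def __init__(self, words):
        self.words = list(words)

    def __iter__(self):
        return StepIterator(self.words)

    def backwards(self):
        # A second strategy over the same data, built from the same iterator class.
        return StepIterator(self.words, start=len(self.words) - 1, step=-1)


class SelfIteratingWords:
    """Anti-pattern: the iterable is its own iterator, so traversal state is shared."""

    def __init__(self, words):
        self.words = list(words)
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        item = self.words[self.index]
        self.index += 1
        return item


def all_pairs(iterable):
    """Run two nested traversals over the same iterable."""
    pairs = []
    for x in iterable:
        for y in iterable:
            pairs.append((x, y))
    return pairs


independent_pairs = all_pairs(WordList(['a', 'b', 'c']))
interfering_pairs = all_pairs(SelfIteratingWords(['a', 'b', 'c']))
reverse_order = list(WordList(['a', 'b', 'c']).backwards())
every_other = list(StepIterator('abcdef', step=2))   # reused on a different sequence
```

With separate iterators the nested loops produce all nine pairs. With shared state the inner loop consumes the elements that the outer loop still needed, so only ('a', 'b') and ('a', 'c') come out.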