Visitor

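Visitor helps separate algorithms from data structures, and its goal is similar to that of the observer pattern: it lets you extend the functionality of a given class without changing its code. The visitor goes a step further, though, by defining one class that is responsible only for holding data and pushing the algorithms out to other classes called visitors. Each visitor specializes in one algorithm and can apply it to the data.

The visitor pattern is implemented by providing an entry point in the data class that all kinds of visitors can use.

The Visitable class decides how it calls the Visitor class, for instance by deciding which method is called. For example, a visitor in charge of printing the content of built-in types can implement visit_TYPENAME() methods, and each of these types can call the matching method from its accept() method, as follows: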

class VisitableList(list):
    def accept(self, visitor):
        visitor.visit_list(self)


class VisitableDict(dict):
    def accept(self, visitor):
        visitor.visit_dict(self)


class Printer(object):
    def visit_list(self, instance):
        print('list content: {}'.format(instance))

    def visit_dict(self, instance):
        print('dict keys: {}'.format(
            ', '.join(instance.keys()))
        )

The following example does exactly that:

>>> visitable_list = VisitableList([1, 2, 5])
>>> visitable_list.accept(Printer())
list content: [1, 2, 5]
>>> visitable_dict = VisitableDict({'one': 1, 'two': 2, 'three': 3})
>>> visitable_dict.accept(Printer())
dict keys: one, two, three

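But this pattern means that every visited class needs its own accept method in order to be visited, which is quite painful.

Since Python allows code introspection, a better idea is to link visitors and visited classes automatically, as follows: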

>>> def visit(visited, visitor):
...     cls = visited.__class__.__name__
...     method_name = 'visit_%s' % cls
...     method = getattr(visitor, method_name, None)
...     if callable(method):
...         method(visited)
...     else:
...         raise AttributeError(
...             "No suitable '{}' method in visitor"
...             "".format(method_name)
...         )
...
>>> visit([1,2,3], Printer())
list content: [1, 2, 3]
>>> visit({'one': 1, 'two': 2, 'three': 3}, Printer())
dict keys: one, two, three
>>> visit((1, 2, 3), Printer())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 10, in visit
AttributeError: No suitable 'visit_tuple' method in visitor

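The pattern is used in this way in the ast module, for instance by the NodeVisitor class, which calls the visitor with each node of the compiled code tree. This style of name-based dispatch took hold because Python long had no pattern-matching construct like Haskell's; the match statement only arrived in Python 3.10.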
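To make the ast case concrete, here is a minimal sketch of a NodeVisitor subclass. The class name NameCollector and the sample expression are made up for illustration; ast.parse, ast.NodeVisitor, visit_Name and generic_visit are the standard library names.

```python
import ast


class NameCollector(ast.NodeVisitor):
    """Hypothetical visitor that records every identifier it meets."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # NodeVisitor.visit() dispatches here for every ast.Name node,
        # using the same 'visit_' + class name lookup shown earlier.
        self.names.append(node.id)
        self.generic_visit(node)


tree = ast.parse("total = price * quantity + tax")
collector = NameCollector()
collector.visit(tree)
print(collector.names)  # ['total', 'price', 'quantity', 'tax']
```

Node types without a matching visit_ method fall through to generic_visit(), which simply walks the children, so the visitor only has to name the node types it cares about.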

Another example is a directory walker that calls Visitor methods depending on the file extension, as follows:

>>> import os
>>> def visit(directory, visitor):
...     for root, dirs, files in os.walk(directory):
...         for file in files:
...             # foo.txt → txt
...             ext = os.path.splitext(file)[-1][1:]
...             if hasattr(visitor, ext):
...                 getattr(visitor, ext)(file)
...
>>> class FileReader(object):
...     def pdf(self, filename):
...         print('processing: {}'.format(filename))
...
>>> walker = visit('/Users/tarek/Desktop', FileReader())
processing: slides.pdf
processing: sholl23.pdf

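If your application has data structures that are visited by more than one algorithm, the visitor pattern will help separate concerns. It is better for a data container to focus only on holding the data and providing access to it, and nothing else.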
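As a rough sketch of that separation, the snippet below runs two unrelated algorithms over the same plain containers. The Summer and Describer classes are hypothetical, and unlike the Printer shown earlier this variant of visit() returns the visitor's result instead of printing it.

```python
def visit(visited, visitor):
    """Dispatch to visitor.visit_<typename>; returns the result (an assumption)."""
    method_name = 'visit_%s' % type(visited).__name__
    method = getattr(visitor, method_name, None)
    if not callable(method):
        raise AttributeError(
            "No suitable '{}' method in visitor".format(method_name)
        )
    return method(visited)


class Summer:
    """Hypothetical visitor: adds up the numbers held by a container."""

    def visit_list(self, instance):
        return sum(instance)

    def visit_dict(self, instance):
        return sum(instance.values())


class Describer:
    """Hypothetical visitor: builds a short textual description."""

    def visit_list(self, instance):
        return 'list of {} items'.format(len(instance))

    def visit_dict(self, instance):
        return 'dict with keys: {}'.format(', '.join(sorted(instance)))


data = [[1, 2, 5], {'one': 1, 'two': 2}]
totals = [visit(item, Summer()) for item in data]
descriptions = [visit(item, Describer()) for item in data]
print(totals)        # [8, 3]
print(descriptions)  # ['list of 3 items', 'dict with keys: one, two']
```

The containers stay ordinary lists and dicts; adding a third algorithm means writing a third visitor class, not touching the data.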

Template

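Template helps design a generic algorithm by defining abstract steps that are implemented in subclasses. This pattern relies on the Liskov substitution principle, which Wikipedia defines as follows:

"If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program."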

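In other words, an abstract class can define how an algorithm works through steps that are implemented in concrete classes. The abstract class can also provide a basic or partial implementation of the algorithm and let developers override parts of it. For instance, some methods of the Queue class from the queue module can be overridden to change its behavior.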
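The Queue remark can be sketched as follows. This assumes CPython's queue module, where Queue.put() and Queue.get() delegate to the underscore hooks _put() and _get() and store items in a deque named self.queue; those hooks are implementation details rather than a documented API, and StackQueue is a made-up name.

```python
import queue


class StackQueue(queue.Queue):
    """Hypothetical Queue variant that returns the newest item first."""

    def _get(self):
        # The base class pops from the left end of self.queue (FIFO);
        # popping from the right end turns the queue into a stack.
        return self.queue.pop()


q = StackQueue()
for item in ('a', 'b', 'c'):
    q.put(item)

result = [q.get() for _ in range(3)]
print(result)  # ['c', 'b', 'a']
```

The locking and blocking logic in put() and get() is inherited untouched; only one step of the algorithm changes. The standard library's own queue.LifoQueue is built in the same way, so in real code that class is the better choice.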

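Indexer is an indexer class that processes a text in five steps, which are common to any indexing technique:

• Text normalization
• Text split
• Stop-word removal
• Stemming
• Frequency counting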

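Indexer provides a partial implementation of the processing algorithm, but requires _remove_stop_words and _stem_words to be implemented in a subclass. BasicIndexer implements the strict minimum, whereas LocalIndex uses a stop-word file and a stem-word database. FastIndexer implements all the steps and could be based on a fast indexer such as Xapian or Lucene.

A toy implementation could be: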

from collections import Counter


class Indexer:
    def process(self, text):
        text = self._normalize_text(text)
        words = self._split_text(text)
        words = self._remove_stop_words(words)
        stemmed_words = self._stem_words(words)
        return self._frequency(stemmed_words)

    def _normalize_text(self, text):
        return text.lower().strip()

    def _split_text(self, text):
        return text.split()

    def _remove_stop_words(self, words):
        raise NotImplementedError

    def _stem_words(self, words):
        raise NotImplementedError

    def _frequency(self, words):
        return Counter(words)

From there, a BasicIndexer implementation could look like this:

class BasicIndexer(Indexer):
    _stop_words = {'he', 'she', 'is', 'and', 'or', 'the'}

    def _remove_stop_words(self, words):
        return (
            word for word in words
            if word not in self._stop_words
        )

    def _stem_words(self, words):
        # Naive stemming: strip trailing vowels from words longer than 3 letters
        return (
            (
                len(word) > 3 and
                word.rstrip('aeiouy') or
                word
            )
            for word in words
        )

And, as always, here is an example of using the preceding code:

>>> indexer = BasicIndexer()
>>> indexer.process(
...     "Just like Johnny Flynn said\nThe breath I've taken "
...     "and the one I must to go on"
... )
Counter({'just': 1, 'lik': 1, 'johnn': 1, 'flynn': 1, 'said': 1,
'breath': 1, "i'v": 1, 'taken': 1, 'one': 1, 'i': 1, 'must': 1, 'to': 1,
'go': 1, 'on': 1})

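Template should be considered for any algorithm that may vary and can be expressed as isolated substeps. It is probably the most used pattern in Python, and it does not always need to be implemented through subclassing. For instance, many built-in Python functions that deal with algorithmic problems accept arguments that let you delegate part of the implementation to external code. The sorted() function, for example, takes an optional key keyword argument that is later used by the sorting algorithm. The same goes for the min() and max() functions, which find the minimum and maximum values in a given collection.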
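A short illustration of that delegation follows; the word list is arbitrary sample data. The comparison step is supplied from outside through key, while the sorting or searching algorithm itself stays inside the built-in.

```python
words = ['banana', 'Kiwi', 'apple', 'fig']

# Same sorting algorithm, two different "comparison steps" plugged in.
by_length = sorted(words, key=len)
case_insensitive = sorted(words, key=str.lower)

# min() and max() accept the same kind of key function.
shortest = min(words, key=len)
longest = max(words, key=len)

print(by_length)         # ['fig', 'Kiwi', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'fig', 'Kiwi']
print(shortest, longest) # fig banana
```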

Summary

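Design patterns are reusable, somewhat language-specific solutions to common problems in software design. They are part of the culture of all developers, no matter what language they use.

Implementation examples of the most commonly used patterns in a given language are therefore a good way to learn them. On the web and in other books you can easily find implementations of every design pattern mentioned in the GoF book, which is why we have concentrated only on the patterns that are most common and popular in the context of the Python language.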