Regular Expressions 4 (Python re Source Code Walkthrough) - CFANZ编程社区

A Source Walkthrough of Python's Regular Expression Module

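How do you learn a language well? Ask 100 people and you will get 100 answers; mine is to read source code. Today I walk through part of the source of Python's regular expression module. There is far too much of it to finish in one day, so I start with the first part. Without further ado, here is the code. The notes interleaved after each passage are my own annotations, and there are a few places I do not fully understand: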
# Secret Labs' Regular Expression Engine
# Note: the regular expression engine written by Secret Labs.
#
# re-compatible interface for the sre matching engine
# Note: this file is an interface compatible with the old re module, built on top of the sre matching engine.
# Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.

# Note: copyright notice. Secret Labs AB is a Swedish company, apparently located at Teknikringen 8, Linköping (I am not sure of the address).
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).
# Note: license notice. This version of the SRE library may be redistributed under
# CNRI's Python 1.6 license; for any other use, contact Secret Labs AB (info@pythonware.com).
# Portions of this engine have been developed in cooperation with
# CNRI.  Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
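# Note: parts of the engine were developed in cooperation with CNRI
# (the Corporation for National Research Initiatives); Hewlett-Packard funded the Python 1.6 integration and other compatibility work.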

r"""Support for regular expressions (RE).
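# Note: the module provides support for regular expressions (RE). The r prefix makes this docstring a raw string, so the backslashes in it are kept literally.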

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
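# Note: this module provides regular expression matching operations similar to those found in Perl.
# It works on both 8-bit strings and Unicode strings; the pattern and the text being searched may both contain null bytes and characters outside the US-ASCII range.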
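
To check the paragraph above for myself, here is a minimal sketch (my own example, not part of the re source). It assumes Python 3, where "8-bit strings" correspond to bytes and "Unicode strings" to str. The sample data and variable names are made up for illustration.

```python
import re

# A str pattern searched in a str that contains a null character
# and a non-ASCII character (U+00E9).
text = "a\x00b\u00e9"
str_span = re.search("\x00b\u00e9", text).span()

# The same idea with a bytes pattern searched in bytes data.
data = b"a\x00b\xe9"
bytes_span = re.search(b"\x00b\xe9", data).span()

# In Python 3 the pattern and the searched text must be the same kind;
# a str pattern applied to bytes raises TypeError.
try:
    re.search("a", data)
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

Both searches find the match at the same offsets, and mixing the two kinds of string is rejected.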

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.
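# Note: regular expressions can contain both special and ordinary characters. Most ordinary characters,
# such as "A", "a" or "0", are the simplest regular expressions: they simply match themselves. Ordinary characters can be concatenated,
# so the pattern last matches the string 'last'.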
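
A small sketch of the point about ordinary characters (my own example; the sample strings are made up): a pattern built only from ordinary characters matches exactly that sequence of characters, wherever it occurs.

```python
import re

# "last" is four ordinary characters concatenated into one pattern.
pattern = re.compile("last")

m = pattern.search("at last!")
found = m.group()    # the matched text
start = m.start()    # index where the match begins

# Ordinary characters match only themselves, so these do not match.
no_match = pattern.search("lost")
case_differs = pattern.search("LAST")
```

Here `search` is used rather than `match` because the word does not sit at the start of the string.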
The special characters are:
# Note: the special characters are listed below.
    "."      Matches any character except a newline.
    # Note: matches any single character except a newline.
    "^"      Matches the start of the string.
    # Note: matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
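    # Note: matches the end of the string, or just before the newline that ends the string.
    # By default "^" and "$" anchor to the whole string, not to each line;
    # they match at every line only when the M (MULTILINE) flag is set, and "." matches a newline only when the S (DOTALL) flag is set.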
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    # Note: matches 0 or more repetitions of the RE that precedes the "*".
    # Greedy means it matches as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    # Note: matches 1 or more repetitions of the RE that precedes the "+"; also greedy.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    # Note: matches 0 or 1 occurrence of the RE that precedes the "?".
    *?,+?,?? Non-greedy versions of the previous three special characters.
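    # Note: the previous three special characters followed by "?", that is *?, +? and ??, are their non-greedy versions:
    # they match as few repetitions as possible.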
    {m,n}    Matches from m to n repetitions of the preceding RE.
    # Note: matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    # Note: the non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
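    # Note: either escapes a special character so that it is matched literally, or introduces a special sequence.
    # This is what is usually called escaping.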
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    # Note: [] denotes a set of characters; a "^" as the first character complements the set.
    "|"      A|B, creates an RE that will match either A or B.
    # Note: alternation, a logical "or": A|B matches either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
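    # Note: matches the RE inside the parentheses and captures it as a group; the captured text can be retrieved afterwards or matched again later in the string.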
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
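    # Note: sets the A, I, L, M, S, U or X flag for the RE, i.e. re.A, re.I and so on; the flags are described further down.
    # In other words, these are the flags (modes) under which the RE is compiled.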
    (?:...)  Non-grouping version of regular parentheses.
    # Note: a non-capturing group: it groups like ordinary parentheses but the matched text is not saved.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
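    # Note: a lookahead assertion: it succeeds if ... matches at the current position, but consumes none of the string.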
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.
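
To make the list of special characters above concrete, here is a sketch I put together (my own examples, not part of the re source; the sample strings and variable names are made up). It only uses `re.findall`, `re.search`, `re.match`, `re.compile` and `Pattern.fullmatch`.

```python
import re

# "." matches any character except a newline.
dot = re.findall(r"a.c", "abc a\nc")

# By default "^" and "$" anchor to the whole string, not to each line.
text = "one\ntwo\n"
anchors_default = re.findall(r"^\w+$", text)
anchors_multiline = re.findall(r"^\w+$", text, re.MULTILINE)
# "$" also matches just before the newline that ends the string.
dollar = re.search(r"two$", text).span()

# Greedy versus non-greedy repetition.
greedy = re.search(r"<.*>", "<a><b>").group()
lazy = re.search(r"<.*?>", "<a><b>").group()
counted = re.search(r"a{2,3}", "aaaa").group()
counted_lazy = re.search(r"a{2,3}?", "aaaa").group()

# Complemented set and alternation.
not_digits = re.findall(r"[^0-9]", "a1b2")
either = re.findall(r"cat|dog", "cat dog cow")

# Non-capturing group next to a capturing one, then a named group
# that is matched again later in the string with (?P=name).
groups = re.match(r"(?:ab)+(c)", "ababc").groups()
named = re.match(r"(?P<word>\w+) (?P=word)", "hello hello").group("word")

# Inline flag and inline comment.
ignore_case = re.match(r"(?i)abc", "ABC") is not None
commented = re.match(r"a(?#ignored)b", "ab") is not None

# Lookahead and lookbehind assert context without consuming it.
before_bang = re.findall(r"\w+(?=!)", "hi! yo? ok!")
after_dollar = re.findall(r"(?<=\$)\d+", "$10 and 20")
not_after_dollar = re.findall(r"(?<!\$)\b\d+", "$10 and 20")

# Conditional: require ">" only if the optional "<" (group 1) matched.
cond = re.compile(r"(<)?\w+(?(1)>)")
cond_results = [cond.fullmatch(s) is not None for s in ("<tag>", "tag", "<tag")]
```

The pair `anchors_default` and `anchors_multiline` is the one worth running yourself: without the M flag the pattern finds nothing in a two-line string, and with it both lines match.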