Regular Expressions 4 (Reading the Python re Source)


Reading the Source of Python's Regular Expression Module

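How do you learn a language well? Ask 100 people and you will get 100 methods. Mine is to read source code, so today I am walking through part of the source of Python's regular expression module. The source is long and cannot be covered in a single day, so this post only analyzes the first part. Without further ado, here is the code. The comment lines that follow each original passage are my own annotations, and there are a few places I do not fully understand: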

# Secret Labs' Regular Expression Engine
# Note: the regular expression engine written by Secret Labs.
#
# re-compatible interface for the sre matching engine
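# Note: this file is the interface to the SRE matching engine, and it keeps the API of the older re module ("re-compatible").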
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.

# Note: copyright notice. Secret Labs AB is a Swedish company, located at Teknikringen 8, Linkoping.
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license. For any other use, please contact Secret Labs
# AB (info@pythonware.com).
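# Note: license terms. This version of the SRE library may be redistributed under CNRI's Python 1.6 license.
# For any other use, contact Secret Labs AB at the address given (info@pythonware.com).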
# Portions of this engine have been developed in cooperation with
# CNRI. Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
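# Note: parts of the engine were developed in cooperation with CNRI,
# the Corporation for National Research Initiatives. Hewlett-Packard funded the Python 1.6 integration and other compatibility work.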

r"""Support for regular expressions (RE).
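# Note: the r prefix makes this docstring a raw string, so the backslashes in it are kept literally. The first line just says that the module provides support for regular expressions (RE).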

This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
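# Note: this module provides regular expression matching operations similar to those found in Perl.
# It works on both 8-bit strings and Unicode strings, and both the pattern and the text being searched may contain null bytes and non-ASCII characters.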
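To make the point about string types concrete, here is a small sketch of my own (it is not part of re.py). It assumes Python 3, where the docstring's "8-bit strings" correspond to bytes objects; the sample inputs are made up.

```python
import re

# A str pattern searches str text, including characters outside ASCII.
text_match = re.search("\u00e9+", "caf\u00e9\u00e9")
print(text_match.group())   # éé

# A bytes pattern searches bytes, and null bytes are treated as ordinary data.
bytes_match = re.search(rb"\x00+", b"ab\x00\x00cd")
print(bytes_match.group())  # b'\x00\x00'
```

The pattern and the text must be of the same kind; mixing a str pattern with bytes input raises a TypeError.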

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.
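# Note: a regular expression can contain both special and ordinary characters. Most ordinary characters,
# such as 'A', 'a', or '0', are the simplest regular expressions: each one matches itself. Ordinary characters can be concatenated,
# so the pattern last matches the string 'last'.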
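A minimal sketch of that statement, using a made-up input string (my own example, not from re.py):

```python
import re

# Each ordinary character matches itself, so "last" matches the text "last".
m = re.search(r"last", "at long last")
print(m.group())  # last
print(m.span())   # (8, 12)
```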
The special characters are:
# Note: the special characters are listed below.
"." Matches any character except a newline.
# Note: matches any single character except a newline.
"^" Matches the start of the string.
# Note: matches at the very start of the string.
"$" Matches the end of the string or just before the newline at
the end of the string.
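# Note: matches at the end of the string, or just before a newline that ends the string.
# By default "^" and "$" anchor to the whole string rather than to each line, and "." stops at a newline.
# Per-line anchoring needs the M (MULTILINE) flag, and "." only matches a newline with the S (DOTALL) flag.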
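The behaviour of ".", "^" and "$" on multi-line text is easy to misread, so here is a short sketch with a made-up two-line string (my own example). It compares the default behaviour with re.DOTALL and re.MULTILINE.

```python
import re

text = "first line\nsecond line\n"

# "." stops at a newline unless re.DOTALL is set.
dot_default = re.match(r".*", text).group()
dot_all = re.match(r".*", text, re.DOTALL).group()
print(repr(dot_default))  # 'first line'
print(dot_all == text)    # True

# "^" matches only at the start of the string unless re.MULTILINE is set.
starts_default = re.findall(r"^\w+", text)
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)
print(starts_default)     # ['first']
print(starts_multi)       # ['first', 'second']

# "$" matches at the end of the string or just before a final newline.
ends_default = re.findall(r"\w+$", text)
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)
print(ends_default)       # ['line']
print(ends_multi)         # ['line', 'line']
```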
"*" Matches 0 or more (greedy) repetitions of the preceding RE.
Greedy means that it will match as many repetitions as possible.
# Note: matches zero or more repetitions of the RE that precedes "*".
# Greedy means it matches as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
# Note: matches one or more repetitions of the RE that precedes "+", also greedily.
"?" Matches 0 or 1 (greedy) of the preceding RE.
# Note: matches zero or one occurrence of the RE that precedes "?".
*?,+?,?? Non-greedy versions of the previous three special characters.
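# Note: appending "?" to the previous three characters gives *?, +? and ??, their non-greedy versions.
# A non-greedy quantifier matches as few repetitions as possible.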
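A quick sketch of the quantifiers and of greedy versus non-greedy matching, with made-up inputs (my own example):

```python
import re

sample = "a ab abbb"
star = re.findall(r"ab*", sample)      # zero or more "b"
plus = re.findall(r"ab+", sample)      # one or more "b"
optional = re.findall(r"ab?", sample)  # zero or one "b"
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab', 'ab']

html = "<b>bold</b>"
greedy = re.search(r"<.*>", html).group()   # runs to the last ">"
lazy = re.search(r"<.*?>", html).group()    # stops at the first ">"
print(greedy)    # <b>bold</b>
print(lazy)      # <b>
```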
{m,n} Matches from m to n repetitions of the preceding RE.
# Note: matches the preceding RE from m to n times.
{m,n}? Non-greedy version of the above.
# Note: the non-greedy version of the form above.
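A small sketch of bounded repetition, using a made-up digit string (my own example):

```python
import re

digits = "1234567"

# Greedy: take as many as allowed, up to n.
greedy_range = re.match(r"\d{2,4}", digits).group()
# Non-greedy: take as few as allowed, down to m.
lazy_range = re.match(r"\d{2,4}?", digits).group()
# Fewer than m repetitions is not a match at all.
too_short = re.match(r"\d{2,4}", "1")

print(greedy_range)  # 1234
print(lazy_range)    # 12
print(too_short)     # None
```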
"\\" Either escapes special characters or signals a special sequence.
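# Note: the backslash either escapes a special character so it is taken literally, or introduces a special sequence.
# This is what is usually called escaping.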
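Both roles of the backslash can be sketched like this (my own example with made-up inputs). The use of re.escape at the end is an addition of mine; the docstring excerpt does not mention it.

```python
import re

# Escaping: "\." matches a literal dot, while a bare "." matches any character.
literal_dot = re.search(r"\.", "a.b").group()
any_char = re.findall(r".", "a.b")
print(literal_dot)  # .
print(any_char)     # ['a', '.', 'b']

# Special sequence: "\d" stands for a digit.
numbers = re.findall(r"\d+", "v3.11")
print(numbers)      # ['3', '11']

# re.escape builds a pattern that matches the given text literally.
price = "1+1=2"
escaped_ok = re.fullmatch(re.escape(price), price) is not None
print(escaped_ok)   # True
```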
[] Indicates a set of characters.
A "^" as the first character indicates a complementing set.
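# Note: [] denotes a set of characters; a "^" as the first character complements the set, so it matches any character that is not listed.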
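A sketch of character sets, complemented sets, and ranges, with made-up inputs (my own example):

```python
import re

vowels = re.findall(r"[aeiou]", "regular")       # any one of the listed characters
non_vowels = re.findall(r"[^aeiou]", "regular")  # leading "^" complements the set
ranges = re.findall(r"[a-c]+", "abcxyzcab")      # "a-c" is a range inside a set

print(vowels)      # ['e', 'u', 'a']
print(non_vowels)  # ['r', 'g', 'l', 'r']
print(ranges)      # ['abc', 'cab']
```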
"|" A|B, creates an RE that will match either A or B.
# Note: alternation, a logical "or": the resulting RE matches either A or B.
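A sketch of alternation with made-up inputs (my own example). The second case shows that alternatives are tried from left to right, so the first one that succeeds wins even if a later one would match more text.

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)        # ['cat', 'dog']

# "a" succeeds first, so "ab" is never tried at this position.
first_wins = re.search(r"a|ab", "ab").group()
print(first_wins)  # a
```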
(...) Matches the RE inside the parentheses.
The contents can be retrieved or matched later in the string.
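# Note: matches the RE inside the parentheses and captures it as a group; the captured text can be retrieved after the match or matched again later in the pattern.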
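"Retrieved or matched later" covers two different uses, sketched here with made-up inputs (my own example): reading a group from the match object, and referring back to it inside the pattern with a backreference such as \1.

```python
import re

# Retrieve captured groups after the match.
m = re.search(r"(\w+)@(\w+)\.com", "mail bob@example.com now")
print(m.group(1))  # bob
print(m.group(2))  # example
print(m.groups())  # ('bob', 'example')

# Match the same text again later in the string: \1 repeats group 1.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()
print(doubled)     # is is
```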
(?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
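# Note: sets the corresponding flag (re.A, re.I, re.L, re.M, re.S, re.U, re.X) from inside the pattern itself.
# These flags control how the RE is interpreted and are described further down in the docstring.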
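A sketch of inline flags with made-up inputs (my own example). I put each flag group at the very start of the pattern, which recent Python versions require for global inline flags.

```python
import re

# (?i) is the inline form of re.I (ignore case).
ignore_case = re.search(r"(?i)hello", "Say HELLO").group()
print(ignore_case)  # HELLO

# (?m) is the inline form of re.M (multi-line anchors).
line_starts = re.findall(r"(?m)^\w+", "one\ntwo")
print(line_starts)  # ['one', 'two']

# (?x) is the inline form of re.X: whitespace and "#" comments in the pattern are ignored.
phone_pattern = r"""(?x)
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
"""
phone = re.search(phone_pattern, "call 555-1234").group()
print(phone)        # 555-1234
```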
(?:...) Non-grouping version of regular parentheses.
# Note: groups the enclosed RE without capturing it (a non-capturing group).
(?P<name>...) The substring matched by the group is accessible by name.
(?P=name) Matches the text matched earlier by the group named name.
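The three group forms above can be compared in a short sketch with made-up inputs (my own example). The group names year, month and ch are placeholders I picked for illustration.

```python
import re

# (?:...) groups without capturing, so findall returns the whole match.
non_capturing = re.findall(r"(?:ab)+", "ababab ab")
# A capturing group makes findall return the group's last capture instead.
capturing = re.findall(r"(ab)+", "ababab ab")
print(non_capturing)  # ['ababab', 'ab']
print(capturing)      # ['ab', 'ab']

# (?P<name>...) makes the captured text available by name.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2024-05")
print(date.group("year"))  # 2024
print(date.groupdict())    # {'year': '2024', 'month': '05'}

# (?P=name) matches the same text the named group matched earlier.
double_letter = re.search(r"(?P<ch>\w)(?P=ch)", "hello").group()
print(double_letter)       # ll
```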
(?#...) A comment; ignored.
(?=...) Matches if ... matches next, but doesn't consume the string.
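# Note: a lookahead assertion: it succeeds if ... matches at the current position, but it does not consume any characters.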
(?!...) Matches if ... doesn't match next.
(?<=...) Matches if preceded by ... (must be fixed length).
(?<!...) Matches if not preceded by ... (must be fixed length).
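The four lookaround assertions can be sketched as follows, with made-up inputs (my own example). None of them consumes text, so the asserted part never appears in the result.

```python
import re

# (?=...) lookahead: names that are followed by ".py".
py_names = re.findall(r"\w+(?=\.py)", "a.py b.txt c.py")
print(py_names)     # ['a', 'c']

# (?!...) negative lookahead: whole numbers not followed by "%".
not_percent = re.findall(r"\b\d+\b(?!%)", "50% 30 7%")
print(not_percent)  # ['30']

# (?<=...) lookbehind: digits preceded by "$".
dollars = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")
print(dollars)      # ['15']

# (?<!...) negative lookbehind: numbers not preceded by "$".
not_dollars = re.findall(r"(?<!\$)\b\d+", "cost $15, qty 3")
print(not_dollars)  # ['3']
```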
(?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
the (optional) no pattern otherwise.
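The conditional form is the least familiar entry in this list, so here is a sketch with made-up inputs (my own example). The first pattern requires a closing ">" only when an opening "<" was captured; the second adds a "no" branch that requires ";" when it was not.

```python
import re

# Without a "no" branch: ">" is required only if group 1 matched "<".
bracket = re.compile(r"(<)?\w+(?(1)>)")
with_brackets = bracket.fullmatch("<tag>") is not None
without_brackets = bracket.fullmatch("tag") is not None
unbalanced_open = bracket.fullmatch("<tag") is not None
unbalanced_close = bracket.fullmatch("tag>") is not None
print(with_brackets, without_brackets)    # True True
print(unbalanced_open, unbalanced_close)  # False False

# With a "no" branch: ">" if group 1 matched, otherwise ";".
either = re.compile(r"(<)?\w+(?(1)>|;)")
yes_branch = either.fullmatch("<tag>") is not None
no_branch = either.fullmatch("tag;") is not None
neither = either.fullmatch("tag") is not None
print(yes_branch, no_branch, neither)     # True True False
```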

