| 1 | +Python 2.7 Regular Expressions |
| 2 | +============================== |
| 3 | + |
| 4 | +Special characters:: |
| 5 | + |
| 6 | + \ escapes special characters. |
| 7 | + . matches any character except newline (any character if DOTALL) |
| 8 | + ^ matches start of the string (or line if MULTILINE) |
| 9 | + $ matches end of the string (or line if MULTILINE) |
| 10 | + [5b-d] matches any one of the chars '5', 'b', 'c' or 'd' |
| 11 | + [^a-c6] matches any char except 'a', 'b', 'c' or '6' |
| 12 | + R|S matches either regex R or regex S. |
| 13 | + () Creates a capture group, and indicates precedence. |
| 14 | + |
| 15 | +Most special chars lose their special meaning inside ``[]`` and need no |
| 16 | +escaping. The exceptions are ``'\'``, ``'^'`` (negates only as the 1st char), and ``']'`` and ``'-'``, which need escaping unless they are the 1st char |
| 17 | +(``'-'`` may also come last). e.g. ``'[]]'`` matches ``']'``. |
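
The character class rules above can be checked with a short sketch. The variable names are illustrative only, and the sample strings are made up for this example:

```python
import re

# ']' as the first char of a class is literal, so '[]]' matches ']'.
closing = re.findall(r'[]]', 'a]b]')            # [']', ']']

# '-' placed first is a literal hyphen, not a range operator.
hyphen = re.findall(r'[-a]', 'a-b')             # ['a', '-']

# A literal char plus a range, and a negated class.
in_class = re.findall(r'[5b-d]', 'abcde456')    # ['b', 'c', 'd', '5']
not_in_class = re.findall(r'[^a-c6]', 'abcd67') # ['d', '7']
```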
| 18 | + |
| 19 | +Quantifiers:: |
| 20 | + |
| 21 | + * 0 or more (append ? for non-greedy) |
| 22 | + + 1 or more " |
| 23 | + ? 0 or 1 " |
| 24 | + {m} exactly 'm' |
| 25 | + {m,n} from m to n. 'm' defaults to 0, 'n' to infinity |
| 26 | + {m,n}? from m to n, as few as possible |
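
As a rough illustration of greedy versus non-greedy quantifiers, the sketch below runs both forms over the same made-up input. The names `greedy`, `lazy`, `bounded` and `bounded_lazy` are only for this example:

```python
import re

text = '<a><b>'

# Greedy '*' takes as much as it can while still allowing a match.
greedy = re.match(r'<.*>', text).group()    # '<a><b>'

# Appending '?' makes it take as little as possible.
lazy = re.match(r'<.*?>', text).group()     # '<a>'

# {m,n} prefers the longest run; {m,n}? prefers the shortest.
bounded = re.findall(r'\d{2,3}', '1 22 333 4444')   # ['22', '333', '444']
bounded_lazy = re.findall(r'\d{2,3}?', '4444')      # ['44', '44']
```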
| 27 | + |
| 28 | + |
| 29 | +Special sequences:: |
| 30 | + |
| 31 | + \A Start of string |
| 32 | + \b Matches empty string at word boundary (between \w and \W) |
| 33 | + \B Matches empty string not at word boundary |
| 34 | + \d Digit |
| 35 | + \D Non-digit |
| 36 | + \s Whitespace: [ \t\n\r\f\v], more if LOCALE or UNICODE |
| 37 | + \S Non-whitespace |
| 38 | + \w Alphanumeric: [0-9a-zA-Z_], more if LOCALE or UNICODE |
| 39 | + \W Non-alphanumeric |
| 40 | + \Z End of string |
| 41 | + |
| 42 | + \g<id> In sub() replacement strings only: text matched by the |
| 43 | + named or numbered group, e.g. \g<0> or \g<name> |
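
A small sketch of ``\g<id>`` in use. Note that it is written in the replacement string given to ``sub()``, not in the pattern itself. The date format and names below are assumptions made for the example:

```python
import re

date = '2011-03-27'
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# Refer to groups by name in the replacement template.
swapped = re.sub(pattern, r'\g<day>/\g<month>/\g<year>', date)  # '27/03/2011'

# \g<0> stands for the whole match.
tagged = re.sub(r'\d+', r'<\g<0>>', 'a1b22')    # 'a<1>b<22>'

# \g<1>0 is group 1 followed by a literal '0'; \10 would mean group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1')       # 'a10'
```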
| 44 | + |
| 45 | +The regex engine understands most of the escapes that Python string literals |
| 46 | +do. Hence the regex '``\n``' matches the same as the regex '``\\n``':: |
| 47 | + |
| 48 | + \a ASCII Bell (BEL) |
| 49 | + \f ASCII Formfeed |
| 50 | + \n ASCII Linefeed |
| 51 | + \r ASCII Carriage return |
| 52 | + \t ASCII Tab |
| 53 | + \v ASCII Vertical tab |
| 54 | + \\ A single backslash |
| 55 | + |
| 56 | + \xHH Two digit hex character |
| 57 | + \OOO Three digit octal char |
| 58 | + (or use a preceding zero, e.g. \0, \07) |
| 59 | + \DD Decimal number 1 to 99, matches previous |
| 60 | + numbered group |
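
The escapes above might be exercised as follows. This is a minimal sketch with made-up sample strings. Raw strings (``r'...'``) are used so that the backslashes reach the regex engine untouched:

```python
import re

# A literal newline character and the regex escape \n match the same text.
literal_newline = re.search('\n', 'a\nb') is not None     # True
escaped_newline = re.search(r'\n', 'a\nb') is not None    # True

# \x41 (hex) and \101 (three digit octal) both denote 'A'.
hex_and_octal = re.match(r'\x41\101', 'AA') is not None   # True

# \1 matches whatever group 1 matched, here a repeated word.
doubled = re.findall(r'\b(\w+) \1\b', 'the the cat sat sat')  # ['the', 'sat']
```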
| 61 | + |
| 62 | + |
| 63 | +Extensions. These do not create capture groups, except for ``(?P<name>...)``:: |
| 64 | + |
| 65 | + (?iLmsux) Matches empty string; each letter sets the corresponding flag (re.I, re.L, ...) |
| 66 | + (?:...) Non-capturing version of regular parentheses |
| 67 | + (?P<name>...) Creates a named capturing group. |
| 68 | + (?P=name) Matches whatever the group named 'name' matched earlier |
| 69 | + (?#...) A comment; ignored. |
| 70 | + (?=...) Lookahead assertion: Matches without consuming |
| 71 | + (?!...) Negative lookahead assertion |
| 72 | + (?<=...) Lookbehind assertion: Matches if preceded by ... |
| 73 | + (?<!...) Negative lookbehind assertion |
| 74 | + (?(id)yes|no) If group 'id' matched, match 'yes', else 'no' |
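
A few of the extensions above, sketched on made-up inputs. The group names (``quote``, ``body``) and the sample strings are assumptions for illustration:

```python
import re

# Named group plus (?P=name): the closing quote must equal the opening one.
m = re.match(r'(?P<quote>["\'])(?P<body>.*?)(?P=quote)', '"hi" there')
body = m.group('body')                                   # 'hi'

# Lookahead: digits only when followed by ' USD', which is not consumed.
amounts = re.findall(r'\d+(?= USD)', '10 USD, 20 EUR, 30 USD')  # ['10', '30']

# Negative lookbehind: numbers not preceded by a minus sign.
unsigned = re.findall(r'(?<!-)\b\d+', '5 -6 7')          # ['5', '7']

# Conditional: require '>' only if the optional '<' (group 1) matched.
tag = re.compile(r'^(<)?\w+(?(1)>)$')
balanced = [bool(tag.match(s)) for s in ('<tag>', 'tag', '<tag')]
# balanced == [True, True, False]
```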
| 75 | + |
| 76 | + |
| 77 | +Flags for re.compile(). Combine with ``'|'``:: |
| 78 | + |
| 79 | + re.I re.IGNORECASE Ignore case |
| 80 | + re.L re.LOCALE Make \w, \b, and \s locale dependent |
| 81 | + re.M re.MULTILINE Multiline: ^ and $ also match at each line's start and end |
| 82 | + re.S re.DOTALL Dot matches all (including newline) |
| 83 | + re.U re.UNICODE Make \w, \b, \d, and \s unicode dependent |
| 84 | + re.X re.VERBOSE Verbose (unescaped whitespace in pattern is ignored, |
| 85 | + and '#' marks comments, up till next newline) |
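
A minimal sketch of combining flags with ``|``. ``re.LOCALE`` and ``re.UNICODE`` are left out because their effect depends on the interpreter and locale. The sample text and names are made up:

```python
import re

text = 'first line\nSecond LINE'

# MULTILINE: '^' matches at the start of every line.
starts = re.findall(r'^\w+', text, re.MULTILINE)      # ['first', 'Second']

# Flags combine with '|'.
lines_ci = re.findall(r'^.*line$', text, re.I | re.M)
# lines_ci == ['first line', 'Second LINE']

# Without DOTALL, '.' stops at the newline.
no_dotall = re.search(r'first.*LINE', text) is None   # True
dotall = re.search(r'first.*LINE', text, re.S) is not None  # True

# VERBOSE: whitespace is ignored and '#' starts a comment.
phone = re.compile(r'''
    (\d{3})   # prefix
    -
    (\d{4})   # line number
''', re.VERBOSE)
parts = phone.match('555-1234').groups()              # ('555', '1234')
```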
| 86 | + |
| 87 | + |
| 88 | +Module level functions:: |
| 89 | + |
| 90 | + re.compile(pattern[, flags]) -> RegexObject |
| 91 | + re.match(pattern, string[, flags]) -> MatchObject |
| 92 | + re.search(pattern, string[, flags]) -> MatchObject |
| 93 | + re.findall(pattern, string[, flags]) -> list of strings |
| 94 | + re.finditer(pattern, string[, flags]) -> iter of MatchObjects |
| 95 | + re.split(pattern, string[, maxsplit, flags]) -> list of strings |
| 96 | + re.sub(pattern, repl, string[, count, flags]) -> string |
| 97 | + re.subn(pattern, repl, string[, count, flags]) -> (string, int) |
| 98 | + re.escape(string) -> string |
| 99 | + re.purge() # clears the re cache |
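
The module level functions can be compared side by side on one made-up string. Note that ``match()`` and ``search()`` return ``None`` when nothing matches:

```python
import re

text = 'one, two,three'

first = re.match(r'\w+', text).group()       # 'one' (match() anchors at the start)
no_match = re.match(r't\w+', text)           # None: text does not start with 't'
found = re.search(r't\w+', text).group()     # 'two' (search() scans forward)

words = re.findall(r'\w+', text)             # ['one', 'two', 'three']
fields = re.split(r',\s*', text)             # ['one', 'two', 'three']
joined = re.sub(r',\s*', ';', text)          # 'one;two;three'
joined_n = re.subn(r',\s*', ';', text)       # ('one;two;three', 2)

# escape() backslash-escapes characters that would otherwise be special.
escaped = re.escape('a.b')                   # 'a\\.b'
```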
| 100 | + |
| 101 | + |
| 102 | +RegexObjects (returned from ``compile()``):: |
| 103 | + |
| 104 | + match(string[, pos, endpos]) -> MatchObject |
| 105 | + search(string[, pos, endpos]) -> MatchObject |
| 106 | + findall(string[, pos, endpos]) -> list of strings |
| 107 | + finditer(string[, pos, endpos]) -> iter of MatchObjects |
| 108 | + split(string[, maxsplit]) -> list of strings |
| 109 | + sub(repl, string[, count]) -> string |
| 110 | + subn(repl, string[, count]) -> (string, int) |
| 111 | + flags int passed to compile() |
| 112 | + groups int number of capturing groups |
| 113 | + groupindex { 'name': id } |
| 114 | + pattern string |
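
A sketch of a compiled pattern in use, including the optional ``pos`` argument. The pattern and the name ``word`` are made up for this example:

```python
import re

word = re.compile(r'(?P<first>\w)(\w*)', re.IGNORECASE)

group_count = word.groups                      # 2
name_to_id = dict(word.groupindex)             # {'first': 1}
ignores_case = bool(word.flags & re.IGNORECASE)  # True

# pos tells search()/match() where in the string to start.
from_pos = word.search('ab cd', 2).group()     # 'cd'
at_pos = word.match('ab cd', 2)                # None: index 2 is a space

# With several groups, findall() returns one tuple per match.
pairs = word.findall('ab cd')                  # [('a', 'b'), ('c', 'd')]

# count limits the number of substitutions.
once = word.sub('X', 'ab cd ef', 1)            # 'X cd ef'
```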
| 115 | + |
| 116 | + |
| 117 | +MatchObjects (returned from ``match()`` and ``search()``):: |
| 118 | + |
| 119 | + expand(template) -> string # backslash and group expansion |
| 120 | + group([group1...]) -> string # or tuple of strings, one per arg |
| 121 | + groups([default]) -> tuple of all groups, non-matching='default' |
| 122 | + groupdict([default]) -> dict of named groups, non-matching='default' |
| 123 | + start([group]) -> indices of start/end of substring matched by group |
| 124 | + end([group]) group defaults to 0, which means the whole match. |
| 125 | + span([group]) -> tuple (match.start(group), match.end(group)) |
| 126 | + pos -> the value passed to search() or match() |
| 127 | + endpos -> " |
| 128 | + lastindex -> int index of last matched capturing group |
| 129 | + lastgroup -> string name of last matched capturing group |
| 130 | + re -> the RegexObject whose search() or match() produced this match |
| 131 | + string -> the string passed to search() or match() |
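
The MatchObject members above, sketched on a made-up input in which the optional ``value`` group does not take part in the match:

```python
import re

m = re.search(r'(?P<key>\w+)=(?P<value>\d+)?', 'set x= now')

whole = m.group()                     # 'x='
both = m.group('key', 'value')        # ('x', None)
filled = m.groups('0')                # ('x', '0'): default replaces None
as_dict = m.groupdict('n/a')          # {'key': 'x', 'value': 'n/a'}

whole_span = m.span()                 # (4, 6)
key_span = (m.start('key'), m.end('key'))   # (4, 5)
missing_start = m.start('value')      # -1 for a group that did not match

last = (m.lastindex, m.lastgroup)     # (1, 'key')
expanded = m.expand(r'\g<key>!')      # 'x!'
searched = (m.pos, m.endpos, m.string)  # (0, 10, 'set x= now')
```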
| 132 | + |
| 133 | + |
| 134 | +:Version: v0.2 |
| 135 | +:Contact: tartley@tartley.com |
| 136 | +Gleaned from the Python 2.7 're' docs. http://docs.python.org/library/re.html |