|
1 | 1 | Python 2.7 Regular Expressions
|
2 | 2 | ==============================
|
3 | 3 |
|
4 |
| -Special characters:: |
5 |
| - |
6 |
| - \ escapes special characters. |
7 |
| - . matches any character |
8 |
| - ^ matches start of the string (or line if MULTILINE) |
9 |
| - $ matches end of the string (or line if MULTILINE) |
10 |
| - [5b-d] matches any chars '5', 'b', 'c' or 'd' |
11 |
| - [^a-c6] matches any char except 'a', 'b', 'c' or '6' |
12 |
| - R|S matches either regex R or regex S. |
13 |
| - () Creates a capture group, and indicates precedence. |
14 |
| - |
15 |
| -Within ``[]``, no special chars do anything special, hence they don't need |
16 |
| -escaping, except for ``']'`` and ``'-'``, which only need escaping if they are |
17 |
| -not the 1st char. e.g. ``'[]]'`` matches ``']'``. ``'^'`` also has special |
18 |
| -meaning, it negates the group if it's the first character in the ``[]``, and |
19 |
| -needs to be escaped to match it literally. |
20 |
| - |
21 |
| -Quantifiers:: |
22 |
| - |
23 |
| - * 0 or more (append ? for non-greedy) |
24 |
| - + 1 or more " |
25 |
| - ? 0 or 1 " |
26 |
| - {m} exactly 'm' |
27 |
| - {m,n} from m to n. 'm' defaults to 0, 'n' to infinity |
28 |
| - {m,n}? from m to n, as few as possible |
| 4 | +Non-special chars match themselves. Exceptions are special characters:: |
| 5 | + |
| 6 | + \ Escape special char |
| 7 | + . Match any char except newline, see re.DOTALL |
| 8 | + ^ Match start of the string, see re.MULTILINE |
| 9 | + $ Match end of the string, see re.MULTILINE |
| 10 | + [] Enclose a set of matchable chars |
| 11 | + R|S Match either regex R or regex S. |
| 12 | + () Create capture group, and indicate precedence |
| 13 | + |
| 14 | +After '``[``', which opens a set, the only special chars are:: |
| 15 | + |
| 16 | + ] End the set, if not the 1st char |
| 17 | + - A range, eg. a-c matches a, b or c |
| 18 | + ^ Negate the set only if it is the 1st char |
| 19 | + |
| 20 | +Quantifiers (append '``?``' for non-greedy):: |
| 21 | + |
| 22 | + * 0 or more |
| 23 | + + 1 or more |
| 24 | + ? 0 or 1 |
| 25 | + {m} Exactly 'm' |
| 26 | + {m,n} From m (default 0) to n (default infinity) |
29 | 27 |
|
30 | 28 | Special sequences::
|
31 | 29 |
|
32 | 30 | \A Start of string
|
33 |
| - \b Matches empty string at word boundary (between \w and \W) |
34 |
| - \B Matches empty string not at word boundary |
| 31 | + \b Match empty string at word (\w+) boundary |
| 32 | + \B Match empty string not at word boundary |
35 | 33 | \d Digit
|
36 | 34 | \D Non-digit
|
37 |
| - \s Whitespace: [ \t\n\r\f\v], more if LOCALE or UNICODE |
| 35 | + \s Whitespace [ \t\n\r\f\v], see LOCALE,UNICODE |
38 | 36 | \S Non-whitespace
|
39 |
| - \w Alphanumeric: [0-9a-zA-Z_], or is LOCALE dependant |
| 37 | + \w Alphanumeric: [0-9a-zA-Z_], see LOCALE |
40 | 38 | \W Non-alphanumeric
|
41 | 39 | \Z End of string
|
42 |
| - |
43 |
| - \g<id> Match previous group, '<' & '>' are literal |
44 |
| - e.g. \g<0> or \g<name> (not \g0 or \gname) |
| 40 | + \g<id> In sub() repl: insert named or numbered group, |
| 41 | + '<' & '>' are literal, e.g. \g<0> |
| 42 | + or \g<name> (not \g0 or \gname) |
45 | 43 |
|
46 | 44 | Special character escapes are much like those already escaped in Python string
|
47 | 45 | literals. Hence regex '``\n``' is the same as regex '``\\n``'::
|
48 | 46 |
|
49 | 47 | \a ASCII Bell (BEL)
|
50 | 48 | \f ASCII Formfeed
|
51 | 49 | \n ASCII Linefeed
|
52 |
| - \r ASCII Carraige return |
| 50 | + \r ASCII Carriage return |
53 | 51 | \t ASCII Tab
|
54 | 52 | \v ASCII Vertical tab
|
55 | 53 | \\ A single backslash
|
56 |
| - |
57 |
| - \xHH Two digit hex character |
58 |
| - \OOO Three digit octal char |
59 |
| - (or use a preceding zero, e.g. \0, \09) |
60 |
| - \DD Decimal number 1 to 99, matches previous |
61 |
| - numbered group |
62 |
| - |
63 |
| -Extensions. These do not cause grouping, except for ``(?P<name>...)``:: |
64 |
| - |
65 |
| - (?iLmsux) Matches empty string, sets re.X flags |
66 |
| - (?:...) Non-capturing version of regular parentheses |
67 |
| - (?P<name>...) Creates a named capturing group. |
68 |
| - (?P=name) Matches whatever matched previously named group |
69 |
| - (?#...) A comment; ignored. |
70 |
| - (?=...) Lookahead assertion: Matches without consuming |
71 |
| - (?!...) Negative lookahead assertion |
72 |
| - (?<=...) Lookbehind assertion: Matches if preceded |
73 |
| - (?<!...) Negative lookbehind assertion |
74 |
| - (?(id)yes|no) Match 'yes' if group 'id' matched, else 'no' |
| 54 | + \xHH Two digit hexadecimal character |
| 55 | + \OOO Three digit octal char (or just use an |
| 56 | + initial zero, e.g. \0, \07) |
| 57 | + \DD Decimal number 1 to 99, match |
| 58 | + previous numbered group |
| 59 | + |
| 60 | +Extensions. Do not cause grouping, except '``P<name>``':: |
| 61 | + |
| 62 | + (?iLmsux) Match empty string, sets re.X flags |
| 63 | + (?:...) Non-capturing version of regular parens |
| 64 | + (?P<name>...) Create a named capturing group. |
| 65 | + (?P=name) Match whatever matched prev named group |
| 66 | + (?#...) A comment; ignored. |
| 67 | + (?=...) Lookahead assertion, match without consuming |
| 68 | + (?!...) Negative lookahead assertion |
| 69 | + (?<=...) Lookbehind assertion, match if preceded |
| 70 | + (?<!...) Negative lookbehind assertion |
| 71 | + (?(id)y|n) Match 'y' if group 'id' matched, else 'n' |
75 | 72 |
|
76 | 73 | Flags for re.compile(), etc. Combine with ``'|'``::
|
77 | 74 |
|
@@ -105,30 +102,30 @@ RegexObjects (returned from ``compile()``)::
|
105 | 102 | .split(string[, maxsplit]) -> list of strings
|
106 | 103 | .sub(repl, string[, count]) -> string
|
107 | 104 | .subn(repl, string[, count]) -> (string, int)
|
108 |
| - .flags # int passed to compile() |
109 |
| - .groups # int number of capturing groups |
110 |
| - .groupindex # {} maps group names to ints |
111 |
| - .pattern # string passed to compile() |
| 105 | + .flags # int, Passed to compile() |
| 106 | + .groups # int, Number of capturing groups |
| 107 | + .groupindex # {}, Maps group names to ints |
| 108 | + .pattern # string, Passed to compile() |
112 | 109 |
|
113 | 110 | MatchObjects (returned from ``match()`` and ``search()``)::
|
114 | 111 |
|
115 |
| - .expand(template) -> string, backslash and group expansion |
| 112 | + .expand(template) -> string, Backslash & group expansion |
116 | 113 | .group([group1...]) -> string or tuple of strings, 1 per arg
|
117 |
| - .groups([default]) -> (,) of all groups, non-matching=default |
118 |
| - .groupdict([default]) -> {} of named groups, non-matching=default |
119 |
| - .start([group]) -> int, start/end of substring matched by group |
120 |
| - .end([group]) (group defaults to 0, the whole match) |
| 114 | + .groups([default]) -> tuple of all groups, non-matching=default |
| 115 | + .groupdict([default]) -> {}, Named groups, non-matching=default |
| 116 | + .start([group]) -> int, Start/end of substring matched by group |
| 117 | + .end([group]) -> int, Group defaults to 0, the whole match |
121 | 118 | .span([group]) -> tuple (match.start(group), match.end(group))
|
122 |
| - .pos # value passed to search() or match() |
123 |
| - .endpos # " |
124 |
| - .lastindex # int index of last matched capturing group |
125 |
| - .lastgroup # string name of last matched capturing group |
126 |
| - .re # regex passed to search() or match() |
127 |
| - .string # string passed to search() or match() |
| 119 | + .pos int, Passed to search() or match() |
| 120 | + .endpos int, " |
| 121 | + .lastindex int, Index of last matched capturing group |
| 122 | + .lastgroup string, Name of last matched capturing group |
| 123 | + .re regex, As passed to search() or match() |
| 124 | + .string string, " |
128 | 125 |
|
129 | 126 |
|
130 | 127 | Gleaned from the python 2.7 're' docs. http://docs.python.org/library/re.html
|
131 | 128 |
|
132 |
| -:Version: v0.3.1 |
133 |
| -:Contact: tartley@tartley.com |
| 129 | +https://github.com/tartley/python-regex-cheatsheet |
| 130 | +Version: v0.3.3 |
134 | 131 |
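A few of the entries in the cheatsheet above can be checked with a short sketch. This is only an illustration, not part of the commit: the sample strings and the names ``html``, ``date_re`` and ``text`` are made up for the example, and it uses only the ``re`` calls the cheatsheet lists (``search()``, ``sub()``, and the MatchObject accessors). It should behave the same on Python 2.7 and Python 3.

```python
import re

# Quantifiers: '*' is greedy, '*?' is the non-greedy form.
html = "<b>bold</b>"                      # illustrative input
greedy = re.search(r"<.*>", html).group(0)
lazy = re.search(r"<.*?>", html).group(0)

# Extensions: (?P<name>...) creates a named capturing group.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
text = "on 2012-03-15"                    # illustrative input
m = date_re.search(text)

year = m.group("year")        # group looked up by name
fields = m.groupdict()        # dict of all named groups
month_span = m.span("month")  # (start, end) of the 'month' group
last = m.lastgroup            # name of the last matched capturing group

# \g<name> refers to a group inside a sub() replacement template.
swapped = date_re.sub(r"\g<month>/\g<year>", text)

# \DD (here \1) matches whatever the numbered group matched earlier.
doubled = re.search(r"(\w)\1", "hello").group(0)
```

With these inputs the greedy pattern takes the whole string while the non-greedy one stops at the first ``>``, and ``swapped`` keeps the unmatched tail ``-15`` because only the matched part is replaced.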
|