<!doctype html> <meta charset="utf8"> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css"> <link rel="spec" href="es2015" /> <pre class="metadata"> title: Regular Expression Pattern Modifiers for ECMAScript stage: 3 contributors: Ron Buckton, Ecma International </pre> <emu-biblio href="node_modules/@tc39/ecma262-biblio/biblio.json"></emu-biblio> <emu-intro id="sec-intro"> <h1>Introduction</h1> <p>See <a href="https://github.com/tc39/proposal-regexp-modifiers#readme">the proposal repository</a> for background material and discussion.</p> </emu-intro> <emu-clause id="sec-text-processing"> <h1>Text Processing</h1> <emu-clause id="sec-regexp-regular-expression-objects"> <h1>RegExp (Regular Expression) Objects</h1> <p>A RegExp object contains a regular expression and the associated flags.</p> <emu-note> <p>The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.</p> </emu-note> <emu-clause id="sec-patterns"> <h1>Patterns</h1> <p>The RegExp constructor applies the following grammar to the input pattern String. 
An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.</p> <h2>Syntax</h2> <emu-grammar type="definition"> Pattern[UnicodeMode, N] :: Disjunction[?UnicodeMode, ?N] Disjunction[UnicodeMode, N] :: Alternative[?UnicodeMode, ?N] Alternative[?UnicodeMode, ?N] `|` Disjunction[?UnicodeMode, ?N] Alternative[UnicodeMode, N] :: [empty] Alternative[?UnicodeMode, ?N] Term[?UnicodeMode, ?N] Term[UnicodeMode, N] :: Assertion[?UnicodeMode, ?N] Atom[?UnicodeMode, ?N] Atom[?UnicodeMode, ?N] Quantifier Assertion[UnicodeMode, N] :: `^` `$` `\` `b` `\` `B` `(` `?` `=` Disjunction[?UnicodeMode, ?N] `)` `(` `?` `!` Disjunction[?UnicodeMode, ?N] `)` `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)` `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)` Quantifier :: QuantifierPrefix QuantifierPrefix `?` QuantifierPrefix :: `*` `+` `?` `{` DecimalDigits[~Sep] `}` `{` DecimalDigits[~Sep] `,` `}` `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` Atom[UnicodeMode, N] :: PatternCharacter `.` `\` AtomEscape[?UnicodeMode, ?N] CharacterClass[?UnicodeMode] `(` GroupSpecifier[?UnicodeMode] Disjunction[?UnicodeMode, ?N] `)` <del>`(` `?` `:` Disjunction[?UnicodeMode, ?N] `)`</del> <ins>`(` `?` RegularExpressionFlags `:` Disjunction[?UnicodeMode, ?N] `)`</ins> <ins>`(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction[?UnicodeMode, ?N] `)`</ins> SyntaxCharacter :: one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|` PatternCharacter :: SourceCharacter but not SyntaxCharacter AtomEscape[UnicodeMode, N] :: DecimalEscape CharacterClassEscape[?UnicodeMode] CharacterEscape[?UnicodeMode] [+N] `k` GroupName[?UnicodeMode] CharacterEscape[UnicodeMode] :: ControlEscape `c` ControlLetter `0` [lookahead ∉ DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence[?UnicodeMode] IdentityEscape[?UnicodeMode] ControlEscape :: one of `f` `n` `r` `t` `v` ControlLetter :: one of `a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` 
`v` `w` `x` `y` `z` `A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z` GroupSpecifier[UnicodeMode] :: [empty] `?` GroupName[?UnicodeMode] GroupName[UnicodeMode] :: `<` RegExpIdentifierName[?UnicodeMode] `>` RegExpIdentifierName[UnicodeMode] :: RegExpIdentifierStart[?UnicodeMode] RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode] RegExpIdentifierStart[UnicodeMode] :: IdentifierStartChar `\` RegExpUnicodeEscapeSequence[+UnicodeMode] [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate RegExpIdentifierPart[UnicodeMode] :: IdentifierPartChar `\` RegExpUnicodeEscapeSequence[+UnicodeMode] [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate RegExpUnicodeEscapeSequence[UnicodeMode] :: [+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate [+UnicodeMode] `u` HexLeadSurrogate [+UnicodeMode] `u` HexTrailSurrogate [+UnicodeMode] `u` HexNonSurrogate [~UnicodeMode] `u` Hex4Digits [+UnicodeMode] `u{` CodePoint `}` UnicodeLeadSurrogate :: > any Unicode code point in the inclusive range 0xD800 to 0xDBFF UnicodeTrailSurrogate :: > any Unicode code point in the inclusive range 0xDC00 to 0xDFFF </emu-grammar> <p>Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.</p> <emu-grammar type="definition"> HexLeadSurrogate :: Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xD800 to 0xDBFF] HexTrailSurrogate :: Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xDC00 to 0xDFFF] HexNonSurrogate :: Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF] IdentityEscape[UnicodeMode] :: [+UnicodeMode] SyntaxCharacter [+UnicodeMode] `/` [~UnicodeMode] SourceCharacter but not UnicodeIDContinue DecimalEscape :: NonZeroDigit 
DecimalDigits[~Sep]? [lookahead ∉ DecimalDigit] CharacterClassEscape[UnicodeMode] :: `d` `D` `s` `S` `w` `W` [+UnicodeMode] `p{` UnicodePropertyValueExpression `}` [+UnicodeMode] `P{` UnicodePropertyValueExpression `}` UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue LoneUnicodePropertyNameOrValue UnicodePropertyName :: UnicodePropertyNameCharacters UnicodePropertyNameCharacters :: UnicodePropertyNameCharacter UnicodePropertyNameCharacters? UnicodePropertyValue :: UnicodePropertyValueCharacters LoneUnicodePropertyNameOrValue :: UnicodePropertyValueCharacters UnicodePropertyValueCharacters :: UnicodePropertyValueCharacter UnicodePropertyValueCharacters? UnicodePropertyValueCharacter :: UnicodePropertyNameCharacter DecimalDigit UnicodePropertyNameCharacter :: ControlLetter `_` CharacterClass[UnicodeMode] :: `[` [lookahead != `^`] ClassRanges[?UnicodeMode] `]` `[` `^` ClassRanges[?UnicodeMode] `]` ClassRanges[UnicodeMode] :: [empty] NonemptyClassRanges[?UnicodeMode] NonemptyClassRanges[UnicodeMode] :: ClassAtom[?UnicodeMode] ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode] ClassAtom[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode] NonemptyClassRangesNoDash[UnicodeMode] :: ClassAtom[?UnicodeMode] ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode] ClassAtomNoDash[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode] ClassAtom[UnicodeMode] :: `-` ClassAtomNoDash[?UnicodeMode] ClassAtomNoDash[UnicodeMode] :: SourceCharacter but not one of `\` or `]` or `-` `\` ClassEscape[?UnicodeMode] ClassEscape[UnicodeMode] :: `b` [+UnicodeMode] `-` CharacterClassEscape[?UnicodeMode] CharacterEscape[?UnicodeMode] </emu-grammar> <emu-note> <p>A number of productions in this section are given alternative definitions in section <emu-xref href="#sec-regular-expressions-patterns"></emu-xref>.</p> </emu-note> </emu-clause> <emu-clause id="sec-pattern-semantics"> <h1>Pattern Semantics</h1> 
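<emu-note>
<p>As an informal, non-normative illustration, the modifier forms of |Atom| added to the grammar above can be exercised as in the following sketch. It assumes a host that may or may not implement this proposal; `tryCompile`, `supported`, `cases`, `results`, and `rejected` are hypothetical names used only for this example and are not defined by this specification. The rejected patterns correspond to the rules in <emu-xref href="#sec-patterns-static-semantics-early-errors"></emu-xref>.</p>
<pre><code class="javascript">// Hypothetical helper (not part of this specification): compile a pattern,
// returning null when the host reports a SyntaxError for it.
function tryCompile(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

// Feature test: hosts without this proposal reject the modifier syntax.
const supported = tryCompile("(?i:a)") !== null;

// Each entry: [pattern source, outer flags, input, result expected under this proposal].
const cases = [
  ["(?i:a)b", "", "Ab", true],    // "i" applies only inside the group
  ["(?i:a)b", "", "AB", false],   // "b" outside the group stays case-sensitive
  ["(?-i:a)b", "i", "aB", true],  // "i" is removed only inside the group
  ["(?-i:a)b", "i", "Ab", false],
  ["a(?s:.)b", "", "a\nb", true], // "s" lets "." match a LineTerminator
  ["a.b", "", "a\nb", false],
];

// Compare each case against its expected result, but only where the syntax compiles.
const results = [];
if (supported) {
  for (const [source, flags, input, expected] of cases) {
    results.push(new RegExp(source, flags).test(input) === expected);
  }
}

// Patterns that the Early Errors rules reject: a repeated flag, a flag that is
// both added and removed, an empty add/remove pair, and a flag other than i, m, s.
const rejected = ["(?ii:a)", "(?i-i:a)", "(?-:a)", "(?u:a)"].map(function (source) {
  return tryCompile(source) === null;
});

console.log(supported, results, rejected);</code></pre>
<p>The feature test keeps the sketch usable on hosts that predate this proposal: there, every modifier pattern is a Syntax Error, so `results` stays empty while every entry of `rejected` is still *true*.</p>
</emu-note>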
<emu-clause id="sec-notation"> <h1>Notation</h1> <p>The descriptions below use the following aliases:</p> <ul> <li> _Input_ is a List whose elements are the characters of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation _Input_[_n_] means the _n_<sup>th</sup> character of _Input_, where _n_ can range between 0 (inclusive) and _InputLength_ (exclusive). </li> <li> _InputLength_ is the number of characters in _Input_. </li> <li> _NcapturingParens_ is the total number of left-capturing parentheses (i.e. the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes) in the pattern. A left-capturing parenthesis is any `(` pattern character that is matched by the `(` terminal of the <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> production. </li> <li> _DotAll_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"s"* and otherwise is *false*. </li> <li> _IgnoreCase_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"i"* and otherwise is *false*. </li> <li> _Multiline_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"m"* and otherwise is *false*. </li> <li> _Unicode_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"u"* and otherwise is *false*. </li> <li oldids="sec-runtime-semantics-wordcharacters-abstract-operation"> <del>_WordCharacters_ is the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_) is. 
_WordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _IgnoreCase_ are both *true*.</del> </li> </ul> <p>Furthermore, the descriptions below use the following internal data structures:</p> <ul> <li> A <em>CharSet</em> is a mathematical set of characters. When the _Unicode_ flag is *true*, “all characters” means the CharSet containing all code point values; otherwise “all characters” means the CharSet containing all code unit values. </li> <li> A <em>State</em> is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of _NcapturingParens_ values. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_<sup>th</sup> element of _captures_ is either a List of characters that represents the value obtained by the _n_<sup>th</sup> set of capturing parentheses or *undefined* if the _n_<sup>th</sup> set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process. </li> <li> A <em>MatchResult</em> is either a State or the special token ~failure~ that indicates that the match failed. </li> <li> A <em>Continuation</em> is an Abstract Closure that takes one State argument and returns a MatchResult result. The Continuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against _Input_, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns ~failure~. </li> <li> A <em>Matcher</em> is an Abstract Closure that takes two arguments—a State and a Continuation—and returns a MatchResult result. 
A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against _Input_, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted. </li> </ul> </emu-clause> <ins class="block"> <emu-clause id="sec-patterns-static-semantics-early-errors"> <h1>Static Semantics: Early Errors</h1> <emu-grammar>Atom :: `(` `?` RegularExpressionFlags `:` Disjunction `)`</emu-grammar> <ul> <li>It is a Syntax Error if the source text matched by |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or if it contains the same code point more than once. </ul> <emu-grammar>Atom :: `(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction `)`</emu-grammar> <ul> <li>It is a Syntax Error if the source text matched by the first |RegularExpressionFlags| and the source text matched by the second |RegularExpressionFlags| are both empty. <li>It is a Syntax Error if the source text matched by the first |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or contains the same code point more than once. <li>It is a Syntax Error if the source text matched by the second |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or contains the same code point more than once. <li>It is a Syntax Error if any code point in the source text matched by the first |RegularExpressionFlags| is also contained in the source text matched by the second |RegularExpressionFlags|. 
</ul> </emu-clause> <emu-clause id="sec-modifiers-records"> <h1>Modifiers Records</h1> <p>A <dfn variants="Modifiers Records">Modifiers Record</dfn> is a Record value used to encapsulate information about the regular expression flags that apply to a subpattern.</p> <p>Modifiers Records have the fields listed in <emu-xref href="#table-modifiers-record"></emu-xref>.</p> <emu-table id="table-modifiers-record" caption="Modifiers Record Fields"> <table> <tr> <th>Field Name</th> <th>Value</th> <th>Meaning</th> </tr> <tr> <td>[[DotAll]]</td> <td>a Boolean</td> <td>Indicates whether the *"s"* flag is currently enabled.</td> </tr> <tr> <td>[[IgnoreCase]]</td> <td>a Boolean</td> <td>Indicates whether the *"i"* flag is currently enabled.</td> </tr> <tr> <td>[[Multiline]]</td> <td>a Boolean</td> <td>Indicates whether the *"m"* flag is currently enabled.</td> </tr> </table> </emu-table> </emu-clause> </ins> <emu-clause id="sec-compilepattern" type="sdo" oldids="sec-pattern"> <h1>Runtime Semantics: CompilePattern</h1> <dl class="header"> <dt>description</dt> <dd>It returns an Abstract Closure that takes a String and a non-negative integer and returns a MatchResult.</dd> </dl> <emu-grammar>Pattern :: Disjunction</emu-grammar> <emu-alg> 1. <ins>Let _modifiers_ be the Modifiers Record { [[DotAll]]: _DotAll_, [[IgnoreCase]]: _IgnoreCase_, [[Multiline]]: _Multiline_ }.</ins> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>. 1. Return a new Abstract Closure with parameters (_str_, _index_) that captures _m_ and performs the following steps when called: 1. Assert: Type(_str_) is String. 1. Assert: _index_ is a non-negative integer which is ≤ the length of _str_. 1. If _Unicode_ is *true*, let _Input_ be StringToCodePoints(_str_). Otherwise, let _Input_ be a List whose elements are the code units that are the elements of _str_. 
_Input_ will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>. Each element of _Input_ is considered to be a character. 1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>. 1. Let _listIndex_ be the index into _Input_ of the character that was obtained from element _index_ of _str_. 1. Let _c_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _y_. 1. Let _cap_ be a List of _NcapturingParens_ *undefined* values, indexed 1 through _NcapturingParens_. 1. Let _x_ be the State (_listIndex_, _cap_). 1. Return _m_(_x_, _c_). </emu-alg> <emu-note> <p>A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. 
The algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref> are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a String cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).</p> </emu-note> </emu-clause> <emu-clause id="sec-compilesubpattern" type="sdo" oldids="sec-disjunction,sec-alternative,sec-term"> <h1> Runtime Semantics: CompileSubpattern ( _direction_: ~forward~ or ~backward~, <ins>_modifiers_: a Modifiers Record,</ins> ): a Matcher </h1> <dl class="header"> </dl> <emu-note> <p>This section is amended in B.1.2.4.</p> </emu-note> <!-- Disjunction --> <emu-grammar>Disjunction :: Alternative `|` Disjunction</emu-grammar> <emu-alg> 1. Let _m1_ be CompileSubpattern of |Alternative| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. Let _m2_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _r_ be _m1_(_x_, _c_). 1. If _r_ is not ~failure~, return _r_. 1. Return _m2_(_x_, _c_). </emu-alg> <emu-note> <p>The `|` regular expression operator separates two alternatives. The pattern first tries to match the left |Alternative| (followed by the sequel of the regular expression); if it fails, it tries to match the right |Disjunction| (followed by the sequel of the regular expression). If the left |Alternative|, the right |Disjunction|, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left |Alternative|. If choices in the left |Alternative| are exhausted, the right |Disjunction| is tried instead of the left |Alternative|. 
Any capturing parentheses inside a portion of the pattern skipped by `|` produce *undefined* values instead of Strings. Thus, for example,</p> <pre><code class="javascript">/a|ab/.exec("abc")</code></pre> <p>returns the result *"a"* and not *"ab"*. Moreover,</p> <pre><code class="javascript">/((a)|(ab))((c)|(bc))/.exec("abc")</code></pre> <p>returns the array</p> <pre><code class="javascript">["abc", "a", "a", undefined, "bc", undefined, "bc"]</code></pre> <p>and not</p> <pre><code class="javascript">["abc", "ab", undefined, "ab", "c", "c", undefined]</code></pre> <p>The order in which the two alternatives are tried is independent of the value of _direction_.</p> </emu-note> <!-- Alternative --> <emu-grammar>Alternative :: [empty]</emu-grammar> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Return _c_(_x_). </emu-alg> <emu-grammar>Alternative :: Alternative Term</emu-grammar> <emu-alg> 1. Let _m1_ be CompileSubpattern of |Alternative| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. Let _m2_ be CompileSubpattern of |Term| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. If _direction_ is ~forward~, then 1. Return a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _d_ be a new Continuation with parameters (_y_) that captures _c_ and _m2_ and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _m2_(_y_, _c_). 1. Return _m1_(_x_, _d_). 1. Else, 1. Assert: _direction_ is ~backward~. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m1_ and _m2_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. 
Let _d_ be a new Continuation with parameters (_y_) that captures _c_ and _m1_ and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _m1_(_y_, _c_). 1. Return _m2_(_x_, _d_). </emu-alg> <emu-note> <p>Consecutive |Term|s try to simultaneously match consecutive portions of _Input_. When _direction_ is ~forward~, if the left |Alternative|, the right |Term|, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right |Term|, and all choices in the right |Term| are tried before moving on to the next choice in the left |Alternative|. When _direction_ is ~backward~, the evaluation order of |Alternative| and |Term| is reversed.</p> </emu-note> <!-- Term --> <emu-grammar>Term :: Assertion</emu-grammar> <emu-alg> 1. Return CompileAssertion of |Assertion|<ins> with argument _modifiers_</ins>. </emu-alg> <emu-note> <p>The resulting Matcher is independent of _direction_.</p> </emu-note> <emu-grammar>Term :: Atom</emu-grammar> <emu-alg> 1. Return CompileAtom of |Atom| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. </emu-alg> <emu-grammar>Term :: Atom Quantifier</emu-grammar> <emu-alg> 1. Let _m_ be CompileAtom of |Atom| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. Let _q_ be CompileQuantifier of |Quantifier|. 1. Assert: _q_.[[Min]] ≤ _q_.[[Max]]. 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing this |Term|. 1. Let _parenCount_ be the number of left-capturing parentheses in |Atom|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes enclosed by |Atom|. 1. 
Return a new Matcher with parameters (_x_, _c_) that captures _m_, _q_, _parenIndex_, and _parenCount_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Return RepeatMatcher(_m_, _q_.[[Min]], _q_.[[Max]], _q_.[[Greedy]], _x_, _c_, _parenIndex_, _parenCount_). </emu-alg> </emu-clause> <emu-clause id="sec-compileassertion" type="sdo" oldids="sec-assertion"> <h1> Runtime Semantics: CompileAssertion ( <ins>_modifiers_: a Modifiers Record,</ins> ): a Matcher </h1> <dl class="header"> </dl> <emu-note> <p>This section is amended in B.1.2.5.</p> </emu-note> <emu-grammar>Assertion :: `^`</emu-grammar> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _e_ be _x_'s _endIndex_. 1. If _e_ = 0, or if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true* and the character _Input_[_e_ - 1] is one of |LineTerminator|, then 1. Return _c_(_x_). 1. Return ~failure~. </emu-alg> <emu-note> <p>Even when the `y` flag is used with a pattern, `^` always matches only at the beginning of _Input_, or (if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true*) at the beginning of a line.</p> </emu-note> <emu-grammar>Assertion :: `$`</emu-grammar> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _e_ be _x_'s _endIndex_. 1. If _e_ = _InputLength_, or if <del>_Multiline_</del><ins>_modifiers_.[[Multiline]]</ins> is *true* and the character _Input_[_e_] is one of |LineTerminator|, then 1. Return _c_(_x_). 1. Return ~failure~. </emu-alg> <emu-grammar>Assertion :: `\` `b`</emu-grammar> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called: 1. 
Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _e_ be _x_'s _endIndex_. 1. Let _a_ be IsWordChar(_e_ - 1<ins>, _modifiers_</ins>). 1. Let _b_ be IsWordChar(_e_<ins>, _modifiers_</ins>). 1. If _a_ is *true* and _b_ is *false*, or if _a_ is *false* and _b_ is *true*, return _c_(_x_). 1. Return ~failure~. </emu-alg> <emu-grammar>Assertion :: `\` `B`</emu-grammar> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures nothing and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _e_ be _x_'s _endIndex_. 1. Let _a_ be IsWordChar(_e_ - 1<ins>, _modifiers_</ins>). 1. Let _b_ be IsWordChar(_e_<ins>, _modifiers_</ins>). 1. If _a_ is *true* and _b_ is *true*, or if _a_ is *false* and _b_ is *false*, return _c_(_x_). 1. Return ~failure~. </emu-alg> <emu-grammar>Assertion :: `(` `?` `=` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _y_. 1. Let _r_ be _m_(_x_, _d_). 1. If _r_ is ~failure~, return ~failure~. 1. Let _y_ be _r_'s State. 1. Let _cap_ be _y_'s _captures_ List. 1. Let _xe_ be _x_'s _endIndex_. 1. Let _z_ be the State (_xe_, _cap_). 1. Return _c_(_z_). </emu-alg> <emu-grammar>Assertion :: `(` `?` `!` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~forward~<ins> and _modifiers_</ins>. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. 
Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _y_. 1. Let _r_ be _m_(_x_, _d_). 1. If _r_ is not ~failure~, return ~failure~. 1. Return _c_(_x_). </emu-alg> <emu-grammar>Assertion :: `(` `?` `<=` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~backward~<ins> and _modifiers_</ins>. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _y_. 1. Let _r_ be _m_(_x_, _d_). 1. If _r_ is ~failure~, return ~failure~. 1. Let _y_ be _r_'s State. 1. Let _cap_ be _y_'s _captures_ List. 1. Let _xe_ be _x_'s _endIndex_. 1. Let _z_ be the State (_xe_, _cap_). 1. Return _c_(_z_). </emu-alg> <emu-grammar>Assertion :: `(` `?` `<!` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> ~backward~<ins> and _modifiers_</ins>. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _d_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. 1. Return _y_. 1. Let _r_ be _m_(_x_, _d_). 1. If _r_ is not ~failure~, return ~failure~. 1. Return _c_(_x_). </emu-alg> <emu-clause id="sec-runtime-semantics-iswordchar-abstract-operation" type="abstract operation"> <h1> IsWordChar ( _e_: an integer, <ins>_modifiers_: a Modifiers Record,</ins> ) </h1> <dl class="header"> </dl> <emu-alg> 1. If _e_ = -1 or _e_ is _InputLength_, return *false*. 1. Let _c_ be the character _Input_[_e_]. 
1. <ins>Let _wordCharacters_ be GetWordCharacters(_modifiers_).</ins> 1. If _c_ is in <del>_WordCharacters_</del><ins>_wordCharacters_</ins>, return *true*. 1. Return *false*. </emu-alg> </emu-clause> </emu-clause> <emu-clause id="sec-compileatom" type="sdo" oldids="sec-atom,sec-atomescape,sec-characterescape,sec-decimalescape"> <h1> Runtime Semantics: CompileAtom ( _direction_: ~forward~ or ~backward~, <ins>_modifiers_: a Modifiers Record,</ins> ): a Matcher </h1> <dl class="header"> </dl> <emu-note> <p>This section is amended in B.1.2.6.</p> </emu-note> <!-- Atom --> <emu-grammar>Atom :: PatternCharacter</emu-grammar> <emu-alg> 1. Let _ch_ be the character matched by |PatternCharacter|. 1. Let _A_ be a one-element CharSet containing the character _ch_. 1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-grammar>Atom :: `.`</emu-grammar> <emu-alg> 1. Let _A_ be the CharSet of all characters. 1. If <del>_DotAll_</del><ins>_modifiers_.[[DotAll]]</ins> is not *true*, then 1. Remove from _A_ all characters corresponding to a code point on the right-hand side of the |LineTerminator| production. 1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-grammar>Atom :: CharacterClass</emu-grammar> <emu-alg> 1. Let _cc_ be CompileCharacterClass of |CharacterClass|. 1. Return CharacterSetMatcher(_cc_.[[CharSet]], _cc_.[[Invert]], _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> <emu-alg> 1. Let _m_ be CompileSubpattern of |Disjunction| with argument<ins>s</ins> _direction_<ins> and _modifiers_</ins>. 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Atom|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing this |Atom|. 1. 
Return a new Matcher with parameters (_x_, _c_) that captures _direction_, _m_, and _parenIndex_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _d_ be a new Continuation with parameters (_y_) that captures _x_, _c_, _direction_, and _parenIndex_ and performs the following steps when called: 1. Assert: _y_ is a State. 1. Let _cap_ be a copy of _y_'s _captures_ List. 1. Let _xe_ be _x_'s _endIndex_. 1. Let _ye_ be _y_'s _endIndex_. 1. If _direction_ is ~forward~, then 1. Assert: _xe_ ≤ _ye_. 1. Let _s_ be a List whose elements are the characters of _Input_ at indices _xe_ (inclusive) through _ye_ (exclusive). 1. Else, 1. Assert: _direction_ is ~backward~. 1. Assert: _ye_ ≤ _xe_. 1. Let _s_ be a List whose elements are the characters of _Input_ at indices _ye_ (inclusive) through _xe_ (exclusive). 1. Set _cap_[_parenIndex_ + 1] to _s_. 1. Let _z_ be the State (_ye_, _cap_). 1. Return _c_(_z_). 1. Return _m_(_x_, _d_). </emu-alg> <del class="block"> <emu-grammar>Atom :: `(` `?` `:` Disjunction `)`</emu-grammar> <emu-alg> 1. Return CompileSubpattern of |Disjunction| with argument _direction_. </emu-alg> </del> <ins class="block"> <emu-grammar>Atom :: `(` `?` RegularExpressionFlags `:` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _addModifiers_ be the source text matched by |RegularExpressionFlags|. 1. Let _removeModifiers_ be the empty String. 1. Let _newModifiers_ be UpdateModifiers(_modifiers_, CodePointsToString(_addModifiers_), _removeModifiers_). 1. Return CompileSubpattern of |Disjunction| with arguments _direction_ and _newModifiers_. </emu-alg> <emu-grammar>Atom :: `(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction `)`</emu-grammar> <emu-alg> 1. Let _addModifiers_ be the source text matched by the first |RegularExpressionFlags|. 1. Let _removeModifiers_ be the source text matched by the second |RegularExpressionFlags|. 1. 
Let _newModifiers_ be UpdateModifiers(_modifiers_, CodePointsToString(_addModifiers_), CodePointsToString(_removeModifiers_)). 1. Return CompileSubpattern of |Disjunction| with arguments _direction_ and _newModifiers_. </emu-alg> </ins> <!-- AtomEscape --> <emu-grammar>AtomEscape :: DecimalEscape</emu-grammar> <emu-alg> 1. Let _n_ be the CapturingGroupNumber of |DecimalEscape|. 1. Assert: _n_ ≤ _NcapturingParens_. 1. Return BackreferenceMatcher(_n_, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-note> <p>An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_<sup>th</sup> set of capturing parentheses (<emu-xref href="#sec-notation"></emu-xref>). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_<sup>th</sup> one is *undefined* because it has not captured anything, then the backreference always succeeds.</p> </emu-note> <emu-grammar>AtomEscape :: CharacterEscape</emu-grammar> <emu-alg> 1. Let _cv_ be the CharacterValue of |CharacterEscape|. 1. Let _ch_ be the character whose character value is _cv_. 1. Let _A_ be a one-element CharSet containing the character _ch_. 1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-grammar>AtomEscape :: CharacterClassEscape</emu-grammar> <emu-alg> 1. Let _A_ be CompileToCharSet of |CharacterClassEscape|. 1. Return CharacterSetMatcher(_A_, *false*, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-grammar>AtomEscape :: `k` GroupName</emu-grammar> <emu-alg> 1. Search the enclosing |Pattern| for an instance of a |GroupSpecifier| containing a |RegExpIdentifierName| which has a CapturingGroupName equal to the CapturingGroupName of the |RegExpIdentifierName| contained in |GroupName|. 1. Assert: A unique such |GroupSpecifier| is found. 1. 
Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located |GroupSpecifier|. This is the total number of <emu-grammar>Atom :: `(` GroupSpecifier Disjunction `)`</emu-grammar> Parse Nodes prior to or enclosing the located |GroupSpecifier|, including its immediately enclosing |Atom|. 1. Return BackreferenceMatcher(_parenIndex_, _direction_<ins>, _modifiers_</ins>). </emu-alg> <emu-clause id="sec-runtime-semantics-charactersetmatcher-abstract-operation" type="abstract operation"> <h1> CharacterSetMatcher ( _A_: a CharSet, _invert_: a Boolean, _direction_: ~forward~ or ~backward~, <ins>_modifiers_: a Modifiers Record,</ins> ): a Matcher </h1> <dl class="header"> </dl> <emu-alg> 1. Return a new Matcher with parameters (_x_, _c_) that captures _A_, _invert_, <del>and </del>_direction_<ins>, and _modifiers_</ins> and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _e_ be _x_'s _endIndex_. 1. If _direction_ is ~forward~, let _f_ be _e_ + 1. 1. Else, let _f_ be _e_ - 1. 1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~. 1. Let _index_ be min(_e_, _f_). 1. Let _ch_ be the character _Input_[_index_]. 1. Let _cc_ be Canonicalize(_ch_<ins>, _modifiers_</ins>). 1. If there exists a member _a_ of _A_ such that Canonicalize(_a_<ins>, _modifiers_</ins>) is _cc_, let _found_ be *true*. Otherwise, let _found_ be *false*. 1. If _invert_ is *false* and _found_ is *false*, return ~failure~. 1. If _invert_ is *true* and _found_ is *true*, return ~failure~. 1. Let _cap_ be _x_'s _captures_ List. 1. Let _y_ be the State (_f_, _cap_). 1. Return _c_(_y_). </emu-alg> </emu-clause> <emu-clause id="sec-backreference-matcher" type="abstract operation"> <h1> BackreferenceMatcher ( _n_: a positive integer, _direction_: ~forward~ or ~backward~, <ins>_modifiers_: a Modifiers Record,</ins> ): a Matcher </h1> <dl class="header"> </dl> <emu-alg> 1. Assert: _n_ ≥ 1. 1. 
Return a new Matcher with parameters (_x_, _c_) that captures _n_<ins>,</ins> <del>and </del>_direction_<ins>, and _modifiers_</ins> and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. 1. Let _cap_ be _x_'s _captures_ List. 1. Let _s_ be _cap_[_n_]. 1. If _s_ is *undefined*, return _c_(_x_). 1. Let _e_ be _x_'s _endIndex_. 1. Let _len_ be the number of elements in _s_. 1. If _direction_ is ~forward~, let _f_ be _e_ + _len_. 1. Else, let _f_ be _e_ - _len_. 1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~. 1. Let _g_ be min(_e_, _f_). 1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]<ins>, _modifiers_</ins>) is not the same character value as Canonicalize(_Input_[_g_ + _i_]<ins>, _modifiers_</ins>), return ~failure~. 1. Let _y_ be the State (_f_, _cap_). 1. Return _c_(_y_). </emu-alg> </emu-clause> <emu-clause id="sec-runtime-semantics-canonicalize-ch" type="abstract operation"> <h1> Canonicalize ( _ch_: a character, <ins>_modifiers_: a Modifiers Record,</ins> ): a character </h1> <dl class="header"> </dl> <emu-alg> 1. If _Unicode_ is *true* and <del>_IgnoreCase_</del><ins>_modifiers_.[[IgnoreCase]]</ins> is *true*, then 1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for _ch_, return the result of applying that mapping to _ch_. 1. Return _ch_. 1. If <del>_IgnoreCase_</del><ins>_modifiers_.[[IgnoreCase]]</ins> is *false*, return _ch_. 1. Assert: _ch_ is a UTF-16 code unit. 1. Let _cp_ be the code point whose numeric value is that of _ch_. 1. Let _u_ be the result of toUppercase(« _cp_ »), according to the Unicode Default Case Conversion algorithm. 1. Let _uStr_ be CodePointsToString(_u_). 1. If _uStr_ does not consist of a single code unit, return _ch_. 1. Let _cu_ be _uStr_'s single code unit element. 1. If the numeric value of _ch_ ≥ 128 and the numeric value of _cu_ < 128, return _ch_. 1. Return _cu_. 
</emu-alg> <emu-note> <p>Parentheses of the form `(` |Disjunction| `)` serve both to group the components of the |Disjunction| pattern together and to save the result of the match. The result can be used either in a backreference (`\\` followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form `(?:` |Disjunction| `)` instead.</p> </emu-note> <emu-note> <p>The form `(?=` |Disjunction| `)` specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside |Disjunction| must match at the current position, but the current position is not advanced before matching the sequel. If |Disjunction| can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a `(?=` form (this unusual behaviour is inherited from Perl). This only matters when the |Disjunction| contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.</p> <p>For example,</p> <pre><code class="javascript">/(?=(a+))/.exec("baaabac")</code></pre> <p>matches the empty String immediately after the first `b` and therefore returns the array:</p> <pre><code class="javascript">["", "aaa"]</code></pre> <p>To illustrate the lack of backtracking into the lookahead, consider:</p> <pre><code class="javascript">/(?=(a+))a*b\1/.exec("baaabac")</code></pre> <p>This expression returns</p> <pre><code class="javascript">["aba", "a"]</code></pre> <p>and not:</p> <pre><code class="javascript">["aaaba", "a"]</code></pre> </emu-note> <emu-note> <p>The form `(?!` |Disjunction| `)` specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside |Disjunction| must fail to match at the current position. The current position is not advanced before matching the sequel. 
|Disjunction| can contain capturing parentheses, but backreferences to them only make sense from within |Disjunction| itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return *undefined* because the negative lookahead must fail for the pattern to succeed. For example,</p> <pre><code class="javascript">/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")</code></pre> <p>looks for an `a` not immediately followed by some positive number n of `a`'s, a `b`, another n `a`'s (specified by the first `\\2`) and a `c`. The second `\\2` is outside the negative lookahead, so it matches against *undefined* and therefore always succeeds. The whole expression returns the array:</p> <pre><code class="javascript">["baaabaac", "ba", undefined, "abaac"]</code></pre> </emu-note> <emu-note> <p>In case-insignificant matches when _Unicode_ is *true*, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, `ß` (U+00DF) to `SS`. It may however map a code point outside the Basic Latin range to a character within, for example, `ſ` (U+017F) to `s`. Such characters are not mapped if _Unicode_ is *false*. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as `/[a-z]/i`, but they will match `/[a-z]/ui`.</p> </emu-note> </emu-clause> </emu-clause> <emu-clause id="sec-compiletocharset" type="sdo" oldids="sec-classranges,sec-nonemptyclassranges,sec-nonemptyclassrangesnodash,sec-classatom,sec-classatomnodash,sec-classescape,sec-characterclassescape"> <h1>Runtime Semantics: CompileToCharSet ( ): a CharSet</h1> <dl class="header"> </dl> <emu-note> <p>This section is amended in <emu-xref href="#sec-compiletocharset-annexb"></emu-xref>.</p> </emu-note> <!-- ClassRanges --> <emu-grammar>ClassRanges :: [empty]</emu-grammar> <emu-alg> 1. 
Return the empty CharSet. </emu-alg> <!-- NonemptyClassRanges --> <emu-grammar>NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash</emu-grammar> <emu-alg> 1. Let _A_ be CompileToCharSet of |ClassAtom|. 1. Let _B_ be CompileToCharSet of |NonemptyClassRangesNoDash|. 1. Return the union of CharSets _A_ and _B_. </emu-alg> <emu-grammar>NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges</emu-grammar> <emu-alg> 1. Let _A_ be CompileToCharSet of the first |ClassAtom|. 1. Let _B_ be CompileToCharSet of the second |ClassAtom|. 1. Let _C_ be CompileToCharSet of |ClassRanges|. 1. Let _D_ be CharacterRange(_A_, _B_). 1. Return the union of _D_ and _C_. </emu-alg> <!-- NonemptyClassRangesNoDash --> <emu-grammar>NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash</emu-grammar> <emu-alg> 1. Let _A_ be CompileToCharSet of |ClassAtomNoDash|. 1. Let _B_ be CompileToCharSet of |NonemptyClassRangesNoDash|. 1. Return the union of CharSets _A_ and _B_. </emu-alg> <emu-grammar>NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges</emu-grammar> <emu-alg> 1. Let _A_ be CompileToCharSet of |ClassAtomNoDash|. 1. Let _B_ be CompileToCharSet of |ClassAtom|. 1. Let _C_ be CompileToCharSet of |ClassRanges|. 1. Let _D_ be CharacterRange(_A_, _B_). 1. Return the union of _D_ and _C_. </emu-alg> <emu-note> <p>|ClassRanges| can expand into a single |ClassAtom| and/or ranges of two |ClassAtom| separated by dashes. In the latter case the |ClassRanges| includes all characters between the first |ClassAtom| and the second |ClassAtom|, inclusive; an error occurs if either |ClassAtom| does not represent a single character (for example, if one is \w) or if the first |ClassAtom|'s character value is greater than the second |ClassAtom|'s character value.</p> </emu-note> <emu-note> <p>Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. 
Thus, for example, the pattern `/[E-F]/i` matches only the letters `E`, `F`, `e`, and `f`, while the pattern `/[E-f]/i` matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols `[`, `\\`, `]`, `^`, `_`, and <code>`</code>.</p> </emu-note> <emu-note> <p>A `-` character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of |ClassRanges|, the beginning or end limit of a range specification, or immediately follows a range specification.</p> </emu-note> <!-- ClassAtom --> <emu-grammar>ClassAtom :: `-`</emu-grammar> <emu-alg> 1. Return the CharSet containing the single character `-` U+002D (HYPHEN-MINUS). </emu-alg> <!-- ClassAtomNoDash --> <emu-grammar>ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-`</emu-grammar> <emu-alg> 1. Return the CharSet containing the character matched by |SourceCharacter|. </emu-alg> <!-- ClassEscape --> <emu-grammar> ClassEscape :: `b` ClassEscape :: `-` ClassEscape :: CharacterEscape </emu-grammar> <emu-alg> 1. Let _cv_ be the CharacterValue of this |ClassEscape|. 1. Let _c_ be the character whose character value is _cv_. 1. Return the CharSet containing the single character _c_. </emu-alg> <emu-note> <p>A |ClassAtom| can use any of the escape sequences that are allowed in the rest of the regular expression except for `\\b`, `\\B`, and backreferences. Inside a |CharacterClass|, `\\b` means the backspace character, while `\\B` and backreferences raise errors. Using a backreference inside a |ClassAtom| causes an error.</p> </emu-note> <!-- CharacterClassEscape --> <emu-grammar>CharacterClassEscape :: `d`</emu-grammar> <emu-alg> 1. Return the ten-element CharSet containing the characters `0` through `9` inclusive. </emu-alg> <emu-grammar>CharacterClassEscape :: `D`</emu-grammar> <emu-alg> 1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `d`</emu-grammar> . 
</emu-alg> <emu-grammar>CharacterClassEscape :: `s`</emu-grammar> <emu-alg> 1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the |WhiteSpace| or |LineTerminator| productions. </emu-alg> <emu-grammar>CharacterClassEscape :: `S`</emu-grammar> <emu-alg> 1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `s`</emu-grammar> . </emu-alg> <emu-grammar>CharacterClassEscape :: `w`</emu-grammar> <emu-alg> 1. Return <del>_WordCharacters_</del><ins>GetWordCharacters(_modifiers_)</ins>. </emu-alg> <emu-grammar>CharacterClassEscape :: `W`</emu-grammar> <emu-alg> 1. Return the CharSet containing all characters not in the CharSet returned by <emu-grammar>CharacterClassEscape :: `w`</emu-grammar> . </emu-alg> <emu-grammar>CharacterClassEscape :: `p{` UnicodePropertyValueExpression `}`</emu-grammar> <emu-alg> 1. Return the CharSet containing all Unicode code points included in CompileToCharSet of |UnicodePropertyValueExpression|. </emu-alg> <emu-grammar>CharacterClassEscape :: `P{` UnicodePropertyValueExpression `}`</emu-grammar> <emu-alg> 1. Return the CharSet containing all Unicode code points not included in CompileToCharSet of |UnicodePropertyValueExpression|. </emu-alg> <emu-grammar>UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue</emu-grammar> <emu-alg> 1. Let _ps_ be SourceText of |UnicodePropertyName|. 1. Let _p_ be UnicodeMatchProperty(_ps_). 1. Assert: _p_ is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>. 1. Let _vs_ be SourceText of |UnicodePropertyValue|. 1. Let _v_ be UnicodeMatchPropertyValue(_p_, _vs_). 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value _v_. 
</emu-alg> <emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar> <emu-alg> 1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|. 1. If UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, then 1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value _s_. 1. Let _p_ be UnicodeMatchProperty(_s_). 1. Assert: _p_ is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>. 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value “True”. </emu-alg> </emu-clause> <ins class="block"> <emu-clause id="sec-getwordcharacters" type="abstract operation"> <h1> <ins> GetWordCharacters ( _modifiers_: a Modifiers Record, ): a CharSet </ins> </h1> <dl class="header"> </dl> <emu-alg> 1. Let _wordCharacters_ be the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_, _modifiers_) is. 1. Return _wordCharacters_. </emu-alg> <emu-note> _wordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _modifiers_.[[IgnoreCase]] are both *true*. 
</emu-note> </emu-clause> <emu-clause id="sec-updatemodifiers" type="abstract operation"> <h1> <ins> UpdateModifiers ( _modifiers_: a Modifiers Record, _add_: a String, _remove_: a String, ): a Modifiers Record </ins> </h1> <dl class="header"> </dl> <emu-alg> 1. Let _dotAll_ be _modifiers_.[[DotAll]]. 1. Let _ignoreCase_ be _modifiers_.[[IgnoreCase]]. 1. Let _multiline_ be _modifiers_.[[Multiline]]. 1. If _add_ contains *"s"*, set _dotAll_ to *true*. 1. If _add_ contains *"i"*, set _ignoreCase_ to *true*. 1. If _add_ contains *"m"*, set _multiline_ to *true*. 1. If _remove_ contains *"s"*, set _dotAll_ to *false*. 1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*. 1. If _remove_ contains *"m"*, set _multiline_ to *false*. 1. Return the Modifiers Record { [[DotAll]]: _dotAll_, [[IgnoreCase]]: _ignoreCase_, [[Multiline]]: _multiline_ }. </emu-alg> </emu-clause> </ins> </emu-clause> </emu-clause> </emu-clause> <emu-annex id="sec-additional-ecmascript-features-for-web-browsers" namespace="annexB" normative> <h1>Additional ECMAScript Features for Web Browsers</h1> <emu-annex id="sec-additional-syntax"> <h1>Additional Syntax</h1> <emu-annex id="sec-regular-expressions-patterns"> <h1>Regular Expressions Patterns</h1> <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.</p> <p>This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. 
However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.</p> <h2>Syntax</h2> <emu-grammar type="definition"> Term[UnicodeMode, N] :: [+UnicodeMode] Assertion[+UnicodeMode, ?N] [+UnicodeMode] Atom[+UnicodeMode, ?N] Quantifier [+UnicodeMode] Atom[+UnicodeMode, ?N] [~UnicodeMode] QuantifiableAssertion[?N] Quantifier [~UnicodeMode] Assertion[~UnicodeMode, ?N] [~UnicodeMode] ExtendedAtom[?N] Quantifier [~UnicodeMode] ExtendedAtom[?N] Assertion[UnicodeMode, N] :: `^` `$` `\` `b` `\` `B` [+UnicodeMode] `(` `?` `=` Disjunction[+UnicodeMode, ?N] `)` [+UnicodeMode] `(` `?` `!` Disjunction[+UnicodeMode, ?N] `)` [~UnicodeMode] QuantifiableAssertion[?N] `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)` `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)` QuantifiableAssertion[N] :: `(` `?` `=` Disjunction[~UnicodeMode, ?N] `)` `(` `?` `!` Disjunction[~UnicodeMode, ?N] `)` ExtendedAtom[N] :: `.` `\` AtomEscape[~UnicodeMode, ?N] `\` [lookahead == `c`] CharacterClass[~UnicodeMode] `(` Disjunction[~UnicodeMode, ?N] `)` <del>`(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`</del> <ins>`(` `?` RegularExpressionFlags `:` Disjunction[~UnicodeMode, ?N] `)`</ins> <ins>`(` `?` RegularExpressionFlags `-` RegularExpressionFlags `:` Disjunction[~UnicodeMode, ?N] `)`</ins> InvalidBracedQuantifier ExtendedPatternCharacter InvalidBracedQuantifier :: `{` DecimalDigits[~Sep] `}` `{` DecimalDigits[~Sep] `,` `}` `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` ExtendedPatternCharacter :: SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|` AtomEscape[UnicodeMode, N] :: [+UnicodeMode] DecimalEscape [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_] CharacterClassEscape[?UnicodeMode] CharacterEscape[?UnicodeMode, ?N] [+N] `k` GroupName[?UnicodeMode] CharacterEscape[UnicodeMode, N] :: ControlEscape `c` ControlLetter `0` [lookahead 
∉ DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence[?UnicodeMode] [~UnicodeMode] LegacyOctalEscapeSequence IdentityEscape[?UnicodeMode, ?N] IdentityEscape[UnicodeMode, N] :: [+UnicodeMode] SyntaxCharacter [+UnicodeMode] `/` [~UnicodeMode] SourceCharacterIdentityEscape[?N] SourceCharacterIdentityEscape[N] :: [~N] SourceCharacter but not `c` [+N] SourceCharacter but not one of `c` or `k` ClassAtomNoDash[UnicodeMode, N] :: SourceCharacter but not one of `\` or `]` or `-` `\` ClassEscape[?UnicodeMode, ?N] `\` [lookahead == `c`] ClassEscape[UnicodeMode, N] :: `b` [+UnicodeMode] `-` [~UnicodeMode] `c` ClassControlLetter CharacterClassEscape[?UnicodeMode] CharacterEscape[?UnicodeMode, ?N] ClassControlLetter :: DecimalDigit `_` </emu-grammar> <emu-note> <p>When the same left-hand side occurs with both [+UnicodeMode] and [\~UnicodeMode] guards, it is to control the disambiguation priority.</p> </emu-note> </emu-annex> </emu-annex> </emu-annex>
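<p>As a non-normative illustration, the steps of the UpdateModifiers abstract operation specified earlier in this document can be sketched in JavaScript. This is only a reading aid under stated assumptions: the function name `updateModifiers` and the plain-object shape `{ dotAll, ignoreCase, multiline }` standing in for a Modifiers Record are hypothetical and are not part of any engine API or of this specification.</p>

```javascript
// Hypothetical sketch of the UpdateModifiers steps; not an engine API.
// A Modifiers Record is modelled here as a plain object (an assumption).
function updateModifiers(modifiers, add, remove) {
  // Start from the enclosing modifiers without mutating the input record.
  let { dotAll, ignoreCase, multiline } = modifiers;

  // Flags listed before the `-` (or before `:` when no `-` is present) are turned on.
  if (add.includes("s")) dotAll = true;
  if (add.includes("i")) ignoreCase = true;
  if (add.includes("m")) multiline = true;

  // Flags listed after the `-` are turned off.
  if (remove.includes("s")) dotAll = false;
  if (remove.includes("i")) ignoreCase = false;
  if (remove.includes("m")) multiline = false;

  // A fresh record is returned for use inside the group only.
  return { dotAll, ignoreCase, multiline };
}

// Modelling a group such as `(?i-m:...)` nested in a pattern compiled with the `m` flag:
const outer = { dotAll: false, ignoreCase: false, multiline: true };
const inner = updateModifiers(outer, "i", "m");
```

<p>In this sketch `inner` describes the modifiers in effect for the |Disjunction| inside the group, while `outer` is left unchanged and continues to apply to the rest of the pattern.</p>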