-
Notifications
You must be signed in to change notification settings - Fork 49
Escape sequence + empty char class tweaks #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -641,17 +641,67 @@ extension AST.Atom { | |||
case .scalar(let s): | |||
return Character(s) | |||
|
|||
case .escaped(let c): | |||
switch c { | |||
// TODO: Should we separate these into a separate enum? Or move the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but feel free to look first at the use sites of literalCharacterValue
to see if there's really two different questions an AST client is asking. There might be a literal value for the purposes of character class treatment vs a literal value for the purposes of performing the match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literalCharacterValue
is currently only really used for building a model character class, and not for regular matching. It seems like we might want to use these scalars for both though, I actually wonder whether we should convert these escape sequences into a .scalar
DSL atoms? Because ultimately it seems like these are just syntactic sugar for unicode scalar sequences, and it would allow them to be supported by the matching engine. Any thoughts?
Throw an error for unknown a-z escape sequences as well as non-ASCII non-whitespace escape sequences.
Certain escape sequences express a unicode scalar and as such are valid in a range.
This is now done from the DSLTree.
As per PCRE, Oniguruma, and ICU, a first character of `]` is treated as literal.
Argh, won't let me just reply in-line. I think converting anything that is literally matching a scalar value to a |
@swift-ci please test |
Going to land this so I can land more stuff on top of it, I can work on converting to scalars in a follow-up |
a-zA-Z
and non-ASCII non-whitespace escape sequences.]
as the first character is instead treated as literal, for consistency with PCRE, Oniguruma, and ICU.