
Don't parse a character property containing a backslash #301


Merged Apr 19, 2022 (1 commit)

Conversation

hamishknight
Contributor

Add backslash to the list of characters we don't consider valid for a character property name (previous rules implemented in #269). This means that if we see a `\` while attempting to lex a POSIX character property, we'll bail and lex a custom character class instead. This allows e.g. `[:\Q :] \E]` to be lexed as a custom character class. For `\p{...}` this just means we'll emit a truncated invalid property error, which is arguably more in line with what the user was expecting.

I noticed this when digging through the ICU source code. It will bail out of parsing a POSIX character property if it encounters one of its known escape sequences (e.g. `\a`, `\e`, `\f`, ...). Interestingly, this doesn't cover character property escapes, e.g. `\d`, but it's not clear whether that is intentional. Given that backslash is not a valid character property character anyway, it seems reasonable to broaden this behavior to bail on any backslash.
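To illustrate the bail-out described above, here is a minimal Python sketch of the idea. It is not the parser's actual code: the function name `try_lex_posix_property` and the contents of `INVALID_PROPERTY_CHARS` are hypothetical, and the only character this PR confirms as invalid is the backslash (the rest of the real list comes from #269 and is not reproduced here).

```python
# Hypothetical stand-in for the parser's list of characters that are not
# valid in a character property name. Only the backslash is confirmed by
# this PR; the brackets and braces are illustrative assumptions.
INVALID_PROPERTY_CHARS = set("[]{}\\")


def try_lex_posix_property(src, start=0):
    """Try to lex `[:name:]` at `start`.

    Returns (name, end_index) on success, or None to signal that the caller
    should fall back to lexing a custom character class.
    """
    if not src.startswith("[:", start):
        return None
    i = start + 2
    while i < len(src):
        if src.startswith(":]", i):
            # Found the closing delimiter: everything in between is the name.
            return src[start + 2:i], i + 2
        if src[i] in INVALID_PROPERTY_CHARS:
            # Bail on the first invalid character, including any backslash.
            return None
        i += 1
    # Ran off the end without finding `:]`.
    return None


print(try_lex_posix_property("[:alpha:]"))      # lexed as a POSIX property
print(try_lex_posix_property(r"[:\Q :] \E]"))   # bails because of the backslash
```

With this rule, `[:alpha:]` still lexes as a POSIX character property, while `[:\Q :] \E]` returns `None` at the first `\`, leaving the input to be lexed as a custom character class.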

@hamishknight
Contributor Author

@swift-ci please test

Member

@milseman milseman left a comment


LGTM

@hamishknight hamishknight merged commit 8068ea1 into swiftlang:main Apr 19, 2022
@hamishknight hamishknight deleted the yet-more-posix-quirks branch April 19, 2022 19:25