Don't parse a character property containing a backslash #301

hamishknight · 2022-04-19T12:07:03Z

Add backslash to the list of characters we don't consider valid for a character property name (previous rules implemented in #269). This means that we'll bail when attempting to lex a POSIX character property and instead lex a custom character class if we see a `\`. This allows e.g. `[:\Q :] \E]` to be lexed as a custom character class. For `\p{...}` this just means we'll emit a truncated invalid property error, which is arguably more in line with what the user was expecting.

I noticed this when digging through the ICU source code. It will bail out of parsing a POSIX character property if it encounters one of its known escape sequences (e.g. `\a`, `\e`, `\f`, ...). Interestingly, this doesn't cover character property escapes, e.g. `\d`, but it's not clear that is intentional. Given that backslash is not a valid character property character anyway, it seems reasonable to broaden this behavior to bail on any backslash.
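The bail-out rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Swift code from this PR: the names `try_lex_posix_property`, `classify`, and `INVALID_PROPERTY_NAME_CHARS` are hypothetical, and the set of rejected characters is deliberately reduced to the backslash, since that is the only character this description confirms.

```python
# Hypothetical sketch of the bail-out rule; not the PR's Swift lexer.
# Assumption: only backslash is listed here. The real list (see #269)
# contains other characters that this description does not enumerate.
INVALID_PROPERTY_NAME_CHARS = {"\\"}


def try_lex_posix_property(pattern, start=0):
    """Try to lex `[:name:]` beginning at `start`.

    Returns (name, end_index) on success, or None to signal that the
    caller should fall back to lexing a custom character class.
    """
    if not pattern.startswith("[:", start):
        return None
    close = pattern.find(":]", start + 2)
    if close == -1:
        return None
    name = pattern[start + 2:close]
    # Bail if the candidate name holds a character that can never be
    # part of a property name, such as a backslash.
    if any(ch in INVALID_PROPERTY_NAME_CHARS for ch in name):
        return None
    return name, close + 2


def classify(pattern):
    """Report which construct the sketch would lex at the pattern start."""
    if try_lex_posix_property(pattern) is not None:
        return "posix property"
    return "custom character class"
```

With this rule, `classify(r"[:\Q :] \E]")` falls back to a custom character class because the text between `[:` and `:]` contains a backslash, while `classify("[:alpha:]")` is still treated as a POSIX character property.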


hamishknight · 2022-04-19T12:07:11Z

@swift-ci please test

milseman

LGTM

hamishknight requested review from milseman and natecook1000 April 19, 2022 12:08

hamishknight mentioned this pull request Apr 19, 2022

Update Regex Syntax document for [:...:] changes #302

Merged

milseman approved these changes Apr 19, 2022

View reviewed changes

hamishknight merged commit 8068ea1 into swiftlang:main Apr 19, 2022

hamishknight deleted the yet-more-posix-quirks branch April 19, 2022 19:25

hamishknight mentioned this pull request Apr 21, 2022

[5.7] Cherry-pick parser changes to 5.7 #309

Merged
