Case-insensitive regex with repetitions does not match uppercase characters correctly #785

Closed
fwcd opened this issue Nov 8, 2024 · 2 comments · Fixed by #797
Labels
bug Something isn't working

Comments

@fwcd
Member

fwcd commented Nov 8, 2024

Description

In case-insensitive regular expressions containing repetitions (+ or *), the repeated part fails to match characters in the searched string whose case differs from the case used in the pattern.

Reproduction

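// Prints the first match of `pattern` in `s`, or "no match" if there is none.
func test(_ pattern: Regex<Substring>, on s: String) throws {
    print("\(s) contains \(try pattern.firstMatch(in: s)?.output ?? "no match")")
}

// Each pattern should match both the lowercase and the mixed-case input.
for s in ["ab", "Ab"] {
    try test(#/ab/#.ignoresCase(), on: s)   // no repetition
    try test(#/a*b/#.ignoresCase(), on: s)  // zero or more
    try test(#/a+b/#.ignoresCase(), on: s)  // one or more
}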

This snippet outputs:

ab contains ab
ab contains ab
ab contains ab
Ab contains Ab
Ab contains b
Ab contains no match

Expected behavior

I would expect it to output:

ab contains ab
ab contains ab
ab contains ab
Ab contains Ab
Ab contains Ab
Ab contains Ab

Environment

swift-driver version: 1.115 Apple Swift version 6.0.2 (swiftlang-6.0.2.1.2 clang-1600.0.26.4)
Target: arm64-apple-macosx15.0

Additional information

No response

@fwcd fwcd added the bug Something isn't working label Nov 8, 2024
@hamishknight hamishknight transferred this issue from swiftlang/swift Nov 8, 2024
@jdberry

jdberry commented Dec 13, 2024

Per swiftlang/swift#78155, optional characters indicated by ? also fail to match correctly.
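
To check whether a given toolchain is affected by the `?` variant, one option is to compare the case-insensitive pattern against an equivalent pattern that spells out both cases explicitly and does not rely on `ignoresCase()`. The following is a minimal sketch; the helper name `firstMatchText` is hypothetical, and the explicit-class pattern is offered as a possible workaround rather than anything the maintainers recommend. The `ignoresCase` column depends on the toolchain, so no output is shown for it.

```swift
/// Hypothetical helper: returns the first match of `pattern` in `input`, or nil.
func firstMatchText(_ pattern: Regex<Substring>, in input: String) throws -> Substring? {
    try pattern.firstMatch(in: input)?.output
}

for input in ["ab", "Ab"] {
    // Case-insensitive option applied to an optional character (the reported case).
    let viaOption = try firstMatchText(#/a?b/#.ignoresCase(), in: input)
    // Same intent, with both cases written out so no case folding is needed.
    let viaClasses = try firstMatchText(#/[Aa]?[Bb]/#, in: input)
    print("\(input): ignoresCase -> \(viaOption ?? "no match"), explicit classes -> \(viaClasses ?? "no match")")
}
```

If the two columns differ for "Ab", the toolchain likely still has the bug, and the explicit character classes can serve as a stopgap for ASCII patterns.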

natecook1000 added a commit that referenced this issue Jan 6, 2025
The optimized bytecode for matching repetition of a single character overlooks
case insensitivity. This resolves that by falling back to use an ASCII bitset
when doing a case-insensitive match of a cased character.

Fixes #785.
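
The idea described in the commit message can be illustrated with a small standalone sketch. This is not the library's actual bytecode or types: `ASCIIBitset`, `caseInsensitiveSet(for:)`, and `leadingRunLength(in:matching:)` are hypothetical names, and the code only assumes that a case-insensitive match of a cased ASCII character can be expressed as membership in a 128-bit set holding both cases.

```swift
/// Hypothetical 128-bit membership set over ASCII byte values.
struct ASCIIBitset {
    private var low: UInt64 = 0   // membership bits for bytes 0...63
    private var high: UInt64 = 0  // membership bits for bytes 64...127

    mutating func insert(_ byte: UInt8) {
        precondition(byte < 128, "ASCII only")
        if byte < 64 {
            low |= 1 << UInt64(byte)
        } else {
            high |= 1 << UInt64(byte - 64)
        }
    }

    func contains(_ byte: UInt8) -> Bool {
        guard byte < 128 else { return false }
        if byte < 64 {
            return (low >> UInt64(byte)) & 1 == 1
        } else {
            return (high >> UInt64(byte - 64)) & 1 == 1
        }
    }
}

/// Builds a set holding `byte` and, if it is an ASCII letter, its other case.
func caseInsensitiveSet(for byte: UInt8) -> ASCIIBitset {
    var set = ASCIIBitset()
    set.insert(byte)
    let isLetter = (65...90).contains(byte) || (97...122).contains(byte)
    if isLetter {
        set.insert(byte ^ 0x20)  // flipping bit 5 toggles ASCII letter case
    }
    return set
}

/// Greedily counts the leading bytes of `input` that belong to `set`,
/// which is the core step of matching a repetition such as `a*` or `a+`.
func leadingRunLength(in input: String, matching set: ASCIIBitset) -> Int {
    var count = 0
    for byte in input.utf8 {
        guard set.contains(byte) else { break }
        count += 1
    }
    return count
}

let aEitherCase = caseInsensitiveSet(for: UInt8(ascii: "a"))
print(leadingRunLength(in: "AaAb", matching: aEitherCase))  // prints 3
```

A comparison against the single byte `a` would stop at the first `A`, which mirrors the behavior reported in this issue; testing membership in a set containing both cases avoids that.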
@jdberry

jdberry commented Jan 8, 2025

@natecook1000 Should this also fix swiftlang/swift#78155?

natecook1000 added a commit that referenced this issue Jan 14, 2025
The optimized bytecode for matching repetition of a single character overlooks
case insensitivity. This resolves that by falling back to use an ASCII bitset
when doing a case-insensitive match of a cased character.

Fixes #785.