Commit 3e2160c

Update local proposal copies (#317)
* Bring run time proposal up to speed
* Update repo doc to be in line with SE
1 parent 73a5ccf commit 3e2160c

2 files changed: +47 -9 lines changed

Documentation/Evolution/RegexSyntaxRunTimeConstruction.md

+22 -3
@@ -1,7 +1,12 @@
11

22
# Regex Syntax and Run-time Construction
33

4-
- Authors: [Hamish Knight](https://github.com/hamishknight), [Michael Ilseman](https://github.com/milseman)
4+
* Proposal: [SE-NNNN](NNNN-filename.md)
5+
* Authors: [Hamish Knight](https://github.com/hamishknight), [Michael Ilseman](https://github.com/milseman)
6+
* Review Manager: [Ben Cohen](https://github.com/airspeedswift)
7+
* Status: **Awaiting review**
8+
* Implementation: https://github.com/apple/swift-experimental-string-processing
9+
* Available in nightly toolchain snapshots with `import _StringProcessing`
510

611
## Introduction
712

@@ -81,11 +86,11 @@ We propose initializers to declare and compile a regex from syntax. Upon failure
8186
```swift
8287
extension Regex {
8388
/// Parse and compile `pattern`, resulting in a strongly-typed capture list.
84-
public init(compiling pattern: String, as: Output.Type = Output.self) throws
89+
public init(_ pattern: String, as: Output.Type = Output.self) throws
8590
}
8691
extension Regex where Output == AnyRegexOutput {
8792
/// Parse and compile `pattern`, resulting in an existentially-typed capture list.
88-
public init(compiling pattern: String) throws
93+
public init(_ pattern: String) throws
8994
}
9095
```
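
As a minimal sketch of how the renamed initializers might be called: the pattern and input strings below are illustrative and not taken from the proposal, and `wholeMatch(in:)` is the spelling that later shipped in Swift 5.7, which may differ from the spelling used at this revision.

```swift
// Illustrative only: the pattern and inputs are hypothetical.
// Assumes a toolchain that ships the Regex type (Swift 5.7 or later).

// Strongly typed: the caller states the expected output tuple up front.
// The first tuple element is the whole match, followed by the captures.
let typed = try Regex(#"(\d+)-(\d+)"#, as: (Substring, Substring, Substring).self)
let typedMatch = try typed.wholeMatch(in: "12-34")
let firstCapture = typedMatch?.output.1
let secondCapture = typedMatch?.output.2

// Existentially typed: the capture structure is only known at run time.
let erased = try Regex(#"(\d+)-(\d+)"#)
let erasedMatch = try erased.wholeMatch(in: "12-34")

// Invalid syntax surfaces as a thrown error, converted to nil here by `try?`.
let invalid: Regex<AnyRegexOutput>? = try? Regex("(unclosed")
```

The typed form moves a mismatch between the pattern and the expected tuple to construction time, while the existential form defers all capture inspection to the match.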
9196

@@ -156,6 +161,20 @@ extension Regex.Match where Output == AnyRegexOutput {
156161
}
157162
```
158163

164+
We propose adding API to query and access captures by name in an existentially typed regex match:
165+
166+
```swift
167+
extension Regex.Match where Output == AnyRegexOutput {
168+
/// If a named-capture with `name` is present, returns its value. Otherwise `nil`.
169+
public subscript(_ name: String) -> AnyRegexOutput.Element? { get }
170+
}
171+
172+
extension AnyRegexOutput {
173+
/// If a named-capture with `name` is present, returns its value. Otherwise `nil`.
174+
public subscript(_ name: String) -> AnyRegexOutput.Element? { get }
175+
}
176+
```
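
The named-capture subscripts above might be used roughly as follows. This is a sketch under assumptions: the pattern, capture names, and input are hypothetical, and `wholeMatch(in:)` and the `substring` property are the spellings that later shipped in Swift 5.7.

```swift
// Illustrative only: the pattern, capture names, and input are hypothetical.
let dateRegex = try Regex(#"(?<year>\d{4})-(?<month>\d{2})"#)
let dateMatch = try dateRegex.wholeMatch(in: "2022-04")

// Each lookup yields an optional element. `substring` is itself optional
// because a capture need not have participated in the match.
let year = dateMatch?["year"]?.substring
let month = dateMatch?["month"]?.substring

// A name that does not appear in the pattern yields nil rather than trapping.
let day = dateMatch?["day"]
```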
177+
159178
The rest of this proposal will be a detailed and exhaustive definition of our proposed regex syntax.
160179

161180
<details><summary>Grammar Notation</summary>

Documentation/Evolution/RegexTypeOverview.md

+25 -6
@@ -1,6 +1,11 @@
11
# Regex Type and Overview
22

3-
- Authors: [Michael Ilseman](https://github.com/milseman)
3+
* Proposal: [SE-0350](0350-regex-type-overview.md)
4+
* Authors: [Michael Ilseman](https://github.com/milseman)
5+
* Review Manager: [Ben Cohen](https://github.com/airspeedswift)
6+
* Status: **Active Review (4 - 28 April 2022)**
7+
* Implementation: https://github.com/apple/swift-experimental-string-processing
8+
* Available in nightly toolchain snapshots with `import _StringProcessing`
49

510
## Introduction
611

@@ -207,7 +212,7 @@ func processEntry(_ line: String) -> Transaction? {
207212
// amount: Substring
208213
// )>
209214

210-
guard let match = regex.matchWhole(line),
215+
guard let match = regex.wholeMatch(line),
211216
let kind = Transaction.Kind(match.kind),
212217
let date = try? Date(String(match.date), strategy: dateParser),
213218
let amount = try? Decimal(String(match.amount), format: decimalParser)
@@ -384,21 +389,25 @@ extension Regex.Match {
384389
// Run-time compilation interfaces
385390
extension Regex {
386391
/// Parse and compile `pattern`, resulting in a strongly-typed capture list.
387-
public init(compiling pattern: String, as: Output.Type = Output.self) throws
392+
public init(_ pattern: String, as: Output.Type = Output.self) throws
388393
}
389394
extension Regex where Output == AnyRegexOutput {
390395
/// Parse and compile `pattern`, resulting in an existentially-typed capture list.
391-
public init(compiling pattern: String) throws
396+
public init(_ pattern: String) throws
392397
}
393398
```
394399

400+
### Cancellation
401+
402+
Regex is somewhat different from existing standard library operations in that regex processing can be a long-running task.
403+
For this reason regex algorithms may check if the parent task has been cancelled and end execution.
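
The proposal text does not say how or where the engine performs this check. As a caller-side illustration of the same cooperative pattern, a loop over many inputs could check for cancellation between searches. The helper `countMatches` and its inputs are hypothetical, and `firstMatch(in:)` is the spelling that later shipped in Swift 5.7.

```swift
// Hypothetical helper, not part of the proposal: a caller-side loop that
// cooperates with task cancellation between regex searches.
func countMatches(in lines: [String], using regex: Regex<AnyRegexOutput>) throws -> Int {
    var count = 0
    for line in lines {
        // Throws CancellationError if the enclosing task has been cancelled;
        // outside of a task this check never throws.
        try Task.checkCancellation()
        if try regex.firstMatch(in: line) != nil {
            count += 1
        }
    }
    return count
}

let digits = try Regex(#"\d+"#)
let total = try countMatches(in: ["a1", "b", "c22"], using: digits)
```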
404+
395405
### On severability and related proposals
396406

397407
The proposal split presented is meant to aid focused discussion, while acknowledging that each is interconnected. The boundaries between them are not completely cut-and-dried and could be refined as they enter proposal phase.
398408

399409
Accepting this proposal in no way implies that all related proposals must be accepted. They are severable and each should stand on their own merit.
400410

401-
402411
## Source compatibility
403412

404413
Everything in this proposal is additive. Regex delimiters may have their own source compatibility impact, which is discussed in that proposal.
@@ -488,6 +497,16 @@ The generic parameter `Output` is proposed to contain both the whole match (the
488497

489498
The biggest issue with this alternative design is that the numbering of `Captures` elements misaligns with the numbering of captures in textual regexes, where backreference `\0` refers to the entire match and captures start at `\1`. This design would sacrifice familiarity and have the pitfall of introducing off-by-one errors.
490499

500+
### Encoding `Regex`es into the type system
501+
502+
During the initial review period the following comment was made:
503+
504+
> I think the goal should be that, at least for regex literals (and hopefully for the DSL to some extent), one day we might not even need a bytecode or interpreter. I think the ideal case is if each literal was its own function or type that gets generated and optimised as if you wrote it in Swift.
505+
506+
This is an approach that has been tried a few times in a few different languages (including by a few members of the Swift Standard Library and Core teams), and while it can produce attractive microbenchmarks, it has almost always proved to be a bad idea at the macro scale. In particular, even if we set aside witness tables and other associated Swift generics overhead, optimizing a fixed pipeline for each pattern you want to match causes significant code size expansion when there are multiple patterns in use, as compared to a more flexible bytecode interpreter. A bytecode interpreter makes better use of instruction caches and memory, and can also benefit from microarchitectural resources that are shared across different patterns. There is a tradeoff w.r.t. branch prediction resources, where separately compiled patterns may have more decisive branch history data, but a shared bytecode engine has much more data to use; this tradeoff tends to fall on the side of a bytecode engine, but it does not always do so.
507+
508+
It should also be noted that nothing prevents AOT or JIT compiling of the bytecode if we believe it will be advantageous, but compiling or interpreting arbitrary Swift code at runtime is rather more unattractive, since both the type system and language are undecidable. Even absent this rationale, we would probably not encode regex programs directly into the type system simply because it is unnecessarily complex.
509+
491510
### Future work: static optimization and compilation
492511

493512
Swift's support for static compilation is still developing, and future work here is leveraging that to compile regex when profitable. Many regex describe simple [DFAs](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) and can be statically compiled into very efficient programs. Full static compilation needs to be balanced with code size concerns, as a matching-specific bytecode is typically far smaller than a corresponding program (especially since the bytecode interpreter is shared).
@@ -551,4 +570,4 @@ Regexes are often used for tokenization and tokens can be represented with Swift
551570
552571
-->
553572

554-
[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md
573+
[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md
