Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

literal pattern lowering: use the pattern's type instead of the literal's in const_to_pat #138992

Merged
merged 3 commits into from
Apr 2, 2025

Conversation

dianne
Copy link
Contributor

@dianne dianne commented Mar 26, 2025

This has two purposes:

  • First, it enables removing the treat_byte_string_as_slice fields from TypeckResults and ConstToPat. A byte string pattern's type will be &[u8] when matching on a slice reference, so const_to_pat will lower it to a slice ref pattern. I believe this is tested by tests/ui/match/pattern-deref-miscompile.rs.
  • Second, it will simplify the implementation of byte string literals in deref patterns. If byte string patterns can be given the type [u8; N] or [u8] during HIR typeck, then nothing needs to be changed in const_to_pat in order to lower the patterns deref!(b"..."): Vec<u8> and deref!(b"..."): Box<[u8; 3]>.

Implementation-wise, this uses lit_to_const to make a const with the pattern's type and the literal's valtree; that feels to me like the best way to make sure that the valtree representations of the pattern type and literal are the same. Though it may necessitate later changes to lit_to_const to accommodate giving byte string literal patterns non-reference types—would that be reasonable?

This unfortunately doesn't work for the string_deref_patterns feature (since that gives string literal patterns the String type), so I added a workaround for that. However, once deref_patterns supports string literals, it may be able to replace string_deref_patterns; the special case for String can removed at that point.

r? @oli-obk

dianne added 2 commits March 26, 2025 10:10
This allows us to remove the field `treat_byte_string_as_slice` from
`TypeckResults`, since the pattern's type contains everything necessary
to get the correct lowering for byte string literal patterns.

This leaves the implementation of `string_deref_patterns` broken, to be
fixed in the next commit.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 26, 2025
@rustbot
Copy link
Collaborator

rustbot commented Mar 26, 2025

Some changes occurred in match checking

cc @Nadrieril

@dianne
Copy link
Contributor Author

dianne commented Mar 26, 2025

This unfortunately doesn't work for the string_deref_patterns feature (since that gives string literal patterns the String type), so I added a workaround for that. However, once deref_patterns supports string literals, it may be able to replace string_deref_patterns; the special case for String can removed at that point.

Actually, maybe it would make more sense for this to only use the pattern type for byte string literals. It's hacky either way, but for deref_patterns to work with normal string literals, it's nicer to have a &str constant. We need &strs to do string comparisons, and otherwise it'd require a special case for str in const_to_pat and lit_to_const.

Edit:

Though it may necessitate later changes to lit_to_const to accommodate giving byte string literal patterns non-reference types—would that be reasonable?

On a technical level at least, this does seem to work; nothing's stopping a ConstKind::Value const from having an unsized type, and the tests seem to pass. Maybe it's a bit unusual to make a const of an unsized type though? The alternative I can see here would be to make a constant of type &[u8; N] and override const_to_pat to use the pattern's type instead of the constant's (trusting that the valtree representations are the same). That feels shakier to me, though; I like that lit_to_const has checks and will ICE very predictably if something's wrong.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2025

Since String is a lang item, could we just require its valtree to be the same as str. At present this simply means that you'd change const_to_pat to support String where it supports str, no need to add any logic that would ever produce such a valtree on its own

@dianne
Copy link
Contributor Author

dianne commented Apr 1, 2025

The one awkward thing I can see with that is that errors for trying to use non-StructuralPartialEq consts as patterns aren't caught until valtree_to_pat, after the valtree's been made; there'd need to be some intervention to handle users trying to use String::new() as a pattern1. I'm not sure what the status of structural match is, but I have to imagine valtrees for String aren't constructed in any other situation, given its current valtree representation isn't equality-preserving. So I don't think it'd break anything to add a special case for string_deref_patterns, but it'd be a bit more involved2.

Footnotes

  1. Funnily, since String derives PartialEq, it gets past the first check for shallow structural equality; the error message for using String::new() as a pattern (including on stable) is "constant of non-structural type Vec<u8> in a pattern"

  2. Unless errors for non-structural types get moved earlier, e.g. into const_to_valtree_inner. That might be conceptually cleaner anyway.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2025

Pretty sure valtree construction fails on String. Const to pat never sees it

@dianne
Copy link
Contributor Author

dianne commented Apr 1, 2025

That's what I would have thought, since I'd expect it to have a pointer in it, but debug logging for valtree_to_pat says otherwise: using const S: String = String::new(); as a pattern results in the valtree

Branch([Branch([Branch([Branch([Branch([Branch([Leaf(0x0000000000000001)]), Branch([])]), Branch([Leaf(0x0000000000000000)]), Branch([])]), Branch([])]), Leaf(0x0000000000000000)])])

The check/error for structural equality is in valtree_to_pat as well, and that can be seen on playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=8f872de7835eaf0ad3363b8676fb8685.

Maybe there's a bug in valtree construction and it's not getting stopped where it should?

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2025

Ah, I stand corrected. Thank you

is where I thought we'd check struct eq.

Hmm... Funny wrt Vec being the non struct eq type.

So yea, we should try moving the struct eq checks into valtree creation. Maybe not in this PR tho. I think we can land your PR as is, but I think we'll need some more tests. Consts of SomeStruct<[u8]> type and SomeStruct<[u8;10]> with the corresponding other type as the scrutinee type. There's probably other things we aren't testing that may be affected by this PR

@dianne
Copy link
Contributor Author

dianne commented Apr 2, 2025

I've added some tests for those cases, as well as for consts of types &[u8] and &[u8;3]. It looks like they're unaffected by this change. As for other possibly missing tests, the main things I can think of that could be affected are:

  • Byte string literal patterns (which already have tests for correctness and inference)
  • The string_deref_patterns feature (which already has its own tests, including some for when the gate is disabled)
  • Other literal patterns (there's lots of tests for string and numeric literals)
  • Const patterns with arrays and slices in them (it was harder to find run-pass tests here, but see e.g. tests/ui/consts/references.rs and tests/ui/rfcs/rfc-2005-default-binding-mode/constref.rs. sort of also tests/patterns/usefulness/slice-pattern-const.rs since that tests exhaustiveness still works as expected)

I'll see how easy it would be to move the structural equality checks to valtree creation in another PR. As a bonus, I think that'll make it very slightly easier to e.g. blame String for not having structural equality, rather than Vec<u8>. Edit: Actually, on second look, the diagnostic change might be just as easy to make either way. Maybe I'll make that separately too; it'd be a merge conflict, but a small one.

// See also `tests/ui/match/pattern-deref-miscompile.rs`, which tests that byte string literal
// patterns check slices' length appropriately when matching on slices.
match BSTR_UNSIZED {
BSTR_SIZED => {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference in the repr between this and mentioning the pattern inline that causes this error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be these lines in the literal pattern checking rules. Byte string literal patterns are assigned a slice reference type if the current scrutinee is known to be a slice reference, and otherwise default to the type of the literal (an array reference). Since that link's from the master branch, that's also where treat_byte_string_as_slice was assigned. There's no array unsizing logic for named constant patterns (type-checking code here), so it fails to unify &[u8] with &[u8;3]. If, hypothetically, that was special-cased, and the pattern was assigned the type &[u8] rather than the constant's type, then that example would work too (though without the change this PR makes, it'd also have to set treat_byte_string_as_slice or it'd be missing length checks).

@oli-obk
Copy link
Contributor

oli-obk commented Apr 2, 2025

Thanks, I'm convinced now there are no issues with this change

@bors r+

@bors
Copy link
Collaborator

bors commented Apr 2, 2025

📌 Commit 7a4d4de has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 2, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Apr 2, 2025
…, r=oli-obk

literal pattern lowering: use the pattern's type instead of the literal's in `const_to_pat`

This has two purposes:
- First, it enables removing the `treat_byte_string_as_slice` fields from `TypeckResults` and `ConstToPat`. A byte string pattern's type will be `&[u8]` when matching on a slice reference, so `const_to_pat` will lower it to a slice ref pattern. I believe this is tested by `tests/ui/match/pattern-deref-miscompile.rs`.
- Second, it will simplify the implementation of byte string literals in deref patterns. If byte string patterns can be given the type `[u8; N]` or `[u8]` during HIR typeck, then nothing needs to be changed in `const_to_pat` in order to lower the patterns `deref!(b"..."): Vec<u8>` and `deref!(b"..."): Box<[u8; 3]>`.

Implementation-wise, this uses `lit_to_const` to make a const with the pattern's type and the literal's valtree; that feels to me like the best way to make sure that the valtree representations of the pattern type and literal are the same. Though it may necessitate later changes to `lit_to_const` to accommodate giving byte string literal patterns non-reference types—would that be reasonable?

This unfortunately doesn't work for the `string_deref_patterns` feature (since that gives string literal patterns the `String` type), so I added a workaround for that. However, once `deref_patterns` supports string literals, it may be able to replace `string_deref_patterns`; the special case for `String` can removed at that point.

r? `@oli-obk`
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 2, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#138992 (literal pattern lowering: use the pattern's type instead of the literal's in `const_to_pat`)
 - rust-lang#139211 (interpret: add a version of run_for_validation for &self)
 - rust-lang#139235 (`AstValidator` tweaks)
 - rust-lang#139237 (Add a dep kind for use of the anon node with zero dependencies)
 - rust-lang#139260 (Add dianqk to codegen reviewers)
 - rust-lang#139264 (Fix two incorrect turbofish suggestions)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 3fb1230 into rust-lang:master Apr 2, 2025
6 checks passed
@rustbot rustbot added this to the 1.88.0 milestone Apr 2, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Apr 2, 2025
Rollup merge of rust-lang#138992 - dianne:simplify-byte-string-to-pat, r=oli-obk

literal pattern lowering: use the pattern's type instead of the literal's in `const_to_pat`

This has two purposes:
- First, it enables removing the `treat_byte_string_as_slice` fields from `TypeckResults` and `ConstToPat`. A byte string pattern's type will be `&[u8]` when matching on a slice reference, so `const_to_pat` will lower it to a slice ref pattern. I believe this is tested by `tests/ui/match/pattern-deref-miscompile.rs`.
- Second, it will simplify the implementation of byte string literals in deref patterns. If byte string patterns can be given the type `[u8; N]` or `[u8]` during HIR typeck, then nothing needs to be changed in `const_to_pat` in order to lower the patterns `deref!(b"..."): Vec<u8>` and `deref!(b"..."): Box<[u8; 3]>`.

Implementation-wise, this uses `lit_to_const` to make a const with the pattern's type and the literal's valtree; that feels to me like the best way to make sure that the valtree representations of the pattern type and literal are the same. Though it may necessitate later changes to `lit_to_const` to accommodate giving byte string literal patterns non-reference types—would that be reasonable?

This unfortunately doesn't work for the `string_deref_patterns` feature (since that gives string literal patterns the `String` type), so I added a workaround for that. However, once `deref_patterns` supports string literals, it may be able to replace `string_deref_patterns`; the special case for `String` can removed at that point.

r? ``@oli-obk``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants