Simplify PartialOrd on tuples containing primitives #138135

Merged
merged 3 commits into from
Mar 24, 2025


Member

@scottmcm scottmcm commented Mar 7, 2025

We noticed in #133984 (comment) that currently the tuple comparison code, while it does optimize down today, is kinda huge: https://rust.godbolt.org/z/xqMoeYbhE

This PR changes the tuple code to go through an overridable "chaining" version of the comparison functions, so that for simple things like (i16, u16) and (f32, f32) (as seen in the new MIR pre-codegen test) we just directly get the

if lhs.0 == rhs.0 { lhs.1 OP rhs.1 }
else { lhs.0 OP rhs.0 }

version in MIR, rather than emitting a mess for LLVM to have to clean up.

Test added in the first commit, so you can see the MIR diff in the second one.
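
A hand-written sketch of the short-circuit form the description is aiming for, next to a version that goes through the 3-way `partial_cmp` first. This is an illustration only, not the code in `core`; the function names `le_via_partial_cmp` and `le_direct` are made up for this example, and `<=` on `(f32, f32)` is chosen arbitrarily as the operator and type.

```rust
use std::cmp::Ordering;

/// Lexicographic `<=` that goes through the 3-way comparison on the first
/// field. Illustrative only; this is not the library implementation.
fn le_via_partial_cmp(lhs: &(f32, f32), rhs: &(f32, f32)) -> bool {
    match lhs.0.partial_cmp(&rhs.0) {
        // First fields are equal: the second field decides.
        Some(Ordering::Equal) => lhs.1 <= rhs.1,
        // First fields differ: they alone decide.
        Some(ord) => ord.is_le(),
        // Incomparable (a NaN is involved): the comparison is false.
        None => false,
    }
}

/// The direct form: one equality test, then a single primitive comparison.
fn le_direct(lhs: &(f32, f32), rhs: &(f32, f32)) -> bool {
    if lhs.0 == rhs.0 {
        lhs.1 <= rhs.1
    } else {
        lhs.0 <= rhs.0
    }
}
```

Both functions should agree with the built-in `a <= b` on `(f32, f32)`, including when a field is NaN: a NaN first field makes the equality test false and the fallback `lhs.0 <= rhs.0` false as well.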

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 7, 2025

StorageDead(_3);
_9 = &((*_1).1: f32);
_10 = &((*_2).1: f32);
_0 = <f32 as PartialOrd>::le(move _9, move _10) -> [return: bb3, unwind continue];
Member Author

@scottmcm scottmcm Mar 7, 2025

...well, it almost optimizes down completely. At least it's down to just 4 blocks, which is the smallest we can have here since (AFAIK) we're not supposed to have two returns.

I wasn't sure if we had something tracking this inliner imperfection, so filed #138136 to have a way to reference it.

Member Author

And actually in the 2 weeks since I opened this I managed to fix that inliner imperfection, so the MIR here is now really good -- no function calls left at all 🎉

(Though it still has the #138544 problem, but that's much more minor in comparison.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 7, 2025
…<try>

Allow more top-down inlining for single-BB callees

This means that things like `<usize as Step>::forward_unchecked` and `<f32 as PartialOrd>::le` will inline even if
we've already done a bunch of inlining to find the calls to them.

Fixes rust-lang#138136

Draft as it's built atop rust-lang#138135, which adds a mir-opt test that's a nice demonstration of this.
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 8, 2025
…<try>

Allow more top-down inlining for single-BB callees

This means that things like `<usize as Step>::forward_unchecked` and `<f32 as PartialOrd>::le` will inline even if
we've already done a bunch of inlining to find the calls to them.

Fixes rust-lang#138136

~~Draft as it's built atop rust-lang#138135, which adds a mir-opt test that's a nice demonstration of this.  To see just this change, look at <https://github.com/rust-lang/rust/pull/138157/commits/48f63e3be552605c2933056b77bf23a326757f92>~~ Rebased to be just the inlining change, as the other existing tests show it great.
We have codegen ones, but it looks like we could make those less flakey by just doing something better in the first place...
@@ -12,5 +12,5 @@ pub fn demo_le_total(a: &(u16, i16), b: &(u16, i16)) -> bool {
// EMIT_MIR tuple_ord.demo_ge_partial.PreCodegen.after.mir
pub fn demo_ge_partial(a: &(f32, f32), b: &(f32, f32)) -> bool {
// CHECK-LABEL: demo_ge_partial
-    a <= b
+    a >= b
Member Author

Doh, I somehow managed to not notice that I'd put <= in the ge test 🤦

Sorry for the slightly-worse diff; it doesn't really change anything material though.

@scottmcm
Member Author

r? libs

/// directly, instead of needing to optimize the 3-way comparison.
///
/// Currently this is done using specialization, but it doesn't need that:
/// it could be provided methods on `PartialOrd` instead and work fine.
Member

Is it worse for compile times or similar to make these (unstable) provided methods? If we can avoid another use of specialization that seems worthwhile to me - I forget if core's usage is guaranteed sound or not (I seem to recall some gaps)...

Member Author

@scottmcm scottmcm Mar 23, 2025

I think this usage is sound, since we're only specializing on primitives that don't have lifetimes. But I was torn between the two anyway, so if you have a weak preference for the other way I'm happy to give that a shot. Let's see how it comes out. I always like less specialization 🙂

@rustbot author
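
To make the "provided methods instead of specialization" option concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the trait `ChainLe`, its method `chaining_le`, and the helper `pair_le` are hypothetical names and are not the hidden methods in `core`. The only detail taken from the MIR in this PR is that a chaining step produces a `ControlFlow` whose `Break` carries a `bool`.

```rust
use std::cmp::Ordering;
use std::ops::ControlFlow;

/// Hypothetical stand-in for a hidden "chaining" method; not the real `core` API.
trait ChainLe: PartialOrd {
    /// `Continue(())` means "equal so far, look at the next field";
    /// `Break(b)` means "the answer is already known to be `b`".
    /// Provided default: derive the answer from the 3-way comparison.
    fn chaining_le(&self, other: &Self) -> ControlFlow<bool> {
        match self.partial_cmp(other) {
            Some(Ordering::Equal) => ControlFlow::Continue(()),
            Some(ord) => ControlFlow::Break(ord.is_le()),
            None => ControlFlow::Break(false),
        }
    }
}

/// A primitive overrides the default with the cheaper direct form.
impl ChainLe for f32 {
    fn chaining_le(&self, other: &Self) -> ControlFlow<bool> {
        if *self == *other {
            ControlFlow::Continue(())
        } else {
            ControlFlow::Break(*self <= *other)
        }
    }
}

/// A non-primitive type simply takes the provided default.
impl ChainLe for String {}

/// `<=` on a pair, built from the chaining step on the first field.
fn pair_le<A: ChainLe, B: PartialOrd>(lhs: &(A, B), rhs: &(A, B)) -> bool {
    match lhs.0.chaining_le(&rhs.0) {
        ControlFlow::Break(answer) => answer,
        ControlFlow::Continue(()) => lhs.1 <= rhs.1,
    }
}
```

With this shape no specialization is involved: every type gets the default, and only the types that benefit override it. For example, `pair_le(&(1.0f32, 2), &(1.0f32, 3))` takes the `Continue` path and falls through to the second field.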

StorageDead(_4);
StorageDead(_3);
_8 = copy ((_7 as Break).0: bool);
_0 = copy _8;
Member

I'm a bit surprised we're not able to make this block _0 = Ge(...) like bb2 ends up as... I'm sure LLVM will work it out though.

Member

I guess this is #138544 :)

Member Author

Yup, that's right. bb2 is the second field, so it's just return a.1 < b.1, and doesn't have anything to optimize out, but here in bb1 we can't quite fix it in MIR yet because the passes that know how to fix it don't see it in the form we see here in the PreCodegen MIR. Earlier, before another round of SimplifyCfg, the basic block structure is messier.

And yes, LLVM will fix it. I'm also working on other changes (#138759 and the unfinished #138582) that'll mean it'll even be fixed in debug codegen, rather than SRoA needing to fix it in LLVM.

@Mark-Simulacrum
Member

r=me if refactoring to avoid specialization doesn't seem warranted to you, not a strong opinion there.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2025
Uses `__`-named `doc(hidden)` methods instead.
@scottmcm
Member Author

Nice, I like this better. I think it'd fit more easily with extending it to other things too, though I'm not going to do that in this PR.

@bors r=Mark-Simulacrum

@bors
Contributor

bors commented Mar 23, 2025

📌 Commit 7781346 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 23, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 24, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang#137593 (fix download-llvm logic for subtree sync branches)
 - rust-lang#137736 (Don't attempt to export compiler-builtins symbols from rust dylibs)
 - rust-lang#138135 (Simplify `PartialOrd` on tuples containing primitives)
 - rust-lang#138321 ([bootstrap] Distribute split debuginfo if present)
 - rust-lang#138574 (rustdoc: be more strict about "Methods from Deref")
 - rust-lang#138606 (Fix missing rustfmt in msi installer - cont)
 - rust-lang#138671 (Fix `FileType` `PartialEq` implementation on Windows)
 - rust-lang#138728 (Update `compiler-builtins` to 0.1.152)
 - rust-lang#138783 (Cache current_dll_path output)
 - rust-lang#138846 (Tweaks to writeback and `Obligation -> Goal` conversion)

Failed merges:

 - rust-lang#138755 ([rustdoc] Remove duplicated loop when computing doc cfgs)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 1ba9b78 into rust-lang:master Mar 24, 2025
6 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 24, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 24, 2025
Rollup merge of rust-lang#138135 - scottmcm:chaining-ord, r=Mark-Simulacrum

Simplify `PartialOrd` on tuples containing primitives

We noticed in rust-lang#133984 (comment) that currently the tuple comparison code, while it [does optimize down](https://github.com/rust-lang/rust/blob/master/tests/codegen/comparison-operators-2-tuple.rs) today, is kinda huge: <https://rust.godbolt.org/z/xqMoeYbhE>

This PR changes the tuple code to go through an overridable "chaining" version of the comparison functions, so that for simple things like `(i16, u16)` and `(f32, f32)` (as seen in the new MIR pre-codegen test) we just directly get the
```rust
if lhs.0 == rhs.0 { lhs.1 OP rhs.1 }
else { lhs.0 OP rhs.0 }
```
version in MIR, rather than emitting a mess for LLVM to have to clean up.

Test added in the first commit, so you can see the MIR diff in the second one.
@scottmcm scottmcm deleted the chaining-ord branch March 24, 2025 04:43
@scottmcm
Member Author

@rust-timer build 8115bc1

@rust-timer
Collaborator

Finished benchmarking commit (8115bc1): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.1%, 0.2%] | 2 |
| Regressions ❌ (secondary) | 0.3% | [0.3%, 0.3%] | 1 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-0.3%, 0.2%] | 4 |

Max RSS (memory usage)

Results (primary -3.5%, secondary 0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [2.2%, 2.2%] | 1 |
| Regressions ❌ (secondary) | 2.9% | [2.9%, 2.9%] | 1 |
| Improvements ✅ (primary) | -9.2% | [-9.2%, -9.2%] | 1 |
| Improvements ✅ (secondary) | -1.6% | [-1.6%, -1.6%] | 1 |
| All ❌✅ (primary) | -3.5% | [-9.2%, 2.2%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.1%, 0.4%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.2%, 0.4%] | 13 |

Bootstrap: 774.849s -> 772.479s (-0.31%)
Artifact size: 365.54 MiB -> 365.54 MiB (0.00%)

@rustbot rustbot added the perf-regression Performance regression. label Mar 24, 2025
@scottmcm
Member Author

@rustbot label: +perf-regression-triaged
This roughly evened out between a couple of small changes for icounts, and looks good on bootstrap time. So I think it's fine, especially since it'll unblock #133984 (comment)

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Mar 24, 2025
Labels
perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.