-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower BinOp::Cmp to llvm.{s,u}cmp.* intrinsics #133984
Conversation
LLVM 19 doesn't have all the optimization support for these intrinsics yet. |
Should the check be on LLVM 20? |
This comment has been minimized.
This comment has been minimized.
I'm excited to do this, but if nikic says it's too early, that probably means it shouldn't be on by default yet. Maybe make this PR adding a -Z flag for it, so people can play with it and test out how close it is? (Then in the future once it's fully ready we can remove that flag again) |
Yes, that's fine. You'll have to test locally against LLVM 20 though to make sure codegen tests don't break. |
9759fdc
to
dcdb7ed
Compare
@rustbot ready I didn't add the unstable flag since LLVM 20 already tries to lower the manual icmp pattern to the intrinsic so doing it ourselves shouldn't change much. |
Given that https://rust-lang.zulipchat.com/#narrow/channel/187780-t-compiler.2Fwg-llvm/topic/LLVM.2020/near/495752627 is talking about landing LLVM 20 in nightly in just a couple weeks, should we wait for that before doing this PR? I don't know how much we get from having it now in LLVM 19, when the LLVM 20 versions of the tests won't run in bors.
|
Closing and reopening now that #135763 landed to re-run CI |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Lower BinOp::Cmp to llvm.{s,u}cmp.* intrinsics Lowers `mir::BinOp::Cmp` (`three_way_compare` intrinsic) to the corresponding LLVM `llvm.{s,u}cmp.i8.*` intrinsics. These are the intrinsics mentioned in rust-lang#118310, which are now available in LLVM 19. I couldn't find any follow-up PRs/discussions about this, please let me know if I missed something. r? `@scottmcm`
Ah, here's at least one LLVM bug: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define noundef zeroext i1 @check_lt_direct(i16 noundef %0, i16 noundef %1, i16 noundef %2, i16 noundef %3) unnamed_addr #0 {
start:
%_3.i4.i = tail call noundef range(i8 -1, 3) i8 @llvm.scmp.i8.i16(i16 %0, i16 %2)
switch i8 %_3.i4.i, label %bb4.i [
i8 2, label %"_ZN4core5tuple65_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$$LP$U$C$T$RP$$GT$2lt17h933a2b8cae739748E.exit"
i8 0, label %bb5.i
]
bb5.i: ; preds = %start
%_0.i.i = icmp ult i16 %1, %3
br label %"_ZN4core5tuple65_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$$LP$U$C$T$RP$$GT$2lt17h933a2b8cae739748E.exit"
bb4.i: ; preds = %start
%4 = icmp slt i16 %0, %2
br label %"_ZN4core5tuple65_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$$LP$U$C$T$RP$$GT$2lt17h933a2b8cae739748E.exit"
"_ZN4core5tuple65_$LT$impl$u20$core..cmp..PartialOrd$u20$for$u20$$LP$U$C$T$RP$$GT$2lt17h933a2b8cae739748E.exit": ; preds = %start, %bb5.i, %bb4.i
%_0.sroa.0.0.i = phi i1 [ %_0.i.i, %bb5.i ], [ %4, %bb4.i ], [ false, %start ]
ret i1 %_0.sroa.0.0.i
}
If the returned range is changed to Filed as llvm/llvm-project#130179 |
@DaniPopes Is it possible you to emit the If not, this ugly hack looks like it fixes it for me, though I'd certainly rather something better--- a/library/core/src/tuple.rs
+++ b/library/core/src/tuple.rs
@@ -80,19 +80,19 @@ fn partial_cmp(&self, other: &($($T,)+)) -> Option<Ordering> {
}
#[inline]
fn lt(&self, other: &($($T,)+)) -> bool {
- lexical_ord!(lt, Less, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
+ lexical_ord!(lt, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
}
#[inline]
fn le(&self, other: &($($T,)+)) -> bool {
- lexical_ord!(le, Less, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
+ lexical_ord!(le, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
}
#[inline]
fn ge(&self, other: &($($T,)+)) -> bool {
- lexical_ord!(ge, Greater, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
+ lexical_ord!(ge, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
}
#[inline]
fn gt(&self, other: &($($T,)+)) -> bool {
- lexical_ord!(gt, Greater, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
+ lexical_ord!(gt, $( ${ignore($T)} self.${index()}, other.${index()} ),+)
}
}
}
@@ -170,16 +170,22 @@ macro_rules! maybe_tuple_doc {
// The values are interleaved, so the macro invocation for
// `(a1, a2, a3) < (b1, b2, b3)` would be `lexical_ord!(lt, opt_is_lt, a1, b1,
// a2, b2, a3, b3)` (and similarly for `lexical_cmp`)
-//
-// `$ne_rel` is only used to determine the result after checking that they're
-// not equal, so `lt` and `le` can both just use `Less`.
macro_rules! lexical_ord {
- ($rel: ident, $ne_rel: ident, $a:expr, $b:expr, $($rest_a:expr, $rest_b:expr),+) => {{
+ ($rel: ident, $a:expr, $b:expr, $($rest_a:expr, $rest_b:expr),+) => {{
let c = PartialOrd::partial_cmp(&$a, &$b);
- if c != Some(Equal) { c == Some($ne_rel) }
- else { lexical_ord!($rel, $ne_rel, $($rest_a, $rest_b),+) }
+ // FIXME refactor to make this simpler once the LLVM issue is fixed
+ // <https://github.com/llvm/llvm-project/issues/130179>
+ // SAFETY: Because these have the same size (it wouldn't compile otherwise)
+ // there's no padding in the option, so all the bytes are defined.
+ let c = unsafe { crate::mem::transmute::<Option<Ordering>, i8>(c) };
+ if let -1..=1 = c {
+ if let 0 = c { lexical_ord!($rel, $($rest_a, $rest_b),+) }
+ else { PartialOrd::$rel(&c, &0) }
+ } else {
+ false
+ }
}};
- ($rel: ident, $ne_rel: ident, $a:expr, $b:expr) => {
+ ($rel: ident, $a:expr, $b:expr) => {
// Use the specific method for the last element
PartialOrd::$rel(&$a, &$b)
}; EDIT: Here's a bigger redo of the tuple code to just be better in MIR before we even get to LLVM: #138135 |
Confirmed locally that #138135 will fix this, so marking as either blocked on a change here to emit the range on the [su]cmp calls or waiting for the author here to include the range on the call instructions. |
Something like this, maybe? diff --git a/compiler/rustc_codegen_llvm/src/builder.rs b/compiler/rustc_codegen_llvm/src/builder.rs
index 297f104d124..fdd745a092f 100644
--- a/compiler/rustc_codegen_llvm/src/builder.rs
+++ b/compiler/rustc_codegen_llvm/src/builder.rs
@@ -1101,7 +1101,16 @@ fn three_way_compare(
_ => bug!("three-way compare unsupported for type {ty:?}"),
};
- Some(self.call_intrinsic(name, &[lhs, rhs]))
+ let call = self.call_intrinsic(name, &[lhs, rhs]);
+ if self.cx.sess().opts.optimize != OptLevel::No {
+ let range = llvm::CreateRangeAttr(
+ self.cx.llcx,
+ Size::from_bytes(1),
+ WrappingRange { start: 255, end: 1 },
+ );
+ attributes::apply_to_callsite(call, llvm::AttributePlace::ReturnValue, &[range]);
+ }
+ Some(call)
}
/* Miscellaneous instructions */
diff --git a/tests/codegen/intrinsics/three_way_compare.rs b/tests/codegen/intrinsics/three_way_compare.rs
index 95fcb636f7c..632de4ecdf5 100644
--- a/tests/codegen/intrinsics/three_way_compare.rs
+++ b/tests/codegen/intrinsics/three_way_compare.rs
@@ -13,7 +13,8 @@
// CHECK-LABEL: @signed_cmp
// CHECK-SAME: (i16{{.*}} %a, i16{{.*}} %b)
pub fn signed_cmp(a: i16, b: i16) -> std::cmp::Ordering {
- // CHECK: %[[CMP:.+]] = call i8 @llvm.scmp.i8.i16(i16 %a, i16 %b)
+ // DEBUG: %[[CMP:.+]] = call i8 @llvm.scmp.i8.i16(i16 %a, i16 %b)
+ // OPTIM: %[[CMP:.+]] = call range(i8 -1, 2) i8 @llvm.scmp.i8.i16(i16 %a, i16 %b)
// CHECK-NEXT: ret i8 %[[CMP]]
three_way_compare(a, b)
}
@@ -22,7 +23,8 @@ pub fn signed_cmp(a: i16, b: i16) -> std::cmp::Ordering {
// CHECK-LABEL: @unsigned_cmp
// CHECK-SAME: (i16{{.*}} %a, i16{{.*}} %b)
pub fn unsigned_cmp(a: u16, b: u16) -> std::cmp::Ordering {
- // CHECK: %[[CMP:.+]] = call i8 @llvm.ucmp.i8.i16(i16 %a, i16 %b)
+ // DEBUG: %[[CMP:.+]] = call i8 @llvm.ucmp.i8.i16(i16 %a, i16 %b)
+ // OPTIM: %[[CMP:.+]] = call range(i8 -1, 2) i8 @llvm.ucmp.i8.i16(i16 %a, i16 %b)
// CHECK-NEXT: ret i8 %[[CMP]]
three_way_compare(a, b)
} |
Does not look like adding the range information fixes |
Still no? Darn. Adding the range information fixes it in LLVM 20.1 in https://llvm.godbolt.org/z/WPeqfEnGs :/ (Compared to https://llvm.godbolt.org/z/88n195vWv without the range information on the call.) |
Not sure if I'm doing something wrong Otherwise can wait for #138135 |
That sure looks right to me. Dunno if nikic might spot something different, though. |
…lacrum Simplify `PartialOrd` on tuples containing primitives We noticed in rust-lang#133984 (comment) that currently the tuple comparison code, while it [does optimize down](https://github.com/rust-lang/rust/blob/master/tests/codegen/comparison-operators-2-tuple.rs) today, is kinda huge: <https://rust.godbolt.org/z/xqMoeYbhE> This PR changes the tuple code to go through an overridable "chaining" version of the comparison functions, so that for simple things like `(i16, u16)` and `(f32, f32)` (as seen in the new MIR pre-codegen test) we just directly get the ```rust if lhs.0 == rhs.0 { lhs.0 OP rhs.0 } else { lhs.1 OP rhs.1 } ``` version in MIR, rather than emitting a mess for LLVM to have to clean up. Test added in the first commit, so you can see the MIR diff in the second one.
Rollup merge of rust-lang#138135 - scottmcm:chaining-ord, r=Mark-Simulacrum Simplify `PartialOrd` on tuples containing primitives We noticed in rust-lang#133984 (comment) that currently the tuple comparison code, while it [does optimize down](https://github.com/rust-lang/rust/blob/master/tests/codegen/comparison-operators-2-tuple.rs) today, is kinda huge: <https://rust.godbolt.org/z/xqMoeYbhE> This PR changes the tuple code to go through an overridable "chaining" version of the comparison functions, so that for simple things like `(i16, u16)` and `(f32, f32)` (as seen in the new MIR pre-codegen test) we just directly get the ```rust if lhs.0 == rhs.0 { lhs.0 OP rhs.0 } else { lhs.1 OP rhs.1 } ``` version in MIR, rather than emitting a mess for LLVM to have to clean up. Test added in the first commit, so you can see the MIR diff in the second one.
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing f8c27df (parent) -> 1df5aff (this PR) Test differencesShow 22 test diffs
Job group index
|
Finished benchmarking commit (1df5aff): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 0.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 777.151s -> 777.153s (0.00%) |
Thanks for sticking with this, @DaniPopes ! Excited to have it finally landed. |
Lowers
mir::BinOp::Cmp
(three_way_compare
intrinsic) to the corresponding LLVMllvm.{s,u}cmp.i8.*
intrinsics.These are the intrinsics mentioned in #118310, which are now available in LLVM 19.
I couldn't find any follow-up PRs/discussions about this, please let me know if I missed something.
r? @scottmcm