-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor performance returning enums larged than a word. Possibly poor code generation? #19864
Comments
So I looked at this a little deeper. From the LLVM IR, it seems that when LLVM SROAs the This would be fixed with some TBAA metadata emitted, so is essentially #6736. |
This is something I've noticed as well. |
Ok, so I did a basic implementation of |
Ok, so I managed to fix up the codegen, hopefully it's still valid. It improved speed a little, but part of the issue is that IoError is 8 words, or 64 bytes on x86-64. pub struct IoError {
pub kind: IoErrorKind // 2 words because a single variant has a uint associated with it
pub desc: &'static str // 2 words - the size of a string slice
pub detail: Option<String> // 4 words - One for the discriminant 3 for the String.
} It's no surprise that the code using |
I was talking with @mcpherrrin and @huonw, and they mentioned that we are shrinking the enum tag down to the smallest integer size, which would result in padding and the unsigned stores and loads you found. I wonder if disabling the shrinking would speed things up. |
@erickt heh, that's exactly the change I made. Instead of just using the smallest integer type, I use the smallest to figure out what the appropriate alignment is, then use the alignment size to set the discriminant type. For some reason LLVM was struggling to figure out that the padding wasn't being written to or read from. |
For the record, with @alexcrichton example the slower example gets twice as fast with the newer codegen. It's still much slower than the faster example, but that's hardly surprising. |
Even with @Aatch's patches, we're just doubling the IoError case from 40MB/s to 80MB/s. The destructor we're getting from the #[deriving(Show, PartialEq, Eq)]
enum MyError3Kind {
EndOfFile,
Error,
_Error1, //(uint),
}
#[deriving(Show, PartialEq, Eq)]
struct MyError3 {
kind: MyError3Kind,
//desc: &'static str,
//description: Option<String>,
}
impl Error for MyError3 {
fn is_eof(&self) -> bool {
self.kind == MyError3Kind::EndOfFile
}
}
#[bench]
fn bench_foo11_enum_smaller_error(b: &mut test::Bencher) {
let bytes = generate_bytes();
b.bytes = bytes.len() as u64;
b.iter(|| {
let mut rdr = bytes.as_slice();
let iter = Foo11::new(|buf| -> Result<(), MyError3> {
match rdr.push(BUFFER_SIZE, buf) {
Ok(_) => Ok(()),
Err(io::IoError { kind: io::EndOfFile, .. }) => Ok(()),
Err(_) => {
Err(MyError3 {
kind: MyError3Kind::Error,
//desc: "",
//description: Some("foo".to_string()),
})
}
}
});
for (idx, item) in iter.enumerate() {
assert_eq!(idx as u8, item.unwrap());
}
})
} |
CC @mitsuhiko |
Giant return types simply don't exist in C libraries, even though C structs are first-class. The canonical approach is to pass an extra pointer: fn foo(..., _ret: &mut IoError<T>);
fn bar(..., _ret: &mut IoError<T>) {
foo(..., _ret);
if !_ret.is_ok() { return; } // no need to copy
// ...
// write our own value when we're done
*_ret = ...;
}
fn baz() {
let mut _ret: IoError<T> = uninitialized();
bar(..., &mut _ret);
} LLVM doesn't have the freedom to change cross-library signatures, but we can probably do this in the Rust calling convention without API breakage. |
Rust implements large return types via an out pointer similar to the manual version in C. |
@huonw: That's already how the platform ABIs handle large return values anyway. |
The x86_64 ABI uses a higher threshold than Rust's choice of 1 word though. |
Triage: updated code: #![feature(test)]
extern crate test;
use std::iter::repeat;
const N: usize = 100000;
#[derive(Clone)]
struct Foo;
#[derive(Clone)]
struct Bar {
name: &'static str,
desc: Option<String>,
other: Option<String>,
}
#[bench]
fn b1(b: &mut test::Bencher) {
b.iter(|| {
let r: Result<u8, Foo> = Ok(1u8);
repeat(r).take(N).map(|x| test::black_box(x)).count()
});
}
#[bench]
fn b2(b: &mut test::Bencher) {
b.iter(|| {
let r: Result<u8, Bar> = Ok(1u8);
repeat(r).take(N).map(|x| test::black_box(x)).count()
});
} updated benchmarks:
|
That benchmark on
If
results will be
but this can be not related to actual problem. |
With rustc 1.75.0-nightly (249624b 2023-10-20):
|
I've discovered an issue with
IoError
, and really returning any enums that are larger than 1 word, are running an order of magnitude slower than returning an error enum that's 1 word size. Here's my test case:Here are the results:
On a related note, @alexcrichton just found a similar case with:
The assembly for
b1
is about 3 instructions, but the assembly forb2
has a ton ofmov
instructions.The text was updated successfully, but these errors were encountered: