for loop in closure is not unrolled and not vectorized correctly #120189
Labels:
A-autovectorization (Area: Autovectorization, which can impact perf or code size)
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
E-needs-test (Call for participation: An issue has been fixed and does not reproduce, but no test has been added.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
I tried this code on godbolt.org:
The generated assembly code is as follows. The additions in the for loop are compiled into four `inc` instructions and one `psubb` instruction. Is there any particular reason why these additions cannot be compiled into a single SSE addition? By contrast, if the for loop is moved outside the closure, it is unrolled into five `psubb` instructions. The complete test code is available here: https://godbolt.org/z/YoMaWWzW7