forked from llvm/llvm-project
Pick changes for std::find vectorization #11744
Merged
fhahn
merged 60 commits into
swiftlang:stable/21.x
from
fhahn:pick-scev-laa-loads-changes-for-early-exit
Nov 5, 2025
(cherry picked from commit 1217c82)
(cherry picked from commit f0df62f)
…r. (llvm#147214) If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instructions created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral: https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u PR: llvm#147214 (cherry-picked from 4635743)
The original test @backward_dep_known_distance_less_than_btc was incorrectly named, as all loads are completely before the first store. Add a variant where this is not the case: @backward_dep_known_distance_less_than_btc (cherry picked from commit a11c5dd)
llvm#149795) Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the second operand (Y) is an ADD. The code only simplifies the condition if C1 < C2, so if the second ADD is NUW, it doesn't matter whether the first operand also has the NUW flag, as it cannot wrap if C1 < C2. https://alive2.llvm.org/ce/z/b3dM7N PR: llvm#149795 (cherry picked from commit 6c50e2b)
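The relaxed requirement can be checked by brute force on a small bit width. The sketch below is not the LLVM implementation; it assumes the predicate in question is a strict unsigned less-than on 8-bit values, and the function name `relaxedNuwHolds` is made up for illustration. It only requires that the second add, X + C2, does not wrap.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, for 8-bit unsigned values, that
//   X + C1 <u X + C2
// holds whenever C1 <u C2 and only the second add (X + C2) is known not to
// wrap. No no-wrap assumption is made about X + C1.
bool relaxedNuwHolds() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C2 = 0; C2 < 256; ++C2) {
      if (X + C2 > 255)
        continue; // X + C2 would wrap, so it could not carry the NUW flag.
      for (unsigned C1 = 0; C1 < C2; ++C1) {
        auto Lhs = static_cast<std::uint8_t>(X + C1);
        auto Rhs = static_cast<std::uint8_t>(X + C2);
        if (!(Lhs < Rhs))
          return false;
      }
    }
  return true;
}
```

Since X + C1 is smaller than X + C2 and the latter fits in 8 bits, the first add cannot wrap either, which is why the check succeeds for every combination.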
Add additional test coverage for llvm#147824. (cherry picked from commit 6d004d2)
…47824) Generalize the code added in llvm#147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: llvm#147824 (cherry picked from commit e21ee41)
Add another test case for llvm#147824, where the difference between an existing phi and the target SCEV is an add of a constant. (cherry picked from commit 445006d)
If we insert a new add instruction, it may introduce a new use outside the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to fix LCSSA form, if needed. This fixes a crash reported in llvm#147824 (comment). (cherry picked from commit f9f68af)
Adds SCEV-only tests for llvm#151227. (cherry picked from commit c6f7fa7)
…lvm#151227) Try to push the constant operand into a ZExt: A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not unsigned-wrap. The actual code supports ZExts with an arbitrary number of arguments, hence the getAddExpr in the return. This helps SCEV reasoning in some cases, commonly when adding an offset to a zero-extended SCEV that subtracts the same offset. Note that this is restricted to cases where we can fold away an operand of the inner Add. This is needed to avoid bad interactions with patterns when forming ZExts, which try to push the ZExt to the add operands. https://alive2.llvm.org/ce/z/q7d303 PR: llvm#151227 (cherry picked from commit d74d841)
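A small exhaustive check can illustrate the fold and why the no-wrap condition matters. This is a sketch under stated assumptions, not the SCEV code: the narrow type is modelled as 8 bits and the wide type as 16 bits, A is assumed to fit in the narrow type, and the names `zextAddFoldHolds` and `unfoldedValue` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model: narrow type = 8 bits, wide type = 16 bits. A is a wide constant that
// fits in the narrow type, so zext(trunc(A)) == A.
// Checks A + zext(-A + B) == zext(B) whenever trunc(A) + (-A + B) does not
// unsigned-wrap in the narrow type.
bool zextAddFoldHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      auto Inner = static_cast<std::uint8_t>(B - A); // -A + B, may wrap
      if (A + Inner > 255)
        continue; // trunc(A) + (-A + B) wraps: the fold does not apply.
      auto Original = static_cast<std::uint16_t>(A + Inner); // A + zext(Inner)
      auto Folded = static_cast<std::uint16_t>(B);           // zext(B)
      if (Original != Folded)
        return false;
    }
  return true;
}

// Evaluates the unfolded expression A + zext(-A + B) in the wide type, with
// no precondition, to show what happens when the no-wrap condition fails.
std::uint16_t unfoldedValue(std::uint16_t A, std::uint8_t B) {
  auto Inner = static_cast<std::uint8_t>(B - A);
  return static_cast<std::uint16_t>(A + Inner);
}
```

For A = 1 and B = 0 the inner add wraps to 255, so `unfoldedValue(1, 0)` is 256 rather than zext(0) = 0; that is the case the no-wrap condition excludes.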
Follow-up to llvm#151227 (review) to check the inner expression is an Add before calling getTruncateExpr. Adds a new matcher that just matches and captures SCEVAddExpr, to support matching a SCEVAddExpr with arbitrary number of operands. (cherry picked from commit ab9b23c)
Update the logic added in llvm#147824 to also allow adds of constants. There are a number of cases where this can help remove redundant phis and replace some computation with a ptrtoint (which likely is free in the backend). PR: llvm#150693 (cherry picked from commit 99d70e0)
This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr failed. Fixes llvm#155287 (cherry-picked from b29084f)
…llvm#155300) Try to push constant multiply operand into a ZExt containing an add, if possible. In general we are trying to push down ops through ZExt if possible. This is similar to llvm#151227 which did the same for additions. For now this is restricted to adds with a constant operand, which is similar to some of the logic above. This enables some additional simplifications. Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL PR: llvm#155300 (cherry-picked from 9143746)
Adds a test case for computing the backedge-taken-count for llvm#155941
…156013) Trip count expressions sometimes consist of adding 3 operands, i.e. (Const + A + B). There may be guard info for A + B, and if so, apply it. We can probably more generally apply this, but need to be careful w.r.t compile-time. Alive2 Proof for changes in miniters.ll: https://alive2.llvm.org/ce/z/HFfXOx Fixes llvm#155941. PR: llvm#156013 (cherry-picked from fb7c0d7)
FoundNonConstantDistanceDependence is a misleading name for a variable that determines whether we retry with runtime checks. Rename it. (cherry picked from commit b692b23)
…vm#147047) This patch extends the logic added in llvm#128061 to support dereferenceability information from assumptions as well. Unfortunately both assumption cache and the dominator tree need to be threaded through multiple layers to make them available where needed. PR: llvm#147047 (cherry picked from commit 2ae996c)
…ing deref. (llvm#155672)" This reverts commit f0df1e3. Recommit with extra check for SCEVCouldNotCompute. Test has been added in b169302. Original message: Remove the fall-back to constant max BTC if the backedge-taken-count cannot be computed. The constant max backedge-taken count is computed considering loop guards, so to avoid regressions we need to apply loop guards as needed. Also remove the special handling for Mul in willNotOverflow, as this should no longer be needed after 9143746 (llvm#155300). PR: llvm#155672 (cherry picked from commit a434a7a)
Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant sizes in dereferenceable assumptions. Apply loop-guards in a few places needed to reason about expressions involving trip counts of the form (BTC - 1). PR: llvm#156758 (cherry picked from commit b400fd1)
…-2. (llvm#156730) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: llvm#156730 (cherry picked from commit 74ec38f)
AnyOf reduces multiple input vectors to a single boolean value. When used for early-exit vectorization, we need to consider any lane after the early exit being poison. Any poison lane would result in poison after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf. Fixes llvm#153946. Fixes llvm#155162. https://alive2.llvm.org/ce/z/FD-XxA PR: llvm#154156 (cherry picked from commit f492eb9)
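The effect of freezing can be sketched with a toy model. This is an illustration only and an assumption about how to model the semantics: `std::nullopt` stands in for a poison lane, `anyOf` is a plain OR-reduction that propagates poison, and `freezeAll` picks `false` for every poison lane (freeze may pick any fixed value). None of these names are LLVM APIs.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model: std::nullopt stands for a poison lane.
using Lane = std::optional<bool>;

// OR-reduction that yields poison as soon as any input lane is poison.
Lane anyOf(const std::vector<Lane> &Lanes) {
  bool Result = false;
  for (const Lane &L : Lanes) {
    if (!L)
      return std::nullopt; // One poison lane poisons the whole reduction.
    Result = Result || *L;
  }
  return Result;
}

// freeze replaces poison with an arbitrary but fixed value; this model picks
// `false`, although any value would be a legal choice.
std::vector<Lane> freezeAll(std::vector<Lane> Lanes) {
  for (Lane &L : Lanes)
    if (!L)
      L = false;
  return Lanes;
}
```

With lanes `{false, true, poison, poison}`, where the second lane takes the early exit, the unfrozen reduction is poison, while the frozen one is `true`. The lane that triggers the exit is already `true`, so whatever value freeze picks for the later lanes cannot change the result of the OR in this example.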
Add extra test coverage for follow-up to llvm#156730. (cherry picked from commit 45a2214)
Consolidate tests for multiple divisors in a single loop, add multiplies by 1, 2, 5, 6. Extends test coverage for llvm#157159. (cherry picked from commit b9f571f)
… A. (llvm#157159) Generalize fold added in 74ec38f (llvm#156730) to support multiplying and dividing by different constants, given they are both powers-of-2 and C1 is a multiple of C2, checked via logBase2. https://alive2.llvm.org/ce/z/eqJ2xj PR: llvm#157159 (cherry picked from commit a1afe66)
Treat negative constants C as -1 * abs(C) when folding multiplies and udivs. Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2 PR: llvm#157555 (cherry picked from commit 6580c91)
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on llvm#157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: llvm#157656 (cherry picked from commit 70012fd)
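The arithmetic behind this fold can be checked exhaustively on 8-bit values. The sketch below is not the SCEV code. It assumes the expression being folded is (C1 * A) /u C2, that C1 and C2 are powers of two as in the earlier commits in this series, and that C1 * A does not unsigned-wrap; the last condition is an assumption made here for the check, and `mulUDivFoldHolds` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Checks (C1 * A) /u C2 == A /u (C2 /u C1) for 8-bit values, where C1 and C2
// are powers of two with C2 >u C1 >u 1, and C1 * A does not wrap (assumed).
bool mulUDivFoldHolds() {
  for (unsigned C1 = 2; C1 < 256; C1 *= 2)
    for (unsigned C2 = C1 * 2; C2 < 256; C2 *= 2)
      for (unsigned A = 0; A < 256; ++A) {
        if (C1 * A > 255)
          continue; // The multiply would wrap in 8 bits; skip this case.
        if ((C1 * A) / C2 != A / (C2 / C1))
          return false;
      }
  return true;
}
```

Because C2 is a multiple of C1, dividing C1 * A by C2 is the same as dividing A by the quotient C2 / C1, as long as the multiply did not lose any high bits.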
Add test for SCEVUMaxExpr handling in llvm#160012. (cherry picked from commit 129c683)
(cherry picked from commit 9be276e)
…Z. (llvm#160941) When computing the backedge taken count, we know that the expression must be valid just before we enter the loop. Using the terminator of the loop predecessor as context instruction for getConstantMultiple and getMinTrailingZeros allows using information from things like alignment assumptions. When a context instruction is used, the result is not cached, as it is only valid at the specific context instruction. Compile-time looks neutral: http://llvm-compile-time-tracker.com/compare.php?from=9be276ec75c087595ebb62fe11b35c1a90371a49&to=745980f5e1c8094ea1293cd145d0ef1390f03029&stat=instructions:u No impact on llvm-opt-benchmark (dtcxzyw/llvm-opt-benchmark#2867), but leads to additional unrolling in ~90 files across a C/C++ based corpus including LLVM on AArch64 using libc++ (which emits alignment assumptions for things like std::vector::begin). PR: llvm#160941 (cherry picked from commit c7fbe38)
…s. (llvm#162617) Simplify and generalize the code to get a common constant multiple for expressions when collecting guards, replacing the manual implementation. Split off from llvm#160012. PR: llvm#162617 (cherry picked from commit 6d905e4)
…llvm#163017) Follow-up as suggested in llvm#162617. Just use an APInt for DividesBy, as the existing code already operates on APInt and thus handles the case of DividesBy being 1. PR: llvm#163017 (cherry picked from commit 0d1f2f4)
Additional test coverage for llvm#160500. (cherry picked from commit 735ee5c)
This patch adds a new m_scev_Trunc pattern matcher for SCEVTruncateExpr and uses it in a few places to slightly simplify the code. PR: llvm#163169 (cherry picked from commit bc4e14b)
Move URem matching to ScalarEvolutionPatternMatch.h so it can be re-used together with other matchers. Depends on llvm#163169 PR: llvm#163170 (cherry picked from commit 7f04ee1)
When collecting information from loop guards, use UMax(1, %b - %a) for ICMP NE %a, %b, if neither is constant. This improves results in some cases, and will be even more useful together with llvm#160012 and llvm#159942. https://alive2.llvm.org/ce/z/YyBvoT PR: llvm#160500 (cherry picked from commit 2d02726)
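Why the rewrite is sound can be seen from a brute-force check: if %a != %b, the wrapping difference %b - %a is never zero, so taking the unsigned maximum with 1 does not change its value. The sketch below checks this for all 8-bit pairs; the bit width and the name `umaxRewriteHolds` are choices made for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Checks that for all 8-bit A != B, umax(1, B - A) == B - A, where the
// subtraction wraps modulo 2^8.
bool umaxRewriteHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (A == B)
        continue; // The guard A != B excludes this case.
      auto Diff = static_cast<std::uint8_t>(B - A);
      if (std::max<std::uint8_t>(1, Diff) != Diff)
        return false;
    }
  return true;
}
```

The UMax form carries the extra fact that the difference is at least 1, which is the information the guard provides.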
…e. (llvm#163260) Follow-up to llvm#160941. Even if we don't have a context instruction for the caller, we should be able to provide context instructions for SCEVUnknowns. Unless I am missing something, SCEVUnknowns only become available at the point their underlying IR instruction has been defined. If it is an argument, it should be safe to use the first instruction in the entry block or the instruction itself if it wraps an instruction. This allows getConstantMultiple to make better use of alignment assumptions. PR: llvm#163260 (cherry picked from commit 3b46556)
Add additional test coverage for using NE guards added in 2d02726 (llvm#160500) (cherry picked from commit 0792478)
Add a new variant of m_scev_Mul that binds a SCEVMulExpr and use it in SCEVURem_match and also update 2 more places in ScalarEvolution.cpp that can use m_scev_Mul as well. PR: llvm#163364 (cherry picked from commit 7c54c82)
…lvm#163787) Follow-up to 2d02726 (llvm#160500). Creating the SCEV subtraction eagerly is very expensive. To soften the blow, just collect a map with inequalities and check if we can apply the subtract rewrite when rewriting SCEVAddExpr.

Recovers most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=0792478e4e133be96650444f3264e89d002fc058&to=7fca35db60fe6f423ea6051b45226046c067c252&stat=instructions:u
stage1-O3: -0.10%
stage1-ReleaseThinLTO: -0.09%
stage1-ReleaseLTO-g: -0.10%
stage1-O0-g: +0.02%
stage1-aarch64-O3: -0.09%
stage1-aarch64-O0-g: +0.00%
stage2-O3: -0.17%
stage2-O0-g: -0.05%
stage2-clang: -0.07%

There is still some negative impact compared to before 2d02726, but there's probably not much we could do to reduce this even more.

Compile-time improvement with 2d02726 reverted on top of the current PR: http://llvm-compile-time-tracker.com/compare.php?from=7fca35db60fe6f423ea6051b45226046c067c252&to=98dd152bdfc76b30d00190d3850d89406ca3c21f&stat=instructions:u
stage1-O3: 60628M (-0.03%)
stage1-ReleaseThinLTO: 76388M (-0.04%)
stage1-ReleaseLTO-g: 89228M (-0.02%)
stage1-O0-g: 18523M (-0.03%)
stage1-aarch64-O3: 67623M (-0.03%)
stage1-aarch64-O0-g: 22595M (+0.01%)
stage2-O3: 52336M (+0.01%)
stage2-O0-g: 16174M (+0.00%)
stage2-clang: 34890032M (-0.03%)

PR: llvm#163787 (cherry picked from commit a5d3522)
…ub. (llvm#163250) Follow-up to llvm#160500 to preserve divisibility info when creating the UMax. PR: llvm#163250 (cherry picked from commit eb17a8d)
Add test with urem guard with non-constant divisor and AddRec guards. Extra test coverage for llvm#163021 (cherry picked from commit 0731f18)
Move getPreviousSCEVDivisibleByDivisor from a lambda to a static function and clarify the name (DividesBy -> DivisibleBy). Split off refactoring from llvm#163021. (cherry picked from commit 385ea0d)
Fix formatting for switch, to avoid unrelated changes/formatting errors in llvm#163021. (cherry picked from commit 817b7c5)
…llvm#163021) At the moment, the effectiveness of guards that contain divisibility information (A % B == 0) depends on the order of the conditions. This patch makes using divisibility information independent of the order, by collecting and applying the divisibility information separately. We first collect all conditions in a vector, then collect the divisibility information from all guards. When processing other guards, we apply divisibility info collected earlier. After all guards have been processed, we add the divisibility info, rewriting the existing rewrite. This ensures we apply the divisibility info to the largest rewrite expression. This helps to improve results in a few cases, one in dtcxzyw/llvm-opt-benchmark#2921 and another one in a different large C/C++ based IR corpus. PR: llvm#163021 (cherry picked from commit d3fe1df)
(cherry picked from commit 4efde3c)
When using information from dereferenceable assumptions, we need to make sure that the memory is not freed between the assume and the specified context instruction. Instead of just checking canBeFreed, check if there are any calls that may free between the assume and the context instruction. This patch introduces a willNotFreeBetween to check for calls that may free between an assume and a context instruction, to also be used in llvm#161255. PR: llvm#161725 (cherry picked from commit 7ceef76)
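The idea of scanning for freeing calls can be sketched on a toy instruction list. This is not the LLVM function and does not reflect its signature or its handling of control flow: `ToyInst` and `toyWillNotFreeBetween` are hypothetical names, and the model assumes a single straight-line sequence of instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IR instruction; not an LLVM type.
struct ToyInst {
  bool IsCall;
  bool MayFree; // Only meaningful when IsCall is true.
};

// Returns true if no call that may free memory appears strictly between
// positions Assume and Context in a straight-line instruction list.
bool toyWillNotFreeBetween(const std::vector<ToyInst> &Insts,
                           std::size_t Assume, std::size_t Context) {
  assert(Assume <= Context && Context < Insts.size());
  for (std::size_t I = Assume + 1; I < Context; ++I)
    if (Insts[I].IsCall && Insts[I].MayFree)
      return false; // The assumed memory may be gone by the time we use it.
  return true;
}
```

In this model, a dereferenceable assumption at index `Assume` is only trusted at index `Context` when the scan finds no potentially freeing call in between; a call that cannot free does not block it.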
Author
|
@swift-ci please test |
Author
|
@swift-ci please test llvm |
llvm#156929) Update Clang's __builtin_assume_dereferenceable to support non-constant lengths. The corresponding assume bundle has been updated to support non-constant sizes in cad62df. The current docs for the builtin don't mention the constant requirement for the size argument, so they don't need to be updated: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume-dereferenceable

A number of patches landed recently to make the optimizer make better use of the dereferenceable assumptions, and once llvm#156730 lands, it can be used to vectorize some early-exit loops, for example std::find with std::vector::iterator: https://godbolt.org/z/qo58PKG37

```
#include <algorithm>
#include <cstddef>
#include <memory> // std::to_address
#include <vector>

auto find(std::vector<short>::iterator first, short s, unsigned size) {
  auto Addr = __builtin_assume_aligned(std::to_address(first), 2);
  __builtin_assume_dereferenceable(std::to_address(first), size * sizeof(short));
  return std::find(first, first + size, s);
}
```

PR: llvm#156929 (cherry picked from commit c8d065b)
When using information from dereferenceable assumptions, we need to make sure that the memory is not freed between the assume and the specified context instruction. Instead of just checking canBeFreed, check if there are any calls that may free between the assume and the context instruction. Note that this also adjusts the context instruction to be the terminator in the loop predecessor, if there is one and it is a branch (to avoid things like invoke). PR: llvm#161255 (cherry picked from commit 8b8c59c)
b8eaf2c to
dec6e12
Compare
Author
|
@swift-ci please test |
Author
|
@swift-ci please test llvm |
rdar://160925334
rdar://158592232
rdar://159859974