Pick changes for std::find vectorization #11744

fhahn · 2025-11-04T14:02:06Z

rdar://160925334
rdar://158592232
rdar://159859974

(cherry picked from commit 1217c82)

(cherry picked from commit f0df62f)

…r. (llvm#147214) If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instructions created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u PR: llvm#147214 (cherry-picked from 4635743)

The original test @backward_dep_known_distance_less_than_btc was incorrectly named, as all loads are completely before the first store. Add a variant where this is not the case: @backward_dep_known_distance_less_than_btc (cherry picked from commit a11c5dd)

llvm#149795) Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the second operand (Y) is an ADD. The code only simplifies the condition if C1 < C2, so if the second ADD is NUW, it doesn't matter whether the first operand also has the NUW flag, as it cannot wrap if C1 < C2. https://alive2.llvm.org/ce/z/b3dM7N PR: llvm#149795 (cherry picked from commit 6c50e2b)
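To make the argument concrete, here is a small exhaustive check in 8-bit unsigned arithmetic. It assumes both sides of the comparison share the same base A, i.e. the predicate is (A + C1) <u (A + C2) with C1 <u C2, and that only the second add is known to be NUW. The helper name is hypothetical; this is a sketch, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: exhaustively checks, in 8-bit unsigned arithmetic,
// that (A + C1) <u (A + C2) holds whenever C1 <u C2 and A + C2 does not
// unsigned-wrap (only the second add is known to be NUW).
bool nuwOnSecondAddSuffices() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = C1 + 1; C2 < 256; ++C2) {
        if (A + C2 > 0xFF)
          continue; // second add wraps, so it is not NUW: nothing to check
        uint8_t X = static_cast<uint8_t>(A + C1); // first add, no flags assumed
        uint8_t Y = static_cast<uint8_t>(A + C2); // second add, NUW
        if (!(X < Y))
          return false;
      }
  return true;
}
```

Since A + C1 < A + C2 <= 255, the first add cannot wrap either, which is exactly why its NUW flag is not needed.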

Add additional test coverage for llvm#147824. (cherry picked from commit 6d004d2)

…47824) Generalize the code added in llvm#147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: llvm#147824 (cherry picked from commit e21ee41)

Add another test case for llvm#147824, where the difference between an existing phi and the target SCEV is an add of a constant. (cherry picked from commit 445006d)

If we insert a new add instruction, it may introduce a new use outside the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to fix LCSSA form, if needed. This fixes a crash reported in llvm#147824 (comment). (cherry picked from commit f9f68af)

Adds SCEV-only tests for llvm#151227. (cherry picked from commit c6f7fa7)

…lvm#151227) Try to push the constant operand into a ZExt: A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not unsigned-wrap. The actual code supports ZExts with an arbitrary number of arguments, hence the getAddExpr in the return. This helps SCEV reasoning in some cases, commonly when adding an offset to a zero-extended SCEV that subtracts the same offset. Note that this is restricted to cases where we can fold away an operand of the inner Add. This is needed to avoid bad interactions with patterns when forming ZExts, which try to push the ZExt into add operands. https://alive2.llvm.org/ce/z/q7d303 PR: llvm#151227 (cherry picked from commit d74d841)
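A brute-force sketch of this identity, assuming an i8 inner type, an i16 outer type and a constant A that fits in 8 bits (so trunc(A) == A). It models SCEV arithmetic with fixed-width C++ integers; the function name is made up for illustration and does not mirror the SCEV implementation.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, with an 8-bit inner type and a 16-bit outer type:
//   A + zext(-A + B) == zext(B)
// whenever trunc(A) + (-A + B) does not unsigned-wrap in 8 bits.
// Assumes the constant A fits in 8 bits, so trunc(A) == A.
bool pushConstantAddIntoZExtHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t Inner = static_cast<uint8_t>(B - A); // -A + B, wrapping in 8 bits
      if (A + Inner > 0xFF)
        continue; // trunc(A) + (-A + B) wraps: fold not applicable
      uint16_t Lhs = static_cast<uint16_t>(A + Inner); // A + zext(Inner)
      uint16_t Rhs = static_cast<uint16_t>(B);         // zext(B)
      if (Lhs != Rhs)
        return false;
    }
  return true;
}
```

The no-wrap precondition matters: with A = 3 and B = 1 the inner add yields 254, and 3 + zext(254) = 257, not 1.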

Follow-up to llvm#151227 (review) to check the inner expression is an Add before calling getTruncateExpr. Adds a new matcher that just matches and captures SCEVAddExpr, to support matching a SCEVAddExpr with an arbitrary number of operands. (cherry picked from commit ab9b23c)

Update the logic added in llvm#147824 to also allow adds of constants. There are a number of cases where this can help remove redundant phis and replace some computation with a ptrtoint (which likely is free in the backend). PR: llvm#150693 (cherry picked from commit 99d70e0)

This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr failed. Fixes llvm#155287 (cherry-picked from b29084f)

…llvm#155300) Try to push constant multiply operand into a ZExt containing an add, if possible. In general we are trying to push down ops through ZExt if possible. This is similar to llvm#151227 which did the same for additions. For now this is restricted to adds with a constant operand, which is similar to some of the logic above. This enables some additional simplifications. Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL PR: llvm#155300 (cherry-picked from 9143746)
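The arithmetic fact behind pushing a constant multiply into a ZExt can be checked the same way. This sketch assumes an i8 inner type, an i16 outer type and an inner expression of the form A + K with constant K; it checks one sufficient no-wrap condition, which may differ from the exact condition the patch tests. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, with an 8-bit inner type and a 16-bit outer type:
//   C * zext(A + K) == zext(C * A + C * K)
// whenever the 8-bit computations A + K and C * (A + K) do not unsigned-wrap.
// This is one sufficient condition, not necessarily the patch's exact check.
bool pushConstantMulIntoZExtHolds() {
  for (unsigned C = 0; C < 256; ++C)
    for (unsigned A = 0; A < 256; ++A)
      for (unsigned K = 0; K < 256; ++K) {
        if (A + K > 0xFF || C * (A + K) > 0xFF)
          continue; // narrow arithmetic would wrap: not applicable
        // Multiply outside the zext, in 16 bits.
        uint16_t Outside =
            static_cast<uint16_t>(C * static_cast<uint8_t>(A + K));
        // Multiply pushed inside the zext, computed in 8 bits, then zext'ed.
        uint16_t Inside = static_cast<uint8_t>(C * A + C * K);
        if (Outside != Inside)
          return false;
      }
  return true;
}
```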

Adds a test case for computing the backedge-taken-count for llvm#155941

…156013) Trip count expressions sometimes consist of adding 3 operands, i.e. (Const + A + B). There may be guard info for A + B, and if so, apply it. We can probably apply this more generally, but need to be careful w.r.t. compile time. Alive2 Proof for changes in miniters.ll: https://alive2.llvm.org/ce/z/HFfXOx Fixes llvm#155941. PR: llvm#156013 (cherry-picked from fb7c0d7)

FoundNonConstantDistanceDependence is a misleading name for a variable that determines whether we retry with runtime checks. Rename it. (cherry picked from commit b692b23)

…vm#147047) This patch extends the logic added in llvm#128061 to support dereferenceability information from assumptions as well. Unfortunately both the assumption cache and the dominator tree need to be threaded through multiple layers to make them available where needed. PR: llvm#147047 (cherry picked from commit 2ae996c)

Add support for identifying multiplication overflow in SCEV. This is needed in LoopAccessAnalysis, and that limitation was worked around by 484417a. This allows early-exit vectorization to work as expected in the vect.stats.ll test without needing the workaround. (cherry picked from commit 00926a6)

Includes a test for the crash exposed by 08001cf. (cherry picked from commit b169302)

…ing deref. (llvm#155672)" This reverts commit f0df1e3. Recommit with extra check for SCEVCouldNotCompute. Test has been added in b169302. Original message: Remove the fall-back to constant max BTC if the backedge-taken-count cannot be computed. The constant max backedge-taken count is computed considering loop guards, so to avoid regressions we need to apply loop guards as needed. Also remove the special handling for Mul in willNotOverflow, as this should no longer be needed after 9143746 (llvm#155300). PR: llvm#155672 (cherry picked from commit a434a7a)

Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant sizes in dereferenceable assumptions. Apply loop-guards in a few places needed to reason about expressions involving trip counts of the form (BTC - 1). PR: llvm#156758 (cherry picked from commit b400fd1)

…-2. (llvm#156730) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: llvm#156730 (cherry picked from commit 74ec38f)

AnyOf reduces multiple input vectors to a single boolean value. When used for early-exit vectorization, we need to consider any lane after the early exit being poison. Any poison lane would result in poison after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf. Fixes llvm#153946. Fixes llvm#155162. https://alive2.llvm.org/ce/z/FD-XxA PR: llvm#154156 (cherry picked from commit f492eb9)
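A toy model of why freezing helps, assuming poison lanes are represented as std::nullopt and that freeze replaces each poison lane with some fixed value (for simplicity, the same value for every lane; a real freeze may pick a different value per element). The names are illustrative only and are not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Toy model: a lane is either a concrete bool or poison (std::nullopt).
using Lane = std::optional<bool>;

// AnyOf over 4 lanes: a poison lane makes the whole reduction poison.
Lane anyOf(const std::array<Lane, 4> &Lanes) {
  bool Result = false;
  for (const Lane &L : Lanes) {
    if (!L)
      return std::nullopt;
    Result = Result || *L;
  }
  return Result;
}

// Freeze: every poison lane becomes the arbitrary-but-fixed value Fill;
// concrete lanes are unchanged.
std::array<Lane, 4> freezeAll(std::array<Lane, 4> Lanes, bool Fill) {
  for (Lane &L : Lanes)
    if (!L)
      L = Fill;
  return Lanes;
}
```

In this model, a poison lane after the exiting lane makes the unfrozen reduction poison. After freezing, the result is true whichever value the poison lanes take, because the exiting lane itself is true; if no lane exits, no lane lies after an early exit, so there is no poison to begin with.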

Add extra test coverage for follow-up to llvm#156730. (cherry picked from commit 45a2214)

Consolidate tests for multiple divisors in a single loop, add multiplies by 1, 2, 5, 6. Extends test coverage for llvm#157159. (cherry picked from commit b9f571f)

… A. (llvm#157159) Generalize fold added in 74ec38f (llvm#156730) to support multiplying and dividing by different constants, given they are both powers-of-2 and C1 is a multiple of C2, checked via logBase2. https://alive2.llvm.org/ce/z/eqJ2xj PR: llvm#157159 (cherry picked from commit a1afe66)
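An exhaustive 8-bit check of the identity behind these folds. It assumes the expressions group as C1 * (A /u C2), i.e. a constant multiplied by an unsigned division, which is how the folds are read here; with C1 == C2 it reduces to the original fold C * (A /u C) -> A. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks in 8-bit unsigned arithmetic that
//   C1 * (A /u C2) == (C1 /u C2) * A
// when C1 and C2 are powers of two, C1 is a multiple of C2 (C1 >= C2),
// and A is a multiple of C2.
bool mulOfUDivFoldHolds() {
  for (unsigned S1 = 0; S1 < 8; ++S1)
    for (unsigned S2 = 0; S2 <= S1; ++S2) { // C1 = 2^S1, C2 = 2^S2
      unsigned C1 = 1u << S1;
      unsigned C2 = 1u << S2;
      for (unsigned A = 0; A < 256; A += C2) { // A is a multiple of C2
        uint8_t Lhs = static_cast<uint8_t>(C1 * (A / C2));
        uint8_t Rhs = static_cast<uint8_t>((C1 / C2) * A);
        if (Lhs != Rhs)
          return false;
      }
    }
  return true;
}
```

The multiple-of condition on A is essential: 4 * (6 /u 4) is 4, not 6.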

Treat negative constants C as -1 * abs(C) when folding multiplies and udivs. Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2 PR: llvm#157555 (cherry picked from commit 6580c91)

If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on llvm#157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: llvm#157656 (cherry picked from commit 70012fd)

Add test for SCEVUMaxExpr handling in llvm#160012. (cherry picked from commit 129c683)

(cherry picked from commit 9be276e)

…Z. (llvm#160941) When computing the backedge-taken count, we know that the expression must be valid just before we enter the loop. Using the terminator of the loop predecessor as context instruction for getConstantMultiple and getMinTrailingZeros allows using information from things like alignment assumptions. When a context instruction is used, the result is not cached, as it is only valid at the specific context instruction. Compile-time looks neutral: http://llvm-compile-time-tracker.com/compare.php?from=9be276ec75c087595ebb62fe11b35c1a90371a49&to=745980f5e1c8094ea1293cd145d0ef1390f03029&stat=instructions:u No impact on llvm-opt-benchmark (dtcxzyw/llvm-opt-benchmark#2867), but leads to additional unrolling in ~90 files across a C/C++ based corpus including LLVM on AArch64 using libc++ (which emits alignment assumptions for things like std::vector::begin). PR: llvm#160941 (cherry picked from commit c7fbe38)

…s. (llvm#162617) Simplify and generalize the code to get a common constant multiple for expressions when collecting guards, replacing the manual implementation. Split off from llvm#160012. PR: llvm#162617 (cherry picked from commit 6d905e4)

…llvm#163017) Follow-up as suggested in llvm#162617. Just use an APInt for DividesBy, as the existing code already operates on APInt and thus handles the case of DividesBy being 1. PR: llvm#163017 (cherry picked from commit 0d1f2f4)

Additional test coverage for llvm#160500. (cherry picked from commit 735ee5c)

This patch adds a new m_scev_Trunc pattern matcher for SCEVTruncateExpr and uses it in a few places to slightly simplify the code. PR: llvm#163169 (cherry picked from commit bc4e14b)

Move URem matching to ScalarEvolutionPatternMatch.h so it can be re-used together with other matchers. Depends on llvm#163169 PR: llvm#163170 (cherry picked from commit 7f04ee1)

When collecting information from loop guards, use UMax(1, %b - %a) for ICMP NE %a, %b, if neither is constant. This improves results in some cases, and will be even more useful together with llvm#160012 and llvm#159942. https://alive2.llvm.org/ce/z/YyBvoT PR: llvm#160500 (cherry picked from commit 2d02726)
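Why the NE guard justifies the UMax, as a quick 8-bit sketch: if %a != %b, the wrapping difference %b - %a cannot be zero, so it is at least 1 and equals UMax(1, %b - %a). The helper name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exhaustively checks in 8 bits that, under the guard a != b, the wrapping
// difference b - a already equals UMax(1, b - a), i.e. it is at least 1.
bool neGuardImpliesUMaxOne() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (A == B)
        continue; // guard not satisfied
      uint8_t Diff = static_cast<uint8_t>(B - A);
      if (std::max<uint8_t>(1, Diff) != Diff)
        return false;
    }
  return true;
}
```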

…e. (llvm#163260) Follow-up to llvm#160941. Even if we don't have a context instruction for the caller, we should be able to provide context instructions for SCEVUnknowns. Unless I am missing something, SCEVUnknowns only become available at the point their underlying IR value has been defined. If a SCEVUnknown wraps an argument, it should be safe to use the first instruction in the entry block; if it wraps an instruction, the instruction itself can be used. This allows getConstantMultiple to make better use of alignment assumptions. PR: llvm#163260 (cherry picked from commit 3b46556)

Add additional test coverage for using NE guards added in 2d02726 (llvm#160500) (cherry picked from commit 0792478)

Add a new variant of m_scev_Mul that binds a SCEVMulExpr and use it in SCEVURem_match and also update 2 more places in ScalarEvolution.cpp that can use m_scev_Mul as well. PR: llvm#163364 (cherry picked from commit 7c54c82)

…lvm#163787) Follow-up to 2d02726 (llvm#160500)

Creating the SCEV subtraction eagerly is very expensive. To soften the blow, just collect a map with inequalities and check if we can apply the subtract rewrite when rewriting SCEVAddExpr.

Restores most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=0792478e4e133be96650444f3264e89d002fc058&to=7fca35db60fe6f423ea6051b45226046c067c252&stat=instructions:u

stage1-O3: -0.10%
stage1-ReleaseThinLTO: -0.09%
stage1-ReleaseLTO-g: -0.10%
stage1-O0-g: +0.02%
stage1-aarch64-O3: -0.09%
stage1-aarch64-O0-g: +0.00%
stage2-O3: -0.17%
stage2-O0-g: -0.05%
stage2-clang: -0.07%

There is still some negative impact compared to before 2d02726, but there's probably not much we can do to reduce this further.

Compile-time improvement with 2d02726 reverted on top of the current PR: http://llvm-compile-time-tracker.com/compare.php?from=7fca35db60fe6f423ea6051b45226046c067c252&to=98dd152bdfc76b30d00190d3850d89406ca3c21f&stat=instructions:u

stage1-O3: 60628M (-0.03%)
stage1-ReleaseThinLTO: 76388M (-0.04%)
stage1-ReleaseLTO-g: 89228M (-0.02%)
stage1-O0-g: 18523M (-0.03%)
stage1-aarch64-O3: 67623M (-0.03%)
stage1-aarch64-O0-g: 22595M (+0.01%)
stage2-O3: 52336M (+0.01%)
stage2-O0-g: 16174M (+0.00%)
stage2-clang: 34890032M (-0.03%)

PR: llvm#163787 (cherry picked from commit a5d3522)

…ub. (llvm#163250) Follow-up to llvm#160500 to preserve divisibility info when creating the UMax. PR: llvm#163250 (cherry picked from commit eb17a8d)

Add test with urem guard with non-constant divisor and AddRec guards. Extra test coverage for llvm#163021 (cherry picked from commit 0731f18)

Move getPreviousSCEVDivisibleByDivisor from a lambda to a static function and clarify the name (DividesBy -> DivisibleBy). Split off refactoring from llvm#163021. (cherry picked from commit 385ea0d)

Fix formatting for switch, to avoid unrelated changes/formatting errors in llvm#163021. (cherry picked from commit 817b7c5)

…llvm#163021) At the moment, the effectiveness of guards that contain divisibility information (A % B == 0) depends on the order of the conditions. This patch makes using divisibility information independent of the order, by collecting and applying the divisibility information separately. We first collect all conditions in a vector, then collect the divisibility information from all guards. When processing other guards, we apply divisibility info collected earlier. After all guards have been processed, we add the divisibility info, rewriting the existing rewrite. This ensures we apply the divisibility info to the largest rewrite expression. This helps to improve results in a few cases, one in dtcxzyw/llvm-opt-benchmark#2921 and another one in a different large C/C++ based IR corpus. PR: llvm#163021 (cherry picked from commit d3fe1df)
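A toy model of the order dependence, assuming guards on a single unsigned value n that are either "n % D == 0" or "n >= C", and reducing each rewrite to a simple lower bound. The types and functions are hypothetical; LLVM rewrites SCEV expressions rather than integers, and this only illustrates why collecting divisibility info first and applying it last gives the same result for both guard orders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical guard on n: either "n % Value == 0" or "n >= Value".
struct Guard {
  bool IsDivisibility;
  uint64_t Value; // divisor or minimum, depending on IsDivisibility
};

// Round Bound up to the next multiple of Divisor (Divisor > 0).
uint64_t nextDivisibleBy(uint64_t Bound, uint64_t Divisor) {
  return (Bound + Divisor - 1) / Divisor * Divisor;
}

// Order-dependent: divisibility only refines the bound seen so far.
uint64_t lowerBoundInOrder(const std::vector<Guard> &Guards) {
  uint64_t Bound = 0;
  for (const Guard &G : Guards)
    Bound = G.IsDivisibility ? nextDivisibleBy(Bound, G.Value)
                             : std::max(Bound, G.Value);
  return Bound;
}

// Order-independent: collect divisibility info first, apply it at the end.
uint64_t lowerBoundCollected(const std::vector<Guard> &Guards) {
  uint64_t Divisor = 1, Bound = 0;
  for (const Guard &G : Guards) {
    if (G.IsDivisibility)
      Divisor = std::lcm(Divisor, G.Value);
    else
      Bound = std::max(Bound, G.Value);
  }
  return nextDivisibleBy(Bound, Divisor);
}
```

Processing "n % 4 == 0" before "n >= 10" in order loses the divisibility and yields 10, while collecting the divisor first yields 12 regardless of the order.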

(cherry picked from commit 4efde3c)

When using information from dereferenceable assumptions, we need to make sure that the memory is not freed between the assume and the specified context instruction. Instead of just checking canBeFreed, check if there are any calls that may free between the assume and the context instruction. This patch introduces a willNotFreeBetween to check for calls that may free between an assume and a context instruction, to also be used in llvm#161255. PR: llvm#161725 (cherry picked from commit 7ceef76)
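A minimal sketch of the check, assuming a single straight-line block in which each instruction only records whether it is a call that may free memory. The struct and function are hypothetical stand-ins; the real willNotFreeBetween works on LLVM instructions and its exact signature is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: only records whether it is a call that may free memory.
struct Inst {
  bool IsCallThatMayFree;
};

// Returns true if no instruction strictly between the assume (index Assume)
// and the context instruction (index Ctx) may free memory. Assumes both are
// in the same block; any other layout is conservatively treated as may-free.
bool willNotFreeBetweenSketch(const std::vector<Inst> &Block,
                              std::size_t Assume, std::size_t Ctx) {
  if (Assume >= Ctx || Ctx >= Block.size())
    return false;
  for (std::size_t I = Assume + 1; I < Ctx; ++I)
    if (Block[I].IsCallThatMayFree)
      return false;
  return true;
}
```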

fhahn · 2025-11-04T14:02:20Z

@swift-ci please test

fhahn · 2025-11-04T14:02:27Z

@swift-ci please test llvm

llvm#156929) Update Clang's __builtin_assume_dereferenceable to support non-constant lengths. The corresponding assume bundle has been updated to support non-constant sizes in cad62df. The current docs for the builtin don't mention the constant requirement for the size argument, so they don't need to be updated: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume-dereferenceable

A number of patches landed recently to make the optimizer make better use of dereferenceable assumptions, and once llvm#156730 lands, it can be used to vectorize some early-exit loops, for example std::find with std::vector::iterator: https://godbolt.org/z/qo58PKG37

```
#include <algorithm>
#include <cstddef>
#include <memory> // std::to_address (C++20)
#include <vector>

auto find(std::vector<short>::iterator first, short s, unsigned size) {
  // Alignment assumption on the underlying pointer.
  auto Addr = __builtin_assume_aligned(std::to_address(first), 2);
  // All `size` elements starting at `first` are dereferenceable, so a
  // vectorized loop can load whole vectors, even past the first match.
  __builtin_assume_dereferenceable(std::to_address(first), size * sizeof(short));
  return std::find(first, first + size, s);
}
```

PR: llvm#156929 (cherry picked from commit c8d065b)
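To illustrate why the dereferenceable assumption matters, here is a scalar emulation of an early-exit vectorized search. It assumes a vectorization factor VF and handles the tail with a scalar loop; the function is a hypothetical model, not the code the loop vectorizer emits. The key point is that each chunk loads all VF elements before checking for a match, so it may read past the match, which is only safe if the whole range is dereferenceable.

```cpp
#include <cassert>
#include <cstddef>

// Scalar emulation of an early-exit vectorized find with VF lanes. Each
// iteration "loads" all VF elements of a chunk before checking whether any
// lane matched (the AnyOf), so it may read past the first match.
template <std::size_t VF = 4>
const short *findChunked(const short *First, short S, std::size_t Size) {
  std::size_t I = 0;
  for (; I + VF <= Size; I += VF) {
    bool Lanes[VF];
    bool AnyOf = false;
    for (std::size_t L = 0; L < VF; ++L) { // "vector" compare of one chunk
      Lanes[L] = First[I + L] == S;
      AnyOf = AnyOf || Lanes[L];
    }
    if (AnyOf) // early exit: locate the first matching lane
      for (std::size_t L = 0; L < VF; ++L)
        if (Lanes[L])
          return First + I + L;
  }
  for (; I < Size; ++I) // scalar remainder loop
    if (First[I] == S)
      return First + I;
  return First + Size;
}
```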

When using information from dereferenceable assumptions, we need to make sure that the memory is not freed between the assume and the specified context instruction. Instead of just checking canBeFreed, check if there are any calls that may free between the assume and the context instruction. Note that this also adjusts the context instruction to be the terminator in the loop predecessor, if there is one and it is a branch (to avoid things like invoke). PR: llvm#161255 (cherry picked from commit 8b8c59c)

fhahn · 2025-11-04T20:43:33Z

@swift-ci please test

fhahn · 2025-11-04T20:43:36Z

@swift-ci please test llvm

fhahn and others added 30 commits November 4, 2025 10:12

[LoopIdiom] Add test for simplifying SCEV during expansion with flags.

f433ff8

(cherry picked from commit 1217c82)

[IndVars,LV] Add tests for missed SCEV simplifications with muls.

b73cf05

(cherry picked from commit f0df62f)

[LV] Add additional SCEV expansion tests for llvm#147824.

ee153c8

Add additional test coverage for llvm#147824. (cherry picked from commit 6d004d2)

[LV] Add test for re-using existing phi for SCEV Add.

110e585

Add another test case for llvm#147824, where the difference between an existing phi and the target SCEV is an add of a constant. (cherry picked from commit 445006d)

[SCEV] Add test for pushing constant add into zext.

9184074

Adds SCEV-only tests for llvm#151227. (cherry picked from commit c6f7fa7)

[SCEVExp] Check if getPtrToIntExpr resulted in CouldNotCompute.

a996352

This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr failed. Fixes llvm#155287 (cherry-picked from b29084f)

[SCEV] Add tests for applying guards to SCEVAddExpr sub-expressions.

83fdf69

Adds a test case for computing the backedge-taken-count for llvm#155941

[LAA] Rename var used to retry with RT-checks (NFC) (llvm#147307)

2ed9951

FoundNonConstantDistanceDependence is a misleading name for a variable that determines whether we retry with runtime checks. Rename it. (cherry picked from commit b692b23)

[LV] Add additional tests for reasoning about dereferenceable loads.

6cd6e8b

Includes a test for the crash exposed by 08001cf. (cherry picked from commit b169302)

[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of…

b18c2ca

…-2. (llvm#156730) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: llvm#156730 (cherry picked from commit 74ec38f)

[SCEV] Add tests for folding multiply/divide by constants.

f6c35e2

Add extra test coverage for follow-up to llvm#156730. (cherry picked from commit 45a2214)

[SCEV] Cover more multiplier/divisor combinations in folding test.

4918512

Consolidate tests for multiple divisors in a single loop, add multiplies by 1, 2, 5, 6. Extends test coverage for llvm#157159. (cherry picked from commit b9f571f)

[SCEV] Fold ((-1 * C1) * D / C1) -> -1 * D. (llvm#157555)

af12a13

Treat negative constants C as -1 * abs(C) when folding multiplies and udivs. Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2 PR: llvm#157555 (cherry picked from commit 6580c91)

[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1. (llvm#157656)

ca229b8

If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on llvm#157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: llvm#157656 (cherry picked from commit 70012fd)

fhahn added 20 commits November 4, 2025 11:49

[LV] Add test showing missed optimization due to missing info from guard

14064d5

Add test for SCEVUMaxExpr handling in llvm#160012. (cherry picked from commit 129c683)

[SCEV] Add tests for computing trip counts with align assumptions.

921ddad

(cherry picked from commit 9be276e)

[SCEV] Add test with ptrtoint guards and their order swapped.

d245be6

Additional test coverage for llvm#160500. (cherry picked from commit 735ee5c)

[SCEV] Add m_scev_Trunc pattern matcher. (llvm#163169)

198f96b

This patch adds a new m_scev_Trunc pattern matcher for SCEVTruncateExpr and uses it in a few places to slightly simplify the code. PR: llvm#163169 (cherry picked from commit bc4e14b)

[SCEV] Move URem matching to ScalarEvolutionPatternMatch.h (llvm#163170)

2af416b

Move URem matching to ScalarEvolutionPatternMatch.h so it can be re-used together with other matchers. Depends on llvm#163169 PR: llvm#163170 (cherry picked from commit 7f04ee1)

[SCEV] Add tests with multiple NE guards and different orders.

711a060

Add additional test coverage for using NE guards added in 2d02726 (llvm#160500) (cherry picked from commit 0792478)

[SCEV] Preserve divisor info when adding guard info for ICMP_NE via S…

ba69e1d

…ub. (llvm#163250) Follow-up to llvm#160500 to preserve divisibility info when creating the UMax. PR: llvm#163250 (cherry picked from commit eb17a8d)

[SCEV] Add extra test coverage with URem & AddRec guards.

ea3b9c0

Add test with urem guard with non-constant divisor and AddRec guards. Extra test coverage for llvm#163021 (cherry picked from commit 0731f18)

[SCEV] Move and clarify names of prev/next divisor helpers (NFC).

f2ef597

Move getPreviousSCEVDivisibleByDivisor from a lambda to a static function and clarify the name (DividesBy -> DivisibleBy). Split off refactoring from llvm#163021. (cherry picked from commit 385ea0d)

[SCEV] Fix switch formatting in collectFromBlock (NFC).

a738a80

Fix formatting for switch, to avoid unrelated changes/formatting errors in llvm#163021. (cherry picked from commit 817b7c5)

[Loads] Add additional test coverage for assumptions.

d67f605

(cherry picked from commit 4efde3c)

fhahn added 3 commits November 4, 2025 20:34

Update tests.

dec6e12

fhahn force-pushed the pick-scev-laa-loads-changes-for-early-exit branch from b8eaf2c to dec6e12 November 4, 2025 20:35

fhahn merged commit 4d06b34 into swiftlang:stable/21.x Nov 5, 2025
5 checks passed

fhahn deleted the pick-scev-laa-loads-changes-for-early-exit branch November 5, 2025 10:33

3 participants

Pick changes for std::find vectorization #11744

Pick changes for std::find vectorization #11744

Uh oh!

Conversation

fhahn commented Nov 4, 2025

Uh oh!

fhahn commented Nov 4, 2025

Uh oh!

fhahn commented Nov 4, 2025

Uh oh!

fhahn commented Nov 4, 2025

Uh oh!

fhahn commented Nov 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants