@fhahn fhahn commented Nov 4, 2025

rdar://160925334
rdar://158592232
rdar://159859974

fhahn and others added 30 commits November 4, 2025 10:12
…r. (llvm#147214)

If an AddRec is expanded outside a loop with a single exit block, check
if any of the (lcssa) phi nodes in the exit block match the AddRec. If
that's the case, simply use the existing lcssa phi.

This can reduce the number of instructions created for SCEV expansions,
mainly for runtime checks generated by the loop vectorizer.

Compile-time impact should be mostly neutral:

https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u

PR: llvm#147214

(cherry-picked from 4635743)
The original test @backward_dep_known_distance_less_than_btc was
incorrectly named, as all loads are completely before the first store.

Add a variant where this is not the case: @backward_dep_known_distance_less_than_btc

(cherry picked from commit a11c5dd)
llvm#149795)

Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the
second operand (Y) is an ADD. The code only simplifies the condition if
C1 < C2, so if the second ADD is NUW, it doesn't matter whether the
first operand also has the NUW flag, as it cannot wrap if C1 < C2.

https://alive2.llvm.org/ce/z/b3dM7N
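
The reasoning above can be checked exhaustively on a small bit width. The following is a minimal sketch, not the SCEV code: it models X = A + C1 and Y = A + C2 on 8-bit values and assumes only the second add is NUW. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Brute-force check over all 8-bit values: whenever C1 <u C2 and A + C2 does
// not wrap (NUW on the second add), A + C1 cannot wrap either, and
// A + C1 <u A + C2 holds. Returns true if no counterexample exists.
bool nuwOnSecondAddSuffices() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = C1 + 1; C2 < 256; ++C2) {
        if (A + C2 > 255)
          continue; // Second add would wrap in 8 bits, so NUW does not hold.
        if (A + C1 > 255)
          return false; // First add wrapped although C1 <u C2.
        const uint8_t X = static_cast<uint8_t>(A + C1);
        const uint8_t Y = static_cast<uint8_t>(A + C2);
        if (!(X < Y))
          return false;
      }
  return true;
}
```

Calling nuwOnSecondAddSuffices() returns true, i.e. no 8-bit counterexample exists for this model.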

PR: llvm#149795
(cherry picked from commit 6c50e2b)
Add additional test coverage for
llvm#147824.

(cherry picked from commit 6d004d2)
…47824)

Generalize the code added in
llvm#147214 to also support
re-using pointer LCSSA phis when expanding SCEVs with AddRecs.

A common source of integer AddRecs with pointer bases are runtime checks
emitted by LV based on the distance between 2 pointer AddRecs.

This improves codegen in some cases when vectorizing and prevents
regressions with llvm#142309, which
turns some phis into single-entry ones, which SCEV will look through
now (and expand the whole AddRec), whereas before it would have to treat
the LCSSA phi as SCEVUnknown.

Compile-time impact neutral:
https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u

PR: llvm#147824
(cherry picked from commit e21ee41)
Add another test case for
llvm#147824, where the difference
between an existing phi and the target SCEV is an add of a constant.

(cherry picked from commit 445006d)
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
llvm#147824 (comment).

(cherry picked from commit f9f68af)
Adds a SCEV-only test for
llvm#151227.

(cherry picked from commit c6f7fa7)
…lvm#151227)

Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.

The actual code supports ZExts with an arbitrary number of arguments, hence
the getAddExpr in the return.

This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.

Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to add operands.

https://alive2.llvm.org/ce/z/q7d303

PR: llvm#151227
(cherry picked from commit d74d841)
Follow-up to
llvm#151227 (review)
to check the inner expression is an Add before calling getTruncateExpr.

Adds a new matcher that just matches and captures SCEVAddExpr, to
support matching a SCEVAddExpr with an arbitrary number of operands.

(cherry picked from commit ab9b23c)
Update the logic added in
llvm#147824 to also allow adds of
constants. There are a number of cases where this can help remove
redundant phis and replace some computation with a ptrtoint (which
likely is free in the backend).

PR: llvm#150693
(cherry picked from commit 99d70e0)
This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr
failed.

Fixes llvm#155287

(cherry-picked from b29084f)
…llvm#155300)

Try to push constant multiply operand into a ZExt containing an add, if
possible. In general we are trying to push down ops through ZExt if
possible. This is similar to
llvm#151227 which did the same for
additions.

For now this is restricted to adds with a constant operand, which is
similar to some of the logic above.

This enables some additional simplifications.

Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL

PR: llvm#155300

(cherry-picked from 9143746)
Adds a test case for computing the backedge-taken-count for
llvm#155941
…156013)

Trip count expressions sometimes consist of adding 3 operands, i.e.
(Const + A + B). There may be guard info for A + B, and if so, apply it.

This can probably be applied more generally, but we need to be careful
w.r.t. compile-time.

Alive2 Proof for changes in miniters.ll:
https://alive2.llvm.org/ce/z/HFfXOx

Fixes llvm#155941.

PR: llvm#156013

(cherry-picked from fb7c0d7)
FoundNonConstantDistanceDependence is a misleading name for a variable
that determines whether we retry with runtime checks. Rename it.

(cherry picked from commit b692b23)
…vm#147047)

This patch extends the logic added in
llvm#128061 to support
dereferenceability information from assumptions as well.

Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.

PR: llvm#147047
(cherry picked from commit 2ae996c)
Add support for identifying multiplication overflow in SCEV.
This is needed in LoopAccessAnalysis and that limitation was worked
around by 484417a.
This allows early-exit vectorization to work as expected in
vect.stats.ll test without needing the workaround.
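
As an illustration of what identifying multiplication overflow means, here is a minimal sketch of one common technique (widen and compare) on plain integers. It is not the SCEV implementation, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if A * B is representable in 32 bits without unsigned wrap.
// Technique: compute the product once in the narrow type and once on the
// zero-extended operands; the two agree exactly when no wrap occurred.
constexpr bool mulWillNotUnsignedWrap(uint32_t A, uint32_t B) {
  const uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  const uint64_t NarrowExt =
      static_cast<uint64_t>(static_cast<uint32_t>(A * B));
  return Wide == NarrowExt;
}
```

For example, 65535 * 65537 equals 2^32 - 1 and still fits, while 65536 * 65536 equals 2^32 and wraps.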

(cherry picked from commit 00926a6)
Includes a test for the crash exposed by 08001cf.

(cherry picked from commit b169302)
…ing deref. (llvm#155672)"

This reverts commit f0df1e3.

Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b169302.

Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.

The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.

Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 9143746
(llvm#155300).

PR: llvm#155672
(cherry picked from commit a434a7a)
Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant
sizes in dereferenceable assumptions.

Apply loop-guards in a few places needed to reason about expressions
involving trip counts of the form (BTC - 1).

PR: llvm#156758
(cherry picked from commit b400fd1)
AnyOf reduces multiple input vectors to a single boolean value. When
used for early-exit vectorization, we need to consider any lane after
the early exit being poison. Any poison lane would result in poison
after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf.

Fixes llvm#153946.
Fixes llvm#155162.

https://alive2.llvm.org/ce/z/FD-XxA

PR: llvm#154156
(cherry picked from commit f492eb9)
Add extra test coverage for follow-up to
llvm#156730.

(cherry picked from commit 45a2214)
Consolidate tests for multiple divisors in a single loop, add multiplies
by 1, 2, 5, 6.

Extends test coverage for
llvm#157159.

(cherry picked from commit b9f571f)
… A. (llvm#157159)

Generalize fold added in 74ec38f
(llvm#156730) to support multiplying and
dividing by different constants, given they are both powers-of-2 and C1 is a
multiple of C2, checked via logBase2.

https://alive2.llvm.org/ce/z/eqJ2xj
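
A small exhaustive check of the arithmetic behind this kind of fold. This is a sketch under assumptions: the commit title is truncated above, so the shape (A * C1) /u C2 -> A * (C1 /u C2) and the requirement that A * C1 does not unsigned-wrap are assumed here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, on 8-bit values, that
//   (A * C1) /u C2 == A * (C1 /u C2)
// when C1 and C2 are powers of 2, C1 is a multiple of C2, and A * C1 does not
// wrap (assumed precondition). Returns true if no counterexample is found.
bool mulUDivFoldHolds() {
  for (unsigned Log1 = 0; Log1 < 8; ++Log1)
    for (unsigned Log2 = 0; Log2 <= Log1; ++Log2) { // log2(C2) <= log2(C1)
      const unsigned C1 = 1u << Log1;
      const unsigned C2 = 1u << Log2;
      for (unsigned A = 0; A < 256; ++A) {
        if (A * C1 > 255)
          continue; // Multiply would wrap in 8 bits; fold assumed not to apply.
        const uint8_t Before =
            static_cast<uint8_t>(static_cast<uint8_t>(A * C1) / C2);
        const uint8_t After = static_cast<uint8_t>(A * (C1 / C2));
        if (Before != After)
          return false;
      }
    }
  return true;
}
```

Comparing the base-2 logarithms is enough to establish that one power of 2 is a multiple of the other, which is why the loop bounds are expressed in terms of Log1 and Log2.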

PR: llvm#157159
(cherry picked from commit a1afe66)
Treat negative constants C as -1 * abs(C) when folding multiplies and
udivs.

Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2

PR: llvm#157555
(cherry picked from commit 6580c91)
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).

Depends on llvm#157555.

Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN
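
The arithmetic can be sanity-checked on 8-bit values. This sketch assumes the folded pattern is (A * C1) /u C2 with power-of-2 constants and a multiply that does not unsigned-wrap; those preconditions are assumptions carried over from the related folds, and the function name is hypothetical.

```cpp
#include <cassert>

// Exhaustively checks, on 8-bit values, that
//   (A * C1) /u C2 == A /u (C2 /u C1)
// when C2 >u C1, C1 >u 1, both are powers of 2, and A * C1 does not wrap.
bool mulUDivLargerDivisorFoldHolds() {
  for (unsigned Log1 = 1; Log1 < 8; ++Log1)            // C1 >u 1
    for (unsigned Log2 = Log1 + 1; Log2 < 8; ++Log2) { // C2 >u C1
      const unsigned C1 = 1u << Log1;
      const unsigned C2 = 1u << Log2;
      for (unsigned A = 0; A < 256; ++A) {
        if (A * C1 > 255)
          continue; // Multiply would wrap in 8 bits; skip (assumed NUW).
        if ((A * C1) / C2 != A / (C2 / C1))
          return false;
      }
    }
  return true;
}
```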

PR: llvm#157656
(cherry picked from commit 70012fd)
fhahn added 20 commits November 4, 2025 11:49
Add test for SCEVUMaxExpr handling in
llvm#160012.

(cherry picked from commit 129c683)
…Z. (llvm#160941)

When computing the backedge taken count, we know that the expression
must be valid just before we enter the loop. Using the terminator of the
loop predecessor as context instruction for getConstantMultiple and
getMinTrailingZeros allows using information from things like alignment
assumptions.
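
To illustrate why alignment information matters here: a value known to be a multiple of 2^k has at least k trailing zero bits. The helper below is a plain-integer sketch of that relationship, with a hypothetical name; it is not the ScalarEvolution API.

```cpp
#include <cassert>
#include <cstdint>

// Number of trailing zero bits guaranteed for any value that is a multiple of
// Multiple. A multiple of 0 can only be 0, which we treat as all 64 bits zero.
constexpr unsigned trailingZerosOfMultiple(uint64_t Multiple) {
  if (Multiple == 0)
    return 64;
  unsigned Count = 0;
  while ((Multiple & 1) == 0) {
    Multiple >>= 1;
    ++Count;
  }
  return Count;
}
```

For example, a pointer assumed to be 16-byte aligned is a multiple of 16, so trailingZerosOfMultiple(16) gives 4 known-zero low bits.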

When a context instruction is used, the result is not cached, as it is
only valid at the specific context instruction.

Compile-time looks neutral:
http://llvm-compile-time-tracker.com/compare.php?from=9be276ec75c087595ebb62fe11b35c1a90371a49&to=745980f5e1c8094ea1293cd145d0ef1390f03029&stat=instructions:u

No impact on llvm-opt-benchmark
(dtcxzyw/llvm-opt-benchmark#2867), but leads to
additional unrolling in ~90 files across a C/C++ based corpus including
LLVM on AArch64 using libc++ (which emits alignment assumptions for
things like std::vector::begin).

PR: llvm#160941
(cherry picked from commit c7fbe38)
…s. (llvm#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm#160012.

PR: llvm#162617
(cherry picked from commit 6d905e4)
…llvm#163017)

Follow-up as suggested in
llvm#162617.

Just use an APInt for DividesBy, as the existing code already operates
on APInt and thus handles the case of DividesBy being 1.

PR: llvm#163017
(cherry picked from commit 0d1f2f4)
Additional test coverage for llvm#160500.

(cherry picked from commit 735ee5c)
This patch adds a new m_scev_Trunc pattern matcher for SCEVTruncateExpr
and uses it in a few places to slightly simplify the code.

PR: llvm#163169
(cherry picked from commit bc4e14b)
Move URem matching to ScalarEvolutionPatternMatch.h so it can
be re-used together with other matchers.

Depends on llvm#163169

PR: llvm#163170
(cherry picked from commit 7f04ee1)
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither is constant.

This improves results in some cases, and will be even more useful
together with
 * llvm#160012
 * llvm#159942

https://alive2.llvm.org/ce/z/YyBvoT
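
The rewrite is sound because a != b implies that the wrapping difference b - a is non-zero, so it is at least 1 when viewed as unsigned. A minimal exhaustive check on 8-bit values (the function name is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// For every 8-bit pair with A != B, the wrapping difference B - A is already
// >= 1 as an unsigned value, so umax(1, B - A) == B - A. Returns true if this
// holds for all such pairs.
bool neGuardImpliesUMax() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (A == B)
        continue; // The guard A != B excludes this case.
      const uint8_t Diff = static_cast<uint8_t>(B - A);
      if (std::max<uint8_t>(1, Diff) != Diff)
        return false;
    }
  return true;
}
```

Without the guard, A == B gives a difference of 0, which is exactly the value the UMax with 1 rules out.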

PR: llvm#160500
(cherry picked from commit 2d02726)
…e. (llvm#163260)

Follow-up to llvm#160941.

Even if we don't have a context instruction for the caller, we should be
able to provide context instructions for SCEVUnknowns. Unless I am
missing something, SCEVUnknowns only become available at the point their
underlying IR instruction has been defined. If it is an argument, it
should be safe to use the first instruction in the entry block, or the
instruction itself if the SCEVUnknown wraps an instruction.

This allows getConstantMultiple to make better use of alignment
assumptions.

PR: llvm#163260
(cherry picked from commit 3b46556)
Add additional test coverage for using NE guards added in 2d02726
(llvm#160500)

(cherry picked from commit 0792478)
Add a new variant of m_scev_Mul that binds a SCEVMulExpr and use it in
SCEVURem_match and also update 2 more places in ScalarEvolution.cpp that
can use m_scev_Mul as well.

PR: llvm#163364
(cherry picked from commit 7c54c82)
…lvm#163787)

Follow-up to 2d02726
(llvm#160500)

Creating the SCEV subtraction eagerly is very expensive. To soften the
blow, just collect a map with inequalities and check if we can apply the
subtract rewrite when rewriting SCEVAddExpr.

Recovers most of the regression:

http://llvm-compile-time-tracker.com/compare.php?from=0792478e4e133be96650444f3264e89d002fc058&to=7fca35db60fe6f423ea6051b45226046c067c252&stat=instructions:u
stage1-O3: -0.10%
stage1-ReleaseThinLTO: -0.09%
stage1-ReleaseLTO-g: -0.10%
stage1-O0-g: +0.02%
stage1-aarch64-O3: -0.09%
stage1-aarch64-O0-g: +0.00%
stage2-O3: -0.17%
stage2-O0-g: -0.05%
stage2-clang: -0.07%

There is still some negative impact compared to before 2d02726, but
there's probably not much we could do to reduce this further.

Compile-time improvement with 2d02726 reverted on top of the
current PR:
http://llvm-compile-time-tracker.com/compare.php?from=7fca35db60fe6f423ea6051b45226046c067c252&to=98dd152bdfc76b30d00190d3850d89406ca3c21f&stat=instructions:u

stage1-O3: 60628M (-0.03%)
stage1-ReleaseThinLTO: 76388M (-0.04%)
stage1-ReleaseLTO-g: 89228M (-0.02%)
stage1-O0-g: 18523M (-0.03%)
stage1-aarch64-O3: 67623M (-0.03%)
stage1-aarch64-O0-g: 22595M (+0.01%)
stage2-O3: 52336M (+0.01%)
stage2-O0-g: 16174M (+0.00%)
stage2-clang: 34890032M (-0.03%)

PR: llvm#163787
(cherry picked from commit a5d3522)
…ub. (llvm#163250)

Follow-up to llvm#160500 to
preserve divisibility info when creating the UMax.

PR: llvm#163250
(cherry picked from commit eb17a8d)
Add test with urem guard with non-constant divisor and AddRec guards.

Extra test coverage for llvm#163021

(cherry picked from commit 0731f18)
Move getPreviousSCEVDivisibleByDivisor from a lambda to a static
function and clarify the name (DividesBy -> DivisibleBy).

Split off refactoring from llvm#163021.

(cherry picked from commit 385ea0d)
Fix formatting for switch, to avoid unrelated changes/formatting errors
in llvm#163021.

(cherry picked from commit 817b7c5)
…llvm#163021)

At the moment, the effectiveness of guards that contain divisibility
information (A % B == 0) depends on the order of the conditions.

This patch makes using divisibility information independent of the
order, by collecting and applying the divisibility information
separately.

We first collect all conditions in a vector, then collect the
divisibility information from all guards.

When processing other guards, we apply divisibility info collected
earlier.

After all guards have been processed, we add the divisibility info,
rewriting the existing rewrite. This ensures we apply the divisibility
info to the largest rewrite expression.
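
The two-phase idea can be sketched with a toy model. This is not the ScalarEvolution code: guards are modeled as constraints on a single unsigned value X, and all names below (GuardKind, Guard, Range, applyGuards) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical guard kinds on a single unsigned value X (toy model only).
enum class GuardKind { DivisibleBy, UGreaterThan, ULessThan };

struct Guard {
  GuardKind Kind;
  uint64_t Value; // Divisor for DivisibleBy, bound otherwise.
};

struct Range {
  uint64_t Min; // Inclusive lower bound on X.
  uint64_t Max; // Inclusive upper bound on X.
};

// Smallest multiple of D that is >= V (toy helper, ignores overflow).
uint64_t nextMultiple(uint64_t V, uint64_t D) {
  return V % D == 0 ? V : V + (D - V % D);
}

// Largest multiple of D that is <= V.
uint64_t previousMultiple(uint64_t V, uint64_t D) { return V - V % D; }

Range applyGuards(const std::vector<Guard> &Guards) {
  // Phase 1: collect divisibility info from all guards, wherever they appear.
  uint64_t Divisor = 1;
  for (const Guard &G : Guards)
    if (G.Kind == GuardKind::DivisibleBy && G.Value != 0)
      Divisor = std::lcm(Divisor, G.Value);

  // Phase 2: process the other guards, applying the divisor collected above.
  Range R{0, UINT64_MAX};
  for (const Guard &G : Guards) {
    if (G.Kind == GuardKind::UGreaterThan && G.Value != UINT64_MAX)
      R.Min = std::max(R.Min, nextMultiple(G.Value + 1, Divisor));
    else if (G.Kind == GuardKind::ULessThan && G.Value != 0)
      R.Max = std::min(R.Max, previousMultiple(G.Value - 1, Divisor));
  }

  // Finally, add the divisibility info to whatever bounds we ended up with.
  R.Min = nextMultiple(R.Min, Divisor);
  R.Max = previousMultiple(R.Max, Divisor);
  return R;
}
```

With the guards X >u 5 and X % 4 == 0, the lower bound becomes 8 regardless of which guard is listed first, which is the order independence described above.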

This helps to improve results in a few cases, one in
dtcxzyw/llvm-opt-benchmark#2921 and another one
in a different large C/C++ based IR corpus.

PR: llvm#163021
(cherry picked from commit d3fe1df)
When using information from dereferenceable assumptions, we need to make
sure that the memory is not freed between the assume and the specified
context instruction. Instead of just checking canBeFreed, check if there
are any calls that may free between the assume and the context instruction.

This patch introduces willNotFreeBetween to check for calls that may
free between an assume and a context instruction, to also be used in
llvm#161255.
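
A toy sketch of the kind of scan described here, restricted to a single straight-line block. The ToyInst type and toyWillNotFreeBetween are hypothetical stand-ins for illustration only and do not reflect the actual LLVM signature or its handling of multiple blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction: we only track whether it is a
// call that may free memory.
struct ToyInst {
  bool MayFree;
};

// Returns true if no instruction strictly between the assume (at AssumeIdx)
// and the context instruction (at CtxIdx) may free memory. Conservatively
// returns false if the assume does not come before the context instruction.
bool toyWillNotFreeBetween(const std::vector<ToyInst> &Block,
                           std::size_t AssumeIdx, std::size_t CtxIdx) {
  if (AssumeIdx >= CtxIdx || CtxIdx >= Block.size())
    return false;
  for (std::size_t I = AssumeIdx + 1; I < CtxIdx; ++I)
    if (Block[I].MayFree)
      return false;
  return true;
}
```

If the block is {assume, load, call free, load}, the assume can be used at the first load but not at the second one.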

PR: llvm#161725
(cherry picked from commit 7ceef76)

fhahn commented Nov 4, 2025

@swift-ci please test

fhahn commented Nov 4, 2025

@swift-ci please test llvm

fhahn added 3 commits November 4, 2025 20:34
llvm#156929)

Update Clang's __builtin_assume_dereferenceable to support non-constant
lengths. The corresponding assume bundle has been updated to support
non-constant sizes in cad62df.

The current docs for the builtin don't mention the constant requirement
for the size argument, so they don't need to be updated:
https://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume-dereferenceable

A number of patches landed recently to make the optimizer make better
use of the dereferenceable assumptions, and once
llvm#156730 lands, it can be used
to vectorize some early-exit loops, for example std::find with
std::vector::iterator: https://godbolt.org/z/qo58PKG37
```
  #include <algorithm>
  #include <cstddef>
  #include <memory>
  #include <vector>

  auto find(std::vector<short>::iterator first, short s, unsigned size) {
    auto Addr = __builtin_assume_aligned(std::to_address(first),  2);
    __builtin_assume_dereferenceable(std::to_address(first), size * sizeof(short));
    return std::find(first, first + size, s);
  }
```

PR: llvm#156929
(cherry picked from commit c8d065b)
When using information from dereferenceable assumptions, we need to make
sure that the memory is not freed between the assume and the specified
context instruction. Instead of just checking canBeFreed, check if there
are any calls that may free between the assume and the context instruction.

Note that this also adjusts the context instruction to be the terminator
in the loop predecessor, if there is one and it is a branch (to avoid
things like invoke).

PR: llvm#161255
(cherry picked from commit 8b8c59c)
@fhahn fhahn force-pushed the pick-scev-laa-loads-changes-for-early-exit branch from b8eaf2c to dec6e12 Compare November 4, 2025 20:35

fhahn commented Nov 4, 2025

@swift-ci please test

fhahn commented Nov 4, 2025

@swift-ci please test llvm

@fhahn fhahn merged commit 4d06b34 into swiftlang:stable/21.x Nov 5, 2025
5 checks passed
@fhahn fhahn deleted the pick-scev-laa-loads-changes-for-early-exit branch November 5, 2025 10:33