AMDGPU should handle SimplifyDemandedVectorElts for more trivial intrinsics #131734

arsenm · 2025-03-18T05:51:45Z

As a follow-up to #128647, more intrinsics should be handled in SimplifyDemandedVectorElts.

This includes Intrinsic::amdgcn_readlane, Intrinsic::amdgcn_update_dpp, Intrinsic::amdgcn_permlane16, Intrinsic::amdgcn_permlanex16, Intrinsic::amdgcn_permlane64, and Intrinsic::amdgcn_mov_dpp8.

This is mostly a matter of adding the intrinsics to the switch, adding some boilerplate to keep the other immediate operands as they are, and adding tests similar to the ones added in #128647.
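The idea in the description can be illustrated with a small standalone model. This is only a sketch and does not use the LLVM API: `ToyIntrinsic`, `ToyCall`, `isLaneWise`, and `narrowToDemanded` are hypothetical names, and the real intrinsics have operand lists that differ from this simplified shape. It assumes the intrinsic acts on each vector element independently, so elements that no user demands can be dropped while the remaining operands are copied through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only (hypothetical names, not LLVM types): a call with one
// vector operand plus extra operands that must be preserved as they are.
enum class ToyIntrinsic { ReadLane, Permlane64, NotLaneWise };

struct ToyCall {
  ToyIntrinsic id;
  std::vector<int> vectorOperand; // operand processed element by element
  std::vector<int> extraOperands; // e.g. lane selects or control values
};

// The "switch": list the intrinsics assumed to act on each element
// of the vector operand independently.
bool isLaneWise(ToyIntrinsic id) {
  switch (id) {
  case ToyIntrinsic::ReadLane:
  case ToyIntrinsic::Permlane64:
    return true;
  case ToyIntrinsic::NotLaneWise:
    return false;
  }
  return false;
}

// Bit i of demandedMask is set when element i of the result is used.
// Returns a narrowed call that keeps only the demanded elements, or
// std::nullopt when nothing can be simplified.
std::optional<ToyCall> narrowToDemanded(const ToyCall &call,
                                        std::uint64_t demandedMask) {
  const std::size_t numElts = call.vectorOperand.size();
  if (!isLaneWise(call.id) || numElts == 0 || numElts > 64)
    return std::nullopt;

  const std::uint64_t allMask =
      numElts == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << numElts) - 1);
  demandedMask &= allMask;
  if (demandedMask == 0 || demandedMask == allMask)
    return std::nullopt; // nothing demanded, or everything demanded

  // Copy the extra operands through untouched; shrink only the vector.
  ToyCall narrowed{call.id, {}, call.extraOperands};
  for (std::size_t i = 0; i < numElts; ++i)
    if ((demandedMask >> i) & 1)
      narrowed.vectorOperand.push_back(call.vectorOperand[i]);
  return narrowed;
}
```

In this model, adding support for another intrinsic is just another `case` label in `isLaneWise`, which mirrors the "add it to the switch" framing of the issue. The actual change would also need tests along the lines of those in #128647.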

llvmbot · 2025-03-18T05:52:01Z

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
3. Fix the issue locally.
4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
5. Create a Git commit.
6. Run git clang-format HEAD~1 to format your changes.
7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

llvmbot · 2025-03-18T05:52:03Z

@llvm/issue-subscribers-good-first-issue

Author: Matt Arsenault (arsenm)


llvmbot · 2025-03-18T05:52:05Z

@llvm/issue-subscribers-backend-amdgpu


jayfoad · 2025-03-18T10:03:06Z

I don't see why this is specific to AMDGPU. It should apply to any isTriviallyScalarizable intrinsic.

DiviyamPathak · 2025-03-19T08:15:02Z

Can I take this up?

DiviyamPathak · 2025-03-21T14:23:26Z

@arsenm Hello, how do I handle Intrinsic::amdgcn_permlane16, Intrinsic::amdgcn_permlanex16, and Intrinsic::amdgcn_permlane64 in the switch? I am a new contributor, sorry.

BaoshanPang · 2025-03-22T00:20:59Z

@arsenm Do we only support the intrinsic from ll to ll? I am getting this error when trying to see the assembly output:

$ ./bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942  /home/bpang/myworks/github/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
ScalarizeVectorResult #0: t6: v1i16 = llvm.amdgcn.readfirstlane TargetConstant:i64<3139>, t4

LLVM ERROR: Do not know how to scalarize the result of this operator!

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

BaoshanPang · 2025-03-22T00:22:00Z

And it gets stuck if the -global-isel option is used:

$ ./bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -global-isel /home/bpang/myworks/github/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll

arsenm · 2025-03-22T03:40:48Z

1 x vectors are a degenerate case; the backend doesn't treat them as legal. These should use the scalar type instead (i.e., replace '<1 x i16>' with 'i16').
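The suggested replacement can be sketched as a tiny string helper. This is an illustration only: `scalarizeSingleElementVector` is a hypothetical name, it is not part of LLVM, and it assumes the type is spelled exactly as `<1 x T>` with single spaces.

```cpp
#include <string>

// Hypothetical helper (not an LLVM function): map the textual type
// "<1 x T>" to the scalar "T". Any other spelling is returned unchanged.
std::string scalarizeSingleElementVector(const std::string &type) {
  const std::string prefix = "<1 x ";
  if (type.size() > prefix.size() + 1 &&
      type.compare(0, prefix.size(), prefix) == 0 && type.back() == '>')
    return type.substr(prefix.size(), type.size() - prefix.size() - 1);
  return type;
}
```

For example, `scalarizeSingleElementVector("<1 x i16>")` yields `"i16"`, while wider vectors such as `"<2 x i16>"` are left alone.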

arsenm added backend:AMDGPU, good first issue, missed-optimization labels Mar 18, 2025

arsenm assigned DiviyamPathak Mar 19, 2025
