AMDGPU should handle SimplifyDemandedVectorElts for more trivial intrinsics #131734

Open
arsenm opened this issue Mar 18, 2025 · 9 comments
Labels: backend:AMDGPU, good first issue, missed-optimization

Comments

@arsenm
Contributor

arsenm commented Mar 18, 2025

As a follow-up to #128647, more intrinsics should be handled in SimplifyDemandedVectorElts.

This includes: Intrinsic::amdgcn_readlane, Intrinsic::amdgcn_update_dpp, Intrinsic::amdgcn_permlane16, Intrinsic::amdgcn_permlanex16, Intrinsic::amdgcn_permlane64 and Intrinsic::amdgcn_mov_dpp8.

This is mostly a matter of adding the intrinsics to the switch, adding some boilerplate to keep the other immediate operands as they are, and adding tests similar to the ones added in #128647.
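For readers new to the idea, here is a minimal toy sketch of the property this kind of simplification relies on. It is not LLVM code: `applyElementwise`, `extractDemanded`, `wideThenExtract`, and `extractThenNarrow` are hypothetical names, and plain `int` vectors stand in for IR vector values. It assumes the operation treats each vector element independently, so running it on only the demanded elements gives the same result as running it on the full vector and extracting afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for an operation applied to a vector operand: the same
// scalar function is applied to every element independently.
std::vector<int> applyElementwise(const std::vector<int> &v,
                                  const std::function<int(int)> &op) {
  std::vector<int> out;
  out.reserve(v.size());
  for (int x : v)
    out.push_back(op(x));
  return out;
}

// Toy stand-in for a shuffle that keeps only the demanded elements.
std::vector<int> extractDemanded(const std::vector<int> &v,
                                 const std::vector<std::size_t> &demanded) {
  std::vector<int> out;
  out.reserve(demanded.size());
  for (std::size_t i : demanded) {
    assert(i < v.size() && "demanded index out of range");
    out.push_back(v[i]);
  }
  return out;
}

// Unsimplified form: run the op on the full vector, then extract.
std::vector<int> wideThenExtract(const std::vector<int> &v,
                                 const std::function<int(int)> &op,
                                 const std::vector<std::size_t> &demanded) {
  return extractDemanded(applyElementwise(v, op), demanded);
}

// Simplified form: extract first, then run the op on the narrower vector.
std::vector<int> extractThenNarrow(const std::vector<int> &v,
                                   const std::function<int(int)> &op,
                                   const std::vector<std::size_t> &demanded) {
  return applyElementwise(extractDemanded(v, demanded), op);
}
```

In this model, the two forms agree for any element-wise `op`, which is why the narrower call can replace the wider one when only some elements are used.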

@arsenm added the backend:AMDGPU, good first issue, and missed-optimization labels Mar 18, 2025
@llvmbot
Member

llvmbot commented Mar 18, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@jayfoad
Contributor

jayfoad commented Mar 18, 2025

I don't see why this is specific to AMDGPU. It should apply to any isTriviallyScalarizable intrinsic.
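To illustrate why a "trivially scalarizable" style of predicate is the natural guard here, the following toy sketch contrasts an element-wise operation with one that is not. It is a hypothetical model, not LLVM code: `negateEach`, `prefixSum`, and `keep` are made-up names operating on plain `int` vectors. Narrowing is safe for the first operation but changes the result for the second.

```cpp
#include <cstddef>
#include <vector>

// Element-wise: each result element depends only on the matching input element.
std::vector<int> negateEach(const std::vector<int> &v) {
  std::vector<int> out;
  out.reserve(v.size());
  for (int x : v)
    out.push_back(-x);
  return out;
}

// Not element-wise: each result element depends on all earlier input elements.
std::vector<int> prefixSum(const std::vector<int> &v) {
  std::vector<int> out;
  out.reserve(v.size());
  int running = 0;
  for (int x : v) {
    running += x;
    out.push_back(running);
  }
  return out;
}

// Keep only the elements at the given indices (the "demanded" elements).
std::vector<int> keep(const std::vector<int> &v,
                      const std::vector<std::size_t> &idx) {
  std::vector<int> out;
  out.reserve(idx.size());
  for (std::size_t i : idx)
    out.push_back(v.at(i)); // at() throws on an out-of-range index
  return out;
}
```

With the input `{1, 2, 3, 4}` and demanded indices `{1, 3}`, narrowing before `negateEach` gives the same answer as narrowing after it, while narrowing before `prefixSum` does not, so only the first kind of operation qualifies for this rewrite.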

@DiviyamPathak

Can I take this up?

@DiviyamPathak

@arsenm Hello, how do I handle Intrinsic::amdgcn_permlane16, Intrinsic::amdgcn_permlanex16, and Intrinsic::amdgcn_permlane64 in the switch? I am a new contributor, sorry.

@BaoshanPang
Contributor

@arsenm Do we only support the intrinsic from .ll to .ll? I am getting this error when trying to see the assembly output:

$ ./bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942  /home/bpang/myworks/github/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
ScalarizeVectorResult #0: t6: v1i16 = llvm.amdgcn.readfirstlane TargetConstant:i64<3139>, t4

LLVM ERROR: Do not know how to scalarize the result of this operator!

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

@BaoshanPang
Contributor

And it gets stuck there when using the -global-isel option:

$ ./bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -global-isel /home/bpang/myworks/github/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll

@arsenm
Contributor Author

arsenm commented Mar 22, 2025

1 x vectors are a degenerate case; the backend doesn't treat them as legal. These should use the scalar type instead (i.e., replace '<1 x i16>' with 'i16').
