Fix post-build-script #8982

Merged
merged 4 commits into from
Mar 18, 2025

Conversation

NicolasHug
Member

@NicolasHug NicolasHug commented Mar 18, 2025

The post-build-script job has been failing for several reasons.

The CUDA_HOME variable wasn't set on some jobs, which led to failures.
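
A minimal sketch of how a missing variable could be made to fail fast with a clear message. This is an illustration only, not the change made in this PR, and `require_env` is a hypothetical helper name:

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or fail loudly.

    Hypothetical helper for illustration; not part of the actual fix.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # An unset or empty variable is treated as an error rather than ignored.
        raise RuntimeError(f"{name} is not set; cannot continue")
    return value
```

A build script could then call `require_env("CUDA_HOME")` before any step that depends on it, so the job stops at the first missing prerequisite.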

Also, as of auditwheel 6.3, lddtree must be imported differently, as mentioned in #8981:

ImportError: cannot import name 'lddtree' from 'auditwheel.lddtree' (/__w/_temp/conda_environment_13920830566/lib/python3.9/site-packages/auditwheel/lddtree.py)
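
A common way to tolerate a renamed attribute across library versions is to try several candidate names in order. The sketch below shows that general pattern; `import_first_available` is a hypothetical helper, and the name that replaces `lddtree` in auditwheel 6.3 is not stated in this thread, so it is left as a placeholder in the usage comment:

```python
import importlib


def import_first_available(module_name, candidate_names):
    """Return the first attribute from candidate_names that exists in the module.

    Hypothetical helper for illustration; not the code used in this PR.
    """
    module = importlib.import_module(module_name)
    for name in candidate_names:
        attr = getattr(module, name, None)
        if attr is not None:
            return attr
    # Fail with an explicit message instead of a confusing downstream error.
    raise ImportError(
        f"none of {list(candidate_names)!r} found in module {module_name!r}"
    )


# Usage shape only (the second name is a placeholder, not a real API name):
# lddtree = import_first_available("auditwheel.lddtree", ["lddtree", "<new name>"])
```

Pinning the auditwheel version is the alternative; the fallback approach keeps the script working on both sides of the rename at the cost of a little indirection.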

We also didn't know this job was failing because we didn't raise on failure. As a result, wheel relocation has been silently failing, which was probably causing all wheel jobs to fail: #8980
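
To illustrate what "raise on failure" means in practice, here is a minimal sketch using the standard library. It assumes the step is launched as a subprocess, which this thread does not confirm, and `run_step` is a hypothetical helper name:

```python
import subprocess
import sys


def run_step(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit status.

    Hypothetical helper for illustration; not the code used in this PR.
    """
    # check=True turns a non-zero exit code into an exception, so a failing
    # step cannot pass silently.
    completed = subprocess.run(cmd, check=True)
    return completed.returncode


if __name__ == "__main__":
    # A trivial command that succeeds; a failing one would raise instead.
    run_step([sys.executable, "-c", "pass"])
```

Without `check=True` (or an equivalent check of the return code), the caller keeps going after a failed step, which is the kind of silent failure described above.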

So, hopefully, this PR fixes all this mess. UGH.

Fixes #8981
Fixes #8980

pytorch-bot bot commented Mar 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8982

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 0cf06a8 with merge base ade1c9b (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@NicolasHug
Member Author

All jobs are green (finally), merging.

@NicolasHug NicolasHug merged commit 29066f5 into pytorch:main Mar 18, 2025
60 checks passed

Hey @NicolasHug!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

NicolasHug added a commit to NicolasHug/vision that referenced this pull request Mar 18, 2025
jithunnair-amd pushed a commit to ROCm/vision that referenced this pull request Mar 18, 2025
jithunnair-amd pushed a commit to ROCm/vision that referenced this pull request Mar 18, 2025
jithunnair-amd pushed a commit to ROCm/vision that referenced this pull request Mar 18, 2025
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Mar 19, 2025
Observing aarch64 failure in nightly:
https://github.com/pytorch/pytorch/actions/runs/13917778961/job/38943911228

Similar to: pytorch/vision#8982

```
2025-03-18T08:44:58.4128744Z Repairing Wheel with AuditWheel
2025-03-18T08:44:58.5440988Z INFO:auditwheel.main_repair:Repairing torch-2.8.0.dev20250318+cpu-cp39-cp39-linux_aarch64.whl
2025-03-18T08:45:20.3393288Z Traceback (most recent call last):
2025-03-18T08:45:20.3393732Z   File "/opt/python/cp39-cp39/bin/auditwheel", line 8, in <module>
2025-03-18T08:45:20.3394115Z     sys.exit(main())
2025-03-18T08:45:20.3394559Z   File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/main.py", line 53, in main
2025-03-18T08:45:20.3395064Z     result: int | None = args.func(args, p)
2025-03-18T08:45:20.3395626Z   File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/main_repair.py", line 203, in execute
2025-03-18T08:45:20.3396163Z     out_wheel = repair_wheel(
2025-03-18T08:45:20.3396657Z   File "/opt/_internal/cpython-3.9.21/lib/python3.9/site-packages/auditwheel/repair.py", line 84, in repair_wheel
2025-03-18T08:45:20.3397184Z     raise ValueError(msg)
2025-03-18T08:45:20.3397620Z ValueError: Cannot repair wheel, because required library "libarm_compute.so" could not be located
2025-03-18T08:45:20.3678843Z Traceback (most recent call last):
2025-03-18T08:45:20.3679267Z   File "/pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py", line 236, in <module>
2025-03-18T08:45:20.3680988Z     pytorch_wheel_name = complete_wheel("/pytorch/")
2025-03-18T08:45:20.3681449Z   File "/pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py", line 141, in complete_wheel
2025-03-18T08:45:20.3681976Z     check_call(["auditwheel", "repair", f"dist/{wheel_name}"], cwd=folder)
2025-03-18T08:45:20.3682860Z   File "/opt/python/cp39-cp39/lib/python3.9/subprocess.py", line 373, in check_call
2025-03-18T08:45:20.3683308Z     raise CalledProcessError(retcode, cmd)
2025-03-18T08:45:20.3684034Z subprocess.CalledProcessError: Command '['auditwheel', 'repair', 'dist/torch-2.8.0.dev20250318+cpu-cp39-cp39-linux_aarch64.whl']' returned non-zero exit status 1.
2025-03-18T08:45:20.3790063Z ##[error]Process completed with exit code 1.
2025-03-18T08:45:20.3862012Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main
2025-03-18T08:45:20.3862448Z with:
```

Please note aarch64 CUDA failures are related to: #149351
Pull Request resolved: #149471
Approved by: https://github.com/malfet
pytorchbot pushed a commit to pytorch/pytorch that referenced this pull request Mar 19, 2025

(cherry picked from commit 4df66e0)
pruthvistony added a commit to ROCm/pytorch that referenced this pull request Mar 19, 2025
pruthvistony added a commit to ROCm/pytorch that referenced this pull request Mar 19, 2025
Successfully merging this pull request may close these issues.

packaging/wheel/relocate.py failed for auditwheel>=6.3
nvjpeg missing from all linux GPU wheel build jobs
2 participants