Conversation

@wangxunx (Contributor)

Summary

This PR fixes an out-of-resource error when fine-tuning Llama 3.2 on AMD MI300 GPUs.

Context / Motivation

Because AMD GPUs with the CDNA3 architecture have different hardware resources than NVIDIA GPUs, the default settings of the cross_entropy_loss kernel produce illegal launch parameters and cause training failures for the Llama and Qwen3 models.
This PR reduces num_warps to a value the hardware supports, enabling LLM SFT on AMD MI300.
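
A plausible reading of the failure, sketched below: Triton multiplies num_warps by the hardware's warp/wavefront width to get the threads per workgroup, and CDNA3 wavefronts are 64 threads wide versus 32 on NVIDIA, so a num_warps value tuned for NVIDIA can request more threads than an MI300 workgroup allows. The default of 32 warps used here is illustrative, not taken from the kernel.

```python
# A minimal sketch of the launch-parameter arithmetic, assuming Triton maps
# one "warp" to a 64-thread wavefront on CDNA3 and a 1024-thread workgroup
# limit applies; the num_warps values below are illustrative.
WARP_SIZE_NVIDIA      = 32    # NVIDIA warp width
WAVEFRONT_SIZE_CDNA3  = 64    # AMD CDNA3 wavefront width
MAX_THREADS_PER_BLOCK = 1024  # per-workgroup thread limit

def threads_requested(num_warps: int, warp_size: int) -> int:
    return num_warps * warp_size

assert threads_requested(32, WARP_SIZE_NVIDIA) <= MAX_THREADS_PER_BLOCK      # fits on NVIDIA
assert threads_requested(32, WAVEFRONT_SIZE_CDNA3) > MAX_THREADS_PER_BLOCK   # illegal launch on MI300
assert threads_requested(16, WAVEFRONT_SIZE_CDNA3) <= MAX_THREADS_PER_BLOCK  # fits with this PR's cap
```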

Changes

  • Add AMD architecture detection utility functions
  • For AMD's CDNA3 architecture, reduce num_warps to 16 (a detection sketch follows this list)
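
The PR itself contains the real detection code; the following is only a minimal sketch of one way such a helper could look under ROCm builds of PyTorch. The helper name is_amd_cdna3 and the gfx94x prefix check are assumptions for illustration, not the PR's actual implementation.

```python
import torch

def is_amd_cdna3() -> bool:
    # Hypothetical helper, not the PR's actual code. ROCm builds of PyTorch
    # expose the GCN architecture name (e.g. "gfx942" for MI300-series CDNA3
    # parts) on the device properties object; CUDA builds do not.
    if not torch.cuda.is_available() or torch.version.hip is None:
        return False
    arch = getattr(torch.cuda.get_device_properties(0), "gcnArchName", "")
    return arch.startswith("gfx94")  # gfx940/941/942 are CDNA3

# Illustrative use when picking Triton launch parameters:
num_warps = 16 if is_amd_cdna3() else 32  # 32 is a placeholder NVIDIA default
```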

Testing

  • Llama 3.2 and Qwen3 SFT pass on AMD MI300
  • Llama 3.2 and Qwen3 SFT pass on NVIDIA RTX 4090

Related Issues

None

@danielhanchen merged commit 67cc329 into unslothai:main on Oct 17, 2025
@danielhanchen (Contributor)

Thanks!
