Conversation

@wangxunx (Contributor)

Summary

This PR fixes an out-of-resource error when fine-tuning Llama 3.2 on AMD MI300 GPUs.

Context / Motivation

Because AMD GPUs with the CDNA3 architecture have different hardware resources than NVIDIA GPUs, the default settings of the cross_entropy_loss kernel produce illegal launch parameters and cause training failures for the Llama and Qwen3 models.
This PR reduces num_warps to a value the hardware supports, enabling LLM SFT on AMD MI300.
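
A plausible reading of the failure, sketched below: Triton multiplies num_warps by the hardware's warp/wavefront width to get the threads per workgroup, and CDNA3 wavefronts are 64 threads wide versus 32 on NVIDIA, so a num_warps value tuned for NVIDIA can request more threads than an MI300 workgroup allows. The default of 32 warps used here is illustrative, not taken from the kernel.

```python
# A minimal sketch of the launch-parameter arithmetic, assuming Triton maps
# one "warp" to a 64-thread wavefront on CDNA3 and a 1024-thread workgroup
# limit applies; the num_warps values below are illustrative.
WARP_SIZE_NVIDIA      = 32    # NVIDIA warp width
WAVEFRONT_SIZE_CDNA3  = 64    # AMD CDNA3 wavefront width
MAX_THREADS_PER_BLOCK = 1024  # per-workgroup thread limit

def threads_requested(num_warps: int, warp_size: int) -> int:
    return num_warps * warp_size

assert threads_requested(32, WARP_SIZE_NVIDIA) <= MAX_THREADS_PER_BLOCK      # fits on NVIDIA
assert threads_requested(32, WAVEFRONT_SIZE_CDNA3) > MAX_THREADS_PER_BLOCK   # illegal launch on MI300
assert threads_requested(16, WAVEFRONT_SIZE_CDNA3) <= MAX_THREADS_PER_BLOCK  # fits with this PR's cap
```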

Changes

  • Add AMD architecture detection utility functions
  • For AMD's CDNA3 architecture, reduce num_warps to 16 (a detection sketch follows this list)
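
The PR itself contains the real detection code; the following is only a minimal sketch of one way such a helper could look under ROCm builds of PyTorch. The helper name is_amd_cdna3 and the gfx94x prefix check are assumptions for illustration, not the PR's actual implementation.

```python
import torch

def is_amd_cdna3() -> bool:
    # Hypothetical helper, not the PR's actual code. ROCm builds of PyTorch
    # expose the GCN architecture name (e.g. "gfx942" for MI300-series CDNA3
    # parts) on the device properties object; CUDA builds do not.
    if not torch.cuda.is_available() or torch.version.hip is None:
        return False
    arch = getattr(torch.cuda.get_device_properties(0), "gcnArchName", "")
    return arch.startswith("gfx94")  # gfx940/941/942 are CDNA3

# Illustrative use when picking Triton launch parameters:
num_warps = 16 if is_amd_cdna3() else 32  # 32 is a placeholder NVIDIA default
```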

Testing

  • Llama 3.2 and Qwen3 SFT pass on AMD MI300
  • Llama 3.2 and Qwen3 SFT pass on NVIDIA RTX 4090

Related Issues

None

@danielhanchen merged commit 67cc329 into unslothai:main on Oct 17, 2025
@danielhanchen (Contributor)

Thanks!
