
Commit 23c9bf3

Authored by danielhanchen, Datta0, shimmyshimmer, jeromeku, and mmathew23

Fix transformers 4.57.1 (#3473)
* Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update 
pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update 
pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update 
rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update 
loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml --------- Co-authored-by: 
Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
1 parent 85923d0 · commit 23c9bf3

File tree

10 files changed: +237 additions, -56 deletions

pyproject.toml

Lines changed: 13 additions & 5 deletions

```diff
@@ -39,11 +39,10 @@ triton = [
     "triton>=3.0.0 ; ('linux' in sys_platform)",
     "triton-windows ; (sys_platform == 'win32') and (platform_machine == 'AMD64' or platform_machine == 'x86_64')",
 ]
-huggingface = [
-    "unsloth_zoo>=2025.10.4",
+huggingfacenotorch = [
+    "unsloth_zoo>=2025.10.5",
     "wheel>=0.42.0",
     "packaging",
-    "torchvision",
     "numpy",
     "tqdm",
     "psutil",
@@ -58,6 +57,10 @@ huggingface = [
     "diffusers",
     "transformers>=4.51.3,!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,!=4.54.0,!=4.55.0,!=4.55.1,<=4.56.2",
     "trl>=0.7.9,!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,!=0.15.0,!=0.19.0,<=0.23.0",
+]
+huggingface = [
+    "unsloth[huggingfacenotorch]",
+    "torchvision",
     "unsloth[triton]",
 ]
 windows = [
@@ -458,7 +461,7 @@ colab-ampere-torch220 = [
     "flash-attn>=2.6.3 ; ('linux' in sys_platform)",
 ]
 colab-new = [
-    "unsloth_zoo>=2025.10.4",
+    "unsloth_zoo>=2025.10.5",
     "packaging",
     "tyro",
     "transformers>=4.51.3,!=4.52.0,!=4.52.1,!=4.52.2,!=4.52.3,!=4.53.0,!=4.54.0,!=4.55.0,!=4.55.1,<=4.56.2",
@@ -740,7 +743,12 @@ intel-gpu-torch270 = [
     "torch @ https://download.pytorch.org/whl/xpu/torch-2.7.0%2Bxpu-cp312-cp312-linux_x86_64.whl#sha256=c806d44aa2ca5d225629f6fbc6c994d5deaac2d2cde449195bc8e3522ddd219a ; ('linux' in sys_platform) and python_version == '3.12' and (platform_machine == 'AMD64' or platform_machine == 'x86_64')",
     "torch @ https://download.pytorch.org/whl/xpu/torch-2.7.0%2Bxpu-cp313-cp313-linux_x86_64.whl#sha256=25d8277b7f01d42e2e014ccbab57a2692b6ec4eff8dcf894eda1b297407cf97a ; ('linux' in sys_platform) and python_version == '3.13' and (platform_machine == 'AMD64' or platform_machine == 'x86_64')",
 ]
-
+amd = [
+    "unsloth[huggingfacenotorch]",
+    "bitsandbytes @ https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl ; ('linux' in sys_platform) and (platform_machine == 'AMD64' or platform_machine == 'x86_64')",
+    "bitsandbytes @ https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl ; (sys_platform == 'win32') and (platform_machine == 'AMD64' or platform_machine == 'x86_64')",
+    "bitsandbytes @ https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl ; ('linux' in sys_platform) and (platform_machine == 'aarch64')",
+]
 [project.urls]
 homepage = "http://www.unsloth.ai"
 documentation = "https://github.com/unslothai/unsloth"
```
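The net effect of the extras reshuffle: `huggingfacenotorch` carries the Hugging Face stack without pinning torch/torchvision, `huggingface` layers `torchvision` and `unsloth[triton]` back on top, and the new `amd` extra combines `huggingfacenotorch` with preview HIP wheels of bitsandbytes. A small sketch for checking which extras an installed build declares, using only the standard library (the extra names are read off the diff above and may change between releases):

```python
# Sketch: list the optional-dependency extras declared by the installed
# unsloth distribution. Assumes unsloth is installed; the extra names
# ("amd", "huggingfacenotorch") follow the pyproject change above.
from importlib.metadata import metadata

md = metadata("unsloth")
extras = set(md.get_all("Provides-Extra") or [])
print(sorted(extras & {"amd", "huggingface", "huggingfacenotorch"}))
```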

unsloth/__init__.py

Lines changed: 12 additions & 44 deletions

```diff
@@ -69,49 +69,14 @@
     raise exception
 pass

-@functools.cache
-def is_hip():
-    return bool(getattr(getattr(torch, "version", None), "hip", None))
-pass
-
-@functools.cache
-def get_device_type():
-    if hasattr(torch, "cuda") and torch.cuda.is_available():
-        if is_hip():
-            return "hip"
-        return "cuda"
-    elif hasattr(torch, "xpu") and torch.xpu.is_available():
-        return "xpu"
-    # Check torch.accelerator
-    if hasattr(torch, "accelerator"):
-        if not torch.accelerator.is_available():
-            raise NotImplementedError("Unsloth cannot find any torch accelerator? You need a GPU.")
-        accelerator = str(torch.accelerator.current_accelerator())
-        if accelerator in ("cuda", "xpu", "hip"):
-            raise RuntimeError(
-                f"Unsloth: Weirdly `torch.cuda.is_available()`, `torch.xpu.is_available()` and `is_hip` all failed.\n"\
-                f"But `torch.accelerator.current_accelerator()` works with it being = `{accelerator}`\n"\
-                f"Please reinstall torch - it's most likely broken :("
-            )
-    raise NotImplementedError("Unsloth currently only works on NVIDIA, AMD and Intel GPUs.")
-pass
-DEVICE_TYPE : str = get_device_type()
-# HIP fails for autocast and other torch functions. Use CUDA instead
-DEVICE_TYPE_TORCH = DEVICE_TYPE
-if DEVICE_TYPE_TORCH == "hip": DEVICE_TYPE_TORCH = "cuda"
-
-@functools.cache
-def get_device_count():
-    if DEVICE_TYPE in ("cuda", "hip"):
-        return torch.cuda.device_count()
-    elif DEVICE_TYPE == "xpu":
-        return torch.xpu.device_count()
-    else:
-        return 1
-pass
-
-DEVICE_COUNT : int = get_device_count()
-
+from .device_type import (
+    is_hip,
+    get_device_type,
+    DEVICE_TYPE,
+    DEVICE_TYPE_TORCH,
+    DEVICE_COUNT,
+    ALLOW_PREQUANTIZED_MODELS,
+)
 # Reduce VRAM usage by reducing fragmentation
 # And optimize pinning of memory
 # TODO(billishyahao): need to add hip related optimization...
@@ -201,7 +166,10 @@ def is_bf16_supported(): return SUPPORTS_BFLOAT16
 else: from triton.common.build import libcuda_dirs

 # Try loading bitsandbytes and triton
-import bitsandbytes as bnb
+try:
+    import bitsandbytes as bnb
+except:
+    print("Unsloth: `bitsandbytes` is not installed - 4bit QLoRA unallowed, but 16bit and full finetuning works!")
 try:
     cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
     libcuda_dirs()
```
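Wrapping the bitsandbytes import in `try`/`except` makes it a soft dependency: a missing install now prints a notice instead of aborting the whole package import. A minimal sketch of the same gating pattern, with illustrative names (`HAS_BNB` and `load_model` are not Unsloth APIs):

```python
# Optional-import gating: proceed without bitsandbytes, but refuse the
# code paths that genuinely need it. Names here are illustrative only.
try:
    import bitsandbytes as bnb
    HAS_BNB = True
except ImportError:
    bnb = None
    HAS_BNB = False

def load_model(load_in_4bit: bool = False):
    # 4-bit QLoRA needs bitsandbytes; 16-bit LoRA and full finetuning do not.
    if load_in_4bit and not HAS_BNB:
        raise RuntimeError("4-bit loading requires `pip install bitsandbytes`.")
    ...
```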

unsloth/_amd_install.sh

Lines changed: 32 additions & 0 deletions

```diff
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# _amd_install.sh
+# Non-interactive installer: build tools, PyTorch (ROCm 6.4), bitsandbytes (HIP), and Unsloth from source.
+# Usage:
+#   bash _amd_install.sh
+#
+
+set -euo pipefail
+export DEBIAN_FRONTEND=noninteractive
+
+apt-get update
+apt-get install -y --no-install-recommends build-essential cmake git
+
+pip install \
+    torch==2.8.0 torchvision torchaudio torchao==0.13.0 xformers \
+    --index-url https://download.pytorch.org/whl/rocm6.4
+
+WORKDIR="$(pwd)"
+TMPDIR="$(mktemp -d)"
+cd "$TMPDIR"
+git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
+cd bitsandbytes
+arch
+cmake -DCOMPUTE_BACKEND=hip -S .
+make -j"$(nproc)"
+pip install .
+cd "$WORKDIR"
+rm -rf "$TMPDIR"
+
+pip install --no-deps unsloth unsloth-zoo
+pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
+pip install "unsloth[base] @ git+https://github.com/unslothai/unsloth"
```

unsloth/device_type.py

Lines changed: 80 additions & 0 deletions

```diff
@@ -0,0 +1,80 @@
+# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+__all__ = [
+    "is_hip",
+    "get_device_type",
+    "DEVICE_TYPE",
+    "DEVICE_TYPE_TORCH",
+    "DEVICE_COUNT",
+    "ALLOW_PREQUANTIZED_MODELS",
+]
+
+import torch
+import functools
+
+@functools.cache
+def is_hip():
+    return bool(getattr(getattr(torch, "version", None), "hip", None))
+pass
+
+@functools.cache
+def get_device_type():
+    if hasattr(torch, "cuda") and torch.cuda.is_available():
+        if is_hip():
+            return "hip"
+        return "cuda"
+    elif hasattr(torch, "xpu") and torch.xpu.is_available():
+        return "xpu"
+    # Check torch.accelerator
+    if hasattr(torch, "accelerator"):
+        if not torch.accelerator.is_available():
+            raise NotImplementedError("Unsloth cannot find any torch accelerator? You need a GPU.")
+        accelerator = str(torch.accelerator.current_accelerator())
+        if accelerator in ("cuda", "xpu", "hip"):
+            raise RuntimeError(
+                f"Unsloth: Weirdly `torch.cuda.is_available()`, `torch.xpu.is_available()` and `is_hip` all failed.\n"\
+                f"But `torch.accelerator.current_accelerator()` works with it being = `{accelerator}`\n"\
+                f"Please reinstall torch - it's most likely broken :("
+            )
+    raise NotImplementedError("Unsloth currently only works on NVIDIA, AMD and Intel GPUs.")
+pass
+DEVICE_TYPE : str = get_device_type()
+# HIP fails for autocast and other torch functions. Use CUDA instead
+DEVICE_TYPE_TORCH = DEVICE_TYPE
+if DEVICE_TYPE_TORCH == "hip": DEVICE_TYPE_TORCH = "cuda"
+
+@functools.cache
+def get_device_count():
+    if DEVICE_TYPE in ("cuda", "hip"):
+        return torch.cuda.device_count()
+    elif DEVICE_TYPE == "xpu":
+        return torch.xpu.device_count()
+    else:
+        return 1
+pass
+
+DEVICE_COUNT : int = get_device_count()
+
+# Check blocksize for 4bit -> 64 for CUDA, 128 for AMD
+# If AMD, we cannot load pre-quantized models for now :(
+ALLOW_PREQUANTIZED_MODELS : bool = True
+if DEVICE_TYPE == "hip":
+    try:
+        from bitsandbytes.nn.modules import Params4bit
+        if "blocksize = 64 if not HIP_ENVIRONMENT else 128" in inspect.getsource(Params4bit):
+            ALLOW_PREQUANTIZED_MODELS = False
+    except:
+        pass
+pass
```
unsloth/kernels/utils.py

Lines changed: 8 additions & 1 deletion

```diff
@@ -19,7 +19,14 @@
 import functools
 from typing import Optional

-from .. import DEVICE_TYPE, DEVICE_COUNT
+from ..device_type import (
+    is_hip,
+    get_device_type,
+    DEVICE_TYPE,
+    DEVICE_TYPE_TORCH,
+    DEVICE_COUNT,
+    ALLOW_PREQUANTIZED_MODELS,
+)
 from .fp8 import weight_dequant, fp8_linear

 # torch.cuda.amp.custom_fwd is deprecated >= 2.4
```

unsloth/models/_utils.py

Lines changed: 36 additions & 2 deletions

```diff
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-__version__ = "2025.10.4"
+__version__ = "2025.10.5"

 __all__ = [
     "SUPPORTS_BFLOAT16",
@@ -87,7 +87,14 @@
 import warnings, subprocess, re, inspect, psutil, os, math
 from unsloth_zoo.utils import Version
 from importlib.metadata import version as importlib_version
-from unsloth import DEVICE_TYPE, DEVICE_COUNT, DEVICE_TYPE_TORCH
+from ..device_type import (
+    is_hip,
+    get_device_type,
+    DEVICE_TYPE,
+    DEVICE_TYPE_TORCH,
+    DEVICE_COUNT,
+    ALLOW_PREQUANTIZED_MODELS,
+)
 from unsloth_zoo.log import logger
 from unsloth_zoo.tokenizer_utils import (
     patch_tokenizer as _patch_tokenizer,
@@ -1331,6 +1338,7 @@ def _unsloth_pre_compute_loss(self, model, inputs, *args, **kwargs):

 def patch_gradient_accumulation_fix(Trainer):
     # Fixes gradient accumulation
+    # Fixes Output 0 of UnslothFusedLossBackward is a view and is being modified inplace.
     import inspect
     if hasattr(Trainer, "get_batch_samples"):
         if Trainer.get_batch_samples.__name__ == "_unsloth_get_batch_samples": return
@@ -1346,6 +1354,32 @@ def patch_gradient_accumulation_fix(Trainer):

     # Also fix passing in num_items_in_batch
     if not hasattr(Trainer, "_old_compute_loss"):
+
+        # Fix transformers 4.57.0 causing `Output 0 of UnslothFusedLossBackward is a view and is being modified inplace.`
+        function = inspect.getsource(Trainer.compute_loss)
+        if "loss *=" in function or "loss*=" in function:
+            where = function.find("def")
+            function = function.split("\n")
+            function = "\n".join(x[where:] for x in function)
+
+            # Import all variables that need importing
+            import transformers.trainer
+            items_in_trainer = dir(transformers.trainer)
+            good_items = []
+            for item in items_in_trainer:
+                if item in function: good_items.append(item)
+            pass
+            exec("from transformers.trainer import (" + ", ".join(x for x in good_items) + ")", globals())
+
+            # Replace loss*= with loss = loss *
+            function = re.sub(
+                r"loss[\s]{0,}\*\=",
+                "loss = loss *",
+                function,
+            )
+            exec(function, globals())
+            Trainer.compute_loss = compute_loss
+        pass
         Trainer._old_compute_loss = Trainer.compute_loss
         Trainer.compute_loss = _unsloth_pre_compute_loss
     pass
```
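The guarded rewrite targets a general autograd rule: outputs of functions that return views may not be modified in place, and the `loss *= ...` introduced in transformers 4.57.0's `compute_loss` trips exactly that when the loss tensor is such a view. A minimal reproduction of the error class, and the out-of-place form the `re.sub` rewrite produces (the `unbind` example stands in for the fused-loss backward named in the comment):

```python
import torch

x = torch.ones(4, requires_grad = True)
loss = x.unbind(0)[0]    # unbind returns views; autograd forbids editing them in place
try:
    loss *= 2.0          # RuntimeError: "... is a view and is being modified inplace"
except RuntimeError as err:
    print(type(err).__name__)
loss = loss * 2.0        # the rewritten, out-of-place form allocates a fresh tensor
```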

unsloth/models/llama.py

Lines changed: 8 additions & 1 deletion

```diff
@@ -27,7 +27,14 @@
 from unsloth_zoo.utils import Version, _get_dtype
 from unsloth_zoo.hf_utils import dtype_from_config, add_dtype_kwargs, fix_lora_auto_mapping
 from unsloth_zoo.peft_utils import SKIP_QUANTIZATION_MODULES
-from unsloth import DEVICE_TYPE, DEVICE_COUNT, DEVICE_TYPE_TORCH
+from ..device_type import (
+    is_hip,
+    get_device_type,
+    DEVICE_TYPE,
+    DEVICE_TYPE_TORCH,
+    DEVICE_COUNT,
+    ALLOW_PREQUANTIZED_MODELS,
+)

 transformers_version = Version(transformers_version)
 # Transformers moved rotary embeddings out of all attention layers
```

unsloth/models/loader.py

Lines changed: 32 additions & 1 deletion

```diff
@@ -45,6 +45,14 @@
 pass
 from huggingface_hub import HfFileSystem
 import importlib.util
+from ..device_type import (
+    is_hip,
+    get_device_type,
+    DEVICE_TYPE,
+    DEVICE_TYPE_TORCH,
+    DEVICE_COUNT,
+    ALLOW_PREQUANTIZED_MODELS,
+)

 # https://github.com/huggingface/transformers/pull/26037 allows 4 bit loading!
 from unsloth_zoo.utils import Version, _get_dtype
@@ -195,6 +203,12 @@ def from_pretrained(
     old_model_name = model_name
     if not use_exact_model_name:
         model_name = get_model_name(model_name, load_in_4bit)
+        # Check if pre-quantized models are allowed
+        # For eg AMD GPUs need blocksize = 128, but our pre-quants are blocksize = 64
+        if not ALLOW_PREQUANTIZED_MODELS and model_name.endswith(("-unsloth-bnb-4bit", "-bnb-4bit")):
+            model_name = model_name.removesuffix("-unsloth-bnb-4bit")
+            model_name = model_name.removesuffix("-bnb-4bit")
+    pass

     if USE_MODELSCOPE and not os.path.exists(model_name):
         from modelscope import snapshot_download
@@ -306,6 +320,12 @@ def from_pretrained(
     model_name = peft_config.base_model_name_or_path
     if not use_exact_model_name:
         model_name = get_model_name(model_name, load_in_4bit)
+        # Check if pre-quantized models are allowed
+        # For eg AMD GPUs need blocksize = 128, but our pre-quants are blocksize = 64
+        if not ALLOW_PREQUANTIZED_MODELS and model_name.endswith(("-unsloth-bnb-4bit", "-bnb-4bit")):
+            model_name = model_name.removesuffix("-unsloth-bnb-4bit")
+            model_name = model_name.removesuffix("-bnb-4bit")
+    pass
     model_config = AutoConfig.from_pretrained(
         model_name,
         token = token,
@@ -618,6 +638,12 @@ def from_pretrained(
     old_model_name = model_name
     if not use_exact_model_name:
         model_name = get_model_name(model_name, load_in_4bit)
+        # Check if pre-quantized models are allowed
+        # For eg AMD GPUs need blocksize = 128, but our pre-quants are blocksize = 64
+        if not ALLOW_PREQUANTIZED_MODELS and model_name.endswith(("-unsloth-bnb-4bit", "-bnb-4bit")):
+            model_name = model_name.removesuffix("-unsloth-bnb-4bit")
+            model_name = model_name.removesuffix("-bnb-4bit")
+    pass

     # Check modelscope
     if USE_MODELSCOPE and not os.path.exists(model_name):
@@ -833,7 +859,12 @@ def from_pretrained(
     model_name = peft_config.base_model_name_or_path
     if not use_exact_model_name:
         model_name = get_model_name(model_name, load_in_4bit)
-
+        # Check if pre-quantized models are allowed
+        # For eg AMD GPUs need blocksize = 128, but our pre-quants are blocksize = 64
+        if not ALLOW_PREQUANTIZED_MODELS and model_name.endswith(("-unsloth-bnb-4bit", "-bnb-4bit")):
+            model_name = model_name.removesuffix("-unsloth-bnb-4bit")
+            model_name = model_name.removesuffix("-bnb-4bit")
+    pass
     model_config = AutoConfig.from_pretrained(
         model_name,
         token = token,
```
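All four `from_pretrained` entry points now share the same fallback: when `ALLOW_PREQUANTIZED_MODELS` is False, the pre-quantized repo suffixes are stripped so the 16-bit base checkpoint is fetched instead (presumably quantized on the fly with the platform's own blocksize). A sketch of the string handling; the helper name and model ids are illustrative:

```python
def strip_prequant_suffix(model_name: str) -> str:
    # Mirrors the guard above: fall back to the 16-bit base repo name.
    if model_name.endswith(("-unsloth-bnb-4bit", "-bnb-4bit")):
        model_name = model_name.removesuffix("-unsloth-bnb-4bit")
        model_name = model_name.removesuffix("-bnb-4bit")  # no-op if already gone
    return model_name

print(strip_prequant_suffix("unsloth/Llama-3.2-1B-unsloth-bnb-4bit"))  # unsloth/Llama-3.2-1B
print(strip_prequant_suffix("unsloth/Llama-3.2-1B-bnb-4bit"))          # unsloth/Llama-3.2-1B
```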
