
Releases: unslothai/unsloth

October Release + Unsloth Docker!

27 Oct 11:25


Hey everyone, please update Unsloth to use the latest updates! 🦥

New model updates

New features

  • Introducing Quantization-Aware Training: we collaborated with PyTorch on QAT, recovering as much as 70% accuracy. Read blog
  • Unsloth supports OpenEnv to allow for open RL environments. Blog coming soon • Notebook
  • New customer support agent notebook to enable real-time analysis & solving of customer interactions. You'll also learn how to train models using data from Google Sheets.
  • Python 3.13, PyTorch 2.9, and the latest Hugging Face TRL and transformers are now supported.
  • Saving to TorchAO is supported as well:
from torchao.quantization import Int4WeightOnlyConfig
# Save the finetuned model in TorchAO int4 weight-only format
model.save_pretrained_torchao("model", tokenizer, torchao_config = Int4WeightOnlyConfig())

Tip

Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

RL Improvements

  1. Fixed Standby consuming more VRAM than usual. Unsloth now auto-selects 80% to 95% GPU memory utilization when import os; os.environ["UNSLOTH_VLLM_STANDBY"] = "1" is set (see the sketch after this list).
  2. Fixed GRPO training hangs with better environment timers - works on DGX Spark and all other GPUs.
  3. Fixes GRPO RuntimeError: shape '[1, 887, 1, 128]' is invalid for input of size 3633152 for all models
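
For reference, a minimal sketch of enabling Standby before loading a model for RL - the environment variable must be set before importing Unsloth, and the model name and settings below are placeholders rather than the exact notebook code:
import os
os.environ["UNSLOTH_VLLM_STANDBY"] = "1" # must be set before importing unsloth

from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/meta-Llama-3.1-8B-Instruct", # placeholder model
    max_seq_length = 2048,
    load_in_4bit = True,
    fast_inference = True, # vLLM fast inference for RL
    # With Standby on, GPU memory utilization is auto-selected (80% to 95%)
)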

RL Environment functions

  1. New execute_with_time_limit function to force functions to execute within a time limit. E.g. with a 2 second time limit, use:
from unsloth import execute_with_time_limit
@execute_with_time_limit(2)
def execute_strategy(strategy, game):
    return _execute_strategy(strategy, game)
try:
    execute_strategy(strategy, game)
except TimeoutError as e:
    print(f"Timed out with error = {str(e)}")
  2. To check if only Python standard modules are used in a function, use check_python_modules.
  3. Use create_locked_down_function to create a function without leakage of global variables.
  4. Use Benchmarker (i.e. from unsloth import Benchmarker) to benchmark functions accurately. It approximately wipes the L1 to L3 caches to reduce the chance of benchmark cheating.
  5. Use launch_openenv (i.e. from unsloth import launch_openenv) to launch a continuously reloading OpenEnv environment process so it does not shut down. It will automatically find an unused port.

Bug fixes

  1. GPT-OSS BF16: the GPT-OSS router now works with load_in_4bit = True, fixing AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'
  2. Mistral training fixed - sentencepiece proto issue fixed (any protobuf version works)
  3. Fixed evaluation, i.e. UNSLOTH_RETURN_LOGITS="1" now works (see the sketch after this list). Fixes #3126 #3071
  4. Fixes the Output 0 of UnslothFusedLossBackward is a view and is being modified inplace error for Gemma 3 and transformers>=4.57.1
  5. If you see ImportError: cannot import name '_Ink' from 'PIL._typing' (/usr/local/lib/python3.12/dist-packages/PIL/_typing.py) please update and use our new notebooks
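
As a quick reference, a minimal sketch of turning logits back on for evaluation - set the variable before importing Unsloth; the rest of the training setup stays the same:
import os
os.environ["UNSLOTH_RETURN_LOGITS"] = "1" # return logits so evaluation and metrics work
from unsloth import FastLanguageModel
# ... load your model and run your trainer with an eval_dataset as usual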

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

New Contributors

Full Changelog: September-2025-v3...October-2025

gpt-oss Reinforcement Learning + Auto Kernel Notebook

26 Sep 15:24


We’re introducing gpt-oss RL support and the fastest RL inference and lowest VRAM use vs. any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning

  • Unsloth now offers the fastest inference (~3x faster), lowest VRAM (50% less) and most context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy loss.
  • Since RL on gpt-oss isn't yet vLLM compatible, we rewrote Transformers inference code to enable faster inference
  • gpt-oss-20b GSPO free Colab notebook
  • This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward-hacking which is one of RL's biggest challenges.
  • We previously released Vision RL with GSPO support
  • ⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss as it'll make your training loss wrong.
  • DeepSeek-V3.1-Terminus is here and you can run it locally via our GGUF
    Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here
  • Magistral 1.2 is here and you can run it locally here or fine-tune it for free by using our Kaggle notebook
  • Fine-tuning the new Qwen3 models, including Qwen3-VL, Qwen3-Omni and Qwen3-Next, should work in Unsloth if you install the latest transformers. The models are big, however, so ensure you have enough VRAM.
  • BERT is now fixed! Feel free to use our BERT fine-tuning notebook

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

New Contributors

Full Changelog: September-2025-v2...September-2025-v3

Vision Reinforcement Learning + Memory Efficient RL

16 Sep 16:14


We're excited to support vision models for RL and even more memory-efficient + faster RL!

Unsloth now supports vision/multimodal RL with Gemma 3, Qwen2.5-VL and other vision models. Due to Unsloth's unique weight sharing and custom kernels, Unsloth makes VLM RL 1.5–2× faster, uses 90% less VRAM, and enables 10× longer context lengths than FA2 setups, with no accuracy loss. Qwen2.5-VL GSPO notebook
Gemma 3 (4B) Vision GSPO notebook

Full details in our blogpost: https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl

  • This update also introduces Qwen's GSPO algorithm (a minimal config sketch follows this list).
  • Our new vision RL support is also now even faster and more memory efficient! Our new kernels and algorithms allow faster RL for text and vision LLMs with 50% less VRAM and 10× more context.
  • Introducing a new RL feature called 'Standby'. Previously, RL required splitting the GPU between training and inference. With Unsloth Standby, you no longer have to, and Standby uniquely limits speed degradation compared to other implementations - sometimes it even makes training faster! Read our Blog
  • We released Aider Polyglot benchmarks for our DeepSeek-V3.1 Dynamic GGUFs, and Unsloth quants perform consistently better than others. Blog
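
For reference, recent TRL versions typically switch GSPO on through the GRPO config. A minimal, hedged sketch - the importance_sampling_level argument is an assumption about your installed TRL version; everything else follows the GRPO example at the end of these notes:
from trl import GRPOConfig
training_args = GRPOConfig(
    importance_sampling_level = "sequence", # assumption: "sequence" enables GSPO, "token" is standard GRPO
    # ... all other arguments (learning rate, generations, lengths, etc.) as in the GRPO script below
)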

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

New Contributors

Full Changelog: August-2025-v2...September-2025-v2

Unsloth Flex Attention + Long context gpt-oss Training

28 Aug 16:38


We’re excited to introduce Unsloth Flex Attention support for OpenAI gpt-oss training that enables >8× longer context lengths, >50% less VRAM usage and >1.5× faster training compared to all implementations including those using Flash Attention 3 (FA3). Unsloth Flex Attention makes it possible to train with a 60K context length on just 80GB of VRAM for BF16 LoRA. Also:

  • You can now export/save your QLoRA fine-tuned gpt-oss model to llama.cpp, vLLM, or HF.
  • We fixed gpt-oss training losses going to infinity on float16 GPUs (like T4 Colab)
  • We fixed gpt-oss implementation issues, most notably ensuring that swiglu_limit = 7.0 is properly applied during MXFP4 inference in transformers
  • Unsloth Flex Attention scales with context: longer sequences yield bigger savings in both VRAM and training time (a minimal long-context sketch follows this list)
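
To illustrate, a minimal sketch of a long-context LoRA setup plus the new export path - the model name and numbers are placeholders, and Unsloth Flex Attention is applied automatically, so no extra flag is needed:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b", # placeholder upload name
    max_seq_length = 60_000, # ~60K context fits in 80GB of VRAM for BF16 LoRA
    load_in_4bit = False, # BF16 LoRA; set True for QLoRA
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth", # enables long-context finetuning
)
# ... train as usual, then export for llama.cpp, vLLM or HF:
model.save_pretrained_merged("gpt-oss-finetune", tokenizer)
model.save_pretrained_gguf("gpt-oss-finetune", quantization_type = "Q8_0")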

Full details in our blogpost: https://docs.unsloth.ai/basics/long-context-gpt-oss-training

What's Changed

New Contributors

Full Changelog: August-2025...August-2025-v2

gpt-oss Fine-tuning

08 Aug 15:33



gpt-oss is here! ✨

Finetune gpt-oss for free with our Unsloth Colab notebook!

  • We’ve managed to make gpt-oss train on just 14GB of VRAM, making it possible to work on free Colab due to our linear conversions. For more details, read our Guide/Blogpost
  • Fine-tuning gpt-oss is 1.5x faster and uses 50% less VRAM with Unsloth. gpt-oss-120b model fits on 65GB of VRAM.
  • Model uploads: 20b GGUF • 120b GGUF • All uploads

🦥 Unsloth updates

  • We’ve made algorithmic updates to Unsloth so every model now trains faster and with less VRAM, no matter which model you use.
  • Unsloth now works on RTX 50 and Blackwell GPUs. Read our guide.
  • Official Unsloth Docker image coming very soon!
  • You can now run Unsloth models directly via Docker: docker model pull hf.co/unsloth/gpt-oss-20b-GGUF

🌠 Qwen3-Coder + Qwen3-2507

Qwen released their July 2025 updates, called 'Qwen3-2507', and launched their SOTA coding models!

🔮 New models + Support:

Run these new models:

Unsloth also now supports running + training for:

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

New Contributors

Full Changelog: July-2025...August-2025

Less VRAM + bug fixes

10 Jul 14:35


More VRAM reduction, faster & bug fixes

Please update Unsloth! pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

  1. Gemma 3N Vision now works and is fixed! Please re-download all model checkpoints (Unsloth will do this automatically). Try the Kaggle Notebook! There is also a challenge with a prize pool of $100,000!
  2. Gemma 3 text and vision are both fixed on T4 and are much faster. Losses of 6 to 7 are now fixed - they should be 1 to 2.
  3. 10 to 25% less VRAM consumption for all models. Also faster compiling and fewer errors. Unsloth is now more stable!
  4. Downloads stuck at 90% to 95% fixed!
  5. Qwen 2.5, Qwen 2, GLM all fixed as well.
  6. GRPO now works with latest main TRL
  7. Main TRL, PEFT, Transformers all work
  8. Forced upgrading transformers is now fixed.
  9. Falcon H1 finetuning should work great! Notebooks incoming
  10. Devstral 1.1 and MedGemma 27B, 4B support with vision
  11. Many, many more bug fixes - this release of Unsloth should be much more stable and error tolerant!

Please update Unsloth! pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

What's Changed

New Contributors

Full Changelog: June-2025...July-2025

Gemma 3n + Text-to-speech (TTS)

26 Jun 16:25


✨ Gemma 3n now available

  • Google's new Gemma 3n multimodal models that support text, image, video & audio. Guide
  • Gemma 3n finetuning notebook + audio, vision, text inference Colab notebook
  • Gemma 3n collection in dynamic GGUF, safetensor 4-bit etc formats: Gemma-3n

🎵 Text-to-Speech (TTS) Fine-tuning

  • Train TTS/STT models like Sesame-CSM, Orpheus-TTS and OpenAI's Whisper locally! Guide
  • Clone voices, learn new emotions, tones & styles with 1.5x faster training and -50% VRAM. Notebooks

Tip

Update Unsloth via pip install --upgrade --force-reinstall unsloth unsloth_zoo

🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs

  • Fine-tune DeepSeek-R1-0528-Qwen3 with GRPO! Our new reward function increases multilingual response rates by 40%+ Notebook
  • Dynamic 1-bit GGUFs shrink the full 715GB model to just 175GB (-80% size)

📈 Dynamic 2.0 GGUFs

  • New quantization method that achieves SOTA performance. More info
  • Sets new benchmarks for 5-shot MMLU and KL Divergence and selectively quantizes layers for optimal accuracy

⚡ Advanced Qwen3 GRPO notebook

  • Proximity scoring for better reward functions (a toy illustration follows below). Advanced GRPO notebook
  • New Prefinetuning/priming to skip GRPO format learning
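
As a toy illustration of proximity scoring - not the notebook's exact reward function, just a sketch of the idea (give partial credit the closer a numeric answer is to the truth), using the reward-function signature from the GRPO script at the end of these notes:
def proximity_reward_func(completions, answer, **kwargs) -> list[float]:
    # Toy proximity scoring: 2.0 for an exact numeric match, partial credit for close answers
    responses = [completion[0]["content"] for completion in completions]
    scores = []
    for response, true_answer in zip(responses, answer):
        try:
            guess, truth = float(response.strip()), float(true_answer)
        except ValueError:
            scores.append(0.0)
            continue
        if guess == truth:
            scores.append(2.0)
        else:
            # Reward decays with distance from the true answer
            scores.append(max(0.0, 1.0 - abs(guess - truth) / (abs(truth) + 1.0)))
    return scores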

🎯 Magistral Conversational Reasoning

  • Fine-tune Magistral-24B for advanced conversational reasoning. Notebook

👁️ Gemma3 Vision Support

  • Fine-tune Gemma3 vision models for multimodal tasks Notebook

Documentation & Guides

  • Reinforcement Learning Guide: Complete guide on RL for LLMs covering GRPO, RLHF, DPO. Guide
  • LoRA Hyperparameters Guide: Master optimal learning rates, epochs, LoRA rank & alpha settings. Guide

What's Changed

New Contributors


Qwen3

02 May 16:13


Qwen 3 support + bug fixes

Please update Unsloth via pip install --upgrade --force-reinstall unsloth unsloth_zoo

Qwen3 notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(14B)-Reasoning-Conversational.ipynb
GRPO with Qwen3 notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb

There are also many bug fixes in this release!

The 30B MoE is also fine-tunable in Unsloth!

from unsloth import FastModel
import torch
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/Qwen3-30B-A3B",
    max_seq_length = 2048, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

What's Changed

New Contributors

Full Changelog: 2025-03...May-2025

Gemma 3 + FFT Support

14 Mar 15:58


March Release 🦥

Get the latest stable Unsloth via:

pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

The March release should be stable - you can force the version via:

pip install "unsloth==2025.3.18" "unsloth_zoo==2025.3.16"

New Features

  • Read all details here: https://unsloth.ai/blog/gemma3

  • Gemma 3 1B, 4B, 12B and 27B finetuning all work now! Colab Notebook. We fixed some issues which caused Gemma 3 training loss to be very high, including some tokenization issues, so fine-tuning Gemma 3 will now work correctly if you use Unsloth.

  • We also encountered many infinite gradients during Gemma 3 (1B to 27B) finetuning. We found float16 mixed precision (Tesla T4, RTX 2080 series) to not function well, and we defaulted to float32 precision. Float16 also failed on A100, so this is a hardware-agnostic issue. Bfloat16 is fine though! Unsloth auto-selects the best data type, so you do not have to do anything! Colab Notebook to finetune Gemma 3

  • Preliminary support for full finetuning and 8-bit finetuning - set full_finetuning = True or load_in_8bit = True. Both will be optimized further in the future! A reminder: you will need more powerful GPUs!

from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4B-it",
    max_seq_length = 2048, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)
  • New Unsloth Auto Model support - nearly all models are now supported! We now support vision and text models out of the box, without the need for custom implementations (and all are optimized!)
  • Mixtral (yes finally!), Gemma 3, Granite 3.2, Cohere, OLMo, Reka, and generally any vision or language model! There might be some occasional models which don't work!
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1",
)
pip install unsloth
  • Training on completions / responses only is now supported for vision models! Use it like below:
data_collator = UnslothVisionDataCollator(
    model,
    tokenizer,
    train_on_responses_only = False, # set True to mask the instruction part and train only on responses
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
SFTTrainer(..., data_collator = data_collator)
  • Conversions to llama.cpp GGUFs for 16bit and 8bit now DO NOT need compiling! This solves many issues and means there is no need to install GCC, Microsoft Visual Studio, etc.!
model.save_pretrained_merged("gemma-3-finetune", tokenizer)
model.save_pretrained_gguf(
    "gemma-3-finetune",
    quantization_type = "Q8_0", # For now only Q8_0, BF16, F16 supported
)
  • Vision models now auto-resize images, which stops OOMs and also allows truncating sequence lengths!

  • Many optimizations in Unsloth allow a further 10% less VRAM usage and a >10% speed boost for 4-bit (on top of our original 2x faster, 70% less memory usage). 8-bit and full finetuning also benefit!

  • GRPO in Unsloth now allows non-Unsloth-uploaded models to be in 4-bit as well - this reduces VRAM usage a lot! (e.g. your own finetune of Llama)

  • New training logs and info - training parameter counts, total batch size

  • Vision models now also work for normal text training! This means non vision notebooks can work with vision models!

  • Complete gradient accumulation bug fix coverage for all models!

  • GRPO notebook for Gemma 3 coming soon with Hugging Face's reasoning course!

  • DoRA, Dropout, and other PEFT methods should just work!

Bug fixes

  • Faster and less error-prone, streamlined finetuning experience! Apologies for the recent issues with constant releases and breaking changes - the March release should be stable! I.e. pip install "unsloth==2025.3.14" "unsloth_zoo==2025.3.12"
  • Pixtral and Llava finetuning are now fixed! In fact nearly all vision models are supported out of the box! Please update transformers for Pixtral: pip install --no-deps git+https://github.com/huggingface/transformers.git
  • Fixed all Colabs not working - cloud instances like Runpod should just work now!
  • Fixed many many bugs - will reply to each issue with updates!

Other items

New Contributors

Full Changelog: 2025-02...2025-03

Long Context GRPO

20 Feb 18:51


90% less memory usage GRPO

Update Unsloth via pip install --upgrade --no-cache-dir unsloth unsloth_zoo

More details in blog post: https://unsloth.ai/blog/grpo

Llama 3.1 8B GRPO Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb

| Metric                                  | Unsloth           | TRL + FA2 |
|-----------------------------------------|-------------------|-----------|
| Training Memory Cost (GB)               | 42GB              | 414GB     |
| GRPO Memory Cost (GB)                   | 9.8GB             | 78.3GB    |
| Inference Cost (GB)                     | 0GB               | 16GB      |
| Inference KV Cache for 20K context (GB) | 2.5GB             | 2.5GB     |
| Total Memory Usage                      | 54.3GB (90% less) | 510.8GB   |

You automatically get 90% less memory usage! Also, all reward logs for individual reward functions will show up.

Script to run GRPO:

!pip install unsloth vllm
from unsloth import FastLanguageModel
import torch
max_seq_length = 1024 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/meta-Llama-3.1-8B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.6, # Reduce if out of memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ], # Remove QKVO if out of memory
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth", # Enable long context finetuning
    random_state = 3407,
)
import re
from datasets import load_dataset, Dataset
global COUNTER
COUNTER = 0
global PRINT_EVERY
PRINT_EVERY = 20

# Load and prep dataset
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

XML_COT_FORMAT = """\
<reasoning>
{reasoning}
</reasoning>
<answer>
{answer}
</answer>
"""

def extract_xml_answer(text: str) -> str:
    answer = text.split("<answer>")[-1]
    answer = answer.split("</answer>")[0]
    return answer.strip()

def extract_hash_answer(text: str) -> str | None:
    if "####" not in text:
        return None
    return text.split("####")[1].strip()

# uncomment middle messages for 1-shot prompting
def get_gsm8k_questions(split = "train") -> Dataset:
    data = load_dataset('openai/gsm8k', 'main')[split] # type: ignore
    data = data.map(lambda x: { # type: ignore
        'prompt': [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': x['question']}
        ],
        'answer': extract_hash_answer(x['answer'])
    }) # type: ignore
    return data # type: ignore

dataset = get_gsm8k_questions()

# Reward functions
def correctness_reward_func(prompts, completions, answer, **kwargs) -> list[float]:
    responses = [completion[0]['content'] for completion in completions]
    q = prompts[0][-1]['content']
    extracted_responses = [extract_xml_answer(r) for r in responses]
    global COUNTER
    if COUNTER % PRINT_EVERY == 0:
        print('-'*20, f"Question:\n{q}", f"\nAnswer:\n{answer[0]}", f"\nResponse:\n{responses[0]}", f"\nExtracted:\n{extracted_responses[0]}")
    COUNTER += 1
    return [2.0 if r == a else 0.0 for r, a in zip(extracted_responses, answer)]

def int_reward_func(completions, **kwargs) -> list[float]:
    responses = [completion[0]['content'] for completion in completions]
    extracted_responses = [extract_xml_answer(r) for r in responses]
    return [0.5 if r.isdigit() else 0.0 for r in extracted_responses]

def strict_format_reward_func(completions, **kwargs) -> list[float]:
    """Reward function that checks if the completion has a specific format."""
    pattern = r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\n$"
    responses = [completion[0]["content"] for completion in completions]
    matches = [re.match(pattern, r) for r in responses]
    return [0.5 if match else 0.0 for match in matches]

def soft_format_reward_func(completions, **kwargs) -> list[float]:
    """Reward function that checks if the completion has a specific format."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    matches = [re.match(pattern, r) for r in responses]
    return [0.5 if match else 0.0 for match in matches]

def count_xml(text) -> float:
    count = 0.0
    if text.count("<reasoning>\n") == 1:
        count += 0.125
    if text.count("\n</reasoning>\n") == 1:
        count += 0.125
    if text.count("\n<answer>\n") == 1:
        count += 0.125
        count -= len(text.split("\n</answer>\n")[-1])*0.001
    if text.count("\n</answer>") == 1:
        count += 0.125
        count -= (len(text.split("\n</answer>")[-1]) - 1)*0.001
    return count

def xmlcount_reward_func(completions, **kwargs) -> list[float]:
    contents = [completion[0]["content"] for completion in completions]
    return [count_xml(c) for c in contents]

max_prompt_length = 256
from trl import GRPOConfig, GRPOTrainer

# Optional extra params for vLLM
from unsloth import vLLMSamplingParams
vllm_sampling_params = vLLMSamplingParams(
    min_p = 0.01,
    seed = 3407,
)
training_args = GRPOConfig(
    learning_rate = 5e-6,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1, # Increase to 4 for smoother training
    num_generations = 6, # Decrease if out of memory
    max_prompt_length = max_prompt_length,
    max_completion_length = max_seq_length - max_prompt_length,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 250,
    report_to = "none", # Can use Weights & Biases
    vllm_sampling_params = vllm_sampling_params, # Optional
    temperature = 1.0,
)
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        xmlcount_reward_func,
        soft_format_reward_func,
        strict_format_reward_func,
        int_reward_func,
        correctness_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()

What's Changed

New Contributors

Full Changelog: 2025-02...2025-02-v2