- Did you update? `pip install --upgrade unsloth unsloth_zoo`
  Yes - using unsloth and unsloth_zoo from Docker.
- Colab, Kaggle, or local / cloud?
  AWS EC2.
- Number of GPUs used? (use `nvidia-smi`)
  1 GPU:
```
Tue Nov 4 04:24:03 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                     On |   00000000:30:00.0 Off |                    0 |
| N/A   41C    P0             79W /  350W |   23177MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI      PID   Type   Process name                              GPU Memory   |
|        ID   ID                                                             Usage        |
|=========================================================================================|
|    0   N/A  N/A     681      C   /opt/conda/bin/python3                      23166MiB   |
+-----------------------------------------------------------------------------------------+
```
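For reference, the same memory figures can also be read from inside the training process itself via `torch.cuda` (a minimal sketch; assumes CUDA is available, and returns an empty dict otherwise):

```python
import torch

def gpu_memory_summary(device: int = 0) -> dict:
    """Return total/free/allocated GPU memory in MiB for one device."""
    if not torch.cuda.is_available():
        return {}
    free, total = torch.cuda.mem_get_info(device)  # both in bytes
    return {
        "total_mib": total // 2**20,
        "free_mib": free // 2**20,
        "allocated_mib": torch.cuda.memory_allocated(device) // 2**20,
    }

print(gpu_memory_summary())
```

This is handy for logging memory headroom right before evaluation starts, where the OOM risk is highest.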
- Which notebook? Please link!
  Custom notebook. Relevant code:
```python
from trl import SFTTrainer, SFTConfig
from evaluate import load
import numpy as np
import torch  # ADDED: missing import
from transformers import DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

# Load the metrics from Hugging Face's evaluate library
bleu = load("bleu")
chrf = load("chrf")
wer = load("wer")
cer = load("cer")

def preprocess_logits_for_metrics(logits, labels):
    """Convert logits to predicted token IDs."""
    if isinstance(logits, tuple):
        logits = logits[0]  # Handle tuple outputs
    pred_ids = torch.argmax(logits, dim=-1)
    return pred_ids, labels

def compute_metrics(p):
    """Compute evaluation metrics: BLEU, CHRF, CHRF++, WER, and CER."""
    (preds, labels), _ = p
    del _

    # Replace -100 padding tokens with pad_token_id for proper decoding
    labels = np.where(labels == -100, tokenizer.pad_token_id, labels)
    preds = np.where(preds == -100, tokenizer.pad_token_id, preds)

    try:
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    except Exception as e:
        print(f"Error during decoding predictions: {e}")
        raise
    try:
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    except Exception as e:
        print(f"Error during decoding labels: {e}")
        raise

    # For BLEU/CHRF, references should be a list of lists
    decoded_labels_bleu = [[label] for label in decoded_labels]

    # Compute metrics
    bleu_score = bleu.compute(predictions=decoded_preds, references=decoded_labels_bleu)
    chrf_score = chrf.compute(predictions=decoded_preds, references=decoded_labels_bleu)
    chrfpp_score = chrf.compute(predictions=decoded_preds, references=decoded_labels_bleu, word_order=2)
    wer_score = wer.compute(predictions=decoded_preds, references=decoded_labels)
    cer_score = cer.compute(predictions=decoded_preds, references=decoded_labels)

    return {
        "bleu": bleu_score["bleu"],
        "chrf": chrf_score["score"],
        "chrf++": chrfpp_score["score"],
        "wer": wer_score,
        "cer": cer_score,
    }

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=poisoned_training_dataset,
    eval_dataset=poisoned_test_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=2,
    packing=False,
    compute_metrics=compute_metrics,  # ENABLED: uncommented
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
        save_strategy="steps",
        # Evaluation settings (ENABLED and OPTIMIZED)
        eval_strategy="steps",         # ENABLED: evaluate during training
        eval_steps=10,                 # ENABLED: evaluate every 10 steps
        per_device_eval_batch_size=1,  # ENABLED: lower than training to avoid OOM
        eval_accumulation_steps=2,     # ENABLED: accumulate eval batches
        # Memory optimization for evaluation
        fp16_full_eval=not is_bfloat16_supported(),  # ADDED: fp16 eval to save memory
        bf16_full_eval=is_bfloat16_supported(),      # ADDED: bf16 eval if supported
        # Optional: enable these for early stopping based on validation loss
        # load_best_model_at_end=True,
        # metric_for_best_model="eval_loss",
        # greater_is_better=False,
        # save_steps=10,
        # save_total_limit=3,
    ),
)
```
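The two preprocessing steps in the metrics pipeline (argmax in `preprocess_logits_for_metrics`, then the `-100` → `pad_token_id` replacement in `compute_metrics`) can be sanity-checked on dummy tensors without a model or tokenizer. A minimal sketch, with hypothetical shapes and `pad_token_id=0` assumed:

```python
import numpy as np
import torch

# Dummy logits: batch of 2 sequences, length 3, vocab size 5 (hypothetical shapes)
logits = torch.randn(2, 3, 5)
labels = torch.tensor([[4, 2, -100], [1, -100, -100]])

# Same argmax as in preprocess_logits_for_metrics: one token ID per position
pred_ids = torch.argmax(logits, dim=-1)

# Same -100 -> pad replacement as in compute_metrics (pad_token_id assumed 0),
# so tokenizer.batch_decode never sees the label-masking sentinel
pad_token_id = 0
clean_labels = np.where(labels.numpy() == -100, pad_token_id, labels.numpy())
print(pred_ids.shape, clean_labels.tolist())
```

This confirms the predictions and labels line up shape-wise, which is what `batch_decode` and the metric calls rely on.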
- Which Unsloth version, TRL version, transformers version, PyTorch version?
  (`pip show` output below.)
```
Name: unsloth
Version: 2025.10.9
Summary: 2-5X faster training, reinforcement learning & finetuning
Home-page: http://www.unsloth.ai
Author: Unsloth AI team
Author-email: info@unsloth.ai
License-Expression: Apache-2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: accelerate, bitsandbytes, datasets, diffusers, hf_transfer, huggingface_hub, numpy, packaging, peft, protobuf, psutil, sentencepiece, torch, torchvision, tqdm, transformers, triton, trl, tyro, unsloth_zoo, wheel, xformers
Required-by:
---
Name: unsloth_zoo
Version: 2025.10.10
Summary: Utils for Unsloth
Home-page: http://www.unsloth.ai
Author: Unsloth AI team
Author-email: info@unsloth.ai
License-Expression: LGPL-3.0-or-later
Location: /opt/conda/lib/python3.11/site-packages
Requires: accelerate, cut_cross_entropy, datasets, filelock, hf_transfer, huggingface_hub, msgspec, numpy, packaging, peft, pillow, protobuf, psutil, regex, sentencepiece, torch, torchao, tqdm, transformers, triton, trl, typing_extensions, tyro, wheel
Required-by: unsloth
---
Name: trl
Version: 0.23.0
Summary: Train transformer language models with reinforcement learning.
Home-page: https://github.com/huggingface/trl
Author: Leandro von Werra
Author-email: leandro.vonwerra@gmail.com
License:
Location: /opt/conda/lib/python3.11/site-packages
Requires: accelerate, datasets, transformers
Required-by: unsloth, unsloth_zoo
---
Name: transformers
Version: 4.56.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/conda/lib/python3.11/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: compressed-tensors, mamba-ssm, peft, transformers-cfg, trl, unsloth, unsloth_zoo, vllm, xgrammar
---
Name: torch
Version: 2.8.0+cu128
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /opt/conda/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, bitsandbytes, causal-conv1d, compressed-tensors, cut-cross-entropy, descript-audio-codec, descript-audiotools, flash_attn, julius, mamba-ssm, openai-whisper, peft, snac, timm, torch-stoi, torchaudio, torchelastic, torchvision, transformers-cfg, unsloth, unsloth_zoo, vllm, xformers, xgrammar
```
- Which trainer? (SFTTrainer, GRPOTrainer, etc.)
  SFTTrainer (see code above).