Commit a240783

danielhanchen, Erland366, Datta0, Rabbidon, and root authored
Bug Fixes (#1470)
Squashed merge of many incremental commits (repeated edits to llama.py, _utils.py, cross_entropy_loss.py, rms_layernorm.py, rope_embedding.py, flex_attention.py, tokenizer_utils.py, trainer.py, loader.py, vision.py, save.py, mapper.py, mistral.py, granite.py, cohere.py, gemma2.py, and pyproject.toml, plus typing, int64/constexpr, and compile-toggle tweaks). Notable changes, deduplicated:

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)
* Throw an error when inferencing longer than max_position_embeddings without RoPE scaling (#1236)
* CLI now handles user input strings for dtype correctly (#1235)
* triton_cast; Qwen 2.5 Coder support
* Fix/export mistral (#1281): set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266
* DOC update (#1269): add os.environ to the README example so users on multi-GPU machines (e.g. in Jupyter) can select a GPU manually; unsloth's init otherwise checks all GPUs and takes the first in the list, which can conflict with devices already in use (a minimal sketch follows this message)
* fix/get_chat_template (#1246): refactor get_chat_template to support a system message; intended to fix the Ollama tokenizer chat template
* fix/sft-trainer (#1276): patch SFTTrainer for backward compatibility with TRL changes, excluding non-convertible trainers
* Fix #853
* fix/sfttrainer-compatibility (#1293): import SFTConfig directly and update the UnslothTrainingArguments class inheritance
* Gemma and Gemma 2 RMS layernorm updates
* Cut Cross Entropy integration
* patch_fast_lora and vision support: FastBaseVisionModel, plus loader, vision, and save updates (tokenizer_name, kwargs, logits handling, support for old torch versions)
* Feat/kto (#1316): add PatchKTOTrainer and update model imports
* Fix ORPO/DPO trainer (#1286): change the Colab notebooks for DPO Zephyr and ORPO to use the original tokenizer
* Skip modules; fix llama.cpp export
* Fix vision model tokenizer padding side (#1384)
* Dynamic quants (#1379)
* Update README.md: Unsloth Dynamic 4-bit Quantization
* Add citation section to README.md (#1377)
* Granite support (#1218): flex attention cleanup, remove sliding window, use torch.add for the residual multiplier
* Llama 3.3 support; fullgraph
* Fix loader.py to work on Windows (#1453)
* Update save.py warning message (#1425)
* Change _fix_chat_template in case a template has both endif and endfor (#1388)
* Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442)
* NameError bug fix: import Version from packaging.version (#1468)
* Triton on Windows; dependency and version bumps in pyproject.toml

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Scott Phillips <polygonguru@gmail.com>
Co-authored-by: qingy1337 <qxli2@students.everettcc.edu>
Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>
Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
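The os.environ tip from #1269, sketched minimally. The commit message does not name the exact variable, so this assumes the standard CUDA_VISIBLE_DEVICES mechanism and the public FastLanguageModel entry point:

import os

# Hide all but the chosen GPU *before* importing unsloth/torch, so that
# initialization cannot grab the first device in the full list.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # illustrative device index

from unsloth import FastLanguageModel  # now sees only the selected GPU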
1 parent 79db713 commit a240783

File tree

10 files changed: +168 -144 lines

pyproject.toml

Lines changed: 130 additions & 100 deletions
Large diffs are not rendered by default.

unsloth/models/_utils.py

Lines changed: 11 additions & 3 deletions
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = "2024.12.8"
+__version__ = "2024.12.9"
 
 __all__ = [
     "prepare_model_for_kbit_training",

@@ -72,7 +72,7 @@
 platform_system = platform_system()
 import numpy as np
 import warnings, subprocess, re, inspect, psutil, os, math
-from packaging.version import Version
+from unsloth_zoo.utils import Version
 
 from unsloth_zoo.tokenizer_utils import (
     patch_tokenizer as _patch_tokenizer,

@@ -403,7 +403,7 @@ def _is_openai_available(): return False
 # Fix new Xformers versions TypeError: Multiple dispatch failed for 'torch._ops.aten.to.dtype_layout'
 accelerate_old_send_to_device = None
 accelerate_new_send_to_device = None
-if Version(xformers_version) >= Version("0.0.27"):
+if xformers_version is not None and Version(xformers_version) >= Version("0.0.27"):
     import accelerate.utils.operations
     if hasattr(accelerate.utils.operations, "send_to_device") and \
         accelerate.utils.operations.send_to_device.__name__ != "_fixed_send_to_device":
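The added None guard is the actual fix in the hunk above: xformers is an optional dependency, so xformers_version can be None, and Version(None) raises a TypeError before the comparison ever runs. A minimal sketch of the pattern, with the package probing done via importlib.metadata (unsloth's own detection code may differ):

from importlib.metadata import version, PackageNotFoundError
from packaging.version import Version

try:
    xformers_version = version("xformers")
except PackageNotFoundError:
    xformers_version = None  # optional dependency is absent

# The None check must short-circuit first: Version(None) raises a TypeError.
if xformers_version is not None and Version(xformers_version) >= Version("0.0.27"):
    print("apply the accelerate send_to_device patch")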
@@ -1086,6 +1086,14 @@ def patch_gradient_accumulation_fix(Trainer):
         "if num_items_in_batch is not None: loss *= self.args.gradient_accumulation_steps",
     )
     function = function.replace("def training_step", "def _unsloth_training_step", 1)
+
+    # Fix 4.47.0 issue where num_items_in_batch was removed
+    # See https://github.com/huggingface/transformers/pull/35121
+    function = function.replace(
+        "if self.model_accepts_loss_kwargs:",
+        "if False:",
+    )
+
     exec(function, globals())
     Trainer.training_step = _unsloth_training_step
 pass
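For context, patch_gradient_accumulation_fix patches Trainer.training_step by rewriting its source text and exec-ing the result; the added replace simply turns the model_accepts_loss_kwargs branch into dead code, coping with the transformers 4.47.0 change tracked in huggingface/transformers#35121. A toy, self-contained version of the source-rewriting trick (the function and names are made up; inspect.getsource needs the code to live in a file, as it does here):

import inspect

def training_step(x):
    if x > 0:  # the branch we want to neutralize
        return x * 2
    return 0

source = inspect.getsource(training_step)
source = source.replace("if x > 0:", "if False:")                     # kill the branch
source = source.replace("def training_step", "def _patched_step", 1)  # rename

namespace = {}
exec(source, namespace)  # compile and bind the rewritten function
patched = namespace["_patched_step"]

assert patched(5) == 0  # the positive branch no longer executes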

unsloth/models/cohere.py

Lines changed: 2 additions & 2 deletions
@@ -68,7 +68,7 @@ def fast_layernorm_inference(self, X, out_weight = None):
 def CohereAttention_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

@@ -183,7 +183,7 @@ def CohereAttention_fast_forward(
 def CohereDecoderLayer_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

unsloth/models/gemma.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def fast_geglu_inference(self, X):
 def GemmaDecoderLayer_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

unsloth/models/gemma2.py

Lines changed: 2 additions & 2 deletions
@@ -75,7 +75,7 @@ def fast_rms_layernorm_gemma2_compiled(layernorm, X, gemma = True):
 def Gemma2Attention_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

@@ -169,7 +169,7 @@ def Gemma2Attention_fast_forward(
 def Gemma2DecoderLayer_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

unsloth/models/granite.py

Lines changed: 2 additions & 2 deletions
@@ -60,7 +60,7 @@
 def GraniteAttention_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

@@ -171,7 +171,7 @@ def GraniteAttention_fast_forward(
 def GraniteDecoderLayer_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

unsloth/models/llama.py

Lines changed: 5 additions & 3 deletions
@@ -66,6 +66,8 @@
 from huggingface_hub.utils._token import get_token
 pass
 from triton import __version__ as triton_version
+BlockDiagonalCausalMask = xformers.attn_bias.BlockDiagonalCausalMask if xformers is not None else None
+
 
 def original_apply_qkv(self, X):
     Q = self.q_proj(X)

@@ -330,7 +332,7 @@ def fast_layernorm_compiled(layernorm, X):
 def LlamaAttention_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

@@ -538,7 +540,7 @@ def LlamaDecoderLayer_fast_forward(
 def LlamaModel_fast_forward(
     self,
     input_ids: torch.LongTensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_values: Optional[List[torch.FloatTensor]] = None,

@@ -942,7 +944,7 @@ def CausalLM_fast_forward(fast_forward_inference):
 def _CausalLM_fast_forward(
     self,
     input_ids: torch.LongTensor = None,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_values: Optional[List[torch.FloatTensor]] = None,
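The module-level alias added in the first hunk is what lets every causal_mask annotation drop the hard xformers.attn_bias reference: when xformers is unavailable (the guard implies unsloth binds the name to None in that case), the alias degrades to None, and Optional[None] is still a legal annotation rather than an AttributeError at import time. A minimal sketch of the pattern, assuming the same attribute path the diff uses:

from typing import Optional

try:
    import xformers.attn_bias  # optional dependency; path as referenced in the diff
except ImportError:
    xformers = None

# Degrade to None so the name always exists for annotations below.
BlockDiagonalCausalMask = (
    xformers.attn_bias.BlockDiagonalCausalMask if xformers is not None else None
)

def attention_fast_forward(
    hidden_states,
    causal_mask: Optional[BlockDiagonalCausalMask] = None,  # valid even when None
):
    ...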

unsloth/models/loader.py

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@
 from huggingface_hub import HfFileSystem
 
 # https://github.com/huggingface/transformers/pull/26037 allows 4 bit loading!
-from packaging.version import Version
+from unsloth_zoo.utils import Version
 transformers_version = Version(transformers_version)
 SUPPORTS_FOURBIT = transformers_version >= Version("4.37")
 SUPPORTS_GEMMA = transformers_version >= Version("4.38")
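The Version swap here mirrors the one in _utils.py; unsloth_zoo.utils.Version presumably keeps packaging's comparison semantics (the zoo source is not shown in this diff). The reason feature gates use Version objects at all: release strings do not sort correctly as plain text. A small self-contained check:

from packaging.version import Version

transformers_version = Version("4.46.2")  # illustrative installed version

# Component-wise comparison, not character-wise.
SUPPORTS_FOURBIT = transformers_version >= Version("4.37")
SUPPORTS_GEMMA = transformers_version >= Version("4.38")

assert Version("4.10") > Version("4.9")  # numeric: 10 > 9
assert "4.10" < "4.9"                    # lexical string order gets this wrong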

unsloth/models/mistral.py

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@
 def MistralAttention_fast_forward(
     self,
     hidden_states: torch.Tensor,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_value: Optional[Tuple[torch.Tensor]] = None,

@@ -172,7 +172,7 @@ def MistralAttention_fast_forward(
 def MistralForCausalLM_fast_forward(
     self,
     input_ids: torch.LongTensor = None,
-    causal_mask: Optional[xformers.attn_bias.BlockDiagonalCausalMask] = None,
+    causal_mask: Optional[BlockDiagonalCausalMask] = None,
     attention_mask: Optional[torch.Tensor] = None,
     position_ids: Optional[torch.LongTensor] = None,
     past_key_values: Optional[List[torch.FloatTensor]] = None,

unsloth/save.py

Lines changed: 12 additions & 28 deletions
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from unsloth_zoo.utils import Version
 from bitsandbytes.nn import Linear4bit as Bnb_Linear4bit
 from peft.tuners.lora import Linear4bit as Peft_Linear4bit
 from peft.tuners.lora import Linear as Peft_Linear

@@ -2096,6 +2097,7 @@ def unsloth_convert_lora_to_ggml_and_save_locally(
 
 
 from .models.loader_utils import get_model_name
+from unsloth_zoo.saving_utils import merge_and_overwrite_lora
 
 @torch.inference_mode
 def unsloth_generic_save(

@@ -2127,34 +2129,16 @@ def unsloth_generic_save(
     maximum_memory_usage : float = 0.9,
 ):
     if token is None and push_to_hub: token = get_token()
-
-    import unsloth_zoo
-    if Version(unsloth_zoo.__version__) <= Version("2024.12.1"):
-        from unsloth_zoo.peft_utils import merge_and_overwrite_lora
-        merge_and_overwrite_lora(
-            get_model_name,
-            create_huggingface_repo,
-            model,
-            save_location = save_directory,
-            push_to_hub = push_to_hub,
-            token = token,
-            upload_location = save_directory if push_to_hub else None,
-            low_disk_space_usage = True,
-            private = private,
-        )
-    else:
-        from unsloth_zoo.saving_utils import merge_and_overwrite_lora
-        merge_and_overwrite_lora(
-            get_model_name,
-            model,
-            save_directory = save_directory,
-            push_to_hub = push_to_hub,
-            private = private,
-            token = token,
-            low_disk_space_usage = False,
-            use_temp_file = False,
-        )
-    pass
+    merge_and_overwrite_lora(
+        get_model_name,
+        model,
+        save_directory = save_directory,
+        push_to_hub = push_to_hub,
+        private = private,
+        token = token,
+        low_disk_space_usage = False,
+        use_temp_file = False,
+    )
     return
 pass
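Unrelated to the change itself but visible in the context lines, unsloth_generic_save runs under @torch.inference_mode, so the merge reads weights without autograd bookkeeping. A minimal illustration of what that decorator changes:

import torch

@torch.inference_mode()
def double(weight: torch.Tensor) -> torch.Tensor:
    # No autograd graph is recorded inside inference mode.
    return weight * 2

w = torch.randn(3, requires_grad=True)
out = double(w)
print(out.requires_grad)        # False
print(torch.is_inference(out))  # True: an inference tensor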

0 commit comments