
Commit f3f2b51

Merge branch 'main' into nightly
2 parents: 3dd87bb + d2a038d

File tree: 4 files changed (+37, -42 lines)


blackwell/README.md

Lines changed: 25 additions & 31 deletions
@@ -10,8 +10,8 @@ The core libs for running unsloth which have dependencies on `CUDA` version are:
 - `bitsandbytes` - already has wheels built with `CUDA 12.8` so `pip install` should work out of the box
 - `triton` - requires `triton>=3.3.1`
 - `torch` - requires installing with `pip install torch --extra-index-url https://download.pytorch.org/whl/cu128`
-- `vllm` - safest is to use the nightly build: `uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly`
-- `xformers` - as of 6/26, `xformers` wheels are not yet built with `sm100+` enabled as support was only recently [added](https://github.com/facebookresearch/xformers/commit/d9b3b6e2b38ca485c89507ef8ac1fbef2723cdfa) so will require a source build (see below).
+- `vllm` - vLLM 0.10.0 supports Blackwell now, but use CUDA 12.8: `uv pip install -U vllm --torch-backend=cu128`
+- `xformers` - (Optional) as of 6/26, `xformers` wheels are not yet built with `sm100+` enabled as support was only recently [added](https://github.com/facebookresearch/xformers/commit/d9b3b6e2b38ca485c89507ef8ac1fbef2723cdfa) so will require a source build (see below).

 ## Installation

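An illustrative sanity check (not part of the diff above) that prints what is actually installed for the CUDA-sensitive packages listed in this hunk; any missing package simply reports as absent:

```python
# Report installed versions of the CUDA-sensitive packages listed above.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "triton", "bitsandbytes", "vllm", "xformers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```
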
@@ -38,7 +38,7 @@ The installation order is important, since we want the overwrite bundled depende
 2) Install `vllm`

 ```bash
-uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
+uv pip install -U vllm --torch-backend=cu128
 ```

 Note that we have to specify `cu128`, otherwise `vllm` will install `torch==2.7.0` but with `cu126`.
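To confirm that the `cu128` build of `torch` actually landed (rather than the `cu126` wheel the note above warns about), a minimal check along these lines works:

```python
# Verify the torch build pulled in by vllm is the CUDA 12.8 one.
import torch

print(torch.__version__)    # e.g. 2.7.0+cu128, not +cu126
print(torch.version.cuda)   # expected: "12.8"
# Consumer Blackwell GPUs report compute capability (12, 0), matching the
# TORCH_CUDA_ARCH_LIST=12.0 value used for the xformers build further down.
print(torch.cuda.get_device_capability())
```
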
@@ -49,7 +49,17 @@ The installation order is important, since we want the overwrite bundled depende
 uv pip install unsloth unsloth_zoo bitsandbytes
 ```

-4) Download and build `xformers`
+If you notice weird resolving issues due to Xformers, you can also install Unsloth from source without Xformers:
+
+```bash
+uv pip install -qqq \
+    "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
+    "unsloth[base] @ git+https://github.com/unslothai/unsloth"
+```
+
+4) Download and build `xformers` (Optional)
+
+Xformers is optional, but it is definitely faster and uses less memory. We'll use PyTorch's native SDPA if you do not want Xformers. Building Xformers from source might be slow, so beware!

 ```bash
 # First uninstall xformers installed by previous libraries
@@ -64,23 +74,11 @@ The installation order is important, since we want the overwrite bundled depende

 Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST=12.0`.

-5) Update `triton`
-
-```bash
-uv pip install -U triton>=3.3.1
-```
-
-`triton>=3.3.1` is required for `Blackwell` support.
-
-6) `transformers`
-`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically [switch off caching](https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/modeling_layers.py#L52-L59).
-
-When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this will result in mismatch between expected and actual outputs [here](https://github.com/unslothai/unsloth/blob/bfa6a3678e2fb8097c5ece41d095a8051f099db3/unsloth/models/llama.py#L939).
-
-Temporary solution is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0` or stick with `4.52.4` for now:
+5) `transformers`
+Install any transformers version, but best to get the latest.

 ```bash
-uv pip install -U transformers==4.52.4
+uv pip install -U transformers
 ```

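The hunks above make `xformers` optional, with PyTorch's native SDPA as the fallback attention path. A minimal illustration of that fallback (shapes, dtype, and device are made up for the example):

```python
# Illustrative only: PyTorch's built-in scaled dot product attention,
# used when xformers is not installed. Shape is (batch, heads, seq, head_dim).
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```
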
@@ -112,7 +110,7 @@ The installation order is important, since we want the overwrite bundled depende
 Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix to your terminal shell like this your `(unsloth-blackwell)user@machine:`

 ```bash
-pip install -U vllm --extra-index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://wheels.vllm.ai/nightly
+pip install -U vllm --extra-index-url https://download.pytorch.org/whl/cu128
 ```

 Note that we have to specify `cu128`, otherwise `vllm` will install `torch==2.7.0` but with `cu126`.
@@ -125,9 +123,11 @@ The installation order is important, since we want the overwrite bundled depende
 pip install unsloth unsloth_zoo bitsandbytes
 ```

-4) Download and build `xformers`
+4) Download and build `xformers` (Optional)

-Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix to your terminal shell like this your `(unsloth-blackwell)user@machine:`
+Xformers is optional, but it is definitely faster and uses less memory. We'll use PyTorch's native SDPA if you do not want Xformers. Building Xformers from source might be slow, so beware!
+
+You should see the name of your environment as a prefix to your terminal shell like this your `(unsloth-blackwell)user@machine:`

 ```bash
 # First uninstall xformers installed by previous libraries
@@ -153,16 +153,10 @@ The installation order is important, since we want the overwrite bundled depende
 `triton>=3.3.1` is required for `Blackwell` support.

 6) `Transformers`
-`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically [switch off caching](https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/modeling_layers.py#L52-L59).
-
-When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this will result in mismatch between expected and actual outputs [here](https://github.com/unslothai/unsloth/blob/bfa6a3678e2fb8097c5ece41d095a8051f099db3/unsloth/models/llama.py#L939).
-
-Temporary solution is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0` or stick with `4.52.4` for now:
-
-Make sure you are inside the activated conda/mamba environment. You should see the name of your environment as a prefix to your terminal shell like this your `(unsloth-blackwell)user@machine:`
+Install any transformers version, but best to get the latest.

 ```bash
-pip install -U transformers==4.52.4
+uv pip install -U transformers
 ```

@@ -171,7 +165,7 @@ If you are using mamba as your package just replace conda with mamba for all com

 ## WSL-Specific Notes

-If you're using WSL (Windows Subsystem for Linux) and encounter issues during xformers compilation, follow these additional steps:
+If you're using WSL (Windows Subsystem for Linux) and encounter issues during xformers compilation (reminder Xformers is optional, but faster for training) follow these additional steps:

 1. **Increase WSL Memory Limit**
    Create or edit the WSL configuration file:

unsloth/models/_utils.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-__version__ = "2025.8.6"
+__version__ = "2025.8.7"

 __all__ = [
     "SUPPORTS_BFLOAT16",

unsloth/models/loader.py

Lines changed: 10 additions & 10 deletions
@@ -158,7 +158,7 @@ def from_pretrained(
             )
         pass
     pass
-    
+
     old_model_name = model_name
     if not use_exact_model_name:
         model_name = get_model_name(model_name, load_in_4bit)
@@ -214,7 +214,7 @@ def from_pretrained(
     else:
         # Because HfFileSystem assumes linux paths, we need to set the path with forward slashes, even on Windows.
         files = HfFileSystem(token = token).glob(f"{model_name}/*.json")
-        files = (os.path.split(x)[-1] for x in files)
+        files = list(os.path.split(x)[-1] for x in files)
         if sum(x == "adapter_config.json" or x == "config.json" for x in files) >= 2:
             both_exist = True
         pass
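A plausible reading of the `list(...)` change above: a generator expression is exhausted by its first full pass, so after the `sum(...)` membership check it would be empty for any later iteration, while a list can be scanned again. A small self-contained illustration (file names are made up):

```python
import os

paths = ["some-repo/config.json", "some-repo/adapter_config.json"]

# Generator: fully consumed by the first pass.
files = (os.path.split(x)[-1] for x in paths)
print(sum(x == "adapter_config.json" or x == "config.json" for x in files))  # 2
print(list(files))  # [] -- nothing left for a second pass

# List: survives repeated checks.
files = list(os.path.split(x)[-1] for x in paths)
print(sum(x == "adapter_config.json" or x == "config.json" for x in files))  # 2
print(files)  # ['config.json', 'adapter_config.json']
```
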
@@ -239,7 +239,7 @@ def from_pretrained(
             f"This includes Llama 3.1. The minimum required version is 4.43.2\n"\
             f'Try `pip install --upgrade "transformers>=4.43.2"`\n'\
             f"to obtain the latest transformers build, then restart this session."\
-        )
+        )
         # Create a combined error message showing both failures
         combined_error = (
             "Unsloth: Failed to load model. Both AutoConfig and PeftConfig loading failed.\n\n"
@@ -316,7 +316,7 @@ def from_pretrained(
             "To update flash-attn, do the below:\n"\
             '\npip install --no-deps --upgrade "flash-attn>=2.6.3"'
         )
-        
+
         dispatch_model = FastGemma2Model
     elif model_type == "qwen2":
         dispatch_model = FastQwen2Model
@@ -383,7 +383,7 @@ def from_pretrained(
         fast_inference = False
     pass
     from unsloth_zoo.vllm_utils import (
-        patch_vllm, 
+        patch_vllm,
         vllm_dynamic_quant_supported,
     )
     patch_vllm()
@@ -421,7 +421,7 @@ def from_pretrained(
         disable_log_stats = disable_log_stats,
         *args, **kwargs,
     )
-    
+
     if resize_model_vocab is not None:
         model.resize_token_embeddings(resize_model_vocab)
     pass
@@ -712,7 +712,7 @@ def from_pretrained(
         both_exist = exist_adapter_config and exist_config
     else:
         files = HfFileSystem(token = token).glob(f"{model_name}/*.json")
-        files = (os.path.split(x)[-1] for x in files)
+        files = list(os.path.split(x)[-1] for x in files)
         if sum(x == "adapter_config.json" or x == "config.json" for x in files) >= 2:
             both_exist = True
         pass
@@ -737,7 +737,7 @@ def from_pretrained(
             f"This includes Llama 3.1. The minimum required version is 4.43.2\n"\
             f'Try `pip install --upgrade "transformers>=4.43.2"`\n'\
             f"to obtain the latest transformers build, then restart this session."\
-        )
+        )
         # Create a combined error message showing both failures
         combined_error = (
             "Unsloth: Failed to load model. Both AutoConfig and PeftConfig loading failed.\n\n"
@@ -753,7 +753,7 @@ def from_pretrained(
         model_name = peft_config.base_model_name_or_path
         if not use_exact_model_name:
             model_name = get_model_name(model_name, load_in_4bit)
-    
+
     model_config = AutoConfig.from_pretrained(
         model_name,
         token = token,
@@ -869,7 +869,7 @@ def from_pretrained(
         use_gradient_checkpointing = use_gradient_checkpointing,
         supports_sdpa = supports_sdpa,
         whisper_language = whisper_language,
-        whisper_task = whisper_task, 
+        whisper_task = whisper_task,
         *args, **kwargs,
     )

unsloth/save.py

Lines changed: 1 addition & 0 deletions
@@ -2240,6 +2240,7 @@ def unsloth_convert_lora_to_ggml_and_save_locally(
 def save_to_gguf_generic(
     model,
     save_directory,
+    tokenizer,
     quantization_method = None,
     quantization_type = "Q8_0",
     repo_id = None,
