# Unsloth Blackwell Compatibility

## Overview

`Blackwell` (`sm100+`) requires all dependent libraries to be compiled with `CUDA 12.8`.

The core libraries for running unsloth that depend on the `CUDA` version are:
- `bitsandbytes` - already has wheels built with `CUDA 12.8`, so `pip install` should work out of the box
- `triton` - requires `triton>=3.3.1`
- `torch` - requires installing with `pip install torch --extra-index-url https://download.pytorch.org/whl/cu128`
- `vllm` - safest is to use the nightly build: `uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly`
- `xformers` - as of 6/26, `xformers` wheels are not yet built with `sm100+` enabled, as support was only recently [added](https://github.com/facebookresearch/xformers/commit/d9b3b6e2b38ca485c89507ef8ac1fbef2723cdfa), so a source build is required (see below). A quick check that you are actually on a `Blackwell` card is shown after this list.
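
Before starting, it can help to confirm the GPU really is `Blackwell`. A minimal check, assuming a driver recent enough to support the `compute_cap` query:

```bash
# Blackwell data-center GPUs (sm100) report 10.0;
# RTX 50-series consumer cards (sm120) report 12.0.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```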

## Installation

### Using uv

The installation order is important, since we want to overwrite the bundled dependencies with specific versions (namely, `xformers` and `triton`).

1) Install `uv`. I prefer `uv` over `pip`, as it is faster and better at resolving dependencies, especially for libraries that depend on `torch` but require a specific `CUDA` version, as in this scenario.

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env
   ```

   Create a project dir and venv:

   ```bash
   mkdir unsloth-blackwell && cd unsloth-blackwell
   uv venv .venv --python=3.12 --seed
   source .venv/bin/activate
   ```

2) Install `vllm`

   ```bash
   uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
   ```

   Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` built against `cu126`.
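
   To verify that the `cu128` build actually landed, a quick check using the standard `torch` attributes:

   ```bash
   # Expect a version string ending in +cu128 and CUDA 12.8
   python -c "import torch; print(torch.__version__, torch.version.cuda)"
   ```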

3) Install `unsloth` dependencies

   ```bash
   uv pip install unsloth unsloth_zoo bitsandbytes
   ```

4) Download and build `xformers`

   ```bash
   # First uninstall xformers installed by previous libraries
   uv pip uninstall xformers

   # Clone and build
   git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
   cd xformers
   export TORCH_CUDA_ARCH_LIST="12.0"
   python setup.py install
   ```

   Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST="12.0"`.
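
   Once the build completes, `xformers` ships a small diagnostic module that prints the compiled ops and architecture flags, which is a quick way to confirm the `sm120` build took:

   ```bash
   python -m xformers.info
   ```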

5) Update `triton`

   ```bash
   uv pip install -U "triton>=3.3.1"
   ```

   `triton>=3.3.1` is required for `Blackwell` support. Note the quotes around the requirement; without them the shell treats `>=` as a redirection.
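
   A quick version check to confirm the upgrade:

   ```bash
   # Expect 3.3.1 or newer
   python -c "import triton; print(triton.__version__)"
   ```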

6) `transformers`

   `transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically [switch off caching](https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/modeling_layers.py#L52-L59).

   When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this results in a mismatch between expected and actual outputs [here](https://github.com/unslothai/unsloth/blob/bfa6a3678e2fb8097c5ece41d095a8051f099db3/unsloth/models/llama.py#L939).

   A temporary workaround is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0`, or to stay on `4.52.4` for now:

   ```bash
   uv pip install -U transformers==4.52.4
   ```
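
   A minimal sketch of the generate-after-training workaround on `4.53.0`; `model` and `tokenizer` are assumed to come from `FastLanguageModel.from_pretrained`, and the prompt text is purely illustrative:

   ```python
   # Turn off gradient checkpointing so use_cache=True behaves as expected
   model.disable_gradient_checkpointing()

   inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
   outputs = model.generate(**inputs, max_new_tokens=32, use_cache=True)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```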

### Using conda or mamba

1) Install `conda`/`mamba`

   ```bash
   curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
   ```

   Run the installation script:

   ```bash
   bash Miniforge3-$(uname)-$(uname -m).sh
   ```

   Create a conda or mamba environment:

   ```bash
   conda create --name unsloth-blackwell python==3.12 -y
   ```

   Activate the newly created environment:

   ```bash
   conda activate unsloth-blackwell
   ```

2) Install `vllm`

   Make sure you are inside the activated conda/mamba environment; the environment name should prefix your shell prompt, e.g. `(unsloth-blackwell) user@machine:`. All of the commands below assume this environment is active.

   ```bash
   pip install -U vllm --extra-index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://wheels.vllm.ai/nightly
   ```

   Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` built against `cu126`.

3) Install `unsloth` dependencies

   ```bash
   pip install unsloth unsloth_zoo bitsandbytes
   ```

4) Download and build `xformers`

   ```bash
   # First uninstall xformers installed by previous libraries
   pip uninstall xformers

   # Clone and build
   git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
   cd xformers
   export TORCH_CUDA_ARCH_LIST="12.0"
   python setup.py install
   ```

   Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST="12.0"`.

5) Update `triton`

   ```bash
   pip install -U "triton>=3.3.1"
   ```

   `triton>=3.3.1` is required for `Blackwell` support (note the quotes, as in the `uv` instructions above).

6) `transformers`

   As described in the `uv` instructions above, `transformers >= 4.53.0` breaks `unsloth` inference when `gradient_checkpointing` is enabled; either call `model.disable_gradient_checkpointing()` before generation or pin to `4.52.4`:

   ```bash
   pip install -U transformers==4.52.4
   ```

If you are using `mamba` as your package manager, just replace `conda` with `mamba` in all of the commands above.

## Post-installation notes

After installation, your environment should look similar to `blackwell.requirements.txt`.

Note: you might need to downgrade `numpy` to `numpy<=2.2` after all of the installs.
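
If you do hit `numpy`-related import errors, the downgrade is a single command (use plain `pip` inside the conda/mamba environment):

```bash
uv pip install "numpy<=2.2"
```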

## Test

Both `test_llama32_sft.py` and `test_qwen3_grpo.py` should run without issue if the install is correct. If not, check the diff between your installed environment and `blackwell.requirements.txt`.
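
One quick way to check that diff, assuming `blackwell.requirements.txt` is in the current directory:

```bash
# Dump the active environment and compare it against the reference list
uv pip freeze > installed.txt
diff installed.txt blackwell.requirements.txt
```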