
Commit 3d78d44

Added conda/mamba section to blackwell installation readme (#2817)
* Added conda/mamba section to blackwell installation readme
* fix conda creation suffix and vllm install syntax

1 parent bc77032 · commit 3d78d44


blackwell/README.md

Lines changed: 114 additions & 24 deletions

# Unsloth Blackwell Compatibility

## Overview

`Blackwell` (`sm100+`) requires all dependent libraries to be compiled with `CUDA 12.8`.

The core libs for running unsloth that depend on the `CUDA` version are listed below (a quick environment check follows the list):
- `bitsandbytes` - already has wheels built with `CUDA 12.8`, so `pip install` should work out of the box
- `triton` - requires `triton>=3.3.1`
- `torch` - requires installing with `pip install torch --extra-index-url https://download.pytorch.org/whl/cu128`
- `vllm` - safest is to use the nightly build: `uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly`
- `xformers` - as of 6/26, `xformers` wheels are not yet built with `sm100+` enabled, as support was only recently [added](https://github.com/facebookresearch/xformers/commit/d9b3b6e2b38ca485c89507ef8ac1fbef2723cdfa), so a source build is required (see below).
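
To confirm whether your current environment already targets Blackwell, here is a quick check (assuming `torch` is installed; a correct `cu128` setup reports CUDA `12.8`, and RTX 50-series cards report compute capability `(12, 0)`):

```bash
# Print the CUDA version torch was built against and the GPU's compute capability.
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability())"
```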

## Installation

### Using uv

The installation order is important, since we want to overwrite bundled dependencies with specific versions (namely, `xformers` and `triton`).

1) I prefer to use `uv` over `pip` as it's faster and better at resolving dependencies, especially for libraries which depend on `torch` but require a specific `CUDA` version, as in this scenario.

Install `uv`:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env
```

Create a project dir and venv:

```bash
mkdir unsloth-blackwell && cd unsloth-blackwell
uv venv .venv --python=3.12 --seed
source .venv/bin/activate
```
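
Optionally confirm that the venv's interpreter is the one now on `PATH`:

```bash
# Both should point into unsloth-blackwell/.venv
which python && python --version
```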

2) Install `vllm`

```bash
uv pip install -U vllm --torch-backend=cu128 --extra-index-url https://wheels.vllm.ai/nightly
```

Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` built against `cu126`.
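
After this step it's worth confirming which wheel was actually pulled in; the `+cu128` suffix in the version string is what you want to see:

```bash
# The version string encodes the CUDA toolkit the wheel was built with,
# e.g. "2.7.0+cu128" rather than "2.7.0+cu126".
python -c "import torch; print(torch.__version__)"
```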

3) Install `unsloth` dependencies

```bash
uv pip install unsloth unsloth_zoo bitsandbytes
```
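
A minimal import check; `bitsandbytes` is the piece most sensitive to the CUDA toolkit, so it makes a useful canary:

```bash
# Fails loudly if the bitsandbytes CUDA binaries don't match the installed toolkit
python -c "import bitsandbytes; print(bitsandbytes.__version__)"
```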

4) Download and build `xformers`

```bash
# First uninstall the xformers installed by previous libraries
uv pip uninstall xformers

# Clone and build
git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
cd xformers
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py install
```

Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST=12.0` so the kernels are compiled for Blackwell's compute capability.
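
If the build succeeded, `xformers` ships a diagnostic entry point that reports the installed version and which ops/kernels are available:

```bash
# Lists the xformers version plus which kernels were built and enabled
python -m xformers.info
```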

5) Update `triton`

```bash
uv pip install -U "triton>=3.3.1"
```

`triton>=3.3.1` is required for `Blackwell` support. (The quotes keep the shell from treating `>=` as a redirection.)
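
Quick version check:

```bash
python -c "import triton; print(triton.__version__)"  # expect >= 3.3.1
```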

6) `transformers`

`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically [switch off caching](https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/modeling_layers.py#L52-L59).

When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this results in a mismatch between expected and actual outputs [here](https://github.com/unslothai/unsloth/blob/bfa6a3678e2fb8097c5ece41d095a8051f099db3/unsloth/models/llama.py#L939).

A temporary solution is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0`, or stick with `4.52.4` for now:

```bash
uv pip install -U transformers==4.52.4
```
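
A minimal sketch of the `4.53.0` workaround, with a hypothetical model name and prompt; `disable_gradient_checkpointing()` is the call suggested in the note above:

```bash
python - <<'PY'
from unsloth import FastLanguageModel

# Hypothetical example model; substitute whatever you just trained.
model, tokenizer = FastLanguageModel.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

# ... training with gradient checkpointing enabled would happen here ...

model.disable_gradient_checkpointing()   # the workaround from the note above
FastLanguageModel.for_inference(model)   # unsloth's inference mode
inputs = tokenizer("Hello, Blackwell!", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16, use_cache=True)[0]))
PY
```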

### Using conda or mamba

1) Install `conda/mamba`

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
```

Run the installation script:

```bash
bash Miniforge3-$(uname)-$(uname -m).sh
```

Create a conda or mamba environment:

```bash
conda create --name unsloth-blackwell python==3.12 -y
```

Activate the newly created environment:

```bash
conda activate unsloth-blackwell
```
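
To double-check which environment is active before proceeding (the active one is marked with `*`):

```bash
conda env list
```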

2) Install `vllm`

Make sure you are inside the activated conda/mamba environment; you should see the environment name as a prefix in your shell prompt, like `(unsloth-blackwell) user@machine:`.

```bash
pip install -U vllm --extra-index-url https://download.pytorch.org/whl/cu128 --extra-index-url https://wheels.vllm.ai/nightly
```

Note that we have to specify `cu128`; otherwise `vllm` will install `torch==2.7.0` built against `cu126`.

3) Install `unsloth` dependencies

Make sure the conda/mamba environment is still active (prompt prefixed with `(unsloth-blackwell)`).

```bash
pip install unsloth unsloth_zoo bitsandbytes
```

4) Download and build `xformers`

Make sure the conda/mamba environment is still active (prompt prefixed with `(unsloth-blackwell)`).

```bash
# First uninstall the xformers installed by previous libraries
pip uninstall -y xformers

# Clone and build
git clone --depth=1 https://github.com/facebookresearch/xformers --recursive
cd xformers
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py install
```

Note that we have to explicitly set `TORCH_CUDA_ARCH_LIST=12.0` so the kernels are compiled for Blackwell's compute capability.

5) Update `triton`

Make sure the conda/mamba environment is still active (prompt prefixed with `(unsloth-blackwell)`).

```bash
pip install -U "triton>=3.3.1"
```

`triton>=3.3.1` is required for `Blackwell` support.

6) `transformers`

`transformers >= 4.53.0` breaks `unsloth` inference. Specifically, `transformers` with `gradient_checkpointing` enabled will automatically [switch off caching](https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/modeling_layers.py#L52-L59).

When using `unsloth` `FastLanguageModel` to `generate` directly after training with `use_cache=True`, this results in a mismatch between expected and actual outputs [here](https://github.com/unslothai/unsloth/blob/bfa6a3678e2fb8097c5ece41d095a8051f099db3/unsloth/models/llama.py#L939).

A temporary solution is to switch off `gradient_checkpointing` (e.g., `model.disable_gradient_checkpointing()`) before generation if using `4.53.0`, or stick with `4.52.4` for now. Make sure the conda/mamba environment is still active (prompt prefixed with `(unsloth-blackwell)`), then:

```bash
pip install -U transformers==4.52.4
```

If you are using mamba as your package manager, just replace `conda` with `mamba` in all of the commands shown above.

## Post-installation notes

After installation, your environment should look similar to `blackwell.requirements.txt`.

Note: you might need to downgrade `numpy<=2.2` after all the installs.
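
If you do hit `numpy`-related errors, the downgrade is one command (shown with `uv pip`; use plain `pip` in the conda/mamba setup):

```bash
uv pip install "numpy<=2.2"
```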

## Test

Both `test_llama32_sft.py` and `test_qwen3_grpo.py` should run without issue if the install is correct. If not, check the diff between your installed environment and `blackwell.requirements.txt`.
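
One way to do that comparison (a sketch; it assumes `blackwell.requirements.txt` from this directory is your reference):

```bash
# Dump the active environment and diff it against the reference requirements
uv pip freeze > my-env.txt
diff <(sort my-env.txt) <(sort blackwell.requirements.txt)
```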
