Merged
Commits
73 commits
833e147
[WIP] use vLLM for vision language models
Datta0 Jul 6, 2025
bad1692
Streamline vision vllm settings
Datta0 Jul 7, 2025
985e81c
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Jul 10, 2025
5883e13
WIP
Datta0 Jul 10, 2025
d23e378
WIP vLLM VLM
Datta0 Jul 10, 2025
beba3ae
Make individual dummy model for qwen 2.5vl, llama3.2,
Datta0 Jul 12, 2025
d124be6
fixup norm for vLLM
Datta0 Jul 13, 2025
7abcb47
rework vLLM for VLMs
Datta0 Jul 13, 2025
11e3ff0
Cleanup more stuff
Datta0 Jul 13, 2025
b043e73
Load up remaining modules from state dict
Datta0 Jul 14, 2025
125597e
use get_state_dict when possible
Datta0 Jul 14, 2025
500fc02
Fixup lm_head state dict fetch
Datta0 Jul 15, 2025
fab2ba0
add is_vision flag for differentiating VLMs
Datta0 Jul 15, 2025
9d0a7e2
add is_vision_model flag
Datta0 Jul 15, 2025
3a72f8f
Cleanup more stuff
Datta0 Jul 15, 2025
872127f
Cleanup vLLM extraction
Datta0 Jul 15, 2025
090df5d
Fixup device type
Datta0 Jul 15, 2025
c1b57fd
Cleanup more stuff
Datta0 Jul 15, 2025
27e8b18
revert vLLM mem usage calc changes
Datta0 Jul 15, 2025
60d3a9c
Populate config values properly for VLMs
Datta0 Jul 16, 2025
4b054b8
cleaner attribute copy and check mechanism
Datta0 Jul 17, 2025
e021682
Patch siglip empty init
Datta0 Jul 17, 2025
4e91e37
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Jul 17, 2025
6d5f448
Make additional module loading memory efficient
Datta0 Jul 17, 2025
e720866
Let the mini models be really small
Datta0 Jul 17, 2025
544cf2e
Minor cleanup
Datta0 Jul 17, 2025
b466139
cleanup vllm_utils by moving out empty model creation
Datta0 Jul 18, 2025
b5e8d63
Gemma3 and CausalLM fixes
Datta0 Jul 18, 2025
e7e4279
Merge branch 'main' into vlm_fast_infer
Datta0 Jul 18, 2025
f4cc238
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Jul 20, 2025
c230e6d
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Jul 21, 2025
9838d9b
Respect vLLMs conditions of max_num_batch_tokens vs max_seq_len
Datta0 Jul 21, 2025
638bd10
Restrict mm per prompt and max batch tokens
Datta0 Jul 22, 2025
de91982
Improve config copy overs
Datta0 Jul 22, 2025
64f40f3
Falcon H1 training in fp16 is unstable with the mamba kernels. NaN's …
mmathew23 Jul 22, 2025
292f6f7
Fix torch compile issues (#213)
danielhanchen Jul 23, 2025
1064908
Small fix
danielhanchen Jul 23, 2025
8fa08ed
fixup norms for causallm
Datta0 Jul 24, 2025
1b493e8
Guard against args change
Datta0 Jul 25, 2025
8fd2e4d
dont mark as grpo hidden states as dynamic
Datta0 Jul 28, 2025
deb2e0a
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Jul 29, 2025
34da39f
Refactor to make vision handling easier
Datta0 Aug 11, 2025
9a467f2
[WIP] fixup llama vision
Datta0 Aug 11, 2025
7d4db12
cleanup
Datta0 Aug 11, 2025
e0ebcc4
2/n mllama
Datta0 Aug 11, 2025
0de564f
fixup mllama additional layers
Datta0 Aug 12, 2025
bb6243f
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Aug 18, 2025
fa47fdf
Fixup qwen qknorm
Datta0 Aug 18, 2025
c2da34a
Pad token check and state dict changes
Datta0 Aug 18, 2025
fa93268
Patch TF protobuf incompatability
Datta0 Aug 19, 2025
b71e4f5
Revert "Patch TF protobuf incompatability"
Datta0 Aug 19, 2025
af94f0c
Fixup patch_model_and_tokenizer for VLM
Datta0 Aug 19, 2025
e580d66
reset vllm state dict changes
Datta0 Aug 19, 2025
2c52a23
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Aug 23, 2025
28aae16
Cleanup logs
Datta0 Aug 27, 2025
85b26f3
Fixup gemma3 local rope embedding
Datta0 Aug 29, 2025
540c3d4
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Aug 30, 2025
c3d3ac9
Fix Qwen 2.5 VL gate_up_proj vLLM
Datta0 Aug 30, 2025
8c1034a
Wakeup before doing vLLM generate (#259)
Datta0 Sep 1, 2025
538ba0c
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Sep 5, 2025
cca9e16
use logger instead of print. Add license header
Datta0 Sep 8, 2025
6cfb2c9
Increase gpu_emmory_utilisation if in standby
Datta0 Sep 8, 2025
e078cf0
User friendly error message for sleep model with expandable segments
Datta0 Sep 8, 2025
41c7d41
Fixup cumem init for older versions
Datta0 Sep 9, 2025
be75f93
Merge remote-tracking branch 'origin/main' into vlm_fast_infer
Datta0 Sep 9, 2025
19519f1
fixup qwen vl vision rope
Datta0 Sep 9, 2025
b0a1081
do not slice logits for grpo
Datta0 Sep 9, 2025
cfae834
undo changes to rl_replacements
Datta0 Sep 9, 2025
f55abbe
Fix: (temporary workaround) mem usage calcl for quantized VLMs
Datta0 Sep 10, 2025
f6ed07d
fixup comparison attributes
Datta0 Sep 16, 2025
2b67460
Merge remote-tracking branch 'origin' into vlm_fast_infer
Datta0 Sep 16, 2025
ae65c51
compare and copy dtype
Datta0 Sep 16, 2025
39bddeb
Copy buffers along with comparable attributes
Datta0 Sep 16, 2025
Increase gpu_emmory_utilisation if in standby
Datta0 committed Sep 8, 2025
commit 6cfb2c96b22ffd5422ce5d5c49d6cf7a6a84f794
3 changes: 3 additions & 0 deletions unsloth_zoo/vllm_utils.py
@@ -1335,6 +1335,9 @@ def load_vllm(
     assert(conservativeness >= 0.0 and conservativeness <= 1.0)
 
     unsloth_vllm_standby = unsloth_vllm_standby or (os.getenv("UNSLOTH_VLLM_STANDBY", "0") != "0")
+    if unsloth_vllm_standby and gpu_memory_utilization < 0.9:
+        gpu_memory_utilization = 0.9
+        logger.info("Unsloth: Standby mode is enabled. Increasing `gpu_memory_utilization` to 0.9.")
 
     if DEVICE_TYPE == "cuda":
         major_version, minor_version = torch.cuda.get_device_capability()
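
For context, the hunk above floors `gpu_memory_utilization` at 0.9 whenever standby mode is requested, either through the `unsloth_vllm_standby` argument or the `UNSLOTH_VLLM_STANDBY` environment variable. Below is a minimal standalone sketch of that logic; the helper name `resolve_gpu_memory_utilization` is hypothetical, and only the argument names, the environment variable, and the 0.9 floor come from the diff.

import os
import logging

logger = logging.getLogger("unsloth")

def resolve_gpu_memory_utilization(gpu_memory_utilization: float,
                                    unsloth_vllm_standby: bool = False) -> float:
    # Hypothetical helper mirroring the check added to load_vllm.
    # Standby can be requested via the argument or the env var.
    unsloth_vllm_standby = unsloth_vllm_standby or (os.getenv("UNSLOTH_VLLM_STANDBY", "0") != "0")
    if unsloth_vllm_standby and gpu_memory_utilization < 0.9:
        gpu_memory_utilization = 0.9
        logger.info("Unsloth: Standby mode is enabled. Increasing `gpu_memory_utilization` to 0.9.")
    return gpu_memory_utilization

# With UNSLOTH_VLLM_STANDBY=1 in the environment, a request for 0.6 becomes 0.9.
print(resolve_gpu_memory_utilization(0.6))

Presumably the higher floor is safe because standby (sleep) mode releases vLLM's GPU memory between generate calls (see the earlier "Wakeup before doing vLLM generate" commit), so a larger share of the device can be reserved for vLLM while it is active.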