
support release pipeline #3069

Merged: 12 commits into InternLM:main on Feb 13, 2025

Conversation

@irexyc (Collaborator) commented Jan 21, 2025

Motivation

#3055
#2498

Currently only the turbomind backend is supported; pytorch backend support should be added after #3051.

from lmdeploy import pipeline

pipe = pipeline('/nvme/shared/vicuna-7b-v1.5/')
pipe('hello')
pipe.close()

# for the pytorch backend & VL models, you need to invoke the functions below manually
import gc
import torch
gc.collect()
torch.cuda.empty_cache()

from lmdeploy import pipeline

with pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
    pipe('hello')

# for the pytorch backend & VL models, you need to invoke the functions below manually
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
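
A minimal sketch of how the close-and-cleanup steps above could be bundled into one helper. managed_pipeline is a hypothetical name introduced here for illustration, not part of the lmdeploy API; it just combines the calls shown in the snippets.

import gc
from contextlib import contextmanager

import torch

from lmdeploy import pipeline


@contextmanager
def managed_pipeline(model_path):
    # hypothetical helper: yields a pipeline and, on exit, closes it and runs
    # the extra cleanup needed for the pytorch backend & VL models
    pipe = pipeline(model_path)
    try:
        yield pipe
    finally:
        pipe.close()
        gc.collect()
        torch.cuda.empty_cache()


with managed_pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
    pipe('hello')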

There are two other issues that do not affect usage:

1. pipe.close() will cancel all async tasks but can't remove their done callbacks. The callback running in the internal thread will raise an error like the one below, though this does not block the main thread.

exception calling callback for <Future at 0x7fe1541b6080 state=cancelled>
Traceback (most recent call last):
  File "/home/chenxin/miniconda3/envs/310/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/home/chenxin/ws/lmdeploy/lmdeploy/serve/async_engine.py", line 440, in <lambda>
    asyncio.run_coroutine_threadsafe(_infer(), loop).add_done_callback(lambda x: x.result())
  File "/home/chenxin/miniconda3/envs/310/lib/python3.10/concurrent/futures/_base.py", line 449, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
  2. pipe.close() won't clear all CUDA resources. However, if we repeatedly create and close the pipeline, the CUDA memory usage stays within a small bound (see the sketch below).
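
A minimal sketch of how the claim in item 2 could be checked, assuming the same vicuna checkpoint as above and a CUDA-capable machine; the loop count and the MiB reporting are illustrative, not part of the PR.

import gc

import torch

from lmdeploy import pipeline

for i in range(3):
    with pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
        pipe('hello')
    # extra cleanup for the pytorch backend & VL models, as noted above
    gc.collect()
    torch.cuda.empty_cache()
    # allocated memory should stay roughly flat across iterations rather than grow
    print(f'iteration {i}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB still allocated')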

@irexyc mentioned this pull request Jan 21, 2025
@lvhan028 added the enhancement (New feature or request) label Jan 21, 2025
@lvhan028 requested a review from grimoire February 12, 2025 07:56
@lvhan028 requested a review from lzhangzz February 12, 2025 13:22
@lvhan028 merged commit eb89e6a into InternLM:main Feb 13, 2025
9 checks passed
@lvhan028 (Collaborator) commented Feb 13, 2025

@zhulinJulia24 may add TC for this PR

tastelikefeet added a commit to tastelikefeet/lmdeploy that referenced this pull request Feb 17, 2025
* main: (90 commits)
  Fix cogvlm and phi3vision (InternLM#3137)
  support release pipeline (InternLM#3069)
  [ci] fix some fail in daily testcase (InternLM#3134)
  Fix internvl2.5 error after eviction (InternLM#3122)
  fix UT of deepseek chat template (InternLM#3125)
  Update benchmark script and user guide (InternLM#3110)
  bump version to v0.7.0.post3 (InternLM#3115)
  fix postional argument (InternLM#3086)
  remove logitswarper (InternLM#3109)
  [Fix] fix the URL judgment problem in Windows (InternLM#3103)
  fix user guide about cogvlm deployment (InternLM#3088)
  add option max-concurrent-requests for api_server(InternLM#2961)
  bump version to v0.7.0.post2 (InternLM#3094)
  Fix xcomposer2d5 (InternLM#3087)
  Add system role to deepseek chat template (InternLM#3031)
  Update tokenizer (InternLM#3061)
  Add deepseek-r1 chat template (InternLM#3072)
  bump version to v0.7.0.post1 (InternLM#3076)
  More arguments in api_client, update docstrings (InternLM#3077)
  fix sliding window mgr (InternLM#3068)
  ...

# Conflicts:
#	lmdeploy/turbomind/turbomind.py
Labels: enhancement (New feature or request)
Projects: None yet
4 participants