
support release pipeline #3069

Merged: 12 commits into InternLM:main on Feb 13, 2025

Conversation

@irexyc (Collaborator) commented Jan 21, 2025

Motivation

#3055
#2498

Currently only the turbomind backend is supported; pytorch backend support should be added after #3051.

from lmdeploy import pipeline

pipe = pipeline('/nvme/shared/vicuna-7b-v1.5/')
pipe('hello')
pipe.close()

# for the pytorch backend & VL models, you need to invoke the functions below manually
import gc
import torch
gc.collect()
torch.cuda.empty_cache()

from lmdeploy import pipeline

with pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
    pipe('hello')

# for the pytorch backend & VL models, you need to invoke the functions below manually
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
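
A minimal sketch of how the close-and-cleanup steps above could be bundled into one helper. managed_pipeline is a hypothetical name introduced here for illustration, not part of the lmdeploy API; it just combines the calls shown in the snippets.

import gc
from contextlib import contextmanager

import torch

from lmdeploy import pipeline


@contextmanager
def managed_pipeline(model_path):
    # hypothetical helper: yields a pipeline and, on exit, closes it and runs
    # the extra cleanup needed for the pytorch backend & VL models
    pipe = pipeline(model_path)
    try:
        yield pipe
    finally:
        pipe.close()
        gc.collect()
        torch.cuda.empty_cache()


with managed_pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
    pipe('hello')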

There are two other issues that do not affect usage:

1. pipe.close() will cancel all async tasks but can't remove their done callbacks. The callback running in the internal thread will raise an error like the one below, though this does not block the main thread.

exception calling callback for <Future at 0x7fe1541b6080 state=cancelled>
Traceback (most recent call last):
  File "/home/chenxin/miniconda3/envs/310/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/home/chenxin/ws/lmdeploy/lmdeploy/serve/async_engine.py", line 440, in <lambda>
    asyncio.run_coroutine_threadsafe(_infer(), loop).add_done_callback(lambda x: x.result())
  File "/home/chenxin/miniconda3/envs/310/lib/python3.10/concurrent/futures/_base.py", line 449, in result
    raise CancelledError()
concurrent.futures._base.CancelledError
  2. pipe.close() won't clear all CUDA resources. However, if we repeatedly create and close the pipeline, the CUDA memory usage stays within a small bound (see the sketch below).
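
A minimal sketch of how the claim in item 2 could be checked, assuming the same vicuna checkpoint as above and a CUDA-capable machine; the loop count and the MiB reporting are illustrative, not part of the PR.

import gc

import torch

from lmdeploy import pipeline

for i in range(3):
    with pipeline('/nvme/shared/vicuna-7b-v1.5/') as pipe:
        pipe('hello')
    # extra cleanup for the pytorch backend & VL models, as noted above
    gc.collect()
    torch.cuda.empty_cache()
    # allocated memory should stay roughly flat across iterations rather than grow
    print(f'iteration {i}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB still allocated')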

@irexyc mentioned this pull request Jan 21, 2025
@lvhan028 added the enhancement (New feature or request) label Jan 21, 2025
@lvhan028 requested a review from grimoire February 12, 2025 07:56
@lvhan028 requested a review from lzhangzz February 12, 2025 13:22
@lvhan028 merged commit eb89e6a into InternLM:main Feb 13, 2025
9 checks passed
@lvhan028 (Collaborator) commented Feb 13, 2025

@zhulinJulia24 may add TC for this PR

tastelikefeet added a commit to tastelikefeet/lmdeploy that referenced this pull request Feb 17, 2025
* main: (90 commits)
  Fix cogvlm and phi3vision (InternLM#3137)
  support release pipeline (InternLM#3069)
  [ci] fix some fail in daily testcase (InternLM#3134)
  Fix internvl2.5 error after eviction (InternLM#3122)
  fix UT of deepseek chat template (InternLM#3125)
  Update benchmark script and user guide (InternLM#3110)
  bump version to v0.7.0.post3 (InternLM#3115)
  fix postional argument (InternLM#3086)
  remove logitswarper (InternLM#3109)
  [Fix] fix the URL judgment problem in Windows (InternLM#3103)
  fix user guide about cogvlm deployment (InternLM#3088)
  add option max-concurrent-requests for api_server(InternLM#2961)
  bump version to v0.7.0.post2 (InternLM#3094)
  Fix xcomposer2d5 (InternLM#3087)
  Add system role to deepseek chat template (InternLM#3031)
  Update tokenizer (InternLM#3061)
  Add deepseek-r1 chat template (InternLM#3072)
  bump version to v0.7.0.post1 (InternLM#3076)
  More arguments in api_client, update docstrings (InternLM#3077)
  fix sliding window mgr (InternLM#3068)
  ...

# Conflicts:
#	lmdeploy/turbomind/turbomind.py
Labels: enhancement (New feature or request)
Projects: None yet
4 participants