[fix] fix vl gradio, use pipeline api and remove interactive chat #3136

irexyc · 2025-02-12T08:54:44Z

Motivation

Use the pipeline API and remove interactive chat to support VL models like qwen2-vl.

Currently, with the PyTorch backend, the generator does not seem to be interrupted when a stop request is sent. @grimoire

# reproduction code

import asyncio
from lmdeploy import pipeline, PytorchEngineConfig
pipe = pipeline('/mnt/141/vicuna-7b-v1.5/', backend_config=PytorchEngineConfig())

event = asyncio.Event()
messages = '写一篇高考作文'  # "Write a gaokao (college entrance exam) essay"
session_id = 0

async def inference():
    step = 0
    async for out in pipe.generate(messages=messages, session_id=session_id):
        print(step, out)
        step += 1
        if (step == 5):
            event.set()
    print('inference done')


async def stop():
    await event.wait()
    await pipe.stop_session(session_id=session_id)


async def main():
    await asyncio.gather(inference(), stop())


if __name__ == '__main__':
    asyncio.run(main())

grimoire · 2025-02-12T10:30:29Z

The asyncio.Event should be created inside a coroutine to avoid a "different event loop" error.
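As a minimal sketch of that point, the event can be constructed inside `main()` so it belongs to the loop started by `asyncio.run`. Here `fake_generate` is a hypothetical stand-in for `pipe.generate` (so the snippet runs without a model), and `stop` only records that it was triggered instead of calling `pipe.stop_session`. As far as I know the mismatch mainly affects older Python versions, where `asyncio.Event()` bound itself to an event loop at construction time.

```python
import asyncio


async def fake_generate(n_steps):
    """Hypothetical stand-in for pipe.generate: yields one item per step."""
    for step in range(n_steps):
        await asyncio.sleep(0)  # give other coroutines a chance to run
        yield step


async def inference(event, seen):
    async for out in fake_generate(20):
        seen.append(out)
        if len(seen) == 5:
            event.set()  # signal the stop coroutine after five outputs


async def stop(event, log):
    await event.wait()
    # The real reproduction would await pipe.stop_session(...) here.
    log.append('stop requested')


async def main():
    # Created inside a coroutine, so it is tied to the running loop.
    event = asyncio.Event()
    seen, log = [], []
    await asyncio.gather(inference(event, seen), stop(event, log))
    return seen, log
```

Running it with `asyncio.run(main())` keeps the event and both coroutines on the same loop.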

The PyTorch engine assumes the user will break out of the generate loop themselves. If an automatic break is required, you can change _on_stop_session:

lmdeploy/lmdeploy/pytorch/engine/engine.py

Line 268 in e91ccf0

def _on_stop_session(self, reqs: Request, **kwargs):

to

    def _on_stop_session(self, reqs: Request, **kwargs):
        """on stop session callback."""
        for req in reqs:
            session_id = req.data['session_id']
            resp = req.data.get('response', True)
            resp_type = ResponseType.SESSION_NOT_EXIST
            if session_id in self.scheduler.sessions:
                self.scheduler.stop_session(session_id)
                session = self.scheduler.sessions[session_id]
                for seq in session.sequences.values():
                    # Use a separate name so the outer `resp` flag is not shadowed.
                    seq_resp: Response = getattr(seq, 'resp', None)
                    if seq_resp is not None:
                        seq_resp.type = ResponseType.FINISH
                        self.req_manager.response(seq_resp)
                resp_type = ResponseType.SUCCESS
            if resp:
                self._response(req.resp, resp_type)
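For the other option mentioned in this comment, where the caller breaks out of the generate loop on its own, a rough sketch could look like the following. `fake_generate` is a hypothetical endless generator standing in for `pipe.generate`, and the stop flag is a plain `asyncio.Event`; none of this is lmdeploy API.

```python
import asyncio


async def fake_generate():
    """Hypothetical stand-in for pipe.generate that never finishes by itself."""
    step = 0
    while True:
        await asyncio.sleep(0)
        yield step
        step += 1


async def inference(stop_event, outputs):
    gen = fake_generate()
    try:
        async for out in gen:
            outputs.append(out)
            if stop_event.is_set():
                break  # the caller ends the loop instead of the engine
    finally:
        await gen.aclose()  # release the generator promptly


async def stopper(stop_event, delay):
    await asyncio.sleep(delay)
    stop_event.set()


async def main():
    stop_event = asyncio.Event()
    outputs = []
    await asyncio.gather(inference(stop_event, outputs), stopper(stop_event, 0.01))
    return outputs
```

In the real reproduction the `break` would sit next to the `pipe.stop_session` call, so the consumer does not keep waiting for outputs after the stop request.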

irexyc · 2025-02-13T02:33:10Z

@grimoire
After applying this modification, the program can still occasionally hang.

The following code reproduces it more reliably:

import asyncio
import random

from lmdeploy import pipeline, PytorchEngineConfig

messages = '写一篇高考作文'  # "Write a gaokao (college entrance exam) essay"
pipe = pipeline('/mnt/141/vicuna-7b-v1.5/', backend_config=PytorchEngineConfig())


async def inference(session_id):
    async for out in pipe.generate(messages=messages, session_id=session_id):
        pass
    print('inference done')


async def timer(event):
    dur = 1 + random.random() * 2
    await asyncio.sleep(dur)
    event.set()


async def stop(event, session_id):
    await event.wait()
    await pipe.stop_session(session_id=session_id)
    print('stop session')


async def main():
    session_id = 0
    event = asyncio.Event()
    while True:
        await asyncio.gather(inference(session_id), timer(event), stop(event, session_id))
        print(f'session={session_id} done\n')
        session_id += 1
        event.clear()


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

grimoire · 2025-02-13T02:59:04Z

lmdeploy/lmdeploy/pytorch/engine/engine.py

Line 236 in 188f069

    
def _response(self, resp: Response, resp_type: ResponseType, data: Any = None, err_msg: str = ''):

    def _response(self, resp: Response, resp_type: ResponseType, data: Any = None, err_msg: str = ''):
        """response."""
        if resp.type == ResponseType.FINISH:
            return
        resp.type = resp_type
        resp.data = data
        resp.err_msg = err_msg
        self.req_manager.response(resp)
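The early return above reads as a guard against overwriting a response that has already been marked FINISH. The sketch below illustrates that behavior in isolation; `ResponseType`, `Response`, and `FakeEngine` are simplified, hypothetical stand-ins and not the lmdeploy classes.

```python
import enum
from dataclasses import dataclass
from typing import Any, Optional


class ResponseType(enum.Enum):
    """Hypothetical subset of the engine's response types."""
    SUCCESS = enum.auto()
    FINISH = enum.auto()


@dataclass
class Response:
    type: Optional[ResponseType] = None
    data: Any = None
    err_msg: str = ''


class FakeEngine:
    """Records every response that would be forwarded to the request manager."""

    def __init__(self):
        self.sent = []

    def _response(self, resp, resp_type, data=None, err_msg=''):
        if resp.type == ResponseType.FINISH:
            return  # already finished: do not overwrite or send again
        resp.type = resp_type
        resp.data = data
        resp.err_msg = err_msg
        self.sent.append(resp_type)


engine = FakeEngine()
resp = Response()
engine._response(resp, ResponseType.SUCCESS)
engine._response(resp, ResponseType.FINISH)
engine._response(resp, ResponseType.SUCCESS)  # ignored by the guard
```

After these three calls only the first two are recorded, and `resp.type` stays FINISH.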

zhulinJulia24 · 2025-02-13T05:47:44Z

It works on gradio!

zhulinJulia24

lgtm

…oad_state_dict

* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)

use pipeline api and remove interactive chat

8d361bb

lvhan028 added the Bug:P1 label Feb 12, 2025

lvhan028 requested review from grimoire and zhulinJulia24 February 12, 2025 13:24

support cancel request with pytorch backend

65ad2a1

zhulinJulia24 approved these changes Feb 13, 2025

grimoire approved these changes Feb 17, 2025

lvhan028 merged commit 28f3027 into InternLM:main Feb 17, 2025
5 checks passed
