
[feature] add dlinfer w8a8 support. #2988

Merged: 2 commits into InternLM:main from the zhousl/w8a8 branch on Feb 17, 2025

Conversation
Reinerzhou (Contributor)

Motivation

Add W8A8 (8-bit weight, 8-bit activation) quantization support to the dlinfer backend.
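For context, here is a minimal sketch of how a W8A8-quantized model might be served through lmdeploy's PyTorch engine on a dlinfer-backed device. The model path is hypothetical, and `device_type='ascend'` is assumed here as one dlinfer-backed target; check the lmdeploy documentation for the device types and quantization workflow supported by your build.

```python
# Hypothetical usage sketch: serving a W8A8-quantized model via the
# PyTorch engine on a dlinfer-backed device. The model path and
# device_type below are illustrative assumptions, not from this PR.
from lmdeploy import pipeline, PytorchEngineConfig

# 'ascend' is assumed as a dlinfer-backed device type for this sketch.
backend_config = PytorchEngineConfig(device_type='ascend')

# The model is assumed to have been quantized to W8A8 beforehand
# (e.g. with the `lmdeploy lite smooth_quant` CLI).
pipe = pipeline('path/to/your-model-w8a8', backend_config=backend_config)
print(pipe(['Hello, world!']))
```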

@Reinerzhou Reinerzhou marked this pull request as draft January 4, 2025 08:23
@Reinerzhou Reinerzhou changed the title from "[dlinfer] add dlinfer w8a8 support." to "[feature] add dlinfer w8a8 support." Jan 4, 2025
@Reinerzhou Reinerzhou force-pushed the zhousl/w8a8 branch 2 times, most recently from d123900 to fa0a742 Compare January 13, 2025 06:34
@Reinerzhou Reinerzhou force-pushed the zhousl/w8a8 branch 4 times, most recently from 4850707 to cd88dd1 Compare February 14, 2025 07:04
@jinminxi104 jinminxi104 marked this pull request as ready for review February 17, 2025 02:37
jinminxi104 (Collaborator)

Please resolve the conflict.

@Reinerzhou Reinerzhou force-pushed the zhousl/w8a8 branch 2 times, most recently from 9049a75 to aae769e Compare February 17, 2025 02:58
@lvhan028 lvhan028 requested a review from AllentDan February 17, 2025 06:23
@lvhan028 lvhan028 added the enhancement New feature or request label Feb 17, 2025
@lvhan028 lvhan028 merged commit c64573b into InternLM:main Feb 17, 2025
5 checks passed
@Reinerzhou Reinerzhou deleted the zhousl/w8a8 branch February 20, 2025 08:22
tastelikefeet added a commit to tastelikefeet/lmdeploy that referenced this pull request Feb 25, 2025
…oad_state_dict

* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)
Labels: enhancement (New feature or request)
Projects: none yet
Participants: 4