support eos_token list in turbomind #3044

Merged: 17 commits into InternLM:main on Feb 17, 2025
Conversation

@irexyc (Collaborator) commented on Jan 16, 2025

Motivation

MinLengthLogitsProcessor in transformers supports a list of eos_token_id, so turbomind's min-length penalty should likewise accept a list of stop tokens rather than a single one.
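
For context, a minimal sketch (not from this PR) of the transformers behavior being matched: MinLengthLogitsProcessor accepts a list of eos_token_id and masks every listed token until the sequence reaches min_length. Exact return behavior may vary across transformers versions.

import torch
from transformers import MinLengthLogitsProcessor

# Both eos ids are suppressed while the current length is below min_length.
processor = MinLengthLogitsProcessor(min_length=10, eos_token_id=[2, 7])
input_ids = torch.ones(1, 3, dtype=torch.long)  # current length 3 < 10
scores = torch.zeros(1, 8)                      # dummy logits, vocab size 8
out = processor(input_ids, scores)
print(out[0, 2], out[0, 7])                     # -inf for both eos ids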

Modification

Extend turbomind's min-length penalty to handle a list of stop tokens. To benchmark the kernel, profile_throughput.py is patched so the penalty runs on every decoding step with a 16-token stop list:

diff --git a/benchmark/profile_throughput.py b/benchmark/profile_throughput.py
index 2e4d2a3b..9fc9605f 100644
--- a/benchmark/profile_throughput.py
+++ b/benchmark/profile_throughput.py
@@ -108,6 +108,8 @@ class Engine:
                 session_id,
                 input_ids=input_ids,
                 gen_config=GenerationConfig(max_new_tokens=output_seqlen,
+                                            min_new_tokens=output_seqlen - 1,
+                                            stop_token_ids=[9051, 90306, 96144, 49577, 37966, 99950, 28860, 87040, 43814, 24921, 98583, 49449, 85357, 67608, 69612, 59268],
                                             temperature=temperature,
                                             top_p=top_p,
                                             top_k=top_k,
Profiling command:

FT_NVTX=ON /mnt/141/2024.5.1/target-linux-x64/nsys profile -t cuda,nvtx,osrt,cudnn,cublas -o output -f true --stats true python ../benchmark/profile_throughput.py -n 20000 /home/chenxin/ShareGPT_V3_unfiltered_cleaned_split.json /home/chenxin/Llama-3.2-1B-Instruct


nsys stats, original kernel (single eos token):

  Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name
  --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
       0.0         45986089      16761    2743.6    2656.0      2272      3840        276.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, const int *, int,…

Modified kernel with stop-token lists of size 1-16:

(sz=1)   0.0         45815667      16808    2725.8    2624.0      2272      3904        262.1  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=2)   0.0         49642108      16901    2937.2    2848.0      2304      4064        259.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=4)   0.0         54264699      16838    3222.8    3136.0      2304      4448        273.1  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=8)   0.0         55234255      16885    3271.2    3168.0      2304      4608        270.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=16)  0.0         55683921      16858    3303.1    3200.0      2304      5184        266.5  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
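
Per-call kernel time grows modestly with the stop-list size (about 2726 ns at sz=1 to 3303 ns at sz=16). For completeness, a hedged usage sketch of the feature through lmdeploy's Python API (not part of this PR's diff; the token ids are the arbitrary ones used in the benchmark):

from lmdeploy import GenerationConfig, pipeline

# Model path reused from the profiling command above.
pipe = pipeline('/home/chenxin/Llama-3.2-1B-Instruct')
gen_config = GenerationConfig(
    max_new_tokens=128,
    min_new_tokens=8,              # min-length penalty masks every stop token until 8 new tokens
    stop_token_ids=[9051, 90306],  # generation ends at the first token found in this list
)
print(pipe(['Hello, world'], gen_config=gen_config))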

@lvhan028 requested review from lvhan028 and lzhangzz on Jan 20, 2025
@lvhan028 (Collaborator) commented: Overall LGTM

@lvhan028 merged commit bfc845a into InternLM:main on Feb 17, 2025 (9 checks passed)
tastelikefeet added a commit to tastelikefeet/lmdeploy that referenced this pull request Feb 25, 2025
…oad_state_dict

* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)