support eos_token list in turbomind #3044

Merged: 17 commits into InternLM:main on Feb 17, 2025
Conversation

@irexyc (Collaborator) commented on Jan 16, 2025

Motivation

MinLengthLogitsProcessor in transformers supports a list of eos_token_id, so turbomind's min-length penalty should likewise accept a list of stop tokens rather than a single one.
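
For context, a minimal sketch (not from this PR) of the transformers behavior being matched: MinLengthLogitsProcessor accepts a list of eos_token_id and masks every listed token until the sequence reaches min_length. Exact return behavior may vary across transformers versions.

import torch
from transformers import MinLengthLogitsProcessor

# Both eos ids are suppressed while the current length is below min_length.
processor = MinLengthLogitsProcessor(min_length=10, eos_token_id=[2, 7])
input_ids = torch.ones(1, 3, dtype=torch.long)  # current length 3 < 10
scores = torch.zeros(1, 8)                      # dummy logits, vocab size 8
out = processor(input_ids, scores)
print(out[0, 2], out[0, 7])                     # -inf for both eos ids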

Modification

Extend turbomind's min-length penalty to handle a list of stop tokens. To benchmark the kernel, profile_throughput.py is patched so the penalty runs on every decoding step with a 16-token stop list:

diff --git a/benchmark/profile_throughput.py b/benchmark/profile_throughput.py
index 2e4d2a3b..9fc9605f 100644
--- a/benchmark/profile_throughput.py
+++ b/benchmark/profile_throughput.py
@@ -108,6 +108,8 @@ class Engine:
                 session_id,
                 input_ids=input_ids,
                 gen_config=GenerationConfig(max_new_tokens=output_seqlen,
+                                            min_new_tokens=output_seqlen - 1,
+                                            stop_token_ids=[9051, 90306, 96144, 49577, 37966, 99950, 28860, 87040, 43814, 24921, 98583, 49449, 85357, 67608, 69612, 59268],
                                             temperature=temperature,
                                             top_p=top_p,
                                             top_k=top_k,
Profiling command:

FT_NVTX=ON /mnt/141/2024.5.1/target-linux-x64/nsys profile -t cuda,nvtx,osrt,cudnn,cublas -o output -f true --stats true python ../benchmark/profile_throughput.py -n 20000 /home/chenxin/ShareGPT_V3_unfiltered_cleaned_split.json /home/chenxin/Llama-3.2-1B-Instruct


nsys stats, original kernel (single eos token):

  Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name
  --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
       0.0         45986089      16761    2743.6    2656.0      2272      3840        276.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, const int *, int,…

Modified kernel with stop-token lists of size 1-16:

(sz=1)   0.0         45815667      16808    2725.8    2624.0      2272      3904        262.1  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=2)   0.0         49642108      16901    2937.2    2848.0      2304      4064        259.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=4)   0.0         54264699      16838    3222.8    3136.0      2304      4448        273.1  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=8)   0.0         55234255      16885    3271.2    3168.0      2304      4608        270.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
(sz=16)  0.0         55683921      16858    3303.1    3200.0      2304      5184        266.5  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, int, int, const i…
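
Per-call kernel time grows modestly with the stop-list size (about 2726 ns at sz=1 to 3303 ns at sz=16). For completeness, a hedged usage sketch of the feature through lmdeploy's Python API (not part of this PR's diff; the token ids are the arbitrary ones used in the benchmark):

from lmdeploy import GenerationConfig, pipeline

# Model path reused from the profiling command above.
pipe = pipeline('/home/chenxin/Llama-3.2-1B-Instruct')
gen_config = GenerationConfig(
    max_new_tokens=128,
    min_new_tokens=8,              # min-length penalty masks every stop token until 8 new tokens
    stop_token_ids=[9051, 90306],  # generation ends at the first token found in this list
)
print(pipe(['Hello, world'], gen_config=gen_config))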

@lvhan028 requested review from lvhan028 and lzhangzz on Jan 20, 2025
@lvhan028 (Collaborator) commented: Overall LGTM

@lvhan028 merged commit bfc845a into InternLM:main on Feb 17, 2025 (9 checks passed)
tastelikefeet added a commit to tastelikefeet/lmdeploy that referenced this pull request Feb 25, 2025
…oad_state_dict

* commit 'f6f7a5d707e3ccbc69af10babf1c9afcaf72a402':
  fix deepseekv2 has no attribute use_mla error (InternLM#3188)
  fix blocked fp8 moe (InternLM#3181)
  [Feature] support deepseek-vl2 for pytorch engine (InternLM#3149)
  make turbomind support gpu embedding inputs (InternLM#3177)
  fix temperature=0 (InternLM#3176)
  Update qwen2.py (InternLM#3174)
  Fix tool call prompt for InternLM and Qwen (InternLM#3156)
  Use pad_token_id as image_token_id for vl models (InternLM#3158)
  fix default temperature value (InternLM#3166)
  fix min length penalty (InternLM#3150)
  update cuda runtime package dependencies (InternLM#3142)
  fix typing (InternLM#3153)
  support deepseekv2 for maca backend. (InternLM#2918)
  fix the issue that stop_token may be less than defined in model.py (InternLM#3148)
  [fix] fix vl gradio, use pipeline api and remove interactive chat (InternLM#3136)
  [feature] add dlinfer w8a8 support. (InternLM#2988)
  Use aiohttp inside proxy server && add --disable-cache-status argument (InternLM#3020)
  support eos_token list in turbomind (InternLM#3044)