Llama inference #2144


Merged
merged 4 commits into from
Mar 21, 2023
Conversation

yk
Collaborator

@yk commented Mar 20, 2023

Some hacks here to get it to work. This goes in conjunction with a fork of tokenizers and a fork of the HF text generation server.
The sentencepiece tokenizer doesn't play nicely with streaming: decoding individual tokens in isolation is not very helpful and leads to artefacts, so the way around it is to always decode from the beginning of the sequence.
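As an illustration of the workaround described above, here is a minimal sketch (not the PR's actual code) that re-decodes the full token sequence on every step and emits only the newly produced text. `toy_decode` is a stand-in for a SentencePiece-style detokenizer, where the "▁" word-boundary marker means single-token decoding loses leading spaces.

```python
def toy_decode(tokens):
    # Stand-in for SentencePiece detokenization: "▁" marks a word
    # boundary, so join everything and turn the marker into a space.
    return "".join(tokens).replace("▁", " ").lstrip()

def stream_decode(token_iter):
    """Yield only the text newly produced by each token, by re-decoding
    the whole sequence every step instead of decoding tokens one by one."""
    tokens = []
    emitted = ""
    for tok in token_iter:
        tokens.append(tok)
        full = toy_decode(tokens)
        new_text = full[len(emitted):]  # the delta since the last decode
        emitted = full
        yield new_text

chunks = list(stream_decode(["▁Hel", "lo", "▁world"]))
# chunks == ["Hel", "lo", " world"] — the space before "world" is
# recovered correctly because the full sequence is decoded each time.
```

Decoding "▁world" on its own would yield "world" with no leading space; the full-sequence re-decode is what keeps the stream artefact-free.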

@yk added the inference label Mar 20, 2023
@yk requested a review from andreaskoepf as a code owner March 20, 2023 23:23
import os

import transformers

if __name__ == "__main__":
    model_id = os.getenv("MODEL_ID")
    # MODEL_ID may be unset; guard before the substring check
    if model_id and "llama" in model_id:
        # LLaMA checkpoints need the dedicated tokenizer class
        transformers.LlamaTokenizer.from_pretrained(model_id)
Collaborator


I saw LLaMA was merged in huggingface transformers. I will test today whether this is really necessary or if we can use the Auto functions now. We also have to check whether there were significant changes; in the worst case we need to retrain our models.
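A hedged sketch of the check the comment describes: if the installed transformers version already ships the LLaMA classes, the generic Auto entry points can dispatch on the checkpoint's config, and the explicit `LlamaTokenizer` special case may become unnecessary. `llama_loader_name` is a hypothetical helper, not part of the PR.

```python
import importlib.util

def llama_loader_name(model_id: str) -> str:
    """Hypothetical helper: decide which tokenizer entry point to use.

    Non-LLaMA models always go through AutoTokenizer. For LLaMA models,
    prefer the explicit class only when the installed transformers
    actually exposes it (i.e. LLaMA support has been merged)."""
    if "llama" not in model_id:
        return "AutoTokenizer"
    if importlib.util.find_spec("transformers") is not None:
        import transformers
        if hasattr(transformers, "LlamaTokenizer"):
            # Conservative choice: keep the explicit class for LLaMA even
            # though AutoTokenizer may dispatch correctly on newer versions.
            return "LlamaTokenizer"
    return "AutoTokenizer"
```

Whether `AutoTokenizer.from_pretrained(model_id)` alone suffices depends on the transformers version and the checkpoint's config, which is exactly what the comment proposes to test.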

@andreaskoepf merged commit 792fd18 into main Mar 21, 2023
@andreaskoepf deleted the llama-inference branch March 21, 2023 08:45
2 participants