Llama inference #2144


Merged
merged 4 commits into from
Mar 21, 2023
Conversation

yk
Collaborator

@yk commented Mar 20, 2023

Some hacks here to get it to work. This goes in conjunction with a fork of tokenizers and a fork of the HF text generation server.
The sentencepiece tokenizer doesn't play nicely with streaming: decoding individual tokens in isolation is not very helpful and leads to artefacts, so the way around it is to always decode from the beginning of the sequence.
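As an illustration of the workaround described above, here is a minimal sketch (not the PR's actual code) that re-decodes the full token sequence on every step and emits only the newly produced text. `toy_decode` is a stand-in for a SentencePiece-style detokenizer, where the "▁" word-boundary marker means single-token decoding loses leading spaces.

```python
def toy_decode(tokens):
    # Stand-in for SentencePiece detokenization: "▁" marks a word
    # boundary, so join everything and turn the marker into a space.
    return "".join(tokens).replace("▁", " ").lstrip()

def stream_decode(token_iter):
    """Yield only the text newly produced by each token, by re-decoding
    the whole sequence every step instead of decoding tokens one by one."""
    tokens = []
    emitted = ""
    for tok in token_iter:
        tokens.append(tok)
        full = toy_decode(tokens)
        new_text = full[len(emitted):]  # the delta since the last decode
        emitted = full
        yield new_text

chunks = list(stream_decode(["▁Hel", "lo", "▁world"]))
# chunks == ["Hel", "lo", " world"] — the space before "world" is
# recovered correctly because the full sequence is decoded each time.
```

Decoding "▁world" on its own would yield "world" with no leading space; the full-sequence re-decode is what keeps the stream artefact-free.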

@yk added the inference label Mar 20, 2023
@yk requested a review from andreaskoepf as a code owner March 20, 2023 23:23
import os

import transformers

if __name__ == "__main__":
    model_id = os.getenv("MODEL_ID")
    # MODEL_ID may be unset; guard before the substring check
    if model_id and "llama" in model_id:
        # LLaMA checkpoints need the dedicated tokenizer class
        transformers.LlamaTokenizer.from_pretrained(model_id)
Collaborator


I saw LLaMA was merged in huggingface transformers. I will test today whether this is really necessary or if we can use the Auto functions now. We also have to check whether there were significant changes; in the worst case we need to retrain our models.
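A hedged sketch of the check the comment describes: if the installed transformers version already ships the LLaMA classes, the generic Auto entry points can dispatch on the checkpoint's config, and the explicit `LlamaTokenizer` special case may become unnecessary. `llama_loader_name` is a hypothetical helper, not part of the PR.

```python
import importlib.util

def llama_loader_name(model_id: str) -> str:
    """Hypothetical helper: decide which tokenizer entry point to use.

    Non-LLaMA models always go through AutoTokenizer. For LLaMA models,
    prefer the explicit class only when the installed transformers
    actually exposes it (i.e. LLaMA support has been merged)."""
    if "llama" not in model_id:
        return "AutoTokenizer"
    if importlib.util.find_spec("transformers") is not None:
        import transformers
        if hasattr(transformers, "LlamaTokenizer"):
            # Conservative choice: keep the explicit class for LLaMA even
            # though AutoTokenizer may dispatch correctly on newer versions.
            return "LlamaTokenizer"
    return "AutoTokenizer"
```

Whether `AutoTokenizer.from_pretrained(model_id)` alone suffices depends on the transformers version and the checkpoint's config, which is exactly what the comment proposes to test.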

@andreaskoepf merged commit 792fd18 into main Mar 21, 2023
@andreaskoepf deleted the llama-inference branch March 21, 2023 08:45
2 participants