Llama 3.1 RoPE scaling #205


Merged
6 commits merged into AI-Hypercomputer:main on Dec 16, 2024
Conversation

tengomucho (Contributor)

Added RoPE scaling support for Llama. This enables support for Llama 3.1, 3.2, and 3.3 (not vision).

This is imported from the original Llama reference implementation:
https://github.com/meta-llama/llama-models/blob/7890266c5a3ccd29e739d53a71ea968bcf4ca400/models/llama3/reference_impl/model.py#L45

Note that the function has no effect on the original model code as long
as the use_scaled parameter is false (the default).
The parameters are aligned with the Hugging Face ones, so it will be
easier to implement RoPE scaling as it is done in Llama 3.1.
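
For context, the imported function looks roughly like the following, reproduced from the linked reference implementation (exact details may differ at that commit). High-frequency RoPE components are kept as-is, low-frequency components are divided by the scale factor, and the band in between is smoothly interpolated; when use_scaled is false, precompute_freqs_cis behaves exactly as before:

```python
import math
import torch

def apply_scaling(freqs: torch.Tensor) -> torch.Tensor:
    # Llama 3.1 RoPE scaling constants (from Meta's grid search).
    scale_factor = 8
    low_freq_factor = 1
    high_freq_factor = 4
    old_context_len = 8192  # original Llama 3 context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left untouched.
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are scaled down uniformly.
            new_freqs.append(freq / scale_factor)
        else:
            # Mid band: interpolate smoothly between scaled and unscaled.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0,
                         use_scaled: bool = False) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    if use_scaled:
        freqs = apply_scaling(freqs)
    t = torch.arange(end, device=freqs.device, dtype=torch.float32)
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)  # complex64
```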
@tengomucho (Contributor, Author)

CC @qihqi for info

@qihqi (Collaborator) commented Dec 13, 2024

Hi,

you probably also need to add an entry here: https://github.com/AI-Hypercomputer/jetstream-pytorch/blob/main/jetstream_pt/third_party/llama/model_exportable.py#L294
for the model name to be recognized, and the actual config values in https://github.com/AI-Hypercomputer/jetstream-pytorch/blob/main/jetstream_pt/third_party/llama/model_args.py#L43 for it to work.
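
For readers following along, the registration amounts to two additions: a set of hyperparameters and a name mapping. A minimal sketch, assuming a name-keyed dict layout (the dict and field names below are illustrative, not the repo's actual identifiers in those files; the Llama 3.1 8B values themselves come from Meta's published config):

```python
# Hypothetical sketch -- see the two linked files for the real structure.
# In model_args.py: hyperparameters for the new model.
LLAMA_3_1_8B = dict(
    dim=4096,
    n_layers=32,
    n_heads=32,
    n_kv_heads=8,
    vocab_size=128256,
    multiple_of=1024,
    ffn_dim_multiplier=1.3,
    rope_theta=500000.0,
    use_scaled_rope=True,  # turn on the RoPE scaling added in this PR
)

# In model_exportable.py: make the model name resolvable at load time.
SUPPORTED_MODELS = {
    # ...existing entries...
    "llama-3.1-8b": LLAMA_3_1_8B,
}
```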

qihqi merged commit bb174b6 into AI-Hypercomputer:main on Dec 16, 2024
2 of 3 checks passed