Llama 3.1 RoPE scaling #205


Merged
6 commits merged into AI-Hypercomputer:main on Dec 16, 2024
Conversation

tengomucho (Contributor)

Added RoPE scaling support for Llama. This enables support for Llama 3.1, 3.2, and 3.3 (not vision).

This is imported from the original Llama reference implementation:
https://github.com/meta-llama/llama-models/blob/7890266c5a3ccd29e739d53a71ea968bcf4ca400/models/llama3/reference_impl/model.py#L45

Note that the function has no effect on the original model code as long
as the use_scaled parameter is false (the default).
The parameters are aligned with the Hugging Face ones, so it will be
easier to implement RoPE scaling as it is done in Llama 3.1.
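
For context, the imported function looks roughly like the following, reproduced from the linked reference implementation (exact details may differ at that commit). High-frequency RoPE components are kept as-is, low-frequency components are divided by the scale factor, and the band in between is smoothly interpolated; when use_scaled is false, precompute_freqs_cis behaves exactly as before:

```python
import math
import torch

def apply_scaling(freqs: torch.Tensor) -> torch.Tensor:
    # Llama 3.1 RoPE scaling constants (from Meta's grid search).
    scale_factor = 8
    low_freq_factor = 1
    high_freq_factor = 4
    old_context_len = 8192  # original Llama 3 context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left untouched.
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are scaled down uniformly.
            new_freqs.append(freq / scale_factor)
        else:
            # Mid band: interpolate smoothly between scaled and unscaled.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0,
                         use_scaled: bool = False) -> torch.Tensor:
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    if use_scaled:
        freqs = apply_scaling(freqs)
    t = torch.arange(end, device=freqs.device, dtype=torch.float32)
    freqs = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(freqs), freqs)  # complex64
```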
@tengomucho (Contributor, Author)

CC @qihqi for info

@qihqi (Collaborator) commented Dec 13, 2024

Hi,

you probably also need to add an entry here: https://github.com/AI-Hypercomputer/jetstream-pytorch/blob/main/jetstream_pt/third_party/llama/model_exportable.py#L294
for the model name to be recognized, and the actual config values in https://github.com/AI-Hypercomputer/jetstream-pytorch/blob/main/jetstream_pt/third_party/llama/model_args.py#L43 for it to work.
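
For readers following along, the registration amounts to two additions: a set of hyperparameters and a name mapping. A minimal sketch, assuming a name-keyed dict layout (the dict and field names below are illustrative, not the repo's actual identifiers in those files; the Llama 3.1 8B values themselves come from Meta's published config):

```python
# Hypothetical sketch -- see the two linked files for the real structure.
# In model_args.py: hyperparameters for the new model.
LLAMA_3_1_8B = dict(
    dim=4096,
    n_layers=32,
    n_heads=32,
    n_kv_heads=8,
    vocab_size=128256,
    multiple_of=1024,
    ffn_dim_multiplier=1.3,
    rope_theta=500000.0,
    use_scaled_rope=True,  # turn on the RoPE scaling added in this PR
)

# In model_exportable.py: make the model name resolvable at load time.
SUPPORTED_MODELS = {
    # ...existing entries...
    "llama-3.1-8b": LLAMA_3_1_8B,
}
```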

qihqi merged commit bb174b6 into AI-Hypercomputer:main on Dec 16, 2024
2 of 3 checks passed