Here we provide examples for training LLMs from scratch.
Our examples integrate with popular tools and libraries from the ecosystem (see the sketch after this list):
- PyTorch FSDP for distributed training
- Hugging Face Hub for accessing model weights
- Hugging Face Datasets for training and evaluation datasets
- Transformers for training (Trainer) and the modeling scripts
- Accelerate for multi-GPU launching
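
The sketch below shows how these pieces typically fit together in a single training script: the tokenizer and model config come from the Hugging Face Hub, the model is built from the config only (randomly initialized, i.e. from scratch), Datasets supplies the training text, and Trainer drives the run with FSDP enabled through Accelerate. The dataset, sequence length, batch sizes, and FSDP settings here are illustrative assumptions, not this repo's exact configuration.

```python
# Minimal sketch of wiring these libraries together; dataset, sequence length,
# batch sizes, and FSDP settings are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "mistralai/Mistral-7B-v0.1"

# Tokenizer and config come from the Hub; the model is built from the
# config only, so its weights are randomly initialized (training from scratch).
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
config = AutoConfig.from_pretrained(model_id)
model = AutoModelForCausalLM.from_config(config)

# Any causal-LM text corpus works; wikitext-2 is a small stand-in here.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
train_dataset = train_dataset.filter(lambda ex: len(ex["input_ids"]) > 0)

args = TrainingArguments(
    output_dir="mistral-from-scratch",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    # FSDP is driven through Trainer/Accelerate; wrap each decoder layer.
    fsdp="full_shard auto_wrap",
    fsdp_config={"transformer_layer_cls_to_wrap": ["MistralDecoderLayer"]},
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```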
Note: We mainly focus on multi-GPU LLM training here, and provide examples for training Mistral 7B from scratch.
| MODEL FAMILY | Verified <MODEL ID> (Hugging Face Hub) |
| --- | --- |
| Mistral 7B | "mistralai/Mistral-7B-v0.1" |
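
With FSDP configured through Accelerate, a multi-GPU run is typically launched from the command line; `train.py` and the process count below are placeholders for your own script and GPU count:

```bash
accelerate config                              # answer the prompts once to enable FSDP
accelerate launch --num_processes 8 train.py   # spawn one process per GPU
```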