Commit 679f56d

patch documentation.
1 parent e408b27 commit 679f56d

2 files changed: +413 -8 lines changed

README.md (+6 -8)

@@ -7,6 +7,7 @@
 [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
 [![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
+[![Weight Diff License](https://img.shields.io/badge/Weight%20Diff%20License-CC%20By%20NC%204.0-yellow)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/WEIGHT_DIFF_LICENSE)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

@@ -19,7 +20,8 @@ This is the repo for the Stanford Alpaca project, which aims to build and share
 Note: We thank the community for feedback on Stanford-Alpaca and supporting our research. Our live demo is suspended until further notice.

-**Usage and License Notices**: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.
+**Usage and License Notices**: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.
+The weight diff is also CC BY NC 4.0 (allowing only non-commercial use).

 ## Overview

@@ -116,16 +118,12 @@ We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:
 | Max length | 512 | 512 |
 | Weight decay | 0 | 0 |

-We have also fine-tuned larger variants of LLaMA and performed subsequent RLHF and are in the process of evaluating those models.
-
 To reproduce our fine-tuning runs for LLaMA, first install the requirements

 ```bash
 pip install -r requirements.txt
 ```

-Then, install the particular fork of Hugging Face's transformers library.
-
 Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP `full_shard` mode.
 We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using **Python 3.10**.
 Replace `<your_random_port>` with a port of your own, `<your_path_to_hf_converted_llama_ckpt_and_tokenizer>` with the
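Note: the hunk above stops just short of the launch command it refers to. Purely as a hedged sketch (not the README's exact command), an FSDP `full_shard` launch of `train.py` with Hugging Face `TrainingArguments`-style flags generally looks like the following; `--model_name_or_path`, `--data_path`, and `--model_max_length` are assumed arguments of `train.py`, and every bracketed value is a placeholder. Only the 512 max length, zero weight decay, and the `--fsdp`/`--tf32` flags are taken from the surrounding diff.

```bash
# Hedged sketch only -- not the README's exact command.
# Flag names follow Hugging Face TrainingArguments where possible;
# --model_name_or_path, --data_path, and --model_max_length are assumed
# arguments of train.py, and every <bracketed> value is a placeholder.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path <your_path_to_the_alpaca_training_json> \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs <epochs> \
    --per_device_train_batch_size <per_gpu_batch_size> \
    --gradient_accumulation_steps <accumulation_steps> \
    --learning_rate <learning_rate> \
    --weight_decay 0. \
    --model_max_length 512 \
    --fsdp "full_shard auto_wrap" \
    --tf32 True
```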
@@ -189,8 +187,8 @@ To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` t
 Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM. Commands given above enable parameter sharding, so no redundant model copy is stored on any GPU.
 If you'd like to further reduce the memory footprint, here are some options:

-- Turn on CPU offload for FSDP with `--fsdp "full_shard auto_wrap offload"`. This saves VRAM at the cost longer runtime.
-- In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP. Here's an example to use DeepSpeed stage-3 with 4 GPUs with both parameter and optimizer offload:
+- Turn on CPU offload for FSDP with `--fsdp "full_shard auto_wrap offload"`. This saves VRAM at the cost of longer runtime.
+- In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here's an example to use DeepSpeed stage-3 with 4 GPUs with both parameter and optimizer offload:
 ```bash
 pip install deepspeed
 torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
@@ -213,7 +211,7 @@ If you'd like to further reduce the memory footprint, here are some options:
     --tf32 True
 ```
 - The DeepSpeed library also provides some [helpful functions](https://deepspeed.readthedocs.io/en/latest/memory.html) to estimate memory usage.
-- [LoRA](https://arxiv.org/abs/2106.09685) fine-tunes low-rank slices of the query, key, and value embeddings. This can reduce the total memory footprint from 112GB to about 7x4=28GB. We may release our re-implemention of this in the future, but for now the [peft](https://github.com/huggingface/peft) codebase can be a useful resource.
+- [LoRA](https://arxiv.org/abs/2106.09685) fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112GB to about 7x4=28GB. We may release our re-implemention of this in the future, but for now the [peft](https://github.com/huggingface/peft) codebase can be a useful resource.

 ## Recovering Alpaca Weights
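Note: the "7 x 4 x 4 = 112 GB" estimate in the hunk above reads as roughly 7B parameters x 4 bytes x 4 resident copies (weights, gradients, and the two Adam optimizer moments), and the LoRA bullet's ~7 x 4 = 28 GB figure keeps only the frozen weights as the dominant term. For the "helpful functions" link in the DeepSpeed bullet, a hedged sketch of the documented ZeRO-3 estimator follows; the checkpoint path is a placeholder, and the model is loaded only so its parameters can be counted.

```bash
# Hedged sketch: DeepSpeed's documented ZeRO-3 memory estimators
# (https://deepspeed.readthedocs.io/en/latest/memory.html).
# The checkpoint path is a placeholder; the model is loaded here only
# so the estimator can count its parameters.
python - <<'EOF'
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = AutoModelForCausalLM.from_pretrained("<your_path_to_hf_converted_llama_ckpt_and_tokenizer>")
# Prints estimated per-GPU and per-CPU memory needs for ZeRO-3 with and
# without parameter/optimizer offload, for a 4-GPU single-node setup.
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=4, num_nodes=1)
EOF
```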

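Note: the LoRA bullet above points at the peft codebase without showing any code. As an illustrative sketch only (not the project's re-implementation), the snippet below wraps a causal LM with low-rank adapters on the attention query/key/value projections; the `q_proj`/`k_proj`/`v_proj` module names assume the Hugging Face LLaMA implementation, and the rank, alpha, and dropout values are arbitrary examples.

```bash
# Hedged sketch of LoRA via the peft library -- not the project's
# re-implementation. Rank/alpha/dropout values are illustrative only,
# and q_proj/k_proj/v_proj assume the Hugging Face LLaMA module names.
pip install peft
python - <<'EOF'
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("<your_path_to_hf_converted_llama_ckpt_and_tokenizer>")
lora_config = LoraConfig(
    r=8,                    # low-rank dimension of the adapter
    lora_alpha=16,          # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # query/key/value projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
EOF
```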