This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:

- The [52K data](#data-release) used for fine-tuning the model.
- The code for [generating the data](#data-generation-process).
- The code for [fine-tuning the model](#fine-tuning).
- The code for [recovering Alpaca-7B weights from our released weight diff](#recovering-alpaca-weights).

Note: We thank the community for feedback on Stanford-Alpaca and supporting our research. Our live demo is suspended until further notice.
## Fine-tuning

We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:

| Hyperparameter | LLaMA-7B | LLaMA-13B |
| -------------- | -------- | --------- |
| Max length     | 512      | 512       |
| Weight decay   | 0        | 0         |

We have also fine-tuned larger variants of LLaMA and performed subsequent RLHF, and are in the process of evaluating those models.

Given that Hugging Face hasn't officially supported the LLaMA models, we fine-tuned LLaMA with Hugging Face's transformers library by installing it from a particular fork (i.e., this [PR](https://github.com/huggingface/transformers/pull/21955), which is yet to be merged). The hash of the specific commit we installed was `68d640f7c368bcaaaecfc678f11908ebbd3d6176`.
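A hypothetical way to pin `transformers` to that commit (the repository URL and ref below are assumptions of this sketch; if the commit cannot be fetched directly, install from the branch behind the linked PR instead):

```bash
# Sketch only: install transformers pinned to the commit referenced above.
# Assumption: the commit is reachable from the main repository.
pip install "git+https://github.com/huggingface/transformers.git@68d640f7c368bcaaaecfc678f11908ebbd3d6176"
```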
To reproduce our fine-tuning runs for LLaMA, first install the requirements.
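A minimal sketch of that step, assuming the repo ships a standard `requirements.txt`:

```bash
# Sketch: install the project's Python dependencies.
pip install -r requirements.txt
```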
Note that the given training script is meant to be simple and easy to use, and is not particularly optimized.

To run on more GPUs, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. The global batch size has not been tested for optimality.
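For concreteness, the global batch size is the number of GPUs times the per-device batch size times `gradient_accumulation_steps`. A hedged sketch for 8 GPUs follows; the flag names mirror the Hugging Face `TrainingArguments` used by `train.py`, and the specific values and paths are illustrative assumptions rather than the repo's exact command:

```bash
# Sketch: keep the global batch size at 128 when moving from 4 to 8 GPUs.
# 8 GPUs x per_device_train_batch_size 4 x gradient_accumulation_steps 4 = 128.
torchrun --nproc_per_node=8 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --output_dir <your_output_dir> \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --model_max_length 512 \
    --weight_decay 0. \
    --fsdp "full_shard auto_wrap"
    # ...remaining flags (learning rate, epochs, precision) as in the full fine-tuning command.
```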
### Addressing OOM
Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM (roughly 7B parameters, 4 bytes per parameter, and 4 copies: the weights, the gradients, and the two Adam optimizer states). The commands given above enable parameter sharding, so no redundant model copy is stored on any GPU.

If you'd like to further reduce the memory footprint, here are some options:

- Turn on CPU offload for FSDP with `--fsdp "full_shard auto_wrap offload"`. This saves VRAM at the cost of longer runtime.
- In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP. A sketch of using DeepSpeed stage-3 with 4 GPUs and both parameter and optimizer offload follows this list.
- The DeepSpeed library also provides some [helpful functions](https://deepspeed.readthedocs.io/en/latest/memory.html) to estimate memory usage.
- [LoRA](https://arxiv.org/abs/2106.09685) fine-tunes low-rank slices of the query, key, and value embeddings. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB. We may release our re-implementation of this in the future, but for now the [peft](https://github.com/huggingface/peft) codebase can be a useful resource.

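A minimal sketch of the DeepSpeed stage-3 run mentioned above, assuming the repo's `train.py` and a ZeRO stage-3 offload config at `./configs/default_offload_opt_param.json` (the config path and flag values are assumptions of this sketch):

```bash
# Sketch only: DeepSpeed ZeRO stage-3 with parameter and optimizer offload on 4 GPUs.
pip install deepspeed
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --output_dir <your_output_dir> \
    --deepspeed "./configs/default_offload_opt_param.json"
    # ...remaining training flags as in the full fine-tuning command.
```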
## Recovering Alpaca Weights
The weight diff between Alpaca-7B and LLaMA-7B is located [here](https://huggingface.co/tatsu-lab/alpaca-7b-wdiff/tree/main).
To recover the original Alpaca-7B weights, follow these steps:
```text
1. Convert Meta's released weights into huggingface format. Follow this guide: