# Intro
RAG is a seq2seq model that encapsulates two core components: a question encoder and a generator.
During a forward pass, we encode the input with the question encoder and pass it
to the retriever to extract relevant context documents. The documents are then prepended to the input.
The contextualized input is then passed to the generator.

The question encoder can be any `autoencoding` model, preferably :obj:`~transformers.DPRQuestionEncoder`, and the generator can be any `seq2seq` model, preferably :obj:`~transformers.BartForConditionalGeneration`.
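For instance, a composite model can be assembled from separate question encoder and generator checkpoints roughly as follows. This is only a sketch: the checkpoint names and output directory are examples, and the resulting model still needs a retriever attached before it can do end-to-end generation.

```python
from transformers import RagSequenceForGeneration

# Compose a RAG model from a DPR question encoder and a BART generator.
# Any compatible `autoencoding` / `seq2seq` checkpoints could be used instead.
model = RagSequenceForGeneration.from_pretrained_question_encoder_generator(
    "facebook/dpr-question_encoder-single-nq-base", "facebook/bart-large"
)
model.save_pretrained("./rag-from-dpr-and-bart")
```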

The model can be initialized with a :obj:`~transformers.RagRetriever` for end-to-end generation or used in combination with the outputs of a retriever in multiple steps - see examples for more details.
The model is compatible with any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with a language model head as the ``generator``.
The model has been tested with :class:`~transformers.DPRQuestionEncoder` as the ``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or :class:`~transformers.T5ForConditionalGeneration` as the ``generator``.
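For a quick sanity check, here is a minimal end-to-end generation sketch. It assumes the pretrained `facebook/rag-sequence-nq` checkpoint and a dummy retrieval index (`use_dummy_dataset=True`), so the full `wiki_dpr` index does not have to be downloaded; it requires the `datasets` and `faiss` packages, and exact tokenizer calls may differ slightly between library versions.

```python
from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

# Load the tokenizer, a retriever backed by a small dummy index (testing only),
# and the RAG model wired to that retriever for end-to-end generation.
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

# Encode the question; the model retrieves context documents internally and generates an answer.
input_dict = tokenizer.prepare_seq2seq_batch(
    "who is the owner of reading football club", return_tensors="pt"
)
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```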

RAG models were released with the paper `Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks <https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.


# Finetuning
Our finetuning logic is based on scripts from [`examples/seq2seq`](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).
Follow the instructions there regarding data preprocessing. A sample finetuning command:

```
python examples/rag/finetune.py \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --model_type rag_sequence \
    --fp16 \
    --gpus 8
```
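The `examples/seq2seq` scripts expect `$DATA_DIR` to contain line-aligned `train.source`/`train.target`, `val.source`/`val.target` and `test.source`/`test.target` files, one example per line. As an illustration only (the directory name and the toy question/answer pairs below are made up), such a layout can be generated like this:

```python
from pathlib import Path

data_dir = Path("toy_rag_data")
data_dir.mkdir(exist_ok=True)

# One example per line: the question goes into *.source and the expected
# answer goes onto the same line of the corresponding *.target file.
splits = {
    "train": [("who is the owner of reading football club", "Dai Yongge")],
    "val": [("who sings does he love me with reba", "Linda Davis")],
    "test": [("who holds the record in 100m freestyle", "Cesar Cielo")],
}

for split, examples in splits.items():
    (data_dir / f"{split}.source").write_text("\n".join(q for q, _ in examples) + "\n")
    (data_dir / f"{split}.target").write_text("\n".join(a for _, a in examples) + "\n")
```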


# Evaluation
Apart from the parameters specifying the model to evaluate and some extra parameters, the evaluation script expects paths to two files:
- `evaluation_set` - a path to a file specifying the evaluation dataset, with a single datapoint per line, e.g.
```who is the owner of reading football club```
- `gold_data_path` - a path to a file containing ground truth answers for datapoints from the `evaluation_set`.

We expect the following formats of the gold data file:

- for e2e evaluation, we support two formats of the gold file (see the sketch after these examples for one way to produce such files):
  - `qa` - where a single line has the following format: `input [tab] output_list`, e.g.:
  ```
  who is the owner of reading football club	['Xiu Li Dai', 'Dai Yongge', 'Dai Xiuli', 'Yongge Dai']
  ```
  - `ans` - where a single line of the gold file contains the expected output string, e.g.:
  ```
  Xiu Li Dai
  ```

- for retrieval evaluation, we expect a tab-separated list of Wikipedia page titles constituting positive contexts for a given query, e.g. given the question `who sings does he love me with reba`, a line with ground truth retrieval data could look as follows:
```
Does He Love You	Does He Love You	Red Sandy Spika dress of Reba McEntire	Greatest Hits Volume Two (Reba McEntire album)	Shoot for the Moon (album)
```
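For illustration, here is a minimal sketch that writes an `evaluation_set` file and a matching gold file in the `qa` format described above. The file names and question/answer pairs are arbitrary toy data; only the line format matters.

```python
# Toy (question, answer_list) pairs; the first one mirrors the example above.
eval_data = [
    ("who is the owner of reading football club",
     ["Xiu Li Dai", "Dai Yongge", "Dai Xiuli", "Yongge Dai"]),
    ("who sings does he love me with reba", ["Linda Davis"]),
]

with open("my_eval.questions", "w") as eval_file, open("my_eval.gold", "w") as gold_file:
    for question, answers in eval_data:
        eval_file.write(question + "\n")              # evaluation_set: one datapoint per line
        gold_file.write(f"{question}\t{answers}\n")   # qa format: input [tab] output_list
```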

## Retrieval evaluation

We demonstrate how to evaluate retrieval against DPR evaluation data. You can download the respective files from the links listed [here](https://github.com/facebookresearch/DPR/blob/master/data/download_data.py#L39-L45).

1. Download and unzip the gold data file. We use the `biencoder-nq-dev` set from https://dl.fbaipublicfiles.com/dpr/data/retriever/biencoder-nq-dev.json.gz.
2. Parse the unzipped file using the `parse_dpr_relevance_data.py` script:
```
python examples/rag/parse_dpr_relevance_data.py --src_path path/to/unzipped/biencoder-nq-dev.json --evaluation_set path/to/output/biencoder-nq-dev.questions --gold_data_path path/to/output/biencoder-nq-dev.pages
```
3. Run evaluation:
```
python examples/rag/eval_rag.py \
    --model_name_or_path $MODEL_NAME_OR_PATH \ # model name or path of the model we're evaluating
    --model_type rag_sequence \ # RAG model type (rag_token or rag_sequence)
    --evaluation_set path/to/output/biencoder-nq-dev.questions \ # an input dataset for evaluation
    --gold_data_path path/to/output/biencoder-nq-dev.pages \ # a dataset containing ground truth answers for samples from the evaluation_set
    --predictions_path path/to/retrieval_preds.tsv \ # name of the file in which predictions will be stored
    --eval_mode retrieval \ # indicates whether we're performing retrieval evaluation or e2e evaluation
    --recalculate # if the predictions file already exists and this option is set, we regenerate the predictions; otherwise we reuse the existing predictions file to calculate metrics
```
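The script reports its retrieval metrics itself; purely as an illustration of how these files relate, here is a rough precision@k helper. It assumes (this is an assumption, not the script's documented format) that `retrieval_preds.tsv` mirrors the gold file layout, i.e. one tab-separated list of retrieved page titles per line, aligned with the `evaluation_set`.

```python
def precision_at_k(preds_path: str, gold_path: str, k: int = 1) -> float:
    """Fraction of questions whose top-k retrieved titles hit any gold page title.

    Hypothetical helper: assumes both files contain one tab-separated list of
    page titles per line, aligned line-by-line with the evaluation set.
    """
    hits, total = 0, 0
    with open(preds_path) as preds_file, open(gold_path) as gold_file:
        for pred_line, gold_line in zip(preds_file, gold_file):
            predicted = pred_line.rstrip("\n").split("\t")[:k]
            gold_titles = set(gold_line.rstrip("\n").split("\t"))
            hits += int(any(title in gold_titles for title in predicted))
            total += 1
    return hits / total if total else 0.0


print(precision_at_k("path/to/retrieval_preds.tsv", "path/to/output/biencoder-nq-dev.pages", k=1))
```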


## End-to-end evaluation
```
python examples/rag/eval_rag.py \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --model_type rag_sequence \
    --evaluation_set path/to/test.source \
    --gold_data_path path/to/gold_data \
    --predictions_path path/to/e2e_preds.txt \
    --eval_mode e2e \ # indicates whether we're performing retrieval evaluation or e2e evaluation (default)
    --n_docs 5 \ # you can experiment with retrieving a different number of documents at evaluation time
    --print_predictions
```
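The evaluation script computes its metrics from these files; purely as an illustration of how the `qa` gold format can be scored, here is a simple exact-match sketch. It assumes (an assumption, not documented above) that `e2e_preds.txt` contains one generated answer per line, aligned with the evaluation set, and it uses a simplified normalization rather than full SQuAD-style scoring.

```python
import ast


def normalize(text: str) -> str:
    # Simplified normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def exact_match(preds_path: str, gold_qa_path: str) -> float:
    hits, total = 0, 0
    with open(preds_path) as preds_file, open(gold_qa_path) as gold_file:
        for pred_line, gold_line in zip(preds_file, gold_file):
            prediction = normalize(pred_line)
            # qa gold format: input [tab] output_list (a Python-style list of answers)
            answers = ast.literal_eval(gold_line.split("\t")[1])
            hits += int(any(prediction == normalize(answer) for answer in answers))
            total += 1
    return 100.0 * hits / total if total else 0.0


print(exact_match("path/to/e2e_preds.txt", "path/to/gold_data"))
```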