Commit d22894d
[Docs] Add DialoGPT (#3755)
* add dialoGPT
* update README.md
* fix conflict
* update readme
* add code links to docs
* Update README.md
* Update dialo_gpt2.rst
* Update pretrained_models.rst
* Update docs/source/model_doc/dialo_gpt2.rst
  Co-Authored-By: Julien Chaumond <chaumond@gmail.com>
* change filename of dialogpt

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
1 parent c59b1e6 commit d22894d

19 files changed: +90 −17 lines

README.md (+17 −16)

@@ -148,23 +148,24 @@ At some point in the future, you'll be able to seamlessly move from pre-training

 🤗 Transformers currently provides the following NLU/NLG architectures:

-1. **[BERT](https://github.com/google-research/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
-2. **[GPT](https://github.com/openai/finetune-transformer-lm)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
-3. **[GPT-2](https://blog.openai.com/better-language-models/)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
-4. **[Transformer-XL](https://github.com/kimiyoung/transformer-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
-5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
-6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
-7. **[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
-8. **[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
-9. **[CTRL](https://github.com/salesforce/ctrl/)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
-10. **[CamemBERT](https://camembert-model.fr)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-11. **[ALBERT](https://github.com/google-research/ALBERT)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
-12. **[T5](https://github.com/google-research/text-to-text-transfer-transformer)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
-13. **[XLM-RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/xlmr)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
+1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
+2. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
+3. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
+4. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
+5. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
+6. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
+7. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
+8. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
+9. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
+10. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
+11. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
+12. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
+13. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
 14. **[MMBT](https://github.com/facebookresearch/mmbt/)** (from Facebook), released together with the paper a [Supervised Multimodal Bitransformers for Classifying Images and Text](https://arxiv.org/pdf/1909.02950.pdf) by Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine.
-15. **[FlauBERT](https://github.com/getalp/Flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
-16. **[BART](https://github.com/pytorch/fairseq/tree/master/examples/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
-17. **[ELECTRA](https://github.com/google-research/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
+15. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
+16. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
+17. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
+18. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
 18. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
 19. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
170171

docs/source/index.rst (+2 −1)

@@ -104,4 +104,5 @@ The library currently contains PyTorch and Tensorflow implementations, pre-train
 model_doc/flaubert
 model_doc/bart
 model_doc/t5
-model_doc/electra
+model_doc/electra
+model_doc/dialogpt

docs/source/model_doc/albert.rst (+2)

@@ -30,6 +30,8 @@ Tips:
 similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
 number of (repeating) layers.

+The original code can be found `here <https://github.com/google-research/ALBERT>`_.
+
 AlbertConfig
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/bert.rst (+2)

@@ -35,6 +35,8 @@ Tips:
 prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
 the [CLS] token.

+The original code can be found `here <https://github.com/google-research/bert>`_.
+
 BertConfig
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/camembert.rst (+2)

@@ -22,6 +22,8 @@ Tips:
 - This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
 examples as well as the information relative to the inputs and outputs.

+The original code can be found `here <https://camembert-model.fr/>`_.
+
 CamembertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/ctrl.rst (+2)

@@ -31,6 +31,8 @@ Tips:
 See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
 of this argument.

+The original code can be found `here <https://github.com/salesforce/ctrl>`_.
+

 CTRLConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/dialogpt.rst (+29, new file)

@@ -0,0 +1,29 @@
+DialoGPT
+----------------------------------------------------
+
+Overview
+~~~~~~~~~~~~~~~~~~~~~
+
+DialoGPT was proposed in
+`DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_
+by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
+It's a GPT2 Model trained on 147M conversation-like exchanges extracted from Reddit.
+
+The abstract from the paper is the following:
+
+*We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer).
+Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings.
+We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.
+The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.*
+
+Tips:
+
+- DialoGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
+the right rather than the left.
+- DialoGPT was trained with a causal language modeling (CLM) objective on conversational data and is therefore powerful at response generation in open-domain dialogue systems.
+- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on `DialoGPT's model card <https://huggingface.co/microsoft/DialoGPT-medium>`_.
+
+
+DialoGPT's architecture is based on the GPT2 model, so one can refer to GPT2's `docstring <https://huggingface.co/transformers/model_doc/gpt2.html>`_.
+
+The original code can be found `here <https://github.com/microsoft/DialoGPT>`_.
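The "chat bot in 10 lines" tip above refers to the example on DialoGPT's model card; a minimal sketch in that spirit is shown below. It is an illustration rather than part of this commit, and it assumes a transformers version that provides `AutoModelForCausalLM` and can download the `microsoft/DialoGPT-medium` checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the medium DialoGPT checkpoint (small and large variants also exist).
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for step in range(3):  # three conversation turns, chosen arbitrarily for the sketch
    # Encode the user's utterance and append the end-of-sequence token.
    new_input_ids = tokenizer.encode(input(">> User: ") + tokenizer.eos_token, return_tensors="pt")
    # Append the new user input to the running chat history, if any.
    bot_input_ids = (
        torch.cat([chat_history_ids, new_input_ids], dim=-1)
        if chat_history_ids is not None
        else new_input_ids
    )
    # Generate a response, capping the total conversation length at 1000 tokens.
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # Print only the newly generated tokens (everything after the input).
    print("DialoGPT:", tokenizer.decode(chat_history_ids[0, bot_input_ids.shape[-1]:], skip_special_tokens=True))
```

Because DialoGPT is a plain GPT2 decoder, keeping the whole dialogue in `bot_input_ids` is what gives the model multi-turn context.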

docs/source/model_doc/distilbert.rst (+2)

@@ -27,6 +27,8 @@ Tips:
 - DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
 - DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.

+The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
+

 DistilBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/electra.rst (+2)

@@ -44,6 +44,8 @@ Tips:
 and the generator may be loaded in the `ElectraForPreTraining` model (the classification head will be randomly
 initialized as it doesn't exist in the generator).

+The original code can be found `here <https://github.com/google-research/electra>`_.
+

 ElectraConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/flaubert.rst (+2)

@@ -20,6 +20,8 @@ of the time they outperform other pre-training approaches. Different versions of
 evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared
 to the research community for further reproducible experiments in French NLP.*

+The original code can be found `here <https://github.com/getalp/Flaubert>`_.
+

 FlaubertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/gpt.rst (+3)

@@ -36,6 +36,9 @@ Tips:
 `Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
 Hugging Face showcasing the generative capabilities of several models. GPT is one of them.

+The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
+
+
 OpenAIGPTConfig
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/gpt2.rst (+2)

@@ -34,6 +34,8 @@ Tips:
 Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
 different sizes: small, medium, large, xl and a distilled version of the small checkpoint: distilgpt-2.

+The original code can be found `here <https://openai.com/blog/better-language-models/>`_.
+

 GPT2Config
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/roberta.rst (+3)

@@ -28,6 +28,9 @@ Tips:
 - RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
 - `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.

+The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
+
+
 RobertaConfig
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/t5.rst (+2)

@@ -57,6 +57,8 @@ Tips
 - For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
 - T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.

+The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
+

 T5Config
 ~~~~~~~~~~~~~~~~~~~~~
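As a quick illustration of the ``T5ForConditionalGeneration.generate()`` tip shown in the hunk above, a minimal sketch might look like the following; the `t5-small` checkpoint and the translation prefix are assumptions made for the example, not content of this commit.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 is a text-to-text model: the task is expressed as a textual prefix.
input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)

# generate() feeds the encoded input to the decoder via cross-attention
# and decodes the output auto-regressively.
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```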

docs/source/model_doc/transformerxl.rst (+2)

@@ -30,6 +30,8 @@ Tips:
 The original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
 - Transformer-XL is one of the few models that has no sequence length limit.

+The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`_.
+

 TransfoXLConfig
 ~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/xlm.rst (+2)

@@ -30,6 +30,8 @@ Tips:
 - XLM has multilingual checkpoints which leverage a specific `lang` parameter. Check out the
 `multi-lingual <../multilingual.html>`__ page for more information.

+The original code can be found `here <https://github.com/facebookresearch/XLM/>`_.
+

 XLMConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
