> LSTM is dead. Long Live Transformers!
- YOUTUBE VIDEO -- https://www.youtube.com/watch?v=S27pHKBEp30&t=568s
- Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been largely replaced by transformer-based models. He gives basic background on NLP and a brief history of supervised learning techniques on documents, from bag of words through vanilla RNNs and LSTMs, followed by a technical deep dive into how Transformers work with multi-headed self-attention and positional encoding. Includes sample code for applying these ideas to real-world projects.

- @8:50 -- LSTM -- Transfer Learning not OK
- [@10:30](https://www.youtube.com/watch?v=S27pHKBEp30&t=630s) -- Attention Is All You Need -- Multi-Head Attention Mechanism
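The positional encoding mentioned in the talk can be sketched as below. This is a minimal NumPy version of the sinusoidal scheme from "Attention Is All You Need" (sin on even embedding dimensions, cos on odd ones); the function name and shapes are illustrative, not from the talk's sample code.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) matrix of sinusoidal position codes.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(max_len)[:, None]   # (max_len, 1) token positions
    i = np.arange(d_model)[None, :]     # (1, d_model) embedding dims
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # even dims get sin, odd dims get cos
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

Because each dimension oscillates at a different wavelength, every position gets a unique code, and relative offsets correspond to fixed linear transforms of the encoding.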

#

<br/>

#

Published as a conference paper at ICLR 2021

> AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE -- Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗, Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,† (∗equal technical contribution, †equal advising) -- Google Research, Brain Team -- {adosovitskiy, neilhoulsby}@google.com

- ABSTRACT -- While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited ...

- https://arxiv.org/pdf/2010.11929.pdf
- Short Name -- Vision_Transformers__AlexeyDosovitskiy_2010.11929.pdf

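The paper's core move ("an image is worth 16x16 words") is to cut the image into fixed-size patches and feed them to a standard Transformer as a token sequence. A minimal NumPy sketch of that patch-embedding step follows; the function name, random projection, and dimensions are illustrative assumptions, not the authors' code (ViT learns the projection).

```python
import numpy as np

def image_to_patch_tokens(img, patch=16, d_model=64, seed=0):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Each patch is flattened and linearly projected to d_model dims,
    giving a (num_patches, d_model) sequence for the Transformer.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    rng = np.random.default_rng(seed)
    W_proj = rng.standard_normal((patch * patch * C, d_model)) * 0.02  # learned in the real model
    return patches @ W_proj

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 64): a 224x224 image yields 14*14 = 196 "words"
```

In the full model, a learnable class token and position embeddings are prepended/added before the encoder; those are omitted here.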
#

<br/>

#

> Transformers in Vision: A Survey
- https://arxiv.org/pdf/2101.01169.pdf

#

<br/>

#

> A Survey of Transformers -- TIANYANG LIN, YUXIN WANG, XIANGYANG LIU, and XIPENG QIU∗ -- School of Computer Science, Fudan University, China and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China

- ABSTRACT -- Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed, however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
| 54 | + |
- https://arxiv.org/pdf/2106.04554.pdf

- Transformer Attention Modules -- Query-Key-Value

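The Query-Key-Value attention module that all of these papers build on is scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (shapes and names are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over one head.

    Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_q, seq_k) similarities
    weights = softmax(scores)                        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
out, w = attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 6)
```

Multi-head attention runs several of these in parallel on learned projections of Q, K, V and concatenates the results; the 1/sqrt(d_k) scaling keeps the dot products from saturating the softmax as d_k grows.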
#

<br/>

#