YOUTUBE VIDEO -- https://www.youtube.com/watch?v=S27pHKBEp30&t=568s
Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been largely replaced by transformer-based models. The talk gives basic background on NLP and a brief history of supervised learning techniques on documents, from bag-of-words through vanilla RNNs and LSTMs, followed by a technical deep dive into how Transformers work with multi-headed self-attention and positional encoding. It includes sample code for applying these ideas to real-world projects.
@8:50 -- LSTMs: transfer learning does not work well
@10:30 -- "Attention Is All You Need" -- multi-head attention mechanism (see the sketch below)
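A minimal sketch of the multi-head self-attention idea discussed in the talk, assuming PyTorch; the class name, d_model=64, and n_heads=4 are illustrative assumptions, not the talk's actual sample code:

```python
# Hedged sketch: multi-head self-attention, assuming PyTorch.
# d_model, n_heads, and the toy shapes below are illustrative choices.
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One linear projection per role: queries, keys, values, output.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape

        def split(z):  # -> (batch, heads, seq_len, d_head)
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Scaled dot-product attention within each head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = torch.softmax(scores, dim=-1)
        heads = weights @ v  # (batch, heads, seq_len, d_head)
        # Concatenate the heads and project back to d_model.
        merged = heads.transpose(1, 2).contiguous().view(b, t, -1)
        return self.out_proj(merged)

# Toy usage: a batch of 2 "sentences" of 5 token embeddings each.
x = torch.randn(2, 5, 64)
print(MultiHeadSelfAttention()(x).shape)  # torch.Size([2, 5, 64])
```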
Vision Transformer (ViT) -- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" -- published as a conference paper at ICLR 2021
ABSTRACT -- While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited ...
https://arxiv.org/pdf/2010.11929.pdf
Short Name -- Vision_Transformers__AlexeyDosovitskiy_2010.11929.pdf
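As a hedged illustration of how the paper feeds images to a Transformer, a sketch of the patch-embedding step: cut the image into fixed-size patches, flatten each patch, and linearly project it into a token embedding. The patch size, embedding dimension, and image shape below are toy assumptions, not the paper's exact configuration.

```python
# Hedged sketch of ViT-style patch embedding (toy sizes, assuming PyTorch).
import torch
import torch.nn as nn

patch_size, d_model = 16, 64
image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)

# Unfold into non-overlapping 16x16 patches: 14*14 = 196 patches of 3*16*16 values.
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
patches = patches.contiguous().view(1, 3, -1, patch_size, patch_size)
patches = patches.permute(0, 2, 1, 3, 4).flatten(2)  # (1, 196, 768)

# Linear projection of the flattened patches into a token sequence
# that a standard Transformer encoder can consume.
embed = nn.Linear(3 * patch_size * patch_size, d_model)
tokens = embed(patches)
print(tokens.shape)  # torch.Size([1, 196, 64])
```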
A Survey of Transformers
ABSTRACT -- Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
https://arxiv.org/pdf/2106.04554.pdf
Transformer Attention Modules -- Query-Key-Value
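A minimal sketch of the query-key-value attention computation, softmax(QK^T / sqrt(d_k)) V, assuming PyTorch; the function name, shapes, and the optional mask argument are illustrative assumptions:

```python
# Hedged sketch: scaled dot-product attention over query/key/value matrices.
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V: tensors of shape (batch, seq_len, d_k)."""
    d_k = Q.size(-1)
    # Similarity of each query with every key, scaled by sqrt(d_k).
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    # Normalize scores into attention weights, then mix the values.
    weights = torch.softmax(scores, dim=-1)
    return weights @ V, weights

# Toy usage: 2 sequences of length 4 with d_k = 8.
Q, K, V = (torch.randn(2, 4, 8) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # torch.Size([2, 4, 8]) torch.Size([2, 4, 4])
```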