Attention is all you need #114

Merged
merged 1 commit into from
Sep 19, 2023
4 changes: 3 additions & 1 deletion readme/Arxiv_papers_README.md
@@ -168,4 +168,6 @@ These q-distributions are normally parameterized for each individual data point

However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. This neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder.
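The amortized encoder described above can be sketched in a few lines. This is a minimal numpy illustration, not the document's own code: the layer sizes, weight initializations, and function names (`encode`, `reparameterized_sample`) are hypothetical, and a real VAE would train these weights with a framework such as PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8-dim data points, 2-dim latent space.
x_dim, h_dim, z_dim = 8, 16, 2

# One shared (amortized) encoder network: the same weights are reused for
# every data point instead of fitting variational parameters per point.
W1 = rng.normal(0, 0.1, (x_dim, h_dim))
b1 = np.zeros(h_dim)
W_mu = rng.normal(0, 0.1, (h_dim, z_dim))
W_logvar = rng.normal(0, 0.1, (h_dim, z_dim))

def encode(x):
    """Map a batch of data points to the parameters (mean, log-variance)
    of their Gaussian variational distributions q(z|x)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W_mu, h @ W_logvar

def reparameterized_sample(mu, logvar):
    """Draw z ~ q(z|x) via the reparameterization trick."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

x = rng.normal(size=(4, x_dim))      # a batch of 4 data points
mu, logvar = encode(x)
z = reparameterized_sample(mu, logvar)
print(mu.shape, z.shape)
```

Because the encoder maps inputs to distribution parameters, inference for a new data point is a single forward pass rather than a fresh optimization.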

Binary file added readme/VIT___2101.01169.pdf
Binary file not shown.
64 changes: 64 additions & 0 deletions readme/arxiv_paper_readme.md
@@ -0,0 +1,64 @@

> LSTM is dead. Long Live Transformers!
- YOUTUBE VIDEO -- https://www.youtube.com/watch?v=S27pHKBEp30&t=568s
- Leo Dirac (@leopd) talks about how LSTM models for Natural Language Processing (NLP) have been largely replaced by transformer-based models. He gives basic background on NLP and a brief history of supervised learning techniques on documents, from bag-of-words through vanilla RNNs and LSTMs, then a technical deep dive into how Transformers work with multi-headed self-attention and positional encoding. Includes sample code for applying these ideas to real-world projects.

- @8:50 -- LSTM: transfer learning does not work well
- [@10:30](https://www.youtube.com/watch?v=S27pHKBEp30&t=630s) -- Attention Is All You Need -- multi-head attention mechanism
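One idea from the video's deep dive, positional encoding, is easy to sketch. Below is the sinusoidal scheme from "Attention Is All You Need", written as a standalone numpy function; the sequence length and model dimension are arbitrary toy values, not anything fixed by the video.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)
```

These encodings are simply added to the token embeddings, giving the otherwise order-blind attention mechanism a notion of position.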

#

<br/>

#

Published as a conference paper at ICLR 2021

> AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE -- Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗, Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,† -- ∗equal technical contribution, †equal advising -- Google Research, Brain Team {adosovitskiy, neilhoulsby}@google.com

- ABSTRACT - While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited ...

- https://arxiv.org/pdf/2010.11929.pdf
- Short Name -- Vision_Transformers__AlexeyDosovitskiy_2010.11929.pdf
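The paper's title describes its core move: split an image into 16x16 patches and treat each flattened patch as one "word" for a standard Transformer. A minimal numpy sketch of that patch embedding step, with toy dimensions and a random projection in place of a learned one:

```python
import numpy as np

rng = np.random.default_rng(0)

def image_to_patch_embeddings(img, patch=16, d_model=64, W=None):
    """Split an HxWxC image into non-overlapping patch x patch squares,
    flatten each square, and project it linearly to d_model dims --
    each patch becomes one token ('word') for the Transformer."""
    H, Wd, C = img.shape
    gh, gw = H // patch, Wd // patch
    patches = (img[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch * patch * C))
    if W is None:
        # In ViT this projection is learned; here it is random for illustration.
        W = rng.normal(0, 0.02, (patch * patch * C, d_model))
    return patches @ W

img = rng.random((224, 224, 3))       # one 224x224 RGB image
tokens = image_to_patch_embeddings(img)
print(tokens.shape)                   # 14x14 grid of patches -> 196 tokens
```

After this step (plus a class token and positional embeddings), the image is just a token sequence, so the rest of the model is a vanilla Transformer encoder.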

#

<br/>

#

> Transformers in Vision: A Survey
- https://arxiv.org/pdf/2101.01169.pdf


#

<br/>

#


> A Survey of Transformers - TIANYANG LIN, YUXIN WANG, XIANGYANG LIU, and XIPENG QIU∗, School of Computer Science, Fudan University, China and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China

- ABSTRACT -- Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.

- https://arxiv.org/pdf/2106.04554.pdf

- Transformer Attention Modules -- Query-Key-Value
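The Query-Key-Value attention module noted above reduces to one formula, Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V. A self-contained numpy sketch with toy shapes (5 tokens, model dimension 8) and random projection matrices standing in for learned ones:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 tokens, model dim 8
# Learned projections in a real model; random here for illustration.
Wq, Wk, Wv = (rng.normal(0, 0.1, (8, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)
```

Multi-head attention simply runs several of these in parallel on lower-dimensional projections and concatenates the results.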

#

<br/>

#

30 changes: 29 additions & 1 deletion readme/todo_list_readme.md
@@ -1,5 +1,17 @@

- [pytorch_unet](https://github.com/RohitDhankar/PyTorch_1/blob/master/src/unet_pytorch_2.py)
- [pytorch_unet_1](https://cs231n.github.io/convolutional-networks/)
- [input_and_output_volume](https://cs231n.github.io/convolutional-networks/)
- [receptive_field_OR_context](https://cs231n.github.io/convolutional-networks/)
- [VISUALIZE_Network](https://github.com/microsoft/tensorwatch)
- [VISUALIZE_receptive_field_OR_context](https://github.com/shelfwise/receptivefield)

- [VISUALIZE_NEURALNET_LAYER_OUTPUTS]
- [XAI_Explainable_AI_HuggingFaceModels](https://jacobgil.github.io/pytorch-gradcam-book/HuggingFace.html)
- [Class_Activation_Maps](https://jacobgil.github.io/pytorch-gradcam-book/Class%20Activation%20Maps%20for%20Object%20Detection%20With%20Faster%20RCNN.html)
- [Gradient_Class_Activation_Maps](https://jacobgil.github.io/pytorch-gradcam-book/Class%20Activation%20Maps%20for%20Object%20Detection%20With%20Faster%20RCNN.html)



- [CycleGAN-pix2pix--pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)
@@ -117,7 +129,23 @@ with open("logreg_iris.onnx", "wb") as f:

```

#


- [GEO_GIS_Intro_Init_UCLA](https://www.youtube.com/watch?v=gi4UdFsayoM)
- [GEO_GIS__Census_Data_Analysis_Mapping](https://www.youtube.com/watch?v=rrGw6ct-Cbw)
- [GEO_GIS__Spatial_Statistics_with_Python](https://www.youtube.com/watch?v=B_LHPRVEOvs)

#

- [Forecasting_TimeSeries_LinearModels]
- [Forecasting_TimeSeries_SARIMAX]

#

- [Forecasting_TimeSeries_DeepLearning]
- [Corrformer_PyTorch](https://github.com/thuml/Corrformer)
- [Anomaly_Transformer_Time_Series_Anomaly_Detection](https://github.com/thuml/Anomaly-Transformer)