@@ -78,6 +78,8 @@ <h2><a href="xl/index.html">Transformer XL</a></h2>
 <p>This implements the Transformer XL model using <a href="xl/relative_mha.html">relative multi-head attention</a></p>
 <h2><a href="rope/index.html">Rotary Positional Embeddings</a></h2>
 <p>This implements Rotary Positional Embeddings (RoPE)</p>
+<h2><a href="retro/index.html">RETRO</a></h2>
+<p>This implements the Retrieval-Enhanced Transformer (RETRO).</p>
 <h2><a href="compressive/index.html">Compressive Transformer</a></h2>
 <p>This is an implementation of the compressive transformer, which extends <a href="xl/index.html">Transformer XL</a> by compressing the oldest memories to give a longer attention span.</p>
 <h2><a href="gpt/index.html">GPT Architecture</a></h2>
@@ -111,10 +113,10 @@ <h2><a href="hour_glass/index.html">Hourglass</a></h2>
 
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">106</span><span></span><span class="kn">from</span> <span class="nn">.configs</span> <span class="kn">import</span> <span class="n">TransformerConfigs</span>
-<span class="lineno">107</span><span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">TransformerLayer</span><span class="p">,</span> <span class="n">Encoder</span><span class="p">,</span> <span class="n">Decoder</span><span class="p">,</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">EncoderDecoder</span>
-<span class="lineno">108</span><span class="kn">from</span> <span class="nn">.mha</span> <span class="kn">import</span> <span class="n">MultiHeadAttention</span>
-<span class="lineno">109</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.xl.relative_mha</span> <span class="kn">import</span> <span class="n">RelativeMultiHeadAttention</span></pre></div>
+<div class="highlight"><pre><span class="lineno">109</span><span></span><span class="kn">from</span> <span class="nn">.configs</span> <span class="kn">import</span> <span class="n">TransformerConfigs</span>
+<span class="lineno">110</span><span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">TransformerLayer</span><span class="p">,</span> <span class="n">Encoder</span><span class="p">,</span> <span class="n">Decoder</span><span class="p">,</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">EncoderDecoder</span>
+<span class="lineno">111</span><span class="kn">from</span> <span class="nn">.mha</span> <span class="kn">import</span> <span class="n">MultiHeadAttention</span>
+<span class="lineno">112</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.xl.relative_mha</span> <span class="kn">import</span> <span class="n">RelativeMultiHeadAttention</span></pre></div>
 </div>
 </div>
 <div class='footer'>
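The index entries above only link to the annotated implementations. As a quick illustration of the technique behind the "Rotary Positional Embeddings" entry, here is a minimal NumPy sketch of RoPE: consecutive feature pairs at position m are rotated by angles m * theta_i with theta_i = 10000^(-2i/d). The function name `apply_rope` is ours, and this is an illustrative sketch following the RoPE formulation, not the `labml_nn` code.

```python
import numpy as np

def apply_rope(x: np.ndarray) -> np.ndarray:
    """Apply rotary positional embeddings to x of shape (seq_len, d), d even.

    Each feature pair (x_{2i}, x_{2i+1}) at position m is rotated by the
    angle m * theta_i, where theta_i = 10000^(-2i/d).
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    i = np.arange(d // 2)
    theta = 10000.0 ** (-2.0 * i / d)        # (d/2,) rotation frequencies
    angles = np.arange(seq_len)[:, None] * theta  # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]          # even / odd feature pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin       # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair undergoes a pure rotation, norms are preserved and the dot product between a rotated query at position m and a rotated key at position n depends only on the relative offset n - m, which is what makes RoPE a relative encoding.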