Fix to "perhaps there is a misprint at line 40 #2111";
review of referenced paper https://arxiv.org/pdf/1706.03762.pdf section 3.2.3 suggests:
"Similarly, self-attention layers in the decoder allow each position in the decoder to attend to
all positions in the decoder up to and including that position. We need to prevent leftward
information flow in the decoder to preserve the auto-regressive property. We implement this
inside of scaled dot-product attention by masking out (setting to −∞) all values in the input
of the softmax which correspond to illegal connections. See Figure 2."
Thus the suggested change of the reference from nn.TransformerEncoder to nn.TransformerDecoder seems reasonable.
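
For reference, the "setting to −∞" masking described in the quoted passage is the causal (square subsequent) mask; a minimal sketch of how it can be built in PyTorch (the function name here is illustrative, not part of the tutorial code):

```python
import torch

def subsequent_mask(size: int) -> torch.Tensor:
    # Positions above the main diagonal (future tokens) are set to -inf so the
    # softmax inside scaled dot-product attention assigns them zero weight,
    # preventing leftward information flow as described in section 3.2.3.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)

# Example: a 4-token sequence; row i may attend to positions 0..i only.
print(subsequent_mask(4))
```

This matches the behavior of nn.Transformer.generate_square_subsequent_mask, which is the kind of mask the tutorial applies to the self-attention layers.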