Transformer Language Model Tutorial - Incorrect Attention Mask Description #1877
Comments
@montyevans Makes sense! Please feel free to send a PR.
@montyevans @hudeven To add to this, the transformer (at least as it's implemented in "Attention Is All You Need") does not mask self-attention in the encoder; rather, the upper-triangular mask is for the self-attention blocks in the decoder. We want the encoder to see the whole sequence, but the decoder should only be able to see the tokens it has predicted up to that point, hence the mask. The tutorial applies the mask to the encoder instead. Hope this helps.
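For reference, here is a minimal sketch of the causal ("subsequent") mask being discussed, assuming plain PyTorch and no tutorial-specific code. This is the decoder-style self-attention mask; an encoder that should see the whole sequence would simply pass no mask.

```python
# Minimal sketch (assumption: plain PyTorch, nothing tutorial-specific) of the
# causal / "subsequent" mask discussed above. In "Attention Is All You Need"
# this mask belongs to decoder self-attention; an encoder meant to see the
# whole sequence would pass no mask at all.
import torch

def causal_mask(sz: int) -> torch.Tensor:
    # -inf strictly above the diagonal: position i may attend only to j <= i.
    return torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)

print(causal_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```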
/assigntome
The tutorial is using a transformer encoder and the mask used was for masking a decoder which is not part of the tutorial. The mask is removed. Some variable names are changed to better reflect the purpose of the variable. Also, some unused imports are removed. Fixes pytorch#1877 Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
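A hedged sketch of the change the PR describes: calling nn.TransformerEncoder without the causal mask, so every position attends over the full input. The layer sizes below are illustrative, not taken from the tutorial.

```python
# Sketch only: illustrative sizes, not the tutorial's actual hyperparameters.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=200, nhead=2)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.rand(35, 20, 200)  # (seq_len, batch, d_model), batch_first=False

# Before the fix: encoder(src, mask=causal_mask)  # decoder-style masking
# After the fix:  no mask, so self-attention is bidirectional over the sequence
output = encoder(src)
```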
The tutorial Language Modeling With nn.Transformer and Torchtext describes an attention mask that will prevent nn.TransformerEncoder from attending to not-yet-seen tokens. I think the mask is actually upper triangular: a square mask would not prevent a token from attending to future tokens.
I may have misunderstood the mask description; if not, happy to write the PR that fixes this.
cc @pytorch/team-text-core @Nayef211
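To illustrate the "square" versus "upper triangular" point: the helper the tutorial relies on, nn.Transformer.generate_square_subsequent_mask, returns a matrix that is square in shape, but the blocking pattern inside it is upper triangular, with -inf above the diagonal preventing attention to future positions. A quick check, assuming a recent PyTorch release where this helper is exposed as a static method:

```python
# Quick check (assumes a recent PyTorch where this static helper exists on
# nn.Transformer; older tutorial code built the same pattern by hand).
import torch.nn as nn

mask = nn.Transformer.generate_square_subsequent_mask(4)
print(mask)
# Square in shape, upper-triangular in pattern:
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```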