Transformer Language Model Tutorial - Incorrect Attention Mask Description #1877

Closed
zoeqevans opened this issue Mar 28, 2022 · 3 comments · Fixed by #2423
Labels
docathon-h1-2023 (A label for the docathon in H1 2023), medium, module: torchtext

Comments


zoeqevans commented Mar 28, 2022

The tutorial Language Modeling With nn.Transformer and Torchtext describes an attention mask that will prevent nn.TransformerEncoder from attending to not-yet-seen tokens:

Along with the input sequence, a square attention mask is required because the self-attention layers in nn.TransformerEncoder are only allowed to attend the earlier positions in the sequence.

I think the mask is actually upper triangular: being square alone would not prevent a token from attending to future tokens.

I may have misunderstood the mask description; if not, I'm happy to write the PR that fixes this.
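For concreteness, here is a minimal sketch of the kind of mask in question (my own illustration, not code taken from the tutorial): the mask is square in shape, but it is the `-inf` entries in its upper triangle that actually block attention to future positions.

```python
import torch
import torch.nn as nn

sz = 5
# Square (sz x sz) causal mask: -inf strictly above the diagonal, 0 elsewhere.
# When added to the attention scores, position i can only attend to positions <= i.
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0., 0., -inf, -inf, -inf],
#         [0., 0., 0., -inf, -inf],
#         [0., 0., 0., 0., -inf],
#         [0., 0., 0., 0., 0.]])

# Recent PyTorch also ships an equivalent helper (a static method in newer versions):
same_mask = nn.Transformer.generate_square_subsequent_mask(sz)
```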

cc @pytorch/team-text-core @Nayef211


hudeven commented Oct 6, 2022

@montyevans Makes sense! Please feel free to send a PR.

@keith-golden

@montyevans @hudeven To add to this, the transformer (at least as implemented in "Attention Is All You Need") does not mask self-attention in the encoder; rather, the upper-triangular mask is meant for the self-attention blocks in the decoder.

We want the encoder to see the whole sequence, but the decoder should only be able to see the tokens it has predicted up to that point, hence the mask. The tutorial applies the mask to the encoder instead. Hope this helps.
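To illustrate the distinction, here is a rough sketch with toy dimensions of my own (not the tutorial's code): the tutorial passes the causal mask to `nn.TransformerEncoder`, whereas the original architecture would leave encoder self-attention unmasked and reserve that mask for the decoder's `tgt_mask`.

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen only for illustration.
d_model, nhead, seq_len, batch = 16, 2, 5, 3
layer = nn.TransformerEncoderLayer(d_model, nhead)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.rand(seq_len, batch, d_model)  # (S, N, E) layout (batch_first=False)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

# What the tutorial does: decoder-style causal masking inside the encoder.
out_masked = encoder(src, mask=causal_mask)

# What "Attention Is All You Need" does for its encoder: no mask, so every
# position attends over the full sequence; the causal mask would instead be
# passed as tgt_mask to a decoder.
out_unmasked = encoder(src)
```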

@svekars added the medium and docathon-h1-2023 labels May 31, 2023
@bjhargrave

/assigntome

bjhargrave added a commit to bjhargrave/pytorch-tutorials that referenced this issue Jun 5, 2023
The tutorial uses a transformer encoder, and the mask it applied is the kind used to mask a decoder, which is not part of the tutorial.

The mask is removed. Some variable names are changed to better reflect their purpose, and some unused imports are removed.

Fixes pytorch#1877

Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
svekars pushed a commit that referenced this issue Jun 5, 2023