Transformer Language Model Tutorial - Incorrect Attention Mask Description #1877

Closed
zoeqevans opened this issue Mar 28, 2022 · 3 comments · Fixed by #2423
Labels
docathon-h1-2023 (A label for the docathon in H1 2023), medium, module: torchtext

Comments


zoeqevans commented Mar 28, 2022

The tutorial Language Modeling With nn.Transformer and Torchtext describes an attention mask that will prevent nn.TransformerEncoder from attending to not-yet-seen tokens:

Along with the input sequence, a square attention mask is required because the self-attention layers in nn.TransformerEncoder are only allowed to attend the earlier positions in the sequence.

I think the mask is actually upper triangular: being square alone would not prevent a token from attending to future tokens.

I may have misunderstood the mask description; if not, I'm happy to write the PR that fixes this.
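For concreteness, here is a minimal sketch of the kind of mask in question (my own illustration, not code taken from the tutorial): the mask is square in shape, but it is the `-inf` entries in its upper triangle that actually block attention to future positions.

```python
import torch
import torch.nn as nn

sz = 5
# Square (sz x sz) causal mask: -inf strictly above the diagonal, 0 elsewhere.
# When added to the attention scores, position i can only attend to positions <= i.
mask = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0., 0., -inf, -inf, -inf],
#         [0., 0., 0., -inf, -inf],
#         [0., 0., 0., 0., -inf],
#         [0., 0., 0., 0., 0.]])

# Recent PyTorch also ships an equivalent helper (a static method in newer versions):
same_mask = nn.Transformer.generate_square_subsequent_mask(sz)
```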

cc @pytorch/team-text-core @Nayef211


hudeven commented Oct 6, 2022

@montyevans Makes sense! Please feel free to send a PR.

@keith-golden

@montyevans @hudeven To add to this, the transformer (at least as implemented in "Attention Is All You Need") does not mask self-attention in the encoder; rather, the upper-triangular mask is meant for the self-attention blocks in the decoder.

We want the encoder to see the whole sequence, but the decoder should only be able to see the tokens it has predicted up to that point, hence the mask. The tutorial applies the mask to the encoder instead. Hope this helps.
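To illustrate the distinction, here is a rough sketch with toy dimensions of my own (not the tutorial's code): the tutorial passes the causal mask to `nn.TransformerEncoder`, whereas the original architecture would leave encoder self-attention unmasked and reserve that mask for the decoder's `tgt_mask`.

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen only for illustration.
d_model, nhead, seq_len, batch = 16, 2, 5, 3
layer = nn.TransformerEncoderLayer(d_model, nhead)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.rand(seq_len, batch, d_model)  # (S, N, E) layout (batch_first=False)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

# What the tutorial does: decoder-style causal masking inside the encoder.
out_masked = encoder(src, mask=causal_mask)

# What "Attention Is All You Need" does for its encoder: no mask, so every
# position attends over the full sequence; the causal mask would instead be
# passed as tgt_mask to a decoder.
out_unmasked = encoder(src)
```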

@svekars added the medium and docathon-h1-2023 labels May 31, 2023
@bjhargrave

/assigntome

bjhargrave added a commit to bjhargrave/pytorch-tutorials that referenced this issue Jun 5, 2023
The tutorial uses a transformer encoder, and the mask it applied is the kind used to mask a decoder, which is not part of the tutorial.

The mask is removed. Some variable names are changed to better reflect their purpose, and some unused imports are removed.

Fixes pytorch#1877

Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
svekars pushed a commit that referenced this issue Jun 5, 2023