Skip to content

Conversation

@miss-islington
Copy link
Contributor

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that. However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were successfully decoded. The fix is
simple: do the unknown-8bit encoding using the utf-8 codec. This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray rdmurray@bitdance.com

…-142517)

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that.  However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were *successfully* decoded.  The fix is
simple: do the unknown-8bit encoding using the utf-8 codec.  This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
…G4hbe.rst

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@bitdancer bitdancer merged commit 8802556 into python:3.13 Dec 24, 2025
40 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants