@bitdancer
Member

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that. However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were successfully decoded. The fix is
simple: do the unknown-8bit encoding using the utf-8 codec. This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
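
To make the failure mode concrete, here is a minimal sketch that uses only plain codec calls with the `surrogateescape` error handler. It is not the email package's internal code; the helper names `decode_lenient` and `can_reencode` and the sample byte strings are hypothetical, chosen for illustration. It assumes the undecodable bytes are carried as lone surrogates, which is how `surrogateescape` represents them.

```python
# Illustrative only: plain codec calls, not the email package's internal code.

def decode_lenient(raw: bytes, charset: str) -> str:
    """Decode raw, mapping undecodable bytes to lone surrogates (U+DC80..U+DCFF)."""
    return raw.decode(charset, errors="surrogateescape")

def can_reencode(text: str, codec: str) -> bool:
    """Return True if text can be encoded with codec plus surrogateescape."""
    try:
        text.encode(codec, errors="surrogateescape")
    except UnicodeEncodeError:
        return False
    return True

# Case 1: an unknown byte in an otherwise ASCII message.
ascii_raw = b"hello \xff"
ascii_text = decode_lenient(ascii_raw, "ascii")

# Case 2: a payload declared as utf-8 that contains one invalid byte.
# The 'é' decodes successfully; only the trailing 0xff is escaped.
mixed_raw = "café ".encode("utf-8") + b"\xff"
mixed_text = decode_lenient(mixed_raw, "utf-8")

print(can_reencode(ascii_text, "ascii"))   # True: only ASCII plus escaped bytes
print(can_reencode(mixed_text, "ascii"))   # False: 'é' is not ASCII
print(can_reencode(mixed_text, "utf-8"))   # True: utf-8 handles both
print(mixed_text.encode("utf-8", errors="surrogateescape") == mixed_raw)  # True
```

The last line shows that encoding with utf-8 restores the original bytes exactly, so someone attempting recovery still sees the unknown byte unchanged.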

@bitdancer bitdancer requested a review from a team as a code owner December 10, 2025 14:50
@bitdancer bitdancer self-assigned this Dec 10, 2025
@bitdancer bitdancer force-pushed the undecodable_encoded_words branch from 4ae90b4 to 1bba134 Compare December 10, 2025 15:04
@bitdancer
Member Author

Does anyone want to review this, or shall I just merge it?

@bitdancer bitdancer merged commit 1e17ccd into python:main Dec 24, 2025
48 checks passed
@bitdancer bitdancer added the "needs backport to 3.13" (bugs and security fixes) and "needs backport to 3.14" (bugs and security fixes) labels Dec 24, 2025
@miss-islington-app

Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 24, 2025
…-142517)

(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
@bedevere-app

bedevere-app bot commented Dec 24, 2025

GH-143146 is a backport of this pull request to the 3.14 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 24, 2025
…-142517)

(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
@bedevere-app bedevere-app bot removed the "needs backport to 3.14" (bugs and security fixes) label Dec 24, 2025
@bedevere-app

bedevere-app bot commented Dec 24, 2025

GH-143147 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.13" (bugs and security fixes) label Dec 24, 2025
bitdancer added a commit to bitdancer/cpython that referenced this pull request Dec 24, 2025
bitdancer added a commit that referenced this pull request Dec 24, 2025
bitdancer added a commit that referenced this pull request Dec 24, 2025
…H-142517) (#143147)

(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
bitdancer added a commit that referenced this pull request Dec 24, 2025
…H-142517) (#143146)

(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@bitdancer bitdancer deleted the undecodable_encoded_words branch December 24, 2025 18:21