encoding: make TextDecoder handle BOM correctly by addaleax · Pull Request #30132 · nodejs/node

addaleax · 2019-10-26T15:14:56Z

Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315
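
As a rough illustration of the behaviour described above (a sketch, not taken from the PR's tests), the snippet below assumes a Node.js release that includes this change and uses the global `TextDecoder`. The variable names are made up for the example.

```javascript
// Assumption: runs on a Node.js version that contains this fix.

// 1. A UTF-8 BOM (EF BB BF) split across two streamed chunks is still dropped.
const streaming = new TextDecoder('utf-8');
let streamed = '';
streamed += streaming.decode(Uint8Array.of(0xEF, 0xBB), { stream: true });
streamed += streaming.decode(Uint8Array.of(0xBF, 0x61), { stream: true });
streamed += streaming.decode(); // final call without `stream` flushes the decoder
console.log(JSON.stringify(streamed)); // "a"

// 2. A UTF-16LE BOM (FF FE) is not a BOM for a UTF-8 decoder. Each byte is
//    invalid UTF-8 and is replaced by U+FFFD rather than silently discarded.
const foreignBom = new TextDecoder('utf-8')
  .decode(Uint8Array.of(0xFF, 0xFE, 0x61));
console.log([...foreignBom].map((c) => c.codePointAt(0).toString(16)));
// [ 'fffd', 'fffd', '61' ]

// 3. With ignoreBOM: true, a matching BOM is kept in the output as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true })
  .decode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x61));
console.log([...kept].map((c) => c.codePointAt(0).toString(16)));
// [ 'feff', '61' ]
```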

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

nodejs-github-bot · 2019-10-26T15:16:16Z

CI: https://ci.nodejs.org/job/node-test-pull-request/26209/

nodejs-github-bot · 2019-11-01T12:45:33Z

CI: https://ci.nodejs.org/job/node-test-pull-request/26282/

nodejs-github-bot · 2019-11-04T07:56:51Z

CI: https://ci.nodejs.org/job/node-test-pull-request/26317/

addaleax · 2019-11-05T19:29:54Z

Landed in 237be2e

Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode).

Fixes: #25315
PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>

srl295

missed this discussion. Code looks OK, I'm not familiar with this spec. But shouldn't ICU do this in ucnv_toUnicode?

addaleax · 2019-11-05T19:45:32Z

@srl295 I don’t know, should it drop a BOM at the beginning of the text automatically? The current behaviour is that it doesn’t. Unless maybe there’s a flag in there that we could use and that I don’t know about? (And we do also at least need one way to make it not drop a BOM at the beginning of the text.)

srl295 · 2019-11-05T20:26:25Z

the converter names can take parameters such as "ISO_2022,locale=ja,version=4" - if ICU should support these options for whatwg compliance, please open an issue on https://unicode-org.atlassian.net under the ICU project and mention nodejs

addaleax · 2019-11-05T20:39:30Z

@srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs.

All I know is that it’s relatively easy for Node.js to work with what ICU currently provides.
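
One way to picture this (a minimal sketch, not the code from this PR): decode with a converter that never drops the BOM itself, then remove a leading U+FEFF from the first non-empty output. `BomDroppingDecoder` and its fields are hypothetical names, and the inner `TextDecoder` with `ignoreBOM: true` only stands in for such a converter.

```javascript
// Hypothetical wrapper, for illustration only.
class BomDroppingDecoder {
  constructor(encoding = 'utf-8') {
    // ignoreBOM: true makes the inner decoder pass a BOM through as U+FEFF,
    // standing in for a converter that does not drop it on its own.
    this.inner = new TextDecoder(encoding, { ignoreBOM: true });
    this.bomSeen = false;
  }

  decode(chunk = new Uint8Array(0), { stream = false } = {}) {
    let text = this.inner.decode(chunk, { stream });
    // Only look for the BOM once some output has actually been produced, so a
    // BOM spread over several chunks is handled when its last byte arrives.
    if (!this.bomSeen && text.length > 0) {
      if (text.charCodeAt(0) === 0xFEFF) text = text.slice(1);
      this.bomSeen = true;
    }
    if (!stream) this.bomSeen = false; // end of stream: the next call starts fresh
    return text;
  }
}

const wrapper = new BomDroppingDecoder('utf-8');
const parts = [
  wrapper.decode(Uint8Array.of(0xEF, 0xBB), { stream: true }),
  wrapper.decode(Uint8Array.of(0xBF, 0x68, 0x69), { stream: true }),
  wrapper.decode(),
];
console.log(JSON.stringify(parts.join(''))); // "hi"
```

Because the check runs on decoded output rather than on raw bytes, bytes that merely look like another encoding's BOM never decode to U+FEFF and are therefore left alone.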

srl295 · 2019-11-05T20:59:52Z

> @srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs.
>
> All I know is that it’s relatively easy for Node.js to work with what ICU currently provides.

OK. I'll file an issue then.

nodejs-github-bot added the c++ and lib / src labels Oct 26, 2019

addaleax added the encoding and review wanted labels and removed the lib / src label Nov 1, 2019

encoding: make TextDecoder handle BOM correctly

a30272b

addaleax force-pushed the textdecoder-bom branch from b6414f5 to a30272b November 1, 2019 12:45

devsnek approved these changes Nov 4, 2019

addaleax added the author ready label and removed the review wanted label Nov 4, 2019

addaleax closed this Nov 5, 2019

addaleax deleted the textdecoder-bom branch November 5, 2019 19:30

srl295 assigned addaleax Nov 5, 2019

srl295 reviewed Nov 5, 2019

BridgeAR mentioned this pull request Nov 19, 2019

v13.2.0 proposal #30547

Merged

BethGriggs mentioned this pull request Dec 9, 2019

v12.13.2 proposal #30865

Closed

BethGriggs mentioned this pull request Dec 23, 2019

v12.14.1 proposal #31069

Merged

phated mentioned this pull request Apr 18, 2022

lib: ensure TextDecoder only removes utf8 BOM on utf8 encoding #42779

Closed

addaleax wants to merge 1 commit into nodejs:master from addaleax:textdecoder-bom