encoding: make TextDecoder handle BOM correctly#30132
encoding: make TextDecoder handle BOM correctly#30132addaleax wants to merge 1 commit intonodejs:masterfrom
Conversation
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: nodejs#25315
b6414f5 to
a30272b
Compare
|
Landed in 237be2e |
srl295
left a comment
There was a problem hiding this comment.
missed this discussion. Code looks OK, I'm not familiar with this spec. But shouldn't ICU do this in ucnv_toUnicode?
|
@srl295 I don’t know, should it drop a BOM at the beginning of the text automatically? The current behaviour is that it doesn’t. Unless maybe there’s a flag in there that we could use and that I don’t know about? (And we do also at least need one way to make it not drop a BOM at the beginning of the text.) |
|
the converter names can take parameters such as "ISO_2022,locale=ja,version=4” - if ICU should support these options for whatwg compliance, please open an issue on https://unicode-org.atlassian.net under the ICU project and mention nodejs
|
|
@srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs. All I know is that it’s relatively easy for Node.js to work with what ICU currently provides. |
OK. I'll file an issue then. |
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).
Fixes: #25315
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passes