url: fix WHATWG URL host formatting error caused by port string#20493
url: fix WHATWG URL host formatting error caused by port string#20493peakji wants to merge 4 commits intonodejs:masterfrom peakji:master
Conversation
|
This is a pretty serious bug and I'm wondering if these ambiguous names (host vs hostname) affected more stuffs in the code base? |
|
cc @nodejs/url |
apapirovski
left a comment
There was a problem hiding this comment.
The logic needs to be fixed slightly.
lib/internal/url.js
Outdated
| ret += options.unicode ? | ||
| domainToUnicode(this.host) : this.host; | ||
| domainToUnicode(this.hostname) : this.hostname; | ||
| if (ctx.port !== '') |
There was a problem hiding this comment.
I think this should be ctx.port !== null?
There was a problem hiding this comment.
The WHATWG URL API uses '' as the empty value, not null.
There was a problem hiding this comment.
> new url.URL("https://nodejs.org/en/")
URL {
href: 'https://nodejs.org/en/',
origin: 'https://nodejs.org',
protocol: 'https:',
username: '',
password: '',
host: 'nodejs.org',
hostname: 'nodejs.org',
port: '',
pathname: '/en/',
search: '',
searchParams: URLSearchParams {},
hash: '' }There was a problem hiding this comment.
Oh my bad! This is the internal context...
lib/internal/url.js
Outdated
| const ctx = this[context]; | ||
| var ret = ctx.scheme; | ||
| if (ctx.host !== null) { | ||
| if (ctx.hostname !== null) { |
There was a problem hiding this comment.
This should also be ctx.host !== null. The hostname change only applies below.
There was a problem hiding this comment.
You're right, URLContext uses null:
const { URL } = require('url');
const myURL = new URL('https://nodejs.org/en/');
const kContext = Object.getOwnPropertySymbols(myURL)[0];
console.log(myURL[kContext]);URLContext {
flags: 400,
scheme: 'https:',
username: '',
password: '',
host: 'nodejs.org',
port: null,
path: [ 'en', '' ],
query: null,
fragment: null }
|
FWIW you might want to contribute your test case back to https://github.com/w3c/web-platform-tests/tree/master/url — clearly this is a case that's not covered by the official tests. |
apapirovski
left a comment
There was a problem hiding this comment.
Should be good to go now.
Although I agree with this general sentiment, the url.format() API is nonstandard, so it doesn't make sense to test this in the official tests. |
|
Yeah, the Also as the name implies |
|
@jasnell @nodejs/url I don't know if it makes sense to throw in the non-internal part of our JS API when the non-standard methods like |
|
Throwing is fine, I think. |
|
CI re-run for node-test-commit-arm-fanned: https://ci.nodejs.org/job/node-test-commit-arm-fanned/977/ |
|
@joyeecheung I can’t really tell whether your suggestion is something for this PR or not – should we wait with landing this or is it good to go? |
|
@trivikr Could you help me to re-run node-test-commit as well? |
|
@addaleax This PR is good to go, the error handling can be changed in another PR |
|
Landed in 2a96ee2 Congrats on your first commit in Node.js core and becoming a Contributor! 🎉 |
The current url.format implementation will return an invalid URL string without the host if there is a port and unicode: true. This unexpected behavior is caused by domainToUnicode, which expects a hostname instead of a host string according to node_url.cc. Adds both a fix and a test for the issue. PR-URL: #20493 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
The current url.format implementation will return an invalid URL string without the host if there is a port and unicode: true. This unexpected behavior is caused by domainToUnicode, which expects a hostname instead of a host string according to node_url.cc. Adds both a fix and a test for the issue. PR-URL: #20493 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
The current url.format implementation will return an invalid URL string without the host if there is a port and unicode: true. This unexpected behavior is caused by domainToUnicode, which expects a hostname instead of a host string according to node_url.cc. Adds both a fix and a test for the issue. PR-URL: #20493 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
The current url.format implementation will return an invalid URL string without the host if there is a port and unicode: true. This unexpected behavior is caused by domainToUnicode, which expects a hostname instead of a host string according to node_url.cc. Adds both a fix and a test for the issue. PR-URL: #20493 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
The current url.format implementation will return an invalid URL string without the host if there is a port and
unicode: true.This unexpected behavior is caused by domainToUnicode, which expects a hostname instead of a host string according to node_url.cc.
For example:
url.domainToUnicode('xn--3kq375bonc.com')will be correctly converted into Chinese'搞事情.com', buturl.domainToUnicode('xn--3kq375bonc.com:80')would produce an empty string.Code to reproduce
References
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passes