-
Notifications
You must be signed in to change notification settings - Fork 8k
ext/uri: Fix the distinction between an empty and a missing query/fragment for WHATWG URLs #20208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: PHP-8.5
Are you sure you want to change the base?
Conversation
ndossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to give this a proper look tomorrow. Looking at the tests the output makes sense though.
ndossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks right to me.
As a follow-up (likely master-only), you can use ZVAL_STRINGL_FAST which will avoid allocating empty strings (and 1-byte-long strings) by using interned strings.
TimWolla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see, the WHATWG URL standard treats missing and empty queries and fragments the same with regard to the getters. The difference is however preserved when recomposing the URL (which is also the case for the toAsciiString() method in PHP already) - unless explicitly overwritten. Which makes for funky behavior:
> u = new URL('https://example.com/?#');
URL {
href: 'https://example.com/?#',
origin: 'https://example.com',
protocol: 'https:',
username: '',
password: '',
host: 'example.com',
hostname: 'example.com',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
> u.search
''
> u.hash
''
> u.toString()
'https://example.com/?#'
> u.hash = u.hash
''
> u.search = u.search
''
> u.toString()
'https://example.com/'
It thus makes sense to me to consider this another case of:
Getters of Uri\WhatWg\Url have a few gotchas for the ones who are inherently familiar with the WHATWG URL specification: they don't (entirely) follow the “getter steps” that are defined by the specification, but the individual components are returned directly without any other changes that the “getter steps” would otherwise specify.
And make them differ between missing and empty to be consistent with the recomposition and with Rfc3986.
TimWolla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which makes for funky behavior:
I am seeing that this also exists with PHP after applying the patch:
$url1 = Uri\WhatWg\Url::parse("https://example.com/?#");
$url2 = $url1->withFragment($url1->getFragment());
var_dump($url1->getFragment()); // string(0) ""
var_dump($url2->getFragment()); // NULL
var_dump($url2->toAsciiString()); // string(21) "https://example.com/?"
So perhaps the correct solution is to not make getFragment() and getQuery() nullable and always return an empty string? That should still be okay for “baseline compatibility” between RFC 3986 and WHATWG.
Selecting “Request Changes” for visibility.
|
Similarly for the username: I'd argue that the username is empty in this case, not missing. It currently returns Here the |
BTW It's possible to "fix" this case by the following changes: :D I love that there's this loophole TBH... 😅 And I find it important to have a distinction between empty/missing, especially if toString() returns the leading
As sad as it is, I'm OK to get rid of nullability here.. It doesn't look right after seeing what happens in case of |
After thinking about this a little bit more.. I'm not sure anymore if it's an issue at all. Apparently, WHATWG URL doesn't give a big importance for the distinction between missing and empty: this is best observed in the getter steps. When passing an empty string to the What matters more for me is that:
All these seem more important to me than completely symmetrical results when an empty query/fragment is passed to a wither. |
fc0e725 to
f8d4171
Compare
f8d4171 to
3868a1f
Compare
|
It'll be soon time to decide on this so that it can either make or not make the final 8.5.0. |
…gment for WHATWG URLs
3868a1f to
bf70731
Compare
I've requested another review from myself and hope to take a look tomorrow. In any case, I believe this is something that can be treated as a “bugfix” even after the release and I prefer not to make any changes we are not sure of, because then things might change again after release anyways. |
DanielEScherzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RM approval, technical review not performed
|
So what's the state of this now that 8.5.0 has been branched? |
I don't think “custom recomposition” is something we should encourage or support. |
TimWolla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this more, looking at the WHATWG URL Standard and also the Lexbor internals.
With regard to my previous comment:
So perhaps the correct solution is to not make getFragment() and getQuery() nullable and always return an empty string?
Moving these cases to “empty string” certainly is going into the right direction.
|
Thanks for the reviews! Do we want to cherry-pick this to 8.5.0 or it should rather go to 8.5.1? I'm asking this because of the NEWS entry. cc. @DanielEScherzer @edorian |
Split from #19970