-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add internal URI handling API #19073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first remarks. Did not yet look at everything.
ext/uri/php_uri.c
Outdated
if (uri_handler_name == NULL) { | ||
return uri_handler_by_name("parse_url", sizeof("parse_url") - 1); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to parse_url
in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url"
explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to parse_url
here works because that's the default indeed where php_uri_get_handler()
is called, the other "backends" can only be used if the config is explicitly passed (not null).
The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string
when the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name
function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preliminary review round
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The switch from zend_string to the pointer-length pair seems to have been a good idea
I'll try to take another look tomorrow. |
@nielsdos Do you see anything that I should fix before merging it? I'd like to implement some of the cleanups that we discussed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good, FWIW I agree with Tim on UNREACHABLE
return property_handler->read_func(internal_uri, read_mode, zv); | ||
zend_result result = property_handler->read_func(internal_uri, read_mode, zv); | ||
|
||
ZEND_ASSERT(result == FAILURE || (Z_TYPE_P(zv) == IS_STRING && GC_REFCOUNT(Z_STR_P(zv)) == 2) || Z_TYPE_P(zv) == IS_NULL || Z_TYPE_P(zv) == IS_LONG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assertions intends to make sure that only a new string, null, or long can be returned in case of success. I hope that you won't find it too obtrusive.
ZEND_ASSERT(uri_handler->parse_uri != NULL); | ||
ZEND_ASSERT(uri_handler->clone_uri != NULL); | ||
ZEND_ASSERT(uri_handler->uri_to_string != NULL); | ||
ZEND_ASSERT(uri_handler->clone_uri != NULL || strcmp(uri_handler->name, URI_PARSER_PHP) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep these assertions even though two handlers of the legacy parser became NULL. So I excluded this URI handler from the checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions with exceptions to the rule are a bit iffy IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can understand your POV. Do you think it's better to get rid of the latter two special cased assertions?
ext/uri/php_uri.c
Outdated
|
||
ZEND_ASSERT(uri_handler->name != NULL); | ||
ZEND_ASSERT(uri_handler->name != NULL && (strlen(uri_handler->name) > 0 || strcmp(uri_handler->name, URI_PARSER_PHP) == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I renamed the legacy implementation to ""
, I wanted to make sure that other implementations cannot use the same handler.
Although, I think this assertion doesn't make any sense. Especially because module initialization will surely fail because of the name collision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? Why did you change the name to "" though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly because I wanted to avoid this to work:
filter_var("qwe", FILTER_VALIDATE_URL, ["uri_parser_class" => "parse_url"])
even though null
is the suggested value to use for the "uri_parser_class" config. As ""
is the most similar string value to null
that can be stored in a uri_handlers
hash table, I figured it's a better choice than parse_url
.
Or do you have any better solutions in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ""
is weird from a user point of view.
I still don't think I quite understand the problem though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is getting pretty close to being merge-able
ext/uri/php_uri.c
Outdated
|
||
ZEND_ASSERT(uri_handler->name != NULL); | ||
ZEND_ASSERT(uri_handler->name != NULL && (strlen(uri_handler->name) > 0 || strcmp(uri_handler->name, URI_PARSER_PHP) == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? Why did you change the name to "" though?
ZEND_ASSERT(uri_handler->parse_uri != NULL); | ||
ZEND_ASSERT(uri_handler->clone_uri != NULL); | ||
ZEND_ASSERT(uri_handler->uri_to_string != NULL); | ||
ZEND_ASSERT(uri_handler->clone_uri != NULL || strcmp(uri_handler->name, URI_PARSER_PHP) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions with exceptions to the rule are a bit iffy IMHO
I'll fix the CI probably tomorrow night |
@@ -615,7 +615,7 @@ void php_filter_validate_url(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */ | |||
} | |||
|
|||
/* Parse the URI - if it fails, we return NULL */ | |||
php_uri *uri = php_uri_parse_to_struct(uri_handler, Z_STRVAL_P(value), Z_STRLEN_P(value), URI_COMPONENT_READ_NORMALIZED_ASCII, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that still the normalized URI components were retrieved in multiple places where the raw version would have made more sense, so I changed these.
Generally, I think the raw format should be used internally whenever possible. I am also considering to return an error if an unsupported format is required (e.g. the WHATWG implementation doesn't support normalization - except for the host whose unicode representation can be returned).
} | ||
|
||
php_raw_url_decode(Z_STRVAL_P(zv), Z_STRLEN_P(zv)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that the php_raw_url_decode()
is conformant to RFC 3986.
No description provided.