-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve file URI ergonomics in urllib.request
#125866
Comments
Merge `URL2PathNameTests` `PathName2URLTests` and test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
…125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
…()` (pythonGH-125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`. (cherry picked from commit 6742f14) Co-authored-by: Barney Gale <barney.gale@gmail.com>
…()` (pythonGH-125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`. (cherry picked from commit 6742f14) Co-authored-by: Barney Gale <barney.gale@gmail.com>
…e()` (GH-125993) (#126145) GH-125866: Improve tests for `pathname2url()` and `url2pathname()` (GH-125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`. (cherry picked from commit 6742f14) Co-authored-by: Barney Gale <barney.gale@gmail.com>
…e()` (GH-125993) (#126144) GH-125866: Improve tests for `pathname2url()` and `url2pathname()` (GH-125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`. (cherry picked from commit 6742f14) Co-authored-by: Barney Gale <barney.gale@gmail.com>
Adjust `urllib.request.pathname2url` and `url2pathname()` to generate and accept file URIs as described in RFC8089. `pathname2url()` gains a new *include_scheme* argument, which defaults to false. When set to true, the returned URL includes a `file:` prefix. `url2pathname()` now automatically removes a `file:` prefix if present. On Windows, `pathname2url()` now generates URIs that begin with two slashes rather than four when given a UNC path. On other platforms, `pathname2url()` now generates URIs that begin with three slashes rather than one when given an absolute path. `url2pathname()` now performs the opposite transformation, so `file:///etc/hosts` becomes `/etc/hosts`. Furthermore, `url2pathname()` now ignores local hosts (like "localhost" or any alias) and raises `URLError` for non-local hosts.
Stop converting Windows drive letters to uppercase in `urllib.request.pathname2url()` and `url2pathname()`. This behaviour is unnecessary and inconsistent with pathlib's file URI implementation.
Stop converting Windows drive letters to uppercase in `urllib.request.pathname2url()` and `url2pathname()`. This behaviour is unnecessary and inconsistent with pathlib's file URI implementation.
Actually, |
RFC 8089 is mainly non-normative, but it describes The freedesktop spec notes:
The wikipedia article lists the |
Also, using
|
RFC 8089 updates RFC 1738. The It may be a justified change, but it needs different justification than RFC 8089 compliance. |
Good point. I'll revise the issue - it's already evolved a bit since I logged it. |
urllib.request
urllib.request
@serhiy-storchaka I've revised the description as follows:
Does that seem like a good idea to you? If we go for these changes, and the user provides a URL with a non- Relatedly, what do you think should happen if the user provides a query or fragment in the URL? Currently it's returned as part of the file path. (This seems under-specified to me, but I'm leaning towards raising Thank you |
I wonder -- would not be better to add such functions in os.path? The implementation is clearly platform depending, and if we add support for platform with different path format (like old macpath), it will need a completely different implementation. In custom build for exotic platform it is easier to provide a separate os.path implementation than patch urllib.request. |
Good question! It's something that's come up before: The current Windows + POSIX implementations don't have much in common, but I think that will change when I address this issue and #123599. To solve them, I'm planning to use def url2pathname(url):
scheme, authority, path, query, fragment = urlsplit(url, scheme='file')
if scheme != 'file':
raise URLError(f'URL uses non-file scheme: {url!r}')
if query:
raise URLError(f'file URL has query: {url!r}')
if fragment:
raise URLError(f'file URL has fragment: {url!r}')
if os.name == 'nt':
# Windows-specific file URI quirks.
if authority and authority != 'localhost':
# e.g. file://server/share/file.txt
path = '//' + authority + path
elif path[:3] == '///':
# e.g. file://///server/share/file.txt
path = path[1:]
else:
if path[:1] == '/' and path[2:3] in ':|':
# Skip past extra slash before DOS drive in URL path.
path = path[1:]
if path[1:2] == '|':
# Older URLs use a pipe after a drive letter
path = path.replace('|', ':', 1)
path = path.replace('/', '\\')
elif not _is_local_authority(authority):
# POSIX only: reject URL if authority doesn't resolve to localhost.
raise URLError(f'file URL has non-local authority: {url!r}')
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()
return unquote(path, encoding=encoding, errors=errors)
def pathname2url(pathname, include_scheme=False):
if os.name == 'nt':
pathname = pathname.replace('\\', '/')
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()
prefix = 'file:' if include_scheme else ''
drive, root, tail = os.path.splitroot(pathname)
if drive:
if drive[:4] == '//?/':
drive = drive[4:]
if drive[:4].upper() == 'UNC/':
drive = '//' + drive[4:]
if drive[1:] == ':':
prefix += '///'
drive = quote(drive, encoding=encoding, errors=errors, safe='/:')
elif root:
prefix += '//'
tail = quote(tail, encoding=encoding, errors=errors)
return prefix + drive + root + tail There's clearly some OS-specific code in there, but there's also a fair amount of shared and URL-specific code which might feel out-of-place in Besides, if we added functions in |
FWIW, if we go with the above merger, then we can deprecate the oddball |
…()` (python#125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
…()` (python#125993) Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test only the Windows-specific implementations from `nturl2path`) into the main `Pathname_Tests` test case for these functions. Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
…hon#127138) Stop converting Windows drive letters to uppercase in `urllib.request.pathname2url()` and `url2pathname()`. This behaviour is unnecessary and inconsistent with pathlib's file URI implementation.
Add `tests.test_nturl2path` to exercise `nturl2path`. `nturl2path` is currently used by `urllib` and tested by `test_urllib`, but that will change when we deprecate `nturl2path` and add support for Windows paths/URLs in `urllib` itself.
Deprecate the `nturl2path` module. Its functionality is merged into `urllib.request`. Add `tests.test_nturl2path` to exercise `nturl2path`, as it's no longer covered by `test_urllib`.
Feature or enhancement
I request that we make
pathname2url
andurl2pathname
easier to use:pathname2url()
is made to accept an optional include_scheme argument that sticksfile:
on the front when trueurl2pathname()
is made to strip anyfile:
prefix from its argument.I think this would go a long way towards making these functions usable, and allow us to remove the scary "This does not accept/produce a complete URL" warnings from the docs.
Linked PRs
pathname2url()
andurl2pathname()
#125993pathname2url()
andurl2pathname()
(GH-125993) #126144pathname2url()
andurl2pathname()
(GH-125993) #126145nturl2path
module #131432The text was updated successfully, but these errors were encountered: