-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise FileNotFoundError
in read_json
if input looks like file path but file is missing
#46718
Conversation
FileNotFoundError
in read_json
if input is string ending in .json{,.gz,.bz2}
but file is missingFileNotFoundError
in read_json
if input looks like file path but file is missing
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Will need approval from @jreback to merge. (long string's overflowing the filesystem is handled by file_exists
; there's a comment # gh-5874: if the filepath is too long will raise here
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after merging master looks like there's a code path needs updating.
Also @twoertwein, for other file extensions with compression, it doesn't appear we manually raise a FileNotFoundError
. Should this automatically be handled by the handler code?
Would maybe be better to raise here when the file doesn't exist: Line 692 in b1b788f
|
Is this a known error in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm merge on green
cc @mroeschke |
) | ||
and not file_exists(filepath_or_buffer) | ||
): | ||
raise FileNotFoundError(f"File {filepath_or_buffer} does not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can investigate if this logic can be moved according to #46718 (comment) (such that not just json files raise this) that would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the safe assumptions to make here about how the file path ends? Is there sth like _extension_to_compression
but for file formats?
# Only for write methods
if "r" not in mode and is_path:
check_parent_directory(str(handle))
_supported_formats = ["json", "csv", "xls", "xlsx", ...]
_allowed_extensions = tuple(itertools.chain(
(x,) + tuple(f"{x}{c}" for c in _extension_to_compression)
for x in _supported_formats
))
if (
is_path
and handle.lower().endswith(_allowed_extensions)
and not file_exists(handle)
):
raise FileNotFoundError(f"File {handle} does not exist")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After read_json knows that it is a file, I think you can just raise inside get_handle when the file doesn't exist:
if "r" not in mode and is_path and not file_exists(handle):
raise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean this, right?
if "r" in mode and is_path and not file_exists(handle):
raise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually read_json
doesn't even call get_handle
since it's behind a bunch of or
cases none of which are true:
if (
not isinstance(filepath_or_buffer, str)
or is_url(filepath_or_buffer)
or is_fsspec_url(filepath_or_buffer)
or file_exists(filepath_or_buffer)
):
self.handles = get_handle(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there were attempts at creating an efficient way of checking whether a str is a legit JSON string. I assume these attempts were unsuccessful - we check the reverse. If that's the case, then moving the check into get_handle, will not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, then moving the check into get_handle, will not work.
So leave PR as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So leave PR as is?
Probably.
Great, thanks @janosh |
…h but file is missing (pandas-dev#46718) * raise FileNotFoundError in _get_data_from_filepath() * update tests test_read_non_existent + test_read_expands_user_home_dir * add changelog entry in doc/source/whatsnew/v1.5.0.rst * use pandas.io.common._compression_to_extension instead of hard-coded extensions * move changelog entry from IO to other API changes * fix ImportError from _compression_to_extension -> _extension_to_compression rename * add test read_json very long file path * remove extra period in extension checking Co-authored-by: Matthew Roeschke <emailformattr@gmail.com>
doc/source/whatsnew/v1.5.0.rst
file if fixing a bug or adding a new feature.