-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch exception around much smaller piece of code #23702
Conversation
Hello @jbrockmendel! Thanks for updating the PR.
Comment last updated on November 15, 2018 at 02:51 Hours UTC |
Azure fail looks unrelated |
pandas/core/tools/datetimes.py
Outdated
# reached when array_to_datetime raises "unless utc=True..." | ||
try: | ||
values, tz = conversion.datetime_to_datetime64(arg) | ||
return DatetimeIndex._simple_new(values, name=name, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a future bug here in which the user passes box=False
and we hit this part and return a DatetimeIndex instead of an array. (By inspection looks like the bug already would exist.)
I don't exactly know the criteria to set this up, but is it possible to delegate this to _maybe_box_date_results
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is #23045 related to the potential bug you have in mind?
but is it possible to delegate this to _maybe_box_date_results?
It's a little ungainly, but I think it can be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly, here's a quick example I found (albeit questionable if this is the correct behavior)
In [3]: pd.__version__
Out[3]: '0.24.0.dev0+1010.ge413c491e.dirty'
In [4]: import pytz; import datetime
In [5]: pd.to_datetime([datetime.datetime(2018, 1, 1), datetime.datetime(2018, 1, 1, tzinfo=pytz.UTC)], box=False)
Out[5]: DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 00:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Doesn't necessarily have to be handled here; we can create a new issue for this but it hits this ValueError
path.
pandas/core/tools/datetimes.py
Outdated
|
||
if result is None and (format is None or infer_datetime_format): | ||
|
||
if result is None and (format is None or infer_datetime_format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment here why we might hit this branch? It looks like if no format was passed, or a format was passed and it failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the branch that gets hit the most often, e.g. pd.to_datetime([pd.Timestamp.now(), '2015/04/05'])
will go down this route.
Any list/object-dtyped case where a format is not passed.
Any list/object-dtype case where a format is passed, fails, and is allowed to try the fallback (via infer_datetime_format
kwarg, which defaults to None
)
Codecov Report
@@ Coverage Diff @@
## master #23702 +/- ##
==========================================
+ Coverage 92.24% 92.25% +<.01%
==========================================
Files 161 161
Lines 51339 51343 +4
==========================================
+ Hits 47360 47364 +4
Misses 3979 3979
Continue to review full report at Codecov.
|
The changes seem fine, but I'm not familiar with this section of the code (and in particular I don't know how well tested the expected failure conditions are). |
Looks like azure failure in is in Hypothesis |
AFAIK the tests specific to this file are mostly in tests.indexes.datetimes.test_tools. I'll need to take a closer look at these tests to see if they can be made more specific, but so far they seem fairly thorough |
Is there any behavioural difference? |
Dangit, I just found a corner case in which there is:
returns DTI in master, raises Another round of edits coming up in a bit. |
Hmm, that seems like something that does not necessarily need to work .. (if a user passes format, shouldn't the value be strings?) Back to my original question: there is no intended behavioural change? |
Correct. |
But what was then the original reason to do this? There is some change I would assume? (didn't follow the discussion in the PR you are referencing) I now only catch more specific error. But so you encountered a case where another error was catched that should not have been catched? |
Following this refactor, the call to (Good call asking me to make that explicit; I should have explained that from the start) |
@mroeschke you recently implemented the timezone checking in tslib.array_to_datetime for strings there. Could the same machinery be extended to handle datetimes? Because AFAICT that's the only thing that conversion.datetime_to_datetime64 does. If we can get that logic in one place, we can avoid the fallback to datetime_to_datetime64 (which is itself only called in two places, so we may be able to remove it entirely) |
@jbrockmendel Definitely. Not sure what it would look like since I had to weave it into the beast of |
|
||
def _maybe_box_date_results(result, box, tz, name, tz_parsed=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Good news everyone! I found a case that behaves differently on OSX vs Linux in master
raises on Linux, returns correct result on OSX. Both cases in py27. (waiting for setup.py build_ext to run so i can check 36/37) This is turning out hairier than expected. I think The Right Way to do this involves combining array_to_datetime with datetime_to_datetime64 as discussed above, and probably making the behavior in the |
In py3 this raises on both OSX and Linux |
bfa30d6
to
e725875
Compare
I'm... not happy with how this is turning out. Closing. |
Broken off of #23675, followed by refactor.
The refactor has a non-trivial diff, but the only actual logic changed here is the
except ValueError as e
, instead of catching everything from L240-L303, now specifically only catches errors raised bytslib.array_to_datetime
. Everything else is just moving code outside of the try/except block and de-indenting.Catching more specific exceptions is worthwhile in its own right, but this is also an important step for implementing
DatetimeArray._from_sequence
(@TomAugspurger, @jorisvandenbossche, @jreback) since after this we can isolate the part ofto_datetime
called byDatetimeIndex.__new__
(xref #23675) to make the constructor non-circular (and then refactor it out into_from_sequence
).cc @mroeschke