-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/DEPR: DatetimeIndex constructor #23675
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
pandas/core/indexes/datetimes.py
Outdated
# as wall-times instead of UTC timestamps. | ||
data = data.astype(_NS_DTYPE) | ||
copy = False | ||
# TODO: Why do we treat this differently from integer dtypes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback any idea why we treat floats differently from ints here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea, i would try to remove this special handling. only thing i can think of maybe this could have some odd rounding in the astype if its out of range of an int64 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would try to remove this special handling
Yah, the immediate goal is to pass fewer cases to to_datetime
since it is painfully circular and hides weird behavior like this float-dtype behavior.
If If we just cast floats to int64 (and mask with iNaT) then exactly one test fails in tests.dtypes.test_missing
. The behavior (in master) that is different between float vs int is that floats are treated as wall-times instead of UTC times. eg:
iarr = np.array([0], dtype='i8')
farr = np.array([0], dtype='f8')
>>> pd.DatetimeIndex(iarr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(iarr, tz='US/Eastern')._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr, tz='US/Eastern')._data
array(['1970-01-01T05:00:00.000000000'], dtype='datetime64[ns]')
I don't see any especially good reason why it should work this way, save for keeping this test working and not introducing breaking changes.
azure failure is a Hypothesis failure |
Codecov Report
@@ Coverage Diff @@
## master #23675 +/- ##
==========================================
+ Coverage 92.31% 92.31% +<.01%
==========================================
Files 161 161
Lines 51562 51586 +24
==========================================
+ Hits 47599 47622 +23
- Misses 3963 3964 +1
Continue to review full report at Codecov.
|
pandas/core/indexes/datetimes.py
Outdated
# as wall-times instead of UTC timestamps. | ||
data = data.astype(_NS_DTYPE) | ||
copy = False | ||
# TODO: Why do we treat this differently from integer dtypes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea, i would try to remove this special handling. only thing i can think of maybe this could have some odd rounding in the astype if its out of range of an int64 .
Merged master to fix the merge conflict. |
Travis error looks unrelated |
@jbrockmendel Can you give a bit more context here? Eg, why are you adding so much new code (while there are actually no user facing changes, I suppose, except from the deprecation) ? |
Sharing implementation is fine, circularity is not. The fact that Moreover I am not convinced that the float_dtype behavior is intentional. There is an issue (need to dig it up) reporting a bug in passing a Categorical to |
Yes, but that circularity already existed. What changes did you try to make that it became more problematic?
But it also means duplicating in For the TimedeltaIndex PR, you made some common functions that were used by both. Is a similar pattern not possible here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I find the logic of what to do by each dtype (as you are introducing here) nice and clear to follow, but mainly wondering why not doing that in to_datetime
to avoid divergence between both
That's the goal behind #23702. |
Azure fail looks unrelated |
Change behavior here, deprecation cycle, or follow-up? |
gentle ping. Getting the constructors done will noticeably tighten the scope of #24024. |
# If tzaware, these values represent unix timestamps, so we | ||
# return them as i8 to distinguish from wall times | ||
return values.view('i8'), tz_parsed | ||
except (ValueError, TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel is correct here.
side-note, we could clarify all this with python-3 style
except (ValueError, TypeError) as e2:
raise e2 from e
or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month.
Returns | ||
------- | ||
result : ndarray | ||
np.int64 dtype if returned values represent UTC timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to verify: getting an you'll get an np.int64
-dtype result
if and only if inferred_tz
is not None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
# If tzaware, these values represent unix timestamps, so we | ||
# return them as i8 to distinguish from wall times | ||
return values.view('i8'), tz_parsed | ||
except (ValueError, TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change?
not resolved, but I see from @TomAugspurger which what i was getting at. i guess ok for now.
pandas/core/arrays/datetimes.py
Outdated
return result, tz_parsed | ||
elif is_object_dtype(result): | ||
if allow_object: | ||
# allowed by to_datetime, not by DatetimeIndex constructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right and that is the correct behavior, actually DTI should just return a Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestatmps. These should always just return an Index of objects. No conversion / inference should happen at all.
if allow_object: | ||
# allowed by to_datetime, not by DatetimeIndex constructor | ||
return result, tz_parsed | ||
raise TypeError(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just remove the else and do the raise TypeError(result)
at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its worth distinguishing between hittable TypeError and defensive TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment more then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
yearfirst=yearfirst) | ||
|
||
if is_object_dtype(data) or is_string_dtype(data): | ||
# TODO: We do not have tests specific to string-dtypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just write this as
try:
data = datas.astype(np.int64)
except:
...
might be more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sure
raise e | ||
except ValueError as e: | ||
# Fallback to try to convert datetime objects | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cam you be slightly more specific on 'fallback', e.g. when would this be triggered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll flesh out the comment. The gist of it is if tz-aware datetime objects are passed and utc=True
is not passed, then array_to_datetime
will raise a ValueError
and datetime_to_datetime64
will handle this case.
It's a mess and is a big reason why I've been making array_to_datetime PRs: #24006 is an attempt to fix it, but there are some design questions @mroeschke and I are still batting around.
I'm -0.5 on having DatetimeIndex.new return an object Index, would rather have it raise. Either way, that is a design change that should be orthogonal to this PR. |
sure (and raising is prob correct) |
return result, tz_parsed | ||
raise TypeError(result) | ||
else: # pragma: no cover | ||
# GH#23675 this TypeError should never be hit, whereas the TypeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not more clear.
Can you put the cases where each of these would be hit
This is a blocker for DatetimeArray._from_sequence, so probably merits a 0.24.0 milestone tag |
@jbrockmendel ok thanks. I realy really want to limit scope to things that we absolutely need to pushout the Datetime and associated EA's. sure there are other things to do, but theses need much more consideration that we have time for. (basically I am saying, this code IMHO is not strictly necessary for this refactor, though its a nice to do). |
This is preliminary to implementing DatetimeArray._from_sequence. The attempt to implement that directly (i.e. without existing circularity) failed, so trying it as a 2-parter.
Similar to #23539, this deprecates the option of passing timedelta64 data to DatetimeIndex.
There are a few places where I've made TODO notes of the "is this really intentional?" form; I'll point this out in inline comments.
@mroeschke I edited a couple of docstrings for the recently-implemented "nonexistent" kwarg, can you confirm that I didn't mess them up?