REF/DEPR: DatetimeIndex constructor #23675

jbrockmendel · 2018-11-13T21:10:02Z

This is preliminary to implementing DatetimeArray._from_sequence. The attempt to implement that directly (i.e. without existing circularity) failed, so trying it as a 2-parter.

Similar to #23539, this deprecates the option of passing timedelta64 data to DatetimeIndex.

There are a few places where I've made TODO notes of the "is this really intentional?" form; I'll point this out in inline comments.

@mroeschke I edited a couple of docstrings for the recently-implemented "nonexistent" kwarg, can you confirm that I didn't mess them up?

pep8speaks · 2018-11-13T21:10:06Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/datetimes.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimes.py !
There are no PEP8 issues in the file pandas/core/tools/datetimes.py !
There are no PEP8 issues in the file pandas/tests/indexes/datetimes/test_construction.py !

jbrockmendel · 2018-11-13T21:17:17Z

pandas/core/indexes/datetimes.py

+            #  as wall-times instead of UTC timestamps.
+            data = data.astype(_NS_DTYPE)
+            copy = False
+            # TODO: Why do we treat this differently from integer dtypes?


@jreback any idea why we treat floats differently from ints here?

no idea, i would try to remove this special handling. only thing i can think of is that this could have some odd rounding in the astype if it's out of range of an int64.

i would try to remove this special handling

Yah, the immediate goal is to pass fewer cases to to_datetime since it is painfully circular and hides weird behavior like this float-dtype behavior.

If we just cast floats to int64 (and mask with iNaT) then exactly one test fails in tests.dtypes.test_missing. The behavior (in master) that differs between float and int is that floats are treated as wall-times instead of UTC times, e.g.:

iarr = np.array([0], dtype='i8')
farr = np.array([0], dtype='f8')

>>> pd.DatetimeIndex(iarr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(iarr, tz='US/Eastern')._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr, tz='US/Eastern')._data
array(['1970-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

I don't see any especially good reason why it should work this way, save for keeping this test working and not introducing breaking changes.
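
To make the wall-time versus UTC distinction concrete, here is a minimal standard-library sketch. It is not the pandas implementation: it assumes a fixed UTC-5 offset as a stand-in for US/Eastern at the epoch, and the helper names `as_utc_timestamp`, `as_wall_time` and `to_utc_naive` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)
# Assumption: a fixed UTC-5 offset standing in for US/Eastern (EST) in 1970.
EASTERN = timezone(timedelta(hours=-5))


def as_utc_timestamp(ns, tz):
    """Treat ns as nanoseconds since the UTC epoch (the integer-dtype reading)."""
    utc = (EPOCH + timedelta(microseconds=ns // 1000)).replace(tzinfo=timezone.utc)
    return utc.astimezone(tz)


def as_wall_time(ns, tz):
    """Treat ns as a naive wall-clock reading, then attach tz (the float-dtype reading)."""
    naive = EPOCH + timedelta(microseconds=ns // 1000)
    return naive.replace(tzinfo=tz)


def to_utc_naive(dt):
    """Return the underlying UTC instant as a naive datetime."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


int_like = as_utc_timestamp(0, EASTERN)
float_like = as_wall_time(0, EASTERN)
print(to_utc_naive(int_like))    # 1970-01-01 00:00:00
print(to_utc_naive(float_like))  # 1970-01-01 05:00:00
```

The same input value of 0 ends up five hours apart in the stored UTC representation, which mirrors the difference in the `_data` arrays shown above.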

jbrockmendel · 2018-11-13T22:33:48Z

azure failure is a Hypothesis failure

codecov · 2018-11-13T22:56:46Z

Codecov Report

Merging #23675 into master will increase coverage by <.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master   #23675      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51562    51586      +24     
==========================================
+ Hits        47599    47622      +23     
- Misses       3963     3964       +1

Flag	Coverage Δ
#multiple	`90.71% <94.73%> (ø)`	⬆️
#single	`42.44% <63.15%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/datetimes.py	`96.55% <100%> (+0.02%)`	⬆️
pandas/core/tools/datetimes.py	`85.27% <92.3%> (-0.27%)`	⬇️
pandas/core/arrays/datetimes.py	`98.43% <94.73%> (+0.06%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 022f458...03d5b35.

jreback · 2018-11-14T13:01:28Z

pandas/core/indexes/datetimes.py

+            #  as wall-times instead of UTC timestamps.
+            data = data.astype(_NS_DTYPE)
+            copy = False
+            # TODO: Why do we treat this differently from integer dtypes?


no idea, i would try to remove this special handling. only thing i can think of is that this could have some odd rounding in the astype if it's out of range of an int64.

TomAugspurger · 2018-11-14T21:35:34Z

Merged master to fix the merge conflict.

jbrockmendel · 2018-11-14T23:47:33Z

Travis error looks unrelated

jorisvandenbossche · 2018-11-14T23:54:23Z

@jbrockmendel Can you give a bit more context here? E.g., why are you adding so much new code (while there are actually no user-facing changes, I suppose, apart from the deprecation)?
What is the rationale for not using to_datetime (if we refactor things, I would rather think it is to share more implementation)?

jbrockmendel · 2018-11-15T00:05:18Z

Can you give a bit more context here? E.g., why are you adding so much new code (while there are actually no user-facing changes, I suppose, apart from the deprecation)?
What is the rationale for not using to_datetime (if we refactor things, I would rather think it is to share more implementation)?

Sharing implementation is fine, circularity is not. The fact that to_datetime calls the DatetimeIndex constructor and vice-versa makes it very difficult to sort out (in fact I spent a big chunk of Monday trying to untangle it in one pass and couldn't) what happens in which case. The addition of new code is to make these behaviors more clear/explicit.

Moreover I am not convinced that the float_dtype behavior is intentional.

There is an issue (need to dig it up) reporting a bug in passing a Categorical to DatetimeIndex. Point being: there is value in being more explicit about what we are doing and why.

jorisvandenbossche · 2018-11-15T00:38:58Z

Sharing implementation is fine, circularity is not.

Yes, but that circularity already existed. What changes did you try to make that it became more problematic?

The addition of new code is to make these behaviors more clear/explicit.

But it also means duplicating in DatetimeIndex things that are already handled in to_datetime. Not sharing it will much more easily lead to a future change leading to inconsistencies between both?

For the TimedeltaIndex PR, you made some common functions that were used by both. Is a similar pattern not possible here?

jorisvandenbossche

To be clear, I find the logic of what to do by each dtype (as you are introducing here) nice and clear to follow, but mainly wondering why not doing that in to_datetime to avoid divergence between both

pandas/core/indexes/datetimes.py

jbrockmendel · 2018-11-15T00:46:54Z

For the TimedeltaIndex PR, you made some common functions that were used by both. Is a similar pattern not possible here?

That's the goal behind #23702.

jbrockmendel · 2018-11-27T03:19:35Z

Azure fail looks unrelated

jbrockmendel · 2018-11-30T00:22:26Z

yeah i agree this [treating floats as wall-times, ints as UTC] seems a little odd, these should just be cast to integer i think.

Change behavior here, deprecation cycle, or follow-up?

jbrockmendel · 2018-11-30T22:56:39Z

gentle ping. Getting the constructors done will noticeably tighten the scope of #24024.

TomAugspurger · 2018-12-01T12:17:35Z

pandas/core/arrays/datetimes.py

+            # If tzaware, these values represent unix timestamps, so we
+            #  return them as i8 to distinguish from wall times
+            return values.view('i8'), tz_parsed
+        except (ValueError, TypeError):


@jbrockmendel is correct here.

side-note, we could clarify all this with python-3 style

except (ValueError, TypeError) as e2:
    raise e2 from e

or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month.
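
For readers unfamiliar with the Python 3 chaining syntax mentioned here, a minimal sketch follows. The two parsers are hypothetical stand-ins, not the pandas helpers; the point is only that `raise e2 from e` records the first failure as the `__cause__` of the second.

```python
def parse_int(text):
    """Hypothetical first-pass parser."""
    return int(text)


def parse_with_fallback(text):
    try:
        return parse_int(text)
    except ValueError as e:
        try:
            # Hypothetical fallback: accept floats written as text.
            return int(float(text))
        except (ValueError, TypeError) as e2:
            # Chain explicitly so the traceback shows both failures.
            raise e2 from e


caught = None
try:
    parse_with_fallback("not a number")
except ValueError as err:
    caught = err

# The original int() failure is preserved as the cause of the fallback failure.
print(type(caught.__cause__).__name__)
print(caught.__cause__)
```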

TomAugspurger · 2018-12-01T12:24:14Z

pandas/core/arrays/datetimes.py

+    Returns
+    -------
+    result : ndarray
+        np.int64 dtype if returned values represent UTC timestamps


Just to verify: you'll get an np.int64-dtype result if and only if inferred_tz is not None?

jreback · 2018-12-02T21:44:53Z

pandas/core/arrays/datetimes.py

+            # If tzaware, these values represent unix timestamps, so we
+            #  return them as i8 to distinguish from wall times
+            return values.view('i8'), tz_parsed
+        except (ValueError, TypeError):


That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change?

not resolved, but I see from @TomAugspurger what i was getting at. i guess ok for now.

jreback · 2018-12-02T21:46:19Z

pandas/core/arrays/datetimes.py

+        return result, tz_parsed
+    elif is_object_dtype(result):
+        if allow_object:
+            # allowed by to_datetime, not by DatetimeIndex constructor


right and that is the correct behavior, actually DTI should just return an Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestamps. These should always just return an Index of objects. No conversion / inference should happen at all.

jreback · 2018-12-02T21:47:24Z

pandas/core/arrays/datetimes.py

+        if allow_object:
+            # allowed by to_datetime, not by DatetimeIndex constructor
+            return result, tz_parsed
+        raise TypeError(result)


maybe just remove the else and do the raise TypeError(result) at the end

I think its worth distinguishing between hittable TypeError and defensive TypeError

can you comment more then.

jreback · 2018-12-02T21:48:48Z

pandas/core/indexes/datetimes.py

-                                     yearfirst=yearfirst)
+
+        if is_object_dtype(data) or is_string_dtype(data):
+            # TODO: We do not have tests specific to string-dtypes,


you could just write this as

try:
    data = data.astype(np.int64)
except:
    ...

might be more clear

The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.
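
The pitfall can be illustrated without NumPy. This is a standard-library sketch of the same ambiguity, not the constructor's actual code path: a digit-only date string casts to an integer without error, and that integer means something very different if it is later read as nanoseconds since the epoch.

```python
from datetime import datetime, timedelta

raw = "20160405"

# Casting the string to an integer silently succeeds...
as_int = int(raw)

# ...and reading that integer as nanoseconds since the epoch lands
# about 0.02 seconds after 1970-01-01.
as_epoch_ns = datetime(1970, 1, 1) + timedelta(microseconds=as_int // 1000)

# The intended reading is a calendar date.
as_date = datetime.strptime(raw, "%Y%m%d")

print(as_int)       # 20160405
print(as_epoch_ns)  # 1970-01-01 00:00:00.020160
print(as_date)      # 2016-04-05 00:00:00
```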

jreback · 2018-12-02T21:49:57Z

pandas/core/tools/datetimes.py

-            raise e
+        except ValueError as e:
+            # Fallback to try to convert datetime objects
+            try:


can you be slightly more specific on 'fallback', e.g. when would this be triggered

I'll flesh out the comment. The gist of it is if tz-aware datetime objects are passed and utc=True is not passed, then array_to_datetime will raise a ValueError and datetime_to_datetime64 will handle this case.

It's a mess and is a big reason why I've been making array_to_datetime PRs: #24006 is an attempt to fix it, but there are some design questions @mroeschke and I are still batting around.

jbrockmendel · 2018-12-02T22:05:18Z

actually DTI should just return an Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestamps. These should always just return an Index of objects. No conversion / inference should happen at all.

I'm -0.5 on having DatetimeIndex.__new__ return an object Index, would rather have it raise. Either way, that is a design change that should be orthogonal to this PR.

jreback · 2018-12-03T00:31:41Z

I'm -0.5 on having DatetimeIndex.__new__ return an object Index, would rather have it raise. Either way, that is a design change that should be orthogonal to this PR.

sure (and raising is prob correct)

jreback · 2018-12-03T01:29:55Z

pandas/core/arrays/datetimes.py

+            return result, tz_parsed
+        raise TypeError(result)
+    else:  # pragma: no cover
+        # GH#23675 this TypeError should never be hit, whereas the TypeError


this is not more clear.

Can you put the cases where each of these would be hit
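
As an illustration of the distinction being asked about, here is a minimal sketch of a "hittable" versus a "defensive" TypeError. The function `finish_conversion` and its `kind` labels are hypothetical and only mirror the shape of the branches in the diff above; they are not the pandas code.

```python
def finish_conversion(result, kind, allow_object=False):
    """Hypothetical stand-in for the tail of a sequence-to-datetime helper.

    kind is a made-up label for what the parser returned:
    "datetime64" or "object".
    """
    if kind == "datetime64":
        return result
    elif kind == "object":
        if allow_object:
            # Reachable via a permissive caller (to_datetime in the PR).
            return result
        # Hittable: a strict caller (the DatetimeIndex constructor in the PR)
        # passed data that could not be converted.
        raise TypeError(result)
    else:  # pragma: no cover
        # Defensive: the parser is only expected to return the two kinds
        # above, so reaching this branch would indicate a bug.
        raise TypeError("unexpected kind: {!r}".format(kind))
```

A strict caller would see the hittable error for unconvertible input, while the defensive branch exists only to fail loudly if an upstream assumption is broken.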

jbrockmendel · 2018-12-03T03:09:52Z

This is a blocker for DatetimeArray._from_sequence, so probably merits a 0.24.0 milestone tag

jreback · 2018-12-03T13:20:57Z

@jbrockmendel ok thanks. I really want to limit scope to things that we absolutely need to push out the Datetime and associated EAs. sure there are other things to do, but these need much more consideration than we have time for. (basically I am saying, this code IMHO is not strictly necessary for this refactor, though it's a nice-to-do).

start untangling DatetimeIndex constructor; deprecate passing of timedelta64 data

f8efbef

add GH references

9d20bc9

jbrockmendel commented Nov 13, 2018

pandas/core/tools/datetimes.py (outdated, resolved)

mroeschke reviewed Nov 13, 2018

pandas/_libs/tslibs/conversion.pyx (outdated, resolved)

Fix incorrect usage of DatetimeIndex

aef3f4c

mroeschke reviewed Nov 13, 2018

pandas/core/indexes/datetimes.py (resolved)

dummy commit to force CI

66ae42b

gfyoung added the Datetime (Datetime data dtype) and Deprecate (Functionality to remove in pandas) labels Nov 14, 2018

jreback requested changes Nov 14, 2018

jbrockmendel added 4 commits November 14, 2018 08:24

more explicit name, test with TimedeltaIndex

d0e8ee3

remove comment

e1f4e17

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

d18e0df

make exception catching less specific

a4c8c77

jbrockmendel mentioned this pull request Nov 14, 2018

Catch exception around much smaller piece of code #23702

Closed

jbrockmendel and others added 2 commits November 14, 2018 10:33

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

5dc5980

Merge remote-tracking branch 'upstream/master' into jbrockmendel-pre-fixes4

7e5587e

jbrockmendel mentioned this pull request Nov 14, 2018

Datetimelike Array Refactor #23185

Closed

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

e94e826

jorisvandenbossche reviewed Nov 15, 2018

pandas/core/indexes/datetimes.py (outdated, resolved)

pandas/core/indexes/datetimes.py (outdated, resolved)

jbrockmendel added 2 commits November 26, 2018 19:59

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

0367d6f

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

7cc8577

jbrockmendel mentioned this pull request Nov 29, 2018

REF: prelim for fixing array_to_datetime #24000

Merged

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

b3a096b

jbrockmendel mentioned this pull request Nov 30, 2018

REF: DatetimeLikeArray #24024

Merged

12 tasks

jbrockmendel mentioned this pull request Dec 1, 2018

WIP: multi-timezone handling for array_to_datetime #24006

Closed

TomAugspurger approved these changes Dec 1, 2018

jbrockmendel mentioned this pull request Dec 2, 2018

REF: array_to_datetime catch overflows in one place #24049

Merged

jreback requested changes Dec 2, 2018

Flesh out comment

fd5af18

jbrockmendel added 2 commits December 2, 2018 17:26

comment

2cdd215

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-fixes4

782ca81

jreback reviewed Dec 3, 2018

comment more

03d5b35

jreback added this to the 0.24.0 milestone Dec 3, 2018

jreback approved these changes Dec 3, 2018

jreback merged commit e7991b3 into pandas-dev:master Dec 3, 2018

jbrockmendel deleted the pre-fixes4 branch December 3, 2018 15:18

jbrockmendel mentioned this pull request Dec 3, 2018

Implement DatetimeArray._from_sequence #24074

Merged

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

implement deprecation portion of pandas-dev#23675 (pandas-dev#23937)

1145d38

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF/DEPR: DatetimeIndex constructor (pandas-dev#23675)

d869973

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

implement deprecation portion of pandas-dev#23675 (pandas-dev#23937)

458ccab

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF/DEPR: DatetimeIndex constructor (pandas-dev#23675)

1235666
