-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add timedelta as valid type for interpolate with method='time' #14799
ENH: add timedelta as valid type for interpolate with method='time' #14799
Conversation
this PR is currently a WIP, tests have not been added yet. This iteration is for initial reviews |
@TomAugspurger the change that is currently in this PR will support timedelta as part of the index and the case presented in the issue pd.DataFrame({'v':[1,np.nan,5]},index=pd.to_timedelta([1,2,3], unit="d")).interpolate(method="time") will work
Though i think what we expect here is that interpolate should work if the passed data is of type datetime or timedelta, something like this (below) should work (like how cut was enhanced). Is this the expectation?
|
@jreback @jorisvandenbossche @TomAugspurger need pointers about expected functionality. I have added support for time delta as index when interpolating. Should we also be able to interpolate between input time data? |
@aileronajay IIRC the original issue #6424 was about interpolating when the index was timedeltas. We can support interpolating time delta data like in your second example as well. That's fine either in this PR or a separate one. |
@TomAugspurger i will then add support for timedelta as index in this PR and create a separate PR for interpolating time data |
Current coverage is 85.26% (diff: 100%)@@ master #14799 diff @@
==========================================
Files 144 144
Lines 50979 50979
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 43469 43469
Misses 7510 7510
Partials 0 0
|
are there other changes required in this PR? (like more tests?) @sinhrks @jreback @TomAugspurger @jorisvandenbossche |
@@ -187,7 +187,8 @@ def _interp_limit(invalid, fw_limit, bw_limit): | |||
if method in ('values', 'index'): | |||
inds = np.asarray(xvalues) | |||
# hack for DatetimeIndex, #1646 | |||
if issubclass(inds.dtype.type, np.datetime64): | |||
if (issubclass(inds.dtype.type, np.datetime64) or | |||
issubclass(inds.dtype.type, np.timedelta64)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use the is_datetime64_dtype functions instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or better yet the needs_i8_conversion function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I have made this change now
@jreback i have made the change requested in the review, are there other changes that need to be taken care of? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small notice in the v0.20.0.txt whatsnew file (in the 'other enhancements' section)
@@ -891,6 +891,16 @@ def test_spline_error(self): | |||
with tm.assertRaises(ValueError): | |||
s.interpolate(method='spline', order=0) | |||
|
|||
def test_interp_timedelta64(self): | |||
# GH 6424 | |||
tm._skip_if_no_scipy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is scipy needed for this interpolate method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the interpolate method being used over here is 'time', i dont think it requires scipy, i just had this notion that we ignore the interpolation tests if scipy is absent, on a more thorough check i found that is not the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the skip if no scipy statement
@@ -187,7 +188,7 @@ def _interp_limit(invalid, fw_limit, bw_limit): | |||
if method in ('values', 'index'): | |||
inds = np.asarray(xvalues) | |||
# hack for DatetimeIndex, #1646 | |||
if issubclass(inds.dtype.type, np.datetime64): | |||
if (needs_i8_conversion(inds.dtype.type)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the redundant outer pair of parenthesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger i have addressed this review comment
@@ -891,6 +891,16 @@ def test_spline_error(self): | |||
with tm.assertRaises(ValueError): | |||
s.interpolate(method='spline', order=0) | |||
|
|||
def test_interp_timedelta64(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add another test with non-uniform spacing in the index. e.g. pd.to_timedelta([1, 2, 4])
and ensure that it gets interpolated to the correct value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, would it be better if i add it as one more in the existing test or as a new test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger i have added non uniform spacing as a test case now
90a8086
to
1cd8575
Compare
1cd8575
to
579c4bb
Compare
@jreback I have made the suggested changes |
@@ -59,6 +59,7 @@ Other enhancements | |||
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`) | |||
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`) | |||
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`) | |||
- The ``pd.Series.interpolate`` now supports timedelta as index type(:issue:`6424`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with method='time'
(also space before (:issue:...
minor doc change. ping on green. |
@@ -59,6 +59,7 @@ Other enhancements | |||
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`) | |||
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`) | |||
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`) | |||
- The ``pd.Series.interpolate`` now supports timedelta as index type with method=`time` (:issue:`6424`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small edit, but I can do on merge:
remove the leading The
and method='time'
needs to be in double backticks.
timedelta as *an* index type
(add an)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback should i make these changes or will be able to do that during merge?
@jreback there was a clean run in travis, appveyor failed after running for an hour, there appears to be a problem with appveyor |
@aileronajay Thanks! |
* origin/master: (22 commits) BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343) BUG: Apply min_itemsize to index even when not appending DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788) BLD: missing - on secure BLD: new access token on pandas-dev TST: Test DatetimeIndex weekend offset (pandas-dev#14853) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ...
* commit 'v0.19.0-174-g81a2f79': (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...
release 0.19.1 was from release branch * releases: (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...
git diff upstream/master | flake8 --diff