Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add timedelta as valid type for interpolate with method='time' #14799

Merged
merged 9 commits into from
Dec 10, 2016

Conversation

aileronajay
Copy link
Contributor

@aileronajay aileronajay commented Dec 5, 2016

@aileronajay
Copy link
Contributor Author

this PR is currently a WIP, tests have not been added yet. This iteration is for initial reviews

@aileronajay
Copy link
Contributor Author

@TomAugspurger the change that is currently in this PR will support timedelta as part of the index and the case presented in the issue

pd.DataFrame({'v':[1,np.nan,5]},index=pd.to_timedelta([1,2,3], unit="d")).interpolate(method="time")

will work

pd.DataFrame({'v':[1,np.nan,5]},index=pd.to_timedelta([1,2,3], unit="d")).interpolate(method="time")
v
1 days 1.0
2 days 3.0
3 days 5.0

Though i think what we expect here is that interpolate should work if the passed data is of type datetime or timedelta, something like this (below) should work (like how cut was enhanced). Is this the expectation?

pd.DataFrame({'v':[np.datetime64('2012-01-01'),np.nan,np.datetime64('2012-01-03')]},index=[1,2,3]).interpolate()
v
1 2012-01-01
2 NaT
3 2012-01-03

@aileronajay
Copy link
Contributor Author

@jreback @jorisvandenbossche @TomAugspurger need pointers about expected functionality. I have added support for time delta as index when interpolating. Should we also be able to interpolate between input time data?

@TomAugspurger
Copy link
Contributor

@aileronajay IIRC the original issue #6424 was about interpolating when the index was timedeltas. We can support interpolating time delta data like in your second example as well. That's fine either in this PR or a separate one.

@aileronajay
Copy link
Contributor Author

@TomAugspurger i will then add support for timedelta as index in this PR and create a separate PR for interpolating time data

@sinhrks sinhrks added Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type labels Dec 6, 2016
@codecov-io
Copy link

codecov-io commented Dec 6, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14799 into master will not change coverage

@@             master     #14799   diff @@
==========================================
  Files           144        144          
  Lines         50979      50979          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43469      43469          
  Misses         7510       7510          
  Partials          0          0          

Powered by Codecov. Last update 36bb8af...fff59f5

@aileronajay
Copy link
Contributor Author

are there other changes required in this PR? (like more tests?) @sinhrks @jreback @TomAugspurger @jorisvandenbossche

@@ -187,7 +187,8 @@ def _interp_limit(invalid, fw_limit, bw_limit):
if method in ('values', 'index'):
inds = np.asarray(xvalues)
# hack for DatetimeIndex, #1646
if issubclass(inds.dtype.type, np.datetime64):
if (issubclass(inds.dtype.type, np.datetime64) or
issubclass(inds.dtype.type, np.timedelta64)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use the is_datetime64_dtype functions instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or better yet the needs_i8_conversion function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I have made this change now

@aileronajay
Copy link
Contributor Author

@jreback i have made the change requested in the review, are there other changes that need to be taken care of?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a small notice in the v0.20.0.txt whatsnew file (in the 'other enhancements' section)

@@ -891,6 +891,16 @@ def test_spline_error(self):
with tm.assertRaises(ValueError):
s.interpolate(method='spline', order=0)

def test_interp_timedelta64(self):
# GH 6424
tm._skip_if_no_scipy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scipy needed for this interpolate method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interpolate method being used over here is 'time', i dont think it requires scipy, i just had this notion that we ignore the interpolation tests if scipy is absent, on a more thorough check i found that is not the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the skip if no scipy statement

@@ -187,7 +188,7 @@ def _interp_limit(invalid, fw_limit, bw_limit):
if method in ('values', 'index'):
inds = np.asarray(xvalues)
# hack for DatetimeIndex, #1646
if issubclass(inds.dtype.type, np.datetime64):
if (needs_i8_conversion(inds.dtype.type)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the redundant outer pair of parenthesis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger i have addressed this review comment

@@ -891,6 +891,16 @@ def test_spline_error(self):
with tm.assertRaises(ValueError):
s.interpolate(method='spline', order=0)

def test_interp_timedelta64(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add another test with non-uniform spacing in the index. e.g. pd.to_timedelta([1, 2, 4]) and ensure that it gets interpolated to the correct value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, would it be better if i add it as one more in the existing test or as a new test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger i have added non uniform spacing as a test case now

@aileronajay
Copy link
Contributor Author

@jreback I have made the suggested changes

@@ -59,6 +59,7 @@ Other enhancements
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- The ``pd.Series.interpolate`` now supports timedelta as index type(:issue:`6424`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with method='time' (also space before (:issue:...

@jreback
Copy link
Contributor

jreback commented Dec 9, 2016

minor doc change. ping on green.

@@ -59,6 +59,7 @@ Other enhancements
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- The ``pd.Series.interpolate`` now supports timedelta as index type with method=`time` (:issue:`6424`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small edit, but I can do on merge:

remove the leading The and method='time' needs to be in double backticks.

timedelta as *an* index type (add an)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback should i make these changes or will be able to do that during merge?

@aileronajay
Copy link
Contributor Author

@jreback there was a clean run in travis, appveyor failed after running for an hour, there appears to be a problem with appveyor

@jorisvandenbossche jorisvandenbossche changed the title added timedelta as valid type for conversion to integer ENH: add timedelta as valid type for interpolate with method='time' Dec 10, 2016
@jorisvandenbossche jorisvandenbossche merged commit 1dbc7be into pandas-dev:master Dec 10, 2016
@jorisvandenbossche
Copy link
Member

@aileronajay Thanks!

@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Dec 10, 2016
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* origin/master: (22 commits)
  BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343)
  BUG: Apply min_itemsize to index even when not appending
  DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788)
  BLD: missing - on secure
  BLD: new access token on pandas-dev
  TST: Test DatetimeIndex weekend offset (pandas-dev#14853)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* commit 'v0.19.0-174-g81a2f79': (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
release 0.19.1 was from release branch

* releases: (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

interpolate w/ method=time does not work with timedeltas
6 participants