
merge_asof() must be able to operate with timezone-aware DatetimeIndex #14844

Closed
chrisaycock opened this issue Dec 9, 2016 · 0 comments
Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Timezones (Timezone data dtype)
Milestone: 0.19.2


@chrisaycock (Contributor)

I can perform the following merge just fine:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'date': pd.date_range('2016-01-02', freq='D', periods=5),
                     'value1': np.arange(5)})
right = pd.DataFrame({'date': pd.date_range('2016-01-01', freq='D', periods=5),
                      'value2': list("ABCDE")})
pd.merge_asof(left, right, on='date', tolerance=pd.Timedelta('1 day'))
```

However, adding a timezone to the DatetimeIndex doesn't work:

```python
import numpy as np
import pandas as pd
import pytz

left = pd.DataFrame({'date': pd.date_range('2016-01-02', freq='D', periods=5,
                                           tz=pytz.timezone('UTC')),
                     'value1': np.arange(5)})
right = pd.DataFrame({'date': pd.date_range('2016-01-01', freq='D', periods=5,
                                            tz=pytz.timezone('UTC')),
                      'value2': list("ABCDE")})
pd.merge_asof(left, right, on='date', tolerance=pd.Timedelta('1 day'))
```

I get this oddly worded error:

```
MergeError: incompatible tolerance, must be compat with type <class 'pandas.tseries.index.DatetimeIndex'>
```

The solution is actually very simple: when a tolerance is given, `_AsOfMerge._get_merge_keys()` needs to check for `is_datetime64tz_dtype()` in addition to `is_datetime64_dtype()`. I should also make the error message clearer.
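A minimal sketch of the check described above, for illustration only. `is_datetime_key()` and `check_asof_tolerance()` are hypothetical helpers, not the actual body of `_AsOfMerge._get_merge_keys()`, and the error wording is my own. It assumes a recent pandas, where `is_datetime64tz_dtype()` is deprecated in favour of `isinstance(dtype, pd.DatetimeTZDtype)`, so that form is used here instead. The key point is that `is_datetime64_dtype()` alone returns `False` for tz-aware data, which is why the tolerance was rejected.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype
from pandas.errors import MergeError


def is_datetime_key(key):
    """Hypothetical helper: True for naive *and* tz-aware datetime64 keys."""
    # is_datetime64_dtype() is False for tz-aware data (DatetimeTZDtype),
    # so the tz-aware case has to be checked separately.
    return is_datetime64_dtype(key) or isinstance(key.dtype, pd.DatetimeTZDtype)


def check_asof_tolerance(key, tolerance):
    """Hypothetical sketch of validating `tolerance` against an `on` key."""
    if is_datetime_key(key):
        if not isinstance(tolerance, pd.Timedelta):
            # Name the expected tolerance type rather than the index class.
            raise MergeError(
                f"incompatible tolerance {tolerance!r}: a datetime 'on' key "
                "requires a pandas.Timedelta tolerance"
            )
        if tolerance < pd.Timedelta(0):
            raise MergeError("tolerance must be non-negative")


naive = pd.Series(pd.date_range('2016-01-01', freq='D', periods=3))
aware = pd.Series(pd.date_range('2016-01-01', freq='D', periods=3, tz='UTC'))
check_asof_tolerance(aware, pd.Timedelta('1 day'))  # tz-aware key is now accepted
```

Checking both dtype families up front keeps the tolerance validation identical for naive and tz-aware keys, since the underlying int64 nanosecond values are compared the same way in either case.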

@jreback added the Bug, Reshaping, and Timezones labels Dec 9, 2016
@jreback added this to the 0.19.2 milestone Dec 9, 2016
yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 12, 2016
* origin/master: (22 commits)
  BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343)
  BUG: Apply min_itemsize to index even when not appending
  DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788)
  BLD: missing - on secure
  BLD: new access token on pandas-dev
  TST: Test DatetimeIndex weekend offset (pandas-dev#14853)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 12, 2016
* commit 'v0.19.0-174-g81a2f79': (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
jorisvandenbossche pushed a commit that referenced this issue Dec 15, 2016
closes #14844

Author: Christopher C. Aycock <christopher.aycock@twosigma.com>

Closes #14845 from chrisaycock/GH14844 and squashes the following commits:

97b73a8 [Christopher C. Aycock] BUG: Allow TZ-aware DatetimeIndex in merge_asof() (#14844)

(cherry picked from commit e991141)
ischurov pushed a commit to ischurov/pandas that referenced this issue Dec 19, 2016
closes pandas-dev#14844

Author: Christopher C. Aycock <christopher.aycock@twosigma.com>

Closes pandas-dev#14845 from chrisaycock/GH14844 and squashes the following commits:

97b73a8 [Christopher C. Aycock] BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
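With the fix merged (milestoned for 0.19.2 and cherry-picked above), the tz-aware example from the report should succeed. A minimal check, assuming a recent pandas where `pd.date_range` replaces the removed `DatetimeIndex(start=...)` constructor, and using `tz='UTC'` instead of `pytz` to avoid the extra dependency:

```python
import numpy as np
import pandas as pd

# Same data as the report, but with tz-aware UTC keys on both sides.
left = pd.DataFrame({'date': pd.date_range('2016-01-02', freq='D', periods=5, tz='UTC'),
                     'value1': np.arange(5)})
right = pd.DataFrame({'date': pd.date_range('2016-01-01', freq='D', periods=5, tz='UTC'),
                      'value2': list("ABCDE")})

# Backward as-of match: each left row takes the last right row at or before
# its date, provided the gap is within the 1-day tolerance.
result = pd.merge_asof(left, right, on='date', tolerance=pd.Timedelta('1 day'))
print(result)
```

Every left date from 2016-01-02 to 2016-01-05 has an exact match on the right; 2016-01-06 falls back to the 2016-01-05 row, exactly one day earlier, which is still within the tolerance.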