BUG: DataFrame with tz-aware data and max(axis=1) returns NaN #10390

Closed
ewan1983 opened this issue Jun 19, 2015 · 2 comments · Fixed by #24759
Labels
Bug Timezones Timezone data dtype
Milestone

Comments

@ewan1983

I have a DataFrame that looks like this, and its column 2 is missing:
image

When I try to select the max date in each row, I get all NaN in return:
img2

However, if the DataFrame's dtype is float64, the selection works as expected.

@jreback
Contributor

jreback commented Jun 19, 2015

pls show pd.show_versions() and df_datetime64.info()

@TimTimMadden

TimTimMadden commented Oct 28, 2016

Hello! Hijacking this issue as I've also verified this behaviour. (It took a while to discover: after upgrading to 0.19.0 I ran into some odd dropping of timezones, see #14524, which is a duplicate of #13905.) This behaviour was previously masked in my program because pandas 0.18.1 dropped the timezones from all relevant columns before I performed this step. After upgrading to 0.19.0, half the operations I was performing stopped dropping timezones, leading to a mismatch between tz-aware and tz-naive timestamps that I've been chasing down the rabbit hole for a couple of days now.
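To illustrate why a mix of tz-aware and tz-naive timestamps is awkward to work with, here is a minimal sketch using only the Python 3 standard library (not pandas internals). The helper name `can_compare` is made up for this illustration.

```python
from datetime import datetime, timezone

aware = datetime(2016, 1, 1, tzinfo=timezone.utc)  # tz-aware
naive = datetime(2016, 1, 1)                       # tz-naive


def can_compare(left, right):
    """Hypothetical helper: True if the two datetimes can be ordered."""
    try:
        left < right
        return True
    except TypeError:
        # Python refuses to order offset-naive against offset-aware datetimes
        return False


mixed_ok = can_compare(aware, naive)                 # aware vs naive
same_ok = can_compare(aware, aware.replace(hour=1))  # aware vs aware
print(mixed_ok, same_ok)
```

Any max-style reduction needs this kind of ordering, so values that cannot be compared with each other have to be brought to a common form first.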

I've verified that this is present in pandas 0.18.1 and 0.19.0.

From stepping through the code, this looks like a potential problem with the numpy implementation of .max(axis=1), but I haven't yet found the culprit!

This issue has forced me to roll back to 0.18.1 and rely on the timezone-dropping bug to make df.max(axis=1) work, which is frustrating! I have also tried df.T.max() as a workaround, but this infuriatingly returns an empty series (see below).
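One possible workaround, sketched here rather than taken from the thread, is to compute the row-wise maximum on Python-level Timestamp objects instead of going through the array reduction. The helper `rowwise_max` is hypothetical. It assumes the selected columns are all tz-aware or all tz-naive (not a mix) and contain no missing values, and it has not been checked against pandas 0.18.1 or 0.19.0.

```python
import pandas as pd


def rowwise_max(frame, columns):
    """Hypothetical helper: row-wise max over the given datetime columns.

    Assumes the columns are all tz-aware or all tz-naive and hold no NaT.
    """
    # Iterating a datetime Series yields pd.Timestamp objects, which
    # compare like ordinary Python values.
    rows = zip(*(frame[name] for name in columns))
    return pd.Series([max(row) for row in rows], index=frame.index)


idx = pd.date_range("2016-01-01", periods=3, freq="D", tz="UTC")
df = pd.DataFrame({"a": idx})
df["b"] = df["a"] - pd.Timedelta(hours=1)

latest = rowwise_max(df, ["a", "b"])
print(latest)
```

This is slower than a vectorised reduction, so it is only a stopgap for small frames.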

A small, complete example of the issue

import pandas as pd

# Hourly tz-aware (UTC) timestamps covering one day, in a single column 'a'
df = pd.DataFrame(pd.date_range(start=pd.Timestamp('2016-01-01 00:00:00+00'),
                                end=pd.Timestamp('2016-01-01 23:59:59+00'),
                                freq='H'))
df.columns = ['a']

# If testing on pandas 0.19.0, subtract a Series of Timedeltas here instead of
# a single Timedelta: we want b and c to be tz-naive.
df['b'] = df.a.subtract(pd.Timedelta(seconds=60*60))

df[['a', 'b']].max()  # This is fine: produces two values, one per column

df[['a', 'b']].max(axis=1)  # This is not fine: produces a correctly sized Series of NaN

# Same caveat as for 'b' when testing on pandas 0.19.0
df['c'] = df.a.subtract(pd.Timedelta(seconds=60))

df[['b', 'c']].max(axis=1)  # This is fine: produces a correctly sized Series of valid timestamps without timezone

df[['a', 'b']].T.max()  # Produces an empty Series

Expected Output

Calling df.max(axis=1) on a dataframe with timezone-aware timestamps should return valid timestamps, not NaN.
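The expectation can be written down as a small check. This is a sketch with made-up variable names, assuming a pandas version that includes the fix linked from this issue. On an affected version the two flags would be expected to come out False.

```python
import pandas as pd

# Two tz-aware (UTC) columns; 'b' is always one hour earlier than 'a'
idx = pd.date_range("2016-01-01", periods=3, freq="D", tz="UTC")
frame = pd.DataFrame({"a": idx, "b": idx - pd.Timedelta(hours=1)})

row_max = frame.max(axis=1)

# Expected behaviour: no NaN/NaT, and every row picks the value from 'a'
has_no_nan = bool(row_max.notna().all())
matches_a = bool((row_max == frame["a"]).all())
print(has_no_nan, matches_a)
```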

Output of pd.show_versions()

(I have tested in two virtualenvs, the only difference between the two being the pandas version)

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 28.6.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@mroeschke changed the title from "Unexpected behavior on df_datetime64.max(axis=1) with missing column" to "BUG: DataFrame with tz-aware data and max(axis=1) returns NaN" on Oct 21, 2018
@mroeschke added the Bug and Timezones (Timezone data dtype) labels on Oct 21, 2018
@jreback added this to the Contributions Welcome milestone on Oct 23, 2018
@jreback modified the milestones: Contributions Welcome, 0.24.0 on Jan 16, 2019
4 participants