to_datetime can unexpectedly return numpy array instead of DatetimeIndex #23045

Closed
krivonogov opened this issue Oct 8, 2018 · 7 comments
Labels
Datetime Datetime data dtype

@krivonogov

In pandas>=0.23.0

import numpy as np
import pandas as pd

good_input = np.array(['bad_format', '2013-06-07'], dtype='O')
# np.string_ is an alias of np.bytes_ (removed in NumPy 2.0; use np.bytes_ there)
bad_input = np.array([np.string_('bad_format'), '2013-06-07'], dtype='O')
pd.to_datetime(good_input, errors='coerce')
pd.to_datetime(bad_input, errors='coerce')

returns

Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: array(['bad_format', '2013-06-07'], dtype=object)

whereas with pandas==0.22.0 everything is fine and both calls return

Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)

More precisely, the behaviour of pandas._libs.tslib.array_to_datetime changed between 0.22.0 and 0.23.0:

from pandas._libs import tslib
tslib.array_to_datetime(good_input, errors='coerce')
tslib.array_to_datetime(bad_input, errors='coerce')

gives

Out[7]: 
array([                          'NaT', '2013-06-07T00:00:00.000000000'],
      dtype='datetime64[ns]')
Out[8]: array(['bad_format', '2013-06-07'], dtype=object)

Expected Output

The same as in pandas==0.22.0:

Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
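The core of the report is that pd.to_datetime(..., errors='coerce') can return the unconverted values with object dtype instead of a datetime64 result, without raising. Until upgrading is possible, code that depends on getting datetimes back could check the result's dtype explicitly. A minimal sketch follows; require_datetime64 is a hypothetical helper (not a pandas API) that only inspects the string form of the result's dtype attribute.

```python
def require_datetime64(result):
    """Return `result` unchanged if its dtype is datetime64, else raise TypeError.

    Duck-typed: works for a DatetimeIndex, a datetime64 numpy array, or any
    other object exposing a `dtype` attribute. str(dtype) looks like
    'datetime64[ns]' (or 'datetime64[ns, UTC]' for tz-aware data).
    """
    dtype = getattr(result, "dtype", None)
    if dtype is None or not str(dtype).startswith("datetime64"):
        raise TypeError(
            "expected a datetime64 result, got %s with dtype %s"
            % (type(result).__name__, dtype)
        )
    return result

# Hypothetical usage with the inputs from the report:
#   idx = require_datetime64(pd.to_datetime(bad_input, errors='coerce'))
# When to_datetime falls back to an object-dtype result, this raises
# TypeError instead of letting the strings flow further downstream.
```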

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1067-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.1.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.7.1
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Can you try on master?

In [4]: pd.to_datetime(bad_input, errors='coerce')
   ...:
Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)

@jorisvandenbossche
Member

I also can't reproduce it with 0.23.4 on my laptop, so it might be related to other factors (e.g. numpy version or Python version). I was using Python 3.5 with numpy 1.13 for both master and 0.23.4.

@krivonogov
Author

krivonogov commented Oct 9, 2018

Sorry, I didn't mention it at first, but I was using Python 2.7.12. I tested on another machine (again with Python 2.7.12) and got the same result. I haven't tested on master yet, though.

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: 0.24.1
numpy: 1.13.3
scipy: 0.18.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.1
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: 1.0.2
lxml: 3.6.4
bs4: 4.3.2
html5lib: 1.0b8
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@sinhrks added the Datetime (Datetime data dtype) and Can't Repro labels on Oct 9, 2018
@krivonogov
Author

I tried on master and got yet another output, which still doesn't match the previous behaviour. By the way, everything works as expected on Python 3.

pd.to_datetime(good_input, errors='coerce')
Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(bad_input, errors='coerce')
Out[5]: Index([u'bad_format', u'2013-06-07'], dtype='object')

INSTALLED VERSIONS

commit: ca7d518
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: None.None

pandas: 0.24.0.dev0+717.gca7d518cb
pytest: None
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jorisvandenbossche
Member

"(which however still doesn't match previous behaviour)"

What's the difference? That it is unicode?


Thanks for checking the different Python versions. So it is a Python 2.7 issue, then.
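The thread narrows the problem to Python 2.7 with a numpy bytes scalar (np.string_) mixed in among ordinary strings, while good_input (plain strings only) converts fine. One possible workaround, assuming that mix is the trigger, is to normalise every bytes element to text before calling pd.to_datetime. This is an untested sketch, not the fix that landed on master; decode_bytes_elements is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def decode_bytes_elements(values, encoding="utf-8"):
    """Return a list in which every bytes element of `values` is decoded to text.

    numpy's bytes scalar (np.bytes_, formerly np.string_) subclasses the
    built-in bytes type, so the isinstance check also catches it.
    Elements that are not bytes are passed through unchanged.
    """
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]

# Hypothetical usage (assumes numpy as np, pandas as pd, and bad_input
# from the report):
#   cleaned = np.array(decode_bytes_elements(bad_input), dtype=object)
#   pd.to_datetime(cleaned, errors='coerce')
```

Decoding rather than calling str() on each element keeps the helper portable: on Python 3, str(b'x') yields the literal "b'x'" rather than the text.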

@jorisvandenbossche
Member

"What's the difference? That it is unicode?"

Ah, sorry, I see: it is an Index instead of an array.

@mroeschke
Member

Closing, since pandas no longer supports Python 2.7 and this is fixed on master.
