BUG: `pd.to_numeric` has an inconsistent behavior for `datetime` objects #43280

hec10r · 2021-08-29T01:18:43Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

>>> import pandas as pd
>>> from datetime import datetime
>>> from functools import partial
>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan
>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64
>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
>>>
>>> pd.to_numeric(pd.NaT, errors="coerce")
nan
>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64
>>> pd.Series([pd.NaT]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Problem description

When pd.to_numeric is used to convert a pd.Series with dtype datetime64[ns], it returns different values than converting the Series value by value.

Expected Output

Converting a pd.Series as a whole should give the same result as converting it value by value.
I am not sure what the correct output should be, but IMO the output should be consistent in these two scenarios.

What I suggest:

For non-null values, always return the same value (maybe the integer?).
For pd.NaT, always return np.nan.

Output of `pd.show_versions()`

I am using the latest version of master as of today.

INSTALLED VERSIONS

commit : e39ea30
python : 3.8.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+517.gc3761e24d8
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1


DAKSHA2001 · 2021-08-29T09:23:12Z

Please assign this issue to me; I will solve it. I need to make an open source contribution as part of a Microsoft Research intern role.

ShreyasPatel031 · 2021-08-29T16:20:59Z

take

Navaneethan2503 · 2021-08-29T21:20:31Z

This PR (#43289) will close this issue. Thanks for reporting it, @hec10r.
closes #43280

hec10r · 2021-08-30T00:38:05Z

take

hec10r · 2021-08-30T01:54:49Z

After some investigation, it looks like this behavior is explained by numpy/numpy#19782, so I am not sure whether this is a pandas issue or needs to be fixed in NumPy. Before creating a PR here, I would like input from the maintainers.

mroeschke · 2021-08-31T18:49:10Z

The key point to note is that datetime objects in pandas get converted to datetime64[ns] (this has been the convention for a while) rather than being kept as object, unless dtype=object is specified. With that in mind:

Incorrect, should be 1629590400000000000

>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan

Correct

>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64

Incorrect, should be 1629590400000000000

>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Correct

>>> pd.to_numeric(pd.NaT, errors="coerce")
nan

Incorrect, probably related to #16674

>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64

Correct

>>> pd.Series([pd.NaT]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
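A minimal sketch of the two conventions behind these outputs, assuming a recent pandas and NumPy: a Series built from datetime objects gets a datetime64 dtype rather than object, 1629590400000000000 is the timestamp's value in nanoseconds since the Unix epoch, and the -9223372036854775808 returned for a NaT Series is the int64 sentinel that NaT occupies in datetime64 storage. The dtype check uses the kind rather than the exact unit, since the inferred resolution may differ between pandas versions.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dt = datetime(2021, 8, 22)

# datetime objects are inferred as a datetime64 dtype (kind "M"), not object
s = pd.Series([dt])
is_datetime64 = s.dtype.kind == "M"

# Nanoseconds since the Unix epoch (Timestamp.value is always in nanoseconds)
epoch_ns = pd.Timestamp(dt).value

# Reinterpreting a datetime64[ns] NaT as int64 exposes the sentinel value
nat_as_int = np.array(["NaT"], dtype="datetime64[ns]").view("i8")[0]
int64_min = np.iinfo(np.int64).min

print(is_datetime64, epoch_ns, nat_as_int == int64_min)
# True 1629590400000000000 True
```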

hec10r · 2021-08-31T19:15:44Z

Hi Matthew, thanks for answering.

That's the other approach I considered, but since there was no documentation of pd.to_numeric's behavior for date-like objects and I did not find the current behavior very intuitive, I thought it would be better to change it to something more intuitive.

The use case I am struggling with is this one:

>>> import pandas as pd
>>> from datetime import datetime
>>> pd.to_numeric(pd.Series([datetime(2021,8,22)]), errors="raise") # Same for errors="ignore"/"coerce"
0    1629590400000000000
dtype: int64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="coerce")
0    NaN
1    1.0
2    NaN
dtype: float64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="ignore")
0                  apple
1                      1
2    2021-08-22 00:00:00
dtype: object

Is this desired/expected? Should we return 1629590400000000000 in the three cases?

mroeschke · 2021-08-31T19:32:34Z

Sorry, yes the documentation could use improvement to address how datetime-like objects (e.g. np.datetime64, datetime.datetime, etc) are treated.

I would expect 1629590400000000000 in all 3 cases, though this case is a bit tricky since the entire Series has dtype=object. Still, certain elements in that Series can be converted to numbers, which aligns with what you mentioned in your OP:

Converting a pd.Series as a whole should be the same as converting it value by value.

hec10r · 2021-08-31T20:16:08Z

IMO, the best approach would be to have a function pd._scalar_to_numeric and then call it inside pd.to_numeric, something like:

import numpy as np
import pandas as pd
from functools import partial
from pandas.api.types import is_scalar

def to_numeric(arg, errors="raise", downcast=None):
    # pd._scalar_to_numeric is the proposed helper; it does not exist in pandas yet
    convert = partial(pd._scalar_to_numeric, errors=errors, downcast=downcast)
    if is_scalar(arg):
        return convert(arg)
    elif isinstance(arg, (pd.Series, pd.Index)):
        return arg.map(convert)
    elif isinstance(arg, (list, tuple, np.ndarray)):
        # Maybe keeping the dtype as well for `np.array`?
        return np.array([convert(x) for x in arg])

What do you think?

mroeschke · 2021-09-01T04:47:49Z

Probably the best approach would be to modify this line to handle datetime-like scalars (using one of the existing functions in pandas.core.dtypes.common):

pandas/pandas/core/tools/numeric.py, line 155 in 5f648bf:

elif is_scalar(arg):
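For the scalar branch, a minimal sketch of the behavior listed above as correct (datetime-like scalars map to nanoseconds since the epoch, NaT maps to NaN) could look like the wrapper below. `scalar_to_numeric` is a hypothetical name used only for illustration, not a pandas function, and it wraps the public `pd.to_numeric` instead of patching numeric.py.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def scalar_to_numeric(value, errors="raise"):
    """Hypothetical helper: convert one scalar, treating datetime-likes as epoch nanoseconds."""
    # pd.NaT and np.datetime64("NaT") become NaN, matching pd.to_numeric(pd.NaT, errors="coerce")
    if value is pd.NaT or (isinstance(value, np.datetime64) and np.isnat(value)):
        return np.nan
    # datetime (including pd.Timestamp) and np.datetime64 become nanoseconds since the epoch
    if isinstance(value, (datetime, np.datetime64)):
        return pd.Timestamp(value).value
    # Everything else goes through the existing scalar path
    return pd.to_numeric(value, errors=errors)
```

With this, pd.to_numeric-style conversion of datetime(2021, 8, 22) gives 1629590400000000000 whether it arrives as a datetime or an np.datetime64. Mapping the helper over an object Series would also cover the mixed case raised earlier in the thread, but element-by-element conversion is slow for large inputs.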

hec10r · 2021-09-01T04:49:22Z

If we only do that, the problem will persist for lists, tuples, and NumPy arrays.

mroeschke · 2021-09-01T05:00:50Z

It's best to address scalars and array-likes in separate PRs. I imagine addressing array-likes may require one of the datetime inference functions for arrays.

hec10r · 2021-09-01T05:13:53Z

I don't see a way of fixing array-likes without using the same logic as for scalars. Inferring the type of the array isn't enough given all the possible cases. Having a function that handles scalars and then mapping it element by element over iterables seems to be the easiest solution.

mroeschke · 2021-09-01T05:21:32Z

mapping this function element by element for iterables seems to be the easiest solution

This will kill performance for the existing cases, so that implementation is probably a non-starter.

hec10r · 2021-09-01T14:37:22Z

Understood. Then let's split the problem:

PR to fix the behavior for scalar date-like objects, including pd.NaT (will do this before the weekend)
PR to fix the behavior for iterables: infer types for lists/tuples and use the current approach for np.array when the type is number-like. Approach for mixed types TBD (open to discussion and implementation)
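The iterable plan above (infer the element type, keep the current vectorized path when it is number-like) might be sketched as follows. This is an illustration under assumptions, not the approach pandas adopted: `array_to_numeric` is a hypothetical name, it relies on the public `pandas.api.types.infer_dtype`, it assumes tz-naive datetimes, and mixed input simply falls back to the existing behavior since that case was left open. Datetime-like input is returned as a nullable Int64 array so that NaT becomes `<NA>` instead of the int64 sentinel.

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def array_to_numeric(values, errors="raise"):
    """Hypothetical sketch: vectorized conversion for lists, tuples and NumPy arrays."""
    kind = infer_dtype(values, skipna=True)
    if kind in ("datetime", "datetime64"):
        # One vectorized parse, then normalize to nanosecond resolution (tz-naive only)
        ns = pd.to_datetime(values).astype("datetime64[ns]").to_numpy()
        # Reinterpret as int64 and mask NaT so it becomes <NA> rather than int64 min
        return pd.arrays.IntegerArray(ns.view("i8").copy(), np.isnat(ns))
    # Numeric, string and (for now) mixed input keep the current code path
    return pd.to_numeric(values, errors=errors)
```

Only one type inference pass runs per call, so number-like arrays keep their current vectorized speed, which addresses the performance concern raised about element-by-element mapping.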

jack5github · 2025-02-12T23:21:01Z

This is still an ongoing issue as of the 13th of February, 2025. This could theoretically be fixed by using the dtype Int64, which has support for NA values.

from datetime import datetime
import pandas as pd

print(pd.to_numeric(pd.Series([datetime(2025, 2, 13), pd.NaT])))

"""
0    1739404800000000000
1   -9223372036854775808
dtype: int64
"""

hec10r added the Bug and Needs Triage labels Aug 29, 2021

github-actions bot assigned ShreyasPatel031 Aug 29, 2021

Navaneethan2503 mentioned this issue Aug 29, 2021

modified changes for datetime and NaT Objects Behavior closes #43280 #43289

Closed

ShreyasPatel031 removed their assignment Aug 30, 2021

github-actions bot assigned hec10r Aug 30, 2021

hec10r mentioned this issue Aug 31, 2021

BUG: Fix pd.to_numeric to have consistent behavior for date-like arguments #43315

Closed


mroeschke added the Dtype Conversions and Datetime labels and removed the Needs Triage label Sep 30, 2021

ahobeost mentioned this issue Jul 10, 2024

ENH: Replacing behavior currently provided by pandas.to_numeric using errors="ignore" #59221

Open

