BUG: pd.to_numeric has an inconsistent behavior for datetime objects #43280
Comments
Assign me this issue. I will solve it. I have to make an open source contribution as part of Microsoft Research Intern Role so assign this to me |
take |
take |
After some investigation, it looks like this behavior is explained by: numpy/numpy#19782, so not sure if this is a |
The key point to note is that Incorrect, should be
Correct
Incorrect, should be
Correct
Incorrect, probably related to #16674
Correct
|
Hi Matthew, thanks for answering. That's the other approach that I considered, but since there wasn't documentation of the

The use case I am struggling with is this one:

>>> import pandas as pd
>>> from datetime import datetime
>>> pd.to_numeric(pd.Series([datetime(2021,8,22)]), errors="raise") # Same for errors="ignore"/"coerce"
0    1629590400000000000
dtype: int64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="coerce")
0    NaN
1    1.0
2    NaN
dtype: float64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="ignore")
0                  apple
1                      1
2    2021-08-22 00:00:00
dtype: object

Is this desired/expected? Should we return |
Sorry, yes the documentation could use improvement to address how datetime-like objects (e.g. I would expect
|
IMO, the best approach would be to have a function

from functools import partial

# `pd._scalar_to_numeric` is the proposed scalar helper, not an existing function
def to_numeric(arg, errors="raise", downcast=None):
    if is_scalar(arg):
        return pd._scalar_to_numeric(arg, errors=errors, downcast=downcast)
    elif is_series_or_index(arg):
        return arg.apply(pd._scalar_to_numeric, errors=errors, downcast=downcast)
    elif is_list_tuple_or_np_array(arg):
        # Maybe keeping the dtype as well for `np.array`?
        # Materialize the map first: np.array() on a lazy map object gives a 0-d object array
        return np.array(list(map(partial(pd._scalar_to_numeric, errors=errors, downcast=downcast), arg)))

What do you think? |
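A minimal, runnable sketch of the scalar-first idea proposed above, for illustration only. The name `scalar_to_numeric` is hypothetical and is not part of pandas. It assumes datetime-like scalars should map to nanoseconds since the epoch (matching the `int64` output shown earlier in the thread) and that `NaT` should become `NaN`.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def scalar_to_numeric(value, errors="raise"):
    """Hypothetical scalar converter (not a pandas API)."""
    # Datetime-like scalars: return nanoseconds since the epoch, NaT -> NaN.
    if isinstance(value, (datetime, np.datetime64)):
        ts = pd.Timestamp(value)
        return np.nan if ts is pd.NaT else ts.value
    try:
        # Delegate everything else to the existing scalar path.
        return pd.to_numeric(value, errors="raise")
    except (ValueError, TypeError):
        if errors == "raise":
            raise
        # "coerce" replaces unparseable values with NaN, "ignore" returns them as-is.
        return np.nan if errors == "coerce" else value


# The mixed-type case from the thread, converted element by element.
mixed = ["apple", 1, datetime(2021, 8, 22)]
converted = [scalar_to_numeric(v, errors="coerce") for v in mixed]
```

Applying a Python-level function to every element is slow on large inputs, so this is only meant to pin down the intended per-value semantics, not to suggest an implementation strategy.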
Probably the best approach would be to modify this line here to handle datetime like scalars (use one of the existing functions in pandas/pandas/core/tools/numeric.py Line 155 in 5f648bf
|
By only doing that, the problem will persist for lists, tuples and np.arrays |
Best to address scalars vs array-likes in separate PRs. I imagine addressing array-likes may need to use one of the datetime inference functions for arrays |
I don't see a way of fixing array-likes without using the same logic as for scalars. Inferring the type of the array isn't enough given all the possible cases. Having a function that handles scalars and then mapping this function element by element for iterables seems to be the easiest solution. |
This will kill performance for the existing cases, so that implementation is probably a non-starter. |
Understood. Then let's split the problem
|
This is still an ongoing issue as of the 13th of February, 2025. This could theoretically be fixed by using the dtype Int64, which has support for NA values.
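To illustrate the `Int64` suggestion, here is a rough sketch of a user-side conversion. The helper name `datetimes_to_nullable_int` is hypothetical, and the code assumes a timezone-naive `datetime64` Series as input. It is not how `pd.to_numeric` behaves today; it only shows that the nullable `Int64` dtype can hold epoch nanoseconds at full integer precision while representing `NaT` as `<NA>`.

```python
import numpy as np
import pandas as pd


def datetimes_to_nullable_int(s: pd.Series) -> pd.Series:
    """Hypothetical helper: epoch nanoseconds as nullable Int64, with NaT -> <NA>."""
    # Work on the underlying NumPy array, normalized to nanosecond resolution.
    values = s.to_numpy().astype("datetime64[ns]")
    mask = np.isnat(values)
    # NaT becomes the int64 minimum here; the mask hides those slots.
    ints = values.astype("int64")
    return pd.Series(pd.arrays.IntegerArray(ints, mask), index=s.index)


dates = pd.Series(pd.to_datetime(["2021-08-22", None]))
result = datetimes_to_nullable_int(dates)
```

A float result with `NaN` could not represent nanosecond timestamps exactly, which is one reason a nullable integer dtype is attractive for this case.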
|
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
When using pd.to_numeric to convert a pd.Series with dtype datetime64[ns], it returns different values than converting the series value by value.
Expected Output
Converting a pd.Series as a whole should give the same result as converting it value by value.
I am not sure about what the correct output should be, but IMO the output should be consistent in these two scenarios.
What I suggest:
pd.NaT, always returns np.NaN
Output of pd.show_versions()
I am using the latest version of master until today
INSTALLED VERSIONS
commit : e39ea30
python : 3.8.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.4.0.dev0+517.gc3761e24d8
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1