ENH: add errors='coerce' to DataFrame.astype #48781

joooeey · 2022-09-26T09:52:34Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

I wish I could quickly convert a DataFrame containing some invalid data to a numeric dtype, coercing the invalid values. I thought pd.DataFrame.astype could do that, but it doesn't have an option to coerce invalid data to NaNs (or NaTs).

In my particular case I have a DataFrame of sensor readings with mostly NaNs (indicating no value received), many integers (those I care about), and some strings (indicating specific errors). I quickly tried to get a histogram for an overview of that data, but pd.DataFrame.hist requires numeric data, which takes a few lines of code to produce. This is exploratory code I write in my console, so it would be sweet if this could be done with a single method.

Toy Example

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    ["error", 0.2, 1.2, 1.7],
    [0.3, "", 1.3, 1.8],
    [0.4, 1.4, "code255", 1.9],
])

df.astype(float, errors="coerce")
# ValueError: Expected value of kwarg 'errors' to be one of ['raise', 'ignore'].
# Supplied value is 'coerce'

import matplotlib.pyplot as plt
plt.hist(df.values.flatten(), bins=[0, 1, 2])

Expected result:

In [30]: df
Out[30]: 
     0    1    2    3
0  NaN  0.1  1.1  1.6
1  NaN  0.2  1.2  1.7
2  0.3  NaN  1.3  1.8
3  0.4  1.4  NaN  1.9

Feature Description

Two options:

Allow multidimensional input (e.g. DataFrames, Numpy Arrays) as the arg of pd.to_numeric. In case of mixed type columns (e.g. integers and floats), we'd have to decide and document if that would operate by column or cast the whole data structure to one dtype. I'd expect by column for DataFrames. Another issue that comes up is how to deal with multidimensional lists and tuples.

OR/AND

Add the option "coerce" to the errors kwarg in pd.DataFrame.astype (the current options are "raise" and "ignore"). We'd have to decide how to deal with incompatibilities between errors="coerce" and dtype, e.g. what to do if someone tries to coerce to string. I would expect an error.

To me it looks like the potential for confusing the user is a lot lower with the second option because it has fewer edge cases.
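The second option can be approximated today with a small wrapper. The following is only a sketch: `astype_coerce` is a hypothetical helper, not a pandas API, and it assumes that coercion is defined for float targets only, raising for anything else (such as str), as suggested above.

```python
import numpy as np
import pandas as pd


def astype_coerce(df: pd.DataFrame, dtype) -> pd.DataFrame:
    """Hypothetical helper (not part of pandas) emulating astype(dtype, errors="coerce")."""
    target = np.dtype(dtype)
    if target.kind != "f":
        # Assumption: targets without a NaN value (str, int, ...) are rejected.
        raise TypeError(f"errors='coerce' is not defined for dtype {target}")
    # pd.to_numeric works on 1-D input, so apply it column by column.
    return df.apply(pd.to_numeric, errors="coerce").astype(target)


df = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    ["error", 0.2, 1.2, 1.7],
    [0.3, "", 1.3, 1.8],
    [0.4, 1.4, "code255", 1.9],
])
result = astype_coerce(df, float)
```

With the toy example, `result` holds float columns in which "error", "" and "code255" have become NaN.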

Alternative Solutions

for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors="coerce")
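The same loop can be written as one expression with DataFrame.apply, which forwards keyword arguments to the applied function. The sketch below reuses the toy data and then bins the coerced values with np.histogram, which stands in for the plotting call here so that the example runs without matplotlib.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 0.1, 1.1, 1.6],
    ["error", 0.2, 1.2, 1.7],
    [0.3, "", 1.3, 1.8],
    [0.4, 1.4, "code255", 1.9],
])

# Column-wise coercion in a single expression.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Flatten, drop the NaNs, and count values per bin.
values = numeric.to_numpy().ravel()
counts, edges = np.histogram(values[~np.isnan(values)], bins=[0, 1, 2])
```

Here `counts` is `[4, 8]`: four readings fall below 1 and eight fall between 1 and 2.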

Additional Context

No response


subbusainath · 2022-09-30T12:08:46Z

take

MarcoGorelli · 2022-09-30T14:09:17Z

Hi @joooeey

To expedite resolution, could you please include a reproducible example?

Like

I have a DataFrame like df = pd.DataFrame(...)
I would like to do ...
I'd like to see this as the output: ...

joooeey · 2022-10-04T09:34:07Z

@MarcoGorelli I added a toy example to the description.

jbrockmendel · 2022-10-07T23:18:02Z

Cc @jorisvandenbossche I think you were looking into design decisions related to this

oricou · 2025-03-05T14:46:51Z

I would like this feature too but why don't you call it "nan"?

df.astype(float, errors="nan")

It seems easier to understand that errors will be NaN.

BTW we could choose what errors should become:

df.astype(int, errors=-1)

and then your feature becomes

df.astype(float, errors=np.nan)

joooeey · 2025-03-06T08:02:47Z

> I would like this feature too but why don't you call it "nan"?
>
> df.astype(float, errors="nan")
>
> It seems easier to understand that errors will be NaN.

This has a number of disadvantages:

errors="coerce" is already established in the signature of pd.to_numeric. It should be the same in df.astype for consistency.
errors="nan" is not a great choice where values become pd.NA or pd.NaT, depending on the dtype.
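The second point can be seen with the existing converters. This is a small illustration with made-up input values: the missing-value marker produced by coercion depends on the resulting dtype.

```python
import pandas as pd

# Float result: the unparseable entry becomes NaN.
as_float = pd.to_numeric(pd.Series(["1", "oops"]), errors="coerce")

# Datetime result: the unparseable entry becomes NaT.
as_date = pd.to_datetime(pd.Series(["2022-09-26", "oops"]), errors="coerce")

# Nullable integer result: the missing entry becomes pd.NA.
as_nullable = as_float.astype("Int64")
```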

> BTW we could choose what errors should become:
>
> df.astype(int, errors=-1)
>
> and then your feature becomes
>
> df.astype(float, errors=np.nan)

This is a great idea, especially for dtypes int, category, str, etc. that don't have a NaN-value.
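A rough sketch of how a fill value could behave, using only existing pandas calls. `astype_fill` is a hypothetical helper and not a proposed signature. It assumes that only values which fail to parse receive the fill value, and that values which were already missing are left alone.

```python
import pandas as pd


def astype_fill(df: pd.DataFrame, dtype, fill) -> pd.DataFrame:
    """Hypothetical helper (not part of pandas) emulating astype(dtype, errors=fill)."""
    coerced = df.apply(pd.to_numeric, errors="coerce")
    # A value failed to parse if it is missing after coercion but was present before.
    failed = coerced.isna() & df.notna()
    return coerced.mask(failed, fill).astype(dtype)


readings = pd.DataFrame({"a": [1, "error", 3], "b": ["4", "", 6]})
out = astype_fill(readings, int, -1)
```

In this example both columns of `out` are integers, with -1 in the row where "error" and "" appeared. Note that an integer target still raises if the input contains values that were missing from the start, since the sketch does not fill those.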

joooeey added the Enhancement and Needs Triage labels Sep 26, 2022

github-actions bot assigned subbusainath Sep 30, 2022

MarcoGorelli added the Needs Info label and removed the Needs Triage label Sep 30, 2022

MarcoGorelli changed the title ~~ENH:~~ ENH: add errors='coerce' to DataFrame.astype Oct 4, 2022

subbusainath mentioned this issue Oct 11, 2022

enhancement: Added 'errors=coerce' option in astype() #49042

Closed

5 tasks

joooeey mentioned this issue Feb 13, 2023

ENH: to_numeric on dataframe #51357

Closed

3 tasks

rhshadrach added the Dtype Conversions label and removed the Needs Info label Feb 26, 2023
