pd.Series.interpolate(method="quadratic) Error with non-numeric index column #21662

kevinislas2 · 2018-06-28T02:00:32Z

I'm working with the Series.interpolate function and I noticed that a DataFrame's index column can cause some weird problems when using the quadratic method.

First example: trying to impute data with non-numeric index column crashes:

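import numpy as np
import pandas as pd

# Series "B" indexed by the non-numeric column "A"
data = {'A': ["a", "b", "c", "d"], 'B': [0, 1, np.nan, 100]}
df = pd.DataFrame(data=data)
df.set_index("A", inplace=True)
s = pd.Series(df["B"])

s.interpolate(method="quadratic")  # raises TypeError on pandas 0.20.3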

This raises the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I know pandas delegates quadratic interpolation to SciPy, and although this error is raised inside SciPy, I believe the cause is that pandas expects a numeric index when using the quadratic method and passes the index values on to SciPy.

The previous code runs without errors if the quadratic method is not used.

The following code also runs without any errors:

data = {'A': [0, 1, 2, 4], 'B': [0, 1, np.nan, 100]}
df = pd.DataFrame(data=data)
df.set_index("A", inplace=True)
s = pd.Series(df["B"])

s.interpolate(method="quadratic")

outputs:

A
0      0.0
1      1.0
2     18.0
4    100.0
Name: B, dtype: float64
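
As a sanity check on the 18.0 above, here is a minimal sketch that uses NumPy only. It assumes the index values are 0, 1, 2, 4 (as the printed output shows) and that, with only three known points, the quadratic method reduces to the single parabola through them. Both are assumptions made for illustration, not statements about pandas or SciPy internals.

```python
import numpy as np

# Known points: index values where B is not NaN, and the matching B values.
x_known = np.array([0.0, 1.0, 4.0])
y_known = np.array([0.0, 1.0, 100.0])

# Fit the unique degree-2 polynomial through the three points.
coeffs = np.polyfit(x_known, y_known, deg=2)

# Evaluate it at the index value of the missing row (x = 2).
value_at_2 = np.polyval(coeffs, 2.0)
print(value_at_2)  # approximately 18.0
```

The parabola is 8x^2 - 7x, which gives 18 at x = 2, so the filled value depends on the index values and not just on the row positions.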

So while it makes sense to use the index column as an indicator of how many timesteps separate two values in a series, other methods seem to simply assume a spacing of 1 between rows and avoid using the index column.

Two things I can think of that could be helpful are raising a more descriptive error message when the method receives a non-numeric index column, or documenting this requirement, as I couldn't find anything about this error and only solved it by tinkering with the index column.

Let me know if either of these ideas seems appropriate for a PR.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-06-28T02:23:02Z

I'm not sure we should assume that the user wants us to use range(N) as the interpolation values for object indexes. That's assuming things about ordering and equal-spacing that may not be correct.

> other methods seem to simply assume 1 between each row and avoid using the index column.

What methods do you mean by 'other'? AFAIK, linear is the only one that ignores the index values.
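
A quick way to see the difference, as a sketch: the same values from the report on an unevenly spaced numeric index, filled once with `linear` and once with `index`. The `index` method is used here only as a stand-in for an index-aware method that does not need SciPy.

```python
import numpy as np
import pandas as pd

# Same values as in the report, on an unevenly spaced numeric index.
s = pd.Series([0, 1, np.nan, 100], index=[0, 1, 2, 4], name="B")

# "linear" treats the rows as equally spaced: midpoint of 1 and 100.
linear_fill = s.interpolate(method="linear").loc[2]

# "index" weights by the index values: one third of the way from x=1 to x=4.
index_fill = s.interpolate(method="index").loc[2]

print(linear_fill, index_fill)
```

Here `linear` fills 50.5 and `index` fills 34.0, so only the latter reacts to the gap between index values 2 and 4.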

> sending a more descriptive error message when the method receives a non-numeric index

That seems reasonable.
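
A rough sketch of what such a check could look like from user code. `interpolate_checked` is a hypothetical wrapper written for this illustration, not a pandas function, and the message wording is made up; it only assumes that `linear` is the one method that ignores the index, as discussed above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def interpolate_checked(s, method="linear", **kwargs):
    """Hypothetical helper: fail early with a readable message."""
    index_ok = is_numeric_dtype(s.index) or is_datetime64_any_dtype(s.index)
    # Simplification: every method except "linear" is treated as index-based.
    if method != "linear" and not index_ok:
        raise ValueError(
            f"method={method!r} uses the index values, so the index must be "
            f"numeric or datetime; got dtype {s.index.dtype}."
        )
    return s.interpolate(method=method, **kwargs)


s = pd.Series([0, 1, np.nan, 100], index=["a", "b", "c", "d"], name="B")

try:
    interpolate_checked(s, method="quadratic")
except ValueError as err:
    message = str(err)
    print(message)

# "linear" still works, because it never looks at the index.
filled = interpolate_checked(s, method="linear")
```

Checking before the call means the user sees a message about the index rather than a `ufunc 'isnan'` error from deep inside the numeric code.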

kevinislas2 · 2018-06-28T02:38:12Z

> What methods do you mean by 'other'? AFAIK, linear is the only one that ignores the index

You're right, Series.interpolate's documentation states that all methods except linear use index values:

> 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'polynomial' is passed to scipy.interpolate.interp1d. Both 'polynomial' and 'spline' require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=4). These use the actual numerical values of the index.

I misread and thought that the last sentence referred only to polynomial and spline, though now that I think about it, it wouldn't make sense for the other methods not to use the index.


TrigonaMinima · 2019-02-19T20:54:03Z

I'll start working on this.

TomAugspurger added the Docs, Missing-data, Error Reporting, Effort Low, and good first issue labels Jun 28, 2018

TomAugspurger added this to the Next Major Release milestone Jun 28, 2018

kevinislas2 added a commit to kevinislas2/pandas that referenced this issue Jul 4, 2018

Error message when interpolating with non-numeric index column pandas…

c18253f

…-dev#21662

kevinislas2 mentioned this issue Jul 6, 2018

Interpolation nonnumeric error handling #21773

Closed

4 tasks

jreback modified the milestones: Next Major Release, 0.24.0 Jul 6, 2018

jreback modified the milestones: 0.24.0, Contributions Welcome Oct 11, 2018

TrigonaMinima mentioned this issue Feb 21, 2019

BUG: pd.Series.interpolate non-numeric index column (21662) #25394

Merged

3 tasks

nmusolino mentioned this issue Mar 1, 2019

DOC: Reword Series.interpolate docstring for clarity #25491

Merged

3 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 Mar 22, 2019

jreback closed this as completed in #25394 Mar 24, 2019
