Rank Mixes np.nan with np.inf values #19538

WillAyd · 2018-02-05T15:57:18Z

Code Sample, a copy-pastable example if possible

In []: import numpy as np
In []: import pandas as pd
In []: df = pd.DataFrame([1, np.nan, np.inf, -np.inf, 25])
In []: df.rank()
Out []:
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0

In []: df.rank(ascending=False)
Out []:
     0
0  3.0
1  NaN
2  1.0
3  NaN
4  2.0

Problem description

Depending on the ranking direction, np.inf (ascending) or -np.inf (descending) gets grouped with np.nan in the rank operation and receives a NaN rank. Ideally, np.inf and -np.inf would be ranked as ordinary values, entirely separate from np.nan.
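For reference, the expected semantics can be written out in plain Python. This is a minimal sketch, not the pandas implementation: `rank_average` is a hypothetical helper that assumes pandas' default `method='average'` tie handling and `na_option='keep'`, so NaN stays NaN while +/-inf are ranked like any other value in either direction.

```python
import math


def rank_average(values, ascending=True):
    """Hypothetical reference rank (not pandas code).

    NaN stays NaN; +/-inf are ranked as ordinary values.
    Ties get the average of the 1-based ranks they span (method='average').
    """
    # math.isnan is False for +/-inf, so infinities count as valid values.
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort positions by value; descending just reverses the order.
    order = sorted(valid, key=lambda i: values[i], reverse=not ascending)
    ranks = [math.nan] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


data = [1, math.nan, math.inf, -math.inf, 25]
print(rank_average(data))                   # [2.0, nan, 4.0, 1.0, 3.0]
print(rank_average(data, ascending=False))  # [3.0, nan, 1.0, 4.0, 2.0]
```

Under these assumptions, np.inf ranks last when ascending and first when descending, and only the np.nan row keeps a NaN rank.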

jreback · 2018-02-06T11:01:18Z

See #6945 / #17903; this was just done in master. Is this another case?

cc @peterpanmj

WillAyd · 2018-02-06T17:43:08Z

Interesting... From a quick look, I think the problem is that the PR only updated the rank_1d_ methods in algos. As a result, you get the desired behavior from a Series object but not from a DataFrame or GroupBy object. There's also a bug in how values are handled in descending order. All of these are highlighted below:

In []: df[0].rank()  # this works
Out[]: 
0    2.0
1    NaN
2    4.0
3    1.0
4    3.0

In []: df[0].rank(ascending=False)  # handles NaN correctly, but incorrectly ranks -np.inf equal to np.inf
Out[]: 
0    3.0
1    NaN
2    1.0
3    1.0
4    2.0

In []: df['key'] = ['foo'] * 5
In []: df.groupby('key').rank()  # doesn't handle missing values appropriately
Out[]: 
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0
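The three code paths above can be compared directly on the same data. The sketch below uses a hypothetical helper, `rank_paths_agree` (not part of pandas), that ranks one column through `Series.rank`, `DataFrame.rank` and a single-group `GroupBy.rank`, and reports whether all three agree. It assumes a pandas version in which `GroupBy.rank` accepts `ascending`.

```python
import numpy as np
import pandas as pd


def rank_paths_agree(values, ascending=True):
    """Hypothetical check: do Series, DataFrame and GroupBy rank agree?"""
    df = pd.DataFrame({0: values})
    df["key"] = "foo"  # a single group containing every row

    via_series = df[0].rank(ascending=ascending)
    via_frame = df[[0]].rank(ascending=ascending)[0]  # rank only column 0
    via_groupby = df.groupby("key")[0].rank(ascending=ascending)

    # Series.equals treats NaNs in matching positions as equal.
    return bool(via_series.equals(via_frame) and via_series.equals(via_groupby))


data = [1, np.nan, np.inf, -np.inf, 25]
for asc in (True, False):
    print("ascending" if asc else "descending", rank_paths_agree(data, ascending=asc))
```

On a build affected by this issue, a False for either direction would flag the kind of mismatch shown in the outputs above.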

INSTALLED VERSIONS

commit: 93c86aa
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+238.g93c86aa13.dirty
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

peterpanmj · 2018-02-09T03:15:07Z

It is the same issue as #6945, and it seems that #17903 does not completely fix it when ascending is False. I will have a look into it.
GroupBy with NaN values is a separate issue:

In []: df['key'] = ['foo'] * 5
In []: df.groupby('key').rank()  # doesn't handle missing values appropriately
Out[]: 
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0

It should be related to #3729.

WillAyd · 2018-02-09T03:33:31Z

@peterpanmj Whatever solution you come up with in algos, it shouldn't be that different to port it to the groupby_helper.pyx.in file to support GroupBy NaN handling as well (I'm touching the latter in #19481). Ideally, these would be consistent.
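One way to keep the Series and GroupBy paths consistent is to have group-wise ranking delegate to the same 1-D routine for each group. The sketch below illustrates that design in plain Python; `rank_1d_sketch` and `group_rank_sketch` are hypothetical names (not pandas' Cython functions), and the sketch assumes average tie handling with NaN kept as NaN.

```python
import math
from collections import defaultdict


def rank_1d_sketch(values, ascending=True):
    """Hypothetical 1-D rank: average ties, NaN kept, +/-inf ordinary."""
    valid = [v for v in values if not math.isnan(v)]
    out = []
    for v in values:
        if math.isnan(v):
            out.append(math.nan)
            continue
        # Count valid values that come strictly before v in this direction.
        before = sum(1 for w in valid if (w < v if ascending else w > v))
        ties = sum(1 for w in valid if w == v)
        out.append(before + (ties + 1) / 2)  # average of the tied ranks
    return out


def group_rank_sketch(values, keys, ascending=True):
    """Rank within each group by reusing rank_1d_sketch, so both paths agree."""
    positions = defaultdict(list)
    for i, k in enumerate(keys):
        positions[k].append(i)
    out = [math.nan] * len(values)
    for idx in positions.values():
        group_ranks = rank_1d_sketch([values[i] for i in idx], ascending)
        for i, r in zip(idx, group_ranks):
            out[i] = r
    return out


data = [1, math.nan, math.inf, -math.inf, 25]
print(group_rank_sketch(data, ["foo"] * 5))  # [2.0, nan, 4.0, 1.0, 3.0]
```

Because the per-group work is the 1-D routine itself, any fix to inf/NaN handling there applies to the grouped case automatically.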


WillAyd mentioned this issue Feb 5, 2018

PERF: Cythonize Groupby Rank #19481

Merged


jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Feb 6, 2018

peterpanmj mentioned this issue Mar 10, 2018

FIX: add support for desc order when ranking infs with nans #19538 #20091

Merged


jreback added this to the 0.23.0 milestone Mar 10, 2018

jreback added the Bug label Mar 10, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Mar 12, 2018

parametrize test_rank_tie_methods_on_infs_nans, add a small test for p…

aeed45e

…andas-dev#19538

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Mar 14, 2018

parametrize test_rank_tie_methods_on_infs_nans, add a small test for p…

f6d0fd0

…andas-dev#19538

jreback closed this as completed in #20091 Mar 30, 2018

jreback pushed a commit that referenced this issue Mar 30, 2018

FIX: add support for desc order when ranking infs with nans #19538 (#…

0a00365

…20091)

kornilova203 pushed a commit to kornilova203/pandas that referenced this issue Apr 23, 2018

FIX: add support for desc order when ranking infs with nans pandas-de…

b3e5292

…v#19538 (pandas-dev#20091)
