Rank Mixes np.nan with np.inf values #19538
See #6945 / #17903: this was just done in master. Is this another case? cc @peterpanmj
Interesting... From glancing at it, I think the problem is that PR only updated the …

```
In []: df[0].rank()  # this works
Out[]:
0    2.0
1    NaN
2    4.0
3    1.0
4    3.0

In []: df[0].rank(ascending=False)  # handles NaN appropriately, but incorrectly ranks np.inf and -np.inf as equal
Out[]:
0    3.0
1    NaN
2    1.0
3    1.0
4    2.0

In []: df['key'] = ['foo'] * 5

In []: df.groupby('key').rank()  # doesn't handle missing values appropriately: row 2 comes back as NaN
Out[]:
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0
```

INSTALLED VERSIONS
commit: 93c86aa
pandas: 0.23.0.dev0+238.g93c86aa13.dirty
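For the `groupby` case, a single group should presumably give the same result as `df[0].rank()`, with the `np.inf` row ranked instead of coming back as NaN. The pure-Python sketch below shows that per-group behaviour: NaN entries are skipped and stay NaN, while `inf`/`-inf` are ranked like any other float, with ties sharing their average rank. `group_rank` is a hypothetical helper for illustration, not pandas' `GroupBy.rank`, and the input values are an assumption, since the thread does not show how `df` was built (`[1, nan, inf, -inf, 2]` is consistent with the outputs above).

```python
import math
from collections import defaultdict


def group_rank(keys, values):
    """Ascending average rank of `values` within each group in `keys`.

    Hypothetical helper, not pandas' GroupBy.rank: NaN entries stay NaN
    and are skipped, while +inf/-inf are ranked like any other float.
    """
    result = [math.nan] * len(values)
    # Collect the positions of non-NaN values for each group.
    groups = defaultdict(list)
    for i, (k, v) in enumerate(zip(keys, values)):
        if not math.isnan(v):
            groups[k].append(i)

    for positions in groups.values():
        positions.sort(key=lambda i: values[i])
        start = 0
        while start < len(positions):
            # Extend `stop` over a run of tied values.
            stop = start + 1
            while stop < len(positions) and values[positions[stop]] == values[positions[start]]:
                stop += 1
            # 1-based ranks start+1 .. stop; tied entries get their mean.
            avg = (start + 1 + stop) / 2
            for i in positions[start:stop]:
                result[i] = avg
            start = stop
    return result


# Assumed input: one group 'foo', mirroring df['key'] = ['foo'] * 5.
print(group_rank(["foo"] * 5, [1.0, math.nan, math.inf, -math.inf, 2.0]))
```

Here the `inf` entry gets rank 4.0 within its group, and only the genuine NaN is left unranked.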
It is the same issue as #6945, and it seems that #17903 does not completely fix it when `ascending=False`. I will have a look into it.
It should be related to #3729.
@peterpanmj, whatever solution you come up with in …
Code Sample, a copy-pastable example if possible
Problem description
`np.inf` or `-np.inf` gets grouped with `np.nan` in the rank operation, depending on which direction the ranking occurs in. Ideally, `np.inf` would be entirely separate from `np.nan`.
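A minimal pure-Python sketch of the behaviour described: NaN entries are left unranked (like the default `na_option='keep'` of `rank`), while `inf` and `-inf` are ranked as ordinary values in either direction, with ties sharing their average rank (`method='average'`, the default). `rank_keep_nan` is a hypothetical helper, not pandas' implementation, and the input list is an assumption: the original `df` is not shown, but `[1, nan, inf, -inf, 2]` reproduces the ascending `df[0].rank()` output in the thread.

```python
import math


def rank_keep_nan(values, ascending=True):
    """Average-rank `values`, leaving NaN unranked (NaN in the output).

    Hypothetical helper for illustration; it is not pandas' implementation.
    +inf and -inf are ordinary, comparable floats here, so they never tie
    with each other and are never confused with NaN.
    """
    result = [math.nan] * len(values)
    # Only non-NaN positions take part in the ranking.
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort the valid positions by value; reverse for descending order.
    valid.sort(key=lambda i: values[i], reverse=not ascending)

    pos = 0
    while pos < len(valid):
        # Find the run of tied values starting at `pos`.
        end = pos
        while end + 1 < len(valid) and values[valid[end + 1]] == values[valid[pos]]:
            end += 1
        # Ranks are 1-based; tied entries share the mean of their ranks.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            result[valid[k]] = avg
        pos = end + 1
    return result


data = [1.0, math.nan, math.inf, -math.inf, 2.0]  # assumed input
print(rank_keep_nan(data))
print(rank_keep_nan(data, ascending=False))
```

With this input, the descending call ranks `-inf` last (4.0) instead of tying it with `inf` at 1.0, and the NaN stays NaN in both directions.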