BUG: value_counts and nunique behave differently for NaN and None with dropna=False #42688
Comments
@knoam I am new to open source. Can you please explain what this issue is about?
He expects both statements to return the same result, e.g. 3, I think, since the nunique call seems correct.
It is
The decision not to mangle
This is the "special code" in pandas/pandas/_libs/hashtable_func_helper.pxi.in Lines 64 to 68 in cfbd076
To get consistent behavior it just needs to be deleted.
If the change is required in the value_counts function to make it consistent with nunique, can I take up this issue and make a PR based on the code changes suggested by @realead? I am new to open source, and I would like to work on this issue.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
Problem description
nunique returns 3, but value_counts only has 2 rows. This may be related to Issue #37566
Expected Output
I would expect these to be consistent.
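A minimal sketch of the kind of comparison described above, assuming an object-dtype Series that holds one string, None, and NaN. The data is hypothetical and is not the reporter's original sample, and the result may differ between pandas versions, so no expected output is shown:

```python
import numpy as np
import pandas as pd

# Hypothetical data: dtype=object keeps None and NaN as separate Python objects.
s = pd.Series(["a", None, np.nan], dtype=object)

# Number of distinct values, counting missing values.
n_unique = s.nunique(dropna=False)

# Frequency table, including missing values.
counts = s.value_counts(dropna=False)

print("nunique(dropna=False):", n_unique)
print("value_counts(dropna=False) rows:", len(counts))
print("consistent:", n_unique == len(counts))
```

On the pandas version listed below, the report says the first number is 3 and the second is 2; if the two calls agreed on how to treat None and NaN, the last line would print True.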
Output of pd.show_versions()
pandas : 1.3.0
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 21.1.2
setuptools : 56.0.0
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.5
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.3.23
tables : None
tabulate : None
xarray : 0.17.0
xlrd : 2.0.1
xlwt : None
numba : 0.52.0