Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Undocumented change in .isin behavior from 1.1.5 to 1.2.0 #38781

Closed
StefanBRas opened this issue Dec 29, 2020 · 9 comments · Fixed by #39064
Closed

DOC: Undocumented change in .isin behavior from 1.1.5 to 1.2.0 #38781

StefanBRas opened this issue Dec 29, 2020 · 9 comments · Fixed by #39064
Labels
Docs Dtype Conversions Unexpected or buggy dtype conversions good first issue Testing pandas testing functions or related to the test suite
Milestone

Comments

@StefanBRas
Copy link

Location of the documentation

https://pandas.pydata.org/docs/whatsnew/v1.2.0.html

Documentation problem

There seems to be an undocumented change with how isin works.
pandas 1.1.5:

import pandas as pd
pd.Series([0]).isin(['0']) 
# 0    True
# dtype: bool
pd.Series([1]).isin(['1']) 
# 0    True
# dtype: bool
pd.Series([1.1]).isin(['1.1']) 
# 0    True
# dtype: bool

But in pandas 1.2.0 (and 1.2.0rc):

import pandas as pd
pd.Series([0]).isin(['0']) 
# 0    False
# dtype: bool
pd.Series([1]).isin(['1']) 
# 0    False
# dtype: bool
pd.Series([1.1]).isin(['1.1']) 
# 0    False
# dtype: bool

But the release notes mentions nothing about it.
I have not posted this issue as an bug, because I think the newer implementation is the better implementation.
However I propose adding the change to the release notes for other people like me where the change broke tests in a pipeline.

Suggested fix for documentation

Add a line mentioning that isin no longer returns true when comparing floats to a string of the float.

@StefanBRas StefanBRas added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

hmm we did a lot of fixing on isin. likley: #37528

we would need validation tests for this as it works on master. happy to take a PR. your examples in the doc-string would be great as well.

@jreback jreback added good first issue Testing pandas testing functions or related to the test suite Dtype Conversions Unexpected or buggy dtype conversions and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 29, 2020
@jreback jreback added this to the 1.2.1 milestone Dec 29, 2020
@StefanBRas
Copy link
Author

So adding tests for it here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/test_algos.py#L812?
I'm not sure which doc-string you're refering to here.

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

So adding tests for it here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/test_algos.py#L812?
I'm not sure which doc-string you're refering to here.

yep

https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L4584

@StefanBRas
Copy link
Author

It looks like there already are tests for this:
https://github.com/pandas-dev/pandas/blob/master/pandas/tests/test_algos.py#L1047
And it looks like there has been some discussion about it in #38346
Which means that #34125 and #19356 can probably be closed.

@simonjayhawkins
Copy link
Member

fixed in [4699c2b] BUG: isin numeric vs string (#38279), no release note added.

@omarafifii
Copy link
Contributor

Hi,
I am willing to work on this issue.
I understand I should add the missing part in the release notes, but I am not sure if I should do something with the testing.

@erfannariman
Copy link
Member

erfannariman commented Dec 31, 2020

Hi,
I am willing to work on this issue.
I understand I should add the missing part in the release notes, but I am not sure if I should do something with the testing.

The tests seem to be there as mentioned by @StefanBRas , looks like only release notes are needed.

@omarafifii
Copy link
Contributor

Ok, I can do it.

@phofl
Copy link
Member

phofl commented Jan 9, 2021

#34125 is fixed too, yes

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Feb 26, 2021
Fixes: #7367, #7446

This PR upgrades pandas to `1.2.2` in `cudf`. Changes include:

- [x] Bumping up `pandas` version.
- [x] Fixing `isin` behavior which now takes in types into accout: pandas-dev/pandas#38781
- [x] `CategoricalColumn.__setitem__` will now not allow setting of values that are not in existing categories.
- [x] Introduced `cudf.core._compat.PANDAS_GE_120` variable to create back-ward compatibility.
- [x] Updated usages of `pd.core.tools.datetimes._guess_datetime_format` to `pd.core.tools.datetimes.guess_datetime_format`
- [x] Introduced `std` & `median` in `DateTimeColumn`.
- [x] Fixed incorrect handling of passing `StringMethods` as an input to methods in string APIs.
- [x] Fixed a typo in calling `is_valid` of `Scalar`.
- [x] Removed unnecessary special handling in `TimeDeltaColumn.sum` logic for empty inputs.
- [x] Introduced passing `dtype='float64'` wherever there is an empty series being created since pandas will soon be defaulting to `object` dtype if no type is passed and we don't have a perfectly resembling `object` dtype as that of pandas.
- [x] Fixed deprecation warnings of `Index.__or__` and `Index.__xor__` by replacing with `union` & `symmetric_difference` APIs.
- [x] Introduced mapping of our `float32` & `float64` dtypes to pandas Nullable dtypes `FLoat32Dtype` & `Float64Dtype` when `nullable=True` in `to_pandas`.
- [x] With introduction of nullable float dtypes, there is an issue in creating `MultiIndex` from dataframe: pandas-dev/pandas#39984, so introduced a workaround in our `MultiIndex.__repr__` code.
- [x] Removed usages of `check_less_precise` in our code-base as this is deprecated and is replaced with `rtol` & `atol`. Retained its usages in our testing APIs for back-ward compatibility.
- [x] Removed good number `xfail` cases which are actually passing right now because of resolved issues in both `pandas` & `cudf`.
- [x] Did some miscellaneous code-cleanup in pytests.
- [x] Fixed pytests that will fail when run in parallel due to access to shared pytest params being manipulated inplace.
- [x] Follow a standard import pattern across pytest files, some files do `from pandas import Series` and some do `from cudf.core import Series`. So removed both patterns and doing only simple `import cudf` & `import pandas as pd` to avoid confusion while debugging test failures across multiple files. (Made this change in all pytest files which I had to touch as part of pandas upgrade, we can make similar changes in future for the files which we touch).
- [x] Fix issue with assigning `np.nan` values to a `CategoricalColumn` and fix related `__repr__` code: #7446

Authors:
  - GALI PREM SAGAR (@galipremsagar)

Approvers:
  - Keith Kraus (@kkraus14)
  - AJ Schmidt (@ajschmidt8)

URL: #7375
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Dtype Conversions Unexpected or buggy dtype conversions good first issue Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants