WIP: dtype._is_valid_na_for_dtype #51378
Conversation
The base class implementation considers any scalar recognized by pd.isna
to be equivalent.
"""
return libmissing.checknull(value)
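For context, here is a rough sketch of the idea behind the hook quoted above: the base class accepts any NA-like scalar, while a stricter (e.g. masked/nullable) dtype could override it to accept only its own NA marker. The function names below are illustrative stand-ins, not the actual pandas internals, and pd.isna is used in place of libmissing.checknull for a self-contained example.

```python
import numpy as np
import pandas as pd


def base_is_valid_na_for_dtype(value) -> bool:
    # Base behavior sketched in the diff above: any scalar that pd.isna
    # recognizes (None, np.nan, pd.NaT, pd.NA, ...) counts as a valid NA.
    if isinstance(value, (np.ndarray, list, tuple)):
        return False
    return bool(pd.isna(value))


def masked_is_valid_na_for_dtype(value) -> bool:
    # A stricter override a nullable (masked) dtype might use: only pd.NA
    # (and possibly None) is "its" NA; np.nan is treated as a regular float.
    return value is pd.NA or value is None
```

Under this sketch, masked_is_valid_na_for_dtype(np.nan) returns False while the base version returns True, which is exactly the distinction debated below.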
Could we require that pd.NA always be recognized?
I think this part requires some more discussion, as it is not necessarily a "fix". For example, when casting a default numpy float64 to nullable Float64, I think that
And BTW, pyarrow does something similar if the input data is a pandas object. So maybe we should also change the
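To make the casting case concrete, here is a small illustration; the outputs in the comments reflect recent pandas/pyarrow behavior and may differ by version, and the pyarrow part is only sketched in comments to avoid a hard dependency.

```python
import numpy as np
import pandas as pd

data = np.array([1.0, np.nan], dtype="float64")

# Casting a plain numpy float64 array to the nullable Float64 dtype currently
# treats NaN as missing, so it becomes pd.NA rather than staying a float NaN.
arr = pd.array(data, dtype="Float64")
print(arr)  # [1.0, <NA>]

# pyarrow exposes the analogous choice via from_pandas, which converts NaN to
# null when converting pandas data:
# import pyarrow as pa
# pa.array(pd.Series(data), from_pandas=True)  # [1.0, null]
```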
Agreed that the constructor behavior is a difficult (frustrating!) API problem, and there is a natural link between constructor and

One option to fix the

There are a couple of other places where I intend to use something like dtype._is_valid_na_for_dtype: MultiIndex._get_loc_single_level_index and a few places in the indexing engines that use checknull. If we don't go down this path, I'll go back to the drawing board.
Yes, I think that one is good to fix (one can probably argue there is still some ambiguity, but it's also much more of a corner case, and a situation where the user is more explicitly checking for that specific value, so it makes sense to be stricter). I think fixing this locally in the relevant
For indexing, we might also want to distinguish between pd.NA and np.nan? (this is for things like
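A hedged illustration of that indexing question; the script does not assert any particular outcome, since the point is precisely that the lookup semantics for np.nan versus pd.NA could differ (and have differed across versions).

```python
import numpy as np
import pandas as pd

idx = pd.Index([1.5, pd.NA, 3.0], dtype="Float64")

# The open question: should these two lookups hit the same position, or should
# np.nan and pd.NA be distinct keys for a nullable dtype? A dtype-level
# _is_valid_na_for_dtype hook would let Float64 answer that explicitly.
for key in (pd.NA, np.nan):
    try:
        print(key, "->", idx.get_loc(key))
    except KeyError:
        print(key, "-> KeyError (not treated as this dtype's NA)")
```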
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Long-term I think we need something like this. For now I'll make a PR that patches the relevant
Some dtypes (pyarrow/nullable mostly) need to treat NaN (and maybe None) differently from pd.NA. Motivating bug (which this PR does not address):

What this does fix: arr[2] = np.nan now sets NaN instead of pd.NA, and setting np.nan into an IntegerArray/BooleanArray raises. (None still casts to pd.NA.)

This breaks a bunch of tests. I'd like feedback before I go track those down. cc @jorisvandenbossche
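A small script showing the semantics the description above is aiming for; the comments describe the PR's intended behavior, and released pandas versions may behave differently (e.g. silently coercing np.nan to pd.NA).

```python
import numpy as np
import pandas as pd

arr = pd.array([1.0, 2.0, 3.0], dtype="Float64")

arr[2] = np.nan   # with this change: stores a real NaN instead of pd.NA
arr[1] = None     # None still casts to pd.NA
print(arr)

int_arr = pd.array([1, 2, 3], dtype="Int64")
try:
    int_arr[0] = np.nan   # with this change: raises rather than becoming pd.NA
except (TypeError, ValueError) as err:
    print("rejected:", err)
else:
    print(int_arr)  # current released pandas silently sets pd.NA here
```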
Tangentially related #27825