Skip to content

Conversation

@aijams
Copy link
Contributor

@aijams aijams commented Sep 29, 2025

In the edge case when called with integer arrays and asked to access non-existent entries (to be replaced with NaN), the take method of NumpyExtensionArray produces arrays whose dtypes don't match their underlying data.
Specifically, take promotes the underlying data to a floating-point type, but doesn't promote the dtype of the extension array to match.
These changes ensure that the result of this method has the correct dtype for its data.

@aijams aijams marked this pull request as draft September 29, 2025 18:11
@rhshadrach rhshadrach added Bug ExtensionArray Extending pandas with custom dtypes or arrays. Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Sep 30, 2025
@aijams
Copy link
Contributor Author

aijams commented Oct 2, 2025

Several tests are currently failing due to a new version of numexpr.
#62545
All other tests pass.
I hard-coded a list of dtypes in take to check for the integer types that can't store NaN values since setting the dtype on an extension array after it's created isn't allowed. I have yet to think of another way to correct this issue without modifying the base extension array class to allow its dtype to be modified.
Let me know if you thought of a better way of approaching this.

@aijams aijams marked this pull request as ready for review October 7, 2025 16:33
np.int64,
],
)
def test_take_assigns_integer_dtype_when_fill_disallowed(dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test to make sure that take does the expected thing when not allowed to fill missing values. I did a quick search, but I didn't find any other tests for this. There are other tests that work with this method, but I would like to have a test explicitly for this behavior to check for regressions and make it clear that take should preserve dtype if it can. If you still think this test isn't needed, can you explain why you think so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is tested in the extension tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aijams if you address this we can get this merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug ExtensionArray Extending pandas with custom dtypes or arrays.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants