Convert to compatible NumPy dtype for MaskedArray to_numpy #55058

phofl · 2023-09-07T20:53:31Z

closes API: should to_numpy() by default return the corresponding type, and raise otherwise? #48891
closes BUG: series.to_numpy does not work well with pd.Float64Dtype #40630
closes ENH: support casting Int arrays with nulls to np.float? #37460
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This still needs a whatsnew
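For context, here is a minimal sketch of the kind of conversion the linked issues ask about: getting a plain NumPy dtype out of a masked (nullable) array. It passes `dtype` and `na_value` explicitly, so it does not depend on whatever default this PR ends up choosing. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Nullable integer array with a missing value.
with_na = pd.array([1, 2, None], dtype="Int64")

# Explicitly request a NumPy float dtype and use NaN as the missing-value
# marker. Integers cannot hold NaN, so float is the compatible target here.
as_float = with_na.to_numpy(dtype="float64", na_value=np.nan)

# Without missing values, the array can be cast straight to a NumPy integer.
no_na = pd.array([1, 2, 3], dtype="Int64")
as_int = no_na.to_numpy(dtype="int64")

print(as_float.dtype, as_int.dtype)
```

Which of these conversions `to_numpy()` performs when called with no arguments is the question the PR addresses; the explicit form above should behave the same either way.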

WillAyd · 2023-09-11T13:28:54Z

I think this is pretty reasonable. Is this part of a larger set of changes or is this self-contained?

Probably worth a whatsnew

phofl · 2023-09-11T13:31:44Z

This is something we've wanted to clean up for a while now, but yes, I have to add a whatsnew (didn't get to it yet because this needs more than a single line)

phofl · 2023-09-30T18:28:41Z

Added the whatsnew

lukemanley · 2023-09-30T20:08:29Z

Does this close #48891?

phofl · 2023-09-30T21:15:35Z

yep

# Conflicts: # pandas/tests/base/test_conversion.py # pandas/tests/frame/test_repr.py

phofl · 2023-11-16T23:41:53Z

cc @mroeschke

I think this is ready for a review

pandas/core/algorithms.py

pandas/core/methods/to_dict.py

mroeschke · 2023-11-17T17:59:26Z

pandas/io/formats/format.py

@@ -1527,6 +1528,8 @@ def _format_strings(self) -> list[str]:
        if isinstance(values, Categorical):
            # Categorical is special for now, so that we can preserve tzinfo
            array = values._internal_get_values()
+        elif isinstance(values, BaseMaskedArray):
+            array = values.to_numpy(na_value=NA)


As I discovered in https://github.com/pandas-dev/pandas/pull/56003/files#diff-c2df0eefbdda16cb85d5e0a305db90101f92bfb790a8e989ab2e99b935b7f6b3, maybe we just want to do to_numpy(dtype=object) here?

Yep, that should work as well
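A small sketch of what the `to_numpy(dtype=object)` alternative produces for a masked array, assuming the goal in the formatter is to keep `pd.NA` entries intact rather than convert them to NaN. The array contents are made up for illustration.

```python
import pandas as pd

# Nullable float array with one missing entry.
values = pd.array([1.5, None], dtype="Float64")

# Requesting object dtype keeps each element as a Python-level object,
# so the missing entry stays pd.NA instead of becoming NaN.
as_object = values.to_numpy(dtype=object)

print(as_object.dtype)
print(as_object[1] is pd.NA)
```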

pandas/io/parsers/base_parser.py

pandas/tests/copy_view/test_array.py

mroeschke · 2023-11-17T18:02:16Z

pandas/tests/extension/test_masked.py

+    @pytest.mark.parametrize("na_action", [None, "ignore"])
+    def test_map(self, data_missing, na_action):
+        result = data_missing.map(lambda x: x, na_action=na_action)
+        if data_missing.dtype == Float32Dtype():


Is there a solution to avoid this?

Not with roundtripping through object, maybe your other suggestion above solves this.

No if your suggestion doesn't work

mroeschke

Should we do the same thing for ArrowExtensionArray.to_numpy?

# Conflicts: # doc/source/whatsnew/v2.2.0.rst

phofl · 2023-11-29T21:17:22Z

Can we get this in if CI is green now? Just merged main to resolve whatsnew conflicts

doc/source/whatsnew/v2.2.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

mroeschke · 2023-12-01T19:34:55Z

Thanks @phofl

phofl added 2 commits September 3, 2023 20:16

Convert masked arrays to valid numpy dtype

c3c718c

Convert ea to appropriate numpy dtype

03ff78e

phofl requested a review from WillAyd as a code owner September 7, 2023 20:53

Merge branch 'main' into to_numpy_ea

2e1a696

mroeschke added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Sep 8, 2023

phofl and others added 2 commits September 10, 2023 21:06

Fix typing

92a509f

Merge branch 'main' into to_numpy_ea

d55e41f

phofl and others added 2 commits September 30, 2023 19:19

Merge branch 'main' into to_numpy_ea

a37564a

Add whatsnew

e4618bb

phofl added this to the 2.2 milestone Sep 30, 2023

phofl and others added 2 commits October 15, 2023 16:00

Merge branch 'main' into to_numpy_ea

72cfba1

Merge remote-tracking branch 'upstream/main' into to_numpy_ea

6223938
