BUG: groupby.agg with UDF changing pyarrow dtypes #59601
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
    result = gb.agg(lambda x: {"number": 1})

    arr = pa.array([{"number": 1}, {"number": 1}, {"number": 1}])
    expected = DataFrame(
        {"B": ArrowExtensionArray(arr)},
        index=Index(["c1", "c2", "c3"], name="A"),
    )
When the column starts as a PyArrow dtype and the UDF returns dictionaries, it seems questionable to me whether we should return the corresponding PyArrow dtype. The other option is a NumPy array of object dtype. Both seem like reasonable results, and I imagine the PyArrow dtype is likely to be more convenient for a user who is already using PyArrow dtypes.
Continuation of #58129
- `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.

Root cause: `agg_series` always forces the output dtype to be the same as the input dtype, but depending on the lambda, the output dtype can be different.

Fix: `maybe_convert_object`, as `maybe_convert_object` does not check for NA, and forces dtype to float if NA is present (NA is not float in pyarrow),
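
To illustrate the root cause outside of pandas, here is a minimal pure-Python sketch. It is not the pandas implementation: `toy_agg` and `infer_common_type` are hypothetical names made up for this example. The idea it shows is that the result type is inferred from what the UDF returns, with missing values (`None` here) ignored during inference, instead of being cast back to the input type.

```python
from collections import defaultdict


def infer_common_type(values):
    """Return the one type shared by all non-missing values, else object.

    Hypothetical helper: None stands in for NA and is skipped, so a
    missing result does not change the inferred type.
    """
    kinds = {type(v) for v in values if v is not None}
    if len(kinds) == 1:
        return kinds.pop()
    return object


def toy_agg(keys, values, func):
    """Group values by key, apply func per group, infer the result type.

    Hypothetical stand-in for a groupby aggregation; the input type of
    `values` is deliberately not consulted when typing the result.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    results = {key: func(group) for key, group in groups.items()}
    return results, infer_common_type(results.values())


keys = ["c1", "c1", "c2", "c3"]
values = [1, 2, 3, 4]

# The UDF returns dictionaries even though the inputs are integers.
dict_results, dict_kind = toy_agg(keys, values, lambda vs: {"number": 1})

# The UDF returns a bool for multi-row groups and a missing value otherwise.
bool_results, bool_kind = toy_agg(
    keys, values, lambda vs: (sum(vs) > 2) if len(vs) > 1 else None
)
```

In the second call two of the three groups produce a missing value, yet the inferred type stays `bool` and is not widened to float, which is the behaviour the fix aims for with PyArrow-backed columns.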