BUG: groupby.agg with UDF changing pyarrow dtypes #59601
Open: rhshadrach wants to merge 45 commits into pandas-dev:main from rhshadrach:fix/group_by_agg_pyarrow_bool_numpy_same_type
Commits
9faa460
Set preserve_dtype flag for bool type only when result is also bool
969d5b1
Update implementation to change type to pyarrow only
66114f3
Change import order
b0290ed
Convert numpy array to pandas representation of pyarrow array
20c8fa0
Add tests
97b3d54
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
932d737
Change pyarrow to optional import in agg_series() method
82ddeb5
Seperate tests
d510052
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
62a31d9
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
a54bf58
Revert to old implementation
64330f0
Update implementation to use pyarrow array method
0647711
Update test_aggregate tests
affde38
Move pyarrow import to top of method
842f561
Update according to pr comments
93b5bf3
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
6f35c0e
Fallback convert to input dtype is output is all nan or empty array
abd0adf
Strip na values when inferring pyarrow dtype
bebc442
Update tests to check expected inferred dtype instead of inputy dtype
bb6343b
Override test case for test_arrow.py
3a3f2a2
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
6dc40f5
Empty commit to trigger build run
4ef96f7
In agg series, convert to np values, then cast to pyarrow dtype, acco…
c6a98c0
Update tests
9181eaf
Update rst docs
612d7d0
Update impl to fix tests
3b6696b
Declare variable in outer scope
680e238
Update impl to use maybe_cast_pointwise_result instead of maybe_cast…
3a8597e
Fix tests with nested array
6496b15
Update according to pr comments
712c36a
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
e1ccef6
Preserve_dtype if argument is passed in, else don't preserve
0ce083d
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
undermyumbrella1 a1d73f5
Update tests
57845a8
Merge branch 'fix/group_by_agg_pyarrow_bool_numpy_same_type' of githu…
fa257b0
Remove redundant tests
undermyumbrella1 0a9b83f
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
undermyumbrella1 139319a
retrigger pipeline
undermyumbrella1 9c2f9f2
Merge main
rhshadrach fef315d
Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
rhshadrach f758eb1
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix/…
rhshadrach 283eda9
Rework
rhshadrach d6edeff
Cleanup
rhshadrach b2e34fb
Fixup
rhshadrach 9cbf339
More skips
rhshadrach
When the column starts as a PyArrow dtype and the UDF returns dictionaries, it seems questionable to me whether we should return the corresponding PyArrow dtype. The other option is a NumPy array of object dtype. Both seem like reasonable results, though, and I imagine the PyArrow dtype is likely to be more convenient for a user who is already working with PyArrow dtypes.