Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Suppport masks in median groupby algo #48387

Merged
merged 11 commits into from
Sep 6, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Sep 4, 2022

Only a small speed up (5-10%, depending on the number of NAs). In general, this is needed to streamline the code base in ops.py after everything supports masks.

@phofl phofl added Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Sep 4, 2022
@@ -93,6 +124,13 @@ cdef inline float64_t median_linear(float64_t* a, int n) nogil:
a = tmp
n -= na_count

return calc_median_linear(a, n, na_count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you free a here instead of in calc_median_linear? While it does mean you'll have to free in both this method and the masked version, it makes calc_median_linear have no side-effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it, my train of thought went the other way, if moved inside the function the caller would not forget it. But no real preference on my side

@rhshadrach rhshadrach added this to the 1.6 milestone Sep 5, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke merged commit da6cae7 into pandas-dev:main Sep 6, 2022
@mroeschke
Copy link
Member

Thanks @phofl

@phofl phofl deleted the enh_mask_median_groupby branch September 6, 2022 20:27
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* ENH: Suppport masks in median groupby algo

* Avoid float cast

* Out dtype

* Add cast

* Revert algos

* Add whatsnew

* Deduplicate

* Fix

* Add type hints

* Move free
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants