PERF: MultiIndex.get_locs #45931
this fully covers #38650?
Using the example from #38650 - see results below. These changes make a decent impact. However, #38650 includes additional conversation about an alternative data structure which I suspect is still valid for further improvements. I can leave #38650 open if you want to keep that discussion going there?
thanks @lukemanley. ok to leave that issue open, though this seems to get a lot of the low-hanging fruit, so maybe not worth it, but happy to be proven wrong.
sounds good. It's now open.
@jreback - gentle ping. If this looks ok, I have a few perf-related follow-ups for multiindex indexing.
thanks @lukemanley this is great!
This looks to have a great performance improvement on partial indexing, which I think should also get a line in the whatsnew.
#46040 (still open) adds some additional perf beyond this and I'm working on one more as well. I can expand on the whatsnew in one of the pending PRs.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Perf improvements for MultiIndex.get_locs.
Two changes:
This comment in algos.searchsorted explains the rationale:
pandas/pandas/core/algorithms.py, lines 1518 to 1521 in 6f8d279
The scenario of different types is common here due to the nature of factorize_from_iterable, which returns different int types depending on the size of the codes.
The impact when using numpy.searchsorted directly with mismatched types can be significant.
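A minimal sketch of the two points above, not the code from this PR. It uses the public pd.Categorical API to show that the codes dtype depends on the number of categories, and then casts the search values to the dtype of the sorted array before calling numpy.searchsorted, as long as the values fit. The names codes_dtype and searchsorted_matched are hypothetical helpers introduced only for illustration.

```python
import numpy as np
import pandas as pd


def codes_dtype(n_categories: int) -> np.dtype:
    """Return the integer dtype pandas chooses for the codes of
    a Categorical with `n_categories` distinct values."""
    return pd.Categorical(np.arange(n_categories)).codes.dtype


def searchsorted_matched(arr: np.ndarray, value, side: str = "left"):
    """Hypothetical helper: give `value` the dtype of `arr` before
    searching, but only when every value fits (no integer overflow)."""
    value_arr = np.asarray(value)
    both_int = np.issubdtype(arr.dtype, np.integer) and np.issubdtype(
        value_arr.dtype, np.integer
    )
    if both_int and value_arr.dtype != arr.dtype:
        info = np.iinfo(arr.dtype)
        fits = value_arr.size == 0 or (
            value_arr.min() >= info.min and value_arr.max() <= info.max
        )
        if fits:
            # Same dtype on both sides avoids a cast of `arr` inside numpy.
            value_arr = value_arr.astype(arr.dtype)
    return np.searchsorted(arr, value_arr, side=side)


# Codes use the smallest integer dtype that can hold them.
small = codes_dtype(100)    # int8
large = codes_dtype(1_000)  # int16

# Sorted int8 "codes" searched with int64 values: a dtype mismatch.
rng = np.random.default_rng(0)
haystack = np.sort(rng.integers(0, 100, size=10_000)).astype(np.int8)
needles = np.array([0, 5, 99], dtype=np.int64)

positions = searchsorted_matched(haystack, needles)
```

The result is the same as calling numpy.searchsorted directly; only the dtype handling differs. To see the timing difference on a given machine, one could compare `np.searchsorted(haystack, needles)` against `searchsorted_matched(haystack, needles)` with `timeit` on larger arrays.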
One asv added: