-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: building MultiIndex with categorical levels #26721
Conversation
ae8e348
to
9345aaa
Compare
For performance related items you can create a benchmark in |
@WillAyd I was thinking more in terms of possible regressions. I'm not sure what I should really test or worry for. But I couldn't find any test using a MultiIndex with at least a CategoricalIndex level... |
if isinstance(values, (ABCCategoricalIndex, ABCSeries)): | ||
values = values._values | ||
categories = CategoricalIndex(values.categories, dtype=values.dtype) | ||
values = CategoricalIndex(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on what is happening
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -514,6 +514,7 @@ Performance Improvements | |||
- Improved performance of :meth:`read_csv` by faster concatenating date columns without extra conversion to string for integer/float zero and float ``NaN``; by faster checking the string for the possibility of being a date (:issue:`25754`) | |||
- Improved performance of :attr:`IntervalIndex.is_unique` by removing conversion to ``MultiIndex`` (:issue:`24813`) | |||
- Restored performance of :meth:`DatetimeIndex.__iter__` by re-enabling specialized code path (:issue:`26702`) | |||
- Improved performance of :meth:`DataFrame.set_index` when using multiple indexes and at least one of them is a categorical object (:issue:`22044`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the asv as you have in the example. Also we might not have tests for using multiple cats in a MI, see if you can add one in pandas/tests/indexes/multi/test_constructor which match what the .set_index() is doing at a lower level.
9345aaa
to
f3f10bc
Compare
f3f10bc
to
59a3c67
Compare
Codecov Report
@@ Coverage Diff @@
## master #26721 +/- ##
==========================================
- Coverage 91.78% 91.77% -0.01%
==========================================
Files 174 174
Lines 50703 50703
==========================================
- Hits 46538 46534 -4
- Misses 4165 4169 +4
Continue to review full report at Codecov.
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## master #26721 +/- ##
==========================================
- Coverage 91.78% 91.77% -0.01%
==========================================
Files 174 174
Lines 50703 50703
==========================================
- Hits 46538 46534 -4
- Misses 4165 4169 +4
Continue to review full report at Codecov.
|
thanks @0x0L very nice! |
git diff upstream/master -u -- "*.py" | flake8 --diff
On my machine, this takes ~20ms. The current implem takes 140ms.
Any suggestion for tests ?