-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix union_indexes not supporting sort=False for Index subclasses #35098
Conversation
This comment has been minimized.
This comment has been minimized.
Researching. Fiddling with |
The current idea is to support Update: okay, seems to work. I'll see if I can benchmark tomorrow, and it should be ready for a review. |
DatetimeIndex has an explicit branch set up for it with kind == 'special'
If an Index isn't just a sliceable one-level array (Index), we should not mess with sort to avoid breaking backwards compatibility.
pandas/core/indexes/api.py
Outdated
): | ||
ignore_sort = True | ||
else: | ||
ignore_sort = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we let any of these index type not get sorted, we'll break some tests where we assume that the results of joining on any of these come out sorted. Moreover, there isn't much point to having any of them unsorted except for pretty printing purposes, in my opinion.
If you think we should pass sort through for every type of Index
subclass, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show what tests break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather just pass thru and adjust the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@gfyoung This PR is ready for a review now. |
Hello @AlexKirko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-07-08 16:02:22 UTC |
@jreback Changes made, ready for another review:
|
This reverts commit ada7346. It threw an error.
All green. |
sort=None signifies that Index.union doesn't gurarantee sorting or error by default. It might mirror legacy behavior and leave the Index unsorted in edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm ping on green.
@jreback |
thanks very nice! |
…bclasses (pandas-dev#35098)" (pandas-dev#35277) This reverts commit c21be05.
sort=False
#35092black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Scope of PR
The reason for
DataFrame.append
unnecessarily sorting columns turned out to be that when merging indexes we were not passing thesort
argument for anIndex
subclass. This PR fixes this bug.Details
At some point down the call stack,
DataFrame.append
callsunion_indexes
incore.indexes.api
. In it we detect the type of our indexes:We don't pass
sort
forkind == 'special'
inunion_indexes
.I had some other ideas, but what I ended up doing is to simply pass the
sort
argument toIndex.union
forkind == 'special'
. We do have to ignore thesort
argument for[MultiIndex, RangeIndex, DatetimeIndex, CategoricalIndex]
to avoid breaking tests and backward compatibility, but it doesn't make sense for these ones to not be sorted when joined anyway, in my opinion.Performance
There are no changes to algorithms called. I still did some benchmarking to compare with my previous approach. The current solution didn't have any negative performance impact.