BUG: fix union_indexes not supporting sort=False for Index subclasses #35098

AlexKirko · 2020-07-02T18:05:31Z

closes BUG: DataFrame.append seems to be sorting columns even when sort=False #35092
1 tests added / 1 passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Scope of PR

DataFrame.append sorted columns unnecessarily because, when merging indexes, we did not pass the sort argument through for Index subclasses. This PR fixes that bug.

Details

At some point down the call stack, DataFrame.append calls union_indexes in core.indexes.api. In it we detect the type of our indexes:

indexes, kind = _sanitize_and_check(indexes)

We don't pass sort for kind == 'special' in union_indexes.

I had some other ideas, but what I ended up doing is to simply pass the sort argument to Index.union for kind == 'special'. We do have to ignore the sort argument for [MultiIndex, RangeIndex, DatetimeIndex, CategoricalIndex] to avoid breaking tests and backward compatibility, but it doesn't make sense for these ones to not be sorted when joined anyway, in my opinion.
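To illustrate what passing sort through changes, here is a simplified pure-Python model of the union step. It is not the pandas implementation: `union_labels` is a hypothetical helper, and plain lists stand in for Index objects.

```python
from typing import Hashable, Iterable, List


def union_labels(
    indexes: Iterable[Iterable[Hashable]], sort: bool = True
) -> List[Hashable]:
    """Hypothetical stand-in for the union step; not pandas code."""
    seen = set()
    result = []
    for index in indexes:
        for label in index:
            # Keep only the first occurrence, in order of appearance.
            if label not in seen:
                seen.add(label)
                result.append(label)
    # When the sort flag is dropped on the way down, callers always get
    # the sorted branch; honouring sort=False keeps the original order.
    return sorted(result) if sort else result


cols_a = ["b", "a"]
cols_b = ["c", "a"]
unsorted_union = union_labels([cols_a, cols_b], sort=False)
sorted_union = union_labels([cols_a, cols_b], sort=True)
```

With sort=False the labels come back in order of first appearance, which is the behaviour the linked issue expected from DataFrame.append.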

Performance

There are no changes to algorithms called. I still did some benchmarking to compare with my previous approach. The current solution didn't have any negative performance impact.

AlexKirko · 2020-07-03T06:40:13Z

Researching. Fiddling with pd.concat and union_indexes affects a bunch of stuff. Let's see if I can get this bugfix to work without introducing undesired behavior elsewhere.

AlexKirko · 2020-07-03T07:28:04Z

The current idea is to support sort only for Index subclasses that are just an Index of a specific datatype.

Update: okay, seems to work. I'll see if I can benchmark tomorrow, and it should be ready for a review.
Update 2: benchmarked it, there were some decreases, so I did some more digging and ended up simply passing sort to Index.union instead of all the Index subclass detection.

DatetimeIndex has an explicit branch set up for it with kind == 'special'

If an Index isn't just a sliceable one-level array (Index), we should not mess with sort to avoid breaking backwards compatibility.

AlexKirko · 2020-07-05T06:07:36Z

pandas/core/indexes/api.py

+    ):
+        ignore_sort = True
+    else:
+        ignore_sort = False


If we let any of these index types go unsorted, we'll break some tests that assume the result of joining on any of them comes out sorted. Moreover, there isn't much point in having any of them unsorted except for pretty-printing purposes, in my opinion.
If you think we should pass sort through for every type of Index subclass, please let me know.

jreback · 2020-07-06

can you show what tests break?

jreback · 2020-07-06

i would rather just pass thru and adjust the tests

AlexKirko · 2020-07-05T06:43:54Z

@gfyoung This PR is ready for a review now.

pep8speaks · 2020-07-07T08:44:16Z

Hello @AlexKirko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-08 16:02:22 UTC

AlexKirko · 2020-07-07T09:56:27Z

@jreback Changes made, ready for another review:

Pass sort through to all Index subclasses. Alter tests to accommodate no automatic sorting. Add comments to changes where appropriate. After another look, I don't think this change will break anything for our users.
Add test for append to reshape/test_concat.py. Use streamlined equivalent of the OP's test case, as the original case is overly verbose.
Move whatsnew record to reshape section and remove mention of the private function.

This reverts commit ada7346. It threw an error.

AlexKirko · 2020-07-07T18:36:29Z

All green.

sort=None signifies that Index.union doesn't guarantee sorting or error by default. It might mirror legacy behavior and leave the Index unsorted in edge cases.
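The sort=None behavior described in that commit message can be modelled roughly as "try to sort, and keep the existing order if the labels cannot be compared". The sketch below is a plain-Python illustration under that assumption; `best_effort_sort` is a hypothetical name, and the real rules in pandas cover more cases than this.

```python
from typing import Any, List


def best_effort_sort(labels: List[Any]) -> List[Any]:
    """Hypothetical model of a best-effort sort; not pandas code."""
    try:
        return sorted(labels)
    except TypeError:
        # Mixed, incomparable labels: leave the order untouched
        # instead of raising.
        return list(labels)


comparable = best_effort_sort(["b", "a", "c"])
mixed = best_effort_sort(["b", 1, "a"])
```

Here the comparable labels come back sorted, while the mixed str/int labels keep their original order because Python cannot compare them.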

jreback

lgtm ping on green.

AlexKirko · 2020-07-08T17:05:23Z

@jreback
All green. There was some hardware incident that impacted Azure pipelines, but we are good now.

jreback · 2020-07-09T13:02:37Z

thanks very nice!

AlexKirko added 6 commits July 2, 2020 13:54

BUG: fix index detection in _sanitize_and_check

148a589

DOC: add comment to fix

8a51ded

TST: add tests

371fd6e

DOC: add whatsnew entry

e5f1e68

REFACT: refact the boolean expression

9701bb5

REFACT: use generator comprehension

3208fbe

AlexKirko added 2 commits July 3, 2020 09:27

TST: add cases to the test

a15cef4

preserve Index subclass, exclude MultiIndex

933956c

AlexKirko marked this pull request as draft July 3, 2020 06:51

include RangeIndex in exceptions

4ecd83e

AlexKirko added 5 commits July 3, 2020 12:11

exclude DatetimeIndex

896d62a

DatetimeIndex has an explicit branch set up for it with kind == 'special'

exclude CategoricalIndex

950b279

CLN: sort imports in test_common

4b2076d

DOC: remove unnecessary backquoutes in whatsnew

6f2f3c9

DOC: add missing colon to whatsnew

0a6c62e

gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 3, 2020

AlexKirko added 2 commits July 5, 2020 08:19

pass sort to Index.union instead of previous approach

9ca48f7

explicitly ignore sort for heavily modified Index subclasses

f41b7e7

If an Index isn't just a sliceable one-level array (Index), we should not mess with sort to avoid breaking backwards compatibility.

AlexKirko added 2 commits July 5, 2020 09:08

fully incorporate sort ignoring for particular Index subtypes

2fe2c9d

CLN: fix typo

cefd5c9

AlexKirko marked this pull request as ready for review July 5, 2020 06:40

gfyoung reviewed Jul 5, 2020

gfyoung reviewed Jul 5, 2020

AlexKirko added 4 commits July 7, 2020 10:21

TST: alter test_unbalanced in test_melt

93a4ccc

TST: alter nonnumeric and float suffix tests in test_melt

93452c1

CLN: run black pandas

fb3a906

TST: add OP test, use fixture in the index test

0479618

CLN: run black on test files

0f96dbe

AlexKirko requested a review from jreback July 7, 2020 09:56

jreback requested changes Jul 7, 2020

AlexKirko added 3 commits July 7, 2020 16:41

REFACT: remove unnecessary change of sort from True to None

ada7346

Revert "REFACT: remove unnecessary change of sort from True to None"

c3cbecc

This reverts commit ada7346. It threw an error.

restart tests

7cc4d9d

AlexKirko requested a review from jreback July 8, 2020 05:27

AlexKirko added 3 commits July 8, 2020 16:30

DOC: add comment with ref to sort=None issue

4c1cc42

sort=None signifies that Index.union doesn't guarantee sorting or error by default. It might mirror legacy behavior and leave the Index unsorted in edge cases.

DOC: clarify comment language

cc0d167

DOC: clarify comment more

54140af

jreback added this to the 1.1 milestone Jul 8, 2020

jreback approved these changes Jul 8, 2020

restart tests

1c371bd

AlexKirko requested a review from jreback July 9, 2020 09:44

jreback merged commit c21be05 into pandas-dev:master Jul 9, 2020

AlexKirko deleted the append-err-sort branch July 9, 2020 15:11

jorisvandenbossche mentioned this pull request Jul 11, 2020

REGR: concat on index with duplicate labels fails #35238

Closed

TomAugspurger added a commit that referenced this pull request Jul 14, 2020

Revert "BUG: fix union_indexes not supporting sort=False for Index su…

1fd3add

…bclasses (#35098)" This reverts commit c21be05.

TomAugspurger mentioned this pull request Jul 14, 2020

Revert "BUG: fix union_indexes not supporting sort=False for Index subclasses" #35277

Merged

jreback pushed a commit that referenced this pull request Jul 15, 2020

Revert "BUG: fix union_indexes not supporting sort=False for Index su…

d99d44a

…bclasses (#35098)" (#35277) This reverts commit c21be05.

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jul 16, 2020

Revert "BUG: fix union_indexes not supporting sort=False for Index su…

2da5ae4

…bclasses (pandas-dev#35098)" (pandas-dev#35277) This reverts commit c21be05.
