sort_index fails on MultiIndex'ed DataFrame resulting from groupby.apply #15687
Comments
this is a dupe of #15622. yes, the bug here is that it is constructed in a different order when concat is used (though still lexsorted).
actually this is not a dupe of #15622, rather this is correct. @8one6 see below. lexsortedness has to do with the label orderings w.r.t. the levels (which may or may not be sorted). see if the below makes sense to you
This is a resort based on the 3 columns, we are reconstructing things here (IOW starting new), NOT from the existing levels
I agree this is a tricky concept and we don't reorder level values when sorting.
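To make the levels-versus-labels distinction concrete, here is a minimal sketch. It is not the example from the comment above; the index `mi` and the frame `df` are made up for illustration, and it assumes a recent pandas release (one where the constructor takes `codes`). The stored integer codes are in ascending order, yet the displayed values are not, because the first level is stored as `["b", "a"]`.

```python
import pandas as pd

# Hypothetical index: the first level is deliberately stored unsorted.
mi = pd.MultiIndex(
    levels=[["b", "a"], [1, 2]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)

# The codes ascend, but they point into ["b", "a"], so "b" comes first.
values = list(mi)
print(values)

# Monotonicity is judged on the values, not on the codes.
print(mi.is_monotonic_increasing)

df = pd.DataFrame({"x": range(4)}, index=mi)

# Assumption: on recent pandas, sort_index orders by the values themselves.
# The 0.19.0 behavior discussed in this thread differed.
sorted_values = list(df.sort_index().index)
print(sorted_values)
```

On the pandas version this thread is about, the last step may behave differently, which is the point under discussion.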
this is also related to this: #14672 (or maybe that will make more sense)
First off, thanks for taking the time to answer this. But can you help me get my head around this: Are you saying this behavior is correct as above (i.e. that the goal of If the former, let me raise my hand to say "isn't this going to confuse a ton of people"? And if the former, is there another method which would achieve the simple end of "sorting the index" in the usual way? And if the former, can we try to put this in the docs somewhere? If we're going this way, then I think the intended behavior of If the latter, what, if anything, can be done to fix the current behavior? Thanks!
well, it's currently the former. and yes, it's confusing. I believe it's this way because this can have a detrimental effect on performance (though it's only if you are actually sorting that this matters). I will reopen this because I think I can fix this and actually make it sort w/o regard to the lexsortedness.
I think I can (if I twist my head around funny) understand a context where a person might want to take advantage of the existing behavior. For example it could be a bit like an ordered categorical, where I just decide that for me the fruits But I'd stress that a) I can't imagine that's the more common use case and b) as of now I think the behavior is totally undocumented. I don't fully understand how the guts work here...but if I can help by proofreading new documentation or testing new functions, I'd love to. Coming at this from my use cases, I would think that it is incredibly rare to see people deliberately creating ordered levels on a I've found myself using the hack of
but that's just ugly, right?
I feel like this is part of a suite of bugs that come from a failure to notice when a `MultiIndex` that was once `lexsort`ed loses its `lexsort`edness. I submitted one example of this a couple years ago (see #8017), but this related bug persists.

On Pandas 0.19.0:

So before the `apply` command, `df` was properly sorted on the row index. However, as you can see, `res` is not properly sorted, even though its creation ends with a `sort_index` command.

This is a bug, right? I would think we want people to be able to assume that anytime they call `sort_index` that the result comes out lexicographically sorted, no?
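The expectation stated above can be written as a small check. This is only a sketch, not the original reproduction from this report: it builds a `MultiIndex` with `pd.concat(..., keys=...)` instead of `groupby.apply`, the frames `part_b` and `part_a` are hypothetical, and it assumes a recent pandas release rather than 0.19.0.

```python
import pandas as pd

# Two hypothetical pieces, concatenated with keys given in non-sorted order.
part_b = pd.DataFrame({"x": [1, 2]}, index=[1, 2])
part_a = pd.DataFrame({"x": [3, 4]}, index=[1, 2])
combined = pd.concat([part_b, part_a], keys=["b", "a"])

# Rows arrive in concatenation order: the "b" rows first, then the "a" rows.
print(list(combined.index))

res = combined.sort_index()

# The property being asked for: after sort_index, the index values
# themselves are in lexicographic order.
print(list(res.index))
print(res.index.is_monotonic_increasing)
```

If the final check prints `False` on a given pandas version, that version shows the behavior reported here.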