-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support kind and na_position kwargs in Series.sort_index #14445
ENH: support kind and na_position kwargs in Series.sort_index #14445
Conversation
Current coverage is 85.20% (diff: 100%)
|
@@ -144,3 +144,27 @@ def test_sort_index_multiindex(self): | |||
# rows share same level='A': sort has no effect without remaining lvls | |||
res = s.sort_index(level='A', sort_remaining=False) | |||
assert_series_equal(s, res) | |||
|
|||
def test_sort_index_kind(self): | |||
series = Series(index=[3, 2, 1, 4, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment with the github reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
ascending=ascending) | ||
from pandas.core.groupby import _nargsort | ||
|
||
if ((ascending and index.is_monotonic_increasing) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this optimization inside _nargsort
. you can try and see if anything breaks the test suite.
this would just return an indexer like np.arange(len(index))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. There isn't an explicit function to check monotonicity, but I've added similar functionality in _nargsort
. This also means we can remove this checking from the similar sort_index
function in pandas/core/frame.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove this part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's now removed. I'll work on moving that functionality into _nargsort
.
pls add a whatsnew note, 0.19.1 is fine. (you can list both issues) |
c294d9d
to
e69cbe4
Compare
@@ -48,3 +48,4 @@ Bug Fixes | |||
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`) | |||
- Bug in ``DataFrame.to_json`` where ``lines=True`` and a value contained a ``}`` character (:issue:`14391`) | |||
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index level (:issue`14327`) | |||
- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589` & :issue:`14444`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, -> &
@@ -4280,6 +4280,11 @@ def _nargsort(items, kind='quicksort', ascending=True, na_position='last'): | |||
if is_categorical_dtype(items): | |||
return items.argsort(ascending=ascending) | |||
|
|||
# # GH11080 - Check monotonic-ness before sort an index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this optimization is very problematic. Its quite inefficient (we already have these checks on an actual Index
, but we don't guarantee this is an index here, though easy to wrap). It doesn't handle nans properly. If it passes the test suite I guess its ok, but would need a performance review. Remove and put in a new PR please.
@@ -144,3 +144,28 @@ def test_sort_index_multiindex(self): | |||
# rows share same level='A': sort has no effect without remaining lvls | |||
res = s.sort_index(level='A', sort_remaining=False) | |||
assert_series_equal(s, res) | |||
|
|||
# GH #14444 & #13589: Add support for sort algo choosing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the comment inside the function
I think this only handles single Index for now, we should also deal with MultiIndex (see also the older PR for this https://github.com/pandas-dev/pandas/pull/13729/files) |
@jorisvandenbossche that looks right. on merge let's make sure to close the other PR |
7a65e84
to
5cade34
Compare
@@ -37,6 +37,7 @@ Bug Fixes | |||
- Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`) | |||
|
|||
|
|||
- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589`, :issue:`14444`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche want to highlite this change. maybe enhancements section? (even though its a bug fix / api change type of fix)
485949e
to
d1d6513
Compare
@@ -1786,8 +1786,11 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False, | |||
indexer = _ensure_platform_int(indexer) | |||
new_index = index.take(indexer) | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put:
indexer = _ensure_platform_int(indexer)
new_index = index.take(indexer)
outside the if-else block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; moved.
@@ -30,6 +30,7 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589`, :issue:`14444`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to 0.20.0
ping on green. |
For me it's good to have this merged, but note that it does not fully close the issue, as it's not yet solved for MultiIndex, so I would leave the issue open and clarify there. |
@jorisvandenbossche I overlooked that; thanks for pointing it out. I've now added @jreback Moved to |
8a6bb77
to
b013103
Compare
@brandonmburroughs thanks for the update. Last question: can you also add a test for |
@jorisvandenbossche As I was adding the test, I realized that this doesn't seem to really use the MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC')) the values get stored as MultiIndex(levels=[[1], [1], [3]],
labels=[[0, 0], [0, 0], [0, -1]],
names=[u'A', u'B', u'C']) These label values are what get passed to the sorting algorithm for both DataFrames and Series. Since the sorting only happens on the In [206]: mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))
In [207]: s = Series([1, 2], mi)
In [208]: s.sort_index(na_position="last")
Out[208]:
A B C
1 1 NaN 2
3 1
dtype: int64
In [209]: s.sort_index(na_position="first")
Out[209]:
A B C
1 1 NaN 2
3 1
dtype: int64 As I did more digging, I realized the same thing happens for In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: u'0.19.0'
In [4]: mi = pd.MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))
In [5]: df = pd.DataFrame([[1, 2], [3, 4]], mi)
In [6]: df.sort_index(na_position="first")
Out[6]:
0 1
A B C
1 1 NaN 3 4
3 1 2
In [7]: df.sort_index(na_position="last")
Out[7]:
0 1
A B C
1 1 NaN 3 4
3 1 2 I've done a bit of looking and I think it can be fixed by passing the values of the
|
this has not been implemented afair for multi index; it's straightforward to do; you can create an issue about it you will always sort labels; however (-1) is the nan indicator |
@brandonmburroughs Let's just leave MultiIndex out of this PR then (so it can be merged), it indeed does not work for DataFrames as well (maybe you can add to the docstring of this keyword something like "Not implemented for MultiIndex").
I would first open an issue about it (or look for an existing one, I would think we already have an issue about it) to discuss it, as this is not a simple change (maybe in code changes, but has implications for usage) |
all of @jorisvandenbossche comments are good multi index sorting is not well implemented except for lexsorting |
Okay, thanks for all of the feedback @jreback and @jorisvandenbossche . I created a new issue for |
803a060
to
423c771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a small remaining comment
indexer = _ensure_platform_int(indexer) | ||
new_index = index.take(indexer) | ||
indexer = _lexsort_indexer(index.labels, orders=ascending, | ||
na_position=na_position) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave this line out? (if it's not supported, not really needed to pass it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, leave out the na_position
argument from _lexsort_indexer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes (as it has no effect I understood?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is true. Unfortunately, it has no effect right now. I pushed that change now.
@brandonmburroughs Thanks a lot! |
closes #13589
closes #13729
I'm waiting on adding the whatsnew entry. I'm not sure if this is a BUG (the documentation says these parameters exist) or an ENH (we're adding support for these parameters). It depends on how you look at it!