ENH: support kind and na_position kwargs in Series.sort_index #14445

brandonmburroughs · 2016-10-17T23:22:47Z

I'm waiting on adding the whatsnew entry. I'm not sure if this is a BUG (the documentation says these parameters exist) or an ENH (we're adding support for these parameters). It depends on how you look at it!

codecov-io · 2016-10-18T01:09:52Z

Current coverage is 85.20% (diff: 100%)

No coverage report found for master at 3d73d05.

Powered by Codecov. Last update 3d73d05...b013103

jreback · 2016-10-19T12:00:54Z

pandas/tests/series/test_sorting.py

@@ -144,3 +144,27 @@ def test_sort_index_multiindex(self):
        # rows share same level='A': sort has no effect without remaining lvls
        res = s.sort_index(level='A', sort_remaining=False)
        assert_series_equal(s, res)
+
+    def test_sort_index_kind(self):
+        series = Series(index=[3, 2, 1, 4, 3])


add a comment with the github reference

jreback · 2016-10-19T12:07:42Z

pandas/core/series.py

-                                                   ascending=ascending)
+            from pandas.core.groupby import _nargsort
+
+            if ((ascending and index.is_monotonic_increasing) or


I would do this optimization inside _nargsort. you can try and see if anything breaks the test suite.

this would just return an indexer like np.arange(len(index)).

Okay. There isn't an explicit function to check monotonicity, but I've added similar functionality in _nargsort. This also means we can remove this checking from the similar sort_index function in pandas/core/frame.py.

pls remove this part

It's now removed. I'll work on moving that functionality into _nargsort.

jreback · 2016-10-19T12:08:21Z

pls add a whatsnew note, 0.19.1 is fine. (you can list both issues)

jreback · 2016-10-20T10:41:02Z

doc/source/whatsnew/v0.19.1.txt

@@ -48,3 +48,4 @@ Bug Fixes
 - Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
 - Bug in ``DataFrame.to_json`` where ``lines=True`` and a value contained a ``}`` character (:issue:`14391`)
 - Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index level (:issue`14327`)
+- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589` & :issue:`14444`)


jreback · 2016-10-20T10:44:08Z

pandas/core/groupby.py

@@ -4280,6 +4280,11 @@ def _nargsort(items, kind='quicksort', ascending=True, na_position='last'):
    if is_categorical_dtype(items):
        return items.argsort(ascending=ascending)

+    # # GH11080 - Check monotonic-ness before sort an index


this optimization is very problematic. Its quite inefficient (we already have these checks on an actual Index, but we don't guarantee this is an index here, though easy to wrap). It doesn't handle nans properly. If it passes the test suite I guess its ok, but would need a performance review. Remove and put in a new PR please.

jreback · 2016-10-20T10:44:58Z

pandas/tests/series/test_sorting.py

@@ -144,3 +144,28 @@ def test_sort_index_multiindex(self):
        # rows share same level='A': sort has no effect without remaining lvls
        res = s.sort_index(level='A', sort_remaining=False)
        assert_series_equal(s, res)
+
+    # GH #14444 & #13589:  Add support for sort algo choosing


put the comment inside the function

jorisvandenbossche · 2016-10-20T10:49:18Z

I think this only handles single Index for now, we should also deal with MultiIndex (see also the older PR for this https://github.com/pandas-dev/pandas/pull/13729/files)

jreback · 2016-10-20T10:57:24Z

@jorisvandenbossche that looks right. on merge let's make sure to close the other PR

jreback · 2016-10-25T10:54:44Z

doc/source/whatsnew/v0.19.1.txt

@@ -37,6 +37,7 @@ Bug Fixes
 - Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`)


+- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589`, :issue:`14444`)


@jorisvandenbossche want to highlite this change. maybe enhancements section? (even though its a bug fix / api change type of fix)

jreback · 2016-10-26T10:38:45Z

pandas/core/series.py

@@ -1786,8 +1786,11 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
            indexer = _ensure_platform_int(indexer)
            new_index = index.take(indexer)
        else:


can you put:

indexer = _ensure_platform_int(indexer) new_index = index.take(indexer)

outside the if-else block

Yes; moved.

jreback · 2016-11-16T22:26:21Z

doc/source/whatsnew/v0.19.1.txt

@@ -30,6 +30,7 @@ Performance Improvements

 Bug Fixes
 ~~~~~~~~~
+- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589`, :issue:`14444`)


can you move to 0.20.0

jreback · 2016-11-16T22:26:38Z

ping on green.

jorisvandenbossche · 2016-11-17T23:49:06Z

For me it's good to have this merged, but note that it does not fully close the issue, as it's not yet solved for MultiIndex, so I would leave the issue open and clarify there.
The DataFrame.sort_index can handle na_position for MultiIndex, so I suspect it should not be difficult to add this to Series.sort_index as well.

brandonmburroughs · 2016-11-20T19:05:23Z

@jorisvandenbossche I overlooked that; thanks for pointing it out. I've now added na_position to MultiIndex as well. Thanks!

@jreback Moved to other enhancements in 0.20.0.

jorisvandenbossche · 2016-11-25T21:39:59Z

@brandonmburroughs thanks for the update. Last question: can you also add a test for 'na_position' for the multi-index case

brandonmburroughs · 2016-11-28T01:01:02Z

@jorisvandenbossche As I was adding the test, I realized that this doesn't seem to really use the na_position argument due to the way we sort the MultiIndex. Whenever we create a MultiIndex, we store the labels as relative values. For instance, if we have the following MultiIndex:

MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))

the values get stored as

MultiIndex(levels=[[1], [1], [3]],
           labels=[[0, 0], [0, 0], [0, -1]],
           names=[u'A', u'B', u'C'])

These label values are what get passed to the sorting algorithm for both DataFrames and Series. Since the sorting only happens on the labels, it has no notion of the NaN. Hence, even with the updated code, we get the following results for a Series:

In [206]: mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))

In [207]: s = Series([1, 2], mi)

In [208]: s.sort_index(na_position="last")
Out[208]: 
A  B  C  
1  1  NaN    2
      3      1
dtype: int64

In [209]: s.sort_index(na_position="first")
Out[209]: 
A  B  C  
1  1  NaN    2
      3      1
dtype: int64

As I did more digging, I realized the same thing happens for DataFrame and it would seem the na_position argument never really worked.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: u'0.19.0'

In [4]: mi = pd.MultiIndex.from_tuples([[1, 1, 3], [1, 1, np.nan]], names=list('ABC'))

In [5]: df = pd.DataFrame([[1, 2], [3, 4]], mi)

In [6]: df.sort_index(na_position="first")
Out[6]: 
         0  1
A B C        
1 1 NaN  3  4
    3    1  2

In [7]: df.sort_index(na_position="last")
Out[7]: 
         0  1
A B C        
1 1 NaN  3  4
    3    1  2

I've done a bit of looking and I think it can be fixed by passing the values of the MultiIndex to sort rather than the labels. Should I

Add the changes here to be reviewed?
Create a new pull request since it's kind of a separate problem?
Create an issue detailing this behavior and then create a pull request?

jreback · 2016-11-28T01:07:24Z

this has not been implemented afair for multi index; it's straightforward to do; you can create an issue about it

you will always sort labels; however (-1) is the nan indicator

jorisvandenbossche · 2016-11-28T08:18:07Z

@brandonmburroughs Let's just leave MultiIndex out of this PR then (so it can be merged), it indeed does not work for DataFrames as well (maybe you can add to the docstring of this keyword something like "Not implemented for MultiIndex").

I think it can be fixed by passing the values of the MultiIndex to sort rather than the labels

I would first open an issue about it (or look for an existing one, I would think we already have an issue about it) to discuss it, as this is not a simple change (maybe in code changes, but has implications for usage)

jorisvandenbossche · 2016-11-28T08:21:32Z

Regarding the MutliIndex sort, I don't directly find an issue about it explicitly, but see the discussion in #14015, and an issue that followed from that discussion #14672.

jreback · 2016-11-28T11:06:02Z

all of @jorisvandenbossche comments are good

multi index sorting is not well implemented except for lexsorting
let's open a new issue (maybe should instead be a master issue encompassing the other 2 mi sort issues)

brandonmburroughs · 2016-12-01T23:31:25Z

Okay, thanks for all of the feedback @jreback and @jorisvandenbossche . I created a new issue for na_position specifically, #14784 . I also added a note in the sort_index docstring that na_position isn't implemented for MultiIndex.

jorisvandenbossche

Looks good, just a small remaining comment

jorisvandenbossche · 2016-12-02T23:20:53Z

pandas/core/series.py

-            indexer = _ensure_platform_int(indexer)
-            new_index = index.take(indexer)
+            indexer = _lexsort_indexer(index.labels, orders=ascending,
+                                       na_position=na_position)


Can you leave this line out? (if it's not supported, not really needed to pass it)

To be clear, leave out the na_position argument from _lexsort_indexer?

yes (as it has no effect I understood?)

Yes, that is true. Unfortunately, it has no effect right now. I pushed that change now.

jorisvandenbossche · 2016-12-04T20:33:03Z

@brandonmburroughs Thanks a lot!

jreback reviewed Oct 19, 2016

View reviewed changes

jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 19, 2016

brandonmburroughs force-pushed the series_sort_index branch from c294d9d to e69cbe4 Compare October 20, 2016 02:19

jorisvandenbossche added this to the 0.19.1 milestone Oct 20, 2016

jreback reviewed Oct 20, 2016

View reviewed changes

brandonmburroughs force-pushed the series_sort_index branch 2 times, most recently from 7a65e84 to 5cade34 Compare October 20, 2016 23:50

jreback reviewed Oct 25, 2016

View reviewed changes

BUG Series.sort_index does not accept parameters kind and na_position

d1d6513

brandonmburroughs force-pushed the series_sort_index branch from 485949e to d1d6513 Compare October 25, 2016 21:34

jreback reviewed Oct 26, 2016

View reviewed changes

Moving ensure and take

aae1c1c

jorisvandenbossche modified the milestones: 0.20.0, 0.19.1 Nov 1, 2016

jreback reviewed Nov 16, 2016

View reviewed changes

Adding na_position for multiindex

43dd994

brandonmburroughs force-pushed the series_sort_index branch 2 times, most recently from 8a6bb77 to b013103 Compare November 20, 2016 23:02

Moving to 0.20.0

b013103

jreback mentioned this pull request Nov 25, 2016

BUG: Accept kwargs kind and na_position in Series.sort_index (GH13589) #13729

Closed

3 tasks

Adding note that na_position doesn't work for MultiIndex

423c771

brandonmburroughs force-pushed the series_sort_index branch from 803a060 to 423c771 Compare December 2, 2016 22:41

jorisvandenbossche reviewed Dec 2, 2016

View reviewed changes

Removing na_position argument

05e9e52

jorisvandenbossche merged commit 5d0e157 into pandas-dev:master Dec 4, 2016

jorisvandenbossche changed the title ~~Updating sort_index for Series to follow pattern DataFrame~~ ENH: support kind and na_position kwargs in Series.sort_index Dec 4, 2016

jreback mentioned this pull request Dec 23, 2016

Series.sort_index does not() implement optional argument kind #14971

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: support kind and na_position kwargs in Series.sort_index #14445

ENH: support kind and na_position kwargs in Series.sort_index #14445

brandonmburroughs commented Oct 17, 2016 •

edited by jreback

Loading

codecov-io commented Oct 18, 2016 •

edited

Loading

jreback Oct 19, 2016

brandonmburroughs Oct 20, 2016

jreback Oct 19, 2016

brandonmburroughs Oct 20, 2016

jreback Oct 25, 2016

brandonmburroughs Oct 25, 2016

jreback commented Oct 19, 2016 •

edited

Loading

jreback Oct 20, 2016

jreback Oct 20, 2016

jreback Oct 20, 2016

jorisvandenbossche commented Oct 20, 2016

jreback commented Oct 20, 2016

jreback Oct 25, 2016

jreback Oct 26, 2016

brandonmburroughs Oct 27, 2016

jreback Nov 16, 2016

jreback commented Nov 16, 2016

jorisvandenbossche commented Nov 17, 2016

brandonmburroughs commented Nov 20, 2016

jorisvandenbossche commented Nov 25, 2016

brandonmburroughs commented Nov 28, 2016

jreback commented Nov 28, 2016

jorisvandenbossche commented Nov 28, 2016

jorisvandenbossche commented Nov 28, 2016

jreback commented Nov 28, 2016

brandonmburroughs commented Dec 1, 2016

jorisvandenbossche left a comment

jorisvandenbossche Dec 2, 2016

brandonmburroughs Dec 2, 2016

jorisvandenbossche Dec 2, 2016

brandonmburroughs Dec 2, 2016

jorisvandenbossche commented Dec 4, 2016

		@@ -37,6 +37,7 @@ Bug Fixes
		- Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`)


		- Bug in ``Series.sort_index`` where parameters ``kind`` and ``na_position`` did not exist (:issue:`13589`, :issue:`14444`)

ENH: support kind and na_position kwargs in Series.sort_index #14445

ENH: support kind and na_position kwargs in Series.sort_index #14445

Conversation

brandonmburroughs commented Oct 17, 2016 • edited by jreback Loading

codecov-io commented Oct 18, 2016 • edited Loading

Current coverage is 85.20% (diff: 100%)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Oct 19, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche commented Oct 20, 2016

jreback commented Oct 20, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Nov 16, 2016

jorisvandenbossche commented Nov 17, 2016

brandonmburroughs commented Nov 20, 2016

jorisvandenbossche commented Nov 25, 2016

brandonmburroughs commented Nov 28, 2016

jreback commented Nov 28, 2016

jorisvandenbossche commented Nov 28, 2016

jorisvandenbossche commented Nov 28, 2016

jreback commented Nov 28, 2016

brandonmburroughs commented Dec 1, 2016

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche commented Dec 4, 2016

brandonmburroughs commented Oct 17, 2016 •

edited by jreback

Loading

codecov-io commented Oct 18, 2016 •

edited

Loading

jreback commented Oct 19, 2016 •

edited

Loading