Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added option keep=False to nlargests/nsmallest #18656

Closed
wants to merge 4 commits into from
Closed

added option keep=False to nlargests/nsmallest #18656

wants to merge 4 commits into from

Conversation

tdpetrou
Copy link
Contributor

@tdpetrou tdpetrou commented Dec 6, 2017

I made a minor change to nlargest/nsmallest to keep all the values of the last n when keep=False

@gfyoung gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API Design labels Dec 6, 2017
@gfyoung gfyoung added this to the No action milestone Dec 6, 2017
@gfyoung
Copy link
Member

gfyoung commented Dec 6, 2017

Closed due to decision in #18559.

@jreback
Copy link
Contributor

jreback commented Dec 6, 2017

actually this is prob reasonable.

@jreback jreback reopened this Dec 6, 2017
@jreback jreback removed this from the No action milestone Dec 6, 2017
@codecov
Copy link

codecov bot commented Dec 6, 2017

Codecov Report

Merging #18656 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18656      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.02%     
==========================================
  Files         153      153              
  Lines       51221    51223       +2     
==========================================
- Hits        46917    46910       -7     
- Misses       4304     4313       +9
Flag Coverage Δ
#multiple 89.44% <100%> (ø) ⬆️
#single 40.67% <0%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.16% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13f6267...fcfc563. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 6, 2017

Codecov Report

Merging #18656 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18656      +/-   ##
==========================================
- Coverage   91.61%    91.6%   -0.02%     
==========================================
  Files         153      153              
  Lines       51367    51369       +2     
==========================================
- Hits        47061    47056       -5     
- Misses       4306     4313       +7
Flag Coverage Δ
#multiple 89.46% <100%> (ø) ⬆️
#single 40.76% <0%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.81% <ø> (-0.1%) ⬇️
pandas/core/algorithms.py 94.16% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/testing.py 82.52% <0%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 265e327...56954b4. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the whatsnew note which remove the keep=False (but keep the actual issue number attached there)

@@ -2162,6 +2162,21 @@ def test_n_duplicate_index(self, df_duplicates, n, order):
expected = df.sort_values(order, ascending=False).head(n)
tm.assert_frame_equal(result, expected)

def test_keep_false(self):
df = pd.DataFrame({'a': [5, 4, 4, 2, 3, 3, 3, 3],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a more informative name for the test

@@ -1865,3 +1865,14 @@ def test_n(self, n):
result = s.nsmallest(n)
expected = s.sort_values().head(n)
assert_series_equal(result, expected)

def test_keep_false(self):
s = Series([10, 9, 8, 7, 7, 7, 7, 6])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -910,8 +910,8 @@ def __init__(self, obj, n, keep):
self.n = n
self.keep = keep

if self.keep not in ('first', 'last'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add back to the docs-string (don't need a version added as it was tehre before)

@gfyoung
Copy link
Member

gfyoung commented Dec 6, 2017

@jreback : What do you think of making keep=False be the option? Besides API consistency, semantically, it doesn't make as much sense to me (or @tdpetrou ) to use because it would imply that we drop ties.

@tdpetrou
Copy link
Contributor Author

tdpetrou commented Dec 11, 2017

@jreback As @gfyoung mentioned, can we choose the parameter value first before making more changes. Should we do 'all', False, or something completely different?

edit: I am highly in favor of 'all'

@jreback
Copy link
Contributor

jreback commented Dec 13, 2017

yeah False doesn't seem intuitive, all I think would be ok. Not sure if we have this option anywhere else (e.g. it doesn't exist in .drop_duplicates nor should it as that would be silly:>)

@tdpetrou
Copy link
Contributor Author

@jreback Changes complete. Also, I didn't see anything in whatsnew on the other issues.

@jreback
Copy link
Contributor

jreback commented Feb 10, 2018

pls rebase

@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

closing as stale

@jreback jreback closed this Jun 26, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jun 27, 2018
@gfyoung
Copy link
Member

gfyoung commented Jun 27, 2018

@jreback : Revived in #21650.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jun 27, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jun 28, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jun 28, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jun 28, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API Design
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make parameter keep=False keep duplicates for nlargest/nsmallest
3 participants