-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow keep='all' for nlargest/nsmallest #21650
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21650 +/- ##
==========================================
+ Coverage 91.9% 91.9% +<.01%
==========================================
Files 154 154
Lines 49555 49557 +2
==========================================
+ Hits 45542 45544 +2
Misses 4013 4013
Continue to review full report at Codecov.
|
s = Series([10, 9, 8, 7, 7, 7, 7, 6]) | ||
result = s.nlargest(4, keep='all') | ||
expected = Series([10, 9, 8, 7, 7, 7, 7]) | ||
print(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print
statement should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -24,6 +24,7 @@ Other Enhancements | |||
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__. | |||
(:issue:`21627`) | |||
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`) | |||
- :func:`Series` / :func:`DataFrame` methods :func:`nlargest` / :func:`nsmallest` now accept the value 'all' for the `keep` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these will link properly if the class is separated from the method (I could be wrong though), and I think you'll need to explicitly write out the variations, e.g. :meth:`Series.nlargest`
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@jschendel : Thanks for comments! TBH, I didn't really check the PR diff before trying to apply it on |
8d3b883
to
4b7f199
Compare
@jreback @jschendel : Comments addressed, and all is passing. PTAL. |
Can you add pandas/asv_bench/benchmarks/series_methods.py Lines 41 to 44 in a620e72
pandas/asv_bench/benchmarks/frame_methods.py Lines 501 to 504 in a620e72
I don't think this change will have any perf impact, so no need to run the asv's, but might be nice to have benchmarks in place for this. LGTM otherwise. |
4b7f199
to
a07c686
Compare
@jschendel : Yep, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. very minor comments. merge on green.
pandas/core/frame.py
Outdated
4 8 e 4.0 | ||
|
||
When using ``keep='all'``, all duplicate items are maintained | ||
>>> df.nsmallest(3, 'a', keep='all') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, blank line here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/core/frame.py
Outdated
``TypeError``: | ||
|
||
>>> df.nsmallest(3, 'b') | ||
Traceback (most recent call last): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/tests/frame/test_analytics.py
Outdated
@@ -2461,6 +2461,22 @@ def test_n_duplicate_index(self, df_duplicates, n, order): | |||
expected = df.sort_values(order, ascending=False).head(n) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_keep_all_ties(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you name: test_duplicate_keep_all_ties
side issue: I think make sense to split out duplicate tests to separate file (can you make an issue / PR if you feel!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2082,6 +2082,17 @@ def test_boundary_datetimelike(self, nselect_method, dtype): | |||
vals = [min_val + 1, min_val + 2, max_val - 1, max_val, min_val] | |||
assert_check_nselect_boundary(vals, dtype, nselect_method) | |||
|
|||
def test_keep_all_ties(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same naming here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -24,6 +24,7 @@ Other Enhancements | |||
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__. | |||
(:issue:`21627`) | |||
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`) | |||
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add both issue numbers here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second issue number #18656 is an actually the old PR. I mentioned it for completeness.
a07c686
to
4998050
Compare
All is green, so merging. |
Title is self-explanatory.
Closes #16818.
Closes #18656.