-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: making value_counts stable/keeping original ordering #39009
Conversation
There are still some issues that needs to be addressed/discussed:
cpdef stable_value_count_object(ndarray[object] values, bint dropna, na_representant=np.nan):
...
for i in range(n):
val = values[i]
if checknull(val):
val = na_representant # uses one representant for all null-objects
if not checknull(val) or not dropna:
k = kh_get_{{ttype}}(table, <PyObject*>val)
... this would be slightly faster than the current approach, would work also for
|
surprising asv_bench showed no changes when ran with:
|
6aa9652
to
8718976
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good a few comments. pls add a whatsnew note in bug fixes reshaping section. also if this is closing multiple issues (iir also the 32-bit ordering one?) then indicate them there as well.
{{else}} | ||
build_count_table_{{dtype}}(values, table, dropna) | ||
result_keys = {{'U'+dtype[1::].title()}}Vector() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just too clever, changed to {{name} here: 9a17c13
8718976
to
0966c2b
Compare
0966c2b
to
e64b121
Compare
I went on with 2. from comment #39009 (comment) I think 1. and 3. from this comment are better as separate PRs. |
e64b121
to
e532854
Compare
can you merge master. |
e532854
to
d480b7c
Compare
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=52863&view=logs&j=acc1347f-8235-55b0-95c7-0e2189e61659&t=486f34f1-eda6-516a-fc5d-d8195128dacd i think you need to update this test (was added since you did this PR originally) |
@jreback Thanks, fixed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minor comments. ping on green
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -356,6 +356,7 @@ Reshaping | |||
- Bug in :func:`join` over :class:`MultiIndex` returned wrong result, when one of both indexes had only one level (:issue:`36909`) | |||
- :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`) | |||
- Bug in :meth:`DataFrame.join` not assigning values correctly when having :class:`MultiIndex` where at least one dimension is from dtype ``Categorical`` with non-alphabetically sorted categories (:issue:`38502`) | |||
- ``value_counts`` returns keys in original order (:issue:`12679`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:meth:`Series.value_counts`
, can you add the other issue that is being closed here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done d85b1aa
is_null = checknull(val) | ||
if is_null: | ||
val = navalue | ||
if not is_null or not dropna: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be elif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it could work with elif (but I'm always struggeling with logic). I have rearanged the if and hope it is clearer now: 230215f
@jreback green |
thanks @realead |
closes #12679
closes #11227
The order of the returned keys for
value_counts
aren't arbitrary (i.e. depending on the used hash function) but are original ordering (when sorted this applies for the keys with the same number of values).