Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: rank_1d #40546

Merged
merged 18 commits into from
Mar 23, 2021
Merged

CLN: rank_1d #40546

merged 18 commits into from
Mar 23, 2021

Conversation

mzeitlin11
Copy link
Member

Long overdue followup to #38744. Only half the diff is relevant since the changes are duplicated over the if rank_t is object... else with no_gil... structure.

@mzeitlin11 mzeitlin11 added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Clean labels Mar 23, 2021
@WillAyd
Copy link
Member

WillAyd commented Mar 23, 2021

Can you post ASVs?

@mzeitlin11
Copy link
Member Author

mzeitlin11 commented Mar 23, 2021

Added ASVs here:

before           after         ratio
     [ced764da]       [f6a04b79]
     <master>         <cln/rank_1d>
          429±8μs          430±2μs     1.00  groupby.GroupByMethods.time_dtype_as_field('datetime', 'rank', 'direct')
          438±4μs          432±6μs     0.99  groupby.GroupByMethods.time_dtype_as_field('datetime', 'rank', 'transformation')
          490±7μs         469±20μs     0.96  groupby.GroupByMethods.time_dtype_as_field('float', 'rank', 'direct')
          487±3μs          487±3μs     1.00  groupby.GroupByMethods.time_dtype_as_field('float', 'rank', 'transformation')
         480±40μs         483±20μs     1.01  groupby.GroupByMethods.time_dtype_as_field('int', 'rank', 'direct')
          475±2μs         467±40μs     0.98  groupby.GroupByMethods.time_dtype_as_field('int', 'rank', 'transformation')
      1.49±0.08ms      1.55±0.01ms     1.03  groupby.GroupByMethods.time_dtype_as_field('object', 'rank', 'direct')
      1.55±0.03ms      1.57±0.01ms     1.01  groupby.GroupByMethods.time_dtype_as_field('object', 'rank', 'transformation')
          474±3μs          472±3μs     1.00  groupby.GroupByMethods.time_dtype_as_group('datetime', 'rank', 'direct')
          477±2μs          473±9μs     0.99  groupby.GroupByMethods.time_dtype_as_group('datetime', 'rank', 'transformation')
          468±9μs          474±4μs     1.01  groupby.GroupByMethods.time_dtype_as_group('float', 'rank', 'direct')
          473±3μs          474±7μs     1.00  groupby.GroupByMethods.time_dtype_as_group('float', 'rank', 'transformation')
          475±1μs          483±2μs     1.02  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'direct')
         477±10μs         499±10μs     1.05  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'transformation')
         367±10μs          374±2μs     1.02  groupby.GroupByMethods.time_dtype_as_group('object', 'rank', 'direct')
         373±20μs          378±6μs     1.01  groupby.GroupByMethods.time_dtype_as_group('object', 'rank', 'transformation')
       1.13±0.4ms      1.13±0.02ms     1.00  groupby.RankWithTies.time_rank_ties('datetime64', 'average')
      1.12±0.04ms      1.12±0.03ms     0.99  groupby.RankWithTies.time_rank_ties('datetime64', 'dense')
       1.14±0.1ms      1.12±0.02ms     0.98  groupby.RankWithTies.time_rank_ties('datetime64', 'first')
      1.09±0.08ms      1.13±0.01ms     1.03  groupby.RankWithTies.time_rank_ties('datetime64', 'max')
      1.11±0.07ms      1.12±0.01ms     1.01  groupby.RankWithTies.time_rank_ties('datetime64', 'min')
      1.05±0.05ms      1.07±0.01ms     1.02  groupby.RankWithTies.time_rank_ties('float32', 'average')
         942±70μs       1.05±0.1ms    ~1.11  groupby.RankWithTies.time_rank_ties('float32', 'dense')
      1.05±0.03ms         984±60μs     0.94  groupby.RankWithTies.time_rank_ties('float32', 'first')
      1.08±0.01ms      1.02±0.07ms     0.95  groupby.RankWithTies.time_rank_ties('float32', 'max')
      1.06±0.02ms         987±70μs     0.93  groupby.RankWithTies.time_rank_ties('float32', 'min')
         996±60μs      1.05±0.01ms     1.06  groupby.RankWithTies.time_rank_ties('float64', 'average')
      1.03±0.06ms         1.04±0ms     1.01  groupby.RankWithTies.time_rank_ties('float64', 'dense')
      1.06±0.09ms      1.05±0.01ms     0.99  groupby.RankWithTies.time_rank_ties('float64', 'first')
      1.05±0.06ms      1.04±0.08ms     0.99  groupby.RankWithTies.time_rank_ties('float64', 'max')
         997±50μs      1.03±0.07ms     1.03  groupby.RankWithTies.time_rank_ties('float64', 'min')
      1.08±0.07ms      1.05±0.03ms     0.98  groupby.RankWithTies.time_rank_ties('int64', 'average')
       1.54±0.6ms      1.06±0.02ms    ~0.69  groupby.RankWithTies.time_rank_ties('int64', 'dense')
      1.08±0.03ms      1.06±0.05ms     0.98  groupby.RankWithTies.time_rank_ties('int64', 'first')
       1.07±0.1ms      1.06±0.08ms     0.99  groupby.RankWithTies.time_rank_ties('int64', 'max')
       1.12±0.2ms      1.06±0.01ms     0.95  groupby.RankWithTies.time_rank_ties('int64', 'min')
      10.8±0.06ms       10.8±0.4ms     1.00  series_methods.Rank.time_rank('float')
      8.07±0.04ms      8.23±0.08ms     1.02  series_methods.Rank.time_rank('int')
        60.5±30ms       55.3±0.4ms     0.91  series_methods.Rank.time_rank('object')
      7.97±0.05ms       8.57±0.3ms     1.07  series_methods.Rank.time_rank('uint')

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

this is a pure refactor right?

@jreback jreback added this to the 1.3 milestone Mar 23, 2021
@jreback jreback requested a review from jbrockmendel March 23, 2021 16:21
@mzeitlin11
Copy link
Member Author

this is a pure refactor right?

Yeah, no behavior change

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

kk lgtm. cc @jbrockmendel

out[lexsort_indexer[j]] = j + 1 - grp_start
elif tiebreak == TIEBREAK_FIRST_DESCENDING:
for j in range(i - dups + 1, i + 1):
out[lexsort_indexer[j]] = 2 * i - j - dups + 2 - grp_start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be obvious where 2 * i - j - dups + 2 - grp_start comes from? looks like equivalent to (i - dups + 1) + (i - j + 1) - grp_start

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, good opportunity to clarify here, have added some commentary

@jbrockmendel
Copy link
Member

Potential clarification, otherwise LGTM

@jreback jreback merged commit bc62e76 into pandas-dev:master Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the cln/rank_1d branch March 23, 2021 20:32
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Clean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants