CLN/PERF: clean-up of the benchmarks #14099

jorisvandenbossche · 2016-08-27T15:53:54Z

Related to #10849 (similar to #10998)

Just putting this up: I started on it some time ago and it is not yet fully ready, but I thought we could already merge it (if the content is OK) to avoid too many conflicts.

Gist: removing redundancy by gathering the benchmarks in classes and sharing setup functions (the diff shows that this removes quite a few lines of code). At the same time, I am cleaning up the benchmark names a bit to make them shorter and more pythonic (e.g. DatetimeIndex.time_add_timedelta instead of time_datetimeindex_add_timedelta.time_datetimeindex_add_timedelta).
The only disadvantage of this cleanup is that people who have already run the benchmarks for older versions of pandas will have to rerun them, since the benchmark names change.
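
For illustration, a minimal sketch of the pattern described above, not the actual code from this PR. It assumes the usual asv convention that `setup` runs before each `time_*` method of a class; the data size, the `rng` and `delta` attribute names, and the second benchmark method are made up for the example.

```python
import pandas as pd


class DatetimeIndex:
    """Benchmarks that share one setup instead of each defining their own class."""

    def setup(self):
        # Hypothetical fixture data; asv calls setup() before each time_* method.
        self.N = 100000
        self.rng = pd.date_range(start="2000-01-01", periods=self.N, freq="min")
        self.delta = pd.Timedelta(minutes=2)

    def time_add_timedelta(self):
        # Reported by asv under the short name DatetimeIndex.time_add_timedelta.
        self.rng + self.delta

    def time_normalize(self):
        # A second benchmark reusing the same fixture (illustrative only).
        self.rng.normalize()
```

With this layout, adding a benchmark is one more method on the class rather than a new class with a copy of the setup code.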

jreback · 2016-08-27T16:00:35Z

asv_bench/benchmarks/join_merge.py

    def sample(self, values, k):
        self.sampler = np.random.permutation(len(values))
        return values.take(self.sampler[:k])
+


iirc we had some MergeAsof benches somewhere (maybe just missed them)

Ah, yes, I probably did the cleanup of this file before that (or I didn't update). I will clean up the merge asof benches as well then (they are now in the merge_asof_noby, merge_asof_byobject and merge_asof_byint classes).

codecov-io · 2016-08-27T16:44:43Z

Current coverage is 85.26% (diff: 100%)

Merging #14099 into master will decrease coverage by <.01%

@@             master     #14099   diff @@
==========================================
  Files           144        140     -4   
  Lines         50979      50667   -312   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43468      43201   -267   
+ Misses         7511       7466    -45   
  Partials          0          0

Powered by Codecov. Last update 1dbc7be...ab32803

gfyoung · 2016-08-28T09:46:27Z

xref #14105: Make sure to remove all references to datetools both in vb_suites and asv_bench.

jreback · 2016-08-28T13:19:43Z

@jorisvandenbossche skimmed lgtm. Though I didn't run this, so assuming correctness, go ahead.

jreback · 2016-09-02T21:34:56Z

@jorisvandenbossche merge when ready.

jorisvandenbossche · 2016-09-02T22:54:32Z

Will try to clean the remaining ones next week

jreback · 2016-09-20T10:30:58Z

let's merge this

jreback · 2016-10-25T22:33:03Z

side issue, we should prob pep8 the asv code :> (and of course include it in our tests)

jreback · 2016-12-06T21:31:56Z

should rebase and merge

jorisvandenbossche · 2016-12-10T14:33:32Z

Todos from comments:

- deal with the new resample API (see the review comment on this PR, #14099)
- consolidate the benchmarks for the io functions (as you now have parser_vb.py, but also io_bench.py, hdfstore_bench.py, io_sql.py). Rename parser_vb -> parser
- further clean-up
- better structure? (subdirs?)
- PEP8

* origin/master: (22 commits) BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343) BUG: Apply min_itemsize to index even when not appending DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788) BLD: missing - on secure BLD: new access token on pandas-dev TST: Test DatetimeIndex weekend offset (pandas-dev#14853) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ...

* commit 'v0.19.0-174-g81a2f79': (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

release 0.19.1 was from release branch * releases: (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

jorisvandenbossche force-pushed the asv-cleanup branch from 6595bf0 to 072deaa Compare August 27, 2016 15:54

jorisvandenbossche added the Performance and Clean labels Aug 27, 2016

jreback reviewed Aug 27, 2016

gfyoung mentioned this pull request Aug 28, 2016

DEPR: Deprecate pandas.core.datetools #14105

Merged

jreback added this to the 0.19.0 milestone Aug 28, 2016

jreback modified the milestones: 0.19.0, 0.19.1 Sep 28, 2016

jorisvandenbossche modified the milestones: 0.20.0, 0.19.1 Oct 3, 2016

jorisvandenbossche mentioned this pull request Oct 15, 2016

DEV: Publishing benchmarks vs. trunk? wesm/pandas2#47

Open

jorisvandenbossche force-pushed the asv-cleanup branch from 37db762 to ec4ca00 Compare October 24, 2016 13:27

jorisvandenbossche mentioned this pull request Nov 7, 2016

From pandas dask/dask-benchmarks#4

Open

jorisvandenbossche added 4 commits December 10, 2016 15:10

CLN/PERF: clean-up indexing benchmarks

89cfc25

Clean-up parser_vb benchmarks

b9e6665

Clean-up packers benchmarks

3e05108

Clean-up sql benchmarks

f5ccece

jorisvandenbossche added 16 commits December 10, 2016 15:10

Clean-up join_merge benchmarks

a12dbfd

Clean-up timeseries/timedelta benchmarks

cc23fc9

Clean-up string benchmarks

d1ecbe8

Clean-up merge asof benchmarks

7aad492

Clean-up algos/attrs/binary_ops benchmarks

731f50c

Clean-up categorical benchmarks

f174d8d

Clean-up constructor benchmarks

cce95ee

Clean-up eval + panel benchmarks

f0fbaed

Clean-up frame_methods benchmarks

65d720a

Clean-up gil bencmarks

16104d2

Clean-up indexing benchmarks

f56e26a

Clean-up inference/miscellaneous benchmarks

1baf3fc

Clean-up timedelta/period/plotting benchmarks

d54f18c

Cleanup indexing benchmarks

3826147

Clean-up groupby benchmarks

fbe8c5e

Clean-up HDFStore benchmarks

0748cb0

jorisvandenbossche force-pushed the asv-cleanup branch from f60d242 to 0748cb0 Compare December 10, 2016 14:29

jorisvandenbossche merged commit ad3eca1 into pandas-dev:master Dec 10, 2016

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016

CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)

fa9e698
