
CLN/PERF: clean-up of the benchmarks #14099

Merged: 20 commits merged on Dec 10, 2016


jorisvandenbossche (Member)

Related to #10849 (similar to #10998)

Just putting this up. I started it some time ago and it is not yet fully finished, but I thought I'd merge it already (if the content is OK) to avoid too many conflicts.

Gist: remove redundancy by gathering benchmarks into classes that share setup functions (the diff shows this removed quite a few lines of code). At the same time, I am cleaning up the benchmark names a bit to make them shorter and more Pythonic (e.g. DatetimeIndex.time_add_timedelta instead of time_datetimeindex_add_timedelta.time_datetimeindex_add_timedelta).
The only disadvantage of this cleanup is that anyone who has already run the benchmarks for older versions of pandas will have to rerun them.
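To illustrate the layout described above: asv calls a class's `setup()` before timing each of its `time_*` methods, so benchmarks that need the same data can share one fixture, and asv reports them under class-qualified names such as `DatetimeIndex.time_add_timedelta`. The sketch below is a hypothetical example of that structure (the index size, the frequency and the extra `time_normalize` benchmark are assumptions), not the exact code from this PR.

```python
import pandas as pd


class DatetimeIndex(object):
    # asv runs setup() before each time_* method, so this one fixture
    # replaces the per-function setup code of the old layout.
    # The size (10000 days) and frequency are illustrative assumptions.

    def setup(self):
        self.rng = pd.date_range(start='2000-01-01', periods=10000, freq='D')
        self.delta = pd.Timedelta(days=1)

    def time_add_timedelta(self):
        # reported by asv as DatetimeIndex.time_add_timedelta
        self.rng + self.delta

    def time_normalize(self):
        # hypothetical second benchmark reusing the same fixture
        self.rng.normalize()
```

Grouping related benchmarks this way keeps the data construction in one place, so adding a new timing is a matter of adding one short method.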

jorisvandenbossche added the Performance (Memory or execution speed performance) and Clean labels on Aug 27, 2016
```python
import numpy as np

def sample(self, values, k):
    # draw k elements of values at random, without replacement
    self.sampler = np.random.permutation(len(values))
    return values.take(self.sampler[:k])
```

Contributor

IIRC we had some MergeAsof benchmarks somewhere (maybe I just missed them).

Member Author

Ah, yes, I probably did the cleanup of this file before those were added (or I didn't update my branch). I will clean up the merge_asof benchmarks as well then (they are currently in the merge_asof_noby, merge_asof_byobject and merge_asof_byint classes).
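Consolidating those three classes along the same lines might look like the sketch below: one shared `setup()` and one `time_*` method per variant. The class name, data sizes and column names (`time`, `key`, `key2`, `value1`, `value2`) are hypothetical and not taken from the actual benchmark files; `pd.merge_asof` requires both frames to be sorted on the `on` column, which the setup ensures.

```python
import numpy as np
import pandas as pd


class MergeAsof(object):
    # Hypothetical merge of merge_asof_noby, merge_asof_byobject and
    # merge_asof_byint into one class with a shared fixture.

    def setup(self):
        rng = np.random.RandomState(0)
        n = 10000

        def make(size, value_col):
            # 'time' is sorted (required by merge_asof); 'key' is an
            # object/string column and 'key2' an integer column for by=
            return pd.DataFrame({
                'time': np.sort(rng.randint(0, 100000, size=size)),
                'key': rng.choice(list('ABCDEFGH'), size=size),
                'key2': rng.randint(0, 25, size=size),
                value_col: rng.randn(size),
            })

        self.left = make(n, 'value1')
        self.right = make(n // 10, 'value2')

    def time_noby(self):
        pd.merge_asof(self.left, self.right, on='time')

    def time_by_object(self):
        pd.merge_asof(self.left, self.right, on='time', by='key')

    def time_by_int(self):
        pd.merge_asof(self.left, self.right, on='time', by='key2')
```

Because `merge_asof` is a left join on the nearest preceding key, each variant returns one row per row of `left`; only the `by` grouping differs between the three timings.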

codecov-io commented Aug 27, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14099 into master will decrease coverage by <.01%

@@             master     #14099   diff @@
==========================================
  Files           144        140     -4   
  Lines         50979      50667   -312   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43468      43201   -267   
+ Misses         7511       7466    -45   
  Partials          0          0          

Powered by Codecov. Last update 1dbc7be...ab32803

gfyoung (Member) commented Aug 28, 2016

xref #14105: Make sure to remove all references to datetools both in vb_suites and asv_bench.

jreback added this to the 0.19.0 milestone on Aug 28, 2016

jreback (Contributor) commented Aug 28, 2016

@jorisvandenbossche skimmed it, lgtm. I didn't run this, though, so assuming it's correct, go ahead.

jreback (Contributor) commented Sep 2, 2016

@jorisvandenbossche merge when ready.

jorisvandenbossche (Member Author)

Will try to clean up the remaining ones next week.

jreback (Contributor) commented Sep 20, 2016

let's merge this

jreback (Contributor) commented Oct 25, 2016

Side issue: we should probably PEP8 the asv code :> (and of course include it in our tests).

jreback (Contributor) commented Dec 6, 2016

should rebase and merge

jorisvandenbossche merged commit ad3eca1 into pandas-dev:master on Dec 10, 2016

jorisvandenbossche (Member Author)

To-dos from the comments:

  • deal with the new resample API (see the corresponding review comment on #14099)
  • consolidate the benchmarks for the io functions (there is now parser_vb.py, but also io_bench.py, hdfstore_bench.py and io_sql.py). Rename parser_vb -> parser
  • further clean-up
  • better structure? (subdirs?)
  • PEP8
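For the resample item: since pandas 0.18 the recommended pattern is the deferred, groupby-like `ts.resample(rule).mean()` instead of the older `ts.resample(rule, how='mean')`. A benchmark class written against that API might look like the sketch below; the class name, data size and resampling rules are illustrative assumptions, not the benchmarks that existed in the repository.

```python
import numpy as np
import pandas as pd


class Resample(object):
    # Hypothetical benchmarks using the Resampler API
    # (ts.resample(rule).<agg>()) rather than the deprecated how= keyword.

    def setup(self):
        # one week of minutely data (an illustrative size)
        idx = pd.date_range('2000-01-01', periods=7 * 24 * 60, freq='min')
        self.ts = pd.Series(np.random.randn(len(idx)), index=idx)

    def time_resample_mean(self):
        # downsample to daily means
        self.ts.resample('D').mean()

    def time_resample_ohlc(self):
        # 5-minute open/high/low/close bars
        self.ts.resample('5min').ohlc()
```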

yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* origin/master: (22 commits)
  BUG: astype falsely converts inf to integer (GH14265) (pandas-dev#14343)
  BUG: Apply min_itemsize to index even when not appending
  DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (pandas-dev#14788)
  BLD: missing - on secure
  BLD: new access token on pandas-dev
  TST: Test DatetimeIndex weekend offset (pandas-dev#14853)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* commit 'v0.19.0-174-g81a2f79': (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
release 0.19.1 was from release branch

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016
Labels: Clean, Performance

4 participants