Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix memory issues in rolling.min/max #33693

Merged
merged 8 commits into from
Apr 28, 2020

Conversation

s-scherrer
Copy link
Contributor

@s-scherrer s-scherrer commented Apr 21, 2020

This fixes at least the reproducible part of #30726. I am not
totally sure what is going on here (is this a true memory leak?), and whether this fixes all issues, but it does strongly reduce memory usage as measured by psutil.

My tests have shown that there are two solutions that avoid growing memory
usage:

  • pass memoryviews (float64_t[:]) instead of ndarray[float64_t]
  • remove starti and endi as arguments to _roll_min_max_fixed

This commit implements both, since _roll_min_max_fixed doesn't use starti and endi anyways.

I am unsure about testing: Does this need a test? If so, what would be a good way to go about testing? The tests I performed (example code in the linked issue) are probably very system specific.

@jbrockmendel
Copy link
Member

I am unsure about testing: Does this need a test? If so, what would be a good way to go about testing? The tests I performed (example code in the linked issue) are probably very system specific.

Definitely needs a test. How about you implement a test based on the example in #30726 (though pls construct inputs directly, dont use skimage/scipy) and we'll see how the CI reacts.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add a memory asv
and a whats new note (bug fix rolling section)

@s-scherrer
Copy link
Contributor Author

Upon closer inspection of #25926, I found that this fixes exactly the same issue. It seems to have been reintroduced at some point. An asv benchmark for this already exists.

@s-scherrer
Copy link
Contributor Author

I am unsure about testing: Does this need a test? If so, what would be a good way to go about testing? The tests I performed (example code in the linked issue) are probably very system specific.

Definitely needs a test. How about you implement a test based on the example in #30726 (though pls construct inputs directly, dont use skimage/scipy) and we'll see how the CI reacts.

How would I go about measuring the memory use? As shown in the issue, I used psutil to show system memory usage. During testing, I made sure to not have other processes running that could lead to larger changes in memory usage. I'm not sure that will work with CI.

@jreback
Copy link
Contributor

jreback commented Apr 23, 2020

you can add memory benchmarks with asv

we have a few,
search for mem i think

This fixes at least the reproducible part of pandas-dev#30726, however, I am not
totally sure what is going on here.
Tests have shown that there are two solutions that avoid growing memory
usage:

- pass memoryviews (float64_t[:]) instead of ndarray[float64_t]
- remove starti and endi as arguments to _roll_min_max_fixed

This commit implements both.
@s-scherrer s-scherrer force-pushed the rolling-min-max-memory-leak branch from f89a4a0 to 959a711 Compare April 23, 2020 14:50
@s-scherrer
Copy link
Contributor Author

you can add memory benchmarks with asv

we have a few,
search for mem i think

I changed the numbers for the asv peakmem benchmark from #25926 so that the isse is a bit more visible. Should I still add a test for the pytest test suite, or is the benchmark enough?

@jreback jreback added Performance Memory or execution speed performance Window rolling, ewma, expanding labels Apr 23, 2020
@jreback jreback added this to the 1.1 milestone Apr 23, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. pls show the memory asv results. As a followup we should have a memory asv for all of the standard window functions.

@jreback
Copy link
Contributor

jreback commented Apr 23, 2020

you can add memory benchmarks with asv
we have a few,
search for mem i think

I changed the numbers for the asv peakmem benchmark from #25926 so that the isse is a bit more visible. Should I still add a test for the pytest test suite, or is the benchmark enough?

no test needed, but see my comments above

@WillAyd
Copy link
Member

WillAyd commented Apr 24, 2020

pending asv results this lgtm

@s-scherrer
Copy link
Contributor Author

s-scherrer commented Apr 26, 2020

Here are the results of the asv run for master:

 $ asv show master
Commit: master <master>

rolling.PeakMemFixedWindowMinMax.peakmem_fixed [my_machine/virtualenv-py3.8-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt]
  ok
  ======================== =======
           param1                 
  ------------------------ -------
   <function Rolling.min>   1.72G 
   <function Rolling.max>   1.72G 
  ======================== =======
  started: 2020-04-26 11:28:00, duration: 19.8s

And this is on the new branch:

 $ asv show rolling-min-max-memory-leak
Commit: rolling- <rolling-min-max-memory-leak>

rolling.PeakMemFixedWindowMinMax.peakmem_fixed [my_machine/virtualenv-py3.8-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt]
  ok
  ======================== ======
           param1                
  ------------------------ ------
   <function Rolling.min>   118M 
   <function Rolling.max>   118M 
  ======================== ======
  started: 2020-04-26 11:06:59, duration: 22.6s

@pep8speaks
Copy link

pep8speaks commented Apr 26, 2020

Hello @s-scherrer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-28 09:35:51 UTC

@xmatthias
Copy link

xmatthias commented Apr 26, 2020

This fix also resolves #32266 - which seems to be a duplicate of the original issue here too.

Considering that this is a pretty severe bug in the whole 1.0.x tree, i think this should be part of the 1.0.4 milestone, too.

@s-scherrer
Copy link
Contributor Author

Here are the new results for master:

 $ asv show master
Commit: master <master>

rolling.PeakMemFixedWindowMinMax.peakmem_fixed [my_machine/virtualenv-py3.8-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt]
  ok
  ======================== ======
           param1                
  ------------------------ ------
   <function Rolling.min>   262M 
   <function Rolling.max>   262M 
  ======================== ======
  started: 2020-04-26 15:09:36, duration: 1.12s

and on the new branch:

 $ asv show rolling-min-max-memory-leak
Commit: rolling- <rolling-min-max-memory-leak>

rolling.PeakMemFixedWindowMinMax.peakmem_fixed [my_machine/virtualenv-py3.8-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt]
  ok
  ======================== ======
           param1                
  ------------------------ ------
   <function Rolling.min>   198M 
   <function Rolling.max>   198M 
  ======================== ======
  started: 2020-04-26 15:08:56, duration: 1.32s

It's not below 1s, but close. I'm also running this on a pretty old thinkpad with a pre-2010 Core 2 Duo processor, so I guess it will run faster on newer machines.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last 2 comments long on green

@hroff-1902
Copy link

hroff-1902 commented Apr 28, 2020

can you add a longer test to be sure it does not affect performance of these widely used methods?

@jreback jreback merged commit df5eee6 into pandas-dev:master Apr 28, 2020
@jreback
Copy link
Contributor

jreback commented Apr 28, 2020

thanks @s-scherrer very nice

@s-scherrer s-scherrer deleted the rolling-min-max-memory-leak branch April 28, 2020 15:02
@xmatthias
Copy link

@jreback any chance of backporting this into the next 1.0.x release, or is it too late for that as it's due in a few days?

I'd think that would be beneficial as it's fixing a nasty bug...

@jreback
Copy link
Contributor

jreback commented Apr 28, 2020

@xmatthias this is a plan for a 1.0.4 release, I believe @simonjayhawkins is planning on this. But as pandas is fully volunteer it may or may not happen.

@xmatthias
Copy link

Don't misunderstand me - i'm aware that pandas is fully volunteer (and i appreciate and thank anyone spending their time on it!) -

i'm just trying to say that it would be great to have this bugfix included in the next possible release.

@jreback
Copy link
Contributor

jreback commented Apr 28, 2020

Don't misunderstand me - i'm aware that pandas is fully volunteer (and i appreciate and thank anyone spending their time on it!) -

i'm just trying to say that it would be great to have this bugfix included in the next possible release.

yes if we have a 1.04 it will be included, its also in 1.1, which actually should be soon anyhow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should Whitespaces be placed at the begging of a line Rolling min/max gives malloc error
8 participants