-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix memory issues in rolling.min/max #33693
BUG: Fix memory issues in rolling.min/max #33693
Conversation
Definitely needs a test. How about you implement a test based on the example in #30726 (though pls construct inputs directly, dont use skimage/scipy) and we'll see how the CI reacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add a memory asv
and a whats new note (bug fix rolling section)
Upon closer inspection of #25926, I found that this fixes exactly the same issue. It seems to have been reintroduced at some point. An asv benchmark for this already exists. |
How would I go about measuring the memory use? As shown in the issue, I used |
you can add memory benchmarks with asv we have a few, |
This fixes at least the reproducible part of pandas-dev#30726, however, I am not totally sure what is going on here. Tests have shown that there are two solutions that avoid growing memory usage: - pass memoryviews (float64_t[:]) instead of ndarray[float64_t] - remove starti and endi as arguments to _roll_min_max_fixed This commit implements both.
f89a4a0
to
959a711
Compare
I changed the numbers for the asv peakmem benchmark from #25926 so that the isse is a bit more visible. Should I still add a test for the pytest test suite, or is the benchmark enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. pls show the memory asv results. As a followup we should have a memory asv for all of the standard window functions.
no test needed, but see my comments above |
pending asv results this lgtm |
Here are the results of the asv run for master:
And this is on the new branch:
|
Hello @s-scherrer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-28 09:35:51 UTC |
This fix also resolves #32266 - which seems to be a duplicate of the original issue here too. Considering that this is a pretty severe bug in the whole 1.0.x tree, i think this should be part of the 1.0.4 milestone, too. |
Here are the new results for master:
and on the new branch:
It's not below 1s, but close. I'm also running this on a pretty old thinkpad with a pre-2010 Core 2 Duo processor, so I guess it will run faster on newer machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last 2 comments long on green
can you add a longer test to be sure it does not affect performance of these widely used methods? |
thanks @s-scherrer very nice |
@jreback any chance of backporting this into the next 1.0.x release, or is it too late for that as it's due in a few days? I'd think that would be beneficial as it's fixing a nasty bug... |
@xmatthias this is a plan for a 1.0.4 release, I believe @simonjayhawkins is planning on this. But as pandas is fully volunteer it may or may not happen. |
Don't misunderstand me - i'm aware that pandas is fully volunteer (and i appreciate and thank anyone spending their time on it!) - i'm just trying to say that it would be great to have this bugfix included in the next possible release. |
yes if we have a 1.04 it will be included, its also in 1.1, which actually should be soon anyhow |
…in rolling.min/max)
This fixes at least the reproducible part of #30726. I am not
totally sure what is going on here (is this a true memory leak?), and whether this fixes all issues, but it does strongly reduce memory usage as measured by
psutil
.My tests have shown that there are two solutions that avoid growing memory
usage:
float64_t[:]
) instead ofndarray[float64_t]
starti
andendi
as arguments to_roll_min_max_fixed
This commit implements both, since
_roll_min_max_fixed
doesn't usestarti
andendi
anyways.closes Rolling min/max gives malloc error #30726
closes Should Whitespaces be placed at the begging of a line #32466
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I am unsure about testing: Does this need a test? If so, what would be a good way to go about testing? The tests I performed (example code in the linked issue) are probably very system specific.