Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dirty check performance of set #2393

Closed

Conversation

ssimek
Copy link

@ssimek ssimek commented May 21, 2020

I'm surprised that this trivial issue caused by an O(N^2) dirty check comparison went unfixed for so long. In my particular real-life scenario, this fix improved the performance by about an order of magnitude, almost all of which was spent in the dirty check according to profiler.

Also, is there a possiblity to include this in the 4.0.x branch and create a 4.0.5 release? We cannot move to a newer version so easily.

Thanks!

Related to #857

Copy link
Member

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improves the Set case for #857. But the main concern of #857 was seemingly more about the dirty-check mechanism implementation principle, snapshot based with comparisons, which cannot scale for "huge" session. (Huge sessions are an anti-pattern, so this kind of trouble are not considered major usually.)

Hibernate (Java) provides a way to inject a custom dirty-check mechanism which could be "changes" based. For actually fixing #857, I think it should be ported.

So I do not think this PR qualifies as a fix for #857.
It should be qualified as a somewhat related improvement.

@@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
internal class SetSnapShot<T> : ISetSnapshot<T>
{
private readonly List<T> _elements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this into a LinkedHashMap<T, T> (from NHibernate.Util namespace) could do most of the job, and that would be way simpler, also avoiding having two collections instead of one.
But it does not support a null key. It could be modified to have an option for supporting it, maybe. This option would cause it to no more respect IDictionary<TKey, TValue> contract which explicitly states null keys are not supported, so it is debatable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we do not need the ordering preserving capability either for the SetSnapshot. Set are not order sensitive, included in dirty check logics.

So we could use a regular Dictionary<T, T> instead of the List, with your additional field for handling the null case. The enumerator would have to be overloaded for also yielding the null if there is some, the count too for taking the null element into account, ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My implementation focuses primarily on preventing any possible regressions as I am not so familiar with NH to anticipate what can and cannot appear in the snapshot. Other than being significantly more complex, LinkedHashMap would also not support multiple entries which would claim to be Equal. I assumed there was a reason for the List based implementation of the SetSnapShot, it just failed terribly due to the way TryGetValue is used in a loop against a similarly sized collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snapshot helper class ends up being consumed only in PersistentGenericSet. It may by passed around in NHibernate code but only for being used inside some PersistentGenericSet instance.
So you can check what features it actually needs.
And as it is an internal class, it can be freely changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no strong guarantee of input. Also, an instance of SetSnapShot gets passed to and returned from GetOrphans, used in SequenceEquals, etc. Scenarios where the lookup by key does not even come into play, that's why I am creating the lookup dictionary very lazily.
I would need extremely deep knowledge of NH to feel confident changing the (even undocumented) contract of a class that's been there unchanged since 2.0.

@fredericDelaporte fredericDelaporte changed the title NH-1365 - set dirty check performance fix (#857) NH-1365 - set dirty check performance fix May 21, 2020
@fredericDelaporte fredericDelaporte changed the title NH-1365 - set dirty check performance fix Improve dirty check performance of set May 21, 2020
@fredericDelaporte fredericDelaporte linked an issue May 21, 2020 that may be closed by this pull request
Comment on lines +150 to +155
// be careful not to replace cache entries to preserve
// original semantics and return the first matching object
if (!_lookupCache.ContainsKey(snapshotElement))
{
_lookupCache.Add(snapshotElement, snapshotElement);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accidental semantic. Having the snapshot containing two times the same element makes no sense, and would anyway cause bugs if this was actually happening. So it is unneeded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not willing to risk that, see above comment.

@@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
internal class SetSnapShot<T> : ISetSnapshot<T>
{
private readonly List<T> _elements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we do not need the ordering preserving capability either for the SetSnapshot. Set are not order sensitive, included in dirty check logics.

So we could use a regular Dictionary<T, T> instead of the List, with your additional field for handling the null case. The enumerator would have to be overloaded for also yielding the null if there is some, the count too for taking the null element into account, ...

@ssimek
Copy link
Author

ssimek commented May 21, 2020

As for this not being a fix for the original issue - the report says that dirty comparison is "slow", not that it cannot scale to huge sets. I don't care if it assigned as a fix or an improvement, just wanted to share what I consider an useful contribution.
In the very real case for which this fix was designed, the entire process which caught my attention took roughly 21 minutes before and 3 minutes after applying the fix. Note that those 3 minutes were spent doing actual work, 99.6% of the rest was spent inside SetSnapShot.TryGetValue calls. So more than two orders of magnitude improvement. I do not have the exact numbers, but the collections were not unimaginably large. Quadratic complexity doesn't need a lot to get out of hand.

@fredericDelaporte
Copy link
Member

I am giving my rationale for classifying it, it should not be taken as any kind of reproach. Your contribution is anyway welcome, it pinpoints a point that can be improved.
Even if your proposed fix is changed for something else, it is still valuable.

@ssimek
Copy link
Author

ssimek commented May 21, 2020

As for the other question - is there a chance of getting this (if accepted) into a 4.0.5 release, or do we have to keep our own fork?

@fredericDelaporte
Copy link
Member

Well, the last 4.x release was 4.1.2GA, April 2018, for a regression appeared in 4.1. It is highly unlikely we will back-port a performance improvement all the way down to 4.0.x

@fredericDelaporte
Copy link
Member

Replaced by #2394.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants