Improve dirty check performance of set #2393

ssimek · 2020-05-21T14:16:19Z

I'm surprised that this trivial issue caused by an O(N^2) dirty check comparison went unfixed for so long. In my particular real-life scenario, this fix improved the performance by about an order of magnitude, almost all of which was spent in the dirty check according to profiler.

Also, is there a possiblity to include this in the 4.0.x branch and create a 4.0.5 release? We cannot move to a newer version so easily.

Thanks!

Related to #857

fredericDelaporte

This improves the Set case for #857. But the main concern of #857 was seemingly more about the dirty-check mechanism implementation principle, snapshot based with comparisons, which cannot scale for "huge" session. (Huge sessions are an anti-pattern, so this kind of trouble are not considered major usually.)

Hibernate (Java) provides a way to inject a custom dirty-check mechanism which could be "changes" based. For actually fixing #857, I think it should be ported.

So I do not think this PR qualifies as a fix for #857.
It should be qualified as a somewhat related improvement.

fredericDelaporte · 2020-05-21T18:11:51Z

src/NHibernate/Collection/Generic/SetHelpers/SetSnapShot.cs

@@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
 	internal class SetSnapShot<T> : ISetSnapshot<T>
 	{
 		private readonly List<T> _elements;


Changing this into a LinkedHashMap<T, T> (from NHibernate.Util namespace) could do most of the job, and that would be way simpler, also avoiding having two collections instead of one.
But it does not support a null key. It could be modified to have an option for supporting it, maybe. This option would cause it to no more respect IDictionary<TKey, TValue> contract which explicitly states null keys are not supported, so it is debatable.

In fact we do not need the ordering preserving capability either for the SetSnapshot. Set are not order sensitive, included in dirty check logics.

So we could use a regular Dictionary<T, T> instead of the List, with your additional field for handling the null case. The enumerator would have to be overloaded for also yielding the null if there is some, the count too for taking the null element into account, ...

My implementation focuses primarily on preventing any possible regressions as I am not so familiar with NH to anticipate what can and cannot appear in the snapshot. Other than being significantly more complex, LinkedHashMap would also not support multiple entries which would claim to be Equal. I assumed there was a reason for the List based implementation of the SetSnapShot, it just failed terribly due to the way TryGetValue is used in a loop against a similarly sized collection.

This snapshot helper class ends up being consumed only in PersistentGenericSet. It may by passed around in NHibernate code but only for being used inside some PersistentGenericSet instance.
So you can check what features it actually needs.
And as it is an internal class, it can be freely changed.

There is no strong guarantee of input. Also, an instance of SetSnapShot gets passed to and returned from GetOrphans, used in SequenceEquals, etc. Scenarios where the lookup by key does not even come into play, that's why I am creating the lookup dictionary very lazily.
I would need extremely deep knowledge of NH to feel confident changing the (even undocumented) contract of a class that's been there unchanged since 2.0.

fredericDelaporte · 2020-05-21T18:46:53Z

src/NHibernate/Collection/Generic/SetHelpers/SetSnapShot.cs

+					// be careful not to replace cache entries to preserve
+					// original semantics and return the first matching object
+					if (!_lookupCache.ContainsKey(snapshotElement))
+					{
+						_lookupCache.Add(snapshotElement, snapshotElement);
+					}


This is accidental semantic. Having the snapshot containing two times the same element makes no sense, and would anyway cause bugs if this was actually happening. So it is unneeded.

I was not willing to risk that, see above comment.

fredericDelaporte · 2020-05-21T18:53:45Z

src/NHibernate/Collection/Generic/SetHelpers/SetSnapShot.cs

@@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
 	internal class SetSnapShot<T> : ISetSnapshot<T>
 	{
 		private readonly List<T> _elements;


In fact we do not need the ordering preserving capability either for the SetSnapshot. Set are not order sensitive, included in dirty check logics.

So we could use a regular Dictionary<T, T> instead of the List, with your additional field for handling the null case. The enumerator would have to be overloaded for also yielding the null if there is some, the count too for taking the null element into account, ...

ssimek · 2020-05-21T19:11:40Z

As for this not being a fix for the original issue - the report says that dirty comparison is "slow", not that it cannot scale to huge sets. I don't care if it assigned as a fix or an improvement, just wanted to share what I consider an useful contribution.
In the very real case for which this fix was designed, the entire process which caught my attention took roughly 21 minutes before and 3 minutes after applying the fix. Note that those 3 minutes were spent doing actual work, 99.6% of the rest was spent inside SetSnapShot.TryGetValue calls. So more than two orders of magnitude improvement. I do not have the exact numbers, but the collections were not unimaginably large. Quadratic complexity doesn't need a lot to get out of hand.

fredericDelaporte · 2020-05-21T19:19:57Z

I am giving my rationale for classifying it, it should not be taken as any kind of reproach. Your contribution is anyway welcome, it pinpoints a point that can be improved.
Even if your proposed fix is changed for something else, it is still valuable.

ssimek · 2020-05-21T19:28:27Z

As for the other question - is there a chance of getting this (if accepted) into a 4.0.5 release, or do we have to keep our own fork?

fredericDelaporte · 2020-05-23T18:58:11Z

Well, the last 4.x release was 4.1.2GA, April 2018, for a regression appeared in 4.1. It is highly unlikely we will back-port a performance improvement all the way down to 4.0.x

fredericDelaporte · 2020-05-31T19:47:14Z

Replaced by #2394.

NH-1365 - set dirty check performance fix (nhibernate#857)

5b29f43

fredericDelaporte reviewed May 21, 2020

View reviewed changes

fredericDelaporte changed the title ~~NH-1365 - set dirty check performance fix (#857)~~ NH-1365 - set dirty check performance fix May 21, 2020

fredericDelaporte changed the title ~~NH-1365 - set dirty check performance fix~~ Improve dirty check performance of set May 21, 2020

fredericDelaporte added c: Core p: Minor t: Improvement labels May 21, 2020

fredericDelaporte linked an issue May 21, 2020 that may be closed by this pull request

NH-1365 - Dirty check does NOT scale #857

Open

fredericDelaporte removed a link to an issue May 21, 2020

NH-1365 - Dirty check does NOT scale #857

Open

fredericDelaporte reviewed May 21, 2020

View reviewed changes

maca88 mentioned this pull request May 21, 2020

Optimize PersistentGenericSet snapshot #2394

Merged

fredericDelaporte added the r: Replaced label May 31, 2020

fredericDelaporte closed this May 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve dirty check performance of set #2393

Improve dirty check performance of set #2393

ssimek commented May 21, 2020 •

edited by fredericDelaporte

Loading

fredericDelaporte left a comment •

edited

Loading

fredericDelaporte May 21, 2020

fredericDelaporte May 21, 2020

ssimek May 21, 2020

fredericDelaporte May 21, 2020

ssimek May 21, 2020

fredericDelaporte May 21, 2020

ssimek May 21, 2020

fredericDelaporte May 21, 2020

ssimek commented May 21, 2020

fredericDelaporte commented May 21, 2020

ssimek commented May 21, 2020

fredericDelaporte commented May 23, 2020

fredericDelaporte commented May 31, 2020

Improve dirty check performance of set #2393

Improve dirty check performance of set #2393

Conversation

ssimek commented May 21, 2020 • edited by fredericDelaporte Loading

fredericDelaporte left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssimek commented May 21, 2020

fredericDelaporte commented May 21, 2020

ssimek commented May 21, 2020

fredericDelaporte commented May 23, 2020

fredericDelaporte commented May 31, 2020

ssimek commented May 21, 2020 •

edited by fredericDelaporte

Loading

fredericDelaporte left a comment •

edited

Loading