Improve dirty check performance of set #2393
Conversation
This improves the `Set` case for #857. But the main concern of #857 was seemingly more about the principle of the dirty-check mechanism's implementation, which is snapshot based with comparisons and cannot scale for "huge" sessions. (Huge sessions are an anti-pattern, so this kind of trouble is not usually considered major.)
Hibernate (Java) provides a way to inject a custom dirty-check mechanism, which could be "changes" based. For actually fixing #857, I think that should be ported.
So I do not think this PR qualifies as a fix for #857. It should rather be qualified as a somewhat related improvement.
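For context, such an extension point could look something like the following. This is a hypothetical sketch only, loosely modeled on Hibernate's `CustomEntityDirtinessStrategy`; none of these names exist in NHibernate.

```csharp
// Hypothetical sketch: a changes-based dirty-check extension point.
// The interface and member names are illustrative, not NHibernate API.
public interface ICustomDirtinessStrategy
{
	// True when this strategy can answer for the given entity.
	bool CanDirtyCheck(object entity);

	// Report dirtiness from tracked changes, without comparing
	// the entity against a stored snapshot.
	bool IsDirty(object entity);
}
```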
```
@@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
	internal class SetSnapShot<T> : ISetSnapshot<T>
	{
		private readonly List<T> _elements;
```
Changing this into a `LinkedHashMap<T, T>` (from the `NHibernate.Util` namespace) could do most of the job, and that would be way simpler, also avoiding having two collections instead of one.
But it does not support a `null` key. It could maybe be modified to have an option for supporting it. That option would cause it to no longer respect the `IDictionary<TKey, TValue>` contract, which explicitly states that `null` keys are not supported, so it is debatable.
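To visualize the suggestion, here is a rough sketch of a `LinkedHashMap`-backed snapshot, assuming `NHibernate.Util.LinkedHashMap<TKey, TValue>` follows the `IDictionary<TKey, TValue>` contract as described above, with a separate flag standing in for the unsupported `null` key (all member names are illustrative, not the actual implementation):

```csharp
using NHibernate.Util;

// Rough sketch only, not the actual implementation.
internal class LinkedHashMapSnapshot<T>
{
	private readonly LinkedHashMap<T, T> _elements = new LinkedHashMap<T, T>();
	private bool _containsNull; // the map does not support a null key

	public void Add(T element)
	{
		if (element == null)
		{
			_containsNull = true;
			return;
		}
		if (!_elements.ContainsKey(element))
			_elements.Add(element, element);
	}

	public bool TryGetValue(T element, out T value)
	{
		if (element == null)
		{
			value = default(T);
			return _containsNull;
		}
		// hash-based lookup instead of a linear List scan
		return _elements.TryGetValue(element, out value);
	}
}
```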
In fact we do not need the order-preserving capability either for the `SetSnapshot`. Sets are not order sensitive, including in the dirty-check logic.
So we could use a regular `Dictionary<T, T>` instead of the `List`, with your additional field for handling the `null` case. The enumerator would have to be overloaded to also yield the null element if there is one, the count to take the null element into account, and so on.
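A minimal sketch of that `Dictionary<T, T>` variant, with the overloaded count and enumerator just described (the class and member names are illustrative, not NHibernate code):

```csharp
using System.Collections;
using System.Collections.Generic;

internal class DictionarySetSnapshot<T> : IEnumerable<T>
{
	private readonly Dictionary<T, T> _elements = new Dictionary<T, T>();
	private bool _containsNull; // Dictionary<T, T> cannot hold a null key

	public void Add(T element)
	{
		if (element == null)
			_containsNull = true;
		else if (!_elements.ContainsKey(element))
			_elements.Add(element, element);
	}

	// The count takes the null element into account.
	public int Count
	{
		get { return _elements.Count + (_containsNull ? 1 : 0); }
	}

	// The enumerator also yields the null element if there is one.
	public IEnumerator<T> GetEnumerator()
	{
		foreach (var element in _elements.Keys)
			yield return element;
		if (_containsNull)
			yield return default(T);
	}

	IEnumerator IEnumerable.GetEnumerator()
	{
		return GetEnumerator();
	}
}
```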
My implementation focuses primarily on preventing any possible regressions, as I am not familiar enough with NH to anticipate what can and cannot appear in the snapshot. Besides being significantly more complex, `LinkedHashMap` would also not support multiple entries that claim to be equal. I assumed there was a reason for the `List`-based implementation of the `SetSnapShot`; it just failed terribly due to the way `TryGetValue` is used in a loop against a similarly sized collection.
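To illustrate that failure mode (a simplified stand-in, not the exact NHibernate code): each lookup against a `List`-backed snapshot is a linear scan, so checking N elements against an N-element snapshot costs O(N^2) overall, while a hash-based lookup makes the same loop O(N) on average.

```csharp
using System.Collections.Generic;

// Simplified stand-in for the dirty-check loop; 'Contains' plays the
// role of the snapshot's TryGetValue.
static bool HasAddedElement<T>(ICollection<T> current, ICollection<T> snapshot)
{
	foreach (var element in current)
	{
		// O(N) per call when 'snapshot' is a List<T>; O(1) on average
		// once lookups are answered from a hash-based cache instead.
		if (!snapshot.Contains(element))
			return true; // element not present in the snapshot
	}
	return false;
}
```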
This snapshot helper class ends up being consumed only in `PersistentGenericSet`. It may be passed around in NHibernate code, but only for being used inside some `PersistentGenericSet` instance.
So you can check what features it actually needs. And as it is an internal class, it can be freely changed.
There is no strong guarantee about the input. Also, an instance of `SetSnapShot` gets passed to and returned from `GetOrphans`, used in `SequenceEquals`, etc. In those scenarios the lookup by key does not even come into play; that's why I am creating the lookup dictionary very lazily.
I would need extremely deep knowledge of NH to feel confident changing the (even undocumented) contract of a class that has been there unchanged since 2.0.
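A minimal sketch of that lazy creation, assuming the `List<T>` storage is kept for all other operations and a separate field tracks the `null` element (the names are illustrative, not the exact PR code):

```csharp
using System.Collections.Generic;

internal class LazySetSnapshot<T>
{
	private readonly List<T> _elements = new List<T>();
	private Dictionary<T, T> _lookupCache; // built on first keyed lookup only
	private bool _containsNull;

	public bool TryGetValue(T element, out T value)
	{
		// The cache is only built when a keyed lookup actually happens;
		// code paths that merely enumerate the snapshot never pay for it.
		if (_lookupCache == null)
			BuildLookupCache();

		if (element == null)
		{
			value = default(T);
			return _containsNull;
		}
		return _lookupCache.TryGetValue(element, out value);
	}

	private void BuildLookupCache()
	{
		_lookupCache = new Dictionary<T, T>(_elements.Count);
		foreach (var snapshotElement in _elements)
		{
			if (snapshotElement == null)
			{
				_containsNull = true;
				continue;
			}
			// Keep the first occurrence to preserve the List's
			// first-match semantics.
			if (!_lookupCache.ContainsKey(snapshotElement))
				_lookupCache.Add(snapshotElement, snapshotElement);
		}
	}
}
```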
```csharp
// be careful not to replace cache entries to preserve
// original semantics and return the first matching object
if (!_lookupCache.ContainsKey(snapshotElement))
{
	_lookupCache.Add(snapshotElement, snapshotElement);
}
```
This is an accidental semantic. Having the snapshot contain the same element twice makes no sense, and would anyway cause bugs if it actually happened. So it is unneeded.
I was not willing to risk that; see my comment above.
As for this not being a fix for the original issue: the report says that the dirty comparison is "slow", not that it cannot scale to huge sets. I don't care whether it is classified as a fix or an improvement; I just wanted to share what I consider a useful contribution.
I am giving my rationale for classifying it; it should not be taken as any kind of reproach. Your contribution is welcome anyway, and it pinpoints something that can be improved.
As for the other question: is there a chance of getting this (if accepted) into a 4.0.5 release, or do we have to keep our own fork?
Well, the last 4.x release was 4.1.2GA, in April 2018, for a regression that appeared in 4.1. It is highly unlikely we will back-port a performance improvement all the way down to 4.0.x.
Replaced by #2394.
I'm surprised that this trivial issue, caused by an O(N^2) dirty-check comparison, went unfixed for so long. In my particular real-life scenario, this fix improved performance by about an order of magnitude; according to the profiler, almost all of that time was spent in the dirty check.
Also, is there a possibility to include this in the 4.0.x branch and create a 4.0.5 release? We cannot move to a newer version so easily.
Thanks!
Related to #857