Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for caching fetched relations with linq query provider #1952

Merged
merged 6 commits into from
Feb 24, 2019

Conversation

maca88
Copy link
Contributor

@maca88 maca88 commented Jan 2, 2019

This PR fixes #1195 as it adds the ability to cache all fetched relations (many-to-one/one-to-many). The solution uses a special query cache result builder that appends extra values when creating the cache list that is then passed to the IBatchableQueryCache.(Put\PutMany) method. When fetching a many-to-one relation an additional value is added to the row of the cache list which is the id of the fetched relation. When fetching a one-to-many relation two values are added, the first is the id of the fetched element and the second is the collection id, which is the id of the entity that contains that collection. When the cached list is retrieved form the cache, the query cache will assembly and initialize all objects from the cached query and afterwards a special method is called that will remove the extra values that were added by the result builder.

@fredericDelaporte fredericDelaporte added this to the 5.3 milestone Jan 23, 2019
@fredericDelaporte
Copy link
Member

I should have not validated it yet maybe: I have not checked (and not seen tests about) what would happen in case of Cartesian products due to the fetches. Won't it cache, in case of bags, duplicated collection items? And causes wrong data to be cached?

@maca88
Copy link
Contributor Author

maca88 commented Feb 4, 2019

I tested it by modifying the mappings to use bags instead and when I tried to fetch multiple bags I got an QueryException with message: Cannot simultaneously fetch multiple bags. Then I tried to change only one collection to be a bag and the bag collection had 16 items (where only 3 were unique) when the query was executed in the database and the same amount of items when executed for the second time where the results were retrieved from the cache. This means that the bag is cached with 16 items instead of 3 in this case, but the issue is not caused by this PR as the collection caching was not modified in this PR and the same issue occurs even if the query is not cached. This PR caches only the collection key which is then retrieved from the collection cache when assembling the query result from cache. I think that this issue is related to the FetchMany method which allows to create queries with cartesian products.

@bahusoid
Copy link
Member

bahusoid commented Feb 4, 2019

what would happen in case of Cartesian products due to the fetches

I'm also interested in this case. Could you please clarify how collections owners are cached in this case. Are they cached per row - the same owner is serialized for each row and then put to cache? Or there is some smart logic that serializes owner only once or even unique owner is put to cache only once?

@fredericDelaporte
Copy link
Member

For the bag mixed with sets case, I tend to think that before this PR, the bag would be loaded with duplicated elements due to the Cartesian products, but not cached. So loading afterward its owner in another session and lazy loading the bag should then yield correct results (from db) then cache correct results.

With this PR, we would now have wrong results cached for the bag, and later loading of the owner followed by a lazy load of the bag would retrieve these wrong results from cache.

Or am I wrong?

@maca88
Copy link
Contributor Author

maca88 commented Feb 5, 2019

In order to explain how this PR works I will use a simple one to many fetch first, for example:

order = s.Query<Order>()
  .WithOptions(o => o.SetCacheable(true))
  .FetchMany(x => x.OrderLines)
  .Where(x => x.OrderId == 10248)
  .ToList()
  .First();

When the query is executed for the first time it will cache the following data:

[
  1234679, -- timestamp
  [10248, 1, 10248], -- The first value is the order key, the second is the order line key and the last one is the collection key
  [10248, 2, 10248],
  [10248, 3, 10248]
]

Now one would ask the following questions:

Why are collection owners cached if this PR does not support caching a collection without its owner?

When I was doing this PR I didn't find an easy way to detect where the owner will be among the cached values especially when caching multiple collections.

Why are the collection items cached if the collection will be retrieved from the collection cache that have this information?

The idea was to be able to batch fetch all collection items at once when batch fetching is enabled so if we fetched 3 collection we would be able to get all collection items in one cache call, which is done in #1955.
Additionaly, the mentioned PR uses this information to know if it should batch fetch all of them when one of them is requested or only batch fetch them by the defined batch size.

When the query is executed for the second time the query result will be retrieved from the cache and the cached entites will be assembled by the TypeHelper.Assemble and for collections the newly added static method TypeHelper.InitializeCollections will be called that uses ICollectionPersistent.ForceInitialization method to initialize them. In our case the TypeHelper.InitializeCollections will be called 3 times for the same collection which is fine as ICollectionPersistent.ForceInitialization does nothing when the collection is already initialized.

Now if we have a cacheable query with a cartesian product the only difference is that we will have a lot more duplicated values. For the test I've made, the following data is cached:

[
  6346353453645824,
  [10248,"VINET",10248,1,"VINET",10248],
  [10248,"VINET",10248,2,"VINET",10248],
  [10248,"VINET",10248,3,"VINET",10248],
  [10248,"VINET",10274,1,"VINET",10248],
  [10248,"VINET",10274,2,"VINET",10248],
  [10248,"VINET",10274,3,"VINET",10248],
  [10248,"VINET",10295,1,"VINET",10248],
  [10248,"VINET",10295,2,"VINET",10248],
  [10248,"VINET",10295,3,"VINET",10248],
  [10248,"VINET",10737,1,"VINET",10248],
  [10248,"VINET",10737,2,"VINET",10248],
  [10248,"VINET",10737,3,"VINET",10248],
  [10248,"VINET",10739,1,"VINET",10248],
  [10248,"VINET",10739,2,"VINET",10248],
  [10248,"VINET",10739,3,"VINET",10248]
]

With this PR, we would now have wrong results cached for the bag, and later loading of the owner followed by a lazy load of the bag would retrieve these wrong results from cache.

I've made a test on the current master that shows that the bag can be cached with the duplicated elements, so the issue already exists.

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Feb 24, 2019

Rebased for resolving conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NH-4078 - LINQ fetched collections aren't cached
3 participants