From 4f8ed1f0bd6eb0dfa21c381e2ed6c26f6c30ee66 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Wed, 21 May 2025 14:21:29 +0200 Subject: [PATCH 01/18] blog: caching MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- .../blog/news/primary-cache-for-next-recon.md | 62 +++++++++++++++++++ .../source/informer/InformerWrapper.java | 1 + 2 files changed, 63 insertions(+) create mode 100644 docs/content/en/blog/news/primary-cache-for-next-recon.md diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md new file mode 100644 index 0000000000..f1a58ed879 --- /dev/null +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -0,0 +1,62 @@ +--- +title: Custom resource change guarantees for next reconciliation +date: 2025-05-21 +author: >- + [Attila Mészáros](https://github.com/csviri) +--- + +We recently released v5.1 of Java Operator SDK. One of the highlights of this release is related to a topic of so-called +[allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values +) in Kubernetes. + +To sum up the problem, for example, if we create a resource in our controller that has a generated identifier - +in other words the new resource cannot be addressed only by using the values from the `.spec` of the resource - +we have to store it, commonly in the `.status` of the custom resource. However, operator frameworks cache resources +using informers, so the update that you made to the status of the custom resource will just eventually get into +the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that you will +see the stale custom resource in the cache (in another word, the cache is eventually consistent). 
This is a problem +since you might not know at that point that the desired resources were already created, so it might happen that you try to +create it again. + +Java Operator SDK now out of the box provides a utility class [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) +if you use it, the framework guarantees that the next reconciliation will always receive the updated resource: + +```java + @Override + public UpdateControl reconcile( + StatusPatchCacheCustomResource resource, + Context context) { + + // omitted code + + var freshCopy = createFreshCopy(resource); + + freshCopy + .getStatus() + .setValue(statusWithAllocatedValue()); + + // using the utility instead of update control + var updated = + PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context); + return UpdateControl.noUpdate(); + } +``` + +This utility class will do the magic, but how does it work? Actually, there are multiple ways to solve this problem, +but in the end, we decided to provide only the mentioned approach. (If you want to dig deep see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). + +The trick is that we can cache the resource of our update in an additional cache on top of the informer's cache. +If we read the resource, we first check if the resource is in the overlay cache and only read it from the Informers cache +if not present there. If the informer receives an event with that resource, we always remove the resource from the overlay +cache. But this **works only** if the update is done **with optimistic locking**. +So if the update fails on conflict, we simply read the resource from the server, apply your changes and try to update again +with optimistic locking. + +So why optimistic locking? 
(A bit simplified explanation) Note that if we do not update the resource with optimistic locking, it can happen that +another party does an update on resource just before we do. The informer receives the event from another party's update, +if we compare resource versions with this resource and previously cached resource (we used to do our update), +that would be different, and in general there is no elegant way to determine in general if this new version that +informer receives an event for is from an update that happened before or after our update. +(Note that informers watch can lose connection and other edge cases) +If we do an update with optimistic locking it simplifies the situation, we can easily have strong guarantees. + diff --git a/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java index c07ffdbf46..dc788f22dc 100644 --- a/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java +++ b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java @@ -79,6 +79,7 @@ public void start() throws OperatorException { if (!configurationService.stopOnInformerErrorDuringStartup()) { informer.exceptionHandler((b, t) -> !ExceptionHandler.isDeserializationException(t)); } + // change thread name for easier debugging final var thread = Thread.currentThread(); final var name = thread.getName(); From 3c133b653baf78a239f186a0c77c71f5145c9e9e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Wed, 21 May 2025 16:59:05 +0200 Subject: [PATCH 02/18] docs: blogpost about primary caching MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- 
docs/content/en/blog/news/primary-cache-for-next-recon.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index f1a58ed879..fd1c4ce387 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -58,5 +58,10 @@ if we compare resource versions with this resource and previously cached resourc that would be different, and in general there is no elegant way to determine in general if this new version that informer receives an event for is from an update that happened before or after our update. (Note that informers watch can lose connection and other edge cases) -If we do an update with optimistic locking it simplifies the situation, we can easily have strong guarantees. + +If we do an update with optimistic locking it simplifies the situation, we can easily have strong guarantees. +Since we know if the update with optimistic locking is successful, we had the fresh resource in our cache. +Thus, the next event we receive will be the one that is results of our update or a newer one. +So if we cache the resource in the overlay cache we know that with the next event, we can remove it from there. 
+ From e9e9508c6dcb027150cbcbbbb5a036c74de638b1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Wed, 21 May 2025 17:07:24 +0200 Subject: [PATCH 03/18] wip MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index fd1c4ce387..f680d60125 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -29,8 +29,7 @@ if you use it, the framework guarantees that the next reconciliation will always // omitted code - var freshCopy = createFreshCopy(resource); - + var freshCopy = createFreshCopy(resource); // need fresh copy just because we use the SSA version of update freshCopy .getStatus() .setValue(statusWithAllocatedValue()); @@ -63,5 +62,5 @@ If we do an update with optimistic locking it simplifies the situation, we can e Since we know if the update with optimistic locking is successful, we had the fresh resource in our cache. Thus, the next event we receive will be the one that is results of our update or a newer one. So if we cache the resource in the overlay cache we know that with the next event, we can remove it from there. - - +If the update with optimistic locking fails, we can wait until the informer's cache is populated with next resource +version and retry. 
From 3388a1a3952d07e6085511699279ca18796bb658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?=
Date: Wed, 21 May 2025 17:49:34 +0200
Subject: [PATCH 04/18] wip
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Attila Mészáros
---
 .../processing/event/source/informer/InformerWrapper.java | 1 -
 1 file changed, 1 deletion(-)

diff --git a/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java
index dc788f22dc..c07ffdbf46 100644
--- a/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java
+++ b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java
@@ -79,7 +79,6 @@ public void start() throws OperatorException {
       if (!configurationService.stopOnInformerErrorDuringStartup()) {
         informer.exceptionHandler((b, t) -> !ExceptionHandler.isDeserializationException(t));
       }
-
       // change thread name for easier debugging
       final var thread = Thread.currentThread();
       final var name = thread.getName();

From 343a828c7c5d2a2754ffc72ee3db3a4fa02f9fcf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?=
Date: Thu, 22 May 2025 08:42:31 +0200
Subject: [PATCH 05/18] Update docs/content/en/blog/news/primary-cache-for-next-recon.md

Co-authored-by: Martin Stefanko
---
 docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md
index f680d60125..76b84a2072 100644
--- a/docs/content/en/blog/news/primary-cache-for-next-recon.md
+++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md
@@ -60,7 +60,7 @@ informer
receives an event for is from an update that happened before or after o If we do an update with optimistic locking it simplifies the situation, we can easily have strong guarantees. Since we know if the update with optimistic locking is successful, we had the fresh resource in our cache. -Thus, the next event we receive will be the one that is results of our update or a newer one. +Thus, the next event we receive will be the one that is the result of our update or a newer one. So if we cache the resource in the overlay cache we know that with the next event, we can remove it from there. If the update with optimistic locking fails, we can wait until the informer's cache is populated with next resource version and retry. From 171b5266ee57a0a421ece01ab67aeef7c9bb5f13 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Thu, 22 May 2025 08:52:04 +0200 Subject: [PATCH 06/18] Update primary-cache-for-next-recon.md --- .../blog/news/primary-cache-for-next-recon.md | 33 +++++++++---------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index 76b84a2072..b82449a335 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -10,13 +10,13 @@ We recently released v5.1 of Java Operator SDK. One of the highlights of this re ) in Kubernetes. To sum up the problem, for example, if we create a resource in our controller that has a generated identifier - -in other words the new resource cannot be addressed only by using the values from the `.spec` of the resource - +in other words, the new resource cannot be addressed only by using the values from the `.spec` of the resource - we have to store it, commonly in the `.status` of the custom resource. 
However, operator frameworks cache resources using informers, so the update that you made to the status of the custom resource will just eventually get into the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that you will see the stale custom resource in the cache (in another word, the cache is eventually consistent). This is a problem since you might not know at that point that the desired resources were already created, so it might happen that you try to -create it again. +create them again. Java Operator SDK now out of the box provides a utility class [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) if you use it, the framework guarantees that the next reconciliation will always receive the updated resource: @@ -41,26 +41,25 @@ if you use it, the framework guarantees that the next reconciliation will always } ``` -This utility class will do the magic, but how does it work? Actually, there are multiple ways to solve this problem, -but in the end, we decided to provide only the mentioned approach. (If you want to dig deep see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). +This utility class will do the magic, but how does it work? There are multiple ways to solve this problem, +but ultimately, we only provided the mentioned approach. (If you want to dig deep, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). -The trick is that we can cache the resource of our update in an additional cache on top of the informer's cache. -If we read the resource, we first check if the resource is in the overlay cache and only read it from the Informers cache +The trick is to cache the resource of our update in an additional cache on top of the informer's cache. 
+If we read the resource, we first check if it is in the overlay cache and only read it from the Informers cache if not present there. If the informer receives an event with that resource, we always remove the resource from the overlay cache. But this **works only** if the update is done **with optimistic locking**. -So if the update fails on conflict, we simply read the resource from the server, apply your changes and try to update again -with optimistic locking. +So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, apply your changes, +and try to update again with optimistic locking. So why optimistic locking? (A bit simplified explanation) Note that if we do not update the resource with optimistic locking, it can happen that -another party does an update on resource just before we do. The informer receives the event from another party's update, -if we compare resource versions with this resource and previously cached resource (we used to do our update), -that would be different, and in general there is no elegant way to determine in general if this new version that -informer receives an event for is from an update that happened before or after our update. +another party does an update on the resource just before we do. The informer receives the event from another party's update, +if we would compare resource versions with this resource and the previously cached resource (response from our update), +that would be different, and in general there is no elegant way to determine if this new version that +informer receives an event from an update that happened before or after our update. (Note that informers watch can lose connection and other edge cases) -If we do an update with optimistic locking it simplifies the situation, we can easily have strong guarantees. -Since we know if the update with optimistic locking is successful, we had the fresh resource in our cache. 
+If we do an update with optimistic locking, it simplifies the situation, we can easily have strong guarantees. +Since we know if the update with optimistic locking is successful, we have the fresh resource in our cache. Thus, the next event we receive will be the one that is the result of our update or a newer one. -So if we cache the resource in the overlay cache we know that with the next event, we can remove it from there. -If the update with optimistic locking fails, we can wait until the informer's cache is populated with next resource -version and retry. +So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it from there. + From 629434201dfa881eca4e28443a840e9993d06270 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Thu, 22 May 2025 09:23:01 +0200 Subject: [PATCH 07/18] mermaid and improvement MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- .../blog/news/primary-cache-for-next-recon.md | 50 ++++++++++++------- 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index b82449a335..867c8e3028 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -9,13 +9,13 @@ We recently released v5.1 of Java Operator SDK. One of the highlights of this re [allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values ) in Kubernetes. 
-To sum up the problem, for example, if we create a resource in our controller that has a generated identifier - -in other words, the new resource cannot be addressed only by using the values from the `.spec` of the resource - -we have to store it, commonly in the `.status` of the custom resource. However, operator frameworks cache resources +To sum up the problem, for example, if we create a resource in our controller that has a generated ID - +in other words, we cannot address the resource only by using the values from the `.spec` - +we have to store it, usually in the `.status` of the custom resource. However, operator frameworks cache resources using informers, so the update that you made to the status of the custom resource will just eventually get into -the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that you will -see the stale custom resource in the cache (in another word, the cache is eventually consistent). This is a problem -since you might not know at that point that the desired resources were already created, so it might happen that you try to +the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that we will +see the stale custom resource in the cache (in another word, the cache is eventually consistent) without the generated ID in the status. +This is a problem since we might not know at that point that the desired resources were already created, so it might happen that you try to create them again. Java Operator SDK now out of the box provides a utility class [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) @@ -41,15 +41,16 @@ if you use it, the framework guarantees that the next reconciliation will always } ``` -This utility class will do the magic, but how does it work? 
There are multiple ways to solve this problem, -but ultimately, we only provided the mentioned approach. (If you want to dig deep, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). +This utility class will do the magic for you. But how does it work? +There are multiple ways to solve this problem, +but ultimately, we only provided the solution below. (If you want to dig deep in alternatives, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). -The trick is to cache the resource of our update in an additional cache on top of the informer's cache. -If we read the resource, we first check if it is in the overlay cache and only read it from the Informers cache -if not present there. If the informer receives an event with that resource, we always remove the resource from the overlay -cache. But this **works only** if the update is done **with optimistic locking**. -So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, apply your changes, -and try to update again with optimistic locking. +The trick is to cache the resource from the response of our update in an additional cache on top of the informer's cache. +If we read the resource, we first check if it is in the overlay cache and read it from there if present, otherwise read it from the cache of the informer. +If the informer receives an event with that resource, we always remove the resource from the overlay +cache, since that is a more recent resource. But this **works only** if the update is done **with optimistic locking**. +So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, +and try to update again using the new resource (applied our changes again) with optimistic locking. So why optimistic locking? 
(A bit simplified explanation) Note that if we do not update the resource with optimistic locking, it can happen that another party does an update on the resource just before we do. The informer receives the event from another party's update, @@ -58,8 +59,23 @@ that would be different, and in general there is no elegant way to determine if informer receives an event from an update that happened before or after our update. (Note that informers watch can lose connection and other edge cases) -If we do an update with optimistic locking, it simplifies the situation, we can easily have strong guarantees. +```mermaid +flowchart TD + A["Update Resource with Lock"] --> B{"Is Successful"} + B -- Fails on conflict --> D["Poll the Informer until resource changes"] + D --> n4["Apply desired changes on the resource"] + B -- Yes --> n2["Still the original resource in informer cache"] + n2 -- Yes --> C["Cache the resource in overlay cache"] + n2 -- NO --> n3["We know there is already a fesh resource in Informer"] + n4 --> A + +``` + +If we do our update with optimistic locking, it simplifies the situation, we can easily have strong guarantees. Since we know if the update with optimistic locking is successful, we have the fresh resource in our cache. -Thus, the next event we receive will be the one that is the result of our update or a newer one. +Thus, the next event we receive will be the one that is the result of our update +(or a newer one if somebody did an update after, but that is fine since it will contain our allocated values). So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it from there. - +Note that we store the result in overlay cache only if at that time we still have the original resource in cache, +if the cache already updated. This means that we already received a new event after our update, +so we have a fresh resource in the informer cache. 
From 466facf65d5c2ff501a2025ebd793dc798e88b31 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?=
Date: Thu, 22 May 2025 09:24:35 +0200
Subject: [PATCH 08/18] date
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Attila Mészáros
---
 docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md
index 867c8e3028..1d2f66580f 100644
--- a/docs/content/en/blog/news/primary-cache-for-next-recon.md
+++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md
@@ -1,6 +1,6 @@
 ---
 title: Custom resource change guarantees for next reconciliation
-date: 2025-05-21
+date: 2025-05-22
 author: >-
   [Attila Mészáros](https://github.com/csviri)
 ---

From f41390cb135c18f886805eee7cbf071cc0645287 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?=
Date: Thu, 22 May 2025 09:31:24 +0200
Subject: [PATCH 09/18] title
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Attila Mészáros
---
 docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md
index 1d2f66580f..fb34e35fde 100644
--- a/docs/content/en/blog/news/primary-cache-for-next-recon.md
+++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md
@@ -1,5 +1,5 @@
 ---
-title: Custom resource change guarantees for next reconciliation
+title: How to guarantee allocated values for reconciliation
 date: 2025-05-22
 author: >-
   [Attila Mészáros](https://github.com/csviri)

From 594598e359f1a406cc891299462330335ee9b346 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?=
Date: Thu, 22 May 2025 09:41:54 +0200
Subject:
[PATCH 10/18] docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index fb34e35fde..ad66624d80 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -1,5 +1,5 @@ --- -title: How to guarantee allocated values for reconciliation +title: How to guarantee allocated values for next reconciliation date: 2025-05-22 author: >- [Attila Mészáros](https://github.com/csviri) From 48e9c816de0c07d9e621035c2e6df1f8a5cc5eff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Thu, 22 May 2025 10:18:35 +0200 Subject: [PATCH 11/18] improve MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index ad66624d80..b47aad0a39 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -59,11 +59,12 @@ that would be different, and in general there is no elegant way to determine if informer receives an event from an update that happened before or after our update. 
(Note that informers watch can lose connection and other edge cases) + ```mermaid flowchart TD A["Update Resource with Lock"] --> B{"Is Successful"} - B -- Fails on conflict --> D["Poll the Informer until resource changes"] - D --> n4["Apply desired changes on the resource"] + B -- Fails on conflict --> D["Poll the Informer cache until resource changes"] + D --> n4["Apply desired changes on the resource again"] B -- Yes --> n2["Still the original resource in informer cache"] n2 -- Yes --> C["Cache the resource in overlay cache"] n2 -- NO --> n3["We know there is already a fesh resource in Informer"] @@ -71,6 +72,7 @@ flowchart TD ``` + If we do our update with optimistic locking, it simplifies the situation, we can easily have strong guarantees. Since we know if the update with optimistic locking is successful, we have the fresh resource in our cache. Thus, the next event we receive will be the one that is the result of our update From 9e74dd211b232729ff3d131104fafd27d7e7fe88 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Thu, 22 May 2025 12:39:13 +0200 Subject: [PATCH 12/18] wording MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index b47aad0a39..eb3971b40c 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -9,9 +9,9 @@ We recently released v5.1 of Java Operator SDK. One of the highlights of this re [allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values ) in Kubernetes. 
-To sum up the problem, for example, if we create a resource in our controller that has a generated ID - +To sum up the problem, let's say if we create a resource from our controller that has a generated ID - in other words, we cannot address the resource only by using the values from the `.spec` - -we have to store it, usually in the `.status` of the custom resource. However, operator frameworks cache resources +we have to store this ID, usually in the `.status` of the custom resource. However, operator frameworks cache resources using informers, so the update that you made to the status of the custom resource will just eventually get into the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that we will see the stale custom resource in the cache (in another word, the cache is eventually consistent) without the generated ID in the status. From a18765dc17f40e5250e321cabf4a15d5173c647d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Thu, 22 May 2025 12:45:57 +0200 Subject: [PATCH 13/18] improve MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- .../blog/news/primary-cache-for-next-recon.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index eb3971b40c..db1e734656 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -7,11 +7,12 @@ author: >- We recently released v5.1 of Java Operator SDK. One of the highlights of this release is related to a topic of so-called [allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values -) in Kubernetes. +). 
-To sum up the problem, let's say if we create a resource from our controller that has a generated ID - -in other words, we cannot address the resource only by using the values from the `.spec` - -we have to store this ID, usually in the `.status` of the custom resource. However, operator frameworks cache resources +To describe the problem, let's say if we create a resource from our controller that has a generated ID +we have to store this ID, usually in the `.status` of the custom resource. +(In other words, we cannot address the resource only by using the values from the `.spec`.) +However, operator frameworks cache resources using informers, so the update that you made to the status of the custom resource will just eventually get into the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that we will see the stale custom resource in the cache (in another word, the cache is eventually consistent) without the generated ID in the status. @@ -47,7 +48,7 @@ but ultimately, we only provided the solution below. (If you want to dig deep in The trick is to cache the resource from the response of our update in an additional cache on top of the informer's cache. If we read the resource, we first check if it is in the overlay cache and read it from there if present, otherwise read it from the cache of the informer. -If the informer receives an event with that resource, we always remove the resource from the overlay +If the informer receives an event with a fresh resource, we always remove the resource from the overlay cache, since that is a more recent resource. But this **works only** if the update is done **with optimistic locking**. So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, and try to update again using the new resource (applied our changes again) with optimistic locking. 
@@ -57,7 +58,7 @@ another party does an update on the resource just before we do. The informer rec if we would compare resource versions with this resource and the previously cached resource (response from our update), that would be different, and in general there is no elegant way to determine if this new version that informer receives an event from an update that happened before or after our update. -(Note that informers watch can lose connection and other edge cases) +(Note that informer's watch can lose connection and other edge cases) ```mermaid @@ -72,12 +73,11 @@ flowchart TD ``` - -If we do our update with optimistic locking, it simplifies the situation, we can easily have strong guarantees. -Since we know if the update with optimistic locking is successful, we have the fresh resource in our cache. +If we do our update with optimistic locking, it simplifies the situation, and we can have strong guarantees. +Since we know if the update with optimistic locking is successful, we have the up-to-date resource in our caches. Thus, the next event we receive will be the one that is the result of our update (or a newer one if somebody did an update after, but that is fine since it will contain our allocated values). So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it from there. -Note that we store the result in overlay cache only if at that time we still have the original resource in cache, -if the cache already updated. This means that we already received a new event after our update, -so we have a fresh resource in the informer cache. +Note that we store the result in overlay cache only if at that time we still have the original resource in cache. +If the cache already updated, that means that we already received a new event after our update, +so we have a fresh resource in the informer cache already. 
From 3517ef1c5cc594f2c092699c87e91460d55ab9d1 Mon Sep 17 00:00:00 2001 From: Chris Laprun Date: Fri, 23 May 2025 12:09:23 +0200 Subject: [PATCH 14/18] docs: start improving wording Signed-off-by: Chris Laprun --- .../blog/news/primary-cache-for-next-recon.md | 94 +++++++++++-------- 1 file changed, 53 insertions(+), 41 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index db1e734656..375a6999ca 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -1,66 +1,77 @@ --- -title: How to guarantee allocated values for next reconciliation +title: How to guarantee allocated values for next reconciliation date: 2025-05-22 author: >- - [Attila Mészáros](https://github.com/csviri) + [Attila Mészáros](https://github.com/csviri) --- -We recently released v5.1 of Java Operator SDK. One of the highlights of this release is related to a topic of so-called +We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release is related to a topic of +so-called [allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values ). -To describe the problem, let's say if we create a resource from our controller that has a generated ID -we have to store this ID, usually in the `.status` of the custom resource. -(In other words, we cannot address the resource only by using the values from the `.spec`.) -However, operator frameworks cache resources -using informers, so the update that you made to the status of the custom resource will just eventually get into -the cache of the informer. 
If meanwhile some other event triggers the reconciliation, it can happen that we will -see the stale custom resource in the cache (in another word, the cache is eventually consistent) without the generated ID in the status. -This is a problem since we might not know at that point that the desired resources were already created, so it might happen that you try to -create them again. +To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e. +a resource which identifier cannot be directely derived from the custom resource's desired state as specified in its +`spec` field. In order to record the fact that the resource was successfully created, and to avoid attempting to +recreate the resource again in subsequent reconciliations, it is typical for this type of controller to store the +generated identifier in the custom resource's `status` field. -Java Operator SDK now out of the box provides a utility class [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) -if you use it, the framework guarantees that the next reconciliation will always receive the updated resource: +The Java Operator SDK relies on the informers' cache to retrieve resources. These caches, however, are only guaranteed +to be eventually consistent. It could happen, then, that, if some other event occurs, that would result in a new +reconciliation, **before** the update that's been made to our resource status has the chance to be propagated first to +the cluster and then back to the informer cache, that the resource in the informer cache does **not** contain the latest +version as modified by the reconciler. 
This would result in a new reconciliation where the generated identifier would be +missing from the resource status and, therefore, another attempt to create the resource by the reconciler, which is not +what we'd like. + +Java Operator SDK now provides a utility class [ +`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) +to handle this particular use case. Using that overlay cache, your reconciler is guaranteed to see the most up-to-date +version of the resource on the next reconciliation: ```java - @Override - public UpdateControl reconcile( - StatusPatchCacheCustomResource resource, - Context context) { - + +@Override +public UpdateControl reconcile( + StatusPatchCacheCustomResource resource, + Context context) { + // omitted code - + var freshCopy = createFreshCopy(resource); // need fresh copy just because we use the SSA version of update freshCopy - .getStatus() - .setValue(statusWithAllocatedValue()); + .getStatus() + .setValue(statusWithAllocatedValue()); // using the utility instead of update control var updated = - PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context); + PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context); return UpdateControl.noUpdate(); - } +} ``` -This utility class will do the magic for you. But how does it work? -There are multiple ways to solve this problem, -but ultimately, we only provided the solution below. (If you want to dig deep in alternatives, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). +How does `PrimaryUpdateAndCacheUtils` work? +There are multiple ways to solve this problem, but ultimately, we only provide the solution described below. 
(If you +want to dig deep in alternatives, see +this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). -The trick is to cache the resource from the response of our update in an additional cache on top of the informer's cache. -If we read the resource, we first check if it is in the overlay cache and read it from there if present, otherwise read it from the cache of the informer. -If the informer receives an event with a fresh resource, we always remove the resource from the overlay +The trick is to cache the resource from the response of our update in an additional cache on top of the informer's +cache. If we read the resource, we first check if it is in the overlay cache and read it from there if present, +otherwise read it from the cache of the informer. If the informer receives an event with a fresh resource, we always +remove the resource from the overlay cache, since that is a more recent resource. But this **works only** if the update is done **with optimistic locking**. So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, and try to update again using the new resource (applied our changes again) with optimistic locking. -So why optimistic locking? (A bit simplified explanation) Note that if we do not update the resource with optimistic locking, it can happen that -another party does an update on the resource just before we do. The informer receives the event from another party's update, -if we would compare resource versions with this resource and the previously cached resource (response from our update), -that would be different, and in general there is no elegant way to determine if this new version that -informer receives an event from an update that happened before or after our update. +So why optimistic locking? 
(A bit simplified explanation) Note that if we do not update the resource with optimistic +locking, it can happen that +another party does an update on the resource just before we do. The informer receives the event from another party's +update, +if we would compare resource versions with this resource and the previously cached resource (response from our update), +that would be different, and in general there is no elegant way to determine if this new version that +informer receives an event from an update that happened before or after our update. (Note that informer's watch can lose connection and other edge cases) - ```mermaid flowchart TD A["Update Resource with Lock"] --> B{"Is Successful"} @@ -74,10 +85,11 @@ flowchart TD ``` If we do our update with optimistic locking, it simplifies the situation, and we can have strong guarantees. -Since we know if the update with optimistic locking is successful, we have the up-to-date resource in our caches. -Thus, the next event we receive will be the one that is the result of our update -(or a newer one if somebody did an update after, but that is fine since it will contain our allocated values). -So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it from there. +Since we know if the update with optimistic locking is successful, we have the up-to-date resource in our caches. +Thus, the next event we receive will be the one that is the result of our update +(or a newer one if somebody did an update after, but that is fine since it will contain our allocated values). +So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it +from there. Note that we store the result in overlay cache only if at that time we still have the original resource in cache. 
-If the cache already updated, that means that we already received a new event after our update, +If the cache already updated, that means that we already received a new event after our update, so we have a fresh resource in the informer cache already. From 85a1be78640924bcd228e49e395cf6f09133338b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Fri, 23 May 2025 15:13:58 +0200 Subject: [PATCH 15/18] wording MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index 375a6999ca..e6987a9d9a 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -59,7 +59,7 @@ The trick is to cache the resource from the response of our update in an additio cache. If we read the resource, we first check if it is in the overlay cache and read it from there if present, otherwise read it from the cache of the informer. If the informer receives an event with a fresh resource, we always remove the resource from the overlay -cache, since that is a more recent resource. But this **works only** if the update is done **with optimistic locking**. +cache, since that is a more recent resource. But this **works only** if our request update is using **with optimistic locking**. So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, and try to update again using the new resource (applied our changes again) with optimistic locking. 
From 328f40add8e56c95a002909aa86ce68277180706 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Fri, 23 May 2025 17:38:00 +0200 Subject: [PATCH 16/18] comment improve MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- docs/content/en/blog/news/primary-cache-for-next-recon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index e6987a9d9a..a8ffab1cd5 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -43,7 +43,7 @@ public UpdateControl reconcile( .getStatus() .setValue(statusWithAllocatedValue()); - // using the utility instead of update control + // using the utility instead of update control to patch the resource status var updated = PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context); return UpdateControl.noUpdate(); From 88f22b4b10d42faa515281f3580200d5a6ab22c4 Mon Sep 17 00:00:00 2001 From: Chris Laprun Date: Fri, 23 May 2025 20:00:04 +0200 Subject: [PATCH 17/18] docs: improve Signed-off-by: Chris Laprun --- .../blog/news/primary-cache-for-next-recon.md | 70 ++++++++++--------- 1 file changed, 36 insertions(+), 34 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index a8ffab1cd5..8f22ddb7b6 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -51,45 +51,47 @@ public UpdateControl reconcile( ``` How does `PrimaryUpdateAndCacheUtils` work? -There are multiple ways to solve this problem, but ultimately, we only provide the solution described below. 
(If you +There are multiple ways to solve this problem, but ultimately, we only provide the solution described below. If you want to dig deep in alternatives, see -this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)). - -The trick is to cache the resource from the response of our update in an additional cache on top of the informer's -cache. If we read the resource, we first check if it is in the overlay cache and read it from there if present, -otherwise read it from the cache of the informer. If the informer receives an event with a fresh resource, we always -remove the resource from the overlay -cache, since that is a more recent resource. But this **works only** if our request update is using **with optimistic locking**. -So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version, -and try to update again using the new resource (applied our changes again) with optimistic locking. - -So why optimistic locking? (A bit simplified explanation) Note that if we do not update the resource with optimistic -locking, it can happen that -another party does an update on the resource just before we do. The informer receives the event from another party's -update, -if we would compare resource versions with this resource and the previously cached resource (response from our update), -that would be different, and in general there is no elegant way to determine if this new version that -informer receives an event from an update that happened before or after our update. -(Note that informer's watch can lose connection and other edge cases) +this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files). + +The trick is to intercept the resource that the reconciler updated and cache that version in an additional cache on top +of the informer's cache. 
Subsequently, if the reconciler needs to read the resource, the SDK will first check if it is +in the overlay cache and read it from there if present, otherwise read it from the informer's cache. If the informer +receives an event with a fresh resource, we always remove the resource from the overlay cache, since that is a more +recent resource. But this **works only** if the reconciler updates the resource using **with optimistic locking**, which +is handled for you by `PrimaryUpdateAndCacheUtils` provided you pass it a "fresh" (i.e. a version of the resource that +only contains the fields you care about being updated) copy of the resource since Server-Side Apply will be used +underneath. If the update fails on conflict, because the resource has already been updated on the cluster before we got +the chance to get our update in, we simply wait and poll the informer cache until the new resource version from the +server appears in the informer's cache, +and then try to apply our updates to the resource again using the updated version from the server, again with optimistic +locking. + +So why is optimistic locking required? We hinted at it above, but the gist of it, is that if another party updates the +resource before we get a chance to, we wouldn't be able to properly handle the resulting situation correctly in all +cases. The informer would receive that new event before our own update would get a chance to propagate. Without +optimistic locking, there wouldn't be a fail-proof way to determine which update should prevail (i.e. which occurred +first), in particular in the event of the informer losing the connection to the cluster or other edge cases (the joys of +distributed computing!). + +Optimistic locking simplifies the situation and provides us with stronger guarantees: if the update succeeds, then we +can be sure we have the proper resource version in our caches. The next event will contain our update in all cases. 
+Because we know that, we can also be sure that we can evict the cached resource in the overlay cache whenever we receive +a new event. The overlay cache is only used if the SDK detects that the original resource (i.e. the one before we +applied our status update in the example above) is still in the informer's cache. + +The following diagram sums up the process: ```mermaid flowchart TD - A["Update Resource with Lock"] --> B{"Is Successful"} - B -- Fails on conflict --> D["Poll the Informer cache until resource changes"] - D --> n4["Apply desired changes on the resource again"] - B -- Yes --> n2["Still the original resource in informer cache"] + A["Update Resource with Lock"] --> B{"Successful?"} + B -- Fails on conflict --> D{"Resource updated from cluster?"} + D -- " No: poll until updated " --> D + D -- Yes --> n4["Apply desired changes on the resource again"] + B -- Yes --> n2{"Original resource still in informer cache?"} n2 -- Yes --> C["Cache the resource in overlay cache"] - n2 -- NO --> n3["We know there is already a fesh resource in Informer"] + n2 -- No --> n3["Informer cache already contains up-to-date version, do not use overlay cache"] n4 --> A ``` - -If we do our update with optimistic locking, it simplifies the situation, and we can have strong guarantees. -Since we know if the update with optimistic locking is successful, we have the up-to-date resource in our caches. -Thus, the next event we receive will be the one that is the result of our update -(or a newer one if somebody did an update after, but that is fine since it will contain our allocated values). -So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it -from there. -Note that we store the result in overlay cache only if at that time we still have the original resource in cache. 
-If the cache already updated, that means that we already received a new event after our update, -so we have a fresh resource in the informer cache already. From d56c576c7b10ea7b5c4c5618314b256cb237f6fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Attila=20M=C3=A9sz=C3=A1ros?= Date: Tue, 27 May 2025 08:52:48 +0200 Subject: [PATCH 18/18] improvements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Attila Mészáros --- .../blog/news/primary-cache-for-next-recon.md | 23 ++++++++----------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/docs/content/en/blog/news/primary-cache-for-next-recon.md b/docs/content/en/blog/news/primary-cache-for-next-recon.md index 8f22ddb7b6..67326a6f17 100644 --- a/docs/content/en/blog/news/primary-cache-for-next-recon.md +++ b/docs/content/en/blog/news/primary-cache-for-next-recon.md @@ -2,7 +2,7 @@ title: How to guarantee allocated values for next reconciliation date: 2025-05-22 author: >- - [Attila Mészáros](https://github.com/csviri) + [Attila Mészáros](https://github.com/csviri) and [Chris Laprun](https://github.com/metacosm) --- We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release is related to a topic of @@ -11,13 +11,13 @@ so-called ). To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e. -a resource which identifier cannot be directely derived from the custom resource's desired state as specified in its -`spec` field. In order to record the fact that the resource was successfully created, and to avoid attempting to +a resource which identifier cannot be directly derived from the custom resource's desired state as specified in its +`spec` field. 
To record the fact that the resource was successfully created, and to avoid attempting to recreate the resource again in subsequent reconciliations, it is typical for this type of controller to store the generated identifier in the custom resource's `status` field. The Java Operator SDK relies on the informers' cache to retrieve resources. These caches, however, are only guaranteed -to be eventually consistent. It could happen, then, that, if some other event occurs, that would result in a new +to be eventually consistent. It could happen that, if some other event occurs, that would result in a new reconciliation, **before** the update that's been made to our resource status has the chance to be propagated first to the cluster and then back to the informer cache, that the resource in the informer cache does **not** contain the latest version as modified by the reconciler. This would result in a new reconciliation where the generated identifier would be @@ -59,10 +59,8 @@ The trick is to intercept the resource that the reconciler updated and cache tha of the informer's cache. Subsequently, if the reconciler needs to read the resource, the SDK will first check if it is in the overlay cache and read it from there if present, otherwise read it from the informer's cache. If the informer receives an event with a fresh resource, we always remove the resource from the overlay cache, since that is a more -recent resource. But this **works only** if the reconciler updates the resource using **with optimistic locking**, which -is handled for you by `PrimaryUpdateAndCacheUtils` provided you pass it a "fresh" (i.e. a version of the resource that -only contains the fields you care about being updated) copy of the resource since Server-Side Apply will be used -underneath. If the update fails on conflict, because the resource has already been updated on the cluster before we got +recent resource. 
But this **works only** if the reconciler updates the resource using **optimistic locking**. +If the update fails on conflict, because the resource has already been updated on the cluster before we got the chance to get our update in, we simply wait and poll the informer cache until the new resource version from the server appears in the informer's cache, and then try to apply our updates to the resource again using the updated version from the server, again with optimistic @@ -85,13 +83,10 @@ The following diagram sums up the process: ```mermaid flowchart TD - A["Update Resource with Lock"] --> B{"Successful?"} - B -- Fails on conflict --> D{"Resource updated from cluster?"} - D -- " No: poll until updated " --> D - D -- Yes --> n4["Apply desired changes on the resource again"] + A["Update Resource with Lock"] --> B{"Is Successful"} + B -- Fails on conflict --> D["Poll the Informer cache until resource updated"] + D --> A B -- Yes --> n2{"Original resource still in informer cache?"} n2 -- Yes --> C["Cache the resource in overlay cache"] n2 -- No --> n3["Informer cache already contains up-to-date version, do not use overlay cache"] - n4 --> A - ```
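As a rough illustration of the overlay cache flow that the final version of the post describes, here is a minimal, single-threaded Java sketch. It is not the JOSDK implementation: `OverlayCacheSketch`, `Resource`, and every method name below are made up for this example, and plain maps stand in for the API server and the informer. It only aims to show the three rules from the text: reads prefer the overlay, a successful optimistic-locking update populates the overlay only while the informer still holds the original version, and any informer event evicts the overlay entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Simplified model of the overlay cache idea; all names are hypothetical, not JOSDK API. */
final class OverlayCacheSketch {

  /** Stand-in for a custom resource: name, resourceVersion and a single status value. */
  static final class Resource {
    final String name;
    final long version;
    final String status;

    Resource(String name, long version, String status) {
      this.name = name;
      this.version = version;
      this.status = status;
    }
  }

  private final Map<String, Resource> server = new HashMap<>();        // stands in for the API server
  private final Map<String, Resource> informerCache = new HashMap<>(); // eventually consistent view
  private final Map<String, Resource> overlayCache = new HashMap<>();  // holds our own latest update

  /** Creates the resource on the "server" and lets the informer see it right away. */
  Resource create(String name) {
    Resource created = new Resource(name, 1, null);
    server.put(name, created);
    informerCache.put(name, created);
    return created;
  }

  /** Reads prefer the overlay cache and fall back to the informer cache. */
  Resource read(String name) {
    Resource overlaid = overlayCache.get(name);
    return overlaid != null ? overlaid : informerCache.get(name);
  }

  /**
   * Status update with optimistic locking: succeeds only if base is still the version
   * stored on the server. Returns empty on conflict, in which case the caller is
   * expected to wait for the informer to catch up and retry.
   */
  Optional<Resource> updateStatus(Resource base, String newStatus) {
    Resource current = server.get(base.name);
    if (current == null || current.version != base.version) {
      return Optional.empty();
    }
    Resource updated = new Resource(base.name, current.version + 1, newStatus);
    server.put(base.name, updated);
    Resource cached = informerCache.get(base.name);
    // Use the overlay only while the informer still holds the version we started from.
    if (cached != null && cached.version == base.version) {
      overlayCache.put(base.name, updated);
    }
    return Optional.of(updated);
  }

  /** An informer event always wins: store it and evict the overlay entry. */
  void onInformerEvent(Resource fresh) {
    informerCache.put(fresh.name, fresh);
    overlayCache.remove(fresh.name);
  }

  public static void main(String[] args) {
    OverlayCacheSketch sketch = new OverlayCacheSketch();
    Resource original = sketch.create("my-cr");

    Resource updated = sketch.updateStatus(original, "generated-id-42").orElseThrow(
        () -> new IllegalStateException("unexpected conflict"));
    // The informer cache is still stale here, but the read goes through the overlay.
    System.out.println(sketch.read("my-cr").status);

    // Delivering the event evicts the overlay entry; the informer now serves the same data.
    sketch.onInformerEvent(updated);
    System.out.println(sketch.read("my-cr").status);
  }
}
```

In this model a second `updateStatus` call made with the stale `original` returns an empty `Optional`, which corresponds to the "fails on conflict" branch of the diagram: the caller would wait for the informer cache to catch up and retry against the newer version.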