
Commit 19378f3

Update README.md

retry/README.md
## Intent

Transparently retry certain operations that involve communication with external resources,
particularly over the network, isolating calling code from the retry implementation details.

## Explanation

The Retry pattern consists of retrying operations on remote resources over the network a set number
of times. It closely depends on both business and technical requirements: How much time will the
business allow the end user to wait while the operation finishes? What are the performance
characteristics of the remote resource during peak loads, as well as of our application as more
threads are waiting for the remote resource's availability? Among the errors returned by the remote
service, which can be safely ignored in order to retry? Is the operation
[idempotent](https://en.wikipedia.org/wiki/Idempotence)?

Another concern is the impact on the calling code of implementing the retry mechanism. The retry
mechanics should ideally be completely transparent to the calling code (the service interface
remains unaltered). There are two general approaches to this problem: from an enterprise
architecture standpoint (strategic), and from a shared library standpoint (tactical).

From a strategic point of view, this would be solved by having requests redirected to a separate
intermediary system, traditionally an [ESB](https://en.wikipedia.org/wiki/Enterprise_service_bus),
but more recently a [Service Mesh](https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a).

From a tactical point of view, this would be solved by reusing shared libraries like
[Hystrix](https://github.com/Netflix/Hystrix) (please note that Hystrix is a complete implementation
of the [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/) pattern, of
which the Retry pattern can be considered a subset). This is the type of solution showcased in the
simple example that accompanies this `README.md`.

Real world example

> Our application uses a service providing customer information. Once in a while the service seems
> to be flaky and can return errors or sometimes it just times out. To circumvent these problems we
> apply the retry pattern.

In plain words

> The Retry pattern transparently retries failed operations over the network.

[Microsoft documentation](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry) says

> Enable an application to handle transient failures when it tries to connect to a service or
> network resource, by transparently retrying a failed operation. This can improve the stability of
> the application.

**Programmatic Example**

In our hypothetical application, we have a generic interface for all operations on remote
interfaces.

```java
public interface BusinessOperation<T> {
  T perform() throws BusinessException;
}
```
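
The `FindCustomer` body is elided in this diff. Here is a minimal sketch consistent with the
behavior described below, assuming the constructor takes the customer's ID followed by a varargs
list of errors to throw first (the field names and the `ArrayDeque` are illustrative choices):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public final class FindCustomer implements BusinessOperation<String> {

  private final String customerId;
  private final Deque<BusinessException> errors;

  public FindCustomer(String customerId, BusinessException... errors) {
    this.customerId = customerId;
    this.errors = new ArrayDeque<>(List.of(errors));
  }

  @Override
  public String perform() throws BusinessException {
    // Throw the queued errors first, simulating a flaky remote service,
    // then consistently return the customer's ID.
    if (!this.errors.isEmpty()) {
      throw this.errors.pop();
    }
    return this.customerId;
  }
}
```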

Our `FindCustomer` implementation can be configured to throw `BusinessException`s before returning
the customer's ID, thereby simulating a flaky service that intermittently fails. Some exceptions,
like the `CustomerNotFoundException`, are deemed recoverable after some hypothetical analysis
because the root cause of the error stems from "some database locking issue". However, the
`DatabaseNotAvailableException` is considered a definite showstopper: the application should not
attempt to recover from this error.

We can model a recoverable scenario by instantiating `FindCustomer` like this:

```java
final var op = new FindCustomer(
    "12345",                                             // the customer's ID, returned on success
    new CustomerNotFoundException("not found"),          // recoverable error #1
    new CustomerNotFoundException("still not found"),    // recoverable error #2
    new CustomerNotFoundException("don't give up yet!")  // recoverable error #3
);
```

In this configuration, `FindCustomer` will throw `CustomerNotFoundException` three times, after
which it will consistently return the customer's ID (`12345`).

In our hypothetical scenario, our analysts indicate that this operation typically fails 2-4 times
for a given input during peak hours, and that each worker thread in the database subsystem typically
needs 50ms to "recover from an error". Applying these policies would yield something like this:

```java
final var op = new Retry<>(
    new FindCustomer(
        "12345",
        new CustomerNotFoundException("not found"),
        new CustomerNotFoundException("still not found"),
        new CustomerNotFoundException("don't give up yet!")
    ),
    5,    // maximum number of attempts
    100,  // delay between attempts, in milliseconds
    e -> CustomerNotFoundException.class.isAssignableFrom(e.getClass())  // errors to ignore and retry on
);
```
119118

120-
Executing `op` *once* would automatically trigger at most 5 retry attempts,
121-
with a 100 millisecond delay between attempts, ignoring any
122-
`CustomerNotFoundException` thrown while trying. In this particular scenario,
123-
due to the configuration for `FindCustomer`, there will be 1 initial attempt
119+
Executing `op` once would automatically trigger at most 5 retry attempts, with a 100 millisecond
120+
delay between attempts, ignoring any `CustomerNotFoundException` thrown while trying. In this
121+
particular scenario, due to the configuration for `FindCustomer`, there will be 1 initial attempt
124122
and 3 additional retries before finally returning the desired result `12345`.
125123
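
Since the whole point is keeping the retries transparent, a natural reading is that `Retry<T>`
itself implements `BusinessOperation<T>`; under that assumption, the calling code looks identical
to invoking the raw operation:

```java
// Minimal usage sketch, assuming Retry<T> implements BusinessOperation<T>.
try {
  final String customerId = op.perform();  // retries internally, eventually returns "12345"
  System.out.println("Found customer " + customerId);
} catch (BusinessException e) {
  // Raised once all attempts are exhausted, or on a non-ignored (fatal) error.
  System.err.println("Lookup failed: " + e.getMessage());
}
```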

If our `FindCustomer` operation were instead to throw a fatal `DatabaseNotAvailableException`, which
we were instructed not to ignore and, more importantly, did not instruct our `Retry` to ignore, then
the operation would have failed immediately upon receiving the error, no matter how many attempts
were left.
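
For contrast, a hypothetical fatal configuration might look like the following (the constructor
arguments mirror the recoverable example above and are illustrative):

```java
// Hypothetical fatal scenario: the first error is not matched by the ignore
// predicate, so it propagates immediately and no retries are attempted.
final var fatalOp = new Retry<>(
    new FindCustomer("12345", new DatabaseNotAvailableException("database is down")),
    5,    // attempts that will never be used
    100,  // delay between attempts, in milliseconds
    e -> CustomerNotFoundException.class.isAssignableFrom(e.getClass())
);
// fatalOp.perform() throws DatabaseNotAvailableException on the first call.
```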

## Class diagram

![alt text](./etc/retry.png "Retry")

## Applicability

Whenever an application needs to communicate with an external resource, particularly in a cloud
environment, and if the business requirements allow it.

## Consequences

**Pros:**

* Resiliency
