In the era of microservices, it matters how we secure remote communication between applications and how a client reacts to network or application failures. If our application depends on many external services and communicates with them remotely, it should be protected against network problems and against those services becoming unavailable. One of the basic solutions is the CircuitBreaker pattern; a more detailed description is available on Martin Fowler’s website: http://martinfowler.com/bliki/CircuitBreaker.html.
The main task of a CircuitBreaker is to stop communication with a service when failures occur in that service. The default state of the circuit is CLOSED, which means that everything works and requests pass through. When the failure rate reaches a certain level, for example when a 10-second window contains at least 20 HTTP requests to a remote service and 50% of them end with status code 404/503 or a timeout, the CircuitBreaker switches to the OPEN state and stops further requests. While the circuit is OPEN, every invocation of the failed service receives a fallback result, which we need to implement. For a remote service whose endpoint returns a list of entities, the fallback can simply return an empty list. While the circuit is OPEN, its state periodically changes, once per configured interval, to HALF-OPEN. In the HALF-OPEN state only one request is passed to the failed service. If that request returns a successful response, the service is working again and the circuit goes back to CLOSED. If the response is still incorrect, the circuit stays OPEN.
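To make the CLOSED, OPEN and HALF-OPEN transitions concrete, here is a minimal sketch of the full state cycle, including the single trial request in HALF-OPEN. It is a hypothetical illustration, not the implementation discussed later. To keep it short, it swaps the rolling error-rate rule for a simpler trip condition (a number of consecutive failures), and the names StateMachineBreaker, failureThreshold and retryTimeoutMillis are assumptions. The clock is injected so that the timing can be tested without sleeping.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle.
// Simplification: the circuit opens after `failureThreshold` consecutive failures
// instead of an error rate measured over a rolling window.
final class StateMachineBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures that open the circuit
    private final long retryTimeoutMillis; // time spent OPEN before one trial request
    private final LongSupplier clock;      // injectable time source in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;
    private boolean trialInFlight = false; // true while the single HALF_OPEN request runs

    StateMachineBreaker(int failureThreshold, long retryTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.retryTimeoutMillis = retryTimeoutMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // OPEN becomes HALF_OPEN once the retry timeout has elapsed.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= retryTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Callable<T> remote, Supplier<T> fallback) {
        boolean rejected;
        synchronized (this) {
            State current = state();
            // OPEN rejects everything; HALF_OPEN rejects all but one trial request.
            rejected = current == State.OPEN || (current == State.HALF_OPEN && trialInFlight);
            if (current == State.HALF_OPEN && !rejected) {
                trialInFlight = true; // this caller becomes the single trial request
            }
        }
        if (rejected) {
            return fallback.get(); // fail fast without touching the remote service
        }
        try {
            T result = remote.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        if (state == State.HALF_OPEN) { // trial succeeded: the service is back
            trialInFlight = false;
            state = State.CLOSED;
        }
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN) { // trial failed: back to OPEN
            trialInFlight = false;
            open();
        } else if (state == State.CLOSED && consecutiveFailures >= failureThreshold) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
    }
}
```

In use, a caller passes the remote call together with a fallback such as `Collections::emptyList`, so an open circuit returns the empty list immediately instead of waiting for a timeout.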
CircuitBreaker advantages:
- a fallback protects the user from failures
- failing fast instead of waiting for timeouts, which gives control over network latency
- prevention of cascading failures in a distributed environment
- failure isolation
Implementing a simple CircuitBreaker
I want to share my implementation of a CircuitBreaker, which is based on Netflix's Hystrix library. It is not a full CircuitBreaker, because it lacks the HALF-OPEN state. My implementation also relies on Hystrix's implementation of the sliding window algorithm for measuring circuit health. Pulling in the whole Netflix library may be a bit of overhead, but this is just an example. If I find some spare time, I will implement a simpler solution to get rid of the Hystrix dependency and reduce coupling. Even so, this is a simple, thread-safe and technology-agnostic example that you can use in standard Java applications as well as in Java EE applications.
The constructor takes closeCircuitDelay and closeCircuitDelayTimeUnit, which define how long after opening the circuit should be closed again. We also pass a HealthConfiguration instance that stores the configuration for the circuitBreakerHealth instance. We can configure conditions such as:
- errorThresholdPercentage - the error percentage threshold
- requestVolumeThreshold - the minimum number of requests in the specified time window
- rollingStatisticalWindowInMilliseconds - the length of the time window in milliseconds
- rollingStatisticalWindowBuckets - the number of buckets the statistical window is divided into
- healthSnapshotIntervalInMilliseconds - the interval between health snapshots in milliseconds
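A configuration object like HealthConfiguration can be sketched as a small immutable holder that validates its values once. The field names below follow the list above; everything else is an assumption: the validation rules, the example() factory, and the choice of 10 buckets and a 500 ms snapshot interval (only the 20-request, 50% and 10-second values come from the introduction's example). Requiring the window to divide evenly into buckets keeps every bucket the same length.

```java
// Hypothetical holder for the settings listed above. Values are validated once and the
// object is immutable, so it can be shared safely between threads.
final class HealthConfiguration {
    final int errorThresholdPercentage;               // open when the error % reaches this
    final int requestVolumeThreshold;                 // minimum requests before judging health
    final int rollingStatisticalWindowInMilliseconds; // length of the rolling window
    final int rollingStatisticalWindowBuckets;        // number of buckets in the window
    final int healthSnapshotIntervalInMilliseconds;   // how often a new snapshot is taken

    HealthConfiguration(int errorThresholdPercentage, int requestVolumeThreshold,
                        int rollingStatisticalWindowInMilliseconds,
                        int rollingStatisticalWindowBuckets,
                        int healthSnapshotIntervalInMilliseconds) {
        if (errorThresholdPercentage < 0 || errorThresholdPercentage > 100) {
            throw new IllegalArgumentException("errorThresholdPercentage must be within 0..100");
        }
        if (requestVolumeThreshold < 1) {
            throw new IllegalArgumentException("requestVolumeThreshold must be at least 1");
        }
        if (healthSnapshotIntervalInMilliseconds < 0) {
            throw new IllegalArgumentException("healthSnapshotIntervalInMilliseconds must not be negative");
        }
        if (rollingStatisticalWindowBuckets < 1
                || rollingStatisticalWindowInMilliseconds < rollingStatisticalWindowBuckets
                || rollingStatisticalWindowInMilliseconds % rollingStatisticalWindowBuckets != 0) {
            // equal-sized buckets keep the rolling window arithmetic simple
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.rollingStatisticalWindowInMilliseconds = rollingStatisticalWindowInMilliseconds;
        this.rollingStatisticalWindowBuckets = rollingStatisticalWindowBuckets;
        this.healthSnapshotIntervalInMilliseconds = healthSnapshotIntervalInMilliseconds;
    }

    // 50% errors, 20 requests, 10-second window as in the introduction;
    // 10 buckets and a 500 ms snapshot interval are illustrative assumptions.
    static HealthConfiguration example() {
        return new HealthConfiguration(50, 20, 10_000, 10, 500);
    }

    int bucketSizeInMilliseconds() {
        return rollingStatisticalWindowInMilliseconds / rollingStatisticalWindowBuckets;
    }
}
```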
The AtomicBoolean available variable is used as an availability flag that defines whether the circuit is open or closed. We use AtomicBoolean for thread safety, because we assume the code will run in a multi-threaded environment.
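The thread-safety argument can be illustrated with a small experiment, a hypothetical demo rather than part of the CircuitBreaker class. Many threads race to open the circuit at the same moment; because compareAndSet(true, false) is atomic, exactly one of them wins. With a plain boolean and a separate check-then-set, several threads could pass the check and each run the "open" logic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: when many threads decide to open the circuit at once,
// compareAndSet(true, false) succeeds for exactly one of them, so one-time work
// (such as scheduling the close task) runs only once.
final class AvailabilityFlagDemo {
    static int countOpeners(int threads) {
        AtomicBoolean available = new AtomicBoolean(true); // true = circuit CLOSED
        AtomicInteger openers = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (available.compareAndSet(true, false)) {
                    openers.incrementAndGet(); // only the winning thread gets here
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return openers.get();
    }
}
```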
The most important method in our CircuitBreaker class is doCall, which is responsible for invoking the call() method of a Callable instance. When circuit availability is set to false, every invocation of doCall ends up throwing a CircuitBreakerCallableFailure exception. If the invocation succeeds, the markSuccess method of the CircuitHealth class is called. Otherwise markFailure, which counts errors, is called together with onError. If the circuit is closed and circuit health is poor, the onError method opens the circuit.
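The control flow of doCall described above can be sketched as follows. This is a hypothetical reconstruction, not the author's original code: CircuitBreakerCallableFailure is assumed to be unchecked, a failed call is assumed to be rethrown wrapped in it, and FailureCountHealth is a made-up stand-in for the Hystrix-based health that simply opens after a fixed number of failures.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Assumed unchecked; signals an open circuit or a failed call.
class CircuitBreakerCallableFailure extends RuntimeException {
    CircuitBreakerCallableFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// Health-tracking contract assumed for this sketch.
interface CircuitHealth {
    void markSuccess();
    void markFailure();
    void reset();
    boolean shouldOpen();
}

// Hypothetical stand-in for the Hystrix-based health: opens after maxFailures failures.
class FailureCountHealth implements CircuitHealth {
    private final int maxFailures;
    private final AtomicInteger failures = new AtomicInteger();

    FailureCountHealth(int maxFailures) { this.maxFailures = maxFailures; }

    public void markSuccess() { /* this simple stand-in only counts failures */ }
    public void markFailure() { failures.incrementAndGet(); }
    public void reset() { failures.set(0); }
    public boolean shouldOpen() { return failures.get() >= maxFailures; }
}

class SimpleCircuitBreaker {
    private final AtomicBoolean available = new AtomicBoolean(true); // true = CLOSED
    private final CircuitHealth health;

    SimpleCircuitBreaker(CircuitHealth health) { this.health = health; }

    <T> T doCall(Callable<T> callable) {
        if (!available.get()) {
            // OPEN: fail fast without contacting the remote service
            throw new CircuitBreakerCallableFailure("circuit is open", null);
        }
        try {
            T result = callable.call();
            health.markSuccess();
            return result;
        } catch (Exception e) {
            health.markFailure();
            onError();
            throw new CircuitBreakerCallableFailure("call failed", e);
        }
    }

    // Opens the circuit if health is poor; compareAndSet returns true only for the
    // thread that actually opened it (the place to schedule the close task).
    private void onError() {
        if (health.shouldOpen()) {
            available.compareAndSet(true, false);
        }
    }

    boolean isAvailable() { return available.get(); }
}
```

A caller would catch CircuitBreakerCallableFailure and return its fallback, for example an empty list, which keeps the fallback decision with the client that knows what a sensible default is.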
Probably the easiest way to allow a retry, that is, to close the circuit again, is to run a scheduled task right after the circuit opens. We can use a ScheduledExecutorService with a configurable delay, such as 5 minutes. The purpose of this task is to reset the state and the success/failure counters.
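A minimal sketch of that idea, assuming a single daemon scheduler thread and a plain AtomicInteger as a stand-in for the health counters (the class name ScheduledCloseBreaker and its methods are hypothetical): only the thread that flips the flag schedules the one-shot close task, and the task resets the counters before letting traffic through again.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: opening the circuit schedules a one-shot task that resets the
// counters and closes the circuit after closeCircuitDelay.
final class ScheduledCloseBreaker {
    private final AtomicBoolean available = new AtomicBoolean(true);
    private final AtomicInteger failures = new AtomicInteger(); // stand-in for health counters
    private final ScheduledExecutorService scheduler;
    private final long closeCircuitDelay;
    private final TimeUnit closeCircuitDelayTimeUnit;

    ScheduledCloseBreaker(long closeCircuitDelay, TimeUnit closeCircuitDelayTimeUnit) {
        this.closeCircuitDelay = closeCircuitDelay;
        this.closeCircuitDelayTimeUnit = closeCircuitDelayTimeUnit;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "circuit-close");
            t.setDaemon(true); // don't keep the JVM alive just for this timer
            return t;
        });
    }

    void markFailure() { failures.incrementAndGet(); }

    void open() {
        // Only the thread that actually opens the circuit schedules the close task.
        if (available.compareAndSet(true, false)) {
            scheduler.schedule(this::close, closeCircuitDelay, closeCircuitDelayTimeUnit);
        }
    }

    private void close() {
        failures.set(0);     // reset the counters first...
        available.set(true); // ...then let traffic through again
    }

    boolean isAvailable() { return available.get(); }

    int failureCount() { return failures.get(); }

    void shutdown() { scheduler.shutdownNow(); }
}
```

Because this closes the circuit outright instead of passing a single trial request, it behaves like the author's variant without a HALF-OPEN state: if the remote service is still down, the circuit has to trip again from fresh counters.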
The CircuitHealth interface provides methods for counting successes and failures, resetting the counters and checking whether the circuit should open. The HystrixHealth implementation is based on the Hystrix solution. As a counter we use a HystrixRollingNumber instance. We also need to store lastHealthCountsSnapshot, which is the last time a HealthCounts instance was created.
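As a rough idea of what a Hystrix-free counter might look like, here is a simplified bucketed rolling counter. It is a hypothetical sketch, not HystrixRollingNumber and not its API. The window is split into equal buckets stored in a ring; a write into a slot that still holds an older bucket clears it first, and reads sum only the current bucket and the ones before it. It uses a plain synchronized lock for clarity rather than the lock-free approach a production counter would likely use.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical, simplified bucketed rolling counter (not HystrixRollingNumber's API).
final class RollingCounter {
    private final long bucketSizeMillis;
    private final long[] bucketStart; // start time of the bucket held in each slot
    private final long[] successes;
    private final long[] failures;
    private final LongSupplier clock; // injectable time source in milliseconds

    RollingCounter(long windowMillis, int buckets, LongSupplier clock) {
        if (buckets < 1 || windowMillis < buckets || windowMillis % buckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.bucketSizeMillis = windowMillis / buckets;
        this.bucketStart = new long[buckets];
        this.successes = new long[buckets];
        this.failures = new long[buckets];
        this.clock = clock;
        Arrays.fill(bucketStart, Long.MIN_VALUE); // every slot starts empty
    }

    synchronized void markSuccess() { successes[currentSlot()]++; }

    synchronized void markFailure() { failures[currentSlot()]++; }

    synchronized void reset() {
        Arrays.fill(bucketStart, Long.MIN_VALUE);
        Arrays.fill(successes, 0L);
        Arrays.fill(failures, 0L);
    }

    synchronized long totalRequests() {
        long now = clock.getAsLong();
        return sumWithinWindow(successes, now) + sumWithinWindow(failures, now);
    }

    synchronized long errorCount() {
        return sumWithinWindow(failures, clock.getAsLong());
    }

    // Returns the slot for the current bucket, clearing it if it holds stale counts.
    private int currentSlot() {
        long start = alignToBucket(clock.getAsLong());
        int slot = (int) Math.floorMod(start / bucketSizeMillis, (long) bucketStart.length);
        if (bucketStart[slot] != start) {
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        return slot;
    }

    // Sums the current bucket and the buckets before it, n buckets in total.
    private long sumWithinWindow(long[] counts, long now) {
        long currentStart = alignToBucket(now);
        long oldestStart = currentStart - (bucketStart.length - 1) * bucketSizeMillis;
        long sum = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                sum += counts[i];
            }
        }
        return sum;
    }

    private long alignToBucket(long timeMillis) {
        return timeMillis - Math.floorMod(timeMillis, bucketSizeMillis);
    }
}
```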
The most important part of CircuitHealth is the shouldOpen method. The circuit should open when the total number of requests within the configured time window exceeds the predefined boundary and the error rate also exceeds the threshold.
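The decision rule can be sketched as a pure function. In this hypothetical sketch both boundaries are treated as inclusive (at least requestVolumeThreshold requests, and an error percentage at or above errorThresholdPercentage), matching the "minimum 20 requests" and "50%" wording of the introduction. The percentage is truncated to an integer, and an empty window never opens the circuit.

```java
// Hypothetical sketch of the shouldOpen decision: both conditions must hold.
final class OpenDecision {
    static boolean shouldOpen(long totalRequests, long errorCount,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // not enough traffic in the window to judge health
        }
        // integer percentage, truncated toward zero
        int errorPercentage = (int) ((double) errorCount / totalRequests * 100);
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the introduction's numbers (20 requests, 50%), 19 failed requests out of 19 keep the circuit closed because the volume is too low, 10 failures out of 20 open it, and 12 failures out of 40 (30%) keep it closed.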
The immutable class HealthCounts stores the state of circuit health at the time of the snapshot.
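A sketch of such an immutable snapshot, with hypothetical accessor names and a hypothetical plus method: every field is final and computed in the constructor, so a snapshot can be handed to other threads without locking. "Updating" produces a new instance and leaves the old one untouched.

```java
// Hypothetical immutable snapshot of circuit health.
final class HealthCounts {
    private final long totalRequests;
    private final long errorCount;
    private final int errorPercentage;

    HealthCounts(long totalRequests, long errorCount) {
        if (errorCount < 0 || totalRequests < errorCount) {
            throw new IllegalArgumentException("need 0 <= errorCount <= totalRequests");
        }
        this.totalRequests = totalRequests;
        this.errorCount = errorCount;
        this.errorPercentage = totalRequests == 0
                ? 0
                : (int) ((double) errorCount / totalRequests * 100);
    }

    static HealthCounts empty() {
        return new HealthCounts(0, 0);
    }

    // Instead of mutating, return a new snapshot that includes the extra results.
    HealthCounts plus(long successes, long failures) {
        if (successes < 0 || failures < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        return new HealthCounts(totalRequests + successes + failures, errorCount + failures);
    }

    long getTotalRequests() { return totalRequests; }

    long getErrorCount() { return errorCount; }

    int getErrorPercentage() { return errorPercentage; }

    @Override
    public String toString() {
        return "HealthCounts[" + errorCount + "/" + totalRequests + " errors, " + errorPercentage + "%]";
    }
}
```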