Monday, October 13, 2014

Building fault tolerant applications with Hystrix

In any distributed system, failures will happen. Remote calls will fail, servers will go down, and database calls will return errors. When these calls fail, it is important that the failures stay isolated and don't cascade throughout the system. With this in mind, Netflix built and open-sourced the Hystrix library. They do a pretty good job of describing what it is:
Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
In particular, Hystrix allows you to easily employ the bulkhead pattern to isolate calls to different 3rd party systems and to use a circuit breaker to prevent too many repeated calls to a failing system. Here's an example of a simple command that does a surprising number of complex things:
  • This call will be executed using a thread pool to isolate it. All Hystrix commands with the group MyCommandGroup will use this pool.
  • This command will use a circuit breaker in case it fails. In this example, returning "Hello world" is not going to fail, but if it were a remote call, it could.
  • If the command fails, it will automatically use the fallback method.
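import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class RemoteCommand extends HystrixCommand<String> {

    public RemoteCommand() {
        // "MyCommandGroup" determines which thread pool to use to execute the commands
        // Use a different group to separate logically different groups of commands
        super(HystrixCommandGroupKey.Factory.asKey("MyCommandGroup"));
    }

    @Override
    protected String run() {
        return "Hello world";
    }

    @Override
    protected String getFallback() {
        return "fallback to me if run() fails";
    }
}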
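To make the circuit breaker idea more concrete, here is a minimal sketch in plain Java with no Hystrix dependency. It is not how Hystrix is implemented: SimpleCircuitBreaker, its constructor parameters and its call() method are hypothetical names made up for this illustration, and it trips on consecutive failures, whereas Hystrix looks at the error percentage over a rolling window. It also leaves out the thread pool isolation entirely.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only; not Hystrix's implementation.
class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before the circuit opens
    private final long sleepWindowMillis; // how long the circuit stays open before a trial call
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;
    private boolean open = false;

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Runs the primary call unless the circuit is open.
    // Returns the fallback value when the call fails or is short-circuited.
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (open && System.currentTimeMillis() - openedAtMillis < sleepWindowMillis) {
            return fallback.get(); // short-circuit: leave the failing system alone
        }
        try {
            T result = primary.get();
            consecutiveFailures = 0; // a success closes the circuit again
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                open = true; // too many failures: (re)open the circuit
                openedAtMillis = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

With a threshold of 2, two failing calls in a row open the circuit, and further calls return the fallback without invoking the primary supplier until the sleep window has passed. The sketch holds a lock while the primary call runs, which keeps it short but would be a bottleneck in real code.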
The properties of the command (timeouts, circuit breaker thresholds, thread pool sizes, etc.) are highly configurable and can be tuned to your needs.
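As a rough idea of what that tuning can look like, here is a sketch of a properties fragment. Hystrix reads its settings through Netflix's Archaius library, so where this fragment lives depends on how your application is set up. The keys assume the command uses its default command key (the class name, RemoteCommand) and default thread pool key (the group name, MyCommandGroup); the values are only illustrative, not recommendations.

```properties
# Illustrative values only; tune them for your own workload.
# How long run() may take before the command times out and uses getFallback()
hystrix.command.RemoteCommand.execution.isolation.thread.timeoutInMilliseconds=500
# Minimum number of requests in the rolling window before the breaker can trip
hystrix.command.RemoteCommand.circuitBreaker.requestVolumeThreshold=20
# Error percentage at or above which the breaker opens
hystrix.command.RemoteCommand.circuitBreaker.errorThresholdPercentage=50
# How long the breaker stays open before letting a trial request through
hystrix.command.RemoteCommand.circuitBreaker.sleepWindowInMilliseconds=5000
# Size of the thread pool shared by commands in MyCommandGroup
hystrix.threadpool.MyCommandGroup.coreSize=10
```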

If you need a battle-tested library for implementing these patterns, check out Hystrix.