Creating a RabbitMQ cluster is tricky: it is easy to do manually, but hard to automate.
TL;DR: Use this Terraform configuration to create a RabbitMQ cluster in less than 5 minutes.
The simplest cluster requires 2 nodes and a load balancer. In AWS we are going to use ELB as the load balancer and put the nodes in an Auto Scaling group, so that if a node goes down (or becomes unhealthy) it is replaced by a new one.
Our setup will be:
Using Terraform, we can create the Launch Configuration, the Auto Scaling group and the ELB. This is our ELB configuration:
The ELB health check marks a node as unhealthy after 5 minutes of failed checks (10 checks, one every 30 seconds), so that the node gets replaced. The ELB listens on port 5672 (AMQP) and on port 80, which is proxied to 15672 (the HTTP management interface).
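The health-check timing can be checked with a little arithmetic. The sketch below is not taken from the Terraform configuration: it expresses the same interval and threshold with the AWS CLI for a Classic Load Balancer. The TCP:5672 target, the timeout, the healthy threshold and all function names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: the health-check timing expressed with the AWS CLI.
# The TCP:5672 target, Timeout and HealthyThreshold are assumed values.

INTERVAL=30             # seconds between checks
UNHEALTHY_THRESHOLD=10  # consecutive failures before a node is marked unhealthy

# Seconds of continuous failure before a node is marked unhealthy.
seconds_until_unhealthy() {
  echo $(( INTERVAL * UNHEALTHY_THRESHOLD ))
}

# Build the value passed to `aws elb configure-health-check --health-check`.
health_check_spec() {
  echo "Target=TCP:5672,Interval=$INTERVAL,Timeout=5,UnhealthyThreshold=$UNHEALTHY_THRESHOLD,HealthyThreshold=2"
}

# Apply the health check to the Classic Load Balancer with the given name.
configure_health_check() {
  aws elb configure-health-check \
    --load-balancer-name "$1" \
    --health-check "$(health_check_spec)"
}
```

Calling seconds_until_unhealthy prints 300, which is the 5 minutes mentioned above.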
What about the nodes? We use cloud-init to initialize each node, and in it we configure RabbitMQ to run in Docker.
After RabbitMQ is running, the node has to join the cluster. To do that, we call rabbitmqctl join_cluster <node> for each of the nodes.
To find out which other nodes are in the cluster, we prepared a bash script that queries the nodes in our Auto Scaling group:
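A minimal sketch of such a discovery script could look like the one below. It is an illustration, not the script from the Terraform configuration: the function names are hypothetical, and it assumes that the AWS CLI and curl are installed, that AWS_DEFAULT_REGION is set, and that the instance metadata service answers without a session token (IMDSv1). It relies only on ec2:DescribeInstances, reading the group name from the aws:autoscaling:groupName tag that AWS puts on instances launched by an Auto Scaling group.

```shell
#!/usr/bin/env bash
# Sketch only: discover the private IPs of the other nodes in this
# instance's Auto Scaling group. Function names are hypothetical.

METADATA_URL="http://169.254.169.254/latest/meta-data"

# Read one field (e.g. instance-id, local-ipv4) from the instance metadata service.
# Assumes IMDSv1; IMDSv2 would need a session token first.
metadata() {
  curl -s "$METADATA_URL/$1"
}

# Print the Auto Scaling group name of the given instance, taken from its tags.
asg_name() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query "Reservations[0].Instances[0].Tags[?Key=='aws:autoscaling:groupName'].Value | [0]" \
    --output text
}

# Print the private IPs of all running instances in the given group, one per line.
asg_private_ips() {
  aws ec2 describe-instances \
    --filters "Name=tag:aws:autoscaling:groupName,Values=$1" \
              "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].PrivateIpAddress" \
    --output text | tr '\t' '\n'
}

# Pass through every line from stdin except the one equal to our own IP.
without_self() {
  grep -v -x -F "$1" || true
}

# Print the private IPs of the other nodes in our group.
discover_peers() {
  local instance_id self_ip group
  instance_id=$(metadata instance-id)
  self_ip=$(metadata local-ipv4)
  group=$(asg_name "$instance_id")
  asg_private_ips "$group" | without_self "$self_ip"
}
```

Running discover_peers on a node prints one private IP per line for every other running node in the group, and nothing if the node is alone.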
And then a script to join these nodes:
The tricky part here is that to join a cluster, you have to stop the node first, so there is a chance that another node is stopped at the same time. To mitigate this problem, we sleep for a random number of seconds before stopping the server. Also, in case of errors, we perform a sane number of retries.
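The random delay and the retries can be sketched as follows. This is an illustration under stated assumptions, not the original script: the function names, the 30 second upper bound and the 5 attempts are made up, and rabbitmqctl is assumed to be callable directly (with RabbitMQ in Docker it would typically be wrapped in docker exec).

```shell
#!/usr/bin/env bash
# Sketch only: join this RabbitMQ node to a cluster with a random delay and retries.
# Peer node names are passed as arguments, e.g. rabbit@ip-10-0-0-2 (assumed naming).

MAX_SLEEP=${MAX_SLEEP:-30}     # upper bound for the random delay in seconds (assumed)
MAX_RETRIES=${MAX_RETRIES:-5}  # number of attempts before giving up (assumed)

# Sleep for a random number of seconds below MAX_SLEEP (RANDOM is a bash feature).
random_sleep() {
  sleep $(( RANDOM % MAX_SLEEP ))
}

# Stop the RabbitMQ application, try to join the given node, start it again.
join_node() {
  rabbitmqctl stop_app &&
    rabbitmqctl join_cluster "$1"
  local status=$?
  rabbitmqctl start_app   # always start the app again, joined or not
  return "$status"
}

# Try the peers in order until one join succeeds, repeating up to MAX_RETRIES times.
join_cluster_with_retries() {
  local attempt=1 node
  while [ "$attempt" -le "$MAX_RETRIES" ]; do
    random_sleep
    for node in "$@"; do
      if join_node "$node"; then
        return 0
      fi
    done
    attempt=$((attempt + 1))
  done
  return 1
}
```

The sketch stops at the first successful join, because joining any member makes the node part of that member's cluster. Calling join_cluster for every peer, as described above, would only need the early return removed.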
The last thing that requires explanation is querying the nodes in a given Auto Scaling group. To be able to do so, you need to associate the nodes with an IAM role that allows ec2:DescribeInstances. This can be done by our Terraform configuration automatically:
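For reference, the permission itself is small. The sketch below is not the Terraform resource: it shows the equivalent policy document and how it could be attached with the AWS CLI. The function names and the policy name are hypothetical, and the role is assumed to exist already. ec2:DescribeInstances does not support resource-level permissions, hence the "*" resource.

```shell
#!/usr/bin/env bash
# Sketch only: the IAM policy document that allows ec2:DescribeInstances.
# Function names and the policy name are hypothetical.

# Print the policy document as JSON.
describe_instances_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
EOF
}

# Attach the policy inline to the existing role with the given name.
attach_policy() {
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name rabbitmq-describe-instances \
    --policy-document "$(describe_instances_policy)"
}
```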
Using this Terraform configuration, we have successfully deployed many RabbitMQ clusters with up to 4 nodes.
Leave a comment if you find this useful, or ask a question if you run into trouble. Cheers!