Scaling to High Availability

This is Phase 2 of our Consul training series. While single-node setups are convenient for local testing and developer sandboxes, they represent a severe Single Point of Failure (SPOF) in enterprise production environments.

In this tutorial, we will scale our single node to a highly resilient 3-node High-Availability (HA) cluster. We will secure node-to-node communications using a symmetric encryption key, configure the Raft consensus parameters, and join the servers together to elect a cluster leader.

🏗️ Clustered Service Topology

Our high-availability cluster will run across three distinct virtual machine nodes that maintain consensus through Raft:

[Diagram: topology of the three-node Consul server cluster]

Step 1: Generating a Shared Gossip Encryption Key

Consul encrypts node-to-node gossip traffic with a symmetric pre-shared key that every agent in the cluster must hold.

On your primary VM node, generate a cryptographically secure gossip key using the Consul CLI:

consul keygen

This command prints 32 random bytes encoded as a base64 string, for example: dGhpcy1pcy1hLXNlY3VyZS1rZXktZ2VuZXJhdGlvbi0=. Treat this example value as a placeholder and use the key you generated yourself.

Save this key securely! We will insert it into all three server configurations.
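Before copying the key to the other nodes, it can be worth confirming that it really decodes to 32 bytes, since a truncated copy-paste is easy to miss. The following is a minimal sketch, not part of the Consul CLI: the helper name gossip_key_len is our own, and it assumes a base64 utility that accepts --decode.

```shell
#!/bin/sh
# Hypothetical helper: print the number of raw bytes encoded in a base64 key.
gossip_key_len() {
  # printf avoids appending a newline to the key before decoding.
  printf '%s' "$1" | base64 --decode | wc -c | tr -d ' '
}

# Example key from this tutorial; substitute your own `consul keygen` output.
key="dGhpcy1pcy1hLXNlY3VyZS1rZXktZ2VuZXJhdGlvbi0="

if [ "$(gossip_key_len "$key")" -eq 32 ]; then
  echo "key length OK"
else
  echo "unexpected key length" >&2
fi
```

Running the same check on each node after editing the configuration is a cheap way to catch a key that was clipped in transit.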

Step 2: Configuring the Server Cluster Parameters

We will deploy three virtual machines with static IP addresses:

Server 01: 10.0.10.11
Server 02: 10.0.10.12
Server 03: 10.0.10.13

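Create the /etc/consul.d/consul.hcl configuration file on all three nodes with the contents below. Replace the encrypt value with the key you generated in Step 1, and change eth0 if the node's network interface has a different name: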

# /etc/consul.d/consul.hcl
datacenter = "dc-devops-01"
data_dir   = "/opt/consul"

# Enable gossip encryption using our generated key
encrypt = "dGhpcy1pcy1hLXNlY3VyZS1rZXktZ2VuZXJhdGlvbi0="

# Bind settings
bind_addr   = "{{GetInterfaceIP \"eth0\"}}" # Resolves to the IP of the eth0 interface at startup
client_addr = "0.0.0.0"

# Enable UI
ui_config {
  enabled = true
}

# HA Cluster configuration parameters
server           = true
bootstrap_expect = 3
retry_join       = ["10.0.10.11", "10.0.10.12", "10.0.10.13"]

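Note on retry_join: The retry_join parameter tells each agent to keep trying the listed addresses at startup until it joins the cluster, eliminating the need to run manual join commands. Combined with bootstrap_expect = 3, the servers wait until all three have found each other before electing the first leader.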

Step 3: Starting the HA Cluster & Establishing Leadership

Once the configuration files are saved across all three nodes:

1. Set Permissions & Boot Services

Execute these start commands on all three server nodes:

sudo chown -R consul:consul /opt/consul /etc/consul.d
sudo systemctl daemon-reload
sudo systemctl restart consul

2. Monitor Cluster Logs

Tail the system logs on Server 01 to watch the consensus election unfold:

journalctl -u consul.service -f --no-tail

You will see log lines similar to the following (the exact wording varies between Consul versions):

consul[1024]: agent: Join completed. Members joined: 2
consul[1024]: raft: Node at 10.0.10.11:8300 [Candidate] entering Candidate state
consul[1024]: raft: Election won. Node 10.0.10.11:8300 established as Leader!

Once a leader is established, the cluster is fully operational. With three servers, Raft needs a quorum of two, so the cluster keeps working if any single server fails.
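The fault tolerance of a Raft cluster follows from majority arithmetic: a cluster of n voting servers needs floor(n/2) + 1 of them to agree. As a rough sketch (the function names quorum and failure_tolerance are our own, not Consul commands), the calculation looks like this:

```shell
#!/bin/sh
# Votes needed for a Raft majority among n voting servers: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority remains reachable.
failure_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

For our three servers this gives a quorum of 2 and a tolerance of 1 failure, which is why a single node offers no resilience at all.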

Step 4: Verification of Cluster Health

Verify cluster member states from the terminal on any of the active servers:

consul members

You should see all three nodes listed with a healthy status:

Node               Address          Status  Type    Build   Protocol  DC            Partition  Segment
homelab-server-01  10.0.10.11:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
homelab-server-02  10.0.10.12:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
homelab-server-03  10.0.10.13:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
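If you would rather check this from a script than by eye, the member list can be counted with a small filter. This is only a sketch: the helper name count_alive_servers is hypothetical, and it assumes Status and Type are the third and fourth whitespace-separated columns, as in the output shown in this step.

```shell
#!/bin/sh
# Hypothetical helper: count rows whose Status is "alive" and Type is "server"
# in `consul members` output read from standard input.
count_alive_servers() {
  awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# On a live node: consul members | count_alive_servers
# Here the sample output from this step is fed in instead:
count_alive_servers <<'EOF'
Node               Address          Status  Type    Build   Protocol  DC            Partition  Segment
homelab-server-01  10.0.10.11:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
homelab-server-02  10.0.10.12:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
homelab-server-03  10.0.10.13:8301  alive   server  1.16.0  2         dc-devops-01  default    <all>
EOF
```

A result lower than 3 means at least one server has not joined or is no longer reported as alive.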

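You can test high availability by manually stopping Consul on one server, for example Server 02 (sudo systemctl stop consul). Then run consul operator raft list-peers on Server 01 to verify that the cluster remains operational and still has exactly one leader. If the server you stopped was the leader, the two remaining servers elect a new one. Start Consul on Server 02 again afterwards (sudo systemctl start consul) to restore full redundancy.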
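To see which server currently holds leadership, the peer list can be filtered for the leader row. The sketch below rests on assumptions: the helper name raft_leader is our own, the State value is taken to be the fourth whitespace-separated column of the list-peers output (check this against your Consul version), and the IDs in the sample input are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the node name of every Raft peer whose State is "leader".
# Assumes State is the fourth whitespace-separated column of the list-peers output.
raft_leader() {
  awk 'NR > 1 && $4 == "leader" { print $1 }'
}

# On a live server: consul operator raft list-peers | raft_leader
# Illustrative input only (the IDs are invented for this example):
raft_leader <<'EOF'
Node               ID       Address          State     Voter  RaftProtocol
homelab-server-01  id-0001  10.0.10.11:8300  leader    true   3
homelab-server-02  id-0002  10.0.10.12:8300  follower  true   3
homelab-server-03  id-0003  10.0.10.13:8300  follower  true   3
EOF
```

Running this before and after stopping a server shows whether leadership moved. An empty result means the cluster currently has no leader.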

Awesome! You have successfully built a highly resilient, production-grade 3-node HA cluster! When you are ready, proceed to the final phase: Phase 3: Transitioning to Enterprise and Service Mesh to unlock multi-tenancy and zero-trust service mesh integrations.