Clustering

Learn how to create and manage multi-node dqlite clusters for high availability and fault tolerance.

Overview

A dqlite cluster provides:

  • High Availability: Operations continue if a minority of nodes fails
  • Automatic Failover: New leader elected automatically
  • Data Replication: All writes replicated to all nodes via Raft
  • Strong Consistency: Linearizable reads and writes
  • Split-Brain Protection: Requires quorum (majority) for operations

Cluster Basics

Quorum

A cluster needs quorum (strict majority) to operate:

Total Nodes    Quorum Required    Tolerated Failures
1              1                  0
2              2                  0 (not recommended)
3              2                  1
5              3                  2
7              4                  3

Recommendation: Use 3 or 5 nodes for production. Odd numbers work best.
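
The quorum rule is simple integer arithmetic. The sketch below reproduces the table above (the helper names are illustrative, not part of dqlitepy):

def quorum(total_nodes: int) -> int:
    """Strict majority of nodes required for the cluster to operate."""
    return total_nodes // 2 + 1

def tolerated_failures(total_nodes: int) -> int:
    """Number of nodes that can fail while quorum remains reachable."""
    return total_nodes - quorum(total_nodes)

for n in (1, 2, 3, 5, 7):
    print(f"{n} nodes: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")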

Leader Election

  • Only one node acts as leader at a time
  • Only the leader handles writes
  • Any node can handle reads (may be slightly stale)
  • Leader election is automatic
  • Election typically takes 1-5 seconds (a polling sketch follows below)
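
Because a write issued mid-election can fail, it can help to wait until a leader is reported before proceeding. A minimal sketch, assuming Client.leader() raises or returns a falsy value while no leader has been elected yet:

import time

def wait_for_leader(client, timeout=10.0, interval=0.5):
    """Poll until the cluster reports a leader or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            leader = client.leader()
            if leader:
                return leader
        except Exception:
            pass  # assumption: leader() may raise while an election is in progress
        time.sleep(interval)
    raise TimeoutError("No leader elected within timeout")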

Creating a Cluster

Method 1: Bootstrap Together

All nodes start with knowledge of each other:

from dqlitepy import Node

# Define cluster topology
cluster = [
    "192.168.1.101:9001",
    "192.168.1.102:9001",
    "192.168.1.103:9001"
]

# Start node 1
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1,
    cluster=cluster
)
node1.start()

# Start node 2
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2,
    cluster=cluster
)
node2.start()

# Start node 3
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3,
    cluster=cluster
)
node3.start()

Method 2: Dynamic Growth

Start with a single node and add others:

from dqlitepy import Node, Client
import time

# Start first node as standalone
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1
)
node1.start()

# Start second node (not in cluster yet)
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()

# Add node2 to the cluster via client
client = Client(["192.168.1.101:9001"])
client.add(2, "192.168.1.102:9001")

# Start third node and add it
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3
)
node3.start()
client.add(3, "192.168.1.103:9001")

# Verify cluster
nodes = client.cluster()
for node in nodes:
    print(f"Node {node.id}: {node.address} ({node.role_name})")

Cluster Operations

Finding the Leader

from dqlitepy import Client

client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
leader_address = client.leader()
print(f"Current leader: {leader_address}")

Listing Cluster Members

nodes = client.cluster()
for node in nodes:
    print(f"Node {node.id}: {node.address}")
    print(f"  Role: {node.role_name}")
    print(f"  Is Voter: {node.role == 0}")

Removing a Node

# Remove node 3 from cluster
client.remove(3)

# Now you can safely stop node 3
node3.stop()

Important: The current leader cannot be removed while it holds leadership; it automatically steps down during removal, which triggers a new election before the removal completes.

Transferring Leadership

Leadership transfer happens automatically, but you can influence it by removing and re-adding the current leader:

import time

# Find current leader
leader_address = client.leader()
print(f"Current leader: {leader_address}")

# Look up the leader's node ID from the member list
leader_node = next(n for n in client.cluster() if n.address == leader_address)

# Remove and re-add (triggers election)
client.remove(leader_node.id)
# Wait for election
time.sleep(2)
client.add(leader_node.id, leader_address)

Writing to the Cluster

All Writes Go Through Leader

from dqlitepy import Node

# Connect to any node
node = Node("192.168.1.101:9001", "/data")
node.start()
node.open_db("mydb.db")

# Write operations are automatically forwarded to leader
node.exec("INSERT INTO users (name) VALUES (?)", ["Alice"])

# This works even if this node is not the leader!

Handling Leader Changes

from dqlitepy.exceptions import NoLeaderError
import time

def resilient_write(node, sql, params=None):
    """Perform a write with automatic retry during leader election."""
    max_retries = 5
    for attempt in range(max_retries):
        try:
            node.exec(sql, params)
            return
        except NoLeaderError:
            if attempt < max_retries - 1:
                print(f"No leader, retrying in 1s... (attempt {attempt + 1})")
                time.sleep(1)
            else:
                raise

# Use it
resilient_write(node, "INSERT INTO users (name) VALUES (?)", ["Bob"])

Reading from the Cluster

Reading from Any Node

# You can read from any node
results1 = node1.query("SELECT * FROM users")
results2 = node2.query("SELECT * FROM users")
results3 = node3.query("SELECT * FROM users")

# All will return the same data (may be slightly delayed on followers)

Stale Reads

Reads from follower nodes may be slightly stale (milliseconds to seconds behind):

# Write on leader
node1.exec("INSERT INTO users (name) VALUES (?)", ["Charlie"])

# Immediately read from follower
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])
# Charlie might not appear yet (stale read)

# Wait briefly
time.sleep(0.1)

# Now it should be there
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])

For strongly consistent reads, always query the leader.
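
One way to do that with the objects from the examples above is to route the query to whichever local Node is currently the leader (the address map below is purely illustrative):

# Map the addresses used above to their Node objects
nodes_by_address = {
    "192.168.1.101:9001": node1,
    "192.168.1.102:9001": node2,
    "192.168.1.103:9001": node3,
}

leader_address = client.leader()
leader_node = nodes_by_address[leader_address]

# Reading on the leader avoids stale results
results = leader_node.query("SELECT * FROM users WHERE name = ?", ["Charlie"])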

Cluster Failure Scenarios

Single Node Failure (3-node cluster)

  • Cluster continues operating (still has quorum: 2/3)
  • New leader elected if the failed node was leader
  • Reads and writes continue normally
  • Data is safe (replicated on remaining 2 nodes)

Recovery: Restart the failed node - it will rejoin automatically.
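
For example, bringing node 2 back is just a matter of starting it again with its existing data directory and node ID (addresses from the examples above):

node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()  # catches up from the leader and rejoins the cluster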

Two Node Failure (3-node cluster)

  • Cluster stops (lost quorum: 1/3)
  • No reads or writes possible
  • Existing data is safe but inaccessible

Recovery: Restart at least one failed node to restore quorum.

Split Brain Protection

Suppose a network partition splits a 3-node cluster into groups of 2 and 1:

  • Partition with 2 nodes: Continues operating (has quorum)
  • Partition with 1 node: Stops operating (no quorum)
  • When partition heals, single node rejoins automatically
  • No data loss, no conflicting writes

Best Practices

1. Use Odd Number of Nodes

# Good: 3 nodes (tolerates 1 failure)
# Good: 5 nodes (tolerates 2 failures)
# Bad: 4 nodes (still only tolerates 1 failure, more overhead)

2. Use Specific IP Addresses

# Good
node = Node("192.168.1.101:9001", "/data")

# Bad - will cause cluster communication issues
node = Node("0.0.0.0:9001", "/data")
node = Node("localhost:9001", "/data") # Only works for single-node

3. Separate Data Directories

# Each node must have its own data directory
node1 = Node("192.168.1.101:9001", "/var/lib/dqlite/node1")
node2 = Node("192.168.1.102:9001", "/var/lib/dqlite/node2")
node3 = Node("192.168.1.103:9001", "/var/lib/dqlite/node3")

4. Monitor Cluster Health

def check_cluster_health(client):
    """Check if cluster is healthy."""
    try:
        leader = client.leader()
        nodes = client.cluster()

        voter_count = sum(1 for n in nodes if n.role == 0)
        quorum = (len(nodes) // 2) + 1

        print(f"Leader: {leader}")
        print(f"Total nodes: {len(nodes)}")
        print(f"Voters: {voter_count}")
        print(f"Quorum required: {quorum}")
        print(f"Healthy: {voter_count >= quorum}")

        return voter_count >= quorum
    except Exception as e:
        print(f"Health check failed: {e}")
        return False
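
For example, run the check on a loop (the addresses and interval are illustrative):

import time
from dqlitepy import Client

client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
while True:
    if not check_cluster_health(client):
        print("WARNING: cluster is unhealthy")
    time.sleep(30)  # check every 30 seconds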

5. Graceful Shutdown

# Always close nodes gracefully
try:
    node.close()
except Exception as e:
    print(f"Error during shutdown: {e}")

Troubleshooting

Cluster Won't Form

Problem: Nodes start but don't form a cluster.

Solutions:

  • Verify all nodes can reach each other on the network (a TCP check sketch follows this list)
  • Check firewall rules allow traffic on the dqlite ports
  • Ensure addresses in cluster list match actual node addresses
  • Check logs for connection errors
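
The first two points can be ruled out with a plain TCP connection test from each host, using only the standard library (addresses are the ones used in the examples above):

import socket

cluster = ["192.168.1.101:9001", "192.168.1.102:9001", "192.168.1.103:9001"]
for addr in cluster:
    host, port = addr.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=2):
            print(f"{addr}: reachable")
    except OSError as exc:
        print(f"{addr}: NOT reachable ({exc})")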

Elections Taking Too Long

Problem: Leader election takes more than 5 seconds.

Possible causes:

  • High network latency between nodes
  • CPU overload on nodes
  • Disk I/O bottleneck

Solutions:

  • Use faster network connections
  • Ensure nodes have adequate CPU resources
  • Use SSDs for data directories

Nodes Keep Crashing

Problem: Nodes repeatedly crash or hang.

Check:

  • Disk space (dqlite needs space for database and logs; see the sketch below)
  • Memory (ensure adequate RAM)
  • File descriptors (check ulimit)
  • Corrupted data directory (try starting with fresh directory)
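
A quick local check of disk space and the file descriptor limit, using only the standard library on Unix (the data directory path is the one used in the examples above):

import resource
import shutil

# File descriptor limit (compare against your expected connection count)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"File descriptors: soft={soft}, hard={hard}")

# Free space where the dqlite data directories live
usage = shutil.disk_usage("/var/lib/dqlite")
print(f"Free disk space: {usage.free / 1024**3:.1f} GiB")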

Docker Compose Example

See the complete example in examples/fast_api_example/docker-compose.yml:

version: '3.8'

services:
  node1:
    build: .
    command: node 1
    volumes:
      - node1-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.11

  node2:
    build: .
    command: node 2
    volumes:
      - node2-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.12

  node3:
    build: .
    command: node 3
    volumes:
      - node3-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.13

networks:
  dqlite-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

volumes:
  node1-data:
  node2-data:
  node3-data:

Next Steps