Clustering
Learn how to create and manage multi-node dqlite clusters for high availability and fault tolerance.
Overview
A dqlite cluster provides:
- High Availability: Operations continue if a minority of nodes fail
- Automatic Failover: New leader elected automatically
- Data Replication: All writes replicated to all nodes via Raft
- Strong Consistency: Linearizable reads and writes
- Split-Brain Protection: Requires quorum (majority) for operations
Cluster Basics
Quorum
A cluster needs quorum (strict majority) to operate:
| Total Nodes | Quorum Required | Tolerated Failures |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 (not recommended) |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
Recommendation: Use 3 or 5 nodes for production. Odd numbers work best.
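The quorum column in the table above is simple integer arithmetic. The short sketch below (plain Python, no dqlite calls) reproduces it and can help when sizing a cluster.
def quorum(total_nodes: int) -> int:
    """Strict majority required for the cluster to keep operating."""
    return total_nodes // 2 + 1

def tolerated_failures(total_nodes: int) -> int:
    """Nodes that can fail while a quorum is still reachable."""
    return total_nodes - quorum(total_nodes)

for n in (1, 2, 3, 5, 7):
    print(f"{n} nodes -> quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")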
Leader Election
- Only one node is leader at any given time
- Only the leader handles writes
- Any node can handle reads (may be slightly stale)
- Leader election is automatic
- Elections typically take 1-5 seconds (a polling sketch follows this list)
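After starting or restarting a cluster it can be useful to block until an election has settled. A minimal sketch, assuming client.leader() either raises NoLeaderError or returns an empty value while no leader exists:
import time
from dqlitepy import Client
from dqlitepy.exceptions import NoLeaderError

def wait_for_leader(client, timeout=10.0, interval=0.5):
    """Poll the cluster until a leader is reported or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            leader = client.leader()
            if leader:
                return leader
        except NoLeaderError:
            pass  # election still in progress
        time.sleep(interval)
    raise TimeoutError("no leader elected within timeout")

client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
print(f"Leader elected: {wait_for_leader(client)}")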
Creating a Cluster
Method 1: Bootstrap Together
All nodes start with knowledge of each other:
from dqlitepy import Node
# Define cluster topology
cluster = [
    "192.168.1.101:9001",
    "192.168.1.102:9001",
    "192.168.1.103:9001"
]

# Start node 1
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1,
    cluster=cluster
)
node1.start()

# Start node 2
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2,
    cluster=cluster
)
node2.start()

# Start node 3
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3,
    cluster=cluster
)
node3.start()
Method 2: Dynamic Growth
Start with a single node and add others:
from dqlitepy import Node, Client
import time
# Start first node as standalone
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1
)
node1.start()

# Start second node (not in cluster yet)
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()

# Add node2 to the cluster via client
client = Client(["192.168.1.101:9001"])
client.add(2, "192.168.1.102:9001")

# Start third node and add it
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3
)
node3.start()
client.add(3, "192.168.1.103:9001")

# Verify cluster
nodes = client.cluster()
for node in nodes:
    print(f"Node {node.id}: {node.address} ({node.role_name})")
Cluster Operations
Finding the Leader
from dqlitepy import Client
client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
leader_address = client.leader()
print(f"Current leader: {leader_address}")
Listing Cluster Members
nodes = client.cluster()
leader_address = client.leader()
for node in nodes:
    print(f"Node {node.id}: {node.address}")
    print(f"  Role: {node.role_name}")
    print(f"  Is Leader: {node.address == leader_address}")
Removing a Node
# Remove node 3 from cluster
client.remove(3)
# Now you can safely stop node 3
node3.stop()
Important: Removing the current leader is allowed, but it triggers an election: the leader steps down automatically as part of the removal and a new leader is chosen before the operation completes.
Transferring Leadership
Leadership transfer happens automatically, but you can influence it by removing and re-adding the current leader:
import time

# Find current leader
leader_address = client.leader()
print(f"Current leader: {leader_address}")

# Look up the leader's node id in the member list
leader_node_id = next(n.id for n in client.cluster() if n.address == leader_address)

# Remove and re-add (triggers election)
client.remove(leader_node_id)

# Wait for election
time.sleep(2)
client.add(leader_node_id, leader_address)
Writing to the Cluster
All Writes Go Through Leader
from dqlitepy import Node
# Connect to any node
node = Node("192.168.1.101:9001", "/data")
node.start()
node.open_db("mydb.db")
# Write operations are automatically forwarded to leader
node.exec("INSERT INTO users (name) VALUES (?)", ["Alice"])
# This works even if node1 is not the leader!
Handling Leader Changes
from dqlitepy.exceptions import NoLeaderError
import time
def resilient_write(node, sql, params=None):
    """Perform a write with automatic retry during leader election."""
    max_retries = 5
    for attempt in range(max_retries):
        try:
            node.exec(sql, params)
            return
        except NoLeaderError:
            if attempt < max_retries - 1:
                print(f"No leader, retrying in 1s... (attempt {attempt + 1})")
                time.sleep(1)
            else:
                raise
# Use it
resilient_write(node, "INSERT INTO users (name) VALUES (?)", ["Bob"])
Reading from the Cluster
Reading from Any Node
# You can read from any node
results1 = node1.query("SELECT * FROM users")
results2 = node2.query("SELECT * FROM users")
results3 = node3.query("SELECT * FROM users")
# All will return the same data (may be slightly delayed on followers)
Stale Reads
Reads from follower nodes may be slightly stale (milliseconds to seconds behind):
# Write on leader
node1.exec("INSERT INTO users (name) VALUES (?)", ["Charlie"])
# Immediately read from follower
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])
# Charlie might not appear yet (stale read)
# Wait briefly
time.sleep(0.1)
# Now it should be there
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])
For strongly consistent reads, always query the leader.
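One way to do that, sketched below under the assumption that you keep your local Node handles in a dictionary keyed by address (the helper and dictionary names are illustrative, not part of the library):
def read_from_leader(client, nodes_by_address, sql, params=None):
    """Route a read to the current leader for an up-to-date result."""
    leader_address = client.leader()
    leader_node = nodes_by_address[leader_address]
    return leader_node.query(sql, params)

# Usage, assuming node1..node3 and client from the earlier examples
nodes_by_address = {
    "192.168.1.101:9001": node1,
    "192.168.1.102:9001": node2,
    "192.168.1.103:9001": node3,
}
rows = read_from_leader(client, nodes_by_address,
                        "SELECT * FROM users WHERE name = ?", ["Charlie"])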
Cluster Failure Scenarios
Single Node Failure (3-node cluster)
- Cluster continues operating (still has quorum: 2/3)
- New leader elected if the failed node was leader
- Reads and writes continue normally
- Data is safe (replicated on remaining 2 nodes)
Recovery: Restart the failed node; it will rejoin automatically (see the sketch below).
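For example, restarting a failed node is just a matter of bringing it back with its original identity and data directory (values below follow the earlier examples):
from dqlitepy import Node

# Reuse the same address, node_id and data_dir so the node rejoins with its
# existing Raft state and catches up from the leader automatically.
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()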
Two Node Failure (3-node cluster)
- Cluster stops (lost quorum: 1/3)
- No reads or writes possible
- Existing data is safe but inaccessible
Recovery: Restart at least one failed node to restore quorum.
Split Brain Protection
Suppose a network partition splits a 3-node cluster into groups of 2 and 1:
- Partition with 2 nodes: Continues operating (has quorum)
- Partition with 1 node: Stops operating (no quorum)
- When partition heals, single node rejoins automatically
- No data loss, no conflicting writes
Best Practices
1. Use Odd Number of Nodes
# Good: 3 nodes (tolerates 1 failure)
# Good: 5 nodes (tolerates 2 failures)
# Bad: 4 nodes (still only tolerates 1 failure, more overhead)
2. Use Specific IP Addresses
# Good
node = Node("192.168.1.101:9001", "/data")
# Bad - will cause cluster communication issues
node = Node("0.0.0.0:9001", "/data")
node = Node("localhost:9001", "/data") # Only works for single-node
3. Separate Data Directories
# Each node must have its own data directory
node1 = Node("192.168.1.101:9001", "/var/lib/dqlite/node1")
node2 = Node("192.168.1.102:9001", "/var/lib/dqlite/node2")
node3 = Node("192.168.1.103:9001", "/var/lib/dqlite/node3")
4. Monitor Cluster Health
def check_cluster_health(client):
    """Check if the cluster is healthy."""
    try:
        leader = client.leader()
        nodes = client.cluster()
        voter_count = sum(1 for n in nodes if n.role == 0)
        quorum = (len(nodes) // 2) + 1
        print(f"Leader: {leader}")
        print(f"Total nodes: {len(nodes)}")
        print(f"Voters: {voter_count}")
        print(f"Quorum required: {quorum}")
        print(f"Healthy: {voter_count >= quorum}")
        return voter_count >= quorum
    except Exception as e:
        print(f"Health check failed: {e}")
        return False
5. Graceful Shutdown
# Always close nodes gracefully
try:
    node.close()
except Exception as e:
    print(f"Error during shutdown: {e}")
Troubleshooting
Cluster Won't Form
Problem: Nodes start but don't form a cluster.
Solutions:
- Verify all nodes can reach each other on the network (a TCP probe sketch follows this list)
- Check firewall rules allow traffic on the dqlite ports
- Ensure addresses in cluster list match actual node addresses
- Check logs for connection errors
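A quick way to rule out basic reachability problems is a plain TCP probe from each host (standard library only; the addresses are the ones used throughout this page):
import socket

addresses = ["192.168.1.101:9001", "192.168.1.102:9001", "192.168.1.103:9001"]

for address in addresses:
    host, port = address.rsplit(":", 1)
    try:
        # If this fails, dqlite traffic will fail too: check routing and firewalls.
        with socket.create_connection((host, int(port)), timeout=2):
            print(f"{address}: reachable")
    except OSError as exc:
        print(f"{address}: unreachable ({exc})")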
Elections Taking Too Long
Problem: Leader election takes more than 5 seconds.
Possible causes:
- High network latency between nodes
- CPU overload on nodes
- Disk I/O bottleneck
Solutions:
- Use faster network connections
- Ensure nodes have adequate CPU resources
- Use SSDs for data directories
Nodes Keep Crashing
Problem: Nodes repeatedly crash or hang.
Check the following (a pre-flight sketch for disk space and file descriptors follows the list):
- Disk space (dqlite needs space for database and logs)
- Memory (ensure adequate RAM)
- File descriptors (check ulimit)
- Corrupted data directory (try starting with fresh directory)
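A rough pre-flight check for the disk-space and file-descriptor items, assuming a Linux host (the data directory and thresholds are illustrative):
import resource
import shutil

data_dir = "/var/lib/dqlite/node1"

free_bytes = shutil.disk_usage(data_dir).free
soft_fds, hard_fds = resource.getrlimit(resource.RLIMIT_NOFILE)

print(f"Free disk space: {free_bytes / 1024**3:.1f} GiB")
print(f"File descriptor limit: soft={soft_fds}, hard={hard_fds}")

if free_bytes < 1 * 1024**3:
    print("Warning: less than 1 GiB free for the database and Raft logs")
if soft_fds < 1024:
    print("Warning: file descriptor limit may be too low (raise with ulimit -n)")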
Docker Compose Example
See complete example in examples/fast_api_example/docker-compose.yml:
version: '3.8'

services:
  node1:
    build: .
    command: node 1
    volumes:
      - node1-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.11
  node2:
    build: .
    command: node 2
    volumes:
      - node2-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.12
  node3:
    build: .
    command: node 3
    volumes:
      - node3-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.13

networks:
  dqlite-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

volumes:
  node1-data:
  node2-data:
  node3-data:
Next Steps
- Client API - Detailed client API documentation
- Node API - Detailed node API documentation
- FastAPI Example - Complete cluster application
- Troubleshooting - Common issues and solutions