Clustering
Learn how to create and manage multi-node dqlite clusters for high availability and fault tolerance.
Overview
A dqlite cluster provides:
- High Availability: Operations continue if a minority of nodes fail
- Automatic Failover: New leader elected automatically
- Data Replication: All writes replicated to all nodes via Raft
- Strong Consistency: Linearizable reads and writes
- Split-Brain Protection: Requires quorum (majority) for operations
Cluster Basics
Quorum
A cluster needs quorum (strict majority) to operate:
| Total Nodes | Quorum Required | Tolerated Failures |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 (not recommended) |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
Recommendation: Use 3 or 5 nodes for production. Odd numbers work best.
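The quorum column in the table above is simple integer arithmetic. The short sketch below (plain Python, no dqlite calls) reproduces it and can help when sizing a cluster.
def quorum(total_nodes: int) -> int:
    """Strict majority required for the cluster to keep operating."""
    return total_nodes // 2 + 1

def tolerated_failures(total_nodes: int) -> int:
    """Nodes that can fail while a quorum is still reachable."""
    return total_nodes - quorum(total_nodes)

for n in (1, 2, 3, 5, 7):
    print(f"{n} nodes -> quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")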
Leader Election
- Only one node is leader at any given time
- Only the leader handles writes
- Any node can handle reads (may be slightly stale)
- Leader election is automatic
- Elections typically take 1-5 seconds (a polling sketch follows this list)
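After starting or restarting a cluster it can be useful to block until an election has settled. A minimal sketch, assuming client.leader() either raises NoLeaderError or returns an empty value while no leader exists:
import time
from dqlitepy import Client
from dqlitepy.exceptions import NoLeaderError

def wait_for_leader(client, timeout=10.0, interval=0.5):
    """Poll the cluster until a leader is reported or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            leader = client.leader()
            if leader:
                return leader
        except NoLeaderError:
            pass  # election still in progress
        time.sleep(interval)
    raise TimeoutError("no leader elected within timeout")

client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
print(f"Leader elected: {wait_for_leader(client)}")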
Creating a Cluster
Method 1: Bootstrap Together
All nodes start with knowledge of each other:
from dqlitepy import Node
# Define cluster topology
cluster = [
    "192.168.1.101:9001",
    "192.168.1.102:9001",
    "192.168.1.103:9001"
]

# Start node 1
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1,
    cluster=cluster
)
node1.start()

# Start node 2
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2,
    cluster=cluster
)
node2.start()

# Start node 3
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3,
    cluster=cluster
)
node3.start()
Method 2: Dynamic Growth
Start with a single node and add others:
from dqlitepy import Node, Client
import time
# Start first node as standalone
node1 = Node(
    address="192.168.1.101:9001",
    data_dir="/var/lib/dqlite/node1",
    node_id=1
)
node1.start()

# Start second node (not in cluster yet)
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()

# Add node2 to the cluster via client
client = Client(["192.168.1.101:9001"])
client.add(2, "192.168.1.102:9001")

# Start third node and add it
node3 = Node(
    address="192.168.1.103:9001",
    data_dir="/var/lib/dqlite/node3",
    node_id=3
)
node3.start()
client.add(3, "192.168.1.103:9001")

# Verify cluster
nodes = client.cluster()
for node in nodes:
    print(f"Node {node.id}: {node.address} ({node.role_name})")
Cluster Operations
Finding the Leader
from dqlitepy import Client
client = Client(["192.168.1.101:9001", "192.168.1.102:9001"])
leader_address = client.leader()
print(f"Current leader: {leader_address}")
Listing Cluster Members
nodes = client.cluster()
leader_address = client.leader()
for node in nodes:
    print(f"Node {node.id}: {node.address}")
    print(f"  Role: {node.role_name}")
    print(f"  Is Leader: {node.address == leader_address}")
Removing a Node
# Remove node 3 from cluster
client.remove(3)
# Now you can safely stop node 3
node3.stop()
Important: Removing the current leader is allowed, but it triggers an election: the leader steps down automatically as part of the removal and a new leader is chosen before the operation completes.
Transferring Leadership
Leadership transfer happens automatically, but you can influence it by removing and re-adding the current leader:
import time

# Find current leader
leader_address = client.leader()
print(f"Current leader: {leader_address}")

# Look up the leader's node id in the member list
leader_node_id = next(n.id for n in client.cluster() if n.address == leader_address)

# Remove and re-add (triggers election)
client.remove(leader_node_id)

# Wait for election
time.sleep(2)
client.add(leader_node_id, leader_address)
Writing to the Cluster
All Writes Go Through Leader
from dqlitepy import Node
# Connect to any node
node = Node("192.168.1.101:9001", "/data")
node.start()
node.open_db("mydb.db")
# Write operations are automatically forwarded to leader
node.exec("INSERT INTO users (name) VALUES (?)", ["Alice"])
# This works even if node1 is not the leader!
Handling Leader Changes
from dqlitepy.exceptions import NoLeaderError
import time
def resilient_write(node, sql, params=None):
    """Perform a write with automatic retry during leader election."""
    max_retries = 5
    for attempt in range(max_retries):
        try:
            node.exec(sql, params)
            return
        except NoLeaderError:
            if attempt < max_retries - 1:
                print(f"No leader, retrying in 1s... (attempt {attempt + 1})")
                time.sleep(1)
            else:
                raise
# Use it
resilient_write(node, "INSERT INTO users (name) VALUES (?)", ["Bob"])
Reading from the Cluster
Reading from Any Node
# You can read from any node
results1 = node1.query("SELECT * FROM users")
results2 = node2.query("SELECT * FROM users")
results3 = node3.query("SELECT * FROM users")
# All will return the same data (may be slightly delayed on followers)
Stale Reads
Reads from follower nodes may be slightly stale (milliseconds to seconds behind):
# Write on leader
node1.exec("INSERT INTO users (name) VALUES (?)", ["Charlie"])
# Immediately read from follower
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])
# Charlie might not appear yet (stale read)
# Wait briefly
time.sleep(0.1)
# Now it should be there
results = node2.query("SELECT * FROM users WHERE name = ?", ["Charlie"])
For strongly consistent reads, always query the leader.
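One way to do that, sketched below under the assumption that you keep your local Node handles in a dictionary keyed by address (the helper and dictionary names are illustrative, not part of the library):
def read_from_leader(client, nodes_by_address, sql, params=None):
    """Route a read to the current leader for an up-to-date result."""
    leader_address = client.leader()
    leader_node = nodes_by_address[leader_address]
    return leader_node.query(sql, params)

# Usage, assuming node1..node3 and client from the earlier examples
nodes_by_address = {
    "192.168.1.101:9001": node1,
    "192.168.1.102:9001": node2,
    "192.168.1.103:9001": node3,
}
rows = read_from_leader(client, nodes_by_address,
                        "SELECT * FROM users WHERE name = ?", ["Charlie"])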
Cluster Failure Scenarios
Single Node Failure (3-node cluster)
- Cluster continues operating (still has quorum: 2/3)
- New leader elected if the failed node was leader
- Reads and writes continue normally
- Data is safe (replicated on remaining 2 nodes)
Recovery: Restart the failed node; it will rejoin automatically (see the sketch below).
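For example, restarting a failed node is just a matter of bringing it back with its original identity and data directory (values below follow the earlier examples):
from dqlitepy import Node

# Reuse the same address, node_id and data_dir so the node rejoins with its
# existing Raft state and catches up from the leader automatically.
node2 = Node(
    address="192.168.1.102:9001",
    data_dir="/var/lib/dqlite/node2",
    node_id=2
)
node2.start()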
Two Node Failure (3-node cluster)
- Cluster stops (lost quorum: 1/3)
- No reads or writes possible
- Existing data is safe but inaccessible
Recovery: Restart at least one failed node to restore quorum.
Split Brain Protection
Suppose a network partition splits a 3-node cluster into groups of 2 and 1:
- Partition with 2 nodes: Continues operating (has quorum)
- Partition with 1 node: Stops operating (no quorum)
- When partition heals, single node rejoins automatically
- No data loss, no conflicting writes
Best Practices
1. Use Odd Number of Nodes
# Good: 3 nodes (tolerates 1 failure)
# Good: 5 nodes (tolerates 2 failures)
# Bad: 4 nodes (still only tolerates 1 failure, more overhead)
2. Use Specific IP Addresses
# Good
node = Node("192.168.1.101:9001", "/data")
# Bad - will cause cluster communication issues
node = Node("0.0.0.0:9001", "/data")
node = Node("localhost:9001", "/data") # Only works for single-node
3. Separate Data Directories
# Each node must have its own data directory
node1 = Node("192.168.1.101:9001", "/var/lib/dqlite/node1")
node2 = Node("192.168.1.102:9001", "/var/lib/dqlite/node2")
node3 = Node("192.168.1.103:9001", "/var/lib/dqlite/node3")
4. Monitor Cluster Health
def check_cluster_health(client):
    """Check if the cluster is healthy."""
    try:
        leader = client.leader()
        nodes = client.cluster()
        voter_count = sum(1 for n in nodes if n.role == 0)
        quorum = (len(nodes) // 2) + 1
        print(f"Leader: {leader}")
        print(f"Total nodes: {len(nodes)}")
        print(f"Voters: {voter_count}")
        print(f"Quorum required: {quorum}")
        print(f"Healthy: {voter_count >= quorum}")
        return voter_count >= quorum
    except Exception as e:
        print(f"Health check failed: {e}")
        return False
5. Graceful Shutdown
# Always close nodes gracefully
try:
    node.close()
except Exception as e:
    print(f"Error during shutdown: {e}")
Troubleshooting
Cluster Won't Form
Problem: Nodes start but don't form a cluster.
Solutions:
- Verify all nodes can reach each other on the network (a TCP probe sketch follows this list)
- Check firewall rules allow traffic on the dqlite ports
- Ensure addresses in cluster list match actual node addresses
- Check logs for connection errors
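A quick way to rule out basic reachability problems is a plain TCP probe from each host (standard library only; the addresses are the ones used throughout this page):
import socket

addresses = ["192.168.1.101:9001", "192.168.1.102:9001", "192.168.1.103:9001"]

for address in addresses:
    host, port = address.rsplit(":", 1)
    try:
        # If this fails, dqlite traffic will fail too: check routing and firewalls.
        with socket.create_connection((host, int(port)), timeout=2):
            print(f"{address}: reachable")
    except OSError as exc:
        print(f"{address}: unreachable ({exc})")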
Elections Taking Too Long
Problem: Leader election takes more than 5 seconds.
Possible causes:
- High network latency between nodes
- CPU overload on nodes
- Disk I/O bottleneck
Solutions:
- Use faster network connections
- Ensure nodes have adequate CPU resources
- Use SSDs for data directories
Nodes Keep Crashing
Problem: Nodes repeatedly crash or hang.
Check the following (a pre-flight sketch for disk space and file descriptors follows the list):
- Disk space (dqlite needs space for database and logs)
- Memory (ensure adequate RAM)
- File descriptors (check ulimit)
- Corrupted data directory (try starting with fresh directory)
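A rough pre-flight check for the disk-space and file-descriptor items, assuming a Linux host (the data directory and thresholds are illustrative):
import resource
import shutil

data_dir = "/var/lib/dqlite/node1"

free_bytes = shutil.disk_usage(data_dir).free
soft_fds, hard_fds = resource.getrlimit(resource.RLIMIT_NOFILE)

print(f"Free disk space: {free_bytes / 1024**3:.1f} GiB")
print(f"File descriptor limit: soft={soft_fds}, hard={hard_fds}")

if free_bytes < 1 * 1024**3:
    print("Warning: less than 1 GiB free for the database and Raft logs")
if soft_fds < 1024:
    print("Warning: file descriptor limit may be too low (raise with ulimit -n)")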
Docker Compose Example
See complete example in examples/fast_api_example/docker-compose.yml:
version: '3.8'

services:
  node1:
    build: .
    command: node 1
    volumes:
      - node1-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.11
  node2:
    build: .
    command: node 2
    volumes:
      - node2-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.12
  node3:
    build: .
    command: node 3
    volumes:
      - node3-data:/data
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.13

networks:
  dqlite-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

volumes:
  node1-data:
  node2-data:
  node3-data:
Next Steps
- Client API - Detailed client API documentation
- Node API - Detailed node API documentation
- FastAPI Example - Complete cluster application
- Troubleshooting - Common issues and solutions