
Troubleshooting

Common issues and their solutions when working with dqlitepy.

Installation Issues

Module Not Found: cffi

Error:

ModuleNotFoundError: No module named 'cffi'

Solution:

pip install cffi
# or with uv
uv pip install cffi

Wheel Installation Fails

Error:

ERROR: dqlitepy-0.2.0-py3-none-linux_x86_64.whl is not a supported wheel on this platform.

Solutions:

  • Ensure you're on Linux x86_64
  • Check Python version (requires 3.8+)
  • Verify platform: python3 -c "import platform; print(platform.platform())"
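The same checks can be scripted in Python. This is a minimal sketch. The requirements it encodes (Linux, x86_64, Python 3.8+) come from the list above, and `check_wheel_platform` is a hypothetical helper, not part of dqlitepy.

```python
import platform
import sys


def check_wheel_platform() -> dict:
    """Hypothetical helper: report whether this interpreter matches the wheel's stated requirements."""
    machine = platform.machine().lower()
    return {
        # The wheel is built for Linux only
        "linux": sys.platform.startswith("linux"),
        # Linux reports x86_64; some other systems report amd64
        "x86_64": machine in ("x86_64", "amd64"),
        # This guide states Python 3.8 or newer is required
        "python_ok": sys.version_info >= (3, 8),
    }


for name, ok in check_wheel_platform().items():
    print(f"{name:10} {'OK' if ok else 'MISMATCH'}")
```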

Node Startup Issues

Address Already in Use

Error:

NodeStartError: Address already in use: 127.0.0.1:9001

Solutions:

# Find process using the port
lsof -i :9001
# or
netstat -tulpn | grep 9001

# Kill the process
kill <PID>

# Or use a different port (Python)
from dqlitepy import Node
node = Node("127.0.0.1:9002", "/data")
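To check a port from Python before starting a node, one rough approach is to try binding it yourself. `port_in_use` and `find_free_port` are hypothetical helpers, not dqlitepy APIs. Any bind failure, including an address that isn't assigned to this host, is reported as unavailable.

```python
import socket


def port_in_use(host: str, port: int) -> bool:
    """Hypothetical helper: True if host:port can't be bound (something else holds it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Usually EADDRINUSE, but also e.g. an address not assigned to this host
            return True
    return False


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: let the OS pick an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

A port returned by `find_free_port` can still be taken by another process before the node binds it, so treat it as a convenience for local testing rather than a guarantee.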

Permission Denied on Data Directory

Error:

NodeStartError: Permission denied: /var/lib/dqlite

Solutions:

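import os
from dqlitepy import Node

# Create the directory. This itself needs write access to the parent
# (/var/lib is usually root-only), so you may need to create it once
# with sudo and chown it to the user that runs the node.
os.makedirs("/var/lib/dqlite", mode=0o755, exist_ok=True)

# Or use a directory you already have access to
# (/tmp is often cleared on reboot, so use it for testing only)
node = Node("127.0.0.1:9001", "/tmp/dqlite-data")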
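To fail fast with a clear message instead of a `NodeStartError`, you could check the directory before creating the node. This is a minimal sketch that assumes a Linux host; `check_data_dir` is a hypothetical helper.

```python
import os
import tempfile


def check_data_dir(path: str) -> None:
    """Hypothetical helper: raise PermissionError early if `path` can't be created or written to."""
    # Raises PermissionError itself if the parent directory isn't writable
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"{path} is not writable by uid {os.getuid()}")
    # os.access checks the real uid, which can differ from what the process
    # actually gets, so confirm with a real file creation (deleted on close)
    with tempfile.NamedTemporaryFile(dir=path):
        pass
```

Calling `check_data_dir("/var/lib/dqlite")` before constructing the `Node` surfaces the permission problem with the exact path and uid involved.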

Corrupted Data Directory

Error:

NodeStartError: database disk image is malformed

Solutions:

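# Stop the node first, then back up the corrupted data
mv /var/lib/dqlite /var/lib/dqlite.corrupted

# Start fresh
mkdir -p /var/lib/dqlite
# The node will create new database files on its next start;
# any data that existed only on this node is lost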
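One catch with the `mv` above: if `/var/lib/dqlite.corrupted` already exists from an earlier incident, `mv` moves the directory inside it instead of renaming it. A sketch that avoids this by timestamping the backup name. `quarantine_data_dir` is a hypothetical helper, and it assumes the node is already stopped.

```python
import os
import shutil
import time


def quarantine_data_dir(path: str) -> str:
    """Hypothetical helper: move a stopped node's data dir aside and recreate it empty.

    Returns the backup path. The timestamp suffix keeps repeated runs from
    colliding with an earlier backup.
    """
    path = os.path.abspath(path)  # also strips any trailing slash
    backup = f"{path}.corrupted-{time.strftime('%Y%m%d-%H%M%S')}"
    if os.path.exists(backup):
        raise FileExistsError(backup)
    shutil.move(path, backup)
    os.makedirs(path)  # empty dir; the node recreates its database files on start
    return backup
```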

Cluster Formation Issues

Nodes Won't Form Cluster

Problem: Nodes start but don't see each other.

Check:

# Verify addresses are correct
from dqlitepy import Client

client = Client(["192.168.1.101:9001"])
nodes = client.cluster()
print(f"Visible nodes: {len(nodes)}")

Solutions:

  • Ensure all nodes use the same cluster address list
  • Verify network connectivity: ping 192.168.1.102
  • Check that firewall rules allow traffic on the dqlite ports
  • Use specific IPs, not 0.0.0.0 or localhost (other nodes must be able to reach the address each node advertises)
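`ping` only proves that ICMP gets through. A firewall can still block the dqlite port itself. This minimal sketch opens a TCP connection to each address instead. It assumes IPv4 `host:port` strings (bracketed IPv6 literals are not handled), and `check_reachable` is a hypothetical helper.

```python
import socket


def check_reachable(addresses, timeout=2.0):
    """Hypothetical helper: try a TCP connection to each "host:port" and report which answer."""
    results = {}
    for addr in addresses:
        host, port = addr.rsplit(":", 1)
        try:
            # Succeeds once the TCP handshake completes; nothing is sent
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[addr] = True
        except OSError:
            # Refused, timed out, unreachable, or name resolution failed
            results[addr] = False
    return results
```

For example, `check_reachable(["192.168.1.101:9001", "192.168.1.102:9001"])` run from each node shows which peers that node can actually reach on the dqlite port.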

Split Brain: Multiple Leaders

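This cannot happen in a healthy cluster: Raft elects at most one leader per term, and a leader can only commit writes with acknowledgement from a majority of voters, so two nodes can never both commit writes.

If you think you're seeing multiple leaders:

  • Check if you're looking at different clusters
  • Verify all nodes use the same cluster addresses
  • Check whether the network is partitioned: a former leader cut off from the majority may briefly still report itself as leader, but it cannot commit writes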
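The guarantee comes down to majority arithmetic. A leader needs votes from a majority of voters, each voter grants at most one vote per term, and any two majorities of the same voter set share at least one node. Here is a brute-force check of that overlap property for small clusters. It is illustrative only, not dqlitepy code.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of voters that forms a majority of n."""
    return n // 2 + 1


def majorities_always_overlap(n: int) -> bool:
    """Check that every pair of majority-sized subsets of n voters shares a member.

    Larger majorities are supersets of these, so they overlap too.
    """
    quorums = [set(c) for c in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


for n in range(1, 8):
    assert majorities_always_overlap(n)
print("Any two majorities intersect for clusters of 1-7 voters")
```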

Cluster Has No Quorum

Error:

NoLeaderError: No leader available (quorum lost)

Diagnosis:

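import socket
from dqlitepy import Client

client = Client(["192.168.1.101:9001"])
nodes = client.cluster()

total = len(nodes)  # assumes every node is a voter (exclude standby/spare nodes)
quorum = (total // 2) + 1
print(f"Need {quorum} nodes out of {total}")

# Check how many are reachable over TCP
reachable = 0
for n in nodes:
    host, port = n.address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=2):
            reachable += 1
    except OSError:
        print(f"Unreachable: {n.address}")
print(f"Reachable: {reachable}/{total}")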

Solutions:

  • Restart failed nodes until a majority of voters is back online
  • Wait for elections (typically 1-5 seconds)
  • Check network connectivity
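Quorum size also determines how many failures a cluster survives. Here is a small sketch of the arithmetic, assuming every node is a voter.

```python
def quorum(voters: int) -> int:
    """Votes needed to elect a leader or commit a write."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Voters that can be down while the cluster still has a quorum."""
    return voters - quorum(voters)


for n in range(1, 8):
    print(f"{n} voters: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Adding a fourth voter to a three-voter cluster raises the quorum from 2 to 3 without tolerating any extra failures. That is why odd voter counts such as 3 or 5 are the usual choice.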

Operation Failures

NoLeaderError During Writes

Error:

NoLeaderError: No leader elected yet

Cause: Leader election in progress (cluster startup or after leader failure).

Solution: Retry with exponential backoff

import time
from dqlitepy.exceptions import NoLeaderError

def retry_on_no_leader(func, max_retries=5, base_delay=1):
    """Retry operation during leader election."""
    for attempt in range(max_retries):
        try:
            return func()
        except NoLeaderError:
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)  # Exponential backoff
                print(f"No leader, waiting {delay}s...")
                time.sleep(delay)
            else:
                raise

# Use it
retry_on_no_leader(lambda: node.exec("INSERT INTO users VALUES (1, 'Alice')"))
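If many clients restart at once, identical backoff schedules make them retry in lockstep. A common variant caps the delay and adds random jitter (the "full jitter" approach). The sketch below is generic rather than dqlitepy-specific. The defaults are illustrative assumptions, and `sleep` is a parameter only so tests can skip the real delay.

```python
import random
import time


def retry_with_backoff(func, retry_on, max_retries=5, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Hypothetical helper: call func(), retrying on `retry_on` with capped, jittered backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Full jitter: wait a random time between 0 and the capped exponential delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

With dqlitepy, you would pass the `NoLeaderError` class imported above as `retry_on`, for example `retry_with_backoff(lambda: node.exec("INSERT INTO users VALUES (1, 'Alice')"), NoLeaderError)`.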

Database Locked

Error:

OperationalError: database is locked

Causes:

  • Concurrent writes without proper transaction management
  • Long-running transaction holding locks

Solutions:

# Use proper transaction management
node.begin()
try:
    node.exec("INSERT INTO users VALUES (1, 'Alice')")
    node.exec("INSERT INTO posts VALUES (1, 1, 'Hello')")
    node.commit()
except Exception:
    node.rollback()
    raise

# Or use smaller transactions
node.exec("INSERT INTO users VALUES (1, 'Alice')")  # Auto-commits
node.exec("INSERT INTO posts VALUES (1, 1, 'Hello')")  # Auto-commits
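The begin/commit/rollback pattern can be wrapped in a context manager so it can't be forgotten. This is a minimal sketch. It assumes the node object exposes `begin()`, `commit()` and `rollback()` exactly as used above, and `transaction` is a hypothetical helper, not a dqlitepy API.

```python
from contextlib import contextmanager


@contextmanager
def transaction(node):
    """Hypothetical helper: run a block between node.begin() and node.commit().

    Any exception in the block, or a failed commit, triggers node.rollback()
    and is then re-raised.
    """
    node.begin()
    try:
        yield node
        node.commit()
    except BaseException:
        node.rollback()
        raise
```

Inside `with transaction(node):` you run the same `node.exec(...)` calls as before. The transaction is committed only if the whole block succeeds.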

SQLAlchemy Column Order Issues

Problem: Data appears in wrong columns.

This is fixed in dqlitepy >= 0.2.0. If you see this:

# Check your version
import dqlitepy
print(dqlitepy.__version__) # Should be >= 0.2.0

If it is older, rebuild the wheel and reinstall:

cd /path/to/dqlitepy
bash scripts/build_wheel_docker.sh
pip install --force-reinstall dist/dqlitepy-*.whl
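Comparing version strings as plain strings goes wrong once a component reaches two digits (`"0.10.0" < "0.2.0"` is True lexically). This minimal check compares numeric components instead. It assumes an `X.Y.Z`-style version and ignores any pre-release suffix, so `0.2.0rc1` counts as `0.2.0`. The `packaging` library's `Version` class handles the general case if it is installed. `version_tuple` and `has_column_order_fix` are hypothetical helpers.

```python
import re


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: turn '0.2.1' (or '0.2.1rc1') into (0, 2, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if not match:
        raise ValueError(f"unrecognised version: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad short versions so '0.2' compares equal to '0.2.0'
    return parts + (0,) * (3 - len(parts))


def has_column_order_fix(version: str) -> bool:
    """True if the version is at least 0.2.0, the fix version stated in this guide."""
    return version_tuple(version) >= (0, 2, 0)
```

For example, `has_column_order_fix(dqlitepy.__version__)` returns False on builds that still need rebuilding.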

Performance Issues

Slow Writes

Symptoms: Write operations taking seconds.

Causes:

  • Network latency between nodes
  • Disk I/O bottleneck
  • Too many voting nodes (the leader waits for a majority of them to acknowledge each write)

Solutions:

# Batch operations in transactions
node.begin()
try:
    for i in range(1000):
        node.exec("INSERT INTO users VALUES (?, ?)", [i, f"user{i}"])
    node.commit()  # Much faster than 1000 individual commits
except Exception:
    node.rollback()
    raise

# Use faster storage: every commit waits for disk syncs,
# so SSDs typically outperform HDDs by a wide margin here
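A single very large transaction has its own costs: a large payload to replicate at commit time, and every row is lost if it fails. For bulk loads, a common middle ground is fixed-size batches. `chunked` is a hypothetical helper, and the batch size is yours to tune.

```python
from itertools import islice


def chunked(iterable, size):
    """Hypothetical helper: yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    # Walrus operator requires Python 3.8+, matching this guide's minimum
    while batch := list(islice(it, size)):
        yield batch
```

Each batch would then go inside one `node.begin()`/`node.commit()` pair as in the loop above. Something like 500 rows per batch is an arbitrary starting point to measure from, not a recommendation.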

Slow Reads from Followers

Symptoms: Queries slower on follower nodes.

Cause: Follower nodes may be catching up on replication.

Solutions:

  • Query the leader for critical reads
  • Ensure followers have adequate CPU/disk
  • Accept slightly stale reads on followers

Memory Usage Growing

Symptoms: Node memory increases over time.

Causes:

  • Raft log growing between snapshots (normal)
  • Connection leaks

Solutions:

# Always close connections
# (connect before the try, so close() is never called on an unbound name)
conn = connect(node, "db.sqlite")
try:
    ...  # use connection
finally:
    conn.close()

# Or use a context manager
from contextlib import closing

with closing(connect(node, "db.sqlite")) as conn:
    ...  # use connection; closed automatically on exit

SQLAlchemy Issues

Session Errors After Node Restart

Error:

OperationalError: connection closed

Solution: Create a new session after restarting the node

from sqlalchemy.orm import sessionmaker

# Close old session
session.close()

# Restart node
node.stop()
node.start()
node.open_db("app.db")

# Register again
register_dqlite_node(node, "app.db")

# Discard pooled connections that still point at the stopped node
engine.dispose()

# Create new session
Session = sessionmaker(bind=engine)
session = Session()

Transaction Conflicts

Error:

IntegrityError: UNIQUE constraint failed

Solution: Handle conflicts in application code

from sqlalchemy.exc import IntegrityError

try:
    session.add(new_user)
    session.commit()
except IntegrityError:
    session.rollback()
    # Handle duplicate
    print("User already exists")
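If duplicates are expected rather than exceptional, SQLite's upsert syntax (SQLite 3.24+) can skip them in the statement itself instead of raising. The sketch below uses Python's built-in `sqlite3` module only to demonstrate the SQL. dqlite is built on SQLite, but whether your dqlite build's SQLite version accepts this syntax is an assumption to verify. The `users` schema and `add_user_if_absent` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only to demonstrate the SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")


def add_user_if_absent(conn, user_id, email):
    """Hypothetical helper: insert a user, silently skipping any uniqueness conflict."""
    cur = conn.execute(
        # Without a conflict target, DO NOTHING applies to every UNIQUE/PK constraint
        "INSERT INTO users (id, email) VALUES (?, ?) ON CONFLICT DO NOTHING",
        (user_id, email),
    )
    conn.commit()
    return cur.rowcount == 1  # True only if a row was actually inserted
```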

Docker Issues

Containers Can't Communicate

Problem: Nodes in different containers can't connect.

Solution: Use Docker networks

# docker-compose.yml
networks:
  dqlite-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

services:
  node1:
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.11

Data Lost on Container Restart

Problem: Database resets when container restarts.

Solution: Use volumes

services:
  node1:
    volumes:
      - node1-data:/data  # mount at the data directory you pass to Node()

volumes:
  node1-data:

Debugging Tips

Enable Debug Logging

import logging

# Show INFO from everything, DEBUG only from dqlitepy
# (assumes dqlitepy logs under the "dqlitepy" logger name)
logging.basicConfig(level=logging.INFO)
logging.getLogger("dqlitepy").setLevel(logging.DEBUG)

Check Node Status

def diagnose_node(node):
    """Print node status for debugging."""
    print(f"Address: {node.address}")
    print(f"ID: {node.id}")
    print(f"Data dir: {node.data_dir}")
    print(f"Running: {node.is_running}")

    if node.is_running:
        try:
            node.query("SELECT 1")
            print("Database accessible: Yes")
        except Exception as e:
            print(f"Database accessible: No ({e})")

diagnose_node(node)

Check Cluster State

from dqlitepy import Client

def diagnose_cluster(client):
    """Print cluster state for debugging."""
    try:
        leader = client.leader()
        print(f"Leader: {leader}")
    except Exception as e:
        print(f"Leader: Unknown ({e})")

    try:
        nodes = client.cluster()
        print(f"\nCluster nodes: {len(nodes)}")
        for node in nodes:
            print(f"  - Node {node.id}: {node.address}")
            print(f"    Role: {node.role_name}")
    except Exception as e:
        print(f"Cannot list cluster: {e}")

client = Client(["192.168.1.101:9001"])
diagnose_cluster(client)

Verify Build

# Check if Go library is included
unzip -l dist/dqlitepy-*.whl | grep libdqlitepy

# Should see:
# dqlitepy/_lib/linux-amd64/libdqlitepy.so
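On machines without `unzip`, the same check works from Python, because a wheel is an ordinary zip archive. `native_libs_in_wheel` is a hypothetical helper. It only looks for names ending in `.so`, so versioned names like `libfoo.so.1` would be missed.

```python
import zipfile


def native_libs_in_wheel(wheel_path: str) -> list:
    """Hypothetical helper: list shared libraries (.so files) packaged inside a wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        return [name for name in whl.namelist() if name.endswith(".so")]
```

For a correct build, `native_libs_in_wheel("dist/dqlitepy-0.2.0-py3-none-linux_x86_64.whl")` should include `dqlitepy/_lib/linux-amd64/libdqlitepy.so`. An empty list means the Go library was not bundled.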

Getting Help

If you're still stuck:

  1. Check the logs: Enable debug logging (see above)
  2. Minimal reproduction: Create the simplest case that shows the issue
  3. Check examples: See if the examples/ directory has a similar use case
  4. GitHub Issues: Open an issue at https://github.com/vantagecompute/dqlitepy/issues

Include in your report:

  • dqlitepy version: python -c "import dqlitepy; print(dqlitepy.__version__)"
  • Python version: python --version
  • OS: uname -a (Linux) or systeminfo (Windows)
  • Complete error message and stack trace
  • Minimal code to reproduce
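The version details above can be collected in one go. This sketch reads the dqlitepy version from the installed package metadata rather than importing dqlitepy, so it still works when the import itself is what fails. The reported value may therefore differ from `dqlitepy.__version__` on a broken install. `environment_report` is a hypothetical helper.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "dqlitepy") -> str:
    """Hypothetical helper: collect the version details requested for bug reports."""
    try:
        # Reads installed distribution metadata; does not import the package
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    lines = [
        f"{package}: {pkg_version}",
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.platform()}",
        f"Machine: {platform.machine()}",
    ]
    return "\n".join(lines)


print(environment_report())
```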

Common Error Messages Reference

Error                     Cause                     Solution
Address already in use    Port conflict             Use a different port or kill the process
Permission denied         Directory permissions     Fix ownership or use a writable directory
No leader available       Election in progress      Wait and retry
Node not running          start() not called        Call node.start() first
Database not open         open_db() not called      Call node.open_db("name.db")
Connection closed         Node stopped              Restart the node and reconnect
Module 'cffi' not found   Missing dependency        pip install cffi

Still having issues? Open an issue on GitHub with details!