Troubleshooting
Common issues and their solutions when working with dqlitepy.
Installation Issues
Module Not Found: cffi
Error:
ModuleNotFoundError: No module named 'cffi'
Solution:
pip install cffi
# or with uv
uv pip install cffi
Wheel Installation Fails
Error:
ERROR: dqlitepy-0.2.0-py3-none-linux_x86_64.whl is not a supported wheel on this platform.
Solutions:
- Ensure you're on Linux x86_64
- Check Python version (requires 3.8+)
- Verify platform:
python3 -c "import platform; print(platform.platform())"
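The platform and Python version checks above can also be run together from Python. This is a minimal sketch: `is_supported_platform` is a hypothetical helper, not part of dqlitepy, and it encodes only the two requirements listed above (Linux x86_64, Python 3.8+).

```python
import platform
import sys


def is_supported_platform(system, machine, version):
    """Return True for Linux x86_64 with Python 3.8 or newer.

    Hypothetical helper; it only mirrors the requirements listed above.
    """
    return (
        system == "Linux"
        and machine == "x86_64"
        and tuple(version[:2]) >= (3, 8)
    )


if __name__ == "__main__":
    ok = is_supported_platform(
        platform.system(), platform.machine(), sys.version_info
    )
    status = "supported" if ok else "not supported"
    print(f"{platform.system()} {platform.machine()} "
          f"Python {platform.python_version()}: {status}")
```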
Node Startup Issues
Address Already in Use
Error:
NodeStartError: Address already in use: 127.0.0.1:9001
Solutions:
# Find process using the port
lsof -i :9001
# or
netstat -tulpn | grep 9001
# Kill the process
kill <PID>
# Or use a different port
node = Node("127.0.0.1:9002", "/data")
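If lsof or netstat are not available, a port can be probed from Python before starting the node. The sketch below uses only the standard library; `port_is_free` is a hypothetical helper name, not a dqlitepy function.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    for port in (9001, 9002):
        state = "free" if port_is_free("127.0.0.1", port) else "in use"
        print(f"127.0.0.1:{port} is {state}")
```

The result is only a snapshot: another process may take the port between the check and the node start, so the node should still be prepared for the error.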
Permission Denied on Data Directory
Error:
NodeStartError: Permission denied: /var/lib/dqlite
Solutions:
import os
# Create the directory with correct permissions (needs write access to /var/lib, e.g. run as root)
os.makedirs("/var/lib/dqlite", mode=0o755, exist_ok=True)
# Or use a directory you have access to
node = Node("127.0.0.1:9001", "/tmp/dqlite-data")
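To find out whether a candidate data directory is usable before starting the node, one option is to try creating a temporary file in it. This is a sketch using the standard library only; `is_writable_dir` is a hypothetical helper.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if path is an existing directory where files can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file is a more direct test than os.access().
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for candidate in ("/var/lib/dqlite", "/tmp/dqlite-data"):
        verdict = "writable" if is_writable_dir(candidate) else "not writable"
        print(f"{candidate}: {verdict}")
```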
Corrupted Data Directory
Error:
NodeStartError: database disk image is malformed
Solutions:
# Backup the corrupted data
mv /var/lib/dqlite /var/lib/dqlite.corrupted
# Start fresh
mkdir /var/lib/dqlite
# Node will create new database files
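The same backup-and-recreate steps can be scripted in Python. The sketch below assumes the node is stopped first; `set_aside_data_dir` is a hypothetical helper, and the timestamped backup name is an illustrative choice so that repeated runs do not overwrite earlier backups.

```python
import os
import shutil
import time


def set_aside_data_dir(data_dir):
    """Move data_dir to a timestamped backup and recreate it empty.

    Returns the backup path. Stop the node before calling this.
    """
    data_dir = os.path.normpath(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = f"{data_dir}.corrupted-{stamp}"
    shutil.move(data_dir, backup)      # keep the old files for inspection
    os.makedirs(data_dir, mode=0o755)  # fresh, empty directory for the node
    return backup
```

For example, `set_aside_data_dir("/var/lib/dqlite")` would leave an empty /var/lib/dqlite and return the path of the backup copy.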
Cluster Formation Issues
Nodes Won't Form Cluster
Problem: Nodes start but don't see each other.
Check:
# Verify addresses are correct
from dqlitepy import Client
client = Client(["192.168.1.101:9001"])
nodes = client.cluster()
print(f"Visible nodes: {len(nodes)}")
Solutions:
- Ensure all nodes use the same cluster address list
- Verify network connectivity:
ping 192.168.1.102
- Check that firewall rules allow the dqlite ports
- Use specific IPs, not 0.0.0.0 or localhost
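ping only confirms that a host answers ICMP; it does not show whether the dqlite port itself is reachable through the firewall. A TCP connection attempt covers that case. The sketch below uses only the standard library, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(address, timeout=2.0):
    """Return True if a TCP connection to "host:port" succeeds within timeout."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example usage (addresses are placeholders for your own nodes):
#   for addr in ["192.168.1.101:9001", "192.168.1.102:9001"]:
#       print(addr, "reachable" if can_connect(addr) else "unreachable")
```

Run the check from each node against every other node, since firewall rules can differ per direction.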
Split Brain: Multiple Leaders
This should not happen: Raft allows at most one leader per term, and a leader can only commit writes with acknowledgement from a majority of nodes.
If you think you're seeing multiple leaders:
- Check whether you're looking at different clusters
- Verify all nodes use the same cluster addresses
- Check that the network isn't partitioned
Cluster Has No Quorum
Error:
NoLeaderError: No leader available (quorum lost)
Diagnosis:
from dqlitepy import Client
client = Client(["192.168.1.101:9001"])
nodes = client.cluster()
total = len(nodes)
quorum = (total // 2) + 1
print(f"Need {quorum} nodes out of {total}")
# Check how many are reachable
Solutions:
- Restart failed nodes
- Wait for elections (typically 1-5 seconds)
- Check network connectivity
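The quorum arithmetic from the diagnosis above can be made concrete with a small worked example. This sketch assumes every node counts towards the majority; `quorum_size` and `has_quorum` are hypothetical helper names.

```python
def quorum_size(total_nodes):
    """Smallest majority of a cluster with total_nodes members."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes, reachable_nodes):
    """Return True if enough nodes are reachable to elect a leader."""
    return reachable_nodes >= quorum_size(total_nodes)


for total in (1, 3, 4, 5):
    tolerated = total - quorum_size(total)
    print(f"{total} nodes: quorum {quorum_size(total)}, "
          f"tolerates {tolerated} failure(s)")
```

Note that a 4-node cluster tolerates the same single failure as a 3-node cluster, which is why odd cluster sizes are the usual choice.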
Operation Failures
NoLeaderError During Writes
Error:
NoLeaderError: No leader elected yet
Cause: Leader election in progress (cluster startup or after leader failure).
Solution: Retry with exponential backoff
import time
from dqlitepy.exceptions import NoLeaderError
def retry_on_no_leader(func, max_retries=5, base_delay=1):
    """Retry operation during leader election."""
    for attempt in range(max_retries):
        try:
            return func()
        except NoLeaderError:
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)  # Exponential backoff
                print(f"No leader, waiting {delay}s...")
                time.sleep(delay)
            else:
                raise
# Use it
retry_on_no_leader(lambda: node.exec("INSERT INTO users VALUES (1, 'Alice')"))
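When many clients retry at once, a possible refinement is to cap the delay and add random jitter so the retries do not arrive in lockstep. The following is a sketch of that variant, not part of dqlitepy: `backoff_delays` and `retry` are hypothetical names, and the exception type is passed in so the example runs without the library installed.

```python
import random
import time


def backoff_delays(count, base_delay=1.0, max_delay=10.0):
    """Yield `count` capped exponential delays with full jitter."""
    for attempt in range(count):
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        yield random.uniform(0, ceiling)


def retry(func, retry_on, max_retries=5, base_delay=1.0, max_delay=10.0,
          sleep=time.sleep):
    """Call func() up to max_retries times, sleeping between failed attempts."""
    for delay in backoff_delays(max_retries - 1, base_delay, max_delay):
        try:
            return func()
        except retry_on:
            sleep(delay)
    return func()  # Final attempt: let any exception propagate.
```

With dqlitepy this could be called as `retry(lambda: node.exec("INSERT INTO users VALUES (1, 'Alice')"), NoLeaderError)`.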
Database Locked
Error:
OperationalError: database is locked
Causes:
- Concurrent writes without proper transaction management
- Long-running transaction holding locks
Solutions:
# Use proper transaction management
node.begin()
try:
    node.exec("INSERT INTO users VALUES (1, 'Alice')")
    node.exec("INSERT INTO posts VALUES (1, 1, 'Hello')")
    node.commit()
except Exception:
    node.rollback()
    raise
# Or use smaller transactions
node.exec("INSERT INTO users VALUES (1, 'Alice')") # Auto-commits
node.exec("INSERT INTO posts VALUES (1, 1, 'Hello')") # Auto-commits
SQLAlchemy Column Order Issues
Problem: Data appears in wrong columns.
This is fixed in dqlitepy >= 0.2.0. If you see this:
# Check your version
import dqlitepy
print(dqlitepy.__version__) # Should be >= 0.2.0
If older, rebuild:
cd /path/to/dqlitepy
bash scripts/build_wheel_docker.sh
pip install --force-reinstall dist/dqlitepy-*.whl
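When checking the version in code, compare numeric components, not the raw strings: as strings, "0.10.0" sorts before "0.2.0". A minimal sketch, with `version_tuple` and `has_column_order_fix` as hypothetical helpers:

```python
def version_tuple(version):
    """Convert a dotted version like "0.2.0" into a tuple of ints.

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_column_order_fix(version):
    """Return True if version is at least 0.2.0."""
    return version_tuple(version) >= (0, 2, 0)


print(has_column_order_fix("0.1.9"))   # False
print(has_column_order_fix("0.10.0"))  # True
```

For pre-release or otherwise complex version strings, the `Version` class from the third-party `packaging` library is a more robust option.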
Performance Issues
Slow Writes
Symptoms: Write operations taking seconds.
Causes:
- Network latency between nodes
- Disk I/O bottleneck
- Too many nodes (more replicas = slower writes)
Solutions:
# Batch operations in transactions
node.begin()
for i in range(1000):
    node.exec("INSERT INTO users VALUES (?, ?)", [i, f"user{i}"])
node.commit()  # Much faster than 1000 individual commits
# Use faster storage
# SSD vs HDD can make 10x difference
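For very large imports, a single transaction can itself become a long-running one that holds locks, so a middle ground is to commit in fixed-size batches. The sketch below shows only the batching logic; `chunked` is a hypothetical helper and the batch size of 1000 is an arbitrary example.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


rows = [(i, f"user{i}") for i in range(2500)]
for batch in chunked(rows, 1000):
    # With dqlitepy, node.begin(), the inserts for this batch,
    # and node.commit() would go here.
    print(f"would commit {len(batch)} rows in one transaction")
```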
Slow Reads from Followers
Symptoms: Queries slower on follower nodes.
Cause: Follower nodes may be catching up on replication.
Solutions:
- Query the leader for critical reads
- Ensure followers have adequate CPU/disk
- Accept slightly stale reads on followers
Memory Usage Growing
Symptoms: Node memory increases over time.
Causes:
- Raft log growing (normal)
- Connection leaks
Solutions:
# Always close connections
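# Open the connection before the try block, so that a failed connect()
# does not trigger conn.close() on an undefined name
conn = connect(node, "db.sqlite")
try:
    # ... use connection ...
    pass
finally:
    conn.close()
# Or use context manager
from contextlib import closing
with closing(connect(node, "db.sqlite")) as conn:
    # ... use connection ...
    pass  # Auto-closes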
SQLAlchemy Issues
Session Errors After Node Restart
Error:
OperationalError: connection closed
Solution: Create new session after restarts
# Close old session
session.close()
# Restart node
node.stop()
node.start()
node.open_db("app.db")
# Register again
register_dqlite_node(node, "app.db")
# Create new session
Session = sessionmaker(bind=engine)
session = Session()
Transaction Conflicts
Error:
IntegrityError: UNIQUE constraint failed
Solution: Handle conflicts in application code
from sqlalchemy.exc import IntegrityError
try:
    session.add(new_user)
    session.commit()
except IntegrityError:
    session.rollback()
    # Handle duplicate
    print("User already exists")
Docker Issues
Containers Can't Communicate
Problem: Nodes in different containers can't connect.
Solution: Use Docker networks
# docker-compose.yml
networks:
  dqlite-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
services:
  node1:
    networks:
      dqlite-net:
        ipv4_address: 172.20.0.11
Data Lost on Container Restart
Problem: Database resets when container restarts.
Solution: Use volumes
services:
  node1:
    volumes:
      - node1-data:/data
volumes:
  node1-data:
Debugging Tips
Enable Debug Logging
import logging
# Enable dqlitepy debug logs
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('dqlitepy')
logger.setLevel(logging.DEBUG)
Check Node Status
def diagnose_node(node):
    """Print node status for debugging."""
    print(f"Address: {node.address}")
    print(f"ID: {node.id}")
    print(f"Data dir: {node.data_dir}")
    print(f"Running: {node.is_running}")
    if node.is_running:
        try:
            result = node.query("SELECT 1")
            print("Database accessible: Yes")
        except Exception as e:
            print(f"Database accessible: No ({e})")

diagnose_node(node)
Check Cluster State
def diagnose_cluster(client):
    """Print cluster state for debugging."""
    try:
        leader = client.leader()
        print(f"Leader: {leader}")
    except Exception as e:
        print(f"Leader: Unknown ({e})")
    try:
        nodes = client.cluster()
        print(f"\nCluster nodes: {len(nodes)}")
        for node in nodes:
            print(f" - Node {node.id}: {node.address}")
            print(f" Role: {node.role_name}")
    except Exception as e:
        print(f"Cannot list cluster: {e}")

from dqlitepy import Client
client = Client(["192.168.1.101:9001"])
diagnose_cluster(client)
Verify Build
# Check if Go library is included
unzip -l dist/dqlitepy-*.whl | grep libdqlitepy
# Should see:
# dqlitepy/_lib/linux-amd64/libdqlitepy.so
Getting Help
If you're still stuck:
- Check the logs: Enable debug logging (see above)
- Minimal reproduction: Create the simplest case that shows the issue
- Check examples: See if the examples/ directory has a similar use case
- GitHub Issues: Open an issue at https://github.com/vantagecompute/dqlitepy/issues
Include in your report:
- dqlitepy version: python -c "import dqlitepy; print(dqlitepy.__version__)"
- Python version: python --version
- OS: uname -a (Linux) or systeminfo (Windows)
- Complete error message and stack trace
- Minimal code to reproduce
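The version and platform details for a report can be gathered in one step. This is a sketch using only the standard library (Python 3.8+); `collect_report_info` is a hypothetical helper, and it assumes the installed distribution is named dqlitepy, as the wheel filename suggests.

```python
import platform
import sys
from importlib import metadata


def collect_report_info(package="dqlitepy"):
    """Gather version and platform details for a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        f"{package} version": package_version,
        "Python version": sys.version.split()[0],
        "OS": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in collect_report_info().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue together with the error message and stack trace.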
Common Error Messages Reference
| Error | Cause | Solution |
|---|---|---|
| Address already in use | Port conflict | Use different port or kill process |
| Permission denied | Directory permissions | Fix ownership or use writable directory |
| No leader available | Election in progress | Wait and retry |
| Node not running | Start() not called | Call node.start() first |
| Database not open | open_db() not called | Call node.open_db("name.db") |
| Connection closed | Node stopped | Restart node and reconnect |
| Module 'cffi' not found | Missing dependency | pip install cffi |
Still having issues? Open an issue on GitHub with details!