dqlitepy Architecture
This document describes the overall architecture of dqlitepy, including how Python bindings interact with the Go shim and the underlying dqlite C library.
Overview
dqlitepy uses a multi-layer architecture that bridges Python applications to the dqlite distributed SQLite engine through a Go-based, C-compatible shim layer.
Component Architecture
1. Python Layer Components
Core API (dqlitepy/node.py, dqlitepy/client.py)
The core API provides high-level Python interfaces for:
- Node Management: Creating, starting, stopping dqlite nodes
- Cluster Management: Adding/removing nodes, querying cluster state
- Configuration: Setting node options (timeouts, compression, etc.)
Key Features:
- Thread-safe operations using `threading.RLock`
- Context manager support for automatic cleanup
- Graceful error handling with custom exception hierarchy
- Automatic node ID generation
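The sketch below illustrates how these features fit together. It is a hypothetical usage example: the `Node` constructor arguments (`data_dir`, `address`) and the top-level import are assumptions, not the confirmed dqlitepy signature.

```python
# Hypothetical usage sketch; constructor arguments and the top-level
# import path are assumptions, not confirmed dqlitepy API.
from dqlitepy import Node

with Node(data_dir="/var/lib/dqlitepy/node1",
          address="127.0.0.1:9001") as node:  # context manager: cleanup on exit
    node.start()     # guarded internally by threading.RLock
    print(node.id)   # node ID is auto-generated if not supplied
# leaving the block stops the node and releases its Go-side handle
```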
DB-API 2.0 Interface (dqlitepy/dbapi.py)
PEP 249 compliant database interface providing standard Python database connectivity:
Features:
- Parameter binding with `?` placeholders
- Transaction support (commit/rollback)
- Multiple fetch methods
- Cursor iteration support
- BLOB and Unicode handling
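A typical PEP 249 session might look like the following. The `connect()` call and its parameters are an assumption; the cursor and transaction semantics are those mandated by the spec.

```python
# PEP 249 usage sketch; the connect() signature is an assumption.
import dqlitepy.dbapi as dbapi

conn = dbapi.connect("127.0.0.1:9001")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))  # ? placeholder binding
    conn.commit()                        # explicit transaction control
    cur.execute("SELECT id, name FROM users WHERE name = ?", ("alice",))
    for row in cur:                      # cursors support iteration
        print(row)                       # fetchone()/fetchmany()/fetchall() also available
finally:
    conn.close()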
FFI Layer (dqlitepy/_ffi.py)
The FFI (Foreign Function Interface) layer uses CFFI to load and interact with the Go shim:
Responsibilities:
- Library discovery and loading
- Platform-specific shared library handling
- C type definitions and function signatures
- Error code translation
- Thread-safe library initialization
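A minimal sketch of how such a CFFI-based loader works is shown below. The `cdef` declarations list only functions named in this document, and their exact signatures, along with the shared-library filenames, are assumptions.

```python
# Minimal CFFI loader sketch; the declared signatures and library
# filenames are assumptions, not the confirmed shim ABI.
import sys
from cffi import FFI

ffi = FFI()
ffi.cdef("""
    const char *dqlitepy_version(void);
    unsigned long long dqlitepy_generate_node_id(const char *address);
    const char *dqlitepy_last_error(void);
""")

def _library_name():
    # Platform-specific shared library handling
    if sys.platform == "darwin":
        return "libdqlitepy.dylib"
    if sys.platform == "win32":
        return "dqlitepy.dll"
    return "libdqlitepy.so"

lib = ffi.dlopen(_library_name())
print(ffi.string(lib.dqlitepy_version()).decode())
```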
2. Go Shim Layer
The Go shim (go/shim/main_with_client.go) provides a C-compatible bridge between Python and go-dqlite:
Key Exports:
| Category | Functions |
|---|---|
| Node Lifecycle | `dqlitepy_node_create`, `dqlitepy_node_start`, `dqlitepy_node_stop`, `dqlitepy_node_destroy` |
| Node Configuration | `dqlitepy_node_set_bind_address`, `dqlitepy_node_set_auto_recovery`, `dqlitepy_node_set_busy_timeout` |
| Client Operations | `dqlitepy_client_create`, `dqlitepy_client_add`, `dqlitepy_client_remove`, `dqlitepy_client_leader` |
| Cluster Management | `dqlitepy_client_cluster`, `dqlitepy_client_close` |
| Utility | `dqlitepy_version`, `dqlitepy_generate_node_id`, `dqlitepy_last_error` |
Memory Management:
```go
// Handle tracking for cleanup
var (
    handleMu      sync.Mutex
    nodeHandles   = make(map[dqlitepy_handle]*app.App)
    clientHandles = make(map[dqlitepy_handle]*client.Client)
    nextHandle    = dqlitepy_handle(1)
)
```
3. C Library Layer
The vendored C libraries (dqlite and the libraries it builds on, such as SQLite) provide the core distributed database functionality.
Data Flow
Node Creation and Startup
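In outline: `Node` obtains a handle from `dqlitepy_node_create`, applies configuration through the setter exports, and then calls `dqlitepy_node_start`, which creates a go-dqlite `app.App` that drives the dqlite C engine. A sketch of the FFI-level calls follows; the function names match the export table above, but argument order and types are assumptions.

```python
# FFI-level sketch of node startup; function names match the shim's
# export table, but argument order and types are assumptions.
handle = lib.dqlitepy_node_create(node_id, b"/var/lib/dqlitepy/node1")
lib.dqlitepy_node_set_bind_address(handle, b"127.0.0.1:9001")

if lib.dqlitepy_node_start(handle) != 0:
    # surface the Go shim's last error string
    raise RuntimeError(ffi.string(lib.dqlitepy_last_error()).decode())

# ... use the node ...

lib.dqlitepy_node_stop(handle)     # see Known Limitations regarding stop()
lib.dqlitepy_node_destroy(handle)  # removes the handle from the Go-side map
```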
Cluster Formation
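In outline: a client connects to any existing node, and new nodes join the cluster through the Raft leader. The sketch below uses a hypothetical high-level `Client` wrapper whose methods mirror the `dqlitepy_client_*` exports but are not confirmed API.

```python
# Hypothetical high-level sketch; Client and its methods mirror the
# dqlitepy_client_* exports but are not confirmed dqlitepy API.
from dqlitepy import Client

with Client("127.0.0.1:9001") as client:             # dqlitepy_client_create
    client.add(node_id=2, address="127.0.0.1:9002")  # dqlitepy_client_add
    client.add(node_id=3, address="127.0.0.1:9003")
    print(client.leader())                           # dqlitepy_client_leader
    print(client.cluster())                          # dqlitepy_client_cluster
# context exit closes the client                     # dqlitepy_client_close
```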
Query Execution (DB-API)
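In outline: `cursor.execute()` binds any `?` parameters, the statement crosses the FFI layer into the Go shim, and go-dqlite routes it to the cluster leader. Writes go through Raft consensus, while reads can be served without it (see the latency profile below), and result rows are marshalled back up to the Python cursor. The DB-API example earlier in this document shows the user-facing side of this flow.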
Thread Safety
dqlitepy implements thread safety at multiple levels:
Python Layer:
- Each `Node` has a `threading.RLock` for state mutations
- Protects `_started`, `_handle`, `_finalizer` attributes
- Ensures atomic start/stop operations
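A minimal sketch of this locking pattern is shown below; the method body is illustrative, not the actual implementation.

```python
# Illustrative locking pattern; the real Node implementation differs.
import threading

class Node:
    def __init__(self):
        self._lock = threading.RLock()  # reentrant: methods may call each other
        self._started = False
        self._handle = None
        self._finalizer = None

    def start(self):
        with self._lock:                # makes start atomic and idempotent
            if self._started:
                return
            self._handle = object()     # stands in for the FFI node handle
            self._started = True
```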
Go Layer:
- Global mutex for handle map operations
- Per-handle locks for concurrent access
- Go's runtime manages goroutine synchronization
Raft Layer:
- All cluster operations go through Raft leader
- Leader serializes all state changes
- Provides linearizable consistency
Error Handling
Exception Hierarchy:
```
Exception
├── DqliteError (base for all dqlite errors)
│   ├── NodeError (node operations)
│   ├── ClientError (client operations)
│   │   ├── ClientClosedError
│   │   ├── ClientConnectionError
│   │   └── ClusterConfigurationError
│   └── DatabaseError (DB-API 2.0 base)
│       ├── DataError
│       ├── IntegrityError
│       ├── NotSupportedError
│       └── OperationalError
└── Warning
    └── ResourceWarning (cleanup issues)
```
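The hierarchy lets application code catch at whatever granularity it needs. A brief sketch, with the import path assumed:

```python
# Exception names come from the hierarchy above; the import path is
# an assumption.
from dqlitepy import DqliteError, ClientConnectionError

try:
    client.add(node_id=2, address="127.0.0.1:9002")
except ClientConnectionError:
    pass  # retry against another node
except DqliteError:
    raise  # any other dqlite failure: log and surface
```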
Performance Characteristics
Memory Usage
| Component | Memory |
|---|---|
| Python Node object | ~1 KB |
| Go `app.App` | ~50-100 KB |
| dqlite node | ~10-50 MB (depends on database size) |
| Per-connection | ~100 KB |
Latency Profile
- Read operations: 0.5-2 ms (no Raft consensus required)
- Write operations: 2-10 ms (requires Raft consensus)
- Leader election: 100-500 ms (during failures)
Scalability
Cluster Size
- Recommended: 3-5 nodes
- Maximum tested: 7 nodes
- Optimal for fault tolerance: 3 nodes (a Raft cluster of n voting nodes tolerates ⌊(n−1)/2⌋ failures, so 3 nodes tolerate 1 and 5 tolerate 2)
Database Size
- SQLite limits: 281 TB theoretical, 140 TB tested
- Practical limit: Depends on disk and memory
- Snapshot transfer: Affects new node join time
Connection Pooling
dqlitepy nodes can handle multiple concurrent connections.
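A minimal pool sketch around the DB-API layer is shown below; `dbapi.connect()` is the same assumption as in the DB-API example above.

```python
# Minimal connection pool sketch; dbapi.connect() is an assumption.
import queue
import dqlitepy.dbapi as dbapi

class ConnectionPool:
    def __init__(self, address, size=4):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(dbapi.connect(address))

    def acquire(self):
        return self._idle.get()  # blocks until a connection is free

    def release(self, conn):
        self._idle.put(conn)
```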
Security Considerations
Network Security
- No built-in encryption: dqlite wire protocol is unencrypted
- Recommendation: Use TLS tunnels (stunnel, wireguard) or private networks
- Authentication: None built-in, rely on network isolation
File System
- Data directory: Should have restricted permissions (700)
- SQLite files: WAL mode requires proper file locking
- Snapshots: Contain full database, protect with encryption at rest
Memory Safety
- Go runtime: Memory safe, garbage collected
- C libraries: Potential for memory bugs (use vendored, tested versions)
- CFFI: Type-safe bindings, validated at runtime
Known Limitations
Upstream Issues
- Segfault in stop(): the `dqlitepy_node_stop()` path into the dqlite C library has a segfault bug
  - Workaround: the finalizer is disabled; nodes must be stopped with explicit `stop()` calls
  - Impact: nodes that are never explicitly stopped are not cleaned up
  - Status: tracked with upstream maintainers
- BLOB serialization: JSON serialization converts bytes to strings
  - Workaround: tests handle both bytes and string types
  - Impact: may require manual conversion in application code
Current Limitations
- No SSL/TLS: Wire protocol is unencrypted
- No authentication: Relies on network security
- Single database per node: Can't host multiple databases in one node
- Synchronous API: No async/await support (yet)