dqlitepy Architecture

This document describes the overall architecture of dqlitepy, including how Python bindings interact with the Go shim and the underlying dqlite C library.

Overview

dqlitepy uses a multi-layer architecture that bridges Python applications to the dqlite distributed SQLite engine through a Go-based, C-compatible shim layer.

Component Architecture

1. Python Layer Components

Core API (dqlitepy/node.py, dqlitepy/client.py)

The core API provides high-level Python interfaces for:

  • Node Management: Creating, starting, stopping dqlite nodes
  • Cluster Management: Adding/removing nodes, querying cluster state
  • Configuration: Setting node options (timeouts, compression, etc.)

Key Features:

  • Thread-safe operations using threading.RLock
  • Context manager support for automatic cleanup
  • Graceful error handling with custom exception hierarchy
  • Automatic node ID generation
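
A minimal usage sketch of the node lifecycle; the constructor arguments and attribute names here are assumptions for illustration, not the verified dqlitepy API:

from dqlitepy import Node

# Explicit lifecycle (argument names are illustrative):
node = Node(data_dir="/var/lib/dqlitepy/node1", address="127.0.0.1:9001")
node.start()
try:
    print(node.id)  # node IDs are generated automatically if not supplied
finally:
    node.stop()

# Or rely on the documented context-manager support for cleanup:
with Node(data_dir="/var/lib/dqlitepy/node1", address="127.0.0.1:9001") as node:
    node.start()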

DB-API 2.0 Interface (dqlitepy/dbapi.py)

A PEP 249-compliant database interface providing standard Python database connectivity.

Features:

  • Parameter binding with ? placeholders
  • Transaction support (commit/rollback)
  • Multiple fetch methods
  • Cursor iteration support
  • BLOB and Unicode handling
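
A typical session might look like the following sketch; the connect() arguments are assumptions, while cursor(), execute(), the ? paramstyle, and the fetch methods are standard DB-API 2.0:

from dqlitepy import dbapi

conn = dbapi.connect("127.0.0.1:9001")  # illustrative arguments
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))  # ? placeholders
    conn.commit()                     # transaction support

    cur.execute("SELECT id, name FROM users")
    print(cur.fetchone())             # one of the multiple fetch methods
    for row in cur:                   # cursor iteration
        print(row)
finally:
    conn.close()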

FFI Layer (dqlitepy/_ffi.py)

The FFI (Foreign Function Interface) layer uses CFFI to load and interact with the Go shim:

Responsibilities:

  • Library discovery and loading
  • Platform-specific shared library handling
  • C type definitions and function signatures
  • Error code translation
  • Thread-safe library initialization
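
A condensed sketch of the CFFI pattern described above. The function names come from the export table below, but the exact C signatures declared here are assumptions:

import threading
from cffi import FFI

ffi = FFI()

# Declare the subset of shim functions we call; signatures are assumed
# for illustration.
ffi.cdef("""
    const char *dqlitepy_version(void);
    unsigned long long dqlitepy_generate_node_id(const char *address);
""")

_lib = None
_lock = threading.Lock()

def load(path="libdqlitepy.so"):
    # Thread-safe, lazy, one-time load of the shared library.
    global _lib
    with _lock:
        if _lib is None:
            _lib = ffi.dlopen(path)
    return _lib

lib = load()
print(ffi.string(lib.dqlitepy_version()).decode())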

2. Go Shim Layer

The Go shim (go/shim/main_with_client.go) provides a C-compatible bridge between Python and go-dqlite:

Key Exports:

Category             Functions
--------             ---------
Node Lifecycle       dqlitepy_node_create, dqlitepy_node_start, dqlitepy_node_stop, dqlitepy_node_destroy
Node Configuration   dqlitepy_node_set_bind_address, dqlitepy_node_set_auto_recovery, dqlitepy_node_set_busy_timeout
Client Operations    dqlitepy_client_create, dqlitepy_client_add, dqlitepy_client_remove, dqlitepy_client_leader
Cluster Management   dqlitepy_client_cluster, dqlitepy_client_close
Utility              dqlitepy_version, dqlitepy_generate_node_id, dqlitepy_last_error

Memory Management:

// Handle tracking for cleanup
var (
    handleMu      sync.Mutex
    nodeHandles   = make(map[dqlitepy_handle]*app.App)
    clientHandles = make(map[dqlitepy_handle]*client.Client)
    nextHandle    = dqlitepy_handle(1)
)
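
The opaque integer handles exist because cgo's pointer-passing rules prevent C (and therefore Python) callers from holding Go pointers; Python keeps only the handle value, and each call re-resolves it to the underlying *app.App or *client.Client under handleMu.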

3. C Library Layer

The vendored C libraries (dqlite and its dependencies) provide the core distributed database functionality.

Data Flow

Node Creation and Startup
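
At a high level (inferred from the shim exports above): Node.start() acquires the node's lock, calls through the FFI layer into dqlitepy_node_create and dqlitepy_node_start, which construct and start a go-dqlite app.App on top of the dqlite C engine, and stores the returned handle for later calls.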

Cluster Formation
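
At a high level (inferred from the exports above): a client connects to an existing member via dqlitepy_client_create, locates the current leader with dqlitepy_client_leader, and registers the new node's ID and address with dqlitepy_client_add; the membership change is then committed through Raft.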

Query Execution (DB-API)
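
At a high level (inferred from the layering above): cursor.execute() binds any ? parameters, sends the statement over the dqlite wire protocol to the connected node, and, for writes, waits for replication through the Raft leader before commit() or rollback() is acknowledged.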

Thread Safety

dqlitepy implements thread safety at multiple levels:

Python Layer:

  • Each Node has a threading.RLock for state mutations
  • Protects _started, _handle, _finalizer attributes
  • Ensures atomic start/stop operations
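
A sketch of this locking pattern, using the attribute names from the list above; the method bodies and the FFI helpers are illustrative, not the verified implementation:

import threading

class Node:
    def __init__(self):
        self._lock = threading.RLock()  # reentrant, so helpers may re-acquire
        self._started = False
        self._handle = None
        self._finalizer = None

    def start(self):
        with self._lock:
            if self._started:
                return                     # atomic, idempotent start
            self._handle = _node_create()  # hypothetical FFI helper
            self._started = True

    def stop(self):
        with self._lock:
            if not self._started:
                return
            _node_stop(self._handle)       # hypothetical FFI helper
            self._started = False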

Go Layer:

  • Global mutex for handle map operations
  • Per-handle locks for concurrent access
  • Go's runtime manages goroutine synchronization

Raft Layer:

  • All cluster operations go through Raft leader
  • Leader serializes all state changes
  • Provides linearizable consistency

Error Handling

Exception Hierarchy:

Exception
├── DqliteError (base for all dqlite errors)
│   ├── NodeError (node operations)
│   ├── ClientError (client operations)
│   │   ├── ClientClosedError
│   │   ├── ClientConnectionError
│   │   └── ClusterConfigurationError
│   └── DatabaseError (DB-API 2.0 base)
│       ├── DataError
│       ├── IntegrityError
│       ├── NotSupportedError
│       └── OperationalError
└── Warning
    └── ResourceWarning (cleanup issues)
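
Callers can catch at whatever granularity they need. A sketch using the class names above (the import path and method names are assumptions):

from dqlitepy import DqliteError, ClientConnectionError

def add_member(client, node_id, address):
    try:
        client.add(node_id, address)   # illustrative client call
    except ClientConnectionError:
        # Transient network failure: caller may retry against another member.
        raise
    except DqliteError as exc:
        # Catch-all for any other dqlitepy failure.
        raise RuntimeError(f"cluster change failed: {exc}") from exc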

Performance Characteristics

Memory Usage

Component            Memory
---------            ------
Python Node object   ~1 KB
Go app.App           ~50-100 KB
dqlite node          ~10-50 MB (depends on database size)
Per-connection       ~100 KB

Latency Profile

  • Read operations: 0.5-2 ms (no Raft consensus required)
  • Write operations: 2-10 ms (requires Raft consensus)
  • Leader election: 100-500 ms (during failures)

Scalability

Cluster Size

  • Recommended: 3-5 nodes
  • Maximum tested: 7 nodes
  • Optimal for fault tolerance: 3 nodes (tolerates 1 failure); in general, a cluster of n voting nodes tolerates floor((n-1)/2) failures

Database Size

  • SQLite limits: 281 TB theoretical, 140 TB tested
  • Practical limit: Depends on disk and memory
  • Snapshot transfer: Affects new node join time

Connection Pooling

dqlitepy nodes can handle multiple concurrent connections, so a simple pool on top of the DB-API layer works; a minimal sketch follows (the connect() call and its arguments are assumptions, not the verified dqlitepy API).
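
import queue
from dqlitepy import dbapi  # connect() arguments below are assumptions

class ConnectionPool:
    def __init__(self, address, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(dbapi.connect(address))

    def acquire(self):
        return self._pool.get()   # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool("127.0.0.1:9001")
conn = pool.acquire()
try:
    cur = conn.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchone())
finally:
    pool.release(conn)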

Security Considerations

Network Security

  • No built-in encryption: dqlite wire protocol is unencrypted
  • Recommendation: Use an encrypted tunnel (stunnel, WireGuard) or a private network
  • Authentication: None built-in, rely on network isolation

File System

  • Data directory: Should have restricted permissions (mode 0700)
  • SQLite files: WAL mode requires proper file locking
  • Snapshots: Contain full database, protect with encryption at rest
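
For example, creating the data directory with mode 0700 from Python:

import os

data_dir = "/var/lib/dqlitepy/node1"   # illustrative path
os.makedirs(data_dir, mode=0o700, exist_ok=True)
os.chmod(data_dir, 0o700)  # makedirs mode is subject to the process umask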

Memory Safety

  • Go runtime: Memory safe, garbage collected
  • C libraries: Potential for memory bugs (use vendored, tested versions)
  • CFFI: Type-safe bindings, validated at runtime

Known Limitations

Upstream Issues

  1. Segfault in stop(): stopping a node via dqlitepy_node_stop() can segfault due to a bug in the underlying dqlite C library

    • Workaround: The automatic finalizer is disabled; callers must invoke stop() explicitly
    • Impact: Nodes are not stopped automatically during cleanup
    • Status: Tracked with upstream maintainers
  2. BLOB Serialization: JSON serialization converts bytes to strings

    • Workaround: Tests handle both bytes and string types
    • Impact: May require manual conversion in application code (see the sketch below)
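
A defensive conversion sketch for values that may round-trip as str:

def as_bytes(value):
    # Values fetched as BLOBs may arrive as str after JSON
    # serialization (see the issue above). Assumes the payload is
    # valid UTF-8; binary-unsafe data needs another scheme (e.g.
    # base64) at the application level.
    if isinstance(value, str):
        return value.encode("utf-8")
    return value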

Current Limitations

  • No SSL/TLS: Wire protocol is unencrypted
  • No authentication: Relies on network security
  • Single database per node: Can't host multiple databases in one node
  • Synchronous API: No async/await support (yet)

Future Enhancements
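
Likely directions, given the limitations above, include async/await support and built-in transport encryption.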
