Architecture
Slurm Factory is a modern Python-based build system that creates relocatable Slurm packages using Docker containers and Spack. The architecture is designed for modularity, reproducibility, and performance with intelligent caching at multiple layers.
System Overview
Module Structure
slurm_factory/
├── main.py # CLI interface
├── builder.py # Build orchestration
├── config.py # Settings & cache management
├── constants.py # Build configuration & templates
├── spack_yaml.py # Dynamic Spack config
├── utils.py # Docker & package creation
└── exceptions.py # Error handling
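If you want to drive builds from Python rather than the shell, the safest route is a thin wrapper around the documented CLI entry point in main.py (see Build Options below) instead of relying on internal module APIs. A minimal sketch:

import subprocess

def build_slurm(slurm_version: str = "25.11", toolchain: str = "noble", gpu: bool = False) -> None:
    # Shell out to the documented CLI (main.py) rather than assuming
    # anything about builder.py's internal API.
    cmd = [
        "slurm-factory", "build-slurm",
        "--slurm-version", slurm_version,
        "--toolchain", toolchain,
    ]
    if gpu:
        cmd.append("--gpu")
    subprocess.run(cmd, check=True)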
Build Pipeline
The build process follows a multi-stage pipeline with intelligent caching:
Stage 1: Compiler Build (Optional)
Outputs:
- Docker image: slurm-factory:compiler-{version}
- Buildcache: ~/.slurm-factory/buildcache/ (local) or S3 (published)
- Tarball: ~/.slurm-factory/compilers/{version}/gcc-{version}-compiler.tar.gz (optional)
Time: 30-60 minutes (first build), 2-5 minutes (from cache)
Stage 2: Slurm Build
Outputs:
- Docker image: slurm-factory:{slurm_version}-gcc{compiler_version}
- Buildcache: ~/.slurm-factory/buildcache/ (local) or S3 (published)
- Tarball: ~/.slurm-factory/builds/slurm-{version}-gcc{compiler}-software.tar.gz
Time:
- From buildcache: 5-15 minutes
- Partial cache: 20-45 minutes
- From source: 45-90 minutes (CPU), 90-180 minutes (GPU)
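A quick way to see which stage will actually run is to check whether its documented output artifact already exists under ~/.slurm-factory. The sketch below uses only the paths listed above; the GCC and Slurm versions in the example are placeholders.

from pathlib import Path

CACHE_ROOT = Path.home() / ".slurm-factory"

def compiler_cached(gcc_version: str) -> bool:
    # Stage 1 tarball output, per the path documented above
    tarball = CACHE_ROOT / "compilers" / gcc_version / f"gcc-{gcc_version}-compiler.tar.gz"
    return tarball.exists()

def slurm_cached(slurm_version: str) -> bool:
    # Stage 2 tarball output; the glob sidesteps the exact compiler/toolchain suffix
    return any((CACHE_ROOT / "builds").glob(f"slurm-{slurm_version}-*-software.tar.gz"))

if __name__ == "__main__":
    print("compiler stage cached:", compiler_cached("13.2.0"))  # placeholder version
    print("slurm stage cached:", slurm_cached("25.11"))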
Caching Strategy
Slurm Factory implements a sophisticated multi-layer caching system:
Cache Hierarchy
Cache Locations
~/.slurm-factory/
├── buildcache/ # Spack binary packages (Layer 2)
│ ├── build_cache/ # Compiled binaries
│ ├── _pgp/ # GPG keys for verification
│ └── *.spec.json # Package specifications
├── sourcecache/ # Downloaded sources (Layer 3)
│ ├── gcc-*.tar.xz
│ ├── slurm-*.tar.bz2
│ └── ...
├── builds/ # Final tarballs (Layer 4)
│ └── slurm-*-software.tar.gz
└── compilers/ # Compiler tarballs (Layer 4)
└── {version}/
└── gcc-*-compiler.tar.gz
Docker Volume Mounts (Layer 1):
~/.slurm-factory/buildcache → /opt/slurm-factory-cache/buildcache
~/.slurm-factory/sourcecache → /opt/slurm-factory-cache/sourcecache
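Because the cache tree is plain files on the host, it can be inspected or pruned with ordinary tools. A small sketch that reports the per-layer footprint (the directory names are exactly the ones shown above):

from pathlib import Path

CACHE_ROOT = Path.home() / ".slurm-factory"

def layer_sizes() -> dict[str, int]:
    # Per-layer footprint in bytes of the cache tree shown above
    sizes = {}
    for layer in ("buildcache", "sourcecache", "builds", "compilers"):
        root = CACHE_ROOT / layer
        sizes[layer] = sum(p.stat().st_size for p in root.rglob("*") if p.is_file()) if root.exists() else 0
    return sizes

if __name__ == "__main__":
    for layer, size in layer_sizes().items():
        print(f"{layer:12s} {size / 1e9:7.2f} GB")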
Cache Performance
| Build Scenario | Time | Cache Hit |
|---|---|---|
| First compiler build | 30-60 min | 0% |
| Subsequent compiler | 2-5 min | 95% |
| First Slurm (source) | 45-90 min | 0% |
| Slurm (compiler cached) | 30-60 min | 20% |
| Slurm (full cache) | 5-15 min | 95% |
| Slurm (GPU, source) | 90-180 min | 0% |
| Slurm (GPU, cached) | 15-25 min | 85% |
Public Buildcache
In addition to local caching, Slurm Factory can use the public buildcache:
Compiler Buildcache:
https://slurm-factory-spack-binary-cache.vantagecompute.ai/compilers/{version}/
Slurm Buildcache:
https://slurm-factory-spack-binary-cache.vantagecompute.ai/slurm/{slurm_version}/{compiler_version}/
Benefits:
- ✅ 10-15x faster builds - Skip compilation entirely
- ✅ 80-90% storage savings - No local compilation required
- ✅ Global CDN - Fast downloads via CloudFront
- ✅ Reproducible - Identical binaries across systems
See Slurm Factory Spack Build Cache for details.
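Slurm Factory configures these mirrors automatically inside the build container, but if you want to point a standalone Spack installation at the same buildcaches, something like the following should work. The mirror names and placeholder versions are assumptions; the URL patterns are the ones listed above.

import subprocess

BASE = "https://slurm-factory-spack-binary-cache.vantagecompute.ai"

def add_public_mirrors(compiler_version: str = "13.2.0", slurm_version: str = "25.11") -> None:
    # Register the public buildcaches as Spack mirrors (mirror names are arbitrary)
    subprocess.run(["spack", "mirror", "add", "slurm-factory-compilers",
                    f"{BASE}/compilers/{compiler_version}/"], check=True)
    subprocess.run(["spack", "mirror", "add", "slurm-factory-slurm",
                    f"{BASE}/slurm/{slurm_version}/{compiler_version}/"], check=True)
    # Install and trust the GPG keys the binaries were signed with
    subprocess.run(["spack", "buildcache", "keys", "--install", "--trust"], check=True)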
Dependency Classification
External (Build-Only):
- cmake, autoconf, automake, gcc, pkg-config
- Not included in final package
Built Fresh (Runtime):
- munge, json-c, curl, openssl, hwloc
- Architecture-optimized, security-critical
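One way to express this split in a generated Spack environment is to exclude the build-only tools from the view, so they never land in the packaged view/ directory. The dict below is only a sketch of the kind of structure spack_yaml.py might emit, not the actual file it produces.

# Build tools from the classification above (gcc is handled via Spack's compiler config)
BUILD_ONLY = ["cmake", "autoconf", "automake", "pkg-config"]

def env_skeleton(slurm_spec: str = "slurm@25.11") -> dict:
    # Roughly the shape of a spack.yaml environment with build-only tools
    # kept out of the view (illustrative; paths and options are assumptions)
    return {
        "spack": {
            "specs": [slurm_spec],
            "view": {
                "default": {
                    "root": "/opt/slurm/view",
                    "exclude": BUILD_ONLY,
                },
            },
        }
    }

if __name__ == "__main__":
    import json
    print(json.dumps(env_skeleton(), indent=2))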
Relocatable Modules
Dynamic module paths using environment variable fallback:
-- Dynamic prefix with fallback
local prefix = os.getenv("SLURM_INSTALL_PREFIX") or "/opt/slurm/view"
prepend_path("PATH", pathJoin(prefix, "bin"))
prepend_path("LD_LIBRARY_PATH", pathJoin(prefix, "lib"))
setenv("SLURM_ROOT", prefix)
Usage:
# Default location
module load slurm/25.11-noble
# Custom location
export SLURM_INSTALL_PREFIX=/shared/apps/slurm
module load slurm/25.11-noble
Package Structure
Single tarball output containing everything needed:
slurm-{version}-{toolchain}-software.tar.gz (2-25GB depending on options)
├── view/ # Slurm installation
│ ├── bin/ # Slurm binaries (srun, sbatch, etc.)
│ ├── sbin/ # Daemons (slurmd, slurmctld, slurmdbd)
│ ├── lib/ # Shared libraries and plugins
│ │ ├── libslurm.so # Core Slurm library
│ │ └── slurm/ # Slurm plugins
│ ├── lib64/ # 64-bit libraries
│ ├── include/ # Header files
│ │ └── slurm/ # Slurm headers
│ └── share/ # Documentation, man pages
│ ├── man/ # Man pages
│ └── doc/ # Documentation
├── modules/slurm/ # Lmod modulefiles
│ └── {version}-{toolchain}.lua # Modulefile for this build
└── data/slurm_assets/ # Configuration and scripts
├── slurm_install.sh # Installation script
├── defaults/ # Systemd unit files
│ ├── slurmd.service
│ ├── slurmctld.service
│ └── slurmdbd.service
├── slurm/ # Configuration templates
│ ├── slurm.conf.template
│ ├── slurmdbd.conf.template
│ └── cgroup.conf.template
└── mysql/ # Database configs
└── slurm_acct_db.sql
Package naming examples:
- slurm-25.11-noble-software.tar.gz - Slurm 25.11 for Ubuntu 24.04
- slurm-24.11-jammy-software.tar.gz - Slurm 24.11 for Ubuntu 22.04
- slurm-23.11-rockylinux9-software.tar.gz - Slurm 23.11 for Rocky Linux 9
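A finished tarball can be sanity-checked against this layout without extracting it. The sketch below only assumes the three top-level directories shown above (entry names may carry a leading ./ depending on how the archive was created).

import tarfile

EXPECTED = ("view", "modules", "data")  # top-level directories from the layout above

def check_package(tarball_path: str) -> None:
    # Light sanity check of a built package without extracting it
    with tarfile.open(tarball_path, "r:gz") as tar:
        top_level = {name.lstrip("./").split("/", 1)[0] for name in tar.getnames()}
    missing = [d for d in EXPECTED if d not in top_level]
    if missing:
        raise RuntimeError(f"{tarball_path} is missing top-level dirs: {missing}")
    print(f"{tarball_path}: top-level entries {sorted(top_level)} look sane")

if __name__ == "__main__":
    check_package("slurm-25.11-noble-software.tar.gz")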
Toolchain Support
Builds support multiple OS toolchains for cross-distribution compatibility:
| Toolchain | Target OS | GCC Version | glibc |
|---|---|---|---|
| resolute | Ubuntu 25.04 | 15.x | 2.41+ |
| noble | Ubuntu 24.04 | 13.x | 2.39 |
| jammy | Ubuntu 22.04 | 11.x | 2.35 |
| rockylinux10 | Rocky Linux 10 / RHEL 10 | 14.x | 2.39+ |
| rockylinux9 | Rocky Linux 9 / RHEL 9 | 11.x | 2.34 |
| rockylinux8 | Rocky Linux 8 / RHEL 8 | 8.x | 2.28 |
Each toolchain uses the OS-provided compiler for maximum binary compatibility.
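Internally this amounts to a small lookup from toolchain name to base image and compiler series. The GCC versions below come from the table above; the container image tags are reasonable guesses, not necessarily the exact bases Slurm Factory uses.

# Toolchain -> (assumed base image, OS-provided GCC series)
TOOLCHAINS = {
    "resolute":     ("ubuntu:25.04", "15"),
    "noble":        ("ubuntu:24.04", "13"),
    "jammy":        ("ubuntu:22.04", "11"),
    "rockylinux10": ("rockylinux/rockylinux:10", "14"),
    "rockylinux9":  ("rockylinux:9", "11"),
    "rockylinux8":  ("rockylinux:8", "8"),
}

def base_image(toolchain: str) -> str:
    try:
        return TOOLCHAINS[toolchain][0]
    except KeyError:
        raise ValueError(f"unknown toolchain: {toolchain}") from None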
Build Options
# CPU-optimized for Ubuntu 24.04 (recommended)
slurm-factory build-slurm --slurm-version 25.11 --toolchain noble
# For Rocky Linux 9 compatibility
slurm-factory build-slurm --slurm-version 25.11 --toolchain rockylinux9
# With GPU support (CUDA/ROCm)
slurm-factory build-slurm --slurm-version 25.11 --toolchain noble --gpu
# Combined: Rocky Linux 8 with GPU support
slurm-factory build-slurm --slurm-version 25.11 --toolchain rockylinux8 --gpu
Docker Integration
Volume Mounts:
Host → Container
~/.slurm-factory/buildcache → /opt/slurm-factory-cache/buildcache
~/.slurm-factory/sourcecache → /opt/slurm-factory-cache/sourcecache
/var/run/docker.sock → /var/run/docker.sock (if needed)
Container Lifecycle:
Build Environment:
- Base: Ubuntu 24.04 LTS
- Spack: v1.0.0
- Python: 3.12+
- Build Tools: gcc, g++, gfortran, make, cmake, autotools
- Libraries: libc6-dev, libssl-dev, etc.
Resource Limits:
# Default (can be customized)
--cpus=0 # Use all available CPUs
--memory=0 # Use all available memory
--shm-size=2g # Shared memory for compilation
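For reference, the same mounts and shared-memory setting expressed with the Docker SDK for Python look roughly like this; the image tag and command are placeholders, and this is a sketch, not the builder's actual code.

import docker  # pip install docker
from pathlib import Path

CACHE_ROOT = Path.home() / ".slurm-factory"

def run_build_container(image: str = "slurm-factory:25.11-gcc13") -> str:
    # Mount the two cache directories and give the build extra shared memory;
    # leaving CPU/memory limits unset means "use everything available".
    client = docker.from_env()
    logs = client.containers.run(
        image,
        command="spack install",  # placeholder build command
        volumes={
            str(CACHE_ROOT / "buildcache"): {
                "bind": "/opt/slurm-factory-cache/buildcache", "mode": "rw"},
            str(CACHE_ROOT / "sourcecache"): {
                "bind": "/opt/slurm-factory-cache/sourcecache", "mode": "rw"},
        },
        shm_size="2g",  # matches --shm-size=2g above
        remove=True,
    )
    return logs.decode()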
Error Handling
Custom exception hierarchy for comprehensive error management:
Exception Types:
- SlurmFactoryError - Base exception with context
- DockerError - Docker operation failures (image build, container run)
- BuildError - Build process failures (compilation, packaging)
- ConfigurationError - Invalid configuration (missing settings, wrong types)
- SpackError - Spack-specific errors (spec resolution, install failures)
Error Context:
All exceptions include:
- Clear error message
- Contextual information (package, version, stage)
- Suggested remediation steps
- Debug information (when verbose mode enabled)
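A minimal sketch of what this hierarchy could look like (illustrative; the real exceptions.py may differ), and what the example below assumes:

class SlurmFactoryError(Exception):
    """Base exception carrying optional context for remediation and debugging."""

    def __init__(self, message: str, context: dict | None = None):
        super().__init__(message)
        self.context = context or {}

    def __str__(self) -> str:
        base = super().__str__()
        if self.context:
            details = ", ".join(f"{k}={v}" for k, v in self.context.items())
            return f"{base} ({details})"
        return base

class DockerError(SlurmFactoryError):
    """Docker operation failures (image build, container run)."""

class BuildError(SlurmFactoryError):
    """Build process failures (compilation, packaging)."""

class ConfigurationError(SlurmFactoryError):
    """Invalid configuration (missing settings, wrong types)."""

class SpackError(SlurmFactoryError):
    """Spack-specific errors (spec resolution, install failures)."""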
Example:
raise BuildError(
"Failed to build package 'slurm'",
context={
"package": "slurm@25.11",
"toolchain": "noble",
"stage": "compilation",
"log_file": "/tmp/slurm-build.log"
}
)
Optimization Features
- Parallel compilation - 4 jobs by default
- ccache - C/C++ object caching
- Docker layer caching - Faster image builds
- Binary package cache - Reuse compiled dependencies
- Hardlinks - Fast file operations
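The first two bullets map directly onto Spack configuration settings; a hedged sketch of the config section spack_yaml.py might emit is below (only build_jobs and ccache are shown, and the values simply reflect the defaults listed above).

SPACK_CONFIG = {
    "config": {
        "build_jobs": 4,   # parallel compilation, 4 jobs by default
        "ccache": True,    # route C/C++ compiles through ccache
    }
}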
Security
Process Isolation
- Docker Containers: All builds run in isolated containers
- User Namespaces: Container processes run as non-root when possible
- Resource Limits: CPU, memory, and disk quotas enforced
- Network Isolation: Containers can be restricted to specific networks
Data Protection
- Checksum Validation: All downloaded sources verified
- GPG Signing: Buildcache packages signed with GPG keys
- Secure Defaults: TLS/SSL enabled for all network connections
- No Privileged Containers: Builds don't require privileged access
Access Control
- User-Specific Caches: Each user has isolated cache directories
- File Permissions: Proper ownership and permissions on outputs
- AWS OIDC: GitHub Actions use temporary credentials (no long-lived keys)
- Secret Management: Sensitive data stored in GitHub Secrets
Supply Chain Security
- Pinned Dependencies: Exact versions specified in spack.yaml
- Source Verification: Downloaded sources verified with checksums
- Binary Verification: GPG signatures on buildcache packages
- Audit Trail: All builds logged with full provenance information
See Also
- Slurm Factory Spack Build Cache - Public buildcache details
- Infrastructure - AWS infrastructure and CDK
- GitHub Actions - CI/CD workflows
- API Reference - CLI and Python API docs