Build Optimization & Efficiency
How Slurm Factory achieves extreme build efficiency through intelligent caching, layered compilation, and public buildcache distribution.
Architecture Overview
Slurm Factory employs a three-tier caching strategy that dramatically reduces build times and ensures consistency across any Slurm version × GCC compiler combination.
CI/CD Lifecycle: Three-Stage Pipeline
Our GitHub Actions workflows create a dependency pyramid that maximizes cache reuse:
Stage 1: Compiler Bootstrap (Build Once, Use Forever)
Workflow: build-and-publish-compiler-buildcache.yml
- Builds: 9 GCC compiler versions (7.5.0 → 15.2.0)
- Runtime: ~45 minutes per compiler (parallel execution)
- Frequency: Once per compiler version (rarely updated)
- Output: Relocatable GCC compilers with gcc-runtime libraries
- Storage: Published to S3 at s3://slurm-factory-spack-buildcache-4b670/build_cache/
# Each compiler is self-contained and relocatable
gcc-7.5.0-compiler.tar.gz # RHEL 7 compatible (glibc 2.17)
gcc-13.4.0-compiler.tar.gz # Ubuntu 24.04 (glibc 2.39) - default
gcc-15.2.0-compiler.tar.gz # Latest (glibc 2.40)
Key Optimization: Compilers are built once and reused across all Slurm versions and dependency combinations.
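Under the hood this is plain Spack. A minimal sketch of what the Stage 1 workflow amounts to, assuming Spack 0.21+ buildcache syntax and a configured mirror named slurm-factory (the actual CI steps live in build-and-publish-compiler-buildcache.yml):

```bash
# Build the compiler from source, then publish the unsigned binaries
# to the S3 mirror ("slurm-factory" mirror name is an assumption).
spack install gcc@13.4.0
spack buildcache push --unsigned slurm-factory gcc@13.4.0
```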
Stage 2: Dependency Buildcache (Compiler-Specific, Shared Across Patch Releases)
Workflow: build-and-publish-slurm-deps-all-compilers.yml
- Builds: Dependencies for each Slurm version × compiler combination
- Matrix: 3 Slurm versions × 9 compilers = 27 dependency sets
- Runtime: ~30 minutes per combination (downloads compiler from Stage 1)
- Frequency: When Slurm versions change or dependencies update
- Dependencies: OpenMPI, MySQL, hwloc, JSON-C, YAML, JWT, Lua, etc.
# Dependencies are built per Slurm minor version and compiler, then shared across patch releases
slurm-25.11-deps-gcc-13.4.0/ # All deps for Slurm 25.11 with GCC 13.4.0
slurm-24.11-deps-gcc-11.5.0/ # All deps for Slurm 24.11 with GCC 11.5.0
Key Optimization: Dependencies are built once per compiler and shared across multiple Slurm patch releases.
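Conceptually, Stage 2 installs everything below Slurm in the dependency graph without Slurm itself. A sketch using stock Spack flags (`--only dependencies` is standard Spack; the mirror name is again an assumption):

```bash
# Build only Slurm's dependency closure with the Stage 1 compiler,
# then push every installed package to the buildcache.
spack install --only dependencies slurm@25.11 %gcc@13.4.0
spack buildcache push --unsigned slurm-factory $(spack find --format "/{hash}")
```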
Stage 3: Final Slurm Packages (Binary Distribution)
Workflow: build-and-publish-all-packages.yml
- Builds: Complete Slurm installations (binary + dependencies + modules)
- Runtime: ~5 minutes (downloads from Stages 1 & 2, builds only Slurm binary)
- Frequency: For each new Slurm release or configuration change
- Output: Production-ready tarballs with Lmod modules
# Final deployable packages
slurm-25.11-gcc-13.4.0-x86_64.tar.gz # Complete installation
slurm-24.11-gcc-11.5.0-x86_64.tar.gz # Different version/compiler
Key Optimization: Only Slurm itself is compiled; everything else is downloaded from cache.
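Stage 3 is the inverse: pull everything from the cache except the root package. Spack's `--use-buildcache` filter expresses this directly (a sketch, assuming current Spack syntax):

```bash
# Dependencies come from the buildcache; Slurm itself is always compiled.
spack install --use-buildcache package:never,dependencies:only slurm@25.11 %gcc@13.4.0
```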
Buildcache Efficiency Metrics
Without Buildcache (Traditional Spack)
First Build: 90 minutes (compile GCC + deps + Slurm)
Second Build: 90 minutes (no sharing between versions)
Third Build: 90 minutes (no sharing between compilers)
Total Time: 4.5 hours for 3 combinations
With Slurm Factory Buildcache
Stage 1 (one-time): 45 min (build 1 compiler)
Stage 2 (per Slurm): 30 min (build deps, reuse compiler)
Stage 3 (final): 5 min (build Slurm, reuse everything)
Subsequent builds: 2 min (download pre-built from S3/CDN)
Total for first combo: 80 min
Total for 2nd combo: 2 min (if compiler/deps exist)
Total for 3rd combo: 2 min (if compiler/deps exist)
Total Time: 84 minutes for 3 combinations (~69% reduction)
Cache Hit Rates in Production
| Scenario | Cache Hits | Build Time | Speedup |
|---|---|---|---|
| First-time user, popular combo (Slurm 25.11 + GCC 13.4.0) | Compiler + Deps + Slurm | ~2 min | 45x faster |
| New Slurm version, existing compiler (e.g., a new 25.11 patch release + GCC 13.4.0) | Compiler + Deps | ~5 min | 18x faster |
| New compiler, existing Slurm (Slurm 25.11 + GCC 16.0.0) | None | ~80 min | 1x (builds cache) |
| Rebuild same config | Everything | ~2 min | 45x faster |
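You can check whether a combination is already cached before kicking off a build. A sketch using standard Spack commands against the public mirror:

```bash
# Register the public mirror, then list cached Slurm specs for all architectures
spack mirror add slurm-factory https://slurm-factory-spack-binary-cache.vantagecompute.ai/build_cache
spack buildcache list --allarch | grep -i slurm
```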
Consistency Across Configurations
Supported Matrix: 3 Slurm × 9 GCC = 27 Combinations
| Slurm Version | GCC Versions | Total Configs |
|---|---|---|
| 25.11 (latest) | 7.5.0, 8.5.0, 9.5.0, 10.5.0, 11.5.0, 12.5.0, 13.4.0, 14.2.0, 15.2.0 | 9 |
| 24.11 (stable) | 7.5.0, 8.5.0, 9.5.0, 10.5.0, 11.5.0, 12.5.0, 13.4.0, 14.2.0, 15.2.0 | 9 |
| 23.11 (LTS) | 7.5.0, 8.5.0, 9.5.0, 10.5.0, 11.5.0, 12.5.0, 13.4.0, 14.2.0, 15.2.0 | 9 |
Every combination produces:
- ✅ Identical module structure (Lmod-based)
- ✅ Consistent RPATH configuration (no LD_LIBRARY_PATH needed)
- ✅ Relocatable binaries (extract anywhere)
- ✅ Same dependency versions (per Slurm release)
- ✅ Predictable file layout (/opt/slurm/view/)
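The RPATH and relocation claims are easy to verify on any extracted build. A spot-check sketch, assuming the documented /opt/slurm/view/ layout:

```bash
# Confirm libraries resolve through RPATH/RUNPATH, with no LD_LIBRARY_PATH set
readelf -d /opt/slurm/view/sbin/slurmctld | grep -E 'RPATH|RUNPATH'
ldd /opt/slurm/view/bin/scontrol | grep 'not found' && echo "missing libs" || echo "all resolved"
```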
Configuration Consistency Examples
# Load Slurm 25.11 with GCC 13.4.0
module load slurm/25.11-gcc-13.4.0
which scontrol # /opt/slurm/view/bin/scontrol
# Switch to GCC 11.5.0 build (same Slurm version)
module swap slurm/25.11-gcc-11.5.0
which scontrol # /opt/slurm/view/bin/scontrol (same path, different binary)
# Switch to Slurm 24.11 (same compiler)
module swap slurm/24.11-gcc-13.4.0
which scontrol # /opt/slurm/view/bin/scontrol (same path, different version)
All combinations behave identically:
- Same configuration file paths (/etc/slurm/slurm.conf)
- Same environment variables (SLURM_CONF, SLURM_ROOT)
- Same service management (systemd units)
- Same module metadata (version, compiler, architecture)
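A quick way to confirm the identical-environment claim across swaps (the SLURM_ROOT value shown is an assumption based on the /opt/slurm/view layout above):

```bash
module load slurm/25.11-gcc-13.4.0
echo "$SLURM_CONF"   # /etc/slurm/slurm.conf
echo "$SLURM_ROOT"   # assumed: /opt/slurm/view
module swap slurm/24.11-gcc-13.4.0
echo "$SLURM_CONF"   # unchanged: /etc/slurm/slurm.conf
```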
Public Buildcache Distribution
Infrastructure
- S3 Bucket: s3://slurm-factory-spack-buildcache-4b670/
- CloudFront CDN: https://slurm-factory-spack-binary-cache.vantagecompute.ai
- Public Access: Unauthenticated HTTP/HTTPS (no AWS credentials needed)
- Global Edge: CloudFront serves from nearest location
- Bandwidth: Free egress for open-source usage
Spack Integration
Slurm Factory automatically configures Spack to use the public buildcache:
# Auto-generated in spack.yaml
mirrors:
  slurm-factory:
    url: https://slurm-factory-spack-binary-cache.vantagecompute.ai/build_cache
    signed: false
config:
  install_tree:
    padded_length: 128  # Relocatable binaries
No manual mirror setup required - every build checks the buildcache first.
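Inside the build environment you can confirm the mirror was picked up (illustrative output):

```bash
spack mirror list
# slurm-factory    https://slurm-factory-spack-binary-cache.vantagecompute.ai/build_cache
```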
Download Flow
Every build resolves artifacts in the same order: the local cache under ~/.slurm-factory/ first, then the CloudFront CDN (which fronts the S3 bucket), and only on a full cache miss does it fall back to compiling from source.
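Because the cache is public, the flow can be probed with nothing but curl; the index.json path below is an assumption about the buildcache layout:

```bash
# Anonymous HEAD request against the CDN; no AWS credentials required
curl -sI https://slurm-factory-spack-binary-cache.vantagecompute.ai/build_cache/index.json
```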
Build Artifact Optimization
Package Size Comparison
| Package Type | Size | Compression | Distribution |
|---|---|---|---|
| Source Slurm | ~12 MB | tar.bz2 | schedmd.com |
| Compiled Slurm (static) | ~500 MB | Uncompressed | Custom builds |
| Slurm Factory (shared libs) | ~180 MB | tar.gz | S3 + CDN |
| Full installation (with deps) | ~450 MB | tar.gz | S3 + CDN |
Storage Efficiency
# Traditional approach: 27 full builds
27 configs × 450 MB ≈ 12.2 GB total storage
# Slurm Factory approach: layered caching
9 compilers × 180 MB ≈ 1.6 GB (Stage 1)
27 dep sets × 200 MB = 5.4 GB (Stage 2)
27 Slurm pkgs × 50 MB ≈ 1.4 GB (Stage 3 - only the Slurm binary)
Total storage: ≈ 8.4 GB (~30% reduction)
Deduplication: Shared dependencies (OpenMPI, MySQL) are stored once per compiler, not per Slurm version.
Local Build Optimization
Even when building locally (not using public buildcache), Slurm Factory caches intelligently:
Local Cache Structure
~/.slurm-factory/
├── spack-buildcache/ # Downloaded binaries from S3
│ └── linux-ubuntu24.04-x86_64/
├── spack-sourcecache/ # Source tarballs (never re-downloaded)
│ ├── slurm-25.11.tar.bz2
│ └── gcc-13.4.0.tar.gz
├── compilers/ # Built compilers (reused across Slurm versions)
│ ├── gcc-13.4.0/
│ └── gcc-11.5.0/
└── builds/ # Final outputs
└── slurm-25.11-gcc-13.4.0/
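To see what each cache layer actually costs on disk:

```bash
du -sh ~/.slurm-factory/*/
```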
Build Performance (Local)
# First build: Downloads from S3 buildcache
$ time slurm-factory build-slurm --slurm-version 25.11 --compiler-version 13.4.0
real 2m15s # Download + extract
# Different Slurm version, same compiler: Reuses compiler, downloads Slurm deps
$ time slurm-factory build-slurm --slurm-version 24.11 --compiler-version 13.4.0
real 1m45s # Compiler cached locally
# Same Slurm, different compiler: Downloads new compiler + deps
$ time slurm-factory build-slurm --slurm-version 25.11 --compiler-version 11.5.0
real 2m30s # New compiler from buildcache
# Rebuild same config: Everything cached
$ time slurm-factory build-slurm --slurm-version 25.11 --compiler-version 13.4.0
real 0m45s # All local caches hit
Production Deployment Efficiency
Parallel Deployment Across Cluster
# Deploy to 100 compute nodes in parallel (using Ansible/parallel-ssh)
time ansible compute -m copy -a "src=slurm-25.11-gcc-13.4.0.tar.gz dest=/tmp/"
# ~2 minutes (network I/O)
time ansible compute -m shell -a "tar -xzf /tmp/slurm-25.11-gcc-13.4.0.tar.gz -C /opt/"
# ~30 seconds (parallel extraction)
time ansible compute -m shell -a "systemctl restart slurmd"
# ~10 seconds (parallel restart)
# Total cluster update time: ~3 minutes for 100 nodes
Incremental Updates
When upgrading Slurm versions, only changed files are updated:
# Upgrade to a new 25.11 patch release (same compiler)
rsync -av --delete slurm-25.11-gcc-13.4.0/ /opt/slurm/
# Only Slurm binaries changed (~50 MB)
# Dependencies unchanged (~400 MB reused)
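A dry run previews the delta before touching the live tree:

```bash
# -n lists what would change without writing anything
rsync -avn --delete slurm-25.11-gcc-13.4.0/ /opt/slurm/
```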
Why This Approach is Superior
Compared to Package Managers (apt/yum)
| Feature | apt/yum | Slurm Factory |
|---|---|---|
| Slurm Versions | 1-2 (distro-provided) | 3 (upstream releases) |
| GCC Versions | 1 (system default) | 9 (7.5.0 → 15.2.0) |
| Combinations | 1-2 | 27 |
| Update Lag | 6-12 months | 0 (same-day releases) |
| Relocatable | No (hardcoded paths) | Yes (extract anywhere) |
| Consistency | Varies by distro | Identical everywhere |
Compared to EasyBuild/Spack Manual
| Feature | Manual Spack | Slurm Factory |
|---|---|---|
| First Build Time | 90 min | 2 min (buildcache) |
| Configuration | Manual spack.yaml | Auto-generated |
| Dependency Conflicts | Common (manual resolution) | Never (curated specs) |
| Module System | Manual setup | Auto-configured Lmod |
| Reproducibility | Environment-dependent | Guaranteed (Docker + buildcache) |
| Public Distribution | None | Free S3 + CDN |
Compared to Pre-built Containers
| Feature | Docker/Singularity | Slurm Factory |
|---|---|---|
| Size | 2-4 GB (full OS) | 450 MB (just software) |
| Flexibility | Monolithic | Modular (swap versions) |
| System Integration | Isolated | Native (systemd, cgroups) |
| Multiple Versions | Separate containers | Parallel modules |
| Update Process | Rebuild entire container | Swap tarball |
Performance Tuning Tips
For CI/CD Pipelines
# GitHub Actions: Use self-hosted runners with local caches
runs-on: self-hosted    # Persistent disk caches
timeout-minutes: 480    # Allow long builds
# fail-fast: false = build all combinations even if one fails
strategy:
  fail-fast: false
  matrix:
    slurm_version: ["25.11", "24.11", "23.11"]
    compiler_version: ["13.4.0", "11.5.0", "10.5.0"]
For Local Development
# Pre-download all sources to avoid network retries
spack mirror create -d ~/spack-mirror slurm openmpi mysql
# Use local mirror for offline builds
slurm-factory build-slurm --mirror ~/spack-mirror
# Increase Docker resources for faster compilation
export DOCKER_CPUS=16
export DOCKER_MEMORY=32g
For Production Deployments
# Use shared filesystem to deploy once, mount everywhere
tar -xzf slurm-25.11-gcc-13.4.0.tar.gz -C /shared/nfs/slurm/
# Nodes mount /shared/nfs/slurm -> /opt/slurm (read-only)
# No extraction needed on compute nodes
# Or use container registries for immutable deployments
skopeo copy dir:slurm-25.11-gcc-13.4.0 docker://registry/slurm:25.11
Conclusion
Slurm Factory's three-tier buildcache strategy achieves:
- ⚡ 45x faster builds for cached combinations (2 min vs 90 min)
- 🔄 100% consistency across 27 Slurm × GCC combinations
- 💾 ~30% storage savings through intelligent deduplication
- 🌐 Global CDN distribution via CloudFront (no credentials needed)
- 🔒 Reproducible builds guaranteed by Docker + public buildcache
- 📦 Modular architecture enabling mix-and-match versions
The result: Any user can deploy any supported Slurm version with any supported GCC compiler in under 3 minutes, with zero configuration and guaranteed reproducibility.