Deployment Guide
Deploy relocatable Slurm packages to HPC clusters. Packages can be built locally or downloaded from S3.
Quick Start
Option 1: Build Locally
# Build package (default: GCC 13.4.0 for Ubuntu 24.04)
slurm-factory build-slurm --slurm-version 25.11
# Build for RHEL 8 / Ubuntu 20.04 compatibility
slurm-factory build-slurm --slurm-version 25.11 --compiler-version 10.5.0
# Build for RHEL 7 compatibility
slurm-factory build-slurm --slurm-version 25.11 --compiler-version 7.5.0
# Deploy
sudo tar -xzf ~/.slurm-factory/builds/slurm-25.11-gcc13.4.0-software.tar.gz -C /opt/
cd /opt && sudo ./data/slurm_assets/slurm_install.sh --full-init --cluster-name mycluster
# Load module
module load slurm/25.11-gcc13.4.0
Option 2: Download Pre-Built Package from S3
# Download from S3
aws s3 cp s3://vantagecompute-slurm-builds/slurm-25.11-gcc13.4.0-software.tar.gz /tmp/
# Deploy
sudo tar -xzf /tmp/slurm-25.11-gcc13.4.0-software.tar.gz -C /opt/
cd /opt && sudo ./data/slurm_assets/slurm_install.sh --full-init --cluster-name mycluster
# Load module
module load slurm/25.11-gcc13.4.0
See Build Artifacts for all available S3 packages.
Cross-Distro Compatibility
Use --compiler-version to build packages compatible with older distributions:
| Compiler Version | Compatible Distros | glibc |
|---|---|---|
| 14.2.0 | Ubuntu 24.10+, Fedora 40+ | 2.40+ |
| 13.4.0 (default) | Ubuntu 24.04+, Debian 13+ | 2.39 |
| 12.5.0 | Ubuntu 23.10+, Debian 12+ | 2.38 |
| 11.5.0 | Ubuntu 22.04+, Debian 12+ | 2.35 |
| 10.5.0 | RHEL 8+, Ubuntu 20.04+, Debian 11+ | 2.31 |
| 8.5.0 | RHEL 8+, CentOS 8+ | 2.28 |
Build time: Older compilers add 30-60 minutes for toolchain bootstrap on first build.
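The table lookup above can be scripted. The following is a minimal sketch, assuming the glibc column is the minimum glibc a package built with that compiler needs; the function names version_ge and pick_compiler_version are hypothetical and not part of slurm-factory.

```shell
#!/bin/sh
# Hypothetical helper: map a host glibc version to the newest compiler
# version listed in the compatibility table.

# Succeeds when version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

pick_compiler_version() {
    host_glibc=$1
    # "compiler:glibc" pairs copied from the table, newest first.
    for pair in 14.2.0:2.40 13.4.0:2.39 12.5.0:2.38 11.5.0:2.35 10.5.0:2.31 8.5.0:2.28; do
        compiler=${pair%%:*}
        min_glibc=${pair##*:}
        if version_ge "$host_glibc" "$min_glibc"; then
            echo "$compiler"
            return 0
        fi
    done
    echo "no compiler in the table targets glibc $host_glibc" >&2
    return 1
}

# Usage on the oldest node you need to support (glibc systems only):
#   pick_compiler_version "$(getconf GNU_LIBC_VERSION | awk '{print $2}')"
```

The result can be passed to --compiler-version. Run the check on the oldest target node, not on the build host.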
Installation Script Options
- --full-init: Complete installation (users, dirs, configs, services)
- --head-node-init: Install MySQL, InfluxDB, SSSD dependencies
- --start-services: Start Slurm daemons
- --cluster-name NAME: Set cluster name
- --org-id ID: Organization ID for LDAP
- --sssd-binder-password PW: SSSD password
- --ldap-uri URI: LDAP server URI
Examples (run from data/slurm_assets/, where slurm_install.sh lives):
# Head node
sudo ./slurm_install.sh --head-node-init --full-init --start-services --cluster-name prod
# Compute node
sudo ./slurm_install.sh --full-init --start-services --cluster-name prod
# With LDAP
sudo ./slurm_install.sh --full-init --org-id myorg --ldap-uri ldap://ldap.example.com
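When the same script provisions both node types, the flag sets from the head-node and compute-node examples can be selected by role. This is a sketch only; install_flags is a hypothetical helper, and the flag combinations are copied from the examples above, not from the installer's source.

```shell
#!/bin/sh
# Hypothetical wrapper: print the slurm_install.sh flags for a node role,
# mirroring the head-node and compute-node examples.
install_flags() {
    role=$1
    cluster=$2
    case "$role" in
        head)    echo "--head-node-init --full-init --start-services --cluster-name $cluster" ;;
        compute) echo "--full-init --start-services --cluster-name $cluster" ;;
        *)       echo "unknown role: $role (expected head or compute)" >&2; return 1 ;;
    esac
}

# Usage (left unquoted on purpose so the flags split into separate words;
# this assumes the cluster name contains no whitespace):
#   sudo ./slurm_install.sh $(install_flags head prod)
```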
Package Structure
Each tarball contains:
slurm-{version}-gcc{compiler}-software.tar.gz
├── data/slurm_assets/        # Configuration templates & install script
├── modules/slurm/            # Lmod modulefiles
│   └── {version}-gcc{compiler}.lua
└── view/                     # Slurm installation
    ├── bin/                  # srun, sbatch, squeue, sinfo
    ├── sbin/                 # slurmd, slurmctld, slurmdbd
    └── lib/                  # Libraries & plugins
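Before extracting a tarball, it can help to confirm that it has the layout shown above. The following sketch lists the archive with tar and looks for the three top-level entries; check_package_layout is a hypothetical name, and the check covers directory names only, not contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm a tarball contains the top-level
# entries from the package structure before extracting it.
check_package_layout() {
    tarball=$1
    listing=$(tar -tzf "$tarball") || return 1
    for entry in data/slurm_assets/ modules/slurm/ view/; do
        # Match the entry with or without a leading "./".
        if ! printf '%s\n' "$listing" | grep -Eq "^(\./)?$entry"; then
            echo "missing: $entry" >&2
            return 1
        fi
    done
}

# Usage:
#   check_package_layout /tmp/slurm-25.11-gcc13.4.0-software.tar.gz && echo "layout ok"
```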
Relocatable Deployment
# Custom path deployment
export SLURM_INSTALL_PREFIX=/shared/apps/slurm
module load slurm/25.11-gcc13.4.0
# Verify
which srun
echo $SLURM_ROOT
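The verification step can be turned into a pass/fail check. This sketch assumes that, after the module loads, srun resolves to a path under the prefix you exported; is_under_prefix is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check: succeed when a resolved binary path lives under the
# expected install prefix.
is_under_prefix() {
    path=$1
    prefix=${2%/}   # drop one trailing slash so "/a/" and "/a" behave alike
    case "$path" in
        "$prefix"/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage:
#   is_under_prefix "$(command -v srun)" "$SLURM_INSTALL_PREFIX" \
#       || echo "srun does not come from the relocated install" >&2
```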
Multi-Version Deployment
# Deploy multiple versions
sudo mkdir -p /opt/slurm-25.11 /opt/slurm-24.11 # tar -C needs the target directories to exist
sudo tar -xzf slurm-25.11-gcc13.4.0-software.tar.gz -C /opt/slurm-25.11/
sudo tar -xzf slurm-24.11-gcc11.5.0-software.tar.gz -C /opt/slurm-24.11/
cd /opt/slurm-25.11 && sudo ./data/slurm_assets/slurm_install.sh
cd /opt/slurm-24.11 && sudo ./data/slurm_assets/slurm_install.sh
# Switch versions
module load slurm/25.11-gcc13.4.0 # or slurm/24.11-gcc11.5.0
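With several tarballs on hand, the per-version directory and the module name can be derived from the file name, so they are not typed by hand. A minimal sketch using POSIX parameter expansion follows; both function names are hypothetical, and the file name pattern is the one used throughout this guide.

```shell
#!/bin/sh
# Hypothetical helpers: split "slurm-<version>-gcc<compiler>-software.tar.gz"
# into the pieces used for install directories and module names.
package_slurm_version() {
    name=${1##*/}            # strip any leading directory
    name=${name#slurm-}      # drop the "slurm-" prefix
    echo "${name%%-gcc*}"    # keep everything before "-gcc"
}

package_module_name() {
    name=${1##*/}
    name=${name#slurm-}
    echo "slurm/${name%-software.tar.gz}"
}

# Usage:
#   pkg=slurm-24.11-gcc11.5.0-software.tar.gz
#   dest=/opt/slurm-$(package_slurm_version "$pkg")
#   sudo mkdir -p "$dest" && sudo tar -xzf "$pkg" -C "$dest"
#   module load "$(package_module_name "$pkg")"
```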
Basic Configuration
# Create slurm.conf
sudo tee /etc/slurm/slurm.conf << 'EOF'
ClusterName=mycluster
SlurmctldHost=head01
SlurmUser=slurm
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
NodeName=node[01-04] CPUs=8 RealMemory=32000 State=UNKNOWN
PartitionName=compute Nodes=node[01-04] Default=YES State=UP
EOF
# Start services
sudo systemctl start slurmctld # head node
sudo systemctl start slurmd # compute nodes
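A quick check before starting the daemons can catch a slurm.conf that is missing one of the keys from the example. The sketch below only tests for the presence of key names; check_slurm_conf is a hypothetical helper and does not replace Slurm's own configuration parsing.

```shell
#!/bin/sh
# Hypothetical sanity check: report keys from the example configuration
# that are absent from a slurm.conf file.
check_slurm_conf() {
    conf=$1
    rc=0
    for key in ClusterName SlurmUser AuthType StateSaveLocation NodeName PartitionName; do
        # Allow leading whitespace; lines starting with "#" do not match.
        # -i because Slurm parameter names are case insensitive.
        if ! grep -Eqi "^[[:space:]]*$key=" "$conf"; then
            echo "missing: $key" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_slurm_conf /etc/slurm/slurm.conf && sudo systemctl start slurmctld
```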
Troubleshooting
Module not found:
module avail
echo $MODULEPATH
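If the module is missing from the module avail output, the usual cause is that the modulefile directory is not on MODULEPATH. The sketch below tests for that; in_path_list is a hypothetical helper, and /opt/modules is an assumed location based on extracting the tarball into /opt/.

```shell
#!/bin/sh
# Hypothetical check: succeed when a directory appears in a colon-separated
# path list such as MODULEPATH.
in_path_list() {
    dir=${1%/}
    list=$2
    case ":$list:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *)                      return 1 ;;
    esac
}

# Usage (adjust /opt/modules to where the package's modules/ directory landed):
#   in_path_list /opt/modules "$MODULEPATH" || module use /opt/modules
```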
Library errors:
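ldd "$(which srun)"
# Prepend the Slurm libraries; the expansion avoids a trailing ":" when LD_LIBRARY_PATH is unset
export LD_LIBRARY_PATH=/opt/slurm/view/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}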
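To see at a glance which libraries are unresolved, the ldd output can be filtered for entries marked "not found". A small sketch follows; missing_libs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: read `ldd` output on stdin and print the names of
# libraries the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd "$(command -v srun)" | missing_libs
```

Empty output means every dependency resolved. Any names printed are the libraries to look for under the package's lib directory.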
Permission errors:
sudo chmod 755 /opt/slurm/view/bin/*
sudo chmod 755 /opt/slurm/view/sbin/*