Slurm Workload Manager
Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system designed for Linux clusters of varying sizes, enabling efficient resource allocation, job execution, and contention arbitration among users.[1] Originally developed in 2001 by Lawrence Livermore National Laboratory to address the need for an open-source resource manager on commodity hardware, it was initially released in 2002 as a simple resource manager and has since evolved to support diverse processor types, network architectures, and parallel computing environments.[2] Maintained by SchedMD—a company founded in 2010 by key developers Morris Jette and Danny Auble—Slurm operates under the GNU General Public License version 2 or later, ensuring its free availability and portability across platforms.[2][3][4]
At its core, Slurm allocates exclusive or non-exclusive access to compute nodes, manages job queues, and provides tools for monitoring and administration, such as srun for job initiation, squeue for status checks, and scontrol for configuration.[1] Its architecture includes a centralized controller (slurmctld) with backup support for high availability, node daemons (slurmd) for local execution, and optional components like a database (slurmdbd) for accounting or a REST API (slurmrestd) for integration.[1] Slurm's extensibility through plugins allows customization for features like advanced reservations, backfill scheduling, and license management, making it adaptable to high-performance computing (HPC), AI workloads, and cloud environments.[1][5]
Slurm's prominence in the HPC community is evident in its adoption by approximately 65% of the TOP500 supercomputers, powering some of the world's largest computational resources for scientific simulations, data analysis, and machine learning tasks.[6] Its fault-tolerant design ensures minimal downtime in large-scale deployments, while ongoing development—reflected in releases like version 25.11—continues to enhance scalability and performance for modern infrastructures.[7][2]
Introduction
Overview
Slurm Workload Manager, commonly known as Slurm, is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system designed for Linux-based environments, supporting both large and small clusters.[1] It serves as a critical tool in high-performance computing (HPC) by enabling efficient resource utilization across distributed systems.[8]
The primary functions of Slurm include allocating computational resources to user-submitted jobs, managing diverse workloads through queuing and prioritization, and arbitrating resource contention in multi-user settings via a centralized architecture that coordinates node states and job execution.[9] This setup ensures reliable operation even in the presence of failures, with features like backup controllers to maintain continuity.[10]
As of the November 2025 TOP500 list, Slurm powers approximately 65% of the world's TOP500 supercomputers, underscoring its dominance in large-scale HPC deployments.[6] Originally focused on traditional scientific computing, Slurm has evolved to accommodate modern demands, including support for AI and machine learning workloads through enhanced GPU management and large-scale data processing capabilities.[11]
Purpose and Applications
Slurm Workload Manager serves as an open-source system for efficient resource utilization in Linux-based cluster environments, enabling the allocation of compute nodes, memory, and other resources to user jobs while minimizing idle time through advanced scheduling mechanisms. It supports parallel processing by distributing workloads across multiple nodes and processors, facilitating high-throughput execution of compute-intensive applications. Additionally, Slurm accommodates diverse job types, such as batch jobs submitted via scripted commands for automated execution, interactive jobs for real-time user sessions, and GPU-accelerated tasks that leverage specialized hardware for accelerated computing.[1][12][11]
In practice, Slurm finds primary applications in high-performance computing (HPC) clusters and supercomputers, where it manages resource orchestration for scientific simulations, data analysis, and large-scale modeling. It is extensively deployed in AI and machine learning (AI/ML) training pipelines, optimizing GPU and accelerator usage to handle data-parallel workloads like neural network training. Slurm also supports cloud-hybrid setups, integrating with container orchestration tools to bridge on-premises HPC with cloud resources in research and enterprise settings.[1][11][13]
Key benefits of Slurm include its simplicity as a lightweight, kernel-independent solution that requires minimal configuration for deployment, alongside high portability across various Linux architectures and infrastructures. It imposes low overhead through efficient daemons and centralized control, while offering strong adaptability for large-scale systems, such as those spanning thousands of nodes with fault-tolerant features to ensure continuous operation. These qualities contribute to its widespread adoption for environments demanding reliable, high-performance workload management.[1][14]
Notable deployments highlight Slurm's impact, including its origins and ongoing use at Lawrence Livermore National Laboratory for managing HPC resources in national security and scientific research. It powers approximately 65% of the TOP500 supercomputers globally, underscoring its dominance in exascale computing. As of 2025, commercial AI platforms incorporate Slurm for scalable training clusters, such as those integrated with HPE infrastructure for enterprise AI workloads.[15][6][16]
History and Development
Origins and Early Development
The development of the Slurm Workload Manager began in 2001 as a collaborative effort primarily between Lawrence Livermore National Laboratory (LLNL) and Linux NetworX, with subsequent involvement from Hewlett-Packard and other partners; SchedMD, founded by key Slurm developers in 2010, later assumed primary maintenance responsibilities.[10][15][2] This initiative was driven by the need for a straightforward, highly scalable open-source alternative to proprietary schedulers like PBS and LSF, which were seen as overly complex and insufficiently adaptable for managing growing Linux-based high-performance computing (HPC) clusters at national laboratories and research institutions.[10][17]
The early design phase, informed by a 2002 survey of existing resource managers, emphasized core functionalities such as resource allocation, job queuing, and basic scheduling to support parallel workloads on commodity hardware.[10] Slurm's initial release took place in 2002, providing essential resource management capabilities tailored for small to medium-sized clusters, including support for job initiation via the srun command-line tool and monitoring through squeue.[18][10] This version prioritized portability across Linux distributions and integration with common interconnects, marking a shift toward open-source solutions in HPC environments.[2]
Among the primary early challenges were ensuring fault tolerance to handle node or controller failures without disrupting operations, achieved through features like backup controllers that could seamlessly take over management duties.[10][18] Additionally, developers addressed compatibility with diverse network interconnects, initially supporting Quadrics Elan3 and incorporating plans for InfiniBand to enable efficient communication in heterogeneous cluster setups.[10] These foundations laid the groundwork for Slurm's reputation as a robust tool in early 2000s HPC deployments.[17]
Key Milestones and Versions
Slurm's evolution has been marked by several key milestones that expanded its capabilities for managing complex computing environments. One such milestone was the introduction of multi-cluster operations, which lets users target jobs at multiple independent clusters for improved resource utilization.[2] By 2010, integration with accounting databases via the Slurm Database Daemon (slurmdbd) enabled centralized tracking of job usage and resource allocation, supporting enterprise-scale deployments.[19] In 2015, enhancements to GPU scheduling through the Generic Resource (GRES) framework in version 15.08 allowed for precise allocation and accounting of GPU resources, facilitating heterogeneous computing workloads.[20]
The progression of major versions has continued to introduce significant enhancements. Slurm 20.02, released in February 2020, added support for energy accounting through improved plugin integration, enabling the collection and reporting of power consumption data for jobs and nodes.[21] Slurm 23.02, released in February 2023, enhanced cloud bursting capabilities with new parameters like SuspendExcStates and State=CLOUD alternatives, allowing dynamic scaling to external cloud resources while maintaining seamless job management.[22] Slurm 24.05, released in May 2024, improved support for AI workloads via features such as Isolated Job Step management and RestrictedCoresPerGPU, ensuring dedicated CPU resources for GPU-intensive tasks like machine learning training.[23]
Slurm 25.05, released in May 2025, incorporated Kubernetes-native integrations, including compatibility with operators like Soperator for deploying Slurm clusters within Kubernetes environments, bridging traditional HPC with container orchestration.[24] Slurm 25.11, released on November 6, 2025, further enhanced scalability with improved RPC auditing, increased default connections, and bug fixes for high-availability setups.[25] Recent developments emphasize expansions for hybrid cloud environments through advanced bursting mechanisms, better ML orchestration with enhanced GPU and isolation controls, and performance optimizations tailored for exascale computing, as demonstrated in deployments on top-ranked supercomputers.[14][11]
Community contributions have driven substantial growth, particularly in the ecosystem of plugins and third-party extensions, which now include over 100 plugins for specialized hardware, storage, and integration needs, fostering adaptability across diverse platforms.[1]
Architecture
Core Components
The core architecture of Slurm Workload Manager revolves around a distributed set of daemons that manage cluster resources and job execution. At the heart is the slurmctld daemon, which serves as the central manager running on the primary control node. It is responsible for tracking all available resources across the cluster, maintaining job queues, and making scheduling decisions to allocate resources to pending jobs.[26] This daemon continuously monitors the state of other Slurm components and cluster nodes, ensuring a unified view of the system's capacity and workload.[1]
On each compute node, the slurmd daemon operates to handle local job execution and resource reporting. It launches and monitors tasks assigned to the node, reports real-time resource utilization back to slurmctld, and terminates jobs as directed.[27] This daemon plays a crucial role in the decentralized execution model, allowing Slurm to distribute workload management efficiently without requiring kernel modifications on compute nodes.[8]
For enhanced reliability, Slurm supports optional components such as backup instances of slurmctld to provide high availability. Multiple control hosts can be configured, where secondary slurmctld daemons stand ready to assume control if the primary fails, sharing state information via a common file system to minimize disruption.[9] Additionally, the slurmrestd daemon provides a REST API for interacting with Slurm, enabling external tools to query and manage jobs and resources.[28] Another optional component, slurmdbd, maintains a centralized database for accounting and resource tracking across multiple clusters, though it is primarily used for billing and usage statistics rather than real-time operations.[1]
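As an illustration of the REST interface, the following shell sketch queries the cluster's job list through slurmrestd; it assumes an instance listening on the default port 6820 with JWT authentication (auth/jwt) enabled, and the OpenAPI version segment in the URL (shown here as v0.0.40) varies by release:
# Obtain a JSON Web Token from the controller; scontrol token prints SLURM_JWT=<token>
export $(scontrol token)
# Query the job list over HTTP, authenticating with the token
curl -s -H "X-SLURM-USER-NAME: $USER" \
     -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
     http://localhost:6820/slurm/v0.0.40/jobs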
Inter-component communication in Slurm relies on a remote procedure call (RPC) protocol over TCP/IP, secured by authentication plugins such as MUNGE to ensure integrity and prevent unauthorized access.[29] This design supports highly scalable deployments, with Slurm capable of managing clusters exceeding 100,000 nodes through hierarchical and fault-tolerant messaging between daemons.
Scalability and Fault Tolerance
Slurm's scalability is designed to accommodate expansive computing environments, supporting clusters with thousands of nodes without requiring kernel modifications. Key features include hierarchical partitioning, which organizes compute nodes into logical groups called partitions that can overlap and be configured for efficient resource allocation across diverse workloads. This structure allows administrators to define constraints such as job size limits and time limits per partition, enabling fine-grained control over large-scale resource distribution.[1][30]
Federated clusters further enhance multi-site management by enabling peer-to-peer job scheduling across independent clusters treated as a unified resource pool. In this setup, jobs submitted to a local cluster are replicated to participating federates, enabling seamless resource sharing and load balancing through coordinated scheduling. This federation capability scales to manage millions of jobs, particularly through job arrays that submit and track vast collections of similar tasks, as demonstrated in environments handling millions of cores and tasks efficiently.[31][32][33]
Fault tolerance in Slurm is achieved through redundant components and proactive monitoring mechanisms. The system employs a primary controller (slurmctld) with one or more backup controllers that automatically assume control during failover, ensuring continuous operation if the primary fails; this process is triggered via commands like scontrol takeover for manual intervention if needed. Node failures are detected through periodic communications between the controller and node daemons (slurmd), which report status updates and allow the system to mark unresponsive nodes as down or draining, preventing allocation to faulty resources.[9][34][1]
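As a sketch of how an administrator might exercise these mechanisms from the command line (the node name and reason string are hypothetical):
# Force the backup slurmctld to assume control from the primary
scontrol takeover
# Drain a suspect node so no new jobs are scheduled onto it
scontrol update NodeName=node042 State=DRAIN Reason="ECC memory errors"
# List nodes that are down or draining, with the recorded reason
sinfo -R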
To maintain job reliability, Slurm supports job checkpointing and migration via integrations like CRIU (Checkpoint/Restore In Userspace) or application-specific plugins, allowing running jobs to save their state and resume or relocate to healthy nodes upon failure or preemption. This backward error recovery approach minimizes recomputation overhead in long-running parallel jobs, with extensions to Slurm's API enabling live migration across nodes or even clusters in federated setups.[35][36]
Performance in large-scale deployments emphasizes low-latency operations, with scheduling decisions typically completing in under 30 seconds for 30,000 tasks across 15,000 nodes, and systems routinely handling hundreds of jobs per second on 10,000-node clusters. Dynamic resource adjustments are facilitated by selectable plugins (e.g., select/linear for whole-node allocation) and topology-aware optimizations, which reduce communication overhead and adapt to varying node availability in real-time.[37][1]
Despite these strengths, Slurm faces limitations in handling network partitions, where communication disruptions between controllers or nodes can delay failover or lead to inconsistent state; mitigations include configuring multiple backups and health checks, though full quorum-based consensus is not natively implemented, relying instead on simple majority detection via heartbeats for critical decisions.[38][39]
Features
Resource Allocation and Scheduling
Slurm supports both exclusive and shared node access models for resource allocation. In the exclusive model, an entire node is dedicated to a single job, preventing other jobs from utilizing any resources on that node, which is the default behavior to ensure isolation and predictable performance. Shared access allows multiple jobs to concurrently use resources on the same node, configured via the OverSubscribe partition parameter (e.g., OverSubscribe=YES or FORCE), enabling higher utilization in environments with oversubscription, such as for lightweight tasks. Resource requests can specify CPUs via --cpus-per-task or --ntasks-per-node, GPUs and accelerators through Generic Resources (GRES) like --gpus=type:count for NVIDIA GPUs or --gres=mps:count for multi-process service sharing, memory with --mem or --mem-per-cpu, and other accelerators via custom GRES definitions in slurm.conf. These allocations are tracked using Trackable RESources (TRES), ensuring enforcement of limits like MaxTRESPerJob for CPUs, memory, and GRES.[1][40][41]
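A resource request combining these options might look like the following sbatch directives; the GPU type name (a100) is illustrative and must match a GRES type defined in the cluster's configuration:
#!/bin/bash
#SBATCH --ntasks=8                # eight parallel tasks
#SBATCH --cpus-per-task=4         # four CPUs per task
#SBATCH --mem-per-cpu=4G          # memory enforced per allocated CPU
#SBATCH --gres=gpu:a100:2         # two GPUs of GRES type "a100" per node
#SBATCH --exclusive               # request whole-node (exclusive) access
srun ./my_application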
Slurm employs several scheduling algorithms to optimize job placement and cluster utilization. The backfill scheduler, enabled by default with SchedulerType=sched/backfill, augments FIFO scheduling by initiating lower-priority jobs in idle resources without delaying higher-priority ones, using estimated start times visible via squeue --start and configurable via parameters like bf_window (default 1440 minutes) for the look-ahead period. Gang scheduling facilitates parallel job execution by allocating resources to multiple jobs simultaneously in a partition and timeslicing them, with jobs suspended and resumed every SchedulerTimeSlice (default 30 seconds) to share resources, configured via PreemptMode=GANG and OverSubscribe=FORCE. For topology-aware placement, Slurm uses a Hilbert curve algorithm in three-dimensional topologies to map node coordinates into a linear order that preserves locality, particularly for torus networks like Cray systems, integrated with the TopologyPlugin for best-fit allocations minimizing communication overhead. Fair-share policies are implemented through multifactor priority, where the fair-share factor (0.0–1.0) adjusts job priority based on historical resource usage versus allocation shares, weighted by PriorityWeightFairshare (recommended 10000) and using algorithms like Fair Tree for hierarchical accounting across users, accounts, and clusters.[42][43][44][45]
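A slurm.conf fragment wiring these mechanisms together might read as follows; the values shown are illustrative rather than recommended settings:
# Backfill scheduling with a one-day planning window
SchedulerType=sched/backfill
SchedulerParameters=bf_window=1440,bf_max_job_test=500
# Gang scheduling: timeslice oversubscribed jobs every 30 seconds
PreemptMode=GANG
SchedulerTimeSlice=30
# Multifactor priority with a strong fair-share component
PriorityType=priority/multifactor
PriorityWeightFairshare=10000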
Partitions in Slurm provide logical groupings of nodes into queues with distinct configurations, such as node lists, resource limits, and access controls defined in slurm.conf (e.g., PartitionName=debug Nodes=node[01-10] Default=YES MaxTime=01:00:00). Each partition can enforce oversubscription, time limits, and priority tiers, allowing tailored environments like debug or production queues. Quality of Service (QoS) levels extend partitioning by associating limits and priorities with user classes or accounts, managed via sacctmgr (e.g., MaxTRESPerJob=cpu=100 for a QoS), overriding association limits and influencing scheduling via PriorityWeightQOS. A partition can inherit a default QoS with QOS=normal in its configuration, enabling differentiated access for research groups or priority users without altering base partitions.[30][46]
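For instance, a debug partition and a differentiated QoS could be set up as in the following sketch; partition names, node ranges, user names, and limits are placeholders:
# slurm.conf: two partitions with different limits and a default QoS
PartitionName=debug Nodes=node[01-10] Default=YES MaxTime=01:00:00 QOS=normal
PartitionName=batch Nodes=node[11-99] MaxTime=48:00:00 OverSubscribe=NO
# Shell: create a higher-priority QoS with a per-job CPU cap (requires slurmdbd)
sacctmgr add qos premium
sacctmgr modify qos where name=premium set Priority=100 MaxTRESPerJob=cpu=100
sacctmgr modify user where name=alice set qos+=premium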
Advanced options enhance scheduling flexibility. Preemption allows higher-priority jobs to displace lower ones, configured with PreemptType=preempt/qos and using QoS preemption lists or partition tiers to determine eligibility, with exempt times (PreemptExemptTime) preventing immediate re-preemption. Reservations guarantee resources for specific jobs, users, or maintenance, created via scontrol create reservation (e.g., Nodes=ALL StartTime=now Duration=120), supporting flags like maint for downtime or magnetic for automatic attraction of eligible jobs, and integrable with backfill for non-disruptive planning. Dependency-based job chains defer execution until predecessor conditions are met, specified with --dependency in sbatch (e.g., afterok:12345 for success-dependent start or afterany:12345,67890 for any completion), supporting types like singleton for mutual exclusion and remote dependencies in federated clusters, modifiable post-submission via scontrol. As of Slurm version 25.11 (released November 6, 2025), an "Expedited Requeue" mode is available for batch jobs using --requeue=expedite, allowing immediate restart with highest priority upon node failure or script/epilog issues.[47][48][12][49]
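The commands below sketch a maintenance reservation and a simple dependency chain; the reservation name, script names, and times are placeholders:
# Reserve all nodes for a two-hour maintenance window starting immediately
scontrol create reservation ReservationName=maint_window Users=root \
    Nodes=ALL StartTime=now Duration=120 Flags=maint,ignore_jobs
# Chain two batch jobs: the second starts only if the first exits successfully
jobid=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:${jobid} analyze.sh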
Accounting and Monitoring
Slurm's accounting system collects detailed records of resource usage for every job and job step executed on the cluster, enabling administrators to track and enforce limits on users, accounts, and quality of service (QOS).[50] This system supports storage in simple text files for basic logging or integration with relational databases such as MySQL or MariaDB, facilitated by the Slurm Database Daemon (slurmdbd), which centralizes data from multiple clusters and handles authentication via plugins like MUNGE.[50] Database integration requires the InnoDB storage engine and allows for backup hosts to ensure data reliability, with records capturing job details including user, nodes allocated, execution times, status, and resource consumption metrics like CPU hours and memory usage.[50] As of Slurm version 25.11 (released November 6, 2025), administrators can retroactively modify AllocTRES values via sacctmgr for accounting adjustments, such as correcting energy usage records.[49]
For retrospective analysis, Slurm provides command-line tools such as sacct and sreport that query the accounting database. The sacct command displays job accounting data in customizable formats, supporting filters by job ID, time range, and state to report resource usage for active or completed jobs, including details like elapsed time, CPU count, and task distribution for detecting imbalances.[51] For example, sacct --format=jobid,elapsed,ncpus,ntasks,state outputs key metrics for specified jobs, aiding in post-execution audits.[51] The sreport command generates aggregated reports on cluster utilization and job usage, such as account-based breakdowns or top users over hourly, daily, or monthly periods, requiring the slurmdbd for rolled-up data.[52] Common reports include Cluster Utilization for overall efficiency and AccountUtilizationByUser for per-user resource consumption, with options to include trackable resources (TRES) like energy or GPUs.[52]
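Typical retrospective queries might look like the following; the user, account, and date range are placeholders:
# Per-job usage for one user over a time window
sacct --user=alice --starttime=2025-01-01 --endtime=2025-01-31 \
      --format=JobID,JobName,Elapsed,NCPUS,MaxRSS,State
# Cluster-wide utilization and a per-account/per-user breakdown for the same period
sreport cluster Utilization Start=2025-01-01 End=2025-01-31
sreport cluster AccountUtilizationByUser Start=2025-01-01 End=2025-01-31 Accounts=physics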
Real-time monitoring is achieved through tools like squeue and sinfo, which provide immediate views of job and cluster status without relying on historical logs. The squeue command lists jobs in the scheduling queue, displaying states (e.g., PENDING, RUNNING) and resource allocations such as CPUs, memory, and nodes via formats like --long for extended details.[53] Meanwhile, sinfo reports on partitions and nodes, showing states (e.g., IDLE, ALLOCATED, DOWN) and utilization metrics including CPU count, memory, and generic resources per node, helping administrators identify available capacity or issues like drained nodes.[54] As of Slurm version 25.11 (released November 6, 2025), slurmctld supports exporting telemetry data in OpenMetrics format (compatible with Prometheus) on the SlurmctldPort for enhanced monitoring integration.[49]
Slurm's extensibility comes from plugins that allow custom metrics in accounting, such as the JobAcctGather plugin (e.g., linux or cgroup types) for collecting per-task CPU and memory usage, and the AcctGatherEnergy plugins (e.g., ipmi or rapl) for collecting node-level energy consumption from hardware sensors, which is then attributed to jobs despite shared node usage.[50] Trackable RESources (TRES) extend this to specialized hardware, accounting for resources such as GPU utilization, burst buffers, and interconnect usage as predefined types (e.g., GRES/gpu, BB for burst buffers, IC for interconnect), with configurable billing weights for priority and enforcement via parameters like AccountingStorageTRES.[55] User and account limits are enforced through the AccountingStorageEnforce parameter, applying QOS policies to prevent overuse based on associations of user, cluster, partition, and account.[50]
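In slurm.conf, these mechanisms correspond to entries along the following lines (a partial sketch, not a complete accounting configuration; the database host name is hypothetical):
# Gather per-task usage through cgroups and store records via slurmdbd
JobAcctGatherType=jobacct_gather/cgroup
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbnode01
# Track GPUs and energy as TRES and enforce association/QOS limits
AccountingStorageTRES=gres/gpu,energy
AccountingStorageEnforce=associations,limits,qos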
For enterprise compliance, Slurm maintains audit trails through comprehensive job records and supports data archiving to external servers for long-term retention, with configurable purging to manage database size while preserving access to historical data for security reviews or regulatory audits.[50] The sacctmgr tool further aids compliance by allowing modification and viewing of account hierarchies and limits in the database, ensuring traceable changes to resource policies.[56]
Configuration and Usage
Installation and Configuration
Slurm Workload Manager installation begins with ensuring the cluster environment meets specific prerequisites to support its operation across multiple nodes. A Linux kernel is required, as Slurm is designed for Unix-like systems, with synchronization of clocks, users, and groups (including UIDs and GIDs) across all nodes to maintain consistency.[9] Essential dependencies include the MUNGE authentication service, which provides secure communication between Slurm components; the same munge.key file must be distributed to all nodes, and the munged daemon started prior to Slurm daemons.[9][57] Additional development libraries are needed depending on enabled plugins, such as FreeIPMI for energy accounting or MySQL/MariaDB for database integration.[9] Compiler tools like GCC are necessary for building from source.[9]
Installation methods for Slurm vary by deployment preference, prioritizing ease of management and distribution-specific packaging. The primary approach is compiling from source: download the tarball from the official repository, unpack it, run ./configure to set build options, execute make followed by make install, and update the dynamic linker cache with ldconfig for libraries.[9] For RPM-based distributions like CentOS or Rocky Linux, administrators can build custom RPM packages using rpmbuild -ta on the source tarball, facilitating automated deployment via tools like yum or dnf.[9] Similarly, Debian or Ubuntu users can generate DEB packages with debuild -b -uc -us, enabling installation through apt.[9] Containerized installations are possible using images for Singularity (now Apptainer) or Docker, though these are more commonly used for job execution rather than core Slurm daemon deployment, and require custom builds to include host-specific configurations.[58]
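On an RPM-based system, the build-from-source and packaging steps described above reduce to roughly the following commands; the version number is illustrative:
# Build and install from source
tar -xjf slurm-25.11.0.tar.bz2
cd slurm-25.11.0
./configure --prefix=/usr --sysconfdir=/etc/slurm
make && sudo make install
sudo ldconfig
# Alternatively, build binary RPM packages from the same tarball for deployment via yum/dnf
rpmbuild -ta slurm-25.11.0.tar.bz2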
Post-installation, configuration centers on key files that define cluster behavior and resource management. The primary file, slurm.conf, is an ASCII configuration located in the Slurm installation directory (typically /etc/slurm) and must be identical across all nodes; it outlines cluster topology via parameters like SlurmctldHost (specifying the control host), NodeName (detailing node attributes such as CPUs and memory), and TopologyPlugin (e.g., topology/tree for hierarchical layouts).[9][30] Partitions are configured within slurm.conf using PartitionName blocks, grouping nodes (via Nodes=) with settings like MaxTime for runtime limits, Default=YES for the primary queue, and OverSubscribe=YES to allow resource sharing.[30] Scheduling parameters include SchedulerType (e.g., sched/backfill for advanced planning) and PriorityType (e.g., priority/multifactor for job prioritization).[30] Complementing this, cgroup.conf configures Linux control groups for resource constraints when the task/cgroup or job/cgroup plugins are enabled; key options include ConstrainRAMSpace=YES to enforce memory limits (with AllowedRAMSpace=95 for percentage-based allocation) and CgroupPlugin=cgroup/v1 (or /v2 for modern kernels).[59][60]
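A minimal pair of configuration files for a small cluster might look like the following sketch; host names, node counts, and hardware figures are placeholders:
# /etc/slurm/slurm.conf (identical on every node)
ClusterName=example
SlurmctldHost=head01
SlurmUser=slurm
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SchedulerType=sched/backfill
NodeName=node[01-04] CPUs=32 RealMemory=128000 State=UNKNOWN
PartitionName=debug Nodes=node[01-04] Default=YES MaxTime=01:00:00 State=UP
# /etc/slurm/cgroup.conf
CgroupPlugin=cgroup/v2
ConstrainCores=yes
ConstrainRAMSpace=yes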
Initial setup involves creating a dedicated Slurm user (e.g., "slurm") on all nodes for daemon execution, along with directories for state saving (StateSaveLocation=/var/spool/slurm), logs (SlurmctldLogFile=/var/log/slurmctld.log), and PID files, ensuring they are owned and writable by this user.[9] Distribute slurm.conf and start the slurmctld daemon on the controller node, followed by slurmd on compute nodes; if using accounting, configure and launch slurmdbd with a database backend.[9] Validation occurs by submitting a test job, such as srun -N1 /bin/hostname, to confirm daemon communication and basic scheduling.[9] Tools like the Slurm Configuration Tool can generate an initial slurm.conf interactively.[9]
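On a systemd-based distribution, these setup and validation steps correspond roughly to the following commands; paths are examples:
# On every node: create the slurm user and the directories it must own
sudo useradd -r -m -d /var/lib/slurm slurm
sudo mkdir -p /var/spool/slurm /var/log/slurm
sudo chown slurm:slurm /var/spool/slurm /var/log/slurm
# Start the daemons: slurmctld on the controller, slurmd on each compute node
sudo systemctl enable --now slurmctld     # controller node only
sudo systemctl enable --now slurmd        # compute nodes
# Validate basic scheduling with a one-node test job
srun -N1 /bin/hostname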
Job Submission and Management
Slurm provides several core commands for users to submit, monitor, and control jobs on managed clusters. The primary command for batch job submission is sbatch, which submits a script to the Slurm controller for later execution and returns a job ID immediately, though resources may not be allocated right away.[12] For interactive or step-based execution, srun launches tasks directly, either creating a new allocation if needed or running within an existing one, and supports options like --ntasks to specify the number of parallel tasks.[61] To terminate jobs, scancel signals or cancels specified jobs, arrays, or steps using filters such as --user for owner-specific jobs or --state=RUNNING for active ones.[62]
Batch scripts submitted via sbatch typically begin with a shebang line followed by #SBATCH directives to request resources. For instance, a script might specify --ntasks=4 to allocate four tasks across nodes and --gres=gpu:1 to request one GPU, ensuring the job utilizes parallel processing and accelerators as needed.[12] Environment variables can be controlled with --export=ALL to inherit the submitting shell's environment or --export=NONE to start clean, preventing unintended variable propagation.[12] Dependencies allow chaining jobs; for example, --dependency=afterok:12345 defers execution until job 12345 completes successfully with exit code zero, while afterany:12345 triggers after any termination regardless of outcome.[12] A sample script could look like this:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=4
#SBATCH --gres=gpu:1
#SBATCH --dependency=afterok:12345
#SBATCH --export=ALL
srun hostname
Submitting with sbatch script.sh queues the job, which runs the srun command upon allocation.[12]
Queue management involves querying and modifying job states without deep administrative access. The squeue command displays pending and running jobs, filtered by options like --user=$USER for personal jobs or --states=PD for pending ones, with customizable output via --format to show details such as job ID, partition, and reason for delay.[53] Complementing this, sinfo reports on available partitions and nodes, using --summarize for an overview of idle or allocated resources in queues like "debug" or "batch."[54] For modifications, scontrol enables user-level updates, such as scontrol update JobId=12345 TimeLimit=02:00:00 to extend runtime or scontrol hold 12345 to pause a pending job, though releases require matching permissions.[34]
Best practices enhance efficiency for complex workflows. Job arrays, submitted with sbatch --array=0-99%10, manage up to thousands of similar tasks by limiting concurrent runs (e.g., 10 via %10) and using environment variables like $SLURM_ARRAY_TASK_ID for task-specific logic, reducing submission overhead for parameter sweeps.[32] Multi-step jobs leverage srun sequentially within a single sbatch script or salloc allocation, as in srun -n2 step1; srun -n4 step2, to chain dependent computations without multiple submissions.[8] Error handling includes checking exit codes in scripts (e.g., if [ $? -ne 0 ]; then echo "Error"; exit 1; fi) and using scancel promptly for failed jobs, while monitoring with squeue -u $USER --format="%i %T %R" helps identify issues like resource contention early.[8]
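A job-array script for a parameter sweep, following these practices, might look like this; the application name and input naming scheme are assumptions for illustration:
#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --array=0-99%10           # 100 tasks, at most 10 running at once
#SBATCH --ntasks=1
#SBATCH --output=sweep_%A_%a.out  # %A = array job ID, %a = task index
srun ./simulate --input=params_${SLURM_ARRAY_TASK_ID}.dat
if [ $? -ne 0 ]; then
    echo "Task ${SLURM_ARRAY_TASK_ID} failed" >&2
    exit 1
fi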
Operating Systems
Slurm Workload Manager provides primary support for major Linux distributions, ensuring compatibility with widely used enterprise and community editions as of 2025.[63] These include Red Hat Enterprise Linux (RHEL) versions 8, 9, and 10 along with their derivatives such as CentOS Stream, Rocky Linux, and AlmaLinux; Ubuntu versions 20.04 (Focal Fossa), 22.04 (Jammy Jellyfish), and 24.04 (Noble Numbat); SUSE Linux Enterprise Server (SLES) versions 12 and 15; and Debian versions 11 (Bullseye), 12 (Bookworm), and 13 (Trixie).[63] This support encompasses thorough testing on x86_64, arm64, and ppc64 architectures within these environments, facilitating seamless deployment in high-performance computing clusters.[63]
Slurm integrates with systemd for service management on compatible Linux distributions, allowing administrators to enable and control daemons such as slurmctld, slurmdbd, and slurmd using standard commands like systemctl enable slurmctld.[9] The software requires no kernel modifications for basic operation and is compatible with Linux kernels that support control groups (cgroups), which are available since kernel version 2.6.[8] However, full functionality, including advanced resource isolation, benefits from kernels supporting cgroups v2 (generally version 4.5 or later) to leverage unified cgroups hierarchies.[64]
Support extends to certain Unix-like systems with limitations. FreeBSD and NetBSD are compatible but receive limited testing and maintenance, restricting their use to basic scenarios without guaranteed feature parity.[63] Historical ports to IBM AIX exist, but current versions are not actively supported, leading to potential compatibility issues for production environments.[65] Slurm offers no native support for Windows operating systems, though it can interact with Windows-based nodes via external integrations in hybrid setups.[1]
Known issues arise on older distributions lacking robust cgroups v2 implementation, which is essential for advanced resource controls like CPU and memory limiting. For instance, early RHEL 8 releases (prior to 8.2) treat cgroups v2 as a technology preview with incomplete cpuset support, requiring workarounds such as enabling DefaultCPUAccounting=yes in systemd configurations.[64] Distributions without cgroups v2, such as those on kernels before 4.5, fall back to cgroups v1 but may encounter reduced performance in job containment and accounting.[64]
Hardware Architectures
Slurm Workload Manager supports a range of CPU architectures, enabling deployment across diverse hardware environments. Primary compatibility includes x86_64 processors from Intel and AMD, which form the backbone of most high-performance computing (HPC) clusters. Additionally, arm64 (AArch64) architectures are fully supported, facilitating integration with processors such as AWS Graviton instances and Ampere Altra systems for energy-efficient computing. PowerPC64 (ppc64) support accommodates IBM Power systems, allowing Slurm to manage workloads on enterprise-grade hardware optimized for reliability and scalability.[63]
The software accommodates heterogeneous node configurations, including clusters with specialized accelerators. GPU support encompasses NVIDIA devices, with features like Multi-Instance GPU (MIG) partitioning introduced in version 21.08 for models such as the A100, and AMD GPUs via the RSMI library for autodetection. Legacy accelerators like the Intel Xeon Phi (formerly MIC architecture) received enhanced integration in earlier releases, enabling finer-grained resource management on Knights Landing processors. Interconnects such as InfiniBand and Ethernet are natively handled, with topology plugins optimizing allocation in fat-tree network topologies to minimize communication overhead.[40][63][44]
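GPU autodetection is typically enabled in gres.conf, after which jobs request devices by type; the MIG profile name below follows NVIDIA's standard A100 naming and is shown only for illustration:
# /etc/slurm/gres.conf: discover NVIDIA GPUs via NVML (use AutoDetect=rsmi for AMD)
AutoDetect=nvml
# In a job script: request one MIG slice exposed as GRES type "1g.5gb"
#SBATCH --gres=gpu:1g.5gb:1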
Slurm demonstrates robust scalability, designed to handle clusters exceeding 100,000 nodes and up to 1 million concurrent jobs, as evidenced by deployments on large-scale supercomputers. This capability relies on fault-tolerant daemons and efficient resource arbitration to maintain performance under high load. Furthermore, integrations with quantum co-processors, such as cat-qubit systems from Alice & Bob, enable hybrid classical-quantum scheduling through custom plugins, marking initial steps toward quantum-HPC convergence in research environments.[1][66]
Licensing and Support
Open-Source License
Slurm Workload Manager is licensed under the GNU General Public License version 2 (GPLv2), a copyleft license that has governed the project since its inception in 2002.[4][3][2] This license permits users to freely use, study, modify, and redistribute the software, provided that any derivative works are also distributed under GPLv2 terms and accompanied by the source code.
Key provisions of the GPLv2 include copyleft requirements, which mandate that modifications or combined works must retain the same licensing obligations, ensuring that enhancements remain open to the community. The license explicitly disclaims any warranty, distributing the software "as is" without guarantees of merchantability or fitness for a particular purpose, thereby limiting liability for any damages arising from its use.[67] GPLv2 addresses patents only indirectly through Section 7: if conditions imposed on a distributor (for example, by a court judgment or a patent license) contradict the license's terms, that distributor may not distribute the program at all. Slurm includes a special exception permitting linkage with the OpenSSL library, which would otherwise conflict with GPLv2 due to OpenSSL's licensing.[67]
The source code for Slurm is distributed through the official SchedMD repository, which provides tarballs and documentation for each release.[7] It is also mirrored on GitHub, facilitating version control and community contributions via pull requests.[3] Furthermore, Slurm is packaged for major Linux distributions, including Ubuntu via APT repositories and Red Hat Enterprise Linux/CentOS via RPM packages, enabling straightforward installation through standard package managers.[68]
When integrating Slurm with proprietary software, compliance with GPLv2 requires careful consideration of linking and derivative works, as the copyleft clause may obligate the release of source code for any tightly coupled components under the same license.[69] Users must ensure that proprietary elements do not violate these terms, potentially by using dynamic linking or separate processes to avoid triggering copyleft obligations.[69]
Commercial and Community Support
Slurm Workload Manager receives commercial support primarily from SchedMD, the company founded in 2010 by its original developers to sustain and enhance the software.[70] SchedMD offers a range of services, including real-time troubleshooting, performance optimization, custom development, configuration assistance, bug fixes, and onsite training for high-performance computing (HPC), high-throughput computing (HTC), AI, and machine learning environments.[70] These services are utilized by major institutions such as Harvard University, NASA, and the Technical University of Denmark, with support contracts extending up to seven years for some clients.[70] Additionally, SchedMD provides Slurm integration and support through partnerships with cloud providers like AWS and Google Cloud, as well as hardware vendors such as NVIDIA and HPE.[71][72][73]
As an open-source project, Slurm benefits from robust community support mechanisms maintained by SchedMD. The primary channels include two official mailing lists: slurm-announce@lists.schedmd.com for release announcements and critical updates, and slurm-users@lists.schedmd.com for user discussions, questions, and technical advice, which is also archived on Google Groups.[74][75] Community members can report bugs, request features, and submit contributions via the official support tracker at support.schedmd.com, where patches are attached to issues labeled under "C - Contributions" rather than using GitHub pull requests.[3] The Slurm source code is hosted on GitHub, enabling developers to access, review, and build upon the codebase, with extensive documentation available on the official site for self-guided troubleshooting and administration.[3] These resources foster active participation from a global user base, including supercomputing centers and research institutions.[76]