Slurm Workload Manager
Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system designed for Linux clusters of varying sizes, enabling efficient resource allocation, job execution, and contention arbitration among users.[1] Originally developed in 2001 by Lawrence Livermore National Laboratory to address the need for an open-source resource manager on commodity hardware, it was initially released in 2002 as a simple resource manager and has since evolved to support diverse processor types, network architectures, and parallel computing environments.[2] Maintained by SchedMD—a company founded in 2010 by key developers Morris Jette and Danny Auble—Slurm operates under the GNU General Public License version 2 or later, ensuring its free availability and portability across platforms.[2][3][4]
At its core, Slurm allocates exclusive or non-exclusive access to compute nodes, manages job queues, and provides tools for monitoring and administration, such as srun for job initiation, squeue for status checks, and scontrol for configuration.[1] Its architecture includes a centralized controller (slurmctld) with backup support for high availability, node daemons (slurmd) for local execution, and optional components like a database (slurmdbd) for accounting or a REST API (slurmrestd) for integration.[1] Slurm's extensibility through plugins allows customization for features like advanced reservations, backfill scheduling, and license management, making it adaptable to high-performance computing (HPC), AI workloads, and cloud environments.[1][5]
Slurm's prominence in the HPC community is evident in its adoption by approximately 65% of the TOP500 supercomputers, powering some of the world's largest computational resources for scientific simulations, data analysis, and machine learning tasks.[6] Its fault-tolerant design ensures minimal downtime in large-scale deployments, while ongoing development—reflected in releases like version 25.11—continues to enhance scalability and performance for modern infrastructures.[7][2]
Introduction
Overview
Slurm Workload Manager, commonly known as Slurm, is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system designed for Linux-based environments, supporting both large and small clusters.[1] It serves as a critical tool in high-performance computing (HPC) by enabling efficient resource utilization across distributed systems.[8]
The primary functions of Slurm include allocating computational resources to user-submitted jobs, managing diverse workloads through queuing and prioritization, and arbitrating resource contention in multi-user settings via a centralized architecture that coordinates node states and job execution.[9] This setup ensures reliable operation even in the presence of failures, with features like backup controllers to maintain continuity.[10]
As of the November 2025 TOP500 list, Slurm powers approximately 65% of the world's TOP500 supercomputers, underscoring its dominance in large-scale HPC deployments.[6] Originally focused on traditional scientific computing, Slurm has evolved to accommodate modern demands, including support for AI and machine learning workloads through enhanced GPU management and large-scale data processing capabilities.[11]
Purpose and Applications
Slurm Workload Manager serves as an open-source system for efficient resource utilization in Linux-based cluster environments, enabling the allocation of compute nodes, memory, and other resources to user jobs while minimizing idle time through advanced scheduling mechanisms. It supports parallel processing by distributing workloads across multiple nodes and processors, facilitating high-throughput execution of compute-intensive applications. Additionally, Slurm accommodates diverse job types, such as batch jobs submitted via scripted commands for automated execution, interactive jobs for real-time user sessions, and GPU-accelerated tasks that leverage specialized hardware for accelerated computing.[1][12][11]
In practice, Slurm finds primary applications in high-performance computing (HPC) clusters and supercomputers, where it manages resource orchestration for scientific simulations, data analysis, and large-scale modeling. It is extensively deployed in AI and machine learning (AI/ML) training pipelines, optimizing GPU and accelerator usage to handle data-parallel workloads like neural network training. Slurm also supports cloud-hybrid setups, integrating with container orchestration tools to bridge on-premises HPC with cloud resources in research and enterprise settings.[1][11][13]
Key benefits of Slurm include its simplicity as a lightweight, kernel-independent solution that requires minimal configuration for deployment, alongside high portability across various Linux architectures and infrastructures. It imposes low overhead through efficient daemons and centralized control, while offering strong adaptability for large-scale systems, such as those spanning thousands of nodes with fault-tolerant features to ensure continuous operation. These qualities contribute to its widespread adoption for environments demanding reliable, high-performance workload management.[1][14]
Notable deployments highlight Slurm's impact, including its origins and ongoing use at Lawrence Livermore National Laboratory for managing HPC resources in national security and scientific research. It powers approximately 65% of the TOP500 supercomputers globally, underscoring its dominance in exascale computing. As of 2025, commercial AI platforms incorporate Slurm for scalable training clusters, such as those integrated with HPE infrastructure for enterprise AI workloads.[15][6][16]
History and Development
Origins and Early Development
The development of the Slurm Workload Manager began in 2001 as a collaborative effort primarily between Lawrence Livermore National Laboratory (LLNL) and Linux NetworX, with subsequent involvement from Hewlett-Packard and other partners; SchedMD, founded by key Slurm developers in 2010, later assumed primary maintenance responsibilities.[10][15][2] This initiative was driven by the need for a straightforward, highly scalable open-source alternative to proprietary schedulers like PBS and LSF, which were seen as overly complex and insufficiently adaptable for managing growing Linux-based high-performance computing (HPC) clusters at national laboratories and research institutions.[10][17]
The early design phase, informed by a 2002 survey of existing resource managers, emphasized core functionalities such as resource allocation, job queuing, and basic scheduling to support parallel workloads on commodity hardware.[10] Slurm's initial release took place in 2002, providing essential resource management capabilities tailored for small to medium-sized clusters, including support for job initiation via the srun command-line tool and monitoring through squeue.[18][10] This version prioritized portability across Linux distributions and integration with common interconnects, marking a shift toward open-source solutions in HPC environments.[2]
Among the primary early challenges were ensuring fault tolerance to handle node or controller failures without disrupting operations, achieved through features like backup controllers that could seamlessly take over management duties.[10][18] Additionally, developers addressed compatibility with diverse network interconnects, initially supporting Quadrics Elan3 and incorporating plans for InfiniBand to enable efficient communication in heterogeneous cluster setups.[10] These foundations laid the groundwork for Slurm's reputation as a robust tool in early 2000s HPC deployments.[17]
Key Milestones and Versions
Slurm's evolution has been marked by several key milestones that expanded its capabilities for managing complex computing environments. One such milestone was the introduction of multi-cluster operations, which lets users target jobs at multiple independent clusters for improved resource utilization.[2] By 2010, integration with accounting databases via the Slurm Database Daemon (slurmdbd) enabled centralized tracking of job usage and resource allocation, supporting enterprise-scale deployments.[19] In 2015, enhancements to GPU scheduling through the Generic Resource (GRES) framework in version 15.08 allowed for precise allocation and accounting of GPU resources, facilitating heterogeneous computing workloads.[20]
The progression of major versions has continued to introduce significant enhancements. Slurm 20.02, released in February 2020, added support for energy accounting through improved plugin integration, enabling the collection and reporting of power consumption data for jobs and nodes.[21] Slurm 23.02, released in February 2023, enhanced cloud bursting capabilities with new parameters like SuspendExcStates and State=CLOUD alternatives, allowing dynamic scaling to external cloud resources while maintaining seamless job management.[22] Slurm 24.05, released in May 2024, improved support for AI workloads via features such as Isolated Job Step management and RestrictedCoresPerGPU, ensuring dedicated CPU resources for GPU-intensive tasks like machine learning training.[23]
Slurm 25.05, released in May 2025, incorporated Kubernetes-native integrations, including compatibility with operators like Soperator for deploying Slurm clusters within Kubernetes environments, bridging traditional HPC with container orchestration.[24] Slurm 25.11, released on November 6, 2025, further enhanced scalability with improved RPC auditing, increased default connections, and bug fixes for high-availability setups.[25] Recent developments emphasize expansions for hybrid cloud environments through advanced bursting mechanisms, better ML orchestration with enhanced GPU and isolation controls, and performance optimizations tailored for exascale computing, as demonstrated in deployments on top-ranked supercomputers.[14][11]
Community contributions have driven substantial growth, particularly in the ecosystem of plugins and third-party extensions, which now include over 100 plugins for specialized hardware, storage, and integration needs, fostering adaptability across diverse platforms.[1]
Architecture
Core Components
The core architecture of Slurm Workload Manager revolves around a distributed set of daemons that manage cluster resources and job execution. At the heart is the slurmctld daemon, which serves as the central manager running on the primary control node. It is responsible for tracking all available resources across the cluster, maintaining job queues, and making scheduling decisions to allocate resources to pending jobs.[26] This daemon continuously monitors the state of other Slurm components and cluster nodes, ensuring a unified view of the system's capacity and workload.[1]
On each compute node, the slurmd daemon operates to handle local job execution and resource reporting. It launches and monitors tasks assigned to the node, reports real-time resource utilization back to slurmctld, and terminates jobs as directed.[27] This daemon plays a crucial role in the decentralized execution model, allowing Slurm to distribute workload management efficiently without requiring kernel modifications on compute nodes.[8]
For enhanced reliability, Slurm supports optional components such as backup instances of slurmctld to provide high availability. Multiple control hosts can be configured, where secondary slurmctld daemons stand ready to assume control if the primary fails, sharing state information via a common file system to minimize disruption.[9] Additionally, the slurmrestd daemon provides a REST API for interacting with Slurm, enabling external tools to query and manage jobs and resources.[28] Another optional component, slurmdbd, maintains a centralized database for accounting and resource tracking across multiple clusters, though it is primarily used for billing and usage statistics rather than real-time operations.[1]
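As an illustration of the REST interface, the following shell sketch queries the cluster's job list through slurmrestd; it assumes an instance listening on the default port 6820 with JWT authentication (auth/jwt) enabled, and the OpenAPI version segment in the URL (shown here as v0.0.40) varies by release:
# Obtain a JSON Web Token from the controller; scontrol token prints SLURM_JWT=<token>
export $(scontrol token)
# Query the job list over HTTP, authenticating with the token
curl -s -H "X-SLURM-USER-NAME: $USER" \
     -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
     http://localhost:6820/slurm/v0.0.40/jobs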
Inter-component communication in Slurm relies on a remote procedure call (RPC) protocol over TCP/IP, secured by authentication plugins such as MUNGE to ensure integrity and prevent unauthorized access.[29] This design supports highly scalable deployments, with Slurm capable of managing clusters exceeding 100,000 nodes through hierarchical and fault-tolerant messaging between daemons.
Scalability and Fault Tolerance
Slurm's scalability is designed to accommodate expansive computing environments, supporting clusters with thousands of nodes without requiring kernel modifications. Key features include hierarchical partitioning, which organizes compute nodes into logical groups called partitions that can overlap and be configured for efficient resource allocation across diverse workloads. This structure allows administrators to define constraints such as job size limits and time limits per partition, enabling fine-grained control over large-scale resource distribution.[1][30]
Federated clusters further enhance multi-site management by enabling peer-to-peer job scheduling across independent clusters treated as a unified resource pool. In this setup, jobs submitted to a local cluster are replicated to participating federates, enabling seamless resource sharing and load balancing through coordinated scheduling. This federation capability scales to manage millions of jobs, particularly through job arrays that submit and track vast collections of similar tasks, as demonstrated in environments handling millions of cores and tasks efficiently.[31][32][33]
Fault tolerance in Slurm is achieved through redundant components and proactive monitoring mechanisms. The system employs a primary controller (slurmctld) with one or more backup controllers that automatically assume control during failover, ensuring continuous operation if the primary fails; this process is triggered via commands like scontrol takeover for manual intervention if needed. Node failures are detected through periodic communications between the controller and node daemons (slurmd), which report status updates and allow the system to mark unresponsive nodes as down or draining, preventing allocation to faulty resources.[9][34][1]
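As a sketch of how an administrator might exercise these mechanisms from the command line (the node name and reason string are hypothetical):
# Force the backup slurmctld to assume control from the primary
scontrol takeover
# Drain a suspect node so no new jobs are scheduled onto it
scontrol update NodeName=node042 State=DRAIN Reason="ECC memory errors"
# List nodes that are down or draining, with the recorded reason
sinfo -R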
To maintain job reliability, Slurm supports job checkpointing and migration via integrations like CRIU (Checkpoint/Restore In Userspace) or application-specific plugins, allowing running jobs to save their state and resume or relocate to healthy nodes upon failure or preemption. This backward error recovery approach minimizes recomputation overhead in long-running parallel jobs, with extensions to Slurm's API enabling live migration across nodes or even clusters in federated setups.[35][36]
Performance in large-scale deployments emphasizes low-latency operations, with scheduling decisions typically completing in under 30 seconds for 30,000 tasks across 15,000 nodes, and systems routinely handling hundreds of jobs per second on 10,000-node clusters. Dynamic resource adjustments are facilitated by selectable plugins (e.g., select/linear for whole-node allocation) and topology-aware optimizations, which reduce communication overhead and adapt to varying node availability in real-time.[37][1]
Despite these strengths, Slurm faces limitations in handling network partitions, where communication disruptions between controllers or nodes can delay failover or lead to inconsistent state; mitigations include configuring multiple backups and health checks, though full quorum-based consensus is not natively implemented, relying instead on simple majority detection via heartbeats for critical decisions.[38][39]
Features
Resource Allocation and Scheduling
Slurm supports both exclusive and shared node access models for resource allocation. In the exclusive model, an entire node is dedicated to a single job, preventing other jobs from utilizing any resources on that node, which is the default behavior to ensure isolation and predictable performance. Shared access allows multiple jobs to concurrently use resources on the same node, configured via the OverSubscribe partition parameter (e.g., OverSubscribe=YES or FORCE), enabling higher utilization in environments with oversubscription, such as for lightweight tasks. Resource requests can specify CPUs via --cpus-per-task or --ntasks-per-node, GPUs and accelerators through Generic Resources (GRES) like --gpus=type:count for NVIDIA GPUs or --gres=mps:count for multi-process service sharing, memory with --mem or --mem-per-cpu, and other accelerators via custom GRES definitions in slurm.conf. These allocations are tracked using Trackable RESources (TRES), ensuring enforcement of limits like MaxTRESPerJob for CPUs, memory, and GRES.[1][40][41]
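A resource request combining these options might look like the following sbatch directives; the GPU type name (a100) is illustrative and must match a GRES type defined in the cluster's configuration:
#!/bin/bash
#SBATCH --ntasks=8                # eight parallel tasks
#SBATCH --cpus-per-task=4         # four CPUs per task
#SBATCH --mem-per-cpu=4G          # memory enforced per allocated CPU
#SBATCH --gres=gpu:a100:2         # two GPUs of GRES type "a100" per node
#SBATCH --exclusive               # request whole-node (exclusive) access
srun ./my_application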
Slurm employs several scheduling algorithms to optimize job placement and cluster utilization. The backfill scheduler, enabled by default with SchedulerType=sched/backfill, augments FIFO scheduling by initiating lower-priority jobs in idle resources without delaying higher-priority ones, using estimated start times visible via squeue --start and configurable via parameters like bf_window (default 1440 minutes) for the look-ahead period. Gang scheduling facilitates parallel job execution by allocating resources to multiple jobs simultaneously in a partition and timeslicing them, with jobs suspended and resumed every SchedulerTimeSlice (default 30 seconds) to share resources, configured via PreemptMode=GANG and OverSubscribe=FORCE. For topology-aware placement, Slurm uses a Hilbert curve algorithm in three-dimensional topologies to map node coordinates into a linear order that preserves locality, particularly for torus networks like Cray systems, integrated with the TopologyPlugin for best-fit allocations minimizing communication overhead. Fair-share policies are implemented through multifactor priority, where the fair-share factor (0.0–1.0) adjusts job priority based on historical resource usage versus allocation shares, weighted by PriorityWeightFairshare (recommended 10000) and using algorithms like Fair Tree for hierarchical accounting across users, accounts, and clusters.[42][43][44][45]
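A slurm.conf fragment wiring these mechanisms together might read as follows; the values shown are illustrative rather than recommended settings:
# Backfill scheduling with a one-day planning window
SchedulerType=sched/backfill
SchedulerParameters=bf_window=1440,bf_max_job_test=500
# Gang scheduling: timeslice oversubscribed jobs every 30 seconds
PreemptMode=GANG
SchedulerTimeSlice=30
# Multifactor priority with a strong fair-share component
PriorityType=priority/multifactor
PriorityWeightFairshare=10000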
Partitions in Slurm provide logical groupings of nodes into queues with distinct configurations, such as node lists, resource limits, and access controls defined in slurm.conf (e.g., PartitionName=debug Nodes=node[01-10] Default=YES MaxTime=01:00:00). Each partition can enforce oversubscription, time limits, and priority tiers, allowing tailored environments like debug or production queues. Quality of Service (QoS) levels extend partitioning by associating limits and priorities with user classes or accounts, managed via sacctmgr (e.g., MaxTRESPerJob=cpu=100 for a QoS), overriding association limits and influencing scheduling via PriorityWeightQOS. A partition can inherit a default QoS with QOS=normal in its configuration, enabling differentiated access for research groups or priority users without altering base partitions.[30][46]
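For instance, a debug partition and a differentiated QoS could be set up as in the following sketch; partition names, node ranges, user names, and limits are placeholders:
# slurm.conf: two partitions with different limits and a default QoS
PartitionName=debug Nodes=node[01-10] Default=YES MaxTime=01:00:00 QOS=normal
PartitionName=batch Nodes=node[11-99] MaxTime=48:00:00 OverSubscribe=NO
# Shell: create a higher-priority QoS with a per-job CPU cap (requires slurmdbd)
sacctmgr add qos premium
sacctmgr modify qos where name=premium set Priority=100 MaxTRESPerJob=cpu=100
sacctmgr modify user where name=alice set qos+=premium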
Advanced options enhance scheduling flexibility. Preemption allows higher-priority jobs to displace lower ones, configured with PreemptType=preempt/qos and using QoS preemption lists or partition tiers to determine eligibility, with exempt times (PreemptExemptTime) preventing immediate re-preemption. Reservations guarantee resources for specific jobs, users, or maintenance, created via scontrol create reservation (e.g., Nodes=ALL StartTime=now Duration=120), supporting flags like maint for downtime or magnetic for automatic attraction of eligible jobs, and integrable with backfill for non-disruptive planning. Dependency-based job chains defer execution until predecessor conditions are met, specified with --dependency in sbatch (e.g., afterok:12345 for success-dependent start or afterany:12345,67890 for any completion), supporting types like singleton for mutual exclusion and remote dependencies in federated clusters, modifiable post-submission via scontrol. As of Slurm version 25.11 (released November 6, 2025), an "Expedited Requeue" mode is available for batch jobs using --requeue=expedite, allowing immediate restart with highest priority upon node failure or script/epilog issues.[47][48][12][49]
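The commands below sketch a maintenance reservation and a simple dependency chain; the reservation name, script names, and times are placeholders:
# Reserve all nodes for a two-hour maintenance window starting immediately
scontrol create reservation ReservationName=maint_window Users=root \
    Nodes=ALL StartTime=now Duration=120 Flags=maint,ignore_jobs
# Chain two batch jobs: the second starts only if the first exits successfully
jobid=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:${jobid} analyze.sh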
Accounting and Monitoring
Slurm's accounting system collects detailed records of resource usage for every job and job step executed on the cluster, enabling administrators to track and enforce limits on users, accounts, and quality of service (QOS).[50] This system supports storage in simple text files for basic logging or integration with relational databases such as MySQL or MariaDB, facilitated by the Slurm Database Daemon (slurmdbd), which centralizes data from multiple clusters and handles authentication via plugins like MUNGE.[50] Database integration requires the InnoDB storage engine and allows for backup hosts to ensure data reliability, with records capturing job details including user, nodes allocated, execution times, status, and resource consumption metrics like CPU hours and memory usage.[50] As of Slurm version 25.11 (released November 6, 2025), administrators can retroactively modify AllocTRES values via sacctmgr for accounting adjustments, such as correcting energy usage records.[49]
For retrospective analysis, Slurm provides command-line tools such as sacct and sreport that query the accounting database. The sacct command displays job accounting data in customizable formats, supporting filters by job ID, time range, and state to report resource usage for active or completed jobs, including details like elapsed time, CPU count, and task distribution for detecting imbalances.[51] For example, sacct --format=jobid,elapsed,ncpus,ntasks,state outputs key metrics for specified jobs, aiding in post-execution audits.[51] The sreport command generates aggregated reports on cluster utilization and job usage, such as account-based breakdowns or top users over hourly, daily, or monthly periods, requiring the slurmdbd for rolled-up data.[52] Common reports include Cluster Utilization for overall efficiency and AccountUtilizationByUser for per-user resource consumption, with options to include trackable resources (TRES) like energy or GPUs.[52]
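Typical retrospective queries might look like the following; the user, account, and date range are placeholders:
# Per-job usage for one user over a time window
sacct --user=alice --starttime=2025-01-01 --endtime=2025-01-31 \
      --format=JobID,JobName,Elapsed,NCPUS,MaxRSS,State
# Cluster-wide utilization and a per-account/per-user breakdown for the same period
sreport cluster Utilization Start=2025-01-01 End=2025-01-31
sreport cluster AccountUtilizationByUser Start=2025-01-01 End=2025-01-31 Accounts=physics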
Real-time monitoring is achieved through tools like squeue and sinfo, which provide immediate views of job and cluster status without relying on historical logs. The squeue command lists jobs in the scheduling queue, displaying states (e.g., PENDING, RUNNING) and resource allocations such as CPUs, memory, and nodes via formats like --long for extended details.[53] Meanwhile, sinfo reports on partitions and nodes, showing states (e.g., IDLE, ALLOCATED, DOWN) and utilization metrics including CPU count, memory, and generic resources per node, helping administrators identify available capacity or issues like drained nodes.[54] As of Slurm version 25.11 (released November 6, 2025), slurmctld supports exporting telemetry data in OpenMetrics format (compatible with Prometheus) on the SlurmctldPort for enhanced monitoring integration.[49]
Slurm's extensibility comes from plugins that allow custom metrics in accounting, such as the JobAcctGather plugin (e.g., linux or cgroup types) for collecting per-task CPU and memory usage, and the AcctGatherEnergy plugins (e.g., ipmi or rapl) for collecting node-level energy consumption from hardware sensors, which is then attributed to jobs despite shared node usage.[50] Trackable RESources (TRES) extend this to specialized hardware, accounting for resources such as GPU utilization, burst buffers, and interconnect usage as predefined types (e.g., GRES/gpu, BB for burst buffers, IC for interconnect), with configurable billing weights for priority and enforcement via parameters like AccountingStorageTRES.[55] User and account limits are enforced through the AccountingStorageEnforce parameter, applying QOS policies to prevent overuse based on associations of user, cluster, partition, and account.[50]
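In slurm.conf, these mechanisms correspond to entries along the following lines (a partial sketch, not a complete accounting configuration; the database host name is hypothetical):
# Gather per-task usage through cgroups and store records via slurmdbd
JobAcctGatherType=jobacct_gather/cgroup
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbnode01
# Track GPUs and energy as TRES and enforce association/QOS limits
AccountingStorageTRES=gres/gpu,energy
AccountingStorageEnforce=associations,limits,qos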
For enterprise compliance, Slurm maintains audit trails through comprehensive job records and supports data archiving to external servers for long-term retention, with configurable purging to manage database size while preserving access to historical data for security reviews or regulatory audits.[50] The sacctmgr tool further aids compliance by allowing modification and viewing of account hierarchies and limits in the database, ensuring traceable changes to resource policies.[56]
Configuration and Usage
Installation and Configuration
Slurm Workload Manager installation begins with ensuring the cluster environment meets specific prerequisites to support its operation across multiple nodes. A Linux kernel is required, as Slurm is designed for Unix-like systems, with synchronization of clocks, users, and groups (including UIDs and GIDs) across all nodes to maintain consistency.[9] Essential dependencies include the MUNGE authentication service, which provides secure communication between Slurm components; the same munge.key file must be distributed to all nodes, and the munged daemon started prior to Slurm daemons.[9][57] Additional development libraries are needed depending on enabled plugins, such as FreeIPMI for energy accounting or MySQL/MariaDB for database integration.[9] Compiler tools like GCC are necessary for building from source.[9]
Installation methods for Slurm vary by deployment preference, prioritizing ease of management and distribution-specific packaging. The primary approach is compiling from source: download the tarball from the official repository, unpack it, run ./configure to set build options, execute make followed by make install, and update the dynamic linker cache with ldconfig for libraries.[9] For RPM-based distributions like CentOS or Rocky Linux, administrators can build custom RPM packages using rpmbuild -ta on the source tarball, facilitating automated deployment via tools like yum or dnf.[9] Similarly, Debian or Ubuntu users can generate DEB packages with debuild -b -uc -us, enabling installation through apt.[9] Containerized installations are possible using images for Singularity (now Apptainer) or Docker, though these are more commonly used for job execution rather than core Slurm daemon deployment, and require custom builds to include host-specific configurations.[58]
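On an RPM-based system, the build-from-source and packaging steps described above reduce to roughly the following commands; the version number is illustrative:
# Build and install from source
tar -xjf slurm-25.11.0.tar.bz2
cd slurm-25.11.0
./configure --prefix=/usr --sysconfdir=/etc/slurm
make && sudo make install
sudo ldconfig
# Alternatively, build binary RPM packages from the same tarball for deployment via yum/dnf
rpmbuild -ta slurm-25.11.0.tar.bz2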
Post-installation, configuration centers on key files that define cluster behavior and resource management. The primary file, slurm.conf, is an ASCII configuration located in the Slurm installation directory (typically /etc/slurm) and must be identical across all nodes; it outlines cluster topology via parameters like SlurmctldHost (specifying the control host), NodeName (detailing node attributes such as CPUs and memory), and TopologyPlugin (e.g., topology/tree for hierarchical layouts).[9][30] Partitions are configured within slurm.conf using PartitionName blocks, grouping nodes (via Nodes=) with settings like MaxTime for runtime limits, Default=YES for the primary queue, and OverSubscribe=YES to allow resource sharing.[30] Scheduling parameters include SchedulerType (e.g., sched/backfill for advanced planning) and PriorityType (e.g., priority/multifactor for job prioritization).[30] Complementing this, cgroup.conf configures Linux control groups for resource constraints when the task/cgroup or job/cgroup plugins are enabled; key options include ConstrainRAMSpace=YES to enforce memory limits (with AllowedRAMSpace=95 for percentage-based allocation) and CgroupPlugin=cgroup/v1 (or /v2 for modern kernels).[59][60]
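A minimal pair of configuration files for a small cluster might look like the following sketch; host names, node counts, and hardware figures are placeholders:
# /etc/slurm/slurm.conf (identical on every node)
ClusterName=example
SlurmctldHost=head01
SlurmUser=slurm
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SchedulerType=sched/backfill
NodeName=node[01-04] CPUs=32 RealMemory=128000 State=UNKNOWN
PartitionName=debug Nodes=node[01-04] Default=YES MaxTime=01:00:00 State=UP
# /etc/slurm/cgroup.conf
CgroupPlugin=cgroup/v2
ConstrainCores=yes
ConstrainRAMSpace=yes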
Initial setup involves creating a dedicated Slurm user (e.g., "slurm") on all nodes for daemon execution, along with directories for state saving (StateSaveLocation=/var/spool/slurm), logs (SlurmctldLogFile=/var/log/slurmctld.log), and PID files, ensuring they are owned and writable by this user.[9] Distribute slurm.conf and start the slurmctld daemon on the controller node, followed by slurmd on compute nodes; if using accounting, configure and launch slurmdbd with a database backend.[9] Validation occurs by submitting a test job, such as srun -N1 /bin/hostname, to confirm daemon communication and basic scheduling.[9] Tools like the Slurm Configuration Tool can generate an initial slurm.conf interactively.[9]
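On a systemd-based distribution, these setup and validation steps correspond roughly to the following commands; paths are examples:
# On every node: create the slurm user and the directories it must own
sudo useradd -r -m -d /var/lib/slurm slurm
sudo mkdir -p /var/spool/slurm /var/log/slurm
sudo chown slurm:slurm /var/spool/slurm /var/log/slurm
# Start the daemons: slurmctld on the controller, slurmd on each compute node
sudo systemctl enable --now slurmctld     # controller node only
sudo systemctl enable --now slurmd        # compute nodes
# Validate basic scheduling with a one-node test job
srun -N1 /bin/hostname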
Job Submission and Management
Slurm provides several core commands for users to submit, monitor, and control jobs on managed clusters. The primary command for batch job submission is sbatch, which submits a script to the Slurm controller for later execution and returns a job ID immediately, though resources may not be allocated right away.[12] For interactive or step-based execution, srun launches tasks directly, either creating a new allocation if needed or running within an existing one, and supports options like --ntasks to specify the number of parallel tasks.[61] To terminate jobs, scancel signals or cancels specified jobs, arrays, or steps using filters such as --user for owner-specific jobs or --state=RUNNING for active ones.[62]
Batch scripts submitted via sbatch typically begin with a shebang line followed by #SBATCH directives to request resources. For instance, a script might specify --ntasks=4 to allocate four tasks across nodes and --gres=gpu:1 to request one GPU, ensuring the job utilizes parallel processing and accelerators as needed.[12] Environment variables can be controlled with --export=ALL to inherit the submitting shell's environment or --export=NONE to start clean, preventing unintended variable propagation.[12] Dependencies allow chaining jobs; for example, --dependency=afterok:12345 defers execution until job 12345 completes successfully with exit code zero, while afterany:12345 triggers after any termination regardless of outcome.[12] A sample script could look like this:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=4
#SBATCH --gres=gpu:1
#SBATCH --dependency=afterok:12345
#SBATCH --export=ALL
srun hostname
Submitting with sbatch script.sh queues the job, which runs the srun command upon allocation.[12]
Queue management involves querying and modifying job states without deep administrative access. The squeue command displays pending and running jobs, filtered by options like --user=$USER for personal jobs or --states=PD for pending ones, with customizable output via --format to show details such as job ID, partition, and reason for delay.[53] Complementing this, sinfo reports on available partitions and nodes, using --summarize for an overview of idle or allocated resources in queues like "debug" or "batch."[54] For modifications, scontrol enables user-level updates, such as scontrol update JobId=12345 TimeLimit=02:00:00 to extend runtime or scontrol hold 12345 to pause a pending job, though releases require matching permissions.[34]
Best practices enhance efficiency for complex workflows. Job arrays, submitted with sbatch --array=0-99%10, manage up to thousands of similar tasks by limiting concurrent runs (e.g., 10 via %10) and using environment variables like $SLURM_ARRAY_TASK_ID for task-specific logic, reducing submission overhead for parameter sweeps.[32] Multi-step jobs leverage srun sequentially within a single sbatch script or salloc allocation, as in srun -n2 step1; srun -n4 step2, to chain dependent computations without multiple submissions.[8] Error handling includes checking exit codes in scripts (e.g., if [ $? -ne 0 ]; then echo "Error"; exit 1; fi) and using scancel promptly for failed jobs, while monitoring with squeue -u $USER --format="%i %T %R" helps identify issues like resource contention early.[8]
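A job-array script for a parameter sweep, following these practices, might look like this; the application name and input naming scheme are assumptions for illustration:
#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --array=0-99%10           # 100 tasks, at most 10 running at once
#SBATCH --ntasks=1
#SBATCH --output=sweep_%A_%a.out  # %A = array job ID, %a = task index
srun ./simulate --input=params_${SLURM_ARRAY_TASK_ID}.dat
if [ $? -ne 0 ]; then
    echo "Task ${SLURM_ARRAY_TASK_ID} failed" >&2
    exit 1
fi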
Operating Systems
Slurm Workload Manager provides primary support for major Linux distributions, ensuring compatibility with widely used enterprise and community editions as of 2025.[63] These include Red Hat Enterprise Linux (RHEL) versions 8, 9, and 10 along with their derivatives such as CentOS Stream, Rocky Linux, and AlmaLinux; Ubuntu versions 20.04 (Focal Fossa), 22.04 (Jammy Jellyfish), and 24.04 (Noble Numbat); SUSE Linux Enterprise Server (SLES) versions 12 and 15; and Debian versions 11 (Bullseye), 12 (Bookworm), and 13 (Trixie).[63] This support encompasses thorough testing on x86_64, arm64, and ppc64 architectures within these environments, facilitating seamless deployment in high-performance computing clusters.[63]
Slurm integrates with systemd for service management on compatible Linux distributions, allowing administrators to enable and control daemons such as slurmctld, slurmdbd, and slurmd using standard commands like systemctl enable slurmctld.[9] The software requires no kernel modifications for basic operation and is compatible with Linux kernels that support control groups (cgroups), which are available since kernel version 2.6.[8] However, full functionality, including advanced resource isolation, benefits from kernels supporting cgroups v2 (generally version 4.5 or later) to leverage unified cgroups hierarchies.[64]
Support extends to certain Unix-like systems with limitations. FreeBSD and NetBSD are compatible but receive limited testing and maintenance, restricting their use to basic scenarios without guaranteed feature parity.[63] Historical ports to IBM AIX exist, but current versions are not actively supported, leading to potential compatibility issues for production environments.[65] Slurm offers no native support for Windows operating systems, though it can interact with Windows-based nodes via external integrations in hybrid setups.[1]
Known issues arise on older distributions lacking robust cgroups v2 implementation, which is essential for advanced resource controls like CPU and memory limiting. For instance, early RHEL 8 releases (prior to 8.2) treat cgroups v2 as a technology preview with incomplete cpuset support, requiring workarounds such as enabling DefaultCPUAccounting=yes in systemd configurations.[64] Distributions without cgroups v2, such as those on kernels before 4.5, fall back to cgroups v1 but may encounter reduced performance in job containment and accounting.[64]
Hardware Architectures
Slurm Workload Manager supports a range of CPU architectures, enabling deployment across diverse hardware environments. Primary compatibility includes x86_64 processors from Intel and AMD, which form the backbone of most high-performance computing (HPC) clusters. Additionally, arm64 (AArch64) architectures are fully supported, facilitating integration with processors such as AWS Graviton instances and Ampere Altra systems for energy-efficient computing. PowerPC64 (ppc64) support accommodates IBM Power systems, allowing Slurm to manage workloads on enterprise-grade hardware optimized for reliability and scalability.[63]
The software accommodates heterogeneous node configurations, including clusters with specialized accelerators. GPU support encompasses NVIDIA devices, with features like Multi-Instance GPU (MIG) partitioning introduced in version 21.08 for models such as the A100, and AMD GPUs via the RSMI library for autodetection. Legacy accelerators like the Intel Xeon Phi (formerly MIC architecture) received enhanced integration in earlier releases, enabling finer-grained resource management on Knights Landing processors. Interconnects such as InfiniBand and Ethernet are natively handled, with topology plugins optimizing allocation in fat-tree network topologies to minimize communication overhead.[40][63][44]
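GPU autodetection is typically enabled in gres.conf, after which jobs request devices by type; the MIG profile name below follows NVIDIA's standard A100 naming and is shown only for illustration:
# /etc/slurm/gres.conf: discover NVIDIA GPUs via NVML (use AutoDetect=rsmi for AMD)
AutoDetect=nvml
# In a job script: request one MIG slice exposed as GRES type "1g.5gb"
#SBATCH --gres=gpu:1g.5gb:1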
Slurm demonstrates robust scalability, designed to handle clusters exceeding 100,000 nodes and up to 1 million concurrent jobs, as evidenced by deployments on large-scale supercomputers. This capability relies on fault-tolerant daemons and efficient resource arbitration to maintain performance under high load. Furthermore, integrations with quantum co-processors, such as cat-qubit systems from Alice & Bob, enable hybrid classical-quantum scheduling through custom plugins, marking initial steps toward quantum-HPC convergence in research environments.[1][66]
Licensing and Support
Open-Source License
Slurm Workload Manager is licensed under the GNU General Public License version 2 (GPLv2), a copyleft license that has governed the project since its inception in 2002.[4][3][2] This license permits users to freely use, study, modify, and redistribute the software, provided that any derivative works are also distributed under GPLv2 terms and accompanied by the source code.
Key provisions of the GPLv2 include copyleft requirements, which mandate that modifications or combined works must retain the same licensing obligations, ensuring that enhancements remain open to the community. The license explicitly disclaims any warranty, distributing the software "as is" without guarantees of merchantability or fitness for a particular purpose, thereby limiting liability for any damages arising from its use.[67] GPLv2 addresses patents only indirectly through Section 7: if conditions imposed on a distributor (for example, by a court judgment or a patent license) contradict the license's terms, that distributor may not distribute the program at all. Slurm includes a special exception permitting linkage with the OpenSSL library, which would otherwise conflict with GPLv2 due to OpenSSL's licensing.[67]
The source code for Slurm is distributed through the official SchedMD repository, which provides tarballs and documentation for each release.[7] It is also mirrored on GitHub, facilitating version control and community contributions via pull requests.[3] Furthermore, Slurm is packaged for major Linux distributions, including Ubuntu via APT repositories and Red Hat Enterprise Linux/CentOS via RPM packages, enabling straightforward installation through standard package managers.[68]
When integrating Slurm with proprietary software, compliance with GPLv2 requires careful consideration of linking and derivative works, as the copyleft clause may obligate the release of source code for any tightly coupled components under the same license.[69] Users must ensure that proprietary elements do not violate these terms, potentially by using dynamic linking or separate processes to avoid triggering copyleft obligations.[69]
Commercial and Community Support
Slurm Workload Manager receives commercial support primarily from SchedMD, the company founded in 2010 by its original developers to sustain and enhance the software.[70] SchedMD offers a range of services, including real-time troubleshooting, performance optimization, custom development, configuration assistance, bug fixes, and onsite training for high-performance computing (HPC), high-throughput computing (HTC), AI, and machine learning environments.[70] These services are utilized by major institutions such as Harvard University, NASA, and the Technical University of Denmark, with support contracts extending up to seven years for some clients.[70] Additionally, SchedMD provides Slurm integration and support through partnerships with cloud providers like AWS and Google Cloud, as well as hardware vendors such as NVIDIA and HPE.[71][72][73]
As an open-source project, Slurm benefits from robust community support mechanisms maintained by SchedMD. The primary channels include two official mailing lists: slurm-announce@lists.schedmd.com for release announcements and critical updates, and slurm-users@lists.schedmd.com for user discussions, questions, and technical advice, which is also archived on Google Groups.[74][75] Community members can report bugs, request features, and submit contributions via the official support tracker at support.schedmd.com, where patches are attached to issues labeled under "C - Contributions" rather than using GitHub pull requests.[3] The Slurm source code is hosted on GitHub, enabling developers to access, review, and build upon the codebase, with extensive documentation available on the official site for self-guided troubleshooting and administration.[3] These resources foster active participation from a global user base, including supercomputing centers and research institutions.[76]