Scratch space
Scratch space is a temporary storage area, which may be on disk, in memory, or within file systems, dedicated to holding intermediate data generated during program execution or computational tasks, analogous to scratch paper for quick notations.[1] In computing, particularly high-performance computing (HPC) environments, it functions as a high-speed working buffer to manage bursty data I/O operations, such as those in scientific simulations, genomic sequencing, or machine learning workflows, thereby preventing performance bottlenecks from slower long-term storage.[2][3] Unlike persistent storage systems like home or project directories, scratch space offers no backups or data redundancy guarantees, with files automatically purged after short retention periods, which vary by system (typically days to months), to reclaim capacity and enforce its transient nature.[4][5] Technical implementations typically involve parallel file systems like Lustre or GPFS, optimized for low-latency metadata operations and scalable I/O across thousands of nodes, though quotas on storage volume (e.g., up to 20 TB per user) and file counts (e.g., 20 million inodes) are common to prevent overuse.[2][6][5] Users must promptly transfer critical outputs to backed-up locations, as data loss from hardware failures or policy-driven cleanups is expected.[7][8]
Definition and Concepts
Core Definition
Scratch space refers to a designated area on storage devices, such as hard disk drives or solid-state drives, or in memory, used for holding transient data during processing tasks in computing systems.[9] This concept is analogous to scratch paper, which serves for temporary notes or calculations that are not meant to be preserved long-term.[10] The primary purpose of scratch space is to facilitate intermediate computations, buffering of data streams, or temporary file operations without the need to commit information to permanent storage solutions.[2] Unlike permanent storage, which is designed for long-term data retention with features like backups and redundancy, scratch space is inherently ephemeral, with contents often automatically purged after a short period or upon task completion to reclaim resources.[11] This ephemerality ensures efficient resource utilization but requires users to manage data migration to persistent locations if retention is needed.[3] Basic examples include temporary files created by compilers during code compilation and optimization processes, where unnamed scratch files hold intermediate representations of the program.[12] Similarly, image editing software like Adobe Photoshop employs scratch space on designated disks to manage rendering operations and handle data overflow when system RAM is insufficient.[13] In high-performance computing environments, scratch space variants support rapid access for large-scale intermediate datasets.[2]
Historical Origins
The concept of scratch space in computing traces its etymological roots to the pre-digital practice of using "scratch paper" or "scratch pad"—a disposable notepad for jotting down temporary notes, calculations, or rough drafts during manual work. This analogy carried over into early computing as a metaphor for transient storage areas designed to hold intermediate data without long-term retention. The term "scratch pad" first emerged in technical literature in the mid-1960s, referring to high-speed semiconductor memory modules integrated into mainframe systems for rapid, temporary data access, as highlighted in a 1966 Electronics magazine article on the Signetics 8-bit RAM for the SDS Sigma 7.[14] Early adoption of scratch space concepts occurred in mainframe operating systems during the 1960s, driven by the constraints of contemporary storage technologies like magnetic tapes and drum memory, which were slow for random access and ill-suited for intermediate processing in batch jobs. IBM's OS/360, released in 1964 and fully documented by 1965, introduced temporary data sets as a core feature for batch processing workflows, allowing programs to allocate short-lived storage on direct-access devices for compiler outputs or step-to-step data passing without permanent cataloging. These temporary datasets were automatically managed and deleted upon job completion, addressing the need for efficient working space in resource-limited environments where tapes required sequential mounting and drums offered limited capacity.[15] A key milestone came in the 1970s with the development of UNIX, where the /tmp directory was established as a standard location for temporary files, enabling applications to create and discard short-term data in a shared filesystem without interfering with permanent storage. This formalized the scratch space paradigm in multi-user systems, influencing subsequent operating systems by providing a dedicated, volatile area purged periodically or on reboot. 
In the 1980s supercomputing era, scratch space gained prominence in high-performance computing clusters, exemplified by Cray systems that incorporated fast local disks for temporary file handling; for instance, Cray installations reserved gigabytes of attached disk space specifically for application scratch needs, accommodating the intense I/O demands of vector processing jobs.[16] The evolution of scratch space reflects a cultural continuity from manual engineering practices, where engineers relied on scrap paper for iterative calculations before committing results to formal records, mirroring how computing systems use transient areas to support exploratory or intermediate computations without cluttering archival storage.
Key Characteristics
Scratch space is characterized by its volatility, where data is intended for short-term use and is typically purged automatically after job completion, inactivity periods ranging from 21 to 90 days, or system reboots, with no backups or redundancy provided to ensure users do not rely on it for long-term persistence.[17][18][4][19][20] In terms of performance, scratch space prioritizes high I/O throughput and low-latency access over data durability, often utilizing fast storage media such as SSDs, NVMe drives, or RAM disks to support rapid read and write operations during computational tasks.[2][21][22][23] Capacity in scratch space varies but is frequently large-scale, reaching terabytes or petabytes in HPC clusters to accommodate intensive workloads, though it is shared among multiple users, which can lead to contention and resource competition in multi-user environments.[24][25][26][27] Access patterns for scratch space are optimized for high-volume, intensive read and write activities during active processing, such as intermediate computations in simulations, rather than for long-term archival or infrequent retrieval.[2][28][29]
Applications in Computing
General-Purpose Computing
In operating systems, scratch space is commonly implemented through designated directories for temporary file storage. In Unix-like systems such as Linux, the /tmp directory provides a world-writable location for short-term files, often mounted as a tmpfs to leverage RAM for faster access and automatic cleanup on reboot. Windows uses the %TEMP% environment variable, which resolves to a user-specific path like C:\Users\%USERNAME%\AppData\Local\Temp, where applications store transient data without requiring explicit permissions checks beyond the path's accessibility.[30] On macOS, per-user scratch space is allocated in /private/var/folders, a hidden directory that holds application caches and temporary items, with subdirectories managed by the system to isolate user data. Applications in general-purpose computing routinely employ scratch space for operational efficiency. Web browsers, for instance, create cache files in dedicated temporary directories to store downloaded resources like images and JavaScript, enabling quicker subsequent loads without redownloading.[31] Text editors leverage it for autosave features, generating draft files in temp locations—such as Visual Studio Code's backups in %APPDATA%\Code\Backups on Windows—to recover unsaved work after crashes or interruptions. Compilers use scratch space for intermediate object files during code translation, placing them in system temp directories before linking and deletion to avoid cluttering source folders. Integration of scratch space into workflows occurs via standard APIs that handle allocation and lifecycle. The tmpfile() function from the C standard library, for example, dynamically creates an unnamed binary file in the system's temp directory, opened in read/write mode and automatically removed upon closure or program termination, ideal for processing tasks like sorting oversized datasets or encoding media streams.
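The tmpfile() idiom (create, use, and let the system discard) can be approximated in a shell with mktemp plus an immediate unlink. A minimal sketch, assuming GNU coreutils and a Linux /proc filesystem for reopening the unlinked file; paths are illustrative:

```shell
# Create a named temp file, hold it open on descriptor 3, then unlink it.
# The data stays reachable through the descriptor until the shell exits,
# after which the kernel reclaims the space, mirroring C's tmpfile().
tmp=$(mktemp "${TMPDIR:-/tmp}/scratch.XXXXXX")
exec 3<>"$tmp"
rm -f "$tmp"                          # name is gone; fd 3 is still valid

printf 'intermediate result' >&3      # write through the descriptor
readback=$(cat "/proc/$$/fd/3")       # Linux-only: reopen the deleted file at offset 0
echo "$readback"
```

Because the file has no name after the rm, nothing is left behind for other processes to stumble on, and a crash cannot strand a stale temp file.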
Similar mechanisms exist in higher-level languages, ensuring seamless temporary storage without manual file management. From a user perspective, scratch space is designed for automatic maintenance to minimize intervention, with operating systems purging inactive files—such as Linux's /tmp contents after boot or Windows temporary files via Storage Sense—and applications deleting their temps on exit.[32] Nonetheless, accumulation from faulty apps or high usage can exhaust available space, resulting in disk full errors, application crashes, and overall system slowdowns due to fragmented I/O operations.[33]
High-Performance Computing (HPC)
In high-performance computing (HPC) environments, scratch space serves as a dedicated, high-speed storage partition integrated into supercomputing clusters, such as those at national laboratories, to facilitate job staging and the management of intermediate results during large-scale simulations. These partitions, often mounted as /scratch or accessible via environment variables like $SCRATCH, are optimized for temporary data handling in resource-intensive workflows, enabling efficient input/output (I/O) operations without burdening persistent storage systems. For instance, facilities like the National Energy Research Scientific Computing Center (NERSC) deploy all-flash Lustre filesystems for scratch space, providing petabyte-scale capacity—such as 35 PB on the Perlmutter system—with aggregate bandwidth exceeding 5 TB/s to support data-intensive scientific computations.[34] Scratch space is particularly vital in parallel processing paradigms, where it accommodates temporary data writes from distributed nodes managed by frameworks like the Message Passing Interface (MPI) or job schedulers such as SLURM. In these setups, compute nodes generate and share transient files during tightly coupled computations, ensuring synchronization without network congestion; for example, in molecular dynamics simulations using tools like VASP, scratch partitions store checkpoint files and intermediate atomic configurations to enable fault-tolerant restarts across hundreds of nodes. 
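A typical pattern is a batch script that stages inputs into scratch, runs there, and copies only keeper files back to persistent storage. The sketch below is illustrative rather than a site recipe: the #SBATCH directives and the solver step are placeholders, and it falls back to a local temp directory so it also runs outside a scheduler.

```shell
#!/bin/sh
#SBATCH --job-name=md-sim      # placeholder directives; treated as comments
#SBATCH --nodes=4              # when the script runs outside SLURM
#SBATCH --time=02:00:00

# Stage into site scratch if $SCRATCH is defined, else a throwaway dir.
WORKDIR="${SCRATCH:-$(mktemp -d)}/job.${SLURM_JOB_ID:-$$}"
mkdir -p "$WORKDIR"
printf 'demo input\n' > "$WORKDIR/input.dat"    # stand-in for real input staging

# ... solver runs here, writing checkpoints into scratch ...
printf 'step 100 energy -1.23\n' > "$WORKDIR/checkpoint.chk"

# Copy only the results worth keeping back to persistent storage.
RESULTS="${RESULTS_DIR:-$HOME/results}"
mkdir -p "$RESULTS"
cp "$WORKDIR/checkpoint.chk" "$RESULTS/"
```

Keeping the heavy I/O inside $WORKDIR and copying back a single checkpoint file is what lets the job exploit the fast scratch filesystem without risking the results to a purge.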
Similarly, climate modeling applications, such as the Weather Research and Forecasting (WRF) model, leverage scratch for executing simulations and handling large output datasets, with SLURM scripts directing I/O to these spaces to maintain workflow efficiency.[35][36][37] At facilities like NERSC and the Texas Advanced Computing Center (TACC), scratch space routinely manages petabytes of transient data for workloads including genome sequencing pipelines and AI model training, where intermediate results from distributed tasks—such as alignment files or gradient checkpoints—demand rapid access to prevent job failures. TACC's per-resource scratch systems, for instance, support SLURM-orchestrated jobs by providing unlimited temporary quotas for staging data in AI training runs on systems like Vista, with files purged after 10 days of inactivity to reclaim space. This scale underscores scratch's role in handling exabyte-era datasets in research clusters.[38][39] To meet the demands of these environments, scratch space emphasizes low-latency I/O through technologies like parallel filesystems (e.g., Lustre or Panasas), which mitigate bottlenecks in tightly coupled jobs by distributing data across object storage targets and enabling high-throughput reads/writes—up to millions of IOPS on flash-based setups. Global scratch configurations, shared across nodes, facilitate multinode access for applications requiring collective I/O, while local variants on individual nodes offer even lower latency for node-specific temporaries, ensuring overall system performance in simulations where I/O can constitute 20-50% of runtime.[34][24]
Specialized Environments
In embedded systems, scratch space is typically realized through scratchpad memory (SPM), a compiler-managed on-chip RAM alternative to caches that serves as temporary storage for data processing in resource-limited environments like IoT devices and microcontrollers. This approach is particularly suited for handling sensor data without relying on persistent storage, enabling low-power operations by mapping frequently accessed variables directly to SPM, which reduces energy consumption by an average of 40% and area-time product by 46% compared to cache-based systems.[40] In deep learning accelerators integrated into embedded platforms, SPM acts as a unified RAM buffer for temporary data reuse, minimizing off-chip accesses by up to 80% across models like ResNet18 and supporting ephemeral workloads without long-term storage needs.[41] Cloud and virtualized setups employ ephemeral storage as scratch space, providing high-speed temporary volumes for short-lived instances in serverless computing paradigms. In AWS, EC2 instance stores deliver block-level temporary storage physically attached to the host, ideal for buffers, caches, and scratch data in applications like Amazon EMR, where such volumes handle HDFS spills and temporary content that is automatically deleted upon instance termination.[42][43] Similarly, Google Cloud's Local SSD offers low-latency ephemeral block storage for scratch data and caches, such as in flash-optimized databases or tempdb for SQL Server, ensuring rapid access for transient workloads while data persists only during the VM's lifecycle.[44][45] Real-time systems in domains like automotive and avionics leverage scratchpad memory for scratch space to maintain determinism, using it as a buffer during critical operations with enforced size limits to guarantee predictable timing and avoid interference.
A dynamic SPM unit managed at the OS level hides transfer latencies and enhances schedulability in multitasking embedded environments, supporting applications where timing predictability is paramount without architectural overhauls.[46] Scratchpad-based operating systems further enable this by implementing a three-phase task model—load, execute, unload—with dedicated DMA scheduling to provide temporal isolation across multi-core setups, achieving up to 2.1× speedups in benchmarks while ensuring hard real-time compliance for safety-critical buffering.[47] In gaming and multimedia processing, scratch space facilitates temporary asset handling and rendering pipelines, where engines allocate ephemeral storage for build-time operations and runtime buffers. For instance, Unity's Temp folder serves as a staging area for temporary files generated during asset builds and compilation, allowing safe creation of unique paths for intermediate data without risking overwrites, which is essential for efficient pipeline workflows in game development.[48] This temporary allocation supports in-game asset processing, such as dynamic loading and rendering of transient elements, mirroring broader use in multimedia tools for non-persistent data flows.[49]
Types and Implementations
Disk-Based Scratch Space
Disk-based scratch space utilizes persistent storage media, such as hard disk drives (HDDs) or solid-state drives (SSDs), to provide temporary high-capacity areas for intermediate data in computing environments, particularly in high-performance computing (HPC) clusters. These implementations leverage rotational disks for cost-effective bulk storage or flash-based SSDs and NVMe devices for improved speed while maintaining larger capacities compared to volatile memory options.[2][50] In shared setups, redundancy is often achieved through RAID configurations, such as RAID-6 arrays, which tolerate multiple disk failures while aggregating capacity across multiple drives to support cluster-wide access.[50][51] High-performance parallel file systems are commonly employed to format and manage disk-based scratch space, enabling efficient concurrent access from multiple nodes. Systems like Lustre and IBM Spectrum Scale (GPFS) are formatted on these storage media in HPC clusters, supporting parallel I/O operations through striping data across object storage targets (OSTs) or wide block allocations to maximize bandwidth for large-scale workloads.[52][53] For instance, Lustre configurations in clusters like NERSC's Perlmutter provide 35 PB of usable all-flash storage with aggregate bandwidths exceeding 5 TB/s for read/write operations on shared scratch directories.[34] GPFS similarly facilitates parallel I/O in scratch environments, such as /gss/scratch, by optimizing for metadata operations and concurrency in multi-node scenarios.[52] While disk-based scratch space offers terabyte- to petabyte-scale capacities suitable for handling voluminous temporary datasets, it incurs higher access latencies—typically in the milliseconds range—compared to nanosecond-scale RAM access, making it ideal for data that can tolerate brief interruptions like node reboots but requires persistence across short system events.[53][54] In practice, configurations often involve dedicated partitions, such as /scratch on Linux servers, mounted from these file systems with options like noatime to reduce metadata updates and enhance I/O performance by avoiding unnecessary access time logging on reads.[53][55] This setup is prevalent in HPC environments, where quotas (e.g., 10 TB per user on Lustre scratch) ensure efficient allocation for job-specific temporary storage.[53]
Memory-Based Scratch Space
Memory-based scratch space employs random-access memory (RAM) to create volatile temporary file systems, offering extremely high-speed access for short-lived data in computing tasks. In Linux environments, this is commonly implemented using tmpfs (temporary file storage facility), which mounts a portion of the system's RAM (and optionally swap space) as a file system, typically accessible via paths like /dev/shm or $TMPDIR in HPC jobs.[56][57] Unlike disk-based options, memory-based scratch provides nanosecond-scale latencies and high IOPS, making it suitable for I/O-intensive operations that require minimal delay, such as caching intermediate results in simulations or temporary buffering in machine learning training. However, its capacity is limited by available RAM—often a fraction of node memory, e.g., up to half the node's RAM (such as 64 GB on a 128 GB node)—and data is lost upon power cycles, node reboots, or job termination, with no persistence or redundancy.[24][21] In HPC clusters, it is usually node-local, enhancing performance for single-node workloads but requiring data transfer to shared storage for multi-node collaboration. Users must manage size limits carefully to avoid out-of-memory errors, and it is purged automatically at job end to free resources.[58]
Local vs. Global Configurations
Local scratch space consists of temporary storage resources attached directly to individual compute nodes in a high-performance computing (HPC) cluster, such as SSDs, enabling rapid access for private job data without the need for inter-node sharing.[24][2] This setup leverages the proximity of storage to the processor, minimizing latency and maximizing throughput for I/O operations, but limits visibility to the specific node, making it unsuitable for collaborative workloads.[59][60] Global scratch space, by comparison, provides a centralized repository of temporary storage accessible across all nodes in the cluster via a networked file system, often connected through high-speed interconnects like InfiniBand.[61][62] This shared architecture facilitates data exchange in distributed computing environments but incurs overhead from network traversal, which can degrade performance for latency-sensitive tasks relative to local options.[2][63] Selection between local and global configurations hinges on workload characteristics: local scratch is ideal for single-node, I/O-heavy computations where isolation and speed are paramount, whereas global scratch supports multi-node parallel applications, such as simulations requiring synchronized data access.[2][24] In practice, hybrid approaches are common, combining local storage for efficient temporary processing with global storage for data staging and interoperability across nodes.[60][63]
Management and Best Practices
Allocation and Quotas
In multi-user computing environments, particularly high-performance computing (HPC) systems, scratch space is typically allocated dynamically on an on-demand basis to support temporary data needs during job execution. Job schedulers like SLURM enable this through options such as --tmp=<size>[units], which requests a minimum amount of temporary disk space per node, with defaults in megabytes and support for suffixes like K, M, G, or T for other units.[64] This allocation occurs when a job is submitted via commands like sbatch, provisioning local or shared temporary storage automatically upon job initiation. Deallocation is equally automated, with the space released and any associated files cleared immediately after job completion to reclaim resources for subsequent users.[64] Operating system-level calls, such as those in Linux for mounting or creating temporary directories (e.g., via mktemp or filesystem mounts), can also facilitate ad-hoc allocation in non-scheduled environments, though schedulers predominate in shared systems.
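In script form, the request and the scheduler-provided directory look roughly like this. The directives are illustrative, the variable naming ($TMPDIR vs. $SLURM_TMPDIR) varies by site, and the sketch falls back to mktemp so it also runs outside a cluster:

```shell
#!/bin/sh
#SBATCH --tmp=50G      # ask SLURM for at least 50 GB of node-local temp disk
#SBATCH --ntasks=1

# Many sites export the granted space as $TMPDIR or $SLURM_TMPDIR;
# fall back to a fresh mktemp directory for portability of the sketch.
JOBTMP="${SLURM_TMPDIR:-$(mktemp -d)}"

df -h "$JOBTMP"                       # sanity-check capacity before heavy I/O
dd if=/dev/zero of="$JOBTMP/buffer.bin" bs=1M count=4 2>/dev/null
wc -c "$JOBTMP/buffer.bin"            # the 4 MiB scratch buffer just written
```

Checking capacity before writing matters because, as noted above, the scheduler guarantees only what was requested; anything beyond that can fail mid-job.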
Quota mechanisms are essential for preventing resource monopolization in shared scratch spaces, imposing limits on storage usage per user or group. In HPC clusters, common configurations set boundaries like 1 TB per user on shared scratch filesystems, enforced through integrated tools such as the Linux quota(8) utility, which monitors and restricts disk usage on filesystems like ext4 or Lustre. Custom scripts or scheduler extensions often extend this to group-level quotas, ensuring equitable distribution across projects; for instance, Yale's HPC environment applies byte and file count limits to its 60-day scratch tier using similar enforcement.[65] These quotas are typically soft (with grace periods) or hard (immediate blocking), configurable at mount time with options like usrquota or grpquota.
To promote fair usage, many systems implement time-based purging policies alongside quotas, automatically removing inactive files to maintain availability. Files unmodified or unaccessed for periods like 30 to 60 days are deleted, as seen in policies from the Alliance for Compute-intensive Research in Canada, where 60-day thresholds trigger periodic scans and purges on scratch volumes.[19] Monitoring tools such as the df command or custom dashboards (e.g., integrated with SLURM's sinfo for node storage visibility) allow administrators and users to track utilization and anticipate purges. These policies balance immediate job needs with long-term system health, often notifying users via email when approaching limits.
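Such purge sweeps are commonly a scheduled find pass over the scratch mount. A sandboxed sketch, assuming GNU touch and find; the 60-day threshold and paths are illustrative:

```shell
# Simulate a purge in a throwaway sandbox rather than a real /scratch.
SANDBOX=$(mktemp -d)
touch "$SANDBOX/fresh.dat"
touch -a -d '90 days ago' "$SANDBOX/stale.dat"   # backdate the last-access time

# The sweep itself: delete regular files not accessed for more than 60 days.
# A production version would run from cron against the scratch mount.
find "$SANDBOX" -type f -atime +60 -delete

ls "$SANDBOX"
```

Only fresh.dat survives the sweep. Real deployments add safeguards the sketch omits: excluding dotfiles still in use, logging what was removed, and notifying owners before deletion.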
Exceeding allocation quotas or available space poses overcommitment risks, potentially leading to job failures or performance throttling. If requested temporary space via SLURM's --tmp option cannot be fulfilled due to node constraints, the job may queue indefinitely or fail to start, as the scheduler prioritizes guaranteed resources.[64] In quota-enforced filesystems, attempts to write beyond limits trigger errors like "Disk quota exceeded," halting operations and requiring manual cleanup or quota adjustments by administrators. Throttling can occur in overprovisioned environments, where I/O bandwidth is capped to prevent system-wide degradation, as documented in TACC's guidelines for managing scratch I/O.[37]
Data Handling and Cleanup
In scratch space systems, data typically follows a defined lifecycle to ensure efficient resource utilization. Files are created during the execution of computational jobs for storing intermediate results, such as temporary outputs from simulations or analyses, and are actively used throughout the job's runtime to support high-speed processing.[2] Upon job completion, mandatory deletion of these files is required to reclaim space, preventing accumulation that could hinder new job submissions and maintaining the transient nature of scratch storage.[66] This post-completion purge is often enforced automatically, with files subject to removal if not explicitly managed, as scratch space is designed solely for short-term use without persistence guarantees.[5] Automated tools play a crucial role in managing data removal across scratch environments, particularly in high-performance computing (HPC) clusters. These include cron jobs or background daemons that perform periodic scans of directories, identifying and deleting files based on criteria like age or inactivity thresholds. For instance, in Linux-based systems, tools like tmpreaper can be configured to remove files unaccessed for a specified period, such as 24 hours, thereby automating cleanup in scratch directories analogous to /tmp management.[67] More advanced implementations, such as the automated scratch storage cleanup tool developed for heterogeneous HPC file systems like GPFS and Lustre, operate without human intervention, scanning and purging old data at regular intervals to sustain available capacity.[68] In practice, many HPC centers set policies where files exceeding a lifespan—often 30 to 60 days based on creation time (crtime)—are systematically deleted to enforce space turnover.[69][70] Users bear significant responsibility for proactive data handling to mitigate risks associated with scratch space's non-persistent design. 
Best practices recommend incorporating explicit cleanup commands into job scripts, such as rm -rf to remove temporary directories and files immediately after use, ensuring no remnants persist beyond necessity.[71] HPC documentation universally warns of inevitable data loss due to automated purges and lack of backups, urging users to treat scratch as ephemeral and to avoid storing irreplaceable data there.[72][73]
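A crash-safe variant of that advice registers the removal with a shell trap, so the temporaries vanish on any exit path, not just on clean completion. A minimal sketch (the job body runs in a subshell here so the effect is observable; names are illustrative):

```shell
# Pick a scratch path, then run the job body in a subshell whose EXIT
# trap removes it whether the body completes, errors, or is signaled.
JOBTMP=$(mktemp -d -u "${TMPDIR:-/tmp}/job.XXXXXX")   # name only, not yet created
(
    mkdir -p "$JOBTMP"
    trap 'rm -rf "$JOBTMP"' EXIT HUP INT TERM
    printf 'partial output\n' > "$JOBTMP/stage1.out"
    # ... main computation would run here ...
)                                   # subshell exit fires the trap
[ -e "$JOBTMP" ] || echo "scratch cleaned up"
```

Because the trap fires even when the body aborts, no half-written directory is left behind for the purge daemon, and the job never counts against quota after it ends.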
Error handling in scratch space is inherently limited, with recovery options minimal owing to the absence of versioning or redundancy. Critical intermediate results should be backed up to permanent storage tiers, such as archival systems, during the job lifecycle to prevent total loss from unexpected failures or purges.[74] Quotas can aid enforcement by alerting users to impending space constraints, prompting timely cleanup.[70]
Performance Optimization
Performance optimization in scratch space focuses on tuning I/O operations, monitoring resource utilization, adopting efficient data handling practices, and evaluating system efficacy through benchmarking to support high-throughput computational workloads in high-performance computing (HPC) environments.[75] I/O tuning enhances scratch space throughput by configuring parallel filesystems, such as Lustre, with appropriate striping parameters to distribute data across multiple object storage targets (OSTs). Increasing the stripe count, for instance from a default of 1 to 16, can yield up to a 4x improvement in write bandwidth, from approximately 1.3 GiB/s to 5.6 GiB/s, by parallelizing access and reducing contention.[75] For SSD-based scratch spaces, aligning writes to filesystem stripe boundaries, typically 1 MiB, minimizes performance penalties from unaligned accesses, which can otherwise reduce throughput by about 20% due to inefficient server spanning.[75] Monitoring tools integrated into HPC clusters enable real-time and historical tracking of scratch space usage to identify I/O bottlenecks. 
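The striping parameters described above are applied per-directory with the lfs utility on Lustre clients, and new files inherit the directory's layout. A sketch, guarded so it degrades to a notice on machines without Lustre; the stripe count, stripe size, and paths are illustrative:

```shell
# Stripe large scratch outputs across 16 OSTs with a 1 MiB stripe size,
# spreading big sequential writes over many storage servers.
STRIPE_DIR="${SCRATCH:-$(mktemp -d)}/wide_io"
mkdir -p "$STRIPE_DIR"

# 'lfs' exists only on Lustre clients, so guard the call.
if command -v lfs >/dev/null 2>&1; then
    lfs setstripe -c 16 -S 1M "$STRIPE_DIR"   # -c stripe count, -S stripe size
    lfs getstripe "$STRIPE_DIR"               # inspect the resulting layout
else
    echo "lfs not found: not a Lustre client, striping skipped"
fi
```

Setting the layout on the directory once, before the job writes, is cheaper than restriping files afterward and keeps the I/O pattern consistent across all outputs in that directory.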
The sar utility from sysstat collects system-wide activity data, including disk I/O metrics, allowing analysis of bandwidth and latency trends at the job level.[76] iostat reports detailed device-level statistics, such as read/write rates and service times, to pinpoint contention in shared Lustre filesystems commonly used for scratch.[76] Ganglia provides cluster-scale visualization of storage metrics, though it aggregates data without job-specific resolution, complementing tools like TACC Stats for broader bottleneck detection.[76] Best practices for scratch space efficiency include pre-staging input data from persistent storage like WORK to SCRATCH directories prior to job launch, which accelerates I/O by leveraging the high-performance temporary filesystem during computation.[37] Minimizing the number of files, particularly avoiding one output file per process, reduces metadata overhead; instead, employ parallel I/O libraries such as HDF5 or NetCDF to consolidate data into fewer shared files, improving scalability on parallel filesystems.[37] For large temporary datasets, compression techniques like gzip or bzip2 can reduce storage footprint and I/O volume, though they should be balanced against CPU overhead in memory-constrained jobs.[74] Benchmarking scratch space performance often relies on the IOR tool, a standard MPI-based benchmark for parallel I/O, which measures bandwidth through configurable sequential read/write tests. In HPC evaluations, IOR assesses metrics like aggregate throughput, revealing that large transfer sizes (e.g., 256 MB) achieve up to 3500 MB/s on Lustre-based systems like Jaguar, compared to a mere 2 MB/s with small 1 KB blocks, guiding optimizations for scratch efficacy.[77]
Advantages and Limitations
Primary Benefits
Scratch space provides significant efficiency gains in computing workflows by enabling rapid access to temporary storage without the overhead of writing to or retrieving from permanent storage systems. This allows for faster iterations in development, analysis, and simulation tasks, as intermediate data can be generated, processed, and discarded locally on high-performance file systems. For instance, in high-performance computing (HPC) environments, the use of scratch space for I/O-intensive jobs reduces staging times by up to 85.9% compared to direct transfers to persistent storage, minimizing delays in data movement.[27] Additionally, it decreases wait times for scratch access by an average of 75.2%, accelerating overall job throughput for bursty or iterative workloads.[27] Resource optimization is another key advantage, as scratch space frees primary storage for long-term archival data while supporting transient needs without committing to persistent allocations. By designating scratch areas for temporary files—often automatically purged after a set period, such as 90 days—it prevents clutter in durable systems and accommodates variable workloads efficiently.[18] This approach reduces average scratch utilization by 6.6% per hour relative to traditional caching methods, ensuring more space remains available for active computations.[78] In terms of cost-effectiveness, scratch space leverages less expensive, high-capacity hardware for temporary use, avoiding the higher expenses associated with redundant, durable persistent arrays in HPC setups. 
Centers can provide extensive scratch storage at no additional cost to users, enabling large-scale temporary data handling without proportional increases in operational budgets for durability features.[79] This model improves resource serviceability, reducing job scheduling delays by 282% on average and indirectly lowering costs through better infrastructure utilization.[78] Scalability benefits arise from scratch space's ability to manage large intermediate datasets in big data pipelines and parallel processing, where permanent storage growth would otherwise be prohibitive. High-speed parallel file systems in scratch configurations support data-intensive computations across numerous nodes, scaling to handle petabyte-scale temporary files without bottlenecks in simultaneous access.[80] For example, in cluster environments, scratch systems are designed to perform well with massive datasets, facilitating workflows in genomics or climate modeling by providing fast, local buffering for outputs from distributed jobs.[81]
Common Challenges
One of the primary risks associated with scratch space usage is data loss due to its volatile nature, where files are not backed up and can be accidentally deleted or lost if computational jobs crash without proper cleanup mechanisms.[2][82] In high-performance computing (HPC) environments, users often misuse scratch space as pseudo-persistent storage, leading to unintended deletions during system purges or hardware failures.[2][73] In shared HPC systems, contention arises when multiple users compete for limited scratch resources, causing performance degradation through "noisy neighbor" effects where one user's excessive I/O demands slow down others.[2] Quota exhaustion exacerbates this issue, as project-based limits—such as 1 TB total per group—can halt ongoing jobs if exceeded, particularly in multi-user setups without per-user caps.[73][83] Security concerns emerge in multi-tenant HPC environments, where temporary files in scratch space may inadvertently contain sensitive data, increasing the risk of exposure through side-channel attacks or data leakage between users sharing the same infrastructure.[82] Maintenance overhead is significant, requiring regular purging of old files—often those unmodified for 60 to 90 days—to prevent disk fill-up and fragmentation, which can disrupt services and lead to widespread job failures across the system.[20][18][73]

Mitigation Strategies
To mitigate the risks associated with scratch space usage in high-performance computing (HPC) environments, backup protocols involve selectively copying critical intermediate files generated during job execution to more persistent storage locations, such as home directories or archival systems, to prevent data loss from automatic purges or hardware failures.[84][85] For instance, users can implement job scripts that periodically checkpoint key outputs to project or home storage, ensuring that only essential data is retained beyond the scratch space's lifecycle.[86] This approach is particularly vital in systems where scratch files are deleted after a fixed period, such as 30 days, without any built-in replication.[87]

Usage monitoring strategies help prevent quota exceedances by deploying tools that track storage consumption in real time and trigger alerts or automated actions. Commands like myquota or taccinfo allow users to query current usage against allocated limits across scratch directories, enabling proactive management.[88][37] Scripting can further enhance this by integrating periodic checks into workflows, such as sending email notifications when usage approaches 80% of quotas or initiating cleanup of obsolete files.[89] These practices reduce contention and downtime in shared HPC environments by maintaining efficient space utilization.[65]
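A periodic usage check of the kind described above can be sketched in a few lines of Python; the function names, the 80% threshold, and the idea of summing file sizes under the user's scratch directory are illustrative assumptions rather than any particular site's tooling, which would normally query filesystem quotas directly.

```python
import os

def scratch_usage_bytes(path):
    """Sum the sizes of all regular files under path (an approximation
    of quota usage; real sites query the file system's quota subsystem)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished mid-scan; common on busy scratch
    return total

def quota_warning(used_bytes, quota_bytes, threshold=0.8):
    """Return a warning string once usage crosses the threshold, else None."""
    frac = used_bytes / quota_bytes
    if frac >= threshold:
        return "scratch usage at {:.0%} of quota".format(frac)
    return None
```

Such a check could run from a cron job or a batch-job epilogue, with the returned warning forwarded by the site's notification mechanism of choice.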
Design best practices in job scripting emphasize minimizing scratch space demands through techniques like in-situ processing, where data analysis occurs directly on generated outputs without writing large intermediate files to disk. This method operates within constrained memory to avoid excessive I/O, as demonstrated in workflows combining in-situ computation with limited scratch allocation for extreme-scale simulations. Additionally, for sensitive temporary data, encryption can be applied via tools like gpg or filesystem-level policies to protect against unauthorized access during processing, ensuring compliance in regulated research domains.[3]
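The in-situ idea—reducing each chunk of output as it is produced instead of writing intermediate files to scratch—can be illustrated with a minimal Python sketch; simulate_steps is a hypothetical stand-in for a real solver, and in_situ_mean is the streaming reduction.

```python
def simulate_steps(n_steps, n_cells):
    """Hypothetical stand-in for a simulation: yields one timestep at a time."""
    for step in range(n_steps):
        yield [(step * c) % 7 for c in range(n_cells)]

def in_situ_mean(steps):
    """Reduce each timestep as it arrives; no intermediate file ever
    touches scratch, only a constant amount of memory is held."""
    total, count = 0, 0
    for data in steps:
        total += sum(data)
        count += len(data)
    return total / count
```

The design point is that the reduction consumes each timestep before the next is generated, so scratch demand stays constant regardless of how many steps the simulation runs.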
At the system level, automated tiering solutions dynamically relocate data between scratch and permanent storage based on access patterns or age, optimizing retention without manual intervention. Systems like Data Jockey automate this for multi-tiered HPC setups by monitoring file metadata and migrating inactive data to archival tiers, thereby extending effective storage capacity.[90][91] Such migrations, often policy-driven, ensure that hot data remains on fast scratch media while cold data is offloaded, reducing the administrative burden on users.[2]
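An age-based migration policy of this kind can be sketched as follows; this is a simplified illustration of the general technique, not the mechanism of Data Jockey or any specific product, and the 90-day cutoff and directory layout are assumptions.

```python
import os
import shutil
import time

def migrate_cold_files(scratch_dir, archive_dir, max_age_days):
    """Move files whose last modification is older than max_age_days
    from the scratch tier to the archive tier, preserving relative paths."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.lstat(src).st_mtime < cutoff:
                rel = os.path.relpath(src, scratch_dir)
                dst = os.path.join(archive_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
                moved.append(rel)
    return moved
```

Production tiering systems additionally consult access (not just modification) times, throttle migration bandwidth, and record migrations so that cold data can be recalled transparently.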