
External sorting

External sorting refers to the process of sorting data sets that are too large to reside entirely in main memory, utilizing external storage devices such as magnetic tapes or hard disks to manage the overflow. Unlike internal sorting algorithms, which operate solely within main memory (RAM), external sorting algorithms are optimized to minimize the costly input/output (I/O) operations between main memory and secondary storage, as disk access times significantly dominate computational costs in such scenarios. The primary cost measure for algorithms in this setting is the number of disk block transfers, often modeled under the external memory model, where data is transferred in blocks of size B, internal memory has capacity M (where M > B), and the total data size is N.

The canonical algorithm for external sorting is the external merge sort, which proceeds in two main phases: initial run formation and multi-way merging. In the run formation phase (often called Pass 0), the input file is divided into smaller subfiles or "runs" that fit into available memory; each run is sorted using an internal sorting algorithm like quicksort or heapsort and written back to disk as a sorted file. The merging phase then iteratively combines these sorted runs using a k-way merge, where k is determined by the memory buffer size (typically k ≈ M/B), producing progressively larger sorted runs until the entire dataset is sorted. This approach achieves an optimal I/O complexity of O((N/B) log_{M/B} (N/B)), which is asymptotically superior to naive adaptations of internal sorts for large N.

External sorting plays a foundational role in database management systems (DBMS), where it is employed for query processing, index construction, and join operations on massive datasets. Optimizations such as replacement selection during run formation can increase average run lengths, reducing the number of merge passes, while techniques like double buffering and graceful degradation handle varying buffer allocations. Variants including cascade and polyphase merging have been developed to address specific hardware constraints, such as tape drives in early systems, but merge-based methods remain dominant due to their balance of simplicity and performance.

Background

Definition and Motivation

External sorting encompasses a class of algorithms engineered to organize datasets that surpass the limits of primary memory, leveraging secondary storage such as magnetic tapes or hard disk drives to manage input and output operations. These methods emerged as essential tools for processing voluminous records that cannot reside entirely in random-access memory (RAM), thereby bridging the gap between computational speed and storage constraints. The historical roots of external sorting trace back to the 1950s, when magnetic tape units became standard for data storage in early computers such as the IBM 701, enabling the sorting of large payroll and business datasets through multi-tape merge processes. Donald Knuth provided a seminal treatment in his 1973 treatise, formalizing external merge techniques and replacement selection for tape-based systems, which laid the groundwork for subsequent developments in disk-oriented environments. This evolution reflects the shift from tape-driven batch processing in the mid-20th century to modern disk-based systems supporting relational databases and large-scale analytics.

The motivation for external sorting stems from the exigencies of large-scale data management in domains like database systems, file systems, and search engines, where datasets routinely exceed available memory; global data volume is projected to reach 181 zettabytes by 2025. Internal sorting algorithms falter at such scales due to memory exhaustion, rendering external approaches indispensable for efficient processing in resource-constrained environments. Key challenges include the disparity in access times, where external I/O latencies can be orders of magnitude slower than CPU operations, thus demanding strategies to curtail disk seeks and optimize transfer volumes. For example, sorting a 1-terabyte file on a server equipped with 64 gigabytes of RAM exemplifies the need for external methods to partition and merge the data without overwhelming main memory.

Comparison to Internal Sorting

Internal sorting algorithms, such as heapsort and mergesort, process entire datasets that fit within main memory (RAM), leveraging fast random access to achieve optimal time complexities of O(n log n) for both average and worst cases in comparison-based methods, with I/O costs being negligible due to the absence of secondary storage involvement. These algorithms prioritize computational efficiency and simplicity, as data elements can be freely rearranged without the overhead of disk transfers. In contrast, external sorting algorithms address datasets exceeding memory capacity by emphasizing I/O minimization as the primary performance bottleneck, since disk access latencies are significantly higher, often by factors of 10^5 to 10^6, than in-memory operations. While internal sorting assumes uniformly low-cost access patterns, external methods must account for sequential I/O patterns and buffering strategies to reduce seek times and transfers, potentially sacrificing some CPU efficiency for overall throughput gains. This shift in focus leads to more complex implementations in external sorting, where the goal is not just minimizing comparisons but optimizing data movement between tiers of the memory hierarchy.

Internal sorting suits applications with datasets smaller than available memory, providing rapid execution and minimal hardware dependencies beyond memory allocation, whereas external sorting is essential for massive-scale processing, introducing trade-offs such as slower execution and heightened sensitivity to hardware characteristics like disk speed and block sizes. For instance, empirical comparisons in virtual memory environments show internal quicksort outperforming external mergesort only for files under 1000 records, with external methods gaining substantial advantages as data size increases due to superior I/O overlap. Historically, internal sorting prevailed in early computing due to modest data volumes that readily fit in primary memory, but the post-2000s data explosion, driven by sources such as the web and sensor networks and projected to reach 181 zettabytes of global data by 2025, has rendered external sorting indispensable, as memory capacities fail to scale proportionally. Seminal analyses, such as those in Knuth's work, underscore this evolution by contrasting the flexibility of internal methods with the stringent I/O constraints of external ones, marking a transition from memory-bound to storage-bound paradigms in data processing.

Memory and I/O Models

External Memory Model

The external memory model provides an abstract framework for analyzing algorithms that process data too large to fit entirely in main memory, emphasizing the cost of data transfers between internal and external storage. Introduced by Aggarwal and Vitter, this model abstracts the memory hierarchy into two levels: a fast internal memory of size M elements and a slower external memory holding N elements, where N \gg M, with data transferred in blocks of B elements per I/O operation. The model assumes that computation within internal memory is free in terms of time, while the primary cost arises from I/O operations, making it suitable for evaluating external sorting efficiency.

Key assumptions in the model highlight the differences between sequential and random I/O accesses. Sequential I/O, which transfers contiguous blocks, is significantly cheaper due to reduced overhead from disk seeks and rotational latency, often by a factor of 100 or more compared to random accesses. External storage is modeled as a linear array, with transfers occurring in fixed-size blocks to mimic real constraints like disk sectors or records, ensuring that algorithms optimize for block-aligned operations to minimize I/O volume.

Variants of the model account for different storage technologies. The disk access model, central to Aggarwal and Vitter's work, permits access to any block but incorporates latency penalties for non-sequential reads, reflecting modern hard disk drives with multiple platters allowing parallel transfers. In contrast, the tape model, prevalent in early systems, restricts access to sequential reads and writes only, without random seeks, as tapes unwind linearly and rewinding incurs substantial time costs; this variant influenced initial external sorting designs before disk dominance.

These model components and assumptions directly shape algorithm design, particularly in external sorting. The ratio M/B determines the number of available buffer slots in internal memory, enabling strategies like buffering multiple input runs to overlap reads and writes, thus reducing total I/O passes. Similarly, the large number of external blocks N/B necessitates multi-pass approaches, where each pass scans the data, with the number of passes determined by how effectively internal memory partitions the workload to balance access patterns.
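
A minimal simulation can make the model's accounting concrete. The following Python sketch is an illustrative toy rather than something drawn from the cited sources; the class and parameter names are invented. It charges one I/O per block of B elements and shows that a sequential scan of N elements costs N/B transfers, exactly as the model prescribes.

# Toy two-level memory model that counts block transfers, illustrating why
# algorithms are charged per block of B elements rather than per element.
# Parameter names M, B, N mirror the text; the class name is an assumption.

class ExternalMemorySim:
    def __init__(self, data, block_size):
        self.data = data          # contents of "external memory"
        self.B = block_size       # elements per block transfer
        self.io_count = 0         # number of block I/Os performed

    def read_block(self, block_index):
        """Transfer one block of B elements into internal memory (one I/O)."""
        self.io_count += 1
        start = block_index * self.B
        return self.data[start:start + self.B]

    def scan(self):
        """Sequential scan: touches every block exactly once, so ceil(N/B) I/Os."""
        n_blocks = -(-len(self.data) // self.B)   # ceil(N / B)
        for b in range(n_blocks):
            _ = self.read_block(b)
        return self.io_count

if __name__ == "__main__":
    sim = ExternalMemorySim(list(range(10_000)), block_size=100)
    print("I/Os for one full scan:", sim.scan())   # 10000 / 100 = 100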

I/O Cost Measures

In the external memory model, the primary metric for evaluating the efficiency of external sorting algorithms is the number of I/O operations, which counts the reads and writes of fixed-size blocks between internal memory and secondary storage. This measure captures the dominant bottleneck in processing large datasets that exceed available main memory, as each I/O transfers B records at a time. A basic scanning operation, such as reading or writing the entire dataset once, requires O(N/B) I/O operations, where N denotes the total number of records. For sorting, the established upper bound is O\left( \frac{N}{B} \log_{M/B} \frac{N}{B} \right) I/O operations, where M is the size of internal memory; this bound arises from multiway merge strategies and is tight under standard assumptions. The logarithmic term reflects the number of merging phases needed, with each phase involving a full scan of the data.

Secondary costs include CPU time, which is roughly linear in N per pass and not the focus of I/O analysis, and seek time for disk accesses, modeled as a constant overhead per random I/O operation. Larger block sizes B reduce the total I/O count by minimizing the number of transfers but may increase latency per access due to rotational delays. Similarly, increasing internal memory M lowers the base of the logarithm in the bound, reducing overall I/Os, though diminishing returns apply as M approaches N. Buffering introduces trade-offs by allocating portions of M to stage data and overlap I/O with computation, potentially cutting effective I/Os through techniques like double buffering in merging, but at the expense of reduced space for active sorting runs. Optimal buffer sizing balances these factors, as excessive buffering can underutilize memory for algorithmic progress while insufficient buffering amplifies I/O volume.
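
The effect of B and M on these bounds can be illustrated numerically. The Python sketch below evaluates the scan cost N/B and the sorting bound (N/B) log_{M/B}(N/B) for a hypothetical 1-terabyte file; the parameter values are assumptions chosen purely for illustration, not measurements.

import math

# Evaluate the scan cost N/B and the sorting bound (N/B) * log_{M/B}(N/B)
# for illustrative parameters: a 1 TB file of 100-byte records.

def io_costs(N, M, B):
    blocks = math.ceil(N / B)                    # N/B: one full scan
    fan = M // B                                 # M/B: log base / merge fan-in
    sort_ios = blocks * math.ceil(math.log(blocks, fan))
    return blocks, sort_ios

if __name__ == "__main__":
    N = 10**10          # records (1 TB of 100-byte records)
    M = 10**8           # records that fit in roughly 10 GB of memory
    B = 10**4           # records per block (roughly 1 MB blocks)
    scan, sort_bound = io_costs(N, M, B)
    print(f"scan I/Os:  {scan:,}")        # 1,000,000
    print(f"sort bound: {sort_bound:,}")  # 2,000,000 (two logical passes)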

Merge-Based Algorithms

External Merge Sort Procedure

The external merge sort algorithm divides the sorting process into two main phases: run formation and multiway merging. In the first phase, the input file, assumed to contain N elements divided into n = N/B disk pages where B is the number of elements per page, is processed to create initial sorted runs. Each run is formed by reading approximately M elements (where M is the available main memory in elements, with M < N) into memory, sorting them using an internal sorting algorithm such as quicksort or heapsort, and writing the sorted run back to disk. This process repeats until the entire input is consumed, yielding approximately N/M initial runs, with the last run potentially shorter. An alternative to full internal sorting for run formation is replacement selection, which exploits any existing order in the input to produce longer initial runs on average by selecting the next output element as the smallest that is at least as large as the previously output element.

In the second phase, the initial runs are iteratively merged using a k-way merge, where k is chosen as approximately (M/B) - 1 to fully utilize available memory (with k input buffers of size B each and one output buffer of size B). Each merge pass reads k runs into the input buffers, merges them by repeatedly selecting the smallest element from the buffer heads using a min-heap or tournament method, and writes the merged output as a new run to disk. If the number of runs exceeds k, multiple passes are required, each reducing the number of runs by a factor of k, until only one sorted run remains; the total number of passes is roughly 1 + log_k (N/M). This procedure assumes the external memory model, where I/O costs dominate due to limited memory and block-based disk access.

The following pseudocode outlines the basic external merge sort procedure.

Initial Run Formation:
for i = 1 to ceil(N/M) do
    Read min(M, remaining elements) from input file into memory
    Sort the loaded elements internally (at most M)
    Write the sorted run to output file as run i
end
Set current_runs = ceil(N/M)
Multiway Merge Passes:
while current_runs > 1 do
    k = min(floor((M - B)/B), current_runs)  // Input runs per merge group; one buffer of size B is reserved for output
    for j = 1 to ceil(current_runs / k) do
        Allocate k input buffers and 1 output buffer
        Read first B elements of each of k runs into input buffers
        while any input buffer not empty do
            Find smallest element among buffer heads (e.g., via heap)
            Write it to output buffer
            Advance the corresponding input buffer
            If an input buffer empties, refill from its run
            If output buffer full, write to disk and reset
        end
        Write any remaining output buffer to disk as new run
    end
    current_runs = ceil(current_runs / k)
end
This pseudocode assumes sequential file access and does not handle end-of-run conditions in detail. For example, consider sorting a file with 1,000 elements divided into 10 initial runs of 100 elements each, using M = 200 elements of memory and B = 10 elements per disk block. In run formation, each of the 10 runs is created by loading 100 elements (less than M), sorting them internally, and writing the run to disk, requiring 10 read-write cycles. For merging, k ≈ (200/10) - 1 = 19, but since only 10 runs exist, a single k-way merge pass (with k = 10) uses 10 input buffers and 1 output buffer to merge all runs into one sorted file, selecting the global minimum 1,000 times and performing I/O only when buffers fill or empty.
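
For readers who prefer working code to pseudocode, the following Python sketch implements the same two-phase procedure under simplifying assumptions: the input is a text file with one integer key per line, run formation loads at most mem_limit records at a time, and all runs are merged in a single k-way pass via a min-heap (very large run counts would require the multi-pass loop from the pseudocode). Function and file names are illustrative.

# Minimal runnable sketch of two-phase external merge sort (assumed text-file
# format: one integer per line). Not a production design.

import heapq
import os
import tempfile

def make_runs(input_path, mem_limit, tmp_dir):
    """Pass 0: cut the input into sorted runs of at most mem_limit records."""
    run_paths = []
    with open(input_path) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(mem_limit), f)]
            if not chunk:
                break
            chunk.sort()                      # internal sort of one memory load
            path = os.path.join(tmp_dir, f"run_{len(run_paths)}.txt")
            with open(path, "w") as out:
                out.writelines(f"{x}\n" for x in chunk)
            run_paths.append(path)
    return run_paths

def merge_runs(run_paths, output_path):
    """Single k-way merge of all runs using a min-heap over the run heads."""
    files = [open(p) for p in run_paths]
    iters = (map(int, f) for f in files)
    with open(output_path, "w") as out:
        for x in heapq.merge(*iters):         # streams: one buffered head per run
            out.write(f"{x}\n")
    for f in files:
        f.close()

def external_sort(input_path, output_path, mem_limit=100_000):
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs = make_runs(input_path, mem_limit, tmp_dir)
        merge_runs(runs, output_path)

# Usage: external_sort("input.txt", "sorted.txt", mem_limit=100_000)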

Optimizations for Merge Sort

Replacement selection is a key optimization for the initial run formation phase of external merge sort, where records are continuously read into a buffer of size M and the smallest record whose key is at least as large as the last record written is selected and appended to the output run, with the buffer maintained as a priority queue (selection tree) to facilitate efficient selection. This approach, described by Goetz in the early 1960s, produces longer initial runs than simple load-sort-store methods by overlapping input, selection, and output operations, achieving an average run length approximately twice the buffer size for randomly distributed data.

Bottom-up merging variants, such as polyphase or balanced merging, introduce additional passes to balance run lengths and minimize the total number of merging phases, particularly beneficial for sequential storage devices like tapes where rewinding and repositioning are costly. These methods distribute initial runs unevenly across multiple tapes during the first pass and merge them in a way that equalizes run sizes over subsequent passes, reducing the overall I/O volume by avoiding excessive tape traversals. Multiway merging extends the standard two-way merge by combining k sorted runs simultaneously, where k is bounded by the memory limit (k ≈ M/B - 1), employing a min-heap (priority queue) to track the smallest unmerged element from each run, allowing efficient selection with O(log k) time per extraction. This heap-based approach reduces the number of merging passes from O(log (N/M)) to O(log_k (N/M)), thereby decreasing total I/O operations, especially when buffer space permits larger k values.

For sequential devices like tape drives, adaptations such as odd-even tape merging in balanced merge variants enable efficient distribution and merging without excessive rewinding, minimizing head movements by processing runs on odd- and even-numbered tapes alternately during merges. This technique suits tape-based systems by facilitating balanced distribution and merging in a linear-access manner, contrasting with disk-oriented random I/O optimizations. In practice, these optimizations significantly reduce I/O costs compared to basic external merge sort, with replacement selection alone roughly doubling run lengths and multiway merging further compressing pass counts, as demonstrated in cache-conscious implementations on modern storage.
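
The Python sketch below illustrates the replacement-selection idea under simplifying assumptions: keys are held in an in-memory iterable and runs are returned as lists rather than written to disk, and the function name and parameters are invented for the example. Tagging each buffered key with the run it can still join lets a single heap produce runs that, on random input, average roughly twice the buffer capacity.

# Sketch of replacement selection for run formation with a buffer of
# `capacity` records; keys that cannot extend the current run are deferred
# to the next run via a (run_number, key) tag in the heap.

import heapq

def replacement_selection(records, capacity):
    it = iter(records)
    heap = [(0, key)
            for key in (next(it, None) for _ in range(capacity))
            if key is not None]
    heapq.heapify(heap)
    runs, current = [], []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no > len(runs):                 # buffer holds only next-run keys
            runs.append(current)
            current = []
        current.append(key)
        nxt = next(it, None)
        if nxt is not None:
            # A new key extends the current run only if it is not smaller
            # than the key just written; otherwise it waits for the next run.
            heapq.heappush(heap, (run_no if nxt >= key else run_no + 1, nxt))
    runs.append(current)
    return [run for run in runs if run]

# Usage: replacement_selection([5, 1, 9, 2, 8, 3, 7], capacity=3)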

Distribution-Based Algorithms

External Radix Sort

External radix sort is a distribution-based sorting algorithm adapted for external memory environments, where the dataset exceeds main memory capacity and must be processed using secondary storage. Unlike comparison-based methods, it partitions records into buckets based on individual digits of the keys, starting from the most significant digit (MSD) in a recursive manner. This approach is particularly effective for fixed-length keys, such as integers or strings of uniform length, as it enables linear-time processing per digit position with minimal I/O overhead.

The procedure follows the MSD paradigm, modified for external storage. In each pass, the input file is read sequentially, and records are distributed into buckets corresponding to the possible values of the current digit (e.g., 0-255 for base-256). Only non-empty buckets are written to separate external files or disk partitions, and the process recurses on each bucket for the next digit position until the keys are fully sorted. Small buckets that fit in main memory can be sorted internally using an in-memory radix sort or another efficient method to reduce disk accesses. This recursive distribution ensures that the sorting tree has depth equal to the number of digit positions, with each level requiring a single full scan of the data.

Bucket management in external radix sort relies on creating temporary files or allocated disk areas for each bucket during distribution. To optimize I/O, buffers whose total size is approximately the main memory capacity are used to accumulate records before writing full blocks to disk, ensuring sequential access patterns. For multiple disks, buckets can be striped or cycled across devices to balance load and improve parallelism. If a bucket is small enough to fit in memory after distribution, it is sorted internally and collected back without further external passes, avoiding unnecessary disk operations. Stable partitioning is maintained by preserving the relative order of records with equal digits, typically achieved through counting or linked-list mechanisms adapted for external use.

A key advantage of external radix sort over merge-based algorithms is the reduced number of passes for fixed-length keys, as the number of digit positions is constant and independent of data size. In ideal cases with sufficient memory for buffers, it achieves O(N) total I/O volume across all passes, making it highly efficient for large uniform-key datasets like integer arrays. For instance, sorting a collection of 32-bit integers using base 256 requires at most 4 passes, one per byte, each scanning the entire dataset once for distribution. This contrasts with merge sort's logarithmic number of passes, which grows with N.

Despite its efficiencies, external radix sort has limitations, particularly with variable-length keys, where the MSD recursion depth varies, leading to unbalanced buckets and potentially more I/O than in fixed-length cases. It also requires stable partitioning to ensure correctness across digits, which adds complexity in external implementations due to the need for auxiliary structures or careful buffering to maintain order during disk writes. Additionally, performance can degrade if the radix base is poorly chosen relative to memory size, as too many buckets may exceed available disk space or memory.
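
As an illustration of the distribution step, the Python sketch below performs one MSD pass over a binary file of unsigned 32-bit keys (an assumed record layout), scattering records into 256 bucket files by their most significant byte and staging writes in per-bucket buffers so that disk output stays block-sized and sequential; file names and the buffer threshold are illustrative choices.

# One MSD distribution pass of external radix sort over 32-bit unsigned keys.
# Each non-empty bucket file would then be recursed on (next byte) or sorted
# in memory once it is small enough.

import os
import struct

RECORD = struct.Struct("<I")          # one unsigned 32-bit key per record (assumption)

def distribute_msd_pass(input_path, bucket_dir, buffer_records=64 * 1024):
    buffers = [[] for _ in range(256)]            # in-memory staging per bucket
    paths = [os.path.join(bucket_dir, f"bucket_{b:02x}.bin") for b in range(256)]

    def flush(b):
        if buffers[b]:
            with open(paths[b], "ab") as out:
                out.write(b"".join(buffers[b]))
            buffers[b].clear()

    with open(input_path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:          # end of file (assumes whole records)
                break
            (key,) = RECORD.unpack(chunk)
            bucket = key >> 24                    # most significant byte
            buffers[bucket].append(chunk)
            if len(buffers[bucket]) >= buffer_records:
                flush(bucket)                     # write a full block, stay sequential
    for b in range(256):
        flush(b)
    return paths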

Other Distribution Methods

External quicksort hybrids extend the partitioning strategy of internal quicksort to external environments by using multiple external buckets to store records less than, equal to, or greater than a selected pivot, with the procedure applied recursively to each resulting subfile until the data fits in main memory for internal sorting. This method uses buffer space efficiently to reduce I/O operations during pivot selection and data redistribution, often achieving performance competitive with external merge sort in practice when disk access patterns are favorable.

Hash-based distribution scatters input records across a set of output files by applying a hash function to the sort keys, promoting balanced partitioning without relying on key comparisons; each resulting file, typically small enough for internal sorting, is then sorted in main memory, followed by a multiway merge of the sorted files to obtain the final order. Randomized variants employ multiple hash functions over successive distribution passes to mitigate skew from poor initial hashing and ensure uniform load across files.

These alternatives to radix sort prove useful when sort keys defy radix assumptions, such as fixed-length digits or a uniform base representation, particularly for non-numeric or variable-length data like strings, where comparison-based or hash partitioning avoids the need for digit-by-digit processing. For example, hash-based sorting of string keys distributes records by hashing the full key to files for initial scattering; collisions within a file are managed through subsequent internal sorting of that group before merging all sorted files, as illustrated in the sketch below.
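
The Python sketch below shows this hash-based scheme under simplifying assumptions: one string key per input line, a fixed fan-out of 16 partition files, and MD5 as the hash function (all illustrative choices). Each partition is sorted in memory and a final multiway merge produces the globally sorted output.

# Hash-based distribution sort: scatter by hash, sort each small partition
# in memory, then k-way merge the sorted partitions into one ordered file.

import hashlib
import heapq
import os

def hash_partition_sort(input_path, output_path, work_dir, partitions=16):
    part_paths = [os.path.join(work_dir, f"part_{i}.txt") for i in range(partitions)]
    outs = [open(p, "w") for p in part_paths]
    with open(input_path) as f:
        for line in f:
            key = line.rstrip("\n")
            h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
            outs[h % partitions].write(key + "\n")   # scatter by hash of the key
    for o in outs:
        o.close()

    for p in part_paths:                             # sort each small partition in memory
        with open(p) as f:
            keys = sorted(line.rstrip("\n") for line in f)
        with open(p, "w") as f:
            f.writelines(k + "\n" for k in keys)

    files = [open(p) for p in part_paths]
    with open(output_path, "w") as out:              # multiway merge restores global order
        out.writelines(heapq.merge(*files))
    for f in files:
        f.close()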

Performance Evaluation

Complexity Analysis

The theoretical analysis of external sorting algorithms primarily focuses on their I/O complexity in the external memory model, where the dominant cost arises from transferring data between fast internal memory of size M (measured in elements) and slower external storage in blocks of size B elements, with N denoting the total number of elements to sort and assuming M \geq B. For comparison-based algorithms, a fundamental lower bound on the number of I/O operations is \Omega\left( \frac{N}{B} \log_{M/B} \frac{N}{B} \right), established by modeling the sorting process as requiring a certain number of element comparisons that necessitate data movement across the memory boundary. This bound holds under the assumption that B \log_2 (M/B) = o(\log_2 (N/B)), reflecting the information-theoretic needs of distinguishing permutations while accounting for block-level transfers.

For external merge sort, the I/O complexity achieves this lower bound up to constant factors. The initial phase creates \lceil N/M \rceil sorted runs by loading M elements into internal memory, sorting them (at CPU cost O(M \log M) per run), and writing them back, incurring 2(N/B) I/Os overall since each of the N/B blocks is read and written once. Subsequent merge phases use k-way merging, where k = \lfloor M/B \rfloor - 1 (reserving space for output buffering). The number of merge passes required is \lceil \log_k (N/M) \rceil, as each pass reduces the number of runs by a factor of approximately k. Each pass reads and writes all N/B blocks, adding 2(N/B) I/Os per pass, for a total merge cost of 2(N/B) \log_{M/B} (N/M). Combining phases yields an overall I/O complexity of 2(N/B) \log_{M/B} (N/M) + 2(N/B) = O\left( \frac{N}{B} \log_{M/B} \frac{N}{B} \right), with CPU cost O(N \log N).

Distribution-based algorithms like external radix sort offer potentially better performance for keys with bounded word size w, processing d = O(\log_w N) digits (or fewer for fixed-length keys). Each digit pass distributes elements into buckets using a counting-based partition (e.g., counting sort adapted to external memory), requiring O(N/B) I/Os to scan, partition, and write back the data. With d passes, the total I/O complexity is O(d (N/B)), which is linear in N if d is constant (e.g., for fixed-precision integers) and can outperform comparison-based methods when d \ll \log_{M/B} (N/B). The CPU cost is O(d N). More advanced variants, such as multiway distribution sorts, may add a logarithmic factor, yielding O\left( \frac{N}{B} (d + \log_{M/B} \frac{N}{B}) \right) I/Os, but the basic form remains pass-linear.

In terms of space, both merge sort and radix sort require O(M) internal memory for buffering and O(N/B) external blocks for input and output files, plus O(B) temporary space per block during transfers, totaling O(M + B + N/B); the external component dominates for large N. Asymptotically, these complexities scale favorably with growing M and B: a larger ratio M/B reduces the logarithmic factor in merge sort's I/Os (fewer passes), while a bigger B amortizes transfer overhead, improving practical scalability on modern systems with increasing memory and block sizes; however, radix sort's linearity in N makes it more robust when d remains small relative to \log_{M/B} (N/B).
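
To make the comparison tangible, the short Python sketch below evaluates the two I/O totals derived above, 2(N/B)(1 + \lceil \log_k (N/M) \rceil) for external merge sort and 2d(N/B) for a d-pass external radix sort; the parameter values are assumptions chosen purely for illustration.

import math

# Back-of-the-envelope comparison of total block I/Os for the two families,
# using the formulas derived in the text (not measured results).

def merge_sort_ios(N, M, B):
    blocks = math.ceil(N / B)
    k = M // B - 1                                   # merge fan-in
    passes = 1 + math.ceil(math.log(math.ceil(N / M), k)) if N > M else 1
    return 2 * blocks * passes

def radix_sort_ios(N, B, digits):
    return 2 * digits * math.ceil(N / B)

if __name__ == "__main__":
    N, M, B = 10**9, 10**7, 10**3                    # records, memory, block size
    print("merge sort I/Os:", merge_sort_ios(N, M, B))
    print("radix sort I/Os (4 byte-wide digits):", radix_sort_ios(N, B, 4))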

Practical Implementation Factors

Hardware characteristics significantly influence the efficiency of external sorting on single machines (as of 2025). Traditional hard disk drives (HDDs) incur substantial seek times of 5–10 ms to position the read/write head, dominating latency during the many random accesses typical of merge passes. Solid-state drives (SSDs), by contrast, lack mechanical components, eliminating seek and rotational latencies while offering sustained sequential transfer rates up to 7,000–14,000 MB/s for high-end NVMe models, which can accelerate external sorts by reducing I/O bottlenecks compared to HDDs' 100–200 MB/s limits. Effective tuning aligns buffer sizes with the operating system's page size (often 4 KB) to optimize caching and avoid fragmented I/O operations that increase overhead.

Software selections and architectural decisions further shape implementation practicality. Languages with robust I/O libraries, such as Java with its NIO framework, facilitate efficient direct byte management for reading and writing large files without intermediate copying, supporting scalable external merge operations. Incorporating parallelism, particularly multi-threaded merging, leverages multi-core processors to process multiple input runs concurrently, potentially halving merge times on systems with 8 or more cores while managing contention through careful buffer partitioning. Key tuning parameters include buffer size and initial run length, which must be adjusted based on hardware characteristics. Buffer sizes should balance seek costs against sequential transfer efficiency, often aligning with disk block sizes. Run length adjustments, such as employing replacement selection during the initial pass, can extend sorted runs beyond simple memory limits, minimizing subsequent merge passes.

Real-world benchmarks highlight these factors' impact. The Unix sort command implements external merge sort with adaptive buffering and temporary file spilling, achieving effective throughputs typically in the range of 10–100 MB/s on modern HDDs for multi-gigabyte files (limited by seek times), and scaling to SSDs for up to 10x faster completion on datasets exceeding available RAM. In database systems, external sorting underpins index construction, such as bulk-loading B+ trees; benchmarks indicate SSD-based sorts complete significantly faster than on HDDs, often by an order of magnitude for large relations such as 100 GB datasets. While core implementations target single machines, modern extensions like MapReduce adapt external sorting principles for distributed environments, partitioning data across nodes for petabyte-scale processing while retaining single-machine efficiency for smaller workloads via local merges.
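
As a small illustration of the buffer-alignment tuning mentioned above, the Python sketch below picks an I/O buffer size that is a whole multiple of the operating system's page size before opening a file for sequential reading; the 4 MB target and the function names are illustrative assumptions, and on systems without a queryable page size the code falls back to Python's default buffer size.

# Choose a read buffer size aligned to the OS page size and use it when
# opening a file for sequential scanning during run formation or merging.

import io
import os

def aligned_buffer_size(target_bytes=4 * 1024 * 1024):
    try:
        page = os.sysconf("SC_PAGE_SIZE")            # POSIX only
    except (AttributeError, ValueError, OSError):
        page = io.DEFAULT_BUFFER_SIZE
    return max(page, (target_bytes // page) * page)  # round down to a page multiple

def open_for_sequential_read(path):
    return open(path, "rb", buffering=aligned_buffer_size())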
