Unrolled linked list
An unrolled linked list is a linear data structure that extends the traditional linked list by storing multiple elements—typically in a fixed-size array—within each node, rather than a single element per node.[1] The concept was introduced in 1994 by Zhong Shao, John H. Reppy, and Andrew W. Appel in the context of functional programming optimizations.[1] This design reduces the overhead of pointers and links, making it a hybrid between arrays and linked lists: it aims to balance the dynamic sizing of linked lists with the contiguous memory access benefits of arrays.[2][1]
The structure typically employs doubly linked nodes, where each node contains an array of elements and metadata indicating the current occupancy to allow for efficient insertions and deletions without excessive rebalancing.[2] For instance, when inserting an element, it is added to the array in the current node if space is available; otherwise, a new node is created and elements may be split across nodes to maintain balance.[2] This approach provides improved performance for sequential operations like traversal, insertion, and erasure due to better cache locality, compared to standard linked lists.[1]
Key advantages of unrolled linked lists include enhanced cache efficiency due to sequential element storage, which minimizes cache misses compared to traditional linked lists where elements are scattered in memory.[2] They also offer better memory utilization by reducing the number of pointer fields, potentially halving space overhead for lists with small elements.[1] These benefits make unrolled linked lists particularly suitable for applications involving frequent sequential reads or writes, such as in database indexing or functional language implementations, though they may incur slightly higher costs for random access compared to pure arrays.[1]
Fundamentals
Definition
An unrolled linked list is a variant of a singly or doubly linked list in which each node contains a fixed-size array capable of holding multiple data elements, typically 4 to 16 items, rather than a single element. This structure groups consecutive elements into blocks within nodes, connected by pointers to form the overall list.[3][4]
The primary purpose of an unrolled linked list is to merge the dynamic resizing and insertion flexibility of traditional linked lists with the sequential access efficiency of arrays, thereby reducing the overhead associated with numerous pointers and enhancing locality of reference. By minimizing the frequency of pointer traversals, it addresses inefficiencies in memory access patterns common in standard linked lists.[3]
This design yields key benefits, particularly improved performance during sequential traversals in environments sensitive to cache hierarchies or memory constraints, where reduced pointer chasing leads to fewer cache misses and better utilization of processor caches.[5][4]
Unrolled linked lists emerged in the 1990s amid growing awareness of cache hierarchy challenges in modern processors. The concept was first described in the 1994 paper "Unrolling Lists" by Zhong Shao, John H. Reppy, and Andrew W. Appel, presented at the ACM Conference on LISP and Functional Programming, which focused on optimizing list representations for functional programming languages. Early explorations highlighted their potential for memory efficiency and access speed improvements.[3]
Node Structure
In an unrolled linked list, each node consists of a fixed-size array capable of holding multiple data elements, typically denoted as an array of size K, where K is a compile-time constant chosen to optimize memory usage and access patterns.[6] This array stores the actual elements contiguously, accompanied by a count field that indicates the number of active or valid elements currently occupying the array, allowing for efficient management of partially filled nodes.[6] Additionally, the node includes one or more pointers for linking to adjacent nodes: a next pointer for singly linked variants and both next and previous pointers for doubly linked implementations, facilitating traversal while minimizing the overhead of individual per-element pointers found in traditional linked lists.[6]
The choice of array size K is influenced by hardware characteristics, particularly the CPU cache line size, to maximize utilization and reduce cache misses during sequential access.[4] For instance, on systems with a 64-byte cache line, if elements are 4-byte integers, a value of K = 16 allows the array to fill the cache line effectively, assuming minimal overhead from metadata, thereby improving spatial locality.[5] In practice, smaller values like K = 8 are used in concurrent implementations to balance concurrency control overhead with cache efficiency.[4]
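This cache-line arithmetic can be sketched with a small helper (an illustrative calculation, not from the cited implementations):

```python
CACHE_LINE_BYTES = 64  # common line size on current x86 and ARM cores

def capacity_for(elem_bytes, metadata_bytes=0):
    """Largest K whose element array (plus any in-line metadata) fits one cache line."""
    return (CACHE_LINE_BYTES - metadata_bytes) // elem_bytes

# 4-byte integers with no metadata sharing the line: K = 16, as in the example above.
```

In practice the count field and pointers may share the line with the array, which argues for a slightly smaller K than this upper bound.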
Partial nodes are handled through the count field, which tracks the exact number of occupied slots in the array, enabling nodes to operate below full capacity without wasting space or requiring immediate reorganization.[6] This design supports dynamic adjustments, such as during list modifications, where the count ensures only valid elements are accessed, preventing errors from unused array portions.
The memory layout of an unrolled linked list node promotes contiguous storage of elements within the array, which contrasts with the scattered allocation in standard linked lists and reduces indirection overhead by allowing multiple elements to be fetched in a single memory access.[6] Typically, the structure might be represented in pseudocode as:
struct UnrolledNode {
    Element array[K];    // Fixed-size array of elements
    int count;           // Number of active elements (0 <= count <= K)
    UnrolledNode* next;  // Pointer to next node
    // Optionally: UnrolledNode* prev; for doubly linked
};
This arrangement clusters related data, enhancing cache performance by aligning with how processors load and process memory blocks.[4]
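The same layout can be rendered as a minimal Python sketch (K = 4 is an arbitrary illustrative capacity; field names follow the struct above):

```python
K = 4  # node capacity; in practice tuned to the cache line size

class UnrolledNode:
    """One block of the list: a fixed-size slot array plus an occupancy count."""
    __slots__ = ("array", "count", "next", "prev")

    def __init__(self):
        self.array = [None] * K  # fixed-size element storage
        self.count = 0           # number of valid slots (0 <= count <= K)
        self.next = None         # link to the following node
        self.prev = None         # link for doubly linked variants
```

Using `__slots__` keeps the per-node footprint fixed, mirroring the fixed struct layout of a C implementation.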
Operations
Insertion
Insertion into an unrolled linked list begins by locating the appropriate node and position within its fixed-size array to accommodate the new element. The process typically involves a linear traversal through the nodes, examining each node's count field to determine if the insertion index falls within its range, which exploits the reduced number of nodes compared to a traditional linked list. Once the target node is found, if the node's array has available space (i.e., the current count is less than the maximum capacity K), the new element is inserted at the specified offset, and the count is incremented; this step may require shifting subsequent elements within the array to maintain order.[7]
When the target node is full, splitting is necessary to create space. The algorithm creates a new node and redistributes the elements, typically moving approximately half (e.g., \lfloor K/2 \rfloor) to the new node while inserting the new element into the appropriate half, then updates the next and previous pointers to link the split nodes correctly. This redistribution ensures that both resulting nodes are roughly balanced, preventing excessive fragmentation. In implementations like the SEList, an alternative spreading strategy may redistribute elements across up to K+1 adjacent nodes if no immediate space is found within a bounded search radius, amortizing the cost over multiple operations.[7]
For insertions at specific positions, such as the head or tail, special handling applies. At the head, the first node is checked for space; if available, elements are shifted right to insert at the beginning of its array; otherwise, a new node is created and linked as the new head. Tail insertions similarly append to the last node's array or create a new trailing node. The following pseudocode illustrates the linear search to find the insertion position in a linked unrolled list (assuming doubly-linked for bidirectional access):
function insert(position, value):
    if head is null:
        head = create_node()
        insert_into_node(head, 0, value)
        return
    current = head
    offset = 0
    prev = null
    while current != null:
        if offset + current.count > position:
            insert_into_node(current, position - offset, value)
            return
        prev = current
        offset += current.count
        current = current.next
    // Append to tail
    if prev.count < K:
        insert_into_node(prev, prev.count, value)
    else:
        new_node = create_node()
        insert_into_node(new_node, 0, value)
        prev.next = new_node
        new_node.prev = prev
This search proceeds node-by-node until the cumulative count exceeds the target position. The insert_into_node function handles shifting elements right if needed and increments the count.[7]
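A runnable rendering of this routine, in Python for concision and singly linked for brevity: a full node is split by moving its upper half into a fresh node before inserting, one of the strategies described above (type and helper names are illustrative):

```python
K = 4  # node capacity; small here for illustration

class Node:
    def __init__(self):
        self.array = [None] * K
        self.count = 0
        self.next = None  # singly linked for brevity

def insert(head, position, value):
    """Insert value at the given list position; returns the (possibly new) head."""
    if head is None:
        head = Node()
    node, offset = head, 0
    # Walk until the target position falls inside the current node.
    while node.next is not None and offset + node.count < position:
        offset += node.count
        node = node.next
    local = min(position - offset, node.count)
    if node.count == K:  # full: split off the upper half into a new node
        new_node = Node()
        half = K // 2
        new_node.array[:K - half] = node.array[half:]
        new_node.count = K - half
        for i in range(half, K):
            node.array[i] = None
        node.count = half
        new_node.next = node.next
        node.next = new_node
        if local > half:  # insertion point lands in the new node
            node, local = new_node, local - half
    # Shift the tail of the array right and place the value.
    for i in range(node.count, local, -1):
        node.array[i] = node.array[i - 1]
    node.array[local] = value
    node.count += 1
    return head

def to_list(head):
    """Flatten the list for inspection."""
    out = []
    while head is not None:
        out.extend(head.array[:head.count])
        head = head.next
    return out
```

Sequential appends fill each node before splitting, so a freshly built list stays densely packed; interior inserts leave the two halves of a split roughly balanced.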
To maintain efficiency, an optional balancing strategy rebalances nodes post-insertion, aiming to keep them roughly half-full. This may involve merging underfilled adjacent nodes (e.g., if a node's count falls below a threshold like K/4 after related operations) by combining their arrays and removing the extra node, though such merging is often deferred or amortized to avoid frequent link updates. This approach enhances cache locality and prevents performance degradation from imbalanced node sizes.[7]
The average time complexity for insertion is O(n/K) due to the node traversal, where n is the total number of elements and K is the node capacity, as fewer nodes reduce the search overhead compared to single-element nodes.[7]
Deletion
Deletion in an unrolled linked list involves first locating the target element within a specific node and array index, typically after a traversal to find it. Once identified, the element is removed by shifting all subsequent elements in the node's array one position to the left to fill the gap, ensuring no null entries remain, and then decrementing the node's count field. This shifting operation takes O(K) time, where K is the fixed array size per node, as it may require moving up to K-1 elements.[8]
If the node's count falls below a predefined minimum threshold—commonly K/4 or K/2 to balance space efficiency and fragmentation—the node is merged with an adjacent one to prevent underutilized nodes. Merging entails appending the elements from the underfilled node to the adjacent node's array (or vice versa, depending on which has more space), updating the count in the receiving node, and adjusting the links between nodes. The emptied node is then deallocated, and pointers from neighboring nodes are updated to bypass it. This process maintains the list's integrity by preserving the sequential order of elements across nodes.[8][6]
Edge cases include deletions from partially filled nodes, where shifting may not trigger merging if the threshold is not met, and from full nodes, which always require shifting but rarely immediate merging. If deletion empties a node entirely, it is removed by relinking the previous and next nodes directly, handling the head or tail specially to avoid null pointer issues. In doubly-linked implementations, both forward and backward pointers must be updated symmetrically during merging or node removal to ensure bidirectional traversal remains valid.[8]
The following pseudocode illustrates the core deletion logic within a node, assuming the node and index have been located:
function deleteInNode(node, index):
    // Shift elements after index to the left
    for i from index to node.count - 2:
        node.array[i] = node.array[i + 1]
    node.count -= 1
    // Optional: clear the last position, though typically not needed in dense arrays
    // Check for merging if under threshold
    if node.count < MIN_THRESHOLD: // e.g., K/4
        mergeWithAdjacent(node)
The mergeWithAdjacent function would handle transferring elements, updating counts, and relinking, with O(K) cost in the worst case due to array copies. Overall, while individual deletions cost O(K + traversal time), amortized analysis over multiple operations spreads merging costs, preserving efficient performance.[8]
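A minimal runnable sketch of this deletion logic, merging with the successor node when the count falls below the K/4 threshold (Python for concision; the node type and threshold choice are illustrative):

```python
K = 4
MIN_THRESHOLD = K // 4  # merge threshold from the text (K/4)

class Node:
    def __init__(self):
        self.array = [None] * K
        self.count = 0
        self.next = None

def delete_in_node(node, index):
    """Remove node.array[index]: shift the tail left, then merge if underfilled."""
    for i in range(index, node.count - 1):
        node.array[i] = node.array[i + 1]
    node.count -= 1
    node.array[node.count] = None  # optional: clear the vacated slot
    nxt = node.next
    if node.count < MIN_THRESHOLD and nxt is not None and node.count + nxt.count <= K:
        # Merge the successor's elements into this node and unlink it.
        for i in range(nxt.count):
            node.array[node.count + i] = nxt.array[i]
        node.count += nxt.count
        node.next = nxt.next
```

The capacity guard (`node.count + nxt.count <= K`) is what keeps a merge from overflowing the receiving array; a fuller implementation would instead redistribute elements when both neighbors are too full to absorb the remainder.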
Traversal and Search
Traversal in an unrolled linked list proceeds by following the pointers between nodes while sequentially scanning the fixed-size array within each node up to its occupancy count, which enhances efficiency for sequential access patterns compared to traditional linked lists by reducing pointer indirections.[9] In a singly-linked variant, forward traversal begins at the head and continues until the tail, processing elements in order; for doubly-linked implementations, backward traversal mirrors this process using previous pointers, allowing bidirectional navigation without reversing the structure.[10]
The following pseudocode illustrates forward sequential traversal, where each element in the node's array is visited:
function traverse_forward(head):
    current = head
    while current is not null:
        for i from 0 to current.count - 1:
            process(current.array[i])
        current = current.next
This approach visits approximately n/K nodes for a list of n elements with node capacity K, minimizing overhead from pointer chasing.[9]
Search for a specific value in an unrolled linked list employs a linear scan analogous to traversal, starting from the head and examining elements in each node's array until the target is found or the end is reached; checking each node's occupancy count keeps the scan within valid slots. The time complexity is O(n), as the search may examine every element, though it follows only O(n/K) node pointers, reducing pointer overhead compared to a standard linked list.[11][10]
Unlike arrays, unrolled linked lists lack direct indexing for random access, requiring traversal to the appropriate node and offset, which takes O(\min\{i, n-i\}/K) time to reach the i-th element in balanced doubly-linked variants, making sequential or near-sequential accesses faster than in traditional lists but still linear overall.[9] Improvements for random access can involve hybrid structures, such as adding auxiliary indexing layers, though these add complexity.[12]
Iterators for unrolled linked lists typically follow standard patterns like those in languages such as C++ or Java, maintaining internal state with a pointer to the current node and an index into its array to enable incremental traversal without exposing the underlying structure.[9] This design supports operations like incrementing to the next element by advancing the array index or jumping to the next node when the array is exhausted, ensuring compatibility with generic algorithms while leveraging the unrolled format for performance.[11]
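Such an iterator pairs a node pointer with an array index, which a Python generator captures directly; `from_iterable` below is an illustrative builder, not part of any standard API:

```python
K = 4

class Node:
    def __init__(self):
        self.array = [None] * K
        self.count = 0
        self.next = None

def from_iterable(items):
    """Build an unrolled list by packing items K at a time (illustrative helper)."""
    head = tail = None
    for item in items:
        if tail is None or tail.count == K:
            node = Node()
            if tail is None:
                head = node
            else:
                tail.next = node
            tail = node
        tail.array[tail.count] = item
        tail.count += 1
    return head

def iter_unrolled(head):
    """Sequential traversal: one pointer hop per node, up to K elements per hop."""
    node = head
    while node is not None:
        for i in range(node.count):
            yield node.array[i]
        node = node.next

def get(head, i):
    """Positional access: walk node counts to find the block, then index its array."""
    node, offset = head, 0
    while node is not None:
        if i < offset + node.count:
            return node.array[i - offset]
        offset += node.count
        node = node.next
    raise IndexError(i)
```

`get` shows why random access stays linear: the node holding index i can only be found by accumulating counts from the head (or tail, in doubly linked variants).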
Time Complexity
The time complexity of operations in an unrolled linked list depends on the node capacity K, which determines the number of elements stored per node, and the total number of elements n. Traversal and linear search require scanning all elements sequentially: following O(n/K) node pointers takes O(n/K) time, while processing the arrays within nodes takes O(n) time overall, yielding a total of O(n). However, the pointer-following cost is often highlighted as O(n/K) in analyses emphasizing the reduced linkage overhead compared to traditional linked lists.[8]
Insertion at an arbitrary position involves first locating the target node, which requires traversing O(n/K) nodes in the worst case, followed by O(K) time to shift elements within the node and potentially split it if full. Thus, the worst-case time complexity is O(n/K + K). Deletion follows a similar pattern: locating the node takes O(n/K) time, and removing the element plus merging or shifting within the node (and possibly adjacent nodes) takes O(K) time, again yielding O(n/K + K) in the worst case.[8][4]
Amortized analysis, particularly for structures like the space-efficient list (SEList) variant, spreads the cost of occasional rebalancing (such as block splitting or gathering) across multiple operations. With a fixed K, operations near the list ends can achieve amortized O(1) time if insertions and deletions maintain balance, though arbitrary-position operations remain O(K + n/K) on average; this contrasts with worst-case spikes during rebalancing that can reach O(K^2). Choosing K \approx \sqrt{n} optimizes to amortized O(\sqrt{n}) per operation, balancing traversal and local work.[8]
The complexities are influenced by K and n: larger K reduces the n/K term (fewer nodes to traverse) but increases local operation costs, while n scales the traversal linearly; optimal K values (e.g., matching cache line sizes) minimize overall time for typical workloads.[8]
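The trade-off can be checked numerically under the simplified cost model n/K + K for an arbitrary-position operation (a demonstration of the balance point, not a benchmark):

```python
def op_cost(n, K):
    """Simplified cost model: O(n/K) node traversal plus O(K) local array work."""
    return n / K + K

n = 10_000
# Exhaustively find the capacity minimizing the modeled cost.
best_K = min(range(1, n + 1), key=lambda K: op_cost(n, K))
# The minimum falls at K = sqrt(n), where the two terms balance at 2*sqrt(n).
```

Real implementations rarely set K this large, since cache-line alignment dominates once traversal is no longer the bottleneck.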
Space Efficiency and Cache Locality
Unrolled linked lists achieve better space efficiency than traditional linked lists primarily through a reduction in pointer overhead. In a conventional singly linked list, storing n elements requires n nodes, each containing one data element and one pointer (typically 8 bytes on 64-bit systems), resulting in approximately 50% of the memory dedicated to pointers alone. By contrast, an unrolled linked list organizes K elements into a contiguous array within each node, along with a single next pointer and a count field to track the number of occupied slots, yielding only about n/K pointers overall. This node structure incurs O(K) space for the array plus O(1) for the pointer and count, leading to a total space complexity of O(n) while amortizing the fixed overhead across multiple elements per node.
The consolidation of elements reduces the total number of pointers by a factor of K, significantly lowering memory usage compared to single-element nodes; for instance, with K tuned to cache line sizes (e.g., K=8 for 64-byte lines and 8-byte elements), this can enhance storage density without excessive fragmentation. However, drawbacks include potential wasted space in partially filled nodes, particularly the last one, and the fixed block size K that may limit adaptability to varying data sizes or access patterns.[13]
Regarding cache locality, the array-based storage within nodes promotes sequential memory access, aligning well with modern CPU cache hierarchies where data is fetched in fixed-size lines (commonly 64 bytes). Traversing an unrolled linked list loads an entire cache line to access up to K elements at once—approximately K elements per load versus one in standard linked lists—minimizing compulsory and capacity misses during linear scans or searches. This spatial locality can reduce cache misses relative to traditional linked lists, as the contiguous layout ensures that adjacent elements share cache lines.[5]
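The pointer-savings arithmetic can be made concrete with a small model (byte sizes and the metadata layout are illustrative assumptions, not measurements):

```python
def metadata_fraction(n, K, elem_bytes=8, ptr_bytes=8, count_bytes=8):
    """Fraction of total memory spent on per-node metadata (next pointer + count)."""
    nodes = -(-n // K)  # ceil(n / K): number of blocks, the last possibly partial
    data = n * elem_bytes
    meta = nodes * (ptr_bytes + count_bytes)
    return meta / (data + meta)

# K = 1 with no count field models a traditional singly linked list:
# half the memory is pointers. K = 32 drops metadata to a few percent.
```

The model ignores allocator headers and padding, both of which further favor the unrolled layout by being paid once per block rather than once per element.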
Empirical evaluations confirm these benefits: concurrent implementations using lazy synchronization have demonstrated up to 3× performance improvements over other concurrent list-based data structures on modern multiprocessors, attributed to enhanced cache utilization and reduced pointer chasing. Such gains are particularly pronounced in memory-bound scenarios, though they depend on selecting K appropriately for the hardware's cache parameters.[13]
Comparisons and Applications
Comparison with Traditional Linked Lists
Unrolled linked lists and traditional linked lists both enable dynamic storage and efficient local modifications, but they diverge significantly in design and performance trade-offs due to the former's use of array-based nodes containing multiple elements.[6] In traditional singly or doubly linked lists, each node holds a single element alongside one or two pointers, leading to scattered memory allocation and high overhead from frequent indirections. Unrolled linked lists group K elements (typically 10–100) into a contiguous array per node with a single next pointer (or two for doubly-linked variants), which fundamentally alters access patterns, memory efficiency, and operation costs.[8]
Access patterns in unrolled linked lists excel for sequential reads, as the contiguous storage within nodes minimizes cache misses by leveraging spatial locality—traversing K elements requires only one pointer dereference per node, compared to one per element in traditional lists.[6] This reduces the effective traversal steps to O(n/K) pointer follows, yielding fewer cache line fetches during linear scans. However, random access remains O(n) in both structures, as locating an arbitrary element necessitates traversing nodes from the head or a known position, with similar pointer-chasing overhead once the target node is reached.[8]
Memory usage benefits from the unrolled approach through reduced pointer overhead: traditional linked lists allocate one pointer (4–8 bytes) per element, resulting in O(n) extra space, whereas unrolled lists use one pointer per K elements, cutting overhead by a factor of up to K and limiting extra space to O(n/K + waste from partial nodes).[6] For instance, with K=32, pointer storage drops from ~50% of total memory in traditional lists (assuming 8-byte elements) to ~2%, though underfilled nodes can waste up to K-1 slots per node if not balanced. Optimized variants, like space-efficient lists, further minimize this to O(√n) extra space by dynamically adjusting block sizes around √n.[8]
Operation costs for insertion and deletion highlight key trade-offs. Traditional linked lists achieve O(1) time for these at a known position via simple pointer updates, without shifting data. In unrolled linked lists, insertions or deletions within a node require O(K) array shifts to maintain contiguity, and balancing (e.g., splitting full nodes or merging sparse ones) adds amortized O(1) but worst-case O(K) per operation, making them slower for isolated random modifications yet more efficient overall for large n due to fewer memory allocations and better cache reuse during bulk changes.[6]
Unrolled linked lists are advantageous for cache-bound workloads emphasizing sequential access, such as streaming or batch processing, where locality drives performance gains; traditional linked lists suit applications with frequent random insertions or deletions, prioritizing the simplicity of constant-time local updates over traversal efficiency.[8] For example, benchmarks on sequential traversal of lists with 1 million elements show unrolled variants achieving up to 2× faster execution times compared to traditional linked lists, primarily from reduced cache misses.[6]
Comparison with Arrays and Other Structures
Unrolled linked lists provide dynamic growth capabilities without the need for periodic resizing operations that incur O(n) time costs in fixed-size or dynamic arrays, allowing for incremental expansion by adding new nodes as needed.[5] This makes them particularly suitable for scenarios involving sparse data or frequent insertions, where arrays might require costly reallocations to double their capacity and copy existing elements. However, unlike arrays, which support O(1) random access via indexing, unrolled linked lists rely on sequential traversal through nodes, resulting in O(n) average-case access time for arbitrary elements.[5]
Compared to dynamic arrays such as C++ vectors, unrolled linked lists avoid the amortized costs associated with frequent resizing by distributing elements across fixed-capacity nodes linked together, eliminating the need for large-scale memory copies during insertions. This can lead to more predictable performance in insert-heavy workloads, though they incur higher memory usage due to the overhead of pointers in each node—typically requiring additional space for linking compared to the contiguous storage in dynamic arrays.[5]
In relation to other hybrid structures like skip lists and B-trees, unrolled linked lists offer simplicity for linear access patterns, as they maintain a straightforward node-linking mechanism without the probabilistic level promotions of skip lists or the balancing requirements of B-trees. While skip lists and B-trees achieve O(log n) search times through multi-level indexing or node splitting/merging, unrolled linked lists exhibit O(n) search complexity but reduce traversal steps by a factor of the block size K, making them less balanced yet easier to implement for sequential operations.[5] Unrolled linked lists can serve as a foundational component in more advanced hybrids, such as unrolled skip lists, where array-based nodes enhance storage density and cache locality within skip list frameworks.[5]
Regarding space efficiency, unrolled linked lists achieve storage similar to arrays for the data elements themselves but include linking overhead, resulting in approximately 1.1 to 1.5 times the space of a pure array in dynamic scenarios, depending on the block size K and pointer/element sizes; this trade-off provides link flexibility without the fragmentation risks of traditional linked lists.[5]
Practical Use Cases
Unrolled linked lists find application in database systems for managing free lists of storage pages, where each node stores multiple page identifiers to balance the flexibility of linked structures with efficient sequential access in embedded and lightweight databases.[14] In persistent memory environments, variants like the Detectable Unrolled Lock-based Linked List (DULL) support durable, scalable set operations for database indexing and recovery, enabling crash-consistent data structures that outperform traditional linked lists by over twofold in throughput for update-intensive workloads across 20 to 250 threads.[4]
In multi-threaded software, concurrent unrolled linked lists serve as building blocks for high-performance data structures such as graphs and hash tables, providing up to 300% higher throughput than lock-free alternatives in read-write scenarios due to reduced pointer dereferences and enhanced spatial locality.[15] These structures are particularly beneficial in scenarios involving frequent sequential scans, such as processing event streams or log files, where their cache-friendly design minimizes memory access latency compared to standard linked lists.[16]
Implementations appear in programming libraries to address performance bottlenecks in general-purpose collections; for instance, the PLF C++ library offers an unrolled linked list as a drop-in replacement for std::list, achieving 293% faster insertions, 57% faster erasures, and 17% faster iterations by storing multiple elements per node and optimizing allocator usage.[16] Similarly, the Chapel language includes a built-in unrolled linked list module for parallel computing tasks requiring efficient dynamic lists.[17]
Despite these advantages, unrolled linked lists introduce overhead from partially filled nodes and complex iteration logic, making them less suitable for very small lists where the per-node allocation cost outweighs benefits, or for applications demanding frequent random access, which favor hash tables instead.[18] A case study in cache optimization demonstrates that unrolled linked lists can reduce memory access times by grouping elements into cache-line-sized arrays, yielding up to 40% runtime savings in traversal-heavy workloads compared to traditional linked lists.[5]