Skip list
A skip list is a probabilistic data structure that provides an efficient alternative to balanced binary search trees for implementing ordered sets and dictionaries. It consists of a base sorted linked list augmented by multiple higher-level "express" lists that form a sparse hierarchy, enabling fast search, insertion, and deletion with expected O(log n) time per operation.[1]
Invented by William Pugh in 1989 and detailed in his 1990 paper, skip lists achieve balance through randomization rather than deterministic rotations or restructuring: each node is assigned a random height (or level) with geometric probability p (typically p = 1/2), resulting in an average of about 1/(1-p) forward pointers per node and negligible risk of significant performance degradation—for instance, with p = 1/2 and n = 4096 elements, the probability of search time exceeding three times the expected value is less than 1 in 200 million.[1] This probabilistic approach simplifies implementation compared to trees like AVL or red-black variants, often yielding competitive practical performance (e.g., search times around 0.051 milliseconds versus 0.046 for non-recursive AVL trees in reported benchmarks) and lower space overhead due to the use of singly linked lists without parent pointers.[1]
Beyond core operations, skip lists support advanced functionalities such as merging two lists in expected O(log n) time per step—optimal across all input distributions—splitting a list by key in O(log n) time, and concatenating lists in O(log n) time, making them versatile for applications like concurrent data structures and linear list representations with position-based access.[2] They are particularly valued in scenarios requiring simplicity and speed, such as in-memory databases and probabilistic indexing, where their expected logarithmic performance rivals or exceeds that of more complex balanced trees while remaining easier to code and debug.[1]
Introduction
Definition and Purpose
A skip list is a hierarchical probabilistic data structure designed for maintaining a sorted collection of elements, consisting of a base sorted linked list augmented with multiple levels of "express lanes" formed by randomly selecting nodes to include additional forward pointers that skip over intervening elements.[1] This randomization, typically achieved by assigning each node a height (number of levels) with geometric probability (e.g., probability p = 1/2 of advancing to the next level), creates a layered structure where higher levels connect fewer but farther-apart nodes, facilitating faster traversal akin to express elevators in a building.[1]
The primary purpose of skip lists is to provide an efficient alternative to balanced binary search trees for operations on dynamic sets, such as search, insertion, and deletion, achieving expected O(\log n) time complexity per operation without the need for deterministic rebalancing.[1] Introduced as a simpler implementation option for abstract data types like dictionaries and ordered lists, skip lists leverage probabilistic balancing to ensure high performance in practice while requiring less code and debugging effort than tree-based structures.[1]
This design motivates the use of skip lists by combining the conceptual simplicity and sequential access ease of linked lists with logarithmic search speeds, avoiding the algorithmic complexity and potential bugs associated with maintaining tree balance during updates.[1] For instance, in a single-level sorted linked list of n elements, searching for a target requires O(n) time in the worst case by traversing node by node; in contrast, a skip list with 2-3 levels might skip every other node at level 1 and every fourth at level 2, reducing the expected search path length to roughly \log_2 n steps, as higher levels provide shortcuts across the base list.[1]
Comparison to Other Data Structures
Skip lists offer a compelling alternative to balanced binary search trees, such as AVL trees and red-black trees, primarily due to their simpler implementation while achieving comparable performance. Unlike balanced trees, which require complex rotation and rebalancing operations to maintain deterministic O(log n) worst-case time for search, insertion, and deletion, skip lists rely on probabilistic level assignments to achieve expected O(log n) time complexity for these operations without any balancing logic. The result is significantly easier-to-code algorithms with comparable performance in practice—for example, benchmarks show search times of 0.051 ms versus 0.046 ms for non-recursive AVL trees—thanks in part to the absence of balancing overhead.[3]
In contrast to sorted arrays, which support efficient O(log n) binary search but demand O(n) time for insertions and deletions due to necessary element shifts, skip lists enable dynamic updates in expected O(log n) time, making them suitable for frequently changing datasets. Sorted arrays excel in space efficiency and deterministic performance for static queries but falter in mutable environments, whereas skip lists balance update efficiency with ordered access.[3]
Compared to hash tables, which provide average O(1) time for insertions, deletions, and lookups but store elements in unordered fashion, skip lists preserve key ordering to facilitate range queries and successor/predecessor operations that hash tables cannot support efficiently. Hash tables are ideal for exact-match accesses without order requirements, but skip lists are preferable when sorted traversal or partial key searches are needed.[3]
Key trade-offs include skip lists' probabilistic performance guarantees, which offer high probability (e.g., less than 1 in a million chance of exceeding three times the expected time for lists over 250 elements) against the deterministic worst-case bounds of balanced trees, alongside higher space usage from multiple forward pointers (averaging about 2 forward pointers per node for p = 1/2 versus trees' additional balance fields). These factors make skip lists particularly advantageous in concurrent or distributed settings where simplicity aids debugging and parallelism, though they may consume more memory than minimal structures like sorted arrays.[3]
Internal Structure
Multi-Level Organization
A skip list organizes its elements into multiple horizontal layers, known as levels, where the bottom level (level 0) contains all nodes in sorted order, forming a standard linked list.[3] Higher levels serve as "express lanes," consisting of subsets of nodes from the level below, allowing traversals to skip over multiple elements for faster access.[3] Typically, the probability p of a node being included in the next higher level is set to \frac{1}{2}, resulting in each successive level containing approximately half the nodes of the previous one.[3]
The structure begins with a sentinel header node that maintains forward pointers at every possible level, from the bottom up to a predefined maximum level, providing a unified starting point for operations across all layers.[3] This header ensures that empty levels above the current maximum node level point to a nil terminator, maintaining consistency in the hierarchy.[3]
Each node's level is assigned randomly upon insertion, with the probability of promotion to higher levels decreasing geometrically by p, which creates a sparse upper hierarchy that mimics the structure of a balanced tree in expectation.[3] The maximum level in a skip list of n elements is typically bounded by \lceil \log_{1/p} n \rceil + O(1), ensuring logarithmic height with high probability.[3]
Visually, a skip list can be represented as stacked horizontal lists: level 0 includes every node (e.g., all n elements), level 1 includes about n/2 nodes as a subsequence, level 2 about n/4, and so on, up to the maximum level with a single or few nodes, connected by vertical alignments between levels for the same elements.[3] This layered diagram highlights the progressive sparsity, where forward pointers at higher levels bridge larger gaps in the sorted sequence.[3]
Node and Pointer Composition
In a skip list, each node serves as the fundamental unit representing an element in the sorted collection. It consists of a key for ordering, an associated value, and an array of forward pointers that determine the node's participation across multiple levels. The height of a node, which indicates the number of levels it spans, is not stored explicitly but is implicitly defined by the size of its forward pointer array or the highest level with a non-null pointer. This structure allows nodes to connect horizontally within their respective levels while enabling vertical traversal for efficient searching.[1]
The forward pointers in each node link, at each level, to the next node whose height reaches that level, effectively forming layered linked lists where higher levels contain fewer nodes and longer jumps. A node of height i carries i forward pointers, indexed from level 1 (bottom) to level i (top); levels above i simply do not exist in that node's pointer array. Backward pointers are optional and not part of the standard design, though they can be added for applications requiring bidirectional traversal, such as doubly linked variants. All levels terminate with a sentinel node or null (NIL) pointers to mark the end of the list.[1]
Upon insertion, a node's height is assigned probabilistically to maintain the skip list's balance. The process starts at height 1 and increments the height with probability p (typically p = 1/4 or p = 1/2) on each successive random trial—often described as coin flips—until a trial fails or a predefined maximum height (MaxLevel) is reached. For example, with p = 1/2, approximately half the nodes reach level 2, a quarter reach level 3, and so on, producing a sparse upper hierarchy. This randomization distributes nodes across levels without explicit balancing.[1]
Nodes are dynamically allocated in memory, allowing the skip list to grow or shrink as elements are added or removed. The header node, which anchors the structure, maintains forward pointers up to MaxLevel, all initially pointing to null, and serves as the starting point for operations. This layout results in an average of about \frac{1}{1-p} pointers per node (e.g., about 1.33 for p = 1/4), promoting efficient space usage while supporting the probabilistic layering that underpins the data structure's performance.[1]
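The geometric height assignment just described can be sketched as a small Python helper (an illustrative sketch, not from the source; the cap of 16 levels and p = 1/2 are arbitrary choices):

```python
import random

def random_level(p=0.5, max_level=16):
    """Geometric height assignment: keep promoting with probability p
    until a coin flip fails or the predefined cap is reached."""
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level

# Empirically, the mean height approaches 1/(1-p) = 2 pointers per node.
random.seed(0)
heights = [random_level() for _ in range(100_000)]
mean_height = sum(heights) / len(heights)
print(round(mean_height, 2))  # close to 2
```

With p = 1/4 the same helper yields a mean of about 1.33 pointers per node, matching the figure quoted above.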
Core Operations
Search Procedure
The search procedure in a skip list begins at the header node and the highest level of the structure, traversing rightward along the forward pointers at that level until reaching a node whose next pointer would point to a key greater than or equal to the target search key.[1] At this point, the search descends to the next lower level and repeats the process, moving forward as far as possible without overshooting the target key.[1] This level-wise descent continues until the bottom level (level 1) is reached, ensuring an efficient path that skips over large portions of the list at higher levels.[1]
To support potential subsequent operations like insertion or deletion, the search maintains an update array that records the last node visited at each level before descending.[1] Specifically, for each level i, the array entry update[i] is set to the current node x after the forward traversal at that level completes, providing predecessors for structural modifications without re-traversing the list.[1]
Upon reaching the bottom level, the search follows the final forward pointer to the candidate node and checks if its key matches the target; if so, the associated value is returned, otherwise, the procedure indicates failure or returns the insertion point for the target key.[1]
The following pseudocode illustrates the search algorithm, adapted from the original description, where levels are indexed starting from 1 as the bottom level:
Search(list, searchKey)
    x := list.header
    for i := list.level downto 1 do
        while x.forward[i].key < searchKey do
            x := x.forward[i]
        update[i] := x    // optional for pure search, but tracks predecessors
    x := x.forward[1]
    if x.key = searchKey then
        return x.value
    else
        return failure    // or the position for insertion
[1]
Consider a skip list containing keys 1, 3, 5, 7, 9 sorted in ascending order, with a maximum level of 3 and the following approximate structure (higher levels skipping nodes probabilistically): at level 3, pointers from the header to 5 and then to 9; at level 2, header to 1, to 5, to 7; at level 1 (bottom), header to 1 to 3 to 5 to 7 to 9. Searching for key 6 starts at the header on level 3: the pointer to 5 (key 5 < 6) is followed, but the next pointer to 9 (key 9 > 6) overshoots, so update[3] = node 5 and the search descends to level 2. On level 2, continuing from node 5, the next pointer leads to 7 (key 7 > 6), so update[2] = node 5 and the search descends to level 1. On level 1, the next pointer from node 5 also leads to 7 (key 7 > 6), so update[1] = node 5; the search then follows forward[1] to node 7. Since 7 ≠ 6, the search fails, identifying the insertion point between 5 and 7.[1]
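This worked example can be reproduced in a short, self-contained Python sketch (node heights are hand-picked here rather than random, to match the example's structure; the `Node` class and `search` helper are illustrative names, not from the source):

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[0] is the bottom level

def search(header, level, key):
    """Standard skip-list search: descend from the top level, moving
    forward while the next key is still smaller than the target."""
    x = header
    for i in range(level - 1, -1, -1):
        while x.forward[i] is not None and x.forward[i].key < key:
            x = x.forward[i]
    x = x.forward[0]
    return x if x is not None and x.key == key else None

# Build the example list: keys 1, 3, 5, 7, 9 with hand-picked heights
# (1 and 7 reach level 2; 5 and 9 reach level 3; 3 stays at level 1).
header = Node(None, 3)
n1, n3, n5, n7, n9 = Node(1, 2), Node(3, 1), Node(5, 3), Node(7, 2), Node(9, 3)
header.forward = [n1, n1, n5]
n1.forward = [n3, n5]
n3.forward = [n5]
n5.forward = [n7, n7, n9]
n7.forward = [n9, n9]
n9.forward = [None, None, None]

print(search(header, 3, 7).key)  # found: 7
print(search(header, 3, 6))      # not present: None
```

Searching for 6 follows exactly the path described above: forward to 5 at level 3, then down twice, ending just before node 7.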
Insertion Process
The insertion process in a skip list begins by locating the appropriate position for the new key while maintaining the sorted order of the structure. This involves executing a search procedure to traverse from the header node downward through the levels, identifying the predecessor nodes at each level that immediately precede the insertion point. During this search, an update array is populated with pointers to these predecessor nodes, ensuring that the forward pointers can be efficiently rewired later.[1]
Once the insertion point is found, a random height (or level) is assigned to the new node using a geometric distribution, where each successive level has a probability p (typically p = 1/2) of being included, determined by repeated coin flips until the level is decided or a maximum level is reached. This randomization promotes nodes probabilistically, allowing higher levels to span multiple lower-level nodes and preserving the skip list's balanced properties on average. If the new node's height exceeds the current maximum level of the skip list, the header is extended to accommodate the additional levels by updating its forward pointers accordingly. A new node is then created with the given key and the assigned height, and it is inserted by updating the forward pointers of the predecessor nodes in the update array—from level 1 up to the new height—to point to the new node, effectively splicing it into the structure at all relevant levels.[1]
The following pseudocode outlines the insertion algorithm, adapted from the original description:
Insert(list, searchKey, newValue)
    local update[1..MaxLevel]
    x := list.header
    for i := list.level downto 1 do
        while x.forward[i].key < searchKey do
            x := x.forward[i]
        update[i] := x
    x := x.forward[1]
    if x.key = searchKey then
        x.value := newValue    // key already present: update its value
        return
    lvl := randomLevel()       // generate height with p = 1/2
    if lvl > list.level then
        for i := list.level + 1 to lvl do
            update[i] := list.header
        list.level := lvl
    newNode := createNode(lvl, searchKey, newValue)
    for i := 1 to lvl do
        newNode.forward[i] := update[i].forward[i]
        update[i].forward[i] := newNode
This procedure ensures O(\log n) expected time complexity for insertion, as the search and pointer updates occur over a logarithmic number of levels on average.[1]
For illustration, consider inserting the key 4 into a sample skip list containing keys 1, 3, 5, 7, where the search identifies the node with key 3 as the predecessor at each level it occupies (with the header standing in at any higher levels). Assuming the random height generation yields height 2 for the new node (two successful coin flips with p = 1/2), the node is spliced in at levels 1 and 2: at each of those levels, the predecessor's forward pointer is redirected from the node with key 5 to the new node, and the new node's forward pointer is set to the node with key 5. Levels above 2 remain unchanged unless the generated height were greater.[1]
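A runnable translation of the insertion algorithm into Python (0-indexed levels internally; `MAX_LEVEL`, `P`, and the helper names are illustrative choices, not from the source):

```python
import random

MAX_LEVEL = 8
P = 0.5

class Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.forward = [None] * height

def random_level():
    # Geometric height: promote with probability P, capped at MAX_LEVEL.
    lvl = 1
    while random.random() < P and lvl < MAX_LEVEL:
        lvl += 1
    return lvl

def insert(header, level, key, value):
    """Insert following the pseudocode: search while recording predecessors
    in `update`, then splice the new node in at every level it occupies.
    Returns the (possibly increased) list level."""
    update = [header] * MAX_LEVEL   # header covers levels above the current top
    x = header
    for i in range(level - 1, -1, -1):
        while x.forward[i] is not None and x.forward[i].key < key:
            x = x.forward[i]
        update[i] = x
    x = x.forward[0]
    if x is not None and x.key == key:   # key exists: overwrite its value
        x.value = value
        return level
    lvl = random_level()
    level = max(level, lvl)              # extend the list height if needed
    node = Node(key, value, lvl)
    for i in range(lvl):                 # rewire predecessors at each level
        node.forward[i] = update[i].forward[i]
        update[i].forward[i] = node
    return level

random.seed(42)
header = Node(None, None, MAX_LEVEL)
level = 1
for k in [5, 1, 7, 3, 9]:
    level = insert(header, level, k, str(k))

# Walk the bottom level: keys come out in sorted order.
x, keys = header.forward[0], []
while x is not None:
    keys.append(x.key)
    x = x.forward[0]
print(keys)  # [1, 3, 5, 7, 9]
```

Whatever heights the coin flips produce, the bottom-level walk always yields the keys in sorted order, which is the invariant the splice preserves.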
Deletion Process
The deletion process in a skip list involves locating the target node and then bypassing it by updating the forward pointers of its predecessors at each relevant level, effectively removing it from all levels without altering the heights of other nodes. This operation leverages the update array collected during the search phase, which stores the predecessor nodes immediately before the search position at each level. If the target key is found, the algorithm splices out the node by redirecting these predecessors' forward pointers to the target's successors, ensuring the list remains correctly linked across all levels.[3]
To perform deletion, the algorithm first executes a search for the given key, populating the update array with the last node encountered at each level before the potential insertion or deletion point. Upon reaching level 1, it checks if the forward node at that level matches the target key; if not, the deletion fails silently. If matched, the loop iterates from level 1 up to the current maximum level, updating each predecessor's forward pointer to skip the target node, but terminates early if a higher-level predecessor does not directly point to the target (to avoid unnecessary updates beyond the node's height). Finally, the target node's memory is reclaimed, and the list's maximum level is decremented if the header's highest forward pointers become null, indicating no nodes at that level remain.[3]
This approach handles node heights implicitly: updates occur only up to the deleted node's effective height, as higher levels will not have the predecessor directly linking to it. For nodes with height 1, the process simplifies to a standard singly-linked list removal at the bottom level, with no changes to upper levels. Edge cases include attempting to delete a non-existent key, which results in no action, and deleting the sole node at the maximum level, which may reduce the overall list height by repeatedly checking and lowering the level until a non-null forward pointer is found at the top.[3]
The following pseudocode outlines the deletion procedure, adapted from the original description:
Delete(list, searchKey)
    local update[1..MaxLevel]
    x := list.header
    for i := list.level downto 1 do
        while x.forward[i].key < searchKey do
            x := x.forward[i]
        update[i] := x
    x := x.forward[1]
    if x.key = searchKey then
        for i := 1 to list.level do
            if update[i].forward[i] ≠ x then break
            update[i].forward[i] := x.forward[i]
        free(x)
        while list.level > 1 and list.header.forward[list.level] = NIL do
            list.level := list.level - 1
This code integrates the search to build the update array and performs the bypass logic efficiently.[3]
Consider a skip list with nodes for keys 3, 5, and 7, where node 3 has height 3, node 5 has height 2 (present at levels 1 and 2), and node 7 has height 1. During the search for key 5, the update array stores node 3 as the predecessor at levels 1 and 2 (and the header at level 3). To delete key 5, node 3's level-1 forward pointer is redirected to node 7, and its level-2 forward pointer takes over node 5's level-2 successor (here NIL, since no later node reaches level 2), bypassing node 5 entirely before its memory is freed; node 3 retains its height of 3 unchanged.[3]
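The deletion logic, including the early break and the level-lowering loop, can be sketched in Python against a hand-built list matching the text's example (heights are hand-picked; `Node` and `delete` are illustrative names):

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[0] is the bottom level

def delete(header, level, key):
    """Delete `key`: search while recording predecessors, splice the node
    out at every level that points to it, then lower the list level while
    the top level is empty. Returns the new list level."""
    update = [header] * level
    x = header
    for i in range(level - 1, -1, -1):
        while x.forward[i] is not None and x.forward[i].key < key:
            x = x.forward[i]
        update[i] = x
    x = x.forward[0]
    if x is not None and x.key == key:
        for i in range(level):
            if update[i].forward[i] is not x:
                break                       # the node is not this tall
            update[i].forward[i] = x.forward[i]
        while level > 1 and header.forward[level - 1] is None:
            level -= 1
    return level

# Example from the text: key 3 has height 3, key 5 height 2, key 7 height 1.
header = Node(None, 3)
n3, n5, n7 = Node(3, 3), Node(5, 2), Node(7, 1)
header.forward = [n3, n3, n3]
n3.forward = [n5, n5, None]
n5.forward = [n7, None]
n7.forward = [None]

level = delete(header, 3, 5)
print(n3.forward[0].key, n3.forward[1], level)  # 7 None 3
```

Deleting a key that is absent leaves the structure untouched, since the splice loop only runs after an exact key match.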
Time Complexity
The time complexity of skip list operations is analyzed probabilistically, leveraging the random level assignment of nodes, which follows a geometric distribution with success probability p (typically p = 1/2).[3] For search, insertion, and deletion, the expected time is O(\log n), where n is the number of elements, due to the structure's multi-level organization enabling logarithmic traversal paths with high probability.[3]
The expected search time can be derived by considering the backward traversal from the target node to the list's head, where the number of steps at each level is geometrically distributed. Let C(k) denote the expected number of steps to climb k levels; this satisfies the recurrence C(k) = 1 + (1-p) C(k) + p C(k-1), which solves to C(k) = k / p.[3] The total expected search cost is then the sum over levels up to the maximum level L(n) \approx \log_{1/p} n, yielding O((1/p) \log_{1/p} n) using linearity of expectation on the expected steps per level.[3] For the standard p = 1/2, this simplifies to O(\log n), with a precise upper bound of approximately 2 \log_2 n + 2.[3]
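The solution C(k) = k/p can be checked numerically with a small Monte Carlo simulation of the backward climb (an illustrative sketch; the trial count and seed are arbitrary):

```python
import random

def climb_steps(k, p=0.5):
    """One backward climb of k levels: each step moves up a level with
    probability p, otherwise stays on the same level and moves left."""
    steps = 0
    while k > 0:
        steps += 1
        if random.random() < p:
            k -= 1
    return steps

random.seed(1)
k, p = 10, 0.5
avg = sum(climb_steps(k, p) for _ in range(100_000)) / 100_000
print(round(avg, 2))  # close to C(k) = k / p = 20
```

Each of the k upward moves costs a geometric number of steps with mean 1/p, so the simulated average converges to k/p by linearity of expectation.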
Insertion and deletion have expected times asymptotically matching search, as they involve a preceding search followed by local updates whose cost is proportional to the inserted/deleted node's level, which has expected height O(1) under the geometric distribution with p = 1/2.[3] The maximum number of levels is O(\log n) with high probability, ensuring the total levels do not exceed this bound.[3]
In the worst case, operations can take O(n) time if the skip list degenerates to a linear structure, such as when all nodes happen to receive the same height, but this occurs only with vanishingly small probability (on the order of 1/2^{O(\log n)}).[3] To mitigate this, implementations often cap the maximum level at O(\log n), bounding the per-operation pointer-update cost while preserving the expected performance.[3] For p = 1/2 and n > 250, the probability that a search exceeds three times the expected time is less than 1 in a million.[3]
Space Complexity and Randomization
Skip lists exhibit an expected space complexity of O(n), where n is the number of elements, primarily because each node maintains an expected constant number of pointers. The height of each node is determined randomly, following a geometric distribution with success probability p (typically p = 1/2 or p = 1/4), resulting in an expected height of 1/(1-p). This probabilistic height assignment ensures that the total number of pointers across all nodes is linearly proportional to n, as higher levels become progressively sparser. For instance, with p = 1/2, the structure requires approximately 2n pointers in expectation, making it space-efficient compared to balanced trees that enforce strict height constraints.[1]
The randomization in skip lists is achieved through independent Bernoulli trials during node insertion, where the height of a new node is selected by repeatedly "flipping a coin" with probability p to decide whether to extend to the next level, up to a maximum level L(n) ≈ log_{1/p} n. This process, independent for each node, guarantees that the layers remain sparsely populated and balanced without the need for explicit rebalancing operations, mimicking the probabilistic balance of treaps or random binary search trees. The expected number of nodes at level i is n \cdot p^{i-1}, ensuring that the top levels have few nodes while lower levels are dense, which contributes to the overall linear space usage.[1]
Formally, the expected number of pointers per node is given by \frac{1}{1-p}, leading to a total expected space complexity of O\left(\frac{n}{1-p}\right). For p = 1/2, this yields 2 pointers per node on average; for p = 1/4, it reduces to approximately 1.33 pointers per node, optimizing memory while preserving performance. With high probability (approaching 1 as n grows), the space remains O(n), as the maximum height deviates from the expected log_{1/p} n only with exponentially small probability. However, in the pathological worst case—where randomization unluckily assigns maximum height to all nodes—the space can reach O(n \log n), though this event has probability less than (1/n)^c for some constant c > 0.[1]
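The expected level occupancy n \cdot p^{i-1} and the total pointer count of about n/(1-p) can both be observed empirically (a sketch; the sample size, cap, and seed are arbitrary choices):

```python
import random

def node_height(p=0.5, max_level=20):
    # Geometric height assignment, as used at insertion time.
    h = 1
    while random.random() < p and h < max_level:
        h += 1
    return h

random.seed(3)
n = 100_000
heights = [node_height() for _ in range(n)]

# Fraction of nodes whose height reaches level i should be about p^(i-1).
fractions = [sum(h >= i for h in heights) / n for i in range(1, 5)]
print([round(f, 2) for f in fractions])  # roughly [1.0, 0.5, 0.25, 0.12]

# Total pointers across all nodes is about n / (1 - p) = 2n for p = 1/2.
total_pointers = sum(heights)
print(total_pointers)  # roughly 200_000
```

The geometric thinning of the upper levels is what keeps total space linear in n despite the multi-level structure.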
Tuning the parameter p trades space against performance variability: lower values such as p = 1/4 reduce space usage (fewer pointers per node) while keeping speed competitive, at the cost of higher variance in running times, whereas p = 1/2 minimizes variance at the cost of marginally more memory. Pugh recommends p = 1/4 as a default for most applications due to its speed and space efficiency; p = 1/2 is preferable when minimizing variability is a concern, and in either case random level generation can be implemented efficiently with bit manipulation.[1]
Variants
Indexable Skip Lists
Indexable skip lists extend standard skip lists by augmenting nodes with a span (or distance) field at each level, which records the number of elements spanned by the forward pointer from that node. This augmentation enables efficient support for order statistics operations, such as determining the rank of an element or selecting the k-th smallest element in the ordered sequence, in expected O(\log n) time. Unlike basic skip lists, which primarily facilitate dictionary operations like search by key, indexable variants map logical positions (ranks) to physical nodes, mimicking the capabilities of augmented balanced trees for rank-based queries.[2]
Construction of an indexable skip list follows the probabilistic layering of standard skip lists, where each inserted node is assigned a random height (e.g., with probability p=1/2 for each additional level). Upon insertion or deletion, the span fields are updated incrementally: for a pointer at level l from node x to its successor y, the span is set to the difference in ranks between y and x, computed as span(x, l) = rank(y) − rank(x). These updates propagate along the affected paths, maintaining the structure in expected O(\log n) time per operation without requiring full rebuilds. The spans can be built either eagerly during every update or lazily on demand for queried levels, though eager maintenance ensures consistent performance. End-of-list sentinels typically have spans of zero to bound traversals.[2]
Key operations leverage the spans for rapid navigation. The select(k) operation (or search-by-rank) begins at the list head and the highest level, subtracting the span values from k as it traverses forward pointers, descending levels when a span exceeds the remaining rank. This binary-search-like descent quickly approximates the target position before a final linear refinement at level 0 yields the exact k-th node. Similarly, the rank(x) operation computes the position of a node x by summing spans encountered during a standard search path to x, adjusted for the path's descent. Both operations achieve expected O(\log n) time, matching the complexity of core skip list searches.[2]
This design offers significant advantages over fully balanced trees for order statistics, providing equivalent asymptotic performance with simpler code—avoiding complex rotations or rebalancing—while inheriting the space efficiency and ease of concurrent modifications from probabilistic skip lists. The integrated spans eliminate the need for a fully separate indexing structure like an array or auxiliary tree, reducing overhead while supporting dynamic updates to the ordered set. For instance, in a skip list of 16 elements, the top level might span roughly 8 elements, enabling a single forward step to reach near the 8th position, with subsequent levels halving the search space to locate it precisely.[2]
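Span-based selection can be illustrated with a small static example in Python (the five keys, their heights, and the span values are hand-constructed for illustration; `Node` and `select` are hypothetical names):

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height
        self.span = [0] * height   # bottom-level links crossed by forward[i]

def select(header, level, k):
    """Return the k-th smallest node (1-indexed) by consuming spans:
    advance while the accumulated position stays within k, else descend."""
    if k < 1:
        return None
    x, pos = header, 0
    for i in range(level - 1, -1, -1):
        while x.forward[i] is not None and pos + x.span[i] <= k:
            pos += x.span[i]
            x = x.forward[i]
    return x if pos == k else None

# Static example: keys 10..50; 20 reaches level 2, 40 reaches level 3.
header = Node(None, 3)
n10, n20, n30, n40, n50 = (Node(k, h) for k, h in
                           [(10, 1), (20, 2), (30, 1), (40, 3), (50, 1)])
header.forward, header.span = [n10, n20, n40], [1, 2, 4]
n10.forward, n10.span = [n20], [1]
n20.forward, n20.span = [n30, n40], [1, 2]
n30.forward, n30.span = [n40], [1]
n40.forward, n40.span = [n50, None, None], [1, 0, 0]
n50.forward, n50.span = [None], [0]

print([select(header, 3, k).key for k in range(1, 6)])  # [10, 20, 30, 40, 50]
```

Selecting rank 3, for instance, consumes a span of 2 at level 2 (header to 20) and a span of 1 at the bottom (20 to 30), reaching the target in two forward steps rather than three.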
Deterministic and Ordered Skip Lists
Deterministic skip lists modify the standard probabilistic skip list structure by assigning fixed heights to nodes based on deterministic rules, ensuring logarithmic time complexity for search, insertion, and deletion in the worst case rather than just in expectation. Unlike probabilistic skip lists, which rely on random height selection to achieve an expected O(log n) performance, deterministic variants enforce balance through predefined promotion criteria, such as limiting the number of nodes at lower levels between those at higher levels. This approach mimics the expected balance of the probabilistic version while eliminating randomness, making it suitable for systems requiring predictable behavior, such as real-time applications.[7]
A common implementation is the 1-2 skip list, where at each level h, there are either 1 or 2 nodes from level h-1 between nodes at level h, ensuring a bounded gap invariant. For instance, in a fixed-level assignment, nodes at level k may include every 2^k-th node in the sequence, creating a structure analogous to a balanced binary tree but using linked lists. During insertion, if this invariant is violated—such as when three consecutive nodes exist at a lower level—the middle node is promoted to the next level, maintaining the balance without probabilistic choices. Deletion similarly involves demoting nodes if gaps become too small, with search traversing levels from top to bottom as in the original skip list. These operations achieve O(log n) worst-case time complexity, though with higher constant factors due to the maintenance overhead.[7][8]
Ordered skip lists extend this determinism by enforcing strict ordering and balanced promotion rules across levels, often through variants like the 1-2-3 skip list, which allows up to 3 nodes at level h-1 between level h nodes, corresponding to a 2-3-4 tree structure. Promotion occurs deterministically based on position or insertion order, splitting larger gaps during updates to preserve even distribution and ordering integrity. This ensures not only worst-case O(log n) performance but also supports efficient range queries in ordered sequences, as levels remain strictly sorted without random disruptions.[7][8]
The primary trade-offs of deterministic and ordered skip lists include increased space usage—typically 2.3n to 6n pointers compared to the probabilistic version's average of about 2n—and more complex update logic, which can lead to O(log² n) insertion times in naive array-based implementations unless optimized with exponential arrays or linked lists. However, these variants gain predictability and bounded worst-case performance, avoiding the rare but possible linear-time degradations of probabilistic skip lists, at the cost of the original's simplicity and lower average constants. They are particularly valuable in environments demanding guaranteed response times, such as embedded systems or concurrent data structures.[7][8]
History and Development
Invention and Key Publications
The skip list data structure was proposed by William Pugh in 1989 while he was at the University of Maryland, College Park, as a probabilistic alternative to balanced trees aimed at simplifying implementation, particularly in concurrent environments where maintaining balance under parallel access posed significant challenges.[9] Pugh first presented the concept at the 1989 Workshop on Algorithms and Data Structures (WADS), with a technical report on concurrent maintenance highlighting how skip lists could support efficient locking mechanisms for shared access without the rotational adjustments required by structures like AVL trees.
The seminal publication appeared in 1990: Pugh's article "Skip Lists: A Probabilistic Alternative to Balanced Trees" in Communications of the ACM, which formalized the structure's design, probabilistic height assignment, search algorithms, and expected performance guarantees.[1] The paper built directly on the 1989 WADS presentation, providing detailed proofs of O(log n) expected time for insertion, deletion, and search while emphasizing the structure's ease of implementation relative to deterministic balancing.[9][1]
Pugh drew inspiration for skip lists from earlier concepts in random graph theory, which informed the probabilistic layering to achieve balanced heights with high probability, and from multi-level indexes commonly used in database systems for accelerating range queries.[1] These influences allowed skip lists to generalize single-linked lists into a hierarchy of exponentially decreasing sublayers, avoiding the need for complex rebalancing while retaining logarithmic performance.[1]
Evolution and Influences
Following the original proposal of skip lists in 1990, significant developments emerged in the 2000s focused on concurrency, with lock-free implementations enabling non-blocking operations in multithreaded environments. These extensions, such as the lock-free skip list algorithm by Fomitchev and Ruppert (2004), addressed synchronization challenges by using compare-and-swap operations to manage updates without mutual exclusion, improving scalability in high-contention scenarios.[10] By the 2010s, integration with persistent memory structures further advanced skip lists, as seen in frameworks like Mirror (2021), which retrofits lock-free skip lists for durability across crash failures by logging operations and ensuring atomic persistence.[11]
Skip lists have influenced probabilistic balancing techniques in other data structures, notably treaps, where random priorities maintain expected logarithmic height, much as skip lists' layered randomization does. Described by Seidel and Aragon in their 1996 journal paper, treaps likewise use randomness to achieve balance without deterministic rotations, simplifying implementation while preserving expected performance guarantees.[12] Skip lists were also adopted in standard libraries, exemplified by Java's ConcurrentSkipListMap, added in Java 6 in 2006, which provides a thread-safe, sorted map using a lock-free skip list backbone for concurrent navigation and iteration.[13]
Refinements have addressed criticisms regarding the choice of promotion probability p, with debates centering on the trade-off between space efficiency and performance variance; Pugh recommended p = 1/4 for lower variance in search times, while p = 1/e ≈ 0.368 optimizes expected height but increases space usage.[14] To mitigate randomization's unpredictability in resource-constrained settings, deterministic variants emerged, such as those of Munro, Papadakis, and Sedgewick (1992), which enforce fixed skipping patterns to guarantee worst-case logarithmic operations without probability, proving useful in systems requiring bounded resources.[15]
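The effect of p can be seen directly in the standard geometric level generator. The short sketch below (illustrative; the function name is hypothetical) empirically recovers the expected average node height of 1/(1-p) pointers per node quoted for probabilistic skip lists.

```python
import random

def random_level(p=0.25, max_level=32):
    """Geometric level assignment used by probabilistic skip lists:
    a node of level k is promoted to level k+1 with probability p,
    so P(level >= k) = p**(k-1) and E[level] = 1/(1-p)."""
    level = 1
    while level < max_level and random.random() < p:
        level += 1
    return level

# With p = 1/4 (Pugh's recommendation) the mean node height—and hence
# the average number of forward pointers per node—approaches
# 1/(1-p) = 4/3; with p = 1/2 it would approach 2.
random.seed(1)
mean = sum(random_level(p=0.25) for _ in range(100_000)) / 100_000
```

Raising p toward 1/e trades extra pointers per node for fewer comparisons per level during search, which is the space-versus-variance debate mentioned above.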
In recent years up to 2025, skip lists have seen application in blockchain systems for efficient indexing of ordered transaction sets, as in the Enhanced Append-Only Skip List (EASL) scheme (2024), which leverages skip lists for sublinear retrieval in append-only ledgers while maintaining immutability.[16] Surveys note ongoing incremental optimizations in database-integrated machine learning pipelines, where skip lists provide fast sorted access in the key-value stores underlying tensor operations, without major algorithmic overhauls.[17] A comprehensive 2025 survey reviews skip list variants and their applications in data systems, and the Ternary Tree Enhanced Append-Only Skip List (TEASL) (2025) further improves blockchain data search efficiency.[18][19]
Applications
Use in Databases and Libraries
Skip lists are employed in several key-value stores to manage in-memory data structures efficiently. In LevelDB, a persistent key-value store developed by Google and released in 2011, the memtable—an in-memory buffer for recent writes—is implemented using a skip list to maintain sorted key-value pairs.[20] This allows for logarithmic-time insertions and lookups, supporting high-throughput writes before data is flushed to disk-based sorted string tables.[21]
Redis, an in-memory data store, utilizes a variant of skip lists in its sorted sets data type to enable ordered collections of unique elements associated with scores.[22] This structure facilitates efficient range queries and maintains order without requiring full resorting on each operation. For instance, the ZADD command, which adds or updates elements with scores, and the ZRANGE command, which retrieves elements by rank, both achieve O(log N) time complexity thanks to the skip list's probabilistic layering.[22][23][24]
Apache Cassandra, a distributed NoSQL database, uses skip lists in its memtable implementation to index partitions efficiently, supporting concurrent writes and reads in large-scale deployments.[25]
SingleStore, a distributed SQL database, employs skiplist indexes for rowstore tables to achieve high-throughput inserts and scans with low memory overhead compared to traditional B-tree indexes.[26]
In these database contexts, skip lists suit write-heavy workloads by offering expected O(log N) performance for insertions and searches without the overhead of tree rebalancing.[21] Their linked-list foundation also supports lock-free concurrency in multi-threaded environments through compare-and-swap operations on node pointers, reducing contention relative to traditional balanced trees.[13] This makes them suitable for scenarios demanding high concurrency and low-latency access, such as memtable management in embedded databases.
Standard software libraries have integrated skip lists for thread-safe ordered collections. In Java, the java.util.concurrent package includes ConcurrentSkipListSet and ConcurrentSkipListMap since JDK 1.6 (released in 2006), which implement scalable, sorted sets and maps using skip lists.[27] These classes ensure thread-safety without locks by leveraging atomic pointer updates, enabling concurrent insertions, deletions, and iterations with expected O(log N) costs.[27]
Real-World Implementations and Examples
Skip lists find application in distributed networking through structures like skip graphs, which extend the skip list concept to provide balanced tree-like functionality across networked nodes, enabling efficient search and maintenance in large-scale, fault-tolerant systems. These graphs model physical computers as nodes and network connections as pointers, supporting operations such as predecessor queries and range searches with expected logarithmic time complexity, making them resilient to node failures common in networking environments.[28]
In machine learning, skip lists influence approximate nearest neighbor search, notably the Hierarchical Navigable Small World (HNSW) index implemented in the FAISS library. HNSW borrows the hierarchical multi-layer design of probabilistic skip lists to build navigable graphs that accelerate vector similarity search, achieving high recall while scaling to billions of high-dimensional embeddings used in recommendation systems and image retrieval. Skip list-backed priority queues have also been used to manage candidate sequences during beam search in sequence generation, keeping candidates sorted with logarithmic insertion and extraction costs, though concrete implementations vary by framework.[29][30]
Apache Lucene, a high-performance text search library, embeds multi-level skip lists in the posting lists of its inverted index, allowing queries to skip ahead over non-matching documents during scoring and conjunction evaluation.
Couchbase Server incorporates the Nitro storage engine, which relies on a lock-free skip list as its core index for high-performance in-memory key-value storage in NoSQL global secondary indexes.[31]
Deterministic variants of skip lists, which replace randomization with fixed promotion rules to ensure worst-case logarithmic performance, suit embedded systems where predictability is essential. These structures support real-time scheduling queues in resource-constrained environments by providing bounded-time insertions and deletions for task prioritization, avoiding the variability of probabilistic skip lists in safety-critical applications like automotive control systems.[15]
In research on graph algorithms, skip lists underpin dynamic shortest path computations in evolving networks by enabling efficient updates to sorted edge lists during graph modifications. For example, skip graph extensions facilitate distributed shortest path queries in peer-to-peer overlays, where nodes dynamically adjust paths with sublinear overhead, outperforming traditional trees in decentralized settings. Similarly, in cryptocurrency systems like blockchains, append-only skip lists serve as indexes for transaction ordering and retrieval, allowing sublinear verification of block data without full scans; the Enhanced Append-Only Skip List (EASL) enhances this by integrating binary search for agile access to historical ledgers in systems requiring immutability and fast queries.[28][16]
Open-source implementations demonstrate practical adoption, such as the pyskiplist library in Python, which provides an indexable skip list for sorted collections with O(log n) operations, suitable for prototyping concurrent data structures.[32]
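For reference, a minimal probabilistic skip list of the kind such libraries build on can be written in a few dozen lines. The sketch below follows Pugh's original algorithm with p = 1/2 (it is illustrative only, and not the pyskiplist API): each node carries a tower of forward pointers, and insertion records the rightmost predecessor at every level before splicing the new node in.

```python
import random

class SkipList:
    """Minimal probabilistic skip list: node height is drawn
    geometrically with p = 1/2, giving expected O(log n) search
    and insert (an illustrative sketch, not a production library)."""
    MAX_LEVEL = 16

    class Node:
        __slots__ = ("key", "forward")
        def __init__(self, key, height):
            self.key = key
            self.forward = [None] * height   # one pointer per level

    def __init__(self):
        self.head = self.Node(None, self.MAX_LEVEL)  # sentinel header
        self.level = 1                               # levels in use

    def _random_height(self):
        h = 1
        while random.random() < 0.5 and h < self.MAX_LEVEL:
            h += 1
        return h

    def insert(self, key):
        # Record, per level, the rightmost node with key < new key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        h = self._random_height()
        self.level = max(self.level, h)
        new = self.Node(key, h)
        for i in range(h):                   # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        # Descend from the top level, moving right while keys are smaller.
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

Deletion follows the same predecessor-tracking pattern, unlinking the node at every level it occupies; the base level always remains a fully sorted singly linked list, which is what makes in-order iteration trivial.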