
Hash array mapped trie

A hash array mapped trie (HAMT) is a data structure that implements an associative array, combining the efficiency of hash tables with the structured navigation of a trie by using a key's hash value to traverse an array-mapped trie for storage and retrieval. In a HAMT, keys are hashed to produce a fixed-length bit string, with successive groups of bits (typically 5 bits per level) indexing into sparse arrays or sub-tries, allowing collisions to be resolved through deeper traversal rather than chaining. Key operations in a HAMT, such as lookup, insertion, and deletion, achieve average O(1) time complexity independent of the key set size, as traversal depth is bounded by the hash length (e.g., 32 bits). Insertion follows the hash bits to locate or create a leaf, expanding nodes as needed with lazy resizing to maintain efficiency, while deletion prunes empty sub-tries to reclaim space. Nodes are typically represented compactly using a bitmap to indicate present children and an array of pointers, minimizing memory overhead compared to full 32-entry arrays. HAMTs offer significant advantages over traditional chained hash tables and ternary search trees, including roughly 3-4 times faster performance and 60% less space usage, with bounded worst-case times via rehashing. Their support for persistence (creating modified copies that share structure with the original) enables efficient immutable updates, making them ideal for functional programming, where data immutability is key. HAMTs are widely implemented in languages emphasizing immutability, such as Clojure's hash maps, which use them to provide log32 N access times with structural sharing for concurrent safety and efficiency. Similar structures appear in Scala's collections and Haskell's unordered-containers package, underscoring their role in high-performance, immutable data handling across modern programming ecosystems.

Overview

Definition

A hash array mapped trie (HAMT) is a data structure that implements associative arrays, such as maps or sets, by combining the indexing efficiency of hash tables with the path compression of tries. It organizes key-value pairs in a trie where the path to each entry is determined by the bits of a hash value computed from the key, enabling O(1) average-case operations with bounded worst-case performance independent of the number of elements. This structure avoids the resizing issues of traditional hash tables by using a fixed branching factor (typically 32) and sparse node representations, making it particularly suitable for immutable collections in functional programming languages. Key features of a HAMT include its use of key hashes to define key-independent traversal paths, which handle collisions by extending the hash with additional bits as needed during insertion. To save space in sparse scenarios, internal nodes employ bitmap-indexed arrays for child pointers: a 32-bit bitmap indicates which of the 32 possible slots are occupied, followed by a dense array containing only the present sub-nodes or leaf values. Immutability is supported through path copying, where updates create new versions of modified nodes along the access path while sharing unchanged subtrees, facilitating structural sharing across multiple collection versions. A basic HAMT node can be conceptually represented as follows, where the bitmap encodes occupied slots and the array holds references:
Node {
  bitmap: uint32  // Bit i set if slot i is occupied
  children: array[Node or KeyValue]  // Length = popcount(bitmap), up to 32 entries
}
For example, with a bitmap of 0b00000000000000000000000000000101 (decimal 5), slots 0 and 2 are occupied, so the children array has two entries pointing to substructures derived from the corresponding hash bits. Unlike an array-mapped trie (AMT), which indexes paths directly using bits from the keys themselves for ordered or dense data like sequences, a HAMT relies on hash codes to create unordered, collision-resistant paths suitable for arbitrary keys in associative arrays. This hashing decouples the structure from key ordering, improving efficiency for dynamic, sparse key sets.
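The slot-to-offset computation in this example can be sketched in a few lines of Python (the helper name is illustrative; `bin(...).count("1")` stands in for a hardware popcount instruction):

```python
def child_offset(bitmap: int, slot: int) -> int:
    """Index into the dense children array for a sparse slot.

    The offset is the number of occupied slots below `slot`,
    i.e. the population count of the lower bits of the bitmap.
    """
    mask = (1 << slot) - 1          # bits strictly below `slot`
    return bin(bitmap & mask).count("1")

bitmap = 0b101                       # slots 0 and 2 occupied
assert child_offset(bitmap, 0) == 0  # slot 0 -> first array entry
assert child_offset(bitmap, 2) == 1  # slot 2 -> second array entry
```

Because the array stores only occupied slots, the same bitmap word both tests membership (is the bit set?) and locates the child (how many bits are set below it?).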

History

The hash array mapped trie (HAMT) was invented by Phil Bagwell in 2001 at the École Polytechnique Fédérale de Lausanne (EPFL)'s Programming Methods Laboratory (LAMP), as detailed in his technical report "Ideal Hash Trees." Bagwell's design addressed limitations in traditional hash tables and tries by combining hashing with array-mapped branching to achieve near-ideal performance characteristics, such as O(1) average-case operations independent of table size. Bagwell's motivation stemmed from the demands of functional programming for efficient, persistent data structures that support immutability and structural sharing without performance penalties, particularly in emerging languages like Scala, developed at the same institution. The HAMT enabled cheap creation of new versions of the structure upon updates, making it suitable for concurrent and pure functional paradigms. The structure gained prominence through its adoption in Clojure's PersistentHashMap, implemented by Rich Hickey for the language's 1.0 release in May 2009, providing immutable hash maps with efficient persistence. It also influenced Scala's collections library, where a concurrent variant, TrieMap, was introduced in Scala 2.10 as a lock-free hash array mapped trie for thread-safe operations. Later extensions, such as those in Steindorfer's 2015 work on optimizing HAMTs for immutable stores, refined collision handling and space efficiency for broader applications. Key milestones include Bagwell's foundational 2001 report, followed by practical implementations in functional languages by the late 2000s, and widespread integration into production systems throughout the 2010s and 2020s, including refinements for concurrent access and reduced memory overhead in libraries like Scala's, as well as 2021 research on compiler and runtime support for HAMTs.

Data Structure

Node Types

In a hash array mapped trie (HAMT), nodes are the fundamental building blocks that enable efficient hashing and traversal while supporting persistence. The structure primarily features internal nodes for branching, leaf nodes for key-value storage, collision nodes for handling hash conflicts, and mechanisms for expansion to manage growth. Internal nodes act as the core branching elements, each consisting of a fixed-size bitmap, typically 32 bits, where each bit indicates whether a slot (0 to 31) is occupied, and an array of pointers to child nodes. The bitmap's bits are set based on 5-bit segments of a key's hash value, allowing quick determination of occupied positions via bit operations like population count (popcount) to index into the array, which holds exactly as many entries as set bits in the bitmap for space efficiency. This design avoids allocating a full 32-entry array, using only the necessary pointers to sub-nodes. Leaf nodes serve as terminals in the trie, storing the actual key-value pairs once the hash-based path resolution concludes. They contain the full key and associated value, with hash information often retained for verification; during traversal, full key equality checks occur only at this leaf level to confirm matches. Collision nodes address cases where multiple keys produce identical full hash values, preventing overwrite or loss. In the original design, these take the form of a sub-trie that applies additional hash bits to further differentiate keys. Some implementations use a linear structure like a list or array of key-value entries, where resolution relies on complete key comparisons rather than hashing. When a new key's hash path reaches an occupied leaf holding a different key, expansion occurs by introducing a new internal node level; the existing entry and the new one are redistributed using the next hash bits, creating a deeper path while preserving the maximum branching factor of 32 per node. This process keeps node sizes bounded without global rehashing.
The following pseudocode illustrates a basic node layout in a C-like syntax, highlighting the bitmap and dynamic array for internal nodes, with simplified variants for others:
typedef struct Node Node;

struct InternalNode {
    uint32_t bitmap;          // 32-bit bitmap for occupied slots
    Node*    children[];      // Flexible array; allocated length = popcount(bitmap)
};

struct LeafNode {
    uint32_t hash;            // Full hash, kept for verification
    Key      key;             // Actual key
    Value    value;           // Associated value
};

struct CollisionNode {
    uint32_t hash;            // Hash shared by all colliding keys
    size_t   count;           // Number of colliding pairs (> 1)
    Entry    entries[];       // Array (or list) of colliding key-value pairs
    // Alternatively: a sub-trie root that consumes further hash bits
};
Internal and collision nodes derive from a common Node base, enabling uniform traversal.

Hashing and Traversal

In a hash array mapped trie (HAMT), the hashing process begins with computing a fixed-width hash value for each key, typically a 32-bit or 64-bit integer, to enable path indexing through the trie structure. Common hash functions include non-cryptographic algorithms such as universal hash families or those producing uniform distributions like MurmurHash and FNV, which generate bit sequences that guide traversal without requiring key comparisons at higher levels. In implementations like Clojure's PersistentHashMap, hash codes are computed using utility functions that may invoke the Java hashCode() method for objects. In contrast, Scala's HAMT applies bit-spreading to the hash codes to enhance distribution for the 5-bit indexing scheme. The traversal extracts successive bit slices from the hash to navigate the sparse nodes. For a branching factor of 32 (common in 32-bit setups), 5 bits are taken at a time, typically starting from the least significant bits; these bits form an index that is used to probe a node's bitmap, which marks occupied slots. The bitmap, a 32-bit word where each bit corresponds to a potential slot, allows efficient location of the child offset via a population count (CTPOP) operation on the bits below the index position. The shift value advances by 5 bits for each level (e.g., shift = level * 5), enabling descent into sub-tries until a leaf or a collision node is reached. Collisions occur when multiple keys produce identical hash prefixes, leading the traversal to a shared path that terminates in a collision node. In such cases, resolution proceeds by descending into the collision node and using the remaining hash bits or direct equality checks to distinguish entries, avoiding full rehashing unless necessary. This approach maintains the trie's efficiency by localizing collisions at the leaves. The following pseudocode illustrates a basic traversal function for lookup, assuming a 32-bit hash and 5-bit slices:
function traverse(node, key, hash, shift):
    if node is leaf:
        return node if key equals stored key else null
    index = (hash >> shift) & 31  // Extract 5 bits
    bit = 1 << index
    if (node.bitmap & bit) == 0:
        return null  // No child at this index
    child_index = popcount(node.bitmap & (bit - 1))  // Offset via CTPOP
    child = node.children[child_index]
    if child is collision node:
        return find_in_collision(child, key)  // Use key equality
    else:
        return traverse(child, key, hash, shift + 5)
This algorithm, rooted in the original HAMT design, iterates through levels until resolution.
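The bit-slicing arithmetic used at each level of this traversal can be checked with a few lines of Python (the hash value below is an arbitrary illustration; implementations differ on whether they consume low or high bits first):

```python
BITS_PER_LEVEL = 5
MASK = (1 << BITS_PER_LEVEL) - 1   # 0b11111 == 31

def path_indices(hash32, levels=7):
    """Return the 5-bit branch index used at each trie level."""
    return [(hash32 >> (level * BITS_PER_LEVEL)) & MASK
            for level in range(levels)]

# A hash whose low bits are ...00010_00001 descends via slot 1 at the
# root, slot 2 one level down, then slot 0 at every deeper level.
h = (2 << 5) | 1
assert path_indices(h)[:3] == [1, 2, 0]
```

Seven 5-bit slices (35 bits) are enough to exhaust a 32-bit hash, which is why traversal depth is bounded by a small constant.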

Operations

Insertion

Insertion in a hash array mapped trie (HAMT) begins by computing a fixed-width hash value for the input key, typically 32 bits using a universal hashing function to minimize collisions. This hash guides traversal from the root node, consuming 5 bits at each level to determine the branch index (0 to 31), enabling a branching factor of up to 32. The process follows the search path until reaching a leaf or an empty slot; if the key already exists at the leaf, the value is updated by creating a new leaf node with the same hash but the new value, without altering the structure. If the slot is empty, a new leaf node containing the key-value pair is inserted. To maintain persistence, insertion employs path copying: a new version of each node along the traversal path is created, while unchanged subtrees are shared with the original structure. This results in O(log32 n) new nodes, approximately 5 to 7 for typical 32-bit hashes, ensuring that the original trie remains unmodified and multiple versions can coexist efficiently. Internal nodes use a 32-bit bitmap to sparsely represent present children, where each bit corresponds to a potential slot. During insertion, if the target bit in the bitmap is unset, it is set (via bitwise OR), and the child array is expanded by inserting the new child at the position determined by the population count of bits lower than the target bit. Expansions occur when a collision is detected at a leaf level, such as differing keys sharing the same hash prefix up to that depth. In this case, the existing leaf is replaced by a new internal node, with the old key-value pair and the new one placed as children in a sub-trie using the next 5 hash bits for indexing; this may deepen the trie beyond the initial hash width if necessary, rehashing keys to obtain additional bits as needed.
The child array in internal nodes holds at most 32 entries and is sparsely populated, avoiding full resizes; a full bitmap (all 32 bits set) simply means the node has 32 children, and further keys are distinguished by recursing deeper on the next hash bits rather than by splitting. The insertion is typically implemented as a recursive function that returns a new root node, copying nodes only on the mutation path:
function insert(node, key, value, shift = 0):
    if node is Leaf:
        if node.key == key:
            return new Leaf(key, value)  // overwrite value
        else:
            // split: re-insert the old leaf and the new entry into a
            // fresh internal node; if they share the next 5 bits, the
            // recursion simply descends another level
            new_internal = new Internal(bitmap = 0, children = [])
            new_internal = insert(new_internal, node.key, node.value, shift)
            return insert(new_internal, key, value, shift)
    else:  // Internal node
        slot = (hash(key) >>> shift) & 31
        bit = 1 << slot
        index = popcount(node.bitmap & (bit - 1))
        if (node.bitmap & bit) != 0:
            // slot occupied: recurse into the existing child
            new_child = insert(node.children[index], key, value, shift + 5)
            if new_child == node.children[index]:
                return node  // no change
            // copy path: new node with the updated child
            new_children = copy(node.children)
            new_children[index] = new_child
            return new Internal(node.bitmap, new_children)
        else:
            // slot empty: set the bit and splice in a new leaf
            new_leaf = new Leaf(key, value)
            new_children = copy(node.children)
            insert_into_array(new_children, index, new_leaf)  // shifts later entries
            return new Internal(node.bitmap | bit, new_children)
This recursive approach ensures structural sharing and immutability, with array copies limited to the path length. Duplicates are handled solely by value replacement at the leaf, incurring no additional node creation beyond the path copy.
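The path-copying scheme above can be sketched end to end in runnable Python, using immutable tuples as nodes so that unchanged subtrees are shared by object identity. The node layout, helper names, and the 32-bit hash fold are illustrative, not any particular library's design, and the leaf split assumes the two keys' 32-bit hashes differ somewhere:

```python
BITS, MASK = 5, 31

def h32(key):
    return hash(key) & 0xFFFFFFFF      # fold Python's hash to 32 bits

def assoc(root, key, value):
    """Return a new root containing key -> value; the old root is untouched."""
    return _insert(root, key, value, h32(key), 0)

def _insert(node, key, value, h, shift):
    if node is None:
        return ('leaf', key, value)
    if node[0] == 'leaf':
        if node[1] == key:
            return ('leaf', key, value)          # overwrite via a fresh leaf
        # split: re-insert the old leaf, then descend with the new key
        split = _insert(('internal', 0, ()), node[1], node[2], h32(node[1]), shift)
        return _insert(split, key, value, h, shift)
    _, bitmap, children = node
    bit = 1 << ((h >> shift) & MASK)
    idx = bin(bitmap & (bit - 1)).count('1')     # popcount of lower bits
    if bitmap & bit:                             # slot occupied: copy the path
        child = _insert(children[idx], key, value, h, shift + BITS)
        return ('internal', bitmap, children[:idx] + (child,) + children[idx + 1:])
    leaf = ('leaf', key, value)                  # slot empty: splice in a leaf
    return ('internal', bitmap | bit, children[:idx] + (leaf,) + children[idx:])

def get(node, key, default=None):
    h, shift = h32(key), 0
    while node is not None and node[0] == 'internal':
        _, bitmap, children = node
        bit = 1 << ((h >> shift) & MASK)
        if not bitmap & bit:
            return default                       # miss detected mid-traversal
        node = children[bin(bitmap & (bit - 1)).count('1')]
        shift += BITS
    if node is not None and node[1] == key:
        return node[2]
    return default

m1 = assoc(assoc(None, 1, 'a'), 2, 'b')
m2 = assoc(m1, 3, 'c')                 # m1 is still fully usable
assert get(m2, 3) == 'c' and get(m1, 3) is None
assert m2[2][0] is m1[2][0]            # untouched leaf is shared, not copied
```

The final identity assertion is the essence of persistence: only the nodes on the root-to-leaf path are reallocated, while every other subtree is reused verbatim.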

Lookup

The lookup operation in a hash array mapped trie (HAMT) retrieves the value associated with a given key by traversing the structure based on the key's hash, without modifying the trie. This process begins by computing a fixed-width hash (typically 32 or 64 bits) of the key to determine the path through the trie levels. The traversal proceeds level by level, using successive bit chunks (commonly 5 bits) from the hash to index into sparse array-mapped nodes, which are compressed via bitmaps to save space. At each level, the selected bit chunk identifies a position in the node's bitmap; if the corresponding bit is unset, indicating no child at that index, the lookup fails immediately and returns null or undefined. If a child exists, the process advances to that sub-node (potentially an array entry, sub-trie, or leaf) by calculating the offset using the population count of set bits below the target position in the bitmap. This continues until reaching a leaf node or collision node at the trie's depth. Node traversal details, such as bitmap indexing, are covered in the Hashing and Traversal section. Upon reaching the endpoint, if the node is a leaf, the full key is compared for equality; a match yields the associated value, while a mismatch returns not found. In cases of hash collisions, where multiple keys share the same hash prefix leading to a collision node, the lookup performs a linear search through a list or sub-trie of key-value pairs, again comparing full keys until a match is found or all are exhausted. The operation concludes by returning the value if present, or null/undefined otherwise, ensuring no structural changes. The following pseudocode illustrates an iterative lookup in a typical 32-bit HAMT with 5-bit chunks and bitmap-indexed nodes:
function lookup(root, key):
    if root is null:
        return null  // Empty trie
    hash = hashFunction(key)  // 32-bit hash
    node = root
    shift = 0
    while shift < 32:
        bitIndex = (hash >>> shift) & 31  // Extract 5 bits (0-31)
        if not bitmapHasBit(node.bitmap, bitIndex):
            return null  // No child at this index
        childIndex = popcount(node.bitmap & ((1 << bitIndex) - 1))  // Offset via popcount
        node = node.children[childIndex]
        if node is leaf:
            if node.key == key:
                return node.value
            else:
                return null
        elif node is collision:
            for pair in node.pairs:
                if pair.key == key:
                    return pair.value
            return null
        shift += 5
    return null  // Should not reach if properly structured
This algorithm detects misses early during traversal, contributing to average O(1) time complexity. Edge cases include an empty trie, where the root is null and lookup immediately returns null, and non-existent keys, which fail either mid-traversal (missing child) or at the endpoint (key mismatch). In collision nodes, exhaustive search ensures correctness even with hash collisions.
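The exhaustive collision-node search at the end of the algorithm amounts to a plain linear scan over the stored pairs, comparing full keys; a minimal sketch with an illustrative pair layout:

```python
def find_in_collision(pairs, key, default=None):
    """Linear scan of a collision node's (key, value) pairs."""
    for k, v in pairs:
        if k == key:
            return v
    return default          # exhausted: key not present

assert find_in_collision([('a', 1), ('b', 2)], 'b') == 2
assert find_in_collision([('a', 1)], 'z') is None
```

Because collision nodes are reached only when full 32-bit hashes coincide, these scans stay short in practice with a well-distributed hash function.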

Removal

The removal operation in a hash array mapped trie (HAMT) deletes a specified key-value pair while maintaining the structure's immutability and persistence in functional implementations. The process begins by traversing the trie from the root to the location of the key, using the key's hash value to index into bitmap-indexed nodes along the path, similar to the lookup operation but with subsequent modifications for deletion. This traversal identifies the leaf node or collision node containing the key. At the target node, if it is a leaf and the key matches, an empty node is returned to indicate removal. If it is a collision node, a new collision node is created with the matching key-value pair excised; if the collision becomes empty, an empty node is returned instead. Upon removal, the algorithm backtracks up the path, potentially contracting the structure to maintain efficiency. If a child is now empty, the corresponding bit in the bitmap is cleared, and the child array is compacted by removing the empty slot. If a node has only one child after removal (singleton path), it may be collapsed by inlining the child directly into the parent, reducing depth; this contraction is optional in basic implementations but common in optimized variants to avoid unnecessary levels. Empty subtrees are pruned by returning an empty node sentinel. Path copying ensures persistence: new nodes are allocated only along the modified path, sharing unchanged subtrees with the original trie to minimize space overhead. If the root becomes empty after contraction, a new empty root is returned. For nodes with full bitmaps (all 32 children present), removal recurses deeper using further hash bits without splitting. The removal is typically implemented recursively, with each call returning a new node representing the updated subtree. 
The following pseudocode outlines a simplified recursive remove function for a persistent HAMT with 5-bit hash chunks, bitmap-indexed array nodes, and leaf/collision children:
function remove(node, key, shift = 0):
    if node is Empty:
        return Empty
    if node is Leaf:
        if node.key == key:
            return Empty  // Remove matching leaf
        else:
            return node  // No match
    if node is Collision:
        new_pairs = []
        for pair in node.pairs:
            if pair.key != key:
                new_pairs.append(pair)
        if new_pairs.empty():
            return Empty
        if new_pairs.length() == 1:
            return new Leaf(new_pairs[0].key, new_pairs[0].value)  // Collapse to leaf
        return new Collision(new_pairs)
    // Internal node
    slot = (hash(key) >>> shift) & 31
    bit = 1 << slot
    if (node.bitmap & bit) == 0:
        return node  // No child at this slot
    index = popcount(node.bitmap & (bit - 1))
    child = node.children[index]
    new_child = remove(child, key, shift + 5)
    if new_child == child:
        return node  // Unchanged
    if new_child is Empty:
        // Prune the empty child: clear the bit, compact the array
        new_bitmap = node.bitmap & ~bit
        if new_bitmap == 0:
            return Empty  // Node now empty
        new_children = copy(node.children)
        remove_from_array(new_children, index)  // Shift subsequent entries
        // Optional contraction: inline a lone remaining leaf into the parent
        if popcount(new_bitmap) == 1 and new_children[0] is Leaf:
            return new_children[0]
        return new Internal(new_bitmap, new_children)
    else:
        // Replace the child along the copied path
        new_children = copy(node.children)
        new_children[index] = new_child
        return new Internal(node.bitmap, new_children)
This recursive approach handles removal from leaves and collisions, propagates prunings upward, and optionally contracts singletons, ensuring O(log_{32} n) in practice due to bounded depth.
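The bit-clearing and array-compaction step at the heart of pruning can be sketched in Python (an illustrative helper, assuming a bitmap word and a dense children tuple as in the earlier examples):

```python
def prune_child(bitmap, children, slot):
    """Clear `slot`'s bit and drop its entry from the dense child tuple."""
    bit = 1 << slot
    idx = bin(bitmap & (bit - 1)).count('1')   # offset of the removed entry
    return bitmap & ~bit, children[:idx] + children[idx + 1:]

bitmap, children = 0b101, ('c0', 'c2')   # slots 0 and 2 occupied
new_bitmap, new_children = prune_child(bitmap, children, 2)
assert new_bitmap == 0b001 and new_children == ('c0',)
```

Clearing the bit and splicing the tuple keeps the invariant that the array length always equals the bitmap's population count.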

Properties

Advantages

Hash array mapped tries (HAMTs) excel in supporting persistent data structures, where updates produce new immutable versions without modifying the original. This is achieved through path copying, which creates new nodes only along the path from the root to the affected leaf, requiring O(log n) time and space per update and enabling efficient versioning for applications like transactional memory. HAMTs offer superior space efficiency compared to traditional sparse arrays or chained hashing schemes, employing bitmaps to represent populated children in nodes, which avoids allocating space for empty slots. On average, this results in approximately 1.28 words of memory per entry for a typical 32-bit hash with a branching factor of 32, significantly reducing overhead in dense collections. In terms of performance, HAMTs provide predictable access patterns via hash-based indexing, where traversal follows fixed bit slices of the key's hash, loading compact nodes (typically two words) entirely into cache lines. This contrasts favorably with the irregular pointer chasing in the linked lists of traditional chained hash tables, leading to fewer cache misses and faster traversals, with empirical improvements of up to 6.7 times in iteration speed over alternative immutable implementations. For concurrent environments, HAMTs lend themselves to lock-free implementations, utilizing atomic compare-and-swap (CAS) operations on root or node pointers to update the trie without locks, ensuring progress for at least one thread and avoiding contention hotspots. This supports scalable, non-blocking operations in multithreaded settings, as demonstrated in concurrent hash trie variants. Relative to alternatives like persistent red-black trees, HAMTs demonstrate practical superiority for hashable keys, offering constant expected-time operations with simpler node structures and lower memory usage, while maintaining comparable or better performance in immutable contexts without the balancing overhead of tree rotations.

Performance Analysis

The performance of hash array mapped tries (HAMTs) is characterized by average-case constant-time operations for lookup, insertion, and deletion, achieved through a fixed high branching factor that limits trie depth to a small constant regardless of the number of elements stored. With a typical branching factor of 32, using 5 bits of the hash per level on 32-bit hashes, the maximum depth is approximately 6-7 levels, yielding an effective O(1) average for these operations under uniform hash distributions. Worst-case performance can degrade to O(n) in pathological collision scenarios, but good hash functions reduce this to O(log n) in practice, where the logarithm base is the branching factor. Space complexity for HAMTs is O(n), where n is the number of key-value pairs, with efficient node representations using bitmaps to track occupied children, resulting in low overhead constants. For a 32-way branching factor, each internal node requires a 32-bit bitmap (4 bytes) plus pointers to only the present children, leading to approximately 1.28n words of space usage with a resize factor of 4. Optimized variants can reduce memory footprint by up to 64% compared to standard implementations through canonical representations and cache-aware designs. Empirical benchmarks from foundational evaluations demonstrate HAMTs outperforming alternatives like chained hashing and ternary search trees (TSTs). In tests with key sets ranging from 8K to 8M elements, HAMT search times averaged 0.81-1.45 μs, compared to 1.00-2.49 μs for chained hash tables, while insertions were 3.55-5.22 μs versus 2.75-4.51 μs; overall, HAMTs ran 3-4x faster than TSTs while using 60% less space. Modern JVM implementations report 1.3-6.7x iteration speedups and 3-25.4x faster equality checks relative to Clojure's HAMTs, with up to 48% faster iterations over Scala's collections. Key factors influencing HAMT performance include hash function quality and branching-factor selection.
Universal hashing outperforms simpler functions like ELF or PJW by minimizing collisions, with rehashing extending the effective hash bits when needed to bound worst-case depths. A branching factor of 32 suits 32-bit systems for efficient indexing and locality, while 64 is preferable on 64-bit architectures to leverage word-sized operations, though it increases the bitmap size from 32 to 64 bits per node; the choice balances traversal speed against memory overhead, with 32 often yielding 2-5x faster persistent map operations overall.
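The depth bounds quoted above follow from simple arithmetic and can be checked directly (assuming 32-bit hashes consumed in 5-bit slices):

```python
import math

HASH_BITS, BITS_PER_LEVEL = 32, 5
max_depth = math.ceil(HASH_BITS / BITS_PER_LEVEL)
assert max_depth == 7                    # 7 levels exhaust a 32-bit hash

# Expected depth for n uniformly distributed keys is about log_32(n):
expected = math.log(1_000_000, 2 ** BITS_PER_LEVEL)
assert 3.9 < expected < 4.0              # ~4 levels for a million entries
```

Because the logarithm base is 32, even very large maps stay only a handful of levels deep, which is what justifies treating the operations as effectively constant-time.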

Implementations

In Functional Programming Languages

In Clojure, the PersistentHashMap in the clojure.core namespace implements a hash array mapped trie (HAMT) as the foundational structure for persistent hash maps and sets, having been introduced with Clojure version 1.0 on May 4, 2009. This design enables efficient immutable updates while maintaining O(log32 n) access times, and it forms the backbone of Clojure's collection library, where maps are automatically converted to PersistentHashMap instances beyond a small threshold size for optimal performance. Optimizations include support for chunked sequences, which allow iterative operations over map entries to process elements in batches, reducing overhead in functional pipelines. Furthermore, Clojure employs a hash mixing function based on Murmur3 to enhance bit distribution and mitigate poor hash code clustering in the trie, ensuring more uniform key spreading across nodes. PersistentHashMap also integrates seamlessly with Clojure's transducers, a composable model for data transformation that operates directly on the collection's reduction protocol without intermediate realizations, allowing efficient processing of map data in streaming contexts. This alignment with Clojure's emphasis on immutability and functional composition leverages HAMT's persistence benefits, such as sharing structure between versions to minimize memory allocation during updates. In Scala, the immutable HashMap in the standard collections library adopts a compressed HAMT variant (CHAMP) starting from Scala 2.13, providing efficient persistent key-value storage with improved cache locality and reduced memory footprint compared to traditional hash tables. This implementation builds on Phil Bagwell's original prototype from his 2001 work on ideal hash trees, which demonstrated the structure's viability for functional languages. Persistent variants are further available in libraries beyond the standard collections, supporting immutable operations in concurrent and parallel scenarios.
Haskell's unordered-containers library utilizes a HAMT for its HashMap implementation, offering high-performance, hash-based lookups and insertions in a purely functional setting where immutability is inherent to the language. This approach contrasts with Haskell's built-in containers package, which uses ordered structures (balanced binary search trees for Map and a big-endian patricia trie for IntMap); instead, unordered-containers prioritizes average-case O(1) operations via hashing while preserving persistence and structural sharing through shared immutable nodes. The design exploits Haskell's laziness and purity to enable efficient traversal and folding over HAMT nodes without side effects.

In Imperative and Other Languages

In Java and other JVM-based languages, hash array mapped tries (HAMTs) have been optimized for high performance and low memory usage, particularly through adaptations that support mutable operations in imperative contexts. A key advancement is the Compressed Hash-Array Mapped Prefix-tree (CHAMP), which enhances traditional HAMTs with improved cache locality via dual bitmaps for data and node mapping, reducing empty slots and enabling a canonical representation. This design achieves up to 64% memory savings for maps compared to Scala's immutable HashMap and 1.3–6.7x faster iteration speeds, making it suitable for applications requiring efficient immutable collections. CHAMP has been integrated into the Rascal programming language runtime, where it delivers 9.9–28.1x speedups in algorithms like computing control-flow dominators. Custom HAMT implementations in Java, often derived from Phil Bagwell's foundational work, support in-place mutations for imperative use cases, though they typically retain persistence options for hybrid scenarios. In C++, HAMT adaptations emphasize mutable, space-efficient structures to leverage the language's performance-oriented features. A template-based implementation provides constant-time add and delete operations without rehashing, using 32- or 64-bit hashes split into 5-bit segments for bitmap-indexed arrays, resulting in an expected depth of O(log_{32} n). This approach occupies just 8 bytes for an empty 32-bit table, prioritizing low memory overhead. Proposals for standardizing HAMTs in C and C++ containers highlight their advantages over red-black trees, with reference implementations showing faster lookups than std::set for datasets up to 100,000 integers and competitive performance against std::unordered_set when optimized with binary search for collisions. These mutable variants facilitate in-place updates, contrasting with purely persistent designs, and are explored for high-throughput applications like financial data processing.
In Rust, the 'im' crate provides persistent, immutable collections including HashMap and HashSet implemented using HAMTs, enabling efficient structural sharing for functional programming patterns in a systems language. Go implementations of HAMTs offer both persistent and mutable modes to accommodate imperative styles. One library provides transient (in-place mutable) and functional (immutable, persistent) variants, using 32- or 64-bit FNV hashes divided into 5-bit chunks for a branching factor of 32 and a maximum depth of 6–12 levels, ensuring effectively O(1) search and modification. The transient mode enables faster imperative updates without immutability guarantees, while the persistent mode supports concurrency via shared immutable structures. Another implementation focuses on immutable maps and sets for memory efficiency, drawing from JVM precedents like Clojure's. These are used in key-value stores requiring persistent snapshots alongside mutable operations. Imperative adaptations of HAMTs often incorporate mutability through in-place node modifications, avoiding full path copies to reduce overhead in non-persistent scenarios. Concurrent variants employ techniques like fine-grained locks per node or read-copy-update (RCU) for read-mostly workloads, enabling lock-free reads while allowing atomic updates. For instance, lock-free concurrent HAMTs use hazard pointers or epoch-based reclamation to manage shared mutable state safely. In databases and high-performance servers, HAMTs serve as efficient backends for key-value stores, supporting persistent versioning and fast lookups. The IPLD HashMap specification employs HAMTs for content-addressed storage in IPFS, a distributed system handling large-scale data replication across servers with minimal memory duplication during updates. These structures excel in scenarios demanding concurrent access and immutability for reliability, such as snapshot-based querying in distributed databases.
