
Database index

A database index is an optional data structure associated with a database table that speeds up data retrieval operations by providing a fast access path to rows, reducing the need for full table scans. It functions similarly to an index in a book, allowing the database management system (DBMS) to locate specific data efficiently without examining every record. Database indexes are particularly valuable for queries involving frequent searches, joins, sorting, or filtering on specific columns, as they enable the DBMS to seek directly to relevant row locations on disk. Common types include B-tree indexes, which are the default in many systems and support range queries and equality searches on ordered data; bitmap indexes, used for low-cardinality columns to represent multiple rows per value efficiently; hash indexes, suitable for exact-match queries on unordered data; and full-text indexes, designed for searching text content. Specialized variants, such as function-based or spatial indexes, address advanced use cases like computed expressions or geographic data. While indexes significantly enhance read performance—especially for selective queries that access a small fraction of rows—they incur costs in terms of additional storage space and overhead during insert, update, or delete operations, as the index must be updated to reflect changes. Effective design requires balancing query patterns, data distribution, and write frequency to avoid performance degradation, often involving tools for analysis and optimization provided by the DBMS.
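The effect described above can be observed directly in a query planner. The following sketch uses Python's built-in sqlite3 module; the employees table, its columns, and the index name are hypothetical, and the exact plan text varies by SQLite version:

```python
import sqlite3

# In-memory database with a small sample table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees (name, dept) VALUES (?, ?)",
                 [(f"emp{i}", f"dept{i % 10}") for i in range(1000)])

# Without an index, filtering on dept forces a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE dept = 'dept3'").fetchall()

# Creating an index gives the planner a fast access path to matching rows.
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE dept = 'dept3'").fetchall()

print(plan_before[0][3])  # e.g. "SCAN employees"
print(plan_after[0][3])   # e.g. "SEARCH employees USING INDEX idx_emp_dept (dept=?)"
```

The plan switches from a scan of every row to a targeted search through the index, which is the behavior the rest of this article examines in detail.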

Fundamentals

Definition and Core Concepts

A database table consists of rows, each representing a record or entity containing values for predefined columns, serving as the foundational unit for organizing and storing data in a relational database management system (RDBMS). A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index during data modifications. At its core, an index stores key values extracted from one or more indexed columns—often sorted in tree-based indexes—along with pointers or references—such as row identifiers (ROWIDs) or tuple identifiers—to the actual rows in the table. These index entries typically form pairs of (key, row locator), enabling efficient access to the corresponding rows without scanning the entire table. In tree-based indexes, the structure is organized into internal nodes, which contain keys and pointers to child nodes for branching during searches, and leaf nodes, which hold the actual (key, row locator) pairs pointing directly to the data. For heap-organized tables, where rows are stored in an unordered manner without a clustered index dictating physical order, the index exists as a separate structure from the table itself, providing an auxiliary lookup path. The concept of database indexes originated in early hierarchical database systems like IBM's Information Management System (IMS), developed in the 1960s for handling complex data in mainframe environments such as NASA's Apollo program. Indexes evolved significantly in the 1970s with the advent of the relational model, proposed by E. F. Codd, which emphasized declarative querying and integrated indexing to support efficient access in set-based operations on normalized tables.
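The (key, row locator) pairing over a heap-organized table can be sketched in a few lines of Python. This is a conceptual illustration, not how any real DBMS stores entries; the heap positions stand in for ROWIDs:

```python
import bisect

# Heap-organized "table": rows stored in arbitrary (insertion) order.
heap = [
    {"id": 42, "name": "Ada"},
    {"id": 7,  "name": "Grace"},
    {"id": 19, "name": "Edsger"},
]

# Index: sorted (key, row locator) pairs. The locator here is simply the
# row's position in the heap, standing in for a ROWID.
index = sorted((row["id"], pos) for pos, row in enumerate(heap))
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the index entries, then follow the locator to the heap row."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return heap[index[i][1]]
    return None

print(lookup(19))  # {'id': 19, 'name': 'Edsger'}
print(lookup(5))   # None: key absent, no heap scan needed
```

The search touches only the sorted index, and the heap is read exactly once, at the located position—precisely the auxiliary lookup path described above.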

Primary Purposes and Benefits

Database indexes primarily serve to enhance the efficiency of data retrieval operations, particularly for SELECT queries, by enabling targeted lookups instead of exhaustive full scans. Without an index, querying a table requires scanning all rows sequentially, which has a time complexity of O(n), where n is the number of records. In contrast, structures like B+-trees, commonly used in database indexes, reduce this to O(log n) by traversing a balanced tree to locate relevant pointers, significantly accelerating access for equality, range, and other selective searches. Key benefits include substantial reductions in input/output (I/O) operations, as indexes minimize the number of disk pages read—often limiting accesses to a logarithmic number of nodes rather than the entire table—and support faster sorting and grouping by maintaining ordered key values. For instance, in a B+-tree index with a fanout of 100 for a million-row table, a search involves at most about 4 disk accesses compared to potentially millions in a full table scan. These gains are most pronounced with high-selectivity indexes, where the indexed attribute has many distinct values, filtering to few rows per key and maximizing the efficiency of query execution. Low-selectivity attributes, however, may yield limited benefits due to the need for additional post-filtering. Beyond performance, unique indexes enforce uniqueness constraints, such as for primary keys, by preventing duplicate entries. Indexes also support referential integrity for foreign key relationships by enabling efficient checks to ensure valid references between tables. This automatic enforcement maintains relational consistency without additional application logic. However, these advantages come at a space-time trade-off: indexes incur additional storage overhead plus maintenance costs during inserts, updates, and deletes that can slow write operations, though the query speedups often amortize these over repeated reads.
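The fanout arithmetic behind the "about 4 disk accesses" figure can be checked directly. This small calculation assumes the numbers from the example above (fanout 100, one million rows):

```python
rows = 1_000_000
fanout = 100  # pointers per B+-tree node, matching the example above

# Number of tree levels needed so that fanout**height >= rows.
height = 0
capacity = 1
while capacity < rows:
    capacity *= fanout
    height += 1

print(height)      # 3 levels: 100**3 entries reachable
print(height + 1)  # about 4 page reads including the final data page
print(rows)        # versus up to 1,000,000 row reads for a full scan
```

With fanout 100, three levels already address 100³ = 1,000,000 entries, so a point lookup touches roughly 4 pages in total—the logarithmic behavior that makes B+-tree indexes effective at scale.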

Usage in Database Operations

Accelerating Query Lookups

Database indexes accelerate query lookups by enabling efficient data retrieval without scanning entire tables, primarily through two mechanisms: index seeks and index scans. An index seek involves traversing the index structure, such as a B+-tree, to directly locate specific key values, which is ideal for exact-match or point queries. This process starts at the root node and follows pointers down to the leaf level, minimizing disk I/O by accessing only the necessary branches and leaves. In contrast, an index scan retrieves a range of values by scanning the leaf nodes sequentially after locating the starting point via a seek, which is more suitable for range queries like those using greater-than or less-than operators. For instance, in a clustered index on a salary column, a seek finds the first entry above a threshold, followed by a scan of consecutive leaves to gather all qualifying records. The query optimizer plays a central role in deciding when to use indexes by employing cost-based analysis to estimate execution plans. It evaluates factors such as index selectivity, table size, and available statistics from the system catalog to choose between index seeks, index scans, full table scans, or index-only scans, where all required data is fetched directly from the index without accessing the base table. An index-only scan, also known as a covering index operation, occurs when the index includes all attributes needed for the query, such as selecting a non-key column included in the index, thereby avoiding costly row lookups. For example, consider the SQL query SELECT name FROM employees WHERE department_id = 5; on a table with a non-clustered index on department_id that includes the name column: the optimizer performs an index seek to find matching keys, retrieves names directly from the index leaves, and returns results without probing the base table. Indexes also enhance join operations by accelerating lookups in nested loop and hash joins.
In an index nested loop join, the outer relation's tuples probe an index on the inner relation's join column, enabling seeks to fetch only matching inner tuples rather than scanning the entire inner table. This is particularly effective for selective joins where the outer relation is small, with costs scaling as the number of outer tuples times the index probe cost. Hash joins can similarly benefit if indexes support hash-based partitioning, though nested loops often leverage tree indexes for precise matches. Key performance metrics for index-accelerated queries include index hit rate, which measures the proportion of index accesses satisfied from memory rather than disk, and buffer pool cache efficiency, which tracks how effectively the database's in-memory buffer pool retains frequently accessed index pages to reduce I/O latency. High index hit rates, often above 90% in optimized systems, indicate effective caching and can reduce query times by orders of magnitude compared to disk fetches.
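An index nested loop join can be made visible in SQLite's plan output. In this sketch (hypothetical tables and index names), CROSS JOIN is used only because SQLite treats it as a hint that pins the join order, forcing departments to be the outer relation; the planner then probes the inner table's index per outer row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    CREATE INDEX idx_emp_dept ON employees (dept_id);
""")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(i, f"d{i}") for i in range(5)])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(i, f"e{i}", i % 5) for i in range(100)])

# For each outer (departments) row, the planner seeks into the inner
# table's index on the join column instead of scanning all employees.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT d.name, e.name
    FROM departments d CROSS JOIN employees e ON e.dept_id = d.id
""").fetchall()
for row in plan:
    print(row[3])  # e.g. "SCAN d" then "SEARCH e USING INDEX idx_emp_dept (dept_id=?)"
```

The inner-side SEARCH line is the index probe described above: its cost is paid once per outer tuple rather than once per inner row.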

Enforcing Data Constraints

Database indexes play a crucial role in enforcing constraints during insert, update, and delete operations by providing efficient mechanisms to validate proposed changes against predefined rules. When a write operation occurs, the database management system (DBMS) traverses the relevant index to check compliance; if a violation is detected, the operation is rejected, and the transaction is typically rolled back to maintain data integrity. Unique indexes are essential for implementing UNIQUE and PRIMARY KEY constraints, which prevent duplicate values in specified columns or combinations thereof. For UNIQUE constraints, the DBMS automatically creates a unique index (often a B-tree structure) on the constrained columns to enforce non-duplication; during an insert or update, the system searches the index for an existing match, rejecting the operation if one is found. PRIMARY KEY constraints combine this uniqueness with a NOT NULL requirement, leveraging the same unique index for duplicate checks while separately enforcing the null prohibition. For instance, in a users table, a unique index on the email column ensures that no two rows share the same email address by validating against the index on every insert or update to that field. Indexes also support foreign key (referential integrity) enforcement by accelerating the verification of relationships between tables. While the referenced column in the parent table typically already has a primary key or unique index for efficient lookups during child table inserts or updates, an index on the foreign key column in the child table optimizes cascading operations, such as deletes or updates from the parent that propagate to the child. Without such an index, referential checks—scanning for matching rows in the child table—can become inefficient, potentially leading to full table scans; thus, manually creating an index on foreign key columns is recommended to ensure quick identification and handling of dependent records.
Check constraints, which enforce domain-specific rules like value ranges or patterns on individual rows, do not directly rely on indexes for enforcement, as validation involves evaluating a Boolean expression per affected row during writes. However, for complex check constraints that incorporate subqueries or lookups against other tables (e.g., ensuring a value exists in a referenced table), indexes on the involved columns can indirectly speed up the validation process by enabling faster data retrieval. In such cases, the DBMS uses the index to efficiently resolve the subquery, reducing the overhead of constraint evaluation without altering the core per-row logic.
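The unique-index enforcement path can be demonstrated with SQLite, which creates an implicit index for a UNIQUE column and rejects duplicates at write time (the users table and email values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is enforced through an automatically created unique index.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")

try:
    conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The index probe found an existing key, so the write is rejected.
    duplicate_rejected = True

print(duplicate_rejected)  # True

# The implicit unique index is visible in the catalog.
idx = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='users'"
).fetchall()
print(idx)  # e.g. [('sqlite_autoindex_users_1',)]
```

No application-level duplicate check is needed: the constraint is validated by an index search on every insert or update, exactly as described above.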

Index Types

Bitmap Indexes

A bitmap index is a specialized database index structure that employs bit vectors, known as bitmaps, to represent the presence or absence of distinct values in a column across all rows of a table. For each unique value in the indexed column, a separate bitmap is created, where each bit corresponds to a specific row in the table; a bit set to 1 indicates that the row contains that value, while 0 indicates it does not. This design replaces traditional lists of row identifiers (rowids) with compact bit arrays, and a mapping function translates bit positions back to actual rowids for data retrieval. For instance, in a table with 1 million rows and a column having 100 distinct values, the index consists of 100 bitmaps, each comprising 1 million bits, resulting in a total raw size of approximately 12.5 MB before compression. Bitmap indexes are created on columns exhibiting low cardinality, meaning a small number of distinct values relative to the total row count, such as gender (e.g., 'M', 'F'), marital status, or product categories. The creation process involves a SQL statement like CREATE BITMAP INDEX, which builds the bitmaps by scanning the table and setting bits accordingly; this is particularly efficient for large tables due to the index's simple physical structure compared to tree-based indexes. To optimize storage, bitmaps are often compressed using techniques like run-length encoding (RLE), which exploits runs of identical bits (e.g., sequences of 0s or 1s) common in low-cardinality columns, reducing the index size to a fraction of the uncompressed form—sometimes as low as 2-5% of the original bitmap length in analytical datasets. Query operations on bitmap indexes leverage bitwise logical operations for efficient filtering. Single-column equality or membership queries involve selecting the corresponding bitmap and identifying set bits to retrieve matching rows.
For multi-column queries, bitmaps from different indexes are combined using AND (intersection for conjunctive conditions), OR (union for disjunctive conditions), or XOR operations, enabling rapid evaluation of complex WHERE clauses without scanning the entire table. Aggregations, such as COUNT(DISTINCT) or grouping by low-cardinality attributes, benefit from bitmap intersections to compute cardinalities directly from bit counts, avoiding row-by-row processing. These operations are particularly performant in parallel environments, as bitmaps can be divided across processors. The primary advantages of bitmap indexes lie in their space efficiency and suitability for online analytical processing (OLAP) workloads, where queries are read-heavy and involve aggregations over large datasets. In data warehouses employing star schemas, bitmap indexes on dimension table columns (e.g., customer demographics or time periods) facilitate fast joins and filters against massive fact tables, reducing query times from hours to seconds by minimizing I/O through compressed bitmaps. For example, on a column with three values across millions of rows, a bitmap index occupies about 570 KB versus 13 MB for a comparable B-tree index, while supporting dynamic ad-hoc queries with multiple conditions. Unlike dense indexes that store entries for every row, bitmap indexes are inherently sparse in representation for low-cardinality data, emphasizing conceptual efficiency over pointer density. Despite these benefits, bitmap indexes have notable drawbacks, particularly their poor performance on high-cardinality columns (e.g., unique identifiers) where the number of bitmaps approaches the row count, leading to excessive storage and slower construction.
They are also unsuitable for online transaction processing (OLTP) environments with frequent updates, inserts, or deletes, as modifying a single row requires updating multiple bitmaps, potentially causing locking contention and deadlocks in concurrent scenarios; Oracle recommends them only for environments with batch updates, such as nightly loads in data warehouses.
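The bitmap mechanics above can be sketched with Python integers serving as bit vectors; bit i of each value's bitmap marks whether row i holds that value, and a conjunctive filter is a single bitwise AND (column contents are hypothetical):

```python
# Low-cardinality column: one bitmap (a Python int) per distinct value.
rows = ["M", "F", "F", "M", "F", "M", "M", "F"]

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)  # set bit i for this value

def matching_rows(bitmap):
    """Decode set bits back into row positions (the rowid mapping function)."""
    return [i for i in range(len(rows)) if bitmap >> i & 1]

# Single-value filter: read the set bits of one bitmap.
print(matching_rows(bitmaps["F"]))  # [1, 2, 4, 7]

# Conjunctive filter across two indexed columns uses bitwise AND.
status = ["active", "active", "inactive", "active",
          "active", "inactive", "active", "active"]
status_maps = {}
for i, value in enumerate(status):
    status_maps[value] = status_maps.get(value, 0) | (1 << i)

combined = bitmaps["F"] & status_maps["active"]  # WHERE gender='F' AND status='active'
print(matching_rows(combined))  # [1, 4, 7]
```

Counting set bits in `combined` also yields the aggregate row count without touching any row data, which is why OLAP workloads benefit so strongly from this layout.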

Dense and Sparse Indexes

A dense index provides an index entry for every row in the table, irrespective of the data's physical order, ensuring complete coverage and a direct pointer to each record's location. This structure facilitates rapid lookups for exact matches and supports range queries efficiently by allowing sequential traversal through the index entries. However, the comprehensive nature of dense indexes leads to higher storage overhead and increased maintenance costs during insertions, deletions, or updates, as every change may require modifying the index. In contrast, a sparse index contains entries only for a subset of rows, typically one entry per block, page, or group of sorted records, which assumes the underlying file is ordered by the index key. This approach points to the starting position of each block rather than individual records, enabling the system to scan sequentially within the block to find the target row. Sparse indexes are commonly employed in scenarios where data is clustered and sorted, such as in primary indexes, reducing the overall index size and maintenance effort compared to dense variants. The primary distinction between dense and sparse indexes lies in their balance of access speed and storage: dense indexes excel in unsorted files for point queries due to their exhaustive coverage, while sparse indexes suit sorted, clustered files for range operations by minimizing storage—often using significantly less space, especially in large tables where block-level entries can reduce the index footprint by factors related to block size. Dense indexes are preferable for small tables or applications with frequent exact-match searches, whereas sparse indexes are advantageous for bulk sequential access in voluminous, ordered datasets. Historically, sparse indexes gained prominence in early database and file systems, particularly through the Indexed Sequential Access Method (ISAM), where they indexed the highest key per track or cylinder to support efficient retrieval in relatively static environments without excessive overflow handling.
In modern database management systems, dense indexes dominate secondary indexing on non-clustered tables for versatility, while sparse designs persist in clustered index architectures to leverage data ordering for optimized performance. Sparse indexes are often integrated with clustered index structures to enhance efficiency in sorted data layouts.
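A sparse index over a sorted, block-organized file can be sketched as follows: one entry per block (the block's lowest key) locates the candidate block, and a short scan inside the block finishes the lookup. The block size and key values here are arbitrary choices for illustration:

```python
import bisect

# Sorted (clustered) file divided into fixed-size blocks.
records = list(range(0, 1000, 3))  # key-ordered rows: 0, 3, 6, ..., 999
BLOCK = 50
blocks = [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]

# Sparse index: one entry per block, holding the block's lowest key.
sparse = [b[0] for b in blocks]

def lookup(key):
    """Locate the candidate block via the sparse index, then scan inside it."""
    pos = bisect.bisect_right(sparse, key) - 1
    if pos < 0:
        return None
    return key if key in blocks[pos] else None

print(lookup(300))  # 300 (a multiple of 3, so present)
print(lookup(301))  # None (falls between stored keys)
print(len(sparse), "index entries cover", len(records), "records")
```

A dense index over the same file would need 334 entries, one per record; the sparse variant gets by with 7, trading a short in-block scan for a much smaller index, which only works because the file is already ordered by the key.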

Inverted and Reverse Indexes

An inverted index is a data structure that maps terms from a collection of documents to the locations where those terms appear, enabling efficient full-text search and retrieval. Unlike forward indexes that map documents to their contents, inverted indexes reverse this mapping, storing a dictionary of unique terms alongside postings lists that record document identifiers (docIDs) and optionally term positions or frequencies within those documents. This structure supports rapid query processing, such as identifying all documents containing specific words or phrases. The core components include the term dictionary, a sorted list of unique terms often stored in memory for fast lookup, and the postings lists, which are disk-based lists of docIDs sorted in ascending order to facilitate efficient intersections during multi-term queries. For example, in a corpus of Shakespeare's works, the term "Brutus" might map to a postings list like {2, 6, 13, 16, 23}, indicating the docIDs where it appears, with additional position data such as {2:<1,4,17>, 6:<2,5>} to support phrase queries like "Brutus AND Caesar." Construction begins with tokenization, which breaks text into individual terms by removing punctuation and splitting on whitespace—for instance, converting "Friends, Romans, countrymen" to "Friends Romans countrymen." Stemming then normalizes variants to a common root, such as reducing "organize," "organizes," and "organizing" to "organ," potentially shrinking the vocabulary by about 17% while preserving query relevance. The process sorts and merges term-docID pairs to build the index, often using gap encoding in postings lists to compress consecutive docIDs (e.g., docIDs 1, 2, 4 become gaps 1, 1, 2). Inverted indexes power full-text search in systems like Apache Lucene, an open-source library that uses them to index and query large text corpora with support for ranked retrieval, proximity searches, and field-specific queries. In Lucene, the inverted index enables sub-second searches over billions of documents by combining term dictionaries with compressed postings for scalability.
To optimize storage and query speed, postings lists employ compression techniques such as variable-byte encoding, which represents integers with 7-bit payloads per byte, reducing the Reuters-RCV1 corpus index from 250 MB uncompressed to 116 MB. Skip lists further accelerate operations like intersecting postings for multi-term queries by adding pointers that allow jumping over irrelevant docIDs, achieving roughly √P skips for a list of length P and cutting intersection time significantly for common query patterns. A reverse index, also known as a reverse key index, reverses the bytes of key values in an index to alter sort behavior, particularly useful in composite indexes for prioritizing range scans on rightmost columns. In systems like Google Cloud Bigtable, where row keys are composite strings sorted lexicographically, byte reversal on earlier components groups data to favor scans on later ones—for example, storing a composite key (last_name, first_name) as (reversed(first_name bytes), last_name) shifts the primary sort order to last_name while maintaining first_name order within groups via reversal decoding during queries. This technique enables efficient range scans on the prioritized column without restructuring the entire key schema; for instance, in time-series data, appending a reversed timestamp to a user ID (e.g., user123#reversedTimestamp) allows scans from most recent to oldest by querying suffixes, spreading writes across nodes and avoiding hotspots from monotonic keys. Applications include wide-column databases for multi-column queries, such as retrieving records by last-name ranges in a user directory while secondarily filtering by reversed first names to support descending order within matches.
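The build-then-intersect workflow for an inverted index can be sketched in a few lines. The tiny document collection is invented for illustration, tokenization is a simple regex, and stemming is omitted for brevity:

```python
import re
from collections import defaultdict

docs = {
    1: "Friends, Romans, countrymen, lend me your ears",
    2: "Brutus is an honourable man",
    3: "I come to bury Caesar, not to praise him",
    4: "Brutus and Caesar fought",
}

# Build: tokenize each document, then map term -> sorted postings list of docIDs.
index = defaultdict(list)
for doc_id in sorted(docs):
    for term in set(re.findall(r"[a-z]+", docs[doc_id].lower())):
        index[term].append(doc_id)

def intersect(p1, p2):
    """Merge two sorted postings lists: the AND of two query terms."""
    out, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

print(index["brutus"])                              # [2, 4]
print(intersect(index["brutus"], index["caesar"]))  # [4]
```

Because both postings lists are kept in ascending docID order, the AND query is a single linear merge; this ordering is also what makes gap encoding and skip pointers effective on real postings.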

Primary and Secondary Indexes

In relational databases, a primary index is established on the primary key of a table, which serves as a unique identifier for each row and dictates the physical or logical ordering of the data records in the file. This index is inherently unique and enforces non-null values, ensuring that no duplicate or missing entries exist, thereby maintaining entity integrity. Often implemented as a clustered index, the primary index organizes rows sequentially by the primary key value, allowing efficient lookups and range queries aligned with the key's order. In contrast, a secondary index is created on non-primary key columns to provide auxiliary access paths for queries that do not involve the primary key. Unlike the primary index, secondary indexes do not determine the physical order of the table and can be multiple per table, pointing to the actual rows via references to the primary key values or row identifiers (RIDs). For instance, in a user table with a primary index on an ID column, a secondary index on the email column would map email values to corresponding ID values, enabling quick lookups by email without scanning the entire table. The relationship between primary and secondary indexes is interdependent: secondary index entries store primary key values to facilitate navigation to the correct row location, leveraging the primary index for final record retrieval. While primary indexes are always unique due to their tie to the primary key, secondary indexes may be unique (e.g., on an email column) or non-unique, allowing duplicates for attributes like categories or names. In transactional systems, the primary index plays a crucial role in supporting ACID properties, particularly consistency and isolation, by enforcing uniqueness and supporting locking during transactions to prevent conflicts and ensure reliable data modifications.
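The two-step navigation described above—secondary index to primary key, then primary index to row—can be sketched with two dictionaries (the user data is hypothetical):

```python
# Primary index: unique primary key -> row.
table = {
    101: {"id": 101, "email": "ada@example.com",   "name": "Ada"},
    102: {"id": 102, "email": "grace@example.com", "name": "Grace"},
}

# Secondary index on a non-key column: its entries store primary key values,
# not physical row locations, so retrieval finishes through the primary index.
email_index = {row["email"]: pk for pk, row in table.items()}

def find_by_email(email):
    pk = email_index.get(email)                   # step 1: secondary index lookup
    return table[pk] if pk is not None else None  # step 2: primary index lookup

print(find_by_email("grace@example.com")["name"])  # Grace
print(find_by_email("none@example.com"))           # None
```

Storing the primary key rather than a physical location means the secondary index stays valid even if rows move (for example, after a page split in a clustered table), at the cost of the extra primary-index hop on every lookup.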

Hash Indexes

Hash indexes employ a hash table data structure to map search keys to data records through a hash function, which computes an index into an array of buckets or slots. Each bucket can hold multiple entries via chaining, typically using overflow pages or linked lists to resolve collisions when multiple keys hash to the same slot. This organization enables direct access to records without traversing ordered structures, making hash indexes particularly efficient for point queries. In implementations like PostgreSQL, the index stores only 4-byte hash values of the indexed column rather than the full key, allowing indexing of large data types such as UUIDs or URLs without size restrictions. Static hashing variants allocate a fixed number of primary buckets at creation, with the hash function designed to distribute keys evenly across these buckets. When a bucket overflows due to insertions, additional records are appended to overflow chains, which can degrade performance if chains grow long. To address capacity issues from data growth, the entire index may require periodic reorganization, involving a full rehash of all entries into a larger set of buckets; this process is computationally expensive and disrupts availability. As described in standard database textbooks, static hashing suits scenarios with predictable, stable data volumes but incurs trade-offs in maintenance overhead similar to static file organizations like ISAM. Dynamic hashing techniques, such as linear hashing, mitigate these limitations by allowing the index to grow or shrink incrementally without full rehashing. Introduced by Witold Litwin in 1980, linear hashing uses a family of hash functions that evolve with the structure's size, employing a split pointer to designate the next bucket for division when the load factor exceeds a threshold, typically 80-90%.
New buckets are appended sequentially, and records are redistributed only from the split bucket, ensuring average access times remain near constant—often 1-2 disk accesses for searches and insertions under typical bucket capacities of 50 or more. This pseudo-partitioning approach supports high load factors and minimal directory overhead, using just a few bytes of metadata to track the current level and split position. Operations on hash indexes achieve average O(1) time complexity for equality-based lookups, as the hash function directly computes the bucket location, followed by a linear scan of any overflow chain. Insertions and deletions similarly involve hashing to the target bucket and updating the chain, though deletions may leave dead entries that require periodic cleanup via operations like VACUUM to reclaim space and prevent fragmentation. However, hash indexes do not support range queries, inequality comparisons, or sorting, as the hashed values disrupt key ordering; attempts to use them for such operations revert to full table scans. Concurrency control in hash indexes often relies on bucket-level locking to manage splits and insertions safely. Hash indexes find prominent use in in-memory databases for fast equality lookups on unique identifiers, such as user IDs in caching layers. For instance, Redis leverages hash tables as a core data type to store field-value pairs under a single key, enabling sub-millisecond retrievals for session data or configuration objects in high-throughput applications. In disk-based systems like SQL Server's memory-optimized tables, hash indexes provide fixed-memory footprints for OLTP workloads dominated by point queries, avoiding the overhead of ordered indexes. Key limitations include the inability to handle range scans or ordered traversal, leading to inefficiencies in analytical queries, and potential space wastage from uneven hash distributions or residual dead tuples after deletions, which can inflate storage until maintenance runs.
High collision rates from poor hash functions may also increase chain lengths, approaching worst-case linear performance, though modern implementations mitigate this with tunable bucket counts and quality hash algorithms.
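A static hash index with chaining can be sketched as a fixed bucket array; the class, bucket count, and sample keys are all hypothetical, and Python's built-in hash stands in for the DBMS's hash function:

```python
class HashIndex:
    """Static hash index: fixed bucket array with chaining for collisions."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function computes the slot directly: O(1) on average.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, rowid):
        self._bucket(key).append((key, rowid))

    def lookup(self, key):
        # Hash to the bucket, then linearly scan its (ideally short) chain.
        return [rid for k, rid in self._bucket(key) if k == key]

idx = HashIndex()
for rowid, user in enumerate(["ada", "grace", "edsger", "ada"]):
    idx.insert(user, rowid)

print(idx.lookup("ada"))      # [0, 3]
print(idx.lookup("missing"))  # []
# Range queries are unsupported: keys are scattered across buckets in hash
# order, so a predicate like user > 'b' would have to visit every bucket.
```

The fixed bucket count is exactly the static-hashing limitation discussed above: growth beyond the planned volume lengthens chains until a full rehash, which is what linear hashing's incremental splits avoid.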

Index Architectures

Clustered Indexes

A clustered index determines the physical order of rows in a table, storing the rows themselves in the leaf nodes of the index structure rather than pointers to separate data pages. In systems like SQL Server, when a table has a clustered index, it is referred to as a clustered table, with data rows sorted and stored based on the index key values. Similarly, in MySQL's InnoDB storage engine, every table possesses a clustered index that holds the row data, typically aligned with the primary key. Only one clustered index is permitted per table, as the physical data ordering can exist in only a single sequence. The structure of a clustered index is commonly implemented as a B+-tree, where non-leaf nodes contain index keys and pointers to child nodes, while leaf nodes store the complete data rows in sorted order by the index key. This design ensures that traversing the tree leads directly to the physical location of the row data, eliminating the need for additional lookups. In SQL Server's rowstore indexes, the B-tree structure facilitates this by placing data rows at the leaf level. InnoDB follows a comparable B-tree organization, with the clustered index pages containing all user-defined columns plus system fields like transaction IDs. Clustered indexes provide significant benefits for query performance, particularly in range scans and sequential access patterns, as the sorted physical order enables efficient retrieval without random I/O jumps. For instance, in time-series data sorted by timestamp as the clustered key, queries filtering recent dates can leverage sequential page reads for faster processing. This architecture often reduces disk I/O compared to non-indexed or heap-organized tables, especially for large datasets. Clustered indexes are typically created implicitly when defining a PRIMARY KEY constraint, such as via ALTER TABLE ADD PRIMARY KEY in SQL Server, which enforces uniqueness and non-null values while organizing the table. Manual creation is possible using CREATE CLUSTERED INDEX statements for non-primary keys, though primary keys are the standard choice.
In InnoDB, the clustered index forms automatically on the primary key; absent that, it defaults to the first unique non-null index or a generated row ID. Despite their advantages, clustered indexes introduce trade-offs in maintenance operations. Insertions and deletions can be slower because they require maintaining the physical sort order, potentially causing page splits when new data exceeds page capacity—InnoDB, for example, typically leaves about 1/16th of each page free to mitigate this, but frequent changes still increase overhead. The restriction to one clustered index per table limits flexibility for multiple access patterns, necessitating careful key selection to align with predominant queries.
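A clustered-style organization can be observed in SQLite, where a WITHOUT ROWID table stores its rows inside the primary-key B-tree itself. The readings table and values below are hypothetical; note the rows are inserted out of order but come back in key order from a range query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A WITHOUT ROWID table keeps full rows in the primary-key B-tree,
# a clustered-index-style layout.
conn.execute("""
    CREATE TABLE readings (
        ts INTEGER PRIMARY KEY,   -- clustering key: rows kept in ts order
        value REAL
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(t, t * 0.5) for t in (5, 1, 9, 3, 7)])

# A range query over the clustering key walks sequential leaf entries,
# and results arrive already sorted by the key with no ORDER BY.
rows = conn.execute("SELECT ts FROM readings WHERE ts >= 3").fetchall()
print(rows)  # [(3,), (5,), (7,), (9,)]
```

The ordered result falls out of the storage layout rather than a sort step, which is the same property that makes timestamp-clustered time-series tables cheap to scan by date range.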

Non-Clustered Indexes

A non-clustered index is a structure separate from the base table, typically implemented as a B+ tree, where the leaf nodes contain the index key values along with row locators that point to the actual rows. The row locator is either a row identifier (RID), which consists of the file, page, and slot location for heap-organized tables, or the clustered index key for tables with a clustered index. This design allows the index to reference rows without reorganizing the table's physical storage. Relational database systems support multiple non-clustered indexes on a single table, often up to dozens in practice and technically as many as 999 in systems like SQL Server, enabling optimized access for various query patterns. For instance, a table of customer records might have one non-clustered index on the name column for name-based searches and another on the city column for location-based queries. In systems such as PostgreSQL, similar indexes can be created on multiple columns without limit beyond storage constraints. To retrieve data using a non-clustered index, the database engine performs an index seek to locate the relevant keys in the index structure, followed by a key lookup (also known as a bookmark lookup) to fetch the full row from the table using the row locators. This two-step process efficiently supports equality and range queries on indexed columns. In PostgreSQL, the ctid system column serves as the row locator for this purpose. Non-clustered indexes offer flexibility by allowing indexing on columns that do not determine the table's physical order, such as attributes used solely for filtering or joining, without affecting the underlying table's storage layout. They enable rapid data access for diverse workloads while preserving the table as a heap or maintaining separate ordering via a clustered index.
However, non-clustered indexes introduce overhead, including additional storage for the index structure and the row locators, which can duplicate clustering keys across multiple indexes in systems like SQL Server, effectively acting as hidden columns in the index leaves. Maintenance during insert, update, and delete operations further contributes to CPU and I/O costs, as each affected index must be updated separately.
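The "one index per query pattern" idea can be shown in SQLite, where two separate non-clustered indexes on one table each serve a different predicate (table, column, and index names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# Two independent non-clustered indexes on the same base table.
conn.execute("CREATE INDEX idx_cust_name ON customers (name)")
conn.execute("CREATE INDEX idx_cust_city ON customers (city)")
conn.executemany("INSERT INTO customers (name, city) VALUES (?, ?)",
                 [(f"n{i}", f"c{i % 20}") for i in range(500)])

# Each query pattern is served by its own index.
plan_name = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE name = 'n7'").fetchall()
plan_city = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE city = 'c3'").fetchall()
print(plan_name[0][3])  # e.g. SEARCH customers USING INDEX idx_cust_name (name=?)
print(plan_city[0][3])  # e.g. SEARCH customers USING INDEX idx_cust_city (city=?)
```

Each additional index speeds up its matching query but adds its own maintenance cost: every insert into customers must now update the heap plus both index trees.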

Covering Indexes

A covering index is a type of database index that includes all columns required to satisfy a specific query, allowing the database management system (DBMS) to retrieve the necessary data directly from the index without accessing the underlying table or clustered index. This feature extends nonclustered indexes by incorporating non-key columns, often referred to as "included columns," which store the actual data values at the leaf level of the index structure. In its structure, a covering index consists of key columns used for indexing and sorting, with additional non-key columns appended only to the leaf nodes to minimize overhead. For instance, in SQL Server, the syntax employs an INCLUDE clause, such as CREATE NONCLUSTERED INDEX IX_Employee_LastName ON Employees (LastName) INCLUDE (FirstName, Email);, where LastName serves as the key and FirstName and Email are included for coverage. Similarly, PostgreSQL supports this via the INCLUDE clause in B-tree indexes since version 11, as in CREATE INDEX idx_employee_lastname ON employees (lastname) INCLUDE (firstname, email);, enabling index-only scans. These included columns do not affect the index's uniqueness or search efficiency but provide complete query resolution. The primary benefit of covering indexes is enabling covered queries, or "index-only scans," which eliminate the need for additional row lookups, thereby reducing input/output (I/O) operations and improving query performance. For example, in SQL Server, this can reduce logical reads from 14 to 4 for a covered query, representing a substantial I/O decrease, and lower estimated operator costs from around 12 to under 1 in execution plans. Such optimizations are particularly effective for read-heavy workloads on frequently queried data, where visibility information confirms row accessibility without heap fetches.
However, covering indexes introduce limitations, including increased storage requirements due to the inclusion of extra columns, which can lead to index bloat and higher maintenance costs during inserts, updates, or deletes. Not all DBMS fully support dedicated INCLUDE clauses; for example, PostgreSQL before version 11 relies on wider multicolumn indexes to approximate coverage, and support is limited to specific index types, such as B-trees, in both SQL Server and PostgreSQL. Over-inclusion of columns can also exceed index size limits, such as 900 bytes for keys in SQL Server or tuple size constraints in PostgreSQL. Detection of covering index usage occurs in query execution plans, where an "Index Seek" or "Index Scan" appears without an accompanying "Key Lookup," "Table Scan," or heap access, confirming that the query was fully resolved via the index. In PostgreSQL, plans explicitly show "Index Only Scan" when conditions are met.
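The plan-based detection described above can be observed directly with SQLite, used here as a lightweight stand-in for the commercial systems discussed. SQLite has no INCLUDE clause, so a plain composite index on all queried columns approximates a covering index; the table, index, and data below are hypothetical.

```python
import sqlite3

# Hypothetical schema mirroring the Employees example above. A composite index
# on (lastname, firstname, email) keeps every queried column in the index
# leaves, so the query can be answered without touching the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, lastname TEXT, "
             "firstname TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_emp_cover ON employees (lastname, firstname, email)")
conn.executemany("INSERT INTO employees (lastname, firstname, email) VALUES (?, ?, ?)",
                 [("Smith", "Ann", "ann@example.com"), ("Jones", "Bo", "bo@example.com")])

# EXPLAIN QUERY PLAN reveals whether the index alone satisfies the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT firstname, email FROM employees WHERE lastname = ?",
    ("Smith",)).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # mentions "COVERING INDEX" when no table access is needed
```

The phrase "USING COVERING INDEX" in the plan detail plays the same role as the absence of a "Key Lookup" operator in a SQL Server plan or an "Index Only Scan" node in PostgreSQL.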

Design Considerations

Column Selection and Order

Selecting columns for indexing involves prioritizing those that frequently appear in query predicates, particularly WHERE clauses, to maximize query performance while minimizing storage and maintenance overhead. High-selectivity columns, which have a large number of distinct values relative to the total rows (enabling the index to filter out most rows), are ideal candidates. For instance, a column like employee ID with unique or near-unique values offers high selectivity, whereas a column with only two values has low selectivity and is generally unsuitable for standalone indexing. In composite indexes, the order of columns follows the leftmost prefix principle, where the database can utilize the index only if the query conditions start from the leading column(s). Equality conditions (=) should precede range conditions (>, <, BETWEEN) to allow efficient scans after pinpointing exact matches. For example, an index on (department, salary) supports queries like WHERE department = 'IT' AND salary > 50000 effectively, as the equality on department enables a range scan on salary; however, a query on salary > 50000 alone cannot use this index without scanning the entire structure. This equality-sort-range pattern—placing equality filters first, followed by sort columns, with range predicates last—optimizes for common query patterns. Database management systems (DBMS) provide tools like histograms and column statistics to assess selectivity during index design. Histograms capture the distribution of values in a column, enabling the query optimizer to estimate how many rows a predicate will return, which is particularly useful for range-based queries. For example, collecting histogram statistics via commands like RUNSTATS in Db2 helps identify skewed distributions that affect selectivity estimates. Avoid indexing low-cardinality columns in isolation, as they lead to full index scans with little filtering benefit. Common pitfalls include over-indexing small tables, where the overhead of traversing the index structure may exceed the cost of a direct table scan, potentially degrading performance.
Additionally, indexing computed columns—derived expressions like total = price * quantity—requires them to be deterministic and persisted in some DBMS to avoid recomputation costs during queries and updates, but this can introduce blocking or maintenance challenges if not managed carefully.
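The selectivity heuristic described above (distinct values divided by total rows) can be sketched directly against a table. The schema and data here are hypothetical; SQLite stands in for any SQL database.

```python
import sqlite3

# Minimal selectivity estimate: distinct / total. Values near 1.0 suggest a
# strong standalone index candidate; values near 0 suggest a low-cardinality
# column that will filter out few rows.
def selectivity(conn, table, column):
    distinct = conn.execute(f"SELECT COUNT(DISTINCT {column}) FROM {table}").fetchone()[0]
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, active INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(i, i % 2) for i in range(1000)])

print(selectivity(conn, "emp", "emp_id"))  # 1.0   (unique values: ideal candidate)
print(selectivity(conn, "emp", "active"))  # 0.002 (two values: poor candidate)
```

Real optimizers refine this single number with histograms, as noted above, so that skewed distributions do not mislead the row-count estimate.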

Composite and Multi-Column Indexes

A composite index, also known as a multi-column or concatenated index, is created on two or more columns of a table, treating the combination of column values as a single composite key for indexing purposes. In standard SQL, such an index is defined using the CREATE INDEX statement by listing the columns in a specific order within parentheses, for example: CREATE INDEX idx_example ON table_name (column1, column2);. This structure allows the database management system (DBMS) to organize the index entries in a sorted manner based on the concatenated keys, enabling efficient access for queries that reference the indexed columns. The effectiveness of composite indexes relies on prefix matching, where the index can be utilized only for queries that specify conditions starting from the leftmost (leading) columns in the defined order. For an index on columns (A, B, C), it supports searches like WHERE A = value or WHERE A = value AND B = value, but cannot be used for WHERE B = value alone without scanning the entire index. This leftmost-prefix principle optimizes range scans, equality checks, and sorting by allowing the DBMS to prune irrelevant branches early in the index traversal. Queries involving inequalities on the first differing column after equalities can also leverage the index efficiently. Composite indexes require more storage space than single-column indexes due to the larger key size that results from concatenating multiple values, which can slow down index traversal and insertion operations as the number of columns increases. To mitigate performance degradation, composite indexes are typically limited to 3-5 columns, though some systems allow more; excessive columns can lead to fragmented storage and higher maintenance overhead during data modifications. For string columns, prefix lengths may be specified to index only the initial characters, reducing key size while preserving utility for common queries.
These indexes are particularly valuable for optimizing compound queries involving multiple conditions in WHERE, ORDER BY, or GROUP BY clauses, such as e-commerce applications indexing on (category, price, date) to accelerate product filtering and sorting by category first, then price range. By covering multi-attribute access patterns, they reduce the need for table scans and improve overall query throughput in relational workloads. DBMS implementations vary in their constraints: MySQL supports up to 16 columns per index, with index keys up to 3072 bytes in InnoDB and optional prefix lengths for strings; Oracle permits multiple indexes on the same columns if they differ in type or partitioning, up to a maximum of 32 columns, emphasizing query-specific ordering; PostgreSQL allows up to 32 columns (configurable at build time) and recommends using multicolumn indexes sparingly for broader applicability; SQL Server limits a composite index key to 32 columns and advises against excessive columns due to storage impacts.
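The leftmost-prefix rule above can be demonstrated with SQLite (a convenient stand-in; the table, index, and data are hypothetical): a composite index on (a, b) serves queries constraining a, but not queries constraining b alone.

```python
import sqlite3

# Composite index on (a, b): entries are sorted by a first, then b, so only
# queries that pin down a prefix of that order can seek into the index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i % 10, i % 7, i) for i in range(500)])

def plan(sql):
    """Concatenate the EXPLAIN QUERY PLAN detail strings for a statement."""
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

leading = plan("SELECT c FROM t WHERE a = 3")             # prefix (a): index usable
full = plan("SELECT c FROM t WHERE a = 3 AND b = 2")      # prefix (a, b): index usable
trailing = plan("SELECT c FROM t WHERE b = 2")            # b alone: full table scan
print(leading)
print(full)
print(trailing)
```

Only the first two plans mention idx_ab; the third falls back to a scan, matching the WHERE B = value case discussed above.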

Implementation Aspects

Concurrency Control Mechanisms

In multi-user database environments, concurrency control mechanisms ensure that multiple transactions can access and modify indexes simultaneously without compromising data consistency or integrity. These mechanisms primarily rely on locking protocols to manage read and write operations, preventing conflicts such as lost updates or dirty reads. Indexes, being auxiliary structures, often employ finer-grained locks compared to base tables to enhance parallelism, allowing operations on specific index entries or pages rather than entire structures. Locking in indexes typically uses two fundamental types: shared locks (S) for read operations, which permit multiple concurrent readers but block writers, and exclusive locks (X) for write operations like inserts, updates, or deletes, which grant sole access and block both readers and other writers. These locks can be applied at the index level, such as key or page locks on specific entries, contrasting with coarser table-level locks that encompass the entire table and its indexes, thereby reducing locking overhead but potentially limiting concurrency. For instance, an update query might acquire an X lock on an index key while holding intent locks on higher levels, enabling concurrent operations on unrelated index portions. Deadlocks arise when transactions cyclically block each other, such as two transactions updating the same pages where one acquires an S lock on page A followed by page B, while the other does the reverse, creating a wait cycle. Prevention strategies include deadlock detection algorithms that monitor lock waits and resolve conflicts by aborting one transaction as a victim, often based on factors like rollback cost or deadlock priority. Timeouts provide another safeguard, where transactions wait for a configurable period (e.g., via LOCK_TIMEOUT in SQL Server) before erroring out, avoiding indefinite blocks.
Multi-Version Concurrency Control (MVCC) integrates with indexes in systems like PostgreSQL by maintaining multiple versions of index entries tied to snapshots, allowing readers to access a consistent snapshot without acquiring locks that conflict with writers. Each row version carries visibility metadata (e.g., xmin and xmax transaction IDs), enabling snapshot-based reads to ignore uncommitted or obsolete versions, thus avoiding reader-writer blocking entirely during queries. Distinguishing latches from locks is crucial for index operations: locks provide transaction-level, logical protection for data items like rows or keys, supporting features such as deadlock detection and rollback, whereas latches offer short-term, fine-grained synchronization for internal structures like buffer pages during index traversals or updates. For example, a B+ tree search might use read latches on pages for concurrent access while employing write latches exclusively during insertions to prevent corruption, with modes like reader-writer policies to balance throughput. To enhance scalability, partitioned indexes distribute lock contention across subsets of data, enabling partition-level lock escalation in systems like SQL Server, where operations on one partition do not block others, thus supporting higher concurrency in large-scale environments. SQL Server's row versioning further aids this by storing historical row versions in a tempdb-based version store for snapshot isolation, reducing lock duration for readers and minimizing contention on active indexes without altering the locking protocol for writers.
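The shared/exclusive core of the locking rules above can be captured in a toy compatibility matrix. This is only a sketch: real lock managers add intent locks, update locks, queueing, and escalation, none of which are modeled here.

```python
# Toy S/X lock compatibility matrix: readers share, writers exclude everyone.
# Keys are (held, requested) pairs.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may hold shared locks together
    ("S", "X"): False,  # a writer must wait for existing readers
    ("X", "S"): False,  # readers must wait for a writer
    ("X", "X"): False,  # writers serialize with each other
}

def can_grant(requested, held):
    """Grant `requested` only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))  # True: another reader joins
print(can_grant("X", ["S"]))       # False: writer blocks behind a reader
print(can_grant("S", ["X"]))       # False: reader blocks behind a writer
print(can_grant("X", []))          # True: nothing held, writer proceeds
```

A deadlock in this model is simply two transactions each waiting on a can_grant that the other's held locks make False, which is why real systems pair the matrix with wait-graph detection or timeouts.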

Index Maintenance and Overhead

The process of building a database index typically involves scanning the entire table to extract the indexed column values, sorting those keys, and constructing the index structure, such as a B-tree, which can require significant temporary sort space in memory or on disk. In modern database management systems like SQL Server and Oracle, this creation process supports parallelism to distribute the workload across multiple processors, reducing completion time. For terabyte-scale tables, index creation can take several hours depending on hardware, data distribution, and configuration, as the scan and sort phases dominate the I/O and CPU demands. Updating indexes during data modification operations introduces substantial overhead, as each insert, update, or delete must not only alter the base table but also maintain the index structure to preserve its ordering and balance. In B-tree indexes, inserts and deletes often trigger page splits or merges when nodes exceed capacity, requiring additional I/O to redistribute keys and pointers across pages, which can make these operations slower than equivalent changes to a non-indexed table. This maintenance cost escalates with multiple indexes on the same table, as every write operation must propagate changes to all relevant structures, increasing both CPU usage for key computations (e.g., hashing in hash indexes) and I/O for tree balancing. Index fragmentation arises primarily from page splits during inserts or updates, where a full page is split into two, often leaving the new pages underutilized and resulting in space waste due to reduced page density (e.g., dropping to around 50% fullness post-split). This inefficiency inflates storage requirements and amplifies I/O during scans, as more pages must be read to retrieve the same amount of data. To mitigate fragmentation, database systems provide commands like REORGANIZE, which compacts pages online without rebuilding the entire index, and REBUILD, which drops and recreates the index to fully restore page density and key order, though at higher resource cost.
In recent developments as of 2025, emerging techniques such as learned indexes, which use machine learning models to approximate index structures, offer potential for improved query performance and reduced maintenance overhead in modern database systems. Tuning index maintenance involves identifying and addressing underutilized or inefficient indexes to minimize ongoing overhead, such as dropping unused ones that still consume resources during writes. Administrators can monitor index usage through system views, for example, Oracle's DBA_INDEX_USAGE or V$INDEX_USAGE_INFO, which track access frequency and effectiveness since monitoring was enabled via ALTER INDEX ... MONITORING USAGE. Regular analysis of these statistics helps prioritize drops or rebuilds, balancing maintenance costs against benefits. Overall, while indexes significantly enhance read performance in read-heavy workloads, they impose additional overhead on write operations due to the I/O and CPU demands of structure maintenance, underscoring the need for selective indexing in write-intensive environments.
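The scan-extract-sort build process described above can be reduced to a few lines. This is only a sketch: real systems spill the sort to disk and bulk-load B-tree pages bottom-up, whereas here the "index" is just the sorted entry list plus binary search. The table contents are hypothetical.

```python
import bisect

# Heap rows in arbitrary physical order: (rowid, name, salary).
table = [
    (1, "Cho", 70000), (2, "Abe", 52000), (3, "Diaz", 91000), (4, "Bell", 64000),
]

# Build phase: full table scan, key extraction, then a sort of
# (key, row locator) pairs -- the dominant cost for large tables.
entries = sorted((name, rowid) for rowid, name, _ in table)

def lookup(key):
    """Binary-search the sorted entries; return row locators for `key`."""
    i = bisect.bisect_left(entries, (key,))
    return [rid for k, rid in entries[i:] if k == key]

print(lookup("Bell"))  # [4]
```

Every subsequent insert into `table` would also have to splice a pair into `entries` to keep it sorted, which is the per-write maintenance overhead the section describes.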

Applications and Limitations

Real-World Use Cases

In online transaction processing (OLTP) systems, clustered indexes are commonly applied on primary keys such as identity columns to organize data physically and support efficient insertions and updates, as seen in banking applications where transaction records are clustered by unique transaction IDs to handle high concurrency. Secondary non-clustered indexes are then built on frequently queried columns, such as account numbers in transaction tables, enabling rapid lookups for account-specific queries without scanning the entire table. For online analytical processing (OLAP) workloads in data warehouses, bitmap indexes are utilized on low-cardinality dimension columns to optimize set-oriented operations and aggregations, such as querying sales totals by region and year in a star schema. Covering indexes complement this by including aggregated measures directly in the index structure, allowing queries to retrieve summary data without accessing the base table, which accelerates roll-up operations on sales data across time hierarchies. Full-text search applications leverage inverted indexes to map terms to document locations, facilitating fast keyword retrieval in large corpora; for instance, Apache Lucene employs inverted indexes to efficiently search and analyze log files by enabling quick identification of entries containing specific error patterns or timestamps. Spatial databases for geographic information systems (GIS) use R-tree indexes to manage multidimensional data, such as points, lines, and polygons representing maps or routes, by organizing them into hierarchical bounding rectangles that support efficient range queries like finding all features within a given area. This structure, originally proposed for dynamic spatial searching, allows GIS applications to perform proximity searches on location data without exhaustive scans. In NoSQL environments, document-oriented databases like MongoDB support indexes on embedded documents to query nested fields efficiently, such as indexing user profiles within a larger order document to filter by attributes like address components.
Wide-column stores like Apache Cassandra use secondary indexes on non-partition key columns to enable queries beyond the primary access path, accommodating the flexible schema of wide-column models in scenarios like time-series data retrieval across distributed partitions.
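The inverted-index structure behind the full-text use case above can be sketched in a few lines: each term maps to the set of document IDs containing it, so a keyword query becomes a set intersection instead of a scan over every document. The documents and terms here are hypothetical.

```python
from collections import defaultdict

docs = {
    1: "disk read error on node three",
    2: "connection timeout error",
    3: "checkpoint completed on node three",
}

# Build: tokenize each document and record which documents contain each term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Return IDs of documents containing every query term (AND semantics)."""
    sets = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []

print(search("error"))          # [1, 2]
print(search("node", "error"))  # [1]
```

Production engines such as Lucene layer tokenization rules, term dictionaries, compressed posting lists, and relevance scoring on top of this same core mapping.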

Performance Trade-offs and Constraints

Database indexes introduce notable storage overhead, typically increasing the overall database size by 10-100% relative to the base table data, depending on the indexed columns, data types, and number of indexes. This expansion arises because each index stores sorted keys, pointers, and sometimes copies of row data, duplicating portions of the table across multiple structures. Write operations suffer from amplification due to the need to synchronize changes across all relevant indexes. An insert, update, or delete on a row requires modifying not only the base table but also every index referencing the affected columns, potentially multiplying the I/O and CPU cost by the number of indexes involved. For instance, if a column participates in five indexes, the effective cost of an update to that column can approach five times the base operation, as each index must be rebalanced or rewritten. A single insert might touch one data page plus one page per index, leading to eleven or more page writes in tables with ten or more indexes. Indexes are often counterproductive in certain scenarios and should be avoided to prevent unnecessary overhead. Columns with low selectivity—those having few distinct values relative to the row count—yield indexes that cover most of the table, offering minimal query speedup while imposing full maintenance costs during writes. Write-heavy workloads exacerbate this issue, as frequent modifications amplify the burden. Small tables rarely benefit from indexing, since full table scans remain faster than index lookups plus the ongoing update overhead. Optimization strategies like partial indexes help balance these trade-offs by indexing only qualifying rows, thereby reducing storage footprint and write costs. For example, an index on a status column filtered by WHERE active = 1 targets active records exclusively, avoiding unnecessary entries for inactive ones and speeding up relevant queries. In systems like PostgreSQL, partial indexes apply a WHERE clause during creation to limit scope.
SQL Server implements this via filtered indexes, which support simple predicates and can decrease index size by up to 90% for skewed data distributions. Emerging approaches, such as learned indexes, seek to alleviate traditional constraints by replacing rigid tree structures with models trained on data distributions. Google's 2017 proposal uses recursive models—essentially small neural networks—to approximate index lookups, achieving up to 3x space savings and 5-70% faster queries in benchmarks while easing maintenance in dynamic environments. These models treat indexing as a prediction problem solved via cumulative distribution functions, offering a promising evolution for large-scale databases. Since 2020, research has advanced learned indexes for multi-dimensional data, updatable workloads, and disk-based systems, with ongoing developments as of 2025 improving adaptability and performance in diverse workloads.
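The partial-index pattern above can be shown end to end with SQLite, which supports a WHERE clause on CREATE INDEX directly (the table and data are hypothetical): only rows with active = 1 enter the index, shrinking it while still serving the common "active rows only" lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
             "active INTEGER)")
# Partial index: inactive rows never enter the structure.
conn.execute("CREATE INDEX idx_active ON orders (customer) WHERE active = 1")
conn.executemany("INSERT INTO orders (customer, active) VALUES (?, ?)",
                 [("c%d" % (i % 50), i % 2) for i in range(1000)])

# The planner can use idx_active because the query predicate implies the
# index predicate (active = 1).
plan = " ".join(r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM orders "
    "WHERE customer = 'c7' AND active = 1"))
print(plan)
```

A query that omits the active = 1 condition cannot use the partial index, since rows outside the predicate are simply absent from it; this is the same implication rule PostgreSQL and SQL Server apply to partial and filtered indexes.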

Standardization and Evolution

Key Standards

The SQL standard, as defined by ISO/IEC 9075, primarily addresses database indexes through the specification of constraints such as UNIQUE and PRIMARY KEY, which require underlying index structures for enforcement but do not prescribe explicit index creation syntax in early versions like SQL-92. In SQL-92 (ISO/IEC 9075:1992), UNIQUE constraints ensure no duplicate non-null values across specified columns and are defined using the syntax UNIQUE (column_name_list) within table constraints, while PRIMARY KEY constraints combine uniqueness with non-nullability to identify rows uniquely, using PRIMARY KEY (column_name_list), with only one allowed per table. These constraints implicitly rely on indexes for efficient validation, but the CREATE INDEX statement itself is absent from the standard, leaving index management to RDBMS implementations. Standardization of index types like clustered or non-clustered is also absent from the core standard, and no dedicated syntax exists for hints specifying such behaviors. Subsequent updates to ISO/IEC 9075, including SQL:2016 and the latest revision SQL:2023, extend the language but do not introduce formal support for partial indexes (which limit indexing to subsets of rows based on predicates) or functional indexes (which index computed expressions); these features remain vendor-specific extensions rather than core standard elements. The standard emphasizes logical data integrity through constraints, with index details deferred to implementation levels, ensuring portability for constraint-based operations while allowing RDBMS flexibility in optimization. APIs like ODBC and JDBC provide standardized mechanisms for querying index metadata, facilitating cross-database interoperability without relying on vendor-specific syntax.
In JDBC, the DatabaseMetaData.getIndexInfo() method retrieves details on table indexes, including names, types (e.g., unique or approximate), column positions, and cardinality, returning a ResultSet for programmatic access. Similarly, ODBC's SQLStatistics and related catalog functions support index-related queries, though specifics vary by driver conformance to ODBC 3.8 standards. These APIs align with ISO/IEC 9075 by exposing information defined in Part 11 (Information and Definition Schemas), enabling applications to discover and utilize indexes portably. While the SQL standard promotes uniformity in constraints, implementations introduce non-standard index types to address specific use cases. For instance, MySQL's FULLTEXT indexes enable efficient text searching on CHAR, VARCHAR, or TEXT columns using MATCH...AGAINST syntax, applicable only to the InnoDB or MyISAM storage engines and not part of ISO/IEC 9075. Oracle's bitmap indexes, created via CREATE BITMAP INDEX, use bit vectors for low-cardinality columns in data warehousing scenarios, supporting analytical queries but unsuitable for high-concurrency OLTP due to locking overhead, and are proprietary extensions outside the standard. Most relational database management systems (RDBMS) demonstrate strong compliance with SQL standard constraints, automatically creating indexes for UNIQUE and PRIMARY KEY, though variations in index architecture persist. PostgreSQL and IBM DB2 fully align with ISO/IEC 9075 for constraint-based indexing, supporting B-tree structures by default and optional types like GiST or BRIN without deviating from standard semantics. In contrast, MySQL's InnoDB engine implements clustered indexing implicitly on the PRIMARY KEY (or first UNIQUE NOT NULL key), storing row data with the index, while secondary indexes are non-clustered; these behaviors are likewise non-standard. Such differences highlight how RDBMS prioritize performance optimizations over strict adherence to index typology in the standard.
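The metadata discovery that getIndexInfo() and SQLStatistics provide can be illustrated with SQLite's PRAGMA interface, a rough driver-level analogue used here only for demonstration; the schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, lastname TEXT, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_email ON emp (email)")
conn.execute("CREATE INDEX idx_name ON emp (lastname)")

# Enumerate each index on the table with its uniqueness flag and columns,
# much as a JDBC client would walk the getIndexInfo() ResultSet.
meta = {}
for name, unique in conn.execute(
        "SELECT name, [unique] FROM pragma_index_list('emp') ORDER BY name"):
    cols = [col for _, _, col in conn.execute(f"PRAGMA index_info({name})")]
    meta[name] = (bool(unique), cols)
    print(name, "unique" if unique else "non-unique", cols)
```

Portable tools (schema migrators, ORMs, admin consoles) rely on exactly this kind of enumeration rather than parsing vendor-specific DDL.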
Historical Evolution

The development of database indexes began in the 1960s with hierarchical database systems, where IBM's Information Management System (IMS), introduced in 1966, utilized primary and secondary indexes to navigate tree-structured data efficiently for applications like NASA's Apollo program. These early indexes relied on pointer-based mechanisms to access child records from parent nodes, enabling sequential and direct access in a pre-relational era dominated by navigational databases. In the 1970s, the shift to the relational model marked a pivotal advancement, with IBM's System R prototype incorporating B-tree indexes for efficient storage and retrieval of ordered data. The B-tree structure, invented by Rudolf Bayer and Edward M. McCreight in 1970, organized index pages into balanced, multi-level trees to minimize disk I/O for large datasets, becoming the foundational index type for relational systems. This innovation addressed the limitations of hierarchical indexes by supporting dynamic insertions, deletions, and range queries without frequent reorganizations. The 1980s and 1990s saw diversification into specialized index types for commercial databases, including hash indexes for exact-match lookups and bitmap indexes for low-cardinality columns. Oracle introduced bitmap indexes in version 7.3 in 1995, enabling compact representation of multiple rows via bit vectors to accelerate analytical queries on sparse data. Concurrently, inverted indexes gained prominence for text search, with Verity Inc., founded in 1983 and commercializing its technology by 1988, pioneering full-text indexing engines that mapped terms to document locations for rapid retrieval in document management systems. Entering the 2000s, indexes evolved to optimize storage and query specificity, with covering indexes—allowing queries to be resolved entirely from the index without table access—and partial indexes—built only on subsets of rows matching a predicate—emerging as key enhancements.
Partial indexes were formalized in database research by 1989 and implemented in PostgreSQL version 7.2 in 2002, reducing index size for conditional workloads like active records only. Covering indexes, widely adopted in systems like SQL Server by the early 2000s, included non-key columns to eliminate key lookups, improving performance for frequent SELECT operations. The rise of NoSQL databases further expanded indexing paradigms, as Google's Bigtable in 2006 introduced distributed, column-oriented indexes using sorted string tables (SSTables) with block-level indexing for scalable, sparse data storage across clusters. Post-2010 advancements integrated machine learning and specialized structures for emerging workloads, including learned indexes that use neural networks to predict key locations, as proposed by Kraska et al. in 2017, potentially reducing lookup latency by modeling cumulative distribution functions instead of rigid tree traversals. For AI-driven applications, vector indexes emerged to handle high-dimensional similarity searches; Facebook's FAISS library, released in 2017, provided efficient approximate nearest-neighbor indexing for dense embeddings using techniques like inverted file (IVF) structures with quantization. Cloud-native databases like Amazon Aurora, launched in 2014, incorporated optimized indexes such as spatial B-trees over space-filling curves for geospatial data, leveraging shared storage to minimize replication overhead in distributed environments. Contemporary trends emphasize automation and machine learning, with ML models enabling automated index tuning to recommend and create indexes based on query patterns, as explored in recent surveys on ML-powered systems that adapt to workload changes without manual intervention. Additionally, sustainable indexing practices focus on reducing I/O through AI-optimized structures and compression, lowering energy consumption in cloud databases by minimizing disk accesses.
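The learned-index idea can be reduced to a toy sketch: fit a model of the keys' cumulative distribution, predict an approximate position in the sorted array, then correct with a bounded local search. A straight line stands in here for the recursive model hierarchy of the original proposal, and the key set is hypothetical.

```python
# Sorted, uniformly spaced keys, so a linear CDF model fits exactly.
keys = [3 * i for i in range(1000)]

# "Train": the least-squares line through (key, position) reduces to a slope.
slope = (len(keys) - 1) / (keys[-1] - keys[0])

def lookup(key):
    """Predict a position from the model, then search a small error window."""
    guess = round(slope * (key - keys[0]))       # model prediction
    guess = max(0, min(len(keys) - 1, guess))    # clamp into valid range
    lo, hi = max(0, guess - 2), min(len(keys), guess + 3)
    for i in range(lo, hi):                      # bounded correction step
        if keys[i] == key:
            return i
    return None

print(lookup(300))   # 100
print(lookup(2997))  # 999
print(lookup(301))   # None: key absent
```

On skewed real-world distributions the single line is replaced by a hierarchy of models with per-model error bounds, which is what lets the approach compete with B-trees on space while keeping lookups correct.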

  44. [44]
    Indexes - SQL Server - Microsoft Learn
    Aug 22, 2025 · A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is ...Missing: concepts | Show results with:concepts<|control11|><|separator|>
  45. [45]
    Maximum Capacity Specifications for SQL Server - Microsoft Learn
    Aug 21, 2025 · The maximum number of bytes in a clustered index key can't exceed 900. For a nonclustered index key, the maximum is 1,700 bytes. You can define ...
  46. [46]
    Create Nonclustered Indexes - SQL Server - Microsoft Learn
    Sep 4, 2025 · Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index or to ...Before you begin · Use SQL Server Management...
  47. [47]
    Create Indexes with Included Columns - SQL Server
    ### Summary of Indexes with Included Columns (Covering Indexes)
  48. [48]
    CREATE INDEX
    ### Summary of INCLUDE Clause for Covering Indexes in PostgreSQL
  49. [49]
    18: 11.9. Index-Only Scans and Covering Indexes - PostgreSQL
    To make effective use of the index-only scan feature, you might choose to create a covering index, which is an index specifically designed to include the ...11.10. Operator Classes and... · Chapter 11. Indexes · 11.8. Partial Indexes<|control11|><|separator|>
  50. [50]
    Covering Index in SQL Server with Key and Non-Key Columns
    Jan 11, 2023 · The simplest way to describe it is an index that includes every column for a specific query. Sometimes you hear this referred to as covering the ...Covering Index In Sql Server · Sql Server Index Structure · Key Size Reduction
  51. [51]
    Using Covering Indexes to Improve Query Performance - Simple Talk
    Sep 29, 2008 · An index that contains all information required to resolve the query is known as a “Covering Index”; it completely covers the query. Using the ...Introduction · An indexing refresher · Resorting to Table Scans · Covering Indexes
  52. [52]
    Index Architecture and Design Guide - SQL Server - Microsoft Learn
    Oct 1, 2025 · The index is a sorted list of keywords and next to each keyword is a set of page numbers pointing to the pages where each keyword can be found.
  53. [53]
    Selectivity of Indexes - Teradata Vantage - Analytics Database
    An index that retrieves many rows is said to have weak selectivity. An index that retrieves few rows is said to be strongly selective. The more strongly ...Missing: criteria | Show results with:criteria
  54. [54]
    The right column order in multi-column indexes - Use The Index, Luke
    The order of columns in an SQL index is essential. The right order makes an index useful for many queries, the wrong order only for a few queries or none at ...
  55. [55]
    Composite indexes — MySQL for Developers - PlanetScale
    Mar 9, 2023 · Composite indexes cover multiple columns instead of having individual indexes on each column. Understanding and effectively using composite indexes can elevate ...<|control11|><|separator|>
  56. [56]
    Collecting histogram statistics - Db2 - IBM
    Histogram statistics are most helpful for range, LIKE, and BETWEEN predicates, but they can also improve selectivity for equality, IS NULL, and IN predicates, ...
  57. [57]
    Histogram-Based Statistics | Server | MariaDB Documentation
    Jul 15, 2025 · Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations.
  58. [58]
    How Computed Columns Can Cause Blocking - Brent Ozar Unlimited
    May 1, 2021 · When you add a computed column that references another table, like with a scalar user-defined function, you can end up causing concurrency problems.
  59. [59]
    Properly Persisted Computed Columns - SQLPerformance.com
    May 25, 2017 · Computed columns that are neither persisted nor indexed have valid uses. For example, they can support automatic statistics if the column is deterministic and ...Missing: pitfalls | Show results with:pitfalls
  60. [60]
    MySQL :: MySQL 8.4 Reference Manual :: 10.3.6 Multiple-Column Indexes
    ### Summary of Multiple-Column Indexes in MySQL (MySQL Documentation)
  61. [61]
    5 Indexes and Index-Organized Tables - Oracle Help Center
    Indexes are structures stored in the database that users manage using SQL statements. Keys are strictly a logical concept.
  62. [62]
    Documentation: 18: 11.3. Multicolumn Indexes - PostgreSQL
    Indexes can have up to 32 columns, including INCLUDE columns. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)11.4. Indexes and ORDER BY · Chapter 11. Indexes · 11.2. Index Types
  63. [63]
    Transaction locking and row versioning guide - SQL Server
    This guide describes locking and row versioning mechanisms the Database Engine uses to ensure the integrity of each transaction.
  64. [64]
    Data Concurrency and Consistency - Oracle Help Center
    In general, the database uses two types of locks: exclusive locks and share locks. Only one exclusive lock can be obtained on a resource such as a row or a ...
  65. [65]
    Inside SQL Server: Indexing and Locking - ITPro Today
    You still have an X lock on the RID and IX locks on the page and the table, but additionally, you have two X KEY locks. Only indexes acquire KEY locks, so the X ...Inside Sql Server: Indexing... · The Explanation · One Last Question<|control11|><|separator|>
  66. [66]
    Deadlocks Guide - SQL Server - Microsoft Learn
    Oct 10, 2025 · A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource that the other tasks are trying to lock.Understand deadlocks · Resources that can deadlock
  67. [67]
    Chapter 13. Concurrency Control
    **Summary of MVCC with Indexes in PostgreSQL**
  68. [68]
    [PDF] Lecture Notes - 09 Index Concurrency Control - CMU 15-445/645
    There is an important distinction between locks and latches when discussing how the DBMS protects its internal elements. A lock is a higher-level, logical ...
  69. [69]
    Partitioned Tables and Indexes - SQL Server, Azure SQL Database ...
    Partitioning large tables or indexes can have the following manageability and performance benefits. You can transfer or access subsets of data quickly and ...Components And Concepts · Partition Function · Performance Guidelines
  70. [70]
    Managing Indexes - Oracle Help Center
    The percentage varies greatly according to the relative speed of a table scan and how the row data is distributed in relation to the index key.
  71. [71]
    Configure Parallel Index Operations - SQL Server - Microsoft Learn
    Sep 22, 2025 · Learn about the max degree of parallelism and learn how to modify this setting in SQL Server.Missing: process | Show results with:process
  72. [72]
    PostgreSQL: Parallel CREATE INDEX for better performance
    In PostgreSQL 11 parallel index creation is on by default. The parameter in charge for this issue is called max_parallel_maintenance_workers, which can be set ...Missing: scan DBMS
  73. [73]
    Creating Index on Large Table - Databases forum at WebmasterWorld
    Nov 1, 2014 · Finally, add the index. It really depends on the data and table types. Adding an index on a table of that size will invariably take a long time.
  74. [74]
    Understanding MySQL Indexes: Types, Benefits, and Best Practices
    Sep 23, 2024 · Too many indexes can slow down data insertion and updates, while too few indexes can result in slow query performance. It's also important ...
  75. [75]
    Usage Tracking (DBA_INDEX_USAGE, V$INDEX_USAGE_INFO) in ...
    Aug 19, 2017 · Index usage tracking allows unused indexes to be identified, helping to removing the risks associated with dropping useful indexes.
  76. [76]
    Database Index Selection Guide | Aerospike
    May 2, 2025 · Database index selection: A comprehensive guide. Learn how to ... Storage overhead: indices consume disk space, as well as memory when ...
  77. [77]
    Whitepaper: Diagnose and Resolve Latch Contention - SQL Server
    Jul 17, 2025 · When using the original table definition, excessive latch contention was observed to occur on the clustered index pk_table1: CREATE TABLE ...
  78. [78]
    5 Evaluating Your System's Performance Characteristics
    Typical OLTP applications are airline reservation systems, large order-entry applications, and banking applications. The key goals of an OLTP system are ...<|separator|>
  79. [79]
    3 Physical Design in Data Warehouses - Oracle Help Center
    Bitmap indexes are optimized index structures for set-oriented operations. Additionally, they are necessary for some optimized data access methods such as star ...
  80. [80]
    Indexing for Beginners, Part 3 | Elastic Blog
    Oct 29, 2013 · With the inverted index, we only have to look for a term once to retrieve a list of all documents containing the term.
  81. [81]
    15. Spatial Indexing — Introduction to PostGIS
    Both PostGIS and Oracle Spatial share the same “R-Tree” 1 spatial index structure. R-Trees break up data into rectangles, and sub-rectangles, and sub-sub ...
  82. [82]
    R-trees: a dynamic index structure for spatial searching
    In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it.
  83. [83]
    Create an Index on an Embedded Document - Database Manual
    You can create indexes on embedded documents as a whole. However, only queries that specify the entire embedded document use the index.
  84. [84]
    Overview | Apache Cassandra Documentation
    It implements a partitioned wide-column storage model with eventually consistent semantics. ... Storage-attached indexing (SAI) for secondary indexes. Local ...
  85. [85]
    Index storage - Progress Documentation
    Jan 16, 2024 · Indexes account for approximately 30 percent of total storage. Therefore, you can take 50 percent of your data storage as an estimate of index storage.
  86. [86]
    Trying to find a way fit an extremely large index in to memory
    Jun 7, 2020 · The archive table size: 700GB of data, and ~200GB of index, around 0.9b of rows (yes, a lot of rows) Here is the table creation SQL:Balancing Indexing and Database Performance: How Many Indexes ...How much storage space do indexes and keys take for an SQL Table?More results from dba.stackexchange.com
  87. [87]
    How PostgreSQL Indexes Can Negatively Impact Performance
    Indexes in PostgreSQL come with a cost in terms of performance and resource consumption. Here are 10 different ways PostgreSQL indexes can hurt performance.
  88. [88]
    What is the cost of indexing multiple db columns? - Stack Overflow
    Jan 7, 2009 · Adding average index (1-3 columns in an index) to a table - makes inserts slower by 2.1%. So, if you add 20 indexes, your inserts will be slower by 40-50%.Missing: amplification | Show results with:amplification
  89. [89]
    Fighting PostgreSQL write amplification with HOT updates | by Adyen
    May 23, 2022 · In the case of an ordinary update, PostgreSQL would write to at least 11 different pages: one page in the data heap and one page on every index.
  90. [90]
    SQL Indexing Best Practices: Speed Up Your Queries - AI2sql.io
    Feb 20, 2025 · Small tables (a few hundred rows or less) don't usually need indexes at all, as a full scan is very fast in such cases​. Avoid Indexing Columns ...
  91. [91]
    Partial indexes: indexing selected rows - Use The Index, Luke
    You don't need to index the rows you are never searching for. Messages queues are good example how to use partial indexes in PostgreSQL and SQL Server.
  92. [92]
    a Subset of Rows with Partial Indexes - CockroachDB
    For example, to define a partial index on columns a and b of table t , filtering on rows in column c greater than 5: CREATE INDEX ON t (a, b) WHERE c > 5; The ...
  93. [93]
    Create Filtered Indexes - SQL Server - Microsoft Learn
    Jun 27, 2025 · A filtered index is an optimized disk-based rowstore nonclustered index especially suited to cover queries that select from a well-defined ...
  94. [94]
    Improve query performance using partial indexes in ... - Amazon AWS
    Apr 30, 2024 · Unlike traditional indexes that cover the entire collection, partial indexes narrow down the scope, optimizing performance for targeted queries.
  95. [95]
    [1712.01208] The Case for Learned Index Structures - arXiv
    Dec 4, 2017 · Access Paper: View a PDF of the paper titled The Case for Learned Index Structures, by Tim Kraska and 4 other authors. View PDF · TeX Source.
  96. [96]
    International Standard ISO/IEC 9075:1992
    SQL defines distinct data types named by the following <key word>s ... In addition, if the unique constraint was defined with PRIMARY KEY, then it ...
  97. [97]
    [PDF] ANSI/ISO/IEC International Standard (IS) Database Language SQL
    Annex F (informative): SQL feature and package taxonomy. Requests for interpretation, suggestions for ...
  98. [98]
    Database languages — SQL - ISO/IEC 9075-1:2016
    ISO/IEC 9075-1:2016 describes the conceptual framework used in other parts of ISO/IEC 9075 to specify the grammar of SQL and the result of processing ...Missing: extensions partial functional indexes
  99. [99]
    The 16 Parts of the SQL Standard ISO/IEC 9075
    The SQL standard consists of 10 parts: 1-4, 9-11, and 13-16. The missing parts were never released.Missing: extensions partial
  100. [100]
    DatabaseMetaData (Java Platform SE 8 ) - Oracle Help Center
    DatabaseMetaData provides comprehensive information about a database, implemented by driver vendors to show DBMS capabilities, used by tools to discover how to ...
  101. [101]
    getIndexInfo Method (SQLServerDatabaseMetaData) - Microsoft Learn
    Jun 25, 2024 · The following example demonstrates how to use the getIndexInfo method to return information about the indexes and statistics of the Person.Contact table.Missing: ODBC | Show results with:ODBC
  102. [102]
  103. [103]
    SQL Language Reference
    ### Summary on Bitmap Indexes in Oracle
  104. [104]
    What is IBM IMS (Information Management System)? - TechTarget
    Feb 24, 2022 · They have primary and secondary indexes ... Early versions of IMS served as a precursor to IBM's hierarchical database management system.
  105. [105]
    [PDF] System R: Relational Approach to Database Management
    The pages for a given index are organiaed into a balanced hierarchic structure, in the style of B-trees [3] and of. Key Sequenced Data Sets in IBM's VSAM ...
  106. [106]
    Organization and maintenance of large ordered indices
    Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random ...
  107. [107]
    BITMAP index - Gerald Venzl
    BITMAP index. By Gerald Venzl / 2010-04-20 / Technology · Oracle ... Bitmap indexes got introduced into Oracle with 7.3 but are currently only available in ...
  108. [108]
    History of Verity Inc. - FundingUniverse
    Verity was spun off from Advanced Decision Systems in 1988 by Dr. Michael S. Pliner and a group that included David Glazer, Phil Nelson, Andrew Leak, Abe ...
  109. [109]
    [PDF] THE CASE FOR PARTIAL INDEXES - Berkeley
    Current data managers support secondary and/or primary indexes on columns of relations. In this paper we suggest the advantages that result from indexes which ...<|separator|>
  110. [110]
    [PDF] Bigtable: A Distributed Storage System for Structured Data
    A Bigtable is a sparse, distributed, persistent multi- dimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the ...Missing: partial NoSQL
  111. [111]
    Faiss: A library for efficient similarity search - Engineering at Meta
    Mar 29, 2017 · This month, we released Facebook AI Similarity Search (Faiss), a library that allows us to quickly search for multimedia documents that are ...
  112. [112]
    Overview of Amazon Aurora MySQL - AWS Documentation
    Aurora MySQL supports spatial indexing on InnoDB tables. Spatial indexing improves query performance on large datasets for queries on spatial data.
  113. [113]
    ML-Powered Index Tuning: An Overview of Recent Progress ... - arXiv
    Aug 25, 2023 · This paper directs attention to these challenges within automated index tuning and explores ways in which machine learning (ML) techniques provide new ...
  114. [114]
    Green cloud computing: AI for sustainable database management
    Apr 7, 2025 · ... reducing database energy consumption is AI-driven indexing. Conventional indexing. methods often lead to excessive disk I/O operations, as ...<|control11|><|separator|>