Database index
A database index is an optional data structure associated with a database table that speeds up data retrieval operations by providing a fast access path to rows, reducing the need for full table scans.[1] It functions similarly to an index in a book, allowing the database management system (DBMS) to locate specific data efficiently without examining every record.[2] Database indexes are particularly valuable for queries involving frequent searches, joins, sorting, or filtering on specific columns, as they enable the DBMS to seek directly to relevant row locations on disk.[3]
Common types include B-tree indexes, which are the default in many systems and support range queries and equality searches on ordered data; bitmap indexes, used for low-cardinality columns to represent multiple rows per value efficiently; hash indexes, suitable for exact-match queries on unordered data; and full-text indexes, designed for searching text content.[1][3] Specialized variants, such as function-based or spatial indexes, address advanced use cases like computed expressions or geographic data.[1]
While indexes significantly enhance read performance—especially for selective queries that access a small fraction of rows—they incur costs in terms of additional storage space and maintenance overhead during insert, update, or delete operations, as the index must be updated to reflect changes.[1][3] Effective index design requires balancing query patterns, data distribution, and write frequency to avoid performance degradation, often involving tools for monitoring and optimization provided by the DBMS.[4]
Fundamentals
Definition and Core Concepts
A database table consists of rows, each representing a record or tuple containing values for predefined columns, serving as the foundational unit for organizing and storing data in a relational database management system (RDBMS). A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index during data modifications.
At its core, an index stores key values extracted from one or more indexed columns—often kept sorted in tree-based indexes—along with pointers or references, such as row identifiers (ROWIDs) or tuple identifiers, to the actual data rows in the table.[5] These index entries typically form pairs of (key, row locator), enabling efficient navigation to the corresponding table rows without scanning the entire table.[6] In tree-based indexes, the structure is organized into internal nodes, which contain keys and pointers to child nodes for branching during searches, and leaf nodes, which hold the actual (key, row locator) pairs pointing directly to the data.[7] For heap-organized tables, where rows are stored in an unordered manner without a clustered index dictating physical order, the index exists as a separate structure from the table itself, providing an auxiliary lookup mechanism.[8]
The concept of database indexes originated in early hierarchical database systems like IBM's Information Management System (IMS), developed in the 1960s for handling complex data in mainframe environments such as NASA's Apollo program.[9] Indexes evolved significantly in the 1970s with the advent of the relational model, proposed by E. F. Codd, which emphasized declarative querying and integrated indexing to support efficient access in set-based operations on normalized tables.[10][11]
Primary Purposes and Benefits
Database indexes primarily serve to enhance the efficiency of data retrieval operations, particularly for SELECT queries, by enabling targeted lookups instead of exhaustive full table scans. Without an index, querying a table requires scanning all rows sequentially, which has a time complexity of O(n) where n is the number of records. In contrast, structures like B+-trees, commonly used in indexes, reduce this to O(log n) by traversing a balanced tree to locate relevant data pointers, significantly accelerating access for equality, range, and other selective searches.[12][13]
Key benefits include substantial reductions in input/output (I/O) operations, as indexes minimize the number of disk pages read—often limiting accesses to a logarithmic number of nodes rather than the entire dataset—and support faster sorting and grouping by maintaining ordered key values. For instance, in a B+-tree index with a fanout of 100 for a million-key table, a search involves at most about 4 disk accesses compared to potentially millions in a scan. These gains are most pronounced with high-selectivity indexes, where the indexed attribute has many unique values, filtering to few rows per key and maximizing the efficiency of query execution. Low-selectivity attributes, however, may yield limited benefits due to the need for additional post-filtering.[14][15][12]
Beyond performance, unique indexes enforce uniqueness constraints, such as for primary keys, by preventing duplicate entries. Indexes also support referential integrity for foreign key relationships by enabling efficient checks to ensure valid references between tables. This automatic enforcement maintains relational consistency without additional application logic. However, these advantages involve a space-time trade-off: indexes incur additional storage overhead plus maintenance costs during inserts, updates, and deletes that can slow write operations, though the query speedups often amortize these costs over repeated reads.[16][17]
Usage in Database Operations
Accelerating Query Lookups
Database indexes accelerate query lookups by enabling efficient data retrieval without scanning entire tables, primarily through two mechanisms: index seeks and index scans. An index seek involves traversing the index structure, such as a B+-tree, to directly locate specific key values, which is ideal for exact-match or point queries. This process starts at the root node and follows pointers down to the leaf level, minimizing disk I/O by accessing only the necessary branches and leaves.[18] In contrast, an index scan retrieves a range of values by scanning the leaf nodes sequentially after locating the starting point via a seek, which is more suitable for range queries like those using greater-than or less-than operators. For instance, in a clustered index on a salary column, a seek finds the first entry above a threshold, followed by a scan of consecutive leaves to gather all qualifying records.[18]
The query optimizer plays a central role in deciding when to use indexes by employing cost-based analysis to estimate execution plans. It evaluates factors such as index selectivity, table size, and available statistics from the database catalog to choose between seeks, scans, full table scans, or index-only scans, where all required data is fetched directly from the index without accessing the base table.[19] An index-only scan, also known as a covering index operation, occurs when the index includes all attributes needed for the query, such as selecting a non-key column included in the index structure, thereby avoiding costly table lookups.[15] For example, consider the SQL query SELECT name FROM employees WHERE department_id = 5; on a table with a non-clustered index on department_id that includes the name column: the optimizer performs an index seek to find matching keys, retrieves names directly from the index leaves, and returns results without probing the table heap.[18]
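A minimal sketch of the example above is shown below. The table and column names (employees, department_id, name) follow the hypothetical schema in the text, and the INCLUDE clause is SQL Server syntax; other systems express the same idea differently.
-- The INCLUDE clause adds the name column to the index leaves, so the query
-- below can be answered entirely from the index (a covering access) without
-- touching the table heap.
CREATE NONCLUSTERED INDEX IX_employees_department
    ON employees (department_id)
    INCLUDE (name);
-- The optimizer can satisfy this with an index seek on department_id = 5,
-- reading name directly from the matching index leaf entries.
SELECT name
FROM employees
WHERE department_id = 5;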
Indexes also enhance join operations by accelerating lookups in nested loop and hash joins. In an index nested loop join, the outer relation's tuples probe an index on the inner relation's join column, enabling seeks to fetch only matching inner tuples rather than scanning the entire inner table.[20] This is particularly effective for selective joins where the outer relation is small, with costs scaling as the number of outer tuples times the index probe cost. Hash joins can similarly benefit if indexes support hash-based partitioning, though nested loops often leverage tree indexes for precise matches.[20]
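As a sketch of an index-supported nested loop join, assume hypothetical customers (outer) and orders (inner) tables: an index on the inner relation's join column lets each outer row probe only its matching inner rows.
-- Index on the inner relation's join column turns each probe into an index seek
-- rather than a scan of the entire orders table.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- A selective filter keeps the outer side small, so the total cost is roughly
-- the number of qualifying customers times the cost of one index probe.
SELECT c.name, o.order_date, o.total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
WHERE c.region = 'EMEA';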
Key performance metrics for index-accelerated queries include index hit rate, which measures the proportion of index accesses satisfied from memory rather than disk, and buffer pool cache efficiency, which tracks how effectively the database's in-memory buffer pool retains frequently accessed index pages to reduce I/O latency. High index hit rates, often above 90% in optimized systems, indicate effective caching and can reduce query times by orders of magnitude compared to disk fetches.[21]
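In PostgreSQL, for example, cumulative index I/O statistics can be inspected to approximate a per-index hit rate; the sketch below uses the pg_statio_user_indexes view, and the ratio is a rough indicator rather than an exact cache-efficiency measure.
-- idx_blks_hit counts buffer-cache hits, idx_blks_read counts blocks fetched
-- from disk; their ratio approximates the index hit rate discussed above.
SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read,
       round(idx_blks_hit::numeric /
             NULLIF(idx_blks_hit + idx_blks_read, 0), 4) AS hit_ratio
FROM pg_statio_user_indexes
ORDER BY idx_blks_hit + idx_blks_read DESC;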
Enforcing Data Constraints
Database indexes play a crucial role in enforcing data integrity constraints during insert, update, and delete operations by providing efficient mechanisms to validate proposed changes against predefined rules. When a write operation occurs, the database management system (DBMS) traverses the relevant index to check compliance; if a violation is detected, the operation is rejected, and the transaction is typically rolled back to maintain consistency.[22][23][16]
Unique indexes are essential for implementing UNIQUE and PRIMARY KEY constraints, which prevent duplicate values in specified columns or combinations thereof. For UNIQUE constraints, the DBMS automatically creates a unique index (often a B-tree structure) on the constrained columns to enforce non-duplication; during an insert or update, the system searches the index for an existing match, rejecting the operation if one is found.[22][23] PRIMARY KEY constraints combine this uniqueness with a NOT NULL requirement, leveraging the same unique index for duplicate checks while separately enforcing the null prohibition.[22][16] For instance, in a users table, a unique index on the email column ensures that no two rows share the same email address by validating against the index on every insert or update to that field.[24]
Indexes also support foreign key (referential integrity) enforcement by accelerating the verification of relationships between tables. While the referenced column in the parent table typically already has a unique or primary key index for efficient lookups during child table inserts or updates, an index on the foreign key column in the child table optimizes cascading operations, such as deletes or updates from the parent that propagate to the child.[22][16] Without such an index, referential checks—scanning for matching rows in the child table—can become inefficient, potentially leading to full table scans; thus, manually creating an index on foreign key columns is recommended to ensure quick identification and handling of dependent records.[23][16]
Check constraints, which enforce domain-specific rules like value ranges or patterns on individual rows, do not directly rely on indexes for enforcement, as validation involves evaluating a Boolean expression per affected row during writes. However, for complex check constraints that incorporate subqueries or lookups against other tables (e.g., ensuring a value exists in a referenced domain table), indexes on the involved columns can indirectly speed up the validation process by enabling faster data retrieval.[22][23] In such cases, the DBMS uses the index to efficiently resolve the subquery, reducing the overhead of constraint evaluation without altering the core per-row logic.[22]
Index Types
Bitmap Indexes
A bitmap index is a specialized database index structure that employs bit vectors, known as bitmaps, to represent the presence or absence of distinct values in a column across all rows of a table. For each unique value in the indexed column, a separate bitmap is created, where each bit corresponds to a specific row in the table; a bit set to 1 indicates that the row contains that value, while 0 indicates it does not. This design replaces traditional lists of row identifiers (rowids) with compact bit arrays, and a mapping function translates bit positions back to actual rowids for data retrieval. For instance, in a table with 1 million rows and a column having 100 distinct values, the index consists of 100 bitmaps, each comprising 1 million bits, resulting in a total raw size of approximately 12.5 MB before compression.[25][26]
Bitmap indexes are created on columns exhibiting low cardinality, meaning a small number of distinct values relative to the total row count, such as gender (e.g., 'M', 'F'), marital status, or product categories. The creation process involves a SQL statement like CREATE BITMAP INDEX, which builds the bitmaps by scanning the table and setting bits accordingly; this is particularly efficient for large tables due to the index's simple physical structure compared to tree-based indexes. To optimize storage, bitmaps are often compressed using techniques like run-length encoding (RLE), which exploits runs of identical bits (e.g., sequences of 0s or 1s) common in low-cardinality columns, reducing the index size to a fraction of the uncompressed form—sometimes as low as 2-5% of the original bitmap length in analytical datasets.[25][26][27]
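A sketch of the Oracle-style statement mentioned above, on a hypothetical customers table with a low-cardinality marital_status column:
-- Oracle syntax: one bitmap per distinct marital_status value is built,
-- with each bit position corresponding to one row of the customers table.
CREATE BITMAP INDEX idx_customers_marital
    ON customers (marital_status);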
Query operations on bitmap indexes leverage bitwise logical operations for efficient filtering. Single-column equality or membership queries involve selecting the corresponding bitmap and identifying set bits to retrieve matching rows. For multi-column queries, bitmaps from different indexes are combined using AND (intersection for conjunctive conditions), OR (union for disjunctive conditions), or XOR operations, enabling rapid evaluation of complex WHERE clauses without scanning the entire table. Aggregations, such as COUNT(DISTINCT) or grouping by low-cardinality attributes, benefit from bitmap intersections to compute cardinalities directly from bit counts, avoiding row-by-row processing. These operations are particularly performant in parallel environments, as bitmaps can be divided across processors.[25][26]
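For instance, with bitmap indexes on hypothetical gender and marital_status columns, a query such as the following could be evaluated by ANDing the 'F' bitmap with the 'single' bitmap and counting the set bits, rather than visiting individual rows (assuming the optimizer chooses a bitmap plan):
-- Both predicates are low-cardinality; the DBMS can combine the corresponding
-- bitmaps with a bitwise AND and count set bits to answer the aggregate
-- without a full table scan.
SELECT COUNT(*)
FROM customers
WHERE gender = 'F'
  AND marital_status = 'single';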
The primary advantages of bitmap indexes lie in their space efficiency and suitability for online analytical processing (OLAP) workloads, where queries are read-heavy and involve aggregations over large datasets. In data warehouses employing star schemas, bitmap indexes on dimension table columns (e.g., customer demographics or time periods) facilitate fast joins and filters against massive fact tables, reducing query times from hours to seconds by minimizing I/O through compressed bitmaps. For example, on a gender column with three values across millions of rows, a bitmap index occupies about 570 KB versus 13 MB for a comparable B-tree index, while supporting dynamic ad-hoc queries with multiple conditions. Unlike dense indexes that store entries for every row, bitmap indexes are inherently sparse in representation for low-cardinality data, emphasizing conceptual efficiency over pointer density.[25][27][26]
Despite these benefits, bitmap indexes have notable drawbacks, particularly their poor performance on high-cardinality columns (e.g., unique identifiers) where the number of bitmaps approaches the row count, leading to excessive storage and slower construction. They are also unsuitable for online transaction processing (OLTP) environments with frequent updates, inserts, or deletes, as modifying a single row requires updating multiple bitmaps, potentially causing locking contention and deadlocks in concurrent scenarios; Oracle recommends them only for environments with batch updates, such as nightly loads in data warehouses.[25][27]
Dense and Sparse Indexes
A dense index provides an index entry for every row in the table, irrespective of the data's physical order, ensuring complete coverage and a direct pointer to each record's location. This structure facilitates rapid lookups for exact matches and supports range queries efficiently by allowing sequential traversal through the index entries. However, the comprehensive nature of dense indexes leads to higher storage overhead and increased maintenance costs during insertions, deletions, or updates, as every change may require modifying the index.[28]
In contrast, a sparse index contains entries only for a subset of rows, typically one entry per block, page, or group of sorted keys, which assumes the underlying data file is ordered by the index key. This approach points to the starting position of each block rather than individual records, enabling the system to scan sequentially within the block to find the target row. Sparse indexes are commonly employed in scenarios where data is clustered and sorted, such as in primary indexes, reducing the overall index size and maintenance effort compared to dense variants.[28]
The primary trade-off between dense and sparse indexes lies in their balance of access speed and resource consumption: dense indexes excel in unsorted heap files for point queries due to their exhaustive mapping, while sparse indexes suit sorted, clustered data for range operations by minimizing storage—often using significantly less space, especially in large tables where block-level entries can reduce index footprint by factors related to block capacity. Dense indexes are preferable for small tables or applications with frequent exact-match searches, whereas sparse indexes are advantageous for bulk sequential access in voluminous, ordered datasets.[28]
Historically, sparse indexes gained prominence in early database and file systems, particularly through the Indexed Sequential Access Method (ISAM), where they indexed the highest key per cylinder or track to support efficient retrieval in relatively static environments without excessive overflow handling. In modern relational database management systems, dense indexes dominate secondary indexing on non-clustered tables for versatility, while sparse designs persist in clustered index architectures to leverage data ordering for optimized performance. Sparse indexes are often integrated with clustered index structures to enhance efficiency in sorted data layouts.[29][28]
Inverted and Reverse Indexes
An inverted index is a data structure that maps terms from a collection of documents to the locations where those terms appear, enabling efficient full-text search and information retrieval.[30] Unlike forward indexes that map documents to their contents, inverted indexes reverse this mapping, storing a term dictionary of unique terms alongside postings lists that record document identifiers (docIDs) and optionally term positions or frequencies within those documents.[30] This structure supports rapid query processing, such as identifying all documents containing specific words or phrases.[30]
The core components include the term dictionary, a sorted list of unique terms often stored in memory for fast lookup, and the postings lists, which are disk-based lists of docIDs sorted in ascending order to facilitate efficient intersections during multi-term queries.[30] For example, in a corpus of Shakespeare's works, the term "Brutus" might map to a postings list like {2, 6, 13, 16, 23}, indicating the docIDs where it appears, with additional position data such as {2:<1,4,17>, 6:<2,5>} to support phrase queries like "Brutus AND Caesar."[30]
Construction begins with tokenization, which breaks text into individual terms by removing punctuation and splitting on whitespace—for instance, converting "Friends, Romans, countrymen" to "Friends Romans countrymen."[30] Stemming then normalizes variants to a common root, such as reducing "organize," "organizes," and "organizing" to "organ," potentially shrinking the vocabulary by about 17% while preserving query relevance.[30] The process sorts and merges term-docID pairs to build the index, often using gap encoding in postings lists to compress consecutive docIDs (e.g., docIDs 1,2,4 become gaps 1,1,2).[30]
Inverted indexes power full-text search in systems like Apache Lucene, an open-source library that uses them to index and query large text corpora with support for ranked retrieval, proximity searches, and field-specific queries.[31] In Lucene, the inverted index enables sub-second searches over billions of documents by combining term dictionaries with compressed postings for scalability.[31] To optimize storage and query speed, postings lists employ compression techniques such as variable-byte encoding, which represents integers with 7-bit payloads per byte, reducing the Reuters-RCV1 corpus index from 250 MB uncompressed to 116 MB.[30] Skip lists further accelerate operations like intersecting postings for multi-term queries by adding pointers that allow jumping over irrelevant docIDs, achieving roughly √P skips for a list of length P and cutting intersection time significantly for common query patterns.[30]
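Relational systems also expose inverted indexes; PostgreSQL's GIN ("Generalized Inverted Index") access method, for example, can index a text-search vector so that term lookups avoid scanning every document. The table and column names below (articles, body) are illustrative.
-- A GIN index over a tsvector behaves like an inverted index: each lexeme maps
-- to the set of rows (documents) that contain it.
CREATE INDEX idx_articles_body_fts
    ON articles
    USING GIN (to_tsvector('english', body));
-- Multi-term query answered from the postings for 'brutus' and 'caesar'.
SELECT id, title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'brutus & caesar');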
A reverse index, also known as a reverse key index, reverses the bytes of key values in an index to alter sorting behavior, particularly useful in composite indexes for prioritizing range scans on rightmost columns. In systems like Google Cloud Bigtable, where row keys are composite strings sorted lexicographically, byte reversal on earlier components groups data to favor scans on later ones—for example, storing a composite key (last_name, first_name) as (reversed(first_name bytes), last_name) shifts primary sorting to last_name while maintaining first_name order within groups via reversal decoding during queries. This technique enables efficient range scans on the prioritized column without restructuring the entire key schema; for instance, in time-series data, appending a reversed timestamp to a user ID (e.g., user123#reversedTimestamp) allows scans from most recent to oldest by querying suffixes, spreading writes across nodes and avoiding hotspots from monotonic keys. Applications include NoSQL databases for multi-column queries, such as retrieving records by surname ranges in a user directory while secondarily filtering by reversed first names to support descending order within matches.
Primary and Secondary Indexes
In relational databases, a primary index is established on the primary key of a table, which serves as a unique identifier for each row and dictates the physical or logical ordering of the data records in the file.[32] This index is inherently unique and enforces non-null values, ensuring that no duplicate or missing primary key entries exist, thereby maintaining data integrity.[33] Often implemented as a clustered index, the primary index organizes rows sequentially by the primary key value, allowing efficient sequential access and range queries aligned with the table's order.[12]
In contrast, a secondary index is created on non-primary key columns to provide auxiliary access paths for queries that do not involve the primary key.[12] Unlike the primary index, secondary indexes do not determine the physical order of the table and can be multiple per table, pointing to the actual rows via references to the primary key values or row identifiers (RID).[32] For instance, in a user table with a primary index on an ID column, a secondary index on the email column would map email values to corresponding ID values, enabling quick lookups by email without scanning the entire table.[34]
The relationship between primary and secondary indexes is interdependent: secondary index entries store primary key values to facilitate navigation to the correct row location, leveraging the primary index for final record retrieval.[12] While primary indexes are always unique due to their tie to the primary key, secondary indexes may be unique (e.g., on a candidate key) or non-unique, allowing duplicates for attributes like categories or names.[32] In relational database systems, the primary index plays a crucial role in supporting ACID properties, particularly consistency and isolation, by enforcing uniqueness and referential integrity during transactions to prevent conflicts and ensure reliable data modifications.[33]
Hash Indexes
Hash indexes employ a hash table data structure to map search keys to data records through a hash function, which computes an index into an array of buckets or slots. Each bucket can hold multiple entries via chaining, typically using overflow pages or linked lists to resolve collisions when multiple keys hash to the same slot. This organization enables direct access to records without traversing ordered structures, making hash indexes particularly efficient for point queries. In implementations like PostgreSQL, the index stores only 4-byte hash values of the indexed column rather than the full key, allowing indexing of large data types such as UUIDs or URLs without size restrictions.[35]
Static hashing variants allocate a fixed number of primary buckets at creation, with the hash function designed to distribute keys evenly across these buckets. When a bucket overflows due to insertions, additional records are appended to overflow chains, which can degrade performance if chains grow long. To address capacity issues from data growth, the entire index may require periodic reorganization, involving a full rehash of all entries into a larger set of buckets; this process is computationally expensive and disrupts availability. As described in standard database textbooks, static hashing suits scenarios with predictable, stable data volumes but incurs trade-offs in maintenance overhead similar to static file organizations like ISAM.[36]
Dynamic hashing techniques, such as linear hashing, mitigate these limitations by allowing the index to grow or shrink incrementally without full rehashing. Introduced by Witold Litwin in 1980, linear hashing uses a family of hash functions that evolve with the structure's size, employing a split pointer to designate the next bucket for division when the load factor exceeds a threshold, typically 80-90%. New buckets are appended sequentially, and records are redistributed only from the split bucket, ensuring average access times remain near constant—often 1-2 disk accesses for searches and insertions under typical bucket capacities of 50 or more. This pseudo-partitioning approach supports high load factors and minimal directory overhead, using just a few bytes of metadata to track the current level and split position.[37]
Operations on hash indexes achieve average O(1) time complexity for equality-based lookups, as the hash function directly computes the bucket location, followed by a linear scan of any chain. Insertions and deletions similarly involve hashing to the target bucket and updating the chain, though deletions may leave dead entries that require periodic cleanup via operations like VACUUM to reclaim space and prevent fragmentation. However, hash indexes do not support range queries, inequality comparisons, or sorting, as the hashed values disrupt key order; attempts to use them for such operations revert to full table scans. Concurrency control in hash indexes often relies on bucket-level locking to manage splits and insertions safely.[35]
Hash indexes find prominent use in in-memory databases for fast equality lookups on unique identifiers, such as user IDs in caching layers. For instance, Redis leverages hash tables as a core data type to store field-value pairs under a single key, enabling sub-millisecond retrievals for session data or configuration objects in high-throughput applications.
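In PostgreSQL, for example, a hash index is created explicitly with the USING HASH clause; the sessions table and token column here are illustrative. As discussed above, such an index serves only equality predicates.
-- Hash index on a lookup token: supports equality predicates like the query
-- below, but not range conditions or ORDER BY.
CREATE INDEX idx_sessions_token
    ON sessions
    USING HASH (token);
SELECT user_id, expires_at
FROM sessions
WHERE token = 'a1b2c3d4';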
In SQL Server's memory-optimized tables, hash indexes provide fixed-memory footprints for OLTP workloads dominated by point queries, avoiding the overhead of ordered indexes.[38][39]
Key limitations include the inability to handle range scans or ordered access, leading to inefficiencies in analytical queries, and potential space wastage from uneven hash distributions or residual dead tuples after deletions, which can inflate storage until maintenance runs. High collision rates from poor hash functions may also increase chain lengths, approaching O(n) worst-case performance, though modern implementations mitigate this with tunable bucket sizes and quality hash algorithms.[35]
Index Architectures
Clustered Indexes
A clustered index determines the physical order of data rows in a table, storing the rows themselves in the leaf nodes of the index structure rather than pointers to separate data pages. In systems like SQL Server, when a table has a clustered index, it is referred to as a clustered table, with data rows sorted and stored based on the index key values. Similarly, in MySQL's InnoDB storage engine, every table possesses a clustered index that holds the row data, typically aligned with the primary key. Only one clustered index is permitted per table, as the physical data ordering can exist in only a single sequence.[6][40]
The structure of a clustered index is commonly implemented as a B+ tree, where non-leaf nodes contain index keys and pointers to child nodes, while leaf nodes store the complete data rows in sorted order by the index key. This design ensures that traversing the tree leads directly to the physical location of the row data, eliminating the need for additional lookups. In SQL Server's rowstore indexes, the B+ tree facilitates this by placing data rows at the leaf level. InnoDB follows a comparable B-tree organization, with the clustered index pages containing all user-defined columns plus system fields like transaction IDs.[6][40]
Clustered indexes provide significant benefits for query performance, particularly in range scans and sequential access patterns, as the sorted physical order enables efficient retrieval without random I/O jumps. For instance, in time-series data sorted by timestamp as the clustered key, queries filtering recent dates can leverage sequential page reads for faster processing. This architecture often reduces disk I/O compared to non-indexed or heap-organized tables, especially for large datasets.[6][40]
Clustered indexes are typically created implicitly when defining a primary key constraint, such as via ALTER TABLE ADD PRIMARY KEY in SQL Server, which enforces uniqueness and non-null values while organizing the table. Manual creation is possible using CREATE CLUSTERED INDEX statements for non-primary keys, though primary keys are the standard choice. In InnoDB, the clustered index forms automatically on the primary key; absent that, it defaults to a unique non-null index or a generated row ID.[41][40]
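A sketch in SQL Server syntax, using a hypothetical orders table: the primary key constraint creates the clustered index implicitly, and the commented alternative shows explicit creation on a different key.
-- The PRIMARY KEY constraint creates a clustered index by default in SQL Server,
-- so the rows of orders are physically ordered by order_id.
CREATE TABLE orders (
    order_id   INT           NOT NULL,
    placed_at  DATETIME2     NOT NULL,
    total      DECIMAL(10,2),
    CONSTRAINT pk_orders PRIMARY KEY CLUSTERED (order_id)
);
-- Explicit alternative on a non-primary-key column (only one clustered index
-- is allowed per table):
-- CREATE CLUSTERED INDEX cix_orders_placed_at ON orders (placed_at);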
Despite their advantages, clustered indexes introduce trade-offs in maintenance operations. Insertions and deletions can be slower because they require maintaining the physical sort order, potentially causing page splits when new data exceeds page capacity—typically leaving space (e.g., 1/16th in InnoDB) to mitigate this, but frequent changes still increase overhead. The restriction to one per table limits flexibility for multiple access patterns, necessitating careful key selection to align with predominant queries.[42][43]
Non-Clustered Indexes
A non-clustered index is a separate data structure from the base table, typically implemented as a B+ tree, where the leaf nodes contain the index key values along with row locators that point to the actual data rows.[6][1] The row locator is either a row identifier (RID), which consists of the file, page, and slot location for heap-organized tables, or the clustered index key for tables with a clustered index.[6][4] This design allows the index to reference data without reorganizing the table's physical storage.[44]
Relational database systems support multiple non-clustered indexes on a single table, often up to dozens in practice and technically as many as 999 in systems like SQL Server, enabling optimized access for various query patterns.[45][4] For instance, a table of customer records might have one non-clustered index on the name column for name-based searches and another on the city column for location-based queries.[46] In Oracle and PostgreSQL, similar B-tree indexes can be created on multiple columns without limit beyond storage constraints.[1][4]
To retrieve data using a non-clustered index, the database engine performs an index seek to locate the relevant keys in the index structure, followed by a key lookup (also known as a bookmark lookup) to fetch the full row from the table using the row locators.[6] This two-step process efficiently supports equality and range queries on indexed columns.[1] In PostgreSQL, the ctid system column serves as the row locator for this purpose.[4]
Non-clustered indexes offer flexibility by allowing indexing on columns that do not determine the table's physical order, such as attributes used solely for filtering or joining, without affecting the underlying table's storage layout.[44] They enable rapid data access for diverse workloads while preserving the table as a heap or maintaining separate ordering via a clustered index.[1] However, non-clustered indexes introduce overhead, including additional storage for the index structure and the row locators, which can duplicate clustering keys across multiple indexes in systems like SQL Server, effectively acting as hidden columns in the index leaves.[6] Maintenance during insert, update, and delete operations further contributes to CPU and I/O costs, as each affected index must be updated separately.[4][1]
Covering Indexes
A covering index is a type of database index that includes all columns required to satisfy a specific query, allowing the database management system (DBMS) to retrieve the necessary data directly from the index without accessing the underlying table or clustered index.[47] This feature extends non-clustered indexes by incorporating non-key columns, often referred to as "included columns," which store the actual data values at the leaf level of the index structure.[48]
In structure, a covering index consists of key columns used for indexing and sorting, with additional non-key columns appended only to the leaf nodes to minimize overhead. For instance, in SQL Server, the syntax employs an INCLUDE clause, such as CREATE NONCLUSTERED INDEX IX_Employee_LastName ON Employees (LastName) INCLUDE (FirstName, Email);, where LastName serves as the key and FirstName and Email are included for coverage.[47] Similarly, PostgreSQL supports this via the INCLUDE clause in B-tree indexes since version 11, as in CREATE INDEX idx_employee_lastname ON employees (lastname) INCLUDE (firstname, email);, enabling index-only scans.[48] These included columns do not affect the index's uniqueness or search efficiency but provide complete query resolution.
The primary benefit of covering indexes is enabling index-only access ("index-only scans"), which eliminates the need for additional row lookups, thereby reducing input/output (I/O) operations and improving query performance.[49] For example, in SQL Server, this can reduce logical reads from 14 to 4 for a covered query, representing a substantial I/O decrease, and lower estimated operator costs from around 12 to under 1 in execution plans.[50] Such optimizations are particularly effective for read-heavy workloads on stable data, where visibility information confirms row accessibility without heap fetches.[49]
However, covering indexes introduce limitations, including increased storage requirements due to the inclusion of extra columns, which can lead to index bloat and higher maintenance costs during inserts, updates, or deletes.[47] Not all DBMS fully support dedicated INCLUDE clauses; for example, PostgreSQL versions before 11 approximate coverage by adding the extra columns as ordinary key columns of a multicolumn index, and support is limited to specific index types like B-tree in both SQL Server and PostgreSQL.[48] Over-inclusion of columns can also exceed index size limits, such as 900 bytes for keys in SQL Server or tuple size constraints in PostgreSQL.[47][48]
Detection of covering index usage occurs in query execution plans, where an "Index Seek" or "Index Scan" appears without accompanying "Key Lookup," "Table Scan," or heap access, confirming that the query was fully resolved via the index.[51] In PostgreSQL, plans explicitly show "Index Only Scan" when conditions are met.[49]
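In PostgreSQL, for instance, the plan can be inspected with EXPLAIN; with an index like the illustrative one above in place and a recently vacuumed table, the planner may report an Index Only Scan for the covered query (the exact plan depends on statistics and visibility information).
-- Assuming an index such as:
--   CREATE INDEX idx_employee_lastname
--       ON employees (lastname) INCLUDE (firstname, email);
-- the plan for the query below may show a node like
-- "Index Only Scan using idx_employee_lastname on employees".
EXPLAIN
SELECT lastname, firstname, email
FROM employees
WHERE lastname = 'Smith';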
Design Considerations
Column Selection and Order
Selecting columns for indexing involves prioritizing those that frequently appear in query predicates, particularly WHERE clauses, to maximize query performance while minimizing storage and maintenance overhead. High-selectivity columns, which have a large number of distinct values relative to the total rows (enabling the index to filter out most rows), are ideal candidates. For instance, a column like employee ID with unique or near-unique values offers high selectivity, whereas a gender column with only two values has low selectivity and is generally unsuitable for standalone indexing.[52][53]
In composite indexes, the order of columns follows the leftmost prefix principle, where the database can utilize the index only if the query conditions start from the leading column(s). Equality conditions (=) should precede range conditions (>, <, BETWEEN) to allow efficient index range scans after pinpointing exact matches. For example, an index on (department, salary) supports queries like WHERE department = 'IT' AND salary > 50000 effectively, as the equality on department enables a range scan on salary; however, a query on salary > 50000 alone cannot use this index without scanning the entire structure. This equality-range-inequality pattern—placing equality filters first, followed by range, and inequalities last—optimizes for common query patterns.[52][54][55]
Database management systems (DBMS) provide tools like statistics and histograms to assess selectivity during index design. Histograms capture the distribution of values in a column, enabling the query optimizer to estimate how many rows a predicate will return, which is particularly useful for range-based queries. For example, collecting histogram statistics via commands like RUNSTATS in Db2 helps identify skewed distributions that affect selectivity estimates. Avoid indexing low-cardinality columns in isolation, as they lead to full index scans with little filtering benefit.[56][52][57]
Common pitfalls include over-indexing small tables, where the overhead of traversing the index structure may exceed the cost of a direct table scan, potentially degrading performance. Additionally, indexing computed columns—derived expressions like total = price * quantity—requires them to be deterministic and persisted in some DBMS to avoid recomputation costs during queries and updates, but this can introduce blocking or maintenance challenges if not managed carefully.[52][58][59]
Composite and Multi-Column Indexes
A composite index, also known as a multi-column or concatenated index, is created on two or more columns of a table, treating the combination of column values as a single composite key for indexing purposes. In standard SQL, such an index is defined using the CREATE INDEX statement by listing the columns in a specific order within parentheses, for example: CREATE INDEX idx_example ON table_name (column1, column2);. This structure allows the database management system (DBMS) to organize the index entries in a sorted manner based on the concatenated keys, enabling efficient access for queries that reference the indexed columns.[60][61][62]
The effectiveness of composite indexes relies on prefix matching, where the index can be utilized only for queries that specify conditions starting from the leftmost (leading) columns in the defined order. For an index on columns (A, B, C), it supports searches like WHERE A = value or WHERE A = value AND B = value, but cannot be used for WHERE B = value alone without scanning the entire index. This leftmost-prefix principle optimizes range scans, equality checks, and sorting by allowing the DBMS to prune irrelevant branches early in the index traversal. Queries involving inequalities on the first differing column after equalities can also leverage the index efficiently.[60][61][62][44]
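A brief sketch of the leftmost-prefix rule, using a hypothetical composite index over (department, job_title, hire_date) on an employees table:
CREATE INDEX idx_emp_dept_title_hire
    ON employees (department, job_title, hire_date);
-- Can use the index: the conditions cover a leftmost prefix
-- (department, then job_title).
SELECT * FROM employees
WHERE department = 'IT' AND job_title = 'Analyst';
-- Generally cannot use the index as an access path: the leading column is
-- absent, so only a scan of the full index or table remains.
SELECT * FROM employees
WHERE job_title = 'Analyst';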
Composite indexes require more storage space than single-column indexes due to the larger key size from concatenating multiple values, which can slow down index traversal and insertion operations as the number of columns increases. To mitigate performance degradation, indexes are typically limited to 3-5 columns, though some systems allow more; excessive columns can lead to fragmented storage and higher maintenance overhead during data modifications. For string columns, prefix lengths may be specified to index only the initial characters, reducing key size while preserving utility for common queries.[60][62][44]
These indexes are particularly valuable for optimizing compound queries involving multiple conditions in WHERE, ORDER BY, or GROUP BY clauses, such as in e-commerce applications indexing on (category, price, date) to accelerate product filtering and sorting by category first, then price range. By covering multi-attribute access patterns, they reduce the need for table scans and improve overall query throughput in relational workloads.[60][61][62]
DBMS implementations vary in constraints: MySQL supports up to 16 columns per index with prefix lengths for strings up to 3072 bytes in InnoDB; Oracle permits multiple indexes on the same columns if they differ in type or partitioning, up to a maximum of 32 columns, emphasizing query-specific ordering; PostgreSQL allows up to 32 columns (configurable at build time) and recommends sparsity for broader applicability; SQL Server has no explicit column limit but advises against excessive columns due to storage impacts.[60][61][62][44][63]
Implementation Aspects
Concurrency Control Mechanisms
In multi-user database environments, concurrency control mechanisms ensure that multiple transactions can access and modify indexes simultaneously without compromising data consistency or integrity. These mechanisms primarily rely on locking protocols to manage read and write operations, preventing conflicts such as lost updates or dirty reads. Indexes, being auxiliary structures, often employ finer-grained locks compared to base tables to enhance parallelism, allowing operations on specific index entries or pages rather than entire structures.[64]
Locking in indexes typically uses two fundamental types: shared locks (S) for read operations, which permit multiple concurrent readers but block writers, and exclusive locks (X) for write operations like inserts, updates, or deletes, which grant sole access and block both readers and other writers.[65] These locks can be applied at the index level, such as key or page locks on specific entries, contrasting with coarser table-level locks that encompass the entire table and its indexes, thereby reducing contention but potentially limiting scalability.[64] For instance, an update query might acquire an X lock on an index key while holding intent locks on higher levels, enabling concurrent operations on unrelated index portions.[66]
Deadlocks arise when transactions cyclically block each other, such as two transactions updating the same index page where one acquires an S lock on page A followed by page B, while the other does the reverse, creating a wait cycle.[67] Prevention strategies include deadlock detection algorithms that monitor lock waits and resolve conflicts by aborting one transaction as a victim, often based on factors like cost or deadlock priority.[67] Timeouts provide another safeguard, where transactions wait for a configurable period (e.g., via LOCK_TIMEOUT in SQL Server) before erroring out, avoiding indefinite blocks.[64]
Multi-Version Concurrency Control (MVCC) integrates with indexes in systems like PostgreSQL by maintaining multiple versions of index entries tied to transaction snapshots, allowing readers to access a consistent view without acquiring locks that conflict with writers.[68] Each index tuple includes visibility metadata (e.g., xmin and xmax transaction IDs), enabling snapshot-based reads to ignore uncommitted or obsolete versions, thus avoiding reader-writer blocking entirely during queries.
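As an illustration of the timeout safeguard mentioned earlier in this section, both systems expose session-level settings; the values below are arbitrary examples.
-- SQL Server: fail a statement if a lock (including an index key or page lock)
-- cannot be acquired within 5 seconds.
SET LOCK_TIMEOUT 5000;
-- PostgreSQL equivalent for the current session.
SET lock_timeout = '5s';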
Distinguishing latches from locks is crucial for index operations: locks provide transaction-level, logical protection for data items like rows or keys, supporting features such as deadlock detection and rollback, whereas latches offer short-term, fine-grained synchronization for internal structures like buffer pages during index traversals or updates.[69] For example, a B+ tree search might use read latches on pages for concurrent access while employing write latches exclusively during insertions to prevent corruption, with modes like reader-writer policies to balance throughput.[69]
To enhance scalability, partitioned indexes distribute lock contention across subsets of data, enabling partition-level lock escalation in systems like SQL Server, where operations on one partition do not block others, thus supporting higher concurrency in large-scale environments.[70] SQL Server's row versioning further aids this by storing historical row versions in a tempdb-based version store for snapshot isolation, reducing lock duration for readers and minimizing contention on active indexes without altering the locking protocol for writers.[64]
Index Maintenance and Overhead
The process of building a database index typically involves scanning the entire table to extract the indexed column values, sorting those keys, and constructing the index structure, such as a B-tree, which can require significant temporary sort space from memory or disk.[71] In modern database management systems like SQL Server and PostgreSQL, this creation process supports parallelism to distribute the workload across multiple processors, reducing completion time.[72][73] For terabyte-scale tables, index creation can take several hours depending on hardware, data distribution, and configuration, as the full table scan and sorting dominate the I/O and CPU demands.[74]
Updating indexes during data modification operations introduces substantial overhead, as each insert, update, or delete must not only alter the base table but also maintain the index structure to preserve its ordering and balance.[71] In B-tree indexes, inserts and deletes often trigger page splits or merges when nodes exceed capacity, requiring additional I/O to redistribute keys and pointers across pages, which can make these operations slower than equivalent changes to a non-indexed table.[42][75] This maintenance cost escalates with multiple indexes on the same table, as every write operation must propagate changes to all relevant structures, increasing both CPU usage for key computations (e.g., hashing in hash indexes) and I/O for tree balancing.[52]
Index fragmentation arises primarily from page splits during inserts or updates, where a full page is split into two, often leaving the new pages underutilized and resulting in space waste due to reduced page density (e.g., dropping to around 50% fullness post-split).[42] This inefficiency inflates storage requirements and amplifies I/O during scans, as more pages must be read to retrieve the same amount of data. To mitigate fragmentation, database systems provide commands like REORGANIZE, which compacts pages online without rebuilding the entire index, and REBUILD, which drops and recreates the index to fully restore density and order, though at higher resource cost.[71][42]
In recent developments as of 2025, emerging techniques such as learned indexes, which use machine learning models to approximate index structures, offer potential for improved query performance and reduced maintenance overhead in modern database systems.[76]
Tuning index maintenance involves identifying and addressing underutilized or inefficient indexes to minimize ongoing overhead, such as dropping unused ones that still consume resources during writes.[71] Administrators can monitor index usage through system views, for example, Oracle's DBA_INDEX_USAGE or V$INDEX_USAGE_INFO, which track access frequency and effectiveness since monitoring was enabled via ALTER INDEX ... MONITORING USAGE.[77] Regular analysis of these statistics helps prioritize drops or rebuilds, balancing maintenance costs against benefits. Overall, while indexes significantly enhance read performance in read-heavy workloads, they impose additional overhead on insert operations due to the I/O and CPU demands of structure maintenance, underscoring the need for selective indexing in write-intensive environments.[78]
Applications and Limitations
Real-World Use Cases
In online transaction processing (OLTP) systems, clustered indexes are commonly applied on primary keys such as identity columns to organize data physically and support efficient insertions and updates, as seen in banking applications where transactions are clustered by unique transaction IDs to handle high concurrency.[79] Secondary non-clustered indexes are then built on frequently queried filter columns, such as account numbers in financial transaction tables, enabling rapid lookups for account-specific queries without scanning the entire table.[71][80]
For online analytical processing (OLAP) workloads in data warehouses, bitmap indexes are utilized on low-cardinality dimension columns to optimize set-oriented operations and aggregations, such as querying sales totals by region and year in a fact table.[81] Covering indexes complement this by including aggregated measures directly in the index structure, allowing queries to retrieve summary data without accessing the base table, which accelerates roll-up operations on sales data across time hierarchies.[25]
Full-text search applications leverage inverted indexes to map terms to document locations, facilitating fast keyword retrieval in large corpora; for instance, Elasticsearch employs inverted indexes to efficiently search and analyze log files by enabling quick identification of entries containing specific error patterns or timestamps.[82]
Spatial databases for geographic information systems (GIS) use R-tree indexes to manage multidimensional data, such as points, lines, and polygons representing maps or routes, by organizing them into hierarchical bounding rectangles that support efficient range queries like finding all features within a given area.[83] This structure, originally proposed for dynamic spatial searching, allows GIS applications to perform proximity searches on location data without exhaustive scans.[84]
In NoSQL environments, document-oriented databases like MongoDB support indexes on embedded documents to query nested fields efficiently, such as indexing user profiles within a larger order document to filter by embedded attributes like address components.[85] Wide-column stores like Apache Cassandra use secondary indexes on non-partition key columns to enable queries beyond the primary access path, accommodating the flexible schema of wide-column models in scenarios like time-series data retrieval across distributed partitions.[86]
Performance Trade-offs and Constraints
Database indexes introduce notable storage overhead, typically increasing the overall database size by 10-100% relative to the base table data, depending on the schema, data types, and number of indexes.[87][88] This expansion arises because each index stores sorted keys, pointers, and sometimes row data, duplicating portions of the table across multiple structures.[52]
Write operations suffer from amplification due to the need to synchronize changes across all relevant indexes. An insert, update, or delete on a row requires modifying not only the base table but also every index referencing the affected columns, potentially multiplying the I/O and CPU cost by the number of indexes involved.[89] For instance, if a column participates in five indexes, the effective cost of an update to that column can approach five times the base operation, as each index must be rebalanced or rewritten.[90] In PostgreSQL, a single update might touch one data page plus one page per index, leading to 11 or more page writes in tables with multiple indexes.[91]
Indexes are often counterproductive in certain scenarios and should be avoided to prevent unnecessary overhead. Columns with low selectivity—those having few distinct values relative to the row count—yield indexes that cover most of the table, offering minimal query speedup while imposing full maintenance costs during writes.[78] Write-heavy workloads exacerbate this issue, as frequent modifications amplify the synchronization burden. Small tables rarely benefit from indexing, since full table scans remain faster than index lookups plus the ongoing update overhead.[52][92]
Optimization strategies like partial indexes help balance these trade-offs by indexing only qualifying rows, thereby reducing storage footprint and write costs. For example, an index on a status column filtered by WHERE active = 1 targets active records exclusively, avoiding unnecessary entries for inactive ones and speeding up relevant queries.[93] In systems like PostgreSQL, partial indexes apply a WHERE clause during creation to limit scope.[94] SQL Server implements this via filtered indexes, which support simple predicates and can decrease index size by up to 90% for skewed data distributions.[95][96]
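The two syntaxes mentioned above look roughly as follows, on a hypothetical orders table with an active flag column; PostgreSQL calls these partial indexes and SQL Server calls them filtered indexes.
-- PostgreSQL partial index: only rows satisfying the predicate are indexed.
CREATE INDEX idx_orders_active
    ON orders (customer_id)
    WHERE active = true;
-- SQL Server filtered index with an equivalent simple predicate.
CREATE NONCLUSTERED INDEX IX_orders_active
    ON orders (customer_id)
    WHERE active = 1;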
Emerging approaches, such as learned indexes, seek to alleviate traditional constraints by replacing rigid tree structures with machine learning models trained on data distributions. Google's 2017 proposal uses recursive models—essentially small neural networks—to approximate index lookups, achieving up to 3x space savings and 5-70% faster queries in benchmarks while easing maintenance in dynamic environments.[97] These models treat indexing as a search problem solved via cumulative distribution functions, offering a promising evolution for large-scale databases. Since 2020, research has advanced learned indexes for multi-dimensional data, differential privacy, and disk-based systems, with ongoing developments as of 2025 improving adaptability and performance in diverse workloads.[97][98]