Fact-checked by Grok 2 weeks ago

Candidate key

In relational database theory, a is defined as a minimal set of one or more attributes within a that uniquely identifies each , such that no proper of these attributes can perform the same unique identification. This minimality ensures that the key contains no redundant attributes, distinguishing it from broader superkeys, which are any sets of attributes—including candidate keys and their supersets—that also guarantee tuple uniqueness but may include extraneous elements. Candidate keys play a foundational in maintaining and enabling efficient querying in relational models, as they provide multiple potential options for uniquely referencing records without duplication. Typically, database designers select one candidate key to serve as the , which is enforced by the system to prevent values and duplicates, while the remaining candidate keys are termed alternate keys and may support secondary indexes or constraints. For instance, in a representing students, both a unique student ID and a combination of and birthdate could qualify as candidate keys if they each uniquely identify records without overlap. Beyond identification, candidate keys are integral to database normalization processes, particularly in achieving higher normal forms like Boyce-Codd Normal Form (BCNF), where every determinant in a functional dependency must be a candidate key to eliminate anomalies in insertion, update, and deletion operations. This requirement helps minimize redundancy and dependency preservation across decomposed relations, ensuring robust schema design. In practice, identifying all candidate keys during the logical design phase—often through analysis of functional dependencies—allows for flexible enforcement of referential integrity via foreign keys that reference these unique identifiers.

Definition and Basics

Formal Definition

In relational database theory, a is defined as a minimal set of attributes within a relation schema R that uniquely identifies each , ensuring that no two tuples share the same combination of values for those attributes, and no proper of the set possesses this property. This minimality means that removing any single attribute from the candidate key would allow duplicate tuples or fail to distinguish between some pairs of tuples. Formally, let K be a subset of the attributes of relation R. Then K is a candidate key if it satisfies the functional dependency K \to R (meaning the values of K determine all attributes in R), and for every proper subset K' \subset K, K' \not\to R (no smaller subset determines the entire relation). This notation captures the essence of uniqueness through functional dependencies, where K \to R implies that the relation projected onto K has no duplicate tuples. The concept of candidate keys originated in E.F. Codd's foundational , introduced in , where relations could possess multiple nonredundant primary keys, each serving as a . Candidate keys form the minimal basis for , which are non-minimal sets of attributes that also uniquely identify tuples.

Relation to Superkeys

A in a is defined as any set of one or more attributes that uniquely identifies each tuple in a , potentially including extraneous attributes that do not contribute to uniqueness. Every candidate key is a , but the converse does not hold; candidate keys represent the minimal subsets of attributes among all that maintain uniqueness. are generated by extending candidate keys with additional attributes, preserving the unique identification property without necessity. For instance, if {A, B} serves as a in a , then the expanded set {A, B, C} qualifies as a , though it is not minimal due to the redundant inclusion of C. The primary distinction lies in minimality: permit redundant attributes to achieve , whereas demand irreducibility, ensuring no attribute can be removed without violating .

Properties

Uniqueness and Irreducibility

A in a enforces the uniqueness property, ensuring that no two distinct in a share the same value for the attributes comprising the , thereby providing a one-to-one mapping from key values to tuples. This property, foundational to the , guarantees that each tuple can be distinctly identified without , as articulated in the original where primary keys (a type of candidate key) must uniquely identify each tuple in a nonredundant manner. The irreducibility property complements uniqueness by requiring that no proper of the 's attributes possesses the uniqueness property on its own; in other words, removing any single attribute from the key would result in a set that fails to uniquely identify all tuples. This minimality ensures the key is as concise as possible while maintaining full discriminatory power, and it is verified through analysis of functional dependencies within the relation . These properties collectively bolster by preventing the insertion of duplicate records that could otherwise lead to inconsistencies or redundant information in the database. Furthermore, they support across related tables, as foreign keys can reliably reference candidate keys to establish valid links without risking mismatches due to non-unique or reducible identifiers. In SQL database management systems, the property is enforced through constraints, which prevent duplicate values in the specified columns and automatically create a unique index for efficient checking. However, since standard constraints permit values (treating multiple s as non-duplicate), candidate keys require additional NOT NULL constraints on all attributes to fully emulate the non-null mandated by relational theory, distinguishing them from mere unique sets that might allow nulls.

Composition and Cardinality

A candidate key may consist of a single attribute, referred to as a simple candidate key, when that attribute alone uniquely identifies each in a . In the , such a key is a whose values are nonredundant and sufficient for unique identification, as exemplified by a dedicated field. When no individual attribute provides , a candidate key becomes composite, formed by the combination of two or more attributes that together uniquely identify tuples. This structure ensures nonredundancy, meaning no of the attributes can be removed without losing the identification . For instance, a composite candidate key might involve {LastName, FirstName, BirthDate} to distinguish records where single attributes overlap. The of a candidate key, defined as the number of attributes it comprises, varies across different and even within the same , where multiple candidate keys of differing lengths may exist. A might have one simple candidate key and another with higher , reflecting the diverse ways can be achieved based on the attribute's functional dependencies. All such compositions presuppose as a fundamental . Larger candidate keys impose greater storage demands due to the increased size of entries and key values, which in turn elevate indexing overhead and can degrade query performance, especially in operations involving joins or searches on primary keys derived from these candidates.

Identification Methods

Using Functional Dependencies

A () is a on a that specifies, for sets of attributes X and Y, that if two tuples agree on all attributes in X, they must agree on all attributes in Y, denoted as X \to Y. This holds if no two distinct tuples with the same X-values have differing Y-values, ensuring determination. Functional dependencies provide the theoretical foundation for identifying candidate keys, where a candidate key is a minimal set of attributes X such that X \to R, meaning X functionally determines all attributes in the relation schema R. Given a set of FDs F over R, candidate keys are those minimal superkeys derived from F, as superkeys are non-minimal sets satisfying the same determination property. To determine if a set X is a superkey, compute its attribute closure X^+ with respect to F, defined as the set of all attributes functionally determined by X using the FDs in F. X is a superkey if X^+ = R, and it is a candidate key if no proper subset of X has a closure equal to R, confirming minimality. The closure X^+ is derived by applying Armstrong's axioms repeatedly: reflexivity (if Y \subseteq X, then X \to Y), augmentation (if X \to Y, then XZ \to YZ), and transitivity (if X \to Y and Y \to Z, then X \to Z). These axioms are sound and complete, generating all implied FDs in the closure F^+. Before computing closures for key identification, it is often useful to derive a canonical cover of F, which is a minimal equivalent set of FDs with no redundant attributes or dependencies. The process involves removing extraneous attributes from the left and right sides of each FD and eliminating redundant FDs, resulting in a simplified set that preserves the semantics of F for closure computations. This reduction aids in efficient verification of key minimality without altering the implied dependencies.

Computational Algorithms

Computational algorithms for identifying candidate keys from a relational schema typically rely on the given set of functional dependencies (FDs) as input. These methods aim to systematically determine minimal superkeys that uniquely identify tuples in the relation. The attribute closure algorithm forms the foundation for key discovery by iteratively expanding a starting set of attributes using the FDs until no further attributes can be added. To apply it for candidate keys, the closure of each possible subset of attributes is computed; a subset qualifies as a superkey if its closure encompasses all attributes in the schema, and it is minimal (a candidate key) if removing any attribute from it results in a non-superkey. This process, while straightforward, requires checking multiple subsets to ensure minimality. Minimal key finders build on computations through techniques that test attribute subsets for status, incorporating optimizations to mitigate exponential growth in complexity. One such approach exploits the arrangement of attributes in FD graphs to identify essential attributes first—those not on the right-hand side of any —and then builds keys by combining them with non-essential ones via , avoiding exhaustive in many cases. For instance, starting from a of all attributes and iteratively reducing subsets while verifying closures prunes redundant checks. Practical tools and software facilitate automated computation, often integrating FD mining with key identification. In database management systems like and , schema analysis features in tools such as or support visualizing dependencies and manually verifying keys, though full automation typically requires extensions. Open-source libraries, such as the Python-based FDTool, mine minimal FDs from tabular data and directly infer candidate keys using closure-based methods, providing outputs like equivalent attribute sets for large datasets. Regarding complexity, computing all candidate keys is NP-hard in the worst case due to the need to enumerate and verify minimal transversals over the FD set. However, for acyclic schemas—where the dependency has no cycles—the problem reduces to polynomial time via efficient traversal algorithms. Heuristics, such as level-wise search with pruning in tools like FDTool, enable scalable application to large datasets by focusing on promising subsets early.

Examples

Single-Attribute Candidate Key

A single-attribute candidate key occurs in a where one attribute alone uniquely identifies each , assuming it is unique and non-null throughout the . For instance, in an Employees , the EmployeeID attribute serves as the sole candidate key, ensuring no two employees share the same identifier. This setup leverages the uniqueness property to prevent duplicates, allowing reliable entity identification without additional attributes. Consider the relation schema R(\text{EmployeeID}, \text{Name}, \text{Department}), where the \text{EmployeeID} \to \{\text{Name}, \text{Department}\} holds, meaning the value of EmployeeID determines the values of the other attributes. Here, EmployeeID functions as the minimal set required for unique identification, illustrating the simplicity of a single-attribute in relational schemas. The use of a single-attribute candidate key offers several practical benefits in database design. It enables efficient indexing, as the database management system can create a compact on just one attribute for fast lookups and retrievals. Additionally, it minimizes storage overhead by avoiding the need for multiple attributes in keys, reducing the size of indexes and join operations. Joins become straightforward, as referencing a single attribute simplifies queries across related tables without complex composite matching. In real-world applications, single-attribute candidate keys are commonly implemented as surrogate keys, such as auto-incrementing integer IDs in SQL tables, which are system-generated to provide a simple, artificial . For example, in , this can be specified using the property, while employs AUTO_INCREMENT to automatically assign sequential values.

Composite Candidate Key

A composite candidate key arises when no single attribute in a can uniquely identify each , necessitating the combination of multiple attributes to achieve uniqueness while maintaining minimality—meaning the removal of any attribute from the set would violate this . This irreducibility ensures that the key is as concise as possible without . In theory, such keys are essential for scenarios where individual attributes alone are insufficient due to the inherent multiplicity in real-world data relationships. Consider a relation named Orders with attributes OrderDate, CustomerID, Product, and . Here, {OrderDate, CustomerID} functions as a composite candidate key, as neither attribute alone uniquely identifies a —a single customer can place multiple orders, and the same date can involve orders from various customers, but their combination ensures distinctness. The relevant set includes {OrderDate, CustomerID} → {Product, }, demonstrating full determination by the composite key, whereas OrderDate does not functionally determine CustomerID, and vice versa, confirming that no proper qualifies as a key. This structure highlights the complexity of modeling temporal and entity-based uniqueness in transactional data. Composite candidate keys introduce challenges such as elevated join costs in query execution, as the multi-attribute nature results in larger index sizes and more complex matching during table joins compared to single-attribute keys. Furthermore, they heighten the potential for partial dependencies in unnormalized relations, where a non-key attribute might depend on only one component of the key, fostering insertion, update, and deletion anomalies that complicate . In practice, composite candidate keys often appear in legacy database systems employing non-surrogate natural keys to leverage existing business data without artificial identifiers. For instance, in a Books relation, {ISBN, Edition} may serve as a composite candidate key, accommodating cases where different editions of a share core identifiers but require distinction for inventory and cataloging purposes. This approach preserves semantic richness but demands careful schema design to mitigate performance overheads.

Applications and Relations

Selection as Primary Key

In relational database design, selecting a primary key from multiple candidate keys involves evaluating attributes based on , , and to ensure optimal performance and . Simplicity favors single-attribute keys over composites, as they reduce complexity in joins and indexing, while low-cardinality numeric attributes are preferred for their compact storage and faster comparisons. Stability prioritizes keys with values that rarely change, minimizing across related tables. Efficiency considers indexing overhead, where smaller data types like integers outperform variable-length strings in query execution and storage. These criteria guide database administrators to designate one candidate key as primary, ensuring it uniquely identifies rows without nulls or duplicates. The remaining candidate keys are designated as alternate keys, which maintain uniqueness but do not serve as the primary identifier. These are enforced through constraints, allowing multiple unique identifiers per while avoiding the stricter not-null requirement of s. Alternate keys support flexible querying and without impacting the clustered index structure tied to the primary key. In SQL implementations, the selection is formalized using the ALTER TABLE statement to add a constraint on the chosen candidate key columns. This automatically creates a clustered on those columns in most management systems, optimizing row retrieval and enforcing integrity at the storage level. For instance, nonclustered primary keys can be specified if a separate clustered is preferred, balancing access patterns. Trade-offs in selection often contrast natural keys, derived from business data like identifiers, against keys, which are system-generated artificial values such as integers or UUIDs. Natural keys leverage inherent domain logic but risk performance degradation from larger sizes or changes, whereas keys enhance stability and scalability in distributed environments by from business volatility, though they require additional unique constraints on natural alternates. In high-volume systems, reduce insert/delete costs in some cases but may increase overall .

Role in Database Normalization

Candidate keys play a pivotal role in by serving as the foundational elements for identifying and eliminating partial, transitive, and other dependencies that lead to data redundancies and update anomalies. In the process, these keys define the minimal sets of attributes that uniquely identify tuples, enabling designers to decompose relations into smaller, dependency-free components while preserving the relational structure. This ensures that non-key attributes depend solely on entire candidate keys, rather than subsets or intermediaries, thereby maintaining across operations like insertion, deletion, and modification. In (2NF) and (3NF), candidate keys are essential for detecting partial and transitive . A achieves 2NF when it is in and no non-prime attribute depends on a proper of any candidate key; if such a partial dependency exists, is required to isolate the dependent attributes into a separate projected over the and the dependents. For 3NF, the process extends to transitive dependencies, where non-prime attributes must depend directly on candidate keys rather than other non-prime attributes; violations prompt to ensure that all determinants involving non-prime attributes are tied to full candidate keys. This key-centric approach in 2NF and 3NF guarantees that updates to non-key data do not propagate redundantly across tuples. Boyce-Codd Normal Form (BCNF) further refines this by mandating that every determinant in a must be a , addressing cases where 3NF relations still harbor anomalies due to non-key determinants. If a α → β holds where α is not a (and thus not a ), the relation violates BCNF, necessitating into projections over α ∪ β and α ∪ (R - β), with the original dependencies projected accordingly to maintain equivalence. This stricter reliance on eliminates more subtle redundancies, such as those arising from overlapping keys, though it may not always preserve all dependencies without loss. For higher normal forms like (4NF) and (5NF), candidate keys aid in detecting multivalued dependencies (MVDs) and join dependencies that exceed constraints. In 4NF, a relation is normalized if, for every non-trivial MVD α →→ β, α is a , meaning candidate keys help identify and decompose relations where independent multi-valued facts about a key lead to spurious tuples. Similarly, 5NF requires that every join dependency is implied solely by the candidate keys of the relation, prompting into cyclic projections to resolve complex interdependencies without redundancy. These key-based checks ensure that handles advanced anomaly scenarios in relations with multiple independent associations. The benefits of leveraging candidate keys in normalization include ensuring lossless decomposition, where the join of projected relations reconstructs the original without spurious tuples, as verified by the keys' role in the chase algorithm or dependency closure. Additionally, key-centered decompositions promote dependency preservation, particularly in 3NF, allowing local enforcement of functional dependencies without global recomputation, which enhances query efficiency and data consistency in relational databases.

References

  1. [1]
    Relational Databases
    A candidate key is a minimal super key. If a collection of attributes is a super key, but removing any attribute from the collection destroys that the ...
  2. [2]
    [PDF] Relational Database Definitions Relational model - Princeton CS
    Candidate Key: any key. Primary key: a candidate key defined to be primary by person who defines relation. Superkey: any set of attributes that contains a.
  3. [3]
    Definitions and Terminology
    Candidate key. a minimal superkey;. no attributes appear in the key that do not help to identify the tuple ; Primary key. a candidate key chosen to be the ONE ...
  4. [4]
    Normalization
    A table is in Boyce-Codd normal form (BCNF) if it is in 3NF and if every determinant is a candidate key - a combination of attributes that can be uniquely used ...
  5. [5]
    [PDF] CSC 443 – Database Management Systems Normalization
    Candidate Keys. • A candidate key for a given table is. – unique (only one row has that value or combination of values). – irreducible (there is no subset of ...
  6. [6]
    [PDF] Normalization - Chapter 7: Relational Database Design
    Normalization decides if a relation is in good form, and if not, decomposes it into relations in good form, based on functional and multivalued dependencies.<|control11|><|separator|>
  7. [7]
    [PDF] CS34800 Information Systems Relational Design - CS@Purdue
    Sep 28, 2016 · Functional Dependencies (Cont.) ▫ K is a superkey for relation schema R if and only if K → R. ▫ K is a candidate key for R if and only if. ○ K → ...
  8. [8]
    [PDF] Functional Dependencies - Clark University
    Is AG a candidate key? 1. Is AG a super key? 1. Does AG → R? == Is (AG)+ ... • K is a superkey if K → R. • K is a candidate key if there is no subset ...
  9. [9]
    [PDF] A Relational Model of Data for Large Shared Data Banks
    Future users of large data banks must be protected from having to know how the data is organized in the machine. (the internal representation). A prompting.
  10. [10]
    [PDF] Introduction to Relational Database Systems
    Primary Keys. • A relation might have several candidate keys. • In these cases, one candidate key is chosen as the primary means of uniquely identifying tuples.Missing: definition | Show results with:definition
  11. [11]
    MORE ON CANDIDATE KEYS - SQL and Relational Theory, 2nd ...
    Uniqueness: No valid value for R contains two distinct tuples with the same value for K. · Irreducibility: No proper subset of K has the uniqueness property.
  12. [12]
    Documentation: 18: 5.5. Constraints - PostgreSQL
    Unique constraints ensure that the data contained in a column, or a group of columns, is unique among all the rows in the table. The syntax is: CREATE TABLE ...
  13. [13]
    Targeted Least Cardinality Candidate Key for Relational Databases
    Aug 24, 2024 · The targeted least cardinality candidate key problem (TCAND) aims to find the smallest set X such that the FD X to T can be derived from F.
  14. [14]
    Dependency Structures of Data Base Relationships
    The -semi-lattice of the B in maximal dependencies is shown to determine the dependency structure completely and some insight into recent work on use of ...
  15. [15]
    [PDF] FUNCTIONAL DEPENDENCIES - cs.wisc.edu
    FD CLOSURE. Armstrong's axioms are: • sound: any FD generated by an axiom belongs in F. K. • complete: repeated application of the axioms will generate all ...
  16. [16]
    [PDF] Functional Dependencies - Purdue Computer Science
    Closure of a Set of Functional Dependencies. ▫ Given a set F set of functional dependencies, Armstrong's axioms show there are certain other functional ...
  17. [17]
    [PDF] Functional Dependencies and Normalization - TINMAN
    Given R and F, K ⊆ R is a superkey for R if K → R holds. ... If the missing set of attributes forms a candidate key then this must be the only candidate key.
  18. [18]
    Finding candidate keys for relational data bases - ACM Digital Library
    E. F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM, Vol. 13, No. 6, 1970, 377 - 387. Digital Library · Google Scholar. [3]. E. F. Codd ...
  19. [19]
    (PDF) An Efficient Algorithm to Compute the Candidate Keys of a ...
    Aug 6, 2025 · We provide an efficient algorithm for computing the candidate keys of a relational database schema. The algorithm exploits the 'arrangement' ...
  20. [20]
    FDTool: a Python application to mine for functional dependencies ...
    Oct 19, 2018 · FDTool provides the user with the minimal FDs, equivalent attribute sets and candidate keys mined from a dataset. This is given with the time (s) ...
  21. [21]
    The complexity of dependency detection and discovery in relational ...
    Jan 8, 2022 · Arguably the most prominent type of such dependencies are the unique column combinations (UCCs), also known as candidate keys. These are ...
  22. [22]
    [PDF] Polynomial delay Hybrid algorithms to enumerate candidate keys for ...
    Dec 22, 2023 · We investigate the problem of enumerating all candidate keys of a rela- tional database schema. A candidate key of a set of functional ...
  23. [23]
    Database Design Methodology Summary - UC Homepages
    A candidate key is an attribute or minimal set of attributes of an entity that uniquely identifies each occurrence of that entity. We may identify more than one ...
  24. [24]
    [PDF] IT360: Applied Database Systems Relational Model (Chapter 3)
    Specify surrogate key in SQL Server: column_name int_type IDENTITY (seed, increment). Specify surrogate key in MySQL: column_name int_type AUTO_INCREMENT.
  25. [25]
    [PPT] DENEME - SIUE
    Minimality – every column of a composite candidate key must be necessary for uniqueness. Primary Key. Primary Key – a candidate key chosen by the database ...
  26. [26]
    Functional Dependencies
    A relation is in BCNF if and only if every determinate is a candidate key. To be in 3NF but not in BCNF, must have a composite candidate key, one of whose ...
  27. [27]
    [PDF] Chapter 3: Relational Model Example of a Relation
    ▫ K is a candidate key if K is minimal. Example: {customer-name} is a candidate key for Customer, since it is a superkey (assuming no two customers can ...
  28. [28]
    Composite Key in DBMS - Analytics Vidhya
    Jul 1, 2024 · Composite keys work by combining multiple columns to create a unique identifier for records in a table. For instance, consider a table Orders ...
  29. [29]
    Composite Key in DBMS: Definition, Uses, Examples & Best Practices
    Jul 28, 2024 · Instead, a composite key can be created using the combination of the customer ID and the order date. This ensures that each order is uniquely ...
  30. [30]
    [PDF] Normalization - Chapter 7: Relational Database Design
    candidate key of the original schema, so no further relation schema needs be ... ▫ A temporal functional dependency X → Y holds on schema R if the ...
  31. [31]
    [PDF] Chapter 14: Indexing - Database System Concepts
    single node (the one on the left), and delete the other node. • Delete the ... condition that the search key is a candidate key is a candidate key.
  32. [32]
    First Normal Form (1NF), Second Normal Form (2NF), and Third ...
    Dec 1, 2020 · Note that the 2NF partial dependency rule only kicks in if your relation has a composite candidate key (i.e. one that consists of multiple ...
  33. [33]
    Second Normal Form (2NF) - GeeksforGeeks
    Jul 12, 2025 · Eliminate Partial Dependencies: A partial dependency occurs when a non-prime attribute (not part of the candidate key) depends only on a part of ...Missing: challenges costs
  34. [34]
    Mapping UML class diagrams to the relational model
    (RCId, A) is a composite candidate key. RCId is a foreign key referencing RC(RCId); A surrogate key, such as RCA_Id, may be created to serve as the simple ...
  35. [35]
    Database Systems A Practical A - Thomas Connolly - Academia.edu
    ... relation, clientNo and propertyNo together form the (composite) candidate key. If we need to take into account the pos- sibility that a client may view a ...
  36. [36]
    Identifying Candidate Primary Keys | Database Design
    A candidate key is a nonredundant attribute set that uniquely identifies the tuples of a relation. Note that it is possible for composite candidate keys to ...
  37. [37]
    SQL Database design: Choosing a primary key
    Mar 16, 2014 · A primary key uniquely defines each row. Choose between natural keys (business value) or surrogate keys (generated by SQL). Ensure consistency ...
  38. [38]
    Unique constraints and check constraints - SQL - Microsoft Learn
    Feb 4, 2025 · Although both a UNIQUE constraint and a PRIMARY KEY constraint enforce uniqueness, use a UNIQUE constraint instead of a PRIMARY KEY constraint ...
  39. [39]
    Create Primary Keys in SQL Server - Microsoft Learn
    May 15, 2025 · You can define a primary key in the SQL Server Database Engine by using SQL Server Management Studio or Transact-SQL.
  40. [40]
    Natural versus Surrogate Primary Keys in a Distributed SQL Database
    Feb 18, 2020 · Every table must have a defined primary key constraint. · Every table must have at least one business unique key. One of these might coincide ...<|separator|>
  41. [41]
    [PDF] Data Normalization
    A relation is in BCNF, if and only if, every determinant is a candidate key. • The difference between 3NF and BCNF is that for a functional dependency A → B, ...
  42. [42]
    [PDF] Database Design & Normalization
    K ... Why is a dependency on parts of a candidate key bad? — That is: Why is not ...
  43. [43]
    class3
    5NF/PJNF (Projection-Join Normal Form). A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R. e.g., STUDENT ...