Functional dependency
In relational database theory, a functional dependency (FD) is a constraint between two sets of attributes in a relation, in which the values of one set (the determinant) uniquely determine the values of the other set (the dependent). Formally, an FD denoted X \to Y holds if, whenever two tuples of the relation agree on the attributes in X, they also agree on the attributes in Y; this ensures that Y is functionally determined by X, preventing multiple possible values of Y for a single value of X.[1] Trivial FDs occur when Y is a subset of X, while non-trivial FDs capture meaningful semantic relationships, such as an employee's social security number determining their name in a personnel database.[2] The concept of functional dependencies was introduced by Edgar F. Codd in the early 1970s as part of his development of the relational model and normalization techniques, where they serve as a foundational mechanism for modeling real-world constraints and ensuring data consistency within relations.[3] FDs generalize the notion of a key: a superkey is precisely an attribute set that functionally determines every attribute of the relation, and a candidate key is a minimal such set; more broadly, FDs identify dependencies that reflect business rules or entity relationships.[4] For instance, in a relation schema for orders, the order ID might functionally determine the customer details, enforcing uniqueness and consistency across the database.[5]

Functional dependencies are central to database normalization, the process of organizing relations into progressively higher normal forms (such as first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF)) that minimize data redundancy and avoid insertion, deletion, and update anomalies. By analyzing FDs, designers can decompose unnormalized relations into smaller, well-structured ones while preserving the original data's dependencies; for example, partial dependencies (where a non-key attribute depends on part of a composite key) are eliminated in 2NF to prevent redundancy. This normalization relies on concepts like full functional dependency, in which no proper subset of the determinant determines the dependent attributes.[1]

To reason about sets of functional dependencies and derive implied constraints, database theory employs Armstrong's axioms, a sound and complete set of inference rules named after William W. Armstrong, who formalized them in 1974.[1] These axioms are reflexivity (if Y \subseteq X, then X \to Y), augmentation (if X \to Y, then XZ \to YZ for any Z), and transitivity (if X \to Y and Y \to Z, then X \to Z), supplemented by derived rules such as union and decomposition.[6] Using these, one can compute the closure of an attribute set (all attributes implied by it) to identify keys and validate normal forms, ensuring schemas are both theoretically robust and practically efficient.[7]

Definition and Notation
Formal Definition
In relational database theory, a functional dependency (FD) is a constraint that specifies a semantic relationship between attributes in a relation schema. Consider a relation R defined over a set of attributes A = \{A_1, A_2, \dots, A_n\}, where each attribute A_i has an associated domain D_i, and R consists of tuples t such that t[A_i] \in D_i for each attribute A_i. A functional dependency X \to Y, where X and Y are subsets of A, holds in R if, for any two tuples t_1 and t_2 in any legal instance of R, whenever t_1[X] = t_2[X] (the projected values on all attributes in X are identical), it follows that t_1[Y] = t_2[Y] (the projected values on all attributes in Y are identical).[8] This definition captures the idea that the values of attributes in Y are uniquely determined by the values of attributes in X, enforcing consistency across the relation without permitting anomalies from redundant data representations. FDs are properties of the relation schema rather than of specific instances, meaning they must hold in all valid states of the relation to reflect the intended semantics of the data.[8]

A functional dependency X \to Y is trivial if Y \subseteq X, in which case it always holds because equality on X necessarily implies equality on the subset Y. Trivial FDs impose no additional constraints but are useful in theoretical analyses, such as closure computations, as they are logically implied by any set of FDs.[8] In contrast, a non-trivial functional dependency occurs when Y \not\subseteq X, meaning Y contains at least one attribute not in X; such FDs impose meaningful constraints that capture key data interdependencies, enabling database designers to identify dependencies for normalization and integrity enforcement.[8]
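The pairwise condition above can be checked directly against a concrete relation instance. The following Python sketch is illustrative only: the encoding of tuples as dictionaries and the helper name fd_holds are assumptions made here, not part of the formal definition.

```python
def fd_holds(relation, X, Y):
    """Return True iff the FD X -> Y holds in this relation instance,
    modeled as a list of dicts mapping attribute names to values."""
    seen = {}  # projection on X -> first observed projection on Y
    for t in relation:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but disagree on Y
        seen[x_val] = y_val
    return True

r = [{"A": 1, "B": 2}, {"A": 1, "B": 3}]
print(fd_holds(r, X=["A", "B"], Y=["A"]))  # True: trivial, Y is a subset of X
print(fd_holds(r, X=["A"], Y=["B"]))       # False: tuples agree on A but differ on B
```

Note that such a test can only refute an FD on a given instance; because an FD is a property of the schema, it must hold in every legal instance, which no examination of a single instance can establish.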
Common Notation
In database theory, a functional dependency (FD) between two sets of attributes X and Y in a relation is commonly denoted with the arrow symbol as X \to Y, where X is the determinant (or left-hand side) and Y is the dependent (or right-hand side). This notation indicates that the values of X uniquely determine the values of Y in every tuple of the relation. Attributes are typically represented as single identifiers, often italicized for clarity (e.g., \textit{emp\_id} \to \textit{name}), while sets of attributes are denoted by uppercase letters or concatenated identifiers (e.g., AB \to C).[8]

A collection of functional dependencies is expressed as a set F = \{X_1 \to Y_1, X_2 \to Y_2, \dots \}, where each element is an individual FD over the same relation schema. This set notation allows multiple constraints to be specified collectively, facilitating analysis of implications and normal forms. FDs are classified as trivial or non-trivial: an FD X \to Y is trivial if Y \subseteq X, as it holds vacuously without imposing additional constraints; otherwise, it is non-trivial and conveys meaningful semantic information about the data.[8]

Illustrative Examples
Employee Departments
In a typical workplace database, the relation schema Employee(EID, Name, Dept, Manager) models basic employee information, where EID is the unique employee identifier, Name is the employee's full name, Dept is the department code, and Manager is the manager's identifier. This schema captures key functional dependencies (FDs) that reflect organizational relationships: {EID} → {Name, Dept, Manager}, meaning each employee's unique ID fully determines their name, department assignment, and reporting manager; and {Dept} → {Manager}, indicating that the department code determines the manager, as each department has a single overseeing manager. These FDs enforce data integrity: no two employees may share the same EID yet differ in name, department, or manager, and all employees in the same department must report to the same manager. For instance, if Employee 1 and Employee 2 are both in the "HR" department, their Manager attribute must be identical to satisfy {Dept} → {Manager}. The following sample tuples illustrate a relation satisfying these FDs:

| EID | Name | Dept | Manager |
|---|---|---|---|
| 101 | Alice Smith | HR | M001 |
| 102 | Bob Johnson | HR | M001 |
| 103 | Carol Lee | IT | M002 |
| 104 | David Kim | IT | M002 |
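These constraints can be verified programmatically. The sketch below is illustrative (the fd_holds helper and the dictionary encoding of rows are assumptions, not part of the schema definition); it confirms both dependencies on the sample tuples and shows how a conflicting row would violate {Dept} → {Manager}:

```python
employees = [
    {"EID": 101, "Name": "Alice Smith", "Dept": "HR", "Manager": "M001"},
    {"EID": 102, "Name": "Bob Johnson", "Dept": "HR", "Manager": "M001"},
    {"EID": 103, "Name": "Carol Lee",   "Dept": "IT", "Manager": "M002"},
    {"EID": 104, "Name": "David Kim",   "Dept": "IT", "Manager": "M002"},
]

def fd_holds(rel, X, Y):
    """True iff every pair of tuples agreeing on X also agrees on Y."""
    seen = {}
    for t in rel:
        k = tuple(t[a] for a in X)
        v = tuple(t[a] for a in Y)
        if seen.setdefault(k, v) != v:
            return False
    return True

print(fd_holds(employees, ["EID"], ["Name", "Dept", "Manager"]))  # True
print(fd_holds(employees, ["Dept"], ["Manager"]))                 # True
# Giving HR a second manager would violate {Dept} -> {Manager}:
bad = employees + [{"EID": 105, "Name": "Eve Wong", "Dept": "HR", "Manager": "M003"}]
print(fd_holds(bad, ["Dept"], ["Manager"]))                       # False
```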
Vehicle Attributes
In a relational database for managing vehicle inventory, consider the schema Cars(VIN, Make, Model, Color, Engine), where VIN represents the unique vehicle identification number, Make is the manufacturer (e.g., Toyota), Model is the specific model (e.g., Camry), Color is the exterior color, and Engine is the engine type (e.g., 2.5L inline-4). A key functional dependency here is {VIN} → {Make, Model, Color, Engine}, meaning the VIN uniquely determines all other attributes of the vehicle, as each VIN corresponds to a single, specific car with fixed specifications at manufacture. Another dependency is {Make, Model} → {Engine}, since in this catalog the engine type is fixed by the manufacturer and model, regardless of individual vehicle variations such as color. These dependencies illustrate how attributes interdepend in a product catalog: the VIN serves as a primary key enforcing uniqueness and completeness of vehicle details, while the non-key determinant {Make, Model} constrains engine assignments to avoid inconsistencies, such as assigning a V8 engine to a compact sedan model. In practice, this ensures that all records for vehicles of the same make and model share the correct engine specification. The following table shows sample tuples in the Cars relation, demonstrating enforcement of these functional dependencies:

| VIN | Make | Model | Color | Engine |
|---|---|---|---|---|
| 1HGCM82633A004352 | Honda | Accord | Red | 2.4L I4 |
| 1HGCM82633A004353 | Honda | Accord | Blue | 2.4L I4 |
| 2T1BR32E65C123456 | Toyota | Camry | Black | 2.5L I4 |
| 2T1BR32E65C123457 | Toyota | Camry | Silver | 2.5L I4 |
Academic Scheduling
In the context of university timetabling, functional dependencies model the constraints inherent in event-based data for lecture scheduling. A typical relation schema is Lectures(Course, Instructor, Time, Room), which stores details about scheduled lectures: the course offered, the assigned instructor, the time slot, and the assigned room. This schema captures real-world rules such as fixed instructor assignments per course and unique room allocations per instructor-time combination.[9] A fundamental functional dependency in this schema is {Course} → {Instructor}, indicating that the course uniquely determines the instructor, as each course is taught by a designated faculty member across all its sessions. Another key dependency is {Instructor, Time} → {Room}, meaning that a specific instructor in a given time slot is assigned to exactly one room, reflecting the physical impossibility of an instructor occupying multiple locations simultaneously. These dependencies enforce consistency in the database, such as ensuring the same instructor for all instances of a course, without redundant or conflicting entries.[9] The following table provides sample tuples from the Lectures relation, demonstrating how the dependencies hold:

| Course | Instructor | Time | Room |
|---|---|---|---|
| CS101 | Smith | 09:00 | R101 |
| CS101 | Smith | 10:00 | R102 |
| MATH201 | Johnson | 09:00 | R103 |
| CS101 | Smith | 11:00 | R101 |
| MATH201 | Johnson | 10:00 | R104 |
Axiomatic Properties
Armstrong's Axioms
Armstrong's axioms form the foundational set of inference rules for deriving all functional dependencies implied by a given set in relational database theory. Developed by William W. Armstrong in his 1974 paper on dependency structures, these axioms enable logical reasoning about attribute determinations without enumerating all possible dependencies explicitly.[11] They operate on sets of attributes, denoted by uppercase letters such as X, Y, and Z, where a functional dependency X \rightarrow Y means that the values of the attributes in Y are uniquely determined by those in X.[2] The three core axioms are formally stated as follows:

- Reflexivity: If Y \subseteq X, then X \rightarrow Y. This axiom captures the trivial case in which a set of attributes determines its own subsets, as the values in Y are inherently part of X.
- Augmentation: If X \rightarrow Y, then for any set Z, XZ \rightarrow YZ. This allows extending both sides of a dependency with the same attributes, preserving the determination relationship.
- Transitivity: If X \rightarrow Y and Y \rightarrow Z, then X \rightarrow Z. This reflects the chain of determinations where indirect implications are propagated.
Derived Inference Rules
In addition to Armstrong's axioms, several derived inference rules provide a more convenient toolset for manipulating functional dependencies, allowing efficient reasoning without always reverting to the base axioms.[6] These rules are sound, meaning any functional dependency inferred from them logically follows from the original set.[12] The union rule states that if X \to Y and X \to Z, then X \to YZ. This rule enables combining multiple dependencies with the same left-hand side into a single dependency.[6] To derive it using Armstrong's axioms:

- X \to Y (given); augment with X to obtain X \to XY.
- X \to Z (given); augment with Y to obtain XY \to YZ.
- X \to YZ (transitivity of steps 1 and 2).[12]
The decomposition rule states that, conversely, if X \to YZ, then X \to Y and X \to Z. To derive X \to Y:
- X \to YZ (given),
- YZ \to Y (reflexivity),
- X \to Y (transitivity of steps 1 and 2).
The proof for X \to Z follows symmetrically by reflexivity on Z.[6]
The pseudotransitivity rule states that if X \to Y and WY \to Z, then WX \to Z. It is derived as follows:
- X \to Y (given),
- WX \to WY (augmentation of step 1 with W),
- WY \to Z (given),
- WX \to Z (transitivity of steps 2 and 3).[12]
Attribute Closures
Closure of an Attribute Set
In relational database theory, the closure of an attribute set X with respect to a set of functional dependencies F, denoted X^+, is defined as the set of all attributes A such that the functional dependency X \to A is implied by F.[14] This closure represents the complete set of attributes that can be functionally determined from X using the dependencies in F.[15]

The attribute closure possesses key properties that ensure its utility in dependency analysis. First, X \subseteq X^+, as X trivially determines itself via reflexivity.[15] Additionally, the closure is monotonic: if Y \subseteq X, then Y^+ \subseteq X^+, meaning that expanding the determining set cannot reduce the scope of determined attributes.[15]

In database design and normalization, attribute closure plays a central role in verifying dependencies and identifying keys. To check whether an FD \alpha \to \beta holds under F, one verifies whether \beta \subseteq \alpha^+.[15] Closures also detect superkeys: a set X qualifies as a superkey for a relation schema R if R \subseteq X^+.[16] The closure of X under F is computed via an iterative process that initializes the result with X and repeatedly incorporates attributes from the right-hand sides of FDs in F whose left-hand sides are subsets of the current result set, continuing until no further attributes can be added.[15]

Computing Attribute Closure
The standard algorithm for computing the closure of an attribute set X with respect to a set of functional dependencies F, denoted X^+, operates iteratively by applying the dependencies in F to expand X until stabilization.[17] The procedure begins by initializing a result set to X. It then repeatedly iterates over all functional dependencies in F: for each dependency U \to V, if every attribute in U is contained in the current result set (i.e., U \subseteq result), the attributes in V are added to the result set. This scanning and augmentation continues in a loop until a full pass over F adds no new attributes, at which point the result set equals X^+.[17][1] In the worst case, the time complexity of this naive iterative algorithm is O(|R|^2 \times |F|), where |R| is the total number of attributes, as subset checks and additions may take O(|R|) time per dependency per iteration, with up to |R| iterations.[1]

To illustrate, consider an employee relation with schema R(SSN, Name, Dept, Manager) and functional dependencies F = \{SSN \to Name, Dept \to Manager\}. Computing \{Dept\}^+ starts with result = \{Dept\}. Scanning F, Dept \subseteq result triggers the addition of Manager, yielding result = \{Dept, Manager\}. A second pass adds nothing new, so \{Dept\}^+ = \{Dept, Manager\}.[2]

Optimizations can improve efficiency: functional dependencies may be sorted by increasing size of the left-hand side U to prioritize those that are easier to satisfy early in the iteration, accelerating attribute additions, and trivial dependencies (where V \subseteq U) can be filtered out beforehand to reduce scanning overhead.[1] More advanced linear-time variants, such as the linear closure algorithm that maintains counters of unsatisfied left-hand-side attributes and per-attribute lists of dependencies, avoid repeated full rescans and are suitable for larger schemas.[1] This computation is essential in database design tools for tasks such as identifying candidate keys and superkeys by checking closures against the full attribute set.[17]
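The iterative procedure translates directly into code. The following Python sketch is illustrative (the representation of F as (lhs, rhs) pairs of attribute sets is an assumption made here); it reproduces the \{Dept\}^+ example and the superkey test described above:

```python
def closure(X, F):
    """Compute the attribute closure X+ of the set X under F,
    where F is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(X)
    changed = True
    while changed:              # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs   # lhs is satisfied, so add its rhs
                changed = True
    return result

# Worked example: {Dept}+ under F = {SSN -> Name, Dept -> Manager}.
F = [({"SSN"}, {"Name"}), ({"Dept"}, {"Manager"})]
print(closure({"Dept"}, F))               # {'Dept', 'Manager'}

# X is a superkey of R exactly when closure(X, F) covers all of R.
R = {"SSN", "Name", "Dept", "Manager"}
print(closure({"SSN", "Dept"}, F) == R)   # True
```

The sorting and counter-based optimizations mentioned above refine this same loop; the linear-time variant replaces the repeated full scans with per-attribute lists of pending dependencies.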
Functional Dependency Sets
Equivalence Between Sets
In relational database theory, two sets of functional dependencies F and G defined over the same relation schema are equivalent, denoted F \equiv G, if they imply exactly the same set of functional dependencies, that is, if F^+ = G^+, where F^+ is the closure of F under Armstrong's axioms. This equivalence holds if and only if every functional dependency in F can be logically inferred from G (i.e., G \models F) and every functional dependency in G can be logically inferred from F (i.e., F \models G).[18] Equivalence ensures that F and G enforce identical constraints on relation instances, preserving data integrity without altering the semantics of the dependencies.[19]

To test whether G \models X \to Y for a specific functional dependency X \to Y in F, compute the closure of the attribute set X with respect to G, denoted X^+_G, by iteratively applying the functional dependencies in G starting from X until no new attributes are added; G implies X \to Y if Y \subseteq X^+_G.[18] The full test for equivalence between F and G thus performs this closure computation for the left-hand side of every functional dependency in F using G, and vice versa for every functional dependency in G using F. If all such implications hold, then F \equiv G. This method leverages the attribute closure algorithm, which can be implemented efficiently for typical database schemas.[19] The following algorithm outlines the procedure to verify equivalence:

- For each functional dependency X \to Y in F:
- Compute X^+_G.
- If Y \not\subseteq X^+_G, return false (not equivalent).
- For each functional dependency X \to Y in G:
- Compute X^+_F.
- If Y \not\subseteq X^+_F, return false (not equivalent).
- If all checks pass, return true (F \equiv G).
For example, let F = \{A \to B, B \to C\} and G = \{A \to B, B \to C, A \to C\} over a schema R(A, B, C). Both directions of the test succeed, so F \equiv G:
- Check G \models F: A^+_G = \{A, B, C\} (includes B), and B^+_G = \{B, C\} (includes C).
- Check F \models G: A^+_F = \{A, B, C\} (includes B and C, so A \to C holds by closure); the other dependencies are direct.
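Under the same (lhs, rhs) representation as the closure sketch above, the equivalence test is a pair of implication checks; the code below is an illustrative sketch, not taken from the cited sources:

```python
def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(F, G):
    """True iff F implies every dependency lhs -> rhs in G."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(equivalent(F, G))  # True: F implies A -> C by transitivity
```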
Minimal Covers
A minimal cover of a set of functional dependencies F over a relation schema R is an equivalent set F_m that satisfies three conditions: (1) each functional dependency in F_m has a singleton attribute on the right-hand side, (2) no attribute can be removed from the left-hand side of any dependency in F_m without altering the closure F_m^+, and (3) no dependency can be removed from F_m without altering F_m^+.[20][21]

To compute a minimal cover, the standard algorithm proceeds in three phases. First, decompose each dependency in F into equivalent dependencies with singleton right-hand sides by applying the decomposition rule (if X \to YZ, replace it with X \to Y and X \to Z). Second, for each resulting dependency X \to A, remove extraneous attributes from the left-hand side: an attribute B \in X is extraneous if A \in (X - \{B\})^+ computed using the current set of dependencies. Third, remove any redundant dependencies: a dependency X \to A is redundant if A \in X^+ computed using the remaining dependencies after its removal. These steps may need to be iterated until no further reductions occur.[20][22][21] Minimal covers are not unique; multiple distinct sets may satisfy the minimality conditions while preserving equivalence to the original F. The cardinality of a minimal cover is bounded above by |R| \times (|R| - 1), reflecting the maximum number of possible singleton right-hand-side dependencies.[20][21]

For example, consider a relation schema R(A, B, C) with F = \{AB \to C, C \to B, A \to B\}. After decomposition into singletons (already satisfied), check the left-hand sides: in AB \to C, B is extraneous because C \in A^+ under F (A \to B adds B, then AB \to C adds C), so replace it with A \to C. The set becomes \{A \to C, C \to B, A \to B\}. Then check redundancy: A \to B is redundant because under \{A \to C, C \to B\}, A^+ = \{A, C, B\} (A \to C adds C, then C \to B adds B). The resulting minimal cover is \{A \to C, C \to B\}. Different reduction orders can, in general, yield different minimal covers, although in this case the outcome is the same.[20]

Minimal covers eliminate redundancies in functional dependency sets, facilitating efficient analysis and serving as a prerequisite for dependency-preserving decompositions in database normalization.[21][22]
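The three phases can be sketched in Python under the same (lhs, rhs) representation used earlier. This is an illustrative sketch, not a production implementation; in particular, a robust version would iterate the reduction phases to a fixed point as noted above:

```python
def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(F):
    # Phase 1: singleton right-hand sides (decomposition rule).
    G = [(set(lhs), {a}) for lhs, rhs in F for a in rhs]
    # Phase 2: drop extraneous left-hand-side attributes.
    for i, (lhs, rhs) in enumerate(G):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, G):
                lhs = lhs - {b}       # b is extraneous in this lhs
        G[i] = (lhs, rhs)
    # Phase 3: drop redundant dependencies.
    result = list(G)
    for fd in G:
        rest = [g for g in result if g is not fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest             # fd follows from the others
    return result

# The worked example: F = {AB -> C, C -> B, A -> B}.
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"}), ({"A"}, {"B"})]
print(minimal_cover(F))  # [({'A'}, {'C'}), ({'C'}, {'B'})]
```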
Canonical Covers
A canonical cover of a set of functional dependencies F is a nonredundant cover in which each functional dependency has the form X \to A, with A a single attribute on the right-hand side, and the left-hand side X is left-reduced, meaning it contains no extraneous attributes. This form ensures the cover is minimal while preserving the closure of F, as the set logically implies all dependencies in F and vice versa. Canonical covers are not unique; multiple distinct sets may satisfy these conditions.[18]

The construction of a canonical cover typically begins with a minimal cover of F and proceeds by decomposing any dependencies with multiple right-hand attributes into separate dependencies, each with a single right-hand attribute. Redundancy is then eliminated by checking, for each dependency \alpha \to \beta in the set, whether it can be derived from the remaining dependencies; if so, it is removed. Extraneous attributes on the left-hand side are identified and removed by testing whether the dependency still holds without them, using the closure computed from the current set. This process iterates until no further simplifications are possible, resulting in a left-reduced, nonredundant set.[18]

For example, consider the set F = \{A \to BC\} over schema R(A, B, C). Decomposing the right-hand side yields \{A \to B, A \to C\}. To check for redundancy, compute A^+ under \{A \to C\} alone: it equals \{A, C\} and does not contain B, so A \to B is not redundant; the symmetric check retains A \to C. Both dependencies remain, forming the canonical cover \{A \to B, A \to C\}. If F included an additional dependency such as B \to C, further checks might reveal redundancies, for instance if A \to C could be derived from the remaining dependencies.[18]

Canonical covers serve as a foundation for automated database design tools, enabling efficient decomposition into normal forms by providing a simplified, nonredundant basis for dependency analysis and schema optimization.[23]
Normalization Applications
Relation to Normal Forms
Functional dependencies (FDs) are fundamental to database normalization, a systematic process introduced by Edgar F. Codd to organize relational databases by minimizing data redundancy and avoiding the update, insertion, and deletion anomalies caused by undesirable dependencies.[24] In this context, FDs identify candidate keys (minimal sets of attributes that uniquely determine all others) and highlight inter-attribute relationships that can lead to inconsistencies if not properly managed.[25] Normalization leverages FDs to decompose relations into smaller, well-structured schemas while preserving data integrity and query efficiency.[3]

First Normal Form (1NF) establishes the foundational structure for relations by requiring that all attributes contain atomic, indivisible values, with no repeating groups or arrays within cells.[24] While 1NF does not explicitly rely on FDs for its definition, FDs implicitly support it by ensuring that relations can be viewed as sets of tuples where keys enforce uniqueness, preventing multivalued structures that violate atomicity.[25] For instance, a relation storing employee skills as a list would fail 1NF; normalizing to atomic values allows FDs like {EmployeeID} → {Skill} to hold properly.

Second Normal Form (2NF) builds on 1NF by eliminating partial dependencies, in which a non-prime attribute (one not part of any candidate key) depends on only a proper subset of a composite candidate key.[3] In terms of FDs, a relation is in 2NF if every non-prime attribute is fully functionally dependent on each candidate key, meaning no FD exists whose determinant is a proper subset of a key.[3] Consider a relation with attributes {OrderID, ProductID, Quantity, ProductName}, where {OrderID, ProductID} is the key and ProductName depends only on ProductID; this partial dependency violates 2NF, and decomposition guided by the FD {ProductID} → {ProductName} resolves it into separate relations for orders and products.

Third Normal Form (3NF) extends 2NF by removing transitive dependencies, ensuring that every non-prime attribute depends directly on a candidate key and not on another non-prime attribute.[3] Formally, a relation is in 3NF if, for every non-trivial FD X → A, either A is prime or X is a superkey, preventing chains like Key → B → C in which C depends transitively on the key.[3] In an employee relation with FDs {EmployeeID} → {Department} and {Department} → {DepartmentLocation}, the transitive dependency {EmployeeID} → {DepartmentLocation} via Department violates 3NF; decomposing into employee-department and department-location relations eliminates it while retaining the original FDs.

Boyce-Codd Normal Form (BCNF) provides a stricter criterion than 3NF, requiring that for every non-trivial FD X → A, X must be a superkey, so that every determinant is a superkey. This addresses cases in 3NF where overlapping candidate keys allow determinants that are not superkeys, potentially causing anomalies. For example, in a relation {Student, Subject, Instructor} with FDs {Student, Subject} → {Instructor} and {Instructor} → {Subject}, the candidate keys are {Student, Subject} and {Student, Instructor}; {Instructor} is a determinant but not a superkey, violating BCNF. Decomposing into {Instructor, Subject} and {Instructor, Student} restores BCNF and is lossless, although the dependency {Student, Subject} → {Instructor} is no longer enforceable within a single relation.
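The BCNF condition can be tested mechanically: a schema is in BCNF exactly when no non-trivial FD has a determinant whose closure falls short of the full attribute set. Below is a brief illustrative sketch in Python, reusing the closure function and FD representation from earlier (not a full decomposition algorithm):

```python
def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(R, F):
    """Return the non-trivial FDs in F whose determinant is not a superkey of R."""
    return [(lhs, rhs) for lhs, rhs in F
            if not rhs <= lhs                # non-trivial FD
            and closure(lhs, F) != set(R)]   # determinant is not a superkey

R = {"Student", "Subject", "Instructor"}
F = [({"Student", "Subject"}, {"Instructor"}),
     ({"Instructor"}, {"Subject"})]
print(bcnf_violations(R, F))  # [({'Instructor'}, {'Subject'})]
```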
Decomposition using FDs aims to achieve higher normal forms through lossless-join decompositions, in which the natural join of the sub-relations reconstructs the original relation exactly, preserving all data and dependencies.[26] A decomposition is lossless with respect to a set of FDs if the corresponding join dependency follows logically from the FDs, a condition often tested using attribute closures or tableau (chase) methods to ensure no spurious tuples are introduced.[26] This FD-guided process, as in 3NF synthesis, splits relations along FD boundaries to eliminate anomalies while maintaining dependency preservation.[3]

Heath's Theorem
Heath's Theorem, named after Ian Heath, who proved it in his 1971 work on relational decompositions, provides a sufficient condition for lossless-join decompositions in relational database design. This result is essential for ensuring that normalization preserves the original data without introducing spurious tuples when the decomposed relations are rejoined.

The theorem states that for a relation schema R with attributes partitioned as R = X \cup Y \cup Z, where X \to Y is a functional dependency in F, the decomposition into R_1 = X \cup Y and R_2 = X \cup Z is lossless-join. Formally, if R satisfies X \to Y, then R = \pi_{XY}(R) \bowtie \pi_{XZ}(R). This holds because the common attributes X determine Y, so the join cannot pair a Y-value with an X-value it did not originally accompany. The theorem generalizes to decompositions guided by individual FDs during normalization.[27]

A proof relies on the definition of lossless join together with the FD itself. The containment R \subseteq \pi_{XY}(R) \bowtie \pi_{XZ}(R) holds for any decomposition. Conversely, if a tuple (x, y, z) appears in the join, then some tuple of R has X-value x and Y-value y, and some tuple of R has X-value x and Z-value z; since X \to Y forces every tuple with X-value x to carry the same Y-value, that value is y, so (x, y, z) is already a tuple of R. Hence the join reconstructs R exactly, without extraneous tuples. This property underpins algorithms for achieving normal forms like BCNF, where decomposing on violating FDs yields lossless results.[28]

The implications of Heath's Theorem are significant for normalization, as it guarantees that FD-based decompositions maintain data fidelity, supporting efficient schema design. For instance, in a relation R(\text{StudentID}, \text{Course}, \text{Instructor}) with F = \{\text{StudentID} \to \text{Course}\}, decomposing into R_1(\text{StudentID}, \text{Course}) and R_2(\text{StudentID}, \text{Instructor}) is lossless by Heath's Theorem, isolating the dependency while preserving the original information.[27]
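The theorem is easy to observe on a concrete instance: project, rejoin on the common attributes, and compare. The following Python snippet is a purely illustrative check (the instance and the positional tuple encoding are assumptions made here):

```python
# An instance of R(StudentID, Course, Instructor) satisfying StudentID -> Course.
R = {("s1", "DB", "Heath"), ("s1", "DB", "Codd"), ("s2", "OS", "Codd")}

r1 = {(sid, c) for sid, c, _ in R}   # projection onto {StudentID, Course}
r2 = {(sid, i) for sid, _, i in R}   # projection onto {StudentID, Instructor}

# Natural join of r1 and r2 on the common attribute StudentID.
joined = {(sid, c, i) for sid, c in r1 for sid2, i in r2 if sid == sid2}

print(joined == R)  # True: the decomposition is lossless-join
```

Dropping the dependency (for example, letting s1 appear with two different courses) can make the rejoined relation strictly larger than R, producing spurious tuples.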
Advanced Concepts
Irreducible FD Sets
An irreducible set of functional dependencies, also known as a minimal cover or minimal basis, for a given set F over a relation schema R is a set G such that the closure G^+ equals F^+ and G satisfies three key properties: (1) every functional dependency in G has a singleton right-hand side; (2) no attribute in the left-hand side of any dependency in G is extraneous, meaning removing it would change the closure; and (3) no dependency in G is redundant, meaning removing it would alter the closure.[20] These properties ensure that G is as concise as possible while preserving the semantics of F, with every dependency essential to generating the full set of implied dependencies.

Irreducible sets are not unique; multiple distinct irreducible sets may exist for the same F, but all such sets contain the same number of dependencies.[29] This minimality aids database design tasks such as identifying candidate keys, where computing the closure of attribute subsets from an irreducible set reveals the minimal determinants of all attributes.[20]

To construct an irreducible set from F, follow a systematic algorithm that iteratively refines the dependencies while maintaining equivalence:

- Decompose each dependency in F so that right-hand sides are single attributes (using the decomposition rule).
- For each dependency X \to A in the current set, check each attribute B \in X; if replacing X \to A with (X - \{B\}) \to A preserves the closure (verified via an attribute closure computation), perform the replacement, and repeat until no further reductions are possible.
- For each remaining dependency X \to A, if A \in X^+ computed without using X \to A itself, remove X \to A; repeat until no redundancies remain.