Second normal form
Second normal form (2NF) is a level of database normalization that ensures a relational database table eliminates partial dependencies on any candidate key, thereby reducing data redundancy and potential anomalies in data manipulation. A relation is in 2NF if it is already in first normal form (1NF)—meaning it contains no repeating groups or arrays—and every non-prime attribute (any attribute that is not part of a candidate key) is fully functionally dependent on the entire candidate key, rather than depending on only a portion of a composite key.[1] Introduced by Edgar F. Codd in his 1971 paper "Further Normalization of the Data Base Relational Model", 2NF builds directly on 1NF by addressing issues arising from partial dependencies, where non-key attributes rely on only part of a multi-attribute key.[1] For instance, in a table tracking suppliers and their products (with a composite key of supplier ID and product ID), if supplier details like city depend solely on supplier ID, this partial dependency violates 2NF and can lead to redundant data storage.[2] To achieve 2NF, such tables are typically decomposed into separate relations: one for the partial dependency (e.g., suppliers with their details) and another for the full key combination (e.g., supplier-product links), preserving data integrity through lossless joins.[1] The primary benefits of 2NF include minimizing data redundancy, which saves storage and simplifies maintenance, while preventing update anomalies (e.g., inconsistent changes to supplier city across multiple rows), insertion anomalies (e.g., inability to add a new supplier without a product), and deletion anomalies (e.g., losing supplier details when removing the last product).[3] Although 2NF does not address transitive dependencies (handled in third normal form), it forms a foundational step in the normalization hierarchy, promoting efficient and reliable relational database design as outlined in Codd's relational model.[4]Prerequisites
First Normal Form
First normal form (1NF) requires that a relation in a relational database consists solely of atomic values in each attribute, meaning every entry is indivisible and cannot contain groups, arrays, or multivalued components.[5] This ensures that domains are simple, with no nested structures or repeating groups within tuples, and that rows are unique to maintain the set-like properties of relations.[5] Additionally, each attribute must conform to a defined domain to preserve data type integrity and prevent anomalies in querying or updating.[6] The concept of 1NF was introduced by Edgar F. Codd in 1970 as the initial requirement for normalizing relations in the relational model, emphasizing the decomposition of nonsimple domains into atomic elements to facilitate efficient data management.[7] Codd described relations on simple domains as those where elements are "atomic (nondecomposable) values," highlighting the need to eliminate nonsimple domains that could complicate representation and operations.[5] Key to 1NF is the elimination of multivalued attributes, which arise when an attribute holds multiple values for a single entity, leading to redundancy and update inconsistencies.[3] By enforcing atomicity, 1NF promotes a flat structure where each tuple represents a single, indivisible fact, ensuring domain integrity across all attributes.[8] Consider an unnormalized employee table where the "Skills" attribute contains a list of values for each employee:| EmployeeID | Name | Skills |
|---|---|---|
| 101 | Alice | SQL, Python, Java |
| 102 | Bob | Java, C++ |
| EmployeeID | Name | Skill |
|---|---|---|
| 101 | Alice | SQL |
| 101 | Alice | Python |
| 101 | Alice | Java |
| 102 | Bob | Java |
| 102 | Bob | C++ |
Functional Dependencies
A functional dependency (FD) in a relational database is a constraint that specifies a relationship between two sets of attributes, where the values of one set (the determinant) uniquely determine the values of another set (the dependent). Formally, an FD X \to Y holds over a relation r if, for any two tuples t_1 and t_2 in r, t_1[X] = t_2[X] implies t_1[Y] = t_2[Y], ensuring that each X-value is associated with at most one Y-value.[9] This assumes relations are in first normal form, with atomic attribute values.[10] The notation for an FD is X \to Y, where X and Y are subsets of the relation's attributes, X is the determinant (left side), and Y is the dependent (right side). A full functional dependency exists when X \to Y and no proper subset of X determines Y, meaning all attributes in X are necessary for the determination. In contrast, a partial functional dependency occurs when a proper subset of X determines Y, indicating that not all attributes in the determinant are required.[11] Functional dependencies are classified into types based on their structure. A trivial FD is one where Y \subseteq X, such as AB \to A, which always holds by definition and provides no new information. A non-trivial FD has Y \not\subseteq X, where the dependent attributes are not part of the determinant, allowing for meaningful constraints on the data.[9] To reason about sets of FDs, inference rules known as Armstrong's axioms are used, providing a sound and complete system for deriving all implied dependencies. These include reflexivity: if Y \subseteq X, then X \to Y; augmentation: if X \to Y, then XZ \to YZ for any Z; and transitivity: if X \to Y and Y \to Z, then X \to Z.[9] These axioms, originally proposed by William W. Armstrong, enable the logical inference of additional FDs from a given set without accessing the actual data.[12] For example, consider a student relation with attributes StudentID, Name, and Major. The FD StudentID \to Name is a full functional dependency, as the single attribute StudentID uniquely determines the student's name, with no subset possible. However, if the key is composite (e.g., {StudentID, CourseID} \to Grade), then StudentID \to Name becomes a partial functional dependency, since only a proper subset of the key (StudentID) determines Name.[13] To identify all attributes determined by a set X under a given set of FDs F, the attribute closure X^+ is computed. This is the largest set of attributes derivable from X by repeatedly applying the FDs in F: start with X^+ = X, then add any attribute A if there exists W \to A in F such that W \subseteq X^+, until no changes occur. For instance, given F = \{AB \to C, C \to B\} over schema ABC, the closure (AB)^+ = ABC, as AB determines C, and C then determines B (though B is already included).[9] This process helps verify candidate keys and implied dependencies systematically.[10]Formal Definition
Core Conditions
Second normal form (2NF) was introduced by Edgar F. Codd in 1971 as an advancement beyond first normal form (1NF) to address issues arising from partial dependencies in relational database design, thereby enhancing the structure's resistance to certain anomalies.[14] This normalization level builds on the foundations of functional dependencies, ensuring that relations maintain integrity by preventing attributes from depending on only portions of composite keys.[14] Formally, a relation R is in 2NF if it satisfies 1NF and every non-prime attribute of R is fully functionally dependent on every candidate key of R.[14] A candidate key is defined as a minimal set of attributes that uniquely identifies each tuple in the relation, serving as a superkey with no superfluous attributes—meaning no proper subset of the set can also uniquely identify tuples.[14] Attributes are classified as prime if they belong to at least one candidate key, and non-prime otherwise; the 2NF requirement applies specifically to non-prime attributes to ensure their dependencies align with entire keys rather than fragments.[14] The core condition can be broken down through functional dependencies: for every dependency X \to A in the relation, where A is a non-prime attribute, the determinant X must be a complete candidate key and not a proper subset thereof, guaranteeing full rather than partial dependency.[14] Full functional dependency means that A depends on X but on no proper subset of X, preventing scenarios where non-prime attributes are influenced by only part of a composite key.[14] This criterion ensures that the relation avoids redundancy tied to key subsets while preserving the uniqueness enforced by candidate keys.[14]Partial Dependencies
A partial dependency in the context of database normalization occurs when a non-prime attribute is functionally dependent on only a proper subset of a composite candidate key, rather than the entire key.[1] This violates the principle of full functional dependency required for second normal form (2NF), as defined by E.F. Codd, where every non-prime attribute must depend on the whole candidate key and not merely part of it.[1] To identify a partial dependency, consider a functional dependency X \to A in a relation where X is a proper subset of a candidate key and A is a non-prime attribute.[15] For instance, if the candidate key is a composite like \{K_1, K_2\} and K_1 \to A holds, then A partially depends on K_1 alone, failing the full dependency test.[16] This identification process relies on analyzing all functional dependencies to ensure no such partial relationships exist with respect to any candidate key.[17] Partial dependencies lead to data redundancy and various anomalies in database operations.[1] Specifically, they cause update anomalies, where changing a value in one part of the key (e.g., a product detail) necessitates updates across multiple rows to maintain consistency, increasing the risk of errors or inconsistencies.[15] Insertion anomalies may also arise, as adding a new non-key attribute value tied to only part of the key could require extraneous rows, while deletion anomalies occur when removing a row eliminates unrelated data.[16] A representative example involves an order details relation with composite candidate key \{ \text{OrderID}, \text{ProductID} \} and an attribute Supplier that depends solely on ProductID, as each product has a fixed supplier regardless of the order.[17] Here, the functional dependency \text{ProductID} \to \text{Supplier} represents a partial dependency, since Supplier does not rely on the full key including OrderID.[15] To resolve partial dependencies and achieve 2NF, the relation must be decomposed into smaller relations where each non-prime attribute fully depends on the entire candidate key in its new schema (detailed process in Decomposition Process).[1] This decomposition eliminates the partial relationships while preserving the original data's recoverability through joins.[16]Practical Examples
Violating Design
A common example of a database design violating second normal form is the "Order Details" relation, which stores information about items in customer orders. This table has columns for OrderID, ProductID, ProductName, Supplier, and Quantity, with a composite primary key consisting of OrderID and ProductID.[18] The functional dependencies in this relation are: ProductID → ProductName, ProductID → Supplier, and (OrderID, ProductID) → Quantity.[18] Here, ProductName and Supplier exhibit partial dependencies, as they depend solely on ProductID rather than the entire composite key, leading to redundancy when the same product appears in multiple orders.[18] To illustrate the redundancy, consider the following sample data where the same product (e.g., "Widget A" from "Supplier X") is ordered multiple times, resulting in repeated values for ProductName and Supplier:| OrderID | ProductID | ProductName | Supplier | Quantity |
|---|---|---|---|---|
| 1001 | P001 | Widget A | Supplier X | 5 |
| 1002 | P001 | Widget A | Supplier X | 3 |
| 1003 | P002 | Gadget B | Supplier Y | 2 |
| 1004 | P001 | Widget A | Supplier X | 4 |
Compliant Design
To achieve second normal form, a relation exhibiting partial dependencies is decomposed into multiple relations, each of which has no partial dependencies and preserves the original data through lossless decomposition. This process involves identifying the determinants of partial dependencies and creating separate relations for those subsets, ensuring that every non-prime attribute in the new relations is fully functionally dependent on the entire primary key.[19] Consider a typical violating design for an "Order Details" relation with attributes OrderID, ProductID, Quantity, ProductName, and Supplier, where the composite primary key is (OrderID, ProductID). Here, partial dependencies exist, such as ProductID → ProductName and ProductID → Supplier, since these attributes depend only on part of the key. To rectify this, decompose the relation into two: an "Order Details" relation with composite primary key (OrderID, ProductID)—where ProductID is a foreign key referencing the Products table—and attribute Quantity; and a "Products" relation with attributes ProductID (primary key), ProductName, and Supplier. In the "Order Details" relation, Quantity is fully dependent on the entire composite key (OrderID, ProductID), as it represents the amount per specific order line item. In the "Products" relation, both ProductName and Supplier are fully dependent on ProductID.[18] The following tables illustrate the before-and-after structures with sample data, highlighting the reduction in duplication. The original table repeats product information across multiple orders, leading to redundancy. Before Decomposition (Violating "Order Details" Table):| OrderID | ProductID | Quantity | ProductName | Supplier |
|---|---|---|---|---|
| 1001 | 201 | 5 | Widget A | Acme Corp |
| 1001 | 202 | 3 | Gadget B | Beta Inc |
| 1002 | 201 | 2 | Widget A | Acme Corp |
| 1003 | 202 | 1 | Gadget B | Beta Inc |
| OrderID | ProductID | Quantity |
|---|---|---|
| 1001 | 201 | 5 |
| 1001 | 202 | 3 |
| 1002 | 201 | 2 |
| 1003 | 202 | 1 |
| ProductID | ProductName | Supplier |
|---|---|---|
| 201 | Widget A | Acme Corp |
| 202 | Gadget B | Beta Inc |
Relations to Other Forms
Connection to Third Normal Form
Second normal form (2NF) serves as a foundational requirement for third normal form (3NF), with 3NF extending 2NF by imposing additional constraints on functional dependencies. Specifically, a relation is in 3NF if it is already in 2NF and every non-prime attribute is non-transitively dependent on each candidate key, meaning no non-prime attribute depends on another non-prime attribute.[1] The primary difference lies in the types of dependencies addressed: 2NF focuses on eliminating partial dependencies, where a non-prime attribute depends on only part of a composite candidate key, ensuring full dependency on the entire key. In contrast, 3NF targets transitive dependencies, where a non-prime attribute depends indirectly on the candidate key through another non-prime attribute (e.g., Key → A → B, with B not directly dependent on the key).[1] For instance, consider a relation in 2NF with candidate key EmployeeID and attributes EmployeeID, Department, and Location, where EmployeeID determines Department, and Department determines Location. This satisfies 2NF by having full dependency on the candidate key but violates 3NF due to the transitive dependency EmployeeID → Department → Location, leading to potential redundancy if multiple employees share the same department.[21] Within the normalization hierarchy, relations progress from first normal form (1NF), which ensures atomic values, to 2NF by enforcing full functional dependency on candidate keys, and then to 3NF by removing transitive dependencies, thereby further minimizing redundancy and anomalies.[1] Regarding Boyce-Codd normal form (BCNF), which strengthens 3NF by requiring that every determinant in a functional dependency is a candidate key, 3NF permits certain redundancies that BCNF eliminates; however, achieving 2NF remains a prerequisite for both 3NF and BCNF.[22]Decomposition Process
The decomposition process to achieve second normal form (2NF) involves systematically identifying and eliminating partial dependencies in a relation that is already in first normal form (1NF). The algorithm begins by determining all functional dependencies (FDs) within the relation, identifying candidate keys using techniques such as FD closure, and detecting partial dependencies where a non-prime attribute depends on only a proper subset of any candidate key. Once partial dependencies are found, the relation is decomposed into smaller relations that eliminate these violations while preserving the original data semantics.[23][11] The step-by-step process is as follows:- Ensure the relation is in 1NF, meaning all attributes are atomic and there are no repeating groups.[24]
- Identify all candidate keys by computing the attribute closure for potential keys using the set of FDs; this verifies which minimal sets of attributes uniquely determine the entire relation. Then, list all non-prime attributes (those not part of any candidate key) that are involved in partial dependencies, where the determinant is a proper subset of a candidate key.[23][11]
- For each partial dependency X → Y (where X is a proper subset of a candidate key and Y is a non-prime attribute), create a new relation R1 consisting of X union the closure of X (X⁺), with X as its key; this captures all attributes fully dependent on X.[23]
- Update the original relation R to R2 by removing the attributes in Y (now in R1) but retaining X as a foreign key to link back to R1. Repeat this for all partial dependencies until no violations remain.[11][24]