Second normal form

Second normal form (2NF) is a level of database normalization that eliminates partial dependencies on any candidate key, thereby reducing redundancy and the potential for anomalies in data manipulation. A relation is in 2NF if it is already in first normal form (1NF), meaning it contains no repeating groups or arrays, and every non-prime attribute (any attribute that is not part of a candidate key) is fully functionally dependent on each entire candidate key rather than on only a portion of a composite key. Introduced by Edgar F. Codd in his 1971 paper "Further Normalization of the Data Base Relational Model", 2NF builds directly on 1NF by addressing partial dependencies, where non-key attributes rely on only part of a multi-attribute key.

For instance, in a table tracking suppliers and their products (with a composite key of supplier ID and product ID), if supplier details like city depend solely on supplier ID, this partial dependency violates 2NF and leads to redundant data storage. To achieve 2NF, such tables are typically decomposed into separate relations: one for the partial dependency (e.g., suppliers with their details) and another for the full key combination (e.g., supplier-product links), preserving data integrity through lossless joins.

The primary benefits of 2NF include minimizing data redundancy, which saves storage and simplifies maintenance, while preventing update anomalies (e.g., inconsistent changes to supplier city across multiple rows), insertion anomalies (e.g., inability to add a new supplier without a product), and deletion anomalies (e.g., losing supplier details when removing the last product). Although 2NF does not address transitive dependencies (handled in third normal form), it forms a foundational step in the normalization hierarchy, promoting efficient and reliable database design as outlined in Codd's relational model.

Prerequisites

First Normal Form

First normal form (1NF) requires that a relation in a relational database consists solely of atomic values in each attribute, meaning every entry is indivisible and cannot contain groups, arrays, or multivalued components. This ensures that domains are simple, with no nested structures or repeating groups within tuples, and that rows are unique to maintain the set-like properties of relations. Additionally, each attribute must conform to a defined domain to preserve integrity and prevent anomalies in querying or updating.

The concept of 1NF was introduced by Edgar F. Codd in 1970 as the initial requirement for normalizing relations in the relational model, emphasizing the decomposition of nonsimple domains into atomic elements to facilitate efficient data management. Codd described relations on simple domains as those where elements are "atomic (nondecomposable) values," highlighting the need to eliminate nonsimple domains that could complicate representation and operations.

Key to 1NF is the elimination of multivalued attributes, which arise when an attribute holds multiple values for a single tuple, leading to redundancy and update inconsistencies. By enforcing atomicity, 1NF promotes a flat structure where each cell represents a single, indivisible fact, ensuring domain integrity across all attributes.

Consider an unnormalized employee table where the "Skills" attribute contains a list of values for each employee:
| EmployeeID | Name  | Skills            |
|------------|-------|-------------------|
| 101        | Alice | SQL, Python, Java |
| 102        | Bob   | Java, C++         |
This violates 1NF due to the multivalued "Skills" attribute. To achieve 1NF, transform it by creating a separate tuple for each skill, resulting in atomic values:
| EmployeeID | Name  | Skill  |
|------------|-------|--------|
| 101        | Alice | SQL    |
| 101        | Alice | Python |
| 101        | Alice | Java   |
| 102        | Bob   | Java   |
| 102        | Bob   | C++    |
This structure eliminates repeating groups while preserving all data. Because EmployeeID now repeats across rows, the combination of EmployeeID and Skill serves as the key that identifies each row. Such transformations lay the groundwork for analyzing functional dependencies in subsequent normal forms.
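The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not a general-purpose normalizer: the function name `to_1nf` and the assumption that skills are stored as a comma-separated string are ours, chosen to mirror the sample table.

```python
# Rows mirroring the unnormalized employee table; storing the skills as a
# comma-separated string is an assumption made for this sketch.
unnormalized = [
    {"EmployeeID": 101, "Name": "Alice", "Skills": "SQL, Python, Java"},
    {"EmployeeID": 102, "Name": "Bob", "Skills": "Java, C++"},
]


def to_1nf(rows, multivalued="Skills", atomic="Skill"):
    """Emit one row per individual value of the multivalued attribute."""
    result = []
    for row in rows:
        for value in row[multivalued].split(","):
            # Copy every other attribute unchanged, then add the single value.
            new_row = {k: v for k, v in row.items() if k != multivalued}
            new_row[atomic] = value.strip()
            result.append(new_row)
    return result


flat = to_1nf(unnormalized)
for row in flat:
    print(row)
```

Each output row carries exactly one skill, and the pair (EmployeeID, Skill) is unique across the five rows.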

Functional Dependencies

A functional dependency (FD) in a relational database is a constraint between two sets of attributes, where the values of one set (the determinant) uniquely determine the values of another set (the dependent). Formally, an FD $X \to Y$ holds over a relation $r$ if, for any two tuples $t_1$ and $t_2$ in $r$, $t_1[X] = t_2[X]$ implies $t_1[Y] = t_2[Y]$, ensuring that each $X$-value is associated with at most one $Y$-value. This assumes relations are in first normal form, with atomic attribute values.

In the notation $X \to Y$, $X$ and $Y$ are subsets of the relation's attributes, $X$ is the determinant (left side), and $Y$ is the dependent (right side). A full functional dependency exists when $X \to Y$ holds and no proper subset of $X$ determines $Y$, meaning all attributes in $X$ are necessary for the determination. In contrast, a partial functional dependency occurs when a proper subset of $X$ already determines $Y$, indicating that not all attributes in the determinant are required.

Functional dependencies are classified into types based on their structure. A trivial FD is one where $Y \subseteq X$, such as $AB \to A$, which always holds by definition and provides no new information. A non-trivial FD has $Y \not\subseteq X$, so at least one dependent attribute is not part of the determinant, allowing for meaningful constraints on the data.

To reason about sets of FDs, inference rules known as Armstrong's axioms are used, providing a sound and complete system for deriving all implied dependencies. These include reflexivity: if $Y \subseteq X$, then $X \to Y$; augmentation: if $X \to Y$, then $XZ \to YZ$ for any $Z$; and transitivity: if $X \to Y$ and $Y \to Z$, then $X \to Z$. These axioms, originally proposed by William W. Armstrong, enable the logical inference of additional FDs from a given set without accessing the actual data.

For example, consider a relation whose attributes include StudentID and Name. If StudentID is the key, the FD StudentID $\to$ Name is a full functional dependency, as the single attribute StudentID uniquely determines the student's name and has no smaller subset that could do so. However, if the key is composite (e.g., {StudentID, CourseID}), then Name is only partially dependent on the key: {StudentID, CourseID} $\to$ Name holds, but the proper subset StudentID already determines Name on its own.

To identify all attributes determined by a set $X$ under a given set of FDs $F$, the attribute closure $X^+$ is computed. This is the largest set of attributes derivable from $X$ by repeatedly applying the FDs in $F$: start with $X^+ = X$, then add any attribute $A$ if there exists $W \to A$ in $F$ such that $W \subseteq X^+$, until no changes occur. For instance, given $F = \{AB \to C, C \to B\}$ over $ABC$, the closure $(AB)^+ = ABC$, as $AB$ determines $C$, and $C$ then determines $B$ (though $B$ is already included). This process helps verify candidate keys and implied dependencies systematically.
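The closure computation described above translates almost directly into code. The following is a small sketch under our own representation choices: each FD is a `(lhs, rhs)` pair of Python sets, and the function name `closure` is illustrative.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds
            # at least one new attribute.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The example from the text: F = {AB -> C, C -> B} over ABC.
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(sorted(closure({"A", "B"}, F)))  # ['A', 'B', 'C']
print(sorted(closure({"C"}, F)))       # ['B', 'C']
```

Since the closure of AB covers every attribute while neither A nor B alone does, AB is a candidate key of this small schema.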

Formal Definition

Core Conditions

Second normal form (2NF) was introduced by Edgar F. Codd in 1971 as an advancement beyond first normal form (1NF) to address issues arising from partial dependencies in relational database design, thereby enhancing the structure's resistance to certain anomalies. This normalization level builds on the foundations of functional dependencies, ensuring that relations maintain integrity by preventing attributes from depending on only portions of composite keys.

Formally, a relation $R$ is in 2NF if it satisfies 1NF and every non-prime attribute of $R$ is fully functionally dependent on every candidate key of $R$. A candidate key is a minimal set of attributes that uniquely identifies each tuple in the relation: it is a superkey with no superfluous attributes, meaning no proper subset of it can also uniquely identify tuples. Attributes are classified as prime if they belong to at least one candidate key, and non-prime otherwise; the 2NF requirement applies specifically to non-prime attributes to ensure their dependencies align with entire keys rather than fragments.

The core condition can be restated through functional dependencies: for every non-trivial dependency $X \to A$ that holds in the relation, where $A$ is a non-prime attribute, the determinant $X$ must not be a proper subset of any candidate key. Full dependency of $A$ on a candidate key $K$ means that $K \to A$ holds but no proper subset of $K$ determines $A$, which rules out non-prime attributes that are determined by only part of a composite key. A relation whose candidate keys all consist of a single attribute therefore satisfies 2NF automatically once it is in 1NF. This criterion ensures that the relation avoids redundancy tied to key subsets while preserving the uniqueness enforced by candidate keys.
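The definition can be checked mechanically for small schemas. The sketch below is an illustration rather than an optimized algorithm: it finds candidate keys by brute force (exponential in the number of attributes) and then looks for non-prime attributes in the closure of any proper subset of a key. The function names and the sample schema (OrderID, ProductID, Supplier, Quantity) are assumptions made for the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal sets whose closure is every attribute."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def partial_dependencies(attributes, fds):
    """Return (key_subset, attribute) pairs that violate 2NF."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = set(attributes) - prime
    violations = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for combo in combinations(sorted(key), size):
                for attr in sorted(closure(set(combo), fds) & non_prime):
                    if (combo, attr) not in violations:
                        violations.append((combo, attr))
    return violations


def is_2nf(attributes, fds):
    return not partial_dependencies(attributes, fds)


attrs = {"OrderID", "ProductID", "Supplier", "Quantity"}
fds = [
    ({"ProductID"}, {"Supplier"}),
    ({"OrderID", "ProductID"}, {"Quantity"}),
]
print(candidate_keys(attrs, fds))
print(partial_dependencies(attrs, fds))  # [(('ProductID',), 'Supplier')]
print(is_2nf(attrs, fds))                # False
```

Dropping Supplier from the schema (and the FD that mentions it) makes `is_2nf` return `True`, since Quantity depends on the whole key.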

Partial Dependencies

A partial dependency in the context of database normalization occurs when a non-prime attribute is functionally dependent on only a proper subset of a composite candidate key, rather than the entire key. This violates the principle of full functional dependency required for second normal form (2NF), as defined by E.F. Codd, where every non-prime attribute must depend on the whole candidate key and not merely part of it.

To identify a partial dependency, look for a functional dependency $X \to A$ in a relation where $X$ is a proper subset of a candidate key and $A$ is a non-prime attribute. For instance, if the candidate key is a composite like $\{K_1, K_2\}$ and $K_1 \to A$ holds, then $A$ is only partially dependent on the key, because $K_1$ alone determines it. This identification process relies on analyzing all functional dependencies to ensure no such partial relationships exist with respect to any candidate key.

Partial dependencies lead to data redundancy and various anomalies in database operations. Specifically, they cause update anomalies, where changing a value that depends on one part of the key (e.g., a product detail) necessitates updates across multiple rows to maintain consistency, increasing the risk of errors or inconsistencies. Insertion anomalies may also arise, as adding a new non-key attribute value tied to only part of the key could require extraneous rows, while deletion anomalies occur when removing a row also eliminates unrelated information.

A representative example involves an order details relation with composite candidate key $\{\text{OrderID}, \text{ProductID}\}$ and an attribute Supplier that depends solely on ProductID, as each product has a fixed supplier regardless of the order. Here, the functional dependency $\text{ProductID} \to \text{Supplier}$ represents a partial dependency, since Supplier does not rely on the full key including OrderID.

To resolve partial dependencies and achieve 2NF, the relation must be decomposed into smaller relations where each non-prime attribute fully depends on the entire candidate key of its new relation (the detailed procedure is given under Decomposition Process). This decomposition eliminates the partial relationships while preserving the original data's recoverability through joins.

Practical Examples

Violating Design

A common example of a database design violating second normal form is the "Order Details" relation, which stores information about items in orders. This relation has columns for OrderID, ProductID, ProductName, Supplier, and Quantity, with a composite primary key consisting of OrderID and ProductID. The functional dependencies in this relation are: ProductID → ProductName, ProductID → Supplier, and (OrderID, ProductID) → Quantity. Here, ProductName and Supplier exhibit partial dependencies, as they depend solely on ProductID rather than the entire composite key, leading to redundancy when the same product appears in multiple orders. To illustrate the redundancy, consider the following sample data where the same product (e.g., "Widget A" from "Supplier X") is ordered multiple times, resulting in repeated values for ProductName and Supplier:
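| OrderID | ProductID | ProductName | Supplier   | Quantity |
|---------|-----------|-------------|------------|----------|
| 1001    | P001      | Widget A    | Supplier X | 5        |
| 1002    | P001      | Widget A    | Supplier X | 3        |
| 1003    | P002      | Gadget B    | Supplier Y | 2        |
| 1004    | P001      | Widget A    | Supplier X | 4        |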
This structure causes update anomalies: changing the supplier for ProductID P001 requires modifying every row with that ProductID (three of the four rows here). It also leads to insertion anomalies, since a new product and its supplier cannot be recorded without creating a corresponding order entry.
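A short script makes the update anomaly concrete using the sample rows. The helper `fd_holds` is our own illustrative name; it tests whether a dependency holds in one particular set of rows. Such a test can refute a functional dependency but cannot prove that it holds for every possible state of the relation. The replacement value "Supplier Z" is hypothetical.

```python
COLUMNS = ("OrderID", "ProductID", "ProductName", "Supplier", "Quantity")
rows = [
    (1001, "P001", "Widget A", "Supplier X", 5),
    (1002, "P001", "Widget A", "Supplier X", 3),
    (1003, "P002", "Gadget B", "Supplier Y", 2),
    (1004, "P001", "Widget A", "Supplier X", 4),
]


def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in this particular set of rows."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    seen = {}
    for row in rows:
        left = tuple(row[index[a]] for a in lhs)
        right = tuple(row[index[a]] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(fd_holds(rows, ["ProductID"], ["Supplier"]))  # True
print(fd_holds(rows, ["ProductID"], ["Quantity"]))  # False

# Rows that would have to change if P001 switched suppliers.
rows_to_touch = sum(1 for row in rows if row[1] == "P001")
print(rows_to_touch)  # 3

# Simulate an incomplete update: only one of the P001 rows is changed.
partially_updated = list(rows)
partially_updated[0] = (1001, "P001", "Widget A", "Supplier Z", 5)
print(fd_holds(partially_updated, ["ProductID"], ["Supplier"]))  # False
```

After the incomplete update the table contradicts itself about who supplies P001, which is exactly the inconsistency that 2NF is meant to rule out.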

Compliant Design

To achieve second normal form, a relation exhibiting partial dependencies is decomposed into multiple relations, each of which has no partial dependencies and which together preserve the original data through lossless joins. This process involves identifying the determinants of partial dependencies and creating separate relations for those subsets, ensuring that every non-prime attribute in the new relations is fully functionally dependent on the entire candidate key.

Consider a typical violating design for an "Order Details" relation with attributes OrderID, ProductID, Quantity, ProductName, and Supplier, where the composite primary key is (OrderID, ProductID). Here, partial dependencies exist, such as ProductID → ProductName and ProductID → Supplier, since these attributes depend only on part of the key. To resolve this, decompose the relation into two: an "Order Details" relation with composite primary key (OrderID, ProductID) and attribute Quantity, in which ProductID is also a foreign key referencing the Products table; and a "Products" relation with attributes ProductID (primary key), ProductName, and Supplier. In the "Order Details" relation, Quantity is fully dependent on the entire composite key (OrderID, ProductID), as it represents the amount per specific order line item. In the "Products" relation, both ProductName and Supplier are fully dependent on ProductID.

The following tables illustrate the before-and-after structures with sample data, highlighting the reduction in duplication. The original table repeats product information across multiple orders, leading to redundancy.

Before Decomposition (Violating "Order Details" Table):
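| OrderID | ProductID | Quantity | ProductName | Supplier  |
|---------|-----------|----------|-------------|-----------|
| 1001    | 201       | 5        | Widget A    | Acme Corp |
| 1001    | 202       | 3        | Gadget B    | Beta Inc  |
| 1002    | 201       | 2        | Widget A    | Acme Corp |
| 1003    | 202       | 1        | Gadget B    | Beta Inc  |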
After Decomposition:

Order Details Table:
| OrderID | ProductID | Quantity |
|---------|-----------|----------|
| 1001    | 201       | 5        |
| 1001    | 202       | 3        |
| 1002    | 201       | 2        |
| 1003    | 202       | 1        |
Products Table:
| ProductID | ProductName | Supplier  |
|-----------|-------------|-----------|
| 201       | Widget A    | Acme Corp |
| 202       | Gadget B    | Beta Inc  |
To verify compliance, examine the functional dependencies in the new relations. In the "Order Details" table, the functional dependency {OrderID, ProductID} → Quantity holds fully, with no partial dependencies on subsets of the key. In the "Products" table, {ProductID} → {ProductName, Supplier} is a full dependency, as both non-prime attributes depend entirely on the primary key, eliminating any partial reliance. This decomposition eliminates redundancy by storing product details once, rather than repeating them per order line. It also enables independent updates, such as changing a supplier for a product without altering historical order records, thereby preventing update anomalies and maintaining data consistency.
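The decomposition can be tried out with Python's built-in `sqlite3` module. The sketch below loads the two decomposed tables with the sample data, joins them, and compares the result with the original four rows to confirm that nothing was lost or invented. The table name `OrderDetails` (without a space) and the replacement supplier "Gamma Ltd" are assumptions made for the example; SQLite only enforces the foreign key if `PRAGMA foreign_keys = ON` is set, which this sketch does not rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        ProductID   INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL,
        Supplier    TEXT NOT NULL
    );
    CREATE TABLE OrderDetails (
        OrderID   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
        Quantity  INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ProductID)
    );
""")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (201, "Widget A", "Acme Corp"),
    (202, "Gadget B", "Beta Inc"),
])
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?)", [
    (1001, 201, 5), (1001, 202, 3), (1002, 201, 2), (1003, 202, 1),
])

# The rows of the original, unnormalized table.
original = [
    (1001, 201, 5, "Widget A", "Acme Corp"),
    (1001, 202, 3, "Gadget B", "Beta Inc"),
    (1002, 201, 2, "Widget A", "Acme Corp"),
    (1003, 202, 1, "Gadget B", "Beta Inc"),
]

# Joining on the shared ProductID column reconstructs the original rows.
rejoined = conn.execute("""
    SELECT d.OrderID, d.ProductID, d.Quantity, p.ProductName, p.Supplier
    FROM OrderDetails AS d
    JOIN Products AS p ON p.ProductID = d.ProductID
    ORDER BY d.OrderID, d.ProductID
""").fetchall()
print(rejoined == original)  # True

# A supplier change now touches exactly one row.
cursor = conn.execute(
    "UPDATE Products SET Supplier = 'Gamma Ltd' WHERE ProductID = 201"
)
rows_changed = cursor.rowcount
print(rows_changed)  # 1
```

The join is lossless here because the shared attribute, ProductID, is the primary key of Products.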

Relations to Other Forms

Connection to Third Normal Form

Second normal form (2NF) serves as a foundational requirement for third normal form (3NF), with 3NF extending 2NF by imposing additional constraints on functional dependencies. Specifically, a relation is in 3NF if it is already in 2NF and every non-prime attribute is non-transitively dependent on each candidate key, meaning no non-prime attribute depends on another non-prime attribute.

The primary difference lies in the types of dependencies addressed: 2NF focuses on eliminating partial dependencies, where a non-prime attribute depends on only part of a composite candidate key, ensuring full dependency on the entire key. In contrast, 3NF targets transitive dependencies, where a non-prime attribute depends indirectly on the key through another non-prime attribute (e.g., Key → A → B, with B not directly dependent on the key).

For instance, consider a relation in 2NF with primary key EmployeeID and attributes EmployeeID, Department, and Location, where EmployeeID determines Department, and Department determines Location. This relation satisfies 2NF, because its key is a single attribute and so no partial dependency is possible, but it violates 3NF due to the transitive dependency EmployeeID → Department → Location, leading to potential redundancy if multiple employees share the same department.

Within the normalization hierarchy, relations progress from first normal form (1NF), which ensures atomic values, to 2NF by enforcing full functional dependency on candidate keys, and then to 3NF by removing transitive dependencies, thereby further minimizing redundancy and anomalies. Boyce-Codd normal form (BCNF) strengthens 3NF by requiring that every determinant in a non-trivial functional dependency is a candidate key; 3NF permits certain redundancies that BCNF eliminates. Achieving 2NF remains a prerequisite for both 3NF and BCNF.
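The employee example can be examined with a small check. This is a simplified sketch that assumes the relation has a single candidate key and is already in 2NF; it inspects only the FDs it is given and flags those whose determinant is not a superkey while the dependent attribute is non-prime. The function name `transitive_violations` is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(attributes, key, fds):
    """List (determinant, attribute) pairs that break 3NF.

    Assumes `key` is the relation's only candidate key, so every attribute
    outside `key` is non-prime.
    """
    attributes, key = set(attributes), set(key)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for attr in sorted(rhs - lhs - key):  # non-trivial, non-prime targets
            if not is_superkey:
                violations.append((sorted(lhs), attr))
    return violations


attrs = {"EmployeeID", "Department", "Location"}
fds = [
    ({"EmployeeID"}, {"Department"}),
    ({"Department"}, {"Location"}),
]
violations = transitive_violations(attrs, {"EmployeeID"}, fds)
print(violations)  # [(['Department'], 'Location')]
```

The only flagged dependency is Department → Location, the second link of the transitive chain; EmployeeID → Department is acceptable because EmployeeID is the key.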

Decomposition Process

The decomposition process to achieve second normal form (2NF) involves systematically identifying and eliminating partial dependencies in a relation that is already in first normal form (1NF). The algorithm begins by determining all functional dependencies (FDs) within the relation, identifying candidate keys using techniques such as FD closure, and detecting partial dependencies where a non-prime attribute depends on only a proper subset of any candidate key. Once partial dependencies are found, the relation is decomposed into smaller relations that eliminate these violations while preserving the original data semantics. The step-by-step process is as follows:
  1. Ensure the relation is in 1NF, meaning all attributes are atomic and there are no repeating groups.
  2. Identify all candidate keys by computing the attribute closure for potential keys using the set of FDs; this verifies which minimal sets of attributes uniquely determine the entire relation. Then, list all non-prime attributes (those not part of any candidate key) that are involved in partial dependencies, where the determinant is a proper subset of a candidate key.
  3. For each partial dependency X → Y (where X is a proper subset of a candidate key and Y is a non-prime attribute), create a new relation R1 consisting of X together with the other attributes in the closure of X (X⁺), with X as its key; this captures all attributes fully dependent on X.
  4. Update the original relation R to R2 by removing the attributes in Y (now in R1) but retaining X as a foreign key to link back to R1. Repeat this for all partial dependencies until no violations remain.
Decompositions should satisfy two key properties: dependency preservation, ensuring all original FDs can be inferred from the FDs in the decomposed relations via their projections; and lossless join, meaning the natural join of the decomposed relations recovers the original relation without spurious tuples. The lossless-join property can be verified using the chase algorithm or, for a decomposition into two relations, by checking that the common attributes form a superkey of one of the relations. Failure to preserve these properties can lead to information loss or inefficiency in query processing.

To illustrate, consider a violating relation Orders (CustomerID, OrderID, ProductID, ProductName, Quantity) with candidate key {OrderID, ProductID} and partial dependencies OrderID → CustomerID and ProductID → ProductName. Using FD closure, confirm the key and the partial FDs. First, split off CustomerOrders (OrderID, CustomerID) with key OrderID, leaving OrderDetails (OrderID, ProductID, ProductName, Quantity) with key {OrderID, ProductID}, where OrderID serves as a foreign key linking to CustomerOrders. OrderDetails still contains the partial dependency ProductID → ProductName, so a second step splits off Products (ProductID, ProductName) and leaves OrderDetails (OrderID, ProductID, Quantity). The three resulting relations are in 2NF, and the decomposition is lossless and dependency-preserving.
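The steps above can be sketched as a short routine. It is a simplified illustration that assumes the relation has a single candidate key, supplied by the caller, rather than a general implementation: for each proper subset of the key, smallest first, it moves the non-prime attributes in that subset's closure into a new relation keyed by the subset. The function name `decompose_2nf` and the dictionary return format are our own choices.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(attributes, key, fds):
    """Split off one relation per key subset that determines non-prime attributes.

    Assumes `key` is the only candidate key. Returns a dict that maps each
    resulting relation's key (a tuple) to that relation's attribute set.
    """
    attributes, key = set(attributes), set(key)
    non_prime = attributes - key
    relations = {}
    moved = set()
    for size in range(1, len(key)):  # proper subsets, smallest first
        for combo in combinations(sorted(key), size):
            subset = set(combo)
            dependents = (closure(subset, fds) & non_prime) - moved
            if dependents:
                relations[combo] = subset | dependents
                moved |= dependents
    # Whatever was not moved stays with the full key.
    relations[tuple(sorted(key))] = attributes - moved
    return relations


attrs = {"CustomerID", "OrderID", "ProductID", "ProductName", "Quantity"}
fds = [
    ({"OrderID"}, {"CustomerID"}),
    ({"ProductID"}, {"ProductName"}),
    ({"OrderID", "ProductID"}, {"Quantity"}),
]
result = decompose_2nf(attrs, {"OrderID", "ProductID"}, fds)
for rel_key, rel_attrs in result.items():
    print(rel_key, sorted(rel_attrs))
```

For the Orders example this yields three relations keyed by OrderID, by ProductID, and by the full key. Each split-off relation shares exactly its own key with the remaining relation, which is the superkey condition for a lossless binary split.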

Benefits and Applications

Reduction of Redundancy

Second normal form (2NF) reduces data redundancy by eliminating partial dependencies in relations with composite primary keys, ensuring that every non-prime attribute is fully functionally dependent on the entire candidate key rather than a proper subset of it. Partial dependencies arise from functional dependencies where a non-key attribute relies only on part of the key, causing the same value to be stored redundantly across multiple tuples that share that key subset. By decomposing the relation into projections that isolate these dependencies, 2NF prevents such duplication, as data tied to a key subset is moved to a separate relation referenced by foreign keys.

This mechanism directly lowers storage requirements, as information is stored only once per unique value of the key subset instead of being replicated for each combination involving the full key. It also promotes data consistency, since changes to the dependent attribute need to occur in only one location, avoiding discrepancies that could arise from incomplete updates in redundant copies. For instance, product details such as name or description, which depend solely on a product identifier rather than an order-product combination, are stored once per product in a dedicated relation, rather than repeated for every line item.

In non-2NF designs, the degree of redundancy for an attribute dependent on a key subset can scale proportionally to the number of distinct values in the remaining key components, for example, repeating supplier details for each order of the same product across numerous orders. Achieving 2NF limits this by confining storage to the actual multiplicity of the dependent values, capping redundancy at the scale of unique entities rather than their cross-products.

A practical application appears in databases that track orders, where a relation might include order ID, product ID, and supplier information; since supplier details depend only on product ID, 2NF separates them into a products relation, preventing repetition of supplier details across multiple order lines for identical products. This approach, drawn from classic examples like supplier-part relations, keeps such data free of unnecessary duplication. While 2NF effectively curbs redundancy from partial dependencies, it does not resolve transitive dependencies among non-key attributes, allowing some duplication to persist until third normal form is reached.
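The scaling argument can be written as a simple count. The symbols below are introduced here for illustration: $P$ is the set of distinct products and $n_p$ is the number of order lines that mention product $p$.

```latex
\text{copies of the supplier value before decomposition} = \sum_{p \in P} n_p,
\qquad
\text{copies after decomposition} = |P|
```

In the sample order data from the Violating Design example, P001 appears in three order lines and P002 in one, so the supplier value is stored $3 + 1 = 4$ times in the unnormalized table but only $|P| = 2$ times once the product details are moved to their own relation.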

Insertion and Update Anomalies

In relational databases, relations not in second normal form (2NF) are susceptible to insertion, update, and deletion anomalies due to partial dependencies, where non-prime attributes depend on only a portion of a composite candidate key rather than the entire key. These anomalies compromise data integrity during common operations, as they force extraneous data entry, risk inconsistency, or lead to unintended data loss. Achieving 2NF eliminates such issues by ensuring full functional dependency on the whole key, typically through decomposition into smaller relations.

An insertion anomaly occurs when it is impossible to add certain facts to the database without including unrelated information, often because partial dependencies require extraneous details to satisfy the relation's structure. Consider an OrderDetails relation with attributes OrderID, ProductID, ProductName, SupplierName, and Quantity, where the primary key is the composite (OrderID, ProductID). Here, ProductName and SupplierName depend only on ProductID (a partial dependency). To insert details for a new product and its supplier, an order must already exist; otherwise, the row cannot be added without fabricating an OrderID, leading to incomplete or inaccurate data representation. In E. F. Codd's foundational example, a similar issue arises in a supplier-part relation where city information for a new supplier cannot be inserted without associating it with a specific part-supply relationship.

An update anomaly manifests as the need to modify multiple rows to maintain consistency, increasing the risk of errors or incomplete updates due to redundant data from partial dependencies. In the OrderDetails example, altering a supplier's name for a given ProductID requires updating every row containing that ProductID across various orders; failing to update all instances results in inconsistencies. This stems directly from the partial dependency, as the supplier information is repeated unnecessarily. Codd illustrates this with a supplier relocation, where changing a supplier's city demands updates to all associated part-supply tuples, potentially leaving some unchanged if the operation is not exhaustive.

A deletion anomaly happens when removing a tuple erases unrelated but valuable information, again tied to partial dependencies that entwine distinct facts. Using the OrderDetails relation, deleting the final order for a product would also remove the ProductName and SupplierName, losing essential product information even though the deletion targeted only the order. Codd's supplier-part example shows that deleting all supply tuples for a supplier also discards the supplier's city, severing entity details.

Normalizing to 2NF resolves these anomalies by decomposing the relation, for instance, separating OrderDetails into an Orders relation (OrderID, ProductID, Quantity) and a Products relation (ProductID, ProductName, SupplierName), ensuring each non-prime attribute fully depends on its key and allowing insertions, updates, and deletions on orders and products to be performed independently.
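The effect of the decomposition on insertions and deletions can be observed with Python's built-in `sqlite3` module. This is a minimal sketch using the Orders and Products relations named above; the sample values (product P003 "Gizmo C" from "Supplier Z", and order 1003 for P002) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        ProductID    TEXT PRIMARY KEY,
        ProductName  TEXT NOT NULL,
        SupplierName TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID   INTEGER NOT NULL,
        ProductID TEXT NOT NULL REFERENCES Products(ProductID),
        Quantity  INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ProductID)
    );
""")

# Insertion: a product can be recorded before any order mentions it.
conn.execute("INSERT INTO Products VALUES ('P003', 'Gizmo C', 'Supplier Z')")

# Deletion: removing a product's only order leaves the product row intact.
conn.execute("INSERT INTO Products VALUES ('P002', 'Gadget B', 'Supplier Y')")
conn.execute("INSERT INTO Orders VALUES (1003, 'P002', 2)")
conn.execute("DELETE FROM Orders WHERE OrderID = 1003")

remaining_orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
products = conn.execute(
    "SELECT ProductID FROM Products ORDER BY ProductID"
).fetchall()
print(remaining_orders)  # 0
print(products)          # [('P002',), ('P003',)]
```

In the single-table design neither step would be possible without inventing an order for P003 or losing the details of P002.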
