First normal form
First normal form (1NF) is a fundamental level of database normalization in relational database theory, requiring that all entries in a relation (table) consist of atomic values, eliminating repeating groups and multivalued attributes to ensure data integrity and simplify querying.[1] Introduced by Edgar F. Codd in his seminal 1970 paper "A Relational Model of Data for Large Shared Data Banks," 1NF establishes the basic structure for relations by mandating simple domains—indivisible scalar values—without nested structures or arrays within cells.[1][2] Definitions of 1NF can vary slightly across sources, particularly regarding atomicity and key requirements. This form serves as the starting point for higher normal forms, promoting data independence and enabling efficient manipulation via a universal data sublanguage.[1] To achieve 1NF, every attribute (column) must hold only single, indivisible values, with no groups of related data repeated across multiple columns or within a single cell; while relations are sets and thus have no duplicate rows, a primary key is commonly used in practice to enforce unique identification of each tuple (row).[2] For instance, a table storing employee information with a multivalued attribute like multiple phone numbers in one field violates 1NF; normalization would decompose this into separate rows or relations.[3] As articulated in William Kent's 1983 guide, 1NF addresses the uniformity of record shapes, ensuring all records have the same fixed number of fields, which aligns with Codd's emphasis on avoiding variable repeating fields in relational models.[3][1] The primary purpose of 1NF is to prevent anomalies in data insertion, updates, and deletions by removing redundancy inherent in non-atomic structures, thereby laying the groundwork for scalable database designs.[2] While 1NF alone does not address all dependencies, it is essential for progressing to second normal form (2NF) and beyond, as non-1NF relations cannot be reliably queried using relational algebra or calculus.[1] In practice, modern relational database management systems (RDBMS) such as IBM's DB2 adhere to 1NF as a foundational principle for core relational structures, though they often support non-1NF extensions like JSON fields for flexibility.[2]Fundamentals
Definition
First normal form (1NF) is the foundational level of database normalization in the relational model, requiring that every attribute in a relation contains only atomic values and that there are no repeating groups or arrays within a single tuple. This definition originates from Edgar F. Codd's seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks," where he introduced the relational model and specified 1NF as the initial requirement for relations to ensure data integrity and eliminate redundancy from hierarchical or network models. Atomic values, in this context, refer to scalar, indivisible entries such as single numbers, strings, or dates that cannot be further decomposed into smaller components without losing their meaning. These values exclude sets, lists, nested relations, or other complex structures that would allow multiple values per attribute in a tuple, thereby maintaining the simplicity and predictability of data storage and retrieval. Formally, a relation R in 1NF consists of attributes A_1, A_2, \dots, A_n with atomic domains, where each tuple is an unordered set of attribute-value pairs, and the relation itself is a set of such tuples with no duplicates. This structure ensures that all entries are single-valued and that the relation adheres to the mathematical properties of sets, avoiding multivalued dependencies within rows. As the starting point for higher normal forms such as second normal form (2NF) and third normal form (3NF), 1NF establishes domain integrity by prohibiting non-atomic data, which forms the basis for subsequent normalization steps to address functional dependencies and further reduce anomalies.Key Requirements
To achieve first normal form (1NF) in the relational model, a table must satisfy specific criteria that ensure its structure aligns with the foundational principles of relations as sets of atomic tuples. These requirements operationalize the abstract definition by providing enforceable rules for database design.[4] The first requirement is that all entries in each column must be atomic, meaning they consist of indivisible, nondecomposable values drawn from simple domains. This prohibits multi-valued attributes, such as storing multiple items in a single cell via comma-separated lists or other composite representations, as such structures violate the atomicity principle and complicate querying and data integrity.[4][5] A second key requirement is the elimination of repeating groups, which refers to arrays, nested relations, or any form of repeated data within a single row. Multi-valued dependencies, like a list of multiple phone numbers or addresses in one entry, must instead be resolved by decomposing the data into separate tables linked by foreign keys, thereby preventing redundancy and ensuring each row captures a single, cohesive fact.[4] Third, each row in the table must represent a unique tuple, with no duplicate rows permitted. This uniqueness is inherent to the set-based nature of relations and is typically enforced through a primary key, which identifies each tuple distinctly without allowing identical rows to coexist.[4] Furthermore, domain constraints mandate that every attribute is restricted to a single, well-defined domain of atomic values, where the domain's elements are treated as indivisible units for the purposes of the relation's operations. This ensures consistency in data interpretation and storage across the schema.[5] In practical SQL implementations faithful to the relational model, these requirements are met by employing primitive data types—such as INT for integers, VARCHAR for variable-length strings, and DATE for temporal values—while avoiding non-atomic types like arrays or JSON columns in core relational tables to preserve 1NF compliance.[5]Illustrative Examples
Non-Compliant Designs
Non-compliant database schemas fail to adhere to first normal form (1NF) primarily through the inclusion of multi-valued attributes or repeating groups, which introduce non-atomic values into table cells.[1] These violations stem from designs where a single attribute attempts to store multiple distinct pieces of information, complicating data management and querying.[6] A common example is a "Customer Orders" table where the "Products" column contains comma-separated lists of items ordered, such as "Widget A, Widget B". This structure violates the atomicity requirement of 1NF because each cell holds multiple values rather than a single, indivisible entry.[7] The table might appear as follows: Searching or updating this schema requires string parsing operations, such as splitting the comma-separated values, which increases query complexity and error risk.[8] Another frequent violation involves repeating groups, where multiple attributes of the same type are duplicated across columns for a single entity. For instance, a "Customers" table might include separate columns for multiple phone numbers under one customer ID, like Phone1, Phone2, and Phone3, leading to insertion anomalies if a customer has more or fewer than three numbers.[9] This repeating group design can be illustrated as:| Customer ID | Name | Phone1 | Phone2 | Phone3 |
|---|---|---|---|---|
| C001 | John Doe | 555-0101 | 555-0102 | NULL |
| C002 | Jane Smith | 555-0201 | NULL | NULL |
Compliant Designs
To transform the non-compliant "Customer Orders" design, where a single table contains repeating groups of multiple products per order row, the structure is split into three related tables: Customers for unique customer details, Orders for order headers, and a junction table called OrderItems for associating individual products with orders. This eliminates multi-valued attributes by ensuring atomicity in each cell. The Customers table uses CustomerID as the primary key (PK). The Orders table links to Customers via a foreign key (FK) on CustomerID. The OrderItems table uses composite keys or a separate OrderItemID, with FKs to Orders and a Products table (assumed for product details), allowing one product per row without duplication of order or customer data.[7]| Table | Columns | Keys |
|---|---|---|
| Customers | CustomerID, Name | PK: CustomerID |
| Orders | OrderID, CustomerID, OrderDate | PK: OrderID FK: CustomerID → Customers |
| OrderItems | OrderID, ProductID, Quantity | PK: (OrderID, ProductID) FK: OrderID → Orders FK: ProductID → Products |
For the repeating phone numbers example, where a customer table has multiple phone values in separate columns (e.g., Phone1, Phone2) or comma-separated in one cell, the design is normalized by extracting phones into a dedicated CustomerPhones table. This table uses CustomerID as an FK to reference the Customers table, with each row holding a single phone number, thus removing repeating groups and ensuring atomic values per attribute.[6]sqlCREATE TABLE Customers ( CustomerID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL ); CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, OrderDate DATE NOT NULL, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) ); CREATE TABLE OrderItems ( OrderID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, PRIMARY KEY (OrderID, ProductID), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) );CREATE TABLE Customers ( CustomerID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL ); CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, OrderDate DATE NOT NULL, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) ); CREATE TABLE OrderItems ( OrderID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, PRIMARY KEY (OrderID, ProductID), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) );
| Table | Columns | Keys |
|---|---|---|
| Customers | CustomerID, Name | PK: CustomerID |
| CustomerPhones | CustomerID, PhoneNumber | PK: (CustomerID, PhoneNumber) FK: CustomerID → Customers |
These transformations result in designs where every attribute contains a single, indivisible value, satisfying the atomic domain requirement of first normal form as defined by E.F. Codd, and facilitating relational operations such as joins across tables without ambiguity or redundancy in multi-valued fields.[4]sqlCREATE TABLE Customers ( CustomerID [INT](/page/INT) PRIMARY KEY, Name [VARCHAR](/page/Varchar)(100) NOT NULL ); CREATE TABLE CustomerPhones ( CustomerID [INT](/page/INT) NOT NULL, PhoneNumber [VARCHAR](/page/Varchar)(20) NOT NULL, PRIMARY KEY (CustomerID, PhoneNumber), FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) );CREATE TABLE Customers ( CustomerID [INT](/page/INT) PRIMARY KEY, Name [VARCHAR](/page/Varchar)(100) NOT NULL ); CREATE TABLE CustomerPhones ( CustomerID [INT](/page/INT) NOT NULL, PhoneNumber [VARCHAR](/page/Varchar)(20) NOT NULL, PRIMARY KEY (CustomerID, PhoneNumber), FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) );
Theoretical Rationale
Normalization Goals
The primary goal of first normal form (1NF) is to ensure data integrity in relational databases by enforcing atomic domains, thereby preventing anomalies such as insertion, deletion, and update issues that arise from non-atomic or multivalued attributes.[10] In Edgar F. Codd's relational model, 1NF requires that every attribute value be indivisible and single-valued, eliminating repeating groups or nested structures that could lead to inconsistencies during data operations.[1] This foundational step minimizes redundancy and supports consistent data manipulation, as nonsimple domains in unnormalized relations often necessitate multiple updates to maintain accuracy across related records.[10] Within Codd's relational model, 1NF serves as a prerequisite for achieving higher normal forms, enabling lossless decomposition of relations while preserving functional dependencies essential for data integrity.[1] By requiring simple domains, 1NF facilitates the orthogonal decomposition of complex relations into independent tables, where subsequent normalizations can address dependency issues without losing information or introducing spurious tuples. This structure aligns with the model's emphasis on set-based operations and data independence, allowing queries and updates to operate on well-defined, atomic units.[1] Introduced by Codd in 1970, 1NF addressed limitations in prevailing hierarchical and network database models, which relied on tree-structured files or pointer-based links that complicated data access and maintenance.[1] These earlier systems often embedded multiplicity within records, leading to navigation challenges and dependency on physical storage details; 1NF promoted a tabular, set-oriented approach that abstracted data from implementation specifics.[1] In contemporary contexts, such as transitions to NoSQL databases, 1NF retains its role as a benchmark for relational integrity, even as denormalization is occasionally adopted for performance gains in document-oriented or key-value stores.[11] While NoSQL systems may tolerate nested structures to optimize read-heavy workloads, adhering to 1NF principles helps mitigate risks of data anomalies during schema evolution or hybrid integrations with relational components.[11]Benefits in Relational Model
Adhering to first normal form (1NF) in the relational model promotes improved data consistency by ensuring atomic values in each attribute, which minimizes redundancy and simplifies the enforcement of domain constraints such as CHECK constraints. This atomicity allows for precise validation rules on individual values, reducing the risk of inconsistent data entry across related records.[1][2] 1NF enhances querying efficiency in relational databases by enabling standard SQL operations like SELECT and JOIN without the need for custom parsing of multi-valued or composite fields, thereby improving overall performance and ensuring greater portability across database management systems. The elimination of repeating groups facilitates symmetric access to data, avoiding complex path-dependent queries that would otherwise be required in non-normalized structures.[1][2] Normalized structures under 1NF support scalability for larger datasets by preventing bloated rows from multi-valued attributes, which in turn facilitates effective indexing and partitioning strategies to handle growing data volumes without proportional increases in storage or query overhead. Smaller, atomic-focused tables allow indexes to target specific values more efficiently, enabling horizontal scaling in distributed environments.[12][13] 1NF prevents update anomalies inherent in non-normalized designs, particularly partial updates to multi-valued fields that could lead to incomplete or inconsistent changes; for instance, attempting to modify only one element in a repeating group might require scanning and altering multiple rows, risking data loss if not all instances are updated. This is avoided through decomposition into atomic relations, ensuring updates affect single, well-defined tuples without side effects on related data. Consider a conceptual scenario where a non-1NF update pseudocode might resemble:In contrast, 1NF decomposes to separate rows, allowing targeted:UPDATE table SET multi_valued_field = REPLACE(multi_valued_field, 'old_value', 'new_value') -- But this fails to update all occurrences if field contains lists, leading to inconsistencyUPDATE table SET multi_valued_field = REPLACE(multi_valued_field, 'old_value', 'new_value') -- But this fails to update all occurrences if field contains lists, leading to inconsistency
Such mechanisms safeguard against insertion and deletion anomalies as well, maintaining relational integrity.[1][14][UPDATE](/page/Update) [normalized_table](/page/normalized_table) SET [atomic_field](/page/atomic_field) = 'new_value' WHERE key = specific_id -- Ensures complete, [atomic](/page/Atomic) update without partial effects[UPDATE](/page/Update) [normalized_table](/page/normalized_table) SET [atomic_field](/page/atomic_field) = 'new_value' WHERE key = specific_id -- Ensures complete, [atomic](/page/Atomic) update without partial effects