Codd's 12 rules
Codd's twelve rules are a set of thirteen criteria (numbered from zero to twelve) proposed by Edgar F. Codd, the pioneer of the relational database model, in 1985 to establish the standards for a database management system to be considered fully relational.[1] These rules define the core principles of the relational model, emphasizing logical data independence, data integrity, and user-friendly access mechanisms to ensure that systems adhere strictly to relational theory rather than merely incorporating relational features.[2] Developed amid growing commercial interest in relational databases during the 1980s, the rules were first detailed in two Computerworld articles titled "Is Your DBMS Really Relational?" (October 14, 1985) and "Does Your DBMS Run By the Rules?" (October 21, 1985), addressing vendors' tendency to label non-relational systems as relational for marketing advantage.[2]

The foundation rule (Rule 0) requires that all data management functions be performable using only relational capabilities, setting the baseline for compliance.[2] Subsequent rules cover critical aspects such as representing all information in tables (Rule 1), guaranteeing access to data via logical identifiers (Rule 2), handling null values systematically (Rule 3), maintaining active online catalogs as base tables (Rule 4), supporting comprehensive sublanguages for data definition, manipulation, and control (Rule 5), enabling view updating (Rule 6), providing relational operations for insert, update, and delete (Rule 7), ensuring physical data independence (Rule 8) and logical data independence (Rule 9), enforcing integrity constraints through the relational language (Rule 10), supporting distributed databases (Rule 11), and preventing low-level languages from bypassing relational safeguards (Rule 12).[2]

While no commercial database management system has fully satisfied all twelve rules, they remain a foundational benchmark for evaluating relational fidelity and have profoundly influenced the design and standardization of modern RDBMS, including SQL-based systems.[1]

Introduction
Definition and Purpose
Codd's 12 rules, formally a set of 13 criteria (numbered from Rule 0 to Rule 12), were proposed by Edgar F. Codd in 1985 as a formal evaluation scheme to assess whether a database management system (DBMS) qualifies as truly relational.[3] These rules serve as benchmarks for fidelity to the relational model, which Codd himself introduced in 1970 as a framework for organizing data into relations to simplify data access and management. Often referred to as the "Twelve Commandments" despite the inclusion of the foundational Rule 0, the criteria emphasize that a relational DBMS must manage data exclusively through relational mechanisms without relying on non-relational or procedural extensions.[4]

The primary purpose of these rules is to protect the integrity of the relational model against dilution by vendors, who in the 1980s frequently marketed hybrid or "born-again" systems as fully relational while incorporating navigational or hierarchical elements that undermined relational principles.[3] Codd developed the rules out of frustration with such misleading claims, aiming to provide database users and purchasers with a rigorous standard to evaluate vendor products and ensure long-term investments in applications, training, and data administration remain viable.[4] By enforcing strict adherence, the rules promote standardized data independence, integrity constraints, and manipulative capabilities, countering the "performance myth" that non-relational features were necessary for efficiency.[3]

At their core, the rules define a relational DBMS as a system that organizes and manages data using relations—typically represented as tables consisting of rows (tuples) and columns (attributes)—and supports declarative query languages for operations like retrieval, insertion, update, and deletion at the relational level, often handling multiple records simultaneously.[4] This approach ensures data sublanguage commands are comprehensive and uniformly applicable across the database, distinguishing true relational systems from those requiring low-level navigational access or record-at-a-time processing.[3]

Relation to the Relational Model
The relational model, introduced by Edgar F. Codd in 1970, organizes data into relations—mathematical structures akin to tables—comprising rows (tuples) and columns (attributes) defined over specific domains, with the entire framework rooted in set theory and first-order predicate logic to enable precise querying and manipulation. This abstraction allows data to be represented declaratively without regard to physical storage details, distinguishing it from earlier models like hierarchical or network databases that imposed rigid navigational structures. Codd's 12 rules, formalized in 1985 and elaborated in his 1990 work, extend this foundational model by translating its theoretical principles into practical, verifiable requirements for database management systems (DBMS).[5] Specifically, the rules enforce declarative access to data through non-procedural languages, logical and physical data independence to insulate applications from storage changes, and systematic handling of null values to represent missing or inapplicable information accurately, thereby preventing implementations from deviating toward non-relational paradigms such as pointer-based hierarchies or CODASYL networks.[5] These extensions ensure that systems adhere strictly to the model's emphasis on logical data structures, where relations maintain integrity through keys and domains, rather than relying on implementation-specific optimizations.[5]

A core concept here is the prioritization of logical over physical representation: users interact with data via relational algebra operations like selection, projection, and join, oblivious to how tuples are stored or indexed on disk. The rules serve as a rigorous checklist to assess a DBMS's fidelity to the model, confirming that all data manipulation occurs within the relational framework without procedural code or external dependencies.[5] The rules presuppose familiarity with basic relational elements, such as primary keys for uniqueness and query languages for retrieval. For instance, consider an employee relation with attributes EmployeeID (a unique integer key), Name (a string domain), and Department (a categorical domain):

| EmployeeID | Name | Department |
|---|---|---|
| 101 | Alice Smith | Engineering |
| 102 | Bob Johnson | Sales |
| 103 | Carol Lee | Engineering |
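A minimal SQL sketch makes the relation concrete; the `CREATE TABLE` statement and the final query are illustrative additions (the source gives only the table above), with `employee` as an assumed relation name:

```sql
-- The relation from the table above, declared and queried declaratively:
CREATE TABLE employee (
    EmployeeID INTEGER PRIMARY KEY,   -- unique integer key
    Name       VARCHAR(60) NOT NULL,  -- string domain
    Department VARCHAR(40) NOT NULL   -- categorical domain
);

INSERT INTO employee VALUES (101, 'Alice Smith', 'Engineering');
INSERT INTO employee VALUES (102, 'Bob Johnson', 'Sales');
INSERT INTO employee VALUES (103, 'Carol Lee', 'Engineering');

-- Selection and projection, with no reference to storage or indexes:
SELECT Name FROM employee WHERE Department = 'Engineering';
```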
Historical Development
Edgar F. Codd's Contributions
Edgar Frank Codd (1923–2003) was a British-born computer scientist and mathematician whose work profoundly influenced modern database systems. Born on August 19, 1923, in the Isle of Portland, England, Codd earned an honors degree in mathematics from Exeter College, Oxford University, in 1948 after serving as a pilot in the Royal Air Force during World War II. He later obtained a Ph.D. in computer and communication sciences from the University of Michigan in 1965. Codd joined IBM in 1949 as a mathematical programmer in New York City, initially focusing on early computing systems such as the Selective Sequence Electronic Calculator and contributing to the design of the IBM 701 computer in the early 1950s. By 1957, he had moved to Poughkeepsie, New York, where he helped develop the IBM 7030 STRETCH, the company's first transistorized supercomputer, advancing concepts in multiprogramming. In 1968, Codd relocated to IBM's San Jose Research Laboratory in California, marking his transition toward database research amid growing needs for managing large-scale data in business environments.[6][7]

Codd's pivotal contributions began in the late 1960s when he sought alternatives to prevailing hierarchical and network database models, such as IBM's Information Management System (IMS) and the Conference on Data Systems Languages (CODASYL) approach, which required programmers to navigate complex pointer-based structures. In a landmark 1970 paper, "A Relational Model of Data for Large Shared Data Banks," published in Communications of the ACM, he introduced the relational model, proposing data organization into tables (relations) with rows (tuples) and columns (attributes), linked via keys to ensure logical independence and simplify querying without knowledge of physical storage. This model emphasized declarative access, allowing users to specify what data they needed rather than how to retrieve it, fundamentally shifting database design from navigational to set-based operations. Building on this foundation, Codd extended the relational model throughout the 1970s, refining concepts like normalization to minimize redundancy and dependency theory to maintain data integrity.[6]

By the late 1970s and early 1980s, Codd grew concerned with database vendors, including IBM itself, incorporating non-relational features into products like SQL/DS—such as low-level navigational interfaces and deviations from strict relational principles—that diluted the model's purity and complicated user access. Motivated to establish clear criteria for true relational database management systems (RDBMS), Codd advocated for rigorous standards to enforce data independence, integrity, and usability, preventing vendor implementations from undermining the relational paradigm's benefits. His efforts culminated in the 1981 A.M. Turing Award from the Association for Computing Machinery, recognizing "his fundamental and continuing contributions to the theory and practice of database management systems." Codd retired from IBM in 1984 but continued independent consulting and research.[6][2]

In the 1990s, Codd further advanced his ideas through the Relational Model Version 2 (RM/V2), outlined in his 1990 book The Relational Model for Database Management: Version 2, which expanded the original framework into over 300 detailed rules and features to address evolving requirements like temporal data and enhanced integrity constraints.
This work reinforced his commitment to evolving the relational model while preserving its mathematical rigor based on first-order predicate logic and set theory.[5]

Publication and Evolution
E. F. Codd first formally proposed his 12 rules for evaluating relational database management systems (RDBMS) in a two-part article series published in Computerworld magazine on October 14 and October 21, 1985. Titled "Is Your DBMS Really Relational?" and "Does Your DBMS Run by the Rules?", these articles outlined the rules as a rigorous test to distinguish truly relational systems from those merely claiming the label, including Rule 0 as the foundational principle that the system must use relational facilities exclusively to manage the database.[2][8]

The publication came amid growing commercial interest in relational databases during the 1980s, as vendors such as Oracle and IBM with its DB2 product aggressively marketed their offerings as relational, often without full adherence to Codd's emerging criteria. This led to widespread use of the term "relational" in industry promotions, sparking debates on authenticity and prompting Codd's rules as a benchmark for compliance. The rules also exerted influence on the development of ANSI SQL standards, providing conceptual guidance for features like data sublanguages and catalog management in subsequent revisions beyond the initial 1986 standard.[9][10]

In the 1990s, Codd refined his ideas through the RM/V2 framework, detailed in his 1990 book The Relational Model for Database Management: Version 2, which updated several rules—such as expanding view updatability requirements—and integrated them into a broader vision of relational integrity and temporal support. These evolutions addressed limitations in early implementations and aimed to guide future standards, though assessments of commercial compliance varied.[5]

The Rules
Rule 0: The Foundation Rule
Rule 0, known as the Foundation Rule, establishes the fundamental prerequisite for a database management system (DBMS) to be considered truly relational. It requires that any system advertised or claimed to be a relational DBMS must manage the database exclusively using its relational facilities, without relying on any non-relational mechanisms. This ensures that the entire scope of database operations—from definition and manipulation to integrity enforcement—is handled through relational principles alone.[2]

The rule explicitly prohibits the incorporation of non-relational extensions, such as navigational pointers, hierarchical structures, or procedural coding elements, which were common in pre-relational systems like CODASYL or IMS. Instead, it mandates that all data manipulation be declarative and relation-based, leveraging mathematical relations (tables with rows and columns) to represent and query data. This foundational constraint guarantees that the system's architecture adheres strictly to the relational model, preventing hybrid approaches that dilute its purity and benefits, such as data independence and simplicity.[2][5]

A practical illustration of this rule is that a compliant DBMS cannot expose or depend on low-level, record-oriented APIs for data access; rather, it must provide interfaces limited to relational constructs like tables, primary and foreign keys, and a comprehensive query language (e.g., SQL equivalents) for all operations, including inserts, updates, and deletes.[2] Codd introduced Rule 0 in his 1985 series of articles in Computerworld as the zeroth rule—preceding the other twelve—to emphasize its indispensable role, directly addressing misleading vendor claims in the 1980s where products were labeled "relational" despite heavy reliance on non-relational procedural add-ons.[9][5]

Rule 1: The Information Rule
Rule 1, known as the Information Rule, stipulates that all information in a relational database, including both user data and metadata such as database structure definitions, must be represented explicitly at the logical level and in exactly one way—by values in tables. This rule ensures that the database's logical structure is self-contained within the relational framework, without reliance on external files or non-relational mechanisms for storing schema information. By confining all data representation to tables, the rule promotes a uniform approach to information management, abstracting away physical storage details and emphasizing the logical view presented to users.[2]

At the logical level, this representation hides the underlying physical storage mechanisms, such as file formats or indexing structures, allowing users to interact solely with tabular data through relational operations. Metadata, including details like table names, column definitions, and data types, is stored as rows and columns in dedicated system tables, eliminating the need for separate schema files or proprietary formats outside the relational model. This approach aligns with the foundational relational facilities outlined in Rule 0, ensuring the entire system operates on a consistent data model.[11]

For instance, the definition of a table—such as its columns and their types—would be entered as values in a system catalog table, treatable like any other relational data. This enables the database to be self-describing, where structural information is as accessible and manipulable as application data, fostering flexibility in database design and maintenance. The implication of this rule is profound: it establishes the basis for treating data and metadata uniformly, which is essential for building truly relational systems that support dynamic introspection and evolution without disrupting the logical schema.[12]

Rule 2: The Guaranteed Access Rule
The Guaranteed Access Rule, designated as Rule 2 in Edgar F. Codd's framework for relational database management systems, mandates that each and every datum (atomic value) in the database is guaranteed to be logically accessible by specifying a combination of the relation name, primary key value, and attribute name.[13] This precise addressing scheme ensures unambiguous retrieval of individual scalar values without reliance on implementation-specific details.[9] As a direct corollary to Rule 1's emphasis on data representation solely as values in relations, Rule 2 reinforces the need for relations to adhere to first normal form (1NF), where attributes are atomic, and higher normal forms like fifth normal form (5NF) to maintain dependencies tied to the primary key.[13] It explicitly prohibits access methods based on positional or ordinal references, such as identifying data by the "third field" in a sequential record, thereby mandating the use of primary keys for unique tuple identification.[2]

A practical illustration of this rule involves accessing an employee's salary in a relation named employees, where emp_id serves as the primary key. The query would be:

```sql
SELECT salary FROM employees WHERE emp_id = 123;
```

This logical specification avoids any reference to physical constructs, such as "field 5 of record 10," ensuring the access remains independent of storage layout.[2] By promoting a three-part logical addressing mechanism over physical pointers or navigational paths, Rule 2 enhances data independence and portability, allowing applications to interact with the database schema without concern for underlying hardware or storage changes.[13] This foundational principle underscores the relational model's shift toward declarative query languages for robust, scalable data management.[9]
Rule 3: Systematic Treatment of Null Values
Rule 3 requires that a relational database management system (RDBMS) support null values to represent missing or inapplicable information in a systematic and uniform manner, independent of the data type involved. Nulls must be distinctly handled and differentiated from other representations such as empty character strings, strings of blank characters, zeros, or any other numeric values that could otherwise serve as legitimate data entries. This ensures that nulls function as a dedicated marker solely for the absence of applicable data, avoiding the pitfalls of ad-hoc conventions that vary by column or domain.[14]

Central to this rule is the concept of nulls as a unique indicator for "unknown" or "not applicable" states, which necessitates the adoption of three-valued logic in query processing and data manipulation. In traditional two-valued logic (true/false), the presence of nulls introduces a third outcome: unknown. For instance, a comparison involving a null value, such as checking if a salary equals 50,000, evaluates to unknown rather than true or false, propagating through operations like AND, OR, and NOT according to extended truth tables. This systematic approach allows queries to explicitly test for nulls using operators like IS NULL or IS NOT NULL, ensuring consistent behavior across the database. Primary keys and certain foreign keys can be constrained to disallow nulls, enforcing data integrity where completeness is mandatory.[14][15]

Consider a customer table with columns for first name, last name, and middle initial. The middle initial field can legitimately contain a null value for customers without a middle name, distinguishing it from an empty string that might imply a deliberate absence of data. A query to find customers missing a middle initial, such as `SELECT * FROM customers WHERE middle_initial IS NULL`, must reliably identify these records without conflating them with zero-length strings or other placeholders, thereby maintaining query accuracy.[14]
By mandating this uniform treatment, Rule 3 mitigates ambiguities in data representation and querying that arise from incomplete real-world datasets, such as optional attributes in forms or unavailable measurements in scientific records. It promotes robust data integrity and reliable analysis, preventing errors where missing information is misinterpreted as specific values, and supports scalable handling of partial data without compromising the relational model's foundational principles.[14][15]
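A short sketch of this three-valued behavior, using a hypothetical `customers` table whose names and rows are assumed for illustration:

```sql
-- Hypothetical schema: middle_initial permits NULL as the marker for
-- missing or inapplicable data; the key column disallows it.
CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    first_name     VARCHAR(40) NOT NULL,
    middle_initial CHAR(1),
    last_name      VARCHAR(40) NOT NULL
);

INSERT INTO customers VALUES (1, 'Alice', 'M', 'Smith');
INSERT INTO customers VALUES (2, 'Bob', NULL, 'Johnson');

-- Comparisons against NULL evaluate to unknown, not true or false:
SELECT * FROM customers WHERE middle_initial = 'M';   -- row 1 only; unknown for row 2
SELECT * FROM customers WHERE middle_initial <> 'M';  -- no rows; false for row 1, unknown for row 2
SELECT * FROM customers WHERE middle_initial IS NULL; -- row 2; IS NULL tests the marker directly
```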
Rule 4: Active Online Catalog
Rule 4, known as the Active Online Catalog rule, requires that the structure description of the entire database be represented at the logical level in the same way as ordinary data, enabling authorized users to apply the same relational language for querying the catalog as they do for regular data. This ensures the catalog functions as a dynamic component of the database, maintaining consistency with the principles outlined in the Information Rule by treating metadata uniformly as relations. The rule emphasizes that the catalog must be stored online and accessible in real time, allowing immediate reflection of any structural changes without requiring separate tools or offline processes.[14]

The term "active" in this context signifies that the catalog supports real-time updates and queries, integrating it as an essential, always-available part of the database system rather than relying on static files or external documentation. This relational representation of the catalog—often referred to as a data dictionary—facilitates seamless interrogation by users, who need only master a single data model and language, unlike in non-relational systems where metadata access demands distinct mechanisms. By embedding the catalog within the relational framework, the rule promotes uniformity and simplifies database administration, as structural modifications propagate instantly to authorized queries.[14]

A practical example of this rule in action is the use of system views like INFORMATION_SCHEMA.COLUMNS, which stores details about column names, data types, and other attributes for all tables in a database and can be queried using standard relational operations. Authorized users can execute queries such as `SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'example_table'` to retrieve and analyze table structures dynamically, with the catalog updating in real time to reflect schema alterations if permissions allow modifications through the same language. This approach exemplifies how the catalog remains modifiable where appropriate, ensuring it evolves alongside the database content.[16]

The implications of Rule 4 extend to enabling metadata-driven applications, where software can automatically discover and utilize database structures for tasks like report generation or schema validation without hard-coded assumptions. It also fosters self-documenting databases, as the catalog itself serves as a comprehensive, queryable reference that authorized users can extend into a full-fledged relational data dictionary if the vendor's implementation falls short. This rule underscores the relational model's emphasis on transparency and accessibility, reducing complexity for developers and administrators while enhancing overall system integrity.[14]

Rule 5: Comprehensive Data Sublanguage Rule
Rule 5, known as the Comprehensive Data Sublanguage Rule, stipulates that a relational database management system (RDBMS) must include at least one language that comprehensively supports all essential database operations through a well-defined syntax expressible as character strings.[2] This language must handle data definition, view definition, data manipulation (both interactively and programmatically), integrity constraints, authorizations, and transaction boundaries (such as begin, commit, and rollback).[2] Formulated by Edgar F. Codd in 1985, the rule ensures that the system avoids reliance on disparate, non-relational tools by mandating a unified, relational-based sublanguage, often exemplified by SQL, which integrates these functions seamlessly.[17]

The rule emphasizes a single, powerful language to promote uniformity across database tasks, supporting both interactive use (e.g., via command-line interfaces) and embedded forms within host programming languages like C or Java.[14] This relational foundation, rooted in set theory and predicate logic, distinguishes it from procedural or navigational languages used in earlier models like CODASYL, ensuring operations align with the relational paradigm's declarative nature.[2]

For instance, SQL fulfills this by providing Data Definition Language (DDL) commands like `CREATE TABLE` for defining structures, Data Manipulation Language (DML) operations such as `SELECT`, `INSERT`, `UPDATE`, and `DELETE` for data handling, and Data Control Language (DCL) statements like `GRANT` for authorizations, all within the same syntax.[2]
By requiring such comprehensiveness, Rule 5 implies enhanced developer productivity and system maintainability, as users can perform all database interactions without switching between multiple specialized languages or graphical tools that might bypass relational principles.[17] This uniformity reduces complexity in application development and enforces consistent enforcement of rules like null value treatment from Rule 3 or catalog access from Rule 4, fostering robust, scalable RDBMS implementations.[14]
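A brief sketch of how a single sublanguage can span these categories; the table, role name, and transaction syntax (`BEGIN`/`COMMIT`, which varies by dialect) are assumptions for illustration, not drawn from Codd's articles:

```sql
-- Data definition (DDL):
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    title      VARCHAR(80) NOT NULL
);

-- Authorization (DCL):
GRANT SELECT, INSERT ON projects TO analyst;

-- Transaction boundaries around data manipulation (DML):
BEGIN;
INSERT INTO projects VALUES (1, 'Catalog migration');
UPDATE projects SET title = 'Catalog migration, phase 2' WHERE project_id = 1;
COMMIT;
```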
Rule 6: View Updating Rule
Rule 6, known as the View Updating Rule, requires that all views which are theoretically updatable—meaning those whose definitions permit unambiguous translation of modifications back to the underlying base relations—must support insert, update, and delete operations through the system. This stipulation ensures that the relational database management system (DBMS) treats such views equivalently to base tables for data manipulation purposes, without imposing artificial restrictions beyond theoretical limitations.[15]

In the relational model, views function as virtual tables derived from base relations via operators like selection, projection, and equi-join, preserving the structure of relations while providing abstracted perspectives on the data. Updatability is assessed at view-definition time using algorithms such as VU-1 or stronger variants, which analyze the view's expression, base table declarations, and integrity constraints to determine properties like tuple-insertibility, tuple-deletability, and component-updatability. Simple views, such as those based on a single base table with selection (e.g., restricting to rows meeting a condition like age greater than 30) or projection that retains the primary key, are always theoretically updatable, as each view row maps uniquely to a base row, allowing insertions, updates, or deletions to propagate directly.[15] More complex views, however, such as those involving many-to-many joins or projections omitting primary keys, may fail updatability tests due to ambiguities like "quads" (multiple contributing base rows), in which case the system flags restrictions via catalog indicators (e.g., not tuple-insertible).[15]

This rule reinforces the relational model's emphasis on abstraction and logical data independence by enabling users to modify data through customized views without direct access to base tables, thereby simplifying application development and maintenance. For example, inserting a row into a view of employees over 30 must add a qualifying record to the base employee table, with the system handling the propagation seamlessly if the view meets updatability criteria, as sketched below. By integrating with the comprehensive data sublanguage outlined in Rule 5, Rule 6 ensures that relational operators support full manipulative capabilities across both base relations and views.[15]
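The following sketch shows a theoretically updatable view of this kind; the table, view, and row values are hypothetical, and whether a given SQL product accepts the insert depends on how fully it implements view updating:

```sql
-- Base table and a selection view that retains the primary key;
-- such a view is theoretically updatable:
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name   VARCHAR(60) NOT NULL,
    age    INTEGER NOT NULL
);

CREATE VIEW senior_staff AS
    SELECT emp_id, name, age FROM employees WHERE age > 30;

-- A compliant system translates this into an insert on the base
-- table, since the new row satisfies the view's predicate:
INSERT INTO senior_staff VALUES (104, 'Dana Patel', 42);

-- Updates and deletes through the view propagate the same way:
DELETE FROM senior_staff WHERE emp_id = 104;
```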
Rule 7: High-Level Insert, Update, and Delete

Rule 7, known as the High-level Insert, Update, and Delete rule, stipulates that a relational database management system must support insert, update, and delete operations using a multiple-record-at-a-time approach, treating entire relations or derived relations as single operands rather than processing tuples individually.[4] This requirement ensures that data manipulation can be specified declaratively at a high level, without reliance on low-level navigation or record-by-record locking, which aligns with the relational model's emphasis on set-oriented processing.[4]

The key concept here is set-at-a-time processing, exemplified by SQL's relational algebra-inspired operations that allow modifications to multiple rows in a single statement, promoting efficiency by leveraging query optimizers to minimize CPU and I/O overhead.[4] For instance, the SQL statement `UPDATE employees SET salary = salary * 1.1 WHERE department = 'Sales';` atomically adjusts salaries across all qualifying rows in the employees relation, without procedural loops or explicit tuple traversal. This approach not only simplifies user queries but also enhances performance in distributed environments by reducing intersite communication costs.[4]
By mandating such high-level capabilities, Rule 7 reinforces the declarative paradigm of the relational model, where users specify what data to modify rather than how, building on guaranteed access to individual tuples from Rule 2 while avoiding subversion through procedural code.[4] This leads to more robust, scalable systems that handle bulk operations efficiently, a principle central to modern relational database implementations.[4]
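Insertion and deletion are equally set-oriented; in this sketch, the `employees_archive` table and `termination_date` column are hypothetical names introduced only for illustration:

```sql
-- Set-level insert: every qualifying row is copied in one statement:
INSERT INTO employees_archive
SELECT * FROM employees WHERE termination_date IS NOT NULL;

-- Set-level delete: the same derived set is removed as a single operand:
DELETE FROM employees WHERE termination_date IS NOT NULL;
```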
Rule 8: Physical Data Independence
Rule 8, known as Physical Data Independence, states that application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.[3] This principle, articulated by E.F. Codd in his 1985 framework for evaluating relational database management systems (RDBMS), ensures that the physical aspects of data storage—such as file structures, hardware devices, and indexing techniques—are isolated from the logical structure of the data, which is represented as relations per the foundational information rule. By maintaining this separation, the rule allows database administrators to optimize performance through physical modifications without necessitating alterations to the application code or user interfaces that interact with the logical views.[18]

At its core, physical data independence distinguishes the physical layer, which handles how data is stored and accessed on underlying hardware (e.g., disk files or memory allocations), from the logical layer, where data is organized into relations accessible via declarative queries. For instance, a database system might switch from B-tree indexing to hash indexing for faster lookups on a particular attribute, or migrate data to a different storage device, without impacting the relational schema or the SQL statements used by applications.[19] This decoupling is achieved through the DBMS's mapping mechanisms, which translate logical requests into physical operations transparently. Codd emphasized that true relational systems must fully support this isolation to prevent non-relational systems' common pitfalls, where physical details leak into application logic.[3]

A practical example illustrates this rule: consider a relational table storing employee records; if compression is added to the physical storage format to reduce disk usage, applications issuing SELECT queries on the table—such as retrieving employee details via joins—continue to function unchanged, as the DBMS handles the decompression internally.[20] This capability not only facilitates ongoing performance tuning but also enhances system maintainability, as physical upgrades (e.g., adopting solid-state drives) can occur without the costly and error-prone task of rewriting application programs.[21] Overall, Rule 8 promotes a robust architecture where logical consistency and application reliability are preserved amid evolving physical infrastructures.
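A small sketch of the idea, assuming an `employees` table and an illustrative index name; `CREATE INDEX` is a widely supported but non-standard DDL form:

```sql
-- The application's query addresses only the logical schema:
SELECT name, salary FROM employees WHERE department = 'Sales';

-- A purely physical change: add an index to speed up that predicate.
CREATE INDEX idx_employees_department ON employees (department);

-- The SELECT above runs unchanged afterward; the optimizer picks the
-- new access path without any edit to application code.
```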
Rule 9: Logical Data Independence

Rule 9, known as the Logical Data Independence rule, requires that application programs and terminal activities remain logically unimpaired whenever information-preserving changes of any kind are made to the base tables, provided those changes theoretically allow for such unimpairment.[2] This rule emphasizes the insulation of the logical layer of the database from modifications in the conceptual schema, ensuring that the overall information content and meaning are preserved without necessitating alterations to user views or dependent applications.[22]

At its core, logical data independence provides a buffer between the external schema (how users perceive the data) and the conceptual schema (the logical structure of base tables), allowing database administrators to evolve the underlying logical design—such as renaming tables, adding or removing columns, or splitting tables—while maintaining transparent access for applications through mechanisms like views.[5] For instance, views can compensate for structural changes by joining or projecting data in a way that mimics the original schema, thereby hiding the modifications from end-users and software.[22] This concept builds on the separation of concerns in the three-schema architecture, where changes at the logical level do not propagate to the external level as long as the semantics remain intact.

A practical example involves splitting a single "orders" base table, which originally contained columns for order ID, customer ID, product ID, quantity, and price, into two separate tables: "orders" (with order ID and customer ID) and "order_details" (with order ID, product ID, quantity, and price), linked by a foreign key.[22] To preserve independence, a view can be created that joins these tables and presents the combined structure identical to the original "orders" table; applications querying the view continue to function without modification, as the view handles the underlying split transparently, as sketched below.[22]

The primary implication of Rule 9 is that it enables schema evolution in production environments without incurring downtime, recoding of applications, or disruptions to ongoing operations, fostering long-term maintainability and adaptability in relational database systems.[5] This contrasts with physical data independence (Rule 8), which addresses storage-level changes, by focusing solely on logical restructurings that affect table definitions rather than file organization or access paths.[2]
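A sketch of the split and its compensating view, following the column names in the prose; the view is called `orders_legacy` here only because the base table keeps the name `orders` within this snippet (in practice the base tables would be renamed so the view could take the original name):

```sql
-- The two tables produced by the split, linked by a foreign key:
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
);

CREATE TABLE order_details (
    order_id   INTEGER NOT NULL REFERENCES orders (order_id),
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    price      DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

-- A compensating view re-presents the original five-column shape, so
-- existing queries keep working against it:
CREATE VIEW orders_legacy AS
SELECT o.order_id, o.customer_id, d.product_id, d.quantity, d.price
FROM orders AS o
JOIN order_details AS d ON d.order_id = o.order_id;
```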
Rule 10: Integrity Independence

Rule 10, known as the Integrity Independence rule, requires that all integrity constraints specific to a relational database be definable in the relational data sublanguage and storable in the catalog, rather than embedded within application programs.[3] This ensures that the database's integrity mechanisms operate independently of the software applications that access it, allowing constraints to be modified without necessitating changes to external code.[2]

The primary integrity constraints addressed by this rule are entity integrity and referential integrity, both foundational to the relational model. Entity integrity mandates that every component of a primary key must be non-null and unique within its relation, preventing ambiguous or incomplete identification of tuples.[23] Referential integrity requires that the value of any foreign key in one relation either matches the primary key value of some tuple in the referenced relation or is null, thereby maintaining consistent relationships across the database without orphaned data.[23] These constraints must be specified using the relational data sublanguage—such as declarative Data Definition Language (DDL) statements—and automatically enforced by the database management system (DBMS) during operations like inserts, updates, and deletes, with storage in the online catalog as outlined in Rule 4.[3]

For instance, in a relational schema for an employee database, entity integrity can be enforced by declaring `emp_id` as the primary key in the employees relation, ensuring no null values or duplicates. Referential integrity might then be defined with a statement like `FOREIGN KEY (dept_id) REFERENCES departments(dept_id)`, which the DBMS checks automatically to validate that any dept_id in employees corresponds to an existing primary key in departments. This DDL approach embeds the constraints directly in the schema, independent of any application logic.
By centralizing integrity constraints in the database catalog, Rule 10 facilitates easier maintenance, as modifications to business rules—such as tightening referential checks—can be applied once at the database level without recompiling or redeploying multiple applications.[3] This independence enhances portability across different application environments and reduces the risk of integrity violations due to overlooked code in disparate programs.[2]
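A minimal DDL sketch of the employee example above; the `dept_name` column is an assumed extra for realism:

```sql
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,   -- entity integrity: non-null and unique
    dept_name VARCHAR(60) NOT NULL
);

CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,     -- entity integrity for employees
    dept_id INTEGER,                 -- nullable foreign key where permitted
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)  -- referential integrity
);

-- The DBMS itself now rejects violations; no application code is involved:
-- INSERT INTO employees VALUES (1, 99);  -- fails if department 99 does not exist
```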
Rule 11: Distribution Independence
Rule 11, known as the Distribution Independence rule, requires that a relational database management system (RDBMS) support the distribution of data across multiple sites or machines while remaining fully transparent to end-users and applications. According to E.F. Codd, "A relational DBMS has distribution independence. By this we mean that application programs and on-line terminal activities should continue to operate successfully, unchanged, when data previously stored at one site is relocated to another site or is replicated at several sites."[2] This rule extends the principles of data independence outlined in Rules 8 and 9 by ensuring that the logical and physical aspects of distribution do not impact user interactions.[14]

At its core, the rule facilitates techniques such as horizontal partitioning (dividing tables by rows across sites) and vertical partitioning (dividing tables by columns), allowing the system to manage large datasets efficiently without altering the database schema or query interfaces. Queries and updates formulated in the relational language, such as SQL, remain valid and perform as expected regardless of how the data is distributed, whether centrally or across a network.[24] This transparency is achieved through the DBMS's query optimizer and distribution mechanisms, which handle location resolution internally.[25]

For instance, consider a query like `SELECT * FROM customers WHERE country = 'USA';`. In a compliant system, this executes identically whether the customers table resides on a single server or is sharded horizontally across multiple distributed servers, with the DBMS routing and aggregating results seamlessly.[26]
The primary implication of Rule 11 is enhanced scalability and flexibility for enterprise environments, enabling the construction of large-scale, federated, or cloud-based databases without necessitating modifications to existing applications or user queries. This supports growth in data volume and geographic distribution, as seen in modern distributed RDBMS implementations, while preserving the single logical database illusion.[2]
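One way to visualize the transparency the rule demands is a union view over hypothetical per-site fragments; this is only an illustration of the single-logical-database illusion, not how a production distributed RDBMS is configured, and every name here is assumed:

```sql
-- Hypothetical horizontal fragments held at two sites:
CREATE TABLE customers_us (customer_id INTEGER PRIMARY KEY, name VARCHAR(60), country CHAR(3));
CREATE TABLE customers_eu (customer_id INTEGER PRIMARY KEY, name VARCHAR(60), country CHAR(3));

-- One logical relation reunites the fragments:
CREATE VIEW customers AS
SELECT * FROM customers_us
UNION ALL
SELECT * FROM customers_eu;

-- The query from the prose runs unchanged, wherever the rows live:
SELECT * FROM customers WHERE country = 'USA';
```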