Codd's 12 rules
Codd's twelve rules are a set of thirteen criteria (numbered from zero to twelve) proposed by Edgar F. Codd, the pioneer of the relational database model, in 1985 to establish the standards for a database management system to be considered fully relational.[1] These rules define the core principles of the relational model, emphasizing logical data independence, data integrity, and user-friendly access mechanisms to ensure that systems adhere strictly to relational theory rather than merely incorporating relational features.[2] Developed amid growing commercial interest in relational databases during the 1980s, the rules were first detailed in two Computerworld articles titled "Is Your DBMS Really Relational?" (October 14, 1985) and "Does Your DBMS Run By the Rules?" (October 21, 1985), addressing vendors' tendency to label non-relational systems as relational for marketing advantage.[2]

The foundation rule (Rule 0) requires that all data management functions be performable using only relational capabilities, setting the baseline for compliance.[2] Subsequent rules cover critical aspects such as representing all information in tables (Rule 1), guaranteeing access to data via logical identifiers (Rule 2), handling null values systematically (Rule 3), maintaining active online catalogs as base tables (Rule 4), supporting comprehensive sublanguages for data definition, manipulation, and control (Rule 5), enabling view updating (Rule 6), providing relational operations for insert, update, and delete (Rule 7), ensuring physical data independence (Rule 8) and logical data independence (Rule 9), enforcing integrity constraints through the relational language (Rule 10), supporting distributed databases (Rule 11), and preventing low-level languages from bypassing relational safeguards (Rule 12).[2]

While no commercial database management system has fully satisfied all twelve rules, they remain a foundational benchmark for evaluating relational fidelity and have profoundly influenced the design and standardization of modern RDBMS, including SQL-based systems.[1]

Introduction
Definition and Purpose
Codd's 12 rules, formally a set of 13 criteria (numbered from Rule 0 to Rule 12), were proposed by Edgar F. Codd in 1985 as a formal evaluation scheme to assess whether a database management system (DBMS) qualifies as truly relational.[3] These rules serve as benchmarks for fidelity to the relational model, which Codd himself introduced in 1970 as a framework for organizing data into relations to simplify data access and management. Often referred to as the "Twelve Commandments" despite the inclusion of the foundational Rule 0, the criteria emphasize that a relational DBMS must manage data exclusively through relational mechanisms without relying on non-relational or procedural extensions.[4]

The primary purpose of these rules is to protect the integrity of the relational model against dilution by vendors, who in the 1980s frequently marketed hybrid or "born-again" systems as fully relational while incorporating navigational or hierarchical elements that undermined relational principles.[3] Codd developed the rules out of frustration with such misleading claims, aiming to provide database users and purchasers with a rigorous standard to evaluate vendor products and ensure long-term investments in applications, training, and data administration remain viable.[4] By enforcing strict adherence, the rules promote standardized data independence, integrity constraints, and manipulative capabilities, countering the "performance myth" that non-relational features were necessary for efficiency.[3]

At their core, the rules define a relational DBMS as a system that organizes and manages data using relations—typically represented as tables consisting of rows (tuples) and columns (attributes)—and supports declarative query languages for operations like retrieval, insertion, update, and deletion at the relational level, often handling multiple records simultaneously.[4] This approach ensures data sublanguage commands are comprehensive and uniformly applicable across the database, distinguishing true relational systems from those requiring low-level navigational access or record-at-a-time processing.[3]

Relation to the Relational Model
The relational model, introduced by Edgar F. Codd in 1970, organizes data into relations—mathematical structures akin to tables—comprising rows (tuples) and columns (attributes) defined over specific domains, with the entire framework rooted in set theory and first-order predicate logic to enable precise querying and manipulation. This abstraction allows data to be represented declaratively without regard to physical storage details, distinguishing it from earlier models like hierarchical or network databases that imposed rigid navigational structures. Codd's 12 rules, formalized in 1985 and elaborated in his 1990 work, extend this foundational model by translating its theoretical principles into practical, verifiable requirements for database management systems (DBMS).[5] Specifically, the rules enforce declarative access to data through non-procedural languages, logical and physical data independence to insulate applications from storage changes, and systematic handling of null values to represent missing or inapplicable information accurately, thereby preventing implementations from deviating toward non-relational paradigms such as pointer-based hierarchies or CODASYL networks.[5] These extensions ensure that systems adhere strictly to the model's emphasis on logical data structures, where relations maintain integrity through keys and domains, rather than relying on implementation-specific optimizations.[5]

A core concept here is the prioritization of logical over physical representation: users interact with data via relational algebra operations like selection, projection, and join, oblivious to how tuples are stored or indexed on disk. The rules serve as a rigorous checklist to assess a DBMS's fidelity to the model, confirming that all data manipulation occurs within the relational framework without procedural code or external dependencies.[5] The rules presuppose familiarity with basic relational elements, such as primary keys for uniqueness and query languages for retrieval. For instance, consider an employee relation with attributes EmployeeID (a unique integer key), Name (a string domain), and Department (a categorical domain):

| EmployeeID | Name | Department |
|---|---|---|
| 101 | Alice Smith | Engineering |
| 102 | Bob Johnson | Sales |
| 103 | Carol Lee | Engineering |
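A minimal SQL sketch makes the relation concrete; the `CREATE TABLE` statement and the final query are illustrative additions (the source gives only the table above), with `employee` as an assumed relation name:

```sql
-- The relation from the table above, declared and queried declaratively:
CREATE TABLE employee (
    EmployeeID INTEGER PRIMARY KEY,   -- unique integer key
    Name       VARCHAR(60) NOT NULL,  -- string domain
    Department VARCHAR(40) NOT NULL   -- categorical domain
);

INSERT INTO employee VALUES (101, 'Alice Smith', 'Engineering');
INSERT INTO employee VALUES (102, 'Bob Johnson', 'Sales');
INSERT INTO employee VALUES (103, 'Carol Lee', 'Engineering');

-- Selection and projection, with no reference to storage or indexes:
SELECT Name FROM employee WHERE Department = 'Engineering';
```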
Historical Development
Edgar F. Codd's Contributions
Edgar Frank Codd (1923–2003) was a British-born computer scientist and mathematician whose work profoundly influenced modern database systems. Born on August 19, 1923, in the Isle of Portland, England, Codd earned an honors degree in mathematics from Exeter College, Oxford University, in 1948 after serving as a pilot in the Royal Air Force during World War II. He later obtained a Ph.D. in computer and communication sciences from the University of Michigan in 1965. Codd joined IBM in 1949 as a mathematical programmer in New York City, initially focusing on early computing systems such as the Selective Sequence Electronic Calculator and contributing to the design of the IBM 701 computer in the early 1950s. By 1957, he had moved to Poughkeepsie, New York, where he helped develop the IBM 7030 STRETCH, the company's first transistorized supercomputer, advancing concepts in multiprogramming. In 1968, Codd relocated to IBM's San Jose Research Laboratory in California, marking his transition toward database research amid growing needs for managing large-scale data in business environments.[6][7]

Codd's pivotal contributions began in the late 1960s when he sought alternatives to prevailing hierarchical and network database models, such as IBM's Information Management System (IMS) and the Conference on Data Systems Languages (CODASYL) approach, which required programmers to navigate complex pointer-based structures. In a landmark 1970 paper, "A Relational Model of Data for Large Shared Data Banks," published in Communications of the ACM, he introduced the relational model, proposing data organization into tables (relations) with rows (tuples) and columns (attributes), linked via keys to ensure logical independence and simplify querying without knowledge of physical storage. This model emphasized declarative access, allowing users to specify what data they needed rather than how to retrieve it, fundamentally shifting database design from navigational to set-based operations. Building on this foundation, Codd extended the relational model throughout the 1970s, refining concepts like normalization to minimize redundancy and dependency theory to maintain data integrity.[6]

By the late 1970s and early 1980s, Codd grew concerned with database vendors, including IBM itself, incorporating non-relational features into products like SQL/DS—such as low-level navigational interfaces and deviations from strict relational principles—that diluted the model's purity and complicated user access. Motivated to establish clear criteria for true relational database management systems (RDBMS), Codd advocated for rigorous standards to enforce data independence, integrity, and usability, preventing vendor implementations from undermining the relational paradigm's benefits. His efforts culminated in the 1981 A.M. Turing Award from the Association for Computing Machinery, recognizing "his fundamental and continuing contributions to the theory and practice of database management systems." Codd retired from IBM in 1984 but continued independent consulting and research.[6][2]

In the 1990s, Codd further advanced his ideas through the Relational Model Version 2 (RM/V2), outlined in his 1990 book The Relational Model for Database Management: Version 2, which expanded the original framework into over 300 detailed rules and features to address evolving requirements like temporal data and enhanced integrity constraints.
This work reinforced his commitment to evolving the relational model while preserving its mathematical rigor based on first-order predicate logic and set theory.[5]

Publication and Evolution
E. F. Codd first formally proposed his 12 rules for evaluating relational database management systems (RDBMS) in a two-part article series published in Computerworld magazine on October 14 and October 21, 1985. Titled "Is Your DBMS Really Relational?" and "Does Your DBMS Run by the Rules?", these articles outlined the rules as a rigorous test to distinguish truly relational systems from those merely claiming the label, including Rule 0 as the foundational principle that the system must use relational facilities exclusively to manage the database.[2][8]

The publication came amid growing commercial interest in relational databases during the 1980s, as vendors such as Oracle and IBM with its DB2 product aggressively marketed their offerings as relational, often without full adherence to Codd's emerging criteria. This led to widespread use of the term "relational" in industry promotions, sparking debates on authenticity and prompting Codd's rules as a benchmark for compliance. The rules also exerted influence on the development of ANSI SQL standards, providing conceptual guidance for features like data sublanguages and catalog management in subsequent revisions beyond the initial 1986 standard.[9][10]

In the 1990s, Codd refined his ideas through the RM/V2 framework, detailed in his 1990 book The Relational Model for Database Management: Version 2, which updated several rules—such as expanding view updatability requirements—and integrated them into a broader vision of relational integrity and temporal support. These evolutions addressed limitations in early implementations and aimed to guide future standards, though assessments of commercial compliance varied.[5]

The Rules
Rule 0: The Foundation Rule
Rule 0, known as the Foundation Rule, establishes the fundamental prerequisite for a database management system (DBMS) to be considered truly relational. It requires that any system advertised or claimed to be a relational DBMS must manage the database exclusively using its relational facilities, without relying on any non-relational mechanisms. This ensures that the entire scope of database operations—from definition and manipulation to integrity enforcement—is handled through relational principles alone.[2]

The rule explicitly prohibits the incorporation of non-relational extensions, such as navigational pointers, hierarchical structures, or procedural coding elements, which were common in pre-relational systems like CODASYL or IMS. Instead, it mandates that all data manipulation be declarative and relation-based, leveraging mathematical relations (tables with rows and columns) to represent and query data. This foundational constraint guarantees that the system's architecture adheres strictly to the relational model, preventing hybrid approaches that dilute its purity and benefits, such as data independence and simplicity.[2][5]

A practical illustration of this rule is that a compliant DBMS cannot expose or depend on low-level, record-oriented APIs for data access; rather, it must provide interfaces limited to relational constructs like tables, primary and foreign keys, and a comprehensive query language (e.g., SQL equivalents) for all operations, including inserts, updates, and deletes.[2] Codd introduced Rule 0 in his 1985 series of articles in Computerworld as the zeroth rule—preceding the other twelve—to emphasize its indispensable role, directly addressing misleading vendor claims in the 1980s where products were labeled "relational" despite heavy reliance on non-relational procedural add-ons.[9][5]

Rule 1: The Information Rule
Rule 1, known as the Information Rule, stipulates that all information in a relational database, including both user data and metadata such as database structure definitions, must be represented explicitly at the logical level and in exactly one way—by values in tables. This rule ensures that the database's logical structure is self-contained within the relational framework, without reliance on external files or non-relational mechanisms for storing schema information. By confining all data representation to tables, the rule promotes a uniform approach to information management, abstracting away physical storage details and emphasizing the logical view presented to users.[2]

At the logical level, this representation hides the underlying physical storage mechanisms, such as file formats or indexing structures, allowing users to interact solely with tabular data through relational operations. Metadata, including details like table names, column definitions, and data types, is stored as rows and columns in dedicated system tables, eliminating the need for separate schema files or proprietary formats outside the relational model. This approach aligns with the foundational relational facilities outlined in Rule 0, ensuring the entire system operates on a consistent data model.[11]

For instance, the definition of a table—such as its columns and their types—would be entered as values in a system catalog table, treatable like any other relational data. This enables the database to be self-describing, where structural information is as accessible and manipulable as application data, fostering flexibility in database design and maintenance. The implication of this rule is profound: it establishes the basis for treating data and metadata uniformly, which is essential for building truly relational systems that support dynamic introspection and evolution without disrupting the logical schema.[12]

Rule 2: The Guaranteed Access Rule
The Guaranteed Access Rule, designated as Rule 2 in Edgar F. Codd's framework for relational database management systems, mandates that each and every datum (atomic value) in the database is guaranteed to be logically accessible by specifying a combination of the relation name, primary key value, and attribute name.[13] This precise addressing scheme ensures unambiguous retrieval of individual scalar values without reliance on implementation-specific details.[9] As a direct corollary to Rule 1's emphasis on data representation solely as values in relations, Rule 2 reinforces the need for relations to adhere to first normal form (1NF), where attributes are atomic, and higher normal forms like fifth normal form (5NF) to maintain dependencies tied to the primary key.[13] It explicitly prohibits access methods based on positional or ordinal references, such as identifying data by the "third field" in a sequential record, thereby mandating the use of primary keys for unique tuple identification.[2]

A practical illustration of this rule involves accessing an employee's salary in a relation named employees, where emp_id serves as the primary key. The query would be:

```sql
SELECT salary FROM employees WHERE emp_id = 123;
```

This logical specification avoids any reference to physical constructs, such as "field 5 of record 10," ensuring the access remains independent of storage layout.[2] By promoting a three-part logical addressing mechanism over physical pointers or navigational paths, Rule 2 enhances data independence and portability, allowing applications to interact with the database schema without concern for underlying hardware or storage changes.[13] This foundational principle underscores the relational model's shift toward declarative query languages for robust, scalable data management.[9]
Rule 3: Systematic Treatment of Null Values
Rule 3 requires that a relational database management system (RDBMS) support null values to represent missing or inapplicable information in a systematic and uniform manner, independent of the data type involved. Nulls must be distinctly handled and differentiated from other representations such as empty character strings, strings of blank characters, zeros, or any other numeric values that could otherwise serve as legitimate data entries. This ensures that nulls function as a dedicated marker solely for the absence of applicable data, avoiding the pitfalls of ad-hoc conventions that vary by column or domain.[14]

Central to this rule is the concept of nulls as a unique indicator for "unknown" or "not applicable" states, which necessitates the adoption of three-valued logic in query processing and data manipulation. In traditional two-valued logic (true/false), the presence of nulls introduces a third outcome: unknown. For instance, a comparison involving a null value, such as checking if a salary equals 50,000, evaluates to unknown rather than true or false, propagating through operations like AND, OR, and NOT according to extended truth tables. This systematic approach allows queries to explicitly test for nulls using operators like IS NULL or IS NOT NULL, ensuring consistent behavior across the database. Primary keys and certain foreign keys can be constrained to disallow nulls, enforcing data integrity where completeness is mandatory.[14][15]

Consider a customer table with columns for first name, last name, and middle initial. The middle initial field can legitimately contain a null value for customers without a middle name, distinguishing it from an empty string that might imply a deliberate absence of data. A query to find customers missing a middle initial, such as `SELECT * FROM customers WHERE middle_initial IS NULL`, must reliably identify these records without conflating them with zero-length strings or other placeholders, thereby maintaining query accuracy.[14]
By mandating this uniform treatment, Rule 3 mitigates ambiguities in data representation and querying that arise from incomplete real-world datasets, such as optional attributes in forms or unavailable measurements in scientific records. It promotes robust data integrity and reliable analysis, preventing errors where missing information is misinterpreted as specific values, and supports scalable handling of partial data without compromising the relational model's foundational principles.[14][15]
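A short sketch of this three-valued behavior, using a hypothetical `customers` table whose names and rows are assumed for illustration:

```sql
-- Hypothetical schema: middle_initial permits NULL as the marker for
-- missing or inapplicable data; the key column disallows it.
CREATE TABLE customers (
    customer_id    INTEGER PRIMARY KEY,
    first_name     VARCHAR(40) NOT NULL,
    middle_initial CHAR(1),
    last_name      VARCHAR(40) NOT NULL
);

INSERT INTO customers VALUES (1, 'Alice', 'M', 'Smith');
INSERT INTO customers VALUES (2, 'Bob', NULL, 'Johnson');

-- Comparisons against NULL evaluate to unknown, not true or false:
SELECT * FROM customers WHERE middle_initial = 'M';   -- row 1 only; unknown for row 2
SELECT * FROM customers WHERE middle_initial <> 'M';  -- no rows; false for row 1, unknown for row 2
SELECT * FROM customers WHERE middle_initial IS NULL; -- row 2; IS NULL tests the marker directly
```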
Rule 4: Active Online Catalog
Rule 4, known as the Active Online Catalog rule, requires that the structure description of the entire database be represented at the logical level in the same way as ordinary data, enabling authorized users to apply the same relational language for querying the catalog as they do for regular data. This ensures the catalog functions as a dynamic component of the database, maintaining consistency with the principles outlined in the Information Rule by treating metadata uniformly as relations. The rule emphasizes that the catalog must be stored online and accessible in real time, allowing immediate reflection of any structural changes without requiring separate tools or offline processes.[14]

The term "active" in this context signifies that the catalog supports real-time updates and queries, integrating it as an essential, always-available part of the database system rather than relying on static files or external documentation. This relational representation of the catalog—often referred to as a data dictionary—facilitates seamless interrogation by users, who need only master a single data model and language, unlike in non-relational systems where metadata access demands distinct mechanisms. By embedding the catalog within the relational framework, the rule promotes uniformity and simplifies database administration, as structural modifications propagate instantly to authorized queries.[14]

A practical example of this rule in action is the use of system views like INFORMATION_SCHEMA.COLUMNS, which stores details about column names, data types, and other attributes for all tables in a database and can be queried using standard relational operations. Authorized users can execute queries such as `SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'example_table'` to retrieve and analyze table structures dynamically, with the catalog updating in real time to reflect schema alterations if permissions allow modifications through the same language. This approach exemplifies how the catalog remains modifiable where appropriate, ensuring it evolves alongside the database content.[16]

The implications of Rule 4 extend to enabling metadata-driven applications, where software can automatically discover and utilize database structures for tasks like report generation or schema validation without hard-coded assumptions. It also fosters self-documenting databases, as the catalog itself serves as a comprehensive, queryable reference that authorized users can extend into a full-fledged relational data dictionary if the vendor's implementation falls short. This rule underscores the relational model's emphasis on transparency and accessibility, reducing complexity for developers and administrators while enhancing overall system integrity.[14]

Rule 5: Comprehensive Data Sublanguage Rule
Rule 5, known as the Comprehensive Data Sublanguage Rule, stipulates that a relational database management system (RDBMS) must include at least one language that comprehensively supports all essential database operations through a well-defined syntax expressible as character strings.[2] This language must handle data definition, view definition, data manipulation (both interactively and programmatically), integrity constraints, authorizations, and transaction boundaries (such as begin, commit, and rollback).[2] Formulated by Edgar F. Codd in 1985, the rule ensures that the system avoids reliance on disparate, non-relational tools by mandating a unified, relational-based sublanguage, often exemplified by SQL, which integrates these functions seamlessly.[17]

The rule emphasizes a single, powerful language to promote uniformity across database tasks, supporting both interactive use (e.g., via command-line interfaces) and embedded forms within host programming languages like C or Java.[14] This relational foundation, rooted in set theory and predicate logic, distinguishes it from procedural or navigational languages used in earlier models like CODASYL, ensuring operations align with the relational paradigm's declarative nature.[2]

For instance, SQL fulfills this by providing Data Definition Language (DDL) commands like `CREATE TABLE` for defining structures, Data Manipulation Language (DML) operations such as `SELECT`, `INSERT`, `UPDATE`, and `DELETE` for data handling, and Data Control Language (DCL) statements like `GRANT` for authorizations, all within the same syntax.[2]
By requiring such comprehensiveness, Rule 5 implies enhanced developer productivity and system maintainability, as users can perform all database interactions without switching between multiple specialized languages or graphical tools that might bypass relational principles.[17] This uniformity reduces complexity in application development and enforces consistent enforcement of rules like null value treatment from Rule 3 or catalog access from Rule 4, fostering robust, scalable RDBMS implementations.[14]
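A brief sketch of how a single sublanguage can span these categories; the table, role name, and transaction syntax (`BEGIN`/`COMMIT`, which varies by dialect) are assumptions for illustration, not drawn from Codd's articles:

```sql
-- Data definition (DDL):
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    title      VARCHAR(80) NOT NULL
);

-- Authorization (DCL):
GRANT SELECT, INSERT ON projects TO analyst;

-- Transaction boundaries around data manipulation (DML):
BEGIN;
INSERT INTO projects VALUES (1, 'Catalog migration');
UPDATE projects SET title = 'Catalog migration, phase 2' WHERE project_id = 1;
COMMIT;
```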
Rule 6: View Updating Rule
Rule 6, known as the View Updating Rule, requires that all views which are theoretically updatable—meaning those whose definitions permit unambiguous translation of modifications back to the underlying base relations—must support insert, update, and delete operations through the system. This stipulation ensures that the relational database management system (DBMS) treats such views equivalently to base tables for data manipulation purposes, without imposing artificial restrictions beyond theoretical limitations.[15]

In the relational model, views function as virtual tables derived from base relations via operators like selection, projection, and equi-join, preserving the structure of relations while providing abstracted perspectives on the data. Updatability is assessed at view-definition time using algorithms such as VU-1 or stronger variants, which analyze the view's expression, base table declarations, and integrity constraints to determine properties like tuple-insertibility, tuple-deletability, and component-updatability. Simple views, such as those based on a single base table with selection (e.g., restricting to rows meeting a condition like age greater than 30) or projection that retains the primary key, are always theoretically updatable, as each view row maps uniquely to a base row, allowing insertions, updates, or deletions to propagate directly.[15] More complex views, however, such as those involving many-to-many joins or projections omitting primary keys, may fail updatability tests due to ambiguities like "quads" (multiple contributing base rows), in which case the system flags restrictions via catalog indicators (e.g., not tuple-insertible).[15]

This rule reinforces the relational model's emphasis on abstraction and logical data independence by enabling users to modify data through customized views without direct access to base tables, thereby simplifying application development and maintenance. For example, inserting a row into a view of employees over 30 must add a qualifying record to the base employee table, with the system handling the propagation seamlessly if the view meets updatability criteria, as sketched below. By integrating with the comprehensive data sublanguage outlined in Rule 5, Rule 6 ensures that relational operators support full manipulative capabilities across both base relations and views.[15]
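The following sketch shows a theoretically updatable view of this kind; the table, view, and row values are hypothetical, and whether a given SQL product accepts the insert depends on how fully it implements view updating:

```sql
-- Base table and a selection view that retains the primary key;
-- such a view is theoretically updatable:
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name   VARCHAR(60) NOT NULL,
    age    INTEGER NOT NULL
);

CREATE VIEW senior_staff AS
    SELECT emp_id, name, age FROM employees WHERE age > 30;

-- A compliant system translates this into an insert on the base
-- table, since the new row satisfies the view's predicate:
INSERT INTO senior_staff VALUES (104, 'Dana Patel', 42);

-- Updates and deletes through the view propagate the same way:
DELETE FROM senior_staff WHERE emp_id = 104;
```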
Rule 7: High-Level Insert, Update, and Delete

Rule 7, known as the High-level Insert, Update, and Delete rule, stipulates that a relational database management system must support insert, update, and delete operations using a multiple-record-at-a-time approach, treating entire relations or derived relations as single operands rather than processing tuples individually.[4] This requirement ensures that data manipulation can be specified declaratively at a high level, without reliance on low-level navigation or record-by-record locking, which aligns with the relational model's emphasis on set-oriented processing.[4]

The key concept here is set-at-a-time processing, exemplified by SQL's relational algebra-inspired operations that allow modifications to multiple rows in a single statement, promoting efficiency by leveraging query optimizers to minimize CPU and I/O overhead.[4] For instance, the SQL statement `UPDATE employees SET salary = salary * 1.1 WHERE department = 'Sales';` atomically adjusts salaries across all qualifying rows in the employees relation, without procedural loops or explicit tuple traversal. This approach not only simplifies user queries but also enhances performance in distributed environments by reducing intersite communication costs.[4]
By mandating such high-level capabilities, Rule 7 reinforces the declarative paradigm of the relational model, where users specify what data to modify rather than how, building on guaranteed access to individual tuples from Rule 2 while avoiding subversion through procedural code.[4] This leads to more robust, scalable systems that handle bulk operations efficiently, a principle central to modern relational database implementations.[4]
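Insertion and deletion are equally set-oriented; in this sketch, the `employees_archive` table and `termination_date` column are hypothetical names introduced only for illustration:

```sql
-- Set-level insert: every qualifying row is copied in one statement:
INSERT INTO employees_archive
SELECT * FROM employees WHERE termination_date IS NOT NULL;

-- Set-level delete: the same derived set is removed as a single operand:
DELETE FROM employees WHERE termination_date IS NOT NULL;
```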
Rule 8: Physical Data Independence
Rule 8, known as Physical Data Independence, states that application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.[3] This principle, articulated by E.F. Codd in his 1985 framework for evaluating relational database management systems (RDBMS), ensures that the physical aspects of data storage—such as file structures, hardware devices, and indexing techniques—are isolated from the logical structure of the data, which is represented as relations per the foundational information rule. By maintaining this separation, the rule allows database administrators to optimize performance through physical modifications without necessitating alterations to the application code or user interfaces that interact with the logical views.[18]

At its core, physical data independence distinguishes the physical layer, which handles how data is stored and accessed on underlying hardware (e.g., disk files or memory allocations), from the logical layer, where data is organized into relations accessible via declarative queries. For instance, a database system might switch from B-tree indexing to hash indexing for faster lookups on a particular attribute, or migrate data to a different storage device, without impacting the relational schema or the SQL statements used by applications.[19] This decoupling is achieved through the DBMS's mapping mechanisms, which translate logical requests into physical operations transparently. Codd emphasized that true relational systems must fully support this isolation to prevent non-relational systems' common pitfalls, where physical details leak into application logic.[3]

A practical example illustrates this rule: consider a relational table storing employee records; if compression is added to the physical storage format to reduce disk usage, applications issuing SELECT queries on the table—such as retrieving employee details via joins—continue to function unchanged, as the DBMS handles the decompression internally.[20] This capability not only facilitates ongoing performance tuning but also enhances system maintainability, as physical upgrades (e.g., adopting solid-state drives) can occur without the costly and error-prone task of rewriting application programs.[21] Overall, Rule 8 promotes a robust architecture where logical consistency and application reliability are preserved amid evolving physical infrastructures.
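A small sketch of the idea, assuming an `employees` table and an illustrative index name; `CREATE INDEX` is a widely supported but non-standard DDL form:

```sql
-- The application's query addresses only the logical schema:
SELECT name, salary FROM employees WHERE department = 'Sales';

-- A purely physical change: add an index to speed up that predicate.
CREATE INDEX idx_employees_department ON employees (department);

-- The SELECT above runs unchanged afterward; the optimizer picks the
-- new access path without any edit to application code.
```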
Rule 9: Logical Data Independence

Rule 9, known as the Logical Data Independence rule, requires that application programs and terminal activities remain logically unimpaired whenever information-preserving changes of any kind are made to the base tables, provided those changes theoretically allow for such unimpairment.[2] This rule emphasizes the insulation of the logical layer of the database from modifications in the conceptual schema, ensuring that the overall information content and meaning are preserved without necessitating alterations to user views or dependent applications.[22]

At its core, logical data independence provides a buffer between the external schema (how users perceive the data) and the conceptual schema (the logical structure of base tables), allowing database administrators to evolve the underlying logical design—such as renaming tables, adding or removing columns, or splitting tables—while maintaining transparent access for applications through mechanisms like views.[5] For instance, views can compensate for structural changes by joining or projecting data in a way that mimics the original schema, thereby hiding the modifications from end-users and software.[22] This concept builds on the separation of concerns in the three-schema architecture, where changes at the logical level do not propagate to the external level as long as the semantics remain intact.

A practical example involves splitting a single "orders" base table, which originally contained columns for order ID, customer ID, product ID, quantity, and price, into two separate tables: "orders" (with order ID and customer ID) and "order_details" (with order ID, product ID, quantity, and price), linked by a foreign key.[22] To preserve independence, a view can be created that joins these tables and presents the combined structure identical to the original "orders" table; applications querying the view continue to function without modification, as the view handles the underlying split transparently, as sketched below.[22]

The primary implication of Rule 9 is that it enables schema evolution in production environments without incurring downtime, recoding of applications, or disruptions to ongoing operations, fostering long-term maintainability and adaptability in relational database systems.[5] This contrasts with physical data independence (Rule 8), which addresses storage-level changes, by focusing solely on logical restructurings that affect table definitions rather than file organization or access paths.[2]
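A sketch of the split and its compensating view, following the column names in the prose; the view is called `orders_legacy` here only because the base table keeps the name `orders` within this snippet (in practice the base tables would be renamed so the view could take the original name):

```sql
-- The two tables produced by the split, linked by a foreign key:
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
);

CREATE TABLE order_details (
    order_id   INTEGER NOT NULL REFERENCES orders (order_id),
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    price      DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

-- A compensating view re-presents the original five-column shape, so
-- existing queries keep working against it:
CREATE VIEW orders_legacy AS
SELECT o.order_id, o.customer_id, d.product_id, d.quantity, d.price
FROM orders AS o
JOIN order_details AS d ON d.order_id = o.order_id;
```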
Rule 10: Integrity Independence

Rule 10, known as the Integrity Independence rule, requires that all integrity constraints specific to a relational database be definable in the relational data sublanguage and storable in the catalog, rather than embedded within application programs.[3] This ensures that the database's integrity mechanisms operate independently of the software applications that access it, allowing constraints to be modified without necessitating changes to external code.[2]

The primary integrity constraints addressed by this rule are entity integrity and referential integrity, both foundational to the relational model. Entity integrity mandates that every component of a primary key must be non-null and unique within its relation, preventing ambiguous or incomplete identification of tuples.[23] Referential integrity requires that the value of any foreign key in one relation either matches the primary key value of some tuple in the referenced relation or is null, thereby maintaining consistent relationships across the database without orphaned data.[23] These constraints must be specified using the relational data sublanguage—such as declarative Data Definition Language (DDL) statements—and automatically enforced by the database management system (DBMS) during operations like inserts, updates, and deletes, with storage in the online catalog as outlined in Rule 4.[3]

For instance, in a relational schema for an employee database, entity integrity can be enforced by declaring `emp_id` as the primary key in the employees relation, ensuring no null values or duplicates. Referential integrity might then be defined with a statement like `FOREIGN KEY (dept_id) REFERENCES departments(dept_id)`, which the DBMS checks automatically to validate that any dept_id in employees corresponds to an existing primary key in departments. This DDL approach embeds the constraints directly in the schema, independent of any application logic.
By centralizing integrity constraints in the database catalog, Rule 10 facilitates easier maintenance, as modifications to business rules—such as tightening referential checks—can be applied once at the database level without recompiling or redeploying multiple applications.[3] This independence enhances portability across different application environments and reduces the risk of integrity violations due to overlooked code in disparate programs.[2]
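A minimal DDL sketch of the employee example above; the `dept_name` column is an assumed extra for realism:

```sql
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,   -- entity integrity: non-null and unique
    dept_name VARCHAR(60) NOT NULL
);

CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,     -- entity integrity for employees
    dept_id INTEGER,                 -- nullable foreign key where permitted
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)  -- referential integrity
);

-- The DBMS itself now rejects violations; no application code is involved:
-- INSERT INTO employees VALUES (1, 99);  -- fails if department 99 does not exist
```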
Rule 11: Distribution Independence
Rule 11, known as the Distribution Independence rule, requires that a relational database management system (RDBMS) support the distribution of data across multiple sites or machines while remaining fully transparent to end-users and applications. According to E.F. Codd, "A relational DBMS has distribution independence. By this we mean that application programs and on-line terminal activities should continue to operate successfully, unchanged, when data previously stored at one site is relocated to another site or is replicated at several sites."[2] This rule extends the principles of data independence outlined in Rules 8 and 9 by ensuring that the logical and physical aspects of distribution do not impact user interactions.[14]

At its core, the rule facilitates techniques such as horizontal partitioning (dividing tables by rows across sites) and vertical partitioning (dividing tables by columns), allowing the system to manage large datasets efficiently without altering the database schema or query interfaces. Queries and updates formulated in the relational language, such as SQL, remain valid and perform as expected regardless of how the data is distributed, whether centrally or across a network.[24] This transparency is achieved through the DBMS's query optimizer and distribution mechanisms, which handle location resolution internally.[25]

For instance, consider a query like `SELECT * FROM customers WHERE country = 'USA';`. In a compliant system, this executes identically whether the customers table resides on a single server or is sharded horizontally across multiple distributed servers, with the DBMS routing and aggregating results seamlessly.[26]
The primary implication of Rule 11 is enhanced scalability and flexibility for enterprise environments, enabling the construction of large-scale, federated, or cloud-based databases without necessitating modifications to existing applications or user queries. This supports growth in data volume and geographic distribution, as seen in modern distributed RDBMS implementations, while preserving the single logical database illusion.[2]
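One way to visualize the transparency the rule demands is a union view over hypothetical per-site fragments; this is only an illustration of the single-logical-database illusion, not how a production distributed RDBMS is configured, and every name here is assumed:

```sql
-- Hypothetical horizontal fragments held at two sites:
CREATE TABLE customers_us (customer_id INTEGER PRIMARY KEY, name VARCHAR(60), country CHAR(3));
CREATE TABLE customers_eu (customer_id INTEGER PRIMARY KEY, name VARCHAR(60), country CHAR(3));

-- One logical relation reunites the fragments:
CREATE VIEW customers AS
SELECT * FROM customers_us
UNION ALL
SELECT * FROM customers_eu;

-- The query from the prose runs unchanged, wherever the rows live:
SELECT * FROM customers WHERE country = 'USA';
```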