
Relational database

A relational database is a type of database management system that organizes data into structured tables composed of rows and columns, where each row represents a record (or tuple) and each column represents an attribute, enabling the establishment of relationships between data points across tables through keys. This model ensures data integrity, consistency, and efficient retrieval by adhering to principles such as normalization to minimize redundancy and anomalies. The relational model was first proposed by IBM researcher Edgar F. Codd in his seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks", which introduced the concept of representing data as mathematical relations to simplify querying and maintenance in large-scale systems. Codd's framework shifted away from earlier hierarchical and network models, emphasizing declarative querying over procedural navigation, which laid the groundwork for modern database technology. Key features of relational databases include the use of primary keys to uniquely identify rows within a table and foreign keys to link tables, enforcing referential integrity and supporting complex joins for data retrieval. They are typically managed by a relational database management system (RDBMS), such as MySQL, PostgreSQL, or Oracle Database, which provides tools for data definition, manipulation, and control. A cornerstone of RDBMS is Structured Query Language (SQL), developed in the 1970s by IBM researchers Donald Chamberlin and Raymond Boyce as part of the System R prototype, allowing users to perform operations like selecting, inserting, updating, and deleting data in a standardized, non-procedural manner. The adoption of relational databases accelerated in the late 1970s and 1980s, with IBM's System R serving as an influential prototype that demonstrated practical implementation, leading to the first commercial RDBMS releases, including Oracle in 1979. Today, relational databases remain foundational for applications requiring ACID (atomicity, consistency, isolation, durability) compliance, such as banking, e-commerce, and enterprise systems, handling vast datasets while supporting scalability through features like indexing and partitioning.

Overview

Definition and Principles

A relational database is a type of database management system that organizes data into relations, which are tabular structures consisting of rows (tuples) and columns (attributes), adhering to the relational model introduced by E.F. Codd in 1970. This model represents data as sets of relations where each relation captures entities and their associations through shared attributes, enabling efficient storage, retrieval, and manipulation without reliance on physical storage details or navigational paths. The foundational principles of relational databases emphasize data independence, data integrity, and declarative querying. Logical data independence ensures that changes to the conceptual schema, such as adding new relations, do not affect application programs, while physical data independence shields users from alterations in storage structures or access methods. Integrity is enforced through constraints like primary keys, which uniquely identify each tuple in a relation, and referential integrity rules that maintain consistency across relations. Querying relies on set-based operations—such as selection, projection, and join—which treat data as mathematical sets, allowing users to specify what data is needed without detailing how to retrieve it. In contrast to earlier hierarchical and network models, which organize data in tree-like or graph structures requiring explicit navigation along predefined paths, the relational model uses a flat, tabular format that promotes simplicity and flexibility. Hierarchical models limit relationships to parent-child hierarchies, while network models (like CODASYL) allow more complex linkages but often lead to access dependencies and redundancy; the relational approach avoids these by providing a declarative, high-level interface that abstracts away implementation details.

Applications and Advantages

Relational databases find extensive application across diverse sectors due to their ability to manage structured data efficiently. In business environments, they underpin customer relationship management (CRM) and enterprise resource planning (ERP) systems, which use them to handle structured processes like customer interactions, inventory tracking, and operations. Financial services leverage relational databases for secure transaction processing, account management, and compliance reporting in banking and payment systems, where data integrity is paramount. Web applications, including e-commerce platforms, rely on them to store and retrieve user profiles, product catalogs, and order histories through relational links. In scientific research, relational databases organize experimental results and patient records, as seen in healthcare studies where they enable consistent querying and analysis of structured datasets. A primary advantage of relational databases is their adherence to ACID properties—atomicity, consistency, isolation, and durability—which ensure reliable and predictable transaction handling, minimizing errors in critical operations like financial transfers. The use of standardized SQL provides portability, allowing queries and schemas to transfer seamlessly between systems, while supporting complex operations such as joins to relate data across tables and aggregations for analytical insights. For structured data, they offer scalability through techniques like indexing, sharding, and vertical scaling, enabling growth in enterprise settings without compromising performance. Despite these strengths, relational databases have limitations when applied to certain data types; they excel with structured information but are less suitable for unstructured data, such as documents, images, or semi-structured formats, due to rigid schemas that require predefined structures. Similarly, in high-velocity environments involving real-time data streams, their emphasis on ACID compliance can introduce overhead, potentially slowing ingestion compared to more flexible alternatives. As of 2025, relational databases power the majority of enterprise applications, with the global market projected to reach $82.95 billion, reflecting their enduring dominance in structured data environments. Systems like PostgreSQL and MySQL alone account for over 57% of developer usage in surveys, highlighting their widespread adoption.

History

Origins and Theoretical Foundations

The relational model originated from the work of Edgar F. Codd, a mathematician at IBM's San Jose Research Laboratory, who began developing the concept in 1969 while addressing challenges in managing large-scale shared data systems. In June 1970, Codd published his seminal paper, "A Relational Model of Data for Large Shared Data Banks," in Communications of the ACM, introducing a model of data organization based on mathematical relations to enable shared access to extensive formatted data banks without exposing users to underlying storage details. This work was motivated by the limitations of prevailing navigational database systems, such as hierarchical (tree-structured) models like IBM's IMS and network models, which enforced rigid physical linkages, ordering, and access paths that made retrieval inflexible and dependent on specific program knowledge of the data's organization. Codd argued that these systems led to program failures when data organization changed, as applications had to navigate predefined paths, resulting in high maintenance costs and inefficiency for large shared environments. Codd's model sought to overcome these issues through key theoretical motivations, including the elimination of redundancy to prevent inconsistencies and wasted storage, the assurance of data independence so that changes in physical representation did not affect user queries or applications, and the representation of data using predicate logic for precise, declarative querying. By treating data as relations—n-ary sets of tuples—he proposed a "universal data sublanguage" grounded in first-order predicate calculus, allowing users to specify what data they needed rather than how to retrieve it, thus insulating applications from structural modifications. This approach addressed redundancy by defining criteria under which relations minimized derivable projections, ensuring that data could be reconstructed without duplication while maintaining logical consistency. The theoretical foundations drew heavily from mathematical disciplines, particularly set theory for modeling relations as mathematical sets and predicate logic for query formulation and integrity enforcement. Codd's innovations at IBM built on these principles to create a framework that prioritized user protection from data organization details, stating that "future users of large data banks must be insulated from any changes in the structure of data which are made possible by improvements in base hardware and software technology." To further refine the criteria for true relational systems, Codd later proposed his 12 rules (often counted as 13, including Rule 0: the foundation rule) in 1985, outlining essential properties for a relational database management system, such as guaranteed access via logical addressing and support for view updating.

Commercial Development and Adoption

The development of commercial relational database management systems (RDBMS) began in the late 1970s, transitioning from research prototypes to market-ready products. IBM's System R, initiated in 1974 as an internal research project, served as a key prototype that demonstrated the feasibility of relational databases using SQL as the query language, influencing subsequent commercial efforts. In 1979, Relational Software, Inc. (later Oracle Corporation) released the first commercially available RDBMS, initially known as Oracle Version 2, which ran on DEC PDP-11 hardware and marked a pivotal shift toward enterprise adoption by offering structured query capabilities for business applications. IBM followed with SQL/DS in 1981, targeted at mainframe environments, and later DB2 in 1983, which became a cornerstone for large-scale data processing in corporations. Standardization efforts solidified the relational model's commercial viability. In 1986, the American National Standards Institute (ANSI) published the first SQL standard (SQL-86, or ANSI X3.135), establishing a common syntax for querying relational databases and promoting portability across vendors. This was adopted internationally by the ISO in 1987 as SQL-87, with subsequent revisions—such as SQL-89 for enhanced integrity and SQL:1999 for object-relational features—evolving the standard to address growing complexities in data management, culminating in SQL:2023, which includes support for JSON and property graphs. The 1980s saw an enterprise boom in RDBMS adoption, driven by the need for reliable data handling in sectors like finance and manufacturing, with products from Oracle, IBM, and others powering mission-critical systems. In the 1990s, open-source alternatives accelerated widespread use: MySQL was first released in 1995, gaining popularity for web applications due to its simplicity and performance, while PostgreSQL emerged in 1996 from the academic Postgres project, offering advanced features like extensibility for complex queries. The 2000s integrated relational databases with cloud computing, enabling scalable deployments through services like Amazon RDS (launched in 2009), which facilitated on-demand access and reduced infrastructure costs for businesses. By 2025, relational databases maintain dominance in the DBMS market, accounting for approximately 64% of revenue in 2023 according to industry analyses, underscoring their enduring role in handling structured data amid the rise of cloud and big data environments.

Relational Model

Core Concepts and Terminology

In the relational model, a relation is defined as a set of ordered n-tuples, where each tuple consists of values drawn from specified domains, mathematically equivalent to a subset of the Cartesian product of those n domains. This structure is typically represented as a table, with no inherent ordering among the tuples or within the attributes, ensuring that the relation remains a set without duplicates. A tuple corresponds to a single row in the table, forming an n-tuple where the i-th component belongs to the i-th domain. Each attribute represents a column in the table, named and associated with a specific domain that defines the allowable values for that position across all tuples. The degree of a relation is the number of attributes (n), while its cardinality is the number of distinct tuples it contains. The relational model distinguishes between the schema, which defines the logical structure including relations, attributes, and their domains, and the instance, which is the actual collection of tuples at any given time. Data storage relies on value-based representation, where relationships between data are established solely through shared attribute values rather than physical pointers, ordering, or hierarchical links, promoting data independence. To handle missing or inapplicable information, E.F. Codd later extended the model to include null values, which represent either "value at present unknown" or "property inapplicable," distinct from empty strings or zeros, and integrated into a three-valued logic for queries.

Mathematical Basis

The relational model is grounded in set theory and predicate logic, providing a formal framework for data representation and manipulation. At its core, a domain D_i is defined as a set of values that can be assigned to attributes, ensuring type consistency across the database. A relation R of degree n is formally a subset of the Cartesian product of n domains, expressed as R \subseteq D_1 \times D_2 \times \cdots \times D_n, where each element of R corresponds to a valid combination of values from these domains. A tuple in this model is a finite, ordered list of n values, with the i-th value drawn from D_i, representing a single entity or fact. The relation R itself is a set of such tuples, inherently enforcing uniqueness since sets do not permit duplicates, which eliminates redundancy at the mathematical level. Formally, a relation comprises a heading and a body: the heading specifies the attribute names paired with their respective domains (e.g., R(A_1: D_1, A_2: D_2, \dots, A_n: D_n)), defining the structure, while the body is the finite set of tuples populating that structure at any given time. The foundation in predicate logic enables queries to be formulated as logical predicates applied over relations, allowing declarative expressions of conditions in terms of logical statements. This approach, rooted in an applied predicate calculus, supports relational completeness, where any query expressible in the relational calculus can be represented within the model.
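As a small worked example (the values here are invented for illustration, not from the original text), the following shows concretely how a relation is a subset of a Cartesian product of domains:

```latex
% Two small domains (illustrative):
D_1 = \{101, 102\}, \quad D_2 = \{\text{Alice}, \text{Bob}\}
% The full Cartesian product contains all four pairings:
D_1 \times D_2 = \{(101,\text{Alice}),\, (101,\text{Bob}),\, (102,\text{Alice}),\, (102,\text{Bob})\}
% A relation keeps only the tuples representing true facts:
R = \{(101,\text{Alice}),\, (102,\text{Bob})\} \subseteq D_1 \times D_2
```

Here R has degree 2 and cardinality 2, while the full product has cardinality 4; the relation's body is the subset of pairings that actually hold.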

Data Organization

Relations, Tuples, and Attributes

In the relational model, a relation is conceptualized as a table that organizes data into rows and columns, where the columns represent attributes defining the characteristics of the stored entities, and the rows, known as tuples, capture instances or occurrences of those entities. This structure ensures that data is stored in a declarative manner, independent of physical implementation details, allowing users to interact with it through logical representations. To illustrate, consider a simple employee relation named Employees with three attributes: EmployeeID (an integer identifier), Name (a string for the employee's full name), and Department (a string indicating the employee's department). Each tuple in this relation would consist of a unique combination of values for these attributes, such as (101, "Alice Johnson", "Engineering"), representing one employee's details without implying any order among the tuples. This tabular format facilitates straightforward comprehension and manipulation of data relationships. Relations are categorized into base relations, which store the actual persistent data in the database (often called base tables in SQL implementations), and derived relations, such as views, which are virtual and computed dynamically from queries on base relations or other views without storing data separately. Base relations form the foundational storage, while derived ones provide flexible, on-demand perspectives of the data. Each attribute in a relation must hold only atomic (indivisible, simple) values from its defined domain, prohibiting nested structures like lists or sets within a single cell to maintain the model's simplicity and ensure first normal form compliance. This atomicity requirement, where domains briefly specify the allowable value types (e.g., integers or strings), supports efficient querying and integrity.
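The Employees relation above can be declared in SQL roughly as follows (a minimal sketch; the relation and attribute names come from the example, while the exact column types are assumptions):

```sql
-- Base relation for the Employees example; types are illustrative.
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,   -- unique identifier for each tuple
    Name       VARCHAR(100) NOT NULL, -- atomic string value
    Department VARCHAR(50)            -- atomic string value
);

-- One tuple of the relation, matching (101, 'Alice Johnson', 'Engineering').
INSERT INTO Employees (EmployeeID, Name, Department)
VALUES (101, 'Alice Johnson', 'Engineering');
```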

Domains and Schemas

In the relational model, a domain represents the set of permissible atomic values from which the values of a specific attribute are drawn, ensuring consistency and validity across relations. This concept, introduced by E.F. Codd, defines domains as finite or infinite sets of values, such as the domain of integers for numeric attributes or strings for textual ones, preventing invalid entries like non-numeric values in an age field. For instance, the domain for an employee's age attribute might be restricted to integers between 18 and 65, limiting values to that range while excluding extraneous entries like negative numbers or decimals. A schema in a relational database outlines the structural blueprint, comprising relation schemas that specify the attributes of each relation along with their associated domains, and the overall database schema as the integrated collection of these relation schemas, including definitions for views, indexes, and constraints where applicable. Relation schemas thus serve as the foundational descriptors, naming the table and mapping each attribute to its domain, while the database schema provides a holistic view of inter-table organization without delving into instances. This separation allows for abstract design independent of physical storage, facilitating maintenance and scalability in large systems. Modern relational database management systems (RDBMS) implement domains through type systems, offering built-in data types such as INTEGER for whole numbers, VARCHAR for variable-length strings, and DATE for temporal values, which align with the abstract domains of the relational model by enforcing value ranges and formats at the storage level. Users can extend these with user-defined domains, created via SQL statements like CREATE DOMAIN, which base a new type on an existing one while adding custom constraints, such as CHECK conditions to validate specific rules beyond standard types. For example, a user-defined domain for monetary amounts might build on a numeric type with a scale of two decimal places and a non-negative constraint, promoting reusability across attributes. Schema evolution addresses the need to modify these structures over time in response to changing requirements, involving operations like adding or dropping attributes, altering domains, or renaming relations, often managed through versioning to track historical states and automate migrations. In practice, tools and protocols enable forward and backward compatibility, allowing applications to query evolving schemas without breaking, as demonstrated in case studies where schema changes were applied incrementally to minimize downtime in production environments. This process underscores the relational model's flexibility, though it requires careful planning to preserve data integrity during transitions.
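The age and monetary-amount domains described above could be expressed with CREATE DOMAIN, which PostgreSQL and the SQL standard support (a sketch; the domain and table names are invented for illustration):

```sql
-- Domain restricting employee age to the range described in the text.
CREATE DOMAIN employee_age AS INTEGER
    CHECK (VALUE BETWEEN 18 AND 65);

-- Domain for non-negative monetary amounts with two decimal places.
CREATE DOMAIN money_amount AS NUMERIC(12, 2)
    CHECK (VALUE >= 0);

-- Domains are then reusable across relations.
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    age         employee_age,
    salary      money_amount
);
```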

Integrity Mechanisms

Keys and Relationships

In the relational model, keys are essential attributes or sets of attributes that ensure uniqueness within a relation and facilitate connections between relations. A superkey is any set of one or more attributes that uniquely identifies each tuple in a relation, allowing no two tuples to share the same values for that set. A candidate key is a minimal superkey, meaning no proper subset of its attributes is itself a superkey; multiple candidate keys may exist for a given relation, such as both employee ID and a combination of name and birthdate uniquely identifying an employee. The primary key is the candidate key selected to serve as the unique identifier for tuples in the relation, with the choice often guided by factors like simplicity and stability; for instance, in an employee relation, employee ID might be chosen as the primary key over a composite of name and address. A foreign key is an attribute or set of attributes in one relation that matches the primary key (or a candidate key) of another relation, establishing a link between them without duplicating data. For example, in a department relation with department ID as the primary key, an employee relation might include department ID as a foreign key to indicate which department each employee belongs to. Foreign keys enable the model to represent associations between entities while preserving integrity through referential constraints, ensuring that referenced values exist in the target relation. Keys define the types of relationships between relations, which describe how tuples in one relation correspond to those in another. A one-to-one relationship occurs when each tuple in one relation is associated with at most one tuple in another, and vice versa; this can be implemented by placing the primary key of one relation as a foreign key in the other, often with mutual foreign keys or by merging relations if appropriate. For instance, a person relation might have a one-to-one link to a passport relation, where passport number serves as both primary and foreign key. A one-to-many relationship exists when each tuple in one relation (the "one" side) can be associated with zero, one, or multiple tuples in another (the "many" side), but each tuple on the "many" side links to at most one tuple on the "one" side; this is typically realized by placing a foreign key in the "many" relation that references the primary key of the "one" relation. In a classic example, a department relation (one side) relates to an employee relation (many side), where employees' department IDs as foreign keys point to the department's primary key, allowing one department to have multiple employees but each employee to belong to only one department. A many-to-many relationship arises when tuples in one relation can associate with multiple tuples in another, and vice versa; direct implementation is avoided to prevent redundancy, instead using a junction (or associative) relation that contains foreign keys referencing the primary keys of both original relations, effectively decomposing the many-to-many into two one-to-many relationships. For example, a student relation and a course relation might connect via an enrollment relation with student ID and course ID as foreign keys, capturing multiple enrollments per student and multiple students per course. This structure supports efficient querying and updates while maintaining normalization principles; the key types are summarized in the table below, followed by a short SQL sketch of the relationship patterns.
| Key type | Definition | Example in employee relation |
| --- | --- | --- |
| Superkey | Set of attributes uniquely identifying tuples (may include extras) | {EmployeeID, Name, Address} |
| Candidate key | Minimal superkey (no proper subset is a superkey) | {EmployeeID}, {SSN} |
| Primary key | Candidate key selected for unique identification | EmployeeID |
| Foreign key | References the primary key of another relation | DepartmentID (referencing Departments table) |
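A minimal SQL sketch of the department/employee (one-to-many) and student/course (many-to-many) examples above; table and column names are chosen for illustration:

```sql
-- One-to-many: each employee references exactly one department.
CREATE TABLE Departments (
    DepartmentID INTEGER PRIMARY KEY,
    Name         VARCHAR(50) NOT NULL
);

CREATE TABLE Employees (
    EmployeeID   INTEGER PRIMARY KEY,
    Name         VARCHAR(100) NOT NULL,
    DepartmentID INTEGER REFERENCES Departments(DepartmentID)
);

-- Many-to-many: a junction relation decomposes Students/Courses
-- into two one-to-many relationships.
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY);
CREATE TABLE Courses  (CourseID  INTEGER PRIMARY KEY);

CREATE TABLE Enrollments (
    StudentID INTEGER REFERENCES Students(StudentID),
    CourseID  INTEGER REFERENCES Courses(CourseID),
    PRIMARY KEY (StudentID, CourseID)  -- composite key prevents duplicate enrollments
);
```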

Constraints and Integrity Rules

In relational databases, constraints are rules enforced on data to maintain accuracy, consistency, and validity across relations. These mechanisms prevent invalid states by restricting operations that would violate predefined conditions, such as insertions, updates, or deletions that introduce inconsistencies. Entity integrity is a fundamental rule ensuring that the primary key of every tuple in a relation is neither null nor contains duplicate values, thereby guaranteeing that each entity can be uniquely identified without ambiguity. This rule applies specifically to primary key attributes, prohibiting nulls to uphold the relational model's requirement for identifiable records. Referential integrity maintains consistency between related relations by requiring that the value of a foreign key in one relation either matches an existing primary key value in the referenced relation or is null, thus avoiding orphaned records or invalid references. Violations of this rule occur during operations like deleting a referenced tuple or updating a primary key to an unmatched value. To handle such violations, database systems support actions including RESTRICT, which blocks the operation if it would break the reference; CASCADE, which propagates the delete or update to dependent tuples; SET NULL, which sets the foreign key to null; or SET DEFAULT, which assigns a default value, depending on the system's implementation. Check constraints enforce custom business rules on attribute values within a relation, such as ensuring an employee's age is greater than 18 or a salary exceeds a minimum threshold, by evaluating a Boolean condition during data modification. These constraints are declarative, specified at the table or column level, and apply to single or multiple columns, rejecting operations that fail the condition to preserve semantic correctness. Unique constraints extend beyond primary keys by ensuring that values in one or more columns are distinct across all tuples in a relation, allowing null values (unlike primary keys) to support alternate identifiers, such as email addresses in a user relation. This prevents duplicates in non-primary attributes while permitting flexibility for optional values.
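These constraint types can be combined in a single table definition, as in this sketch (table and column names invented for illustration; a departments table is assumed to exist):

```sql
CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,           -- entity integrity: non-null, unique
    email         VARCHAR(255) UNIQUE,           -- alternate identifier; nulls allowed
    age           INTEGER CHECK (age > 18),      -- check constraint: business rule
    salary        NUMERIC(12, 2) CHECK (salary >= 0),
    department_id INTEGER,
    -- Referential integrity with explicit violation-handling actions:
    FOREIGN KEY (department_id)
        REFERENCES departments (department_id)
        ON DELETE SET NULL    -- orphaned employees get a null department
        ON UPDATE CASCADE     -- key changes propagate to dependent tuples
);
```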

Querying and Manipulation

Relational Algebra Operations

Relational algebra provides a procedural language for querying and manipulating relations in the relational model, where each operation takes one or more relations as input and yields a new relation as output. Introduced by E.F. Codd in 1970, it emphasizes set-theoretic foundations to ensure closure and structured manipulation. The operations are designed to be composable, forming a closed algebra that maintains relational integrity throughout computations. The fundamental operations, often termed primitive, encompass selection, projection, union, set difference, Cartesian product, and rename. These primitives enable the expression of basic filtering and combination tasks. Selection, symbolized as \sigma, filters tuples from a relation R that satisfy a predicate P, defined formally as: \sigma_P(R) = \{ t \mid t \in R \land P(t) \} where P involves comparisons like equality or inequality and logical connectives. Projection, denoted \Pi, extracts specified attributes from R while eliminating duplicates to ensure the result remains a set, expressed as \Pi_{A_1, A_2, \dots, A_k}(R), with A_1 to A_k as the chosen attributes. Union, indicated by \cup, merges tuples from two type-compatible relations R and S (same arity and corresponding domains), yielding: R \cup S = \{ t \mid t \in R \lor t \in S \} with duplicates removed. Set difference, using -, identifies tuples unique to R relative to S: R - S = \{ t \mid t \in R \land t \notin S \} applicable only to compatible relations. Cartesian product, \times, generates all possible pairings of tuples from R and S: R \times S = \{ tq \mid t \in R \land q \in S \} assuming attribute names are distinct or renamed if overlapping. Rename, \rho, reassigns names to relations or attributes, such as \rho_{T}(R) to designate R as T, facilitating composition without name conflicts. Derived operations build upon the primitives to handle common relational tasks more directly, including join, intersection, and division. Natural join, \bowtie, links R and S on matching values of shared attributes, equivalent to a theta join (generalized condition) restricted to equality, and formally: R \bowtie S = \Pi_X \left( \sigma_P (R \times S) \right) where P enforces equality on common attributes and X selects output attributes. Theta join extends this to arbitrary conditions in P, such as inequalities. Intersection, \cap, retrieves shared tuples: R \cap S = \{ t \mid t \in R \land t \in S \} derivable as R - (R - S), requiring compatibility. Division, \div, identifies attribute values in the projection of R (excluding S's attributes) that associate with every tuple in S: R \div S = \{ t \mid t \in \Pi_{R - S}(R) \land \forall u \in S \, (tu \in R) \} useful for queries like "all parts supplied by every supplier." These operations exhibit closure: any composition results in a valid relation, enabling the construction of arbitrary query expressions through nesting and sequencing. This expressiveness allows relational algebra to represent all information-retrieval requests expressible in the model, serving as the theoretical core for languages like SQL.
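To connect the algebra to practice, the composite expression \Pi_{Name}(\sigma_{Department = 'Engineering'}(Employees)) corresponds to the following SQL (a sketch against the Employees/Departments examples used earlier in this article):

```sql
-- Projection (SELECT DISTINCT) over a selection (WHERE):
-- Π_Name(σ_Department='Engineering'(Employees))
SELECT DISTINCT Name
FROM Employees
WHERE Department = 'Engineering';

-- Natural join R ⋈ S, written explicitly on the shared attribute:
SELECT e.Name, d.Name AS DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;
```

Note the DISTINCT: SQL tables are multisets by default, so duplicate elimination must be requested to match the set semantics of \Pi.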

SQL as the Standard Language

SQL, or Structured Query Language, emerged as the declarative language for managing and querying relational databases, providing a standardized interface that translates relational model concepts into practical syntax for data operations. Developed in 1974 by Donald Chamberlin and Raymond Boyce as SEQUEL (Structured English QUEry Language) for IBM's System R research project, it was designed to demonstrate the viability of Edgar F. Codd's relational model in a prototype database system. The language was later shortened to SQL due to trademark issues and evolved through System R's phases, unifying data definition, manipulation, and view mechanisms by 1976. This foundation enabled SQL to become the industry standard, influencing commercial systems like IBM's SQL/DS and DB2. SQL is categorized into sublanguages that handle distinct aspects of database interaction. Data Definition Language (DDL) includes commands like CREATE and ALTER to define and modify database structures such as tables and schemas, while DROP removes them. Data Manipulation Language (DML) encompasses SELECT for querying data, INSERT for adding rows, UPDATE for modifying existing data, and DELETE for removing rows. Data Control Language (DCL) manages access with GRANT to assign privileges and REVOKE to withdraw them, ensuring security over database objects. At the heart of SQL lies the SELECT statement, which retrieves data from one or more tables using a structured clause syntax that supports complex filtering and aggregation. The basic form is SELECT column_list FROM table_list [WHERE condition] [GROUP BY columns] [HAVING condition] [ORDER BY columns];, where FROM specifies the source tables, WHERE filters rows before grouping, GROUP BY aggregates rows into groups, HAVING applies conditions to groups, and ORDER BY sorts the results. Joins, such as INNER JOIN or LEFT JOIN, combine rows from multiple tables based on related columns, while subqueries—nested SELECT statements—allow embedding queries within clauses like WHERE or FROM for advanced filtering, such as selecting employees with salaries above the departmental average. SQL's standardization began with ANSI's adoption as X3.135 in 1986, followed by ISO as 9075 in 1987, establishing core syntax and semantics across implementations. The ISO/IEC 9075 standard, now in its 2023 edition, comprises nine parts, including SQL/Foundation for core language elements and optional modules like SQL/XML for document handling; it defines conformance levels such as Core SQL (mandatory features) and optional features (vendor extensions). Over time, SQL has evolved from SQL-86's basic relational operations to SQL:1999's introduction of Common Table Expressions (CTEs) for readable subquery reuse and window functions for analytics like ROW_NUMBER() over ordered partitions without collapsing rows. SQL:2016 added support for storing and querying JSON data, while SQL:2023 enhances this with native JSON types, scalar functions, and simplified accessors like dot notation for nested objects, alongside improvements to recursive CTEs for handling cycles in hierarchical data. These advancements support modern analytics workloads while maintaining backward compatibility.
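The departmental-average example mentioned above can be written with a correlated subquery, and the grouping clauses with GROUP BY and HAVING (a sketch; the Employees table and its columns are assumptions carried over from earlier examples):

```sql
-- Employees earning more than the average salary of their own department.
SELECT e.Name, e.Department, e.Salary
FROM Employees e
WHERE e.Salary > (
    SELECT AVG(e2.Salary)
    FROM Employees e2
    WHERE e2.Department = e.Department  -- correlated on the outer row
)
ORDER BY e.Department, e.Salary DESC;

-- Aggregation with GROUP BY and HAVING: departments whose average exceeds 70000.
SELECT Department, AVG(Salary) AS avg_salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 70000;
```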

Database Design

Normalization Process

The normalization process in relational databases involves systematically decomposing relations into smaller, well-structured components to eliminate data redundancies and dependency anomalies while preserving the information content of the original database. This step-by-step refinement ensures that the database schema adheres to progressively stricter normal forms, based on constraints known as functional dependencies. The goal is to design a schema that minimizes update, insertion, and deletion anomalies, thereby improving data integrity and consistency. Functional dependencies form the foundational constraints in this process. A functional dependency (FD) exists in a relation R when one set of attributes X functionally determines another set Y, denoted as X → Y, meaning that for any two tuples in R that agree on X, they must also agree on Y. This concept was introduced as part of the relational model to capture semantic relationships between attributes. FDs help identify potential redundancies, such as when non-key attributes depend on only part of a composite key, leading to anomalies during data modifications. To infer all implied FDs from a given set, Armstrong's axioms provide a complete set of inference rules. These axioms, developed by William W. Armstrong, include three primary rules: reflexivity, augmentation, and transitivity. Reflexivity states that if Y is a subset of X, then X → Y holds trivially. Augmentation asserts that if X → Y, then for any Z, XZ → YZ. Transitivity implies that if X → Y and Y → Z, then X → Z. Additional derived rules, such as union and decomposition, can be proven from these basics, ensuring soundness and completeness for FD inference. The normalization process progresses through a series of normal forms, each building on the previous to address specific types of dependencies. First Normal Form (1NF) requires that all attributes in a relation contain atomic (indivisible) values, eliminating repeating groups or multivalued attributes within tuples. This ensures the relation resembles a mathematical table with no nested structures. Second Normal Form (2NF) extends 1NF by requiring that no non-prime attribute (one not part of any candidate key) is partially dependent on any candidate key. In other words, every non-key attribute must depend on the entire candidate key, not just a portion of it. This eliminates partial dependencies, which can cause update anomalies in relations with composite keys. Third Normal Form (3NF) further refines 2NF by prohibiting transitive dependencies, where a non-prime attribute depends on another non-prime attribute rather than directly on a candidate key. A relation is in 3NF if, for every FD X → Y, either X is a superkey or each attribute in Y - X is prime. These forms were formalized to free relations from insertion, update, and deletion dependencies. Boyce-Codd Normal Form (BCNF) imposes a stricter condition than 3NF: for every non-trivial FD X → Y in the relation, X must be a superkey. This addresses cases where 3NF allows determinants that are not superkeys, potentially leading to anomalies in relations with overlapping candidate keys. BCNF ensures every determinant is a candidate key, making it particularly useful for eliminating certain redundancy issues not resolved by 3NF. Higher normal forms target more complex dependencies. Fourth Normal Form (4NF) deals with multivalued dependencies (MVDs), where an attribute set is independent of another but both depend on a common key. A relation is in 4NF if it is in BCNF and has no non-trivial MVDs other than those implied by FDs. This prevents redundancy from independent multivalued facts, such as multiple hobbies per employee unrelated to skills.
MVDs generalize FDs and were defined to capture such scenarios. Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJ/NF), addresses join dependencies, where a relation can be decomposed into projections that can be rejoined without spurious tuples. A relation is in 5NF if it is in 4NF and every join dependency is implied by the candidate keys. This form eliminates anomalies from cyclic dependencies across multiple relations. In practice, the normalization process begins by identifying all relevant FDs (and higher dependencies for advanced forms) using domain knowledge and Armstrong's axioms to compute closures. The schema is then decomposed iteratively: for violations of a target normal form, select an offending FD X → Y, project the relation into R1 = (X ∪ Y) and R2 = (R − Y) ∪ X, and replace the original with these projections. Decompositions must be lossless—meaning the natural join of the projections equals the original relation without spurious tuples—to preserve data. This holds when the attributes shared by the two projections form a key of at least one of them. The process continues until the schema satisfies the desired normal form, balancing integrity with query efficiency.
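As a concrete sketch (an invented example, not from the original text), consider a relation with the transitive dependency EmployeeID → DepartmentID → DepartmentName; decomposing it into 3NF yields two projections whose natural join losslessly reconstructs the original:

```sql
-- Unnormalized: DepartmentName depends transitively on EmployeeID
-- via DepartmentID, duplicating the department name for every employee.
CREATE TABLE EmployeesDenorm (
    EmployeeID     INTEGER PRIMARY KEY,
    DepartmentID   INTEGER,
    DepartmentName VARCHAR(50)
);

-- 3NF decomposition: the shared attribute DepartmentID is a key of
-- Departments, so the decomposition is lossless.
CREATE TABLE Departments (
    DepartmentID   INTEGER PRIMARY KEY,
    DepartmentName VARCHAR(50)
);

CREATE TABLE Employees (
    EmployeeID   INTEGER PRIMARY KEY,
    DepartmentID INTEGER REFERENCES Departments(DepartmentID)
);
```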

Denormalization and Performance Considerations

Denormalization involves intentionally introducing redundancy into a relational database that has been normalized to higher normal forms, such as third normal form (3NF), to enhance query performance at the expense of storage efficiency and data consistency maintenance. This technique counters the strict elimination of redundancy in normalization by selectively duplicating data, thereby reducing the computational overhead of joins and aggregations during read operations. Common denormalization strategies include creating pre-joined tables, where data from multiple normalized tables is combined into a single table to eliminate runtime joins for frequently queried combinations. For example, in an order-processing system, customer and order details might be merged into one table to speed up retrieval of order histories. Another approach is maintaining precomputed aggregates, which store results of common aggregation functions like sums or averages, avoiding repeated calculations on large datasets. This is particularly useful for reporting queries involving totals, such as monthly sales figures stored directly in a denormalized table. Clustering, as a related technique, groups related records physically or logically within tables to minimize data scattering, facilitating faster scans and range queries without relying solely on indexes. The primary trade-offs of denormalization center on improved read performance versus increased risks of update anomalies and higher storage costs. By duplicating data, queries can execute faster—often reducing response times by orders of magnitude for join-heavy operations—but updates require propagating changes across redundant copies, potentially leading to inconsistencies if not managed carefully. Storage overhead rises due to redundancy, which can be significant in large-scale systems, though this is offset in read-intensive environments where query speed is paramount. Denormalization is most appropriate for high-read workloads, such as analytical reporting or online analytical processing (OLAP) systems, where complex queries dominate over the frequent updates typical in online transaction processing (OLTP). In OLAP scenarios, denormalized schemas support multidimensional analysis by flattening hierarchies, enabling sub-second responses on terabyte-scale data. Conversely, OLTP environments, focused on concurrent transactions, generally avoid extensive denormalization to preserve consistency during writes. Modern relational database management systems (RDBMS) provide materialized views as a controlled form of denormalization, storing precomputed query results that can be refreshed periodically or incrementally. These views act as virtual denormalized tables, combining the benefits of redundancy for fast reads with automated refresh to mitigate update anomalies. For instance, Oracle's materialized views support equi-joins and aggregations optimized for warehousing, reducing query times while integrating with the underlying normalized schema. This approach, rooted in incremental view maintenance techniques, balances performance gains with consistency in production environments.
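A materialized view for the monthly-sales aggregate described above might look like this in PostgreSQL-style SQL (a sketch; the orders table and its columns are assumptions):

```sql
-- Precomputed monthly sales totals, stored physically and refreshed on demand.
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(total_amount)               AS total_sales,
       COUNT(*)                        AS order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date);

-- Reads hit the precomputed result instead of re-aggregating orders.
SELECT * FROM monthly_sales WHERE month = DATE '2025-01-01';

-- Periodic refresh propagates base-table changes to the view.
REFRESH MATERIALIZED VIEW monthly_sales;
```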

Advanced Features

Transactions and ACID Properties

In relational databases, a transaction is defined as a logical unit of work consisting of a sequence of operations, such as reads and writes, that are executed as a single, indivisible entity to maintain data integrity. Transactions typically begin with a BEGIN statement, proceed through a series of database operations, and conclude with either a COMMIT to permanently apply the changes or a ROLLBACK to undo them entirely, ensuring that partial failures do not leave the database in an inconsistent state. This mechanism allows complex operations, like transferring funds between accounts, to be treated atomically, preventing issues such as overdrafts if one step fails. The reliability of transactions in relational databases is ensured through the ACID properties (atomicity, consistency, isolation, durability), a set of guarantees that ensure reliable transaction processing; the acronym was coined by Theo Härder and Andreas Reuter in 1983. Atomicity requires that a transaction is executed completely or not at all; if any operation fails, the entire transaction is rolled back, restoring the database to its pre-transaction state. Consistency mandates that a transaction brings the database from one valid state to another, preserving integrity constraints such as primary keys, foreign keys, and check constraints after completion. Isolation ensures that concurrent transactions do not interfere with each other, making each appear to execute in isolation even when running in parallel. Durability guarantees that once a transaction is committed, its effects are permanently stored, surviving subsequent system failures through techniques like write-ahead logging. To balance isolation with performance in multi-user environments, relational databases implement varying isolation levels as defined by the ANSI SQL standard, which specify the degree to which concurrent transactions are shielded from each other's effects. The read uncommitted level allows a transaction to read data modified by another uncommitted transaction, potentially leading to dirty reads but maximizing concurrency. Read committed prevents dirty reads by ensuring reads only from committed data, though it permits non-repeatable reads where the same query may yield different results within a transaction. Repeatable read avoids non-repeatable reads by locking read data until the transaction ends, but it may still allow phantom reads from new insertions by other transactions. The strictest, serializable, fully emulates sequential execution, preventing all anomalies including phantoms through techniques like locking or timestamping, at the cost of reduced concurrency. For distributed relational databases spanning multiple nodes, the two-phase commit (2PC) protocol coordinates transactions to achieve atomicity and consistency across sites. In the first phase, a coordinator polls participants to prepare the transaction; each votes yes if it can commit locally or no if it cannot, with all logging their intent durably. If all vote yes, the second phase issues a global commit, propagating the decision; otherwise, an abort is sent, and all participants roll back. This ensures that either all sites commit or none do, though it can block if the coordinator fails, requiring recovery mechanisms.
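The funds-transfer example above looks like this in standard SQL (a sketch; the accounts table and its columns are assumptions):

```sql
BEGIN;

-- Optional: request the strictest ANSI isolation level for this transaction.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Both updates succeed together or not at all (atomicity).
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- A CHECK (balance >= 0) constraint on the table would abort the
-- transaction here if the debit overdrew the source account (consistency).
COMMIT;  -- durable once acknowledged; ROLLBACK would undo everything instead
```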

Stored Procedures, Triggers, and Views

Stored procedures are pre-compiled blocks of SQL code stored in the database that can be invoked repeatedly to perform complex operations, such as data manipulation or business logic execution, often with input and output parameters for flexibility. They originated as an extension to SQL in commercial RDBMS implementations, with Oracle introducing stored procedures in Oracle7 in 1992 to enhance reusability and reduce network traffic by executing code server-side. Stored procedures support error handling through exception blocks and can include conditional logic, making them suitable for encapsulating database-side programming. A basic example of creating a stored procedure in PL/SQL, Oracle's procedural extension to SQL, is as follows:
```sql
CREATE OR REPLACE PROCEDURE update_employee_salary(emp_id IN NUMBER, raise_pct IN NUMBER)
IS
BEGIN
    UPDATE employees
    SET salary = salary * (1 + raise_pct / 100)
    WHERE employee_id = emp_id;
    
    IF SQL%ROWCOUNT = 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Employee not found');
    END IF;
    
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        ROLLBACK;
        RAISE;
END update_employee_salary;
```
This procedure updates an employee's salary by a percentage and includes error handling if no rows are affected. In Microsoft SQL Server, Transact-SQL (T-SQL) provides similar functionality, allowing procedures to accept parameters and manage transactions internally. Triggers are special types of stored procedures that automatically execute in response to specific database events, such as INSERT, UPDATE, or DELETE operations on a table or view, enabling automation of tasks like validation or auditing. They were introduced alongside stored procedures in early RDBMS to enforce rules implicitly without application-level code, with Oracle supporting them since version 7. DML triggers, the most common type, fire for each affected row (row-level) or once per statement (statement-level), and can access special variables like OLD and NEW to reference pre- and post-event data. For instance, a T-SQL trigger in SQL Server for audit logging on an UPDATE event might look like this:
```sql
CREATE TRIGGER tr_employees_audit
ON employees
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- The inserted pseudo-table holds the post-update rows; selecting from it
    -- logs one audit entry per updated row and fires only when rows changed.
    INSERT INTO audit_log (table_name, operation, changed_at)
    SELECT 'employees', 'UPDATE', GETDATE()
    FROM inserted;
END;
```
This trigger logs updates to an audit table automatically after the operation completes. Triggers promote data integrity by responding immediately to changes, though they require careful design to avoid recursive firing or performance issues. Views serve as virtual tables derived from one or more base tables via a stored query, providing a simplified or restricted perspective of the underlying data without storing it physically, which aids in abstraction and security. Introduced in the original SQL standard (ANSI X3.135-1986), views hide complex joins or sensitive columns, enabling row-level security by limiting access to subsets of data based on privileges. They can be updatable if based on a single table with no aggregates, allowing modifications that propagate to the base tables. An example of creating a view in standard SQL, compatible with systems like PostgreSQL, is:
```sql
CREATE VIEW active_employees AS
SELECT employee_id, first_name, last_name, department
FROM employees
WHERE status = 'active';
```
Querying this view (SELECT * FROM active_employees) returns only current employees, abstracting the full table and enforcing access controls. Views thus facilitate modular database design by decoupling applications from physical schema changes.

Implementation

RDBMS Architecture

The ANSI/SPARC three-schema architecture provides a foundational framework for relational database management systems (RDBMS), dividing the database into three abstraction levels to promote data independence and modularity. The external level consists of multiple user views, each tailored to specific applications or end-users, presenting only relevant portions of the data in a customized format without exposing the underlying structure. The conceptual level defines the conceptual schema, encompassing the entire database's entities, attributes, relationships, and constraints in a community-wide model independent of physical storage details. The internal level handles the physical schema, specifying how data is stored on disk, including file organizations and access paths optimized for efficiency. Mappings between these levels ensure data independence, allowing modifications at one level without impacting others. The external/conceptual mapping translates user views to the conceptual schema, supporting logical data independence by enabling view changes without altering the conceptual model. Similarly, the conceptual/internal mapping converts the conceptual schema to physical storage, providing physical data independence so storage optimizations can occur without affecting higher levels. This separation enhances system flexibility, as changes in user requirements or hardware can be isolated. Core RDBMS components operate across these levels to manage queries and storage. The query processor handles SQL statement processing, comprising a parser that validates syntax and semantics, an optimizer that generates efficient execution plans using techniques like cost-based selection, and an executor that runs plans via iterator-based operators to retrieve and manipulate data. The storage manager oversees persistence and access, including subcomponents like the buffer manager, which controls page transfers between disk and main memory using a shared buffer pool with replacement policies such as LRU-2 to minimize I/O operations. It also incorporates a transaction manager to enforce ACID properties through locking protocols like two-phase locking and through write-ahead logging for concurrency and recovery. At the internal level, relations are stored using specific file structures to balance access efficiency and storage overhead. Heap files organize records in insertion order without sorting, suiting full scans or append-heavy workloads by allowing fast inserts at the file end. Sorted files maintain records in key order, facilitating range queries and equality searches through binary search, though inserts require costly maintenance to preserve ordering. Hashed files employ a hash function on a key to distribute records across buckets, enabling O(1) average-case lookups for equality selections at the expense of range query support.

Indexing and Optimization Techniques

Indexing in relational database management systems (RDBMS) enhances query performance by providing efficient access structures, reducing the need for full table scans on large datasets. Indexes organize data in a way that allows the query executor to locate and retrieve specific rows quickly, often at logarithmic time complexity. Common index types include B-trees, hash indexes, and bitmap indexes, each suited to different query patterns and data characteristics. B-tree indexes, introduced as a balanced tree structure for maintaining ordered data, serve as the default for most equality and range queries in RDBMS. They consist of internal nodes pointing to child nodes or leaf nodes containing key-value pairs, ensuring balanced height for O(log n) search, insertion, and deletion operations. B-trees excel in scenarios requiring sorted access, such as ORDER BY clauses or range conditions like salary BETWEEN 50000 AND 80000. Hash indexes, designed for exact-match lookups, use a hash function to map keys to buckets in an array-like structure, enabling constant-time O(1) average-case access for equality predicates. They are particularly effective for point queries but less suitable for range scans due to the unordered nature of hashing. Extendible hashing variants address bucket overflows dynamically, making them adaptable to varying data volumes in relational systems. Bitmap indexes are optimized for attributes with low cardinality, where the number of distinct values is small relative to the row count, such as status codes or Boolean flags. Each distinct value is represented by a bitmap—a bit vector of length equal to the table's row count—where a '1' indicates the presence of that value in a row. This structure supports fast bitwise operations for conjunctive and disjunctive queries, reducing I/O for selective predicates on low-cardinality columns. Query optimization in RDBMS involves selecting the most efficient execution strategy from multiple possible plans, balancing factors like CPU, I/O, and memory costs. Cost-based optimization, pioneered in IBM's System R, estimates plan costs using statistics on table sizes, index selectivity, and cardinalities to generate dynamic programming tables for evaluating join methods and access paths. This approach outperforms rule-based methods by adapting to data distribution, though it requires accurate statistics for reliable estimates. Heuristic rules complement cost-based techniques by applying predefined transformations to prune the search space early, such as pushing selections before joins or projecting only needed columns to minimize intermediate result sizes. These rules, like performing restrictions as early as possible, ensure efficient plan generation even for complex queries, substantially reducing optimization time in many cases. Execution plans detail the sequence of operations for query processing, including access methods and join strategies. Index scans traverse only relevant index portions for selective queries, contrasting with table scans that read entire tables, which are preferable for low-selectivity conditions where index overhead exceeds benefits. Join order determination, often via dynamic programming, minimizes intermediate result sizes by joining smaller relations first, with bushy trees allowing parallel evaluation in modern optimizers. Caching mechanisms, such as buffer pools, mitigate disk I/O by holding frequently accessed pages in memory, managed via least-recently-used (LRU) replacement policies to prioritize "hot" data. Buffer pools allocate fixed memory regions to cache data and index pages, enabling sub-millisecond access for repeated queries on working sets smaller than available memory.
Constraints like primary keys can influence index usage and caching, as they enforce uniqueness and accelerate lookups. Recent advancements incorporate AI-driven auto-tuning for query optimization, leveraging machine learning to refine execution plans and parameters dynamically. In research extensions like Balsa and LEON, learned models analyze historical query patterns to suggest index configurations and join strategies, achieving up to 2-5x speedup on workloads with variable selectivity without manual intervention. These techniques represent a shift toward adaptive, self-optimizing RDBMS.
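In practice, indexes are created declaratively and their use verified through the optimizer's plan output, as in this PostgreSQL-flavored sketch (table, column, and index names are assumptions):

```sql
-- B-tree index (the default) supporting both equality and range predicates.
CREATE INDEX idx_employees_salary ON employees (salary);

-- Hash index for exact-match lookups only (PostgreSQL syntax).
CREATE INDEX idx_employees_email ON employees USING HASH (email);

-- Inspect the chosen execution plan: a selective range predicate
-- should produce an index scan rather than a sequential table scan.
EXPLAIN
SELECT name, salary
FROM employees
WHERE salary BETWEEN 50000 AND 80000;
```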

Modern Extensions

Distributed Relational Databases

Distributed relational databases extend traditional relational database management systems (RDBMS) by partitioning and replicating data across multiple nodes to handle increased load, ensure high availability, and manage large-scale data volumes. This approach maintains relational integrity and SQL compatibility while addressing limitations of single-node systems. Key mechanisms include sharding for data distribution and replication for redundancy, often combined to balance performance and reliability. Horizontal sharding, also known as horizontal partitioning, divides a database into smaller subsets called shards, typically based on a shard key such as a user ID or geographic region, with each shard stored on a separate node. This strategy enables parallel execution of queries and scales write throughput by localizing operations to specific nodes. For instance, range-based sharding assigns contiguous key ranges to shards, while hash-based sharding distributes keys evenly using a hash function to minimize hotspots. Replication complements sharding by maintaining multiple copies of data across nodes to enhance read performance and fault tolerance. In master-slave replication, a single master node handles all writes, propagating changes asynchronously or synchronously to slave nodes that serve read queries, reducing load on the master and providing failover options. Multi-master replication allows writes on multiple nodes, synchronizing changes among them, which supports higher write availability but introduces complexity in conflict resolution, often using last-write-wins or versioning schemes. MySQL, for example, supports both via its replication framework, where master-slave setups are common for read scaling, and group replication enables multi-master configurations. Consistency models in distributed relational databases trade off between strong consistency, where all nodes reflect the latest committed data, and eventual consistency, where updates propagate over time. The CAP theorem, formalized by Gilbert and Lynch, posits that in the presence of network partitions (P), a system can prioritize either consistency (C) or availability (A), but not both. Relational databases traditionally favor CP systems, ensuring consistent transactions across nodes via protocols like two-phase commit (2PC), but this may sacrifice availability during partitions. Some modern implementations relax consistency to AP models for better availability, accepting temporary inconsistencies resolved through reconciliation. Brewer's original conjecture highlighted these trade-offs in distributed systems. Distributed joins pose significant challenges due to data locality, requiring data movement across nodes via techniques like broadcast, redistribution, or semi-joins, which incur high network and CPU overhead. For example, joining tables sharded on different keys may necessitate shipping entire partitions, exacerbating latency in large clusters. Optimization strategies include co-partitioning related tables on the same key to localize joins. The two-phase commit protocol ensures atomicity in distributed transactions by coordinating a prepare phase and a commit phase across nodes, but its overhead—from multiple message rounds and blocking—can degrade performance, especially under high contention or failures. Extensions like presumed abort reduce this by assuming aborts on timeouts, minimizing coordinator involvement. Distributed transactions extend single-node ACID properties using such protocols, though at increased cost. Middleware solutions like Vitess address these issues for MySQL-based systems by providing transparent sharding, query routing, and connection pooling across shards, allowing applications to interact with a unified database while handling resharding without downtime.
Vitess uses a keyspace-shard model, where the VSchema defines sharding rules, enabling efficient distributed operations.
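The shard-key idea can be illustrated on a single node with PostgreSQL's declarative hash partitioning, which applies the same key-to-subset mapping that hash-based sharding uses across nodes (a sketch; in a true sharded deployment each partition would live on a separate server, and the table and column names are invented):

```sql
-- Parent table partitioned by a hash of the shard key (user_id).
CREATE TABLE orders (
    order_id BIGINT,
    user_id  BIGINT NOT NULL,
    total    NUMERIC(12, 2),
    PRIMARY KEY (user_id, order_id)  -- partition key must be part of the key
) PARTITION BY HASH (user_id);

-- Four hash buckets; a distributed system would place these on four nodes.
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Queries filtered on the shard key are routed to a single partition.
SELECT * FROM orders WHERE user_id = 42;
```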

Cloud-Native and NewSQL Systems

Cloud-native relational database management systems (RDBMS) are designed specifically for cloud environments, leveraging elasticity, managed infrastructure, and separation of compute from storage to provide seamless scaling and management without underlying infrastructure concerns. These systems, such as Amazon Aurora and Google Cloud SQL, enable automatic resource provisioning and high availability across distributed cloud regions. Amazon Aurora, a fully managed service compatible with MySQL and PostgreSQL, employs a cloud-native architecture that separates compute from storage, allowing serverless scaling where capacity adjusts dynamically based on workload demands, achieving up to five times the throughput of standard MySQL instances. Similarly, Google Cloud SQL offers managed instances for MySQL, PostgreSQL, and SQL Server, with built-in automation for backups, patching, and replication, ensuring 99.99% availability through multi-zone deployments and pay-per-use pricing models. NewSQL databases extend relational principles to distributed environments while preserving ACID compliance, addressing scalability limitations of traditional RDBMS in cloud settings. CockroachDB, a PostgreSQL-compatible system, distributes data across clusters using a key-value store foundation, supporting horizontal scaling and geo-partitioning for global applications without sacrificing transactional consistency. TiDB, MySQL-compatible, employs a hybrid architecture combining SQL processing with a distributed key-value backend, enabling elastic scaling to petabyte levels while maintaining strong consistency via consensus. These systems build on distributed strategies to handle massive concurrency, making them suitable for cloud-native workloads like global e-commerce and real-time analytics. Recent developments from 2024 to 2025 have integrated artificial intelligence into relational databases for enhanced automation, particularly in query optimization. Oracle Autonomous Database incorporates Select AI, allowing natural language prompts to generate and explain SQL queries, while machine learning algorithms automatically tune performance by adjusting indexes and resource allocation in real time. Additionally, hybrid SQL/vector features have emerged in cloud systems, such as vector search capabilities in major managed relational services, enabling unified handling of structured relational data alongside unstructured elements for AI-driven applications. Key advantages of cloud-native and NewSQL systems include auto-scaling to match demand, reducing over-provisioning, and pay-per-use billing that aligns costs with actual usage, potentially lowering expenses by up to 50% compared to on-premises setups. Market adoption has accelerated, with cloud database services projected to reach $23.84 billion in 2025, representing about 30% of organizations operating in fully cloud-native modes and driving overall relational database deployments toward greater cloud reliance.
