Data manipulation language
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. In the context of the Structured Query Language (SQL), standardized under ISO/IEC 9075, DML enables the retrieval, insertion, modification, and deletion of data within relational database management systems (RDBMS). SQL's DML provides core mechanisms for interacting with stored data without altering the database schema.[1]
The primary DML statements in SQL include SELECT, which retrieves data from one or more tables based on specified criteria; INSERT, which adds new rows to a table; UPDATE, which modifies existing rows; and DELETE, which removes rows from a table.[2] Additional statements like MERGE in some systems combine insert, update, and delete operations in a single command for efficient data synchronization.[3] These operations are typically executed within transactions to ensure data integrity, where changes remain provisional until committed or rolled back.[1]
SQL's DML distinguishes itself from other sublanguages, such as Data Definition Language (DDL) for schema management and Data Control Language (DCL) for access permissions, by focusing exclusively on data content manipulation.[2] The concept of DML predates SQL, with origins in early database systems, while SQL's DML was first standardized in 1986 (ANSI X3.135 and ISO 9075) and has evolved through revisions, including ISO/IEC 9075:2023 with enhancements for modern data processing.[4] Its widespread use across major RDBMS platforms, including Oracle, SQL Server, MySQL, and Snowflake, underscores its foundational role in database operations and application development.[3]
Overview
Definition and Purpose
Data Manipulation Language (DML) is a specialized component of database programming languages designed to handle operations for selecting, inserting, updating, and deleting data stored in databases.[1][5] Unlike schema definition or access control mechanisms, DML exclusively targets the manipulation of existing data instances within database tables.[1][6]
The core purpose of DML is to facilitate efficient data retrieval through querying and to support modifications via the fundamental CRUD operations—Create (inserting new records), Read (retrieving data), Update (altering existing records), and Delete (removing records)—which underpin most database-driven applications.[6][7] These capabilities enable seamless interactions between applications and databases, accommodating diverse use cases such as generating reports from queried datasets or processing transactions that involve real-time data updates.[8] By focusing on data content rather than structural elements, DML ensures that operations remain isolated from changes to the database schema, such as those handled by Data Definition Language (DDL).[1]
Key characteristics of DML include its primarily declarative nature in standard implementations like SQL, where users describe the desired outcome (e.g., which data to retrieve or modify) without specifying the underlying execution steps, allowing the database engine to optimize performance. DML operates on individual rows or on sets of rows, preserving the separation between content manipulation and database architecture to avoid unintended structural impacts.[1][5]
In database management systems (DBMS), DML's role is essential for upholding data integrity and system efficiency, as it groups related manipulations into transactions that can be committed for permanence or rolled back to restore prior states, thereby preventing inconsistencies during concurrent access.[1][8] This separation of data logic from schema alterations minimizes risks to overall database stability while supporting scalable application development.[1] SQL represents the most widely adopted DML standard, as defined in ANSI/ISO specifications for relational databases.[5]
Distinction from Other Database Languages
Data Definition Language (DDL) focuses on defining and modifying the database schema and structure, using statements such as CREATE, ALTER, and DROP to create, alter, or remove objects like tables, indexes, and views. In contrast, Data Manipulation Language (DML) is dedicated to retrieving, inserting, updating, and deleting data within those pre-established structures, without altering the schema itself; DML operations thus depend on schemas defined by DDL to function effectively.[9][10][11]
Data Control Language (DCL) handles authorization and access management, employing commands like GRANT and REVOKE to assign or withdraw permissions on database objects and operations. While DML statements require DCL-defined privileges to execute successfully, DML itself lacks mechanisms for setting or enforcing security policies, ensuring that access control remains a distinct layer.[9]
Transaction Control Language (TCL) manages the grouping and finalization of database operations to maintain consistency, through statements such as COMMIT, ROLLBACK, and SAVEPOINT. DML actions occur within TCL-controlled transactions, which provide the framework for atomicity, consistency, isolation, and durability (ACID) properties, but DML does not directly control transaction boundaries.[9]
This separation of concerns into DDL, DCL, DML, and TCL in SQL sublanguages enables modular DBMS architecture by isolating schema management, data operations, security enforcement, and transaction oversight, thereby improving maintainability, reducing errors, and supporting robust security and ACID compliance in implementations like Oracle Database and MySQL.[9][10] For instance, in some systems, executing a DDL statement implicitly commits any ongoing transaction, illustrating integration where structural changes interact with transaction controls without merging the languages.[9]
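For illustration, one statement from each sublanguage might look as follows (the accounts table and reporting_role here are hypothetical, since the exact objects depend on the schema at hand):

CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL(12,2));  -- DDL: defines structure
GRANT SELECT ON accounts TO reporting_role;                         -- DCL: controls access
INSERT INTO accounts (id, balance) VALUES (1, 100.00);              -- DML: manipulates data
COMMIT;                                                             -- TCL: finalizes the transaction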
History
Origins in Early Database Systems
The origins of data manipulation language (DML) trace back to the late 1960s, when early database systems required structured mechanisms for accessing and modifying data in complex, non-relational models. One foundational influence was the Conference on Data Systems Languages (CODASYL), whose Data Base Task Group (DBTG) released its initial report in October 1969, introducing specifications for a navigational DML tailored to network databases.[12] This DML extended COBOL with commands for record-level operations, enabling programmers to traverse linked records via sets and pointers, as seen in early implementations like the Integrated Database Management System (IDMS), which commercialized these concepts for mainframe environments.[13] The 1969 report emphasized procedural navigation over declarative queries, reflecting the era's focus on efficient handling of interconnected data structures in batch-oriented processing.[14]
Parallel to CODASYL's efforts, IBM developed the Information Management System (IMS) in 1968, originally as part of the ICS/DL/I project for NASA's Apollo program to manage inventory and bills of materials for rocket components.[15] IMS employed a hierarchical model, organizing data in tree-like parent-child relationships, and introduced the Data Language/Interface (DL/I) as its core DML for data access and updates.[15] DL/I calls, such as GU (Get Unique) and ISRT (Insert), allowed sequential or qualified navigation along hierarchical paths, supporting record-oriented manipulation in high-volume transaction environments.[15] This approach prioritized pointer-based traversal for parent-dependent child segments, making it suitable for structured, bill-of-materials applications in mainframe systems.[14]
These early DMLs introduced key concepts like record-oriented manipulation, where operations targeted individual records rather than sets or relations, and pointer-based navigation to follow predefined links in network or hierarchical schemas.[16] They emphasized batch processing for large-scale data handling, often in COBOL-embedded code, but suffered limitations such as the absence of declarative querying, which forced programmers to specify exact access paths and complicated ad-hoc retrievals.[16] CODASYL's set-based navigation and IMS's segment hierarchies, while innovative, required detailed schema knowledge and lacked flexibility for evolving relationships.[14]
The drive toward these navigational DMLs stemmed from the rapid growth of data volumes in mainframe environments during the 1960s, fueled by affordable disk storage and the shift to third-generation computers with online terminals, which overwhelmed traditional file systems and necessitated more systematic data access methods.[14] By the early 1970s, these limitations—particularly the procedural rigidity amid increasing data complexity—paved the way for relational models that promised simpler, set-oriented manipulation.[14]
Standardization and Evolution
The foundations of data manipulation language (DML) in relational databases were laid by Edgar F. Codd's 1970 relational model, which proposed a declarative approach to data querying and manipulation using set-based operations on relations, moving away from navigational methods in earlier systems.[17] This model emphasized non-procedural specifications where users describe what data is needed rather than how to retrieve it, influencing the design of subsequent DMLs.[17]
IBM's System R prototype, initiated in 1974 under Codd's guidance, implemented this model through the SEQUEL language (later renamed SQL), introducing core DML operations like SELECT for querying and INSERT for data addition in a prototype relational database management system (RDBMS).[18] The project demonstrated the feasibility of declarative DML, achieving high data independence and set-oriented processing, which became hallmarks of modern SQL-based DML.[18] Commercial adoption accelerated with Oracle's release of Version 2 in 1979, the first SQL implementation available outside IBM, followed by IBM's DB2 in 1983, which integrated SQL DML into production environments for mainframe systems.[19][20]
The first formal standardization occurred with ANSI's SQL-86 in 1986, which codified essential DML statements such as SELECT, INSERT, UPDATE, and DELETE, establishing a baseline for portable relational DML across systems.[21] Subsequent revisions by ANSI and ISO refined and expanded DML capabilities: SQL-92 introduced outer joins and enhanced query expressiveness for complex data retrieval; SQL:2003 added window functions for advanced analytics over partitions of result sets; SQL:2011 incorporated temporal tables to support time-based data manipulation; and SQL:2023 extended DML with native JSON handling and property graph queries to address semi-structured and graph data integration.[4][22]
This standardization process drove a paradigm shift from procedural, record-at-a-time operations in pre-relational databases to efficient set-based DML, enabling optimizations like query rewriting and index utilization that scaled with growing data volumes.[17] The adoption in commercial DBMS like Oracle and DB2 further propelled this transition, as vendors aligned with ANSI/ISO to ensure interoperability while leveraging relational DML's declarative nature for performance gains.[19][20]
In modern contexts as of 2025, SQL DML continues to evolve through ISO efforts, integrating with big data ecosystems via features like JSON and graph extensions in SQL:2023, with ongoing work toward hybrid querying that bridges relational and distributed systems without delving into non-relational specifics.[4][23]
DML in SQL
Core DML Statements
The core Data Manipulation Language (DML) statements in SQL—SELECT, INSERT, UPDATE, and DELETE—enable users to retrieve, add, modify, and remove data from relational database tables, forming the foundation for data interaction in standardized SQL implementations. These statements adhere to the syntax and semantics defined in the ISO/IEC 9075 SQL standard, with practical implementations in systems like PostgreSQL that closely follow ANSI/ISO guidelines. They support essential operations while integrating with transaction control mechanisms to ensure data consistency; for instance, DML statements can be grouped into transactions managed by statements such as COMMIT and ROLLBACK. General syntax rules shared across these statements include the use of joins in the FROM clause to combine data from multiple tables and subqueries in the WHERE clause for complex filtering; violations of constraints such as foreign keys raise errors that prevent invalid data modifications.
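For illustration, the following sketch (using hypothetical accounts rows and the standard START TRANSACTION syntax) groups two UPDATE statements so that a transfer either takes effect in full or not at all:

START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;  -- debit one account
UPDATE accounts SET balance = balance + 50 WHERE id = 2;  -- credit another
COMMIT;  -- make both provisional changes permanent; ROLLBACK would discard them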
The SELECT statement serves as the primary mechanism for querying and retrieving data from one or more tables, allowing users to specify columns, filter rows, and organize results for analysis or reporting. Its basic syntax is:
SELECT [ALL | DISTINCT] { * | expression [[AS] output_name] } [, ...]
FROM from_item [, ...]
[WHERE condition]
[GROUP BY grouping_element [, ...]]
[ORDER BY expression [ASC | DESC] [, ...]];
Here, the FROM clause identifies source tables or views, potentially incorporating INNER JOIN, LEFT JOIN, or other join types to relate data across tables; the WHERE clause applies conditions, which may include subqueries like (SELECT ... WHERE ...) for nested filtering. GROUP BY groups rows by common values in specified columns, enabling aggregate operations, while ORDER BY sorts the output. For example, SELECT name, age FROM employees WHERE age > 30 ORDER BY age DESC; retrieves and sorts the employee rows meeting the age criterion; with a suitable index, such selective conditions let the engine avoid unnecessary full-table scans. This statement does not modify data but supports read-only retrieval, making it essential for data exploration in relational databases.
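A slightly fuller illustration, assuming hypothetical employees and departments tables, combines a join with a subquery:

SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE e.salary > (SELECT AVG(salary) FROM employees)
ORDER BY e.name;

The join relates each employee to a department row, while the subquery in the WHERE clause restricts the result to employees earning more than the company-wide average.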
INSERT facilitates adding new rows to a table, supporting both single-row and bulk insertions to populate or extend datasets efficiently. The core syntax for single- or multi-row insertion is:
INSERT INTO table_name [(column_name [, ...])]
{ VALUES ( {expression | DEFAULT} [, ...] ) [, ...] | DEFAULT VALUES | query };
When using VALUES, explicit values or DEFAULT placeholders are provided for columns, with omitted columns defaulting to their defined values or NULL if no default exists; the query variant, such as INSERT INTO target SELECT * FROM source;, enables bulk transfer from another table or subquery. Constraints like primary keys or foreign keys are enforced during insertion, triggering errors if violations occur, such as attempting to insert a foreign key value not present in the referenced table. For instance, INSERT INTO employees (name, department_id) VALUES ('Alice', 5); adds a single row while respecting referential integrity, whereas multi-row forms like INSERT INTO employees (name, department_id) VALUES ('Bob', 3), ('Charlie', 4); handle batch operations to minimize overhead.
UPDATE modifies existing rows in a table by altering column values based on specified criteria, ensuring targeted changes without affecting unrelated data. Its fundamental syntax in the core ISO standard is:
UPDATE table_name [ [ AS ] alias ]
SET column_name = { expression | DEFAULT | NULL } [, ...]
[ WHERE condition ];
The SET clause assigns new values, which can reference expressions, defaults, or subqueries for values from other tables; for example, UPDATE employees SET salary = salary * 1.1 WHERE dept_id IN (SELECT id FROM departments WHERE budget > 100000); propagates department-based adjustments across related rows using a subquery. The WHERE clause is crucial to avoid full-table scans and unintended modifications, limiting updates to matching conditions that may involve subqueries or joins within the condition. If an update would violate constraints, such as a foreign key mismatch, the database raises an error, preserving data integrity; some implementations support multi-table updates via extensions like a FROM clause.[24]
DELETE removes rows from a table that satisfy a given condition, providing a precise way to eliminate obsolete or erroneous data while maintaining referential dependencies. The standard syntax is:
DELETE FROM table_name [ [ AS ] alias ]
[ WHERE condition ];
Without a WHERE clause, all rows are deleted, but typically, conditions with subqueries or joins—e.g., DELETE FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE status = 'inactive');—target specific subsets to prevent mass removal. Cascading deletes can be configured via foreign key constraints with ON DELETE CASCADE, automatically propagating deletions to dependent tables upon a successful primary row removal. Unlike TRUNCATE, which is a Data Definition Language (DDL) operation that swiftly clears all rows without logging or conditions, DELETE is logged for rollback purposes and respects WHERE filters, making it suitable for selective cleanup. Constraint violations, such as deleting a row referenced by a foreign key without CASCADE, result in errors to enforce relational integrity. Some implementations extend DELETE with clauses like USING for multi-table operations.[25]
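For illustration, a cascading delete might be set up as follows (the REFERENCES constraint itself is DDL, and the table and column names are hypothetical):

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT REFERENCES customers (id) ON DELETE CASCADE
);
-- Removing a customer now also removes that customer's dependent orders
DELETE FROM customers WHERE id = 42;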
Advanced Features and Clauses
Advanced features in SQL Data Manipulation Language (DML) extend the core capabilities of SELECT, INSERT, UPDATE, and DELETE statements by incorporating analytical processing, complex relational operations, and procedural constructs. These enhancements, introduced progressively through SQL standards, enable more sophisticated data summarization, querying across multiple tables, and conditional modifications without requiring multiple separate statements. They are particularly valuable in scenarios involving large datasets where efficiency and readability are paramount.
Aggregate functions form a foundational advanced element, allowing summarization of data within SELECT statements. Common aggregates include COUNT for tallying rows, SUM for totalizing numeric values, and AVG for computing means, typically paired with the GROUP BY clause to partition results into subsets based on specified columns. For instance, to summarize sales by region, a query might use SELECT region, SUM(sales) AS total_sales FROM orders GROUP BY region;. The HAVING clause further refines these groupings by applying conditions to aggregate results, such as filtering groups where the average exceeds a threshold: SELECT region, AVG(sales) AS avg_sales FROM orders GROUP BY region HAVING AVG(sales) > 1000;. This mechanism, standardized in SQL-92, supports efficient data analysis without post-query processing.[26][27]
Joins and subqueries enable intricate data retrieval by combining or nesting relations. The INNER JOIN syntax, formalized in the ISO SQL standard, merges rows from two tables based on a matching condition, as in SELECT e.name, d.department FROM employees e INNER JOIN departments d ON e.dept_id = d.id;, returning only paired records. OUTER JOIN variants—LEFT, RIGHT, and FULL—preserve unmatched rows from one or both sides, facilitating comprehensive reporting. Subqueries add nesting for conditional logic; a correlated subquery references outer query columns, enabling dynamic filtering like SELECT name FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);, which compares each employee's pay to their department's average. These constructs, part of SQL-92 and refined in later standards, optimize complex filtering over flat operations.[28][29]
Common Table Expressions (CTEs), introduced in the SQL:1999 standard, utilize the WITH clause to define temporary named result sets within a query, improving modularity and recursion handling. A basic CTE might appear as WITH dept_avg AS (SELECT dept_id, AVG(salary) AS avg FROM employees GROUP BY dept_id) SELECT e.name FROM employees e JOIN dept_avg da ON e.dept_id = da.dept_id WHERE e.salary > da.avg;, allowing reuse of the aggregated subresult without repetition. Recursive CTEs extend this for hierarchical data, such as traversing organizational structures: WITH RECURSIVE hierarchy AS (SELECT id, manager_id, name FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.manager_id, e.name FROM employees e JOIN hierarchy h ON e.manager_id = h.id) SELECT * FROM hierarchy;. This feature enhances query readability and supports tree-like traversals in DML contexts.[30][31]
Window functions, added in SQL:2003, perform calculations across row sets without collapsing them via grouping, using the OVER clause for partitioning and ordering. The PARTITION BY subclause divides data into windows, while ORDER BY specifies sequence within each; for example, SELECT name, salary, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rank FROM employees; assigns unique ranks per department based on descending salary. The RANK function handles ties by assigning shared values and skipping subsequent numbers, contrasting ROW_NUMBER's strict sequencing. These functions enable ranking, running totals, and moving averages—e.g., SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) for cumulative sums—preserving all rows for detailed analytics.[32][33]
The MERGE statement, standardized in SQL:2003, supports upsert operations by conditionally inserting, updating, or deleting rows in a target table based on a source dataset. Its syntax integrates a join condition with WHEN clauses: MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED THEN UPDATE SET t.value = s.value WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);, atomically handling matches and non-matches to prevent duplicates and ensure synchronization. This single-statement approach reduces transaction overhead in bulk data loading scenarios, such as ETL processes.[34]
Procedural extensions integrate DML into stored procedures, allowing block-structured programming with control flow. SQL/PSM (Persistent Stored Modules), part of SQL:1999, defines syntax for procedures containing DECLARE, BEGIN...END blocks, and handlers for exceptions. Cursors facilitate row-by-row iteration: DECLARE cur CURSOR FOR SELECT * FROM employees; OPEN cur; FETCH cur INTO vars; CLOSE cur;, enabling fine-grained processing where set-based operations are infeasible, such as custom logging per record. These elements, implemented in systems like DB2 and SQL Server, extend DML for application-like logic within the database.[35][36]
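A minimal SQL/PSM-style sketch, assuming a hypothetical employees table (exact syntax varies between DB2, MySQL, and other implementations), shows a cursor driving row-by-row updates inside a stored procedure:

CREATE PROCEDURE raise_salaries (IN pct DECIMAL(5,2))
LANGUAGE SQL
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_id INT;
    DECLARE cur CURSOR FOR SELECT id FROM employees;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;  -- condition handler ends the loop
    OPEN cur;
    fetch_loop: LOOP
        FETCH cur INTO v_id;
        IF done = 1 THEN LEAVE fetch_loop; END IF;
        UPDATE employees SET salary = salary * (1 + pct / 100) WHERE id = v_id;
    END LOOP;
    CLOSE cur;
END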
DML in Non-Relational Databases
Hierarchical and Network Models
In hierarchical database models, data is organized in a tree-like structure where each record (or segment) has a single parent but can have multiple children, facilitating parent-child relationships. IBM's Information Management System (IMS), introduced in the late 1960s, exemplifies this model through its Data Language/I (DL/I), a procedural DML designed for navigating and manipulating hierarchical data. DL/I uses calls such as GU (Get Unique) to retrieve a specific segment by key, ISRT (Insert) to add new segments under an existing parent, and DLET (Delete) to remove a segment along with its dependents, ensuring structural integrity in tree navigation.[37] These operations are typically embedded in application programs written in languages like COBOL, requiring explicit path traversal from root to leaf segments for data access.[38]
Network database models extend the hierarchical approach by allowing records to have multiple parents and children through linked sets, forming a graph structure for more complex relationships. The CODASYL Data Base Task Group (DBTG) specifications from 1971 defined a DML that operates on these sets, incorporating concepts like schema currency (the current set type) and record currency (the current record instance) to maintain navigational context across operations. Key DML commands include FIND to locate records via qualifiers (e.g., FIND OWNER to position on a set owner or FIND NEXT MEMBER to traverse set members) and GET to retrieve the current record's data.[39] For instance, a navigational path might involve FIND OWNER followed by GET CHILDREN to access dependent records in a set, with updates like MODIFY or DELETE applied to the current currency.[40] These commands support batch-oriented processing, where multiple operations are sequenced in programs to handle linked data efficiently.[16]
Both models found extensive use in legacy applications on mainframe systems, particularly in banking and financial sectors where predictable, high-volume transaction patterns—such as account hierarchies or linked customer records—benefited from their performance advantages in known-access scenarios. IMS, for example, powered critical operations at institutions like First National Bank, enabling rapid retrieval and updates in hierarchical structures for customer data aggregation across departments.[15] Similarly, CODASYL implementations supported complex linkages in inventory and personnel systems on mainframes, offering low-latency navigation for predefined paths. However, their procedural nature introduced significant complexity for ad-hoc queries, as programmers had to manually code traversals without declarative support, often resulting in inflexible and error-prone applications that contributed to the widespread adoption of relational models in the 1980s.[41]
NoSQL and Document-Oriented Approaches
In NoSQL databases, data manipulation languages (DML) diverge from relational SQL paradigms to accommodate unstructured or semi-structured data, emphasizing flexibility, horizontal scalability, and distributed operations over rigid schemas. Document-oriented databases, a prominent NoSQL category, store data as JSON-like documents, enabling DML operations that handle nested structures and varying field sets without predefined tables. These approaches prioritize high-throughput writes and reads in large-scale environments, often trading immediate consistency for availability and partition tolerance as per the CAP theorem.[42]
Document stores like MongoDB exemplify this model through its query language, which supports CRUD operations via methods such as find() for querying documents, insertOne() or insertMany() for creation, updateOne() or updateMany() for modifications, and deleteOne() or deleteMany() for removals. These operations use a JSON-like syntax to manipulate embedded documents and arrays, allowing queries like db.collection.find({ "field": "value" }) to retrieve matching records efficiently. For instance, updates can target specific fields within nested objects, such as incrementing a counter in an array element, supporting complex data hierarchies without joins.[43][44][45]
Key-value stores, such as Redis, employ simple imperative commands for DML, including SET key value to insert or update a value, GET key to retrieve it, and DEL key to remove it, optimized for in-memory operations with sub-millisecond latency. These commands handle atomic operations on strings, hashes, or lists, facilitating caching and real-time applications. In column-family stores like Apache Cassandra, the Cassandra Query Language (CQL) provides a SQL-like interface for wide-column data, with statements like INSERT INTO table (key, column) VALUES (value1, value2), SELECT * FROM table WHERE key = value, UPDATE table SET column = value WHERE key = value, and DELETE FROM table WHERE key = value to manage distributed partitions across clusters. CQL's design supports tunable consistency levels, enabling writes that propagate asynchronously for scalability.[46][47][48]
Graph databases introduce pattern-matching DML through languages like Cypher in Neo4j, which uses declarative clauses such as MATCH (n:Node {property: value}) RETURN n to traverse relationships, CREATE (n:Node {property: value}) to add nodes or edges, and MERGE (n:Node {property: value}) to conditionally create or match patterns, avoiding duplicates in property graphs. This facilitates queries on interconnected data, such as finding shortest paths or communities, without recursive joins. Contemporary trends in NoSQL DML include aggregation pipelines, as in MongoDB's $match, $group, and $sort stages, which process streams of documents for analytics like summing values across collections; integration of full-text search via operators like $text for relevance scoring; and a focus on scalability, where operations favor horizontal distribution over strict ACID compliance. For example, Amazon DynamoDB's UpdateItem API performs upserts by modifying attributes or creating items if absent, often under eventual consistency to ensure high availability in multi-region setups.[49][50][51]
Standards and Implementations
ANSI/ISO Standards
The ANSI/ISO standards for Data Manipulation Language (DML) primarily evolved through the standardization of SQL, beginning with the inaugural SQL-86 standard published by ANSI in 1986 and adopted by ISO in 1987, which established core DML operations including SELECT, INSERT, UPDATE, and DELETE.[52] Subsequent revisions built upon this foundation, with SQL-89 providing minor clarifications, SQL-92 introducing entry-level compliance under FIPS 127-2 for broader interoperability, and progressive updates through SQL:1999, SQL:2003, and beyond, culminating in SQL:2023, which incorporates advanced DML extensions such as support for polymorphic tables and property graph queries via SQL/PGQ.[22][53]
SQL standards define compliance levels to ensure consistent DML functionality across implementations, with the Core level mandating essential features like SELECT statements supporting joins, basic INSERT, UPDATE, and DELETE operations, as well as subqueries and GROUP BY clauses.[54] SQL-92 introduced conformance levels including Entry, Transitional, Intermediate, and Full, refined in later versions like SQL:1999 with Core as a baseline and additional optional feature packages including scrollable cursors for dynamic result set navigation and triggers for automated data modifications, though triggers often intersect with Data Control Language (DCL) aspects.[55][56]
Database management systems (DBMS) achieve certification through conformance testing against these levels; for instance, PostgreSQL demonstrates high Core SQL compliance by supporting at least 170 of the 177 mandatory features in SQL:2023, including comprehensive DML operations, while extending to many optional ones for full interoperability.[57] In contrast, NoSQL databases lack a dedicated ISO standard as of 2025, resulting in varied DML-like query languages without formal compliance frameworks.[58]
These ANSI/ISO standards have fostered global adoption of SQL-based DML, enabling code portability across vendors like Oracle, IBM Db2, and open-source systems, thereby reducing vendor lock-in and promoting standardized data manipulation practices worldwide.[4]
Vendor-Specific Variants
Major database management system (DBMS) vendors extend the ANSI/ISO SQL standards for data manipulation language (DML) to incorporate proprietary features that enhance performance, integration with application logic, or support for specialized data types. These extensions often build on core SQL DML statements like INSERT, UPDATE, and DELETE but introduce vendor-specific syntax and behaviors to address enterprise needs such as auditing, concurrency, or domain-specific operations. While these innovations improve functionality within their ecosystems, they can introduce portability challenges when migrating between systems.
Oracle Database integrates DML deeply with its Procedural Language/SQL (PL/SQL) extension, allowing embedded DML statements within procedures, functions, and triggers for modular application development. PL/SQL enables developers to perform data modifications alongside procedural logic, such as conditional updates or loops over result sets, which streamlines complex transactions. Autonomous transactions, a PL/SQL-specific feature, permit nested transactions that operate independently of the parent transaction, suspending the main one until the autonomous block commits or rolls back; this is useful for logging or auditing without affecting the outer transaction's integrity. Additionally, Oracle's MODEL clause extends SELECT statements to treat query results as multidimensional arrays, applying spreadsheet-like rules for calculations on measures across dimensions, facilitating advanced data warehousing manipulations.
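A minimal PL/SQL sketch (with a hypothetical audit_log table) illustrates how an autonomous transaction lets a logging insert commit independently of the calling transaction:

CREATE OR REPLACE PROCEDURE log_event (p_msg VARCHAR2) AS
    PRAGMA AUTONOMOUS_TRANSACTION;  -- this procedure commits independently of its caller
BEGIN
    INSERT INTO audit_log (logged_at, message) VALUES (SYSTIMESTAMP, p_msg);
    COMMIT;  -- ends only the autonomous transaction; the caller's work remains pending
END;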
Microsoft SQL Server employs Transact-SQL (T-SQL) to augment standard DML with features tailored for auditing and conditional operations. The TOP clause limits the number of rows a statement processes: in SELECT queries with ORDER BY it supports WITH TIES to include additional rows that share the final sort value, and it can also cap the rows affected by INSERT, UPDATE, or DELETE statements. The OUTPUT clause captures inserted, updated, or deleted data (including pseudocolumns like $action in MERGE statements) directly from DML statements, often directing it to a table for audit trails or synchronization without additional queries. SQL Server's MERGE statement, while based on SQL standards, includes T-SQL enhancements such as applying the TOP clause after the join to limit affected rows and integration with OUTPUT for capturing operation details, optimizing upsert-like patterns in data integration workflows.
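For illustration, the OUTPUT clause might feed an audit table directly from an UPDATE (the products and price_audit tables here are hypothetical):

UPDATE products
SET price = price * 1.05
OUTPUT deleted.id, deleted.price, inserted.price, GETDATE()
    INTO price_audit (product_id, old_price, new_price, changed_at)
WHERE category = 'hardware';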
MySQL and its fork MariaDB provide non-standard DML extensions focused on efficiency and geospatial handling. The INSERT ... ON DUPLICATE KEY UPDATE syntax performs an upsert: inserting a row if no unique key conflict exists, or updating specified columns otherwise, which reduces round-trips compared to separate IF EXISTS checks. For spatial data, MySQL's DML integrates with GIS functions like ST_Transform for coordinate system conversions during INSERT or UPDATE, and ST_Buffer for geometric operations, enabling direct manipulation of geometry types (e.g., POINT, LINESTRING) in tables with spatial indexes.
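For illustration, a hit counter keyed on a unique page column (hypothetical page_hits table) can be maintained with a single upsert statement:

INSERT INTO page_hits (page, hits)
VALUES ('/home', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;  -- update the existing row instead of raising a duplicate-key error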
PostgreSQL adheres closely to SQL standards but adds extensions for complex data types in DML. Its INSERT ... ON CONFLICT (UPSERT) clause handles conflicts on unique constraints or exclusions by specifying DO UPDATE actions, including WHERE conditions for selective modifications, which atomically manages inserts and updates. PostgreSQL supports array types natively, allowing DML to manipulate multidimensional arrays via operators like array_append or unnest in UPDATE statements for bulk element operations. For JSON/JSONB, DML integrates operators such as || for concatenation and #> for path extraction, enabling targeted updates to nested structures (e.g., UPDATE table SET jsonb_column = jsonb_set(jsonb_column, '{key}', '"value"')) without full document replacement.
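For illustration, an upsert keyed on a unique email column (hypothetical users table with a JSONB profile column) can combine ON CONFLICT with the JSONB concatenation operator:

INSERT INTO users (email, profile)
VALUES ('alice@example.com', '{"plan": "pro"}')
ON CONFLICT (email)
DO UPDATE SET profile = users.profile || EXCLUDED.profile;  -- merge the new keys into the stored JSONB document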
Other vendors offer niche DML variants. IBM Db2's pureXML feature stores XML documents natively in columns, supporting DML operations like INSERT of well-formed XML via XMLPARSE or UPDATE with XMLQUERY for in-place modifications using XPath/XQuery, preserving hierarchical structure without shredding. SQLite, designed for embedded use, uses PRAGMA statements as lightweight extensions to influence DML behavior, such as PRAGMA journal_mode=WAL for concurrent writes during multi-statement transactions or PRAGMA synchronous=NORMAL for faster INSERT/UPDATE at minor durability cost. In cloud environments, Amazon Aurora (compatible with MySQL/PostgreSQL) incorporates serverless scaling into DML execution; as of 2025, Aurora Serverless v2 automatically adjusts capacity from 0 to 256 Aurora Capacity Units (ACUs) based on DML workload, pausing at 0 ACUs for idle periods to optimize costs without manual intervention.[59]
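As a brief SQLite sketch (with a hypothetical events table), PRAGMA settings issued before a transaction shape how the subsequent DML is journaled and synchronized:

PRAGMA journal_mode = WAL;    -- write-ahead logging allows readers during concurrent writes
PRAGMA synchronous = NORMAL;  -- fewer fsyncs, faster INSERT/UPDATE at a small durability cost
BEGIN;
INSERT INTO events (ts, payload) VALUES (strftime('%s', 'now'), 'started');
COMMIT;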
These vendor-specific DML variants trade enhanced functionality—such as integrated auditing in SQL Server or native XML handling in Db2—for reduced portability, as code relying on proprietary syntax like Oracle's MODEL clause or MySQL's ON DUPLICATE KEY UPDATE requires rewriting for other systems, potentially increasing migration efforts.