Data manipulation language
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. In the context of the Structured Query Language (SQL), standardized under ISO/IEC 9075, DML enables the retrieval, insertion, modification, and deletion of data within relational database management systems (RDBMS). SQL's DML provides core mechanisms for interacting with stored data without altering the database schema.[1]
The primary DML statements in SQL include SELECT, which retrieves data from one or more tables based on specified criteria; INSERT, which adds new rows to a table; UPDATE, which modifies existing rows; and DELETE, which removes rows from a table.[2] Additional statements like MERGE in some systems combine insert, update, and delete operations in a single command for efficient data synchronization.[3] These operations are typically executed within transactions to ensure data integrity, where changes remain provisional until committed or rolled back.[1]
SQL's DML distinguishes itself from other sublanguages, such as Data Definition Language (DDL) for schema management and Data Control Language (DCL) for access permissions, by focusing exclusively on data content manipulation.[2] The concept of DML predates SQL, with origins in early database systems, while SQL's DML was first standardized in 1986 (ANSI X3.135 and ISO 9075) and has evolved through revisions, including ISO/IEC 9075:2023 with enhancements for modern data processing.[4] Its widespread use across major RDBMS platforms, including Oracle, SQL Server, MySQL, and Snowflake, underscores its foundational role in database operations and application development.[3]
Overview
Definition and Purpose
Data Manipulation Language (DML) is a specialized component of database programming languages designed to handle operations for selecting, inserting, updating, and deleting data stored in databases.[1][5] Unlike schema definition or access control mechanisms, DML exclusively targets the manipulation of existing data instances within database tables.[1][6]
The core purpose of DML is to facilitate efficient data retrieval through querying and to support modifications via the fundamental CRUD operations—Create (inserting new records), Read (retrieving data), Update (altering existing records), and Delete (removing records)—which underpin most database-driven applications.[6][7] These capabilities enable seamless interactions between applications and databases, accommodating diverse use cases such as generating reports from queried datasets or processing transactions that involve real-time data updates.[8] By focusing on data content rather than structural elements, DML ensures that operations remain isolated from changes to the database schema, such as those handled by Data Definition Language (DDL).[1]
Key characteristics of DML include its primarily declarative nature in standard implementations like SQL, where users describe the desired outcome (e.g., which data to retrieve or modify) without specifying the underlying execution steps, allowing the database engine to optimize performance. DML operates on individual rows or on sets of rows, preserving the separation between content manipulation and database architecture to avoid unintended structural impacts.[1][5]
In database management systems (DBMS), DML's role is essential for upholding data integrity and system efficiency, as it groups related manipulations into transactions that can be committed for permanence or rolled back to restore prior states, thereby preventing inconsistencies during concurrent access.[1][8] This separation of data logic from schema alterations minimizes risks to overall database stability while supporting scalable application development.[1] SQL represents the most widely adopted DML standard, as defined in ANSI/ISO specifications for relational databases.[5]
Distinction from Other Database Languages
Data Definition Language (DDL) focuses on defining and modifying the database schema and structure, using statements such as CREATE, ALTER, and DROP to create, alter, or remove objects like tables, indexes, and views. In contrast, Data Manipulation Language (DML) is dedicated to retrieving, inserting, updating, and deleting data within those pre-established structures, without altering the schema itself; DML operations thus depend on schemas defined by DDL to function effectively.[9][10][11]
Data Control Language (DCL) handles authorization and access management, employing commands like GRANT and REVOKE to assign or withdraw permissions on database objects and operations. While DML statements require DCL-defined privileges to execute successfully, DML itself lacks mechanisms for setting or enforcing security policies, ensuring that access control remains a distinct layer.[9]
Transaction Control Language (TCL) manages the grouping and finalization of database operations to maintain consistency, through statements such as COMMIT, ROLLBACK, and SAVEPOINT. DML actions occur within TCL-controlled transactions, which provide the framework for atomicity, consistency, isolation, and durability (ACID) properties, but DML does not directly control transaction boundaries.[9]
This separation of concerns into DDL, DCL, DML, and TCL in SQL sublanguages enables modular DBMS architecture by isolating schema management, data operations, security enforcement, and transaction oversight, thereby improving maintainability, reducing errors, and supporting robust security and ACID compliance in implementations like Oracle Database and MySQL.[9][10] For instance, in some systems, executing a DDL statement implicitly commits any ongoing transaction, illustrating integration where structural changes interact with transaction controls without merging the languages.[9]
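For illustration, one statement from each sublanguage might look as follows (the accounts table and reporting_role here are hypothetical, since the exact objects depend on the schema at hand):

CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL(12,2));  -- DDL: defines structure
GRANT SELECT ON accounts TO reporting_role;                         -- DCL: controls access
INSERT INTO accounts (id, balance) VALUES (1, 100.00);              -- DML: manipulates data
COMMIT;                                                             -- TCL: finalizes the transaction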
History
Origins in Early Database Systems
The origins of data manipulation language (DML) trace back to the late 1960s, when early database systems required structured mechanisms for accessing and modifying data in complex, non-relational models. One foundational influence was the Conference on Data Systems Languages (CODASYL), whose Data Base Task Group (DBTG) released its initial report in October 1969, introducing specifications for a navigational DML tailored to network databases.[12] This DML extended COBOL with commands for record-level operations, enabling programmers to traverse linked records via sets and pointers, as seen in early implementations like the Integrated Database Management System (IDMS), which commercialized these concepts for mainframe environments.[13] The 1969 report emphasized procedural navigation over declarative queries, reflecting the era's focus on efficient handling of interconnected data structures in batch-oriented processing.[14]
Parallel to CODASYL's efforts, IBM developed the Information Management System (IMS) in 1968, originally as part of the ICS/DL/I project for NASA's Apollo program to manage inventory and bills of materials for rocket components.[15] IMS employed a hierarchical model, organizing data in tree-like parent-child relationships, and introduced the Data Language/Interface (DL/I) as its core DML for data access and updates.[15] DL/I calls, such as GU (Get Unique) and ISRT (Insert), allowed sequential or qualified navigation along hierarchical paths, supporting record-oriented manipulation in high-volume transaction environments.[15] This approach prioritized pointer-based traversal for parent-dependent child segments, making it suitable for structured, bill-of-materials applications in mainframe systems.[14]
These early DMLs introduced key concepts like record-oriented manipulation, where operations targeted individual records rather than sets or relations, and pointer-based navigation to follow predefined links in network or hierarchical schemas.[16] They emphasized batch processing for large-scale data handling, often in COBOL-embedded code, but suffered limitations such as the absence of declarative querying, which forced programmers to specify exact access paths and complicated ad-hoc retrievals.[16] CODASYL's set-based navigation and IMS's segment hierarchies, while innovative, required detailed schema knowledge and lacked flexibility for evolving relationships.[14]
The drive toward these navigational DMLs stemmed from the rapid growth of data volumes in mainframe environments during the 1960s, fueled by affordable disk storage and the shift to third-generation computers with online terminals, which overwhelmed traditional file systems and necessitated more systematic data access methods.[14] By the early 1970s, these limitations—particularly the procedural rigidity amid increasing data complexity—paved the way for relational models that promised simpler, set-oriented manipulation.[14]
Standardization and Evolution
The foundations of data manipulation language (DML) in relational databases were laid by Edgar F. Codd's 1970 relational model, which proposed a declarative approach to data querying and manipulation using set-based operations on relations, moving away from navigational methods in earlier systems.[17] This model emphasized non-procedural specifications where users describe what data is needed rather than how to retrieve it, influencing the design of subsequent DMLs.[17]
IBM's System R prototype, initiated in 1974 under Codd's guidance, implemented this model through the SEQUEL language (later renamed SQL), introducing core DML operations like SELECT for querying and INSERT for data addition in a prototype relational database management system (RDBMS).[18] The project demonstrated the feasibility of declarative DML, achieving high data independence and set-oriented processing, which became hallmarks of modern SQL-based DML.[18] Commercial adoption accelerated with Oracle's release of Version 2 in 1979, the first SQL implementation available outside IBM, followed by IBM's DB2 in 1983, which integrated SQL DML into production environments for mainframe systems.[19][20]
The first formal standardization occurred with ANSI's SQL-86 in 1986, which codified essential DML statements such as SELECT, INSERT, UPDATE, and DELETE, establishing a baseline for portable relational DML across systems.[21] Subsequent revisions by ANSI and ISO refined and expanded DML capabilities: SQL-92 introduced outer joins and enhanced query expressiveness for complex data retrieval; SQL:2003 added window functions for advanced analytics over partitions of result sets; SQL:2011 incorporated temporal tables to support time-based data manipulation; and SQL:2023 extended DML with native JSON handling and property graph queries to address semi-structured and graph data integration.[4][22]
This standardization process drove a paradigm shift from procedural, record-at-a-time operations in pre-relational databases to efficient set-based DML, enabling optimizations like query rewriting and index utilization that scaled with growing data volumes.[17] The adoption in commercial DBMS like Oracle and DB2 further propelled this transition, as vendors aligned with ANSI/ISO to ensure interoperability while leveraging relational DML's declarative nature for performance gains.[19][20]
In modern contexts as of 2025, SQL DML continues to evolve through ISO efforts, integrating with big data ecosystems via features like JSON and graph extensions in SQL:2023, with ongoing work toward hybrid querying that bridges relational and distributed systems without delving into non-relational specifics.[4][23]
DML in SQL
Core DML Statements
The core Data Manipulation Language (DML) statements in SQL—SELECT, INSERT, UPDATE, and DELETE—enable users to retrieve, add, modify, and remove data from relational database tables, forming the foundation for data interaction in standardized SQL implementations. These statements adhere to the syntax and semantics defined in the ISO/IEC 9075 SQL standard, with practical implementations in systems like PostgreSQL that closely follow ANSI/ISO guidelines. They support essential operations while integrating with transaction control mechanisms to ensure data consistency; for instance, DML statements can be grouped into transactions managed by statements such as COMMIT and ROLLBACK. General syntax rules shared across these statements include the use of joins in the FROM clause to combine data from multiple tables and subqueries in the WHERE clause for complex filtering; violations of constraints such as foreign keys raise errors that prevent invalid data modifications.
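For illustration, the following sketch (using hypothetical accounts rows and the standard START TRANSACTION syntax) groups two UPDATE statements so that a transfer either takes effect in full or not at all:

START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;  -- debit one account
UPDATE accounts SET balance = balance + 50 WHERE id = 2;  -- credit another
COMMIT;  -- make both provisional changes permanent; ROLLBACK would discard them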
The SELECT statement serves as the primary mechanism for querying and retrieving data from one or more tables, allowing users to specify columns, filter rows, and organize results for analysis or reporting. Its basic syntax is:
SELECT [ALL | DISTINCT] { * | expression [[AS] output_name] } [, ...]
FROM from_item [, ...]
[WHERE condition]
[GROUP BY grouping_element [, ...]]
[ORDER BY expression [ASC | DESC] [, ...]];
Here, the FROM clause identifies source tables or views, potentially incorporating INNER JOIN, LEFT JOIN, or other join types to relate data across tables; the WHERE clause applies conditions, which may include subqueries like (SELECT ... WHERE ...) for nested filtering. GROUP BY groups rows by common values in specified columns, enabling aggregate operations, while ORDER BY sorts the output. For example, SELECT name, age FROM employees WHERE age > 30 ORDER BY age DESC; retrieves and sorts the employee rows meeting the age criterion; with a suitable index, such selective conditions let the engine avoid unnecessary full-table scans. This statement does not modify data but supports read-only retrieval, making it essential for data exploration in relational databases.
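A slightly fuller illustration, assuming hypothetical employees and departments tables, combines a join with a subquery:

SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE e.salary > (SELECT AVG(salary) FROM employees)
ORDER BY e.name;

The join relates each employee to a department row, while the subquery in the WHERE clause restricts the result to employees earning more than the company-wide average.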
INSERT facilitates adding new rows to a table, supporting both single-row and bulk insertions to populate or extend datasets efficiently. The core syntax for single- or multi-row insertion is:
INSERT INTO table_name [(column_name [, ...])]
{ VALUES ( {expression | DEFAULT} [, ...] ) [, ...] | DEFAULT VALUES | query };
When using VALUES, explicit values or DEFAULT placeholders are provided for columns, with omitted columns defaulting to their defined values or NULL if no default exists; the query variant, such as INSERT INTO target SELECT * FROM source;, enables bulk transfer from another table or subquery. Constraints like primary keys or foreign keys are enforced during insertion, triggering errors if violations occur, such as attempting to insert a foreign key value not present in the referenced table. For instance, INSERT INTO employees (name, department_id) VALUES ('Alice', 5); adds a single row while respecting referential integrity, whereas multi-row forms like INSERT INTO employees (name, department_id) VALUES ('Bob', 3), ('Charlie', 4); handle batch operations to minimize overhead.
UPDATE modifies existing rows in a table by altering column values based on specified criteria, ensuring targeted changes without affecting unrelated data. Its fundamental syntax in the core ISO standard is:
UPDATE table_name [ [ AS ] alias ]
SET column_name = { expression | DEFAULT | NULL } [, ...]
[ WHERE condition ];
The SET clause assigns new values, which can reference expressions, defaults, or subqueries for values from other tables; for example, UPDATE employees SET salary = salary * 1.1 WHERE dept_id IN (SELECT id FROM departments WHERE budget > 100000); propagates department-based adjustments across related rows using a subquery. The WHERE clause is crucial to avoid full-table scans and unintended modifications, limiting updates to matching conditions that may involve subqueries or joins within the condition. If an update would violate constraints, such as a foreign key mismatch, the database raises an error, preserving data integrity; some implementations support multi-table updates via extensions like a FROM clause.[24]
DELETE removes rows from a table that satisfy a given condition, providing a precise way to eliminate obsolete or erroneous data while maintaining referential dependencies. The standard syntax is:
DELETE FROM table_name [ [ AS ] alias ]
[ WHERE condition ];
Without a WHERE clause, all rows are deleted, but typically, conditions with subqueries or joins—e.g., DELETE FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE status = 'inactive');—target specific subsets to prevent mass removal. Cascading deletes can be configured via foreign key constraints with ON DELETE CASCADE, automatically propagating deletions to dependent tables upon a successful primary row removal. Unlike TRUNCATE, which is a Data Definition Language (DDL) operation that swiftly clears all rows without logging or conditions, DELETE is logged for rollback purposes and respects WHERE filters, making it suitable for selective cleanup. Constraint violations, such as deleting a row referenced by a foreign key without CASCADE, result in errors to enforce relational integrity. Some implementations extend DELETE with clauses like USING for multi-table operations.[25]
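For illustration, a cascading delete might be set up as follows (the REFERENCES constraint itself is DDL, and the table and column names are hypothetical):

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT REFERENCES customers (id) ON DELETE CASCADE
);
-- Removing a customer now also removes that customer's dependent orders
DELETE FROM customers WHERE id = 42;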
Advanced Features and Clauses
Advanced features in SQL Data Manipulation Language (DML) extend the core capabilities of SELECT, INSERT, UPDATE, and DELETE statements by incorporating analytical processing, complex relational operations, and procedural constructs. These enhancements, introduced progressively through SQL standards, enable more sophisticated data summarization, querying across multiple tables, and conditional modifications without requiring multiple separate statements. They are particularly valuable in scenarios involving large datasets where efficiency and readability are paramount.
Aggregate functions form a foundational advanced element, allowing summarization of data within SELECT statements. Common aggregates include COUNT for tallying rows, SUM for totalizing numeric values, and AVG for computing means, typically paired with the GROUP BY clause to partition results into subsets based on specified columns. For instance, to summarize sales by region, a query might use SELECT region, SUM(sales) AS total_sales FROM orders GROUP BY region;. The HAVING clause further refines these groupings by applying conditions to aggregate results, such as filtering groups where the average exceeds a threshold: SELECT region, AVG(sales) AS avg_sales FROM orders GROUP BY region HAVING AVG(sales) > 1000;. This mechanism, standardized in SQL-92, supports efficient data analysis without post-query processing.[26][27]
Joins and subqueries enable intricate data retrieval by combining or nesting relations. The INNER JOIN syntax, formalized in the ISO SQL standard, merges rows from two tables based on a matching condition, as in SELECT e.name, d.department FROM employees e INNER JOIN departments d ON e.dept_id = d.id;, returning only paired records. OUTER JOIN variants—LEFT, RIGHT, and FULL—preserve unmatched rows from one or both sides, facilitating comprehensive reporting. Subqueries add nesting for conditional logic; a correlated subquery references outer query columns, enabling dynamic filtering like SELECT name FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);, which compares each employee's pay to their department's average. These constructs, part of SQL-92 and refined in later standards, optimize complex filtering over flat operations.[28][29]
Common Table Expressions (CTEs), introduced in the SQL:1999 standard, utilize the WITH clause to define temporary named result sets within a query, improving modularity and recursion handling. A basic CTE might appear as WITH dept_avg AS (SELECT dept_id, AVG(salary) AS avg FROM employees GROUP BY dept_id) SELECT e.name FROM employees e JOIN dept_avg da ON e.dept_id = da.dept_id WHERE e.salary > da.avg;, allowing reuse of the aggregated subresult without repetition. Recursive CTEs extend this for hierarchical data, such as traversing organizational structures: WITH RECURSIVE hierarchy AS (SELECT id, manager_id, name FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.manager_id, e.name FROM employees e JOIN hierarchy h ON e.manager_id = h.id) SELECT * FROM hierarchy;. This feature enhances query readability and supports tree-like traversals in DML contexts.[30][31]
Window functions, added in SQL:2003, perform calculations across row sets without collapsing them via grouping, using the OVER clause for partitioning and ordering. The PARTITION BY subclause divides data into windows, while ORDER BY specifies sequence within each; for example, SELECT name, salary, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rank FROM employees; assigns unique ranks per department based on descending salary. The RANK function handles ties by assigning shared values and skipping subsequent numbers, contrasting ROW_NUMBER's strict sequencing. These functions enable ranking, running totals, and moving averages—e.g., SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) for cumulative sums—preserving all rows for detailed analytics.[32][33]
The MERGE statement, standardized in SQL:2003, supports upsert operations by conditionally inserting, updating, or deleting rows in a target table based on a source dataset. Its syntax integrates a join condition with WHEN clauses: MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED THEN UPDATE SET t.value = s.value WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);, atomically handling matches and non-matches to prevent duplicates and ensure synchronization. This single-statement approach reduces transaction overhead in bulk data loading scenarios, such as ETL processes.[34]
Procedural extensions integrate DML into stored procedures, allowing block-structured programming with control flow. SQL/PSM (Persistent Stored Modules), part of SQL:1999, defines syntax for procedures containing DECLARE, BEGIN...END blocks, and handlers for exceptions. Cursors facilitate row-by-row iteration: DECLARE cur CURSOR FOR SELECT * FROM employees; OPEN cur; FETCH cur INTO vars; CLOSE cur;, enabling fine-grained processing where set-based operations are infeasible, such as custom logging per record. These elements, implemented in systems like DB2 and SQL Server, extend DML for application-like logic within the database.[35][36]
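A minimal SQL/PSM-style sketch, assuming a hypothetical employees table (exact syntax varies between DB2, MySQL, and other implementations), shows a cursor driving row-by-row updates inside a stored procedure:

CREATE PROCEDURE raise_salaries (IN pct DECIMAL(5,2))
LANGUAGE SQL
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_id INT;
    DECLARE cur CURSOR FOR SELECT id FROM employees;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;  -- condition handler ends the loop
    OPEN cur;
    fetch_loop: LOOP
        FETCH cur INTO v_id;
        IF done = 1 THEN LEAVE fetch_loop; END IF;
        UPDATE employees SET salary = salary * (1 + pct / 100) WHERE id = v_id;
    END LOOP;
    CLOSE cur;
END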
DML in Non-Relational Databases
Hierarchical and Network Models
In hierarchical database models, data is organized in a tree-like structure where each record (or segment) has a single parent but can have multiple children, facilitating parent-child relationships. IBM's Information Management System (IMS), introduced in the late 1960s, exemplifies this model through its Data Language/I (DL/I), a procedural DML designed for navigating and manipulating hierarchical data. DL/I uses calls such as GU (Get Unique) to retrieve a specific segment by key, ISRT (Insert) to add new segments under an existing parent, and DLET (Delete) to remove a segment along with its dependents, ensuring structural integrity in tree navigation.[37] These operations are typically embedded in application programs written in languages like COBOL, requiring explicit path traversal from root to leaf segments for data access.[38]
Network database models extend the hierarchical approach by allowing records to have multiple parents and children through linked sets, forming a graph structure for more complex relationships. The CODASYL Data Base Task Group (DBTG) specifications from 1971 defined a DML that operates on these sets, incorporating concepts like schema currency (the current set type) and record currency (the current record instance) to maintain navigational context across operations. Key DML commands include FIND to locate records via qualifiers (e.g., FIND OWNER to position on a set owner or FIND NEXT MEMBER to traverse set members) and GET to retrieve the current record's data.[39] For instance, a navigational path might involve FIND OWNER followed by GET CHILDREN to access dependent records in a set, with updates like MODIFY or DELETE applied to the current currency.[40] These commands support batch-oriented processing, where multiple operations are sequenced in programs to handle linked data efficiently.[16]
Both models found extensive use in legacy applications on mainframe systems, particularly in banking and financial sectors where predictable, high-volume transaction patterns—such as account hierarchies or linked customer records—benefited from their performance advantages in known-access scenarios. IMS, for example, powered critical operations at institutions like First National Bank, enabling rapid retrieval and updates in hierarchical structures for customer data aggregation across departments.[15] Similarly, CODASYL implementations supported complex linkages in inventory and personnel systems on mainframes, offering low-latency navigation for predefined paths. However, their procedural nature introduced significant complexity for ad-hoc queries, as programmers had to manually code traversals without declarative support, often resulting in inflexible and error-prone applications that contributed to the widespread adoption of relational models in the 1980s.[41]
NoSQL and Document-Oriented Approaches
In NoSQL databases, data manipulation languages (DML) diverge from relational SQL paradigms to accommodate unstructured or semi-structured data, emphasizing flexibility, horizontal scalability, and distributed operations over rigid schemas. Document-oriented databases, a prominent NoSQL category, store data as JSON-like documents, enabling DML operations that handle nested structures and varying field sets without predefined tables. These approaches prioritize high-throughput writes and reads in large-scale environments, often trading immediate consistency for availability and partition tolerance as per the CAP theorem.[42]
Document stores like MongoDB exemplify this model through its query language, which supports CRUD operations via methods such as find() for querying documents, insertOne() or insertMany() for creation, updateOne() or updateMany() for modifications, and deleteOne() or deleteMany() for removals. These operations use a JSON-like syntax to manipulate embedded documents and arrays, allowing queries like db.collection.find({ "field": "value" }) to retrieve matching records efficiently. For instance, updates can target specific fields within nested objects, such as incrementing a counter in an array element, supporting complex data hierarchies without joins.[43][44][45]
Key-value stores, such as Redis, employ simple imperative commands for DML, including SET key value to insert or update a value, GET key to retrieve it, and DEL key to remove it, optimized for in-memory operations with sub-millisecond latency. These commands handle atomic operations on strings, hashes, or lists, facilitating caching and real-time applications. In column-family stores like Apache Cassandra, the Cassandra Query Language (CQL) provides a SQL-like interface for wide-column data, with statements like INSERT INTO table (key, column) VALUES (value1, value2), SELECT * FROM table WHERE key = value, UPDATE table SET column = value WHERE key = value, and DELETE FROM table WHERE key = value to manage distributed partitions across clusters. CQL's design supports tunable consistency levels, enabling writes that propagate asynchronously for scalability.[46][47][48]
Graph databases introduce pattern-matching DML through languages like Cypher in Neo4j, which uses declarative clauses such as MATCH (n:Node {property: value}) RETURN n to traverse relationships, CREATE (n:Node {property: value}) to add nodes or edges, and MERGE (n:Node {property: value}) to conditionally create or match patterns, avoiding duplicates in property graphs. This facilitates queries on interconnected data, such as finding shortest paths or communities, without recursive joins. Contemporary trends in NoSQL DML include aggregation pipelines, as in MongoDB's $match, $group, and $sort stages, which process streams of documents for analytics like summing values across collections; integration of full-text search via operators like $text for relevance scoring; and a focus on scalability, where operations favor horizontal distribution over strict ACID compliance. For example, Amazon DynamoDB's UpdateItem API performs upserts by modifying attributes or creating items if absent, often under eventual consistency to ensure high availability in multi-region setups.[49][50][51]
Standards and Implementations
ANSI/ISO Standards
The ANSI/ISO standards for Data Manipulation Language (DML) primarily evolved through the standardization of SQL, beginning with the inaugural SQL-86 standard published by ANSI in 1986 and adopted by ISO in 1987, which established core DML operations including SELECT, INSERT, UPDATE, and DELETE.[52] Subsequent revisions built upon this foundation, with SQL-89 providing minor clarifications, SQL-92 introducing entry-level compliance under FIPS 127-2 for broader interoperability, and progressive updates through SQL:1999, SQL:2003, and beyond, culminating in SQL:2023, which incorporates advanced DML extensions such as support for polymorphic tables and property graph queries via SQL/PGQ.[22][53]
SQL standards define compliance levels to ensure consistent DML functionality across implementations, with the Core level mandating essential features like SELECT statements supporting joins, basic INSERT, UPDATE, and DELETE operations, as well as subqueries and GROUP BY clauses.[54] SQL-92 introduced conformance levels including Entry, Transitional, Intermediate, and Full, refined in later versions like SQL:1999 with Core as a baseline and additional optional feature packages including scrollable cursors for dynamic result set navigation and triggers for automated data modifications, though triggers often intersect with Data Control Language (DCL) aspects.[55][56]
Database management systems (DBMS) achieve certification through conformance testing against these levels; for instance, PostgreSQL demonstrates high Core SQL compliance by supporting at least 170 of the 177 mandatory features in SQL:2023, including comprehensive DML operations, while extending to many optional ones for full interoperability.[57] In contrast, NoSQL databases lack a dedicated ISO standard as of 2025, resulting in varied DML-like query languages without formal compliance frameworks.[58]
These ANSI/ISO standards have fostered global adoption of SQL-based DML, enabling code portability across vendors like Oracle, IBM Db2, and open-source systems, thereby reducing vendor lock-in and promoting standardized data manipulation practices worldwide.[4]
Vendor-Specific Variants
Major database management system (DBMS) vendors extend the ANSI/ISO SQL standards for data manipulation language (DML) to incorporate proprietary features that enhance performance, integration with application logic, or support for specialized data types. These extensions often build on core SQL DML statements like INSERT, UPDATE, and DELETE but introduce vendor-specific syntax and behaviors to address enterprise needs such as auditing, concurrency, or domain-specific operations. While these innovations improve functionality within their ecosystems, they can introduce portability challenges when migrating between systems.
Oracle Database integrates DML deeply with its Procedural Language/SQL (PL/SQL) extension, allowing embedded DML statements within procedures, functions, and triggers for modular application development. PL/SQL enables developers to perform data modifications alongside procedural logic, such as conditional updates or loops over result sets, which streamlines complex transactions. Autonomous transactions, a PL/SQL-specific feature, permit nested transactions that operate independently of the parent transaction, suspending the main one until the autonomous block commits or rolls back; this is useful for logging or auditing without affecting the outer transaction's integrity. Additionally, Oracle's MODEL clause extends SELECT statements to treat query results as multidimensional arrays, applying spreadsheet-like rules for calculations on measures across dimensions, facilitating advanced data warehousing manipulations.
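A minimal PL/SQL sketch (with a hypothetical audit_log table) illustrates how an autonomous transaction lets a logging insert commit independently of the calling transaction:

CREATE OR REPLACE PROCEDURE log_event (p_msg VARCHAR2) AS
    PRAGMA AUTONOMOUS_TRANSACTION;  -- this procedure commits independently of its caller
BEGIN
    INSERT INTO audit_log (logged_at, message) VALUES (SYSTIMESTAMP, p_msg);
    COMMIT;  -- ends only the autonomous transaction; the caller's work remains pending
END;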
Microsoft SQL Server employs Transact-SQL (T-SQL) to augment standard DML with features tailored for auditing and conditional operations. The TOP clause limits the number of rows a statement processes: in SELECT queries with ORDER BY it supports WITH TIES to include additional rows that share the final sort value, and it can also cap the rows affected by INSERT, UPDATE, or DELETE statements. The OUTPUT clause captures inserted, updated, or deleted data (including pseudocolumns like $action in MERGE statements) directly from DML statements, often directing it to a table for audit trails or synchronization without additional queries. SQL Server's MERGE statement, while based on SQL standards, includes T-SQL enhancements such as applying the TOP clause after the join to limit affected rows and integration with OUTPUT for capturing operation details, optimizing upsert-like patterns in data integration workflows.
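For illustration, the OUTPUT clause might feed an audit table directly from an UPDATE (the products and price_audit tables here are hypothetical):

UPDATE products
SET price = price * 1.05
OUTPUT deleted.id, deleted.price, inserted.price, GETDATE()
    INTO price_audit (product_id, old_price, new_price, changed_at)
WHERE category = 'hardware';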
MySQL and its fork MariaDB provide non-standard DML extensions focused on efficiency and geospatial handling. The INSERT ... ON DUPLICATE KEY UPDATE syntax performs an upsert: inserting a row if no unique key conflict exists, or updating specified columns otherwise, which reduces round-trips compared to separate IF EXISTS checks. For spatial data, MySQL's DML integrates with GIS functions like ST_Transform for coordinate system conversions during INSERT or UPDATE, and ST_Buffer for geometric operations, enabling direct manipulation of geometry types (e.g., POINT, LINESTRING) in tables with spatial indexes.
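For illustration, a hit counter keyed on a unique page column (hypothetical page_hits table) can be maintained with a single upsert statement:

INSERT INTO page_hits (page, hits)
VALUES ('/home', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;  -- update the existing row instead of raising a duplicate-key error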
PostgreSQL adheres closely to SQL standards but adds extensions for complex data types in DML. Its INSERT ... ON CONFLICT (UPSERT) clause handles conflicts on unique constraints or exclusions by specifying DO UPDATE actions, including WHERE conditions for selective modifications, which atomically manages inserts and updates. PostgreSQL supports array types natively, allowing DML to manipulate multidimensional arrays via operators like array_append or unnest in UPDATE statements for bulk element operations. For JSON/JSONB, DML integrates operators such as || for concatenation and #> for path extraction, enabling targeted updates to nested structures (e.g., UPDATE table SET jsonb_column = jsonb_set(jsonb_column, '{key}', '"value"')) without full document replacement.
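For illustration, an upsert keyed on a unique email column (hypothetical users table with a JSONB profile column) can combine ON CONFLICT with the JSONB concatenation operator:

INSERT INTO users (email, profile)
VALUES ('alice@example.com', '{"plan": "pro"}')
ON CONFLICT (email)
DO UPDATE SET profile = users.profile || EXCLUDED.profile;  -- merge the new keys into the stored JSONB document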
Other vendors offer niche DML variants. IBM Db2's pureXML feature stores XML documents natively in columns, supporting DML operations like INSERT of well-formed XML via XMLPARSE or UPDATE with XMLQUERY for in-place modifications using XPath/XQuery, preserving hierarchical structure without shredding. SQLite, designed for embedded use, uses PRAGMA statements as lightweight extensions to influence DML behavior, such as PRAGMA journal_mode=WAL for concurrent writes during multi-statement transactions or PRAGMA synchronous=NORMAL for faster INSERT/UPDATE at minor durability cost. In cloud environments, Amazon Aurora (compatible with MySQL/PostgreSQL) incorporates serverless scaling into DML execution; as of 2025, Aurora Serverless v2 automatically adjusts capacity from 0 to 256 Aurora Capacity Units (ACUs) based on DML workload, pausing at 0 ACUs for idle periods to optimize costs without manual intervention.[59]
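As a brief SQLite sketch (with a hypothetical events table), PRAGMA settings issued before a transaction shape how the subsequent DML is journaled and synchronized:

PRAGMA journal_mode = WAL;    -- write-ahead logging allows readers during concurrent writes
PRAGMA synchronous = NORMAL;  -- fewer fsyncs, faster INSERT/UPDATE at a small durability cost
BEGIN;
INSERT INTO events (ts, payload) VALUES (strftime('%s', 'now'), 'started');
COMMIT;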
These vendor-specific DML variants trade enhanced functionality—such as integrated auditing in SQL Server or native XML handling in Db2—for reduced portability, as code relying on proprietary syntax like Oracle's MODEL clause or MySQL's ON DUPLICATE KEY UPDATE requires rewriting for other systems, potentially increasing migration efforts.