XML database
An XML database is a specialized database management system designed to store, retrieve, manage, and query data in XML format, preserving its hierarchical structure and enabling efficient processing without conversion to other models like relational tables.[1] These systems support standards such as XQuery for querying, XPath for navigation, and XML Schema for validation and constraints, facilitating the handling of structured, semi-structured, and unstructured content.[2]
XML databases are categorized into two primary types: native XML databases, which use XML as the fundamental data model and store documents in their native form with dedicated indexing and retrieval mechanisms, and XML-enabled relational databases, which extend traditional relational database management systems (RDBMS) to incorporate XML storage and operations alongside SQL capabilities.[2] Native systems, such as eXist-db, offer scalability for millions of documents, ACID transaction support, and high-performance queries, making them suitable for applications like document management and web content repositories.[3] In contrast, XML-enabled RDBMS like Oracle XML DB provide interoperability between XML and relational data, supporting features such as B-tree indexing, full-text search, and protocol access via HTTP, FTP, or WebDAV.[2]
The development of XML databases arose in the late 1990s with the rise of XML as a standard for data exchange, initially through extensions in RDBMS like Oracle8i in 1999, which introduced basic XML support, evolving to full native integration in Oracle9i Release 2 in 2002.[4] These databases address challenges in handling semi-structured data, reducing the complexity and errors associated with "shredding" XML into relational schemas, and enabling faster development cycles for applications in sectors like government and finance.[1] By supporting W3C standards including XML 1.0, XQuery 1.0, and XPath 2.0, XML databases ensure portability and integration with modern web technologies.[2]
Fundamentals
Definition and History
An XML database is a data persistence system designed to store, manage, and retrieve data in XML format, where XML documents serve as the primary unit of data rather than fixed-schema tables or simple key-value pairs.[5] This approach enables the handling of hierarchical, semi-structured, and document-oriented data, preserving the native tree structure of XML while supporting operations like validation against schemas and transformation.[5]
The emergence of XML databases traces back to the late 1990s, coinciding with the World Wide Web Consortium's (W3C) release of XML 1.0 as a recommendation on February 10, 1998, which established XML as a standardized format for structured data exchange on the web.[6] Early prototypes and specifications followed, including the XML:DB API, a collaborative effort initiated in 2000 by the XML:DB Initiative to define a vendor-neutral interface for accessing XML database functionality.[7] The technology gained momentum in the early 2000s, fueled by the proliferation of web services—such as SOAP—and the demand for efficient management of document-centric applications in publishing, configuration files, and content syndication.[5]
W3C standardization efforts began in earnest after the XML Query Working Group was chartered in 1999, building on earlier proposals such as XML-QL from 1998. The group's first XQuery working drafts appeared in 2001.[8] A pivotal milestone came in 2007 with the publication of XQuery 1.0 as a W3C recommendation, introducing a Turing-complete query language capable of expressing complex operations over XML data sources, including joins, recursion, and functional programming constructs.[9][10] By the mid-2000s, XML databases saw widespread adoption in enterprise systems for integrating heterogeneous data in sectors like finance and healthcare, where XML's extensibility and schema support proved advantageous.[5]
Despite the ascendancy of JSON as a lighter alternative for web APIs and NoSQL stores in the 2010s, XML databases have persisted in niche applications through the 2020s, particularly where rigorous validation, interoperability with legacy standards, and complex hierarchical querying remain essential, as evidenced by ongoing support in major database systems like Oracle Database 26ai (as of 2025), though the XML DB Repository was deprecated in that release.[11][12]
Rationale and Benefits
XML's hierarchical and self-describing structure makes it particularly well-suited for managing semi-structured data, such as documents, configuration files, and reports, where rigid tabular schemas would impose unnecessary constraints.[13] This flexibility allows XML databases to handle varying data structures without predefined schemas, accommodating evolving information needs in applications like content management and data integration.[14] Unlike flat files or early semi-structured stores, XML databases offer persistent storage, efficient querying, and transactional support, enabling scalable management of complex, nested data hierarchies.[15]
Key benefits include native support for namespaces, which prevent name conflicts and enable modular reuse of vocabularies across documents, enhancing interoperability in diverse systems.[16] Validation through XML Schema Definition (XSD) ensures data integrity by enforcing structural and type constraints, facilitating reliable data exchange in protocols like SOAP web services.[17] Additionally, XML's text-based format is human-readable, aiding debugging and maintenance, while preserving document order and whitespace critical for formats like reports.[14]
Despite these advantages, XML's verbosity results in larger storage requirements compared to binary or relational formats, increasing overhead for large datasets.[15] Parsing XML introduces computational costs, particularly for deep hierarchies, making it less efficient for numerical or highly tabular data where relational models excel.[18] Nonetheless, the XML database market demonstrates niche viability, driven by ongoing needs in document-centric applications.
Types of XML Databases
XML-Enabled Databases
XML-enabled databases refer to relational database management systems (RDBMS) that have been extended to support XML data storage and processing alongside traditional tabular data. These systems typically store XML documents in dedicated columns, such as those using a native XML data type or as character large objects (CLOB), while relying on the underlying relational engine for management without providing fully native, schema-agnostic XML handling.[2][19][14]
Key features of XML-enabled databases include the ability to shred XML documents—extracting and mapping elements or attributes into relational tables for efficient querying—and support for importing and exporting XML data via standard SQL extensions. Shredding allows XML content to be decomposed into normalized relational structures, enabling the use of SQL joins and indexes on extracted data. Prominent examples include IBM DB2, which introduced XML columns with the pureXML feature in DB2 9 released in 2006, allowing storage of well-formed XML in native format within relational tables.[20][21] Oracle XML DB, integrated into the Oracle Database since release 9i in 2002, provides high-performance XML storage in XMLType columns and supports shredding through SQL functions. Microsoft SQL Server added the xml data type in version 2005, enabling native storage and methods like nodes() for shredding XML into rowsets.[22][14]
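The shredding idea can be sketched without a commercial RDBMS. The following minimal Python example uses SQLite from the standard library in place of Db2, Oracle, or SQL Server. It assumes a hypothetical purchase-order vocabulary (order, customer, item, price) and maps each repeating element to a row of a child table, so that ordinary SQL joins and indexes apply:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are illustrative only.
ORDER_XML = """<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"><price>9.50</price></item>
  <item sku="B-7" qty="1"><price>24.00</price></item>
</order>"""

def shred_order(conn, xml_text):
    """Decompose one <order> document into rows of two normalized tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (order_id, customer) VALUES (?, ?)",
                 (order_id, root.findtext("customer")))
    # Each repeating <item> element becomes one row of the child table.
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO order_items (order_id, sku, qty, price) VALUES (?, ?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty")), float(item.findtext("price"))),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders (order_id),
                              sku TEXT, qty INTEGER, price REAL);
    CREATE INDEX order_items_by_sku ON order_items (sku);
""")
shred_order(conn, ORDER_XML)

# After shredding, plain SQL joins and aggregates work on the former XML content.
total = conn.execute("""
    SELECT o.customer, SUM(i.qty * i.price)
    FROM orders o JOIN order_items i ON i.order_id = o.order_id
    GROUP BY o.order_id
""").fetchall()
print(total)  # [('Acme Corp', 43.0)]
```

A production system would derive the mapping from a schema rather than from hand-written code. Rebuilding the original document would also require the inverse mapping.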
These databases offer advantages such as leveraging established SQL infrastructure for querying mixed relational and XML workloads, enabling hybrid storage models where XML complements structured data, and providing scalability through relational indexing techniques applied to shredded XML components. By integrating XML support into mature RDBMS platforms, organizations can handle semi-structured data without migrating to entirely new systems, thus maintaining performance for large-scale operations.[2][19]
For instance, in IBM DB2, a basic SQL query can extract elements from an XML column using XPath predicates, such as checking for the existence of a node:
```sql
SELECT XMLQUERY('$d/root/y' PASSING XMLColumn AS "d")
FROM table_name
WHERE XMLEXISTS('$d/root/y' PASSING XMLColumn AS "d");
```
This query returns the y elements found under root, as an XML value, only for rows whose XMLColumn contains such a node. XMLEXISTS filters the rows and XMLQUERY extracts the matching fragment, demonstrating how XML path expressions embed directly within SQL.
Native XML Databases
Native XML databases are database management systems that store and query XML documents using XML itself as the core data model, treating collections of documents as the primary storage unit rather than decomposing them into relational tables or other structures. This approach allows for schema-optional storage, where documents can vary in structure without requiring predefined schemas, and preserves the inherent hierarchical, ordered, and semi-structured nature of XML data.[23][24]
These systems are document-centric, focusing on maintaining the full fidelity of XML documents to support complex navigation and retrieval over large corpora, often with built-in optimizations for tree-based structures and full-text search. They enable direct processing of XML queries without intermediate transformations, making them suitable for applications involving irregular or evolving data formats.[25][26]
Notable examples include MarkLogic, an enterprise multi-model database that handles XML alongside JSON and RDF for scalable content management; Virtuoso, a multi-model server that combines a native XML store with relational and RDF engines; Oracle Berkeley DB, which provides native XML storage through its Berkeley DB XML component, built on top of the Berkeley DB key-value engine; eXist-db, an open-source Java-based system offering schema-less storage, rapid prototyping, and browser-based development tools; and BaseX, a lightweight engine emphasizing high-performance XQuery execution and support for large datasets.[27][28][29]
Native XML databases emerged in the late 1990s and early 2000s as XML gained prominence for data interchange, with significant development during that decade driven by the need for specialized handling of semi-structured data. By 2025, they occupy a niche role in legacy systems, document archiving, and specialized publishing, maintaining stable but modest popularity; DB-Engines tracks seven such systems, with MarkLogic leading at a score of 4.10, followed by Virtuoso (2.91), Oracle Berkeley DB (1.64), BaseX (1.47), and eXist-db (0.66) (as of November 2025).[27]
Unlike relational databases, which require shredding XML into normalized tables and may lose document order or hierarchy, native XML databases store data in its original form to support efficient, structure-aware operations without such preprocessing. In contrast to XML-enabled databases, native systems avoid mapping XML to relational schemas, prioritizing direct XML fidelity over integration with SQL environments.[30][31]
Querying and Standards
XML Query Languages
XML query languages provide specialized mechanisms for navigating, selecting, and transforming hierarchical XML data structures, enabling precise traversal of document trees and construction of new outputs. These languages emphasize path-based expressions and functional constructs to handle XML's nested nature efficiently, distinct from relational query paradigms.[32]
XPath serves as a foundational W3C standard for addressing parts of XML documents through concise path expressions that select nodes, attributes, or values based on their position, type, or content. XPath 2.0 became a W3C Recommendation on January 23, 2007, introducing support for sequences, functions, and XML Schema types to enhance expressiveness over earlier versions.[33] XPath 3.0 (2014) added higher-order functions. XPath 3.1, published as a W3C Recommendation on March 21, 2017, further extends the language with maps and arrays for more flexible data manipulation.[32] For instance, the expression /root/child[@attr='value'] selects the elements named child under the document's root element root whose attr attribute equals 'value'.[32]
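Python's standard xml.etree.ElementTree module implements only a small XPath subset, but that subset is enough to try the predicate above. The sketch below uses a made-up document whose element names mirror the example. It is not a full XPath processor: it offers no functions, maps, or arrays, and it evaluates paths relative to an element:

```python
import xml.etree.ElementTree as ET

# Toy document; element names mirror the /root/child[@attr='value'] example.
doc = ET.fromstring(
    "<root>"
    "<child attr='value'>first</child>"
    "<child attr='other'>second</child>"
    "<child attr='value'>third</child>"
    "</root>"
)

# ElementTree paths are relative, so the absolute path /root/child[@attr='value']
# becomes the step child[@attr='value'] taken from the root element.
matches = doc.findall("child[@attr='value']")
print([m.text for m in matches])  # ['first', 'third']

# The descendant axis (//) is written with a leading "." in ElementTree.
print(len(doc.findall(".//child")))  # 3
```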
XQuery builds upon XPath as a full-featured, Turing-complete functional query language designed for querying and updating XML collections, supporting complex operations like joining, sorting, and constructing results. XQuery 1.0 achieved W3C Recommendation status on January 23, 2007, establishing its core syntax and semantics for declarative querying.[34] XQuery 3.0, a W3C Recommendation from April 8, 2014, added grouping, windowing, try/catch error handling, higher-order functions, and annotations such as %private for private functions. XQuery 3.1, released as a W3C Recommendation on March 21, 2017, added maps, arrays, and JSON support to broaden applicability.[35] Central to XQuery are FLWOR expressions, which enable iterative processing with clauses for iteration, binding, filtering, ordering, and returning results; a representative example is:
```xquery
for $doc in collection('docs')
let $title := $doc/title
where $title = 'Example'
return <result>{$title}</result>
```
This query iterates over a collection of documents, binds the title element, filters by content, and constructs a new XML result element.[35] XQuery also supports modules for reusable components and an optional static typing feature based on XML Schema types. Updates come through the XQuery Update Facility 1.0, a W3C Recommendation from March 17, 2011. Its insert, delete, replace, and rename expressions do not take effect immediately. Instead, they accumulate in a pending update list that is applied when the query completes, so the query itself sees an unchanged snapshot. Its copy-modify (transform) expression changes a copy of a node while leaving the original untouched.[36]
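Readers without an XQuery processor can map each FLWOR clause onto ordinary Python control flow. The following is an analogy, not a description of how XQuery engines evaluate queries. A list of parsed elements stands in for collection('docs'), with each entry playing the role of the node bound to $doc. The where test is existential, as XQuery's general comparison = is when $title holds several nodes:

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for collection('docs'): a list of already-parsed elements.
DOCS = [
    ET.fromstring("<doc><title>Example</title><year>2007</year></doc>"),
    ET.fromstring("<doc><title>Other</title><year>2011</year></doc>"),
    ET.fromstring("<doc><title>Example</title><year>2017</year></doc>"),
]

def string_value(el):
    """Concatenated descendant text, roughly XQuery's atomized string value."""
    return "".join(el.itertext())

def flwor(collection):
    results = []
    for doc in collection:                                   # for $doc in collection('docs')
        titles = doc.findall("title")                        # let $title := $doc/title
        if not any(string_value(t) == "Example" for t in titles):  # where $title = 'Example'
            continue
        result = ET.Element("result")                        # return <result>{$title}</result>
        # Element constructors copy the enclosed nodes, so the sources stay unchanged.
        result.extend([copy.deepcopy(t) for t in titles])
        results.append(result)
    return results

print([ET.tostring(r, encoding="unicode") for r in flwor(DOCS)])
# ['<result><title>Example</title></result>', '<result><title>Example</title></result>']
```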
XSLT complements querying with transformation capabilities, defining a stylesheet-based language for converting XML source trees into alternative formats, such as other XML documents or HTML. XSLT 2.0, a W3C Recommendation from January 23, 2007, introduced grouping, user-defined functions, and multiple result documents to support advanced restructuring.[37] Version 3.0, published as a W3C Recommendation on June 8, 2017, introduced streaming for large inputs, higher-order functions, and packages for modular stylesheets.[38] Transformations rely on template rules matched via XPath patterns and processing modes. For example, templates can apply selectively to elements, enabling recursive tree-walking to generate outputs like formatted reports from raw XML.[38]
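Template-rule dispatch can be approximated with a table of functions keyed by element name. In this hypothetical sketch, the report and entry vocabulary is invented. Elements without a rule fall through to behaviour resembling XSLT's built-in rules, which process children and copy text. Real XSLT also resolves conflicts between matching patterns by priority, which plain name lookup does not model:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def apply_templates(el):
    """Like <xsl:apply-templates/>: process text and child elements in document order."""
    parts = [escape(el.text or "")]
    for child in el:
        rule = TEMPLATES.get(child.tag, apply_templates)  # no rule: built-in-style recursion
        parts.append(rule(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)

def report_rule(el):   # roughly <xsl:template match="report">
    return "<html><body>" + apply_templates(el) + "</body></html>"

def entry_rule(el):    # roughly <xsl:template match="entry">
    return "<p>" + escape(el.get("name", "")) + ": " + apply_templates(el) + "</p>"

TEMPLATES = {"report": report_rule, "entry": entry_rule}

def transform(root):
    """Start by matching the root element, then recurse through the tree."""
    return TEMPLATES.get(root.tag, apply_templates)(root)

source = ET.fromstring(
    '<report><entry name="cpu">42</entry><entry name="mem"><value>7</value></entry></report>'
)
print(transform(source))
# <html><body><p>cpu: 42</p><p>mem: 7</p></body></html>
```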
XPath, XQuery, and XSLT form a cohesive family of W3C Recommendations, with their 2.0 and 3.x versions stabilizing core features by 2017 to ensure interoperability across tools and systems.[32][35][38] As of 2025, ongoing discussions within the XQuery and XSLT Extensions Community Group (QT4CG) have produced editor's drafts for XQuery 4.0, incorporating extensions like enhanced JSONiq integration and improved concurrency, though it remains in development without full W3C endorsement.[39]
SQL/XML in Relational Systems
SQL/XML is a component of the ISO/IEC 9075 SQL standard, specifically Part 14 (ISO/IEC 9075-14), which defines mechanisms for integrating XML data and operations into relational SQL queries. First published in 2003, the standard enables relational database management systems (RDBMS) to import, store, manipulate, and export XML alongside traditional relational data. The current edition, the sixth, was published in 2023. It maintains and extends these capabilities while aligning with broader SQL evolutions.[40][41]
Key functions in SQL/XML facilitate XML processing within SQL statements. The XMLExists function evaluates an XPath or XQuery expression against an XML value and returns a boolean indicating whether any nodes match, commonly used in predicates for filtering rows. For instance, it can check for the presence of specific elements in an XML column to refine query results. Similarly, XMLQuery executes an XPath or XQuery expression and returns the resulting XML fragment, allowing extraction of structured XML subsets. A representative example in IBM Db2 is:
```sql
-- ? is a parameter marker that supplies the product name at execution time
SELECT XMLQUERY('$d//item[productName = $n]'
                PASSING prod_info AS "d",
                        CAST(? AS VARCHAR(100)) AS "n")
FROM products;
```
For each row, this query returns the item elements in the XML column prod_info whose productName equals the value bound to $n. Additionally, XMLSerialize converts an XML value to a string, binary, or large object type for easier handling in non-XML contexts, supporting serialization options like document or content modes. These functions collectively enable precise XML navigation and transformation directly in SQL.[42][43]
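SQLite has no SQL/XML support, but registering Python functions can imitate the pattern of evaluating path expressions inside SQL. The names xml_exists and xml_query below are hypothetical stand-ins for XMLEXISTS and for XMLQUERY followed by XMLSERIALIZE. Both accept only ElementTree's XPath subset rather than XQuery:

```python
import sqlite3
import xml.etree.ElementTree as ET

def xml_exists(xml_text, path):
    """Rough analog of XMLEXISTS: 1 if the path selects at least one element, else 0."""
    if xml_text is None:
        return None                      # SQL NULL in, NULL out
    return int(ET.fromstring(xml_text).find(path) is not None)

def xml_query(xml_text, path):
    """Rough analog of XMLQUERY + XMLSERIALIZE: all matches serialized as one string."""
    if xml_text is None:
        return None
    matches = ET.fromstring(xml_text).findall(path)
    for node in matches:
        node.tail = None                 # the parsed tree is throwaway; drop trailing text
    return "".join(ET.tostring(node, encoding="unicode") for node in matches)

conn = sqlite3.connect(":memory:")
conn.create_function("xml_exists", 2, xml_exists)
conn.create_function("xml_query", 2, xml_query)
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, prod_info TEXT)")
conn.executemany(
    "INSERT INTO products (prod_info) VALUES (?)",
    [("<catalog><item><productName>Widget</productName></item></catalog>",),
     ("<catalog><item><productName>Gadget</productName></item></catalog>",)],
)

path = "item[productName='Widget']"     # passed as a parameter, like a bound variable
rows = conn.execute(
    "SELECT id, xml_query(prod_info, ?) FROM products WHERE xml_exists(prod_info, ?)",
    (path, path),
).fetchall()
print(rows)  # [(1, '<item><productName>Widget</productName></item>')]
```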
SQL/XML supports hybrid querying by embedding XPath or XQuery snippets within SQL, bridging relational and semi-structured data paradigms. For XML publishing—converting relational tables to XML—functions like XMLELEMENT, XMLATTRIBUTES, XMLFOREST, and XMLAGG construct XML structures from scalar or aggregate data; for example, aggregating rows into nested XML elements. Conversely, parsing XML into relational form uses the XMLTable function, which applies an XQuery to generate a virtual table with rows and columns from XML nodes. In Oracle Database, an example extracts warehouse details from an XML column:
```sql
SELECT x.warehouse_id, x.warehouse_name
FROM warehouses,
     XMLTABLE('/Warehouse'
              PASSING warehouse_spec
              COLUMNS warehouse_id   NUMBER       PATH '@Id',
                      warehouse_name VARCHAR2(30) PATH 'Name') x;
```
This produces relational output from nested XML, ideal for joining with other tables. Such integration allows XML-enabled RDBMS like Db2 and Oracle to handle semi-structured data incrementally, facilitating gradual adoption without migrating to native XML databases.[44][45]
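Both directions, publishing rows as XML and parsing XML back into rows, can be sketched with ElementTree. The function names publish_warehouses and xml_table are illustrative only. The element names follow the Oracle example above, and the column specification loosely imitates XMLTABLE's COLUMNS ... PATH clause, with a leading @ marking an attribute:

```python
import xml.etree.ElementTree as ET

def publish_warehouses(rows):
    """Rows -> XML, loosely like XMLAGG(XMLELEMENT(... XMLATTRIBUTES(...) ...))."""
    root = ET.Element("Warehouses")
    for wid, name in rows:
        wh = ET.SubElement(root, "Warehouse", Id=str(wid))  # attribute from a column
        ET.SubElement(wh, "Name").text = name                # child element from a column
    return ET.tostring(root, encoding="unicode")

def xml_table(xml_text, row_path, columns):
    """XML -> rows, loosely like XMLTABLE(row_path COLUMNS name PATH ...).

    columns maps an output column to (relative path or '@attribute', converter)."""
    rows = []
    for node in ET.fromstring(xml_text).findall(row_path):
        row = {}
        for col, (path, convert) in columns.items():
            raw = node.get(path[1:]) if path.startswith("@") else node.findtext(path)
            row[col] = None if raw is None else convert(raw)
        rows.append(row)
    return rows

doc = publish_warehouses([(1, "Southlake"), (2, "San Francisco")])
print(doc)
table = xml_table(doc, "Warehouse",
                  {"warehouse_id": ("@Id", int), "warehouse_name": ("Name", str)})
print(table)
# [{'warehouse_id': 1, 'warehouse_name': 'Southlake'},
#  {'warehouse_id': 2, 'warehouse_name': 'San Francisco'}]
```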
The evolution of SQL/XML reflects ongoing adaptations in data standards. While the SQL:2016 revision (ISO/IEC 9075:2016) introduced native JSON support through functions analogous to SQL/XML—such as JSON_QUERY and JSON_EXISTS—for handling JSON in relational queries, XML remains a core feature for legacy systems, enterprise integrations, and domains requiring schema validation or complex hierarchies. This duality ensures SQL/XML's continued relevance in hybrid environments where XML persists alongside emerging formats.[46][47]
Architecture and Features
Storage and Indexing
Native XML databases use diverse storage models to maintain the hierarchical and semi-structured nature of XML data while optimizing for query performance and space efficiency. Node-based storage breaks XML documents into individually stored nodes linked by parent-child relationships and document order. eXist-db is an example: it persists documents as a DOM-like tree of nodes that can be navigated directly.[48] Container-based storage, alternatively, treats XML documents as self-contained units or collections, grouping related documents without fully decomposing the hierarchy, which facilitates bulk operations but may require additional parsing for fine-grained access. Independently of these models, the on-disk encoding may take one of two forms. Textual storage retains XML in its original serialized form for simplicity and human readability. Binary storage encodes data in a compact, non-human-readable format to reduce I/O overhead and improve retrieval speed.[49]
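A generic node table can illustrate node-based storage. This sketch is not eXist-db's on-disk format. It assumes a simple SQLite schema in which each element is a row keyed by its parent and sibling position, with attributes in a side table. Text and tail strings are kept, so the original document, including mixed content, can be rebuilt exactly:

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent INTEGER, position INTEGER,
                    tag TEXT, text TEXT, tail TEXT);
CREATE TABLE attrs (node_id INTEGER, name TEXT, value TEXT);
CREATE INDEX nodes_by_parent ON nodes (parent, position);
"""

def store(conn, xml_text):
    """Persist each element as one row; (parent, position) keeps hierarchy and order."""
    next_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM nodes").fetchone()[0]
    root_id = next_id + 1

    def visit(el, parent_id, position):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)",
                     (node_id, parent_id, position, el.tag, el.text, el.tail))
        conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)",
                         [(node_id, name, value) for name, value in el.attrib.items()])
        for i, child in enumerate(el):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)
    return root_id

def load(conn, node_id):
    """Rebuild the subtree rooted at node_id from its rows."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM nodes WHERE id = ?", (node_id,)).fetchone()
    attrs = conn.execute(
        "SELECT name, value FROM attrs WHERE node_id = ? ORDER BY rowid", (node_id,)).fetchall()
    el = ET.Element(tag, dict(attrs))
    el.text, el.tail = text, tail
    children = conn.execute(
        "SELECT id FROM nodes WHERE parent = ? ORDER BY position", (node_id,)).fetchall()
    for (child_id,) in children:
        el.append(load(conn, child_id))
    return el

SOURCE = '<book id="b1"><title>XML</title><chapter n="1">Intro<note>see</note>more</chapter></book>'
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
root_id = store(conn, SOURCE)
# Navigating to children is an indexed lookup on (parent, position).
print(conn.execute("SELECT tag FROM nodes WHERE parent = ? ORDER BY position",
                   (root_id,)).fetchall())  # [('title',), ('chapter',)]
print(ET.tostring(load(conn, root_id), encoding="unicode") == SOURCE)  # True
```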
To enhance storage efficiency, some native XML databases incorporate compressed representations such as the Efficient XML Interchange (EXI) format, a binary XML standard that uses grammar-based compression to achieve up to 90% size reduction compared to plain text XML while supporting schema-informed or schema-less modes.[50] EXI integration allows for faster parsing and lower memory usage in resource-constrained environments, though it requires decompression during certain query operations.[50]
Indexing techniques in native XML databases are designed to accelerate queries over the document's structure, content, and text. Structural indexes maintain pointers for hierarchical relationships, such as parent-child and ancestor-descendant links, enabling efficient XPath traversal without repeated tree walks. For instance, path summaries approximate document structures as compact graphs to prune irrelevant subtrees during query evaluation.[51] Value indexes target attributes, element content, and metadata. They typically use B-trees, which support both equality and range queries, or hash tables, which support equality lookups on textual or numeric values.[52] Full-text indexes handle keyword searches across document corpora and are often built on Apache Lucene. eXist-db uses Lucene both for its full-text index and for its range index on node values, while BaseX maintains its own full-text index with support for fuzzy and relevance-ranked retrieval.[48]
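Two of the index kinds above can be sketched in a few lines. The example assigns each element pre-order and post-order ranks. This is a classic interval encoding under which an ancestor-descendant test becomes two integer comparisons. The example also builds a DataGuide-style path summary and a value index keyed by element name and text. The in-memory dictionaries are illustrative stand-ins for persistent B-trees:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_indexes(root):
    """Return (pre/post ranks, path summary, value index) for one document."""
    ranks = {}                        # element -> (pre, post)
    path_summary = defaultdict(int)   # label path such as '/lib/book' -> occurrences
    value_index = defaultdict(list)   # (tag, text) -> elements with that text
    pre_counter = 0
    post_counter = 0

    def visit(el, parent_path):
        nonlocal pre_counter, post_counter
        pre = pre_counter
        pre_counter += 1
        path = parent_path + "/" + el.tag
        path_summary[path] += 1
        text = (el.text or "").strip()
        if text:
            value_index[(el.tag, text)].append(el)
        for child in el:
            visit(child, path)
        ranks[el] = (pre, post_counter)   # post rank is assigned after all children
        post_counter += 1

    visit(root, "")
    return ranks, path_summary, value_index

def is_ancestor(ranks, a, d):
    """a is a proper ancestor of d iff pre(a) < pre(d) and post(a) > post(d)."""
    (pre_a, post_a), (pre_d, post_d) = ranks[a], ranks[d]
    return pre_a < pre_d and post_a > post_d

doc = ET.fromstring("<lib><book><title>XML</title></book><book><title>SQL</title></book></lib>")
ranks, paths, values = build_indexes(doc)
sql_title = values[("title", "SQL")][0]   # value-index lookup instead of a tree walk
first_book = doc.find("book")
print(dict(paths))  # {'/lib': 1, '/lib/book': 2, '/lib/book/title': 2}
print(is_ancestor(ranks, doc, sql_title), is_ancestor(ranks, first_book, sql_title))  # True False
```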
Performance considerations in XML database storage and indexing emphasize scalability for large datasets and efficient update handling. Clustering techniques partition documents based on structural similarity or query patterns, reducing search spaces and improving parallel query execution.[53] Persistence strategies balance in-memory caching for low-latency access with disk-based storage for durability. Many systems, such as eXist-db (version 6.4.0 as of May 2025), take a hybrid approach: frequently accessed pages and indexes are cached in RAM, while changes are written to a journal (write-ahead log) so that committed transactions can be recovered after a crash.[54][28] Updates are optimized to avoid full document re-parsing by leveraging incremental indexing, where only modified subtrees are reprocessed, preserving overall system responsiveness.[51]
In native XML databases, schema-agnostic indexing allows dynamic adaptation to varying document structures without predefined schemas. MarkLogic's Universal Index is one implementation: it automatically builds structural, lexical, and semantic indexes across heterogeneous XML collections for unified querying.[55] This differs from XML-enabled databases that use shredded storage. There, a schema mapping assigns elements and attributes to relational tables and columns so that RDBMS indexes apply, at the cost of join overhead when documents are reconstructed.[56]
APIs and Access Methods
XML databases provide a range of programmatic interfaces to enable developers to perform create, read, update, and delete (CRUD) operations on XML data, as well as execute queries and manage collections. These APIs are designed to abstract the underlying storage mechanisms, allowing applications to interact with both native XML databases and XML-enabled relational systems without vendor-specific code. Common interfaces emphasize portability across implementations, supporting standards from organizations like the W3C and the Java Community Process (JCP).[57]
The XML:DB API is a foundational Java-based interface for native XML databases, offering methods for managing collections of XML resources and executing XPath queries. It includes core classes for database connections, resource handling, and services such as XPath execution, enabling portable applications that can switch between compliant databases like eXist-db or BaseX. This API supports basic CRUD operations on XML documents stored in collections, treating the database as a hierarchical structure similar to a file system.[57][58]
For full XQuery access from Java, the XQuery API for Java (XQJ 1.0), developed under JSR 225, standardizes access to XQuery processors from Java applications. It allows submission of XQuery expressions to XML data sources, processing results as sequences of XDM (XQuery Data Model) items, and management of query contexts including namespaces and variables. XQJ promotes interoperability by reducing vendor lock-in, with implementations available in systems like Oracle XML DB and Saxon.[59][60]
In C environments, the XQuery API for C (XQC) provides a standardized binding for XQuery processors, focusing on compiling and executing queries while managing static contexts. It interfaces with the XQuery Data Model for input and output handling, supporting multiple processor implementations to facilitate integration in performance-critical applications. XQC emphasizes mechanisms for error handling and module management, making it suitable for embedded use cases.[61]
RESTful interfaces offer lightweight, HTTP-based access without requiring dedicated client libraries. For instance, eXist-db's REST API maps HTTP methods to database operations: GET for retrieving or querying documents, PUT for uploading XML resources to collections, POST for submitting XQuery updates, and DELETE for removal. This enables seamless integration with web applications, with features like result caching and pagination for efficient handling of large datasets.[62]
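The following sketch only constructs the HTTP requests, using Python's standard urllib, so it can be inspected without a running server. It assumes eXist-db's default REST endpoint at http://localhost:8080/exist/rest, placeholder admin credentials, and the _query, _start, and _howmany request parameters. Check all of these against the documentation for the installed version before relying on them:

```python
import base64
import urllib.parse
import urllib.request

# Assumptions: default eXist-db REST endpoint and placeholder credentials.
BASE_URL = "http://localhost:8080/exist/rest"
USER, PASSWORD = "admin", ""

def _headers(extra=None):
    """HTTP Basic authentication header, plus any extra headers."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token, **(extra or {})}

def store_request(db_path, xml_text):
    """PUT uploads (or replaces) an XML resource, e.g. /db/docs/a.xml."""
    return urllib.request.Request(
        BASE_URL + db_path,
        data=xml_text.encode("utf-8"),
        headers=_headers({"Content-Type": "application/xml"}),
        method="PUT",
    )

def fetch_request(db_path):
    """GET retrieves a stored document."""
    return urllib.request.Request(BASE_URL + db_path, headers=_headers(), method="GET")

def query_request(collection_path, xquery, start=1, howmany=10):
    """GET with _query runs an ad hoc XQuery; _start/_howmany page through the hits."""
    params = urllib.parse.urlencode({"_query": xquery, "_start": start, "_howmany": howmany})
    return urllib.request.Request(
        f"{BASE_URL}{collection_path}?{params}", headers=_headers(), method="GET")

def delete_request(db_path):
    """DELETE removes a document or collection."""
    return urllib.request.Request(BASE_URL + db_path, headers=_headers(), method="DELETE")

req = query_request("/db/docs", "//title")
print(req.get_method(), req.full_url)
# GET http://localhost:8080/exist/rest/db/docs?_query=%2F%2Ftitle&_start=1&_howmany=10
```

To send a request, pass it to urllib.request.urlopen() inside a with block and read the response body.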
In XML-enabled relational databases, extensions to standard database connectivity protocols support XML operations. SQL Server, for example, integrates XML data types with ADO.NET through the SqlXml class, allowing retrieval and manipulation of XML columns via SQL queries that incorporate XQuery or XPath. This provides client-side processing of XML streams, parameter passing for XML inputs, and compatibility with .NET applications for hybrid relational-XML workflows.[63]
Access methods extend beyond APIs to include protocols for collaborative and distributed environments. WebDAV, as implemented in eXist-db, enables file-system-like management of XML collections over HTTP, supporting operations such as copying, moving, and editing documents using standard clients like Windows Explorer or oXygen XML Editor. This protocol facilitates remote document versioning and locking, bridging XML databases with content management tools.[64][65]
SOAP and REST services further enhance accessibility, with many XML databases exposing endpoints for service-oriented architectures. Transaction support ensures data integrity during multi-operation sequences; MarkLogic, for instance, provides full ACID compliance, allowing atomic commits across document updates and queries via its APIs, which maintain isolation and durability even in distributed clusters.[66]
Security features are integral to these APIs, with role-based access control (RBAC) enforcing granular permissions. In MarkLogic, RBAC operates at the document and element levels, assigning roles to users for read, write, or execute privileges, integrated directly into API calls to prevent unauthorized access. This model supports secure multi-tenancy and compliance with standards like Common Criteria.[67]
Modern extensions in 2025-era systems include built-in support for versioning and replication. eXist-db's versioning module tracks document revisions by storing diffs between changes, accessible via API extensions for rollback and audit trails. MarkLogic offers database replication for high availability, synchronizing forests across clusters through configurable APIs that maintain ACID properties during failover. These features enhance resilience in enterprise deployments without compromising XML-native access.[68][69]
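Diff-based versioning can be illustrated with a reverse-delta scheme of the kind RCS uses. The newest revision is kept in full, and each older revision is stored as a delta that rebuilds it from its successor. This is a generic sketch built on Python's difflib.SequenceMatcher. It does not reproduce eXist-db's versioning module or MarkLogic's replication, and the VersionedDocument class is hypothetical:

```python
import difflib

def make_delta(new, old):
    """Instructions that rebuild `old` from `new`: copy a slice of new, or emit literal lines."""
    matcher = difflib.SequenceMatcher(a=new, b=old, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # unchanged run: store only the range
        elif j2 > j1:
            delta.append(("lines", old[j1:j2]))   # replace/insert: store the old lines
        # 'delete' (lines present only in new) needs no instruction
    return delta

def apply_delta(new, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(new[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class VersionedDocument:
    """Hypothetical store: full text of the latest revision plus reverse deltas."""

    def __init__(self, text):
        self.current = text.splitlines(keepends=True)
        self.deltas = []   # deltas[k] rebuilds revision k from revision k + 1

    def update(self, new_text):
        new = new_text.splitlines(keepends=True)
        self.deltas.append(make_delta(new, self.current))
        self.current = new

    def revision(self, n):
        """Revision n (0 = original), reached by walking deltas back from the latest."""
        lines = self.current
        for delta in reversed(self.deltas[n:]):
            lines = apply_delta(lines, delta)
        return "".join(lines)

v0 = "<doc>\n<title>Draft</title>\n<body>a</body>\n</doc>\n"
v1 = "<doc>\n<title>Final</title>\n<body>a</body>\n</doc>\n"
v2 = "<doc>\n<title>Final</title>\n<body>a</body>\n<note>ok</note>\n</doc>\n"
doc = VersionedDocument(v0)
doc.update(v1)
doc.update(v2)
print(doc.revision(0) == v0, doc.revision(1) == v1, doc.revision(2) == v2)  # True True True
```

Keeping the newest revision whole makes the common case (reading the latest version) cheap. Older revisions pay the cost of replaying deltas.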
Integration and Applications
Hybrid Systems and Integration
Hybrid systems in XML databases often employ multi-model architectures that natively support XML alongside other data formats, such as JSON and RDF, to handle diverse data requirements without extensive conversions. For instance, MarkLogic Server functions as a multi-model NoSQL database that integrates document-oriented XML storage with semantic RDF triples and JSON documents, enabling unified querying across these models in a scalable, distributed environment.[70] This approach addresses the limitations of single-model systems by allowing seamless ingestion and retrieval of heterogeneous data, as seen in its use for enterprise content management where XML schemas coexist with graph-based RDF relationships.[71]
Middleware solutions facilitate XML-relational mapping by transforming XML structures into relational schemas or vice versa, often leveraging XSLT for declarative style-sheet-based conversions. XSLT transformations enable the mapping of hierarchical XML elements to flat relational tables, preserving data integrity during integration while supporting bidirectional flows in enterprise applications.[72] Tools like Altova MapForce extend this by providing visual ETL capabilities for XML-to-relational imports, automating schema mapping and data loading into databases such as Oracle or SQL Server.[73]
Integration techniques commonly involve ETL processes to import XML data into relational systems, exemplified by Oracle's XMLType, which stores XML natively while allowing extraction into relational tables via SQL functions like XMLTABLE for shredding complex documents.[74] Federated queries further enhance this by enabling SQL/XML to join XML content with relational data across distributed sources, as supported in systems like IBM DB2, where remote XML wrappers allow transparent access without data replication.[75] In microservices architectures, API gateways serve as intermediaries, routing requests to XML databases and performing on-the-fly transformations, such as converting XML responses to JSON for client compatibility.[76]
In contemporary ecosystems as of 2025, XML databases integrate with JSON-dominant systems through XQuery 3.1 features supported by processors such as eXist-db. These include the standard function fn:xml-to-json(), which serializes XML written in a W3C-defined JSON vocabulary (map, array, string, number, boolean, and null elements) as JSON text for interoperability in web applications.[77] Cloud document stores such as Amazon DocumentDB offer only partial XML support. They store XML as document fields that require custom parsing for querying, because the service focuses on MongoDB-compatible JSON documents.[78] For big data environments, Hadoop processes legacy XML via specialized input formats and readers such as Hadoop Streaming's StreamXmlRecordReader, enabling MapReduce jobs to extract and aggregate XML elements at scale without full schema enforcement.[79]
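fn:xml-to-json, defined in XPath and XQuery Functions and Operators 3.1, does not accept arbitrary XML. It expects elements such as map, array, string, number, boolean, and null in the namespace http://www.w3.org/2005/xpath-functions, with a key attribute on map entries. The Python sketch below imitates that mapping for well-formed input only. It skips the specification's error checks and escaping options, and it only approximates the specification's number formatting:

```python
import json
import xml.etree.ElementTree as ET

FN = "{http://www.w3.org/2005/xpath-functions}"

def to_value(el):
    """Convert one element of the JSON-XML vocabulary to the matching Python value."""
    tag = el.tag
    if tag == FN + "map":
        return {child.get("key"): to_value(child) for child in el}
    if tag == FN + "array":
        return [to_value(child) for child in el]
    if tag == FN + "string":
        return el.text or ""
    if tag == FN + "number":
        number = float(el.text)
        # Approximation: print integral values without a fractional part.
        return int(number) if number.is_integer() else number
    if tag == FN + "boolean":
        return el.text.strip() in ("true", "1")
    if tag == FN + "null":
        return None
    raise ValueError(f"not part of the JSON-XML vocabulary: {tag}")

def xml_to_json(xml_text):
    """Serialize a JSON-XML document as compact JSON text."""
    return json.dumps(to_value(ET.fromstring(xml_text)), separators=(",", ":"))

sample = (
    '<map xmlns="http://www.w3.org/2005/xpath-functions">'
    '<string key="title">Example</string>'
    '<number key="year">2017</number>'
    '<array key="tags"><string>xml</string><string>json</string></array>'
    '<boolean key="draft">false</boolean>'
    '<null key="note"/>'
    '</map>'
)
print(xml_to_json(sample))
# {"title":"Example","year":2017,"tags":["xml","json"],"draft":false,"note":null}
```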
Key challenges in these hybrid integrations include an impedance mismatch analogous to the object-relational one. XML's hierarchical, schema-flexible nature conflicts with the rigid tabular structure of relational databases, leading to inefficient mappings and potential loss of order or structure during transformations.[80] Performance bottlenecks arise from repeated parsing and conversion overhead, particularly in high-volume ETL pipelines, necessitating optimized indexing and caching strategies to mitigate latency.[81]
Use Cases and Industry Applications
XML databases are widely applied in document management, particularly within publishing workflows, where XML structures enable efficient content repositories for media and digital humanities projects. Native XML databases such as eXist-db support versioning and transformation of XML documents into various formats like web pages, PDFs, and APIs, facilitating collaborative editing and long-term preservation in content-heavy environments.[82][83]
In data exchange scenarios, XML databases underpin financial reporting through the XBRL standard, which uses XML-based tagging to standardize and automate the communication of business and financial information across systems and regulatory bodies.[84][85] Similarly, in healthcare, HL7 Clinical Document Architecture (CDA) documents are stored and queried via XQuery in XML databases, enabling scalable sharing and analysis of clinical data while ensuring semantic interoperability.[86][87]
Enterprise applications leverage XML databases for configuration management, as seen in frameworks like Spring, where XML schema-based files define bean configurations and application settings for modular deployment.[88] During legacy system migrations, hybrid relational-XML database management systems preserve existing XML schemas alongside relational data, minimizing disruptions in transitional environments.[89]
As of 2025, XML databases maintain a niche role in compliance-intensive sectors, supporting structured data handling in financial (via XBRL) and healthcare (via HL7 CDA) reporting, as well as government and legal documentation requiring standardized formats.[90] Their adoption has waned for new web applications in favor of lighter formats. XML itself nonetheless remains ubiquitous in formats such as OOXML, which Microsoft Office uses to package documents, spreadsheets, and presentations as zipped collections of XML parts.[91] In research, datasets such as the Web of Science XML feeds provide raw metadata from over 12,500 journals, enabling large-scale bibliometric analysis.[92]
Looking ahead, XML databases are expected to coexist with JSON-oriented systems in hybrid integrations, with the market for XML databases software estimated to reach approximately $329 million in 2025.[93]