Multi-model database

A multi-model database is a type of database management system (DBMS) that natively supports multiple data models, such as relational, document (e.g., JSON or XML), graph, key-value, and spatial, within a single, integrated backend, allowing diverse data types to be stored, queried, and managed without requiring separate specialized databases. This approach addresses the challenges of handling heterogeneous data in modern applications by providing unified administration, security, scalability, and high availability features across all supported models.

Key benefits of multi-model databases include reduced operational complexity, as organizations avoid the overhead of maintaining multiple siloed systems for different data formats. They enable efficient querying using a common language or extensions, such as SQL with added support for graph patterns (e.g., MATCH clauses), JSON functions, XML querying, and spatial operators, often with indexing tailored to each model.

Notable implementations include Oracle AI Database 26ai (as of 2025), which supports JSON documents via Simple Oracle Document Access (SODA), property graphs with analytics, RDF semantic graphs, and spatial data; Azure SQL, which integrates these capabilities into its relational engine using T-SQL extensions; and Azure Cosmos DB, a multi-model service supporting document, key-value, wide-column, graph, and spatial models.

The rise of multi-model databases reflects the evolution of database systems to accommodate big data, cloud-native applications, and polyglot programming, with benchmarks emerging to evaluate performance across models such as document, graph, and key-value stores. These systems prioritize optimized storage formats, such as binary JSON representations, and cross-model query capabilities to support complex, real-world workloads in industries such as finance and healthcare.

Overview

Definition and Characteristics

A multi-model database is a database management system (DBMS) that natively supports multiple data models, such as relational, document, graph, and key-value, within a single, integrated backend, enabling storage, querying, and management of diverse data types without requiring a separate system for each model. Applications can use data structures and access methods tailored to specific needs while a unified backend handles all data operations.

Key characteristics of multi-model databases include a unified storage layer that manages various data formats and structures in one repository, model-agnostic querying that supports operations across different models via a single interface or query language, and the elimination of data silos by consolidating heterogeneous data sources. Unlike polyglot persistence, which relies on multiple specialized databases and brings increased complexity, integration overhead, and potential inconsistencies, multi-model databases provide polyglot capabilities within a single system, which simplifies operation and scaling.

These databases evolved to overcome the rigidity of traditional single-model systems, such as relational DBMSs limited to structured data or NoSQL silos optimized for one model but inflexible for others, by handling mixed data in a way that supports the varied workloads of modern applications. This flexibility addresses the challenges of data diversity without the drawbacks of fragmented architectures.

Historical Development

The concept of multi-model databases emerged in the early 2010s, building on innovations in NoSQL databases to address the growing need for handling diverse data types within a unified system rather than relying on separate specialized databases. This development responded to the challenges of polyglot persistence, a term popularized by software architect Martin Fowler in his 2011 bliki post, which described using multiple database technologies, each tailored to specific application needs, to manage varying data storage requirements. One of the pioneering systems, OrientDB, was first released in 2010 by Luca Garulli, integrating document, graph, key-value, and object-oriented models into a scalable database. The term "multi-model database" itself was formally introduced by Garulli in May 2012 during his keynote at the NoSQL Matters Conference in Cologne, Germany, envisioning an evolution of first-generation NoSQL products to support broader use cases through integrated backends.

Between 2014 and 2018, multi-model databases gained traction with key releases that demonstrated practical viability and enterprise appeal. ArangoDB, initially launched as AvocadoDB in 2011 and renamed in 2012, established itself as an open-source option supporting document, graph, and key-value models with a focus on query flexibility via its AQL query language. Similarly, Microsoft introduced Azure Cosmos DB in 2017 as a globally distributed, multi-model service, evolving from the internal Project Florence started in 2010 to handle large-scale, multi-tenant applications across key-value, document, graph, and column-family models.

Post-2020, the adoption of multi-model databases accelerated, driven by the demands of cloud-native architectures and AI-driven workloads that require integration of structured, semi-structured, and unstructured data. Systems like SurrealDB, first released in 2022, have advanced this trend through ongoing developments up to 2025, emphasizing real-time querying, extensibility, and flexible deployment to support distributed AI applications.

This growth reflects broader shifts in data management, including the transition from rigid relational database management systems (RDBMS), which dominated from the 1970s onward, to the scalable but fragmented NoSQL paradigms that followed. The rise of multi-model approaches was further influenced by the big data explosion, where frameworks like Apache Hadoop (initially released in April 2006) exposed the "variety" challenge in processing heterogeneous datasets, prompting hybrid designs that unify storage and querying without sacrificing performance. By consolidating models into single engines, these databases mitigated the operational overhead of polyglot persistence while adapting to modern data ecosystems.

Supported Data Models

Common Models

Multi-model databases typically support a variety of standard data models to accommodate diverse application needs, including relational, document, graph, key-value, column-family, spatial, vector, and time-series structures. These models allow users to store and manage different types of data within a unified system, using each model's strengths for specific use cases such as structured queries, semi-structured storage, or relationship traversals.

The relational model organizes data into tabular structures with rows and columns, supporting SQL-like querying, ACID transactions for data integrity, and operations like joins to relate multiple tables efficiently. This model is particularly suited for applications requiring strict schema enforcement and complex analytical queries, as implemented in systems like Azure SQL Database, which extends traditional relational capabilities to multi-model environments.

The document model stores data as self-contained, semi-structured units in formats like JSON or XML, offering schema flexibility to handle varying data shapes without rigid predefined structures. It excels in scenarios involving hierarchical or nested data, such as user profiles, where rapid ingestion and retrieval are prioritized over fixed schemas, as seen in ArangoDB's native document collections.

The graph model represents data as nodes, edges, and properties to capture complex relationships and interconnections, enabling efficient traversals for relationship-heavy datasets like social networks or recommendation engines. This approach supports queries that follow paths through connected entities, providing insights into networks that tabular models struggle with.

The spatial model handles geographic and geometric data, supporting queries for location-based analysis, proximity searches, and mapping applications using standards like GeoJSON or Well-Known Text (WKT). It is suited to use cases in logistics, urban planning, and environmental monitoring, with native support in systems like Oracle Database and ArangoDB.

Key-value and column-family models provide foundational storage for high-performance access patterns. The key-value model uses simple pairs for fast lookups and caching, which suits session data or configuration stores with minimal overhead. Column-family models, akin to wide-column stores, organize data into dynamic columns within rows for scalable handling of sparse, semi-structured information like logs or sensor readings, as exemplified by Azure Cosmos DB's Cassandra API.

Emerging support for vector and time-series models addresses modern demands in AI/ML and real-time analytics as of 2025. The vector model stores high-dimensional embeddings for similarity searches and AI applications, such as semantic retrieval for large language models. The time-series model manages timestamped sequential data for temporal analysis, supporting efficient aggregation and forecasting in IoT or financial applications, as provided by SurrealDB.
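
To make the contrast between these models concrete, the following minimal sketch holds small in-memory stand-ins for four of them and runs one typical access against each. It is illustrative only: the names (`customers`, `orders`, `order_doc`, `edges`, `kv`) and the sample data are hypothetical, and plain Python structures stand in for what a real multi-model engine would store and index.

```python
import json

# Relational model: fixed columns; rows relate through a foreign key.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"id": 100, "customer_id": 1, "total": 42.5},
    {"id": 101, "customer_id": 2, "total": 19.0},
]

def join_orders_to_customers():
    """Emulate a join of orders to customers on customer_id."""
    by_id = {c["id"]: c for c in customers}
    return [(by_id[o["customer_id"]]["name"], o["total"]) for o in orders]

# Document model: one self-contained, nested record with a flexible shape.
order_doc = {
    "_key": "100",
    "customer": {"name": "Ada"},
    "items": [{"sku": "A-1", "qty": 2}],
}

# Graph model: nodes connected by labeled edges (source, label, target).
edges = [("Ada", "FRIEND_OF", "Grace"), ("Grace", "FRIEND_OF", "Linus")]

def neighbors(node, label):
    """Follow outgoing edges with the given label from one node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

# Key-value model: an opaque value fetched by its exact key.
kv = {"session:ada": json.dumps({"cart": ["A-1"]})}

if __name__ == "__main__":
    print(join_orders_to_customers())       # relational join
    print(order_doc["items"][0]["qty"])     # nested document access
    print(neighbors("Ada", "FRIEND_OF"))    # one-hop graph traversal
    print(json.loads(kv["session:ada"]))    # key lookup
```

The same customer appears in every representation; what changes is which question is cheap to ask of it.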

Extensibility and User-Defined Models

Multi-model databases enhance flexibility by supporting user-defined models, which allow developers to create custom data structures tailored to specific application needs without altering the core system. These models are typically defined through extension mechanisms in which users specify new item types, constraints, and relationships using declarative constructs like the format (<ITEM NAME, ITEM TYPE, ITEM CONSTRAINT>). For instance, custom geospatial models can be built by extending graph-based structures with path filters to handle spatial queries, while event-sourced models use document-oriented structures with matching filters for temporal event tracking. This approach enables the integration of domain-specific semantics while preserving compatibility with built-in models.

Extensibility features in multi-model databases further support customization through plugin architectures, schema-on-read paradigms, and API hooks that allow new models to be added without backend modifications. Plugin architectures permit the registration of characteristic filters or functions that extend query processing for novel data types. Schema-on-read approaches, such as those employing supply-driven inference, dynamically interpret heterogeneous data sources, ranging from relational to graph-based, allowing on-demand extensions of existing schemas with minimal upfront definition. API hooks provide entry points for injecting domain-specific behaviors, such as custom indexing or validation, directly into the query engine. These features collectively support scalable adaptation, as demonstrated by tools that unify schemas across models using record schema descriptions (RSD) to capture integrity constraints and inter-model references.

In practice, these capabilities enable multi-model databases to adapt to industry-specific requirements. In finance, extensibility allows the creation of custom models by extending multidimensional cubes with market data feeds, improving OLAP analyses for volatile conditions. For IoT applications, hybrid sensor models can be user-defined to integrate time-series and graph elements. By 2025, the integration of AI into database management has supported advances in schema evolution, reducing manual configuration in evolving data ecosystems.
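
The declarative triple format mentioned above, (<ITEM NAME, ITEM TYPE, ITEM CONSTRAINT>), can be sketched as follows. This is a hypothetical illustration rather than any particular system's API: the `Item` class, the `validate` function, and the `geo_point` model are names invented here, and the constraint is modeled as a plain Python predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)
class Item:
    """One <ITEM NAME, ITEM TYPE, ITEM CONSTRAINT> triple (hypothetical encoding)."""
    name: str
    item_type: type
    constraint: Callable[[Any], bool]

def validate(model: List[Item], record: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for item in model:
        if item.name not in record:
            errors.append(f"{item.name}: missing")
            continue
        value = record[item.name]
        if not isinstance(value, item.item_type):
            errors.append(f"{item.name}: expected {item.item_type.__name__}")
        elif not item.constraint(value):
            errors.append(f"{item.name}: constraint failed")
    return errors

# A user-defined geospatial point model built from three triples.
geo_point = [
    Item("lat", float, lambda v: -90.0 <= v <= 90.0),
    Item("lon", float, lambda v: -180.0 <= v <= 180.0),
    Item("label", str, lambda v: len(v) > 0),
]

if __name__ == "__main__":
    print(validate(geo_point, {"lat": 48.1, "lon": 11.6, "label": "Munich"}))
    print(validate(geo_point, {"lat": 123.0, "lon": 11.6, "label": "x"}))
```

Keeping the model as data (a list of triples) rather than code is what lets such definitions be registered at runtime without touching the engine itself.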

System Architecture

Core Design Principles

Multi-model database systems are engineered around a unified backend that serves as a single, integrated storage layer capable of handling diverse data models such as relational, document, graph, and key-value without requiring separate engines or polyglot persistence approaches. This design minimizes overhead by sharing core infrastructure services like transactions, recovery, and indexing across models, ensuring data consistency and reducing the complexity of managing multiple disparate systems. By consolidating storage, these systems avoid the procedural integration challenges of traditional polyglot setups, allowing for more efficient resource utilization and simpler administration.

To work with varied models, multi-model databases employ abstraction layers, often in the form of unified APIs or intermediaries like object-relational mappers, that translate operations between models without exposing underlying complexities to applications. These layers enable declarative access to multiple models through a single interface, supporting transformations such as SQL queries over JSON or XML documents, which improves developer productivity by abstracting model-specific details. For instance, views and query rewriters act as logical intermediaries, permitting flexible organization independent of physical storage while maintaining model fidelity.

Scalability and consistency in multi-model databases involve trade-offs guided by the CAP theorem, where systems prioritize availability and partition tolerance for distributed workloads while often favoring relaxed consistency to accommodate diverse model requirements like high-throughput key-value operations alongside ACID-compliant relational transactions. This balance is achieved through tunable consistency models, such as eventual consistency for scalable, fault-tolerant scenarios and stricter guarantees for critical data, enabling horizontal scaling across large, semi-structured datasets without sacrificing overall system reliability. In practice, sharding and adaptive indexing support massive data volumes, sustaining performance under varying loads from common models like graphs and documents.

Security and governance are reinforced through unified access controls that apply consistently across all supported models, typically via role-based access control (RBAC) policies that enforce fine-grained permissions and prevent unauthorized cross-model data exposure. This centralized approach simplifies compliance by providing a single governance framework for auditing and policy enforcement, reducing the risks associated with fragmented controls. For example, attribute-based controls can restrict access to parts of a document, ensuring secure handling of hybrid data while maintaining operational efficiency.
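
One common way to realize tunable consistency in replicated stores is the quorum rule: with N replicas, a read quorum of R and a write quorum of W are guaranteed to share at least one replica when R + W > N. The sketch below applies that rule to hypothetical per-model profiles. The quorum technique is a general one and is not described in the text above as the mechanism of any specific product; the names `quorum_overlaps`, `PROFILES`, and `describe` are assumptions made for illustration.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

# Hypothetical per-model settings for a 3-replica cluster.
PROFILES = {
    "relational": {"r": 2, "w": 2},  # overlapping quorums for critical data
    "key_value": {"r": 1, "w": 1},   # fastest path, eventually consistent
}

def describe(n: int, profiles: dict) -> dict:
    """Label each model's profile as 'strong' or 'eventual' under the quorum rule."""
    return {
        model: "strong" if quorum_overlaps(n, p["r"], p["w"]) else "eventual"
        for model, p in profiles.items()
    }

if __name__ == "__main__":
    print(describe(3, PROFILES))
```

Raising W makes writes slower but lets R shrink, and vice versa, which is the trade-off a tunable system exposes per operation or per model.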

Storage and Indexing Mechanisms

Multi-model databases typically employ a unified storage engine to manage diverse data models such as documents, graphs, and key-value pairs, often building on document-oriented structures or extending key-value stores to accommodate relational and graph elements. For instance, systems like ArangoDB use RocksDB, an LSM-tree-based engine optimized for high write throughput, to persist all data models in a single layer, where documents serve as the foundational unit and graph edges are represented as specialized documents linking vertices. In contrast, OrientDB uses tree-based and hash-based indexes for efficient read operations across its multi-model support, including object-oriented extensions for relational-like queries. These engines balance write-heavy workloads, served by LSM-trees with sequential appends, against point lookups, served by read-optimized B-trees, so that heterogeneous data can be integrated without model-specific silos.

Indexing strategies in multi-model databases are designed to support queries across models, incorporating composite indexes for relational joins, full-text indexes for document searches, and traversal indexes for graph queries. Composite indexes, often built on multiple attributes, support efficient relational operations by combining keys from document or key-value stores, as seen in ArangoDB's hash and skiplist indexes that span document and graph elements. Full-text indexes employ inverted structures to handle semi-structured content, while graph-specific traversal indexes use adjacency lists or pointers to enable rapid navigation, with OrientDB's traversal mechanism reported to support millisecond-level queries regardless of database scale. Adaptive indexing approaches adjust dynamically based on query patterns, selecting model-appropriate structures, such as B-trees for ordered relational access or Bloom filters for probabilistic key-value lookups, to optimize performance across mixed workloads.

Data representation in multi-model databases relies on unified serialization formats to store heterogeneous data efficiently, often using binary encodings such as binary JSON to embed diverse models within a common structure. For example, graphs are typically represented via adjacency lists embedded in document collections, allowing key-value pairs to serve as properties and relational tuples to map onto composite keys, as implemented in systems like ArcNeural, which uses memory-mapped files for vectors. Schema evolution tools, such as MM-evolver, support propagating changes across models, such as adding attributes to documents or altering edges, while maintaining consistency through versioned mappings and categorical transformations. This enables flexible handling of evolving schemas without disruptions.
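
The idea of storing graph edges as specialized documents that link vertices, with an adjacency index built on top, can be sketched in a few lines. This is a simplified in-memory illustration, not a storage engine: the `_id`, `_from`, and `_to` attribute names follow the convention ArangoDB uses for edge documents, while `build_out_index`, `traverse`, and the sample data are hypothetical.

```python
from collections import defaultdict, deque

# Vertex documents, addressed key-value style by their _id.
vertices = {
    "users/ada": {"_id": "users/ada", "name": "Ada"},
    "users/grace": {"_id": "users/grace", "name": "Grace"},
    "users/linus": {"_id": "users/linus", "name": "Linus"},
}

# Edge documents: ordinary documents that carry _from and _to references.
edge_docs = [
    {"_id": "follows/1", "_from": "users/ada", "_to": "users/grace"},
    {"_id": "follows/2", "_from": "users/grace", "_to": "users/linus"},
    {"_id": "follows/3", "_from": "users/linus", "_to": "users/ada"},
]

def build_out_index(edges):
    """Adjacency index: source vertex id -> list of target vertex ids."""
    index = defaultdict(list)
    for edge in edges:
        index[edge["_from"]].append(edge["_to"])
    return index

def traverse(start, out_index, max_depth):
    """Breadth-first traversal returning vertex ids reachable within max_depth hops."""
    seen = {start}
    reached = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in out_index.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append((nxt, depth + 1))
    return reached

if __name__ == "__main__":
    idx = build_out_index(edge_docs)
    for vid in traverse("users/ada", idx, max_depth=2):
        print(vertices[vid]["name"])
```

Because edges are just documents, the same collection can be filtered or aggregated with document queries, while the adjacency index keeps each traversal step proportional to a vertex's out-degree rather than to the total number of edges.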

Querying and Interfaces

Query Languages

Multi-model databases employ a variety of query languages to handle operations across diverse data models, typically through unified languages that abstract underlying complexities or model-specific subsets routed via a single interface. Unified query languages, such as the ArangoDB Query Language (AQL), enable querying of key-value, document, and graph models within a single syntax, supporting declarative operations like traversals and joins without requiring model-specific switches. Similarly, extensions to SQL, including SQL/JSON as standardized in ISO/IEC 9075:2016, allow relational databases to query JSON documents alongside tabular data; PostgreSQL, for example, adds its own containment operator (@>) and path expressions, effectively supporting hybrid relational-document models.

Model-specific query subsets are often integrated into multi-model systems to take advantage of specialized paradigms while maintaining a unified access point. For instance, in databases like ArcadeDB, SQL handles relational queries, Cypher supports pattern matching for property graphs (e.g., MATCH (n:Hero)-[:IsFriendOf]->(m) RETURN n, m), and Gremlin enables traversal-based graph operations, all executable through a consistent interface such as the system's Java API or web console. These subsets allow developers to apply graph-specific languages like Cypher or Gremlin for complex relationship queries without abandoning relational SQL for structured data, with the database routing requests internally across models.

Advanced features in these languages support cross-model interactions, such as joins between graph edges and documents or aggregation pipelines that summarize data from multiple sources. In AQL, for example, queries can perform graph traversals followed by aggregations like counting connected components across document collections, optimizing for multi-model storage targets. SQL++ variants extend this by incorporating path queries and object-relational mappings for unified aggregations over JSON and relational data. The Graph Query Language (GQL), standardized as ISO/IEC 39075:2024, defines a standard language for property graphs, and the related SQL/PGQ extension (ISO/IEC 9075-16:2023) brings property graph querying into SQL, enabling multi-model systems to handle graph patterns alongside relational and document data.

As of November 2025, natural-language interfaces using large language models (LLMs) are an emerging trend in database querying, primarily through tools that translate plain English prompts into SQL (NL2SQL), with growing exploration for broader data models. These tools aim to enable non-experts to query enterprise-scale databases while balancing accuracy and latency, though adoption for cross-model operations across graphs, documents, and vectors remains in early stages.
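
To show what a containment test such as the @> operator does in a hybrid relational-document query, the sketch below approximates the semantics in plain Python and combines it with an ordinary column predicate. It is an approximation under stated assumptions: `contains` and `select_ids` are hypothetical helper names, the rows are invented sample data, and the real operator has additional rules (for example, a special case that lets an array contain a bare primitive) that are deliberately left out.

```python
def contains(doc, pattern):
    """Approximate JSON containment: does doc contain pattern?"""
    if isinstance(pattern, dict):
        # Every key in the pattern must exist in doc with a contained value.
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        # Every pattern element must be contained by some doc element;
        # order and duplicates do not matter.
        return isinstance(doc, list) and all(
            any(contains(d, p) for d in doc) for p in pattern
        )
    if isinstance(doc, bool) or isinstance(pattern, bool):
        return doc is pattern  # keep JSON true/false distinct from 1/0
    return doc == pattern

# Rows with ordinary columns plus one JSON document column ("profile").
rows = [
    {"id": 1, "region": "EU", "profile": {"tags": ["vip", "beta"], "address": {"city": "Oslo"}}},
    {"id": 2, "region": "EU", "profile": {"tags": ["beta"], "address": {"city": "Rome"}}},
    {"id": 3, "region": "US", "profile": {"tags": ["vip"], "address": {"city": "Austin"}}},
]

def select_ids(region, pattern):
    """Relational predicate on a column combined with document containment."""
    return [
        row["id"]
        for row in rows
        if row["region"] == region and contains(row["profile"], pattern)
    ]

if __name__ == "__main__":
    print(select_ids("EU", {"tags": ["vip"]}))
    print(select_ids("EU", {"address": {"city": "Rome"}}))
```

The point of the hybrid form is that a single predicate list mixes a column comparison with a structural match on the document, so no second system or export step is needed.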

APIs and Access Methods

Multi-model databases typically expose APIs that enable model-agnostic interactions, allowing developers to perform operations across diverse models without switching interfaces. RESTful APIs are widely adopted for their simplicity and compatibility with web-based applications, providing endpoints for CRUD operations on relational, document, graph, and key-value data. GraphQL endpoints add flexibility by letting clients specify exact data requirements, reducing over-fetching in scenarios involving multiple models. Language-specific drivers, such as JDBC extensions for Java and SDKs for other languages, support unified access to these models in polyglot environments.

Access protocols in multi-model databases prioritize efficiency and versatility to handle varied workloads. gRPC serves as a high-performance protocol for low-latency, bidirectional communication, particularly suited for microservices architectures querying hybrid data structures. WebSockets enable real-time, persistent connections for streaming updates across models, supporting applications like live analytics on graph and document data. Data virtualization patterns allow hybrid queries by virtually unifying multiple backend stores, enabling cross-model joins without data duplication.

Integration capabilities extend multi-model databases into broader ecosystems, with connectors carrying data to streaming platforms like Kafka for real-time ingestion and processing of multi-structured events. Compatibility with BI tools via standard ODBC/JDBC drivers supports analytics workflows, allowing unified reporting on relational and non-relational data. As of 2025, serverless access models have gained prominence, offering auto-scaling APIs without infrastructure management, as seen in cloud-native implementations that handle variable loads across data models.
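
A model-agnostic CRUD interface of the kind described above can be sketched as a single dispatcher whose paths name the model and the key. This is a toy illustration, not a real server or any vendor's API: the `MultiModelGateway` class, its path layout (`/<model>/<key>`), and the status codes it returns are assumptions chosen to resemble REST conventions, and in-memory dictionaries stand in for the backend.

```python
class MultiModelGateway:
    """Toy dispatcher: one entry point, one path scheme, several models."""

    def __init__(self):
        # One in-memory store per model; a real system would share one backend.
        self._stores = {"document": {}, "keyvalue": {}, "graph": {}}

    def handle(self, method, path, body=None):
        """Return a (status, payload) pair for a REST-like request."""
        parts = path.strip("/").split("/", 1)
        if len(parts) != 2 or parts[0] not in self._stores:
            return 404, None
        model, key = parts
        store = self._stores[model]
        if method == "PUT":
            store[key] = body
            return 200, body
        if method == "GET":
            return (200, store[key]) if key in store else (404, None)
        if method == "DELETE":
            if key in store:
                del store[key]
                return 204, None
            return 404, None
        return 405, None  # method not supported

if __name__ == "__main__":
    gw = MultiModelGateway()
    gw.handle("PUT", "/document/users/ada", {"name": "Ada"})
    gw.handle("PUT", "/keyvalue/session:ada", "token-123")
    print(gw.handle("GET", "/document/users/ada"))
    print(gw.handle("GET", "/keyvalue/session:ada"))
```

Clients use the same verbs and the same path grammar regardless of model, which is the property the text calls model-agnostic.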

Benefits and Limitations

Advantages

Multi-model databases offer a simplified architecture by integrating multiple data models, such as relational, document, graph, and key-value, into a single platform, thereby reducing the need to deploy and maintain separate specialized databases. This consolidation addresses the challenges of polyglot persistence, where applications require diverse data storage solutions, by minimizing integration overhead and lowering the operational costs associated with data synchronization and running multiple systems.

In terms of performance efficiency, these databases use unified storage layers and optimized indexing strategies to enable faster query execution across different models without the latency introduced by extract, transform, load (ETL) processes or data duplication between silos. This approach results in better resource utilization, particularly in distributed and cloud-based environments, where a single instance can handle varied workloads more effectively than fragmented systems.

The flexibility of multi-model databases makes them well suited for modern application development, including microservices architectures that demand varied data access patterns, AI and machine learning workflows that require vector embeddings alongside structured data, and analytics that benefit from direct querying of operational datasets. By natively supporting these paradigms within one system, developers can iterate more rapidly and adapt to evolving requirements without architectural overhauls.

Challenges and Drawbacks

Multi-model databases introduce significant complexity, primarily due to the need to handle diverse models, which often have contradictory characteristics, within a unified system that requires specialized skills to administer effectively. This can result in a steeper learning curve for developers and administrators, as unified querying across models demands familiarity with multiple paradigms, potentially leading to a "jack-of-all-trades" system that underperforms specialized single-model systems in model-specific workloads. Query optimization across heterogeneous models exacerbates this, as execution plans must accommodate varied access patterns, sometimes degrading performance by up to 65% for operations spanning multiple models.

Consistency issues pose another key challenge, particularly in balancing transaction properties across differing models, where relational components may require ACID guarantees while document or graph elements favor eventual consistency, necessitating advanced coordination to avoid inconsistencies in distributed environments. Tunable consistency models can mitigate this by providing strong guarantees for critical transactions and relaxed ones for others, potentially reducing latency in write-heavy scenarios by 38%, though implementing such strategies adds operational overhead. Additionally, proprietary extensions in multi-model systems can lead to vendor lock-in, making migration difficult due to dependencies on vendor-specific features for cross-model integration.

As of 2025, multi-model databases exhibit maturity gaps, remaining a relatively emerging category with limited ecosystem support compared with established single-model specialists, which have more mature tools, libraries, and community resources. This immaturity manifests in challenges for ultra-high-volume scenarios, where unified engines may achieve throughput within 12% of specialized systems but struggle with extreme heterogeneity without custom tuning. Despite projected growth at a 19.3% CAGR through 2028, the ecosystem's relative youth limits widespread adoption in mission-critical applications requiring proven long-term reliability.

Notable Implementations

Commercial Systems

Oracle Database is a multi-model relational database management system that supports relational data alongside document (JSON via Simple Oracle Document Access (SODA)), graph (property graphs and RDF semantic graphs), spatial, and key-value models within a single integrated engine. First introduced with multi-model capabilities in version 12c (2013) and enhanced in 19c (2019), the latest version 23ai (as of 2025) adds AI Vector Search for machine learning workloads, enabling unified querying via SQL extensions like JSON functions, graph MATCH clauses, and spatial operators. It provides enterprise-grade features such as ACID transactions, high availability through Real Application Clusters (RAC), and scalability for big data analytics, making it suitable for industries like finance and healthcare requiring secure, compliant data management across diverse models.

Microsoft Azure SQL Database extends the relational model with native support for JSON documents, graph queries via MATCH clauses, spatial data, and XML through T-SQL, allowing multi-model operations without separate databases. Launched as SQL Azure in 2010 and with multi-model features maturing by 2017, it uses the Azure platform for cloud scalability, automatic tuning, and integration with services like Azure Synapse Analytics. As of 2025, it supports hyperscale storage up to 100 TB and serverless compute options, suiting hybrid transactional-analytical processing (HTAP) applications.

Microsoft Azure Cosmos DB is a globally distributed, multi-model database service that supports document, key-value, graph, and column-family data models through multiple APIs, including SQL (Core), MongoDB, Cassandra, Gremlin, and Azure Table Storage. Launched in May 2017, it provides automatic scaling, low-latency guarantees under 10 ms for point reads and writes, and multi-region replication for high availability, making it suitable for enterprise cloud applications such as real-time analytics, personalization, and AI-driven services that require consistent performance across global data centers.

Couchbase Server functions as a distributed multi-model database that integrates document storage with graph capabilities, enabling the modeling of complex relationships using documents and SQL++ (formerly N1QL) queries that support joins, recursive common table expressions for graph traversal, and transactions. It emphasizes mobile and edge deployments and real-time synchronization through Couchbase Lite and Sync Gateway, allowing data replication between edge devices and cloud environments for offline-first applications. In October 2025, Couchbase released version 8.0, introducing hyperscale indexing and search enhancements like the Vector Index for billion-scale vector workloads, which supports hybrid queries combining vector similarity with document and graph data. These features position Couchbase for use cases in mobile apps, data syncing, and generative AI applications requiring low-latency access to interconnected data.

SingleStore (formerly MemSQL) is a distributed SQL database that provides native multi-model support for relational, JSON, time-series, vector, full-text, and geospatial data within a single engine, using standard SQL queries across all models without needing separate systems. It converges online transaction processing (OLTP) and online analytical processing (OLAP) through its hybrid transactional/analytical processing (HTAP) architecture, targeting very low query latencies for real-time ingestion and analytics on petabyte-scale datasets. This unification enables use cases like fraud detection and interactive dashboards in enterprise environments, where immediate insights from mixed workloads are critical.

Open-Source Projects

ArangoDB is a prominent open-source multi-model database that natively integrates document, graph, and key-value models within a single engine, enabling unified querying across data types. Its architecture uses a flexible storage layer that supports documents for semi-structured data, graph structures for relationship modeling, and key-value pairs for simple lookups, all managed by a distributed cluster design for scalability. The project, initiated in 2011, uses the ArangoDB Query Language (AQL), a declarative SQL-like syntax extended for multi-model operations, allowing complex traversals and joins in one query. Foxx microservices provide a JavaScript framework for embedding custom logic directly into the database, supporting serverless-style applications without a separate application tier. Community contributions have been vital since its Apache 2.0-licensed inception, with active development on GitHub including extensions for search and analytics; recent 2025 enhancements integrate the Kubernetes Operator for automated cluster management, deployment, and scaling in containerized environments.

OrientDB, now evolved into ArcadeDB following its 2018 acquisition by SAP and subsequent support discontinuation in 2021, represents a key open-source multi-model effort emphasizing graph and document paradigms with extensions for other models like key-value and time-series. ArcadeDB's architecture builds on OrientDB's record-based storage but introduces a lighter, faster transactional engine, marketed as using "Alien Technology" for multi-model handling, that supports graphs for analytics, documents for flexibility, and vectors for AI workloads in a single backend. It extends standard SQL with graph-specific syntax like OpenCypher for traversals, enabling efficient analytics on large-scale connected data without model silos. Post-acquisition, original creator Luca Garulli started ArcadeDB as the continuation of this work under Apache 2.0, with contributions in areas like sharding and replication and development on GitHub, and with a strong emphasis on performance, reportedly reaching millions of records per second.

SurrealDB, a Rust-based open-source multi-model database, unifies document, graph, relational, time-series, geospatial, and key-value models in a scalable, embeddable engine designed for modern applications. Its core engine supports real-time queries through live subscriptions and event-driven updates, allowing reactive data flows across models without polling. Focused on edge and embedded scenarios, it runs in-process for low-latency operations on devices, with offline-first synchronization and a small footprint suitable for IoT and mobile use. Gaining significant traction since 2023, the project has seen rapid adoption through $6 million in funding, major releases in 2024, and reported $5 million revenue by 2025, driven by its developer-friendly SQL-like query language and AI-native features. The community contributes to its codebase, emphasizing security (RBAC, JWT) and scalability from single-node to distributed clusters.

Evaluation and Benchmarking

Performance Metrics

Performance metrics for multi-model databases evaluate their ability to handle diverse data models and workloads efficiently, focusing on key indicators that reflect operational effectiveness across relational, document, graph, key-value, and increasingly vector-based operations. Throughput, typically measured in (QPS) or (TPS), quantifies the volume of operations a database can process over time, particularly important for mixed s involving operations like traversals and relational joins. For instance, in benchmarks simulating scenarios, throughput varies significantly by workload; a transaction spanning multiple models might achieve 230 TPS in one system, while a transaction reaches 738 TPS, highlighting how multi-model integration can optimize or constrain processing rates depending on data access patterns. , the time taken for individual query responses, is assessed in milliseconds or seconds and often scales logarithmically with size, with traversals typically incurring higher latency than simple relational joins due to traversal depth and join complexity across models. Scalability metrics examine a database's to expand without proportional degradation, including horizontal efficiency—such as the ability to distribute workloads across nodes—and sharding overhead, which measures the additional computational cost of partitioning . Resource utilization tracks CPU and consumption under mixed workloads, where handling concurrent relational queries, , and document retrievals can lead to uneven load distribution if not optimized. In evaluations using scale factors from 1GB to 30GB, multi-model databases demonstrate linear in and query execution on clustered setups, completing large-scale preparation in under on three nodes, though sharding introduces overhead in cross-model queries compared to single-model operations. 
Consistency and durability metrics ensure reliable data handling in distributed environments. The transaction commit rate indicates the percentage of ACID-compliant operations successfully finalized across models, and it often exceeds 99% in standalone modes for workloads like order processing that span graphs and relational data. Replication lag, the delay in synchronizing data across replicas, is monitored in milliseconds to track data freshness, particularly in globally distributed systems where consistency levels can introduce lag under high throughput.

As of 2025, vector similarity search speed has emerged as a critical metric for AI-integrated multi-model databases. It measures query latency for high-dimensional embeddings; for example, systems can achieve responses in the tens of milliseconds for searches over millions of vectors while maintaining high recall, enabling efficient hybrid workloads that combine vector search with traditional models.
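Recall for vector search is usually reported against an exact, brute-force ranking of the same data. The sketch below illustrates that calculation in plain Python; the function names are hypothetical, cosine similarity is assumed as the distance measure, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to query.
    `vectors` maps an id to its embedding."""
    ranked = sorted(
        vectors,
        key=lambda vid: cosine_similarity(query, vectors[vid]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbours that the approximate
    search actually returned."""
    if not exact_ids:
        raise ValueError("ground truth is empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

In practice the approximate ids would come from the database's vector index, and recall would be averaged over many queries and reported next to latency, since an index can trade one for the other.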

Benchmarking Approaches

Benchmarking multi-model databases involves methodologies that evaluate system performance across diverse data models such as relational, document, graph, key-value, and increasingly vector representations. Standard benchmarks are often adapted from single-model suites to handle combined workloads. For instance, the Yahoo! Cloud Serving Benchmark (YCSB) is extended for key-value operations within multi-model contexts, while the Linked Data Benchmark Council (LDBC) benchmark supports graph traversals integrated with other models. The Transaction Processing Performance Council Decision Support (TPC-DS) benchmark is similarly adapted for analytic queries that involve relational data alongside other models, as seen in frameworks that use real-world datasets to simulate mixed-model workloads.

These adaptations emphasize combined workloads to test model conversions and joint operations, such as graph pattern matching alongside relational aggregations. The UniBench framework, for example, draws from YCSB, LDBC, and TPC benchmarks to generate correlated data across models and executes cross-model queries, including key-value lookups. Similarly, M2Bench incorporates TPC-DS-inspired analytic tasks across relational, document, graph, and array models, using domain-specific scenarios like e-commerce recommendations that mix at least two models per task. Such approaches prioritize end-to-end query execution times under scale factors from 1 to 10, providing a foundation for evaluating multi-model synergies.

Custom benchmarking approaches often involve workload simulations that blend models in configurable proportions to mimic real-world applications. For example, a scenario might allocate 40% of operations to relational queries, 30% to graph traversals, and the remainder to document or key-value accesses, with the operations generated by synthetic tools.
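A configurable workload mix of this kind can be sketched with weighted random sampling. The example below is illustrative only: the operation names are hypothetical, and the even split of the remaining 30% between document and key-value accesses is an assumption, since the proportions for those two are not specified above.

```python
import random
from collections import Counter

# Share of operations per model. The 40% and 30% figures follow the example
# in the text; the 15%/15% split of the remainder is an assumption.
WORKLOAD_MIX = {
    "relational_query": 0.40,
    "graph_traversal": 0.30,
    "document_read": 0.15,
    "key_value_get": 0.15,
}


def generate_operations(mix, count, seed=0):
    """Return a reproducible sequence of operation kinds drawn
    according to the proportions in `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix proportions must sum to 1")
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    kinds = list(mix)
    weights = [mix[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=count)


if __name__ == "__main__":
    operations = generate_operations(WORKLOAD_MIX, 10_000, seed=42)
    for kind, n in Counter(operations).most_common():
        print(f"{kind}: {n / len(operations):.1%}")
```

Seeding the generator means every system under test receives the same operation sequence, which is one simple way to keep comparisons repeatable.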
The MMSBench-Net benchmark employs custom scripts to simulate workloads that integrate relational user data, document logs, and topology data, with adjustable query distributions for parallel execution. Tools like HammerDB, built primarily for relational OLTP, can be scripted into custom setups for multi-model testing, while bespoke generators handle data evolution and model mixing. These methods aim for repeatable tests focused on throughput and query latency, often using choke-point queries to isolate multi-model challenges.

In 2025, benchmarking trends incorporate AI-augmented techniques for vector operations, reflecting the integration of embedding-based models in multi-model systems. Tools like VDBBench facilitate real-world simulations with streaming ingestion and concurrent reads and writes on high-dimensional datasets (e.g., 768 to 1536 dimensions), measuring P95 latency and recall for retrieval-augmented generation workloads.

Fair comparisons across vendors necessitate standardized configurations, particularly when distinguishing cloud deployments, which offer auto-scaling and pay-as-you-go economics, from on-premise setups with fixed hardware control. Benchmarking studies highlight the need for identical workloads and resource normalization to account for cloud variability versus on-premise predictability, ensuring equitable evaluation of performance and cost-efficiency in multi-model environments. These approaches target metrics such as throughput and resource utilization as core evaluation goals.

Theoretical Underpinnings

Foundational Concepts

The concept of polyglot persistence, popularized by Martin Fowler and Pramod Sadalage in 2012, posits that applications benefit from employing multiple specialized storage technologies to match diverse data requirements, rather than relying solely on relational databases. This approach acknowledges the limitations of a one-size-fits-all model, advocating key-value stores for simple lookups, document stores for semi-structured data, and graph databases for complex relationships. Multi-model databases represent an evolution of this idea, integrating these varied models into a unified system to reduce integration overhead while preserving the strengths of each model.

Data model unification in multi-model databases addresses the fragmentation of storage by extending abstract conceptual models, such as the entity-relationship (ER) model, to encompass relational, document, and graph structures within a cohesive framework. The traditional ER model, which focuses on entities, attributes, and relationships, can be generalized to represent graph edges as relationships and document hierarchies as nested entities, enabling seamless transitions between models. This unification mitigates the object-relational impedance mismatch, a longstanding challenge in which the gap between application-level object models and rigid relational schemas leads to inefficient data mapping and complex queries, by allowing native support for multiple representations without custom intermediaries.

Formal foundations for multi-model databases draw on type theory to enable schema flexibility, permitting dynamic evolution of data structures while maintaining consistency across models. Type theory provides a rigorous basis for defining polymorphic schemas that accommodate varying degrees of structure, from strictly typed relational tables to schemaless documents, ensuring that queries and updates preserve semantic integrity. Complementing this, category theory offers conceptual tools for query mapping, treating data models as categories where morphisms represent transformations between relational tuples, graph traversals, and document extractions, thus facilitating unified query processing without loss of expressiveness.
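The idea of one conceptual model with several representations can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not a formal categorical construction or any product's internal design: `Entity`, `Relationship`, and the three projection functions are hypothetical names, and each function stands in loosely for a mapping from the conceptual model to one target model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An ER-style entity; `children` models nested (document-like) entities."""
    kind: str
    key: str
    attributes: dict = field(default_factory=dict)
    children: list["Entity"] = field(default_factory=list)


@dataclass
class Relationship:
    """An ER-style relationship between two entity keys."""
    label: str
    source: str
    target: str


def to_document(entity):
    """Document view: nested entities become embedded sub-documents."""
    doc = {"_key": entity.key, **entity.attributes}
    for child in entity.children:
        doc.setdefault(child.kind, []).append(to_document(child))
    return doc


def to_rows(entity, parent_key=None):
    """Relational view: one (table, row) pair per entity, with nested
    entities flattened and linked back through a parent_key column."""
    row = {"key": entity.key, **entity.attributes}
    if parent_key is not None:
        row["parent_key"] = parent_key
    rows = [(entity.kind, row)]
    for child in entity.children:
        rows.extend(to_rows(child, entity.key))
    return rows


def to_edges(relationships):
    """Graph view: each relationship becomes a (source, label, target) edge."""
    return [(r.source, r.label, r.target) for r in relationships]
```

Because all three views derive from the same conceptual instance, an application can read an order as a nested document, join its items as rows, or traverse customer-to-order edges without maintaining a separate mapping layer for each.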

Research Directions

Recent research in multi-model databases emphasizes the integration of artificial intelligence (AI) and machine learning (ML) capabilities to handle diverse data types, including unstructured and vector-based representations. A key focus is on incorporating native vector support into multi-model architectures, enabling seamless storage and querying of embeddings alongside traditional models like relational and graph data. For instance, the Hybrid Multimodal Graph Index (HMGI) proposes a graph-based index that combines relational indexing with vector search, using modality-aware partitioning to optimize performance for multimodal data ingestion and retrieval. This approach achieves sub-linear query times and outperforms standalone vector databases in scenarios requiring relational context, addressing the limitations of retrofitting vector support into legacy systems.

Advancements in federated learning further enhance AI integration by allowing collaborative model training across distributed multi-model datasets without centralizing sensitive information. The FLAMMABLE framework introduces multi-model federated learning (MMFL), where clients dynamically engage multiple models per training round based on their computational resources, adapting batch sizes to mitigate heterogeneity. This results in 1.1–10.0× faster convergence and 1.3–5.4% higher accuracy compared to single-model baselines, facilitating scalable training over multi-model stores. Complementing this, 2025 studies on hybrid embeddings explore combining textual, visual, and relational embeddings within multi-model environments to support richer semantic queries, as seen in HMGI's adaptive index updates for dynamic data.

Emerging challenges in quantum computing and decentralized data management are driving innovations in secure multi-model storage. Research highlights the need for quantum-resistant cryptography to protect diverse data models against future quantum threats, with proposals evaluating lattice-based and hash-based algorithms for database encryption. These schemes aim to provide post-quantum security for multi-model systems by applying cryptographic layers that remain compatible with existing query engines. In parallel, decentralized databases enable secure, multi-tenant access across distributed models; the MtDB system, which builds on IPFS, supports universal SQL queries with 35 ms latency over large-scale records and 1.2–1.3× overhead for integrity enforcement. This architecture promotes secure data sharing in decentralized environments, such as healthcare, without compromising multi-model flexibility.

Standardization efforts aim to unify querying and evaluation across multi-model databases, tackling inconsistencies in schema evolution and cross-model operations. Proposals for universal query representations, such as Directed Acyclic Graph-based primitives, provide a model-agnostic framework for multimodal retrieval, extensible to polystore systems via standardized pipelines. Similarly, work on translating natural language into multi-model query languages (MMQLs) introduces adaptive frameworks that improve accuracy by over 9% through schema embeddings and error correction, fostering a common interface for diverse data models. Academic benchmarks, including SIGMOD 2024–2025 studies, advance consistency testing; for example, the TransforMMer tool simulates data evolution across relational, document, and graph models, generating dynamic benchmarks to evaluate interoperability and performance under schema changes. The Multimodal Attributed Graph Benchmark (MAGB) further assesses consistency in graph-vector hybrids, revealing modality biases and the benefits of balanced embeddings for reliable multi-model learning.
