Shared-nothing architecture

Shared-nothing architecture is a distributed computing paradigm in which multiple independent nodes, each equipped with its own private memory and storage resources, operate without sharing hardware components such as disks or central memory, communicating solely through a network to exchange data and coordinate operations. This design contrasts with shared-memory architectures, where processors access a common memory pool, and shared-disk architectures, where nodes share secondary storage but maintain private memory. First proposed by Michael Stonebraker in 1985 as an efficient approach for high-performance systems, shared-nothing architecture emerged from research into parallel database management to address limitations in scalability and cost when handling large-scale workloads. Stonebraker argued that it outperforms alternatives by minimizing contention over shared resources, enabling linear scalability as nodes can be added without bottlenecks from centralized components.

Key advantages include enhanced fault tolerance, as a failure in one node is isolated without affecting the entire system, and simplified high availability through redundant data partitioning across nodes. In practice, shared-nothing systems partition data across nodes using techniques like hash-based or range partitioning, allowing parallel query execution where each node processes only its local subset of the data. This architecture has proven particularly effective for data warehousing and decision-support applications, where read-heavy operations benefit from massive parallelism. Early commercial implementations included systems from Tandem and Tolerant, while modern examples encompass distributed databases such as Teradata, Netezza, and Apache Cassandra, which leverage commodity hardware for cost-effective scaling in cloud-native environments. Despite challenges in data redistribution during node additions and in coordinating queries across nodes, its emphasis on node autonomy continues to influence big data frameworks and distributed database systems.

Fundamentals

Definition and Principles

Shared-nothing architecture is a distributed system architecture in which multiple nodes, each comprising its own processor, main memory, and disk storage, operate independently without sharing resources such as memory, disks, or interconnection buses across nodes. This design treats each node as an autonomous unit, akin to a site in a distributed system, where the absence of shared hardware components eliminates potential bottlenecks from resource contention. Node independence is a core principle, ensuring that each node runs its own operating system and manages its local database and software stack without interference from others. Communication between nodes occurs exclusively via message-passing protocols over a network interconnect, enabling coordinated operations like query execution and transaction management while preserving node autonomy. This approach supports scalability in parallel environments, such as massively parallel processors (MPPs), by distributing workload across loosely coupled nodes. A key operational rule is that each update request is satisfied by precisely one node, which processes the operation locally to maintain atomicity and prevent conflicts that could arise from concurrent access. Complementing this, the architecture emphasizes data locality, where data processing tasks are executed on the node that stores the relevant data partition, thereby minimizing inter-node data movement and optimizing performance. Data partitioning techniques, such as horizontal fragmentation, facilitate this locality by assigning distinct data subsets to specific nodes.
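
The single-owner update rule can be made concrete with a minimal sketch. The Node and Cluster classes below are hypothetical, and hash-based placement is assumed purely for illustration; the point is that every update is routed to, and executed entirely by, the one node that owns the key.

```python
import hashlib

class Node:
    """An autonomous node: private local storage, nothing shared with peers."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_store = {}  # visible only to this node

    def apply_update(self, key, value):
        # Executes entirely against local state, so no cross-node
        # locking or shared-memory coordination is required.
        self.local_store[key] = value

class Cluster:
    """Routes each update to exactly one owning node."""
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        # Hash placement is an assumption here; range or directory-based
        # placement satisfies the same single-owner rule.
        digest = hashlib.sha1(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def update(self, key, value):
        # The shared-nothing ownership rule: precisely one node
        # satisfies each update request.
        self.owner(key).apply_update(key, value)

cluster = Cluster(num_nodes=4)
cluster.update("customer:42", {"name": "Ada"})
```

The same routing function implements data locality for reads: a query on "customer:42" is dispatched to owner("customer:42") and scans only that node's local subset.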

Data Partitioning and Sharding

In shared-nothing architecture, data partitioning, often referred to as sharding, involves horizontally dividing a database into independent subsets called shards, each assigned to a specific node based on a shard key to ensure node autonomy and minimize inter-node dependencies. This approach allows each node to manage its shard exclusively, processing operations locally without contention for shared resources. The shard key, typically a column or attribute in the data, determines the distribution, enabling horizontal scaling by adding nodes to handle growing data volumes. Common types of partitioning include hash-based, range-based, and composite methods, selected based on data characteristics and access patterns. Hash-based partitioning applies a hash function to the shard key, such as a customer ID, to map data evenly across nodes, promoting balanced load distribution and reducing hotspots in high-throughput environments like transaction processing. Range-based partitioning divides data into contiguous ranges of the shard key, such as timestamps or geographic IDs, which is suitable for ordered data and range queries, as it allows efficient scanning of adjacent shards. Composite partitioning combines multiple strategies, for instance, first applying range or list partitioning across shardspaces (groups of shards) and then hash partitioning within each space using a consistent hash on additional keys, enabling finer control over data placement in complex schemas.

Replication in shared-nothing systems is optional and typically involves creating full or partial copies of shards across nodes to enhance fault tolerance and query performance, while strict ownership rules ensure that only one node has write authority for a given data item to avoid conflicts. For example, in master-slave replication, the primary node owns writes, replicating changes asynchronously to replicas for read scalability; peer-to-peer models distribute write ownership dynamically but require coordination protocols to maintain consistency. Ownership is enforced at the granularity of pages or objects, with mechanisms like log sequence numbers tracking versions to detect stale replicas during operations.

Cross-shard operations, such as joins or aggregations spanning multiple shards, are handled by routing queries to the relevant nodes through a centralized coordinator or a distributed query optimizer, which decomposes the query, executes subqueries locally, and merges results. In shard-based systems, a global service manager directs requests to shards based on the shard key, supporting federated queries across heterogeneous nodes without requiring full data movement. Distributed optimizers parallelize plan generation across nodes, partitioning the search space (e.g., join orders) to worker nodes for concurrent exploration, then selecting the optimal plan at the master node, which scales efficiently for complex queries involving many tables.
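
The partitioning strategies above can be sketched in a few lines. The following is an illustrative model, not any particular system's API: hash_shard and range_shard map a shard key to a shard, and scatter_gather shows a coordinator fanning a cross-shard query out to every shard and merging the local results (here a simple union). RANGE_BOUNDS and the row layout are hypothetical.

```python
import bisect
import hashlib

def hash_shard(shard_key: str, num_shards: int) -> int:
    # Hash partitioning: hashing the shard key (e.g., a customer ID)
    # spreads rows evenly across nodes and reduces hotspots.
    digest = hashlib.sha1(shard_key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Range partitioning: each shard owns a contiguous key range, expressed
# here as sorted upper bounds; keys above the last bound go to shard 3.
RANGE_BOUNDS = ["2023-01-01", "2024-01-01", "2025-01-01"]

def range_shard(order_date: str) -> int:
    # Ordered keys map to contiguous shards, so a date-range scan
    # touches only adjacent shards.
    return bisect.bisect_right(RANGE_BOUNDS, order_date)

def scatter_gather(predicate, shards):
    # Cross-shard query: each shard evaluates the predicate over its
    # own rows; the coordinator merges the partial results.
    results = []
    for shard in shards:
        results.extend(row for row in shard if predicate(row))
    return results

# Four shards, each a private set of rows owned by one node.
shards = [[] for _ in range(4)]
for cust, amount in [("c1", 10), ("c2", 250), ("c3", 75), ("c4", 500)]:
    shards[hash_shard(cust, 4)].append({"cust": cust, "amount": amount})

print(scatter_gather(lambda r: r["amount"] > 100, shards))
```

A production optimizer would additionally prune the shard list before scattering; for example, a range predicate on order_date needs only the shards whose bounds intersect it.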

Architectural Comparisons

Versus Shared-Everything

In shared-everything architecture, all nodes in a parallel system have access to a common pool of resources, including memory, disks, and processors, interconnected via a shared bus or high-speed network that allows uniform access to these elements. This design contrasts with shared-nothing by enabling direct, centralized resource utilization without the need for data redistribution across nodes. Key differences between the two architectures lie in resource access and coordination mechanisms. Shared-nothing eliminates global locks and contention hotspots by confining data and processing to individual nodes, which communicate only for inter-node queries, thereby reducing overhead. In contrast, shared-everything facilitates easier data access for all nodes but introduces risks of bottlenecks at the shared interconnect or memory, as multiple processors compete for the same resources, often requiring complex locking protocols to manage concurrency. For instance, while shared-nothing relies on data partitioning—such as sharding—to localize operations, shared-everything depends on a unified view of memory and storage that can simplify query planning but amplifies contention in high-load scenarios.

Scalability represents a primary differentiator: shared-nothing achieves near-linear scaling by adding independent nodes, each handling its own workload without contending for shared components, allowing systems to grow to hundreds of processors, potentially spread over large distances. Shared-everything, however, is constrained by the throughput limits of the shared bus or memory, leading to diminishing returns beyond a modest number of processors as contention erodes parallel efficiency. Quantitative assessments from early benchmarks indicate that shared-nothing configurations can sustain higher transaction rates per added node than shared-everything setups. Shared-everything is particularly suited to small-scale, tightly coupled systems like symmetric multiprocessor (SMP) machines, where low-latency resource sharing benefits applications with frequent intra-node data exchanges, such as OLTP workloads on fewer than 10 processors. This makes it less ideal for massively parallel environments, where shared-nothing's independence better supports distributed querying across geographically dispersed nodes.

Versus Shared-Disk

In shared-disk architecture, multiple processing nodes each maintain independent processors and memory but collectively access a centralized storage system, such as a storage area network (SAN), allowing any node to read or write data from the shared pool. This contrasts with shared-nothing systems, where each node operates with fully localized storage, eliminating any shared access to disks across nodes. A primary difference lies in storage access models: shared-nothing architectures localize data through partitioning, thereby avoiding I/O contention because each node operates on its dedicated disks without interference from others. In shared-disk setups, concurrent access to the common storage pool introduces significant I/O bottlenecks, particularly under heavy loads, since multiple nodes compete for the same disk resources via a shared bus or network. To manage this concurrency, shared-disk systems rely on sophisticated locking mechanisms, such as distributed lock managers or token-passing protocols, which serialize conflicting accesses and prevent data inconsistencies during simultaneous reads and writes. Shared-nothing architectures, by design, minimize such needs through node independence, where data sharding ensures that locks are primarily local to each node, reducing the risk of centralized hot spots.

Coordination overhead further distinguishes the two: shared-disk environments demand extensive inter-node communication for cache coherence and synchronized access to shared logs or lock tables, often leading to increased messaging and protocol complexity as the number of nodes grows. Shared-nothing systems shift this burden to the network for query coordination and data redistribution, which, while introducing network latency, avoids the persistent synchronization required in shared-disk designs. The trade-offs highlight shared-disk's advantage in simplifying data management and load balancing, as data need not be pre-partitioned and can be accessed flexibly by any node, facilitating easier recovery and dynamic workload distribution. However, this comes at the cost of poorer scalability for I/O-intensive workloads, where the shared storage becomes a choke point, limiting throughput even as processors are added; shared-nothing architectures excel here by enabling true parallelism, with performance scaling nearly linearly as nodes and their resources are added.

Benefits and Challenges

Advantages

Shared-nothing architecture provides superior scalability compared to shared-resource designs, as new nodes can be added independently to handle growing data volumes and workloads without disrupting ongoing operations or requiring reconfiguration of the entire system. This enables near-linear scaling, where throughput increases proportionally with the number of nodes, supporting massive parallelism in query execution across distributed partitions. Even in configurations with hundreds of processors, the architecture maintains performance by avoiding bottlenecks associated with centralized resources. A key benefit is fault isolation, which confines failures to individual nodes, preventing them from cascading across the system and thereby enhancing overall reliability and availability. Redundancy mechanisms, such as data replication on separate nodes, further bolster this resilience, allowing the system to continue functioning even if multiple nodes fail, as long as the affected data partitions are covered by backups. This node-level isolation contrasts with shared architectures, where a single failed component can halt global operations.

Performance advantages stem from data locality and the absence of resource contention: computations occur directly on local data partitions, minimizing network traffic and latency for operations like analytical queries. This locality, facilitated by effective data partitioning and sharding, is particularly effective for online analytical processing (OLAP) workloads, delivering high throughput without the overhead of inter-node synchronization for every transaction. Empirical demonstrations in parallel database systems show near-linear speedup on relational queries under this model. The architecture also enables non-disruptive operations, including rolling upgrades and maintenance, by allowing individual nodes to be updated or serviced offline while the rest of the cluster remains operational. This is achieved through the independent nature of nodes, treating upgrades as controlled node failures that the system can tolerate without downtime, thus supporting continuous availability in production environments.
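
Fault isolation combined with replication can be illustrated with a small failover sketch. ReplicaNode, NodeDown, and the replica list are hypothetical, and real systems layer consistency protocols on top of this basic pattern.

```python
class NodeDown(Exception):
    """Raised when a node is unreachable."""

class ReplicaNode:
    """A node holding one replica of a data partition."""
    def __init__(self, data, alive=True):
        self.data, self.alive = data, alive

    def read(self, key):
        if not self.alive:
            raise NodeDown()
        return self.data[key]

def read_with_failover(key, replicas):
    # A failed node is simply skipped and the read is served from a
    # replica on a separate node; the failure never propagates.
    for node in replicas:
        try:
            return node.read(key)
        except NodeDown:
            continue
    raise NodeDown(f"all {len(replicas)} replicas of {key!r} unavailable")

replicas = [ReplicaNode({"k": "v"}, alive=False),  # failed primary copy
            ReplicaNode({"k": "v"})]               # healthy replica
print(read_with_failover("k", replicas))           # prints: v
```

The same mechanism underlies rolling upgrades: taking one node offline for maintenance looks to the rest of the cluster like a single, tolerated node failure.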

Disadvantages

One significant drawback of shared-nothing architecture is the cross-node communication overhead incurred during query execution. In this design, queries that span multiple nodes necessitate data shuffling or redistribution across the network, which can substantially increase latency and impose heavy loads on interconnect bandwidth. For instance, join operations often require shipping entire tables or intermediate results between nodes, potentially saturating the network beyond a certain scale. This communication cost is particularly pronounced in distributed operations, where the ratio of local processing time to message transmission time can range from 1 to 10, meaning communication costs are comparable to, or a sizable fraction of, local costs in typical environments and amplify overall query execution time when messages are frequent.

Managing a shared-nothing system introduces considerable complexity, primarily due to the need for sophisticated data partitioning schemes and ongoing query optimization. Effective partitioning demands careful design to prevent hotspots—regions of uneven data or workload distribution that degrade performance—yet determining optimal object locations across nodes is often described as a challenging "black art" requiring frequent reconfiguration and re-partitioning. Sharding challenges, such as maintaining balance under unpredictable access patterns, further exacerbate this, as automatic tuning mechanisms are essential but not always sufficient to avoid interference.

The architecture also exhibits limited flexibility for workloads involving transactions that require atomicity across multiple nodes. Such distributed transactions rely on additional protocols like two-phase commit (2PC) to ensure consistency, which introduces blocking overhead and complicates concurrency control, including distributed deadlock detection (see the sketch below). This makes shared-nothing less efficient for applications with frequent cross-partition updates, as the protocol's susceptibility to failures can hinder scalability.

Finally, shared-nothing systems can suffer from utilization inefficiencies due to data skew, where uneven distribution leaves some nodes underused while others become overloaded. Load balancing in this context is difficult, often necessitating physical movement of data or processes, which can result in serious imbalances and suboptimal exploitation of available hardware. Under unpredictable workloads, this can prevent linear speedup and strand computational capacity.
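
The blocking behavior of two-phase commit mentioned above is easy to see in a minimal coordinator sketch. The Participant class and its prepare/commit/abort methods are hypothetical stand-ins for node-local transaction managers; real implementations add write-ahead logging, timeouts, and recovery.

```python
class Participant:
    """Hypothetical node-local transaction manager."""
    def __init__(self, name, will_vote_yes=True):
        self.name, self.will_vote_yes = name, will_vote_yes

    def prepare(self, txn):
        # Vote yes/no; on yes the node must hold its locks until the
        # coordinator announces the outcome -- the blocking window.
        return self.will_vote_yes

    def commit(self, txn):
        print(f"{self.name}: commit {txn}")

    def abort(self, txn):
        print(f"{self.name}: abort {txn}")

def two_phase_commit(participants, txn):
    # Phase 1: collect votes from every node touched by the transaction
    # (each vote is an extra network round trip).
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: unanimous yes -> commit everywhere; otherwise abort.
    outcome = all(votes)
    for p in participants:
        (p.commit if outcome else p.abort)(txn)
    return outcome

nodes = [Participant("node-a"), Participant("node-b", will_vote_yes=False)]
two_phase_commit(nodes, "txn-17")   # one no vote forces a global abort
```

If the coordinator fails between the two phases, prepared participants must keep their locks until it recovers; that blocking window is precisely the overhead that penalizes frequent cross-partition updates.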

Historical Development

Origins and Early Implementations

The development of shared-nothing architecture in the 1970s and 1980s was primarily motivated by escalating demands for scalable and reliable transaction processing systems, as organizations grappled with rapidly growing data volumes from applications in banking, telecommunications, and other transaction-intensive operations. Traditional single-processor systems struggled to handle high transaction rates without bottlenecks, prompting the exploration of architectures that could distribute workloads across multiple nodes to achieve linear scalability and fault tolerance at lower cost. This need was particularly acute in environments requiring continuous availability, where even brief downtimes could result in significant losses, leading to designs that eliminated shared resources to minimize single points of failure and contention.

One of the earliest implementations was Tandem Computers' NonStop hardware system, introduced in 1976, which pioneered a shared-nothing approach for fault-tolerant processing. The system featured multiple independent processors, each with dedicated memory and I/O controllers connected via redundant inter-processor buses, ensuring no shared resources and enabling automatic failover without system interruption. This architecture was specifically designed for high-availability transaction processing, with the first deployment, to Citibank in 1976, demonstrating its capability to support mission-critical workloads. Building on this foundation, Tandem released NonStop SQL in 1986, a relational database management system that extended the shared-nothing principles to software, partitioning data across nodes for parallel query execution and further enhancing scalability.

Teradata followed with the first commercial shared-nothing massively parallel processing (MPP) database system, the DBC/1012, marketed starting in 1984 for data warehousing applications. This system distributed data and query processing across independent nodes using hash-based partitioning, allowing it to handle large-scale analytical queries by leveraging the full capacity of multiple processors without shared-resource contention. The design addressed the limitations of mainframe-attached databases by providing a dedicated database backend that scaled incrementally with added nodes, marking a significant advancement in parallel database technology.

The term "shared-nothing" was formally coined by Michael Stonebraker in his paper "The Case for Shared Nothing", presented at the 1985 High Performance Transaction Systems (HPTS) workshop while Stonebraker was at the University of California, Berkeley, and published in 1986; in it he advocated for this architecture as the most cost-effective way to achieve high performance in transaction processing systems. Stonebraker's analysis built on existing hardware like Tandem's, emphasizing how shared-nothing configurations could support both transactional and analytical workloads by avoiding the scalability bottlenecks of shared-memory or shared-disk alternatives.

Evolution and Modern Adoption

The Gamma project, developed at the University of Wisconsin-Madison in the mid-1980s as a prototype relational database machine, demonstrated the viability of shared-nothing principles through its use of 20 interconnected VAX 11/750 processors, each with dedicated local disks, enabling horizontal partitioning of relations for parallel query execution. This 1986 implementation, which processed complex queries via dataflow techniques with minimal synchronization overhead, laid the groundwork for 1990s advancements in parallel databases by proving linear scalability on I/O-bound workloads and influencing the design of commercial massively parallel processing (MPP) systems. Building on early Teradata implementations from the 1980s, these developments spurred widespread adoption in enterprise parallel databases during the decade, optimizing for distributed data processing without shared resources.

In the 2000s, shared-nothing architecture saw a significant shift toward open-source distributed systems, exemplified by the introduction of the Hadoop Distributed File System (HDFS) in 2006, which distributed large-scale data storage across independent nodes with local disks, adhering to shared-nothing tenets to support fault-tolerant, scalable file management. HDFS's design, featuring block replication on DataNodes without centralized data storage, complemented the MapReduce paradigm for batch processing, enabling economical handling of petabyte-scale datasets on commodity hardware and marking a transition from specialized database machines to general-purpose big data frameworks.

The 2010s and 2020s brought shared-nothing integration into cloud-native services. Google BigQuery's 2010 launch employed a separated storage and compute model that echoed shared-nothing scalability by allowing independent scaling of query processing across distributed nodes on Google's internal infrastructure. Similarly, Amazon Redshift, introduced in 2012, utilized a shared-nothing MPP architecture in which compute nodes managed locally attached storage for columnar data, facilitating efficient parallel execution of analytical queries. Snowflake, launched in 2014, adopted a hybrid approach blending shared-nothing MPP compute clusters with a central storage layer, enabling serverless elasticity for multi-tenant workloads without resource contention. As of 2025, recent trends emphasize hybrid models that combine shared-nothing compute with disaggregated storage, addressing limitations in traditional setups for machine learning and analytics workloads by decoupling persistent data from transient processing resources in cloud environments. This evolution, as surveyed in contemporary database research, supports elastic scaling for high-throughput training and inference while mitigating bottlenecks in data access, with implementations optimizing for low-latency caching across disaggregated clusters.

Practical Applications

In Database Systems

In relational database systems, shared-nothing architecture is prominently utilized in massively parallel processing (MPP) environments to enable efficient data warehousing and analytics. Teradata Vantage employs a shared-nothing design in which each access module processor (AMP) independently manages its own disks and processes data without shared resources, facilitating linear scalability by adding nodes. This architecture supports intra-query parallelism, where individual query operations like scans and joins are executed concurrently across AMPs, and inter-query parallelism, allowing multiple queries to run simultaneously on different AMPs for enhanced throughput. Originating from early implementations in the 1980s, with the first DBC/1012 system released in 1984, Teradata's adoption marked a foundational step in commercializing this approach for large-scale analytics. Similarly, IBM Netezza Performance Server leverages a hybrid framework, combining symmetric multiprocessing within nodes and massively parallel processing across them, to process petabyte-scale datasets through independent snippet processing units that handle data locally.

In NoSQL databases, shared-nothing principles underpin distributed key-value and wide-column stores for handling unstructured or semi-structured data at scale. Apache Cassandra, introduced in 2008, adopts a shared-nothing architecture in which each node owns its data partitions and replicas, enabling decentralized storage and replication across clusters without centralized coordination. This design supports tunable consistency levels, allowing applications to balance availability and data accuracy per operation, such as using quorum reads for stronger guarantees or consistency level ONE for higher performance. ScyllaDB, a high-performance rewrite compatible with Cassandra's query language and protocols, also implements shared-nothing by assigning data shards to individual CPU cores on each node, ensuring independent operation and low-latency access to distributed key-value data with the same tunable consistency.

Query processing in shared-nothing database systems involves distributing operations across nodes to minimize data movement. Execution engines partition data via hash or range methods, routing joins—such as co-partitioned joins—to local nodes where matching tuples are co-located, while aggregations are computed partially on each node before merging. A coordinator node typically parses the query, dispatches fragments to the relevant nodes, and assembles final results, reducing overhead through techniques like partial aggregation (sketched below) to limit inter-node communication.

The shared-nothing model delivers high throughput in database analytics workloads, particularly for petabyte-scale data warehouses. Teradata systems, for instance, sustain analytical queries on multi-petabyte volumes with fully parallel execution. In NoSQL contexts, Cassandra clusters handle petabytes of data and thousands of concurrent operations per second for write-heavy workloads, scaling by adding nodes without downtime.
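
Partial aggregation, mentioned above, is a representative technique: each node reduces its own partition first, so that only small partial states, not raw rows, cross the network. The sketch below is a simplified model; the partition layout and function names are illustrative, not a specific engine's API.

```python
from collections import Counter

def local_partial_count(rows, group_key):
    # Step 1, on each node: aggregate the node's private partition.
    counts = Counter()
    for row in rows:
        counts[row[group_key]] += 1
    return counts

def merge_partials(partials):
    # Step 2, at the coordinator: merge the per-node partial states.
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

# Each inner list stands for one node's private partition of a table.
node_partitions = [
    [{"region": "EU"}, {"region": "US"}],
    [{"region": "EU"}, {"region": "EU"}],
    [{"region": "APAC"}],
]
partials = [local_partial_count(p, "region") for p in node_partitions]
print(merge_partials(partials))  # Counter({'EU': 3, 'US': 1, 'APAC': 1})
```

For a co-partitioned join the same principle applies: if both tables are hash-partitioned on the join key, every matching pair of tuples resides on a single node and the join runs with no inter-node data movement at all.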

In Big Data and Cloud Computing

In big data frameworks, the Hadoop ecosystem exemplifies shared-nothing architecture by distributing storage and computation across independent nodes to handle massive datasets. The Hadoop Distributed File System (HDFS) stores data locally on each node, replicating blocks across the cluster for fault tolerance without relying on shared storage, which enables horizontal scaling on commodity hardware. MapReduce, Hadoop's original processing engine, processes data in parallel by moving computation to the nodes where the data resides, minimizing network overhead. YARN, introduced as a resource manager, further enhances this by allocating CPU and memory dynamically across the independent nodes, supporting diverse workloads beyond batch processing.

Cloud services leverage shared-nothing principles for elastic scaling in data lakes and warehouses, allowing seamless addition of nodes without performance bottlenecks. Amazon Redshift employs a massively parallel processing (MPP) shared-nothing architecture, where each compute node manages its own memory and storage for distributed queries on petabyte-scale data. This design facilitates high-throughput processing in environments like data lakes by distributing data across compute nodes under a leader node's coordination, with concurrency-scaling features adding capacity automatically. Google BigQuery, by contrast, takes a shared-disk-style approach that decouples storage from compute, executing queries across independent slots in a serverless manner for cost-effective handling of large datasets in analytics pipelines.

Modern extensions build on these foundations for faster processing paradigms. Apache Spark's cluster mode operates in a shared-nothing environment, distributing in-memory computations across nodes via resilient distributed datasets (RDDs), which allow iterative algorithms to run efficiently without disk I/O dependencies. This approach contrasts with disk-based systems like Hadoop MapReduce, enabling Spark to process data up to 100 times faster in memory while scaling linearly on commodity clusters for machine learning and streaming workloads.
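
The data-locality pattern described above is visible in a minimal PySpark job. This sketch assumes a working Spark installation with the pyspark package, and the HDFS path is hypothetical; Spark schedules per-partition tasks on the nodes that hold the corresponding blocks.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shared-nothing-demo").getOrCreate()
sc = spark.sparkContext

# The file's blocks live on the cluster's DataNodes; map tasks run
# where the blocks are stored, so raw data rarely crosses the network.
counts = (
    sc.textFile("hdfs:///data/logs.txt")   # hypothetical input path
      .flatMap(lambda line: line.split())  # per-partition, node-local work
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)     # shuffles only small pairs
)
print(counts.take(10))
spark.stop()
```

The only significant network traffic is the shuffle of pre-aggregated (word, count) pairs in reduceByKey, mirroring the partial-aggregation strategy used by shared-nothing databases.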
