Apache HBase
Apache HBase is an open-source, distributed, versioned, non-relational database that runs on top of the Hadoop Distributed File System (HDFS) and is designed to provide random, real-time read and write access to large amounts of structured data.[1] It is modeled after Google's Bigtable, a distributed storage system for managing structured data at massive scale, enabling the hosting of tables with billions of rows and millions of columns across clusters of commodity hardware.[2][1] Key features of HBase include linear and modular scalability to handle petabyte-scale data volumes, strictly consistent reads and writes, and automatic sharding with built-in failover for high availability.[1] It supports multiple access methods, such as a Java API for programmatic interaction, a Thrift gateway for non-Java clients, RESTful web services for HTTP-based access, and a JRuby-based shell for administrative tasks.[1] Performance optimizations like block caching, Bloom filters for efficient lookups, and server-side filtering further enhance its suitability for real-time big data applications.[1] HBase originated from the concepts outlined in the 2006 Bigtable paper by researchers at Google, which described a fault-tolerant, scalable storage system for structured and semi-structured data.[2] Development of HBase began in 2006; the project was accepted into the Apache Software Foundation as a Hadoop sub-project in 2007 and became a top-level project in 2010 under the Apache License 2.0.[1] Today, it serves as a foundational component in the Hadoop ecosystem for use cases including time-series data storage, messaging, and real-time analytics in distributed environments.[1]
Overview
Definition and Purpose
Apache HBase is an open-source, distributed, scalable, column-oriented NoSQL database that provides random, real-time read/write access to large amounts of sparse data across clusters of commodity hardware.[1] Modeled after Google's Bigtable, a distributed storage system for structured data described in a seminal 2006 paper, HBase adapts this design to the open-source ecosystem while supporting tables with billions of rows and millions of columns.[1][2] The primary purpose of HBase is to serve as a fault-tolerant big data store within the Hadoop ecosystem, enabling efficient storage and retrieval of petabytes of data without the rigid schema constraints of traditional relational databases.[3] It achieves high throughput for massive, sparse datasets by leveraging the Hadoop Distributed File System (HDFS) for underlying storage, ensuring durability through data replication and automatic failover mechanisms.[1][4] Core design goals include linear scalability across distributed nodes, robust fault tolerance to maintain availability during hardware failures, and seamless integration with Hadoop tools like MapReduce for processing large-scale data workloads.[3] This makes HBase particularly suited for applications requiring low-latency access to vast, semi-structured datasets in real-time environments.[1]
Key Features
Apache HBase is designed to handle massive datasets in big data environments through a set of core features that emphasize scalability, performance, and efficiency. These capabilities allow it to manage billions of rows and millions of columns on clusters of commodity hardware, drawing from its modeling after Google's Bigtable while integrating seamlessly with the Hadoop ecosystem.[5]

One of HBase's primary strengths is its horizontal scalability, achieved through automatic region splitting and load balancing. Tables in HBase are divided into regions, which are distributed across multiple RegionServers; when a region grows beyond a configurable size threshold, it splits automatically to distribute the load, enabling linear scaling as more servers are added to the cluster. The HBase Master uses algorithms like the StochasticLoadBalancer to periodically rebalance regions across servers, ensuring even distribution of workload and preventing hotspots.[6]

HBase provides strong consistency for reads and writes, offering ACID-like properties for single-row operations. This is facilitated by its multi-version concurrency control (MVCC) mechanism, which allows concurrent transactions to proceed without locks by maintaining multiple versions of data cells, each timestamped for resolution during reads. As a result, HBase ensures atomicity and isolation for individual row mutations, making it reliable for applications requiring immediate data integrity.[6][5]

The system excels at handling sparse data efficiently, storing only non-null values in its column-family-based model to avoid wasting space. HBase tables function as distributed, sparse, multi-dimensional sorted maps, where rows, column qualifiers, and timestamps define unique cells; empty cells are simply omitted from storage in HFiles on HDFS, optimizing disk usage for wide tables with irregular data patterns.[7]

HBase supports real-time, low-latency random read and write operations, enabling high-throughput access to data without the need for batch processing. Features like the block cache for in-memory data retention and Bloom filters for quick existence checks contribute to sub-millisecond response times in typical workloads, making it suitable for interactive applications atop distributed storage.[6][5]

At its core, HBase employs MVCC to manage data versioning, allowing multiple versions of a cell to coexist based on timestamps without blocking concurrent operations. This lock-free approach supports snapshot isolation, where readers see a consistent view of the database at a specific point in time, enhancing concurrency in multi-user environments.[6]

Finally, HBase integrates natively with HDFS for durable, fault-tolerant storage, leveraging HDFS's distributed file system to persist data across the cluster. All HBase data, including HFiles and write-ahead logs, is stored in HDFS, ensuring high availability and recovery from failures without data loss.[6]
History
Origins and Development
Apache HBase was inspired by Google's Bigtable, a distributed storage system described in a 2006 research paper that outlined a scalable approach to managing structured data across commodity servers.[2] The project originated in 2006 at Powerset, a San Francisco-based company focused on natural language search, where developers sought a Bigtable-like database to handle massive, sparse datasets for web document processing on top of Apache Hadoop's Distributed File System (HDFS).[8] Powerset contributed the initial open-source implementation as a Hadoop subproject, with early work led by engineers including Chad Walters, Jim Kellerman, and Michael Stack, who adapted Bigtable's column-family model to Hadoop's ecosystem.[9]

The first usable version of HBase was released on October 29, 2007, bundled with Hadoop 0.15.0, marking its debut as a functional distributed store capable of basic read-write operations on large tables.[10] By February 2008, HBase had formally become a subproject of Apache Hadoop, enabling deeper integration and community-driven enhancements while benefiting from Hadoop's fault-tolerant infrastructure.[11] This period saw initial contributions from the broader Hadoop community, focusing on core functionality like region server management and basic scalability.

HBase graduated to an Apache top-level project on May 10, 2010, signifying its maturity and independence from Hadoop's direct oversight, which allowed for accelerated development under a dedicated project management committee.[12] Post-2008 efforts emphasized stability improvements, such as refined region assignment and split policies to reduce outages in multi-node clusters. By 2010, enhancements to fault-tolerance, including better master failover mechanisms and replication support, solidified HBase's reliability for production workloads, drawing further adoption from enterprises like Facebook for high-throughput applications.[8]
Major Releases
Apache HBase 1.0.0, released on February 24, 2015, marked the project's achievement of production readiness after seven years of development, incorporating over 1,500 resolved issues from prior versions, including API reorganizations for better usability, enhanced overall stability through fixes in region server handling and recovery mechanisms, improvements to the Coprocessor API for efficient server-side processing logic without client round-trips, and improved Write-Ahead Log (WAL) management for more reliable data durability and replication.[13]

The HBase 2.0.0 release, issued on April 30, 2018, shifted focus toward performance optimizations and administrative robustness, introducing a procedure-based administration framework that uses durable, atomic procedures for cluster operations like splits and merges to ensure consistency even under failures, an asynchronous WAL implementation to boost write throughput by offloading log appends from the critical path, and MOB (Medium-sized Object) storage to handle large values exceeding typical cell sizes (default threshold of 100 KB) by offloading them to HDFS for reduced I/O overhead in blob-heavy workloads such as mobile data applications.[14]

Subsequent updates in the 2.4.x series refined region management, with improved assignment algorithms and failover handling to minimize split-brain scenarios during node failures, alongside full compatibility with Hadoop 3.x ecosystems for better integration with modern erasure coding and federation features, culminating in version 2.4.18 as the final patch release.[15][16]

The 2.5.x lineage advanced with releases up to 2.5.13 as of November 2025, and the 2.6.x series to 2.6.4 as of November 2025, introducing striped compaction to parallelize major compactions across multiple files for faster processing in high-throughput environments, enhanced Kerberos security with refined keytab rotation and delegation support for secure multi-hop access, and cloud-native optimizations including better integration with object stores like S3 for WAL and snapshot storage to support scalable, serverless deployments.[17][18][19] Development toward HBase 3.0.0 began with beta releases in 2024, focusing on further enhancements for compatibility and performance in evolving big data ecosystems.[19]

Under Apache governance, HBase follows a structured release cycle involving alpha phases for feature experimentation and beta phases for stabilization and community testing before general availability, with a strong commitment to backward compatibility through semantic versioning, where major releases may introduce API changes but minor and patch versions preserve existing client and data behaviors.[20][21]
Data Model
Core Components
Apache HBase employs a non-relational data model inspired by Google's Bigtable, organizing data into a multi-dimensional, sorted, sparse map to support efficient storage and retrieval of large-scale structured data.[2] This model centers on tables as the primary logical containers, with rows identified by unique keys, and data grouped into column families that enable flexible, schema-optional column definitions.[7]

Tables in HBase serve as logical containers for data, each identified by a unique table name and capable of spanning multiple regions across the distributed system for scalability.[7] Unlike traditional relational tables with fixed schemas, HBase tables are designed to handle variable data structures, where rows can contain different sets of columns without predefined constraints.[7]

Each row within a table is uniquely identified by a row key, which is a byte array serving as the primary index for data access.[7] Row keys are stored and sorted in lexicographical order, facilitating efficient range scans and ordered retrieval of rows based on key prefixes or sequences.[7]

Column families represent groups of related columns that share common storage attributes, such as the number of versions retained, time-to-live (TTL) settings, compression algorithms, or Bloom filters for query optimization.[7] These families are defined at table creation time and remain fixed thereafter, providing a coarse-grained schema that balances flexibility with performance.[7] Within a column family, individual columns are dynamically specified using a qualifier, forming a full column identifier as <family>:<qualifier>, where both are byte arrays.[7] This design allows columns to be added on-the-fly without altering the table schema, supporting applications with evolving data requirements.[7]
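The following sketch illustrates this schema model with the HBase 2.x Java client, defining a single column family and its storage attributes at table creation time; the table name "users", the family "info", and the attribute values are hypothetical and chosen only for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // The column family "info" and its attributes are fixed at table creation;
            // qualifiers such as info:email or info:age can be used later without schema changes.
            ColumnFamilyDescriptor info = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("info"))
                    .setMaxVersions(3)                 // keep up to three timestamped versions per cell
                    .setTimeToLive(7 * 24 * 3600)      // TTL is expressed in seconds
                    .setBloomFilterType(BloomType.ROW) // row-level Bloom filter for faster lookups
                    .build();
            TableDescriptor users = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("users"))
                    .setColumnFamily(info)
                    .build();
            admin.createTable(users);
        }
    }
}
```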
The atomic storage unit in HBase is the cell, which combines a row key, column family, column qualifier, and a timestamp to store a value as a byte array.[7] Timestamps enable cell versioning, allowing multiple values for the same row-column pair over time, with the latest typically used unless specified otherwise.[7]
HBase's data model inherently supports sparsity, where rows do not require values in every possible column, and absent cells consume no storage space.[7] This feature is particularly advantageous for datasets with irregular or semi-structured information, minimizing overhead and enhancing efficiency for wide tables with many optional attributes.[7]
Versioning and Timestamps
In Apache HBase, timestamps serve as long integer values that identify the version of data stored in each cell, enabling temporal tracking of mutations. These timestamps are typically assigned automatically by the RegionServer using the current system time in milliseconds when a client does not specify one, though clients may explicitly provide a timestamp for precise control over versioning.[22] This mechanism allows HBase to maintain a history of changes to a cell's value over time, distinguishing between different versions based on their associated timestamps.[7]

HBase supports multi-version storage within each cell, where multiple values can coexist, each tagged with a unique timestamp to represent sequential updates. By default, HBase retains up to one version per cell, though this is configurable per column family to balance storage efficiency and historical retention.[7] Older versions are automatically pruned during minor or major compactions when they exceed the maximum version limit or when a time-to-live (TTL) policy expires them, ensuring bounded storage growth without manual intervention.[7] The TTL, set per column family and defaulting to forever (no expiration), defines the lifespan of cell data in seconds from the timestamp of insertion.[22]

To enable concurrent reads and writes without interference, HBase employs Multi-Version Concurrency Control (MVCC), which provides snapshot isolation for transactions. Under MVCC, reads obtain a consistent view of the database at a specific read point, filtering out uncommitted or newer writes based on cell timestamps and embedded MVCC sequence numbers, thus avoiding locks and allowing non-blocking operations.[23] Conflicts during writes are resolved using these timestamps, where newer timestamps supersede older ones in the same cell, maintaining logical consistency across distributed regions.

Deletes in HBase are implemented not by immediate removal but through tombstones—special marker cells with delete types (e.g., column, family, or row deletes) and timestamps that effectively hide targeted versions from subsequent reads. These tombstones mask versions at or older than their own timestamp, ensuring deleted data remains invisible in scans while preserving multi-version semantics.[7] Tombstones themselves are cleaned up only during major compactions, after a configurable purge delay to allow replication consistency.[24]

Versioning behavior is primarily configured at the column family level via the HColumnDescriptor (superseded by ColumnFamilyDescriptor in HBase 2.x), including the maximum versions to retain (default: 1, set via hbase.column.max.version), minimum versions to keep even after TTL expiration (default: 0, via MIN_VERSIONS), and TTL as noted. While per-cell min/max timestamp bounds are not directly configurable in storage, clients can enforce them operationally through scan filters specifying time ranges (e.g., via setTimeRange in Scan or Get operations) to retrieve or manage versions within specific temporal windows.[22] These settings are defined during table creation or alteration using the HBase shell, API, or configuration files like hbase-site.xml.
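As a rough illustration of per-cell versioning, the sketch below uses the HBase 2.x Java client to request several timestamped versions of one cell within a time range; the table "users" and the column info:email are hypothetical and assumed to exist with multiple versions retained.

```java
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedReadSketch {
    public static void main(String[] args) throws Exception {
        byte[] family = Bytes.toBytes("info");
        byte[] qualifier = Bytes.toBytes("email");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Get get = new Get(Bytes.toBytes("user123"))
                    .addColumn(family, qualifier)
                    .readVersions(3)                              // request up to three versions of the cell
                    .setTimeRange(0, System.currentTimeMillis()); // restrict to a time window (min inclusive, max exclusive)
            Result result = table.get(get);
            List<Cell> versions = result.getColumnCells(family, qualifier);
            for (Cell cell : versions) {
                // Versions are returned newest first, each tagged with its timestamp.
                System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}
```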
Architecture
Main Components
The main components of Apache HBase constitute a distributed runtime environment designed for scalable, fault-tolerant data management on top of Hadoop. These include the HMaster for administrative oversight, RegionServers for data servicing, ZooKeeper for coordination, a lightweight client library for application access, and HDFS as the foundational storage layer, with HBase overlaying its own metadata structures like the .META. table (named hbase:meta in current releases) to enable efficient operations.[25]

The HMaster is the primary master server that oversees the HBase cluster, handling data definition language (DDL) operations such as creating, altering, and dropping tables, as well as managing table schemas and namespace operations. It assigns regions—horizontal partitions of tables—to available RegionServers upon table creation or during recovery, monitors the lifecycle and health of RegionServers through periodic heartbeats, and executes load balancing to redistribute regions and optimize cluster performance. For high availability, HBase supports an active-passive failover model where backup HMaster instances stand ready to assume control if the active master fails, with the transition orchestrated via ZooKeeper to minimize downtime. The HMaster does not directly handle client data requests, focusing instead on coordination to ensure cluster stability.[25]

RegionServers function as the core worker processes in the HBase cluster, each running on a dedicated node to host and manage one or more regions assigned by the HMaster. They process client read and write requests for their hosted regions, leveraging in-memory memstores to buffer recent mutations for low-latency access before flushing them to immutable HFiles on disk when thresholds like memstore size limits are reached. RegionServers also maintain write-ahead logs (WALs) for durability and report region load metrics to the HMaster to inform balancing decisions, ensuring that data operations remain localized and efficient without routing through the master. Multiple RegionServers operate in parallel across the cluster, scaling horizontally to handle growing data volumes.[25]

ZooKeeper serves as an external, distributed coordination service that underpins HBase's fault tolerance and consistency, operating as a quorum of nodes (typically three or five for production) to maintain a centralized view of cluster state. It enables leader election for the active HMaster, tracks the registration and ephemeral znodes of live RegionServers to detect failures promptly, and provides distributed locks and synchronization primitives for operations like region assignment and server handoff. All HBase components, including the HMaster and RegionServers, connect to ZooKeeper upon startup to register their presence and retrieve configuration details, with session timeouts configured to trigger failover if connectivity lapses. This service is crucial for avoiding split-brain scenarios in distributed environments.[26][25]

The client library provides a thin, synchronous or asynchronous interface for applications to interact with HBase, encapsulating remote procedure calls (RPCs) to connect directly to the appropriate RegionServers for data operations such as inserts, updates, deletes, and queries. This direct-access model bypasses the HMaster for performance-critical data paths, reducing latency and contention, while relying on ZooKeeper to resolve region locations and the .META. table for precise routing. Clients handle retries and failover transparently, supporting multiple programming languages through APIs like Java, REST, and Thrift, and are configured with parameters like RPC timeouts to ensure reliable communication in distributed setups.[25]

HBase depends on HDFS as its underlying distributed file system for persistent storage, where RegionServers write HFiles—sorted, immutable files containing column family data—and WALs to a shared root directory, benefiting from HDFS's replication and fault tolerance to safeguard against node failures. Unlike raw HDFS usage, HBase augments this with its own metadata layer via the .META. table, a special distributed table that catalogs region boundaries, server assignments, and timestamps, stored as HFiles in HDFS and queried by clients and the HMaster to locate data efficiently. This integration allows HBase to provide random access semantics atop HDFS's sequential strengths.[25]
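A minimal sketch of how a client bootstraps against these components, assuming a hypothetical ZooKeeper quorum: the connection is established through ZooKeeper, after which the Admin interface can report the active HMaster and the live RegionServers; data operations issued on this connection later go straight to the owning RegionServers.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClusterStatusSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The client library only needs the ZooKeeper quorum to bootstrap; the active
        // HMaster and RegionServers are discovered from there, and data requests are
        // then routed directly to RegionServers without passing through the master.
        conf.set("hbase.zookeeper.quorum", "zk1.example.org,zk2.example.org,zk3.example.org");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            ClusterMetrics metrics = admin.getClusterMetrics();
            System.out.println("Active master: " + metrics.getMasterName());
            System.out.println("Live RegionServers:");
            for (ServerName server : metrics.getLiveServerMetrics().keySet()) {
                System.out.println("  " + server);
            }
        }
    }
}
```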
Storage and Distribution
Apache HBase organizes data into tables that are horizontally partitioned into regions, each encompassing a contiguous range of row keys to enable scalable distribution across multiple servers. Regions serve as the basic unit of scalability and load distribution in an HBase cluster, with new regions created automatically when an existing one exceeds a configurable size threshold, defaulting to approximately 10 GB per region as defined by the hbase.hregion.max.filesize parameter. This splitting process ensures balanced data distribution and prevents any single region from becoming a performance bottleneck, with the default policy being the SteppingSplitPolicy that gradually increases the target region size after splits to reduce frequent splitting.[27]
Within each region, data is persisted in HFiles, which are immutable, sorted files stored on the underlying Hadoop Distributed File System (HDFS). Each HFile contains a sequence of key-value pairs organized by row key, column family, column qualifier, and timestamp, allowing for efficient range scans and point lookups. To optimize read performance, HFiles incorporate Bloom filters—probabilistic data structures that quickly determine if a key likely exists in the file, thereby minimizing unnecessary disk I/O—enabled by default at the row level via the BLOOMFILTER table descriptor setting.[28][27]
Writes to HBase are first buffered in the MemStore, an in-memory data structure per column family that accumulates mutations until it reaches a flush threshold, typically 128 MB as set by hbase.hregion.memstore.flush.size, at which point the data is persisted to a new HFile on disk. To ensure durability against server crashes, all writes are also appended to the Write-Ahead Log (WAL), a durable append-only file on HDFS that records the sequence of edits for recovery purposes; the WAL rolls over periodically, defaulting to every hour via hbase.regionserver.logroll.period. This combination of in-memory buffering and logged persistence allows HBase to handle high write throughput while maintaining data integrity.[27][29]
Data replication in HBase leverages HDFS for synchronous replication within a single cluster, where multiple copies of HFiles and WALs are maintained across nodes according to HDFS block replication factors, ensuring fault tolerance and data availability. For cross-cluster scenarios, HBase provides an asynchronous replication mechanism using its built-in replication tool, which ships WAL edits from a source cluster to one or more peer clusters, applying them in the background to maintain eventual consistency without impacting primary write performance.[30][31]
Metadata for region locations is managed through the .META. table, a special system table that stores information about user table regions, including their row key ranges and hosting RegionServers, queried by clients to route operations efficiently. In distributed mode, the location of the .META. regions is discovered via ZooKeeper, eliminating the need for the deprecated root region that was used in earlier versions to bootstrap metadata navigation. This hierarchical metadata approach supports dynamic region assignments and cluster scalability.[27][32]
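To make this metadata lookup concrete, the sketch below uses the Java client's RegionLocator, which consults the same catalog information to resolve which region, and therefore which RegionServer, owns a given row key; the table name and row key are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookupSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("users"))) {
            // Resolves which region (and therefore which RegionServer) owns this row key,
            // using the same catalog metadata the client library consults internally.
            HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("user123"));
            System.out.println("Region: " + location.getRegion().getRegionNameAsString());
            System.out.println("Served by: " + location.getServerName());
        }
    }
}
```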
Operations
Data Ingestion and Retrieval
Data ingestion in Apache HBase primarily occurs through the Put operation, which enables atomic mutations to a single row using the Put API. When a client issues a Put, the data is first appended to the Write-Ahead Log (WAL) for durability, ensuring recovery in case of a RegionServer failure, and then buffered in the in-memory MemStore.[25] Flushing to on-disk StoreFiles (HFiles) is deferred until the MemStore reaches a configurable size threshold, such as the default of 128 MB (hbase.hregion.memstore.flush.size).[25] This design balances performance and persistence, allowing high-throughput ingestion without an HFile write for every mutation.[25]
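A minimal Put sketch with the Java client, assuming the hypothetical "users" table with an "info" column family used in earlier examples; the single-row mutation is appended to the WAL and buffered in the MemStore on the server side before the call returns.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // A Put mutates exactly one row atomically; the RegionServer logs it to the
            // WAL and buffers it in the MemStore before acknowledging the call.
            Put put = new Put(Bytes.toBytes("user123"))
                    .addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("user123@example.org"))
                    .addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Berlin"));
            table.put(put);
        }
    }
}
```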
Retrieval of individual data points is handled by the Get operation, which performs a direct lookup using the row key via the Get API. The client locates the relevant RegionServer, and the server merges the most recent version of the requested cells from the MemStore (for unflushed data) and applicable HFiles, returning the latest timestamped value or a specific version if requested.[25] To optimize performance, Gets leverage the block cache, which by default allocates 40% of the JVM heap to store frequently accessed data blocks from HFiles.[25]
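A corresponding Get sketch under the same assumptions, retrieving the latest version of one cell by row key; the RegionServer merges the MemStore and any relevant HFiles before answering.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Requests a single cell; the newest timestamped version is returned by default.
            Get get = new Get(Bytes.toBytes("user123"))
                    .addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
            Result result = table.get(get);
            byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(email == null ? "(not found)" : Bytes.toString(email));
        }
    }
}
```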
Deletes in HBase are implemented using the Delete API, which applies timestamped markers known as tombstones rather than immediately removing data. These markers can target an entire row, a column family, or specific columns within a row, and are written to the WAL and MemStore similarly to Puts.[25] The deleted data remains visible until a major compaction process merges HFiles and purges the tombstones along with the associated cells, typically after a short retention period defined by hbase.hstore.time.to.purge.deletes (default 0 ms).[25] This deferred cleanup avoids costly in-place modifications while maintaining consistency during reads.
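The sketch below writes delete markers at column and row granularity with the Java client; the row keys and column names are illustrative, and the underlying cells are only physically removed later, during a major compaction.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Tombstone for all versions of a single column.
            Delete columnDelete = new Delete(Bytes.toBytes("user123"))
                    .addColumns(Bytes.toBytes("info"), Bytes.toBytes("city"));
            table.delete(columnDelete);

            // A Delete with no columns added marks the entire row.
            table.delete(new Delete(Bytes.toBytes("user456")));
        }
    }
}
```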
HBase provides ACID guarantees at the single-row level for all mutations, including Puts, Gets, and Deletes, ensuring atomicity, consistency, isolation, and durability within a row across multiple column families.[33] For conditional multi-mutation operations on a single row, the checkAndPut API enables atomic read-modify-write semantics, akin to a compare-and-set operation, where a Put succeeds only if a specified cell matches an expected value.[33] However, HBase does not support distributed transactions or atomicity across multiple rows; operations like multi-Put return per-row success/failure indicators without all-or-nothing guarantees.[33]
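As an illustration of single-row compare-and-set, the following sketch uses the builder-style checkAndMutate call from the HBase 2.x Java client, the successor to the older checkAndPut method named above; the table, column, and values are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutSketch {
    public static void main(String[] args) throws Exception {
        byte[] row = Bytes.toBytes("user123");
        byte[] family = Bytes.toBytes("info");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Put put = new Put(row).addColumn(family, Bytes.toBytes("status"), Bytes.toBytes("active"));
            // Compare-and-set on a single row: the Put is applied only if info:status
            // currently equals "pending"; the check and the mutation are atomic.
            boolean applied = table.checkAndMutate(row, family)
                    .qualifier(Bytes.toBytes("status"))
                    .ifEquals(Bytes.toBytes("pending"))
                    .thenPut(put);
            System.out.println(applied ? "updated" : "precondition failed");
        }
    }
}
```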
Error handling for ingestion and retrieval operations relies on client-side retries in the event of RegionServer failures or transient issues. The client automatically retries failed requests up to a maximum of 15 attempts (hbase.client.retries.number), with an initial pause of 100 ms between retries (hbase.client.pause), escalating for conditions like server overload. If retries are exhausted, exceptions such as RetriesExhaustedException or SocketTimeoutException are thrown, bounded by the operation timeout of 1,200,000 ms (hbase.client.operation.timeout). This mechanism ensures resilience without requiring manual intervention for common failures.
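A hedged configuration sketch showing how a client might tighten these retry and timeout settings programmatically before opening a connection; the specific values are illustrative, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientRetrySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.client.retries.number", 5);          // fewer retries than the default, for fail-fast behaviour
        conf.setLong("hbase.client.pause", 50);                 // initial back-off between retries, in milliseconds
        conf.setLong("hbase.client.operation.timeout", 60_000); // overall cap per operation, in milliseconds
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Any Table obtained from this connection inherits the tightened retry policy.
        }
    }
}
```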
Scans and Compactions
In Apache HBase, scans enable efficient iterative access to data across a range of rows, leveraging the Scan API to perform range queries without retrieving the entire table. The Scan class, part of the client API, allows specification of a start row and stop row to define the query boundaries, fetching rows in lexicographical order based on row keys.[34] This approach supports bulk data retrieval, such as processing all rows within a key prefix, by constructing a Scan object and iterating over results via the ResultScanner interface. Server-side filters enhance scan efficiency by applying predicates directly on the RegionServer, minimizing data transfer over the network. For instance, a PrefixFilter restricts results to rows sharing a common key prefix, while a RowFilter using a RegexStringComparator enables pattern matching on row keys, such as selecting rows like "user123" via the regex "user[0-9]+".[34] These filters are evaluated during the scan to prune irrelevant data early. To optimize iterative performance, scans employ caching, configurable via the setCaching method, which batches multiple rows (e.g., 100) per RPC call, reducing latency for large result sets; the default is effectively unlimited but tunable to balance memory usage.

Compactions are background processes that maintain storage efficiency by merging HFiles within column families, thereby reducing read amplification caused by excessive file fragmentation. Minor compactions selectively combine a subset of smaller HFiles—typically when the number exceeds the hbase.hstore.compactionThreshold (default: 3)—into fewer, larger files without fully rewriting the store.[24] These are often time-based or triggered by memstore flushes, helping to consolidate recent writes while preserving performance. In contrast, major compactions rewrite all HFiles in a store into a single file, incorporating tombstone markers to permanently remove deleted cells and reclaim space; they run periodically every hbase.hregion.majorcompaction interval (default: 7 days), with configurable jitter to distribute load.[35]

Region splitting and merging complement compactions by balancing data distribution across servers. Splitting occurs automatically when a region exceeds hbase.hregion.max.filesize (default: 10 GB), dividing it into two daughter regions at a midpoint key to prevent hotspots.[24] Merging, enabled via the region normalizer (hbase.normalizer.merge.enabled, default: true), combines small adjacent regions—those below a minimum size (default: 1 MB) and age (default: 3 days)—to reduce overhead from numerous tiny regions.[35]

Optimizations like Bloom filters and block caching further boost scan performance by minimizing disk I/O. Bloom filters, configurable per column family (e.g., BLOOMFILTER => 'ROW'), probabilistically check for row existence in HFiles, avoiding unnecessary block reads for non-matching keys.[24] The block cache, allocating 40% of the JVM heap by default (hfile.block.cache.size: 0.4) with an LRU eviction policy, stores frequently accessed HFile blocks in memory, accelerating sequential scans.[35] Performance tuning involves adjusting scan batch sizes via hbase.client.scanner.max.result.size (default: 2 MB) and server-side limits (default: 100 MB) to prevent out-of-memory errors during large operations, alongside cache configurations like hbase.regionserver.global.memstore.size (default: 40% of heap) to manage overall memory pressure.[24]
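A short Scan sketch with a server-side PrefixFilter and client-side caching, assuming the hypothetical "users" table used in earlier examples; the key range, prefix, and caching value are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("user"))                 // inclusive start of the key range
                    .withStopRow(Bytes.toBytes("user~"))                 // exclusive stop row
                    .setFilter(new PrefixFilter(Bytes.toBytes("user")))  // evaluated server-side on each RegionServer
                    .setCaching(100);                                    // rows fetched per RPC round-trip
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```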
Integration and Ecosystem
With Hadoop and Other Tools
Apache HBase depends on the Hadoop Distributed File System (HDFS) for all persistent data storage, with the root directory configured via the hbase.rootdir parameter to point to an HDFS path such as hdfs://namenode.example.org:9000/hbase.[3] This integration ensures that HBase tables, stored as HFiles within HDFS, leverage HDFS's built-in replication mechanism—typically set to a factor of three by default—to provide data durability and automatic recovery from node failures.[3] In distributed mode, HBase requires HDFS to be operational, as it handles the underlying block-level distribution and fault-tolerance, allowing HBase to scale horizontally across commodity hardware clusters without managing storage redundancy itself.[3]
HBase integrates seamlessly with Hadoop MapReduce for batch processing, providing specialized InputFormats like TableInputFormat and OutputFormats like TableOutputFormat to read from and write to HBase tables within MapReduce jobs.[36] For efficient bulk data ingestion, tools such as ImportTsv enable loading tab-separated value (TSV) files into HBase by generating HFiles via MapReduce and atomically loading them with completebulkload, bypassing the slower write path and reducing cluster load during imports.[37] Additionally, HBase supports integration with Apache Hive through the HBaseStorageHandler, which allows Hive to treat HBase tables as external tables for querying and updating, facilitating secondary indexing by mapping Hive columns to HBase families and qualifiers.[38]
To enable SQL-like querying on HBase, Apache Phoenix serves as a relational database layer, compiling ANSI SQL statements into native HBase scans and providing a JDBC driver for standard connectivity, such as via URLs like jdbc:phoenix:server1,server2:2181.[39] This overlay supports complex operations including joins, aggregations, and GROUP BY clauses by leveraging HBase coprocessors and custom filters, while maintaining low-latency performance for queries spanning millions of rows.[39] Phoenix enables schema-on-read for existing HBase data and optional ACID transactions, making it suitable for applications requiring relational semantics without altering HBase's core NoSQL model.[39]
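A minimal JDBC sketch against Phoenix, assuming a hypothetical Phoenix-managed table named users with city and signup_date columns, and an illustrative ZooKeeper quorum in the connection URL; Phoenix compiles the SQL into HBase scans executed on the RegionServers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum; requires the Phoenix client jar on the classpath.
        String url = "jdbc:phoenix:zk1.example.org,zk2.example.org:2181";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT city, COUNT(*) FROM users WHERE signup_date > ? GROUP BY city")) {
            stmt.setDate(1, java.sql.Date.valueOf("2024-01-01"));
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}
```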
The HBase-Spark connector bridges HBase with Apache Spark, allowing Spark applications to access HBase tables as external data sources for in-memory processing, analytics, and machine learning workflows.[40] Built on Spark's DataSource API, it supports reading and writing HBase data efficiently, enabling transformations like filtering and aggregation in Spark's distributed execution engine while benefiting from HBase's random access capabilities.[3]
For backup and restore operations, HBase uses snapshots to capture a point-in-time view of tables, storing metadata and references to HFiles in HDFS without duplicating data, thus providing an efficient mechanism for recovery.[41] Snapshots are enabled by default and can be taken, cloned, or restored using HBase shell commands, with an optional failsafe snapshot created before restores to prevent data loss; these operations integrate directly with HDFS tools for archival and replication.[41] This approach ensures minimal downtime and leverages HDFS's fault-tolerant storage for durable backups across the cluster.[3]
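A sketch of the snapshot workflow through the Java Admin API, assuming a hypothetical "users" table; the snapshot and clone names are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotSketch {
    public static void main(String[] args) throws Exception {
        TableName users = TableName.valueOf("users");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Point-in-time snapshot: records metadata and HFile references, no data copy.
            admin.snapshot("users_backup_20240101", users);

            // Materialize the snapshot as a brand-new table without touching the original.
            admin.cloneSnapshot("users_backup_20240101", TableName.valueOf("users_restored"));

            // Restoring in place requires the table to be disabled first.
            admin.disableTable(users);
            admin.restoreSnapshot("users_backup_20240101");
            admin.enableTable(users);
        }
    }
}
```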
APIs and Clients
Apache HBase provides a variety of programming interfaces and client tools to enable programmatic interaction with its distributed storage system, supporting both administrative tasks and data manipulation operations. The primary Java-based client API serves as the core interface for developers, offering synchronous and asynchronous methods to perform reads, writes, and scans on tables.[42] The Java client API, located in the org.apache.hadoop.hbase.client package, facilitates direct access to HBase tables through key classes such as Table and Admin. The Table interface handles data operations, including synchronous methods like put for inserting rows (e.g., table.put(new Put(Bytes.toBytes("rowkey")).addColumn(family, qualifier, value))), get for retrieving specific rows (e.g., table.get(new Get(Bytes.toBytes("rowkey")))), and scan for iterating over multiple rows via a ResultScanner (e.g., try (ResultScanner scanner = table.getScanner(new Scan())) { ... }). Asynchronous operations are supported through AsyncTable, allowing non-blocking execution for high-throughput applications. The older HTable class has been deprecated in favor of the more flexible Table interface, which supports connection pooling and better resource management. Administrative functions, such as creating, altering, or dropping tables, are managed via the Admin class (e.g., admin.createTable(TableDescriptor)). These APIs require the cluster's ZooKeeper quorum configuration (typically supplied via hbase-site.xml on the classpath) for cluster discovery and ensure atomic row-level operations through internal locking mechanisms.[42][43]
For non-Java environments, HBase offers the REST API via the Stargate server, which exposes HTTP endpoints for CRUD operations on tables, rows, and cells, enabling access from any language with HTTP capabilities. Stargate supports standard HTTP methods—GET for reads and scans, PUT and POST for writes, and DELETE for removals—and runs on a configurable port (default 8080). It can be configured for read-only mode (hbase.rest.readonly=true) to restrict operations, making it suitable for web-based or lightweight clients without Java dependencies. The server is started using bin/hbase rest start and handles authentication if HBase security is enabled.[44]
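A minimal HTTP sketch in Java against the REST gateway, assuming a hypothetical host on the default port and the "users" table from earlier examples; the path addresses a single cell by table, row, and column, and the JSON response carries base64-encoded cell values.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGetSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical REST server host/port; the path is table "users", row "user123", column "info:email".
        URL url = new URL("http://rest-host.example.org:8080/users/user123/info:email");
        HttpURLConnection http = (HttpURLConnection) url.openConnection();
        http.setRequestMethod("GET");
        http.setRequestProperty("Accept", "application/json"); // cell values are returned base64-encoded in JSON
        try (BufferedReader in = new BufferedReader(new InputStreamReader(http.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        http.disconnect();
    }
}
```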
Language-agnostic access is further provided through the Thrift gateway, which uses RPC protocols for cross-language bindings. The Thrift gateway implements the HBase API via Apache Thrift's IDL, generating client code for languages including C++ and Python, with configurable thread pools (minimum 16 workers, maximum 1000) and support for framed or compact protocols. It authenticates requests using HBase credentials but performs no additional authentication itself. The Thrift gateway allows non-Java applications to perform puts, gets, and scans without direct Java integration.[45]
The HBase Shell provides a command-line interface (CLI) for interactive administration and data operations, built on JRuby and invoked via hbase shell. It supports commands for table management, such as create 'tablename', 'cf' to define a table with column families, disable 'tablename' and enable 'tablename' for lifecycle control, and drop 'tablename' for deletion. Data operations include put 'tablename', 'rowkey', 'cf:qualifier', 'value' for inserts, get 'tablename', 'rowkey' for retrievals, and scan 'tablename' (optionally with limits like {LIMIT => 10}) for querying ranges of rows. The shell integrates with HBase configurations and is useful for scripting and quick prototyping.[46]
Security features are integrated into these APIs to enforce access controls in distributed environments. HBase supports Kerberos authentication by setting hbase.security.authentication=kerberos in hbase-site.xml, requiring principals like hbase/_HOST@REALM and keytab files for masters and region servers (e.g., hbase.master.kerberos.principal and hbase.regionserver.keytab.file). Fine-grained authorization uses Access Control Lists (ACLs) managed by the AccessController coprocessor, defined in hbase-policy.xml for RPC decisions, with superuser privileges configurable via hbase.superuser (e.g., a comma-separated list of users or groups). ACLs cover permissions like READ, WRITE, EXEC, and ADMIN on tables, cells, or namespaces, ensuring secure client connections.[47]