ClickHouse
ClickHouse is an open-source, column-oriented database management system (DBMS) optimized for online analytical processing (OLAP), enabling real-time generation of analytical reports from large-scale data using SQL queries.[1] It excels in handling massive datasets—such as billions or trillions of rows—by leveraging columnar storage to achieve query speeds up to 1 billion rows per second, making it ideal for applications requiring sub-second response times on petabyte-scale volumes.[1] Originally developed by Yandex in 2009 for web analytics and open-sourced in 2016, ClickHouse spun out as an independent company, ClickHouse, Inc., in 2021, supported by initial funding including a $50 million Series A and subsequent rounds such as a $350 million Series C in 2025.[2][3]

Key to its performance are architectural choices like vectorized query execution, adaptive data compression, and support for distributed processing across clusters, allowing it to ingest and query hundreds of billions of records daily in production environments.[1] It adheres to ANSI SQL standards while extending with OLAP-specific optimizations, such as efficient GROUP BY operations and JOINs on large tables, and integrates seamlessly with tools like Apache Kafka for real-time data ingestion.[1] Widely used by organizations including Cloudflare for processing over 10 million records per second in observability and analytics, ClickHouse powers diverse sectors from web metrics to machine learning feature stores and IoT data analysis.[2] It has an active open-source community on GitHub contributing to ongoing enhancements, solidifying its role as a leading solution for real-time data warehousing.[4]

History and Development
Origins and Founding
ClickHouse originated in 2009 as an internal experimental project at Yandex, Russia's leading technology company, aimed at addressing the challenges of real-time analytical reporting on vast, continuously incoming non-aggregated data.[2] The initiative was driven by the need to process web analytics data at unprecedented scale for Yandex.Metrica, Yandex's web analytics platform, which handles billions of events daily and accumulates petabytes of data.[5][2] The project was spearheaded by Alexey Milovidov, a key engineer on Yandex's Metrica team; Yury Izrailevsky later joined him as a co-founder of ClickHouse, Inc., leading product and engineering.[2][5] Motivated by limitations in existing databases that could not deliver the required speed and efficiency for online analytical processing (OLAP) workloads, the team sought to build a system capable of generating reports in real time from petabyte-scale datasets.[2][5]

Early development focused on creating prototypes to validate the feasibility of high-performance analytics, drawing inspiration from log-structured merge (LSM) tree concepts for efficient data handling.[6] Over the next three years, these efforts evolved into a robust, general-purpose database management system, which entered production for Yandex.Metrica in 2012. By 2014, it was processing approximately 12 billion events per day across a cluster holding over 20 trillion rows and more than 2 petabytes of compressed data.[7] This foundational work laid the groundwork for ClickHouse's emphasis on speed and scalability in analytical applications.[2]

Key Milestones and Releases
ClickHouse was open-sourced by Yandex on June 15, 2016, under the Apache 2.0 license, marking its transition from an internal tool to a publicly available project.[8] The initial open-source release included version 1.0, which established the foundation for its column-oriented architecture and high-performance querying capabilities.[2] Adoption was rapid, with production deployments outside Yandex beginning in late 2016 and the project accumulating thousands of contributors by 2019.[9] In 2019, the 19.x series of releases introduced significant enhancements, including materialized views in version 19.8, enabling automatic data transformation and aggregation for improved query efficiency.[10] These updates solidified ClickHouse's position as a leading OLAP database, with subsequent minor versions in the series addressing scalability and integration improvements.

The project evolved further with the formation of ClickHouse, Inc. in September 2021, a U.S.-based company spun off from Yandex to provide commercial support, cloud services, and dedicated development for the open-source project.[2] As of 2025, key milestones include expanded cloud integrations, such as the general availability of ClickHouse Cloud on Microsoft Azure in early 2024 and enhanced AWS competencies for advertising and marketing technologies in August 2025.[11] Versions 24.x and 25.x have incorporated AI/ML extensions, such as support for real-time observability in AI workloads and features like the QBit data type for machine learning applications, enabling faster feature engineering and vector operations.[12][13]

Architecture and Design
Column-Oriented Storage
ClickHouse employs a column-oriented storage model, where data in tables is organized as a collection of columns rather than rows, with the values of each column stored contiguously and sequentially on disk.[1] This approach contrasts with row-oriented systems by aligning data access patterns with analytical workloads, allowing queries to load only the specific columns required, thereby minimizing I/O overhead and accelerating operations like filtering and aggregation.[14] For instance, in analytical queries processing large datasets, this selective reading can achieve throughputs exceeding 1 billion rows per second on suitable hardware.[1]

The primary storage mechanism in ClickHouse is the MergeTree family of table engines, which underpins most high-volume data ingestion and querying scenarios. Data is inserted into immutable parts—self-contained column files sorted by a primary key—without immediate modifications, ensuring consistency and enabling efficient background operations.[15] These parts are periodically merged in the background to consolidate data, reduce fragmentation, and apply optimizations like deduplication in variants such as ReplacingMergeTree, while maintaining immutability to avoid locking during reads or writes.[15]

Compression is integral to ClickHouse's columnar design, leveraging algorithms that exploit the similarity and sorted order of values within columns for high ratios. By default, ClickHouse uses LZ4 for fast compression and decompression with low CPU overhead, and ZSTD (at level 1) as the preferred option in ClickHouse Cloud for superior space savings, often achieving 2-10x reductions in storage footprint depending on data patterns.[16] Additional low-level encodings, such as Delta for integers, are applied before general-purpose compression to further enhance efficiency in columnar blocks, typically sized from 64 KB to 1 MB uncompressed.[16]

To facilitate rapid data skipping, ClickHouse implements sparse primary indexes in MergeTree engines, where index marks are generated every 8192 rows (configurable via index_granularity) to approximate row ranges without storing full offsets for every entry. These indexes, stored in .idx files for the primary key and .mrk files for columns, enable the query engine to bypass irrelevant data blocks during scans, significantly boosting performance on sorted, partitioned datasets.[15]
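A minimal sketch of a MergeTree table illustrating these mechanisms; the table and column names are hypothetical, and the partitioning expression and index_granularity value simply spell out the defaults described above.

-- Hypothetical table; one sparse index mark is written per 8192 rows by default
CREATE TABLE hits_local
(
    EventDate Date,
    UserID    UInt64,
    URL       String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)   -- parts are grouped per month
ORDER BY (UserID, EventDate)       -- sorting key, also used as the primary key
SETTINGS index_granularity = 8192;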
ClickHouse optimizes complex data types for its columnar format, supporting arrays, maps, and nested structures with memory-efficient representations. Arrays are stored using two contiguous vectors—one for offsets and one for elements—allowing vectorized operations across variable-length sequences without row reconstruction. Maps store keys and values in separate columnar vectors, preserving order and enabling efficient lookups, while nested structures treat sub-elements as multiple parallel columns (e.g., as arrays of equal length), facilitating denormalized analytics without performance penalties from joins.[15][17]
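As an illustration of these composite types, the following hypothetical table and query sketch how arrays, maps, and nested columns are declared and aggregated directly in SQL; all identifiers are invented for the example.

CREATE TABLE events
(
    Tags  Array(String),                    -- stored as offset and element vectors
    Attrs Map(String, String),              -- keys and values kept in separate columns
    Items Nested(id UInt32, price Float64)  -- materialized as parallel arrays Items.id, Items.price
)
ENGINE = MergeTree
ORDER BY tuple();

-- Aggregate over nested values without reconstructing rows or joining
SELECT Attrs['channel'] AS channel, sum(arraySum(Items.price)) AS revenue
FROM events
GROUP BY channel;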
Distributed Processing
ClickHouse implements distributed processing through sharding, where data is partitioned across multiple nodes in a cluster to enable horizontal scaling and parallel query execution. Sharding divides the dataset into independent subsets stored on separate servers, with the partitioning determined by a sharding key specified in the Distributed table engine configuration, such as a hash function like intHash64(UserID) or a simple rand() for even distribution.[18] This approach ensures that large datasets exceeding single-node capacity are handled efficiently, as each shard processes only its portion of the data. Coordination among nodes, including shard assignment and metadata management, relies on ClickHouse Keeper—a built-in coordination service compatible with Apache ZooKeeper—or ZooKeeper itself, which maintains cluster topology, facilitates leader election, and ensures consistency during operations like data replication and distributed DDL queries.[19]
The query execution pipeline in distributed setups operates without a global query plan; instead, each node executes a local plan for its shard. When a query targets a Distributed table, the initiating server sends subqueries to all relevant shards in parallel, leveraging the cluster configuration defined in the server's XML settings under <remote_servers>. Each shard performs local processing, including data reading from columnar storage, filtering, and partial aggregations using operators such as AggregatingTransform to compute intermediate results efficiently. These partial results are then returned to the initiator, which merges them into the final output, minimizing network overhead through one-pass communication. Parallelism within shards is further enhanced by dividing processing into multiple "lanes" (typically matching CPU cores), allowing concurrent handling of data blocks.[20][18]
Distributed tables are created using the Distributed engine via SQL statements like CREATE TABLE distributed_table ENGINE = Distributed(cluster_name, database, local_table, sharding_key), which acts as a proxy without storing data locally but routing operations to underlying shards. Table functions such as remote or remoteSecure provide similar functionality for ad-hoc distributed queries, enabling seamless access to remote data without permanent table creation. Inserts into Distributed tables are routed to shards based on the sharding key, with background asynchronous processing to buffer and distribute data, configurable via settings like distributed_background_insert_sleep_time_ms.[18][21]
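A brief sketch of both approaches, assuming a cluster named my_cluster defined under <remote_servers> and a local table hits_local present on each shard; the cluster, host, and table names are placeholders.

-- Proxy table: stores nothing itself, fans queries out to hits_local on every shard
CREATE TABLE hits_distributed AS hits_local
ENGINE = Distributed(my_cluster, default, hits_local, intHash64(UserID));

SELECT count() FROM hits_distributed WHERE EventDate >= today() - 7;

-- Ad-hoc distributed read without creating a table, via the remote table function
SELECT count() FROM remote('node{1..3}:9000', default, hits_local);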
Fault recovery in distributed environments is supported through asynchronous replication in ReplicatedMergeTree-family engines, where changes are propagated to replicas with some latency, ensuring eventual consistency across the cluster. If a replica fails, other replicas continue serving queries, and recovery occurs by fetching missing parts from peers coordinated via ClickHouse Keeper. For enhanced durability during writes, the insert_quorum setting requires acknowledgment from a specified number of replicas (e.g., majority) before confirming the insert, preventing data loss in case of partial failures. Reads typically access local data for speed, but consistency can be managed by directing queries to specific replicas or using distributed aggregation to combine results from multiple nodes.[22]
Core Features
Query Language and SQL Support
ClickHouse uses a declarative SQL dialect that is largely compliant with the ANSI SQL standard, enabling users to execute standard SELECT and INSERT queries, while UPDATE and DELETE are provided through asynchronous mutations (ALTER TABLE ... UPDATE/DELETE). The dialect adheres to ANSI SQL in many aspects, including support for GROUP BY, ORDER BY, and subqueries, but includes deviations optimized for analytical processing, such as relaxed rules for certain clauses to enhance performance. An optional ANSI SQL mode can be enabled to increase compatibility with standard SQL behaviors, though it may impact query speed.[23][24]

To address OLAP requirements, ClickHouse extends standard SQL with features like the ARRAY JOIN clause, which unrolls arrays into separate rows, facilitating efficient processing of nested or semi-structured data common in analytics. For example, the query SELECT * FROM table ARRAY JOIN arr_column AS item duplicates non-array columns for each array element, excluding empty arrays by default, thus enabling complex aggregations over array data without explicit loops. This extension, along with support for array functions and nested data types, distinguishes ClickHouse's dialect for handling high-volume, denormalized datasets in real-time analytics.
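A short worked example of ARRAY JOIN against a hypothetical pages table; rows whose array is empty are dropped unless the LEFT ARRAY JOIN form is used.

CREATE TABLE pages (PageID UInt32, Tags Array(String)) ENGINE = MergeTree ORDER BY PageID;
INSERT INTO pages VALUES (1, ['news', 'sports']), (2, []);

-- Unrolls each tag into its own row; PageID 2 is skipped because its array is empty
SELECT PageID, tag
FROM pages
ARRAY JOIN Tags AS tag;
-- Returns (1, 'news') and (1, 'sports')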
ClickHouse provides an extensive library of built-in functions, exceeding 1,000 in total, categorized into types such as aggregate functions (e.g., sum, avg, uniqExact for cardinality estimation), string manipulation (e.g., substring, replaceRegexp), date/time operations (e.g., toDate, dateDiff), and higher-order functions for array and lambda processing. These functions are designed for row-wise or aggregate computations, with aggregate variants accumulating values across rows to support efficient OLAP queries; for instance, sumMap(key, value) computes weighted sums over key-value pairs. The documentation organizes them into more than 20 categories, allowing developers to perform complex transformations directly in SQL without external scripting.[25]
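An illustrative query combining a few of these functions; the table and columns (a hypothetical requests table with UserID, StatusCode, Bytes, and EventDate) are assumptions for the example.

SELECT
    uniqExact(UserID)             AS exact_users,      -- exact distinct count
    uniq(UserID)                  AS approx_users,     -- approximate, cheaper variant
    sumMap([StatusCode], [Bytes]) AS bytes_per_status, -- per-key sums over key/value pairs
    dateDiff('day', min(EventDate), max(EventDate)) AS span_days
FROM requests;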
The query optimizer in ClickHouse is primarily rule-based, applying transformations like predicate pushdown and projection pruning to minimize data scanned during execution. It incorporates basic cost estimation for selecting join algorithms and orders, such as choosing between hash joins and merges based on data distribution and memory availability, though it lacks a full cost-based optimizer for arbitrary reordering in complex multi-table joins. Users can influence optimization via settings like join_use_nulls or by using EXPLAIN to inspect plans, ensuring queries leverage column-oriented storage for sub-second response times on large datasets.[26]
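For instance, the EXPLAIN variants expose what the rule-based optimizer and pipeline builder decide for a given statement (the hits_local table here is the hypothetical one used in earlier examples):

-- Logical plan after rule-based rewrites such as predicate pushdown
EXPLAIN PLAN
SELECT URL, count() FROM hits_local WHERE EventDate >= today() - 7 GROUP BY URL;

-- Physical processor pipeline, including the degree of parallelism
EXPLAIN PIPELINE
SELECT count() FROM hits_local WHERE UserID = 42;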
Materialized views in ClickHouse accelerate queries by storing pre-computed results from a SELECT statement as a physical table, shifting computation from query time to data insertion. Created with CREATE MATERIALIZED VIEW view_name ENGINE = SummingMergeTree() AS SELECT ... FROM source_table, they act as insert triggers: new data inserted into the source is automatically transformed and appended to the view, supporting incremental aggregation without reprocessing historical data. For example, a view aggregating daily metrics can use GROUP BY and aggregate functions to maintain summarized tables, dramatically reducing query latency for frequent reports; however, they require specifying an engine like MergeTree for storage and do not support updates to existing rows.
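A minimal sketch of an incrementally maintained materialized view over the hypothetical hits_local table; because SummingMergeTree folds rows only during background merges, queries against the view still aggregate with sum().

CREATE MATERIALIZED VIEW daily_views_mv
ENGINE = SummingMergeTree()
ORDER BY (EventDate, URL)
AS SELECT EventDate, URL, count() AS views
FROM hits_local
GROUP BY EventDate, URL;

-- Each INSERT into hits_local also appends pre-aggregated rows to the view
SELECT EventDate, sum(views) AS views
FROM daily_views_mv
GROUP BY EventDate
ORDER BY EventDate;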
Data Ingestion and Compression
ClickHouse supports efficient data ingestion through several mechanisms designed to handle both batch and streaming workloads at scale. The primary method is the INSERT query, which allows users to load data directly into tables using SQL syntax, supporting formats such as CSV, JSON, and the native binary format for optimal performance.[27] This approach is particularly effective for appending data to MergeTree-family tables, which are append-only and ensure eventual consistency without immediate locking.[27]
For streaming ingestion, ClickHouse integrates with Apache Kafka via the Kafka table engine, enabling real-time data consumption from Kafka topics. This engine acts as a consumer, polling messages from specified topics and inserting them into ClickHouse tables, often in conjunction with materialized views to persist and transform the data for analytical use.[28] It supports fault-tolerant storage and allows configuration of consumer groups, offsets, and message formats like JSON or Avro to facilitate seamless pipeline integration.[29]
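A sketch of such a pipeline, assuming a broker at kafka:9092 and a topic named events; the broker address, topic, consumer group, and table names are placeholders.

-- Kafka consumer table: rows are read from the topic, not stored here
CREATE TABLE kafka_events
(
    ts      DateTime,
    user_id UInt64,
    action  String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

-- Durable storage for the consumed stream
CREATE TABLE events_store (ts DateTime, user_id UInt64, action String)
ENGINE = MergeTree ORDER BY (ts, user_id);

-- Materialized view continuously moves rows from Kafka into the MergeTree table
CREATE MATERIALIZED VIEW events_pipe TO events_store
AS SELECT ts, user_id, action FROM kafka_events;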
Integration with object storage such as Amazon S3 is achieved through table functions like s3(), which enable direct ingestion of data files without intermediate staging. Users can query S3 buckets as virtual tables and insert the results into persistent ClickHouse tables, supporting large-scale bulk loads from compressed files in formats like Parquet or ORC.[27]
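For example, the s3 table function can be queried in place or used as the source of an INSERT ... SELECT; the bucket URL, credential placeholders, and target table below are assumptions for the sketch.

-- Count rows directly in object storage
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet');

-- Bulk-load the same files into a persistent MergeTree table
INSERT INTO events_store
SELECT ts, user_id, action
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet',
        'AWS_KEY_PLACEHOLDER', 'AWS_SECRET_PLACEHOLDER', 'Parquet');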
ClickHouse is optimized for batch processing, where bulk inserts of 1,000 to 100,000 rows per operation minimize overhead and maximize throughput, outperforming single-row inserts by orders of magnitude. Asynchronous inserts further enhance this by buffering smaller batches server-side before merging, reducing latency in high-velocity environments.[27]
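Asynchronous inserts can be enabled per query through settings, as in this small sketch against the hypothetical events_store table.

-- The server buffers small inserts and flushes them as one part;
-- wait_for_async_insert = 1 makes the client wait until the buffer is flushed
INSERT INTO events_store SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 42, 'click');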
On the compression front, ClickHouse employs column-oriented storage with per-column codecs to achieve high compression ratios while preserving query speed. General-purpose algorithms such as LZ4 (the default in self-managed deployments) and ZSTD (applied at level 1 by default in ClickHouse Cloud) are applied after specialized encodings, reducing I/O and storage costs; for instance, applying ZSTD to a delta-encoded integer column can roughly halve its stored size.[16]
For low-cardinality data, such as categorical fields with few unique values, dictionary encoding via the LowCardinality type replaces repeated strings with integer indices mapped to a dictionary, yielding compression ratios up to 26:1 in datasets like content licenses.[16] The encoding is applied transparently during ingestion for columns declared with the LowCardinality type, and decoding is likewise transparent during queries.
Time-series data benefits from double-delta encoding, which stores the second differences of monotonically increasing sequences like timestamps, further compressed with ZSTD to exploit small delta patterns. In optimized schemas, this contributes to overall ratios of 2.7:1 or better for large datasets, such as reducing 68.87 GiB uncompressed to 25.15 GiB.[16]
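A sketch of how these codecs are declared per column; the table definition and the particular codec choices are illustrative rather than prescriptive.

CREATE TABLE metrics
(
    ts      DateTime CODEC(DoubleDelta, ZSTD),  -- second differences, then ZSTD
    license LowCardinality(String),             -- dictionary-encoded categorical field
    value   Float64  CODEC(ZSTD)
)
ENGINE = MergeTree
ORDER BY ts;

-- Compare on-disk and raw sizes per column
SELECT name,
       formatReadableSize(data_uncompressed_bytes) AS raw,
       formatReadableSize(data_compressed_bytes)   AS stored
FROM system.columns
WHERE database = currentDatabase() AND table = 'metrics';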
Advanced Capabilities
Replication and Fault Tolerance
ClickHouse implements replication primarily through the ReplicatedMergeTree engine family, which extends the core MergeTree storage to support asynchronous multi-master replication across multiple replicas for high availability and data durability.[30] This mechanism operates at the table level, allowing individual tables to be replicated independently while non-replicated tables coexist on the same server.[30] Coordination for replication is managed via ClickHouse Keeper or ZooKeeper (version 3.4.5 or higher), which stores metadata such as replica states, log entries for inserts and mutations, and queue information to ensure eventual consistency.[30] Inserts are performed on any replica, with data blocks asynchronously propagated to other replicas in compressed form, while background merges occur locally on each node to maintain the MergeTree structure.[31] Replication operates in a multi-master asynchronous manner without a dedicated leader election for merges or mutations; coordination relies on ClickHouse Keeper or ZooKeeper for metadata consistency across replicas.[32][31]

To enhance insert durability, ClickHouse supports quorum writes via the insert_quorum setting, which requires acknowledgment from a configurable number of replicas (commonly a majority) before confirming the insert operation.[30] This ensures that data survives the failure of a minority of nodes, as each data block is written atomically and deduplicated using a unique log entry in ZooKeeper.[30] For example, in a three-replica setup with insert_quorum=2, an insert succeeds only if at least two replicas persist the data, preventing loss from single-node failures.[30]
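A minimal sketch of a replicated table and a quorum-protected insert; the cluster name, Keeper path, and the {shard}/{replica} macros are placeholders that would normally come from the server configuration.

CREATE TABLE hits_replicated ON CLUSTER my_cluster
(
    EventDate Date,
    UserID    UInt64,
    URL       String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits_replicated', '{replica}')
ORDER BY (UserID, EventDate);

-- Confirmed only after two replicas have persisted the block
INSERT INTO hits_replicated SETTINGS insert_quorum = 2
VALUES ('2025-01-01', 42, 'https://example.com/');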
Backup strategies in ClickHouse leverage native snapshotting through the ALTER TABLE ... FREEZE PARTITION command, which creates instantaneous, space-efficient snapshots by using hard links to existing data parts without blocking reads or writes.[33] These snapshots can be stored locally or exported to object storage like S3, forming the basis for full and incremental backups.[33] Additionally, Time-to-Live (TTL) expressions on tables or columns automate data expiration by deleting, grouping, or moving rows after a specified interval during background merges, helping manage storage growth and compliance without manual intervention.[34] For instance, a TTL like TTL date + INTERVAL 30 DAY DELETE removes rows older than 30 days automatically.[34]
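Both operations are plain SQL statements, as in this sketch against the hypothetical hits_local table partitioned by month.

-- Hard-link snapshot of one partition, written under the server's shadow/ directory
ALTER TABLE hits_local FREEZE PARTITION 202501;

-- Expire rows 30 days after EventDate during background merges
ALTER TABLE hits_local MODIFY TTL EventDate + INTERVAL 30 DAY DELETE;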
As of 2025, ClickHouse supports lightweight updates for ReplicatedMergeTree tables, allowing efficient, replicated modifications via UPDATE and DELETE statements that propagate asynchronously across replicas.[35]
Fault tolerance is achieved through automatic recovery mechanisms, where failed replicas resynchronize missing data parts from active peers upon restart by consulting ZooKeeper's replication log.[31] For operations like distributed mutations, replicas coordinate asynchronously via ClickHouse Keeper or ZooKeeper; failed replicas recover by syncing from peers, with dynamic failover based on metadata availability rather than leader election.[32][30] For shard-level imbalances or expansions, resharding is manual, involving adjustments to Distributed table configurations and data movement via INSERT INTO ... SELECT queries, as automatic rebalancing is not natively supported to avoid performance overhead.[36] This design prioritizes query performance and simplicity, with replication ensuring no single point of failure for data access.[31]
Integration and Extensibility
ClickHouse provides robust integration capabilities through standardized connectors, enabling seamless connectivity with a wide array of external applications and services. The official JDBC driver allows Java-based applications to interact with ClickHouse databases using the standard JDBC API, supporting operations such as querying and updating data via a connection URL like jdbc:clickhouse://host:port. Similarly, the ODBC driver facilitates access from ODBC-compliant tools, enabling ClickHouse to serve as a data source for various analytics platforms by implementing the ODBC interface for read and write operations.[37][38]
For business intelligence (BI) tools, ClickHouse integrates directly with platforms like Tableau through dedicated connectors that leverage the JDBC or ODBC drivers. The Tableau connector, available via Tableau Exchange, simplifies setup by requiring the installation of the ClickHouse JDBC driver (version 0.9.2 or later) in the appropriate directory, followed by configuring connection parameters such as host, port (typically 8443 for secure connections), database, username, and password. This allows users to visualize ClickHouse data within Tableau Desktop or Server, supporting live queries and extract-based analysis for interactive dashboards.[39]
ClickHouse extends its functionality via user-defined functions (UDFs), which permit the implementation of custom logic in languages such as C++ or Python. Executable UDFs are configured through XML files specifying the function name, command, input/output formats (e.g., TabSeparated or JSONEachRow), and return types, with scripts placed in a designated user directory like /var/lib/clickhouse/user_scripts/. These UDFs process data via standard input/output streams, enabling complex computations—such as array manipulations or datetime operations in Python—that are not natively available, and can be invoked directly in SQL queries like SELECT my_udf_function(input). Additionally, SQL-based UDFs can be created using the CREATE FUNCTION statement with lambda expressions for simpler custom expressions.[40][41]
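A small example of the SQL-based variant (executable UDFs additionally require the XML configuration and external script described above); the function name and formula are arbitrary.

-- Lambda-style UDF usable anywhere an expression is allowed
CREATE FUNCTION click_through_rate AS (clicks, impressions) ->
    if(impressions = 0, 0, clicks / impressions);

SELECT click_through_rate(12, 480);  -- returns 0.025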
To support federated queries across heterogeneous data sources, ClickHouse offers external table engines, notably the JDBC table engine, which connects to remote databases like MySQL or PostgreSQL. This engine uses the clickhouse-jdbc-bridge program (run as a daemon) to bridge connections, allowing tables to be defined with ENGINE = JDBC(datasource, external_database, external_table), where the datasource is a JDBC URI including credentials. Queries against these tables enable data ingestion or analysis from external systems without full replication, supporting Nullable types and operations like SELECT for federated access.[42] ClickHouse also supports integration with Apache Iceberg (as of early 2025), allowing direct querying and ingestion of Iceberg tables via dedicated table engines and catalog connectors for hybrid analytical workloads.[43]
In cloud environments, ClickHouse is available as a fully managed service through ClickHouse Cloud, which deploys on AWS, GCP, and Azure with serverless architecture and automatic scaling. On AWS, users can opt for Bring Your Own Cloud (BYOC) deployments via AWS Marketplace, where clusters auto-scale vertically based on workload demands in Scale and Enterprise plans, handling resource provisioning without manual intervention. Similarly, GCP integrations provide managed instances with consumption-based pricing and built-in auto-scaling to ensure performance for analytical workloads across regions. These services eliminate infrastructure management, including backups and monitoring, while maintaining high availability.[44][45][46]
Limitations and Challenges
Scalability Constraints
ClickHouse deployments face practical limits on cluster size, primarily due to coordination overhead from components like ZooKeeper or its alternative, ClickHouse Keeper. While there is no hard-coded maximum, real-world implementations typically scale to hundreds of nodes, with the largest reported clusters exceeding a thousand nodes. Beyond this, ZooKeeper's metadata synchronization and coordination traffic can introduce significant latency and resource strain, particularly in environments with high replication factors or real-time inserts, making further expansion challenging without custom optimizations.[47][48]

Memory consumption poses another key constraint, especially for operations like joins and sorts that build large in-memory structures such as hash tables or sorted buffers. Joins on high-cardinality datasets can exceed available RAM, triggering out-of-memory errors or fallback to slower disk-based processing, while sorts via ORDER BY may require buffering substantial portions of result sets in memory before spilling to external, disk-based sorting. To mitigate this, ClickHouse provides settings like max_memory_usage to cap per-query allocation, but deployments often need ample RAM—typically 64 GB or more per node—to handle complex analytics without degradation. For storage, official guidelines strongly recommend SSDs over HDDs for primary data volumes, as SSDs deliver superior random read/write performance critical for merge operations and query execution; HDDs suffice only for archival or cold storage tiers but can bottleneck I/O-intensive workloads.[49][50]

Vertical scaling, by upgrading individual node resources like CPU and RAM, offers simplicity and efficiency for many ClickHouse workloads, allowing a single node to serve terabyte-scale datasets without distributed coordination overhead. However, it has limits tied to hardware availability and cost, beyond which horizontal scaling—adding shards and replicas—becomes necessary for fault tolerance and parallel query distribution across nodes. The trade-off lies in increased complexity: horizontal setups enhance resilience and handle massive concurrency but introduce network latency for cross-shard operations, potentially reducing overall efficiency if not balanced properly. ClickHouse documentation advises prioritizing vertical scaling for most use cases before expanding horizontally.[51][52]

Configuration choices can exacerbate scalability issues, notably over-sharding, where excessive shards lead to fragmented data distribution and heightened network traffic during query coordination and replication. The number of shards should be limited; while dozens are typically acceptable, excessive sharding can increase ZooKeeper coordination overhead, especially with frequent inserts and high replication, leading to synchronization delays and potential bottlenecks in distributed environments. Proper sharding—aligning with query patterns and data volume—avoids these pitfalls, ensuring balanced load without unnecessary inter-node communication overhead.[48][53]

Operational Overhead
ClickHouse provides built-in mechanisms for monitoring its operational state through system tables such as system.metrics, system.events, system.asynchronous_metrics, and system.dimensional_metrics, which offer real-time and historical data on server performance, query execution, and resource usage.[54] These tables allow administrators to query metrics like query throughput, memory consumption, and disk I/O directly via SQL, enabling proactive issue detection without external tools. For enhanced observability in distributed environments, ClickHouse integrates natively with Prometheus by exposing metrics in the Prometheus exposition format through HTTP endpoints, facilitating collection and alerting via tools like Grafana.[55]
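For example, the following introspection queries read directly from the system tables; the last one assumes query logging to system.query_log, which is enabled by default.

-- Point-in-time gauges and cumulative counters
SELECT metric, value FROM system.metrics WHERE metric ILIKE '%memory%';
SELECT event,  value FROM system.events  WHERE event = 'Query';

-- Ten most memory-hungry recently finished queries
SELECT query, formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY memory_usage DESC
LIMIT 10;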
Maintenance tasks in ClickHouse require periodic intervention to ensure optimal performance and data integrity. Background merges in MergeTree-family tables occur automatically, but manual merges can be triggered using the OPTIMIZE TABLE statement, particularly with the FINAL clause to consolidate parts and remove duplicates in ReplacingMergeTree tables, though this is resource-intensive and should be used sparingly.[56] External dictionaries, used for enriching queries with static or semi-static data, support automatic updates based on the LIFETIME parameter or manual reloading via the SYSTEM RELOAD DICTIONARIES command to refresh data from sources like HTTP or databases.[57] Log rotation is configured in the server's config.xml file under the <logger> section, where parameters such as <size> (maximum file size, e.g., 1000M) and <count> (number of archived files) prevent disk overflow from growing log files like clickhouse-server.log.[58]
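The merge and dictionary maintenance commands are ordinary SQL statements; in this sketch, user_profiles stands in for a hypothetical ReplacingMergeTree table that accumulates duplicate rows.

-- Force a merge of all parts and collapse duplicates; resource-intensive, use sparingly
OPTIMIZE TABLE user_profiles FINAL;

-- Refresh all external dictionaries from their configured sources
SYSTEM RELOAD DICTIONARIES;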
Security in ClickHouse is managed through a SQL-based access control system that supports creating users and roles with granular privileges, including CREATE ROLE for defining permission sets and GRANT statements to assign them to users or other roles, enforcing least-privilege principles across databases, tables, and even rows via row policies.[59] Connections are secured in transit using TLS/SSL encryption, configurable via server settings for certificate validation and cipher suites to protect data during client-server communication.[60] For data at rest, while ClickHouse Cloud employs default AES-256 encryption managed by the cloud provider, self-hosted deployments rely on underlying filesystem or storage-level encryption to safeguard persisted data.[61]
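A brief sketch of the SQL-based access control workflow; the role, user, database, and row-policy condition are illustrative placeholders.

CREATE ROLE analyst;
GRANT SELECT ON analytics.* TO analyst;

CREATE USER jane IDENTIFIED WITH sha256_password BY 'S3cretPassw0rd';
GRANT analyst TO jane;

-- Row policy: members of 'analyst' only see rows for their region
CREATE ROW POLICY region_filter ON analytics.events
FOR SELECT USING region = 'emea' TO analyst;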
Upgrading ClickHouse in production environments, especially clusters, follows a rolling update strategy to minimize downtime. Administrators upgrade replicas sequentially—stopping one node, installing the new binary, restarting, and waiting for data synchronization via the ReplicatedMergeTree engine—before proceeding to the next, ensuring continuous availability without full cluster shutdown.[62] This process is supported by compatibility guarantees in release notes, allowing minor version upgrades without data migration, though major upgrades may require reviewing the changelog for breaking changes.[63]