TiDB
TiDB is an open-source, distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads, offering MySQL protocol compatibility for seamless migration and application development.[1] Designed for cloud-native environments, it provides horizontal scalability by separating compute and storage layers, enabling elastic expansion across hundreds of nodes without downtime.[1] TiDB ensures strong consistency and financial-grade high availability through Multi-Raft consensus and multiple data replicas distributed across availability zones.[1]

Developed by PingCAP since 2015, TiDB addresses limitations of traditional databases by combining row-based storage in TiKV for transactional processing with columnar storage in TiFlash for real-time analytics, allowing unified handling of OLTP and OLAP queries on the same dataset.[2][1] Its architecture includes a stateless SQL layer (TiDB server) that parses and optimizes queries, a distributed key-value storage engine (TiKV) for data management, and the Placement Driver (PD) for cluster coordination.[3] TiDB supports petabyte-scale data volumes, up to 512 nodes per cluster, and up to 1,000 concurrent connections per node, making it suitable for high-traffic applications in e-commerce, gaming, and financial services.[1] In April 2025, PingCAP announced major enhancements to TiDB for improved global scale and AI-driven applications.[4]

Key features of TiDB include automatic sharding for load balancing, built-in data migration tools like TiCDC for change data capture, and integration with Kubernetes via TiDB Operator for simplified deployment and operations.[3] It also offers advanced capabilities such as JSON support, full-text search, and time-series data handling, while maintaining ACID compliance for transactions.[1] As a self-managed solution, TiDB allows full control over infrastructure, with options for on-premises, cloud, or hybrid setups, and is licensed under the Apache 2.0 license for broad community adoption.[1]

Overview
Description
TiDB is a cloud-native, distributed SQL database designed to support hybrid transactional and analytical processing (HTAP) workloads, providing horizontal scalability of up to 512 nodes per cluster while maintaining strong consistency and high availability.[5][6] Developed by PingCAP, it draws inspiration from Google's Spanner and F1 databases to enable seamless handling of both online transaction processing (OLTP) and online analytical processing (OLAP) in a single system.[7]

Founded in 2015 in Beijing, China, by a team of infrastructure engineers frustrated with the challenges of managing traditional databases at scale, PingCAP launched TiDB as an open-source project under the Apache 2.0 license.[2][8] The initiative aimed to address the limitations of standalone relational database management systems (RDBMS) like MySQL, which struggle with massive data growth and high concurrency in internet-scale applications.[7]

At its core, TiDB seeks to combine the familiarity and ease of use of traditional RDBMS such as MySQL—with full SQL compatibility and ACID transactions—with the horizontal scalability and resilience typically associated with NoSQL systems.[6][5] This hybrid approach allows organizations to build scalable, real-time applications without the need for complex sharding or separate OLTP and OLAP databases.[9]

Design principles
TiDB's architecture is fundamentally shaped by the principle of separating compute from storage, allowing the SQL processing layer (TiDB server) to operate independently of the distributed key-value storage layer (TiKV). This separation enables horizontal scaling of each layer without impacting the other; for instance, additional TiDB servers can be added to handle increased query loads while TiKV clusters expand storage capacity separately. By design, TiDB servers remain stateless, facilitating easy deployment and replacement in dynamic environments. In October 2025, PingCAP introduced TiDB X, an evolved architecture that further decouples compute and storage by using object storage as the backbone to support adaptive scaling and native AI workloads.[10][11][12]

A core tenet of TiDB's design is achieving strong consistency for distributed transactions, drawing inspiration from Google's Percolator model, which Google originally built on top of Bigtable. TiKV employs this model to manage multi-version concurrency control (MVCC) and two-phase commit protocols with optimizations for performance in large-scale environments. Replication is ensured through the Raft consensus algorithm, which provides fault-tolerant data duplication across multiple nodes—typically three replicas per data shard—to guarantee durability and availability even in the face of node failures.[13][10]

TiDB incorporates the Hybrid Transactional/Analytical Processing (HTAP) principle to unify online transaction processing (OLTP) and online analytical processing (OLAP) workloads on the same dataset, eliminating the need for extract, transform, load (ETL) pipelines. This is realized through TiKV handling transactional writes with row-oriented storage, complemented by TiFlash's column-oriented copies that accelerate real-time analytics queries without data duplication overhead. Such integration supports low-latency decision-making by allowing analytics to run directly on up-to-date transactional data.[14][10]

Compatibility with the MySQL protocol forms a foundational design goal, enabling TiDB to serve as a drop-in replacement for MySQL in most applications with minimal or no code modifications. This wire-protocol adherence covers SQL syntax, semantics, and ecosystem tools, reducing migration barriers for enterprises reliant on existing MySQL-based stacks. The design prioritizes retaining familiar behaviors, such as indexing and query optimization, while extending capabilities for distributed scenarios.[10][11]

Embracing cloud-native paradigms, TiDB is engineered with stateless components, automated failover mechanisms, and elastic scaling to thrive in multi-tenant cloud infrastructures. The Placement Driver (PD) cluster coordinates data placement and leader elections for high availability, ensuring seamless recovery from failures without manual intervention. This elasticity allows clusters to dynamically adjust resources based on demand, supporting both on-premises and managed cloud deployments.[10][11]

History
Development origins
PingCAP was founded in 2015 by Liu Qi, Huang Dongxu, and Cui Qiu, three experienced infrastructure engineers from leading Chinese Internet companies, with the goal of addressing the challenges of scaling traditional relational databases like MySQL for high-growth applications in e-commerce and big data environments.[15][16] The founders, frustrated by the limitations of existing database management, scaling, and maintenance practices, sought to create a distributed SQL database that maintained MySQL compatibility while enabling seamless horizontal scaling without the complexities of manual sharding.[2] Inspired by Google's Spanner, they aimed to build an open-source solution that could handle massive data volumes and real-time analytics for modern cloud-native workloads.[17]

The initial prototype of TiDB was developed in the Go programming language, building directly on the TiKV key-value store project, which originated in early 2015 as a foundational storage layer inspired by Raft consensus and capable of supporting distributed transactions.[18][19] Early development emphasized integrating TiKV's storage capabilities with SQL processing, starting with basic SQL parsing to ensure MySQL protocol compatibility from the outset. The project was hosted on GitHub from its inception, fostering immediate community involvement; initial commits focused on core parser implementation and rudimentary storage engine integration to prototype a stateless SQL layer atop the distributed key-value backend.[6] This approach allowed rapid iteration while leveraging Go's concurrency features for handling distributed query execution.

A primary challenge in this early period was achieving full ACID compliance in a distributed environment without imposing sharding or partitioning burdens on users, which the team addressed by designing TiDB as a "one-stop" solution where the database automatically managed data distribution and consistency via optimistic concurrency control and multi-version concurrency control (MVCC).[2] This innovation enabled strong consistency across nodes without sacrificing the familiarity of MySQL syntax and semantics. In 2016, PingCAP secured its first major funding round, a $5 million Series A led by Yunqi Partners, which supported the shift toward full-time focus on database innovation and accelerated prototype refinement into a production-ready system.[20][21]

Release milestones
TiDB's initial stable release, version 1.0, arrived on October 16, 2017, marking the production-ready debut of the distributed NewSQL database with core support for distributed SQL capabilities, including horizontal scalability and strong consistency via the Raft consensus protocol integrated into TiKV.[22] In April 2018, TiDB 2.0 was launched, introducing significant enhancements to horizontal scaling through improved region splitting and merge mechanisms in TiKV, alongside optimizations for storage engine performance to handle larger workloads more efficiently.[23] TiDB 4.0, released on June 17, 2020, advanced hybrid transactional/analytical processing (HTAP) by integrating TiFlash, a columnar storage engine that enables real-time analytics directly on transactional data without data duplication.[24] The April 7, 2021, release of TiDB 5.0 focused on query execution improvements, including vectorized execution for faster analytical processing, and expanded compatibility with MySQL 8.0 features such as window functions and common table expressions.[25]

TiDB 6.0, unveiled on April 7, 2022, incorporated enterprise-grade resource control via quota management and auto-scaling capabilities to dynamically adjust cluster resources based on workload demands.[26] On March 30, 2023, TiDB 7.0 debuted with bolstered security features like enhanced role-based access control and integrated monitoring tools for better observability and compliance in production environments.[27] TiDB 8.0, released March 29, 2024, introduced bulk DML support for large transactions to mitigate out-of-memory issues, enhanced optimizer support for multi-valued indexes on JSON data, and accelerated cluster snapshot restore speeds by 1.5-3x.[28] TiDB 8.5, released as a Long-Term Support version on December 19, 2024, brought general availability for foreign key support, client-side encryption for backup data, and experimental vector search capabilities, along with improvements in scalability such as the TiKV MVCC in-memory engine.[29]

In 2025, TiDB Cloud expanded with a public preview of its Dedicated service on Microsoft Azure on June 4, enabling managed deployments in that ecosystem for broader cloud portability.[30] On October 8, 2025, at the SCaiLE Summit, PingCAP announced TiDB X, a rearchitected version introducing context-aware scaling, zero-friction elasticity, and native AI integrations for adaptive resource allocation in intelligent applications.[31]

Architecture
Core components
TiDB's architecture is composed of several key components that work together to provide a distributed, scalable database system. These include the TiDB Server for SQL processing, the Placement Driver (PD) for cluster management, TiKV for data storage, TiFlash for analytical workloads, and integrated monitoring tools for observability. This modular design enables horizontal scaling and high availability across nodes. In 2025, PingCAP introduced TiDB X, a new cloud-native architecture variant that uses object storage as the backbone for enhanced decoupling of compute and storage, supporting context-aware scaling and native AI integrations, available in TiDB Cloud tiers.[12][3]

The TiDB Server serves as a stateless SQL layer that acts as the primary interface for client connections. It handles query parsing by analyzing MySQL protocol packets for syntactic and semantic validity, followed by optimization to generate efficient distributed execution plans, such as pushing down predicates and aggregations to storage layers. Execution involves coordinating data retrieval from underlying stores and assembling results, all while maintaining full compatibility with the MySQL wire protocol to allow seamless integration with existing MySQL tools and applications. As a stateless component, TiDB Servers can be scaled independently without data locality concerns.[32]

The Placement Driver (PD) functions as the central cluster manager, maintaining metadata for data distribution, cluster topology, and transaction identifiers. It performs scheduling to balance load across nodes by allocating data regions and handling failover, ensuring even distribution and resource utilization. For high availability, PD is deployed in clusters of at least three nodes using an odd number to achieve consensus and fault tolerance.[3]

TiKV is the distributed transactional key-value store that forms the foundational storage layer of TiDB. It organizes data into ordered key ranges called Regions, each replicated across multiple nodes for durability. Locally, TiKV relies on RocksDB as its embedded storage engine to manage persistent key-value data on disk. Replication is achieved through the Raft consensus algorithm, supporting multi-region deployments by ensuring data consistency and automatic recovery from failures. By default, TiKV maintains three replicas per region to provide high availability.[33]

TiFlash, introduced in TiDB version 4.0, is a columnar storage engine optimized for analytical processing within the HTAP framework. It asynchronously replicates data from TiKV using Raft Learner roles, enabling real-time synchronization while co-locating instances with TiKV nodes to minimize latency for hybrid transactional and analytical workloads. TiFlash leverages coprocessors built on ClickHouse for efficient columnar query execution, such as aggregations and scans, without disrupting OLTP operations.[34][1]

TiDB incorporates robust monitoring through integrations with Prometheus for collecting and storing time-series metrics from all components, and Grafana for visualizing dashboards across categories like cluster overview, TiDB performance, and TiKV storage. Additionally, the built-in TiDB Dashboard, managed by PD since version 4.0, provides a web-based interface for real-time inspection of data distribution and topology. These tools enable proactive observability in multi-tenant environments.[35]

In terms of overall topology, TiDB operates as a multi-tenant system where PD acts as the coordinating "brain" for metadata and scheduling, TiKV provides the core row-based storage backbone, and TiDB Servers function as scalable gateways for SQL access, with TiFlash optionally extending capabilities for analytics. This separation allows independent scaling of compute, storage, and management layers.[3]

Data flow and storage
In TiDB, the query lifecycle begins when a client connects to the TiDB Server using the MySQL protocol, sending commands and statement strings after authentication.[36] The server maintains session state, such as SQL mode and transaction context, while handling synchronous queries like non-prepared statements via the mysql.ComQuery packet.[36] The statement string is then parsed into an Abstract Syntax Tree (AST) using a MySQL-compatible parser, enabling structured representation of clauses like WHERE conditions as nested expressions.[36]
Following parsing, the optimizer compiles the AST into a logical plan and then a physical execution plan using cost-based optimization, incorporating name resolution and privilege checks; simple queries may use fast planning paths like PointGet for efficiency.[36] During execution, the plan is run via the executor, which pushes down coprocessor tasks—such as scans, selections, or aggregations—to TiKV or TiFlash storage nodes to process data locally and minimize network transfer, with results filtered and returned to the client.[36] This pushdown model ensures that computations occur close to the data, enhancing performance in distributed environments.[37]
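The pushdown behavior described above can be observed with TiDB's EXPLAIN statement, whose plan output marks operators executed on the storage layer as coprocessor tasks (for example, cop[tikv]). A minimal sketch follows, using a hypothetical orders table that is not part of the cited sources:

```sql
-- Hypothetical schema used only for illustration.
CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount      DECIMAL(10, 2),
    created_at  DATETIME,
    KEY idx_customer (customer_id)
);

-- In the plan, filtering and partial aggregation typically appear beneath a
-- TableReader whose task column reads cop[tikv], indicating that those steps
-- run as coprocessor tasks on the TiKV nodes rather than in the TiDB Server.
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
```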
TiDB employs a hybrid storage model, with TiKV providing row-oriented storage as an ordered key-value map implemented via the Log-Structured Merge-Tree (LSM-tree) in RocksDB for transactional workloads.[13] Complementing this, TiFlash offers columnar storage optimized for analytical queries, enabling efficient aggregation and scan operations on large datasets.[13] The Placement Driver (PD) automatically manages data placement by distributing regions across TiKV and TiFlash nodes, ensuring balanced load and fault tolerance without manual intervention.[13]
Data sharding in TiDB is implicit and range-based, partitioning tables into consecutive key segments called regions—typically around 256 MiB each (default since v8.4.0)—stored in TiKV.[38][13] PD monitors region sizes and loads, triggering automatic splitting when regions exceed thresholds to prevent hotspots, and performs balancing by migrating regions across nodes for even distribution.[13]
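Region layout can be inspected and pre-split from SQL using TiDB-specific statements; a brief sketch, reusing the hypothetical orders table from the earlier example:

```sql
-- List the Regions backing the table, including key ranges, leader stores,
-- and approximate sizes as tracked by PD.
SHOW TABLE orders REGIONS;

-- Pre-split the table's row-key range into 16 Regions ahead of a bulk load
-- so that writes spread across TiKV nodes instead of hitting one hotspot;
-- PD then schedules and balances the resulting Regions.
SPLIT TABLE orders BETWEEN (0) AND (1000000000) REGIONS 16;
```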
Replication in TiDB defaults to three replicas per region, leveraging the Raft consensus algorithm to maintain strong consistency; data modifications are logged and propagated to followers, requiring majority acknowledgment for commits.[13] This setup supports multi-data-center (multi-DC) deployments for geo-redundancy, where regions can be placed across DCs to enable disaster recovery while preserving availability.[13]
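Replica placement across zones or data centers can be expressed declaratively with Placement Rules in SQL (generally available since TiDB 6.0). The following is an illustrative sketch; the region labels are hypothetical and must match the topology labels actually configured on TiKV and PD:

```sql
-- Keep Region leaders in one region and spread follower replicas across
-- others for geo-redundancy.
CREATE PLACEMENT POLICY geo_redundant
    PRIMARY_REGION = "us-east-1"
    REGIONS = "us-east-1,us-west-2,eu-west-1"
    FOLLOWERS = 4;

-- Attach the policy to a table; PD reschedules its Region replicas to match.
ALTER TABLE orders PLACEMENT POLICY = geo_redundant;
```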
Transactions in TiDB follow a two-phase commit (2PC) protocol coordinated by the TiDB Server, drawing from the Percolator model to ensure ACID properties across distributed nodes.[39] The process starts with obtaining a start timestamp from PD, followed by read operations using multi-version concurrency control (MVCC) and buffered writes; in the prewrite phase, locks are acquired on keys in parallel, and the commit phase assigns a commit timestamp before finalizing updates asynchronously.[39] TiDB supports both optimistic and pessimistic modes: optimistic transactions defer conflict detection to the prewrite phase, retrying on failures for low-contention scenarios, while pessimistic mode acquires locks during execution using a for-update timestamp to handle high contention, supporting isolation levels like repeatable read.[39]
For data ingestion, TiDB Lightning facilitates bulk loading of large datasets, such as TB-scale imports from files like SQL dumps or CSV, in either physical mode—for high-speed key-value ingestion directly into TiKV—or logical mode—for ACID-compliant SQL execution on live clusters.[40] Complementarily, Data Migration (DM) enables streaming ingestion by parsing and replicating incremental binlog events from upstream MySQL or MariaDB sources, handling DML and DDL changes with filtering and sharding merge rules to maintain consistency during ongoing operations.[41]
Features
Horizontal scalability
TiDB achieves linear horizontal scalability by decoupling its compute and storage layers, allowing independent expansion of resources to handle growing workloads. Additional TiDB Server nodes can be added to increase read and write throughput, as the stateless SQL layer processes queries in parallel behind a load balancer like TiProxy. Similarly, scaling out TiKV nodes expands storage capacity and I/O performance, with data automatically redistributed across the cluster. The Placement Driver (PD) enables auto-scaling by monitoring cluster health and orchestrating resource allocation without manual intervention.[42]

Central to this scalability is TiKV's region management, where data is partitioned into discrete units called Regions, each approximately 256 MiB in size by default. These Regions split automatically when exceeding 384 MiB to prevent overload, dividing into two or more smaller units, and merge when falling below 54 MiB to optimize space and efficiency. PD detects hot-spots—Regions experiencing disproportionate load—through metrics like read/write traffic and CPU usage, then migrates or rebalances them across nodes to maintain even distribution and avoid bottlenecks.[43]

In practice, TiDB clusters can sustain over 1 million queries per second (QPS) while scaling to petabyte-level data volumes, all without downtime, as demonstrated in production environments like Flipkart's e-commerce platform. This elasticity supports seamless growth for high-traffic applications, with compression and sharding ensuring efficient resource use at massive scales.[44][45]

Introduced in 2025, TiDB X enhances this capability with context-aware scaling tailored for AI workloads, using real-time signals such as QPS, latency, and query patterns to predict and provision resources proactively. This architecture leverages object storage for zero-friction elasticity, adjusting compute and storage in minutes to accommodate dynamic AI-driven demands like vector search and operational analytics on a unified platform.[31]

Despite these strengths, horizontal scalability in multi-region deployments can introduce network latency challenges, with cross-region round-trip times potentially reaching hundreds of milliseconds. TiDB mitigates this through follower reads, enabling replica nodes in local regions to serve queries and reduce traffic by up to 50%, thus minimizing global synchronization overhead.[46]
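Follower reads are enabled per session with a system variable, allowing replicas near the client to answer read requests; a minimal sketch:

```sql
-- Permit this session's reads to be served by Raft follower replicas
-- instead of always routing to the Region leader.
SET SESSION tidb_replica_read = 'follower';

-- A point read like this one can now be answered by a nearby replica,
-- avoiding a cross-region round trip to the leader.
SELECT * FROM orders WHERE id = 42;
```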
MySQL compatibility

TiDB provides extensive compatibility with MySQL at both the wire protocol and functional levels, allowing applications built for MySQL to connect and operate with minimal modifications. This compatibility is a core design goal, enabling seamless adoption in MySQL-centric ecosystems without requiring code changes for most workloads.[47]

TiDB supports the full MySQL 5.7 and 8.0 wire protocol, permitting direct connections using standard MySQL clients, drivers, and connection strings. For instance, applications can substitute a TiDB endpoint for a MySQL one without altering client configurations, as the protocol handles authentication, query execution, and result sets identically. This wire-level parity extends to tools like MySQL Workbench and Navicat, which connect natively to TiDB clusters.[47][48]

In terms of SQL dialect, TiDB achieves high compatibility with MySQL 5.7 and 8.0 syntax for DDL and DML operations, covering most common use cases including index creation, table partitioning (HASH, RANGE, LIST, KEY types), and basic query constructs. It supports advanced analytical features such as window functions and common table expressions (CTEs), aligning with MySQL 8.0 standards—window functions operate similarly to MySQL's implementation, enabling ranking, aggregation over partitions, and ordered computations. However, gaps exist in specialized areas, such as limited support for geographic information system (GIS) functions, spatial data types, and indexes, as well as the absence of stored procedures, triggers, events, FULLTEXT indexes, and XML functions. DDL operations like online schema changes are supported, but multi-object ALTER TABLE statements and certain type conversions (e.g., DECIMAL to DATE) are restricted.[47][49][50]

TiDB integrates effectively with the MySQL ecosystem, including object-relational mapping (ORM) frameworks like Hibernate (via a TiDB dialect in Hibernate 6.0 and later) and tools such as phpMyAdmin and DBeaver for administration and querying. It also supports binlog-style replication through TiCDC, which captures change data in formats compatible with MySQL consumers like Kafka and Debezium, facilitating downstream synchronization. These integrations ensure that monitoring, backup, and development workflows from the MySQL landscape transfer directly to TiDB environments.[51][52][53]

Migration to TiDB is straightforward for most applications, as no schema alterations are required for standard MySQL schemas, and tools like TiDB Data Migration (DM) handle both full and incremental data synchronization from MySQL or MariaDB sources without downtime. DM replicates DDL and DML changes while preserving compatibility with MySQL's protocol and binlog formats, making it suitable for hybrid or phased transitions. This ease of migration underscores TiDB's role as a drop-in replacement for scaling MySQL deployments.[41][47]

With the release of TiDB 8.0 in March 2024, compatibility with MySQL 8.0 was further solidified, incorporating features like window functions, CTEs, and system variables such as div_precision_increment for precise division handling. Earlier versions like 7.4 (October 2023) marked official MySQL 8.0 alignment, but TiDB 8.0 extends this with enhanced DML optimizations and ecosystem tooling, ensuring parity for modern MySQL applications. Default settings, including charset utf8mb4 and collation utf8mb4_bin, match MySQL 8.0 conventions to minimize behavioral differences.[28][50][54]
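Because TiDB implements MySQL 8.0 constructs such as CTEs and window functions over the same wire protocol, statements like the following run unchanged against either system; the orders table is the hypothetical example used earlier:

```sql
-- Rank each customer's orders by amount with a CTE and a window function;
-- the identical statement is valid on MySQL 8.0 and on TiDB.
WITH ranked AS (
    SELECT
        customer_id,
        id AS order_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
    FROM orders
)
SELECT customer_id, order_id, amount
FROM ranked
WHERE rn <= 3;
```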
Distributed transactions
TiDB supports distributed ACID transactions across its nodes, ensuring atomicity, consistency, isolation, and durability in a horizontally scaled environment. This is achieved through a hybrid transaction model inspired by Google's Percolator, which combines multi-version concurrency control (MVCC) for isolation and a two-phase commit (2PC) protocol for atomicity.[55][56]

In the 2PC implementation, the TiDB server acts as the transaction coordinator, while TiKV nodes serve as participants managing data storage and replication. The process begins with a prewrite phase, where TiDB assigns a start timestamp from the Placement Driver (PD) and sends parallel prewrite requests to relevant TiKV Regions; locks are acquired on keys if no conflicts exist, validated against existing MVCC versions, with the primary key lock coordinating secondary keys. If prewrites succeed, the commit phase follows: TiDB commits the primary key first via Raft consensus on TiKV, then secondaries, releasing locks and externalizing committed versions to ensure all-or-nothing atomicity and durability through quorum replication.[56][57]

TiDB employs MVCC to enable snapshot isolation, allowing transactions to read consistent snapshots of data committed before their start timestamp, thus avoiding dirty reads and non-repeatable reads without blocking writers. The default isolation level is Repeatable Read, compatible with MySQL's semantics, while Read Committed (introduced in v4.0) and Read Uncommitted are also supported for less stringent scenarios; Serializable isolation, as defined by SQL-92, is not natively provided, though pessimistic locking can approximate stricter controls in high-conflict cases.[58][59]

To handle varying workloads, TiDB offers both optimistic and pessimistic transaction modes. Optimistic transactions assume low conflict rates and defer conflict detection until the commit phase, minimizing locking overhead and improving throughput in read-heavy or low-contention environments. Pessimistic mode, the default for clusters created with v3.0.8 and later and also selectable via configuration or statements like BEGIN PESSIMISTIC, acquires locks early during reads (e.g., SELECT FOR UPDATE) and writes, suiting high-conflict scenarios by preventing aborts but potentially increasing latency due to blocking. Both modes build on the same 2PC foundation, though pessimistic mode adds a pipelined locking phase for efficiency.[60][61][62]
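The transaction mode can be chosen per session or per transaction; a short sketch, again against the hypothetical orders table:

```sql
-- Use pessimistic locking as this session's default mode.
SET SESSION tidb_txn_mode = 'pessimistic';

-- Or pick the mode explicitly for a single transaction.
BEGIN PESSIMISTIC;
-- SELECT ... FOR UPDATE takes pessimistic locks on the matching rows,
-- blocking conflicting writers until this transaction commits or rolls back.
SELECT * FROM orders WHERE id = 42 FOR UPDATE;
UPDATE orders SET amount = amount + 10 WHERE id = 42;
COMMIT;

-- Optimistic transactions skip locking during execution; write conflicts are
-- detected at commit time and surface as retryable errors.
BEGIN OPTIMISTIC;
UPDATE orders SET amount = amount - 5 WHERE id = 42;
COMMIT;
```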
Deadlock detection occurs automatically in pessimistic mode at the TiKV layer, where circular wait dependencies among lock requests are identified; upon detection, one transaction is aborted with error code 1213, and wait timeouts (default 50 seconds) trigger error 1205 if unresolved. Lock information can be queried via INFORMATION_SCHEMA.DEADLOCKS or CLUSTER_DEADLOCKS tables for troubleshooting.[63][61]
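Recent deadlock records can be queried directly; a minimal sketch using the system tables named above:

```sql
-- Deadlocks recorded by the TiDB server the client is connected to.
SELECT * FROM INFORMATION_SCHEMA.DEADLOCKS;

-- Cluster-wide view that aggregates deadlock records from all TiDB servers.
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_DEADLOCKS;
```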
TiDB provides linearizability as the default consistency model for reads and writes, ensuring operations appear atomic and in strict total order as if executed sequentially, enforced by Raft consensus and timestamp ordering from PD. For multi-region deployments, causal consistency is supported when features like async commit are enabled, preserving operation dependencies across regions while reducing latency at the cost of weaker global ordering guarantees.[64][65]
Transaction performance in TiDB emphasizes low latency, with timestamp oracle (TSO) allocation from PD typically under 1 ms for local operations, enabling sub-millisecond point reads in optimistic mode under low load. Commit latency averages around 12-13 ms for typical workloads, influenced by prewrite and Raft replication durations; async commit, introduced in v5.0, accelerates cross-region transactions by decoupling secondary commits, reducing end-to-end latency while maintaining causal consistency when paired with one-phase commit options.[66][67][68]
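Async commit and single-Region one-phase commit are governed by system variables, both enabled by default in v5.0 and later; a brief sketch:

```sql
-- Return success to the client once the prewrite phase has durably succeeded,
-- completing the commit phase on secondary keys asynchronously.
SET GLOBAL tidb_enable_async_commit = ON;

-- Let transactions whose writes fall in a single Region commit in one phase,
-- skipping the second phase of 2PC entirely.
SET GLOBAL tidb_enable_1pc = ON;
```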
Cloud-native design
TiDB is engineered with a cloud-native architecture that leverages container orchestration platforms like Kubernetes to ensure seamless deployment, management, and scaling in dynamic cloud environments. The TiDB Operator, an extension for Kubernetes, automates the full lifecycle of TiDB clusters, including provisioning, upgrading, scaling, and failover operations, allowing operators to manage distributed databases declaratively through Kubernetes custom resources.[69][1] This containerization approach enables TiDB to run portably across various infrastructures, abstracting underlying hardware complexities and facilitating rapid iteration in microservices-based applications.[70]

A core aspect of TiDB's cloud-native design is its elasticity, achieved through the decoupling of compute and storage layers, which permits independent scaling of resources without downtime. In Kubernetes deployments, this manifests as dynamic resource allocation via Horizontal Pod Autoscaler (HPA) integrations and auto-healing mechanisms that restart or reschedule failed pods automatically, maintaining cluster resilience against node failures or traffic spikes.[71][72] TiDB's storage engine, TiKV, further enhances this by utilizing cloud-native object storage like Amazon S3 or compatible services, ensuring data durability and scalability decoupled from compute instances.[73]

TiDB supports multi-cloud deployments across major providers including AWS, Google Cloud Platform (GCP), Microsoft Azure, and Alibaba Cloud, enabling organizations to avoid vendor lock-in while leveraging region-specific advantages. The TiDB Cloud Serverless offering, launched in July 2023, introduces a pay-per-use model that automatically scales compute resources from zero to handle variable workloads, optimizing costs for bursty applications.[74][73] In 2025, the introduction of TiDB X architecture further advanced serverless scaling, providing enhanced elasticity for unpredictable AI-driven workloads through optimized resource orchestration and faster auto-scaling responses.[12]

For observability, TiDB integrates natively with Prometheus for metrics collection and Grafana for visualization, exposing key performance indicators such as query latency, throughput, and resource utilization to enable proactive monitoring in cloud setups.[35] Tracing capabilities support distributed request tracking, aligning with standards like OpenTelemetry for end-to-end visibility in microservices architectures.[75]

Security in TiDB's cloud-native design incorporates role-based access control (RBAC) for fine-grained permissions on database operations, mandatory TLS encryption for all network communications to protect data in transit, and seamless integration with cloud provider IAM systems for centralized identity management.[76][77][78] These features ensure compliance with standards like GDPR and HIPAA while simplifying secure operations across hybrid and multi-cloud environments.[79]
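Because account management follows MySQL semantics, role-based access control and TLS requirements are expressed with familiar SQL; a hedged sketch with hypothetical role, schema, and user names:

```sql
-- A role carrying read-only access to the application schema.
CREATE ROLE 'analyst';
GRANT SELECT ON app.* TO 'analyst';

-- A user that must connect over TLS, granted the role and activating it
-- by default on login.
CREATE USER 'alice'@'%' IDENTIFIED BY 'change-me' REQUIRE SSL;
GRANT 'analyst' TO 'alice'@'%';
SET DEFAULT ROLE ALL TO 'alice'@'%';
```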
HTAP capabilities

TiDB supports Hybrid Transactional/Analytical Processing (HTAP) through a unified engine that separates online transaction processing (OLTP) workloads on its row-oriented storage layer, TiKV, from online analytical processing (OLAP) workloads on the column-oriented TiFlash layer, while maintaining shared data access to enable zero-ETL pipelines.[80] This architecture ensures strong consistency across both engines without requiring data duplication or batch exports, allowing transactional updates in TiKV to propagate in real-time to TiFlash for immediate analytical use.[34]

TiFlash operates as a distributed columnar storage system integrated into the TiDB cluster, featuring Raft-replicated replicas that provide high availability and fault tolerance for analytical data.[34] These replicas are created selectively per table via manual configuration, such as using DDL commands to replicate specific tables from TiKV, enabling asynchronous synchronization of data regions without impacting OLTP performance on the primary store.[34] By leveraging Multi-Raft protocols with learner roles, TiFlash maintains logical consistency and snapshot isolation, supporting efficient columnar scans and aggregations through integrated coprocessors based on ClickHouse technology.[34]

The TiDB query optimizer employs cost-based routing to direct analytical queries, such as those involving heavy aggregations or joins, to TiFlash replicas, achieving speedups of 10x or more compared to row-store processing on TiKV.[81] For instance, complex aggregation queries can execute up to 8x faster in benchmarks like TPC-H at scale 100, due to columnar storage optimizations and massively parallel processing (MPP) distribution across TiFlash nodes.[25] This routing is automatic based on query patterns, with optional SQL hints available for explicit control, ensuring OLTP queries remain on TiKV for low-latency point reads and writes.[82]

TiDB's HTAP design delivers real-time analytics with sub-second latency for ad-hoc queries on up-to-date transactional data, integrating seamlessly with business intelligence tools like Tableau for interactive dashboards and reporting.[80] Since version 5.0, enhancements including a vectorized execution engine in TiFlash have further accelerated scan-intensive operations by enabling MPP mode for distributed joins and aggregations, reducing query times for large datasets exceeding 10 million rows.[25] These capabilities are particularly valuable in use cases such as e-commerce platforms performing real-time inventory analysis to optimize stock levels during peak sales, or financial systems detecting fraud patterns through instant analytical scans on transaction streams.[80]
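Columnar replicas are added per table with a DDL statement, after which the optimizer can route analytical queries to TiFlash, optionally forced with a hint; a sketch against the hypothetical orders table:

```sql
-- Create two TiFlash replicas of the table; the columnar copies are kept in
-- sync asynchronously from TiKV through Raft learners.
ALTER TABLE orders SET TIFLASH REPLICA 2;

-- Check replication progress for the table (AVAILABLE and PROGRESS columns).
SELECT * FROM INFORMATION_SCHEMA.TIFLASH_REPLICA
WHERE TABLE_NAME = 'orders';

-- The optimizer normally chooses the engine by cost; this hint pins the
-- aggregation to the TiFlash replicas explicitly.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
    customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```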
High availability

TiDB achieves high availability through a distributed architecture that emphasizes fault tolerance and automatic recovery mechanisms. At the core of its data replication strategy is the TiKV storage layer, which employs multi-replica Raft consensus groups for each data region. Typically, three replicas are maintained per region, ensuring that data is durably stored across multiple nodes. The Placement Driver (PD) component schedules these replicas with awareness of topology labels, such as zones and regions, to promote diversity and prevent single points of failure, thereby enhancing disaster recovery capabilities.[43][83][84]

Failover in TiDB is handled seamlessly via Raft's automatic leader election process, which detects and resolves node failures by electing a new leader among replicas, typically completing within seconds to minimize downtime. The TiDB server layer is stateless, allowing for instantaneous scaling in or out without data loss or reconfiguration, as compute nodes do not persist state and can be replaced dynamically. This design ensures that client connections remain uninterrupted during failures, with the system retrying operations in milliseconds as needed.[85][86][3]

For backup strategies supporting high availability, TiDB utilizes the Backup & Restore (BR) tool to enable point-in-time recovery (PITR), allowing clusters to be restored to any specific timestamp within the retention period using snapshot and log backups. Additionally, asynchronous replication modes facilitate disaster recovery (DR) by switching to non-synchronous log replication when primary replicas fail, maintaining availability without strict synchronization guarantees during outages.[87][88][89]

Monitoring and self-healing further bolster uptime, with integrated alerting systems that notify on node failures and critical errors through Prometheus and Grafana dashboards. TiDB Operator, when deployed on Kubernetes, automates self-healing by managing pod restarts, scaling, and recovery, ensuring the cluster responds to faults without manual intervention. In TiDB Cloud, these features contribute to a 99.99% service level agreement (SLA), guaranteeing resilience against node or zone failures with no data loss.[90][91][74]

Vector search and AI integration
TiDB provides native vector database functionality, allowing users to store and query vector embeddings directly within its SQL framework. This support includes the creation of vector columns using a dedicated VECTOR data type to hold embeddings generated by models such as those from OpenAI or Hugging Face.[92] The system integrates approximate nearest neighbor (ANN) search capabilities, enabling efficient similarity searches for high-dimensional data in AI and machine learning applications.[92]
A key component is the Hierarchical Navigable Small World (HNSW) index, which TiDB uses for vector indexing to accelerate k-nearest neighbors (k-NN) queries. Users can create an HNSW index on a vector column with DDL that names the distance function to index, for example CREATE VECTOR INDEX idx ON t ((VEC_COSINE_DISTANCE(embedding)));, which builds a graph-based structure for fast approximate searches with high recall rates, often up to 98% accuracy in benchmarks (see the sketch below).[93] This integration allows semantic search operations to be performed seamlessly alongside traditional SQL queries, without requiring separate vector databases.[94]
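A hedged sketch of the workflow described above, using a hypothetical docs table and three-dimensional embeddings for brevity; the VECTOR column type, the VEC_COSINE_DISTANCE function, and the HNSW vector index follow TiDB's vector search syntax:

```sql
-- Table with a fixed-dimension vector column and an HNSW index defined on
-- the cosine-distance expression that queries will order by
-- (vector indexes are built on the table's TiFlash replica).
CREATE TABLE docs (
    id        BIGINT PRIMARY KEY,
    content   TEXT,
    embedding VECTOR(3),
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);

-- Vector literals are written as bracketed lists.
INSERT INTO docs VALUES
    (1, 'intro to raft', '[0.1, 0.2, 0.3]'),
    (2, 'htap overview', '[0.2, 0.1, 0.4]');

-- Approximate k-NN: the 5 documents closest to a query embedding, ordered by
-- cosine distance so the HNSW index can serve the search.
SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, 0.25]') AS distance
FROM docs
ORDER BY VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, 0.25]')
LIMIT 5;
```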
TiDB enhances AI workloads through features like hybrid search, which combines vector similarity with full-text search to improve relevance in retrieval tasks. For instance, queries can fuse cosine similarity on embeddings with keyword matching using functions like MATCH AGAINST in a single SQL statement.[95] Additionally, it supports retrieval-augmented generation (RAG) pipelines by providing real-time access to fresh data, enabling large language models (LLMs) to ground responses in up-to-date embeddings and structured information.[96]
In 2025, PingCAP introduced TiDB X, a context-aware architecture designed for zero-downtime scaling of AI models and native integration with LLMs. TiDB X leverages object storage as its backbone to handle dynamic workloads, allowing seamless expansion of vector datasets while maintaining query consistency and integrating directly with frameworks like LangChain for agentic AI applications.[31] This advancement builds on TiDB's HTAP capabilities to support real-time AI analytics on vector data.[97]
Vector embeddings are stored in TiKV for transactional workloads and TiFlash for analytical processing, utilizing columnar storage to optimize ANN algorithms like HNSW for billion-scale datasets. This distributed storage ensures fault-tolerant persistence and horizontal scaling of vectors across clusters.[98][92]
Common use cases include recommendation systems, where vector search powers personalized content suggestions based on user embeddings; chatbots, enabling semantic understanding of queries through similarity matching; and anomaly detection, identifying outliers in time-series data via distance metrics on embedded features.[92]
Performance-wise, TiDB's vector search delivers low millisecond latency for k-NN queries on large datasets, thanks to optimized HNSW indexing and in-memory graph traversal.[99] This efficiency supports real-time AI inference at scale, with recall rates balancing speed and accuracy for production environments.[93]
Deployment options
On-premises methods
TiDB supports several self-managed deployment options for on-premises environments, such as bare metal servers or virtual machines, enabling organizations to operate clusters without relying on cloud infrastructure. These methods leverage command-line tools and automation scripts to provision, configure, and maintain TiDB components including Placement Driver (PD), TiDB servers, and TiKV nodes.[100][101]

The primary tool for on-premises deployments is TiUP, a CLI-based cluster management solution that facilitates single-command operations for deploying, upgrading, and scaling TiDB clusters. TiUP operates from a control machine, using a YAML-formatted topology file to define the cluster layout, including host specifications, node roles (e.g., PD, TiDB, TiKV), and resource allocations. This allows for straightforward setup on bare metal or VMs, with built-in support for rolling upgrades and scaling without downtime; for instance, adding TiKV nodes involves updating the topology file and executing a scale-out command. TiUP also integrates monitoring components like Prometheus and Grafana during deployment, providing metrics for observability.[100][102]

For multi-node automation, TiDB Ansible offers a playbook-based approach using Ansible to orchestrate cluster provisioning across physical or virtual hosts. This method initializes the system, deploys core components, and handles tasks like rolling restarts, making it suitable for scripted, repeatable setups in enterprise environments. Hardware prerequisites include SSD storage for TiKV nodes to ensure optimal I/O performance, with recommendations for at least 8 CPU cores and 16 GB RAM per node to support production workloads. Although TiUP has largely superseded Ansible for new deployments, Ansible remains viable for managing legacy clusters or environments requiring fine-grained playbook customization.[101][103]

Local development and single-node testing can be achieved using Docker Compose, which provisions a lightweight TiDB cluster via predefined Docker images for PD, TiDB, and TiKV. Users clone the official repository, pull images from Docker Hub, and start the stack with a simple docker-compose up command, accessing the database via MySQL client on port 4000. This setup is ideal for prototyping and isolated testing but is not recommended for production due to its single-node limitations. Configurations for PD and TiKV, such as replication settings or storage paths, are managed through YAML files like docker-compose.yml and component-specific configs.[104]
On-premises best practices emphasize high availability and performance tuning. Deploy at least three PD nodes across distinct hosts to maintain quorum and fault tolerance, with NVMe SSDs recommended for TiKV storage in production to handle high-throughput workloads—aim for 2 TB per TiKV node minimum. Overall cluster sizing should include at least three TiKV nodes and two TiDB servers, with monitoring enabled via Prometheus for proactive issue detection.[105][106]
Cloud and containerized deployments
TiDB offers robust options for cloud and containerized deployments, enabling seamless integration with major cloud providers and Kubernetes environments. TiDB Cloud is a fully managed Database-as-a-Service (DBaaS) platform that automates the deployment, scaling, monitoring, and maintenance of TiDB clusters across Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.[107] It provides two primary tiers: Dedicated, which offers isolated resources for high-performance, predictable workloads with fine-tuned configurations; and Serverless (renamed to Starter in 2025), which supports instant autoscaling and pay-per-use pricing for variable or development workloads.[107][108] In June 2025, TiDB Cloud Dedicated entered public preview on Azure, expanding multi-cloud availability and allowing enterprises to deploy distributed SQL databases natively within Azure's ecosystem.[30]

For containerized deployments, TiDB Operator serves as the core automation tool on Kubernetes, utilizing Custom Resource Definitions (CRDs) to declaratively manage TiDB clusters. It handles full life-cycle operations, including initial deployment, horizontal scaling of components like TiDB servers and TiKV storage nodes, rolling upgrades without downtime, backups, and failover recovery.[109] Day 2 operations, such as resource reconfiguration and monitoring integration, are automated through Kubernetes-native controllers, ensuring high availability and elasticity in dynamic environments.[109]

Users can deploy TiDB Operator via Helm charts from the official PingCAP repository, which simplifies installation on Kubernetes clusters by packaging CRDs, controllers, and dependencies into reusable templates.[110] These charts integrate with Container Storage Interface (CSI) drivers for persistent storage, enabling quick setups with commands like helm install tidb-operator pingcap/tidb-operator.[111]
TiDB emphasizes multi-cloud portability, supporting deployment on managed Kubernetes services such as AWS Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS). On EKS, for instance, users provision node groups with optimized instance types (e.g., c7g.4xlarge for TiDB) and gp3 EBS volumes, then apply TiDBCluster YAML manifests for auto-provisioning.[112] Similar workflows apply to GKE with pd-ssd storage classes and n2-standard machine types, and AKS with Ultra SSD for high-IOPS TiKV nodes, allowing consistent operations across providers without vendor lock-in.[113][114]
In Serverless mode within TiDB Cloud, clusters automatically suspend during idle periods and resume on demand, optimizing costs for bursty or unpredictable workloads by scaling compute and storage to zero when inactive. This mode supports AI-driven scaling for vector search and machine learning applications across TiDB Cloud tiers including Starter, where resources dynamically adjust based on query complexity and data volume.[12]
Ecosystem tools
Data migration and ingestion
TiDB Data Migration (DM) is an integrated data migration platform that enables full data migration and incremental replication from heterogeneous sources, primarily MySQL-compatible databases such as MySQL (versions 5.6 to 8.0), MariaDB (version 10.1.2 and later, on an experimental basis), and Amazon Aurora MySQL, to TiDB clusters.[41] It supports online DDL synchronization, including compatibility with tools like gh-ost and pt-osc for ghost and online schema changes, ensuring minimal disruption during schema alterations.[41] Additionally, DM provides sharding support, allowing it to merge data from multiple upstream shards into a single TiDB database while automatically detecting and applying DDL changes across shards.[41]

For initial bulk data imports into TiDB, TiDB Lightning serves as a high-speed loader capable of handling terabyte-scale datasets.[40] It accepts input from local files or Amazon S3-compatible storage in formats such as SQL dumps, CSV, or Parquet, and operates in two modes: physical import, which encodes data into key-value pairs for direct ingestion into TiKV storage (achieving speeds of 100 to 500 GiB per hour), and logical import, which generates and executes SQL statements (at 10 to 50 GiB per hour).[40][115] The physical mode is optimized for empty tables and empty clusters, making it ideal for greenfield deployments or large-scale initial loads.[40]

Dumpling complements these tools by providing a logical export mechanism from MySQL-compatible sources, generating SQL dumps or CSV files that can be directly imported via TiDB Lightning.[116] It exports schema and data into structured files, including metadata and per-table splits (e.g., {schema}.{table}.{index}.sql), supporting parallel dumping for efficiency and output to local storage or S3-compatible endpoints.[116] This makes Dumpling a key component for preparing MySQL data for TiDB ingestion, particularly in scenarios requiring portable, human-readable formats.[116]
Within DM, the loader stage handles full data migration by dumping upstream data (similar to Dumpling) and loading it into TiDB, while the syncer stage enables real-time incremental replication by parsing and applying binlog events.[117] Integration between stages ensures seamless transitions, with the syncer resuming from the loader's completion point using binlog positions as checkpoints, updated every 30 seconds to track replication progress across workers.[117] Conflict resolution during replication is managed through features like the Causality algorithm for detecting concurrent updates and a safe mode that rewrites INSERT and UPDATE operations to REPLACE on task restarts, preventing data inconsistencies based on binlog coordinates.[117]
Backup and recovery
TiDB provides robust backup and recovery mechanisms through the Backup & Restore (BR) tool, a command-line utility designed for distributed operations on cluster data stored in TiKV nodes.[88] BR enables full snapshot backups of the entire cluster at a specific point in time, capturing raw key-value data for physical consistency.[88] It also supports incremental log backups that record changes to TiKV data, allowing for Point-in-Time Recovery (PITR) with a Recovery Point Objective (RPO) as low as 5 minutes.[88] These backups are stored in S3-compatible external storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage, ensuring scalability and durability.[88]

For logical backups, TiDB uses Dumpling, a data export tool that generates SQL or CSV files compatible with MySQL ecosystems, facilitating portable restores across different systems.[116] Unlike BR's physical approach, which backs up underlying SST files for faster intra-TiDB restores, Dumpling produces human-readable exports suitable for migrations or archiving but with higher overhead due to SQL parsing.[116] BR remains the preferred method for production environments in TiDB due to its efficiency in handling distributed physical data.[118]

The recovery process in TiDB leverages BR to restore data to an empty or non-conflicting cluster, supporting full cluster recovery or specific databases and tables.[119] PITR combines the most recent full snapshot with log backups up to a user-specified timestamp, such as 2022-05-15 18:00:00+0800, enabling precise rollbacks.[87] During recovery, BR handles partial failures by pausing tasks and reporting details like pause times and store-specific errors, allowing operators to address issues such as unavailable TiKV nodes before resuming.[87] Restores require an empty target cluster to avoid conflicts, and the process applies changes sequentially from logs to achieve the desired state.[87]
Backups and restores can be scheduled and managed using TiUP for on-premises deployments or the TiDB Operator for Kubernetes environments, integrating seamlessly with automation workflows.[118] From TiDB v7.0.0, SQL-based backup and restore commands are available directly within the database.[88] Security is enhanced through encryption: BR supports server-side encryption (SSE) for S3 storage using AWS KMS keys, and similar mechanisms for Azure Blob Storage with encryption scopes or AES-256 keys, protecting data at rest and in transit via storage provider credentials.[120]
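The SQL interface mirrors the BR command-line workflow; a hedged sketch with placeholder storage URIs:

```sql
-- Full snapshot backup of all databases to S3-compatible external storage;
-- bucket, path, and credentials are placeholders.
BACKUP DATABASE * TO 's3://example-bucket/tidb-backup/2025-01-01';

-- Restore a single database from that snapshot into a cluster where it does
-- not yet exist.
RESTORE DATABASE app FROM 's3://example-bucket/tidb-backup/2025-01-01';
```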
Performance of BR operations scales horizontally with cluster size, utilizing parallel processing across TiKV nodes for distributed I/O.[121] Snapshot backups achieve speeds of 50-100 MB/s per TiKV node with minimal impact (<20% on cluster throughput), while restores reach up to 2 TiB/hour for snapshots and 30 GiB/hour for logs in tested configurations with 6-21 nodes.[88] For terabyte-scale clusters, such as 10 TB datasets, BR enables recovery times under 1 hour by fully utilizing hardware resources, as demonstrated in benchmarks with 1+ GB/s throughput.[122] This results in low Recovery Time Objectives (RTO) for large-scale recoveries, particularly with optimizations in TiDB 8.1 that improve region scattering and communication efficiency.[123]