TiDB
TiDB is an open-source, distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads, offering MySQL protocol compatibility for seamless migration and application development.[1] Designed for cloud-native environments, it provides horizontal scalability by separating compute and storage layers, enabling elastic expansion across hundreds of nodes without downtime.[1] TiDB ensures strong consistency and financial-grade high availability through Multi-Raft consensus and multiple data replicas distributed across availability zones.[1]

Developed by PingCAP since 2015, TiDB addresses limitations of traditional databases by combining row-based storage in TiKV for transactional processing with columnar storage in TiFlash for real-time analytics, allowing unified handling of OLTP and OLAP queries on the same dataset.[2][1] Its architecture includes a stateless SQL layer (TiDB server) that parses and optimizes queries, a distributed key-value storage engine (TiKV) for data management, and the Placement Driver (PD) for cluster coordination.[3] TiDB supports petabyte-scale data volumes, up to 512 nodes per cluster, and up to 1,000 concurrent connections per node, making it suitable for high-traffic applications in e-commerce, gaming, and financial services.[1] In April 2025, PingCAP announced major enhancements to TiDB for improved global scale and AI-driven applications.[4]

Key features of TiDB include automatic sharding for load balancing, built-in data migration tools like TiCDC for change data capture, and integration with Kubernetes via TiDB Operator for simplified deployment and operations.[3] It also offers advanced capabilities such as JSON support, full-text search, and time-series data handling, while maintaining ACID compliance for transactions.[1] As a self-managed solution, TiDB allows full control over infrastructure, with options for on-premises, cloud, or hybrid setups, and is licensed under the Apache 2.0 license for broad community adoption.[1]

Overview
Description
TiDB is a cloud-native, distributed SQL database designed to support hybrid transactional and analytical processing (HTAP) workloads, providing horizontal scalability of up to 512 nodes per cluster while maintaining strong consistency and high availability.[5][6] Developed by PingCAP, it draws inspiration from Google's Spanner and F1 databases to enable seamless handling of both online transaction processing (OLTP) and online analytical processing (OLAP) in a single system.[7]

Founded in 2015 in Beijing, China, by a team of infrastructure engineers frustrated with the challenges of managing traditional databases at scale, PingCAP launched TiDB as an open-source project under the Apache 2.0 license.[2][8] The initiative aimed to address the limitations of standalone relational database management systems (RDBMS) like MySQL, which struggle with massive data growth and high concurrency in internet-scale applications.[7]

At its core, TiDB seeks to combine the familiarity and ease of use of traditional RDBMS such as MySQL—with full SQL compatibility and ACID transactions—with the horizontal scalability and resilience typically associated with NoSQL systems.[6][5] This hybrid approach allows organizations to build scalable, real-time applications without the need for complex sharding or separate OLTP and OLAP databases.[9]

Design principles
TiDB's architecture is fundamentally shaped by the principle of separating compute from storage, allowing the SQL processing layer (TiDB server) to operate independently of the distributed key-value storage layer (TiKV). This separation enables horizontal scaling of each layer without impacting the other; for instance, additional TiDB servers can be added to handle increased query loads while TiKV clusters expand storage capacity separately. By design, TiDB servers remain stateless, facilitating easy deployment and replacement in dynamic environments. In October 2025, PingCAP introduced TiDB X, an evolved architecture that further decouples compute and storage by using object storage as the backbone to support adaptive scaling and native AI workloads.[10][11][12]

A core tenet of TiDB's design is achieving strong consistency for distributed transactions, drawing inspiration from Google's Percolator model, which Google originally built on top of Bigtable. TiKV employs this model to manage multi-version concurrency control (MVCC) and two-phase commit protocols with optimizations for performance in large-scale environments. Replication is ensured through the Raft consensus algorithm, which provides fault-tolerant data duplication across multiple nodes—typically three replicas per data shard—to guarantee durability and availability even in the face of node failures.[13][10]

TiDB incorporates the Hybrid Transactional/Analytical Processing (HTAP) principle to unify online transaction processing (OLTP) and online analytical processing (OLAP) workloads on the same dataset, eliminating the need for extract, transform, load (ETL) pipelines. This is realized through TiKV handling transactional writes with row-oriented storage, complemented by TiFlash's column-oriented copies that accelerate real-time analytics queries without data duplication overhead. Such integration supports low-latency decision-making by allowing analytics to run directly on up-to-date transactional data.[14][10]

Compatibility with the MySQL protocol forms a foundational design goal, enabling TiDB to serve as a drop-in replacement for MySQL in most applications with minimal or no code modifications. This wire-protocol adherence covers SQL syntax, semantics, and ecosystem tools, reducing migration barriers for enterprises reliant on existing MySQL-based stacks. The design prioritizes retaining familiar behaviors, such as indexing and query optimization, while extending capabilities for distributed scenarios.[10][11]

Embracing cloud-native paradigms, TiDB is engineered with stateless components, automated failover mechanisms, and elastic scaling to thrive in multi-tenant cloud infrastructures. The Placement Driver (PD) cluster coordinates data placement and leader elections for high availability, ensuring seamless recovery from failures without manual intervention. This elasticity allows clusters to dynamically adjust resources based on demand, supporting both on-premises and managed cloud deployments.[10][11]

History
Development origins
PingCAP was founded in 2015 by Liu Qi, Huang Dongxu, and Cui Qiu, three experienced infrastructure engineers from leading Chinese Internet companies, with the goal of addressing the challenges of scaling traditional relational databases like MySQL for high-growth applications in e-commerce and big data environments.[15][16] The founders, frustrated by the limitations of existing database management, scaling, and maintenance practices, sought to create a distributed SQL database that maintained MySQL compatibility while enabling seamless horizontal scaling without the complexities of manual sharding.[2] Inspired by Google's Spanner, they aimed to build an open-source solution that could handle massive data volumes and real-time analytics for modern cloud-native workloads.[17]

The initial prototype of TiDB was developed in the Go programming language, building directly on the TiKV key-value store project, which originated in early 2015 as a foundational storage layer inspired by Raft consensus and capable of supporting distributed transactions.[18][19] Early development emphasized integrating TiKV's storage capabilities with SQL processing, starting with basic SQL parsing to ensure MySQL protocol compatibility from the outset. The project was hosted on GitHub from its inception, fostering immediate community involvement; initial commits focused on core parser implementation and rudimentary storage engine integration to prototype a stateless SQL layer atop the distributed key-value backend.[6] This approach allowed rapid iteration while leveraging Go's concurrency features for handling distributed query execution.

A primary challenge in this early period was achieving full ACID compliance in a distributed environment without imposing sharding or partitioning burdens on users, which the team addressed by designing TiDB as a "one-stop" solution where the database automatically managed data distribution and consistency via optimistic concurrency control and multi-version concurrency control (MVCC).[2] This innovation enabled strong consistency across nodes without sacrificing the familiarity of MySQL syntax and semantics. In 2016, PingCAP secured its first major funding round, a $5 million Series A led by Yunqi Partners, which supported the shift toward full-time focus on database innovation and accelerated prototype refinement into a production-ready system.[20][21]

Release milestones
TiDB's initial stable release, version 1.0, arrived on October 16, 2017, marking the production-ready debut of the distributed NewSQL database with core support for distributed SQL capabilities, including horizontal scalability and strong consistency via the Raft consensus protocol integrated into TiKV.[22] In April 2018, TiDB 2.0 was launched, introducing significant enhancements to horizontal scaling through improved region splitting and merge mechanisms in TiKV, alongside optimizations for storage engine performance to handle larger workloads more efficiently.[23] TiDB 4.0, released on June 17, 2020, advanced hybrid transactional/analytical processing (HTAP) by integrating TiFlash, a columnar storage engine that enables real-time analytics directly on transactional data without data duplication.[24] The April 7, 2021, release of TiDB 5.0 focused on query execution improvements, including vectorized execution for faster analytical processing, and expanded compatibility with MySQL 8.0 features such as window functions and common table expressions.[25]

TiDB 6.0, unveiled on April 7, 2022, incorporated enterprise-grade resource control via quota management and auto-scaling capabilities to dynamically adjust cluster resources based on workload demands.[26] On March 30, 2023, TiDB 7.0 debuted with bolstered security features like enhanced role-based access control and integrated monitoring tools for better observability and compliance in production environments.[27] TiDB 8.0, released March 29, 2024, introduced bulk DML support for large transactions to mitigate out-of-memory issues, enhanced optimizer support for multi-valued indexes on JSON data, and accelerated cluster snapshot restore speeds by 1.5-3x.[28] TiDB 8.5, released as a Long-Term Support version on December 19, 2024, brought general availability for foreign key support, client-side encryption for backup data, and experimental vector search capabilities, along with improvements in scalability such as the TiKV MVCC in-memory engine.[29]

In 2025, TiDB Cloud expanded with a public preview of its Dedicated service on Microsoft Azure on June 4, enabling managed deployments in that ecosystem for broader cloud portability.[30] On October 8, 2025, at the SCaiLE Summit, PingCAP announced TiDB X, a rearchitected version introducing context-aware scaling, zero-friction elasticity, and native AI integrations for adaptive resource allocation in intelligent applications.[31]

Architecture
Core components
TiDB's architecture is composed of several key components that work together to provide a distributed, scalable database system. These include the TiDB Server for SQL processing, the Placement Driver (PD) for cluster management, TiKV for data storage, TiFlash for analytical workloads, and integrated monitoring tools for observability. This modular design enables horizontal scaling and high availability across nodes. In 2025, PingCAP introduced TiDB X, a new cloud-native architecture variant that uses object storage as the backbone for enhanced decoupling of compute and storage, supporting context-aware scaling and native AI integrations, available in TiDB Cloud tiers.[12][3]

The TiDB Server serves as a stateless SQL layer that acts as the primary interface for client connections. It handles query parsing by analyzing MySQL protocol packets for syntactic and semantic validity, followed by optimization to generate efficient distributed execution plans, such as pushing down predicates and aggregations to storage layers. Execution involves coordinating data retrieval from underlying stores and assembling results, all while maintaining full compatibility with the MySQL wire protocol to allow seamless integration with existing MySQL tools and applications. As a stateless component, TiDB Servers can be scaled independently without data locality concerns.[32]

The Placement Driver (PD) functions as the central cluster manager, maintaining metadata for data distribution, cluster topology, and transaction identifiers. It performs scheduling to balance load across nodes by allocating data regions and handling failover, ensuring even distribution and resource utilization. For high availability, PD is deployed in clusters of at least three nodes using an odd number to achieve consensus and fault tolerance.[3]

TiKV is the distributed transactional key-value store that forms the foundational storage layer of TiDB. It organizes data into ordered key ranges called Regions, each replicated across multiple nodes for durability. Locally, TiKV relies on RocksDB as its embedded storage engine to manage persistent key-value data on disk. Replication is achieved through the Raft consensus algorithm, supporting multi-region deployments by ensuring data consistency and automatic recovery from failures. By default, TiKV maintains three replicas per region to provide high availability.[33]

TiFlash, introduced in TiDB version 4.0, is a columnar storage engine optimized for analytical processing within the HTAP framework. It asynchronously replicates data from TiKV using Raft Learner roles, enabling real-time synchronization while co-locating instances with TiKV nodes to minimize latency for hybrid transactional and analytical workloads. TiFlash leverages coprocessors built on ClickHouse for efficient columnar query execution, such as aggregations and scans, without disrupting OLTP operations.[34][1]

TiDB incorporates robust monitoring through integrations with Prometheus for collecting and storing time-series metrics from all components, and Grafana for visualizing dashboards across categories like cluster overview, TiDB performance, and TiKV storage. Additionally, the built-in TiDB Dashboard, managed by PD since version 4.0, provides a web-based interface for real-time inspection of data distribution and topology. These tools enable proactive observability in multi-tenant environments.[35]

In terms of overall topology, TiDB operates as a multi-tenant system where PD acts as the coordinating "brain" for metadata and scheduling, TiKV provides the core row-based storage backbone, and TiDB Servers function as scalable gateways for SQL access, with TiFlash optionally extending capabilities for analytics. This separation allows independent scaling of compute, storage, and management layers.[3]

Data flow and storage
In TiDB, the query lifecycle begins when a client connects to the TiDB Server using the MySQL protocol, sending commands and statement strings after authentication.[36] The server maintains session state, such as SQL mode and transaction context, while handling synchronous queries like non-prepared statements via the mysql.ComQuery packet.[36] The statement string is then parsed into an Abstract Syntax Tree (AST) using a MySQL-compatible parser, enabling structured representation of clauses like WHERE conditions as nested expressions.[36]
Following parsing, the optimizer compiles the AST into a logical plan and then a physical execution plan using cost-based optimization, incorporating name resolution and privilege checks; simple queries may use fast planning paths like PointGet for efficiency.[36] During execution, the plan is run via the executor, which pushes down coprocessor tasks—such as scans, selections, or aggregations—to TiKV or TiFlash storage nodes to process data locally and minimize network transfer, with results filtered and returned to the client.[36] This pushdown model ensures that computations occur close to the data, enhancing performance in distributed environments.[37]
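The pushdown behavior described above can be observed with TiDB's EXPLAIN statement, whose plan output marks operators executed on the storage layer as coprocessor tasks (for example, cop[tikv]). A minimal sketch follows, using a hypothetical orders table that is not part of the cited sources:

```sql
-- Hypothetical schema used only for illustration.
CREATE TABLE orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount      DECIMAL(10, 2),
    created_at  DATETIME,
    KEY idx_customer (customer_id)
);

-- In the plan, filtering and partial aggregation typically appear beneath a
-- TableReader whose task column reads cop[tikv], indicating that those steps
-- run as coprocessor tasks on the TiKV nodes rather than in the TiDB Server.
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
```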
TiDB employs a hybrid storage model, with TiKV providing row-oriented storage as an ordered key-value map implemented via the Log-Structured Merge-Tree (LSM-tree) in RocksDB for transactional workloads.[13] Complementing this, TiFlash offers columnar storage optimized for analytical queries, enabling efficient aggregation and scan operations on large datasets.[13] The Placement Driver (PD) automatically manages data placement by distributing regions across TiKV and TiFlash nodes, ensuring balanced load and fault tolerance without manual intervention.[13]
Data sharding in TiDB is implicit and range-based, partitioning tables into consecutive key segments called regions—typically around 256 MiB each (default since v8.4.0)—stored in TiKV.[38][13] PD monitors region sizes and loads, triggering automatic splitting when regions exceed thresholds to prevent hotspots, and performs balancing by migrating regions across nodes for even distribution.[13]
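Region layout can be inspected and pre-split from SQL using TiDB-specific statements; a brief sketch, reusing the hypothetical orders table from the earlier example:

```sql
-- List the Regions backing the table, including key ranges, leader stores,
-- and approximate sizes as tracked by PD.
SHOW TABLE orders REGIONS;

-- Pre-split the table's row-key range into 16 Regions ahead of a bulk load
-- so that writes spread across TiKV nodes instead of hitting one hotspot;
-- PD then schedules and balances the resulting Regions.
SPLIT TABLE orders BETWEEN (0) AND (1000000000) REGIONS 16;
```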
Replication in TiDB defaults to three replicas per region, leveraging the Raft consensus algorithm to maintain strong consistency; data modifications are logged and propagated to followers, requiring majority acknowledgment for commits.[13] This setup supports multi-data-center (multi-DC) deployments for geo-redundancy, where regions can be placed across DCs to enable disaster recovery while preserving availability.[13]
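Replica placement across zones or data centers can be expressed declaratively with Placement Rules in SQL (generally available since TiDB 6.0). The following is an illustrative sketch; the region labels are hypothetical and must match the topology labels actually configured on TiKV and PD:

```sql
-- Keep Region leaders in one region and spread follower replicas across
-- others for geo-redundancy.
CREATE PLACEMENT POLICY geo_redundant
    PRIMARY_REGION = "us-east-1"
    REGIONS = "us-east-1,us-west-2,eu-west-1"
    FOLLOWERS = 4;

-- Attach the policy to a table; PD reschedules its Region replicas to match.
ALTER TABLE orders PLACEMENT POLICY = geo_redundant;
```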
Transactions in TiDB follow a two-phase commit (2PC) protocol coordinated by the TiDB Server, drawing from the Percolator model to ensure ACID properties across distributed nodes.[39] The process starts with obtaining a start timestamp from PD, followed by read operations using multi-version concurrency control (MVCC) and buffered writes; in the prewrite phase, locks are acquired on keys in parallel, and the commit phase assigns a commit timestamp before finalizing updates asynchronously.[39] TiDB supports both optimistic and pessimistic modes: optimistic transactions defer conflict detection to the prewrite phase, retrying on failures for low-contention scenarios, while pessimistic mode acquires locks during execution using a for-update timestamp to handle high contention, supporting isolation levels like repeatable read.[39]
For data ingestion, TiDB Lightning facilitates bulk loading of large datasets, such as TB-scale imports from files like SQL dumps or CSV, in either physical mode—for high-speed key-value ingestion directly into TiKV—or logical mode—for ACID-compliant SQL execution on live clusters.[40] Complementarily, Data Migration (DM) enables streaming ingestion by parsing and replicating incremental binlog events from upstream MySQL or MariaDB sources, handling DML and DDL changes with filtering and sharding merge rules to maintain consistency during ongoing operations.[41]
Features
Horizontal scalability
TiDB achieves linear horizontal scalability by decoupling its compute and storage layers, allowing independent expansion of resources to handle growing workloads. Additional TiDB Server nodes can be added to increase read and write throughput, as the stateless SQL layer processes queries in parallel behind a load balancer like TiProxy. Similarly, scaling out TiKV nodes expands storage capacity and I/O performance, with data automatically redistributed across the cluster. The Placement Driver (PD) enables auto-scaling by monitoring cluster health and orchestrating resource allocation without manual intervention.[42]

Central to this scalability is TiKV's region management, where data is partitioned into discrete units called Regions, each approximately 256 MiB in size by default. These Regions split automatically when exceeding 384 MiB to prevent overload, dividing into two or more smaller units, and merge when falling below 54 MiB to optimize space and efficiency. PD detects hot-spots—Regions experiencing disproportionate load—through metrics like read/write traffic and CPU usage, then migrates or rebalances them across nodes to maintain even distribution and avoid bottlenecks.[43]

In practice, TiDB clusters can sustain over 1 million queries per second (QPS) while scaling to petabyte-level data volumes, all without downtime, as demonstrated in production environments like Flipkart's e-commerce platform. This elasticity supports seamless growth for high-traffic applications, with compression and sharding ensuring efficient resource use at massive scales.[44][45]

Introduced in 2025, TiDB X enhances this capability with context-aware scaling tailored for AI workloads, using real-time signals such as QPS, latency, and query patterns to predict and provision resources proactively. This architecture leverages object storage for zero-friction elasticity, adjusting compute and storage in minutes to accommodate dynamic AI-driven demands like vector search and operational analytics on a unified platform.[31]

Despite these strengths, horizontal scalability in multi-region deployments can introduce network latency challenges, with cross-region round-trip times potentially reaching hundreds of milliseconds. TiDB mitigates this through follower reads, enabling replica nodes in local regions to serve queries and reduce traffic by up to 50%, thus minimizing global synchronization overhead.[46]
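Follower reads are enabled per session with a system variable, allowing replicas near the client to answer read requests; a minimal sketch:

```sql
-- Permit this session's reads to be served by Raft follower replicas
-- instead of always routing to the Region leader.
SET SESSION tidb_replica_read = 'follower';

-- A point read like this one can now be answered by a nearby replica,
-- avoiding a cross-region round trip to the leader.
SELECT * FROM orders WHERE id = 42;
```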
MySQL compatibility

TiDB provides extensive compatibility with MySQL at both the wire protocol and functional levels, allowing applications built for MySQL to connect and operate with minimal modifications. This compatibility is a core design goal, enabling seamless adoption in MySQL-centric ecosystems without requiring code changes for most workloads.[47]

TiDB supports the full MySQL 5.7 and 8.0 wire protocol, permitting direct connections using standard MySQL clients, drivers, and connection strings. For instance, applications can substitute a TiDB endpoint for a MySQL one without altering client configurations, as the protocol handles authentication, query execution, and result sets identically. This wire-level parity extends to tools like MySQL Workbench and Navicat, which connect natively to TiDB clusters.[47][48]

In terms of SQL dialect, TiDB achieves high compatibility with MySQL 5.7 and 8.0 syntax for DDL and DML operations, covering most common use cases including index creation, table partitioning (HASH, RANGE, LIST, KEY types), and basic query constructs. It supports advanced analytical features such as window functions and common table expressions (CTEs), aligning with MySQL 8.0 standards—window functions operate similarly to MySQL's implementation, enabling ranking, aggregation over partitions, and ordered computations. However, gaps exist in specialized areas, such as limited support for geographic information system (GIS) functions, spatial data types, and indexes, as well as the absence of stored procedures, triggers, events, FULLTEXT indexes, and XML functions. DDL operations like online schema changes are supported, but multi-object ALTER TABLE statements and certain type conversions (e.g., DECIMAL to DATE) are restricted.[47][49][50]

TiDB integrates effectively with the MySQL ecosystem, including object-relational mapping (ORM) frameworks like Hibernate (via a TiDB dialect in Hibernate 6.0 and later) and tools such as phpMyAdmin and DBeaver for administration and querying. It also supports binlog-style replication through TiCDC, which captures change data in formats compatible with MySQL consumers like Kafka and Debezium, facilitating downstream synchronization. These integrations ensure that monitoring, backup, and development workflows from the MySQL landscape transfer directly to TiDB environments.[51][52][53]

Migration to TiDB is straightforward for most applications, as no schema alterations are required for standard MySQL schemas, and tools like TiDB Data Migration (DM) handle both full and incremental data synchronization from MySQL or MariaDB sources without downtime. DM replicates DDL and DML changes while preserving compatibility with MySQL's protocol and binlog formats, making it suitable for hybrid or phased transitions. This ease of migration underscores TiDB's role as a drop-in replacement for scaling MySQL deployments.[41][47]

With the release of TiDB 8.0 in March 2024, compatibility with MySQL 8.0 was further solidified, incorporating features like window functions, CTEs, and system variables such as div_precision_increment for precise division handling. Earlier versions like 7.4 (October 2023) marked official MySQL 8.0 alignment, but TiDB 8.0 extends this with enhanced DML optimizations and ecosystem tooling, ensuring parity for modern MySQL applications. Default settings, including charset utf8mb4 and collation utf8mb4_bin, match MySQL 8.0 conventions to minimize behavioral differences.[28][50][54]
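Because TiDB implements MySQL 8.0 constructs such as CTEs and window functions over the same wire protocol, statements like the following run unchanged against either system; the orders table is the hypothetical example used earlier:

```sql
-- Rank each customer's orders by amount with a CTE and a window function;
-- the identical statement is valid on MySQL 8.0 and on TiDB.
WITH ranked AS (
    SELECT
        customer_id,
        id AS order_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
    FROM orders
)
SELECT customer_id, order_id, amount
FROM ranked
WHERE rn <= 3;
```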
Distributed transactions
TiDB supports distributed ACID transactions across its nodes, ensuring atomicity, consistency, isolation, and durability in a horizontally scaled environment. This is achieved through a hybrid transaction model inspired by Google's Percolator, which combines multi-version concurrency control (MVCC) for isolation and a two-phase commit (2PC) protocol for atomicity.[55][56]

In the 2PC implementation, the TiDB server acts as the transaction coordinator, while TiKV nodes serve as participants managing data storage and replication. The process begins with a prewrite phase, where TiDB assigns a start timestamp from the Placement Driver (PD) and sends parallel prewrite requests to relevant TiKV Regions; locks are acquired on keys if no conflicts exist, validated against existing MVCC versions, with the primary key lock coordinating secondary keys. If prewrites succeed, the commit phase follows: TiDB commits the primary key first via Raft consensus on TiKV, then secondaries, releasing locks and externalizing committed versions to ensure all-or-nothing atomicity and durability through quorum replication.[56][57]

TiDB employs MVCC to enable snapshot isolation, allowing transactions to read consistent snapshots of data committed before their start timestamp, thus avoiding dirty reads and non-repeatable reads without blocking writers. The default isolation level is Repeatable Read, compatible with MySQL's semantics, while Read Committed (introduced in v4.0) and Read Uncommitted are also supported for less stringent scenarios; Serializable isolation, as defined by SQL-92, is not natively provided, though pessimistic locking can approximate stricter controls in high-conflict cases.[58][59]

To handle varying workloads, TiDB offers both optimistic and pessimistic transaction modes. Optimistic transactions assume low conflict rates and defer conflict detection until the commit phase, minimizing locking overhead and improving throughput in read-heavy or low-contention environments. Pessimistic mode, the default for clusters created with v3.0.8 and later and also selectable via configuration or statements like BEGIN PESSIMISTIC, acquires locks early during reads (e.g., SELECT FOR UPDATE) and writes, suiting high-conflict scenarios by preventing aborts but potentially increasing latency due to blocking. Both modes build on the same 2PC foundation, though pessimistic mode adds a pipelined locking phase for efficiency.[60][61][62]
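The transaction mode can be chosen per session or per transaction; a short sketch, again against the hypothetical orders table:

```sql
-- Use pessimistic locking as this session's default mode.
SET SESSION tidb_txn_mode = 'pessimistic';

-- Or pick the mode explicitly for a single transaction.
BEGIN PESSIMISTIC;
-- SELECT ... FOR UPDATE takes pessimistic locks on the matching rows,
-- blocking conflicting writers until this transaction commits or rolls back.
SELECT * FROM orders WHERE id = 42 FOR UPDATE;
UPDATE orders SET amount = amount + 10 WHERE id = 42;
COMMIT;

-- Optimistic transactions skip locking during execution; write conflicts are
-- detected at commit time and surface as retryable errors.
BEGIN OPTIMISTIC;
UPDATE orders SET amount = amount - 5 WHERE id = 42;
COMMIT;
```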
Deadlock detection occurs automatically in pessimistic mode at the TiKV layer, where circular wait dependencies among lock requests are identified; upon detection, one transaction is aborted with error code 1213, and wait timeouts (default 50 seconds) trigger error 1205 if unresolved. Lock information can be queried via INFORMATION_SCHEMA.DEADLOCKS or CLUSTER_DEADLOCKS tables for troubleshooting.[63][61]
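Recent deadlock records can be queried directly; a minimal sketch using the system tables named above:

```sql
-- Deadlocks recorded by the TiDB server the client is connected to.
SELECT * FROM INFORMATION_SCHEMA.DEADLOCKS;

-- Cluster-wide view that aggregates deadlock records from all TiDB servers.
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_DEADLOCKS;
```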
TiDB provides linearizability as the default consistency model for reads and writes, ensuring operations appear atomic and in strict total order as if executed sequentially, enforced by Raft consensus and timestamp ordering from PD. For multi-region deployments, causal consistency is supported when features like async commit are enabled, preserving operation dependencies across regions while reducing latency at the cost of weaker global ordering guarantees.[64][65]
Transaction performance in TiDB emphasizes low latency, with timestamp oracle (TSO) allocation from PD typically under 1 ms for local operations, enabling sub-millisecond point reads in optimistic mode under low load. Commit latency averages around 12-13 ms for typical workloads, influenced by prewrite and Raft replication durations; async commit, introduced in v5.0, accelerates cross-region transactions by decoupling secondary commits, reducing end-to-end latency while maintaining causal consistency when paired with one-phase commit options.[66][67][68]
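Async commit and single-Region one-phase commit are governed by system variables, both enabled by default in v5.0 and later; a brief sketch:

```sql
-- Return success to the client once the prewrite phase has durably succeeded,
-- completing the commit phase on secondary keys asynchronously.
SET GLOBAL tidb_enable_async_commit = ON;

-- Let transactions whose writes fall in a single Region commit in one phase,
-- skipping the second phase of 2PC entirely.
SET GLOBAL tidb_enable_1pc = ON;
```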
Cloud-native design
TiDB is engineered with a cloud-native architecture that leverages container orchestration platforms like Kubernetes to ensure seamless deployment, management, and scaling in dynamic cloud environments. The TiDB Operator, an extension for Kubernetes, automates the full lifecycle of TiDB clusters, including provisioning, upgrading, scaling, and failover operations, allowing operators to manage distributed databases declaratively through Kubernetes custom resources.[69][1] This containerization approach enables TiDB to run portably across various infrastructures, abstracting underlying hardware complexities and facilitating rapid iteration in microservices-based applications.[70]

A core aspect of TiDB's cloud-native design is its elasticity, achieved through the decoupling of compute and storage layers, which permits independent scaling of resources without downtime. In Kubernetes deployments, this manifests as dynamic resource allocation via Horizontal Pod Autoscaler (HPA) integrations and auto-healing mechanisms that restart or reschedule failed pods automatically, maintaining cluster resilience against node failures or traffic spikes.[71][72] TiDB's storage engine, TiKV, further enhances this by utilizing cloud-native object storage like Amazon S3 or compatible services, ensuring data durability and scalability decoupled from compute instances.[73]

TiDB supports multi-cloud deployments across major providers including AWS, Google Cloud Platform (GCP), Microsoft Azure, and Alibaba Cloud, enabling organizations to avoid vendor lock-in while leveraging region-specific advantages. The TiDB Cloud Serverless offering, launched in July 2023, introduces a pay-per-use model that automatically scales compute resources from zero to handle variable workloads, optimizing costs for bursty applications.[74][73] In 2025, the introduction of TiDB X architecture further advanced serverless scaling, providing enhanced elasticity for unpredictable AI-driven workloads through optimized resource orchestration and faster auto-scaling responses.[12]

For observability, TiDB integrates natively with Prometheus for metrics collection and Grafana for visualization, exposing key performance indicators such as query latency, throughput, and resource utilization to enable proactive monitoring in cloud setups.[35] Tracing capabilities support distributed request tracking, aligning with standards like OpenTelemetry for end-to-end visibility in microservices architectures.[75]

Security in TiDB's cloud-native design incorporates role-based access control (RBAC) for fine-grained permissions on database operations, mandatory TLS encryption for all network communications to protect data in transit, and seamless integration with cloud provider IAM systems for centralized identity management.[76][77][78] These features ensure compliance with standards like GDPR and HIPAA while simplifying secure operations across hybrid and multi-cloud environments.[79]
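Because account management follows MySQL semantics, role-based access control and TLS requirements are expressed with familiar SQL; a hedged sketch with hypothetical role, schema, and user names:

```sql
-- A role carrying read-only access to the application schema.
CREATE ROLE 'analyst';
GRANT SELECT ON app.* TO 'analyst';

-- A user that must connect over TLS, granted the role and activating it
-- by default on login.
CREATE USER 'alice'@'%' IDENTIFIED BY 'change-me' REQUIRE SSL;
GRANT 'analyst' TO 'alice'@'%';
SET DEFAULT ROLE ALL TO 'alice'@'%';
```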
HTAP capabilities

TiDB supports Hybrid Transactional/Analytical Processing (HTAP) through a unified engine that separates online transaction processing (OLTP) workloads on its row-oriented storage layer, TiKV, from online analytical processing (OLAP) workloads on the column-oriented TiFlash layer, while maintaining shared data access to enable zero-ETL pipelines.[80] This architecture ensures strong consistency across both engines without requiring data duplication or batch exports, allowing transactional updates in TiKV to propagate in real-time to TiFlash for immediate analytical use.[34]

TiFlash operates as a distributed columnar storage system integrated into the TiDB cluster, featuring Raft-replicated replicas that provide high availability and fault tolerance for analytical data.[34] These replicas are created selectively per table via manual configuration, such as using DDL commands to replicate specific tables from TiKV, enabling asynchronous synchronization of data regions without impacting OLTP performance on the primary store.[34] By leveraging Multi-Raft protocols with learner roles, TiFlash maintains logical consistency and snapshot isolation, supporting efficient columnar scans and aggregations through integrated coprocessors based on ClickHouse technology.[34]

The TiDB query optimizer employs cost-based routing to direct analytical queries, such as those involving heavy aggregations or joins, to TiFlash replicas, achieving speedups of 10x or more compared to row-store processing on TiKV.[81] For instance, complex aggregation queries can execute up to 8x faster in benchmarks like TPC-H at scale 100, due to columnar storage optimizations and massively parallel processing (MPP) distribution across TiFlash nodes.[25] This routing is automatic based on query patterns, with optional SQL hints available for explicit control, ensuring OLTP queries remain on TiKV for low-latency point reads and writes.[82]

TiDB's HTAP design delivers real-time analytics with sub-second latency for ad-hoc queries on up-to-date transactional data, integrating seamlessly with business intelligence tools like Tableau for interactive dashboards and reporting.[80] Since version 5.0, enhancements including a vectorized execution engine in TiFlash have further accelerated scan-intensive operations by enabling MPP mode for distributed joins and aggregations, reducing query times for large datasets exceeding 10 million rows.[25] These capabilities are particularly valuable in use cases such as e-commerce platforms performing real-time inventory analysis to optimize stock levels during peak sales, or financial systems detecting fraud patterns through instant analytical scans on transaction streams.[80]
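Columnar replicas are added per table with a DDL statement, after which the optimizer can route analytical queries to TiFlash, optionally forced with a hint; a sketch against the hypothetical orders table:

```sql
-- Create two TiFlash replicas of the table; the columnar copies are kept in
-- sync asynchronously from TiKV through Raft learners.
ALTER TABLE orders SET TIFLASH REPLICA 2;

-- Check replication progress for the table (AVAILABLE and PROGRESS columns).
SELECT * FROM INFORMATION_SCHEMA.TIFLASH_REPLICA
WHERE TABLE_NAME = 'orders';

-- The optimizer normally chooses the engine by cost; this hint pins the
-- aggregation to the TiFlash replicas explicitly.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
    customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```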
High availability

TiDB achieves high availability through a distributed architecture that emphasizes fault tolerance and automatic recovery mechanisms. At the core of its data replication strategy is the TiKV storage layer, which employs multi-replica Raft consensus groups for each data region. Typically, three replicas are maintained per region, ensuring that data is durably stored across multiple nodes. The Placement Driver (PD) component schedules these replicas with awareness of topology labels, such as zones and regions, to promote diversity and prevent single points of failure, thereby enhancing disaster recovery capabilities.[43][83][84]

Failover in TiDB is handled seamlessly via Raft's automatic leader election process, which detects and resolves node failures by electing a new leader among replicas, typically completing within seconds to minimize downtime. The TiDB server layer is stateless, allowing for instantaneous scaling in or out without data loss or reconfiguration, as compute nodes do not persist state and can be replaced dynamically. This design ensures that client connections remain uninterrupted during failures, with the system retrying operations in milliseconds as needed.[85][86][3]

For backup strategies supporting high availability, TiDB utilizes the Backup & Restore (BR) tool to enable point-in-time recovery (PITR), allowing clusters to be restored to any specific timestamp within the retention period using snapshot and log backups. Additionally, asynchronous replication modes facilitate disaster recovery (DR) by switching to non-synchronous log replication when primary replicas fail, maintaining availability without strict synchronization guarantees during outages.[87][88][89]

Monitoring and self-healing further bolster uptime, with integrated alerting systems that notify on node failures and critical errors through Prometheus and Grafana dashboards. TiDB Operator, when deployed on Kubernetes, automates self-healing by managing pod restarts, scaling, and recovery, ensuring the cluster responds to faults without manual intervention. In TiDB Cloud, these features contribute to a 99.99% service level agreement (SLA), guaranteeing resilience against node or zone failures with no data loss.[90][91][74]

Vector search and AI integration
TiDB provides native vector database functionality, allowing users to store and query vector embeddings directly within its SQL framework. This support includes the creation of vector columns using a dedicated VECTOR data type to hold embeddings generated by models such as those from OpenAI or Hugging Face.[92] The system integrates approximate nearest neighbor (ANN) search capabilities, enabling efficient similarity searches for high-dimensional data in AI and machine learning applications.[92]
A key component is the Hierarchical Navigable Small World (HNSW) index, which TiDB uses for vector indexing to accelerate k-nearest neighbors (k-NN) queries. Users can create an HNSW index on a vector column with DDL that names the distance function to index, for example CREATE VECTOR INDEX idx ON t ((VEC_COSINE_DISTANCE(embedding)));, which builds a graph-based structure for fast approximate searches with high recall rates, often up to 98% accuracy in benchmarks (see the sketch below).[93] This integration allows semantic search operations to be performed seamlessly alongside traditional SQL queries, without requiring separate vector databases.[94]
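A hedged sketch of the workflow described above, using a hypothetical docs table and three-dimensional embeddings for brevity; the VECTOR column type, the VEC_COSINE_DISTANCE function, and the HNSW vector index follow TiDB's vector search syntax:

```sql
-- Table with a fixed-dimension vector column and an HNSW index defined on
-- the cosine-distance expression that queries will order by
-- (vector indexes are built on the table's TiFlash replica).
CREATE TABLE docs (
    id        BIGINT PRIMARY KEY,
    content   TEXT,
    embedding VECTOR(3),
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);

-- Vector literals are written as bracketed lists.
INSERT INTO docs VALUES
    (1, 'intro to raft', '[0.1, 0.2, 0.3]'),
    (2, 'htap overview', '[0.2, 0.1, 0.4]');

-- Approximate k-NN: the 5 documents closest to a query embedding, ordered by
-- cosine distance so the HNSW index can serve the search.
SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, 0.25]') AS distance
FROM docs
ORDER BY VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, 0.25]')
LIMIT 5;
```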
TiDB enhances AI workloads through features like hybrid search, which combines vector similarity with full-text search to improve relevance in retrieval tasks. For instance, queries can fuse cosine similarity on embeddings with keyword matching using functions like MATCH AGAINST in a single SQL statement.[95] Additionally, it supports retrieval-augmented generation (RAG) pipelines by providing real-time access to fresh data, enabling large language models (LLMs) to ground responses in up-to-date embeddings and structured information.[96]
In 2025, PingCAP introduced TiDB X, a context-aware architecture designed for zero-downtime scaling of AI models and native integration with LLMs. TiDB X leverages object storage as its backbone to handle dynamic workloads, allowing seamless expansion of vector datasets while maintaining query consistency and integrating directly with frameworks like LangChain for agentic AI applications.[31] This advancement builds on TiDB's HTAP capabilities to support real-time AI analytics on vector data.[97]
Vector embeddings are stored in TiKV for transactional workloads and TiFlash for analytical processing, utilizing columnar storage to optimize ANN algorithms like HNSW for billion-scale datasets. This distributed storage ensures fault-tolerant persistence and horizontal scaling of vectors across clusters.[98][92]
Common use cases include recommendation systems, where vector search powers personalized content suggestions based on user embeddings; chatbots, enabling semantic understanding of queries through similarity matching; and anomaly detection, identifying outliers in time-series data via distance metrics on embedded features.[92]
Performance-wise, TiDB's vector search delivers low millisecond latency for k-NN queries on large datasets, thanks to optimized HNSW indexing and in-memory graph traversal.[99] This efficiency supports real-time AI inference at scale, with recall rates balancing speed and accuracy for production environments.[93]
Deployment options
On-premises methods
TiDB supports several self-managed deployment options for on-premises environments, such as bare metal servers or virtual machines, enabling organizations to operate clusters without relying on cloud infrastructure. These methods leverage command-line tools and automation scripts to provision, configure, and maintain TiDB components including Placement Driver (PD), TiDB servers, and TiKV nodes.[100][101]

The primary tool for on-premises deployments is TiUP, a CLI-based cluster management solution that facilitates single-command operations for deploying, upgrading, and scaling TiDB clusters. TiUP operates from a control machine, using a YAML-formatted topology file to define the cluster layout, including host specifications, node roles (e.g., PD, TiDB, TiKV), and resource allocations. This allows for straightforward setup on bare metal or VMs, with built-in support for rolling upgrades and scaling without downtime; for instance, adding TiKV nodes involves updating the topology file and executing a scale-out command. TiUP also integrates monitoring components like Prometheus and Grafana during deployment, providing metrics for observability.[100][102]

For multi-node automation, TiDB Ansible offers a playbook-based approach using Ansible to orchestrate cluster provisioning across physical or virtual hosts. This method initializes the system, deploys core components, and handles tasks like rolling restarts, making it suitable for scripted, repeatable setups in enterprise environments. Hardware prerequisites include SSD storage for TiKV nodes to ensure optimal I/O performance, with recommendations for at least 8 CPU cores and 16 GB RAM per node to support production workloads. Although TiUP has largely superseded Ansible for new deployments, Ansible remains viable for managing legacy clusters or environments requiring fine-grained playbook customization.[101][103]

Local development and single-node testing can be achieved using Docker Compose, which provisions a lightweight TiDB cluster via predefined Docker images for PD, TiDB, and TiKV. Users clone the official repository, pull images from Docker Hub, and start the stack with a simple docker-compose up command, accessing the database via MySQL client on port 4000. This setup is ideal for prototyping and isolated testing but is not recommended for production due to its single-node limitations. Configurations for PD and TiKV, such as replication settings or storage paths, are managed through YAML files like docker-compose.yml and component-specific configs.[104]
On-premises best practices emphasize high availability and performance tuning. Deploy at least three PD nodes across distinct hosts to maintain quorum and fault tolerance, with NVMe SSDs recommended for TiKV storage in production to handle high-throughput workloads—aim for 2 TB per TiKV node minimum. Overall cluster sizing should include at least three TiKV nodes and two TiDB servers, with monitoring enabled via Prometheus for proactive issue detection.[105][106]
Cloud and containerized deployments
TiDB offers robust options for cloud and containerized deployments, enabling seamless integration with major cloud providers and Kubernetes environments. TiDB Cloud is a fully managed Database-as-a-Service (DBaaS) platform that automates the deployment, scaling, monitoring, and maintenance of TiDB clusters across Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.[107] It provides two primary tiers: Dedicated, which offers isolated resources for high-performance, predictable workloads with fine-tuned configurations; and Serverless (renamed to Starter in 2025), which supports instant autoscaling and pay-per-use pricing for variable or development workloads.[107][108] In June 2025, TiDB Cloud Dedicated entered public preview on Azure, expanding multi-cloud availability and allowing enterprises to deploy distributed SQL databases natively within Azure's ecosystem.[30]

For containerized deployments, TiDB Operator serves as the core automation tool on Kubernetes, utilizing Custom Resource Definitions (CRDs) to declaratively manage TiDB clusters. It handles full life-cycle operations, including initial deployment, horizontal scaling of components like TiDB servers and TiKV storage nodes, rolling upgrades without downtime, backups, and failover recovery.[109] Day 2 operations, such as resource reconfiguration and monitoring integration, are automated through Kubernetes-native controllers, ensuring high availability and elasticity in dynamic environments.[109]

Users can deploy TiDB Operator via Helm charts from the official PingCAP repository, which simplifies installation on Kubernetes clusters by packaging CRDs, controllers, and dependencies into reusable templates.[110] These charts integrate with Container Storage Interface (CSI) drivers for persistent storage, enabling quick setups with commands like helm install tidb-operator pingcap/tidb-operator.[111]
TiDB emphasizes multi-cloud portability, supporting deployment on managed Kubernetes services such as AWS Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS). On EKS, for instance, users provision node groups with optimized instance types (e.g., c7g.4xlarge for TiDB) and gp3 EBS volumes, then apply TiDBCluster YAML manifests for auto-provisioning.[112] Similar workflows apply to GKE with pd-ssd storage classes and n2-standard machine types, and AKS with Ultra SSD for high-IOPS TiKV nodes, allowing consistent operations across providers without vendor lock-in.[113][114]
In Serverless mode within TiDB Cloud, clusters automatically suspend during idle periods and resume on demand, optimizing costs for bursty or unpredictable workloads by scaling compute and storage to zero when inactive. This mode supports AI-driven scaling for vector search and machine learning applications across TiDB Cloud tiers including Starter, where resources dynamically adjust based on query complexity and data volume.[12]
Ecosystem tools
Data migration and ingestion
TiDB Data Migration (DM) is an integrated data migration platform that enables full data migration and incremental replication from heterogeneous sources, primarily MySQL-compatible databases such as MySQL (versions 5.6 to 8.0), MariaDB (version 10.1.2 and later, on an experimental basis), and Amazon Aurora MySQL, to TiDB clusters.[41] It supports online DDL synchronization, including compatibility with tools like gh-ost and pt-osc for ghost and online schema changes, ensuring minimal disruption during schema alterations.[41] Additionally, DM provides sharding support, allowing it to merge data from multiple upstream shards into a single TiDB database while automatically detecting and applying DDL changes across shards.[41]

For initial bulk data imports into TiDB, TiDB Lightning serves as a high-speed loader capable of handling terabyte-scale datasets.[40] It accepts input from local files or Amazon S3-compatible storage in formats such as SQL dumps, CSV, or Parquet, and operates in two modes: physical import, which encodes data into key-value pairs for direct ingestion into TiKV storage (achieving speeds of 100 to 500 GiB per hour), and logical import, which generates and executes SQL statements (at 10 to 50 GiB per hour).[40][115] The physical mode is optimized for empty tables and empty clusters, making it ideal for greenfield deployments or large-scale initial loads.[40]

Dumpling complements these tools by providing a logical export mechanism from MySQL-compatible sources, generating SQL dumps or CSV files that can be directly imported via TiDB Lightning.[116] It exports schema and data into structured files, including metadata and per-table splits (e.g., {schema}.{table}.{index}.sql), supporting parallel dumping for efficiency and output to local storage or S3-compatible endpoints.[116] This makes Dumpling a key component for preparing MySQL data for TiDB ingestion, particularly in scenarios requiring portable, human-readable formats.[116]
Within DM, the loader stage handles full data migration by dumping upstream data (similar to Dumpling) and loading it into TiDB, while the syncer stage enables real-time incremental replication by parsing and applying binlog events.[117] Integration between stages ensures seamless transitions, with the syncer resuming from the loader's completion point using binlog positions as checkpoints, updated every 30 seconds to track replication progress across workers.[117] Conflict resolution during replication is managed through features like the Causality algorithm for detecting concurrent updates and a safe mode that rewrites INSERT and UPDATE operations to REPLACE on task restarts, preventing data inconsistencies based on binlog coordinates.[117]
Backup and recovery
TiDB provides robust backup and recovery mechanisms through the Backup & Restore (BR) tool, a command-line utility designed for distributed operations on cluster data stored in TiKV nodes.[88] BR enables full snapshot backups of the entire cluster at a specific point in time, capturing raw key-value data for physical consistency.[88] It also supports incremental log backups that record changes to TiKV data, allowing for Point-in-Time Recovery (PITR) with a Recovery Point Objective (RPO) as low as 5 minutes.[88] These backups are stored in S3-compatible external storage, such as Amazon S3, Google Cloud Storage, or Azure Blob Storage, ensuring scalability and durability.[88]

For logical backups, TiDB uses Dumpling, a data export tool that generates SQL or CSV files compatible with MySQL ecosystems, facilitating portable restores across different systems.[116] Unlike BR's physical approach, which backs up underlying SST files for faster intra-TiDB restores, Dumpling produces human-readable exports suitable for migrations or archiving but with higher overhead due to SQL parsing.[116] BR remains the preferred method for production environments in TiDB due to its efficiency in handling distributed physical data.[118]

The recovery process in TiDB leverages BR to restore data to an empty or non-conflicting cluster, supporting full cluster recovery or specific databases and tables.[119] PITR combines the most recent full snapshot with log backups up to a user-specified timestamp, such as 2022-05-15 18:00:00+0800, enabling precise rollbacks.[87] During recovery, BR handles partial failures by pausing tasks and reporting details like pause times and store-specific errors, allowing operators to address issues such as unavailable TiKV nodes before resuming.[87] Restores require an empty target cluster to avoid conflicts, and the process applies changes sequentially from logs to achieve the desired state.[87]
Backups and restores can be scheduled and managed using TiUP for on-premises deployments or the TiDB Operator for Kubernetes environments, integrating seamlessly with automation workflows.[118] From TiDB v7.0.0, SQL-based backup and restore commands are available directly within the database.[88] Security is enhanced through encryption: BR supports server-side encryption (SSE) for S3 storage using AWS KMS keys, and similar mechanisms for Azure Blob Storage with encryption scopes or AES-256 keys, protecting data at rest and in transit via storage provider credentials.[120]
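The SQL interface mirrors the BR command-line workflow; a hedged sketch with placeholder storage URIs:

```sql
-- Full snapshot backup of all databases to S3-compatible external storage;
-- bucket, path, and credentials are placeholders.
BACKUP DATABASE * TO 's3://example-bucket/tidb-backup/2025-01-01';

-- Restore a single database from that snapshot into a cluster where it does
-- not yet exist.
RESTORE DATABASE app FROM 's3://example-bucket/tidb-backup/2025-01-01';
```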
Performance of BR operations scales horizontally with cluster size, utilizing parallel processing across TiKV nodes for distributed I/O.[121] Snapshot backups achieve speeds of 50-100 MB/s per TiKV node with minimal impact (<20% on cluster throughput), while restores reach up to 2 TiB/hour for snapshots and 30 GiB/hour for logs in tested configurations with 6-21 nodes.[88] For terabyte-scale clusters, such as 10 TB datasets, BR enables recovery times under 1 hour by fully utilizing hardware resources, as demonstrated in benchmarks with 1+ GB/s throughput.[122] This results in low Recovery Time Objectives (RTO) for large-scale recoveries, particularly with optimizations in TiDB 8.1 that improve region scattering and communication efficiency.[123]