
Queries per second

Queries per second (QPS) is a key performance metric in computing that measures the number of queries or requests a system (such as a database, search engine, or API service) can process within one second. This throughput indicator is essential for assessing the scalability and efficiency of information-retrieval and data-processing systems under load. QPS is widely applied in cloud computing environments to guide capacity planning and resource allocation. For instance, services like Amazon Kendra allow users to provision query capacity units that support a baseline of 0.1 QPS, with options to scale up to thousands of queries daily while accumulating unused capacity for up to 24 hours. Similarly, Google Cloud Spanner provides throughput estimates in QPS for different instance configurations, enabling peak read and write operations to be benchmarked for distributed databases. In Azure AI Search, QPS metrics monitor query volume in real time, helping detect throttling and latency issues during high-demand scenarios. The metric also plays a critical role in performance tuning and optimization, where factors like query complexity, hardware resources, and latency influence achievable QPS rates. For example, Amazon Timestream can handle approximately 76 QPS with low latency using just four compute units for time-series workloads. High QPS capabilities, often exceeding 1 million in optimized clusters, are vital for large-scale, latency-sensitive applications. Limits on QPS, such as those enforced by API quotas (e.g., 40 QPS for certain endpoints), prevent overload and ensure fair resource distribution across users.

Definition and Fundamentals

Definition

Queries per second (QPS) is a key performance metric in computing that quantifies the number of queries a system can successfully process within one second under typical operating conditions. A query refers to a request for data or computation, such as retrieving information from a database or executing a specific operation. The metric is particularly relevant for high-throughput environments where systems must handle continuous streams of incoming requests without significant degradation in response times. The concept of QPS emerged in the 1990s alongside the proliferation of web servers and relational databases, as organizations sought standardized ways to evaluate system capacity amid growing usage. It built upon earlier transaction-based metrics and gained formal structure through industry benchmarks, including those developed by the Transaction Processing Performance Council (TPC), which introduced analogous measures such as transactions per second (tps) in the late 1980s and early 1990s. By the early 2000s, QPS had become a widely adopted indicator for query-intensive workloads, reflecting the shift toward scalable web and database architectures. Fundamentally, QPS is calculated as the total number of successfully completed queries divided by the elapsed time in seconds:

    QPS = (total queries completed) / (elapsed time in seconds)

Measurements emphasize steady-state conditions, focusing on sustained throughput after excluding initial ramp-up or warm-up periods, to ensure realistic assessments of long-term performance. Common examples of queries include SQL SELECT statements in relational databases for data retrieval, HTTP API calls in web services for user interactions, and vector similarity searches in modern AI systems for recommendation or retrieval tasks.
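The formula can be applied directly to raw load-test counts; a minimal sketch (the function name and arguments are illustrative, not from any particular tool):

```python
def compute_qps(total_queries, failed_queries, elapsed_seconds):
    """Steady-state QPS: failed or timed-out queries are excluded,
    and elapsed_seconds should cover only the post-warm-up window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (total_queries - failed_queries) / elapsed_seconds

# 46,200 issued queries with 1,200 failures over a 30-second
# steady-state window -> 45,000 successes / 30 s = 1,500 QPS
print(compute_qps(46_200, 1_200, 30))  # 1500.0
```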

Units and Notation

The base unit for measuring queries per second is simply queries per second, abbreviated as QPS, which quantifies the rate at which a system processes queries or requests in a one-second interval. This unit is widely adopted in performance metrics for information-retrieval systems, providing a standardized temporal scale for throughput evaluation. For larger-scale systems, SI prefixes are applied to QPS to denote multiples, such as kiloQPS (kQPS) for 1,000 QPS, megaQPS (MQPS) for 1,000,000 QPS, and gigaQPS (GQPS) for 1,000,000,000 QPS, facilitating concise reporting in high-volume environments like cloud services and data centers. These prefixes align with international standards for unit scaling, promoting clarity and avoiding ambiguity in technical documentation and benchmarks. Notation conventions favor the abbreviation QPS for consistency, while in web-service and API contexts, requests per second (RPS) serves as a synonymous term, often used interchangeably to describe the same throughput metric for HTTP or network requests. SI-compliant scaling keeps reports precise, with values expressed in decimal multiples that reflect actual system capacity without exaggeration. In distributed systems, such as clustered databases, total QPS is calculated by summing the QPS contributions from individual nodes, providing an aggregate measure of cluster-wide throughput; for instance, a MySQL NDB Cluster 8.0.26 configuration with two data nodes achieved over 1.5 million QPS in a sysbench OLTP point-select benchmark. Industry-specific variations in QPS application include its use in relational databases to track SQL query execution rates, in contrast with search engines, where it measures full-text or keyword queries; for example, as of August 2025, Google processes approximately 190,000 search queries per second on average, equivalent to about 16.4 billion daily, highlighting the scale of modern search workloads.
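Summing per-node contributions and rendering the result with decimal SI prefixes is simple to sketch (helper names are illustrative):

```python
def cluster_qps(node_qps):
    """Aggregate cluster throughput is the sum of per-node QPS."""
    return sum(node_qps)

def format_qps(qps):
    """Render QPS with decimal SI prefixes (kQPS, MQPS, GQPS)."""
    for factor, prefix in ((1e9, "GQPS"), (1e6, "MQPS"), (1e3, "kQPS")):
        if qps >= factor:
            return f"{qps / factor:g} {prefix}"
    return f"{qps:g} QPS"

# Two data nodes at 750,000 QPS each -> 1.5 MQPS cluster-wide
total = cluster_qps([750_000, 750_000])
print(format_qps(total))  # 1.5 MQPS
```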

Importance and Applications

Role in Performance Evaluation

Queries per second (QPS) serves as a primary indicator of throughput and capacity in performance evaluation, quantifying the volume of queries a system can process within a given timeframe. This metric is particularly valuable for identifying potential bottlenecks during load testing, allowing engineers to simulate production loads and detect limitations in resource utilization before deployment. For instance, in database management systems, QPS is employed as the principal measure of throughput to assess overall efficiency under varying conditions. In capacity planning, QPS plays a crucial role in determining whether a system can sustain peak loads without degradation, enabling organizations to provision resources adequately for anticipated demand surges. For example, services handling high-traffic events, such as sales periods on e-commerce platforms, rely on QPS projections to scale infrastructure and avoid overloads that could disrupt operations. By modeling QPS against expected traffic patterns, planners can forecast compute and networking requirements, ensuring availability while minimizing overprovisioning costs. High QPS levels are closely correlated with low response times and elevated system responsiveness, which in turn enhance user satisfaction and drive business outcomes like increased revenue. Latency-sensitive services that maintain high throughput under load prevent user dissatisfaction and revenue loss, as even minor delays can lead to significant financial impacts. This connection underscores QPS's strategic importance in aligning technical performance with commercial goals. However, QPS has limitations as a standalone metric, as it does not account for query complexity or error rates, potentially leading to incomplete assessments of system health. Variations in query types can skew throughput interpretations, necessitating the integration of complementary metrics like response-time distributions and failure rates for a holistic evaluation. Relying solely on QPS may overlook nuances in workload mixes, resulting in suboptimal capacity assumptions.
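Reporting QPS together with complementary metrics avoids the blind spots noted above; a minimal sketch that derives QPS, error rate, and p95 latency from a single set of samples (the helper name and sample format are illustrative):

```python
def summarize(samples, window_seconds):
    """samples: (latency_ms, success) pairs from a steady-state window.
    Returns QPS alongside error rate and p95 latency, since QPS alone
    says nothing about failures or slow responses."""
    ok = sorted(lat for lat, success in samples if success)
    qps = len(ok) / window_seconds
    error_rate = (len(samples) - len(ok)) / len(samples)
    p95 = ok[max(0, int(0.95 * len(ok)) - 1)]  # nearest-rank approximation
    return qps, error_rate, p95

# 90 fast successes, 5 slow successes, 5 failures over 1 second
samples = [(10, True)] * 90 + [(50, True)] * 5 + [(0, False)] * 5
print(summarize(samples, 1.0))  # (95.0, 0.05, 10)
```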

Key Use Cases

Queries per second (QPS) serves as a critical performance metric in database systems, particularly for evaluating throughput capabilities in both SQL and NoSQL environments. In SQL databases like MySQL, optimized configurations can achieve over 140,000 read/write requests per second in OLTP workloads on high-performance hardware, enabling efficient handling of read-heavy scenarios common in e-commerce and financial applications. Among NoSQL systems, Apache Cassandra demonstrates superior throughput in distributed setups, reaching up to 28,847 operations per second in read-heavy benchmarks using the YCSB workload on a three-node cluster, making it suitable for scalable, high-volume data ingestion and retrieval in big data pipelines. MongoDB, while versatile for document-oriented storage, typically sustains around 13,849 operations per second under similar read-heavy conditions, highlighting throughput trade-offs between the two systems. In web and API services, QPS measures server capacity to process incoming requests, essential for RESTful APIs and microservice architectures in cloud platforms. Amazon API Gateway supports REST APIs that scale to handle variable loads, with performance influenced by the chosen integration method; throughput can exceed thousands of requests per second in production deployments optimized for low-latency responses. In Microsoft Azure, API Management in the Premium tier provides an estimated maximum throughput of approximately 4,000 requests per second per instance, allowing developers to plan for secure, high-availability APIs in enterprise environments. These metrics guide capacity planning, ensuring APIs maintain responsiveness during peak traffic without degradation. Search engines and AI systems leverage QPS to quantify query-handling efficiency, vital for information retrieval and model inference at scale. Elasticsearch clusters, when benchmarked for search operations, can process up to 1,000 queries per second concurrently with indexing workloads, using benchmarking tools to simulate real-world log and metrics queries on multi-node setups.
In AI applications, large language model (LLM) inference benchmarks reveal high throughput potential; for instance, on H100 GPUs with TensorRT-LLM, LLaMA-3-70B achieves around 12,000 tokens per second at batch size 64, enabling systems to support elevated QPS for batched inference in chatbots and recommendation engines where daily queries number in the billions globally. Real-time systems, including IoT and gaming backends, rely on QPS to ensure low-latency processing of concurrent user interactions. In IoT deployments, time-series databases like Amazon Timestream handle up to 72 queries per second with sub-200ms p99 latency in analytics workloads, facilitating real-time data aggregation from sensors without bottlenecks. For gaming, distributed databases such as Google Cloud Spanner power backends that sustain over 2 billion requests per second at peak, supporting global multiplayer sessions with strong consistency and horizontal scaling to manage sudden surges in player queries.

Measurement and Benchmarking

Measurement Techniques

Load testing for queries per second (QPS) typically involves simulating concurrent user requests to a database or query-processing system, starting with a gradual ramp-up phase to avoid sudden overload and reaching a steady-state load where the system operates under consistent demand. This approach allows measurement of sustained QPS over extended periods, such as minutes to hours, to capture realistic performance under prolonged operation. The standard formula for calculating QPS is the number of successful queries divided by the test duration in seconds, explicitly excluding failed requests, timed-out operations, or those not meeting success criteria, so that the metric reflects reliable throughput. In typical benchmarking suites, for instance, QPS is computed as the total processed (successful) queries during the measurement phase divided by the actual runtime in seconds. Testing protocols emphasize a warm-up period to stabilize system components, such as caches and buffers, before transitioning to constant-load phases that maintain a fixed rate of query issuance. This is followed by a dedicated measurement window, often lasting several minutes (e.g., 3–9 minutes depending on dataset scale), during which QPS is recorded under steady conditions. Protocols also stress the use of realistic query mixes to mirror production workloads, such as approximately 80% read operations and 20% write or update operations, ensuring the benchmark evaluates representative performance across operation types. For error handling, measurements define success thresholds to filter out suboptimal responses, such as requiring 95% of operations to complete within specified latency bounds or on-time execution windows, rendering the test invalid if error rates exceed allowable limits, such as logged failures during the measurement phase. This ensures QPS quantifies usable, high-quality throughput rather than raw request volume.
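A measurement loop following this protocol (warm-up exclusion, success filtering, an error-rate validity threshold) can be sketched as follows; `run_query` is a hypothetical callable that returns True on success, and the parameter names are illustrative:

```python
import time

def measure_qps(run_query, duration_s=1.0, warmup_s=0.2, max_error_rate=0.05):
    """Issue queries back-to-back; discard the warm-up period, count
    only successes, and invalidate the run if too many queries fail."""
    start = time.monotonic()
    completed = failed = 0
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= warmup_s + duration_s:
            break
        ok = run_query()
        if elapsed < warmup_s:
            continue        # warm-up: exercise caches, record nothing
        if ok:
            completed += 1
        else:
            failed += 1
    total = completed + failed
    if total == 0 or failed / total > max_error_rate:
        return None         # run invalid: error threshold exceeded
    return completed / duration_s

# A trivially fast "query" that always succeeds:
print(measure_qps(lambda: True, duration_s=0.2, warmup_s=0.05))
```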

Tools and Standards

Several software tools are widely used to simulate loads and measure queries per second (QPS) in various systems. Apache JMeter, an open-source Java-based application, enables load testing by simulating multiple users sending requests to servers, networks, or objects, with built-in reporting on throughput metrics, including QPS, for web applications and APIs. Sysbench, a scriptable multi-threaded benchmarking tool based on LuaJIT, is commonly employed for database performance testing, particularly for relational databases like MySQL and PostgreSQL, where it generates workloads to measure QPS under read-write scenarios. Locust, a Python-based open-source framework, allows defining user behavior in code to swarm systems with simulated users, facilitating QPS assessment for API endpoints by tracking request rates and response times during high-concurrency tests. Established benchmark standards provide standardized methodologies for evaluating QPS in database environments. The TPC-C benchmark, developed by the Transaction Processing Performance Council, assesses online transaction processing (OLTP) systems through a mix of five transaction types simulating a wholesale supplier, reporting performance in transactions per minute (tpmC); this can be converted to approximate QPS, as each transaction averages around 30 queries, with top results exceeding 2 billion tpmC on clustered hardware as of 2025. TPC-H, also from the TPC, evaluates decision-support systems with 22 complex ad-hoc queries and concurrent data modifications on a scalable dataset, measuring query throughput in QphH@Size (queries per hour at a given scale factor), which informs analytical QPS capabilities in data-warehousing scenarios. For NoSQL systems, the Yahoo! Cloud Serving Benchmark (YCSB) framework tests key-value and cloud data-serving platforms across workloads like read-heavy or update-heavy operations, reporting throughput in operations per second (ops/sec), directly comparable to QPS, to compare systems under distributed loads.
Cloud providers offer specialized benchmarks and reported QPS metrics tailored to their services. In Azure AI Search, QPS is monitored via Azure Monitor logs to analyze query volume and latency, with performance-optimization guidelines recommending baseline testing to avoid throttling at high loads, such as during indexing or vector-search workloads. Google Cloud services, including Cloud Storage and SQL databases, enforce QPS quotas and tiers; for instance, Cloud Storage supports initial QPS limits scalable to thousands per bucket, while higher tiers in services like the Gemini API enable elevated rates based on cumulative spending. Examples of hardware achieving over 100,000 QPS include a single Redis instance on high-performance servers handling approximately 100,000 QPS for simple operations, and AWS c5d.metal instances with PostgreSQL reaching up to approximately 630,000 QPS for point lookups in sysbench tests. To ensure standardization in QPS benchmarking, best practices emphasize creating reproducible test environments through containerization or virtual machines that mirror production setups, minimizing variability from OS or network differences. Reports should consistently include QPS metrics alongside detailed hardware specifications (CPU cores, memory, storage type, and network bandwidth) to enable fair comparisons across systems, as recommended in guidelines for accurate and precise measurements.
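The tpmC-to-QPS conversion mentioned above is straightforward arithmetic; a sketch, assuming roughly 30 queries per TPC-C transaction (an approximation, not a TPC-defined constant):

```python
def tpmc_to_qps(tpmc, queries_per_txn=30):
    """Approximate QPS implied by a tpmC result: transactions per
    minute times queries per transaction, divided by 60 seconds."""
    return tpmc * queries_per_txn / 60

# A 1,000,000 tpmC result implies on the order of 500,000 QPS
print(f"{tpmc_to_qps(1_000_000):,.0f} QPS")  # 500,000 QPS
```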

Factors Affecting QPS

Hardware Factors

The number of CPU cores significantly impacts queries per second (QPS) in compute-intensive database workloads, as most database engines assign one core per concurrent query to enable parallel execution. Increasing core count allows more simultaneous queries, directly scaling throughput in scenarios like complex analytical processing. Clock speed also plays a key role, with higher frequencies reducing individual query execution times for operations such as joins, aggregations, and sorting, thereby elevating overall QPS. For example, systems optimized for high-core-count processors, like modern analytical databases, demonstrate proportional QPS gains with additional cores when workloads are parallelizable. Memory capacity and type are crucial for QPS, particularly through caching mechanisms that store query results and hot data in RAM to bypass slower disk I/O. Adequate memory enables high cache-hit ratios, where frequently accessed data resides in RAM, potentially achieving up to 80 times faster reads and supporting QPS levels of 32,000 or more for workloads with 80% cacheable queries. Storage technology further differentiates performance: solid-state drives (SSDs) deliver over 10 times faster random read speeds than hard disk drives (HDDs) in I/O-bound database operations, leading to substantial QPS uplifts for disk-intensive queries by minimizing seek times and rotational latency. In benchmarks with growing datasets up to 12,000 entries, SSDs can reduce load times by up to 28% compared to HDDs, with benefits amplifying for larger-scale operations. In distributed environments, network bandwidth and latency directly constrain QPS by affecting inter-node communication for query coordination and data shuffling. High-bandwidth, low-latency interconnects like InfiniBand, offering 3-5 microsecond latencies and up to 400 Gb/s throughput, support elevated QPS in clustered databases by reducing synchronization overhead.
Conversely, even modest latency increases (for example, 100 microseconds) can degrade throughput by over 20% in latency-sensitive components like caching layers integrated with databases. Bandwidth saturation in underprovisioned networks further limits scalability, capping effective QPS despite ample compute resources. Horizontal scaling via cluster expansion exemplifies hardware's role in QPS growth, where adding nodes linearly boosts aggregate throughput until bottlenecks emerge. In sharded deployments, for instance, increasing from 16 to 32 shards can double QPS from around 420,000 to 840,000 by distributing load across more instances. Further expansion to 40 shards sustains over 1 million QPS, but gains plateau due to network saturation and resource contention, highlighting the need for balanced interconnects. Such approaches rely on hardware parallelism but are complemented by software configurations for optimal utilization.
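The scaling behavior described above can be modeled with a simple efficiency-discounted throughput estimate (the numbers are chosen to match the 16-to-32-shard example; the efficiency factor is an illustrative assumption, not a measured constant):

```python
def aggregate_qps(per_shard_qps, shards, efficiency=1.0):
    """Ideal linear scaling discounted by a coordination-efficiency
    factor (1.0 = perfectly linear; real clusters fall below this as
    interconnect saturation and contention grow)."""
    return per_shard_qps * shards * efficiency

# Near-linear region: doubling 16 -> 32 shards doubles throughput
print(aggregate_qps(26_250, 16))  # 420000.0
print(aggregate_qps(26_250, 32))  # 840000.0

# Plateau region: at 40 shards, even 96% efficiency still clears 1M QPS
print(aggregate_qps(26_250, 40, efficiency=0.96))
```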

Software and Optimization Factors

Query optimization plays a pivotal role in enhancing queries per second (QPS) by minimizing execution times through strategic indexing and intelligent query planning. Indexing structures such as B-trees accelerate lookups by organizing records in a balanced tree format, enabling logarithmic-time searches that significantly reduce the number of disk accesses required for query resolution. For instance, in database environments, query optimization involving B-tree lookups can streamline response preparation and transmission, contributing to overall throughput improvements. Advanced query planners further refine this by selecting optimal execution paths, such as join orders or predicate pushdowns, which can potentially double QPS in retrieval-augmented generation pipelines by addressing bottlenecks in multi-stage processing. Caching mechanisms at the software level substantially boost read QPS by intercepting frequent queries and serving results from high-speed in-memory stores, thereby bypassing slower database operations. Tools like Redis implement cache-aside patterns in which applications first consult the cache; on a hit, data is returned in sub-millisecond time, avoiding database hits entirely for repeated accesses. This approach is particularly effective for read-heavy workloads, with large Redis clusters capable of handling up to 200 million operations per second while maintaining low latency. In practice, integrating caching via managed services can support up to 400,000 QPS per node, offloading database read replicas and achieving comparable throughput to scaled replicas but with reduced response times, such as 1 ms average versus 80 ms. Concurrency handling in software frameworks maximizes QPS under varying loads by efficiently managing multiple simultaneous requests through thread pools and asynchronous processing.
Thread pools allocate a fixed number of worker threads to handle incoming queries, preventing resource exhaustion while scaling with demand; for example, in microservices, asynchronous models using 4 worker threads can achieve significant throughput at high loads exceeding 10,000 QPS, representing a 42% throughput gain over synchronous counterparts by minimizing queuing delays. Event-driven, non-blocking I/O models can process concurrent operations without dedicating a thread to each request, enabling sustained high QPS with minimal threading overhead across varying loads. Auto-tuning techniques dynamically adjust pool sizes and concurrency models, improving tail latency by up to 1.9 times across load variations. At the code level, efficient algorithms and avoidance of pitfalls like the N+1 query problem in object-relational mapping (ORM) tools are essential for sustaining high QPS. The N+1 problem arises when an initial query fetches a list of records, followed by individual queries for each related entity, leading to excessive database round-trips and degraded performance; ORM misuse can generate far more queries than necessary, inflating latency in data-intensive applications. Mitigating this through eager loading or batch queries in ORMs reduces query volume, directly enhancing throughput; studies show ORM frameworks can affect performance by orders of magnitude if not optimized, with refactoring tools automating fixes to eliminate such regressions. Database tuning guides emphasize algorithmic choices, such as using set-based joins instead of iterative loops, to ensure QPS scales without hidden per-row query costs.
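The cache-aside pattern described above can be sketched in a few lines; here a plain dict stands in for an in-memory store like Redis, and `fetch_from_db` is a hypothetical placeholder for a real database call:

```python
# Cache-aside sketch (illustrative): the application owns the cache
# lookup/populate logic, and the database is consulted only on a miss.
cache = {}

def fetch_from_db(key):
    return f"row-for-{key}"      # placeholder for a real SQL query

def get(key):
    if key in cache:             # hit: served from memory, no DB round-trip
        return cache[key]
    value = fetch_from_db(key)   # miss: query the database ...
    cache[key] = value           # ... then populate the cache
    return value

get("user:42")                   # first call misses and fills the cache
print(get("user:42"))            # second call is a pure cache hit
```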

Distinctions from Other Metrics

Queries per second (QPS) measures the volume of queries a system can process in one second, serving as a key indicator of throughput in database and web-service environments. In contrast, latency quantifies the response time for a single query, focusing on individual performance rather than aggregate capacity. A system can achieve high QPS while experiencing elevated latency, particularly under overload conditions where queued requests prolong individual processing times. Although requests per second (RPS) is often synonymous with QPS in web services, where it tracks HTTP requests, QPS more precisely emphasizes query-oriented workloads in databases and search systems. Throughput, as a broader metric, may incorporate not only the number of operations but also the data volume handled, distinguishing it from the operation-count focus of QPS. QPS differs from input/output operations per second (IOPS), which specifically gauges storage-subsystem performance by counting low-level read and write accesses to disk. While IOPS is confined to I/O efficiency, QPS encompasses the full query lifecycle, including computational overhead beyond mere storage interactions. Efforts to enhance QPS frequently involve trade-offs, such as sacrificing query accuracy in AI-driven databases through approximate nearest neighbor techniques that prioritize speed over precision. Similarly, in distributed databases, opting for eventual consistency over strong consistency can double read throughput, enabling higher QPS at the expense of immediate consistency across replicas.
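The relationship between throughput and latency can be made concrete with Little's Law, which ties the two together through the average number of in-flight requests (concurrency = arrival rate × mean time in system):

```python
def required_concurrency(qps, mean_latency_s):
    """Little's Law: average in-flight requests equals the arrival
    rate multiplied by the average time each request spends in the
    system."""
    return qps * mean_latency_s

# 2,000 QPS at 50 ms mean latency keeps ~100 requests in flight;
# the same QPS at 500 ms needs ~1,000 -- high QPS with high latency
# simply means a deeper queue, not a faster system.
print(required_concurrency(2_000, 0.050))  # 100.0
print(required_concurrency(2_000, 0.500))  # 1000.0
```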

Scaling and Improvements

Vertical scaling involves upgrading the resources of a single server, such as increasing CPU, memory, or storage, to enhance queries per second (QPS) within a system. This approach can significantly boost performance for workloads that fit within one machine's limits, as seen in enterprise database deployments where enhanced hardware configurations enable higher QPS through improved processing efficiency. However, vertical scaling is constrained by hardware ceilings, such as the maximum CPU cores or memory available on a single machine, beyond which further gains diminish due to physical limits. Horizontal scaling distributes the query load across multiple nodes using techniques like sharding, which partitions data into subsets across servers, and replication, which creates copies of data for parallel processing. Database clusters employing these methods, often coordinated by load balancers to distribute traffic evenly, can achieve over 1 million QPS; for instance, PlanetScale's MySQL-based system uses horizontal sharding to handle 1 million QPS by dividing data across shards while maintaining consistency. This scalability allows systems to grow nearly linearly with added nodes, making it suitable for high-volume applications like e-commerce platforms. Advanced techniques further optimize QPS by targeting specific workload patterns. Read replicas, which are synchronized copies of a primary database instance dedicated to read operations, separate read and write queries to prevent bottlenecks on the primary node, enabling systems to scale read-heavy workloads where read QPS can be 10 to 100 times higher than write QPS. In AI inference scenarios, model quantization reduces parameter precision (e.g., from 16-bit to 8-bit or 4-bit) to lower memory usage and computational demands, accelerating QPS; benchmarks show up to 30% faster inference speeds on quantized large language models without substantial accuracy loss. Monitoring QPS trends is essential for iterative improvements, as it reveals bottlenecks and informs scaling decisions.
By tracking QPS alongside metrics like latency and error rates, teams can proactively adjust resources, such as adding replicas during peaks. In Netflix's architecture, continuous monitoring of requests per second (RPS, analogous to QPS) across services enables dynamic load shedding and autoscaling, supporting reliable delivery for global traffic volumes; similarly, Flipkart's database cluster uses QPS metrics to scale horizontally to over 1 million QPS with zero-downtime maintenance during high-demand events like sales.
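QPS trend monitoring of this kind is often implemented as a sliding window over recent request timestamps; a minimal sketch (class and method names are illustrative, not from any particular monitoring library):

```python
from collections import deque

class QpsMonitor:
    """Sliding-window QPS estimate over the last window_s seconds,
    suitable for driving autoscaling or load-shedding decisions."""
    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()

    def record(self, timestamp):
        self.events.append(timestamp)
        cutoff = timestamp - self.window_s
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()    # drop events outside the window

    def qps(self):
        return len(self.events) / self.window_s

mon = QpsMonitor(window_s=10.0)
for t in range(100):                 # 100 requests over 10 "seconds"
    mon.record(t * 0.1)
print(mon.qps())  # 10.0
```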

References

  1. [1]
    What is QPS? - Tencent Cloud
    Apr 25, 2025 · QPS stands for Queries Per Second. It is a measure of how many queries or requests a system can handle in one second.
  2. [2]
    What is queries per second (QPS) and what is it used for? - SOAX
    Queries per second (QPS) is a performance metric that measures the number of queries a system can process in one second. It is commonly used in databases, ...
  3. [3]
    Adjusting capacity - Amazon Kendra - AWS Documentation
    You can use up to 8,000 queries per day with a minimum throughput of 0.1 queries per second (per query capacity unit). Accumulated queries will last up to 24 ...
  4. [4]
    Performance overview | Spanner - Google Cloud Documentation
    The following table provides the approximate throughput (queries per second) for Spanner instance configurations: Instance configuration type, Peak reads (QPS ...
  5. [5]
    Azure AI Search - Monitor queries - Microsoft Learn
    Aug 8, 2025 · Query volume (QPS). Volume is measured as Search Queries Per Second (QPS), a built-in metric that can be reported as an average, count ...
  6. [6]
    Understanding and optimizing Amazon Timestream Compute Units ...
    Sep 26, 2024 · With just 4 TCUs, the service supports 4,560 queries per minute, which is approximately 76 queries per second (QPS), with p99 latency of less ...
  7. [7]
    Scale-up MySQL NDB Cluster 8.0.26 to +1.5M QPS the easy way ...
    Jul 27, 2021 · Using this Cluster configuration and workload it is possible to go above 1.7M queries per second. It's also possible to further reduce the ...
  8. [8]
    Request and method limits - Microsoft Advertising API
    Nov 13, 2024 · Queries per second (QPS), Limit the number of HTTP requests you send per second to 40. Method calls per minute, Limit the number of method ...
  9. [9]
    QPS FAQ | Microsoft Learn
    QPS is defined as queries per second. For external bidders, QPS is used to describe the total number of bid requests that can be sent per second.
  10. [10]
    TPC History of TPC - TPC.org
    In total, about 300 TPC-A benchmark results were published. The first TPC-A result was 33 tpsA at a cost of $25,500 per transaction or tpsA. The highest TPC-A ...
  11. [11]
    TPC Benchmarks Overview
    The benchmark defines the required mix of transactions the benchmark must maintain. The TPC-E metric is given in transactions per second (tps).
  12. [12]
    What is QPS (Queries per second)? - Bigabid
    Queries per second (QPS) is a metric used in online systems to measure the number of requests for information that a server receives per second.Missing: definition | Show results with:definition
  13. [13]
    SI prefixes - BIPM
    SI prefixes are decimal multiples and submultiples of SI units, such as kilo (k, 10^3) and milli (m, 10^-3).
  14. [14]
    What Is the Difference Between QPS and the Number of Requests?
    Apr 25, 2025 · Queries Per Second (QPS) is the number of requests a server can handle per second. NOTE: QPS is used to measure the number of queries, or ...
  15. [15]
    Scale-up MySQL NDB Cluster 8.0.26 to +1.5M QPS the easy way ...
    Jul 27, 2021 · This blog provides an introductory walk-through on how to scale up MySQL NDB Cluster 8.0.26 in an easy way reporting over 1.7M primary key lookups per second.<|separator|>
  16. [16]
    Google Search Statistics - Internet Live Stats
    Google now processes over 40,000 search queries every second on average (visualize them here), which translates to over 3.5 billion searches per day and 1.2 ...
  17. [17]
    What is QPS and How it Affects System Design
    Requests Per Second (RPS) and Queries Per Second (QPS) are metrics commonly used to measure the performance of systems that handle incoming requests.
  18. [18]
    A methodology for database system performance evaluation
    Performance Metric. We have used system throughput measured in queries-per-second as our principal performance metric. Where illustrative, response time has ...
  19. [19]
    [PDF] Power-Aware Throughput Control for Database Management Systems
    Jun 26, 2013 · In our study, we focus on the DBMS throughput. (query per second, QPS) as the main performance metric. The throughput, as the reciprocal of ...
  20. [20]
    [PDF] Capacity Planning - USENIX
    Web queries per second (QPS) are likely to impact compute and, possibly, networking. Finding the fewest number of drivers. (QPS, gigs uploaded, etc.) that ...
  21. [21]
    [PDF] PerfIso:Performance Isolation for Commercial Latency-Sensitive ...
    Jul 13, 2018 · Even slightly higher response times decrease user satisfaction and impact revenues [29, 10, 17]. Over-provisioning means that resource ...
  22. [22]
    How Fast is Your Web Site? - ACM Queue
    Mar 4, 2013 · The overwhelming evidence indicates that a Web site's performance (speed) correlates directly to its success, across industries and business metrics.Web Site Performance Data... · Passive Performance... · When Is Timing Done?Missing: revenue | Show results with:revenue
  23. [23]
    Metrics That Matter - Communications of the ACM
    Apr 1, 2019 · For example, it is not uncommon to measure the QPS (queries per second) received at a Web or API server, and to assess that this metric ...Missing: origin | Show results with:origin
  24. [24]
    [PDF] Performance Analysis of Cloud Applications - USENIX
    Apr 11, 2018 · We show, using data from Gmail, that the biggest challenges in analyzing performance come not from changing QPS but in chang- ing load ...
  25. [25]
    PostgreSQL and MySQL: Millions of Queries per Second - Percona
    Jan 6, 2017 · How PostgreSQL and MySQL work together to handle millions of queries per second under high workloads.Missing: heavy | Show results with:heavy
  26. [26]
    MongoDB vs. Cassandra Performance Studie - benchANT
    We did 18 different benchmark measurements to find out more about the performance and scalability of MongoDB and Apache Cassandra.
  27. [27]
    Things to Consider When You Build REST APIs with Amazon API ...
    Aug 13, 2019 · This post will dive deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.Things To Consider When You... · Request Rate (a.K.A... · Integrations And Design...Missing: QPS benchmarks<|separator|>
  28. [28]
    Azure API Management Instance - Throughput - Microsoft Q&A
    May 12, 2022 · Azure documentation indicates that the estimated maximum throughput of an API Management Instance in a Premium tier is about 4000 requests/sec.
  29. [29]
    Benchmarking and sizing your Elasticsearch cluster for logs and ...
    Oct 29, 2020 · With 1 node and 1 shard we got 22K events per second. · With 2 nodes and 2 shards we got 43k events per second. · With 3 nodes and 3 shards we got ...
  30. [30]
    [PDF] Inference Benchmarking of Large Language Models on AI ... - arXiv
    Nov 3, 2024 · We introduce LLM-Inference-Bench, a comprehensive benchmarking suite to evaluate the hardware inference performance of LLMs. We thoroughly ...
  31. [31]
    Choosing Cloud Spanner for game development | Google Cloud Blog
    Nov 16, 2022 · Organizations globally use Cloud Spanner, because of its unlimited scale, strong consistency, and up to 99.999% of availability.
  32. [32]
    [PDF] The LDBC Social Network Benchmark (version 2.2.5-SNAPSHOT ...
    LDBC's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management.
  33. [33]
    [PDF] HyBench: A New Benchmark for HTAP Databases
    To obtain steady performance results, each phase includes a warm-up run and the measurement run. Particularly, 3 min warm-up and 3 min measurement phase for ...
  34. [34]
    [PDF] Lancet: A self-correcting Latency Measuring Tool - USENIX
    Jul 10, 2019 · We use the ADF test to determine the duration of the warm-up phase and whether the experiment results change over time. Finally, we use the ...
  35. [35]
    How to Measure Database Performance | Severalnines
    May 4, 2022 · Queries per second (QPS): We can have INSERTs, UPDATEs, SELECTs. We can have simple queries that access the data using indexes or even primary ...
  36. [36]
    Optimizing ClickHouse for Intel's ultra-high core count processors
    Sep 17, 2025 · As a result of our optimization, ClickBench query Q3 saw a 27.4% improvement on ultra-high core count systems. The performance gain increases ...
  37. [37]
    Optimize cost and boost performance of RDS for MySQL using ...
    Nov 10, 2023 · In-memory caching of query results helps in boosting application performance while providing customers the ability to grow their business and ...
  38. [38]
    SSD or HDD for server - hard drive
    Oct 4, 2019 · Write speed on the SSD RAID is about the same speed as the HDD RAID, but random access read speed is more than 10X faster on the SSD RAID.
  39. [39]
    [PDF] Analysis of SSD's Performance in Database Servers
    The main objective of this study is to analyze the components of solid-state drives and what kind of impact they can make when the storage infrastructure in ...
  40. [40]
    InfiniBand in focus: bandwidth, speeds and high-performance ...
    Jun 18, 2024 · InfiniBand is a high-speed, low-latency interconnect for HPC, offering up to 400Gb/s throughput, low latency of 3-5 microseconds, and high ...
  41. [41]
    [PDF] Characterizing the impact of network latency on cloud-based ...
    Small network delays can cause significant performance degradation in cloud applications, affecting user costs and resource usage. Different applications are ...
  42. [42]
    One million queries per second with MySQL - PlanetScale
    Sep 1, 2022 · Discover how PlanetScale handles one million queries per second (QPS) with horizontal sharding in MySQL.
  43. [43]
    [PDF] Executing Web Application Queries on a Partitioned Database
    A profile of the MySQL server shows that the cost consists of optimizing the query, doing a lookup in the btree index, and preparing the response and sending ...
  44. [44]
    RAGO: Systematic Performance Optimization for Retrieval ...
    Jun 20, 2025 · For the 8B model, retrieval is the primary bottleneck; as query counts double, QPS nearly halves due to increased retrieval demands.
  45. [45]
    Caching | Redis
    Redis caching is a fast, scalable layer that achieves sub-millisecond performance for real-time apps, designed for caching at scale.
  46. [46]
    [PDF] µTune: Auto-Tuned Threading for OLDI Microservices - USENIX
    Oct 8, 2018 · All three thread pools vary in size. Typically, one network thread is sufficient, while the other pools must scale with load. Asynchronous ...
  47. [47]
    reformulator: Automated Refactoring of the N+1 Problem in ...
    This added layer of abstraction hides the significant performance cost of database operations, and misuse of ORMs can lead to far more queries being generated ...
  48. [48]
    Analyze performance - Azure AI Search | Microsoft Learn
    This article describes the tools, behaviors, and approaches for analyzing query and indexing performance in Azure AI Search.
  49. [49]
    Benchmarking databases 101 - part 1 - Severalnines
    Jun 7, 2022 · Queries per second, latency, 99 percentile, this all tells you how ... QPS represents the throughput but it ignores the latency. You ...
  50. [50]
    Defining slo: service level objective meaning - Google SRE
    For incoming HTTP requests from the outside world to your service, the queries per second (QPS) metric is essentially determined by the desires of your users, ...
  51. [51]
    Metrics for performance tests - Alibaba Cloud
    Oct 30, 2024 · Average response time refers to the average value of the same transaction when the system is running stably. In general, the average response ...
  52. [52]
    IOPS QPS TPS - Alibaba Cloud News Network
    Jun 28, 2016 · IOPS refers to how many times the storage can accept access from the host per second. A host's IO requires multiple accesses to the storage to ...
  53. [53]
    Database Performance: Impact of Storage Limitations | simplyblock
    Jan 21, 2025 · As IOPS increases, latency increases due to queuing and resource contention. Higher latency constrains the maximum achievable IOPS and QPS. The ...
  54. [54]
    Consistency level choices - Azure Cosmos DB - Microsoft Learn
    Sep 3, 2025 · Eventual consistency is the weakest form of consistency because a client might read values older than those values it read in the past. Eventual ...
  55. [55]
    Redis 7.2 Sets New Standard for Developers to Harness the Power ...
    Aug 15, 2023 · Redis Enterprise 7.2 introduces scalable search to its vector database capabilities, delivering even higher queries per second, furthering its best-in-class ...
  56. [56]
    Scaling with MemoryDB Multi-Region - AWS Documentation
    Vertical changes the node type to resize the MemoryDB Multi-Region cluster. The online vertical scaling allows scaling up/down while the regional clusters ...
  57. [57]
    Horizontal Scaling with Oracle Database
    Jul 31, 2021 · This article focuses on horizontal scaling with Oracle Databases, and the unique ways in which Oracle Database software supports horizontal scaling.
  58. [58]
    Working with DB instance read replicas - AWS Documentation
    A read replica is a read-only copy of a DB instance. You can reduce the load on your primary DB instance by routing queries from your applications to the read ...
  59. [59]
    System design paradigm: Primary-replica pattern | by Abracadabra
    Dec 23, 2020 · Read QPS is often 10~100 times higher than write QPS; DB query is slower than most app server computation since it usually needs to read from ...
  60. [60]
    Speeding up LLM inference by using model quantization in Databricks
    Apr 6, 2025 · The results revealed up to a 30% improvement in inference ... Model quantization has become a game-changer in edge AI applications ...
  61. [61]
    Implementing the Netflix Media Database | by Netflix Technology Blog
    Dec 14, 2018 · In the Netflix microservices environment, different business applications ... While RPS or CPU usage could be useful metrics for scaling ...
  62. [62]
    How Flipkart Scales Over 1M QPS with Zero Downtime Maintenance
    May 29, 2025 · Scale up the TiDB cluster by adding new row storage nodes to handle load redistribution. · Rebalance regions and data away from the node ...