Rate limiting
Rate limiting is a technique used in computer networks, web services, and software systems to control the rate at which requests are processed or data is transmitted, thereby preventing server overload, ensuring resource availability, and mitigating abusive behaviors such as denial-of-service attacks or brute-force attempts.[1][2] By enforcing predefined thresholds on the number of actions—such as API calls, logins, or network packets—within a specified time window, rate limiting maintains system stability and promotes fair resource allocation among users or clients.[1][3]
Commonly implemented at the application layer, for example in web servers such as NGINX or in API gateways, rate limiting typically identifies clients by IP address or user credentials and tracks their request volume over time.[2][1] Enforcement relies on algorithms such as the leaky bucket, which queues incoming requests and drains them at a constant rate, absorbing bursts while smoothing the output over time, and the token bucket, which permits a sustained rate while accommodating short bursts through accumulated tokens.[4] These mechanisms, often configurable for burst sizes and delay thresholds, can reject, queue, or throttle excess requests to protect backend resources without disrupting legitimate traffic.[2][4]
Beyond security, rate limiting supports scalability in distributed environments by preventing any single client from monopolizing bandwidth or compute cycles, a critical feature in cloud computing and microservices architectures.[3][2] For instance, in HTTP-based systems, it counters threats like credential stuffing by capping login attempts per IP, while in network protocols, it aligns with standards for traffic shaping to avoid congestion.[1] These concepts originated in 1980s telecommunications for traffic shaping and have been standardized in IETF RFCs for protocols like SIP, ensuring interoperability and robustness across the internet infrastructure.[5]
Fundamentals
Definition
Rate limiting is a control mechanism in computer networking and software systems that restricts the number of requests, operations, or data units processed by a resource within a defined time frame, thereby managing load and preventing overload or abuse.[1][6] This technique enforces boundaries on traffic flow to ensure stability, often by rejecting or delaying excess activity once limits are reached.[3]
Key components of rate limiting include the rate, which defines the permitted volume of activity per time unit (such as requests per second or minute); the burst allowance, which accommodates short-term excesses by allowing a limited number of additional operations beyond the steady rate; and enforcement thresholds, which trigger actions like blocking when these limits are exceeded.[7][8] These elements work together to balance resource allocation while permitting flexibility for legitimate usage patterns.[9]
The concept originated in the 1980s as part of traffic shaping efforts in early packet-switched networks, where mechanisms like the leaky bucket algorithm were developed to regulate data flow and enforce bandwidth contracts in telecommunications and ATM systems.[10] It gained formal structure in computer networking through RFC 1633 in 1994, which outlined integrated services for the Internet, incorporating rate-based guarantees to support quality-of-service (QoS) for real-time applications via traffic control functions such as scheduling and admission.[11]
Rate limiting is distinct from throttling, which reduces the speed of processing or transmission for excess requests rather than blocking them outright.[12] It also differs from quota systems, which apply cumulative caps over extended periods (e.g., daily or monthly totals) to govern overall usage, whereas rate limiting focuses on immediate, time-bound rates.[13][14]
Purposes and Benefits
Rate limiting serves several primary purposes in modern computing systems, particularly in web services and APIs. It prevents server overload by capping the number of requests processed within a given timeframe, thereby maintaining operational capacity during unexpected traffic volumes. This mechanism is crucial for mitigating denial-of-service (DoS) attacks, where malicious actors flood systems with requests to disrupt availability; by enforcing limits, rate limiting blocks excessive traffic before it reaches backend resources. Additionally, it ensures fair resource allocation among users, preventing any single client from monopolizing bandwidth or compute power, which promotes equitable access in multi-tenant environments. Finally, rate limiting enforces service level agreements (SLAs) by aligning usage with contractual terms, such as request quotas per user or tier, helping providers manage expectations and billing.
The benefits of rate limiting extend to enhanced system stability and efficiency. By controlling inbound traffic, it reduces latency spikes that occur during surges, allowing consistent response times for legitimate requests and improving overall user experience. In cloud environments, it enables cost savings by avoiding the need for over-provisioning resources to handle worst-case scenarios; for instance, in a case study of the Have I Been Pwned service using Cloudflare's rate limiting, infrastructure costs were reduced by 90% through efficient traffic management and caching integration. This approach also bolsters security postures without requiring extensive additional infrastructure, as it inherently curbs abuse patterns like brute-force attempts.
Despite these advantages, rate limiting introduces trade-offs that require careful configuration. If limits are set too strictly, legitimate users may be inadvertently blocked, leading to false positives and potential frustration, especially in scenarios with shared IP addresses like corporate networks or mobile carriers. Effective implementation thus demands ongoing tuning based on traffic patterns and user feedback to balance protection with accessibility.
Algorithms and Techniques
Token Bucket Algorithm
The token bucket algorithm is a permissive rate limiting technique that regulates traffic by allowing short bursts while enforcing a sustainable long-term rate. It operates on the principle of a conceptual "bucket" that accumulates tokens over time, where each token represents permission to process a unit of work, such as a network packet or API request. Tokens are added to the bucket at a fixed rate, enabling the system to handle variable loads without strictly queuing excess traffic.
The core mechanism involves monitoring the bucket's token count upon each incoming request. If sufficient tokens are available, the request proceeds, and an equivalent number of tokens is deducted from the bucket. This design inherently supports bursts: if the bucket fills during low-activity periods, a sudden influx of requests can deplete it rapidly up to the bucket's maximum capacity, after which further requests are throttled until more tokens accumulate. In contrast to stricter methods, this approach prioritizes responsiveness for intermittent traffic while preventing sustained overloads.
Key parameters define the algorithm's behavior: the refill rate r (tokens added per unit time, often in tokens per second), the bucket capacity b (maximum tokens the bucket can hold, determining burst size), and the request cost c (tokens consumed per request, typically c = 1 for uniform operations). These allow fine-tuning for specific workloads, such as setting r = 100 tokens/second and b = 1000 to permit up to 10 seconds' worth of requests in a burst.[15]
A standard mathematical formulation for processing a request at current time t_{\text{now}} (assuming the last update was at t_{\text{last}}) proceeds as follows:
1. Compute the elapsed time \Delta t = t_{\text{now}} - t_{\text{last}}.
2. Refill the token count: t \leftarrow \min(b, t + r \cdot \Delta t), where t is the previous token balance.
3. If t \geq c, grant the request, update t \leftarrow t - c, and set t_{\text{last}} = t_{\text{now}}; otherwise, deny or delay the request.
This on-demand refill ensures accurate rate enforcement without clock drift, though implementations may vary slightly for efficiency.[15]
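The update rule above translates almost directly into code. The following is a minimal single-process sketch; the class and parameter names are illustrative and not taken from any particular library:
python
import time

class TokenBucket:
    """Minimal token bucket: refill on demand, spend c tokens per granted request."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # r: tokens added per second
        self.capacity = capacity      # b: maximum tokens, i.e. the burst size
        self.tokens = capacity        # start full so an initial burst is permitted
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill: t <- min(b, t + r * delta_t)
        self.tokens = min(self.capacity, self.tokens + self.rate * (now - self.last))
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost       # grant the request and deduct its cost
            return True
        return False                  # insufficient tokens: deny or delay

# Example: 100 tokens/second with a burst capacity of 1000
limiter = TokenBucket(rate=100, capacity=1000)
granted = limiter.allow()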
The algorithm's advantages lie in its flexibility for bursty traffic—common in web services and networks—and its straightforward software implementation using simple counters and timers, without needing complex queues. It has been widely adopted in production systems, including Google's Guava library's RateLimiter class, which applies a smoothed variant for concurrent Java applications. However, a key limitation is the potential for large bursts (up to b) to cause temporary resource spikes, possibly overwhelming downstream components if not paired with additional safeguards.[15]
Leaky Bucket Algorithm
The leaky bucket algorithm functions as a smoothing technique for rate limiting, where incoming requests or packets are queued in a finite-capacity bucket that continuously leaks at a constant rate, ensuring a steady output flow. If the bucket fills to capacity due to a burst of arrivals, any excess requests are discarded rather than queued further. This mechanism enforces a uniform transmission rate, preventing sudden spikes from overwhelming downstream systems.[16][17]
Key parameters of the algorithm include the leak rate \mu, typically measured in requests or bytes per second, which dictates the constant output rate, and the bucket depth d, representing the maximum number of requests that can be held in the queue before overflow. Unlike approaches that permit controlled bursts, the leaky bucket provides no additional allowance beyond the queue size itself, prioritizing consistent flow over temporary surges.[18][17]
The algorithm's operation can be mathematically formulated through updates to the queue length q. Over a time interval \Delta t, with a denoting the number of arrivals, the queue evolves as
q \leftarrow \max(0, q + a - \mu \Delta t).
If the resulting q > d, the excess is dropped, maintaining the bucket within bounds. This formulation models the bucket as a finite queue draining continuously, with decisions made at arrival or departure events.[19][17]
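This update can be sketched as a counter that tracks the queue level q; the names are illustrative, and a full traffic shaper would also release queued requests at the leak rate rather than only metering admissions:
python
import time

class LeakyBucket:
    """Leaky bucket as a meter: the queue level drains at a constant rate, overflow is dropped."""

    def __init__(self, leak_rate: float, depth: float):
        self.leak_rate = leak_rate    # mu: units drained per second
        self.depth = depth            # d: maximum queue length before overflow
        self.level = 0.0              # q: current queue length
        self.last = time.monotonic()

    def offer(self, arrivals: float = 1.0) -> bool:
        now = time.monotonic()
        # Drain: q <- max(0, q - mu * delta_t)
        self.level = max(0.0, self.level - self.leak_rate * (now - self.last))
        self.last = now
        if self.level + arrivals > self.depth:
            return False              # bucket would overflow: drop the excess
        self.level += arrivals        # accept the arrivals into the queue
        return True

# Example: drain 10 requests/second with room for 20 queued requests
bucket = LeakyBucket(leak_rate=10, depth=20)
accepted = bucket.offer()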
A primary advantage of the leaky bucket is its constant output rate, which makes it particularly suitable for traffic shaping in networks, where steady transmission reduces congestion and jitter. It has also been applied in network protocols, including aspects of TCP congestion control, where it helps regulate flow so that links are not overwhelmed. A key limitation, however, is its strict handling of bursts: because no excess beyond the fixed queue depth is tolerated, users with intermittent high-demand patterns may see their traffic dropped.[16][18][17]
Window-Based Methods
Window-based methods for rate limiting involve discretely counting requests within defined time intervals to enforce limits, providing a straightforward approach to controlling traffic bursts over short periods. These techniques divide time into windows and track request counts accordingly, differing from continuous smoothing mechanisms by focusing on bounded, countable events. They are particularly suited for API endpoints where precise, time-bound quotas are needed without complex queuing.
The fixed window method partitions time into non-overlapping intervals, such as one-minute epochs, and maintains a counter for requests within each interval. At the start of a new window, the counter resets to zero, allowing a fresh allocation of permitted requests. For a request at time t, the system determines the current window w = \lfloor t / W \rfloor, where W is the window duration. If the count for w is below the limit L, the count is incremented and the request proceeds; otherwise, the request is rejected. When a request falls into a new window, the counter for that window starts at zero before being incremented. This formulation enforces the limit per interval but can permit up to twice the limit in a burst near a boundary, as a client may exhaust one window's quota just before the reset and immediately consume the next window's quota.[20]
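The per-window bookkeeping can be illustrated with a short in-memory sketch; it is single-process, and the names and limits are illustrative:
python
import time
from collections import defaultdict

WINDOW = 60   # W: window duration in seconds
LIMIT = 100   # L: requests allowed per window

# client -> (window index, request count)
counters = defaultdict(lambda: (-1, 0))

def allow(client_id: str) -> bool:
    window = int(time.time() // WINDOW)    # w = floor(t / W)
    current_window, count = counters[client_id]
    if window != current_window:
        current_window, count = window, 0  # boundary crossed: start a fresh counter
    if count >= LIMIT:
        return False                       # quota for this window exhausted
    counters[client_id] = (current_window, count + 1)
    return True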
Fixed window methods offer simplicity in implementation and low memory overhead, typically requiring only a single counter per client per window size, making them efficient for short-term limits in resource-constrained environments. They are widely adopted in API gateways, such as Kong, where the default rate limiting plugin uses fixed windows configurable in seconds, minutes, or longer periods to cap HTTP requests per consumer or IP. However, the boundary burst limitation can lead to uneven traffic distribution, potentially overwhelming backends during window transitions.[20][21]
The sliding window method refines this by using a continuously moving time frame, such as the last 60 seconds, to count requests more accurately and avoid fixed boundary issues. It tracks individual request timestamps within the window, evicting those older than the window's start (current time minus duration) before checking the total against L. For efficiency, exact tracking of all timestamps can be memory-intensive, so approximations combine multiple fixed windows or leverage data structures like Redis sorted sets: timestamps are added as scores in a sorted set per client, old entries are removed via range queries (e.g., ZREMRANGEBYSCORE for scores below t - W), and the count (ZCOUNT within the window) determines allowance. If the count exceeds L, the request is denied; otherwise, the timestamp is added. This approach ensures no more than L requests in any sliding interval of length W. In Kong's Rate Limiting Advanced plugin, sliding windows dynamically incorporate prior data for smoother enforcement across multiple window sizes.[22][23][24]
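A sketch of the sorted-set approach using the redis-py client follows; the key naming and limits are illustrative, and in production the eviction, count, and insertion steps would typically be wrapped in a Lua script or transaction so they execute atomically:
python
import time

import redis

WINDOW = 60        # W: sliding window length in seconds
LIMIT = 100        # L: maximum requests allowed in any window

r = redis.Redis()  # assumes a reachable Redis instance

def allow(client_id: str) -> bool:
    key = f"ratelimit:{client_id}"
    now = time.time()
    pipe = r.pipeline()
    # Evict timestamps that fell out of the window (ZREMRANGEBYSCORE below t - W)
    pipe.zremrangebyscore(key, 0, now - WINDOW)
    # Count the requests remaining inside the window
    pipe.zcount(key, now - WINDOW, "+inf")
    _, count = pipe.execute()
    if count >= LIMIT:
        return False                 # over the limit: deny
    # Record this request; the member should be unique in practice (e.g., a request ID)
    r.zadd(key, {f"{now}": now})
    r.expire(key, WINDOW)            # let idle keys expire automatically
    return True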
Sliding window methods provide higher precision for burst control and fairness, with low average memory use when using approximations like multi-fixed windows (e.g., 10 one-second sub-windows for a 10-second limit), enabling scalability in distributed systems. They are common for real-time applications requiring strict per-second accuracy without the predictability loss of fixed windows. Drawbacks include increased computational cost for timestamp management and eviction, especially in high-throughput scenarios, and higher storage needs for log-based variants compared to fixed counters.[22][20]
Implementations
Software Implementations
Software implementations of rate limiting have evolved significantly since the late 1990s, beginning with early Unix tools such as iptables, which introduced the limit module for basic packet rate control in Linux kernels around 2001 to mitigate denial-of-service threats.[25] By the late 2000s, web servers such as NGINX had adopted more sophisticated middleware, with the ngx_http_limit_req_module, introduced in version 0.7.21, implementing a leaky bucket algorithm to limit HTTP request rates per key, such as IP address, using shared memory zones.[26] In modern cloud-native environments post-2017, service meshes like Istio leverage Envoy proxies for both local (per-instance) and global rate limiting, enabling dynamic traffic control across microservices without altering application code.[27]
Common software approaches rely on in-memory counters for simple, single-instance setups, where request counts are tracked locally using data structures like atomic integers to enforce limits efficiently.[28] For distributed systems, Redis serves as a popular shared storage backend, implementing sliding window or token bucket algorithms via atomic operations like INCR and EXPIRE to synchronize counters across nodes and prevent race conditions.[28] Middleware solutions, such as NGINX's limit_req module, provide out-of-the-box integration by defining zones (e.g., 10MB shared memory) and rates (e.g., 1 request/second with burst=5), delaying or rejecting excess requests with HTTP 503 responses.[26]
Distributed rate limiting introduces challenges like ensuring consistent counters across multiple instances, where local in-memory tracking can lead to per-node limits that exceed global quotas if not synchronized.[29] To address this, shared storage solutions such as Redis or etcd are used for centralized state management; for instance, Redis employs Lua scripts for atomic decrements, while etcd provides distributed locking for peer coordination in systems like Gubernator.[28][30] Databases like PostgreSQL can also serve as backends but introduce higher latency compared to in-memory options.[29]
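As an illustration of the shared-counter pattern described above, a minimal redis-py sketch combines INCR and EXPIRE on a per-window key; the key naming and limits are illustrative, and a Lua script would make the check-and-increment fully atomic:
python
import time

import redis

WINDOW = 60        # seconds per window
LIMIT = 1000       # requests allowed per window, enforced globally

r = redis.Redis()  # shared store visible to every application instance

def allow(client_id: str) -> bool:
    # One counter per client per fixed window, shared by all nodes
    key = f"rl:{client_id}:{int(time.time() // WINDOW)}"
    count = r.incr(key)            # atomic increment across instances
    if count == 1:
        r.expire(key, 2 * WINDOW)  # first hit sets an expiry so stale windows vanish
    return count <= LIMIT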
Programming languages offer dedicated libraries for seamless integration. In Python, Flask-Limiter extends Flask applications using the underlying limits library, which supports multiple strategies (such as fixed and moving windows) and Redis-backed storage for distributed environments.[31] A basic integration example limits routes by IP:
python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

limiter = Limiter(
    key_func=get_remote_address,
    app=app,
    storage_uri="redis://localhost:6379",  # shared Redis backend for distributed use
    default_limits=["200 per day", "50 per hour"],
)

@app.route("/api/resource")
@limiter.limit("5 per minute")  # per-client limit of 5 requests per minute on this route
def resource():
    return "Access granted"
This configuration tracks requests in Redis, rejecting excess with HTTP 429.[31]
In Java, Resilience4j provides a RateLimiter module that divides time into configurable cycles (e.g., 1ms refresh period with 10 permissions per cycle), using semaphores or atomic references for thread-safe enforcement.[32] A simple example decorates a service call:
java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

RateLimiterConfig config = RateLimiterConfig.custom()
        .limitRefreshPeriod(Duration.ofMillis(1000))  // new permits become available every second
        .limitForPeriod(10)                           // 10 permits per refresh period
        .timeoutDuration(Duration.ofMillis(500))      // wait up to 500 ms for a permit
        .build();

RateLimiter rateLimiter = RateLimiter.of("backendService", config);
Supplier<String> restrictedSupplier = RateLimiter.decorateSupplier(rateLimiter, () -> "Success");
String result = Try.ofSupplier(restrictedSupplier).get();
This allows up to 10 calls per second, blocking excess for up to 500ms.[32]
Best practices distinguish between per-user limits, which target individual abuse (e.g., 100 requests/hour per API key) to maintain fairness, and global limits, which cap total system load (e.g., 10,000 requests/minute across all users) to ensure stability.[33][34] For graceful degradation, implementations should return HTTP 429 "Too Many Requests" status codes with Retry-After headers indicating wait times, allowing clients to back off exponentially without abrupt failures.[35] Limits should be configurable per endpoint, with monitoring to adjust dynamically based on load, prioritizing low-cost operations over resource-intensive ones.[36]
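As a sketch of this graceful-degradation guidance, a plain Flask error handler can attach a Retry-After header to 429 responses; the one-minute value and response body are illustrative and independent of any particular limiter library:
python
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(429)
def too_many_requests(error):
    # Attach Retry-After so well-behaved clients know when to try again
    response = jsonify(error="Too Many Requests")
    response.status_code = 429
    response.headers["Retry-After"] = "60"   # seconds; value is illustrative
    return response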
Hardware Implementations
Hardware implementations of rate limiting primarily rely on dedicated appliances and application-specific integrated circuits (ASICs) to enforce traffic controls at high speeds. Dedicated appliances, such as F5 BIG-IP systems, utilize firmware-based mechanisms to perform rate shaping, limiting ingress traffic rates to mitigate volumetric attacks like DDoS without significant processing delays.[37] Similarly, Cisco Adaptive Security Appliances (ASA) and routers implement Quality of Service (QoS) features, including policing and shaping, to regulate bandwidth on interfaces, ensuring compliant traffic adheres to specified rates while excess is dropped or queued.[38] These appliances have been integral to enterprise firewalls since the early 2000s, when ASIC advancements enabled mainstream adoption for performance-critical environments.[39]
In network switches and routers, ASICs facilitate line-rate enforcement through mechanisms like access control lists (ACLs) combined with policers. For instance, Cisco Nexus switches apply rate limiters per ASIC to control egress traffic, preventing congestion without involving the CPU for packet processing.[40] Many vendors, including HPE and Juniper, embed token bucket algorithms in firmware to manage bursty traffic; tokens accumulate at a defined rate, allowing transmission only when sufficient tokens are available, thus maintaining gigabits-per-second throughput with minimal latency.[41][42] This hardware acceleration avoids CPU overhead, enabling sustained performance at scales like 10 Gbps or higher on enterprise edges.[43]
A notable application involves Border Gateway Protocol (BGP) Flowspec, as defined in RFC 5575, which propagates rate-limiting rules across ISP peering sessions to enforce policies such as capping traffic at 10 Gbps per IP prefix.[44] In ISP deployments, this mechanism allows rapid dissemination of flow specifications for DDoS mitigation, where upstream providers apply hardware-enforced limits on peered traffic to protect downstream networks, as recommended in industry best practices.[45] Such case studies demonstrate hardware's role in inter-domain agreements, ensuring compliance without software intervention at the core.
Despite these advantages, hardware rate limiting incurs higher upfront costs compared to software solutions and offers less flexibility for dynamic policy adjustments, often requiring firmware flashes for updates.[46] These limitations make hardware suitable for fixed, high-volume scenarios but challenging for rapidly evolving requirements.
Applications
In Web Services and APIs
In web services and content delivery networks (CDNs), rate limiting is employed to control the volume of HTTP requests from individual clients, typically identified by IP address, to prevent abuse such as web scraping or denial-of-service attacks. For instance, Cloudflare's Web Application Firewall (WAF) allows administrators to configure rules that track requests over periods ranging from 10 seconds to 1 hour, blocking or throttling traffic when thresholds are exceeded; an example rule permits a maximum of 100 requests in 10 minutes from a mobile app to specific endpoints, mitigating excessive automated access while allowing legitimate bursts. This approach helps maintain service availability by distributing load evenly and reducing the impact of malicious or high-volume scraping, which can otherwise overwhelm origin servers.[33][47]
API throttling extends these principles to programmatic interfaces, enforcing per-key or per-user limits to ensure fair resource allocation and protect backend systems. The Twitter API (now X API), launched in 2006 without initial restrictions, introduced mandatory authentication and rate limiting in its 1.1 version in 2012 to curb abuse from third-party applications, evolving further with tiered access models in 2017 that differentiated limits based on developer plans, such as 15 requests per 15-minute window for certain read endpoints in standard tiers. Similarly, Stripe's API, operational since around 2011, applies a default limit of 25 requests per second across endpoints, with higher allowances granted to accounts based on usage patterns and subscription tiers to accommodate enterprise-scale operations without uniform throttling. These mechanisms promote sustainable API usage by capping requests during peak loads, such as bursts in payment processing.[48][49][50][51]
Enforcement in web services often involves standardized HTTP responses and metadata headers to signal limits to clients. When a rate limit is exceeded, servers return a 429 Too Many Requests status code, indicating temporary overload and suggesting a retry delay via the Retry-After header. Complementary headers, such as X-RateLimit-Remaining, provide real-time quota information—for example, the number of remaining requests in the current window—allowing clients to adjust behavior proactively; this is commonly integrated with OAuth authentication for user-specific limits, where tokens carry individualized quotas tied to account permissions. In practice, services like GitHub apply primary rate limits of 5,000 requests per hour to OAuth tokens, scaling with user authentication levels to enforce granular control.[52][53][54][55]
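On the client side, these signals are typically consumed with a retry loop; the following sketch using the requests library (URL handling and retry counts are illustrative) backs off exponentially on 429 responses and honors Retry-After when present:
python
import time

import requests

def get_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    """Retry on HTTP 429, honoring Retry-After and otherwise backing off exponentially."""
    delay = 1.0
    response = requests.get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        # Prefer the server's hint; fall back to the exponential delay
        wait = float(response.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
        response = requests.get(url)
    return response

# X-RateLimit-Remaining can be inspected to slow down before the limit is hit:
# remaining = int(response.headers.get("X-RateLimit-Remaining", "1"))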
Challenges in these contexts include evasion tactics like proxies and VPNs, which obscure client identities and allow distributed request patterns to bypass IP-based limits, necessitating advanced detection such as behavioral analysis or ASN-level tracking. Adaptive limits address this by dynamically adjusting quotas based on user tiers—e.g., higher allowances for premium subscribers—or observed behavior, though tuning remains complex to avoid false positives for legitimate high-volume users. Rate limiting effectively mitigates bot traffic, which comprises 51% of overall web activity, with bad bots accounting for 37%.[56][57][58][59][60]
Emerging standards aim to standardize communication of these policies. The IETF's draft-ietf-httpapi-ratelimit-headers (version 10, as of September 2025) defines headers like RateLimit-Policy for declaring quotas (e.g., 100 requests over 60 seconds) and RateLimit for current status (e.g., remaining requests until reset), enabling consistent client-side handling across APIs and reducing trial-and-error throttling. This draft, on the Standards Track, builds on earlier proposals to foster interoperability in HTTP-based services.[61]
In Network Security and Data Centers
In network security, rate limiting plays a critical role in mitigating distributed denial-of-service (DDoS) attacks by constraining the volume of incoming traffic at key protocol layers. Firewalls often implement SYN flood limits to cap the rate of TCP SYN packets, preventing attackers from overwhelming connection tables with half-open sessions; this technique has been a standard defense since the early 2000s, allowing legitimate traffic to proceed while dropping excess SYN requests. Similarly, Border Gateway Protocol (BGP) rate limiting helps prevent route flapping, where unstable route advertisements propagate rapidly across networks, by damping or suppressing frequent updates; RFC 3882 outlines mechanisms like BGP communities for blackholing affected prefixes during DoS events, enhancing overall routing stability.[62]
In data centers, rate limiting supports efficient resource allocation and autoscaling to maintain performance under varying loads. Amazon Web Services (AWS) employs concurrency limits in Lambda functions, introduced with the service in 2014 and refined for per-function controls by 2017, to throttle invocations and prevent any single workload from monopolizing capacity across regions.[63] Google Cloud integrates rate limiting policies in its load balancers via Cloud Armor, enabling per-client throttling to distribute traffic evenly and protect backend services from overload. These mechanisms ensure multi-tenant isolation, such as in Kubernetes clusters where network policies, stabilized post-2017, enforce bandwidth limits between namespaces to prevent noisy neighbors from impacting shared infrastructure.[64]
Large-scale deployments exemplify rate limiting's impact in hyperscale environments. Akamai's Prolexic scrubbing centers, with over 20 Tbps of dedicated capacity across 36 global locations, apply per-prefix limits to filter DDoS traffic at terabit-per-second scales, as demonstrated in mitigating a 1.3 Tbps volumetric attack in 2024.[65] Integration with Security Information and Event Management (SIEM) tools further enhances this by feeding rate limit violation logs into anomaly detection systems, enabling real-time correlation of traffic spikes with potential threats.[66] Such practices have been shown to reduce outage risks in hyperscale data centers by limiting resource exhaustion during attacks, though exact reductions vary by implementation. Hardware accelerators, like those in network interface cards, provide core enforcement for these limits at line rates.
Emerging trends leverage artificial intelligence (AI) for dynamic rate limiting in 5G networks, deployed widely post-2020, where machine learning models adjust limits in real-time based on traffic patterns and slicing needs to optimize resource allocation without fixed thresholds.[67] This AI-driven approach supports ultra-reliable low-latency communications by predicting and preempting congestion in hybrid satellite-terrestrial setups.[68]