Billion laughs attack
The Billion Laughs attack, also known as an XML bomb or exponential entity expansion attack, is a type of denial-of-service (DoS) attack that exploits vulnerabilities in XML parsers by using recursively nested entity definitions within a document type definition (DTD) to trigger massive, exponential data expansion during parsing, often consuming excessive memory and CPU resources to crash the targeted system.[1][2]
This attack works by defining a series of entities in an inline DTD that reference each other in a nested manner, where each subsequent entity repeats the previous one multiple times, leading to rapid amplification of the parsed output.[3] For instance, an attacker might craft an XML document starting with a base entity like <!ENTITY lol "lol">, followed by <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"> (repeating 10 times), and continuing this pattern up to <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">, so that invoking &lol9; in the document body expands to a billion instances of "lol", transforming a small input file (under 1 KB) into gigabytes of data.[1][3] The name "Billion Laughs" derives from this explosive repetition of a humorous string like "lol", though variations can use any repeating content and have been adapted to other data formats, such as YAML, that support similar reference expansion.[2][4]
First reported in 2002 and cataloged by MITRE as CWE-776 in 2009, the attack highlights longstanding issues in XML processing libraries across platforms, including Microsoft's .NET Framework, Java's DOM parsers, and Python's xml modules, where default configurations have often enabled DTD processing on untrusted inputs.[3] Its consequences include application crashes, service disruptions, and denial of service for entire servers; security tools commonly classify it as medium risk because it is easy to execute and applies to any system that parses external XML.[2] To mitigate it, developers should disable DTD processing entirely (e.g., DtdProcessing = DtdProcessing.Prohibit, or the older ProhibitDtd = true, in .NET, or the disallow-doctype-decl feature in Java), limit entity expansion depth and character counts, and use secure parsing libraries like Python's defusedxml or updated frameworks such as .NET 4.5.2 and later, which prohibit such expansions by default.[5][3]
Introduction and Background
Definition and Overview
The Billion Laughs attack, also known as the XML bomb or exponential entity expansion (XEE) attack, is a type of denial-of-service (DoS) exploit that targets XML parsers by leveraging nested entity definitions to trigger exponential growth in the processed document size during parsing.[6][3] This attack exploits the way XML parsers resolve and expand entities as specified in the XML 1.0 standard, where a small input file can expand to consume vast amounts of memory and computational resources.[7][2]
In XML, entities serve as placeholders for reusable content and are declared within a Document Type Definition (DTD), which provides a schema for validating document structure. Internal entities are defined entirely within the DTD subset of the document itself, while external entities reference content from outside sources, such as files or URLs; parsers replace entity references with their resolved values during processing, potentially leading to recursive expansions if entities are nested.[7][8] The DTD, enclosed in a DOCTYPE declaration, enables these entity definitions and is a core mechanism for XML modularity, but it introduces risks when parsers fully expand entities without limits.[7]
The primary objective of the Billion Laughs attack is to exhaust a target's resources with a compact payload, often under a few kilobytes, that expands to gigabytes or more in memory, causing parser crashes, application slowdowns, or a complete denial of service.[6][9] This makes it particularly effective against services that parse untrusted XML inputs, such as web APIs or document processors.[3]
As a form of algorithmic complexity attack, the Billion Laughs attack exploits the worst-case growth of entity resolution, which is exponential in the nesting depth for the classic payload and quadratic for some variants. This aligns it with broader resource exhaustion techniques that manipulate data structures to trigger disproportionate computational demands.[10][11]
History and Discovery
The Billion Laughs attack, a form of denial-of-service (DoS) exploit targeting XML parsers, was first publicly discussed in late December 2002, with security researcher Amit Klein of Sanctum Inc. disclosing details on December 17, 2002.[12] Klein demonstrated how malicious Document Type Definitions (DTDs) in XML documents could trigger recursive entity references, leading to exponential expansion and unlimited resource consumption in vulnerable parsers, ultimately causing crashes or severe performance degradation. This discovery highlighted a critical flaw in XML processing across multiple vendors' implementations, including parsers for SOAP and web services servers, marking the initial identification of entity expansion as a potent DoS vector.[13]
In early discussions, the vulnerability was commonly termed an "XML bomb" due to its explosive impact on system resources. The more evocative name "Billion Laughs" gained popularity in the mid-2000s, originating from proof-of-concept payloads that defined nested entities like &lol;, &lolz;, and others, resulting in billions of repeated "lol" strings upon expansion and evoking cascading laughter. This terminology underscored the attack's deceptive simplicity and humorous yet destructive nature, while Klein's work shifted focus from general parser overloads to targeted entity manipulation.[13]
Prior to 2002, general DoS attacks on web applications were documented as early as 1992, typically involving large or malformed inputs to exhaust resources, but XML-specific exploits emerged with the adoption of XML following its standardization in 1998. Post-disclosure, awareness evolved rapidly, with entity-specific exploits becoming a focal point in XML security research. The vulnerability was formally cataloged by MITRE as CWE-776 ("Improper Restriction of Recursive Entity References in DTDs") in 2009, establishing it as a canonical DoS technique in XML processing and prompting updates to security guidelines in standards bodies, though core XML 1.0 specifications saw clarifications on entity handling in subsequent editions without mandating restrictions.[3][13]
Mechanism
How the Attack Works
The Billion Laughs attack targets XML parsers that enable Document Type Definition (DTD) processing and entity expansion by default, as these features allow the definition and recursive resolution of internal entities within the XML document.[3][6]
The attack begins with the inclusion of an inline DTD in the XML document, where a series of nested general entities is defined to create chained references. A base entity, such as lol, is first defined with a short string value like "lol"; each subsequent entity then references the previous one a fixed number of times, building a chain of dependencies several levels deep. When the parser encounters a reference to the top-level entity in the document body, it begins recursive resolution: expanding that entity requires expanding each of its references, which in turn triggers further expansions down the chain, ultimately replicating the base string an enormous number of times. This process exploits the parser's obligation to fully resolve all entities before constructing the document object model (DOM) or output stream.[1][3][6]
The growth follows an exponential model: the number of copies of the base string is b^n, where b is the per-level fan-out (commonly 10, i.e., 10 references per level) and n is the number of nesting levels, so the expanded size is roughly s·b^n for a base string of s characters. For example, with 8 levels and a fan-out of 10, resolution yields 10^8 (100 million) instances of the base string, transforming an input document of under 1 KB into about 300 MB of text. This amplification occurs because each level multiplies the preceding level's output; nothing in the entity mechanism itself bounds it, only the parser's configuration or the system's resources.[3][1][6]
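The b^n model can be checked without materializing any text. The sketch below uses a hypothetical helper, expanded_length, that memoizes the expanded length of each entity, so it runs in time proportional to the size of the DTD rather than the size of the output. It assumes only internal general entities, given as a Python dict mapping entity name to replacement text, with no cycles.

```python
import re
from functools import lru_cache

# Matches a general entity reference such as "&lol8;".
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expanded_length(entities: dict[str, str], name: str) -> int:
    """Length of entity `name` after full expansion (assumes no cycles)."""

    @lru_cache(maxsize=None)
    def length_of(entity: str) -> int:
        value = entities[entity]
        total = 0
        pos = 0
        for match in ENTITY_REF.finditer(value):
            total += match.start() - pos          # literal text before the reference
            total += length_of(match.group(1))    # expanded length of the reference
            pos = match.end()
        return total + len(value) - pos           # trailing literal text

    return length_of(name)


# The Code Example payload: "lol", then lol2 ... lol9, each referencing
# the previous entity ten times (b = 10, n = 8, s = 3).
entities = {"lol": "lol", "lol2": "&lol;" * 10}
for level in range(3, 10):
    entities[f"lol{level}"] = f"&lol{level - 1};" * 10

print(expanded_length(entities, "lol9"))  # 300000000
```

Only the length is cheap to compute this way; a parser that actually substitutes text cannot avoid the exponential cost, because the output itself contains 300 million characters.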
During expansion, the parser allocates memory for the increasingly large intermediate strings at each recursion level, leading to rapid exhaustion of available RAM and potential invocation of costly virtual memory swapping. Additionally, the recursive substitutions demand significant CPU cycles for string concatenation and entity lookup, while poorly implemented parsers may encounter stack overflows from deep recursion depths, culminating in a denial-of-service condition through resource depletion.[3][1][6][2]
Code Example
The following XML document provides a practical example of a Billion Laughs attack payload, utilizing an internal DTD to define nested general entities that enable exponential expansion during parsing.[1]
```xml
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
```
This payload begins with the base entity lol, defined as the string "lol", and builds eight levels of nesting (lol2 through lol9), where each higher-level entity references the previous one ten times. When a vulnerable XML parser resolves the reference to &lol9; in the root element, it triggers recursive expansion, producing 10^8 (100 million) instances of "lol". The original document is under 1 KB (about 700 bytes as written above), but the fully expanded output reaches 300 million characters, equivalent to about 300 MB of memory assuming one byte per character, as with ASCII text in UTF-8.[3][1] Vulnerable parsers, including older versions of libxml2 that lacked adequate expansion limits, exhibit denial-of-service effects such as excessive CPU usage, memory exhaustion, or outright crashes during entity resolution.[14]
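To make the size contrast concrete, the following sketch rebuilds the payload above programmatically and compares its byte size with the expanded size predicted by the closed-form count. build_payload is a hypothetical helper written for this illustration; nothing here parses the document.

```python
def build_payload(levels: int = 8, fanout: int = 10) -> str:
    """Rebuild the nested-entity document shown above (hypothetical helper).

    The base entity is `lol`; the levels are named lol2 ... lol{levels + 1},
    each referencing the previous entity `fanout` times.
    """
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", '<!ENTITY lol "lol">']
    previous = "lol"
    for level in range(2, levels + 2):
        name = f"lol{level}"
        refs = f"&{previous};" * fanout
        lines.append(f'<!ENTITY {name} "{refs}">')
        previous = name
    lines += ["]>", f"<lolz>&{previous};</lolz>"]
    return "\n".join(lines)


payload = build_payload()
input_bytes = len(payload.encode("utf-8"))
# Closed form instead of parsing: len("lol") * fanout ** levels.
expanded_chars = len("lol") * 10 ** 8

print(f"input: {input_bytes} bytes, expanded: {expanded_chars:,} characters")
print(f"amplification: about {expanded_chars // input_bytes:,}x")
```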
The expansion can be visualized through the following table, which outlines each entity level, its references, the multiplier introduced at that level, and the approximate cumulative character count after expansion (based on 3 characters per "lol" instance):
| Level | Entity | References | Multiplier (per level) | Cumulative Multiplier | Approximate Size (characters) |
|---|---|---|---|---|---|
| 0 | lol | "lol" | 1 | 1 | 3 |
| 1 | lol2 | 10 × lol | 10 | 10 | 30 |
| 2 | lol3 | 10 × lol2 | 10 | 10² | 300 |
| 3 | lol4 | 10 × lol3 | 10 | 10³ | 3,000 |
| 4 | lol5 | 10 × lol4 | 10 | 10⁴ | 30,000 |
| 5 | lol6 | 10 × lol5 | 10 | 10⁵ | 300,000 |
| 6 | lol7 | 10 × lol6 | 10 | 10⁶ | 3,000,000 |
| 7 | lol8 | 10 × lol7 | 10 | 10⁷ | 30,000,000 |
| 8 | lol9 | 10 × lol8 | 10 | 10⁸ | 300,000,000 |
To test this payload safely, operate within an isolated environment, such as a virtual machine or container, to prevent resource exhaustion on the primary system. The xmllint utility from the libxml2 library is a common reproduction tool; invoking it with entity substitution enabled (e.g., xmllint --noent example.xml) against a vulnerable libxml2 build demonstrates the attack's effects through high memory consumption or process termination, whereas current releases detect the excessive amplification and abort with an error.[15][3]
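As a lower-risk alternative to running the full payload, the mechanism can be observed with a scaled-down document. This sketch assumes Python's standard xml.etree.ElementTree, whose underlying Expat parser expands internal entities; with only three levels of tenfold nesting, the expansion is a harmless 3,000 characters. Expat releases that include billion-laughs protection apply it only once output passes an activation threshold, so a document this small is expected to parse normally.

```python
import xml.etree.ElementTree as ET

# Three levels of tenfold nesting: small enough to expand harmlessly.
SMALL_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>"""

root = ET.fromstring(SMALL_PAYLOAD)
# The root element's text is "lol" repeated 10**3 times.
print(len(SMALL_PAYLOAD), "input characters ->", len(root.text), "expanded characters")
```

Each additional level multiplies the expanded text by ten while adding only about 77 bytes of input, which is why the full eight-level version should only be run in isolation.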
Exponential Entity Expansion Variants
While the classic billion laughs attack relies on deeply nested internal entities to achieve exponential growth in output size, variants of exponential entity expansion modify this approach to exploit different parser behaviors, often targeting quadratic resource consumption or stack limits within XML processing. These adaptations emerged as parsers began implementing basic safeguards against nested recursion, prompting attackers to find alternative paths for denial-of-service effects.[16][3]
One prominent variant is the quadratic blowup attack, which avoids deep nesting by defining a single large internal entity, typically a long string of repeated characters such as 50,000 instances of a base unit, and then referencing it many times (e.g., 50,000 references) in the document body or attributes. Because an entity of n characters referenced n times expands to n² characters, growth is quadratic (O(n²)) in the input size: a compact document of around 200 KB can balloon to about 2.5 GB during parsing, exhausting memory without triggering limits on entity nesting depth. Discovered by security researcher Amit Klein, this technique bypasses defenses focused solely on nested entities, because the expansion comes from repeated substitution rather than recursion.[3][16]
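The quadratic arithmetic can be worked through directly. The sketch below is a back-of-the-envelope model, not a parser: it assumes a single hypothetical entity named a of n characters, referenced n times as &a; (three characters each), and ignores DOCTYPE and tag overhead.

```python
def quadratic_blowup_sizes(n: int, ref: str = "&a;") -> tuple[int, int]:
    """Approximate (input, output) character counts for an n-char entity used n times.

    The input holds the entity value once plus n short references, so it grows
    linearly in n; the output holds n copies of n characters, so it grows as n**2.
    """
    input_chars = n + n * len(ref)
    output_chars = n * n
    return input_chars, output_chars


inp, out = quadratic_blowup_sizes(50_000)
print(inp, out)  # 200000 2500000000
```

For n = 50,000 this reproduces the figures above: roughly 200 KB in, 2.5 billion characters out. Doubling n doubles the input but quadruples the output.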
Deep recursion variants shift the attack vector toward CPU and stack exhaustion by creating long linear chains of entities rather than exponential fan-out. For instance, a chain of hundreds or thousands of sequentially defined entities (e.g., &e1; expands to &e2;, &e2; to &e3;, and so on) can force the parser into excessive recursive calls, leading to stack overflows in implementations that do not enforce strict depth limits. The XML specification forbids circular (recursive) entity references but sets no bound on the length of non-cyclic chains, so such chains remain well-formed and viable for resource denial; mutual recursion (e.g., &a; referencing &b;, and &b; referencing &a;) can additionally induce infinite loops in non-conforming parsers that fail to detect the cycle. These attacks are particularly effective against mobile or embedded XML processors with limited stack space.[16]
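Since conforming parsers must reject cycles anyway, a depth check is the natural complement for long non-cyclic chains. The following sketch is a hypothetical pre-parse check (max_reference_depth and EntityGraphError are illustrative names) that measures the longest reference chain in a table of entity declarations, rejects cycles and undeclared entities, and enforces an assumed depth limit of 64.

```python
import re

# General entity references such as "&e1;"; character references like "&#65;" do not match.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


class EntityGraphError(ValueError):
    """Raised for circular, undeclared, or overly deep entity references."""


def _refs(value: str) -> list[str]:
    return [r for r in ENTITY_REF.findall(value) if r not in PREDEFINED]


def max_reference_depth(entities: dict[str, str], limit: int = 64) -> int:
    """Return the longest reference chain, rejecting cycles and chains longer than `limit`.

    Uses an explicit stack rather than recursion, so a hostile chain of thousands
    of entities cannot overflow the checker's own call stack.
    """
    depth: dict[str, int] = {}  # entity name -> length of its longest chain
    for root in entities:
        if root in depth:
            continue
        stack = [(root, iter(_refs(entities[root])))]
        on_path = {root}
        while stack:
            name, children = stack[-1]
            child = next(children, None)
            if child is None:  # every reference of `name` has been measured
                stack.pop()
                on_path.discard(name)
                depth[name] = 1 + max((depth[r] for r in _refs(entities[name])), default=0)
                if depth[name] > limit:
                    raise EntityGraphError(f"chain through {name!r} exceeds {limit} levels")
            elif child in on_path:
                raise EntityGraphError(f"circular reference to {child!r}")
            elif child not in depth:
                if child not in entities:
                    raise EntityGraphError(f"undeclared entity {child!r}")
                on_path.add(child)
                stack.append((child, iter(_refs(entities[child]))))
    return max(depth.values(), default=0)


# A linear chain e1 -> e2 -> ... -> e1000 is well-formed but 1,000 levels deep.
chain = {f"e{i}": f"&e{i + 1};" for i in range(1, 1000)}
chain["e1000"] = "end"
print(max_reference_depth(chain, limit=2000))  # 1000
```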
Mixed internal and external entity expansions combine internal general entities with external parameter entities to amplify effects, often by embedding references to remote or large external DTDs within internal expansions. An attacker might define an internal entity that incorporates an external parameter entity (e.g., %ext; pulling in a large remote subset), then nest or repeat this hybrid within the document, leading to compounded growth as the parser fetches and substitutes external content during internal resolution. This variant exploits parsers that process both entity types simultaneously, increasing the attack surface beyond purely internal mechanisms.[16]
The evolution of XML standards and parsers has directly responded to these variants, with key implementations like Apache Xerces and .NET's XmlReader introducing configurable limits to curb expansion. Early XML 1.0 specifications (1998) lacked explicit resource bounds, allowing unchecked growth, but post-2002 attacks prompted additions such as Xerces-J's entityExpansionLimit property (defaulting to 100,000 expansions since version 2.6.2 in 2005) to cap total substitutions and prevent quadratic or deep recursion blowups. Similarly, .NET's XmlReader, vulnerable in versions 1.x to attribute-based expansions, gained the MaxCharactersFromEntities setting in .NET 2.0 (2005), whose default of 0 means unlimited so that limits must be configured explicitly, and later DtdProcessing modes that prohibit or limit entity handling, reflecting a shift toward secure-by-default parsing in enterprise environments. These measures, informed by analyses of attack patterns, have reduced vulnerability in compliant parsers while highlighting ongoing gaps in legacy or non-standard implementations. CVEs disclosed in 2024 and 2025, such as CVE-2024-1455 in the LangChain library and CVE-2025-3225 in sitemap parsers, illustrate the continued exploitation of these variants in modern software.[16][17][18][19][20]
In YAML parsers, the billion laughs attack is replicated through anchors (&) and aliases (*). An anchor labels a node and each alias refers back to it; by defining each anchored node as a list of aliases to the previous one, an attacker builds nested references that grow exponentially when resolved. A small YAML file with 10 levels of tenfold alias nesting, for example, corresponds to 10^10 repeated elements, enough to overwhelm libraries like PyYAML and cause denial of service in loaders that copy aliased content or in applications that later traverse or serialize the loaded structure.[21]
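The gap between a compact in-memory graph and an exponential traversal can be illustrated without a YAML library. The sketch below assumes a loader that resolves each alias to the object already built for its anchor (PyYAML behaves this way); the helper names are hypothetical. The structure stays small in memory, but any code that walks it naively visits fanout^levels leaves.

```python
def build_alias_tree(levels: int, fanout: int = 10) -> list:
    """Mimic anchors and aliases such as  a: &a ["lol"]  and  b: &b [*a, *a, ...].

    Each level holds `fanout` references to the SAME list object, as a loader
    does when it resolves every alias to the node constructed for its anchor.
    """
    node: list = ["lol"]
    for _ in range(levels):
        node = [node] * fanout
    return node


def count_leaves(node: object) -> int:
    """Naive recursive walk, as a serializer or validator might perform."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


def count_distinct_lists(node: object) -> int:
    """Count distinct list objects, i.e. the structure actually held in memory."""
    seen: set = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, list) and id(current) not in seen:
            seen.add(id(current))
            stack.extend(current)
    return len(seen)


tree = build_alias_tree(5)
print(count_distinct_lists(tree))  # 6
print(count_leaves(tree))          # 100000
```

Six list objects produce a hundred thousand leaves for a naive walk; each further level multiplies the walk by ten while adding one object.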
SVG files, being XML-based, allow the embedding of malicious entity definitions that trigger the classic billion laughs expansion when parsed by image processors or thumbnail generators. For instance, an SVG with nested entities like <!ENTITY lol1 "lol"> followed by progressively larger references can force exponential growth during rendering, consuming gigabytes of memory. Similarly, XMP metadata in formats like PDF or JPEG can carry these XML bombs, exploiting metadata extraction tools in workflows that handle diverse media.[22]
Although JSON lacks native entity support, it is subject to an analogous denial-of-service pattern: deeply nested structures that exhaust stack space during parsing. In Java's Jackson library, for example, deeply nested objects could trigger a StackOverflowError during deserialization, crashing the application through resource depletion, as documented in CVE-2020-36518. Such cases show that formats without true entities can suffer comparable resource exhaustion, although the cost grows with nesting depth rather than exponentially.
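One common mitigation is to bound nesting depth before recursive descent begins. The sketch below is a hypothetical pre-check (max_nesting_depth and loads_with_depth_limit are illustrative names, and the limit of 100 is an assumption): it scans the text once, ignoring brackets inside string literals, and only then calls the standard json.loads. The scan does not validate JSON; json.loads still does that.

```python
import json


def max_nesting_depth(text: str) -> int:
    """Deepest [ or { nesting in a JSON text, ignoring brackets inside strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def loads_with_depth_limit(text: str, max_depth: int = 100):
    """Reject deeply nested input before handing it to the recursive parser."""
    if max_nesting_depth(text) > max_depth:
        raise ValueError(f"JSON nesting exceeds {max_depth} levels")
    return json.loads(text)


print(loads_with_depth_limit('{"a": [1, 2]}'))  # {'a': [1, 2]}
try:
    loads_with_depth_limit("[" * 10_000 + "]" * 10_000)
except ValueError as err:
    print(err)  # JSON nesting exceeds 100 levels
```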
Cross-format vulnerabilities arise in file upload handlers and content management systems that process embedded markup from multiple sources, amplifying the attack surface. Systems like MediaWiki, for instance, can be targeted through uploaded images containing SVG or XMP payloads, where automated thumbnail generation or metadata parsing resolves the expansions, potentially denying service to the entire platform.[22]
Impact and Real-World Examples
Effects on Systems
The Billion Laughs attack induces severe resource exhaustion in vulnerable XML parsers by exploiting recursive entity expansion: with nine levels of tenfold nesting, a compact input document under 1 KB balloons to around 3 GB in memory during processing. This exponential growth occurs as the parser resolves nested entities, consuming vast amounts of RAM and often triggering out-of-memory (OOM) conditions in which the operating system's OOM killer or the runtime terminates the process to prevent total system failure.[3][23]
Concurrently, the attack drives CPU utilization to near 100% as the parser iteratively expands entities, halting normal operations and rendering the affected application unresponsive for extended periods. In environments like web services or API endpoints that ingest XML payloads, this results in widespread denial of service (DoS), where legitimate requests queue indefinitely or fail outright due to resource contention.[14][24]
Parser implementations respond to the attack in different ways. For instance, libxml2 versions prior to 2.9.2 did not block entity expansion even when entity substitution was disabled (CVE-2014-3660), leading to excessive CPU consumption and potential application crashes. Similarly, older XML processors in desktop applications or services may hang or terminate abruptly, amplifying the DoS impact across connected systems.[24][14][23]
In microservices setups reliant on XML parsing, an attack on a single endpoint can propagate disruptions, as resource-starved services delay or fail responses to downstream components, creating cascading outages. Because each level multiplies the output, nine levels of tenfold nesting already produce a billion copies of the base string, overwhelming typical system limits and often requiring manual intervention, such as process restarts, for recovery.[3][23]
Notable Incidents
One of the earliest documented vulnerabilities related to the billion laughs attack occurred in 2002, when researcher Amit Klein demonstrated how recursive XML entity expansion could overwhelm parsers, leading to server crashes in systems using libraries like those in Apache Axis versions 1.x and early .NET Framework implementations.[23][25] These early incidents highlighted resource exhaustion risks in XML processing during the 2002-2010 period, affecting web services and applications reliant on unpatched parsers.[3]
In 2014, a coordinated vulnerability disclosure affected millions of WordPress and Drupal installations, where XML quadratic blowup attacks, an entity expansion variant, enabled denial of service through PHP's XML processor, impacting content management systems globally.[26] Similarly, in 2015, MediaWiki versions prior to 1.24.2 were susceptible to billion laughs attacks via SVG uploads and XMP metadata parsing under HHVM, as detailed in task T85848 and assigned CVE-2015-2942, potentially causing resource denial in wiki platforms.[22][27]
More recently, in 2024, the langchain-ai/langchain Python library (versions below 0.1.35) was found to be vulnerable through its XMLOutputParser, which allowed exponential entity expansion and could lead to denial of service in AI-driven applications, as documented in CVE-2024-1455.[19] Public details of exploitation against large cloud XML APIs, such as AWS services before mitigations were applied, remain scarce, but such vulnerabilities underscore ongoing risks in XML-dependent infrastructures.[5]
Reports of billion laughs and related XML entity expansion attacks have increased in the 2020s, driven by the proliferation of APIs and microservices. OWASP notes their persistence as a misconfiguration risk in web applications; XML External Entities (XXE), the related OWASP Top 10 category, has not been a standalone entry since 2021, when it was folded into Security Misconfiguration.[28][29]
Prevention and Mitigation
Parser Configuration
To mitigate the Billion Laughs attack, XML parsers can be configured to disable Document Type Definitions (DTDs), which prevents the declaration and expansion of internal entities used in the attack. In the Apache Xerces parser for Java, this is achieved by setting the feature http://apache.org/xml/features/disallow-doctype-decl to true on a DocumentBuilderFactory instance, so that any document containing a DOCTYPE declaration is rejected.[5] The libxml2 library has no single option that disables DTDs; instead, applications should avoid enabling the XML_PARSE_NOENT (entity substitution) and XML_PARSE_DTDLOAD (external DTD loading) options, for example when calling xmlReadMemory or xmlCtxtUseOptions, and should leave XML_PARSE_HUGE unset so that the library's built-in limits remain in force.[5]
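Python's standard library has no feature named disallow-doctype-decl, but the same policy can be sketched with the low-level xml.parsers.expat module by aborting as soon as the parser reports a DOCTYPE, before any entity in the internal subset is declared. parse_without_dtd and DTDForbidden are hypothetical names for this illustration; defusedxml's forbid_dtd=True option is a maintained alternative.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data: bytes) -> list[str]:
    """Parse `data`, refusing any DOCTYPE; returns element names in document order."""
    parser = xml.parsers.expat.ParserCreate()
    elements: list[str] = []

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called when "<!DOCTYPE" is reached, before the internal subset's
        # entity declarations are processed; the exception aborts parsing.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    def start_element(name, attrs):
        elements.append(name)

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return elements


print(parse_without_dtd(b"<order><item/></order>"))  # ['order', 'item']
try:
    parse_without_dtd(b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>')
except DTDForbidden as err:
    print(err)  # DOCTYPE 'lolz' is not allowed
```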
Entity expansion limits provide an additional layer of protection by capping the number of expansions, halting processing if the threshold is exceeded. In Java's JAXP implementation, the system property jdk.xml.entityExpansionLimit can be set to a low value like 1024 (e.g., via -Djdk.xml.entityExpansionLimit=1024 at JVM startup) to limit the total number of entity expansions performed while parsing a document.[30] For .NET's XmlReader, the XmlReaderSettings.MaxCharactersFromEntities property caps the number of characters that entity expansion may produce; setting it to 1024000, for example, bounds the output of entity resolution.[31]
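The budgeting idea behind settings such as jdk.xml.entityExpansionLimit and MaxCharactersFromEntities can be sketched as a toy expander with two counters, one for expansions and one for characters produced, that aborts as soon as either budget is exceeded. This is a hypothetical illustration of the technique, not how either platform implements it; expand_entity, ExpansionLimitError, and the default limits are assumptions.

```python
import re

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


class ExpansionLimitError(Exception):
    """Raised when entity expansion exceeds a configured budget."""


def expand_entity(entities: dict[str, str], name: str,
                  max_expansions: int = 10_000, max_chars: int = 1_000_000) -> str:
    """Fully expand entity `name`, enforcing expansion-count and output-size budgets."""
    expansions = 0
    chars = 0
    pieces: list[str] = []

    def emit(text: str) -> None:
        nonlocal chars
        chars += len(text)
        if chars > max_chars:
            raise ExpansionLimitError(f"expanded text exceeds {max_chars} characters")
        pieces.append(text)

    def expand(entity: str, active: frozenset) -> None:
        nonlocal expansions
        if entity in active:
            raise ExpansionLimitError(f"recursive reference to {entity!r}")
        expansions += 1
        if expansions > max_expansions:
            raise ExpansionLimitError(f"more than {max_expansions} entity expansions")
        value = entities[entity]
        pos = 0
        for match in ENTITY_REF.finditer(value):
            emit(value[pos:match.start()])          # literal text before the reference
            expand(match.group(1), active | {entity})
            pos = match.end()
        emit(value[pos:])                           # trailing literal text

    expand(name, frozenset())
    return "".join(pieces)


small = {"lol": "lol", "lol2": "&lol;" * 10, "lol3": "&lol2;" * 10}
print(len(expand_entity(small, "lol3")))  # 300
```

Checking the budget during expansion, rather than afterward, is the point: the eight-level table from the Code Example trips the expansion counter after about ten thousand substitutions instead of first producing 300 MB of text.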
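Secure parsing modes further enhance resilience by disabling external entity resolution and activating built-in resource limits. In Java, enabling secure processing with factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) on DocumentBuilderFactory activates built-in safeguards against excessive resource use. A custom EntityResolver can block external fetches, but its resolveEntity method must return an empty InputSource (for example, new InputSource(new StringReader(""))) rather than null, because returning null tells the parser to open the system identifier itself.[32] Non-validating parsers are not required to read external DTDs, which narrows the attack surface, but they still expand entities declared in the internal subset, so non-validating mode alone does not stop a Billion Laughs payload.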
Library-specific configurations tailor these protections to common tools. Python's xml.etree.ElementTree does not expand external entities, and the Expat library underlying Python's XML modules includes protection against billion laughs and quadratic blowup from version 2.4.1 onward; for added safety, the defusedxml package rejects entity declarations by default (forbid_entities=True) and can reject any DTD with forbid_dtd=True.[33] Go's encoding/xml package does not process entity declarations by design, inherently preventing expansion attacks without additional configuration.[34]
Best Practices
To prevent billion laughs attacks, organizations should implement robust input validation for XML processing. Sanitizing XML inputs by removing or prohibiting Document Type Definitions (DTDs) is a fundamental step, as DTDs enable the entity expansions that fuel the attack.[5] Additionally, employing schema validation without entity support ensures that only expected structures are accepted.[13] Imposing strict length limits on incoming XML payloads caps the data volume before parsing begins, which helps against bulky variants such as quadratic blowup, but cannot stop a classic Billion Laughs document on its own, since that payload is typically under 1 KB.[3]
Architectural defenses play a critical role in broader system resilience. Where feasible, applications should avoid XML altogether in favor of simpler formats like JSON for data exchange, reducing exposure to parser-specific vulnerabilities.[5] Isolating XML parsers within sandboxes or containers limits the blast radius of any expansion attempt, containing resource spikes to non-critical environments.[13] Rate limiting parse requests per user or IP address prevents flood-based exploitation, ensuring steady-state processing without overwhelming backend resources.[3]
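Per-client rate limiting of parse requests is commonly implemented with a token bucket. The sketch below is a generic, hypothetical illustration (the class names, capacity, refill rate, and client key are assumptions); the clock is injectable so the behavior can be demonstrated deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float]) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ParseRateLimiter:
    """Keeps one bucket per client key, such as a user ID or IP address."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self.buckets:
            self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client].allow()


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
limiter = ParseRateLimiter(capacity=2, rate=1.0, clock=lambda: now[0])
print([limiter.allow("203.0.113.7") for _ in range(3)])  # [True, True, False]
now[0] += 1.0
print(limiter.allow("203.0.113.7"))  # True
```

A token bucket permits short legitimate bursts while capping sustained throughput, which bounds how many expensive parses a single client can trigger in a given window.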
Effective monitoring and response strategies enhance detection and recovery. Logging parser resource usage, such as memory and CPU consumption during XML processing, allows for real-time anomaly detection and alerting on unusual spikes indicative of entity expansion.[5] Deploying web application firewalls (WAFs) with rules targeting entity patterns, such as those in ModSecurity's OWASP Core Rule Set, blocks suspicious XML payloads at the network edge. Regular updates to XML parsing libraries address known vulnerabilities that could enable attacks, maintaining a hardened posture against evolving threats.[13]
Adhering to established compliance frameworks and fostering developer education ensures long-term prevention. Following the OWASP XXE Prevention Cheat Sheet provides standardized guidelines for secure XML handling across the development lifecycle.[5] Training developers on XML security risks, including entity expansion pitfalls, promotes awareness and proactive coding practices. Regular audits for CWE-776 weaknesses in code and configurations identify and remediate gaps before deployment.[13]