eDonkey network
The eDonkey network, also known as eD2k, is a hybrid peer-to-peer file-sharing system that relies on distributed index servers for search and metadata coordination while facilitating direct data exchanges between clients, optimized for reliable distribution of large files via multi-source downloads and content-addressed hashing.[1][2] Developed by MetaMachine Inc. as the backend for its eDonkey2000 client software in the early 2000s, the network employed the ed2k protocol, which used MD4-based hash identifiers for files and supported serverless fallback via the integrated Overnet Kademlia distributed hash table for enhanced decentralization and resilience.[3][4] This architecture addressed limitations of earlier pure peer-to-peer systems like Gnutella by reducing search overhead through server indexing, enabling scalability to millions of users and terabytes of shared content.[1] The network rapidly expanded, incorporating features such as user credits for upload contributions, preview capabilities for media files, and compatibility with open-source clients like eMule, which extended the protocol with encryption and advanced search filters to improve privacy and efficiency.[5]
By the mid-2000s, it had become a dominant platform for file sharing, particularly for copyrighted music, movies, and software, though this prevalence of unauthorized distribution drew scrutiny from intellectual property enforcers.[1] In 2006, MetaMachine settled lawsuits from the Recording Industry Association of America (RIAA) by paying $30 million and agreeing to halt software distribution and disable its servers, aiming to dismantle the network's core infrastructure amid claims of inducing mass copyright infringement.[6][7] Despite these efforts, the eD2k protocol endured through community-maintained servers and clients, underscoring the challenges of fully eradicating decentralized systems reliant on voluntary participation rather than central control.[8]
Technical Architecture
Protocol Fundamentals
The eDonkey protocol operates as a hybrid peer-to-peer file-sharing system, combining decentralized client interactions with a distributed network of indexing servers. Clients connect to these servers primarily via TCP for tasks such as login, search queries, and source discovery, while file transfers occur directly between clients. Servers maintain indices of file hashes and associated client sources but never store or transfer file data themselves, which spreads load across many independent servers and avoids a single point of failure. This architecture, formalized around 2001–2002, supported millions of users by combining centralized metadata management with decentralized content distribution.[1][9]
File identification relies on a 128-bit MD4 hash derived from the entire file content, giving each file a content-based identifier independent of filenames or metadata variations. Files are divided into fixed-size parts of 9,728,000 bytes (about 9.28 MiB), each with its own MD4 part hash, enabling partial downloads and per-part verification. The ed2k URI scheme encodes this data in hyperlinks, formatted as ed2k://|file|<encoded_filename>|<filesize>|<hex_hash>|/, so that file references can be shared through web browsers or applications integrated with eDonkey clients. This hashing mechanism underpins search resolution: keyword queries yield matching hashes, and the client then requests source lists for those hashes.[3][9][1]
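As a rough illustration of the part layout described above, the sketch below computes how many 9,728,000-byte parts a file occupies and the byte range each part covers. The names PART_SIZE, part_count, and part_ranges are hypothetical helpers chosen for this example, not identifiers from any eDonkey client.

```python
from __future__ import annotations

PART_SIZE = 9_728_000  # eDonkey part size in bytes (about 9.28 MiB)


def part_count(file_size: int) -> int:
    """Number of parts a file of file_size bytes occupies (ceiling division)."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    return (file_size + PART_SIZE - 1) // PART_SIZE


def part_ranges(file_size: int) -> list[tuple[int, int]]:
    """Half-open byte ranges [start, end) for each part; the last one may be short."""
    return [(start, min(start + PART_SIZE, file_size))
            for start in range(0, file_size, PART_SIZE)]
```

For a 700 MiB (734,003,200-byte) file this yields 76 parts, the last holding only 4,403,200 bytes. Each range is what a client can download, verify against its part hash, and share independently of the rest of the file.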
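Protocol messages follow a structured binary format with a 6-byte header: a 1-byte protocol marker (0xE3 for core eDonkey, 0xC5 for eMule extensions), a 4-byte little-endian length counting the message-type byte plus the payload, and a 1-byte message type code (opcode), often followed by type-length-value (TLV) encoded data. Each client receives a 4-byte client ID from the server it logs into: a "high ID" equal to the client's IPv4 address read as a little-endian integer when the client can accept incoming connections, or a "low ID" below 16,777,216 when it sits behind NAT or a firewall and must be reached through server-relayed callback requests. Independently of the client ID, every client carries a 128-bit user hash that identifies it across sessions. UDP complements TCP for lightweight operations such as server pings, callback requests, and distributed searches, with packets that begin with the same protocol markers but carry a shorter header. Servers conventionally listen on TCP port 4661 with UDP on the TCP port plus 4 (typically 4665), while clients accept peer connections on TCP port 4662; eMule additionally uses UDP port 4672 for client-to-client traffic.[9]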
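A minimal sketch of the TCP framing and client ID rules described above, using Python's struct module. It assumes only the two protocol markers named in the text (real clients also use other markers, such as one for compressed eMule packets), and the helper names encode_packet, decode_packet, high_id_from_ip, and is_low_id are hypothetical.

```python
from __future__ import annotations

import struct

PROT_EDONKEY = 0xE3          # core eDonkey protocol marker
PROT_EMULE = 0xC5            # eMule extended protocol marker
HEADER = struct.Struct("<BIB")  # marker, uint32 length, opcode: 6 bytes, no padding
LOW_ID_LIMIT = 0x1000000     # client IDs below 16,777,216 are "low IDs"


def encode_packet(protocol: int, opcode: int, payload: bytes) -> bytes:
    # The length field covers the opcode byte plus the payload.
    return HEADER.pack(protocol, len(payload) + 1, opcode) + payload


def decode_packet(buf: bytes) -> tuple[int, int, bytes, bytes]:
    """Split one packet off the front of buf: (protocol, opcode, payload, rest)."""
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    protocol, length, opcode = HEADER.unpack_from(buf)
    if protocol not in (PROT_EDONKEY, PROT_EMULE):  # sketch accepts only these two
        raise ValueError(f"unknown protocol marker 0x{protocol:02X}")
    end = HEADER.size + length - 1  # length already counted the opcode byte
    if length < 1 or len(buf) < end:
        raise ValueError("incomplete packet")
    return protocol, opcode, buf[HEADER.size:end], buf[end:]


def high_id_from_ip(ip: str) -> int:
    """High ID: the IPv4 octets a.b.c.d read as a little-endian uint32."""
    octets = [int(o) for o in ip.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    return struct.unpack("<I", bytes(octets))[0]


def is_low_id(client_id: int) -> bool:
    return client_id < LOW_ID_LIMIT
```

For example, a client at 1.2.3.4 gets the high ID 1 + 2·256 + 3·65,536 + 4·16,777,216 = 67,305,985. Returning the unconsumed bytes from decode_packet lets a caller peel successive messages off a TCP stream buffer.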
Core operations emphasize efficiency in large-file handling. Searches use server-side Boolean keyword matching to return file hashes, and a source request for a given hash returns the client IDs and ports of peers sharing that file. Transfers follow a multisource model: each part is requested in 180 KiB (184,320-byte) blocks from several peers at once, with credit-based queuing deciding which waiting peers receive upload slots and minimum transfer speeds of around 2.4 KB/s enforced. eMule's source exchange extension additionally lets clients trade source lists directly with one another, reducing reliance on servers for discovery. This design prioritized resilience and scalability over pure decentralization, with UDP enhancements in later implementations (circa 2003–2005) enabling serverless source pings.[1][9]
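The block-level side of a multisource download can be sketched as follows. This assumes that block requests never cross a part boundary (so the last block of each full part is 143,360 bytes rather than 180 KiB) and that a client asks one peer for up to three blocks at a time, which is our understanding of how eMule batches its block requests; block_ranges and pick_blocks are hypothetical helpers, and the done/requested bookkeeping is a simplification of real download scheduling.

```python
from __future__ import annotations

PART_SIZE = 9_728_000   # bytes per part
BLOCK_SIZE = 184_320    # 180 KiB request block


def block_ranges(part_index: int, file_size: int) -> list[tuple[int, int]]:
    """Half-open byte ranges of the request blocks inside one part.

    Assumption: blocks stay within their part, so a part's last block may be short.
    """
    part_start = part_index * PART_SIZE
    if part_index < 0 or part_start >= file_size:
        raise IndexError("part index out of range")
    part_end = min(part_start + PART_SIZE, file_size)
    return [(start, min(start + BLOCK_SIZE, part_end))
            for start in range(part_start, part_end, BLOCK_SIZE)]


def pick_blocks(part_index: int, file_size: int, done: set[int],
                requested: set[int], max_blocks: int = 3) -> list[tuple[int, int]]:
    """Up to max_blocks block ranges that are neither downloaded nor already
    requested from another peer (indexes are block positions within the part)."""
    picks = []
    for i, rng in enumerate(block_ranges(part_index, file_size)):
        if i not in done and i not in requested:
            picks.append(rng)
            if len(picks) == max_blocks:
                break
    return picks
```

Tracking requested blocks separately from completed ones is what allows different peers to fill different blocks of the same part concurrently, which is the essence of the multisource model.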
File Identification and Linking
In the eDonkey network, files are uniquely identified by a 128-bit MD4 hash derived from the file's content, ensuring content-based addressing independent of filenames or metadata variations. This identifier, often called the eDonkey hash or file ID, is computed by dividing the file into fixed-size parts of 9,728,000 bytes (about 9.28 MiB), calculating an MD4 hash for each part (the final, shorter part is hashed as-is, without padding), concatenating these part hashes, and applying a final MD4 hash to the concatenation. A file smaller than one part uses its single part hash directly as the file hash, and in the original eDonkey2000 convention a file whose size is an exact multiple of the part size gets an extra hash of an empty final part.[1][3][10] This two-level, Merkle-like structure enables efficient verification of file integrity during transfers and detection of duplicates across the network: identical content always yields the same hash, and accidental collisions between different files are vanishingly unlikely, although MD4 is cryptographically broken and deliberate collisions can be constructed.[1]
To enhance robustness against corruption, eMule's protocol extensions add AICH (Advanced Intelligent Corruption Handling) hashes, a SHA-1 hash tree built over the file that allows error-tolerant verification without relying solely on the MD4 part hashes.[9] The tree's leaves are 180 KiB (184,320-byte) blocks within each part, so when a part fails its MD4 check, a client can obtain recovery hashes for that part from peers, verify them against the trusted AICH root hash, and re-download only the corrupted blocks instead of the whole part.[9] While MD4 serves as the core identifier for indexing and searching, AICH operates as a secondary layer, transmitted optionally in protocol extensions to minimize overhead on legacy clients.[3]
File linking occurs primarily through the ed2k URI scheme, a standardized hyperlink format that encodes essential file metadata for direct integration into client applications.
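The file-hash construction above can be sketched in pure Python. MD4 is implemented here from RFC 1320 because hashlib.new("md4") may be unavailable on systems whose OpenSSL 3 build disables legacy algorithms. The chunk_size parameter exists only so the hashing structure can be exercised on small inputs, and the extra empty-part hash for exact multiples follows the original eDonkey2000 convention stated in the text. The names md4 and ed2k_hash are illustrative.

```python
import struct

CHUNK_SIZE = 9_728_000  # eDonkey part size in bytes


def _rotl(v: int, s: int) -> int:
    v &= 0xFFFFFFFF
    return ((v << s) | (v >> (32 - s))) & 0xFFFFFFFF


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 per RFC 1320 (slow, for illustration only)."""
    msg = bytearray(data)
    bit_len = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    msg.append(0x80)
    while len(msg) % 64 != 56:
        msg.append(0)
    msg += struct.pack("<Q", bit_len)
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    rounds = (  # (round function, additive constant, word order, shift amounts)
        (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
         range(16), (3, 7, 11, 19)),
        (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
         (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
        (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
         (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
    )
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        r = list(state)  # working registers a, b, c, d
        for fn, const, order, shifts in rounds:
            for step, k in enumerate(order):
                r[0] = _rotl(r[0] + fn(r[1], r[2], r[3]) + x[k] + const, shifts[step % 4])
                r = [r[3], r[0], r[1], r[2]]  # steps update a, d, c, b in turn
        state = [(s + v) & 0xFFFFFFFF for s, v in zip(state, r)]
    return struct.pack("<4I", *state)


def ed2k_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Hex eD2k file hash: MD4 over the concatenated per-part MD4 hashes."""
    if len(data) < chunk_size:
        return md4(data).hex()  # single-part file: its part hash is the file hash
    hashes = [md4(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    if len(data) % chunk_size == 0:
        hashes.append(md4(b""))  # original eDonkey2000 convention for exact multiples
    return md4(b"".join(hashes)).hex()
```

Because the final part is hashed without padding, two files that differ only in trailing zero bytes still receive different identifiers. For real multi-gigabyte files a native MD4 implementation would be far faster than this sketch.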
The basic syntax is ed2k://|file|<encoded_filename>|<file_size_in_bytes>|<file_hash>|/, where the filename is URL-encoded, the size is a decimal byte count, and the hash is the 32-digit hexadecimal MD4 root hash.[11] Optional fields extend this form: in eMule-generated links, p= followed by colon-separated part hashes supplies the full hashset, h= carries the AICH root hash, and a trailing |sources,<ip>:<port>|/ segment lists peers that hold the file. Clients use these fields to queue downloads, resolve sources, and initiate transfers upon parsing the link.[11] This scheme facilitates easy sharing via web forums, emails, or hyperlinks, as clients like eMule or MLDonkey register handlers to automatically process ed2k URIs and search servers or peers for matching files using the embedded hash.[9] The design prioritizes immutability and verifiability, reducing risks of misidentification from altered metadata.[1]
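A minimal sketch of building and parsing file links in the syntax described above, using only urllib.parse from the standard library. It assumes the eMule-style optional fields (p=, h=, and a sources segment) exactly as outlined in the text and ignores any other field; the Ed2kLink class and the function names are hypothetical, and the AICH value is kept as an opaque string rather than validated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import quote, unquote

HEX = set("0123456789abcdefABCDEF")


@dataclass
class Ed2kLink:
    name: str
    size: int
    file_hash: str                                         # 32 hex digits (MD4 root)
    part_hashes: list[str] = field(default_factory=list)  # optional p= field
    aich_root: str | None = None                           # optional h= field, kept opaque
    sources: list[str] = field(default_factory=list)      # "ip:port" strings


def _check_md4_hex(value: str) -> str:
    if len(value) != 32 or not set(value) <= HEX:
        raise ValueError(f"not a 32-digit hex MD4 hash: {value!r}")
    return value


def make_ed2k_link(name: str, size: int, file_hash: str) -> str:
    """Build the basic form ed2k://|file|<name>|<size>|<hash>|/ ."""
    return f"ed2k://|file|{quote(name, safe='')}|{size}|{_check_md4_hex(file_hash)}|/"


def parse_ed2k_link(link: str) -> Ed2kLink:
    prefix = "ed2k://|file|"
    if not link.lower().startswith(prefix):
        raise ValueError("not an ed2k file link")
    segments = link[len(prefix):].split("|")
    if len(segments) < 4:
        raise ValueError("truncated ed2k link")
    parsed = Ed2kLink(unquote(segments[0]), int(segments[1]), _check_md4_hex(segments[2]))
    for extra in segments[3:]:  # optional fields; "/" separators are skipped
        if extra.startswith("p="):
            parsed.part_hashes = [_check_md4_hex(h) for h in extra[2:].split(":")]
        elif extra.startswith("h="):
            parsed.aich_root = extra[2:]
        elif extra.startswith("sources,"):
            parsed.sources = extra.split(",")[1:]
    return parsed
```

Encoding the filename with safe='' ensures that a literal "|" in a name cannot be confused with a field separator, since it becomes %7C.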