Message-ID

The Message-ID is a standard header field in Internet email messages that provides a globally unique identifier for a specific version of a message, enabling distinct referencing in replies, threading, and message tracking. Defined in RFC 5322, it is optional but recommended. Its value is a string enclosed in angle brackets (< and >) with the syntax msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS], where id-left and id-right are typically dot-atom-text resembling a local part and a domain, respectively, to ensure uniqueness across systems. The identifier is generated by the originating host and must not be reused for other messages, to avoid conflicts in processing.

Standardized in RFC 822 in 1982 as part of the specification of Internet text messages, the Message-ID field built on earlier email formats such as RFC 733 to support machine-readable referencing of message instances. It has since been revised in RFC 2822 (2001) and RFC 5322 (2008), keeping its core role while adapting to modern syntax rules, including comments and folding whitespace (CFWS) and obsolete forms retained for backward compatibility. Beyond email, the field is also standardized for network news (netnews), as in RFC 5536, where it similarly identifies articles for threading and archival purposes.

In practice, the Message-ID underpins key email functionality, including the construction of conversation threads via the In-Reply-To and References headers, which directly reference prior Message-IDs to link related messages. It aids duplicate detection, message archival, and forensic analysis by providing a persistent, host-generated token that helps trace a message's origin. Compliance with its syntax is important for interoperability, as non-conforming IDs can lead to delivery issues or rejection by mail systems that enforce RFC 5322.

Definition and Purpose

Definition

The Message-ID header field is a standard email header defined in RFC 5322 that provides a unique identifier for a particular version of a message. It is specified as an optional header in the Internet Message Format, serving as a machine-readable identifier that distinguishes one email from all others. The identifier is a globally unique string assigned to a single message, enabling reliable tracking, conversation threading, and referencing across systems. Uniqueness is typically achieved by including the originating host's domain name, which prevents collisions in email processing and supports reply chains through the related In-Reply-To and References headers. The basic syntax is the header name followed by the identifier enclosed in angle brackets, such as <unique-string@example.com>, where the unique-string portion is implementation-specific and the domain helps achieve global uniqueness.

Purpose

The Message-ID header serves as a globally unique identifier for each email message, enabling precise tracking and management throughout its lifecycle in electronic mail systems. According to RFC 5322, which defines the Internet Message Format, the Message-ID provides a distinct reference for a specific version of a message. Its uniqueness is guaranteed by the originating system, which typically combines elements such as a timestamp and a domain name to avoid collisions. This addresses the need for reliable message identification in distributed networks, where messages may traverse multiple servers and clients.

In email clients and mailing lists, the Message-ID supports message threading by allowing related emails to be grouped into coherent conversations. Email software uses it to link replies and forwards in conjunction with the In-Reply-To header, which references the parent message's ID, and the References header, which accumulates IDs from the entire reply chain. On mailing lists, this keeps discussions organized, prevents fragmented views of ongoing threads, and improves readability in mail clients and list archives.

On email servers, the Message-ID supports operational tasks including duplicate prevention, spam detection, and archival indexing. Servers can compare Message-IDs to discard redundant copies of the same message, reducing storage overhead and delivery errors. In spam filtering, anomalies in Message-ID generation, such as reused or malformed IDs, can signal spoofing or bulk campaigns, aiding forensic analysis and blacklisting. For archiving, it enables efficient indexing and retrieval, allowing administrators to search for and validate stored messages individually, as in solutions like MailStore Server.

The Message-ID field originated in RFC 733 (1977) and was updated in RFC 822 (1982), which obsoleted the earlier standard and refined the syntax for better compatibility with evolving network addressing.
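The way a reply links back to its parent can be sketched in Python with the standard library's email package. This is a minimal illustration rather than a reference implementation: the function name build_reply and its parameters are hypothetical, and it assumes the parent message already carries a Message-ID.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_reply(parent: EmailMessage, body: str, domain: str) -> EmailMessage:
    """Create a reply whose headers point back at the parent message.

    build_reply is a hypothetical helper; it assumes `parent` has a Message-ID.
    """
    parent_id = str(parent["Message-ID"])
    reply = EmailMessage()
    # The reply gets its own, new identifier.
    reply["Message-ID"] = make_msgid(domain=domain)
    # In-Reply-To names only the direct parent.
    reply["In-Reply-To"] = parent_id
    # References carries the whole chain: the parent's References plus its ID.
    chain = str(parent.get("References", "")).split() + [parent_id]
    reply["References"] = " ".join(chain)
    reply.set_content(body)
    return reply
```

A threading client can then walk the References list from left to right to rebuild the conversation, falling back to In-Reply-To when References is absent.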

Format and Syntax

Structure

The Message-ID header in email messages follows the form Message-ID: <local-part@domain>, where the entire value is enclosed in angle brackets to delineate the unique identifier from any surrounding comments or folding whitespace, as specified in RFC 5322 Section 3.6.4. This structure ensures the identifier is treated as a single, atomic unit within the header field, following the Augmented Backus-Naur Form (ABNF) definition msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS].

The local-part, corresponding to id-left in the ABNF, serves as the unique component and is either dot-atom-text or, for backward compatibility, obs-id-left, with dot-atom-text preferred. It permits letters, digits, and a limited set of special characters (!#$%&'*+-/=?^_`{|}~), with periods allowed only between other characters, and excludes spaces, control characters, and folding whitespace to maintain syntactic validity. This flexibility lets implementations embed elements such as timestamps or process identifiers in the local-part, provided they conform to the dot-atom rules and avoid reserved characters that could disrupt parsing.

The domain component, or id-right, must be expressed as either dot-atom-text or a no-fold-literal (such as a domain literal in square brackets), and typically names the sending host to support global uniqueness. In practice, this is often the originating domain augmented with a subdomain, like msg.example.com, to isolate message identifiers from other services on the same domain. The full value must use only printable US-ASCII characters (codes 33-126) and, in the modern syntax, cannot include quoted strings or embedded comments within the brackets.
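The modern (non-obsolete) grammar is small enough to check with a regular expression. The following Python sketch is one possible reading of the ABNF above, not an official validator: the name parse_msg_id is hypothetical, it deliberately rejects the obsolete forms, and it only strips plain surrounding whitespace rather than parsing full CFWS comments.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits, and these specials.
_ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text: runs of atext separated by single periods.
_DOT_ATOM_TEXT = rf"{_ATEXT}+(?:\.{_ATEXT}+)*"
# no-fold-literal: "[" dtext "]", where dtext is printable US-ASCII
# except "[", "]", and "\".
_NO_FOLD_LITERAL = r"\[[\x21-\x5a\x5e-\x7e]*\]"

_MSG_ID = re.compile(
    rf"<(?P<id_left>{_DOT_ATOM_TEXT})@"
    rf"(?P<id_right>{_DOT_ATOM_TEXT}|{_NO_FOLD_LITERAL})>"
)


def parse_msg_id(value: str):
    """Return (id_left, id_right) for a modern-syntax msg-id, else None.

    Hypothetical helper: obsolete forms and CFWS comments are not handled.
    """
    match = _MSG_ID.fullmatch(value.strip())
    if match is None:
        return None
    return match.group("id_left"), match.group("id_right")
```

Under these assumptions, <20231111120000.12345@example.com> and <abc@[192.0.2.1]> are accepted, while values with spaces, consecutive periods, a missing "@", or missing angle brackets are rejected.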

Requirements

The Message-ID header field is optional, but every message SHOULD include one to provide a unique identifier, as specified in RFC 5322. Under RFC 5321, an originating SMTP server (or one accepting initial submissions) MAY add the field if it is absent, while servers acting as intermediate relays MUST NOT add or modify it. Gateways that connect non-SMTP systems to SMTP may supply the field as part of translating a message, but relays should otherwise leave it untouched.

A valid Message-ID must be globally unique, with no duplicates permitted across any messages generated by the same host; this uniqueness is guaranteed by the originating system. The local-part follows a syntax similar to addr-spec: modern usage employs dot-atom-text, which avoids folding whitespace and keeps the value easy to parse, permitting letters, digits, and the defined special characters while prohibiting spaces. Quoted forms for other characters exist only in the obsolete syntax and are retained for compatibility.

No explicit length limit is defined for the Message-ID itself, but it is subject to the general line-length limit of RFC 5322, under which each line of a message must not exceed 998 characters (excluding the CRLF line break). When the domain part is a DNS name, it follows established DNS conventions, which treat names as case-preserving but case-insensitive for resolution. Because many implementations compare Message-IDs as exact strings, generators should nonetheless emit a consistent case.
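These requirements translate into two simple checks at the originating system: add the field only when it is missing, and keep the resulting header line within the length limit. The Python sketch below illustrates this under the assumption that the code runs at origination, not on a relay; ensure_message_id and fits_on_one_line are hypothetical names.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Characters per line, excluding CRLF (RFC 5322 section 2.1.1).
MAX_LINE_LENGTH = 998


def ensure_message_id(msg: EmailMessage, domain: str) -> str:
    """Add a Message-ID only when the message has none; never overwrite one.

    Hypothetical helper intended for an originating system, not a relay.
    """
    if msg["Message-ID"] is None:
        msg["Message-ID"] = make_msgid(domain=domain)
    return str(msg["Message-ID"])


def fits_on_one_line(msg_id: str) -> bool:
    """Check that the unfolded header line stays within the 998-character limit."""
    return len("Message-ID: " + msg_id) <= MAX_LINE_LENGTH
```

Calling ensure_message_id a second time on the same message returns the existing value unchanged, which mirrors the rule that an identifier, once assigned, is not rewritten.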

Generation

Methods

Message-ID values are commonly generated by concatenating a timestamp, a process ID, and a random string to form the local part, followed by the generating host's domain name, giving a high probability of uniqueness within the domain. This approach, suggested in RFC 5322, combines the local date and time with another identifier that is currently unique on the host to minimize collision risks. For instance, a typical value might appear as <20231111120000.12345@example.com>, where 20231111120000 is the timestamp, 12345 the process ID, and example.com the domain.

In some implementations, cryptographic hashes are applied to message content or metadata to produce the local part, providing a deterministic alternative when timestamps or process IDs alone may not suffice. For example, the Notmuch email indexer generates missing Message-IDs from a hash of message elements to ensure database uniqueness.

Domain-based approaches place the responsible domain on the right-hand side of the Message-ID, often using a dedicated subdomain like id.example.com to separate identifiers from different services or systems, thereby reducing global collision risks while adhering to the syntax requirements. Modern methods may also use UUIDs (per RFC 4122) for the local part, in the form <UUID@example.com>, offering high uniqueness without relying on system-specific details like timestamps or PIDs.

Various software libraries and servers implement these techniques. In Python, the email.utils.make_msgid function generates compliant IDs by combining a timestamp, process ID, random elements, and the local hostname (or a specified domain). Java's javax.mail.internet.MimeMessage class automatically adds a Message-ID via its updateMessageID method if absent, allowing customization for uniqueness. The Postfix mail server adds missing Message-IDs built from a timestamp and the message's queue ID, followed by the configured hostname, typically in the form <timestamp.queueID@myhostname>.
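The generation strategies described above can be compared side by side in a short Python sketch. Only email.utils.make_msgid is a library-provided generator here; the other three functions, their names, and the "sha256-" prefix are illustrative assumptions, and the domain id.example.com used in the usage note is a placeholder.

```python
import hashlib
import os
import secrets
import time
import uuid
from email.utils import make_msgid


def timestamp_pid_msgid(domain: str) -> str:
    """Timestamp, process ID, and a random token in the local part."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    return f"<{stamp}.{os.getpid()}.{secrets.token_hex(8)}@{domain}>"


def uuid_msgid(domain: str) -> str:
    """Random (version 4) UUID as the local part."""
    return f"<{uuid.uuid4()}@{domain}>"


def content_hash_msgid(raw_message: bytes, domain: str) -> str:
    """Deterministic local part derived from the message bytes.

    The 'sha256-' prefix and the truncation to 32 hex digits are arbitrary
    choices for this sketch, not a convention from any standard.
    """
    digest = hashlib.sha256(raw_message).hexdigest()
    return f"<sha256-{digest[:32]}@{domain}>"


def stdlib_msgid(domain: str) -> str:
    """Delegate to the standard library helper."""
    return make_msgid(domain=domain)
```

For example, uuid_msgid("id.example.com") yields a fresh identifier on every call, whereas content_hash_msgid returns the same value whenever it is given the same bytes, which suits indexers that must assign stable IDs to messages that arrived without one.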

Best Practices

To ensure robust and secure generation of Message-IDs in email systems, it is important to incorporate high-entropy random elements into the local part of the identifier. This prevents predictability, mitigating risks such as adversaries anticipating or forging identifiers based on patterns like sequential numbering or timestamps alone. High-entropy randomness, such as cryptographically secure pseudo-random numbers combined with timestamps or process IDs, helps maintain global uniqueness without relying solely on deterministic components. For hashing methods, use secure algorithms like SHA-256 instead of deprecated ones such as MD5 or SHA-1.

Using a fully qualified domain name (FQDN) on the right-hand side of the Message-ID is a critical practice for avoiding local collisions, particularly in multi-server environments where multiple hosts generate IDs independently. The specification recommends that the domain portion be a valid domain under the generator's control, ensuring no overlap with external domains and supporting reliable threading and deduplication across distributed systems.

In high-volume systems processing millions of messages daily, periodic uniqueness audits, such as logging Message-IDs to a database and querying for duplicates, can help detect and resolve collisions early. These audits can sample recent IDs against historical records to verify the generation mechanism's effectiveness over time.

For compatibility with legacy systems, Message-IDs should be generated to conform to both RFC 822 and the updated RFC 5322, prioritizing the latter's stricter syntax while avoiding obsolete elements like comments or folding whitespace. This ensures seamless interoperability in mixed environments without requiring separate fallback logic. Message-ID generators should use abstract unique tokens built only from the allowed characters (e.g., A-Z, a-z, 0-9, !#$%&'*+-/=?^_`{|}~.).
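A uniqueness audit of the kind described above can be prototyped with an embedded database. The following Python sketch uses the standard sqlite3 module; the table name seen_ids, its columns, and the function names are assumptions made for illustration, and a production system would likely need batching, retention limits, and concurrency control that are omitted here.

```python
import sqlite3


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_ids ("
        "msg_id TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, msg_id: str) -> bool:
    """Log one Message-ID; return True if it had been seen before."""
    cur = conn.execute(
        "UPDATE seen_ids SET hits = hits + 1 WHERE msg_id = ?", (msg_id,)
    )
    if cur.rowcount:
        return True
    conn.execute("INSERT INTO seen_ids (msg_id, hits) VALUES (?, 1)", (msg_id,))
    return False


def duplicates(conn: sqlite3.Connection):
    """List (msg_id, hits) pairs for every identifier logged more than once."""
    rows = conn.execute(
        "SELECT msg_id, hits FROM seen_ids WHERE hits > 1 ORDER BY msg_id"
    )
    return rows.fetchall()
```

A generator can call record for each identifier it emits and raise an alert when the function returns True, while duplicates supports the periodic review of historical records.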

Standards and Usage

Relevant RFCs

The Message-ID header was specified in RFC 822, published in August 1982, as an optional field providing a unique identifier for a specific version of a message, carrying the field forward from RFC 733. The identifier takes the form msg-id = "<" addr-spec ">", where addr-spec consists of a local-part followed by "@" and a domain, giving a machine-readable value whose uniqueness is guaranteed by the generating host and which is not intended for human interpretation.

RFC 2822, published in April 2001, obsoleted RFC 822 and refined the Message-ID syntax and semantics for Internet messages. It specifies the format as msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS], where id-left is restricted to forms such as dot-atom-text or a non-folding quoted string and id-right to dot-atom-text or a non-folding domain literal, and it emphasizes global uniqueness to prevent identification conflicts across systems. The document recommends placing the sender's domain in id-right and a timestamp or other unique string in id-left to achieve this uniqueness.

The current standard, RFC 5322 from October 2008, obsoletes RFC 2822 while introducing no major functional changes to the Message-ID itself, though it tightens the syntax, for example by no longer permitting quoted forms in id-left outside the obsolete grammar. It states that every message SHOULD include a Message-ID field, limited to one occurrence, with the syntax msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS], where id-left uses dot-atom-text and id-right uses dot-atom-text or no-fold-literal, alongside obsolete variants that must be accepted but not generated. Whitespace, comments, and quoted-pairs are not allowed inside the angle brackets in modern conformant messages.

Related standards extend Message-ID usage. RFC 6532, published in February 2012, enables internationalized email headers by allowing UTF-8 encoding in Message-IDs, including non-ASCII characters in domains (id-right), though it advises preferring ASCII for interoperability in threading. Separately, RFC 3461 from January 2003 defines delivery status notifications (DSNs), which track delivery failures by envelope identifier (reported as the Original-Envelope-ID) rather than by Message-ID.

Implementation in Protocols

In the Simple Mail Transfer Protocol (SMTP), as defined in RFC 5321, the Message-ID header is transmitted as part of the message content during the DATA command in email submission and delivery. Originating SMTP servers may add a Message-ID header to a message if one is absent, while intermediate servers must not modify existing Message-ID fields or add new ones unless they are gatewaying across different mail environments, where header rewriting may occur. During relay, the receiving server can optionally record an identifier in the "ID" clause of the "Received" trace header it adds, which aids in debugging message paths without altering the original Message-ID.

In the Network News Transfer Protocol (NNTP) for netnews, outlined in RFC 3977, the Message-ID plays a role parallel to its role in email by providing a unique identifier for articles, ensuring that servers do not hold duplicates. An injecting server supplies a Message-ID if one is missing from an article submitted with POST, and articles offered with IHAVE are identified by their Message-ID. Servers transmit the identifier unchanged in responses to retrieval commands such as ARTICLE or HEAD, supporting global uniqueness for article distribution and threading via References headers.

Email retrieval protocols like IMAP (RFC 3501) and POP3 (RFC 1939) integrate the Message-ID primarily through client-side processing, though server capabilities vary. In IMAP, clients can use the THREAD extension (RFC 5256), in which the server uses Message-ID values from the References and In-Reply-To headers to construct conversation threads, enabling efficient retrieval of related messages without full downloads. POP3 lacks native search or threading, so clients retrieve full messages via RETR and parse the Message-ID header locally to enable features like duplicate detection or threading. The server's UIDL command provides a separate unique identifier for each message in the maildrop and is unrelated to the email's Message-ID.

In gateway scenarios bridging to non-email systems, such as HTTP-based mail interfaces, the Message-ID should be preserved in message payloads to maintain continuity across protocols. RFC 5321 allows gateways to rewrite headers during cross-environment transfers, but retaining the original Message-ID where possible avoids breaking threading or tracking in downstream systems.

Compliance tools like SpamAssassin check Message-ID integrity by scanning for malformed or missing headers during processing. The MISSING_MID rule flags messages lacking a Message-ID, indicating potential misconfiguration, while INVALID_MSGID detects formats that do not conform to RFC 5322; both contribute to spam scoring without altering transmission flows.
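Client-side handling of the kind a POP3 client performs after RETR can be sketched in Python as follows. The function name dedupe_by_message_id is hypothetical, the input is assumed to be a list of raw messages as bytes, and identifiers are compared as exact strings after trimming whitespace.

```python
from email import message_from_bytes
from email.policy import default


def dedupe_by_message_id(raw_messages):
    """Split parsed messages into (unique, duplicates, missing) lists.

    Hypothetical helper: `missing` collects messages with no Message-ID,
    which cannot be deduplicated this way.
    """
    seen = set()
    unique, dupes, missing = [], [], []
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=default)
        msg_id = msg["Message-ID"]
        if msg_id is None:
            missing.append(msg)
            continue
        key = str(msg_id).strip()
        if key in seen:
            dupes.append(msg)
        else:
            seen.add(key)
            unique.append(msg)
    return unique, dupes, missing
```

Messages that land in the missing list are also the ones a filter rule such as MISSING_MID would flag, so a client may want to surface them separately rather than silently keeping them.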

Issues and Considerations

Uniqueness Challenges

Ensuring global uniqueness of Message-IDs is a core requirement under RFC 5322, which expects each identifier to be distinct across all email systems without centralized coordination. This decentralized generation, often relying on local elements like timestamps, process IDs (PIDs), and random components combined with a domain name, introduces inherent collision risks when billions of messages are produced daily by uncoordinated software instances.

Collision risks stem primarily from clock skew, PID reuse, and weak random number generation. Clock skew in distributed email infrastructures can produce identical timestamps across servers, leading to duplicate IDs if other components such as PIDs also coincide. PID reuse occurs because operating systems recycle identifiers after process termination, which can cause conflicts in high-frequency generation if a mail process restarts rapidly without additional safeguards. Poor random number generators make this worse by producing predictable sequences, shrinking the effective space of the local part. These issues can result in duplicate identifiers for distinct messages, violating the global uniqueness guarantee.

High-volume senders, such as email service providers, face amplified challenges in generating very large numbers of unique IDs without a central authority. Without robust local mechanisms, the volume of output, potentially exceeding 100,000 messages per day in enterprise environments, raises collision probabilities, as even minor flaws in generation logic propagate globally.

Detection of duplicates typically involves hash-based checks on the Message-ID or probabilistic assessments informed by the birthday paradox, which estimates collision likelihood in large sets. For example, hashing the ID allows efficient comparison to identify matches across datasets, while birthday paradox calculations show that in a space of roughly 2^32 possible short IDs, a collision becomes more likely than not after approximately 77,000 messages. The EDRM Message ID standard supports cross-platform duplicate detection by normalizing and hashing the ID for reliable identification.

Documented cases illustrate the impact, including threading failures in clients like Outlook and Gmail. In one reported instance, emails sent from Outlook 2003 carried duplicate Message-IDs, causing messages to go missing in the receiving system. Duplicates can also cause SMTP servers to drop messages, as when an identical ID leads to rejection of a subsequent delivery, disrupting mail flows. In forensic and e-discovery contexts, these failures complicate message reconstruction and duplicate suppression.
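The birthday-paradox figure quoted above can be reproduced with the usual approximation, in which the number of random draws needed for a collision probability p in a space of N values is about sqrt(2 N ln(1 / (1 - p))). The Python sketch below applies that approximation; it assumes identifiers are drawn uniformly at random, which real generators only approximate, and the function names are illustrative.

```python
import math


def birthday_threshold(space_size: int, probability: float = 0.5) -> float:
    """Approximate number of random IDs at which a collision reaches `probability`."""
    return math.sqrt(2 * space_size * math.log(1 / (1 - probability)))


def collision_probability(count: int, space_size: int) -> float:
    """Approximate chance that `count` random IDs contain at least one collision."""
    return 1 - math.exp(-count * (count - 1) / (2 * space_size))
```

With space_size set to 2**32, birthday_threshold gives a value slightly above 77,000, matching the estimate in the text. Because the threshold grows with the square root of the space, moving to 128 bits of randomness pushes it to the order of 10^19 messages, which is why long random local parts are generally considered safe.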

Privacy Implications

The Message-ID header in email messages can inadvertently leak sensitive information about the sender's infrastructure and the timing of transmission. The local-part of a Message-ID, which precedes the "@" symbol, is often generated by mail transfer agents (MTAs) or user agents using components such as timestamps, process identifiers, or hostnames to ensure uniqueness. For instance, formats produced by certain MTAs may embed the exact time of message creation or the originating server's hostname, potentially revealing operational details of the sender's network or precise send times that could be correlated with user activity.

Predictable patterns in Message-ID generation also pose fingerprinting risks, enabling the tracking of users across multiple messages or sessions. When an email client or server employs consistent formatting, such as specific delimiters, sequential numbering, or vendor-specific encodings, recipients or intermediaries can infer the software, its version, or even the sending device. This allows behavioral profiling, in which an adversary reconstructs communication patterns or links messages to a single source even if the body is encrypted. In forensic contexts, these patterns have been used to attribute emails to particular systems or organizations.

Under privacy regulations like the EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), Message-IDs may qualify as personal data if they contain identifiers or enable identification of individuals through linkage with other data. The European Data Protection Supervisor (EDPS) classifies email headers, including Message-ID, as traffic data that can involve personal information, requiring a lawful basis for handling, data minimization, and safeguards against unauthorized access. Non-compliance, such as retaining Message-IDs with embedded identifiers without consent, can lead to fines, as these elements contribute to identification risks. Similarly, CCPA requires businesses to disclose and limit the sale of such identifiers in email communications.

To mitigate these privacy implications, anonymization techniques focus on obfuscating or randomizing Message-ID components in privacy-enhanced email clients. Services like Proton Mail offer encryption and aliasing features, generating Message-IDs on domains they control without exposing user-specific timestamps or hostnames, while allowing users to route traffic via VPNs or Tor for IP anonymity. In Tor-based email setups, such as accessing providers through onion services, ephemeral or temporary domains can be used to create disposable Message-IDs that avoid persistent identifiers, combined with obfuscated strings to prevent pattern-based fingerprinting. Additionally, configurable MTAs can substitute neutral domains in the Message-ID's domain part, reducing hostname leakage without violating standards.
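One way to apply these mitigations is to build the identifier from random bytes only and to take the domain from the sender's address instead of the local hostname. The Python sketch below illustrates that idea under stated assumptions: private_msgid is a hypothetical name, the From header is assumed to contain a single address, and the sender is assumed to be entitled to mint identifiers under that address's domain.

```python
import secrets
from email.utils import parseaddr


def private_msgid(from_header: str) -> str:
    """Build a Message-ID with no timestamp, PID, or local hostname in it.

    Hypothetical helper: the domain is taken from the From address, and the
    local part is 128 bits of randomness rendered as hexadecimal.
    """
    _, address = parseaddr(from_header)
    if "@" not in address:
        raise ValueError("From address has no domain part")
    domain = address.rsplit("@", 1)[1].lower()
    return f"<{secrets.token_hex(16)}@{domain}>"
```

The result still satisfies the dot-atom-text syntax and remains unique with overwhelming probability, but it no longer reveals when the message was composed or which machine produced it.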
