
Email address

An email address is a unique identifier for a mailbox in the Internet's electronic mail system, consisting of a local-part (specifying the recipient on the host), the "@" symbol, and a domain (identifying the host or server).^[1] The local-part may include letters, numbers, and certain special characters, either as a dot-atom (e.g., "user.name") or a quoted-string for more complex cases, while the domain follows hostname conventions with subdomains separated by dots (e.g., "example.com").^[1] This structure ensures precise routing and delivery of messages across the global network.^[1] The concept of the email address emerged in 1971 when Ray Tomlinson, working on the ARPANET, developed a program to send messages across distributed computers, selecting the "@" symbol to distinguish the user from the host machine.^[2] By 1973, email accounted for about 75% of ARPANET traffic, highlighting its rapid adoption among researchers.^[2] The format was first standardized in RFC 822 in 1982, which defined the addr-spec syntax as local-part "@" domain and introduced hierarchical domains for broader scalability.^[3] Subsequent updates in RFC 2822 (2001) and RFC 5322 (2008) refined the syntax for clarity and compatibility, while prohibiting obsolete elements like source routes.^[1] In the Internet Mail Architecture, email addresses function as globally unique identifiers that enable spontaneous end-to-end communication without prior setup, appearing in SMTP commands for envelope routing (e.g., MAIL FROM and RCPT TO) and in message headers (e.g., From:, To:) for content association.^[4] They extend beyond mere delivery to serve as persistent online identities for services like authentication and notifications.^[4] To accommodate global users, internationalized email addresses supporting non-ASCII characters in both local-parts and domains were specified starting in RFC 6530 (2011), with UTF-8 encoding for broader linguistic inclusion.^[5]

Role in Email Communication

Definition and Purpose

An email address is a unique string that identifies the recipient of an electronic mail message within the Internet's messaging framework, serving as a specific identifier for a mailbox on a host computer.^[6] It typically follows the format of a local-part followed by an "@" symbol and a domain, enabling precise targeting of messages to individual users or shared mailboxes.^[7] The primary purpose of an email address is to facilitate the routing and delivery of messages across interconnected networks, supporting both one-to-one correspondence and one-to-many distributions such as mailing lists.^[8] Beyond message transport, it functions as a foundational digital identity, commonly used for user authentication, account registration, subscription to services like newsletters, and integration with other online systems.^[8] Email addresses originated in the early 1970s as part of the ARPANET, the precursor to the modern Internet, where engineer Ray Tomlinson developed the first networked email system in 1971 by extending existing programs to allow inter-host messaging.^[9] This innovation quickly evolved into a global standard for internet-based electronic communication, standardizing user addressing across diverse systems.^[10] Unlike telephone numbers, which primarily enable voice or short message services, or IP addresses, which identify network devices for data routing, email addresses specifically target human users or virtual mailboxes for asynchronous text-based exchange.^[11]

Message Transport Usage

Email addresses play a central role in the Simple Mail Transfer Protocol (SMTP), the standard for transporting email messages across the internet. In an SMTP transaction, the sender's email address is specified using the MAIL FROM command, which defines the reverse-path for error notifications and delivery reports.^[12] Similarly, each recipient's email address is indicated via the RCPT TO command, establishing the forward-path to guide message delivery.^[13] These commands form the SMTP envelope, which encapsulates the routing information separate from the message content itself.^[14] The routing process relies on the domain portion of the email address to determine the appropriate mail server. When an SMTP server receives a message, it resolves the recipient's domain through DNS MX (Mail Exchanger) records to identify the target server for relay or final delivery.^[15] The local-part of the address then specifies the individual mailbox on that server, enabling precise delivery.^[13] A key distinction exists between the transport envelope and the message headers. The envelope addresses (from MAIL FROM and RCPT TO) are used exclusively for routing and are not visible to end users, whereas header fields like From: and To: serve display and informational purposes within the email client.^[16] This separation ensures that routing remains efficient and independent of the message's visible content, such as in cases of blind carbon copies where recipients are not listed in headers.^[17] If an email address proves undeliverable during transport, the SMTP server generates error responses and bounce messages. For instance, a 550 reply code indicates a permanent failure, such as an invalid or non-existent recipient, prompting the sending server to notify the original sender via the reverse-path.^[18] These bounce messages, often containing diagnostic details, are sent back to the MAIL FROM address to inform the sender of the issue.^[19]
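The separation between envelope and headers can be illustrated with a minimal sketch using Python's standard email package. The helper name build_envelope, the example addresses, and the choice to reuse the From: address as the reverse-path are assumptions made for illustration; real systems often use a dedicated bounce address.

```python
from email.message import EmailMessage
from email.utils import getaddresses, parseaddr


def build_envelope(msg: EmailMessage) -> tuple[str, list[str]]:
    """Derive (reverse-path, forward-paths) from headers, then drop Bcc."""
    # Reverse-path for MAIL FROM: here simply the From: address (an assumption).
    mail_from = parseaddr(msg["From"])[1]
    # Forward-paths for RCPT TO: every To, Cc, and Bcc recipient.
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    rcpt_to = [addr for _, addr in getaddresses(fields) if addr]
    # Bcc recipients stay in the envelope but must not appear in the headers.
    del msg["Bcc"]
    return mail_from, rcpt_to


msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["To"] = "bob@example.net"
msg["Bcc"] = "carol@example.org"
msg.set_content("Hello")

mail_from, rcpt_to = build_envelope(msg)
print(mail_from)   # alice@example.com
print(rcpt_to)     # ['bob@example.net', 'carol@example.org']
print(msg["Bcc"])  # None
```

The blind-copied recipient is still named in the envelope, so the message reaches that mailbox, but no header in the delivered message reveals it.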

Syntax and Components

Local-part

The local-part of an email address is the portion preceding the "@" symbol, which specifies the recipient's mailbox or alias on the mail server indicated by the domain.^[7] It serves to uniquely identify the user within that specific domain, allowing for flexible naming conventions determined by the receiving server.^[20] According to RFC 5322, the syntax for the local-part is defined as a dot-atom, a quoted-string, or an obsolete local-part form (obs-local-part).^[7] The dot-atom consists of one or more dot-atom-text elements separated by dots, where dot-atom-text includes letters (a-z and A-Z), digits (0-9), and the special characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~, but it cannot begin or end with a dot, nor contain consecutive dots.^[21] The quoted-string format encloses content in double quotes, permitting a broader range of ASCII characters (excluding CR and LF) through escaped quoted-pairs, such as backslash-escaped specials or spaces.^[22] Obsolete forms, retained for backward compatibility, allow additional structures like unquoted spaces or other legacy characters, though modern implementations favor the standard dot-atom and quoted-string.^[23] The maximum length of the local-part is 64 octets, as specified in RFC 5321 for SMTP compliance, ensuring compatibility across mail transfer agents.^[24] Regarding case sensitivity, RFC 5321 mandates that the local-part be treated as case-sensitive, requiring SMTP servers to preserve its casing during transmission.^[25] However, many email providers, such as those implementing common extensions, treat it as case-insensitive for delivery purposes to improve user experience and reduce errors.^[26] Common formats for the local-part include simple alphanumeric usernames (e.g., user), dotted variants for substructure (e.g., user.name), and plus-addressing extensions (e.g., user+tag), where the plus sign and following tag are valid per RFC 5322 and often used by providers like Gmail for filtering or disposable 
aliases.^[21] Server-specific quoting enables inclusion of spaces or other restricted characters, such as "user name" or "user with space", by wrapping in double quotes and escaping as needed.^[22] These formats enhance flexibility while adhering to the core syntax rules.
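The dot-atom rules above can be sketched as a short Python check. This covers only the unquoted form and the 64-octet limit; quoted-strings and obsolete forms are deliberately out of scope, and the function name is hypothetical.

```python
import re

# atext from RFC 5322: letters, digits, and the listed special characters.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
# dot-atom-text: runs of atext separated by single dots, no leading/trailing dot.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_dot_atom_local_part(local: str) -> bool:
    """True if `local` is an unquoted dot-atom of at most 64 octets."""
    if len(local.encode("utf-8")) > 64:
        return False
    return DOT_ATOM.fullmatch(local) is not None


print(is_dot_atom_local_part("user.name"))  # True
print(is_dot_atom_local_part("user+tag"))   # True
print(is_dot_atom_local_part("us..er"))     # False
print(is_dot_atom_local_part("user name"))  # False
```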

Domain

The domain part of an email address is the segment following the "@" symbol, which specifies the destination mail server or organization for message delivery. It typically consists of a fully qualified domain name (FQDN), such as "example.com," or an IP address literal, ensuring the email can be routed accurately within the internet mail system.^[27] The syntax of the domain adheres to rules outlined in RFC 5321 and aligns with DNS hostname specifications in RFC 1035. It comprises one or more labels separated by periods, where each label includes only letters (a-z, A-Z), digits (0-9), and hyphens (-), with hyphens not permitted at the start or end of a label and no underscores allowed in standard domain names. The entire domain must not exceed 255 octets in length to maintain compatibility with SMTP transport limits.^[27]^[28] To resolve the domain for email routing, the sending SMTP server queries the Domain Name System (DNS) for MX (Mail Exchanger) records associated with the domain, as defined in RFC 5321 and detailed in RFC 974. These records list the preferred mail servers, ordered by a numeric preference value (lower values indicating higher priority), allowing selection of the optimal server for delivery. In the absence of MX records, the server falls back to querying A (IPv4) or AAAA (IPv6) records to obtain the domain's IP address directly.^[27]^[29] Domain literals provide an alternative to FQDNs by embedding IP addresses directly in the email address, enclosed in square brackets to distinguish them from domain names. 
For IPv4, this appears as [192.0.2.1]; for IPv6, it uses the format [IPv6:2001:db8::1]. Either form lets the sender connect without a DNS lookup, though many modern systems discourage or reject address literals for security reasons.^[27] Domains incorporating non-ASCII characters, known as Internationalized Domain Names (IDNs), are represented in Punycode (with an "xn--" prefix on each encoded label) to ensure ASCII compatibility during transmission. The Punycode encoding is specified in RFC 3492, and the current IDNA framework in RFC 5890 through RFC 5894, which superseded RFC 3490.
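The MX selection rule described above can be sketched as follows. Python's standard library has no MX resolver, so this sketch assumes the records were already fetched by some other means and are passed in as (preference, host) pairs; the function name and host names are hypothetical.

```python
def delivery_targets(domain: str, mx_records: list[tuple[int, str]]) -> list[str]:
    """Order candidate hosts for `domain` from already-fetched MX records."""
    if not mx_records:
        # No MX records: fall back to the domain's own A/AAAA records.
        return [domain]
    # Lower preference value means higher priority.
    return [host for _, host in sorted(mx_records)]


records = [(20, "mx2.example.com"), (10, "mx1.example.com")]
print(delivery_targets("example.com", records))  # ['mx1.example.com', 'mx2.example.com']
print(delivery_targets("example.com", []))       # ['example.com']
```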

Sub-addressing

Sub-addressing, also known as plus-addressing or tagged addressing, is an extension to the local-part of an email address that allows users to append optional tags using specific delimiters, enabling emails to be routed to the same mailbox without requiring a separate account. For instance, an email sent to [email protected] is delivered to the primary mailbox associated with [email protected], as the receiving server interprets the tag after the delimiter and strips it during processing.^[30]^[31] The most common delimiter is the plus sign (+), which is supported by major providers such as Gmail and Microsoft Exchange Online, where it separates the base local-part from the tag. Other delimiters include the hyphen (-), used by some systems like certain spam filtering services, and the pipe (|), which is less commonly implemented across providers. These delimiters are permitted within the local-part syntax as defined by RFC 5322, but their interpretive handling for sub-addressing is implementation-specific and not mandated by the standard.^[32]^[33] Common use cases for sub-addressing include organizing incoming mail by category, such as directing messages to [email protected] for professional correspondence or [email protected] for e-commerce notifications, thereby facilitating automated filtering rules. It also enables tracking the origin of email sign-ups, for example, by using [email protected] to identify which services might be sources of spam or data breaches. Additionally, users create temporary aliases for one-time purposes, like online registrations, to enhance privacy without exposing the primary address.^[34]^[35] Support for sub-addressing varies significantly among email providers and mail transfer agents, as it is not standardized in RFC 5322 and relies on server-side configuration to recognize and process the delimiters by stripping them along with the tag before final delivery. 
While widely implemented in consumer services like Gmail, Outlook.com, and Proton Mail, enterprise systems or older infrastructures may not support it, potentially causing delivery failures if the tag is not handled.^[32]^[36] Limitations of sub-addressing include its inconsistent treatment regarding case sensitivity, where tags are generally ignored in case comparisons since the base local-part's sensitivity is domain-dependent, but most modern providers treat the entire local-part as case-insensitive in practice. Furthermore, the feature can be vulnerable to abuse in spam filtering scenarios, as attackers might leverage varying provider support to generate multiple aliases and bypass blacklists or rate limits, though it is more commonly employed by legitimate users to detect and mitigate unwanted mail.^[30]^[32]^[37]
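How a receiving system might strip a tag can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any provider's actual logic: the function name is hypothetical, the delimiter is configurable because its meaning is implementation-specific, and the split happens at the first delimiter in the local-part.

```python
from typing import Optional


def split_subaddress(address: str, delimiter: str = "+") -> tuple[str, Optional[str]]:
    """Return (base_address, tag); tag is None when no delimiter is present."""
    # The domain follows the last "@"; everything before it is the local-part.
    local, _, domain = address.rpartition("@")
    base, sep, tag = local.partition(delimiter)
    if not sep:
        return address, None
    return f"{base}@{domain}", tag


print(split_subaddress("user+shopping@example.com"))  # ('user@example.com', 'shopping')
print(split_subaddress("user@example.com"))           # ('user@example.com', None)
print(split_subaddress("user-news@example.com", "-")) # ('user@example.com', 'news')
```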

Examples

Valid Email Addresses

Valid email addresses must adhere to the syntax rules defined in RFC 5322, which specifies the permissible structures for the local-part and domain to ensure proper parsing and routing in Internet mail.^[1] Basic valid examples demonstrate straightforward formats using alphanumeric characters in the local-part and a simple domain name. For instance, [email protected] is valid because the local-part "user" consists solely of allowed letters, and the domain "domain.com" follows the dot-atom structure with periods separating label sequences of permitted characters.^[1] Similarly, [email protected] is acceptable, as the local-part incorporates dots to separate components without leading, trailing, or consecutive periods, while the domain uses hierarchical labels connected by dots, all within the atext character set (letters, digits, and specific symbols).^[1] The local-part supports quoting to include spaces or other non-standard characters. An example is "user name"@domain.com, where double quotes enclose the local-part to allow the embedded space, adhering to the quoted-string production in the standard.^[1] To illustrate the full range of special characters permitted without quoting, !#$%&'*+-/=?^_`{|}~@domain.com is valid, as each symbol belongs to the atext set defined for unquoted local-parts, enabling robust handling of diverse identifiers up to 64 octets in length.^[1]^[38] Sub-addressing extends functionality within the local-part syntax. For example, [email protected] is syntactically correct, since the plus sign (+) is an allowed atext character, allowing the tag to augment the base local-part without violating length limits or character restrictions.^[1] Domain variations further highlight flexibility in addressing.
The address user@[IPv6:2001:db8::1] uses a domain literal enclosed in brackets to specify an IPv6 address directly, bypassing DNS resolution as permitted for transport scenarios.^[1] Additionally, [email protected] is valid, with the domain incorporating hyphens within labels, as hyphens are part of the permitted characters and conform to the overall domain length constraint of 255 octets.^[1]^[28] These examples reflect the core syntax rules for local-parts and domains, providing a foundation for compliant email construction.^[1]

Invalid Email Addresses

Invalid email addresses are those that violate the syntactic rules defined for the Internet Message Format, primarily outlined in RFC 5322, which specifies the structure of an addr-spec as a local-part followed by "@" and a domain.^[7] These violations prevent proper parsing and transport in email systems, leading to rejection during validation or delivery attempts. Note that some examples below are syntactically valid but practically invalid due to real-world constraints like DNS resolution or system compatibility. Common issues arise from missing components, improper character usage, or exceeding length constraints, as detailed in standards like RFC 3696, which imposes practical limits on address components to ensure compatibility with SMTP protocols.^[38] One frequent syntax error is the absence of a domain after the "@" symbol, as in "user@", which fails because the addr-spec requires a non-empty domain following the separator.^[7] Similarly, while "user@domain" is syntactically valid as a single-label domain per RFC 5322, it is practically invalid because it lacks a top-level domain (TLD) required for DNS resolution in Internet mail systems.^[7] Another basic violation occurs with unquoted spaces in the local-part, such as "user [email protected]", since spaces are not permitted in dot-atom form without enclosing quotes, and quoted-strings must properly escape such characters.^[21] More explicit syntax violations include multiple "@" symbols, like "user@@domain.com", which contravenes the single-separator rule in the addr-spec definition, allowing only one "@" between local-part and domain.^[7] Consecutive dots in the domain, as in "[email protected]", are prohibited because the dot-atom syntax mandates at least one atext character (letters, digits, or specified specials) between dots.^[21] Addresses exceeding 254 characters in total length, such as a contrived local-part of 200 characters followed by a long domain, are invalid due to SMTP command length 
restrictions clarified in errata for RFC 3696 and aligned with RFC 5321's path limits. Deprecated or non-standard forms further illustrate invalidity under modern rules. For instance, a domain starting with a dot, like "[email protected]", violates the dot-atom requirement that every label begin with atext, not a period.^[39] Although "[email protected]" is syntactically valid since digits are allowed in atext, some legacy systems do not accept numeric-only local-parts, so delivery may fail where mailbox names are required to contain letters.^[21] Additionally, an address with an embedded comment, like "user(comment)@domain.com", is best treated as invalid in practice: the RFC 5322 grammar still permits comments but states that they should not be used around the "@", and the SMTP mailbox syntax in RFC 5321 has no provision for comments at all.^[39]

Invalid Example	Reason for Invalidity	Relevant RFC Reference
user@	Missing domain after "@"	RFC 5322, Section 3.4.1^[7]
user@domain	Lacks TLD (syntactically valid but practically invalid for DNS resolution)	RFC 5322, Section 3.4.1^[7]
user [email protected]	Unquoted space in local-part	RFC 5322, Section 3.2.3^[21]
user@@domain.com	Multiple "@" symbols	RFC 5322, Section 3.4.1^[7]
[email protected]	Consecutive dots in domain	RFC 5322, Section 3.2.3^[21]
[Very long address exceeding 254 chars]@domain.com	Exceeds total length limit	RFC 3696 Errata
[email protected]	Leading dot in domain	RFC 5322, Section 4^[39]
user(comment)@domain.com	Comment adjacent to "@" (discouraged by RFC 5322 and not accepted in SMTP mailbox syntax)	RFC 5322, Section 3.4.1^[7]

Internationalized Email Addresses

Internationalized email addresses incorporate non-ASCII characters from various scripts and languages, enabling users worldwide to employ native writing systems in both the local-part and domain components. These addresses conform to standards that extend traditional ASCII-based email syntax, allowing Unicode characters while maintaining compatibility with existing infrastructure. For instance, domains with accented or non-Latin characters are encoded using the Internationalizing Domain Names in Applications (IDNA) protocol, which converts them to Punycode for DNS resolution.^[40] A common example involves an IDNA domain, such as user@exämple.com, where the domain "exämple.com" is represented in Punycode as xn--exmple-cua.com to ensure ASCII compatibility in the Domain Name System (DNS). This format supports internationalized domain names (IDNs) by mapping Unicode labels to ASCII-compatible encoding (ACE) strings prefixed with "xn--". Similarly, an ASCII local-part paired with a non-ASCII domain, like user@dömäin.tld, uses the Punycode equivalent xn--dmin-moa0i.tld for the domain, demonstrating mixed-language support in email routing.^[40] The local-part can also include Unicode characters when the Simple Mail Transfer Protocol (SMTP) server supports the SMTPUTF8 extension, which permits UTF-8 encoding throughout the email transmission process. For example, café@domain.com or é[email protected] are valid under this extension, as it expands the allowable characters in the local-part beyond ASCII while preserving quoted or bracketed structures from earlier standards. Without SMTPUTF8, such addresses may fail delivery, as legacy systems expect ASCII-only local-parts.^[41] Fully internationalized addresses combine Unicode in both parts, such as π@δóμäïň.com, where the local-part uses the Greek letter pi (π) and the domain incorporates accented Latin characters along with the Greek letters delta (δ) and mu (μ).
For DNS resolution the domain is converted to its Punycode A-label (an ASCII string prefixed with "xn--"), and the entire address requires SMTPUTF8 for transport to handle the non-ASCII local-part. Another illustration is 您好@example.com, featuring Chinese characters in the local-part (U+60A8 U+597D), which is supported in contexts like X.509 certificates for email verification. These examples highlight how internationalized addresses facilitate global communication but depend on end-to-end UTF-8 support to avoid downgrading or rejection.^[41]

Validation and Verification

Syntax Validation

Syntax validation of an email address involves verifying its format against established standards, such as those defined in RFC 5322, without performing any network queries or existence checks. This process ensures the address adheres to syntactic rules for the local-part (before the @ symbol) and the domain (after the @ symbol), focusing on character sets, lengths, and structural elements. The primary goal is to identify malformed addresses early, preventing errors in applications like user registration or data entry forms. Regex-based validation is a common approach, using regular expressions to match the complex patterns outlined in RFC 5322. For the local-part, which can include up to 64 characters of letters, digits, and special symbols like dots (.), hyphens (-), and quoted strings for unusual characters, a comprehensive regex might incorporate escaped characters and domain literals (e.g., [IPv4-address]). The domain portion requires patterns for dot-separated labels, each consisting of 1-63 characters from letters, digits, and hyphens, excluding leading or trailing hyphens. An example regex for basic validation could be ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$, but more robust implementations account for RFC 5322's allowances like comments (in parentheses) and folding whitespace, though these are rarely used in practice. Such patterns are derived directly from the RFC's ABNF (Augmented Backus-Naur Form) grammar for the addr-spec production rule. Algorithmic checks provide an alternative or complementary method, parsing the address step-by-step rather than relying on a single regex. This begins by locating the @ symbol, ensuring exactly one occurrence and that it is neither at the start nor end of the string. The local-part is then validated for length (up to 64 octets) and permissible characters, including checking for properly quoted sections if present (e.g., "user name"@example.com). 
For the domain, the string is split by dots to verify each label's length (1-63 characters) and composition, confirming it ends with a top-level domain of at least two characters and disallowing consecutive dots or dots at the beginning or end. These checks align with RFC 5321 for SMTP envelope syntax but are applied locally without transmission. Tools implementing this often use state machines or recursive descent parsers for accuracy. Programming libraries and tools facilitate syntax validation in various languages, balancing strict adherence to standards with practical usability. In Python, the standard library's email.utils.parseaddr function splits a header value into a display name and an address but performs little validation of its own, returning a pair of empty strings rather than raising an exception when parsing fails; third-party packages such as validate_email on PyPI apply fuller checks modeled on RFC 5322. Validators of this kind typically offer a strict mode (rejecting non-ASCII without quoting) and a lenient mode (accepting common real-world variations). Similarly, Java's javax.mail.internet.InternetAddress class validates via its constructor, throwing AddressException for syntax errors and offering options for lenient parsing to handle legacy or internationalized addresses. Strict parsing ensures compliance but may reject valid yet uncommon formats like those with comments, while lenient approaches improve user experience by accepting nearly all addresses seen in practice at the cost of potential false positives. Pros of library use include built-in handling of edge cases and updates for standard revisions, whereas cons involve dependency on specific implementations that might not cover all RFC nuances. Common pitfalls in syntax validation arise from oversimplification or misunderstanding of the standards. A frequent error is using basic regex patterns like ^[\w\.-]+@[\w\.-]+\.[\w]{2,}$, which fail to handle quoted local-parts (e.g., "O'Brien"@example.com) or international characters without proper encoding, leading to rejection of valid addresses.
Another issue is ignoring domain length limits or allowing invalid top-level domains, as domains must conform to DNS rules where labels avoid certain reserved characters. Additionally, validators might overlook the distinction between display names and actual addresses in full RFC 822-style strings (e.g., User [email protected]), parsing only the addr-spec. These errors can result in high false negative rates for simplistic checkers compared to full RFC compliance, emphasizing the need for comprehensive testing against diverse examples.
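The step-by-step algorithmic check described in this section can be sketched in Python as follows. It is a simplified illustration, not a full RFC 5322 parser: it handles only unquoted ASCII addresses, applies the practical rule that the top-level domain has at least two characters, and uses a hypothetical function name and reason strings.

```python
import re
import string

# Characters allowed in an unquoted local-part (atext plus the dot).
LOCAL_CHARS = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")
# One DNS label: 1-63 letters, digits, or hyphens, with no hyphen at either end.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_address(address: str) -> tuple[bool, str]:
    """Return (ok, reason) for an unquoted ASCII address; no network access."""
    # Step 1: exactly one "@", neither first nor last.
    if address.count("@") != 1:
        return False, "expected exactly one @"
    local, domain = address.split("@")
    if not local or not domain:
        return False, "empty local-part or domain"
    # Step 2: local-part length, dot placement, and character set.
    if len(local) > 64:
        return False, "local-part longer than 64 octets"
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False, "misplaced dot in local-part"
    if not set(local) <= LOCAL_CHARS:
        return False, "character not allowed in unquoted local-part"
    # Step 3: every dot-separated label must be well formed (empty labels fail).
    labels = domain.split(".")
    if len(domain) > 255 or not all(LABEL.fullmatch(label) for label in labels):
        return False, "invalid domain label"
    # Step 4: practical rule from the text, a TLD of at least two characters.
    if len(labels) < 2 or len(labels[-1]) < 2:
        return False, "missing or too-short top-level domain"
    return True, "ok"


for candidate in ["user.name@example.com", "user@", "user@@example.com", "user@example..com"]:
    print(candidate, check_address(candidate))
```

Returning a reason alongside the verdict makes it easier to show a useful message in a form, which a single regular expression cannot do.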

Existence Verification

Existence verification refers to methods used to determine whether an email address corresponds to an active mailbox that can receive messages, focusing on deliverability and user activity rather than format alone. A common technique is SMTP probing, which involves initiating a connection to the recipient's mail server and sending the RCPT TO command as defined in the Simple Mail Transfer Protocol (SMTP). This command specifies the recipient address, and the server responds with codes indicating acceptance or rejection; for instance, a 250 OK response signifies that the server will accept mail for the mailbox, while a 550 response (e.g., "User unknown") indicates the address does not exist.^[42] The probe simulates the early stages of email transmission (connecting via the domain's MX record, greeting the server, and naming the recipient) without sending a message body, thereby testing server-side confirmation of the address.^[43] Confirmation by email, also known as double opt-in or confirmed opt-in, provides an interactive check by sending a verification email to the address and requiring the recipient to respond, typically by clicking a link or replying with a code. This method verifies not only existence but also the user's intent and control over the mailbox, as the address is not activated until confirmation is received. In practice, after an initial signup, an automated confirmation email is dispatched with clear instructions for action, which supports compliance with regulations like CAN-SPAM and improves list quality by filtering out invalid or mistyped entries.^[44] Third-party services offer automated existence verification through APIs, often combining SMTP probing with proprietary checks to validate addresses at scale.
For example, Hunter.io's Email Verifier performs an SMTP test to assess if the address exists by simulating a server handshake, alongside domain and database lookups, achieving high accuracy for business emails.^[45] Similarly, NeverBounce integrates SMTP validation within its 20+ step process, conducted from multiple global locations to confirm deliverability and reduce bounces, supporting integrations with over 85 platforms.^[46]^[47] These tools are widely used in marketing to clean lists, but they raise privacy concerns, as probing can inadvertently expose valid addresses to unauthorized parties or facilitate spam if data is mishandled.^[48] Despite their utility, these methods have significant limitations. Catch-all domains, configured to accept emails for any local-part (e.g., *@example.com routes all to a single inbox), produce false positives by returning acceptance codes for non-existent addresses, complicating accurate verification.^[49] Anti-spam protections further hinder probing; many servers disable or restrict RCPT TO responses since the late 1990s to prevent address enumeration by spammers, often returning generic errors or temporary failures (e.g., 450 codes). High-volume probes can trigger rate limiting, firewalls, or blacklisting, rendering services unreliable over time and potentially damaging the verifier's IP reputation.^[48]
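The way a probe's reply code might be interpreted, including the catch-all caveat, can be sketched as a small Python function. The function name and verdict labels are hypothetical, and no network connection is made; the reply code is assumed to come from an earlier RCPT TO exchange.

```python
def classify_rcpt_reply(code: int, catch_all: bool = False) -> str:
    """Map an SMTP RCPT TO reply code to a rough verification verdict."""
    if 200 <= code < 300:
        # Acceptance proves little when the domain accepts every local-part.
        return "unknown" if catch_all else "accepted"
    if 400 <= code < 500:
        # Temporary failure (e.g. 450): possibly rate limiting or anti-spam delay.
        return "retry-later"
    if 500 <= code < 600:
        # Permanent failure (e.g. 550 "User unknown").
        return "rejected"
    return "unknown"


print(classify_rcpt_reply(250))                  # accepted
print(classify_rcpt_reply(250, catch_all=True))  # unknown
print(classify_rcpt_reply(450))                  # retry-later
print(classify_rcpt_reply(550))                  # rejected
```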

Internationalization

IDNA and Domain Internationalization

The Internationalizing Domain Names in Applications (IDNA) protocol enables the use of non-ASCII characters in domain names by defining a mechanism to map Unicode strings to ASCII-compatible encodings, ensuring compatibility with the Domain Name System (DNS).^[50] Specified in RFC 5890 through RFC 5894, IDNA2008 (the current version) replaces the earlier IDNA2003 framework and relies on Punycode for the actual encoding process.^[50] Under IDNA, a domain label containing Unicode characters—known as a U-label—is converted to an A-label, which is an ASCII string prefixed with "xn--" and encoded in Punycode, allowing it to be stored and resolved in the DNS without modifications to the underlying infrastructure.^[51] Punycode, detailed in RFC 3492, is a bootstring encoding algorithm that transforms a Unicode string into a representation using only ASCII letters, digits, and hyphens, preserving the original string's order and length constraints.^[52] The process separates basic ASCII characters (which remain unchanged) from non-ASCII ones, then encodes the latter using a base-36 numbering system with a delimiter ("-") to indicate the insertion point for the encoded portion.^[52] For example, the Unicode label "café" (where "é" is U+00E9) encodes to the A-label "xn--caf-dma", which can then be used in DNS queries.^[52] This encoding ensures reversibility: decoding an A-label yields the original U-label in Unicode Normalization Form C (NFC).^[50] In the context of email addresses, IDNA integration occurs at the DNS level, where MX records for internationalized domains are registered and resolved using A-label forms.^[50] SMTP protocols, as defined in RFC 5321, require domain names in commands like MAIL FROM and RCPT TO to be in ASCII, so applications must convert U-labels to A-labels before performing DNS lookups for MX records.^[42] This means email servers and clients need IDNA-aware implementations to handle the conversion; otherwise, resolution fails for non-ASCII 
domains.^[51] Browser and server support for IDNA has become widespread, with modern systems automatically applying Punycode encoding during domain registration and resolution.^[50] IDNA imposes several limitations to ensure security and stability, including validity checks that prohibit certain Unicode code points classified as DISALLOWED in RFC 5892, such as many punctuation marks and symbols that could lead to confusion or attacks. For right-to-left (RTL) scripts like Arabic or Hebrew, RFC 5893 defines bidirectional rules to mitigate visual spoofing risks: every label must begin with a character of Bidi class L, R, or AL, and a label that begins with R or AL (an RTL label) must end with R, AL, EN, or AN, optionally followed by non-spacing marks, must not mix European (EN) and Arabic (AN) digits, and must not contain left-to-right (L) characters.^[53] These rules prevent unrestricted RTL usage in domains, requiring strict validation during encoding to avoid invalid labels that could be rejected by IDNA-aware software.^[53]
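The U-label to A-label conversion can be tried with Python's built-in codecs, as in the sketch below. One caveat is that Python's "idna" codec implements the older IDNA2003 rules rather than IDNA2008, so results can differ for some characters; the helper name is hypothetical.

```python
# Punycode alone: encode a single label, without the "xn--" prefix.
print("café".encode("punycode"))  # b'caf-dma'


def to_a_label_domain(domain: str) -> str:
    """Convert a Unicode domain to its ASCII (A-label) form."""
    # Note: the built-in "idna" codec follows IDNA2003, not IDNA2008.
    return domain.encode("idna").decode("ascii")


print(to_a_label_domain("café.example"))  # xn--caf-dma.example
print(to_a_label_domain("exämple.com"))   # xn--exmple-cua.com

# Decoding an A-label recovers the Unicode form.
print(b"xn--caf-dma".decode("idna"))      # café
```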

Local-part Internationalization and SMTPUTF8

The base specification for the Simple Mail Transfer Protocol (SMTP), RFC 5321, restricts the local-part of email addresses to ASCII characters, explicitly prohibiting non-ASCII octets (those with the high-order bit set to 1) and ASCII control characters (decimal values 0-31 and 127).^[26] This limitation confines usernames to the basic Latin alphabet, numerals, and a limited set of symbols, creating significant challenges for international users who wish to employ native scripts such as Cyrillic, Arabic, or Chinese characters in their email addresses.^[26] As global internet usage expands beyond English-speaking regions, this ASCII-only constraint hinders email accessibility, cultural inclusivity, and the ability to create personalized, linguistically appropriate usernames.^[41]

To overcome these restrictions, RFC 6531 defines the SMTPUTF8 extension, which extends SMTP to support the transport and delivery of email messages containing internationalized addresses and header information encoded in UTF-8.^[41] This extension permits UTF-8 characters in the local-part of mailbox addresses (i.e., before the "@" symbol) and in header fields, while domain names continue to follow Internationalized Domain Names for Applications (IDNA) and are converted to their ASCII-compatible form for DNS lookups.^[41] Servers implementing SMTPUTF8 must advertise their capability by including the "SMTPUTF8" keyword, without parameters, in the response to the client's EHLO command, informing the sender that non-ASCII content can be transmitted without modification.^[41] Without this advertisement, clients are prohibited from sending internationalized messages to avoid delivery failures.^[41]

Server implementation of SMTPUTF8 involves several key requirements to ensure reliable handling of UTF-8 content. Servers must validate UTF-8 syntax in mailbox local-parts and headers, perform IDNA-compliant domain lookups, and store messages using UTF-8 encoding, typically in conjunction with the 8BITMIME extension (RFC 6152) to support 8-bit data in message bodies.^[41] No inspection of the message body for non-ASCII content is mandated, but servers should reject invalid UTF-8 sequences with appropriate error codes, such as 553 for mailbox issues.^[41] In cases where the receiving server does not support SMTPUTF8, sending clients must not attempt to transmit the internationalized message and should either reject it (e.g., with a 550 or 553 response) or, if configured, downgrade the message to an ASCII-compatible form, though the latter risks data loss and is discouraged.^[41]

Adoption of the SMTPUTF8 extension remains partial and uneven across the email ecosystem. Major providers such as Google Workspace (including Gmail) have supported SMTPUTF8 since 2014, enabling users to send and receive emails with UTF-8 local-parts. Similarly, Microsoft has integrated support in Exchange Server 2019 and later, as well as in Microsoft 365 environments.^[54] However, legacy systems, on-premises deployments of older Exchange versions, and many smaller or regional providers continue to lack compatibility, resulting in bounces and errors for internationalized messages, such as the common "SMTPUTF8 is required, but was not offered" rejection.^[55] Recent developments, including ICANN's addition of Email Address Internationalization (EAI) support to its ICANN Account service in July 2025 and the Universal Acceptance Steering Group's (UASG) FY2025-2029 strategic plan focusing on governments and providers, signal increasing momentum, though global uptake was limited to approximately 10% of domains as of 2021, with ongoing efforts to accelerate deployment in multilingual regions.^[56]^[57]
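The EHLO advertisement rule can be illustrated with a small Python sketch. It assumes the multi-line EHLO reply has already been read into a string; the function names (`ehlo_keywords`, `may_transmit`) and the sample replies are hypothetical, and the check looks only at the local-part of each address, on the assumption that a non-ASCII domain has already been converted to its ASCII-compatible form.

```python
def ehlo_keywords(ehlo_reply: str) -> set[str]:
    """Collect the extension keywords from a multi-line 250 EHLO reply."""
    keywords = set()
    # The first reply line carries the server's domain and greeting, not a keyword.
    for line in ehlo_reply.splitlines()[1:]:
        if not line.startswith("250") or len(line) < 5:
            continue
        fields = line[4:].split()  # text after "250-" or "250 "
        if fields:
            keywords.add(fields[0].upper())
    return keywords


def needs_smtputf8(address: str) -> bool:
    """True if the local-part (text before the last '@') contains non-ASCII characters."""
    local_part = address.rpartition("@")[0]
    return not local_part.isascii()


def may_transmit(ehlo_reply: str, addresses: list[str]) -> bool:
    """Allow transmission unless an internationalized address meets a server without SMTPUTF8."""
    if not any(needs_smtputf8(addr) for addr in addresses):
        return True
    return "SMTPUTF8" in ehlo_keywords(ehlo_reply)


# Hypothetical replies from two servers, one advertising SMTPUTF8 and one not.
MODERN = "250-mx.example.net Hello\r\n250-8BITMIME\r\n250-SMTPUTF8\r\n250 PIPELINING"
LEGACY = "250-mx.example.org Hello\r\n250-8BITMIME\r\n250 PIPELINING"

print(may_transmit(MODERN, ["\u7528\u6237@example.com"]))  # True
print(may_transmit(LEGACY, ["\u7528\u6237@example.com"]))  # False
print(may_transmit(LEGACY, ["user@example.com"]))          # True
```

A real client would also consider non-ASCII header content when deciding whether the extension is needed, and would add the SMTPUTF8 parameter to the MAIL command; Python's standard `smtplib` exposes the advertised extensions through `SMTP.has_extn()`.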

History and Evolution

Early Development

The development of email addresses began in the context of the ARPANET, the precursor to the modern Internet, where early messaging systems required a way to specify recipients across networked computers. In 1971, Ray Tomlinson, working at Bolt, Beranek and Newman (BBN), implemented the first program to send electronic mail between users on different ARPANET hosts using the TENEX operating system. He introduced the "@" symbol as a separator to denote "user at host," creating the foundational format of user@host to distinguish the recipient's identifier from the destination machine. This choice of the "@" was arbitrary among available non-alphanumeric symbols on the keyboard, but it quickly became the standard delimiter for network email addressing.^[58]

Early standardization efforts followed to address inconsistencies in mail headers and formats across ARPANET systems. RFC 561, published in September 1973 by Abhay Bhushan, Ken Pogran, Ray Tomlinson, and Jim White, proposed uniform network mail headers, defining fields such as "FROM: <user> AT <host>" (e.g., "White at SRI-ARC") for the sender's address, alongside "DATE" and "SUBJECT" to facilitate processing and routing of messages transmitted via FTP commands like MAIL or MLFL. This document emphasized a single occurrence of core headers per message and assumed ASCII characters, though it used textual "AT" in examples while supporting the "@" symbol in practice. Building on this, RFC 733 in November 1977, authored by David Crocker, John Vittal, Kenneth Pogran, and David Henderson, formalized the email address syntax as a host-phrase combining a user phrase with a host-indicator using "@" or "at" (e.g., "Neuman@BBN-TENEXA"). It introduced support for hierarchical routing paths with multiple "@" signs (e.g., "User@hosta@local-net1@major-net") and explicitly restricted characters to the 7-bit ASCII set used by TELNET (whose printable characters occupy codes 32-126 decimal), establishing an ASCII-only assumption that persisted in early implementations. By 1973, email had already become dominant on ARPANET, comprising 75% of network traffic, underscoring its rapid adoption among researchers.^[59]^[60]^[2]

The evolution toward more scalable addressing culminated in RFC 822, published in August 1982 by David Crocker, which adapted the ARPANET standards for the broader ARPA Internet. This specification replaced simple hostnames with hierarchical domain names, defining the address as local-part@domain where the domain is a dot-separated sequence of sub-domains (e.g., "user@example.com"). It eliminated multi-"@" paths in favor of source routing via separate mechanisms, making addresses more logical and extensible for internetwork use while preserving the local-part's case sensitivity and uninterpreted nature by intermediate systems. RFC 822 retained the ASCII character restriction, focusing on printable US-ASCII for compatibility.

Concurrently, email addressing spread beyond ARPANET through systems like UUCP (Unix-to-Unix Copy Protocol), introduced in the late 1970s for dial-up Unix networks, which initially used "bang-path" notation (e.g., "host1!host2!user") but increasingly integrated with Internet-style @ addresses for interoperability in the early 1980s, enabling wider adoption in academic and research communities.^[3]

Key Standards and Updates

In the early 2000s, significant updates to email standards addressed evolving internet infrastructure and clarified ambiguities in prior specifications. RFC 2821, published in April 2001, updated the Simple Mail Transfer Protocol (SMTP) by obsoleting RFC 821 and refining transport rules for greater reliability, including deprecation of obsolete features like source routing to simplify routing paths. Complementing this, RFC 2822 from the same year revised the Internet Message Format, obsoleting RFC 822 and specifying a stricter syntax for email headers and addresses, such as classifying comments and white space between the dot-separated elements of an address as obsolete syntax that should not be generated.

By 2008, further refinements established the current core standards. RFC 5321 updated SMTP once more, obsoleting RFC 2821 and emphasizing robustness against malformed inputs while further deprecating source routing, a legacy feature from RFC 821 that allowed explicit relay paths like <@host1,@host2:user@domain> but was deemed unnecessary with modern DNS-based routing. Similarly, RFC 5322 updated the message format, obsoleting RFC 2822 and providing a more precise Address Specification syntax to handle edge cases in local-parts and domains, ensuring better interoperability. These documents remain the foundational references for email address handling in SMTP and message composition.

Internationalization efforts advanced in the 2010s, enabling non-ASCII characters in email addresses. The IDNA2008 protocol, detailed in RFC 5890 (definitions and framework), RFC 5891 (protocol), and related documents like RFC 5892 (code points), updated domain name handling to support Unicode via Punycode encoding (e.g., converting "café.com" to "xn--caf-dma.com"), superseding the earlier IDNA2003 for broader script compatibility. Building on this, RFC 6530 (2012) provided an overview framework for internationalized email, while RFC 6531 extended SMTP with the SMTPUTF8 mechanism to transport messages containing UTF-8 addresses natively, without downgrading them to ASCII-only forms, and RFC 6532 updated the message format to allow UTF-8 in header fields and local-parts.

As of 2025, email address standards have seen no major overhauls, with activity limited to errata corrections and applicability statements clarifying legacy provisions in RFC 5321 and 5322. Efforts have shifted toward security enhancements, such as DMARC (RFC 7489, 2015), which integrates with address validation to mitigate spoofing by specifying domain-level policies for authentication failure handling. Ongoing revisions in the IETF emailcore working group address minor clarifications, but core syntax remains stable.
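The Punycode conversion mentioned above ("café.com" to "xn--caf-dma.com") can be reproduced with Python's built-in `punycode` codec. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the code performs only NFC normalization, lowercasing, and the raw encoding step, omitting the IDNA2008 validity and Bidi checks that a conforming implementation would apply.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded (A-label) domain label


def to_a_label(label: str) -> str:
    """Convert a single domain label to its ASCII-compatible form."""
    label = unicodedata.normalize("NFC", label).lower()
    if label.isascii():
        return label  # plain ASCII labels are left unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label: str) -> str:
    """Convert an A-label back to Unicode; other labels pass through unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(domain: str) -> str:
    """Apply to_a_label to each dot-separated label of a domain."""
    return ".".join(to_a_label(part) for part in domain.split("."))


print(domain_to_ascii("café.com"))  # xn--caf-dma.com
print(to_u_label("xn--caf-dma"))    # café
```

Python also ships an `idna` codec, but it implements the older IDNA2003 rules, so this sketch calls the `punycode` codec directly and leaves IDNA2008 validation to a dedicated library.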

References

[1]
RFC 5322 - Internet Message Format - IETF Datatracker
RFC 5322 specifies the Internet Message Format (IMF), a syntax for text messages sent between computer users in electronic mail.
[2]
RFC 2235: Hobbes' Internet Timeline
[3]
The Origins of E-mail - Stanford Computer Science
Sep 17, 1999 · The great number of mail-reading programs on the ARPANET brought up the need to develop a standard format for email, as it was necessary that ...
[4]
RFC 822 - STANDARD FOR THE FORMAT OF ARPA INTERNET ...
1. DOMAINS A name-domain is a set of registered (mail) names. · 2. ABBREVIATED DOMAIN SPECIFICATION Since any number of levels is possible within the domain ...
[5]
RFC 5598 - Internet Mail Architecture - IETF Datatracker
RFC 5598 describes the enhanced Internet Mail architecture, aiming to provide a common view of the complex system and its components.
[6]
RFC 6530 - Overview and Framework for Internationalized Email
This document introduces a series of specifications that define mechanisms and protocol extensions needed to fully support internationalized email addresses.
[7]
https://datatracker.ietf.org/doc/html/rfc5322#section-3.4.1
[8]
https://datatracker.ietf.org/doc/html/rfc5598#section-3.2
[9]
https://lemelson.mit.edu/resources/ray-tomlinson
[10]
Ray Tomlinson - Lemelson-MIT
In 1971, he developed ARPANET's first application for network email by combining the SNDMSG and CPYNET programs, allowing messages to be sent to users on other ...
[11]
The History of Email - The Cloudflare Blog
Sep 23, 2017 · The format for defining how a message should be transmitted (and often how it would be stored on disk) was first standardized in 1977: Date : 27 ...
[12]
https://datatracker.ietf.org/doc/html/rfc5321#section-4.1.1.2
[13]
https://datatracker.ietf.org/doc/html/rfc5321#section-4.1.1.3
[14]
https://datatracker.ietf.org/doc/html/rfc5321#section-2.3.1
[15]
https://datatracker.ietf.org/doc/html/rfc5321#section-5.1
[16]
https://datatracker.ietf.org/doc/html/rfc5321#section-2.3.9
[17]
https://datatracker.ietf.org/doc/html/rfc5321#section-3.7
[18]
https://datatracker.ietf.org/doc/html/rfc5321#section-4.2.2
[19]
https://datatracker.ietf.org/doc/html/rfc5321#section-6.1
[20]
https://datatracker.ietf.org/doc/html/rfc5321#section-2.3.11
[21]
https://datatracker.ietf.org/doc/html/rfc5322#section-3.2.3
[22]
https://datatracker.ietf.org/doc/html/rfc5322#section-3.2.4
[23]
https://datatracker.ietf.org/doc/html/rfc5322#section-4.4
[24]
https://datatracker.ietf.org/doc/html/rfc5321#section-4.5.3.1.1
[25]
https://datatracker.ietf.org/doc/html/rfc5321#section-2.4
[26]
https://datatracker.ietf.org/doc/html/rfc5321#section-4.1.2
[27]
https://datatracker.ietf.org/doc/html/rfc5321
[28]
RFC 5321 - Simple Mail Transfer Protocol - IETF Datatracker
RFC 5321 specifies the basic protocol for Internet electronic mail transport, aiming to transfer mail reliably and efficiently.
[29]
RFC 1035 - Domain names - implementation and specification
MX records cause type A additional section processing for the host specified by EXCHANGE. The use of MX RRs is explained in detail in [RFC-974]. 3.3.10 ...
[30]
RFC 974 - Mail routing and the domain system - IETF Datatracker
This RFC presents a description of how mail systems on the Internet are expected to route messages based on information from the domain system described in ...
[31]
RFC 5233 - Sieve Email Filtering: Subaddress Extension
RFC 5233 defines a Sieve extension to compare against user and detail sub-parts of email addresses, which are added to the local-part of an address.
[32]
Plus Addressing in Exchange Online | Microsoft Learn
May 17, 2024 · Plus addressing, or subaddressing, uses a syntax like <local-part>+<tag>@<domain> for unique, receive-only, disposable email addresses. It's ...
[33]
RFC 5322: Internet Message Format
[34]
Plus Addressing (Email Address Tagging) - SpamHero
Feb 20, 2024 · SpamHero supports the following delimiter characters for email address tagging: Plus (+); Equals (=); Hyphen (-). In other words, SpamHero ...
[35]
What is plus addressing? - Verifalia
Plus addressing, also known as subaddressing, is a simple but powerful feature offered by many email providers like Gmail, Yahoo, and Microsoft Outlook.
[36]
Plus Addressing: The Best Way to Track Spammers in 2025
Sep 3, 2024 · Plus addressing is a fast, easy, and convenient way to manage your inbox and track spammers. Essentially, it allows you to use different versions of the same ...
[37]
Understanding Email Aliasing: Which Providers Support email+alias ...
Oct 21, 2025 · 1. Gmail (Google). Support: Yes; Details: Gmail fully supports plus addressing. · 2. Proton Mail. Support: Yes · 3. Fastmail. Support: Yes · 4.
[38]
What are the security reasons for disallowing the plus sign in email ...
Aug 12, 2014 · There is no security vulnerability with the plus sign in email addresses. The issue is often due to whitelisting, broken regular expressions, ...
[39]
RFC 3696 - Application Techniques for Checking ... - IETF Datatracker
... length limit on email addresses. That limit is a maximum of 64 characters (octets) in the "local part" (before the "@") and a maximum of 255 characters ...
[40]
https://datatracker.ietf.org/doc/html/rfc5890
[41]
RFC 5890 - Internationalized Domain Names for Applications (IDNA)
This document is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications ...
[42]
RFC 6531: SMTP Extension for Internationalized Email
This document specifies an SMTP extension for transport and delivery of email messages with internationalized email addresses or header information.
[43]
RFC 5321: Simple Mail Transfer Protocol
That is, the SMTP client SHOULD use the command sequence: MAIL, RCPT, RCPT, ..., RCPT, DATA instead of the sequence: MAIL, RCPT, DATA, ..., MAIL, RCPT, DATA.
[44]
How does email verification work using SMTP? - Email Hippo
Aug 13, 2019 · We 'ping' a mail server to check the status of the inbox associated with an email address. The response received should follow the SMTP rules ( ...
[45]
What Is Double Opt-in in Email for List Building? - Twilio
Nov 19, 2024 · Double opt-in verifies that your contact wants your marketing emails before you officially add the contact to your list.
[46]
What checks are performed on an email with the Email Verifier?
SMTP Check: We test the email address to see if it exists. In case the SMTP server prevents us from performing this verification, we will mark the email as " ...
[47]
What does the verification process include? - Neverbounce.com
Mar 28, 2024 · Our proprietary 20+ step verification process checks each email up to 75 times from different locations around the globe.
[48]
NeverBounce Software Reviews, Demo & Pricing - 2025
The platform lets professionals utilize various proprietary methods and Simple Mail Transfer Protocol (SMTP) capabilities to validate email servers. NeverBounce ...
[49]
Dubious merits of email verification services - Spamhaus
Apr 30, 2015 · Email verification services must operate with a strong policy which prohibits listwashing, trap washing and related spam support services, and ...
[50]
What Are Catch-All Emails? Benefits, Risks & Strategies - ZeroBounce
Feb 23, 2025 · Catch-all emails are accounts designed to receive all messages sent to a domain, even to incorrect addresses, to ensure all emails are received.
[51]
RFC 5890: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
[52]
https://www.rfc-editor.org/rfc/rfc3492.html
[53]
RFC 3492: Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
[54]
RFC 5893: Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
[55]
Support UTF-8 in localpart of an email address for mail submission ...
Sep 26, 2014 · RFC 6531 - SMTP Extension for Internationalized Email (SMTPUTF8); RFC 6532 - Internationalized Email Headers; RFC 6533 - Internationalized ...
[56]
Support for SMTPUTF8 - Microsoft Q&A
May 14, 2021 · It's available only in Exchange 2019 OnPrem onwards and Microsoft 365. The SMTPUTF8 is likely not supported in the OnPrem version of Exchange 2016 and earlier.
[57]
SMTPUTF8 Errors and Bad Encoding: A Story - Spam Resource
Mar 26, 2025 · To properly handle those, the sending and receiving mail servers need to support the "SMTPUTF8" extension, defined in RFC 6531. In this case ...
[58]
ICANN Achieves Key Milestone in Universal Acceptance
Jul 2, 2025 · ICANN Account now supports internationalized email addresses, also known as Email Address Internationalization (EAI).
[59]
Beyond ASCII: The Vital Role of Email Address Internationalization ...
Mar 27, 2023 · Email Address Internationalization (EAI) is an open standard for allowing non-ASCII characters, such as Arabic, Chinese, Cyrillic or Devanagari, in email ...
[60]
Internet History of 1970s
At BBN, Ray Tomlinson writes a program to enable electronic mail to be sent over the ARPANET. It is Tomlinson who develops the 'user@host' convention ...
[61]
RFC 561 - Standardizing Network Mail Headers - IETF Datatracker
RFC 561 - Standardizing Network Mail Headers. This RFC is labeled as "Legacy"; it was published before a formal source was recorded. This RFC is not endorsed ...
[62]
RFC 733 - Standard for the format of ARPA network text messages
This standard specifies a syntax for text messages which are passed between computer users within the framework of electronic mail.
[63]
[PDF] Email Innovation Timeline
Aug 18, 2022 · ARPANET email message format standard RFC 733 is replaced by the Internet standard,. RFC 822. The DARPA-sponsored standard for the format of ...