Secure Real-time Transport Protocol

The Secure Real-time Transport Protocol (SRTP) is a profile of the Real-time Transport Protocol (RTP) that adds security services to RTP and RTP Control Protocol (RTCP) packets, including confidentiality through encryption of payloads, message authentication for integrity protection, and defenses against replay attacks, enabling secure transmission of real-time media such as audio and video streams.^[1] Developed by the Internet Engineering Task Force (IETF) Audio/Video Transport (AVT) working group, SRTP was standardized in RFC 3711 in March 2004 to address the vulnerabilities of unsecured RTP in applications like Voice over IP (VoIP) and multimedia conferencing, where eavesdropping, tampering, or packet replay could compromise communications.^[1]

SRTP operates by transforming RTP/RTCP packets at the sender and verifying them at the receiver, using a master key to derive session-specific keys for cryptographic operations while minimizing computational and packet overhead to suit bandwidth-constrained environments, including both wired and wireless networks.^[1] Key security mechanisms include optional payload encryption via stream ciphers like AES in counter mode, authentication through a truncated HMAC-SHA-1 (default 80-bit tag), and replay protection via window-based checks on RTP sequence numbers combined with rollover counters.^[1] It supports unicast and multicast scenarios, with mandatory integrity protection for RTCP (SRTCP) but optional confidentiality, allowing flexibility for different threat models.^[1]

SRTP is integrated into signaling protocols for key establishment, such as Session Description Protocol (SDP) with Security Descriptions (SDES) for exchanging master keys in SIP sessions or Datagram Transport Layer Security (DTLS-SRTP) for mutual authentication and key agreement without prior shared secrets.^[2]^[3] In modern applications, SRTP is mandatory for media transport in WebRTC, the framework for browser-based real-time communication, where it combines with the RTP/SAVPF profile to enforce encryption and integrity for all RTP/RTCP traffic.^[4] It is also widely adopted in SIP-based VoIP systems and 3GPP multimedia telephony, providing a standardized foundation for securing real-time internet communications against common threats.^[5]^[6]

Introduction and Background

Definition and Purpose

The Secure Real-time Transport Protocol (SRTP) is a profile of the Real-time Transport Protocol (RTP), as defined in RFC 3550, designed to enhance the security of RTP and RTP Control Protocol (RTCP) packets by incorporating confidentiality, message authentication, and integrity protections.^[7]^[8] SRTP operates as a "bump-in-the-stack" extension to RTP, meaning it integrates seamlessly without modifying the underlying RTP core functionality, which primarily handles timing, sequencing, and delivery of real-time data.^[9] The primary purpose of SRTP is to safeguard real-time media streams against common threats in IP-based communications, including eavesdropping on sensitive content, tampering with packet data, and replay attacks that could disrupt session integrity.^[7] By applying cryptographic transforms to RTP and RTCP payloads—while leaving certain fixed header fields unencrypted to support header compression—SRTP ensures secure transmission in resource-limited environments.^[10] This approach is particularly vital for applications like voice over IP (VoIP) and video conferencing, where unprotected RTP traffic is vulnerable to interception and manipulation.^[11] In practice, SRTP secures both unicast and multicast media streams, making it suitable for scenarios such as secure group communications or bandwidth-constrained wireless networks.^[11] For initial key establishment, SRTP often integrates with external protocols like DTLS-SRTP to derive session keys efficiently.^[3]

Historical Development

The development of the Secure Real-time Transport Protocol (SRTP) began in the early 2000s as an Internet Engineering Task Force (IETF) effort to secure the Real-time Transport Protocol (RTP), driven by the rapid growth of Voice over IP (VoIP) applications as internet-based communication expanded after the network infrastructure boom of the 1990s. A collaborative team of Internet Protocol and cryptographic experts from Cisco Systems and Ericsson led the initiative, focusing on providing confidentiality, authentication, and integrity for real-time media streams. This work addressed the vulnerabilities of unsecured RTP in emerging multimedia applications, culminating in the publication of RFC 3711 in March 2004 by the IETF's Audio/Video Transport (AVT) working group, with primary authors including M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman; the document established SRTP as a Proposed Standard for encrypting and authenticating RTP and RTP Control Protocol (RTCP) packets.^[1]^[12]

Subsequent advancements built upon this foundation to improve key management and interoperability. In July 2006, RFC 4568 introduced Session Description Protocol (SDP) Security Descriptions (SDES), enabling the negotiation of cryptographic parameters for SRTP within SIP-based sessions. This was followed by RFC 5764 in May 2010, which extended Datagram Transport Layer Security (DTLS) to derive keys for SRTP, so that keys no longer had to be carried in signaling messages. Further enhancing media path security, RFC 6189 in April 2011 defined ZRTP, a Diffie-Hellman-based key agreement protocol multiplexed with RTP for unicast secure sessions.
These developments addressed limitations in earlier key distribution methods, promoting broader deployment in diverse network environments.^[2]^[3]^[13] SRTP's adoption accelerated due to escalating security demands in VoIP systems, as well as the rise of browser-native real-time applications like WebRTC starting around 2011, which mandated SRTP for media encryption. By integrating robust protection against eavesdropping and tampering, SRTP became integral to secure multimedia transmission in enterprise and consumer VoIP deployments. As of 2025, its continued relevance is evident in 5G and 3GPP standards, such as ETSI Technical Specification TS 126 139 version 19.0.0 released in October 2025, which specifies SRTP for IP Multimedia Subsystem (IMS) media security, and in updated cryptographic guidelines like NIST Special Publication 800-135 Revision 1, which endorses SRTP's key derivation function for application-specific use.^[12]^[14]^[15]^[16]

Cryptographic Mechanisms

Encryption

The Secure Real-time Transport Protocol (SRTP) provides confidentiality primarily through the AES Counter Mode (AES-CM) encryption transform, which encrypts the RTP payload while leaving the RTP header unencrypted to preserve packet routing and processing compatibility.^[1] This mode operates on 128-bit blocks using a 128-bit session encryption key and a 112-bit session salt, generating a keystream that is XORed with the plaintext to produce the ciphertext.^[1] RTP header fields, such as the version, sequence number, and synchronization source (SSRC), are explicitly skipped during encryption so that they remain visible for network operations.^[1]

The encryption process begins with the construction of a 128-bit initialization vector (IV) as the bitwise XOR of three values: the 112-bit session salt shifted left by 16 bits, the 32-bit SSRC shifted left by 64 bits, and the 48-bit packet index i (where i = ROC << 16 | SEQ) shifted left by 16 bits.^[1] The low 16 bits of the IV are therefore zero and act as the block counter within a packet. AES encrypts successive counter values under the session key to produce a keystream matching the length of the variable RTP payload (including any padding).^[1] The keystream is then bitwise XORed with the payload. Because the SSRC and packet index make the IV distinct for every packet protected under a given key, keystream is never reused, which is the property counter mode depends on for confidentiality.^[1]

An alternative encryption transform is AES in f8-mode, a variant of the f8 stream cipher mode originally designed for 3GPP UMTS networks, which also employs a 128-bit key and likewise generates a keystream that is XORed with the payload.^[1] For scenarios where confidentiality is not required but other security services are, SRTP supports a null encryption option that passes the plaintext unchanged while still applying other transforms.^[1] Extensions to the base SRTP specification enable support for 256-bit keys in AES-CM (AES-256-CM), using a corresponding 256-bit session encryption key alongside the 112-bit salt, to offer enhanced security for high-threat environments without altering the core encryption mechanics.^[17] This configuration keeps the same per-packet IV construction while accommodating longer key material derived from the master key.^[17]
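The IV layout can be sketched with plain integer arithmetic. This is a minimal illustration, assuming the RFC 3711 arrangement in which the salt, SSRC, and packet index are shifted into place and combined by XOR. The function names are hypothetical, and the AES step itself is omitted because Python's standard library does not include AES.

```python
def packet_index(roc: int, seq: int) -> int:
    """48-bit SRTP packet index: i = 2^16 * ROC + SEQ."""
    if not (0 <= roc < 2**32 and 0 <= seq < 2**16):
        raise ValueError("ROC must fit in 32 bits and SEQ in 16 bits")
    return (roc << 16) | seq


def aes_cm_iv(session_salt: bytes, ssrc: int, index: int) -> bytes:
    """Build the 128-bit AES-CM IV for one packet (hypothetical helper).

    IV = (salt * 2^16) XOR (SSRC * 2^64) XOR (index * 2^16).
    The low 16 bits stay zero; counter mode uses them to count blocks.
    """
    if len(session_salt) != 14:
        raise ValueError("session salt must be 112 bits (14 bytes)")
    if not (0 <= ssrc < 2**32 and 0 <= index < 2**48):
        raise ValueError("SSRC must fit in 32 bits and index in 48 bits")
    salt = int.from_bytes(session_salt, "big")
    iv = (salt << 16) ^ (ssrc << 64) ^ (index << 16)
    return iv.to_bytes(16, "big")


if __name__ == "__main__":
    salt = bytes.fromhex("f0f1f2f3f4f5f6f7f8f9fafbfcfd")
    # With SSRC = 0 and index = 0 the IV is just the salt followed by 16 zero bits.
    print(aes_cm_iv(salt, ssrc=0, index=packet_index(0, 0)).hex())
```

Because the SSRC and index occupy fixed positions, two packets share an IV only if they share SSRC, ROC, and SEQ under the same session salt, which is why the packet-count limit per master key matters.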

Authentication and Integrity

The Secure Real-time Transport Protocol (SRTP) provides authentication and integrity protection primarily through the Hash-based Message Authentication Code (HMAC) using the SHA-1 hash function, as defined in the protocol specification.^[1] This mechanism generates a message authentication tag (MAC tag) that verifies the authenticity and integrity of RTP and RTCP packets, detecting any unauthorized modifications during transmission.^[1]

The authentication process begins with the sender deriving a session authentication key from the master key using a key derivation function.^[1] The MAC tag is computed after encryption, over the authenticated portion of the packet: the complete RTP header followed by the (encrypted) payload, with the 32-bit roll-over counter (ROC) appended to the MAC input even though the ROC itself is not transmitted.^[1] The tag is not part of its own input; it is appended to the end of the packet before transmission.^[1] Upon receipt, the receiver recomputes the tag using the same session key and data; a mismatch indicates tampering or forgery, and the packet is discarded.^[1]

Because the whole RTP header and payload are covered, no header field can be altered without detection.^[1] For SRTCP packets, authentication extends to the entire packet, including the encryption flag and the explicit SRTCP index carried in each packet, which also supports replay protection.^[1] The default MAC tag length is 80 bits (10 bytes), which offers strong resistance to forgery attacks while introducing modest overhead of approximately 2-10% bandwidth increase for typical voice packets.^[1]^[18] An optional 32-bit (4-byte) truncation is permitted for bandwidth-constrained or low-latency environments, such as certain real-time applications, but the shorter tag makes a successful forgery more likely.^[1] However, due to collision attacks on SHA-1 demonstrated in practice in 2017 and NIST's 2022 retirement announcement (with phase-out by 2030), its use is discouraged in new designs, and stronger alternatives like HMAC-SHA-256 are supported in some profiles (e.g., 3GPP) but not yet in core IETF SRTP specifications as of 2025.^[19]
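The tagging and verification steps can be sketched with Python's standard `hmac` module. This is a minimal illustration, assuming the RFC 3711 arrangement in which the tag covers the RTP header and already-encrypted payload followed by the 32-bit ROC. The function names are hypothetical, and the sketch assumes no Master Key Identifier is present in the packet.

```python
import hashlib
import hmac


def srtp_auth_tag(auth_key: bytes, header_and_payload: bytes,
                  roc: int, tag_len: int = 10) -> bytes:
    """HMAC-SHA-1 over (header || encrypted payload || ROC), truncated."""
    mac_input = header_and_payload + roc.to_bytes(4, "big")
    full_tag = hmac.new(auth_key, mac_input, hashlib.sha1).digest()  # 20 bytes
    return full_tag[:tag_len]  # leftmost 80 bits by default


def protect(auth_key: bytes, header_and_payload: bytes, roc: int,
            tag_len: int = 10) -> bytes:
    """Append the truncated tag to the packet (sender side)."""
    return header_and_payload + srtp_auth_tag(auth_key, header_and_payload,
                                              roc, tag_len)


def verify(auth_key: bytes, packet: bytes, roc: int,
           tag_len: int = 10) -> bool:
    """Recompute the tag and compare in constant time (receiver side)."""
    if len(packet) <= tag_len:
        return False
    body, tag = packet[:-tag_len], packet[-tag_len:]
    expected = srtp_auth_tag(auth_key, body, roc, tag_len)
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    key = bytes(20)                     # placeholder key, for illustration only
    pkt = protect(key, b"\x80" + bytes(11) + b"ciphertext", roc=0)
    print(verify(key, pkt, roc=0))      # tag matches
    print(verify(key, pkt, roc=1))      # wrong ROC changes the MAC input
```

Including the ROC in the MAC input means a receiver that guesses the wrong ROC fails authentication rather than silently decrypting with the wrong keystream.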

Replay Protection

The Secure Real-time Transport Protocol (SRTP) employs a replay protection mechanism that leverages the Real-time Transport Protocol (RTP) sequence number and an implicit Roll-Over Counter (ROC) to ensure packet uniqueness and temporal ordering. The core approach constructs a 48-bit packet index $i = 2^{16} \times \mathrm{ROC} + \mathrm{SEQ}$, where SEQ is the 16-bit RTP sequence number carried in each packet header, and ROC is a 32-bit counter that sender and receiver each maintain locally. This index effectively extends the SEQ to prevent wrap-around ambiguities over long sessions, allowing up to $2^{48}$ packets per master key before rekeying is required.^[20]

At the receiver, replay protection operates through a sliding window mechanism implemented via a Replay List, which tracks recently authenticated packet indices. The receiver maintains a window of size at least 64 (configurable and implementation-dependent, often larger for robustness), discarding any incoming packet whose index falls below the window's lower bound (indicating staleness) or matches an existing entry in the list (indicating a duplicate). Packets with indices within the window but previously unseen are accepted and added to the list, while those ahead of the window advance the window forward after authentication. This process ensures efficient storage using techniques like bitmaps, as the window need only represent recent packets rather than the entire history.^[21]

The sender increments its ROC by 1 (modulo $2^{32}$) each time the SEQ wraps around after reaching $2^{16} - 1$ (i.e., every 65,536 packets). Receivers do not receive the ROC explicitly in SRTP packets; instead, they infer it locally by evaluating three candidate values (ROC-1, the current local ROC, and ROC+1) and selecting the one whose resulting index is closest to $2^{16} \times \mathrm{ROC} + s_l$, where $s_l$ is the highest sequence number received so far. Initial ROC synchronization occurs during session setup via key management protocols; a receiver that joins mid-session likewise needs the current ROC signaled to it, because the ROC cannot be recovered from the 16-bit SEQ alone.^[20]^[22]

This mechanism provides security against replay attacks by rejecting duplicated or stale packets, even those that would otherwise pass decryption and message authentication, thereby maintaining session integrity in real-time streams. It operates on the packet indices, which are covered by the authentication tag, and it requires non-null integrity protection to be effective, as unauthenticated indices could be forged. Derived keys from the master key are used in conjunction with this index-based check for comprehensive packet validation.^[21]^[23]

A key limitation arises from the sliding window's finite size: a legitimate packet is accepted only if it arrives fewer than one window size behind the highest index seen. With the minimum window of 64, this tolerates the limited reordering typical of real-time networks but may discard valid packets in highly congested or lossy environments, necessitating larger windows in such cases. This design balances security with the low-latency demands of applications like voice over IP, where exhaustive history tracking would be impractical.^[21]
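The receiver-side logic can be sketched in two parts: estimating the packet index from the received SEQ, and checking that index against a bitmap-based sliding window. This is a minimal illustration that follows the index-estimation pseudocode in RFC 3711; the class and function names are hypothetical, and the window representation is one possible implementation choice rather than a required one.

```python
def estimate_index(seq: int, s_l: int, roc: int) -> tuple[int, int]:
    """Guess the ROC for a received SEQ and return (v, index).

    s_l is the highest sequence number received so far and roc is the
    receiver's current roll-over counter. The candidate among ROC-1, ROC,
    and ROC+1 that puts the index closest to 2^16 * ROC + s_l is chosen.
    """
    if s_l < 32768:
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return v, (v << 16) | seq


class ReplayWindow:
    """Sliding replay window; bit k of the bitmap marks index (highest - k)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = -1   # no packet accepted yet
        self.bitmap = 0

    def check(self, index: int) -> bool:
        """Return True if the index is neither stale nor already seen."""
        if index > self.highest:
            return True
        delta = self.highest - index
        if delta >= self.size:
            return False                        # older than the window
        return not ((self.bitmap >> delta) & 1)  # duplicate if bit is set

    def update(self, index: int) -> None:
        """Record an index; call only after the packet has authenticated."""
        if index > self.highest:
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
        else:
            self.bitmap |= 1 << (self.highest - index)


if __name__ == "__main__":
    # SEQ wrapped from 65530 to 5, so the estimated ROC advances to 1.
    print(estimate_index(seq=5, s_l=65530, roc=0))
```

The window is updated only after authentication succeeds; otherwise an attacker could advance it with forged packets and cause legitimate traffic to be rejected as stale.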

Key Management

Key Derivation

In SRTP, key derivation generates session-specific keys for encryption, authentication, and salting from a master key and associated master salt, ensuring secure and synchronized cryptographic operations across RTP packets. This process uses a pseudorandom function (PRF) to produce deterministic outputs, avoiding the need for explicit key exchange during the session. The derivation is central to SRTP's security model, as it allows for the creation of unique keystreams for each transform while maintaining compatibility with the underlying RTP structure.^[1]

The PRF employed in SRTP is based on AES-128 in Counter Mode (AES-CM), which generates a keystream from the master key when provided with an input string derived from a label and an index. The label is an 8-bit constant identifying the key type, such as 0x00 for the SRTP encryption key, 0x01 for the authentication key, and 0x02 for the salt. The index is a 48-bit value formed by concatenating the 32-bit rollover counter (ROC) and the 16-bit RTP sequence number (SEQ), specifically index = (ROC << 16) | SEQ. To derive a session key of length n bits (where $n \le 2^{23}$), first compute r = index DIV key_derivation_rate, where key_derivation_rate (kdr) controls periodic refresh and defaults to 0, in which case r = 0 and keys are derived only once. The PRF input string x is then the 112-bit master salt XORed with (label || r), right-aligned, and AES-CM is run under the master key with an IV formed by appending 16 zero bits to x. The formula for the session key is then:

$$\text{Session key} = \text{first } n \text{ bits of } \text{AES-CM keystream}(k_{\text{master}}, x)$$

This mechanism derives separate keys for encryption, authentication, and salt, with the initialization vector (IV) for transforms incorporating the session salt, SSRC, ROC, and SEQ to ensure uniqueness and prevent reuse.^[24]^[25] The master key is typically 128 bits (16 octets) in length, paired with a 112-bit (14-octet) master salt to mitigate key collision attacks; extensions support 256-bit keys, such as with a 112-bit salt in AES-256-CM or 96-bit salt in AES-256-GCM, for enhanced security.^[17]^[26]

Master keys have a bounded lifetime: a master key must be replaced before it has protected $2^{48}$ SRTP packets or $2^{31}$ SRTCP packets, and new session keys are then derived from the fresh master key without interrupting the session. The process begins with selecting the active master key and salt based on the Master Key Identifier (MKI) if present, computing the index from packet headers, applying the label for the desired transform, and generating the keystream via the PRF. The MKI is an optional field appended to each SRTP packet ahead of the authentication tag, which lets a receiver switch master keys seamlessly during rekeying.^[27]^[28]^[29]^[30]

The deterministic nature of this derivation, which relies solely on the shared master key and salt and the packet-specific index, ensures automatic synchronization between sender and receiver without additional signaling, even after rollover events. Periodic rekeying with an independently generated master key limits how much traffic is exposed if any one key is compromised; whether past traffic remains protected after a long-term secret leaks (forward secrecy) depends on the external key management protocol rather than on SRTP itself. The salt's inclusion in IV formation resists time-memory tradeoff attacks.^[20]^[31]^[30]
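The construction of the PRF input can be sketched without any cryptographic library. This is a minimal illustration, assuming the RFC 3711 layout in which an 8-bit label and a 48-bit value r = index DIV kdr are right-aligned and XORed into the 112-bit master salt. The function names are hypothetical, the salt below is only an example value, and the AES-CM step that turns the resulting IV into key bytes is omitted because Python's standard library has no AES.

```python
LABEL_ENCRYPTION = 0x00   # SRTP encryption key
LABEL_AUTH = 0x01         # SRTP authentication key
LABEL_SALT = 0x02         # SRTP session salt


def kdf_input(master_salt: bytes, label: int, index: int, kdr: int = 0) -> bytes:
    """Return the 112-bit PRF input x = (label || r) XOR master_salt."""
    if len(master_salt) != 14:
        raise ValueError("master salt must be 112 bits (14 bytes)")
    if not (0 <= label < 256 and 0 <= index < 2**48):
        raise ValueError("label must fit in 8 bits and index in 48 bits")
    r = index // kdr if kdr else 0          # kdr = 0 means derive once
    key_id = (label << 48) | r              # label sits just above the 48-bit r
    x = key_id ^ int.from_bytes(master_salt, "big")
    return x.to_bytes(14, "big")


def kdf_iv(x: bytes) -> bytes:
    """AES-CM IV for the PRF: x followed by 16 zero bits (x * 2^16)."""
    return x + b"\x00\x00"


if __name__ == "__main__":
    salt = bytes.fromhex("0ec675ad498afeebb6960b3aabe6")   # example salt
    for name, label in [("enc", LABEL_ENCRYPTION), ("auth", LABEL_AUTH),
                        ("salt", LABEL_SALT)]:
        print(name, kdf_iv(kdf_input(salt, label, index=0)).hex())
```

Only one byte of x differs between the three labels, yet each produces an unrelated AES-CM keystream, which is how a single master key yields independent encryption, authentication, and salting keys.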

External Key Exchange Protocols

The Secure Real-time Transport Protocol (SRTP) does not include an integrated mechanism for key management and instead relies on external protocols to establish and exchange the initial master keys used for securing RTP and RTCP packets.^[1] These protocols operate outside the SRTP framework, providing the necessary cryptographic material through signaling or media paths, and are optional depending on the application context.^[1] One common method is Security Descriptions for Media Streams (SDES), defined in RFC 4568, which enables key exchange by embedding base64-encoded master keys and associated parameters directly within Session Description Protocol (SDP) attributes during signaling, such as in SIP sessions.^[2] This approach is straightforward for unicast streams but exposes the keys to eavesdropping if the signaling channel is not encrypted, making it unsuitable for untrusted networks.^[2] Multimedia Internet KEYing (MIKEY), specified in RFC 3830, offers a more flexible key management scheme supporting both unicast and multicast scenarios through methods like pre-shared keys, public-key encryption, and Diffie-Hellman exchanges. It is particularly suited for group communications, allowing a key distributor to securely deliver symmetric keys to multiple recipients while accommodating real-time constraints in multimedia applications. 
ZRTP, outlined in RFC 6189, provides an end-to-end key agreement protocol that negotiates SRTP session keys directly over the media path using ephemeral Diffie-Hellman exchanges, bypassing reliance on signaling security.^[13] To mitigate man-in-the-middle attacks, it generates short authentication strings (SAS) displayed to users for verbal verification, ensuring mutual authentication without prior shared secrets.^[13] Among these, SDES prioritizes simplicity for direct key transport but lacks protection against channel compromise, whereas ZRTP achieves perfect forward secrecy through its ephemeral keying and out-of-band authentication, making it robust for peer-to-peer VoIP.^[2]^[13] MIKEY, in contrast, excels in scalability for multicast sessions but requires more complex setup for Diffie-Hellman modes. By 2025, SDES has been deprecated in favor of DTLS-based methods in modern systems like WebRTC due to its security limitations in exposing keys to JavaScript and signaling intermediaries.^[32] These master keys from external protocols are then used to derive session-specific keys within SRTP.^[1]
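To make the SDES exposure concrete, the following sketch extracts the master key and salt from an SDP `a=crypto` line using only the standard library. It is a simplified illustration, assuming the RFC 4568 `inline:` key format with a single key parameter and a 128-bit key plus 112-bit salt; the function name is hypothetical and session parameters are ignored.

```python
import base64


def parse_sdes_crypto(line: str, key_len: int = 16, salt_len: int = 14) -> dict:
    """Parse 'a=crypto:<tag> <suite> inline:<base64 key||salt>[|lifetime][|MKI:len]'."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto suite, and key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    b64_material = key_info.split("|")[0]    # drop optional lifetime and MKI
    material = base64.b64decode(b64_material)
    if len(material) != key_len + salt_len:
        raise ValueError("unexpected key material length")
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": material[:key_len],
        "master_salt": material[key_len:],
    }


if __name__ == "__main__":
    # Dummy key material for illustration; real keys must be random.
    demo = base64.b64encode(bytes(range(30))).decode()
    parsed = parse_sdes_crypto(
        f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{demo}|2^20|1:4")
    print(parsed["suite"], parsed["master_key"].hex())
```

Anyone who can read the SDP body can run the same few lines, which is why SDES is only as strong as the protection applied to the signaling channel.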

DTLS-SRTP

Protocol Overview

DTLS-SRTP is a key management protocol that extends Datagram Transport Layer Security (DTLS), as defined in RFC 6347, to negotiate cryptographic parameters and derive master keys for the Secure Real-time Transport Protocol (SRTP) and Secure Real-time Transport Control Protocol (SRTCP). Specified in RFC 5764 (2010), it enables secure establishment of SRTP keys directly over the media path using a DTLS handshake conducted via UDP, ensuring confidentiality, authentication, and replay protection for RTP/RTCP flows without relying on out-of-band signaling. It is compatible with DTLS 1.3 (RFC 9147), which provides enhanced security features such as a simplified handshake and support for TLS 1.3 cipher suites.^[3]^[33]^[34] The process begins with a standard DTLS handshake between endpoints to agree on cryptographic algorithms and establish shared secrets, after which the DTLS exporter interface—with the specific label "EXTRACTOR-dtls_srtp"—generates distinct client_write_SRTP_master_key, server_write_SRTP_master_key, client_write_SRTP_master_salt, and server_write_SRTP_master_salt values. These exported keys and salts are then directly input into the SRTP key derivation function as outlined in RFC 3711, allowing session-specific keys for encryption and authentication to be generated securely. This in-band approach minimizes exposure risks compared to methods that embed keys in signaling messages.^[35]^[29] DTLS-SRTP offers several advantages, including mutual authentication during the handshake, perfect forward secrecy through support for ephemeral Diffie-Hellman or Elliptic Curve Diffie-Hellman cipher suites, and inherent resistance to man-in-the-middle attacks on signaling channels—unlike earlier methods such as Security Descriptions (SDES) that transmit keys in plain text within SDP. 
Additionally, its UDP-based design facilitates better NAT traversal than TCP-oriented protocols like TLS, often in conjunction with STUN for endpoint discovery, making it particularly suitable for real-time media applications where low latency and firewall compatibility are essential.^[36]^[5]^[37] Integration with the Session Description Protocol (SDP) offer/answer mechanism, as detailed in RFC 5763, allows DTLS-SRTP to be signaled using specific media transport tokens such as "UDP/TLS/RTP/SAVP" in SDP descriptions, enabling endpoints to negotiate its use alongside attributes like setup roles (e.g., active, passive) and certificate fingerprints for initial verification. It supports certificate-based mutual authentication, with optional pre-shared key (PSK) methods available through DTLS's native capabilities, ensuring flexible deployment in various network environments while delivering master keys and salts to SRTP without any exposure in the signaling plane.^[38]^[39]
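The step from exporter output to SRTP master keys can be sketched as a simple byte split. This is a minimal illustration, assuming the exported block has already been obtained from a DTLS library with the "EXTRACTOR-dtls_srtp" label and that the negotiated profile uses a 128-bit key and 112-bit salt; the type and function names are hypothetical.

```python
from typing import NamedTuple


class SrtpMasterKeys(NamedTuple):
    client_write_key: bytes
    server_write_key: bytes
    client_write_salt: bytes
    server_write_salt: bytes


def split_exported_material(material: bytes, key_len: int = 16,
                            salt_len: int = 14) -> SrtpMasterKeys:
    """Split exporter output into client key, server key, client salt, server salt."""
    expected = 2 * (key_len + salt_len)
    if len(material) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(material)}")
    keys_end = 2 * key_len
    return SrtpMasterKeys(
        client_write_key=material[:key_len],
        server_write_key=material[key_len:keys_end],
        client_write_salt=material[keys_end:keys_end + salt_len],
        server_write_salt=material[keys_end + salt_len:],
    )


if __name__ == "__main__":
    # Dummy exporter output for illustration; a real value comes from the handshake.
    keys = split_exported_material(bytes(range(60)))
    print(keys.client_write_key.hex(), keys.client_write_salt.hex())
```

The DTLS client protects its outgoing media with the client-write pair and the server with the server-write pair, so each direction uses its own master key and salt.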

Security Features

DTLS-SRTP provides robust authentication mechanisms through the DTLS handshake, supporting server and client certificates as well as pre-shared keys (PSKs) to verify peer identities.^[3] For enhanced efficiency in resource-constrained environments, it accommodates raw public keys without requiring full X.509 certificates, as specified in RFC 7250.^[40] These methods ensure mutual authentication, preventing unauthorized access to media streams.

Forward secrecy is a core security property enabled by ephemeral Diffie-Hellman (DHE) or Elliptic Curve DHE (ECDHE) key exchanges during the DTLS handshake, which generate unique session keys that protect the confidentiality of prior communications even if long-term private keys are later compromised.^[3] Cipher suite negotiation occurs within DTLS, allowing selection of secure options such as TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256. The SRTP protection profile is negotiated separately, through the use_srtp extension carried in the same handshake, and the TLS exporter interface then derives SRTP master keys and salts sized for the selected profile.^[3]^[34] This integration guarantees that the encryption and integrity mechanisms for RTP packets align with negotiated parameters.

DTLS-SRTP addresses key vulnerabilities inherent to real-time protocols, including mitigation of downgrade attacks via strict enforcement of protocol versions and cipher suites during handshake validation.^[3] It also resists denial-of-service (DoS) attacks through stateless cookie mechanisms that challenge clients to prove reachability before allocating server resources for the full handshake.^[3] Implementations must ensure that SRTP parameters, such as the chosen protection profile and key derivation rate, precisely match the values negotiated in and exported from the DTLS session, because a mismatch could compromise security.^[3] Looking toward emerging threats, 2025 guidelines from the IETF UTA Working Group recommend hybrid key exchanges that combine classical algorithms like ECDHE with post-quantum variants such as ML-KEM to safeguard against quantum-enabled attacks during the transition period.^[41] In practice, these features support secure media setup in applications like WebRTC, where DTLS-SRTP establishes encrypted channels for browser-based real-time communication.^[14]
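The stateless cookie idea can be sketched as an HMAC over the client's address and hello parameters, in the spirit of the cookie construction suggested in the DTLS specification. This is a simplified illustration, not how any particular DTLS library implements it; the function names, the choice of SHA-256, and the exact input encoding are assumptions made for the example.

```python
import hashlib
import hmac
import os

# Server-wide secret, rotated periodically in a real deployment (assumption).
SERVER_SECRET = os.urandom(32)


def make_cookie(secret: bytes, client_ip: str, client_port: int,
                hello_params: bytes) -> bytes:
    """Cookie = HMAC(secret, client address || ClientHello parameters)."""
    ip_bytes = client_ip.encode("ascii")
    msg = (len(ip_bytes).to_bytes(1, "big") + ip_bytes
           + client_port.to_bytes(2, "big") + hello_params)
    return hmac.new(secret, msg, hashlib.sha256).digest()


def cookie_is_valid(secret: bytes, client_ip: str, client_port: int,
                    hello_params: bytes, cookie: bytes) -> bool:
    """Recompute the cookie from the second ClientHello; no state was stored."""
    expected = make_cookie(secret, client_ip, client_port, hello_params)
    return hmac.compare_digest(cookie, expected)


if __name__ == "__main__":
    params = b"client-random-and-suites"      # stand-in for ClientHello fields
    cookie = make_cookie(SERVER_SECRET, "192.0.2.10", 50000, params)
    print(cookie_is_valid(SERVER_SECRET, "192.0.2.10", 50000, params, cookie))
    print(cookie_is_valid(SERVER_SECRET, "198.51.100.7", 50000, params, cookie))
```

Because the server can recompute the cookie from the retransmitted hello, it allocates handshake state only for clients that demonstrably receive packets at the address they claim.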

Applications and Interoperability

VoIP and Telephony

The Secure Real-time Transport Protocol (SRTP) plays a central role in securing voice over IP (VoIP) systems, particularly by encrypting Real-time Transport Protocol (RTP) streams in Session Initiation Protocol (SIP)-based deployments. Open-source platforms like Asterisk and FreeSWITCH commonly integrate SRTP to protect media flows, with Asterisk providing native support since version 1.8 for encrypting audio packets during calls. FreeSWITCH similarly enables SRTP through Security Descriptions for Media Streams (SDES), allowing seamless encryption of voice data in SIP sessions. In addition, SRTP is mandatory for media plane security in the IP Multimedia Subsystem (IMS), ensuring confidentiality in core network elements and user equipment for standardized VoIP architectures.^[42] Interoperability of SRTP in VoIP telephony relies on widespread support across softphones and hardware endpoints, though it demands alignment on cryptographic suites during negotiation. Softphones such as Linphone incorporate SRTP alongside methods like SDES and ZRTP for secure key exchange and media protection. Jitsi also supports SRTP, leveraging ZRTP to negotiate session keys for encrypted audio. On the hardware side, Cisco IP phones enable SRTP for secure voice calls, with compatibility across most models when paired with TLS-secured signaling. Mismatched crypto suites can disrupt connections, necessitating careful configuration to maintain end-to-end security. SRTP delivers significant benefits in VoIP and telephony environments by safeguarding against eavesdropping, especially in PSTN-to-VoIP gateways where unencrypted media could be intercepted during circuit-to-packet transitions. For codecs like G.711, which operate at 64 kbps uncompressed, SRTP adds only marginal overhead—typically a small increase in packet size and processing—preserving low-latency performance suitable for toll-quality voice. 
This minimal impact ensures that security enhancements do not substantially degrade call quality in bandwidth-abundant telephony setups. Despite these advantages, SRTP introduces challenges, notably in key management for mesh calling scenarios where multiple endpoints form direct connections, amplifying the need for pairwise key negotiations and potentially straining computational resources. In telephony-specific integrations, SRTP requires adaptation to legacy protocols like H.323, which supports it via gateway configurations for encrypted trunks, and Media Gateway Control Protocol (MGCP), utilizing dedicated SRTP packages to secure media on controlled endpoints. By 2025, SRTP adoption remains robust in enterprise private branch exchange (PBX) systems, including 3CX, where it enforces media encryption for internal and external calls to meet compliance needs. In 5G-enabled voice services such as Voice over New Radio (VoNR), SRTP profiles secure RTP media within the 3GPP IMS framework, enabling protected end-to-end voice delivery over standalone 5G networks. A representative implementation is secure SIP trunking, where providers and PBX systems exchange keys via SDES embedded in SIP signaling or DTLS-SRTP over UDP for opportunistic encryption, ensuring trunk media remains confidential from transit intermediaries.
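The overhead claim for G.711 can be checked with simple arithmetic. The sketch below assumes 20 ms packetization (160 payload bytes at 64 kbps), IPv4 without options, no Master Key Identifier, and an encryption transform that adds no padding, so the only extra bytes are the authentication tag. The function name is hypothetical.

```python
def srtp_overhead_percent(payload_len: int, tag_len: int = 10,
                          rtp_header: int = 12, udp_header: int = 8,
                          ip_header: int = 20) -> float:
    """Extra bytes added by the SRTP tag, as a percentage of the plain RTP packet."""
    base = payload_len + rtp_header + udp_header + ip_header
    return 100 * tag_len / base


# 64,000 bit/s * 0.020 s / 8 bits per byte = 160 bytes per packet
G711_20MS_PAYLOAD = 160

if __name__ == "__main__":
    print(srtp_overhead_percent(G711_20MS_PAYLOAD, tag_len=10))  # 80-bit tag
    print(srtp_overhead_percent(G711_20MS_PAYLOAD, tag_len=4))   # 32-bit tag
```

Under these assumptions the 80-bit tag adds 10 bytes to a 200-byte packet. The relative cost grows for low-bitrate codecs with smaller payloads, which is where the 32-bit tag option becomes attractive.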

WebRTC and Browser Support

WebRTC mandates the use of Secure Real-time Transport Protocol (SRTP) for encrypting all media streams, with Datagram Transport Layer Security-SRTP (DTLS-SRTP) serving as the exclusive mechanism for key exchange to ensure secure session establishment.^[14] This architecture, defined in the WebRTC security specifications, prohibits unencrypted RTP usage and requires support for the AES_CM_128_HMAC_SHA1_80 protection profile (named SRTP_AES128_CM_HMAC_SHA1_80 in DTLS-SRTP) for confidentiality and integrity.^[14] By integrating SRTP directly into the protocol stack, WebRTC eliminates the need for external encryption layers, streamlining secure peer-to-peer communication.

Browser implementations provide full support for SRTP in WebRTC across major graphical browsers as of 2025. Google Chrome has included it since version 23 (released in 2012), Mozilla Firefox since version 22, Apple Safari since version 11, and Microsoft Edge following its adoption of the Chromium engine in 2020.^[43] In contrast, text-based browsers such as Lynx lack JavaScript support and thus cannot implement WebRTC or SRTP.^[44]

JavaScript developers interact with SRTP functionality transparently through the WebRTC API, centered on the RTCPeerConnection interface. This API abstracts encryption details: methods like addTrack() attach media tracks, getSenders() and getReceivers() expose the resulting senders and receivers, and the browser automatically applies SRTP for transmission without developer intervention.^[45] WebRTC's requirement for mandatory media encryption has been a key driver in SRTP's widespread adoption, embedding it as the de facto standard for secure real-time web communications and influencing broader RTP ecosystem security practices.^[46] In 2025, browser updates have extended support for advanced codecs like AV1 in WebRTC sessions protected by SRTP, with full support in Chrome (version 113+) and Firefox (version 136+), and experimental support in Safari when hardware decoding is available.^[47]^[48]

Cross-browser interoperability relies on the Interactive Connectivity Establishment (ICE) framework, which employs Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) to negotiate optimal paths through NATs and firewalls.^[49] Persistent challenges include certificate pinning conflicts in corporate settings, where firewalls performing TLS interception can disrupt DTLS-SRTP handshakes by breaking the expected certificate chain.^[50] Applications like Google Meet and Zoom's web client leverage WebRTC's SRTP for media security in browser-based video conferencing, ensuring encrypted audio and video flows without additional configuration.^[51]^[52]

Standards and Specifications

Core Standards

The Secure Real-time Transport Protocol (SRTP) is defined by its core standards, which establish the foundational mechanisms for securing RTP and RTCP packets through encryption, authentication, and replay protection. The primary specification is RFC 3711, published in 2004, which outlines the SRTP protocol as a profile of the Real-time Transport Protocol (RTP), specifying cryptographic transforms including Advanced Encryption Standard in Counter Mode (AES-CM) for confidentiality and HMAC-SHA1 truncated to 80 bits (HMAC-SHA1-80) for message authentication. This standard also details key derivation processes using a pseudorandom function based on AES in counter mode and designates AES-CM with HMAC-SHA1-80 as the mandatory-to-implement combination for interoperability.^[1]

RFC 3711 builds directly upon the RTP and RTP Control Protocol (RTCP) frameworks established in RFC 3550 and RFC 3551, both from 2003, which define the base packet formats, multiplexing, and payload types that SRTP extends with security features while maintaining compatibility. For authentication, SRTP relies on the keyed-hash message authentication code (HMAC) construction specified in RFC 2104 from 1997, instantiated with the Secure Hash Algorithm 1 (SHA-1) to generate integrity checks against packet tampering. These core documents collectively ensure that SRTP operates within the RTP ecosystem, requiring adherence to RTP's packet structure for header preservation and payload encryption.

Developed by the IETF's Audio/Video Transport (AVT) working group, RFC 3711 was published as a Proposed Standard in 2004 and has since been broadly adopted for securing multimedia streams. Errata and clarifications for these standards, including fixes to key derivation ambiguities in RFC 3711, have been tracked and incorporated through updates as recent as 2025 by the IETF, ensuring ongoing relevance without altering the baseline functionality.

Extensions and Updates

Since the publication of RFC 3711, several extensions have enhanced SRTP's cryptographic flexibility, key management options, and integration with newer technologies while remaining compatible with existing deployments. RFC 7714 adds AES-GCM (Advanced Encryption Standard in Galois/Counter Mode) as an authenticated encryption mode for SRTP. A single authenticated-encryption pass provides confidentiality, data origin authentication, and integrity protection with less computational overhead than AES-CM (Counter Mode) combined with HMAC-SHA1, which suits resource-constrained real-time applications.^[26] A more recent update, RFC 9335 from 2023, defines Cryptex, a mechanism that completely encrypts RTP header extensions and Contributing Source (CSRC) identifiers, improving privacy for metadata in media packets while using simpler Session Description Protocol (SDP) signaling.^[53]

Key management has also evolved. RFC 5764 defines DTLS-SRTP, an extension of Datagram Transport Layer Security (DTLS) that negotiates a protection profile and establishes keys for SRTP and Secure RTCP (SRTCP) sessions through a handshake carried directly over UDP on the media path. The keying material never passes through the signaling channel, which improves resistance to man-in-the-middle attacks in peer-to-peer scenarios. Complementing this, RFC 6189 specifies ZRTP, a media-path key agreement protocol that uses Diffie-Hellman exchanges multiplexed on the same ports as RTP to derive SRTP session keys, providing an alternative for unicast secure RTP without external certificate infrastructure.^[3]^[13]

Further adaptations address modern ecosystems. RFC 8827 outlines the security architecture for WebRTC, mandating SRTP for media encryption and DTLS for key establishment to protect browser-based real-time communications. In mobile networks, 3GPP and ETSI have incorporated SRTP into 5G specifications, notably in TS 126 139 version 19.0.0 (October 2025), where it secures RTP-based conversational services such as Multimedia Telephony Service for IMS (MTSI) and mission-critical push-to-talk, supporting low-latency requirements in 5G New Radio environments.^[14]^[15]

Cryptographic strengthening continues with NIST SP 800-135 Revision 1, which endorses the SRTP key derivation function (KDF) for generating subkeys from master keys but recommends pairing it with stronger hash functions like SHA-256 or SHA-384 in new profiles, while deprecating SHA-1 due to collision vulnerabilities. These profiles avoid breaking backward compatibility by allowing optional negotiation of enhanced parameters. SRTP can also gain post-quantum resilience through its DTLS integration: hybrid key exchanges defined for TLS 1.3, which DTLS 1.3 builds on, combine Kyber (now ML-KEM) with classical elliptic curve methods, allowing SRTP sessions to inherit quantum-resistant key establishment without altering the core protocol.^[16]

Overall, these extensions prioritize stronger security and interoperability, including better performance in high-throughput 5G and web applications, while preserving the protocol's lightweight design and compatibility with legacy SRTP implementations.^[26]^[3]
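To make the DTLS-SRTP key establishment described above more concrete, the sketch below shows how the keying material exported from a completed DTLS handshake could be split into client and server master keys and salts. It assumes the layout given in RFC 5764 (client key, server key, client salt, server salt) and the key and salt lengths associated with three protection profiles; the DTLS handshake and exporter call are not shown, and the helper names are hypothetical rather than drawn from any particular DTLS library.

```python
from typing import NamedTuple

# Exporter label that RFC 5764 assigns to DTLS-SRTP keying material.
EXPORTER_LABEL = b"EXTRACTOR-dtls_srtp"

# (master key length, master salt length) in bytes per protection profile.
# The first entry is from RFC 5764, the AEAD entries from RFC 7714.
PROFILE_LENGTHS = {
    "SRTP_AES128_CM_HMAC_SHA1_80": (16, 14),
    "SRTP_AEAD_AES_128_GCM": (16, 12),
    "SRTP_AEAD_AES_256_GCM": (32, 12),
}


class SrtpMasterKeys(NamedTuple):
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def exported_length(profile: str) -> int:
    """Number of bytes to request from the DTLS exporter for a profile."""
    key_len, salt_len = PROFILE_LENGTHS[profile]
    return 2 * (key_len + salt_len)


def split_keying_material(profile: str, material: bytes) -> SrtpMasterKeys:
    """Split exported bytes into client/server master keys and salts."""
    key_len, salt_len = PROFILE_LENGTHS[profile]
    if len(material) != exported_length(profile):
        raise ValueError("unexpected keying material length for " + profile)
    # Cumulative offsets: client key, server key, client salt, server salt.
    offsets = [0, key_len, 2 * key_len, 2 * key_len + salt_len, 2 * (key_len + salt_len)]
    parts = [material[start:end] for start, end in zip(offsets, offsets[1:])]
    return SrtpMasterKeys(*parts)
```

In this layout the DTLS client protects its outgoing media with `client_key` and `client_salt`, and the server uses the `server_` pair, so each direction has its own master key. For `SRTP_AES128_CM_HMAC_SHA1_80` the exporter would be asked for 60 bytes.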

References

[1]
RFC 3711 - The Secure Real-time Transport Protocol (SRTP)
This document describes the Secure Real-time Transport Protocol (SRTP), a profile of the Real-time Transport Protocol (RTP), which can provide confidentiality, ...
[2]
RFC 4568 - Session Description Protocol (SDP) Security ...
This document defines a Session Description Protocol (SDP) cryptographic attribute for unicast media streams.
[3]
RFC 5764 - Datagram Transport Layer Security (DTLS) Extension to ...
This document describes a Datagram Transport Layer Security (DTLS) extension to establish keys for Secure RTP (SRTP) and Secure RTP Control Protocol (SRTCP) ...
[4]
RFC 8834 - Media Transport and Use of RTP in WebRTC
This memo describes the media transport aspects of the WebRTC framework. It specifies how the Real-time Transport Protocol (RTP) is used in the WebRTC context.
[5]
RFC 5763 - Framework for Establishing a Secure Real-time ...
This document specifies how to use the Session Initiation Protocol (SIP) to establish a Secure Real-time Transport Protocol (SRTP) security context using the ...
[6]
RFC 7201 - Options for Securing RTP Sessions - IETF Datatracker
This document provides an overview of a number of security solutions for RTP and gives guidance for developers on how to choose the appropriate security ...
[7]
https://datatracker.ietf.org/doc/html/rfc3711#section-1
[8]
RFC 3550 - RTP: A Transport Protocol for Real-Time Applications
RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data.
[9]
https://datatracker.ietf.org/doc/html/rfc3711#section-3
[10]
https://datatracker.ietf.org/doc/html/rfc3711#section-3.1
[11]
https://datatracker.ietf.org/doc/html/rfc3711#section-2
[12]
[PDF] NIST SP 800-58, Security Considerations for Voice Over IP Systems
SRTP was being standardized at the IETF in the AVT working group. It was released as RFC 3711 in March 2004. SRTP provides a framework for encryption and ...
[13]
RFC 6189 - ZRTP: Media Path Key Agreement for Unicast Secure RTP
This document defines ZRTP, a protocol for media path Diffie-Hellman exchange to agree on a session key and parameters for establishing unicast Secure Real- ...
[14]
RFC 8827 - WebRTC Security Architecture - IETF Datatracker
This document defines the security architecture for WebRTC, a protocol suite intended for use with real-time applications that can be deployed in browsers.
[15]
[PDF] ETSI TS 126 139 V19.0.0 (2025-10)
The present document can be downloaded from the. ETSI Search & Browse Standards application. The present document may be made available in electronic ...
[16]
SP 800-135 Rev. 1, Recommendation for Existing Application ...
Recommendation for Existing Application-Specific Key Derivation Functions ... Planning Note (08/20/2024):. NIST has decided to revise this publication. See this ...
[17]
RFC 6188: The Use of AES-192 and AES-256 in Secure RTP
[18]
SRTP Performance & Capacity - Avaya Documentation
Sep 25, 2022 · SRTP reduces call capacity by 66% on IP500 V2 and 50% on Linux servers. Direct media is important, and authentication adds 4-10 bytes per ...
[19]
NIST Retires SHA-1 Cryptographic Algorithm
Dec 15, 2022 · NIST is announcing that SHA-1 should be phased out by Dec. 31, 2030, in favor of the more secure SHA-2 and SHA-3 groups of algorithms.
[20]
[MS-SRTP]: Message Authentication and Integrity - Microsoft Learn
May 20, 2025 · The SRTP default authentication algorithm is Hash-based Message Authentication Code (HMAC)-SHA1 [RFC2104], as specified in [RFC3711] ...
[21]
https://datatracker.ietf.org/doc/html/rfc3711#section-3.3.2
[22]
RFC 3711: The Secure Real-time Transport Protocol (SRTP)
[23]
https://datatracker.ietf.org/doc/html/rfc3711#section-9.5
[24]
https://datatracker.ietf.org/doc/html/rfc3711#section-4.3
[25]
https://datatracker.ietf.org/doc/html/rfc3711#section-4.3.3
[26]
https://datatracker.ietf.org/doc/html/rfc7714
[27]
https://datatracker.ietf.org/doc/html/rfc3711#section-5
[28]
https://datatracker.ietf.org/doc/html/rfc3711#section-7.1
[29]
https://datatracker.ietf.org/doc/html/rfc3711#section-8
[30]
https://datatracker.ietf.org/doc/html/rfc3711#section-9
[31]
https://datatracker.ietf.org/doc/html/rfc3711#section-7.2
[32]
Intent to deprecate and remove: SDES key exchange for WebRTC
The reason why SDES is deprecated is that it is a security problem: It exposes session keys to Javascript, which means that entities with access to the ...
[33]
RFC 6347: Datagram Transport Layer Security Version 1.2
[34]
https://datatracker.ietf.org/doc/html/rfc9147
[35]
https://datatracker.ietf.org/doc/html/rfc5764#section-4.2
[36]
https://datatracker.ietf.org/doc/html/rfc5764#section-1
[37]
https://datatracker.ietf.org/doc/html/rfc5764#section-5.1.2
[38]
https://datatracker.ietf.org/doc/html/rfc5763#section-8
[39]
RFC 7250 - Using Raw Public Keys in Transport Layer Security (TLS ...
This document specifies a new certificate type and two TLS extensions for exchanging raw public keys in Transport Layer Security (TLS) and Datagram Transport ...
[40]
Post-Quantum Cryptography Recommendations for Applications
Dec 18, 2024 · This document explores Quantum-Ready usage profiles for applications specifically designed to defend against passive and on-path attacks employing CRQCs.
[41]
WebRTC Browser Support 2025: Complete Compatibility Guide
Sep 25, 2025 · All browsers (Safari, Chrome, Firefox) are required to use WebKit. WebRTC behavior is therefore identical to Safari on iOS. This results in H. ...
[42]
Which Browsers Support WebRTC? — Video Conferencing Blog
Which Browsers Support WebRTC? · Google Chrome · Mozilla Firefox · Opera · Safari · Microsoft Edge · WebRTC in Mobile Browsers · Take your team communication to the ...
[43]
RTCPeerConnection - Web APIs - MDN Web Docs
It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it's no longer needed.
[44]
WebRTC Encryption and Security: Everything You Need to Know ...
Mar 8, 2023 · To send video, voice, or data between two peers in WebRTC, the information must be encrypted with Secure Real-time Transport Protocol (SRTP).
[45]
Codecs used by WebRTC - Media - MDN Web Docs - Mozilla
May 23, 2025 · AV1 uses the Dependency Descriptor (DD) RTP Header Extension to provide frame dependency information needed to support multi-party conferencing ...
[46]
Introduction to WebRTC protocols - Web APIs | MDN
Aug 19, 2025 · This article introduces the protocols on top of which the WebRTC API is built. In this article. ICE; STUN; NAT; TURN; SDP; Multi-party video ...
[47]
WebRTC 1.0: Real-time Communication Between Browsers - W3C
Jun 5, 2017 · This typically happens when the IdP uses certificate pinning and an intermediary such as an enterprise firewall has intercepted the TLS ...
[48]
How Google Meet keeps video conferences secure
Apr 8, 2020 · Google Meet employs an array of counter-abuse protections to keep your meetings safe. These include anti-hijacking measures for both web meetings and dial-ins.
[49]
Comparing Zoom, Microsoft Teams and Google Meet - Devoteam
Meet adheres to Internet Engineering Task Force (IETF) security standards like Datagram Transport Layer Security (DTLS) and Secure Real-time Protocol (SRTP).
[50]
RFC 7714 - AES-GCM Authenticated Encryption in the Secure Real ...
This document defines how the AES-GCM Authenticated Encryption with Associated Data family of algorithms can be used to provide confidentiality and data ...