
Session Description Protocol

The Session Description Protocol (SDP) is a text-based format intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation.^[1] It provides a structured way to convey session parameters such as media types, formats, transport protocols, and network addresses, without incorporating negotiation mechanisms itself.^[1] Originally specified in RFC 2327 in 1998, SDP has been updated in RFC 4566 (2006) and most recently in RFC 8866 (2021) to address evolving requirements in real-time communication.^[2]^[1] SDP's structure consists of a session-level description followed by zero or more media-level descriptions, all formatted as a series of lines in the form <type>=<value>.^[1] Mandatory session-level fields include the version (v=), origin (o=), session name (s=), and timing (t=), while media descriptions begin with an m= line specifying the media type (e.g., audio, video), port, transport protocol, and format list.^[1] Optional fields such as information (i=), connection data (c=), and attributes (a=) allow for additional details like bandwidth requirements, encryption keys, and custom extensions, making SDP highly extensible.^[1] The protocol supports network types like Internet (IN) and address types such as IPv4 or IPv6, with formal grammar defined using Augmented Backus-Naur Form (ABNF) for parsing.^[1] In practice, SDP is transport-agnostic and commonly embedded in signaling protocols for establishing sessions in applications like Voice over IP (VoIP), video conferencing, and streaming media.^[1] For instance, it integrates with the Session Initiation Protocol (SIP) via an offer/answer model to negotiate session parameters between endpoints.^[3] SDP also supports multicast announcements through protocols like Session Announcement Protocol (SAP) and is used in Real Time Streaming Protocol (RTSP) for media delivery control.^[1] Its role in describing capabilities for multiple 
media streams—such as audio, video, and data—enables interoperability in diverse real-time multimedia environments, with ongoing extensions managed through IANA registries for parameters like media types and attributes.^[1]

Overview

Purpose and Design

The Session Description Protocol (SDP) is a text-based format designed to describe the parameters of multimedia sessions, including media types, encoding formats, transport addresses, and other session metadata, while remaining independent of the underlying transport protocol used for communication.^[4] This independence allows SDP to be embedded in various protocols, such as the Session Announcement Protocol (SAP), Session Initiation Protocol (SIP), or Real Time Streaming Protocol (RTSP), facilitating its use in diverse network environments.^[4] The primary purposes of SDP are to enable session announcement, where the existence and details of a multimedia session are broadcast to potential participants; session invitation, allowing one party to invite another to join a session; and negotiation, providing sufficient information for recipients to participate effectively in unicast or multicast scenarios.^[4] By conveying essential details about media streams, timing, and connection parameters, SDP supports the initiation and management of multimedia communications without prescribing the actual negotiation mechanisms.^[4] SDP's design emphasizes simplicity and extensibility through a human-readable textual format encoded in UTF-8, using a hierarchical structure that begins with session-level descriptions followed by optional media-level sections, each delineated by key-value pairs such as "v=" for version, "o=" for origin, and "s=" for session name.^[4] This approach avoids binary encoding to enhance portability across different systems and tools, while the use of attributes allows for flexible extension without altering the core syntax.^[4] The protocol's development originated from the Internet Engineering Task Force (IETF) Multiparty Multimedia Session Control (MMUSIC) working group, which aimed to standardize session descriptions for applications like IP multicast to promote interoperability in multimedia networking.^[4]

History and Standards

The Session Description Protocol (SDP) was initially developed in 1998 by Mark Handley and Van Jacobson as part of the Internet Engineering Task Force (IETF) efforts to standardize multimedia session descriptions. Published as RFC 2327 in April 1998, it provided a foundational text-based format for announcing and inviting participation in multimedia sessions over IP networks.^[5] This work originated within the IETF's Multiparty Multimedia Session Control (MMUSIC) working group, which focused on protocols for teleconferencing and multimedia control, and drew contributions from the Audio/Video Transport (AVT) working group for related media handling aspects.^[5] SDP quickly became a building block for higher-level protocols, notably the Session Initiation Protocol (SIP) defined in RFC 3261, enabling session negotiation in VoIP environments. In 2006, RFC 2327 was superseded by RFC 4566, authored by Mark Handley, Van Jacobson, and Colin Perkins, to address clarifications, errata, and practical implementation issues identified since the original specification. Published in July 2006, this update restructured the document for clarity, revised the Augmented Backus-Naur Form (ABNF) grammar to correct inconsistencies and incorporate IPv6 support from RFC 3266, and introduced stricter rules for IANA parameter registration while maintaining backward compatibility.^[4] The revisions were driven by real-world deployments and feedback from the MMUSIC and AVT working groups, ensuring SDP's robustness as multimedia communications evolved.^[4] The most recent update, RFC 8866, was published in January 2021 by Ali C. Begen, Paul Kyzivat, Colin Perkins, and Mark Handley, obsoleting RFC 4566 to incorporate security enhancements and support for modern media types and formats. 
This version strengthened protections against vulnerabilities in session descriptions, such as improved handling of binary content and transport security considerations, while extending SDP's applicability to contemporary applications like interactive video and adaptive streaming.^[1] Developed under the MMUSIC working group, it reflects ongoing IETF efforts to align SDP with evolving network technologies.^[1] Key milestones in SDP's evolution include its early adoption in VoIP systems during the late 1990s, where it facilitated session setup in nascent IP telephony deployments alongside SIP. By the 2010s, SDP had integrated deeply into WebRTC standards, enabling browser-based real-time communications as specified in IETF documents like RFC 8827, and into 5G architectures for enhanced multimedia services via the IP Multimedia Subsystem (IMS). These integrations underscore SDP's enduring role as a versatile protocol across generations of communication technologies.^[1]

Syntax and Format

Session-Level Description

The Session Description Protocol (SDP) employs a text-based, hierarchical format for describing multimedia sessions, where the session-level description forms the initial block of an SDP message. This block consists of a series of lines, each beginning with a single lowercase type character followed by an equals sign and the corresponding value, terminated by a carriage return and line feed (CRLF). The type character is case-sensitive, no whitespace is permitted on either side of the equals sign, and lines must appear in a fixed order to ensure proper parsing.^[6] Mandatory fields in the session-level description include the protocol version, origin, session name, and timing. The version line ("v=") specifies the SDP version, which is "v=0" in the current standard. The origin line ("o=") identifies the session creator and provides unique identifiers, formatted as "o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>", where <sess-id> and <sess-version> are numeric values for session tracking, <nettype> denotes the network type (e.g., IN for Internet), <addrtype> specifies the address type (e.g., IP4 or IP6), and <unicast-address> gives the creator's address. The session name line ("s=") supplies a human-readable title for the session, such as "s=SDP Seminar", and must be non-empty. Connection data ("c=") must be present either at the session level or in every media description; if at the session level, it defines network and addressing parameters applicable to the session, structured as "c=<nettype> <addrtype> <connection-address>", for example "c=IN IP4 224.2.17.12/127", where the address may include a time-to-live (TTL) value for multicast. The timing line ("t=") specifies the start and stop times for the session using NTP timestamps. These fields establish the foundational context for the entire session.^[7]^[8]^[9]^[10]

Optional fields provide additional session-wide information and may include session details ("i=" for free-form text like "i=A Seminar on the session description protocol"), a URI ("u=" such as "u=http://www.example.com/seminars/sdp.pdf"), email contact ("e=" like "e=j.doe@example.com (Jane Doe)"), phone number ("p=" e.g., "p=+1 617 555-6011"), and bandwidth ("b=" formatted as "b=<bwtype>:<bandwidth>", with modifiers such as CT for conference total bandwidth). These lines enhance interoperability and management without altering core session identity.^[7]^[11]^[12] The session-level description applies globally to all subsequent media streams unless overridden by equivalent fields in individual media descriptions, enabling a structured, layered approach to session negotiation. This hierarchy lets default parameters propagate to every stream while allowing per-media customization. An example session-level block illustrates this structure:

v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.example.com/seminars/sdp.pdf
e=j.doe@example.com (Jane Doe)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696

Here, the version declares SDP compliance, the origin identifies the initiator at IP address 10.47.16.5 with session ID 2890844526 and version 2890842807, the name and info provide descriptive context, the URI and contacts facilitate outreach, the connection sets multicast addressing, and timing defines the active period from NTP timestamp 2873397496 to 2873404696.^[6]
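
The line structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete SDP parser: the helper names `parse_lines` and `parse_origin` are hypothetical, the dictionary keys are chosen here for readability, and no field ordering or content validation is attempted.

```python
def parse_lines(sdp_text):
    """Split an SDP description into (type, value) pairs, one per line."""
    pairs = []
    for raw in sdp_text.splitlines():  # tolerates CRLF as well as bare LF
        if not raw:
            continue
        # Each line is <type>=<value>: one lowercase character, then "=".
        if len(raw) < 2 or raw[1] != "=" or not raw[0].islower():
            raise ValueError(f"not a <type>=<value> line: {raw!r}")
        pairs.append((raw[0], raw[2:]))
    return pairs


def parse_origin(value):
    """Break an o= value into its six space-separated subfields."""
    names = ("username", "sess_id", "sess_version",
             "nettype", "addrtype", "unicast_address")
    parts = value.split(" ")
    if len(parts) != len(names):
        raise ValueError(f"o= needs {len(names)} subfields, got {len(parts)}")
    return dict(zip(names, parts))


# The session-level example from the text, joined with CRLF.
EXAMPLE = "\r\n".join([
    "v=0",
    "o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5",
    "s=SDP Seminar",
    "i=A Seminar on the session description protocol",
    "u=http://www.example.com/seminars/sdp.pdf",
    "e=j.doe@example.com (Jane Doe)",
    "c=IN IP4 224.2.17.12/127",
    "t=2873397496 2873404696",
]) + "\r\n"

pairs = parse_lines(EXAMPLE)
origin = parse_origin(dict(pairs)["o"])
print([line_type for line_type, _ in pairs])
print(origin["sess_id"], origin["unicast_address"])
```

Converting the pairs to a dictionary is only safe for this session-level block, where every type occurs once; a description with media sections repeats types such as "m=" and "a=" and needs the ordered list.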

Media-Level Descriptions

Media-level descriptions in the Session Description Protocol (SDP) define the individual media streams that comprise a multimedia session, allowing for flexible configuration of multiple concurrent streams such as audio, video, or data. Each media stream is described by a dedicated block starting with an "m=" line, which specifies the media type, the transport port, the transport protocol, and a list of supported formats (payload type numbers when RTP is used). These blocks follow the session-level description and can appear multiple times to support diverse media types within the same session, enabling endpoints to negotiate and select compatible streams during session setup. The syntax of the "m=" line is m=<media> <port> <proto> <fmt> ..., where <media> indicates the type (e.g., "audio", "video", "text", "application", or "message"), <port> is the transport port on which media is received (0 marks the stream as rejected or disabled in an offer/answer exchange), <proto> denotes the transport protocol, such as "RTP/AVP" for the RTP Audio/Video Profile running over UDP, and <fmt> lists one or more format identifiers (for RTP, payload type numbers whose codecs are defined by "a=rtpmap" lines or by the static assignments of the RTP profile). For instance, "m=audio 49172 RTP/AVP 0 8" describes an audio stream on port 49172 using RTP/AVP, supporting payload type 0 (PCMU/8000) and type 8 (PCMA/8000), allowing the remote endpoint to choose based on capabilities. Media type names are drawn from an IANA registry: RFC 8866 itself defines "audio", "video", "text", "application", and "message", and later specifications have registered others such as "image"; audio and video are the most common in real-time applications like voice calls. Within each media block, additional lines can override session-level parameters for finer control. The "c=" line may appear to specify a connection address for that stream (e.g., a multicast address and TTL) if it differs from the session-wide connection, while "b=" lines set bandwidth limits specific to the media stream, such as "b=AS:64" for 64 kbps of application-specific bandwidth.
Attributes via "a=" lines provide media-specific details, like codec parameters or directionality (e.g., "a=sendrecv" for bidirectional flow), but these are tailored to the stream without affecting others. Streams can be explicitly rejected or marked inactive by setting the port to 0 in the "m=" line, which signals non-use without removing the description, or by using the "a=inactive" attribute to indicate temporary suspension while preserving the overall session structure. This modular approach ensures SDP descriptions remain compact yet expressive for complex sessions involving multiple media types.
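
As a rough illustration of the "m=" syntax described above, the following sketch splits a media line into its fields. The `MediaLine` class and `parse_media_line` function are hypothetical names introduced here; the optional `/<number of ports>` suffix on the port is part of the RFC 8866 grammar even though it is not discussed above, and the sketch assumes a well-formed line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaLine:
    media: str        # "audio", "video", ...
    port: int         # 0 means the stream is rejected or disabled
    num_ports: int    # from the optional "/<number of ports>" suffix
    proto: str        # e.g. "RTP/AVP"
    fmts: List[str]   # payload types (for RTP) or other format tokens

    @property
    def rejected(self) -> bool:
        return self.port == 0


def parse_media_line(line: str) -> MediaLine:
    """Parse 'm=<media> <port>[/<count>] <proto> <fmt> ...'."""
    if not line.startswith("m="):
        raise ValueError("not an m= line")
    fields = line[2:].split()
    if len(fields) < 4:
        raise ValueError("m= needs <media> <port> <proto> and at least one <fmt>")
    media, port_field, proto, *fmts = fields
    port_text, _, count_text = port_field.partition("/")
    return MediaLine(media, int(port_text), int(count_text or 1), proto, fmts)


audio = parse_media_line("m=audio 49172 RTP/AVP 0 8")
print(audio.media, audio.port, audio.proto, audio.fmts, audio.rejected)
```

Formats are kept as strings because non-RTP transports use format tokens that are not numbers.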

Core Components

Attributes

Attributes in the Session Description Protocol (SDP) are specified using lines beginning with "a=", allowing for the conveyance of additional properties about the session or individual media streams. The general format is a=<attribute> for attributes without values or a=<attribute>:<value> for those with values, where the attribute name is a token and the value, if present, follows a colon.^[13] These attributes can appear at the session level, applying to the entire multimedia session, or at the media level, applying only to a specific media stream described by an "m=" line.^[13] Core attributes include those defining media direction, which indicate the transmission capabilities for a stream: "sendrecv" for bidirectional communication, "sendonly" for transmission without reception, "recvonly" for reception without transmission, and "inactive" for no active media flow.^[14] The "rtpmap" attribute maps RTP payload types to encoding names and clock rates, such as a=rtpmap:0 PCMU/8000, which specifies that payload type 0 corresponds to the PCMU codec at an 8000 Hz sampling rate.^[15] Similarly, the "fmtp" attribute provides format-specific parameters for media codecs, enabling customization like a=fmtp:96 profile-level-id=42e01f.^[16] For media identification, the "mid" attribute assigns a unique token to a media stream, as in a=mid:1, facilitating grouping and association in multi-stream sessions.^[17] SDP's extensibility relies heavily on attributes as the primary mechanism for introducing new features without altering the core syntax, with new attributes registered through IANA and defined across numerous RFCs—over 100 such attributes exist for various purposes like codec negotiation and transport options.^[18] This approach supports backward compatibility, as parsers are required to ignore any unrecognized attributes, ensuring that implementations can process SDP descriptions even when encountering extensions they do not support.^[13]
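
A minimal sketch of attribute handling, assuming the attribute lines of one media section have already been collected into a list. The function names are hypothetical. The sketch recognizes only "rtpmap" and the four direction attributes and silently skips everything else, mirroring the rule that unknown attributes are ignored; treating "sendrecv" as the default direction follows RFC 8866.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def parse_attribute(line):
    """Return (name, value); value is None for attributes without a colon."""
    if not line.startswith("a="):
        raise ValueError("not an a= line")
    name, sep, value = line[2:].partition(":")
    return name, (value if sep else None)


def parse_rtpmap(value):
    """'96 opus/48000/2' -> (96, 'opus', 48000, 2); channels are optional."""
    pt_text, encoding = value.split(" ", 1)
    parts = encoding.split("/")
    channels = int(parts[2]) if len(parts) > 2 else None
    return int(pt_text), parts[0], int(parts[1]), channels


def summarize(attribute_lines):
    """Collect the codec map and direction from one media section."""
    codecs, direction = {}, "sendrecv"  # sendrecv is the default direction
    for line in attribute_lines:
        name, value = parse_attribute(line)
        if name == "rtpmap":
            pt, encoding, clock, channels = parse_rtpmap(value)
            codecs[pt] = (encoding, clock, channels)
        elif name in DIRECTIONS:
            direction = name
        # Any other attribute is deliberately ignored.
    return codecs, direction


codecs, direction = summarize([
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:96 opus/48000/2",
    "a=x-unknown:foo",
    "a=recvonly",
])
print(codecs, direction)
```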

Connection and Bandwidth Information

The Session Description Protocol (SDP) specifies connection information through the "c=" line, which declares the network address and type for media transmission. This line tells session participants where to connect, and it supports both unicast and multicast scenarios across IPv4 and IPv6 networks.^[9] The format of the connection line is c=<nettype> <addrtype> <connection-address>, where <nettype> indicates the network type, typically "IN" for the Internet. The <addrtype> specifies the address family as "IP4" for IPv4 or "IP6" for IPv6. The <connection-address> provides the actual address, which can be a unicast address for point-to-point sessions (e.g., 192.0.2.1) or a multicast group address for group communications (e.g., 224.2.1.1 for IPv4). For IPv4 multicast, a time-to-live (TTL) value follows the address after a slash (e.g., 224.2.36.42/127); the TTL is still required by the syntax, although using it to limit the scope of traffic is deprecated in favor of administratively scoped addresses. IPv6 multicast addresses carry no TTL. Layered encodings, used in hierarchical multicast setups, extend this with <base multicast address>[/<ttl>]/<number of addresses> (e.g., c=IN IP4 224.2.1.1/127/3 for three contiguous addresses starting at 224.2.1.1).^[9] A "c=" line must appear either at the session level, where it acts as the default for all media streams, or in every media description; a media-level "c=" line overrides the session-level value for that stream, allowing flexible network configurations in sessions with several media streams. For multicast sessions, the group address is the one receivers join to receive data. For unicast sessions behind NAT, the address in "c=" is only useful if the peer can reach it, and SDP itself relies on external mechanisms such as STUN, TURN, and ICE for NAT traversal.^[9] Bandwidth information in SDP is conveyed via the optional "b=" line, which states the bandwidth a session or stream is expected to use and so aids session setup and scaling, particularly in conferences.
The format is b=<bwtype>:<bandwidth>, where <bwtype> is a modifier defining the meaning of the figure and <bandwidth> is the value, interpreted as kilobits per second (kbps) unless the modifier specifies otherwise. Common modifiers include "AS" for the application-specific maximum bandwidth of a single stream or site, and "CT" for the total conference bandwidth across all media and participants. The additional modifiers "RS" and "RR" set the RTCP bandwidth allocated to active senders and to receivers, respectively.^[12]^[19] The "b=" line can appear at the session level for an overall estimate or at the media level for stream-specific limits, helping endpoints negotiate feasible configurations. A representative example is b=AS:64 for a 64 kbps audio stream, signaling the expected bitrate to prevent overload in bandwidth-constrained environments like multi-party conferences. Parsers ignore unknown modifiers, and new modifiers require IANA registration. Some attributes may further refine these bandwidth declarations.^[12]
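
The "c=" and "b=" formats above can be sketched as follows. This is an illustrative parser under simplifying assumptions: the function names and dictionary keys are hypothetical, the connection address must be a literal IP address (a real description may carry a host name instead), and a suffix on an IPv6 multicast address is read as a number of addresses, as RFC 8866 specifies.

```python
import ipaddress


def parse_connection(line):
    """Parse 'c=<nettype> <addrtype> <address>[/<ttl>][/<count>]'."""
    if not line.startswith("c="):
        raise ValueError("not a c= line")
    nettype, addrtype, conn = line[2:].split(" ")
    base, *suffix = conn.split("/")
    multicast = ipaddress.ip_address(base).is_multicast
    if suffix and not multicast:
        raise ValueError("only multicast addresses take a /ttl or /count suffix")
    ttl, count = None, 1
    if addrtype == "IP4" and suffix:
        ttl = int(suffix[0])           # IPv4 multicast: TTL comes first
        if len(suffix) > 1:
            count = int(suffix[1])     # then the number of layered addresses
    elif addrtype == "IP6" and suffix:
        count = int(suffix[0])         # IPv6 multicast has no TTL
    return {"nettype": nettype, "addrtype": addrtype,
            "address": base, "ttl": ttl, "count": count}


def parse_bandwidth(line):
    """Parse 'b=<bwtype>:<bandwidth>' into (modifier, value)."""
    if not line.startswith("b="):
        raise ValueError("not a b= line")
    bwtype, _, value = line[2:].partition(":")
    return bwtype, int(value)


print(parse_connection("c=IN IP4 224.2.1.1/127/3"))
print(parse_connection("c=IN IP4 192.0.2.1"))
print(parse_bandwidth("b=AS:64"))
```

The bandwidth parser returns unknown modifiers unchanged so that the caller can decide to ignore them.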

Timing and Scheduling

Time Specifications

The Session Description Protocol (SDP) uses the "t=" line to define the temporal boundaries of a multimedia session, specifying when the session becomes active and when it ends. This line follows the format t=<start-time> <stop-time>, where both <start-time> and <stop-time> are decimal representations of Network Time Protocol (NTP) timestamps in whole seconds since January 1, 1900, 00:00:00 UTC.^[10] The start time indicates the earliest moment the session can be active, while the stop time marks the end of activity; a nonzero stop time should not precede the start time.^[10] A stop time of zero means the session is unbounded, and t=0 0 denotes a permanent session with no defined start or end, as used for ongoing conferences.^[10] NTP timestamps in SDP correspond to the 32-bit seconds field of the NTP format, which counts seconds elapsed since the reference epoch of 1900-01-01 and omits the fractional part.^[10] To convert to Unix time (seconds since 1970-01-01), subtract 2,208,988,800 from the NTP value; this offset is the number of seconds in the 70 years between the two epochs.^[10] Because SDP writes the value as a decimal string rather than a fixed-width binary field, there is no wraparound at the 2036 rollover of the 32-bit NTP counter; timestamps can extend as arbitrary-length decimals beyond that point without ambiguity.^[20] For instance, the line t=2873397496 2873404696 describes a two-hour session: the difference between the two values is 7,200 seconds, and subtracting the epoch offset from the start time gives Unix time 664,408,696, which is January 20, 1991, 21:58:16 UTC.^[10] Sessions with non-contiguous active periods can employ multiple "t=" lines, each delineating an additional interval of activity for irregularly spaced timings.^[10] This mechanism allows SDP to describe complex schedules, such as a session active from 10:00 to 12:00 and again from 14:00 to 16:00 on the same
day, by listing sequential "t=" lines without overlap.^[10] However, for regularly recurring patterns within these periods, an "r=" line is recommended in conjunction with a single "t=" line rather than multiple "t=" entries.^[10] The use of NTP timestamps ensures synchronization with real-time clocks across distributed participants, providing a globally consistent reference independent of local time zones or offsets.^[10] In multicast environments, where SDP announcements reach diverse receivers, this uniform timing facilitates precise coordination, allowing endpoints to join or leave sessions at exact moments without desynchronization issues.^[10] The NTP foundation, as defined in RFC 1305, underpins this reliability by offering high-accuracy time dissemination suitable for network-based multimedia applications.^[21]
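
The epoch conversion described above is simple arithmetic, sketched below. The names `NTP_UNIX_OFFSET`, `ntp_to_datetime`, and `parse_timing` are introduced here for illustration, and a zero start or stop value is reported as `None` rather than converted.

```python
from datetime import datetime, timezone

# Seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch).
NTP_UNIX_OFFSET = 2_208_988_800


def ntp_to_datetime(ntp_seconds):
    """Convert whole NTP seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)


def parse_timing(line):
    """Parse 't=<start-time> <stop-time>'; zero values map to None."""
    if not line.startswith("t="):
        raise ValueError("not a t= line")
    start_text, stop_text = line[2:].split(" ")
    start, stop = int(start_text), int(stop_text)
    return {
        "start": ntp_to_datetime(start) if start else None,
        "stop": ntp_to_datetime(stop) if stop else None,
        "duration_s": stop - start if start and stop else None,
    }


timing = parse_timing("t=2873397496 2873404696")
print(timing["start"].isoformat(), timing["duration_s"])
```

Python integers are arbitrary precision, so decimal timestamps beyond the 32-bit range parse without special handling.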

Repetitions and Zones

The repeat times line in SDP, denoted by "r=", specifies recurring session schedules beyond the single active period defined by a "t=" line. It follows the format r=<repeat interval> <active duration> <offsets from start-time>, where the repeat interval is the fixed time between successive repetitions, the active duration is the length of each session instance, and the offsets are one or more space-separated values giving start times relative to the start-time in the "t=" line. All values are in seconds by default and may instead carry a unit suffix: d for days, h for hours, m for minutes, or s for seconds.^[22] For instance, r=7d 1h 0 25h describes a one-hour session that repeats weekly, with occurrences at the base start-time and again 25 hours later; the equivalent form in plain seconds is r=604800 3600 0 90000.^[22] With a single offset of zero, r=604800 3600 0 defines a one-hour session repeating every seven days (604800 seconds) at the base start time.^[22] Monthly and yearly repeats cannot be expressed with a single "r=" line because their intervals are not constant; separate "t=" lines listing each occurrence are recommended instead.^[22] The time zones line, denoted by "z=", provides adjustments for daylight saving time or other clock changes that affect repeated sessions, using the format z=<adjustment time> <offset> <adjustment time> <offset> ..., with as many pairs as needed.^[23] Each adjustment time is an NTP timestamp in decimal seconds marking when the offset takes effect, and the offset is a signed duration written in the same notation as the "r=" fields (for example -1h or 0).^[23] An example is z=2882844526 -1h 2898848070 0, which shifts the time base used to calculate repeat times back by one hour at the first timestamp (e.g., for a daylight saving transition) and restores the original time base at the second.^[23] Adjustments are always relative to the start time in the "t=" line and are not cumulative.^[23] The "r=" line is optional and appears at the session level after the "t=" line it refines; it is typically used in multicast or broadcast scenarios for announced events rather than on-demand unicast sessions, where simple "t=" lines suffice. The "z=" line, when used, must follow an "r=" line and applies to the "r=" line(s) immediately preceding it; a "z=" line with no preceding "r=" line is a syntax error.^[22]^[23] Parsers need to accept both the plain-seconds and unit-suffixed forms and treat the offsets as a space-delimited list.^[22]^[23]

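The repeat-time notation described in this section can be expanded into concrete intervals with a short sketch. The helper names (`typed_time`, `parse_repeat`, `occurrences`) are hypothetical, times are plain seconds relative to whatever epoch the caller uses, and time zone adjustments from "z=" lines are not applied.

```python
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def typed_time(text):
    """'7d' -> 604800, '25h' -> 90000, '3600' -> 3600."""
    if text[-1] in UNIT_SECONDS:
        return int(text[:-1]) * UNIT_SECONDS[text[-1]]
    return int(text)


def parse_repeat(line):
    """Parse 'r=<interval> <duration> <offset> ...' into seconds."""
    if not line.startswith("r="):
        raise ValueError("not an r= line")
    interval, duration, *offsets = (typed_time(f) for f in line[2:].split())
    if not offsets:
        raise ValueError("r= needs at least one offset")
    return interval, duration, offsets


def occurrences(start, stop, repeat_line):
    """List (begin, end) pairs for every repetition that starts before stop."""
    interval, duration, offsets = parse_repeat(repeat_line)
    result = []
    base = start
    while base < stop:
        for offset in offsets:
            begin = base + offset
            if begin < stop:
                result.append((begin, begin + duration))
        base += interval
    return result


# Two weeks of the weekly schedule from the text, starting at time 0.
print(occurrences(0, 2 * 604800, "r=7d 1h 0 25h"))
```

The sketch requires a nonzero stop time; an unbounded session (stop time of zero) would need an explicit horizon from the caller to avoid an endless list.
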
Applications and Integration

Use in SIP and RTP

The Session Description Protocol (SDP) is integral to the Session Initiation Protocol (SIP) for negotiating multimedia sessions in voice over IP (VoIP) systems. In SIP, SDP bodies are embedded within signaling messages such as INVITE requests and 200 OK responses to exchange session capabilities between endpoints.^[24] This integration enables the description of media streams, codecs, and transport parameters, facilitating the establishment of real-time communication sessions.^[4] The offer-answer model, defined in RFC 3264, governs SDP usage in SIP by allowing one party (the offerer) to propose session parameters in an SDP offer, while the answerer responds with an SDP answer that accepts, narrows, or rejects each proposed stream.^[24] The answer must contain one "m=" line for every "m=" line in the offer, in the same order; a stream that the answerer rejects keeps its "m=" line but sets the port to zero, and new streams can only be added by sending a further offer.^[24] This mechanism ensures mutual agreement on session attributes before media exchange begins, supporting capabilities like audio and video codec selection. SDP maps directly to the Real-time Transport Protocol (RTP) for media transport by associating payload types in media lines (m=) with RTP profiles. Statically assigned payload types (numbers below 96, such as payload type 0 for PCMU audio) follow predefined mappings in the RTP Audio/Video Profile (RFC 3551), while dynamic types (96-127) require rtpmap attributes to specify codec details like clock rate and channels.^[4]^[25] For instance, an rtpmap attribute might define "a=rtpmap:96 opus/48000/2" to indicate Opus codec usage over RTP.^[4] The address in the c= line and the port in the m= line together identify the RTP endpoint, and RTCP by default uses the next higher port.^[4] In a typical SIP audio/video call, an INVITE message might include an SDP offer like the following, proposing G.711 audio and VP8 video over RTP:

v=0
o=- 123456 1 IN IP4 192.0.2.1
s=Call
c=IN IP4 192.0.2.1
t=0 0
m=audio 5004 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
m=video 5005 RTP/AVP 96
a=rtpmap:96 VP8/90000

The responder's 200 OK carries an SDP answer that lists the answerer's own address and receive ports together with the codecs it accepts from the offer, for example keeping PCMU for audio and VP8 for video.^[24]^[4] Historically, SDP with SIP emerged as a lightweight alternative to ITU-T H.323 for packet-based multimedia, gaining traction in the late 1990s for its simplicity in Internet environments. In early IP Multimedia Subsystem (IMS) deployments by 3GPP around 2002, SDP's role in SIP signaling standardized multimedia negotiation for mobile networks, offering an alternative to H.323-based systems and enabling interoperable VoIP services.
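
The core of the offer/answer rules can be sketched as a function that builds the answer's "m=" lines from the offer's. This is a simplified illustration, not the RFC 3264 procedure in full: `build_answer_media` and its parameters are hypothetical, formats are matched by payload type number only (a real answerer compares codec names from the rtpmap lines for dynamic types), and attributes, addresses, and direction handling are omitted.

```python
def build_answer_media(offer_media, supported, local_ports):
    """Build the answer's m= lines.

    offer_media: list of (media, port, proto, [fmt, ...]) from the offer.
    supported:   dict mapping media type to the set of formats we accept.
    local_ports: dict mapping media type to our own receive port.
    """
    lines = []
    for media, _offer_port, proto, fmts in offer_media:
        common = [fmt for fmt in fmts if fmt in supported.get(media, set())]
        if common:
            # Accepted: our own port, and only the formats both sides share.
            lines.append(f"m={media} {local_ports[media]} {proto} {' '.join(common)}")
        else:
            # Rejected: keep the line in place, but with port zero.
            lines.append(f"m={media} 0 {proto} {' '.join(fmts)}")
    return lines


# The offer from the example above: PCMU audio and VP8 video.
offer = [
    ("audio", 5004, "RTP/AVP", ["0"]),
    ("video", 5005, "RTP/AVP", ["96"]),
]
# A hypothetical audio-only answerer listening on port 6000.
answer = build_answer_media(offer, {"audio": {"0", "8"}}, {"audio": 6000})
print("\n".join(answer))
```

Because one line is emitted per offered stream, the answer always has the same number of "m=" lines as the offer, in the same order.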

Role in WebRTC and Other Protocols

In WebRTC, the Session Description Protocol (SDP) serves as the primary mechanism for negotiating peer-to-peer multimedia sessions, enabling endpoints to exchange capabilities such as media types, codecs, and transport parameters through an offer/answer model. This process integrates SDP with the Interactive Connectivity Establishment (ICE) protocol, which uses STUN and TURN for NAT traversal and candidate discovery; SDP conveys the ICE credentials (a username fragment and password) that authenticate the connectivity checks.^[26] A key adaptation in WebRTC is Trickle ICE, defined in RFC 8838, which allows ICE candidates to be provisioned incrementally instead of requiring every candidate to be gathered before the SDP is sent. Candidates discovered after the initial offer or answer are delivered over the signaling channel as they become available, which shortens connection setup compared with the approach typical of SIP-based systems, where the complete candidate list is embedded in the SDP. In practice, WebRTC's RTCPeerConnection API generates SDP offers through the createOffer() method, which produces a session description including attributes like a=ice-ufrag for ICE username fragments, facilitating candidate exchange over signaling channels.^[27]^[28] WebRTC SDP also addresses security and multiplexing challenges. It negotiates DTLS-SRTP, in which a Datagram Transport Layer Security (DTLS) handshake on the media path derives the keys for Secure RTP (SRTP), so that media keys never pass through the signaling server. Additionally, SDP extensions support simulcast, allowing senders to offer multiple RTP streams at varying qualities (e.g., low, medium, high resolution) within a single media description, signaled via attributes like a=simulcast to enable receiver selection for bandwidth adaptation.^[26]^[29] Beyond WebRTC, SDP integrates with other protocols for diverse applications.
In the Real-Time Streaming Protocol (RTSP), SDP describes media sessions for on-demand streaming, specifying stream parameters like RTP ports and codecs in DESCRIBE responses to enable client setup of unicast or multicast flows. For messaging, the Message Session Relay Protocol (MSRP) uses SDP offer/answer exchanges over SIP to negotiate reliable text and file transfers, defining transport parameters such as TCP/MSRP for session establishment. In 5G media architectures, SDP operates within the IP Multimedia Subsystem (IMS) for call control, negotiating multimedia sessions via SIP in the 5G core, supporting high-bandwidth applications like immersive video over enhanced connectivity.^[30] As of March 2025, the WebRTC-HTTP Ingestion Protocol (WHIP), defined in RFC 9725, employs SDP offer/answer over HTTP for ingesting WebRTC streams into content delivery networks (CDNs) and streaming services, simplifying live media production workflows.^[31]

Extensions and Considerations

Common Extensions

The Session Description Protocol (SDP) supports extensibility through attribute definitions, allowing for enhancements to its core functionality while maintaining backward compatibility. Common extensions address aspects such as connection setup roles, media stream grouping, user interface labeling, feedback mechanisms for codec control, and simulcast capabilities, all registered via standardized processes. These extensions are widely adopted in multimedia applications to improve negotiation efficiency and media handling without modifying the fundamental SDP grammar.^[1] Connection-setup extensions negotiate which endpoint opens the connection for connection-oriented transports such as TCP, a question that does not arise for UDP-based RTP. The a=setup attribute, defined as a media-level attribute, specifies the role: "active" for the endpoint that initiates the connection, "passive" for the one that waits for it, "actpass" for an endpoint willing to take either role, and "holdconn" to indicate that the connection should not be established for the time being. In an offer/answer exchange, the active party sends the initial SYN to the port indicated in the counterpart's m= line; because the active endpoint does not accept incoming connections, it conventionally places port 9 (the discard port) in its own m= line. For example, an offer might include a=setup:actpass to allow flexibility, while the answer uses a=setup:passive.^[32] Grouping extensions allow multiple media streams to be associated for coordinated handling, such as bundling or synchronization. The a=group attribute, a session-level extension, identifies streams by their a=mid labels and applies semantics like BUNDLE for multiplexing streams into a single transport flow or LS for lip synchronization. For instance, a=group:BUNDLE audio video bundles the audio and video streams so that they share a single transport (one address and port, and one ICE and DTLS context), reducing overhead in RTP sessions.
This framework supports efficient resource use in scenarios like video conferencing.^[33] Labeling extensions provide identifiers and hints that guide client applications in rendering or managing media, enhancing user experience without altering stream semantics. The a=label attribute, a media-level tag, assigns an identifier to a stream so that external tools or UIs can reference it uniquely, such as labeling a video stream as "main-camera" for selective display. Complementing this, the LS (Lip Synchronization) semantic within the grouping framework (a=group:LS audio video) requests timed alignment of related streams for synchronized playback in clients. These are particularly useful in browser-based or multi-stream environments.^[34]^[33] Codec control extensions incorporate RTCP feedback to enable adaptive media transmission. The a=rtcp-fb attribute, applied at the media level, signals support for one feedback type per line for a given payload type: a=rtcp-fb:96 nack indicates generic negative acknowledgments of lost packets, while a=rtcp-fb:96 nack pli indicates picture loss indications for video codecs. These let the sender respond to receiver reports with retransmissions or new intra frames, improving reliability in real-time applications under the RTP/AVPF profile.^[35] Simulcast extensions support sending multiple versions of the same media at different qualities or resolutions over separate RTP streams, negotiated via SDP. The a=simulcast attribute, a media-level extension, lists stream identifiers for the send and receive directions; the identifiers refer to a=rid lines in the same media description, semicolons separate distinct simulcast streams, and commas separate alternative formats for one stream. For example, a=simulcast:send hi;mid;lo offers three simultaneous encodings whose restrictions are given by the corresponding a=rid:hi, a=rid:mid, and a=rid:lo lines.
This enables scalable video delivery, where endpoints select streams based on available bandwidth, without requiring separate m= lines.^[26]

The IANA maintains a centralized registry of SDP attributes to ensure interoperability and prevent naming conflicts. Attributes are registered under a "Specification Required" policy: a registration supplies the attribute name, syntax (usually ABNF), semantics, usage level (such as session or media), and multiplexing category, and is reviewed by a designated expert. RFC 8866 updated this registry by consolidating the earlier attribute lists; entries also carry a multiplexing category, such as TRANSPORT for attributes tied to the underlying transport. Registered attributes include a=rtpmap for payload type mapping and a=sendrecv for media direction. This process lets SDP keep evolving while preserving interoperability.^[1]^[36]
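The extension attributes described in this section can be pulled out of a session description with a small amount of line-oriented parsing. The following Python sketch is illustrative only: the names parse_extensions, MediaSection, and EXAMPLE_SDP are hypothetical, the sample description is made up, and no ABNF validation is attempted. The sample's a=simulcast line uses RID names without the a=rid lines that a real description would carry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MediaSection:
    """Extension attributes collected for one m= section (hypothetical type)."""
    media: str                      # media type from the m= line, e.g. "audio"
    port: int                       # port from the m= line
    proto: str                      # transport protocol from the m= line
    mid: Optional[str] = None       # a=mid identification tag
    setup: Optional[str] = None     # a=setup role
    rtcp_fb: Dict[str, List[str]] = field(default_factory=dict)
    simulcast: Dict[str, List[List[str]]] = field(default_factory=dict)


def parse_extensions(sdp: str) -> Tuple[Dict[str, List[List[str]]], List[MediaSection]]:
    """Return (session-level groups, media sections) for an SDP string."""
    groups: Dict[str, List[List[str]]] = {}
    sections: List[MediaSection] = []
    current: Optional[MediaSection] = None

    for raw in sdp.splitlines():
        line = raw.strip()
        if len(line) < 3 or line[1] != "=":
            continue  # skip anything that is not <type>=<value>
        kind, value = line[0], line[2:]

        if kind == "m":
            fields = value.split()
            # The port may be written as "<port>/<count>"; keep the base port.
            port = int(fields[1].split("/")[0])
            current = MediaSection(fields[0], port, fields[2])
            sections.append(current)
        elif kind == "a":
            name, _, arg = value.partition(":")
            if current is None:
                if name == "group":
                    semantics, *mids = arg.split()
                    groups.setdefault(semantics, []).append(mids)
            elif name == "mid":
                current.mid = arg
            elif name == "setup":
                current.setup = arg
            elif name == "rtcp-fb":
                payload_type, _, feedback = arg.partition(" ")
                current.rtcp_fb.setdefault(payload_type, []).append(feedback)
            elif name == "simulcast":
                tokens = arg.split()
                # Tokens alternate: direction, stream list, direction, ...
                for direction, streams in zip(tokens[0::2], tokens[1::2]):
                    current.simulcast[direction] = [
                        alt.split(";") for alt in streams.split(",")
                    ]
    return groups, sections


EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
t=0 0
a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:audio
a=setup:actpass
m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:video
a=setup:actpass
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=simulcast:send 720p_30,480p_15
"""

if __name__ == "__main__":
    groups, sections = parse_extensions(EXAMPLE_SDP)
    print(groups)
    for section in sections:
        print(section)
```

Keeping session-level and media-level attributes apart mirrors how the attributes are scoped: a=group is only read before the first m= line, while the others are attached to the media section in which they appear.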

Security and Limitations

The Session Description Protocol (SDP) is susceptible to several security vulnerabilities that stem primarily from its plain-text format and the information it conveys. A key issue is the leakage of IP addresses and ports through the connection information (c=) line and Interactive Connectivity Establishment (ICE) candidates, which can reveal endpoint topology and aid network reconnaissance and targeted attacks.^[1] Additionally, malformed SDP descriptions can be used for denial-of-service (DoS) attacks: an attacker crafts invalid or oversized descriptions that exhaust parser resources or make session setup fail in SIP servers and user agents during offer/answer exchanges.^[37]

Several standardized mechanisms mitigate these risks. The a=fingerprint attribute, defined for TLS and DTLS media, carries a hash of the endpoint's certificate; the peer compares it with the certificate presented in the handshake, which defeats man-in-the-middle attacks on the media path provided the SDP itself is integrity-protected. Protecting the signaling channel, for example with SIPS (SIP over TLS) as per RFC 3261, guards the SDP against eavesdropping and tampering.^[38] Furthermore, ICE (RFC 8445) can limit address exposure: when an endpoint offers only relayed candidates obtained from a TURN server, its host and server-reflexive addresses do not appear in the SDP.

SDP also has inherent limitations that affect its security and usability. Its text-based syntax adds parsing overhead, and implementations that parse it leniently rather than strictly against the grammar are exposed to injection attacks and to ambiguities in interpretation across implementations.^[1] Versioning is rudimentary, relying solely on the v= line (fixed at version 0, with no minor version number), so the format can evolve only through new attributes unless compatibility is broken.^[1] Large SDP bodies, which can reach several kilobytes in complex scenarios such as WebRTC sessions with many ICE candidates, can strain bandwidth and processing resources.^[1]

RFC 8866 addresses one legacy weakness by declaring the k= (encryption key) line obsolete. The line carried keying material inside the description, where anyone able to read the SDP could read the key, so it must not be included in new descriptions and must be discarded if received.^[1] Best practices include limiting address exposure (e.g., via relay-only ICE configurations that use TURN) and rigorously validating received descriptions against the Augmented Backus-Naur Form (ABNF) grammar to reject malformed input early.^[39]
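The validation advice above can be approximated by a coarse pre-check that runs before full parsing. The Python sketch below is a minimal illustration under stated assumptions: the function name precheck_sdp and the 16 KiB size cap are hypothetical choices, and the checks (line shape, a leading v=0 line, the presence of o=, s=, and t= at session level, and removal of k= lines) are far weaker than validation against the full ABNF grammar.

```python
import re
from typing import List

MAX_SDP_BYTES = 16 * 1024  # hypothetical cap; tune for the deployment
# One lowercase type character, "=", then a non-empty value without NUL/CR/LF.
LINE_RE = re.compile(r"[a-z]=[^\x00\r\n]+")
REQUIRED_SESSION_TYPES = ("v", "o", "s", "t")


def precheck_sdp(sdp: str) -> List[str]:
    """Reject obviously bad SDP early and return the lines that were kept."""
    if len(sdp.encode("utf-8")) > MAX_SDP_BYTES:
        raise ValueError("SDP body exceeds size cap")

    # Accept CRLF or bare LF line endings and ignore empty lines.
    lines = [ln for ln in sdp.replace("\r\n", "\n").split("\n") if ln]
    if not lines or lines[0] != "v=0":
        raise ValueError("first line must be v=0")

    seen_session_types = set()
    in_media = False
    kept: List[str] = []
    for ln in lines:
        if not LINE_RE.fullmatch(ln):
            raise ValueError(f"malformed line: {ln!r}")
        if ln[0] == "m":
            in_media = True
        elif not in_media:
            seen_session_types.add(ln[0])
        if ln[0] == "k":
            continue  # obsolete key line: discard instead of processing
        kept.append(ln)

    missing = [t for t in REQUIRED_SESSION_TYPES if t not in seen_session_types]
    if missing:
        raise ValueError(f"missing session-level lines: {missing}")
    return kept


if __name__ == "__main__":
    sample = (
        "v=0\r\n"
        "o=- 0 0 IN IP4 192.0.2.1\r\n"
        "s=-\r\n"
        "t=0 0\r\n"
        "k=clear:secret\r\n"
        "m=audio 9 RTP/AVP 0\r\n"
    )
    print(precheck_sdp(sample))
```

Running the size check before splitting the text keeps the cost of rejecting an oversized body low, which is the point of an early pre-check.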

References

[1]
RFC 8866 - SDP: Session Description Protocol - IETF Datatracker
SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session ...
[2]
RFC 2327: SDP: Session Description Protocol
SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session ...
[3]
An Offer/Answer Model with Session Description Protocol (SDP)
This document defines a mechanism by which two entities can make use of the Session Description Protocol (SDP) to arrive at a common view of a multimedia ...
[4]
RFC 4566 - SDP: Session Description Protocol - IETF Datatracker
SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session ...
[5]
RFC 2327 - SDP: Session Description Protocol - IETF Datatracker
SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session ...
[6]
https://datatracker.ietf.org/doc/html/rfc8866#section-5
[7]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.2
[8]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.4
[9]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.7
[10]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.9
[11]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.3
[12]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.8
[13]
https://datatracker.ietf.org/doc/html/rfc8866#section-5.13
[14]
https://datatracker.ietf.org/doc/html/rfc8866#section-6.7
[15]
https://datatracker.ietf.org/doc/html/rfc8866#section-6.6
[16]
https://datatracker.ietf.org/doc/html/rfc8866#section-6.15
[17]
https://datatracker.ietf.org/doc/html/rfc8866#section-6.21
[18]
https://datatracker.ietf.org/doc/html/rfc8866#section-8.2.4
[19]
https://datatracker.ietf.org/doc/html/rfc3556
[20]
https://datatracker.ietf.org/doc/html/rfc8866#section-9
[21]
RFC 3264 - An Offer/Answer Model with Session Description ...
This document defines a mechanism by which two entities can make use of the Session Description Protocol (SDP) to arrive at a common view of a multimedia ...
[22]
RFC 3551 - RTP Profile for Audio and Video Conferences with ...
RFC 3551 defines an RTP profile for audio/video conferences with minimal control, using RTP/AVP, and defines default mappings for payload types to encodings.
[23]
RFC 8827 - WebRTC Security Architecture - IETF Datatracker
This document defines the security architecture for WebRTC, a protocol suite intended for use with real-time applications that can be deployed in browsers.
[24]
RFC 8838 - Trickle ICE: Incremental Provisioning of Candidates for ...
This document defines "Trickle ICE", a supplementary mode of ICE operation in which candidates can be exchanged incrementally as soon as they become available.
[25]
WebRTC: Real-Time Communication in Browsers - W3C
Mar 13, 2025 · The createOffer method generates a blob of SDP that contains an RFC 3264 offer with the supported configurations for the session, including ...
[26]
RFC 8853 - Using Simulcast in Session Description Protocol (SDP ...
This document describes how to accomplish simulcast in RTP and how to signal it in the Session Description Protocol (SDP).
[27]
RFC 4975 - The Message Session Relay Protocol (MSRP)
This document describes the Message Session Relay Protocol, a protocol for transmitting a series of related instant messages in the context of a session.
[28]
RFC 4145: TCP-Based Media Transport in the Session Description Protocol (SDP)
Describes expressing media transport over TCP using SDP, defining the 'TCP' protocol identifier and attributes 'setup' and 'connection' for connection setup and reestablishment.
[29]
RFC 5888 - The Session Description Protocol (SDP) Grouping ...
In this specification, we define a framework to group "m" lines in the Session Description Protocol (SDP) for different purposes.
[30]
RFC 4574 - The Session Description Protocol (SDP) Label Attribute
The 'label' attribute is a new SDP media-level attribute that points to a media stream, allowing external documents to reference it.
[31]
RFC 4585: Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)
[32]
https://datatracker.ietf.org/doc/html/rfc4145
[33]
https://datatracker.ietf.org/doc/html/rfc5888
[34]
https://datatracker.ietf.org/doc/html/rfc4574
[35]
RFC 8862 - Best Practices for Securing RTP Media Signaled with SIP
Best Practices for Securing RTP Media Signaled with SIP. Abstract. Although the Session Initiation Protocol (SIP) includes a suite of security services that has ...