RTP-MIDI
RTP-MIDI is a network protocol that specifies a Real-time Transport Protocol (RTP) payload format for transmitting Musical Instrument Digital Interface (MIDI) 1.0 messages over IP networks, supporting low-latency applications such as collaborative musical performances and MIDI content streaming while incorporating mechanisms for packet loss recovery and timing synchronization.[1]

Developed by researchers John Lazzaro and John Wawrzynek at the University of California, Berkeley, RTP-MIDI originated from a 2004 presentation at the Audio Engineering Society on RTP payloads for MIDI, leading to its standardization by the Internet Engineering Task Force (IETF) Audio/Video Transport working group in collaboration with the MIDI Manufacturers Association (MMA).[2] The protocol was first published as RFC 4695 in November 2006 and updated as RFC 6295 in June 2011 to refine payload encoding, recovery features, and integration with the Session Description Protocol (SDP) for session management.[1] RFC 4696 provides non-normative implementation guidance, emphasizing its use in both interactive real-time scenarios and non-interactive streaming.[3]

Key features of RTP-MIDI include encoding all standard MIDI 1.0 commands into RTP packets with headers for MIDI data and optional recovery journals—sequences of prior commands that enable reconstruction of lost packets without retransmission, using policies such as closed-loop or anchor recovery to balance latency and reliability.[1] It supports up to 16 MIDI channels per stream, timestamp-based timing for synchronization (including MIDI Time Code), and multiple related streams via SDP parameters such as musicport for port numbering or ordered relationships.[1] The protocol operates over UDP or TCP in unicast or multicast modes, integrates with Secure RTP (SRTP) for security, and is compatible with standards such as General MIDI, Downloadable Sounds Level 2 (DLS2), and MPEG-4 Structured Audio, making it suitable for low-bitrate music coding and real-time audio/video synchronization.[1][3]

RTP-MIDI, often referred to as AppleMIDI in Apple's ecosystem, gained widespread adoption through native support in macOS and iOS via the Core MIDI framework's Network Driver, which adds a proprietary session establishment protocol using Bonjour for peer discovery and UDP ports for control and data transmission.[4][5] Other implementations include open-source drivers for Windows (e.g., by Tobias Erichsen), Linux support in frameworks like PipeWire, hardware devices such as Kiss-Box Ethernet interfaces, and libraries for microcontrollers, Android, and iOS apps, enabling wireless MIDI over Ethernet and Wi-Fi networks.[6][5] While RTP-MIDI remains focused on MIDI 1.0, the MMA has introduced Network MIDI 2.0 as a UDP-based successor for enhanced bidirectional communication and MIDI 2.0 features.[7]

Overview
Definition and Purpose
RTP-MIDI is a network protocol specification that encapsulates Musical Instrument Digital Interface (MIDI) messages within Real-time Transport Protocol (RTP) packets transmitted over User Datagram Protocol (UDP) or Transmission Control Protocol (TCP)/Internet Protocol (IP), enabling their transport across Ethernet and Wi-Fi networks.[8][5] This format supports the full range of MIDI 1.0 commands, including those for real-time performance data, synchronization, and control, while integrating with standard IP-based networking infrastructure to facilitate low-latency communication.[8]

The primary purpose of RTP-MIDI is to enable real-time, bidirectional transmission of MIDI data between devices without requiring specialized hardware beyond standard network interfaces, thereby supporting collaborative music production, remote instrument control, and live performances over IP networks.[8] By leveraging RTP's timing mechanisms and optional recovery features, the protocol ensures reliable delivery suitable for interactive applications, such as synchronized ensemble playing or streaming MIDI content, while minimizing the latency critical for musical timing.[8] It addresses the limitations of physical MIDI connections, such as DIN cables or USB, by allowing virtual "cables" over networks that mimic direct device linking.[5]

At its core, RTP-MIDI employs RTP for precise packet sequencing and timestamping to maintain MIDI event timing; RTCP for session control, feedback on packet loss, and stream synchronization; and a session-based architecture that establishes persistent connections between endpoints.[8] These components collectively provide resilience against network variability, such as jitter or dropped packets, through configurable recovery journals and synchronization tools.[8]

Key Features
RTP-MIDI enables low-latency transmission of MIDI data over IP networks by leveraging Real-time Transport Protocol (RTP) timestamps, which synchronize commands with precise timing relative to the RTP clock rate, typically set to ensure accurate playback in musical applications. Delta times encoded in the payload (1-4 octets) represent the interval between MIDI commands and the RTP timestamp, allowing faithful reproduction of timing from sources like Standard MIDI Files. This real-time capability supports interactive performances where synchronization across devices is critical, with configurable modes for timestamp semantics such as asynchronous or buffered rendering.[8]

The protocol facilitates bidirectional, full-duplex communication through sendrecv sessions that emulate the simultaneous send-and-receive behavior of physical MIDI DIN cables, enabling interactive exchanges between endpoints without directional restrictions. Multiple streams can share a MIDI namespace, identified by unique synchronization source (SSRC) identifiers, which supports virtual port mappings for complex routing in applications like networked music ensembles. This duplex nature ensures seamless integration with existing MIDI workflows, treating network connections as virtual cables.[8]

RTP-MIDI operates with network transparency over standard IP infrastructures, utilizing unicast or multicast UDP/IP (or optionally TCP/IP). Scalability is achieved through support for multiple endpoints in peer-to-peer topologies or client-server configurations, where a central session can route data among numerous participants using unique SSRCs per stream and multicast for group communications. This enables applications ranging from small duets to large-scale networked orchestras, with session descriptions via SDP parameters defining transport details like IP versions and port assignments.
The protocol's design accommodates varying network sizes without performance degradation in typical musical contexts.[8]

Error resilience is provided by RTP sequence numbers for detecting and ordering packets, combined with recovery journals that maintain a history of recent MIDI commands to reconstruct lost data without retransmissions that could introduce latency. Journals use checkpoint packets as anchors and include tools such as recency bits for SysEx messages, employing closed-loop or anchor policies to balance reliability and real-time flow. This mechanism ensures uninterrupted MIDI streams even under moderate packet loss, preserving the protocol's suitability for time-sensitive audio production.[8][9]

History
Origins and Development
RTP-MIDI emerged from efforts in the early 2000s to transport Musical Instrument Digital Interface (MIDI) data over IP networks, addressing the constraints of traditional wired connections such as serial cables and USB, which limited mobility in music studios and live performances. Independent developers, notably John Lazzaro and John Wawrzynek at the University of California, Berkeley, initiated the project to encapsulate MIDI messages within Real-time Transport Protocol (RTP) packets, drawing on the RTP/RTCP framework outlined in IETF RFC 3550, published in 2003. This work was conducted in cooperation with the MIDI Manufacturers Association (MMA), aiming to enable low-latency, reliable MIDI transmission for network musical performances and remote collaboration among musicians.[10][2][9]

A pivotal milestone occurred in 2004 when Lazzaro and Wawrzynek presented "An RTP Payload for MIDI" at the 117th Audio Engineering Society (AES) Convention in San Francisco, introducing the core concepts of the payload format and its integration with IETF multimedia protocols like the Session Description Protocol (SDP) and the Session Initiation Protocol (SIP). This presentation built on earlier explorations, such as their 2001 paper "A Case for Network Musical Performance," which highlighted the potential for IP-based MIDI in interactive applications. The motivations centered on creating a robust solution for wireless MIDI over Ethernet and Wi-Fi, mitigating packet loss through innovative recovery mechanisms like journals, while supporting both interactive real-time use and streaming content delivery.[11]

The draft specifications culminated in the publication of IETF RFC 4695, "RTP Payload Format for MIDI," in November 2006, formalizing the protocol as a proposed standard under the Audio/Video Transport Working Group. This document detailed the packetization of MIDI commands, synchronization strategies, and error handling tailored for unreliable networks.
Prior to this standardization, open-source aspects were evident in early prototypes shared among developer communities; for instance, in 2004, developer Tobias Erichsen encountered Lazzaro's draft and began experimenting with RTP-MIDI encapsulation, contributing feedback and creating initial implementations discussed on forums and mailing lists. These grassroots efforts fostered innovation before the protocol's broader adoption.[10][12] The foundational RTP-MIDI specifications paved the way for subsequent commercial integrations, including Apple's implementation in 2005.[10]

AppleMIDI Introduction
Apple introduced support for network-based MIDI transport in macOS 10.4 Tiger, released on April 29, 2005, under the name "Network MIDI." This implementation utilized Apple's Bonjour zero-configuration networking protocol for automatic discovery of MIDI sessions on local IP networks, allowing multiple Macintosh computers to share MIDI data without additional hardware or drivers.[13][4] The feature was built on the emerging RTP-MIDI protocol, which encapsulates MIDI messages within Real-time Transport Protocol (RTP) packets to ensure low-latency transmission suitable for real-time music performance.[10] A key innovation was the deep integration with Apple's Core MIDI framework, which abstracted the networking layer to appear as standard virtual MIDI ports within applications. This enabled seamless pairing and session management through intuitive interfaces reminiscent of iTunes device connections, simplifying setup for musicians.[4] In technical documentation, the protocol became known as AppleMIDI, reflecting its proprietary extensions and the Bonjour service type _apple-midi._udp used for advertisement.[4]
The first public applications to leverage Network MIDI were Apple's GarageBand 2.0 and Logic Pro 7, which supported the feature on macOS 10.4 Tiger.[13] In 2010, support extended to iOS devices with the introduction of Core MIDI APIs in iOS 4.2, enabling wireless MIDI connectivity in mobile music apps.[14]
This driverless integration on Apple platforms significantly boosted RTP-MIDI adoption among consumer musicians, as it eliminated the need for specialized hardware interfaces and facilitated easy network-based workflows in popular software like GarageBand, democratizing access to networked MIDI for home studios and education.[4][12]
Evolution Toward Modern Standards
Following Apple's introduction of its proprietary session protocol atop the standardized RTP payload format in 2005, open-source initiatives emerged to broaden RTP-MIDI accessibility beyond the macOS and iOS ecosystems.[10] The rtpmidid project, a Linux daemon for sharing ALSA sequencer devices via RTP-MIDI, marked a key effort, with its initial beta release in April 2020 enabling network import and export of MIDI sessions.[15] These developments facilitated informal standardization through community-driven implementations, compensating for the lack of cross-platform native support among early RTP-MIDI adopters.[16]

While the IETF formalized the RTP payload for MIDI in RFC 4695 (November 2006), which defined packet structures and recovery mechanisms for real-time transmission, the protocol was further refined in RFC 6295, published in June 2011. Discussions on full protocol integration, including Apple's session management, did not yield additional RFCs due to the proprietary nature of those extensions.[10][1] This left RTP-MIDI's session establishment reliant on reverse-engineered components in non-Apple environments, contributing to persistent compatibility challenges such as connection failures across operating systems, network instability during OS upgrades, and difficulties in multi-device setups.[17] Recent advancements, such as the rtpmidid version 24.12 release in December 2024, addressed some issues by enhancing the MIDI router for improved session routing and stability in diverse network topologies.[18]

The recognition of RTP's header overhead and complexity in resource-constrained devices spurred a transition to lighter UDP-based alternatives, prioritizing lower latency and simpler error handling.[7] The MIDI Association advanced this shift with Network MIDI 2.0 (UDP), initially prototyped in 2023 and formally ratified in November 2024, which supports both MIDI 1.0 and 2.0 via Universal MIDI Packets while incorporating forward error correction and
authentication absent in RTP-MIDI.[19] At the NAMM 2025 Show in January, the Association unveiled initial implementations of Network MIDI 2.0, positioning RTP-MIDI as a foundational bridge to these MIDI 2.0 network extensions through backward compatibility layers.[20]

Protocol Fundamentals
Packet Header Format
The RTP-MIDI packet format adheres to the Real-time Transport Protocol (RTP) structure defined in RFC 3550, consisting of a fixed 12-byte RTP header followed by a MIDI-specific payload that encapsulates Musical Instrument Digital Interface (MIDI) commands and timing information. This design enables low-latency transmission of MIDI data over IP networks while supporting error recovery and synchronization. The payload type for RTP-MIDI is dynamically assigned from the range 96-127, as registered with the Internet Assigned Numbers Authority (IANA).[21]

The RTP header includes essential fields for packet identification, ordering, and timing, formatted in big-endian byte order:

| Field | Size (bits) | Value/Description |
|---|---|---|
| Version (V) | 2 | Set to 2. |
| Padding (P) | 1 | Typically 0; indicates padding if 1. |
| Extension (X) | 1 | Typically 0; indicates RTP header extension if 1. |
| CSRC Count (CC) | 4 | Number of contributing sources (usually 0). |
| Marker (M) | 1 | Set to 1 if the MIDI command section length is greater than 0. |
| Payload Type (PT) | 7 | Dynamic value (96-127) for RTP-MIDI. |
| Sequence Number | 16 | Monotonically increasing counter (initial value random) to detect packet loss. |
| Timestamp | 32 | Reflects the sampling instant of the first octet in the RTP payload; clock rate specified in session setup (e.g., 1000 Hz for 1 ms resolution). |
| SSRC | 32 | Synchronization source identifier, unique per stream to distinguish sources. |
| CSRC List | Variable (0-15 × 32) | Contributing sources, if CC > 0 (rarely used in RTP-MIDI). |
As an example, a packet carrying a single Note On command might be laid out as follows:
- Bytes 0-1: V=2, P=0, X=0, CC=0 → 0x80; M=1, PT=97 → 0xE1
- Bytes 2-3: Sequence Number (e.g., 0x0001) → 0x0001
- Bytes 4-7: Timestamp (e.g., 0x00002710 for 10000 at 1000 Hz) → 0x00002710
- Bytes 8-11: SSRC (e.g., 0x12345678) → 0x12345678
- Byte 12: MIDI Header (B=0, J=0, Z=1, P=0, LEN=4 → 0x24; Z=1 because an explicit delta-time octet precedes the first command)
- Bytes 13-16: Delta Time (0x00 for immediate playout) followed by the MIDI command (Note On, channel 0, note 60, velocity 64 → 0x90 0x3C 0x40)
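The layout above can be reproduced with a short Python sketch (a simplified illustration assuming the dynamic payload type 97, no recovery journal, and a command list shorter than 16 octets so the four-bit LEN field of the short header form suffices):

```python
import struct

def build_rtp_midi_packet(seq, timestamp, ssrc, midi_list,
                          payload_type=97, marker=True, z=1):
    """Pack a minimal RTP-MIDI packet: 12-byte RTP header, one-byte
    MIDI command section header, then the delta-time/command list."""
    byte0 = 2 << 6                                     # V=2, P=0, X=0, CC=0
    byte1 = (0x80 if marker else 0x00) | payload_type  # M bit + 7-bit PT
    header = struct.pack('>BBHII', byte0, byte1, seq, timestamp, ssrc)
    assert len(midi_list) < 16, "short form: LEN must fit in 4 bits"
    section = (z << 5) | len(midi_list)                # B=0, J=0, Z, P=0, LEN
    return header + bytes([section]) + bytes(midi_list)

# Delta time 0x00, then Note On (channel 0, note 60, velocity 64)
pkt = build_rtp_midi_packet(1, 10000, 0x12345678, [0x00, 0x90, 0x3C, 0x40])
```

Packing the two flag bytes separately from the 16- and 32-bit fields keeps the big-endian layout explicit and easy to compare against the table above.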
Session Establishment and Management
RTP-MIDI sessions are established and managed using the Session Description Protocol (SDP) to negotiate transport parameters, media encoding, and stream configurations, typically in conjunction with signaling protocols such as SIP Offer/Answer or declarative protocols like RTSP. SDP media lines (e.g., m=audio 5004 RTP/AVP 96) specify the RTP payload type, and attributes like a=rtpmap:96 rtp-midi/44100 define the encoding name and clock rate (e.g., 44100 Hz, or 1000 Hz for 1 ms resolution). Additional format-specific parameters (fmtp) configure features such as the timestamp mode (tsmode), recovery journal policies (e.g., j_sec=recj to enable the recovery journal), and the subset of MIDI commands in use (cm_used and cm_unused). For related streams sharing a MIDI namespace, SDP grouping attributes (e.g., a=group:FID 1 2) or the musicport parameter define identities or ordering.[27][28]
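An illustrative SDP description combining these elements for a native RTP-MIDI stream might look as follows (the address, port, session name, and dynamic payload type are hypothetical values chosen for the example):

```
v=0
o=jdoe 2890844526 2890842807 IN IP4 192.0.2.1
s=RTP-MIDI example session
c=IN IP4 192.0.2.1
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 rtp-midi/44100
a=sendrecv
```

Here the m= line binds payload type 96 to UDP port 5004, the rtpmap attribute declares the rtp-midi encoding at a 44100 Hz clock, and sendrecv requests a bidirectional session.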
Sessions support unicast or multicast over UDP (with recovery journals for loss resilience) or TCP (without journals). Multiple concurrent streams per endpoint are possible, enabling complex topologies like splitting namespaces across streams with synchronized timestamps and shared SSRC values. Sequence numbers in RTP headers ensure ordering and loss detection, while RTCP provides feedback for quality monitoring. Synchronization relies on RTP timestamps aligned across streams and periodic RTCP sender reports to maintain timing accuracy, with configurable parameters like rtp_ptime (packet duration) and guardtime (minimum inter-packet interval, often 0 ms for low latency). For teardown, standard RTP/RTCP mechanisms (e.g., BYE packets) release resources, though application-specific signaling may handle session closure.[29][23]
This SDP-based approach decouples session parameters from physical devices, allowing flexible virtual MIDI port mappings independent of underlying network transports.[28]
Endpoint and Participant Roles
In RTP-MIDI, endpoints refer to any IP-capable devices that function as MIDI sources or sinks, such as controller keyboards, synthesizers, sequencers, or content servers, enabling the transmission and reception of MIDI data over networks.[21] Each endpoint is uniquely identified within an RTP session by a 32-bit Synchronization Source Identifier (SSRC) in the RTP header, which distinguishes multiple streams, and by a Canonical Name (CNAME) in RTCP reports, which provides persistent identification across sessions and detects SSRC collisions.[21] These identifiers ensure that endpoints can participate in unicast UDP-based sessions, where each stream typically encodes a single MIDI namespace comprising 16 voice channels plus system commands, though namespaces may be split across sessions using identical SSRC values for related streams.[21]

Participant roles in RTP-MIDI sessions are defined by their involvement in data flow and session dynamics, primarily as senders or receivers: senders transcode MIDI data into RTP packets, timestamp commands, and maintain recovery journals to mitigate packet loss, while receivers detect losses, repair artifacts using those journals, and render the MIDI output.[21] In session establishment, participants adopt temporary roles as initiator or acceptor: the initiator (e.g., SDP offerer) proposes connection parameters, while the acceptor (e.g., answerer) confirms or modifies them, after which roles become symmetric for bidirectional exchange. This dynamic is specified via SDP attributes such as sendrecv, recvonly, or sendonly.[21]
RTP-MIDI supports multiple participants through RTP mixing at a central point or by grouping multiple unicast/multicast streams, enabling configurations like ensemble performances where MIDI is distributed to several receivers via shared namespaces or coordinated sessions. Role flexibility allows endpoints to switch functions (e.g., from sender to receiver) across sessions without fixed hierarchy.[21][23]
Compared to physical MIDI, RTP-MIDI extends connectivity over IP networks using standard RTP ports, treating sessions as virtual channels for sources and destinations while leveraging timestamps for synchronization.[21]
Apple's Session Protocol
Invitation and Connection Sequence
The invitation and connection sequence in Apple's RTP-MIDI implementation, known as AppleMIDI, begins with service discovery via Bonjour, where participating devices advertise their availability using the service type _apple-midi._udp. This zero-configuration protocol allows devices on the same local network to discover each other without manual IP configuration, registering a control port (denoted as N) and an adjacent MIDI data port (N+1) for UDP communication. AppleMIDI sessions via Bonjour are designed for devices on the same local network; connections across NAT or subnets may require additional network configuration.[4]
Once a device identifies a potential peer through Bonjour, the initiator sends an INVITE packet, represented by the 16-bit command 'IN' (ASCII 0x494E), over the control port. This packet includes the protocol version (set to 2 in network byte order), a random 32-bit initiator token generated by the sender, the sender's 32-bit Synchronization Source Identifier (SSRC) for distinguishing RTP streams, and an optional NULL-terminated UTF-8 string for the initiator's name. If no response is received, the initiator resends the INVITE every second, up to a maximum of 12 attempts. The responder, upon receiving the INVITE, replies on the same control port with either an OK packet (command 'OK', ASCII 0x4F4B) to accept—copying the initiator's token and including its own SSRC and name—or a rejection via the NAK equivalent, the 'NO' packet (command 'NO', ASCII 0x4E4F), which omits the name field.[4]
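The exchange packet layout described above can be sketched in Python. This is an illustrative construction, not a complete implementation: it assumes the two-octet 0xFFFF signature that precedes the two-character command in AppleMIDI exchange packets, and the token, SSRC, and name values are made up for the example:

```python
import struct

APPLEMIDI_SIGNATURE = b'\xff\xff'   # precedes the two-character command
PROTOCOL_VERSION = 2                # 32-bit, network byte order

def build_session_packet(command, token, ssrc, name=None):
    """Build an AppleMIDI session exchange packet ('IN', 'OK', or 'NO').
    Rejections ('NO') omit the optional NULL-terminated UTF-8 name."""
    pkt = APPLEMIDI_SIGNATURE + command          # e.g. b'IN' = 0x49 0x4E
    pkt += struct.pack('>III', PROTOCOL_VERSION, token, ssrc)
    if name is not None:
        pkt += name.encode('utf-8') + b'\x00'    # NULL-terminated name
    return pkt

invite = build_session_packet(b'IN', 0x1234ABCD, 0x12345678, 'MySession')
reject = build_session_packet(b'NO', 0x1234ABCD, 0x87654321)   # no name
```

The responder's OK packet uses the same structure, copying the initiator's token while substituting its own SSRC and name.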
Following successful control port negotiation, the initiator repeats the INVITE on the MIDI port to establish the data channel. The responder mirrors the response with OK or NO on the MIDI port, using the same field structure. Upon mutual acceptance, the initiator initiates clock synchronization using dedicated sync packets to align timestamps and compensate for network latency. These packets include the SSRC, a count field (starting at 0 and incrementing to 2 over three exchanges), and 64-bit timestamps measured in 100-microsecond units from the local system clock. The sequence computes a round-trip offset as ((timestamp3 + timestamp1) / 2) - timestamp2, enabling latency adjustment for subsequent RTP-MIDI data packets; this sync process repeats at least every 60 seconds to maintain the session.[4]
Rejection via NO terminates the attempt without further exchanges, and failed retries after 12 attempts prompt the initiator to restart discovery.[4]
Synchronization Mechanisms
RTP-MIDI maintains timing alignment between session participants after connection establishment primarily through RTP timestamps embedded in packet headers and RTCP sender reports, which allow receivers to synchronize multiple streams from the same sender by correlating their timing fields.[30] In Apple's implementation of RTP-MIDI, known as AppleMIDI, clock synchronization is further refined using CK (synchronization) command packets, which exchange local clock values alongside RTP timestamps to compute timing offsets.[4] These CK packets include up to three 64-bit timestamps in 100-microsecond units, enabling participants to estimate clock offsets for ongoing alignment.[31]

The basic offset calculation derives from the difference between remote and local timestamps, normalized by the clock rate:

offset = (remote_timestamp − local_timestamp) / clock_rate

This formula provides a straightforward adjustment for drift; the more refined NTP-like estimate

offset_estimate = (timestamp3 + timestamp1) / 2 − timestamp2

is used in the initial three-way exchange and refreshed periodically, every 60 seconds in AppleMIDI.[4] Receivers apply these offsets to RTP timestamps to align incoming MIDI commands accurately.
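The two offset estimates above can be sketched as follows (timestamps in 100 µs units, as in AppleMIDI CK exchanges; the scenario values are illustrative):

```python
def basic_offset(remote_timestamp, local_timestamp, clock_rate):
    """Clock offset in seconds from a single remote/local timestamp pair."""
    return (remote_timestamp - local_timestamp) / clock_rate

def ck_offset_estimate(ts1, ts2, ts3):
    """NTP-like estimate from a three-way CK exchange: ts1 is the initiator's
    send time, ts2 the responder's reply time (on its own clock), and ts3 the
    initiator's time when the reply arrives, all in the same units."""
    return (ts3 + ts1) / 2 - ts2

# Example: responder clock runs 500 units (~50 ms) ahead of the initiator,
# with a symmetric 20-unit one-way network delay.
ts1 = 10_000              # initiator sends CK0
ts2 = ts1 + 20 + 500      # responder stamps CK1 on its own (faster) clock
ts3 = ts1 + 40            # round trip completes on the initiator's clock
est = ck_offset_estimate(ts1, ts2, ts3)   # -500.0: the local clock lags
```

Averaging the send and receive times cancels a symmetric network delay, which is why the three-way form is preferred over a single timestamp pair.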
To mitigate network variability, RTP-MIDI employs an adaptive jitter buffer at the receiver, which dynamically adjusts its size based on observed packet arrival times and sender timing consistency, typically ranging from 100 µs to 2 ms on low-jitter LANs.[32] This buffering smooths out jitter without introducing excessive latency, ensuring MIDI commands are played out in the correct sequence and timing.[33] Resynchronization is triggered by detected anomalies such as sequence number gaps in RTP packets or insights from periodic RTCP reports, prompting the receiver to realign its clock and buffer using the latest offset data.[34] In AppleMIDI, periodic CK timing packets sent every 60 seconds allow for drift corrections during active sessions and maintain tight synchronization even under minor network fluctuations.[4]

Journal Updates and Error Handling
In RTP-MIDI, the recovery journal serves as a key mechanism for mitigating packet loss by maintaining a structured history of recent MIDI events at each endpoint, enabling state reconstruction without relying on retransmission requests. The journal is organized into chapters that categorize MIDI commands—such as channel-specific notes (Chapter N), control changes (Chapter C), and system messages (Chapter Q)—and references a checkpoint packet via its RTP sequence number, allowing receivers to apply corrective actions like NoteOff commands for indefinite artifacts (e.g., stuck notes). This buffer captures the session state in oldest-first order, supporting active, N-active, and C-active command types to prioritize essential recovery data.[8][3]

Journal updates occur dynamically: the sender appends new MIDI events to the recovery journal after transmitting each RTP packet and trims older entries based on RTCP feedback from the receiver, which reports the highest successfully received sequence number. These periodic RTCP sender and receiver reports facilitate a closed-loop policy (the default), reducing journal overhead while ensuring sufficient history for loss recovery; for instance, checkpoints can be updated every 5 seconds to optimize size without compromising reliability. In AppleMIDI implementations, the journal always includes the recovery section (indicated by the J bit) and encompasses specific chapters like P (program change), C (control), W (aftertouch), N (note), T (timing), A (active sense), Q (sequencer), and F (SysEx fragments), while excluding others such as M, E, D, V, and X to streamline transmission.[8][3][4]

Packet loss is handled through gap detection in the 16-bit RTP sequence numbers (extended to 32 bits internally for rollover tracking), prompting the receiver to execute recovery commands from the journal embedded in arriving packets; the S bit in journal headers further aids detection of single-packet losses.
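A minimal sketch of 16-bit sequence-gap detection with rollover handling, as a receiver might implement it (class and method names are illustrative, not part of any specification):

```python
class SequenceTracker:
    """Track 16-bit RTP sequence numbers, extending them to a wider
    internal counter so rollover (0xFFFF -> 0x0000) is not seen as loss."""

    def __init__(self):
        self.extended = None   # last extended (32-bit) sequence number

    def advance(self, seq16):
        """Return the number of packets missing before seq16 (0 = in order)."""
        if self.extended is None:
            self.extended = seq16
            return 0
        # Choose the 32-bit extension of seq16 closest to the expected value.
        base = self.extended & ~0xFFFF
        candidates = (base - 0x10000 + seq16, base + seq16,
                      base + 0x10000 + seq16)
        ext = min(candidates, key=lambda c: abs(c - (self.extended + 1)))
        gap = ext - self.extended - 1
        self.extended = ext
        return max(gap, 0)
```

Detected gaps would then trigger playback of the recovery chapters from the journal carried in the next arriving packet, rather than a retransmission request.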
To prevent bandwidth overload, journal size is constrained by RTCP feedback and policy parameters (e.g., j_sec=recj enables journaling, with limits on history depth), ensuring efficient operation over UDP. This approach delivers MIDI transmission with reliability comparable to TCP but with lower latency and overhead, suitable for network musical applications.[8][3][4]

Disconnection Procedures
RTP-MIDI supports both graceful and abrupt disconnection procedures to ensure reliable session termination and resource management. In graceful teardown, a participant sends an RTCP BYE packet to signal its exit from the session, which includes an optional reason code to specify the cause, such as user disconnection or protocol errors. This packet is transmitted unreliably over the control channel, and upon receipt, the receiving peer acknowledges it implicitly by ceasing transmission and closing the RTP and RTCP ports associated with the session. In the AppleMIDI variant, this corresponds to the "End Session" command encoded as the two-byte sequence 0x4259 ('BY'), sent via the control UDP port, mirroring the RTCP BYE structure while integrating with Apple's session management.[4][35]

For abrupt disconnections, such as those caused by network failures or crashes, RTP-MIDI implementations detect inactivity through timeouts on control packets. Receivers monitor for the absence of RTCP packets, including Sender Reports (SR), Receiver Reports (RR), and recovery journals, typically timing out after 5-10 seconds of silence to trigger automatic session closure and prevent indefinite resource holding or stuck MIDI notes. This aligns with the protocol's minimum RTCP transmission interval of 5 seconds for small sessions, allowing prompt detection without excessive delay. In AppleMIDI, the Clock Synchronization (CK, 0x434B) packets, sent approximately every 60 seconds by the session initiator, provide an additional heartbeat; prolonged absence reinforces the timeout-based disconnect.[3][4]

Upon disconnection—whether graceful or abrupt—implementations release resources by flushing any buffered MIDI events to avoid artifacts, deregistering the virtual MIDI ports created for the session, and updating the mDNS (Bonjour) service announcement to remove the endpoint from network discovery lists.
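The timeout-based liveness detection described above can be sketched as a small watchdog (a simplified illustration: the 10-second threshold is one point in the 5-10 s range mentioned, and obtaining a monotonic clock is left to the caller):

```python
RTCP_TIMEOUT_S = 10.0   # close the session after this much control silence

class SessionLiveness:
    """Detect abrupt peer disappearance from control-traffic silence."""

    def __init__(self, now):
        self.last_heard = now

    def heard_from_peer(self, now):
        # Call for any RTCP SR/RR, recovery journal, or AppleMIDI CK packet.
        self.last_heard = now

    def timed_out(self, now):
        # True once the peer has been silent longer than the threshold.
        return (now - self.last_heard) > RTCP_TIMEOUT_S
```

When timed_out() returns True, an implementation would perform the same cleanup as a graceful teardown: flush buffered events, deregister virtual ports, and withdraw the Bonjour advertisement.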
AppleMIDI handles multiple concurrent sessions independently, ensuring that a disconnection in one does not propagate to others, with each session maintaining separate port pairs and state.[4][36] Specific error conditions are conveyed via the BYE packet's reason field, a variable-length text string that may indicate "user disconnected" for manual terminations or "network failure" for connectivity issues, aiding diagnostics and logging without requiring additional packets.

Advanced Protocol Features
MIDI Merging
In RTP-MIDI, the merging process occurs at the session receiver, where multiple incoming MIDI streams are combined into a single output stream to maintain compatibility with traditional MIDI 1.0 DIN cable merging standards.[37] Receivers interleave MIDI commands from these streams based on their RTP timestamps, ensuring proper ordering and timing preservation across the combined output.[38] This timestamp-based approach relies on RTP sequence numbers to detect and reconstruct packet order, preventing out-of-sequence delivery that could disrupt musical performance.[22] The protocol does not include a native command for merging; instead, it is implemented through endpoint logic that processes streams identified by unique Synchronization Source Identifiers (SSRCs).[37]

A common use case for MIDI merging in RTP-MIDI is in professional studio setups, where multiple controllers—such as keyboards or sequencers—transmit data over a network to feed a single synthesizer or audio workstation, enabling collaborative music production without physical cabling.[39] For instance, in network musical performances, participants can share a session where incoming streams from remote devices are seamlessly integrated into the local receiver's MIDI namespace, supporting real-time synchronization via RTCP sender reports.[22]

Limitations arise from potential channel conflicts when multiple streams target the same MIDI channels, which can lead to artifacts like stuck notes if not managed properly.[37] Senders mitigate this by partitioning streams—such as assigning distinct channels to separate RTP sessions—while receivers resolve duplicates through filtering based on sequence numbers and SSRCs, though the protocol recommends careful configuration to avoid indefinite artifacts.[37] In AppleMIDI implementations, the Core MIDI framework handles this merging transparently at the system level, presenting the combined stream as a unified virtual MIDI port without requiring application-level intervention.[40]

MIDI Splitting and Thru Functionality
RTP-MIDI supports MIDI splitting by allowing endpoints to replicate incoming MIDI packets across multiple active sessions, ensuring that data from a single source can be distributed to numerous destinations without loss of synchronization. This replication process preserves the original RTP timestamps embedded in the packets, which are crucial for maintaining timing accuracy and order of delivery across the network.[41][42] The thru functionality in RTP-MIDI emulates the behavior of traditional physical MIDI thru ports, where incoming data is forwarded to additional outputs without alteration or processing, enabling seamless passthrough in networked environments. Within a single session, this occurs automatically as MIDI messages from one participant are duplicated and broadcast to all other connected devices, functioning as a virtual thru box.[41] Implementation of splitting and thru relies on virtual MIDI ports created by the operating system's RTP-MIDI driver or dedicated router software, which handle the duplication and routing logic. For instance, endpoints can support fan-out to four or more outputs by participating in multiple concurrent sessions, with each session treated independently to direct replicated streams. This approach allows a single incoming MIDI stream to be fanned out across diverse network destinations, such as multiple hardware interfaces or software applications.[42][41] At the protocol level, RTP-MIDI provides no dedicated commands for splitting or thru; instead, these features emerge from the underlying multi-session management capabilities of the AppleMIDI session protocol layered atop RTP. 
Endpoints manage replication by joining multiple sessions simultaneously, using session identifiers and port assignments to segregate traffic without dedicated signaling.[41][4] In networked setups, this functionality facilitates daisy-chaining of MIDI devices over Ethernet or Wi-Fi, where intermediate endpoints can filter and replicate streams to downstream participants while preventing feedback loops through selective session participation and port isolation. This contrasts with MIDI merging, which combines inputs from multiple sources into a unified stream.[41]
Distributed Patchbay Concept
The distributed patchbay concept in RTP-MIDI envisions the IP network as a flexible matrix of virtual MIDI cables, where endpoints dynamically connect and route data without physical interconnections. Devices advertise their virtual ports through Bonjour service discovery via the _apple-midi._udp service, enabling peer-to-peer session invitations that establish bidirectional MIDI streams. Each session functions as a virtual cable pair, supporting up to 16 such pairs per endpoint in typical implementations, allowing users to patch MIDI sources to destinations across the network as if using a traditional hardware patchbay. This model leverages the protocol's UDP-based control and RTP payload channels to create on-demand connections, transforming scattered devices into an interconnected MIDI ecosystem.[4]
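The "virtual cable pair" model can be pictured as a small session table per endpoint. The sketch below is purely illustrative: the `Endpoint` class, its method names, and the hard 16-pair limit are assumptions modeled on typical implementations, and real session setup also involves AppleMIDI invitation and clock-synchronization exchanges that are omitted here:

```python
class Endpoint:
    """Toy model of an RTP-MIDI endpoint in a distributed patchbay:
    each established session acts as one virtual cable pair, isolated
    by the peer's SSRC.  (Illustrative sketch, not a driver API.)"""

    MAX_PAIRS = 16  # typical per-endpoint limit of virtual cable pairs

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # ssrc -> peer name

    def patch(self, ssrc, peer):
        """Establish a virtual cable pair to a discovered peer."""
        if ssrc in self.sessions:
            raise ValueError("SSRC already in use on this endpoint")
        if len(self.sessions) >= self.MAX_PAIRS:
            raise RuntimeError("no free virtual cable pair")
        self.sessions[ssrc] = peer

    def unpatch(self, ssrc):
        """Tear down one pair; remaining routings are unaffected."""
        self.sessions.pop(ssrc, None)
```

Keying the table by SSRC mirrors how the protocol isolates concurrent streams: a peer can leave (unpatch) without disturbing any other routing, which is what allows ad-hoc reconfiguration of the patchbay.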
A key benefit of this approach is its scalability for expansive setups, accommodating dozens to over 100 devices in professional environments by distributing routing logic across participants rather than requiring centralized hardware. It significantly reduces cabling complexity, as Ethernet or Wi-Fi infrastructure handles long-distance transmission, supporting runs up to hundreds of meters without signal degradation. In contrast to legacy MIDI 1.0 systems limited by daisy-chaining and single-cable constraints, RTP-MIDI's virtual patching minimizes setup time and physical clutter, making it suitable for mobile or venue-based applications.[43]
The protocol enables this distributed patching through multi-session support, where a single endpoint can maintain concurrent connections to multiple peers using unique session tokens and SSRC identifiers for isolation. Automatic discovery and invitation sequences allow ad-hoc reconfiguration, with MIDI data automatically merged from incoming sessions at the receiver or split to outgoing ones, building on endpoint-level operations like thru functionality. This facilitates seamless integration in heterogeneous networks, where devices join or leave without disrupting existing routings, provided the underlying IP topology remains stable.[4]
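The receiver-side merge of incoming sessions mentioned above amounts to a timestamp-ordered interleave with duplicate filtering. This sketch uses hypothetical names (`MidiCommand`, `merge_incoming`) and omits the recovery-journal processing a real receiver would also perform:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class MidiCommand:
    timestamp: int                     # RTP timestamp (primary sort key)
    seq: int                           # RTP sequence number (tie-breaker)
    ssrc: int = field(compare=False)   # identifies the source stream
    data: bytes = field(compare=False) # raw MIDI message

def merge_incoming(streams):
    """Interleave commands from several per-SSRC streams (each already
    timestamp-ordered) into one output stream, dropping duplicate
    deliveries identified by their (SSRC, sequence number) pair."""
    seen = set()
    merged = []
    for cmd in heapq.merge(*streams):  # k-way merge by (timestamp, seq)
        key = (cmd.ssrc, cmd.seq)
        if key not in seen:
            seen.add(key)
            merged.append(cmd)
    return merged
```

Because each incoming session is already ordered by its own timestamps, a k-way merge suffices to produce the single interleaved output that a DIN-cable merger would have delivered.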
Representative examples include large-scale live performance networks, such as theater productions or ensemble setups, where a central control station dynamically routes MIDI from a conductor's surface to distributed instrument sections across a venue, ensuring synchronized playback via redundant virtual paths. In such configurations, technicians can re-patch streams in real-time—e.g., redirecting clock signals to backup devices—leveraging the protocol's timestamping for precise synchronization.[43]
Despite these advantages, the concept has limitations in highly complex topologies, often relying on a dedicated central router or hub device to aggregate and manage connections for stability and conflict avoidance. The protocol provides no native arbitration for simultaneous data streams from multiple sources, leaving resolution to endpoint merging logic, which may introduce variability in large multicast scenarios without additional network optimizations.[43]