
Presentation timestamp

A presentation timestamp (PTS) is a 33-bit field embedded in the header of Packetized Elementary Stream (PES) packets within MPEG-2 program streams or transport streams, indicating the precise time at which an access unit, such as a video frame or audio sample, should be presented to the viewer relative to a 90 kHz system clock derived from the Program Clock Reference (PCR). This timestamp ensures synchronization between audio and video elements, preventing issues like lip-sync discrepancies during playback, and is essential for decoding and presentation timing in digital video broadcasting standards.

In MPEG-2 systems, the PTS operates alongside the Decoding Time Stamp (DTS), which specifies when an access unit must be decoded; for intra-coded (I) and predictive (P) frames, PTS and DTS typically align since decoding and presentation occur sequentially, whereas for bi-directional predictive (B) frames, DTS precedes PTS due to their out-of-order transmission required by inter-frame prediction. Audio PES packets contain only a PTS, as samples are presented in sequence without reordering, while video packets may include both depending on frame type, with separations up to three picture periods in sequences like IPBB. The PTS provides sub-millisecond precision (90 kHz resolution, about 11.1 microseconds per tick) and must appear at intervals no greater than 700 ms; decoders interpolate absent values to maintain smooth playback.

Beyond core applications in DVD, digital TV, and streaming, PTS principles extend to related formats like RTP payloads for MPEG video over networks, where 32-bit fields at 90 kHz accuracy synchronize frames across packets. In modern contexts such as HTTP Live Streaming (HLS), the PTS facilitates timing of media segments for adaptive bitrate delivery, ensuring seamless transitions and synchronization in web-based video. Standards like ISO/IEC 13818 govern its implementation, emphasizing the PTS's role in system target decoders for accurate presentation unit timing.

Fundamentals

Definition

A presentation timestamp (PTS) is a field embedded in the headers of Packetized Elementary Stream (PES) packets within MPEG transport streams (TS) or program streams (PS). The PTS indicates the exact time at which a media frame, such as a video picture or audio access unit, should be presented to the user after decoding, relative to a reference clock. It was introduced in the MPEG-1 systems standard (ISO/IEC 11172-1), published in 1993, to enable synchronized playback of audio and video. In MPEG-2 systems (ISO/IEC 13818-1), the PTS is encoded as a 33-bit value sampled from a 90 kHz clock, wrapping around approximately every 26.5 hours (2^33 / 90,000 seconds).
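
As a quick check of these constants, a short Python calculation recovers the wraparound period from the 33-bit width and the 90 kHz clock:

```python
# Back-of-the-envelope derivation of the PTS wraparound period,
# assuming the 33-bit counter and 90 kHz clock described above.
TICKS_PER_SECOND = 90_000
WRAP = 2 ** 33                                  # 8,589,934,592 ticks

wrap_hours = WRAP / TICKS_PER_SECOND / 3600
print(f"PTS wraps after {wrap_hours:.1f} hours")  # -> 26.5 hours
```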

Purpose in Media Synchronization

The presentation timestamp (PTS) primarily serves to achieve frame-accurate synchronization between audio and video tracks in multiplexed streams, ensuring that corresponding elements such as spoken dialogue and lip movements align precisely during playback to prevent lip-sync discrepancies. In MPEG standards, PTS values, derived from a 90 kHz system clock, indicate the exact time when an access unit, such as a video frame or audio sample, should be presented, allowing decoders to coordinate multiple streams relative to a shared timeline. This is essential for immersive media experiences, where even minor temporal offsets can degrade perceptual quality.

PTS also plays a key role in buffering mechanisms within decoding systems, where it guides the management of presentation queues by specifying the release time for each access unit, thereby preventing playback stalls or buffer underruns caused by variable decoding delays. For instance, in the System Target Decoder model of MPEG-2, the PTS ensures that frames are held in buffers until their designated presentation instant, maintaining smooth output rates despite fluctuations in input data arrival. This timed presentation control supports real-time playback in constrained environments like broadcast receivers, where buffer overflow or starvation must be avoided to uphold continuous media flow.

Furthermore, PTS facilitates clock recovery at the decoder by enabling the reconstruction of the encoder's system clock, often in conjunction with program clock references (PCR) in MPEG transport streams, to align playback with the original timing intent. The decoder uses PTS values to adjust its local clock, compensating for drift and ensuring long-term synchronization across extended streams, with PTS insertions required at intervals no greater than 700 ms to maintain accuracy. This process is vital for applications like digital television, where precise clock alignment prevents cumulative timing errors over hours of content.

In error handling, the PTS supports the detection of missing or out-of-order access units by allowing decoders to compare received timestamps against expected sequences, flagging anomalies such as large gaps or regressions that indicate loss or reordering during transmission. For example, non-monotonic PTS progression can trigger resynchronization procedures, mitigating impacts from network jitter or packet loss in streaming scenarios. Such validation ensures robust playback resilience without requiring additional overhead metadata.
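
A minimal Python sketch of such a validation pass, not drawn from any particular decoder, might flag wrap-aware regressions and gaps exceeding the 700 ms insertion bound:

```python
# Hedged sketch of PTS-based anomaly detection: flags modulo-2^33
# regressions (possible reordering) and gaps beyond 700 ms (possible loss).
WRAP = 2 ** 33
MAX_GAP_TICKS = 700 * 90            # 700 ms at 90 kHz = 63,000 ticks

def check_pts_sequence(pts_values):
    anomalies = []
    for prev, curr in zip(pts_values, pts_values[1:]):
        forward = (curr - prev) % WRAP          # forward distance, wrap-aware
        if forward == 0 or forward > WRAP // 2:
            anomalies.append(("regression", prev, curr))
        elif forward > MAX_GAP_TICKS:
            anomalies.append(("gap", prev, curr))
    return anomalies

# Example: a regression between 9_000 and 6_000, then a 1-second gap.
print(check_pts_sequence([0, 3_000, 9_000, 6_000, 96_000]))
```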

Standards and Protocols

MPEG Family

The MPEG family of standards introduced and evolved the presentation timestamp (PTS) as a core mechanism for synchronizing audio and video streams within compressed bitstreams. Beginning with MPEG-1, the PTS was defined to ensure precise timing in the system layer, setting the foundation for subsequent enhancements in later standards that addressed broader applications such as broadcasting and interactive multimedia.

In MPEG-1, formalized as ISO/IEC 11172 in 1993, the PTS was initially specified in the system layer for multiplexing video and audio elementary streams into a single program stream. It appears in the packet headers of the packet layer, providing timing for presentation units such as decoded audio access units or video pictures. The PTS uses a clock granularity of 90 kHz for both audio and video, enabling end-to-end timing correction across streams in storage media such as CD-ROM. This design prioritized simplicity for consumer applications, with values indicating the intended presentation time relative to the system target decoder's clock.

MPEG-2, defined in ISO/IEC 13818 in 1995, enhanced the PTS to support more robust delivery scenarios, particularly broadcasting. The PTS is embedded in the headers of Packetized Elementary Stream (PES) packets, using a 33-bit value encoded across three fields separated by marker bits, maintaining the 90 kHz clock for compatibility with MPEG-1. This extension facilitates synchronization in both program streams and transport streams, where PES packets are further encapsulated for error-prone environments such as broadcast transmission. In transport streams, the PTS enables multi-program transport by associating timing with specific Packet Identifiers (PIDs), ensuring seamless switching between programs without disrupting playback. Notably, the PTS is mandatory for video and audio PIDs to support multi-program capabilities, with requirements for inclusion in the first access unit of each stream and at intervals not exceeding 700 ms.

The MPEG-4 standard, outlined in ISO/IEC 14496 starting in 1999, adapted the PTS for object-based and interactive multimedia, integrating it into the synchronization layer (SL) that packetizes elementary streams. The timestamp is conveyed within SL packets associated with object descriptors, which manage stream identification and updates during a presentation. This structure supports dynamic scene descriptions, allowing decoders to synchronize not only audio-visual elements but also interactive components like user events in interactive and mobile applications. The 32-bit timestamp operates at a flexible resolution, often 90 kHz but configurable, promoting adaptability for bandwidth-constrained or variable-rate environments.
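
To make the MPEG-2 field layout concrete, here is a hedged Python sketch of the five-byte PES packing: a 4-bit prefix ('0010' when only a PTS is present), then the 33 timestamp bits split 3 + 15 + 15, each group closed by a marker bit:

```python
# Sketch of the MPEG-2 PES header PTS packing and its inverse.
def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33-bit PTS into the five-byte PES header layout."""
    assert 0 <= pts < 2 ** 33
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,  # prefix, PTS[32:30], marker
        (pts >> 22) & 0xFF,                               # PTS[29:22]
        (((pts >> 15) & 0x7F) << 1) | 1,                  # PTS[21:15], marker
        (pts >> 7) & 0xFF,                                 # PTS[14:7]
        ((pts & 0x7F) << 1) | 1,                           # PTS[6:0], marker
    ])

def decode_pts(b: bytes) -> int:
    """Recover the 33-bit PTS from the five header bytes."""
    return (((b[0] >> 1) & 0x07) << 30) | (b[1] << 22) | \
           (((b[2] >> 1) & 0x7F) << 15) | (b[3] << 7) | ((b[4] >> 1) & 0x7F)

assert decode_pts(encode_pts(123_456_789)) == 123_456_789
```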

Streaming and Transport Protocols

In the Real-time Transport Protocol (RTP), originally standardized as RFC 1889 in 1996 and revised as RFC 3550 in 2003, the presentation timestamp (PTS) from underlying media formats such as MPEG is mapped directly to the RTP header's 32-bit timestamp field to support real-time delivery of audio and video over IP networks. This field indicates the sampling instant of the payload's first octet, enabling receivers to synchronize presentation across multiple streams by correlating RTP timestamps with Network Time Protocol (NTP) timestamps via RTCP Sender Reports. Network jitter is compensated through RTCP feedback mechanisms, which provide timing adjustments without altering the core PTS values.

HTTP Live Streaming (HLS), developed by Apple and first released in 2009, relies on PTS values embedded in MPEG-2 Transport Stream (TS) segments to facilitate adaptive bitrate streaming over HTTP. Each TS segment maintains continuous PTS sequencing from the prior segment, ensuring precise alignment of audio and video during playback transitions and preventing discontinuities in live or on-demand scenarios. This approach allows clients to switch bitrates seamlessly while preserving temporal synchronization.

The Dynamic Adaptive Streaming over HTTP (DASH) standard, defined in ISO/IEC 23009-1 and published in 2012, integrates presentation timing within segment timelines referenced by the media presentation description (MPD) to enable synchronized delivery of multi-track content such as audio, video, and subtitles over HTTP. The MPD specifies presentation durations and offsets, with timestamp values in the underlying segments (e.g., ISOBMFF or fragmented MP4) ensuring coordinated rendering across tracks regardless of varying bandwidth conditions.

WebRTC, which emerged in 2011 as a framework for browser-based real-time communication, employs PTS-derived timing in conjunction with RTP for transporting media in video calls and peer-to-peer sessions. RTP timestamps derived from the PTS are used for lip-sync and jitter buffering, and in WebRTC's statistics API, these values are normalized to milliseconds relative to the session start for performance monitoring and synchronization diagnostics.

The Audio Video Transport Protocol (AVTP), outlined in IEEE 1722 and finalized in 2016, incorporates a presentation time stamp in its stream data units for low-latency media transport over Ethernet in automotive infotainment and professional audio-visual systems. This timestamp, aligned to the IEEE 802.1AS gPTP clock, provides sub-microsecond precision to schedule exact presentation times at listeners, compensating for network delays in time-sensitive environments.
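
The RTP-to-wall-clock correlation that RFC 3550 enables can be sketched in a few lines of Python; the function and variable names here are illustrative, not from any library:

```python
# Hedged sketch: map a 32-bit RTP timestamp to wall-clock time using the
# (RTP timestamp, NTP time) pair carried in the latest RTCP Sender Report.
RTP_CLOCK_HZ = 90_000   # common video clock rate; audio rates differ

def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float) -> float:
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF   # modulo-2^32 difference
    if delta > 0x7FFFFFFF:                      # timestamp precedes the report
        delta -= 0x100000000
    return sr_ntp_seconds + delta / RTP_CLOCK_HZ

# Example: a frame 3_000 ticks after the report maps to +1/30 s.
print(rtp_to_wallclock(103_000, 100_000, 1_700_000_000.0))
```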

Technical Aspects

Encoding and Calculation

The presentation timestamp (PTS) in MPEG standards is derived from a system clock with a base frequency of 90 kHz, obtained by dividing the primary 27 MHz system clock frequency by 300, ensuring a common timebase across video and audio streams. This 90 kHz clock provides a tick duration of approximately 11.11 microseconds, suitable for precise media timing. The PTS value itself is a 33-bit unsigned integer, representing the count of these 90 kHz ticks from a reference point, and is encoded in the packetized elementary stream (PES) header or equivalent structures.

The calculation of the PTS follows the formula PTS = \text{round}(90{,}000 \times t_p) \mod 2^{33}, where t_p is the intended presentation time in seconds relative to the stream's start. Equivalently, using the 27 MHz system time clock count STC, it is computed as PTS = \lfloor \frac{\text{STC}}{300} \rfloor \mod 2^{33}, reflecting the downsampling to 90 kHz units. The actual presentation time is then recovered as t_p = \frac{\text{PTS}}{90{,}000} seconds during decoding. These computations mean the PTS wraps around after approximately 26.5 hours (2^{33} / 90{,}000 seconds), necessitating careful handling of discontinuities in long streams.

For video streams, the PTS increments by the duration of each access unit (typically one frame) in 90 kHz ticks; for example, at 30 frames per second, the frame duration is 1/30 second, yielding an increment of 3,000 ticks (90,000 / 30). In audio streams, increments occur per access unit, often aligned to sample blocks, with the PTS advanced by the corresponding time interval; for a 48 kHz sample rate, this equates to roughly 1.875 ticks per sample (90,000 / 48,000), accumulated across samples to produce integer PTS values for each audio frame. These increments maintain lip-sync by aligning audio and video PTS values to the same clock reference.

In video encoding with B-frames (bidirectionally predicted frames), the PTS is assigned based on the intended display order rather than the encoding or decoding sequence, as B-frames are typically encoded after subsequent I- or P-frames but presented earlier. This requires the decoder to reorder frames using both presentation (PTS) and decoding timestamp (DTS) values, ensuring correct temporal display while the encoder generates PTS values in display sequence.

MPEG-4 introduces extensions for greater precision through the sync layer (SL) header, where the timestamp length field (SL.TSlen) allows variable bit widths up to 31 bits, and the timestamp resolution field (SL.TSres) defines the clock resolution, enabling finer control beyond the fixed 33-bit, 90 kHz scheme of earlier standards. The full presentation time is reconstructed by combining the composition timestamp (CTS) with an extension factor derived from clock references like the Object Clock Reference (OCR) to handle wraparound, supporting the configured resolution. The PTS/DTS flags in the PES header indicate the presence of these fields for compatibility.
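
These formulas can be checked with a few lines of Python; the AAC frame size of 1,024 samples in the audio example is an assumption, not from the text above:

```python
# Worked example of the PTS formulas: seconds-to-ticks, STC downsampling,
# and the per-access-unit increments described above.
WRAP = 2 ** 33

def pts_from_seconds(t_p: float) -> int:
    return round(90_000 * t_p) % WRAP

def pts_from_stc(stc_27mhz: int) -> int:
    return (stc_27mhz // 300) % WRAP      # 27 MHz / 300 = 90 kHz

# At 30 fps each video frame advances the PTS by 90_000 / 30 = 3_000 ticks;
# a (hypothetical) 1024-sample AAC frame at 48 kHz advances it by
# 90_000 * 1024 / 48_000 = 1_920 ticks.
assert pts_from_seconds(1 / 30) == 3_000
assert pts_from_seconds(1024 / 48_000) == 1_920
assert pts_from_stc(27_000_000) == 90_000   # one second of 27 MHz ticks
```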

Handling in Playback Systems

In playback systems, the decoder pipeline parses the presentation timestamp (PTS) from the packetized elementary stream (PES) headers embedded within the transport stream or program stream. This PTS value, encoded at 33-bit resolution against a 90 kHz clock, specifies the exact time for presenting the associated audio or video access unit to the user after decoding. The extracted PTS is then used to manage a presentation queue in the decoder buffer, where decoded frames are reordered and sorted by their PTS values to reflect the intended display sequence, particularly to handle out-of-order arrival due to B-frame dependencies in compressed video. This ensures temporal correctness during rendering, with the decoder removing frames from the queue for output when the current system time clock matches or exceeds the frame's PTS.

For streams subject to network variability, such as those delivered via RTP over UDP, systems like FFmpeg implement a jitter buffer to mitigate packet arrival delays and reordering. Packets are enqueued based on sequence numbers, and PTS values are adjusted during dequeue processing by converting timestamps between RTP time bases and the stream's clock using functions like av_rescale_q, which rescales rational time bases while accounting for wraparounds. This adjustment aligns the PTS with the local playback timeline, preventing desynchronization from transmission jitter.

Synchronization algorithms in playback systems continuously compare incoming PTS values against a local reference clock to regulate rendering pace and maintain lip-sync between media elements. In MPEG-based streams, the program clock reference (PCR) from the transport stream serves as the primary clock source, periodically updating the decoder's 27 MHz system time clock (STC) to track the encoder's timing; discrepancies are corrected by slewing the STC rate. For IP-delivered content, Network Time Protocol (NTP) timestamps from RTCP sender reports can map RTP-derived PTS values to wall-clock time for similar alignment. If a frame's adjusted PTS indicates it is excessively late relative to the current clock, typically beyond thresholds on the order of 100 ms that would cause perceptible stutter, playback systems may drop it to prioritize fluidity over completeness.

In multi-stream scenarios, such as combined audio and video tracks, players like VLC leverage the PTS to align presentation across elements by deriving a master clock from the audio output, adjusting playback rates accordingly. If drift occurs between tracks, the player applies audio resampling or buffer flushing to interpolate and correct timing without introducing visible artifacts, ensuring coherent audiovisual rendering.

A specific implementation is seen in the Android MediaCodec API, where since its introduction in API level 16 (Android 4.1, 2012), developers supply the PTS as the presentationTimeUs parameter (in microseconds) of the queueInputBuffer method. This feeds the timestamp directly into the hardware decoder for synchronized output buffer release, enabling efficient, hardware-accelerated media rendering to device surfaces while preserving timing integrity across frames.
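
A minimal Python sketch of such a presentation queue, assuming a local clock in 90 kHz ticks and the ~100 ms lateness threshold mentioned above (class and names are illustrative, not from any player):

```python
# Hedged sketch: reorder decoded frames by PTS, drop overly late ones.
import heapq
import itertools

LATE_THRESHOLD = 100 * 90            # ~100 ms in 90 kHz ticks

class PresentationQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()    # tie-breaker for equal PTS values

    def push(self, pts: int, frame) -> None:
        heapq.heappush(self._heap, (pts, next(self._seq), frame))

    def pop_due(self, now_ticks: int):
        """Return the next frame whose PTS has arrived, or None if none is due."""
        while self._heap:
            pts, _, frame = self._heap[0]
            if now_ticks < pts:          # earliest frame is not yet due
                return None
            heapq.heappop(self._heap)
            if now_ticks - pts > LATE_THRESHOLD:
                continue                 # too late: drop to preserve fluidity
            return frame
        return None
```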

Decoding Timestamp

The decoding timestamp (DTS) specifies the time at which a video access unit, such as a coded picture, must be decoded by the receiving system to ensure proper handling of inter-frame dependencies. In contrast to the presentation timestamp (PTS), which determines display timing, the DTS addresses the need for decoding frames in a dependency order that may differ from the display order, particularly when B-frames rely on subsequent P-frames as references.

Within the MPEG-2 systems framework, the DTS is structured as a 33-bit value, distributed across three fields of 3, 15, and 15 bits, each followed by a marker bit, in the packetized elementary stream (PES) header, indicating decoding time relative to a 90 kHz clock. This format mirrors that of the PTS but prioritizes the sequence required for resolving frame dependencies; in cases where no reordering is necessary, such as intra-only or purely predictive sequences without bi-directional frames, the DTS value matches the PTS.

The primary role of the DTS is to guide decoders in processing frames sequentially according to their predictive relationships, ensuring that reference frames (I- or P-frames) are available before dependent B-frames are decoded, thereby preventing prediction errors and maintaining synchronization before frames are queued for presentation. This is particularly critical in compressed video streams where decoding order deviates from display order to optimize compression efficiency.

In the H.264/AVC standard (published in 2003), the DTS-to-PTS delta serves as an indicator of the reordering buffer requirements within the decoded picture buffer (DPB), where the maximum delta can extend to 16 frames based on the specified profile and level constraints. To derive the final sequence, frames are first decoded in the order prescribed by their DTS values, after which the decoder sorts the completed frames by PTS for output in the intended display order.
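
A worked IPBB example at 30 fps (3,000 ticks per frame) makes the relationship concrete; the specific tick values below are illustrative:

```python
# Frames transmitted/decoded as I P B B are displayed as I B B P, so the
# P-frame's DTS leads its PTS while the B-frames have DTS == PTS.
frames = [  # (type, DTS, PTS) in 90 kHz ticks
    ("I", 0,     3_000),
    ("P", 3_000, 12_000),   # decoded early, shown after the two B-frames
    ("B", 6_000, 6_000),
    ("B", 9_000, 9_000),
]

decode_order  = [f[0] for f in sorted(frames, key=lambda f: f[1])]
display_order = [f[0] for f in sorted(frames, key=lambda f: f[2])]
print(decode_order)   # ['I', 'P', 'B', 'B']
print(display_order)  # ['I', 'B', 'B', 'P']
```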

Other Timestamp Variants

In transport protocols, the RTP timestamp serves as a media-specific clock to facilitate synchronization across packets, typically operating at 90 kHz for video streams to align with common encoding standards; it is not identical to a presentation timestamp but can be converted to one for timing adjustments during playback. This approach ensures that timestamps reflect the sampling instant of the first data octet, allowing receivers to reconstruct the original timing despite network variability.

The presentation time protocol, proposed for the Wayland display server protocol in 2014, provides hardware-derived timestamps to deliver precise feedback on when frames are actually presented on the display, enabling low-latency adjustments in compositors for smoother video rendering and audio-video alignment. By leveraging direct hardware measurements converted by the driver, it accounts for display path latencies that software clocks might overlook, thus supporting applications requiring tight synchronization in graphical environments.

In Amazon Kinesis Video Streams, fragment timestamps function as presentation timestamps relative to the start of each data fragment, aiding in the precise reconstruction of video sequences during cloud-based archiving and retrieval. This relative timing model accommodates fragmented storage in the AWS cloud, where each fragment encapsulates time-delimited media segments for efficient processing and playback without absolute clock dependencies.

Within the GStreamer multimedia framework, the presentation timestamp (PTS) operates as a pipeline-internal metric that incorporates processing delays, resulting in values that differ from raw capture timestamps by the cumulative latency introduced across elements. This internal adjustment ensures synchronized rendering at the sink, measured against the pipeline's clock rather than the source's capture instant, which is essential for handling variable buffering in complex media workflows.

The AVTP timestamp in IEEE 1722, designed for Audio Video Bridging (AVB) networks, relies on cycle-time references from the IEEE 802.1AS protocol to define presentation offsets, functioning similarly to a PTS by specifying the exact gPTP-aligned time for media presentation at the listener. This cycle-based mechanism supports deterministic delivery in time-sensitive networks, where offsets account for transit delays to maintain lip-sync and low-jitter performance in professional audio-video applications.
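
The GStreamer-style relationship between capture time, cumulative latency, and sink-side presentation reduces to simple arithmetic; the sketch below uses illustrative names only, not GStreamer API calls:

```python
# Hedged sketch: the sink renders a buffer when the pipeline clock reaches
# the buffer's running time plus the pipeline's configured latency.
def render_time_ns(capture_running_time_ns: int, pipeline_latency_ns: int) -> int:
    return capture_running_time_ns + pipeline_latency_ns

# A frame captured at running time 0 in a pipeline with 20 ms of cumulative
# latency is presented when the pipeline clock reads 20 ms.
assert render_time_ns(0, 20_000_000) == 20_000_000
```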
