Time, Timecode, and Sync for NDI

Introduction

While the concept of time and a timecode value for a given audio or video frame seems like it should be fairly straightforward, various real-world complications can quickly make time, timecode, and timing very difficult to handle correctly. This document attempts to codify how the NDI library handles time values and proposes best practices for handling time values in various situations.

The Basics

NDI Time Values

All time values in the NDI SDK are represented as 64-bit integer values with units of 100 ns. Time values for a frame can be synthesized by the SDK or provided by the application layer. When synthesized by the SDK, both the timecode and timestamp fields use the same logic to synthesize a time value; however, this logic varies between platforms (discussed in more detail below).

NDI timecode

The NDI timecode value has no specified epoch so the use and interpretation of timecode values is somewhat workflow dependent.

NDI expressly does not define an epoch for the timecode value, with the idea that legacy SMPTE timecode values (which include 0 as a legitimate 00:00:00:00 hh:mm:ss:ff value) are legal. As with SMPTE timecode, the NDI timecode may increment in step with real or "wall-clock" time or it may increment faster or slower than real time (eg: fast-forward or slow-motion), potentially even stopping or decrementing (eg: paused or rewinding). It is also possible for the timecode value to be counting (normal playback) but not incrementing every frame, as legacy SMPTE timecode formats cannot represent frame rates over 30 fps.

Timecode values for live sources, however, should generally increment in step with real time. While not mandatory, it is recommended that live sources use the Unix epoch for timecode values that also include date information, or zero for SMPTE-style timecode that carries no date information and "wraps" after 24 hours.

While not explicitly specified by the documentation, timecode values should be considered to represent the end of the associated audio or video data. This way, implementations capturing real-world data (eg: a video camera or screen capture) will have timecode values similar to those of applications that let the NDI SDK synthesize the timecode, since the NDI SDK generates the timecode for a frame when that frame is passed to the SDK.

Synthesized timecode

When the NDI library synthesizes timecode, the timecode value depends on the previous context of the stream. If the first frame passed to the NDI library has the timecode value set to NDIlib_send_timecode_synthesize, the initial timecode value will be derived from the current UTC time. For subsequent frames submitted to the NDI library, the timecode is advanced by the frame time as determined by the frame_rate_N and frame_rate_D values for video frames and the sample_rate and no_samples values for audio frames.

If the application provides a timecode value other than NDIlib_send_timecode_synthesize and subsequently requests that timecode be synthesized, new timecode values will be calculated from the last user-provided timecode value, incremented by the frame time determined by the frame details as above.
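The increment logic described above can be sketched in a few lines of Python. This models the behavior described in this section (integer 100 ns durations derived from the frame fields), not the SDK's actual internal code:

```python
def video_frame_duration_100ns(frame_rate_n: int, frame_rate_d: int) -> int:
    """Video frame duration in 100 ns units, truncated to an integer
    (see the "Long Term Drift" section for the consequences of truncation)."""
    return 10_000_000 * frame_rate_d // frame_rate_n

def audio_frame_duration_100ns(sample_rate: int, no_samples: int) -> int:
    """Audio frame duration in 100 ns units for a block of samples."""
    return 10_000_000 * no_samples // sample_rate

def next_timecode(previous_timecode: int, duration_100ns: int) -> int:
    """Each synthesized timecode advances by one frame duration."""
    return previous_timecode + duration_100ns
```

For example, a 59.94 fps video frame (frame_rate_N = 60000, frame_rate_D = 1001) advances the timecode by 166,833 units per frame.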

Additional details can be found in the NDI SDK documentation.

NDI timestamp

NDI timestamp values represent the time a frame was passed to the NDI SDK and are referenced to the UNIX time epoch (00:00:00 UTC on 1 January 1970).

The timestamp value will be generated by the NDI SDK if the timecode value passed with the NDI frame is either zero or NDIlib_send_timecode_synthesize.

Applications are strongly encouraged to allow the NDI library to synthesize the timestamp value.
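Because the timestamp is simply 100 ns units since the Unix epoch, converting it to calendar time is straightforward. A minimal Python sketch:

```python
from datetime import datetime, timezone

def timestamp_to_utc(ndi_timestamp: int) -> datetime:
    """Convert an NDI timestamp (100 ns units since the Unix epoch) to a
    UTC calendar time."""
    return datetime.fromtimestamp(ndi_timestamp / 10_000_000, tz=timezone.utc)
```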

Additional Details

SMPTE Timecode & MPEG-2 PCR Values

Conversion between SMPTE timecode, NDI timecode values, MPEG-2 PCR time base values, and absolute (or "wall clock") time can be confusing and complex. The guidelines and formulas from SMPTE Engineering Guideline EG 40-2002 are recommended for anyone wishing to convert between these formats; for NDI timecode values, use the conversion routines for absolute time. Conversion between NDI time values (100 ns units) and the absolute time values (seconds) used in EG 40 is fairly straightforward:

  • NDI time = absolute time / 100e-9

  • absolute time = NDI time * 100e-9

Note that like NDI time values, absolute time values can represent more than 24 hours. The conversion formulas in EG 40 account for this.
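The two formulas above, expressed in Python (rounding to the nearest 100 ns unit when converting from floating-point seconds):

```python
def ndi_time_from_absolute(absolute_seconds: float) -> int:
    """Absolute time (seconds) -> NDI time (100 ns units)."""
    return round(absolute_seconds / 100e-9)

def absolute_from_ndi_time(ndi_time: int) -> float:
    """NDI time (100 ns units) -> absolute time (seconds)."""
    return ndi_time * 100e-9
```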

Note that for some formats, conversion between SMPTE and NDI timecode values requires information that is not necessarily easily available. For instance, fractional frame rates (eg: 29.97 or 59.94 fps) may use drop-frame or non-drop-frame timecode. If this distinction is critical for a particular use case, the application should provide a means to select the appropriate option.
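To illustrate the drop-frame bookkeeping for 29.97 fps (the standard SMPTE 12M scheme, in which frame numbers :00 and :01 are skipped at the start of every minute except minutes divisible by ten), here is a Python sketch converting a frame count to a drop-frame timecode string:

```python
def frames_to_dropframe_tc(frame_number: int) -> str:
    """29.97 fps frame count -> SMPTE drop-frame timecode (hh:mm:ss;ff)."""
    drop = 2                                    # frame numbers dropped per minute
    frames_per_min = 30 * 60 - drop             # 1798 actual frames per drop minute
    frames_per_10min = frames_per_min * 10 + drop  # 17982 frames per 10 minutes
    d, m = divmod(frame_number, frames_per_10min)
    # Add back the dropped frame numbers so nominal 30 fps division works:
    if m > drop:
        frame_number += drop * 9 * d + drop * ((m - drop) // frames_per_min)
    else:
        frame_number += drop * 9 * d
    ff = frame_number % 30
    ss = (frame_number // 30) % 60
    mm = (frame_number // 1800) % 60
    hh = frame_number // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"
```

Note that exactly ten minutes of 29.97 fps video (17,982 frames) comes out to 00:10:00;00, which is precisely what the dropped frame numbers are designed to achieve.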

Platform Specific Details

The logic used to determine the current UTC time is platform specific and has changed over time as operating systems have provided improved APIs.

Windows 8 and newer:

UTC time is generated using GetSystemTimePreciseAsFileTime() with the return value adjusted to the Unix epoch. The time values will track system time, including any corrections made by clock-synchronizing code such as NTP or PTP clients.

Earlier Windows versions:

UTC time is generated using QueryPerformanceCounter() with the epoch anchored to UTC using GetSystemTimeAsFileTime() when the NDI instance is created.

NOTE: GetSystemTimeAsFileTime() has approximately 10 millisecond precision, which is why QueryPerformanceCounter() is used. The performance counter timebase is typically a free-running clock that is not corrected by NTP or PTP and is often not particularly accurate (frequency errors of 1-2% or more are common).

Mac and Linux:

UTC time is generated using std::chrono::high_resolution_clock with the epoch anchored to UTC using gettimeofday() when the NDI instance is created. The high resolution clock is typically a free-running clock that is not corrected by NTP or PTP and is often not particularly accurate (frequency errors of 1-2% or more are common).

Future versions of the NDI library will switch to using clock_gettime() with CLOCK_REALTIME to more closely match the behavior of NDI on modern Windows platforms.

Long Term Drift

When synthesizing timecode values and "clocking" audio or video sending, the NDI library (up to and including version 6.3.0) uses an integer time value in units of 100 ns for the frame duration. This means, for example, that a frame rate of 59.94 fps, whose exact frame duration is 166,833.333... units, will instead use 166,833, meaning that the NDI timecode values and clocked frame rate will drift from the expected values and times over a long period of time (approximately 0.173 seconds every 24 hours for 59.94 fps).

Note that frame rates that can be exactly represented by 100 ns units (such as 25 and 50 fps) are not affected by this issue.
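The drift figure quoted above can be verified with a few lines of Python:

```python
def daily_drift_seconds(frame_rate_n: int, frame_rate_d: int) -> float:
    """Accumulated drift over 24 hours caused by truncating the frame
    duration to an integer number of 100 ns units."""
    exact = 1e7 * frame_rate_d / frame_rate_n          # exact duration, 100 ns units
    used = 10_000_000 * frame_rate_d // frame_rate_n   # truncated integer duration
    frames_per_day = 86_400 * frame_rate_n / frame_rate_d
    return frames_per_day * (exact - used) * 100e-9
```

For 59.94 fps this yields roughly 0.173 seconds per day, while 25 fps (an exact multiple of 100 ns) yields zero.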

This is scheduled to be fixed in the next NDI release, and is mentioned here for those concerned with absolute accuracy or long term drift.

Sending

Best Practices

All NDI senders should follow some basic best practices whenever possible:

  • The system clock should be synchronized to UTC via NTP, PTP, GPS, or similar

Best practice when the user application is generating time values for live sources:

  • The timebase should be based on “wall clock” time, as used by the NDI SDK when synthesizing timecode/timestamp values

  • All frame types sent to a given NDI sender should use the same timebase

  • All NDI senders on a given system should use the same timebase

Software Sources

Live Sources

Sources which are "live", or being dynamically generated (such as a rendered graphics overlay, virtual 3D environment, or similar), should generate timecode values based on the current UTC time.

Applications without an inherent sample timebase should generally set “clocking” to true and allow the NDI SDK to generate timecode values, or alternately use the Genlock API (see below) to pace sending.

Applications with an inherent timebase (eg: live screen capture of a GPU output) should be treated as a hardware source.

Recorded Sources

Applications which are playing back recorded content such as a media player or video editing application preview should generally use a timecode value related to the specific media clip or project. For media which has no inherent timecode, convention is to start timecode values with 00:00:00:00 at the start of the clip or project.

Genlock API

This API is designed for software-based NDI senders with no inherent video or audio timebase, such as a graphics rendering engine or perhaps a media player or DDR. The Genlock API allows a sender that would otherwise use the “clocked” feature of the NDI SDK to pace sending to instead pace sending based on a reference NDI source. This allows software defined workflows (eg: virtual machines or cloud instances) to synchronize without the need for traditional genlock signals and video I/O cards. This API cannot be used to synchronize a source with an inherent timebase of its own, such as a video camera or video capture card. These hardware devices need to be synchronized with more traditional methods (eg: traditional genlock via black-burst).

To turn a free-running NDI sender into a genlocked NDI sender, the clock_video and clock_audio flags are set to false when creating the NDI sender and a call to NDIlib_genlock_wait_video or NDIlib_genlock_wait_audio is added into the sending loop to pace the generation of video or audio frames.
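As a conceptual sketch of that restructuring (a model only: the blocking NDIlib_genlock_wait_video call is represented here by iterating over reference frame boundaries, and the send callback stands in for NDIlib_send_send_video_v2):

```python
def genlocked_send_loop(reference_frame_times, render_frame, send_video):
    """Model of a genlocked sending loop.

    With clock_video/clock_audio set to false, the sender no longer paces
    itself; instead each iteration blocks until the reference source's next
    frame boundary before rendering and sending one frame."""
    for ref_time in reference_frame_times:  # stands in for the blocking wait call
        frame = render_frame(ref_time)      # generate the next frame of content
        send_video(frame)                   # hand the frame to the NDI sender
```

In a real sender the loop would run until shutdown, with the wait call providing the pacing that clock_video previously supplied.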

Hardware Sources

Hardware sources such as video capture cards have an inherent video and audio timebase that is typically not configurable via software but is instead driven by the upstream video timing. In this case, the application should set “clocking” to false and allow the hardware to pace the frame sending.

Best practice for generating timecode values for these devices is to generate a UTC based timecode value as close as possible to the reception of the complete frame of audio or video data, ideally in an IRQ routine or as soon as possible in the "data available" callback or similar. If there is a known latency in the input chain (eg: a known delay in a camera’s signal processing pipeline) that offset can be subtracted to provide a timecode value more representative of exactly when that audio or video data existed in the real world.

Alternately, a live production environment may have a specific workflow dictating how timecode values should be generated (perhaps being distributed via serial link, black-burst with VITC, LTC, or some other method). In this case, the production specific workflow requirements should be followed.

Synchronization

The NDI library encodes and transmits audio and video frames from the sender to the receiver as quickly as possible. The amount of time it takes to send any given frame is dependent on multiple variables, including the format, sender and receiver CPU/system performance, network architecture, network traffic, and other factors. Typically, audio frames require less processing than video frames resulting in slightly lower NDI transport latency for audio vs. video frames (on the order of a few milliseconds).

In order for downstream NDI receivers to be able to properly synchronize received audio and video streams, it is important for the NDI sender to pass aligned audio and video data to the NDI library. This does not mean that audio frames need to exactly match the length of a video frame and be submitted at the same time as a video frame (although this is perfectly acceptable), but the timecode values for audio packets need to accurately represent the audio timing relative to the video frame.

In addition to any delay mismatch potentially added by NDI, capture and playback pipelines often have significantly different delays for audio vs video paths. Measuring and accounting for any audio to video offset outside of NDI is outside the scope of this paper. All following content regarding synchronization assumes the audio and video data are properly aligned when passed to the NDI sender instance.

Synchronizing Audio and Video

Since the difference between audio and video latency in NDI is typically small, simply passing the received NDI audio and video frames directly to the application for processing or output as soon as they are received will often maintain sufficient synchronization between the streams for practical use cases. In those cases where the latency difference added by NDI is significant (UHD resolutions can exhibit significant video encode and decode delay on lower-powered systems) or when audio and video alignment is critical, applications may wish to delay the audio or the video stream to restore alignment.

Timecode or timestamp values can be used to determine the relative timing between the audio and video streams, and some additional delay can be added to the earlier stream (the stream with lower latency) to bring them into alignment. Timecode values are typically more accurate for this than the timestamp value, but note that applications need to be able to handle cases where the timecode value does not increment uniformly as discussed in the "NDI timecode" section, above. The AVSync API (below) can also be used to obtain the audio data which corresponds to a particular video frame.
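A sketch of the comparison: given the timecodes of the most recently received video and audio frames, the stream whose data arrived "ahead" (larger timecode at the same moment of reception, i.e. lower transport latency) is the one to delay:

```python
def av_alignment_delay(video_timecode: int, audio_timecode: int):
    """Given the timecodes (100 ns units) of video and audio frames received
    at the same moment, return (stream_to_delay, delay_seconds).  The stream
    with the larger timecode arrived with lower latency and should be delayed
    to restore alignment."""
    diff = audio_timecode - video_timecode
    if diff > 0:
        return "audio", diff * 100e-9
    if diff < 0:
        return "video", -diff * 100e-9
    return None, 0.0
```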

AVSync API

Broadly speaking, the AVSync API is used for aligning audio and video data at the receiver when the audio is not being resampled. This would be useful for example when recording an NDI source into a multimedia file, or when mixing multiple NDI streams with synchronized timing between the NDI senders and receiver such that no frame synchronization or audio resampling is needed.

The AVSync API is used by an NDI receiver to pull aligned frames of audio and video data from the NDI stream, and its functionality is essentially summarized as “provide me the audio samples which align in time with this specific video frame”. Note that no resampling of any sort is performed, so users of this API need to be able to handle both clocking differences between the NDI sender and receiver as well as the case when the audio sample rate is not exactly locked to the video frame rate.
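The contract can be modeled conceptually as follows, assuming per-sample times derived from the sender's timecodes (this illustrates the idea, not the actual Advanced SDK call signatures):

```python
def audio_for_video_frame(audio_samples, video_start_100ns, video_end_100ns):
    """audio_samples: list of (timestamp_100ns, sample) pairs in time order.
    Return the samples whose timestamps fall within the video frame's
    [start, end) interval.  No resampling is performed, so the number of
    samples returned per frame can vary if the audio clock is not exactly
    locked to the video frame rate."""
    return [s for t, s in audio_samples
            if video_start_100ns <= t < video_end_100ns]
```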

Synchronizing Senders and Receivers

With NDI, the sender determines all stream details, including frame timing and audio sample rate. While some receiver applications are able to follow the sender frame timing, frequently an NDI receiver will need to process the received audio and video frames on a local timebase which is not synchronized to the sender. This process is performed by a frame synchronizer.

Framesync API

Audio and video frame timing in NDI is driven by the sender, however NDI receivers often have a local audio or video timebase for signal outputs and the received NDI frames need to be synchronized to these output timebases. The NDI library provides the frame synchronizer API to assist with this task.

There are several details that are important to understand when using the NDI frame synchronizer to retime audio and video data at the receiver. First, the basic operating principle of the frame synchronizer is that an NDI receiver application should be requesting audio and video frames from the frame synchronizer based on its local timebase, typically derived from physical audio and video output hardware. The application should try to request frames (especially audio frames) with as consistent timing as possible, ideally driven by something like a hardware interrupt.

When a frame is requested, the frame synchronizer returns the most appropriate recent video frame or retimed audio samples. If no recent video or audio is available, null frames (video) or silence (audio) will be returned.

NOTE: When using the NDI Advanced SDK, the frame synchronizer only supports compressed SpeedHQ video. The H.264 and H.265 video formats used by NDI HX as well as the AAC and Opus audio formats do not function properly when passed through the frame synchronizer.

Video

It is critical to note that the frame synchronizer does not modify the video frames in any way, including the timecode, timestamp, and frame_format_type fields, which remain unchanged. If the receiver asks for a progressive frame but the NDI source is sending interlaced video, the frame_format_type returned will be either NDIlib_frame_format_type_field_0 or NDIlib_frame_format_type_field_1 and not NDIlib_frame_format_type_progressive. Similarly, if the receiver is operating with an interlaced output and asks for an NDIlib_frame_format_type_field_0 or NDIlib_frame_format_type_field_1 but the NDI sender is sending progressive video, the frame_format_type returned by the frame synchronizer will be NDIlib_frame_format_type_progressive.

Additionally, if the video source is interlaced and the receiver application requires the normal alternating sequence of field 0 and field 1 types from the frame synchronizer, it is important to explicitly request either NDIlib_frame_format_type_field_0 or NDIlib_frame_format_type_field_1 when asking for a frame. If the application instead requests an NDIlib_frame_format_type_progressive or NDIlib_frame_format_type_interleaved frame, the frame synchronizer will return the frame nearest in time regardless of its field, meaning the receiver could see consecutive field 0s or field 1s.

If the receiver application needs a different format (progressive or interlaced) than is currently being sent by the NDI sender, it is up to the application to perform any needed conversion.

Audio

The NDI framesync instance tracks the effective audio sample rate of both the sender (based on received audio frames) and the receiver (based on audio frames requested by the application) and uses a sample rate converter (SRC) to synthesize audio appropriate for playback on the receiver's timebase. While the tracking logic does account for jitter, the SRC performance improves with more consistent timing of application requests for audio frames.

Unlike video, where the original video format can be identified by the frame_format_type field, audio is dynamically resampled and the frame synchronizer generates new audio frames with the requested number of audio samples and channels. To identify the original source details, the receiver application can call the NDIlib_framesync_capture_audio() routine with sample_rate, no_channels, and no_samples set to zero and the original sender sample rate and number of channels will be returned.

Note that it is possible for audio format details (sample rate and number of channels) to change dynamically. Applications wishing to track sample rate or channel number changes in the original source should periodically request the original source details via the above method to monitor for changes.

Future Improvements

It might be useful for NDI receivers to have some knowledge about the sender which is not currently communicated, such as:

  • The format of timecode values

  • System clock synchronization status

  • System clock synchronization reference

  • Genlock status of audio and video timebases

  • Audio and video timebase reference

This information could potentially be passed as connection metadata. Anyone with an application that would benefit from this and an interest in developing real-world use cases should contact the NDI team via the Metadata Lab submissions page.
