Frame Types
NDI sending and receiving use common structures to define video, audio, and metadata types. The parameters of these structures are documented below.
Video Frames (NDIlib_video_frame_v2_t)
xres, yres (int)
This is the resolution of the frame expressed in pixels. Note that, because data is internally all considered in 4:2:2 formats, image width values should be divisible by two.
FourCC (NDIlib_FourCC_video_type_e)
This is the pixel format for this buffer. The supported formats are listed in the table below.
NDIlib_FourCC_type_UYVY
This is a buffer in the “UYVY” FourCC and represents a 4:2:2 image in YUV color space. There is a Y sample at every pixel, and U and V sampled at every second pixel horizontally on each line. A macro-pixel contains 2 pixels in 1 DWORD. The ordering of these pixels is U0, Y0, V0, Y1.
Please see the notes below regarding the expected YUV color space for different resolutions.
Note that when using UYVY video, the color space is maintained end-to-end through the pipeline, which is consistent with how almost all video is created and displayed.
NDIlib_FourCC_type_UYVA
This is a buffer that represents a 4:2:2:4 image in YUV color space. There is a Y sample at every pixel, with U and V sampled at every second pixel horizontally. There are two planes in memory: the first is the UYVY color plane, and the second is the alpha plane, which immediately follows the first.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
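As a minimal sketch of this layout in C, the alpha plane can be located by stepping past yres lines of the UYVY plane. The helper name uyva_alpha_plane is hypothetical and not part of the SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): returns the start of the alpha
   plane of a UYVA buffer, assuming it begins immediately after yres lines
   of the UYVY color plane, each stride bytes long. */
static const uint8_t *uyva_alpha_plane(const uint8_t *p_data, int stride, int yres)
{
    const uint8_t *p_uyvy = p_data; /* the UYVY color plane starts at p_data */
    return p_uyvy + (size_t)stride * (size_t)yres;
}
```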
NDIlib_FourCC_type_P216
This is a 4:2:2 buffer in semi-planar format with full 16bpp color precision. It is formed from two buffers in memory: the first is a 16bpp luminance buffer, and the second is a buffer of U,V pairs. This can be considered a 16bpp version of NV12.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
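A minimal sketch in C, assuming stride is expressed in bytes and the U,V plane starts directly after yres lines of luminance. The helper name p216_uv_plane is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): start of the interleaved U,V
   plane of a P216 buffer. The luminance plane itself starts at p_data. */
static const uint16_t *p216_uv_plane(const uint8_t *p_data, int stride, int yres)
{
    /* stride is in bytes, so the byte offset is applied before the cast
       to a 16-bit sample pointer. */
    return (const uint16_t *)(p_data + (size_t)stride * (size_t)yres);
}
```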
To illustrate, a completely packed image would have a stride of xres*sizeof(uint16_t).
NDIlib_FourCC_type_PA16
This is a 4:2:2:4 buffer in semi-planar format with full 16bpp color and alpha precision. This is formed from three buffers in memory. The first is a 16bpp luminance buffer, and the second is a buffer of U,V pairs in memory. A single plane alpha channel at 16bpp follows the U,V pairs.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
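A minimal sketch in C, assuming all three planes share the same byte stride and follow one another without padding. The type and function names are hypothetical, not SDK definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three PA16 plane pointers. */
typedef struct {
    const uint16_t *p_y;     /* 16bpp luminance plane */
    const uint16_t *p_uv;    /* interleaved U,V pairs */
    const uint16_t *p_alpha; /* 16bpp alpha plane     */
} pa16_planes_t;

/* Assumes each plane occupies stride*yres bytes (stride given in bytes). */
static pa16_planes_t pa16_planes(const uint8_t *p_data, int stride, int yres)
{
    const size_t plane_bytes = (size_t)stride * (size_t)yres;
    pa16_planes_t planes;
    planes.p_y = (const uint16_t *)p_data;
    planes.p_uv = (const uint16_t *)(p_data + plane_bytes);
    planes.p_alpha = (const uint16_t *)(p_data + 2 * plane_bytes);
    return planes;
}
```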
To illustrate, a completely packed image would have a stride of xres*sizeof(uint16_t).
NDIlib_FourCC_type_YV12
This is a planar 4:2:0 format with separate Y, U, and V planes in memory.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
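A minimal sketch in C, assuming the conventional YV12 plane order (Y, then V, then U) and chroma planes with half the luminance stride and half the height. These assumptions, and the hypothetical names used here, should be checked against the SDK header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three YV12 plane pointers. */
typedef struct {
    const uint8_t *p_y;
    const uint8_t *p_v;
    const uint8_t *p_u;
} yv12_planes_t;

/* Assumption: V follows Y, U follows V, and each chroma plane uses
   stride/2 bytes per line over yres/2 lines. */
static yv12_planes_t yv12_planes(const uint8_t *p_data, int stride, int yres)
{
    yv12_planes_t planes;
    planes.p_y = p_data;
    planes.p_v = planes.p_y + (size_t)stride * (size_t)yres;
    planes.p_u = planes.p_v + (size_t)(stride / 2) * (size_t)(yres / 2);
    return planes;
}
```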
To illustrate, a completely packed image would have a stride of xres*sizeof(uint8_t).
NDIlib_FourCC_type_I420
This is a planar 4:2:0 format with separate Y, U, and V planes in memory, with the U and V planes reversed relative to the YV12 format.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
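A minimal sketch in C, assuming the conventional I420 plane order (Y, then U, then V) and chroma planes with half the luminance stride and half the height. The names are hypothetical, and the assumptions should be checked against the SDK header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three I420 plane pointers. */
typedef struct {
    const uint8_t *p_y;
    const uint8_t *p_u;
    const uint8_t *p_v;
} i420_planes_t;

/* Assumption: U follows Y, V follows U, and each chroma plane uses
   stride/2 bytes per line over yres/2 lines. */
static i420_planes_t i420_planes(const uint8_t *p_data, int stride, int yres)
{
    i420_planes_t planes;
    planes.p_y = p_data;
    planes.p_u = planes.p_y + (size_t)stride * (size_t)yres;
    planes.p_v = planes.p_u + (size_t)(stride / 2) * (size_t)(yres / 2);
    return planes;
}
```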
To illustrate, a completely packed image would have a stride of xres*sizeof(uint8_t).
NDIlib_FourCC_type_NV12
This is a semi-planar 4:2:0 format with Y and UV planes in memory. The luminance plane is at the lowest memory address, with the UV pairs immediately following it.
For instance, if you have an image with p_data and stride, then the planes are located as follows:
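A minimal sketch in C, assuming the interleaved UV plane begins directly after yres lines of luminance. The helper name nv12_uv_plane is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): start of the interleaved UV
   plane of an NV12 buffer. The luminance plane itself starts at p_data. */
static const uint8_t *nv12_uv_plane(const uint8_t *p_data, int stride, int yres)
{
    return p_data + (size_t)stride * (size_t)yres;
}
```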
To illustrate, a completely packed image would have a stride of xres*sizeof(uint8_t).
NDIlib_FourCC_type_BGRA
A 4:4:4:4, 8-bit image of red, green, blue and alpha components, in memory order blue, green, red, alpha. This data is not pre-multiplied.
NDIlib_FourCC_type_BGRX
A 4:4:4, 8-bit image of red, green, blue components, in memory order blue, green, red, 255. This data is not pre-multiplied.
This is identical to BGRA, but is provided as a hint that all alpha channel values are 255, meaning that alpha compositing may be avoided. The lack of an alpha channel is used by the SDK to improve performance when possible.
NDIlib_FourCC_type_RGBA
A 4:4:4:4, 8-bit image of red, green, blue and alpha components, in memory order red, green, blue, alpha. This data is not pre-multiplied.
NDIlib_FourCC_type_RGBX
A 4:4:4, 8-bit image of red, green, blue components, in memory order red, green, blue, 255. This data is not pre-multiplied.
This is identical to RGBA, but is provided as a hint that all alpha channel values are 255, meaning that alpha compositing may be avoided. The lack of an alpha channel is used by the SDK to improve performance when possible.
When running in a YUV color space, the following standards are applied:
SD resolutions: BT.601
HD resolutions (xres>720 || yres>576): Rec.709
UHD resolutions (xres>1920 || yres>1080): Rec.2020
Alpha channel: full range for the data type (0-255 when running 8-bit and 0-65535 when running 16-bit).
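The resolution thresholds above can be sketched as a small selection function. The enum and function names are hypothetical and only illustrate the rule; they are not SDK definitions.

```c
/* Hypothetical labels for the three standards listed above. */
typedef enum {
    COLORSPACE_BT601,
    COLORSPACE_REC709,
    COLORSPACE_REC2020
} yuv_colorspace_t;

/* Picks the expected YUV standard from the frame resolution, testing the
   UHD threshold first so that larger frames are not classified as HD. */
static yuv_colorspace_t expected_yuv_colorspace(int xres, int yres)
{
    if (xres > 1920 || yres > 1080)
        return COLORSPACE_REC2020;
    if (xres > 720 || yres > 576)
        return COLORSPACE_REC709;
    return COLORSPACE_BT601;
}
```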
For the sake of compatibility with standard system components, Windows APIs expose 8-bit UYVY and RGBA video (common FourCCs used in all media applications).
frame_rate_N, frame_rate_D (int)
This is the framerate of the current frame. The framerate is specified as a numerator and denominator, such that the following is valid:
frame_rate = (float)frame_rate_N / (float)frame_rate_D
Some examples of common framerates are presented in the table below.
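| Format | frame_rate_N / frame_rate_D | Frame rate |
| --- | --- | --- |
| NTSC 1080i59.94 | 30000 / 1001 | 29.97 Hz |
| NTSC 720p59.94 | 60000 / 1001 | 59.94 Hz |
| PAL 1080i50 | 30000 / 1200 | 25 Hz |
| PAL 720p50 | 60000 / 1200 | 50 Hz |
| NTSC 24fps | 24000 / 1001 | 23.98 Hz |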
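As a small worked example of the numerator and denominator convention, the sketch below computes the rate in Hz. The function name is hypothetical, and it uses double rather than float purely for precision.

```c
/* Hypothetical helper: frame rate in Hz from the numerator/denominator pair. */
static double frame_rate_hz(int frame_rate_N, int frame_rate_D)
{
    return (double)frame_rate_N / (double)frame_rate_D;
}
```

For example, frame_rate_hz(30000, 1001) evaluates to roughly 29.97, and frame_rate_hz(60000, 1200) to 50.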
picture_aspect_ratio (float)
The SDK defines picture aspect ratio (as opposed to pixel aspect ratios). Some common aspect ratios are presented in the table below. When the aspect ratio is 0.0 it is interpreted as xres/yres, or square pixel; for most modern video types this is a default that can be used.
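| Aspect ratio | Calculated as | picture_aspect_ratio |
| --- | --- | --- |
| 4:3 | 4.0/3.0 | 1.333... |
| 16:9 | 16.0/9.0 | 1.777... |
| 16:10 | 16.0/10.0 | 1.6 |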
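The 0.0 default described above can be sketched as follows. The function name is hypothetical and only illustrates how a receiver might resolve the value.

```c
/* Hypothetical helper: resolves the picture aspect ratio, treating 0.0 as
   "square pixels", that is, xres/yres. */
static float effective_picture_aspect_ratio(float picture_aspect_ratio, int xres, int yres)
{
    if (picture_aspect_ratio == 0.0f)
        return (float)xres / (float)yres;
    return picture_aspect_ratio;
}
```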
frame_format_type (NDIlib_frame_format_type_e)
This is used to determine the frame type. Possible values are listed in the next table.
NDIlib_frame_format_type_progressive
This is a progressive video frame.
NDIlib_frame_format_type_interleaved
This is a frame of video that is comprised of two fields. The upper field comes first, and the lower comes second (see note below).
NDIlib_frame_format_type_field_0
This is an individual field 0 from a fielded video frame. This is the first temporal, upper field (see note below).
NDIlib_frame_format_type_field_1
This is an individual field 1 from a fielded video frame. This is the second temporal, lower field (see note below).
To make everything as easy to use as possible, the SDK always assumes that fields are ‘top field first.’
This is, in fact, the case for every modern format, but does create a problem for two specific older video formats as discussed below:
NTSC 486 Lines
The best way to handle this format is simply to offset the image vertically by one line (p_uyvy_data + uyvy_stride_in_bytes) and reduce the vertical resolution to 480 lines. This can be done without modifying the data being passed in at all; simply change the data pointer and the resolution.
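A minimal sketch of this adjustment in C. The struct and function names are hypothetical; the pixel data itself is left untouched and only the pointer and line count change.

```c
#include <stdint.h>

/* Hypothetical view onto an existing UYVY buffer. */
typedef struct {
    const uint8_t *p_data; /* first line to send        */
    int yres;              /* number of lines to send   */
} cropped_view_t;

/* Skips the first line of a 486-line NTSC image and reports 480 lines. */
static cropped_view_t ntsc486_to_480(const uint8_t *p_uyvy_data, int uyvy_stride_in_bytes)
{
    cropped_view_t view;
    view.p_data = p_uyvy_data + uyvy_stride_in_bytes;
    view.yres = 480;
    return view;
}
```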
DV NTSC
This format is relatively rare these days, although it is still used from time to time. There is no entirely trivial way to handle this other than to move the image down one line and add a black line at the bottom.
timecode (int64_t, 64-bit signed integer)
This is the timecode of this frame in 100 ns intervals. This is generally not used internally by the SDK but is passed through to applications, which may interpret it as they wish. When sending data, a value of NDIlib_send_timecode_synthesize can be specified (and should be the default). The operation of this value is documented in the sending section of this documentation.
p_data (const uint8_t*)
This is the video data itself laid out linearly in memory in the FourCC format defined above. The number of bytes between lines is specified in line_stride_in_bytes. No specific alignment is required, although larger data alignments might result in higher performance (and the internal SDK codecs will take advantage of this where needed).
line_stride_in_bytes (int)
This is the inter-line stride of the video data, in bytes.
p_metadata (const char*)
This is a per-frame metadata stream that should be in UTF-8 formatted XML and NULL-terminated. It is sent and received with the frame.
timestamp (int64_t, 64-bit signed integer)
This is a per-frame timestamp filled in by the NDI SDK using a high precision clock. It represents the time (in 100 ns intervals measured in UTC time, since the Unix Time Epoch 1/1/1970 00:00) when the frame was submitted to the SDK.
On modern sender systems this will have ~1 μs accuracy; this can be used to synchronize streams on the same connection, between connections, and between machines. For inter-machine synchronization, it is important to use an external clock-locking capability with high precision (such as NTP).
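Because the timestamp counts 100 ns intervals since the Unix epoch, it can be split into whole seconds and a sub-second remainder. The sketch below shows that arithmetic; the function names are hypothetical and not part of the SDK.

```c
#include <stdint.h>

/* Number of 100 ns intervals in one second. */
#define TICKS_PER_SECOND INT64_C(10000000)

/* Hypothetical helper: whole seconds since the Unix epoch. */
static int64_t ticks_to_unix_seconds(int64_t timestamp)
{
    return timestamp / TICKS_PER_SECOND;
}

/* Hypothetical helper: the sub-second remainder, expressed in nanoseconds. */
static int64_t ticks_subsecond_ns(int64_t timestamp)
{
    return (timestamp % TICKS_PER_SECOND) * 100;
}
```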
Audio Frames (NDIlib_audio_frame_v3_t)
NDI audio is passed to the SDK in floating-point and has a dynamic range without practical limits (without clipping). To define how floating-point values map into real-world audio levels, a sine wave that is 2.0 floating-point units peak-to-peak (i.e., -1.0 to +1.0) is assumed to represent an audio level of +4 dBu, corresponding to a nominal level of 1.228 V RMS.
Two tables are provided below that explain the relationship between NDI audio values for the SMPTE and EBU audio standards.
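SMPTE Audio levels - reference Level
| NDI | 0.0 | 0.063 | 0.1 | 0.63 | 1.0 | 10.0 |
| --- | --- | --- | --- | --- | --- | --- |
| dBu | -∞ | -20 dB | -16 dB | +0 dB | +4 dB | +24 dB |
| dBVU | -∞ | -24 dB | -20 dB | -4 dB | +0 dB | +20 dB |
| SMPTE dBFS | -∞ | -44 dB | -40 dB | -24 dB | -20 dB | +0 dB |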
If you want a simple ‘recipe’ that matches SDI audio levels based on the SMPTE audio standard, you will want to have 20 dB of headroom above the SMPTE reference level at +4 dBu, which is at +0 dBVU, to correspond to a level of 1.0 in NDI floating-point audio. Conversion from floating-point to integer audio would thus be performed with:
int smpte_sample_16bit = max(-32768, min(32767, (int)(3276.8f*smpte_sample_fp)));
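EBU Audio levels - reference Level
| NDI | 0.0 | 0.063 | 0.1 | 0.63 | 1.0 | 5.01 |
| --- | --- | --- | --- | --- | --- | --- |
| dBu | -∞ | -20 dB | -16 dB | +0 dB | +4 dB | +18 dB |
| dBVU | -∞ | -24 dB | -20 dB | -4 dB | +0 dB | +14 dB |
| EBU dBFS | -∞ | -38 dB | -34 dB | -18 dB | -14 dB | +0 dB |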
If you want a simple ‘recipe’ that matches SDI audio levels based on the EBU audio standard, you will want to have 18 dB of headroom above the EBU reference level at 0 dBu (i.e., 14 dB above the SMPTE/NDI reference level). Conversion from floating-point to integer audio would thus be performed with:
int ebu_sample_16bit = max(-32768, min(32767, (int)(6540.52f*ebu_sample_fp)));
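The two one-line conversions above rely on max and min, which are not standard C. A self-contained sketch of the same scaling is shown below; it clamps in floating point before converting, so very large inputs never overflow the integer cast. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Scale factors taken from the two recipes above. */
#define SMPTE_SCALE 3276.8f
#define EBU_SCALE 6540.52f

/* Hypothetical helper: scales a floating-point sample and clamps it to the
   signed 16-bit range, truncating toward zero as the casts above do. */
static int16_t scale_and_clamp_16bit(float sample_fp, float scale)
{
    const float scaled = scale * sample_fp;
    if (scaled > 32767.0f)
        return 32767;
    if (scaled < -32768.0f)
        return -32768;
    return (int16_t)scaled;
}
```

For instance, scale_and_clamp_16bit(smpte_sample_fp, SMPTE_SCALE) would stand in for the SMPTE line, and the same call with EBU_SCALE for the EBU line.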
Because many applications provide interleaved 16-bit audio, the NDI library includes utility functions that will convert in and out of floating-point formats from PCM 16-bit formats.
There is also a utility function for sending signed 16-bit audio using NDIlib_util_send_send_audio_interleaved_16s. Please refer to the example projects and the header file Processing.NDI.utilities.h, which lists the available functions.
In general, we recommend the use of floating-point audio because clipping cannot occur, and audio levels are well-defined without a need to consider audio headroom.
The audio sample structure is defined as described below.
sample_rate (int)
This is the current audio sample rate. For instance, this might be 44100, 48000 or 96000. It can, however, be any value.
no_channels (int)
This is the number of discrete audio channels. 1 represents MONO audio, 2 represents STEREO, and so on. There is no reasonable limit on the number of allowed audio channels.
no_samples (int)
This is the number of audio samples in this buffer. Any number will be handled correctly by the NDI SDK. However, when sending audio and video together, please bear in mind that many audio devices work better with audio buffers of approximately the same duration as a video frame.
We encourage senders to use audio buffers that are approximately half the duration of a video frame, and receiving devices to support as broad a range of buffer lengths as they reasonably can.
timecode (int64_t, 64-bit signed integer)
This is the timecode of this frame in 100 ns intervals. This is generally not used internally by the SDK but is passed through to applications, which may interpret it as they wish. When sending data, a value of NDIlib_send_timecode_synthesize can be specified (and should be the default). The operation of this value is documented in the sending section of this documentation.
NDIlib_send_timecode_synthesize will yield UTC time in 100 ns intervals since the Unix Time Epoch 1/1/1970 00:00. When interpreting this timecode, a receiving application may choose to localize the time of day based on time zone offset, which can optionally be communicated by the sender in connection metadata.
Since the timecode is stored in UTC within NDI, communicating timecode time of day for non-UTC time zones requires a translation.
FourCC (NDIlib_FourCC_audio_type_e)
This is the sample format for this buffer. There is currently one supported format: NDIlib_FourCC_type_FLTP. This format stands for floating-point audio.
p_data (uint8_t*)
If FourCC is NDIlib_FourCC_type_FLTP, then this is the floating-point audio data in planar format, with each audio channel stored together with a stride between channels specified by channel_stride_in_bytes.
channel_stride_in_bytes (int)
This is the number of bytes that are used to step from one audio channel to another.
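A minimal sketch of how a single channel might be located in a planar floating-point buffer, assuming channel indices start at zero. The helper name audio_channel is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): pointer to the first sample of
   the given channel. The stride is in bytes, so the offset is applied to
   the byte pointer before casting to float*. */
static const float *audio_channel(const uint8_t *p_data, int channel_stride_in_bytes, int channel)
{
    return (const float *)(p_data + (size_t)channel * (size_t)channel_stride_in_bytes);
}
```

Each channel then holds no_samples consecutive float values starting at the returned pointer.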
p_metadata (const char*)
This is a per-frame metadata stream that should be in UTF-8 formatted XML and NULL-terminated. It is sent and received with the frame.
timestamp (int64_t, 64-bit signed integer)
This is a per-frame timestamp filled in by the NDI SDK using a high-precision clock. It represents the time (in 100 ns intervals measured in UTC time since the Unix Time Epoch 1/1/1970 00:00) when the frame was submitted to the SDK.
On modern sender systems, this will have ~1 μs accuracy and can be used to synchronize streams on the same connection, between connections, and between machines.
For inter-machine synchronization, it is important that some external clock locking capability with high precision is used, such as NTP.
Metadata Frames (NDIlib_metadata_frame_t)
Metadata is specified as NULL-terminated UTF-8 XML data. The reason for this choice is so that the format can naturally be extended by anyone using it to represent data of any type and length.
XML is also naturally backward and forward compatible because any implementation would happily ignore tags or parameters that are not understood (which, in turn, means that devices should naturally work with each other without requiring a rigid set of data parsing and standard complex data structures).
length (int)
This is the length of the metadata message in bytes. It includes the NULL-terminating character. If this is zero, then the length will be derived from the string length automatically.
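If the length is set explicitly, it needs to count the terminator as well. A minimal sketch, with a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: byte length of an XML message including its
   NULL terminator, suitable for an explicit length value. */
static int metadata_length_with_terminator(const char *p_data)
{
    return (int)strlen(p_data) + 1;
}
```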
p_data (char*)
This is the XML message data.
timecode (int64_t, 64-bit signed integer)
This is the timecode of this frame in 100 ns intervals. It is generally not used internally by the SDK but is passed through to applications, which may interpret it as they wish.
When sending data, a value of NDIlib_send_timecode_synthesize can be specified (and should be the default); the operation of this value is documented in the sending section of this documentation.
If you wish to put your own vendor specific metadata into fields, please use XML namespaces. The “NDI” XML namespace is reserved.
It is very important that you compose legal XML messages for sending. (On receiving metadata, it is important that you support badly formed XML in case a sender did send something incorrect.)
If you want specific metadata flags to be standardized, please contact us.