Frame Types
NDI sending and receiving use common structures to define video, audio, and metadata types. The parameters of these structures are documented below.
Video Frames (NDILIB_VIDEO_FRAME_V2_T)
Parameters | Description |
---|---|
xres, yres (int) | This is the resolution of the frame expressed in pixels. Note that, because data is internally all considered in 4:2:2 formats, image width values should be divisible by two. |
FourCC (NDIlib_FourCC_video_type_e) | This is the pixel format for this buffer. The supported formats are listed in the table below. |
FourCC | Description |
---|---|
NDIlib_FourCC_type_UYVY | This is a buffer in the “UYVY” FourCC and represents a 4:2:2 image in YUV color space. There is a Y sample at every pixel, and U and V sampled at every second pixel horizontally on each line. A macro-pixel contains 2 pixels in 1 DWORD. The ordering of these pixels is U0, Y0, V0, Y1. Please see the notes below regarding the expected YUV color space for different resolutions. Note that when using UYVY video, the color space is maintained end-to-end through the pipeline, which is consistent with how almost all video is created and displayed. |
NDIlib_FourCC_type_UYVA | This is a buffer that represents a 4:2:2:4 image in YUV color space. There is a Y sample at every pixel, with U and V sampled at every second pixel horizontally. There are two planes in memory, the first being the UYVY color plane, and the second the alpha plane that immediately follows the first. For instance, if you have an image with |
NDIlib_FourCC_type_P216 | This is a 4:2:2 buffer in semi-planar format with full 16bpp color precision. This is formed from two buffers in memory, the first is a 16bpp luminance buffer and the second is a buffer of U,V pairs in memory. This can be considered as a 16bpp version of NV12. For instance, if you have an image with As a matter of illustration, a completely packed image would have stride as |
NDIlib_FourCC_type_PA16 | This is a 4:2:2:4 buffer in semi-planar format with full 16bpp color and alpha precision. This is formed from three buffers in memory. The first is a 16bpp luminance buffer, and the second is a buffer of U,V pairs in memory. A single plane alpha channel at 16bpp follows the U,V pairs. For instance, if you have an image with p_data and stride, then the planes are located as follows: To illustrate, a completely packed image would have stride as |
NDIlib_FourCC_type_YV12 | This is a planar 4:2:0 in Y, U, V planes in memory. For instance, if you have an image with p_data and stride, then the planes are located as follows: As a matter of illustration, a completely packed image would have stride as |
NDIlib_FourCC_type_I420 | This is a planar 4:2:0 in Y,U,V planes in memory with the U,V planes reversed from the YV12 format. For instance, if you have an image with p_data and stride, then the planes are located as follows: To illustrate, a completely packed image would have stride as |
NDIlib_FourCC_type_NV12 | This is a semi planar 4:2:0 in Y, UV planes in memory. The luminance plane is at the lowest memory address with the UV pairs immediately following them. For instance, if you have an image with p_data and stride, then the planes are located as follows: To illustrate, a completely packed image would have stride as xres*sizeof(uint8_t). |
NDIlib_FourCC_type_BGRA | A 4:4:4:4, 8-bit image of red, green, blue and alpha components, in memory order blue, green, red, alpha. This data is not pre-multiplied. |
NDIlib_FourCC_type_BGRX | A 4:4:4, 8-bit image of red, green, blue components, in memory order blue, green, red, 255. This data is not pre-multiplied. This is identical to BGRA, but is provided as a hint that all alpha channel values are 255, meaning that alpha compositing may be avoided. The lack of an alpha channel is used by the SDK to improve performance when possible. |
NDIlib_FourCC_type_RGBA | A 4:4:4:4, 8-bit image of red, green, blue and alpha components, in memory order red, green, blue, alpha. This data is not pre-multiplied. |
NDIlib_FourCC_type_RGBX | A 4:4:4, 8-bit image of red, green, blue components, in memory order red, green, blue, 255. This data is not pre-multiplied. This is identical to RGBA, but is provided as a hint that all alpha channel values are 255, meaning that alpha compositing may be avoided. The lack of an alpha channel is used by the SDK to improve performance when possible. |
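The plane layouts described in the table above can be sketched as byte offsets from p_data. This is only an illustration: `plane_offsets_t` and the four helper functions are hypothetical names that are not part of the NDI SDK, and the chroma plane size used for I420 and YV12 (half the luminance stride, half the lines) is an assumption based on the usual 4:2:0 convention rather than something stated in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of the NDI SDK: byte offsets of each
 * plane, measured from p_data. */
typedef struct {
    size_t y;  /* luminance plane (always at p_data) */
    size_t u;  /* U plane; for NV12 this is the interleaved UV plane */
    size_t v;  /* V plane; equal to u for NV12, where U and V share a plane */
} plane_offsets_t;

/* NV12: the UV pairs immediately follow the luminance plane. */
plane_offsets_t nv12_offsets(int stride, int yres)
{
    plane_offsets_t o;
    assert(stride > 0 && yres > 0);
    o.y = 0;
    o.u = (size_t)stride * (size_t)yres;
    o.v = o.u;
    return o;
}

/* I420: Y, then U, then V.
 * ASSUMPTION: each chroma plane has stride/2 bytes per line and yres/2 lines. */
plane_offsets_t i420_offsets(int stride, int yres)
{
    plane_offsets_t o = nv12_offsets(stride, yres);
    o.v = o.u + (size_t)(stride / 2) * (size_t)(yres / 2);
    return o;
}

/* YV12: same layout as I420 with the U and V planes swapped. */
plane_offsets_t yv12_offsets(int stride, int yres)
{
    plane_offsets_t i420 = i420_offsets(stride, yres);
    plane_offsets_t o;
    o.y = 0;
    o.u = i420.v;
    o.v = i420.u;
    return o;
}

/* UYVA: the alpha plane immediately follows the packed UYVY plane. */
size_t uyva_alpha_offset(int uyvy_stride, int yres)
{
    assert(uyvy_stride > 0 && yres > 0);
    return (size_t)uyvy_stride * (size_t)yres;
}
```

For a fully packed 1920x1080 NV12 frame (stride 1920), the UV plane would start 2,073,600 bytes after p_data under these assumptions.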
When running in a YUV color space, the following standards are applied:
Resolution | Standard |
---|---|
SD resolutions | BT.601 |
HD resolutions | Rec.709 |
UHD resolutions | Rec.2020 |
Alpha channel | Full range for data type (0-255 range when running 8-bit and 0-65535 range when running 16-bit). |
For the sake of compatibility with standard system components, Windows APIs expose 8-bit UYVY and RGBA video (common FourCCs used in all media applications).
Parameters (Cont.) | Description |
---|---|
frame_rate_N, frame_rate_D (int) | This is the framerate of the current frame. The framerate is specified as a numerator and denominator, such that the following is valid: frame_rate = (float)frame_rate_N / (float)frame_rate_D. Some examples of common framerates are presented in the table below. |
Standard | Framerate ratio | Framerate |
---|---|---|
NTSC 1080i59.94 | 30000 / 1001 | 29.97 Hz |
NTSC 720p59.94 | 60000 / 1001 | 59.94 Hz |
PAL 1080i50 | 30000 / 1200 | 25 Hz |
PAL 720p50 | 60000 / 1200 | 50 Hz |
NTSC 24fps | 24000 / 1001 | 23.98 Hz |
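As a small sketch of how the numerator and denominator combine, the helpers below compute the framerate in Hz and the duration of one frame in the 100 ns units used elsewhere in these structures. The function names are hypothetical and not part of the NDI SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Framerate in Hz from the numerator/denominator pair. */
double frame_rate_hz(int frame_rate_N, int frame_rate_D)
{
    assert(frame_rate_D != 0);
    return (double)frame_rate_N / (double)frame_rate_D;
}

/* Duration of one frame in 100 ns units (10,000,000 units per second).
 * Integer division truncates any fractional part. */
int64_t frame_duration_100ns(int frame_rate_N, int frame_rate_D)
{
    assert(frame_rate_N > 0 && frame_rate_D > 0);
    return (INT64_C(10000000) * frame_rate_D) / frame_rate_N;
}
```

Keeping the rate as a ratio avoids the rounding error that a single floating-point value such as 29.97 would introduce.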
Parameters (Cont.) | Description |
---|---|
picture_aspect_ratio (float) | The SDK defines picture aspect ratio (as opposed to pixel aspect ratios). Some common aspect ratios are presented in the table below. When the aspect ratio is 0.0 it is interpreted as xres/yres, or square pixel; for most modern video types this is a default that can be used. |
Aspect Ratio | Calculated as | image_aspect_ratio |
---|---|---|
4:3 | 4.0/3.0 | 1.333... |
16:9 | 16.0/9.0 | 1.777... |
16:10 | 16.0/10.0 | 1.6 |
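The rule that an aspect ratio of 0.0 means square pixels can be sketched as follows. The function name is hypothetical; only the fallback to xres/yres comes from the description above.

```c
#include <assert.h>

/* Returns the picture aspect ratio to use for display. A stored value of
 * 0.0 is interpreted as xres/yres, i.e. square pixels. */
float effective_picture_aspect_ratio(int xres, int yres,
                                     float picture_aspect_ratio)
{
    assert(xres > 0 && yres > 0);
    if (picture_aspect_ratio == 0.0f)
        return (float)xres / (float)yres;
    return picture_aspect_ratio;
}
```

For a 1920x1080 frame with the field left at 0.0, this yields 16:9; a 720x486 frame would normally carry an explicit 4.0f/3.0f because its pixels are not square.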
Parameters (Cont.) | Description |
---|---|
frame_format_type (NDIlib_frame_format_type_e) | This is used to determine the frame type. Possible values are listed in the next table. |
Value | Description |
---|---|
NDIlib_frame_format_type_progressive | This is a progressive video frame |
NDIlib_frame_format_type_interleaved | This is a frame of video that is comprised of two fields. The upper field comes first, and the lower comes second (see note below) |
NDIlib_frame_format_type_field_0 | This is an individual field 0 from a fielded video frame. This is the first temporal, upper field (see note below). |
NDIlib_frame_format_type_field_1 | This is an individual field 1 from a fielded video frame. This is the second temporal, lower field (see note below). |
To make everything as easy to use as possible, the SDK always assumes that fields are “top field first.”
This is, in fact, the case for every modern format, but does create a problem for two specific older video formats as discussed below:
NTSC 486 Lines
The best way to handle this format is simply to offset the image vertically by one line (p_uyvy_data + uyvy_stride_in_bytes) and reduce the vertical resolution to 480 lines. This can all be done without modifying the data being passed in at all; simply change the data pointer and the resolution.
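A minimal sketch of this adjustment is shown below. The `frame_view_t` type and the function name are hypothetical stand-ins for the relevant fields of the video frame structure; the pixel data itself is never copied or modified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of the video frame that change. */
typedef struct {
    const uint8_t *p_data;
    int yres;
    int line_stride_in_bytes;
} frame_view_t;

/* Presents a 486-line NTSC UYVY buffer as a 480-line frame that starts one
 * line down, so that the field order becomes top field first. */
frame_view_t ntsc486_to_480(const uint8_t *p_uyvy_data,
                            int uyvy_stride_in_bytes)
{
    frame_view_t view;
    assert(p_uyvy_data != 0 && uyvy_stride_in_bytes > 0);
    view.p_data = p_uyvy_data + uyvy_stride_in_bytes; /* skip the first line */
    view.yres = 480;                                  /* reduced from 486 */
    view.line_stride_in_bytes = uyvy_stride_in_bytes; /* stride is unchanged */
    return view;
}
```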
DV NTSC
This format is relatively rare these days, although still used from time to time. There is no entirely trivial way to handle this other than to move the image down one line and add a black line at the bottom.
Parameters (Cont.) | Description |
---|---|
timecode (int64_t, 64-bit signed integer) | This is the timecode of this frame in 100 ns intervals. This is generally not used internally by the SDK but is passed through to applications, which may interpret it as they wish. When sending data, a value of |
p_data (const uint8_t*) | This is the video data itself laid out linearly in memory in the FourCC format defined above. The number of bytes defined between lines is specified in |
line_stride_in_bytes (int) | This is the inter-line stride of the video data, in bytes. |
p_metadata (const char*) | This is a per-frame metadata stream that should be in UTF-8 formatted XML and |
timestamp (int64_t, 64-bit signed integer) | This is a per-frame timestamp filled in by the NDI SDK using a high-precision clock. It represents the time (in 100 ns intervals measured in UTC time, since the Unix Time Epoch 1/1/1970 00:00) when the frame was submitted to the SDK. On modern sender systems this will have ~1 μs accuracy; this can be used to synchronize streams on the same connection, between connections, and between machines. For inter-machine synchronization, it is important to use external clock locking capability with high precision (such as NTP). |
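Because the timestamp counts 100 ns intervals since the Unix epoch, converting it to more familiar units is simple integer arithmetic. The helpers below are a sketch with hypothetical names and assume a non-negative timestamp.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100 ns intervals in one second. */
#define TICKS_PER_SECOND INT64_C(10000000)

/* Whole seconds since the Unix epoch (UTC). */
int64_t timestamp_to_unix_seconds(int64_t timestamp)
{
    assert(timestamp >= 0);
    return timestamp / TICKS_PER_SECOND;
}

/* Sub-second remainder, expressed in nanoseconds. */
int64_t timestamp_subsecond_ns(int64_t timestamp)
{
    assert(timestamp >= 0);
    return (timestamp % TICKS_PER_SECOND) * 100;
}

/* Difference between two timestamps in microseconds (later minus earlier). */
int64_t timestamp_delta_us(int64_t earlier, int64_t later)
{
    return (later - earlier) / 10;
}
```

The seconds value can be handed to standard time functions, and the delta helper is the kind of calculation a receiver might use when aligning two streams.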
Audio Frames (NDILIB_AUDIO_FRAME_V3_T)
NDI Audio is passed to the SDK in floating-point and has a dynamic range without practical limits (without clipping). To define how floating-point values map into real-world audio levels, a sinewave that is 2.0 floating-point units peak-to-peak (i.e., -1.0 to +1.0) is assumed to represent an audio level of +4 dBu, corresponding to a nominal level of 1.228 V RMS.
Two tables are provided below that explain the relationship between NDI audio values for the SMPTE and EBU audio standards.
SMPTE Audio levels - reference Level
NDI | 0.0 | 0.063 | 0.1 | 0.63 | 1.0 | 10.0 |
dBu | -∞ | -20 dB | -16 dB | +0 dB | +4 dB | +24 dB |
dBVU | -∞ | -24 dB | -20 dB | -4 dB | +0 dB | +20 dB |
SMPTE dBFS | -∞ | -44 dB | -40 dB | -24 dB | -20 dB | +0 dB |
If you want a simple “recipe” that matches SDI audio levels based on the SMPTE audio standard, you will want to have 20 dB of headroom above the SMPTE reference level at +4 dBu, which is at +0 dBVU, to correspond to a level of 1.0 in NDI floating-point audio. Conversion from floating-point to integer audio would thus be performed with:
int smpte_sample_16bit = max(-32768, min(32767, (int)(3276.8f*smpte_sample_fp)));
EBU Audio levels - reference Level
NDI | 0.0 | 0.063 | 0.1 | 0.63 | 1.0 | 5.01 |
dBu | -∞ | -20 dB | -16 dB | +0 dB | +4 dB | +18 dB |
dBVU | -∞ | -24 dB | -20 dB | -4 dB | +0 dB | +14 dB |
EBU dBFS | -∞ | -38 dB | -34 dB | -18 dB | -14 dB | +0 dB |
If you want a simple “recipe” that matches SDI audio levels based on the EBU audio standard, you will want to have 18 dB of headroom above the EBU reference level at 0 dBu (i.e., 14 dB above the SMPTE/NDI reference level). Conversion from floating-point to integer audio would thus be performed with:
int ebu_sample_16bit = max(-32768, min(32767, (int)(6540.52f*ebu_sample_fp)));
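The two one-line conversions above rely on max and min helpers that standard C does not provide. A self-contained sketch using the same scale factors (3276.8 for SMPTE, 6540.52 for EBU) might look like the following; the function names are hypothetical, and the clamp is applied before the integer cast so that very large inputs cannot overflow the conversion.

```c
#include <stdint.h>

/* Scales a floating-point sample and clamps it to the signed 16-bit range.
 * The fractional part is truncated toward zero, as in the (int) cast above. */
int16_t scale_and_clamp_16s(float sample, float scale)
{
    float scaled = scale * sample;
    if (scaled > 32767.0f)
        return 32767;
    if (scaled < -32768.0f)
        return -32768;
    return (int16_t)scaled;
}

/* SMPTE: 1.0 in NDI audio sits 20 dB below 16-bit full scale. */
int16_t smpte_float_to_16s(float smpte_sample_fp)
{
    return scale_and_clamp_16s(smpte_sample_fp, 3276.8f);
}

/* EBU: 1.0 in NDI audio sits 14 dB below 16-bit full scale. */
int16_t ebu_float_to_16s(float ebu_sample_fp)
{
    return scale_and_clamp_16s(ebu_sample_fp, 6540.52f);
}
```

With these factors, an NDI sample of 10.0 reaches full scale under the SMPTE mapping, matching the +0 dBFS column in the SMPTE table.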
Because many applications provide interleaved 16-bit audio, the NDI library includes utility functions that will convert in and out of floating-point formats from PCM 16-bit formats.
There is also a utility function for sending signed 16-bit audio using NDIlib_util_send_send_audio_interleaved_16s. Please refer to the example projects and the header file Processing.NDI.utilities.h, which lists the available functions.
In general, we recommend the use of floating-point audio because floating-point samples are not clamped, and audio levels are well-defined without the need to consider audio headroom.
The audio sample structure is defined as described below.
Parameter | Description |
---|---|
sample_rate (int) | This is the current audio sample rate. For instance, this might be 44100, 48000 or 96000. It can, however, be any value. |
no_channels (int) | This is the number of discrete audio channels. 1 represents MONO audio, 2 represents STEREO, and so on. There is no reasonable limit on the number of allowed audio channels. |
no_samples (int) | This is the number of audio samples in this buffer. Any number will be handled correctly by the NDI SDK. However, when sending audio and video together, please bear in mind that many audio devices work better with audio buffers of approximately the same duration as a video frame. We encourage senders to use audio buffers that are approximately half the length of the video frames, and receiving devices to support buffer lengths as broadly as they reasonably can. |
timecode (int64_t, 64-bit signed integer) | This is the timecode of this frame in 100 ns intervals. This is generally not used internally by the SDK but is passed through to applications which may interpret it as they wish. When sending data, a value of
Since the timecode is stored in UTC within NDI, communicating timecode time of day for non-UTC time zones requires a translation. |
FourCC (NDIlib_FourCC_audio_type_e) | This is the sample format for this buffer. There is currently one supported format: |
p_data (uint8_t*) | If FourCC is |
channel_stride_in_bytes (int) | This is the number of bytes that are used to step from one audio channel to another. |
p_metadata (const char*) | This is a per-frame metadata stream that should be in UTF-8 formatted XML and |
timestamp (int64_t, 64-bit signed integer) | This is a per-frame timestamp filled in by the NDI SDK using a high-precision clock. It represents the time (in 100 ns intervals measured in UTC time since the Unix Time Epoch 1/1/1970 00:00) when the frame was submitted to the SDK. On modern sender systems, this will have ~1 μs accuracy and can be used to synchronize streams on the same connection, between connections, and between machines. For inter-machine synchronization, it is important that some external clock locking capability with high precision is used, such as NTP. |
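To illustrate how channel_stride_in_bytes is used, the sketch below locates the samples of one channel and estimates how many audio samples span one video frame. Both function names are hypothetical, and the code assumes that each channel is stored as a contiguous run of 32-bit floating-point samples, which is consistent with the floating-point description above but is not spelled out in this table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the first sample of the given channel.
 * ASSUMPTION: planar layout, one contiguous block of floats per channel,
 * with consecutive channels channel_stride_in_bytes apart. */
const float *channel_samples(const uint8_t *p_data,
                             int channel_stride_in_bytes, int channel)
{
    assert(p_data != 0 && channel >= 0 && channel_stride_in_bytes > 0);
    return (const float *)(p_data +
                           (size_t)channel * (size_t)channel_stride_in_bytes);
}

/* Number of whole audio samples in one video frame at the given rates. */
int audio_samples_per_video_frame(int sample_rate, int frame_rate_N,
                                  int frame_rate_D)
{
    assert(sample_rate > 0 && frame_rate_N > 0 && frame_rate_D > 0);
    return (int)(((int64_t)sample_rate * frame_rate_D) / frame_rate_N);
}
```

At 48000 Hz and 30000/1001 video, one frame spans about 1601.6 samples, so the sample count per frame cannot be constant if audio buffers are to track video frames exactly.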
Metadata Frames (NDILIB_METADATA_FRAME_T)
Metadata is specified as NULL-terminated UTF-8 XML data. The reason for this choice is so that the format can naturally be extended by anyone using it to represent data of any type and length.
XML is also naturally backward and forward compatible because any implementation would happily ignore tags or parameters that are not understood (which, in turn, means that devices should naturally work with each other without requiring a rigid set of data parsing and standard complex data structures).
Parameter | Description |
---|---|
length (int) | This is the length of the metadata message in bytes. It includes the |
p_data (char*) | This is the XML message data. |
timecode (int64_t, 64-bit signed integer) | This is the timecode of this frame in 100 ns intervals. It is generally not used internally by the SDK but is passed through to applications, which may interpret it as they wish. When sending data, a value of |
If you wish to put your own vendor-specific metadata into fields, please use XML namespaces. The “NDI” XML namespace is reserved.
It is very important that you compose legal XML messages for sending. (On receiving metadata, it is important that you support badly formed XML in case a sender did send something incorrect.)
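As a sketch of filling in a metadata frame, the code below uses a hypothetical local struct that mirrors the three documented fields (it is a stand-in, not the SDK's own type). The length is computed here as the string length plus one byte for the NULL terminator, which is an assumption drawn from the NULL-terminated description above; the `acme` namespace and its URL are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the documented fields. */
typedef struct {
    int length;        /* message length in bytes */
    int64_t timecode;  /* timecode in 100 ns intervals */
    char *p_data;      /* NULL-terminated UTF-8 XML */
} metadata_frame_sketch_t;

/* Wraps an existing XML string. The string is not copied, so it must stay
 * valid for as long as the frame is in use.
 * ASSUMPTION: length counts the terminating NULL byte. */
metadata_frame_sketch_t make_metadata(char *xml, int64_t timecode)
{
    metadata_frame_sketch_t frame;
    assert(xml != 0);
    frame.p_data = xml;
    frame.length = (int)strlen(xml) + 1;
    frame.timecode = timecode;
    return frame;
}

/* Example of vendor-specific metadata kept in its own XML namespace. */
static const char example_vendor_xml[] =
    "<acme:tally xmlns:acme=\"http://example.com/acme\" on=\"true\"/>";
```

A receiver should still treat incoming strings defensively, since a sender may not have produced well-formed XML.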
If you want specific metadata flags to be standardized, please contact us.