NV_ENC_CLOCK_TIMESTAMP_SET was changed in SDK 13.1: countingType was
replaced by countingTypeLSB and countingTypeMSB.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
The device-only compilation path of vf_scale_cuda.h pulled in <stdint.h>
solely to obtain uint8_t for the CUdeviceptr typedef. On Windows-on-ARM
(aarch64 mingw) this drags in _mingw.h, whose ARM __prefetch intrinsic is
guarded by !__has_builtin(__prefetch). During clang's --cuda-device-only
pass __has_builtin has deferred/inconsistent semantics on the auxiliary
(host) target, so the guard mis-fires, the inline __prefetch definition is
emitted, and clang rejects it:
_mingw.h: error: definition of builtin function '__prefetch'
This broke the msys2-clangarm64 FATE slot once ffnvcodec (and thus the
nvcc-compiled CUDA filters) was enabled for aarch64 Windows.
uint8_t is unsigned char, so use that directly and drop the <stdint.h>
include. Device-only code should not depend on the host C runtime headers.
No functional or ABI change.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Add support for building FFmpeg with HW-accelerated decode (nvdec) and
encode (nvenc) on aarch64 Windows, covering both the MinGW (mingw32/
mingw64) and MSVC (win32/win64) toolchains. The dynamically-loaded
NVIDIA codec headers and the CUDA loader are architecture-agnostic, so
the only gate was the target_os check in the aarch64/ppc64 branch.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
The NVENC driver currently forces deltaQ_v_ac equal to deltaQ_u_ac for
AV1, so crQPIndexOffset is silently ignored. The SDK header annotates
the field as "for future use only" (nvEncodeAPI.h, NV_ENC_RC_PARAMS).
Reported in #22737
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
frames_ctx->width/height were unconditionally rounded to even, causing
odd-dimension monochrome/444 clips to be reported with incorrect
surface pool dimensions. Round only for 4:2:0 and 4:2:2; for
monochrome/444 use avctx->coded_width/coded_height unchanged,
matching the dimensions set by the software codec layer.
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Both paths unconditionally rounded display dimensions to even via
(dim + 1) & ~1. This is required for 4:2:0 and 4:2:2 (chroma
subsampling requires even-aligned surfaces) but incorrect for
monochrome and 4:4:4. AV1 monochrome clips with odd dimensions
(e.g. 1273x713) were output as 1274x714.
cuinfo.ulTargetWidth/Height still receives the even-aligned value
for internal NVDEC surface allocation. avctx->width/height are only
updated to the rounded value for 420/422; for monochrome/444 the
original display dimensions are preserved and the cuMemcpy2D copy
crops naturally.
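
As an illustration only, the rounding rule described above can be
sketched as follows. The enum and the helper names are hypothetical
and are not the identifiers used in the patch; the real code derives
the chroma layout from the pixel format.

```c
#include <stdbool.h>

/* Hypothetical chroma layout tag, for illustration only. */
enum chroma_layout { CHROMA_MONO, CHROMA_420, CHROMA_422, CHROMA_444 };

/* Round up to the next even value, i.e. (dim + 1) & ~1. */
static int align_even(int dim)
{
    return (dim + 1) & ~1;
}

/*
 * Dimension exposed to the caller: rounded to even only when chroma is
 * subsampled (4:2:0, 4:2:2); monochrome and 4:4:4 keep the display size.
 */
static int reported_dim(int display_dim, enum chroma_layout layout)
{
    bool subsampled = layout == CHROMA_420 || layout == CHROMA_422;
    return subsampled ? align_even(display_dim) : display_dim;
}
```

With this sketch a 1273x713 monochrome clip keeps 1273x713 as its
reported size, while align_even() still yields 1274x714 for the
internal surface allocation.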
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
H.264 Baseline profile cannot contain B-frames, but the NVENC preset
defaults and the max_b_frames-derived frameIntervalP override leave
frameIntervalP > 1 when -profile:v baseline is requested. The
unconditional check at the end of nvenc_setup_encoder() then sets
has_b_frames = 2, which propagates to the muxer and causes
compute_muxer_pkt_fields() to back-calculate
DTS = PTS - (delay + 1) * frame_duration. The resulting bitstream
contains no B-frames, yet every packet has a spurious 3-frame PTS/DTS
gap, breaking MPEG-TS/HLS output and DTS-based players.
This patch forces frameIntervalP to 1 in the Baseline branch of
nvenc_setup_h264_config() and warns if the user (or preset) had asked
for B-frames. The later has_b_frames assignment then sees the corrected
value and leaves avctx->has_b_frames at 0.
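
To make the size of the gap concrete, here is a small sketch of the
back-calculation quoted above. It models only that one formula, not
the full logic of compute_muxer_pkt_fields(); the function name and
the 90 kHz / 30 fps numbers in the note below are assumptions chosen
for illustration.

```c
#include <stdint.h>

/*
 * DTS = PTS - (delay + 1) * frame_duration, where delay stands for
 * has_b_frames. All values are in the same time base.
 */
static int64_t backcalc_dts(int64_t pts, int delay, int64_t frame_duration)
{
    return pts - (int64_t)(delay + 1) * frame_duration;
}
```

With a frame duration of 3000 ticks and has_b_frames = 2, a packet
with PTS 9000 gets DTS 0, i.e. a gap of three frame durations even
though the stream contains no B-frames.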
Fixes #22727.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Monochrome formats (gray, gray10le) have log2_chroma_w == 0 and
log2_chroma_h == 0 because they have no chroma planes — the same
values as YUV444. This caused them to be misclassified as YUV444 by
the is_yuv444 detection introduced in bcea693f75.
After fed6612415 changed cuvid_test_capabilities to use is_yuv444
instead of hardcoding cudaVideoChromaFormat_420, monochrome AV1
streams were rejected with "Codec av1_cuvid is not supported with
this chroma format".
Add an nb_components > 1 guard to exclude single-component formats
from the YUV444 path.
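
A minimal sketch of the guarded check, assuming a reduced descriptor
that carries only the three AVPixFmtDescriptor fields mentioned above.
The struct is a hypothetical stand-in, and the real is_yuv444
expression may test additional conditions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the AVPixFmtDescriptor fields used here. */
struct fmt_desc {
    int nb_components;
    int log2_chroma_w;
    int log2_chroma_h;
};

/*
 * No chroma subsampling in either direction AND more than one
 * component, so that gray/gray10le are not treated as YUV444.
 */
static bool is_yuv444(const struct fmt_desc *d)
{
    return d->nb_components > 1 &&
           d->log2_chroma_w == 0 &&
           d->log2_chroma_h == 0;
}
```

Without the nb_components test, a single-component format with both
shifts equal to 0 would satisfy the same condition as yuv444p.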
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
AV1CodecConfigurationRecord may contain only the 4-byte header and no
configOBUs. Still skip the header in that case so only configOBUs are
passed to cuvidParseVideoData().
Otherwise the av1C header itself is treated as sequence header data
and AV1 decoding can fail with an unknown error.
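
The intended behaviour can be sketched as below. This assumes the
buffer is already known to be an AV1CodecConfigurationRecord; the
helper name and the AV1C_HEADER_SIZE macro are hypothetical and not
taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define AV1C_HEADER_SIZE 4 /* fixed av1C header preceding configOBUs */

/*
 * Return a pointer to the configOBUs and store their length in
 * *obu_size. A record holding only the header yields length 0, so
 * the header bytes are never handed to the parser.
 */
static const uint8_t *av1c_config_obus(const uint8_t *extradata,
                                       size_t size, size_t *obu_size)
{
    if (!extradata || size < AV1C_HEADER_SIZE) {
        *obu_size = 0;
        return NULL;
    }
    *obu_size = size - AV1C_HEADER_SIZE;
    return extradata + AV1C_HEADER_SIZE;
}
```

A caller would pass data to cuvidParseVideoData() only when the
returned length is non-zero.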
Suggested-by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Cap ulNumDecodeSurfaces to 32 and ulNumOutputSurfaces to 64 to prevent
cuvidCreateDecoder from failing with CUDA_ERROR_INVALID_VALUE when
initial_pool_size exceeds the hardware limits.
Also cap the decoder index pool (dpb_size) to 32 so that indices
handed out via av_refstruct_pool_get stay within the valid range
for cuvidDecodePicture's CurrPicIdx.
When unsafe_output is enabled, stop holding idx_ref in the unmap
callback. Since cuvidMapVideoFrame copies decoded data into an
independent output mapping slot, the decode surface index can safely
be reused as soon as the DPB releases it, without waiting for the
downstream consumer to release the mapped frame. This decouples the
decode surface index lifetime (max 32) from the output mapping slot
lifetime (max 64), eliminating the "No decoder surfaces left" error
that occurred when downstream components like nvenc held too many
frames.
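
The caps can be pictured with the following sketch. It assumes, as a
simplification, that all three counts are derived from one requested
pool size; the struct, the macros and the function name are
hypothetical and do not mirror the actual nvdec code.

```c
/* Limits stated in this commit message. */
#define MAX_DECODE_SURFACES 32
#define MAX_OUTPUT_SURFACES 64

/* Hypothetical container for the three capped values. */
struct surface_counts {
    int decode;   /* would feed ulNumDecodeSurfaces */
    int output;   /* would feed ulNumOutputSurfaces */
    int dpb_size; /* size of the decoder index pool  */
};

static int clamp_max(int value, int max)
{
    return value > max ? max : value;
}

/* Apply the caps to a requested pool size. */
static struct surface_counts cap_surfaces(int requested)
{
    struct surface_counts c;
    c.decode   = clamp_max(requested, MAX_DECODE_SURFACES);
    c.output   = clamp_max(requested, MAX_OUTPUT_SURFACES);
    c.dpb_size = clamp_max(requested, MAX_DECODE_SURFACES);
    return c;
}
```

For a requested size of 40 this gives 32 decode surfaces, 40 output
surfaces and a 32-entry index pool, so indices stay within the range
accepted for CurrPicIdx.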
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
The NVENC H.264 high profile provides up to 16% bitrate savings
(BD-Rate measured with VMAF) compared to the main profile.
Since most users do not explicitly set a profile, changing the
default benefits the common case. Users requiring the main profile
for legacy decoder compatibility can still set it explicitly.
The change is gated behind a versioned define so it only takes
effect on the next major version bump (libavcodec 63).
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
For AV1, NV_ENC_BFRAME_REF_MODE_MIDDLE does not use a single middle
B-frame. Per the NVENC Programming Guide, it sets every other B-frame
as an Altref2 reference except the last B-frame in the Altref interval.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
The b_adapt option allows users to control adaptive B-frame decision
when lookahead is enabled in HEVC encoding. This feature was already
available for H.264 and AV1 encoders, but was missing from HEVC.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
The supported YUV pixel formats are now split into planar and
semi-planar groups. This reduces the number of CUDA kernels needed
to cover all pixel formats.
This patch:
1. Adds support for YUV 4:2:2 planar and semi-planar formats:
yuv422p, yuv422p10, nv16, p210, p216
2. Implements new conversion structures and kernel definitions
for planar and semi-planar formats
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Add support for additional pixel formats in CUDA hardware context:
- Planar formats (yuv420p10, yuv422p, yuv422p10, yuv444p10)
- Semiplanar formats (nv16, p210, p216)
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Added support for MV-HEVC encoding for stereoscopic videos (2 views
only). Compatible with the framepack filter when using the
AV_STEREO3D_FRAMESEQUENCE format.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
This commit extends the support for Temporal Filtering in NVENC for
AV1 and H.264 codecs. For natural videos with noise, NVENC temporal
filtering improves video coding efficiency by 4-5%.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This commit adds support for Ultra High Quality mode for AV1 on
NVIDIA GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This commit adds support for 4:2:2 encoding for HEVC and H.264 on
NVIDIA Blackwell GPUs. Additionally, it supports 10-bit encoding
for H.264 on Blackwell GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This commit adds support for 4:2:2 decoding for HEVC and H.264 on
NVIDIA Blackwell GPUs for cuviddec. Moreover, it supports 10-bit
decoding for H.264 on Blackwell GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This commit adds support for 4:2:2 decoding for HEVC and H.264 on
NVIDIA Blackwell GPUs. Additionally, it supports 10-bit decoding
for H.264 on Blackwell GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This commit adds support for 4:2:2 pixel formats, namely NV16 and
P216 for NVIDIA GPUs.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
The Constant Quality (CQ) range for the AV1 codec is actually 0 to
63, contrary to what is stated in the header and documentation.
Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Add 10-bit encoding support for HEVC when the input is 8-bit. For
8-bit input content, NVENC performs an internal CUDA 8-bit to 10-bit
conversion of the input prior to encoding. Until now, only AV1
supported encoding 8-bit content as 10-bit.
Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
When Split frame encoding is enabled, each input frame is partitioned into
horizontal strips which are encoded independently and simultaneously by
separate NVENCs, usually resulting in increased encoding speed compared to
single NVENC encoding.
Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>