The parser has been reading f->flt for combined_version >= 0x40004
since commit c1b330bf24 (avcodec/ffv1: Basic float16 support), but
ff_ffv1_write_extradata() never had a matching put_symbol().
The result was that the parsed f->flt was whatever the next symbol's
worth of rangecoded bits happened to decode to — often 0, but for a
yuv420p16le -level 4 -strict experimental stream produced locally it
parses as 1. The software decoder doesn't notice because the YUV
pixfmt-selection branches never check f->flt, but anything else that
trusts it gets garbage.
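
The desync can be illustrated with a toy symbol stream. This is only a
sketch: ToyStream, toy_put(), toy_get() and the header helpers are
hypothetical stand-ins and not the FFV1 range coder or FFmpeg's API. It
only shows that a reader which expects a field the writer never emitted
ends up consuming the next field instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symbol stream: a stand-in for the range coder, not FFmpeg's API. */
typedef struct ToyStream {
    uint8_t buf[16];
    size_t  len; /* number of symbols written */
    size_t  pos; /* number of symbols read */
} ToyStream;

static void toy_put(ToyStream *s, uint8_t v)
{
    assert(s->len < sizeof(s->buf));
    s->buf[s->len++] = v;
}

static uint8_t toy_get(ToyStream *s)
{
    assert(s->pos < sizeof(s->buf));
    return s->buf[s->pos++];
}

/* Hypothetical header writer: 'write_flt' models whether the writer
 * has the matching put for the 'flt' field. */
static void toy_write_header(ToyStream *s, int write_flt,
                             uint8_t flt, uint8_t next_field)
{
    if (write_flt)
        toy_put(s, flt);
    toy_put(s, next_field);
}

/* The reader always expects 'flt' first, like a parser that reads the
 * field unconditionally for new stream versions. */
static uint8_t toy_read_flt(ToyStream *s)
{
    return toy_get(s);
}
```

With write_flt set to 0, toy_read_flt() returns whatever next_field was,
which is the same kind of garbage described above.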
Sponsored-by: Sovereign Tech Fund
The issue is that the legacy path does not support hardware frames, so
falling back means erroring with ENOTSUP, which would fail the tests.
Sponsored-by: Sovereign Tech Fund
This was a mess: pixels outside of the image boundaries were being used as
valid, and the iteration had undefined behaviour since it was non-uniform across
the workgroup.
Calculate the per-invoc iterations from the slice dimensions instead, making them
identical for all invocations, add a valid flag to decide whether to use the
pixels or not, and fix the synchronization.
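
A CPU-side sketch of the idea, not the shader itself. NB_INVOCS,
uniform_iterations() and sum_row() are hypothetical names, and summing
stands in for whatever the shader does with each pixel. Every invocation
runs the same number of iterations, derived from the width alone, and a
valid flag masks the out-of-bounds ones.

```c
#include <assert.h>

#define NB_INVOCS 8 /* hypothetical workgroup size */

/* Same iteration count for every invocation: ceil(width / NB_INVOCS). */
static int uniform_iterations(int width)
{
    assert(width >= 0);
    return (width + NB_INVOCS - 1) / NB_INVOCS;
}

/* Simulates one workgroup walking a row; returns the sum of the
 * in-bounds pixels only. */
static int sum_row(const int *row, int width)
{
    int iters = uniform_iterations(width);
    int sum = 0;

    for (int i = 0; i < iters; i++) {
        for (int invoc = 0; invoc < NB_INVOCS; invoc++) {
            int x     = i * NB_INVOCS + invoc;
            int valid = x < width;           /* out-of-bounds lanes still iterate */
            int px    = valid ? row[x] : 0;  /* but never read or contribute */
            sum += px;
        }
        /* In a shader, a barrier would go here. Since the loop bound is
         * uniform, every invocation reaches it the same number of times. */
    }
    return sum;
}
```

For a 10-pixel row and 8 invocations, this gives 2 iterations for all
invocations, with the last 6 lanes of the second iteration marked invalid.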
Sponsored-by: Sovereign Tech Fund
The issue is that SliceContext was passed as an inout, which caused all
invocs to locally copy and modify it.
When the main invoc wrote it, only the very last written value was used,
choosing the wrong coeffs.
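
A simplified model of the copy-in/copy-out behaviour, written in C.
ToySlice, NB_INVOCS and both functions are hypothetical, and the
write-back order is fixed here for the sake of the example, while on a
GPU it is not defined. The point is only that a stale private copy
written back later overwrites the main invocation's update.

```c
#include <assert.h>

#define NB_INVOCS 4 /* hypothetical workgroup size */

typedef struct ToySlice {
    int coeff_set;
} ToySlice;

/* Models an `inout` parameter: every invocation copies the struct in,
 * only invocation 0 modifies its copy, then all copies are written back. */
static int run_with_inout(int initial, int chosen)
{
    ToySlice shared = { initial };
    ToySlice local[NB_INVOCS];

    assert(NB_INVOCS > 1);
    for (int i = 0; i < NB_INVOCS; i++)
        local[i] = shared;           /* copy-in */
    local[0].coeff_set = chosen;     /* main invocation updates its copy */
    for (int i = 0; i < NB_INVOCS; i++)
        shared = local[i];           /* copy-out: the last writer wins */
    return shared.coeff_set;
}

/* Models operating on the shared struct directly: only the main
 * invocation writes, so its value survives. */
static int run_with_shared(int initial, int chosen)
{
    ToySlice shared = { initial };

    for (int i = 0; i < NB_INVOCS; i++)
        if (i == 0)
            shared.coeff_set = chosen;
    return shared.coeff_set;
}
```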
Sponsored-by: Sovereign Tech Fund
There was a race condition where the main invocation would race ahead and use
values not yet written by other invocs.
Sponsored-by: Sovereign Tech Fund
GPUs filter out denormals when reading floats via imageLoad. Denormals shouldn't
be present in general, but if they are, this is a lossless codec, and we have to
preserve them. This allows reading the exact values.
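
The difference can be sketched on the CPU. flush_denormal() below is a
hypothetical emulation of a float load that flushes denormals, not a
description of any particular GPU. float_bits() and bits_float() show
the alternative of carrying the value around as raw 32-bit integers,
which keeps every bit pattern intact.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern, and back. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof(f) == sizeof(u));
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Emulates a float load that flushes denormals to a signed zero. */
static float flush_denormal(float f)
{
    uint32_t u = float_bits(f);

    if ((u & 0x7F800000u) == 0) /* exponent is zero: zero or denormal */
        u &= 0x80000000u;       /* keep only the sign bit */
    return bits_float(u);
}
```

A denormal such as the bit pattern 0x00000001 survives a round trip
through float_bits(), but comes out of flush_denormal() as zero, which a
lossless codec cannot accept.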
Sponsored-by: Sovereign Tech Fund
Float pixfmts are meant to be normalized to [0, 1], but if they were
not and negative numbers were present, the top bits would be filled
with garbage.
Sponsored-by: Sovereign Tech Fund
This was an oversight while microoptimizing. The outstanding_byte can
reach 0xFF in some situations, which was causing errors when encoding,
particularly with 32-bit floats.
Sponsored-by: Sovereign Tech Fund
The issue is that while Vulkan already does the decomposition for us,
swscale assumes that the pixels will be in bitstream order, rather than
in their decomposed form.
This is valid for all packed formats for which these instructions are
issued (XV30 and X2RGB10).
This allows us to support the formats in Vulkan.
Sponsored-by: Sovereign Tech Fund
The issue was that XV30 is a native 444 10-bit format, rather than
16-bits. This resulted in padding leaking into bits where it shouldn't.
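
A minimal sketch of the packing, assuming the little-endian XV30 layout
of (msb) 2 bits padding, 10 bits V, 10 bits Y, 10 bits U (lsb) in one
32-bit word. XV30Pixel, xv30_unpack() and xv30_pack() are hypothetical
helpers and not FFmpeg functions. Masking each component to 10 bits is
what keeps the padding out of the samples.

```c
#include <assert.h>
#include <stdint.h>

typedef struct XV30Pixel {
    uint16_t y, u, v; /* 10-bit samples, range 0..1023 */
} XV30Pixel;

/* Split one 32-bit word into its three 10-bit components,
 * discarding the 2 padding bits at the top. */
static XV30Pixel xv30_unpack(uint32_t w)
{
    XV30Pixel p;

    p.u = (uint16_t)( w        & 0x3FF);
    p.y = (uint16_t)((w >> 10) & 0x3FF);
    p.v = (uint16_t)((w >> 20) & 0x3FF);
    return p;
}

/* Pack three 10-bit components back into a word with zeroed padding. */
static uint32_t xv30_pack(XV30Pixel p)
{
    assert(p.y <= 0x3FF && p.u <= 0x3FF && p.v <= 0x3FF);
    return ((uint32_t)p.v << 20) | ((uint32_t)p.y << 10) | (uint32_t)p.u;
}
```

Treating the same data as 16-bit components would instead let the
padding bits end up inside a sample.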
Sponsored-by: Sovereign Tech Fund
After an extended Ghidra session, it turns out that the camera/recorder bakes a
custom curve that *has* to be applied. It contains both the camera's inverse
transfer curve, plus whatever else the camera applied. It could (and does) contain
quantization refinements. It is also used to switch between low and high quality
encoding by boosting coeffs (thus acting as an additional dequant curve).
Reverse engineered the decoder a bit more. All tiles are always 16x1.
The issue is that at the edges, tiles don't have the same width.
Instead, the first tile that starts to clip is half, and then the
next tile after that is also half the previous tile's width.
Unlike other decoders or encoders, prores_raw only has a single
Vulkan format to worry about.
This is a 20% speedup on AMD, since AMD apparently has optimizations
for this.
This just adds a Vulkan compute-based 360-degree video conversion.
It implements a sufficient subset of the most popular 360-degree video formats.
Options such as rotation are dynamic and can be adjusted during runtime.
Some of the work was based on Paul B. Mahol's patch from 2020. There
were spots where the arithmetic conversion was incorrect.
The swscale internals currently have a quirk which causes the memcpy
backend to be called when the pixfmts match. Obviously, this doesn't do
what is expected, as hardware frames cannot just be copied.
Check for this.
Sponsored-by: Sovereign Tech Fund
swscale gets runtime-defined assembly once again!
This commit splits the Vulkan backend into two, SPIR-V and GLSL,
enabling falling back onto the GLSL implementation if an instruction
is unavailable, or simply for testing.
Sponsored-by: Sovereign Tech Fund
This commit adds a SPIR-V assembler header file. It was partially generated
from the SPIR-V header file JSON definition, then edited by hand to template
and reduce its size as much as possible.
It only implements the essentials required for SPIR-V assembly that swscale
requires.
Sponsored-by: Sovereign Tech Fund
Uniform buffers are much simpler to index, and require no work from
the driver compiler to optimize.
In SPIR-V, large 2D shader constants can be spilled into scratch memory,
since you need to create a function variable to index them during runtime.
Sponsored-by: Sovereign Tech Fund
The issue is that very often, hardware has limited support for BGRA
formats.
As this is a limitation of Vulkan itself, we cannot work around this
in a compatible way.
Sponsored-by: Sovereign Tech Fund
FFmpeg has had an issue with GLSL compilation libraries since they
were first merged 6 years ago. The libraries don't have a stable ABI,
are very difficult for packagers to compile and integrate, are slow,
not threadsafe, and uncomfortable to use. The decision to switch all
Vulkan code to either compile-time GLSL or SPIR-V assembly was taken
in January. Since then, including in the FFmpeg 8.1 release, the
remaining runtime GLSL compilation has been steadily eliminated.
Sponsored-by: Sovereign Tech Fund
The main issue is that BGR formats only semi-exist in Vulkan. Unlike all
other formats, they require the user to manually remap the pixel order, and
are also forbidden from being written to without a format in shaders. The
reason for this is conservative: Vulkan is supposed to work everywhere, including
platforms where there is no write-time remapping/swizzling support.
Sponsored-by: Sovereign Tech Fund
The issue is that with multiplane images, or packed images,
there may be some mismatching between what .elems has, and what
we need.
Descriptors are cheap, so just always reserve 4.
Sponsored-by: Sovereign Tech Fund
The issue is that the main Vulkan context is shared between possibly
multiple shaders, and registering a new shader requires allocating
descriptors.
Sponsored-by: Sovereign Tech Fund