100 Commits
Author SHA1 Message Date
Niklas Haas and Niklas Haas 976e18fdef swscale/x86: use correct HOSTCC_E flag instead of CC_E
HOSTCC and CC might be completely different compilers.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-10 15:04:50 +00:00
Niklas Haas 7b59a86633 swscale/uops_tmpl: move attributes before static keyword
This fails to compile with C23 standard attributes otherwise.

Technically only av_unused requires this, but move the other attributes
as well for consistency / future proofing.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-10 16:27:58 +02:00
Niklas Haas 941a35149b swscale/x86/ops_int: switch to SWS_UOP_MOVE
Instead of SWS_UOP_PERMUTE/SWS_UOP_COPY.

No real measurable difference in performance (it just eliminates a few
practically free register renames), but definitely simpler.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas and Ramiro Polla 36004d681f swscale/uops: add SWS_UOP_MOVE for optimal register-register swizzles
This decomposes a swizzle mask into a series of optimal register-register
moves, using at most two temporary scratch registers.

This is a better match for ASM-style backends than the existing PERMUTE/COPY
uops that are designed for the needs of the C backend (or other backends which
either apply the swizzle mask directly or permute pointers).

I originally had logic equivalent to this written in NASM macros, but it was
just such a complicated mess that I think it's better to rewrite it in C and
have the resulting metadata be an explicit part of the uop definition.

This commit only adds the uop, I'll update the x86 implementation in the
next step.
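
As a rough illustration of decomposing a swizzle mask into register-register
moves, the sketch below sequentializes the parallel assignment
reg[d] = old reg[mask[d]]. It is not the code from this commit: the names
(`Move`, `decompose_swizzle`, `SCRATCH`) are hypothetical, it assumes four
registers, and for simplicity it uses a single scratch register rather than
reproducing the actual move selection.

```c
#include <assert.h>

#define NREGS   4
#define SCRATCH NREGS /* hypothetical index of the one scratch register */

typedef struct Move { int dst, src; } Move;

/* Turn the parallel assignment reg[d] = old reg[mask[d]] into a sequence of
 * single moves. Returns the number of moves written to out[]. */
static int decompose_swizzle(const int mask[NREGS], Move out[2 * NREGS])
{
    int src[NREGS]; /* pending source of each destination, -1 once done */
    int n = 0;

    for (int d = 0; d < NREGS; d++) {
        assert(mask[d] >= 0 && mask[d] < NREGS);
        src[d] = mask[d] == d ? -1 : mask[d]; /* identity needs no move */
    }

    for (;;) {
        int progress = 0, stuck = -1;
        for (int d = 0; d < NREGS; d++) {
            int still_read = 0;
            if (src[d] < 0)
                continue;
            for (int e = 0; e < NREGS; e++)
                still_read |= e != d && src[e] == d;
            if (still_read) {
                stuck = d; /* overwriting d now would lose a needed value */
                continue;
            }
            out[n++] = (Move) { d, src[d] };
            src[d] = -1;
            progress = 1;
        }
        if (progress)
            continue;
        if (stuck < 0)
            break; /* nothing left to move */
        /* Only cycles remain: park one value in the scratch register and
         * redirect its readers there, which unblocks the cycle. */
        out[n++] = (Move) { SCRATCH, stuck };
        for (int e = 0; e < NREGS; e++)
            if (src[e] == stuck)
                src[e] = SCRATCH;
    }
    return n;
}
```

Chains and broadcasts need no scratch register in this sketch; only genuine
permutation cycles cost one extra move each.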

Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 228ef8d97b swscale/ops: make compile() take const SwsOpList *
The old x86 backend was the only backend that actually mutated the ops list.
With this gone, we can constify this parameter.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas a7c6a5f74e swscale/ops_chain: remove dead code
This is no longer needed now that both C and x86 are ported to uops.
The other ff_sws_setup_*() functions are still used by the aarch64 backend.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 43a8e2da01 swscale/x86/ops: rewrite based on uops_macros.h
This is a ground-up refactor of the existing x86 ops code, using the new
uops macros to auto-generate every single kernel instance without guesswork.

While I was at it, I also cleaned up the file a bit and made sure we have only
a single, consistent way of writing/defining the kernels. This also gets rid
of some of the old boilerplate like decl_pattern.

Most kernels are trivial ports, but a few deserve attention or note:

- SWS_UOP_LINEAR is now generated more efficiently, thanks to the distinction
  between 0/1/arbitrary components. I also rewrote the code to keep track of
  whether the output was initialized yet or not, which lets us skip the
  initial `xorps` and `addps` for the first component.

- SWS_UOP_PERMUTE is generated automatically by using some NASM logic to
  detect permutation cycles and emit the minimal sequence of `mova`
  instructions. SWS_UOP_COPY, on the other hand, is implemented naively. I
  originally had a more complex implementation that could handle both, but
  I decided it really isn't worth the complication just to save 2-3 cycles.

- SWS_UOP_SCALE now has a native 8-bit implementation, which is faster than
  falling back to C code.

- SWS_UOP_SWAP_BYTES is no longer compiled as a type-agnostic pshufb, instead
  we hard-code the shuffle mask

- SWS_UOP_DITHER is now much simpler and avoids branching etc. entirely
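
To make the SWS_UOP_LINEAR point above more concrete, here is a scalar C
sketch of the same idea: distinguish 0/1/arbitrary coefficients and track
whether the output has been initialized, so the first term needs neither a
zeroing step nor an add. The function name and the 4-input layout are
assumptions for illustration; the real kernels are NASM and operate on
vectors.

```c
#include <assert.h>

/* One output row of a 4-input linear transform. Coefficients that are exactly
 * 0 are skipped, and coefficients that are exactly 1 need no multiply. */
static float linear_row(const float coef[4], const float in[4])
{
    float out = 0.0f; /* only used if every coefficient is zero */
    int initialized = 0;

    assert(coef && in);
    for (int i = 0; i < 4; i++) {
        float term;
        if (coef[i] == 0.0f)
            continue; /* 0: contributes nothing */
        term = coef[i] == 1.0f ? in[i] : coef[i] * in[i]; /* 1: plain copy */
        out = initialized ? out + term : term; /* first term: no zero + add */
        initialized = 1;
    }
    return out;
}
```

Note that skipping zero coefficients is only equivalent to the full product
for finite inputs, since 0 * NaN and 0 * Inf are not zero.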

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 257f1438a5 swscale/x86/ops: simplify mmsize determination
There is no reason for this to be a separate function; it just obscures
the error path.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 2a09d0346e swscale/x86/ops_include: clarify/fix some comments
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 6deae052a2 swscale/x86/uops: generate NASM macros using uops_macros.h
Rather than hard-coding a separate set of NASM macros, or generating them
with a separate function, we can just leverage the C preprocessor to generate
a NASM source file *from* the existing ops macros.

This is maybe a bit unorthodox, but it avoids unnecessary overhead from
re-generating the macros twice, avoids manual updating of the NASM macros,
and generally does not come with any real downside except being a bit ugly.

The main source of ugliness is the fact that the C preprocessor expands
everything into a single line, whereas NASM expects separate statements to
be on separate lines. Very fortunately, we can work around this by writing
another NASM macro to take its arguments and dump them onto multiple lines.

It may seem premature, but I went ahead and defined all the macros, since
it was easy enough to do.

I added the %include in this commit so that any build errors caused by this
file show up in the same commit that introduces it.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 6057759ffc swscale/uops: parametrize filter op result type
The ops.h infrastructure currently hard-codes this as SWS_PIXEL_F32,
but I want to at least properly parametrize this in case we ever
decide to revisit this decision in the future. In particular, it
may become relevant for trivial kernels or kernels whose intermediates
are bounded, exact integers (which could possibly be output directly
as e.g. U16 or U32).

The FATE change is just because the filter op names gained a suffix.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 4a8a1f5b8b swscale/uops: add SWS_UOP_READ_PLANAR_FV_FMA
Analog of SWS_UOP_READ_PLANAR_FV for FMA-enabled backends.
The logic for determining when we can safely use FMA is maybe a bit
obtuse, given that a `return type == SWS_PIXEL_U8` would have just done
the trick as well, but better to be safe than sorry, if we ever decide to
tune this constant in the future.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas dbe961b4cd swscale/uops: add SWS_UOP_LINEAR_FMA and SWS_UOP_FLAG_FMA
This is like SWS_UOP_LINEAR but parametrized by which matrix entries can use
FMA instead of bitexact IEEE mul/add instructions.

I decided to make these a separate uop to avoid bogging down the reference
backend with arch-specific details like FMA. However, I think FMA ops are quite
common/universal so I pre-emptively split it into its own separate flag rather
than defining something like SWS_UOP_FLAG_X86.
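
For readers wondering why FMA is not bitexact with separate mul/add
instructions, the following C sketch shows one input where the two differ.
The helper names are made up for this example, and `fused_ref()` merely
emulates a fused multiply-add by forming the product exactly in double,
which is sufficient for these inputs; a real backend would use a hardware
FMA instruction instead.

```c
#include <assert.h>

/* Separate multiply and add: the product is rounded to float before the add. */
static float mul_add(float a, float b, float c)
{
    volatile float prod = a * b; /* volatile forces the intermediate rounding */
    return prod + c;
}

/* Emulated fused multiply-add: a float * float product is exact in double
 * (24 + 24 <= 53 mantissa bits), so no rounding happens before the add. */
static float fused_ref(float a, float b, float c)
{
    double prod = (double) a * (double) b;
    return (float) (prod + (double) c);
}

/* a * a = 1 + 2^-11 + 2^-24, which is not representable as a float. */
static void check_rounding_example(void)
{
    const float a = 1.0f + 0x1p-12f;
    const float c = -(1.0f + 0x1p-11f);
    assert(mul_add(a, a, c)   == 0.0f);     /* the 2^-24 term was rounded away */
    assert(fused_ref(a, a, c) == 0x1p-24f); /* the 2^-24 term survives */
}
```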

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 4e18068165 swscale/uops: also generate macros under SWS_BITEXACT
And SWS_BITEXACT|SWS_ACCURATE_RND, for completeness. This roughly doubles
the runtime of the uops macros generation. Let's hope it doesn't explode
further.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 157f586e5c swscale/uops: thread SwsContext through ff_sws_ops_translate()
Needed to access ctx->flags, in particular SWS_BITEXACT.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas f97ba8cbe7 swscale/uops: loop over all flags when generating macros
This list is currently empty but will be expanded by the following commit.

I briefly tested whether it would be worth avoiding the free/realloc on
the uops array, but found the performance difference to be negligible.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 02a168a576 swscale/uops: keep track of input range during op translation
Needed for the FMA decision logic.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 3f9219d605 swscale/uops: add SwsUOpFlags to ff_sws_ops_translate()
These will be used to e.g. enable extra uops during translation.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas b7a80a9f0d swscale/ops_backend: delete ops-based C backend
And make uops_backend.c the new reference.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 100ce4ac41 tests/checkasm/sw_ops: rewrite using uops_macros.h
This ensures 100% coverage of all uop primitives by generating the set of
tests exactly from the list of seen primitives, using the uops macros.

There are still some annoying quirks, because we essentially have to
"untranslate" the uops back into SwsOps that translate back to the intended
uop, but overall it's not too bad and still much better than the status quo
of hand-rolling the list of test cases.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 636b9eda75 swscale/ops_tmpl_float: allow arbitrary values for 1x1 dither
Removes the 1x1 dither fast path, mirroring the previous commit.

This is not really needed nor useful but it will make the transition to
the uops architecture slightly easier, as 1x1 dither gets reinterpreted
as SWS_UOP_ADD there.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas ca8774b9d6 swscale/x86: remove broken and unnecessary 1x1 dither fast path
This is broken because it fails to check dither.y_offset[] to determine if
dithering for a channel is requested or not.

This is unnecessary because the generic dither code already jumps over unused
components, which is cheap enough not to worry about this special case for
now.

This code will, in any case, soon be replaced by a uops_macros.h-derived
approach. This commit is only needed as a stopgap to make checkasm continue
working after the sws_uops refactor.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 19652a83a2 swscale/x86/ops_include: use %assign instead of %xdefine
For numeric 1/0 constants. As an aside, fix the broken comment.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas b328e152a4 swscale/x86: move entry points to ops_common.asm
As well as the packed shuffle solver. These don't really interact with
the rest of the code in ops_int.asm, which is, by name at least, intended for
integer op kernels.

More importantly, these functions will be shared with the uops rewrite.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas c5c9c6d996 swscale/x86: rename ops_common.asm to ops_include.asm
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 8118e964bb swscale/uops: auto-generate reference C backend from uops_macros.h
Instead of choosing by hand which kernels to implement, this rewrite focuses
on leveraging the power of uops_macros.h to auto-generate all needed kernels.
This not only simplifies maintenance, but also improves performance.
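
The general technique of generating every kernel from one macro list can be
sketched with a plain C "X-macro". Everything below (`KERNEL_LIST`,
`kernel_table`, `find_kernel`) is a hypothetical stand-in chosen for
illustration; it does not reflect the actual contents or shape of
uops_macros.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list of kernels; a generated header would play this role. */
#define KERNEL_LIST(X) \
    X(add, +)          \
    X(sub, -)          \
    X(mul, *)

/* First expansion: define one function per list entry. */
#define DEFINE_KERNEL(name, op) \
    static int kernel_##name(int a, int b) { return a op b; }
KERNEL_LIST(DEFINE_KERNEL)

typedef struct KernelEntry {
    const char *name;
    int (*func)(int, int);
} KernelEntry;

/* Second expansion: build the lookup table from the very same list, so the
 * table can never go out of sync with the set of defined kernels. */
#define TABLE_ENTRY(name, op) { #name, kernel_##name },
static const KernelEntry kernel_table[] = { KERNEL_LIST(TABLE_ENTRY) };

static const KernelEntry *find_kernel(const char *name)
{
    assert(name);
    for (size_t i = 0; i < sizeof(kernel_table) / sizeof(kernel_table[0]); i++)
        if (!strcmp(kernel_table[i].name, name))
            return &kernel_table[i];
    return NULL;
}
```

Adding a kernel then means adding one line to the list; the definition and
the table entry follow automatically.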

I have decided to develop the replacement backend as a separate file, under
a separate prefix, for the explicit purpose of being able to verify the
correctness of the rewrite using the current backend as a checkasm reference.

The code for the kernels themselves has been largely copied from the old
C backend, modified slightly to conform to the uop template style. This does
result in some code duplication, but a following commit will clean it up.
I nonetheless want to preserve this commit for bisection purposes, to ensure
we have one commit that contains both backends side-by-side.

Overall speedup=1.182x faster, min=0.197x max=3.450x

The big slowdowns are flukes caused by tiny deviations in the runtime of
a noop memcpy conversion.

As a nice side benefit, the compiled binary is now also ~10% smaller, and
the code ~50% smaller.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 1e268fbedf swscale/ops_chain: add uop-based helpers to assemble SwsOpChain
This will eventually replace the existing op_match() and
ff_sws_op_compile_tables(), but I've decided to introduce it separately first
so that I can incrementally update the backends to use the new API, at the
cost of some temporary code duplication.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas adaf142647 swscale/uops: generate uop helper macros
This follows the same approach as is used currently by ops_entries_aarch64,
except I decided to have the generation logic live directly in uops.c
to allow re-using internal helpers and move it closer to the other helpers
that depend on the exact set of uops and their fields.

Unlike libswscale/tests/sws_ops.c, we make an effort to actually test all
relevant flag combinations, since these can affect the generated op lists.

I will use these macros to auto-generate both the C template-based kernels,
as well as the entire x86 backend, in the near future, hence their excessive
flexibility.

Re-use the libswscale/tests/sws_ops.c that we already compile. We could put it
in its own file but this is just as convenient, and it's easily moved anyways.
Having it be a FATE test ensures that it is always up-to-date.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 18:27:20 +02:00
Niklas Haas 8ad7cc6ccd swscale/tests/sws_ops: also print/test micro-op list
Tests for changes or regressions in the generated micro-ops. This will be
instrumental in my development of the micro-ops optimizer, and my plans to
phase out some of the macro-op optimization passes in favor of doing those
optimizations on the uop level instead.

 rgb24 16x16 -> rgb24 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 ...X] SWS_OP_FILTER_V     : 16 -> 32 bilinear (2 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
     min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _}
   [f32 ...X] SWS_OP_MIN          : x <= {255 255 255 _}
     min: {1/512 1/512 1/512 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
  Retrying with split passes:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ translated micro-ops:
+    u8_read_packed_xyz
+    u8_write_planar_xyz
  Sub-pass #1:
   [ u8 ...X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 2 tap bilinear filter (V)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
     min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _}
   [f32 ...X] SWS_OP_MIN          : x <= {255 255 255 _}
     min: {1/512 1/512 1/512 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ translated micro-ops:
+    u8_read_planar_fv_xyz
+    f32_dither_xyz_0_3_2_16x16
+    f32_min_xyz
+    f32_to_u8_xyz
+    u8_write_packed_xyz
...

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 01:11:01 +02:00
Niklas Haas 3a7331d311 swscale/ops: remove unused function ff_sws_enum_ops()
Users can trivially recreate this logic anyways.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 01:10:57 +02:00
Niklas Haas 6b75166758 swscale/tests/sws_ops: minor cleanup / consistency
Clean up after the previous revert.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 01:10:54 +02:00
Niklas Haas dcfe3d3b90 Revert "swscale/tests/sws_ops: add option for summarizing all operation patterns"
This reverts commit f76aa4e408.

This is no longer needed once we switch to uops_macros.h, which will do the
same thing except better.
2026-06-09 01:10:49 +02:00
Niklas Haas aaf6a52fe6 swscale/uops: add uop translation logic
This will replace the fuzzy matching logic in op_match() that is used by the
C and x86 implementations, as well as the translation to AARCH64_OP_* that is
used by the NEON asmgen backend.

Down the line, this function will also take a set of flags to enable
backend-specific kernels like FMA variants, but I also decided to keep it
initially simple to ease the transition.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 01:10:39 +02:00
Niklas Haas dc88bcdf8c swscale/uops: add uop definitions
Taken from AARCH64_OP_*, but generalized/simplified a bit and updated to add
missing op types, especially for special cases that already have dedicated
implementations on x86.

This initial definition is kept intentionally simple and close to SwsOp, to
make it easier to port the existing ops backends to the new infrastructure.
However, in the future, this will be refactored dramatically - distinctions
like convert vs expand will cease to exist on the SwsOp level, and will
instead be introduced by separate optimization passes on the uops level.

SWS_UOP_LINEAR in particular will most likely be broken up into multiple
uops. I also took this opportunity to redefine the mask in a more useful way.

I decided to split up SWS_OP_CONVERT as well, because it was making x86
codegen unnecessarily difficult due to the strong interaction between exact
pixel sizes.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-09 01:09:34 +02:00
Niklas Haas ae6f3ce02c swscale/uops: split off from ops.h
Forming what will be the start of a larger helper file for backend-internal
translation of higher-level ops into lower level kernels. This header file
needs to be includable from independent source files, as it will be used to
provide definitions for build-time code generation (e.g. ops_asmgen.c), so
it must be self-contained.

Pulling in all of ops.h from uops.h would be too large a dependency, since
ops.h pulls in graph.h, refstruct, bprint, etc. It's easier to start from a
fresh file that is documented as being usable at compile time.

For now, just declare the common types that will be needed by the uops layer.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-08 18:29:02 +02:00
Niklas Haas 48a42b5f21 configure: add -P to $CC_E flag
This suppresses the addition of #line directives in the preprocessed output,
which is what we want when we're invoking the hostcc just to preprocess some
files. (Currently, this variable is only used for configure-internal checks
anyways, but I want to use it to preprocess a NASM file)

On MSVC/Intel, /EP is the equivalent syntax, though we use -EP instead for
consistency.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-08 18:24:45 +02:00
Niklas Haas and Niklas Haas 3137d337fe tests/checkasm/sw_ops: use new checkasm_set_func_variant()
The current approach of re-testing the C reference for every backend
separately leads to both confusing output (e.g. having an extra redundant
`memcpy_c` line for every op, even those not implemented by the memcpy
backend), as well as a lot of unnecessary wasted time re-testing and
re-benching the same C variant for every backend.

This new API function lets us test the C function only a single time, while
simultaneously having all of the other backends implicitly compare themselves
against the C reference.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-07 09:24:23 +00:00
Niklas Haas and Ramiro Polla 7fc7aaf265 swscale/graph: prefer ops backend for floating point formats
These have horrible support in legacy swscale; in particular, they break the
pixel range (limited vs full) when converting to yuva444p, resulting in SSIM
errors like:

uyva 96x96 -> grayf32le 96x96, SSIM={Y=0.997654 U=1.000000 V=1.000000 A=1.000000} loss=1.876414e-03
  loss 1.876414e-03 is worse by 1.864254e-03, expected loss 1.215935e-05

(The ops-based backend gets a 100% bit-exact roundtrip here)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-05 22:37:18 +02:00
Niklas Haas and Ramiro Polla 5a6bf8d4f4 swscale/tests/swscale: allow all backends for auxiliary conversions
This enables testing all internally supported pixel formats.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-05 22:32:51 +02:00
Niklas Haas and Ramiro Polla 8366d5f8d6 swscale/tests/swscale: refactor format testing logic
Uses the internal ff_sws_test_pixfmt_backend() to test for format support
on the concrete backend that's in-use for the auxiliary/main conversions,
respectively, while taking into account the -backends and -api options.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-05 22:22:27 +02:00
Niklas Haas and Ramiro Polla 517c3d5fc1 swscale/graph: re-check pixel format support in add_legacy_sws_pass()
When the user passes multiple backends (e.g. SWS_BACKEND_ALL), the
static check in sws_setup_frame() might have succeeded for the ops
backend but not the legacy backend, so we need to properly restrict
the legacy backend implementation function as well. Otherwise, this
may trigger internal errors / AVERROR(EINVAL) inside sws_init_context().

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-05 21:55:27 +02:00
Niklas Haas afce637550 avformat/shared: add option to verify cache file contents
This will effectively disable the cache but allows the cache layer to verify
cached files against the original input file. Useful only for debugging
the shared cache protocol itself, as file corruption can already be caught by
the CRC check.
2026-06-04 17:48:12 +02:00
Niklas Haas ca748964fe avformat/shared: implement 16-bit CRC check
Decided to split this off from the previous commit in case we
ever want to revert it, since it does double the overhead of the spacemap
as well as adding extra overhead to both the read and write path.

Bump the cache version to 2 to reflect the changed disk format.
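
For reference, a 16-bit CRC over a block can be computed as in the sketch
below. The commit message does not say which polynomial or initial value is
used, so the CRC-16/CCITT-FALSE parameters here (polynomial 0x1021, initial
value 0xFFFF) and the function name are assumptions made purely for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE (assumed parameters, see above). */
static uint16_t crc16_ccitt(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;

    assert(buf || !len);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t) (buf[i] << 8); /* feed the next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t) ((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t) (crc << 1);
        }
    }
    return crc;
}
```

A reader would recompute the CRC of a cached block and compare it with the
stored 16-bit value, treating a mismatch as a cache miss.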
2026-06-04 17:48:12 +02:00
Niklas Haas 56de70a2e6 avformat: add shared concurrent block cache protocol
This adds a new protocol shared:URI which is distinct from the existing
`cache:` in that it is explicitly designed to be thread-safe and cross-process,
enabling multiple ffmpeg processes (or multiple ffmpeg decoders within the same
process) to share a single cache file, for e.g. a remote HTTP stream. As such,
it uses a radically different internal design.

To facilitate zero-knowledge cross-process interoperability, the cache file
itself is just a memory-mapped representation of the underlying file data,
which has the side benefit that the resulting cache file will contain a
working copy of the streamed file (assuming the stream was read to
completion).

To keep track of which regions are cached and which are not, we use a
secondary file that contains a minimal header along with a static bytemap of
blocks within the file. This secondary file is also used to store metadata
such as the filesize, if known, as well as marking "failed" blocks.
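
A bytemap lookup of this kind might look like the following sketch. The block
size, the state values and all names are hypothetical; the actual on-disk
format is defined by the implementation, not by this example.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096 /* hypothetical block size in bytes */

/* Hypothetical per-block states, one byte per block. */
enum BlockState { BLOCK_EMPTY = 0, BLOCK_CACHED = 1, BLOCK_FAILED = 2 };

/* Returns 1 if every block overlapping [offset, offset + len) is cached. */
static int range_is_cached(const uint8_t *map, uint64_t nb_blocks,
                           uint64_t offset, uint64_t len)
{
    uint64_t first, last;

    assert(map);
    if (!len)
        return 1;
    first = offset / BLOCK_SIZE;
    last  = (offset + len - 1) / BLOCK_SIZE;
    if (last >= nb_blocks)
        return 0; /* range extends past the mapped region */
    for (uint64_t i = first; i <= last; i++)
        if (map[i] != BLOCK_CACHED)
            return 0; /* empty or failed block: must go to the origin */
    return 1;
}
```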

Both files can grow dynamically in order to accommodate larger/growing files,
and can be atomically updated (through the use of shared space maps). I have
extensively checked the space map initialization and update code for race
conditions, and I believe the current design to be solid.

That said, it is the user's responsibility to some extent to ensure that the
same URI is not used for different streams, as we rely on the URI to uniquely
identify the cache files. To protect against possible abuse, we use a
cryptographic hash with sufficient collision resistance. The lack of
any implicit default on `-cache_dir` also means that `shared:` can't be enabled
via URL injection to possibly access random files on the disk (or intentionally
leak content from other streams with similar URIs, even if the cryptographic
hash function is broken).
2026-06-04 17:48:12 +02:00
Niklas Haas cd3f335207 avformat/file: return ENOSYS for filesize query on files with follow=1
If the input is expected to grow, we shouldn't make any assumptions about
the file size. This matches e.g. the behavior of streamed protocols like
chunked HTTP, which similarly return ENOSYS for streams of unknown size.

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 17:48:12 +02:00
Niklas Haas 7cb93fb200 avformat/http: return ENOSYS instead of UINT64_MAX for unknown filesize
This matches the behavior of e.g. the pipe: protocol, which returns ENOSYS
on account of ffurl_seek() not being implemented.

The previous behavior of returning s->filesize directly is almost surely a
bug, as s->filesize is UINT64_MAX when never initialized.
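
The intended behavior can be sketched as follows. `StreamCtx` and
`query_size()` are hypothetical stand-ins rather than the actual http.c code,
and the sketch returns a plain negative errno value where FFmpeg code would
go through its AVERROR() wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the protocol's private context. */
typedef struct StreamCtx {
    uint64_t filesize; /* UINT64_MAX while the size is unknown */
} StreamCtx;

/* Report the stream size, or a negative error code if it is unknown. */
static int64_t query_size(const StreamCtx *s)
{
    assert(s);
    if (s->filesize == UINT64_MAX)
        return -ENOSYS; /* unknown size: say so explicitly */
    return (int64_t) s->filesize;
}
```

Passing the sentinel through unchanged would instead hand the caller a value
that is neither a valid size nor a meaningful error code.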

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 17:48:12 +02:00
Niklas Haas c27a3b12e3 configure: re-indent after previous change
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 11:44:52 +02:00
Niklas Haas 310ff99f62 configure: support building without checkasm
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 11:44:52 +02:00
Niklas Haas and Martin Storsjö 3b1d7cd1f7 tests/checkasm: switch to shared libcheckasm implementation
The checkasm tool originated in x264. It was later rewritten and
modernized for FFmpeg (and relicensed to LGPL). For the dav1d
project, it was relicensed again to 2-clause BSD (with permission
from the relevant authors).

The FFmpeg and dav1d implementations of checkasm have since evolved
independently (with some amount of ported code between the two,
with relicensing permission where relevant).

To synchronize the development, and to make it possible to easily
adopt checkasm in other projects, it has been split out into a
standalone project/library on its own, developed at
https://code.videolan.org/videolan/checkasm/.

That version has all the features of checkasm in both FFmpeg and
dav1d, and has got a number of extra improvements on top:

- More/fixed tests (e.g. properly clobbering high bits of 32-bit registers
  on most platforms),

- Vastly improved overall performance / runtime for benchmarking, due
  primarily to the ability to scale the runtime of each test to that test's
  complexity.

- Much more robust statistical analysis of benchmarking results; including
  robust outlier rejection, an estimation of the histogram, and the ability
  to report the variance / stddev in addition to the (trimmed) mean.

- Interactive HTML and JSON output formats in addition to CSV/TSV.

- More readable and user-friendly output across the board, especially for
  failures and data dumps (e.g. also showing errors inside padding bytes).

- Better cross-platform support, including dynamic fallback of timer
  implementations on ARM platforms, a better RISC-V and AArch64 harness,
  and more.

  On AArch64, it tests which timer out of pmccntr_el0, linux perf,
  macos kperf, cntvct_el0 is available, without the user needing to
  configure things, and falling back on clock_gettime if none of
  them can be used. This means one automatically gets the best
  available timer, if userspace access to pmccntr_el0 has been
  unlocked with a kernel module, or if one has permission to use
  the perf API, or if the cntvct_el0 is exact enough to be useful.

  On AArch64 macOS, there is now a test harness that catches clobbered
  registers and stack clobbering, like on other platforms.

- An option for setting affinity, for benchmarking on heterogeneous
  core systems. (On Linux, this is already easily done through
  taskset, but on Windows, the checkasm built-in option makes it
  possible there as well, and portable.)

- Printing of the tested CPU core name, where possible.

To integrate this external implementation of checkasm into FFmpeg,
without having to build libcheckasm as an external library, the upstream
sources are added as a git subtree, and integrated into the FFmpeg
build system as a foreign source.

For the long and storied history of how we arrived at this solution,
see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546

The relevant config headers for checkasm are generated by configure,
and the sources are built as part of the main ffmpeg build. The
upstream sources, while they use meson as primary build system,
are structured to make it easy to build as part of a foreign build
system.

The existing testcases are mostly kept untouched (only three minor
changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the
majority of the logic from checkasm.c, checkasm.h and the arch
specific assembly files are removed, replaced with the external
implementation.

Co-Authored-By: Martin Storsjö <martin@martin.st>
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 11:44:52 +02:00
Niklas Haas 21ac0b276e Merge commit 'df966476d760f1bfe4c5f52c463b82be5bf6b9ed' as 'tests/checkasm/ext'
To reproduce this commit, run:

$ git subtree add --squash --prefix=tests/checkasm/ext \
  https://code.ffmpeg.org/FFmpeg/checkasm.git master

To update at a later point in time, replace `add` by `pull`
2026-06-04 11:44:40 +02:00
Niklas Haas 66eaaa644a Squashed 'tests/checkasm/ext/' content from commit 0df02535c7
git-subtree-dir: tests/checkasm/ext
git-subtree-split: 0df02535c7435cf3969ca141c9e3ff7b1c1e6c28
2026-06-04 11:44:26 +02:00
Niklas Haas 362e309710 forgejo/codespell: exclude tests/checkasm/ext
Pre-emptively exclude the external checkasm sources. Split off from the
following merge commit to make the history easier to follow.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 11:44:22 +02:00
Niklas Haas 566dd20247 tests/fate/source-check.sh: exclude tests/checkasm/ext
Pre-emptively exclude the external checkasm sources. Split off from the
following merge commit to make the history easier to follow.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-04 11:44:22 +02:00
Niklas Haas 068173f329 tests/checkasm: factorize out randomize_buffer for doubles
Not only is this duplicating code, but it also hard-codes a reference to
`checkasm_lfg`, which I want to eliminate in the interest of being able to
switch out the checkasm implementation.
2026-06-04 11:44:22 +02:00
Niklas Haas 8df8f8b1bb swscale/x86/ops: fix typo
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 23:53:47 +02:00
Niklas Haas 71b4666ba5 tests/checkasm/sw_ops: re-indent after previous change
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 23:53:37 +02:00
Niklas Haas 7af4faf6df tests/checkasm/sw_ops: skip test data setup if not testing anything
The test data size is quite large, so re-setting up unused data is eating up
quite a significant amount of CPU time.

This commit cuts execution time of sw_ops in half.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 23:53:23 +02:00
Niklas Haas ef182f2289 swscale/tests/sws_ops: avoid confusing double label
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 23:53:15 +02:00
Niklas Haas and Niklas Haas b8bfd7800a swscale/graph: only prefer unstable backends with SWS_UNSTABLE
If the user passes `-backends all` but without `-flags unstable`, then the
default/legacy backend will be picked unless it doesn't support a given
pixel format.

This allows gradually opting into the new code to handle more pixel formats
than what the legacy backend currently supports, without disturbing the
predictable output/behavior.
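
The selection rule described above can be summarized by the sketch below. The
enum and function names are invented for this example and both backends are
assumed to be enabled (as with `-backends all`); the real logic lives in the
graph code and may consider more inputs.

```c
#include <assert.h>

/* Hypothetical backend identifiers for this example only. */
enum Backend { BACKEND_NONE, BACKEND_LEGACY, BACKEND_OPS };

/* legacy_ok / ops_ok: whether each backend supports the pixel formats.
 * unstable: whether the user passed the unstable flag. */
static enum Backend pick_backend(int legacy_ok, int ops_ok, int unstable)
{
    if (unstable && ops_ok)
        return BACKEND_OPS;    /* explicit opt-in: prefer the new code */
    if (legacy_ok)
        return BACKEND_LEGACY; /* default: predictable legacy behavior */
    return ops_ok ? BACKEND_OPS : BACKEND_NONE; /* fallback for new formats */
}

static void check_backend_examples(void)
{
    assert(pick_backend(1, 1, 0) == BACKEND_LEGACY);
    assert(pick_backend(1, 1, 1) == BACKEND_OPS);
    assert(pick_backend(0, 1, 0) == BACKEND_OPS);
}
```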

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas Haas and Niklas Haas 57541f5f41 swscale/graph: move legacy fallback out of add_convert_pass()
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas Haas and Niklas Haas dfeb4fdbc7 swscale/graph: add metadata about backends in use
Not currently publicly visible, but useful inside the test framework
nonetheless.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas Haas and Niklas Haas 6df223ce02 swscale/format: generalize ff_test_fmt() to take SwsBackend
This allows us to test support in either the legacy code, or the ops-based
code, or both.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas HaasandNiklas Haas 945151851e swscale/tests/swscale: add -backends option
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas HaasandNiklas Haas 972c0cf91f swscale: add new SwsContext.backends option
This allows constraining the set of available backends. This serves as a
better replacement for the "unstable" flag, which is a bit ambiguous. Allows
users to, for example, opt into the memcpy or x86 backend, while excluding
e.g. the upcoming JIT backends.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas HaasandNiklas Haas dc902654de swscale: add missing validation for newly added enums
Gives slightly better error messages for invalid values.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-03 21:39:55 +00:00
Niklas Haas 8a6027a54f swscale/x86/ops_int: fix write_bits over-write
This writes 4 bytes but in SSE4 mode only produces 2 bytes per vector. We
can avoid over-writing by using the appropriately sized register.

Reproducible by:
  make libswscale/tests/swscale
  libswscale/tests/swscale -dst monob -unscaled 1 -flags unstable -align_src 1 -align_dst 1

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 15:37:54 +02:00
Niklas Haas 8f38703323 swscale/ops_dispatch: calculate correct slice line count for tail copy
These loops were both assuming that `h` lines need to be copied; but this
varies. First of all, for plane subsampling; but more importantly, when
vertically scaling, the input line count may be substantially lower than the
actual line count.

This fixes an out-of-bounds read/write when vertically upscaling with a tail
buffer.
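
As a rough illustration of the subsampling half of this, the number of lines in a plane is the frame height shifted down by the plane's vertical subsampling, rounded up. The helper name `plane_lines` and its parameters are assumptions made for this sketch, not the code in ops_dispatch.c, and the sketch does not model the vertical scaling case:

```c
#include <assert.h>

/* Hypothetical helper: number of lines in a plane whose vertical
 * subsampling is given as a log2 shift (0 = full height, 1 = half). */
static int plane_lines(int h, int sub_y)
{
    assert(h >= 0 && sub_y >= 0 && sub_y < 16);
    /* Round up, so an odd height still covers its last chroma line. */
    return (h + (1 << sub_y) - 1) >> sub_y;
}
```

With a 63-line frame, a full-height plane has 63 lines while a vertically halved plane has 32, so a loop that copies `h` lines for every plane would run past the end of the smaller one.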

Verifiable via e.g.:
  make libswscale/tests/swscale
  valgrind -- libswscale/tests/swscale -s 63x63 -src yuv444p -dst rgb24 \
              -flags unstable -align_src 1 -align_dst 1

(As well as the SSIM scores, which drop from ~e-5 to ~e-3 without this fix)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 15:36:42 +02:00
Niklas Haas a00db63da7 swscale/tests/swscale: add option to force specific buffer alignment
Useful to make sure the memcpy_in/out paths work as expected.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 15:35:00 +02:00
Niklas Haas bb5c461a47 avfilter/vf_libplacebo: setup pl_vulkan_queue.flags on import params
libplacebo versions before v365 passed .flags = 0 when retrieving the queues
from imported Vulkan devices, so we have to error out in the case of a mismatch
to avoid undefined behavior (Vulkan spec).

See-Also: https://code.videolan.org/videolan/libplacebo/-/merge_requests/856
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 13:32:44 +02:00
Niklas Haas 9b9d29e09a avfilter/vf_libplacebo: don't unnecessarily set fields to 0 (cosmetic)
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 13:32:44 +02:00
Niklas Haas 9fe5758da5 avutil/hwcontext_vulkan: publicly expose queue device creation flags
These are needed for interop with e.g. libplacebo, which needs to know the
correct flags to call vkGetDeviceQueue2.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-02 13:32:43 +02:00
Niklas Haas aa08cf8112 swscale/options: add missing option value for SWS_STRICT
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 11:31:54 +02:00
Niklas HaasandNiklas Haas 03dfac5630 fftools/ffmpeg_sched: allow throttling decoder outputs
This is a departure from the conventional idea of decoders always outputting
data as fast as possible. Instead, this allows decoders to be throttled in the
same way filter graphs can be.

This comes into play when e.g. a demuxer is feeding into two decoders, but
only one of the two decoders is actually currently needed (e.g. due to
A/V misalignment). In that case, what typically happens is that the unneeded
decoder also decodes all frames, and then piles them up on the "buffersrc"
filter's downstream link (growing indefinitely).

Another issue this solves manifests when e.g. a single demuxer is feeding many
decoders that all try to feed frames to the same filter graph. In this case,
all decoders run as fast as possible, leading to lock contention on the
filter graph input queue; resulting in (again) many frames piling up on the
buffersrc (or downstream filters) for the unneeded inputs that are not actually
the bottleneck, while the input that's actually undersatisfied can end up
starved for CPU time, possibly for long enough to exhaust memory limits. The
normal rate limiting fails to apply in this scenario because all decoders share
a single demuxer, and are hence rate-limited only by the demuxer speed; whereas
the demuxer is not choked because from the PoV of the scheduler, the filter
graph is simply not getting enough frames.

In a more general sense, there's a philosophical argument to be made here.
Since a decoder is typically also a decompressor, it produces more data than
it consumes. So, in a sense, it's acting like a type of producer also - in
the same way that a filter graph can produce more output than input.

Solve all of these issues by allowing decoders to be output-choked, which
gives the scheduler control over when decoders are allowed to output frames.
This does mean we have to add some sort of internal packet queue, because the
decoder thread may need to continue *accepting* upstream packets from the
demuxer (or else we risk stalling the demuxer), but defer the actual decoding
by placing them inside an internal "overflow" queue.

This effectively simulates a sort of "filter graph"-type semantics but
for the decoder queue.

This overflow logic is fairly self-contained inside `sch_dec_receive`, though
it is quite nontrivial. I have added as much documentation as is hopefully
needed to understand the logic.

Importantly, we cannot simply unlimit the decoder input thread queue because
the demuxer relies on backpressure from the decoder to rate limit itself. (Note
that demuxers may only be active if there is at least one downstream decoder
that is also active, so we always have at least one decoder providing
backpressure)

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 2b72d5243c fftools/ffmpeg_sched: drain incoming frames before blocking filters
When a filter is choked, but upstream threads are trying to write to its input,
this can result in the filter's input queue getting stuck. Normally, the
unchoke_downstream() logic would prevent this from happening, since the
filter would itself get unchoked as a result of upstream decoders receiving
pressure from the demuxer.

However, upcoming changes to this logic will require weakening this upstream
unchoking logic, so preventing the deadlock in a more elegant way helps with
making the code more robust.

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 95391352b5 fftools/thread_queue: add THREAD_QUEUE_FLAG_NO_BLOCK
Exactly what it says on the tin. There is some ambiguity as to whether this
should also prevent reading from a *choked*, as opposed to an empty, queue, but
I think it makes sense to consider them equivalent, as I struggle to think
of a use case where it would be beneficial to allow draining a queue that
was explicitly choked by the upstream (to e.g. prevent further reads).
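
A minimal sketch of the semantics described above, assuming a toy fixed-size integer queue. `Queue`, `queue_send`, `queue_receive` and `QUEUE_FLAG_NO_BLOCK` are hypothetical names invented for this illustration and are not the fftools/thread_queue API; the point is only that a choked queue and an empty queue are treated the same way in non-blocking mode:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_FLAG_NO_BLOCK 1u /* hypothetical flag for this sketch */
#define QUEUE_SIZE 8

typedef struct Queue {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  items[QUEUE_SIZE];
    int  head, count;
    bool choked;
} Queue;

/* Append one item; returns -ENOSPC if the toy queue is full. */
static int queue_send(Queue *q, int item)
{
    int ret = 0;
    assert(q);
    pthread_mutex_lock(&q->lock);
    if (q->count == QUEUE_SIZE) {
        ret = -ENOSPC;
    } else {
        q->items[(q->head + q->count) % QUEUE_SIZE] = item;
        q->count++;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Pop one item. With QUEUE_FLAG_NO_BLOCK, a choked queue is treated
 * exactly like an empty one: return -EAGAIN instead of waiting. */
static int queue_receive(Queue *q, int *item, unsigned flags)
{
    assert(q && item);
    pthread_mutex_lock(&q->lock);
    while (q->choked || q->count == 0) {
        if (flags & QUEUE_FLAG_NO_BLOCK) {
            pthread_mutex_unlock(&q->lock);
            return -EAGAIN;
        }
        pthread_cond_wait(&q->cond, &q->lock);
    }
    *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return 0;
}
```

In this sketch, unchoking is expected to signal `cond` so that blocked readers re-check the condition; that part is omitted for brevity.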

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 321b0e36a3 fftools/thread_queue: add flags parameter to tq_receive()
I want to use this to allow a non-blocking use of this function.

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 6a563dab71 fftools/ffmpeg_sched: allow choosing nodes to unchoke
This level of granularity will help for the upcoming patch.

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 04888287b3 fftools/ffmpeg_sched: fix sch_stop() and schedule_update_locked() race
schedule_update_locked() is supposed to be a no-op when `sch->terminate`
is 1. However, there is a TOCTOU error here, where a different thread may
currently be executing schedule_update_locked(), having successfully passed
the sch->terminate check but without actually updating the choke status.

This does not matter for the current code, but will matter with the following
commit, where it creates the theoretical possibility of a race where sch_stop()
is trying to choke the demuxers (and unchoke the decoders) while
schedule_update_locked() is simultaneously trying to choke the decoders,
leading to a deadlock if the last decoder is left choked and unable to
propagate EOF downstream.

The cleanest solution is to just take the scheduler lock while updating the
choke status here. This ensures that any other schedule_update_locked() calls
will have completed.
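
The locking pattern can be sketched roughly as follows. The `Sched` struct and the function names are simplified stand-ins invented for this illustration (a single `dec_choked` flag instead of per-node state), not the real scheduler code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified scheduler state. */
typedef struct Sched {
    pthread_mutex_t lock;
    bool terminate;
    bool dec_choked;
} Sched;

/* Caller must hold s->lock. No-op once termination was requested. */
static void sched_update_locked(Sched *s, bool want_choke)
{
    assert(s);
    if (s->terminate)
        return;
    s->dec_choked = want_choke;
}

static void sched_update(Sched *s, bool want_choke)
{
    pthread_mutex_lock(&s->lock);
    sched_update_locked(s, want_choke);
    pthread_mutex_unlock(&s->lock);
}

/* Setting the flag and fixing up the choke status under the same lock
 * means no concurrent update can pass the terminate check and then
 * overwrite the final choke status afterwards. */
static void sched_stop(Sched *s)
{
    assert(s);
    pthread_mutex_lock(&s->lock);
    s->terminate  = true;
    s->dec_choked = false; /* let decoders drain and propagate EOF */
    pthread_mutex_unlock(&s->lock);
}
```

Any update that started before `sched_stop()` finishes before the stop takes the lock, and any update that starts afterwards sees `terminate` set and does nothing.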

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas 0d123a3c23 fftools/ffmpeg_sched: use macros for schedule_update_locked() loops
Instead of awkwardly looping over the type, just split this up into
multiple loops. The reduction in complexity seems worth the loss in conciseness
to me, and more importantly, this allows us to easily add more waiter types.

Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas d94c293e62 swscale/ops_dispatch: prevent float over-read when horizontal filtering
The code made the fundamental assumption that over-read into the padding
bytes is okay to do; because the most that can happen is that those pixel
values end up corrupted, which doesn't affect any adjacent pixels.

However, this is not true for SWS_OP_FILTER_H, because this operation
fundamentally mixes together horizontal pixels. Normally, this was fine,
because the filter weights for those pixels are set to 0, and 0 * x = 0.

However, that is not true for floating point inputs, which can contain
Infinity; and 0 * Infinity = NaN, thus corrupting the entire pixel.

Solve it by specifically preventing over-read when it would be unsafe.
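
The arithmetic can be reproduced with a small sketch. Both helpers below are hypothetical and written only for this illustration; in particular, clamping the tap count to a `valid` sample count is one assumed way of avoiding the over-read, not necessarily what the fix in ops_dispatch.c does:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: weighted sum over `taps` samples, reading all of
 * them, including any that lie in the padding. */
static float filter_sum(const float *src, const float *weights, size_t taps)
{
    float sum = 0.0f;
    assert(src && weights);
    for (size_t i = 0; i < taps; i++)
        sum += src[i] * weights[i]; /* 0 * Infinity = NaN poisons the sum */
    return sum;
}

/* Same sum, but never reads past the `valid` samples holding real data. */
static float filter_sum_valid(const float *src, const float *weights,
                              size_t taps, size_t valid)
{
    float sum = 0.0f;
    assert(src && weights);
    if (taps > valid)
        taps = valid;
    for (size_t i = 0; i < taps; i++)
        sum += src[i] * weights[i];
    return sum;
}
```

For samples `{1, 2, Infinity}` with weights `{0.5, 0.5, 0}`, the first helper returns NaN even though the third weight is zero, while the second helper with `valid = 2` returns 1.5 (assuming IEEE 754 semantics, i.e. no `-ffast-math`).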

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-20 21:45:28 +00:00
Niklas HaasandNiklas Haas 6bc0f9517c swscale/ops_dispatch: rename filter_size to filter_size_h
Since this is not set for vertical filters.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-20 21:45:28 +00:00
Niklas HaasandNiklas Haas 0c1a1ee12e swscale/ops_optimizer: don't push scale past truncating conversions
In an op list like:

  [ u8 +XXX] SWS_OP_READ         : 1 elem(s) planar >> 3
  [ u8 .XXX] SWS_OP_FILTER_V     : 256 -> 320 bilinear (2 taps)
  [f32 .XXX] SWS_OP_SCALE        : * 65535
  [f32 +XXX] SWS_OP_CONVERT      : f32 -> u16
  [u16 zXXX] SWS_OP_SWAP_BYTES
  [u16 zzzX] SWS_OP_SWIZZLE      : 0003
  [u16 zzz+] SWS_OP_CLEAR        : {_ _ _ 65535}
  [u16 XXXX] SWS_OP_WRITE        : 4 elem(s) packed >> 0

The current version of the code would happily push the SWS_OP_SCALE past
the truncating conversion, leading to degenerate loss of information. (In
this case, the result was quite extreme)

Affects quality across a wide range of formats, e.g.:

 rgb24 16x16 -> rgb48be 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 ...X] SWS_OP_FILTER_V     : 16 -> 32 bilinear (2 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 ...X] SWS_OP_SCALE        : * 257
+    min: {0 0 0 _}, max: {65535 65535 65535 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u16
-    min: {0 0 0 _}, max: {255 255 255 _}
-  [u16 +++X] SWS_OP_SCALE        : * 257
     min: {0 0 0 _}, max: {65535 65535 65535 _}
   [u16 zzzX] SWS_OP_SWAP_BYTES
     min: {0 0 0 _}, max: {65535 65535 65535 _}
   [u16 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
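
The loss of information is easy to reproduce in isolation. The helpers below are hypothetical and only model a truncating f32 -> u16 conversion combined with a scale in either order; the sample values (127.5 as a bilinear midpoint, 0.5 for a filtered 1-bit input) are assumptions chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Truncating float -> u16 conversion; input assumed within [0, 65535]. */
static uint16_t to_u16(float x)
{
    assert(x >= 0.0f && x <= 65535.0f);
    return (uint16_t) x;
}

/* Correct order: scale in float, then truncate once at the end. */
static uint16_t scale_then_convert(float x, float scale)
{
    return to_u16(x * scale);
}

/* Order produced by pushing the scale past the conversion: the
 * fractional part is discarded first and the error is then scaled up. */
static uint16_t convert_then_scale(float x, unsigned scale)
{
    return (uint16_t) (to_u16(x) * scale);
}
```

For 127.5 scaled by 257, the first order gives 32767 and the second gives 32639. For 0.5 scaled by 65535, the first order gives 32767 and the second gives 0, which matches the "quite extreme" case described for the first op list.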

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas HaasandNiklas Haas 812b5654ae swscale/tests/sws_ops: use SWS_SCALE_BILINEAR for printing ops lists
This actually changes the behavior vs SWS_SCALE_POINT, because point scaling
is bit-exact and thus implies a different set of optimizations.

Ideally, we would still try and somehow merge this with tests/swscale.c to
allow testing a different set of scalers; but I still don't have a good idea
for how to accomplish that here.

As it stands, this results in additional dithering steps in almost all
filters involving scaling, e.g.:

 rgb24 16x16 -> rgb24 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
-  [ u8 +++X] SWS_OP_FILTER_V     : 16 -> 32 point (1 taps)
+  [ u8 ...X] SWS_OP_FILTER_V     : 16 -> 32 bilinear (2 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
+    min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _}
+  [f32 ...X] SWS_OP_MIN          : x <= {255 255 255 _}
+    min: {1/512 1/512 1/512 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas HaasandNiklas Haas 2dfe055ddd swscale/tests/sws_ops: print split sub-passes for lists with filters
This allows us to inspect exactly the logic that is going on inside the CPU
backends (which don't support bare filter passes).

 rgb24 16x16 -> rgb24 16x32:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 +++X] SWS_OP_FILTER_V     : 16 -> 32 point (1 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Retrying with split passes:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Sub-pass #1:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 1 tap point filter (V)
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
 rgb24 16x16 -> rgb24 32x16:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 +++X] SWS_OP_FILTER_H     : 16 -> 32 point (1 taps)
     min: {0 0 0 _}, max: {255 255 255 _}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
     min: {0 0 0 _}, max: {255 255 255 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
     (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Retrying with split passes:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
+ Sub-pass #1:
+  [ u8 +++X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 1 tap point filter (H)
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas HaasandNiklas Haas 369a301669 swscale/tests/sws_ops: use a dummy ops backend for printing
This ensures that the ops printing path goes through the same code as the
actual ops dispatch backend, including all sub-passes etc.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-17 10:41:34 +00:00
Niklas Haas 76dc83d9be swscale/ops_dispatch: make ff_sws_ops_compile() output optional
Allows the uops macro generation code to not actually compile any passes.
More generally, this could be used to e.g. test if an op list is supported by
a backend without actually creating the passes.

The `bool first` change is needed because the `input == prev` check no longer
works if we don't actually compile any passes.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 420b1bf368 swscale/ops_dispatch: allow forcing specific ops backend
This will be used eventually when I rewrite checkasm/sw_ops to re-use the
code in ops_dispatch.c instead of hand-rolling the execution layer.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 9021448857 swscale/ops_dispatch: merge ff_sws_ops_compile_backend() and compile()
Passing backend == NULL now loops over the backends as before.
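
A rough sketch of this calling convention, using a toy `Backend` type, two dummy backends and a `compile_ops()` function that are all invented for this illustration (the real signatures in ops_dispatch.c differ). Treating only `-ENOTSUP` as "try the next backend" is likewise an assumption of the sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend descriptor; an "op list" is just an int here. */
typedef struct Backend {
    const char *name;
    int (*compile)(int op);
} Backend;

static int compile_none(int op) { (void) op; return -ENOTSUP; }
static int compile_even(int op) { return (op % 2 == 0) ? 0 : -ENOTSUP; }

static const Backend backends[] = {
    { "none", compile_none },
    { "even", compile_even },
};

/* backend != NULL: use exactly that backend.
 * backend == NULL: try every known backend in order. */
static int compile_ops(const Backend *backend, int op, const Backend **used)
{
    assert(used);
    if (backend) {
        int ret = backend->compile(op);
        if (ret == 0)
            *used = backend;
        return ret;
    }

    for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
        int ret = backends[i].compile(op);
        if (ret == 0) {
            *used = &backends[i];
            return 0;
        }
        if (ret != -ENOTSUP)
            return ret; /* real error: report it instead of moving on */
    }
    return -ENOTSUP;
}
```

Folding both cases into one entry point means callers that want to force a backend and callers that want the default search share the same code path.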

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas ad17144ce6 swscale/ops_dispatch: move op list print to ff_sws_ops_compile_backend()
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 90669ab52e swscale/ops: move ff_sws_compile_pass() and friends to ops_dispatch.h
This function actually lives in ops_dispatch.c, and doesn't really make
sense in ops.h anymore. We should also move some stuff out of ops_internal.h,
which doesn't depend on any external ops stuff, here.

This allows the backend/compilation-related stuff to co-exist more nicely.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 1d841635a4 swscale/ops: also include scaling ops in ff_sws_enum_op_lists()
Using the configured scaler from the SwsContext implicitly. This does affect
the output of libswscale/tests/sws_ops.c, which now prints about 4x as much
data (taking roughly 4x as long, but still within a second on my machine).

We can make this process a lot faster by forcing SWS_SCALE_POINT as the
scaler, which skips calculating any actual filter weights in favor of
generating a trivial 1-tap filter.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas eec9f712f5 swscale/ops: re-use ff_sws_op_list_generate() in ff_sws_enum_op_lists()
The only difference here is an extra ff_sws_add_filters() call, which is
a no-op because src w/h = dst w/h = 16.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas cac183f46f swscale/ops: don't silently suppress non-ENOTSUP errors
Matches the behavior to the comment.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas dacbf080f3 swscale/ops_chain: simplify ff_sws_op_compile_tables() signature
This no longer accesses prev/next as a result of the `unused` removal, so
the signature can be simplified to just take the op directly.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 064600585e swscale/ops_chain: remove flexible from SWS_OP_MIN/MAX entries
We have other op types that skip checking the data even in non-flexible mode,
so there is a precedent for just leaving out `flexible` for such kernels.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 98c1dbafbe swscale/ops_memcpy: don't depend on ops_backend.h
This is private to the C template based backend.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 62aad4513c swscale/graph: move format conversion logic to formats.c
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 0611abc1bb swscale/graph: move code for adding filters to format.h
Mirroring the precedent established by the other SwsOp-generating functions.
This allows us to re-use it for the uops macro generator.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 9fe0ff3d56 swscale/graph: make _reinit() only call _init(), not _create()
This allows us to preserve the same memory allocation when
reinitializing a graph, which is a nice bonus.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00
Niklas Haas 56305c460c swscale/graph: add ff_sws_graph_alloc() and _init()
As an alternative to the current _create() API.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-05-15 18:53:05 +02:00