ffmpeg

x/ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-11 08:13:06 +00:00

Author	SHA1	Message	Date
Niklas Haas	976e18fdef	swscale/x86: use correct HOSTCC_E flag instead of CC_E HOSTCC and CC might be completely different compilers. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-10 15:04:50 +00:00
Niklas Haas	7b59a86633	swscale/uops_tmpl: move attributes before `static` keyword This fails to compile with C23 standard attributes otherwise. Technically only av_unused requires this, but move the other attributes as well for consistency / future proofing. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-10 16:27:58 +02:00
Niklas Haas	941a35149b	swscale/x86/ops_int: switch to SWS_UOP_MOVE Instead of SWS_UOP_PERMUTE/SWS_UOP_COPY. No real measurable difference in performance (it just eliminates a few practically free register renames), but definitely simpler. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas and Ramiro Polla	36004d681f	swscale/uops: add SWS_UOP_MOVE for optimal register-register swizzles This decomposes a swizzle mask into a series of optimal register-register moves, using at most two temporary scratch registers. This is a better match for ASM-style backends than the existing PERMUTE/COPY uops that are designed for the needs of the C backend (or other backends which either apply the swizzle mask directly or permute pointers). I originally had logic equivalent to this written in NASM macros, but it was just such a complicated mess that I think it's better to rewrite it in C and have the resulting metadata be an explicit part of the uop definition. This commit only adds the uop, I'll update the x86 implementation in the next step. Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	228ef8d97b	swscale/ops: make compile() take const SwsOpList * The old x86 backend was the only backend that actually mutated the ops list. With this gone, we can constify this parameter. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	a7c6a5f74e	swscale/ops_chain: remove dead code This is no longer needed now that both C and x86 are ported to uops. The other ff_sws_setup_*() functions are still used by the aarch64 backend. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	43a8e2da01	swscale/x86/ops: rewrite based on uops_macros.h This is a ground-up refactor of the existing x86 ops code, using the new uops macros to auto-generate every single kernel instance without guesswork. While I was at it, I also cleaned up the file a bit and made sure we have only a single, consistent way of writing/defining the kernels. This also gets rid of some of the old boilerplate like decl_pattern. Most kernels are trivial ports, but a few deserve attention or note: - SWS_UOP_LINEAR is now generated more efficiently, thanks to the distinction between 0/1/arbitrary components. I also rewrote the code to keep track of whether the output was initialized yet or not, which lets us skip the initial `xorps` and `addps` for the first component. - SWS_UOP_PERMUTE is generated automatically by using some NASM logic to detect permutation cycles and emit the minimal sequence of `mova` instructions. SWS_UOP_COPY, on the other hand, is implemented naively. I originally had a more complex implementation that could handle both, but I decided it really isn't worth the complication just to save 2-3 cycles. - SWS_UOP_SCALE now has a native 8-bit implementation, which is faster than falling back to C code. - SWS_UOP_SWAP_BYTES is no longer compiled as a type-agnostic pshufb, instead we hard-code the shuffle mask - SWS_UOP_DITHER is now much simpler and avoids branching etc. entirely Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	257f1438a5	swscale/x86/ops: simplify mmsize determination There is no reason for this to be a separate function; it just obscures the error path. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	2a09d0346e	swscale/x86/ops_include: clarify/fix some comments Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	6deae052a2	swscale/x86/uops: generate NASM macros using uops_macros.h Rather than hard-coding a separate set of NASM macros, or generating them with a separate function, we can just leverage the C preprocessor to generate a NASM source file from the existing ops macros. This is maybe a bit unorthodox, but it avoids unnecessary overhead from re-generating the macros twice, avoids manual updating of the NASM macros, and generally does not come with any real downside except being a bit ugly. The main source of ugliness is the fact that the C preprocessor expands everything into a single line, whereas NASM expects separate statements to be on separate lines. Very fortunately, we can work around this by writing another NASM macro to take its arguments and dump them onto multiple lines. It may seem premature, but I went ahead and defined all the macros, since it was easy enough to do. I added the %include in this commit to trigger build errors that occur only as a result of introducing this file in the same commit that introduces it. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	6057759ffc	swscale/uops: parametrize filter op result type The ops.h infrastructure currently hard-codes this as SWS_PIXEL_F32, but I want to at least properly parametrize this in case we ever decide to revisit this decision in the future. In particular, it may become relevant for trivial kernels or kernels whose intermediates are bounded, exact integers (which could possibly be output directly as e.g. U16 or U32). The FATE change is just because the filter op names gained a suffix. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	4a8a1f5b8b	swscale/uops: add SWS_UOP_READ_PLANAR_FV_FMA Analog of SWS_UOP_READ_PLANAR_FV for FMA-enabled backends. The logic for determining when we can safely use FMA is maybe a bit obtuse, given that a `return type == SWS_PIXEL_U8` would have just done the trick as well, but better to be safe than sorry, if we ever decide to tune this constant in the future. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	dbe961b4cd	swscale/uops: add SWS_UOP_LINEAR_FMA and SWS_UOP_FLAG_FMA This is like SWS_UOP_LINEAR but parametrized by which matrix entries can use FMA instead of bitexact IEEE mul/add instructions. I decided to make these a separate uop to avoid bogging down the reference backend with arch-specific details like FMA. However, I think FMA ops are quite common/universal so I pre-emptively split it into its own separate flag rather than defining something like SWS_UOP_FLAG_X86. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	4e18068165	swscale/uops: also generate macros under SWS_BITEXACT And SWS_BITEXACT|SWS_ACCURATE_RND, for completeness. This roughly doubles the runtime of the uops macros generation. Let's hope it doesn't explode further. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	157f586e5c	swscale/uops: thread SwsContext through ff_sws_ops_translate() Needed to access ctx->flags, in particular SWS_BITEXACT. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	f97ba8cbe7	swscale/uops: loop over all flags when generating macros This list is currently empty but will be expanded by the following commit. I briefly tested whether it would be worth avoiding the free/realloc on the uops array, but found the performance difference to be negligible. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	02a168a576	swscale/uops: keep track of input range during op translation Needed for the FMA decision logic. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	3f9219d605	swscale/uops: add SwsUOpFlags to ff_sws_ops_translate() These will be used to e.g. enable extra uops during translation. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	b7a80a9f0d	swscale/ops_backend: delete ops-based C backend And make uops_backend.c the new reference. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	100ce4ac41	tests/checkasm/sw_ops: rewrite using uops_macros.h This ensures 100% coverage of all uop primitives by generating the set of tests exactly from the list of seen primitives, using the uops macros. There are some annoying quirks still because of the fact that we have to essentially "untranslate" the UOPs back to SwsOps that result back in the intended uop after the translation, but overall it's not too bad and still much better than the status quo of hand-rolling the list of test cases. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	636b9eda75	swscale/ops_tmpl_float: allow arbitrary values for 1x1 dither Removes the 1x1 dither fast path, mirroring the previous commit. This is not really needed nor useful but it will make the transition to the uops architecture slightly easier, as 1x1 dither gets reinterpreted as SWS_UOP_ADD there. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	ca8774b9d6	swscale/x86: remove broken and unnecessary 1x1 dither fast path This is broken because it fails to check dither.y_offset[] to determine if dithering for a channel is requested or not. This is unnecessary because the generic dither code already jumps over unused components, which is cheap enough not to worry about this special case for now. This code will, in any case, soon be replaced by a uops_macros.h-derived approach. This commit is only needed as a stopgap to make checkasm continue working after the sws_uops refactor. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	19652a83a2	swscale/x86/ops_include: use %assign instead of %xdefine For numeric 1/0 constants. As an aside, fix the broken comment. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	b328e152a4	swscale/x86: move entry points to ops_common.asm As well as the packed shuffle solver. These don't really interact with the rest of the code in ops_int.asm, which is, by name at least, intended for integer op kernels. More importantly, these functions will be shared with the uops rewrite. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	c5c9c6d996	swscale/x86: rename ops_common.asm to ops_include.asm Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	8118e964bb	swscale/uops: auto-generate reference C backend from uops_macros.h Instead of choosing by hand which kernels to implement, this rewrite focuses on leveraging the power of uops_macros.h to auto-generate all needed kernels. This not only simplifies maintenance, but also improves performance. I have decided to develop the replacement backend as a separate file, under a separate prefix, for the explicit purpose of being able to verify the correctness of the rewrite using the current backend as a checkasm reference. The code for the kernels themselves has been largely copied from the old C backend, modified slightly to conform to the uop template style. This does result in some code duplication, but a following commit will clean it up. I nonetheless want to preserve this commit for bisection purposes, to ensure we have one commit that contains both backends side-by-side. Overall speedup=1.182x faster, min=0.197x max=3.450x The big slowdowns are flukes caused by tiny deviations in the runtime of a noop memcpy conversion. As a nice side benefit, the compiled binary is now also ~10% smaller, and the code ~50% smaller. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	1e268fbedf	swscale/ops_chain: add uop-based helpers to assemble SwsOpChain This will eventually replace the existing op_match() and ff_sws_op_compile_tables(), but I've decided to introduce it separately first so that I can incrementally update the backends to use the new API, at the cost of some temporary code duplication. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	adaf142647	swscale/uops: generate uop helper macros This follows the same approach as is used currently by ops_entries_aarch64, except I decided to have the generation logic live directly in uops.c to allow re-using internal helpers and move it closer to the other helpers that depend on the exact set of uops and their fields. Unlike libswscale/tests/sws_ops.c, we make an effort to actually test all relevant flag combinations, since these can affect the generated op lists. I will use these macros to auto-generate both the C template-based kernels, as well as the entire x86 backend, in the near future, hence their excessive flexibility. Re-use the libswscale/tests/sws_ops.c that we already compile. We could put it in its own file but this is just as convenient, and it's easily moved anyways. Having it be a FATE test ensures that it is always up-to-date. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	8ad7cc6ccd	swscale/tests/sws_ops: also print/test micro-op list Tests for changes or regressions in the generated micro-ops. This will be instrumental in my development of the micro-ops optimizer, and my plans to phase out some of the macro-op optimization passes in favor of doing those optimizations on the uop level instead. rgb24 16x16 -> rgb24 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 ...X] SWS_OP_FILTER_V : 16 -> 32 bilinear (2 taps) min: {0 0 0 _}, max: {255 255 255 _} [f32 ...X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 -1} min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _} [f32 ...X] SWS_OP_MIN : x <= {255 255 255 _} min: {1/512 1/512 1/512 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) Retrying with split passes: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) planar >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + translated micro-ops: + u8_read_packed_xyz + u8_write_planar_xyz Sub-pass #1: [ u8 ...X] SWS_OP_READ : 3 elem(s) planar >> 0 + 2 tap bilinear filter (V) min: {0 0 0 _}, max: {255 255 255 _} [f32 ...X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 -1} min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _} [f32 ...X] SWS_OP_MIN : x <= {255 255 255 _} min: {1/512 1/512 1/512 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + translated micro-ops: + u8_read_planar_fv_xyz + f32_dither_xyz_0_3_2_16x16 + f32_min_xyz + f32_to_u8_xyz + u8_write_packed_xyz ... Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:11:01 +02:00
Niklas Haas	3a7331d311	swscale/ops: remove unused function ff_sws_enum_ops() Users can trivially recreate this logic anyways. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:10:57 +02:00
Niklas Haas	6b75166758	swscale/tests/sws_ops: minor cleanup / consistency Clean up after the previous revert. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:10:54 +02:00
Niklas Haas	dcfe3d3b90	Revert "swscale/tests/sws_ops: add option for summarizing all operation patterns" This reverts commit `f76aa4e408`. This is no longer needed once we switch to uops_macros.h, which will do the same thing except better.	2026-06-09 01:10:49 +02:00
Niklas Haas	aaf6a52fe6	swscale/uops: add uop translation logic This will replace the fuzzy matching logic in op_match() that is used by the C and x86 implementations, as well as the translation to AARCH64_OP_* that is used by the NEON asmgen backend. Down the line, this function will also take a set of flags to enable backend-specific kernels like FMA variants, but I also decided to keep it initially simple to ease the transition. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:10:39 +02:00
Niklas Haas	dc88bcdf8c	swscale/uops: add uop definitions Taken from AARCH64_OP_*, but generalized/simplified a bit and updated to add missing op types, especially for special cases that already have dedicated implementations on x86. This initial definition is kept intentionally simple and close to SwsOp, to make it easier to port the existing ops backends to the new infrastructure. However, in the future, this will be refactored dramatically - distinctions like convert vs expand will cease to exist on the SwsOp level, and will instead be introduced by separate optimization passes on the uops level. SWS_UOP_LINEAR in particular will most likely be broken up into multiple uops. I also took this opportunity to redefine the mask in a more useful way. I decided to split up SWS_OP_CONVERT as well, because it was making x86 codegen unnecessarily difficult due to the strong interaction between exact pixel sizes. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:09:34 +02:00
Niklas Haas	ae6f3ce02c	swscale/uops: split off from ops.h Forming what will be the start of a larger helper file for backend-internal translation of higher-level ops into lower level kernels. This header file needs to be includable from independent source files, as it will be used to provide definitions for build-time code generation (e.g. ops_asmgen.c), so it must be self-contained. Pulling in all of ops.h from uops.h would be too large a dependency, since ops.h pulls in graph.h, refstruct, bprint, etc. It's easier to start from a fresh file that is documented as being usable at compile time. For now, just declare the common types that will be needed by the uops layer. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-08 18:29:02 +02:00
Niklas Haas	48a42b5f21	configure: add -P to $CC_E flag This suppresses the addition of #line directives in the preprocessed output, which is what we want when we're invoking the hostcc just to preprocess some files. (Currently, this variable is only used for configure-internal checks anyways, but I want to use it to preprocess a NASM file) On MSVC/Intel, /EP is the equivalent syntax, though we use -EP instead for consistency. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-08 18:24:45 +02:00
Niklas Haas	3137d337fe	tests/checkasm/sw_ops: use new checkasm_set_func_variant() The current approach of re-testing the C reference for every backend separately leads to both confusing output (e.g. having an extra redundant `memcpy_c` line for every op, even those not implemented by the memcpy backend), as well as a lot of unnecessary wasted time re-testing and re-benching the same C variant for every backend. This new API function lets us test the C function only a single time, while simultaneously having all of the other backends implicitly compare themselves against the C reference. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-07 09:24:23 +00:00
Niklas Haas and Ramiro Polla	7fc7aaf265	swscale/graph: prefer ops backend for floating point formats These have horrible support in legacy swscale; in particular, they break the pixel range (limited vs full) when converting to yuva444p, resulting in SSIM errors like: uyva 96x96 -> grayf32le 96x96, SSIM={Y=0.997654 U=1.000000 V=1.000000 A=1.000000} loss=1.876414e-03 loss 1.876414e-03 is worse by 1.864254e-03, expected loss 1.215935e-05 (The ops-based backend gets a 100% bit-exact roundtrip here) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-05 22:37:18 +02:00
Niklas Haas and Ramiro Polla	5a6bf8d4f4	swscale/tests/swscale: allow all backends for auxiliary conversions This enables testing all internally supported pixel formats. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-05 22:32:51 +02:00
Niklas Haas and Ramiro Polla	8366d5f8d6	swscale/tests/swscale: refactor format testing logic Uses the internal ff_sws_test_pixfmt_backend() to test for format support on the concrete backend that's in-use for the auxiliary/main conversions, respectively, while taking into account the -backends and -api options. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-05 22:22:27 +02:00
Niklas Haas and Ramiro Polla	517c3d5fc1	swscale/graph: re-check pixel format support in add_legacy_sws_pass() When the user passes multiple backends (e.g. SWS_BACKEND_ALL), the static check in sws_setup_frame() might have succeeded for the ops backend but not the legacy backend, so we need to properly restrict the legacy backend implementation function as well. Otherwise, this may trigger internal errors / AVERROR(EINVAL) inside sws_init_context(). Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-05 21:55:27 +02:00
Niklas Haas	afce637550	avformat/shared: add option to verify cache file contents This will effectively disable the cache but allows the cache layer to verify cached files against the original input file. Useful only for debugging the shared cache protocol itself, as file corruption can already be caught by the CRC check.	2026-06-04 17:48:12 +02:00
Niklas Haas	ca748964fe	avformat/shared: implement 16-bit CRC check Decided to split this off from the previous commit in case we ever want to revert it, since it does double the overhead of the spacemap as well as adding extra overhead to both the read and write path. Bump the cache version to 2 to reflect the changed disk format.	2026-06-04 17:48:12 +02:00
Niklas Haas	56de70a2e6	avformat: add shared concurrent block cache protocol This adds a new protocol shared:URI which is distinct from the existing `cache:` in that it is explicitly designed to be thread-safe and cross-process, enabling multiple ffmpeg processes (or multiple ffmpeg decoders within the same process) to share a single cache file, for e.g. a remote HTTP stream. As such, it uses a radically different internal design. To facilitate zero-knowledge cross-process interoperability, the cache file itself is just a memory-mapped representation of the underlying file data, which has the side benefit that the resulting cache file will contain a working copy of the streamed file (assuming the stream was read to completion). To keep track of which regions are cached and which are not, we use a secondary file that contains a minimal header along with a static bytemap of blocks within the file. This secondary file is also used to store metadata such as the filesize, if known, as well as marking "failed" blocks. Both files can grow dynamically in order to accommodate larger/growing files, and can be atomically updated (through the use of shared space maps). I have extensively checked the space map initialization and update code for race conditions, and I believe the current design to be solid. That said, it is the user's responsibility to some extent to ensure that the same URI is not used for different streams, as we rely on the URI to uniquely identify the cache files. That said, we use a cryptographic hash with sufficient collision resistance to protect against possible abuse. The lack of any implicit default on `-cache_dir` also means that `shared:` can't be enabled via URL injection to possibly access random files on the disk (or intentionally leak content from other streams with similar URIs, even if the cryptographic hash function is broken).	2026-06-04 17:48:12 +02:00
Niklas Haas	cd3f335207	avformat/file: return ENOSYS for filesize query on files with follow=1 If the input is expected to grow, we shouldn't make any assumptions about the file size. This matches e.g. the behavior of streamed protocols like chunked HTTP, which similarly return ENOSYS for streams of unknown size. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 17:48:12 +02:00
Niklas Haas	7cb93fb200	avformat/http: return ENOSYS instead of UINT64_MAX for unknown filesize This matches the behavior of e.g. the pipe: protocol, which returns ENOSYS on account of ffurl_seek() not being implemented. The previous behavior of returning s->filesize directly is almost surely a bug, as s->filesize is UINT64_MAX when never initialized. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 17:48:12 +02:00
Niklas Haas	c27a3b12e3	configure: re-indent after previous change Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:52 +02:00
Niklas Haas	310ff99f62	configure: support building without checkasm Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:52 +02:00
Niklas Haas and Martin Storsjö	3b1d7cd1f7	tests/checkasm: switch to shared libcheckasm implementation The checkasm tool originated in x264. It was later rewritten and modernized for FFmpeg (and relicensed to LGPL). For the dav1d project, it was relicensed again to 2-clause BSD (with permission from the relevant authors). The FFmpeg and dav1d implementations of checkasm have since evolved independently (with some amount of ported code between the two, with relicensing permission where relevant). To synchronize the development, and to make it possible to easily adopt checkasm in other projects, it has been split out into a standalone project/library on its own, developed at https://code.videolan.org/videolan/checkasm/. That version has all the features of checkasm in both FFmpeg and dav1d, and has got a number of extra improvements on top: - More/fixed tests (e.g. properly clobbering high bits of 32-bit registers on most platforms), - Vastly improved overall performance / runtime for benchmarking, due primarily to the ability to scale the runtime of each test to that test's complexity. - Much more robust statistical analysis of benchmarking results; including robust outlier rejection, an estimation of the histogram, and the ability to report the variance / stddev in addition to the (trimmed) mean. - Interactive HTML and JSON output formats in addition to CSV/TSV. - More readable and user-friendly output across the board, especially for failures and data dumps (e.g. also showing errors inside padding bytes). - Better cross-platform support, including dynamic fallback of timer implementations on ARM platforms, a better RISC-V and AArch64 harness, and more. On AArch64, it tests which timer out of pmccntr_el0, linux perf, macos kperf, cntvct_el0 is available, without the user needing to configure things, and falling back on clock_gettime if none of them can be used. This means one automatically gets the best available timer, if userspace access to pmccntr_el0 has been unlocked with a kernel module, or if one has permission to use the perf API, or if the cntvct_el0 is exact enough to be useful. On AArch64 macOS, there is now a test harness that catches clobbered registers and stack clobbering, like on other platforms. - An option for setting affinity, for benchmarking on heterogeneous core systems. (On Linux, this is already easily done through taskset, but on Windows, the checkasm built-in option makes it possible there as well, and portable.) - Printing of the tested CPU core name, where possible. To integrate this external implementation of checkasm into FFmpeg, without having to build libcheckasm as an external library, the upstream sources are added as a git subtree, and integrated into the FFmpeg build system as a foreign source. For the long and storied history of how we arrived at this solution, see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546 The relevant config headers for checkasm are generated by configure, and the sources are built as part of the main ffmpeg build. The upstream sources, while they use meson as primary build system, are structured to make it easy to build as part of a foreign build system. The existing testcases are mostly kept untouched (only three minor changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the majority of the logic from checkasm.c, checkasm.h and the arch specific assembly files are removed, replaced with the external implementation. Co-Authored-By: Martin Storsjö <martin@martin.st> Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:52 +02:00
Niklas Haas	21ac0b276e	Merge commit 'df966476d760f1bfe4c5f52c463b82be5bf6b9ed' as 'tests/checkasm/ext' To reproduce this commit, run: $ git subtree add --squash --prefix=tests/checkasm/ext \ https://code.ffmpeg.org/FFmpeg/checkasm.git master To update at a later point in time, replace `add` by `pull`	2026-06-04 11:44:40 +02:00
Niklas Haas	66eaaa644a	Squashed 'tests/checkasm/ext/' content from commit 0df02535c7 git-subtree-dir: tests/checkasm/ext git-subtree-split: 0df02535c7435cf3969ca141c9e3ff7b1c1e6c28	2026-06-04 11:44:26 +02:00
Niklas Haas	362e309710	forgejo/codespell: exclude tests/checkasm/ext Pre-emptively exclude the external checkasm sources. Split off from the following merge commit to make the history easier to follow. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:22 +02:00
Niklas Haas	566dd20247	tests/fate/source-check.sh: exclude tests/checkasm/ext Pre-emptively exclude the external checkasm sources. Split off from the following merge commit to make the history easier to follow. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:22 +02:00
Niklas Haas	068173f329	tests/checkasm: factorize out randomize_buffer for doubles Not only is this duplicating code, but it also hard-codes a reference to `checkasm_lfg`, which I want to eliminate in the interest of being able to switch out the checkasm implementation.	2026-06-04 11:44:22 +02:00
Niklas Haas	8df8f8b1bb	swscale/x86/ops: fix typo Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:47 +02:00
Niklas Haas	71b4666ba5	tests/checkasm/sw_ops: re-indent after previous change Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:37 +02:00
Niklas Haas	7af4faf6df	tests/checkasm/sw_ops: skip test data setup if not testing anything The test data size is quite large, so re-setting up unused data is eating up quite a significant amount of CPU time. This commit cuts execution time of sw_ops in half. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:23 +02:00
Niklas Haas	ef182f2289	swscale/tests/sws_ops: avoid confusing double label Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:15 +02:00
Niklas Haas	b8bfd7800a	swscale/graph: only prefer unstable backends with SWS_UNSTABLE If the user passes `-backends all` but without `-flags unstable`, then the default/legacy backend will be picked unless it doesn't support a given pixel format. This allows gradually opting into the new code to handle more pixel formats than what the legacy backend currently supports, without disturbing the predictable output/behavior. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	57541f5f41	swscale/graph: move legacy fallback out of add_convert_pass() Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	dfeb4fdbc7	swscale/graph: add metadata about backends in use Not currently publicly visible, but useful inside the test framework nonetheless. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	6df223ce02	swscale/format: generalize ff_test_fmt() to take SwsBackend This allows us to test support in either the legacy code, or the ops-based code, or both. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	945151851e	swscale/tests/swscale: add -backends option Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	972c0cf91f	swscale: add new SwsContext.backends option This allows constraining the set of available backends. This serves as a better replacement for the "unstable" flag, which is a bit ambiguous. Allows users to, for example, opt into the memcpy or x86 backend, while excluding e.g. the upcoming JIT backends. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	dc902654de	swscale: add missing validation for newly added enums Gives slightly better error messages for invalid values. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 21:39:55 +00:00
Niklas Haas	8a6027a54f	swscale/x86/ops_int: fix write_bits over-write This writes 4 bytes but in SSE4 mode only produces 2 bytes per vector. We can avoid over-writing by using the appropriately sized register. Reproducible by: make libswscale/tests/swscale libswscale/tests/swscale -dst monob -unscaled 1 -flags unstable -align_src 1 -align_dst 1 Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 15:37:54 +02:00
Niklas Haas	8f38703323	swscale/ops_dispatch: calculate correct slice line count for tail copy These loops were both assuming that `h` lines need to be copied; but this varies. First of all, for plane subsampling; but more importantly, when vertically scaling, the input line count may be substantially lower than the actual line count. This fixes an out-of-bounds read/write when vertically upscaling with a tail buffer. Verifiable via e.g.: make libswscale/tests/swscale valgrind -- libswscale/tests/swscale -s 63x63 -src yuv444p -dst rgb24 \ -flags unstable -align_src 1 -align_dst 1 (As well as the SSIM scores, which drop from ~e-5 to ~e-3 without this fix) Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 15:36:42 +02:00
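As a rough illustration of the plane-subsampling half of 8f38703323, the sketch below computes how many lines of a slice belong to each plane. The helper names `ceil_rshift` and `plane_lines` are hypothetical and are not taken from ops_dispatch.c. It covers only the subsampling case; the vertical-scaling case, where the input line count differs from the output line count, is not modeled here.

```c
#include <assert.h>

/* Ceiling right shift: line count of a plane subsampled by 2^shift. */
static int ceil_rshift(int value, int shift)
{
    assert(value >= 0 && shift >= 0 && shift < 30);
    return (value + (1 << shift) - 1) >> shift;
}

/*
 * Hypothetical helper: number of lines to copy for one plane of a slice
 * that is slice_h luma lines tall. Planes 1 and 2 are assumed to be the
 * vertically subsampled chroma planes; all other planes are full height.
 */
static int plane_lines(int slice_h, int plane, int chroma_shift_v)
{
    const int is_chroma = plane == 1 || plane == 2;
    return is_chroma ? ceil_rshift(slice_h, chroma_shift_v) : slice_h;
}
```

For a 63-line slice with 4:2:0 chroma (`chroma_shift_v` of 1), `plane_lines(63, 0, 1)` gives 63 and `plane_lines(63, 1, 1)` gives 32, so a loop that copies `h` lines for every plane would walk past the end of the chroma planes.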
Niklas Haas	a00db63da7	swscale/tests/swscale: add option to force specific buffer alignment Useful to make sure the memcpy_in/out paths work as expected. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 15:35:00 +02:00
Niklas Haas	bb5c461a47	avfilter/vf_libplacebo: setup pl_vulkan_queue.flags on import params libplacebo versions before v365 passed .flags = 0 when retrieving the queues from imported Vulkan devices, so we have to error out in the case of a mismatch to avoid undefined behavior (Vulkan spec). See-Also: https://code.videolan.org/videolan/libplacebo/-/merge_requests/856 Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 13:32:44 +02:00
Niklas Haas	9b9d29e09a	avfilter/vf_libplacebo: don't unnecessarily set fields to 0 (cosmetic) Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 13:32:44 +02:00
Niklas Haas	9fe5758da5	avutil/hwcontext_vulkan: publicly expose queue device creation flags These are needed for interop with e.g. libplacebo, which needs to know the correct flags to call vkGetDeviceQueue2. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-02 13:32:43 +02:00
Niklas Haas	aa08cf8112	swscale/options: add missing option value for SWS_STRICT Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 11:31:54 +02:00
Niklas HaasandNiklas Haas	03dfac5630	fftools/ffmpeg_sched: allow throttling decoder outputs This is a departure from the conventional idea of decoders always outputting data as fast as possible. Instead, this allows decoders to be throttled in the same way filter graphs can be. This comes into play when e.g. a demuxer is feeding into two decoders, but only one of the two decoders is actually currently needed (e.g. due to A/V misalignment). In that case, what typically happens is that the unneeded decoder also decodes all frames, and then piles them up on the "buffersrc" filter's downstream link (growing indefinitely). Another issue this solves manifests when e.g. a single demuxer is feeding many decoders that all try to feed frames to the same filter graph. In this case, all decoders run as fast as possible, leading to lock contention on the filter graph input queue; resulting in (again) many frames piling up on the buffersrc (or downstream filters) for the unneeded inputs that are not actually the bottleneck, while the input that's actually undersatisfied can end up starved for CPU time, possibly for long enough to exhaust memory limits. The normal rate limiting fails to apply in this scenario because all decoders share a single demuxer, and are hence rate-limited only by the demuxer speed; whereas the demuxer is not choked because from the PoV of the scheduler, the filter graph is simply not getting enough frames. In a more general sense, there's a philosophical argument to be made here. Since a decoder is typically also a decompressor, it produces more data than it consumes. So, in a sense, it's acting like a type of producer also - in the same way that a filter graph can produce more outputs than inputs. Solve all of these issues by allowing decoders to be output-choked, which gives the scheduler control over when decoders are allowed to output frames. 
This does mean we have to add some sort of internal packet queue, because the decoder thread may need to continue accepting upstream packets from the demuxer (or else we risk stalling the demuxer), but defer the actual decoding by placing them inside an internal "overflow" queue. This effectively simulates a sort of "filter graph"-type semantics but for the decoder queue. This overflow logic is fairly self-contained inside `sch_dec_receive`, though it is quite nontrivial. I have added as much documentation as is hopefully needed to understand the logic. Importantly, we cannot simply unlimit the decoder input thread queue because the demuxer relies on backpressure from the decoder to rate limit itself. (Note that demuxers may only be active if there is at least one downstream decoder that is also active, so we always have at least one decoder providing backpressure) Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	2b72d5243c	fftools/ffmpeg_sched: drain incoming frames before blocking filters When a filter is choked, but upstream threads are trying to write to its input, this can result in the filter's input queue getting stuck. Normally, the unchoke_downstream() logic would prevent this from happening, since the filter would itself get unchoked as a result of upstream decoders receiving pressure from the demuxer. However, upcoming changes to this logic will require weakening this upstream unchoking logic, so preventing the deadlock in a more elegant way helps with making the code more robust. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	95391352b5	fftools/thread_queue: add THREAD_QUEUE_FLAG_NO_BLOCK Exactly what it says on the tin. There is some ambiguity as to whether this should also prevent reading from a choked, as opposed to empty, queue, but I think it makes sense to consider them equivalent, as I struggle to think of a use case where it would be beneficial to allow draining a queue that was explicitly choked by the upstream (to e.g. prevent further reads). Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	321b0e36a3	fftools/thread_queue: add `flags` parameter to tq_receive() I want to use this to allow a non-blocking use of this function. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	6a563dab71	fftools/ffmpeg_sched: allow choosing nodes to unchoke This level of granularity will help for the upcoming patch. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	04888287b3	fftools/ffmpeg_sched: fix sch_stop() and schedule_update_locked() race schedule_update_locked() is supposed to be a no-op when `sch->terminate` is 1. However, there is a TOCTOU error here, where a different thread may currently be executing schedule_update_locked(), having successfully passed the sch->terminate check but without actually updating the choke status. This does not matter for the current code, but will matter with the following commit, where it creates the theoretical possibility of a race where sch_stop() is trying to choke the demuxers (and unchoke the decoders) while schedule_update_locked() is simultaneously trying to choke the decoders, leading to a deadlock if the last decoder is left choked and unable to propagate EOF downstream. The cleanest solution is to just take the scheduler lock while updating the choke status here. This ensures that any other schedule_update_locked() calls will have completed. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
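The TOCTOU fix described in 04888287b3 can be pictured with a heavily reduced model. Everything below (`MiniSched`, `mini_update`, `mini_stop`) is a hypothetical stand-in, not the code in ffmpeg_sched.c. The only point it makes is that the `terminate` check and the choke update sit under the same mutex as the stop path, so the two can no longer interleave.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical, heavily reduced stand-in for the scheduler state. */
typedef struct MiniSched {
    pthread_mutex_t lock;
    int terminate;   /* set once by mini_stop() */
    int dec_choked;  /* 1 = decoder may not output frames */
} MiniSched;

static int mini_init(MiniSched *s)
{
    assert(s);
    s->terminate  = 0;
    s->dec_choked = 0;
    return pthread_mutex_init(&s->lock, NULL);
}

/* The check and the update happen under one lock, so a concurrent
 * mini_stop() cannot slip in between them. */
static void mini_update(MiniSched *s, int choke)
{
    pthread_mutex_lock(&s->lock);
    if (!s->terminate)
        s->dec_choked = choke;
    pthread_mutex_unlock(&s->lock);
}

/* Taking the same lock guarantees that any in-flight mini_update()
 * has finished before the final choke state is written. */
static void mini_stop(MiniSched *s)
{
    pthread_mutex_lock(&s->lock);
    s->terminate  = 1;
    s->dec_choked = 0;
    pthread_mutex_unlock(&s->lock);
}
```

Once `mini_stop()` has returned, a late `mini_update(&s, 1)` is a no-op and the decoder stays unchoked, which is the property needed to let EOF propagate downstream.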
Niklas HaasandNiklas Haas	0d123a3c23	fftools/ffmpeg_sched: use macros for schedule_update_locked() loops Instead of awkwardly looping over the type, just split this up into multiple loops. The reduction in complexity seems worth the loss in conciseness to me, and more importantly, this allows us to easily add more waiter types. Sponsored-by: nxtedition AB Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-23 08:41:12 +00:00
Niklas HaasandNiklas Haas	d94c293e62	swscale/ops_dispatch: prevent float over-read when horizontal filtering The code made the fundamental assumption that over-read into the padding bytes is okay to do; because the most that can happen is that those pixel values end up corrupted, which doesn't affect any adjacent pixels. However, this is not true for SWS_OP_FILTER_H, because this operation fundamentally mixes together horizontal pixels. Normally, this was fine, because the filter weights for those pixels are set to 0, and 0 * x = 0. However, that is not true for floating point inputs, which can contain Infinity; and 0 * Infinity = NaN, thus corrupting the entire pixel. Solve it by specifically preventing over-read when it would be unsafe. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-20 21:45:28 +00:00
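The arithmetic behind d94c293e62 is easy to reproduce in isolation. The following is a minimal sketch with hypothetical helper names, assuming IEEE 754 float semantics (so no `-ffast-math`); it is not the SWS_OP_FILTER_H kernel itself. A zero weight does not neutralize an infinite padding value, whereas limiting the read to the valid pixels does.

```c
#include <assert.h>
#include <math.h>

/* Plain horizontal filter tap sum: sum of w[i] * px[i]. */
static float weighted_sum(const float *px, const float *w, int taps)
{
    float sum = 0.0f;
    assert(taps >= 0);
    for (int i = 0; i < taps; i++)
        sum += w[i] * px[i];
    return sum;
}

/* Same sum, but never reads past the first `valid` pixels. */
static float weighted_sum_bounded(const float *px, const float *w,
                                  int taps, int valid)
{
    const int n = taps < valid ? taps : valid;
    return weighted_sum(px, w, n);
}

/* Returns 1 if a zero-weighted infinite tap turns the sum into NaN. */
static int zero_weight_inf_is_nan(void)
{
    const float px[2] = { 1.0f, INFINITY }; /* px[1] plays the padding */
    const float w[2]  = { 1.0f, 0.0f };
    return isnan(weighted_sum(px, w, 2)) != 0;
}
```

With the same inputs, `weighted_sum_bounded(px, w, 2, 1)` stops before the padding and returns 1.0f, while the unbounded sum evaluates `0 * Infinity` and ends up as NaN.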
Niklas HaasandNiklas Haas	6bc0f9517c	swscale/ops_dispatch: rename filter_size to filter_size_h Since this is not set for vertical filters. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-20 21:45:28 +00:00
Niklas HaasandNiklas Haas	0c1a1ee12e	swscale/ops_optimizer: don't push scale past truncating conversions In an op list like: [ u8 +XXX] SWS_OP_READ : 1 elem(s) planar >> 3 [ u8 .XXX] SWS_OP_FILTER_V : 256 -> 320 bilinear (2 taps) [f32 .XXX] SWS_OP_SCALE : * 65535 [f32 +XXX] SWS_OP_CONVERT : f32 -> u16 [u16 zXXX] SWS_OP_SWAP_BYTES [u16 zzzX] SWS_OP_SWIZZLE : 0003 [u16 zzz+] SWS_OP_CLEAR : {_ _ _ 65535} [u16 XXXX] SWS_OP_WRITE : 4 elem(s) packed >> 0 The current version of the code would happily push the SWS_OP_SCALE past the truncating conversion, leading to degenerate loss of information. (In this case, the result was quite extreme) Affects quality across a wide range of formats, e.g.: rgb24 16x16 -> rgb48be 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 ...X] SWS_OP_FILTER_V : 16 -> 32 bilinear (2 taps) min: {0 0 0 _}, max: {255 255 255 _} + [f32 ...X] SWS_OP_SCALE : * 257 + min: {0 0 0 _}, max: {65535 65535 65535 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u16 - min: {0 0 0 _}, max: {255 255 255 _} - [u16 +++X] SWS_OP_SCALE : * 257 min: {0 0 0 _}, max: {65535 65535 65535 _} [u16 zzzX] SWS_OP_SWAP_BYTES min: {0 0 0 _}, max: {65535 65535 65535 _} [u16 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
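To see why the order matters in 0c1a1ee12e, consider a single interpolated sample. This is a sketch under the assumption that the f32 to u16 conversion truncates, as the commit title states; the two function names are made up for illustration and do not exist in swscale.

```c
#include <assert.h>
#include <stdint.h>

/* SCALE, then CONVERT: the fractional part still contributes. */
static uint16_t scale_then_convert(float v, float scale)
{
    assert(v >= 0.0f && v * scale <= 65535.0f);
    return (uint16_t)(v * scale);
}

/* CONVERT, then SCALE: the fraction is discarded before scaling. */
static uint16_t convert_then_scale(float v, unsigned scale)
{
    assert(v >= 0.0f && v <= 255.0f && scale <= 257);
    return (uint16_t)((uint16_t)v * scale);
}
```

A bilinear filter can produce a value such as 127.5 halfway between two 8-bit samples. `scale_then_convert(127.5f, 257.0f)` yields 32767, whereas `convert_then_scale(127.5f, 257)` yields 32639: the half step lost to truncation is multiplied by 257, and the output only ever takes 256 distinct values.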
Niklas HaasandNiklas Haas	812b5654ae	swscale/tests/sws_ops: use SWS_SCALE_BILINEAR for printing ops lists This actually changes the behavior vs SWS_SCALE_POINT, because point scaling is bit-exact and thus implies a different set of optimizations. Ideally, we would still try and somehow merge this with tests/swscale.c to allow testing a different set of scalers; but I still don't have a good idea for how to accomplish that here. As it stands, this results in extra dithering steps in almost all filters involving scaling, e.g.: rgb24 16x16 -> rgb24 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} - [ u8 +++X] SWS_OP_FILTER_V : 16 -> 32 point (1 taps) + [ u8 ...X] SWS_OP_FILTER_V : 16 -> 32 bilinear (2 taps) min: {0 0 0 _}, max: {255 255 255 _} + [f32 ...X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 -1} + min: {1/512 1/512 1/512 _}, max: {255.998047 255.998047 255.998047 _} + [f32 ...X] SWS_OP_MIN : x <= {255 255 255 _} + min: {1/512 1/512 1/512 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas HaasandNiklas Haas	2dfe055ddd	swscale/tests/sws_ops: print split sub-passes for lists with filters This allows us to inspect exactly the logic that is going on inside the CPU backends (which don't support bare filter passes). rgb24 16x16 -> rgb24 16x32: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 +++X] SWS_OP_FILTER_V : 16 -> 32 point (1 taps) min: {0 0 0 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + Retrying with split passes: + [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) planar >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) + Sub-pass #1: + [ u8 +++X] SWS_OP_READ : 3 elem(s) planar >> 0 + 1 tap point filter (V) + min: {0 0 0 _}, max: {255 255 255 _} + [f32 +++X] SWS_OP_CONVERT : f32 -> u8 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) rgb24 16x16 -> rgb24 32x16: [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 min: {0 0 0 _}, max: {255 255 255 _} [ u8 +++X] SWS_OP_FILTER_H : 16 -> 32 point (1 taps) min: {0 0 0 _}, max: {255 255 255 _} [f32 +++X] SWS_OP_CONVERT : f32 -> u8 min: {0 0 0 _}, max: {255 255 255 _} [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 (X = unused, z = byteswapped, + = exact, 0 = zero) + Retrying with split passes: + [ u8 +++X] SWS_OP_READ : 3 elem(s) packed >> 0 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) planar >> 0 + (X = unused, z = byteswapped, + = exact, 0 = zero) + Sub-pass #1: + [ u8 +++X] SWS_OP_READ : 3 elem(s) planar >> 0 + 1 tap point filter (H) + min: {0 0 0 _}, max: {255 255 255 _} + [f32 +++X] SWS_OP_CONVERT : f32 -> u8 + min: {0 0 0 _}, max: {255 255 255 _} + [ u8 XXXX] SWS_OP_WRITE : 3 elem(s) packed >> 0 + (X 
= unused, z = byteswapped, + = exact, 0 = zero) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas HaasandNiklas Haas	369a301669	swscale/tests/sws_ops: use a dummy ops backend for printing This ensures that the ops printing path goes through the same code as the actual ops dispatch backend, including all sub-passes etc. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-17 10:41:34 +00:00
Niklas Haas	76dc83d9be	swscale/ops_dispatch: make ff_sws_ops_compile() output optional Allows the uops macro generation code to not actually compile any passes. More generally, this could be used to e.g. test if an op list is supported by a backend without actually creating the passes. The `bool first` change is needed because the `input == prev` check no longer works if we don't actually compile any passes. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	420b1bf368	swscale/ops_dispatch: allow forcing specific ops backend This will be used eventually when I rewrite checkasm/sw_ops to re-use the code in ops_dispatch.c instead of hand-rolling the execution layer. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	9021448857	swscale/ops_dispatch: merge ff_sws_ops_compile_backend() and compile() Passing backend == NULL now loops over the backends as before. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	ad17144ce6	swscale/ops_dispatch: move op list print to ff_sws_ops_compile_backend() Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	90669ab52e	swscale/ops: move ff_sws_compile_pass() and friends to ops_dispatch.h This function actually lives in ops_dispatch.c, and doesn't really make sense in ops.h anymore. We should also move some stuff out of ops_internal.h, which doesn't depend on any external ops stuff, here. This allows the backend/compilation-related stuff to co-exist more nicely. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	1d841635a4	swscale/ops: also include scaling ops in ff_sws_enum_op_lists() Using the configured scaler from the SwsContext implicitly. This does affect the output of libswscale/tests/sws_ops.c, which now prints about 4x as much data (taking roughly 4x as long, but still within a second on my machine). We can make this process a lot faster by forcing SWS_SCALE_POINT as the scaler, which skips calculating any actual filter weights in favor of generating a trivial 1-tap filter. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	eec9f712f5	swscale/ops: re-use ff_sws_op_list_generate() in ff_sws_enum_op_lists() The only difference here is an extra ff_sws_add_filters() call, which is a no-op because src w/h = dst w/h = 16. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	cac183f46f	swscale/ops: don't silently suppress non-ENOTSUP errors Matches the behavior to the comment. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	dacbf080f3	swscale/ops_chain: simplify ff_sws_op_compile_tables() signature This no longer accesses prev/next as a result of the `unused` removal, so the signature can be simplified to just take the op directly. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	064600585e	swscale/ops_chain: remove flexible from SWS_OP_MIN/MAX entries We have other op types that skip checking the data even in non-flexible mode, so there is a precedent for just leaving away `flexible` for such kernels. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	98c1dbafbe	swscale/ops_memcpy: don't depend on ops_backend.h This is private to the C template based backend. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	62aad4513c	swscale/graph: move format conversion logic to formats.c Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	0611abc1bb	swscale/graph: move code for adding filters to format.h Mirroring the precedent established by the other SwsOp-generating functions. This allows us to re-use it for the uops macro generator. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	9fe0ff3d56	swscale/graph: make _reinit() only call _init(), not _create() This allows us to preserve the same memory allocation when reinitializing a graph, which is a nice bonus. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Niklas Haas	56305c460c	swscale/graph: add ff_sws_graph_alloc() and _init() As an alternative to the current _create() API. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00