ffmpeg

x/ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-06-24 21:53:20 +00:00

Author	SHA1	Message	Date
Niklas Haas and Niklas Haas	310df19fcd	tests/checkasm/sw_ops: add check for SWS_UOP_READ_PALETTE We just need to ensure the palette contains valid data, which will happen automatically as long as the plane 1 is large enough. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-20 14:08:49 +00:00
Ramiro Polla	176493e4c4	swscale/ops: pass SwsLinearOp by pointer instead of value in ff_sws_linear_mask()	2026-06-19 14:32:44 +00:00
Zhao Zhili and Niklas Haas	4acfab044d	checkasm/sw_ops: fix typo in write operations check_write() matched against SWS_UOP_READ_PACKED/PLANAR, copied from check_read(), instead of SWS_UOP_WRITE_PACKED/PLANAR.	2026-06-15 14:36:05 +00:00
Andreas Rheinhardt	19e377b4b9	avcodec/x86/hpeldsp: Port mmxext functions to SSE2
The only noticeable changes in benchmarks are for the x2 horizontal no_rnd case where SSE2 and movhps are beneficial:
Old benchmarks:
avg_pixels_tab[1][1]_c:                 42.2 ( 1.00x)
avg_pixels_tab[1][1]_mmxext:            10.8 ( 3.89x)
avg_pixels_tab[1][2]_c:                 18.0 ( 1.00x)
avg_pixels_tab[1][2]_mmxext:             6.1 ( 2.96x)
put_no_rnd_pixels_tab[1][1]_c:          29.7 ( 1.00x)
put_no_rnd_pixels_tab[1][1]_mmxext:     12.3 ( 2.41x)
put_no_rnd_pixels_tab[1][2]_c:          20.4 ( 1.00x)
put_no_rnd_pixels_tab[1][2]_mmxext:     12.2 ( 1.67x)
put_pixels_tab[1][1]_c:                 29.9 ( 1.00x)
put_pixels_tab[1][1]_mmxext:             7.6 ( 3.92x)
put_pixels_tab[1][2]_c:                 16.8 ( 1.00x)
put_pixels_tab[1][2]_mmxext:             6.4 ( 2.63x)
New benchmarks:
avg_pixels_tab[1][1]_c:                 42.3 ( 1.00x)
avg_pixels_tab[1][1]_sse2:              10.7 ( 3.95x)
avg_pixels_tab[1][2]_c:                 17.8 ( 1.00x)
avg_pixels_tab[1][2]_sse2:               6.3 ( 2.83x)
put_no_rnd_pixels_tab[1][1]_c:          29.6 ( 1.00x)
put_no_rnd_pixels_tab[1][1]_sse2:       10.5 ( 2.81x)
put_no_rnd_pixels_tab[1][2]_c:          20.4 ( 1.00x)
put_no_rnd_pixels_tab[1][2]_sse2:       12.3 ( 1.67x)
put_pixels_tab[1][1]_c:                 30.1 ( 1.00x)
put_pixels_tab[1][1]_sse2:               7.6 ( 3.93x)
put_pixels_tab[1][2]_c:                 16.8 ( 1.00x)
put_pixels_tab[1][2]_sse2:               6.4 ( 2.64x)
Switching to SSE2 unfortunately increased codesize of the relevant functions by 160B. This makes these functions ABI compatible, i.e. they no longer rely on others calling emms_c to fix the fpu state. It also implies that many mpegvideo decoders (the exceptions are MPEG-4, RV30, RV40 and the VC-1 family) now no longer use any mmx registers at all. So one can remove the emms_c from the MPEG-1/2 decoder. The same is true for VP3.
Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-06-14 22:04:42 +02:00
Andreas Rheinhardt	c35f57f3c4	avcodec/x86/fpel: Use SSE2 in avg_pixels8 No change in benchmarks here; this already allows to remove an emms_c from cavsdec.c. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-06-14 22:04:42 +02:00
Niklas Haas and Niklas Haas	b488ee5553	swscale/ops: generalize SwsReadWriteOp.packed to enum I want to start adding more data layouts, like semiplanar formats (nv12), or palette formats. I made an effort to distinguish existing checks for rw.packed into "mode != PLANAR" and "mode == PACKED", based on the intent of the surrounding code, in anticipation of these new layouts. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas and Niklas Haas	addee69955	swscale/ops_dispatch: generalize block_size_in/out to array See previous commit for justification. I decided to split these refactors up into several independent commits to make it easier to review and bisect, since they are all independent atomic changes. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas and Niklas Haas	11900e4e12	swscale/ops: generalize SWS_OP_FILTER_* result type Instead of hard-coding SWS_PIXEL_F32 here. This is not really useful yet, but I wanted to clean up the semantics here regardless. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas and Niklas Haas	091149b187	swscale/ops: group filtered rw metadata into struct This is a minor cosmetic improvement that allows me to use more convenient names for filter-related metadata fields, without confusion. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
DROOdotFOO and Ramiro Polla	cc7c567920	swscale/aarch64/yuv2rgb_neon: add BE 16bpp output formats
BE counterparts to the LE paths in 2e142e52ae; pack adds rev16 before store. nv12/nv21 paths are added but bench-only (no C ref, same as `2e142e52ae`).
Test Name                      A55-gcc            M1-clang          A76-gcc
----------------------------------------------------------------------------------
yuv420p_rgb565be_1920_neon     15086.1 ( 3.91x)   5507.0 ( 4.34x)   19229.1 ( 2.02x)
yuv420p_bgr565be_1920_neon     15291.7 ( 3.84x)   5476.9 ( 4.37x)   19229.4 ( 2.02x)
yuv420p_rgb555be_1920_neon     15091.5 ( 3.67x)   5569.0 ( 3.97x)   19229.3 ( 1.90x)
yuv420p_bgr555be_1920_neon     15298.6 ( 3.62x)   5600.6 ( 3.98x)   19228.8 ( 1.90x)
yuv422p_rgb565be_1920_neon     16862.3 ( 4.00x)   6378.8 ( 4.64x)   22110.3 ( 2.07x)
yuv422p_bgr565be_1920_neon     17139.3 ( 3.93x)   6448.1 ( 4.50x)   22104.1 ( 2.07x)
yuv422p_rgb555be_1920_neon     16853.3 ( 3.98x)   6468.8 ( 4.12x)   22106.4 ( 1.98x)
yuv422p_bgr555be_1920_neon     17202.2 ( 3.89x)   6467.0 ( 4.12x)   22110.2 ( 1.98x)
yuva420p_rgb565be_1920_neon    15050.2 ( 3.92x)   5452.5 ( 4.39x)   19229.5 ( 2.02x)
yuva420p_bgr565be_1920_neon    15346.6 ( 3.84x)   5462.4 ( 4.36x)   19228.9 ( 2.02x)
yuva420p_rgb555be_1920_neon    15050.8 ( 3.69x)   5463.3 ( 3.95x)   19228.6 ( 1.90x)
yuva420p_bgr555be_1920_neon    15352.8 ( 3.61x)   5543.6 ( 3.89x)   19228.6 ( 1.90x)
Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: DROOdotFOO <drew@axol.io>	2026-06-10 17:54:20 +00:00
Martin Storsjö	b20c4c6f98	checkasm: Update to the latest upstream version
This update was done by running this command:
$ git subtree pull --squash --prefix=tests/checkasm/ext \
    https://code.ffmpeg.org/FFmpeg/checkasm.git master
This includes fixes for a couple regressions noted after integrating the new external checkasm into ffmpeg:
- Fixes spurious errors about missing vzeroupper in C code generated by MSVC, fixing https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/23360
- Fixes building for WINAPI_FAMILY_PHONE_APP, and for UWP with older Windows SDKs, https://code.videolan.org/videolan/checkasm/-/work_items/37
- Fixes building in x86_32 mode for Windows with --disable-asm, https://code.videolan.org/videolan/checkasm/-/work_items/36	2026-06-09 20:57:59 +03:00
Niklas Haas	100ce4ac41	tests/checkasm/sw_ops: rewrite using uops_macros.h This ensures 100% coverage of all uop primitives by generating the set of tests exactly from the list of seen primitives, using the uops macros. There are some annoying quirks still because of the fact that we have to essentially "untranslate" the UOPs back to SwsOps that result back in the intended uop after the translation, but overall it's not too bad and still much better than the status quo of hand-rolling the list of test cases. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Jun Zhao and Jun Zhao	3d71b9ec93	tests/checkasm: hevc_pred: use pixel helpers for diagnostic output
Replace plain memcmp+fail() with checkasm_check_pixel_padded() for DC, planar, and angular prediction tests. Use PIXEL_RECT for output buffers instead of flat arrays. This enables:
- Detailed per-pixel difference output when run with 'checkasm -v'
- Detection of out-of-bounds writes beyond the NxN block area
- Padding violation reporting (writes past block boundary)
Previously, a test failure would only report "FAILED" with no information about which pixels were wrong, making assembly debugging difficult. Follows the pattern established in `4d4b301e4a` (checkasm: hevc_pel: Use helpers for checking for writes out of bounds).
Suggested-by: Martin Storsjö <martin@martin.st> Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-06-07 23:29:33 +00:00
Niklas Haas and Niklas Haas	3137d337fe	tests/checkasm/sw_ops: use new checkasm_set_func_variant() The current approach of re-testing the C reference for every backend separately leads to both confusing output (e.g. having an extra redundant `memcpy_c` line for every op, even those not implemented by the memcpy backend), as well as a lot of unnecessary wasted time re-testing and re-benching the same C variant for every backend. This new API function lets us test the C function only a single time, while simultaneously having all of the other backends implicitly compare themselves against the C reference. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-07 09:24:23 +00:00
Martin Storsjö	96470d1e8c	checkasm: Fix defining CHECKASM_HAVE_GENERATED_H Commit `4569ab7eaa` tried to set this only on the object files for the checkasm library itself, but missed that EXT_CHECKASMOBJS lacks the path prefix, thus this wasn't set at all. Alternatively, for simplicity, we could keep passing this for all checkasm object files, not only the checkasm library objects; the other object files don't use it in any case.	2026-06-05 11:46:38 +00:00
Kacper Michajłow	2a54b181c0	tests/checkasm/vvc_mc: prevent function inline to avoid stack overflow Fixes stack overflow on Windows when by default we have 1 MB. Individually those functions fit, but when they are all inlined, it's too much. Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2026-06-05 08:22:40 +02:00
Kacper Michajłow	7d2a629ccf	tests/checkasm/rv34dsp: pass correct buffer to bench function The test can negate stride, in which case we have to use adjusted `dst_newp`. Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2026-06-05 08:22:37 +02:00
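The negated-stride case mentioned in 7d2a629ccf can be illustrated with a small standalone sketch. This is not the code from rv34dsp.c: `row0_for_stride` and `fill_rows` are hypothetical helpers, and the only point shown is that with a negative stride the pointer handed to a function must address the last row in memory, so passing the unadjusted buffer start would walk out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from rv34dsp.c): for a buffer holding
 * `height` rows, return the pointer a function must receive so that
 * row y lives at ptr + y * stride. */
static uint8_t *row0_for_stride(uint8_t *buf, ptrdiff_t stride, int height)
{
    assert(height > 0);
    if (stride >= 0)
        return buf;                     /* rows advance upwards in memory */
    /* Negative stride: row 0 is the last row in memory. */
    return buf + (ptrdiff_t)(height - 1) * -stride;
}

/* Fill row y with the value y, walking the buffer through `stride`. */
static void fill_rows(uint8_t *row0, ptrdiff_t stride, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            row0[y * stride + x] = (uint8_t)y;
}
```

With a 4-row buffer and a stride of -8, the adjusted pointer is `buf + 24`, and row 3 ends up at the start of the buffer.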
Martin Storsjö	4569ab7eaa	configure: Provide checkasm_header_config_generated.h as well This is required for overriding defines that exist in the public headers of checkasm, when e.g. building with assembly disabled for an architecture where we normally would use the checked_call wrapper. This fixes a leftover in how checkasm is integrated into the ffmpeg build system; there were many different approaches considered for fixing --disable-asm, and the ffmpeg configure integration didn't end up matching the final solution. This fixes building with --disable-asm.	2026-06-04 18:26:50 +00:00
Niklas Haas	310ff99f62	configure: support building without checkasm Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:52 +02:00
Niklas Haas and Martin Storsjö	3b1d7cd1f7	tests/checkasm: switch to shared libcheckasm implementation
The checkasm tool originated in x264. It was later rewritten and modernized for FFmpeg (and relicensed to LGPL). For the dav1d project, it was relicensed again to 2-clause BSD (with permission from the relevant authors). The FFmpeg and dav1d implementations of checkasm have since evolved independently (with some amount of ported code between the two, with relicensing permission where relevant). To synchronize the development, and to make it possible to easily adopt checkasm in other projects, it has been split out into a standalone project/library on its own, developed at https://code.videolan.org/videolan/checkasm/.
That version has all the features of checkasm in both FFmpeg and dav1d, and has got a number of extra improvements on top:
- More/fixed tests (e.g. properly clobbering high bits of 32-bit registers on most platforms),
- Vastly improved overall performance / runtime for benchmarking, due primarily to the ability to scale the runtime of each test to that test's complexity.
- Much more robust statistical analysis of benchmarking results; including robust outlier rejection, an estimation of the histogram, and the ability to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer implementations on ARM platforms, a better RISC-V and AArch64 harness, and more. On AArch64, it tests which timer out of pmccntr_el0, linux perf, macos kperf, cntvct_el0 is available, without the user needing to configure things, and falling back on clock_gettime if none of them can be used. This means one automatically gets the best available timer, if userspace access to pmccntr_el0 has been unlocked with a kernel module, or if one has permission to use the perf API, or if the cntvct_el0 is exact enough to be useful. On AArch64 macOS, there is now a test harness that catches clobbered registers and stack clobbering, like on other platforms.
- An option for setting affinity, for benchmarking on heterogeneous core systems. (On Linux, this is already easily done through taskset, but on Windows, the checkasm built-in option makes it possible there as well, and portable.)
- Printing of the tested CPU core name, where possible.
To integrate this external implementation of checkasm into FFmpeg, without having to build libcheckasm as an external library, the upstream sources are added as a git subtree, and integrated into the FFmpeg build system as a foreign source. For the long and storied history of how we arrived at this solution, see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546
The relevant config headers for checkasm are generated by configure, and the sources are built as part of the main ffmpeg build. The upstream sources, while they use meson as primary build system, are structured to make it easy to build as part of a foreign build system.
The existing testcases are mostly kept untouched (only three minor changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the majority of the logic from checkasm.c, checkasm.h and the arch specific assembly files are removed, replaced with the external implementation.
Co-Authored-By: Martin Storsjö <martin@martin.st> Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-04 11:44:52 +02:00
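Commit 3b1d7cd1f7 mentions reporting a "(trimmed) mean" of benchmark samples. As a rough illustration of the general idea only, and explicitly not checkasm's actual statistics code, a trimmed mean sorts the samples, discards a fixed number from each end, and averages the remainder. The function names and the fixed `trim` count below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Ordering callback for qsort() on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical sketch: sort the samples in place, drop `trim` values
 * from each end, and return the mean of what is left. */
static double trimmed_mean(double *samples, size_t n, size_t trim)
{
    assert(n > 2 * trim);
    qsort(samples, n, sizeof(*samples), cmp_double);
    double sum = 0.0;
    for (size_t i = trim; i < n - trim; i++)
        sum += samples[i];
    return sum / (double)(n - 2 * trim);
}
```

A single very slow run (for example one interrupted by the scheduler) then no longer drags the reported figure upwards the way it would with a plain mean.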
Niklas Haas	21ac0b276e	Merge commit 'df966476d760f1bfe4c5f52c463b82be5bf6b9ed' as 'tests/checkasm/ext'
To reproduce this commit, run:
$ git subtree add --squash --prefix=tests/checkasm/ext \
    https://code.ffmpeg.org/FFmpeg/checkasm.git master
To update at a later point in time, replace `add` by `pull`	2026-06-04 11:44:40 +02:00
Niklas Haas	068173f329	tests/checkasm: factorize out randomize_buffer for doubles Not only is this duplicating code, but it also hard-codes a reference to `checkasm_lfg`, which I want to eliminate in the interest of being able to switch out the checkasm implementation.	2026-06-04 11:44:22 +02:00
Niklas Haas	71b4666ba5	tests/checkasm/sw_ops: re-indent after previous change Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:37 +02:00
Niklas Haas	7af4faf6df	tests/checkasm/sw_ops: skip test data setup if not testing anything The test data size is quite large, so re-setting up unused data is eating up quite a significant amount of CPU time. This commit cuts execution time of sw_ops in half. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-03 23:53:23 +02:00
James Almer	de261b9bb2	tests/checkasm/crc: use libavutil memory allocation helpers Signed-off-by: James Almer <jamrial@gmail.com>	2026-05-28 22:04:27 +00:00
James Almer	224659360a	tests/checkasm/crc: retain offset values between calls Should fix buffer overflows as reported by clang-asan and use of uninitialized values as reported by valgrind. Signed-off-by: James Almer <jamrial@gmail.com>	2026-05-28 22:04:27 +00:00
DROOdotFOO and Martin Storsjö	34501921fd	tests/checkasm/sw_yuv2rgb: cover nv12 and nv21 The previous chroma stride formula (width >> log2_chroma_w) is correct for planar yuv but wrong for semi-planar nv12/nv21, where the UV plane is interleaved at width bytes per row (width/2 UV pairs of 2 bytes each). Use av_image_get_linesize() so the test feeds a valid stride to libswscale regardless of input format; for the existing planar suites the value is unchanged. With the stride fixed, add nv12 and nv21 to check_yuv2rgb() so the upcoming NEON 16bpp paths get bench coverage. ff_get_unscaled_swscale does not wire a C yuv2rgb fast path for these inputs, so the suites report bench-only (no correctness reference); they still run clobber detection and cycle counts. Signed-off-by: DROOdotFOO <drew@axol.io>	2026-05-22 10:03:07 +00:00
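The stride difference described in 34501921fd can be sketched without libavutil. The real test calls av_image_get_linesize(); the `chroma_linesize` function and `layout` enum below are hypothetical stand-ins that only cover 8-bit formats and widths that are a multiple of the chroma subsampling factor.

```c
#include <assert.h>

/* Hypothetical stand-in for the stride arithmetic described above; the
 * real test uses av_image_get_linesize(). 8-bit formats only. */
enum layout { LAYOUT_PLANAR, LAYOUT_SEMIPLANAR };

static int chroma_linesize(enum layout layout, int width, int log2_chroma_w)
{
    assert(width > 0 && log2_chroma_w >= 0);
    int chroma_w = width >> log2_chroma_w;  /* chroma samples per row */
    if (layout == LAYOUT_PLANAR)
        return chroma_w;        /* one byte per sample in each of U and V */
    return chroma_w * 2;        /* interleaved U,V pairs share one plane  */
}
```

For a 1920-wide frame with horizontally halved chroma, the planar case gives 960 bytes per chroma row, while the semi-planar case gives 1920, matching the "width bytes per row" stated in the commit message.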
Andreas Rheinhardt	7971953d29	avfilter/x86/vf_pp7: Port ff_pp7_dctB_mmx to SSE2 Unfortunately a bit slower than the MMX version due to the impossibility to use memory operands in paddw. The situation would reverse if ff_dctB_mmx() would have to issue emms. dctB_c: 3.7 ( 1.00x) dctB_mmx: 3.3 ( 1.13x) dctB_sse2: 3.6 ( 1.03x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-05-15 20:29:29 +02:00
Andreas Rheinhardt	94a49068db	tests/checkasm: Add vf_pp7 checkasm test Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-05-15 20:29:29 +02:00
Niklas Haas	9021448857	swscale/ops_dispatch: merge ff_sws_ops_compile_backend() and compile() Passing backend == NULL now loops over the backends as before. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-05-15 18:53:05 +02:00
Andreas Rheinhardt	f5ed254528	swscale/x86/yuv2yuvX: Port ff_yuv2yuvX_mmxext to SSE2
The mmx function performs two registers in parallel; given the larger register size of SSE2, the same amount of data can be processed in one register with some speedups. (Given that this function is used for tail-processing, not processing more data is important.) Switching to SSE2 also fixes a bug introduced in `554c2bc708`: Since said commit, only half the dither values were used. This seems not to matter in practice, as the functions here use dither only in the following form: ((filtersize-1)*8+dither)>>4. The dither values used here come from ff_dither_8x8_128 which has the property that ff_dither_8x8_128[i][j] and ff_dither_8x8_128[i][j+4] always lead to the same result in the above formula.
Old benchmarks (test names are yuv2yuvX_<row>_approximate_<column>):
row          c                  mmxext            sse3              avx2
8_2_0_512     2309.9 ( 1.00x)    250.2 ( 9.23x)    98.8 (23.39x)     52.9 (43.63x)
8_2_16_512    2263.0 ( 1.00x)    245.3 ( 9.22x)   114.3 (19.80x)     85.6 (26.45x)
8_2_32_512    2155.8 ( 1.00x)    235.6 ( 9.15x)    93.6 (23.04x)     78.1 (27.60x)
8_2_48_512    2084.8 ( 1.00x)    230.2 ( 9.05x)   105.0 (19.85x)     71.9 (29.00x)
8_4_0_512     3496.3 ( 1.00x)    455.0 ( 7.68x)   157.5 (22.20x)     88.4 (39.53x)
8_4_16_512    3380.9 ( 1.00x)    440.0 ( 7.68x)   175.0 (19.32x)    134.1 (25.22x)
8_4_32_512    3277.6 ( 1.00x)    427.2 ( 7.67x)   149.7 (21.89x)    115.5 (28.37x)
8_4_48_512    3167.8 ( 1.00x)    414.9 ( 7.63x)   164.1 (19.31x)    101.2 (31.30x)
8_8_0_512     5987.5 ( 1.00x)    854.1 ( 7.01x)   294.6 (20.32x)    144.1 (41.56x)
8_8_16_512    5848.9 ( 1.00x)    834.4 ( 7.01x)   312.1 (18.74x)    214.9 (27.22x)
8_8_32_512    5610.1 ( 1.00x)    811.6 ( 6.91x)   277.5 (20.21x)    189.8 (29.55x)
8_8_48_512    5415.8 ( 1.00x)    782.3 ( 6.92x)   289.4 (18.72x)    165.3 (32.76x)
8_16_0_512   11100.7 ( 1.00x)   1682.1 ( 6.60x)   558.8 (19.86x)    280.1 (39.63x)
8_16_16_512  10772.1 ( 1.00x)   1611.0 ( 6.69x)   578.1 (18.63x)    418.8 (25.72x)
8_16_32_512  10381.5 ( 1.00x)   1560.4 ( 6.65x)   525.8 (19.74x)    370.7 (28.01x)
8_16_48_512  10046.1 ( 1.00x)   1512.4 ( 6.64x)   546.0 (18.40x)    315.0 (31.89x)
New benchmarks (test names are yuv2yuvX_<row>_approximate_<column>):
row          c                  sse2              sse3              avx2
8_2_0_512     2302.5 ( 1.00x)    184.4 (12.49x)   100.1 (23.01x)     54.9 (41.98x)
8_2_16_512    2224.6 ( 1.00x)    180.0 (12.36x)   109.5 (20.31x)     81.3 (27.35x)
8_2_32_512    2165.3 ( 1.00x)    176.6 (12.26x)    93.7 (23.11x)     73.1 (29.61x)
8_2_48_512    2088.0 ( 1.00x)    170.7 (12.23x)   103.4 (20.20x)     69.4 (30.10x)
8_4_0_512     3496.8 ( 1.00x)    320.3 (10.92x)   158.8 (22.02x)     86.4 (40.49x)
8_4_16_512    3443.5 ( 1.00x)    325.3 (10.59x)   171.9 (20.03x)    123.6 (27.85x)
8_4_32_512    3272.2 ( 1.00x)    302.7 (10.81x)   148.9 (21.98x)    110.6 (29.58x)
8_4_48_512    3166.3 ( 1.00x)    291.0 (10.88x)   162.9 (19.44x)    102.3 (30.95x)
8_8_0_512     5967.6 ( 1.00x)    691.2 ( 8.63x)   294.2 (20.28x)    154.9 (38.52x)
8_8_16_512    5780.2 ( 1.00x)    606.2 ( 9.53x)   309.3 (18.69x)    208.7 (27.69x)
8_8_32_512    5604.3 ( 1.00x)    592.3 ( 9.46x)   281.1 (19.94x)    185.4 (30.23x)
8_8_48_512    5413.7 ( 1.00x)    570.4 ( 9.49x)   294.9 (18.36x)    166.5 (32.51x)
8_16_0_512   11099.4 ( 1.00x)   1213.6 ( 9.15x)   563.0 (19.72x)    294.8 (37.65x)
8_16_16_512  10718.1 ( 1.00x)   1121.2 ( 9.56x)   563.7 (19.01x)    389.5 (27.51x)
8_16_32_512  10373.3 ( 1.00x)   1096.2 ( 9.46x)   526.7 (19.70x)    354.7 (29.24x)
8_16_48_512  10066.9 ( 1.00x)   1055.8 ( 9.53x)   527.9 (19.07x)    313.7 (32.09x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-26 23:48:21 +02:00
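The dither argument in f5ed254528 rests on the formula ((filtersize-1)*8+dither)>>4 giving the same result for two different dither values. A minimal sketch of how one might check that property for a given pair follows; `dither_term` and `dither_equivalent` are names invented here, and the values used in the note below are illustrative only, not asserted to be entries of ff_dither_8x8_128.

```c
#include <assert.h>
#include <stdbool.h>

/* The term quoted in the commit message. */
static int dither_term(int filtersize, int dither)
{
    assert(filtersize >= 1);
    return ((filtersize - 1) * 8 + dither) >> 4;
}

/* True if two dither values give the same term for every filter size
 * in [1, max_filtersize]. */
static bool dither_equivalent(int d0, int d1, int max_filtersize)
{
    for (int fs = 1; fs <= max_filtersize; fs++)
        if (dither_term(fs, d0) != dither_term(fs, d1))
            return false;
    return true;
}
```

Because (filtersize-1)*8 is always a multiple of 8, two non-negative dither values agree for every filter size exactly when they fall into the same 8-wide bucket, i.e. when `d0 >> 3 == d1 >> 3`. For instance, 36 and 34 are equivalent under this check, while 36 and 44 are not.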
Jun Zhao and Jun Zhao	188757d43d	tests/checkasm: add hevc_pred ref_filter_3tap and ref_filter_strong tests Test 3-tap for 8x8/16x16/32x32 (both filtered_left and filtered_top outputs). Test strong smoothing for filtered_top and in-place left modification. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-04-21 07:50:49 +00:00
Andreas Rheinhardt	415b466d41	avcodec/x86/vp3dsp: Port ff_vp3_idct_dc_add_mmxext to SSE2 This change should improve performance on Skylake and later Intel CPUs (which have only half the ports for saturated adds/subs for mmx register compared to xmm register): llvm-mca predicts a 25% performance improvement on Skylake. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-19 08:21:17 +02:00
Andreas Rheinhardt	88879f2eff	tests/checkasm/vp3dsp: Add test for idct_add, idct_put, idct_dc_add Due to a discrepancy between SSE2 and the C version, coefficients for idct_put and idct_add are restricted to a range not causing overflows. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-19 08:21:08 +02:00
Andreas Rheinhardt	84b9de0633	avcodec/x86/vp3dsp: Port ff_put_vp_no_rnd_pixels8_l2_mmx to SSE2 This allows to use pavgb to reduce the amount of instructions used to calculate the average; processing two rows via movhps allows to reduce the amount of pxor and pavgb even further and turned out to be beneficial. This patch also avoids a load as the constant used here can be easily generated at runtime. Old benchmarks: put_no_rnd_pixels_l2_c: 13.3 ( 1.00x) put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x) New benchmarks: put_no_rnd_pixels_l2_c: 13.4 ( 1.00x) put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-19 08:15:54 +02:00
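Commit 84b9de0633 mentions computing a no-rounding average with pavgb and pxor. pavgb itself rounds up, so a common way to obtain a rounding-down average from it is to complement the inputs and the output. The scalar C model below shows that identity for a single byte lane; it is a sketch of the arithmetic only, and whether the actual assembly uses exactly this sequence is an assumption, as the commit message does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one pavgb lane: average rounded up. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Rounding-down average built from the rounding-up one by complementing
 * inputs and output (a complement is what pxor with all-ones performs). */
static uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)~avg_round_up((uint8_t)~a, (uint8_t)~b);
}

/* Exhaustively check the identity against the plain (a + b) >> 1. */
static void check_avg_no_rnd(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            assert(avg_no_rnd((uint8_t)a, (uint8_t)b) == ((a + b) >> 1));
}
```

The exhaustive loop covers all 65536 input pairs, so the identity does not depend on any particular test vector.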
Andreas Rheinhardt	37bc3a237b	tests/checkasm/vp3dsp: Add test for put_no_rnd_pixels_l2 Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-19 08:14:50 +02:00
Niklas Haas	cf2d40f65d	swscale/ops: add explicit clear mask to SwsClearOp Instead of implicitly testing for NaN values. This is mostly a straightforward translation, but we need some slight extra boilerplate to ensure the mask is correctly updated when e.g. commuting past a swizzle. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-16 23:23:36 +02:00
Kacper Michajłow	03967fcff4	tests/checkasm/sw_ops: fix too large shift for int Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2026-04-16 18:56:22 +00:00
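The message of 03967fcff4 does not show the offending expression, so the following is only a generic sketch of the class of bug its subject names: shifting a plain `int` by 31 or more bits is undefined with a 32-bit `int`. The helper `low_bits_mask` is hypothetical and widens the operand before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` bits set, for
 * 1 <= bits <= 32. Writing (1 << bits) - 1 with a 32-bit int would be
 * undefined for bits >= 31; widening the operand first avoids that. */
static uint32_t low_bits_mask(int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)((UINT64_C(1) << bits) - 1);
}
```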
Andreas Rheinhardt	39f34ee019	tests/checkasm/h264chroma: Use more realistic block sizes Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-16 07:36:01 +02:00
Niklas Haas and Niklas Haas	dcfd8ebe86	tests/checkasm/sw_ops: remove random value clears These can randomly trigger the alpha/zero fast paths, resulting in spurious tests or randomly diverging performance if the backend happens to implement that particular fast path. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	80b86f0807	tests/checkasm/sw_ops: fix check_scale() This was not actually testing the integer path. Additionally, for integer scales, there is a special fast path for expansion from bits to full range, which we should separate from the random value test.	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	026a6a3101	tests/checkasm/sw_ops: remove redundant filter tests Most of these filters don't test anything meaningfully different relative to each other; the only filters that really have special significance are POINT (for now) and maybe BILINEAR down the line. Apart from that, SINC, combined with the src size loop, already tests both extreme cases (large and small filters), with large, oscillating unwindowed weights. The other filters are not adding anything of substance to this, while massively slowing down the runtime of this test. We can, of course, change this if the backends ever get more nuanced handling. checkasm: all 855 tests passed (down from 1575) Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	91582f7287	tests/checkasm/sw_ops: explicitly test all backends The current code was a bit clumsy in that it always picked the first available backend when choosing the new function. This meant that some x86 paths were not being tested at all, whenever the memcpy backend (which has higher priority) could serve the request. This change makes it so that each backend is explicitly tested against only implementations provided by that same backend. checkasm: all 1575 tests passed (up from 1305) As an aside, it also lets us benchmark the memcpy backend directly against the C reference backend. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	d5089a1c62	tests/checkasm/sw_ops: don't shadow 'report' Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	3c1781f931	tests/checkasm/sw_ops: separate op compilation from testing This commit is purely moving around code; there is no functional change. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	e83de76f08	tests/checkasm/sw_ops: check all planes in CHECK_COMMON() This can help e.g. properly test that the masked/excluded components are left unmodified. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	eac90ce6ce	tests/checkasm/sw_ops: set correct plane index order All four components were accidentally being read/written to/from the same plane. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Niklas Haas and Niklas Haas	590eb4b70d	tests/checkasm/sw_ops: remove some unnecessary checks These don't actually exist at runtime, and will soon be removed from the backends as well. This commit is intentionally a bit incomplete; as I will rewrite this based on the auto-generated macros in the upcoming ops_micro series. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-15 14:51:16 +00:00
Andreas Rheinhardt	338dc25642	avcodec/x86/snowdsp_init: Remove MMXEXT, SSE2 inner_add_yblock versions They have been superseded by SSSE3; the SSE2 version was even disabled (and segfaults if enabled). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-13 12:53:17 +02:00
Andreas Rheinhardt	2fdccaf7d6	tests/checkasm/mpegvideo_unquantize: Fix precedence problem Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-13 12:51:35 +02:00
