I want to start adding more data layouts, like semiplanar formats (nv12), or
palette formats. I made an effort to distinguish existing checks for rw.packed
into "mode != PLANAR" and "mode == PACKED", based on the intent of the
surrounding code, in anticipation of these new layouts.
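For illustration only, a minimal sketch of why the two checks diverge once a
third layout exists. The enum, its constants, and both helper names are
hypothetical and not taken from the actual code; the reading of each check's
intent is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout enum; the real field and constant names may differ. */
typedef enum LayoutMode {
    LAYOUT_PLANAR,     /* one component per plane */
    LAYOUT_PACKED,     /* all components interleaved in a single plane */
    LAYOUT_SEMIPLANAR, /* e.g. nv12: luma plane plus interleaved chroma plane */
} LayoutMode;

/* For code that only cares whether any interleaving exists at all.
 * Assumed intent of the checks rewritten as "mode != PLANAR". */
static bool has_interleaved_data(LayoutMode mode)
{
    return mode != LAYOUT_PLANAR;
}

/* For code that relies on every component living in one plane.
 * Assumed intent of the checks rewritten as "mode == PACKED". */
static bool is_fully_packed(LayoutMode mode)
{
    return mode == LAYOUT_PACKED;
}
```

With only PLANAR and PACKED the two helpers always agree; for a semiplanar
layout the first returns true and the second returns false.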
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
See previous commit for justification. I decided to split these
refactors up into several independent commits to make it easier
to review and bisect, since they are all independent atomic changes.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of hard-coding SWS_PIXEL_F32 here. This is not really useful
yet, but I wanted to clean up the semantics here regardless.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a minor cosmetic improvement that allows me to use more
convenient names for filter-related metadata fields, without
confusion.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Rather than hard-coding a separate set of NASM macros, or generating them
with a separate function, we can just leverage the C preprocessor to generate
a NASM source file *from* the existing ops macros.
This is maybe a bit unorthodox, but it avoids unnecessary overhead from
re-generating the macros twice, avoids manual updating of the NASM macros,
and generally does not come with any real downside except being a bit ugly.
The main source of ugliness is the fact that the C preprocessor expands
everything into a single line, whereas NASM expects separate statements to
be on separate lines. Very fortunately, we can work around this by writing
another NASM macro to take its arguments and dump them onto multiple lines.
It may seem premature, but I went ahead and defined all the macros, since
it was easy enough to do.
I added the %include in this commit so that any build errors caused by
introducing this file show up in the same commit that introduces it.
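As a rough illustration of the single-source idea, here is a sketch in C
rather than NASM. The op list and all names in it are made up for the example
and are not the real ops macros: one list macro is expanded several times with
different per-entry definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical op list; the real ops macros carry more fields per entry. */
#define EXAMPLE_OPS(X) \
    X(READ)            \
    X(WRITE)           \
    X(SWIZZLE)

/* Expansion 1: an enum with one entry per op, plus a trailing count. */
#define AS_ENUM(name) EXAMPLE_OP_##name,
enum ExampleOp { EXAMPLE_OPS(AS_ENUM) EXAMPLE_OP_NB };
#undef AS_ENUM

/* Expansion 2: a table of printable names, in the same order as the enum. */
#define AS_STRING(name) #name,
static const char *const example_op_names[] = { EXAMPLE_OPS(AS_STRING) };
#undef AS_STRING
```

The preprocessor emits each expansion as one long line. That is harmless in C,
and it is the behaviour that the line-splitting NASM macro works around.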
Signed-off-by: Niklas Haas <git@haasn.dev>
The ops.h infrastructure currently hard-codes this as SWS_PIXEL_F32,
but I want to at least properly parametrize this in case we ever
decide to revisit this decision in the future. In particular, it
may become relevant for trivial kernels or kernels whose intermediates
are bounded, exact integers (which could possibly be output directly
as e.g. U16 or U32).
The FATE change is just because the filter op names gained a suffix.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This ensures 100% coverage of all uop primitives by generating the set of
tests exactly from the list of seen primitives, using the uops macros.
There are still some annoying quirks, because we essentially have to
"untranslate" the uops back into SwsOps that translate to the intended uop
again, but overall it's not too bad and still much better than the status quo
of hand-rolling the list of test cases.
Signed-off-by: Niklas Haas <git@haasn.dev>
This follows the same approach as is used currently by ops_entries_aarch64,
except I decided to have the generation logic live directly in uops.c
to allow re-using internal helpers and move it closer to the other helpers
that depend on the exact set of uops and their fields.
Unlike libswscale/tests/sws_ops.c, we make an effort to actually test all
relevant flag combinations, since these can affect the generated op lists.
I will use these macros to auto-generate both the C template-based kernels,
as well as the entire x86 backend, in the near future, hence their excessive
flexibility.
Re-use the libswscale/tests/sws_ops.c that we already compile. We could put it
in its own file but this is just as convenient, and it's easily moved anyways.
Having it be a FATE test ensures that it is always up-to-date.
Signed-off-by: Niklas Haas <git@haasn.dev>
Replace plain memcmp+fail() with checkasm_check_pixel_padded() for
DC, planar, and angular prediction tests. Use PIXEL_RECT for output
buffers instead of flat arrays.
This enables:
- Detailed per-pixel difference output when run with 'checkasm -v'
- Detection of out-of-bounds writes beyond the NxN block area
- Padding violation reporting (writes past block boundary)
Previously, a test failure would only report "FAILED" with no
information about which pixels were wrong, making assembly debugging
difficult. Follows the pattern established in 4d4b301e4a (checkasm:
hevc_pel: Use helpers for checking for writes out of bounds).
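The idea behind padded checks can be sketched as follows. This is not the
checkasm implementation; PAD, GUARD, fill_guard() and check_padding() are
hypothetical names used only for this example. The output block sits inside a
larger buffer whose border is filled with a sentinel, and any border byte that
changed is reported with its position relative to the block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAD   8    /* guard pixels on every side of the block */
#define GUARD 0xAA /* sentinel value stored in the padding */

/* Fill a (w + 2*PAD) x (h + 2*PAD) buffer with the sentinel value. */
static void fill_guard(uint8_t *buf, int w, int h)
{
    memset(buf, GUARD, (size_t)(w + 2 * PAD) * (size_t)(h + 2 * PAD));
}

/* Return 0 if the padding is intact. Otherwise return 1 and store the
 * block-relative position of the first overwritten padding byte. */
static int check_padding(const uint8_t *buf, int w, int h,
                         int *bad_x, int *bad_y)
{
    int stride = w + 2 * PAD;

    for (int y = 0; y < h + 2 * PAD; y++) {
        for (int x = 0; x < stride; x++) {
            int inside = x >= PAD && x < PAD + w &&
                         y >= PAD && y < PAD + h;
            if (!inside && buf[y * stride + x] != GUARD) {
                *bad_x = x - PAD;
                *bad_y = y - PAD;
                return 1;
            }
        }
    }
    return 0;
}
```

A stray write that happens to store the sentinel value itself goes unnoticed
by this simple scheme.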
Suggested-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The current approach of re-testing the C reference for every backend
separately leads to both confusing output (e.g. having an extra redundant
`memcpy_c` line for every op, even those not implemented by the memcpy
backend), as well as a lot of unnecessary wasted time re-testing and
re-benching the same C variant for every backend.
This new API function lets us test the C function only a single time, while
simultaneously having all of the other backends implicitly compare themselves
against the C reference.
Signed-off-by: Niklas Haas <git@haasn.dev>
The fate test for unscaled conversions (fate-sws-unscaled) does not
test the filtering (scaling) paths.
This commit adds a test for all the scaling paths for the new swscale
code, but only runs 2% of the tests (otherwise this test alone would
take about two and a half minutes on a modern x86_64 machine).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This is more explicit than -flags unstable, because it also excludes
any pixel formats that are only handled by the legacy code.
Sponsored-by: Sovereign Tech Fund
Co-authored-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Commit 4569ab7eaa tried to set this
only on the object files for the checkasm library itself, but
missed that EXT_CHECKASMOBJS lacks the path prefix, thus this
wasn't set at all.
Alternatively, for simplicity, we could keep passing this for
all checkasm object files, not only the checkasm library objects;
the other object files don't use it in any case.
Fixes a stack overflow on Windows, where the default stack size is 1 MB.
Individually those functions fit, but when they are all inlined, it's
too much.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The -unscaled parameter has been removed in favour of "-scaler none".
Some legacy scalers cannot be selected with these options (namely SWS_X
and SWS_FAST_BILINEAR). To test these, the -flags parameter should be
used instead.
This option sets the scaler/scaler_sub fields in SwsContext. There is a
comment about these fields in struct SwsContext:
Note: Does not affect the legacy (stateful) API.
This comment is not entirely correct, since scaler/scaler_sub are taken
into consideration to select the algorithm, but that doesn't update the
flags field, which is still used to select implementations:
libswscale/x86/swscale.c:574: if (c->opts.flags & SWS_FAST_BILINEAR && c->canMMXEXTBeUsed) {
libswscale/ppc/swscale_vsx.c:2033: if (c->opts.flags & SWS_FAST_BILINEAR && c->opts.dst_w >= c->opts.src_w && c->chrDstW >= c->chrSrcW) {
libswscale/swscale_unscaled.c:2465: && (!needsDither || (c->opts.flags&(SWS_FAST_BILINEAR|SWS_POINT))))
libswscale/swscale_unscaled.c:2650: if (c->opts.flags&(SWS_FAST_BILINEAR|SWS_POINT)) {
libswscale/utils.c:1279: && !(sws->flags & SWS_FAST_BILINEAR)
libswscale/utils.c:1388: (flags & SWS_FAST_BILINEAR)))
libswscale/utils.c:1417: && (flags & SWS_FAST_BILINEAR)) {
libswscale/utils.c:1437: if (flags & SWS_FAST_BILINEAR) {
libswscale/utils.c:1648: if (c->canMMXEXTBeUsed && (flags & SWS_FAST_BILINEAR)) {
libswscale/swscale.c:678: if (c->opts.flags & SWS_FAST_BILINEAR) {
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This is required for overriding defines that exist in the public
headers of checkasm, when e.g. building with assembly disabled
for an architecture where we normally would use the checked_call
wrapper.
This fixes a leftover in how checkasm is integrated into the
ffmpeg build system; there were many different approaches
considered for fixing --disable-asm, and the ffmpeg configure
integration didn't end up matching the final solution.
This fixes building with --disable-asm.
The checkasm tool originated in x264. It was later rewritten and
modernized for FFmpeg (and relicensed to LGPL). For the dav1d
project, it was relicensed again to 2-clause BSD (with permission
from the relevant authors).
The FFmpeg and dav1d implementations of checkasm have since evolved
independently (with some amount of ported code between the two,
with relicensing permission where relevant).
To synchronize the development, and to make it possible to easily
adopt checkasm in other projects, it has been split out into a
standalone project/library on its own, developed at
https://code.videolan.org/videolan/checkasm/.
That version has all the features of checkasm in both FFmpeg and
dav1d, and has got a number of extra improvements on top:
- More/fixed tests (e.g. properly clobbering high bits of 32-bit registers
on most platforms),
- Vastly improved overall performance / runtime for benchmarking, due
primarily to the ability to scale the runtime of each test to that test's
complexity.
- Much more robust statistical analysis of benchmarking results; including
robust outlier rejection, an estimation of the histogram, and the ability
to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for
failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer
implementations on ARM platforms, a better RISC-V and AArch64 harness,
and more.
On AArch64, it tests which timer out of pmccntr_el0, linux perf,
macos kperf, cntvct_el0 is available, without the user needing to
configure things, and falling back on clock_gettime if none of
them can be used. This means one automatically gets the best
available timer, if userspace access to pmccntr_el0 has been
unlocked with a kernel module, or if one has permission to use
the perf API, or if the cntvct_el0 is exact enough to be useful.
On AArch64 macOS, there is now a test harness that catches clobbered
registers and stack clobbering, like on other platforms.
- An option for setting affinity, for benchmarking on heterogeneous
core systems. (On Linux, this is already easily done through
taskset, but on Windows, the checkasm built-in option makes it
possible there as well, and portable.)
- Printing of the tested CPU core name, where possible.
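The "(trimmed) mean" mentioned in the list above can be sketched as follows.
This is only a textbook version of the technique and not the estimator that
checkasm uses; trimmed_mean() and cmp_double() are names made up for this
example.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place, drop `trim` samples from each end, and
 * return the mean of what remains. Requires 2 * trim < n. */
static double trimmed_mean(double *samples, size_t n, size_t trim)
{
    double sum = 0.0;

    assert(2 * trim < n);
    qsort(samples, n, sizeof(*samples), cmp_double);
    for (size_t i = trim; i < n - trim; i++)
        sum += samples[i];
    return sum / (double)(n - 2 * trim);
}
```

Dropping the extremes keeps a single very slow run (e.g. one interrupted by
the scheduler) from dominating the reported timing.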
To integrate this external implementation of checkasm into FFmpeg,
without having to build libcheckasm as an external library, the upstream
sources are added as a git subtree, and integrated into the FFmpeg
build system as a foreign source.
For the long and storied history of how we arrived at this solution,
see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546
The relevant config headers for checkasm are generated by configure,
and the sources are built as part of the main ffmpeg build. The
upstream sources, while they use meson as primary build system,
are structured to make it easy to build as part of a foreign build
system.
The existing testcases are mostly kept untouched (only three minor
changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the
majority of the logic from checkasm.c, checkasm.h and the arch
specific assembly files are removed, replaced with the external
implementation.
Co-Authored-By: Martin Storsjö <martin@martin.st>
Signed-off-by: Niklas Haas <git@haasn.dev>
To reproduce this commit, run:
$ git subtree add --squash --prefix=tests/checkasm/ext \
https://code.ffmpeg.org/FFmpeg/checkasm.git master
To update at a later point in time, replace `add` with `pull`.
Pre-emptively exclude the external checkasm sources. Split off from the
following merge commit to make the history easier to follow.
Signed-off-by: Niklas Haas <git@haasn.dev>
Not only is this duplicating code, but it also hard-codes a reference to
`checkasm_lfg`, which I want to eliminate in the interest of being able to
switch out the checkasm implementation.
The test data size is quite large, so re-setting up unused data is eating up
quite a significant amount of CPU time.
This commit cuts execution time of sw_ops in half.
Signed-off-by: Niklas Haas <git@haasn.dev>
Outputting an UNSPEC layout will make most callers guess the speaker layout, and
more likely than not get it wrong.
Now that we can freely export custom order layouts, let's use them.
Signed-off-by: James Almer <jamrial@gmail.com>
The heuristics run to detect PES streams are much laxer than mp3/ac3 ones,
which check for valid headers, so it should not have a higher score than the
latter.
Fixes misdetection of some mp3 files with big id3v2 tags at the beginning.
Signed-off-by: James Almer <jamrial@gmail.com>
When AV_PKT_DATA_HEVC_CONF is present on an HEVC track, write
an hvcE BlockAdditionMapping alongside the existing dvcC/dvvC one,
carrying the raw HEVCDecoderConfigurationRecord for the enhancement layer.
Handle MATROSKA_BLOCK_ADD_ID_TYPE_HVCE in mkv_parse_block_addition_mappings
and store the raw HEVCDecoderConfigurationRecord as
AV_PKT_DATA_HEVC_CONF on the stream's coded side data, mirroring
the existing dvcC/dvvC handling.
When AV_PKT_DATA_HEVC_CONF is present on a MODE_MP4 HEVC
track, write it as an hvcE box alongside hvcC and dvcC. Like dvcC,
writing requires -strict unofficial.
The hvcE box carries the HEVCDecoderConfigurationRecord for the Dolby
Vision enhancement layer in ISOM-based containers. Store its raw
contents as AV_PKT_DATA_HEVC_CONF on the stream's coded side data,
mirroring the existing dvcC/dvvC handling.
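For reference, the generic ISO base media box framing (32-bit big-endian size
that includes the 8-byte header, then a four-character type, then the payload)
can be sketched as below. This is not the muxer or demuxer code; write_box()
is a hypothetical helper, and treating the raw configuration record as the
entire payload follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one box into dst. Returns the total box size in bytes, or 0 if
 * the box does not fit into dst or into a 32-bit size field. */
static size_t write_box(uint8_t *dst, size_t dst_size, const char fourcc[4],
                        const uint8_t *payload, size_t payload_size)
{
    size_t total;

    if (payload_size > UINT32_MAX - 8)
        return 0;
    total = 8 + payload_size;
    if (total > dst_size)
        return 0;

    /* Size field: big-endian, counts header and payload. */
    dst[0] = (uint8_t)(total >> 24);
    dst[1] = (uint8_t)(total >> 16);
    dst[2] = (uint8_t)(total >> 8);
    dst[3] = (uint8_t)total;
    memcpy(dst + 4, fourcc, 4);
    if (payload_size)
        memcpy(dst + 8, payload, payload_size);
    return total;
}
```

Reading is the inverse: parse the size and type, then hand the remaining
size - 8 bytes to whatever stores them as side data.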
Should fix buffer overflows as reported by clang-asan and use of uninitialized
values as reported by valgrind.
Signed-off-by: James Almer <jamrial@gmail.com>
Add a CNG (comfort noise) round-trip FATE test using the existing enc_dec_pcm +
framemd5 pattern, and include its generated reference output.
Also add a second test that compares the MD5 of the encoded stream.
Tested on x86-32, x86-64, arm and mips qemu.
Co-Authored-with: AI