There are a number of benefits tied to the upstream / third-party checkasm
version, including:
- Improved long-term maintainability, code reuse with other projects, etc.
- Vastly improved overall performance / runtime for benchmarking, due
primarily to the ability to scale the runtime of each test to that test's
complexity.
- Much more robust statistical analysis of benchmarking results, including
robust outlier rejection, an estimation of the histogram, and the ability
to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for
failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer
implementations on ARM platforms, a better RISC-V harness, and more.
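To illustrate the "(trimmed) mean" and "variance / stddev" mentioned in the
list above, here is a minimal sketch of a trimmed mean over timing samples.
It is a generic textbook version, not checkasm's actual algorithm; the names
`SampleStats` and `trimmed_stats` and the `cut` parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type: mean and sample variance of the kept samples. */
typedef struct {
    double mean;
    double variance; /* stddev is the square root of this value */
} SampleStats;

static int cmp_double(const void *a, const void *b) {
    const double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place, drops the `cut` smallest and `cut` largest
 * values as outliers, and returns the mean and variance of the rest. */
static SampleStats trimmed_stats(double *samples, size_t n, size_t cut) {
    assert(n > 0 && 2 * cut < n);
    qsort(samples, n, sizeof(*samples), cmp_double);

    const size_t kept = n - 2 * cut;
    double sum = 0.0;
    for (size_t i = cut; i < n - cut; i++)
        sum += samples[i];
    const double mean = sum / (double)kept;

    double sq = 0.0;
    for (size_t i = cut; i < n - cut; i++) {
        const double d = samples[i] - mean;
        sq += d * d;
    }
    SampleStats s = { mean, kept > 1 ? sq / (double)(kept - 1) : 0.0 };
    return s;
}
```

With one sample trimmed from each end, a single large outlier (for example
a measurement disturbed by an interrupt) no longer affects the reported mean.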
There are multiple approaches to integrating this third-party checkasm
into dav1d, but I think the hybrid approach of loading it as an external
dependency, falling back to a meson wrap file, provides the best overall
compromise. This avoids the messiness of e.g. git submodules, while still
allowing us to pin individual tags.
On AArch64, the performance counter registers usually are
restricted and not accessible from user space.
On macOS, we currently use mach_absolute_time() as the timer on
aarch64. This measures wallclock time, but with a very coarse
resolution.
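For reference, reading this kind of wallclock timer can be sketched as
follows. This is an illustration only, not the checkasm code; the helper
name `wallclock_ns` is hypothetical, and the clock_gettime() branch is an
assumption added so the sketch also builds on non-Apple systems.

```c
#ifdef __APPLE__
#include <mach/mach_time.h>
#else
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>
#endif
#include <stdint.h>

/* Returns a monotonic wallclock timestamp in nanoseconds. */
static uint64_t wallclock_ns(void) {
#ifdef __APPLE__
    /* mach_absolute_time() counts in timer ticks; the timebase gives the
     * numer/denom ratio that converts ticks to nanoseconds. The multiply
     * can overflow for very large tick counts, which this sketch ignores. */
    static mach_timebase_info_data_t tb;
    if (tb.denom == 0)
        mach_timebase_info(&tb);
    return mach_absolute_time() * tb.numer / tb.denom;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

A benchmark would call this before and after the measured function and
report the difference, so the timer's tick length bounds the precision of
every reported number.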
There is a private API, kperf, that one can use for getting
high precision timers though. Unfortunately, it requires running
the checkasm binary as root (e.g. with sudo).
Also, as it is a private, undocumented API, it can potentially
change at any time.
This is handled by adding a new meson build option, for switching
to this timer. If the timer source in checkasm could be changed
at runtime with an option, this wouldn't need to be a build time
option.
This allows getting benchmarks like this:
mc_8tap_regular_w16_hv_8bpc_c: 1522.1 ( 1.00x)
mc_8tap_regular_w16_hv_8bpc_neon: 331.8 ( 4.59x)
Instead of this:
mc_8tap_regular_w16_hv_8bpc_c: 9.0 ( 1.00x)
mc_8tap_regular_w16_hv_8bpc_neon: 1.9 ( 4.76x)
Co-authored-by: J. Dekker <jdek@itanimul.li>
When compiling with asm enabled there's no point in compiling
C versions of DSP functions that have asm implementations using
instruction sets that the compiler can unconditionally use.
E.g. when compiling with -mssse3 we can remove the C version
of all functions with SSSE3 implementations.
This is accomplished using the compiler's dead code elimination
functionality.
Can be configured using the new 'trim_dsp' meson option, which
by default is enabled when compiling in release mode.
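The idea can be sketched roughly as below. This is a simplified
illustration, not dav1d's actual code: all `hypo_`/`HYPO_` names and the
`sum_*` functions are hypothetical, and the "SSSE3" function is a plain C
stand-in for an asm implementation.

```c
#include <stdint.h>

enum { HYPO_CPU_FLAG_SSSE3 = 1 << 0 }; /* hypothetical flag name */

/* Flags the compiler may use unconditionally; -mssse3 defines __SSSE3__. */
#if defined(__SSSE3__)
#define HYPO_COMPILE_TIME_FLAGS HYPO_CPU_FLAG_SSSE3
#else
#define HYPO_COMPILE_TIME_FLAGS 0
#endif

typedef int (*sum_fn)(const uint8_t *buf, int n);

/* C reference version. */
static int sum_c(const uint8_t *buf, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += buf[i];
    return s;
}

/* Stand-in for an asm implementation of the same function. */
static int sum_ssse3(const uint8_t *buf, int n) {
    int s = 0;
    while (n-- > 0) s += *buf++;
    return s;
}

/* Would be filled in by cpuid-style detection in real code. */
static unsigned hypo_runtime_cpu_flags = 0;

static inline unsigned hypo_get_cpu_flags(void) {
    /* OR-ing in the compile-time flags turns the check below into a
     * constant when the instruction set is guaranteed by the compiler. */
    return hypo_runtime_cpu_flags | HYPO_COMPILE_TIME_FLAGS;
}

static sum_fn hypo_dsp_init(void) {
    sum_fn fn = sum_c;
    if (hypo_get_cpu_flags() & HYPO_CPU_FLAG_SSSE3)
        fn = sum_ssse3; /* always taken when built with -mssse3 */
    return fn;
}
```

When built with -mssse3, the condition is known at compile time, the
assignment of `sum_c` becomes a dead store, and an optimizing compiler is
free to drop the now unreferenced static C function from the binary.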
The required 'xxhash.h' header can either be in system include directory
or can be copied to 'tools/output'.
The xxh3_128bits based muxer shows no significant slowdown compared to
the null muxer. Decoding times for Chimera-AV1-8bit-1920x1080-6736kbps.ivf
with 4 frame and 4 tile threads on a Core i7-8550U (turbo boost disabled):
null: 72.5 s
md5: 99.8 s
xxh3: 73.8 s
Decoding Chimera-AV1-10bit-1920x1080-6191kbps.ivf with 6 frame and 4 tile
threads on an M1 Mac mini:
null: 27.8 s
md5: 105.9 s
xxh3: 28.3 s
Needed for oss-fuzz after switching to '-fsanitize=fuzzer' for the
libfuzzer based build. Adding '-fsanitize=fuzzer' for all oss-fuzz based
builds breaks AFL.
Replaces the boolean 'build_libfuzzer' meson option with 'fuzzing_engine'.
This allows reproducing fuzzing test cases on systems without libfuzzer.
Also prevents regressions in the fuzzing test target since it will be
built by default.
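Reproducing a test case without libfuzzer typically means linking the fuzz
target against a small driver that feeds one file to the entry point. A
minimal sketch, assuming the standard LLVMFuzzerTestOneInput() entry point;
the helper name `run_test_case` is hypothetical and the target body here is
only a placeholder:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Standard libFuzzer entry point. The real target would feed the data
 * to the decoder; this placeholder accepts any input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    return 0;
}

/* Hypothetical driver helper: reads the whole file at `path` and passes
 * it to the fuzz target once. Returns -1 on I/O or allocation errors. */
static int run_test_case(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    const long len = ftell(f);
    if (len < 0) { fclose(f); return -1; }
    rewind(f);

    uint8_t *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return -1; }

    const size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);

    const int ret = got == (size_t)len ? LLVMFuzzerTestOneInput(buf, got) : -1;
    free(buf);
    return ret;
}
```

A standalone driver's main() would call this helper for each path given on
the command line, so a crashing input can be replayed under a debugger.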
Disabled by default, enable with `meson -Dbuild_libfuzzer=true -Db_lundef=false ...`.
Fuzz target improved by the parallel work by Thierry Foucu in !138.