10 Commits
Author SHA1 Message Date
Niklas Haas 3a2a874994 tests/checkasm: switch to external checkasm
There are a number of benefits tied to the upstream / third-party checkasm
version, including:

- Improved long-term maintainability, code reuse with other projects, etc.

- Vastly improved overall performance / runtime for benchmarking, due
  primarily to the ability to scale the runtime of each test to that test's
  complexity.

- Much more robust statistical analysis of benchmarking results, including
  robust outlier rejection, an estimation of the histogram, and the ability
  to report the variance / stddev in addition to the (trimmed) mean.

- Interactive HTML and JSON output formats in addition to CSV/TSV.

- More readable and user-friendly output across the board, especially for
  failures and data dumps (e.g. also showing errors inside padding bytes).

- Better cross-platform support, including dynamic fallback of timer
  implementations on ARM platforms, a better RISC-V harness, and more.

There are multiple approaches to how we can solve the problem of integrating
this third party checkasm into dav1d, but I think the hybrid approach of
loading it as an external dependency, falling back to a meson wrap file,
provides the best overall compromise. This avoids the messiness of e.g.
git submodules, while still allowing us to pin individual tags.
2026-01-01 17:33:55 +01:00
Niklas Haas 3374404179 tests/checkasm/loopfilter: avoid printf format warning
Upstream checkasm adds a printf format attribute to report(), so we should
avoid directly passing the name string to silence a warning.
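A minimal sketch of the pattern this message describes, assuming a GCC/Clang-style format attribute. `report_sketch()`, `report_name()`, `last_report` and `PRINTF_FMT` are hypothetical stand-ins for illustration, not checkasm's actual declarations:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper macro: expands to the printf format attribute on
 * GCC/Clang and to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_FMT(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define PRINTF_FMT(fmt, args)
#endif

/* Buffer capturing the last message, so the result can be inspected. */
static char last_report[64];

/* Hypothetical stand-in for a report() function carrying the attribute. */
PRINTF_FMT(1, 2)
static int report_sketch(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    const int n = vsnprintf(last_report, sizeof(last_report), fmt, ap);
    va_end(ap);
    return n;
}

static int report_name(const char *name)
{
    /* Calling report_sketch(name) directly may draw a format warning,
     * since the format string is not a literal, and any '%' inside
     * name would be interpreted as a conversion. Passing the name as
     * an argument to "%s" avoids both problems. */
    return report_sketch("%s", name);
}
```

With this wrapper, a name such as `"100%d"` is reported verbatim instead of being parsed as a format string.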
2026-01-01 12:29:02 +01:00
Niklas Haas and Henrik Gramner fcbc3d1b93 loopfilter: align Av1FilterLUT struct members
Fixes a bug where the Av1FilterLUT instance used in checkasm was not
aligned properly.

In theory, the first ALIGN macro should imply the latter alignments as well,
but I decided to mark all fields as explicitly aligned for clarity, and
because that's the precedent set in other headers.

Allows us to drop the ALIGN macro on the other usage of this struct.
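As a rough illustration of why member-level alignment removes the need for alignment at the point of use, here is a sketch using C11 `alignas` rather than dav1d's ALIGN macro. `FilterLUTSketch`, its field names and its sizes are assumptions for this example, not the real Av1FilterLUT layout:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: every member is explicitly aligned, so the type
 * itself inherits a 16-byte alignment requirement. */
typedef struct FilterLUTSketch {
    alignas(16) uint8_t e[64];
    alignas(16) uint8_t i[64];
    alignas(16) uint64_t sharp[2];
} FilterLUTSketch;

/* The strictest member alignment becomes the alignment of the struct. */
static_assert(alignof(FilterLUTSketch) == 16,
              "struct must be 16-byte aligned");
static_assert(offsetof(FilterLUTSketch, i) % 16 == 0,
              "member i must start on a 16-byte boundary");
static_assert(offsetof(FilterLUTSketch, sharp) % 16 == 0,
              "member sharp must start on a 16-byte boundary");

/* Returns 1 if the given instance sits on a 16-byte boundary. */
static int lut_is_aligned(const FilterLUTSketch *lut)
{
    return ((uintptr_t) lut % 16) == 0;
}
```

Because the requirement is part of the type, any instance (static, on the stack, or embedded in another struct) is aligned without a separate annotation where it is declared.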
2025-10-20 13:50:43 +00:00
Niklas Haas and Luca Barbato 56f6d16602 riscv64/mc: Re-order instructions
To avoid read-after-write. Speedup is about 1% for width=4 on a K230.
2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato 3d12677c54 riscv64/mc: Add bidir functions
This code compromises between the performance of a dedicated kernel per
VLEN/width pair, and the flexibility of a fully VLEN-dynamic loop, by
using a single special case for w=4, and subdividing the rest into the
unrolled four line fast path, and the general-purpose slow path (for
large width on small VLEN).

Kendryte K230

avg_w4_8bpc_c:          346.8 ( 1.00x)
avg_w4_8bpc_rvv:         50.3 ( 6.90x)
avg_w8_8bpc_c:         1054.9 ( 1.00x)
avg_w8_8bpc_rvv:        139.1 ( 7.58x)
avg_w16_8bpc_c:        3396.3 ( 1.00x)
avg_w16_8bpc_rvv:       350.6 ( 9.69x)
avg_w32_8bpc_c:       13734.3 ( 1.00x)
avg_w32_8bpc_rvv:      1226.3 (11.20x)
avg_w64_8bpc_c:       33260.9 ( 1.00x)
avg_w64_8bpc_rvv:      3869.4 ( 8.60x)
avg_w128_8bpc_c:      83441.3 ( 1.00x)
avg_w128_8bpc_rvv:     9765.1 ( 8.54x)

w_avg_w4_8bpc_c:        444.3 ( 1.00x)
w_avg_w4_8bpc_rvv:       75.8 ( 5.86x)
w_avg_w8_8bpc_c:       1365.6 ( 1.00x)
w_avg_w8_8bpc_rvv:      208.8 ( 6.54x)
w_avg_w16_8bpc_c:      4420.8 ( 1.00x)
w_avg_w16_8bpc_rvv:     570.7 ( 7.75x)
w_avg_w32_8bpc_c:     18010.9 ( 1.00x)
w_avg_w32_8bpc_rvv:    2074.4 ( 8.68x)
w_avg_w64_8bpc_c:     43050.4 ( 1.00x)
w_avg_w64_8bpc_rvv:    5799.5 ( 7.42x)
w_avg_w128_8bpc_c:   107153.6 ( 1.00x)
w_avg_w128_8bpc_rvv:  14272.0 ( 7.51x)

mask_w4_8bpc_c:        497.6 ( 1.00x)
mask_w4_8bpc_rvv:       88.5 ( 5.63x)
mask_w8_8bpc_c:       1528.5 ( 1.00x)
mask_w8_8bpc_rvv:      253.1 ( 6.04x)
mask_w16_8bpc_c:      4953.8 ( 1.00x)
mask_w16_8bpc_rvv:     679.0 ( 7.30x)
mask_w32_8bpc_c:     20298.3 ( 1.00x)
mask_w32_8bpc_rvv:    3012.9 ( 6.74x)
mask_w64_8bpc_c:     49718.8 ( 1.00x)
mask_w64_8bpc_rvv:    7291.7 ( 6.82x)
mask_w128_8bpc_c:   126740.3 ( 1.00x)
mask_w128_8bpc_rvv:  18351.1 ( 6.91x)
2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato 50ac82603a riscv: Add $vtype helper definitions
2024-10-09 16:18:42 +02:00
Niklas Haas e58afe4dd9 Don't hard-code FGS block size
Avoiding this hard-coded round-and-shift allows FGS to continue working
when modifying FG_BLOCK_SIZE (for whatever reason), and is better style
(no magic constants).
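The difference can be sketched as follows. This is an illustration only: the value 32 for FG_BLOCK_SIZE, the helper names, and the exact expressions are assumptions, since the message does not quote the code that was changed:

```c
#include <assert.h>

/* Assumed value for illustration. */
#define FG_BLOCK_SIZE 32

/* Hard-coded variant: the constants 31 and 5 silently encode a block
 * size of 32 and go wrong if FG_BLOCK_SIZE is ever changed. */
static int fg_rows_hardcoded(int h)
{
    return (h + 31) >> 5;
}

/* Derived variant: rounds up to whole blocks using the named constant,
 * so it keeps working for any positive block size. */
static int fg_rows(int h)
{
    return (h + FG_BLOCK_SIZE - 1) / FG_BLOCK_SIZE;
}
```

Both helpers agree while the block size is 32; only the second one follows a change to the constant.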
2023-07-25 16:10:07 +02:00
Niklas Haas 202f68e4d0 Rename BLOCK_SIZE to FG_BLOCK_SIZE
Makes this (globally available) constant more descriptive.
2023-07-25 16:08:51 +02:00
Niklas Haas 2a18394511 Expose dav1d_apply_grain as part of the public API
This change is motivated by a desire to be able to toggle between CPU
and GPU film grain synthesis in players such as VLC. Because VLC
initializes the codec before the vout (and, indeed, the active vout
module may change in the middle of decoding), it cannot make the
decision of whether to apply film grain in libdav1d as part of codec
initialization. It needs to be decided on a frame-by-frame basis
depending on whether the currently active vout supports film grain
synthesis or not.

Using the new API, users like VLC can simply set `apply_grain` to 0 and
then manually call `dav1d_apply_grain` whenever the vout does not
support GPU film grain synthesis. As a side note, `dav1d_apply_grain`
could also technically be called from dedicated worker threads,
something that libdav1d does not currently do internally.

The alternative to this solution would have been to allow changing
Dav1dSettings at runtime, but that would be more invasive and a proper
API would also need to take other settings into consideration, some of
which can't be changed as easily as `apply_grain`. This commit
represents a stop-gap solution.

Bump the minor version to allow clients to depend on this API.
2022-01-01 17:23:28 +01:00
Niklas Haas 7048ed6218 dav1dplay: Suppress compiler warning
The signature of pl_allocate/release_dav1dpic takes a void *cookie,
which the compiler warns about if we don't explicitly cast.
2021-10-31 13:18:22 +01:00