dav1d

10 Commits

Author	SHA1	Message	Date
Niklas Haas	3a2a874994	tests/checkasm: switch to external checkasm There are a number of benefits tied to the upstream / third-party checkasm version, including: - Improved long-term maintainability, code reuse with other projects, etc. - Vastly improved overall performance / runtime for benchmarking, due primarily to the ability to scale the runtime of each test to that test's complexity. - Much more robust statistical analysis of benchmarking results; including robust outlier rejection, an estimation of the histogram, and the ability to report the variance / stddev in addition to the (trimmed) mean. - Interactive HTML and JSON output formats in addition to CSV/TSV. - More readable and user-friendly output across the board, especially for failures and data dumps (e.g. also showing errors inside padding bytes). - Better cross-platform support, including dynamic fallback of timer implementations on ARM platforms, a better RISC-V harness, and more. There are multiple approaches to how we can solve the problem of integrating this third party checkasm into dav1d, but I think the hybrid approach of loading it as an external dependency, falling back to a meson wrap file, provides the best overall compromise. This avoids the messiness of e.g. git submodules, while still allowing us to pin individual tags.	2026-01-01 17:33:55 +01:00
Niklas Haas	3374404179	tests/checkasm/loopfilter: avoid printf format warning Upstream checkasm adds a printf format attribute to report(), so we should avoid directly passing the name string to silence a warning.	2026-01-01 12:29:02 +01:00
Niklas Haas and Henrik Gramner	fcbc3d1b93	loopfilter: align Av1FilterLUT struct members Fixes a bug where the Av1FilterLUT instance used in checkasm was not aligned properly. In theory, the first ALIGN macro should imply the latter alignments as well, but I decided to mark all fields as explicitly aligned for clarity; and because that's the precedent set in other headers. Allows us to drop the ALIGN macro on the other usage of this struct.	2025-10-20 13:50:43 +00:00
Niklas Haas and Luca Barbato	56f6d16602	riscv64/mc: Re-order instructions To avoid read-after-write. Speedup is about 1% for width=4 on a K230.	2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato	3d12677c54	riscv64/mc: Add bidir functions This code compromises between the performance of a dedicated kernel per VLEN/width pair, and the flexibility of a fully VLEN-dynamic loop, by using a single special case for w=4, and subdividing the rest into the unrolled four line fast path, and the general-purpose slow path (for large width on small VLEN). Kendryte K230 avg_w4_8bpc_c: 346.8 ( 1.00x) avg_w4_8bpc_rvv: 50.3 ( 6.90x) avg_w8_8bpc_c: 1054.9 ( 1.00x) avg_w8_8bpc_rvv: 139.1 ( 7.58x) avg_w16_8bpc_c: 3396.3 ( 1.00x) avg_w16_8bpc_rvv: 350.6 ( 9.69x) avg_w32_8bpc_c: 13734.3 ( 1.00x) avg_w32_8bpc_rvv: 1226.3 (11.20x) avg_w64_8bpc_c: 33260.9 ( 1.00x) avg_w64_8bpc_rvv: 3869.4 ( 8.60x) avg_w128_8bpc_c: 83441.3 ( 1.00x) avg_w128_8bpc_rvv: 9765.1 ( 8.54x) w_avg_w4_8bpc_c: 444.3 ( 1.00x) w_avg_w4_8bpc_rvv: 75.8 ( 5.86x) w_avg_w8_8bpc_c: 1365.6 ( 1.00x) w_avg_w8_8bpc_rvv: 208.8 ( 6.54x) w_avg_w16_8bpc_c: 4420.8 ( 1.00x) w_avg_w16_8bpc_rvv: 570.7 ( 7.75x) w_avg_w32_8bpc_c: 18010.9 ( 1.00x) w_avg_w32_8bpc_rvv: 2074.4 ( 8.68x) w_avg_w64_8bpc_c: 43050.4 ( 1.00x) w_avg_w64_8bpc_rvv: 5799.5 ( 7.42x) w_avg_w128_8bpc_c: 107153.6 ( 1.00x) w_avg_w128_8bpc_rvv: 14272.0 ( 7.51x) mask_w4_8bpc_c: 497.6 ( 1.00x) mask_w4_8bpc_rvv: 88.5 ( 5.63x) mask_w8_8bpc_c: 1528.5 ( 1.00x) mask_w8_8bpc_rvv: 253.1 ( 6.04x) mask_w16_8bpc_c: 4953.8 ( 1.00x) mask_w16_8bpc_rvv: 679.0 ( 7.30x) mask_w32_8bpc_c: 20298.3 ( 1.00x) mask_w32_8bpc_rvv: 3012.9 ( 6.74x) mask_w64_8bpc_c: 49718.8 ( 1.00x) mask_w64_8bpc_rvv: 7291.7 ( 6.82x) mask_w128_8bpc_c: 126740.3 ( 1.00x) mask_w128_8bpc_rvv: 18351.1 ( 6.91x)	2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato	50ac82603a	riscv: Add $vtype helper definitions	2024-10-09 16:18:42 +02:00
Niklas Haas	e58afe4dd9	Don't hard-code FGS block size Avoiding this hard-coded round-and-shift allows FGS to continue working when modifying FG_BLOCK_SIZE (for whatever reason), and is better style (no magic constants).	2023-07-25 16:10:07 +02:00
Niklas Haas	202f68e4d0	Rename BLOCK_SIZE to FG_BLOCK_SIZE Makes this (globally available) constant more descriptive.	2023-07-25 16:08:51 +02:00
Niklas Haas	2a18394511	Expose dav1d_apply_grain as part of the public API This change is motivated by a desire to be able to toggle between CPU and GPU film grain synthesis in players such as VLC. Because VLC initializes the codec before the vout (and, indeed, the active vout module may change in the middle of decoding), it cannot make the decision of whether to apply film grain in libdav1d as part of codec initialization. It needs to be decided on a frame-by-frame basis depending on whether the currently active vout supports film grain synthesis or not. Using the new API, users like VLC can simply set `apply_grain` to 0 and then manually call `dav1d_apply_grain` whenever the vout does not support GPU film grain synthesis. As a side note, `dav1d_apply_grain` could also technically be called from dedicated worker threads, something that libdav1d does not currently do internally. The alternative to this solution would have been to allow changing Dav1dSettings at runtime, but that would be more invasive and a proper API would also need to take other settings into consideration, some of which can't be changed as easily as `apply_grain`. This commit represents a stop-gap solution. Bump the minor version to allow clients to depend on this API.	2022-01-01 17:23:28 +01:00
Niklas Haas	7048ed6218	dav1dplay: Suppress compiler warning The signature of pl_allocate/release_dav1dpic takes a void *cookie, which the compiler warns about if we don't explicitly cast.	2021-10-31 13:18:22 +01:00
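
Commit e58afe4dd9 replaces a hard-coded round-and-shift with an expression derived from FG_BLOCK_SIZE. The sketch below illustrates the general idea only and is not taken from the dav1d sources: it assumes FG_BLOCK_SIZE is 32, and the helper names `rows_hardcoded` and `rows_derived` are hypothetical.

```c
#include <assert.h>

/* Assumed value for illustration; the real constant lives in dav1d's headers. */
#define FG_BLOCK_SIZE 32

/* Hard-coded form: the magic constants 31 and 5 silently encode a
 * block size of 32, so the result is wrong for any other block size. */
static int rows_hardcoded(int height) {
    return (height + 31) >> 5;
}

/* Derived form: rounds height up to a whole number of blocks and
 * keeps working if FG_BLOCK_SIZE is changed. */
static int rows_derived(int height) {
    return (height + FG_BLOCK_SIZE - 1) / FG_BLOCK_SIZE;
}
```

With the assumed block size of 32, both helpers return the same value for every positive height, for example 34 rows for a height of 1080. Only the derived form stays correct if the constant changes.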