2799 Commits
Author SHA1 Message Date
Martin Storsjö b2f9c10670 checkasm: Fix building with MSVC
The glue code in our headers, for integrating with the external
checkasm, was incompatible with MSVC.

MSVC has a nonstandard handling of __VA_ARGS__ with macros; when
one macro invokes another macro, __VA_ARGS__ gets treated as one
single parameter and can't map to more than one parameter in the
invoked macro. (In other words, when calling another macro,
__VA_ARGS__ must map in its entirety to a ... parameter of the
other macro.)

Modern versions of MSVC do implement the correct mode as well,
but default to the old one for backwards compatibility. To
choose the new mode, we'd have to build our code with
-Zc:preprocessor. That's certainly doable, but it's fairly easy to
avoid the issue as well.

To avoid this issue, change the variadic PIXEL_RECT(...) to explicitly
name its arguments. There's actually no variability in the arguments
involved here. (Alternatively, we could force the preprocessor to expand
the arguments one extra time, avoiding the issue, with e.g.
"#define EXPAND(x) x" and wrapping PIXEL_RECT with it, e.g.
"#define PIXEL_RECT(...) EXPAND(BUF_RECT(pixel, __VA_ARGS__))".)
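A minimal sketch of the three variants discussed above. SUM3 is a hypothetical fixed-arity macro standing in for BUF_RECT, whose real definition is not shown here. On a conforming preprocessor (or MSVC with -Zc:preprocessor) all three wrappers expand the same way; the traditional MSVC preprocessor passes __VA_ARGS__ as a single argument in SUM_VARIADIC and fails to compile it.

```c
/* Hypothetical fixed-arity macro standing in for BUF_RECT (not dav1d's code). */
#define SUM3(a, b, c) ((a) + (b) + (c))

/* Forwards __VA_ARGS__ straight into a fixed-arity macro. A conforming
 * preprocessor splits "1, 2, 3" into a, b and c; the traditional MSVC
 * preprocessor binds the whole list to a and leaves b and c empty. */
#define SUM_VARIADIC(...) SUM3(__VA_ARGS__)

/* Workaround 1: an extra expansion pass makes legacy MSVC rescan the
 * argument list and split it into separate arguments. */
#define EXPAND(x) x
#define SUM_EXPANDED(...) EXPAND(SUM3(__VA_ARGS__))

/* Workaround 2 (the approach taken for PIXEL_RECT): give the wrapper
 * explicit parameters so no __VA_ARGS__ forwarding is needed at all. */
#define SUM_NAMED(a, b, c) SUM3(a, b, c)
```

Naming the parameters trades flexibility for portability; since PIXEL_RECT always takes the same arguments, nothing is lost.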

See [1], [2] and [3] for more discussion on the matter.

[1] https://stackoverflow.com/a/5134656/3115956
[2] https://stackoverflow.com/a/7459803/3115956
[3] https://learn.microsoft.com/en-us/cpp/preprocessor/preprocessor-experimental-overview?view=msvc-160
2026-01-07 11:57:09 +02:00
Niklas Haas 3a2a874994 tests/checkasm: switch to external checkasm
There are a number of benefits tied to the upstream / third-party checkasm
version, including:

- Improved long-term maintainability, code reuse with other projects, etc.

- Vastly improved overall performance / runtime for benchmarking, due
  primarily to the ability to scale the runtime of each test to that test's
  complexity.

- Much more robust statistical analysis of benchmarking results, including
  robust outlier rejection, an estimate of the histogram, and the ability
  to report the variance / stddev in addition to the (trimmed) mean.

- Interactive HTML and JSON output formats in addition to CSV/TSV.

- More readable and user-friendly output across the board, especially for
  failures and data dumps (e.g. also showing errors inside padding bytes).

- Better cross-platform support, including dynamic fallback of timer
  implementations on ARM platforms, a better RISC-V harness, and more.
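To illustrate what a trimmed mean with a variance estimate means for benchmark samples, here is a minimal sketch. It drops a fixed number of samples from each end after sorting, which is a simpler stand-in and not necessarily the estimator upstream checkasm actually uses; the function name trimmed_stats is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place, discards `trim` samples at each end (treating
 * them as outliers), and computes the mean and sample variance of the
 * rest. Returns -1 if trimming would leave fewer than two samples. */
static int trimmed_stats(double *samples, size_t n, size_t trim,
                         double *mean, double *variance) {
    if (n < 2 * trim + 2)
        return -1;
    qsort(samples, n, sizeof(*samples), cmp_double);
    const size_t lo = trim, hi = n - trim; /* keep samples[lo..hi) */
    const size_t count = hi - lo;
    double sum = 0.0;
    for (size_t i = lo; i < hi; i++)
        sum += samples[i];
    *mean = sum / (double)count;
    double sq = 0.0;
    for (size_t i = lo; i < hi; i++)
        sq += (samples[i] - *mean) * (samples[i] - *mean);
    *variance = sq / (double)(count - 1);
    return 0;
}
```

With samples {13, 1000, 11, 10, 12} and trim = 1, the 1000 outlier (and the 10) are dropped, giving mean 12 and variance 1.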

There are multiple approaches to integrating this third-party checkasm
into dav1d, but I think the hybrid approach of loading it as an external
dependency, with a fallback to a meson wrap file, provides the best overall
compromise. This avoids the messiness of e.g. git submodules, while still
allowing us to pin individual tags.
2026-01-01 17:33:55 +01:00
Niklas Haas 3374404179 tests/checkasm/loopfilter: avoid printf format warning
Upstream checkasm adds a printf format attribute to report(), so we should
avoid passing the name string directly as the format string, to silence a
warning.
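A minimal sketch of the pattern, assuming a GCC/Clang-style format attribute. format_report and report_name are hypothetical helpers, not checkasm's actual report(). Passing a non-literal string as the format triggers warnings such as -Wformat-security, and a name containing '%' would be misinterpreted; routing it through "%s" avoids both.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_FMT(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_FMT(fmt_idx, first_arg)
#endif

/* Hypothetical report-style helper; the attribute lets the compiler check
 * fmt against the variadic arguments at every call site. */
static int format_report(char *buf, size_t size, const char *fmt, ...)
    PRINTF_FMT(3, 4);

static int format_report(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

static int report_name(char *buf, size_t size, const char *name) {
    /* format_report(buf, size, name) would warn: name is not a literal.
     * Passing it as an argument to "%s" also keeps any '%' in name intact. */
    return format_report(buf, size, "%s", name);
}
```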
2026-01-01 12:29:02 +01:00
Jean-Baptiste Kempf b546257f77 NEWS for 1.5.3 1.5.3 2025-12-31 15:50:45 +01:00
Nathan E. Egge and Jean-Baptiste Kempf 844510cdb4 Add argon bitstream conformance test instructions 2025-12-30 14:56:50 +01:00
Nathan E. Egge 5e8c380e4b riscv64/mc16: Keep blend_v RVV operations in 16-bits
Kendryte K230                Before             After         Delta

blend_v_w2_16bpc_c:       240.9 ( 1.00x)    240.9 ( 1.00x)    0.00%
blend_v_w2_16bpc_rvv:     149.7 ( 1.61x)    155.4 ( 1.55x)    3.81%
blend_v_w4_16bpc_c:      1072.4 ( 1.00x)   1072.5 ( 1.00x)    0.01%
blend_v_w4_16bpc_rvv:     307.2 ( 3.49x)    299.9 ( 3.58x)   -2.38%
blend_v_w8_16bpc_c:      2004.7 ( 1.00x)   2010.2 ( 1.00x)    0.27%
blend_v_w8_16bpc_rvv:     436.1 ( 4.60x)    381.0 ( 5.28x)  -12.63%
blend_v_w16_16bpc_c:     3859.4 ( 1.00x)   3853.7 ( 1.00x)   -0.15%
blend_v_w16_16bpc_rvv:    761.1 ( 5.07x)    554.0 ( 6.96x)  -27.21%
blend_v_w32_16bpc_c:     7509.7 ( 1.00x)   7505.3 ( 1.00x)   -0.06%
blend_v_w32_16bpc_rvv:   1427.1 ( 5.26x)   1005.5 ( 7.46x)  -29.54%

SpacemiT K1                  Before             After         Delta

blend_v_w2_16bpc_c:       220.1 ( 1.00x)    222.0 ( 1.00x)    0.86%
blend_v_w2_16bpc_rvv:     146.6 ( 1.50x)    151.1 ( 1.47x)    3.07%
blend_v_w4_16bpc_c:       968.3 ( 1.00x)    969.6 ( 1.00x)    0.13%
blend_v_w4_16bpc_rvv:     281.2 ( 3.44x)    290.2 ( 3.34x)    3.20%
blend_v_w8_16bpc_c:      1809.5 ( 1.00x)   1812.1 ( 1.00x)    0.14%
blend_v_w8_16bpc_rvv:     374.2 ( 4.84x)    375.3 ( 4.83x)    0.29%
blend_v_w16_16bpc_c:     3479.7 ( 1.00x)   3480.9 ( 1.00x)    0.03%
blend_v_w16_16bpc_rvv:    521.5 ( 6.67x)    465.9 ( 7.47x)  -10.66%
blend_v_w32_16bpc_c:     6767.9 ( 1.00x)   6773.7 ( 1.00x)    0.09%
blend_v_w32_16bpc_rvv:    852.1 ( 7.94x)    727.4 ( 9.31x)  -14.63%

Blackhole p100a              Before             After         Delta

blend_v_w2_16bpc_c:       205.6 ( 1.00x)    206.0 ( 1.00x)    0.19%
blend_v_w2_16bpc_rvv:     176.5 ( 1.16x)    143.6 ( 1.44x)  -18.64%
blend_v_w4_16bpc_c:       901.0 ( 1.00x)    891.8 ( 1.00x)   -1.02%
blend_v_w4_16bpc_rvv:     298.8 ( 3.02x)    235.2 ( 3.79x)  -21.29%
blend_v_w8_16bpc_c:      1663.3 ( 1.00x)   1656.5 ( 1.00x)   -0.41%
blend_v_w8_16bpc_rvv:     300.1 ( 5.54x)    236.4 ( 7.01x)  -21.23%
blend_v_w16_16bpc_c:     3192.1 ( 1.00x)   3182.3 ( 1.00x)   -0.31%
blend_v_w16_16bpc_rvv:    349.2 ( 9.14x)    311.4 (10.22x)  -10.82%
blend_v_w32_16bpc_c:     6259.2 ( 1.00x)   6257.8 ( 1.00x)   -0.02%
blend_v_w32_16bpc_rvv:    350.2 (17.88x)    321.8 (19.44x)   -8.11%
2025-12-30 13:47:49 +00:00
Nathan E. Egge d2fa9466be riscv64/mc16: Keep blend RVV operations in 16-bits
Kendryte K230                Before             After         Delta

blend_w4_16bpc_c:         227.0 ( 1.00x)    227.1 ( 1.00x)    0.04%
blend_w4_16bpc_rvv:        71.1 ( 3.19x)     73.2 ( 3.10x)    2.95%
blend_w8_16bpc_c:         662.5 ( 1.00x)    662.7 ( 1.00x)    0.03%
blend_w8_16bpc_rvv:       132.4 ( 5.00x)    115.0 ( 5.76x)  -13.14%
blend_w16_16bpc_c:       2559.3 ( 1.00x)   2559.8 ( 1.00x)    0.02%
blend_w16_16bpc_rvv:      416.1 ( 6.15x)    326.7 ( 7.83x)  -21.49%
blend_w32_16bpc_c:       6483.9 ( 1.00x)   6484.5 ( 1.00x)    0.01%
blend_w32_16bpc_rvv:     1029.1 ( 6.30x)    774.7 ( 8.37x)  -24.72%

SpacemiT K1                  Before             After         Delta

blend_w4_16bpc_c:         206.1 ( 1.00x)    207.0 ( 1.00x)    0.44%
blend_w4_16bpc_rvv:        64.4 ( 3.20x)     69.5 ( 2.98x)    7.92%
blend_w8_16bpc_c:         600.2 ( 1.00x)    600.9 ( 1.00x)    0.12%
blend_w8_16bpc_rvv:       101.6 ( 5.91x)    106.9 ( 5.62x)    5.22%
blend_w16_16bpc_c:       2316.0 ( 1.00x)   2316.4 ( 1.00x)    0.02%
blend_w16_16bpc_rvv:      261.8 ( 8.85x)    229.1 (10.11x)  -12.49%
blend_w32_16bpc_c:       5861.1 ( 1.00x)   5860.4 ( 1.00x)   -0.01%
blend_w32_16bpc_rvv:      602.9 ( 9.72x)    475.3 (12.33x)  -21.16%

Blackhole p100a              Before             After         Delta

blend_w4_16bpc_c:         193.3 ( 1.00x)    191.3 ( 1.00x)   -1.03%
blend_w4_16bpc_rvv:        66.3 ( 2.91x)     65.4 ( 2.92x)   -1.36%
blend_w8_16bpc_c:         552.0 ( 1.00x)    549.8 ( 1.00x)   -0.40%
blend_w8_16bpc_rvv:       100.5 ( 5.49x)     96.2 ( 5.71x)   -4.28%
blend_w16_16bpc_c:       2112.5 ( 1.00x)   2111.8 ( 1.00x)   -0.03%
blend_w16_16bpc_rvv:      190.3 (11.10x)    185.9 (11.36x)   -2.31%
blend_w32_16bpc_c:       5417.5 ( 1.00x)   5416.2 ( 1.00x)   -0.02%
blend_w32_16bpc_rvv:      290.3 (18.66x)    304.0 (17.82x)    4.72%
2025-12-30 13:47:49 +00:00
Kyle Siefring 5f7f8ff5b6 cdef: consolidate left edge backup state 2025-12-28 22:48:27 -05:00
Nathan E. Egge a31e4bd757 riscv64/mc16: Add VLEN=512 16bpc RVV blend_{,v} functions
Blackhole p100a               Before             After         Delta

blend_w4_16bpc_c:         193.1 ( 1.00x)     186.8 ( 1.00x)   -3.26%
blend_w4_16bpc_rvv:        64.8 ( 2.98x)      62.8 ( 2.97x)   -3.09%
blend_w8_16bpc_c:         551.0 ( 1.00x)     546.0 ( 1.00x)   -0.91%
blend_w8_16bpc_rvv:        96.2 ( 5.73x)      93.4 ( 5.85x)   -2.91%
blend_w16_16bpc_c:       2111.6 ( 1.00x)    2107.0 ( 1.00x)   -0.22%
blend_w16_16bpc_rvv:      189.9 (11.12x)     189.6 (11.11x)   -0.16%
blend_w32_16bpc_c:       5403.9 ( 1.00x)    5398.5 ( 1.00x)   -0.10%
blend_w32_16bpc_rvv:      292.4 (18.48x)     291.5 (18.52x)   -0.31%

blend_v_w2_16bpc_c:       209.1 ( 1.00x)     208.7 ( 1.00x)   -0.19%
blend_v_w2_16bpc_rvv:     180.3 ( 1.16x)     180.4 ( 1.16x)    0.06%
blend_v_w4_16bpc_c:       896.9 ( 1.00x)     898.5 ( 1.00x)    0.18%
blend_v_w4_16bpc_rvv:     303.0 ( 2.96x)     302.5 ( 2.97x)   -0.17%
blend_v_w8_16bpc_c:      1658.9 ( 1.00x)    1663.1 ( 1.00x)    0.25%
blend_v_w8_16bpc_rvv:     303.0 ( 5.47x)     302.6 ( 5.50x)   -0.13%
blend_v_w16_16bpc_c:     3186.0 ( 1.00x)    3182.7 ( 1.00x)   -0.10%
blend_v_w16_16bpc_rvv:    313.1 (10.17x)     312.1 (10.20x)   -0.32%
blend_v_w32_16bpc_c:     6253.9 ( 1.00x)    6257.0 ( 1.00x)    0.05%
blend_v_w32_16bpc_rvv:    355.4 (17.60x)     353.2 (17.72x)   -0.62%
2025-12-24 01:08:19 +00:00
Marvin Scholz and Jean-Baptiste Kempf bf82cfa74e meson: clarify xxhash error message
We do not need to mention the details of the check in the message,
as those are already logged by meson when doing the check. Instead,
mention why this is an error, to make it clearer that it is related to
the xxhash_muxer option.

Fix #397
2025-12-23 10:12:58 +01:00
Marvin Scholz and Jean-Baptiste Kempf 549a6a0983 meson: leave tools subdir early when tools are disabled
Only dav1d_input_objs are needed by dav1dplay, so we can leave this
directory early and avoid looking for xxhash when it would never be
used.
2025-12-23 10:12:58 +01:00
Nathan E. Egge 38dd16e108 riscv64/mc: Add VLEN=512 8bpc RVV blend_{,h,v} functions
Blackhole p100a               Before             After         Delta

blend_w4_8bpc_c:          190.7 ( 1.00x)     189.3 ( 1.00x)   -0.73%
blend_w4_8bpc_rvv:         61.2 ( 3.12x)      59.7 ( 3.17x)   -2.45%
blend_w8_8bpc_c:          550.7 ( 1.00x)     547.0 ( 1.00x)   -0.67%
blend_w8_8bpc_rvv:         91.0 ( 6.05x)      89.4 ( 6.12x)   -1.76%
blend_w16_8bpc_c:        2112.4 ( 1.00x)    2106.8 ( 1.00x)   -0.27%
blend_w16_8bpc_rvv:       177.1 (11.92x)     174.8 (12.05x)   -1.30%
blend_w32_8bpc_c:        5423.8 ( 1.00x)    5393.8 ( 1.00x)   -0.55%
blend_w32_8bpc_rvv:       233.5 (23.23x)     230.7 (23.38x)   -1.20%

blend_h_w2_8bpc_c:        126.4 ( 1.00x)     128.0 ( 1.00x)    1.27%
blend_h_w2_8bpc_rvv:       85.0 ( 1.49x)      81.2 ( 1.58x)   -4.47%
blend_h_w4_8bpc_c:        221.2 ( 1.00x)     222.2 ( 1.00x)    0.45%
blend_h_w4_8bpc_rvv:       84.3 ( 2.62x)      81.3 ( 2.73x)   -3.56%
blend_h_w8_8bpc_c:        411.9 ( 1.00x)     413.3 ( 1.00x)    0.34%
blend_h_w8_8bpc_rvv:       84.2 ( 4.89x)      81.0 ( 5.10x)   -3.80%
blend_h_w16_8bpc_c:       792.6 ( 1.00x)     793.5 ( 1.00x)    0.11%
blend_h_w16_8bpc_rvv:      84.5 ( 9.38x)      81.5 ( 9.74x)   -3.55%
blend_h_w32_8bpc_c:      1577.7 ( 1.00x)    1578.8 ( 1.00x)    0.07%
blend_h_w32_8bpc_rvv:      86.6 (18.21x)      83.5 (18.90x)   -3.58%
blend_h_w64_8bpc_c:      3099.5 ( 1.00x)    3101.9 ( 1.00x)    0.08%
blend_h_w64_8bpc_rvv:      98.4 (31.49x)      95.2 (32.58x)   -3.25%
blend_h_w128_8bpc_c:     7496.9 ( 1.00x)    7498.1 ( 1.00x)    0.02%
blend_h_w128_8bpc_rvv:    155.4 (48.24x)     151.5 (49.50x)   -2.51%

blend_v_w2_8bpc_c:        202.9 ( 1.00x)     203.5 ( 1.00x)    0.30%
blend_v_w2_8bpc_rvv:      173.5 ( 1.17x)     176.6 ( 1.15x)    1.79%
blend_v_w4_8bpc_c:        842.3 ( 1.00x)     844.2 ( 1.00x)    0.23%
blend_v_w4_8bpc_rvv:      295.9 ( 2.85x)     299.0 ( 2.82x)    1.05%
blend_v_w8_8bpc_c:       1589.9 ( 1.00x)    1592.1 ( 1.00x)    0.14%
blend_v_w8_8bpc_rvv:      296.2 ( 5.37x)     299.0 ( 5.32x)    0.95%
blend_v_w16_8bpc_c:      3090.3 ( 1.00x)    3088.3 ( 1.00x)   -0.06%
blend_v_w16_8bpc_rvv:     296.0 (10.44x)     299.4 (10.32x)    1.15%
blend_v_w32_8bpc_c:      6080.2 ( 1.00x)    6081.5 ( 1.00x)    0.02%
blend_v_w32_8bpc_rvv:     306.3 (19.85x)     309.3 (19.66x)    0.98%
2025-12-22 23:20:23 +00:00
Henrik Gramner 43f3b8d33b checkasm: Group itx functions by their largest dimension
This reduces the number of itx reports per instruction set from
19 to 5, which avoids excessively flooding the console output.
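The arithmetic behind "19 to 5", assuming the 19 reports correspond to AV1's 19 transform sizes (5 square, 8 with a 2:1 aspect ratio, 6 with 4:1). Keying each size by its larger dimension leaves only the five keys 4, 8, 16, 32 and 64. The names below are hypothetical; checkasm's real grouping code may be structured differently.

```c
#include <stddef.h>

typedef struct { int w, h; } TxDim;

/* The 19 AV1 transform sizes. */
static const TxDim tx_dims[] = {
    {4, 4}, {8, 8}, {16, 16}, {32, 32}, {64, 64},
    {4, 8}, {8, 4}, {8, 16}, {16, 8}, {16, 32}, {32, 16}, {32, 64}, {64, 32},
    {4, 16}, {16, 4}, {8, 32}, {32, 8}, {16, 64}, {64, 16},
};
#define N_TX (sizeof(tx_dims) / sizeof(tx_dims[0]))

/* Report group key: the larger of the two dimensions. */
static int tx_group(TxDim d) { return d.w > d.h ? d.w : d.h; }

/* Fills group_size[] for the keys 4, 8, 16, 32, 64 and returns how many
 * of those groups are non-empty. */
static int count_groups(int group_size[5]) {
    static const int keys[5] = {4, 8, 16, 32, 64};
    int n_groups = 0;
    for (int g = 0; g < 5; g++) {
        group_size[g] = 0;
        for (size_t i = 0; i < N_TX; i++)
            if (tx_group(tx_dims[i]) == keys[g])
                group_size[g]++;
        if (group_size[g] > 0)
            n_groups++;
    }
    return n_groups;
}
```

The resulting groups hold 1, 3, 5, 5 and 5 sizes respectively.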
2025-12-09 22:02:52 +01:00
Henrik Gramner 165e9e251b checkasm: Only run DC-only itx tests for dct_dct 2025-12-09 21:01:10 +01:00
Cameron Cawley 84792e61c8 dav1dplay: Print more error messages when window/context creation fails 2025-11-27 20:49:55 +00:00
Cameron Cawley e60603a9f2 dav1dplay: Ensure a newer OpenGL version is used when creating the context 2025-11-27 20:49:41 +00:00
Tristan Matthews and Henrik Gramner f3a1070f25 input/ivf: handle files with 0 frames
This avoids a subsequent division by zero.
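A minimal sketch of the kind of guard involved, assuming the division is of a total duration by the frame count. average_frame_duration is a hypothetical helper; the actual arithmetic in input/ivf.c may differ.

```c
#include <stdint.h>

/* Hypothetical helper: average frame duration in timebase units.
 * Returns -1 for an empty file instead of dividing by zero. */
static int average_frame_duration(uint64_t total_duration,
                                  uint64_t num_frames, uint64_t *avg) {
    if (num_frames == 0)
        return -1; /* no frames: nothing to average */
    *avg = total_duration / num_frames;
    return 0;
}
```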
2025-11-25 12:49:05 +00:00
Cameron Cawley 28b165940d Use CLOCK_REALTIME for providing the initial seed value
CLOCK_MONOTONIC is specified as returning time "since an unspecified point in the past". On RISC OS with UnixLib this returns the time since the last hard reset, but with SharedCLibrary it returns the time since the program started; combined with the coarse resolution used internally, this almost always results in a seed of 0.

CLOCK_REALTIME meanwhile is specified as returning time since the epoch, so it should behave consistently across all platforms.
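A minimal sketch of seeding from CLOCK_REALTIME via POSIX clock_gettime(). How seconds and nanoseconds are folded into a 32-bit seed is an assumption here, not necessarily what checkasm does; get_seed and seed_from_timespec are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Folds a timespec into a seed; the result wraps modulo 2^32. */
static unsigned seed_from_timespec(const struct timespec *ts) {
    return (unsigned)(1000000000ULL * (unsigned long long)ts->tv_sec +
                      (unsigned long long)ts->tv_nsec);
}

/* CLOCK_REALTIME is specified to count from the Epoch, unlike
 * CLOCK_MONOTONIC whose starting point is unspecified. */
static int get_seed(unsigned *seed) {
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    *seed = seed_from_timespec(&ts);
    return 0;
}
```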
2025-11-22 23:30:02 +00:00
Un1q32 04d588ee94 allow builds on systems without a supported memalign function 2025-11-19 03:15:27 -05:00
Martin Storsjö e7c280e4cd x86: Sync the latest upstream version of x86inc.asm 2025-11-12 23:19:22 +02:00
Martin Storsjö 2eac05d648 checkasm: arm: Use X() instead of inline ifdefs
This works fine when the referenced symbol has the same prefix
as PRIVATE_PREFIX in the same file; otherwise we could also
create a macro like X() that only prepends the extern symbol
prefix but no symbol namespace prefix.
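A C-preprocessor sketch of the two macros described above, with hypothetical prefix values (a leading underscore as on Darwin, and checkasm_ as the namespace prefix). This is not dav1d's actual asm.S definition; it only shows how X() stacks both prefixes while the alternative adds just the extern one.

```c
#include <string.h>

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define STR_(x) #x
#define STR(x) STR_(x)

#define EXTERN_PREFIX _          /* hypothetical: Darwin-style underscore */
#define PRIVATE_PREFIX checkasm_ /* hypothetical namespace prefix */

/* X(): extern symbol prefix plus namespace prefix. */
#define X(name) CONCAT(EXTERN_PREFIX, CONCAT(PRIVATE_PREFIX, name))
/* Alternative: extern symbol prefix only, for symbols outside the namespace. */
#define EXTERN_ONLY(name) CONCAT(EXTERN_PREFIX, name)

static const char x_sym[] = STR(X(example_fn));
static const char ext_sym[] = STR(EXTERN_ONLY(example_fn));

static int sym_is(const char *sym, const char *expected) {
    return strcmp(sym, expected) == 0;
}
```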
2025-11-12 15:54:40 +02:00
Sungjoon Moon 6deac59d1e riscv64/mc: Add w_mask functions
K230:
checkasm: VLEN=128 bits, using random seed 42
RVV:
 - mc_8bpc.w_mask               [OK]
checkasm: all 18 tests passed
w_mask_420_w4_8bpc_c:        845.1 ( 1.00x)
w_mask_420_w4_8bpc_rvv:      313.1 ( 2.70x)
w_mask_420_w8_8bpc_c:       2589.9 ( 1.00x)
w_mask_420_w8_8bpc_rvv:      549.3 ( 4.72x)
w_mask_420_w16_8bpc_c:      8389.9 ( 1.00x)
w_mask_420_w16_8bpc_rvv:    1250.4 ( 6.71x)
w_mask_420_w32_8bpc_c:     33485.7 ( 1.00x)
w_mask_420_w32_8bpc_rvv:    4276.9 ( 7.83x)
w_mask_420_w64_8bpc_c:     81934.2 ( 1.00x)
w_mask_420_w64_8bpc_rvv:   11243.9 ( 7.29x)
w_mask_420_w128_8bpc_c:   205865.8 ( 1.00x)
w_mask_420_w128_8bpc_rvv:  28098.0 ( 7.33x)
w_mask_422_w4_8bpc_c:        838.6 ( 1.00x)
w_mask_422_w4_8bpc_rvv:      315.9 ( 2.65x)
w_mask_422_w8_8bpc_c:       2576.4 ( 1.00x)
w_mask_422_w8_8bpc_rvv:      564.2 ( 4.57x)
w_mask_422_w16_8bpc_c:      8378.7 ( 1.00x)
w_mask_422_w16_8bpc_rvv:    1305.4 ( 6.42x)
w_mask_422_w32_8bpc_c:     33512.4 ( 1.00x)
w_mask_422_w32_8bpc_rvv:    4487.6 ( 7.47x)
w_mask_422_w64_8bpc_c:     82489.8 ( 1.00x)
w_mask_422_w64_8bpc_rvv:   11895.3 ( 6.93x)
w_mask_422_w128_8bpc_c:   207116.2 ( 1.00x)
w_mask_422_w128_8bpc_rvv:  29541.4 ( 7.01x)
w_mask_444_w4_8bpc_c:        822.7 ( 1.00x)
w_mask_444_w4_8bpc_rvv:      265.3 ( 3.10x)
w_mask_444_w8_8bpc_c:       2542.5 ( 1.00x)
w_mask_444_w8_8bpc_rvv:      429.2 ( 5.92x)
w_mask_444_w16_8bpc_c:      8290.8 ( 1.00x)
w_mask_444_w16_8bpc_rvv:     965.7 ( 8.59x)
w_mask_444_w32_8bpc_c:     33229.6 ( 1.00x)
w_mask_444_w32_8bpc_rvv:    3289.2 (10.10x)
w_mask_444_w64_8bpc_c:     81404.6 ( 1.00x)
w_mask_444_w64_8bpc_rvv:    9126.6 ( 8.92x)
w_mask_444_w128_8bpc_c:   204438.4 ( 1.00x)
w_mask_444_w128_8bpc_rvv:  22424.9 ( 9.12x)

Spacemit K1:
checkasm: VLEN=256 bits, using random seed 42
RVV:
 - mc_8bpc.w_mask               [OK]
checkasm: all 18 tests passed
w_mask_420_w4_8bpc_c:        747.9 ( 1.00x)
w_mask_420_w4_8bpc_rvv:      290.4 ( 2.58x)
w_mask_420_w8_8bpc_c:       2312.3 ( 1.00x)
w_mask_420_w8_8bpc_rvv:      478.9 ( 4.83x)
w_mask_420_w16_8bpc_c:      7509.3 ( 1.00x)
w_mask_420_w16_8bpc_rvv:     885.2 ( 8.48x)
w_mask_420_w32_8bpc_c:     30087.8 ( 1.00x)
w_mask_420_w32_8bpc_rvv:    2595.6 (11.59x)
w_mask_420_w64_8bpc_c:     72313.0 ( 1.00x)
w_mask_420_w64_8bpc_rvv:    6020.9 (12.01x)
w_mask_420_w128_8bpc_c:   179297.0 ( 1.00x)
w_mask_420_w128_8bpc_rvv:  15659.1 (11.45x)
w_mask_422_w4_8bpc_c:        735.0 ( 1.00x)
w_mask_422_w4_8bpc_rvv:      299.0 ( 2.46x)
w_mask_422_w8_8bpc_c:       2285.6 ( 1.00x)
w_mask_422_w8_8bpc_rvv:      488.5 ( 4.68x)
w_mask_422_w16_8bpc_c:      7459.3 ( 1.00x)
w_mask_422_w16_8bpc_rvv:     946.3 ( 7.88x)
w_mask_422_w32_8bpc_c:     29996.7 ( 1.00x)
w_mask_422_w32_8bpc_rvv:    2812.7 (10.66x)
w_mask_422_w64_8bpc_c:     71809.4 ( 1.00x)
w_mask_422_w64_8bpc_rvv:    6253.7 (11.48x)
w_mask_422_w128_8bpc_c:   178081.9 ( 1.00x)
w_mask_422_w128_8bpc_rvv:  16087.8 (11.07x)
w_mask_444_w4_8bpc_c:        726.2 ( 1.00x)
w_mask_444_w4_8bpc_rvv:      255.9 ( 2.84x)
w_mask_444_w8_8bpc_c:       2250.7 ( 1.00x)
w_mask_444_w8_8bpc_rvv:      403.9 ( 5.57x)
w_mask_444_w16_8bpc_c:      7341.4 ( 1.00x)
w_mask_444_w16_8bpc_rvv:     744.7 ( 9.86x)
w_mask_444_w32_8bpc_c:     29658.4 ( 1.00x)
w_mask_444_w32_8bpc_rvv:    2295.9 (12.92x)
w_mask_444_w64_8bpc_c:     70695.9 ( 1.00x)
w_mask_444_w64_8bpc_rvv:    4879.0 (14.49x)
w_mask_444_w128_8bpc_c:   175483.6 ( 1.00x)
w_mask_444_w128_8bpc_rvv:  13021.9 (13.48x)
2025-11-07 00:51:38 +09:00
Sungjoon Moon and Jean-Baptiste Kempf 7ba6452b09 Unroll only for top copy
Unrolling the bottom copy causes an error when bottom_ext is not an even number.
For the center copy, there are not enough registers to unroll.
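A scalar C analogy of why a loop unrolled by two needs a tail when the row count can be odd; it is not the actual RVV emu_edge code, and replicate_rows_x2 is a hypothetical name. Without the tail, an odd count either skips the last row or, if the loop condition were just i < n, writes one row past the end.

```c
#include <stddef.h>
#include <string.h>

/* Replicates `src` (w bytes) into n rows of dst, two rows per iteration. */
static void replicate_rows_x2(unsigned char *dst, size_t stride,
                              const unsigned char *src, size_t w, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        memcpy(dst + (i + 0) * stride, src, w);
        memcpy(dst + (i + 1) * stride, src, w);
    }
    if (i < n) /* odd row count: one row left over */
        memcpy(dst + i * stride, src, w);
}
```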
2025-11-05 09:13:31 +01:00
Sungjoon Moon and Jean-Baptiste Kempf 7c04792480 riscv64/mc: Add emu_edge function
K230:
checkasm: VLEN=128 bits, using random seed 42
RVV:
 - mc_8bpc.emu_edge             [OK]
checkasm: all 6 tests passed
emu_edge_w4_8bpc_c:        638.8 ( 1.00x)
emu_edge_w4_8bpc_rvv:      211.9 ( 3.01x)
emu_edge_w8_8bpc_c:        944.4 ( 1.00x)
emu_edge_w8_8bpc_rvv:      230.0 ( 4.11x)
emu_edge_w16_8bpc_c:      1447.9 ( 1.00x)
emu_edge_w16_8bpc_rvv:     287.6 ( 5.03x)
emu_edge_w32_8bpc_c:      3047.0 ( 1.00x)
emu_edge_w32_8bpc_rvv:     775.8 ( 3.93x)
emu_edge_w64_8bpc_c:      5440.1 ( 1.00x)
emu_edge_w64_8bpc_rvv:    1504.2 ( 3.62x)
emu_edge_w128_8bpc_c:    12943.4 ( 1.00x)
emu_edge_w128_8bpc_rvv:   4782.5 ( 2.71x)

Spacemit K1:
checkasm: VLEN=256 bits, using random seed 42
RVV:
 - mc_8bpc.emu_edge             [OK]
checkasm: all 6 tests passed
emu_edge_w4_8bpc_c:        562.4 ( 1.00x)
emu_edge_w4_8bpc_rvv:      202.6 ( 2.78x)
emu_edge_w8_8bpc_c:        695.0 ( 1.00x)
emu_edge_w8_8bpc_rvv:      220.1 ( 3.16x)
emu_edge_w16_8bpc_c:      1271.2 ( 1.00x)
emu_edge_w16_8bpc_rvv:     251.3 ( 5.06x)
emu_edge_w32_8bpc_c:      2772.4 ( 1.00x)
emu_edge_w32_8bpc_rvv:     587.8 ( 4.72x)
emu_edge_w64_8bpc_c:      4917.2 ( 1.00x)
emu_edge_w64_8bpc_rvv:    1115.0 ( 4.41x)
emu_edge_w128_8bpc_c:    12634.6 ( 1.00x)
emu_edge_w128_8bpc_rvv:   3232.6 ( 3.91x)
2025-11-05 09:13:31 +01:00
Mikołaj Zalewski and Jean-Baptiste Kempf d26f298ce2 ipred_v implementation in RISC-V assembly 2025-11-05 08:55:30 +01:00
Sungjoon Moon f979959367 Small optimization; there's no meaningful performance difference
w_mask_420_w4_8bpc_c:               310.5 ( 1.00x)
w_mask_420_w4_16bpc_c:              325.7 ( 1.00x)
w_mask_420_w8_8bpc_c:               978.7 ( 1.00x)
w_mask_420_w8_16bpc_c:              998.5 ( 1.00x)
w_mask_420_w16_8bpc_c:             3187.6 ( 1.00x)
w_mask_420_w16_16bpc_c:            3233.2 ( 1.00x)
w_mask_420_w32_8bpc_c:            12658.8 ( 1.00x)
w_mask_420_w32_16bpc_c:           12686.1 ( 1.00x)
w_mask_420_w64_8bpc_c:            31036.4 ( 1.00x)
w_mask_420_w64_16bpc_c:           30643.1 ( 1.00x)
w_mask_420_w128_8bpc_c:           76785.7 ( 1.00x)
w_mask_420_w128_16bpc_c:          77490.5 ( 1.00x)
w_mask_422_w4_8bpc_c:               325.2 ( 1.00x)
w_mask_422_w4_16bpc_c:              342.0 ( 1.00x)
w_mask_422_w8_8bpc_c:              1002.3 ( 1.00x)
w_mask_422_w8_16bpc_c:             1032.4 ( 1.00x)
w_mask_422_w16_8bpc_c:             3267.8 ( 1.00x)
w_mask_422_w16_16bpc_c:            3343.7 ( 1.00x)
w_mask_422_w32_8bpc_c:            12865.4 ( 1.00x)
w_mask_422_w32_16bpc_c:           12998.5 ( 1.00x)
w_mask_422_w64_8bpc_c:            31112.7 ( 1.00x)
w_mask_422_w64_16bpc_c:           31455.4 ( 1.00x)
w_mask_422_w128_8bpc_c:           78796.5 ( 1.00x)
w_mask_422_w128_16bpc_c:          78100.7 ( 1.00x)
w_mask_444_w4_8bpc_c:               315.1 ( 1.00x)
w_mask_444_w4_16bpc_c:              336.8 ( 1.00x)
w_mask_444_w8_8bpc_c:               985.1 ( 1.00x)
w_mask_444_w8_16bpc_c:             1014.9 ( 1.00x)
w_mask_444_w16_8bpc_c:             3216.7 ( 1.00x)
w_mask_444_w16_16bpc_c:            3182.6 ( 1.00x)
w_mask_444_w32_8bpc_c:            12733.8 ( 1.00x)
w_mask_444_w32_16bpc_c:           12432.0 ( 1.00x)
w_mask_444_w64_8bpc_c:            31156.2 ( 1.00x)
w_mask_444_w64_16bpc_c:           31615.3 ( 1.00x)
w_mask_444_w128_8bpc_c:           76031.1 ( 1.00x)
w_mask_444_w128_16bpc_c:          76989.2 ( 1.00x)
2025-11-02 01:33:33 +09:00
Sungjoon Moon 43a7ac13a5 mc_tmpl: optimize w_mask with distributive law
Reduced register usage (possibly) and improved speed (~7% on average)
Tested on AMD HX370

Function                  |       Before |        After |         % |
---------------------------------------------------------------------
w_mask_420_w4_8bpc_c      |        335.3 |        312.6 |      6.78 |
w_mask_420_w4_16bpc_c     |        354.5 |        326.4 |      7.94 |
w_mask_420_w8_8bpc_c      |       1056.4 |        979.3 |      7.30 |
w_mask_420_w8_16bpc_c     |       1068.2 |        996.4 |      6.73 |
w_mask_420_w16_8bpc_c     |       3416.1 |       3169.6 |      7.22 |
w_mask_420_w16_16bpc_c    |       3435.4 |       3218.0 |      6.34 |
w_mask_420_w32_8bpc_c     |      13479.7 |      12550.0 |      6.91 |
w_mask_420_w32_16bpc_c    |      13833.3 |      12632.7 |      8.68 |
w_mask_420_w64_8bpc_c     |      32557.6 |      30166.7 |      7.35 |
w_mask_420_w64_16bpc_c    |      32529.8 |      30407.0 |      6.54 |
w_mask_420_w128_8bpc_c    |      81802.8 |      75856.5 |      7.27 |
w_mask_420_w128_16bpc_c   |      81187.8 |      76133.9 |      6.23 |
w_mask_422_w4_8bpc_c      |        331.3 |        327.1 |      1.27 |
w_mask_422_w4_16bpc_c     |        365.1 |        341.2 |      6.53 |
w_mask_422_w8_8bpc_c      |       1052.7 |       1003.5 |      4.68 |
w_mask_422_w8_16bpc_c     |       1095.9 |       1022.6 |      6.69 |
w_mask_422_w16_8bpc_c     |       3479.8 |       3248.8 |      6.67 |
w_mask_422_w16_16bpc_c    |       3504.2 |       3279.5 |      6.41 |
w_mask_422_w32_8bpc_c     |      13702.5 |      12801.4 |      6.58 |
w_mask_422_w32_16bpc_c    |      13738.9 |      12830.5 |      6.61 |
w_mask_422_w64_8bpc_c     |      32517.9 |      30818.0 |      5.23 |
w_mask_422_w64_16bpc_c    |      33199.4 |      30865.3 |      7.03 |
w_mask_422_w128_8bpc_c    |      82867.1 |      77978.7 |      5.90 |
w_mask_422_w128_16bpc_c   |      84937.9 |      77629.8 |      8.60 |
w_mask_444_w4_8bpc_c      |        340.4 |        315.6 |      7.28 |
w_mask_444_w4_16bpc_c     |        361.6 |        335.0 |      7.35 |
w_mask_444_w8_8bpc_c      |       1057.6 |        988.9 |      6.50 |
w_mask_444_w8_16bpc_c     |       1104.3 |       1030.8 |      6.67 |
w_mask_444_w16_8bpc_c     |       3414.4 |       3180.7 |      6.85 |
w_mask_444_w16_16bpc_c    |       3477.4 |       3182.4 |      8.48 |
w_mask_444_w32_8bpc_c     |      13455.8 |      12469.4 |      7.33 |
w_mask_444_w32_16bpc_c    |      13666.9 |      12378.8 |      9.42 |
w_mask_444_w64_8bpc_c     |      33587.2 |      31239.7 |      7.00 |
w_mask_444_w64_16bpc_c    |      34283.3 |      30969.5 |      9.67 |
w_mask_444_w128_8bpc_c    |      82084.2 |      76206.3 |      7.16 |
w_mask_444_w128_16bpc_c   |      82649.4 |      75166.4 |      8.91 |
---------------------------------------------------------------------
avg                       |            - |            - |      6.95 |
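One way the distributive law applies to a w_mask-style blend: tmp1*m + tmp2*(64 - m) = tmp2*64 + (tmp1 - tmp2)*m, which needs only one variable multiply since the *64 is a constant scale. This is a sketch of the algebra under that assumption, not a copy of the change in mc_tmpl.c; the rounding and shift applied afterwards are unaffected.

```c
/* Two-multiply form: weight tmp1 by m and tmp2 by (64 - m). */
static int blend_two_mul(int tmp1, int tmp2, int m) {
    return tmp1 * m + tmp2 * (64 - m);
}

/* Distributive-law form: one multiply by m plus a constant scale by 64. */
static int blend_one_mul(int tmp1, int tmp2, int m) {
    return tmp2 * 64 + (tmp1 - tmp2) * m;
}

/* Compares both forms over a sample grid of inputs and all m in [0, 64]. */
static int forms_agree(void) {
    for (int m = 0; m <= 64; m++)
        for (int a = -4096; a <= 4096; a += 97)
            for (int b = -4096; b <= 4096; b += 89)
                if (blend_two_mul(a, b, m) != blend_one_mul(a, b, m))
                    return 0;
    return 1;
}
```

For example, with tmp1 = 100, tmp2 = 20 and m = 40, both forms give 4480.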
2025-11-02 01:33:33 +09:00
Henrik Gramner c720f4d355 cli: Fix input_open() memory leak on fopen() failure 1.5.2 2025-10-27 20:44:47 +01:00
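A minimal sketch of the leak pattern and its fix, using a hypothetical context type and opener rather than dav1d's actual input_open(): when the context is allocated before fopen(), the error path must free it.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct InputCtx { FILE *f; } InputCtx; /* hypothetical context */

static int open_input_sketch(InputCtx **out, const char *path) {
    InputCtx *c = calloc(1, sizeof(*c));
    if (!c)
        return -1;
    c->f = fopen(path, "rb");
    if (!c->f) {
        free(c); /* without this, the context leaks on fopen() failure */
        *out = NULL;
        return -1;
    }
    *out = c;
    return 0;
}
```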
Niklas Haas and Henrik Gramner fcbc3d1b93 loopfilter: align Av1FilterLUT struct members
Fixes a bug where the Av1FilterLUT instance used in checkasm was not
aligned properly.

In theory, the first ALIGN macro should imply the latter alignments as well,
but I decided to mark all fields as explicitly aligned for clarity, and
because that's the precedent set in other headers.

Allows us to drop the ALIGN macro on the other usage of this struct.
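A C11 sketch of member-level alignment, using alignas in place of dav1d's ALIGN macro. The struct name and field layout are illustrative assumptions, not copied from the header. Aligning the first member already raises the struct's alignment to 16, so every instance is aligned without an ALIGN at the declaration site.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LUT; each member is explicitly 16-byte aligned. */
typedef struct FilterLUTSketch {
    alignas(16) uint8_t e[64];
    alignas(16) uint8_t i[64];
    alignas(16) uint64_t sharp[2];
} FilterLUTSketch;

/* A plain instance: the type itself carries the alignment. */
static FilterLUTSketch lut_instance;
```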
2025-10-20 13:50:43 +00:00
Jean-Baptiste Kempf f6965b7f12 Update NEWS for nasm 3.00 2025-10-20 11:20:53 +02:00
Adam Sampson 0bc6bd9341 x86: put the memory operand first for test
Older versions of nasm allowed the operands in either order, but nasm
3.00 requires the memory operand to be first as per the spec.
2025-10-05 14:03:24 +01:00
Khalid Masum and Ronald S. Bultje af5cf2b1e7 Readme: improve consistency of compilation steps
Currently the compilation steps use two different methods: manually
creating the build directory, and using "meson setup build" to create the
build directory directly. This can confuse new users who want to build
the docs or cross-compile about which step of the compilation they are
in. This patch aims to make these steps clear.
2025-08-24 17:26:19 +00:00
Jean-Baptiste Kempf 0558c332ca On the road to 1.5.2
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
2025-08-12 01:23:20 +02:00
Jean-Baptiste Kempf 04faac6900 Update COPYING years 2025-08-12 01:23:14 +02:00
Henrik Gramner 716164239a obu: Improve short-signaling reference frame index calculation
Reduces code size a fair amount, and with some loop unrolling
by the compiler the code becomes nearly branchless.
2025-07-09 14:24:07 +02:00
Henrik Gramner fa30043ba0 obu: Remove redundant zeroing in frame header parsing
The Dav1dFrameHeader struct is already zero-initialized,
so zeroing individual values a second time is redundant.
2025-07-07 16:00:30 +02:00
Matthias Dressel c3f3a7e567 CI: Check --frametimes with msan
This would have caught 583e8e02eb.
2025-07-01 18:35:31 +02:00
Ronald S. Bultje 583e8e02eb tools/dav1d: initialize elapsed
Based on the following comment on IRC:
"<aconz2> the `elapsed` variable in main() is read uninitialized in
 synchronize and makes the first frametime with --frametime incorrect
 I think. Should be initialized to 0"

Confirmed that after initializing to zero, the first line in the file
generated by --frametime is reasonable.
2025-07-01 08:26:31 -04:00
yuanhecai a86d561b79 loongarch: rename looprestoration_tmpl.c
Rename loongarch/looprestoration_tmpl.c to loongarch/looprestoration_inner.c.
Compiling both src/looprestoration_tmpl.c and loongarch/looprestoration_tmpl.c
produces looprestoration_tmpl.c.o, causing a conflict during linking.
2025-06-25 17:25:27 +08:00
yuanhecai 9eea4fe842 loongarch: Fix Clang compilation errors 2025-06-25 17:25:21 +08:00
Henrik Gramner b3c5848f7f loongarch: Use hidden visibility for asm functions 2025-06-07 22:36:38 +02:00
Henrik Gramner 63bf075aad recon: Fix level index calculation optimization for 2D transforms
Due to a typo, this was never actually enabled after being added in
5ef6b24. As a result, the slow path was always being used.
2025-06-02 15:54:28 +02:00
Henrik Gramner fe0ab51460 Use exact-width integer min/max defines where appropriate
Improves support for niche systems with uncommon integer sizes.
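An illustration of why exact-width limits matter, assuming a clamp to the int16_t range. C only guarantees that short is at least 16 bits wide, so SHRT_MIN/SHRT_MAX match int16_t only where short is exactly 16 bits; INT16_MIN/INT16_MAX from <stdint.h> always do. Which defines dav1d actually changed is not stated here; clip_to_int16 is a hypothetical helper.

```c
#include <stdint.h>

/* Clamps v to the int16_t range using the exact-width limits. */
static int32_t clip_to_int16(int32_t v) {
    return v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : v;
}
```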
2025-05-29 19:38:49 +02:00
Henrik Gramner 29efbb9496 refmvs: Shrink mfmv_ref arrays
Includes updates to load_tmvs() asm implementations.
2025-05-28 19:01:45 +02:00
Henrik Gramner 68dc20035b refmvs: Shrink refpoc arrays 2025-05-28 19:01:45 +02:00
Henrik Gramner 7889ac7603 cdf: Remove unused eob_hi_bit entries 2025-05-28 02:06:08 +02:00
Matthias Dressel 8d95618093 CI: Build '-mavx' code as debugoptimized
Work around a GCC 14 bug where it does not insert `vzeroupper` in C code
built without at least '-O2'.
2025-03-10 16:40:35 +01:00
Matthias Dressel edeac873c4 CI: Update images 2025-03-10 16:40:35 +01:00
Matthias Dressel 1d0cda02a6 CI: Update ppc64le image
Since there seems to be a problem with gcc-14, stay on gcc-13 for now.
2025-03-05 21:58:24 +01:00
Gianni Rosato caef968117 refactor: simplify deltaq bitstream parsing logic 2025-02-28 09:28:46 -05:00