Martin Storsjö
b2f9c10670
checkasm: Fix building with MSVC
...
The glue code in our headers, for integrating with the external
checkasm, was incompatible with MSVC.
MSVC has a nonstandard handling of __VA_ARGS__ with macros; when
one macro invokes another macro, __VA_ARGS__ gets treated as one
single parameter and can't map to more than one parameter in the
invoked macro. (In other words, when calling another macro,
__VA_ARGS__ must map in its entirety to a ... parameter of the
other macro.)
Modern versions of MSVC do implement the correct mode as well,
but default to the old one for backwards compatibility. To
choose the new mode, we'd have to build our code with
-Zc:preprocessor. That's certainly doable, but it's fairly easy to
avoid the issue as well.
To avoid this issue, change the variadic PIXEL_RECT(...) to explicitly
name its arguments. There's actually no variability in the arguments
involved here. (Alternatively, we could force the preprocessor to expand
the arguments one extra time, avoiding the issue, with e.g.
"#define EXPAND(x) x" and wrapping PIXEL_RECT with it, e.g.
"#define PIXEL_RECT(...) EXPAND(BUF_RECT(pixel, __VA_ARGS__))".)
See [1], [2] and [3] for more discussion on the matter.
[1] https://stackoverflow.com/a/5134656/3115956
[2] https://stackoverflow.com/a/7459803/3115956
[3] https://learn.microsoft.com/en-us/cpp/preprocessor/preprocessor-experimental-overview?view=msvc-160
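A minimal sketch of the two options described above. BUF_RECT here is a hypothetical stand-in that only computes a byte count (the real macro in the checkasm headers does something else), and rect_bytes_named() / rect_bytes_variadic() are illustrative helpers, not part of the tree. Both forms should give the same result on a conforming preprocessor; only a plain variadic forwarding macro without EXPAND() would be expected to fail with MSVC's traditional preprocessor.

```c
#include <stddef.h>

typedef unsigned char pixel;

/* Hypothetical stand-in for the real BUF_RECT(); it only computes a size. */
#define BUF_RECT(type, w, h) (sizeof(type) * (size_t)(w) * (size_t)(h))

/* Option 1 (what this commit does): name the arguments explicitly, so
 * nothing depends on how __VA_ARGS__ is split when forwarded. */
#define PIXEL_RECT(w, h) BUF_RECT(pixel, w, h)

/* Option 2 (the alternative mentioned above): wrap the inner call in
 * EXPAND() so the argument list is rescanned and split at the commas. */
#define EXPAND(x) x
#define PIXEL_RECT_VA(...) EXPAND(BUF_RECT(pixel, __VA_ARGS__))

/* Illustrative helpers so both macro forms can be compared at run time. */
static size_t rect_bytes_named(void)    { return PIXEL_RECT(8, 4); }
static size_t rect_bytes_variadic(void) { return PIXEL_RECT_VA(8, 4); }
```

Naming the arguments is the simpler choice here because PIXEL_RECT is always called with the same number of arguments.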
2026-01-07 11:57:09 +02:00
Niklas Haas
3a2a874994
tests/checkasm: switch to external checkasm
...
There are a number of benefits tied to the upstream / third-party checkasm
version, including:
- Improved long-term maintainability, code reuse with other projects, etc.
- Vastly improved overall performance / runtime for benchmarking, due
primarily to the ability to scale the runtime of each test to that test's
complexity.
- Much more robust statistical analysis of benchmarking results; including
robust outlier rejection, an estimation of the histogram, and the ability
to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for
failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer
implementations on ARM platforms, a better RISC-V harness, and more.
There are multiple approaches to how we can solve the problem of integrating
this third party checkasm into dav1d, but I think the hybrid approach of
loading it as an external dependency, falling back to a meson wrap file,
provides the best overall compromise. This avoids the messiness of e.g.
git submodules, while still allowing us to pin individual tags.
2026-01-01 17:33:55 +01:00
Niklas Haas
3374404179
tests/checkasm/loopfilter: avoid printf format warning
...
Upstream checkasm adds a printf format attribute to report(), so we should
avoid directly passing the name string to silence a warning.
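As a rough illustration of why the warning appears: once a function carries a printf format attribute, compilers such as GCC and Clang can warn when a non-literal string is passed as the format. The sketch below uses a hypothetical report_sketch() that writes into a buffer; it is not checkasm's report(), and the PRINTF_ATTR macro name is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_ATTR(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define PRINTF_ATTR(fmt, args)
#endif

/* Hypothetical reporter: formats into 'out' and returns the length that
 * vsnprintf() reports. Argument 3 is the format string. */
PRINTF_ATTR(3, 4)
static int report_sketch(char *out, size_t out_size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    const int n = vsnprintf(out, out_size, fmt, ap);
    va_end(ap);
    return n;
}

static int report_name(char *out, size_t out_size, const char *name)
{
    /* Passing 'name' directly as the format, i.e.
     *     report_sketch(out, out_size, name);
     * is what the compiler may warn about, and it would misbehave if the
     * name ever contained a '%'. Passing it as an argument avoids both. */
    return report_sketch(out, out_size, "%s", name);
}
```

With the "%s" form, a name containing a percent sign is copied through unchanged.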
2026-01-01 12:29:02 +01:00
Jean-Baptiste Kempf
b546257f77
NEWS for 1.5.3
1.5.3
2025-12-31 15:50:45 +01:00
Nathan E. Egge and Jean-Baptiste Kempf
844510cdb4
Add argon bitstream conformance test instructions
2025-12-30 14:56:50 +01:00
Nathan E. Egge
5e8c380e4b
riscv64/mc16: Keep blend_v RVV operations in 16-bits
...
Kendryte K230 Before After Delta
blend_v_w2_16bpc_c: 240.9 ( 1.00x) 240.9 ( 1.00x) 0.00%
blend_v_w2_16bpc_rvv: 149.7 ( 1.61x) 155.4 ( 1.55x) 3.81%
blend_v_w4_16bpc_c: 1072.4 ( 1.00x) 1072.5 ( 1.00x) 0.01%
blend_v_w4_16bpc_rvv: 307.2 ( 3.49x) 299.9 ( 3.58x) -2.38%
blend_v_w8_16bpc_c: 2004.7 ( 1.00x) 2010.2 ( 1.00x) 0.27%
blend_v_w8_16bpc_rvv: 436.1 ( 4.60x) 381.0 ( 5.28x) -12.63%
blend_v_w16_16bpc_c: 3859.4 ( 1.00x) 3853.7 ( 1.00x) -0.15%
blend_v_w16_16bpc_rvv: 761.1 ( 5.07x) 554.0 ( 6.96x) -27.21%
blend_v_w32_16bpc_c: 7509.7 ( 1.00x) 7505.3 ( 1.00x) -0.06%
blend_v_w32_16bpc_rvv: 1427.1 ( 5.26x) 1005.5 ( 7.46x) -29.54%
SpacemiT K1 Before After Delta
blend_v_w2_16bpc_c: 220.1 ( 1.00x) 222.0 ( 1.00x) 0.86%
blend_v_w2_16bpc_rvv: 146.6 ( 1.50x) 151.1 ( 1.47x) 3.07%
blend_v_w4_16bpc_c: 968.3 ( 1.00x) 969.6 ( 1.00x) 0.13%
blend_v_w4_16bpc_rvv: 281.2 ( 3.44x) 290.2 ( 3.34x) 3.20%
blend_v_w8_16bpc_c: 1809.5 ( 1.00x) 1812.1 ( 1.00x) 0.14%
blend_v_w8_16bpc_rvv: 374.2 ( 4.84x) 375.3 ( 4.83x) 0.29%
blend_v_w16_16bpc_c: 3479.7 ( 1.00x) 3480.9 ( 1.00x) 0.03%
blend_v_w16_16bpc_rvv: 521.5 ( 6.67x) 465.9 ( 7.47x) -10.66%
blend_v_w32_16bpc_c: 6767.9 ( 1.00x) 6773.7 ( 1.00x) 0.09%
blend_v_w32_16bpc_rvv: 852.1 ( 7.94x) 727.4 ( 9.31x) -14.63%
Blackhole p100a Before After Delta
blend_v_w2_16bpc_c: 205.6 ( 1.00x) 206.0 ( 1.00x) 0.19%
blend_v_w2_16bpc_rvv: 176.5 ( 1.16x) 143.6 ( 1.44x) -18.64%
blend_v_w4_16bpc_c: 901.0 ( 1.00x) 891.8 ( 1.00x) -1.02%
blend_v_w4_16bpc_rvv: 298.8 ( 3.02x) 235.2 ( 3.79x) -21.29%
blend_v_w8_16bpc_c: 1663.3 ( 1.00x) 1656.5 ( 1.00x) -0.41%
blend_v_w8_16bpc_rvv: 300.1 ( 5.54x) 236.4 ( 7.01x) -21.23%
blend_v_w16_16bpc_c: 3192.1 ( 1.00x) 3182.3 ( 1.00x) -0.31%
blend_v_w16_16bpc_rvv: 349.2 ( 9.14x) 311.4 (10.22x) -10.82%
blend_v_w32_16bpc_c: 6259.2 ( 1.00x) 6257.8 ( 1.00x) -0.02%
blend_v_w32_16bpc_rvv: 350.2 (17.88x) 321.8 (19.44x) -8.11%
2025-12-30 13:47:49 +00:00
Nathan E. Egge
d2fa9466be
riscv64/mc16: Keep blend RVV operations in 16-bits
...
Kendryte K230 Before After Delta
blend_w4_16bpc_c: 227.0 ( 1.00x) 227.1 ( 1.00x) 0.04%
blend_w4_16bpc_rvv: 71.1 ( 3.19x) 73.2 ( 3.10x) 2.95%
blend_w8_16bpc_c: 662.5 ( 1.00x) 662.7 ( 1.00x) 0.03%
blend_w8_16bpc_rvv: 132.4 ( 5.00x) 115.0 ( 5.76x) -13.14%
blend_w16_16bpc_c: 2559.3 ( 1.00x) 2559.8 ( 1.00x) 0.02%
blend_w16_16bpc_rvv: 416.1 ( 6.15x) 326.7 ( 7.83x) -21.49%
blend_w32_16bpc_c: 6483.9 ( 1.00x) 6484.5 ( 1.00x) 0.01%
blend_w32_16bpc_rvv: 1029.1 ( 6.30x) 774.7 ( 8.37x) -24.72%
SpacemiT K1 Before After Delta
blend_w4_16bpc_c: 206.1 ( 1.00x) 207.0 ( 1.00x) 0.44%
blend_w4_16bpc_rvv: 64.4 ( 3.20x) 69.5 ( 2.98x) 7.92%
blend_w8_16bpc_c: 600.2 ( 1.00x) 600.9 ( 1.00x) 0.12%
blend_w8_16bpc_rvv: 101.6 ( 5.91x) 106.9 ( 5.62x) 5.22%
blend_w16_16bpc_c: 2316.0 ( 1.00x) 2316.4 ( 1.00x) 0.02%
blend_w16_16bpc_rvv: 261.8 ( 8.85x) 229.1 (10.11x) -12.49%
blend_w32_16bpc_c: 5861.1 ( 1.00x) 5860.4 ( 1.00x) -0.01%
blend_w32_16bpc_rvv: 602.9 ( 9.72x) 475.3 (12.33x) -21.16%
Blackhole p100a Before After Delta
blend_w4_16bpc_c: 193.3 ( 1.00x) 191.3 ( 1.00x) -1.03%
blend_w4_16bpc_rvv: 66.3 ( 2.91x) 65.4 ( 2.92x) -1.36%
blend_w8_16bpc_c: 552.0 ( 1.00x) 549.8 ( 1.00x) -0.40%
blend_w8_16bpc_rvv: 100.5 ( 5.49x) 96.2 ( 5.71x) -4.28%
blend_w16_16bpc_c: 2112.5 ( 1.00x) 2111.8 ( 1.00x) -0.03%
blend_w16_16bpc_rvv: 190.3 (11.10x) 185.9 (11.36x) -2.31%
blend_w32_16bpc_c: 5417.5 ( 1.00x) 5416.2 ( 1.00x) -0.02%
blend_w32_16bpc_rvv: 290.3 (18.66x) 304.0 (17.82x) 4.72%
2025-12-30 13:47:49 +00:00
Kyle Siefring
5f7f8ff5b6
cdef: consolidate left edge backup state
2025-12-28 22:48:27 -05:00
Nathan E. Egge
a31e4bd757
riscv64/mc16: Add VLEN=512 16bpc RVV blend_{,v} functions
...
Blackhole p100a Before After Delta
blend_w4_16bpc_c: 193.1 ( 1.00x) 186.8 ( 1.00x) -3.26%
blend_w4_16bpc_rvv: 64.8 ( 2.98x) 62.8 ( 2.97x) -3.09%
blend_w8_16bpc_c: 551.0 ( 1.00x) 546.0 ( 1.00x) -0.91%
blend_w8_16bpc_rvv: 96.2 ( 5.73x) 93.4 ( 5.85x) -2.91%
blend_w16_16bpc_c: 2111.6 ( 1.00x) 2107.0 ( 1.00x) -0.22%
blend_w16_16bpc_rvv: 189.9 (11.12x) 189.6 (11.11x) -0.16%
blend_w32_16bpc_c: 5403.9 ( 1.00x) 5398.5 ( 1.00x) -0.10%
blend_w32_16bpc_rvv: 292.4 (18.48x) 291.5 (18.52x) -0.31%
blend_v_w2_16bpc_c: 209.1 ( 1.00x) 208.7 ( 1.00x) -0.19%
blend_v_w2_16bpc_rvv: 180.3 ( 1.16x) 180.4 ( 1.16x) 0.06%
blend_v_w4_16bpc_c: 896.9 ( 1.00x) 898.5 ( 1.00x) 0.18%
blend_v_w4_16bpc_rvv: 303.0 ( 2.96x) 302.5 ( 2.97x) -0.17%
blend_v_w8_16bpc_c: 1658.9 ( 1.00x) 1663.1 ( 1.00x) 0.25%
blend_v_w8_16bpc_rvv: 303.0 ( 5.47x) 302.6 ( 5.50x) -0.13%
blend_v_w16_16bpc_c: 3186.0 ( 1.00x) 3182.7 ( 1.00x) -0.10%
blend_v_w16_16bpc_rvv: 313.1 (10.17x) 312.1 (10.20x) -0.32%
blend_v_w32_16bpc_c: 6253.9 ( 1.00x) 6257.0 ( 1.00x) 0.05%
blend_v_w32_16bpc_rvv: 355.4 (17.60x) 353.2 (17.72x) -0.62%
2025-12-24 01:08:19 +00:00
Marvin Scholz and Jean-Baptiste Kempf
bf82cfa74e
meson: clarify xxhash error message
...
We do not need to mention the details of the check in the message
as those are already logged by meson when doing the check. Instead
mention why this is an error to make it more clear it is related to
the xxhash_muxer option.
Fix #397
2025-12-23 10:12:58 +01:00
Marvin Scholz and Jean-Baptiste Kempf
549a6a0983
meson: leave tools subdir early when tools are disabled
...
Only dav1d_input_objs are needed by dav1dplay, so we can leave this
directory early and avoid looking for xxhash when it is actually
never used.
2025-12-23 10:12:58 +01:00
Nathan E. Egge
38dd16e108
riscv64/mc: Add VLEN=512 8bpc RVV blend_{,h,v} functions
...
Blackhole p100a Before After Delta
blend_w4_8bpc_c: 190.7 ( 1.00x) 189.3 ( 1.00x) -0.73%
blend_w4_8bpc_rvv: 61.2 ( 3.12x) 59.7 ( 3.17x) -2.45%
blend_w8_8bpc_c: 550.7 ( 1.00x) 547.0 ( 1.00x) -0.67%
blend_w8_8bpc_rvv: 91.0 ( 6.05x) 89.4 ( 6.12x) -1.76%
blend_w16_8bpc_c: 2112.4 ( 1.00x) 2106.8 ( 1.00x) -0.27%
blend_w16_8bpc_rvv: 177.1 (11.92x) 174.8 (12.05x) -1.30%
blend_w32_8bpc_c: 5423.8 ( 1.00x) 5393.8 ( 1.00x) -0.55%
blend_w32_8bpc_rvv: 233.5 (23.23x) 230.7 (23.38x) -1.20%
blend_h_w2_8bpc_c: 126.4 ( 1.00x) 128.0 ( 1.00x) 1.27%
blend_h_w2_8bpc_rvv: 85.0 ( 1.49x) 81.2 ( 1.58x) -4.47%
blend_h_w4_8bpc_c: 221.2 ( 1.00x) 222.2 ( 1.00x) 0.45%
blend_h_w4_8bpc_rvv: 84.3 ( 2.62x) 81.3 ( 2.73x) -3.56%
blend_h_w8_8bpc_c: 411.9 ( 1.00x) 413.3 ( 1.00x) 0.34%
blend_h_w8_8bpc_rvv: 84.2 ( 4.89x) 81.0 ( 5.10x) -3.80%
blend_h_w16_8bpc_c: 792.6 ( 1.00x) 793.5 ( 1.00x) 0.11%
blend_h_w16_8bpc_rvv: 84.5 ( 9.38x) 81.5 ( 9.74x) -3.55%
blend_h_w32_8bpc_c: 1577.7 ( 1.00x) 1578.8 ( 1.00x) 0.07%
blend_h_w32_8bpc_rvv: 86.6 (18.21x) 83.5 (18.90x) -3.58%
blend_h_w64_8bpc_c: 3099.5 ( 1.00x) 3101.9 ( 1.00x) 0.08%
blend_h_w64_8bpc_rvv: 98.4 (31.49x) 95.2 (32.58x) -3.25%
blend_h_w128_8bpc_c: 7496.9 ( 1.00x) 7498.1 ( 1.00x) 0.02%
blend_h_w128_8bpc_rvv: 155.4 (48.24x) 151.5 (49.50x) -2.51%
blend_v_w2_8bpc_c: 202.9 ( 1.00x) 203.5 ( 1.00x) 0.30%
blend_v_w2_8bpc_rvv: 173.5 ( 1.17x) 176.6 ( 1.15x) 1.79%
blend_v_w4_8bpc_c: 842.3 ( 1.00x) 844.2 ( 1.00x) 0.23%
blend_v_w4_8bpc_rvv: 295.9 ( 2.85x) 299.0 ( 2.82x) 1.05%
blend_v_w8_8bpc_c: 1589.9 ( 1.00x) 1592.1 ( 1.00x) 0.14%
blend_v_w8_8bpc_rvv: 296.2 ( 5.37x) 299.0 ( 5.32x) 0.95%
blend_v_w16_8bpc_c: 3090.3 ( 1.00x) 3088.3 ( 1.00x) -0.06%
blend_v_w16_8bpc_rvv: 296.0 (10.44x) 299.4 (10.32x) 1.15%
blend_v_w32_8bpc_c: 6080.2 ( 1.00x) 6081.5 ( 1.00x) 0.02%
blend_v_w32_8bpc_rvv: 306.3 (19.85x) 309.3 (19.66x) 0.98%
2025-12-22 23:20:23 +00:00
Henrik Gramner
43f3b8d33b
checkasm: Group itx functions by their largest dimension
...
This reduces the number of itx reports per instruction set from
19 to 5, which avoids excessively flooding the console output.
2025-12-09 22:02:52 +01:00
Henrik Gramner
165e9e251b
checkasm: Only run DC-only itx tests for dct_dct
2025-12-09 21:01:10 +01:00
Cameron Cawley
84792e61c8
dav1dplay: Print more error messages when window/context creation fails
2025-11-27 20:49:55 +00:00
Cameron Cawley
e60603a9f2
dav1dplay: Ensure a newer OpenGL version is used when creating the context
2025-11-27 20:49:41 +00:00
Tristan Matthews and Henrik Gramner
f3a1070f25
input/ivf: handle files with 0 frames
...
This avoids a subsequent division by zero.
2025-11-25 12:49:05 +00:00
Cameron Cawley
28b165940d
Use CLOCK_REALTIME for providing the initial seed value
...
CLOCK_MONOTONIC is specified as returning time "since an unspecified point in the past". On RISC OS with UnixLib this returns the time since the last hard reset, but with SharedCLibrary this returns the time since the program started - combined with the coarse resolution used internally, this almost always results in a seed of 0.
CLOCK_REALTIME meanwhile is specified as returning time since the epoch, so it should behave consistently across all platforms.
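A minimal sketch of the idea, assuming a POSIX clock_gettime(). The helper names and the way seconds and nanoseconds are folded together are illustrative assumptions, not necessarily what the tree does.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: fold seconds and nanoseconds into one seed value.
 * A clock that reports (0 s, 0 ns), as described above for a coarse
 * "time since program start", yields a seed of 0. */
static unsigned seed_from_timespec(const struct timespec *ts)
{
    return (unsigned)(1000000000ULL * (unsigned long long)ts->tv_sec +
                      (unsigned long long)ts->tv_nsec);
}

/* Hypothetical seed source using the wall clock (time since the epoch). */
static unsigned get_seed_sketch(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 0; /* a caller may prefer a different fallback */
    return seed_from_timespec(&ts);
}
```

Since the epoch is far in the past, the wall-clock value is large on any platform, so the seed does not depend on how long the process or the machine has been running.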
2025-11-22 23:30:02 +00:00
Un1q32
04d588ee94
allow builds on systems without a supported memalign function
2025-11-19 03:15:27 -05:00
Martin Storsjö
e7c280e4cd
x86: Sync the latest upstream version of x86inc.asm
2025-11-12 23:19:22 +02:00
Martin Storsjö
2eac05d648
checkasm: arm: Use X() instead of inline ifdefs
...
This works fine when the referenced symbol has the same prefix
as PRIVATE_PREFIX in the same file; otherwise we could also
create a macro like X() that only prepends the extern symbol
prefix but no symbol namespace prefix.
2025-11-12 15:54:40 +02:00
Sungjoon Moon
6deac59d1e
riscv64/mc: Add w_mask functions
...
K230:
checkasm: VLEN=128 bits, using random seed 42
RVV:
- mc_8bpc.w_mask [OK]
checkasm: all 18 tests passed
w_mask_420_w4_8bpc_c: 845.1 ( 1.00x)
w_mask_420_w4_8bpc_rvv: 313.1 ( 2.70x)
w_mask_420_w8_8bpc_c: 2589.9 ( 1.00x)
w_mask_420_w8_8bpc_rvv: 549.3 ( 4.72x)
w_mask_420_w16_8bpc_c: 8389.9 ( 1.00x)
w_mask_420_w16_8bpc_rvv: 1250.4 ( 6.71x)
w_mask_420_w32_8bpc_c: 33485.7 ( 1.00x)
w_mask_420_w32_8bpc_rvv: 4276.9 ( 7.83x)
w_mask_420_w64_8bpc_c: 81934.2 ( 1.00x)
w_mask_420_w64_8bpc_rvv: 11243.9 ( 7.29x)
w_mask_420_w128_8bpc_c: 205865.8 ( 1.00x)
w_mask_420_w128_8bpc_rvv: 28098.0 ( 7.33x)
w_mask_422_w4_8bpc_c: 838.6 ( 1.00x)
w_mask_422_w4_8bpc_rvv: 315.9 ( 2.65x)
w_mask_422_w8_8bpc_c: 2576.4 ( 1.00x)
w_mask_422_w8_8bpc_rvv: 564.2 ( 4.57x)
w_mask_422_w16_8bpc_c: 8378.7 ( 1.00x)
w_mask_422_w16_8bpc_rvv: 1305.4 ( 6.42x)
w_mask_422_w32_8bpc_c: 33512.4 ( 1.00x)
w_mask_422_w32_8bpc_rvv: 4487.6 ( 7.47x)
w_mask_422_w64_8bpc_c: 82489.8 ( 1.00x)
w_mask_422_w64_8bpc_rvv: 11895.3 ( 6.93x)
w_mask_422_w128_8bpc_c: 207116.2 ( 1.00x)
w_mask_422_w128_8bpc_rvv: 29541.4 ( 7.01x)
w_mask_444_w4_8bpc_c: 822.7 ( 1.00x)
w_mask_444_w4_8bpc_rvv: 265.3 ( 3.10x)
w_mask_444_w8_8bpc_c: 2542.5 ( 1.00x)
w_mask_444_w8_8bpc_rvv: 429.2 ( 5.92x)
w_mask_444_w16_8bpc_c: 8290.8 ( 1.00x)
w_mask_444_w16_8bpc_rvv: 965.7 ( 8.59x)
w_mask_444_w32_8bpc_c: 33229.6 ( 1.00x)
w_mask_444_w32_8bpc_rvv: 3289.2 (10.10x)
w_mask_444_w64_8bpc_c: 81404.6 ( 1.00x)
w_mask_444_w64_8bpc_rvv: 9126.6 ( 8.92x)
w_mask_444_w128_8bpc_c: 204438.4 ( 1.00x)
w_mask_444_w128_8bpc_rvv: 22424.9 ( 9.12x)
Spacemit K1:
checkasm: VLEN=256 bits, using random seed 42
RVV:
- mc_8bpc.w_mask [OK]
checkasm: all 18 tests passed
w_mask_420_w4_8bpc_c: 747.9 ( 1.00x)
w_mask_420_w4_8bpc_rvv: 290.4 ( 2.58x)
w_mask_420_w8_8bpc_c: 2312.3 ( 1.00x)
w_mask_420_w8_8bpc_rvv: 478.9 ( 4.83x)
w_mask_420_w16_8bpc_c: 7509.3 ( 1.00x)
w_mask_420_w16_8bpc_rvv: 885.2 ( 8.48x)
w_mask_420_w32_8bpc_c: 30087.8 ( 1.00x)
w_mask_420_w32_8bpc_rvv: 2595.6 (11.59x)
w_mask_420_w64_8bpc_c: 72313.0 ( 1.00x)
w_mask_420_w64_8bpc_rvv: 6020.9 (12.01x)
w_mask_420_w128_8bpc_c: 179297.0 ( 1.00x)
w_mask_420_w128_8bpc_rvv: 15659.1 (11.45x)
w_mask_422_w4_8bpc_c: 735.0 ( 1.00x)
w_mask_422_w4_8bpc_rvv: 299.0 ( 2.46x)
w_mask_422_w8_8bpc_c: 2285.6 ( 1.00x)
w_mask_422_w8_8bpc_rvv: 488.5 ( 4.68x)
w_mask_422_w16_8bpc_c: 7459.3 ( 1.00x)
w_mask_422_w16_8bpc_rvv: 946.3 ( 7.88x)
w_mask_422_w32_8bpc_c: 29996.7 ( 1.00x)
w_mask_422_w32_8bpc_rvv: 2812.7 (10.66x)
w_mask_422_w64_8bpc_c: 71809.4 ( 1.00x)
w_mask_422_w64_8bpc_rvv: 6253.7 (11.48x)
w_mask_422_w128_8bpc_c: 178081.9 ( 1.00x)
w_mask_422_w128_8bpc_rvv: 16087.8 (11.07x)
w_mask_444_w4_8bpc_c: 726.2 ( 1.00x)
w_mask_444_w4_8bpc_rvv: 255.9 ( 2.84x)
w_mask_444_w8_8bpc_c: 2250.7 ( 1.00x)
w_mask_444_w8_8bpc_rvv: 403.9 ( 5.57x)
w_mask_444_w16_8bpc_c: 7341.4 ( 1.00x)
w_mask_444_w16_8bpc_rvv: 744.7 ( 9.86x)
w_mask_444_w32_8bpc_c: 29658.4 ( 1.00x)
w_mask_444_w32_8bpc_rvv: 2295.9 (12.92x)
w_mask_444_w64_8bpc_c: 70695.9 ( 1.00x)
w_mask_444_w64_8bpc_rvv: 4879.0 (14.49x)
w_mask_444_w128_8bpc_c: 175483.6 ( 1.00x)
w_mask_444_w128_8bpc_rvv: 13021.9 (13.48x)
2025-11-07 00:51:38 +09:00
Sungjoon Moon and Jean-Baptiste Kempf
7ba6452b09
Unroll only for top copy
...
Unrolling the bottom copy causes an error when bottom_ext isn't an even number.
For the center copy, there aren't enough registers to unroll.
2025-11-05 09:13:31 +01:00
Sungjoon Moon and Jean-Baptiste Kempf
7c04792480
riscv64/mc: Add emu_edge function
...
K230:
checkasm: VLEN=128 bits, using random seed 42
RVV:
- mc_8bpc.emu_edge [OK]
checkasm: all 6 tests passed
emu_edge_w4_8bpc_c: 638.8 ( 1.00x)
emu_edge_w4_8bpc_rvv: 211.9 ( 3.01x)
emu_edge_w8_8bpc_c: 944.4 ( 1.00x)
emu_edge_w8_8bpc_rvv: 230.0 ( 4.11x)
emu_edge_w16_8bpc_c: 1447.9 ( 1.00x)
emu_edge_w16_8bpc_rvv: 287.6 ( 5.03x)
emu_edge_w32_8bpc_c: 3047.0 ( 1.00x)
emu_edge_w32_8bpc_rvv: 775.8 ( 3.93x)
emu_edge_w64_8bpc_c: 5440.1 ( 1.00x)
emu_edge_w64_8bpc_rvv: 1504.2 ( 3.62x)
emu_edge_w128_8bpc_c: 12943.4 ( 1.00x)
emu_edge_w128_8bpc_rvv: 4782.5 ( 2.71x)
Spacemit K1:
checkasm: VLEN=256 bits, using random seed 42
RVV:
- mc_8bpc.emu_edge [OK]
checkasm: all 6 tests passed
emu_edge_w4_8bpc_c: 562.4 ( 1.00x)
emu_edge_w4_8bpc_rvv: 202.6 ( 2.78x)
emu_edge_w8_8bpc_c: 695.0 ( 1.00x)
emu_edge_w8_8bpc_rvv: 220.1 ( 3.16x)
emu_edge_w16_8bpc_c: 1271.2 ( 1.00x)
emu_edge_w16_8bpc_rvv: 251.3 ( 5.06x)
emu_edge_w32_8bpc_c: 2772.4 ( 1.00x)
emu_edge_w32_8bpc_rvv: 587.8 ( 4.72x)
emu_edge_w64_8bpc_c: 4917.2 ( 1.00x)
emu_edge_w64_8bpc_rvv: 1115.0 ( 4.41x)
emu_edge_w128_8bpc_c: 12634.6 ( 1.00x)
emu_edge_w128_8bpc_rvv: 3232.6 ( 3.91x)
2025-11-05 09:13:31 +01:00
Mikołaj Zalewski and Jean-Baptiste Kempf
d26f298ce2
ipred_v implementation in RISC-V assembly
2025-11-05 08:55:30 +01:00
Sungjoon Moon
f979959367
Small optimization, there's no actual meaningful difference
...
w_mask_420_w4_8bpc_c: 310.5 ( 1.00x)
w_mask_420_w4_16bpc_c: 325.7 ( 1.00x)
w_mask_420_w8_8bpc_c: 978.7 ( 1.00x)
w_mask_420_w8_16bpc_c: 998.5 ( 1.00x)
w_mask_420_w16_8bpc_c: 3187.6 ( 1.00x)
w_mask_420_w16_16bpc_c: 3233.2 ( 1.00x)
w_mask_420_w32_8bpc_c: 12658.8 ( 1.00x)
w_mask_420_w32_16bpc_c: 12686.1 ( 1.00x)
w_mask_420_w64_8bpc_c: 31036.4 ( 1.00x)
w_mask_420_w64_16bpc_c: 30643.1 ( 1.00x)
w_mask_420_w128_8bpc_c: 76785.7 ( 1.00x)
w_mask_420_w128_16bpc_c: 77490.5 ( 1.00x)
w_mask_422_w4_8bpc_c: 325.2 ( 1.00x)
w_mask_422_w4_16bpc_c: 342.0 ( 1.00x)
w_mask_422_w8_8bpc_c: 1002.3 ( 1.00x)
w_mask_422_w8_16bpc_c: 1032.4 ( 1.00x)
w_mask_422_w16_8bpc_c: 3267.8 ( 1.00x)
w_mask_422_w16_16bpc_c: 3343.7 ( 1.00x)
w_mask_422_w32_8bpc_c: 12865.4 ( 1.00x)
w_mask_422_w32_16bpc_c: 12998.5 ( 1.00x)
w_mask_422_w64_8bpc_c: 31112.7 ( 1.00x)
w_mask_422_w64_16bpc_c: 31455.4 ( 1.00x)
w_mask_422_w128_8bpc_c: 78796.5 ( 1.00x)
w_mask_422_w128_16bpc_c: 78100.7 ( 1.00x)
w_mask_444_w4_8bpc_c: 315.1 ( 1.00x)
w_mask_444_w4_16bpc_c: 336.8 ( 1.00x)
w_mask_444_w8_8bpc_c: 985.1 ( 1.00x)
w_mask_444_w8_16bpc_c: 1014.9 ( 1.00x)
w_mask_444_w16_8bpc_c: 3216.7 ( 1.00x)
w_mask_444_w16_16bpc_c: 3182.6 ( 1.00x)
w_mask_444_w32_8bpc_c: 12733.8 ( 1.00x)
w_mask_444_w32_16bpc_c: 12432.0 ( 1.00x)
w_mask_444_w64_8bpc_c: 31156.2 ( 1.00x)
w_mask_444_w64_16bpc_c: 31615.3 ( 1.00x)
w_mask_444_w128_8bpc_c: 76031.1 ( 1.00x)
w_mask_444_w128_16bpc_c: 76989.2 ( 1.00x)
2025-11-02 01:33:33 +09:00
Sungjoon Moon
43a7ac13a5
mc_tmpl: optimize w_mask with distributive law
...
Reduced register usage (maybe) and improved speed (~7%)
Tested on AMD HX370
Function | Before | After | % |
---------------------------------------------------------------------
w_mask_420_w4_8bpc_c | 335.3 | 312.6 | 6.78 |
w_mask_420_w4_16bpc_c | 354.5 | 326.4 | 7.94 |
w_mask_420_w8_8bpc_c | 1056.4 | 979.3 | 7.30 |
w_mask_420_w8_16bpc_c | 1068.2 | 996.4 | 6.73 |
w_mask_420_w16_8bpc_c | 3416.1 | 3169.6 | 7.22 |
w_mask_420_w16_16bpc_c | 3435.4 | 3218.0 | 6.34 |
w_mask_420_w32_8bpc_c | 13479.7 | 12550.0 | 6.91 |
w_mask_420_w32_16bpc_c | 13833.3 | 12632.7 | 8.68 |
w_mask_420_w64_8bpc_c | 32557.6 | 30166.7 | 7.35 |
w_mask_420_w64_16bpc_c | 32529.8 | 30407.0 | 6.54 |
w_mask_420_w128_8bpc_c | 81802.8 | 75856.5 | 7.27 |
w_mask_420_w128_16bpc_c | 81187.8 | 76133.9 | 6.23 |
w_mask_422_w4_8bpc_c | 331.3 | 327.1 | 1.27 |
w_mask_422_w4_16bpc_c | 365.1 | 341.2 | 6.53 |
w_mask_422_w8_8bpc_c | 1052.7 | 1003.5 | 4.68 |
w_mask_422_w8_16bpc_c | 1095.9 | 1022.6 | 6.69 |
w_mask_422_w16_8bpc_c | 3479.8 | 3248.8 | 6.67 |
w_mask_422_w16_16bpc_c | 3504.2 | 3279.5 | 6.41 |
w_mask_422_w32_8bpc_c | 13702.5 | 12801.4 | 6.58 |
w_mask_422_w32_16bpc_c | 13738.9 | 12830.5 | 6.61 |
w_mask_422_w64_8bpc_c | 32517.9 | 30818.0 | 5.23 |
w_mask_422_w64_16bpc_c | 33199.4 | 30865.3 | 7.03 |
w_mask_422_w128_8bpc_c | 82867.1 | 77978.7 | 5.90 |
w_mask_422_w128_16bpc_c | 84937.9 | 77629.8 | 8.60 |
w_mask_444_w4_8bpc_c | 340.4 | 315.6 | 7.28 |
w_mask_444_w4_16bpc_c | 361.6 | 335.0 | 7.35 |
w_mask_444_w8_8bpc_c | 1057.6 | 988.9 | 6.50 |
w_mask_444_w8_16bpc_c | 1104.3 | 1030.8 | 6.67 |
w_mask_444_w16_8bpc_c | 3414.4 | 3180.7 | 6.85 |
w_mask_444_w16_16bpc_c | 3477.4 | 3182.4 | 8.48 |
w_mask_444_w32_8bpc_c | 13455.8 | 12469.4 | 7.33 |
w_mask_444_w32_16bpc_c | 13666.9 | 12378.8 | 9.42 |
w_mask_444_w64_8bpc_c | 33587.2 | 31239.7 | 7.00 |
w_mask_444_w64_16bpc_c | 34283.3 | 30969.5 | 9.67 |
w_mask_444_w128_8bpc_c | 82084.2 | 76206.3 | 7.16 |
w_mask_444_w128_16bpc_c | 82649.4 | 75166.4 | 8.91 |
---------------------------------------------------------------------
avg | - | - | 6.95 |
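The commit message does not spell out the rewrite, so the following is only a sketch of the kind of transformation the title suggests, under the assumption that the blend has the form a*m + b*(64 - m). By the distributive law this equals (a - b)*m + 64*b, which needs one variable multiply instead of two. The function names, rnd and sh are illustrative.

```c
/* Two multiplies by a variable: a*m + b*(64 - m). */
static int blend_two_mul(int a, int b, int m, int rnd, int sh)
{
    return (a * m + b * (64 - m) + rnd) >> sh;
}

/* One multiply by a variable: (a - b)*m + 64*b. The sum before the shift
 * is identical, so the result is too. Multiplying by 64 is a shift in
 * practice, and a difference such as a - b may already be available if
 * the mask is derived from it. */
static int blend_one_mul(int a, int b, int m, int rnd, int sh)
{
    return ((a - b) * m + 64 * b + rnd) >> sh;
}
```

Both functions compute the same sum before shifting, so they agree for any inputs that do not overflow int.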
2025-11-02 01:33:33 +09:00
Henrik Gramner
c720f4d355
cli: Fix input_open() memory leak on fopen() failure
1.5.2
2025-10-27 20:44:47 +01:00
Niklas Haas and Henrik Gramner
fcbc3d1b93
loopfilter: align Av1FilterLUT struct members
...
Fixes a bug where the Av1FilterLUT instance used in checkasm was not
aligned properly.
In theory, the first ALIGN macro should imply the latter alignments as well,
but I decided to mark all fields as explicitly aligned for clarity; and
because that's the precedent set in other headers.
Allows us to drop the ALIGN macro on the other usage of this struct.
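To illustrate the point about explicit per-member alignment, here is a small sketch using C11 alignas rather than the project's ALIGN macro. The struct name, its fields and the 16-byte value are assumptions chosen for illustration only.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LUT-like struct. Aligning the first member already aligns
 * the whole struct; because each array's size is a multiple of 16, the
 * later members would land on 16-byte boundaries anyway. Marking them
 * explicitly documents the requirement and keeps it true if a field is
 * added or resized later. */
typedef struct FilterLutSketch {
    alignas(16) uint8_t  e[64];
    alignas(16) uint8_t  i[64];
    alignas(16) uint64_t sharp[2];
} FilterLutSketch;

/* Illustrative accessors for checking the resulting layout. */
static size_t lut_alignment(void)    { return alignof(FilterLutSketch); }
static size_t lut_offset_i(void)     { return offsetof(FilterLutSketch, i); }
static size_t lut_offset_sharp(void) { return offsetof(FilterLutSketch, sharp); }
```

Because the alignment is part of the type, any instance of it (including one declared in a test harness) is aligned without an extra annotation at the point of use.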
2025-10-20 13:50:43 +00:00
Jean-Baptiste Kempf
f6965b7f12
Update NEWS for nasm 3.00
2025-10-20 11:20:53 +02:00
Adam Sampson
0bc6bd9341
x86: put the memory operand first for test
...
Older versions of nasm allowed the operands in either order, but nasm
3.00 requires the memory operand to be first as per the spec.
2025-10-05 14:03:24 +01:00
Khalid Masum and Ronald S. Bultje
af5cf2b1e7
Readme: improve consistency of compilation steps
...
Currently the compilation steps use two different methods: manual
build directory creation, and using meson setup build to create the
build directory directly. This can leave a new user who wants to
build docs or cross compile confused about which step of the compilation
they are in. This patch aims to make these steps clear.
2025-08-24 17:26:19 +00:00
Jean-Baptiste Kempf
0558c332ca
On the road to 1.5.2
...
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
2025-08-12 01:23:20 +02:00
Jean-Baptiste Kempf
04faac6900
Update COPYING years
2025-08-12 01:23:14 +02:00
Henrik Gramner
716164239a
obu: Improve short-signaling reference frame index calculation
...
Reduces code size a fair amount, and with some loop unrolling
by the compiler the code becomes nearly branchless.
2025-07-09 14:24:07 +02:00
Henrik Gramner
fa30043ba0
obu: Remove redundant zeroing in frame header parsing
...
The Dav1dFrameHeader struct is already zero-initialized,
so zeroing individual values a second time is redundant.
2025-07-07 16:00:30 +02:00
Matthias Dressel
c3f3a7e567
CI: Check --frametimes with msan
...
This would have caught 583e8e02eb.
2025-07-01 18:35:31 +02:00
Ronald S. Bultje
583e8e02eb
tools/dav1d: initialize elapsed
...
Based on the following comment on IRC:
"<aconz2> the `elapsed` variable in main() is read uninitialized in
synchronize and makes the first frametime with --frametime incorrect
I think. Should be initialized to 0"
Confirmed that after initializing to zero, the first line in the file
generated by --frametime is reasonable.
2025-07-01 08:26:31 -04:00
yuanhecai
a86d561b79
loongarch: rename looprestoration_tmpl.c
...
Rename loongarch/looprestoration_tmpl.c to loongarch/looprestoration_inner.c.
Compiling both src/looprestoration_tmpl.c and loongarch/looprestoration_tmpl.c
produces looprestoration_tmpl.c.o, causing a conflict during linking.
2025-06-25 17:25:27 +08:00
yuanhecai
9eea4fe842
loongarch: Fix Clang compilation errors
2025-06-25 17:25:21 +08:00
Henrik Gramner
b3c5848f7f
loongarch: Use hidden visibility for asm functions
2025-06-07 22:36:38 +02:00
Henrik Gramner
63bf075aad
recon: Fix level index calculation optimization for 2D transforms
...
Due to a typo this was never actually enabled since being added in
5ef6b24. As a result the slow path was always being used.
2025-06-02 15:54:28 +02:00
Henrik Gramner
fe0ab51460
Use exact-width integer min/max defines where appropriate
...
Improves support for niche systems with uncommon integer sizes.
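A small sketch of the distinction, assuming the change swaps limits such as SHRT_MAX for their exact-width counterparts where the value is stored in an exact-width type. The clip_int16() helper is hypothetical.

```c
#include <stdint.h>

/* Hypothetical clamp into an int16_t. INT16_MIN/INT16_MAX always describe
 * int16_t itself, whereas SHRT_MIN/SHRT_MAX describe 'short', which is only
 * guaranteed to be at least 16 bits wide and may be wider on some systems. */
static int16_t clip_int16(int v)
{
    return (int16_t)(v < INT16_MIN ? INT16_MIN :
                     v > INT16_MAX ? INT16_MAX : v);
}
```

On common platforms the two sets of limits are identical, so the difference only matters on targets where short is wider than 16 bits.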
2025-05-29 19:38:49 +02:00
Henrik Gramner
29efbb9496
refmvs: Shrink mfmv_ref arrays
...
Includes updates to load_tmvs() asm implementations.
2025-05-28 19:01:45 +02:00
Henrik Gramner
68dc20035b
refmvs: Shrink refpoc arrays
2025-05-28 19:01:45 +02:00
Henrik Gramner
7889ac7603
cdf: Remove unused eob_hi_bit entries
2025-05-28 02:06:08 +02:00
Matthias Dressel
8d95618093
CI: Build '-mavx' code as debugoptimized
...
Work around a GCC 14 bug where it does not insert `vzeroupper` in C code
built without at least '-O2'.
2025-03-10 16:40:35 +01:00
Matthias Dressel
edeac873c4
CI: Update images
2025-03-10 16:40:35 +01:00
Matthias Dressel
1d0cda02a6
CI: Update ppc64le image
...
Since there seems to be a problem with gcc-14, stay on gcc-13 for now.
2025-03-05 21:58:24 +01:00
Gianni Rosato
caef968117
refactor: simplify deltaq bitstream parsing logic
2025-02-28 09:28:46 -05:00