Nathan E. Egge
e542f661d0
meson: Move riscv64 8bpc only files into bitdepth sources
The cdef.S, itx.S and mc.S files contain only 8bpc implementations and
should be compiled only when building with the -Dbitdepths=8 configuration.
2024-10-29 12:17:14 +00:00
Nathan E. Egge and Luca Barbato
ca489d8aab
riscv64/mc16: Add 16bpc RVV blend function
Kendryte K230
blend_w4_16bpc_c: 214.4 ( 1.00x)
blend_w4_16bpc_rvv: 90.2 ( 2.38x)
blend_w8_16bpc_c: 618.9 ( 1.00x)
blend_w8_16bpc_rvv: 147.4 ( 4.20x)
blend_w16_16bpc_c: 2376.5 ( 1.00x)
blend_w16_16bpc_rvv: 466.0 ( 5.10x)
blend_w32_16bpc_c: 6008.6 ( 1.00x)
blend_w32_16bpc_rvv: 985.0 ( 6.10x)
SpacemiT K1
blend_w4_16bpc_c: 204.9 ( 1.00x)
blend_w4_16bpc_rvv: 88.3 ( 2.32x)
blend_w8_16bpc_c: 598.5 ( 1.00x)
blend_w8_16bpc_rvv: 155.3 ( 3.85x)
blend_w16_16bpc_c: 2315.4 ( 1.00x)
blend_w16_16bpc_rvv: 444.4 ( 5.21x)
blend_w32_16bpc_c: 5860.1 ( 1.00x)
blend_w32_16bpc_rvv: 993.0 ( 5.90x)
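The multiplier in parentheses is the C time divided by the RVV time for the same function. A minimal Python sketch of that arithmetic, using the Kendryte K230 figures above (the `speedup` helper is a name chosen here for illustration and is not part of checkasm):

```python
# Timings copied from the Kendryte K230 rows above: (C, RVV).
k230_blend_16bpc = {
    "w4": (214.4, 90.2),
    "w8": (618.9, 147.4),
    "w16": (2376.5, 466.0),
    "w32": (6008.6, 985.0),
}


def speedup(c_time, rvv_time):
    """Return how many times faster the RVV version is than the C version."""
    return c_time / rvv_time


for width, (c_time, rvv_time) in k230_blend_16bpc.items():
    print(f"blend_{width}_16bpc_rvv: {speedup(c_time, rvv_time):.2f}x")
```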
2024-10-29 08:21:53 +00:00
Nathan E. Egge
22e9c0fee3
riscv64/ipred16: Fix build error with -Dbitdepths=16
When configuring and building dav1d with only the 16bpc code paths using
meson setup .. -Dbitdepths=16, there is an undefined reference to
dav1d_dc_gen_8bpc_rvv due to a typo in src/riscv/64/ipred16.S.
2024-10-28 23:30:46 +00:00
Henrik Gramner
ef4aff75b0
x86: Improve SSSE3 SGR asm
* Use the same approach as AVX2 of using floating-point reciprocal
instructions to replace dav1d_sgr_x_by_x[] table lookups.
* Optimize clipping of p-values in the 10bpc code.
* Rename some macros to clarify their functionality.
* Implement various minor tweaks.
2024-10-22 00:00:32 +02:00
Martin Storsjö
55fb9433b7
checkasm: Remove leftover comment
This comment is no longer relevant after 9278a14cf4.
2024-10-18 14:37:28 +00:00
Martin Storsjö
23f2769266
meson: Test support for aarch64 extensions with gas-preprocessor too
2024-10-18 10:55:59 +00:00
Martin Storsjö
b13d1bc2bb
meson: Move checks for gas-preprocessor earlier
Locate the assembler tools before checking for support for various
assembler features.
2024-10-18 10:55:59 +00:00
Jean-Baptiste Kempf
32cf02af50
NEWS for 1.5.0
1.5.0
2024-10-18 01:02:57 +02:00
Nathan E. Egge
c3fa1db301
NEWS: add itx to riscv list
2024-10-16 18:06:00 +00:00
Nathan E. Egge
789a1f652b
riscv64/itx: Replace vwadd+vnsra with vnclip
The vnclip instruction performs a narrowing right shift with fixed-point
rounding and saturation, so it can replace vwadd followed by vnsra in
idct_4, idct_8, idct_16, iadst_8 and iadst_16.
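The replacement can be illustrated with a scalar model. The sketch below is an assumption-laden illustration, not the dav1d code: it assumes a 32-bit to 16-bit narrowing, a rounding mode of round-to-nearest-up (vxrm = rnu), and that the vwadd step adds the rounding constant 1 << (shift - 1). All function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_int16(value):
    """Keep the low 16 bits and reinterpret them as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_then_narrowing_shift(x, shift):
    """Two-step form: add the rounding constant, then shift and truncate to 16 bits."""
    return wrap_int16((x + (1 << (shift - 1))) >> shift)


def rounding_clip(x, shift):
    """One-step form: rounded right shift (round-to-nearest-up), then saturate to 16 bits."""
    rounded = (x + (1 << (shift - 1))) >> shift
    return max(INT16_MIN, min(INT16_MAX, rounded))


# The two forms agree whenever the shifted result fits in 16 bits.
assert all(
    add_then_narrowing_shift(x, 12) == rounding_clip(x, 12)
    for x in range(-134_000_000, 134_000_000, 99_991)
)
```

The two forms differ only when the shifted value overflows 16 bits: the truncating form wraps around, while the clipping form saturates.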
Together with 572c5a6 (which applies the same change to iadst_4), these
commits give the following average improvements across all modified 2D
transform functions:
Transform  Kendryte K230  SpacemiT K1
4x4           -5.50%        -4.44%
8x8           -9.78%        -7.62%
16x16         -9.70%        -9.04%
4x8           -8.39%        -7.54%
8x4           -8.10%        -4.66%
4x16          -8.16%        -7.74%
16x4          -8.07%        -6.96%
8x16          -9.11%        -7.43%
16x8          -9.87%        -7.81%
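The Delta column in the per-function tables that follow is the relative change from Old to New, so a negative value means the new figure is lower. A short Python sketch of that calculation, using three rows from the Kendryte K230 table (the `delta_percent` helper is a name chosen here for illustration):

```python
def delta_percent(old, new):
    """Relative change from old to new, in percent (negative means new is lower)."""
    return (new - old) / old * 100.0


# (function, Old, New) copied from the Kendryte K230 table.
rows = [
    ("inv_txfm_add_4x4_adst_adst_0_8bpc_rvv", 99.0, 93.4),
    ("inv_txfm_add_4x4_dct_dct_1_8bpc_rvv", 92.2, 82.1),
    ("inv_txfm_add_8x8_dct_dct_0_8bpc_rvv", 76.6, 77.0),
]

for name, old, new in rows:
    print(f"{name} {old} {new} {delta_percent(old, new):.2f}%")
```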
Kendryte K230 Old New Delta
inv_txfm_add_4x4_adst_adst_0_8bpc_rvv 99.0 93.4 -5.66%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv 99.0 93.4 -5.66%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv 93.4 87.2 -6.64%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv 93.5 87.2 -6.74%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv 100.3 94.9 -5.38%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv 100.3 94.9 -5.38%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv 80.5 77.2 -4.10%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv 80.5 77.2 -4.10%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv 94.1 88.5 -5.95%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv 94.1 88.5 -5.95%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv 40.3 40.3 0.00%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv 92.2 82.1 -10.95%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv 95.3 89.9 -5.67%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv 95.3 89.9 -5.67%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv 75.5 73.3 -2.91%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv 75.5 73.3 -2.91%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv 100.3 94.7 -5.58%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv 100.3 94.7 -5.58%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv 94.8 88.4 -6.75%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv 94.8 88.5 -6.65%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv 105.0 96.0 -8.57%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv 105.0 95.9 -8.67%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv 81.6 78.5 -3.80%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv 81.6 78.4 -3.92%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv 80.3 77.8 -3.11%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv 80.3 77.8 -3.11%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv 77.2 71.7 -7.12%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv 77.2 71.7 -7.12%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv 81.5 79.2 -2.82%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv 81.6 79.2 -2.94%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv 62.8 61.6 -1.91%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv 62.8 61.6 -1.91%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv 67.8 67.8 0.00%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv 67.8 67.8 0.00%
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv 403.1 356.1 -11.66%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv 403.1 356.0 -11.68%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv 360.2 323.2 -10.27%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv 360.2 323.2 -10.27%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv 405.2 358.4 -11.55%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv 405.2 358.4 -11.55%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv 284.3 261.0 -8.20%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv 284.4 260.9 -8.26%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv 360.2 322.0 -10.61%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv 360.0 321.9 -10.58%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv 76.6 77.0 0.52%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv 317.2 289.0 -8.89%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv 363.7 324.3 -10.83%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv 363.8 324.3 -10.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv 241.2 226.9 -5.93%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv 241.3 227.0 -5.93%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv 404.9 358.0 -11.58%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv 405.0 358.1 -11.58%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv 365.1 323.8 -11.31%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv 365.2 323.9 -11.31%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv 407.2 359.6 -11.69%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv 406.4 359.5 -11.54%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv 285.8 261.9 -8.36%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv 285.9 261.8 -8.43%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv 269.9 244.5 -9.41%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv 269.8 244.5 -9.38%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv 225.5 209.6 -7.05%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv 225.6 209.5 -7.14%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv 270.5 246.5 -8.87%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv 270.5 246.5 -8.87%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv 146.5 145.4 -0.75%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv 146.4 145.4 -0.68%
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv 1363.4 1212.0 -11.10%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv 1363.6 1212.2 -11.10%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv 1813.7 1601.4 -11.71%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv 1185.9 1074.6 -9.39%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv 1186.0 1074.7 -9.38%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv 1639.5 1468.9 -10.41%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv 1374.8 1214.8 -11.64%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv 1374.7 1214.6 -11.65%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv 1819.3 1610.9 -11.45%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv 1283.3 1139.1 -11.24%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv 1283.2 1139.2 -11.22%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv 1632.4 1471.9 -9.83%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv 160.9 158.7 -1.37%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv 1099.5 997.1 -9.31%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv 1465.3 1335.2 -8.88%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv 1286.8 1143.2 -11.16%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv 1286.8 1143.3 -11.15%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv 1638.6 1473.5 -10.08%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv 806.6 783.3 -2.89%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv 806.7 783.4 -2.89%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv 1163.1 1105.3 -4.97%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv 1374.3 1216.0 -11.52%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv 1374.3 1216.2 -11.50%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv 1817.5 1609.7 -11.43%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv 1190.4 1073.8 -9.80%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv 1190.4 1073.9 -9.79%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv 1640.4 1472.6 -10.23%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv 1376.0 1224.2 -11.03%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv 1376.0 1224.1 -11.04%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv 1829.3 1616.6 -11.63%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv 952.9 882.0 -7.44%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv 952.8 881.9 -7.44%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv 1172.0 1100.1 -6.13%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv 657.6 659.8 0.33%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv 657.6 659.7 0.32%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv 876.2 878.1 0.22%
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv 197.3 178.0 -9.78%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv 197.4 178.0 -9.83%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv 174.9 159.9 -8.58%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv 174.9 159.9 -8.58%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv 199.2 180.2 -9.54%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv 199.2 180.2 -9.54%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv 123.3 118.0 -4.30%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv 123.3 118.0 -4.30%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv 191.1 171.8 -10.10%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv 191.1 171.7 -10.15%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv 168.9 153.6 -9.06%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv 169.0 153.6 -9.11%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv 193.0 173.9 -9.90%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv 193.0 173.9 -9.90%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv 117.0 111.7 -4.53%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv 117.0 111.7 -4.53%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv 198.0 178.6 -9.80%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv 198.0 178.6 -9.80%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv 175.8 160.5 -8.70%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv 175.8 160.5 -8.70%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv 199.9 180.5 -9.70%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv 199.9 180.5 -9.70%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv 123.6 118.6 -4.05%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv 123.6 118.6 -4.05%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv 171.3 154.2 -9.98%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv 171.3 154.2 -9.98%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv 148.6 136.5 -8.14%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv 148.6 136.5 -8.14%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv 173.1 156.4 -9.65%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv 173.2 156.4 -9.70%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv 94.3 94.2 -0.11%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv 94.2 94.2 0.00%
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv 201.2 188.4 -6.36%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv 201.2 188.4 -6.36%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv 194.9 175.7 -9.85%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv 194.9 175.7 -9.85%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv 202.4 182.3 -9.93%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv 202.4 182.3 -9.93%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv 170.1 155.7 -8.47%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv 170.1 155.7 -8.47%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv 178.0 162.1 -8.93%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv 178.0 162.1 -8.93%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv 172.8 157.0 -9.14%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv 172.9 157.0 -9.20%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv 180.3 163.7 -9.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv 180.3 163.7 -9.21%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv 147.9 137.9 -6.76%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv 147.9 137.9 -6.76%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv 202.4 182.3 -9.93%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv 202.4 182.3 -9.93%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv 196.3 175.9 -10.39%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv 196.3 175.9 -10.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv 203.7 183.4 -9.97%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv 203.7 183.4 -9.97%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv 171.1 155.9 -8.88%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv 171.1 155.9 -8.88%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv 126.8 120.9 -4.65%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv 126.8 120.9 -4.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv 121.5 117.0 -3.70%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv 121.6 117.0 -3.78%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv 129.1 122.3 -5.27%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv 129.1 122.3 -5.27%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv 98.5 95.7 -2.84%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv 98.5 95.7 -2.84%
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv 384.4 344.6 -10.35%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv 384.5 344.6 -10.38%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv 429.3 387.3 -9.78%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv 333.7 304.3 -8.81%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv 333.7 304.2 -8.84%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv 381.2 354.2 -7.08%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv 385.7 349.1 -9.49%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv 385.7 349.1 -9.49%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv 433.0 389.3 -10.09%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv 251.6 244.2 -2.94%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv 251.5 244.1 -2.94%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv 300.4 289.6 -3.60%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv 378.5 335.6 -11.33%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv 378.5 335.5 -11.36%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv 420.6 369.5 -12.15%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv 323.5 295.3 -8.72%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv 323.2 295.2 -8.66%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv 362.9 333.0 -8.24%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv 375.3 339.4 -9.57%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv 375.4 339.0 -9.70%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv 414.8 372.2 -10.27%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv 240.8 234.7 -2.53%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv 240.7 234.7 -2.49%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv 283.2 268.0 -5.37%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv 384.2 345.8 -9.99%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv 384.1 345.8 -9.97%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv 432.5 387.7 -10.36%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv 334.9 307.0 -8.33%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv 335.0 307.1 -8.33%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv 386.1 347.2 -10.08%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv 386.7 349.4 -9.65%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv 386.8 349.5 -9.64%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv 436.6 392.9 -10.01%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv 252.4 247.4 -1.98%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv 252.4 247.5 -1.94%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv 302.1 286.7 -5.10%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv 348.3 317.4 -8.87%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv 348.4 317.5 -8.87%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv 361.4 329.0 -8.97%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv 301.8 275.8 -8.61%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv 301.8 275.8 -8.61%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv 312.0 287.4 -7.88%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv 352.2 321.9 -8.60%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv 352.2 322.0 -8.57%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv 363.7 332.5 -8.58%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv 215.8 215.0 -0.37%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv 215.8 215.1 -0.32%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv 228.0 227.0 -0.44%
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv 430.3 388.5 -9.71%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv 430.3 388.5 -9.71%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv 430.2 388.5 -9.69%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv 412.1 374.1 -9.22%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv 412.0 374.3 -9.15%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv 412.1 374.2 -9.20%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv 432.9 391.0 -9.68%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv 432.8 391.1 -9.63%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv 432.4 391.0 -9.57%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv 358.4 332.1 -7.34%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv 358.4 332.3 -7.28%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv 358.5 332.5 -7.25%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv 386.9 347.1 -10.29%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv 386.8 347.1 -10.26%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv 387.0 346.8 -10.39%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv 363.3 330.9 -8.92%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv 363.3 330.9 -8.92%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv 363.2 331.0 -8.87%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv 383.7 349.8 -8.84%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv 384.3 349.8 -8.98%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv 384.3 349.7 -9.00%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv 310.2 288.4 -7.03%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv 310.2 288.4 -7.03%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv 310.3 288.5 -7.03%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv 434.1 391.5 -9.81%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv 434.1 392.0 -9.70%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv 434.1 392.0 -9.70%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv 423.5 375.5 -11.33%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv 423.5 375.4 -11.36%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv 423.5 375.5 -11.33%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv 438.0 396.1 -9.57%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv 438.1 396.0 -9.61%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv 438.0 395.8 -9.63%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv 361.9 333.0 -7.99%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv 362.4 333.0 -8.11%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv 362.4 333.0 -8.11%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv 308.3 296.3 -3.89%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv 308.4 296.4 -3.89%
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv 308.4 296.4 -3.89%
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv 289.9 279.9 -3.45%
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv 289.9 280.0 -3.41%
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv 290.0 279.9 -3.48%
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv 311.2 298.9 -3.95%
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv 311.1 298.9 -3.92%
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv 310.9 298.9 -3.86%
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv 238.4 243.2 2.01%
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv 238.4 243.2 2.01%
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv 238.5 243.2 1.97%
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv 701.5 624.2 -11.02%
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv 701.6 624.2 -11.03%
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv 853.5 755.2 -11.52%
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv 611.1 551.6 -9.74%
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv 611.2 551.7 -9.73%
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv 765.0 682.8 -10.75%
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv 703.4 629.3 -10.53%
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv 703.4 629.5 -10.51%
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv 858.1 763.9 -10.98%
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv 463.7 440.2 -5.07%
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv 464.3 440.2 -5.19%
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv 618.6 571.7 -7.58%
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv 660.3 590.5 -10.57%
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv 660.2 590.3 -10.59%
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv 776.2 687.9 -11.38%
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv 566.9 516.3 -8.93%
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv 567.1 516.4 -8.94%
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv 685.9 616.6 -10.10%
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv 663.3 593.5 -10.52%
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv 663.2 593.5 -10.51%
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv 771.7 690.5 -10.52%
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv 421.3 406.1 -3.61%
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv 421.3 406.1 -3.61%
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv 536.6 503.6 -6.15%
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv 703.3 627.1 -10.83%
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv 703.4 627.2 -10.83%
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv 857.7 763.7 -10.96%
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv 613.5 552.8 -9.89%
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv 613.4 552.7 -9.90%
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv 771.0 693.1 -10.10%
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv 706.3 631.4 -10.60%
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv 706.5 631.7 -10.59%
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv 861.1 764.9 -11.17%
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv 467.0 443.0 -5.14%
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv 467.0 443.0 -5.14%
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv 623.7 575.1 -7.79%
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv 565.6 512.0 -9.48%
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv 565.6 512.9 -9.32%
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv 585.6 532.8 -9.02%
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv 476.4 439.9 -7.66%
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv 476.4 440.0 -7.64%
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv 496.3 459.5 -7.41%
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv 570.7 516.4 -9.51%
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv 570.6 516.3 -9.52%
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv 590.2 540.0 -8.51%
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv 330.9 329.9 -0.30%
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv 330.9 329.9 -0.30%
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv 350.8 349.7 -0.31%
inv_txfm_add_16x8_adst_adst_0_8bpc_rvv 855.5 752.1 -12.09%
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv 855.5 751.9 -12.11%
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv 855.4 752.1 -12.08%
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv 765.4 685.5 -10.44%
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv 765.5 685.3 -10.48%
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv 765.5 685.5 -10.45%
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv 859.2 755.8 -12.03%
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv 859.1 756.0 -12.00%
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv 859.1 755.9 -12.01%
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv 612.8 561.9 -8.31%
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv 612.9 561.9 -8.32%
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv 612.8 561.9 -8.31%
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv 765.1 676.0 -11.65%
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv 765.0 676.2 -11.61%
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv 765.0 676.2 -11.61%
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv 674.5 612.0 -9.27%
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv 674.5 612.1 -9.25%
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv 674.6 612.0 -9.28%
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv 777.2 679.9 -12.52%
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv 777.1 680.1 -12.48%
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv 777.1 680.0 -12.50%
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv 522.2 488.2 -6.51%
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv 522.1 488.2 -6.49%
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv 522.1 487.5 -6.63%
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv 859.2 753.5 -12.30%
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv 859.2 753.6 -12.29%
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv 859.2 753.5 -12.30%
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv 768.9 689.0 -10.39%
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv 768.9 689.2 -10.37%
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv 768.8 689.2 -10.35%
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv 863.0 758.7 -12.09%
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv 862.9 758.7 -12.08%
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv 863.0 758.6 -12.10%
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv 616.5 566.7 -8.08%
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv 616.6 566.6 -8.11%
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv 616.3 567.0 -8.00%
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv 618.1 564.5 -8.67%
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv 618.0 564.5 -8.66%
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv 617.7 564.6 -8.60%
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv 527.9 500.6 -5.17%
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv 527.8 500.7 -5.13%
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv 527.7 500.7 -5.12%
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv 622.3 568.5 -8.65%
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv 622.2 568.5 -8.63%
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv 622.3 568.4 -8.66%
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv 373.4 374.4 0.27%
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv 373.4 374.5 0.29%
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv 373.4 374.4 0.27%
SpacemiT K1 Old New Delta
inv_txfm_add_4x4_adst_adst_0_8bpc_rvv 101.0 96.8 -4.16%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv 101.1 96.8 -4.25%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv 96.8 91.7 -5.27%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv 95.9 91.8 -4.28%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv 102.2 97.9 -4.21%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv 102.2 97.9 -4.21%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv 82.4 80.4 -2.43%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv 82.4 80.5 -2.31%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv 97.3 92.6 -4.83%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv 97.2 92.3 -5.04%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv 41.2 41.3 0.24%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv 96.0 87.5 -8.85%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv 98.5 94.5 -4.06%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv 98.6 94.7 -3.96%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv 78.6 76.1 -3.18%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv 78.6 76.0 -3.31%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv 104.3 99.1 -4.99%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv 104.4 99.1 -5.08%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv 98.0 94.6 -3.47%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv 98.1 94.4 -3.77%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv 104.2 99.2 -4.80%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv 104.3 99.2 -4.89%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv 86.9 81.8 -5.87%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv 87.0 81.9 -5.86%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv 86.0 80.8 -6.05%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv 85.9 81.4 -5.24%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv 78.5 76.1 -3.06%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv 78.6 76.1 -3.18%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv 85.9 82.5 -3.96%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv 85.9 82.3 -4.19%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv 65.9 64.9 -1.52%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv 65.9 64.8 -1.67%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv 71.2 71.3 0.14%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv 71.2 71.3 0.14%
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv 440.6 399.3 -9.37%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv 440.6 399.3 -9.37%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv 401.7 368.4 -8.29%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv 401.8 368.4 -8.31%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv 442.4 401.2 -9.31%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv 442.4 401.1 -9.34%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv 329.7 310.1 -5.94%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv 329.7 310.1 -5.94%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv 401.8 367.4 -8.56%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv 401.7 367.3 -8.56%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv 79.5 80.2 0.88%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv 362.1 335.8 -7.26%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv 405.0 369.2 -8.84%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv 405.1 369.2 -8.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv 290.9 278.2 -4.37%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv 290.8 278.2 -4.33%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv 442.5 401.1 -9.36%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv 442.5 401.2 -9.33%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv 405.8 369.2 -9.02%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv 405.8 369.1 -9.04%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv 444.3 403.0 -9.30%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv 444.3 403.1 -9.27%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv 331.6 310.9 -6.24%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv 331.6 310.9 -6.24%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv 313.3 292.6 -6.61%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv 313.1 292.6 -6.55%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv 274.5 260.6 -5.06%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv 274.4 260.7 -4.99%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv 315.3 294.4 -6.63%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv 315.3 294.4 -6.63%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv 202.5 202.5 0.00%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv 202.6 202.5 -0.05%
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv 1418.8 1268.2 -10.61%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv 1418.9 1268.3 -10.61%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv 1943.3 1733.6 -10.79%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv 1241.7 1134.6 -8.63%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv 1241.5 1134.5 -8.62%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv 1772.5 1599.8 -9.74%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv 1429.8 1270.3 -11.16%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv 1429.7 1270.1 -11.16%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv 1951.1 1741.4 -10.75%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv 1337.8 1195.8 -10.61%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv 1337.5 1196.0 -10.58%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv 1763.2 1604.6 -9.00%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv 179.3 181.1 1.00%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv 1153.8 1060.7 -8.07%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv 1601.6 1470.6 -8.18%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv 1340.7 1199.8 -10.51%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv 1340.4 1199.8 -10.49%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv 1771.2 1606.6 -9.29%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv 877.9 854.9 -2.62%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv 877.7 855.2 -2.56%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv 1311.6 1254.1 -4.38%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv 1428.2 1270.5 -11.04%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv 1428.3 1270.6 -11.04%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv 1947.3 1737.3 -10.78%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv 1245.8 1133.5 -9.01%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv 1246.0 1133.7 -9.01%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv 1769.9 1603.9 -9.38%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv 1428.7 1279.7 -10.43%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv 1428.8 1279.5 -10.45%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv 1960.8 1745.8 -10.96%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv 1016.6 948.8 -6.67%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv 1016.7 948.8 -6.68%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv 1319.8 1247.7 -5.46%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv 735.4 736.6 0.16%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv 735.3 736.4 0.15%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv 1037.8 1036.7 -0.11%
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv 197.2 179.9 -8.77%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv 197.1 180.0 -8.68%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv 177.5 164.2 -7.49%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv 177.5 164.3 -7.44%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv 199.3 181.8 -8.78%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv 199.0 181.8 -8.64%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv 126.7 121.8 -3.87%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv 126.7 121.9 -3.79%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv 189.8 172.4 -9.17%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv 189.8 172.4 -9.17%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv 170.2 156.8 -7.87%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv 170.2 156.9 -7.81%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv 192.6 174.2 -9.55%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv 192.6 174.2 -9.55%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv 119.4 114.3 -4.27%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv 119.6 114.2 -4.52%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv 197.7 180.5 -8.70%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv 197.8 180.6 -8.70%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv 178.3 165.0 -7.46%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv 178.3 164.9 -7.52%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv 199.7 182.5 -8.61%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv 200.0 182.4 -8.80%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv 127.2 122.3 -3.85%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv 127.3 122.5 -3.77%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv 172.1 155.0 -9.94%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv 172.1 155.0 -9.94%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv 148.7 139.4 -6.25%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv 148.7 139.5 -6.19%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv 171.7 156.8 -8.68%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv 171.6 156.9 -8.57%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv 96.8 96.8 0.00%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv 96.7 96.7 0.00%
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv 228.1 220.0 -3.55%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv 227.9 219.9 -3.51%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv 219.4 206.4 -5.93%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv 219.4 206.4 -5.93%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv 229.4 214.7 -6.41%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv 229.4 214.8 -6.36%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv 195.6 187.6 -4.09%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv 195.8 187.6 -4.19%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv 207.0 195.2 -5.70%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv 206.9 195.2 -5.65%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv 199.4 188.2 -5.62%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv 199.4 188.5 -5.47%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv 209.5 196.5 -6.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv 209.7 196.6 -6.25%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv 175.7 169.5 -3.53%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv 175.9 169.6 -3.58%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv 229.0 214.7 -6.24%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv 229.3 214.5 -6.45%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv 220.9 206.7 -6.43%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv 220.6 206.5 -6.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv 230.6 215.9 -6.37%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv 230.7 215.9 -6.42%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv 196.9 188.9 -4.06%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv 196.9 188.9 -4.06%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv 157.6 154.7 -1.84%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv 157.5 154.9 -1.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv 150.0 147.9 -1.40%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv 150.0 147.7 -1.53%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv 159.6 155.9 -2.32%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv 159.8 155.6 -2.63%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv 128.6 128.8 0.16%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv 128.4 129.3 0.70%
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv 373.8 335.9 -10.14%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv 373.8 335.7 -10.19%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv 417.4 380.0 -8.96%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv 328.3 301.7 -8.10%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv 328.0 302.0 -7.93%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv 374.3 351.3 -6.14%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv 374.5 339.8 -9.27%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv 374.3 339.4 -9.32%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv 422.0 383.8 -9.05%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv 248.0 242.9 -2.06%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv 248.0 242.2 -2.34%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv 298.6 290.3 -2.78%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv 370.5 329.4 -11.09%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv 370.8 329.0 -11.27%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv 409.1 360.9 -11.78%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv 321.1 293.7 -8.53%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv 321.0 294.3 -8.32%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv 357.8 329.8 -7.83%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv 369.7 332.9 -9.95%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv 370.4 333.0 -10.10%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv 405.5 364.9 -10.01%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv 241.6 236.6 -2.07%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv 241.8 235.6 -2.56%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv 281.9 266.9 -5.32%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv 371.9 337.3 -9.30%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv 372.2 337.1 -9.43%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv 419.8 381.5 -9.12%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv 328.3 302.9 -7.74%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv 328.4 303.3 -7.64%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv 380.6 343.7 -9.70%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv 377.7 341.1 -9.69%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv 377.6 341.5 -9.56%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv 423.6 386.7 -8.71%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv 250.0 245.7 -1.72%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv 249.3 246.0 -1.32%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv 296.4 284.7 -3.95%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv 343.0 311.2 -9.27%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv 342.9 311.0 -9.30%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv 354.8 325.0 -8.40%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv 298.9 274.9 -8.03%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv 298.8 275.0 -7.97%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv 310.3 289.1 -6.83%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv 344.7 314.9 -8.65%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv 344.5 314.8 -8.62%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv 358.3 328.6 -8.29%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv 219.6 216.1 -1.59%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv 218.3 216.3 -0.92%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv 231.3 229.6 -0.73%
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv 468.5 428.8 -8.47%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv 468.5 428.9 -8.45%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv 468.5 428.9 -8.45%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv 453.8 414.5 -8.66%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv 453.8 414.5 -8.66%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv 453.9 414.4 -8.70%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv 471.0 431.5 -8.39%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv 471.0 431.3 -8.43%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv 471.0 431.5 -8.39%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv 402.2 375.0 -6.76%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv 402.1 375.0 -6.74%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv 402.0 375.3 -6.64%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv 432.8 392.5 -9.31%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv 432.8 392.5 -9.31%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv 432.8 392.5 -9.31%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv 407.9 378.3 -7.26%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv 407.8 378.1 -7.28%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv 407.8 378.1 -7.28%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv 426.0 395.1 -7.25%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv 425.9 395.0 -7.26%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv 426.0 395.1 -7.25%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv 357.1 338.7 -5.15%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv 357.1 338.7 -5.15%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv 357.2 338.7 -5.18%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv 472.4 432.6 -8.43%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv 472.2 432.6 -8.39%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv 472.3 432.7 -8.38%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv 464.3 418.2 -9.93%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv 464.2 418.2 -9.91%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv 464.2 418.2 -9.91%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv 474.7 435.1 -8.34%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv 474.8 435.1 -8.36%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv 474.7 435.1 -8.34%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv 405.9 378.8 -6.68%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv 406.0 378.8 -6.70%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv 406.0 378.8 -6.70%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv 353.7 342.2 -3.25%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv 353.8 342.3 -3.25%
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv 353.7 342.4 -3.19%
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv 338.1 327.9 -3.02%
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv 338.1 327.9 -3.02%
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv 338.2 327.9 -3.05%
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv 357.5 344.8 -3.55%
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv 357.5 344.9 -3.52%
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv 357.5 344.7 -3.58%
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv 287.1 297.0 3.45%
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv 287.2 297.0 3.41%
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv 287.2 297.0 3.41%
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv 774.3 704.8 -8.98%
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv 774.4 704.8 -8.99%
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv 929.5 839.9 -9.64%
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv 687.9 634.9 -7.70%
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv 688.0 634.8 -7.73%
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv 845.5 768.4 -9.12%
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv 779.5 708.5 -9.11%
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv 779.5 708.5 -9.11%
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv 933.3 849.9 -8.94%
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv 546.5 529.0 -3.20%
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv 546.5 529.0 -3.20%
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv 702.5 664.1 -5.47%
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv 739.9 672.7 -9.08%
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv 739.9 672.7 -9.08%
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv 863.1 776.1 -10.08%
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv 651.2 601.9 -7.57%
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv 651.2 601.8 -7.59%
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv 777.6 706.5 -9.14%
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv 742.4 678.9 -8.55%
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv 742.5 678.9 -8.57%
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv 858.8 779.3 -9.26%
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv 510.8 496.4 -2.82%
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv 510.6 496.5 -2.76%
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv 630.0 599.7 -4.81%
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv 778.3 707.2 -9.14%
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv 778.3 707.1 -9.15%
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv 934.4 843.5 -9.73%
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv 689.3 634.7 -7.92%
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv 689.2 634.8 -7.89%
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv 845.8 774.4 -8.44%
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv 779.9 710.5 -8.90%
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv 780.0 710.4 -8.92%
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv 936.4 848.1 -9.43%
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv 550.4 531.3 -3.47%
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv 550.4 531.3 -3.47%
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv 705.3 669.4 -5.09%
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv 649.0 599.7 -7.60%
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv 649.0 599.7 -7.60%
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv 682.8 633.4 -7.23%
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv 562.1 527.9 -6.08%
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv 562.0 527.9 -6.07%
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv 597.4 561.5 -6.01%
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv 652.7 603.6 -7.52%
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv 652.8 603.6 -7.54%
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv 686.6 640.5 -6.71%
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv 421.6 424.4 0.66%
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv 421.7 424.4 0.64%
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv 455.5 458.1 0.57%
inv_txfm_add_16x8_adst_adst_0_8bpc_rvv 935.2 843.2 -9.84%
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv 935.2 843.3 -9.83%
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv 935.2 843.1 -9.85%
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv 857.0 781.1 -8.86%
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv 856.9 781.1 -8.85%
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv 856.9 781.0 -8.86%
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv 938.9 846.8 -9.81%
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv 938.8 847.0 -9.78%
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv 938.9 847.0 -9.79%
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv 711.2 661.6 -6.97%
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv 711.2 661.6 -6.97%
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv 711.2 661.6 -6.97%
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv 846.1 771.5 -8.82%
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv 845.9 771.5 -8.80%
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv 846.2 772.1 -8.76%
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv 767.8 710.3 -7.49%
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv 767.8 710.4 -7.48%
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv 767.4 710.4 -7.43%
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv 856.6 775.6 -9.46%
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv 856.5 775.1 -9.50%
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv 856.6 775.2 -9.50%
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv 623.3 589.9 -5.36%
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv 623.3 590.0 -5.34%
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv 623.3 589.7 -5.39%
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv 939.8 846.9 -9.89%
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv 939.8 847.0 -9.87%
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv 939.9 846.9 -9.89%
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv 860.8 784.9 -8.82%
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv 860.7 784.8 -8.82%
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv 860.8 784.9 -8.82%
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv 942.7 852.2 -9.60%
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv 942.7 852.1 -9.61%
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv 942.8 852.1 -9.62%
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv 714.9 667.0 -6.70%
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv 715.0 666.9 -6.73%
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv 715.0 666.9 -6.73%
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv 707.9 667.2 -5.75%
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv 707.9 667.3 -5.74%
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv 707.9 667.2 -5.75%
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv 630.6 604.8 -4.09%
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv 630.7 604.9 -4.09%
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv 630.6 604.8 -4.09%
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv 711.7 671.1 -5.70%
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv 711.9 671.1 -5.73%
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv 711.8 671.2 -5.70%
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv 485.2 486.2 0.21%
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv 485.2 486.3 0.23%
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv 485.2 486.3 0.23%
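The last column of the table above appears to be the relative change between the two timing columns. A minimal sketch of that arithmetic follows, assuming the first number is the "before" timing and the second the "after" timing; the helper name `percent_change` is hypothetical and not part of dav1d or checkasm.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# A few rows copied from the table above: (name, before, after).
rows = [
    ("inv_txfm_add_16x16_dct_identity_2_8bpc_rvv", 1311.6, 1254.1),
    ("inv_txfm_add_4x8_identity_identity_0_8bpc_rvv", 96.8, 96.8),
    ("inv_txfm_add_16x4_identity_identity_0_8bpc_rvv", 287.1, 297.0),
]

for name, before, after in rows:
    # Two decimals, matching the precision used in the table.
    print(f"{name} {before} {after} {percent_change(before, after):.2f}%")
```

Under this reading, negative values mean the second column is faster, and the identity_identity rows are the only ones that regress slightly.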
2024-10-16 11:04:14 +00:00
Jean-Baptiste Kempf
389450f61e
NEWS: last updates about optimizations
2024-10-14 19:21:07 +02:00
Luca Barbato
79f7188c25
NEWS: add an entry for the Power9 optimization
2024-10-13 21:51:36 +02:00
Nathan E. Egge
572c5a669d
riscv: Fix argon test failure
...
This fixes md5sum mismatch in profile0_core/streams/test11168_11073.obu.
2024-10-13 18:11:27 +00:00
yuanhecai
257b04f91c
loongarch: fix argon tests failure
2024-10-12 15:19:57 +08:00
Bogdan Gligorijević and Luca Barbato
b2e7f06c72
riscv64/mc: warp_8x8 and warp_8x8t 8bpc
...
Benchmarks:
- Kendryte K230:
warp_8x8_8bpc_c: 4549.7 ( 1.00x)
warp_8x8_8bpc_rvv: 2504.7 ( 1.82x)
warp_8x8t_8bpc_c: 4414.7 ( 1.00x)
warp_8x8t_8bpc_rvv: 2465.7 ( 1.79x)
- Banana Pi BPI-F3:
warp_8x8_8bpc_c: 4431.2 ( 1.00x)
warp_8x8_8bpc_rvv: 3297.4 ( 1.34x)
warp_8x8t_8bpc_c: 4299.3 ( 1.00x)
warp_8x8t_8bpc_rvv: 3255.7 ( 1.32x)
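The speedup factors in listings like the one above can be recomputed from the raw timings. The sketch below parses lines in this format and divides the C timing by the RVV timing; the regular expression and the helper names `parse` and `speedups` are assumptions made for illustration, not part of checkasm. Because the listed timings are already rounded, a recomputed ratio can occasionally differ from the printed one in the last digit.

```python
import re

# Matches lines such as "warp_8x8_8bpc_c: 4549.7 ( 1.00x)".
LINE = re.compile(r"^(\w+)_(c|rvv):\s+([\d.]+)\s+\(\s*([\d.]+)x\)$")


def parse(lines):
    """Collect timings per function name, keyed by implementation (c or rvv)."""
    results = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            name, impl, timing, _ratio = m.groups()
            results.setdefault(name, {})[impl] = float(timing)
    return results


def speedups(results):
    """Speedup of the RVV version over the C version, rounded to two decimals."""
    return {
        name: round(t["c"] / t["rvv"], 2)
        for name, t in results.items()
        if "c" in t and "rvv" in t
    }


sample = [
    "warp_8x8_8bpc_c: 4549.7 ( 1.00x)",
    "warp_8x8_8bpc_rvv: 2504.7 ( 1.82x)",
    "warp_8x8t_8bpc_c: 4414.7 ( 1.00x)",
    "warp_8x8t_8bpc_rvv: 2465.7 ( 1.79x)",
]
print(speedups(parse(sample)))
```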
2024-10-09 21:00:08 +02:00
Niklas Haas and Luca Barbato
56f6d16602
riscv64/mc: Re-order instructions
...
Re-orders instructions to avoid read-after-write hazards. The speedup is about 1% for width=4 on a K230.
2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato
3d12677c54
riscv64/mc: Add bidir functions
...
This code is a compromise between the performance of a dedicated kernel
per VLEN/width pair and the flexibility of a fully VLEN-dynamic loop. It
uses a single special case for w=4 and splits the remaining widths between
an unrolled four-line fast path and a general-purpose slow path (for
large width on small VLEN).
Kendryte K230
avg_w4_8bpc_c: 346.8 ( 1.00x)
avg_w4_8bpc_rvv: 50.3 ( 6.90x)
avg_w8_8bpc_c: 1054.9 ( 1.00x)
avg_w8_8bpc_rvv: 139.1 ( 7.58x)
avg_w16_8bpc_c: 3396.3 ( 1.00x)
avg_w16_8bpc_rvv: 350.6 ( 9.69x)
avg_w32_8bpc_c: 13734.3 ( 1.00x)
avg_w32_8bpc_rvv: 1226.3 (11.20x)
avg_w64_8bpc_c: 33260.9 ( 1.00x)
avg_w64_8bpc_rvv: 3869.4 ( 8.60x)
avg_w128_8bpc_c: 83441.3 ( 1.00x)
avg_w128_8bpc_rvv: 9765.1 ( 8.54x)
w_avg_w4_8bpc_c: 444.3 ( 1.00x)
w_avg_w4_8bpc_rvv: 75.8 ( 5.86x)
w_avg_w8_8bpc_c: 1365.6 ( 1.00x)
w_avg_w8_8bpc_rvv: 208.8 ( 6.54x)
w_avg_w16_8bpc_c: 4420.8 ( 1.00x)
w_avg_w16_8bpc_rvv: 570.7 ( 7.75x)
w_avg_w32_8bpc_c: 18010.9 ( 1.00x)
w_avg_w32_8bpc_rvv: 2074.4 ( 8.68x)
w_avg_w64_8bpc_c: 43050.4 ( 1.00x)
w_avg_w64_8bpc_rvv: 5799.5 ( 7.42x)
w_avg_w128_8bpc_c: 107153.6 ( 1.00x)
w_avg_w128_8bpc_rvv: 14272.0 ( 7.51x)
mask_w4_8bpc_c: 497.6 ( 1.00x)
mask_w4_8bpc_rvv: 88.5 ( 5.63x)
mask_w8_8bpc_c: 1528.5 ( 1.00x)
mask_w8_8bpc_rvv: 253.1 ( 6.04x)
mask_w16_8bpc_c: 4953.8 ( 1.00x)
mask_w16_8bpc_rvv: 679.0 ( 7.30x)
mask_w32_8bpc_c: 20298.3 ( 1.00x)
mask_w32_8bpc_rvv: 3012.9 ( 6.74x)
mask_w64_8bpc_c: 49718.8 ( 1.00x)
mask_w64_8bpc_rvv: 7291.7 ( 6.82x)
mask_w128_8bpc_c: 126740.3 ( 1.00x)
mask_w128_8bpc_rvv: 18351.1 ( 6.91x)
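The three-way split described in the commit message above (a w=4 special case, an unrolled four-line fast path, and a general slow path) can be sketched as a small dispatch function. This is only an illustration: the criterion used here for choosing the fast path (one row of 16-bit intermediates fitting in a register group of LMUL=8) is an assumption, not the condition used in the dav1d assembly, and the name `choose_path` is hypothetical.

```python
def choose_path(w: int, vlen_bits: int, elem_bits: int = 16, max_lmul: int = 8) -> str:
    """Pick a code path name for a block of width w (hypothetical criterion)."""
    if w == 4:
        return "w4"  # the single dedicated special case
    # Assumption: the fast path applies when a full row fits in one
    # vector register group of max_lmul registers.
    if w * elem_bits <= vlen_bits * max_lmul:
        return "fast"  # unrolled, four lines per iteration
    return "slow"  # general-purpose loop for large width on small VLEN


for vlen in (128, 256):
    for w in (4, 8, 16, 32, 64, 128):
        print(f"VLEN={vlen} w={w}: {choose_path(w, vlen)}")
```

With these assumed parameters, only w=128 on VLEN=128 falls through to the slow path, which matches the "large width on small VLEN" wording but not necessarily the real thresholds.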
2024-10-09 16:18:42 +02:00
Niklas Haas and Luca Barbato
50ac82603a
riscv: Add $vtype helper definitions
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
cc7d8773ee
riscv64/mc: Branchless vsetvl in blend_v function
...
Kendryte K230
blend_v_w2_8bpc_c: 221.4 ( 1.00x)
blend_v_w2_8bpc_rvv: 147.7 ( 1.50x)
blend_v_w4_8bpc_c: 945.3 ( 1.00x)
blend_v_w4_8bpc_rvv: 243.3 ( 3.89x)
blend_v_w8_8bpc_c: 1786.9 ( 1.00x)
blend_v_w8_8bpc_rvv: 256.1 ( 6.98x)
blend_v_w16_8bpc_c: 3472.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 351.1 ( 9.89x)
blend_v_w32_8bpc_c: 6832.1 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.4 (10.75x)
SpacemiT K1
blend_v_w2_8bpc_c: 218.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 144.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.7 ( 1.00x)
blend_v_w4_8bpc_rvv: 237.1 ( 3.89x)
blend_v_w8_8bpc_c: 1739.8 ( 1.00x)
blend_v_w8_8bpc_rvv: 237.4 ( 7.33x)
blend_v_w16_8bpc_c: 3376.6 ( 1.00x)
blend_v_w16_8bpc_rvv: 296.3 (11.40x)
blend_v_w32_8bpc_c: 6647.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 408.1 (16.29x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
2da8107ec1
riscv64/mc: Branchless vsetvl in blend_h function
...
Kendryte K230
blend_h_w2_8bpc_c: 165.9 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.8 ( 1.98x)
blend_h_w4_8bpc_c: 295.2 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.8 ( 3.52x)
blend_h_w8_8bpc_c: 557.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 92.5 ( 6.03x)
blend_h_w16_8bpc_c: 1078.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 117.3 ( 9.19x)
blend_h_w32_8bpc_c: 2117.8 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.57x)
blend_h_w64_8bpc_c: 4194.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.2 (11.55x)
blend_h_w128_8bpc_c: 10271.4 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.5 (12.16x)
SpacemiT K1
blend_h_w2_8bpc_c: 162.5 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.9 ( 1.94x)
blend_h_w4_8bpc_c: 288.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 544.7 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.0 ( 6.48x)
blend_h_w16_8bpc_c: 1052.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 102.9 (10.23x)
blend_h_w32_8bpc_c: 2068.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 131.4 (15.73x)
blend_h_w64_8bpc_c: 4093.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 220.3 (18.58x)
blend_h_w128_8bpc_c: 10023.1 ( 1.00x)
blend_h_w128_8bpc_rvv: 467.3 (21.45x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
b374b24c0f
riscv64/mc: Branchless vsetvl in blend function
...
Kendryte K230
blend_w4_8bpc_c: 204.8 ( 1.00x)
blend_w4_8bpc_rvv: 59.8 ( 3.42x)
blend_w8_8bpc_c: 608.9 ( 1.00x)
blend_w8_8bpc_rvv: 87.2 ( 6.98x)
blend_w16_8bpc_c: 2362.4 ( 1.00x)
blend_w16_8bpc_rvv: 225.2 (10.49x)
blend_w32_8bpc_c: 5990.4 ( 1.00x)
blend_w32_8bpc_rvv: 518.3 (11.56x)
SpacemiT K1
blend_w4_8bpc_c: 201.6 ( 1.00x)
blend_w4_8bpc_rvv: 58.0 ( 3.48x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 82.1 ( 7.25x)
blend_w16_8bpc_c: 2308.8 ( 1.00x)
blend_w16_8bpc_rvv: 189.0 (12.22x)
blend_w32_8bpc_c: 5853.1 ( 1.00x)
blend_w32_8bpc_rvv: 339.5 (17.24x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
0e3f70e898
riscv64/mc: Add VLEN=256 8bpc RVV blend_v function
...
SpacemiT K1
blend_v_w2_8bpc_c: 217.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 143.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.6 ( 1.00x)
blend_v_w4_8bpc_rvv: 236.3 ( 3.90x)
blend_v_w8_8bpc_c: 1738.2 ( 1.00x)
blend_v_w8_8bpc_rvv: 238.1 ( 7.30x)
blend_v_w16_8bpc_c: 3376.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 298.0 (11.33x)
blend_v_w32_8bpc_c: 6648.0 ( 1.00x)
blend_v_w32_8bpc_rvv: 409.5 (16.24x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
a5b9544866
riscv64/mc: Add VLEN=256 8bpc RVV blend_h function
...
SpacemiT K1
blend_h_w2_8bpc_c: 161.8 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.5 ( 1.94x)
blend_h_w4_8bpc_c: 288.4 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 543.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.5 ( 6.44x)
blend_h_w16_8bpc_c: 1051.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 103.8 (10.13x)
blend_h_w32_8bpc_c: 2066.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 133.8 (15.44x)
blend_h_w64_8bpc_c: 4092.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 225.2 (18.18x)
blend_h_w128_8bpc_c: 10011.3 ( 1.00x)
blend_h_w128_8bpc_rvv: 474.7 (21.09x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
83485c5092
riscv64/mc: Add VLEN=256 8bpc RVV blend function
...
SpacemiT K1
blend_w4_8bpc_c: 201.3 ( 1.00x)
blend_w4_8bpc_rvv: 59.3 ( 3.40x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 84.1 ( 7.07x)
blend_w16_8bpc_c: 2309.0 ( 1.00x)
blend_w16_8bpc_rvv: 190.5 (12.12x)
blend_w32_8bpc_c: 5854.7 ( 1.00x)
blend_w32_8bpc_rvv: 341.6 (17.14x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
7f2bb2fbc9
riscv: Move get_vlenb() from checkasm_ to dav1d_
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
01da36ebdf
riscv64/mc: Add 8bpc RVV blend_v function
...
Kendryte K230
blend_v_w2_8bpc_c: 219.6 ( 1.00x)
blend_v_w2_8bpc_rvv: 141.8 ( 1.55x)
blend_v_w4_8bpc_c: 942.9 ( 1.00x)
blend_v_w4_8bpc_rvv: 240.9 ( 3.91x)
blend_v_w8_8bpc_c: 1783.5 ( 1.00x)
blend_v_w8_8bpc_rvv: 254.7 ( 7.00x)
blend_v_w16_8bpc_c: 3466.5 ( 1.00x)
blend_v_w16_8bpc_rvv: 350.5 ( 9.89x)
blend_v_w32_8bpc_c: 6825.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.1 (10.75x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
d3a94f1194
riscv64/mc: Add 8bpc RVV blend_h function
...
Kendryte K230
blend_h_w2_8bpc_c: 165.4 ( 1.00x)
blend_h_w2_8bpc_rvv: 79.4 ( 2.08x)
blend_h_w4_8bpc_c: 294.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 81.5 ( 3.61x)
blend_h_w8_8bpc_c: 556.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 90.2 ( 6.17x)
blend_h_w16_8bpc_c: 1077.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 116.1 ( 9.29x)
blend_h_w32_8bpc_c: 2116.2 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.55x)
blend_h_w64_8bpc_c: 4191.8 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.3 (11.54x)
blend_h_w128_8bpc_c: 10264.6 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.1 (12.16x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
f851fcd0b4
riscv64/mc: Add 8bpc RVV blend function
...
Kendryte K230
blend_w4_8bpc_c: 204.5 ( 1.00x)
blend_w4_8bpc_rvv: 56.4 ( 3.62x)
blend_w8_8bpc_c: 608.6 ( 1.00x)
blend_w8_8bpc_rvv: 87.3 ( 6.97x)
blend_w16_8bpc_c: 2363.8 ( 1.00x)
blend_w16_8bpc_rvv: 225.1 (10.50x)
blend_w32_8bpc_c: 5990.3 ( 1.00x)
blend_w32_8bpc_rvv: 518.8 (11.55x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
848c5a2dbb
Tone down loop to only 2 iterations
...
Benchmark pending
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
a0a08d8543
Scalar dc calculation
...
Current benchmark:
- Kendryte K230:
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1729.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 153.2 (11.29x)
- SpacemiT K1:
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1533.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 176.8 ( 8.67x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
c8749f06e5
riscv64/itx: Special case 16x16 8bpc dct_dct eob=0
...
Performance comparison:
- SpacemiT K1 (first column: master branch; second column: itx_16x16):
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1534.1 ( 1.00x) 1534.9 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 1173.6 ( 1.31x) 173.1 ( 8.87x)
- Kendryte K230 (first column: master branch; second column: itx_16x16):
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1576.0 ( 1.00x) 1579.1 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 1095.5 ( 1.44x) 146.8 (10.75x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
0cdf1b4be5
ipred_paeth
...
Benchmarks:
- Kendryte K230:
intra_pred_paeth_w4_8bpc_c: 412.9 ( 1.00x)
intra_pred_paeth_w4_8bpc_rvv: 688.0 ( 0.60x)
intra_pred_paeth_w8_8bpc_c: 1206.6 ( 1.00x)
intra_pred_paeth_w8_8bpc_rvv: 1094.3 ( 1.10x)
intra_pred_paeth_w16_8bpc_c: 3889.7 ( 1.00x)
intra_pred_paeth_w16_8bpc_rvv: 1796.7 ( 2.16x)
intra_pred_paeth_w32_8bpc_c: 9797.2 ( 1.00x)
intra_pred_paeth_w32_8bpc_rvv: 4323.9 ( 2.27x)
intra_pred_paeth_w64_8bpc_c: 24242.5 ( 1.00x)
intra_pred_paeth_w64_8bpc_rvv: 10739.8 ( 2.26x)
- Banana Pi BPI-F3:
intra_pred_paeth_w4_8bpc_c: 395.1 ( 1.00x)
intra_pred_paeth_w4_8bpc_rvv: 705.4 ( 0.56x)
intra_pred_paeth_w8_8bpc_c: 1184.9 ( 1.00x)
intra_pred_paeth_w8_8bpc_rvv: 1125.3 ( 1.05x)
intra_pred_paeth_w16_8bpc_c: 3807.8 ( 1.00x)
intra_pred_paeth_w16_8bpc_rvv: 1850.8 ( 2.06x)
intra_pred_paeth_w32_8bpc_c: 9985.1 ( 1.00x)
intra_pred_paeth_w32_8bpc_rvv: 2235.5 ( 4.47x)
intra_pred_paeth_w64_8bpc_c: 24040.4 ( 1.00x)
intra_pred_paeth_w64_8bpc_rvv: 5450.0 ( 4.41x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
b830ac82bb
pal_pred
...
Benchmarks:
- Kendryte K230:
pal_pred_w4_8bpc_c: 115.6 ( 1.00x)
pal_pred_w4_8bpc_rvv: 331.4 ( 0.35x)
pal_pred_w4_16bpc_c: 140.8 ( 1.00x)
pal_pred_w4_16bpc_rvv: 247.9 ( 0.57x)
pal_pred_w8_8bpc_c: 334.9 ( 1.00x)
pal_pred_w8_8bpc_rvv: 520.8 ( 0.64x)
pal_pred_w8_16bpc_c: 412.7 ( 1.00x)
pal_pred_w8_16bpc_rvv: 386.2 ( 1.07x)
pal_pred_w16_8bpc_c: 1044.4 ( 1.00x)
pal_pred_w16_8bpc_rvv: 842.8 ( 1.24x)
pal_pred_w16_16bpc_c: 1300.3 ( 1.00x)
pal_pred_w16_16bpc_rvv: 619.9 ( 2.10x)
pal_pred_w32_8bpc_c: 2452.8 ( 1.00x)
pal_pred_w32_8bpc_rvv: 1016.1 ( 2.41x)
pal_pred_w32_16bpc_c: 3072.1 ( 1.00x)
pal_pred_w32_16bpc_rvv: 1440.5 ( 2.13x)
pal_pred_w64_8bpc_c: 6015.8 ( 1.00x)
pal_pred_w64_8bpc_rvv: 2505.5 ( 2.40x)
pal_pred_w64_16bpc_c: 7552.4 ( 1.00x)
pal_pred_w64_16bpc_rvv: 3512.7 ( 2.15x)
- Banana Pi BPI-F3:
pal_pred_w4_8bpc_c: 102.2 ( 1.00x)
pal_pred_w4_8bpc_rvv: 511.2 ( 0.20x)
pal_pred_w4_16bpc_c: 137.7 ( 1.00x)
pal_pred_w4_16bpc_rvv: 330.9 ( 0.42x)
pal_pred_w8_8bpc_c: 289.2 ( 1.00x)
pal_pred_w8_8bpc_rvv: 819.6 ( 0.35x)
pal_pred_w8_16bpc_c: 402.6 ( 1.00x)
pal_pred_w8_16bpc_rvv: 520.7 ( 0.77x)
pal_pred_w16_8bpc_c: 894.5 ( 1.00x)
pal_pred_w16_8bpc_rvv: 1326.6 ( 0.67x)
pal_pred_w16_16bpc_c: 1268.6 ( 1.00x)
pal_pred_w16_16bpc_rvv: 845.8 ( 1.50x)
pal_pred_w32_8bpc_c: 2094.5 ( 1.00x)
pal_pred_w32_8bpc_rvv: 1610.9 ( 1.30x)
pal_pred_w32_16bpc_c: 2999.4 ( 1.00x)
pal_pred_w32_16bpc_rvv: 1029.8 ( 2.91x)
pal_pred_w64_8bpc_c: 5128.0 ( 1.00x)
pal_pred_w64_8bpc_rvv: 2000.8 ( 2.56x)
pal_pred_w64_16bpc_c: 7375.0 ( 1.00x)
pal_pred_w64_16bpc_rvv: 2518.2 ( 2.93x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
44541dfa6b
ipred_smooth
...
Benchmarks:
- Kendryte K230:
intra_pred_smooth_w4_8bpc_c: 392.6 ( 1.00x)
intra_pred_smooth_w4_8bpc_rvv: 311.3 ( 1.26x)
intra_pred_smooth_w8_8bpc_c: 1204.1 ( 1.00x)
intra_pred_smooth_w8_8bpc_rvv: 488.9 ( 2.46x)
intra_pred_smooth_w16_8bpc_c: 3885.9 ( 1.00x)
intra_pred_smooth_w16_8bpc_rvv: 796.6 ( 4.88x)
intra_pred_smooth_w32_8bpc_c: 9305.7 ( 1.00x)
intra_pred_smooth_w32_8bpc_rvv: 1806.7 ( 5.15x)
intra_pred_smooth_w64_8bpc_c: 23043.0 ( 1.00x)
intra_pred_smooth_w64_8bpc_rvv: 4344.3 ( 5.30x)
- SpacemiT K1:
intra_pred_smooth_w4_8bpc_c: 384.1 ( 1.00x)
intra_pred_smooth_w4_8bpc_rvv: 322.2 ( 1.19x)
intra_pred_smooth_w8_8bpc_c: 1177.6 ( 1.00x)
intra_pred_smooth_w8_8bpc_rvv: 507.1 ( 2.32x)
intra_pred_smooth_w16_8bpc_c: 3801.2 ( 1.00x)
intra_pred_smooth_w16_8bpc_rvv: 814.4 ( 4.67x)
intra_pred_smooth_w32_8bpc_c: 9103.1 ( 1.00x)
intra_pred_smooth_w32_8bpc_rvv: 980.8 ( 9.28x)
intra_pred_smooth_w64_8bpc_c: 22540.1 ( 1.00x)
intra_pred_smooth_w64_8bpc_rvv: 2319.3 ( 9.72x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
d711f974eb
ipred cfl functions
...
Benchmarks:
- Kendryte K230:
cfl_pred_cfl_128_w4_8bpc_c: 497.3 ( 1.00x)
cfl_pred_cfl_128_w4_8bpc_rvv: 369.6 ( 1.35x)
cfl_pred_cfl_128_w4_16bpc_c: 425.2 ( 1.00x)
cfl_pred_cfl_128_w4_16bpc_rvv: 385.5 ( 1.10x)
cfl_pred_cfl_128_w8_8bpc_c: 1544.2 ( 1.00x)
cfl_pred_cfl_128_w8_8bpc_rvv: 584.2 ( 2.64x)
cfl_pred_cfl_128_w8_16bpc_c: 1306.2 ( 1.00x)
cfl_pred_cfl_128_w8_16bpc_rvv: 608.8 ( 2.15x)
cfl_pred_cfl_128_w16_8bpc_c: 3085.6 ( 1.00x)
cfl_pred_cfl_128_w16_8bpc_rvv: 584.2 ( 5.28x)
cfl_pred_cfl_128_w16_16bpc_c: 2657.1 ( 1.00x)
cfl_pred_cfl_128_w16_16bpc_rvv: 608.9 ( 4.36x)
cfl_pred_cfl_128_w32_8bpc_c: 8405.6 ( 1.00x)
cfl_pred_cfl_128_w32_8bpc_rvv: 1416.1 ( 5.94x)
cfl_pred_cfl_128_w32_16bpc_c: 7199.9 ( 1.00x)
cfl_pred_cfl_128_w32_16bpc_rvv: 1479.8 ( 4.87x)
cfl_pred_cfl_left_w4_8bpc_c: 553.1 ( 1.00x)
cfl_pred_cfl_left_w4_8bpc_rvv: 395.6 ( 1.40x)
cfl_pred_cfl_left_w4_16bpc_c: 486.7 ( 1.00x)
cfl_pred_cfl_left_w4_16bpc_rvv: 409.1 ( 1.19x)
cfl_pred_cfl_left_w8_8bpc_c: 1610.8 ( 1.00x)
cfl_pred_cfl_left_w8_8bpc_rvv: 610.4 ( 2.64x)
cfl_pred_cfl_left_w8_16bpc_c: 1378.0 ( 1.00x)
cfl_pred_cfl_left_w8_16bpc_rvv: 636.2 ( 2.17x)
cfl_pred_cfl_left_w16_8bpc_c: 3154.4 ( 1.00x)
cfl_pred_cfl_left_w16_8bpc_rvv: 610.4 ( 5.17x)
cfl_pred_cfl_left_w16_16bpc_c: 2733.2 ( 1.00x)
cfl_pred_cfl_left_w16_16bpc_rvv: 636.3 ( 4.30x)
cfl_pred_cfl_left_w32_8bpc_c: 8451.7 ( 1.00x)
cfl_pred_cfl_left_w32_8bpc_rvv: 1442.5 ( 5.86x)
cfl_pred_cfl_left_w32_16bpc_c: 7267.2 ( 1.00x)
cfl_pred_cfl_left_w32_16bpc_rvv: 1509.4 ( 4.81x)
cfl_pred_cfl_top_w4_8bpc_c: 544.7 ( 1.00x)
cfl_pred_cfl_top_w4_8bpc_rvv: 395.8 ( 1.38x)
cfl_pred_cfl_top_w4_16bpc_c: 475.1 ( 1.00x)
cfl_pred_cfl_top_w4_16bpc_rvv: 406.7 ( 1.17x)
cfl_pred_cfl_top_w8_8bpc_c: 1599.3 ( 1.00x)
cfl_pred_cfl_top_w8_8bpc_rvv: 610.4 ( 2.62x)
cfl_pred_cfl_top_w8_16bpc_c: 1363.8 ( 1.00x)
cfl_pred_cfl_top_w8_16bpc_rvv: 630.3 ( 2.16x)
cfl_pred_cfl_top_w16_8bpc_c: 3161.0 ( 1.00x)
cfl_pred_cfl_top_w16_8bpc_rvv: 610.5 ( 5.18x)
cfl_pred_cfl_top_w16_16bpc_c: 2735.9 ( 1.00x)
cfl_pred_cfl_top_w16_16bpc_rvv: 634.3 ( 4.31x)
cfl_pred_cfl_top_w32_8bpc_c: 8564.4 ( 1.00x)
cfl_pred_cfl_top_w32_8bpc_rvv: 1442.8 ( 5.94x)
cfl_pred_cfl_top_w32_16bpc_c: 7294.9 ( 1.00x)
cfl_pred_cfl_top_w32_16bpc_rvv: 1511.5 ( 4.83x)
cfl_pred_cfl_w4_8bpc_c: 571.5 ( 1.00x)
cfl_pred_cfl_w4_8bpc_rvv: 421.0 ( 1.36x)
cfl_pred_cfl_w4_16bpc_c: 499.1 ( 1.00x)
cfl_pred_cfl_w4_16bpc_rvv: 462.8 ( 1.08x)
cfl_pred_cfl_w8_8bpc_c: 1642.0 ( 1.00x)
cfl_pred_cfl_w8_8bpc_rvv: 635.8 ( 2.58x)
cfl_pred_cfl_w8_16bpc_c: 1401.4 ( 1.00x)
cfl_pred_cfl_w8_16bpc_rvv: 686.1 ( 2.04x)
cfl_pred_cfl_w16_8bpc_c: 3204.3 ( 1.00x)
cfl_pred_cfl_w16_8bpc_rvv: 635.8 ( 5.04x)
cfl_pred_cfl_w16_16bpc_c: 2784.8 ( 1.00x)
cfl_pred_cfl_w16_16bpc_rvv: 686.1 ( 4.06x)
cfl_pred_cfl_w32_8bpc_c: 8623.9 ( 1.00x)
cfl_pred_cfl_w32_8bpc_rvv: 1465.9 ( 5.88x)
cfl_pred_cfl_w32_16bpc_c: 7357.8 ( 1.00x)
cfl_pred_cfl_w32_16bpc_rvv: 1556.3 ( 4.73x)
- Banana Pi BPI-F3:
cfl_pred_cfl_128_w4_8bpc_c: 485.5 ( 1.00x)
cfl_pred_cfl_128_w4_8bpc_rvv: 366.4 ( 1.33x)
cfl_pred_cfl_128_w4_16bpc_c: 393.5 ( 1.00x)
cfl_pred_cfl_128_w4_16bpc_rvv: 378.7 ( 1.04x)
cfl_pred_cfl_128_w8_8bpc_c: 1507.9 ( 1.00x)
cfl_pred_cfl_128_w8_8bpc_rvv: 577.4 ( 2.61x)
cfl_pred_cfl_128_w8_16bpc_c: 1205.7 ( 1.00x)
cfl_pred_cfl_128_w8_16bpc_rvv: 605.1 ( 1.99x)
cfl_pred_cfl_128_w16_8bpc_c: 3019.3 ( 1.00x)
cfl_pred_cfl_128_w16_8bpc_rvv: 577.4 ( 5.23x)
cfl_pred_cfl_128_w16_16bpc_c: 2506.5 ( 1.00x)
cfl_pred_cfl_128_w16_16bpc_rvv: 605.1 ( 4.14x)
cfl_pred_cfl_128_w32_8bpc_c: 8170.0 ( 1.00x)
cfl_pred_cfl_128_w32_8bpc_rvv: 715.6 (11.42x)
cfl_pred_cfl_128_w32_16bpc_c: 6686.7 ( 1.00x)
cfl_pred_cfl_128_w32_16bpc_rvv: 749.7 ( 8.92x)
cfl_pred_cfl_left_w4_8bpc_c: 539.4 ( 1.00x)
cfl_pred_cfl_left_w4_8bpc_rvv: 393.2 ( 1.37x)
cfl_pred_cfl_left_w4_16bpc_c: 452.0 ( 1.00x)
cfl_pred_cfl_left_w4_16bpc_rvv: 401.2 ( 1.13x)
cfl_pred_cfl_left_w8_8bpc_c: 1572.4 ( 1.00x)
cfl_pred_cfl_left_w8_8bpc_rvv: 604.1 ( 2.60x)
cfl_pred_cfl_left_w8_16bpc_c: 1274.5 ( 1.00x)
cfl_pred_cfl_left_w8_16bpc_rvv: 629.0 ( 2.03x)
cfl_pred_cfl_left_w16_8bpc_c: 3096.0 ( 1.00x)
cfl_pred_cfl_left_w16_8bpc_rvv: 604.1 ( 5.13x)
cfl_pred_cfl_left_w16_16bpc_c: 2591.4 ( 1.00x)
cfl_pred_cfl_left_w16_16bpc_rvv: 629.0 ( 4.12x)
cfl_pred_cfl_left_w32_8bpc_c: 8266.0 ( 1.00x)
cfl_pred_cfl_left_w32_8bpc_rvv: 742.4 (11.13x)
cfl_pred_cfl_left_w32_16bpc_c: 6758.0 ( 1.00x)
cfl_pred_cfl_left_w32_16bpc_rvv: 773.9 ( 8.73x)
cfl_pred_cfl_top_w4_8bpc_c: 532.3 ( 1.00x)
cfl_pred_cfl_top_w4_8bpc_rvv: 392.6 ( 1.36x)
cfl_pred_cfl_top_w4_16bpc_c: 440.4 ( 1.00x)
cfl_pred_cfl_top_w4_16bpc_rvv: 399.6 ( 1.10x)
cfl_pred_cfl_top_w8_8bpc_c: 1563.3 ( 1.00x)
cfl_pred_cfl_top_w8_8bpc_rvv: 603.6 ( 2.59x)
cfl_pred_cfl_top_w8_16bpc_c: 1271.6 ( 1.00x)
cfl_pred_cfl_top_w8_16bpc_rvv: 626.1 ( 2.03x)
cfl_pred_cfl_top_w16_8bpc_c: 3098.6 ( 1.00x)
cfl_pred_cfl_top_w16_8bpc_rvv: 603.6 ( 5.13x)
cfl_pred_cfl_top_w16_16bpc_c: 2562.8 ( 1.00x)
cfl_pred_cfl_top_w16_16bpc_rvv: 626.0 ( 4.09x)
cfl_pred_cfl_top_w32_8bpc_c: 8278.1 ( 1.00x)
cfl_pred_cfl_top_w32_8bpc_rvv: 741.8 (11.16x)
cfl_pred_cfl_top_w32_16bpc_c: 6799.1 ( 1.00x)
cfl_pred_cfl_top_w32_16bpc_rvv: 775.0 ( 8.77x)
cfl_pred_cfl_w4_8bpc_c: 559.8 ( 1.00x)
cfl_pred_cfl_w4_8bpc_rvv: 421.7 ( 1.33x)
cfl_pred_cfl_w4_16bpc_c: 470.2 ( 1.00x)
cfl_pred_cfl_w4_16bpc_rvv: 451.3 ( 1.04x)
cfl_pred_cfl_w8_8bpc_c: 1605.5 ( 1.00x)
cfl_pred_cfl_w8_8bpc_rvv: 632.8 ( 2.54x)
cfl_pred_cfl_w8_16bpc_c: 1308.5 ( 1.00x)
cfl_pred_cfl_w8_16bpc_rvv: 677.9 ( 1.93x)
cfl_pred_cfl_w16_8bpc_c: 3135.0 ( 1.00x)
cfl_pred_cfl_w16_8bpc_rvv: 632.9 ( 4.95x)
cfl_pred_cfl_w16_16bpc_c: 2625.9 ( 1.00x)
cfl_pred_cfl_w16_16bpc_rvv: 677.9 ( 3.87x)
cfl_pred_cfl_w32_8bpc_c: 8376.6 ( 1.00x)
cfl_pred_cfl_w32_8bpc_rvv: 770.4 (10.87x)
cfl_pred_cfl_w32_16bpc_c: 6866.4 ( 1.00x)
cfl_pred_cfl_w32_16bpc_rvv: 822.7 ( 8.35x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
2f5bfc37b0
riscv64/cdef: filter functions
...
Benchmarks:
- Kendryte K230:
cdef_filter_4x4_01_8bpc_c: 1339.4 ( 1.00x)
cdef_filter_4x4_01_8bpc_rvv: 836.2 ( 1.60x)
cdef_filter_4x4_01_16bpc_c: 1369.1 ( 1.00x)
cdef_filter_4x4_01_16bpc_rvv: 824.7 ( 1.66x)
cdef_filter_4x4_10_8bpc_c: 872.8 ( 1.00x)
cdef_filter_4x4_10_8bpc_rvv: 523.9 ( 1.67x)
cdef_filter_4x4_10_16bpc_c: 938.2 ( 1.00x)
cdef_filter_4x4_10_16bpc_rvv: 517.1 ( 1.81x)
cdef_filter_4x4_11_8bpc_c: 2668.3 ( 1.00x)
cdef_filter_4x4_11_8bpc_rvv: 1285.0 ( 2.08x)
cdef_filter_4x4_11_16bpc_c: 2922.1 ( 1.00x)
cdef_filter_4x4_11_16bpc_rvv: 1291.0 ( 2.26x)
cdef_filter_4x8_01_8bpc_c: 2489.1 ( 1.00x)
cdef_filter_4x8_01_8bpc_rvv: 1594.3 ( 1.56x)
cdef_filter_4x8_01_16bpc_c: 2528.1 ( 1.00x)
cdef_filter_4x8_01_16bpc_rvv: 1566.6 ( 1.61x)
cdef_filter_4x8_10_8bpc_c: 1576.9 ( 1.00x)
cdef_filter_4x8_10_8bpc_rvv: 967.1 ( 1.63x)
cdef_filter_4x8_10_16bpc_c: 1641.3 ( 1.00x)
cdef_filter_4x8_10_16bpc_rvv: 947.1 ( 1.73x)
cdef_filter_4x8_11_8bpc_c: 5164.0 ( 1.00x)
cdef_filter_4x8_11_8bpc_rvv: 2490.7 ( 2.07x)
cdef_filter_4x8_11_16bpc_c: 5732.3 ( 1.00x)
cdef_filter_4x8_11_16bpc_rvv: 2499.2 ( 2.29x)
cdef_filter_8x8_01_8bpc_c: 4742.3 ( 1.00x)
cdef_filter_8x8_01_8bpc_rvv: 1628.6 ( 2.91x)
cdef_filter_8x8_01_16bpc_c: 4785.0 ( 1.00x)
cdef_filter_8x8_01_16bpc_rvv: 1595.5 ( 3.00x)
cdef_filter_8x8_10_8bpc_c: 2962.4 ( 1.00x)
cdef_filter_8x8_10_8bpc_rvv: 1000.8 ( 2.96x)
cdef_filter_8x8_10_16bpc_c: 3022.4 ( 1.00x)
cdef_filter_8x8_10_16bpc_rvv: 975.7 ( 3.10x)
cdef_filter_8x8_11_8bpc_c: 12623.9 ( 1.00x)
cdef_filter_8x8_11_8bpc_rvv: 2525.4 ( 5.00x)
cdef_filter_8x8_11_16bpc_c: 12470.7 ( 1.00x)
cdef_filter_8x8_11_16bpc_rvv: 2528.2 ( 4.93x)
- Banana Pi BPI-F3:
cdef_filter_4x4_01_8bpc_c: 1281.2 ( 1.00x)
cdef_filter_4x4_01_8bpc_rvv: 813.0 ( 1.58x)
cdef_filter_4x4_01_16bpc_c: 1300.8 ( 1.00x)
cdef_filter_4x4_01_16bpc_rvv: 808.9 ( 1.61x)
cdef_filter_4x4_10_8bpc_c: 843.0 ( 1.00x)
cdef_filter_4x4_10_8bpc_rvv: 498.4 ( 1.69x)
cdef_filter_4x4_10_16bpc_c: 903.6 ( 1.00x)
cdef_filter_4x4_10_16bpc_rvv: 497.9 ( 1.81x)
cdef_filter_4x4_11_8bpc_c: 2614.1 ( 1.00x)
cdef_filter_4x4_11_8bpc_rvv: 1219.6 ( 2.14x)
cdef_filter_4x4_11_16bpc_c: 2795.6 ( 1.00x)
cdef_filter_4x4_11_16bpc_rvv: 1243.1 ( 2.25x)
cdef_filter_4x8_01_8bpc_c: 2405.4 ( 1.00x)
cdef_filter_4x8_01_8bpc_rvv: 1548.5 ( 1.55x)
cdef_filter_4x8_01_16bpc_c: 2402.7 ( 1.00x)
cdef_filter_4x8_01_16bpc_rvv: 1542.7 ( 1.56x)
cdef_filter_4x8_10_8bpc_c: 1522.0 ( 1.00x)
cdef_filter_4x8_10_8bpc_rvv: 917.4 ( 1.66x)
cdef_filter_4x8_10_16bpc_c: 1589.2 ( 1.00x)
cdef_filter_4x8_10_16bpc_rvv: 915.9 ( 1.74x)
cdef_filter_4x8_11_8bpc_c: 5050.7 ( 1.00x)
cdef_filter_4x8_11_8bpc_rvv: 2358.7 ( 2.14x)
cdef_filter_4x8_11_16bpc_c: 5510.5 ( 1.00x)
cdef_filter_4x8_11_16bpc_rvv: 2411.6 ( 2.28x)
cdef_filter_8x8_01_8bpc_c: 4558.3 ( 1.00x)
cdef_filter_8x8_01_8bpc_rvv: 1579.7 ( 2.89x)
cdef_filter_8x8_01_16bpc_c: 4551.1 ( 1.00x)
cdef_filter_8x8_01_16bpc_rvv: 1571.1 ( 2.90x)
cdef_filter_8x8_10_8bpc_c: 2869.3 ( 1.00x)
cdef_filter_8x8_10_8bpc_rvv: 948.4 ( 3.03x)
cdef_filter_8x8_10_16bpc_c: 2928.6 ( 1.00x)
cdef_filter_8x8_10_16bpc_rvv: 944.2 ( 3.10x)
cdef_filter_8x8_11_8bpc_c: 12317.5 ( 1.00x)
cdef_filter_8x8_11_8bpc_rvv: 2389.7 ( 5.15x)
cdef_filter_8x8_11_16bpc_c: 11950.6 ( 1.00x)
cdef_filter_8x8_11_16bpc_rvv: 2440.1 ( 4.90x)
2024-10-09 16:18:42 +02:00
Bogdan Gligorijević and Luca Barbato
f223436bb6
pal_idx_finish
...
Benchmarks:
- Kendryte K230:
pal_idx_finish_w4_c: 122.5 ( 1.00x)
pal_idx_finish_w4_rvv: 107.2 ( 1.14x)
pal_idx_finish_w8_c: 302.8 ( 1.00x)
pal_idx_finish_w8_rvv: 197.9 ( 1.53x)
pal_idx_finish_w16_c: 868.2 ( 1.00x)
pal_idx_finish_w16_rvv: 438.5 ( 1.98x)
pal_idx_finish_w32_c: 1966.5 ( 1.00x)
pal_idx_finish_w32_rvv: 833.0 ( 2.36x)
pal_idx_finish_w64_c: 4737.5 ( 1.00x)
pal_idx_finish_w64_rvv: 1818.3 ( 2.61x)
- Banana Pi BPI-F3:
pal_idx_finish_w4_c: 122.4 ( 1.00x)
pal_idx_finish_w4_rvv: 132.0 ( 0.93x)
pal_idx_finish_w8_c: 289.4 ( 1.00x)
pal_idx_finish_w8_rvv: 195.8 ( 1.48x)
pal_idx_finish_w16_c: 788.0 ( 1.00x)
pal_idx_finish_w16_rvv: 430.6 ( 1.83x)
pal_idx_finish_w32_c: 1699.2 ( 1.00x)
pal_idx_finish_w32_rvv: 816.3 ( 2.08x)
pal_idx_finish_w64_c: 3977.7 ( 1.00x)
pal_idx_finish_w64_rvv: 1779.4 ( 2.24x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato
38f74bdc46
riscv: Allow multiple .option arch with vararg ext
2024-10-09 16:18:42 +02:00
Henrik Gramner
7072e79faa
x86: Make AVX2 SGR gatherless
...
Instead of using gathers we can calculate the value of
sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point
with some clipping for z == 0 and z >= 255.
As the required precision of the division is fairly small it can be
performed using an approximate reciprocal, which is significantly
faster than a regular division.
Gather instructions are slow on all AMD CPUs, and on most Intel
CPUs ever since µcode updates were issued as a workaround for
the Gather Data Sampling side channel vulnerability.
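The arithmetic described above can be sketched in scalar C. This is only an illustration of the formula, not the AVX2 code from the commit: the function name sgr_x_by_x_approx is hypothetical, round-to-nearest is an assumption, and an exact division stands in for the approximate reciprocal. The commit message does not spell out how z >= 255 is clipped, so the sketch simply clamps z to 255 before dividing.

```c
#include <stdint.h>

/* Hypothetical helper, not dav1d code: stands in for the lookup
 * sgr_x_by_x[min(z, 255)] by computing 256 / (z + 1) in floating point. */
uint8_t sgr_x_by_x_approx(uint32_t z)
{
    if (z > 255)
        z = 255;                          /* min(z, 255); assumed handling of large z */

    /* The commit uses an approximate reciprocal here; an exact division
     * is used in this sketch for simplicity. */
    float q = 256.0f / (float)(z + 1);

    uint32_t v = (uint32_t)(q + 0.5f);    /* round to nearest (assumption) */

    /* z == 0 yields 256, which does not fit in 8 bits, so clip it. */
    return (uint8_t)(v > 255 ? 255 : v);
}
```

For example, sgr_x_by_x_approx(1) evaluates 256 / 2 and returns 128, while sgr_x_by_x_approx(0) is clipped from 256 down to 255.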
2024-10-07 13:04:34 +02:00
Luca Barbato
21d9f29d38
tests: Add a fail fast option
2024-10-02 13:00:26 +02:00
jinbo and Hecai Yuan
ed004fe95d
loongarch: minor improvement on decode_symbol_adapt
...
Change-Id: I78fe788113ff2487ba1ce2e7d0c7d7c78c5a8c58
2024-09-30 06:37:00 +00:00
yuanhecai
62a51df14e
loongarch: rewrite optimization functions in loongarch/itx.S
...
Change-Id: I1566e8145d36296f2c76107cf15fc2cc7ac0ecc7
2024-09-30 06:37:00 +00:00
guxiwei and Hecai Yuan
757f294a49
LoongArch: Add save_tmvs_lsx
...
The performance data is as follows:
save_tmvs_c: 3938.6 ( 1.00x)
save_tmvs_lsx: 1355.3 ( 2.91x)
2024-09-30 06:37:00 +00:00
jinbo and Hecai Yuan
3d96175df2
loongarch: refactor loopfilter
...
bench performance before:
lpf_h_sb_y_w16_8bpc_c: 117.0 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 33.9 ( 3.46x)
lpf_v_sb_y_w16_8bpc_c: 132.1 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 59.7 ( 2.21x)
bench performance after:
lpf_h_sb_y_w16_8bpc_c: 114.9 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 32.0 ( 3.59x)
lpf_v_sb_y_w16_8bpc_c: 132.5 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 28.1 ( 4.72x)
Change-Id: Ie64e164a9416c438f6b3881ce18fb42e2ddd073d
2024-09-30 06:37:00 +00:00
yuanhecai
70582027e7
loongarch: add lasx implementation of sgr_3x3 for 8 bpc
...
sgr_3x3_8bpc_c: 27233.1 ( 1.00x)
sgr_3x3_8bpc_lsx: 12874.7 ( 2.12x)
sgr_3x3_8bpc_lasx: 10183.7 ( 2.67x)
Change-Id: I2aa469e8560733d6191396186bf776a12ad6e4a3
2024-09-30 06:37:00 +00:00
yuanhecai
96d6e472ad
loongarch: rewrite warp_8x8/8x8t_lsx for 8 bpc
...
before:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 44.6 ( 2.46x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 43.7 ( 2.23x)
after:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 39.2 ( 2.80x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 37.9 ( 2.57x)
Change-Id: I11728c2c30821b8e2b1c85208710dfe5d1c1269c
2024-09-30 06:37:00 +00:00
jinbo and Hecai Yuan
b9e9a0ef79
loongarch: Refine prep_8tap_8bpc_lasx
...
mct_8tap_regular_w8_h_8bpc_c: 47.1 ( 1.00x)
mct_8tap_regular_w8_h_8bpc_lsx: 6.3 ( 7.46x)
mct_8tap_regular_w8_h_8bpc_lasx: 4.4 (10.80x)
mct_8tap_regular_w8_hv_8bpc_c: 118.9 ( 1.00x)
mct_8tap_regular_w8_hv_8bpc_lsx: 19.2 ( 6.20x)
mct_8tap_regular_w8_hv_8bpc_lasx: 13.7 ( 8.69x)
mct_8tap_regular_w8_v_8bpc_c: 60.3 ( 1.00x)
mct_8tap_regular_w8_v_8bpc_lsx: 5.4 (11.08x)
mct_8tap_regular_w8_v_8bpc_lasx: 3.3 (18.33x)
Change-Id: I1140f6ffbd738166f2581bc9111ebbdf6f9fa72c
2024-09-30 06:37:00 +00:00
yuanhecai
af11a10a4b
loongarch: add lasx implementation of wiener filter for 8 bpc
...
wiener_5tap_8bpc_c: 18382.0 ( 1.00x)
wiener_5tap_8bpc_lsx: 4166.9 ( 4.41x)
wiener_5tap_8bpc_lasx: 2832.2 ( 6.49x)
wiener_7tap_8bpc_c: 18339.6 ( 1.00x)
wiener_7tap_8bpc_lsx: 4168.3 ( 4.40x)
wiener_7tap_8bpc_lasx: 2832.5 ( 6.47x)
Change-Id: I183a8cb008203fb61683b0543d9409d58d141a2e
2024-09-30 06:37:00 +00:00
zhoupeng and Hecai Yuan
90a9549b4e
Loongarch: Optimized load_tmvs_c function by LSX
...
load_tmvs_c: 9702.0 ( 1.00x)
load_tmvs_lsx: 7857.0 ( 1.23x)
2024-09-30 06:37:00 +00:00
pengxu and Hecai Yuan
411fc219a7
Loongarch: Optimized ipred_z1 8bpc functions by LSX
...
intra_pred_z1_w4_8bpc_c: 16.5 ( 1.00x)
intra_pred_z1_w4_8bpc_lsx: 7.1 ( 2.31x)
intra_pred_z1_w8_8bpc_c: 31.9 ( 1.00x)
intra_pred_z1_w8_8bpc_lsx: 10.0 ( 3.20x)
intra_pred_z1_w16_8bpc_c: 80.1 ( 1.00x)
intra_pred_z1_w16_8bpc_lsx: 20.2 ( 3.96x)
intra_pred_z1_w32_8bpc_c: 185.8 ( 1.00x)
intra_pred_z1_w32_8bpc_lsx: 40.8 ( 4.55x)
intra_pred_z1_w64_8bpc_c: 511.1 ( 1.00x)
intra_pred_z1_w64_8bpc_lsx: 99.0 ( 5.16x)
Change-Id: Id7591e9b87e5b4d7fc3f438397e25dc6ca8e7f91
2024-09-30 06:37:00 +00:00