mirror of https://code.videolan.org/videolan/dav1d, synced 2026-06-11 04:03:05 +00:00
master, 100 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
beda1b3cda | riscv64/itx: Match stack allocation of 16x16 itx | ||
|
|
5e8c380e4b |
riscv64/mc16: Keep blend_v RVV operations in 16-bits
Kendryte K230 Before After Delta
blend_v_w2_16bpc_c: 240.9 ( 1.00x) 240.9 ( 1.00x) 0.00%
blend_v_w2_16bpc_rvv: 149.7 ( 1.61x) 155.4 ( 1.55x) 3.81%
blend_v_w4_16bpc_c: 1072.4 ( 1.00x) 1072.5 ( 1.00x) 0.01%
blend_v_w4_16bpc_rvv: 307.2 ( 3.49x) 299.9 ( 3.58x) -2.38%
blend_v_w8_16bpc_c: 2004.7 ( 1.00x) 2010.2 ( 1.00x) 0.27%
blend_v_w8_16bpc_rvv: 436.1 ( 4.60x) 381.0 ( 5.28x) -12.63%
blend_v_w16_16bpc_c: 3859.4 ( 1.00x) 3853.7 ( 1.00x) -0.15%
blend_v_w16_16bpc_rvv: 761.1 ( 5.07x) 554.0 ( 6.96x) -27.21%
blend_v_w32_16bpc_c: 7509.7 ( 1.00x) 7505.3 ( 1.00x) -0.06%
blend_v_w32_16bpc_rvv: 1427.1 ( 5.26x) 1005.5 ( 7.46x) -29.54%

SpacemiT K1 Before After Delta
blend_v_w2_16bpc_c: 220.1 ( 1.00x) 222.0 ( 1.00x) 0.86%
blend_v_w2_16bpc_rvv: 146.6 ( 1.50x) 151.1 ( 1.47x) 3.07%
blend_v_w4_16bpc_c: 968.3 ( 1.00x) 969.6 ( 1.00x) 0.13%
blend_v_w4_16bpc_rvv: 281.2 ( 3.44x) 290.2 ( 3.34x) 3.20%
blend_v_w8_16bpc_c: 1809.5 ( 1.00x) 1812.1 ( 1.00x) 0.14%
blend_v_w8_16bpc_rvv: 374.2 ( 4.84x) 375.3 ( 4.83x) 0.29%
blend_v_w16_16bpc_c: 3479.7 ( 1.00x) 3480.9 ( 1.00x) 0.03%
blend_v_w16_16bpc_rvv: 521.5 ( 6.67x) 465.9 ( 7.47x) -10.66%
blend_v_w32_16bpc_c: 6767.9 ( 1.00x) 6773.7 ( 1.00x) 0.09%
blend_v_w32_16bpc_rvv: 852.1 ( 7.94x) 727.4 ( 9.31x) -14.63%

Blackhole p100a Before After Delta
blend_v_w2_16bpc_c: 205.6 ( 1.00x) 206.0 ( 1.00x) 0.19%
blend_v_w2_16bpc_rvv: 176.5 ( 1.16x) 143.6 ( 1.44x) -18.64%
blend_v_w4_16bpc_c: 901.0 ( 1.00x) 891.8 ( 1.00x) -1.02%
blend_v_w4_16bpc_rvv: 298.8 ( 3.02x) 235.2 ( 3.79x) -21.29%
blend_v_w8_16bpc_c: 1663.3 ( 1.00x) 1656.5 ( 1.00x) -0.41%
blend_v_w8_16bpc_rvv: 300.1 ( 5.54x) 236.4 ( 7.01x) -21.23%
blend_v_w16_16bpc_c: 3192.1 ( 1.00x) 3182.3 ( 1.00x) -0.31%
blend_v_w16_16bpc_rvv: 349.2 ( 9.14x) 311.4 (10.22x) -10.82%
blend_v_w32_16bpc_c: 6259.2 ( 1.00x) 6257.8 ( 1.00x) -0.02%
blend_v_w32_16bpc_rvv: 350.2 (17.88x) 321.8 (19.44x) -8.11% |
||
|
|
d2fa9466be |
riscv64/mc16: Keep blend RVV operations in 16-bits
Kendryte K230 Before After Delta
blend_w4_16bpc_c: 227.0 ( 1.00x) 227.1 ( 1.00x) 0.04%
blend_w4_16bpc_rvv: 71.1 ( 3.19x) 73.2 ( 3.10x) 2.95%
blend_w8_16bpc_c: 662.5 ( 1.00x) 662.7 ( 1.00x) 0.03%
blend_w8_16bpc_rvv: 132.4 ( 5.00x) 115.0 ( 5.76x) -13.14%
blend_w16_16bpc_c: 2559.3 ( 1.00x) 2559.8 ( 1.00x) 0.02%
blend_w16_16bpc_rvv: 416.1 ( 6.15x) 326.7 ( 7.83x) -21.49%
blend_w32_16bpc_c: 6483.9 ( 1.00x) 6484.5 ( 1.00x) 0.01%
blend_w32_16bpc_rvv: 1029.1 ( 6.30x) 774.7 ( 8.37x) -24.72%

SpacemiT K1 Before After Delta
blend_w4_16bpc_c: 206.1 ( 1.00x) 207.0 ( 1.00x) 0.44%
blend_w4_16bpc_rvv: 64.4 ( 3.20x) 69.5 ( 2.98x) 7.92%
blend_w8_16bpc_c: 600.2 ( 1.00x) 600.9 ( 1.00x) 0.12%
blend_w8_16bpc_rvv: 101.6 ( 5.91x) 106.9 ( 5.62x) 5.22%
blend_w16_16bpc_c: 2316.0 ( 1.00x) 2316.4 ( 1.00x) 0.02%
blend_w16_16bpc_rvv: 261.8 ( 8.85x) 229.1 (10.11x) -12.49%
blend_w32_16bpc_c: 5861.1 ( 1.00x) 5860.4 ( 1.00x) -0.01%
blend_w32_16bpc_rvv: 602.9 ( 9.72x) 475.3 (12.33x) -21.16%

Blackhole p100a Before After Delta
blend_w4_16bpc_c: 193.3 ( 1.00x) 191.3 ( 1.00x) -1.03%
blend_w4_16bpc_rvv: 66.3 ( 2.91x) 65.4 ( 2.92x) -1.36%
blend_w8_16bpc_c: 552.0 ( 1.00x) 549.8 ( 1.00x) -0.40%
blend_w8_16bpc_rvv: 100.5 ( 5.49x) 96.2 ( 5.71x) -4.28%
blend_w16_16bpc_c: 2112.5 ( 1.00x) 2111.8 ( 1.00x) -0.03%
blend_w16_16bpc_rvv: 190.3 (11.10x) 185.9 (11.36x) -2.31%
blend_w32_16bpc_c: 5417.5 ( 1.00x) 5416.2 ( 1.00x) -0.02%
blend_w32_16bpc_rvv: 290.3 (18.66x) 304.0 (17.82x) 4.72% |
||
|
|
a31e4bd757 |
riscv64/mc16: Add VLEN=512 16bpc RVV blend_{,v} functions
Blackhole p100a Before After Delta
blend_w4_16bpc_c: 193.1 ( 1.00x) 186.8 ( 1.00x) -3.26%
blend_w4_16bpc_rvv: 64.8 ( 2.98x) 62.8 ( 2.97x) -3.09%
blend_w8_16bpc_c: 551.0 ( 1.00x) 546.0 ( 1.00x) -0.91%
blend_w8_16bpc_rvv: 96.2 ( 5.73x) 93.4 ( 5.85x) -2.91%
blend_w16_16bpc_c: 2111.6 ( 1.00x) 2107.0 ( 1.00x) -0.22%
blend_w16_16bpc_rvv: 189.9 (11.12x) 189.6 (11.11x) -0.16%
blend_w32_16bpc_c: 5403.9 ( 1.00x) 5398.5 ( 1.00x) -0.10%
blend_w32_16bpc_rvv: 292.4 (18.48x) 291.5 (18.52x) -0.31%
blend_v_w2_16bpc_c: 209.1 ( 1.00x) 208.7 ( 1.00x) -0.19%
blend_v_w2_16bpc_rvv: 180.3 ( 1.16x) 180.4 ( 1.16x) 0.06%
blend_v_w4_16bpc_c: 896.9 ( 1.00x) 898.5 ( 1.00x) 0.18%
blend_v_w4_16bpc_rvv: 303.0 ( 2.96x) 302.5 ( 2.97x) -0.17%
blend_v_w8_16bpc_c: 1658.9 ( 1.00x) 1663.1 ( 1.00x) 0.25%
blend_v_w8_16bpc_rvv: 303.0 ( 5.47x) 302.6 ( 5.50x) -0.13%
blend_v_w16_16bpc_c: 3186.0 ( 1.00x) 3182.7 ( 1.00x) -0.10%
blend_v_w16_16bpc_rvv: 313.1 (10.17x) 312.1 (10.20x) -0.32%
blend_v_w32_16bpc_c: 6253.9 ( 1.00x) 6257.0 ( 1.00x) 0.05%
blend_v_w32_16bpc_rvv: 355.4 (17.60x) 353.2 (17.72x) -0.62% |
||
|
|
38dd16e108 |
riscv64/mc: Add VLEN=512 8bpc RVV blend_{,h,v} functions
Blackhole p100a Before After Delta
blend_w4_8bpc_c: 190.7 ( 1.00x) 189.3 ( 1.00x) -0.73%
blend_w4_8bpc_rvv: 61.2 ( 3.12x) 59.7 ( 3.17x) -2.45%
blend_w8_8bpc_c: 550.7 ( 1.00x) 547.0 ( 1.00x) -0.67%
blend_w8_8bpc_rvv: 91.0 ( 6.05x) 89.4 ( 6.12x) -1.76%
blend_w16_8bpc_c: 2112.4 ( 1.00x) 2106.8 ( 1.00x) -0.27%
blend_w16_8bpc_rvv: 177.1 (11.92x) 174.8 (12.05x) -1.30%
blend_w32_8bpc_c: 5423.8 ( 1.00x) 5393.8 ( 1.00x) -0.55%
blend_w32_8bpc_rvv: 233.5 (23.23x) 230.7 (23.38x) -1.20%
blend_h_w2_8bpc_c: 126.4 ( 1.00x) 128.0 ( 1.00x) 1.27%
blend_h_w2_8bpc_rvv: 85.0 ( 1.49x) 81.2 ( 1.58x) -4.47%
blend_h_w4_8bpc_c: 221.2 ( 1.00x) 222.2 ( 1.00x) 0.45%
blend_h_w4_8bpc_rvv: 84.3 ( 2.62x) 81.3 ( 2.73x) -3.56%
blend_h_w8_8bpc_c: 411.9 ( 1.00x) 413.3 ( 1.00x) 0.34%
blend_h_w8_8bpc_rvv: 84.2 ( 4.89x) 81.0 ( 5.10x) -3.80%
blend_h_w16_8bpc_c: 792.6 ( 1.00x) 793.5 ( 1.00x) 0.11%
blend_h_w16_8bpc_rvv: 84.5 ( 9.38x) 81.5 ( 9.74x) -3.55%
blend_h_w32_8bpc_c: 1577.7 ( 1.00x) 1578.8 ( 1.00x) 0.07%
blend_h_w32_8bpc_rvv: 86.6 (18.21x) 83.5 (18.90x) -3.58%
blend_h_w64_8bpc_c: 3099.5 ( 1.00x) 3101.9 ( 1.00x) 0.08%
blend_h_w64_8bpc_rvv: 98.4 (31.49x) 95.2 (32.58x) -3.25%
blend_h_w128_8bpc_c: 7496.9 ( 1.00x) 7498.1 ( 1.00x) 0.02%
blend_h_w128_8bpc_rvv: 155.4 (48.24x) 151.5 (49.50x) -2.51%
blend_v_w2_8bpc_c: 202.9 ( 1.00x) 203.5 ( 1.00x) 0.30%
blend_v_w2_8bpc_rvv: 173.5 ( 1.17x) 176.6 ( 1.15x) 1.79%
blend_v_w4_8bpc_c: 842.3 ( 1.00x) 844.2 ( 1.00x) 0.23%
blend_v_w4_8bpc_rvv: 295.9 ( 2.85x) 299.0 ( 2.82x) 1.05%
blend_v_w8_8bpc_c: 1589.9 ( 1.00x) 1592.1 ( 1.00x) 0.14%
blend_v_w8_8bpc_rvv: 296.2 ( 5.37x) 299.0 ( 5.32x) 0.95%
blend_v_w16_8bpc_c: 3090.3 ( 1.00x) 3088.3 ( 1.00x) -0.06%
blend_v_w16_8bpc_rvv: 296.0 (10.44x) 299.4 (10.32x) 1.15%
blend_v_w32_8bpc_c: 6080.2 ( 1.00x) 6081.5 ( 1.00x) 0.02%
blend_v_w32_8bpc_rvv: 306.3 (19.85x) 309.3 (19.66x) 0.98% |
||
|
|
a17c862576 |
riscv64/mc: Only process w*3/4 elements in blend_v
Setting VL for this function only impacts the 16bpc performance and only on the SpacemiT K1 which has two vector units of length 128b each.

Kendryte K230 Before After Delta
blend_v_w2_8bpc_c: 220.0 ( 1.00x) 221.3 ( 1.00x) 0.59%
blend_v_w2_8bpc_rvv: 145.7 ( 1.51x) 148.2 ( 1.49x) 1.72%
blend_v_w4_8bpc_c: 942.1 ( 1.00x) 943.7 ( 1.00x) 0.17%
blend_v_w4_8bpc_rvv: 240.4 ( 3.92x) 242.9 ( 3.89x) 1.04%
blend_v_w8_8bpc_c: 1782.3 ( 1.00x) 1783.8 ( 1.00x) 0.08%
blend_v_w8_8bpc_rvv: 252.6 ( 7.06x) 254.9 ( 7.00x) 0.91%
blend_v_w16_8bpc_c: 3650.9 ( 1.00x) 3647.0 ( 1.00x) -0.11%
blend_v_w16_8bpc_rvv: 495.5 ( 7.37x) 494.4 ( 7.38x) -0.22%
blend_v_w32_8bpc_c: 7013.0 ( 1.00x) 7018.2 ( 1.00x) 0.07%
blend_v_w32_8bpc_rvv: 807.9 ( 8.68x) 802.0 ( 8.75x) -0.73%
blend_v_w2_16bpc_c: 226.1 ( 1.00x) 225.5 ( 1.00x) -0.27%
blend_v_w2_16bpc_rvv: 148.6 ( 1.52x) 148.9 ( 1.51x) 0.20%
blend_v_w4_16bpc_c: 1010.7 ( 1.00x) 1006.7 ( 1.00x) -0.40%
blend_v_w4_16bpc_rvv: 306.7 ( 3.30x) 307.4 ( 3.27x) 0.23%
blend_v_w8_16bpc_c: 1990.2 ( 1.00x) 1996.1 ( 1.00x) 0.30%
blend_v_w8_16bpc_rvv: 519.5 ( 3.83x) 523.4 ( 3.81x) 0.75%
blend_v_w16_16bpc_c: 3744.5 ( 1.00x) 3742.4 ( 1.00x) -0.06%
blend_v_w16_16bpc_rvv: 899.6 ( 4.16x) 906.4 ( 4.13x) 0.76%
blend_v_w32_16bpc_c: 7047.5 ( 1.00x) 7079.3 ( 1.00x) 0.45%
blend_v_w32_16bpc_rvv: 1475.5 ( 4.78x) 1483.3 ( 4.77x) 0.53%

SpacemiT K1 Before After Delta
blend_v_w2_8bpc_c: 216.3 ( 1.00x) 214.4 ( 1.00x) -0.88%
blend_v_w2_8bpc_rvv: 144.0 ( 1.50x) 143.6 ( 1.49x) -0.28%
blend_v_w4_8bpc_c: 919.8 ( 1.00x) 918.1 ( 1.00x) -0.18%
blend_v_w4_8bpc_rvv: 236.6 ( 3.89x) 236.4 ( 3.88x) -0.08%
blend_v_w8_8bpc_c: 1739.3 ( 1.00x) 1736.8 ( 1.00x) -0.14%
blend_v_w8_8bpc_rvv: 236.8 ( 7.34x) 236.3 ( 7.35x) -0.21%
blend_v_w16_8bpc_c: 3374.7 ( 1.00x) 3374.9 ( 1.00x) 0.01%
blend_v_w16_8bpc_rvv: 297.0 (11.36x) 296.8 (11.37x) -0.07%
blend_v_w32_8bpc_c: 6647.5 ( 1.00x) 6645.5 ( 1.00x) -0.03%
blend_v_w32_8bpc_rvv: 403.3 (16.48x) 402.4 (16.51x) -0.22%
blend_v_w2_16bpc_c: 221.4 ( 1.00x) 220.1 ( 1.00x) -0.59%
blend_v_w2_16bpc_rvv: 146.3 ( 1.51x) 147.3 ( 1.49x) 0.68%
blend_v_w4_16bpc_c: 973.3 ( 1.00x) 972.7 ( 1.00x) -0.06%
blend_v_w4_16bpc_rvv: 280.3 ( 3.47x) 282.1 ( 3.45x) 0.64%
blend_v_w8_16bpc_c: 1814.8 ( 1.00x) 1816.2 ( 1.00x) 0.08%
blend_v_w8_16bpc_rvv: 376.6 ( 4.82x) 376.9 ( 4.82x) 0.08%
blend_v_w16_16bpc_c: 3485.5 ( 1.00x) 3485.5 ( 1.00x) 0.00%
blend_v_w16_16bpc_rvv: 531.1 ( 6.56x) 525.6 ( 6.63x) -1.04%
blend_v_w32_16bpc_c: 6788.3 ( 1.00x) 6778.8 ( 1.00x) -0.14%
blend_v_w32_16bpc_rvv: 904.5 ( 7.51x) 854.6 ( 7.93x) -5.52% |
||
|
|
907dd87191 |
riscv64/mc16: Unroll 16bpc RVV blend_v 2x
Kendryte K230 Before After Delta
blend_v_w2_16bpc_c: 225.8 ( 1.00x) 225.7 ( 1.00x) -0.04%
blend_v_w2_16bpc_rvv: 194.7 ( 1.16x) 148.6 ( 1.52x) -23.68%
blend_v_w4_16bpc_c: 1011.3 ( 1.00x) 1005.8 ( 1.00x) -0.54%
blend_v_w4_16bpc_rvv: 387.2 ( 2.61x) 305.4 ( 3.29x) -21.13%
blend_v_w8_16bpc_c: 1878.5 ( 1.00x) 1872.7 ( 1.00x) -0.31%
blend_v_w8_16bpc_rvv: 475.3 ( 3.95x) 435.6 ( 4.30x) -8.35%
blend_v_w16_16bpc_c: 3601.9 ( 1.00x) 3601.6 ( 1.00x) -0.01%
blend_v_w16_16bpc_rvv: 891.2 ( 4.04x) 892.7 ( 4.03x) 0.17%
blend_v_w32_16bpc_c: 7043.7 ( 1.00x) 7058.8 ( 1.00x) 0.21%
blend_v_w32_16bpc_rvv: 1384.5 ( 5.09x) 1478.0 ( 4.78x) 6.75%

SpacemiT K1 Before After Delta
blend_v_w2_16bpc_c: 222.6 ( 1.00x) 220.5 ( 1.00x) -0.94%
blend_v_w2_16bpc_rvv: 195.7 ( 1.14x) 146.6 ( 1.50x) -25.09%
blend_v_w4_16bpc_c: 972.3 ( 1.00x) 972.0 ( 1.00x) -0.03%
blend_v_w4_16bpc_rvv: 349.1 ( 2.79x) 281.9 ( 3.45x) -19.25%
blend_v_w8_16bpc_c: 1812.1 ( 1.00x) 1813.0 ( 1.00x) 0.05%
blend_v_w8_16bpc_rvv: 481.5 ( 3.76x) 376.0 ( 4.82x) -21.91%
blend_v_w16_16bpc_c: 3488.4 ( 1.00x) 3484.6 ( 1.00x) -0.11%
blend_v_w16_16bpc_rvv: 608.7 ( 5.73x) 523.4 ( 6.66x) -14.01%
blend_v_w32_16bpc_c: 6795.3 ( 1.00x) 6792.4 ( 1.00x) -0.04%
blend_v_w32_16bpc_rvv: 934.8 ( 7.27x) 907.3 ( 7.49x) -2.94% |
||
|
|
9710e7de9c |
riscv64/mc16: Branchless vsetvl in blend_v function
Kendryte K230 Before After Delta
blend_v_w2_16bpc_c: 226.0 ( 1.00x) 226.1 ( 1.00x) 0.04%
blend_v_w2_16bpc_rvv: 194.0 ( 1.16x) 193.9 ( 1.17x) -0.05%
blend_v_w4_16bpc_c: 1011.8 ( 1.00x) 1009.4 ( 1.00x) -0.24%
blend_v_w4_16bpc_rvv: 392.7 ( 2.58x) 390.8 ( 2.58x) -0.48%
blend_v_w8_16bpc_c: 1987.9 ( 1.00x) 1988.0 ( 1.00x) 0.01%
blend_v_w8_16bpc_rvv: 561.5 ( 3.54x) 560.2 ( 3.55x) -0.23%
blend_v_w16_16bpc_c: 3738.1 ( 1.00x) 3739.1 ( 1.00x) 0.03%
blend_v_w16_16bpc_rvv: 934.1 ( 4.00x) 932.2 ( 4.01x) -0.20%
blend_v_w32_16bpc_c: 7031.0 ( 1.00x) 7030.1 ( 1.00x) -0.01%
blend_v_w32_16bpc_rvv: 1403.3 ( 5.01x) 1395.8 ( 5.04x) -0.53%

SpacemiT K1 Before After Delta
blend_v_w2_16bpc_c: 221.0 ( 1.00x) 221.2 ( 1.00x) 0.09%
blend_v_w2_16bpc_rvv: 195.2 ( 1.13x) 196.0 ( 1.13x) 0.41%
blend_v_w4_16bpc_c: 969.8 ( 1.00x) 971.9 ( 1.00x) 0.22%
blend_v_w4_16bpc_rvv: 348.8 ( 2.78x) 349.1 ( 2.78x) 0.09%
blend_v_w8_16bpc_c: 1812.6 ( 1.00x) 1814.9 ( 1.00x) 0.13%
blend_v_w8_16bpc_rvv: 486.1 ( 3.73x) 484.3 ( 3.75x) -0.37%
blend_v_w16_16bpc_c: 3483.0 ( 1.00x) 3485.1 ( 1.00x) 0.06%
blend_v_w16_16bpc_rvv: 608.7 ( 5.72x) 607.4 ( 5.74x) -0.21%
blend_v_w32_16bpc_c: 6791.8 ( 1.00x) 6794.2 ( 1.00x) 0.04%
blend_v_w32_16bpc_rvv: 940.6 ( 7.22x) 942.1 ( 7.21x) 0.16% |
||
|
|
28d1c21779 |
riscv64/mc16: Add VLEN=256 8bpc RVV blend_v function
SpacemiT K1 Before After Delta
blend_v_w2_16bpc_c: 221.5 ( 1.00x) 220.3 ( 1.00x) -0.54%
blend_v_w2_16bpc_rvv: 193.5 ( 1.14x) 194.3 ( 1.13x) 0.41%
blend_v_w4_16bpc_c: 968.8 ( 1.00x) 967.2 ( 1.00x) -0.17%
blend_v_w4_16bpc_rvv: 442.2 ( 2.19x) 347.4 ( 2.78x) -21.44%
blend_v_w8_16bpc_c: 1809.4 ( 1.00x) 1811.2 ( 1.00x) 0.10%
blend_v_w8_16bpc_rvv: 557.4 ( 3.25x) 483.2 ( 3.75x) -13.31%
blend_v_w16_16bpc_c: 3481.4 ( 1.00x) 3473.4 ( 1.00x) -0.23%
blend_v_w16_16bpc_rvv: 844.3 ( 4.12x) 603.1 ( 5.76x) -28.57%
blend_v_w32_16bpc_c: 6783.1 ( 1.00x) 6749.8 ( 1.00x) -0.49%
blend_v_w32_16bpc_rvv: 1406.1 ( 4.82x) 919.4 ( 7.34x) -34.61% |
||
|
|
aa2deb898e |
riscv64/mc16: Add 16bpc RVV blend_v function
Kendryte K230
blend_v_w2_16bpc_c: 226.5 ( 1.00x)
blend_v_w2_16bpc_rvv: 192.2 ( 1.18x)
blend_v_w4_16bpc_c: 1010.3 ( 1.00x)
blend_v_w4_16bpc_rvv: 390.5 ( 2.59x)
blend_v_w8_16bpc_c: 1994.2 ( 1.00x)
blend_v_w8_16bpc_rvv: 561.7 ( 3.55x)
blend_v_w16_16bpc_c: 3737.9 ( 1.00x)
blend_v_w16_16bpc_rvv: 928.0 ( 4.03x)
blend_v_w32_16bpc_c: 7064.7 ( 1.00x)
blend_v_w32_16bpc_rvv: 1428.9 ( 4.94x)

SpacemiT K1
blend_v_w2_16bpc_c: 220.8 ( 1.00x)
blend_v_w2_16bpc_rvv: 193.5 ( 1.14x)
blend_v_w4_16bpc_c: 967.3 ( 1.00x)
blend_v_w4_16bpc_rvv: 439.5 ( 2.20x)
blend_v_w8_16bpc_c: 1810.2 ( 1.00x)
blend_v_w8_16bpc_rvv: 555.3 ( 3.26x)
blend_v_w16_16bpc_c: 3476.4 ( 1.00x)
blend_v_w16_16bpc_rvv: 830.9 ( 4.18x)
blend_v_w32_16bpc_c: 6772.9 ( 1.00x)
blend_v_w32_16bpc_rvv: 1356.3 ( 4.99x) |
||
|
|
c783088fe7 |
riscv64/mc16: Unroll 16bpc RVV blend 2x
Kendryte K230 Before After Delta
blend_w4_16bpc_c: 210.0 ( 1.00x) 208.9 ( 1.00x) -0.52%
blend_w4_16bpc_rvv: 88.5 ( 2.37x) 66.2 ( 3.15x) -25.20%
blend_w8_16bpc_c: 614.1 ( 1.00x) 613.5 ( 1.00x) -0.10%
blend_w8_16bpc_rvv: 143.1 ( 4.29x) 126.9 ( 4.83x) -11.32%
blend_w16_16bpc_c: 2371.2 ( 1.00x) 2371.3 ( 1.00x) 0.00%
blend_w16_16bpc_rvv: 461.1 ( 5.14x) 413.2 ( 5.74x) -10.39%
blend_w32_16bpc_c: 5998.4 ( 1.00x) 5998.4 ( 1.00x) 0.00%
blend_w32_16bpc_rvv: 978.4 ( 6.13x) 1013.1 ( 5.92x) 3.55%

SpacemiT K1 Before After Delta
blend_w4_16bpc_c: 205.8 ( 1.00x) 205.9 ( 1.00x) 0.05%
blend_w4_16bpc_rvv: 80.9 ( 2.54x) 64.9 ( 3.17x) -19.78%
blend_w8_16bpc_c: 599.9 ( 1.00x) 599.9 ( 1.00x) 0.00%
blend_w8_16bpc_rvv: 134.4 ( 4.46x) 101.9 ( 5.89x) -24.18%
blend_w16_16bpc_c: 2316.5 ( 1.00x) 2316.5 ( 1.00x) 0.00%
blend_w16_16bpc_rvv: 302.0 ( 7.67x) 262.8 ( 8.81x) -12.98%
blend_w32_16bpc_c: 5861.9 ( 1.00x) 5861.4 ( 1.00x) -0.01%
blend_w32_16bpc_rvv: 589.6 ( 9.94x) 602.2 ( 9.73x) 2.14% |
||
|
|
67c60d76e1 |
riscv64/mc16: Branchless vsetvl in blend function
Kendryte K230 Before After Delta
blend_w4_16bpc_c: 208.8 ( 1.00x) 209.9 ( 1.00x) 0.53%
blend_w4_16bpc_rvv: 85.9 ( 2.43x) 88.6 ( 2.37x) 3.14%
blend_w8_16bpc_c: 613.2 ( 1.00x) 614.3 ( 1.00x) 0.18%
blend_w8_16bpc_rvv: 145.4 ( 4.22x) 143.1 ( 4.29x) -1.58%
blend_w16_16bpc_c: 2371.9 ( 1.00x) 2373.6 ( 1.00x) 0.07%
blend_w16_16bpc_rvv: 464.0 ( 5.11x) 461.2 ( 5.15x) -0.60%
blend_w32_16bpc_c: 6005.6 ( 1.00x) 6007.7 ( 1.00x) 0.03%
blend_w32_16bpc_rvv: 981.6 ( 6.12x) 979.4 ( 6.13x) -0.22%

SpacemiT K1 Before After Delta
blend_w4_16bpc_c: 206.4 ( 1.00x) 205.7 ( 1.00x) -0.34%
blend_w4_16bpc_rvv: 79.5 ( 2.60x) 81.0 ( 2.54x) 1.89%
blend_w8_16bpc_c: 600.7 ( 1.00x) 599.7 ( 1.00x) -0.17%
blend_w8_16bpc_rvv: 133.3 ( 4.51x) 134.1 ( 4.47x) 0.60%
blend_w16_16bpc_c: 2315.9 ( 1.00x) 2315.2 ( 1.00x) -0.03%
blend_w16_16bpc_rvv: 305.2 ( 7.59x) 300.7 ( 7.70x) -1.47%
blend_w32_16bpc_c: 5861.1 ( 1.00x) 5860.2 ( 1.00x) -0.02%
blend_w32_16bpc_rvv: 592.5 ( 9.89x) 589.5 ( 9.94x) -0.51% |
||
|
|
3437a26b3d |
riscv64/mc16: Add VLEN=256 8bpc RVV blend function
SpacemiT K1 Before After Delta
blend_w4_16bpc_c: 206.8 ( 1.00x) 206.0 ( 1.00x) -0.39%
blend_w4_16bpc_rvv: 95.8 ( 2.16x) 77.8 ( 2.65x) -18.79%
blend_w8_16bpc_c: 600.4 ( 1.00x) 600.1 ( 1.00x) -0.05%
blend_w8_16bpc_rvv: 161.7 ( 3.71x) 131.3 ( 4.57x) -18.80%
blend_w16_16bpc_c: 2317.6 ( 1.00x) 2316.5 ( 1.00x) -0.05%
blend_w16_16bpc_rvv: 459.6 ( 5.04x) 302.9 ( 7.65x) -34.09%
blend_w32_16bpc_c: 5863.0 ( 1.00x) 5863.3 ( 1.00x) 0.01%
blend_w32_16bpc_rvv: 992.7 ( 5.91x) 578.1 (10.14x) -41.76% |
||
|
|
e542f661d0 |
meson: Move riscv64 8bpc only files into bitdepth sources
The cdef.S, itx.S and mc.S files contain only 8bpc implementations and should be compiled only when building with -Dbitdepths=8 configuration. |
||
|
|
ca489d8aab |
riscv64/mc16: Add 16bpc RVV blend function
Kendryte K230
blend_w4_16bpc_c: 214.4 ( 1.00x)
blend_w4_16bpc_rvv: 90.2 ( 2.38x)
blend_w8_16bpc_c: 618.9 ( 1.00x)
blend_w8_16bpc_rvv: 147.4 ( 4.20x)
blend_w16_16bpc_c: 2376.5 ( 1.00x)
blend_w16_16bpc_rvv: 466.0 ( 5.10x)
blend_w32_16bpc_c: 6008.6 ( 1.00x)
blend_w32_16bpc_rvv: 985.0 ( 6.10x)

SpacemiT K1
blend_w4_16bpc_c: 204.9 ( 1.00x)
blend_w4_16bpc_rvv: 88.3 ( 2.32x)
blend_w8_16bpc_c: 598.5 ( 1.00x)
blend_w8_16bpc_rvv: 155.3 ( 3.85x)
blend_w16_16bpc_c: 2315.4 ( 1.00x)
blend_w16_16bpc_rvv: 444.4 ( 5.21x)
blend_w32_16bpc_c: 5860.1 ( 1.00x)
blend_w32_16bpc_rvv: 993.0 ( 5.90x) |
||
|
|
22e9c0fee3 |
riscv64/ipred16: Fix build error with -Dbitdepths=16
When configuring and building dav1d with just the 16bpc code paths using meson setup .. -Dbitdepths=16, there is an undefined reference to dav1d_dc_gen_8bpc_rvv due to a typo in src/riscv/64/ipred16.S. |
||
|
|
c3fa1db301 | NEWS: add itx to riscv list | ||
|
|
789a1f652b |
riscv64/itx: Replace vwadd+vnsra with vnclip
The vnclip instruction does a fixed-point saturating add then shift and
can replace vwadd followed by vnsra in idct_4, idct_8, idct_16, iadst_8
and iadst_16.
Including
|
||
|
|
572c5a669d |
riscv: Fix argon test failure
This fixes md5sum mismatch in profile0_core/streams/test11168_11073.obu. |
||
|
|
cc7d8773ee |
riscv64/mc: Branchless vsetvl in blend_v function
Kendryte K230
blend_v_w2_8bpc_c: 221.4 ( 1.00x)
blend_v_w2_8bpc_rvv: 147.7 ( 1.50x)
blend_v_w4_8bpc_c: 945.3 ( 1.00x)
blend_v_w4_8bpc_rvv: 243.3 ( 3.89x)
blend_v_w8_8bpc_c: 1786.9 ( 1.00x)
blend_v_w8_8bpc_rvv: 256.1 ( 6.98x)
blend_v_w16_8bpc_c: 3472.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 351.1 ( 9.89x)
blend_v_w32_8bpc_c: 6832.1 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.4 (10.75x)

SpacemiT K1
blend_v_w2_8bpc_c: 218.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 144.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.7 ( 1.00x)
blend_v_w4_8bpc_rvv: 237.1 ( 3.89x)
blend_v_w8_8bpc_c: 1739.8 ( 1.00x)
blend_v_w8_8bpc_rvv: 237.4 ( 7.33x)
blend_v_w16_8bpc_c: 3376.6 ( 1.00x)
blend_v_w16_8bpc_rvv: 296.3 (11.40x)
blend_v_w32_8bpc_c: 6647.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 408.1 (16.29x) |
||
|
|
2da8107ec1 |
riscv64/mc: Branchless vsetvl in blend_h function
Kendryte K230
blend_h_w2_8bpc_c: 165.9 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.8 ( 1.98x)
blend_h_w4_8bpc_c: 295.2 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.8 ( 3.52x)
blend_h_w8_8bpc_c: 557.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 92.5 ( 6.03x)
blend_h_w16_8bpc_c: 1078.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 117.3 ( 9.19x)
blend_h_w32_8bpc_c: 2117.8 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.57x)
blend_h_w64_8bpc_c: 4194.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.2 (11.55x)
blend_h_w128_8bpc_c: 10271.4 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.5 (12.16x)

SpacemiT K1
blend_h_w2_8bpc_c: 162.5 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.9 ( 1.94x)
blend_h_w4_8bpc_c: 288.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 544.7 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.0 ( 6.48x)
blend_h_w16_8bpc_c: 1052.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 102.9 (10.23x)
blend_h_w32_8bpc_c: 2068.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 131.4 (15.73x)
blend_h_w64_8bpc_c: 4093.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 220.3 (18.58x)
blend_h_w128_8bpc_c: 10023.1 ( 1.00x)
blend_h_w128_8bpc_rvv: 467.3 (21.45x) |
||
|
|
b374b24c0f |
riscv64/mc: Branchless vsetvl in blend function
Kendryte K230
blend_w4_8bpc_c: 204.8 ( 1.00x)
blend_w4_8bpc_rvv: 59.8 ( 3.42x)
blend_w8_8bpc_c: 608.9 ( 1.00x)
blend_w8_8bpc_rvv: 87.2 ( 6.98x)
blend_w16_8bpc_c: 2362.4 ( 1.00x)
blend_w16_8bpc_rvv: 225.2 (10.49x)
blend_w32_8bpc_c: 5990.4 ( 1.00x)
blend_w32_8bpc_rvv: 518.3 (11.56x)

SpacemiT K1
blend_w4_8bpc_c: 201.6 ( 1.00x)
blend_w4_8bpc_rvv: 58.0 ( 3.48x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 82.1 ( 7.25x)
blend_w16_8bpc_c: 2308.8 ( 1.00x)
blend_w16_8bpc_rvv: 189.0 (12.22x)
blend_w32_8bpc_c: 5853.1 ( 1.00x)
blend_w32_8bpc_rvv: 339.5 (17.24x) |
||
|
|
0e3f70e898 |
riscv64/mc: Add VLEN=256 8bpc RVV blend_v function
SpacemiT K1
blend_v_w2_8bpc_c: 217.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 143.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.6 ( 1.00x)
blend_v_w4_8bpc_rvv: 236.3 ( 3.90x)
blend_v_w8_8bpc_c: 1738.2 ( 1.00x)
blend_v_w8_8bpc_rvv: 238.1 ( 7.30x)
blend_v_w16_8bpc_c: 3376.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 298.0 (11.33x)
blend_v_w32_8bpc_c: 6648.0 ( 1.00x)
blend_v_w32_8bpc_rvv: 409.5 (16.24x) |
||
|
|
a5b9544866 |
riscv64/mc: Add VLEN=256 8bpc RVV blend_h function
SpacemiT K1
blend_h_w2_8bpc_c: 161.8 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.5 ( 1.94x)
blend_h_w4_8bpc_c: 288.4 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 543.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.5 ( 6.44x)
blend_h_w16_8bpc_c: 1051.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 103.8 (10.13x)
blend_h_w32_8bpc_c: 2066.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 133.8 (15.44x)
blend_h_w64_8bpc_c: 4092.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 225.2 (18.18x)
blend_h_w128_8bpc_c: 10011.3 ( 1.00x)
blend_h_w128_8bpc_rvv: 474.7 (21.09x) |
||
|
|
83485c5092 |
riscv64/mc: Add VLEN=256 8bpc RVV blend function
SpacemiT K1
blend_w4_8bpc_c: 201.3 ( 1.00x)
blend_w4_8bpc_rvv: 59.3 ( 3.40x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 84.1 ( 7.07x)
blend_w16_8bpc_c: 2309.0 ( 1.00x)
blend_w16_8bpc_rvv: 190.5 (12.12x)
blend_w32_8bpc_c: 5854.7 ( 1.00x)
blend_w32_8bpc_rvv: 341.6 (17.14x) |
||
|
|
7f2bb2fbc9 | riscv: Move get_vlenb() from checkasm_ to dav1d_ | ||
|
|
01da36ebdf |
riscv64/mc: Add 8bpc RVV blend_v function
Kendryte K230
blend_v_w2_8bpc_c: 219.6 ( 1.00x)
blend_v_w2_8bpc_rvv: 141.8 ( 1.55x)
blend_v_w4_8bpc_c: 942.9 ( 1.00x)
blend_v_w4_8bpc_rvv: 240.9 ( 3.91x)
blend_v_w8_8bpc_c: 1783.5 ( 1.00x)
blend_v_w8_8bpc_rvv: 254.7 ( 7.00x)
blend_v_w16_8bpc_c: 3466.5 ( 1.00x)
blend_v_w16_8bpc_rvv: 350.5 ( 9.89x)
blend_v_w32_8bpc_c: 6825.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.1 (10.75x) |
||
|
|
d3a94f1194 |
riscv64/mc: Add 8bpc RVV blend_h function
Kendryte K230
blend_h_w2_8bpc_c: 165.4 ( 1.00x)
blend_h_w2_8bpc_rvv: 79.4 ( 2.08x)
blend_h_w4_8bpc_c: 294.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 81.5 ( 3.61x)
blend_h_w8_8bpc_c: 556.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 90.2 ( 6.17x)
blend_h_w16_8bpc_c: 1077.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 116.1 ( 9.29x)
blend_h_w32_8bpc_c: 2116.2 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.55x)
blend_h_w64_8bpc_c: 4191.8 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.3 (11.54x)
blend_h_w128_8bpc_c: 10264.6 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.1 (12.16x) |
||
|
|
f851fcd0b4 |
riscv64/mc: Add 8bpc RVV blend function
Kendryte K230
blend_w4_8bpc_c: 204.5 ( 1.00x)
blend_w4_8bpc_rvv: 56.4 ( 3.62x)
blend_w8_8bpc_c: 608.6 ( 1.00x)
blend_w8_8bpc_rvv: 87.3 ( 6.97x)
blend_w16_8bpc_c: 2363.8 ( 1.00x)
blend_w16_8bpc_rvv: 225.1 (10.50x)
blend_w32_8bpc_c: 5990.3 ( 1.00x)
blend_w32_8bpc_rvv: 518.8 (11.55x) |
||
|
|
38f74bdc46 | riscv: Allow multiple .option arch with vararg ext | ||
|
|
01b94cc33b | cli: Prevent buffer over-read | ||
|
|
fc4763c5a4 | riscv: Check for standards compliant RVV 1.0+ | ||
|
|
0fff614a4c | arm32/msac: Trim C functions, saves 1024 bytes | ||
|
|
b9f5333021 | arm64/msac: Trim C functions, saves 1392 bytes | ||
|
|
b5b394cd6e |
arm: Use -fno-align-functions when building
arm32: 2 byte alignment saves 136 bytes
arm64: 4 byte alignment saves 1200 bytes |
||
|
|
61d16e07ac | arm32/itx: Trim dav1d_inv_wht4_1d_c, saves 68 bytes | ||
|
|
485413b059 | arm64/itx: Trim dav1d_inv_wht4_1d_c, saves 92 bytes | ||
|
|
ec695854f7 |
arm32/itx16: Add 4x4 12bpc NEON wht_wht transform
When -Dtrim_dsp=true, this commit saves 740 bytes.

inv_txfm_add_4x4_wht_wht_0_12bpc_c: 192.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_12bpc_neon: 46.1 ( 4.17x)
inv_txfm_add_4x4_wht_wht_1_12bpc_c: 192.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_12bpc_neon: 45.7 ( 4.21x) |
||
|
|
3b852b15e9 |
arm64/itx16: Add 4x4 12bpc NEON wht_wht transform
When -Dtrim_dsp=true, this commit saves 940 bytes.

inv_txfm_add_4x4_wht_wht_0_12bpc_c: 145.2 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_12bpc_neon: 42.9 ( 3.39x)
inv_txfm_add_4x4_wht_wht_1_12bpc_c: 145.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_12bpc_neon: 42.9 ( 3.39x) |
||
|
|
b7963a7389 |
riscv64/itx: Add 16x16 8bpc eob test
Kendryte K230 Before After
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv: 1804.9 (8.45x) 1374.3 (11.18x)
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv: 1805.2 (8.45x) 1374.3 (11.17x)
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv: 1626.6 (8.92x) 1185.8 (12.22x)
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv: 1626.5 (8.91x) 1185.9 (12.22x)
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv: 1824.2 (8.38x) 1372.1 (11.22x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv: 1824.2 (8.37x) 1372.2 (11.21x)
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv: 1627.3 (8.94x) 1283.5 (11.29x)
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv: 1627.2 (8.95x) 1283.2 (11.29x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 1449.3 (1.08x) 1095.2 ( 1.44x)
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv: 1449.1 (9.52x) 1095.1 (12.45x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv: 1643.0 (8.87x) 1283.5 (11.29x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv: 1643.3 (8.87x) 1283.3 (11.30x)
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv: 1155.4 (9.23x) 805.9 (13.17x)
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv: 1155.4 (9.24x) 805.9 (13.17x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv: 1812.2 (8.43x) 1370.9 (11.23x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv: 1811.7 (8.44x) 1370.8 (11.24x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv: 1637.2 (8.88x) 1190.8 (12.19x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv: 1637.6 (8.87x) 1190.9 (12.19x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv: 1831.1 (8.34x) 1374.7 (11.21x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv: 1830.8 (8.35x) 1374.5 (11.22x)
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv: 1156.2 (8.67x) 948.6 (10.49x)
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv: 1156.3 (8.68x) 948.6 (10.49x)
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv: 879.3 (7.81x) 673.5 (10.28x)
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv: 879.3 (7.81x) 673.5 (10.28x) |
||
|
|
701225128a |
riscv64/itx: Add 8x16 8bpc eob test
Kendryte K230 Before After
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv: 853.9 ( 9.00x) 698.3 (11.03x)
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv: 853.8 ( 9.00x) 698.3 (11.03x)
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv: 763.0 ( 9.55x) 609.2 (12.00x)
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv: 763.1 ( 9.55x) 609.3 (11.94x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv: 857.1 ( 8.99x) 701.6 (11.00x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv: 856.8 ( 8.98x) 701.3 (10.97x)
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv: 622.9 ( 9.22x) 468.5 (12.36x)
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv: 622.9 ( 9.23x) 468.6 (12.37x)
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv: 770.1 ( 9.32x) 655.1 (10.93x)
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv: 770.1 ( 9.34x) 655.4 (10.93x)
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv: 679.8 ( 1.23x) 566.1 ( 1.48x)
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv: 679.8 ( 9.98x) 566.5 (11.89x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv: 771.1 ( 9.34x) 667.4 (10.75x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv: 771.1 ( 9.34x) 667.3 (10.76x)
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv: 532.3 ( 9.84x) 422.1 (12.42x)
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv: 532.4 ( 9.85x) 422.2 (12.40x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv: 858.4 ( 8.98x) 699.2 (11.03x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv: 858.5 ( 8.98x) 699.3 (11.03x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv: 768.6 ( 9.52x) 609.7 (11.97x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv: 768.4 ( 9.52x) 609.6 (11.97x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv: 866.5 ( 8.91x) 706.5 (10.92x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv: 866.4 ( 8.92x) 706.6 (10.95x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv: 621.9 ( 9.28x) 464.6 (12.46x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv: 621.8 ( 9.28x) 464.6 (12.46x)
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv: 584.9 ( 9.78x) 564.1 (10.12x)
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv: 584.8 ( 9.78x) 563.9 (10.12x)
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv: 495.0 (10.75x) 474.6 (11.13x)
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv: 494.3 (10.75x) 474.7 (11.12x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv: 588.1 ( 9.76x) 568.1 (10.07x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv: 588.7 ( 9.74x) 568.0 (10.07x)
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv: 349.5 (10.78x) 328.8 (11.46x)
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv: 349.4 (10.79x) 328.7 (11.46x) |
||
|
|
afeeb3cc90 |
riscv64/itx: Add 4x16 8bpc eob test
Kendryte K230 Before After
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv: 429.9 (7.45x) 381.3 (8.58x)
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv: 430.0 (7.45x) 381.3 (8.57x)
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv: 381.0 (8.01x) 332.5 (9.19x)
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv: 381.0 (8.00x) 332.5 (9.19x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv: 432.8 (7.42x) 384.5 (8.52x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv: 432.8 (7.42x) 384.4 (8.52x)
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv: 304.6 (7.32x) 249.8 (9.18x)
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv: 304.5 (7.32x) 249.8 (9.18x)
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv: 407.2 (7.68x) 371.4 (8.57x)
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv: 407.1 (7.68x) 371.5 (8.58x)
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv: 357.9 (1.27x) 323.1 (1.41x)
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv: 357.9 (8.29x) 322.9 (9.16x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv: 410.0 (7.62x) 376.6 (8.45x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv: 410.0 (7.62x) 376.5 (8.47x)
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv: 275.2 (7.79x) 240.5 (9.21x)
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv: 275.3 (7.78x) 240.6 (9.19x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv: 430.5 (7.51x) 382.6 (8.60x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv: 430.1 (7.52x) 382.8 (8.60x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv: 381.1 (8.09x) 333.8 (9.21x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv: 381.0 (8.08x) 333.7 (9.21x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv: 433.0 (7.48x) 385.7 (8.55x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv: 433.0 (7.48x) 385.7 (8.55x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv: 298.6 (7.57x) 250.8 (9.28x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv: 298.6 (7.57x) 250.9 (9.27x)
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv: 361.5 (7.93x) 347.3 (8.35x)
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv: 361.4 (7.93x) 347.4 (8.35x)
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv: 310.9 (8.69x) 297.8 (9.02x)
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv: 311.0 (8.69x) 297.8 (9.02x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv: 364.1 (7.88x) 350.5 (8.29x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv: 364.2 (7.88x) 350.4 (8.31x)
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv: 229.7 (8.22x) 211.4 (9.11x)
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv: 229.7 (8.21x) 211.2 (9.12x) |
||
|
|
52948bbfcc | riscv/checkasm: Print the RVV vector length, if available | ||
|
|
8c209190bb |
arm/msac: Enable NEON optimizations on more platforms
This commit enables msac NEON assembly optimizations when building with MSVC targeting ARM. Note, the test for __APPLE__ is redundant and added for consistency. |
||
|
|
2ab2ec388e | riscv64/itx: Fix build issues with clang | ||
|
|
7be30df413 | arm64/itx16: Reuse horz_16x4 epilog, saves 96 bytes | ||
|
|
28c7e530b1 | arm32/itx16: Reuse horz_16x2 epilog, saves 24 bytes | ||
|
|
4bb0005ca7 | riscv64/itx: Reuse horz_16x8 epilog, saves 94 bytes | ||
|
|
f15b073156 | arm64/itx: Reuse horz_16x8 epilog, saves 512 bytes | ||
|
|
6249bd8809 | arm32/itx: Reuse horz_16x4 epilog, saves 336 bytes | ||
|
|
6e5d1df633 | riscv64/itx: Reuse 16x8 epilog, saves 706 bytes | ||
|
|
4585763474 | riscv64/itx: Reuse 8x16 epilog, saves 24 bytes | ||
|
|
be47dfcd3f | riscv64/itx: Tail call vert_8x16, saves 1086 bytes | ||
|
|
1830c9b598 | riscv64/itx: Reuse 16x4 epilog, saves 354 bytes | ||
|
|
311816b46d | riscv64/itx: Reuse 4x16 epilog, saves 642 bytes | ||
|
|
e7378375b5 | riscv64/itx: Fix unrolled .irp loops, saves 12 bytes | ||
|
|
d4746c908c | arm32/itx: Remove 16x8 variant, saves 528 bytes | ||
|
|
1ed35d2cbe | arm32/itx: Reuse 8x16 epilog, saves 48 bytes | ||
|
|
5250a16f21 |
arm32/itx: Reuse 16x4 epilog, saves 220 bytes
Unlike on arm64, it is not possible to fold the vmov instructions into the transpose_4x8h macro, so this commit _adds_ 4 vmov instructions to the code paths of the twelve 16x4 transforms that do not start with idtx. |
||
|
|
9944ce3080 | arm32/itx: Reuse 4x16 epilog, saves 268 bytes | ||
|
|
5020162d63 |
arm64/itx: Reuse 16x8 epilog, saves 568 bytes
Move the vertical transpose-transform-store operations to the end of inv_txfm_add_16x8_neon and fold the mov instructions into the second transpose. Only the four *16x8_identity* functions are modified and this commit _removes_ 8 mov instructions from these code paths. |
||
|
|
89c53031a8 | arm64: Add transpose_8x8h_mov macro | ||
|
|
80806b57b0 | arm64/itx: Reuse 8x16 epilog, saves 424 bytes | ||
|
|
3335c5ebdb |
arm64/itx: Reuse 16x4 epilog, saves 264 bytes
Move the vertical transpose-transform-store operations to the end of inv_txfm_add_16x4_neon and fold the mov instructions into the second transpose. Only the four *16x4_identity* functions are modified and this commit _removes_ 4 mov instructions from these code paths. |
||
|
|
57e46dd955 | arm64: Add transpose_4x8h_mov macro | ||
|
|
955939f762 | arm64/itx: Reuse 4x16 epilog, saves 312 bytes | ||
|
|
c15f7ecd46 |
riscv64/itx: Add 16x8 8bpp RVV transforms
inv_txfm_add_16x8_adst_adst_0_8bpc_c: 7638.9 ( 1.00x)
inv_txfm_add_16x8_adst_adst_0_8bpc_rvv: 854.4 ( 8.94x)
inv_txfm_add_16x8_adst_adst_1_8bpc_c: 7650.5 ( 1.00x)
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv: 854.4 ( 8.95x)
inv_txfm_add_16x8_adst_adst_2_8bpc_c: 7649.4 ( 1.00x)
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv: 854.4 ( 8.95x)
inv_txfm_add_16x8_adst_dct_0_8bpc_c: 7182.0 ( 1.00x)
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv: 758.1 ( 9.47x)
inv_txfm_add_16x8_adst_dct_1_8bpc_c: 7175.6 ( 1.00x)
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv: 758.1 ( 9.47x)
inv_txfm_add_16x8_adst_dct_2_8bpc_c: 7181.7 ( 1.00x)
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv: 758.0 ( 9.47x)
inv_txfm_add_16x8_adst_flipadst_0_8bpc_c: 7671.7 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv: 858.3 ( 8.94x)
inv_txfm_add_16x8_adst_flipadst_1_8bpc_c: 7671.5 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv: 858.1 ( 8.94x)
inv_txfm_add_16x8_adst_flipadst_2_8bpc_c: 7673.8 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv: 858.2 ( 8.94x)
inv_txfm_add_16x8_adst_identity_0_8bpc_c: 5727.4 ( 1.00x)
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv: 612.6 ( 9.35x)
inv_txfm_add_16x8_adst_identity_1_8bpc_c: 5709.0 ( 1.00x)
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv: 612.6 ( 9.32x)
inv_txfm_add_16x8_adst_identity_2_8bpc_c: 5709.6 ( 1.00x)
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv: 612.5 ( 9.32x)
inv_txfm_add_16x8_dct_adst_0_8bpc_c: 7272.9 ( 1.00x)
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv: 761.2 ( 9.55x)
inv_txfm_add_16x8_dct_adst_1_8bpc_c: 7276.0 ( 1.00x)
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv: 761.0 ( 9.56x)
inv_txfm_add_16x8_dct_adst_2_8bpc_c: 7271.5 ( 1.00x)
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv: 761.0 ( 9.55x)
inv_txfm_add_16x8_dct_dct_0_8bpc_c: 822.4 ( 1.00x)
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv: 666.4 ( 1.23x)
inv_txfm_add_16x8_dct_dct_1_8bpc_c: 6791.3 ( 1.00x)
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv: 666.6 (10.19x)
inv_txfm_add_16x8_dct_dct_2_8bpc_c: 6786.0 ( 1.00x)
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv: 666.5 (10.18x)
inv_txfm_add_16x8_dct_flipadst_0_8bpc_c: 7280.7 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv: 764.8 ( 9.52x)
inv_txfm_add_16x8_dct_flipadst_1_8bpc_c: 7279.0 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv: 765.0 ( 9.52x)
inv_txfm_add_16x8_dct_flipadst_2_8bpc_c: 7282.0 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv: 764.8 ( 9.52x)
inv_txfm_add_16x8_dct_identity_0_8bpc_c: 5340.5 ( 1.00x)
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv: 520.4 (10.26x)
inv_txfm_add_16x8_dct_identity_1_8bpc_c: 5342.2 ( 1.00x)
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv: 521.0 (10.25x)
inv_txfm_add_16x8_dct_identity_2_8bpc_c: 5341.7 ( 1.00x)
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv: 520.9 (10.25x)
inv_txfm_add_16x8_flipadst_adst_0_8bpc_c: 7671.5 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv: 855.3 ( 8.97x)
inv_txfm_add_16x8_flipadst_adst_1_8bpc_c: 7663.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv: 855.3 ( 8.96x)
inv_txfm_add_16x8_flipadst_adst_2_8bpc_c: 7663.4 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv: 855.2 ( 8.96x)
inv_txfm_add_16x8_flipadst_dct_0_8bpc_c: 7185.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv: 760.2 ( 9.45x)
inv_txfm_add_16x8_flipadst_dct_1_8bpc_c: 7185.4 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv: 760.2 ( 9.45x)
inv_txfm_add_16x8_flipadst_dct_2_8bpc_c: 7185.3 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv: 760.4 ( 9.45x)
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_c: 7686.6 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv: 859.1 ( 8.95x)
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_c: 7687.9 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv: 859.2 ( 8.95x)
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_c: 7684.5 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv: 859.0 ( 8.95x)
inv_txfm_add_16x8_flipadst_identity_0_8bpc_c: 5723.1 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv: 615.7 ( 9.30x)
inv_txfm_add_16x8_flipadst_identity_1_8bpc_c: 5725.1 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv: 615.6 ( 9.30x)
inv_txfm_add_16x8_flipadst_identity_2_8bpc_c: 5713.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv: 615.6 ( 9.28x)
inv_txfm_add_16x8_identity_adst_0_8bpc_c: 5390.1 ( 1.00x)
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv: 617.9 ( 8.72x)
inv_txfm_add_16x8_identity_adst_1_8bpc_c: 5388.8 ( 1.00x)
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv: 617.7 ( 8.72x)
inv_txfm_add_16x8_identity_adst_2_8bpc_c: 5390.0 ( 1.00x)
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv: 617.7 ( 8.73x)
inv_txfm_add_16x8_identity_dct_0_8bpc_c: 4919.0 ( 1.00x)
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv: 522.9 ( 9.41x)
inv_txfm_add_16x8_identity_dct_1_8bpc_c: 4916.6 ( 1.00x)
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv: 523.0 ( 9.40x)
inv_txfm_add_16x8_identity_dct_2_8bpc_c: 4918.6 ( 1.00x)
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv: 523.0 ( 9.40x)
inv_txfm_add_16x8_identity_flipadst_0_8bpc_c: 5402.3 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv: 621.7 ( 8.69x)
inv_txfm_add_16x8_identity_flipadst_1_8bpc_c: 5402.1 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv: 621.3 ( 8.69x)
inv_txfm_add_16x8_identity_flipadst_2_8bpc_c: 5401.6 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv: 621.6 ( 8.69x)
inv_txfm_add_16x8_identity_identity_0_8bpc_c: 3436.1 ( 1.00x)
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv: 377.8 ( 9.09x)
inv_txfm_add_16x8_identity_identity_1_8bpc_c: 3436.3 ( 1.00x)
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv: 377.9 ( 9.09x)
inv_txfm_add_16x8_identity_identity_2_8bpc_c: 3436.1 ( 1.00x)
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv: 377.8 ( 9.09x) |
||
|
|
5ca7a025be |
riscv64/itx: Add 8x16 8bpc RVV transforms
inv_txfm_add_8x16_adst_adst_0_8bpc_c: 7682.3 ( 1.00x)
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv: 842.2 ( 9.12x)
inv_txfm_add_8x16_adst_adst_1_8bpc_c: 7682.0 ( 1.00x)
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv: 842.1 ( 9.12x)
inv_txfm_add_8x16_adst_adst_2_8bpc_c: 7681.6 ( 1.00x)
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv: 842.2 ( 9.12x)
inv_txfm_add_8x16_adst_dct_0_8bpc_c: 7309.0 ( 1.00x)
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv: 752.9 ( 9.71x)
inv_txfm_add_8x16_adst_dct_1_8bpc_c: 7317.4 ( 1.00x)
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv: 752.9 ( 9.72x)
inv_txfm_add_8x16_adst_dct_2_8bpc_c: 7323.6 ( 1.00x)
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv: 753.0 ( 9.73x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_c: 7686.5 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv: 846.7 ( 9.08x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_c: 7686.7 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv: 846.6 ( 9.08x)
inv_txfm_add_8x16_adst_flipadst_2_8bpc_c: 7688.0 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv: 846.6 ( 9.08x)
inv_txfm_add_8x16_adst_identity_0_8bpc_c: 5742.6 ( 1.00x)
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv: 608.6 ( 9.44x)
inv_txfm_add_8x16_adst_identity_1_8bpc_c: 5741.5 ( 1.00x)
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv: 608.7 ( 9.43x)
inv_txfm_add_8x16_adst_identity_2_8bpc_c: 5743.3 ( 1.00x)
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv: 608.4 ( 9.44x)
inv_txfm_add_8x16_dct_adst_0_8bpc_c: 7229.8 ( 1.00x)
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv: 756.3 ( 9.56x)
inv_txfm_add_8x16_dct_adst_1_8bpc_c: 7227.7 ( 1.00x)
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv: 756.3 ( 9.56x)
inv_txfm_add_8x16_dct_adst_2_8bpc_c: 7229.0 ( 1.00x)
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv: 756.3 ( 9.56x)
inv_txfm_add_8x16_dct_dct_0_8bpc_c: 839.3 ( 1.00x)
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv: 667.4 ( 1.26x)
inv_txfm_add_8x16_dct_dct_1_8bpc_c: 6842.7 ( 1.00x)
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv: 667.4 (10.25x)
inv_txfm_add_8x16_dct_dct_2_8bpc_c: 6845.3 ( 1.00x)
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv: 667.4 (10.26x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_c: 7222.3 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv: 760.4 ( 9.50x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_c: 7222.7 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv: 760.4 ( 9.50x)
inv_txfm_add_8x16_dct_flipadst_2_8bpc_c: 7222.2 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv: 760.4 ( 9.50x)
inv_txfm_add_8x16_dct_identity_0_8bpc_c: 5286.1 ( 1.00x)
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv: 521.4 (10.14x)
inv_txfm_add_8x16_dct_identity_1_8bpc_c: 5283.2 ( 1.00x)
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv: 521.4 (10.13x)
inv_txfm_add_8x16_dct_identity_2_8bpc_c: 5285.7 ( 1.00x)
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv: 521.3 (10.14x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_c: 7701.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv: 845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_c: 7702.5 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv: 845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_adst_2_8bpc_c: 7708.0 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv: 845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_c: 7331.0 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv: 758.9 ( 9.66x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_c: 7327.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv: 758.8 ( 9.66x)
inv_txfm_add_8x16_flipadst_dct_2_8bpc_c: 7326.8 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv: 758.7 ( 9.66x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_c: 7707.7 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv: 855.8 ( 9.01x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_c: 7708.1 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv: 855.5 ( 9.01x)
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_c: 7708.1 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv: 855.7 ( 9.01x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_c: 5764.4 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv: 611.8 ( 9.42x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_c: 5766.6 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv: 611.8 ( 9.43x)
inv_txfm_add_8x16_flipadst_identity_2_8bpc_c: 5763.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv: 611.8 ( 9.42x)
inv_txfm_add_8x16_identity_adst_0_8bpc_c: 5719.2 ( 1.00x)
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv: 574.0 ( 9.96x)
inv_txfm_add_8x16_identity_adst_1_8bpc_c: 5719.2 ( 1.00x)
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv: 574.0 ( 9.96x)
inv_txfm_add_8x16_identity_adst_2_8bpc_c: 5721.1 ( 1.00x)
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv: 574.0 ( 9.97x)
inv_txfm_add_8x16_identity_dct_0_8bpc_c: 5344.9 ( 1.00x)
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv: 484.9 (11.02x)
inv_txfm_add_8x16_identity_dct_1_8bpc_c: 5341.4 ( 1.00x)
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv: 484.2 (11.03x)
inv_txfm_add_8x16_identity_dct_2_8bpc_c: 5342.9 ( 1.00x)
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv: 484.9 (11.02x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_c: 5729.5 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv: 577.8 ( 9.92x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_c: 5731.1 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv: 578.3 ( 9.91x)
inv_txfm_add_8x16_identity_flipadst_2_8bpc_c: 5730.1 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv: 578.2 ( 9.91x)
inv_txfm_add_8x16_identity_identity_0_8bpc_c: 3779.3 ( 1.00x)
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv: 338.8 (11.15x)
inv_txfm_add_8x16_identity_identity_1_8bpc_c: 3779.2 ( 1.00x)
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv: 338.8 (11.16x)
inv_txfm_add_8x16_identity_identity_2_8bpc_c: 3779.3 ( 1.00x)
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv: 338.7 (11.16x) |
||
|
|
ce7cd2855b | riscv64/itx: Use registers above v15 in iadst_8 macro | ||
|
|
e4ed80bc5a |
riscv64/itx: Add 16x4 8bpc RVV transforms
inv_txfm_add_16x4_adst_adst_0_8bpc_c: 3132.7 ( 1.00x)
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv: 427.3 ( 7.33x)
inv_txfm_add_16x4_adst_adst_1_8bpc_c: 3120.8 ( 1.00x)
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv: 427.1 ( 7.31x)
inv_txfm_add_16x4_adst_adst_2_8bpc_c: 3119.4 ( 1.00x)
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv: 427.2 ( 7.30x)
inv_txfm_add_16x4_adst_dct_0_8bpc_c: 3063.0 ( 1.00x)
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv: 405.3 ( 7.56x)
inv_txfm_add_16x4_adst_dct_1_8bpc_c: 3063.4 ( 1.00x)
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv: 405.4 ( 7.56x)
inv_txfm_add_16x4_adst_dct_2_8bpc_c: 3062.7 ( 1.00x)
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv: 405.4 ( 7.56x)
inv_txfm_add_16x4_adst_flipadst_0_8bpc_c: 3166.7 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv: 430.9 ( 7.35x)
inv_txfm_add_16x4_adst_flipadst_1_8bpc_c: 3160.9 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv: 430.7 ( 7.34x)
inv_txfm_add_16x4_adst_flipadst_2_8bpc_c: 3160.7 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv: 430.2 ( 7.35x)
inv_txfm_add_16x4_adst_identity_0_8bpc_c: 2958.9 ( 1.00x)
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv: 365.2 ( 8.10x)
inv_txfm_add_16x4_adst_identity_1_8bpc_c: 2955.2 ( 1.00x)
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv: 365.2 ( 8.09x)
inv_txfm_add_16x4_adst_identity_2_8bpc_c: 2961.4 ( 1.00x)
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv: 365.2 ( 8.11x)
inv_txfm_add_16x4_dct_adst_0_8bpc_c: 2928.8 ( 1.00x)
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv: 378.5 ( 7.74x)
inv_txfm_add_16x4_dct_adst_1_8bpc_c: 2930.5 ( 1.00x)
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv: 378.6 ( 7.74x)
inv_txfm_add_16x4_dct_adst_2_8bpc_c: 2942.7 ( 1.00x)
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv: 378.6 ( 7.77x)
inv_txfm_add_16x4_dct_dct_0_8bpc_c: 438.8 ( 1.00x)
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv: 356.8 ( 1.23x)
inv_txfm_add_16x4_dct_dct_1_8bpc_c: 2871.7 ( 1.00x)
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv: 356.7 ( 8.05x)
inv_txfm_add_16x4_dct_dct_2_8bpc_c: 2862.9 ( 1.00x)
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv: 356.7 ( 8.03x)
inv_txfm_add_16x4_dct_flipadst_0_8bpc_c: 2965.8 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv: 380.6 ( 7.79x)
inv_txfm_add_16x4_dct_flipadst_1_8bpc_c: 2964.8 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv: 381.1 ( 7.78x)
inv_txfm_add_16x4_dct_flipadst_2_8bpc_c: 2966.1 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv: 381.0 ( 7.78x)
inv_txfm_add_16x4_dct_identity_0_8bpc_c: 2760.8 ( 1.00x)
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv: 310.7 ( 8.89x)
inv_txfm_add_16x4_dct_identity_1_8bpc_c: 2760.8 ( 1.00x)
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv: 310.7 ( 8.89x)
inv_txfm_add_16x4_dct_identity_2_8bpc_c: 2760.4 ( 1.00x)
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv: 310.7 ( 8.88x)
inv_txfm_add_16x4_flipadst_adst_0_8bpc_c: 3140.5 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv: 430.7 ( 7.29x)
inv_txfm_add_16x4_flipadst_adst_1_8bpc_c: 3138.3 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv: 430.7 ( 7.29x)
inv_txfm_add_16x4_flipadst_adst_2_8bpc_c: 3139.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv: 430.5 ( 7.29x)
inv_txfm_add_16x4_flipadst_dct_0_8bpc_c: 3060.7 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv: 408.9 ( 7.48x)
inv_txfm_add_16x4_flipadst_dct_1_8bpc_c: 3059.8 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv: 408.9 ( 7.48x)
inv_txfm_add_16x4_flipadst_dct_2_8bpc_c: 3063.6 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv: 408.9 ( 7.49x)
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_c: 3170.7 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv: 433.1 ( 7.32x)
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_c: 3169.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv: 433.0 ( 7.32x)
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_c: 3175.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv: 433.2 ( 7.33x)
inv_txfm_add_16x4_flipadst_identity_0_8bpc_c: 2954.0 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv: 362.1 ( 8.16x)
inv_txfm_add_16x4_flipadst_identity_1_8bpc_c: 2949.5 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv: 362.4 ( 8.14x)
inv_txfm_add_16x4_flipadst_identity_2_8bpc_c: 2950.6 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv: 362.5 ( 8.14x)
inv_txfm_add_16x4_identity_adst_0_8bpc_c: 1977.4 ( 1.00x)
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv: 296.6 ( 6.67x)
inv_txfm_add_16x4_identity_adst_1_8bpc_c: 1977.3 ( 1.00x)
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv: 296.6 ( 6.67x)
inv_txfm_add_16x4_identity_adst_2_8bpc_c: 1977.4 ( 1.00x)
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv: 296.6 ( 6.67x)
inv_txfm_add_16x4_identity_dct_0_8bpc_c: 1917.3 ( 1.00x)
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv: 276.2 ( 6.94x)
inv_txfm_add_16x4_identity_dct_1_8bpc_c: 1915.6 ( 1.00x)
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv: 276.2 ( 6.94x)
inv_txfm_add_16x4_identity_dct_2_8bpc_c: 1917.2 ( 1.00x)
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv: 276.1 ( 6.94x)
inv_txfm_add_16x4_identity_flipadst_0_8bpc_c: 2017.0 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv: 305.8 ( 6.60x)
inv_txfm_add_16x4_identity_flipadst_1_8bpc_c: 2017.4 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv: 305.7 ( 6.60x)
inv_txfm_add_16x4_identity_flipadst_2_8bpc_c: 2017.0 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv: 305.8 ( 6.60x)
inv_txfm_add_16x4_identity_identity_0_8bpc_c: 1803.4 ( 1.00x)
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv: 228.6 ( 7.89x)
inv_txfm_add_16x4_identity_identity_1_8bpc_c: 1803.6 ( 1.00x)
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv: 228.6 ( 7.89x)
inv_txfm_add_16x4_identity_identity_2_8bpc_c: 1803.0 ( 1.00x)
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv: 228.6 ( 7.89x) |
||
|
|
83423b3484 |
riscv64/itx: Add 4x16 8bpc RVV transforms
inv_txfm_add_4x16_adst_adst_0_8bpc_c: 3310.8 ( 1.00x)
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv: 429.3 ( 7.71x)
inv_txfm_add_4x16_adst_adst_1_8bpc_c: 3308.6 ( 1.00x)
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv: 429.3 ( 7.71x)
inv_txfm_add_4x16_adst_adst_2_8bpc_c: 3308.2 ( 1.00x)
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv: 429.3 ( 7.71x)
inv_txfm_add_4x16_adst_dct_0_8bpc_c: 3097.6 ( 1.00x)
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv: 381.5 ( 8.12x)
inv_txfm_add_4x16_adst_dct_1_8bpc_c: 3097.6 ( 1.00x)
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv: 381.0 ( 8.13x)
inv_txfm_add_4x16_adst_dct_2_8bpc_c: 3096.4 ( 1.00x)
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv: 381.5 ( 8.12x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_c: 3309.4 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv: 433.5 ( 7.64x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_c: 3306.9 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv: 433.4 ( 7.63x)
inv_txfm_add_4x16_adst_flipadst_2_8bpc_c: 3308.5 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv: 433.6 ( 7.63x)
inv_txfm_add_4x16_adst_identity_0_8bpc_c: 2330.0 ( 1.00x)
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv: 298.4 ( 7.81x)
inv_txfm_add_4x16_adst_identity_1_8bpc_c: 2329.4 ( 1.00x)
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv: 298.4 ( 7.81x)
inv_txfm_add_4x16_adst_identity_2_8bpc_c: 2329.7 ( 1.00x)
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv: 298.3 ( 7.81x)
inv_txfm_add_4x16_dct_adst_0_8bpc_c: 3186.5 ( 1.00x)
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv: 408.0 ( 7.81x)
inv_txfm_add_4x16_dct_adst_1_8bpc_c: 3190.3 ( 1.00x)
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv: 408.0 ( 7.82x)
inv_txfm_add_4x16_dct_adst_2_8bpc_c: 3184.9 ( 1.00x)
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv: 408.1 ( 7.80x)
inv_txfm_add_4x16_dct_dct_0_8bpc_c: 455.3 ( 1.00x)
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv: 360.0 ( 1.26x)
inv_txfm_add_4x16_dct_dct_1_8bpc_c: 2974.0 ( 1.00x)
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv: 359.9 ( 8.26x)
inv_txfm_add_4x16_dct_dct_2_8bpc_c: 2975.4 ( 1.00x)
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv: 359.9 ( 8.27x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_c: 3190.7 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv: 412.2 ( 7.74x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_c: 3190.9 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv: 412.3 ( 7.74x)
inv_txfm_add_4x16_dct_flipadst_2_8bpc_c: 3192.7 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv: 412.2 ( 7.75x)
inv_txfm_add_4x16_dct_identity_0_8bpc_c: 2208.3 ( 1.00x)
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv: 277.2 ( 7.97x)
inv_txfm_add_4x16_dct_identity_1_8bpc_c: 2206.6 ( 1.00x)
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv: 277.2 ( 7.96x)
inv_txfm_add_4x16_dct_identity_2_8bpc_c: 2205.9 ( 1.00x)
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv: 277.1 ( 7.96x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_c: 3329.2 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv: 429.7 ( 7.75x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_c: 3328.1 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv: 430.3 ( 7.73x)
inv_txfm_add_4x16_flipadst_adst_2_8bpc_c: 3331.1 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv: 430.3 ( 7.74x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_c: 3119.8 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv: 381.6 ( 8.18x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_c: 3119.7 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv: 381.6 ( 8.17x)
inv_txfm_add_4x16_flipadst_dct_2_8bpc_c: 3119.0 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv: 381.7 ( 8.17x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_c: 3329.8 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv: 433.7 ( 7.68x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_c: 3328.3 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv: 433.7 ( 7.67x)
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_c: 3328.2 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv: 433.6 ( 7.67x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_c: 2350.4 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv: 299.2 ( 7.86x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_c: 2353.5 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv: 299.1 ( 7.87x)
inv_txfm_add_4x16_flipadst_identity_2_8bpc_c: 2352.5 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv: 299.1 ( 7.87x)
inv_txfm_add_4x16_identity_adst_0_8bpc_c: 2967.8 ( 1.00x)
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv: 360.7 ( 8.23x)
inv_txfm_add_4x16_identity_adst_1_8bpc_c: 2965.5 ( 1.00x)
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv: 360.7 ( 8.22x)
inv_txfm_add_4x16_identity_adst_2_8bpc_c: 2964.5 ( 1.00x)
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv: 360.4 ( 8.23x)
inv_txfm_add_4x16_identity_dct_0_8bpc_c: 2758.0 ( 1.00x)
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv: 313.2 ( 8.81x)
inv_txfm_add_4x16_identity_dct_1_8bpc_c: 2757.3 ( 1.00x)
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv: 313.2 ( 8.80x)
inv_txfm_add_4x16_identity_dct_2_8bpc_c: 2758.4 ( 1.00x)
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv: 313.1 ( 8.81x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_c: 2968.3 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv: 364.6 ( 8.14x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_c: 2965.2 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv: 364.6 ( 8.13x)
inv_txfm_add_4x16_identity_flipadst_2_8bpc_c: 2968.5 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv: 364.6 ( 8.14x)
inv_txfm_add_4x16_identity_identity_0_8bpc_c: 1985.7 ( 1.00x)
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv: 229.3 ( 8.66x)
inv_txfm_add_4x16_identity_identity_1_8bpc_c: 1985.4 ( 1.00x)
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv: 229.6 ( 8.65x)
inv_txfm_add_4x16_identity_identity_2_8bpc_c: 1985.7 ( 1.00x)
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv: 229.4 ( 8.66x) |
||
|
|
40d5b50552 | riscv64/itx: Use registers above v15 in iadst_4 macro | ||
|
|
27e5e2629c |
riscv64/itx: Add 8x4 8bpc RVV transforms
inv_txfm_add_8x4_adst_adst_0_8bpc_c: 1600.6 ( 1.00x)
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv: 199.2 ( 8.03x)
inv_txfm_add_8x4_adst_adst_1_8bpc_c: 1602.3 ( 1.00x)
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv: 199.2 ( 8.04x)
inv_txfm_add_8x4_adst_dct_0_8bpc_c: 1551.1 ( 1.00x)
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv: 193.6 ( 8.01x)
inv_txfm_add_8x4_adst_dct_1_8bpc_c: 1550.7 ( 1.00x)
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv: 193.6 ( 8.01x)
inv_txfm_add_8x4_adst_flipadst_0_8bpc_c: 1609.9 ( 1.00x)
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv: 200.7 ( 8.02x)
inv_txfm_add_8x4_adst_flipadst_1_8bpc_c: 1608.4 ( 1.00x)
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv: 200.7 ( 8.01x)
inv_txfm_add_8x4_adst_identity_0_8bpc_c: 1518.1 ( 1.00x)
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv: 168.6 ( 9.00x)
inv_txfm_add_8x4_adst_identity_1_8bpc_c: 1518.0 ( 1.00x)
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv: 168.6 ( 9.00x)
inv_txfm_add_8x4_dct_adst_0_8bpc_c: 1474.6 ( 1.00x)
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv: 176.1 ( 8.37x)
inv_txfm_add_8x4_dct_adst_1_8bpc_c: 1474.4 ( 1.00x)
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv: 176.1 ( 8.37x)
inv_txfm_add_8x4_dct_dct_0_8bpc_c: 256.5 ( 1.00x)
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv: 170.5 ( 1.50x)
inv_txfm_add_8x4_dct_dct_1_8bpc_c: 1450.1 ( 1.00x)
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv: 170.5 ( 8.50x)
inv_txfm_add_8x4_dct_flipadst_0_8bpc_c: 1489.6 ( 1.00x)
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv: 177.5 ( 8.39x)
inv_txfm_add_8x4_dct_flipadst_1_8bpc_c: 1488.6 ( 1.00x)
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv: 177.5 ( 8.38x)
inv_txfm_add_8x4_dct_identity_0_8bpc_c: 1396.3 ( 1.00x)
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv: 145.5 ( 9.60x)
inv_txfm_add_8x4_dct_identity_1_8bpc_c: 1395.7 ( 1.00x)
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv: 145.5 ( 9.59x)
inv_txfm_add_8x4_flipadst_adst_0_8bpc_c: 1596.5 ( 1.00x)
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv: 200.5 ( 7.96x)
inv_txfm_add_8x4_flipadst_adst_1_8bpc_c: 1596.0 ( 1.00x)
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv: 200.5 ( 7.96x)
inv_txfm_add_8x4_flipadst_dct_0_8bpc_c: 1554.8 ( 1.00x)
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv: 194.8 ( 7.98x)
inv_txfm_add_8x4_flipadst_dct_1_8bpc_c: 1556.5 ( 1.00x)
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv: 194.8 ( 7.99x)
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_c: 1613.3 ( 1.00x)
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv: 206.7 ( 7.80x)
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_c: 1612.1 ( 1.00x)
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv: 206.7 ( 7.80x)
inv_txfm_add_8x4_flipadst_identity_0_8bpc_c: 1519.8 ( 1.00x)
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv: 169.8 ( 8.95x)
inv_txfm_add_8x4_flipadst_identity_1_8bpc_c: 1520.7 ( 1.00x)
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv: 169.8 ( 8.95x)
inv_txfm_add_8x4_identity_adst_0_8bpc_c: 1101.0 ( 1.00x)
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv: 124.8 ( 8.82x)
inv_txfm_add_8x4_identity_adst_1_8bpc_c: 1101.0 ( 1.00x)
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv: 124.8 ( 8.82x)
inv_txfm_add_8x4_identity_dct_0_8bpc_c: 1058.4 ( 1.00x)
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv: 121.2 ( 8.73x)
inv_txfm_add_8x4_identity_dct_1_8bpc_c: 1058.3 ( 1.00x)
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv: 121.2 ( 8.73x)
inv_txfm_add_8x4_identity_flipadst_0_8bpc_c: 1113.2 ( 1.00x)
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv: 126.2 ( 8.82x)
inv_txfm_add_8x4_identity_flipadst_1_8bpc_c: 1113.4 ( 1.00x)
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv: 126.4 ( 8.81x)
inv_txfm_add_8x4_identity_identity_0_8bpc_c: 1010.6 ( 1.00x)
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv: 94.2 (10.73x)
inv_txfm_add_8x4_identity_identity_1_8bpc_c: 1010.4 ( 1.00x)
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv: 94.2 (10.72x) |
||
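Each benchmark entry in these commit messages pairs a timing with a factor in parentheses, which appears to be the C reference timing divided by the timing of the listed implementation. A minimal Python sketch for re-deriving that factor from a pasted log follows; `parse_bench` and `speedup` are hypothetical helper names, not part of dav1d or checkasm, and the sample lines are copied from the 8x4 results above. Because the log prints rounded timings, a recomputed ratio can differ from the printed one in the last digit.

```python
import re

# Matches entries such as "inv_txfm_add_8x4_adst_adst_0_8bpc_rvv: 199.2 ( 8.03x)".
ENTRY = re.compile(r"(\w+)_(c|rvv):\s+([\d.]+)\s+\(\s*([\d.]+)x\)")


def parse_bench(text):
    """Group timings by function name: {name: {"c": t, "rvv": t, "reported": x}}."""
    results = {}
    for name, impl, timing, ratio in ENTRY.findall(text):
        entry = results.setdefault(name, {})
        entry[impl] = float(timing)
        if impl == "rvv":
            entry["reported"] = float(ratio)  # factor as printed in the log
    return results


def speedup(entry):
    """C timing divided by RVV timing, recomputed from the rounded values."""
    return entry["c"] / entry["rvv"]


SAMPLE = (
    "inv_txfm_add_8x4_adst_adst_0_8bpc_c: 1600.6 ( 1.00x) "
    "inv_txfm_add_8x4_adst_adst_0_8bpc_rvv: 199.2 ( 8.03x) "
    "inv_txfm_add_8x4_dct_dct_0_8bpc_c: 256.5 ( 1.00x) "
    "inv_txfm_add_8x4_dct_dct_0_8bpc_rvv: 170.5 ( 1.50x)"
)

if __name__ == "__main__":
    for name, entry in parse_bench(SAMPLE).items():
        print(f"{name}: recomputed {speedup(entry):.2f}x, reported {entry['reported']:.2f}x")
```

The same pattern also matches the one-entry-per-line form, since it does not depend on line breaks.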
|
|
adba0c6ff8 |
riscv64/itx: Add 4x8 8bpc RVV transforms
inv_txfm_add_4x8_adst_adst_0_8bpc_c: 1619.6 ( 1.00x)
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv: 198.6 ( 8.16x)
inv_txfm_add_4x8_adst_adst_1_8bpc_c: 1621.5 ( 1.00x)
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv: 198.5 ( 8.17x)
inv_txfm_add_4x8_adst_dct_0_8bpc_c: 1496.1 ( 1.00x)
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv: 175.1 ( 8.54x)
inv_txfm_add_4x8_adst_dct_1_8bpc_c: 1496.3 ( 1.00x)
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv: 175.1 ( 8.55x)
inv_txfm_add_4x8_adst_flipadst_0_8bpc_c: 1624.8 ( 1.00x)
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv: 200.6 ( 8.10x)
inv_txfm_add_4x8_adst_flipadst_1_8bpc_c: 1623.9 ( 1.00x)
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv: 200.6 ( 8.10x)
inv_txfm_add_4x8_adst_identity_0_8bpc_c: 1132.3 ( 1.00x)
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv: 122.6 ( 9.24x)
inv_txfm_add_4x8_adst_identity_1_8bpc_c: 1132.2 ( 1.00x)
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv: 122.6 ( 9.23x)
inv_txfm_add_4x8_dct_adst_0_8bpc_c: 1561.5 ( 1.00x)
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv: 192.3 ( 8.12x)
inv_txfm_add_4x8_dct_adst_1_8bpc_c: 1563.9 ( 1.00x)
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv: 192.3 ( 8.13x)
inv_txfm_add_4x8_dct_dct_0_8bpc_c: 260.9 ( 1.00x)
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv: 168.9 ( 1.55x)
inv_txfm_add_4x8_dct_dct_1_8bpc_c: 1443.6 ( 1.00x)
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv: 168.9 ( 8.55x)
inv_txfm_add_4x8_dct_flipadst_0_8bpc_c: 1567.5 ( 1.00x)
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv: 194.3 ( 8.07x)
inv_txfm_add_4x8_dct_flipadst_1_8bpc_c: 1565.8 ( 1.00x)
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv: 194.3 ( 8.06x)
inv_txfm_add_4x8_dct_identity_0_8bpc_c: 1073.8 ( 1.00x)
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv: 116.4 ( 9.23x)
inv_txfm_add_4x8_dct_identity_1_8bpc_c: 1074.4 ( 1.00x)
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv: 116.3 ( 9.23x)
inv_txfm_add_4x8_flipadst_adst_0_8bpc_c: 1631.1 ( 1.00x)
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv: 200.6 ( 8.13x)
inv_txfm_add_4x8_flipadst_adst_1_8bpc_c: 1631.1 ( 1.00x)
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv: 200.6 ( 8.13x)
inv_txfm_add_4x8_flipadst_dct_0_8bpc_c: 1507.0 ( 1.00x)
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv: 177.1 ( 8.51x)
inv_txfm_add_4x8_flipadst_dct_1_8bpc_c: 1506.3 ( 1.00x)
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv: 177.1 ( 8.50x)
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c: 1633.9 ( 1.00x)
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv: 202.5 ( 8.07x)
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c: 1633.7 ( 1.00x)
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv: 202.5 ( 8.07x)
inv_txfm_add_4x8_flipadst_identity_0_8bpc_c: 1142.7 ( 1.00x)
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv: 123.2 ( 9.27x)
inv_txfm_add_4x8_flipadst_identity_1_8bpc_c: 1142.6 ( 1.00x)
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv: 123.2 ( 9.27x)
inv_txfm_add_4x8_identity_adst_0_8bpc_c: 1442.0 ( 1.00x)
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv: 168.9 ( 8.54x)
inv_txfm_add_4x8_identity_adst_1_8bpc_c: 1442.8 ( 1.00x)
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv: 168.9 ( 8.54x)
inv_txfm_add_4x8_identity_dct_0_8bpc_c: 1322.7 ( 1.00x)
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv: 146.7 ( 9.02x)
inv_txfm_add_4x8_identity_dct_1_8bpc_c: 1320.9 ( 1.00x)
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv: 146.7 ( 9.00x)
inv_txfm_add_4x8_identity_flipadst_0_8bpc_c: 1451.0 ( 1.00x)
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv: 171.0 ( 8.48x)
inv_txfm_add_4x8_identity_flipadst_1_8bpc_c: 1450.0 ( 1.00x)
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv: 171.0 ( 8.48x)
inv_txfm_add_4x8_identity_identity_0_8bpc_c: 977.1 ( 1.00x)
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv: 93.9 (10.41x)
inv_txfm_add_4x8_identity_identity_1_8bpc_c: 976.9 ( 1.00x)
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv: 93.9 (10.41x) |
||
|
|
45f993c3ba |
riscv64/itx: Add 4-point 8bpc RVV wide transforms
The 4-point ADST transform in AV1 is a Type-VII DST, and the 8bpc implementation uses 32-bit additions, so it cannot be made grouping-agnostic. |
||
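As background for the Type-VII DST remark, the DST-VII basis of size N is commonly written as sqrt(4/(2N+1)) * sin(pi*(2k+1)*(n+1)/(2N+1)), so for N = 4 every angle is a multiple of pi/9. The Python sketch below is illustrative only and is not taken from dav1d. It assumes 12-bit fixed-point constants with an extra 2*sqrt(2)/3 scale (chosen here because it yields 1321, 2482, 3344 and 3803, which I believe match the sinpi constants used by AV1 implementations) and assumes int16 input coefficients. Under those assumptions, a four-term sum of products no longer fits in 16 bits, which is consistent with the need for 32-bit additions; the link to register grouping is presumably that widening operations change the register group size, but that is my reading, not something the commit message states.

```python
import math

N = 4  # transform size; 2N + 1 = 9 gives the pi/9 angles


def dst7_basis(k, n, size=N):
    """Orthonormal DST-VII basis value for frequency index k at sample n."""
    scale = math.sqrt(4.0 / (2 * size + 1))
    return scale * math.sin(math.pi * (2 * k + 1) * (n + 1) / (2 * size + 1))


def sinpi_q12(j):
    """sin(j*pi/9), scaled by 2*sqrt(2)/3, in 12-bit fixed point (assumed scaling)."""
    return round(4096 * (2 * math.sqrt(2) / 3) * math.sin(j * math.pi / 9))


SINPI = [sinpi_q12(j) for j in range(1, 5)]

# Worst-case magnitude of a sum of four products with int16 inputs (assumption).
WORST = sum(SINPI) * 32768

if __name__ == "__main__":
    print("constants:", SINPI)
    print("fits in int16:", WORST < 2**15, "fits in int32:", WORST < 2**31)
```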
|
|
e0d4655ff3 | riscv64/itx: Parameterize LMUL in iadst_4 macro | ||
|
|
c5b12bd94e | riscv64/itx: Use m2 register spacing in iadst_4 macro | ||
|
|
7080c09057 |
riscv64/itx: Reuse 8x8 epilog, saves 306 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_rvv
with inv_txfm_identity_add_8x8_rvv; only *8x8_identity* functions are
modified:
Old New Delta
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv: 268.2 268.2 0.00%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv: 268.3 268.2 -0.04%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv: 225.1 228.3 1.42%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv: 225.1 228.2 1.37%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv: 270.6 270.2 -0.15%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv: 270.6 270.3 -0.11%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv: 146.1 146.0 -0.07%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv: 146.1 146.1 0.00%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv: 360.0 359.8 -0.06%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv: 360.0 359.6 -0.11%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv: 74.7 76.4 2.28%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv: 316.9 321.6 1.48%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv: 362.0 361.8 -0.06%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv: 361.9 361.9 0.00%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv: 240.0 240.6 0.25%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv: 240.0 240.6 0.25%
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv: 403.0 403.3 0.07%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv: 403.0 403.4 0.10%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv: 359.7 359.7 0.00%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv: 359.4 359.7 0.08%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv: 404.6 405.1 0.12%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv: 404.6 405.3 0.17%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv: 283.4 282.8 -0.21%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv: 283.4 282.8 -0.21%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv: 403.9 404.6 0.17%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv: 404.0 404.6 0.15%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv: 361.4 361.5 0.03%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv: 361.3 361.5 0.06%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv: 406.2 406.1 -0.02%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv: 405.7 406.4 0.17%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv: 284.8 287.5 0.95%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv: 284.8 287.6 0.98%
|
||
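The Delta column in the Old/New table of the commit above appears to be the relative change from Old to New, expressed as a percentage. The sketch below reproduces three of the listed values under that assumption; `delta_percent` is a hypothetical helper written for this example and is not part of dav1d or checkasm.

```python
def delta_percent(old, new):
    """Relative change from old to new, in percent, rounded to two decimals."""
    return round((new - old) / old * 100.0, 2)

# (Old, New) timings copied from the identity_dct_0, dct_dct_0 and
# identity_adst_1 rows of the table above.
print(delta_percent(225.1, 228.3))  # 1.42
print(delta_percent(74.7, 76.4))    # 2.28
print(delta_percent(268.3, 268.2))  # -0.04
```

A positive delta means the new code measured slower, a negative delta means it measured faster.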
|
|
9315185b73 | riscv: Add asm.S macro for decorating local symbols | ||
|
|
090b959c77 |
arm64/itx: Reuse 8x8 epilog, saves 264 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_neon
with inv_txfm_identity_add_8x8_neon; only *8x8_identity* functions are
modified:
Old New Delta
inv_txfm_add_8x8_identity_adst_0_8bpc_neon: 113.5 117.3 3.35%
inv_txfm_add_8x8_identity_adst_1_8bpc_neon: 113.5 117.3 3.35%
inv_txfm_add_8x8_identity_dct_0_8bpc_neon: 98.2 96.0 -2.24%
inv_txfm_add_8x8_identity_dct_1_8bpc_neon: 98.3 96.0 -2.34%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_neon: 113.3 112.8 -0.44%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_neon: 113.4 112.8 -0.53%
inv_txfm_add_8x8_identity_identity_0_8bpc_neon: 44.3 45.0 1.58%
inv_txfm_add_8x8_identity_identity_1_8bpc_neon: 44.3 45.0 1.58%
inv_txfm_add_8x8_dct_adst_0_8bpc_neon: 190.8 190.3 -0.26%
inv_txfm_add_8x8_dct_adst_1_8bpc_neon: 190.8 190.3 -0.26%
inv_txfm_add_8x8_dct_dct_0_8bpc_neon: 31.3 31.3 0.00%
inv_txfm_add_8x8_dct_dct_1_8bpc_neon: 166.8 167.0 0.12%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_neon: 190.5 190.3 -0.11%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_neon: 190.5 190.3 -0.11%
inv_txfm_add_8x8_dct_identity_0_8bpc_neon: 118.8 118.3 -0.42%
inv_txfm_add_8x8_dct_identity_1_8bpc_neon: 118.8 118.3 -0.42%
inv_txfm_add_8x8_adst_adst_0_8bpc_neon: 206.8 206.5 -0.15%
inv_txfm_add_8x8_adst_adst_1_8bpc_neon: 206.8 206.5 -0.15%
inv_txfm_add_8x8_adst_dct_0_8bpc_neon: 187.7 188.3 0.32%
inv_txfm_add_8x8_adst_dct_1_8bpc_neon: 187.5 188.3 0.42%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_neon: 207.3 207.3 0.00%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_neon: 207.3 207.3 0.00%
inv_txfm_add_8x8_adst_identity_0_8bpc_neon: 136.7 136.5 -0.15%
inv_txfm_add_8x8_adst_identity_1_8bpc_neon: 136.3 136.5 0.15%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_neon: 206.5 206.5 0.00%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_neon: 206.5 206.5 0.00%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_neon: 188.5 188.3 -0.11%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_neon: 188.5 188.3 -0.11%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_neon: 207.5 206.9 -0.29%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_neon: 207.5 206.5 -0.48%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_neon: 138.2 138.3 0.07%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_neon: 137.5 138.3 0.58%
|
||
|
|
e8fbfd999b |
arm32/itx: Reuse 8x8 epilog, saves 220 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_neon with inv_txfm_identity_add_8x8_neon; only *8x8_identity* functions are modified. |
||
|
|
50d63f9a6e |
arm32/itx: Only set r4 when needed, saves 48 bytes
Avoid setting r4 when the horizontal transform is the identity in
{4,8}x16 and 16x4 rectangular transforms.
|
||
|
|
b56b02a914 |
arm64/itx: Only set x4 when needed, saves 64 bytes
Avoid setting x4 when the horizontal transform is the identity in
{4,8}x16 and 16x{4,8} rectangular transforms.
|
||
|
|
97cc6cee81 | riscv64/itx: Add missing tail, mask agnostic flags | ||
|
|
7b15ca1375 |
riscv64/itx: Add 16-point 8bpc RVV flipadst transform
inv_txfm_add_16x16_adst_flipadst_0_8bpc_c: 15272.2 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv: 1824.4 ( 8.37x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_c: 15261.2 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv: 1824.5 ( 8.36x)
inv_txfm_add_16x16_adst_flipadst_2_8bpc_c: 15260.0 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv: 1824.5 ( 8.36x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_c: 14497.2 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv: 1637.3 ( 8.85x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_c: 14490.5 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv: 1637.3 ( 8.85x)
inv_txfm_add_16x16_dct_flipadst_2_8bpc_c: 14486.4 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv: 1637.3 ( 8.85x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_c: 15307.7 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv: 1808.0 ( 8.47x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_c: 15341.0 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv: 1808.1 ( 8.48x)
inv_txfm_add_16x16_flipadst_adst_2_8bpc_c: 15333.5 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv: 1808.1 ( 8.48x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_c: 14530.0 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv: 1636.4 ( 8.88x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_c: 14510.3 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv: 1636.3 ( 8.87x)
inv_txfm_add_16x16_flipadst_dct_2_8bpc_c: 14504.7 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv: 1636.3 ( 8.86x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_c: 15315.5 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv: 1823.5 ( 8.40x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_c: 15324.1 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv: 1823.3 ( 8.40x)
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_c: 15315.6 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv: 1823.5 ( 8.40x) |
||
|
|
b981bc9c3e | riscv64/itx: Convert inv_adst_e16_x16_rvv to macro | ||
|
|
2685b40920 |
riscv64/itx: Add 16-point 8bpc RVV adst transform
inv_txfm_add_16x16_adst_adst_0_8bpc_c: 15364.4 ( 1.00x)
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv: 1814.1 ( 8.47x)
inv_txfm_add_16x16_adst_adst_1_8bpc_c: 15363.7 ( 1.00x)
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv: 1814.5 ( 8.47x)
inv_txfm_add_16x16_adst_adst_2_8bpc_c: 15368.9 ( 1.00x)
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv: 1814.5 ( 8.47x)
inv_txfm_add_16x16_adst_dct_0_8bpc_c: 14560.0 ( 1.00x)
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv: 1644.4 ( 8.85x)
inv_txfm_add_16x16_adst_dct_1_8bpc_c: 14578.9 ( 1.00x)
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv: 1644.5 ( 8.87x)
inv_txfm_add_16x16_adst_dct_2_8bpc_c: 14575.0 ( 1.00x)
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv: 1644.6 ( 8.86x)
inv_txfm_add_16x16_dct_adst_0_8bpc_c: 14550.7 ( 1.00x)
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv: 1622.7 ( 8.97x)
inv_txfm_add_16x16_dct_adst_1_8bpc_c: 14556.0 ( 1.00x)
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv: 1622.6 ( 8.97x)
inv_txfm_add_16x16_dct_adst_2_8bpc_c: 14543.3 ( 1.00x)
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv: 1622.6 ( 8.96x) |
||
|
|
72dba22e66 |
riscv64/itx: Add 4x4 8bpc RVV wht_wht transform
inv_txfm_add_4x4_wht_wht_0_8bpc_c: 265.6 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv: 66.9 ( 3.97x)
inv_txfm_add_4x4_wht_wht_1_8bpc_c: 265.5 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv: 66.9 ( 3.97x) |
||
|
|
cc29b2314c |
riscv64/itx: Add 16x16 8bpc dct_identity and identity_dct
inv_txfm_add_16x16_dct_identity_0_8bpc_c: 10593.3 ( 1.00x)
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv: 1163.3 ( 9.11x)
inv_txfm_add_16x16_dct_identity_1_8bpc_c: 10584.9 ( 1.00x)
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv: 1163.3 ( 9.10x)
inv_txfm_add_16x16_dct_identity_2_8bpc_c: 10590.3 ( 1.00x)
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv: 1163.6 ( 9.10x)
inv_txfm_add_16x16_identity_dct_0_8bpc_c: 9945.9 ( 1.00x)
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv: 1150.2 ( 8.65x)
inv_txfm_add_16x16_identity_dct_1_8bpc_c: 9937.0 ( 1.00x)
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv: 1150.3 ( 8.64x)
inv_txfm_add_16x16_identity_dct_2_8bpc_c: 9934.6 ( 1.00x)
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv: 1150.4 ( 8.64x) |
||
|
|
8e82093ebb |
riscv64/itx: Add 16-point 8bpc RVV dct transform
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1574.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 1450.3 ( 1.09x)
inv_txfm_add_16x16_dct_dct_1_8bpc_c: 13614.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv: 1450.5 ( 9.39x)
inv_txfm_add_16x16_dct_dct_2_8bpc_c: 13613.2 ( 1.00x)
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv: 1450.4 ( 9.39x) |
||
|
|
9976976ec8 | riscv64/itx: Use registers above v15 in dct macros | ||
|
|
57d5729cf8 | riscv64/itx: Convert inv_dct_e16_x8_rvv to macro | ||
|
|
c0ccc323d6 | riscv64/itx: Convert inv_txfm_horz_16x8_rvv to macro | ||
|
|
64c9d16049 |
riscv64/itx: Add 16-point 8bpc RVV idtx transform
inv_txfm_add_16x16_identity_identity_0_8bpc_c: 6933.8 ( 1.00x)
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv: 866.0 ( 8.01x)
inv_txfm_add_16x16_identity_identity_1_8bpc_c: 6933.4 ( 1.00x)
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv: 866.1 ( 8.01x)
inv_txfm_add_16x16_identity_identity_2_8bpc_c: 6934.2 ( 1.00x)
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv: 866.1 ( 8.01x) |
||
|
|
08051a3b50 | arm64/itx: Set x8 outside .irp loop | ||
|
|
314423b3d9 | arm64/itx: Set x8 only once in inv_txfm_add_16x16_neon | ||
|
|
a6878be7e0 | Alphabetize architecture defines and usage | ||
|
|
219befefeb |
riscv64/itx: Add 8-point 8bpc RVV flipadst transform
inv_txfm_add_8x8_adst_flipadst_0_8bpc_c: 3323.1 ( 1.00x)
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv: 402.1 ( 8.26x)
inv_txfm_add_8x8_adst_flipadst_1_8bpc_c: 3322.8 ( 1.00x)
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv: 402.2 ( 8.26x)
inv_txfm_add_8x8_dct_flipadst_0_8bpc_c: 3074.3 ( 1.00x)
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv: 359.5 ( 8.55x)
inv_txfm_add_8x8_dct_flipadst_1_8bpc_c: 3074.4 ( 1.00x)
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv: 359.4 ( 8.56x)
inv_txfm_add_8x8_flipadst_adst_0_8bpc_c: 3314.8 ( 1.00x)
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv: 403.3 ( 8.22x)
inv_txfm_add_8x8_flipadst_adst_1_8bpc_c: 3315.3 ( 1.00x)
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv: 403.3 ( 8.22x)
inv_txfm_add_8x8_flipadst_dct_0_8bpc_c: 3071.7 ( 1.00x)
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv: 359.1 ( 8.55x)
inv_txfm_add_8x8_flipadst_dct_1_8bpc_c: 3072.5 ( 1.00x)
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv: 359.3 ( 8.55x)
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_c: 3325.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv: 405.2 ( 8.21x)
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_c: 3325.0 ( 1.00x)
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv: 405.2 ( 8.21x)
inv_txfm_add_8x8_flipadst_identity_0_8bpc_c: 2356.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv: 283.7 ( 8.31x)
inv_txfm_add_8x8_flipadst_identity_1_8bpc_c: 2356.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv: 283.5 ( 8.31x)
inv_txfm_add_8x8_identity_flipadst_0_8bpc_c: 2332.8 ( 1.00x)
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv: 268.0 ( 8.71x)
inv_txfm_add_8x8_identity_flipadst_1_8bpc_c: 2331.5 ( 1.00x)
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv: 268.0 ( 8.70x) |
||
|
|
b5747aee1e | riscv64/itx: Convert inv_adst_e16_x8_rvv to macro | ||
|
|
64f9fd0239 |
riscv64/itx: Add 8-point 8bpc RVV adst transform
inv_txfm_add_8x8_adst_adst_0_8bpc_c: 3338.5 ( 1.00x)
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv: 400.4 ( 8.34x)
inv_txfm_add_8x8_adst_adst_1_8bpc_c: 3338.1 ( 1.00x)
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv: 399.8 ( 8.35x)
inv_txfm_add_8x8_adst_dct_0_8bpc_c: 3112.5 ( 1.00x)
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv: 357.2 ( 8.71x)
inv_txfm_add_8x8_adst_dct_1_8bpc_c: 3111.4 ( 1.00x)
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv: 357.0 ( 8.71x)
inv_txfm_add_8x8_adst_identity_0_8bpc_c: 2375.0 ( 1.00x)
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv: 281.0 ( 8.45x)
inv_txfm_add_8x8_adst_identity_1_8bpc_c: 2375.6 ( 1.00x)
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv: 281.0 ( 8.45x)
inv_txfm_add_8x8_dct_adst_0_8bpc_c: 3113.3 ( 1.00x)
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv: 357.2 ( 8.72x)
inv_txfm_add_8x8_dct_adst_1_8bpc_c: 3112.1 ( 1.00x)
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv: 357.2 ( 8.71x)
inv_txfm_add_8x8_identity_adst_0_8bpc_c: 2346.7 ( 1.00x)
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv: 265.6 ( 8.83x)
inv_txfm_add_8x8_identity_adst_1_8bpc_c: 2348.3 ( 1.00x)
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv: 265.8 ( 8.84x) |