100 Commits
Author SHA1 Message Date
Nathan E. Egge beda1b3cda riscv64/itx: Match stack allocation of 16x16 itx 2026-06-07 02:52:33 -04:00
Nathan E. Egge 5e8c380e4b riscv64/mc16: Keep blend_v RVV operations in 16-bits
Kendryte K230                Before             After         Delta

blend_v_w2_16bpc_c:       240.9 ( 1.00x)    240.9 ( 1.00x)    0.00%
blend_v_w2_16bpc_rvv:     149.7 ( 1.61x)    155.4 ( 1.55x)    3.81%
blend_v_w4_16bpc_c:      1072.4 ( 1.00x)   1072.5 ( 1.00x)    0.01%
blend_v_w4_16bpc_rvv:     307.2 ( 3.49x)    299.9 ( 3.58x)   -2.38%
blend_v_w8_16bpc_c:      2004.7 ( 1.00x)   2010.2 ( 1.00x)    0.27%
blend_v_w8_16bpc_rvv:     436.1 ( 4.60x)    381.0 ( 5.28x)  -12.63%
blend_v_w16_16bpc_c:     3859.4 ( 1.00x)   3853.7 ( 1.00x)   -0.15%
blend_v_w16_16bpc_rvv:    761.1 ( 5.07x)    554.0 ( 6.96x)  -27.21%
blend_v_w32_16bpc_c:     7509.7 ( 1.00x)   7505.3 ( 1.00x)   -0.06%
blend_v_w32_16bpc_rvv:   1427.1 ( 5.26x)   1005.5 ( 7.46x)  -29.54%

SpacemiT K1                  Before             After         Delta

blend_v_w2_16bpc_c:       220.1 ( 1.00x)    222.0 ( 1.00x)    0.86%
blend_v_w2_16bpc_rvv:     146.6 ( 1.50x)    151.1 ( 1.47x)    3.07%
blend_v_w4_16bpc_c:       968.3 ( 1.00x)    969.6 ( 1.00x)    0.13%
blend_v_w4_16bpc_rvv:     281.2 ( 3.44x)    290.2 ( 3.34x)    3.20%
blend_v_w8_16bpc_c:      1809.5 ( 1.00x)   1812.1 ( 1.00x)    0.14%
blend_v_w8_16bpc_rvv:     374.2 ( 4.84x)    375.3 ( 4.83x)    0.29%
blend_v_w16_16bpc_c:     3479.7 ( 1.00x)   3480.9 ( 1.00x)    0.03%
blend_v_w16_16bpc_rvv:    521.5 ( 6.67x)    465.9 ( 7.47x)  -10.66%
blend_v_w32_16bpc_c:     6767.9 ( 1.00x)   6773.7 ( 1.00x)    0.09%
blend_v_w32_16bpc_rvv:    852.1 ( 7.94x)    727.4 ( 9.31x)  -14.63%

Blackhole p100a              Before             After         Delta

blend_v_w2_16bpc_c:       205.6 ( 1.00x)    206.0 ( 1.00x)    0.19%
blend_v_w2_16bpc_rvv:     176.5 ( 1.16x)    143.6 ( 1.44x)  -18.64%
blend_v_w4_16bpc_c:       901.0 ( 1.00x)    891.8 ( 1.00x)   -1.02%
blend_v_w4_16bpc_rvv:     298.8 ( 3.02x)    235.2 ( 3.79x)  -21.29%
blend_v_w8_16bpc_c:      1663.3 ( 1.00x)   1656.5 ( 1.00x)   -0.41%
blend_v_w8_16bpc_rvv:     300.1 ( 5.54x)    236.4 ( 7.01x)  -21.23%
blend_v_w16_16bpc_c:     3192.1 ( 1.00x)   3182.3 ( 1.00x)   -0.31%
blend_v_w16_16bpc_rvv:    349.2 ( 9.14x)    311.4 (10.22x)  -10.82%
blend_v_w32_16bpc_c:     6259.2 ( 1.00x)   6257.8 ( 1.00x)   -0.02%
blend_v_w32_16bpc_rvv:    350.2 (17.88x)    321.8 (19.44x)   -8.11%
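
For readers checking the tables, the Delta column and the parenthesised speedups appear to follow from simple ratios of the timings. The sketch below assumes that reading; the function names are hypothetical and are not part of dav1d or checkasm.

```c
#include <assert.h>

/* Relative change in percent; a negative value means "After" is faster. */
double delta_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Speedup of an optimized routine over the C reference, e.g. "( 7.46x)". */
double speedup(double c_time, double opt_time)
{
    assert(opt_time > 0.0);
    return c_time / opt_time;
}
```

With the Kendryte K230 blend_v_w32_16bpc row, delta_percent(1427.1, 1005.5) is about -29.54 and speedup(7505.3, 1005.5) is about 7.46, which matches the table.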
2025-12-30 13:47:49 +00:00
Nathan E. Egge d2fa9466be riscv64/mc16: Keep blend RVV operations in 16-bits
Kendryte K230                Before             After         Delta

blend_w4_16bpc_c:         227.0 ( 1.00x)    227.1 ( 1.00x)    0.04%
blend_w4_16bpc_rvv:        71.1 ( 3.19x)     73.2 ( 3.10x)    2.95%
blend_w8_16bpc_c:         662.5 ( 1.00x)    662.7 ( 1.00x)    0.03%
blend_w8_16bpc_rvv:       132.4 ( 5.00x)    115.0 ( 5.76x)  -13.14%
blend_w16_16bpc_c:       2559.3 ( 1.00x)   2559.8 ( 1.00x)    0.02%
blend_w16_16bpc_rvv:      416.1 ( 6.15x)    326.7 ( 7.83x)  -21.49%
blend_w32_16bpc_c:       6483.9 ( 1.00x)   6484.5 ( 1.00x)    0.01%
blend_w32_16bpc_rvv:     1029.1 ( 6.30x)    774.7 ( 8.37x)  -24.72%

SpacemiT K1                  Before             After         Delta

blend_w4_16bpc_c:         206.1 ( 1.00x)    207.0 ( 1.00x)    0.44%
blend_w4_16bpc_rvv:        64.4 ( 3.20x)     69.5 ( 2.98x)    7.92%
blend_w8_16bpc_c:         600.2 ( 1.00x)    600.9 ( 1.00x)    0.12%
blend_w8_16bpc_rvv:       101.6 ( 5.91x)    106.9 ( 5.62x)    5.22%
blend_w16_16bpc_c:       2316.0 ( 1.00x)   2316.4 ( 1.00x)    0.02%
blend_w16_16bpc_rvv:      261.8 ( 8.85x)    229.1 (10.11x)  -12.49%
blend_w32_16bpc_c:       5861.1 ( 1.00x)   5860.4 ( 1.00x)   -0.01%
blend_w32_16bpc_rvv:      602.9 ( 9.72x)    475.3 (12.33x)  -21.16%

Blackhole p100a              Before             After         Delta

blend_w4_16bpc_c:         193.3 ( 1.00x)    191.3 ( 1.00x)   -1.03%
blend_w4_16bpc_rvv:        66.3 ( 2.91x)     65.4 ( 2.92x)   -1.36%
blend_w8_16bpc_c:         552.0 ( 1.00x)    549.8 ( 1.00x)   -0.40%
blend_w8_16bpc_rvv:       100.5 ( 5.49x)     96.2 ( 5.71x)   -4.28%
blend_w16_16bpc_c:       2112.5 ( 1.00x)   2111.8 ( 1.00x)   -0.03%
blend_w16_16bpc_rvv:      190.3 (11.10x)    185.9 (11.36x)   -2.31%
blend_w32_16bpc_c:       5417.5 ( 1.00x)   5416.2 ( 1.00x)   -0.02%
blend_w32_16bpc_rvv:      290.3 (18.66x)    304.0 (17.82x)    4.72%
2025-12-30 13:47:49 +00:00
Nathan E. Egge a31e4bd757 riscv64/mc16: Add VLEN=512 16bpc RVV blend_{,v} functions
Blackhole p100a               Before             After         Delta

blend_w4_16bpc_c:         193.1 ( 1.00x)     186.8 ( 1.00x)   -3.26%
blend_w4_16bpc_rvv:        64.8 ( 2.98x)      62.8 ( 2.97x)   -3.09%
blend_w8_16bpc_c:         551.0 ( 1.00x)     546.0 ( 1.00x)   -0.91%
blend_w8_16bpc_rvv:        96.2 ( 5.73x)      93.4 ( 5.85x)   -2.91%
blend_w16_16bpc_c:       2111.6 ( 1.00x)    2107.0 ( 1.00x)   -0.22%
blend_w16_16bpc_rvv:      189.9 (11.12x)     189.6 (11.11x)   -0.16%
blend_w32_16bpc_c:       5403.9 ( 1.00x)    5398.5 ( 1.00x)   -0.10%
blend_w32_16bpc_rvv:      292.4 (18.48x)     291.5 (18.52x)   -0.31%

blend_v_w2_16bpc_c:       209.1 ( 1.00x)     208.7 ( 1.00x)   -0.19%
blend_v_w2_16bpc_rvv:     180.3 ( 1.16x)     180.4 ( 1.16x)    0.06%
blend_v_w4_16bpc_c:       896.9 ( 1.00x)     898.5 ( 1.00x)    0.18%
blend_v_w4_16bpc_rvv:     303.0 ( 2.96x)     302.5 ( 2.97x)   -0.17%
blend_v_w8_16bpc_c:      1658.9 ( 1.00x)    1663.1 ( 1.00x)    0.25%
blend_v_w8_16bpc_rvv:     303.0 ( 5.47x)     302.6 ( 5.50x)   -0.13%
blend_v_w16_16bpc_c:     3186.0 ( 1.00x)    3182.7 ( 1.00x)   -0.10%
blend_v_w16_16bpc_rvv:    313.1 (10.17x)     312.1 (10.20x)   -0.32%
blend_v_w32_16bpc_c:     6253.9 ( 1.00x)    6257.0 ( 1.00x)    0.05%
blend_v_w32_16bpc_rvv:    355.4 (17.60x)     353.2 (17.72x)   -0.62%
2025-12-24 01:08:19 +00:00
Nathan E. Egge 38dd16e108 riscv64/mc: Add VLEN=512 8bpc RVV blend_{,h,v} functions
Blackhole p100a               Before             After         Delta

blend_w4_8bpc_c:          190.7 ( 1.00x)     189.3 ( 1.00x)   -0.73%
blend_w4_8bpc_rvv:         61.2 ( 3.12x)      59.7 ( 3.17x)   -2.45%
blend_w8_8bpc_c:          550.7 ( 1.00x)     547.0 ( 1.00x)   -0.67%
blend_w8_8bpc_rvv:         91.0 ( 6.05x)      89.4 ( 6.12x)   -1.76%
blend_w16_8bpc_c:        2112.4 ( 1.00x)    2106.8 ( 1.00x)   -0.27%
blend_w16_8bpc_rvv:       177.1 (11.92x)     174.8 (12.05x)   -1.30%
blend_w32_8bpc_c:        5423.8 ( 1.00x)    5393.8 ( 1.00x)   -0.55%
blend_w32_8bpc_rvv:       233.5 (23.23x)     230.7 (23.38x)   -1.20%

blend_h_w2_8bpc_c:        126.4 ( 1.00x)     128.0 ( 1.00x)    1.27%
blend_h_w2_8bpc_rvv:       85.0 ( 1.49x)      81.2 ( 1.58x)   -4.47%
blend_h_w4_8bpc_c:        221.2 ( 1.00x)     222.2 ( 1.00x)    0.45%
blend_h_w4_8bpc_rvv:       84.3 ( 2.62x)      81.3 ( 2.73x)   -3.56%
blend_h_w8_8bpc_c:        411.9 ( 1.00x)     413.3 ( 1.00x)    0.34%
blend_h_w8_8bpc_rvv:       84.2 ( 4.89x)      81.0 ( 5.10x)   -3.80%
blend_h_w16_8bpc_c:       792.6 ( 1.00x)     793.5 ( 1.00x)    0.11%
blend_h_w16_8bpc_rvv:      84.5 ( 9.38x)      81.5 ( 9.74x)   -3.55%
blend_h_w32_8bpc_c:      1577.7 ( 1.00x)    1578.8 ( 1.00x)    0.07%
blend_h_w32_8bpc_rvv:      86.6 (18.21x)      83.5 (18.90x)   -3.58%
blend_h_w64_8bpc_c:      3099.5 ( 1.00x)    3101.9 ( 1.00x)    0.08%
blend_h_w64_8bpc_rvv:      98.4 (31.49x)      95.2 (32.58x)   -3.25%
blend_h_w128_8bpc_c:     7496.9 ( 1.00x)    7498.1 ( 1.00x)    0.02%
blend_h_w128_8bpc_rvv:    155.4 (48.24x)     151.5 (49.50x)   -2.51%

blend_v_w2_8bpc_c:        202.9 ( 1.00x)     203.5 ( 1.00x)    0.30%
blend_v_w2_8bpc_rvv:      173.5 ( 1.17x)     176.6 ( 1.15x)    1.79%
blend_v_w4_8bpc_c:        842.3 ( 1.00x)     844.2 ( 1.00x)    0.23%
blend_v_w4_8bpc_rvv:      295.9 ( 2.85x)     299.0 ( 2.82x)    1.05%
blend_v_w8_8bpc_c:       1589.9 ( 1.00x)    1592.1 ( 1.00x)    0.14%
blend_v_w8_8bpc_rvv:      296.2 ( 5.37x)     299.0 ( 5.32x)    0.95%
blend_v_w16_8bpc_c:      3090.3 ( 1.00x)    3088.3 ( 1.00x)   -0.06%
blend_v_w16_8bpc_rvv:     296.0 (10.44x)     299.4 (10.32x)    1.15%
blend_v_w32_8bpc_c:      6080.2 ( 1.00x)    6081.5 ( 1.00x)    0.02%
blend_v_w32_8bpc_rvv:     306.3 (19.85x)     309.3 (19.66x)    0.98%
2025-12-22 23:20:23 +00:00
Nathan E. Egge a17c862576 riscv64/mc: Only process w*3/4 elements in blend_v
Setting VL for this function only impacts the 16bpc performance, and only
 on the SpacemiT K1, which has two vector units that are each 128 bits wide.
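
As a rough scalar illustration of the w*3/4 limit, the sketch below blends only the first three quarters of each row and leaves the remaining columns untouched. It is an assumption-laden model, not the RVV code from this commit: the name blend_v_sketch, the caller-supplied mask table and the pixel-unit stride are hypothetical, and the 6-bit weighting formula is assumed here rather than taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weighted average with a 6-bit mask: m = 0 keeps a, m = 64 selects b. */
static inline int blend_px(int a, int b, int m)
{
    return (a * (64 - m) + b * m + 32) >> 6;
}

/*
 * Scalar model of blend_v for 16-bit pixels. `mask` is a hypothetical
 * per-column weight table supplied by the caller; `dst_stride` is in pixels.
 * Only the first w*3/4 columns are written; the rest of dst stays as-is.
 */
void blend_v_sketch(uint16_t *dst, ptrdiff_t dst_stride,
                    const uint16_t *tmp, int w, int h,
                    const uint8_t *mask)
{
    assert(w >= 2 && h >= 1);
    const int cols = (w * 3) >> 2;   /* w=2 -> 1, w=4 -> 3, w=32 -> 24 */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < cols; x++)
            dst[x] = (uint16_t)blend_px(dst[x], tmp[x], mask[x]);
        dst += dst_stride;
        tmp += w;                    /* tmp is assumed packed with stride w */
    }
}
```

In a vector implementation the same limit would presumably be expressed by requesting a vector length of w*3/4 elements instead of w, so that fewer elements are processed per row.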

Kendryte K230                Before             After         Delta

blend_v_w2_8bpc_c:        220.0 ( 1.00x)    221.3 ( 1.00x)    0.59%
blend_v_w2_8bpc_rvv:      145.7 ( 1.51x)    148.2 ( 1.49x)    1.72%
blend_v_w4_8bpc_c:        942.1 ( 1.00x)    943.7 ( 1.00x)    0.17%
blend_v_w4_8bpc_rvv:      240.4 ( 3.92x)    242.9 ( 3.89x)    1.04%
blend_v_w8_8bpc_c:       1782.3 ( 1.00x)   1783.8 ( 1.00x)    0.08%
blend_v_w8_8bpc_rvv:      252.6 ( 7.06x)    254.9 ( 7.00x)    0.91%
blend_v_w16_8bpc_c:      3650.9 ( 1.00x)   3647.0 ( 1.00x)   -0.11%
blend_v_w16_8bpc_rvv:     495.5 ( 7.37x)    494.4 ( 7.38x)   -0.22%
blend_v_w32_8bpc_c:      7013.0 ( 1.00x)   7018.2 ( 1.00x)    0.07%
blend_v_w32_8bpc_rvv:     807.9 ( 8.68x)    802.0 ( 8.75x)   -0.73%

blend_v_w2_16bpc_c:       226.1 ( 1.00x)    225.5 ( 1.00x)   -0.27%
blend_v_w2_16bpc_rvv:     148.6 ( 1.52x)    148.9 ( 1.51x)    0.20%
blend_v_w4_16bpc_c:      1010.7 ( 1.00x)   1006.7 ( 1.00x)   -0.40%
blend_v_w4_16bpc_rvv:     306.7 ( 3.30x)    307.4 ( 3.27x)    0.23%
blend_v_w8_16bpc_c:      1990.2 ( 1.00x)   1996.1 ( 1.00x)    0.30%
blend_v_w8_16bpc_rvv:     519.5 ( 3.83x)    523.4 ( 3.81x)    0.75%
blend_v_w16_16bpc_c:     3744.5 ( 1.00x)   3742.4 ( 1.00x)   -0.06%
blend_v_w16_16bpc_rvv:    899.6 ( 4.16x)    906.4 ( 4.13x)    0.76%
blend_v_w32_16bpc_c:     7047.5 ( 1.00x)   7079.3 ( 1.00x)    0.45%
blend_v_w32_16bpc_rvv:   1475.5 ( 4.78x)   1483.3 ( 4.77x)    0.53%

SpacemiT K1                  Before             After         Delta

blend_v_w2_8bpc_c:        216.3 ( 1.00x)    214.4 ( 1.00x)   -0.88%
blend_v_w2_8bpc_rvv:      144.0 ( 1.50x)    143.6 ( 1.49x)   -0.28%
blend_v_w4_8bpc_c:        919.8 ( 1.00x)    918.1 ( 1.00x)   -0.18%
blend_v_w4_8bpc_rvv:      236.6 ( 3.89x)    236.4 ( 3.88x)   -0.08%
blend_v_w8_8bpc_c:       1739.3 ( 1.00x)   1736.8 ( 1.00x)   -0.14%
blend_v_w8_8bpc_rvv:      236.8 ( 7.34x)    236.3 ( 7.35x)   -0.21%
blend_v_w16_8bpc_c:      3374.7 ( 1.00x)   3374.9 ( 1.00x)    0.01%
blend_v_w16_8bpc_rvv:     297.0 (11.36x)    296.8 (11.37x)   -0.07%
blend_v_w32_8bpc_c:      6647.5 ( 1.00x)   6645.5 ( 1.00x)   -0.03%
blend_v_w32_8bpc_rvv:     403.3 (16.48x)    402.4 (16.51x)   -0.22%

blend_v_w2_16bpc_c:       221.4 ( 1.00x)    220.1 ( 1.00x)   -0.59%
blend_v_w2_16bpc_rvv:     146.3 ( 1.51x)    147.3 ( 1.49x)    0.68%
blend_v_w4_16bpc_c:       973.3 ( 1.00x)    972.7 ( 1.00x)   -0.06%
blend_v_w4_16bpc_rvv:     280.3 ( 3.47x)    282.1 ( 3.45x)    0.64%
blend_v_w8_16bpc_c:      1814.8 ( 1.00x)   1816.2 ( 1.00x)    0.08%
blend_v_w8_16bpc_rvv:     376.6 ( 4.82x)    376.9 ( 4.82x)    0.08%
blend_v_w16_16bpc_c:     3485.5 ( 1.00x)   3485.5 ( 1.00x)    0.00%
blend_v_w16_16bpc_rvv:    531.1 ( 6.56x)    525.6 ( 6.63x)   -1.04%
blend_v_w32_16bpc_c:     6788.3 ( 1.00x)   6778.8 ( 1.00x)   -0.14%
blend_v_w32_16bpc_rvv:    904.5 ( 7.51x)    854.6 ( 7.93x)   -5.52%
2024-11-05 04:11:55 +00:00
Nathan E. Egge 907dd87191 riscv64/mc16: Unroll 16bpc RVV blend_v 2x
Kendryte K230                Before             After         Delta

blend_v_w2_16bpc_c:       225.8 ( 1.00x)    225.7 ( 1.00x)   -0.04%
blend_v_w2_16bpc_rvv:     194.7 ( 1.16x)    148.6 ( 1.52x)  -23.68%
blend_v_w4_16bpc_c:      1011.3 ( 1.00x)   1005.8 ( 1.00x)   -0.54%
blend_v_w4_16bpc_rvv:     387.2 ( 2.61x)    305.4 ( 3.29x)  -21.13%
blend_v_w8_16bpc_c:      1878.5 ( 1.00x)   1872.7 ( 1.00x)   -0.31%
blend_v_w8_16bpc_rvv:     475.3 ( 3.95x)    435.6 ( 4.30x)   -8.35%
blend_v_w16_16bpc_c:     3601.9 ( 1.00x)   3601.6 ( 1.00x)   -0.01%
blend_v_w16_16bpc_rvv:    891.2 ( 4.04x)    892.7 ( 4.03x)    0.17%
blend_v_w32_16bpc_c:     7043.7 ( 1.00x)   7058.8 ( 1.00x)    0.21%
blend_v_w32_16bpc_rvv:   1384.5 ( 5.09x)   1478.0 ( 4.78x)    6.75%

SpacemiT K1                  Before             After         Delta

blend_v_w2_16bpc_c:       222.6 ( 1.00x)    220.5 ( 1.00x)   -0.94%
blend_v_w2_16bpc_rvv:     195.7 ( 1.14x)    146.6 ( 1.50x)  -25.09%
blend_v_w4_16bpc_c:       972.3 ( 1.00x)    972.0 ( 1.00x)   -0.03%
blend_v_w4_16bpc_rvv:     349.1 ( 2.79x)    281.9 ( 3.45x)  -19.25%
blend_v_w8_16bpc_c:      1812.1 ( 1.00x)   1813.0 ( 1.00x)    0.05%
blend_v_w8_16bpc_rvv:     481.5 ( 3.76x)    376.0 ( 4.82x)  -21.91%
blend_v_w16_16bpc_c:     3488.4 ( 1.00x)   3484.6 ( 1.00x)   -0.11%
blend_v_w16_16bpc_rvv:    608.7 ( 5.73x)    523.4 ( 6.66x)  -14.01%
blend_v_w32_16bpc_c:     6795.3 ( 1.00x)   6792.4 ( 1.00x)   -0.04%
blend_v_w32_16bpc_rvv:    934.8 ( 7.27x)    907.3 ( 7.49x)   -2.94%
2024-11-04 20:20:37 +00:00
Nathan E. Egge 9710e7de9c riscv64/mc16: Branchless vsetvl in blend_v function
Kendryte K230                Before             After         Delta

blend_v_w2_16bpc_c:       226.0 ( 1.00x)    226.1 ( 1.00x)    0.04%
blend_v_w2_16bpc_rvv:     194.0 ( 1.16x)    193.9 ( 1.17x)   -0.05%
blend_v_w4_16bpc_c:      1011.8 ( 1.00x)   1009.4 ( 1.00x)   -0.24%
blend_v_w4_16bpc_rvv:     392.7 ( 2.58x)    390.8 ( 2.58x)   -0.48%
blend_v_w8_16bpc_c:      1987.9 ( 1.00x)   1988.0 ( 1.00x)    0.01%
blend_v_w8_16bpc_rvv:     561.5 ( 3.54x)    560.2 ( 3.55x)   -0.23%
blend_v_w16_16bpc_c:     3738.1 ( 1.00x)   3739.1 ( 1.00x)    0.03%
blend_v_w16_16bpc_rvv:    934.1 ( 4.00x)    932.2 ( 4.01x)   -0.20%
blend_v_w32_16bpc_c:     7031.0 ( 1.00x)   7030.1 ( 1.00x)   -0.01%
blend_v_w32_16bpc_rvv:   1403.3 ( 5.01x)   1395.8 ( 5.04x)   -0.53%

SpacemiT K1                  Before             After         Delta

blend_v_w2_16bpc_c:       221.0 ( 1.00x)    221.2 ( 1.00x)    0.09%
blend_v_w2_16bpc_rvv:     195.2 ( 1.13x)    196.0 ( 1.13x)    0.41%
blend_v_w4_16bpc_c:       969.8 ( 1.00x)    971.9 ( 1.00x)    0.22%
blend_v_w4_16bpc_rvv:     348.8 ( 2.78x)    349.1 ( 2.78x)    0.09%
blend_v_w8_16bpc_c:      1812.6 ( 1.00x)   1814.9 ( 1.00x)    0.13%
blend_v_w8_16bpc_rvv:     486.1 ( 3.73x)    484.3 ( 3.75x)   -0.37%
blend_v_w16_16bpc_c:     3483.0 ( 1.00x)   3485.1 ( 1.00x)    0.06%
blend_v_w16_16bpc_rvv:    608.7 ( 5.72x)    607.4 ( 5.74x)   -0.21%
blend_v_w32_16bpc_c:     6791.8 ( 1.00x)   6794.2 ( 1.00x)    0.04%
blend_v_w32_16bpc_rvv:    940.6 ( 7.22x)    942.1 ( 7.21x)    0.16%
2024-11-04 19:46:26 +00:00
Nathan E. Egge 28d1c21779 riscv64/mc16: Add VLEN=256 8bpc RVV blend_v function
SpacemiT K1                  Before             After         Delta

blend_v_w2_16bpc_c:       221.5 ( 1.00x)    220.3 ( 1.00x)   -0.54%
blend_v_w2_16bpc_rvv:     193.5 ( 1.14x)    194.3 ( 1.13x)    0.41%
blend_v_w4_16bpc_c:       968.8 ( 1.00x)    967.2 ( 1.00x)   -0.17%
blend_v_w4_16bpc_rvv:     442.2 ( 2.19x)    347.4 ( 2.78x)  -21.44%
blend_v_w8_16bpc_c:      1809.4 ( 1.00x)   1811.2 ( 1.00x)    0.10%
blend_v_w8_16bpc_rvv:     557.4 ( 3.25x)    483.2 ( 3.75x)  -13.31%
blend_v_w16_16bpc_c:     3481.4 ( 1.00x)   3473.4 ( 1.00x)   -0.23%
blend_v_w16_16bpc_rvv:    844.3 ( 4.12x)    603.1 ( 5.76x)  -28.57%
blend_v_w32_16bpc_c:     6783.1 ( 1.00x)   6749.8 ( 1.00x)   -0.49%
blend_v_w32_16bpc_rvv:   1406.1 ( 4.82x)    919.4 ( 7.34x)  -34.61%
2024-11-04 18:52:32 +00:00
Nathan E. Egge aa2deb898e riscv64/mc16: Add 16bpc RVV blend_v function
Kendryte K230

blend_v_w2_16bpc_c:       226.5 ( 1.00x)
blend_v_w2_16bpc_rvv:     192.2 ( 1.18x)
blend_v_w4_16bpc_c:      1010.3 ( 1.00x)
blend_v_w4_16bpc_rvv:     390.5 ( 2.59x)
blend_v_w8_16bpc_c:      1994.2 ( 1.00x)
blend_v_w8_16bpc_rvv:     561.7 ( 3.55x)
blend_v_w16_16bpc_c:     3737.9 ( 1.00x)
blend_v_w16_16bpc_rvv:    928.0 ( 4.03x)
blend_v_w32_16bpc_c:     7064.7 ( 1.00x)
blend_v_w32_16bpc_rvv:   1428.9 ( 4.94x)

SpacemiT K1

blend_v_w2_16bpc_c:       220.8 ( 1.00x)
blend_v_w2_16bpc_rvv:     193.5 ( 1.14x)
blend_v_w4_16bpc_c:       967.3 ( 1.00x)
blend_v_w4_16bpc_rvv:     439.5 ( 2.20x)
blend_v_w8_16bpc_c:      1810.2 ( 1.00x)
blend_v_w8_16bpc_rvv:     555.3 ( 3.26x)
blend_v_w16_16bpc_c:     3476.4 ( 1.00x)
blend_v_w16_16bpc_rvv:    830.9 ( 4.18x)
blend_v_w32_16bpc_c:     6772.9 ( 1.00x)
blend_v_w32_16bpc_rvv:   1356.3 ( 4.99x)
2024-11-04 18:52:30 +00:00
Nathan E. Egge c783088fe7 riscv64/mc16: Unroll 16bpc RVV blend 2x
Kendryte K230              Before               After         Delta

blend_w4_16bpc_c:       210.0 ( 1.00x)      208.9 ( 1.00x)   -0.52%
blend_w4_16bpc_rvv:      88.5 ( 2.37x)       66.2 ( 3.15x)  -25.20%
blend_w8_16bpc_c:       614.1 ( 1.00x)      613.5 ( 1.00x)   -0.10%
blend_w8_16bpc_rvv:     143.1 ( 4.29x)      126.9 ( 4.83x)  -11.32%
blend_w16_16bpc_c:     2371.2 ( 1.00x)     2371.3 ( 1.00x)    0.00%
blend_w16_16bpc_rvv:    461.1 ( 5.14x)      413.2 ( 5.74x)  -10.39%
blend_w32_16bpc_c:     5998.4 ( 1.00x)     5998.4 ( 1.00x)    0.00%
blend_w32_16bpc_rvv:    978.4 ( 6.13x)     1013.1 ( 5.92x)    3.55%

SpacemiT K1                Before               After         Delta

blend_w4_16bpc_c:       205.8 ( 1.00x)      205.9 ( 1.00x)    0.05%
blend_w4_16bpc_rvv:      80.9 ( 2.54x)       64.9 ( 3.17x)  -19.78%
blend_w8_16bpc_c:       599.9 ( 1.00x)      599.9 ( 1.00x)    0.00%
blend_w8_16bpc_rvv:     134.4 ( 4.46x)      101.9 ( 5.89x)  -24.18%
blend_w16_16bpc_c:     2316.5 ( 1.00x)     2316.5 ( 1.00x)    0.00%
blend_w16_16bpc_rvv:    302.0 ( 7.67x)      262.8 ( 8.81x)  -12.98%
blend_w32_16bpc_c:     5861.9 ( 1.00x)     5861.4 ( 1.00x)   -0.01%
blend_w32_16bpc_rvv:    589.6 ( 9.94x)      602.2 ( 9.73x)    2.14%
2024-10-31 07:11:35 +00:00
Nathan E. Egge 67c60d76e1 riscv64/mc16: Branchless vsetvl in blend function
Kendryte K230              Before               After         Delta

blend_w4_16bpc_c:       208.8 ( 1.00x)      209.9 ( 1.00x)    0.53%
blend_w4_16bpc_rvv:      85.9 ( 2.43x)       88.6 ( 2.37x)    3.14%
blend_w8_16bpc_c:       613.2 ( 1.00x)      614.3 ( 1.00x)    0.18%
blend_w8_16bpc_rvv:     145.4 ( 4.22x)      143.1 ( 4.29x)   -1.58%
blend_w16_16bpc_c:     2371.9 ( 1.00x)     2373.6 ( 1.00x)    0.07%
blend_w16_16bpc_rvv:    464.0 ( 5.11x)      461.2 ( 5.15x)   -0.60%
blend_w32_16bpc_c:     6005.6 ( 1.00x)     6007.7 ( 1.00x)    0.03%
blend_w32_16bpc_rvv:    981.6 ( 6.12x)      979.4 ( 6.13x)   -0.22%

SpacemiT K1                Before               After         Delta

blend_w4_16bpc_c:       206.4 ( 1.00x)      205.7 ( 1.00x)   -0.34%
blend_w4_16bpc_rvv:      79.5 ( 2.60x)       81.0 ( 2.54x)    1.89%
blend_w8_16bpc_c:       600.7 ( 1.00x)      599.7 ( 1.00x)   -0.17%
blend_w8_16bpc_rvv:     133.3 ( 4.51x)      134.1 ( 4.47x)    0.60%
blend_w16_16bpc_c:     2315.9 ( 1.00x)     2315.2 ( 1.00x)   -0.03%
blend_w16_16bpc_rvv:    305.2 ( 7.59x)      300.7 ( 7.70x)   -1.47%
blend_w32_16bpc_c:     5861.1 ( 1.00x)     5860.2 ( 1.00x)   -0.02%
blend_w32_16bpc_rvv:    592.5 ( 9.89x)      589.5 ( 9.94x)   -0.51%
2024-10-31 07:11:35 +00:00
Nathan E. Egge 3437a26b3d riscv64/mc16: Add VLEN=256 8bpc RVV blend function
SpacemiT K1                Before               After         Delta

blend_w4_16bpc_c:       206.8 ( 1.00x)      206.0 ( 1.00x)   -0.39%
blend_w4_16bpc_rvv:      95.8 ( 2.16x)       77.8 ( 2.65x)  -18.79%
blend_w8_16bpc_c:       600.4 ( 1.00x)      600.1 ( 1.00x)   -0.05%
blend_w8_16bpc_rvv:     161.7 ( 3.71x)      131.3 ( 4.57x)  -18.80%
blend_w16_16bpc_c:     2317.6 ( 1.00x)     2316.5 ( 1.00x)   -0.05%
blend_w16_16bpc_rvv:    459.6 ( 5.04x)      302.9 ( 7.65x)  -34.09%
blend_w32_16bpc_c:     5863.0 ( 1.00x)     5863.3 ( 1.00x)    0.01%
blend_w32_16bpc_rvv:    992.7 ( 5.91x)      578.1 (10.14x)  -41.76%
2024-10-31 07:11:35 +00:00
Nathan E. Egge e542f661d0 meson: Move riscv64 8bpc only files into bitdepth sources
The cdef.S, itx.S and mc.S files contain only 8bpc implementations and
 should be compiled only when building with a -Dbitdepths=8 configuration.
2024-10-29 12:17:14 +00:00
Nathan E. Egge and Luca Barbato ca489d8aab riscv64/mc16: Add 16bpc RVV blend function
Kendryte K230

blend_w4_16bpc_c:        214.4 ( 1.00x)
blend_w4_16bpc_rvv:       90.2 ( 2.38x)
blend_w8_16bpc_c:        618.9 ( 1.00x)
blend_w8_16bpc_rvv:      147.4 ( 4.20x)
blend_w16_16bpc_c:      2376.5 ( 1.00x)
blend_w16_16bpc_rvv:     466.0 ( 5.10x)
blend_w32_16bpc_c:      6008.6 ( 1.00x)
blend_w32_16bpc_rvv:     985.0 ( 6.10x)

SpacemiT K1

blend_w4_16bpc_c:        204.9 ( 1.00x)
blend_w4_16bpc_rvv:       88.3 ( 2.32x)
blend_w8_16bpc_c:        598.5 ( 1.00x)
blend_w8_16bpc_rvv:      155.3 ( 3.85x)
blend_w16_16bpc_c:      2315.4 ( 1.00x)
blend_w16_16bpc_rvv:     444.4 ( 5.21x)
blend_w32_16bpc_c:      5860.1 ( 1.00x)
blend_w32_16bpc_rvv:     993.0 ( 5.90x)
2024-10-29 08:21:53 +00:00
Nathan E. Egge 22e9c0fee3 riscv64/ipred16: Fix build error with -Dbitdepths=16
When configuring and building dav1d with just the 16bpc code paths using
 meson setup .. -Dbitdepths=16, there is an undefined reference to
 dav1d_dc_gen_8bpc_rvv due to a typo in src/riscv/64/ipred16.S.
2024-10-28 23:30:46 +00:00
Nathan E. Egge c3fa1db301 NEWS: add itx to riscv list 2024-10-16 18:06:00 +00:00
Nathan E. Egge 789a1f652b riscv64/itx: Replace vwadd+vnsra with vnclip
The vnclip instruction does a fixed-point rounding shift with saturation
 while narrowing, so it can replace vwadd followed by vnsra in idct_4,
 idct_8, idct_16, iadst_8 and iadst_16.
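
The equivalence can be sketched with a scalar model in C. This is only an illustration under stated assumptions: that the vwadd supplies the rounding constant 1 << (shift - 1), that vnclip runs in the round-to-nearest-up rounding mode, and that the compiler uses arithmetic right shifts for negative values. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two-step form: add a rounding constant in 32 bits, then shift right and
 * truncate to 16 bits (models a widening add followed by vnsra). */
int16_t add_then_narrow_shift(int32_t x, int shift)
{
    assert(shift > 0 && shift < 31);
    const int32_t bias = (int32_t)1 << (shift - 1);
    assert(x <= INT32_MAX - bias);       /* keep the 32-bit add in range */
    return (int16_t)((x + bias) >> shift);  /* high bits are dropped */
}

/* One-step form: rounding right shift with signed saturation to 16 bits
 * (models vnclip with round-to-nearest-up). */
int16_t clip_narrow_shift(int32_t x, int shift)
{
    assert(shift > 0 && shift < 31);
    const int64_t r = ((int64_t)x + ((int64_t)1 << (shift - 1))) >> shift;
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}
```

For inputs whose rounded result fits in 16 bits the two forms agree, for example both return 6 for x = 22528 and shift = 12. They differ only when the result overflows, where the one-step form saturates instead of dropping high bits.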
Including 572c5a6 (which applies the same change to iadst_4), these
 commits give the following average improvements across all modified 2D
 transform functions:

          Kendryte K230     SpacemiT K1

   4x4       -5.50%           -4.44%
   8x8       -9.78%           -7.62%
  16x16      -9.70%           -9.04%
   4x8       -8.39%           -7.54%
   8x4       -8.10%           -4.66%
   4x16      -8.16%           -7.74%
  16x4       -8.07%           -6.96%
   8x16      -9.11%           -7.43%
  16x8       -9.87%           -7.81%

Kendryte K230                                      Old     New     Delta

inv_txfm_add_4x4_adst_adst_0_8bpc_rvv              99.0    93.4   -5.66%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv              99.0    93.4   -5.66%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               93.4    87.2   -6.64%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               93.5    87.2   -6.74%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         100.3    94.9   -5.38%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         100.3    94.9   -5.38%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          80.5    77.2   -4.10%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          80.5    77.2   -4.10%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               94.1    88.5   -5.95%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               94.1    88.5   -5.95%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                40.3    40.3    0.00%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                92.2    82.1  -10.95%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           95.3    89.9   -5.67%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           95.3    89.9   -5.67%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           75.5    73.3   -2.91%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           75.5    73.3   -2.91%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         100.3    94.7   -5.58%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         100.3    94.7   -5.58%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           94.8    88.4   -6.75%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           94.8    88.5   -6.65%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     105.0    96.0   -8.57%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     105.0    95.9   -8.67%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      81.6    78.5   -3.80%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      81.6    78.4   -3.92%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          80.3    77.8   -3.11%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          80.3    77.8   -3.11%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           77.2    71.7   -7.12%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           77.2    71.7   -7.12%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      81.5    79.2   -2.82%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      81.6    79.2   -2.94%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      62.8    61.6   -1.91%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      62.8    61.6   -1.91%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                67.8    67.8    0.00%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                67.8    67.8    0.00%

inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             403.1   356.1  -11.66%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             403.1   356.0  -11.68%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              360.2   323.2  -10.27%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              360.2   323.2  -10.27%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         405.2   358.4  -11.55%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         405.2   358.4  -11.55%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         284.3   261.0   -8.20%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         284.4   260.9   -8.26%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              360.2   322.0  -10.61%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              360.0   321.9  -10.58%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                76.6    77.0    0.52%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               317.2   289.0   -8.89%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          363.7   324.3  -10.83%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          363.8   324.3  -10.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          241.2   226.9   -5.93%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          241.3   227.0   -5.93%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         404.9   358.0  -11.58%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         405.0   358.1  -11.58%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          365.1   323.8  -11.31%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          365.2   323.9  -11.31%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     407.2   359.6  -11.69%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     406.4   359.5  -11.54%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     285.8   261.9   -8.36%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     285.9   261.8   -8.43%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         269.9   244.5   -9.41%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         269.8   244.5   -9.38%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          225.5   209.6   -7.05%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          225.6   209.5   -7.14%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     270.5   246.5   -8.87%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     270.5   246.5   -8.87%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     146.5   145.4   -0.75%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     146.4   145.4   -0.68%

inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1363.4  1212.0  -11.10%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1363.6  1212.2  -11.10%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1813.7  1601.4  -11.71%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1185.9  1074.6   -9.39%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1186.0  1074.7   -9.38%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1639.5  1468.9  -10.41%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1374.8  1214.8  -11.64%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1374.7  1214.6  -11.65%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1819.3  1610.9  -11.45%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1283.3  1139.1  -11.24%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1283.2  1139.2  -11.22%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1632.4  1471.9   -9.83%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             160.9   158.7   -1.37%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1099.5   997.1   -9.31%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1465.3  1335.2   -8.88%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1286.8  1143.2  -11.16%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1286.8  1143.3  -11.15%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1638.6  1473.5  -10.08%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        806.6   783.3   -2.89%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        806.7   783.4   -2.89%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1163.1  1105.3   -4.97%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1374.3  1216.0  -11.52%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1374.3  1216.2  -11.50%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1817.5  1609.7  -11.43%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1190.4  1073.8   -9.80%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1190.4  1073.9   -9.79%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1640.4  1472.6  -10.23%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1376.0  1224.2  -11.03%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1376.0  1224.1  -11.04%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1829.3  1616.6  -11.63%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv        952.9   882.0   -7.44%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv        952.8   881.9   -7.44%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1172.0  1100.1   -6.13%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   657.6   659.8    0.33%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   657.6   659.7    0.32%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv   876.2   878.1    0.22%

inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.3   178.0   -9.78%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.4   178.0   -9.83%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              174.9   159.9   -8.58%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              174.9   159.9   -8.58%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.2   180.2   -9.54%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.2   180.2   -9.54%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         123.3   118.0   -4.30%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         123.3   118.0   -4.30%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              191.1   171.8  -10.10%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              191.1   171.7  -10.15%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               168.9   153.6   -9.06%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               169.0   153.6   -9.11%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          193.0   173.9   -9.90%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          193.0   173.9   -9.90%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          117.0   111.7   -4.53%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          117.0   111.7   -4.53%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         198.0   178.6   -9.80%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         198.0   178.6   -9.80%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          175.8   160.5   -8.70%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          175.8   160.5   -8.70%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.9   180.5   -9.70%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     199.9   180.5   -9.70%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     123.6   118.6   -4.05%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     123.6   118.6   -4.05%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         171.3   154.2   -9.98%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         171.3   154.2   -9.98%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.6   136.5   -8.14%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.6   136.5   -8.14%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     173.1   156.4   -9.65%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     173.2   156.4   -9.70%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      94.3    94.2   -0.11%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      94.2    94.2    0.00%

inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             201.2   188.4   -6.36%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             201.2   188.4   -6.36%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              194.9   175.7   -9.85%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              194.9   175.7   -9.85%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         202.4   182.3   -9.93%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         202.4   182.3   -9.93%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         170.1   155.7   -8.47%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         170.1   155.7   -8.47%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              178.0   162.1   -8.93%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              178.0   162.1   -8.93%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               172.8   157.0   -9.14%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               172.9   157.0   -9.20%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          180.3   163.7   -9.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          180.3   163.7   -9.21%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          147.9   137.9   -6.76%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          147.9   137.9   -6.76%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         202.4   182.3   -9.93%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         202.4   182.3   -9.93%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          196.3   175.9  -10.39%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          196.3   175.9  -10.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     203.7   183.4   -9.97%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     203.7   183.4   -9.97%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     171.1   155.9   -8.88%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     171.1   155.9   -8.88%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         126.8   120.9   -4.65%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         126.8   120.9   -4.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          121.5   117.0   -3.70%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          121.6   117.0   -3.78%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     129.1   122.3   -5.27%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     129.1   122.3   -5.27%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv      98.5    95.7   -2.84%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv      98.5    95.7   -2.84%

inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            384.4   344.6  -10.35%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            384.5   344.6  -10.38%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            429.3   387.3   -9.78%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             333.7   304.3   -8.81%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             333.7   304.2   -8.84%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             381.2   354.2   -7.08%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        385.7   349.1   -9.49%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        385.7   349.1   -9.49%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        433.0   389.3  -10.09%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        251.6   244.2   -2.94%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        251.5   244.1   -2.94%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        300.4   289.6   -3.60%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             378.5   335.6  -11.33%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             378.5   335.5  -11.36%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             420.6   369.5  -12.15%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              323.5   295.3   -8.72%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              323.2   295.2   -8.66%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              362.9   333.0   -8.24%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         375.3   339.4   -9.57%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         375.4   339.0   -9.70%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         414.8   372.2  -10.27%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         240.8   234.7   -2.53%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         240.7   234.7   -2.49%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         283.2   268.0   -5.37%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        384.2   345.8   -9.99%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        384.1   345.8   -9.97%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        432.5   387.7  -10.36%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         334.9   307.0   -8.33%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         335.0   307.1   -8.33%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         386.1   347.2  -10.08%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    386.7   349.4   -9.65%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    386.8   349.5   -9.64%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    436.6   392.9  -10.01%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    252.4   247.4   -1.98%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    252.4   247.5   -1.94%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    302.1   286.7   -5.10%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        348.3   317.4   -8.87%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        348.4   317.5   -8.87%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        361.4   329.0   -8.97%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         301.8   275.8   -8.61%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         301.8   275.8   -8.61%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         312.0   287.4   -7.88%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    352.2   321.9   -8.60%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    352.2   322.0   -8.57%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    363.7   332.5   -8.58%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    215.8   215.0   -0.37%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    215.8   215.1   -0.32%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    228.0   227.0   -0.44%

inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            430.3   388.5   -9.71%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            430.3   388.5   -9.71%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            430.2   388.5   -9.69%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             412.1   374.1   -9.22%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             412.0   374.3   -9.15%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             412.1   374.2   -9.20%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        432.9   391.0   -9.68%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        432.8   391.1   -9.63%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        432.4   391.0   -9.57%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        358.4   332.1   -7.34%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        358.4   332.3   -7.28%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        358.5   332.5   -7.25%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             386.9   347.1  -10.29%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             386.8   347.1  -10.26%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             387.0   346.8  -10.39%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              363.3   330.9   -8.92%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              363.3   330.9   -8.92%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              363.2   331.0   -8.87%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         383.7   349.8   -8.84%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         384.3   349.8   -8.98%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         384.3   349.7   -9.00%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         310.2   288.4   -7.03%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         310.2   288.4   -7.03%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         310.3   288.5   -7.03%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        434.1   391.5   -9.81%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        434.1   392.0   -9.70%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        434.1   392.0   -9.70%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         423.5   375.5  -11.33%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         423.5   375.4  -11.36%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         423.5   375.5  -11.33%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    438.0   396.1   -9.57%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    438.1   396.0   -9.61%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    438.0   395.8   -9.63%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    361.9   333.0   -7.99%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    362.4   333.0   -8.11%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    362.4   333.0   -8.11%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        308.3   296.3   -3.89%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        308.4   296.4   -3.89%
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        308.4   296.4   -3.89%
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         289.9   279.9   -3.45%
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         289.9   280.0   -3.41%
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         290.0   279.9   -3.48%
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    311.2   298.9   -3.95%
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    311.1   298.9   -3.92%
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    310.9   298.9   -3.86%
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    238.4   243.2    2.01%
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    238.4   243.2    2.01%
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    238.5   243.2    1.97%

inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            701.5   624.2  -11.02%
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            701.6   624.2  -11.03%
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            853.5   755.2  -11.52%
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             611.1   551.6   -9.74%
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             611.2   551.7   -9.73%
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             765.0   682.8  -10.75%
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        703.4   629.3  -10.53%
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        703.4   629.5  -10.51%
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        858.1   763.9  -10.98%
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        463.7   440.2   -5.07%
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        464.3   440.2   -5.19%
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        618.6   571.7   -7.58%
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             660.3   590.5  -10.57%
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             660.2   590.3  -10.59%
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             776.2   687.9  -11.38%
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              566.9   516.3   -8.93%
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              567.1   516.4   -8.94%
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              685.9   616.6  -10.10%
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         663.3   593.5  -10.52%
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         663.2   593.5  -10.51%
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         771.7   690.5  -10.52%
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         421.3   406.1   -3.61%
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         421.3   406.1   -3.61%
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         536.6   503.6   -6.15%
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        703.3   627.1  -10.83%
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        703.4   627.2  -10.83%
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        857.7   763.7  -10.96%
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         613.5   552.8   -9.89%
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         613.4   552.7   -9.90%
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         771.0   693.1  -10.10%
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    706.3   631.4  -10.60%
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    706.5   631.7  -10.59%
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    861.1   764.9  -11.17%
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    467.0   443.0   -5.14%
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    467.0   443.0   -5.14%
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    623.7   575.1   -7.79%
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        565.6   512.0   -9.48%
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        565.6   512.9   -9.32%
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        585.6   532.8   -9.02%
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         476.4   439.9   -7.66%
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         476.4   440.0   -7.64%
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         496.3   459.5   -7.41%
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    570.7   516.4   -9.51%
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    570.6   516.3   -9.52%
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    590.2   540.0   -8.51%
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    330.9   329.9   -0.30%
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    330.9   329.9   -0.30%
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    350.8   349.7   -0.31%

inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            855.5   752.1  -12.09%
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            855.5   751.9  -12.11%
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            855.4   752.1  -12.08%
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             765.4   685.5  -10.44%
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             765.5   685.3  -10.48%
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             765.5   685.5  -10.45%
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        859.2   755.8  -12.03%
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        859.1   756.0  -12.00%
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        859.1   755.9  -12.01%
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        612.8   561.9   -8.31%
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        612.9   561.9   -8.32%
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        612.8   561.9   -8.31%
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             765.1   676.0  -11.65%
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             765.0   676.2  -11.61%
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             765.0   676.2  -11.61%
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              674.5   612.0   -9.27%
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              674.5   612.1   -9.25%
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              674.6   612.0   -9.28%
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         777.2   679.9  -12.52%
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         777.1   680.1  -12.48%
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         777.1   680.0  -12.50%
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         522.2   488.2   -6.51%
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         522.1   488.2   -6.49%
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         522.1   487.5   -6.63%
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        859.2   753.5  -12.30%
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        859.2   753.6  -12.29%
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        859.2   753.5  -12.30%
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         768.9   689.0  -10.39%
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         768.9   689.2  -10.37%
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         768.8   689.2  -10.35%
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    863.0   758.7  -12.09%
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    862.9   758.7  -12.08%
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    863.0   758.6  -12.10%
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    616.5   566.7   -8.08%
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    616.6   566.6   -8.11%
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    616.3   567.0   -8.00%
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        618.1   564.5   -8.67%
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        618.0   564.5   -8.66%
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        617.7   564.6   -8.60%
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         527.9   500.6   -5.17%
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         527.8   500.7   -5.13%
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         527.7   500.7   -5.12%
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    622.3   568.5   -8.65%
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    622.2   568.5   -8.63%
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    622.3   568.4   -8.66%
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    373.4   374.4    0.27%
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    373.4   374.5    0.29%
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    373.4   374.4    0.27%

SpacemiT K1                                        Old     New     Delta

inv_txfm_add_4x4_adst_adst_0_8bpc_rvv             101.0    96.8   -4.16%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv             101.1    96.8   -4.25%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               96.8    91.7   -5.27%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               95.9    91.8   -4.28%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         102.2    97.9   -4.21%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         102.2    97.9   -4.21%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          82.4    80.4   -2.43%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          82.4    80.5   -2.31%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               97.3    92.6   -4.83%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               97.2    92.3   -5.04%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                41.2    41.3    0.24%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                96.0    87.5   -8.85%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           98.5    94.5   -4.06%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           98.6    94.7   -3.96%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           78.6    76.1   -3.18%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           78.6    76.0   -3.31%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         104.3    99.1   -4.99%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         104.4    99.1   -5.08%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           98.0    94.6   -3.47%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           98.1    94.4   -3.77%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     104.2    99.2   -4.80%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     104.3    99.2   -4.89%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      86.9    81.8   -5.87%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      87.0    81.9   -5.86%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          86.0    80.8   -6.05%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          85.9    81.4   -5.24%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           78.5    76.1   -3.06%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           78.6    76.1   -3.18%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      85.9    82.5   -3.96%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      85.9    82.3   -4.19%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      65.9    64.9   -1.52%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      65.9    64.8   -1.67%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                71.2    71.3    0.14%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                71.2    71.3    0.14%

inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             440.6   399.3   -9.37%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             440.6   399.3   -9.37%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              401.7   368.4   -8.29%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              401.8   368.4   -8.31%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         442.4   401.2   -9.31%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         442.4   401.1   -9.34%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         329.7   310.1   -5.94%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         329.7   310.1   -5.94%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              401.8   367.4   -8.56%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              401.7   367.3   -8.56%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                79.5    80.2    0.88%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               362.1   335.8   -7.26%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          405.0   369.2   -8.84%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          405.1   369.2   -8.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          290.9   278.2   -4.37%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          290.8   278.2   -4.33%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         442.5   401.1   -9.36%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         442.5   401.2   -9.33%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          405.8   369.2   -9.02%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          405.8   369.1   -9.04%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     444.3   403.0   -9.30%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     444.3   403.1   -9.27%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     331.6   310.9   -6.24%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     331.6   310.9   -6.24%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         313.3   292.6   -6.61%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         313.1   292.6   -6.55%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          274.5   260.6   -5.06%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          274.4   260.7   -4.99%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     315.3   294.4   -6.63%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     315.3   294.4   -6.63%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     202.5   202.5    0.00%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     202.6   202.5   -0.05%

inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1418.8  1268.2  -10.61%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1418.9  1268.3  -10.61%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1943.3  1733.6  -10.79%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1241.7  1134.6   -8.63%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1241.5  1134.5   -8.62%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1772.5  1599.8   -9.74%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1429.8  1270.3  -11.16%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1429.7  1270.1  -11.16%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1951.1  1741.4  -10.75%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1337.8  1195.8  -10.61%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1337.5  1196.0  -10.58%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1763.2  1604.6   -9.00%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             179.3   181.1    1.00%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1153.8  1060.7   -8.07%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1601.6  1470.6   -8.18%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1340.7  1199.8  -10.51%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1340.4  1199.8  -10.49%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1771.2  1606.6   -9.29%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        877.9   854.9   -2.62%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        877.7   855.2   -2.56%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1311.6  1254.1   -4.38%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1428.2  1270.5  -11.04%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1428.3  1270.6  -11.04%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1947.3  1737.3  -10.78%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1245.8  1133.5   -9.01%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1246.0  1133.7   -9.01%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1769.9  1603.9   -9.38%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1428.7  1279.7  -10.43%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1428.8  1279.5  -10.45%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1960.8  1745.8  -10.96%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv       1016.6   948.8   -6.67%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv       1016.7   948.8   -6.68%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1319.8  1247.7   -5.46%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   735.4   736.6    0.16%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   735.3   736.4    0.15%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv  1037.8  1036.7   -0.11%

inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.2   179.9   -8.77%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.1   180.0   -8.68%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              177.5   164.2   -7.49%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              177.5   164.3   -7.44%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.3   181.8   -8.78%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.0   181.8   -8.64%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         126.7   121.8   -3.87%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         126.7   121.9   -3.79%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              189.8   172.4   -9.17%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              189.8   172.4   -9.17%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               170.2   156.8   -7.87%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               170.2   156.9   -7.81%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          192.6   174.2   -9.55%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          192.6   174.2   -9.55%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          119.4   114.3   -4.27%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          119.6   114.2   -4.52%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         197.7   180.5   -8.70%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         197.8   180.6   -8.70%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          178.3   165.0   -7.46%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          178.3   164.9   -7.52%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.7   182.5   -8.61%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     200.0   182.4   -8.80%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     127.2   122.3   -3.85%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     127.3   122.5   -3.77%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         172.1   155.0   -9.94%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         172.1   155.0   -9.94%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.7   139.4   -6.25%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.7   139.5   -6.19%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     171.7   156.8   -8.68%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     171.6   156.9   -8.57%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      96.8    96.8    0.00%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      96.7    96.7    0.00%

inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             228.1   220.0   -3.55%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             227.9   219.9   -3.51%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              219.4   206.4   -5.93%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              219.4   206.4   -5.93%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         229.4   214.7   -6.41%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         229.4   214.8   -6.36%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         195.6   187.6   -4.09%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         195.8   187.6   -4.19%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              207.0   195.2   -5.70%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              206.9   195.2   -5.65%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               199.4   188.2   -5.62%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               199.4   188.5   -5.47%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          209.5   196.5   -6.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          209.7   196.6   -6.25%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          175.7   169.5   -3.53%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          175.9   169.6   -3.58%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         229.0   214.7   -6.24%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         229.3   214.5   -6.45%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          220.9   206.7   -6.43%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          220.6   206.5   -6.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     230.6   215.9   -6.37%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     230.7   215.9   -6.42%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     196.9   188.9   -4.06%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     196.9   188.9   -4.06%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         157.6   154.7   -1.84%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         157.5   154.9   -1.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          150.0   147.9   -1.40%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          150.0   147.7   -1.53%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     159.6   155.9   -2.32%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     159.8   155.6   -2.63%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv     128.6   128.8    0.16%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv     128.4   129.3    0.70%

inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            373.8   335.9  -10.14%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            373.8   335.7  -10.19%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            417.4   380.0   -8.96%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             328.3   301.7   -8.10%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             328.0   302.0   -7.93%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             374.3   351.3   -6.14%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        374.5   339.8   -9.27%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        374.3   339.4   -9.32%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        422.0   383.8   -9.05%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        248.0   242.9   -2.06%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        248.0   242.2   -2.34%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        298.6   290.3   -2.78%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             370.5   329.4  -11.09%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             370.8   329.0  -11.27%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             409.1   360.9  -11.78%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              321.1   293.7   -8.53%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              321.0   294.3   -8.32%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              357.8   329.8   -7.83%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         369.7   332.9   -9.95%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         370.4   333.0  -10.10%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         405.5   364.9  -10.01%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         241.6   236.6   -2.07%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         241.8   235.6   -2.56%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         281.9   266.9   -5.32%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        371.9   337.3   -9.30%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        372.2   337.1   -9.43%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        419.8   381.5   -9.12%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         328.3   302.9   -7.74%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         328.4   303.3   -7.64%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         380.6   343.7   -9.70%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    377.7   341.1   -9.69%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    377.6   341.5   -9.56%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    423.6   386.7   -8.71%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    250.0   245.7   -1.72%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    249.3   246.0   -1.32%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    296.4   284.7   -3.95%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        343.0   311.2   -9.27%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        342.9   311.0   -9.30%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        354.8   325.0   -8.40%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         298.9   274.9   -8.03%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         298.8   275.0   -7.97%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         310.3   289.1   -6.83%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    344.7   314.9   -8.65%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    344.5   314.8   -8.62%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    358.3   328.6   -8.29%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    219.6   216.1   -1.59%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    218.3   216.3   -0.92%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    231.3   229.6   -0.73%

inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            468.5   428.8   -8.47%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            468.5   428.9   -8.45%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            468.5   428.9   -8.45%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             453.8   414.5   -8.66%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             453.8   414.5   -8.66%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             453.9   414.4   -8.70%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        471.0   431.5   -8.39%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        471.0   431.3   -8.43%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        471.0   431.5   -8.39%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        402.2   375.0   -6.76%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        402.1   375.0   -6.74%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        402.0   375.3   -6.64%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             432.8   392.5   -9.31%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             432.8   392.5   -9.31%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             432.8   392.5   -9.31%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              407.9   378.3   -7.26%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              407.8   378.1   -7.28%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              407.8   378.1   -7.28%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         426.0   395.1   -7.25%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         425.9   395.0   -7.26%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         426.0   395.1   -7.25%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         357.1   338.7   -5.15%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         357.1   338.7   -5.15%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         357.2   338.7   -5.18%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        472.4   432.6   -8.43%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        472.2   432.6   -8.39%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        472.3   432.7   -8.38%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         464.3   418.2   -9.93%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         464.2   418.2   -9.91%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         464.2   418.2   -9.91%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    474.7   435.1   -8.34%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    474.8   435.1   -8.36%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    474.7   435.1   -8.34%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    405.9   378.8   -6.68%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    406.0   378.8   -6.70%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    406.0   378.8   -6.70%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        353.7   342.2   -3.25%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        353.8   342.3   -3.25%
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        353.7   342.4   -3.19%
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         338.1   327.9   -3.02%
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         338.1   327.9   -3.02%
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         338.2   327.9   -3.05%
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    357.5   344.8   -3.55%
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    357.5   344.9   -3.52%
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    357.5   344.7   -3.58%
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    287.1   297.0    3.45%
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    287.2   297.0    3.41%
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    287.2   297.0    3.41%

inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            774.3   704.8   -8.98%
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            774.4   704.8   -8.99%
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            929.5   839.9   -9.64%
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             687.9   634.9   -7.70%
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             688.0   634.8   -7.73%
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             845.5   768.4   -9.12%
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        779.5   708.5   -9.11%
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        779.5   708.5   -9.11%
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        933.3   849.9   -8.94%
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        546.5   529.0   -3.20%
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        546.5   529.0   -3.20%
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        702.5   664.1   -5.47%
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             739.9   672.7   -9.08%
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             739.9   672.7   -9.08%
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             863.1   776.1  -10.08%
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              651.2   601.9   -7.57%
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              651.2   601.8   -7.59%
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              777.6   706.5   -9.14%
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         742.4   678.9   -8.55%
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         742.5   678.9   -8.57%
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         858.8   779.3   -9.26%
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         510.8   496.4   -2.82%
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         510.6   496.5   -2.76%
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         630.0   599.7   -4.81%
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        778.3   707.2   -9.14%
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        778.3   707.1   -9.15%
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        934.4   843.5   -9.73%
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         689.3   634.7   -7.92%
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         689.2   634.8   -7.89%
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         845.8   774.4   -8.44%
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    779.9   710.5   -8.90%
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    780.0   710.4   -8.92%
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    936.4   848.1   -9.43%
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    550.4   531.3   -3.47%
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    550.4   531.3   -3.47%
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    705.3   669.4   -5.09%
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        649.0   599.7   -7.60%
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        649.0   599.7   -7.60%
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        682.8   633.4   -7.23%
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         562.1   527.9   -6.08%
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         562.0   527.9   -6.07%
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         597.4   561.5   -6.01%
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    652.7   603.6   -7.52%
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    652.8   603.6   -7.54%
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    686.6   640.5   -6.71%
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    421.6   424.4    0.66%
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    421.7   424.4    0.64%
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    455.5   458.1    0.57%

inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            935.2   843.2   -9.84%
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            935.2   843.3   -9.83%
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            935.2   843.1   -9.85%
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             857.0   781.1   -8.86%
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             856.9   781.1   -8.85%
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             856.9   781.0   -8.86%
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        938.9   846.8   -9.81%
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        938.8   847.0   -9.78%
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        938.9   847.0   -9.79%
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        711.2   661.6   -6.97%
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        711.2   661.6   -6.97%
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        711.2   661.6   -6.97%
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             846.1   771.5   -8.82%
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             845.9   771.5   -8.80%
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             846.2   772.1   -8.76%
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              767.8   710.3   -7.49%
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              767.8   710.4   -7.48%
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              767.4   710.4   -7.43%
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         856.6   775.6   -9.46%
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         856.5   775.1   -9.50%
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         856.6   775.2   -9.50%
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         623.3   589.9   -5.36%
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         623.3   590.0   -5.34%
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         623.3   589.7   -5.39%
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        939.8   846.9   -9.89%
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        939.8   847.0   -9.87%
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        939.9   846.9   -9.89%
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         860.8   784.9   -8.82%
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         860.7   784.8   -8.82%
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         860.8   784.9   -8.82%
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    942.7   852.2   -9.60%
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    942.7   852.1   -9.61%
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    942.8   852.1   -9.62%
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    714.9   667.0   -6.70%
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    715.0   666.9   -6.73%
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    715.0   666.9   -6.73%
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        707.9   667.2   -5.75%
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        707.9   667.3   -5.74%
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        707.9   667.2   -5.75%
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         630.6   604.8   -4.09%
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         630.7   604.9   -4.09%
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         630.6   604.8   -4.09%
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    711.7   671.1   -5.70%
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    711.9   671.1   -5.73%
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    711.8   671.2   -5.70%
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    485.2   486.2    0.21%
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    485.2   486.3    0.23%
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    485.2   486.3    0.23%
2024-10-16 11:04:14 +00:00
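The Delta column in the tables above appears to be the relative change from the Before timing to the After timing, so a negative value means the new code is faster. A minimal Python sketch of that arithmetic, assuming this interpretation (the helper name `delta_percent` is hypothetical and is not part of dav1d or checkasm):

```python
def delta_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent, rounded to 2 places."""
    return round((after - before) / before * 100.0, 2)


# Two rows copied from the listing above (Before, After).
rows = {
    "inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv": (371.9, 337.3),
    "inv_txfm_add_16x4_identity_identity_0_8bpc_rvv": (287.1, 297.0),
}

for name, (before, after) in rows.items():
    print(f"{name}: {delta_percent(before, after):.2f}%")
```

Under that assumption the two rows come out at -9.30% and 3.45%, matching the listed values. Other rows may differ in the last digit if the published figures were computed from unrounded timings.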
Nathan E. Egge 572c5a669d riscv: Fix argon test failure
This fixes md5sum mismatch in profile0_core/streams/test11168_11073.obu.
2024-10-13 18:11:27 +00:00
Nathan E. Egge and Luca Barbato cc7d8773ee riscv64/mc: Branchless vsetvl in blend_v function
Kendryte K230

blend_v_w2_8bpc_c:       221.4 ( 1.00x)
blend_v_w2_8bpc_rvv:     147.7 ( 1.50x)
blend_v_w4_8bpc_c:       945.3 ( 1.00x)
blend_v_w4_8bpc_rvv:     243.3 ( 3.89x)
blend_v_w8_8bpc_c:      1786.9 ( 1.00x)
blend_v_w8_8bpc_rvv:     256.1 ( 6.98x)
blend_v_w16_8bpc_c:     3472.1 ( 1.00x)
blend_v_w16_8bpc_rvv:    351.1 ( 9.89x)
blend_v_w32_8bpc_c:     6832.1 ( 1.00x)
blend_v_w32_8bpc_rvv:    635.4 (10.75x)

SpacemiT K1

blend_v_w2_8bpc_c:       218.0 ( 1.00x)
blend_v_w2_8bpc_rvv:     144.3 ( 1.51x)
blend_v_w4_8bpc_c:       921.7 ( 1.00x)
blend_v_w4_8bpc_rvv:     237.1 ( 3.89x)
blend_v_w8_8bpc_c:      1739.8 ( 1.00x)
blend_v_w8_8bpc_rvv:     237.4 ( 7.33x)
blend_v_w16_8bpc_c:     3376.6 ( 1.00x)
blend_v_w16_8bpc_rvv:    296.3 (11.40x)
blend_v_w32_8bpc_c:     6647.2 ( 1.00x)
blend_v_w32_8bpc_rvv:    408.1 (16.29x)
2024-10-09 16:18:42 +02:00
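The parenthesized factors in the listing above look like the C reference timing divided by the timing of the optimized function, with the `_c` row fixed at 1.00x. A short Python sketch under that assumption (the helper name `speedup` is hypothetical and is not taken from checkasm):

```python
def speedup(c_time: float, simd_time: float) -> float:
    """Ratio of the C timing to the optimized timing, rounded to 2 places."""
    return round(c_time / simd_time, 2)


# (C timing, RVV timing) pairs copied from the blend_v listing above.
pairs = {
    "K230 blend_v_w2_8bpc": (221.4, 147.7),
    "K230 blend_v_w32_8bpc": (6832.1, 635.4),
    "K1 blend_v_w2_8bpc": (218.0, 144.3),
}

for name, (c_time, rvv_time) in pairs.items():
    print(f"{name}: {speedup(c_time, rvv_time):.2f}x")
```

These three pairs give 1.50x, 10.75x and 1.51x, in line with the listed factors. The published numbers were probably derived from unrounded measurements, so small last-digit differences are possible for other rows.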
Nathan E. Egge and Luca Barbato 2da8107ec1 riscv64/mc: Branchless vsetvl in blend_h function
Kendryte K230

blend_h_w2_8bpc_c:        165.9 ( 1.00x)
blend_h_w2_8bpc_rvv:       83.8 ( 1.98x)
blend_h_w4_8bpc_c:        295.2 ( 1.00x)
blend_h_w4_8bpc_rvv:       83.8 ( 3.52x)
blend_h_w8_8bpc_c:        557.9 ( 1.00x)
blend_h_w8_8bpc_rvv:       92.5 ( 6.03x)
blend_h_w16_8bpc_c:      1078.8 ( 1.00x)
blend_h_w16_8bpc_rvv:     117.3 ( 9.19x)
blend_h_w32_8bpc_c:      2117.8 ( 1.00x)
blend_h_w32_8bpc_rvv:     200.5 (10.57x)
blend_h_w64_8bpc_c:      4194.7 ( 1.00x)
blend_h_w64_8bpc_rvv:     363.2 (11.55x)
blend_h_w128_8bpc_c:    10271.4 ( 1.00x)
blend_h_w128_8bpc_rvv:    844.5 (12.16x)

SpacemiT K1

blend_h_w2_8bpc_c:        162.5 ( 1.00x)
blend_h_w2_8bpc_rvv:       83.9 ( 1.94x)
blend_h_w4_8bpc_c:        288.6 ( 1.00x)
blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
blend_h_w8_8bpc_c:        544.7 ( 1.00x)
blend_h_w8_8bpc_rvv:       84.0 ( 6.48x)
blend_h_w16_8bpc_c:      1052.8 ( 1.00x)
blend_h_w16_8bpc_rvv:     102.9 (10.23x)
blend_h_w32_8bpc_c:      2068.0 ( 1.00x)
blend_h_w32_8bpc_rvv:     131.4 (15.73x)
blend_h_w64_8bpc_c:      4093.7 ( 1.00x)
blend_h_w64_8bpc_rvv:     220.3 (18.58x)
blend_h_w128_8bpc_c:    10023.1 ( 1.00x)
blend_h_w128_8bpc_rvv:    467.3 (21.45x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato b374b24c0f riscv64/mc: Branchless vsetvl in blend function
Kendryte K230

blend_w4_8bpc_c:       204.8 ( 1.00x)
blend_w4_8bpc_rvv:      59.8 ( 3.42x)
blend_w8_8bpc_c:       608.9 ( 1.00x)
blend_w8_8bpc_rvv:      87.2 ( 6.98x)
blend_w16_8bpc_c:     2362.4 ( 1.00x)
blend_w16_8bpc_rvv:    225.2 (10.49x)
blend_w32_8bpc_c:     5990.4 ( 1.00x)
blend_w32_8bpc_rvv:    518.3 (11.56x)

SpacemiT K1

blend_w4_8bpc_c:       201.6 ( 1.00x)
blend_w4_8bpc_rvv:      58.0 ( 3.48x)
blend_w8_8bpc_c:       595.1 ( 1.00x)
blend_w8_8bpc_rvv:      82.1 ( 7.25x)
blend_w16_8bpc_c:     2308.8 ( 1.00x)
blend_w16_8bpc_rvv:    189.0 (12.22x)
blend_w32_8bpc_c:     5853.1 ( 1.00x)
blend_w32_8bpc_rvv:    339.5 (17.24x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato 0e3f70e898 riscv64/mc: Add VLEN=256 8bpc RVV blend_v function
SpacemiT K1

blend_v_w2_8bpc_c:       217.0 ( 1.00x)
blend_v_w2_8bpc_rvv:     143.3 ( 1.51x)
blend_v_w4_8bpc_c:       921.6 ( 1.00x)
blend_v_w4_8bpc_rvv:     236.3 ( 3.90x)
blend_v_w8_8bpc_c:      1738.2 ( 1.00x)
blend_v_w8_8bpc_rvv:     238.1 ( 7.30x)
blend_v_w16_8bpc_c:     3376.1 ( 1.00x)
blend_v_w16_8bpc_rvv:    298.0 (11.33x)
blend_v_w32_8bpc_c:     6648.0 ( 1.00x)
blend_v_w32_8bpc_rvv:    409.5 (16.24x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato a5b9544866 riscv64/mc: Add VLEN=256 8bpc RVV blend_h function
SpacemiT K1

blend_h_w2_8bpc_c:        161.8 ( 1.00x)
blend_h_w2_8bpc_rvv:       83.5 ( 1.94x)
blend_h_w4_8bpc_c:        288.4 ( 1.00x)
blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
blend_h_w8_8bpc_c:        543.9 ( 1.00x)
blend_h_w8_8bpc_rvv:       84.5 ( 6.44x)
blend_h_w16_8bpc_c:      1051.6 ( 1.00x)
blend_h_w16_8bpc_rvv:     103.8 (10.13x)
blend_h_w32_8bpc_c:      2066.0 ( 1.00x)
blend_h_w32_8bpc_rvv:     133.8 (15.44x)
blend_h_w64_8bpc_c:      4092.7 ( 1.00x)
blend_h_w64_8bpc_rvv:     225.2 (18.18x)
blend_h_w128_8bpc_c:    10011.3 ( 1.00x)
blend_h_w128_8bpc_rvv:    474.7 (21.09x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato 83485c5092 riscv64/mc: Add VLEN=256 8bpc RVV blend function
SpacemiT K1

blend_w4_8bpc_c:       201.3 ( 1.00x)
blend_w4_8bpc_rvv:      59.3 ( 3.40x)
blend_w8_8bpc_c:       595.1 ( 1.00x)
blend_w8_8bpc_rvv:      84.1 ( 7.07x)
blend_w16_8bpc_c:     2309.0 ( 1.00x)
blend_w16_8bpc_rvv:    190.5 (12.12x)
blend_w32_8bpc_c:     5854.7 ( 1.00x)
blend_w32_8bpc_rvv:    341.6 (17.14x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato 7f2bb2fbc9 riscv: Move get_vlenb() from checkasm_ to dav1d_ 2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato 01da36ebdf riscv64/mc: Add 8bpc RVV blend_v function
Kendryte K230

blend_v_w2_8bpc_c:       219.6 ( 1.00x)
blend_v_w2_8bpc_rvv:     141.8 ( 1.55x)
blend_v_w4_8bpc_c:       942.9 ( 1.00x)
blend_v_w4_8bpc_rvv:     240.9 ( 3.91x)
blend_v_w8_8bpc_c:      1783.5 ( 1.00x)
blend_v_w8_8bpc_rvv:     254.7 ( 7.00x)
blend_v_w16_8bpc_c:     3466.5 ( 1.00x)
blend_v_w16_8bpc_rvv:    350.5 ( 9.89x)
blend_v_w32_8bpc_c:     6825.2 ( 1.00x)
blend_v_w32_8bpc_rvv:    635.1 (10.75x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato d3a94f1194 riscv64/mc: Add 8bpc RVV blend_h function
Kendryte K230

blend_h_w2_8bpc_c:        165.4 ( 1.00x)
blend_h_w2_8bpc_rvv:       79.4 ( 2.08x)
blend_h_w4_8bpc_c:        294.6 ( 1.00x)
blend_h_w4_8bpc_rvv:       81.5 ( 3.61x)
blend_h_w8_8bpc_c:        556.9 ( 1.00x)
blend_h_w8_8bpc_rvv:       90.2 ( 6.17x)
blend_h_w16_8bpc_c:      1077.6 ( 1.00x)
blend_h_w16_8bpc_rvv:     116.1 ( 9.29x)
blend_h_w32_8bpc_c:      2116.2 ( 1.00x)
blend_h_w32_8bpc_rvv:     200.5 (10.55x)
blend_h_w64_8bpc_c:      4191.8 ( 1.00x)
blend_h_w64_8bpc_rvv:     363.3 (11.54x)
blend_h_w128_8bpc_c:    10264.6 ( 1.00x)
blend_h_w128_8bpc_rvv:    844.1 (12.16x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato f851fcd0b4 riscv64/mc: Add 8bpc RVV blend function
Kendryte K230

blend_w4_8bpc_c:       204.5 ( 1.00x)
blend_w4_8bpc_rvv:      56.4 ( 3.62x)
blend_w8_8bpc_c:       608.6 ( 1.00x)
blend_w8_8bpc_rvv:      87.3 ( 6.97x)
blend_w16_8bpc_c:     2363.8 ( 1.00x)
blend_w16_8bpc_rvv:    225.1 (10.50x)
blend_w32_8bpc_c:     5990.3 ( 1.00x)
blend_w32_8bpc_rvv:    518.8 (11.55x)
2024-10-09 16:18:42 +02:00
Nathan E. Egge and Luca Barbato 38f74bdc46 riscv: Allow multiple .option arch with vararg ext 2024-10-09 16:18:42 +02:00
Nathan E. Egge 01b94cc33b cli: Prevent buffer over-read 2024-06-10 14:45:27 -04:00
Nathan E. Egge fc4763c5a4 riscv: Check for standards compliant RVV 1.0+ 2024-05-06 14:15:44 -04:00
Nathan E. Egge 0fff614a4c arm32/msac: Trim C functions, saves 1024 bytes 2024-03-08 20:26:46 +00:00
Nathan E. Egge b9f5333021 arm64/msac: Trim C functions, saves 1392 bytes 2024-03-08 20:16:13 +00:00
Nathan E. Egge b5b394cd6e arm: Use -fno-align-functions when building
arm32: 2 byte alignment saves 136 bytes
arm64: 4 byte alignment saves 1200 bytes
2024-03-08 16:52:21 +00:00
Nathan E. Egge 61d16e07ac arm32/itx: Trim dav1d_inv_wht4_1d_c, saves 68 bytes 2024-03-08 12:45:00 +00:00
Nathan E. Egge 485413b059 arm64/itx: Trim dav1d_inv_wht4_1d_c, saves 92 bytes 2024-03-08 12:45:00 +00:00
Nathan E. Egge ec695854f7 arm32/itx16: Add 4x4 12bpc NEON wht_wht transform
When -Dtrim_dsp=true, this commit saves 740 bytes.

inv_txfm_add_4x4_wht_wht_0_12bpc_c:       192.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_12bpc_neon:     46.1 ( 4.17x)
inv_txfm_add_4x4_wht_wht_1_12bpc_c:       192.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_12bpc_neon:     45.7 ( 4.21x)
2024-03-08 12:44:57 +00:00
Nathan E. Egge 3b852b15e9 arm64/itx16: Add 4x4 12bpc NEON wht_wht transform
When -Dtrim_dsp=true, this commit saves 940 bytes.

inv_txfm_add_4x4_wht_wht_0_12bpc_c:       145.2 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_12bpc_neon:     42.9 ( 3.39x)
inv_txfm_add_4x4_wht_wht_1_12bpc_c:       145.4 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_12bpc_neon:     42.9 ( 3.39x)
2024-03-08 12:41:06 +00:00
Nathan E. Egge b7963a7389 riscv64/itx: Add 16x16 8bpc eob test
Kendryte K230                                         Before          After

inv_txfm_add_16x16_adst_adst_0_8bpc_rvv:          1804.9 (8.45x)  1374.3 (11.18x)
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv:          1805.2 (8.45x)  1374.3 (11.17x)
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv:           1626.6 (8.92x)  1185.8 (12.22x)
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv:           1626.5 (8.91x)  1185.9 (12.22x)
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv:      1824.2 (8.38x)  1372.1 (11.22x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv:      1824.2 (8.37x)  1372.2 (11.21x)
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv:           1627.3 (8.94x)  1283.5 (11.29x)
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv:           1627.2 (8.95x)  1283.2 (11.29x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:            1449.3 (1.08x)  1095.2 ( 1.44x)
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv:            1449.1 (9.52x)  1095.1 (12.45x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv:       1643.0 (8.87x)  1283.5 (11.29x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv:       1643.3 (8.87x)  1283.3 (11.30x)
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv:       1155.4 (9.23x)   805.9 (13.17x)
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv:       1155.4 (9.24x)   805.9 (13.17x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv:      1812.2 (8.43x)  1370.9 (11.23x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv:      1811.7 (8.44x)  1370.8 (11.24x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv:       1637.2 (8.88x)  1190.8 (12.19x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv:       1637.6 (8.87x)  1190.9 (12.19x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv:  1831.1 (8.34x)  1374.7 (11.21x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv:  1830.8 (8.35x)  1374.5 (11.22x)
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv:       1156.2 (8.67x)   948.6 (10.49x)
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv:       1156.3 (8.68x)   948.6 (10.49x)
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv:   879.3 (7.81x)   673.5 (10.28x)
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv:   879.3 (7.81x)   673.5 (10.28x)
2024-02-27 04:47:36 -05:00
Nathan E. Egge 701225128a riscv64/itx: Add 8x16 8bpc eob test
Kendryte K230                                        Before          After

inv_txfm_add_8x16_adst_adst_0_8bpc_rvv:           853.9 ( 9.00x)  698.3 (11.03x)
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv:           853.8 ( 9.00x)  698.3 (11.03x)
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv:            763.0 ( 9.55x)  609.2 (12.00x)
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv:            763.1 ( 9.55x)  609.3 (11.94x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv:       857.1 ( 8.99x)  701.6 (11.00x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv:       856.8 ( 8.98x)  701.3 (10.97x)
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv:       622.9 ( 9.22x)  468.5 (12.36x)
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv:       622.9 ( 9.23x)  468.6 (12.37x)
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv:            770.1 ( 9.32x)  655.1 (10.93x)
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv:            770.1 ( 9.34x)  655.4 (10.93x)
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv:             679.8 ( 1.23x)  566.1 ( 1.48x)
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv:             679.8 ( 9.98x)  566.5 (11.89x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv:        771.1 ( 9.34x)  667.4 (10.75x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv:        771.1 ( 9.34x)  667.3 (10.76x)
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv:        532.3 ( 9.84x)  422.1 (12.42x)
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv:        532.4 ( 9.85x)  422.2 (12.40x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv:       858.4 ( 8.98x)  699.2 (11.03x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv:       858.5 ( 8.98x)  699.3 (11.03x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv:        768.6 ( 9.52x)  609.7 (11.97x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv:        768.4 ( 9.52x)  609.6 (11.97x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv:   866.5 ( 8.91x)  706.5 (10.92x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv:   866.4 ( 8.92x)  706.6 (10.95x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv:   621.9 ( 9.28x)  464.6 (12.46x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv:   621.8 ( 9.28x)  464.6 (12.46x)
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv:       584.9 ( 9.78x)  564.1 (10.12x)
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv:       584.8 ( 9.78x)  563.9 (10.12x)
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv:        495.0 (10.75x)  474.6 (11.13x)
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv:        494.3 (10.75x)  474.7 (11.12x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv:   588.1 ( 9.76x)  568.1 (10.07x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv:   588.7 ( 9.74x)  568.0 (10.07x)
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv:   349.5 (10.78x)  328.8 (11.46x)
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv:   349.4 (10.79x)  328.7 (11.46x)
2024-02-27 04:47:36 -05:00
Nathan E. Egge afeeb3cc90 riscv64/itx: Add 4x16 8bpc eob test
Kendryte K230                                        Before         After

inv_txfm_add_4x16_adst_adst_0_8bpc_rvv:           429.9 (7.45x)  381.3 (8.58x)
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv:           430.0 (7.45x)  381.3 (8.57x)
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv:            381.0 (8.01x)  332.5 (9.19x)
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv:            381.0 (8.00x)  332.5 (9.19x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv:       432.8 (7.42x)  384.5 (8.52x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv:       432.8 (7.42x)  384.4 (8.52x)
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv:       304.6 (7.32x)  249.8 (9.18x)
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv:       304.5 (7.32x)  249.8 (9.18x)
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv:            407.2 (7.68x)  371.4 (8.57x)
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv:            407.1 (7.68x)  371.5 (8.58x)
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv:             357.9 (1.27x)  323.1 (1.41x)
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv:             357.9 (8.29x)  322.9 (9.16x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv:        410.0 (7.62x)  376.6 (8.45x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv:        410.0 (7.62x)  376.5 (8.47x)
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv:        275.2 (7.79x)  240.5 (9.21x)
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv:        275.3 (7.78x)  240.6 (9.19x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv:       430.5 (7.51x)  382.6 (8.60x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv:       430.1 (7.52x)  382.8 (8.60x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv:        381.1 (8.09x)  333.8 (9.21x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv:        381.0 (8.08x)  333.7 (9.21x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv:   433.0 (7.48x)  385.7 (8.55x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv:   433.0 (7.48x)  385.7 (8.55x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv:   298.6 (7.57x)  250.8 (9.28x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv:   298.6 (7.57x)  250.9 (9.27x)
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv:       361.5 (7.93x)  347.3 (8.35x)
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv:       361.4 (7.93x)  347.4 (8.35x)
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv:        310.9 (8.69x)  297.8 (9.02x)
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv:        311.0 (8.69x)  297.8 (9.02x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv:   364.1 (7.88x)  350.5 (8.29x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv:   364.2 (7.88x)  350.4 (8.31x)
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv:   229.7 (8.22x)  211.4 (9.11x)
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv:   229.7 (8.21x)  211.2 (9.12x)
2024-02-27 04:47:36 -05:00
Nathan E. Egge 52948bbfcc riscv/checkasm: Print the RVV vector length, if available 2024-02-26 18:07:11 -05:00
Nathan E. Egge 8c209190bb arm/msac: Enable NEON optimizations on more platforms
This commit enables msac NEON assembly optimizations when building with
MSVC targeting ARM.
Note, the test for __APPLE__ is redundant and added for consistency.
2024-02-26 19:46:54 +00:00
Nathan E. Egge 2ab2ec388e riscv64/itx: Fix build issues with clang 2024-02-22 11:50:56 -05:00
Nathan E. Egge 7be30df413 arm64/itx16: Reuse horz_16x4 epilog, saves 96 bytes 2024-02-21 06:58:56 -05:00
Nathan E. Egge 28c7e530b1 arm32/itx16: Reuse horz_16x2 epilog, saves 24 bytes 2024-02-21 06:58:56 -05:00
Nathan E. Egge 4bb0005ca7 riscv64/itx: Reuse horz_16x8 epilog, saves 94 bytes 2024-02-21 06:58:56 -05:00
Nathan E. Egge f15b073156 arm64/itx: Reuse horz_16x8 epilog, saves 512 bytes 2024-02-21 06:58:56 -05:00
Nathan E. Egge 6249bd8809 arm32/itx: Reuse horz_16x4 epilog, saves 336 bytes 2024-02-21 06:58:56 -05:00
Nathan E. Egge 6e5d1df633 riscv64/itx: Reuse 16x8 epilog, saves 706 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 4585763474 riscv64/itx: Reuse 8x16 epilog, saves 24 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge be47dfcd3f riscv64/itx: Tail call vert_8x16, saves 1086 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 1830c9b598 riscv64/itx: Reuse 16x4 epilog, saves 354 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 311816b46d riscv64/itx: Reuse 4x16 epilog, saves 642 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge e7378375b5 riscv64/itx: Fix unrolled .irp loops, saves 12 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge d4746c908c arm32/itx: Remove 16x8 variant, saves 528 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 1ed35d2cbe arm32/itx: Reuse 8x16 epilog, saves 48 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 5250a16f21 arm32/itx: Reuse 16x4 epilog, saves 220 bytes
Unlike arm64, it is not possible to fold the vmov instructions into the
 transpose_4x8h macro so this commit _adds_ 4 vmov instructions to the
 code paths of the twelve 16x4 transforms that do not start with idtx.
2024-02-21 06:58:55 -05:00
Nathan E. Egge 9944ce3080 arm32/itx: Reuse 4x16 epilog, saves 268 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 5020162d63 arm64/itx: Reuse 16x8 epilog, saves 568 bytes
Move the vertical transpose-transform-store operations to the end of
 inv_txfm_add_16x8_neon and fold the mov instructions into the second
 transpose.
Only the four *16x8_identity* functions are modified and this commit
 _removes_ 8 mov instructions from these code paths.
2024-02-21 06:58:55 -05:00
Nathan E. Egge 89c53031a8 arm64: Add transpose_8x8h_mov macro 2024-02-21 06:58:55 -05:00
Nathan E. Egge 80806b57b0 arm64/itx: Reuse 8x16 epilog, saves 424 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge 3335c5ebdb arm64/itx: Reuse 16x4 epilog, saves 264 bytes
Move the vertical transpose-transform-store operations to the end of
 inv_txfm_add_16x4_neon and fold the mov instructions into the second
 transpose.
Only the four *16x4_identity* functions are modified and this commit
 _removes_ 4 mov instructions from these code paths.
2024-02-21 06:58:55 -05:00
Nathan E. Egge 57e46dd955 arm64: Add transpose_4x8h_mov macro 2024-02-21 06:58:55 -05:00
Nathan E. Egge 955939f762 arm64/itx: Reuse 4x16 epilog, saves 312 bytes 2024-02-21 06:58:55 -05:00
Nathan E. Egge c15f7ecd46 riscv64/itx: Add 16x8 8bpp RVV transforms
inv_txfm_add_16x8_adst_adst_0_8bpc_c:             7638.9 ( 1.00x)
inv_txfm_add_16x8_adst_adst_0_8bpc_rvv:            854.4 ( 8.94x)
inv_txfm_add_16x8_adst_adst_1_8bpc_c:             7650.5 ( 1.00x)
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv:            854.4 ( 8.95x)
inv_txfm_add_16x8_adst_adst_2_8bpc_c:             7649.4 ( 1.00x)
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv:            854.4 ( 8.95x)
inv_txfm_add_16x8_adst_dct_0_8bpc_c:              7182.0 ( 1.00x)
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv:             758.1 ( 9.47x)
inv_txfm_add_16x8_adst_dct_1_8bpc_c:              7175.6 ( 1.00x)
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv:             758.1 ( 9.47x)
inv_txfm_add_16x8_adst_dct_2_8bpc_c:              7181.7 ( 1.00x)
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv:             758.0 ( 9.47x)
inv_txfm_add_16x8_adst_flipadst_0_8bpc_c:         7671.7 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv:        858.3 ( 8.94x)
inv_txfm_add_16x8_adst_flipadst_1_8bpc_c:         7671.5 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv:        858.1 ( 8.94x)
inv_txfm_add_16x8_adst_flipadst_2_8bpc_c:         7673.8 ( 1.00x)
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv:        858.2 ( 8.94x)
inv_txfm_add_16x8_adst_identity_0_8bpc_c:         5727.4 ( 1.00x)
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv:        612.6 ( 9.35x)
inv_txfm_add_16x8_adst_identity_1_8bpc_c:         5709.0 ( 1.00x)
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv:        612.6 ( 9.32x)
inv_txfm_add_16x8_adst_identity_2_8bpc_c:         5709.6 ( 1.00x)
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv:        612.5 ( 9.32x)
inv_txfm_add_16x8_dct_adst_0_8bpc_c:              7272.9 ( 1.00x)
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv:             761.2 ( 9.55x)
inv_txfm_add_16x8_dct_adst_1_8bpc_c:              7276.0 ( 1.00x)
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv:             761.0 ( 9.56x)
inv_txfm_add_16x8_dct_adst_2_8bpc_c:              7271.5 ( 1.00x)
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv:             761.0 ( 9.55x)
inv_txfm_add_16x8_dct_dct_0_8bpc_c:                822.4 ( 1.00x)
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv:              666.4 ( 1.23x)
inv_txfm_add_16x8_dct_dct_1_8bpc_c:               6791.3 ( 1.00x)
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv:              666.6 (10.19x)
inv_txfm_add_16x8_dct_dct_2_8bpc_c:               6786.0 ( 1.00x)
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv:              666.5 (10.18x)
inv_txfm_add_16x8_dct_flipadst_0_8bpc_c:          7280.7 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv:         764.8 ( 9.52x)
inv_txfm_add_16x8_dct_flipadst_1_8bpc_c:          7279.0 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv:         765.0 ( 9.52x)
inv_txfm_add_16x8_dct_flipadst_2_8bpc_c:          7282.0 ( 1.00x)
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv:         764.8 ( 9.52x)
inv_txfm_add_16x8_dct_identity_0_8bpc_c:          5340.5 ( 1.00x)
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv:         520.4 (10.26x)
inv_txfm_add_16x8_dct_identity_1_8bpc_c:          5342.2 ( 1.00x)
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv:         521.0 (10.25x)
inv_txfm_add_16x8_dct_identity_2_8bpc_c:          5341.7 ( 1.00x)
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv:         520.9 (10.25x)
inv_txfm_add_16x8_flipadst_adst_0_8bpc_c:         7671.5 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv:        855.3 ( 8.97x)
inv_txfm_add_16x8_flipadst_adst_1_8bpc_c:         7663.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv:        855.3 ( 8.96x)
inv_txfm_add_16x8_flipadst_adst_2_8bpc_c:         7663.4 ( 1.00x)
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv:        855.2 ( 8.96x)
inv_txfm_add_16x8_flipadst_dct_0_8bpc_c:          7185.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv:         760.2 ( 9.45x)
inv_txfm_add_16x8_flipadst_dct_1_8bpc_c:          7185.4 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv:         760.2 ( 9.45x)
inv_txfm_add_16x8_flipadst_dct_2_8bpc_c:          7185.3 ( 1.00x)
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv:         760.4 ( 9.45x)
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_c:     7686.6 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv:    859.1 ( 8.95x)
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_c:     7687.9 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv:    859.2 ( 8.95x)
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_c:     7684.5 ( 1.00x)
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv:    859.0 ( 8.95x)
inv_txfm_add_16x8_flipadst_identity_0_8bpc_c:     5723.1 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv:    615.7 ( 9.30x)
inv_txfm_add_16x8_flipadst_identity_1_8bpc_c:     5725.1 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv:    615.6 ( 9.30x)
inv_txfm_add_16x8_flipadst_identity_2_8bpc_c:     5713.0 ( 1.00x)
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv:    615.6 ( 9.28x)
inv_txfm_add_16x8_identity_adst_0_8bpc_c:         5390.1 ( 1.00x)
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv:        617.9 ( 8.72x)
inv_txfm_add_16x8_identity_adst_1_8bpc_c:         5388.8 ( 1.00x)
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv:        617.7 ( 8.72x)
inv_txfm_add_16x8_identity_adst_2_8bpc_c:         5390.0 ( 1.00x)
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv:        617.7 ( 8.73x)
inv_txfm_add_16x8_identity_dct_0_8bpc_c:          4919.0 ( 1.00x)
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv:         522.9 ( 9.41x)
inv_txfm_add_16x8_identity_dct_1_8bpc_c:          4916.6 ( 1.00x)
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv:         523.0 ( 9.40x)
inv_txfm_add_16x8_identity_dct_2_8bpc_c:          4918.6 ( 1.00x)
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv:         523.0 ( 9.40x)
inv_txfm_add_16x8_identity_flipadst_0_8bpc_c:     5402.3 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv:    621.7 ( 8.69x)
inv_txfm_add_16x8_identity_flipadst_1_8bpc_c:     5402.1 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv:    621.3 ( 8.69x)
inv_txfm_add_16x8_identity_flipadst_2_8bpc_c:     5401.6 ( 1.00x)
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv:    621.6 ( 8.69x)
inv_txfm_add_16x8_identity_identity_0_8bpc_c:     3436.1 ( 1.00x)
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv:    377.8 ( 9.09x)
inv_txfm_add_16x8_identity_identity_1_8bpc_c:     3436.3 ( 1.00x)
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv:    377.9 ( 9.09x)
inv_txfm_add_16x8_identity_identity_2_8bpc_c:     3436.1 ( 1.00x)
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv:    377.8 ( 9.09x)
2024-02-19 10:04:54 -05:00
Nathan E. Egge 5ca7a025be riscv64/itx: Add 8x16 8bpc RVV transforms
inv_txfm_add_8x16_adst_adst_0_8bpc_c:             7682.3 ( 1.00x)
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv:            842.2 ( 9.12x)
inv_txfm_add_8x16_adst_adst_1_8bpc_c:             7682.0 ( 1.00x)
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv:            842.1 ( 9.12x)
inv_txfm_add_8x16_adst_adst_2_8bpc_c:             7681.6 ( 1.00x)
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv:            842.2 ( 9.12x)
inv_txfm_add_8x16_adst_dct_0_8bpc_c:              7309.0 ( 1.00x)
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv:             752.9 ( 9.71x)
inv_txfm_add_8x16_adst_dct_1_8bpc_c:              7317.4 ( 1.00x)
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv:             752.9 ( 9.72x)
inv_txfm_add_8x16_adst_dct_2_8bpc_c:              7323.6 ( 1.00x)
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv:             753.0 ( 9.73x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_c:         7686.5 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv:        846.7 ( 9.08x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_c:         7686.7 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv:        846.6 ( 9.08x)
inv_txfm_add_8x16_adst_flipadst_2_8bpc_c:         7688.0 ( 1.00x)
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv:        846.6 ( 9.08x)
inv_txfm_add_8x16_adst_identity_0_8bpc_c:         5742.6 ( 1.00x)
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv:        608.6 ( 9.44x)
inv_txfm_add_8x16_adst_identity_1_8bpc_c:         5741.5 ( 1.00x)
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv:        608.7 ( 9.43x)
inv_txfm_add_8x16_adst_identity_2_8bpc_c:         5743.3 ( 1.00x)
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv:        608.4 ( 9.44x)
inv_txfm_add_8x16_dct_adst_0_8bpc_c:              7229.8 ( 1.00x)
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv:             756.3 ( 9.56x)
inv_txfm_add_8x16_dct_adst_1_8bpc_c:              7227.7 ( 1.00x)
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv:             756.3 ( 9.56x)
inv_txfm_add_8x16_dct_adst_2_8bpc_c:              7229.0 ( 1.00x)
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv:             756.3 ( 9.56x)
inv_txfm_add_8x16_dct_dct_0_8bpc_c:                839.3 ( 1.00x)
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv:              667.4 ( 1.26x)
inv_txfm_add_8x16_dct_dct_1_8bpc_c:               6842.7 ( 1.00x)
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv:              667.4 (10.25x)
inv_txfm_add_8x16_dct_dct_2_8bpc_c:               6845.3 ( 1.00x)
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv:              667.4 (10.26x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_c:          7222.3 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv:         760.4 ( 9.50x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_c:          7222.7 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv:         760.4 ( 9.50x)
inv_txfm_add_8x16_dct_flipadst_2_8bpc_c:          7222.2 ( 1.00x)
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv:         760.4 ( 9.50x)
inv_txfm_add_8x16_dct_identity_0_8bpc_c:          5286.1 ( 1.00x)
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv:         521.4 (10.14x)
inv_txfm_add_8x16_dct_identity_1_8bpc_c:          5283.2 ( 1.00x)
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv:         521.4 (10.13x)
inv_txfm_add_8x16_dct_identity_2_8bpc_c:          5285.7 ( 1.00x)
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv:         521.3 (10.14x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_c:         7701.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv:        845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_c:         7702.5 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv:        845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_adst_2_8bpc_c:         7708.0 ( 1.00x)
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv:        845.7 ( 9.11x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_c:          7331.0 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv:         758.9 ( 9.66x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_c:          7327.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv:         758.8 ( 9.66x)
inv_txfm_add_8x16_flipadst_dct_2_8bpc_c:          7326.8 ( 1.00x)
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv:         758.7 ( 9.66x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_c:     7707.7 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv:    855.8 ( 9.01x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_c:     7708.1 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv:    855.5 ( 9.01x)
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_c:     7708.1 ( 1.00x)
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv:    855.7 ( 9.01x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_c:     5764.4 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv:    611.8 ( 9.42x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_c:     5766.6 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv:    611.8 ( 9.43x)
inv_txfm_add_8x16_flipadst_identity_2_8bpc_c:     5763.2 ( 1.00x)
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv:    611.8 ( 9.42x)
inv_txfm_add_8x16_identity_adst_0_8bpc_c:         5719.2 ( 1.00x)
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv:        574.0 ( 9.96x)
inv_txfm_add_8x16_identity_adst_1_8bpc_c:         5719.2 ( 1.00x)
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv:        574.0 ( 9.96x)
inv_txfm_add_8x16_identity_adst_2_8bpc_c:         5721.1 ( 1.00x)
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv:        574.0 ( 9.97x)
inv_txfm_add_8x16_identity_dct_0_8bpc_c:          5344.9 ( 1.00x)
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv:         484.9 (11.02x)
inv_txfm_add_8x16_identity_dct_1_8bpc_c:          5341.4 ( 1.00x)
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv:         484.2 (11.03x)
inv_txfm_add_8x16_identity_dct_2_8bpc_c:          5342.9 ( 1.00x)
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv:         484.9 (11.02x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_c:     5729.5 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv:    577.8 ( 9.92x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_c:     5731.1 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv:    578.3 ( 9.91x)
inv_txfm_add_8x16_identity_flipadst_2_8bpc_c:     5730.1 ( 1.00x)
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv:    578.2 ( 9.91x)
inv_txfm_add_8x16_identity_identity_0_8bpc_c:     3779.3 ( 1.00x)
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv:    338.8 (11.15x)
inv_txfm_add_8x16_identity_identity_1_8bpc_c:     3779.2 ( 1.00x)
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv:    338.8 (11.16x)
inv_txfm_add_8x16_identity_identity_2_8bpc_c:     3779.3 ( 1.00x)
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv:    338.7 (11.16x)
2024-02-19 10:04:54 -05:00
Nathan E. Egge ce7cd2855b riscv64/itx: Use registers above v15 in iadst_8 macro 2024-02-19 10:04:54 -05:00
Nathan E. Egge e4ed80bc5a riscv64/itx: Add 16x4 8bpc RVV transforms
inv_txfm_add_16x4_adst_adst_0_8bpc_c:             3132.7 ( 1.00x)
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv:            427.3 ( 7.33x)
inv_txfm_add_16x4_adst_adst_1_8bpc_c:             3120.8 ( 1.00x)
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv:            427.1 ( 7.31x)
inv_txfm_add_16x4_adst_adst_2_8bpc_c:             3119.4 ( 1.00x)
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv:            427.2 ( 7.30x)
inv_txfm_add_16x4_adst_dct_0_8bpc_c:              3063.0 ( 1.00x)
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv:             405.3 ( 7.56x)
inv_txfm_add_16x4_adst_dct_1_8bpc_c:              3063.4 ( 1.00x)
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv:             405.4 ( 7.56x)
inv_txfm_add_16x4_adst_dct_2_8bpc_c:              3062.7 ( 1.00x)
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv:             405.4 ( 7.56x)
inv_txfm_add_16x4_adst_flipadst_0_8bpc_c:         3166.7 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv:        430.9 ( 7.35x)
inv_txfm_add_16x4_adst_flipadst_1_8bpc_c:         3160.9 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv:        430.7 ( 7.34x)
inv_txfm_add_16x4_adst_flipadst_2_8bpc_c:         3160.7 ( 1.00x)
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv:        430.2 ( 7.35x)
inv_txfm_add_16x4_adst_identity_0_8bpc_c:         2958.9 ( 1.00x)
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv:        365.2 ( 8.10x)
inv_txfm_add_16x4_adst_identity_1_8bpc_c:         2955.2 ( 1.00x)
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv:        365.2 ( 8.09x)
inv_txfm_add_16x4_adst_identity_2_8bpc_c:         2961.4 ( 1.00x)
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv:        365.2 ( 8.11x)
inv_txfm_add_16x4_dct_adst_0_8bpc_c:              2928.8 ( 1.00x)
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv:             378.5 ( 7.74x)
inv_txfm_add_16x4_dct_adst_1_8bpc_c:              2930.5 ( 1.00x)
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv:             378.6 ( 7.74x)
inv_txfm_add_16x4_dct_adst_2_8bpc_c:              2942.7 ( 1.00x)
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv:             378.6 ( 7.77x)
inv_txfm_add_16x4_dct_dct_0_8bpc_c:                438.8 ( 1.00x)
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv:              356.8 ( 1.23x)
inv_txfm_add_16x4_dct_dct_1_8bpc_c:               2871.7 ( 1.00x)
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv:              356.7 ( 8.05x)
inv_txfm_add_16x4_dct_dct_2_8bpc_c:               2862.9 ( 1.00x)
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv:              356.7 ( 8.03x)
inv_txfm_add_16x4_dct_flipadst_0_8bpc_c:          2965.8 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv:         380.6 ( 7.79x)
inv_txfm_add_16x4_dct_flipadst_1_8bpc_c:          2964.8 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv:         381.1 ( 7.78x)
inv_txfm_add_16x4_dct_flipadst_2_8bpc_c:          2966.1 ( 1.00x)
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv:         381.0 ( 7.78x)
inv_txfm_add_16x4_dct_identity_0_8bpc_c:          2760.8 ( 1.00x)
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv:         310.7 ( 8.89x)
inv_txfm_add_16x4_dct_identity_1_8bpc_c:          2760.8 ( 1.00x)
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv:         310.7 ( 8.89x)
inv_txfm_add_16x4_dct_identity_2_8bpc_c:          2760.4 ( 1.00x)
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv:         310.7 ( 8.88x)
inv_txfm_add_16x4_flipadst_adst_0_8bpc_c:         3140.5 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv:        430.7 ( 7.29x)
inv_txfm_add_16x4_flipadst_adst_1_8bpc_c:         3138.3 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv:        430.7 ( 7.29x)
inv_txfm_add_16x4_flipadst_adst_2_8bpc_c:         3139.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv:        430.5 ( 7.29x)
inv_txfm_add_16x4_flipadst_dct_0_8bpc_c:          3060.7 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv:         408.9 ( 7.48x)
inv_txfm_add_16x4_flipadst_dct_1_8bpc_c:          3059.8 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv:         408.9 ( 7.48x)
inv_txfm_add_16x4_flipadst_dct_2_8bpc_c:          3063.6 ( 1.00x)
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv:         408.9 ( 7.49x)
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_c:     3170.7 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv:    433.1 ( 7.32x)
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_c:     3169.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv:    433.0 ( 7.32x)
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_c:     3175.1 ( 1.00x)
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv:    433.2 ( 7.33x)
inv_txfm_add_16x4_flipadst_identity_0_8bpc_c:     2954.0 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv:    362.1 ( 8.16x)
inv_txfm_add_16x4_flipadst_identity_1_8bpc_c:     2949.5 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv:    362.4 ( 8.14x)
inv_txfm_add_16x4_flipadst_identity_2_8bpc_c:     2950.6 ( 1.00x)
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv:    362.5 ( 8.14x)
inv_txfm_add_16x4_identity_adst_0_8bpc_c:         1977.4 ( 1.00x)
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv:        296.6 ( 6.67x)
inv_txfm_add_16x4_identity_adst_1_8bpc_c:         1977.3 ( 1.00x)
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv:        296.6 ( 6.67x)
inv_txfm_add_16x4_identity_adst_2_8bpc_c:         1977.4 ( 1.00x)
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv:        296.6 ( 6.67x)
inv_txfm_add_16x4_identity_dct_0_8bpc_c:          1917.3 ( 1.00x)
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv:         276.2 ( 6.94x)
inv_txfm_add_16x4_identity_dct_1_8bpc_c:          1915.6 ( 1.00x)
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv:         276.2 ( 6.94x)
inv_txfm_add_16x4_identity_dct_2_8bpc_c:          1917.2 ( 1.00x)
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv:         276.1 ( 6.94x)
inv_txfm_add_16x4_identity_flipadst_0_8bpc_c:     2017.0 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv:    305.8 ( 6.60x)
inv_txfm_add_16x4_identity_flipadst_1_8bpc_c:     2017.4 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv:    305.7 ( 6.60x)
inv_txfm_add_16x4_identity_flipadst_2_8bpc_c:     2017.0 ( 1.00x)
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv:    305.8 ( 6.60x)
inv_txfm_add_16x4_identity_identity_0_8bpc_c:     1803.4 ( 1.00x)
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv:    228.6 ( 7.89x)
inv_txfm_add_16x4_identity_identity_1_8bpc_c:     1803.6 ( 1.00x)
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv:    228.6 ( 7.89x)
inv_txfm_add_16x4_identity_identity_2_8bpc_c:     1803.0 ( 1.00x)
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv:    228.6 ( 7.89x)
2024-02-19 10:04:54 -05:00
Nathan E. Egge 83423b3484 riscv64/itx: Add 4x16 8bpc RVV transforms
inv_txfm_add_4x16_adst_adst_0_8bpc_c:             3310.8 ( 1.00x)
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv:            429.3 ( 7.71x)
inv_txfm_add_4x16_adst_adst_1_8bpc_c:             3308.6 ( 1.00x)
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv:            429.3 ( 7.71x)
inv_txfm_add_4x16_adst_adst_2_8bpc_c:             3308.2 ( 1.00x)
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv:            429.3 ( 7.71x)
inv_txfm_add_4x16_adst_dct_0_8bpc_c:              3097.6 ( 1.00x)
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv:             381.5 ( 8.12x)
inv_txfm_add_4x16_adst_dct_1_8bpc_c:              3097.6 ( 1.00x)
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv:             381.0 ( 8.13x)
inv_txfm_add_4x16_adst_dct_2_8bpc_c:              3096.4 ( 1.00x)
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv:             381.5 ( 8.12x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_c:         3309.4 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv:        433.5 ( 7.64x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_c:         3306.9 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv:        433.4 ( 7.63x)
inv_txfm_add_4x16_adst_flipadst_2_8bpc_c:         3308.5 ( 1.00x)
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv:        433.6 ( 7.63x)
inv_txfm_add_4x16_adst_identity_0_8bpc_c:         2330.0 ( 1.00x)
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv:        298.4 ( 7.81x)
inv_txfm_add_4x16_adst_identity_1_8bpc_c:         2329.4 ( 1.00x)
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv:        298.4 ( 7.81x)
inv_txfm_add_4x16_adst_identity_2_8bpc_c:         2329.7 ( 1.00x)
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv:        298.3 ( 7.81x)
inv_txfm_add_4x16_dct_adst_0_8bpc_c:              3186.5 ( 1.00x)
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv:             408.0 ( 7.81x)
inv_txfm_add_4x16_dct_adst_1_8bpc_c:              3190.3 ( 1.00x)
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv:             408.0 ( 7.82x)
inv_txfm_add_4x16_dct_adst_2_8bpc_c:              3184.9 ( 1.00x)
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv:             408.1 ( 7.80x)
inv_txfm_add_4x16_dct_dct_0_8bpc_c:                455.3 ( 1.00x)
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv:              360.0 ( 1.26x)
inv_txfm_add_4x16_dct_dct_1_8bpc_c:               2974.0 ( 1.00x)
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv:              359.9 ( 8.26x)
inv_txfm_add_4x16_dct_dct_2_8bpc_c:               2975.4 ( 1.00x)
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv:              359.9 ( 8.27x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_c:          3190.7 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv:         412.2 ( 7.74x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_c:          3190.9 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv:         412.3 ( 7.74x)
inv_txfm_add_4x16_dct_flipadst_2_8bpc_c:          3192.7 ( 1.00x)
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv:         412.2 ( 7.75x)
inv_txfm_add_4x16_dct_identity_0_8bpc_c:          2208.3 ( 1.00x)
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv:         277.2 ( 7.97x)
inv_txfm_add_4x16_dct_identity_1_8bpc_c:          2206.6 ( 1.00x)
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv:         277.2 ( 7.96x)
inv_txfm_add_4x16_dct_identity_2_8bpc_c:          2205.9 ( 1.00x)
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv:         277.1 ( 7.96x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_c:         3329.2 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv:        429.7 ( 7.75x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_c:         3328.1 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv:        430.3 ( 7.73x)
inv_txfm_add_4x16_flipadst_adst_2_8bpc_c:         3331.1 ( 1.00x)
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv:        430.3 ( 7.74x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_c:          3119.8 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv:         381.6 ( 8.18x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_c:          3119.7 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv:         381.6 ( 8.17x)
inv_txfm_add_4x16_flipadst_dct_2_8bpc_c:          3119.0 ( 1.00x)
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv:         381.7 ( 8.17x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_c:     3329.8 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv:    433.7 ( 7.68x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_c:     3328.3 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv:    433.7 ( 7.67x)
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_c:     3328.2 ( 1.00x)
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv:    433.6 ( 7.67x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_c:     2350.4 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv:    299.2 ( 7.86x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_c:     2353.5 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv:    299.1 ( 7.87x)
inv_txfm_add_4x16_flipadst_identity_2_8bpc_c:     2352.5 ( 1.00x)
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv:    299.1 ( 7.87x)
inv_txfm_add_4x16_identity_adst_0_8bpc_c:         2967.8 ( 1.00x)
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv:        360.7 ( 8.23x)
inv_txfm_add_4x16_identity_adst_1_8bpc_c:         2965.5 ( 1.00x)
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv:        360.7 ( 8.22x)
inv_txfm_add_4x16_identity_adst_2_8bpc_c:         2964.5 ( 1.00x)
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv:        360.4 ( 8.23x)
inv_txfm_add_4x16_identity_dct_0_8bpc_c:          2758.0 ( 1.00x)
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv:         313.2 ( 8.81x)
inv_txfm_add_4x16_identity_dct_1_8bpc_c:          2757.3 ( 1.00x)
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv:         313.2 ( 8.80x)
inv_txfm_add_4x16_identity_dct_2_8bpc_c:          2758.4 ( 1.00x)
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv:         313.1 ( 8.81x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_c:     2968.3 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv:    364.6 ( 8.14x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_c:     2965.2 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv:    364.6 ( 8.13x)
inv_txfm_add_4x16_identity_flipadst_2_8bpc_c:     2968.5 ( 1.00x)
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv:    364.6 ( 8.14x)
inv_txfm_add_4x16_identity_identity_0_8bpc_c:     1985.7 ( 1.00x)
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv:    229.3 ( 8.66x)
inv_txfm_add_4x16_identity_identity_1_8bpc_c:     1985.4 ( 1.00x)
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv:    229.6 ( 8.65x)
inv_txfm_add_4x16_identity_identity_2_8bpc_c:     1985.7 ( 1.00x)
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv:    229.4 ( 8.66x)
2024-02-19 10:04:54 -05:00
Nathan E. Egge 40d5b50552 riscv64/itx: Use registers above v15 in iadst_4 macro 2024-02-19 10:04:54 -05:00
Nathan E. Egge 27e5e2629c riscv64/itx: Add 8x4 8bpc RVV transforms
inv_txfm_add_8x4_adst_adst_0_8bpc_c:             1600.6 ( 1.00x)
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv:            199.2 ( 8.03x)
inv_txfm_add_8x4_adst_adst_1_8bpc_c:             1602.3 ( 1.00x)
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv:            199.2 ( 8.04x)
inv_txfm_add_8x4_adst_dct_0_8bpc_c:              1551.1 ( 1.00x)
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv:             193.6 ( 8.01x)
inv_txfm_add_8x4_adst_dct_1_8bpc_c:              1550.7 ( 1.00x)
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv:             193.6 ( 8.01x)
inv_txfm_add_8x4_adst_flipadst_0_8bpc_c:         1609.9 ( 1.00x)
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv:        200.7 ( 8.02x)
inv_txfm_add_8x4_adst_flipadst_1_8bpc_c:         1608.4 ( 1.00x)
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv:        200.7 ( 8.01x)
inv_txfm_add_8x4_adst_identity_0_8bpc_c:         1518.1 ( 1.00x)
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv:        168.6 ( 9.00x)
inv_txfm_add_8x4_adst_identity_1_8bpc_c:         1518.0 ( 1.00x)
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv:        168.6 ( 9.00x)
inv_txfm_add_8x4_dct_adst_0_8bpc_c:              1474.6 ( 1.00x)
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv:             176.1 ( 8.37x)
inv_txfm_add_8x4_dct_adst_1_8bpc_c:              1474.4 ( 1.00x)
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv:             176.1 ( 8.37x)
inv_txfm_add_8x4_dct_dct_0_8bpc_c:                256.5 ( 1.00x)
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv:              170.5 ( 1.50x)
inv_txfm_add_8x4_dct_dct_1_8bpc_c:               1450.1 ( 1.00x)
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv:              170.5 ( 8.50x)
inv_txfm_add_8x4_dct_flipadst_0_8bpc_c:          1489.6 ( 1.00x)
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv:         177.5 ( 8.39x)
inv_txfm_add_8x4_dct_flipadst_1_8bpc_c:          1488.6 ( 1.00x)
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv:         177.5 ( 8.38x)
inv_txfm_add_8x4_dct_identity_0_8bpc_c:          1396.3 ( 1.00x)
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv:         145.5 ( 9.60x)
inv_txfm_add_8x4_dct_identity_1_8bpc_c:          1395.7 ( 1.00x)
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv:         145.5 ( 9.59x)
inv_txfm_add_8x4_flipadst_adst_0_8bpc_c:         1596.5 ( 1.00x)
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv:        200.5 ( 7.96x)
inv_txfm_add_8x4_flipadst_adst_1_8bpc_c:         1596.0 ( 1.00x)
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv:        200.5 ( 7.96x)
inv_txfm_add_8x4_flipadst_dct_0_8bpc_c:          1554.8 ( 1.00x)
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv:         194.8 ( 7.98x)
inv_txfm_add_8x4_flipadst_dct_1_8bpc_c:          1556.5 ( 1.00x)
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv:         194.8 ( 7.99x)
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_c:     1613.3 ( 1.00x)
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv:    206.7 ( 7.80x)
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_c:     1612.1 ( 1.00x)
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv:    206.7 ( 7.80x)
inv_txfm_add_8x4_flipadst_identity_0_8bpc_c:     1519.8 ( 1.00x)
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv:    169.8 ( 8.95x)
inv_txfm_add_8x4_flipadst_identity_1_8bpc_c:     1520.7 ( 1.00x)
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv:    169.8 ( 8.95x)
inv_txfm_add_8x4_identity_adst_0_8bpc_c:         1101.0 ( 1.00x)
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv:        124.8 ( 8.82x)
inv_txfm_add_8x4_identity_adst_1_8bpc_c:         1101.0 ( 1.00x)
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv:        124.8 ( 8.82x)
inv_txfm_add_8x4_identity_dct_0_8bpc_c:          1058.4 ( 1.00x)
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv:         121.2 ( 8.73x)
inv_txfm_add_8x4_identity_dct_1_8bpc_c:          1058.3 ( 1.00x)
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv:         121.2 ( 8.73x)
inv_txfm_add_8x4_identity_flipadst_0_8bpc_c:     1113.2 ( 1.00x)
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv:    126.2 ( 8.82x)
inv_txfm_add_8x4_identity_flipadst_1_8bpc_c:     1113.4 ( 1.00x)
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv:    126.4 ( 8.81x)
inv_txfm_add_8x4_identity_identity_0_8bpc_c:     1010.6 ( 1.00x)
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv:     94.2 (10.73x)
inv_txfm_add_8x4_identity_identity_1_8bpc_c:     1010.4 ( 1.00x)
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv:     94.2 (10.72x)
2024-02-19 10:04:54 -05:00
Nathan E. Egge adba0c6ff8 riscv64/itx: Add 4x8 8bpc RVV transforms
inv_txfm_add_4x8_adst_adst_0_8bpc_c:             1619.6 ( 1.00x)
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv:            198.6 ( 8.16x)
inv_txfm_add_4x8_adst_adst_1_8bpc_c:             1621.5 ( 1.00x)
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv:            198.5 ( 8.17x)
inv_txfm_add_4x8_adst_dct_0_8bpc_c:              1496.1 ( 1.00x)
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv:             175.1 ( 8.54x)
inv_txfm_add_4x8_adst_dct_1_8bpc_c:              1496.3 ( 1.00x)
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv:             175.1 ( 8.55x)
inv_txfm_add_4x8_adst_flipadst_0_8bpc_c:         1624.8 ( 1.00x)
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv:        200.6 ( 8.10x)
inv_txfm_add_4x8_adst_flipadst_1_8bpc_c:         1623.9 ( 1.00x)
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv:        200.6 ( 8.10x)
inv_txfm_add_4x8_adst_identity_0_8bpc_c:         1132.3 ( 1.00x)
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv:        122.6 ( 9.24x)
inv_txfm_add_4x8_adst_identity_1_8bpc_c:         1132.2 ( 1.00x)
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv:        122.6 ( 9.23x)
inv_txfm_add_4x8_dct_adst_0_8bpc_c:              1561.5 ( 1.00x)
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv:             192.3 ( 8.12x)
inv_txfm_add_4x8_dct_adst_1_8bpc_c:              1563.9 ( 1.00x)
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv:             192.3 ( 8.13x)
inv_txfm_add_4x8_dct_dct_0_8bpc_c:                260.9 ( 1.00x)
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv:              168.9 ( 1.55x)
inv_txfm_add_4x8_dct_dct_1_8bpc_c:               1443.6 ( 1.00x)
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv:              168.9 ( 8.55x)
inv_txfm_add_4x8_dct_flipadst_0_8bpc_c:          1567.5 ( 1.00x)
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv:         194.3 ( 8.07x)
inv_txfm_add_4x8_dct_flipadst_1_8bpc_c:          1565.8 ( 1.00x)
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv:         194.3 ( 8.06x)
inv_txfm_add_4x8_dct_identity_0_8bpc_c:          1073.8 ( 1.00x)
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv:         116.4 ( 9.23x)
inv_txfm_add_4x8_dct_identity_1_8bpc_c:          1074.4 ( 1.00x)
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv:         116.3 ( 9.23x)
inv_txfm_add_4x8_flipadst_adst_0_8bpc_c:         1631.1 ( 1.00x)
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv:        200.6 ( 8.13x)
inv_txfm_add_4x8_flipadst_adst_1_8bpc_c:         1631.1 ( 1.00x)
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv:        200.6 ( 8.13x)
inv_txfm_add_4x8_flipadst_dct_0_8bpc_c:          1507.0 ( 1.00x)
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv:         177.1 ( 8.51x)
inv_txfm_add_4x8_flipadst_dct_1_8bpc_c:          1506.3 ( 1.00x)
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv:         177.1 ( 8.50x)
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c:     1633.9 ( 1.00x)
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv:    202.5 ( 8.07x)
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c:     1633.7 ( 1.00x)
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv:    202.5 ( 8.07x)
inv_txfm_add_4x8_flipadst_identity_0_8bpc_c:     1142.7 ( 1.00x)
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv:    123.2 ( 9.27x)
inv_txfm_add_4x8_flipadst_identity_1_8bpc_c:     1142.6 ( 1.00x)
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv:    123.2 ( 9.27x)
inv_txfm_add_4x8_identity_adst_0_8bpc_c:         1442.0 ( 1.00x)
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv:        168.9 ( 8.54x)
inv_txfm_add_4x8_identity_adst_1_8bpc_c:         1442.8 ( 1.00x)
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv:        168.9 ( 8.54x)
inv_txfm_add_4x8_identity_dct_0_8bpc_c:          1322.7 ( 1.00x)
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv:         146.7 ( 9.02x)
inv_txfm_add_4x8_identity_dct_1_8bpc_c:          1320.9 ( 1.00x)
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv:         146.7 ( 9.00x)
inv_txfm_add_4x8_identity_flipadst_0_8bpc_c:     1451.0 ( 1.00x)
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv:    171.0 ( 8.48x)
inv_txfm_add_4x8_identity_flipadst_1_8bpc_c:     1450.0 ( 1.00x)
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv:    171.0 ( 8.48x)
inv_txfm_add_4x8_identity_identity_0_8bpc_c:      977.1 ( 1.00x)
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv:     93.9 (10.41x)
inv_txfm_add_4x8_identity_identity_1_8bpc_c:      976.9 ( 1.00x)
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv:     93.9 (10.41x)
2024-02-19 10:04:54 -05:00
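The figure in parentheses in the listings above appears to be the C time divided by the RVV time for the same function. A minimal Python sketch, assuming that reading of the columns, recomputes the ratio from two pairs of lines copied from the 4x8 results. The `parse_line` and `speedups` helpers are hypothetical names for illustration, not part of checkasm or dav1d.

```python
import re

# Two C/RVV pairs copied verbatim from the 4x8 listing above.
SAMPLE = """\
inv_txfm_add_4x8_adst_adst_0_8bpc_c:             1619.6 ( 1.00x)
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv:            198.6 ( 8.16x)
inv_txfm_add_4x8_identity_identity_0_8bpc_c:      977.1 ( 1.00x)
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv:     93.9 (10.41x)
"""

# name, implementation suffix, time, printed ratio
LINE_RE = re.compile(r"^(\w+)_(c|rvv):\s+([\d.]+)\s+\(\s*([\d.]+)x\)$")


def parse_line(line):
    """Split one benchmark line into (name, impl, time, printed_ratio)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a benchmark line (header, date, blank, ...)
    name, impl, time, ratio = m.groups()
    return name, impl, float(time), float(ratio)


def speedups(text):
    """Return {function name: C time / RVV time, rounded to 2 places}."""
    times = {}
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed:
            name, impl, time, _ = parsed
            times.setdefault(name, {})[impl] = time
    return {
        name: round(t["c"] / t["rvv"], 2)
        for name, t in times.items()
        if "c" in t and "rvv" in t
    }


if __name__ == "__main__":
    for name, ratio in speedups(SAMPLE).items():
        print(f"{name}: {ratio:.2f}x")
```

For these two pairs the recomputed ratios match the printed ones, which supports the assumed meaning of the column.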
Nathan E. Egge 45f993c3ba riscv64/itx: Add 4-point 8bpc RVV wide transforms
The 4-point ADST transform in AV1 is a Type-VII DST, and the 8bpc version
 uses 32-bit additions, so it cannot be made grouping-agnostic.
2024-02-19 10:03:04 -05:00
Nathan E. Egge e0d4655ff3 riscv64/itx: Parameterize LMUL in iadst_4 macro 2024-02-19 09:24:47 -05:00
Nathan E. Egge c5b12bd94e riscv64/itx: Use m2 register spacing in iadst_4 macro 2024-02-19 09:24:47 -05:00
Nathan E. Egge 7080c09057 riscv64/itx: Reuse 8x8 epilog, saves 306 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_rvv
 with inv_txfm_identity_add_8x8_rvv; only *8x8_identity* functions are
 modified:

                                                   Old     New    Delta

inv_txfm_add_8x8_identity_adst_0_8bpc_rvv:        268.2   268.2   0.00%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv:        268.3   268.2  -0.04%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv:         225.1   228.3   1.42%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv:         225.1   228.2   1.37%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv:    270.6   270.2  -0.15%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv:    270.6   270.3  -0.11%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv:    146.1   146.0  -0.07%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv:    146.1   146.1   0.00%

inv_txfm_add_8x8_dct_adst_0_8bpc_rvv:             360.0   359.8  -0.06%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv:             360.0   359.6  -0.11%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv:               74.7    76.4   2.28%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv:              316.9   321.6   1.48%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv:         362.0   361.8  -0.06%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv:         361.9   361.9   0.00%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv:         240.0   240.6   0.25%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv:         240.0   240.6   0.25%

inv_txfm_add_8x8_adst_adst_0_8bpc_rvv:            403.0   403.3   0.07%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv:            403.0   403.4   0.10%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv:             359.7   359.7   0.00%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv:             359.4   359.7   0.08%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv:        404.6   405.1   0.12%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv:        404.6   405.3   0.17%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv:        283.4   282.8  -0.21%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv:        283.4   282.8  -0.21%

inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv:        403.9   404.6   0.17%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv:        404.0   404.6   0.15%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv:         361.4   361.5   0.03%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv:         361.3   361.5   0.06%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv:    406.2   406.1  -0.02%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv:    405.7   406.4   0.17%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv:    284.8   287.5   0.95%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv:    284.8   287.6   0.98%
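
A minimal sketch of how the Delta column in the Old/New table above can be reproduced, assuming it is the relative change from Old to New in percent, so that a negative value means the new code is faster. The helper name `delta_percent` is hypothetical. The three sample rows are copied from the table.

```python
def delta_percent(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


# (Old, New) timings copied from the table above.
rows = {
    "inv_txfm_add_8x8_identity_dct_0_8bpc_rvv": (225.1, 228.3),
    "inv_txfm_add_8x8_dct_dct_0_8bpc_rvv": (74.7, 76.4),
    "inv_txfm_add_8x8_identity_adst_1_8bpc_rvv": (268.3, 268.2),
}

for name, (old, new) in rows.items():
    print(f"{name}: {old:7.1f} {new:7.1f} {delta_percent(old, new):6.2f}%")
```

Under this assumption the recomputed values match the printed Delta entries for these rows, so the column appears to be derived from the displayed timings.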
2024-02-19 08:15:55 -05:00
Nathan E. Egge 9315185b73 riscv: Add asm.S macro for decorating local symbols 2024-02-19 08:15:55 -05:00
Nathan E. Egge 090b959c77 arm64/itx: Reuse 8x8 epilog, saves 264 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_neon
 with inv_txfm_identity_add_8x8_neon; only *8x8_identity* functions are
 modified:

                                                   Old     New    Delta

inv_txfm_add_8x8_identity_adst_0_8bpc_neon:       113.5   117.3   3.35%
inv_txfm_add_8x8_identity_adst_1_8bpc_neon:       113.5   117.3   3.35%
inv_txfm_add_8x8_identity_dct_0_8bpc_neon:         98.2    96.0  -2.24%
inv_txfm_add_8x8_identity_dct_1_8bpc_neon:         98.3    96.0  -2.34%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_neon:   113.3   112.8  -0.44%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_neon:   113.4   112.8  -0.53%
inv_txfm_add_8x8_identity_identity_0_8bpc_neon:    44.3    45.0   1.58%
inv_txfm_add_8x8_identity_identity_1_8bpc_neon:    44.3    45.0   1.58%

inv_txfm_add_8x8_dct_adst_0_8bpc_neon:            190.8   190.3  -0.26%
inv_txfm_add_8x8_dct_adst_1_8bpc_neon:            190.8   190.3  -0.26%
inv_txfm_add_8x8_dct_dct_0_8bpc_neon:              31.3    31.3   0.00%
inv_txfm_add_8x8_dct_dct_1_8bpc_neon:             166.8   167.0   0.12%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_neon:        190.5   190.3  -0.11%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_neon:        190.5   190.3  -0.11%
inv_txfm_add_8x8_dct_identity_0_8bpc_neon:        118.8   118.3  -0.42%
inv_txfm_add_8x8_dct_identity_1_8bpc_neon:        118.8   118.3  -0.42%

inv_txfm_add_8x8_adst_adst_0_8bpc_neon:           206.8   206.5  -0.15%
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:           206.8   206.5  -0.15%
inv_txfm_add_8x8_adst_dct_0_8bpc_neon:            187.7   188.3   0.32%
inv_txfm_add_8x8_adst_dct_1_8bpc_neon:            187.5   188.3   0.42%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_neon:       207.3   207.3   0.00%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_neon:       207.3   207.3   0.00%
inv_txfm_add_8x8_adst_identity_0_8bpc_neon:       136.7   136.5  -0.15%
inv_txfm_add_8x8_adst_identity_1_8bpc_neon:       136.3   136.5   0.15%

inv_txfm_add_8x8_flipadst_adst_0_8bpc_neon:       206.5   206.5   0.00%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_neon:       206.5   206.5   0.00%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_neon:        188.5   188.3  -0.11%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_neon:        188.5   188.3  -0.11%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_neon:   207.5   206.9  -0.29%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_neon:   207.5   206.5  -0.48%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_neon:   138.2   138.3   0.07%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_neon:   137.5   138.3   0.58%
2024-02-19 08:15:55 -05:00
Nathan E. Egge e8fbfd999b arm32/itx: Reuse 8x8 epilog, saves 220 bytes
This commit shares the trailing instructions from inv_txfm_add_8x8_neon
 with inv_txfm_identity_add_8x8_neon; only *8x8_identity* functions are
 modified.
2024-02-19 08:15:55 -05:00
Nathan E. Egge 50d63f9a6e arm32/itx: Only set r4 when needed, saves 48 bytes
Avoid setting r4 when the horizontal transform is the identity in
 {4,8}x16 and 16x4 rectangular transforms.
2024-02-19 07:51:09 -05:00
Nathan E. Egge b56b02a914 arm64/itx: Only set x4 when needed, saves 64 bytes
Avoid setting x4 when the horizontal transform is the identity in
 {4,8}x16 and 16x{4,8} rectangular transforms.
2024-02-19 07:51:09 -05:00
Nathan E. Egge 97cc6cee81 riscv64/itx: Add missing tail, mask agnostic flags 2024-02-15 07:38:06 -05:00
Nathan E. Egge 7b15ca1375 riscv64/itx: Add 16-point 8bpc RVV flipadst transform
inv_txfm_add_16x16_adst_flipadst_0_8bpc_c:        15272.2 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv:       1824.4 ( 8.37x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_c:        15261.2 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv:       1824.5 ( 8.36x)
inv_txfm_add_16x16_adst_flipadst_2_8bpc_c:        15260.0 ( 1.00x)
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv:       1824.5 ( 8.36x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_c:         14497.2 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv:        1637.3 ( 8.85x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_c:         14490.5 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv:        1637.3 ( 8.85x)
inv_txfm_add_16x16_dct_flipadst_2_8bpc_c:         14486.4 ( 1.00x)
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv:        1637.3 ( 8.85x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_c:        15307.7 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv:       1808.0 ( 8.47x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_c:        15341.0 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv:       1808.1 ( 8.48x)
inv_txfm_add_16x16_flipadst_adst_2_8bpc_c:        15333.5 ( 1.00x)
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv:       1808.1 ( 8.48x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_c:         14530.0 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv:        1636.4 ( 8.88x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_c:         14510.3 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv:        1636.3 ( 8.87x)
inv_txfm_add_16x16_flipadst_dct_2_8bpc_c:         14504.7 ( 1.00x)
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv:        1636.3 ( 8.86x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_c:    15315.5 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv:   1823.5 ( 8.40x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_c:    15324.1 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv:   1823.3 ( 8.40x)
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_c:    15315.6 ( 1.00x)
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv:   1823.5 ( 8.40x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge b981bc9c3e riscv64/itx: Convert inv_adst_e16_x16_rvv to macro 2024-02-14 08:31:42 -05:00
Nathan E. Egge 2685b40920 riscv64/itx: Add 16-point 8bpc RVV adst transform
inv_txfm_add_16x16_adst_adst_0_8bpc_c:      15364.4 ( 1.00x)
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv:     1814.1 ( 8.47x)
inv_txfm_add_16x16_adst_adst_1_8bpc_c:      15363.7 ( 1.00x)
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv:     1814.5 ( 8.47x)
inv_txfm_add_16x16_adst_adst_2_8bpc_c:      15368.9 ( 1.00x)
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv:     1814.5 ( 8.47x)
inv_txfm_add_16x16_adst_dct_0_8bpc_c:       14560.0 ( 1.00x)
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv:      1644.4 ( 8.85x)
inv_txfm_add_16x16_adst_dct_1_8bpc_c:       14578.9 ( 1.00x)
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv:      1644.5 ( 8.87x)
inv_txfm_add_16x16_adst_dct_2_8bpc_c:       14575.0 ( 1.00x)
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv:      1644.6 ( 8.86x)
inv_txfm_add_16x16_dct_adst_0_8bpc_c:       14550.7 ( 1.00x)
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv:      1622.7 ( 8.97x)
inv_txfm_add_16x16_dct_adst_1_8bpc_c:       14556.0 ( 1.00x)
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv:      1622.6 ( 8.97x)
inv_txfm_add_16x16_dct_adst_2_8bpc_c:       14543.3 ( 1.00x)
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv:      1622.6 ( 8.96x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge 72dba22e66 riscv64/itx: Add 4x4 8bpc RVV wht_wht transform
inv_txfm_add_4x4_wht_wht_0_8bpc_c:      265.6 ( 1.00x)
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv:     66.9 ( 3.97x)
inv_txfm_add_4x4_wht_wht_1_8bpc_c:      265.5 ( 1.00x)
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv:     66.9 ( 3.97x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge cc29b2314c riscv64/itx: Add 16x16 8bpc dct_identity and identity_dct
inv_txfm_add_16x16_dct_identity_0_8bpc_c:    10593.3 ( 1.00x)
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv:   1163.3 ( 9.11x)
inv_txfm_add_16x16_dct_identity_1_8bpc_c:    10584.9 ( 1.00x)
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv:   1163.3 ( 9.10x)
inv_txfm_add_16x16_dct_identity_2_8bpc_c:    10590.3 ( 1.00x)
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv:   1163.6 ( 9.10x)
inv_txfm_add_16x16_identity_dct_0_8bpc_c:     9945.9 ( 1.00x)
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv:   1150.2 ( 8.65x)
inv_txfm_add_16x16_identity_dct_1_8bpc_c:     9937.0 ( 1.00x)
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv:   1150.3 ( 8.64x)
inv_txfm_add_16x16_identity_dct_2_8bpc_c:     9934.6 ( 1.00x)
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv:   1150.4 ( 8.64x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge 8e82093ebb riscv64/itx: Add 16-point 8bpc RVV dct transform
inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1574.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:   1450.3 ( 1.09x)
inv_txfm_add_16x16_dct_dct_1_8bpc_c:    13614.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv:   1450.5 ( 9.39x)
inv_txfm_add_16x16_dct_dct_2_8bpc_c:    13613.2 ( 1.00x)
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv:   1450.4 ( 9.39x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge 9976976ec8 riscv64/itx: Use registers above v15 in dct macros 2024-02-14 08:31:42 -05:00
Nathan E. Egge 57d5729cf8 riscv64/itx: Convert inv_dct_e16_x8_rvv to macro 2024-02-14 08:31:42 -05:00
Nathan E. Egge c0ccc323d6 riscv64/itx: Convert inv_txfm_horz_16x8_rvv to macro 2024-02-14 08:31:42 -05:00
Nathan E. Egge 64c9d16049 riscv64/itx: Add 16-point 8bpc RVV idtx transform
inv_txfm_add_16x16_identity_identity_0_8bpc_c:     6933.8 ( 1.00x)
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv:    866.0 ( 8.01x)
inv_txfm_add_16x16_identity_identity_1_8bpc_c:     6933.4 ( 1.00x)
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv:    866.1 ( 8.01x)
inv_txfm_add_16x16_identity_identity_2_8bpc_c:     6934.2 ( 1.00x)
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv:    866.1 ( 8.01x)
2024-02-14 08:31:42 -05:00
Nathan E. Egge 08051a3b50 arm64/itx: Set x8 outside .irp loop 2024-02-06 01:43:10 +00:00
Nathan E. Egge 314423b3d9 arm64/itx: Set x8 only once in inv_txfm_add_16x16_neon 2024-02-05 23:36:44 +00:00
Nathan E. Egge a6878be7e0 Alphabetize architecture defines and usage 2024-01-31 06:04:21 -05:00
Nathan E. Egge 219befefeb riscv64/itx: Add 8-point 8bpc RVV flipadst transform
inv_txfm_add_8x8_adst_flipadst_0_8bpc_c:         3323.1 ( 1.00x)
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv:        402.1 ( 8.26x)
inv_txfm_add_8x8_adst_flipadst_1_8bpc_c:         3322.8 ( 1.00x)
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv:        402.2 ( 8.26x)
inv_txfm_add_8x8_dct_flipadst_0_8bpc_c:          3074.3 ( 1.00x)
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv:         359.5 ( 8.55x)
inv_txfm_add_8x8_dct_flipadst_1_8bpc_c:          3074.4 ( 1.00x)
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv:         359.4 ( 8.56x)
inv_txfm_add_8x8_flipadst_adst_0_8bpc_c:         3314.8 ( 1.00x)
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv:        403.3 ( 8.22x)
inv_txfm_add_8x8_flipadst_adst_1_8bpc_c:         3315.3 ( 1.00x)
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv:        403.3 ( 8.22x)
inv_txfm_add_8x8_flipadst_dct_0_8bpc_c:          3071.7 ( 1.00x)
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv:         359.1 ( 8.55x)
inv_txfm_add_8x8_flipadst_dct_1_8bpc_c:          3072.5 ( 1.00x)
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv:         359.3 ( 8.55x)
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_c:     3325.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv:    405.2 ( 8.21x)
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_c:     3325.0 ( 1.00x)
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv:    405.2 ( 8.21x)
inv_txfm_add_8x8_flipadst_identity_0_8bpc_c:     2356.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv:    283.7 ( 8.31x)
inv_txfm_add_8x8_flipadst_identity_1_8bpc_c:     2356.2 ( 1.00x)
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv:    283.5 ( 8.31x)
inv_txfm_add_8x8_identity_flipadst_0_8bpc_c:     2332.8 ( 1.00x)
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv:    268.0 ( 8.71x)
inv_txfm_add_8x8_identity_flipadst_1_8bpc_c:     2331.5 ( 1.00x)
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv:    268.0 ( 8.70x)
2024-01-31 06:04:21 -05:00
Nathan E. Egge b5747aee1e riscv64/itx: Convert inv_adst_e16_x8_rvv to macro 2024-01-31 06:04:21 -05:00
Nathan E. Egge 64f9fd0239 riscv64/itx: Add 8-point 8bpc RVV adst transform
inv_txfm_add_8x8_adst_adst_0_8bpc_c:         3338.5 ( 1.00x)
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv:        400.4 ( 8.34x)
inv_txfm_add_8x8_adst_adst_1_8bpc_c:         3338.1 ( 1.00x)
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv:        399.8 ( 8.35x)
inv_txfm_add_8x8_adst_dct_0_8bpc_c:          3112.5 ( 1.00x)
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv:         357.2 ( 8.71x)
inv_txfm_add_8x8_adst_dct_1_8bpc_c:          3111.4 ( 1.00x)
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv:         357.0 ( 8.71x)
inv_txfm_add_8x8_adst_identity_0_8bpc_c:     2375.0 ( 1.00x)
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv:    281.0 ( 8.45x)
inv_txfm_add_8x8_adst_identity_1_8bpc_c:     2375.6 ( 1.00x)
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv:    281.0 ( 8.45x)
inv_txfm_add_8x8_dct_adst_0_8bpc_c:          3113.3 ( 1.00x)
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv:         357.2 ( 8.72x)
inv_txfm_add_8x8_dct_adst_1_8bpc_c:          3112.1 ( 1.00x)
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv:         357.2 ( 8.71x)
inv_txfm_add_8x8_identity_adst_0_8bpc_c:     2346.7 ( 1.00x)
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv:    265.6 ( 8.83x)
inv_txfm_add_8x8_identity_adst_1_8bpc_c:     2348.3 ( 1.00x)
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv:    265.8 ( 8.84x)
2024-01-31 06:04:21 -05:00