Mirror of https://git.ffmpeg.org/ffmpeg.git (synced 2026-06-11 08:13:06 +00:00)
avcodec/bswapdsp: improve performance by removing manual unrolling
Manually unrolling loops increases code size. It can sometimes improve
performance, but more often than not it degrades it.
Keep the C version simple, and add assembly optimizations when needed.
                 x86-clang            x86-gcc-arch-native  x86-msvc             m1-clang             rpi5-clang           pi5-gcc-14
----------------------------------------------------------------------------------------------------------------------------------------
bswap_buf_c       57.3 ( 1.00x)        19.4 ( 1.00x)        55.4 ( 1.00x)         0.5 ( 1.00x)       143.5 ( 1.00x)        59.8 ( 1.00x)
bswap_buf_this*   49.0 ( 1.17x)        12.5 ( 1.56x)        17.7 ( 3.13x)         0.3 ( 2.04x)        57.9 ( 2.48x)        73.5 ( 0.81x)
bswap_buf_sse2    28.4 ( 2.02x)        24.3 ( 0.80x)        25.5 ( 2.18x)       -                    -                    -
bswap_buf_ssse3   24.6 ( 2.32x)        16.0 ( 1.22x)        19.0 ( 2.92x)       -                    -                    -
bswap_buf_avx2    21.2 ( 2.70x)        11.1 ( 1.74x)        11.2 ( 4.95x)       -                    -                    -
bswap_buf_c: C implementation before this patch
bswap_buf_this: C implementation after this patch
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Committed by: James Almer
Co-authored by: James Almer
Parent: bba42ce036
Commit: 8777fa60e6
+1 -13
@@ -24,19 +24,7 @@
 
 static void bswap_buf(uint32_t *dst, const uint32_t *src, int w)
 {
-    int i;
-
-    for (i = 0; i + 8 <= w; i += 8) {
-        dst[i + 0] = av_bswap32(src[i + 0]);
-        dst[i + 1] = av_bswap32(src[i + 1]);
-        dst[i + 2] = av_bswap32(src[i + 2]);
-        dst[i + 3] = av_bswap32(src[i + 3]);
-        dst[i + 4] = av_bswap32(src[i + 4]);
-        dst[i + 5] = av_bswap32(src[i + 5]);
-        dst[i + 6] = av_bswap32(src[i + 6]);
-        dst[i + 7] = av_bswap32(src[i + 7]);
-    }
-    for (; i < w; i++)
+    for (int i = 0; i < w; i++)
         dst[i + 0] = av_bswap32(src[i + 0]);
 }
 
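The change keeps behavior identical and only alters the loop shape. As a rough sanity check, the sketch below places both loop shapes side by side and compares their output for every width up to a small bound. It is not FFmpeg code: `bswap32_sketch` is a hypothetical portable stand-in for `av_bswap32`, and `bswap_buf_unrolled`, `bswap_buf_simple`, `bswap_versions_agree`, and `SKETCH_MAX_W` are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_MAX_W = 64 }; /* illustrative upper bound on the tested width */

/* Hypothetical stand-in for av_bswap32(): reverses the four bytes of x. */
static uint32_t bswap32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Loop shape before the patch: 8x manual unroll followed by a scalar tail. */
static void bswap_buf_unrolled(uint32_t *dst, const uint32_t *src, int w)
{
    int i;

    for (i = 0; i + 8 <= w; i += 8) {
        dst[i + 0] = bswap32_sketch(src[i + 0]);
        dst[i + 1] = bswap32_sketch(src[i + 1]);
        dst[i + 2] = bswap32_sketch(src[i + 2]);
        dst[i + 3] = bswap32_sketch(src[i + 3]);
        dst[i + 4] = bswap32_sketch(src[i + 4]);
        dst[i + 5] = bswap32_sketch(src[i + 5]);
        dst[i + 6] = bswap32_sketch(src[i + 6]);
        dst[i + 7] = bswap32_sketch(src[i + 7]);
    }
    for (; i < w; i++)
        dst[i] = bswap32_sketch(src[i]);
}

/* Loop shape after the patch: one plain loop, left to the compiler to optimize. */
static void bswap_buf_simple(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = bswap32_sketch(src[i]);
}

/* Returns 1 if both loop shapes agree for every width 0..max_w, else 0. */
int bswap_versions_agree(int max_w)
{
    uint32_t src[SKETCH_MAX_W], a[SKETCH_MAX_W], b[SKETCH_MAX_W];

    assert(max_w >= 0 && max_w <= SKETCH_MAX_W);

    /* Fill the input with distinct, byte-asymmetric values. */
    for (int i = 0; i < SKETCH_MAX_W; i++)
        src[i] = 0x01020304u * (uint32_t)(i + 1);

    for (int w = 0; w <= max_w; w++) {
        bswap_buf_unrolled(a, src, w);
        bswap_buf_simple(b, src, w);
        for (int i = 0; i < w; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

Calling `bswap_versions_agree(SKETCH_MAX_W)` from a small test program covers widths that are multiples of 8 as well as those that exercise the scalar tail. This checks equivalence only; it says nothing about speed, which depends on the compiler and target as the table in the commit message shows.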