Jun Zhao cfa3ceac7a lavc/hevc: add aarch64 NEON for angular modes 10 and 26
Add NEON-optimized implementations for HEVC angular intra prediction
modes 10 (pure horizontal) and 26 (pure vertical) at 8-bit depth.

Mode 10 (Horizontal):
- Broadcasts left[y] to fill each row using ld2r/ld4r for efficiency
- Applies edge smoothing for luma blocks smaller than 32x32
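The mode 10 behaviour listed above can be sketched in scalar C. This is an illustrative model only, not the NEON code from this commit: the names clip_u8 and pred_mode10_sketch are hypothetical, the reference layout (top[-1] as the corner sample, c_idx == 0 for luma) and the smoothing formula are assumed to follow the usual HEVC C reference, and the code relies on arithmetic right shift of negative ints.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of mode 10 (pure horizontal), 8-bit samples.
 * Assumes top[-1] is the top-left corner sample and c_idx == 0 means luma.
 */
static void pred_mode10_sketch(uint8_t *dst, ptrdiff_t stride,
                               const uint8_t *top, const uint8_t *left,
                               int size, int c_idx)
{
    /* Each row y is filled with the single value left[y]. */
    for (int y = 0; y < size; y++)
        memset(dst + y * stride, left[y], (size_t)size);

    /* Edge smoothing rewrites only the first row of small luma blocks. */
    if (c_idx == 0 && size < 32)
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(left[0] + ((top[x] - top[-1]) >> 1));
}
```

The memset per row is the scalar counterpart of broadcasting left[y] across a vector register before storing it.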

Mode 26 (Vertical):
- Copies top reference row to all output rows
- Applies edge smoothing for luma blocks smaller than 32x32
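A matching scalar sketch for mode 26, under the same assumptions: clamp_pixel and pred_mode26_sketch are hypothetical names, left[-1] is taken to be the corner sample, and the smoothing formula is assumed to mirror the usual HEVC C reference. It is not the NEON implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clamp_pixel(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of mode 26 (pure vertical), 8-bit samples.
 * Assumes left[-1] is the top-left corner sample and c_idx == 0 means luma.
 */
static void pred_mode26_sketch(uint8_t *dst, ptrdiff_t stride,
                               const uint8_t *top, const uint8_t *left,
                               int size, int c_idx)
{
    /* Every output row is a copy of the top reference row. */
    for (int y = 0; y < size; y++)
        memcpy(dst + y * stride, top, (size_t)size);

    /* Edge smoothing rewrites only the first column of small luma blocks. */
    if (c_idx == 0 && size < 32)
        for (int y = 0; y < size; y++)
            dst[y * stride] = clamp_pixel(top[0] + ((left[y] - left[-1]) >> 1));
}
```

Because the smoothed samples sit in a column rather than a row, a vector implementation has to touch one byte per row here, which may be part of why the mode 26 speedups are smaller than those of mode 10.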

Edge smoothing uses uhsub+usqadd to compute the filtered result
directly in 8-bit, avoiding widening to 16-bit intermediates.
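Why the 8-bit path is exact can be checked with a scalar model. The sketch below emulates one byte lane of UHSUB (halving subtract, low 8 bits kept) and USQADD (unsigned accumulator plus signed addend, saturated) and compares the pair against a widened clip(base + ((a - b) >> 1)) over every input. All function names are hypothetical; the formula is assumed to be the smoothing filter in question.

```c
#include <stdint.h>

/* floor(d / 2) without right-shifting a negative int. */
static int floor_half(int d)
{
    return d >= 0 ? d / 2 : -((1 - d) / 2);
}

/* Model of one UHSUB byte lane: floor((a - b) / 2), truncated to 8 bits. */
static uint8_t uhsub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)floor_half((int)a - (int)b);
}

/* Model of one USQADD byte lane: acc (unsigned) + addend (read as signed),
 * saturated to 0..255. */
static uint8_t usqadd_u8(uint8_t acc, uint8_t addend)
{
    int s = addend < 128 ? (int)addend : (int)addend - 256;
    int v = (int)acc + s;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Widened reference: compute in int, then clip to the pixel range. */
static uint8_t edge_widened(uint8_t base, uint8_t a, uint8_t b)
{
    int v = (int)base + floor_half((int)a - (int)b);
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Count inputs where the 8-bit path and the widened path disagree. */
static long count_edge_mismatches(void)
{
    long bad = 0;
    for (int base = 0; base < 256; base++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                bad += usqadd_u8((uint8_t)base,
                                 uhsub_u8((uint8_t)a, (uint8_t)b))
                       != edge_widened((uint8_t)base, (uint8_t)a, (uint8_t)b);
    return bad;
}
```

The halved difference always lies in -128..127, so it fits a signed byte without loss, and the saturating add then performs the clip.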

The C pred_angular wrappers are made non-static with ff_ prefix to
allow the NEON dispatch to fall back to C for modes not yet optimized.
This will be reverted once all angular modes are implemented.

Note: pred_angular[] holds one function pointer per block size, not
per mode, so checkasm benchmarks label all 33 angular modes '_neon'
even though only modes 10 and 26 are accelerated. The unoptimized
modes show ~1.0x speedup because they pass through the NEON wrapper
to the C fallback with negligible overhead.
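The per-size dispatch described above could look roughly like the following sketch. Every name in it is hypothetical: the three kernels are stubs that only record which path ran, and the argument list is modeled on the pred_angular[] callback shape rather than copied from the aarch64 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Records which path handled the last call; exists only to make the
 * sketch observable. */
enum path { PATH_NONE, PATH_NEON_MODE10, PATH_NEON_MODE26, PATH_C };
static enum path last_path = PATH_NONE;

/* Hypothetical stand-in for a NEON mode 10 kernel. */
static void neon_mode10_stub(uint8_t *src, const uint8_t *top,
                             const uint8_t *left, ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_NEON_MODE10;
}

/* Hypothetical stand-in for a NEON mode 26 kernel. */
static void neon_mode26_stub(uint8_t *src, const uint8_t *top,
                             const uint8_t *left, ptrdiff_t stride, int c_idx)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx;
    last_path = PATH_NEON_MODE26;
}

/* Hypothetical stand-in for the exported (non-static) C angular function. */
static void c_angular_stub(uint8_t *src, const uint8_t *top,
                           const uint8_t *left, ptrdiff_t stride,
                           int c_idx, int mode)
{
    (void)src; (void)top; (void)left; (void)stride; (void)c_idx; (void)mode;
    last_path = PATH_C;
}

/* One entry per block size: the mode is chosen inside the wrapper, so
 * unoptimized modes cost one extra branch before reaching the C code. */
static void angular_dispatch_sketch(uint8_t *src, const uint8_t *top,
                                    const uint8_t *left, ptrdiff_t stride,
                                    int c_idx, int mode)
{
    switch (mode) {
    case 10:
        neon_mode10_stub(src, top, left, stride, c_idx);
        break;
    case 26:
        neon_mode26_stub(src, top, left, stride, c_idx);
        break;
    default:
        c_angular_stub(src, top, left, stride, c_idx, mode);
        break;
    }
}
```

A wrapper of this shape is why the fallback needs external linkage: the C function must be callable from the arch-specific translation unit.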

Speedup over C on Apple M4 (checkasm --bench, 15-run average):

  Mode 10 (Horizontal):
    4x4: 4.66x    8x8: 5.80x    16x16: 16.86x    32x32: 24.89x

  Mode 26 (Vertical):
    4x4: 1.16x    8x8: 1.83x    16x16: 2.45x    32x32: 4.50x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-06-07 23:29:33 +00:00


/*
 * HEVC video Decoder
 *
 * Copyright (C) 2012 - 2013 Guillaume Martres
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#include "hevcdec.h"
#include "pred.h"

#define BIT_DEPTH 8
#include "pred_template.c"
#undef BIT_DEPTH

#define BIT_DEPTH 9
#include "pred_template.c"
#undef BIT_DEPTH

#define BIT_DEPTH 10
#include "pred_template.c"
#undef BIT_DEPTH

#define BIT_DEPTH 12
#include "pred_template.c"
#undef BIT_DEPTH

void ff_hevc_pred_init(HEVCPredContext *hpc, int bit_depth)
{
#undef FUNC
#define FUNC(a, depth) a ## _ ## depth

/* Point every prediction entry at the C implementation for one bit depth. */
#define HEVC_PRED(depth)                                           \
    hpc->intra_pred[0]      = FUNC(intra_pred_2, depth);           \
    hpc->intra_pred[1]      = FUNC(intra_pred_3, depth);           \
    hpc->intra_pred[2]      = FUNC(intra_pred_4, depth);           \
    hpc->intra_pred[3]      = FUNC(intra_pred_5, depth);           \
    hpc->pred_planar[0]     = FUNC(pred_planar_0, depth);          \
    hpc->pred_planar[1]     = FUNC(pred_planar_1, depth);          \
    hpc->pred_planar[2]     = FUNC(pred_planar_2, depth);          \
    hpc->pred_planar[3]     = FUNC(pred_planar_3, depth);          \
    hpc->pred_dc            = FUNC(pred_dc, depth);                \
    hpc->pred_angular[0]    = FUNC(ff_hevc_pred_angular_0, depth); \
    hpc->pred_angular[1]    = FUNC(ff_hevc_pred_angular_1, depth); \
    hpc->pred_angular[2]    = FUNC(ff_hevc_pred_angular_2, depth); \
    hpc->pred_angular[3]    = FUNC(ff_hevc_pred_angular_3, depth); \
    hpc->ref_filter_3tap[0] = FUNC(ref_filter_3tap, depth);        \
    hpc->ref_filter_3tap[1] = FUNC(ref_filter_3tap, depth);        \
    hpc->ref_filter_3tap[2] = FUNC(ref_filter_3tap, depth);        \
    hpc->ref_filter_strong  = FUNC(ref_filter_strong, depth);

    /* Any depth other than 9, 10 or 12 selects the 8-bit functions. */
    switch (bit_depth) {
    case 9:
        HEVC_PRED(9);
        break;
    case 10:
        HEVC_PRED(10);
        break;
    case 12:
        HEVC_PRED(12);
        break;
    default:
        HEVC_PRED(8);
        break;
    }

    /* Arch-specific init runs last so it can override the C defaults. */
#if ARCH_AARCH64
    ff_hevc_pred_init_aarch64(hpc, bit_depth);
#endif
#if ARCH_MIPS
    ff_hevc_pred_init_mips(hpc, bit_depth);
#endif
}