100 Commits
Author SHA1 Message Date
Matthias Dressel 46e9017355 subprojects: Update checkasm to v1.2.0
Among various fixes it no longer installs the checkasm library, header
files and pkgconfig when installing dav1d.
2026-06-07 23:15:02 +02:00
Matthias Dressel d69235dd80 CI: Use shortform QEMU_CPU for loongarch64
Since qemu commit 979bf44af8483cedc00c63b3e79407de08e75a30 the cpu
argument accepts just 'max' as a shorthand.
2026-03-17 22:00:03 +01:00
Matthias Dressel bfbd7d4677 CI: loongarch64: Move QEMU_LD_PREFIX to crossfile
Simplifies development builds on local machines.
2026-03-17 22:00:03 +01:00
Matthias Dressel afcdb781cb CI: riscv64: Move QEMU_LD_PREFIX to crossfile
Simplifies development builds on local machines.
2026-03-17 22:00:03 +01:00
Matthias Dressel 42ac98706a CI: aarch64: Move QEMU_LD_PREFIX to crossfile
Simplifies development builds on local machines.
2026-03-17 22:00:03 +01:00
Matthias Dressel 8feb8526bb CI: Remove outdated version suffix from job name 2026-03-17 22:00:03 +01:00
Matthias Dressel 1dcfc90757 CI: Update images 2026-02-28 16:27:35 +01:00
Matthias Dressel daef396277 CI: Switch to loongarch64 Debian toolchain
loong64 was recently promoted to an official Debian architecture. [0]

[0] https://lists.debian.org/debian-devel-announce/2025/12/msg00004.html
2026-02-09 02:33:29 +01:00
Matthias Dressel c3f3a7e567 CI: Check --frametimes with msan
This would have caught 583e8e02eb.
2025-07-01 18:35:31 +02:00
Matthias Dressel 8d95618093 CI: Build '-mavx' code as debugoptimized
Work around a GCC 14 bug where it does not insert `vzeroupper` in C code
built without at least '-O2'.
2025-03-10 16:40:35 +01:00
Matthias Dressel edeac873c4 CI: Update images 2025-03-10 16:40:35 +01:00
Matthias Dressel 1d0cda02a6 CI: Update ppc64le image
Since there seems to be a problem with gcc-14, stay on gcc-13 for now.
2025-03-05 21:58:24 +01:00
Matthias Dressel and Jean-Baptiste Kempf 37155c1147 CI: Update Android image
NDK 26 dropped support for API versions 19 and 20 (KitKat, Android 4.4).
The minimum supported API is now 21 (Lollipop, Android 5.0).
2024-05-18 10:04:31 +00:00
Matthias Dressel and Jean-Baptiste Kempf c7df9a3e65 CI: Improve coverage for argon samples using different thread counts
Similar to 4796b59fc0.
2024-05-01 13:09:09 +00:00
Matthias Dressel and Jean-Baptiste Kempf 0f504bf57c CI: Add dotprod to argon tests 2024-05-01 13:09:09 +00:00
Matthias Dressel 5851901772 CI: Move llvm crossfiles from image to project
Since dav1d was the only user of these crossfiles, it was agreed to
remove them from the image [0] and move them to dav1d directly. [1]

[0] https://code.videolan.org/videolan/docker-images/-/merge_requests/293
[1] https://code.videolan.org/videolan/docker-images/-/merge_requests/294#note_434720
2024-04-16 11:53:16 +02:00
Matthias Dressel 313af0b6a5 CI: Update images
Now with clang 18 and downgraded xz-utils.
2024-04-14 01:57:37 +02:00
Matthias Dressel aa63a41ccd cli: Add missing ARM cpumasks help text
Forgotten in acc1121d2f.
2024-04-11 23:15:07 +02:00
Matthias Dressel b9312c8dd8 Update THANKS.md 2024-03-08 23:24:30 +01:00
Matthias Dressel 9d57a654e2 CI: Add riscv64 clang build 2024-02-22 19:13:23 +01:00
Matthias Dressel bada810c17 CI: Update image
Now contains clang 17.
2024-02-22 19:13:23 +01:00
Matthias Dressel 91ddba0b07 gcovr: Fix config file
gcovr 7.0 fixed a config file parsing bug [0].
Valid options are 'all', 'negative_hits.warn',
'negative_hits.warn_once_per_file'.

[0] https://github.com/gcovr/gcovr/pull/816
2024-02-22 19:13:23 +01:00
Matthias Dressel 81c0b46375 meson: Test for RISC-V assembler support
Support for '.option arch' directive [0] was added to binutils in
d3ffd7f77654adafe5f1989bdfdbe4a337ff2e8b [1] and in llvm in
9e8ed3403c191ab9c4903e8eeb8f732ff8a43cb4 [2].

[0] https://github.com/riscv-non-isa/riscv-asm-manual/pull/67
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d3ffd7f77654adafe5f1989bdfdbe4a337ff2e8b
[2] https://github.com/llvm/llvm-project/commit/9e8ed3403c191ab9c4903e8eeb8f732ff8a43cb4
2024-02-20 15:53:04 +01:00
Matthias Dressel and Nathan E. Egge a7edb02987 CI: Use cross-compiling libc instead of multi-arch
See https://code.videolan.org/videolan/docker-images/-/merge_requests/272
for more context.
2024-01-31 06:04:21 -05:00
Matthias Dressel and Nathan E. Egge ebbddd48e3 CI: Add riscv64 tests 2024-01-31 06:04:21 -05:00
Matthias Dressel 16ed8e8b99 meson: Disable seek-stress tests by default 2024-01-24 16:28:38 +01:00
Matthias Dressel and Ronald S. Bultje 2c9bbb4908 meson: Add 'enable_seek_stress' option
Allows explicitly enabling or disabling the seek-stress tests.
2024-01-23 17:47:46 +00:00
Matthias Dressel b084160736 CI: Switch to using 'testdata' suite
Simplifies testing and also contains the forgotten 'testdata-multi'
suite which was added later.
2024-01-23 00:26:51 +01:00
Matthias Dressel 7d225bec62 CI: Add loongarch64 tests 2024-01-15 14:54:46 +01:00
Matthias Dressel 655d7ec07d CI: Add loongarch64 toolchain 2024-01-15 09:35:54 +01:00
Matthias Dressel 48ef395920 CI: Update images 2023-10-24 20:27:33 +02:00
Matthias Dressel 9278a14cf4 checkasm: Always bench C-only functions as well
Integrates --bench-c into --bench to simplify benchmarks.
2023-07-12 19:38:06 +02:00
Matthias Dressel fc40a0db51 checkasm: document '-t' in --help text 2023-07-07 21:21:51 +02:00
Matthias Dressel f8ae94eca0 CI: Add argon tests 2023-05-14 17:52:59 +02:00
Matthias Dressel and Jean-Baptiste Kempf 6addb1a83c crossfiles: Streamline and simplify crossfiles
* `needs_exe_wrapper` is only needed in specific cases when
  `exe_wrapper` is not set.
  See https://mesonbuild.com/Cross-compilation.html#properties

* "Before 0.56.0, <lang>_args and <lang>_link_args must be put in the
   properties section instead, else they will be ignored."
   [https://mesonbuild.com/Machine-files.html#meson-builtin-options]
   Our minimum version is 0.49.0. Meson >= 0.56.0 prints a deprecation
   warning.
2023-04-23 12:39:02 +00:00
Matthias Dressel and Jean-Baptiste Kempf 380efd764f CI: Add wasm{32,64} builds
Fixes #421
2023-04-06 07:52:12 +00:00
Matthias Dressel 0207e0fe9f x86/itx: Fix indentation of macro instructions 2023-03-31 18:41:54 +02:00
Matthias Dressel f6d4c0c473 x86/itx: Add 32x32 12bpc AVX2 idtx
inv_txfm_add_32x32_identity_identity_0_12bpc_c:      5785.8 ( 1.00x)
inv_txfm_add_32x32_identity_identity_0_12bpc_avx2:     20.7 (279.65x)
inv_txfm_add_32x32_identity_identity_1_12bpc_c:      5896.9 ( 1.00x)
inv_txfm_add_32x32_identity_identity_1_12bpc_avx2:     20.7 (285.01x)
inv_txfm_add_32x32_identity_identity_2_12bpc_c:      5799.5 ( 1.00x)
inv_txfm_add_32x32_identity_identity_2_12bpc_avx2:     68.9 (84.20x)
inv_txfm_add_32x32_identity_identity_3_12bpc_c:      5798.1 ( 1.00x)
inv_txfm_add_32x32_identity_identity_3_12bpc_avx2:    140.6 (41.25x)
inv_txfm_add_32x32_identity_identity_4_12bpc_c:      5803.3 ( 1.00x)
inv_txfm_add_32x32_identity_identity_4_12bpc_avx2:    308.2 (18.83x)
2023-03-31 18:41:36 +02:00
Matthias Dressel 1e602b8b33 x86/itx: Add 32x16 12bpc AVX2 idtx
inv_txfm_add_32x16_identity_identity_0_12bpc_c:      4138.7 ( 1.00x)
inv_txfm_add_32x16_identity_identity_0_12bpc_avx2:     30.4 (136.26x)
inv_txfm_add_32x16_identity_identity_1_12bpc_c:      4147.5 ( 1.00x)
inv_txfm_add_32x16_identity_identity_1_12bpc_avx2:     30.7 (135.25x)
inv_txfm_add_32x16_identity_identity_2_12bpc_c:      4138.2 ( 1.00x)
inv_txfm_add_32x16_identity_identity_2_12bpc_avx2:     98.9 (41.84x)
inv_txfm_add_32x16_identity_identity_3_12bpc_c:      4136.6 ( 1.00x)
inv_txfm_add_32x16_identity_identity_3_12bpc_avx2:    167.7 (24.67x)
inv_txfm_add_32x16_identity_identity_4_12bpc_c:      4156.3 ( 1.00x)
inv_txfm_add_32x16_identity_identity_4_12bpc_avx2:    242.1 (17.17x)
2023-03-31 18:41:19 +02:00
Matthias Dressel e6b194e7d2 x86/itx: Add 16x32 12bpc AVX2 idtx
inv_txfm_add_16x32_identity_identity_0_12bpc_c:      4287.9 ( 1.00x)
inv_txfm_add_16x32_identity_identity_0_12bpc_avx2:     31.4 (136.66x)
inv_txfm_add_16x32_identity_identity_1_12bpc_c:      4293.7 ( 1.00x)
inv_txfm_add_16x32_identity_identity_1_12bpc_avx2:     30.9 (139.07x)
inv_txfm_add_16x32_identity_identity_2_12bpc_c:      4273.8 ( 1.00x)
inv_txfm_add_16x32_identity_identity_2_12bpc_avx2:     97.3 (43.92x)
inv_txfm_add_16x32_identity_identity_3_12bpc_c:      4269.0 ( 1.00x)
inv_txfm_add_16x32_identity_identity_3_12bpc_avx2:    165.2 (25.83x)
inv_txfm_add_16x32_identity_identity_4_12bpc_c:      4284.4 ( 1.00x)
inv_txfm_add_16x32_identity_identity_4_12bpc_avx2:    235.2 (18.22x)
2023-03-31 18:40:35 +02:00
Matthias Dressel d426d1c910 .gitignore: Add tests/argon 2023-03-01 19:59:10 +01:00
Matthias Dressel and Henrik Gramner e43904ca48 Add script to test against argon samples
Co-authored-by: Henrik Gramner <gramner@twoorioles.com>
2023-03-01 19:59:10 +01:00
Matthias Dressel b8a43e2225 CI: Replace only/except with rules
"only and except are not being actively developed. rules is the
preferred keyword to control when to add jobs to pipelines." [0]

[0] https://docs.gitlab.com/ee/ci/yaml/index.html#only--except
2023-02-13 21:10:44 +01:00
Matthias Dressel 616dad2b43 CI: Unambiguously call meson setup
Calling meson with no command is deprecated since 0.64.0
2023-02-13 21:10:44 +01:00
Matthias Dressel 899d6c9fd3 CI: Update images 2023-02-13 21:10:44 +01:00
Matthias Dressel 934713e4a6 CI: Disable trimming on some tests
Allow checkasm to run.
2022-09-09 09:21:25 +02:00
Matthias Dressel 3920bd9d9d CI: Remove git 'safe.directory' config
It is now handled by the gitlab runner.

Ref: 7d859f9c72
2022-09-09 09:21:25 +02:00
Matthias Dressel ddb3189c25 gcovr: Ignore parsing errors 2022-09-09 09:21:25 +02:00
Matthias Dressel aa3fda7800 crossfiles: Update Android toolchains
* Android armv7: target API 19 since it's the lowest directly provided
  by the new NDK.
* Newer NDK has generic tools for ar, strip, etc.
* Remove windres as it's only relevant for Windows targets.
2022-09-09 09:20:52 +02:00
Matthias Dressel d92594bd5d CI: Update images
Remove experimental since gcc12, clang14, mold are now in unstable.
2022-09-09 09:20:52 +02:00
Matthias Dressel 8c079f784a CI: Update coverage collecting
artifacts:reports:cobertura was deprecated in GitLab 14.9
2022-05-25 19:41:34 +02:00
Matthias Dressel 0770d98d93 CI: Add a build with the minimum requirements
* meson 0.49.0
* nasm 2.14
2022-05-25 19:41:34 +02:00
Matthias Dressel 7d859f9c72 CI: Deactivate git 'safe.directory'
An attacker already has arbitrary code execution inside the container.

Ref: CVE-2022-24765
2022-05-25 19:41:34 +02:00
Matthias Dressel c1264cd27e CI: Update images 2022-05-25 19:41:34 +02:00
Matthias Dressel 9833c92807 CI: Add gcc12 and clang14 builds with mold linker 2022-05-07 16:51:25 +02:00
Matthias Dressel 1bd91c3e67 CI: Trigger documentation rebuild if configuration changes
Additionally, switch from 'only'/'except' to 'rules' which is
more flexible.
2022-05-06 01:52:36 +02:00
Matthias Dressel 9c69574d0f meson/doc: Fix doxygen config
* Doxygen had a longstanding bug [0] where it would use `dot` even if
  not configured to do so. Due to this behaviour our config magically
  worked.
  This bug is fixed in 1.9.2, so we need to explicitly enable
  `dot` support in order to keep existing functionality.

* Enables WARN_AS_ERROR to catch mistakes.

* Adds a version string to the header to easily identify which commit
  the docs are built from.

[0] https://github.com/doxygen/doxygen/issues/7273
2022-05-06 01:52:36 +02:00
Matthias Dressel ffb5968035 x86/itx: Add 32x8 12bpc AVX2 transforms
inv_txfm_add_32x8_dct_dct_0_12bpc_c: 286.7
inv_txfm_add_32x8_dct_dct_0_12bpc_avx2: 20.1
inv_txfm_add_32x8_dct_dct_1_12bpc_c: 7832.7
inv_txfm_add_32x8_dct_dct_1_12bpc_avx2: 710.6
inv_txfm_add_32x8_dct_dct_2_12bpc_c: 7838.1
inv_txfm_add_32x8_dct_dct_2_12bpc_avx2: 711.6
inv_txfm_add_32x8_dct_dct_3_12bpc_c: 7818.3
inv_txfm_add_32x8_dct_dct_3_12bpc_avx2: 710.9
inv_txfm_add_32x8_dct_dct_4_12bpc_c: 7820.6
inv_txfm_add_32x8_dct_dct_4_12bpc_avx2: 710.5
inv_txfm_add_32x8_identity_identity_0_12bpc_c: 1526.6
inv_txfm_add_32x8_identity_identity_0_12bpc_avx2: 19.3
inv_txfm_add_32x8_identity_identity_1_12bpc_c: 1519.4
inv_txfm_add_32x8_identity_identity_1_12bpc_avx2: 19.9
inv_txfm_add_32x8_identity_identity_2_12bpc_c: 1519.9
inv_txfm_add_32x8_identity_identity_2_12bpc_avx2: 43.6
inv_txfm_add_32x8_identity_identity_3_12bpc_c: 1519.4
inv_txfm_add_32x8_identity_identity_3_12bpc_avx2: 67.8
inv_txfm_add_32x8_identity_identity_4_12bpc_c: 1523.2
inv_txfm_add_32x8_identity_identity_4_12bpc_avx2: 91.6
2022-04-24 20:58:00 +02:00
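The benchmark listing above gives raw timings for the C and AVX2 versions but no speedup column. As a rough illustration, the ratio can be recomputed from the pasted lines. The sketch below is not part of dav1d or checkasm; the helper names `parse_bench` and `speedups` are hypothetical, and it assumes each line has the form `name: value`, optionally followed by a `(N.NNx)` column as in the newer commit messages. Ratios computed from these rounded timings may differ slightly from any ratio printed by the benchmark tool itself.

```python
def parse_bench(text):
    """Map each benchmark name to its timing, skipping lines that do not fit."""
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        fields = value.split()
        if not sep or not fields:
            continue
        try:
            # Keep only the first number; a trailing "(1.00x)" column is ignored.
            timings[name.strip()] = float(fields[0])
        except ValueError:
            # Not a timing line (for example a commit date); skip it.
            continue
    return timings


def speedups(timings, ref_suffix="_c", opt_suffix="_avx2"):
    """Return reference/optimized timing ratios keyed by the shared base name."""
    result = {}
    for name, ref in timings.items():
        if not name.endswith(ref_suffix):
            continue
        base = name[: -len(ref_suffix)]
        opt = timings.get(base + opt_suffix)
        if opt:  # skip missing or zero timings
            result[base] = ref / opt
    return result


if __name__ == "__main__":
    sample = """\
inv_txfm_add_32x8_dct_dct_0_12bpc_c: 286.7
inv_txfm_add_32x8_dct_dct_0_12bpc_avx2: 20.1
inv_txfm_add_32x8_identity_identity_0_12bpc_c: 1526.6
inv_txfm_add_32x8_identity_identity_0_12bpc_avx2: 19.3
"""
    for base, ratio in speedups(parse_bench(sample)).items():
        print(f"{base}: {ratio:.2f}x")
```

The same two helpers can be pointed at any of the other benchmark listings in this log, since they only rely on the `_c` and `_avx2` name suffixes.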
Matthias Dressel e67a500054 x86/itx: Add 8x32 12bpc AVX2 transforms
inv_txfm_add_8x32_dct_dct_0_12bpc_c: 334.6
inv_txfm_add_8x32_dct_dct_0_12bpc_avx2: 66.0
inv_txfm_add_8x32_dct_dct_1_12bpc_c: 7929.7
inv_txfm_add_8x32_dct_dct_1_12bpc_avx2: 489.3
inv_txfm_add_8x32_dct_dct_2_12bpc_c: 7925.8
inv_txfm_add_8x32_dct_dct_2_12bpc_avx2: 547.1
inv_txfm_add_8x32_dct_dct_3_12bpc_c: 7928.9
inv_txfm_add_8x32_dct_dct_3_12bpc_avx2: 647.8
inv_txfm_add_8x32_dct_dct_4_12bpc_c: 7916.1
inv_txfm_add_8x32_dct_dct_4_12bpc_avx2: 701.0
inv_txfm_add_8x32_identity_identity_0_12bpc_c: 2413.1
inv_txfm_add_8x32_identity_identity_0_12bpc_avx2: 28.6
inv_txfm_add_8x32_identity_identity_1_12bpc_c: 2415.2
inv_txfm_add_8x32_identity_identity_1_12bpc_avx2: 28.6
inv_txfm_add_8x32_identity_identity_2_12bpc_c: 2413.7
inv_txfm_add_8x32_identity_identity_2_12bpc_avx2: 55.1
inv_txfm_add_8x32_identity_identity_3_12bpc_c: 2415.4
inv_txfm_add_8x32_identity_identity_3_12bpc_avx2: 85.3
inv_txfm_add_8x32_identity_identity_4_12bpc_c: 2401.8
inv_txfm_add_8x32_identity_identity_4_12bpc_avx2: 116.8
2022-04-24 20:56:32 +02:00
Matthias Dressel 0c1fbdefdc x86/itx: Deduplicate dconly code 2022-04-24 17:59:04 +02:00
Matthias Dressel 11aa919a2f lib: Fix typo in documentation 2022-04-23 23:38:20 +02:00
Matthias Dressel d821d88035 Update THANKS.md 2022-02-19 18:01:31 +01:00
Matthias Dressel 94b1bf456e meson: Use native check of return value 2022-02-09 15:35:09 +01:00
Matthias Dressel 8e8148c16d x86/itx: Add 16x16 12bpc AVX2 transforms
inv_txfm_add_16x16_adst_adst_0_12bpc_c: 8990.0
inv_txfm_add_16x16_adst_adst_0_12bpc_avx2: 646.1
inv_txfm_add_16x16_adst_adst_1_12bpc_c: 8965.3
inv_txfm_add_16x16_adst_adst_1_12bpc_avx2: 646.9
inv_txfm_add_16x16_adst_adst_2_12bpc_c: 8983.2
inv_txfm_add_16x16_adst_adst_2_12bpc_avx2: 870.1
inv_txfm_add_16x16_adst_dct_0_12bpc_c: 9058.2
inv_txfm_add_16x16_adst_dct_0_12bpc_avx2: 548.8
inv_txfm_add_16x16_adst_dct_1_12bpc_c: 9092.7
inv_txfm_add_16x16_adst_dct_1_12bpc_avx2: 549.3
inv_txfm_add_16x16_adst_dct_2_12bpc_c: 9086.7
inv_txfm_add_16x16_adst_dct_2_12bpc_avx2: 775.5
inv_txfm_add_16x16_adst_flipadst_0_12bpc_c: 9083.4
inv_txfm_add_16x16_adst_flipadst_0_12bpc_avx2: 645.6
inv_txfm_add_16x16_adst_flipadst_1_12bpc_c: 8998.3
inv_txfm_add_16x16_adst_flipadst_1_12bpc_avx2: 646.2
inv_txfm_add_16x16_adst_flipadst_2_12bpc_c: 9014.7
inv_txfm_add_16x16_adst_flipadst_2_12bpc_avx2: 873.8
inv_txfm_add_16x16_dct_adst_0_12bpc_c: 9080.1
inv_txfm_add_16x16_dct_adst_0_12bpc_avx2: 598.2
inv_txfm_add_16x16_dct_adst_1_12bpc_c: 9103.3
inv_txfm_add_16x16_dct_adst_1_12bpc_avx2: 598.1
inv_txfm_add_16x16_dct_adst_2_12bpc_c: 9089.5
inv_txfm_add_16x16_dct_adst_2_12bpc_avx2: 764.4
inv_txfm_add_16x16_dct_dct_0_12bpc_c: 1042.1
inv_txfm_add_16x16_dct_dct_0_12bpc_avx2: 28.6
inv_txfm_add_16x16_dct_dct_1_12bpc_c: 9164.6
inv_txfm_add_16x16_dct_dct_1_12bpc_avx2: 500.8
inv_txfm_add_16x16_dct_dct_2_12bpc_c: 9161.9
inv_txfm_add_16x16_dct_dct_2_12bpc_avx2: 678.2
inv_txfm_add_16x16_dct_flipadst_0_12bpc_c: 9104.9
inv_txfm_add_16x16_dct_flipadst_0_12bpc_avx2: 601.8
inv_txfm_add_16x16_dct_flipadst_1_12bpc_c: 9248.6
inv_txfm_add_16x16_dct_flipadst_1_12bpc_avx2: 599.2
inv_txfm_add_16x16_dct_flipadst_2_12bpc_c: 9087.4
inv_txfm_add_16x16_dct_flipadst_2_12bpc_avx2: 770.1
inv_txfm_add_16x16_dct_identity_0_12bpc_c: 6570.4
inv_txfm_add_16x16_dct_identity_0_12bpc_avx2: 243.9
inv_txfm_add_16x16_dct_identity_1_12bpc_c: 6615.4
inv_txfm_add_16x16_dct_identity_1_12bpc_avx2: 246.0
inv_txfm_add_16x16_dct_identity_2_12bpc_c: 6553.4
inv_txfm_add_16x16_dct_identity_2_12bpc_avx2: 435.0
inv_txfm_add_16x16_flipadst_adst_0_12bpc_c: 8982.1
inv_txfm_add_16x16_flipadst_adst_0_12bpc_avx2: 647.2
inv_txfm_add_16x16_flipadst_adst_1_12bpc_c: 8978.9
inv_txfm_add_16x16_flipadst_adst_1_12bpc_avx2: 647.2
inv_txfm_add_16x16_flipadst_adst_2_12bpc_c: 8964.0
inv_txfm_add_16x16_flipadst_adst_2_12bpc_avx2: 868.4
inv_txfm_add_16x16_flipadst_dct_0_12bpc_c: 9083.5
inv_txfm_add_16x16_flipadst_dct_0_12bpc_avx2: 550.0
inv_txfm_add_16x16_flipadst_dct_1_12bpc_c: 9070.4
inv_txfm_add_16x16_flipadst_dct_1_12bpc_avx2: 550.2
inv_txfm_add_16x16_flipadst_dct_2_12bpc_c: 9085.8
inv_txfm_add_16x16_flipadst_dct_2_12bpc_avx2: 779.7
inv_txfm_add_16x16_flipadst_flipadst_0_12bpc_c: 8977.1
inv_txfm_add_16x16_flipadst_flipadst_0_12bpc_avx2: 657.3
inv_txfm_add_16x16_flipadst_flipadst_1_12bpc_c: 9002.0
inv_txfm_add_16x16_flipadst_flipadst_1_12bpc_avx2: 657.3
inv_txfm_add_16x16_flipadst_flipadst_2_12bpc_c: 9008.4
inv_txfm_add_16x16_flipadst_flipadst_2_12bpc_avx2: 872.0
inv_txfm_add_16x16_identity_dct_0_12bpc_c: 6504.7
inv_txfm_add_16x16_identity_dct_0_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_dct_1_12bpc_c: 6548.3
inv_txfm_add_16x16_identity_dct_1_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_dct_2_12bpc_c: 6512.4
inv_txfm_add_16x16_identity_dct_2_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_identity_0_12bpc_c: 3926.2
inv_txfm_add_16x16_identity_identity_0_12bpc_avx2: 135.0
inv_txfm_add_16x16_identity_identity_1_12bpc_c: 3896.7
inv_txfm_add_16x16_identity_identity_1_12bpc_avx2: 134.5
inv_txfm_add_16x16_identity_identity_2_12bpc_c: 3888.0
inv_txfm_add_16x16_identity_identity_2_12bpc_avx2: 230.3
2022-01-24 18:11:46 +01:00
Matthias Dressel 0a596b6fa1 x86/filmgrain: Don't use AVX2 for fgy, fguv on CPUs with slow gather
Filmgrain is using a lot of `vpgatherdd` instructions which are rather
slow on certain chips, making the SSSE3 version faster.

Fixes #377
2022-01-11 23:24:41 +01:00
Matthias Dressel and Henrik Gramner e663897a94 x86: Detect CPUs with slow AVX2 gather
`vpgather*` instructions seem to be relatively slow on current AMD
chips. Intel Haswell is slow as well, but just (barely) fast enough to
not cause regressions in our current use cases.

Co-authored-by: Henrik Gramner <gramner@twoorioles.com>
2022-01-11 23:24:41 +01:00
Matthias Dressel 633c63ed51 README: Add the new documentation option 2022-01-04 15:26:01 +01:00
Matthias Dressel 37881b8278 ppc: Rename types.h to dav1d_types.h
Avoid collision with system header using gcc7.

Fixes #363
2022-01-03 22:39:52 +01:00
Matthias Dressel 3e5b7d3770 CI: Add enable_docs option 2021-12-29 17:25:37 +01:00
Matthias Dressel and Rudi Heitbaum 5e67cfd806 meson: Add explicit option to build documentation
Co-authored-by: Rudi Heitbaum <rudi@heitbaum.com>
2021-12-29 17:25:37 +01:00
Matthias Dressel f266b3b295 README: Update minimum meson version
Changed in d85fdf52
2021-12-28 21:15:50 +01:00
Matthias Dressel e8a3f99d90 x86/itx: Add 16x8 12bpc AVX2 transforms
inv_txfm_add_16x8_adst_adst_0_12bpc_c: 4517.9
inv_txfm_add_16x8_adst_adst_0_12bpc_avx2: 432.4
inv_txfm_add_16x8_adst_adst_1_12bpc_c: 4510.9
inv_txfm_add_16x8_adst_adst_1_12bpc_avx2: 432.4
inv_txfm_add_16x8_adst_adst_2_12bpc_c: 4498.6
inv_txfm_add_16x8_adst_adst_2_12bpc_avx2: 432.4
inv_txfm_add_16x8_adst_dct_0_12bpc_c: 4553.8
inv_txfm_add_16x8_adst_dct_0_12bpc_avx2: 389.1
inv_txfm_add_16x8_adst_dct_1_12bpc_c: 4543.3
inv_txfm_add_16x8_adst_dct_1_12bpc_avx2: 389.1
inv_txfm_add_16x8_adst_dct_2_12bpc_c: 4538.4
inv_txfm_add_16x8_adst_dct_2_12bpc_avx2: 389.1
inv_txfm_add_16x8_adst_flipadst_0_12bpc_c: 4532.6
inv_txfm_add_16x8_adst_flipadst_0_12bpc_avx2: 435.4
inv_txfm_add_16x8_adst_flipadst_1_12bpc_c: 4520.4
inv_txfm_add_16x8_adst_flipadst_1_12bpc_avx2: 435.4
inv_txfm_add_16x8_adst_flipadst_2_12bpc_c: 4516.2
inv_txfm_add_16x8_adst_flipadst_2_12bpc_avx2: 435.4
inv_txfm_add_16x8_adst_identity_0_12bpc_c: 3502.3
inv_txfm_add_16x8_adst_identity_0_12bpc_avx2: 255.9
inv_txfm_add_16x8_adst_identity_1_12bpc_c: 3492.9
inv_txfm_add_16x8_adst_identity_1_12bpc_avx2: 256.3
inv_txfm_add_16x8_adst_identity_2_12bpc_c: 3471.4
inv_txfm_add_16x8_adst_identity_2_12bpc_avx2: 256.7
inv_txfm_add_16x8_dct_adst_0_12bpc_c: 4563.2
inv_txfm_add_16x8_dct_adst_0_12bpc_avx2: 383.6
inv_txfm_add_16x8_dct_adst_1_12bpc_c: 4573.1
inv_txfm_add_16x8_dct_adst_1_12bpc_avx2: 383.9
inv_txfm_add_16x8_dct_adst_2_12bpc_c: 4562.2
inv_txfm_add_16x8_dct_adst_2_12bpc_avx2: 383.7
inv_txfm_add_16x8_dct_dct_0_12bpc_c: 514.0
inv_txfm_add_16x8_dct_dct_0_12bpc_avx2: 25.0
inv_txfm_add_16x8_dct_dct_1_12bpc_c: 4540.5
inv_txfm_add_16x8_dct_dct_1_12bpc_avx2: 340.4
inv_txfm_add_16x8_dct_dct_2_12bpc_c: 4563.0
inv_txfm_add_16x8_dct_dct_2_12bpc_avx2: 339.3
inv_txfm_add_16x8_dct_flipadst_0_12bpc_c: 4568.0
inv_txfm_add_16x8_dct_flipadst_0_12bpc_avx2: 385.9
inv_txfm_add_16x8_dct_flipadst_1_12bpc_c: 4577.5
inv_txfm_add_16x8_dct_flipadst_1_12bpc_avx2: 385.8
inv_txfm_add_16x8_dct_flipadst_2_12bpc_c: 4573.8
inv_txfm_add_16x8_dct_flipadst_2_12bpc_avx2: 385.8
inv_txfm_add_16x8_dct_identity_0_12bpc_c: 3549.9
inv_txfm_add_16x8_dct_identity_0_12bpc_avx2: 212.1
inv_txfm_add_16x8_dct_identity_1_12bpc_c: 3538.7
inv_txfm_add_16x8_dct_identity_1_12bpc_avx2: 212.1
inv_txfm_add_16x8_dct_identity_2_12bpc_c: 3539.7
inv_txfm_add_16x8_dct_identity_2_12bpc_avx2: 212.1
inv_txfm_add_16x8_flipadst_adst_0_12bpc_c: 4495.3
inv_txfm_add_16x8_flipadst_adst_0_12bpc_avx2: 431.4
inv_txfm_add_16x8_flipadst_adst_1_12bpc_c: 4496.3
inv_txfm_add_16x8_flipadst_adst_1_12bpc_avx2: 431.4
inv_txfm_add_16x8_flipadst_adst_2_12bpc_c: 4499.2
inv_txfm_add_16x8_flipadst_adst_2_12bpc_avx2: 431.3
inv_txfm_add_16x8_flipadst_dct_0_12bpc_c: 4506.9
inv_txfm_add_16x8_flipadst_dct_0_12bpc_avx2: 386.3
inv_txfm_add_16x8_flipadst_dct_1_12bpc_c: 4512.9
inv_txfm_add_16x8_flipadst_dct_1_12bpc_avx2: 386.0
inv_txfm_add_16x8_flipadst_dct_2_12bpc_c: 4503.2
inv_txfm_add_16x8_flipadst_dct_2_12bpc_avx2: 386.0
inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_c: 4509.1
inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_avx2: 432.2
inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_c: 4519.0
inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_avx2: 432.1
inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_c: 4518.3
inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_avx2: 432.1
inv_txfm_add_16x8_flipadst_identity_0_12bpc_c: 3511.0
inv_txfm_add_16x8_flipadst_identity_0_12bpc_avx2: 257.1
inv_txfm_add_16x8_flipadst_identity_1_12bpc_c: 3518.5
inv_txfm_add_16x8_flipadst_identity_1_12bpc_avx2: 257.2
inv_txfm_add_16x8_flipadst_identity_2_12bpc_c: 3521.7
inv_txfm_add_16x8_flipadst_identity_2_12bpc_avx2: 257.1
inv_txfm_add_16x8_identity_adst_0_12bpc_c: 3166.8
inv_txfm_add_16x8_identity_adst_0_12bpc_avx2: 268.6
inv_txfm_add_16x8_identity_adst_1_12bpc_c: 3157.9
inv_txfm_add_16x8_identity_adst_1_12bpc_avx2: 268.6
inv_txfm_add_16x8_identity_adst_2_12bpc_c: 3156.5
inv_txfm_add_16x8_identity_adst_2_12bpc_avx2: 268.6
inv_txfm_add_16x8_identity_dct_0_12bpc_c: 3187.4
inv_txfm_add_16x8_identity_dct_0_12bpc_avx2: 224.4
inv_txfm_add_16x8_identity_dct_1_12bpc_c: 3185.8
inv_txfm_add_16x8_identity_dct_1_12bpc_avx2: 224.4
inv_txfm_add_16x8_identity_dct_2_12bpc_c: 3190.8
inv_txfm_add_16x8_identity_dct_2_12bpc_avx2: 224.4
inv_txfm_add_16x8_identity_flipadst_0_12bpc_c: 3167.7
inv_txfm_add_16x8_identity_flipadst_0_12bpc_avx2: 269.7
inv_txfm_add_16x8_identity_flipadst_1_12bpc_c: 3174.1
inv_txfm_add_16x8_identity_flipadst_1_12bpc_avx2: 269.8
inv_txfm_add_16x8_identity_flipadst_2_12bpc_c: 3174.7
inv_txfm_add_16x8_identity_flipadst_2_12bpc_avx2: 269.7
inv_txfm_add_16x8_identity_identity_0_12bpc_c: 2153.3
inv_txfm_add_16x8_identity_identity_0_12bpc_avx2: 99.1
inv_txfm_add_16x8_identity_identity_1_12bpc_c: 2143.6
inv_txfm_add_16x8_identity_identity_1_12bpc_avx2: 99.3
inv_txfm_add_16x8_identity_identity_2_12bpc_c: 2145.9
inv_txfm_add_16x8_identity_identity_2_12bpc_avx2: 98.6
2021-12-04 05:04:37 +01:00
Matthias Dressel 23e8405c2e x86/itx: Add 8x16 12bpc AVX2 transforms
inv_txfm_add_8x16_adst_adst_0_12bpc_c: 4440.4
inv_txfm_add_8x16_adst_adst_0_12bpc_avx2: 354.3
inv_txfm_add_8x16_adst_adst_1_12bpc_c: 4437.3
inv_txfm_add_8x16_adst_adst_1_12bpc_avx2: 354.3
inv_txfm_add_8x16_adst_adst_2_12bpc_c: 4438.8
inv_txfm_add_8x16_adst_adst_2_12bpc_avx2: 442.6
inv_txfm_add_8x16_adst_dct_0_12bpc_c: 4507.3
inv_txfm_add_8x16_adst_dct_0_12bpc_avx2: 310.0
inv_txfm_add_8x16_adst_dct_1_12bpc_c: 4500.3
inv_txfm_add_8x16_adst_dct_1_12bpc_avx2: 310.0
inv_txfm_add_8x16_adst_dct_2_12bpc_c: 4516.1
inv_txfm_add_8x16_adst_dct_2_12bpc_avx2: 399.5
inv_txfm_add_8x16_adst_flipadst_0_12bpc_c: 4457.3
inv_txfm_add_8x16_adst_flipadst_0_12bpc_avx2: 355.6
inv_txfm_add_8x16_adst_flipadst_1_12bpc_c: 4441.3
inv_txfm_add_8x16_adst_flipadst_1_12bpc_avx2: 355.6
inv_txfm_add_8x16_adst_flipadst_2_12bpc_c: 4448.9
inv_txfm_add_8x16_adst_flipadst_2_12bpc_avx2: 445.5
inv_txfm_add_8x16_adst_identity_0_12bpc_c: 3204.0
inv_txfm_add_8x16_adst_identity_0_12bpc_avx2: 173.1
inv_txfm_add_8x16_adst_identity_1_12bpc_c: 3207.1
inv_txfm_add_8x16_adst_identity_1_12bpc_avx2: 173.6
inv_txfm_add_8x16_adst_identity_2_12bpc_c: 3210.4
inv_txfm_add_8x16_adst_identity_2_12bpc_avx2: 261.2
inv_txfm_add_8x16_dct_adst_0_12bpc_c: 4484.2
inv_txfm_add_8x16_dct_adst_0_12bpc_avx2: 334.0
inv_txfm_add_8x16_dct_adst_1_12bpc_c: 4503.8
inv_txfm_add_8x16_dct_adst_1_12bpc_avx2: 334.6
inv_txfm_add_8x16_dct_adst_2_12bpc_c: 4490.7
inv_txfm_add_8x16_dct_adst_2_12bpc_avx2: 395.6
inv_txfm_add_8x16_dct_dct_0_12bpc_c: 419.9
inv_txfm_add_8x16_dct_dct_0_12bpc_avx2: 37.6
inv_txfm_add_8x16_dct_dct_1_12bpc_c: 4482.6
inv_txfm_add_8x16_dct_dct_1_12bpc_avx2: 284.6
inv_txfm_add_8x16_dct_dct_2_12bpc_c: 4468.7
inv_txfm_add_8x16_dct_dct_2_12bpc_avx2: 348.3
inv_txfm_add_8x16_dct_flipadst_0_12bpc_c: 4468.4
inv_txfm_add_8x16_dct_flipadst_0_12bpc_avx2: 333.6
inv_txfm_add_8x16_dct_flipadst_1_12bpc_c: 4463.5
inv_txfm_add_8x16_dct_flipadst_1_12bpc_avx2: 333.5
inv_txfm_add_8x16_dct_flipadst_2_12bpc_c: 4459.4
inv_txfm_add_8x16_dct_flipadst_2_12bpc_avx2: 397.4
inv_txfm_add_8x16_dct_identity_0_12bpc_c: 3237.1
inv_txfm_add_8x16_dct_identity_0_12bpc_avx2: 149.6
inv_txfm_add_8x16_dct_identity_1_12bpc_c: 3229.9
inv_txfm_add_8x16_dct_identity_1_12bpc_avx2: 148.6
inv_txfm_add_8x16_dct_identity_2_12bpc_c: 3225.6
inv_txfm_add_8x16_dct_identity_2_12bpc_avx2: 211.3
inv_txfm_add_8x16_flipadst_adst_0_12bpc_c: 4532.1
inv_txfm_add_8x16_flipadst_adst_0_12bpc_avx2: 356.2
inv_txfm_add_8x16_flipadst_adst_1_12bpc_c: 4527.6
inv_txfm_add_8x16_flipadst_adst_1_12bpc_avx2: 356.1
inv_txfm_add_8x16_flipadst_adst_2_12bpc_c: 4532.5
inv_txfm_add_8x16_flipadst_adst_2_12bpc_avx2: 440.0
inv_txfm_add_8x16_flipadst_dct_0_12bpc_c: 4571.6
inv_txfm_add_8x16_flipadst_dct_0_12bpc_avx2: 310.3
inv_txfm_add_8x16_flipadst_dct_1_12bpc_c: 4554.5
inv_txfm_add_8x16_flipadst_dct_1_12bpc_avx2: 309.7
inv_txfm_add_8x16_flipadst_dct_2_12bpc_c: 4554.3
inv_txfm_add_8x16_flipadst_dct_2_12bpc_avx2: 399.9
inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_c: 4497.2
inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_avx2: 355.9
inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_c: 4486.2
inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_avx2: 355.6
inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_c: 4493.4
inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_avx2: 446.0
inv_txfm_add_8x16_flipadst_identity_0_12bpc_c: 3265.7
inv_txfm_add_8x16_flipadst_identity_0_12bpc_avx2: 173.8
inv_txfm_add_8x16_flipadst_identity_1_12bpc_c: 3270.8
inv_txfm_add_8x16_flipadst_identity_1_12bpc_avx2: 173.5
inv_txfm_add_8x16_flipadst_identity_2_12bpc_c: 3271.8
inv_txfm_add_8x16_flipadst_identity_2_12bpc_avx2: 261.6
inv_txfm_add_8x16_identity_adst_0_12bpc_c: 3295.3
inv_txfm_add_8x16_identity_adst_0_12bpc_avx2: 302.5
inv_txfm_add_8x16_identity_adst_1_12bpc_c: 3303.1
inv_txfm_add_8x16_identity_adst_1_12bpc_avx2: 303.0
inv_txfm_add_8x16_identity_adst_2_12bpc_c: 3304.6
inv_txfm_add_8x16_identity_adst_2_12bpc_avx2: 303.1
inv_txfm_add_8x16_identity_dct_0_12bpc_c: 3298.9
inv_txfm_add_8x16_identity_dct_0_12bpc_avx2: 257.8
inv_txfm_add_8x16_identity_dct_1_12bpc_c: 3308.1
inv_txfm_add_8x16_identity_dct_1_12bpc_avx2: 259.2
inv_txfm_add_8x16_identity_dct_2_12bpc_c: 3306.6
inv_txfm_add_8x16_identity_dct_2_12bpc_avx2: 259.2
inv_txfm_add_8x16_identity_flipadst_0_12bpc_c: 3294.7
inv_txfm_add_8x16_identity_flipadst_0_12bpc_avx2: 302.2
inv_txfm_add_8x16_identity_flipadst_1_12bpc_c: 3292.5
inv_txfm_add_8x16_identity_flipadst_1_12bpc_avx2: 302.2
inv_txfm_add_8x16_identity_flipadst_2_12bpc_c: 3275.4
inv_txfm_add_8x16_identity_flipadst_2_12bpc_avx2: 303.3
inv_txfm_add_8x16_identity_identity_0_12bpc_c: 2044.6
inv_txfm_add_8x16_identity_identity_0_12bpc_avx2: 116.2
inv_txfm_add_8x16_identity_identity_1_12bpc_c: 2059.9
inv_txfm_add_8x16_identity_identity_1_12bpc_avx2: 117.0
inv_txfm_add_8x16_identity_identity_2_12bpc_c: 2048.4
inv_txfm_add_8x16_identity_identity_2_12bpc_avx2: 116.2
2021-12-04 05:04:37 +01:00
Matthias Dressel 7be128579e x86/itx: Add 16x4 12bpc AVX2 transforms
inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6
inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4
inv_txfm_add_16x4_adst_adst_1_12bpc_c: 1756.0
inv_txfm_add_16x4_adst_adst_1_12bpc_avx2: 182.5
inv_txfm_add_16x4_adst_adst_2_12bpc_c: 1763.2
inv_txfm_add_16x4_adst_adst_2_12bpc_avx2: 182.4
inv_txfm_add_16x4_adst_dct_0_12bpc_c: 1863.6
inv_txfm_add_16x4_adst_dct_0_12bpc_avx2: 176.0
inv_txfm_add_16x4_adst_dct_1_12bpc_c: 1864.1
inv_txfm_add_16x4_adst_dct_1_12bpc_avx2: 176.0
inv_txfm_add_16x4_adst_dct_2_12bpc_c: 1861.3
inv_txfm_add_16x4_adst_dct_2_12bpc_avx2: 176.0
inv_txfm_add_16x4_adst_flipadst_0_12bpc_c: 1768.6
inv_txfm_add_16x4_adst_flipadst_0_12bpc_avx2: 184.1
inv_txfm_add_16x4_adst_flipadst_1_12bpc_c: 1768.8
inv_txfm_add_16x4_adst_flipadst_1_12bpc_avx2: 184.5
inv_txfm_add_16x4_adst_flipadst_2_12bpc_c: 1769.3
inv_txfm_add_16x4_adst_flipadst_2_12bpc_avx2: 184.7
inv_txfm_add_16x4_adst_identity_0_12bpc_c: 1686.6
inv_txfm_add_16x4_adst_identity_0_12bpc_avx2: 145.4
inv_txfm_add_16x4_adst_identity_1_12bpc_c: 1685.8
inv_txfm_add_16x4_adst_identity_1_12bpc_avx2: 145.8
inv_txfm_add_16x4_adst_identity_2_12bpc_c: 1681.7
inv_txfm_add_16x4_adst_identity_2_12bpc_avx2: 145.8
inv_txfm_add_16x4_dct_adst_0_12bpc_c: 1783.4
inv_txfm_add_16x4_dct_adst_0_12bpc_avx2: 167.7
inv_txfm_add_16x4_dct_adst_1_12bpc_c: 1789.1
inv_txfm_add_16x4_dct_adst_1_12bpc_avx2: 167.9
inv_txfm_add_16x4_dct_adst_2_12bpc_c: 1788.0
inv_txfm_add_16x4_dct_adst_2_12bpc_avx2: 169.8
inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5
inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6
inv_txfm_add_16x4_dct_dct_1_12bpc_c: 1894.3
inv_txfm_add_16x4_dct_dct_1_12bpc_avx2: 156.8
inv_txfm_add_16x4_dct_dct_2_12bpc_c: 1892.0
inv_txfm_add_16x4_dct_dct_2_12bpc_avx2: 156.8
inv_txfm_add_16x4_dct_flipadst_0_12bpc_c: 1784.7
inv_txfm_add_16x4_dct_flipadst_0_12bpc_avx2: 167.2
inv_txfm_add_16x4_dct_flipadst_1_12bpc_c: 1796.7
inv_txfm_add_16x4_dct_flipadst_1_12bpc_avx2: 168.6
inv_txfm_add_16x4_dct_flipadst_2_12bpc_c: 1788.9
inv_txfm_add_16x4_dct_flipadst_2_12bpc_avx2: 168.9
inv_txfm_add_16x4_dct_identity_0_12bpc_c: 1712.7
inv_txfm_add_16x4_dct_identity_0_12bpc_avx2: 128.8
inv_txfm_add_16x4_dct_identity_1_12bpc_c: 1714.8
inv_txfm_add_16x4_dct_identity_1_12bpc_avx2: 128.8
inv_txfm_add_16x4_dct_identity_2_12bpc_c: 1710.2
inv_txfm_add_16x4_dct_identity_2_12bpc_avx2: 128.8
inv_txfm_add_16x4_flipadst_adst_0_12bpc_c: 1763.6
inv_txfm_add_16x4_flipadst_adst_0_12bpc_avx2: 186.6
inv_txfm_add_16x4_flipadst_adst_1_12bpc_c: 1761.1
inv_txfm_add_16x4_flipadst_adst_1_12bpc_avx2: 185.6
inv_txfm_add_16x4_flipadst_adst_2_12bpc_c: 1761.8
inv_txfm_add_16x4_flipadst_adst_2_12bpc_avx2: 187.0
inv_txfm_add_16x4_flipadst_dct_0_12bpc_c: 1864.4
inv_txfm_add_16x4_flipadst_dct_0_12bpc_avx2: 176.8
inv_txfm_add_16x4_flipadst_dct_1_12bpc_c: 1862.7
inv_txfm_add_16x4_flipadst_dct_1_12bpc_avx2: 176.8
inv_txfm_add_16x4_flipadst_dct_2_12bpc_c: 1860.2
inv_txfm_add_16x4_flipadst_dct_2_12bpc_avx2: 176.8
inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_c: 1760.4
inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_avx2: 185.3
inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_c: 1761.8
inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_avx2: 185.3
inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_c: 1766.5
inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_avx2: 184.9
inv_txfm_add_16x4_flipadst_identity_0_12bpc_c: 1673.0
inv_txfm_add_16x4_flipadst_identity_0_12bpc_avx2: 143.1
inv_txfm_add_16x4_flipadst_identity_1_12bpc_c: 1673.2
inv_txfm_add_16x4_flipadst_identity_1_12bpc_avx2: 143.1
inv_txfm_add_16x4_flipadst_identity_2_12bpc_c: 1681.6
inv_txfm_add_16x4_flipadst_identity_2_12bpc_avx2: 143.2
inv_txfm_add_16x4_identity_adst_0_12bpc_c: 1128.7
inv_txfm_add_16x4_identity_adst_0_12bpc_avx2: 102.8
inv_txfm_add_16x4_identity_adst_1_12bpc_c: 1131.3
inv_txfm_add_16x4_identity_adst_1_12bpc_avx2: 101.3
inv_txfm_add_16x4_identity_adst_2_12bpc_c: 1127.5
inv_txfm_add_16x4_identity_adst_2_12bpc_avx2: 99.1
inv_txfm_add_16x4_identity_dct_0_12bpc_c: 1228.3
inv_txfm_add_16x4_identity_dct_0_12bpc_avx2: 88.3
inv_txfm_add_16x4_identity_dct_1_12bpc_c: 1220.5
inv_txfm_add_16x4_identity_dct_1_12bpc_avx2: 88.0
inv_txfm_add_16x4_identity_dct_2_12bpc_c: 1227.3
inv_txfm_add_16x4_identity_dct_2_12bpc_avx2: 88.1
inv_txfm_add_16x4_identity_flipadst_0_12bpc_c: 1142.4
inv_txfm_add_16x4_identity_flipadst_0_12bpc_avx2: 100.3
inv_txfm_add_16x4_identity_flipadst_1_12bpc_c: 1134.1
inv_txfm_add_16x4_identity_flipadst_1_12bpc_avx2: 100.3
inv_txfm_add_16x4_identity_flipadst_2_12bpc_c: 1136.4
inv_txfm_add_16x4_identity_flipadst_2_12bpc_avx2: 100.3
inv_txfm_add_16x4_identity_identity_0_12bpc_c: 1056.1
inv_txfm_add_16x4_identity_identity_0_12bpc_avx2: 61.6
inv_txfm_add_16x4_identity_identity_1_12bpc_c: 1064.6
inv_txfm_add_16x4_identity_identity_1_12bpc_avx2: 62.9
inv_txfm_add_16x4_identity_identity_2_12bpc_c: 1067.5
inv_txfm_add_16x4_identity_identity_2_12bpc_avx2: 63.5
2021-11-29 15:30:38 +01:00
Matthias Dressel f64b2c2256 x86/itx: Add 4x16 12bpc AVX2 transforms
inv_txfm_add_4x16_adst_adst_0_12bpc_c: 1799.1
inv_txfm_add_4x16_adst_adst_0_12bpc_avx2: 178.8
inv_txfm_add_4x16_adst_adst_1_12bpc_c: 1795.0
inv_txfm_add_4x16_adst_adst_1_12bpc_avx2: 179.1
inv_txfm_add_4x16_adst_adst_2_12bpc_c: 1806.6
inv_txfm_add_4x16_adst_adst_2_12bpc_avx2: 179.3
inv_txfm_add_4x16_adst_dct_0_12bpc_c: 1824.8
inv_txfm_add_4x16_adst_dct_0_12bpc_avx2: 166.8
inv_txfm_add_4x16_adst_dct_1_12bpc_c: 1828.2
inv_txfm_add_4x16_adst_dct_1_12bpc_avx2: 166.7
inv_txfm_add_4x16_adst_dct_2_12bpc_c: 1830.9
inv_txfm_add_4x16_adst_dct_2_12bpc_avx2: 165.6
inv_txfm_add_4x16_adst_flipadst_0_12bpc_c: 1797.9
inv_txfm_add_4x16_adst_flipadst_0_12bpc_avx2: 179.6
inv_txfm_add_4x16_adst_flipadst_1_12bpc_c: 1795.9
inv_txfm_add_4x16_adst_flipadst_1_12bpc_avx2: 180.6
inv_txfm_add_4x16_adst_flipadst_2_12bpc_c: 1791.6
inv_txfm_add_4x16_adst_flipadst_2_12bpc_avx2: 180.1
inv_txfm_add_4x16_adst_identity_0_12bpc_c: 1163.7
inv_txfm_add_4x16_adst_identity_0_12bpc_avx2: 78.6
inv_txfm_add_4x16_adst_identity_1_12bpc_c: 1163.4
inv_txfm_add_4x16_adst_identity_1_12bpc_avx2: 78.9
inv_txfm_add_4x16_adst_identity_2_12bpc_c: 1164.3
inv_txfm_add_4x16_adst_identity_2_12bpc_avx2: 78.8
inv_txfm_add_4x16_dct_adst_0_12bpc_c: 1914.8
inv_txfm_add_4x16_dct_adst_0_12bpc_avx2: 177.0
inv_txfm_add_4x16_dct_adst_1_12bpc_c: 1904.8
inv_txfm_add_4x16_dct_adst_1_12bpc_avx2: 177.3
inv_txfm_add_4x16_dct_adst_2_12bpc_c: 1905.4
inv_txfm_add_4x16_dct_adst_2_12bpc_avx2: 176.4
inv_txfm_add_4x16_dct_dct_0_12bpc_c: 217.1
inv_txfm_add_4x16_dct_dct_0_12bpc_avx2: 26.6
inv_txfm_add_4x16_dct_dct_1_12bpc_c: 1955.1
inv_txfm_add_4x16_dct_dct_1_12bpc_avx2: 162.3
inv_txfm_add_4x16_dct_dct_2_12bpc_c: 1948.9
inv_txfm_add_4x16_dct_dct_2_12bpc_avx2: 162.2
inv_txfm_add_4x16_dct_flipadst_0_12bpc_c: 1922.8
inv_txfm_add_4x16_dct_flipadst_0_12bpc_avx2: 180.6
inv_txfm_add_4x16_dct_flipadst_1_12bpc_c: 1919.7
inv_txfm_add_4x16_dct_flipadst_1_12bpc_avx2: 180.1
inv_txfm_add_4x16_dct_flipadst_2_12bpc_c: 1912.0
inv_txfm_add_4x16_dct_flipadst_2_12bpc_avx2: 180.1
inv_txfm_add_4x16_dct_identity_0_12bpc_c: 1276.4
inv_txfm_add_4x16_dct_identity_0_12bpc_avx2: 75.4
inv_txfm_add_4x16_dct_identity_1_12bpc_c: 1277.5
inv_txfm_add_4x16_dct_identity_1_12bpc_avx2: 75.4
inv_txfm_add_4x16_dct_identity_2_12bpc_c: 1270.1
inv_txfm_add_4x16_dct_identity_2_12bpc_avx2: 75.3
inv_txfm_add_4x16_flipadst_adst_0_12bpc_c: 1802.8
inv_txfm_add_4x16_flipadst_adst_0_12bpc_avx2: 180.8
inv_txfm_add_4x16_flipadst_adst_1_12bpc_c: 1804.8
inv_txfm_add_4x16_flipadst_adst_1_12bpc_avx2: 180.7
inv_txfm_add_4x16_flipadst_adst_2_12bpc_c: 1800.6
inv_txfm_add_4x16_flipadst_adst_2_12bpc_avx2: 181.2
inv_txfm_add_4x16_flipadst_dct_0_12bpc_c: 1842.5
inv_txfm_add_4x16_flipadst_dct_0_12bpc_avx2: 165.1
inv_txfm_add_4x16_flipadst_dct_1_12bpc_c: 1837.8
inv_txfm_add_4x16_flipadst_dct_1_12bpc_avx2: 164.4
inv_txfm_add_4x16_flipadst_dct_2_12bpc_c: 1841.6
inv_txfm_add_4x16_flipadst_dct_2_12bpc_avx2: 166.1
inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_c: 1812.4
inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_avx2: 182.0
inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_c: 1803.9
inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_avx2: 181.2
inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_c: 1809.9
inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_avx2: 183.2
inv_txfm_add_4x16_flipadst_identity_0_12bpc_c: 1170.5
inv_txfm_add_4x16_flipadst_identity_0_12bpc_avx2: 78.4
inv_txfm_add_4x16_flipadst_identity_1_12bpc_c: 1172.1
inv_txfm_add_4x16_flipadst_identity_1_12bpc_avx2: 80.0
inv_txfm_add_4x16_flipadst_identity_2_12bpc_c: 1170.9
inv_txfm_add_4x16_flipadst_identity_2_12bpc_avx2: 78.6
inv_txfm_add_4x16_identity_adst_0_12bpc_c: 1705.4
inv_txfm_add_4x16_identity_adst_0_12bpc_avx2: 162.6
inv_txfm_add_4x16_identity_adst_1_12bpc_c: 1714.5
inv_txfm_add_4x16_identity_adst_1_12bpc_avx2: 162.6
inv_txfm_add_4x16_identity_adst_2_12bpc_c: 1703.1
inv_txfm_add_4x16_identity_adst_2_12bpc_avx2: 162.5
inv_txfm_add_4x16_identity_dct_0_12bpc_c: 1775.0
inv_txfm_add_4x16_identity_dct_0_12bpc_avx2: 150.5
inv_txfm_add_4x16_identity_dct_1_12bpc_c: 1753.0
inv_txfm_add_4x16_identity_dct_1_12bpc_avx2: 150.6
inv_txfm_add_4x16_identity_dct_2_12bpc_c: 1759.6
inv_txfm_add_4x16_identity_dct_2_12bpc_avx2: 149.8
inv_txfm_add_4x16_identity_flipadst_0_12bpc_c: 1727.5
inv_txfm_add_4x16_identity_flipadst_0_12bpc_avx2: 160.3
inv_txfm_add_4x16_identity_flipadst_1_12bpc_c: 1739.8
inv_txfm_add_4x16_identity_flipadst_1_12bpc_avx2: 160.9
inv_txfm_add_4x16_identity_flipadst_2_12bpc_c: 1728.3
inv_txfm_add_4x16_identity_flipadst_2_12bpc_avx2: 159.9
inv_txfm_add_4x16_identity_identity_0_12bpc_c: 1098.6
inv_txfm_add_4x16_identity_identity_0_12bpc_avx2: 60.4
inv_txfm_add_4x16_identity_identity_1_12bpc_c: 1095.4
inv_txfm_add_4x16_identity_identity_1_12bpc_avx2: 61.3
inv_txfm_add_4x16_identity_identity_2_12bpc_c: 1111.6
inv_txfm_add_4x16_identity_identity_2_12bpc_avx2: 60.6
2021-11-29 15:30:38 +01:00
Matthias Dressel 00f92f2ccb x86/itx: Convert 8bpc WHT to SSE2
WHT uses no SSSE3 instructions. The 16bpc variant is already SSE2.
2021-11-29 14:56:25 +01:00
Matthias Dressel 31820a5e6b x86/itx: Add 8x8 12bpc AVX2 transforms
inv_txfm_add_8x8_adst_adst_0_12bpc_c: 1997.9
inv_txfm_add_8x8_adst_adst_0_12bpc_avx2: 185.7
inv_txfm_add_8x8_adst_adst_1_12bpc_c: 2009.8
inv_txfm_add_8x8_adst_adst_1_12bpc_avx2: 185.7
inv_txfm_add_8x8_adst_dct_0_12bpc_c: 1991.0
inv_txfm_add_8x8_adst_dct_0_12bpc_avx2: 161.3
inv_txfm_add_8x8_adst_dct_1_12bpc_c: 1977.0
inv_txfm_add_8x8_adst_dct_1_12bpc_avx2: 161.4
inv_txfm_add_8x8_adst_flipadst_0_12bpc_c: 2017.6
inv_txfm_add_8x8_adst_flipadst_0_12bpc_avx2: 184.2
inv_txfm_add_8x8_adst_flipadst_1_12bpc_c: 2018.9
inv_txfm_add_8x8_adst_flipadst_1_12bpc_avx2: 184.2
inv_txfm_add_8x8_adst_identity_0_12bpc_c: 1407.2
inv_txfm_add_8x8_adst_identity_0_12bpc_avx2: 95.7
inv_txfm_add_8x8_adst_identity_1_12bpc_c: 1405.9
inv_txfm_add_8x8_adst_identity_1_12bpc_avx2: 95.8
inv_txfm_add_8x8_dct_adst_0_12bpc_c: 2024.2
inv_txfm_add_8x8_dct_adst_0_12bpc_avx2: 156.9
inv_txfm_add_8x8_dct_adst_1_12bpc_c: 2018.8
inv_txfm_add_8x8_dct_adst_1_12bpc_avx2: 160.1
inv_txfm_add_8x8_dct_dct_0_12bpc_c: 213.0
inv_txfm_add_8x8_dct_dct_0_12bpc_avx2: 24.8
inv_txfm_add_8x8_dct_dct_1_12bpc_c: 2008.6
inv_txfm_add_8x8_dct_dct_1_12bpc_avx2: 139.0
inv_txfm_add_8x8_dct_flipadst_0_12bpc_c: 2012.3
inv_txfm_add_8x8_dct_flipadst_0_12bpc_avx2: 159.2
inv_txfm_add_8x8_dct_flipadst_1_12bpc_c: 2005.1
inv_txfm_add_8x8_dct_flipadst_1_12bpc_avx2: 158.7
inv_txfm_add_8x8_dct_identity_0_12bpc_c: 1470.4
inv_txfm_add_8x8_dct_identity_0_12bpc_avx2: 71.7
inv_txfm_add_8x8_dct_identity_1_12bpc_c: 1477.8
inv_txfm_add_8x8_dct_identity_1_12bpc_avx2: 70.7
inv_txfm_add_8x8_flipadst_adst_0_12bpc_c: 2006.1
inv_txfm_add_8x8_flipadst_adst_0_12bpc_avx2: 183.6
inv_txfm_add_8x8_flipadst_adst_1_12bpc_c: 1987.6
inv_txfm_add_8x8_flipadst_adst_1_12bpc_avx2: 183.6
inv_txfm_add_8x8_flipadst_dct_0_12bpc_c: 1986.6
inv_txfm_add_8x8_flipadst_dct_0_12bpc_avx2: 163.0
inv_txfm_add_8x8_flipadst_dct_1_12bpc_c: 1979.3
inv_txfm_add_8x8_flipadst_dct_1_12bpc_avx2: 163.1
inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_c: 2004.0
inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_avx2: 184.3
inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_c: 2003.9
inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_avx2: 184.3
inv_txfm_add_8x8_flipadst_identity_0_12bpc_c: 1433.5
inv_txfm_add_8x8_flipadst_identity_0_12bpc_avx2: 95.3
inv_txfm_add_8x8_flipadst_identity_1_12bpc_c: 1425.4
inv_txfm_add_8x8_flipadst_identity_1_12bpc_avx2: 96.3
inv_txfm_add_8x8_identity_adst_0_12bpc_c: 1456.5
inv_txfm_add_8x8_identity_adst_0_12bpc_avx2: 115.8
inv_txfm_add_8x8_identity_adst_1_12bpc_c: 1453.5
inv_txfm_add_8x8_identity_adst_1_12bpc_avx2: 115.8
inv_txfm_add_8x8_identity_dct_0_12bpc_c: 1450.0
inv_txfm_add_8x8_identity_dct_0_12bpc_avx2: 93.5
inv_txfm_add_8x8_identity_dct_1_12bpc_c: 1447.5
inv_txfm_add_8x8_identity_dct_1_12bpc_avx2: 94.3
inv_txfm_add_8x8_identity_flipadst_0_12bpc_c: 1451.7
inv_txfm_add_8x8_identity_flipadst_0_12bpc_avx2: 114.0
inv_txfm_add_8x8_identity_flipadst_1_12bpc_c: 1456.4
inv_txfm_add_8x8_identity_flipadst_1_12bpc_avx2: 114.0
inv_txfm_add_8x8_identity_identity_0_12bpc_c: 892.3
inv_txfm_add_8x8_identity_identity_0_12bpc_avx2: 33.7
inv_txfm_add_8x8_identity_identity_1_12bpc_c: 897.2
inv_txfm_add_8x8_identity_identity_1_12bpc_avx2: 33.1
2021-11-13 15:04:54 +01:00
Matthias Dressel 53cf6a3b65 x86/itx: Add 8x4 12bpc AVX2 transforms
inv_txfm_add_8x4_adst_adst_0_12bpc_c: 882.1
inv_txfm_add_8x4_adst_adst_0_12bpc_avx2: 113.7
inv_txfm_add_8x4_adst_adst_1_12bpc_c: 882.5
inv_txfm_add_8x4_adst_adst_1_12bpc_avx2: 113.8
inv_txfm_add_8x4_adst_dct_0_12bpc_c: 928.0
inv_txfm_add_8x4_adst_dct_0_12bpc_avx2: 109.2
inv_txfm_add_8x4_adst_dct_1_12bpc_c: 924.9
inv_txfm_add_8x4_adst_dct_1_12bpc_avx2: 109.2
inv_txfm_add_8x4_adst_flipadst_0_12bpc_c: 889.9
inv_txfm_add_8x4_adst_flipadst_0_12bpc_avx2: 114.3
inv_txfm_add_8x4_adst_flipadst_1_12bpc_c: 886.0
inv_txfm_add_8x4_adst_flipadst_1_12bpc_avx2: 114.8
inv_txfm_add_8x4_adst_identity_0_12bpc_c: 832.2
inv_txfm_add_8x4_adst_identity_0_12bpc_avx2: 88.8
inv_txfm_add_8x4_adst_identity_1_12bpc_c: 834.6
inv_txfm_add_8x4_adst_identity_1_12bpc_avx2: 89.0
inv_txfm_add_8x4_dct_adst_0_12bpc_c: 870.3
inv_txfm_add_8x4_dct_adst_0_12bpc_avx2: 96.3
inv_txfm_add_8x4_dct_adst_1_12bpc_c: 884.6
inv_txfm_add_8x4_dct_adst_1_12bpc_avx2: 96.3
inv_txfm_add_8x4_dct_dct_0_12bpc_c: 116.1
inv_txfm_add_8x4_dct_dct_0_12bpc_avx2: 24.5
inv_txfm_add_8x4_dct_dct_1_12bpc_c: 925.1
inv_txfm_add_8x4_dct_dct_1_12bpc_avx2: 92.3
inv_txfm_add_8x4_dct_flipadst_0_12bpc_c: 882.7
inv_txfm_add_8x4_dct_flipadst_0_12bpc_avx2: 97.0
inv_txfm_add_8x4_dct_flipadst_1_12bpc_c: 882.1
inv_txfm_add_8x4_dct_flipadst_1_12bpc_avx2: 97.0
inv_txfm_add_8x4_dct_identity_0_12bpc_c: 827.5
inv_txfm_add_8x4_dct_identity_0_12bpc_avx2: 72.4
inv_txfm_add_8x4_dct_identity_1_12bpc_c: 827.8
inv_txfm_add_8x4_dct_identity_1_12bpc_avx2: 73.8
inv_txfm_add_8x4_flipadst_adst_0_12bpc_c: 899.5
inv_txfm_add_8x4_flipadst_adst_0_12bpc_avx2: 113.2
inv_txfm_add_8x4_flipadst_adst_1_12bpc_c: 898.8
inv_txfm_add_8x4_flipadst_adst_1_12bpc_avx2: 113.3
inv_txfm_add_8x4_flipadst_dct_0_12bpc_c: 945.7
inv_txfm_add_8x4_flipadst_dct_0_12bpc_avx2: 108.3
inv_txfm_add_8x4_flipadst_dct_1_12bpc_c: 945.6
inv_txfm_add_8x4_flipadst_dct_1_12bpc_avx2: 108.3
inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_c: 903.6
inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_avx2: 113.9
inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_c: 902.8
inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_avx2: 114.2
inv_txfm_add_8x4_flipadst_identity_0_12bpc_c: 856.6
inv_txfm_add_8x4_flipadst_identity_0_12bpc_avx2: 88.3
inv_txfm_add_8x4_flipadst_identity_1_12bpc_c: 848.8
inv_txfm_add_8x4_flipadst_identity_1_12bpc_avx2: 87.4
inv_txfm_add_8x4_identity_adst_0_12bpc_c: 583.2
inv_txfm_add_8x4_identity_adst_0_12bpc_avx2: 69.6
inv_txfm_add_8x4_identity_adst_1_12bpc_c: 584.3
inv_txfm_add_8x4_identity_adst_1_12bpc_avx2: 69.6
inv_txfm_add_8x4_identity_dct_0_12bpc_c: 632.9
inv_txfm_add_8x4_identity_dct_0_12bpc_avx2: 65.3
inv_txfm_add_8x4_identity_dct_1_12bpc_c: 629.6
inv_txfm_add_8x4_identity_dct_1_12bpc_avx2: 65.8
inv_txfm_add_8x4_identity_flipadst_0_12bpc_c: 587.0
inv_txfm_add_8x4_identity_flipadst_0_12bpc_avx2: 71.0
inv_txfm_add_8x4_identity_flipadst_1_12bpc_c: 586.9
inv_txfm_add_8x4_identity_flipadst_1_12bpc_avx2: 71.0
inv_txfm_add_8x4_identity_identity_0_12bpc_c: 533.0
inv_txfm_add_8x4_identity_identity_0_12bpc_avx2: 45.3
inv_txfm_add_8x4_identity_identity_1_12bpc_c: 539.7
inv_txfm_add_8x4_identity_identity_1_12bpc_avx2: 45.9
2021-11-13 15:04:54 +01:00
Matthias Dressel 241753f5be x86/itx: Add 4x8 12bpc AVX2 transforms
inv_txfm_add_4x8_adst_adst_0_12bpc_c: 900.8
inv_txfm_add_4x8_adst_adst_0_12bpc_avx2: 118.8
inv_txfm_add_4x8_adst_adst_1_12bpc_c: 893.7
inv_txfm_add_4x8_adst_adst_1_12bpc_avx2: 118.8
inv_txfm_add_4x8_adst_dct_0_12bpc_c: 890.2
inv_txfm_add_4x8_adst_dct_0_12bpc_avx2: 104.8
inv_txfm_add_4x8_adst_dct_1_12bpc_c: 887.4
inv_txfm_add_4x8_adst_dct_1_12bpc_avx2: 104.8
inv_txfm_add_4x8_adst_flipadst_0_12bpc_c: 919.6
inv_txfm_add_4x8_adst_flipadst_0_12bpc_avx2: 116.6
inv_txfm_add_4x8_adst_flipadst_1_12bpc_c: 912.1
inv_txfm_add_4x8_adst_flipadst_1_12bpc_avx2: 116.6
inv_txfm_add_4x8_adst_identity_0_12bpc_c: 613.5
inv_txfm_add_4x8_adst_identity_0_12bpc_avx2: 42.8
inv_txfm_add_4x8_adst_identity_1_12bpc_c: 608.7
inv_txfm_add_4x8_adst_identity_1_12bpc_avx2: 43.3
inv_txfm_add_4x8_dct_adst_0_12bpc_c: 951.7
inv_txfm_add_4x8_dct_adst_0_12bpc_avx2: 113.8
inv_txfm_add_4x8_dct_adst_1_12bpc_c: 949.0
inv_txfm_add_4x8_dct_adst_1_12bpc_avx2: 113.1
inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6
inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5
inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4
inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2
inv_txfm_add_4x8_dct_flipadst_0_12bpc_c: 959.3
inv_txfm_add_4x8_dct_flipadst_0_12bpc_avx2: 113.9
inv_txfm_add_4x8_dct_flipadst_1_12bpc_c: 964.1
inv_txfm_add_4x8_dct_flipadst_1_12bpc_avx2: 114.3
inv_txfm_add_4x8_dct_identity_0_12bpc_c: 659.9
inv_txfm_add_4x8_dct_identity_0_12bpc_avx2: 41.9
inv_txfm_add_4x8_dct_identity_1_12bpc_c: 658.6
inv_txfm_add_4x8_dct_identity_1_12bpc_avx2: 41.6
inv_txfm_add_4x8_flipadst_adst_0_12bpc_c: 906.6
inv_txfm_add_4x8_flipadst_adst_0_12bpc_avx2: 117.3
inv_txfm_add_4x8_flipadst_adst_1_12bpc_c: 907.7
inv_txfm_add_4x8_flipadst_adst_1_12bpc_avx2: 117.3
inv_txfm_add_4x8_flipadst_dct_0_12bpc_c: 890.3
inv_txfm_add_4x8_flipadst_dct_0_12bpc_avx2: 104.6
inv_txfm_add_4x8_flipadst_dct_1_12bpc_c: 895.6
inv_txfm_add_4x8_flipadst_dct_1_12bpc_avx2: 104.6
inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_c: 902.9
inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_avx2: 116.5
inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_c: 915.0
inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_avx2: 116.4
inv_txfm_add_4x8_flipadst_identity_0_12bpc_c: 618.6
inv_txfm_add_4x8_flipadst_identity_0_12bpc_avx2: 45.3
inv_txfm_add_4x8_flipadst_identity_1_12bpc_c: 618.1
inv_txfm_add_4x8_flipadst_identity_1_12bpc_avx2: 44.0
inv_txfm_add_4x8_identity_adst_0_12bpc_c: 829.7
inv_txfm_add_4x8_identity_adst_0_12bpc_avx2: 107.4
inv_txfm_add_4x8_identity_adst_1_12bpc_c: 831.7
inv_txfm_add_4x8_identity_adst_1_12bpc_avx2: 107.8
inv_txfm_add_4x8_identity_dct_0_12bpc_c: 823.2
inv_txfm_add_4x8_identity_dct_0_12bpc_avx2: 90.7
inv_txfm_add_4x8_identity_dct_1_12bpc_c: 824.1
inv_txfm_add_4x8_identity_dct_1_12bpc_avx2: 90.7
inv_txfm_add_4x8_identity_flipadst_0_12bpc_c: 853.4
inv_txfm_add_4x8_identity_flipadst_0_12bpc_avx2: 106.8
inv_txfm_add_4x8_identity_flipadst_1_12bpc_c: 852.2
inv_txfm_add_4x8_identity_flipadst_1_12bpc_avx2: 106.8
inv_txfm_add_4x8_identity_identity_0_12bpc_c: 543.2
inv_txfm_add_4x8_identity_identity_0_12bpc_avx2: 36.4
inv_txfm_add_4x8_identity_identity_1_12bpc_c: 544.8
inv_txfm_add_4x8_identity_identity_1_12bpc_avx2: 36.6
2021-11-13 13:58:28 +01:00
Matthias Dressel 9727d8579b CI: Check for potentially dangerous Unicode characters
Bidirectional control and invisible characters can be used to hide
malicious code.
Ref: CVE-2021-42574, CVE-2021-42694
2021-11-05 14:58:25 +01:00
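The kind of check this commit describes can be sketched in a few lines of Python. This is only an illustration, assuming the check scans text for the bidirectional control characters named in CVE-2021-42574 plus a handful of zero-width characters. The function name `find_suspicious` and the exact character set are hypothetical and are not taken from the project's CI script.

```python
# Hypothetical checker; the character set below is an assumption for illustration.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}  # zero-width characters
SUSPICIOUS = BIDI_CONTROLS | INVISIBLE


def find_suspicious(text):
    """Return (line, column, codepoint) for every suspicious character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

A CI job built on such a function would fail whenever the returned list is non-empty for any tracked source file.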
Matthias Dressel e40cc46c3c x86/itx: Add clipping to iadst 4x16
Values need to be clipped after Hadamard rotations.
2021-11-02 16:29:05 +01:00
Matthias Dressel eb0308bcdf x86/itx: Add 12-bit 4x4 transforms in AVX2
Refactors itx into separate 10-bit and 12-bit functions to avoid
conditional jumps.

inv_txfm_add_4x4_adst_adst_0_12bpc_c: 370.9
inv_txfm_add_4x4_adst_adst_0_12bpc_avx2: 68.6
inv_txfm_add_4x4_adst_adst_1_12bpc_c: 371.0
inv_txfm_add_4x4_adst_adst_1_12bpc_avx2: 68.7
inv_txfm_add_4x4_adst_dct_0_12bpc_c: 413.1
inv_txfm_add_4x4_adst_dct_0_12bpc_avx2: 69.2
inv_txfm_add_4x4_adst_dct_1_12bpc_c: 412.7
inv_txfm_add_4x4_adst_dct_1_12bpc_avx2: 68.8
inv_txfm_add_4x4_adst_flipadst_0_12bpc_c: 378.5
inv_txfm_add_4x4_adst_flipadst_0_12bpc_avx2: 74.9
inv_txfm_add_4x4_adst_flipadst_1_12bpc_c: 378.1
inv_txfm_add_4x4_adst_flipadst_1_12bpc_avx2: 74.6
inv_txfm_add_4x4_adst_identity_0_12bpc_c: 347.8
inv_txfm_add_4x4_adst_identity_0_12bpc_avx2: 48.8
inv_txfm_add_4x4_adst_identity_1_12bpc_c: 342.7
inv_txfm_add_4x4_adst_identity_1_12bpc_avx2: 49.0
inv_txfm_add_4x4_dct_adst_0_12bpc_c: 399.2
inv_txfm_add_4x4_dct_adst_0_12bpc_avx2: 73.1
inv_txfm_add_4x4_dct_adst_1_12bpc_c: 398.7
inv_txfm_add_4x4_dct_adst_1_12bpc_avx2: 72.2
inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
inv_txfm_add_4x4_dct_dct_1_12bpc_c: 420.5
inv_txfm_add_4x4_dct_dct_1_12bpc_avx2: 72.2
inv_txfm_add_4x4_dct_flipadst_0_12bpc_c: 405.5
inv_txfm_add_4x4_dct_flipadst_0_12bpc_avx2: 75.9
inv_txfm_add_4x4_dct_flipadst_1_12bpc_c: 404.2
inv_txfm_add_4x4_dct_flipadst_1_12bpc_avx2: 75.6
inv_txfm_add_4x4_dct_identity_0_12bpc_c: 374.1
inv_txfm_add_4x4_dct_identity_0_12bpc_avx2: 51.6
inv_txfm_add_4x4_dct_identity_1_12bpc_c: 368.0
inv_txfm_add_4x4_dct_identity_1_12bpc_avx2: 51.8
inv_txfm_add_4x4_flipadst_adst_0_12bpc_c: 368.0
inv_txfm_add_4x4_flipadst_adst_0_12bpc_avx2: 69.2
inv_txfm_add_4x4_flipadst_adst_1_12bpc_c: 370.7
inv_txfm_add_4x4_flipadst_adst_1_12bpc_avx2: 70.4
inv_txfm_add_4x4_flipadst_dct_0_12bpc_c: 393.7
inv_txfm_add_4x4_flipadst_dct_0_12bpc_avx2: 70.1
inv_txfm_add_4x4_flipadst_dct_1_12bpc_c: 392.9
inv_txfm_add_4x4_flipadst_dct_1_12bpc_avx2: 69.6
inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_c: 382.2
inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_avx2: 74.6
inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_c: 381.3
inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_avx2: 74.9
inv_txfm_add_4x4_flipadst_identity_0_12bpc_c: 346.7
inv_txfm_add_4x4_flipadst_identity_0_12bpc_avx2: 48.2
inv_txfm_add_4x4_flipadst_identity_1_12bpc_c: 347.9
inv_txfm_add_4x4_flipadst_identity_1_12bpc_avx2: 48.7
inv_txfm_add_4x4_identity_adst_0_12bpc_c: 344.7
inv_txfm_add_4x4_identity_adst_0_12bpc_avx2: 59.8
inv_txfm_add_4x4_identity_adst_1_12bpc_c: 340.5
inv_txfm_add_4x4_identity_adst_1_12bpc_avx2: 59.2
inv_txfm_add_4x4_identity_dct_0_12bpc_c: 369.8
inv_txfm_add_4x4_identity_dct_0_12bpc_avx2: 59.3
inv_txfm_add_4x4_identity_dct_1_12bpc_c: 369.5
inv_txfm_add_4x4_identity_dct_1_12bpc_avx2: 59.2
inv_txfm_add_4x4_identity_flipadst_0_12bpc_c: 353.4
inv_txfm_add_4x4_identity_flipadst_0_12bpc_avx2: 65.6
inv_txfm_add_4x4_identity_flipadst_1_12bpc_c: 350.9
inv_txfm_add_4x4_identity_flipadst_1_12bpc_avx2: 65.9
inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
inv_txfm_add_4x4_identity_identity_1_12bpc_c: 321.6
inv_txfm_add_4x4_identity_identity_1_12bpc_avx2: 39.5
2021-10-18 20:45:36 +02:00
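The benchmark listings in these commits are easier to compare as speedup ratios. The sketch below pairs each `_c` line with its SIMD counterparts and divides the two figures. It assumes the numbers are checkasm timings where lower is faster; the helper name `speedups` and the list of recognised suffixes are illustrative choices, not part of dav1d.

```python
import re
from collections import defaultdict

# Matches lines such as "inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9".
# If a line carries several columns, only the first figure is used.
LINE_RE = re.compile(
    r"^(?P<name>\w+)_(?P<impl>c|sse2|ssse3|sse4|avx2):\s+(?P<value>[\d.]+)"
)


def speedups(log):
    """Map each function name to {impl: C timing / impl timing}."""
    timings = defaultdict(dict)
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            timings[m["name"]][m["impl"]] = float(m["value"])
    result = {}
    for name, impls in timings.items():
        if "c" not in impls:
            continue  # no C reference to compare against
        result[name] = {
            impl: impls["c"] / t for impl, t in impls.items() if impl != "c"
        }
    return result


# Figures copied from the 4x4 12bpc listing above.
sample = """\
inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
"""
ratios = speedups(sample)
for name, by_impl in ratios.items():
    print(f"{name}: {by_impl['avx2']:.2f}x")
```

The same function can be fed any of the other listings in this log, since they all follow the `name_impl: value` layout.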
Matthias Dressel 4cdfe6919f x86/itx: Rename rax to r6
Use numerical GPR references everywhere for consistency.
2021-10-18 20:20:02 +02:00
Matthias Dressel 1ea40afdbb x86/itx: Name constants more explicitly
Give some constants a more explicit name to avoid confusion when 12bpc
support is added.
2021-10-18 20:20:02 +02:00
Matthias Dressel cff5ba694c CI: Update CI images 2021-10-18 16:15:52 +02:00
Matthias Dressel c6a97f8a3e CI: Output the dav1d-test-data commit used in the run
Having the exact commit hash in the logs helps with debugging.
2021-09-17 16:31:51 +02:00
Matthias Dressel 4533dd8678 CI: snap: Upload releases to stable channel 2021-09-03 13:31:42 +00:00
Matthias Dressel 6ab2b716cc x86: Simplify loopfilter init 2021-09-03 14:59:59 +02:00
Matthias Dressel e4812a6ad7 x86: itx4: Inline transpose
Saves one move.
2021-06-21 19:30:39 +02:00
Matthias Dressel 89be94d41e x86: Add bpc suffix to filmgrain functions 2021-06-20 23:02:02 +02:00
Matthias Dressel c7e0ad4577 x86: Add bpc suffix to loopfilter functions 2021-06-20 23:02:02 +02:00
Matthias Dressel a6821cee0a x86: Add bpc suffix to ipred functions 2021-06-20 23:02:02 +02:00
Matthias Dressel f951165ea6 x86: itx: Port 10-bit 4x4 transforms to SSE4
                                                64-bit  32-bit
inv_txfm_add_4x4_adst_adst_0_10bpc_c:            257.0   346.3
inv_txfm_add_4x4_adst_adst_0_10bpc_sse4:          47.1    51.7
inv_txfm_add_4x4_adst_adst_0_10bpc_avx2:          57.4
inv_txfm_add_4x4_adst_adst_1_10bpc_c:            259.8   345.6
inv_txfm_add_4x4_adst_adst_1_10bpc_sse4:          47.1    52.0
inv_txfm_add_4x4_adst_adst_1_10bpc_avx2:          56.9
inv_txfm_add_4x4_adst_dct_0_10bpc_c:             284.6   369.9
inv_txfm_add_4x4_adst_dct_0_10bpc_sse4:           42.2    46.0
inv_txfm_add_4x4_adst_dct_0_10bpc_avx2:           51.9
inv_txfm_add_4x4_adst_dct_1_10bpc_c:             285.2   369.8
inv_txfm_add_4x4_adst_dct_1_10bpc_sse4:           42.4    45.9
inv_txfm_add_4x4_adst_dct_1_10bpc_avx2:           51.9
inv_txfm_add_4x4_adst_flipadst_0_10bpc_c:        262.9   345.0
inv_txfm_add_4x4_adst_flipadst_0_10bpc_sse4:      46.8    50.1
inv_txfm_add_4x4_adst_flipadst_0_10bpc_avx2:      57.0
inv_txfm_add_4x4_adst_flipadst_1_10bpc_c:        262.1   345.6
inv_txfm_add_4x4_adst_flipadst_1_10bpc_sse4:      46.8    50.3
inv_txfm_add_4x4_adst_flipadst_1_10bpc_avx2:      57.1
inv_txfm_add_4x4_adst_identity_0_10bpc_c:        225.6   302.9
inv_txfm_add_4x4_adst_identity_0_10bpc_sse4:      38.0    42.3
inv_txfm_add_4x4_adst_identity_0_10bpc_avx2:      41.4
inv_txfm_add_4x4_adst_identity_1_10bpc_c:        225.7   303.1
inv_txfm_add_4x4_adst_identity_1_10bpc_sse4:      37.8    42.3
inv_txfm_add_4x4_adst_identity_1_10bpc_avx2:      41.4
inv_txfm_add_4x4_dct_adst_0_10bpc_c:             274.6   378.0
inv_txfm_add_4x4_dct_adst_0_10bpc_sse4:           44.8    48.5
inv_txfm_add_4x4_dct_adst_0_10bpc_avx2:           50.7
inv_txfm_add_4x4_dct_adst_1_10bpc_c:             274.0   377.4
inv_txfm_add_4x4_dct_adst_1_10bpc_sse4:           44.6    48.6
inv_txfm_add_4x4_dct_adst_1_10bpc_avx2:           51.0
inv_txfm_add_4x4_dct_dct_0_10bpc_c:               39.2    50.6
inv_txfm_add_4x4_dct_dct_0_10bpc_sse4:            29.1    33.8
inv_txfm_add_4x4_dct_dct_0_10bpc_avx2:            29.3
inv_txfm_add_4x4_dct_dct_1_10bpc_c:              300.6   399.0
inv_txfm_add_4x4_dct_dct_1_10bpc_sse4:            39.7    44.3
inv_txfm_add_4x4_dct_dct_1_10bpc_avx2:            48.6
inv_txfm_add_4x4_dct_flipadst_0_10bpc_c:         278.6   377.8
inv_txfm_add_4x4_dct_flipadst_0_10bpc_sse4:       45.3    49.6
inv_txfm_add_4x4_dct_flipadst_0_10bpc_avx2:       50.2
inv_txfm_add_4x4_dct_flipadst_1_10bpc_c:         277.1   378.3
inv_txfm_add_4x4_dct_flipadst_1_10bpc_sse4:       45.0    49.7
inv_txfm_add_4x4_dct_flipadst_1_10bpc_avx2:       50.2
inv_txfm_add_4x4_dct_identity_0_10bpc_c:         246.9   335.8
inv_txfm_add_4x4_dct_identity_0_10bpc_sse4:       37.1    41.7
inv_txfm_add_4x4_dct_identity_0_10bpc_avx2:       37.4
inv_txfm_add_4x4_dct_identity_1_10bpc_c:         247.2   336.2
inv_txfm_add_4x4_dct_identity_1_10bpc_sse4:       37.1    41.6
inv_txfm_add_4x4_dct_identity_1_10bpc_avx2:       37.3
inv_txfm_add_4x4_flipadst_adst_0_10bpc_c:        259.4   351.7
inv_txfm_add_4x4_flipadst_adst_0_10bpc_sse4:      47.1    51.8
inv_txfm_add_4x4_flipadst_adst_0_10bpc_avx2:      57.9
inv_txfm_add_4x4_flipadst_adst_1_10bpc_c:        258.7   350.8
inv_txfm_add_4x4_flipadst_adst_1_10bpc_sse4:      47.1    51.8
inv_txfm_add_4x4_flipadst_adst_1_10bpc_avx2:      57.4
inv_txfm_add_4x4_flipadst_dct_0_10bpc_c:         282.3   375.4
inv_txfm_add_4x4_flipadst_dct_0_10bpc_sse4:       42.2    45.8
inv_txfm_add_4x4_flipadst_dct_0_10bpc_avx2:       52.5
inv_txfm_add_4x4_flipadst_dct_1_10bpc_c:         283.0   375.8
inv_txfm_add_4x4_flipadst_dct_1_10bpc_sse4:       42.5    45.9
inv_txfm_add_4x4_flipadst_dct_1_10bpc_avx2:       52.4
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_c:    258.8   356.1
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_sse4:  47.3    50.1
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_avx2:  57.4
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_c:    259.0   355.3
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_sse4:  47.8    50.2
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_avx2:  57.4
inv_txfm_add_4x4_flipadst_identity_0_10bpc_c:    228.6   309.4
inv_txfm_add_4x4_flipadst_identity_0_10bpc_sse4:  37.8    42.0
inv_txfm_add_4x4_flipadst_identity_0_10bpc_avx2:  41.4
inv_txfm_add_4x4_flipadst_identity_1_10bpc_c:    229.1   309.6
inv_txfm_add_4x4_flipadst_identity_1_10bpc_sse4:  37.9    42.2
inv_txfm_add_4x4_flipadst_identity_1_10bpc_avx2:  41.3
inv_txfm_add_4x4_identity_adst_0_10bpc_c:        200.8   275.8
inv_txfm_add_4x4_identity_adst_0_10bpc_sse4:      39.0    43.9
inv_txfm_add_4x4_identity_adst_0_10bpc_avx2:      47.4
inv_txfm_add_4x4_identity_adst_1_10bpc_c:        200.8   276.5
inv_txfm_add_4x4_identity_adst_1_10bpc_sse4:      39.0    44.0
inv_txfm_add_4x4_identity_adst_1_10bpc_avx2:      47.2
inv_txfm_add_4x4_identity_dct_0_10bpc_c:         226.4   300.3
inv_txfm_add_4x4_identity_dct_0_10bpc_sse4:       36.9    41.7
inv_txfm_add_4x4_identity_dct_0_10bpc_avx2:       42.8
inv_txfm_add_4x4_identity_dct_1_10bpc_c:         229.0   300.6
inv_txfm_add_4x4_identity_dct_1_10bpc_sse4:       36.8    41.6
inv_txfm_add_4x4_identity_dct_1_10bpc_avx2:       42.7
inv_txfm_add_4x4_identity_flipadst_0_10bpc_c:    202.6   278.9
inv_txfm_add_4x4_identity_flipadst_0_10bpc_sse4:  39.2    43.7
inv_txfm_add_4x4_identity_flipadst_0_10bpc_avx2:  47.1
inv_txfm_add_4x4_identity_flipadst_1_10bpc_c:    202.6   279.3
inv_txfm_add_4x4_identity_flipadst_1_10bpc_sse4:  39.2    43.8
inv_txfm_add_4x4_identity_flipadst_1_10bpc_avx2:  47.0
inv_txfm_add_4x4_identity_identity_0_10bpc_c:    168.7   235.9
inv_txfm_add_4x4_identity_identity_0_10bpc_sse4:  31.7    37.6
inv_txfm_add_4x4_identity_identity_0_10bpc_avx2:  33.9
inv_txfm_add_4x4_identity_identity_1_10bpc_c:    169.1   235.7
inv_txfm_add_4x4_identity_identity_1_10bpc_sse4:  31.7    37.4
inv_txfm_add_4x4_identity_identity_1_10bpc_avx2:  33.8
2021-06-19 20:44:56 +02:00
Matthias Dressel f4a8f804fd x86: itx: wht: Minor fixes
* Rename macro for consistency. WHT has exactly one line per register.
* Use REPX to make code more readable.
2021-06-18 17:48:56 +02:00
Matthias Dressel 770c9c834d x86: Add bpc suffix to itx functions 2021-06-18 02:54:34 +02:00
Matthias Dressel c54add0204 x86: itx: Add 10/12-bit SSE2 WHT 2021-05-18 02:50:02 +02:00
Matthias Dressel 477cc158d1 x86: itx: Add 12-bit wht 2021-05-13 21:29:03 +02:00
Matthias Dressel ae8958bdc7 CI: Fix asm checks
meson 0.57.0 introduced an optimization [0] for `meson test` to only
rebuild test dependencies. This does not cover changing the build
configuration anymore.

[0] https://mesonbuild.com/Release-notes-for-0-57-0.html
2021-04-12 20:18:03 +00:00
Matthias Dressel 2479973cbb CI: Add check for illegal instructions
Some AVX2 instructions cannot be macroed by x86inc.asm.
Some instructions are valid in SSE4 but not in SSSE3, so both are
checked.
* Conroe is up to SSSE3
* Penryn is up to SSE4.1

See also: 4dd9431
2021-03-07 17:59:02 +01:00
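Why two CPU models are needed can be shown with a small lookup table. The sketch below encodes only what the commit message states (Conroe tops out at SSSE3, Penryn at SSE4.1) and is not the CI job itself; the names `LEVELS`, `CPU_MAX_LEVEL`, `allowed_levels` and `is_illegal` are hypothetical, and how the CI actually runs the tests on these models is not shown in this log.

```python
# Ordered from oldest to newest; a simplified subset of x86 SIMD extensions.
LEVELS = ["sse2", "ssse3", "sse4.1", "avx2"]

# Highest extension each CPU model supports, as stated in the commit message.
CPU_MAX_LEVEL = {"Conroe": "ssse3", "Penryn": "sse4.1"}


def allowed_levels(cpu):
    """Return the extension levels whose instructions are legal on this CPU."""
    top = LEVELS.index(CPU_MAX_LEVEL[cpu])
    return LEVELS[: top + 1]


def is_illegal(cpu, level):
    """True if an instruction requiring `level` would fault on `cpu`."""
    return level not in allowed_levels(cpu)
```

Under these assumptions, an SSE4.1 instruction that slipped into an SSSE3 function faults only on Conroe, while an AVX2 instruction inside an SSE4 function is caught on Penryn, which is why neither model alone is sufficient.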
Matthias Dressel 061ac9aee8 cli: Fix md5 verification for short values
Verification should not succeed if the given string is too short to be a
real hash.

Fixes videolan/dav1d#361
2021-02-08 04:48:46 +01:00
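The rule stated in this commit, that a too-short string must never verify, can be sketched as follows. This is an illustration in Python rather than dav1d's C code; `verify_md5` and `MD5_HEX_LEN` are hypothetical names, and the additional rejection of over-long or non-hexadecimal input is an assumption made for the example.

```python
import hashlib

MD5_HEX_LEN = 32  # an MD5 digest is 16 bytes, i.e. 32 hexadecimal characters


def verify_md5(data, expected_hex):
    """Return True only if expected_hex is the full MD5 hex digest of data."""
    expected = expected_hex.strip().lower()
    if len(expected) != MD5_HEX_LEN:
        return False  # too short (or too long) to be a real hash
    if any(c not in "0123456789abcdef" for c in expected):
        return False  # not a hexadecimal string
    return hashlib.md5(data).hexdigest() == expected
```

Checking the length before comparing is the important step: a comparison that only walks the characters the user supplied would accept any prefix of the correct digest, including the empty string.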