100 Commits
Author SHA1 Message Date
Henrik Gramner c0f2fe3135 build: Update meson version requirement to 0.54.0
Use of the meson 'fallback arg in dependency' feature was introduced
by the switch to external checkasm in 3a2a874.
2026-04-22 21:02:05 +02:00
Henrik Gramner 6894b7f2d0 Improve the memory pool API
Return a void pointer directly to the usable memory region,
abstracting away implementation details.
2026-03-17 18:28:57 +01:00
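To illustrate the idea, here is a minimal sketch of a pool whose pop/push functions deal only in void pointers to the usable region, with a header hidden in front of each buffer. The header layout and all names (pool, pool_hdr, pool_pop(), pool_push(), pool_destroy()) are hypothetical; this is not dav1d's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored directly in front of each buffer. */
typedef struct pool_hdr {
    struct pool_hdr *next;   /* free-list link while the buffer is idle */
    size_t size;             /* usable bytes following the header */
} pool_hdr;

typedef struct { pool_hdr *free_list; } pool;

/* Returns a pointer to the usable region; the header stays private. */
static void *pool_pop(pool *p, size_t size) {
    pool_hdr *h = p->free_list;
    if (h && h->size >= size) {
        p->free_list = h->next;          /* reuse an idle buffer */
    } else {
        h = malloc(sizeof(*h) + size);   /* otherwise allocate a new one */
        if (!h)
            return NULL;
        h->size = size;
    }
    return h + 1;                        /* caller only ever sees this */
}

/* Takes back a pointer previously returned by pool_pop(). */
static void pool_push(pool *p, void *ptr) {
    pool_hdr *h = (pool_hdr *)ptr - 1;   /* recover the hidden header */
    h->next = p->free_list;
    p->free_list = h;
}

/* Frees every idle buffer. */
static void pool_destroy(pool *p) {
    while (p->free_list) {
        pool_hdr *h = p->free_list;
        p->free_list = h->next;
        free(h);
    }
}
```

Because callers never see pool_hdr, its layout can change without touching any call site.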
Henrik Gramner 241a6b236a x86: Fix warp8x8 gamma/delta naming mixup
For whatever reason the names of the gamma and delta parameters
have been switched in a few of the warp8x8 asm implementations.

This is a bit confusing, so fix things by switching them back.

This change is purely cosmetic; the output binary is identical.
2026-03-05 15:50:40 +01:00
Henrik Gramner 2272a19ab0 x86: Update x86inc.asm 2026-01-26 23:32:17 +01:00
Henrik Gramner 43f3b8d33b checkasm: Group itx functions by their largest dimension
This reduces the number of itx reports per instruction set from
19 to 5, which avoids excessively flooding the console output.
2025-12-09 22:02:52 +01:00
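The 19-to-5 figure follows from AV1's transform sizes: there are 19 of them (5 square, 8 with a 1:2 aspect ratio, 6 with 1:4), and their largest dimension is always one of 4, 8, 16, 32 or 64. A small sketch of the grouping; tx_dim, tx_group() and count_groups() are hypothetical names, not checkasm code.

```c
#include <assert.h>

typedef struct { int w, h; } tx_dim;

/* The 19 AV1 transform sizes: square, 1:2 and 1:4 rectangles. */
static const tx_dim tx_sizes[19] = {
    {4, 4}, {8, 8}, {16, 16}, {32, 32}, {64, 64},
    {4, 8}, {8, 4}, {8, 16}, {16, 8}, {16, 32}, {32, 16}, {32, 64}, {64, 32},
    {4, 16}, {16, 4}, {8, 32}, {32, 8}, {16, 64}, {64, 16},
};

/* Report group by largest dimension: 0 = 4, 1 = 8, 2 = 16, 3 = 32, 4 = 64. */
static int tx_group(tx_dim d) {
    const int m = d.w > d.h ? d.w : d.h;
    int g = 0;
    while ((4 << g) < m)
        g++;
    return g;
}

/* Fills counts[0..4] with sizes per group; returns the number of non-empty groups. */
static int count_groups(int counts[5]) {
    for (int i = 0; i < 5; i++)
        counts[i] = 0;
    for (int i = 0; i < 19; i++)
        counts[tx_group(tx_sizes[i])]++;
    int n = 0;
    for (int i = 0; i < 5; i++)
        n += counts[i] != 0;
    return n;
}
```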
Henrik Gramner 165e9e251b checkasm: Only run DC-only itx tests for dct_dct 2025-12-09 21:01:10 +01:00
Henrik Gramner c720f4d355 cli: Fix input_open() memory leak on fopen() failure 2025-10-27 20:44:47 +01:00
Henrik Gramner 716164239a obu: Improve short-signaling reference frame index calculation
Reduces code size a fair amount, and with some loop unrolling
by the compiler the code becomes nearly branchless.
2025-07-09 14:24:07 +02:00
Henrik Gramner fa30043ba0 obu: Remove redundant zeroing in frame header parsing
The Dav1dFrameHeader struct is already zero-initialized,
so zeroing individual values a second time is redundant.
2025-07-07 16:00:30 +02:00
Henrik Gramner b3c5848f7f loongarch: Use hidden visibility for asm functions 2025-06-07 22:36:38 +02:00
Henrik Gramner 63bf075aad recon: Fix level index calculation optimization for 2D transforms
Due to a typo this was never actually enabled since being added in
5ef6b24. As a result the slow path was always being used.
2025-06-02 15:54:28 +02:00
Henrik Gramner fe0ab51460 Use exact-width integer min/max defines where appropriate
Improves support for niche systems with uncommon integer sizes.
2025-05-29 19:38:49 +02:00
Henrik Gramner 29efbb9496 refmvs: Shrink mfmv_ref arrays
Includes updates to load_tmvs() asm implementations.
2025-05-28 19:01:45 +02:00
Henrik Gramner 68dc20035b refmvs: Shrink refpoc arrays 2025-05-28 19:01:45 +02:00
Henrik Gramner 7889ac7603 cdf: Remove unused eob_hi_bit entries 2025-05-28 02:06:08 +02:00
Henrik Gramner 9a75cebc36 Explicitly use uint8_t for the order_palette() scratch buffer
It previously used 'pixel' which is typedefed to uint8_t in files
that aren't bitdepth-templated, but those are indices and not
pixels so that was just confusing and misleading.
2024-12-02 13:47:04 +01:00
Henrik Gramner ef4aff75b0 x86: Improve SSSE3 SGR asm
* Use floating-point reciprocal instructions, the same approach as in
  the AVX2 code, to replace dav1d_sgr_x_by_x[] table lookups.

* Optimize clipping of p-values in the 10bpc code.

* Rename some macros to clarify their functionality.

* Implement various minor tweaks.
2024-10-22 00:00:32 +02:00
Henrik Gramner 7072e79faa x86: Make AVX2 SGR gatherless
Instead of using gathers we can calculate the value of
sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point
with some clipping for z == 0 and z >= 255.

As the required precision of the division is fairly small it can be
performed using an approximate reciprocal, which is significantly
faster than a regular division.

Gather instructions are slow on all AMD CPUs, and on most Intel
CPUs ever since µcode updates were issued as a workaround for
the Gather Data Sampling side channel vulnerability.
2024-10-07 13:04:34 +02:00
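A scalar model of the gatherless lookup, assuming round-to-nearest and that z >= 255 maps to 0. It uses a regular float division where the asm uses an approximate reciprocal (RCPPS-style), and sgr_x_by_x_approx() is a hypothetical name.

```c
#include <assert.h>

/* Models sgr_x_by_x[min(z, 255)] as 256 / (z + 1), rounded, with clipping. */
static int sgr_x_by_x_approx(unsigned z) {
    if (z >= 255)
        return 0;                        /* clip: assumed to map to 0 */
    const float x = 256.0f / (float)(z + 1);
    const int v = (int)(x + 0.5f);       /* round to nearest (x is positive) */
    return v > 255 ? 255 : v;            /* clip: z == 0 would give 256 */
}
```

Only a handful of bits of precision are needed to land on the right integer, which is why an approximate reciprocal is good enough.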
Henrik Gramner 32bf6cde06 x86: Add 6-tap variants of high bit-depth mc SSSE3 functions 2024-06-25 13:56:11 +02:00
Henrik Gramner da2cc7817c x86: Eliminate hardcoded struct offsets in refmvs load_tmvs() asm 2024-05-27 17:39:10 +02:00
Henrik Gramner 26a2744eae refmvs: Consolidate r and rp_proj allocations
The conditions for when to (re)allocate those buffers are identical,
so they can be merged into a single branch.

The allocation of the buffers themselves can also be combined to
reduce the number of allocation calls.
2024-05-27 17:39:09 +02:00
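One way to merge the two buffers into a single branch and a single allocation is to carve the second buffer out of the first allocation at an aligned offset. The element types, mv_bufs and mv_bufs_resize() are hypothetical stand-ins, not the actual refmvs structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element types standing in for the two buffers. */
typedef struct { int16_t mv[4]; int8_t ref[2]; uint8_t bs, mf; } elem_r;
typedef struct { int16_t mv[2]; int8_t ref; } elem_proj;

typedef struct {
    elem_r *r;            /* start of the single allocation */
    elem_proj *rp_proj;   /* points into the same allocation */
    size_t n_r, n_proj;
} mv_bufs;

static size_t align_up(size_t x, size_t a) { return (x + a - 1) / a * a; }

/* (Re)allocates both buffers in one branch with one malloc() call. */
static int mv_bufs_resize(mv_bufs *b, size_t n_r, size_t n_proj) {
    assert(n_r > 0 && n_proj > 0);
    if (b->r && b->n_r == n_r && b->n_proj == n_proj)
        return 0;                               /* sizes unchanged: keep buffers */
    free(b->r);
    const size_t off = align_up(n_r * sizeof(elem_r), _Alignof(elem_proj));
    char *mem = malloc(off + n_proj * sizeof(elem_proj));
    if (!mem) {
        b->r = NULL;
        b->rp_proj = NULL;
        b->n_r = b->n_proj = 0;
        return -1;
    }
    b->r = (elem_r *)mem;
    b->rp_proj = (elem_proj *)(mem + off);
    b->n_r = n_r;
    b->n_proj = n_proj;
    return 0;
}
```

Only b->r is ever passed to free(), since rp_proj points into the same block.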
Henrik Gramner 54801d0734 refmvs: Remove dav1d_refmvs_init()
It's only ever called on data which has already been zero-initialized.
2024-05-27 17:39:08 +02:00
Henrik Gramner 89a200c82e refmvs: Simplify 2-pass logic
n_tc is always >= n_fc, so we only need to check the latter.
2024-05-27 17:39:06 +02:00
Henrik Gramner ca156d90b8 x86: Add 6-tap variants of 8bpc mc SSSE3 functions 2024-05-27 15:45:17 +02:00
Henrik Gramner 8afbd4f68a x86: Add minor 8bpc mc SSE improvements 2024-05-27 15:45:17 +02:00
Henrik Gramner 85c1639170 x86: Remove 8bpc mc SSE2 asm
The amount of nested macros caused by having to support SSE2 makes
the code very difficult to maintain and modify. It is also of
questionable value considering most other asm requires SSSE3.
2024-05-27 15:45:17 +02:00
Henrik Gramner d3997acbeb x86: Remove unused macro in mc16_avx512.asm 2024-05-27 15:45:17 +02:00
Henrik Gramner bb948769e3 tests: Verify dav1d command line in dav1d_argon.bash
Error out early instead of producing bogus mismatch errors in case
of an incorrect cpu mask for example.
2024-05-20 14:29:13 +02:00
Henrik Gramner 841853031b x86: Update x86inc.asm
https://code.videolan.org/videolan/x86inc.asm/-/commit/b6ba1e3045d758fd6c6e24591dac21a3dc812e1d
2024-05-14 15:04:46 +02:00
Henrik Gramner cc1137c85b checkasm: Eliminate unreachable code in the Windows exception handler 2024-05-13 14:01:17 +02:00
Henrik Gramner 471549f268 checkasm: Avoid UB in setjmp() invocations
Both POSIX and the C standard place several environmental limits on
setjmp() invocations, with essentially anything beyond comparing the
return value with a constant as a simple branch condition being UB.

We were previously performing a function call using the setjmp()
return value as an argument, which is technically not allowed
even though it happened to work correctly in practice.

Some systems may loosen those restrictions and allow for more
flexible usage, but we shouldn't be relying on that.
2024-05-13 13:57:35 +02:00
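C11 (7.13.1.1) only allows setjmp() as the entire controlling expression of a selection or iteration statement, as one side of a comparison against an integer constant expression (itself the whole controlling expression), as the operand of a unary ! used that way, or as a whole expression statement. A minimal sketch of a pattern that stays within those limits by passing the failure reason through a volatile variable instead of the setjmp() return value; run_checked(), report_failure() and check_non_negative() are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf fail_env;
static volatile int fail_code;   /* why we jumped; read instead of setjmp()'s value */

static void report_failure(int code) {
    fail_code = code;
    longjmp(fail_env, 1);
}

/* Runs fn(arg); returns 0 on success or the code passed to report_failure(). */
static int run_checked(void (*fn)(int), int arg) {
    fail_code = 0;
    /* OK: setjmp() is the entire controlling expression of the if.
     * Not OK: return handle(setjmp(fail_env)); or int r = setjmp(fail_env); */
    if (setjmp(fail_env))
        return fail_code;
    fn(arg);
    return 0;
}

/* Example test body: fails with code -x for negative inputs. */
static void check_non_negative(int x) {
    if (x < 0)
        report_failure(-x);
}
```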
Henrik Gramner 223901243c x86: Add 6-tap variants of high bit-depth mc AVX-512 (Ice Lake) functions 2024-04-29 17:59:09 +02:00
Henrik Gramner 8ff97b3a0b x86: Add minor high bit-depth mc AVX-512 improvements 2024-04-29 17:59:09 +02:00
Henrik Gramner 5b5399911d x86: Add 6-tap variants of 8bpc mc AVX-512 (Ice Lake) functions
6-tap filtering is only performed vertically due to use of VNNI
instructions processing 4 pixels per instruction horizontally.
2024-04-15 13:19:42 +02:00
Henrik Gramner 38df35d2d1 x86: Add various 8bpc mc AVX-512 improvements 2024-04-15 13:12:20 +02:00
Henrik Gramner dc9490134f meson: Enable parallel execution of checkasm in 'meson test'
It was originally disabled due to older meson versions mixing the output
of 'meson test -v' from different tests, which made the log difficult to
read. Newer versions, however, cache the output from each test as it runs
and print it in one contiguous block, so that's no longer an issue.
2024-04-08 22:51:15 +02:00
Henrik Gramner f6e05da093 cdf: Combine memcpy() calls in dav1d_cdf_thread_copy()
Place multiple default contexts inside a single outer struct so
that copying can be performed in larger blocks.
2024-04-08 20:25:59 +02:00
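A sketch of the outer-struct approach with made-up context arrays: grouping the defaults lets one memcpy() replace several. default_ctx_group, cdf_ctx and cdf_copy_group() are hypothetical, and the array dimensions are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical default contexts grouped in one outer struct. */
typedef struct {
    uint16_t skip[13][2];
    uint16_t intra[4][8];
    uint16_t mv_joint[4];
} default_ctx_group;

typedef struct {
    default_ctx_group g;   /* copied as one contiguous block */
    uint16_t other[16];    /* not part of the group, left untouched */
} cdf_ctx;

/* One memcpy() instead of one call per array. */
static void cdf_copy_group(cdf_ctx *dst, const cdf_ctx *src) {
    memcpy(&dst->g, &src->g, sizeof(dst->g));
}
```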
Henrik Gramner c8add4f8bf cdf: Reduce code size of dav1d_cdf_thread_update()
Reorder CDF arrays so that copying can be performed in larger blocks.
2024-04-08 20:25:59 +02:00
Henrik Gramner ed24201356 cdf: Make qcat calculation branchless 2024-04-08 20:25:58 +02:00
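A sketch of a branchless qcat, assuming the thresholds AV1 uses when selecting default coefficient CDFs from base_q_idx (categories split at 20, 60 and 120). Each comparison evaluates to 0 or 1, so summing them yields the category without branches; the function names are hypothetical and the actual commit may differ.

```c
#include <assert.h>

/* Branchy reference version. */
static int qcat_branchy(int qidx) {
    if (qidx <= 20) return 0;
    if (qidx <= 60) return 1;
    if (qidx <= 120) return 2;
    return 3;
}

/* Branchless version: each comparison contributes 0 or 1. */
static int qcat_branchless(int qidx) {
    return (qidx > 20) + (qidx > 60) + (qidx > 120);
}
```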
Henrik Gramner 67fcf01bf2 decode: Simplify read_mv_residual() 2024-04-08 20:25:58 +02:00
Henrik Gramner 17a2180a61 cdf: Remove separate intra-only dmv contexts
We can simply use the regular mv contexts for intra frames.

They are mutually exclusive, and the dmv contexts were already
discarded and replaced with default contexts on frame completion.
2024-04-08 20:25:58 +02:00
Henrik Gramner e2145f5295 cdf: Skip unnecessary context copying in dav1d_cdf_thread_update()
The intrabc and dmv contexts are never reused between frames.
2024-04-08 20:25:58 +02:00
Henrik Gramner and Henrik Gramner e27b451e2a cli: Handle SIGINT and SIGTERM more gracefully
Attempt to finish writing the current frame before exiting to avoid
ending up with a partially written frame at the end of the output file.

Only try catching a signal once, falling back to the default behavior
of exiting immediately the second time a given signal is raised.
2024-04-04 13:06:12 +00:00
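One way to get this behavior on POSIX systems is a handler that only sets a flag, installed with SA_RESETHAND so the disposition reverts to SIG_DFL once the handler has run. on_signal(), install_graceful_stop() and stop_requested are hypothetical names; the real dav1d CLI code may be structured differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; the main loop polls it between frames. */
static volatile sig_atomic_t stop_requested;

static void on_signal(int sig) {
    (void)sig;
    stop_requested = 1;   /* no I/O or allocation inside the handler */
}

/* SA_RESETHAND reverts the disposition to SIG_DFL once the handler runs,
 * so a second signal terminates the process immediately. */
static int install_graceful_stop(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(sig, &sa, NULL);
}
```

The main loop would check stop_requested after writing each complete frame and exit cleanly if it is set; a second SIGINT then hits the default handler.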
Henrik Gramner abc8a1689f lf_mask: Align lvl buffers
Ensures that SIMD stores performed by memset() are aligned.
2024-03-28 15:58:36 +01:00
Henrik Gramner 119df64b21 lf_mask: Use sizeof() in memset() size calculations 2024-03-28 15:58:35 +01:00
Henrik Gramner df3dafddc3 lf_mask: Use a union type for last_delta_lf
On architectures without unaligned load capabilities, the compiler will
otherwise load the individual 8-bit values one at a time.
2024-03-28 15:58:34 +01:00
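A sketch of such a union, with a hypothetical name: the uint32_t member gives the type 4-byte alignment, so the compiler can access all four deltas with a single 32-bit load or store instead of four byte accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Four 8-bit deltas overlaid with one 32-bit word (hypothetical type). */
typedef union {
    int8_t i8[4];
    uint32_t u32;   /* forces 4-byte alignment of the whole union */
} delta_lf4;

/* Single 32-bit copy instead of four byte copies. */
static void delta_lf_copy(delta_lf4 *dst, const delta_lf4 *src) {
    dst->u32 = src->u32;
}

/* Checks all four deltas with one comparison. */
static int delta_lf_is_zero(const delta_lf4 *d) {
    return d->u32 == 0;
}
```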
Henrik Gramner 076955a153 refmvs: Fix buffer overread in save_tmvs() asm
The refmvs_block struct is only 12 bytes in size, but it's accessed
using 16-byte unaligned loads in asm.

In order to avoid reading past the end of the allocated buffer
we therefore need to pad the allocation size by 4 bytes.
2024-03-28 01:41:28 +01:00
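A sketch of the padding arithmetic, using a hypothetical 12-byte struct with the same size as refmvs_block (not its real definition): a 16-byte load starting at the last element reads 16 - 12 = 4 bytes past it, so the allocation is grown by that amount. memcpy() stands in for the unaligned SIMD load.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 12-byte element. */
typedef struct {
    int16_t mv[4];    /* 8 bytes */
    int8_t ref[2];    /* 2 bytes */
    uint8_t bs, mf;   /* 2 bytes */
} block12;

enum { LOAD_SIZE = 16 };  /* width of the unaligned SIMD load */

/* Bytes to allocate so a 16-byte load at the last element stays in bounds. */
static size_t padded_size(size_t n) {
    return n * sizeof(block12) + (LOAD_SIZE - sizeof(block12));
}

static block12 *alloc_blocks(size_t n) {
    return calloc(1, padded_size(n));   /* zeroed, padding included */
}

/* Stand-in for a 16-byte unaligned load of element i. */
static void load16(const block12 *buf, size_t i, uint8_t out[LOAD_SIZE]) {
    memcpy(out, (const uint8_t *)(buf + i), LOAD_SIZE);
}
```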
Henrik Gramner 3d98a242a0 x86: Add 6-tap variants of high bit-depth mc AVX2 functions 2024-03-22 11:11:58 +01:00
Henrik Gramner b3323a8ccd x86: Add minor high bit-depth mc 8-tap AVX2 improvements 2024-03-22 10:41:45 +01:00
Henrik Gramner 9849ede130 x86: Add 6-tap variants of 8bpc mc AVX2 functions
6-tap filters are sufficient in the majority of cases, and are
quite a bit faster than the equivalent 8-tap filters.
2024-03-21 12:30:05 +00:00
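When a filter's outermost taps are zero, a 6-tap loop over taps 1 to 6 gives exactly the same result as the full 8-tap loop while doing less work. As far as we know, AV1's regular and smooth subpel filters have this property and only the sharp filter needs all 8 taps. The functions and coefficients are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* 8-tap: sum of f[k] * src[k - 3] for k = 0..7. */
static int filter8(const uint8_t *src, const int8_t f[8]) {
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += f[k] * src[k - 3];
    return sum;
}

/* 6-tap: skips f[0] and f[7], valid only when both are zero. */
static int filter6(const uint8_t *src, const int8_t f[8]) {
    int sum = 0;
    for (int k = 1; k < 7; k++)
        sum += f[k] * src[k - 3];
    return sum;
}
```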
Henrik Gramner 02c2033a1e x86: Add minor 8bpc mc 8-tap AVX2 improvements 2024-03-21 12:30:05 +00:00
Henrik Gramner 645da27785 x86: Update x86inc.asm
https://code.videolan.org/videolan/x86inc.asm/-/commit/8494a52b9548345b6d9f527cf2059eb0d6fe592d
https://code.videolan.org/videolan/x86inc.asm/-/commit/04f14f431ce07ca349b5d87c9e5930f5950cf712
2024-03-15 12:19:27 +01:00
Henrik Gramner 8b46166852 ci: Make checkasm work on the x86-32 build 2024-03-15 12:19:24 +01:00
Henrik Gramner and Henrik Gramner 006ca01d38 x86: Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
When decoding a stream with a width of less than 4 pixels this could
cause a segfault if the frame buffer was allocated on a page boundary.
2024-03-07 03:13:33 +01:00
Henrik Gramner 85a10359cd checkasm: Add --list-cpuflags option
Prints a list of cpuflags available for the current architecture.

Flags which are supported on the current system will be printed in
green, and flags which are unsupported in red with a ~ prefix.
2024-02-29 00:13:23 +00:00
Henrik Gramner and Henrik Gramner 36184ce06c x86inc: Fix warnings with old nasm versions 2024-02-22 12:54:30 +01:00
Henrik Gramner d22de29cad Add minor msac optimizations
Skip the overhead of shifting ones into the LSB in the common case, as
that's only required for the EOB padding. In practice this means we
only have to invert bits once during the refill process instead of
twice in every call to msac functions.

Also make some improvements to the refill asm, mainly involving
keeping partially inserted bytes at the end instead of clearing them.
2024-02-21 11:17:41 +00:00
Henrik Gramner 83ae3e9a47 checkasm: Improve msac tests
* Process the entire buffer to get better coverage of eob handling.

* Use a more reasonable buffer size.

* Ignore trailing dif bits to allow for more implementation flexibility.
2024-02-21 11:17:41 +00:00
Henrik Gramner 28908b4341 x86: Update x86inc.asm 2024-02-21 11:04:39 +00:00
Henrik Gramner and Henrik Gramner 4796b59fc0 ci: Improve coverage for argon samples using different thread counts 2024-02-18 15:37:04 +01:00
Henrik Gramner and Henrik Gramner bb26bdca06 tests: Automatically determine job count in dav1d_argon.bash
Default to using the number of logical cores divided by thread count.
2024-02-18 15:37:04 +01:00
Henrik Gramner and Jean-Baptiste Kempf 97744bdc8c x86: Add high bit-depth ipred z2 AVX-512 (Ice Lake) asm 2024-02-14 13:09:03 +00:00
Henrik Gramner and Henrik Gramner 2b475307dc Fix tile_start_off calculations for extremely large frame sizes
The tile start offset, in pixels, can exceed the range of a signed int.
2024-02-13 18:18:38 +01:00
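AV1 frames can be up to 65536 pixels in each dimension, so a row offset multiplied by a stride can exceed INT_MAX. A minimal sketch of the fix, widening before the multiply; tile_start_offset_px() and its parameters are hypothetical, not dav1d's actual calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel offset of a tile's top-left corner in a frame buffer.
 * Widening to 64 bits before the multiply keeps the product exact. */
static int64_t tile_start_offset_px(int row_start, int col_start, int stride_px) {
    return (int64_t)row_start * stride_px + col_start;
}
```

Computing row_start * stride_px in int first would overflow (undefined behavior) for these sizes even if the result were then stored in a wider type.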
Henrik Gramner 227c37f74a Use a constant length for progress reporting in dav1d_argon.bash 2024-01-24 00:29:54 +00:00
Henrik Gramner cdb2a1a27b Avoid printing full path names in dav1d_argon.bash
Only print the paths relative to the argon directory. This avoids
excessive terminal line wrapping due to long path names which
otherwise interferes with the '\r' usage for progress reporting.
2024-01-24 00:29:54 +00:00
Henrik Gramner e2c7a4408b x86: Add high bit-depth ipred z3 AVX-512 (Ice Lake) asm 2024-01-22 12:02:28 +00:00
Henrik Gramner d23e87f7ae checkasm: Prefer sigsetjmp()/siglongjmp() over SA_NODEFER
Also prefer re-setting the signal handler upon intercept in combination
with SA_RESETHAND over re-raising exceptions with the SIG_DFL handler.
2024-01-11 12:35:34 +00:00
Henrik Gramner 8501a4b201 checkasm: Make signal handling async-signal-safe 2024-01-11 12:35:34 +00:00
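A minimal sketch of the pattern described above, assuming POSIX: the handler only stores to a volatile sig_atomic_t before calling siglongjmp(), sigsetjmp() with a nonzero savemask restores the signal mask on the jump (so SA_NODEFER isn't needed to unblock the signal), and because SA_RESETHAND resets the disposition on delivery the handler is re-armed before each run. run_guarded() and the other names are hypothetical, not checkasm's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jmp_env;
static volatile sig_atomic_t caught_sig;

static void on_fault(int sig) {
    caught_sig = sig;          /* plain store to a volatile sig_atomic_t */
    siglongjmp(jmp_env, 1);    /* also restores the mask saved by sigsetjmp() */
}

/* One-shot handler: SA_RESETHAND reverts the signal to SIG_DFL on delivery. */
static int arm_handler(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(sig, &sa, NULL);
}

/* Runs fn(); returns the intercepted signal, 0 if none, or -1 on error. */
static int run_guarded(void (*fn)(void), int sig) {
    caught_sig = 0;
    if (arm_handler(sig) != 0)     /* re-arm before every run */
        return -1;
    if (sigsetjmp(jmp_env, 1))     /* nonzero savemask */
        return caught_sig;
    fn();
    signal(sig, SIG_DFL);          /* no signal was raised: disarm */
    return 0;
}

/* Example payloads. */
static void raise_usr1(void) { raise(SIGUSR1); }
static void do_nothing(void) { }
```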
Henrik Gramner and Henrik Gramner 746ab8b4f3 thread_task: Properly handle spurious wakeups in delayed_fg
POSIX explicitly states that spurious wakeups from pthread_cond_wait()
may occur, even without any corresponding call to pthread_cond_signal().
2023-12-19 13:15:43 +01:00
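The standard remedy is to re-check the predicate in a loop around pthread_cond_wait() instead of waiting once under an if. A minimal sketch with hypothetical names (fg_waiter, fg_waiter_wait(), fg_worker()), not the actual thread_task code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int done;              /* the predicate, protected by lock */
} fg_waiter;

static void fg_waiter_init(fg_waiter *w) {
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->done = 0;
}

static void fg_waiter_wait(fg_waiter *w) {
    pthread_mutex_lock(&w->lock);
    while (!w->done)                         /* a wakeup alone proves nothing */
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

static void *fg_worker(void *arg) {
    fg_waiter *w = arg;
    pthread_mutex_lock(&w->lock);
    w->done = 1;                             /* set the predicate first... */
    pthread_cond_signal(&w->cond);           /* ...then wake the waiter */
    pthread_mutex_unlock(&w->lock);
    return NULL;
}
```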
Henrik Gramner and Henrik Gramner b3f5e8cef5 thread_task: Replace goto's with a regular while-loop 2023-12-19 13:15:43 +01:00
Henrik Gramner and Henrik Gramner 8ba0df8492 checkasm: Fix cdef_dir function prototype 2023-12-19 12:11:46 +01:00
Henrik Gramner and Henrik Gramner b3779b89c0 x86: Add high bit-depth ipred z1 AVX-512 (Ice Lake) asm 2023-12-11 14:15:30 +01:00
Henrik Gramner and Henrik Gramner 0a8d66402e x86: Require fast gathers for AVX-512 horizontal loopfilters
Prefer using the AVX2 implementations (which don't use gathers) on Zen 4.
2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner a04a724719 x86: Require fast gathers for high bit-depth AVX-512 film grain
Prefer using the SSSE3 implementations on Zen 4.
2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner 0e438e70fa x86: Require fast gathers for AVX-512 mc resize and warp
Prefer using the AVX2 implementations (which don't use gathers) on Zen 4.
2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner ec05e9b978 x86: Flag Zen 4 as having slow gathers 2023-12-08 15:34:16 +01:00
Henrik Gramner and Henrik Gramner 3c41fa88ce x86: Add 8-bit ipred z2 AVX-512 (Ice Lake) asm 2023-11-13 13:05:58 +01:00
Henrik Gramner and Henrik Gramner e47a39ca95 x86: Fix 8bpc AVX2 ipred_z2 filtering with extremely large frame sizes
The max_width/max_height values can exceed 16-bit range.
2023-11-12 22:52:18 +01:00
Henrik Gramner and Henrik Gramner d2ee43892b checkasm: Improve DSP trimming error message 2023-11-01 14:43:19 +01:00
Henrik Gramner and Henrik Gramner 611abc20db checkasm: Add missing WINAPI_PARTITION checks on Windows
Some functionality is only available on WINAPI_PARTITION_DESKTOP systems.
2023-11-01 14:43:19 +01:00
Henrik Gramner and Henrik Gramner 6bc552eb28 checkasm: Enable virtual terminal processing on Windows
This allows for the use of standard VT100 escape codes for text coloring,
which simplifies things by eliminating a bunch of Windows-specific code.

This is only supported since Windows 10. Things will still run on
older systems, just without colored text output.
2023-11-01 14:43:18 +01:00
Henrik Gramner and Henrik Gramner 0f2a877e7e checkasm: Check for errors in command line parsing 2023-11-01 13:59:46 +01:00
Henrik Gramner and Henrik Gramner 9dbf46285d ci: Fix test-debian-asan running checkasm with non-existing arguments 2023-11-01 13:59:46 +01:00
Henrik Gramner and Henrik Gramner fd4ecc2fd8 x86: Add 8-bit ipred z3 AVX-512 (Ice Lake) asm 2023-10-19 17:00:20 +02:00
Henrik Gramner and Henrik Gramner 4c012978fb x86: Add 8-bit ipred z1 AVX-512 (Ice Lake) asm 2023-10-04 11:49:57 +02:00
Henrik Gramner and Henrik Gramner 8936bab7ba x86: Consolidate some pb_0to31 and pb_0to63 constants 2023-10-04 11:49:43 +02:00
Henrik Gramner and Henrik Gramner 97becd7372 Use the correct free() function on dav1d_mem_pool_init() failure 2023-08-18 17:41:50 +02:00
Henrik Gramner and Henrik Gramner 43a11ccb20 Account for chroma subsampling when allocating cbi buffers
Reduces memory usage (by 3 kB per sb128 for 4:2:0) when decoding
streams with subsampled chroma when frame threading is enabled.

This also simplifies the logic for calculating cbi indices.
Both entropy decoding and reconstruction access the elements in
the same order, so calculating block x/y positions is redundant
and we can instead just store values sequentially and increase
the pointer by one every time it's accessed.
2023-07-18 14:21:57 +02:00
Henrik Gramner and Henrik Gramner 9eace34cba x86: Fix misaligned loads in high bit-depth pal_pred SSSE3 asm
Regression introduced in 72e9c7c.
2023-07-07 01:56:10 +02:00
Henrik Gramner and Henrik Gramner 8dbf789ebe x86: Add pal_idx_finish asm 2023-07-06 23:12:43 +02:00
Henrik Gramner and Henrik Gramner 852cc3409f Move palette packing/edge-extension into a DSP function 2023-07-06 23:12:43 +02:00
Henrik Gramner and Henrik Gramner 72e9c7c095 Pack palette indices
Pack two indices into each byte instead of storing them separately.

Reduces memory usage by up to 16 kB per sb128 in streams that use
screen content tools when frame-threading is enabled, at the cost
of some additional computational overhead for packing/unpacking.
2023-07-06 23:10:22 +02:00
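A sketch of packing two indices per byte, assuming each index fits in 4 bits (AV1 palettes have at most 8 colors, so indices are 0 to 7) and that the even-numbered index goes in the low nibble. The nibble order and function names are assumptions, not necessarily dav1d's layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs n indices (each < 16) two per byte; n is assumed to be even. */
static void pack_indices(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i += 2)
        dst[i >> 1] = (uint8_t)(src[i] | (src[i + 1] << 4));
}

/* Unpacks index i: low nibble for even i, high nibble for odd i. */
static uint8_t get_index(const uint8_t *packed, size_t i) {
    return (packed[i >> 1] >> ((i & 1) * 4)) & 0xf;
}
```

The unpacking shift and mask are the additional computational overhead the commit message mentions.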
Henrik Gramner and Henrik Gramner 233a424c38 Use pixel instead of uint16_t for palette buffers
Reduces memory usage by 6 kB per sb128 in 8bpc streams that
use screen content tools when frame-threading is enabled.
2023-07-06 23:10:22 +02:00
Henrik Gramner and Henrik Gramner d437510e9d Remove redundant 4:4:4 wedge sign tables
Only one of the sign or no-sign 4:4:4 tables is ever used for
any given wedge index, so there's no point in having both.

Reduces the table size by around 50 kB.
2023-07-06 14:42:17 +02:00
Henrik Gramner and Henrik Gramner 90a45d89de Optimize the size of interintra/wedge index tables
Replace pointers with 16-bit relative offsets and remove entries
for unused block sizes (only 8x8..32x32 are relevant).

Reduces the table size by around 17 kB.
2023-07-06 14:42:14 +02:00
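A sketch of replacing pointer tables with 16-bit offsets into one contiguous array: each entry shrinks from sizeof(void *) bytes to 2, at the cost of requiring the array to stay under 64 kB. The mask contents and names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* All masks live in one contiguous array (contents are made up). */
static const uint8_t mask_data[] = {
    /* 2x2 mask */ 0, 64, 64, 0,
    /* 4x2 mask */ 0, 16, 48, 64, 64, 48, 16, 0,
};

/* 16-bit offsets into mask_data instead of full pointers. */
static const uint16_t mask_offset[] = { 0, 4 };

static const uint8_t *get_mask(int idx) {
    return mask_data + mask_offset[idx];
}
```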
Henrik Gramner and Henrik Gramner 31de9d5093 Replace validate_input() with assert() in internal functions
Always-enabled basic sanity checks in API functions are reasonable,
but within internal functions assert() is more appropriate when
it comes to checking for "should never happen" conditions.
2023-06-07 13:35:07 +02:00
Henrik Gramner and Henrik Gramner 47e2e672d1 Eliminate validate_input() printf calls in release mode 2023-06-07 13:35:06 +02:00
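A sketch of the split the two commits describe: an always-compiled check at the API boundary whose diagnostic is dropped when NDEBUG is defined, and assert() for internal invariants. check_arg_or_ret(), api_sum() and sum_internal() are hypothetical names, not dav1d's validate_input() macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* API-boundary check: always returns on failure, only prints in debug builds. */
#ifdef NDEBUG
#define check_arg_or_ret(x, r) do { if (!(x)) return (r); } while (0)
#else
#define check_arg_or_ret(x, r) do { \
        if (!(x)) { \
            fprintf(stderr, "check '%s' failed in %s\n", #x, __func__); \
            return (r); \
        } \
    } while (0)
#endif

/* Internal helper: callers guarantee len > 0, so a violation is a bug. */
static int sum_internal(const int *v, size_t len) {
    assert(len > 0);
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += v[i];
    return s;
}

/* Public entry point: rejects bad input instead of asserting. */
static int api_sum(const int *v, size_t len, int *out) {
    check_arg_or_ret(v != NULL, -EINVAL);
    check_arg_or_ret(out != NULL, -EINVAL);
    check_arg_or_ret(len > 0, -EINVAL);
    *out = sum_internal(v, len);
    return 0;
}
```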
Henrik Gramner and Henrik Gramner 682fb1ba14 Add a SIZE_MAX/2 validation check in dav1d_parse_sequence_header() 2023-06-07 13:35:04 +02:00
Henrik Gramner and Henrik Gramner 517777270c Add a debug feature for tracking heap memory usage 2023-06-07 12:18:34 +02:00
Henrik Gramner and Henrik Gramner ed22e23d9a build: Simplify malloc handling 2023-06-06 22:10:57 +02:00