dav1d

mirror of https://code.videolan.org/videolan/dav1d synced 2026-06-11 04:03:05 +00:00

Author	SHA1	Message	Date
Henrik Gramner	c0f2fe3135	build: Update meson version requirement to 0.54.0 Use of the meson 'fallback arg in dependency' feature was introduced by the switch to external checkasm in `3a2a874`.	2026-04-22 21:02:05 +02:00
Henrik Gramner	6894b7f2d0	Improve the memory pool API Return a void pointer directly to the usable memory region, abstracting away implementation details.	2026-03-17 18:28:57 +01:00
Henrik Gramner	241a6b236a	x86: Fix warp8x8 gamma/delta naming mixup For whatever reason the names of the gamma and delta parameters have been switched in a few of the warp8x8 asm implementations. This is a bit confusing, so fix things by switching them back. This change is purely cosmetic; the output binary is identical.	2026-03-05 15:50:40 +01:00
Henrik Gramner	2272a19ab0	x86: Update x86inc.asm	2026-01-26 23:32:17 +01:00
Henrik Gramner	43f3b8d33b	checkasm: Group itx functions by their largest dimension This reduces the number of itx reports per instruction set from 19 to 5, which avoids excessively flooding the console output.	2025-12-09 22:02:52 +01:00
Henrik Gramner	165e9e251b	checkasm: Only run DC-only itx tests for dct_dct	2025-12-09 21:01:10 +01:00
Henrik Gramner	c720f4d355	cli: Fix input_open() memory leak on fopen() failure	2025-10-27 20:44:47 +01:00
Henrik Gramner	716164239a	obu: Improve short-signaling reference frame index calculation Reduces code size a fair amount, and with some loop unrolling by the compiler the code becomes nearly branchless.	2025-07-09 14:24:07 +02:00
Henrik Gramner	fa30043ba0	obu: Remove redundant zeroing in frame header parsing The Dav1dFrameHeader struct is already zero-initialized, so zeroing individual values a second time is redundant.	2025-07-07 16:00:30 +02:00
Henrik Gramner	b3c5848f7f	loongarch: Use hidden visibility for asm functions	2025-06-07 22:36:38 +02:00
Henrik Gramner	63bf075aad	recon: Fix level index calculation optimization for 2D transforms Due to a typo this was never actually enabled since being added in `5ef6b24`. As a result the slow path was always being used.	2025-06-02 15:54:28 +02:00
Henrik Gramner	fe0ab51460	Use exact-width integer min/max defines where appropriate Improves support for niche systems with uncommon integer sizes.	2025-05-29 19:38:49 +02:00
Henrik Gramner	29efbb9496	refmvs: Shrink mfmv_ref arrays Includes updates to load_tmvs() asm implementations.	2025-05-28 19:01:45 +02:00
Henrik Gramner	68dc20035b	refmvs: Shrink refpoc arrays	2025-05-28 19:01:45 +02:00
Henrik Gramner	7889ac7603	cdf: Remove unused eob_hi_bit entries	2025-05-28 02:06:08 +02:00
Henrik Gramner	9a75cebc36	Explicitly use uint8_t for the order_palette() scratch buffer It previously used 'pixel' which is typedefed to uint8_t in files that aren't bitdepth-templated, but those are indices and not pixels so that was just confusing and misleading.	2024-12-02 13:47:04 +01:00
Henrik Gramner	ef4aff75b0	x86: Improve SSSE3 SGR asm * Use the same approach as AVX2 of using floating-point reciprocal instructions to replace dav1d_sgr_x_by_x[] table lookups. * Optimize clipping of p-values in the 10bpc code. * Rename some macros to clarify their functionality. * Implement various minor tweaks.	2024-10-22 00:00:32 +02:00
Henrik Gramner	7072e79faa	x86: Make AVX2 SGR gatherless Instead of using gathers we can calculate the value of sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point with some clipping for z == 0 and z >= 255. As the required precision of the division is fairly small it can be performed using an approximate reciprocal, which is significantly faster than a regular division. Gather instructions are slow on all AMD CPUs, and on most Intel CPUs ever since µcode updates were issued as a workaround for the Gather Data Sampling side channel vulnerability.	2024-10-07 13:04:34 +02:00
Henrik Gramner	32bf6cde06	x86: Add 6-tap variants of high bit-depth mc SSSE3 functions	2024-06-25 13:56:11 +02:00
Henrik Gramner	da2cc7817c	x86: Eliminate hardcoded struct offsets in refmvs load_tmvs() asm	2024-05-27 17:39:10 +02:00
Henrik Gramner	26a2744eae	refmvs: Consolidate r and rp_proj allocations The conditions for when to (re)allocate those buffers are identical, so they can be merged into a single branch. The allocation of the buffers themselves can also be combined to reduce the number of allocation calls.	2024-05-27 17:39:09 +02:00
Henrik Gramner	54801d0734	refmvs: Remove dav1d_refmvs_init() It's only ever called on data which has already been zero-initialized.	2024-05-27 17:39:08 +02:00
Henrik Gramner	89a200c82e	refmvs: Simplify 2-pass logic n_tc is always >= n_fc, so we only need to check the latter.	2024-05-27 17:39:06 +02:00
Henrik Gramner	ca156d90b8	x86: Add 6-tap variants of 8bpc mc SSSE3 functions	2024-05-27 15:45:17 +02:00
Henrik Gramner	8afbd4f68a	x86: Add minor 8bpc mc SSE improvements	2024-05-27 15:45:17 +02:00
Henrik Gramner	85c1639170	x86: Remove 8bpc mc SSE2 asm The amount of nested macros caused by having to support SSE2 makes the code very difficult to maintain and modify. It is also of questionable value considering most other asm requires SSSE3.	2024-05-27 15:45:17 +02:00
Henrik Gramner	d3997acbeb	x86: Remove unused macro in mc16_avx512.asm	2024-05-27 15:45:17 +02:00
Henrik Gramner	bb948769e3	tests: Verify dav1d command line in dav1d_argon.bash Error out early instead of producing bogus mismatch errors in case of an incorrect cpu mask for example.	2024-05-20 14:29:13 +02:00
Henrik Gramner	841853031b	x86: Update x86inc.asm https://code.videolan.org/videolan/x86inc.asm/-/commit/b6ba1e3045d758fd6c6e24591dac21a3dc812e1d	2024-05-14 15:04:46 +02:00
Henrik Gramner	cc1137c85b	checkasm: Eliminate unreachable code in the Windows exception handler	2024-05-13 14:01:17 +02:00
Henrik Gramner	471549f268	checkasm: Avoid UB in setjmp() invocations Both POSIX and the C standard place several environmental limits on setjmp() invocations, with essentially anything beyond comparing the return value with a constant as a simple branch condition being UB. We were previously performing a function call using the setjmp() return value as an argument, which is technically not allowed even though it happened to work correctly in practice. Some systems may loosen those restrictions and allow for more flexible usage, but we shouldn't be relying on that.	2024-05-13 13:57:35 +02:00
Henrik Gramner	223901243c	x86: Add 6-tap variants of high bit-depth mc AVX-512 (Ice Lake) functions	2024-04-29 17:59:09 +02:00
Henrik Gramner	8ff97b3a0b	x86: Add minor high bit-depth mc AVX-512 improvements	2024-04-29 17:59:09 +02:00
Henrik Gramner	5b5399911d	x86: Add 6-tap variants of 8bpc mc AVX-512 (Ice Lake) functions 6-tap filtering is only performed vertically due to use of VNNI instructions processing 4 pixels per instruction horizontally.	2024-04-15 13:19:42 +02:00
Henrik Gramner	38df35d2d1	x86: Add various 8bpc mc AVX-512 improvements	2024-04-15 13:12:20 +02:00
Henrik Gramner	dc9490134f	meson: Enable parallel execution of checkasm in 'meson test' It was originally disabled due to older meson versions mixing the output of 'meson test -v' from different tests, which made the log difficult to read. Newer versions, however, cache the output from each test as it runs and print it in one contiguous block, so that's no longer an issue.	2024-04-08 22:51:15 +02:00
Henrik Gramner	f6e05da093	cdf: Combine memcpy() calls in dav1d_cdf_thread_copy() Place multiple default contexts inside a single outer struct so that copying can be performed in larger blocks.	2024-04-08 20:25:59 +02:00
Henrik Gramner	c8add4f8bf	cdf: Reduce code size of dav1d_cdf_thread_update() Reorder CDF arrays so that copying can be performed in larger blocks.	2024-04-08 20:25:59 +02:00
Henrik Gramner	ed24201356	cdf: Make qcat calculation branchless	2024-04-08 20:25:58 +02:00
Henrik Gramner	67fcf01bf2	decode: Simplify read_mv_residual()	2024-04-08 20:25:58 +02:00
Henrik Gramner	17a2180a61	cdf: Remove separate intra-only dmv contexts We can simply use the regular mv contexts for intra frames. They are mutually exclusive, and the dmv contexts were already discarded and replaced with default contexts on frame completion.	2024-04-08 20:25:58 +02:00
Henrik Gramner	e2145f5295	cdf: Skip unnecessary context copying in dav1d_cdf_thread_update() The intrabc and dmv contexts are never reused between frames.	2024-04-08 20:25:58 +02:00
Henrik Gramner and Henrik Gramner	e27b451e2a	cli: Handle SIGINT and SIGTERM more gracefully Attempt to finish writing the current frame before exiting to avoid ending up with a partially written frame at the end of the output file. Only try catching a signal once, falling back to the default behavior of exiting immediately the second time a given signal is raised.	2024-04-04 13:06:12 +00:00
Henrik Gramner	abc8a1689f	lf_mask: Align lvl buffers Ensures that SIMD stores performed by memset() are aligned.	2024-03-28 15:58:36 +01:00
Henrik Gramner	119df64b21	lf_mask: Use sizeof() in memset() size calculations	2024-03-28 15:58:35 +01:00
Henrik Gramner	df3dafddc3	lf_mask: Use a union type for last_delta_lf On architectures without unaligned load capabilities the compiler will otherwise load the individual 8-bit values one at a time.	2024-03-28 15:58:34 +01:00
Henrik Gramner	076955a153	refmvs: Fix buffer overread in save_tmvs() asm The refmvs_block struct is only 12 bytes large but it's accessed using 16-byte unaligned loads in asm. In order to avoid reading past the end of the allocated buffer we therefore need to pad the allocation size by 4 bytes.	2024-03-28 01:41:28 +01:00
Henrik Gramner	3d98a242a0	x86: Add 6-tap variants of high bit-depth mc AVX2 functions	2024-03-22 11:11:58 +01:00
Henrik Gramner	b3323a8ccd	x86: Add minor high bit-depth mc 8-tap AVX2 improvements	2024-03-22 10:41:45 +01:00
Henrik Gramner	9849ede130	x86: Add 6-tap variants of 8bpc mc AVX2 functions 6-tap filters are sufficient in the majority of cases, and are quite a bit faster than the equivalent 8-tap filters.	2024-03-21 12:30:05 +00:00
Henrik Gramner	02c2033a1e	x86: Add minor 8bpc mc 8-tap AVX2 improvements	2024-03-21 12:30:05 +00:00
Henrik Gramner	645da27785	x86: Update x86inc.asm https://code.videolan.org/videolan/x86inc.asm/-/commit/8494a52b9548345b6d9f527cf2059eb0d6fe592d https://code.videolan.org/videolan/x86inc.asm/-/commit/04f14f431ce07ca349b5d87c9e5930f5950cf712	2024-03-15 12:19:27 +01:00
Henrik Gramner	8b46166852	ci: Make checkasm work on the x86-32 build	2024-03-15 12:19:24 +01:00
Henrik Gramner and Henrik Gramner	006ca01d38	x86: Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter When decoding a stream with a width of less than 4 pixels this could cause a segfault if the frame buffer was allocated on a page boundary.	2024-03-07 03:13:33 +01:00
Henrik Gramner	85a10359cd	checkasm: Add --list-cpuflags option Prints a list of cpuflags available for the current architecture. Flags which are supported on the current system will be printed in green, and flags which are unsupported in red with a ~ prefix.	2024-02-29 00:13:23 +00:00
Henrik Gramner and Henrik Gramner	36184ce06c	x86inc: Fix warnings with old nasm versions	2024-02-22 12:54:30 +01:00
Henrik Gramner	d22de29cad	Add minor msac optimizations Skip the overhead of shifting in ones into the LSB in the common case, that's only required in the EOB padding. In practice this means we only have to invert bits once during the refill process instead of twice in every call to msac functions. Also make some improvements to the refill asm, mainly involving keeping partially inserted bytes at the end instead of clearing them.	2024-02-21 11:17:41 +00:00
Henrik Gramner	83ae3e9a47	checkasm: Improve msac tests * Process the entire buffer to get better coverage of eob handling. * Use a more reasonable buffer size. * Ignore trailing dif bits to allow for more implementation flexibility.	2024-02-21 11:17:41 +00:00
Henrik Gramner	28908b4341	x86: Update x86inc.asm	2024-02-21 11:04:39 +00:00
Henrik Gramner and Henrik Gramner	4796b59fc0	ci: Improve coverage for argon samples using different thread counts	2024-02-18 15:37:04 +01:00
Henrik Gramner and Henrik Gramner	bb26bdca06	tests: Automatically determine job count in dav1d_argon.bash Default to using the number of logical cores divided by thread count.	2024-02-18 15:37:04 +01:00
Henrik Gramner and Jean-Baptiste Kempf	97744bdc8c	x86: Add high bit-depth ipred z2 AVX-512 (Ice Lake) asm	2024-02-14 13:09:03 +00:00
Henrik Gramner and Henrik Gramner	2b475307dc	Fix tile_start_off calculations for extremely large frame sizes The tile start offset, in pixels, can exceed the range of a signed int.	2024-02-13 18:18:38 +01:00
Henrik Gramner	227c37f74a	Use a constant length for progress reporting in dav1d_argon.bash	2024-01-24 00:29:54 +00:00
Henrik Gramner	cdb2a1a27b	Avoid printing full path names in dav1d_argon.bash Only print the paths relative to the argon directory. This avoids excessive terminal line wrapping due to long path names which otherwise interferes with the '\r' usage for progress reporting.	2024-01-24 00:29:54 +00:00
Henrik Gramner	e2c7a4408b	x86: Add high bit-depth ipred z3 AVX-512 (Ice Lake) asm	2024-01-22 12:02:28 +00:00
Henrik Gramner	d23e87f7ae	checkasm: Prefer sigsetjmp()/siglongjmp() over SA_NODEFER Also prefer re-setting the signal handler upon intercept in combination with SA_RESETHAND over re-raising exceptions with the SIG_DFL handler.	2024-01-11 12:35:34 +00:00
Henrik Gramner	8501a4b201	checkasm: Make signal handling async-signal-safe	2024-01-11 12:35:34 +00:00
Henrik Gramner and Henrik Gramner	746ab8b4f3	thread_task: Properly handle spurious wakeups in delayed_fg POSIX explicitly states that spurious wakeups from pthread_cond_wait() may occur, even without any corresponding call to pthread_cond_signal().	2023-12-19 13:15:43 +01:00
Henrik Gramner and Henrik Gramner	b3f5e8cef5	thread_task: Replace goto's with a regular while-loop	2023-12-19 13:15:43 +01:00
Henrik Gramner and Henrik Gramner	8ba0df8492	checkasm: Fix cdef_dir function prototype	2023-12-19 12:11:46 +01:00
Henrik Gramner and Henrik Gramner	b3779b89c0	x86: Add high bit-depth ipred z1 AVX-512 (Ice Lake) asm	2023-12-11 14:15:30 +01:00
Henrik Gramner and Henrik Gramner	0a8d66402e	x86: Require fast gathers for AVX-512 horizontal loopfilters Prefer using the AVX2 implementations (which don't use gathers) on Zen 4.	2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner	a04a724719	x86: Require fast gathers for high bit-depth AVX-512 film grain Prefer using the SSSE3 implementations on Zen 4.	2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner	0e438e70fa	x86: Require fast gathers for AVX-512 mc resize and warp Prefer using the AVX2 implementations (which don't use gathers) on Zen 4.	2023-12-08 16:21:13 +01:00
Henrik Gramner and Henrik Gramner	ec05e9b978	x86: Flag Zen 4 as having slow gathers	2023-12-08 15:34:16 +01:00
Henrik Gramner and Henrik Gramner	3c41fa88ce	x86: Add 8-bit ipred z2 AVX-512 (Ice Lake) asm	2023-11-13 13:05:58 +01:00
Henrik Gramner and Henrik Gramner	e47a39ca95	x86: Fix 8bpc AVX2 ipred_z2 filtering with extremely large frame sizes The max_width/max_height values can exceed 16-bit range.	2023-11-12 22:52:18 +01:00
Henrik Gramner and Henrik Gramner	d2ee43892b	checkasm: Improve DSP trimming error message	2023-11-01 14:43:19 +01:00
Henrik Gramner and Henrik Gramner	611abc20db	checkasm: Add missing WINAPI_PARTITION checks on Windows Some functionality is only available on WINAPI_PARTITION_DESKTOP systems.	2023-11-01 14:43:19 +01:00
Henrik Gramner and Henrik Gramner	6bc552eb28	checkasm: Enable virtual terminal processing on Windows This allows for the use of standard VT100 escape codes for text coloring, which simplifies things by eliminating a bunch of Windows-specific code. This is only supported since Windows 10. Things will still run on older systems, just without colored text output.	2023-11-01 14:43:18 +01:00
Henrik Gramner and Henrik Gramner	0f2a877e7e	checkasm: Check for errors in command line parsing	2023-11-01 13:59:46 +01:00
Henrik Gramner and Henrik Gramner	9dbf46285d	ci: Fix test-debian-asan running checkasm with non-existing arguments	2023-11-01 13:59:46 +01:00
Henrik Gramner and Henrik Gramner	fd4ecc2fd8	x86: Add 8-bit ipred z3 AVX-512 (Ice Lake) asm	2023-10-19 17:00:20 +02:00
Henrik Gramner and Henrik Gramner	4c012978fb	x86: Add 8-bit ipred z1 AVX-512 (Ice Lake) asm	2023-10-04 11:49:57 +02:00
Henrik Gramner and Henrik Gramner	8936bab7ba	x86: Consolidate some pb_0to31 and pb_0to63 constants	2023-10-04 11:49:43 +02:00
Henrik Gramner and Henrik Gramner	97becd7372	Use the correct free() function on dav1d_mem_pool_init() failure	2023-08-18 17:41:50 +02:00
Henrik Gramner and Henrik Gramner	43a11ccb20	Account for chroma subsampling when allocating cbi buffers Reduces memory usage (by 3 kB per sb128 for 4:2:0) when decoding streams with subsampled chroma when frame threading is enabled. This also simplifies the logic for calculating cbi indices. Both entropy decoding and reconstruction access the elements in the same order, so calculating block x/y positions is redundant and we can instead just store values sequentially and increase the pointer by one every time it's accessed.	2023-07-18 14:21:57 +02:00
Henrik Gramner and Henrik Gramner	9eace34cba	x86: Fix misaligned loads in high bit-depth pal_pred SSSE3 asm Regression introduced in `72e9c7c`.	2023-07-07 01:56:10 +02:00
Henrik Gramner and Henrik Gramner	8dbf789ebe	x86: Add pal_idx_finish asm	2023-07-06 23:12:43 +02:00
Henrik Gramner and Henrik Gramner	852cc3409f	Move palette packing/edge-extension into a DSP function	2023-07-06 23:12:43 +02:00
Henrik Gramner and Henrik Gramner	72e9c7c095	Pack palette indices Pack two indices into each byte instead of storing them separately. Reduces memory usage by up to 16 kB per sb128 in streams that use screen content tools when frame-threading is enabled, at the cost of some additional computational overhead for packing/unpacking.	2023-07-06 23:10:22 +02:00
Henrik Gramner and Henrik Gramner	233a424c38	Use pixel instead of uint16_t for palette buffers Reduces memory usage by 6 kB per sb128 in 8bpc streams that use screen content tools when frame-threading is enabled.	2023-07-06 23:10:22 +02:00
Henrik Gramner and Henrik Gramner	d437510e9d	Remove redundant 4:4:4 wedge sign tables Only one of the sign or no-sign 4:4:4 tables is ever used for any given wedge index, so there's no point in having both. Reduces the table size by around 50 kB.	2023-07-06 14:42:17 +02:00
Henrik Gramner and Henrik Gramner	90a45d89de	Optimize the size of interintra/wedge index tables Replace pointers with 16-bit relative offsets and remove entries for unused block sizes (only 8x8..32x32 are relevant). Reduces the table size by around 17 kB.	2023-07-06 14:42:14 +02:00
Henrik Gramner and Henrik Gramner	31de9d5093	Replace validate_input() with assert() in internal functions Always-enabled basic sanity checks in API functions are reasonable, but within internal functions assert() is more appropriate when it comes to checking for "should never happen" conditions.	2023-06-07 13:35:07 +02:00
Henrik Gramner and Henrik Gramner	47e2e672d1	Eliminate validate_input() printf calls in release mode	2023-06-07 13:35:06 +02:00
Henrik Gramner and Henrik Gramner	682fb1ba14	Add a SIZE_MAX/2 validation check in dav1d_parse_sequence_header()	2023-06-07 13:35:04 +02:00
Henrik Gramner and Henrik Gramner	517777270c	Add a debug feature for tracking heap memory usage	2023-06-07 12:18:34 +02:00
Henrik Gramner and Henrik Gramner	ed22e23d9a	build: Simplify malloc handling	2023-06-06 22:10:57 +02:00
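
Note on `72e9c7c095` (Pack palette indices): the message describes storing two palette indices per byte. The following is a minimal sketch of that idea only, not the code from the commit. The function names `pack_pal_idx` and `unpack_pal_idx`, the low-nibble-first layout, and the even-length requirement are all assumptions made for illustration. It relies on AV1 palettes holding at most 8 colors, so every index fits in 4 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not dav1d's actual function): pack n palette
 * indices into n/2 bytes. The index at the even position goes into the
 * low nibble and the following one into the high nibble; this layout is
 * an assumption of the sketch. n must be even. */
static void pack_pal_idx(uint8_t *dst, const uint8_t *src, size_t n) {
    assert((n & 1) == 0);
    for (size_t i = 0; i < n; i += 2) {
        assert(src[i] < 16 && src[i + 1] < 16); /* must fit in a nibble */
        dst[i >> 1] = (uint8_t)(src[i] | (src[i + 1] << 4));
    }
}

/* Hypothetical inverse of pack_pal_idx(): expand n/2 packed bytes back
 * into n separate indices. */
static void unpack_pal_idx(uint8_t *dst, const uint8_t *src, size_t n) {
    assert((n & 1) == 0);
    for (size_t i = 0; i < n; i += 2) {
        dst[i]     = src[i >> 1] & 15; /* low nibble  */
        dst[i + 1] = src[i >> 1] >> 4; /* high nibble */
    }
}
```

Packing halves the storage for the index buffer, and the extra shift and mask on every access is the "additional computational overhead" the commit message mentions.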