dav1d

x/dav1d

mirror of https://code.videolan.org/videolan/dav1d synced 2026-06-11 04:03:05 +00:00

Author	SHA1	Message	Date
Ronald S. Bultje	583e8e02eb	tools/dav1d: initialize elapsed Based on the following comment on IRC: "<aconz2> the `elapsed` variable in main() is read uninitialized in synchronize and makes the first frametime with --frametime incorrect I think. Should be initialized to 0" Confirmed that after initializing to zero, the first line in the file generated by --frametime is reasonable.	2025-07-01 08:26:31 -04:00
Ronald S. Bultje	ca83ee6d9d	itx: restrict number of columns iterated over based on EOB	2024-06-17 12:44:34 -04:00
Ronald S. Bultje	f1c518901b	Increase timeout multiplier for aarch64/riscv64/la64-qemu CI jobs They have been failing occasionally lately.	2024-04-13 09:53:54 -04:00
Ronald S. Bultje	7f5d3492f6	picture.c: rename picture_alloc_with_edges() to picture_alloc() The allocated picture has no edges and is not expected to have any edges, so the _with_edges() suffix was misleading. Fixes #415.	2024-02-02 10:57:31 -05:00
Ronald S. Bultje	18b6ed7008	Verify ref frame results after decoding completion This fixes the issue where - when frame threading is active - that a reference could successfully progress to a particular sbrow and signal that, have that picked up by a frame it serves as a reference for, which therefore decodes successfully, even though the reference might fail decoding at a later stage.	2024-02-02 09:14:52 -05:00
Ronald S. Bultje	6d33d1796b	Check for trailing marker/zero bits for tile data Fixes #385.	2024-02-02 09:14:35 -05:00
Ronald S. Bultje	ceeb535d94	qm: derive more tables at runtime This reduces binary size from ~50kb to ~35kb. Ideas provided by Yu-Chen (Eric) Sun and Ryan Lei from Meta.	2024-01-03 13:42:40 -05:00
Ronald S. Bultje	47107e384b	deblock_avx512: convert byte-shifts to gf2p8affineqb	2023-10-05 17:24:34 +00:00
Ronald S. Bultje	ad0f3e6a4b	x86: add AVX512-IceLake implementation of HBD 64x64 DCT^2 Also implement "fast3" path for pass2.dct64 (where 1/8th of the coefficients are non-zero), which affects 32x64 as well as 64x64. Before: inv_txfm_add_32x64_dct_dct_1_10bpc_c: 51008.6 ( 1.00x) inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 3351.9 (15.22x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1419.5 (35.93x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 744.8 (68.49x) After: inv_txfm_add_32x64_dct_dct_1_10bpc_c: 51019.5 ( 1.00x) inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 3276.1 (15.57x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1420.7 (35.91x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 668.3 (76.34x) (Not sure why the SSE4 speed changed.) And speed for 64x64: inv_txfm_add_64x64_dct_dct_0_10bpc_c: 3506.9 ( 1.00x) inv_txfm_add_64x64_dct_dct_0_10bpc_sse4: 535.6 ( 6.55x) inv_txfm_add_64x64_dct_dct_0_10bpc_avx2: 223.5 (15.69x) inv_txfm_add_64x64_dct_dct_0_10bpc_avx512icl: 252.4 (13.89x) inv_txfm_add_64x64_dct_dct_1_10bpc_c: 108353.7 ( 1.00x) inv_txfm_add_64x64_dct_dct_1_10bpc_sse4: 6551.9 (16.54x) inv_txfm_add_64x64_dct_dct_1_10bpc_avx2: 2876.8 (37.66x) inv_txfm_add_64x64_dct_dct_1_10bpc_avx512icl: 1310.1 (82.70x) inv_txfm_add_64x64_dct_dct_2_10bpc_c: 108347.6 ( 1.00x) inv_txfm_add_64x64_dct_dct_2_10bpc_sse4: 7985.4 (13.57x) inv_txfm_add_64x64_dct_dct_2_10bpc_avx2: 3561.8 (30.42x) inv_txfm_add_64x64_dct_dct_2_10bpc_avx512icl: 1962.6 (55.20x) inv_txfm_add_64x64_dct_dct_3_10bpc_c: 108455.5 ( 1.00x) inv_txfm_add_64x64_dct_dct_3_10bpc_sse4: 9709.0 (11.17x) inv_txfm_add_64x64_dct_dct_3_10bpc_avx2: 4220.5 (25.70x) inv_txfm_add_64x64_dct_dct_3_10bpc_avx512icl: 2991.1 (36.26x) inv_txfm_add_64x64_dct_dct_4_10bpc_c: 108349.9 ( 1.00x) inv_txfm_add_64x64_dct_dct_4_10bpc_sse4: 11048.0 ( 9.81x) inv_txfm_add_64x64_dct_dct_4_10bpc_avx2: 4898.1 (22.12x) inv_txfm_add_64x64_dct_dct_4_10bpc_avx512icl: 3108.1 (34.86x)	2023-04-20 12:08:42 +00:00
Ronald S. Bultje	68d7a76d08	x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2 inv_txfm_add_64x32_dct_dct_0_10bpc_c: 1760.6 ( 1.00x) inv_txfm_add_64x32_dct_dct_0_10bpc_sse4: 271.1 ( 6.49x) inv_txfm_add_64x32_dct_dct_0_10bpc_avx2: 121.3 (14.52x) inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl: 116.3 (15.14x) inv_txfm_add_64x32_dct_dct_1_10bpc_c: 66507.4 ( 1.00x) inv_txfm_add_64x32_dct_dct_1_10bpc_sse4: 3712.4 (17.91x) inv_txfm_add_64x32_dct_dct_1_10bpc_avx2: 1830.5 (36.33x) inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl: 805.4 (82.58x) inv_txfm_add_64x32_dct_dct_2_10bpc_c: 66491.6 ( 1.00x) inv_txfm_add_64x32_dct_dct_2_10bpc_sse4: 5325.3 (12.49x) inv_txfm_add_64x32_dct_dct_2_10bpc_avx2: 2578.5 (25.79x) inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl: 1394.5 (47.68x) inv_txfm_add_64x32_dct_dct_3_10bpc_c: 66490.2 ( 1.00x) inv_txfm_add_64x32_dct_dct_3_10bpc_sse4: 6418.5 (10.36x) inv_txfm_add_64x32_dct_dct_3_10bpc_avx2: 3305.6 (20.11x) inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl: 2571.5 (25.86x) inv_txfm_add_64x32_dct_dct_4_10bpc_c: 66508.6 ( 1.00x) inv_txfm_add_64x32_dct_dct_4_10bpc_sse4: 8671.2 ( 7.67x) inv_txfm_add_64x32_dct_dct_4_10bpc_avx2: 4054.2 (16.40x) inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl: 2691.6 (24.71x)	2023-04-18 11:01:53 -04:00
Ronald S. Bultje	0b809a9281	x86: add AVX512-IceLake implementation of HBD 64x16 DCT^2 inv_txfm_add_64x16_dct_dct_0_10bpc_c: 892.0 ( 1.00x) inv_txfm_add_64x16_dct_dct_0_10bpc_sse4: 131.5 ( 6.78x) inv_txfm_add_64x16_dct_dct_0_10bpc_avx2: 63.4 (14.07x) inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl: 56.8 (15.71x) inv_txfm_add_64x16_dct_dct_1_10bpc_c: 29253.7 ( 1.00x) inv_txfm_add_64x16_dct_dct_1_10bpc_sse4: 1639.7 (17.84x) inv_txfm_add_64x16_dct_dct_1_10bpc_avx2: 1106.8 (26.43x) inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl: 532.9 (54.89x) inv_txfm_add_64x16_dct_dct_2_10bpc_c: 29249.8 ( 1.00x) inv_txfm_add_64x16_dct_dct_2_10bpc_sse4: 3065.6 ( 9.54x) inv_txfm_add_64x16_dct_dct_2_10bpc_avx2: 1791.0 (16.33x) inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl: 1108.0 (26.40x) inv_txfm_add_64x16_dct_dct_3_10bpc_c: 29269.1 ( 1.00x) inv_txfm_add_64x16_dct_dct_3_10bpc_sse4: 3738.2 ( 7.83x) inv_txfm_add_64x16_dct_dct_3_10bpc_avx2: 1790.9 (16.34x) inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl: 1203.8 (24.31x) inv_txfm_add_64x16_dct_dct_4_10bpc_c: 29337.7 ( 1.00x) inv_txfm_add_64x16_dct_dct_4_10bpc_sse4: 3749.7 ( 7.82x) inv_txfm_add_64x16_dct_dct_4_10bpc_avx2: 1791.0 (16.38x) inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl: 1203.8 (24.37x)	2023-04-13 10:36:38 -04:00
Ronald S. Bultje	6ae5766724	x86: add AVX512-IceLake implementation of HBD 32x64 DCT^2 inv_txfm_add_32x64_dct_dct_0_10bpc_c: 1783.5 ( 1.00x) inv_txfm_add_32x64_dct_dct_0_10bpc_sse4: 243.3 ( 7.33x) inv_txfm_add_32x64_dct_dct_0_10bpc_avx2: 119.1 (14.97x) inv_txfm_add_32x64_dct_dct_0_10bpc_avx512icl: 142.6 (12.50x) inv_txfm_add_32x64_dct_dct_1_10bpc_c: 50422.5 ( 1.00x) inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 2880.5 (17.50x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1423.4 (35.43x) inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 741.6 (67.99x) inv_txfm_add_32x64_dct_dct_2_10bpc_c: 50433.6 ( 1.00x) inv_txfm_add_32x64_dct_dct_2_10bpc_sse4: 4015.1 (12.56x) inv_txfm_add_32x64_dct_dct_2_10bpc_avx2: 1767.7 (28.53x) inv_txfm_add_32x64_dct_dct_2_10bpc_avx512icl: 960.8 (52.49x) inv_txfm_add_32x64_dct_dct_3_10bpc_c: 50422.2 ( 1.00x) inv_txfm_add_32x64_dct_dct_3_10bpc_sse4: 4500.5 (11.20x) inv_txfm_add_32x64_dct_dct_3_10bpc_avx2: 2111.7 (23.88x) inv_txfm_add_32x64_dct_dct_3_10bpc_avx512icl: 1777.1 (28.37x) inv_txfm_add_32x64_dct_dct_4_10bpc_c: 50444.2 ( 1.00x) inv_txfm_add_32x64_dct_dct_4_10bpc_sse4: 5592.8 ( 9.02x) inv_txfm_add_32x64_dct_dct_4_10bpc_avx2: 2458.1 (20.52x) inv_txfm_add_32x64_dct_dct_4_10bpc_avx512icl: 1867.2 (27.02x)	2023-04-12 19:16:21 -04:00
Ronald S. Bultje	5aa3b38f98	x86: add AVX512-IceLake implementation of HBD 16x64 DCT^2 nop: 39.4 inv_txfm_add_16x64_dct_dct_0_10bpc_c: 2208.0 ( 1.00x) inv_txfm_add_16x64_dct_dct_0_10bpc_sse4: 133.5 (16.54x) inv_txfm_add_16x64_dct_dct_0_10bpc_avx2: 71.3 (30.98x) inv_txfm_add_16x64_dct_dct_0_10bpc_avx512icl: 102.0 (21.66x) inv_txfm_add_16x64_dct_dct_1_10bpc_c: 25757.0 ( 1.00x) inv_txfm_add_16x64_dct_dct_1_10bpc_sse4: 1366.1 (18.85x) inv_txfm_add_16x64_dct_dct_1_10bpc_avx2: 657.6 (39.17x) inv_txfm_add_16x64_dct_dct_1_10bpc_avx512icl: 378.9 (67.98x) inv_txfm_add_16x64_dct_dct_2_10bpc_c: 25771.0 ( 1.00x) inv_txfm_add_16x64_dct_dct_2_10bpc_sse4: 1739.7 (14.81x) inv_txfm_add_16x64_dct_dct_2_10bpc_avx2: 772.1 (33.38x) inv_txfm_add_16x64_dct_dct_2_10bpc_avx512icl: 469.3 (54.92x) inv_txfm_add_16x64_dct_dct_3_10bpc_c: 25775.7 ( 1.00x) inv_txfm_add_16x64_dct_dct_3_10bpc_sse4: 1968.1 (13.10x) inv_txfm_add_16x64_dct_dct_3_10bpc_avx2: 886.5 (29.08x) inv_txfm_add_16x64_dct_dct_3_10bpc_avx512icl: 662.6 (38.90x) inv_txfm_add_16x64_dct_dct_4_10bpc_c: 25745.9 ( 1.00x) inv_txfm_add_16x64_dct_dct_4_10bpc_sse4: 2330.9 (11.05x) inv_txfm_add_16x64_dct_dct_4_10bpc_avx2: 1008.5 (25.53x) inv_txfm_add_16x64_dct_dct_4_10bpc_avx512icl: 662.3 (38.88x)	2023-04-08 11:47:31 +00:00
Ronald S. Bultje	b6bd4007cc	lib.c: re-order so all code accessing f->* is grouped together	2022-03-09 15:31:04 -05:00
Ronald S. Bultje	b6bec5b453	lib.c: consider a cached_error as a valid output picture Fixes #277.	2022-03-08 11:37:00 -05:00
Ronald S. Bultje and Henrik Gramner	5fb9f3a460	CI: add threaded tests to avx512icl instance	2022-03-07 20:23:29 +00:00
Ronald S. Bultje	e3f4c70006	lib.c: clear cf after seeking Fixes #390.	2022-03-07 14:38:27 +00:00
Ronald S. Bultje	8ccdf0f6b9	task_thread: use EINVAL/ENOMEM instead of -1 for f->task_thread.retval	2022-02-17 08:46:22 -05:00
Ronald S. Bultje	2a00fb6d47	Forward frame-thread decoding errors back to user thread	2022-02-17 08:46:19 -05:00
Ronald S. Bultje	00d4715ca2	decode.c: remove dead assignment	2022-02-16 18:00:27 -05:00
Ronald S. Bultje	239c951f2e	decode.c: fix return value on bitstream decoding errors Change ENOMEM into EINVAL, since at this point memory allocation errors don't occur, and bitstream decoding errors are not fatal.	2022-02-16 18:00:20 -05:00
Ronald S. Bultje	cae2c4f0bd	tools/dav1d: fix infinite loop on corrupt bitstreams Unref data after decoding failure to prevent re-entering the loop with the same data.	2022-02-16 18:00:09 -05:00
Ronald S. Bultje	2131a2cdaf	Fix typo in EINVAL comparison	2022-02-09 13:21:28 +00:00
Ronald S. Bultje	a363374a83	tools/dav1d: continue on recoverable bitstream decoding errors Fixes inconsistent output frame count depending on --threads=X value for the sample in #244.	2022-02-07 13:39:58 -05:00
Ronald S. Bultje and James Almer	f984447637	Output only latest spatial layer if --alllayers 0 Right now, --alllayers 0 will only output operating points that exactly match the largest one in the sequence header. However, in certain cases, the largest one might not be available, and a smaller one should be returned to the user instead. This matches update_frame_buffers() in aomdec to output only the latest frame if --alllayers 0 is specified. Signed-off-by: James Almer <jamrial@gmail.com>	2022-02-03 19:37:56 -03:00
Ronald S. Bultje	45e8f2f5f8	Fix indentation	2022-01-18 11:25:04 -05:00
Ronald S. Bultje	9a691b3131	add --inloopfilters to enable/disable postfilters dynamically (To be used alongside --filmgrain.) Addresses part of #310.	2022-01-14 16:27:42 -05:00
Ronald S. Bultje	b562b7f648	Set default framedelay to min(8, ceil(sqrt(n_threads))) This reduces memory usage significantly. Fixes #375.	2022-01-10 14:49:11 +00:00
Ronald S. Bultje	068697556f	Add interface to output invisible (alt-ref) frames Addresses part of #310.	2022-01-07 22:04:24 +00:00
Ronald S. Bultje	36beb8185d	Add option to write each frame to separate output file For per-file yuv/y4m writes, this can be automatically specified using e.g. -o file_%w_%h_%5n.yuv/y4m. --muxer=framemd5 -o - --quiet will accomplish the same for per-frame md5sums. Addresses part of #310.	2022-01-06 18:50:09 +00:00
Ronald S. Bultje	2337127cec	Mark failed-to-decode frames as incomplete when --maxframedelay=1 Credit to oss-fuzz.	2021-11-12 07:56:05 -05:00
Ronald S. Bultje	c7a5b90001	Fix wrong assignment if stride or sbh change, but stride * sbh don't Credit to oss-fuzz.	2021-11-11 07:29:05 -05:00
Ronald S. Bultje	c7f8c8276b	Clear clobbered coefficient array when flushing after seek Fixes #369.	2021-09-13 09:37:11 -04:00
Ronald S. Bultje	d9c01c34dc	Fix formatting string	2021-09-11 10:47:20 -04:00
Ronald S. Bultje and Jean-Baptiste Kempf	eae65df192	Fix memleak Credit to Oss-Fuzz.	2021-09-05 08:19:55 +00:00
Ronald S. Bultje	12156a507b	x86/itx: 64x64 inverse dct transforms hbd/sse4	2021-08-17 14:48:20 -04:00
Ronald S. Bultje	80bfd416d7	x86/itx: 64x32 inverse dct transforms hbd/sse4	2021-08-17 14:48:14 -04:00
Ronald S. Bultje	01466edf2e	x86/itx: 64x16 inverse dct transforms hbd/sse4	2021-08-17 14:35:59 -04:00
Ronald S. Bultje	be788c6319	x86/itx: 32x64 inverse dct transforms hbd/sse4	2021-08-17 14:35:59 -04:00
Ronald S. Bultje	db6455e479	x86/itx: 16x64 inverse dct transforms hbd/sse4	2021-08-17 14:35:52 -04:00
Ronald S. Bultje	78d4c87851	itx/x86: rewrite .transpose4x8packed so it uses only m0-3,4&6 And same for .transpose4x8packed_hi.	2021-08-12 15:20:03 -04:00
Ronald S. Bultje	ec9ecba1e6	itx/x86: replace idct8x8.transpose with idct8x4.transpose4x8packed	2021-08-12 15:20:03 -04:00
Ronald S. Bultje	59770564c0	x86/itx: add 1/sqrt(2) (rect2) multiply macro	2021-08-12 15:20:01 -04:00
Ronald S. Bultje	5455e8250c	x86/itx: share pass2 loop between {16,32}x32 dct^2 functions	2021-08-12 14:47:14 -04:00
Ronald S. Bultje	9cf9d4a613	x86/itx: combine .write_8x8 and .round{1,2,3,4} into a single function	2021-08-12 14:01:45 -04:00
Ronald S. Bultje	7050f0581d	x86/itx: combine .write_8x4 and .round{1,2} into a single function	2021-08-12 14:01:45 -04:00
Ronald S. Bultje	a5cea27ce9	x86/itx: split dct/adst/identity pass=2 implementations for 16x8 This simplifies the code a bit, and allows sharing the dct pass=2 implementation with 32x8.	2021-08-12 14:01:45 -04:00
Ronald S. Bultje and Jean-Baptiste Kempf	86b03c3cbe	x86/itx: 32x32 inverse dct transforms hbd/sse4	2021-08-12 16:56:40 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	59b3fe6c50	x86/itx: 32x16 inverse dct transforms hbd/sse4	2021-08-12 16:56:40 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	2974828a25	x86/itx: 32x8 inverse dct transforms hbd/sse4	2021-08-12 16:56:40 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	de6603a207	x86/itx: 16x32 inverse dct transforms hbd/sse4	2021-08-12 16:56:40 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	072eb21430	x86/itx: 8x32 inverse dct transforms hbd/sse4	2021-08-12 16:56:40 +00:00
Ronald S. Bultje	b119e71dc5	x86/itx: merge pass=2 rounding and writing operations	2021-08-10 09:06:27 -04:00
Ronald S. Bultje and Jean-Baptiste Kempf	ec18f047ca	x86/itx: 32x{8,16,32} & {8,16}x32 idtx transforms hbd/sse4	2021-08-10 11:33:18 +00:00
Ronald S. Bultje	a5f32330e4	x86/itx: replace .transpose8x8 with 2 calls to .transpose4x8packed	2021-08-08 17:50:09 -04:00
Ronald S. Bultje	b34244599c	x86/itx: document third argument in INV_TXFM_WxH_FN macros	2021-08-04 10:46:41 -04:00
Ronald S. Bultje	7edb1a7ed5	x86/itx: 16x16 inverse transforms hbd/sse4	2021-08-02 18:17:32 -04:00
Ronald S. Bultje	bcc994514c	x86/itx: 16x8 inverse transforms hbd/sse4	2021-08-02 18:17:16 -04:00
Ronald S. Bultje	ac8fa32a06	x86/itx: 16x4 inverse transforms hbd/sse4	2021-08-02 18:16:04 -04:00
Ronald S. Bultje	e266f9fa40	x86/itx: 8x16 inverse transforms hbd/sse4	2021-07-28 09:13:38 -04:00
Ronald S. Bultje	d5c0831297	x86/itx: 8x8 inverse transforms hbd/sse4	2021-07-28 09:13:32 -04:00
Ronald S. Bultje	a804d43004	x86/itx: add eob-based fast path to 4x16 hbd/sse4 itx	2021-07-28 09:10:14 -04:00
Ronald S. Bultje	e7228e8013	x86/itx: add eob-based fast path to 4x8 hbd/sse4 itx	2021-07-28 09:10:14 -04:00
Ronald S. Bultje	999a1c4d2a	x86/itx: 8x4 inverse transforms hbd/sse4	2021-07-28 09:10:14 -04:00
Ronald S. Bultje	ba183d230c	x86/itx: 4x16 inverse transforms hbd/sse4	2021-07-28 09:10:10 -04:00
Ronald S. Bultje	755364cbc6	x86/itx: 4x8 inverse transforms hbd/sse4	2021-07-21 11:12:28 -04:00
Ronald S. Bultje and Jean-Baptiste Kempf	c719d4a4e1	x86/filmgrain: add fguv_32x32xn_i444 HBD/AVX2	2021-07-20 12:23:15 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	cc0e2d5f2d	x86/filmgrain: add fguv_32x32xn_i422 HBD/AVX2	2021-07-20 12:23:15 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	8f858c2385	x86/filmgrain: add fguv_32x32xn_i422/444 HBD/SSSE3	2021-07-20 12:23:15 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	42978746f4	x86/itx: change function signatures of itx_4x4 to 0 GPRs The wrapper function already backs up GPRs, and declaring 7 here means we will backup/restore twice on x86-32.	2021-07-19 13:09:20 +00:00
Ronald S. Bultje	1944317ea6	x86/filmgrain: simplify post-horizontal filter blending	2021-07-16 17:51:17 -04:00
Ronald S. Bultje and Jean-Baptiste Kempf	73db537834	x86/filmgrain: add generate_grain_uv_i422/i444 HBD AVX2 & SSSE3	2021-07-15 15:07:22 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	35aa1c226b	x86/filmgrain: make fguv_i420_32x32xn HBD/SSSE3 32bit-compatible	2021-07-14 17:44:21 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	6235cdf16e	x86/filmgrain: make fgy_32x32xn HBD/SSSE3 32bit-compatible	2021-07-14 17:44:21 +00:00
Ronald S. Bultje	7e6fc8b040	x86/film_grain: make generate_grain_y/uv_420 32-bit compatible	2021-07-12 13:36:48 +00:00
Ronald S. Bultje and Jean-Baptiste Kempf	33180d8f6f	x86/deblock: make hbd/ssse3 implementations 32bit-compatible	2021-07-06 21:45:20 +00:00
Ronald S. Bultje	da98a8d562	x86/deblock_avx2: use vpblendvb instead of pand/pandn/por in flat16/8/6	2021-07-05 07:40:27 -04:00
Ronald S. Bultje	0aca76c3b7	x86/deblock_hbd_avx2: use vpblendvb instead of pand/pandn/por in flat16/8/6	2021-07-05 07:40:24 -04:00
Ronald S. Bultje	af16b652aa	Add SSSE3 HBD filmgrain assembly optimizations	2021-06-15 09:49:02 -04:00
Ronald S. Bultje	f7043e4742	Add 10/12-bit deblock SSSE3 implementation Currently 64-bit only.	2021-06-11 12:06:15 -04:00
Ronald S. Bultje	1156c0442a	mc: add HBD/SSSE3 mc.emu_edge optimizations	2021-06-09 23:21:44 +00:00
Ronald S. Bultje	e00e741161	checkasm: allow 1 >= h >= 2 in fgy_32x32xn unit test	2021-06-05 20:22:44 +00:00
Ronald S. Bultje	a8b13fc110	Do avx2/hbd scaling*grain multiplication in 16bit instead of 32bit	2021-06-04 19:39:00 +00:00
Ronald S. Bultje	d16ddb34aa	x86: add 10/12-bpc AVX2 version of mc.emu_edge	2021-05-11 08:02:21 -04:00
Ronald S. Bultje and Henrik Gramner	3a6630707e	x86: Add high bitdepth filmgrain AVX2 asm	2021-05-10 20:41:23 +02:00
Ronald S. Bultje and Henrik Gramner	24b1a4adb3	x86: Add high bitdepth loopfilter AVX2 asm	2021-05-05 00:25:55 +02:00
Ronald S. Bultje and Henrik Gramner	87aa815cfa	x86: Add high bitdepth cdef AVX2 asm	2021-05-05 00:25:55 +02:00
Ronald S. Bultje	47daa4df33	Accumulate leb128 value using uint64_t as intermediate type The shift-amount can be up to 56, and left-shifting 32-bit integers by values >=32 is undefined behaviour. Therefore, use 64-bit integers instead. Also slightly rewrite so we only call dav1d_get_bits() once for the combined more\|bits value, and mask the relevant portions out instead of reading twice. Lastly, move the overflow check out of the loop (as suggested by @wtc) Fixes #341.	2020-06-22 21:10:55 -04:00
Ronald S. Bultje	41cd4199f1	Skip loop restoration cache buffer resize for too-small buffers Fixes crashes in dav1d_resize_{avx2,ssse3} on very small resolutions with super_res enabled but skipped because the width is too small.	2020-04-02 02:44:27 +02:00
Ronald S. Bultje	4687c4696f	x86: add SSSE3 versions for filmgrain.fguv_32x32xn[422/444] fguv_32x32xn_8bpc_420_csfl0_c: 14568.2 fguv_32x32xn_8bpc_420_csfl0_ssse3: 1162.3 fguv_32x32xn_8bpc_420_csfl1_c: 10682.0 fguv_32x32xn_8bpc_420_csfl1_ssse3: 910.3 fguv_32x32xn_8bpc_422_csfl0_c: 16370.5 fguv_32x32xn_8bpc_422_csfl0_ssse3: 1202.6 fguv_32x32xn_8bpc_422_csfl1_c: 11333.8 fguv_32x32xn_8bpc_422_csfl1_ssse3: 958.8 fguv_32x32xn_8bpc_444_csfl0_c: 12950.1 fguv_32x32xn_8bpc_444_csfl0_ssse3: 1133.6 fguv_32x32xn_8bpc_444_csfl1_c: 8806.7 fguv_32x32xn_8bpc_444_csfl1_ssse3: 731.0	2020-04-01 10:50:56 -04:00
Ronald S. Bultje	b73acaa894	x86: use btc instead of xor+test or 32byte alignment in fgy_32x32xn_ssse3	2020-04-01 10:50:22 -04:00
Ronald S. Bultje	275e91de9e	x86: add AVX2 versions for filmgrain.fguv_32x32xn[422/444] fguv_32x32xn_8bpc_420_csfl0_c: 14568.2 fguv_32x32xn_8bpc_420_csfl0_avx2: 940.2 fguv_32x32xn_8bpc_420_csfl1_c: 10682.0 fguv_32x32xn_8bpc_420_csfl1_avx2: 783.3 fguv_32x32xn_8bpc_422_csfl0_c: 16370.5 fguv_32x32xn_8bpc_422_csfl0_avx2: 1557.3 fguv_32x32xn_8bpc_422_csfl1_c: 11333.8 fguv_32x32xn_8bpc_422_csfl1_avx2: 902.1 fguv_32x32xn_8bpc_444_csfl0_c: 12950.1 fguv_32x32xn_8bpc_444_csfl0_avx2: 822.9 fguv_32x32xn_8bpc_444_csfl1_c: 8806.7 fguv_32x32xn_8bpc_444_csfl1_avx2: 708.2	2020-04-01 10:50:02 -04:00
Ronald S. Bultje	fcc94fa905	x86: use btc instead of xor+test in fgy_32x32xn_avx2	2020-04-01 10:49:41 -04:00
Ronald S. Bultje	4dd943156d	x86: don't use vptest in SSSE3 version This is the VEX (AVX) encoded variant for the SSE4 instruction ptest, so emulate it using pmovmskb in the SSSE3 version.	2020-03-31 10:26:08 -04:00
Ronald S. Bultje	e308ae49b3	x86: add SSSE3 version of mc.resize() resize_8bpc_c: 1613670.2 resize_8bpc_ssse3: 110469.5 resize_8bpc_avx2: 93580.6	2020-03-31 13:19:55 +02:00
Ronald S. Bultje	9e36b9b001	x86: add AVX2 version of mc.resize() resize_8bpc_c: 1637609.7 resize_8bpc_avx2: 95162.6	2020-03-31 13:19:55 +02:00
Ronald S. Bultje	862e5bc773	checkasm: add test for mc.resize()	2020-03-31 13:19:55 +02:00
Ronald S. Bultje	aa1866f2ba	Invert src_w/h argument in mc.resize()	2020-03-31 13:19:55 +02:00
Ronald S. Bultje	8fd5dc3a5c	Make dav1d_resize_filter[] negative so it fits in int8_t	2020-03-31 13:19:55 +02:00
Ronald S. Bultje	7f2833a991	x86: add AVX2 SIMD for ipred.cfl_ac[444] cfl_ac_444_w4_8bpc_c: 499.1 cfl_ac_444_w4_8bpc_ssse3: 24.3 cfl_ac_444_w4_8bpc_avx2: 28.9 cfl_ac_444_w8_8bpc_c: 1240.2 cfl_ac_444_w8_8bpc_ssse3: 47.4 cfl_ac_444_w8_8bpc_avx2: 34.9 cfl_ac_444_w16_8bpc_c: 1785.7 cfl_ac_444_w16_8bpc_ssse3: 86.7 cfl_ac_444_w16_8bpc_avx2: 54.6 cfl_ac_444_w32_8bpc_c: 4343.5 cfl_ac_444_w32_8bpc_ssse3: 236.5 cfl_ac_444_w32_8bpc_avx2: 113.6	2020-03-25 22:39:28 +01:00
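
Commit 47daa4df33 above describes accumulating a leb128 value in a 64-bit intermediate, because left-shifting a 32-bit integer by 32 or more is undefined behaviour, and moving the overflow check out of the loop. The following is a minimal sketch of that idea only, not dav1d's actual code: `read_leb128` is a hypothetical helper that reads from a plain byte buffer (the commit message says dav1d reads through `dav1d_get_bits()`), and the 8-byte length cap is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of dav1d's API.
 * Decodes an unsigned LEB128 value from buf[0..len).
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_leb128(const uint8_t *buf, size_t len, uint32_t *out)
{
    uint64_t val = 0; /* 64-bit accumulator: shift amounts below go past 31 */
    size_t i = 0;
    unsigned more;

    do {
        /* Truncated input, or longer than the 8 bytes assumed here. */
        if (i == len || i == 8) return 0;
        const unsigned v = buf[i];
        more = v & 0x80;                        /* continuation flag */
        val |= (uint64_t)(v & 0x7f) << (i * 7); /* 7 payload bits per byte */
        i++;
    } while (more);

    /* Single overflow check after the loop instead of one per iteration. */
    if (val > UINT32_MAX) return 0;
    *out = (uint32_t)val;
    return i;
}
```

With a 32-bit accumulator, the fifth byte would already be shifted by 28 and lose its high bits silently, and any later byte would be shifted by 35 or more, which is undefined. Widening the accumulator keeps every shift well defined and lets the range check happen once at the end.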