Rather than hard-coding a separate set of NASM macros, or generating them
with a separate function, we can just leverage the C preprocessor to generate
a NASM source file *from* the existing ops macros.
This is maybe a bit unorthodox, but it avoids unnecessary overhead from
re-generating the macros twice, avoids manual updating of the NASM macros,
and generally does not come with any real downside except being a bit ugly.
The main source of ugliness is the fact that the C preprocessor expands
everything into a single line, whereas NASM expects separate statements to
be on separate lines. Very fortunately, we can work around this by writing a
another NASM macro to take its arguments and dump them onto multiple lines.
It may seem premature, but I went ahead and defined all the macros, since
it was easy enough to do.
I added the %include in this commit to trigger build errors that occur only
as a result of introducing this file in the same commit that introduces it.
Signed-off-by: Niklas Haas <git@haasn.dev>
The checkasm tool originated in x264. It was later rewritten and
modernized for FFmpeg (and relicensed to LGPL). For the dav1d
project, it was relicensed again to 2-clause BSD (with permission
from the relevant authors).
The FFmpeg and dav1d implementations of checkasm have since evolved
independently (with some amount of ported code between the two,
with relicensing permission where relevant).
To synchronize the development, and to make it possible to easily
adopt checkasm in other projects, it has been split out into a
standalone project/library on its own, developed at
https://code.videolan.org/videolan/checkasm/.
That version has all the features of checkasm in both FFmpeg and
dav1d, and has got a number of extra improvements on top:
- More/fixed tests (e.g. properly clobbering high bits of 32-bit registers
on most platforms),
- Vastly improved overall performance / runtime for benchmarking, due
primarily to the ability to scale the runtime of each test to that test's
complexity.
- Much more robust statistical analysis of benchmarking results; including
robust outlier rejection, an estimation of the histogram, and the ability
to report the variance / stddev in addition to the (trimmed) mean.
- Interactive HTML and JSON output formats in addition to CSV/TSV.
- More readable and user-friendly output across the board, especially for
failures and data dumps (e.g. also showing errors inside padding bytes).
- Better cross-platform support, including dynamic fallback of timer
implementations on ARM platforms, a better RISC-V and AArch64 harness,
and more.
On AArch64, it tests which timer out of pmccntr_el0, linux perf,
macos kperf, cntvct_el0 is available, without the user needing to
configure things, and falling back on clock_gettime if neither of
them can be used. This means one automatically gets the best
available timer, if userspace access to pmccntr_el0 has been
unlocked with a kernel module, or if one has permission to use
the perf API, or if the cntvct_el0 is exact enough to be useful.
On AArch64 macOS, there is now a test harness that catches clobbered
registers and stack clobbering, like on other platforms.
- An option for setting affinity, for benchmarking on heterogenous
core systems. (On Linux, this is already easily done through
taskset, but on Windows, the checkasm built in option makes it
possible there as well, and portable.)
- Printing of the tested CPU core name, where possible.
To integrate this external implementation of checkasm into FFmpeg,
without having to build libcheckasm as an external library, the upstream
sources are added as a git subtree, and integrated into the FFmpeg
build system as a foreign source.
For the long and storied history of how we arrived at this solution,
see: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22546
The relevant config headers for checkasm are generated by configure,
and the sources are built as part of the main ffmpeg build. The
upstream sources, while they use meson as primary build system,
are structured to make it easy to build as part of a foreign build
system.
The existing testcases are mostly kept untouched (only three minor
changes are required, in crc.c, sw_ops.c and vp8dsp.c), while the
majority of the logic from checkasm.c, checkasm.h and the arch
specific assembly files are removed, replaced with the external
implementation.
Co-Authored-By: Martin Storsjö <martin@martin.st>
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit pieces together the previous few commits to implement the
NEON backend for sws_ops.
In essence, a tool which runs on the target (sws_ops_aarch64) is used
to enumerate all the functions that the backend needs to implement. The
list it generates is stored in the repository (ops_entries.c).
The list from above is used at build time by a code generator tool
(ops_asmgen) to implement all the sws_ops functions the NEON backend
supports, and generate a lookup function in C to retrieve the assembly
function pointers.
At runtime, the NEON backend fetches the function pointers to the
assembly functions and chains them together in a continuation-passing
style design, similar to the x86 backend.
The following speedup is observed from legacy swscale to NEON:
A520: Overall speedup=3.780x faster, min=0.137x max=91.928x
A720: Overall speedup=4.129x faster, min=0.234x max=92.424x
And the following from the C sws_ops implementation to NEON:
A520: Overall speedup=5.513x faster, min=0.927x max=14.169x
A720: Overall speedup=4.786x faster, min=0.585x max=20.157x
The slowdowns from legacy to NEON are the same for C/x86. Mostly low
bit-depth conversions that did not perform dithering in legacy.
The 0.585x outlier from C to NEON is gbrpf32le -> gbrapf32le, which is
mostly memcpy with the C implementation. All other conversions are
better.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The linker command can exceed the maximum argument limit on MinGW,
especially for libavcodec.
The objects list is now stored in a file and passed to the linker.
For example, given TensorFlow model file espcn.pb,
to generate native model file espcn.model, just run:
python convert.py espcn.pb
In current implementation, the native model file is generated for
specific dnn network with hard-code python scripts maintained out of ffmpeg.
For example, srcnn network used by vf_sr is generated with
https://github.com/HighVoltageRocknRoll/sr/blob/master/generate_header_and_model.py#L85
In this patch, the script is designed as a general solution which
converts general TensorFlow model .pb file into .model file. The script
now has some tricky to be compatible with current implemention, will
be refined step by step.
The script is also added into ffmpeg source tree. It is expected there
will be many more patches and community needs the ownership of it.
Another technical direction is to do the conversion in c/c++ code within
ffmpeg source tree. While .pb file is organized with protocol buffers,
it is not easy to do such work with tiny c/c++ code, see more discussion
at http://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/244496.html. So,
choose the python script.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
* commit '11a9320de54759340531177c9f2b1e31e6112cc2':
build: Move build-system-related helper files to a separate subdirectory
"ffbuild" directory name is used instead of "avbuild".
Merged-by: Clément Bœsch <u@pkh.me>
* commit '6641819feedb086ebba3d2be89b8d33980f367e1':
build: Ignore generated mapfile and remove it on distclean
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '06edef3d5e072ef3c4face9ce946d2d9c36cc477':
Generate the lists of enabled protocols/bsfs from configure.
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '48362ceadeb2eb5286ae94ef7f9542d990ff7ec7':
doc: Update paths to match new examples location
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
If links don't work, fall back to using the full source path as was
previously done.
This should fix build failures with MSVC.
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
During a build, a lot of *.o.-hash files are created - had not noticed
this as they are usually dumped in tmpfs on Linux. However, they
sometimes are present during a long build in the project directory, making it
annoying to commit while the project is being built.
These have been observed with Clang, -fsanitize-undefined on Arch Linux,
though other configurations may also generate such temporaries.
The solution here is on lines with the Linux kernel's .gitignore:
https://github.com/torvalds/linux/blob/master/.gitignore.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It provides the following features:
* verify correctness by comparing output to the C version.
* detect failure to save and restore clobbered callee-saved registers.
* detect 32-bit parameters being used as if they were 64-bit in x86-64
(the upper halves are not guaranteed to be zero - but in practice
they very often are, which makes those bugs hard to spot otherwise).
* easy benchmarking.
Compile by running 'make checkasm'.
Execute by running 'tests/checkasm/checkasm'.
Optional arguments are '--bench' to run benchmarks for all functions,
'--bench=<pattern>' to run benchmarks for all functions that starts with
<pattern>, and '<integer>' to seed the PRNG for reproducible results.
Contains unit tests for most h264pred functions to get started, more tests
can be added afterwards using those as a reference.
Loosely based on code from x264. Currently only supports x86 and x86-64,
but additional architectures shouldn't be too much of an obstacle to add.
Note that functions with floating point parameters or floating point
return values are not supported. Some compiler-specific features or
preprocessor hacks would likely be required to add support for that.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* commit '1316df7aa98c4784f190d107206d0bb12c590b89':
lavu: add an API function to return the Libav version string
Conflicts:
.gitignore
Makefile
cmdutils.c
doc/APIchanges
libavutil/avutil.h
libavutil/utils.c
See: f91126643a
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This returns something like "v12_dev0-1332-g333a27c". This is much more
useful than the individual library versions, of which there are too
many, and which are very hard to map back to releases or git commits.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The reasoning behind this addition is that various third party
applications are interested in getting some motion information out of a
video "for free" when it is available.
It was considered to export other information as well (such as the intra
information about the block, or the quantization) but the structure
might have ended up into a half full-generic, half full of codec
specific cruft. If more information is necessary, it should either be
added in the "flags" field of the AVMotionVector structure, or in
another side-data.
This commit also includes an example exporting them in a CSV stream.
* commit '706208ef47bffd525c982975d2756f7b2b220b8d':
fate: Split fate-pixdesc tests and dispatch them through Make
Conflicts:
tests/fate-run.sh
tests/ref/fate/filter-pixdesc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It has not been properly maintained for years and there is little hope
of that changing in the future.
It appears simpler to write a new replacement from scratch than
unbreaking it.