These have horrible support in legacy swscale; in particular, they break the
pixel range (limited vs full) when converting to yuva444p, resulting in SSIM
errors like:
uyva 96x96 -> grayf32le 96x96, SSIM={Y=0.997654 U=1.000000 V=1.000000 A=1.000000} loss=1.876414e-03
loss 1.876414e-03 is worse by 1.864254e-03, expected loss 1.215935e-05
(The ops-based backend gets a 100% bit-exact roundtrip here)
Signed-off-by: Niklas Haas <git@haasn.dev>
When the user passes multiple backends (e.g. SWS_BACKEND_ALL), the
static check in sws_setup_frame() might have succeeded for the ops
backend but not the legacy backend, so we need to properly restrict
the legacy backend implementation function as well. Otherwise, this
may trigger internal errors / AVERROR(EINVAL) inside sws_init_context().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
If the user passes `-backends all` but without `-flags unstable`, then the
default/legacy backend will be picked unless it doesn't support a given
pixel format.
This allows gradually opting into the new code to handle more pixel formats
than what the legacy backend currently supports, without disturbing the
predictable output/behavior.
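A minimal sketch of the selection rule described above, covering only the
case where all backends are enabled and the unstable flag is not set. The
enum and the function are hypothetical stand-ins for illustration, not the
actual libswscale code:

```c
#include <stdbool.h>

/* Hypothetical backend identifiers, for illustration only. */
enum sketch_backend {
    SKETCH_BACKEND_NONE,   /* no backend can handle the format */
    SKETCH_BACKEND_LEGACY,
    SKETCH_BACKEND_OPS,
};

/* All backends allowed, unstable flag not set: keep the legacy backend
 * whenever it supports the format, and only fall back to the ops backend
 * for formats the legacy backend cannot handle. */
static enum sketch_backend pick_backend_stable(bool legacy_ok, bool ops_ok)
{
    if (legacy_ok)
        return SKETCH_BACKEND_LEGACY;
    if (ops_ok)
        return SKETCH_BACKEND_OPS;
    return SKETCH_BACKEND_NONE;
}
```

With this rule, formats that already work keep producing the same output,
and only previously unsupported formats are routed to the new code.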
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows constraining the set of available backends. This serves as a
better replacement for the "unstable" flag, which is a bit ambiguous. Allows
users to, for example, opt into the memcpy or x86 backend, while excluding
e.g. the upcoming JIT backends.
Signed-off-by: Niklas Haas <git@haasn.dev>
This will be used eventually when I rewrite checkasm/sw_ops to re-use the
code in ops_dispatch.c instead of hand-rolling the execution layer.
Signed-off-by: Niklas Haas <git@haasn.dev>
This function actually lives in ops_dispatch.c, and doesn't really make
sense in ops.h anymore. We should also move here the parts of ops_internal.h
that don't depend on any external ops stuff.
This allows the backend/compilation-related stuff to co-exist more nicely.
Signed-off-by: Niklas Haas <git@haasn.dev>
Mirroring the precedent established by the other SwsOp-generating functions.
This allows us to re-use it for the uops macro generator.
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows the pass buffer allocator to make smarter decisions based on the actual
alignment requirements of the specific pass.
Signed-off-by: Niklas Haas <git@haasn.dev>
The question of whether to do vertical or horizontal scaling first is a tricky
one. There are several valid philosophies:
1. Prefer horizontal scaling on the smaller pixel size, since this lowers the
cost of gather-based kernels.
2. Prefer minimizing the number of total filter taps, i.e. minimizing the size
of the intermediate image.
3. Prefer minimizing the number of rows horizontal scaling is applied to.
Empirically, I'm still not sure which approach is best overall, and it probably
depends at least a bit on the exact filter kernels in use. But for now, I
opted to implement approach 3, which seems to work well. I will re-evaluate
this once the filter kernels are actually finalized.
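As a rough illustration of approach 3 (a sketch only; the helper names are
hypothetical and the tie-break for equal heights is an assumption, not
something the actual code is known to do):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether the vertical pass runs first.
 * The horizontal pass processes src_h rows if it runs first, and dst_h
 * rows if it runs after vertical scaling, so pick the smaller count. */
static bool scale_vertical_first(int src_h, int dst_h)
{
    assert(src_h > 0 && dst_h > 0);
    /* Assumption: on a tie, horizontal scaling goes first. */
    return dst_h < src_h;
}

/* Number of rows the horizontal pass ends up processing. */
static int horizontal_rows(int src_h, int dst_h)
{
    return scale_vertical_first(src_h, dst_h) ? dst_h : src_h;
}
```

In other words, when downscaling vertically the vertical pass goes first,
and when upscaling vertically the horizontal pass goes first.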
The 'scale' in 'libswscale' can now stand for 'scaling'.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
When an op list needs to be decomposed into a more complicated sequence
of passes, the compile() code may need to roll back passes that have already
been partially compiled, if a later pass fails to compile.
This matters for subpass splitting (e.g. for filtering), as well as for
plane splitting.
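The rollback idea can be sketched roughly as follows. All types and
functions here are hypothetical simplifications for illustration (the real
pass and graph structures carry much more state): remember how many passes
existed before starting, and drop everything added since if a later step
fails.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for a pass and a graph. */
typedef struct Pass  { int id; } Pass;
typedef struct Graph { Pass **passes; int num_passes; } Graph;

static int graph_add_pass(Graph *g, int id)
{
    Pass **tmp = realloc(g->passes, (g->num_passes + 1) * sizeof(*tmp));
    if (!tmp)
        return -1;
    g->passes = tmp;
    Pass *pass = malloc(sizeof(*pass));
    if (!pass)
        return -1;
    pass->id = id;
    g->passes[g->num_passes++] = pass;
    return 0;
}

/* Free every pass added after the first `keep` passes. */
static void graph_rollback(Graph *g, int keep)
{
    while (g->num_passes > keep)
        free(g->passes[--g->num_passes]);
}

/* Compile n subpasses; ok[i] == 0 simulates subpass i failing to compile. */
static int compile_split(Graph *g, const int *ok, int n)
{
    const int start = g->num_passes; /* rollback point */
    for (int i = 0; i < n; i++) {
        if (!ok[i] || graph_add_pass(g, i) < 0) {
            graph_rollback(g, start);
            return -1;
        }
    }
    return 0;
}
```

On failure the graph is left with exactly the passes it had before the
call, so the caller can try a different decomposition.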
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
There's no reason to immediately allocate all of these; we can do it at the
end when we know for sure which passes we have.
This will matter especially if we ever add a way to remove passes again after
adding them (spoiler: we will).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
adapt_colors() allocates a SwsLut3D before calling add_convert_pass(). If
add_convert_pass() fails, the function returns without freeing the
previously allocated lut. Free lut on that error path.
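The shape of the fix, sketched with hypothetical stand-ins (Lut, lut_alloc(),
lut_free() and add_pass_stub() are illustrative only and are not the real
libswscale functions; the live_luts counter exists just to make the leak
observable):

```c
#include <errno.h>
#include <stdlib.h>

typedef struct Lut { int dummy; } Lut;

static int live_luts; /* number of currently allocated LUTs */

static Lut *lut_alloc(void)
{
    Lut *lut = calloc(1, sizeof(*lut));
    if (lut)
        live_luts++;
    return lut;
}

static void lut_free(Lut **lut)
{
    if (*lut) {
        free(*lut);
        *lut = NULL;
        live_luts--;
    }
}

/* Stand-in for the pass setup step; `fail` simulates an error. */
static int add_pass_stub(Lut *lut, int fail)
{
    (void) lut;
    return fail ? -EINVAL : 0;
}

static int adapt_colors_sketch(int fail, Lut **out)
{
    Lut *lut = lut_alloc();
    if (!lut)
        return -ENOMEM;

    int ret = add_pass_stub(lut, fail);
    if (ret < 0) {
        lut_free(&lut); /* previously missing: lut leaked on this path */
        return ret;
    }

    *out = lut; /* on success, ownership passes to the caller */
    return 0;
}
```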
Signed-off-by: Huihui_Huang <hhhuang@smu.edu.sg>
Another step towards a cleaner API, with a cleaner separation of purposes.
Also avoids wasting a whopping one third of the flag space on what really
shouldn't have been a flag to begin with.
I pre-emptively decided to separate the scaler selection between "scaler"
and "scaler_sub", the latter defining what's used for things like 4:2:0
subsampling.
This allows us to get rid of the awkwardly defined SWS_BICUBLIN flag, in favor
of that just being the natural consequence of using a different scaler_sub.
Lastly, I also decided to pre-emptively axe the poorly defined and
questionable SWS_X scaler, which I doubt ever saw much use. The old flag
is still available as a deprecated flag, anyhow.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
In case we ever need to increase this number in the future.
I won't bother bumping the ABI version for this new #define, since it doesn't
affect ABI, and I'm about to bump the ABI version in a following commit.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
More useful than just allowing it to "modify" the ops; in practice this means
the contents will be undefined anyways - might as well have this function
take care of freeing it afterwards as well.
Will make things simpler with regards to subpass splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Now that this function returns a status code and takes care of cleanup on
failure, many call-sites can just return the function directly.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is arguably more convenient for most downstream users, as will be
more prominently seen in the next commit.
Also allows this code to re-use a pass_free() helper with the graph uninit.
Signed-off-by: Niklas Haas <git@haasn.dev>
This pattern is just common enough that it IMO makes sense to do so. This
will also make more sense after the following commits.
Signed-off-by: Niklas Haas <git@haasn.dev>
This condition was weaker than necessary.
In particular, graph->num_threads == 1 guarantees pass->num_slices == 1.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of once at the start of add_convert_pass(). This makes much
more sense in light of the fact that we want to start e.g. splitting
passes apart.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This allows distinguishing between different types of failure, e.g.
AVERROR(EINVAL) on invalid pass dimensions.
Signed-off-by: Niklas Haas <git@haasn.dev>
The code was evidently designed at one point in time to support "direct"
execution (not via a thread pool) for num_threads == 1, but this was never
implemented.
As a side benefit, reduces context creation overhead in single threaded
mode (relevant e.g. inside the libswscale self test), due to not needing to
spawn and destroy several thousand worker threads.
Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
AVFrame just really doesn't have the semantics we want. However, there is a
tangible benefit to having SwsFrame act as a carbon copy of a (subset of)
AVFrame.
This partially reverts commit 67f3627267.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This has now become fully redundant with AVFrame, especially since the
existence of SwsPassBuffer. Delete it, simplifying a lot of things and
avoiding reinventing the wheel everywhere.
Also generally reduces overhead, since there is less redundant copying
going on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And have ff_sws_graph_run() just take a bare AVFrame. This will help with
an upcoming change, aside from being a bit friendlier towards API users
in general.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit replaces the AVBufferRef inside SwsPassBuffer by an AVFrame, in
anticipation of the SwsImg removal.
Incidentally, we could also now just use av_frame_get_buffer() here, but
at the cost of breaking the 1:1 relationship between planes and buffers,
which is required for per-plane refcopies.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This function was originally written to support the use case of e.g.
partially allocated planes that implicitly reference the original input
image, but I've decided that this is stupid and doesn't currently work
anyways.
Plus, I have plans to kill SwsImg, so we need to simplify this mess.
Signed-off-by: Niklas Haas <git@haasn.dev>