When color management is disabled, the fragment shader was still first
ensuring straight alpha and then immediately just going back to
pre-multiplied. This is near-impossible for a shader compiler to
optimize out, I guess because of the if-statement to handle division by
zero. Having view alpha applied in between certainly didn't make it
easier.
That causes extra fragment computations that are unnecessary. In the
issue report this was found to cause a notable performance regression.
Fix the performance regression by introducing special-case paths for
when straight alpha is not needed. This skips the unnecessary
computations.
Fixes: https://gitlab.freedesktop.org/wayland/weston/-/issues/623
Fixes: 9a6a4e7032
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
The following GL extensions provide support for shaders CM:
-GL_OES_texture_float_linear makes GL_RGB32F linear filterable.
-GL ES 3.0 provides Texture3D support in GL API.
-GL_OES_texture_3D provides sampler3D support in ESSL 1.00.
If abovesaid is supported then renderer sets flag WESTON_CAP_COLOR_OPS
which means that all fields in struct weston_color_transform are
supported, for example, 1DLUT and 3DLUT.
Use GL_OES_texture_3D to implement 3DLUT function which
uses trilinear interpolation for pixel processing or bypass as is.
Quote from https://nick-shaw.github.io/cinematiccolor/luts-and-transforms.html
"3D LUTs have long been embraced by color scientists and are one of
the tools commonly used for gamut mapping. In fact, 3D LUTs are used
within ICC profiles to model the complex device behaviors necessary
for accurate color image reproduction".
Quote from https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/
chapter-24-using-lookup-tables-accelerate-color
is about interpolation: "By generating intermediate results based
on a weighted average of the eight corners of the bounding cube,
this algorithm is typically sufficient for color processing,
and it is implemented in graphics hardware".
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
This adds shader support for using a three-channel one-dimensional
look-up table for de/encoding input colors. This operation will be useful
for applying EOTF or its inverse, in other words, gamma curves. It will
also be useful in optimizing a following 3D LUT tap distribution once
support for 3D LUT is added.
Even though called three-channel and one-dimensional, it is actually
implemented as a one-channel two-dimensional texture with four rows.
Each row corresponds to a source color channel except the fourth one is
unused. The reason for having the fourth row is to get texture
coordinates in 1/8 steps instead of 1/6 steps. 1/6 may would not be
exact in floating- or fixed-point arithmetic and might perhaps risk
unintended results from bilinear texture filtering when we want linear
filtering only in x but not in y texture coordinates. I may be paranoid.
The LUT is applied on source colors after they have been converted to
straight RGB. It cannot be applied with pre-multiplied alpha. A LUT can
be used for both applying EOTF to go from source color space to blending
color space, and EOTF^-1 to go from blending space to output
(electrical) space. However, this type of LUT cannot do color space
conversions.
For now, this feature is hardcoded to off everywhere, to be enabled in
following patches.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Always when supported, make the fragment shader default floating point
precision high. The medium precision is roughly like half-floats, which
can be surprisingly bad. High precision does not reach even normal
32-bit float precision (by specification), but it's better. GL ES
implementations are allowed to exceed the minimum precision requirements
given in the specification.
This is an advance attempt to avoid nasty surprises from poor shader
precision.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Add a new shader requirements bit input_is_premult which says whether
the texture sampling results in premultiplied alpha or not. Currently
this can be deduced fully from the shader texture variant, but in the
future there might a protocol extension to explicitly control it. Hence
the need for a new bit.
yuva2rgba() is changed to produce straight alpha always. This makes
sample_input_texture() sometimes produce straight or premultiplied
alpha. The input_is_premult bit needs to match sample_input_texture()
behavior. Doing this should save three multiplications in the shader for
straight alpha formats.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Compile time constants play an important role in keeping the shader
programs fast. Introduce an informal annotation to mark compile time
constants to make the shader code easier to reason with.
This will make much more sense once functions with compile time constant
parameters are added.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
I have verified that the conversion here follows ITU-R BT.601 except for
the offsets 16/256 and 128/256 which should be 16/255 and 128/255
respectively.
I used to following octave script to verify this:
rf = 0.299;
gf = 0.587;
bf = 0.114;
crdiv = 1.402;
cbdiv = 1.772;
M = [ rf, gf, bf ;
-rf / cbdiv, -gf / cbdiv, (1 - bf) / cbdiv;
(1 - rf) / crdiv, -gf / crdiv, -bf / crdiv ];
YCbCr = [ 'Y'; 'Cb'; 'Cr' ];
RGB = [ 'R'; 'G'; 'B' ];
eq = [ ' '; '='; ' ' ];
l = [ ' [ '; ' [ '; ' [ ' ];
r = [ ' ] '; ' ] '; ' ] ' ];
mat = [
sprintf('%9f %9f %9f', M(1,:));
sprintf('%9f %9f %9f', M(2,:));
sprintf('%9f %9f %9f', M(3,:));
];
[ l YCbCr r eq l mat r l RGB r ]
R = inv(M);
mat = [
sprintf('%9f %9f %9f', R(1,:));
sprintf('%9f %9f %9f', R(2,:));
sprintf('%9f %9f %9f', R(3,:));
];
[ l RGB r eq l mat r l YCbCr r ]
[ R(:,1), R(:,2:3) .* (255/224) ]
The final matrix printed is what the shader uses down to +/- one digit,
so at least 7 correct decimals.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Sampling input texture has nothing to do with view alpha. This clarifies
the code structure.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reading the input texture is just one part of the future color pipeline,
so separate it into a function of its own. This makes it easier to add
more steps to the pipeline, and shows the green tint is separate as
well.
Making use of early returns, reducing the if-else ladder should help
with readability. Sharing the call to yuva2rgba() likewise.
Setting yuva.w = alpha is not shared though, in case support for AYUV
format might be added in the future.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Do not call texture2D() in the shader when we already have the result.
Simpler code, maybe even a little bit faster?
Suggested-by: Harish Krupo <harishkrupo@gmail.com>
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
These same magic constants were used in all cases, so move them into a
common place.
While we are touching all these lines, also change from the four floats
into a vec4. This allows further clean-up in the next patch.
This makes the code easier to read.
Behavior and results are unchanged.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Mathematically the result is the same, while multiplying RGB with alpha
is easier to understand as correct than the earlier form.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
A more unique name is easier to grep for. Using 'color' as a local
variable might be useful in the future.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
This patch modifies the shader generation code so that the shaders are
stitched together based on the requirement instead of creating them
during initialization. This is necessary for HDR use cases where each
surface would have different properties based on which different
de-gamma or tone mapping or gamma shaders are stitched together.
v2: Use /* */ instead of // (Pekka)
Move shader strings to gl-shaders.c file (Pekka)
Remove Makefile.am changes (Pekka)
Use a struct instead of uint32_t for storing requirements (Pekka)
Clean up shader list on destroy (Pekka)
Rename shader_release -> shader_destroy (Pekka)
Move shader creation/deletion into gl-shaders.c (Pekka)
Use create_shaders's multi string capbility instead of
concatenating (Pekka)
v3: Add length check when adding shader string (Pekka)
Signed-off-by: Harish Krupo <harishkrupo@gmail.com>
v4: Rebased, PROTECTION_MODE_ENFORCED converted.
Dropped unnecessary { }.
Ported setup_censor_overrides().
Split out moving code into gl-shaders.c.
Changed to follow "gl-renderer: rewrite fragment shaders",
no more shader source stitching.
Added SHADER_VARIANT_XYUV.
Const'fy function arguments.
Added gl_shader_requirements_cmp() and moved the early return in
use_gl_program().
Moved use_gl_program() before first use in file.
Split solid shader requirements by use case: requirements_censor and
requirements_triangle_fan.
Simplified fragment_debug_binding() since no need to force anything.
Ensure struct gl_shader_requirements has no padding. This allows us
to use normal C syntax instead of memset() and memcpy() when
initializing or assigning. See also:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/2071
Make it also a bitfield to squeeze the size.
v5: Move wl_list_insert() into gl_shader_create() (Daniel)
Compare variant to explicit value. (Daniel)
Change functions to gl_renderer_get_program,
gl_renderer_use_program, and
gl_renderer_use_program_with_view_uniforms.
Use local variable instead of gr->current_shader. (Daniel)
Simplified gl_renderer_get_program.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
The main goal of this patch is to improve the readability of how and
what fragment shaders are generated.
Instead of having C code that assembles each shader variant from literal
string snippets, create one big fragment shader source that has
everything in it. This relies on a GLSL compiler to optimize statically
false conditions and unused uniforms away.
Having all the fragment shader code in one file, uncluttered by C string
literal syntax, improves readability significantly. A disadvantage is
that the code is more verbose, but it allows comments much better.
The actual shader code is kept unchanged except:
- FRAGMENT_CONVERT_YUV macro is now a proper function
- GLSL version is explicitly set to 1.00 ES
- RGBA and EXTERNAL use the same path, the difference is how the sampler
is declared
Further shader code consolidation is possible, but is left for another
time.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>