Old 25th September 2020, 16:37   #21  |  Link
SirMaster
Registered User
 
Join Date: Feb 2019
Posts: 122
Hi @butterw2

I see you have some experience in writing HLSL.

I am interested in writing some of my own shaders but I find myself pretty lost with the syntax.

I am a developer, so I have a lot of programming experience, but maybe you could help me get started.

I found this shader which works, but I would like to start by simplifying it.

https://pastebin.com/VMfh8tNL

This shader allows performing RGB color convergence adjustment (useful for 3 chip projectors).

However I notice that when I enable it, it seems to always be blending pixels which screws up 1:1 pixel mapping.

I think this is because the shader allows all sorts of scaling and geometric adjustments of the color channels, which I don't need at all.

I would like to modify the shader to only do simple whole pixel adjustments of the red and blue colors.

If you have time, do you think you can help me get started?
Old 25th September 2020, 17:46   #22  |  Link
butterw2
Registered User
 
Join Date: Jun 2020
Posts: 204
hlsl pixel shader Intro (dx9)

Quote:
Originally Posted by SirMaster;
If you have time, do you think you can help me get started?
Sure. I have been quite lazy of late with my pixel shading and need to get back into it.

It's actually incredibly simple if you are familiar with c syntax, provided you understand what you are trying to do. Some knowledge of vector math will be useful for more advanced image processing.

I wrote a basic intro here (also details limitations):
https://forum.videohelp.com/threads/...l)#post2587323

Checking out/modifying existing shaders is a good way to get started.
You can have multiple return statements: use the screen to display your intermediate results, and comment the extra returns out when you no longer need them.
Define your user parameters with #define.
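To illustrate the two tips above, here is a minimal sketch of my own (not taken from an existing shader). The parameter name Strength and the grayscale effect are just placeholders; the luma weights are the usual Rec.709 ones.

```hlsl
// Debug sketch: the parameter name and the effect are illustrative assumptions.
#define Strength 0.5   // user parameter: 0 = no change, 1 = full grayscale

sampler s0: register(s0);

float4 main(float2 tex: TEXCOORD0): COLOR {
    float4 c = tex2D(s0, tex);                                // source pixel
    float luma = dot(c.rgb, float3(0.2126, 0.7152, 0.0722));  // Rec.709 luma weights
    // Uncomment the next line to display the intermediate result on screen:
    // return float4(luma, luma, luma, 1);
    c.rgb = lerp(c.rgb, float3(luma, luma, luma), Strength);  // blend towards gray
    return c;
}
```

Changing Strength and saving the file is enough to see the new result in the player.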


For development, I would recommend mpc-hc and Notepad++ with extended C syntax highlighting for .hlsl files.
Notepad++ > Settings > Style Configurator...
C .hlsl user-defined
- instructions: main tex2D saturate dot pow max min lerp mul frac sign step sqrt clamp
- types: sampler float4 float3 float2 float3x3

User Defined Language hlsl.xml is also possible (but it is a bit flaky)


mpc-hc allows you to display the debug output / perf of your code.



// This shader can perform software alignment by Catmull-Rom spline6 interpolation for a 3LCD projector's red and blue panels
// This file is part of Video pixel shader pack. // (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
I must say I have no idea what this pixel shader is used for (some type of projector calibration ?).

// This shader should be run as a screen space pixel shader.
- screen space pixel shader: use as post-resize shader.

// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
see https://forum.doom9.org/showthread.php?t=181651
There is an overhead to converting to linear gamma, so if the effect is negligible in practice you might want to just ignore it, but it could affect the accuracy of the result.
If this is the only shader you intend to use, you could integrate the linear gamma conversion into the shader. Otherwise, I typically use a 10bit integer surface for processing in mpc-hc to avoid too much precision loss with the shader chain.
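As a rough sketch of what integrating the conversion into a single shader could look like: the version below assumes a pure 2.2 power-law gamma, which is a simplification and not necessarily what the linked linear gamma shaders do. Gamma and Gain are placeholder parameters of mine.

```hlsl
// Sketch only: assumes a pure power-law gamma (not the piecewise sRGB/BT.709 curves).
#define Gamma 2.2
#define Gain  1.0      // placeholder processing step applied in linear light

sampler s0: register(s0);

float4 main(float2 tex: TEXCOORD0): COLOR {
    float4 c = tex2D(s0, tex);
    float3 lin = pow(saturate(c.rgb), Gamma);  // gamma decode: R'G'B' -> linear RGB
    lin *= Gain;                               // ... do the actual processing here ...
    c.rgb = pow(saturate(lin), 1.0 / Gamma);   // gamma encode: linear RGB -> R'G'B'
    return c;
}
```

The saturate calls keep the input of pow in [0, 1], which avoids undefined results for negative values.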

Last edited by butterw2; 2nd October 2020 at 14:24.
Old 25th September 2020, 20:13   #23  |  Link
SirMaster
Thanks, I actually just e-mailed Jan-Willem Krans this afternoon asking about it and he whipped up a simple shader to do what I wanted in this case.

Code:
#define RedControls 0
#define BlueControls 0
// RedShiftLeftToRight and BlueShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define RedShiftLeftToRight 0.
#define BlueShiftLeftToRight 0.
// RedShiftTopToBottom and BlueShiftTopToBottom, a value of 3. will shift three pixels to the bottom, 0 is disabled
#define RedShiftTopToBottom 0.
#define BlueShiftTopToBottom 0.

sampler s0;
float2 c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float4 s1 = tex2D(s0, tex);// base pixel
#if RedControls == 1
	s1.r = tex2D(s0, tex+c1*float2(RedShiftLeftToRight, RedShiftTopToBottom)).r;// base red pixel
#endif
#if BlueControls == 1
	s1.b = tex2D(s0, tex+c1*float2(BlueShiftLeftToRight, BlueShiftTopToBottom)).b;// base blue pixel
#endif
	return s1;
}
And yes, this sort of shader is useful for fixing the RGB panel convergence in 3-chip projectors where the 3 color producing panels are not in perfect alignment.

Basically a projector can look like this:

https://lowtek.ca/roo/wp-content/upl...03/avsbad3.jpg

Some higher end projectors have internal software corrections for this sort of issue that you can adjust, but others do not, and that's where this shader comes in.

I used his shader that you posted, but that one has complex pixel interpolation and breaks the perfect 1:1 pixel mapping that I wanted. This new simple shader he wrote for me does exactly what I wanted.


I do have some other ideas for some simple shaders that I want to try making and I may post here if I have questions about how to do something as I start tinkering around with them.

Last edited by SirMaster; 26th September 2020 at 17:49.
Old 25th September 2020, 21:48   #24  |  Link
butterw2
You just wanted a configurable (x, y) pixel offset on red and blue channels. Very simple indeed using a pixel shader. This could also be used as an effect of sorts, I suppose.

If I can add anything at all, p1 would be a more common name for the pixel offset (1/width, 1/height):
float2 p1: register(c1);
Old 30th September 2020, 17:51   #25  |  Link
SirMaster
Here is a shader I made for simple 4 way masking.

Code:
#define width 1920.
#define height 1080.

#define left 0
#define right 0
#define top 0
#define bottom 0

sampler s0;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	if(tex.x >= (left/width) && tex.x <= 1-(right/width) && tex.y >= (top/height) && tex.y <= 1-(bottom/height)) return tex2D(s0, tex);
	return float4(0, 0, 0, 1);
}
This sort of video masking is a feature found in most higher-end projectors as there can be reasons why you may want to mask parts of the image for certain projection screen sizes and shapes and aspect ratios and such.

This is useful for projectors that don't have a built in 4 way masking control.
Old 30th September 2020, 18:27   #26  |  Link
butterw2
Here's my own optimized implementation from barMask.hlsl (Mode==112)
"if" conditionals are quite inefficient with pixel shaders, so the code uses a boolean insideBox function instead.
All functions in HLSL are inline. An inline function generates a copy of the function body (when compiling) for each function call. #define macro functions are also commonly used.

Quote:
/* Custom Border Mask defined in pixels
(1 texture, 10 arithmetic)
barMask.hlsl (Mode==112)
*/


#define BorderColor 0
#define Top 0
#define Left 0
#define Bottom 0
#define Right 0

sampler s0: register(s0);
float2 p1: register(c1);

bool insideBox(float2 tex, float2 topLeft, float2 bottomRight){
/* returns true if tex coordinates inside the box, returns false otherwise. !insideBox: outsideBox */
float2 s = step(topLeft, tex) - step(bottomRight, tex);
return s.x * s.y;
}

/* --- Main --- */
float4 main(float2 tex: TEXCOORD0): COLOR {
if (insideBox(tex, p1*float2(Left, Top), 1-p1*float2(Right, Bottom))) return tex2D(s0, tex);
return BorderColor;
}
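For comparison, the remaining "if" in main can also be removed by using the box test as a 0/1 blend factor. This is only a sketch of that variation, not the code from barMask.hlsl; insideBoxMask is a name I made up, and whether it actually saves instructions should be checked in the compiler output.

```hlsl
// Branchless variant (sketch): the mask is 1.0 inside the box and 0.0 outside.
#define BorderColor float4(0, 0, 0, 1)
#define Top 0
#define Left 0
#define Bottom 0
#define Right 0

sampler s0: register(s0);
float2 p1: register(c1);   // (1/width, 1/height)

float insideBoxMask(float2 tex, float2 topLeft, float2 bottomRight) {
    // step(a, x) is 1 when x >= a, so s is 1 between the two corners on each axis
    float2 s = step(topLeft, tex) - step(bottomRight, tex);
    return s.x * s.y;
}

float4 main(float2 tex: TEXCOORD0): COLOR {
    float m = insideBoxMask(tex, p1*float2(Left, Top), 1 - p1*float2(Right, Bottom));
    return lerp(BorderColor, tex2D(s0, tex), m);   // m=0: border color, m=1: source pixel
}
```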

Last edited by butterw2; 30th September 2020 at 20:30. Reason: Code was copy-pasted, hopefully without mistakes
Old 30th September 2020, 19:14   #27  |  Link
SirMaster
Quote:
Originally Posted by butterw2 View Post
Here's my own optimized implementation from barMask.hlsl (Mode==112)
if conditionals are quite inefficient with pixel shaders.
Oh, awesome!
Old 30th September 2020, 21:11   #28  |  Link
SirMaster
Quote:
Originally Posted by butterw2 View Post
hlsl pixel shader Intro (dx9)


// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
see https://forum.doom9.org/showthread.php?t=181651
There is an overhead to converting to linear gamma, so if the effect is negligible in practice you might want to just ignore it, but it could affect the accuracy of the result.
If this is the only shader you intend to use, you could integrate the linear gamma conversion into the shader. Otherwise, I typically use a 10bit integer surface for processing in mpc-hc to avoid too much precision loss with the shader chain.
I am confused by this part.

I normally have the whole chain set to RGB full range (lav filters / madVR, GPU output, and display).

And everything is set to and calibrated to 2.2 power law gamma.

So do I need to be concerned about doing some sort of linear gamma conversion?

I am not sure why simply shifting the colored pixel channels around and adding black border masking would need to worry about anything like this.
Old 30th September 2020, 21:39   #29  |  Link
butterw2
# Linear Gamma Conversion
Linear gamma conversion (https://forum.doom9.org/showthread.php?t=181651) was mentioned in the original LCD calibration source code you linked and yes, it can be confusing.

If you are just moving pixels around or masking the frame: it is not necessary.

Then there are many cases where it might be theoretically necessary (any processing of pixel values), but it makes little or no difference to the end result: just ignore.

In a few cases however it is necessary to convert to linear gamma before processing and then gamma encode when you are done.

Incorrect processing is so common however, that the result will sometimes look "strange/wrong" when you go through the trouble/overhead of doing it right.

Last edited by butterw2; 19th October 2020 at 17:46.
Old 2nd October 2020, 17:17   #30  |  Link
Alexkral
Registered User
 
Join Date: Oct 2018
Posts: 254
Hi @butterw2

Maybe you can help me with this.

I'm using a Lanczos 3 shader that I found on the internet to do the scaling for VDSR in my app. The results are very good, quite similar to AviSynth and Matlab, and using Lanczos instead of Bicubic (as is done in the VDSR paper) allows the final result to be much sharper. This is the code:

Code:
/*
   Copyright (C) 2010 Team XBMC
   http://www.xbmc.org
   Copyright (C) 2011 Stefanos A.
   http://www.opentk.com

This Program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

This Program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this library.

*/

sampler s0 : register(s0);
float4 p2 :  register(c2);

#define get(x, y) tex2D(s0, float2(x, y)).rgb
#define FIX(c) max(abs(c), 1e-5)
static const float PI = 3.141592653;

float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = FIX(2.0 * PI * (x - 1.5));
    float s2 = FIX(2.0 * PI * (x - 0.5));
    float s3 = FIX(2.0 * PI * (x + 0.5));
    float3 sample = float3(s1, s2, s3);
    return sin(sample) * sin(sample / radius) / (sample * sample);
}

float3 line_run(float ypos, float3 xpos1, float3 xpos2, float3 linetaps1, float3 linetaps2)
{
    return
	get(xpos1.r, ypos) * linetaps1.r +
	get(xpos1.g, ypos) * linetaps2.r +
	get(xpos1.b, ypos) * linetaps1.g +
	get(xpos2.r, ypos) * linetaps2.g +
	get(xpos2.g, ypos) * linetaps1.b +
	get(xpos2.b, ypos) * linetaps2.b; 
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float2 pos = tex + p2.zw * 0.5;
    float2 f = frac(pos / p2.zw);

    float3 linetaps1   = weight3(0.5 - f.x * 0.5);
    float3 linetaps2   = weight3(1.0 - f.x * 0.5);
    float3 columntaps1 = weight3(0.5 - f.y * 0.5);
    float3 columntaps2 = weight3(1.0 - f.y * 0.5);

    float suml = dot(linetaps1, float3(1, 1, 1)) + dot(linetaps2, float3(1, 1, 1));
    float sumc = dot(columntaps1, float3(1, 1, 1)) + dot(columntaps2, float3(1, 1, 1));
    linetaps1 /= suml;
    linetaps2 /= suml;
    columntaps1 /= sumc;
    columntaps2 /= sumc;

    float2 xystart = (-2.5 - f) * p2.zw + pos;
    float3 xpos1 = float3(xystart.x, xystart.x + p2.z, xystart.x + p2.z * 2.0);
    float3 xpos2 = float3(xystart.x + p2.z * 3.0, xystart.x + p2.z * 4.0, xystart.x + p2.z * 5.0);

    return float4(
	line_run(xystart.y              , xpos1, xpos2, linetaps1, linetaps2) * columntaps1.r +
	line_run(xystart.y + p2.w       , xpos1, xpos2, linetaps1, linetaps2) * columntaps2.r +
	line_run(xystart.y + p2.w * 2.0 , xpos1, xpos2, linetaps1, linetaps2) * columntaps1.g +
	line_run(xystart.y + p2.w * 3.0 , xpos1, xpos2, linetaps1, linetaps2) * columntaps2.g +
	line_run(xystart.y + p2.w * 4.0 , xpos1, xpos2, linetaps1, linetaps2) * columntaps1.b +
	line_run(xystart.y + p2.w * 5.0 , xpos1, xpos2, linetaps1, linetaps2) * columntaps2.b, 1.0);
}
Since Sinc resampling is sharper than Lanczos, what I would like now is to try a Sinc shader, but honestly, I don't understand all the math involved. So far I have only tried to change the weight3 function like this:

Code:
float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = FIX(2.0 * PI * (x - 1.5));
    float s2 = FIX(2.0 * PI * (x - 0.5));
    float s3 = FIX(2.0 * PI * (x + 0.5));
    float3 sample = float3(s1, s2, s3);
    return sin(sample) / sample;
}
This works, but the result is more blurry, so I'm assuming the shader needs a more in-depth change, or even a rewrite.

I'm not even sure anyway that it will serve to improve the results, since the VDSR network is more limited in what it can improve when using good scaling, and the models that I'm using are quite reduced, but I think that it may be worth the try.

Once again, thank you in advance for your help.

EDIT: p2 are the dimensions: (width, height, 1/width, 1/height)
__________________
AviSynth AiUpscale

Last edited by Alexkral; 2nd October 2020 at 17:28.
Old 2nd October 2020, 22:57   #31  |  Link
butterw2
Your AviSynth AiUpscale project seems interesting. VDSR (Very Deep Super Resolution) is a deep learning approach for enlarging an image, but I haven't done anything with pixel shader resizers beyond bicubic so far. The reason is that Lanczos-3 requires a large kernel and is theoretically non-separable (2D sinc-windowed sinc: k(r), with r the radial polar coordinate).

Bicubic methods use a (4x4) convolution kernel requiring 16 texture taps. A simple improvement is to separate the horizontal and vertical passes, bringing it down to 8 taps (4+4).

Lanczos-3 (kernel: 2a*2a, a=3) requires 36 taps, as demonstrated by your code (36 texture, 199 arithmetic).
Your code comes from xbmc (Kodi). Do you have a URL? It seems similar to https://github.com/xbmc/xbmc/blob/ma...ion-6x6_d3d.fx

However mpc-be has a 2-pass "compensated Lanczos3" implementation resizer_lanczos3_x.hlsl which seems to be 6+6 ? https://sourceforge.net/p/mpcbe/code...anczos3_x.hlsl
I'm assuming you care more about quality than performance, but it may still be worth checking out.

By Sinc method, do you mean SincResize from AviSynth?
"uses the truncated sinc function. It is very sharp, but prone to ringing artifacts. "

Once you have the correct kernel, implementation shouldn't be too challenging.
If you are trying to implement the avisynth method, validation of the shader would be straightforward enough as you just need to compare to the avisynth reference.


# Pixel shader resizers in video players
Video players can use pixel shaders to perform scaling/resizing (vs. CPU software scaling or fixed-function GPU hardware scaling). Mpv allows user-defined scaling shaders, enabling the use of high-quality upscalers (a discrete GPU is recommended for these shaders as they are not lightweight).
Mpc-hc/be does not allow user-defined resize shaders with the EVR-CP renderer: you must use one of the pre-defined resize methods. A pixel shader cannot change the frame resolution, but you can still test a scaling shader, by zooming in to the top left quarter screen for instance, ex: 1080p screen resolution and 1080p input video in fullscreen (no scaling by video player).
A resize shader will map the output coordinates (tex) to the corresponding input pixels to interpolate output pixel values. This is typically implemented using a convolution kernel (a more efficient 2-pass horizontal then vertical resizing approach is possible if the kernel is separable).
Note: resize operations should theoretically be performed in linear gamma.

Last edited by butterw2; 19th October 2020 at 17:47. Reason: +: Pixel shader resize in Video players, mpc-be repository link
Old 3rd October 2020, 16:14   #32  |  Link
Alexkral
Ok, thanks for your answer anyway.

Quote:
Originally Posted by butterw2 View Post
Your code comes from xbmc, that's from Kodi. Do you have an url ?
Yes, I adapted it from this: https://github.com/libretro/common-s...rs/lanczos6.cg

Quote:
Originally Posted by butterw2 View Post
However mpc-be seems to have a 2-pass "compensated Lanczos3" implementation which seems to be 6+6, look for Resizers\resizer_lanczos3_x.hlsl ?
I didn't know about this, I'll take a look at it.

Quote:
Originally Posted by butterw2 View Post
by Sinc method do you mean SincResize from Avisynth ?
"uses the truncated sinc function. It is very sharp, but prone to ringing artifacts. "
Yes, everything is explained here as well as the other resampling methods, but I find it too complicated, and as I told you I'm not even sure that it can be useful, maybe when I have more time I will try again. I was hoping that since the Sinc kernel is simpler than the Lanczos kernel, maybe it could be done by slightly modifying the Lanczos code, but without really knowing how it works it seems that it will not be possible.
Old 3rd October 2020, 19:00   #33  |  Link
butterw2
Quote:
Originally Posted by Alexkral
I was hoping that since the Sinc kernel is simpler than the Lanczos kernel, maybe it could be done by slightly modifying the Lanczos code, but without really knowing how it works it seems that it will not be possible.
It's not fundamentally very complicated, but this code implementation is maybe not the simplest. They use the .r, .g, .b components instead of [0], [1], [2], for instance.
Switching kernel functions should just work, but performance will not be dramatically improved.
For reference, a 2-pass Catmull-rom resizer only requires (8 texture, 44 arithmetic).


# Convolution Kernels
A convolution kernel k(x, y) is a table of weights of size X*Y (ex: 3x3, 5x5 or 7x7). In a pixel shader, a convolution kernel might be calculated from parameters using the kernel function (if symmetry is present, it can be used to reduce the number of calculations required) or if constant the kernel can be hardcoded. The kernel is typically normalized so that the sum of weights is equal to one.
To calculate each output pixel value, the corresponding X*Y input pixels values are sampled and multiplied by their corresponding kernel weight to finally be summed.
A 2D separable kernel can be written as kh(x)*kv(y)=k(x, y) with kh and kv 1D-kernels. This allows 2D processing to be performed as 2 successive 1D-shader passes resulting in much more efficient operation in the case of large kernels.
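As a small illustration of the above, here is a sketch of a hardcoded, normalized 3x3 kernel (a simple binomial blur that I chose for the example; the weights sum to 16). The get macro name is my own.

```hlsl
// 3x3 convolution sketch with a hardcoded kernel:
//   1 2 1
//   2 4 2   (sum = 16, so the result is divided by 16 to normalize)
//   1 2 1
sampler s0: register(s0);
float2 p1: register(c1);   // (1/width, 1/height): size of one pixel in tex coordinates

// sample the pixel at offset (dx, dy) from the current pixel
#define get(dx, dy) tex2D(s0, tex + p1*float2(dx, dy))

float4 main(float2 tex: TEXCOORD0): COLOR {
    float4 sum =
        1*get(-1,-1) + 2*get(0,-1) + 1*get(1,-1) +
        2*get(-1, 0) + 4*get(0, 0) + 2*get(1, 0) +
        1*get(-1, 1) + 2*get(0, 1) + 1*get(1, 1);
    return sum / 16.;
}
```

This particular kernel is separable: kh = kv = (1, 2, 1)/4, so the same result could be obtained with two 3-tap passes (6 taps instead of 9). The gain is small here, but it grows quickly with kernel size.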

Last edited by butterw2; 24th October 2020 at 22:05. Reason: grammar
Old 6th October 2020, 18:00   #34  |  Link
butterw2
# Split screen display with pixel/fragment shaders:
This simple code modification can be applied to any pixel shader, and is useful to adjust parameter values and visualize what a shader actually does...
ex: Edge detection from Edge Sharpen https://raw.githubusercontent.com/bu...tect_1080p.jpg

Code:
float4 main(float2 tex: TEXCOORD0): COLOR {
float4 color = tex2D(s0, tex); //source pixel
/* horizontal split screen, left-half: no effect, right-half: with effect */
if (tex.x<0.5) return color; 
...
}
It's of course possible to modify the code to display the same half-frame (with and without effect):
if (tex.x<0.5) return tex2D(s0, tex+float2(0.5, 0)); //Comparison split screen: left half shows the right half-frame with no effect
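Put together, a complete split-screen shader could look like the sketch below. The grayscale effect function and the Compare switch are placeholders of mine; substitute the body of the shader you actually want to inspect.

```hlsl
// Split-screen sketch. Compare=0: left half of the frame without effect.
// Compare=1: left half shows a copy of the right half-frame without effect.
#define Compare 0

sampler s0: register(s0);

// placeholder effect (grayscale), stands in for the shader being tested
float4 effect(float4 c) {
    float luma = dot(c.rgb, float3(0.2126, 0.7152, 0.0722));
    return float4(luma, luma, luma, c.a);
}

float4 main(float2 tex: TEXCOORD0): COLOR {
#if Compare == 1
    if (tex.x < 0.5) return tex2D(s0, tex + float2(0.5, 0));
#else
    if (tex.x < 0.5) return tex2D(s0, tex);
#endif
    return effect(tex2D(s0, tex));   // right half: with effect
}
```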

Note: This technique can be used pre or post-resize in mpc-hc/be (but post-resize, it only fully works in fullscreen).


It can also be applied to an mpv glsl .hook: https://github.com/butterw/bShaders/...Side.hook.glsl
vec4 hook() {
vec4 color=HOOKED_tex(HOOKED_pos);
if (HOOKED_pos.x<0.5) return color;
...
}

Vertical split screen: if (tex.y<0.5) return color; //top-half: no Effect


For a sharpening filter, enabling/disabling the shader might be more effective to grasp the visual difference.


It's also of course possible to take screenshots for comparisons.

Last edited by butterw2; 29th December 2020 at 12:28. Reason: +bSide.hook -screenshots
Old 8th October 2020, 18:52   #35  |  Link
butterw2
# Edge Detection

I've posted a couple of edge detection shaders based on convolution kernels at https://gist.github.com/butterw

- Edge_Sharpen.hlsl: mpc-hc Edge Sharpen optimization and luma edge detection mod (GPL v3 licensed)
Screenshot linked in preceding post.
Edge sharpening can help a source which is a bit soft without generating artifacts; it is one of the applications of edge detection.

On a clean high-res source, the following luma edge detection methods work very well on text, objects, hair, but maybe not so great on teeth.
- Frei-Chen edge detection in luma (mpv .hook shader)
can be used in combination with NoChroma.hook
Directly processing source luma seems to give better result (vs obtaining luma from rgb).



- Sobel/Canny Edge detection:
https://github.com/butterw/bShaders/bSobel_Edge.hlsl,
I've also made available an mpv .hook port of a glsl version of Sobel (in rgb).
Sobel is the most common algorithm for edge detection. It can be used stand-alone or as part of the Canny algorithm. The final stages of Canny are better implemented on cpu or compute shader however.
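For readers who just want to see the principle, here is a minimal Sobel sketch operating on luma derived from rgb. It is written for this post and is not the code of bSobel_Edge.hlsl; the lum macro and the Rec.709 luma weights are my own choices.

```hlsl
// Minimal Sobel edge detection sketch (8 texture taps, luma computed from rgb).
sampler s0: register(s0);
float2 p1: register(c1);   // (1/width, 1/height)

#define CoefLuma float3(0.2126, 0.7152, 0.0722)
// luma of the pixel at offset (dx, dy) from the current pixel
#define lum(dx, dy) dot(tex2D(s0, tex + p1*float2(dx, dy)).rgb, CoefLuma)

float4 main(float2 tex: TEXCOORD0): COLOR {
    float tl = lum(-1,-1), tc = lum(0,-1), tr = lum(1,-1);
    float ml = lum(-1, 0),                 mr = lum(1, 0);
    float bl = lum(-1, 1), bc = lum(0, 1), br = lum(1, 1);

    // horizontal and vertical Sobel gradients
    float gx = (tr + 2*mr + br) - (tl + 2*ml + bl);
    float gy = (bl + 2*bc + br) - (tl + 2*tc + tr);

    float g = saturate(length(float2(gx, gy)));   // gradient magnitude, clipped to [0, 1]
    return float4(g, g, g, 1);                    // grayscale edge map
}
```

The center pixel has a weight of zero in both Sobel kernels, which is why it is not sampled.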

Canny edge detection video filter in ffmpeg (cpu): https://ffmpeg.org/ffmpeg-filters.html#edgedetect
mpv --vf=lavfi="[edgedetect=low=0.1:high=0.4]" input.mp4

These filters are also available (apply a grayscale filter on output):
mpv --vf=sobel input.mp4
mpv --vf=prewitt input.mp4
mpv --vf=roberts input.mp4

Last edited by butterw2; 24th October 2020 at 22:03. Reason: +ffmpeg --vf
Old 8th October 2020, 20:04   #36  |  Link
Alexkral
Quote:
Originally Posted by butterw2 View Post
Switching kernel functions should just work
So I gave this another try (not knowing how it works yet). As I see it, the kernel is in the weight3 function, so I have changed it again like this:

Code:
float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = 2.0 * PI * (x - 1.5);
    float s2 = 2.0 * PI * (x - 0.5);
    float s3 = 2.0 * PI * (x + 0.5);
    float ret1 = (abs(s1) < 2.0 * radius) ? sin(s1) / s1 : 0;
    float ret2 = (abs(s2) < 2.0 * radius) ? sin(s2) / s2 : 0;
    float ret3 = (abs(s3) < 2.0 * radius) ? sin(s3) / s3 : 0;
    return float3(ret1, ret2, ret3);
}
I don't know why the samples are multiplied by two, but since it seems that it's needed, I also multiply the taps by two. This produces a VERY sharp result, even sharper than AviSynth's SincResize, so I think I'm on the right track. But if I debug the shader, I see 16 texture, 150 arithmetic, so I don't know what I should call it. Maybe it could be Sinc 2 taps? If I change 2.0 * radius to 7.0, the result is very similar and the code compiles to 36 texture, 209 arithmetic.

The metrics are not good but that doesn't matter too much to me, I'll use it to train some networks and then we'll see what the result is.
Old 8th October 2020, 22:39   #37  |  Link
butterw2
# Optimizing gpu arithmetic operations

Whether variable x is float or float4, mad (multiply addition) counts as a single operation on gpu, ex: x*x + y, 2*x +3*4.2 (1 arithmetic op)


Weight3 is the kernel function, yes, but the code uses vectorized float3 operations, which can make it more difficult to understand.
I don't know how poorly a naive (brute force) implementation performs. There doesn't seem to be other code available for a single pass lanczos-3 pixel shader.
If you want to try it out, I can upload the 2-pass lanczos-3 shader code from mpc-be (which I've adapted to run in mpc-hc). Texture fetches hit the gpu harder than arithmetic ops, but it is still useful to reduce arithmetic ops when possible.

You need to help out the compiler in some cases, check the compiler output to see if this saves one operation for instance:
float s1 = 2.0 * PI * (x - 1.5);
can be written as:
float s1 = 2 * PI * x - 2 * PI *1.5; //1 arithmetic

Instead of floats s1, s2, s3
you could use one vector operation to calculate float3 s
and then use s.x, s.y, s.z
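A possible vectorized form of the original Lanczos weight3 is sketched below. It should be equivalent to the scalar s1, s2, s3 version, but I have not compared the compiled instruction counts, so check the compiler output. The main function is only there to make the snippet a complete shader: it displays the absolute weights on screen for debugging.

```hlsl
static const float PI = 3.141592653;

// Lanczos-3 weights for three taps, computed with float3 vector operations.
float3 weight3(float x) {
    const float radius = 3.0;
    // same as s1, s2, s3: 2*PI*(x - 1.5), 2*PI*(x - 0.5), 2*PI*(x + 0.5)
    float3 s = 2.0 * PI * (x - float3(1.5, 0.5, -0.5));
    s = max(abs(s), 1e-5);   // avoid a division by zero at s = 0
    return sin(s) * sin(s / radius) / (s * s);
}

float4 main(float2 tex: TEXCOORD0): COLOR {
    // Debug only: show |weights| as r, g, b across the screen width
    float3 w = weight3(0.5 - tex.x * 0.5);
    return float4(abs(w), 1);
}
```

The individual weights are then available as s.x, s.y, s.z (or equivalently .r, .g, .b).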

# Cost of hlsl functions (in arithmetic operations)
Single op: A*B+C (mad), float dot(A, B), frac(A), saturate(A)
2: lerp(A, B, 0.5), floor(A), step(A, 0.5), clamp(A, 0.2, 0.8)
3: float length(A)
4: fmod(A, 2)
4,5: smoothstep(0, 1.0, A)
trunc(A)
---
ops>8: If multiple operations on float variables are needed, look at packing a vector.
sqrt(A)
sin(A)

Last edited by butterw2; 27th October 2020 at 22:30. Reason: +arithmetic ops cost of functions
Old 8th October 2020, 23:55   #38  |  Link
Alexkral
Quote:
Originally Posted by butterw2 View Post
If you want to try it out, I can upload the 2-pass lanczos-3 shader code from mpc-be (which I've adapted to run in mpc-hc).
Yes please, that could be helpful, most of what I've learned has been just looking at the code.

Quote:
Originally Posted by butterw2 View Post
Instead of floats s1, s2, s3
you could use one vector operation to calculate float3 s
and then use s.x, s.y, s.z
I needed to change that from the original code (the link I posted) because the max function gave wrong results, but you are right, now it can be done this way.
Old 9th October 2020, 00:58   #39  |  Link
butterw2
Compensated Lanczos3 pass X (from mpc-be, adapted for mpc-hc user shader with 2x magnification):
https://gist.github.com/butterw/8190...6bb6f188b9222b
You'll also need the second pass (pass-Y), which is the same as pass-X, but applied to y-axis.

In hlsl, texel centers (tex) are situated at half-pixel coordinates. If you don't need 2x magnification, don't start from my adaptation, start from the original code (just define dxdy as required, I prefer using p (or p1) for this, the pixel widths are then p.x, p.y).

float4 p2: register(c2); //in this application, c2 contains: (width, height, 1/width, 1/height)
#define p p2.zw //pixel widths (p.x, p.y)
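To make the half-pixel convention concrete, here is a sketch of a 2x nearest-neighbour magnification of the top-left quarter of the frame that snaps each sample to a texel center. It uses the mpc-hc constants (c0 and c1) rather than the c2 layout above, and it is only a test pattern of mine, not part of the Lanczos code.

```hlsl
// 2x nearest-neighbour zoom of the top-left quarter (sketch for mpc-hc).
sampler s0: register(s0);
float4 p0: register(c0);   // mpc-hc: (width, height, counter, clock)
float2 p1: register(c1);   // mpc-hc: (1/width, 1/height)

float4 main(float2 tex: TEXCOORD0): COLOR {
    float2 src = tex * 0.5;                 // map the output to the top-left quarter
    float2 pix = floor(src * p0.xy);        // integer index of the source pixel
    float2 center = (pix + 0.5) * p1;       // texel center lies at half-pixel coordinates
    return tex2D(s0, center);
}
```

Sampling exactly at the texel center returns the unblended source pixel even if the sampler uses bilinear filtering.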




# Testing a shader with a png image input
A good way to test a video player shader is to use a png image as input. Output will update automatically when you modify/save the shader and an output screenshot can be saved for comparison if required.
Input: RGB(A) png (preferably of same resolution as the output screen), shader (preferably pre-resize shader), output (fullscreen preferred for screenshot in windows)
The idea is to bypass compression artifacts (png is lossless), yuv2rgb conversion/range expansion, video player scaling, so as to only test the pixel shader.

Last edited by butterw2; 19th October 2020 at 17:40. Reason: clarity
Old 9th October 2020, 02:29   #40  |  Link
Alexkral
Thanks, I'll have some time in the next few days to take a look at all this, hopefully soon I will understand a little bit more.
Tags
hlsl, mpc-be, mpc-hc, mpv, pixel shaders
