Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
25th June 2023, 00:21 | #1
Registered User
Join Date: Apr 2018
Posts: 51
AVS Softlight
A CUDA implementation of the softlight blend with the negative average.
The plugin is x64 only (built with CUDA Toolkit 12.3 and 11.8). You may have seen YouTube videos about removing a color cast with Photoshop's softlight blend of the negative average. This is a CUDA implementation of it that processes every frame.
Input: YUV420, YUV444, RGB32 (BGR, AviSynth only).
There are different modes, selected with Softlight(mode). Mode 0 is the default. I'll explain mode 0 in detail. Steps done in mode 0:
1. YUV -> RGB conversion.
2. Calculate the sum of all pixels in each of the R, G and B planes.
3. Get the average from each sum (sum / number of pixels).
4. Get the negative of each average (255 - average).
5. Softlight-blend each plane with its negative from step 4.
After step 5 we have the same result as Photoshop. But the brightness of the frame will have changed. To keep the brightness intact we need to restore it to the original, which is what the remaining step does:
6. Get HSV planes: the V plane from the original image (RGB -> V) and the H and S planes from the softlight result. Then we do HS (changed) + V (original) -> RGB -> YUV.
So mode 0 neutralizes only the colors (hue and saturation) of the frame, not the brightness (value).
Also keep in mind that you need to crop black bars from the video (if there are any) for correct processing, otherwise they will affect the average.
Mode 1: Same as mode 0, but planes S and V are restored to their original values. So this mode only normalizes lightness/brightness and does not change colors.
Mode 2: Same as mode 0, but plane S is also boosted (softlight of each pixel with itself).
Mode 3: Same as mode 0 but without brightness restoration. Use it if you also want to average the brightness (makes dark frames brighter).
Mode 4: Same as mode 3, but each of the RGB planes is also boosted using softlight.
Mode 5: YUV -> RGB -> softlight each RGB plane with itself -> YUV (color/contrast boost).
Mode 6: YUV -> RGB -> HSV -> boost S -> RGB -> YUV.
Mode 8: TV to PC color range conversion (use it on videos where you see no true black, only grays).
Mode 10: Grayscale. For RGB32 this mode uses an RGB -> YUV444 -> RGB CUDA conversion.
U & V planes are set to 128. For YUV, just the U & V planes are set to 128, without CUDA.
You can use 3 different softlight formulas with formula = 0, 1 or 2:
0 - pegtop
1 - illusions.hu
2 - W3C
In my opinion the pegtop formula is the best. Also, mode 1 and mode 3 are my favourites. The Photoshop formula was removed because of its discontinuity of local contrast. The formulas are explained here: https://en.wikipedia.org/wiki/Blend_modes
Usage in AviSynth: Softlight() is the same as SoftLight(0,0), which is the same as SoftLight(mode=0,formula=0)
Usage in VapourSynth: video = core.Argaricolm.Softlight(video) or core.Argaricolm.Softlight(video,mode=0,formula=0)
Download at GitHub.
If you apply several modes one after another, it's better to convert the color space to RGB first. You can use the ConvertToRGBNV and ConvertToYUVNV functions from my imagesourcenv plugin, like this:
Code:
ConvertToRGBNV()
Softlight(8)
Softlight(3)
ConvertToYUVNV()
This way the color space will not jump back and forth like YUV -> RGB -> YUV -> RGB -> YUV.
Last edited by Argaricolm; 24th April 2024 at 02:51. Reason: New version
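The mode 0 steps above can be sketched on the CPU roughly as follows. This is a minimal C++ sketch, not the plugin's CUDA code: it assumes interleaved 8-bit RGB (3 bytes per pixel) with black bars already cropped, and the names softlight and softlight_mode0 are hypothetical. The formula numbering (0 = pegtop, 1 = illusions.hu, 2 = W3C) follows the post, with the formulas as listed on the Wikipedia Blend modes page. Step 6 is done here by scaling each blended pixel by V(original) / V(blended) instead of an explicit HSV round trip; that should be equivalent, because multiplying R, G and B by a common factor leaves H and S unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Softlight on normalized values: a = base pixel, b = blend layer, both in [0,1].
// Formula indices follow the post: 0 = pegtop, 1 = illusions.hu, 2 = W3C.
inline float softlight(float a, float b, int formula) {
    switch (formula) {
    case 1:  // illusions.hu: a^(2^(2*(0.5-b)))
        return std::pow(a, std::pow(2.0f, 2.0f * (0.5f - b)));
    case 2: {  // W3C compositing spec
        if (b <= 0.5f) return a - (1.0f - 2.0f * b) * a * (1.0f - a);
        const float d = (a <= 0.25f) ? ((16.0f * a - 12.0f) * a + 4.0f) * a : std::sqrt(a);
        return a + (2.0f * b - 1.0f) * (d - a);
    }
    default:  // pegtop: (1-2b)a^2 + 2ba
        return (1.0f - 2.0f * b) * a * a + 2.0f * b * a;
    }
}

// Hypothetical CPU version of mode 0 on interleaved 8-bit RGB (steps 2 to 6).
void softlight_mode0(std::vector<uint8_t>& rgb, int formula = 0) {
    assert(!rgb.empty() && rgb.size() % 3 == 0);
    const std::size_t n = rgb.size() / 3;
    // Step 2: per-channel sums (64-bit, so large frames cannot overflow).
    uint64_t sum[3] = {0, 0, 0};
    for (std::size_t i = 0; i < n; ++i)
        for (int c = 0; c < 3; ++c) sum[c] += rgb[3 * i + c];
    // Steps 3-4: average, then its negative (255 - average), normalized to [0,1].
    float neg[3];
    for (int c = 0; c < 3; ++c)
        neg[c] = (255.0f - static_cast<float>(sum[c]) / static_cast<float>(n)) / 255.0f;
    for (std::size_t i = 0; i < n; ++i) {
        uint8_t* p = &rgb[3 * i];
        // Step 5: blend each channel with that channel's negative average.
        float out[3];
        for (int c = 0; c < 3; ++c) out[c] = softlight(p[c] / 255.0f, neg[c], formula);
        // Step 6: keep H and S of the blended pixel, restore V = max(R,G,B) of the original.
        const float vOrig = std::max({p[0], p[1], p[2]}) / 255.0f;
        const float vNew = std::max({out[0], out[1], out[2]});
        for (int c = 0; c < 3; ++c) {
            // A black blended pixel has S = 0, so it becomes gray at the original V.
            float v = (vNew > 0.0f) ? out[c] * (vOrig / vNew) : vOrig;
            v = std::min(std::max(v, 0.0f), 1.0f);
            p[c] = static_cast<uint8_t>(std::lround(v * 255.0f));
        }
    }
}
```

In a quick check, a uniformly gray frame comes back unchanged, and for two reddish pixels {200,100,100} and {180,90,80} the R values stay at 200 and 180 (each pixel's max is preserved) while G and B move up, which is the color-cast removal the post describes.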
25th June 2023, 10:29 | #2
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,040
Would not do any harm to post a few before/after example images for the modes.
Postimages.org allows you to embed images in your post without needing a Postimages.org account (and you don't need to wait for mod approval). Postimages.org:- https://postimages.org/ Use the "thumbnail" or "image" forum modes. [copies the URL to the clipboard, just paste it in your post] EDIT: If you do post images, I'll try to remember to delete this post.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 25th June 2023 at 10:32.
25th June 2023, 14:12 | #3
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,547
@StainlessS: here's an example: https://imgsli.com/MTg4MTEz
Script used:
Code:
ClearAutoloadDirs()
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)
LoadPlugin("F:\Hybrid\64bit\Avisynth\avisynthPlugins\LSMASHSource.dll")
Import("F:\Hybrid\64bit\Avisynth\avisynthPlugins\mtmodes.avsi")
LoadPlugin("c:\Users\Selur\Desktop\Softlight.dll")
# loading source: G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4
# color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
LWLibavVideoSource("G:\TestClips&Co\files\MPEG-4 H.264\Canon 5D RAW.mp4",cache=false,format="YUV420P8", prefer_hw=0)
org=last
Softlight(mode=X)
Interleave(org.Subtitle("Original"), last.Subtitle("Softlight(mode=X)"))
# current resolution: 1920x1080
PreFetch(16)
# output: color sampling YV12@8, matrix: bt709, scantyp: progressive, luminance scale: limited
return last
Any plans to also allow RGB input and high bit depth support?
Cu Selur
Last edited by Selur; 25th June 2023 at 14:21.
25th June 2023, 19:08 | #4
Registered User
Join Date: Apr 2018
Posts: 51
Quote:
VapourSynth: I've never compiled for it, but if it's really needed I can do it. High bit depth: I'm not sure; it's possible if I am able to adapt the softlight code for it.
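On high bit depth: assuming the blend is done on normalized values, the softlight step itself would mainly need the peak code value to change (2^bits - 1 instead of 255), plus a wider accumulator for the average. A minimal C++ sketch for a single plane, assuming 8 to 16-bit samples stored in uint16_t and the pegtop formula; softlight_negavg_plane is a hypothetical name, not part of the plugin.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: pegtop softlight of one plane with the plane's own negative
// average, for 8..16-bit samples stored in uint16_t. Only 'peak' depends on bit depth.
void softlight_negavg_plane(std::vector<uint16_t>& plane, int bits) {
    assert(bits >= 8 && bits <= 16 && !plane.empty());
    const double peak = static_cast<double>((1u << bits) - 1);  // 255, 1023, 4095, ..., 65535
    uint64_t sum = 0;  // 64-bit: a 16-bit UHD plane would overflow a 32-bit sum
    for (uint16_t s : plane) sum += s;
    const double avg = static_cast<double>(sum) / static_cast<double>(plane.size());
    const double b = (peak - avg) / peak;  // normalized negative of the average
    for (uint16_t& s : plane) {
        const double a = s / peak;                                 // normalized sample
        const double r = (1.0 - 2.0 * b) * a * a + 2.0 * b * a;   // pegtop, stays in [0,1]
        s = static_cast<uint16_t>(r * peak + 0.5);                 // round back to a code value
    }
}
```

With 10-bit input, a dark plane such as {100, 200} gets pulled up, while code values 0 and 1023 map to themselves, since pegtop fixes both endpoints for any blend value.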
25th June 2023, 18:33 | #5
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,040
Cheers Selur, nice comparison method.
25th June 2023, 19:36 | #6
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,547
More supported color spaces are always better, since they give more freedom.
VapourSynth would be great, since I mainly use VapourSynth. https://forum.doom9.org/showthread.php?t=182961 might help with supporting both AviSynth and VapourSynth. Cu Selur
17th March 2024, 07:08 | #11
Registered User
Join Date: Jul 2018
Posts: 1,218
There are several possible ways to sum all samples of a frame with SIMD:
1. Sum in integer. This requires unpacking 8..16-bit samples to 32 bits and summing with full-width SIMD (times the superscalar factor of the add dispatch ports), first over all samples of a line. Because it looks like a 32-bit integer cannot hold the sum of a UHD frame's sample count times sample values of 256..65535 without overflow, it is possible to divide (normalize) the intermediate per-line sums and accumulate the normalized sums of all lines of the frame. This is more complex to program than float32 processing, but may be visibly faster for 8-bit SD and some HD frame sizes.
2. Unpack, convert to float32 and perform all of 1 in the float32 domain.
So the best-performing implementation could have different processing engines inside for different frame sizes. At least 1920x1080 at 8 bits can still be processed with integer full-frame summing without overflowing a 32-bit accumulator. Also, with SIMD-word summing the programmer has partial sums in the final SIMD word anyway, ready for partial normalizing with some extra overflow protection (an AVX2 SIMD word of 8 32-bit integers provides an additional 3 bits of headroom, so the total capacity is 32+3 = 35 bits) and without significant precision loss. Method 2 can process any frame size with a single engine, but is expected to be slower at non-UHD frame sizes. CPU SIMD is not very slow, but the algorithm requires at least 2 full-frame passes (first an analysis pass for the sum, second a correction pass for the adjustment), so performance will depend on the frame size fitting in the available CPU caches (our lovely Xeon MAX with onboard HBM would be a nice performer here).
Last edited by DTL; 17th March 2024 at 07:12.
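The per-line normalization idea from method 1, plus a worst-case overflow check, can be sketched in scalar C++ (a SIMD version would vectorize the inner loop). This is a sketch under assumptions: samples of up to 16 bits stored in uint16_t, pitch given in samples rather than bytes, and fits_u32 / frame_mean_per_line are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Worst case: can width * height * maxSample fit in an unsigned 32-bit accumulator?
constexpr bool fits_u32(uint64_t width, uint64_t height, int bits) {
    return width * height * ((uint64_t{1} << bits) - 1) <= std::numeric_limits<uint32_t>::max();
}

// Method 1 with per-line normalization: sum each line in 32 bits, divide by the
// width, accumulate the line means in double, divide by the height at the end.
// Every line has the same width, so the mean of the line means equals the frame mean.
double frame_mean_per_line(const uint16_t* src, int width, int height, std::ptrdiff_t pitch) {
    assert(width > 0 && height > 0 && pitch >= width);
    assert(fits_u32(static_cast<uint64_t>(width), 1, 16));  // even a 16-bit line cannot overflow
    double acc = 0.0;
    for (int y = 0; y < height; ++y) {
        const uint16_t* row = src + y * pitch;  // pitch is in samples, not bytes
        uint32_t lineSum = 0;
        for (int x = 0; x < width; ++x) lineSum += row[x];
        acc += static_cast<double>(lineSum) / width;
    }
    return acc / height;
}
```

By the worst-case bound, 8-bit frames fit in an unsigned 32-bit accumulator even at 3840x2160 (8,294,400 * 255 = 2,115,072,000 < 2^32), 10-bit fits at 1920x1080 (2,121,292,800) but not at 3840x2160, and 16-bit 1920x1080 does not fit. The per-line route only needs width * maxSample to fit, which holds for widths up to 65,537 even at 16 bits.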
3rd July 2023, 23:27 | #14
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,040
Quote:
Code:
double __cdecl PVF_AverageLuma_Planar(const PVideoFrame &src,const int xx,const int yy,const int ww,const int hh,const bool altscan) {
    const int ystep = (altscan) ? 2:1;
    const int pitch = src->GetPitch(PLANAR_Y);
    const int ystride = pitch*ystep;
    const BYTE *srcp = src->GetReadPtr(PLANAR_Y) + (yy * pitch) + xx;
    __int64 acc = 0;
    unsigned int sum = 0;
    const int yhit = (altscan) ? (hh +1)>>1 : hh;
    const unsigned int Pixels = (ww * yhit);
    if(ww == 1) { // Special case for single pixel width
        for(int y=yhit ; --y>=0;) {
            sum += srcp[0];
            srcp+= ystride;
        }
    } else {
        const int eodd = (ww & 0x0F);   // leftover pixels when width is not a multiple of 16
        const int wm16 = ww - eodd;     // width rounded down to a multiple of 16
        for(int y=yhit; --y>=0 ;) {
            switch(eodd) {              // cases intentionally fall through
                case 15: sum += srcp[wm16+14];
                case 14: sum += srcp[wm16+13];
                case 13: sum += srcp[wm16+12];
                case 12: sum += srcp[wm16+11];
                case 11: sum += srcp[wm16+10];
                case 10: sum += srcp[wm16+9];
                case 9: sum += srcp[wm16+8];
                case 8: sum += srcp[wm16+7];
                case 7: sum += srcp[wm16+6];
                case 6: sum += srcp[wm16+5];
                case 5: sum += srcp[wm16+4];
                case 4: sum += srcp[wm16+3];
                case 3: sum += srcp[wm16+2];
                case 2: sum += srcp[wm16+1];
                case 1: sum += srcp[wm16+0];
                case 0: ;
            }
            for(int x=wm16; (x-=16)>=0 ; ) {
                sum += ( srcp[x+15] + srcp[x+14] + srcp[x+13] + srcp[x+12] + srcp[x+11] + srcp[x+10] + srcp[x+ 9] + srcp[x+ 8]
                       + srcp[x+ 7] + srcp[x+ 6] + srcp[x+ 5] + srcp[x+ 4] + srcp[x+ 3] + srcp[x+ 2] + srcp[x+ 1] + srcp[x+ 0] );
            }
            if(sum & 0x80000000) {acc += sum;sum=0;} // avoid possibility of overflow
            srcp += ystride;
        }
    }
    acc += sum;
    double dacc = double(acc);
    return dacc / Pixels;
}
Not that slow really. Similar method for other colorspaces in the "PVF_ ... " files.
EDIT: The switch stuff only accounts for the ww pixel width and does not take srcp memory alignment into account, so it could be improved to make better use of compiler vectorization {probably requiring an additional switch thingy for the end cases}. If always full frame {no coords}, then some shortcuts could be taken.
{AviSynth+ frames' LHS is always aligned; not so for standard AviSynth 'in place' cropping}
EDIT: Might be handy [from here:- https://forum.doom9.org/showthread.p...61#post1935661 ]
Code:
Function PitchTortureTest(clip c) {
    # IanB:- https://forum.doom9.org/showthread.php?p=1628159#post1628159
    c
    A=SelectEvery(4, 0)
    B=SelectEvery(4, 1).AddBorders(0,0,8,0).Crop(0,0,-8,0)
    C=SelectEvery(4, 2).AddBorders(0,0,16,0).Crop(0,0,-16,0)
    D=SelectEvery(4, 3).AddBorders(2,0,22,0).Crop(2,0,-22,0)
    Interleave(A,B,C,D)
}
EDIT: from OP,
Quote:
From the posted link for the IanB thingy thread, here:- https://forum.doom9.org/showthread.p...16#post1935616
Quote:
Code:
if(invert) { // invert ?
    if(tvy) { // tv levels invert ? [ TV levels center is 125.5 not 127.5, ie (16 + 235)/2 ]
        ave = int(-(ave_D - 125.5) + 125.5 + 0.5); // TV_YUV Y mid = 125.5, invert, and Round
    } else {
        ave = int(ave_D + 0.5) ^ 0xFF; // PC_YUV Y mid = 127.5, symmetrical about 127.5 [EDIT: ADDED, or ave = 255 - int(ave_D + 0.5)]
    }
    aveU ^= 0xFF ; aveV ^= 0xFF;
} else {
Quote:
Last edited by StainlessS; 4th July 2023 at 20:12.
4th July 2023, 03:45 | #15
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,040
Further to above,
Code:
W = 3 * 1280 # 3840
H = 3 * 720 # 2160
STATICFRAMES = False
SECONDS = 5 * 60
###
FRAMES = Round(29.97 * SECONDS) # SECONDS seconds @ 29.97 FPS.
# STATICFRAMES: if set to false, generate all frames. Default true (one static frame is served)
Colorbars(Width=W,Height=H,pixel_type="YV12",staticframes=STATICFRAMES).Trim(0,-FRAMES)
# Comment one of below out
Return Scriptclip("AverageLuma() return last") # Avs+ builtin. Always full frame.
#Return Scriptclip("RT_AverageLuma() return last") # RT_Stats, RT_AverageLuma. Used to be faster than AVS 2.60 Standard.
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 554.7 | 890.6 | 843.1
Process memory usage (max): 104 MiB
Thread count: 16
CPU usage (average): 7.9%
Time (elapsed): 00:00:10.664
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 220.2 | 382.7 | 329.0
Process memory usage (max): 104 MiB
Thread count: 14
CPU usage (average): 8.0%
Time (elapsed): 00:00:27.328
4K : AVS+ AverageLuma : STATICFRAMES = TRUE
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 2335 | 3244 | 3082
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 6.9%
Time (elapsed): 00:00:02.917
Code:
c:\Z>avsmeter64 test.avs
AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3996, master, x86_64) (3.7.3.0)
Number of frames: 8991
Length (hh:mm:ss.ms): 00:05:00.000
Frame width: 3840
Frame height: 2160
Framerate: 29.970 (30000/1001)
Colorspace: YV12
Audio channels: 2
Audio bits/sample: 32 (Float)
Audio sample rate: 48000
Audio samples: 14399985
Frames processed: 8991 (0 - 8990)
FPS (min | max | average): 511.8 | 859.5 | 801.5
Process memory usage (max): 81 MiB
Thread count: 16
CPU usage (average): 7.9%
Time (elapsed): 00:00:11.218
Suggest you steal some of his code [I will not tell him]. Also note that ScriptClip would slow it down quite a bit compared with GetFrame() in a plugin. We did not assign AverageLuma to a variable in ScriptClip, as that would likely greatly affect the results. However, the FPS for RT_AverageLuma ain't so very bad for 4K, and you would likely want a pure C version anyway.
EDIT: Numbers above are on an i7-8700. [No Prefetch {also ScriptClip is single-threaded}, so I assume fully single-core numbers]
EDIT: Yep, the resource meter seems to show a single core in use.
Last edited by StainlessS; 4th July 2023 at 04:12.