Old 30th May 2022, 22:24   #121  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
Quote:
Originally Posted by ChaosKing View Post
Maybe because some bugs were fixed!?
Good point. Maybe. I'd like DTL to comment on it in more detail.


One more thing: the output encoded from the script in post #117 is 510.140 KB, while the output from the script in post #118 is 510.021 KB. Why is that? Shouldn't they both do the exact same thing? Encoded with
Code:
ffmpeg -y -benchmark -i 01.avs -c:v ffv1 TEST.mkv

Last edited by takla; 30th May 2022 at 22:47.
Old 31st May 2022, 00:28   #122  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
Quote:
Originally Posted by takla View Post
@DTL fixed it like this:

Code:
Super = Input.ConvertBits(16).MSuper(Pel=Pel, Chroma=Chroma)
Multi_Vector = Super.ConvertBits(8).MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)
I think it should not work at all. The 'super' clip from MSuper is only semi-compatible with AVS, so it must be fed directly to the other MVTools filters. If you try to convert a 16-bit super clip to 8 bit, it may be damaged.
The MSuper and MAnalyse outputs are semi-clips; I think they cannot be processed by any other filters (as of today) and are designed to be used only as inputs for other MVTools filters. Only the MSuper output can be output as AVS data and rendered as viewable images. But I do not think that means it can simply be 'converted' from 16 bit to 8 bit.

The same applies to the question "the script from post #117 has 510.140 KB while the script from post #118 has 510.021 KB. Why is that? Shouldn't they both do the exact same thing?". Yes, I think an 8-bit 'super' converted from a 16-bit 'super' is not the same as a 'super' clip created by MSuper() from a 'standard AVS clip' that was downconverted from 16 to 8 bit.

I hope the syntax
Code:
Super8 = Input.ConvertBits(8).MSuper(Pel=Pel, Chroma=Chroma)
Multi_Vector = Super8.MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)
is automatically compatible with any input bit depth, because ConvertBits(8) passes 8-bit formats through and downconverts anything above 8 bit to 8. So MAnalyse will always receive an 8-bit 'super8' semi-clip created by MSuper() from 8-bit input.

It looks like
Code:
Multi_Vector = Input.ConvertBits(8).MSuper(Pel=Pel, Chroma=Chroma).MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)
also works.

Last edited by DTL; 31st May 2022 at 01:01.
Old 31st May 2022, 01:10   #123  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
Quote:
Originally Posted by takla View Post
@DTL new issues
mvtools2 from pinterf (MvTools2 2.7.45) and your newest release are no longer bit identical with the same settings (post 117)
And is it the same with the workaround from post 118?

Also about >8 bit: it is useful only if the output of the denoising is also used at the increased bit depth, like 8-bit input and 10-bit or more output to an appropriate codec with >8-bit support. Increasing the bit depth from 8 to 16 before MDegrain and converting back to 8 after MDegrain probably does nothing useful and only wastes RAM and speed. MDegrain can take 8-bit input natively and output 16 bit if required: simply set 'out16=true'. I currently do not use HEVC with 10 bit for my work, so I do not use this feature.
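
To illustrate the bit depth point with a toy model in Python (this is not MDegrain's actual weighting; the helper names and the plain average are assumptions made only for the example): averaging several 8-bit samples gives a fractional result, which a 16-bit output can keep but an 8-bit output has to round away.

```python
def temporal_mean(samples_8bit):
    """Plain average of co-located 8-bit samples (toy stand-in for a temporal denoiser)."""
    return sum(samples_8bit) / len(samples_8bit)


def to_16bit(value_8bit_scale):
    # Scale an 8-bit-range value to the 16-bit range (x256) and round to an integer code.
    return int(round(value_8bit_scale * 256))


def to_8bit(value_8bit_scale):
    # Round to the nearest 8-bit integer code.
    return int(round(value_8bit_scale))


samples = [100, 101, 101, 101]        # same pixel in four neighbouring frames
mean = temporal_mean(samples)         # fractional value between two 8-bit codes
out16 = to_16bit(mean)                # 16-bit output keeps the fraction
out8 = to_8bit(mean)                  # 8-bit output rounds it away
back_to_8 = int(round(out16 / 256))   # 16-bit result converted back to 8 bit
print(out16, out8, back_to_8)
```

In this model the value converted back to 8 bit equals the direct 8-bit result, so the extra precision only pays off if the higher bit depth is kept all the way to the encoder.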
Old 31st May 2022, 02:04   #124  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
Quote:
Originally Posted by DTL View Post
And is it the same with the workaround from post 118?
Yes. I just double checked:

Note: I used ProRes before, but this here is FFV1.

Code:
2.7.46-a.10
510.021 KB

2.7.46-a.05
513.247 KB

2.7.45
513.247 KB
Here are the exact settings:

Code:
LWLibavVideoSource("C:\Users\Admin\Documents\01.mkv")
Trim(0, 600)
EZdenoise(HBD=true)
ConvertBits(10, dither=1)
Prefetch(12, 48)
Code:
function EZdenoise(clip Input, int "thSAD", int "thSADC", int "TR", int "BLKSize", int "Overlap", int "Pel", bool "Chroma", bool "HBD")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
TR = default(TR, 3)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, 4)
Pel = default(Pel, 1)
Chroma = default(Chroma, false)
HBD = default(HBD, false)

Super = Input.MSuper(Pel=Pel, Chroma=Chroma)
Super8 = Input.ConvertBits(8).MSuper(Pel=Pel, Chroma=Chroma)
Multi_Vector = Super8.MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=thSAD/2, thSADC=thSADC, thSADC2=thSADC/2, out16=HBD)
}
Code:
ffmpeg -y -benchmark -i 01.avs -c:v ffv1 EZ.mkv
And thanks for the explanation of MSuper & MAnalyse. It makes sense that they do not care about bit depth, except for compatibility.

Last edited by takla; 31st May 2022 at 03:49.
Old 31st May 2022, 12:06   #125  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
Here is a test build (updated):

It has optimizations disabled for the 16-bit (>8-bit) SAD functions. Currently this is the only quick way to avoid the crash in the release build. The debug build works but is too slow and cannot show the source of the crash. Processing may be slower. I hope the speed of other (8-bit) MAnalyse processing is not affected. It is a quick but not perfect fix. As a bonus it also has 2 C++ builds, for SSE2 and AVX2 CPUs. The AVX2 build may be a bit faster on an AVX2 CPU.

If testing goes acceptably I will put the next pre-release on GitHub.

About the not bit-exact results compared with the old 2.7.45: it may be that one of the many small changes to MDegrainN finally makes it work a bit differently from the old release. I will try to look later.

Update: https://drive.google.com/file/d/1lkm...ew?usp=sharing

It is even stranger: the crash happens only with the SSE2-targeted build, inside an SSE2 intrinsic-based function. The AVX2-targeted build looks to be working with all optimizations enabled. It may be some complex bug inside mvtools or in the currently used version of the MSVS compiler. So in the updated archive link the SSE2 build has optimizations for >8-bit SAD disabled and the AVX2 build is 'normal'.

"thSAD2=thSAD/2"

I do not think it is a good internal default. Typically thSAD should be just slightly above the noise level. So by setting thSAD2 too low by default you either lose many useful frames in the tr scope, or force the user to raise the tr value too high to get more neighbour frames near the current frame covered with a high enough thSAD (and that makes processing slower). If the user raises thSAD high enough to keep thSAD2 from being too small, it may cause additional blurring or loss of detail. I typically set thSAD2 manually just a little below thSAD. In float math it is about thSAD2=0.9*thSAD. I have not tested how AVS+ scripting handles that; ToInt(0.9*thSAD) or something similar may be required if MDegrain does not accept a float value. Or let the user enter thSAD2 and thSADC2 too. I think the feature that allows a lower thSAD at the edges of the tr range is mostly meant to reduce artifacts if they happen. If they do not, the best setting for denoising and speed at the current tr value is thSAD2 close to thSAD.

Last edited by DTL; 31st May 2022 at 13:48.
Old 31st May 2022, 21:22   #126  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
What do you want me to test with the test build exactly? Here are the encoding times:

Code:
AVX2
time=15.412s
SSE2
time=40.264s
Unfortunately thSAD2=0.9*thSAD is not a valid parameter. I changed thSAD2 to 135 manually, which is the same value. It made the encoding faster by 0.5 seconds on a 25-second clip (not applied in the encodings measured above), which is nice.

Code:
function EZdenoise(clip Input, int "thSAD", int "thSAD2", int "thSADC", int "thSADC2", int "TR", int "BLKSize", int "Overlap", int "Pel", bool "Chroma", bool "HBD")
{
thSAD = default(thSAD, 150)
thSAD2 = default(thSAD2, 135)
thSADC = default(thSADC, thSAD)
thSADC2 = default(thSADC2, thSAD2)
TR = default(TR, 3)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, 4)
Pel = default(Pel, 1)
Chroma = default(Chroma, false)
HBD = default(HBD, false)

Super = Input.MSuper(Pel=Pel, Chroma=Chroma)
Super8 = Input.ConvertBits(8).MSuper(Pel=Pel, Chroma=Chroma)
Multi_Vector = Super8.MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=thSAD2, thSADC=thSADC, thSADC2=thSADC2, out16=HBD)
}
Old 31st May 2022, 23:00   #127  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
"What do you want me to test with the test build exactly?"

Whether it crashes on your system with your 16-bit (>8-bit) clips (frame size, colour format, etc.).

"AVX2
time=15.412s
SSE2
time=40.264s"

Oh, that is a big difference. The non-optimized SSE2 build is really very slow. I thought it would be only a bit slower because it uses SSE2 intrinsics internally. I hope not many users nowadays run >8-bit MAnalyse on SSE2-only CPUs. Finding what is wrong with the 'normally optimized' SSE2 builds may take an unknown amount of time.

"Unfortunately thSAD2=0.9*thSAD is not a valid parameter. "

I think it is more comfortable for the user to enter a short 'far end thSAD multiplier' as a script param, which can be applied equally to both thSAD2 and thSADC2. Something like

Code:
function EZdenoise(clip Input, int "thSAD", float "far_thSAD_mul", int "TR", int "thSADC", int "BLKSize", int "Overlap", int "Pel", bool "Chroma", bool "HBD")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
far_thSAD_mul = default(far_thSAD_mul, 0.9)
thSAD2 = Int(thSAD * far_thSAD_mul)
thSADC2 = Int(thSADC * far_thSAD_mul)
In real use it is easy to call, like EZdenoise(200, 0.8, 10), instead of setting 4 threshold params in some fixed ratio every time the 'base thSAD' needs adjusting.
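
The multiplier arithmetic can be sanity-checked outside AviSynth. A small Python sketch, assuming that the AVS+ Int() function truncates toward zero like Python's int(); the function name far_thresholds is made up for the illustration:

```python
def far_thresholds(thsad, thsadc=None, far_mul=0.9):
    """Derive (thSAD2, thSADC2) from the base thresholds with one multiplier."""
    if thsadc is None:
        thsadc = thsad  # same default as in the script: thSADC follows thSAD
    # int() truncates toward zero, so both results are integers,
    # which is what an int parameter such as thSAD2 expects.
    return int(thsad * far_mul), int(thsadc * far_mul)


print(far_thresholds(150))               # script defaults
print(far_thresholds(200, far_mul=0.8))  # like EZdenoise(200, 0.8, 10)
```

Because the result is truncated, multipliers that do not land on a whole number give the next lower integer threshold.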

The name far_thSAD_mul is not short and nice - maybe something shorter is possible.

Also HBD is not clear about the 'internal conversion' of 8-bit input to 16-bit output. Maybe 'toHBD' is a better name.

Last edited by DTL; 31st May 2022 at 23:05.
Old 1st June 2022, 00:11   #128  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
Thanks. It no longer crashes.
And I take the 0.5 seconds speed gain back. It was just modern CPU boosting that caused the variance...
On HBD: I only added it because of the crash with my usual settings. But since that is fixed now I'll remove it again.
Also I never wanted to expose the thSAD2 params anyway, for the same reason you mentioned.

And thank you for showing me how to add the falloff multiplier.

This is what I'm using now:

Code:
LWLibavVideoSource("C:\Users\Admin\Documents\01.mkv")
Trim(0, 600)
ConvertBits(16)
EZdenoise()
ConvertBits(10, dither=1)
Prefetch(12, 48)
Code:
function EZdenoise(clip Input, int "thSAD", int "thSADC", int "TR", int "BLKSize", int "Overlap", int "Pel", bool "Chroma", float "Falloff")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
TR = default(TR, 3)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, 4)
Pel = default(Pel, 1)
Chroma = default(Chroma, false)
Falloff = default(Falloff, 0.9)

Super = Input.MSuper(Pel=Pel, Chroma=Chroma)
Multi_Vector = Super.MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=Chroma)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=Int(thSAD*Falloff), thSADC=thSADC, thSADC2=Int(thSADC*Falloff))
}

Last edited by takla; 1st June 2022 at 00:36.
Old 2nd June 2022, 12:52   #129  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
@DTL
What is your opinion on Vulkan based video processing? Are you aware of it? And yes I realize it does not explicitly talk about motion estimation (because it is probably missing?).

My reason for bringing it up is that you talked about MVTools-based denoising on a capture device, and I too thought about this before. The thing is, such a device would probably not support DX12. Realistically, the camera footage is sent to a smartphone. And that platform would support Vulkan on Android or Metal on iOS.

There is also vkFFT, which could also be used for denoising.
Old 2nd June 2022, 17:29   #130  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
"(because it is probably missing?"

Yes - it looks like the Vulkan developers have still not reached the point of exposing an ME API, as a service of the hardware video encoder, to other applications. It seems to be very rarely needed by anyone even today. Only Microsoft understood this and added it to the DirectX API. Also I think no other implementation of ME (using the general-purpose computation units of the accelerator) is available as a free-to-use library for Vulkan.

"on a capture device"

Not directly on a capture hardware interface card, but as a feature of a live capture host or pass-through host with a live MDegrain function. In practice its data flow would be from the capture interface card to the accelerator card and via the CPU to the output card. Or, with the industry transition to IP-based streams: from the software API that receives the stream, to the accelerator, and back to the API to send the stream via IP, using hardware IP adapters.

"that platform would support Vulkan on Android or Metal on iOS."

I think current smartphones use the same noise reduction methods as mvtools, in hardware. But I have not read whether any API is exposed to user applications. It may sit deep in the hardware units for camera data processing. So in a perfect world it might even be possible to connect a set/rig of smartphones to a PC host via USB and use them as external hardware-accelerated denoisers. Most of the money now seems to go into the quality of hardware-accelerated denoising in smartphones, so they quickly got nice results. But it may be covered by patents and not exposed as an API for external user applications.

In the current phase of dying civilization, end users' home desktop PCs are dying too. Current buyers can simply put their money into a smartphone with good denoising if they need to shoot new footage. Pro broadcast cameras also seem to be progressing with internal denoisers, as I saw in the 2022 broadcast of the European skating championship from Tallinn. So it looks like any investment in hardware-accelerated denoising for home desktop PCs is not profitable nowadays.

Last edited by DTL; 2nd June 2022 at 17:41.
Old 3rd June 2022, 02:57   #131  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
IIRC denoising depends on the camera app itself. There are some apps which let you disable most postprocessing. But yes, it is unclear if they use fixed functions or CPU.

And yes, if you can do good denoising internally, I can see why no one wants to spend dev time on a desktop solution.
Old 16th June 2022, 12:27   #132  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
Finally, a working tech demo of 2 different processing modes of MSuper/MDegrainN for pel > 1: https://drive.google.com/file/d/1E8i...ew?usp=sharing . Only for chroma=false processing or RAM usage/speed testing. Chroma=true with UseSubShift>0 in MDegrainN still outputs some buggy blocks sometimes. AVX2 build only.

Test scripts :
Code:
LoadPlugin("mvtools2.dll")

ColorBars(3840,2160, pixel_type="YV12")

Trim(0,1000)

tr=12
super=MSuper(last,chroma=true, mt=false, pel=4, hpad=16, vpad=16, levels=1, pelrefine=false)
multi_vec=MAnalyse (super, multi=true, blksize=8, delta=tr, overlap=0, chroma=false, mt=false, optSearchOption=5, optPredictorType=0,levels=1)
MDegrainN(last,super, multi_vec, tr, thSAD=175, thSAD2=160, mt=false,wpow=7, UseSubShift=1)

Prefetch(6)
vs

Code:
LoadPlugin("mvtools2.dll")

ColorBars(3840,2160, pixel_type="YV12")

Trim(0,1000)

tr=12
super=MSuper(last,chroma=true, mt=false, pel=4, hpad=16, vpad=16, levels=1, pelrefine=true)
multi_vec=MAnalyse (super, multi=true, blksize=8, delta=tr, overlap=0, chroma=false, mt=false, optSearchOption=5, optPredictorType=0,levels=1)
MDegrainN(last,super, multi_vec, tr, thSAD=175, thSAD2=160, mt=false,wpow=7, UseSubShift=0)

Prefetch(6)
On an i5-9600K + GTX1060 the second script (standard mvtools MSuper/MDegrainN) takes about 7+ GB of RAM and runs at about 1.5 fps. The new internal sub-shifting method in MDegrainN, using a single full-size frame, takes about 1.8 GB of RAM and runs at about 7.5 fps. Unfortunately the RAM usage with pel=4 is not 16 times lower but only about 4 times lower. Maybe the super clips are not the largest part of what AVS+ caches.
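
A rough back-of-envelope estimate of why pel=4 super clips are heavy, as a Python sketch. It assumes that the finest super level stores pel x pel sub-pixel shifted copies of each padded plane when pelrefine=true and a single copy when pelrefine=false, and that chroma padding is half the luma padding for YV12. It ignores row alignment, coarser levels and AVS+ cache behaviour, so the numbers are only indicative; the function name is made up for the example.

```python
def super_frame_bytes(width, height, pel, hpad=16, vpad=16, pelrefine=True, chroma=True):
    """Approximate size in bytes of one 8-bit YV12 super frame at levels=1."""
    copies = pel * pel if pelrefine else 1  # sub-pel shifted planes (assumption)
    luma = (width + 2 * hpad) * (height + 2 * vpad)
    # Two chroma planes, half size in both directions, with half the padding per side.
    chroma_bytes = 2 * (width // 2 + hpad) * (height // 2 + vpad) if chroma else 0
    return copies * (luma + chroma_bytes)


refined = super_frame_bytes(3840, 2160, pel=4, pelrefine=True)
plain = super_frame_bytes(3840, 2160, pel=4, pelrefine=False)
tr = 12
window = 2 * tr + 1  # frames referenced around the current one

print("refined super frame: %.1f MiB" % (refined / 2**20))
print("plain super frame:   %.1f MiB" % (plain / 2**20))
print("refined window:      %.2f GiB" % (refined * window / 2**30))
```

In this model the super frames differ by exactly pel x pel = 16 times, so a measured drop of only about 4 times would be consistent with other cached clips taking a large share of the memory.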

Last edited by DTL; 16th June 2022 at 12:56.
Old 16th June 2022, 18:08   #133  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
tested with 720p (I'll test 4K in a bit)

pelrefine=false with UseSubShift=1 looks like this (after). Uses 1860MB

pelrefine=true with UseSubShift=0 looks normal. Uses 2880MB

And please do not use ColorBars for testing. I'll share a 4K sample in an hour or so...

Last edited by takla; 16th June 2022 at 18:17.
Old 16th June 2022, 18:32   #134  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
"UseSubShift=1 looks like this (after)."

I know the output quality is completely bad. It is only a test of RAM usage and speed for use cases like https://forum.doom9.org/showthread.p...65#post1962965 , where it exhausts 32 GB of RAM with tr of about 10.
Old 16th June 2022, 18:48   #135  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
Oh, ok.

Edit:
PelRefine=true & UseSubShift=0 uses 5908MB

PelRefine=false & UseSubShift=1 uses 3731MB

Here is the sample

Not full "4K", cropped to 1600 pixels in height, but still.

By the way, can you tell me why my script crashes? I ended up testing with your settings instead.

Quote:
function EZdenoise(clip Input, int "thSAD", int "thSADC", int "TR", int "BLKSize", int "Overlap", int "Pel")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
TR = default(TR, 12)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, 0)
Pel = default(Pel, 4)

Super = Input.MSuper(Pel=Pel, Chroma=true, Levels=1, PelRefine=false)
Multi_Vector = Super.MAnalyse(Multi=true, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=false, Levels=1, optSearchOption=5, optPredictorType=0)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=int(float(thSAD*0.9)), thSADC=thSADC, thSADC2=int(float(thSADC*0.9)), UseSubShift=1)
}

Last edited by takla; 16th June 2022 at 20:00.
Old 16th June 2022, 21:25   #136  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
"can you tell me why my script crashes? "

That was not a well debugged build, and real motion vectors pointing far outside the frame may cause a crash. So the first test used noise-free colour bars and a larger padding of 16 to protect better against the crash. It was not a very good test, though, because noise-free colour bars should produce zero motion vectors, so sub-sample shifting is not used and the speed may look better. Still, the test shows a significant difference in speed even with static colour bars (maybe creating the shifted planes of a 4K pel=4 super clip with the old pelrefine=true method also loads the CPU/memory a lot).

Here is a version that is possibly better protected from that crash, so it should run on real content with the default padding of 8 -
https://drive.google.com/file/d/1oIy...ew?usp=sharing

"PelRefine=true & UseSubShift=0 uses 5908MB
PelRefine=false & UseSubShift=1 uses 3731MB"

What were the real frame size and number of threads? Can you adjust Prefetch() to match your number of CPU cores? What is the fps difference?

"Not full "4K", cropped to 1600 pixels in height,"

If you do not have a full-frame 4K source you can put a simple fast resize like BilinearResize(3840,2160) before degraining.

"Here is the sample"

I tested your source file sample with FFMS2 as the source filter and this full script:
Code:
LoadPlugin("mvtools2.dll")
LoadPlugin("ffms2.dll")


function EZdenoise(clip Input, int "thSAD", int "thSADC", int "TR", int "BLKSize", int "Overlap", int "Pel")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
TR = default(TR, 12)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, 0)
Pel = default(Pel, 4)

Super = Input.MSuper(Pel=Pel, Chroma=true, Levels=1, PelRefine=false)
Multi_Vector = Super.MAnalyse(Multi=true, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, Chroma=false, Levels=1, optSearchOption=5, optPredictorType=0)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=int(float(thSAD*0.9)), thSADC=thSADC, thSADC2=int(float(thSADC*0.9)), UseSubShift=1)
}

FFmpegSource2("sample.mkv")

ConvertBits(8)
ConvertToYV12()

EZdenoise(TR=12)

Prefetch(6)
On my i5-9600K CPU (6 cores, 6 threads) it runs in AVSmeter with about 5600M of RAM at about 2.6 fps, without a crash (at least for about the first 100 frames) with the latest build. The total letterboxed frame size I see in the sample is 3840x2160, which is enough for the test.
With pelrefine=true and UseSubShift=0 it appears to start swapping, taking about 10..11+ GB of RAM, so fps drops to about 0.3.

It looks like with a file source filter and some simple intermediate filters like ConvertBits and ConvertToYV12, the difference in RAM used by AVS+ caching is even smaller, though it still reaches about 2x.

I also tried playing with combinations of SetCacheMode(1) and different Prefetch(6,N):

With the lowest possible Prefetch(6,1) and pelrefine=true usesubshift=0 I got a stable RAM usage of about 8400M but still a low fps of about 0.3.

With Prefetch(6,2) and pelrefine=false usesubshift=1 RAM usage is about 4700M and speed about 1.2 fps.
With N not defined in Prefetch(6,N) the RAM usage is just a bit higher, about 5200M, and speed quickly reaches 2.5+ fps.

Last edited by DTL; 16th June 2022 at 22:14. Reason: corrected link to new build version
Old 17th June 2022, 01:23   #137  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 130
No longer crashing, thanks.

With
Quote:
LWLibavVideoSource("C:\Users\Admin\Downloads\SAMPLE.mkv")
ConvertBits(8)
Crop(0, 280, -0, -280)
BilinearResize(3840, 2160)
EZdenoise()
Prefetch(12, 12)
I get 4082MB & 6293MB (only encoding the first 12 frames in avspmod). During actual encoding, 2.6 FPS and 1.3 FPS respectively. Memory usage is much higher during encoding in ffmpeg, with UseSubShift=0 randomly spiking from 8GB to 22GB.

With SetCacheMode(1) memory usage is halved but so is the speed (FPS)

With just CPU, RAM usage is much more stable and stays below 7GB at all times. FPS is about 2.7

Last edited by takla; 17th June 2022 at 01:31.
Old 18th June 2022, 09:38   #138  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
Thank you for testing and providing results.

There is also an idea of simulating overlapped MDegrainN processing from a single non-overlapped MAnalyse output, by interpolating the motion vectors at a half-block offset. It could be a simple mean of the 4 surrounding MVs and also a mean of their SADs. It will not be as precise as a true shifted motion search, but it may simulate overlapped processing well enough to hide the blockiness that is currently sometimes visible, while keeping the speed good and not requiring 2 hardware accelerators.

For speed of development the overlapping may be done in AVS scripting (or with Fizick's BlockOverlap filter).

The interpolation may be done inside MDegrainN. Possible processing may be like
Code:
mvs_clip=MAnalyse(overlap=0)
std=MDegrainN(mvs_clip)
shifted=MDegrainN(mvs_clip, interpolateoverlap=true)
BlockOverlap(std, shifted)
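
The vector interpolation itself could be sketched like this in Python. It only illustrates averaging the 4 surrounding block vectors and SADs for a grid shifted by half a block; the data layout (a 2-D list of (dx, dy, sad) tuples) and the function name are assumptions for the example, not mvtools internals.

```python
def interpolate_half_block(mv_grid):
    """mv_grid[y][x] = (dx, dy, sad) for non-overlapped blocks.

    Returns a (rows-1) x (cols-1) grid for blocks shifted by half a block
    in both directions; each entry is the mean of the 4 surrounding entries.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    shifted = []
    for y in range(rows - 1):
        row = []
        for x in range(cols - 1):
            neighbours = [mv_grid[y][x], mv_grid[y][x + 1],
                          mv_grid[y + 1][x], mv_grid[y + 1][x + 1]]
            # Mean of each component (dx, dy, sad) over the 4 neighbours.
            row.append(tuple(sum(n[i] for n in neighbours) / 4 for i in range(3)))
        shifted.append(row)
    return shifted


grid = [[(0, 0, 100), (4, 0, 120)],
        [(0, 4, 140), (4, 4, 160)]]
print(interpolate_half_block(grid))
```

A real implementation would also have to decide how to treat the frame borders, where fewer than 4 neighbours exist; this sketch simply produces no shifted blocks there.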

Last edited by DTL; 18th June 2022 at 09:47.
Old 2nd July 2022, 13:19   #139  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 520
New build: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.11 SSE2/AVX2 builds by VisualStudio2019 16.3.

Added the UseSubShift param to MDegrainN and pelrefine=true/false to MSuper. UseSubShift default = 0, set to 1 to enable.

If pelrefine=false in MSuper, all other filters must use UseSubShift=1 (or optSearchOption=5 for MAnalyse).

Also redesigned the no-overlap MDegrainN processing into single-pass processing of YUV formats, which also seems to improve performance. The old MDegrainN processes each plane separately, which causes 3 times more DegrainWeight() and norm_weights() calls with equal data. Separate plane processing also does not allow efficient reuse of sub-shifted blocks between the SAD re-check in MVLPF processing and the MDegrainN processing that follows.

Currently only YV12 with block size 8x8 (8x8 Y and 4x4 UV) is fully accelerated with AVX2 for sub-shifting. Everything else falls back to the C reference, which is slow. Also only 8 bit is fully supported for now.

There is a very experimental UseSubShift=1 for MAnalyse to test the possible speed (may be unstable and crash with chroma=true, only valid for optPredictorType=1) of 'onCPU' MAnalyse with pel > 1. I still do not expect it to be faster than the pre-computed planes from MSuper (at least until faster AVX512 sub-shift implementations are designed), but it may help to run with a larger thread count or tr value on hosts with limited RAM.

Also, to make MVLPF faster by reusing sub-shifted blocks from MVPlane, the use_block_yuv() function, redesigned for single-pass processing, was added to MDegrainN. It seems to help performance in other processing modes too. Currently only non-overlapped single-pass processing of colour formats is implemented in MDegrainN (no external switches; it is auto-detected if the input is a YUV format and overlapH=overlapV=0). A single-pass mode for overlapped processing is also possible but needs more time and a more complex design.

If a hardware ME accelerator is not available, it is possible to enable a 'fastest' mode of MAnalyse to check pel=4 processing without the speed being badly limited by the full-processing mode of MAnalyse:
Code:
tr=15
super=MSuper(last, mt=false, chroma=true, pel=4)
multi_vec=MAnalyse(super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, overlap=0, optSearchOption=1, optPredictorType=4, chroma=false, mt=false)
MDegrainN(last,super, multi_vec, tr, thSAD=250, thSAD2=240, mt=false, UseSubShift=1)
On an i5-11600 it runs at 15.6 vs 20.4 fps with UseSubShift=0 vs 1 in MDegrainN, at about FullHD frame size.

Updated: _2 version from 02.07.22 with a bug fixed.

Last edited by DTL; 2nd July 2022 at 19:40.
Old 8th July 2022, 15:58   #140  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 415
Quote:
Originally Posted by DTL View Post
New build: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.11 SSE2/AVX2 builds by VisualStudio2019 16.3. [...]
What you are doing with mvtools is very interesting and I love the progress you are making, much respect. Yesterday I tried your latest build and it worked quite well with the one clip I tested. Some details were smeared/blurred compared to my prefiltered tests with the pinterf mvtools, but I have only tested one problematic clip so far. Overall I am very positive about this build, since even without prefiltering the lines and objects in a high-grain clip did not get the usual dancing/wobbliness that is very annoying. I cannot wait until you release it for HBD and block sizes above 8. The source I have been testing with is 8-bit slog2 4K (3840x2160). With your script above I get about 0.9 fps on my Intel i5 3570 3.4 GHz, 32 GB RAM, Nvidia GeForce GTX 970. With the pinterf mvtools I have mostly had to experiment with different prefiltering techniques, since I have 4 quite different 4K test clips that are very hard to denoise with the same script. After many months of trial and error I have found something that seems to work pretty well for everything, but with your build I think it will improve a lot in the future. Thanks again for your great efforts!