Doom9's Forum
23rd January 2021, 14:17 | #1 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Offloading to GPU
Hi there, folks,
out of curiosity: this is one of the scripts I'm currently running automatically on several files on a farm of two servers, each with a 28c/56th Intel Xeon and 64 GB of RAM. I'm quite happy with the output, but it currently runs at 0.6fps, and encoding is not the issue: Avisynth is. Is there anything (besides the indexer) that I can offload to the GPU? I know about DGDecodeNV and I'm also a "customer" (Donald knows), but what about the other filters I'm using? As for the encoding itself, I'm not really planning to use GPU encoding for the output due to quality concerns, so the question is only whether anything inside Avisynth can be offloaded to the GPU. (Side note: the input is v210 lossless .avi, 720x576 25i, so nothing hard to index, you know...) Code:
#Indexing
video=FFVideoSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi")
ch1=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=1)
ch2=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=2)
ch3=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=3)
ch4=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=4)
audio=MergeChannels(ch1, ch2, ch3, ch4, ch1, ch2, ch3, ch4)
AudioDub(video, audio)

#Bob-deinterlacing
AssumeTFF()
QTGMC(Preset="Placebo")

#Bring everything to 16bit planar
HBD=ConvertBits(m_clip, bits=16)

#Convert to 4:2:2 planar 16bit
c=Converttoyuv422(HBD, matrix="Rec601")

#De-Spot 16bit planar
SpotLess(c)

#Degrain in 16bit planar
super = MSuper(pel=2, sharp=1)
bv1 = MAnalyse(super, isb = true, delta = 1, overlap=4)
fv1 = MAnalyse(super, isb = false, delta = 1, overlap=4)
bv2 = MAnalyse(super, isb = true, delta = 2, overlap=4)
fv2 = MAnalyse(super, isb = false, delta = 2, overlap=4)
degrain=MDegrain2(super,bv1,fv1,bv2,fv2,thSADC=1200, thSAD=1200)

#Spatial denoise 16bit planar
denoise=dfttest(degrain, sigma=64, tbsize=1, lsb_in=false, lsb=false, Y=true, U=true, V=true, dither=0)

#Adding borders for 1.33 PB 4:3 with 16bit planar precision
borders=AddBorders(denoise, 152, 0, 152, 0)

#Upscale to FULL HD with Spline64 + NNEDI and 16bit planar precision
resized=nnedi3_rpow2(borders, cshift="Spline64ResizeMT", rfactor=2, fwidth=1920, fheight=1080, nsize=4, nns=4, qual=1, etype=0, pscrn=2, threads=56, csresize=true, mpeg2=true, threads_rs=0, logicalCores_rs=true, MaxPhysCore_rs=true, SetAffinity_rs=false)

#From 16bit planar to 16bit interleaved
interleaved=ConvertToDoubleWidth(resized)

#Matrix Conversion from BT601 to BT709 with 16bit interleaved precision
color=Matrix(interleaved, from=601, to=709, rg=1.0, gg=1.0, bg=1.0, a=16, b=235, ao=16, bo=235, bitdepth=16)

#From 16bit interleaved to 16bit planar
planar=ConvertFromDoubleWidth(color)

#Dithering from 16bit planar to 8bit planar with the Floyd-Steinberg error diffusion
dithered=ConvertBits(planar, bits=8, dither=1)

#Limiter TV Range 0.0 - 0.7V
m_clip=Limiter(dithered, min_luma=16, max_luma=235, min_chroma=16, max_chroma=240)

Return m_clip
|
23rd January 2021, 15:32 | #2 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
By the way, no prefetch in script?
Code:
#Indexing
video=FFVideoSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi")
ch1=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=1)
ch2=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=2)
ch3=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=3)
ch4=FFAudioSource("\\mibcssda001\Media Ingest\00_INGEST_MAM\A.R.C.A\00_FILE_DA_ENCODARE\file.avi", track=4)
audio=MergeChannels(ch1, ch2, ch3, ch4, ch1, ch2, ch3, ch4)
AudioDub(video, audio)

#Bob-deinterlacing
AssumeTFF()
QTGMC(Preset="Placebo")

#Bring everything to 16bit planar
HBD=ConvertBits(m_clip, bits=16)

#Convert to 4:2:2 planar 16bit
c=Converttoyuv422(HBD, matrix="Rec601")

#De-Spot 16bit planar
SpotLess(c)

#Degrain in 16bit planar
super = MSuper(pel=2, sharp=1)
bv1 = MAnalyse(super, isb = true, delta = 1, overlap=4)
fv1 = MAnalyse(super, isb = false, delta = 1, overlap=4)
bv2 = MAnalyse(super, isb = true, delta = 2, overlap=4)
fv2 = MAnalyse(super, isb = false, delta = 2, overlap=4)
degrain=MDegrain2(super,bv1,fv1,bv2,fv2,thSADC=1200, thSAD=1200)

#Spatial denoise 16bit planar
denoise=dfttest(degrain, sigma=64, tbsize=1, lsb_in=false, lsb=false, Y=true, U=true, V=true, dither=0)

#Adding borders for 1.33 PB 4:3 with 16bit planar precision
borders=AddBorders(denoise, 152, 0, 152, 0)

#Upscale to FULL HD with Spline64 + NNEDI and 16bit planar precision
resized=nnedi3_rpow2(borders, cshift="Spline64ResizeMT", rfactor=2, fwidth=1920, fheight=1080, nsize=4, nns=4, qual=1, etype=0, pscrn=2, threads=56, csresize=true, mpeg2=true, threads_rs=0, logicalCores_rs=true, MaxPhysCore_rs=true, SetAffinity_rs=false)

#From 16bit planar to 16bit interleaved
interleaved=ConvertToDoubleWidth(resized)

#Matrix Conversion from BT601 to BT709 with 16bit interleaved precision
color=Matrix(interleaved, from=601, to=709, rg=1.0, gg=1.0, bg=1.0, a=16, b=235, ao=16, bo=235, bitdepth=16)

#From 16bit interleaved to 16bit planar
planar=ConvertFromDoubleWidth(color)

#Dithering from 16bit planar to 8bit planar with the Floyd-Steinberg error diffusion
dithered=ConvertBits(planar, bits=8, dither=1)

#Limiter TV Range 0.0 - 0.7V
m_clip=Limiter(dithered, min_luma=16, max_luma=235, min_chroma=16, max_chroma=240)

#Prefetch
m_clip=Prefetch(m_clip,28)

Return m_clip
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper Last edited by Atak_Snajpera; 23rd January 2021 at 15:47. |
23rd January 2021, 16:14 | #3 | Link |
Registered User
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
|
There is https://github.com/nekopanda/AviSynthPlus/releases (I think AVS+ can do some of that work since the 3.6 update),
and there are some CUDA plugins at https://github.com/nekopanda/AviSynthCUDAFilters (I don't know whether they work in AVS+ 3.6). Aside from all that, maybe we need someone to backport the OpenCL versions of plugins from VS, like https://github.com/HomeOfVapourSynth...Synth-NNEDI3CL (the AVS one by SEt is closed source, and no one can update it since SEt is no longer active). There are also plugins that have both CL and CPU functions in the same plugin, like https://github.com/HomeOfVapourSynth...urSynth-TCanny (Asd has already backported it to AVS, but only the CPU function!).
__________________
See My Avisynth Stuff Last edited by real.finder; 23rd January 2021 at 16:17. |
24th January 2021, 10:47 | #5 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
The only reason why I didn't add Prefetch is that I know that some of the plugins create their own thread pool, like plugins_JPSDR which I'm using in my filterchain, so I don't know how it's gonna behave, but if you think it's gonna behave nicely, I'll add it.
Quote:
Note: not my pictures.
We've also noticed a deterioration of the binder that holds the iron oxide magnetic coating of the tape to its plastic carrier. Some people suggested dehydrating the tapes in a carefully controlled manner, but we don't have the tools to do that. For now they still play, somehow, so this might well be the last time they play. They're in horrible condition, so a very strong denoise and degrain is needed. (Oh, and I checked: I don't get ghosting, except that the ball is sometimes removed in tennis matches, but I encode those with different parameters to solve the problem, so it's not a big deal.) Not that there are many details in those contents, but I'll give it a shot.
Last edited by FranceBB; 24th January 2021 at 10:51. |
|
24th January 2021, 18:54 | #8 | Link |
Registered User
Join Date: Mar 2017
Location: Germany
Posts: 234
|
Just if you are interested:
We use NeatVideo as the best solution for this kind of noise, but, as I forgot to mention, VERY carefully... In almost all cases we set it to only 5% for the spatial highs (mids and lows at zero!) and to 2 or 3 frames temporally. Used like that, it is the best temporal noise remover I know of so far. In many cases we also mix back some of the original noise (overlay, transparency ~0.3) to avoid a waxy look. |
24th January 2021, 19:06 | #9 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Yeah, forget about QTGMC Placebo and just use Medium. Anything above that is a waste of time and electricity. Regarding Prefetch, I recommend starting with the number of physical cores instead of going straight to the total number of supported threads. You may also reduce the number of threads in nnedi to 2 or even 1.
Last edited by Atak_Snajpera; 24th January 2021 at 19:10. |
28th January 2021, 11:17 | #10 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Ok, I tried with Prefetch and I gotta say, I'm not impressed at all...
If anything, I'm surprised 'cause it's even slower than without it... I tried limiting NNEDI to 1 thread and also removing it completely from the filter chain, but nothing: in all my tests, I dropped from 0.3-0.5fps without Prefetch to 0.1fps with Prefetch at 28...
EDIT: Lowering Prefetch to 8 or 6 gives me the very same speed I usually get without Prefetch, so 0.3fps... It's not really worth it... I'm not gonna be using Prefetch! (Keep in mind that it's a 28c/56th Xeon, so I expected much better from it...)
Last edited by FranceBB; 28th January 2021 at 12:00. |
28th January 2021, 12:10 | #11 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
|
|
28th January 2021, 13:53 | #12 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
With SetMemoryMax(128000), i.e. 128 GB, which is the maximum available RAM on the other server, and Prefetch set to 28, RAM usage climbs to 21 GB, drops to 14 GB, climbs back to 21 GB and drops to 14 GB again, in a loop.
The speed however is the same: 0.1fps. With Prefetch 2 the RAM is steady and way lower and the speed is 0.3fps, so about the same as I get without Prefetch. This is definitely weird... |
28th January 2021, 15:53 | #13 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
The bottleneck is TemporalMedian in Spotless.
TemporalMedian works internally with histograms, so bit depth heavily affects the speed: checking only 256 levels is much quicker than working with a histogram array of size 65536.
First I modded the plugin to use SSE2 for 16 bit videos. At present only 8 bit videos have SSE2 in TemporalMedian; 10+ bit depths use plain C. (Untested, I did not put it in live code.) It got quicker, but not by much.
Then I tried feeding MedianBlur with only a 10 bit clip. I recommend you try this option.
EDIT: specify threads=1 explicitly for dfttest when using Prefetch. Its default value is 0, which means that it uses num_processors internal threads. When the thread count is not 1, this filter has MT_SERIALIZED behaviour instead of MT_MULTI_INSTANCE.
Last edited by pinterf; 28th January 2021 at 17:50. Reason: dfttest |
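To make the bit depth remark concrete, here is a toy per-pixel model of a histogram-based median, written in Python rather than the plugin's C++. It is only a sketch under assumptions: the function name `histogram_median` and the "bins visited" counter are made up for illustration, and the real plugin's histogram handling is certainly more refined than this. It only shows why the worst-case scan grows with the number of levels.

```python
def histogram_median(samples, bits):
    """Median of integer samples via a counting histogram with 2**bits bins.

    Returns (median, bins_visited) so that the scan cost is visible.
    Hypothetical helper for illustration, not the plugin's code.
    """
    levels = 1 << bits
    hist = [0] * levels              # one counter per possible pixel value
    for s in samples:
        hist[s] += 1

    target = len(samples) // 2 + 1   # 1-based rank of the (upper) median
    seen = 0
    for value in range(levels):      # walk the bins from black upwards
        seen += hist[value]
        if seen >= target:
            return value, value + 1
    raise ValueError("empty sample list")


# The same bright three-sample neighbourhood expressed at 8, 10 and 16 bits.
samples_8 = [200, 201, 203]
samples_10 = [s << 2 for s in samples_8]
samples_16 = [s << 8 for s in samples_8]

m8, cost8 = histogram_median(samples_8, 8)
m10, cost10 = histogram_median(samples_10, 10)
m16, cost16 = histogram_median(samples_16, 16)
print(cost8, cost10, cost16)
```

For these samples the scan stops after 202 bins at 8 bit, 805 at 10 bit and 51457 at 16 bit, which is the kind of gap that makes feeding a 10 bit clip attractive.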
30th January 2021, 14:19 | #14 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
Code:
if (threads < 0 || threads > 16)
    env->ThrowError("dfttest: threads must be between 0 and 16 (inclusive)!");
|
30th January 2021, 14:42 | #15 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
I don't know.
Back to Spotless: the way TemporalMedian is used there (radius=0, temporal radius=1) is far from optimal in the present plugin, so I'm considering optimizing this special case. You could also try z_ConvertFormat instead of Matrix: it can combine the colorspace conversion, the bit depth conversion and the dithering. |
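For readers who want to see what the BT601 to BT709 step amounts to numerically, here is a minimal Python sketch of the colour-matrix part only. It works on normalised values, so it deliberately ignores the limited-range scaling (16-235), the bit depth change and the dithering that the script also performs. The function names are made up for illustration and this is not how Matrix or z_ConvertFormat are implemented internally; only the Kr/Kb constants are the standard ones.

```python
BT601 = (0.299, 0.114)      # (Kr, Kb) luma coefficients for BT.601
BT709 = (0.2126, 0.0722)    # (Kr, Kb) luma coefficients for BT.709


def rgb_to_ycbcr(r, g, b, kr, kb):
    """Normalised R'G'B' (0..1) to Y (0..1) and Cb/Cr (-0.5..0.5)."""
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Inverse of rgb_to_ycbcr for the same (Kr, Kb) pair."""
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def convert_601_to_709(y, cb, cr):
    """Decode with BT.601 coefficients, re-encode with BT.709 ones."""
    r, g, b = ycbcr_to_rgb(y, cb, cr, *BT601)
    return rgb_to_ycbcr(r, g, b, *BT709)


# Pure red encoded as BT.601, then re-expressed as BT.709.
red_601 = rgb_to_ycbcr(1.0, 0.0, 0.0, *BT601)
red_709 = convert_601_to_709(*red_601)
print(red_601, red_709)
```

Neutral grey is unchanged by the conversion, while saturated colours get a different luma (pure red goes from Y=0.299 to Y=0.2126), which is why skipping this step shifts colours on upscaled SD material.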
30th January 2021, 14:58 | #16 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
I'll try to replace it with z_ConvertFormat so that I don't have to go to 16bit interleaved and come back. That should speed things up even further. |
|
30th January 2021, 16:45 | #17 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
I've just tried the above mentioned special use case (radius=0, temporal radius=1) with an optimized TemporalMedian version.
Breaking the script after Spotless: with the original DLL version the script ran at 0.37fps. Then I added AVX2 to TemporalMedian (still the generic approach) and it reached 0.57fps. Good. But separating out this special case resulted in a huge speed gain: now I'm getting 3.08fps. A significant change. AAA+ Green Label. I'm doing some more checks, then I'll release it in a few days. |
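The post does not say how the special case is implemented, so the following Python sketch is only an assumption about why it can be so much faster: with spatial radius 0 and temporal radius 1 each output pixel is the median of exactly three samples (previous, current, next frame), and a median of three needs no histogram at all, just a few min/max operations that vectorise well. The function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three values using only min/max, no histogram."""
    return max(min(a, b), min(max(a, b), c))


def temporal_median_radius1(prev_frame, cur_frame, next_frame):
    """Per-pixel median across three frames given as equally sized flat lists."""
    return [median3(p, c, n)
            for p, c, n in zip(prev_frame, cur_frame, next_frame)]


# A one-frame "spot" (65535) in the current frame is replaced by a
# neighbouring frame's value; steady pixels are left essentially untouched.
prev_frame = [1000, 1000, 1000]
cur_frame = [1000, 65535, 1002]
next_frame = [1001, 1004, 1000]
cleaned = temporal_median_radius1(prev_frame, cur_frame, next_frame)
print(cleaned)
```

Unlike the histogram approach, the cost here does not depend on bit depth, which would be consistent with the jump from 0.57fps to 3.08fps on a 16 bit clip.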
30th January 2021, 21:25 | #18 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Please test with this one: MedianBlur2 new version.
https://github.com/pinterf/MedianBlur2/releases/tag/1.1
Code:
- 1.1 (20210130) - pinterf
  - Speed: SSE2 and AVX2 for 10+ bits (generic case, MedianBlur)
  - Speed: SSE2 and AVX2 for TemporalMedianBlur
  - Speed: Much-much quicker: TemporalMedianBlur special case: temporal radius=1 or 2, spatial radius=0 (C, SSE4.1, AVX2)
  - Pass frame properties when Avisynth interface>=8
  - Debug helper parameter 'opt': integer default -1
    <0: autodetect CPU
    0: C only (disable SSE2 and AVX2)
    1: SSE2 (disable SSE4.1 and AVX2)
    2: SSE4 (disable AVX2)
    3: AVX2
|
30th January 2021, 21:26 | #19 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
It would speed things up a lot, considering that this filterchain is here to stay on our servers for the foreseeable future! Thanks!! I really look forward to trying it and putting it in production! |
|
|
|