4th July 2017, 01:21 | #321 | Link |
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
|
Your original post was "asm shit is fast", and I've been saying that intrinsics are equally fast, without having to use an assembler.
Obviously you realized that, then you changed your point to "you can't automatically convert raw asm to intrinsics". Get a room with Katie already, troll. |
4th July 2017, 01:45 | #322 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
What's wrong with using an assembler? Do you realize that, for example, the speed of libx264 is based on its highly efficient asm code? My point is that there is perfectly good and fast asm code in mvtools2 (32 and 64 bit). Having this converted to intrinsics would be good, but it's a lot of work. |
|
4th July 2017, 17:08 | #324 | Link |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
I'm not sure to which question you're referring, Mystery's or feisty's. Either way, nothing wrong with having a discussion. I still don't quite understand feisty's troll accusation but I suppose there was some kind of misinterpretation of something I posted...
|
4th July 2017, 17:42 | #325 | Link |
Excessively jovial fellow
Join Date: Jun 2004
Location: rude
Posts: 1,100
|
The main benefit of intrinsics over handwritten assembly is that it's easier to write and maintain, as well as easier to integrate into your C++ stuff (such as templates - a lot of the VS multi-bitdepth stuff uses templated intrinsics). A minor bonus is that you don't need a separate assembler in addition to your regular compiler. However, if you already have a bunch of well tested and functioning .asm (in separate files, not some inline monstrosity pain in the rear) and that you have no intention of changing, then porting to intrinsics is just a lot of busy-work that's probably going to introduce a lot of new and exciting bugs. Not even the VS port of MVTools has gotten rid of all the .asm files, because there was simply no need. New code has been ported to intrinsics though.
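To illustrate the point about templates, here is a minimal sketch of a "templated intrinsics" routine: a rounded average of two pixel rows written once and instantiated for 8-bit and 16-bit samples. The names `SimdOps` and `average_rows` are made up for this example and are not taken from MVTools or the VS port; only the SSE2 intrinsics themselves (`_mm_avg_epu8`, `_mm_avg_epu16`, unaligned load/store) are real. On a target without SSE2 the sketch falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define AVG_SKETCH_SSE2 1
#else
#define AVG_SKETCH_SSE2 0
#endif

#if AVG_SKETCH_SSE2
// Hypothetical helper: the only bit-depth-specific part is which intrinsic to call.
template <typename PixelT> struct SimdOps;
template <> struct SimdOps<std::uint8_t> {
    static __m128i avg(__m128i a, __m128i b) { return _mm_avg_epu8(a, b); }
};
template <> struct SimdOps<std::uint16_t> {
    static __m128i avg(__m128i a, __m128i b) { return _mm_avg_epu16(a, b); }
};
#endif

// Rounded average of two rows, written once for both 8-bit and 16-bit pixels.
template <typename PixelT>
void average_rows(PixelT* dst, const PixelT* a, const PixelT* b, std::size_t count) {
    assert(dst && a && b);
    std::size_t i = 0;
#if AVG_SKETCH_SSE2
    constexpr std::size_t lanes = 16 / sizeof(PixelT);  // pixels per 128-bit register
    for (; i + lanes <= count; i += lanes) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), SimdOps<PixelT>::avg(va, vb));
    }
#endif
    // Scalar tail (and the whole row when SSE2 is unavailable): (a + b + 1) >> 1.
    for (; i < count; ++i)
        dst[i] = static_cast<PixelT>((static_cast<unsigned>(a[i]) + b[i] + 1u) >> 1);
}
```

Calling `average_rows(dst8, a8, b8, n)` with `uint8_t` pointers and `average_rows(dst16, a16, b16, n)` with `uint16_t` pointers picks the matching intrinsic at compile time, which is the kind of reuse that is awkward to get from separate .asm files.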
|
4th July 2017, 19:12 | #326 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
|
|
10th July 2017, 08:40 | #327 | Link |
Registered User
Join Date: Feb 2003
Location: Russia, Moscow
Posts: 854
|
Strange behaviour with the latest MVTools
pinterf, thanks for the update.
Now some of my scripts work stably. But I see strange behaviour when opening scripts in VirtualDubMod: I do not see an error message if I have written a script with an error related to MVTools functions; instead VirtualDub hangs and does not respond. I cannot close the video in VirtualDub, I can only close VirtualDub itself. Job control also does not work. Please advise. yup. |
11th July 2017, 09:45 | #328 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
Asm vs intrinsics. SAD and SATD code (the most important routines for mvtools2 speed) written in intrinsics is _much_ slower than the existing asm; I'm talking about the VS2015/2017 code generator. I have experienced the opposite case as well, where the code generated from intrinsics is faster than the original asm (seen in FFT3DFilter and TIVTC), perhaps because of smarter instruction ordering. Even a C version can be faster than the old asm (TIVTC). I usually have a look at the generated assembler code of the intrinsics. There are cases when the optimizer uses too many xmm registers, so the prolog/epilog register save/restore (which we cannot control) takes significant time relative to the actual task, as experienced in the 16 bit SAD intrinsics routines. I had to play with less-than-optimal loop unrolling until I found the fastest result for each particular SAD blocksize. |
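For readers who have not seen what an intrinsics SAD looks like, here is a minimal sketch for a 16x16 block of 8-bit pixels, next to a plain C++ reference. It is not the MVTools code; the function names are invented for the example, and it makes no claim about speed relative to the existing asm. It only shows the `_mm_sad_epu8` technique. Without SSE2 the second function simply calls the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SAD_SKETCH_SSE2 1
#else
#define SAD_SKETCH_SSE2 0
#endif

// Plain C++ reference: sum of absolute differences over a 16x16 block.
unsigned sad16x16_c(const std::uint8_t* src, std::ptrdiff_t src_pitch,
                    const std::uint8_t* ref, std::ptrdiff_t ref_pitch) {
    assert(src && ref);
    unsigned sum = 0;
    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x)
            sum += static_cast<unsigned>(std::abs(int(src[x]) - int(ref[x])));
        src += src_pitch;
        ref += ref_pitch;
    }
    return sum;
}

// Same result using SSE2: one 16-byte row per iteration.
unsigned sad16x16_sse2(const std::uint8_t* src, std::ptrdiff_t src_pitch,
                       const std::uint8_t* ref, std::ptrdiff_t ref_pitch) {
#if SAD_SKETCH_SSE2
    assert(src && ref);
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; ++y) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ref));
        // _mm_sad_epu8 leaves one partial sum in each 64-bit half of the register.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
        src += src_pitch;
        ref += ref_pitch;
    }
    // Fold the high half onto the low half and extract the total.
    __m128i total = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return static_cast<unsigned>(_mm_cvtsi128_si32(total));
#else
    return sad16x16_c(src, src_pitch, ref, ref_pitch);
#endif
}
```

Whether the compiler turns this into something as tight as hand-written asm depends on the code generator, which is why looking at the generated assembly is worthwhile.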
|
11th July 2017, 10:11 | #329 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
In all other cases (such as for block size 16x16) the routines from the FFTW3 library are used. I don't know which fftw3 version you are using (I can see 3.3.6 as the latest one at http://www.fftw.org/ ); perhaps you could try comparing different versions. |
|
12th July 2017, 05:21 | #330 | Link | |
Soul Architect
Join Date: Apr 2014
Posts: 2,559
|
Quote:
Still the same problem. Stuck at 37% CPU usage. It is the libfftw3f-3.dll file in C:\Windows\SysWOW64, correct? If I use BlkSize=8, I get 47% CPU usage.
Last edited by MysteryX; 12th July 2017 at 05:25. |
|
13th July 2017, 14:04 | #331 | Link |
Registered User
Join Date: Jun 2006
Posts: 397
|
What's supposed to happen if FFTW is missing?
QTGMC seems to work without it. Is it because MvTools2 doesn't always need it or something else? And can it load libfftw3f-3.dll from the same directory as mvtools2.dll instead of the system dir? AvsMeter says the FFTW DLL cannot be loaded, but maybe it only looks for it in the system directory. |
13th July 2017, 14:45 | #332 | Link | |
Registered User
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
|
Quote:
you can load FFTW DLL by using this, x64 here
|
20th July 2017, 03:28 | #334 | Link | |
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
When I use this hello_hello script (# 301) in 16 bits:
Quote:
|
|
20th July 2017, 04:50 | #335 | Link |
Registered User
Join Date: Mar 2011
Posts: 4,829
|
GMJCZP,
Delta and TR have to be the same. From the help file:
MDeGrainN has a temporal radius given by the tr parameter, and uses a special motion vector clip.
tr: Temporal radius, > 0. Must match the mvmulti content, i.e. the delta parameter in MAnalyse. |
20th July 2017, 16:04 | #337 | Link |
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
Now I have another problem: the image is not displayed correctly unless I use f3kdb and DitherPost together. I'm still a rookie at this 16-bit stuff:
Code:
Dither_convert_8_to_16()
Temporalsoften(2,1,2,mode=2,scenechange=10)
dither_resize16(720,480,kernel="spline16",invks=true,invkstaps=3,src_left=0.0,u=3,v=3)
MDegrainLight(2,lsb=true,thSAD=200)
f3kdb(range=15, grainY=0, grainC=0, keep_tv_range=True, input_depth=16, output_depth=8)
DitherPost()

# MDegrainLight
# https://forum.doom9.org/showthread.php?p=1810543#post1810543
# Original idea by hello_hello
function MDegrainLight(clip input, int "tr", bool "mt", bool "lsb", int "thSAD", int "thSAD2", int "blksize", int "overlap")
{
    tr = Default(tr, 1) # Temporal radius
    mt = Default(mt, true) # Internal multithreading
    lsb = Default(lsb, false) # 16-bit
    thSAD = Default(thSAD, 200) # Denoising strength
    thSAD2 = Default(thSAD2, 150)
    blksize = Default(blksize, 16) # Block size
    overlap = Default(overlap, 4) # Block overlap
    super = input.MSuper (mt=mt)
    multi_vec = MAnalyse (super, mt=mt, multi=true, blksize=blksize, overlap=overlap, delta=tr)
    input.MDegrainN (super, multi_vec, tr, mt=mt, lsb=lsb, thSAD=thSAD, thSAD2=thSAD2)
    return last
}

Can anyone please explain to me whether DitherPost is redundant here, or whether my script is correct? Or is there a way to use only one of them, either f3kdb or DitherPost? Last edited by GMJCZP; 20th July 2017 at 17:37. |
20th July 2017, 19:04 | #338 | Link |
Registered User
Join Date: Jan 2016
Posts: 79
|
1. Afaik, TemporalSoften does not support 16 bit stacked input, so you should apply it before the Dither_convert_8_to_16 call.
2. I don't think MDegrainN takes in 16 bit stacked input. It can only output it using lsb=true. (No lsb_in parameter)
3. DitherPost simply turns a 16 bit clip into an 8 bit clip. In your f3kdb call, you already output 8 bit video so you don't need DitherPost. Alternatively, you could change f3kdb's output_depth to 16, and then DitherPost would work as expected.
Last edited by blaze077; 20th July 2017 at 19:12. |
20th July 2017, 21:07 | #339 | Link | ||
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
1. TemporalSoften does not have anything to do with the problem.
2. The script works perfectly as it is; the problem arises only if I do not use f3kdb and DitherPost together, as I said before.
3. I repeat: if I do not use DitherPost, the video is displayed poorly.
Anyway, thanks for the reply. I repeat my question: is my script okay, and am I not messing things up with DitherPost? I ask because I cannot get an f3kdb command that displays the image correctly.
EDIT: I solved the problem. The MVTools documentation says:
Quote:
Quote:
Last edited by GMJCZP; 20th July 2017 at 21:25. |
||
20th July 2017, 21:26 | #340 | Link |
Registered User
Join Date: Jan 2016
Posts: 79
|
I just tried to run your script and the cause is indeed your MDegrainLight function.
As you know, a 16 bit stacked clip is double the height of the normal video (MSB and LSB). You pass a 16 bit stacked clip to MDegrainN, but MDegrainN has no way of knowing that. It just assumes that you gave it an 8 bit clip and processes it accordingly. Since you pass lsb=true to MDegrainN, it tries to convert the already 16 bit stacked clip to 16 bit stacked again. The result is that your video is now 4 times its normal height! With the f3kdb call, the video is back to double height, and with the DitherPost call, it is back to normal height. A solution can be to call DitherPost() before all the MVTools calls (MSuper, MAnalyse and MDegrainN) inside your MDegrainLight function. |
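To make the height doubling concrete, here is a small sketch of the stacked layout: the MSB plane on top, the LSB plane underneath, stored as an ordinary 8-bit frame of twice the height. `Plane8`, `stack16` and `naive_lsb_output` are invented names for this illustration. The second function is only a model of a filter that believes its input is 8-bit and emits stacked 16-bit; it is not what MDegrainN actually computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An 8-bit plane as the script environment sees it (hypothetical type for this sketch).
struct Plane8 {
    int width;
    int height;
    std::vector<std::uint8_t> pix;  // row-major, width * height bytes
};

// Store 16-bit samples as "stacked" 8-bit: MSB plane on top, LSB plane below.
Plane8 stack16(const std::vector<std::uint16_t>& samples, int width, int height) {
    const std::size_t n = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(samples.size() == n);
    Plane8 out{width, 2 * height, std::vector<std::uint8_t>(2 * n)};
    for (std::size_t i = 0; i < n; ++i) {
        out.pix[i]     = static_cast<std::uint8_t>(samples[i] >> 8);    // top half: MSB
        out.pix[n + i] = static_cast<std::uint8_t>(samples[i] & 0xFF);  // bottom half: LSB
    }
    return out;
}

// Model of a filter that assumes plain 8-bit input and outputs stacked 16-bit.
// Fed an already stacked clip, it stacks again and doubles the height a second time.
Plane8 naive_lsb_output(const Plane8& in) {
    std::vector<std::uint16_t> promoted(in.pix.size());
    for (std::size_t i = 0; i < in.pix.size(); ++i)
        promoted[i] = static_cast<std::uint16_t>(in.pix[i] << 8);
    return stack16(promoted, in.width, in.height);
}
```

With a 720x480 source this gives 960 rows after the first stacking and 1920 rows after the naive second pass, which matches the 4x height described above.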
|
|