4th July 2017, 01:21   #321
feisty2
Your original post was "asm shit is fast", and I been saying, intrinsics are equally fast, without having to use an assembler
Obviously you realized that, then you changed your point to, "you can't automatically convert raw asm to intrinsics"
Get a room with Katie already, troll

4th July 2017, 01:45   #322
Groucho2004
Quote:
Originally Posted by feisty2 View Post
Your original post was "asm shit is fast", and I been saying, intrinsics are equally fast, without having to use an assembler
Obviously you realized that, then you changed your point to, "you can't automatically convert raw asm to intrinsics"
Get a room with Katie already, troll
I didn't change my point, you missed it. Also, you didn't write "intrinsics are equally fast, without having to use an assembler", you wrote "use intrinsics instead" which is very vague and implies that this could be done instantly.

What's wrong with using an assembler? Do you realize that for example the speed of libx264 is based on its highly efficient asm code?

My point is that there is perfectly good and fast asm code in mvtools2 (32 and 64 bit). Having this converted to intrinsics would be good but it's a lot of work.

4th July 2017, 13:04   #323
tebasuna51
Moderator
Quote:
Originally Posted by feisty2 View Post
...
Get a room with Katie already, troll
Please guys, stop this.

The question is clear, stop the discussion.

4th July 2017, 17:08   #324
Groucho2004
Quote:
Originally Posted by tebasuna51 View Post
Please guys, stop this.

The question is clear, stop the discussion.
I'm not sure to which question you're referring, Mystery's or feisty's. Either way, nothing wrong with having a discussion. I still don't quite understand feisty's troll accusation but I suppose there was some kind of misinterpretation of something I posted...

4th July 2017, 17:42   #325
TheFluff
The main benefit of intrinsics over handwritten assembly is that it's easier to write and maintain, as well as easier to integrate into your C++ stuff (such as templates - a lot of the VS multi-bitdepth stuff uses templated intrinsics). A minor bonus is that you don't need a separate assembler in addition to your regular compiler. However, if you already have a bunch of well tested and functioning .asm (in separate files, not some inline monstrosity pain in the rear) and that you have no intention of changing, then porting to intrinsics is just a lot of busy-work that's probably going to introduce a lot of new and exciting bugs. Not even the VS port of MVTools has gotten rid of all the .asm files, because there was simply no need. New code has been ported to intrinsics though.
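As a rough illustration of the template point, the sketch below writes a block SAD once and instantiates it for 8-bit and 16-bit pixels. It is plain C++ without intrinsics, and the name `block_sad`, its parameters and the stride-in-samples convention are hypothetical, not taken from MVTools or VapourSynth. In real code the inner loop would hold the intrinsics, and the template parameters would select the variant at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: sum of absolute differences over a BlockW x BlockH
// block. PixelT selects the bit depth (uint8_t for 8-bit, uint16_t for
// 10..16-bit video). Strides are counted in samples, not bytes.
template <typename PixelT, int BlockW, int BlockH>
std::uint64_t block_sad(const PixelT* src, std::ptrdiff_t src_stride,
                        const PixelT* ref, std::ptrdiff_t ref_stride)
{
    static_assert(std::is_unsigned<PixelT>::value,
                  "pixel type must be an unsigned integer");
    assert(src != nullptr && ref != nullptr);

    std::uint64_t sum = 0;
    for (int y = 0; y < BlockH; ++y) {
        for (int x = 0; x < BlockW; ++x) {
            const PixelT a = src[x];
            const PixelT b = ref[x];
            // Subtract the smaller from the larger so the result is never negative.
            sum += static_cast<std::uint64_t>(a > b ? a - b : b - a);
        }
        src += src_stride;
        ref += ref_stride;
    }
    return sum;
}
```

A caller would pick the variant by type and block size, for example `block_sad<std::uint8_t, 8, 8>(src, pitch, ref, pitch)` for 8-bit video and `block_sad<std::uint16_t, 16, 16>(...)` for high bit depth, without duplicating the loop.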

4th July 2017, 19:12   #326
Groucho2004
Quote:
Originally Posted by TheFluff View Post
However, if you already have a bunch of well tested and functioning .asm (in separate files, not some inline monstrosity pain in the rear) and that you have no intention of changing, then porting to intrinsics is just a lot of busy-work that's probably going to introduce a lot of new and exciting bugs. Not even the VS port of MVTools has gotten rid of all the .asm files, because there was simply no need.
That was exactly my point. I think pinterf replaced most (if not all) inline asm with intrinsics so the remaining problem seems to be that some people have trouble producing a few .obj files using yasm/nasm. It's just bizarre.

10th July 2017, 08:40   #327
yup
Strange behaviour with the latest MVTools

pinterf, thanks for the update.
Now some of my scripts work stably.
But I see strange behaviour when opening scripts in VirtualDubMod: if I write a script with an error related to MVTools functions, I do not see an error message. Instead, VirtualDub hangs and stops responding.
I cannot close the video in VirtualDub; I can only close VirtualDub itself.
Job control also does not work.
Please advise.

yup.

11th July 2017, 09:45   #328
pinterf
Quote:
Originally Posted by feisty2 View Post
any particular reason you cant just get rid of all that asm shit and use intrinsics instead?
Because the porting is done in my free time which is limited.

Asm vs intrinsics.

SAD and SATD code (which are the most important routines regarding mvtools2 speed) written in intrinsics is _much_ slower than the existing asm; I'm talking about the VS2015/2017 code generator here.

I have experienced the opposite case as well when the generated code from intrinsics is faster than the original asm (experienced in FFT3DFilter and TIVTC) perhaps because of smarter instruction ordering. Even a C version can be faster than the old asm (TIVTC).

I usually have a look at the generated assembler code of the intrinsics.

There are cases when the optimizer uses too many xmm registers, so the prolog/epilog register save/restore (which we cannot control) takes significant time relative to the actual task, as experienced in the 16 bit SAD intrinsics routines. I had to play with less-than-optimal loop unrolling until I found the fastest variant for a particular SAD blocksize.
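For readers unfamiliar with the routine being discussed, here is a minimal sketch of an 8-bit SAD (sum of absolute differences) over a 16x16 block, once in plain C++ and once with SSE2 intrinsics. This is not the mvtools2 code: the names `sad16x16_c`, `sad16x16_sse2` and `sad16x16`, the pitch-based signature and the `SAD_HAVE_SSE2` macro are assumptions made for illustration, and the intrinsics path is only compiled on x86 targets that advertise SSE2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SAD_HAVE_SSE2 1
#else
#define SAD_HAVE_SSE2 0
#endif

// Plain C++ reference: sum of |src - ref| over a 16x16 block of 8-bit pixels.
// Pitches are in bytes and may be larger than the block width.
inline unsigned sad16x16_c(const std::uint8_t* src, std::ptrdiff_t src_pitch,
                           const std::uint8_t* ref, std::ptrdiff_t ref_pitch)
{
    assert(src != nullptr && ref != nullptr);
    unsigned sum = 0;
    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x)
            sum += static_cast<unsigned>(std::abs(int(src[x]) - int(ref[x])));
        src += src_pitch;
        ref += ref_pitch;
    }
    return sum;
}

#if SAD_HAVE_SSE2
// SSE2 version: one _mm_sad_epu8 (PSADBW) per 16-pixel row.
inline unsigned sad16x16_sse2(const std::uint8_t* src, std::ptrdiff_t src_pitch,
                              const std::uint8_t* ref, std::ptrdiff_t ref_pitch)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; ++y) {
        // Unaligned loads, so callers need not guarantee 16-byte alignment.
        const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ref));
        // _mm_sad_epu8 yields two partial sums, one in each 64-bit half.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
        src += src_pitch;
        ref += ref_pitch;
    }
    // Fold the upper 64-bit half onto the lower one and extract the total.
    const __m128i hi = _mm_srli_si128(acc, 8);
    return static_cast<unsigned>(_mm_cvtsi128_si32(_mm_add_epi32(acc, hi)));
}
#endif

// Dispatcher: uses the SSE2 path when it was compiled in, else the reference.
inline unsigned sad16x16(const std::uint8_t* src, std::ptrdiff_t src_pitch,
                         const std::uint8_t* ref, std::ptrdiff_t ref_pitch)
{
#if SAD_HAVE_SSE2
    return sad16x16_sse2(src, src_pitch, ref, ref_pitch);
#else
    return sad16x16_c(src, src_pitch, ref, ref_pitch);
#endif
}
```

Whether a loop like this ends up faster or slower than hand-written asm depends on what the compiler generates for it, which is why inspecting the generated assembler, as described above, is the only reliable check.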

11th July 2017, 10:11   #329
pinterf
Quote:
Originally Posted by MysteryX View Post
DCT=0 is "fine" with 68% CPU usage. DCT=1 is what gives problems with multi-threading with 37% CPU usage, choppy playback, and occasional freezes -- but DCT=1 is definitely better than before the recent Pinterf fix!
DCT=1 is using integer arithmetic for 8 bit video and 8x8 block sizes.
In all other cases (such as for block size 16x16) the routines from the FFTW3 library are used.
I don't know which fftw3 version you are using (I can see 3.3.6 as the latest one at http://www.fftw.org/ ); perhaps you could try comparing different versions.

12th July 2017, 05:21   #330
MysteryX
Quote:
Originally Posted by pinterf View Post
I don't know which fftw3 version you are using (I can see 3.3.6 as the latest one at http://www.fftw.org/ ); perhaps you could try comparing different versions.
I don't know which version, but it is from March 2014. I'll try the latest and see how it behaves.

Still the same problem. Stuck at 37% CPU usage.

It is the libfftw3f-3.dll file in C:\Windows\SysWOW64, correct?

If I use BlkSize=8, I get 47% CPU usage.


13th July 2017, 14:04   #331
shae
What's supposed to happen if FFTW is missing?

QTGMC seems to work without it. Is it because MvTools2 doesn't always need it or something else?

And can it load libfftw3f-3.dll from the same directory as mvtools2.dll instead of the system dir?
AvsMeter says the FFTW DLL cannot be loaded, but maybe it only looks for it in the system directory.

13th July 2017, 14:45   #332
real.finder
Quote:
Originally Posted by shae View Post
What's supposed to happen if FFTW is missing?

QTGMC seems to work without it. Is it because MvTools2 doesn't always need it or something else?

And can it load libfftw3f-3.dll from the same directory as mvtools2.dll instead of the system dir?
AvsMeter says the FFTW DLL cannot be loaded, but maybe it only looks for it in the system directory.
yes, MvTools2 doesn't always need it

you can load FFTW DLL by using this, x64 here

13th July 2017, 23:09   #333
shae
I think I'll just go by "it's probably fine if the script loads, doesn't crash, and the beginning of the video looks okay".

20th July 2017, 03:28   #334
GMJCZP
When I use this hello_hello script (# 301) in 16 bits:
Quote:
tr = 1 # Temporal radius
mt = true # Internal multithreading
lsb = false # 16-bit
thSAD = 200 # denoising strength
blksize = 16 # block size
overlap = 4 # block overlap
super = MSuper (mt=mt)
multi_vec = MAnalyse (super, mt=mt, multi=true, blksize=blksize, overlap=overlap, delta=tr)
MDegrainN (super, multi_vec, tr, mt=mt, lsb=lsb, thSAD=thSAD, thSAD2=150)
When I set lsb = true, delta = 1 and tr > 1, I get artifacts.

20th July 2017, 04:50   #335
hello_hello
GMJCZP,
Delta and TR have to be the same. From the help file:

MDeGrainN has a temporal radius given by the tr parameter, and uses a special motion vector clip.
tr
Temporal radius, > 0. Must match the mvmulti content, i.e. the delta parameter in MAnalyse.

20th July 2017, 13:27   #336
GMJCZP
Problem solved.

20th July 2017, 16:04   #337
GMJCZP
Now I have another problem: the image is not displayed correctly unless I use f3kdb and DitherPost together. I'm still a rookie at this 16-bit stuff:

Code:
Dither_convert_8_to_16()
Temporalsoften(2,1,2,mode=2,scenechange=10)
dither_resize16(720,480,kernel="spline16",invks=true,invkstaps=3,src_left=0.0,u=3,v=3)
MDegrainLight(2,lsb=true,thSAD=200)
f3kdb(range=15, grainY=0, grainC=0, keep_tv_range=True, input_depth=16, output_depth=8)
DitherPost()

# MDegrainLight
# https://forum.doom9.org/showthread.php?p=1810543#post1810543
# Original idea by hello_hello

function MDegrainLight(clip input, int "tr", bool "mt", bool "lsb", int "thSAD", int "thSAD2", int "blksize", int "overlap")
{
tr = Default(tr, 1) # Temporal radius
mt = Default(mt, true) # Internal multithreading
lsb = Default(lsb, false) # 16-bit
thSAD = Default(thSAD, 200) # Denoising strength
thSAD2 = Default(thSAD2, 150)
blksize = Default(blksize, 16) # Block size
overlap = Default(overlap, 4) # Block overlap

super = input.MSuper (mt=mt)
multi_vec = MAnalyse (super, mt=mt, multi=true, blksize=blksize, overlap=overlap, delta=tr)
input.MDegrainN (super, multi_vec, tr, mt=mt, lsb=lsb, thSAD=thSAD, thSAD2=thSAD2)
return last
}
In fact, if I use dfttest(sigma=2, tbsize=1, lsb_in=true, lsb=true, Y=true, U=true, V=true, opt=3, dither=0) instead of MDegrain, DitherPost is no longer necessary.
Can anyone please explain whether DitherPost is redundant here, or whether my script is correct?

Or is there a way to use only one of them, either f3kdb or DitherPost?

20th July 2017, 19:04   #338
blaze077
1. Afaik, TemporalSoften does not support 16 bit stacked input, so you should apply it before the dither_convert_8_to_16 call.

2. I don't think MDegrainN takes in 16 bit stacked input. It can only output it using lsb=true. (No lsb_in parameter)

3. Ditherpost simply turns a 16 bit clip into an 8 bit clip. In your f3kdb call, you already output 8 bit video so you don't need ditherpost.

Alternatively, you could change f3kdb's output_depth to 16, and then ditherpost would work as expected.


20th July 2017, 21:07   #339
GMJCZP
1. Temporal soften does not have anything to do with the problem.
2. The script works perfectly as it is, the problem is if I use f3kdb and DitherPost together, I said it before.
3. I repeat it again, if I do not use DitherPost the video is poorly displayed.

Anyway thanks for the reply.

I repeat my question: is my script okay, or am I misusing DitherPost? I cannot get an f3kdb command on its own that displays the image correctly.

EDIT: I solved the problem, the MVTools documentation says:

Quote:
lsb

Generates 16-bit data when set to true. The picture made of the most significant bytes (MSB) is stacked on the top of the least significant byte (LSB) block. Hence a twice taller resulting picture. You can extract the MSB or the LSB with a simple Crop() call. This mode helps recovering the full bitdepth of temporally dithered data.
Then the definitive script looks like this:

Quote:
Dither_convert_8_to_16()
Temporalsoften(2,1,2,mode=2,scenechange=10)
dither_resize16(720,480,kernel="spline16",invks=true,invkstaps=3,src_left=0.0,u=3,v=3)
MDegrainLight(2,lsb=true,thSAD=200).Crop(0,0,0,960)
f3kdb(range=15, grainY=0, grainC=0, keep_tv_range=True, input_depth=16, output_depth=8)

# MDegrainLight
# https://forum.doom9.org/showthread.p...43#post1810543
# Original idea by hello_hello

function MDegrainLight(clip input, int "tr", bool "mt", bool "lsb", int "thSAD", int "thSAD2", int "blksize", int "overlap")
{
tr = Default(tr, 1) # Temporal radius
mt = Default(mt, true) # Internal multithreading
lsb = Default(lsb, false) # 16-bit
thSAD = Default(thSAD, 200) # Denoising strength
thSAD2 = Default(thSAD2, 150)
blksize = Default(blksize, 16) # Block size
overlap = Default(overlap, 4) # Block overlap

super = input.MSuper (mt=mt)
multi_vec = MAnalyse (super, mt=mt, multi=true, blksize=blksize, overlap=overlap, delta=tr)
input.MDegrainN (super, multi_vec, tr, mt=mt, lsb=lsb, thSAD=thSAD, thSAD2=thSAD2)
return last
}
In short, DitherPost was not necessary.

20th July 2017, 21:26   #340
blaze077
I just tried to run your script and the cause is indeed your MDegrainLight function.
As you know, 16 bit stacked is double the height of the normal video (MSB and LSB).
You pass a 16 bit stacked clip to MDegrainN, but MDegrain does not have any way of knowing that you passed a 16 bit stacked clip to it. It just assumes that you gave it an 8 bit clip and processes it accordingly.
Since you pass lsb=true to MDegrainN, it tries to convert the already 16 bit stacked clip to 16 bit stacked again.
The result is that your video is now 4 times its normal height!
With the f3kdb call, the video is back to double height and with the ditherpost call, it is back to normal height.

A solution can be to call DitherPost() before all the MVTools calls (MSuper, MAnalyse and MDegrainN) inside your MDegrainLight function.
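The height arithmetic above follows from how the stacked format stores data. The sketch below, using the hypothetical names `Plane8`, `to_stacked16`, `from_stacked16` and `widen_8bit_to_stacked16`, splits 16-bit samples into an MSB block on top of an LSB block, which is why a stacked clip is twice as tall. It also shows why a filter that treats an already stacked clip as ordinary 8-bit input, and is then asked for stacked output, produces four times the original height. It models a single plane with no row padding and is not the MVTools or Dither implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One plane stored as 8-bit rows. In the stacked format the top half holds
// the most significant bytes and the bottom half the least significant bytes.
struct Plane8 {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // width * height bytes, no row padding
};

// Split 16-bit samples into stacked MSB/LSB form: the output is twice as tall.
inline Plane8 to_stacked16(int width, int height,
                           const std::vector<std::uint16_t>& samples)
{
    assert(width > 0 && height > 0);
    assert(samples.size() == static_cast<std::size_t>(width) *
                                 static_cast<std::size_t>(height));
    Plane8 out{width, height * 2,
               std::vector<std::uint8_t>(samples.size() * 2)};
    const std::size_t lsb_offset = samples.size();  // LSB block sits below the MSB block
    for (std::size_t i = 0; i < samples.size(); ++i) {
        out.data[i] = static_cast<std::uint8_t>(samples[i] >> 8);
        out.data[lsb_offset + i] = static_cast<std::uint8_t>(samples[i] & 0xFF);
    }
    return out;
}

// Recombine a stacked plane into 16-bit samples (logical height is halved).
inline std::vector<std::uint16_t> from_stacked16(const Plane8& stacked)
{
    assert(stacked.height % 2 == 0);
    const std::size_t n = stacked.data.size() / 2;
    std::vector<std::uint16_t> samples(n);
    for (std::size_t i = 0; i < n; ++i)
        samples[i] = static_cast<std::uint16_t>((stacked.data[i] << 8) |
                                                stacked.data[n + i]);
    return samples;
}

// Models a filter that only understands 8-bit input but emits stacked 16-bit
// output: every input row, MSB or LSB, is treated as ordinary 8-bit pixels.
inline Plane8 widen_8bit_to_stacked16(const Plane8& in)
{
    std::vector<std::uint16_t> wide(in.data.size());
    for (std::size_t i = 0; i < in.data.size(); ++i)
        wide[i] = static_cast<std::uint16_t>(in.data[i] << 8);
    return to_stacked16(in.width, in.height, wide);
}
```

With a 720x480 source, `to_stacked16` gives a height of 960, and passing that result through `widen_8bit_to_stacked16` gives 1920, matching the four-times-height observation. Each later conversion from 16 to 8 bits halves the height again.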