Doom9's Forum > Capturing and Editing Video > Avisynth Development

2nd August 2017, 17:59   #361
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Indeed it crashes with the combination of BlkSize and Overlap. If I set Overlap=0, it doesn't crash.

Performance-wise, BlkSize=8 runs at 51% CPU with 8 threads (1080p with 0 overlap)
Code:
FPS (min | max | average):      1.308 | 101671 | 8.866
Memory usage (phys | virt):     1531 | 1537 MiB
Thread count:                   29
CPU usage (average):            51%
BlkSize=16 runs at 39% CPU and is much slower
Code:
FPS (min | max | average):      0.313 | 83516 | 3.048
Memory usage (phys | virt):     1429 | 1431 MiB
Thread count:                   29
CPU usage (average):            39%
BlkSize=12 is even slower
Code:
FPS (min | max | average):      0.294 | 93537 | 2.691
Memory usage (phys | virt):     1387 | 1389 MiB
Thread count:                   29
CPU usage (average):            39%
BlkSize=8 with DCT=0 runs at 64% CPU and is only about 1.7 times as fast as DCT=1 (15.34 vs 8.866 fps average)
Code:
FPS (min | max | average):      2.481 | 97435 | 15.34
Memory usage (phys | virt):     1528 | 1540 MiB
Thread count:                   29
CPU usage (average):            64%
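For a quick comparison, the relative speeds can be computed directly from the average FPS figures posted above. This is only arithmetic on the reported numbers; the dictionary keys are made-up labels for illustration.

```python
# Average FPS values copied from the benchmark output above
# (1080p, 8 threads, overlap 0).
avg_fps = {
    "blksize8_dct1": 8.866,
    "blksize16_dct1": 3.048,
    "blksize12_dct1": 2.691,
    "blksize8_dct0": 15.34,
}


def speedup(fast: str, slow: str) -> float:
    """Ratio of average FPS between two configurations."""
    return avg_fps[fast] / avg_fps[slow]


for other in ("blksize16_dct1", "blksize12_dct1"):
    print(f"BlkSize=8 vs {other}: {speedup('blksize8_dct1', other):.2f}x")
print(f"DCT=0 vs DCT=1 at BlkSize=8: {speedup('blksize8_dct0', 'blksize8_dct1'):.2f}x")
```

On these numbers, BlkSize=8 is roughly 2.9x and 3.3x as fast as BlkSize=16 and BlkSize=12, and dropping DCT=1 gains about 1.7x.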
2nd August 2017, 18:40   #362
manolito
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
Which version of fftw3.dll are you guys using? The plugin archive comes with three different versions (float, double and long double), and after doing some research I always use the float version. Could different versions be responsible for the speed differences with DCT=1?
2nd August 2017, 18:42   #363
StainlessS
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
I already posted about it somewhere; pinterf fixed it in his later builds.
In some builds, Overlap has to be 0 for the final use of the vectors.

Something about it here:

https://forum.doom9.org/showthread.p...84#post1785084
https://forum.doom9.org/showthread.p...99#post1785099
https://forum.doom9.org/showthread.p...95#post1785795

EDIT:
Mani: libfftw3f-3.dll. I also use a copy renamed to fftw3.dll (although some plugins were fixed to accept either name).
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 2nd August 2017 at 18:45.
2nd August 2017, 19:07   #364
poisondeathray
Registered User
Join Date: Sep 2007
Posts: 5,374
Does what feisty2 said about the vpy version have anything to do with it?

Quote:
Originally Posted by feisty2
mvtools got this little "temporal" parameter that requires a linear frame request, it's incompatible with multi threading and got removed in the VS versions (both jackoneill's and mine)
Maybe it was on in ur avs mt mess and got the shit all fucked up
https://forum.doom9.org/showthread.p...96#post1813996

The error message I got when testing the script in post #359 seemed related, something about threading being incompatible, blah blah. Sorry, I don't have the exact error message right now. The vpy version didn't crash with the same settings.

If so, would it be a better approach to modify mvtools-pfmod and use the settings without restrictions? Is it even possible, and what would be the pros/cons of doing it? E.g., would it "break" other things?
2nd August 2017, 19:10   #365
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
MvTools2 has bugs to fix, but pinterf is still on holiday for the next 2 weeks and you won't hear from him until then.
23rd August 2017, 17:06   #366
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
From what I understand, SAD (sum of absolute differences) calculation is central to MvTools2. It takes a large amount of data in, does very simple, uniform processing on it, and returns only the sum.

This is where GPU processing would excel, especially with DCT=1, if anyone wants to take that on as a project.
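For readers unfamiliar with the term, a block SAD is the sum of absolute per-pixel differences between a block in the current frame and a candidate block, displaced by a motion vector, in the reference frame. A minimal pure-Python sketch, assuming 8-bit luma stored as lists of rows; `block_sad` and this data layout are illustrative only, not MvTools2's API.

```python
from typing import List

Plane = List[List[int]]  # rows of 8-bit luma samples


def block_sad(cur: Plane, ref: Plane, bx: int, by: int,
              dx: int, dy: int, size: int = 8) -> int:
    """SAD between the size x size block at (bx, by) in `cur` and the block
    displaced by the motion vector (dx, dy) in `ref`.

    No bounds handling is done: the caller must keep the displaced block
    inside `ref` (negative indices would silently wrap in Python). In
    MvTools2 the frame padding set up by MSuper serves that purpose.
    """
    total = 0
    for y in range(size):
        cur_row = cur[by + y]
        ref_row = ref[by + dy + y]
        for x in range(size):
            total += abs(cur_row[bx + x] - ref_row[bx + dx + x])
    return total
```

Because every absolute difference is independent of the others, the block sum is a plain parallel reduction, which is the property that makes SAD a natural fit for GPUs.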
24th August 2017, 04:22   #367
burfadel
Registered User
Join Date: Aug 2006
Posts: 2,229
To overcome transfer inefficiency, the more work done on the GPU before transferring back, the better: transfer to the GPU, run a series of commands, then transfer back.

pinterf, any chance of porting over the 'STAR' search method from x265 (itself a port)? It has the effectiveness of exhaustive search and is about as fast as UMH, sometimes faster. Additionally, it may not even be optimised to its full extent for AVX, AVX2 etc., so x265 could potentially benefit as well.

Last edited by burfadel; 24th August 2017 at 19:16.
24th August 2017, 19:01   #368
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Is SVPFlow1's source code available somewhere? If someone wants to look into adding some GPU acceleration, using code from that library would be a good place to start -- but I can't find it anywhere.
24th August 2017, 19:45   #369
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Linking to some bugs here
24th August 2017, 22:11   #370
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Here's the source code of SVPFlow1, which uses GPU acceleration:
https://www.svp-team.com/files/gpl/svpflow1-src.zip
25th August 2017, 14:57   #371
pinterf
Registered User
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by MysteryX
Does DCT=1 have the same issues with BlkSize=8 that doesn't use FFTW?

Code:
ColorBarsHD()
ConvertToYV12()
jm_fps()
Prefetch(8)

function jm_fps(clip source, float "fps")
{
	fps = default(fps, 60)
	fps_num = int(fps * 1000)
	fps_den = 1000
	
	prefiltered = RemoveGrain(source, 22)
	super = MSuper(source, hpad = 16, vpad = 16, levels = 1) # one level is enough for MRecalculate
	superfilt = MSuper(prefiltered, hpad = 16, vpad = 16) # all levels for MAnalyse
	backward = MAnalyse(superfilt, isb = true, blksize = 8, overlap = 4, search = 3, dct = 1)
	forward = MAnalyse(superfilt, isb = false, blksize = 8, overlap = 4, search = 3, dct = 1)
	forward_re = MRecalculate(super, forward, blksize = 4, overlap = 2, thSAD = 100)
	backward_re = MRecalculate(super, backward, blksize = 4, overlap = 2, thSAD = 100)
	out = MFlowFps(source, super, backward_re, forward_re, num = fps_num, den = fps_den, blend = false, ml = 200, mask = 2)
	
	return out
}
Code:
Exception 0xC0000005 [STATUS_ACCESS_VIOLATION]
Module:   C:\Windows\SysWOW64\KernelBase.dll
Address:  0x76C3A9F2
WOOPS!!
WOOPS indeed!
I spent a day on this issue. Thanks for the script; the memory exception appeared immediately. After catching the exception, it turned out that an invalid motion vector resulted in a memory read past the bottom line of the frame. What? Nonzero motion vectors for a static colorbar clip? It finally turned out that in MT mode, real (nonzero) motion vectors were sometimes generated, even for a blank black clip.

The problem is that the assembly code that calculates the integer DCT for 8x8 blocks is _not_ thread-safe. It has a single internal buffer of 8x8 words, and threads may use it in parallel, messing up each other's intermediate calculations.

At least, this is my assumption. After I added a new buffer parameter to this assembly routine (a different buffer for each filter instance), the problem disappeared and the results became consistent (same input -> same output).
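The bug pattern and the fix described above can be illustrated in Python: a routine that writes intermediate results into one module-level scratch buffer is unsafe when several threads call it, while passing in a buffer owned by the caller removes the sharing. This is a hypothetical sketch; `transform_shared`, `transform`, `BlockTransformer` and the toy "double and sum" computation stand in for the real 8x8 integer DCT assembly, and the fix only holds as long as no two threads share one instance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

Block = List[List[int]]

# Unsafe pattern: one scratch buffer shared by every caller.
_SHARED_SCRATCH: Block = [[0] * 8 for _ in range(8)]


def transform_shared(block: Block) -> int:
    """Toy stand-in for the DCT: fill the scratch buffer, then reduce it.
    Another thread may overwrite _SHARED_SCRATCH between the two steps."""
    for y in range(8):
        for x in range(8):
            _SHARED_SCRATCH[y][x] = block[y][x] * 2
    return sum(sum(row) for row in _SHARED_SCRATCH)


def transform(block: Block, scratch: Block) -> int:
    """Same computation, but the caller supplies the working buffer."""
    for y in range(8):
        for x in range(8):
            scratch[y][x] = block[y][x] * 2
    return sum(sum(row) for row in scratch)


class BlockTransformer:
    """Each instance owns its scratch buffer (think: one per filter instance).
    Sharing one instance across threads would reintroduce the race."""

    def __init__(self) -> None:
        self.scratch: Block = [[0] * 8 for _ in range(8)]

    def __call__(self, block: Block) -> int:
        return transform(block, self.scratch)


def run_parallel(blocks: List[Block], workers: int = 8) -> List[int]:
    """Process blocks on a thread pool, one BlockTransformer per worker thread."""
    local = threading.local()

    def work(block: Block) -> int:
        if not hasattr(local, "transformer"):
            local.transformer = BlockTransformer()
        return local.transformer(block)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, blocks))
```

With a private buffer per worker, repeated runs give identical results, which matches the "same input -> same output" check described above.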
25th August 2017, 15:04   #372
GMJCZP
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
Hello pinterf, what happened with what I posted in post #353? Thanks.
__________________
By law and justice!

GMJCZP's Arsenal
25th August 2017, 15:26   #373
pinterf
Registered User
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by GMJCZP
Hello pinterf, what happened to what I put in the post #353. Thanks.
"I noticed that the parameter search = 5 (Umh) produces identical results as search = 4 (Hex). In x264 the results are different."
UMH starts with a cross search, then searches a "ring" of radius 4 (or rings of radius 4, 8, ..., depending on the search parameter), and finally uses hex search.

Looking at the code, the two methods are different, though there is a comment saying "// my mod: do not shift the center after Cross".
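To make the terminology concrete, here is a minimal sketch of a generic hexagon search of the kind used as the final stage: step a six-point large hexagon until its center is the cheapest point, then refine with a small diamond. The patterns are the common x264-style ones and are my assumption, not MvTools2's code; UMH would run its cross and ring stages before this, and the toy quadratic cost stands in for a block SAD.

```python
from typing import Callable, Iterable, Tuple

Vec = Tuple[int, int]
Cost = Callable[[Vec], float]

# Six-point large hexagon around the current center (x264-style pattern).
LARGE_HEX = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
# Four-point small diamond used for the final refinement.
SMALL_DIAMOND = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def pattern_search(cost: Cost, start: Vec, pattern: Iterable[Vec],
                   max_iter: int = 64) -> Vec:
    """Repeatedly move to the cheapest point of `pattern` around the center,
    stopping when the center itself is cheapest (or after max_iter steps)."""
    offsets = list(pattern)
    center, center_cost = start, cost(start)
    for _ in range(max_iter):
        best, best_cost = center, center_cost
        for dx, dy in offsets:
            cand = (center[0] + dx, center[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
        if best == center:  # no neighbour improves: local minimum for this pattern
            break
        center, center_cost = best, best_cost
    return center


def hex_search(cost: Cost, start: Vec = (0, 0)) -> Vec:
    """Coarse large-hexagon descent followed by small-diamond refinement."""
    coarse = pattern_search(cost, start, LARGE_HEX)
    return pattern_search(cost, coarse, SMALL_DIAMOND)


# Toy cost with its minimum at the "true" motion vector (5, -3).
toy_cost: Cost = lambda v: (v[0] - 5) ** 2 + (v[1] + 3) ** 2
print(hex_search(toy_cost), hex_search(toy_cost, (20, 20)))
```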
25th August 2017, 19:34   #374
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Welcome back! Could you look into the "MRecalculate: wrong pixel type" error I'm getting when combining FRC with mClean, as a priority? This one should be easy to fix, and it is blocking me.
25th August 2017, 21:12   #375
GMJCZP
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
Quote:
Originally Posted by pinterf
Umh starts with cross search then with a "ring" of radius 4 (or rings with radius 4, 8, ... depending on the search param), and is finally using hex search.

The two methods are different by looking at the code, though there is a comment saying that "// my mod: do not shift the center after Cross"
What?
25th August 2017, 22:08   #376
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
It seems MvTools2 is the plugin everybody uses yet nobody understands and nobody wants to fix. There are tons of bugs that have been there since the start and that were never fixed.

As for the MT issues with DCT=1, the VapourSynth version works perfectly fine, so the fftw3 library isn't responsible for the issues.
28th August 2017, 07:48   #377
pinterf
Registered User
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by MysteryX
It seems MvTools2 is the plugin everybody uses yet nobody understands and nobody wants to fix. There are tons of bugs that have been there since the start and that were never fixed.
Good. You can start fixing those tons of bugs... If you want, of course.
28th August 2017, 15:24   #378
MysteryX
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Quote:
Originally Posted by pinterf
Good. You can start fixing those tons of bugs... If you want, of course.
I tried but gave up before getting local compilation to work.
30th August 2017, 10:41   #379
pinterf
Registered User
Join Date: Jan 2014
Posts: 2,314
Back-from-the-holiday edition.

Download MvTools2 2.7.22 with depans

Code:
- 2.7.22 (20170830)
  Misc: Stop using version suffix .22
  Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
  Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
  Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)       
  Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
  Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
This release fixes some ancient issues.

Multithreaded scripts using the integer DCT path (8x8 block size, 8 bits) now produce identical results on every run. Until now, results differed because of the single shared internal buffer, which was overwritten and used by different threads simultaneously. With special input clips (like ColorBarsHD) it resulted in an access violation. Thanks to MysteryX for the report and the script.

When analysing with integer DCT, the x64 version now gives the same result as the 32-bit version. Previously it was different and wrong, because the assembly code did not save the xmm6 and xmm7 registers, which is mandatory on x64 (they are nonvolatile in the Windows x64 calling convention). This was a very hard-to-find problem.
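The changelog line about the sqrt(2)/2 constant is easy to quantify. The sketch below compares the old 0.707 with the exact value and scales the error to the full range of 8-bit and 16-bit samples. Treating one multiplication of a full-scale sample as the worst case is a simplifying assumption, not an analysis of the actual fftw conversion code.

```python
import math

OLD = 0.707
EXACT = math.sqrt(2) / 2  # 0.70710678...

abs_err = EXACT - OLD
rel_err = abs_err / EXACT


def err_in_lsb(bits: int) -> float:
    """Error, in least-significant-bit steps, when scaling a full-scale sample."""
    full_scale = (1 << bits) - 1
    return full_scale * abs_err


for bits in (8, 16):
    print(f"{bits}-bit: {err_in_lsb(bits):.3f} LSB")
print(f"relative error: {rel_err:.2e}")
```

Under this assumption the old constant is off by well under one step at 8 bits but by about 7 steps at 16 bits, which fits the note that 16-bit formats may benefit.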

Last edited by pinterf; 30th August 2017 at 10:43.
30th August 2017, 15:23   #380
yup
Registered User
Join Date: Feb 2003
Location: Russia, Moscow
Posts: 854
Quote:
Originally Posted by yup
Thanks pinterf for the update.
Now some of my scripts work stably.
But I see strange behaviour when opening scripts in VirtualDubMod: if my script has an error related to MVTools functions, I do not see an error message; instead VirtualDub hangs and stops responding.
I cannot close the video in VirtualDub, only close VirtualDub itself.
Job control also does not work.
Please advise.

yup.
pinterf,
with the latest update, the issue is gone.
yup.

All times are GMT +1.