2nd August 2017, 17:59 | #361 | Link |
Soul Architect
Join Date: Apr 2014
Posts: 2,559
|
Indeed it crashes with the combination of BlkSize and Overlap. If I set Overlap=0, it doesn't crash.
Performance-wise, BlkSize=8 runs at 51% CPU with 8 threads (1080p with 0 overlap)
Code:
FPS (min | max | average): 1.308 | 101671 | 8.866
Memory usage (phys | virt): 1531 | 1537 MiB
Thread count: 29
CPU usage (average): 51%
Code:
FPS (min | max | average): 0.313 | 83516 | 3.048
Memory usage (phys | virt): 1429 | 1431 MiB
Thread count: 29
CPU usage (average): 39%
Code:
FPS (min | max | average): 0.294 | 93537 | 2.691
Memory usage (phys | virt): 1387 | 1389 MiB
Thread count: 29
CPU usage (average): 39%
Code:
FPS (min | max | average): 2.481 | 97435 | 15.34
Memory usage (phys | virt): 1528 | 1540 MiB
Thread count: 29
CPU usage (average): 64% |
2nd August 2017, 18:40 | #362 | Link |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
|
Which version of fftw3.dll are you guys using? The plugin archive comes with three different versions (float, double and long double), and after doing some research I always use the float version. Could the different versions be responsible for the speed differences with DCT=1?
|
2nd August 2017, 18:42 | #363 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
|
Already posted about it somewhere, Pinterf fixed it in his later builds.
OverLap has to be 0 in some builds on the final use of vectors. Something about it here:
https://forum.doom9.org/showthread.p...84#post1785084
https://forum.doom9.org/showthread.p...99#post1785099
https://forum.doom9.org/showthread.p...95#post1785795
EDIT: Mani, libfftw3f-3.dll. I also use it renamed to fftw3.dll (although some plugins were fixed to use either name).
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 2nd August 2017 at 18:45. |
2nd August 2017, 19:07 | #364 | Link | |
Registered User
Join Date: Sep 2007
Posts: 5,374
|
Does what feisty said about the vpy version have anything to do with it?
Quote:
The error message I got when testing the one in post 359 seemed related, something about threading incompatible blah blah. Sorry I don't have the exact error message right now. The vpy version didn't crash with the same settings
If so, would it be a better approach to modify mvtools-pfmod and use the settings without restrictions? Is it even possible, and what would be the pros/cons of doing it? E.g. would it "break" other things? |
|
23rd August 2017, 17:06 | #366 | Link |
Soul Architect
Join Date: Apr 2014
Posts: 2,559
|
From what I understand, SAD calculation is central to MvTools2. It takes a large amount of data in, processes it in a very linear fashion, and returns only the sum.
This is where GPU processing would excel, especially with DCT=1, if anyone wants to take that on as a project. |
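For readers unfamiliar with the term, here is a minimal sketch of what a SAD (sum of absolute differences) over two pixel blocks looks like. It is not the MvTools2 implementation (which uses optimised assembly); the function name `sad` and the list-of-rows block layout are assumptions made only for this illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.

    Each block is a list of rows, each row a list of integer pixel values.
    (Hypothetical layout chosen for illustration only.)
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        # Many pixels go in, a single number comes out.
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
    return total


current = [[10, 20], [30, 40]]
reference = [[12, 18], [30, 45]]
print(sad(current, reference))
```

The "lots of data in, one sum out" shape is what makes the operation a candidate for parallel hardware: every per-pixel difference is independent, and only the final reduction has to be collected.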
24th August 2017, 04:22 | #367 | Link |
Registered User
Join Date: Aug 2006
Posts: 2,229
|
To overcome transfer inefficiency, the more that is done on the GPU before the transfer back, the better. More like: transfer to GPU, series of commands, transfer back.
PinterF, any chance of porting over the 'STAR' search method from x265 (itself a port)? It has the effectiveness of exhaustive search and is about as fast as UMH, sometimes faster. Additionally, it may not even be optimised to its full extent for AVX, AVX2 etc., so x265 could potentially benefit as well. Last edited by burfadel; 24th August 2017 at 19:16. |
24th August 2017, 22:11 | #370 | Link |
Soul Architect
Join Date: Apr 2014
Posts: 2,559
|
Here's the source code of SVPFlow1, which uses GPU acceleration:
https://www.svp-team.com/files/gpl/svpflow1-src.zip |
25th August 2017, 14:57 | #371 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
Spent a day on this issue. Thanks for the script, the memory exception appeared immediately. After catching the exception, it turned out that an invalid motion vector resulted in a memory read past the bottom line of the frame. What? Nonzero motion vectors for a static colorbar clip? Finally it turned out that in MT mode real motion vectors were sometimes generated, even for a blank black clip.
The problem is that the assembly code that calculates the integer DCT for 8x8 block sizes is _not_ thread safe. It has a single internal buffer of 8x8 words, and the threads may use it in parallel, thus messing up the internal calculations. At least this is my assumption. After providing a new buffer parameter for this assembly routine (a different one for each filter instance), the problem disappeared and the results became consistent (same input -> same output). |
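The fix described above (one working buffer per filter instance instead of a single shared one) can be sketched roughly as follows. This is only an illustration of the pattern, not the MvTools2 code: `BlockTransform`, `process` and `run_parallel` are hypothetical names, and the "transform" is a trivial stand-in for the real 8x8 integer DCT.

```python
import threading


class BlockTransform:
    """Hypothetical stand-in for a routine that needs an 8x8 scratch buffer."""

    def __init__(self):
        # One 64-entry working buffer per instance. Nothing here is shared
        # between instances, so threads cannot overwrite each other's data.
        self._scratch = [0] * 64

    def process(self, block):
        # Stage 1: write intermediate values into the private buffer.
        for i, value in enumerate(block):
            self._scratch[i] = value * 2
        # Stage 2: read them back. With a shared buffer, another thread
        # could have modified it between the two stages.
        return sum(self._scratch)


def run_parallel(blocks):
    """Process each 64-value block on its own thread with its own instance."""
    results = [None] * len(blocks)

    def worker(index, block):
        transform = BlockTransform()  # private buffer for this worker
        results[index] = transform.process(block)

    threads = [
        threading.Thread(target=worker, args=(i, b))
        for i, b in enumerate(blocks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


blocks = [[i] * 64 for i in range(8)]
print(run_parallel(blocks))
```

Because each worker owns its scratch space, the output depends only on the input block, which matches the "same input -> same output" behaviour reported after the fix.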
|
25th August 2017, 15:26 | #373 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
UMH starts with a cross search, then continues with a "ring" of radius 4 (or rings with radius 4, 8, ... depending on the search param), and finally uses hex search. Judging by the code, the two methods are different, though there is a comment saying "// my mod: do not shift the center after Cross" |
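As a rough illustration of the stage order described above (cross, then rings, then hexagon refinement), here is a small sketch. It is not taken from MvTools2 or x265: the exact point patterns, the square shape of the rings, keeping the rings centred on the start point, and the names `staged_search` and `max_radius` are all assumptions made for this example.

```python
def cross_offsets(radius):
    """Horizontal and vertical candidates up to the given distance."""
    offsets = []
    for d in range(1, radius + 1):
        offsets += [(d, 0), (-d, 0), (0, d), (0, -d)]
    return offsets


def ring_offsets(radius):
    """Points on a square ring at the given distance (assumed ring shape)."""
    span = range(-radius, radius + 1)
    return [(dx, dy) for dx in span for dy in span
            if max(abs(dx), abs(dy)) == radius]


# Six-point hexagon pattern (assumed layout for this sketch).
HEX_OFFSETS = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]


def staged_search(cost, start=(0, 0), max_radius=4):
    """Return (best_vector, best_cost) after cross, ring and hex stages."""
    best, best_cost = start, cost(start)

    def try_offsets(center, offsets):
        nonlocal best, best_cost
        for dx, dy in offsets:
            candidate = (center[0] + dx, center[1] + dy)
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c

    # Stage 1: cross search around the start point.
    try_offsets(start, cross_offsets(max_radius))
    # Stage 2: rings of radius 4, 8, ... still centred on the start point.
    for radius in range(4, max_radius + 1, 4):
        try_offsets(start, ring_offsets(radius))
    # Stage 3: hexagon refinement, repeated until the best vector stops moving.
    while True:
        center = best
        try_offsets(center, HEX_OFFSETS)
        if best == center:
            break
    return best, best_cost


# Toy cost function with its minimum at (4, 4), standing in for a SAD lookup.
print(staged_search(lambda v: abs(v[0] - 4) + abs(v[1] - 4)))
```

A sparse pattern like this checks far fewer candidates than an exhaustive search, which is also why it can miss the true minimum on some inputs.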
|
25th August 2017, 21:12 | #375 | Link | |
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
Quote:
|
|
25th August 2017, 22:08 | #376 | Link |
Soul Architect
Join Date: Apr 2014
Posts: 2,559
|
It seems MvTools2 is the plugin everybody uses yet nobody understands and nobody wants to fix. There are tons of bugs that have been there since the start and that were never fixed.
As for MT issues with DCT=1, the VapourSynth version works perfectly fine so the ff3d library isn't responsible for the issues. |
30th August 2017, 10:41 | #379 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Back-from-the-holiday edition.
Download MvTools2 2.7.22 with depans
Code:
- 2.7.22 (20170830)
  Misc: Stop using version suffix .22
  Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
  Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
  Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
  Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was: 0.707), 16 bit formats may benefit (by feisty2)
  Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
Multithreaded scripts using the integer DCT path (8x8 block size, 8 bits) now produce identical results on each run. So far they were different because of the single common internal buffer, which got overwritten and used by different threads simultaneously. With special input clips (like ColorbarsHD) it resulted in an access violation. Thanks to MysteryX for the report and the script.
After an analysis with integer DCT, the x64 version is now giving the same result as the 32-bit version. Previously it was different and wrong, because the assembly code did not save the xmm6 and xmm7 registers, which is compulsory on x64. This one was a very hard-to-find problem. Last edited by pinterf; 30th August 2017 at 10:43. |
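To give a feel for why the more accurate sqrt(2)/2 constant "may benefit" 16 bit formats, here is a small worked calculation. It only shows the size of the rounding gap between the two constants when multiplied by a peak sample value; how the constant is actually applied inside the fftw conversion is not described in the changelog, so the function `scaling_error` is purely a hypothetical illustration.

```python
import math

EXACT = math.sqrt(2) / 2   # about 0.70710678
ROUNDED = 0.707            # the old constant from the changelog


def scaling_error(value):
    """Difference between scaling a value by the exact and rounded constants."""
    return value * EXACT - value * ROUNDED


for label, peak in (("8-bit", 255), ("16-bit", 65535)):
    print(f"{label} peak {peak}: error {scaling_error(peak):.4f} code values")
```

At an 8-bit peak the gap is a small fraction of one code value (about 0.03), while at a 16-bit peak it grows to roughly 7 code values, which is large enough to show up in the output.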
30th August 2017, 15:23 | #380 | Link | |
Registered User
Join Date: Feb 2003
Location: Russia, Moscow
Posts: 854
|
Quote:
With the last update the issue is gone. Yup. |
|