MVTools-pfmod - Page 19

MysteryX · 2nd August 2017, 17:59

Indeed it crashes with the combination of BlkSize and Overlap. If I set Overlap=0, it doesn't crash.

Performance-wise, BlkSize=8 runs at 51% CPU with 8 threads (1080p with 0 overlap)

Code:

FPS (min | max | average):      1.308 | 101671 | 8.866
Memory usage (phys | virt):     1531 | 1537 MiB
Thread count:                   29
CPU usage (average):            51%

BlkSize=16 runs at 39% CPU and is much slower

Code:

FPS (min | max | average):      0.313 | 83516 | 3.048
Memory usage (phys | virt):     1429 | 1431 MiB
Thread count:                   29
CPU usage (average):            39%

BlkSize=12 is even slower

Code:

FPS (min | max | average):      0.294 | 93537 | 2.691
Memory usage (phys | virt):     1387 | 1389 MiB
Thread count:                   29
CPU usage (average):            39%

BlkSize=8 with DCT=0 runs at 64% CPU and is only about twice as fast as DCT=1

Code:

FPS (min | max | average):      2.481 | 97435 | 15.34
Memory usage (phys | virt):     1528 | 1540 MiB
Thread count:                   29
CPU usage (average):            64%

manolito · 2nd August 2017, 18:40

Which version of fftw3.dll are you guys using? The plugin archive comes with three different versions (float, double and long), and after doing some research I always use the float version. Could different versions be responsible for the speed differences with DCT=1?

StainlessS · 2nd August 2017, 18:42

Already posted about it somewhere, Pinterf fixed it in his later builds.
Overlap has to be 0 in some builds on final use of vectors.

Something about it here:

https://forum.doom9.org/showthread.p...84#post1785084
https://forum.doom9.org/showthread.p...99#post1785099
https://forum.doom9.org/showthread.p...95#post1785795

EDIT:
Mani, libfftw3f-3.dll. I also use it renamed to fftw3.dll (although some plugins were fixed to use either name).

poisondeathray · 2nd August 2017, 19:07

Does what feisty says about the vpy version have anything to do with it?

Quote:

Originally Posted by feisty2

mvtools got this little "temporal" parameter that requires a linear frame request, it's incompatible with multi threading and got removed in the VS versions (both jackoneill's and mine)
Maybe it was on in ur avs mt mess and got the shit all fucked up

https://forum.doom9.org/showthread.p...96#post1813996

The error message I got when testing the one in post 359 seemed related, something about threading incompatible blah blah. Sorry I don't have the exact error message right now. The vpy version didn't crash with the same settings

If so, would it be a better approach to modify mvtools-pfmod and use the settings without restrictions? Is it even possible, and what would be the pros and cons of doing it? E.g. would it "break" other things?

MysteryX · 2nd August 2017, 19:10

MvTools2 has bugs to fix, but Pinterf is still on holiday for the next 2 weeks and you won't hear from him until then.

MysteryX · 23rd August 2017, 17:06

From what I understand, SAD calculation is central to MvTools2. It takes a large amount of data in, does very linear processing, and returns only the sum.

This is where GPU processing would excel! Especially with DCT=1. If anyone wants to take that as a project.
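For anyone picking this up, the "lots of data in, one number out" shape of SAD can be sketched in a few lines. This is only an illustration in plain Python, not MvTools2's code: the function name `block_sad`, the list-of-rows frame layout and the toy frames are all assumptions made for the example.

```python
# Minimal SAD (sum of absolute differences) sketch in plain Python.
# Frames are lists of rows of integer luma samples; all names are illustrative.

def block_sad(cur, ref, x, y, dx, dy, blksize=8):
    """SAD between the block at (x, y) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for row in range(blksize):
        cur_row = cur[y + row]
        ref_row = ref[y + dy + row]
        for col in range(blksize):
            total += abs(cur_row[x + col] - ref_row[x + dx + col])
    return total


# Toy 16x16 frames: ref is cur shifted right by 2 pixels (left edge clamped).
cur = [[(3 * c + 5 * r) % 256 for c in range(16)] for r in range(16)]
ref = [[cur[r][max(c - 2, 0)] for c in range(16)] for r in range(16)]

zero_mv_sad = block_sad(cur, ref, 4, 4, 0, 0)  # candidate vector (0, 0)
true_mv_sad = block_sad(cur, ref, 4, 4, 2, 0)  # candidate vector (2, 0), the real shift
```

Each candidate vector reads 64 samples from each frame and returns a single integer, and the candidates are independent of each other, which is why this kind of reduction is usually considered a good fit for parallel hardware.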

burfadel · 24th August 2017, 04:22

To overcome transfer inefficiency, the more done on the GPU before the transfer back, the better. More like: transfer to GPU, series of commands, transfer back.

PinterF, any chance of porting over the 'STAR' search method from x265 (it itself ported)? It has the effectiveness of exhaustive search and is about as fast as UMH, sometimes faster. Additionally, it may not even be optimised to its full extent in AVX, AVX2 etc., so x265 could potentially benefit as well.

MysteryX · 24th August 2017, 19:01

Is SVPFlow1's source code available somewhere? If someone wants to look into adding some GPU acceleration, using code from that library would be a good place to start -- but I can't find it anywhere.

MysteryX · 24th August 2017, 19:45

Linking to some bugs here

MysteryX · 24th August 2017, 22:11

Here's the source code of SVPFlow1 which uses GPU acceleration
https://www.svp-team.com/files/gpl/svpflow1-src.zip

pinterf · 25th August 2017, 14:57

Quote:

Originally Posted by MysteryX

Does DCT=1 have the same issues with BlkSize=8 that doesn't use FFTW?

Code:

ColorBarsHD()
ConvertToYV12()
jm_fps()
Prefetch(8)

function jm_fps(clip source, float "fps")
{
	fps = default(fps, 60)
	fps_num = int(fps * 1000)
	fps_den = 1000
	
	prefiltered = RemoveGrain(source, 22)
	super = MSuper(source, hpad = 16, vpad = 16, levels = 1) # one level is enough for MRecalculate
	superfilt = MSuper(prefiltered, hpad = 16, vpad = 16) # all levels for MAnalyse
	backward = MAnalyse(superfilt, isb = true, blksize = 8, overlap = 4, search = 3, dct = 1)
	forward = MAnalyse(superfilt, isb = false, blksize = 8, overlap = 4, search = 3, dct = 1)
	forward_re = MRecalculate(super, forward, blksize = 4, overlap = 2, thSAD = 100)
	backward_re = MRecalculate(super, backward, blksize = 4, overlap = 2, thSAD = 100)
	out = MFlowFps(source, super, backward_re, forward_re, num = fps_num, den = fps_den, blend = false, ml = 200, mask = 2)
	
	return out
}

Code:

Exception 0xC0000005 [STATUS_ACCESS_VIOLATION]
Module:   C:\Windows\SysWOW64\KernelBase.dll
Address:  0x76C3A9F2

WOOPS!!

WOOPS indeed!
Spent a day on this issue. Thanks for the script, the memory exception appeared immediately. After catching the exception, it turned out that a non-valid motion vector resulted in a memory read past the bottom line of the frame. What? Nonzero motion vectors for a static colorbar clip? Finally it turned out that in mt mode sometimes real motion vectors were generated, even for a blank black clip.

The problem is that the assembly code that calculates integer dct for 8x8 block sizes is _not_ thread safe. It has a single internal buffer of 8x8 words and there is a possibility that the threads are using it in parallel, thus messing up the internal calculations.

At least this is my assumption. After providing a new buffer parameter for this assembly routine (a different one for each filter instance), the problem disappeared and the results became consistent (same input -> same output).
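The shape of that fix can be sketched roughly as follows. This is a Python illustration of the pattern only, not the actual assembly or C++ change: the class `BlockTransform`, its `transform_sum` method and the "times two" stand-in for the DCT stages are hypothetical. The unsafe variant would keep `scratch` as one module-level list that every thread writes to.

```python
import threading


class BlockTransform:
    """Hypothetical stand-in for a filter instance that owns its working buffer."""

    def __init__(self):
        # One 8x8 scratch buffer per instance instead of a single shared one.
        self.scratch = [0] * 64

    def transform_sum(self, block):
        # Stage 1: write intermediate values into the scratch buffer.
        for i, value in enumerate(block):
            self.scratch[i] = value * 2
        # Stage 2: read them back. With a shared buffer, another thread could
        # overwrite the entries between the two stages.
        return sum(self.scratch)


def run_threads(blocks):
    """Process each block in its own thread, each with its own instance."""
    results = [None] * len(blocks)

    def worker(index):
        transform = BlockTransform()  # private buffer for this worker
        results[index] = transform.transform_sum(blocks[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(blocks))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


blocks = [[n + k for k in range(64)] for n in range(8)]
results = run_threads(blocks)
```

Because no buffer is shared, repeated runs give the same results regardless of how the threads are scheduled, which matches the "same input -> same output" check described above.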

GMJCZP · 25th August 2017, 15:04

Hello pinterf, what happened to what I put in the post #353. Thanks.

pinterf · 25th August 2017, 15:26

Quote:

Originally Posted by GMJCZP

Hello pinterf, what happened to what I put in the post #353. Thanks.

"I noticed that the parameter search = 5 (Umh) produces identical results as search = 4 (Hex). In x264 the results are different."
Umh starts with a cross search, continues with a "ring" of radius 4 (or rings with radius 4, 8, ... depending on the search param), and finally uses hex search.

Looking at the code, the two methods are different, though there is a comment saying "// my mod: do not shift the center after Cross"
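To make the final hex stage concrete, here is a minimal sketch of a generic hexagon search in Python. It is the textbook pattern, not the MvTools2 or x264 code: the function name `hex_search`, the `max_steps` limit, the final 8-neighbour refinement and the toy cost function are all assumptions for illustration, and the cross and ring stages that Umh runs first are left out.

```python
# Six points of the large hexagon, and the 8 neighbours used for final refinement.
HEX_POINTS = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SQUARE_POINTS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def hex_search(cost, start=(0, 0), max_steps=16):
    """Move the hexagon to its cheapest point until the centre wins, then refine."""
    best = start
    best_cost = cost(*best)
    for _ in range(max_steps):
        cx, cy = best
        moved = False
        for dx, dy in HEX_POINTS:
            candidate = (cx + dx, cy + dy)
            candidate_cost = cost(*candidate)
            if candidate_cost < best_cost:
                best, best_cost, moved = candidate, candidate_cost, True
        if not moved:
            break  # the centre is already the cheapest point of the hexagon
    cx, cy = best
    for dx, dy in SQUARE_POINTS:
        candidate = (cx + dx, cy + dy)
        candidate_cost = cost(*candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def toy_cost(x, y):
    # Stand-in for a block-matching cost (e.g. SAD) with its minimum at (5, -3).
    return abs(x - 5) + abs(y + 3)


result = hex_search(toy_cost)
```

If the earlier stages do not move the search centre, a hex search started from the same centre could plausibly end at the same vector, which might be one way identical results for search = 4 and search = 5 arise; that is a guess, not something verified against the source.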

MysteryX · 25th August 2017, 19:34

Welcome back! Perhaps you could look into the "MRecalculate: wrong pixel type" error I'm getting when combining FRC with mClean as a priority? This one should be easy to fix and is blocking me

GMJCZP · 25th August 2017, 21:12

Quote:

Originally Posted by pinterf

Umh starts with a cross search, continues with a "ring" of radius 4 (or rings with radius 4, 8, ... depending on the search param), and finally uses hex search.

Looking at the code, the two methods are different, though there is a comment saying "// my mod: do not shift the center after Cross"

What?

MysteryX · 25th August 2017, 22:08

It seems MvTools2 is the plugin everybody uses yet nobody understands and nobody wants to fix. There are tons of bugs that have been there since the start and that were never fixed.

As for MT issues with DCT=1, the VapourSynth version works perfectly fine so the ff3d library isn't responsible for the issues.

pinterf · 28th August 2017, 07:48

Quote:

Originally Posted by MysteryX

It seems MvTools2 is the plugin everybody uses yet nobody understands and nobody wants to fix. There are tons of bugs that have been there since the start and that were never fixed.

Good. You can start fixing those tons of bugs... If you want, of course.

MysteryX · 28th August 2017, 15:24

Quote:

Originally Posted by pinterf

Good. You can start fixing those tons of bugs... If you want, of course.

I tried but gave up before getting local compilation to work.

pinterf · 30th August 2017, 10:41

Back-from-the-holiday edition.

Download MvTools2 2.7.22 with depans

Code:

- 2.7.22 (20170830)
  Misc: Stop using version suffix .22
  Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
  Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
  Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)       
  Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
  Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors

This release fixes some ancient issues.

Multithreaded scripts using the integer DCT path (8x8 block size, 8 bits) now produce identical results on each run. So far they were different because of the single common internal buffer, which got overwritten and used by different threads simultaneously. With special input clips (like ColorBarsHD) it resulted in an access violation. Thanks to MysteryX for the report and the script.

After an analysis with integer dct, the x64 version is now giving the same result as the 32-bit version. Previously it was different and wrong, because the assembly code did not save the xmm6 and xmm7 registers, which is compulsory on x64. This one was a very hard-to-find problem.
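On the "16 bit formats may benefit" changelog line, a quick bit of arithmetic shows why the more accurate constant matters more at higher bit depths. This only measures the rounding error of 0.707 against sqrt(2)/2 scaled to full range; how the constant is actually applied inside the fftw path is not shown in this thread, so the helper `full_scale_error` is purely illustrative.

```python
import math

exact = math.sqrt(2) / 2   # about 0.70710678
rounded = 0.707            # the old constant from the changelog


def full_scale_error(bits):
    """Error in code values when a full-scale sample is multiplied by the rounded constant."""
    peak = (1 << bits) - 1
    return peak * (exact - rounded)


err8 = full_scale_error(8)    # well below one code value
err16 = full_scale_error(16)  # several code values
```

At 8 bits the difference disappears in rounding, while at 16 bits it is large enough to change the output value.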

yup · 30th August 2017, 15:23

Quote:

Originally Posted by yup

pinterf

for update.
Now some of my scripts work stably.
But I see strange behaviour when opening scripts in VirtualDubMod: I do not see an error message if I have written a script with an error related to MVTools functions. Instead, VirtualDub hangs and does not respond.
I cannot close the video in VirtualDub, only close VirtualDub itself.
Job control also does not work.
Please advise.

yup.

pinterf

With the last update the issue is gone.
yup.
