Old 1st December 2011, 04:20   #1  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
MP_Pipeline 0.18 - run parts of avisynth script in external processes [2014-04-06]

This plugin was originally written for a friend to work around the [del]2GB[/del] 4GB address-space limit of 32-bit processes. (Well, also for fun.) I don't know whether it is useful for others, but I decided to post it here anyway.

As of 0.11, the overhead of the plugin is much smaller, so it may be possible to use it to speed up more scripts.

Change log:
Code:
0.18
* Fix deadlock when exported clip is consumed by multiple script blocks

0.17
* Properly terminate slave processes when initialization fails
* Fix "Not a clip" error when using ### inherit and the last block is empty

0.16
* Try to silence all error dialogs on exit of slave process
* Slave process shouldn't get stuck on exit anymore; it will terminate itself if it doesn't exit cleanly within 15 seconds
* Fix ### branch statement; previously it incorrectly rejected some input

0.15
* Properly clean script environment up on exit
* Allow using different avisynth dll to run script block (### dll)

0.14
* Fixed another crashing bug

0.13
* Fixed a bug that causes occasional crashing

0.12
* Fixed a problem that makes scripts unable to be loaded in some programs

0.11
* Greatly improved performance, maximum 80% overhead reduction
* New feature: Ability to lock threads to cores, may improve performance in some cases
* (0.10 is skipped to avoid confusion)

0.9
* New feature: Frame prefetching
* New feature: Exporting multiple clip variables in a single process
* New feature: Code block can be shared between processes

0.3
* Binaries in the x86 folder are the correct version now (in 0.2 the win64 slave was actually win32...)
* Integrated a patched TCPDeliver, no longer depend on the external one
* Fixed random crash when filter chain is destroyed
* Thunked branching

0.2
* x64 support (please copy TCPDeliver.dll in the package to respective plugin folder)
* x86/x64 mixed slave process (requires both x86/x64 version of AviSynth to be installed)
* Add a script variable in branch slave processes, making them distinguishable in script

Limitations:
* Since each process has its own script environment, script variables and loaded plugins are not inherited; they must be re-initialized if needed
* Because of the limitation above, manually loaded plugins and imported scripts need to be reloaded/re-imported before they can be used in a new process (or use an inherited script snippet; please see MP_Pipeline_readme.avs for details)
* Clips before MP_Pipeline will be ignored
* Audio is not supported
* Every script block must return a clip (i.e. "last" must be a clip), otherwise MPP will raise this error: Invalid arguments to function "MPP_PrepareDownstreamClip"

Binary: http://nmm.me/z6
Source code: https://github.com/SAPikachu/MP_Pipeline/tree/0.18

Some examples:

1. Basic usage:
Code:
MP_Pipeline("""
FFVideoSource("SomeVideo")
QTGMC()
### prefetch: 16, 0
### ###
""")
MCTD()

# MCTD and QTGMC will run in parallel in 2 separate processes

2. Speed up MCTD at the cost of memory
Code:
# Requires a 64-bit system with at least 8GB of memory to run this script
MP_Pipeline("""

# This may be smaller, but I only tested this number
SetMemoryMax(3072)

FFVideoSource("SomeVideo")
MCTD(settings="high")
### prefetch: 16, 0
### ###
""")

# Some time ago I used a script similar to this one for encoding; it was about 20% ~ 30% faster than plain MCTD.

3. Branching
Code:
MP_Pipeline("""
FFVideoSource("SomeVideo")
TNLMeans()
### prefetch: 16, 0
### branch: 4
### ###
""")

# TNLMeans will be run in 4 processes with branching (please see the example script in the package for details)

4. Frame caching
Code:
MP_Pipeline("""
FFVideoSource("SomeUnseekableVideo", seekmode=-1)
TNLMeans()
### prefetch: 32, 24
# It is important to use a big backward cache since we can't seek

### ###

MCTD()

""")

Please see the example script in the binary package for other usages and explanations of the settings.
__________________
f3kdb 1.5.1 / MP_Pipeline 0.18

ffms2 builds with 10bit output hack:
libav-9a60b1f / ffmpeg-1e4d049 / FFmbc-0.7.1
Built from ffms2 6e0d654 (hack a9fe004)

Mirrors: http://bit.ly/19TwDD3

Last edited by SAPikachu; 6th April 2014 at 10:57.
Old 1st December 2011, 08:10   #2  |  Link
TheRyuu
warpsharpened
 
Join Date: Feb 2007
Posts: 787
Quote:
Originally Posted by SAPikachu View Post
work-around the 4GB problem of 32-bit process.
ftfy.
Old 1st December 2011, 08:16   #3  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by TheRyuu View Post
ftfy.
User processes can only use 2GB of the full address space, can't they? (Well... actually 3GB under some conditions, but that's a special case.)
Old 1st December 2011, 11:55   #4  |  Link
SEt
Registered User
 
Join Date: Aug 2007
Posts: 374
Actually 4GB on 64 bit OS.
Old 1st December 2011, 12:30   #5  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by SEt View Post
Actually 4GB on 64 bit OS.
I didn't know that until I read this. Learned something today, thanks.
Old 1st December 2011, 14:35   #6  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by SEt View Post
Actually 4GB on 64 bit OS.
Generally, everything has to be compiled with large address awareness for the 32-bit binaries (executables and DLLs) to really allow addressing over 2GB of memory.

As this is also not usually a default build option IIRC, most things don't have it enabled, which prevents them from addressing more than 2GB of memory.
__________________
custom x264 builds & patches | F@H | My Specs
Old 1st December 2011, 14:58   #7  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Quote:
Originally Posted by kemuri-_9 View Post
generally everything has to be compiled with large address awareness for the 32bit binaries (executable and dlls) to really allow addressing over 2GB of memory.
I thought it was just executables (not dlls). Thus Avisynth will benefit from increased memory if used by a client that has been built as 'large address aware'.
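
For anyone who wants to check whether a given client executable was built as large address aware, here is a rough Python sketch. It relies only on the documented PE layout: the DOS header stores the offset of the "PE\0\0" signature at 0x3C, and the COFF header that follows has a Characteristics field where bit 0x0020 is IMAGE_FILE_LARGE_ADDRESS_AWARE. The function names and the fake_pe helper are made up for this illustration and are not part of any tool mentioned in this thread.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF header's Characteristics field


def is_large_address_aware(pe_bytes: bytes) -> bool:
    """Return True if a PE image has the large-address-aware flag set."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the signature; Characteristics is its last 2 bytes.
    (characteristics,) = struct.unpack_from("<H", pe_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def fake_pe(characteristics: int) -> bytes:
    """Hypothetical helper: build a minimal header for the demo (not a loadable EXE)."""
    dos = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40)  # 2 + 58 + 4 = 0x40 bytes
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return dos + b"PE\0\0" + coff


print(is_large_address_aware(fake_pe(0x0122)))  # flag bit 0x0020 set
print(is_large_address_aware(fake_pe(0x0102)))  # flag bit 0x0020 clear
```

To try it on a real file, read the first few KB of the executable in binary mode and pass the bytes to is_large_address_aware.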
__________________
GScript and GRunT - complex Avisynth scripting made easier
Old 1st December 2011, 15:09   #8  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Gavino View Post
I thought it was just executables (not dlls). Thus Avisynth will benefit from increased memory if used by a client that has been built as 'large address aware'.
Right:
http://blogs.msdn.com/b/oldnewthing/.../10065933.aspx

But then loading a DLL into some "LARGEADDRESSAWARE" process might break it, if the code in that DLL isn't prepared to deal with addresses beyond 2 GB.
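
One common way code ends up "not prepared" is by treating pointers as signed 32-bit integers. The small Python sketch below only simulates that with ctypes; the two addresses are arbitrary values picked for illustration, not taken from any real plugin.

```python
import ctypes


def as_signed32(address: int) -> int:
    """Reinterpret a 32-bit address as a signed 32-bit integer, as careless C code might."""
    return ctypes.c_int32(address).value


below = 0x7FFF0000  # just under the 2 GB line
above = 0x80010000  # just over it, only reachable in a large-address-aware process

# Below 2 GB the signed view matches the real address.
print(as_signed32(below) == below)
# Above 2 GB the high bit is set, so the signed view is negative.
print(as_signed32(above))
# Any "is this buffer before that one?" check done on signed values now gives the wrong answer.
print(as_signed32(below) < as_signed32(above))
```

Code that never stores addresses in signed integers is unaffected, which is why many DLLs work fine in such a process and some don't.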
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊
Old 1st December 2011, 14:58   #9  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
So this basically is an AVS2YUV clone, but not as a stand-alone application, but as an Avisynth plug-in?

I think this would be particularly useful to load 32-Bit plugins (that don't have 64-Bit equivalents) into a 64-Bit Avisynth environment. Or vice versa.

Is that supported/intended?

Last edited by LoRd_MuldeR; 1st December 2011 at 15:01.
Old 1st December 2011, 15:30   #10  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by LoRd_MuldeR View Post
So this basically is an AVS2YUV clone, but not as a stand-alone application, but as an Avisynth plug-in?

I think this would be particularly useful to load 32-Bit plugins (that don't have 64-Bit equivalents) into a 64-Bit Avisynth environment. Or vice versa.

Is that supported/intended?
It is functionally similar to avs2yuv, but with some additional features like multiple levels of pipeline.

That was not my original intention, but it is interesting. It only supports x86 now; I will add x64 and mixed script environment support later when I have time.
Old 1st December 2011, 16:30   #11  |  Link
kolak
Registered User
 
Join Date: Nov 2004
Location: Poland
Posts: 2,843
Can we use this to divide a file (using Trim) into a few parts, run QTGMC on each one (in a separate process), and put them together at the end?


Andrew
Old 1st December 2011, 21:25   #12  |  Link
-Vit-
Registered User
 
Join Date: Jul 2010
Posts: 448
Quote:
Originally Posted by kolak View Post
Can we use this to divide file (using trim) to few parts and run on each one (in seperate process) QTGMC and put them together at the end?
It's an interesting plugin, but doesn't seem to help for that unless I'm missing something. I tried this on some SD footage:
Code:
MP_Pipeline("""

WhateverSource("Some\Source")

### ###

QTGMC("Placebo")

### branch: 4

### ###

""")
Worked OK, ran five slave processes and produced the correct result. However, it was slower than single-threaded (single-threaded is 6fps, this script was 5fps). It used about 2.4GB of memory. Increasing branch slowed it down further; reducing branch to 2 sped it up to just over 6fps.

By comparison, splitting the video and running many separate single threaded encoding processes, or just using SetMTMode gives 20-25fps. SetMTMode uses a lot less memory.
Old 1st December 2011, 21:44   #13  |  Link
kolak
Registered User
 
Join Date: Nov 2004
Location: Poland
Posts: 2,843
Hmmm, shame.

I'm forced to run a few instances for HD. Not a big deal, but if it could be automated then it would be easier.
Old 2nd December 2011, 02:08   #14  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by -Vit- View Post
It's an interesting plugin, but doesn't seem to help for that unless I'm missing something. [...]
The branch statement is actually not very useful. It is only suitable for spatial, single-threaded plugins like TNLMeans. For temporal scripts/filters (especially complex scripts like QTGMC), the same frame will be processed repeatedly by multiple processes, so CPU time is wasted and speed decreases. That's why I didn't mention it in the OP.
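
To put rough numbers on the wasted work, here is a toy Python model. It is only a sketch under loud assumptions: it assumes output frames are handed to the branch processes round-robin and that a temporal filter needs every frame within a fixed radius of the one it outputs. The thread does not say how MP_Pipeline actually distributes frames, and real scripts like QTGMC are far more complex, so treat the figures as an illustration of the trend only.

```python
def frames_computed(total_frames: int, branches: int, radius: int) -> int:
    """Count frames each process must produce internally, summed over all processes."""
    needed = [set() for _ in range(branches)]
    for n in range(total_frames):
        worker = n % branches  # ASSUMPTION: round-robin assignment of output frames
        # ASSUMPTION: the filter reads frames n-radius .. n+radius to output frame n
        for k in range(n - radius, n + radius + 1):
            if 0 <= k < total_frames:
                needed[worker].add(k)
    return sum(len(frames) for frames in needed)


# Spatial filter (radius 0): 4 processes do no more work in total than 1.
print(frames_computed(100, 4, 0))
# Temporal filter (radius 1): a single process still only touches each frame once...
print(frames_computed(100, 1, 1))
# ...but with 4 branches almost every frame is produced in three different processes.
print(frames_computed(100, 4, 1))
```

In this model a radius-1 filter with 4 branches does about three times the total work, which eats most of the gain from the extra processes.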
Old 1st December 2011, 16:33   #15  |  Link
06_taro
soy sauce buyer
 
Join Date: Mar 2010
Location: United Kingdom
Posts: 164
Now add large memory aware flag to exceed 2GB limit in avs4x264mod.
Old 6th December 2011, 12:45   #16  |  Link
pbristow
Registered User
 
Join Date: Jun 2009
Location: UK
Posts: 263
*SLAPS OWN FOREHEAD* Of course.

Can the StackHorizontal be placed inside the MP_Pipeline call, perhaps as a third process? No, again, we'd still need some way to represent the output of each of the other two processes to feed them into StackHorizontal.

How about this:
Code:
# Let's assume that all plugins are auto-loaded, for simplicity.

AVIsource("some_anaglyph_3D_thing.avi")

StackHorizontal(  \
        MP_Pipeline("""ExtractOneSide(Eye="Right")"""),  \
        MP_Pipeline("""ExtractOneSide(Eye="Left")""")  \
)
When presented with a single line in the internal script, does MP_Pipeline launch that as a separate process from the calling script?
Is the use of the separator (i.e. "### ###") mandatory to cause a new process to be created?
Will adding a separator to a one-line script (or extra separators in the general case) confuse the plugin, or will it just disregard any surplus ones?

Can see I'm gonna need to have a play with this one, as soon as I get time.
Old 6th December 2011, 12:53   #17  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by pbristow View Post
How about this: [...]
Is the use of the separator (i.e. "### ###") mandatory to cause a new process to be created?
That's inspiring, I didn't think of this model before. Maybe you can try this:

Code:
StackHorizontal(  \
MP_Pipeline("""

AVIsource("some_anaglyph_3D_thing.avi")
ExtractOneSide(Eye="Right")
### ###

"""),  \
MP_Pipeline("""

AVIsource("some_anaglyph_3D_thing.avi")
ExtractOneSide(Eye="Left")
### ###

""")  \
)
You may also need to add ThreadRequest to make it process at full speed.

Yes, the "### ###" splitter is required to tell the plugin to spawn a new process. Script body can be empty though.

Last edited by SAPikachu; 6th December 2011 at 13:03.
Old 6th December 2011, 15:38   #18  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Quote:
Originally Posted by pbristow View Post
How about this:
Code:
AVIsource("some_anaglyph_3D_thing.avi")

StackHorizontal(  \
        MP_Pipeline("""ExtractOneSide(Eye="Right")"""),  \
        MP_Pipeline("""ExtractOneSide(Eye="Left")""")  \
)
I'm not sure that would have worked as it stands, and I see that SAPikachu's example has an AviSource call inside each MP_Pipeline.

Am I right in thinking that MP_Pipeline is essentially a source filter, and takes no clip input, or does it make use of 'last' in some way?
Old 6th December 2011, 23:54   #19  |  Link
pbristow
Registered User
 
Join Date: Jun 2009
Location: UK
Posts: 263
Tried it out: Looks like you're right Gavino. MP_Pipeline.dll starts up each new process as an instance of MP_Pipeline.dll.slave.exe, which executes the relevant segment of the script via its own instance of AviSynth. Those processes don't appear (at present) to have any way of receiving input from the parent instance of AviSynth other than the script section itself. So, the AviSource (or equivalent) call has to be in the script section that's passed to MP_Pipeline.

A consequence of that is that if there's any common processing that needs to be done to the video before the processing paths diverge (e.g., in my usual cases, denoising and resizing the picture to fit half the screen width), that pre-processing will have to be run twice... Unless you prepare a mezzanine file first with a separate script.

Common *post*-processing, on the other hand is easier: Just put it after the calls to MP_Pipeline.

SAPikachu, can you confirm/critique that analysis?

It works though! I did a test using an MVTools-based frame-doubler, as a simplified proxy for my 3D processing, and a dummy "minimal load encoder" - i.e. just cropping off most of the picture in VirtualDub and saving a small rectangle of it, uncompressed. With MP_Pipeline the test finished in 58s, rather than the 95s taken by the same script without it. During processing, two processor cores were nearly fully used, rather than one.
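
For reference, the arithmetic behind those two timings, as a small Python sketch. The 95s and 58s figures are the ones measured above; the "efficiency" figure simply assumes the two busy cores are the only workers that matter.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    return baseline_seconds / parallel_seconds


def efficiency(speedup_factor: float, workers: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup_factor / workers


s = speedup(95, 58)
print(round(s, 2))                 # speedup from splitting into two processes
print(round(efficiency(s, 2), 2))  # share of the ideal 2x, ASSUMING 2 workers
```

So two processes gave roughly 1.6x here, with the remainder going to inter-process overhead and whatever part of the work could not overlap.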

Last edited by pbristow; 7th December 2011 at 00:01. Reason: Forgot to say "It works!" :)
Old 7th December 2011, 05:11   #20  |  Link
SAPikachu
Registered User
 
Join Date: Aug 2007
Posts: 218
Quote:
Originally Posted by pbristow View Post
Tried it out: Looks like you're right Gavino. [...]

SAPikachu, can you confirm/critique that analysis?
Yes, your analysis is right. I intentionally left out clip input in MP_Pipeline. Although it is possible to accept clip input, it may cause thread-safety problems.

In the next version, I will add a script variable to the slave AviSynth environment so that different slave processes can be distinguished in script. Then you can use the branch statement to work around this problem. But I think you can try ThreadRequest; in theory it can give a bigger performance boost than MP_Pipeline, since the overhead of multiprocessing is big.

Last edited by SAPikachu; 7th December 2011 at 05:19. Reason: typo & some addendum