1st April 2020, 11:15 | #141 | Link | ||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
|
Quote:
Quote:
It's working flawlessly now on XP! Your new build works too! Thank you both! |
||
2nd April 2020, 00:04 | #142 | Link |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
@FranceBB
Time to update AVSMeter. 2.8.5 is 1.5 years old.
__________________
Groucho's Avisynth Stuff |
2nd April 2020, 00:57 | #143 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
|
Oops.
How could I have missed that...? Updated right now! By the way, thank you for AVSMeter. I have found it very useful on several occasions, especially when helping colleagues whose installations were badly messed up, with duplicated or triplicated plugins, random file names, etc. It also helped me keep my own installation and plugin folder perfectly clean, with nothing redundant. |
7th April 2020, 12:17 | #144 | Link |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
r2 -- marked as pre-release for now -- has been uploaded to GitHub.
I spent a few hours completely rewriting the dual interface wrappers, and they should be easier to work with now. I'm planning to move some of my filters onto the new platform if I have time, including f3kdb (before fixing some of the raised issues). Let me know how it works -- and whether it is faster than it was. I experimentally used C++ Parallel STL in a few hot code paths, which should have improved speed a bit. |
7th April 2020, 13:31 | #145 | Link | |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
Quote:
Code:
clang 109.2 fps
msvc 113.3 fps
PHP Code:
And some warning to go
Code:
Core freed but 1866240000 bytes still allocated in framebuffers
EDIT: No wait, I mixed up the dlls: r2 always throws this error: There is no function named neo_fft3d(). I tried all x64+x86 versions
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database Last edited by ChaosKing; 7th April 2020 at 13:42. |
|
7th April 2020, 22:50 | #146 | Link |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Yeah, the VapourSynth side probably has some issues. Let me set up a test environment and see how to deal with them.
A leaking filter instance cost me an hour or two. This thing is a time killer. And I put the wrong name on the function, so...
__________________
Projects x265 - Yuuki-Asuna-mod Download / GitHub TS - ADTS AAC Splitter | LATM AAC Splitter | BS4K-ASS Neo AviSynth+ filters - F3KDB | FFT3D | DFTTest | MiniDeen | Temporal Median Last edited by MeteorRain; 8th April 2020 at 01:49. |
8th April 2020, 10:00 | #148 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
The fps gap between clang and msvc is now almost gone.
Both ranged between 31-33 fps.
Output 500 frames in 15.71 seconds (31.82 fps)
Output 500 frames in 14.99 seconds (33.37 fps)
185 fps now on my tested DVD NTSC source. But this speed up again... What kind of sorcery is this?
EDIT: The memory warning is gone and AviSynth works. I get 28-29 fps in AviSynth x64 3.4. So now we have a nearly 3x faster fft3d filter. Awesome job!
EDIT2: Added the plugin to vsrepo & avsrepo.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database Last edited by ChaosKing; 8th April 2020 at 10:28. |
8th April 2020, 10:49 | #149 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
|
As the programming noob that I am, may I ask why some AVS contributors have started to provide both compiles? I mean, what are the differences and advantages? Speed on different CPUs? I am really curious.
__________________
@turment on Telegram |
8th April 2020, 11:24 | #150 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
It is just something I noticed...
In general, different compilers implement different optimization strategies. This can mean that plugin A is faster than plugin B with compiler X, while plugin C is faster with compiler Y. You could then also inspect the machine code generated by compiler X and see why it is faster or slower than the code from compiler Y. tldr: compiler, please understand what I want.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database |
8th April 2020, 12:24 | #151 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,309
|
There are so many options and sub-filters that it is impossible to test all cases and choose one "best" compiler.
E.g. there are tons of possible branches by processor type: SSE2, SSE4, AVX, AVX2, plus 24 modes in RgTools, plus the repair option. That makes many hundreds of different scenarios. In one real case, msvc produced 10% faster code than clang in AVX2 for RemoveGrain mode X, while it was 10% slower at mode Y. Let the user choose: a power user with infinite time to test will know why this or that version is chosen. All other users just choose by sympathy (MS rulez, clang inside yeah). |
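To illustrate why the number of code paths multiplies, here is a minimal sketch of runtime dispatch by processor type. It is not taken from RgTools or any other plugin: the kernel, the names `add_sat_*`, `CpuLevel`, `detect_cpu_level` and `select_kernel` are all hypothetical, and the "SSE2" and "AVX2" variants simply forward to the scalar loop so the example stays portable. The detection uses the GCC/clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; on other compilers the sketch falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature: add `delta` to every pixel, saturating at 255.
using KernelFn = void (*)(std::uint8_t* px, std::size_t n, std::uint8_t delta);

// Plain C++ reference implementation.
static void add_sat_scalar(std::uint8_t* px, std::size_t n, std::uint8_t delta) {
    assert(px != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        unsigned v = static_cast<unsigned>(px[i]) + delta;
        px[i] = static_cast<std::uint8_t>(v > 255u ? 255u : v);
    }
}

// Stand-ins for the per-instruction-set variants. A real plugin would put
// intrinsics here; in this sketch they only forward to the scalar loop.
static void add_sat_sse2(std::uint8_t* px, std::size_t n, std::uint8_t delta) {
    add_sat_scalar(px, n, delta);
}
static void add_sat_avx2(std::uint8_t* px, std::size_t n, std::uint8_t delta) {
    add_sat_scalar(px, n, delta);
}

enum class CpuLevel { Scalar, SSE2, AVX2 };

// Ask the running CPU what it supports (GCC/clang on x86 only).
inline CpuLevel detect_cpu_level() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return CpuLevel::AVX2;
    if (__builtin_cpu_supports("sse2")) return CpuLevel::SSE2;
#endif
    return CpuLevel::Scalar;
}

// One function pointer per level; every level is a separate code path
// that the compiler optimizes (and that someone has to benchmark).
inline KernelFn select_kernel(CpuLevel level) {
    switch (level) {
        case CpuLevel::AVX2: return add_sat_avx2;
        case CpuLevel::SSE2: return add_sat_sse2;
        default:             return add_sat_scalar;
    }
}
```

A caller would pick the kernel once, for example `KernelFn fn = select_kernel(detect_cpu_level());`, and reuse it for every frame. With several such levels times a couple of dozen modes, comparing compilers on every path quickly becomes impractical.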
8th April 2020, 12:58 | #153 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
1) clang does lots of optimizations even on SIMD intrinsics code, while MSVC tends to compile it as-is. For example, in f3kdb, where there's a selection operation (I can't remember the details), clang optimizes the whole operation into a completely different instruction sequence, which greatly improved the speed on SSE2, even making it almost as fast as or faster than the AVX variant (due to the lack of an equivalent instruction in AVX). MSVC simply compiled it as-is, maybe with some unrolling?
2) clang checks lots of extra warnings and errors. Having a clang-compiled version means the code will likely compile with clang on Linux or macOS as well. I personally also compile the code with GCC to make sure it satisfies all common compilers. However, GCC uses the POSIX ABI, so its output is not usable under the MSVC-built C++ AVS+, and I didn't include it in the release. |
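For readers who have not seen one, this is roughly what a "selection operation" written with SSE2 intrinsics looks like. It is a generic sketch, not the f3kdb code (the post does not give those details): the function name `select_bytes` and its mask convention are assumptions. Each mask byte is expected to be 0x00 or 0xFF, and the result takes `a` where the mask is set and `b` elsewhere. A compiler is free to keep these three logical instructions or to replace them with something else; whether it does depends on the compiler, the flags and the target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// out[i] = mask[i] ? a[i] : b[i], with mask bytes being 0x00 or 0xFF.
inline void select_bytes(const std::uint8_t* mask, const std::uint8_t* a,
                         const std::uint8_t* b, std::uint8_t* out,
                         std::size_t n) {
    assert(n == 0 || (mask && a && b && out));
    std::size_t i = 0;
#if SKETCH_HAVE_SSE2
    // 16 bytes per iteration: (mask & a) | (~mask & b)
    for (; i + 16 <= n; i += 16) {
        const __m128i m  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask + i));
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        const __m128i r  = _mm_or_si128(_mm_and_si128(m, va),
                                        _mm_andnot_si128(m, vb));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), r);
    }
#endif
    // Scalar tail (and full fallback on non-SSE2 builds), same formula.
    for (; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>((mask[i] & a[i]) |
                                           (static_cast<std::uint8_t>(~mask[i]) & b[i]));
    }
}
```

Comparing the disassembly of such a function from clang and MSVC is one way to see the "compiled as-is" versus "rewritten" difference described above.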
|
8th April 2020, 13:05 | #154 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
|
Quote:
Average over 10 launches of the same script:
MaskTools clang: 13.43 fps
MaskTools VS2019: 13.59 fps
The true bottleneck here is MVTools, which I keep begging you to release in both compiler versions to see if any difference occurs.
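For scale, the two averages above differ by roughly 1.2%. A small sketch of that arithmetic follows; the helper names `mean` and `relative_diff_percent` are made up for illustration, and averages alone cannot tell whether a gap this small is outside run-to-run noise.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of per-run fps values (0.0 for an empty list).
inline double mean(const std::vector<double>& runs) {
    if (runs.empty()) return 0.0;
    return std::accumulate(runs.begin(), runs.end(), 0.0) /
           static_cast<double>(runs.size());
}

// How much faster (positive) or slower (negative) `other` is than
// `baseline`, in percent of the baseline.
inline double relative_diff_percent(double baseline, double other) {
    assert(baseline != 0.0);
    return (other - baseline) / baseline * 100.0;
}
```

With the figures from this post, `relative_diff_percent(13.43, 13.59)` evaluates to a little under 1.2.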
__________________
@turment on Telegram |
|
8th April 2020, 13:08 | #155 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
|
Thanks for the exhaustive reply. In the meantime I asked my old friends at university what they use, and they confirmed that the Intel compiler is one of the best for producing fast code. In some peculiar cases, mostly scientific simulations, they see a 20x speedup versus the VS one.
__________________
@turment on Telegram |
8th April 2020, 13:08 | #156 | Link |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
I have a pretty smart LRU caching system and a multi-threading counting system, to make sure I use the least possible memory for the given multi-threading model. So it should now run at 6 threads from the input.
Then for each plane I let them fly on their own threads, so 18 threads go into the engine. From there, some hot paths, like copying window data into blocks, get parallelized using C++ PSTL, which is basically a trouble-free thread pool. Core functions such as the Apply and Sharpen families get full SSE/AVX/AVX512 optimization. Because I have a centralized wrapper and put variants into templates, it's pretty easy for me to transform loops into parallel computing. I've tried putting all blocks into their own threads, and it turned out to run slower. So the current design is to run 4 parallel computing batches, each processing a quarter of the data blocks. I definitely don't want it to occupy too much CPU, because it's usually paired with other filters. So I tweaked it to use about 3-4 full cores maximum to maintain high running efficiency. |
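The "4 batches, each a quarter of the blocks" idea can be sketched as below. This is only an illustration under stated assumptions, not the filter's code: it uses plain `std::thread` instead of the C++ Parallel STL mentioned above (to avoid any extra dependency), and the names `split_batches` and `for_each_block_batched` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Split the block indices [0, n_blocks) into `n_batches` contiguous ranges
// of near-equal size. Each pair is a half-open range [first, second).
inline std::vector<std::pair<std::size_t, std::size_t>>
split_batches(std::size_t n_blocks, std::size_t n_batches) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (n_batches == 0) return ranges;
    const std::size_t base  = n_blocks / n_batches;
    const std::size_t extra = n_blocks % n_batches;  // first `extra` batches get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < n_batches; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}

// Call process_block(i) for every block index, one worker thread per batch.
// process_block must be safe to call concurrently for different indices.
inline void for_each_block_batched(
        std::size_t n_blocks, std::size_t n_batches,
        const std::function<void(std::size_t)>& process_block) {
    assert(process_block);
    std::vector<std::thread> workers;
    for (const auto& r : split_batches(n_blocks, n_batches)) {
        workers.emplace_back([r, &process_block] {
            for (std::size_t i = r.first; i < r.second; ++i) process_block(i);
        });
    }
    for (auto& w : workers) w.join();
}
```

Calling `for_each_block_batched(block_count, 4, work)` keeps the thread count fixed at four regardless of how many blocks a frame has, which matches the stated goal of not occupying the whole CPU; one thread per block would instead scale the thread count with the block count.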
8th April 2020, 13:15 | #157 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
Like I said, you must be able to optimize assembly code to further improve the performance. Sometimes you also have to take the target CPU into consideration, because there are a few instructions that run faster or slower on different CPUs. For example, if in the SIMD code I write A; B; C, but the compiler thinks my code is dumb because it can be done using X; Y, which is faster, it should delete my code and change it to the faster version right away. Then there are times when ABC is faster on, for example, Haswell, but XY is faster on Skylake. Now the compiler asks you which CPU you would prefer to target. If you say you love Haswell, it should leave ABC there. Optimization is a very interesting topic to study, and I only know a small portion of it. Here's my $0.02. |
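One way to see what "which CPU do you target" means in practice is to ask the compiler which instruction sets it was allowed to use for a build. The sketch below is an assumption-labelled illustration (the function name `compiled_simd_level` is made up); it relies only on the predefined macros `__SSE2__`, `__AVX__`, `__AVX2__` and `__AVX512F__`, which GCC and clang set from options such as `-march=haswell`, and which MSVC sets in part from `/arch:AVX`, `/arch:AVX2` and `/arch:AVX512`.

```cpp
#include <string>

// Report the widest x86 SIMD extension the compiler was permitted to use
// for this translation unit. This reflects build flags, not the CPU the
// program happens to run on.
inline std::string compiled_simd_level() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

As far as I know, `-march=haswell` and `-march=skylake` both report AVX2 here, so the Haswell-versus-Skylake choice in the example above would mostly come down to the cost model the compiler applies when picking and scheduling instructions, not to a different instruction set.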
|
8th April 2020, 13:25 | #158 | Link | |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
Quote:
Haha just kidding.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database |
|
8th April 2020, 13:46 | #160 | Link | |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
Quote:
https://github.com/pinterf/mvtools
https://github.com/dubhater/vapoursynth-mvtools
But maybe it would be better and easier to improve the C++ reimplementation of mvtools by feisty2:
https://forum.doom9.org/showthread.php?t=172525
https://github.com/IFeelBloated/vapoursynth-mvtools-sf <-- I had some problematic scenes where this version was OK while the other mvtools showed some artifacts. Maybe 32-bit precision had something to do with it, idk.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database Last edited by ChaosKing; 8th April 2020 at 13:50. |
|