23rd January 2018, 14:57 | #41 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,816
|
This multithreaded resizing plugin is really good! Much better than the plain Prefetch option in AviSynth+ MT.
CPU: E5-2690@2.9GHz (8C/16T)
Source: 3840x2160 YUV420P10
Crop(0,280,0,-280) + Spline36Resize(1920,800) + Prefetch(8)
Crop(0,280,0,-280) + Spline36Resize(1920,800) + Prefetch(16)
Crop(0,280,0,-280) + Spline36ResizeMT(1920,800)
The fastest, lower memory consumption and CPU usage!
For comparison, the regular resizer:
Crop(0,280,0,-280) + Spline36Resize(1920,800)
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper Last edited by Atak_Snajpera; 23rd January 2018 at 15:11. |
1st April 2018, 17:12 | #43 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,816
|
SetAffinity=true in the latest version works terribly, even without Prefetch in the script. Now it is slower than the regular single-threaded resizer!
SetAffinity=false
BTW, I see that the newer version is faster than the old one (55 fps vs 52 fps).
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
1st April 2018, 17:24 | #45 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,816
|
Code:
#MT
#VideoSource
LoadPlugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\ffms\ffms_latest\x64\ffms2.dll")
video=FFVideoSource("E:\_Video_Samples\mkv\Passengers_2016_4K.mkv",cachefile = "C:\Temp\RipBot264temp\job1\Passengers_2016_4K.mkv.ffindex")
#Deinterlace
#Decimate
#Crop
video=Crop(video,0,280,-0,-280)
#Resize
LoadPlugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\Plugins_JPSDR\Plugins_JPSDR.dll")
video=Spline36ResizeMT(video,1920,800,SetAffinity=true).Sharpen(0.2)
#Levels
#Colours
#Denoise
#Custom
#Prefetch
#Subtitles
#AudioSource
Import("C:\Temp\RipBot264temp\job1\job1_a1.avs")
#Triming
#AVSameLength
#ColorSpace
#Return
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
1st April 2018, 18:02 | #46 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
What CPU do you have? More precisely, how many logical cores does it have? I just want to understand the 49 threads, though that would be totally expected if you have something like a 20-logical-core CPU.
__________________
My github. |
1st April 2018, 18:19 | #47 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,816
|
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
2nd April 2018, 10:26 | #48 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
OK, there is something odd indeed, thanks for reporting. All the other filters seem to behave properly, but the resampler runs on only one core with SetAffinity set to true, which is totally unexpected. I can't investigate right now, but I will very shortly.
There is a bug somewhere... aWarpSharp2 and nnedi3 give me 41 threads on my 20-core CPU, in both cases true/false. ResampleMT gives me 41 threads with true, 174 with false!!! Yes, there is something really wrong...
__________________
My github. Last edited by jpsdr; 2nd April 2018 at 10:39. |
3rd April 2018, 11:43 | #49 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
More fishy!! I'm on my lunch break and ran some tests on my PC at work, and everything works fine, but it doesn't have the same CPU as the one I have at home (it's a simple 4-core without HT). The only thing I can't check for now is Intel vs VS. Did you run your tests with the VS or the Intel version? If Intel, can you run a test with the VS version? I'll also try this when I'm back home, but that won't be for several hours.
Edit: Sometimes I'm very stupid; of course I can test, I just have to download them from my github...
Results:
The VS AVX and Intel AVX2 versions work fine with standard AviSynth.
The VS AVX version works fine with avs+ (both x86 & x64).
The Intel AVX2 version is working... "fishy" with avs+ (both x86 & x64), but only for the resampler; the other filters work fine.
I'll update the release files on github, removing the Intel versions, keeping only the VS version, and adding a VS AVX2 version. Wait at least 24h before checking/re-downloading the files.
__________________
My github. Last edited by jpsdr; 3rd April 2018 at 12:09. |
3rd April 2018, 16:12 | #51 | Link |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
I have a rather basic question - What makes you think that messing with Windows' thread scheduler by manipulating thread affinity improves the speed? What if another program does the same? Have you measured the speed in different scenarios (various Windows versions, CPUs with Hyperthreading, software that messes with thread priority)?
__________________
Groucho's Avisynth Stuff |
3rd April 2018, 16:47 | #52 | Link | |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
Quote:
And the threadpool I used as an example to make mine was even more restrictive: no choice, it put each thread on one CPU only. I've expanded on that. Nevertheless, this has nothing to do with the Intel compiler messing up the code... But maybe it's also my fault; using /O3 may be too experimental.
__________________
My github. Last edited by jpsdr; 3rd April 2018 at 16:50. |
|
3rd April 2018, 20:19 | #54 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
LOL... Yes, the "Random" part means that you can directly access any data at random if you want, because the memory chip/component has an address bus that lets you select whichever memory cell you want. This is in contrast to other kinds of memory, which have, for example, only serial access, meaning that you can't directly reach the data you want without going through other data first.
So... what does this have to do with the fact that the memory zone you're working on may fit in the cache?
__________________
My github. |
3rd April 2018, 20:25 | #56 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
Some bench tests:
Script:
Code:
Colorbars(width=1920*2,height=1080*2,pixel_type="yv12").killaudio().assumefps(25,1).trim(0,9999)
Spline36ResizeMT(1920,1080,SetAffinity=true)
Code:
[Runtime info]
Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 1080 | 1315 | 1293
Memory usage (phys | virt): 47 | 44 MiB
Thread count: 41
CPU usage (average): 81%
Efficiency index: 15.96
Time (elapsed): 00:00:07.736
Code:
[Runtime info]
Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 1050 | 1372 | 1173
Memory usage (phys | virt): 47 | 45 MiB
Thread count: 41
CPU usage (average): 69%
Efficiency index: 17.00
Time (elapsed): 00:00:08.527
Code:
Colorbars(width=1920*2,height=1080*2,pixel_type="yv12").killaudio().assumefps(25,1).trim(0,9999)
aWarpSharp2(SetAffinity=true)
Code:
[Runtime info]
Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 165.6 | 270.2 | 239.8
Memory usage (phys | virt): 60 | 60 MiB
Thread count: 41
CPU usage (average): 84%
Efficiency index: 2.854
Time (elapsed): 00:00:41.707
Code:
[Runtime info]
Frames processed: 10000 (0 - 9999)
FPS (min | max | average): 170.4 | 271.0 | 207.2
Memory usage (phys | virt): 60 | 60 MiB
Thread count: 41
CPU usage (average): 67%
Efficiency index: 3.092
Time (elapsed): 00:00:48.270
Code:
Colorbars(width=1920,height=1080,pixel_type="yv12").killaudio().assumefps(25,1).trim(0,4)
nnedi3(dh = true, nsize = 3, nns = 4, qual = 2,pscrn=0,threads=0,SetAffinity=true)
Code:
[Runtime info]
Frames processed: 5 (0 - 4)
FPS (min | max | average): 0.431 | 0.433 | 0.432
Memory usage (phys | virt): 49 | 53 MiB
Thread count: 41
CPU usage (average): 97%
Efficiency index: 0.00446
Time (elapsed): 00:00:11.569
Code:
[Runtime info]
Frames processed: 5 (0 - 4)
FPS (min | max | average): 0.379 | 0.403 | 0.392
Memory usage (phys | virt): 49 | 52 MiB
Thread count: 41
CPU usage (average): 87%
Efficiency index: 0.00451
Time (elapsed): 00:00:12.741
Nevertheless, this doesn't mean it will be like this for everybody. This is why everyone can tune according to their own results.
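To put the three result pairs above in proportion, here is a small Python sketch that computes how much faster the first run of each pair is than the second, using only the average FPS figures as posted. The post does not label which run used which SetAffinity value (the scripts shown all use SetAffinity=true), so the sketch only compares "first block" against "second block"; the names `results` and `relative_gain` are made up for illustration.

```python
# Average FPS copied from the result pairs posted above, in posting order:
# (first result block, second result block) for each filter.
results = {
    "Spline36ResizeMT": (1293.0, 1173.0),
    "aWarpSharp2": (239.8, 207.2),
    "nnedi3": (0.432, 0.392),
}


def relative_gain(first, second):
    """Percentage by which `first` exceeds `second`."""
    return (first / second - 1.0) * 100.0


# One gain figure per filter, keyed by filter name.
gains = {name: relative_gain(a, b) for name, (a, b) in results.items()}

for name, gain in gains.items():
    print(f"{name}: first run is {gain:.1f}% faster than the second")
```

The same function can be fed anyone's own AVSMeter averages to check whether the setting helps or hurts on their machine.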
__________________
My github. Last edited by jpsdr; 3rd April 2018 at 20:29. |
4th April 2018, 12:07 | #57 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,816
|
Something is still not right. I used the DLL from the Release_W7 folder.
SetAffinity=true (it is even slower than before)
SetAffinity=false
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
4th April 2018, 13:15 | #58 | Link | |
Excessively jovial fellow
Join Date: Jun 2004
Location: rude
Posts: 1,100
|
Quote:
I find "benchmarks are, like, just your opinion, maaaan" to be an exceptionally poor argument, by the way. Your results compared to Atak_Snajpera's ones seem to imply that your implementation doesn't actually work, or at least doesn't do what you think it does. Heavens above know what you're even benchmarking. e: to just quickly restate the argument about cache locality in resizers: recall that most resizers are separable filters which work by moving a sampling window over the input image one dimension at a time. Where do you see the potential for great time savings in the form of cache hits in this, exactly? Last edited by TheFluff; 4th April 2018 at 13:19. |
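For readers unfamiliar with the term, a separable resizer can be sketched roughly as below. This is only an illustration under stated assumptions, not the ResampleMT implementation: plain linear interpolation stands in for the Spline36 kernel, there is no low-pass filtering when downscaling, and `resample_1d`, `transpose` and `resize_separable` are hypothetical names.

```python
def resample_1d(line, new_len):
    """Resample one row or column to new_len samples (linear interpolation)."""
    old_len = len(line)
    if new_len == 1 or old_len == 1:
        return [float(line[0])] * new_len
    scale = (old_len - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        left = min(int(pos), old_len - 2)   # left neighbour of the sample window
        frac = pos - left
        out.append(line[left] * (1.0 - frac) + line[left + 1] * frac)
    return out


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def resize_separable(image, new_w, new_h):
    """Resize by filtering one dimension at a time."""
    # Horizontal pass: every row is resampled independently.
    rows = [resample_1d(row, new_w) for row in image]
    # Vertical pass: every column of the intermediate image is resampled.
    cols = [resample_1d(col, new_h) for col in transpose(rows)]
    return transpose(cols)


small = [[0, 2, 4], [2, 4, 6], [4, 6, 8]]
big = resize_separable(small, 5, 5)
print(len(big), len(big[0]))
```

In this sketch each output sample depends only on a short run of neighbouring input samples along one axis, which is the access pattern the question above is about.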
|
4th April 2018, 14:27 | #59 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
@Atak_Snajpera
Can you provide your results with both true/false for just the following script:
Code:
Colorbars(width=1920*2,height=1080*2,pixel_type="yv12").killaudio().assumefps(25,1).trim(0,9999)
Spline36ResizeMT(1920,1080,SetAffinity=true)
__________________
My github. Last edited by jpsdr; 4th April 2018 at 15:02. |
4th April 2018, 14:39 | #60 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
The scripts are provided, so if it's impossible to tell what is being benchmarked by looking at them, I really don't know what more I can do.
About the cache, I'm just saying that if you have 8 physical CPUs with 8 threads, each thread working on 1/8 of a 1 MB frame and each on a different CPU, there is a better chance that the working memory zone of each thread will fit entirely in the cache and stay there during the whole process than if you have 8 threads each working on a full 1 MB frame. No more, no less.
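The arithmetic behind that argument can be written out as a rough Python sketch. The frame size and thread count come from the example above; the per-core cache size is purely an assumption chosen for illustration (real CPUs differ), and `working_set` is a hypothetical helper name.

```python
FRAME_BYTES = 1 * 1024 * 1024   # the 1 MB frame from the example above
THREADS = 8                     # 8 threads, one per physical CPU
CACHE_BYTES = 256 * 1024        # ASSUMPTION: hypothetical per-core cache size


def working_set(frame_bytes, threads, split_frame):
    """Bytes each thread touches: one slice of the frame, or the whole frame."""
    return frame_bytes // threads if split_frame else frame_bytes


slice_bytes = working_set(FRAME_BYTES, THREADS, split_frame=True)
full_bytes = working_set(FRAME_BYTES, THREADS, split_frame=False)

print("per-thread slice:", slice_bytes, "fits:", slice_bytes <= CACHE_BYTES)
print("full frame:      ", full_bytes, "fits:", full_bytes <= CACHE_BYTES)
```

With these assumed numbers a 1/8 slice fits in the per-core cache while a full frame does not; whether that translates into a measurable speed-up is exactly what the benchmarks in this thread are trying to settle.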
__________________
My github. |
|
|