29th June 2017, 03:31 | #5421
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+89-fa076d29d619 (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)
x265 [info]: HEVC encoder version 2.4+89-fa076d29d619
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
https://bitbucket.org/multicoreware/x265/commits/branch/default
30th June 2017, 09:35 | #5423
Registered User
Join Date: Dec 2014
Posts: 666
Guys,
Can you explain the difference between the --analysis and the --multi-pass options? Both try to eliminate redundant work in multi-pass encoding. Why not just have one option for this?
__________________
Asus ProArt Z790 - 13th Gen Intel i9 - RTX 3080 - DDR5 64GB Predator - LG OLED C9 - Yamaha A3030 - Windows 11 x64 - PotPlayer - LAV - madVR
30th June 2017, 10:07 | #5424
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
The mysteries of

--[no-]multi-pass-opt-analysis
--[no-]multi-pass-opt-distortion

and

--analysis-reuse-file <filename>
--analysis-reuse-level <1..10>

I guess that using an analysis file supersedes several on-the-fly internal calculations that a refinement would otherwise perform, replacing them with fixed values that are not meant to be changed further... I wonder how deeply you have to understand the principles of operation of the encoder core to understand the relations between these two groups of options. You probably have to be able to read and understand the C++ sources, at least. I see the analysis file more as a debugging and optimization tool for running dozens of encodes and comparing lots of statistics than as a way to speed up a casual user's movie conversion.
Last edited by LigH; 30th June 2017 at 10:11.
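To make the contrast concrete, here is a minimal Python sketch that only builds x265 argument lists (it runs nothing). The multi-pass options work inside one rate-controlled multi-pass encode whose passes share a stats file, whereas the analysis-reuse options dump analysis data in one encode (save) and feed it into a separate, later encode (load). Apart from the option names quoted in this thread and the standard --input, --output, --bitrate, --pass and --stats options, everything here (file names, bitrate, function names) is a hypothetical placeholder.

```python
from typing import List

SOURCE = "input.y4m"            # hypothetical input file
STATS = "x265_2pass.log"        # hypothetical stats file shared by all passes
ANALYSIS = "x265_analysis.dat"  # analysis file (the documented default name)


def multi_pass_args(pass_no: int, bitrate_kbps: int) -> List[str]:
    """One pass of a multi-pass ABR encode.

    All passes read/write the same stats file; the multi-pass-opt flags let
    a later pass reuse decisions stored by pass 1 of the SAME encode.
    """
    return ["x265", "--input", SOURCE, "--bitrate", str(bitrate_kbps),
            "--pass", str(pass_no), "--stats", STATS,
            "--multi-pass-opt-analysis", "--multi-pass-opt-distortion",
            "--output", f"pass{pass_no}.hevc"]


def analysis_reuse_args(mode: str, output: str, level: int = 5) -> List[str]:
    """A 'save' or 'load' run: analysis data dumped by one encode is loaded
    by a separate, later encode (e.g. the same source at another bitrate)."""
    return ["x265", "--input", SOURCE,
            "--analysis-reuse-mode", mode,
            "--analysis-reuse-file", ANALYSIS,
            "--analysis-reuse-level", str(level),
            "--output", output]


if __name__ == "__main__":
    for args in (multi_pass_args(1, 5000), multi_pass_args(2, 5000),
                 analysis_reuse_args("save", "master.hevc"),
                 analysis_reuse_args("load", "low_bitrate.hevc")):
        print(" ".join(args))
```

The point of the sketch is the data flow: the stats file ties the passes of one encode together, while the analysis file can outlive the encode that wrote it.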
30th June 2017, 10:23 | #5425
Registered User
Join Date: Dec 2014
Posts: 666
Thanks LigH.
It is really difficult to understand these things, especially for ordinary folks like me. I wish someone could explain them with diagrams; I think that would be more comprehensible for everyone.
30th June 2017, 13:45 | #5426
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+96-58b4fa89c42d (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)
x265 [info]: HEVC encoder version 2.4+96-58b4fa89c42d
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
https://bitbucket.org/multicoreware/x265/commits/branch/default
Last edited by Barough; 30th June 2017 at 14:09.
30th June 2017, 18:07 | #5427
Guest
Posts: n/a
3rd July 2017, 16:03 | #5428
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
x265 2.4+96-58b4fa89c42d (GCC 7.1.0, Win32+Win64, AIO EXE+DLL only)
Merge with stable; several fixes and tweaks.
Renamed / changed / new CLI options:

--analysis-reuse-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default 0
--analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Default x265_analysis.dat
--analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default 5
--refine-intra <int>                Enable intra refinement for load mode. Default 0
--[no-]const-vbv                    Enable consistent vbv. Turned on with tune grain. Default disabled
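The new options can be summarized as a small Python settings object that checks the documented ranges and turns them into command-line arguments. The option names, defaults and the 1..10 level range come from the help text above; the class name, the "off" placeholder standing in for the default mode 0, and the rule that --refine-intra is rejected outside load mode are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisReuse:
    """Analysis-reuse settings, with defaults taken from the help text above."""
    mode: str = "off"                         # "save" or "load"; "off" stands for the default 0
    analysis_file: str = "x265_analysis.dat"  # documented default file name
    level: int = 5                            # 1 = least .. 10 = most info stored/reused
    refine_intra: int = 0                     # documented as "for load mode"

    def validate(self) -> None:
        """Reject values outside the documented ranges."""
        if self.mode not in ("off", "save", "load"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not 1 <= self.level <= 10:
            raise ValueError("--analysis-reuse-level must be in 1..10")
        # Assumption of this sketch: intra refinement only makes sense when loading.
        if self.refine_intra and self.mode != "load":
            raise ValueError("--refine-intra is meant for load mode")

    def to_args(self) -> List[str]:
        """Translate the settings into x265 command-line arguments."""
        self.validate()
        if self.mode == "off":
            return []  # leave x265 at its defaults
        args = ["--analysis-reuse-mode", self.mode,
                "--analysis-reuse-file", self.analysis_file,
                "--analysis-reuse-level", str(self.level)]
        if self.refine_intra:
            args += ["--refine-intra", str(self.refine_intra)]
        return args


if __name__ == "__main__":
    print(AnalysisReuse(mode="save", level=8).to_args())
    print(AnalysisReuse(mode="load", level=8, refine_intra=1).to_args())
```

A save run and a later load run would point at the same analysis file; only the mode (and optionally the intra refinement) differs between them.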
6th July 2017, 21:11 | #5429
Unavailable
Join Date: Mar 2009
Location: offline
Posts: 1,480
x265.exe 2.4+97-006c75cf822e
Aruna Matheswaran committed 006c75c, 2017-06-22
Allocate frame threads based on available pool threads

This patch decides #frame-threads based on #pool-threads available. If pools are not specified, #frame-threads will be decided based on detected #CPU-threads. This patch also decreases #frame-threads allocated for #pool-threads in the intervals (15 - 31) and (>= 32), as there is high run-to-run variation in bitrate and SSIM with higher frame threads. With this reduction in #frame-threads there is a ~3-4% drop in fps with little SSIM improvement for #pool-threads (15 - 31), and no significant change in performance for #pool-threads (>= 32).
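A minimal Python sketch of the selection logic the commit message describes: size the frame threads from the thread pool if one is given, otherwise from the detected CPU thread count, with smaller values in the 15-31 and >= 32 bands. The tier table and the function name are hypothetical and only illustrate the shape of the rule; the real thresholds live in the x265 sources.

```python
import os
from typing import Optional

# Hypothetical tiers: (minimum pool threads, frame threads). The numbers are
# illustrative only, NOT the values hard-coded in x265; the commit only says
# the 15-31 and >= 32 bands now get fewer frame threads than before.
FRAME_THREAD_TIERS = [(32, 6), (15, 4), (8, 3), (4, 2), (1, 1)]


def auto_frame_threads(pool_threads: Optional[int] = None) -> int:
    """Choose a frame-thread count as the commit message describes.

    Use the thread-pool size when one is given; otherwise fall back to the
    detected CPU thread count.
    """
    threads = pool_threads if pool_threads is not None else (os.cpu_count() or 1)
    for min_threads, frames in FRAME_THREAD_TIERS:
        if threads >= min_threads:
            return frames
    return 1  # degenerate input (e.g. 0 threads): still encode with one


if __name__ == "__main__":
    for pool in (4, 8, 16, 24, 32, 64):
        print(pool, "pool threads ->", auto_frame_threads(pool), "frame threads")
    print("no pool given ->", auto_frame_threads())
```

An explicit --frame-threads value bypasses this kind of automatic choice altogether.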
6th July 2017, 23:06 | #5430
Registered User
Join Date: May 2015
Posts: 185
7th July 2017, 07:20 | #5431
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
The maximum number of threads will still be somewhat limited, due to the limited bit width of the core mask. Furthermore, too many threads for one encoder instance don't make sense: they are inefficient because each thread only "sees" a small part of the frame, and the thread synchronization overhead rises. With a huge number of physical cores, running several encoder instances in parallel, each on a subset of the cores, is much more efficient (quality / speed).
I guess this patch will support running several instances, each limited to a subset of cores, by allocating threads based on the limited number of cores in each separate pool, as if there were only a CPU with fewer cores, thus avoiding too many threads across all instances combined.
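LigH's suggestion of several smaller instances can be illustrated with a minimal Python sketch that divides a machine's hardware threads as evenly as possible among a chosen number of encoder instances. The function name is hypothetical; how each instance is actually confined to its share of the cores (OS affinity tools, or x265's --pools option per NUMA node) is outside this sketch.

```python
from typing import List


def split_pools(total_threads: int, instances: int) -> List[int]:
    """Split hardware threads among parallel encoder instances.

    The threads are divided as evenly as possible; when they do not divide
    evenly, the first instances each get one extra thread.
    """
    if instances < 1 or total_threads < instances:
        raise ValueError("need at least one thread per instance")
    base, extra = divmod(total_threads, instances)
    return [base + (1 if i < extra else 0) for i in range(instances)]


if __name__ == "__main__":
    # A 32-thread machine running two instances: 16 threads each.
    print(split_pools(32, 2))
    # Uneven case: 10 threads over 3 instances.
    print(split_pools(10, 3))
```

With such a split, each instance would size its own frame threads from its pool (as the commit above describes) rather than from the whole machine.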
7th July 2017, 07:29 | #5432
Registered User
Join Date: Dec 2002
Posts: 5,565
Did we read the same text?
15 - 31 threads: 3% to 4% slower
>= 32 threads: +/- 0%
7th July 2017, 09:37 | #5433
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Pradeep's patch review note before committing was: "The improvements in quality seem to justify the change."
That seems plausible to me: fewer threads reduce the speed a bit but increase the quality. And beyond a certain number of threads, saturation effects in the thread management may become the bottleneck, I guess.
7th July 2017, 17:30 | #5434
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+99-3160e1a0cc5f (MSYS/MinGW, GCC 7.1.0, 32 & 64bit 8/10/12bit multilib EXEs)
x265 [info]: HEVC encoder version 2.4+99-3160e1a0cc5f
x265 [info]: build info [Windows][GCC 7.1.0][32/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
Merge with default; prep for v2.5
https://bitbucket.org/multicoreware/x265/commits/branch/default
Last edited by Barough; 8th July 2017 at 16:55.
7th July 2017, 23:29 | #5435
Registered User
Join Date: May 2015
Posts: 185
7th July 2017, 23:41 | #5436
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
It reduces speed for some thread/core configurations but increases quality.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
8th July 2017, 09:40 | #5439
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Let me explain more verbosely how I imagine that. (If I am wrong, don't hesitate to correct me. Understanding x265 frame threading constraints is hard.)
Let's imagine you have a dual-socket mainboard, and each of your two CPUs has 16 cores (maybe logical, due to HT). So the whole system will report 32 cores overall. You decide to run two instances of x265 at the same time, each with a thread pool of 16, each running on its own CPU.

Before this patch, the number of frame threads depended on the number of cores in the whole system, so it was calculated in relation to the number 32. After this patch, the number of frame threads will depend on the capacity of each separate thread pool, and thus be calculated in relation to the number 16. In addition, it reduces them a bit more, as the developers discovered that using even fewer threads in this range, compared to the previous calculation of the optimal number, has an advantage.

Of course this means fewer frame threads, which will slow down the encoding a bit. But before, x265 may have been less efficient, because it spawned too many threads for the limited pool (causing more synchronization overhead than necessary), each with a smaller scope ("seeing" less of the neighborhood of the currently encoded slice, and finding fewer candidates to reduce redundancy). After this patch, the number of frame threads will better match the size of the limited thread pool, there is less thread synchronization, and each frame thread can have a wider scope, encoding more efficiently by finding better inter-coding candidates at a greater distance. (At least the second half of this statement may be based on a misunderstanding.)

P.S. - a quote from the docs:
Last edited by LigH; 8th July 2017 at 09:50. |
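LigH's dual-socket example can be put into numbers with a small Python sketch. The tier table (minimum thread count to frame threads) is hypothetical and only shows the direction of the change; the real thresholds are in the x265 sources. The sketch sizes frame threads once from the whole system (32 threads, the behavior before the patch) and once from each 16-thread pool (after the patch); it models only the pool-based sizing, not the additional reduction LigH mentions.

```python
from typing import Dict

# Hypothetical tiers: (minimum threads, frame threads). Illustrative only,
# not the values hard-coded in x265.
TIERS = [(32, 6), (15, 4), (8, 3), (4, 2), (1, 1)]


def frame_threads_for(threads: int) -> int:
    """Return the frame-thread count of the first tier the thread count reaches."""
    for min_threads, frames in TIERS:
        if threads >= min_threads:
            return frames
    return 1


def compare_sizing(system_threads: int, pool_threads: int, instances: int) -> Dict[str, int]:
    """Frame threads per instance and in total, sized from the system vs. the pool."""
    before = frame_threads_for(system_threads)  # sized from the whole machine
    after = frame_threads_for(pool_threads)     # sized from each instance's pool
    return {
        "before_per_instance": before,
        "after_per_instance": after,
        "before_total": before * instances,
        "after_total": after * instances,
    }


if __name__ == "__main__":
    # Two instances, one per 16-thread CPU, on a 32-thread dual-socket system.
    print(compare_sizing(system_threads=32, pool_threads=16, instances=2))
```

With these made-up tiers, each instance drops from 6 to 4 frame threads, i.e. from 12 to 8 frame threads across both instances, which is the "fewer threads in the sum of all instances" effect described above.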
8th July 2017, 12:48 | #5440
Codec tester
Join Date: May 2003
Location: France
Posts: 2,484
x264 and x265 encoding is really problematic on new CPUs with many cores, like Threadripper or Skylake-X. x264 and x265 are unable to reach 100% CPU load when encoding 2K or even 4K on 16C/32T or more.
@ x265 team: Wouldn't it be possible to create a new encoding mode in x265 with multiple instances, for better threading scalability? For example, use a large lookahead buffer to make frame type decisions and open a new encoding instance at each new I-frame (which would be an IDR). This mode would certainly need a large frame buffer, but if you have a Threadripper or Skylake-X CPU, you must have at least 16 GB of RAM anyway. This mode implies closed GOPs only, and perhaps short GOPs (60 frames maximum) to minimize the frame buffer.
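The chunking part of this proposal can be sketched in Python under stated assumptions: the lookahead has already produced a string of frame types ('I', 'P', 'B'), every I-frame becomes an IDR that starts a closed GOP, GOPs are capped at 60 frames, and the resulting segments are handed round-robin to independent encoder instances. The function names and the frame-type string format are hypothetical; this only illustrates the proposed chunking, not an x265 feature.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def split_closed_gops(frame_types: str, max_gop: int = 60) -> List[Segment]:
    """Cut a lookahead frame-type string into independently encodable segments.

    A segment starts at every 'I' frame (assumed to become an IDR, so GOPs are
    closed) and also whenever the current segment reaches max_gop frames; the
    first frame of such a forced cut would then also be coded as an IDR.
    """
    if max_gop < 1:
        raise ValueError("max_gop must be positive")
    segments: List[Segment] = []
    start = 0
    for i, frame_type in enumerate(frame_types):
        if i > start and (frame_type == "I" or i - start >= max_gop):
            segments.append((start, i))
            start = i
    if frame_types:
        segments.append((start, len(frame_types)))
    return segments


def assign_round_robin(segments: List[Segment], instances: int) -> Dict[int, List[Segment]]:
    """Hand segments to encoder instances in turn (no load balancing)."""
    return {k: segments[k::instances] for k in range(instances)}


if __name__ == "__main__":
    # Made-up lookahead result: I-frames at frames 0, 40 and 140.
    types = "I" + "P" * 39 + "I" + "P" * 99 + "I" + "P" * 19
    gops = split_closed_gops(types, max_gop=60)
    print(gops)
    print(assign_round_robin(gops, instances=2))
```

Each instance would run its own rate control, so a real implementation would also have to coordinate bitrate and VBV across segments; the sketch ignores that.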
__________________
Le Sagittaire ... ;-)
1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9
Last edited by Sagittaire; 8th July 2017 at 15:05.