Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

Old 29th June 2017, 03:31   #5421  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+89-fa076d29d619 (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.4+89-fa076d29d619
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 30th June 2017, 03:25   #5422  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
x265.exe 2.4+93-ef8dfbb70dd6

https://forum.videohelp.com/threads/...=1#post2490163
Old 30th June 2017, 09:35   #5423  |  Link
Magik Mark
Registered User
 
Join Date: Dec 2014
Posts: 666
Guys,

Can you explain the difference between --analysis vs --multipass?

Both try to eliminate redundant work in multi-pass encoding. Why not have just one option for this?
__________________
Asus ProArt Z790 - 13th Gen Intel i9 - RTX 3080 - DDR5 64GB Predator - LG OLED C9 - Yamaha A3030 - Windows 11 x64 - PotPlayer - Lav - MadVR
Old 30th June 2017, 10:07   #5424  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
The mysteries of

--[no-]multi-pass-opt-analysis
--[no-]multi-pass-opt-distortion

Quote:
Multipass {analysis refinement|refinement of qp} cannot be enabled when ‘analysis-save/analysis-load’ option is enabled and both will be disabled when enabled together.

--analysis-reuse-mode <string|int>
--analysis-reuse-file <filename>
--analysis-reuse-level <1..10>
I guess that using an analysis file supersedes several on-the-fly internal calculations that a refinement would have used, replacing them with fixed values not meant to be changed further... I wonder how deeply you have to understand the principles of operation in the encoder core to understand the relations between these two groups of options. You would probably have to be able to read and understand the C++ sources, at least.

I see the analysis file as a kind of debugging and optimization tool for running dozens of encodes and comparing lots of statistics, rather than as a way to speed up a casual user's movie conversion.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 30th June 2017 at 10:11.
Old 30th June 2017, 10:23   #5425  |  Link
Magik Mark
Registered User
 
Join Date: Dec 2014
Posts: 666
Thanks LigH.

These things are really difficult to understand, especially for ordinary folks like me. I wish someone could explain them with images; I think that would be more comprehensible for everyone.
Old 30th June 2017, 13:45   #5426  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+96-58b4fa89c42d (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.4+96-58b4fa89c42d
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default

Last edited by Barough; 30th June 2017 at 14:09.
Old 30th June 2017, 18:07   #5427  |  Link
x265_Project
Guest
 
Posts: n/a
Analysis reuse provides the basic framework for more advanced solutions that we are building on top of / around x265 (in UHDkit). I don't want to go into more detail about all of the use cases, as we've already seen our competition attempting to copy our methods. The basic idea is to produce solutions that run faster (for live encoding scenarios) and are more computationally efficient (for offline encoding scenarios). These modes are not going to provide any benefit to anyone today who is just using x265 alone. They won't produce higher quality.
Old 3rd July 2017, 16:03   #5428  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
x265 2.4+96-58b4fa89c42d (GCC 7.1.0, Win32+Win64, AIO EXE+DLL only)

merge with stable; several fixes and tweaks

renamed / changed / new CLI options:

Code:
   --analysis-reuse-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default 0
   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Default x265_analysis.dat
   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default 5

   --refine-intra <int>          Enable intra refinement for load mode. Default 0

   --[no-]const-vbv              Enable consistent vbv. turned on with tune grain. Default disabled
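
To make the option list above concrete, here is a minimal Python sketch that only assembles the command lines for a save run followed by a load run. The option names come from the help text quoted above; the executable name `x265`, the file names (`input.y4m`, `pass_save.hevc`, `pass_load.hevc`) and the helper `analysis_reuse_cmd` are assumptions made for illustration. Nothing is launched.

```python
def analysis_reuse_cmd(mode, source, output,
                       analysis_file="x265_analysis.dat", level=5,
                       refine_intra=None):
    """Build an x265 argument list for one analysis-reuse run.

    mode is "save" or "load", as listed in the help text above.
    All file names are placeholders chosen for this sketch.
    """
    if mode not in ("save", "load"):
        raise ValueError("mode must be 'save' or 'load'")
    if not 1 <= level <= 10:
        raise ValueError("level must be in 1..10")
    cmd = ["x265", "--input", source,
           "--analysis-reuse-mode", mode,
           "--analysis-reuse-file", analysis_file,
           "--analysis-reuse-level", str(level)]
    # --refine-intra is described above as applying to load mode only.
    if refine_intra is not None:
        if mode != "load":
            raise ValueError("--refine-intra only applies to load mode")
        cmd += ["--refine-intra", str(refine_intra)]
    cmd += ["--output", output]
    return cmd


save_cmd = analysis_reuse_cmd("save", "input.y4m", "pass_save.hevc")
load_cmd = analysis_reuse_cmd("load", "input.y4m", "pass_load.hevc", refine_intra=1)
print(" ".join(save_cmd))
print(" ".join(load_cmd))
```

Both runs point at the same analysis file, so the load run reads what the save run dumped.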
Old 6th July 2017, 21:11   #5429  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
x265.exe 2.4+97-006c75cf822e

Code:
Aruna Matheswaran  committed 006c75c
2017-06-22

Allocate frame threads based on available pool threads

This patch decides #frame-threads based on #pool-threads available. If pools not
specified, #frame-threads will be decided based on detected #CPU-threads.

This patch also decreases #frame-threads allocated for #pool-threads in the
interval (15 - 31) and (>= 32) as there is high run to run variation in bitrate
and SSIM with higher frame threads. With this reduction in #frame-threads there
is ~3-4 % drop in fps with little SSIM improvement for #pool-threads (15 - 31)
and no significant change in performance for #pool-threads (>= 32).
https://forum.videohelp.com/threads/...=1#post2490727
Old 6th July 2017, 23:06   #5430  |  Link
pingfr
Registered User
 
Join Date: May 2015
Posts: 185
Quote:
Originally Posted by Midzuki View Post
x265.exe 2.4+97-006c75cf822e

Code:
Aruna Matheswaran  committed 006c75c
2017-06-22

Allocate frame threads based on available pool threads

This patch decides #frame-threads based on #pool-threads available. If pools not
specified, #frame-threads will be decided based on detected #CPU-threads.

This patch also decreases #frame-threads allocated for #pool-threads in the
interval (15 - 31) and (>= 32) as there is high run to run variation in bitrate
and SSIM with higher frame threads. With this reduction in #frame-threads there
is ~3-4 % drop in fps with little SSIM improvement for #pool-threads (15 - 31)
and no significant change in performance for #pool-threads (>= 32).
https://forum.videohelp.com/threads/...=1#post2490727
If this is what I think it is, it should help tremendously with faster encoding on high-end dedicated servers? I'm thinking of an SMP machine with something like 60 cores, 120 threads, etc.
Old 7th July 2017, 07:20   #5431  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
The maximum number of threads will still be somewhat limited, due to the limited bitfield width of the core mask. Furthermore, too many threads for one encoder instance don't make sense: they will be inefficient because each thread will only "see" a small part of the frame, and the thread sync overhead rises. With a huge number of physical cores, running several instances of the encoder in parallel, each on a subset of cores, is much more efficient (quality / speed).

I guess this patch will support the execution of several instances, each limited to a subset of cores, by allocating threads based on the limited amount of cores in each separate pool, as if there was only a CPU with fewer cores, thus avoiding too many threads in the sum of all instances.
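
A minimal sketch of such a multi-instance setup in Python, assuming a two-node NUMA machine where each socket is one node. Per the x265 documentation, the `--pools` string takes one entry per node, with "+" meaning all cores on that node and "-" meaning none. The file names and the helper `per_node_commands` are hypothetical, and the sketch only builds the argument lists.

```python
def per_node_commands(sources, node_count):
    """Return one x265 argument list per source, each restricted to one NUMA node.

    The --pools string holds one entry per node: "+" uses all cores on
    that node, "-" uses none.
    """
    if len(sources) > node_count:
        raise ValueError("this sketch assigns at most one encode per node")
    commands = []
    for node, source in enumerate(sources):
        pools = ",".join("+" if n == node else "-" for n in range(node_count))
        output = source.rsplit(".", 1)[0] + ".hevc"
        commands.append(["x265", "--pools", pools,
                         "--input", source, "--output", output])
    return commands


cmds = per_node_commands(["part_a.y4m", "part_b.y4m"], node_count=2)
for cmd in cmds:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.Popen` to run both encodes at the same time; that step is left out here.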
Old 7th July 2017, 07:29   #5432  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Did we read the same text?
Quote:
With this reduction in #frame-threads there
is ~3-4 % drop in fps with little SSIM improvement for #pool-threads (15 - 31)
and no significant change in performance for #pool-threads (>= 32).
To me it means:
15 - 31 threads: 3% to 4% slower
>= 32 threads: +/- 0%
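
As a quick sanity check on what a 3% to 4% fps drop means for wall-clock time, here is a small Python calculation. The 10 fps baseline is a made-up figure, not a measurement from the commit.

```python
def slowdown(baseline_fps, drop_percent):
    """Return (new_fps, extra_time_percent) for a given percentage drop in fps."""
    new_fps = baseline_fps * (1 - drop_percent / 100)
    # Encoding time is inversely proportional to fps.
    extra_time_percent = (baseline_fps / new_fps - 1) * 100
    return new_fps, extra_time_percent


for drop in (3, 4):
    fps, extra = slowdown(10.0, drop)
    print(f"{drop}% fps drop: {fps:.2f} fps, about {extra:.2f}% longer encode")
```

Because time scales with 1/fps, the encode takes slightly more than 3% to 4% longer, roughly 3.1% to 4.2%.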
Old 7th July 2017, 09:37   #5433  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Pradeep's patch review note before committing was: "The improvements in quality seem to justify the change."

Quite imaginable to me: Fewer threads reduce the speed a bit, but increase the quality. And beyond a threshold of threads, saturation effects of the thread management may be the bottleneck, I guess.
Old 7th July 2017, 17:30   #5434  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.4+99-3160e1a0cc5f (MSYS/MinGW, GCC 7.1.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.4+99-3160e1a0cc5f
x265 [info]: build info [Windows][GCC 7.1.0][32/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

Merge with default; prep for v2.5

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default

Last edited by Barough; 8th July 2017 at 16:55.
Old 7th July 2017, 23:29   #5435  |  Link
pingfr
Registered User
 
Join Date: May 2015
Posts: 185
Quote:
Originally Posted by LigH View Post
Pradeep's patch review note before committing was: "The improvements in quality seem to justify the change."

Quite imaginable to me: Fewer threads reduce the speed a bit, but increase the quality. And beyond a threshold of threads, saturation effects of the thread management may be the bottleneck, I guess.
In layman's terms, does that imply that this patch, once merged, increases not only encoding speed but also the overall resulting quality, or am I getting confused here?
Old 7th July 2017, 23:41   #5436  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
Quote:
Originally Posted by pingfr View Post
In layman's terms, does that imply that this patch, once merged, increases not only encoding speed but also the overall resulting quality, or am I getting confused here?
It reduces speed for some thread/core configurations but increases quality.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 7th July 2017, 23:46   #5437  |  Link
pingfr
Registered User
 
Join Date: May 2015
Posts: 185
So a quality increase (visual quality being subjective) at the same bitrate as before the patch, at the cost of a 3-4% speed decrease, correct?
Old 8th July 2017, 02:56   #5438  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
It depends on the configuration. The actual difference also depends on any filters used. The figures aren't a promise of what you will get; if no figures had been given, people would be asking how much.
Old 8th July 2017, 09:40   #5439  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Let me explain more verbosely how I imagine that. (If I am wrong, don't hesitate to correct me. Understanding x265 frame threading constraints is hard.)

Let's imagine you have a dual-socket mainboard, and each of your two CPUs has 16 cores (maybe logical ones due to HT). So the whole system will report 32 cores overall.

You decide to run two instances of x265 at the same time, each with a thread pool of 16, to run distinctly on either CPU.

Before this patch, the number of frame threads depended on the number of cores in the whole system, so it was calculated in relation to the number 32.

After this patch, the number of frame threads will depend on the capacity of each separate thread pool, and thus be calculated in relation to the number 16. In addition, the patch reduces them a bit more, as the developers discovered that even fewer threads in this range, compared to the previous calculation of the optimal number, are advantageous.

Of course this means fewer frame threads. That will slow down the calculation a bit. But before, x265 may have been less efficient, because it spawned too many threads for the limited pool (causing more synchronization overhead than necessary), each with a smaller scope ("seeing" less of the whole neighborhood of the currently encoded slice, finding fewer candidates to reduce redundancy).

After this patch, the number of frame threads will match the size of the limited thread pool better, there is less thread synchronization, and each frame thread can have a wider scope, encoding more efficiently by finding better inter-coding candidates at a greater distance. (At least the second half of this statement may be a misunderstanding on my part.)
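
The before/after difference described above can be sketched in Python. The thresholds in `ILLUSTRATIVE_TABLE` are invented for illustration and are not x265's actual values; the point is only which thread count is fed into the lookup. All names are hypothetical.

```python
# (minimum thread count, frame threads) pairs, highest threshold first.
# These numbers are invented for the sketch, not taken from x265.
ILLUSTRATIVE_TABLE = [(32, 6), (16, 4), (8, 3), (4, 2), (1, 1)]


def frame_threads_for(thread_count, table=ILLUSTRATIVE_TABLE):
    """Look up a frame-thread count for a given number of worker threads."""
    for minimum, frame_threads in table:
        if thread_count >= minimum:
            return frame_threads
    return 1


def auto_frame_threads(system_threads, pool_threads=None, use_pool=True):
    """Pick the lookup input.

    use_pool=False models the old behaviour described above (always the
    whole system); use_pool=True models the new one (the pool size when a
    pool is specified, otherwise the detected CPU threads).
    """
    if use_pool and pool_threads is not None:
        return frame_threads_for(pool_threads)
    return frame_threads_for(system_threads)


# Dual-socket example from the post: 32 threads overall, a pool of 16 per instance.
before = auto_frame_threads(32, pool_threads=16, use_pool=False)
after = auto_frame_threads(32, pool_threads=16, use_pool=True)
print(before, after)
```

With this made-up table, each of the two instances ends up with fewer frame threads once the pool size, rather than the system total, drives the choice.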
_

P.S. - a quote from the docs:

Quote:
Over-allocating frame threads can be very counter-productive. They each allocate a large amount of memory and because of the limited number of CTU rows and the reference lag, you generally get limited benefit from adding frame encoders beyond the auto-detected count, and often the extra frame encoders reduce performance.
Doesn't really explain the reason for a potential quality limitation due to over-allocation, though...

Last edited by LigH; 8th July 2017 at 09:50.
Old 8th July 2017, 12:48   #5440  |  Link
Sagittaire
Testeur de codecs
 
Join Date: May 2003
Location: France
Posts: 2,484
Encoding with x264 and x265 is really problematic on new many-core CPUs like "Threadripper" or "Skylake-X". x264 and x265 are unable to reach 100% CPU load when encoding 2K or even 4K with 16C/32T or more.

@ x265 team

Would it be possible to create a new encoding mode in x265 that uses multiple instances for better threading scalability?

For example, use a large lookahead buffer to make the frame type decisions and open a new encoding instance at each new I-frame (which will be an IDR).

This mode certainly implies a large frame buffer, but if you have a "Threadripper" or "Skylake-X" CPU, you should have at least 16 GB of RAM anyway.

This mode only requires closed GOPs and perhaps short GOPs (60 frames maximum) to minimize the frame buffer.
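
What is proposed here is a mode inside the encoder. A rough external approximation, sketched below in Python under several assumptions, is to split the source into fixed-length chunks and give each chunk its own x265 process using `--seek` and `--frames`, with closed GOPs via `--no-open-gop` and a `--keyint` limit. Unlike the proposal, the cut points are fixed rather than chosen by lookahead, and each chunk gets independent rate control. The chunk length, file names and helper names are hypothetical, and nothing is executed.

```python
def chunk_ranges(total_frames, chunk_len):
    """Split [0, total_frames) into (start, count) pairs of at most chunk_len frames."""
    if total_frames < 0 or chunk_len <= 0:
        raise ValueError("total_frames must be >= 0 and chunk_len > 0")
    return [(start, min(chunk_len, total_frames - start))
            for start in range(0, total_frames, chunk_len)]


def chunk_command(source, index, start, count, keyint=60):
    """Build the x265 argument list for one chunk (nothing is executed here)."""
    return ["x265", "--input", source,
            "--seek", str(start), "--frames", str(count),
            "--keyint", str(keyint), "--no-open-gop",
            "--output", f"chunk_{index:04d}.hevc"]


ranges = chunk_ranges(total_frames=250, chunk_len=120)
commands = [chunk_command("input.y4m", i, s, n) for i, (s, n) in enumerate(ranges)]
for cmd in commands:
    print(" ".join(cmd))
```

The per-chunk outputs would still need to be joined afterwards, which is outside this sketch.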
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 8th July 2017 at 15:05.