Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

24th January 2019, 04:57   #6661
brumsky
Registered User
Join Date: Jun 2016
Posts: 93
@excellentswordfight

I've come to the same conclusion as well: merange should be based on the CTU size. If you don't leave CTU at 64, then merange should also be reduced; otherwise it'll be searching outside of its original block. Not that this is necessarily bad. I've seen certain "high quality" encodes that use meranges larger than the CTU, but I'm not 100% sold that it provides tangible differences.

I personally drop merange to 26 when using CTU 32. Since I never plan on using hex search, I should probably change merange to 58 when using CTU 64...

Best way to find out is to try it for yourself.
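A minimal sketch of the rule of thumb above, assuming the two values given (merange 26 for CTU 32, 58 for CTU 64) follow a "CTU size minus 6" pattern. That pattern is inferred from the post, not taken from the x265 documentation; `--ctu`, `--merange` and `--me` are real x265 options, but the helper names are hypothetical.

```python
# Hypothetical helpers: pick an x265 --merange value from the CTU size using
# the "CTU minus 6" pattern inferred from the post (32 -> 26, 64 -> 58).
# This is a forum rule of thumb; x265's own default is --merange 57.

VALID_CTU_SIZES = (16, 32, 64)  # sizes accepted by x265's --ctu option


def suggest_merange(ctu: int) -> int:
    """Return the poster's suggested motion search range for this CTU size."""
    if ctu not in VALID_CTU_SIZES:
        raise ValueError(f"unsupported CTU size: {ctu}")
    return ctu - 6


def x265_args(ctu: int, me: str = "star") -> list:
    """Build the CTU / motion-search part of an x265 command line.

    'star' is just one of x265's --me methods, chosen here because the
    poster avoids hex search; any valid method name can be passed.
    """
    return ["--ctu", str(ctu), "--merange", str(suggest_merange(ctu)), "--me", me]


if __name__ == "__main__":
    for size in (32, 64):
        print(" ".join(x265_args(size)))
```

Running it prints `--ctu 32 --merange 26 --me star` and `--ctu 64 --merange 58 --me star`; as the post says, testing on your own material is the only way to know whether the change matters.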
24th January 2019, 05:28   #6662
brumsky
Registered User
Join Date: Jun 2016
Posts: 93
Quote:
Originally Posted by kosta1000
Another question: when Ryzen 2 comes out around this June, will this processor have much better performance in x265 encoding, now that AMD has put in 256-bit AVX2, along with many other notable improvements?
I hope it does. I also hope AMD allows them to be fused like they currently do with two 128-bit AVX FPUs that can run one 256-bit AVX2 instruction. If they do this, then they will have the lead in AVX512!

Intel's top-of-the-line 18-core CPU only has 2 AVX512 units on it. Imagine a Zen 2 CPU with up to 8 AVX512 fused units!!! I know there is an extra cycle or two when doing a fused operation, but still, up to 8 AVX512 operations will be nice!
24th January 2019, 05:31   #6663
Selur
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 5,799
Quote:
--hevc-aq overrides whatever --aq-mode is set to.
okay
Quote:
I suspect that --aq-strength may get overridden itself, with --aq-adaption-range being the equivalent. Or both parameters could be used together, ala the interaction of CRF with maxrate/bufsize.
And here it gets confusing to me.
Questions are:
a. is '--aq-strength' an option for both '--aq-mode' and '--hevc-aq' or just '--aq-mode'?
b. is '--aq-adaption-range' an option for both '--aq-mode' and '--hevc-aq' or just '--hevc-aq'?

Cu Selur
__________________
Hybrid here in the forum, homepage
24th January 2019, 09:04   #6664
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,615
Quote:
Originally Posted by brumsky
I know there is an extra cycle or two when doing a fused operation, but still, up to 8 AVX512 operations will be nice!
Other than compatibility, that really doesn't offer anything. AVX/AVX2 instructions making use of those 256-bit units would be about the same speed.

256-bit AVX on current Ryzen isn't that much faster than 128-bit SSE due to that.

In any case, there have been zero hints about AVX512 support.
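A back-of-the-envelope sketch of why fusing narrow units does not raise throughput, under a deliberately simplified model (an assumption, not AMD's documented pipeline): a core has `units` SIMD pipes of `unit_bits` each, a wider instruction occupies several fused pipes, and latency, decode width and the extra cycle for fused operations are ignored. The pipe count of 2 is illustrative.

```python
# Simplified throughput model (assumption): peak bits processed per cycle by
# `units` SIMD pipes that are each `unit_bits` wide, running instructions of
# `op_bits` width. Wider ops are split across fused pipes; narrower ops leave
# part of each pipe idle. Latency and the fused-op penalty are ignored.


def peak_bits_per_cycle(units: int, unit_bits: int, op_bits: int) -> int:
    return units * min(unit_bits, op_bits)


CASES = [
    ("128-bit SSE on 2 x 128-bit pipes", 2, 128, 128),
    ("256-bit AVX2 on 2 x 128-bit pipes (fused)", 2, 128, 256),
    ("256-bit AVX2 on 2 x 256-bit pipes", 2, 256, 256),
    ("512-bit AVX-512 on 2 x 256-bit pipes (fused)", 2, 256, 512),
]

if __name__ == "__main__":
    for label, units, unit_bits, op_bits in CASES:
        print(f"{label}: {peak_bits_per_cycle(units, unit_bits, op_bits)} bits/cycle")
```

In this model, fused 256-bit AVX2 on 128-bit pipes matches 128-bit SSE (256 bits/cycle), and a hypothetical fused AVX-512 on 256-bit pipes would only match 256-bit AVX2 (512 bits/cycle).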
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
24th January 2019, 18:19   #6665
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,794
Quote:
Originally Posted by nevcairiel
Other than compatibility, that really doesn't offer anything. AVX/AVX2 instructions making use of those 256-bit units would be about the same speed.

256-bit AVX on current Ryzen isn't that much faster than 128-bit SSE due to that.

In any case, there have been zero hints about AVX512 support.
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally, the value of AVX-family instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.
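A hedged sketch of how a wrapper script might encode this guidance. It assumes x265's `--asm avx512` switch is the opt-in for the AVX-512 kernels (check `x265 --help` for your build); the helper names, the `cpu_vendor` string and the 2160-line UHD threshold are illustrative assumptions.

```python
# Hypothetical wrapper logic: only opt into x265's AVX-512 kernels where the
# post says they have been shown to help (Intel, UHD, preset 'slower' or beyond).

X265_PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
]


def want_avx512(cpu_vendor: str, height: int, preset: str) -> bool:
    """True for Intel CPUs encoding UHD (>= 2160 lines) at 'slower' or beyond."""
    slow_enough = X265_PRESETS.index(preset) >= X265_PRESETS.index("slower")
    return cpu_vendor.lower() == "intel" and height >= 2160 and slow_enough


def asm_args(cpu_vendor: str, height: int, preset: str) -> list:
    # Per the post, x265 already uses detected AVX/AVX2 but leaves AVX-512
    # off by default, so nothing needs to be passed in the other cases.
    if want_avx512(cpu_vendor, height, preset):
        return ["--asm", "avx512"]
    return []


if __name__ == "__main__":
    print(asm_args("Intel", 2160, "veryslow"))  # ['--asm', 'avx512']
    print(asm_args("Intel", 1080, "medium"))    # []
```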
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
24th January 2019, 18:22   #6666
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,794
Quote:
Originally Posted by Selur
okay

And here it gets confusing to me.
Questions are:
a. is '--aq-strength' an option for both '--aq-mode' and '--hevc-aq' or just '--aq-mode'?
b. is '--aq-adaption-range' an option for both '--aq-mode' and '--hevc-aq' or just '--hevc-aq'?
It is definitely not b. So either a, or:

c. --aq-strength is a parameter for --aq-mode, and --aq-adaption-range is a parameter for --hevc-aq, and neither is used when the other AQ type is used.
25th January 2019, 09:30   #6667
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,513
Quote:
Originally Posted by benwaggoner
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally, the value of AVX-family instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.
At least with a first-generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
26th January 2019, 14:42   #6668
Stereodude
Registered User
Join Date: Dec 2002
Location: Region 0
Posts: 1,032
Quote:
Originally Posted by Boulder
At least with a first-generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.
The Zen 2 architecture doubles the width of the datapath and the execution units to 256-bits which should double the performance of AVX2 on it vs. Zen.

The whole FPU got supersized in Zen 2 (vs. 1).

2x wider datapath (256-bit, up from 128-bit)
2x wider EUs (256-bit FMAs, up from 128-bit FMAs)
2x wider LSU (2x256-bit L/S, up from 128-bit)

from: https://en.wikichip.org/wiki/amd/mic...tectures/zen_2
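The peak-throughput arithmetic behind "should double the performance" can be sketched as follows. It assumes two FMA pipes per core for both generations (the post only gives their width) and counts one FMA as two floating-point operations. This is a peak figure only; a later reply argues that shuffle throughput matters more for encoding.

```python
# Peak single-precision FLOPs per core per cycle (assumption: two FMA pipes
# per core in both Zen and Zen 2; one fused multiply-add = 2 FLOPs).

FLOAT_BITS = 32


def sp_flops_per_cycle(fma_pipes: int, pipe_bits: int) -> int:
    lanes = pipe_bits // FLOAT_BITS  # 32-bit floats processed per pipe
    return fma_pipes * lanes * 2     # multiply + add in every lane


zen1 = sp_flops_per_cycle(fma_pipes=2, pipe_bits=128)
zen2 = sp_flops_per_cycle(fma_pipes=2, pipe_bits=256)

if __name__ == "__main__":
    print(f"Zen: {zen1}, Zen 2: {zen2}, ratio: {zen2 / zen1:.1f}x")
```

Under these assumptions this gives 16 versus 32 FLOPs per cycle, a 2.0x ratio.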
26th January 2019, 15:20   #6669
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,501
Another important advantage of the Zen 2 architecture and its implementation in the Ryzen 3000 series will be the CPU clock during heavy execution of AVX2 instructions, such as when running x265.

If AMD's statements have been interpreted correctly, we will see no performance penalty from lower clocks during x265 AVX2 code execution.

So far, Intel CPUs of every architecture run at lower clocks when executing x265's AVX2 code.

I think Ryzen 3000 (and Threadripper, EPYC) based on the Zen 2 architecture has everything it needs to be a lot faster than any Intel CPU ever released with the same number of cores.

And since AMD CPUs nowadays have more cores than their Intel counterparts, x265 could be a killer app for AMD, like Cinebench.
__________________
Win 10 x64 (17763.195) - Core i3-4170/ iGPU HD 4400 (v.5058)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
27th January 2019, 10:32   #6670
chinobino
Registered User
Join Date: Dec 2014
Posts: 8
Quote:
Originally Posted by Wolfberry
Perfect, thank you.
27th January 2019, 14:59   #6671
hajj_3
Registered User
Join Date: Mar 2004
Posts: 874
HEVC licensing info article: http://www.streamingmedia.com/Articl...te-129386.aspx
28th January 2019, 01:04   #6672
WhatZit
Registered User
Join Date: Aug 2016
Posts: 59
Quote:
Originally Posted by hajj_3
"Tragedy Of The Commons" a.k.a. "Stake The Velos Vampires"
29th January 2019, 15:24   #6673
Barough
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 307
x265 v3.0_RC+13-ae085e5cd8a2 (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (32bit : GCC 7.4.0 / 64bit : GCC 8.2.1)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
NOTE:
I checked with Pradeep (@MulticoreWare) about why the default branch hasn't been pushed to v3.0 'Stable', and this is the reply I got:

"Our plan is to continue to use 3.0_RC on the default branch and have completed tags only on the stable branch. So we don't intend to merge back."
30th January 2019, 06:24   #6674
Stephen R. Savage
Registered User
Join Date: Nov 2009
Posts: 341
Quote:
Originally Posted by benwaggoner
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support.
AVX-512 only reduces performance on server chips because of artificial frequency limits that Intel imposes. After the downclock, x265 actually draws less power and runs cooler, showing that the downclock was unnecessary. Hopefully the next generation of server CPUs improves the power management algorithm so that downclocking will not be needed.

On Skylake-X/WS and Cannon Lake, AVX-512 only ever increases performance. It will presumably also be the case on Ice Lake.

Quote:
Originally Posted by Stereodude
The Zen 2 architecture doubles the width of the datapath and the execution units to 256-bits which should double the performance of AVX2 on it vs. Zen.

The whole FPU got supersized in Zen 2 (vs. 1).

2x wider datapath (256-bit, up from 128-bit)
2x wider EUs (256-bit FMAs, up from 128-bit FMAs)
2x wider LSU (2x256-bit L/S, up from 128-bit)

from: https://en.wikichip.org/wiki/amd/mic...tectures/zen_2
What matters for encoding is not the FMA units or even the LSU, but rather shuffle throughput and latency. AMD has a long history of dropping the ball on shuffles, so we will have to wait and see here...

Quote:
Originally Posted by NikosD
So far, Intel CPUs of every architecture run at lower clocks when executing x265's AVX2 code.
This is only true on server CPUs. There is no separate AVX frequency on client CPUs (7700K, 8700K, 9900K, etc.). The only case where you may see "lower clocks" on non-server parts is when you are encoding faster and running into the 65/95 W power limit, and in that case AMD will be no different.

Last edited by Stephen R. Savage; 30th January 2019 at 06:29.
30th January 2019, 09:21   #6675
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,501
Quote:
Originally Posted by Stephen R. Savage
This is only true on server CPUs. There is no separate AVX frequency on client CPUs (7700K, 8700K, 9900K, etc.). The only case where you may see "lower clocks" on non-server parts is when you are encoding faster and running into the 65/95 W power limit, and in that case AMD will be no different.
But that is exactly the point for AMD: they will not hit that power limit, because Zen 2 implements AVX2 instructions more efficiently than any Intel architecture so far.

The 7nm process will probably help too, keeping AVX2 clocks the same as for all the other instruction sets.
30th January 2019, 09:27   #6676
StvG
Registered User
Join Date: Jul 2018
Posts: 24
Quote:
Originally Posted by benwaggoner
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets...
A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using an adaptive offset for Vcore, so my Vcore is 1.24 V at 4800 MHz (non-AVX). When encoding with AVX2, my core speed is 4500 MHz but Vcore remains at 1.24 V. When encoding with AVX512, my core speed is 4500 MHz and Vcore is 1.13 V.
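The figures above can be turned into relative speed-ups and energy per frame. The post does not say which run the ~290 W / ~250 W readings belong to; the sketch assumes they apply to the first (4K to 1080p) test, so treat the energy figures as illustrative.

```python
# Figures reported in the post (fps at 4500 MHz for both instruction sets).
RUNS = {
    "4K -> 1080p": {"avx2": 7.08, "avx512": 7.87},
    "1080p": {"avx2": 7.68, "avx512": 8.08},
}
# Assumption: these power readings belong to the first run.
POWER_W = {"avx2": 290.0, "avx512": 250.0}


def speedup(baseline_fps: float, new_fps: float) -> float:
    """Fractional speed gain of new_fps over baseline_fps."""
    return new_fps / baseline_fps - 1


def joules_per_frame(watts: float, fps: float) -> float:
    """Energy spent per encoded frame (W / (frames/s) = J/frame)."""
    return watts / fps


if __name__ == "__main__":
    for name, fps in RUNS.items():
        print(f"{name}: AVX-512 {speedup(fps['avx2'], fps['avx512']):+.1%}")
    first = RUNS["4K -> 1080p"]
    for isa in ("avx2", "avx512"):
        print(f"{isa}: {joules_per_frame(POWER_W[isa], first[isa]):.1f} J/frame")
```

With these numbers AVX-512 comes out about 11.2% and 5.2% faster, and, under the power assumption, about 41.0 versus 31.8 J per frame.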
30th January 2019, 10:09   #6677
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,615
Quote:
Originally Posted by Stephen R. Savage
AVX-512 only reduces performance on server chips because of artificial frequency limits that Intel imposes. After the downclock, x265 actually draws less power and runs cooler, showing that the downclock was unnecessary. Hopefully the next generation of server CPUs improves the power management algorithm so that downclocking will not be needed.

On Skylake-X/WS and Cannon Lake, AVX-512 only ever increases performance. It will presumably also be the case on Ice Lake.
Actually the offset is needed to maintain stability. I know, because I have a 7900X, and tried to get the best out of it.
You wouldn't notice this problem with x264 or x265, because their AVX512 usage is pretty "light", but if you run some heavy AVX512 tasks on all cores and don't configure an appropriate offset, the chip just crashes. The energy density of the AVX512 units is just too high for running at full turbo clocks, never mind overclocked.

If x265 is the only AVX512 code you ever run, and you want to risk it, sure, you can disable the offset and hope that it never happens. But I prefer to know that my system is stable no matter what software does.
So be careful, and know that you cannot judge the requirement from one fairly lightweight workload.

The only way the offset will get lower is when the cores get more efficient, which they really only do on a process shrink. So hopefully that will significantly reduce the AVX512 offset, even if I don't expect it to go away just yet.

This is easily testable by anyone with such a chip. For example, recent versions of the Intel LINPACK floating-point benchmark will put enough AVX-512 stress on the CPU to cause this.
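For readers unfamiliar with the term: an AVX offset lowers the core multiplier while AVX code runs. A minimal sketch, assuming the usual 100 MHz base clock and using the 4800 MHz / 4500 MHz figures StvG reported earlier in the thread, which correspond to an offset of 3; the function names are hypothetical.

```python
# Sketch of AVX offset arithmetic (assumption: 100 MHz base clock, BCLK).
BCLK_MHZ = 100


def avx_clock_mhz(core_ratio: int, avx_offset: int) -> int:
    """Core clock while AVX code runs: (multiplier - offset) * BCLK."""
    if avx_offset < 0 or avx_offset >= core_ratio:
        raise ValueError("offset must be between 0 and the core ratio")
    return (core_ratio - avx_offset) * BCLK_MHZ


def clock_loss(core_ratio: int, avx_offset: int) -> float:
    """Fraction of the non-AVX clock given up while AVX code runs."""
    return avx_offset / core_ratio


if __name__ == "__main__":
    # 48 x 100 MHz = 4800 MHz non-AVX; an offset of 3 gives 4500 MHz.
    print(avx_clock_mhz(48, 3), f"{clock_loss(48, 3):.2%}")
```

In that example the offset costs 6.25% of the clock whenever AVX code is running, whether or not the workload actually needed it for stability.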

Last edited by nevcairiel; 30th January 2019 at 10:25.
30th January 2019, 16:44   #6678
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 59
Quote:
Originally Posted by StvG
A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using an adaptive offset for Vcore, so my Vcore is 1.24 V at 4800 MHz (non-AVX). When encoding with AVX2, my core speed is 4500 MHz but Vcore remains at 1.24 V. When encoding with AVX512, my core speed is 4500 MHz and Vcore is 1.13 V.
I think he is referring to this: https://software.intel.com/en-us/art...-intel-avx-512

And his statement is true for Xeons. I did some tests on a Xeon Gold 6126 and actually got lower performance for 2160p at preset slow: clock speeds dropped by almost 20%, while running AVX512 gave maybe a 10% gain. Overclocked X299 platforms are a niche (although maybe not here).
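The trade-off described here is multiplicative: throughput scales with clock speed times work done per clock. A minimal sketch, assuming x265 throughput scales linearly with frequency and that the ~10% AVX-512 gain is per clock (the post does not say which); the function names are hypothetical.

```python
# Net throughput relative to the AVX2 baseline, assuming throughput scales
# linearly with clock speed (an approximation for a compute-bound encoder).


def net_relative_speed(clock_ratio: float, per_clock_gain: float) -> float:
    """clock_ratio: AVX-512 clock / AVX2 clock; per_clock_gain: e.g. 0.10."""
    return clock_ratio * (1 + per_clock_gain)


def break_even_clock_drop(per_clock_gain: float) -> float:
    """Largest clock reduction the per-clock gain can absorb."""
    return 1 - 1 / (1 + per_clock_gain)


if __name__ == "__main__":
    # Xeon Gold 6126 figures from the post: ~20% lower clock, ~10% gain.
    print(f"net: {net_relative_speed(0.80, 0.10):.2f}x")
    print(f"break-even drop: {break_even_clock_drop(0.10):.1%}")
```

Under these assumptions the net result is about 0.88x, and a 10% per-clock gain can only absorb a clock drop of roughly 9.1%, which is consistent with the slowdown reported above.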

Last edited by excellentswordfight; 30th January 2019 at 16:47.
30th January 2019, 18:05   #6679
Stephen R. Savage
Registered User
Join Date: Nov 2009
Posts: 341
Quote:
Originally Posted by NikosD
But that is exactly the point for AMD: they will not hit that power limit, because Zen 2 implements AVX2 instructions more efficiently than any Intel architecture so far.

The 7nm process will probably help too, keeping AVX2 clocks the same as for all the other instruction sets.
That statement makes no sense. If AVX2 does not "hit that power limit," that means that the frequency for non-AVX was too low. As long as it can reach a higher throughput with AVX, that means that either AVX will reduce frequency or non-AVX will be underutilized.
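The argument can be illustrated with a toy model (an assumption, not Intel's or AMD's actual power management): each workload draws a fixed number of watts per GHz, and the chip runs at the highest frequency up to `f_max` that keeps package power under the limit. The 95 W limit comes from the thread; the watts-per-GHz values and `f_max` are arbitrary illustrative numbers, not measurements.

```python
# Toy power model: power = watts_per_ghz * frequency, capped by a power limit.


def sustained_clock(f_max_ghz: float, limit_w: float, watts_per_ghz: float):
    """Return (frequency in GHz, power in W) the chip settles at."""
    freq = min(f_max_ghz, limit_w / watts_per_ghz)
    return freq, freq * watts_per_ghz


if __name__ == "__main__":
    F_MAX, LIMIT = 4.7, 95.0
    # Illustrative only: SIMD-heavy code does more work, and draws more, per GHz.
    for label, w_per_ghz in (("scalar", 15.0), ("AVX2", 25.0)):
        f, p = sustained_clock(F_MAX, LIMIT, w_per_ghz)
        print(f"{label}: {f:.2f} GHz at {p:.1f} W")
```

Here the scalar load runs at 4.70 GHz using 70.5 W, while the AVX2 load is held to 3.80 GHz at 95.0 W. A single limit cannot be met exactly by both at `f_max`: either the heavier AVX2 load throttles, or, if `f_max` were lowered until AVX2 fit, the scalar load would leave headroom unused, which is the situation the next reply argues is normal for non-SIMD code.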

Quote:
Originally Posted by nevcairiel
Actually the offset is needed to maintain stability. I know, because I have a 7900X, and tried to get the best out of it.

This is easily testable by anyone with such a chip. For example, recent versions of the Intel LINPACK floating-point benchmark will put enough AVX-512 stress on the CPU to cause this.
The separate turbo schedule for AVX exists because of worst-case scenarios, but the CPU actually has VID tables for the same maximum frequency (~4 GHz) in both AVX and AVX-512. It can already manage different workload intensities through RAPL (TDP throttling), so the separate all-core frequency limits are a result of power-management limitations. That is why I said that hopefully the next generation will have better power management, enabling higher AVX and AVX-512 frequencies in low-intensity workloads.
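On Linux, the RAPL energy counters mentioned here are exposed through the powercap sysfs interface, which is one way to check package power during an encode. A minimal sketch, assuming the package domain is `/sys/class/powercap/intel-rapl:0` (the path can differ per system, and reading `energy_uj` may require root on recent kernels); the helper names are hypothetical.

```python
import time
from pathlib import Path

# Package-0 RAPL domain in the Linux powercap interface (path may vary).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy used between two counter reads, allowing for one wrap-around.

    The counter counts up to max_energy_range_uj and then restarts at 0.
    """
    return (after - before) % (max_range_uj + 1)


def read_counter(name: str) -> int:
    return int((RAPL_DIR / name).read_text())


def average_package_watts(interval_s: float = 1.0) -> float:
    max_range = read_counter("max_energy_range_uj")
    start_t, start_e = time.monotonic(), read_counter("energy_uj")
    time.sleep(interval_s)
    end_t, end_e = time.monotonic(), read_counter("energy_uj")
    joules = energy_delta_uj(start_e, end_e, max_range) / 1e6
    return joules / (end_t - start_t)


if __name__ == "__main__":
    try:
        print(f"package power: {average_package_watts():.1f} W")
    except OSError as err:  # missing domain or insufficient permissions
        print(f"RAPL not readable here: {err}")
```

Running it once while x265 is encoding with and without `--asm avx512` gives a rough package-power comparison like the one StvG posted.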

Last edited by Stephen R. Savage; 30th January 2019 at 18:10.
30th January 2019, 19:00   #6680
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,501
Quote:
Originally Posted by Stephen R. Savage
That statement makes no sense. If AVX2 does not "hit that power limit," that means that the frequency for non-AVX was too low. As long as it can reach a higher throughput with AVX, that means that either AVX will reduce frequency or non-AVX will be underutilized.
But it does make sense in the context of real-world code optimization/execution.

Is this really the first time you've heard that most non-SIMD code can't utilize a modern CPU the way SIMD code can?

The only way to reach TDP limits of a modern CPU is from optimized SIMD code.

There are other limits to reach before power limits for non-AVX code.
All times are GMT +1.