x265 HEVC Encoder - Page 334

benwaggoner · 24th January 2019, 18:19

Quote:

Originally Posted by nevcairiel

Other than compatibility, that really doesn't offer anything. AVX/AVX2 instructions making use of those 256-bit units would be about the same speed.

256-bit AVX on current Ryzen isn't that much faster than 128-bit SSE due to that.

In any case, there have been zero hints about AVX512 support.

And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally, the value of AVX-family instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.

benwaggoner · 24th January 2019, 18:22

Quote:

Originally Posted by Selur

okay

And here it gets confusing to me.

Questions are:
a. is '--aq-strength' an option for both '--aq-mode' and '--hevc-aq' or just '--aq-mode'?
b. is '--aq-adaption-range' an option for both '--aq-mode' and '--hevc-aq' or just '--hevc-aq'?

It is definitely not b. So either a, or:

c. --aq-strength is a parameter for --aq-mode, and --aq-adaption-range is a parameter for --hevc-aq, and neither is used when the other aq type is used.

Boulder · 25th January 2019, 09:30

Quote:

Originally Posted by benwaggoner

And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally, the value of AVX-family instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.

At least with a first generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.

Stereodude · 26th January 2019, 14:42

Quote:

Originally Posted by Boulder

At least with a first generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.

The Zen 2 architecture doubles the width of the datapath and the execution units to 256 bits, which should double AVX2 performance compared with Zen.

The whole FPU got supersized in Zen 2 (vs. 1).

2x wider datapath (256-bit, up from 128-bit)
2x wider EUs (256-bit FMAs, up from 128-bit FMAs)
2x wider LSU (2x256-bit L/S, up from 128-bit)

from: https://en.wikichip.org/wiki/amd/mic...tectures/zen_2
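A back-of-the-envelope sketch of the datapath-width argument. It assumes a deliberately simplified model in which a 256-bit instruction on a 128-bit datapath is split into two 128-bit halves (as first-generation Zen does), and it ignores port counts, latency, clock speed, and memory. The function name `uops_for_block` and the 64-sample workload are hypothetical, chosen only for illustration.

```python
import math


def uops_for_block(n_samples: int, sample_bits: int,
                   vector_bits: int, datapath_bits: int) -> int:
    """Count datapath operations needed to process n_samples.

    Simplified model: one instruction covers vector_bits // sample_bits
    lanes, and an instruction wider than the datapath is split into
    vector_bits // datapath_bits internal operations.
    """
    lanes = vector_bits // sample_bits
    instructions = math.ceil(n_samples / lanes)
    splits = max(1, vector_bits // datapath_bits)
    return instructions * splits


# Hypothetical workload: 64 16-bit values (e.g. an 8x8 block of coefficients)
N, BITS = 64, 16
zen1_sse = uops_for_block(N, BITS, vector_bits=128, datapath_bits=128)
zen1_avx2 = uops_for_block(N, BITS, vector_bits=256, datapath_bits=128)
zen2_avx2 = uops_for_block(N, BITS, vector_bits=256, datapath_bits=256)
print(zen1_sse, zen1_avx2, zen2_avx2)  # 8 8 4
```

Under this model, 256-bit AVX2 needs as many datapath operations as 128-bit SSE on Zen, matching the "about the same speed" remark, while a native 256-bit datapath halves the count. Real-world gains would be smaller than 2x, since not all of an encode is vectorized.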

NikosD · 26th January 2019, 15:20

Another important advantage of the Zen 2 architecture, as implemented in the Ryzen 3000 series, will be the CPU clock during heavy execution of AVX2 instructions, such as when running x265.

If AMD's statements have been interpreted correctly, we will see no performance penalty from lower clocks while x265 executes AVX2 code.

So far, Intel CPUs of every architecture run at lower clocks when executing x265's AVX2 instructions.

I think Ryzen 3000 (and Threadripper and EPYC) based on Zen 2 has everything it needs to be a lot faster than any Intel CPU ever released with the same number of cores.

And since AMD currently offers more cores than Intel, x265 could become a killer app for AMD, like Cinebench.

chinobino · 27th January 2019, 10:32

Quote:

Originally Posted by Wolfberry

x265-v3.0+1-ed72af837053 [ICC 1900][64 bit]

Redistributable Libraries for Intel® C++

x265-v3.0+1-ed72af837053-multilib [GCC 8.2.1][64 bit]

Perfect, thank you.

hajj_3 · 27th January 2019, 14:59

HEVC licensing info article: http://www.streamingmedia.com/Articl...te-129386.aspx

WhatZit · 28th January 2019, 01:04

Quote:

Originally Posted by hajj_3

HEVC licensing info article: http://www.streamingmedia.com/Articl...te-129386.aspx

"Tragedy Of The Commons” a.k.a. "Stake The Velos Vampires"

Barough · 29th January 2019, 15:24

x265 v3.0_RC+13-ae085e5cd8a2 (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (32bit : GCC 7.4.0 / 64bit : GCC 8.2.1)

Code:

https://bitbucket.org/multicoreware/x265/commits/branch/default

NOTE :
I checked with Pradeep (@MulticoreWare) about why the default branch hasn't been pushed to v3.0 'Stable', and this is the reply I got:

"
Our plan is to continue to use 3.0_RC on the default branch and have completed tags only on the stable branch. So we don't intend to merge back.
"

NikosD · 30th January 2019, 09:21

Quote:

Originally Posted by Stephen R. Savage

This is only true on server CPU. There is no separate frequency for AVX on client CPU (7700K, 8700K, 9900K, etc.). The extent to which you may see "lower clocks" on non-server is if you are encoding faster and running into the 65/95 W power limit, and in that case AMD will be no different.

But this is exactly the case for AMD: it will not hit that power limit, because Zen 2 implements AVX2 instructions more efficiently than any Intel architecture so far.

The 7nm process could probably help too, keeping AVX2 clocks the same as those for all the other instruction sets.

StvG · 30th January 2019, 09:27

Quote:

Originally Posted by benwaggoner

And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets...

A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using adaptive offset for vcore. So my vcore is 1.24v for @4800 (non-avx) and when encoding with AVX2 my core speed is @4500 but vcore remains the same 1.24v. When encoding with AVX512 my core speed is @4500 and vcore is 1.13v.
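The figures above can be turned into relative speedups and energy per frame with a few lines of Python. This is only arithmetic on the quoted numbers; the post does not say which test the wattage belongs to, so the sketch assumes it refers to the first (4K downscaled to 1080p) run.

```python
# Figures from the post; the wattage is ASSUMED to belong to the first test.
results = {
    "4K->1080p": {"AVX2": 7.08, "AVX512": 7.87},
    "1080p": {"AVX2": 7.68, "AVX512": 8.08},
}
power_w = {"AVX2": 290.0, "AVX512": 250.0}


def speedup_pct(old_fps: float, new_fps: float) -> float:
    """Percent throughput change going from old_fps to new_fps."""
    return (new_fps / old_fps - 1.0) * 100.0


for name, fps in results.items():
    gain = speedup_pct(fps["AVX2"], fps["AVX512"])
    print(f"{name}: AVX512 is {gain:+.1f}% vs AVX2")

# Energy per encoded frame: watts / (frames per second) = joules per frame
first = results["4K->1080p"]
for isa in ("AVX2", "AVX512"):
    print(f"{isa}: {power_w[isa] / first[isa]:.1f} J/frame")
```

On these numbers AVX-512 is both faster and lower-power in this overclocked setup, so energy per frame drops by more than the fps gain alone suggests.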

nevcairiel · 30th January 2019, 10:09

Quote:

Originally Posted by Stephen R. Savage

AVX-512 only reduces performance on server chips because of artificial frequency limits that Intel imposes. After the downclock, x265 actually draws less power and runs cooler, showing how it was unnecessary. Hopefully the next generation of server improves the power management algorithm so downclocking will not be needed.

On Skylake-X/WS and Cannon Lake, AVX-512 only ever increases performance. It will presumably also be the case on Ice Lake.

Actually the offset is needed to maintain stability. I know, because I have a 7900X, and tried to get the best out of it.
You wouldn't notice this problem with x264 or x265, because their AVX512 usage is pretty "light", but if you run some heavy AVX512 tasks on all cores and don't configure an appropriate offset, the chip just crashes. The energy density of the AVX512 units is just too high for running at full turbo clocks, never mind overclocked.

If x265 is the only AVX512 you ever run, and you want to risk it, sure, you can disable the offset and hope that it never happens. But I prefer to know that my system is stable no matter what software does.

But be careful, and know that you cannot judge the requirement from one pretty lightweight workload.

The only way the offset is getting lower is when the cores get more efficient, which they really only do on a process shrink. So hopefully that'll significantly reduce the AVX512 offset, even if I don't expect it to go away quite just yet.

This is easily testable by anyone with such a chip. For example, recent versions of the Intel LINPACK floating-point benchmark will put enough AVX-512 stress on the CPU to cause this.

excellentswordfight · 30th January 2019, 16:44

Quote:

Originally Posted by StvG

A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using adaptive offset for vcore. So my vcore is 1.24v for @4800 (non-avx) and when encoding with AVX2 my core speed is @4500 but vcore remains the same 1.24v. When encoding with AVX512 my core speed is @4500 and vcore is 1.13v.

I think he is referring to this https://software.intel.com/en-us/art...-intel-avx-512

And his statement is true for Xeons. I did some tests on a Xeon Gold 6126 and even got lower performance for 2160p at preset slow: clock speeds dropped almost 20%, while running AVX512 gained maybe 10%. Overclocked X299 platforms are a niche (although maybe not here).
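The trade-off described for the Xeon is multiplicative: a lower clock scales throughput down, while AVX-512 scales work per clock up. A minimal sketch, assuming throughput scales linearly with clock speed and reading "down almost 20%" as a clock ratio of 0.8 and "maybe 10%" as a 10% per-clock gain (both are interpretations of the post, not measurements):

```python
def net_throughput(clock_ratio: float, per_clock_gain: float) -> float:
    """Relative throughput versus the AVX2 baseline.

    clock_ratio: AVX-512 clock / AVX2 clock (0.8 means 20% lower clocks).
    per_clock_gain: fractional work-per-clock improvement from AVX-512.
    Assumes throughput scales linearly with clock speed.
    """
    return clock_ratio * (1.0 + per_clock_gain)


def break_even_gain(clock_ratio: float) -> float:
    """Per-clock gain needed just to offset the lower clock."""
    return 1.0 / clock_ratio - 1.0


print(f"{net_throughput(0.80, 0.10):.2f}")  # 0.88, i.e. about 12% slower
print(f"{break_even_gain(0.80):.0%}")  # 25%
```

In other words, a 20% clock drop needs roughly a 25% per-clock improvement just to break even, which is why a ~10% gain ends up as a net loss.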

NikosD · 30th January 2019, 19:00

Quote:

Originally Posted by Stephen R. Savage

That statement makes no sense. If AVX2 does not "hit that power limit," that means that the frequency for non-AVX was too low. As long as it can reach a higher throughput with AVX, that means that either AVX will reduce frequency or non-AVX will be underutilized.

But it does make sense in the context of real-world code optimization/execution.

Is this really the first time you've heard that most non-SIMD code can't utilize a modern CPU the way SIMD code can?

The only way to reach TDP limits of a modern CPU is from optimized SIMD code.

Non-AVX code hits other limits before it reaches the power limits.

NikosD · 30th January 2019, 22:35

Quote:

Originally Posted by Stephen R. Savage

Non-AVX code can and does reach power limits. SKX 28-core can easily reach 165 W in p95 non-AVX. RyZen 2000 hits around 150 W in p95 non-AVX with power limits disabled (which implies throttling with 95 W power limit enforced).

Anyway, SIMD code existed before AVX2: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.x, etc.

For all those SIMD instruction sets we never had lower clocks, not even for AVX1.

AVX2 is even denser than all of those sets, but that doesn't mean no architecture can run it without a performance hit.

Zen 2 could be the first one.

katzenjoghurt · 30th January 2019, 23:22

Oh my.
Am I the only one having such a hard time encoding scenes with red light / red backgrounds?

E.g. if a character turns on a red light, his face would turn suddenly totally blocky.
Blue light is fine, green light also seems to be a bit bad, but red light is the devil.
Looks like x265 (also x264 I think) detects the scene as super-dark and reduces
the bitrate like crazy.

I doubt that it's just a display thing as I can see the problem on my Dell display,
my Benq display and my Samsung TV.

By now I fix it by scanning every source for red scenes before encoding and setting
zones like crazy via --zones startframe,endframe,b=1.5/startframe,endfr....
Super-tedious.
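Part of this manual work could be scripted. The sketch assumes a hypothetical per-frame "redness" score has already been computed by some other tool (x265 does not provide one), finds runs of frames above a threshold, and formats them in the `startframe,endframe,b=<multiplier>` form used above, joined with `/`. The function names, threshold, and scores are illustrative only.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def find_red_runs(redness: Sequence[float], threshold: float,
                  min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges where redness > threshold.

    `redness` is a HYPOTHETICAL per-frame score computed elsewhere;
    runs shorter than min_len frames are discarded.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(redness):
        if score > threshold and start is None:
            start = i  # a red run begins
        elif score <= threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))  # run ended on the previous frame
            start = None
    # Close a run that lasts until the final frame
    if start is not None and len(redness) - start >= min_len:
        runs.append((start, len(redness) - 1))
    return runs


def build_zones(ranges: Iterable[Tuple[int, int]], multiplier: float) -> str:
    """Format ranges as a --zones value using the b= bitrate multiplier."""
    return "/".join(f"{s},{e},b={multiplier}" for s, e in ranges)


scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.0, 0.95, 0.9, 0.2]
runs = find_red_runs(scores, threshold=0.5, min_len=2)
print(runs)  # [(2, 4), (7, 8)]
print("--zones " + build_zones(runs, 1.5))  # --zones 2,4,b=1.5/7,8,b=1.5
```

The hard part remains choosing a redness metric that tracks where detail actually gets lost; the run-finding and string formatting are the easy, mechanical bits.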

I was shocked again today after I checked my encoding of Disney's Aladdin...
Red sand with black dots -> blurred to unshaded flat areas.
Red stone wall backgrounds -> bluuuurr.

Looks like I need to double or triple the bitrate manually in these scenes just to keep
the subjectively visible detail level compared to the non-reddish scenes.

AQ3 won't help all too much either. The overall bitrate would get just too high if
I want to retain the details that way. *sigh*

benwaggoner · 31st January 2019, 06:04

Quote:

Originally Posted by katzenjoghurt

AQ3 won't help all too much either. The overall bitrate would get just too high if
I want to retain the details that way. *sigh*

Raise CRF with aq-mode 3. The question is what looks best at a given bitrate.

I wish x265 gave us a way to change the SAO parameters, so we could adjust how the smoothing works in different luma ranges.

benwaggoner · 31st January 2019, 06:06

Quote:

Originally Posted by Stephen R. Savage

That is why I said that hopefully the next version will have better power management to enable higher AVX and AVX-512 frequency in low intensity workloads.

And we saw thermal throttling for AVX2 get a lot better between Haswell and Skylake SP, so there is precedent for exactly that.

I suspect a truly optimized x265 would actually have different ASM depending on processor generation due to this kind of stuff.

katzenjoghurt · 31st January 2019, 11:49

Quote:

Originally Posted by benwaggoner

Raise CRF with aq-mode 3. The question is what looks best at a given bitrate.

I wish x265 gave us a way to change the SAO parameters, so we could adjust how the smoothing works in different luma ranges.

AQ3 isn't the solution for me...
I tried again yesterday... the file size doubled and still I didn't really
reach the quality I could achieve by defining zones manually.

SAO might add to it... but the issue is also there with SAO disabled.

In fact two of my past problems eventually seemed to come just from
the lighting / red area issue.

I wondered why Star Wars III gave me so much more trouble encoding than the other Star Wars movies -> Answer: As Anakin turns evil here, the movie got a more reddish color grading.
https://forum.doom9.org/showpost.php...postcount=6353

Here the problem was in fact the character's dark red hair:
https://forum.doom9.org/showpost.php...6&postcount=11

With Aladdin I have the same issue now... it's dark red all over the place and super-hard to encode due to that.
It's even a problem in bright scenes.
Everything is crisp... and then there is this character wearing a dark red fez. And the fez is blocky and blurry.

fauxreaper · 31st January 2019, 13:32

Quote:

Originally Posted by katzenjoghurt

Oh my.
Am I the only one having such a hard time encoding scenes with red light / red backgrounds?

Use --cbqpoffs and --crqpoffs with negative values.
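For context on what a negative offset does: --cbqpoffs and --crqpoffs offset the Cb and Cr QP relative to the luma QP, and in H.264/HEVC the quantizer step size doubles every 6 QP. Lowering the chroma QP therefore shrinks the chroma step size by a factor of 2^(offset/6), spending more bits on chroma. A minimal sketch of that arithmetic, which deliberately ignores HEVC's luma-to-chroma QP mapping table for 4:2:0 and any rate-control feedback:

```python
def qstep_ratio(qp_offset: int) -> float:
    """Quantizer step-size ratio for a QP change in H.264/HEVC.

    The step size doubles every 6 QP, so ratio = 2 ** (offset / 6).
    Ignores the 4:2:0 chroma QP mapping table.
    """
    return 2.0 ** (qp_offset / 6.0)


for off in (-1, -2, -3, -6):
    print(f"offset {off:+d}: chroma step x{qstep_ratio(off):.3f}")
```

A small offset such as -2 or -3 already makes chroma quantization noticeably finer, which is presumably why it is suggested for saturated red content rather than raising the bitrate for the whole frame.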
