Old 24th January 2019, 18:19   #6661  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by nevcairiel View Post
Other than compatibility, that really doesn't offer anything. AVX/AVX2 instructions making use of those 256-bit units would be about the same speed.

256-bit AVX on current Ryzen isn't that much faster than 128-bit SSE due to that.

In any case, there have been zero hints about AVX512 support.
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally the value of AVX? instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 24th January 2019, 18:22   #6662  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by Selur View Post
okay

And here it gets confusing to me.
Questions are:
a. is '--aq-strength' an option for both '--aq-mode' and '--hevc-aq' or just '--aq-mode'?
b. is '--aq-adaption-range' an option for both '--aq-mode' and '--hevc-aq' or just '--hevc-aq'?
It is definitely not b. So either a, or:

c. --aq-strength is a parameter for --aq-mode, and --aq-adaption-range is a parameter for --hevc-aq, and neither is used when the other aq type is used.
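
Reading (c) can be written down as a small lookup, purely as an illustration. The Python sketch below is not x265 code: the option names are copied as spelled in this thread without being checked against the encoder, and `AQ_TUNING` and `split_tuning` are names made up for the example.

```python
# Reading (c): each AQ type has exactly one tuning parameter, and a tuning
# parameter is ignored when its AQ type is not the one in use.
# Option names are copied from this thread and not checked against x265.
AQ_TUNING = {
    "--aq-mode": "--aq-strength",
    "--hevc-aq": "--aq-adaption-range",
}


def split_tuning(options):
    """Split the tuning options present in `options` into (used, ignored)."""
    used, ignored = [], []
    for aq_type, tuning in AQ_TUNING.items():
        if tuning not in options:
            continue
        # The tuning option only counts when its own AQ type was requested.
        (used if aq_type in options else ignored).append(tuning)
    return used, ignored


if __name__ == "__main__":
    opts = {"--aq-mode": 3, "--aq-strength": 1.0, "--aq-adaption-range": 2.0}
    print(split_tuning(opts))
```

Under this reading, passing both tuning parameters together with only --aq-mode would leave --aq-adaption-range without effect.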
Old 25th January 2019, 09:30   #6663  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,718
Quote:
Originally Posted by benwaggoner View Post
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets; it'll make things slower for other scenarios on existing processors. That's why it is off by default even on a system with AVX512 support. AVX & AVX2 are always used if there is hardware support because they help significantly most of the time, and I don't know of any cases where they hurt.

Generally the value of AVX? instructions has improved over time, as microarchitecture improvements help with thermal throttling and other bottlenecks.
At least with a first generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.
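
For anyone scripting such comparisons, here is a rough Python sketch of building the x265 command line with and without a SIMD override. It assumes x265's `--asm` option accepts a level name such as `avx512` (the opt-in discussed above) or `avx` (to stay below AVX2); the exact accepted strings are an assumption, so check `x265 --help` for your build. The file names and function names are hypothetical.

```python
def x265_asm_args(simd_level=None):
    """Return the extra arguments that select a SIMD level.

    None means "let x265 auto-detect". The level strings are assumptions
    and should be checked against the x265 build in use.
    """
    if simd_level is None:
        return []
    return ["--asm", simd_level]


def build_command(source, output, preset="slower", simd_level=None):
    """Assemble an x265 argument list without running anything."""
    cmd = ["x265", "--preset", preset]
    cmd += x265_asm_args(simd_level)
    cmd += ["--input", source, "--output", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, one command per SIMD setting to compare.
    for level in (None, "avx", "avx512"):
        print(" ".join(build_command("in.y4m", "out.hevc", simd_level=level)))
```

Timing the same clip with each variant is the only reliable way to tell which setting is faster on a given CPU.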
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 26th January 2019, 14:42   #6664  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Quote:
Originally Posted by Boulder View Post
At least with a first generation Ryzen, you'll want to disable AVX2 in the x265 command line. It's slightly faster that way.
The Zen 2 architecture doubles the width of the datapath and the execution units to 256 bits, which should double AVX2 performance compared with Zen.

The whole FPU got supersized in Zen 2 (vs. 1).

2x wider datapath (256-bit, up from 128-bit)
2x wider EUs (256-bit FMAs, up from 128-bit FMAs)
2x wider LSU (2x256-bit L/S, up from 128-bit)

from: https://en.wikichip.org/wiki/amd/mic...tectures/zen_2
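
The "2x" figures follow from simple width arithmetic. A minimal Python sketch, assuming a 256-bit AVX2 operation is simply split into as many passes as the datapath requires (a simplification that ignores memory bandwidth, scheduling and clocks); the function names are made up for the example.

```python
def lanes(vector_bits, element_bits):
    """Number of elements processed by one vector instruction."""
    return vector_bits // element_bits


def passes_for_avx2_op(datapath_bits, op_bits=256):
    """Passes needed to push one op_bits-wide operation through the datapath."""
    return max(1, op_bits // datapath_bits)


if __name__ == "__main__":
    print("16-bit lanes per 256-bit op:", lanes(256, 16))
    print("passes on a 128-bit datapath:", passes_for_avx2_op(128))
    print("passes on a 256-bit datapath:", passes_for_avx2_op(256))
```

Halving the number of passes per operation is a peak figure; real encoder speedups will be smaller because not all of x265's time is spent in 256-bit SIMD code.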
Old 26th January 2019, 15:20   #6665  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Another important advantage of the Zen 2 architecture, and of the Ryzen 3000 series that implements it, will be the CPU clock under heavy AVX2 load, such as running x265.

If AMD has been interpreted correctly, we will see no performance penalty due to lower clocks during x265 AVX2 code execution.

So far, every Intel CPU architecture drops its clocks when running x265's AVX2 code.

I think Ryzen 3000 (and Threadripper, EPYC) based on Zen 2 architecture has all the benefits to be a lot faster than any Intel CPU ever released with the same number of cores.

And because AMD CPUs have more cores than Intel's nowadays, x265 could be a killer app for AMD, like Cinebench.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 27th January 2019, 10:32   #6666  |  Link
chinobino
Registered User
 
Join Date: Dec 2014
Posts: 9
Quote:
Originally Posted by Wolfberry View Post
Perfect, thank you.
Old 27th January 2019, 14:59   #6667  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
HEVC licensing info article: http://www.streamingmedia.com/Articl...te-129386.aspx
Old 28th January 2019, 01:04   #6668  |  Link
WhatZit
Registered User
 
Join Date: Aug 2016
Posts: 60
Quote:
Originally Posted by hajj_3 View Post
"Tragedy Of The Commons" a.k.a. "Stake The Velos Vampires"
Old 29th January 2019, 15:24   #6669  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v3.0_RC+13-ae085e5cd8a2 (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (32bit : GCC 7.4.0 / 64bit : GCC 8.2.1)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
NOTE :
Checked with Pradeep (@MulticoreWare) about why the default branch hasn't been pushed to v3.0 'Stable', and this is the reply I got:

"Our plan is to continue to use 3.0_RC on the default branch and have completed tags only on the stable branch. So we don't intend to merge back."
Old 30th January 2019, 09:21   #6670  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by Stephen R. Savage View Post
This is only true on server CPU. There is no separate frequency for AVX on client CPU (7700K, 8700K, 9900K, etc.). The extent to which you may see "lower clocks" on non-server is if you are encoding faster and running into the 65/95 W power limit, and in that case AMD will be no different.
But that is exactly the point for AMD: they will not hit that power limit, because Zen 2 implements AVX2 instructions more efficiently than any Intel architecture so far.

The 7nm process will probably help too, keeping the same clocks for AVX2 as for all the other instruction sets.
Old 30th January 2019, 09:27   #6671  |  Link
StvG
Registered User
 
Join Date: Jul 2018
Posts: 447
Quote:
Originally Posted by benwaggoner View Post
And so far, AVX512 in x265 has only been demonstrated to be helpful on Intel systems doing UHD at slower+ presets...
A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using adaptive offset for vcore. So my vcore is 1.24v for @4800 (non-avx) and when encoding with AVX2 my core speed is @4500 but vcore remains the same 1.24v. When encoding with AVX512 my core speed is @4500 and vcore is 1.13v.
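
To put those numbers in relative terms, here is a small Python sketch. The speedup is plain arithmetic on the fps figures above; the power comparison assumes the common simplification that dynamic power scales with V^2 * f, which is a textbook model and not something measured in this thread. The function names are made up for the example.

```python
def speedup_percent(baseline_fps, new_fps):
    """Relative gain of new_fps over baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Simplified CMOS dynamic power model: P is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    print(f"downscaled 4K: {speedup_percent(7.08, 7.87):.1f}% faster")
    print(f"native 1080p:  {speedup_percent(7.68, 8.08):.1f}% faster")
    # Same 4500 MHz clock in both cases, so only the voltage term changes.
    print(f"measured power ratio:  {250 / 290:.2f}")
    print(f"V^2 model power ratio: {dynamic_power_ratio(1.13, 1.24):.2f}")
```

That works out to roughly 11% and 5% more fps with AVX512, and the drop from 1.24v to 1.13v at the same clock accounts for most of the measured 290W to 250W difference under this model.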
Old 30th January 2019, 10:09   #6672  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by Stephen R. Savage View Post
AVX-512 only reduces performance on server chips because of artificial frequency limits that Intel imposes. After the downclock, x265 actually draws less power and runs cooler, showing how it was unnecessary. Hopefully the next generation of server improves the power management algorithm so downclocking will not be needed.

On Skylake-X/WS and Cannon Lake, AVX-512 only ever increases performance. It will presumably also be the case on Ice Lake.
Actually the offset is needed to maintain stability. I know, because I have a 7900X, and tried to get the best out of it.
You wouldn't notice this problem with x264 or x265, because their AVX512 usage is pretty "light", but if you run some heavy AVX512 tasks on all cores and don't configure an appropriate offset, the chip just crashes. The energy density of the AVX512 units is just too high for running at full turbo clocks, never mind overclocked.

If x265 is the only AVX512 you ever run, and you want to risk it, sure, you can disable the offset and hope that it never happens. But I prefer to know that my system is stable no matter what software does.
But be careful, and do know that you cannot judge the requirement from one pretty lightweight workload.

The only way the offset is getting lower is when the cores get more efficient, which they really only do on a process shrink. So hopefully that'll significantly reduce the AVX512 offset, even if I don't expect it to go away quite just yet.

This is easily testable by anyone with such a chip. For example, recent versions of the Intel LINPACK floating-point benchmark will put enough AVX-512 stress on the CPU to cause this.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 30th January 2019 at 10:25.
Old 30th January 2019, 16:44   #6673  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 322
Quote:
Originally Posted by StvG View Post
A simple test with 4K video downscaled to 1080p with avs+ and passed to x265 with avs2yuv, preset slower + ctu 32, AVX2@4500, AVX512@4500:
AVX2 - 7.08 fps
AVX512 - 7.87 fps

Another 1080p encoding with the same preset slower + ctu 32:
AVX2 - 7.68 fps
AVX512 - 8.08 fps

Also:
AVX2 ~ 290W
AVX512 ~ 250W

I'm using adaptive offset for vcore. So my vcore is 1.24v for @4800 (non-avx) and when encoding with AVX2 my core speed is @4500 but vcore remains the same 1.24v. When encoding with AVX512 my core speed is @4500 and vcore is 1.13v.
I think he is referring to this https://software.intel.com/en-us/art...-intel-avx-512

And his statement is true for Xeons. I did some tests on a Xeon Gold 6126 and even got lower performance for 2160p at preset slow: clock speeds dropped almost 20%, while running AVX512 gained maybe 10%. Overclocked X299 platforms are a niche (although maybe not here).
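
As a back-of-the-envelope check on why that ends up slower, here is a short Python sketch. It assumes encoding speed scales linearly with clock speed and treats "almost 20%" and "maybe 10%" as exactly 0.80 and 1.10, so the result is only indicative; the function names are made up for the example.

```python
def net_throughput(clock_factor, per_clock_gain):
    """Overall speed relative to baseline, assuming speed scales with clock."""
    return clock_factor * per_clock_gain


def break_even_gain(clock_factor):
    """Per-clock gain needed to cancel out a given clock reduction."""
    return 1.0 / clock_factor


if __name__ == "__main__":
    print(f"net speed vs. AVX2:     {net_throughput(0.80, 1.10):.2f}x")
    print(f"gain needed to break even: {break_even_gain(0.80):.2f}x")
```

With those rounded inputs the encode runs at about 0.88x, and AVX512 would need to deliver around 25% more work per clock just to break even.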

Last edited by excellentswordfight; 30th January 2019 at 16:47.
Old 30th January 2019, 19:00   #6674  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by Stephen R. Savage View Post
That statement makes no sense. If AVX2 does not "hit that power limit," that means that the frequency for non-AVX was too low. As long as it can reach a higher throughput with AVX, that means that either AVX will reduce frequency or non-AVX will be underutilized.
But it does make sense in the context of real-world code optimization/execution.

Is this really the first time you have heard that most non-SIMD code can't utilize a modern CPU the way SIMD code can?

The only way to reach TDP limits of a modern CPU is from optimized SIMD code.

There are other limits to reach before power limits for non-AVX code.
Old 30th January 2019, 22:35   #6675  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by Stephen R. Savage View Post
Non-AVX code can and does reach power limits. SKX 28-core can easily reach 165 W in p95 non-AVX. RyZen 2000 hits around 150 W in p95 non-AVX with power limits disabled (which implies throttling with 95 W power limit enforced).
Anyway, SIMD code existed before AVX2: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.x, etc.

For all those SIMD instruction sets we never had lower clocks, not even for AVX1.

AVX2 is denser than all of those sets, but that doesn't mean no architecture can run it without a performance hit.

Zen 2 could be the first one.
Old 30th January 2019, 23:22   #6676  |  Link
katzenjoghurt
Registered User
 
Join Date: Feb 2007
Posts: 128
The frustrated encoder

Oh my.
Am I the only one having such a hard time encoding scenes with red light / red backgrounds?

E.g. if a character turns on a red light, his face would turn suddenly totally blocky.
Blue light is fine, green light also seems to be a bit bad, but red light is the devil.
Looks like x265 (also x264 I think) detects the scene as super-dark and reduces
the bitrate like crazy.

I doubt that it's just a display thing as I can see the problem on my Dell display,
my Benq display and my Samsung TV.

For now I fix it by scanning every source for red scenes before encoding and setting
zones like crazy via --zones startframe,endframe,b=1.5/startframe,endfr....
Super-tedious.
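
The string-building part of that, at least, can be scripted. A minimal Python sketch that turns a list of frame ranges into the value for --zones in the `start,end,b=factor` form shown above; the frame numbers are made-up placeholders, `zones_arg` is a hypothetical helper, and finding the red scenes is still left to the user.

```python
def zones_arg(ranges, bitrate_factor=1.5):
    """Build the --zones value from (start_frame, end_frame) pairs.

    Zones are sorted by start frame and joined with '/'.
    """
    parts = []
    for start, end in sorted(ranges):
        if start > end:
            raise ValueError(f"zone starts after it ends: {start},{end}")
        parts.append(f"{start},{end},b={bitrate_factor}")
    return "/".join(parts)


if __name__ == "__main__":
    # Hypothetical frame ranges of red-lit scenes.
    red_scenes = [(1200, 1450), (300, 420)]
    print("--zones", zones_arg(red_scenes))
```

The resulting string can be pasted after --zones on the x265 command line.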

I was shocked again today after I checked my encoding of Disney's Aladdin...
Red sand with black dots -> blurred to unshaded flat areas.
Red stone wall backgrounds -> bluuuurr.

Looks like I need to double or triple the bitrate manually in these scenes just to keep
the subjectively visible detail level compared to the non-reddish scenes.

AQ3 won't help all too much either. The overall bitrate would get just too high if
I want to retain the details that way. *sigh*

Last edited by katzenjoghurt; 30th January 2019 at 23:26.
Old 31st January 2019, 06:04   #6677  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by katzenjoghurt View Post
AQ3 won't help all too much either. The overall bitrate would get just too high if
I want to retain the details that way. *sigh*
Raise CRF with aq-mode 3. The question is what looks best at a given bitrate.

I wish x265 gave us a way to change the SAO parameters, so we could adjust how the smoothing works in different luma ranges.
Old 31st January 2019, 06:06   #6678  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by Stephen R. Savage View Post
That is why I said that hopefully the next version will have better power management to enable higher AVX and AVX-512 frequency in low intensity workloads.
And we saw thermal throttling for AVX2 get a lot better between Haswell and Skylake SP, so there is precedent for exactly that.

I suspect a truly optimized x265 would actually have different ASM depending on processor generation due to this kind of stuff.
Old 31st January 2019, 11:49   #6679  |  Link
katzenjoghurt
Registered User
 
Join Date: Feb 2007
Posts: 128
Quote:
Originally Posted by benwaggoner View Post
Raise CRF with aq-mode 3. The question is what looks best at a given bitrate.

I wish x265 gave us a way to change the SAO parameters, so we could adjust how the smoothing works in different luma ranges.
AQ3 isn't the solution for me...
I tried again yesterday... the file size doubled and still I didn't really
reach the quality I could achieve by defining zones manually.

SAO might add to it... but the issue is also there with SAO disabled.


In fact two of my past problems eventually seemed to come just from
the lighting / red area issue.

I wondered why Star Wars III gave me so much more trouble encoding than other Star Wars movies -> Answer: As Anakin turns evil here, the movie got a more reddish color coding.
https://forum.doom9.org/showpost.php...postcount=6353

Here the problem was in fact the character's dark red hair:
https://forum.doom9.org/showpost.php...6&postcount=11


With Aladdin I have the same issue now... it's dark red all over the place and super-hard to encode due to that.
It's even a problem in bright scenes.
Everything is crisp... and then there is this character wearing a dark red Fez. And the Fez is blocky and blurry.
Old 31st January 2019, 13:32   #6680  |  Link
fauxreaper
Registered User
 
Join Date: Oct 2014
Posts: 23
Quote:
Originally Posted by katzenjoghurt View Post
Oh my.
Am I the only one having such a hard time encoding scenes with red light / red backgrounds?
Use --cbqpoffs and --crqpoffs with negative values.
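
As an illustration of that suggestion, here is a small Python sketch that builds the two arguments. The -12 to 12 range is what the x265 documentation gives for these options as far as I recall, so verify it against your build; the offsets of -2 and -3 are arbitrary starting points and not recommendations, and `chroma_qp_args` is a hypothetical helper.

```python
def chroma_qp_args(cb_offset, cr_offset, limit=12):
    """Return x265 arguments for the Cb/Cr QP offsets.

    Negative offsets lower the chroma QP relative to luma, which spends
    more bits on chroma. `limit` is the assumed valid range (+/- 12).
    """
    for name, value in (("cb", cb_offset), ("cr", cr_offset)):
        if not -limit <= value <= limit:
            raise ValueError(f"{name} offset {value} outside +/-{limit}")
    return ["--cbqpoffs", str(cb_offset), "--crqpoffs", str(cr_offset)]


if __name__ == "__main__":
    # Illustrative values only; tune by eye on a red-heavy test clip.
    print(" ".join(chroma_qp_args(-2, -3)))
```

Since saturated red detail sits largely in the Cr plane, a slightly stronger negative value for --crqpoffs than for --cbqpoffs may be worth trying first.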