Old 11th April 2018, 21:17   #6001  |  Link
Midzuki
Unavailable
 
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
Quote:
Originally Posted by LigH View Post


307 patches with AVX-512 (and other improved assembly) code uploaded to the developer mailing list. That will take a little while to review.
They are up and running 0_o

https://bitbucket.org/multicoreware/x265/commits/all

Last edited by Midzuki; 11th April 2018 at 21:18. Reason: :-/
Old 11th April 2018, 21:25   #6002  |  Link
LigH
German doom9/Gleitz SuMo
 
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
Damn. I waited for the "Re: 0/307 — approved" mail.

Time to build.
_

P.S.: Compiling x265 with AVX-512 support works only for x86-64 architecture targets. A "bailout" for x86 (Win32) architecture targets seems to be missing, so it throws "invalid opcode" errors for the 8-bit depth core where assembler is still enabled.
_

x265 2.7+332-593e63cda903 (Win64)

Support for AVX-512 assembly optimized kernels; remember: enable it manually by adding --asm avx512 to the CLI — and don't fry your CPU...

Only the x86-64 (Win64) version is available; skipping AVX-512 for NASM in x86 (Win32) mode is necessary to avoid breaking compilation completely.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 11th April 2018 at 22:52.
Old 12th April 2018, 01:15   #6003  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
Does anyone with an AVX-512 capable processor fancy running benchmarks comparing it to the previous build?
Old 12th April 2018, 02:10   #6004  |  Link
LigH
German doom9/Gleitz SuMo
 
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
I already have the feeling that one day x265 will be used as a benchmark for the efficiency of the AVX implementation in a specific CPU rather than as a benchmark for efficient video encoding ...
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 12th April 2018, 02:54   #6005  |  Link
Midzuki
Unavailable
 
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
More AVX-512 code = bigger filesize

Old 12th April 2018, 05:04   #6006  |  Link
FranceBB
Broadcast Encoder
 
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Yep, but in 2018 20.7 MB is still very small and the increase is negligible. Unfortunately I can't test the latest AVX-512 instruction set because I have an Intel Xeon E5-2660 v4 that supports AVX2 only, sadly.
I look forward to the benchmarks.
Old 12th April 2018, 05:09   #6007  |  Link
foxyshadis
ангел смерти
 
 
Join Date: Nov 2004
Location: Lost
Posts: 9,558
I thought having Kaby Lake meant I had them, but nope, servers only. I have one customer with a brand spanking new Skylake-X server that I can remote into, so I should be able to get benchmarks tomorrow.
Old 12th April 2018, 06:37   #6008  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,406
AVX-512 is faster!

I did some benchmarks using LigH's build x265 2.7+332-593e63cda903 (Win64) above. I used the same build for the AVX2 tests, simply without the "--asm avx512" option.
i9-7900X @ 4.5 GHz all cores, 3.0 GHz mesh/cache, DDR4 4000-17-18-18-41-1T. No AVX2 or AVX-512 multiplier offsets. Max 92 degC package CPU temperature during both veryslow encodes. The faster modes did not saturate all 20 threads.

The source is 1920x1080 8-bit gradient MagicYUV 4:2:0 on an NVMe SSD, encoding to another NVMe SSD. I used the first 1000 frames from Firefly episode 9, which I had already denoised (SMDegrain) and had on my drive.

avs2pipemod.exe -y4mp=1:1 "fireflyshort.avs" | x265_AVX512.exe --input - --y4m -o "D:\temp\fireflyshort.mkv" --asm avx512 --preset veryslow --crf 18.5 --output-depth 10
x265 [info]: HEVC encoder version 2.7+332-593e63cda903
x265 [info]: build info [Windows][GCC 7.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 20 threads

veryslow:
AVX512: encoded 1000 frames in 303.44s (3.30 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.83s (2.98 fps), 4037.41 kb/s, Avg QP:20.64
medium:
AVX512: encoded 1000 frames in 28.41s (35.20 fps), 3183.67 kb/s, Avg QP:20.46
AVX2: encoded 1000 frames in 30.71s (32.57 fps), 3183.67 kb/s, Avg QP:20.46
veryfast:
AVX512: encoded 1000 frames in 15.47s (64.64 fps), 2769.26 kb/s, Avg QP:20.89
AVX2: encoded 1000 frames in 16.89s (59.20 fps), 2769.26 kb/s, Avg QP:20.89
ultrafast:
AVX512: encoded 1000 frames in 6.86s (145.77 fps), 1398.46 kb/s, Avg QP:25.00
AVX2: encoded 1000 frames in 7.22s (138.41 fps), 1398.46 kb/s, Avg QP:25.00
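
For anyone who wants the relative numbers, here is a minimal Python sketch that parses the summary lines above and works out the AVX-512 speedup per preset. The regular expression and the helper names (`seconds`, `speedup_percent`) are my own assumptions for illustration, not part of x265.

```python
import re

# Summary lines copied from the results above: preset -> (AVX-512 line, AVX2 line).
RESULTS = {
    "veryslow": ("encoded 1000 frames in 303.44s (3.30 fps)",
                 "encoded 1000 frames in 335.83s (2.98 fps)"),
    "medium": ("encoded 1000 frames in 28.41s (35.20 fps)",
               "encoded 1000 frames in 30.71s (32.57 fps)"),
    "veryfast": ("encoded 1000 frames in 15.47s (64.64 fps)",
                 "encoded 1000 frames in 16.89s (59.20 fps)"),
    "ultrafast": ("encoded 1000 frames in 6.86s (145.77 fps)",
                  "encoded 1000 frames in 7.22s (138.41 fps)"),
}

# Hypothetical pattern for the "encoded N frames in T s" part of the summary line.
SUMMARY_RE = re.compile(r"encoded (\d+) frames in ([\d.]+)s")


def seconds(summary: str) -> float:
    """Extract the wall-clock encode time in seconds from a summary line."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError(f"not an x265 summary line: {summary!r}")
    return float(match.group(2))


def speedup_percent(avx512: str, avx2: str) -> float:
    """Percentage by which the AVX-512 run is faster than the AVX2 run."""
    return (seconds(avx2) / seconds(avx512) - 1.0) * 100.0


for preset, (avx512_line, avx2_line) in RESULTS.items():
    gain = speedup_percent(avx512_line, avx2_line)
    print(f"{preset:>9}: AVX-512 is {gain:.1f}% faster")
```

On these four runs the gain works out to somewhere between roughly 5% and 11%, with the largest gain on veryslow.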

Thanks to everyone who works on x265 and thanks for the regular builds LigH.
__________________
madVR options explained

Last edited by Asmodian; 12th April 2018 at 07:12.
Old 12th April 2018, 10:41   #6009  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 322
Slower here.

Using LigH's build on a Dell 2U rack server with a Xeon Gold 6126 (12c/24t). CPU utilization dropped by about 10% (both for 1080p and 2160p) and clock speed dropped from 2.9 GHz to 2.4 GHz. I'm guessing that the gains from AVX-512 didn't outweigh the drop in clock speed and utilization.

Tears of Steel source (10-bit UHD Blu-ray compatible x265 source for the 2160p test, 8-bit Blu-ray compatible x264 source for 1080p)

2160p with avx512: 80-90% CPU usage, 2.28 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 51 --crf 22

2160p: 100% CPU usage, 2.36 fps
Code:
--preset slow --profile main10 --level-idc 51 --crf 22

1080p with avx512: 45-55% CPU usage, 6.54 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 41 --crf 18

1080p: 55-65% CPU usage, 7.14 fps
Code:
--preset slow --profile main10 --level-idc 41 --crf 18
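
As a rough back-of-the-envelope check on that guess, the Python sketch below computes how much per-clock gain AVX-512 would need just to break even after the clock drop, and the measured fps change. It assumes throughput scales linearly with clock speed and ignores the utilization change, so treat it as an estimate only; the function names are made up for illustration.

```python
def break_even_gain_percent(base_ghz: float, avx512_ghz: float) -> float:
    """Per-clock throughput gain (in %) needed to offset the lower AVX-512 clock.

    Assumption: encode speed is proportional to clock speed.
    """
    return (base_ghz / avx512_ghz - 1.0) * 100.0


def fps_change_percent(fps_avx512: float, fps_default: float) -> float:
    """Relative fps change (in %) when enabling AVX-512; negative means slower."""
    return (fps_avx512 / fps_default - 1.0) * 100.0


# Clock speeds and frame rates taken from the figures in this post.
print(f"break-even per-clock gain: {break_even_gain_percent(2.9, 2.4):.1f}%")
print(f"2160p fps change: {fps_change_percent(2.28, 2.36):.1f}%")
print(f"1080p fps change: {fps_change_percent(6.54, 7.14):.1f}%")
```

Under that assumption, AVX-512 would have to deliver roughly 21% more work per clock to make up for 2.9 GHz dropping to 2.4 GHz, while the measured result is a few percent slower in both tests.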

Last edited by excellentswordfight; 12th April 2018 at 11:58.
Old 12th April 2018, 11:10   #6010  |  Link
WhatZit
Registered User
 
Join Date: Aug 2016
Posts: 60
Quote:
Originally Posted by excellentswordfight View Post
I'm guessing that the gains for AVX512 didn't outweigh the drop in clockspeed and utilization.
Yep, a Catch-22 also discovered by Cloudflare after some cryptography assessments: https://blog.cloudflare.com/on-the-d...uency-scaling/
Old 12th April 2018, 12:47   #6011  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,340
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were running, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it stable under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 12th April 2018, 14:18   #6012  |  Link
Barough
Registered User
 
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.7+337-54ff74d2b635 (GCC 7.3.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 12th April 2018, 14:26   #6013  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
Probably best to utilise AVX-512 where it gives the best gains without triggering thermal throttling. The good thing, at least, is that with 307 separate patches this can be whittled down. If a function is frequently used and gives only a small gain, the encode may actually be faster with it omitted, due to the throttling the patch causes. Even if throttling isn't triggered on a particular rig, the temperature difference should be taken into account to cover typical situations.
Old 12th April 2018, 17:52   #6014  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
x265 2.7+337-54ff74d2b635
  • Merge with default; prep for v3.0
  • Support for HLG-graded content and pic_struct
  • Fix conditions for single-sei NAL
  • Fix 32 bit build error (means: AVX-512 support is only included in x86-64 architecture target)
(VMAF support to report per frame and aggregate VMAF score — unfortunately not yet? available for Windows builds)

New CLI parameters:

Code:
   --atc-sei <integer>           Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled
   --pic-struct <integer>        Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 12th April 2018, 17:57   #6015  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,406
Quote:
Originally Posted by nevcairiel View Post
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were running, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it stable under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
I had downclocked from my normal max clocks when running without an AVX offset.

I also ran some tests at my normal OC settings with -2, -4 multiplier offsets. 4.8 GHz max core, 4.6 GHz AVX2, 4.4 GHz AVX-512.
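
To spell out how those offsets map to the clocks, a tiny Python sketch, assuming a stock 100 MHz base clock (BCLK) and a x48 core multiplier for the 4.8 GHz setting. Both values and the function name are assumptions for illustration.

```python
BCLK_MHZ = 100  # assumption: stock 100 MHz base clock


def effective_ghz(core_multiplier: int, offset: int) -> float:
    """Clock in GHz after applying a (negative) AVX multiplier offset."""
    return (core_multiplier + offset) * BCLK_MHZ / 1000.0


# x48 core multiplier with no offset, -2 for AVX2 and -4 for AVX-512.
for label, offset in (("max core", 0), ("AVX2", -2), ("AVX-512", -4)):
    print(f"{label}: {effective_ghz(48, offset):.1f} GHz")
```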

AVX512: encoded 1000 frames in 310.46s (3.22 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.85s (2.98 fps), 4037.41 kb/s, Avg QP:20.64

It would probably still melt with a heavy AVX-512 load but it also wasn't completely maxed. AVX-512 ran cooler than AVX2 at these settings. I am not sure why my AVX2 run only had the same speed as the previous 4.5 GHz encode, maybe a latency penalty due to the core changing states.

This is a binned, delidded, and water cooled CPU... other systems may have different results.

Edit: If I run Prime95 (p95v294b8) with AVX-512 at 4.5 GHz I do get thermal throttling.
__________________
madVR options explained

Last edited by Asmodian; 12th April 2018 at 19:09.
Old 12th April 2018, 20:57   #6016  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,340
Quote:
Originally Posted by Asmodian View Post
Edit: If I run Prime95 (p95v294b8) with AVX-512 at 4.5 GHz I do get thermal throttling.
Try with LinX/Linpack and see your system die. Prime95 does not fully use AVX512 yet (only trial factoring, not full FFTs)
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 12th April 2018, 21:06   #6017  |  Link
Stephen R. Savage
Registered User
 
 
Join Date: Nov 2009
Posts: 327
Quote:
Originally Posted by nevcairiel View Post
Try with LinX/Linpack and see your system die. Prime95 does not fully use AVX512 yet (only trial factoring, not full FFTs)
It's actually not so bad at higher frequencies, because each 100 MHz increment saves a lot more power, compared to 2.5 GHz server SKUs. i9-7900X can reach 4.1-4.2 GHz AVX-512 frequency with an aftermarket cooling solution.
Old 12th April 2018, 21:37   #6018  |  Link
jlpsvk
Registered User
 
Join Date: Dec 2014
Posts: 240
Quote:
Originally Posted by Kavitha View Post
x265 has static levels of refinement (--refine-inter <level> / --refine-intra <level>) which can be used with --analysis-reuse-level 10.
Efficiency in terms of quality increases as the level of refinement increases. This quality increase results from additional computation, which increases the overall encoding time.
For a better quality-speed trade-off, dynamic refinement was introduced where the encoder dynamically switches between different inter refine levels.
This basically exploits the fact that not all CUs are required to be encoded with same level for better performance/quality.
Considering the complexity of video content and the analysis information from first pass, the encoder can intelligently decide the optimal level of refinement for each CU.
Intra frames are usually encoded with best quality as they are used as references by the consecutive frames. Hence error introduced in intra frames due to reusing analysis data can propagate to frames that use these intra frames as reference.
To minimize the chances of error propagation, refine-intra 4 (level with best quality) restricts reusing analysis data for intra frames and forces the encoder to perform full intra analysis in the second pass.
This is why the x265 documentation suggests using dynamic refinement along with refine-intra 4, and this setting is expected to give better quality than other refine-intra levels for some videos.
Any recommended quality-oriented settings for 4K HDR encoding? With CRF, e.g. 17?
Old 12th April 2018, 21:37   #6019  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,340
Quote:
Originally Posted by Stephen R. Savage View Post
It's actually not so bad at higher frequencies, because each 100 MHz increment saves a lot more power, compared to 2.5 GHz server SKUs. i9-7900X can reach 4.1-4.2 GHz AVX-512 frequency with an aftermarket cooling solution.
You can reach that if you boost the power you give the CPU, but unfortunately that also boosts the power outside of AVX512 mode, making your CPU overall less efficient. The integrated voltage controller has no option to increase the core voltage only in AVX512 mode, unfortunately.

But this is probably going a bit off-topic for X265.
I would've thought the X265 people already learned the down-clocking lesson with AVX2 though, where they experienced the same effect - fancy instructions that made the overall encode slower, especially on server systems, due to clock changes.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 12th April 2018, 21:44   #6020  |  Link
mandarinka
Registered User
 
 
Join Date: Jan 2007
Posts: 729
https://forums.anandtech.com/threads...#post-39149633

Quote:
RZN vs. CFL vs. SKL-X in X265 2.5+31:

RZN: w/ AVX2 = 100.00%, w/o AVX2 = 105.21%
CFL: w/ AVX2 = 130.61%, w/o AVX2 = 101.13%
SKL-X: w/ AVX2 = 135.47%, w/o AVX2 = 105.21%

Ryzen's performance without AVX2 is impressive, but it is sad to see that there is still a penalty (like on Excavator) when running 256-bit code.
I wish somebody would adjust the CPU detection code to disable AVX2 on Zen. Easy performance gain just from that simple change: Zen gets 5.2% faster by disabling AVX2.
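
To put the quoted scores side by side, here is a short Python sketch that computes what enabling AVX2 gains or costs on each architecture. The scores are copied from the quote; the dictionary layout and function name are just for illustration.

```python
# Relative x265 2.5+31 scores from the quote: arch -> (with AVX2, without AVX2).
SCORES = {
    "RZN": (100.00, 105.21),
    "CFL": (130.61, 101.13),
    "SKL-X": (135.47, 105.21),
}


def avx2_gain_percent(with_avx2: float, without_avx2: float) -> float:
    """Change in score (in %) from enabling AVX2; negative means AVX2 is slower."""
    return (with_avx2 / without_avx2 - 1.0) * 100.0


for arch, (with_avx2, without_avx2) in SCORES.items():
    print(f"{arch:>5}: AVX2 changes the score by "
          f"{avx2_gain_percent(with_avx2, without_avx2):+.1f}%")
```

Going by these figures, AVX2 is worth roughly 29% on CFL and SKL-X but costs Zen about 5%, which is the same thing as the 5.2% gain from disabling it (105.21 vs. 100.00).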